Actuarial AI · March 2026 · 20 min read

AI Tools for Actuaries: GLM and Tree-Based Regression

Foundational Concepts and the Strategic Integration of Machine Learning into Actuarial Practice


By Wizard & Co

Artificial intelligence is not replacing actuarial science.
It is extending it.

The methodology underlying modern AI tools is remarkably similar to actuarial science. Both disciplines build statistical models to approximate empirically observed phenomena — without requiring full theoretical explanations of the underlying system.

The real strategic issue is not whether actuaries should adopt AI. It is how to integrate machine learning tools into actuarial practice while preserving:

  • Calibration
  • Governance
  • Interpretability
  • Regulatory integrity

This article outlines the foundational transition from Generalized Linear Models (GLMs) to Tree-Based Regression and Gradient Boosting Machines (GBMs), and explains the motivations for integrating AI into modern actuarial modelling.

The Forecasting Foundation: Conditional Expectation as the Core Predictor

In actuarial science, the optimal point forecast is the conditional expected value (mean).

Among all functions of the covariates, it minimizes mean squared error (MSE), and its use is motivated by the Law of Large Numbers.

All modern predictive modelling — whether GLM, tree-based, or neural network — ultimately attempts to approximate:

μ*(X) = E[Y | X]

The Conditional Mean

The question is not whether AI changes this objective. It does not. It changes the tools used to approximate it.
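A quick numerical illustration of this objective (a NumPy sketch with simulated Poisson claim counts; the binary risk factor and the rates 1 and 3 are invented for the example): the conditional mean achieves lower MSE than a distorted version of itself and than the best constant predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate claim counts whose mean depends on a binary risk factor X.
X = rng.integers(0, 2, size=100_000)
Y = rng.poisson(lam=np.where(X == 1, 3.0, 1.0))  # E[Y | X=1] = 3, E[Y | X=0] = 1

def mse(pred):
    """Mean squared error of a predictor against the observed claims."""
    return np.mean((Y - pred) ** 2)

# The conditional mean mu*(X) = E[Y | X].
mu_star = np.where(X == 1, 3.0, 1.0)

# Compare against a distorted predictor and the unconditional mean.
print(mse(mu_star), mse(1.1 * mu_star), mse(np.full(X.shape, Y.mean())))
```

Any deviation from μ\*(X), whether a systematic distortion or ignoring the covariate entirely, raises the MSE.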

Generalized Linear Models (GLMs): The Actuarial Standard

Definition

A Generalized Linear Model (GLM) is a regression model that assumes a linear structure in covariates on a transformed link scale to model the conditional mean of a response variable.

g(μ(X)) = ⟨ϑ, X⟩

Where g is the link function, ϑ are parameters, and X are covariates

Why GLMs Dominate Actuarial Practice

  • Exact coefficient interpretation
  • Clear multiplicative pricing structure (via log-link)
  • Strong calibration properties
  • Alignment with the Exponential Dispersion Family (EDF)
  • Direct mapping to strictly consistent loss functions

The Balance Property: GLMs with canonical links satisfy Σμ̂(Xᵢ) = ΣYᵢ. This ensures aggregate predictions equal observed totals — essential in pricing and reserving.
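To make the balance property concrete, here is a self-contained sketch (pure NumPy; the portfolio, coefficients, and IRLS fitting loop are invented for illustration): after fitting a canonical log-link Poisson GLM, aggregate fitted values match aggregate observed claims by construction of the score equations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy portfolio: intercept plus two rating factors (all values invented).
n = 5_000
X = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.normal(size=n)])
true_beta = np.array([0.2, 0.5, 0.3])
y = rng.poisson(np.exp(X @ true_beta))

# Fit a Poisson GLM with canonical log link by IRLS (Fisher scoring).
beta = np.zeros(3)
for _ in range(25):
    mu = np.exp(X @ beta)
    W = mu                              # Poisson working weights
    z = X @ beta + (y - mu) / mu        # working response
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

mu_hat = np.exp(X @ beta)
# With an intercept and canonical link, the score equations force
# sum(mu_hat) == sum(y): the balance property.
print(mu_hat.sum(), y.sum())
```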

Claim Distributions in Actuarial Modelling

Claim Count Distributions

  • Binomial
  • Poisson
  • Negative Binomial

Claim Severity Distributions

  • Gamma
  • Log-normal
  • Inverse Gaussian

These belong to the Exponential Dispersion Family (EDF) — where the cumulant function directly aligns with a strictly consistent loss function for fitting.

The Limitation of Classical GLMs

GLMs impose structural assumptions:

  • Additive or multiplicative effects
  • Manual interaction specification
  • Predefined link structure
  • Explicit feature engineering

As portfolios become higher-dimensional, interaction-heavy, and data-rich, this structure becomes limiting. Tree-based methods address this.

Tree-Based Regression: Recursive Partitioning of Risk

A Regression Tree is a non-parametric model that partitions covariate space into homogeneous subsets to estimate conditional means.

Instead of assuming linearity, tree models:

  • Identify heterogeneous regions of the covariate space
  • Partition them with standardized binary splits
  • Optimize a strictly consistent loss function at each split
  • Repeat recursively

This naturally captures:

  • Non-linear effects
  • Threshold behavior
  • High-order interactions

But single trees are unstable — which leads to ensemble methods.

Ensemble Methods in Actuarial AI

1 Bagging (Bootstrap Aggregating)

Reduces variance by averaging many trees, each fitted on a randomized bootstrap sample of the data.

2 Random Forest

Decorrelates trees via random feature selection, improving stability and reducing overfitting.

3 Gradient Boosting Machines (GBM)

GBMs iteratively add base learners to approximate the negative gradient of a loss function.

Why GBMs are powerful for actuarial tabular data:

  • Stage-wise bias correction
  • Strong interaction capture
  • High predictive accuracy
  • Efficient implementations (e.g., LightGBM)

GBMs are especially effective for pricing and segmentation tasks.
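The stage-wise mechanism can be sketched from scratch (pure NumPy; the surface, stump learner, shrinkage of 0.1, and 200 rounds are all invented for illustration): each weak learner is fitted to the current residuals, which for squared-error loss are exactly the negative gradient.

```python
import numpy as np

rng = np.random.default_rng(3)

# A non-linear surface with an interaction between two factors.
X = rng.uniform(-1, 1, size=(3_000, 2))
y = np.sin(3 * X[:, 0]) + X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=3_000)

def fit_stump(X, r):
    """Fit a one-split tree (stump) to the residuals r under squared-error loss."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for c in np.quantile(X[:, j], np.linspace(0.05, 0.95, 19)):
            left = X[:, j] <= c
            fit = np.where(left, r[left].mean(), r[~left].mean())
            err = ((r - fit) ** 2).sum()
            if err < best_err:
                best_err, best = err, (j, c, r[left].mean(), r[~left].mean())
    return best

def predict_stump(stump, X):
    j, c, left_val, right_val = stump
    return np.where(X[:, j] <= c, left_val, right_val)

# Stage-wise boosting: each stump approximates the negative gradient of the
# squared-error loss (the current residuals), damped by shrinkage.
pred = np.full(len(y), y.mean())
for _ in range(200):
    residuals = y - pred
    pred += 0.1 * predict_stump(fit_stump(X, residuals), X)

baseline_mse = ((y - y.mean()) ** 2).mean()
boosted_mse = ((y - pred) ** 2).mean()
print(boosted_mse, "vs baseline", baseline_mse)
```

Production implementations such as LightGBM use deeper trees, histogram-based split finding, and regularization, but the additive gradient-descent structure is the same.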

The Modeler's Dilemma: Interpretability vs Predictive Power

This is central to actuarial AI integration.

| Feature | GLM | Gradient Boosting |
| --- | --- | --- |
| Interpretability | Exact coefficients | Requires SHAP / PDP |
| Interactions | Manual | Automatic |
| Calibration | Naturally balanced | Requires correction |
| Governance | High transparency | Needs explainability layer |
| Predictive accuracy | Moderate | High |

Actuaries operate in regulated environments. Predictive accuracy alone is insufficient.

Model Validation: The Workhorse of Actuarial AI

Out-of-sample loss (generalization loss)

The primary validation tool. Model selection must be performed using independent data to avoid in-sample bias.

Validation Methods

Hold-out Sample

Computationally efficient. Splits data into training and validation sets.

K-fold Cross-validation

Data-efficient and uncertainty-aware. Preferable for smaller datasets.
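The k-fold procedure is a short loop (a NumPy sketch with a simulated linear portfolio; the coefficients and noise level are invented): every observation serves for validation exactly once, and the spread across folds gives a rough uncertainty measure for the out-of-sample loss.

```python
import numpy as np

rng = np.random.default_rng(4)

n, k = 1_000, 5
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.3, size=n)

# Shuffle indices once, then split into k disjoint validation folds.
folds = np.array_split(rng.permutation(n), k)
losses = []
for i in range(k):
    val = folds[i]
    train = np.concatenate([folds[j] for j in range(k) if j != i])
    # Fit on the training folds only (least squares here as a stand-in model).
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    # Evaluate on the held-out fold: a genuine out-of-sample loss.
    losses.append(np.mean((y[val] - X[val] @ beta) ** 2))

print("CV loss estimate:", np.mean(losses), "+/-", np.std(losses))
```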

Regularization and Overfitting Control

High-capacity models can adapt to noise. Prevention techniques include:

  • Early stopping - halts training once validation loss stops improving
  • LASSO (L1) - automatic variable selection
  • Ridge (L2) - shrinks parameters toward zero
  • Shrinkage in boosting - damps each stage's contribution to the ensemble

Regularization prevents unstable pricing structures.

Governance, Calibration, and Balance Correction

Unlike canonical GLMs, neural networks and tree-based models generally do not satisfy the balance property.

Therefore, actuarial AI systems require:

  • Secondary balance correction
  • Isotonic recalibration
  • Auto-calibration verification

A pricing scheme must be globally unbiased:

E[v·μ(X)] = E[vY], where v denotes the exposure

And locally self-financing.
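A recalibration step can be sketched as follows (assuming scikit-learn is available; the miscalibrated score, a monotone distortion of the true mean, is invented for illustration): isotonic regression learns a monotone map from model score to outcome, and a final multiplicative factor enforces the balance of totals.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(5)

# True risk premiums and a model score that ranks risks correctly
# but is miscalibrated in level (a monotone distortion).
mu_true = rng.uniform(0.5, 3.0, size=20_000)
y = rng.poisson(mu_true)
score = mu_true ** 1.3

# Isotonic recalibration: learn a monotone map from score to outcome.
iso = IsotonicRegression(out_of_bounds="clip")
mu_cal = iso.fit_transform(score, y)

# Secondary multiplicative balance correction: sum of predictions must
# equal sum of observed claims. (On training data, isotonic regression
# already preserves the total, so this factor is close to 1; the step
# matters when recalibrating on new data.)
mu_cal *= y.sum() / mu_cal.sum()

print(abs(mu_cal.sum() - y.sum()))
```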

Explainable AI in Actuarial Practice

Interpretability tools restore governance integrity.

Partial Dependence Plots (PDP)

Visualize average prediction response to one covariate.
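A PDP needs nothing more than prediction averaging (a NumPy sketch with a hypothetical black-box model; the quadratic-plus-interaction form is invented): sweep one feature along a grid while leaving the others at their observed values, and average the predictions at each grid point.

```python
import numpy as np

rng = np.random.default_rng(6)

# A hypothetical black-box model with an interaction: f(x) = x0^2 + x0 * x1.
def model(X):
    return X[:, 0] ** 2 + X[:, 0] * X[:, 1]

X = rng.normal(size=(5_000, 2))

# Partial dependence of feature 0: force it to each grid value and
# average the model's predictions over the rest of the data.
grid = np.linspace(-2, 2, 9)
pdp = []
for v in grid:
    X_mod = X.copy()
    X_mod[:, 0] = v
    pdp.append(model(X_mod).mean())
pdp = np.array(pdp)

# Since E[x1] = 0 here, the interaction averages out and the PDP
# recovers the marginal effect v^2.
print(np.round(pdp, 2))
```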

SHAP (Shapley Additive Explanations)

Decomposes individual predictions into additive contributions using game-theoretic fairness axioms.

Explainability is not optional. It is required for:

  • Regulatory reporting
  • Board oversight
  • Anti-discrimination assurance
  • Model risk management

Foundational Motivations for Integrating AI into Actuarial Practice

1. Capturing Complex Interactions

High-dimensional portfolios contain interaction effects that are impractical to specify manually in a GLM.

2. Improving Pure Risk Premium Estimation

Boosting directly optimizes predictive loss functions.

3. Leveraging Unstructured Data

LLMs can extract structured features from claims reports and accident descriptions.

4. Enhancing Fraud Detection

Unsupervised anomaly detection identifies distributional deviations.

5. Improving Segmentation Stability

Ensemble methods reduce variance and improve generalization.

A Structured Integration Framework for Wizard & Co

Wizard & Co advocates a layered approach:

1. Maintain GLM Baseline

Preserve the transparent actuarial structure.

2. Apply ML Residual Modelling

Boost GLM residuals (CANN-style integration).

3. Enforce Calibration

Apply isotonic recalibration to restore balance.

4. Add Explainability Layer

Deploy SHAP and PDP for governance transparency.

This preserves actuarial discipline while unlocking predictive gains.
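The first three layers can be sketched end to end (a deliberately tiny NumPy example; the region and age factors, effect sizes, and the single-split learner standing in for a boosted model are all invented): a transparent baseline, a multiplicative ML correction fitted to what the baseline misses, and a final rebalancing.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical portfolio: a region factor the baseline captures and an
# age threshold it misses. All names and effect sizes are invented.
n = 10_000
region = rng.integers(0, 3, n)
age = rng.uniform(18, 80, n)
y = rng.poisson(np.exp(0.1 * region + np.where(age < 25, 0.8, 0.0)))

# Layer 1: transparent baseline - per-region mean frequencies (what a
# one-factor log-link GLM with region dummies would produce).
baseline = np.array([y[region == r].mean() for r in range(3)])[region]

# Layer 2: ML residual modelling (CANN-style) - here a single-split
# learner corrects the baseline multiplicatively where it is biased.
young = age < 25
correction = np.where(young,
                      y[young].sum() / baseline[young].sum(),
                      y[~young].sum() / baseline[~young].sum())
pred = baseline * correction

# Layer 3: enforce calibration - rescale so aggregate predictions
# balance against aggregate observed claims.
pred *= y.sum() / pred.sum()

print(pred.sum(), y.sum())
```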


The Strategic Reality

AI is not a replacement for actuarial judgment.

It is an expansion of modelling capacity.

The objective remains unchanged:

Produce statistically sound, calibrated, risk-adjusted predictions under uncertainty.

The profession is not moving away from discipline.
It is deepening it.

Further Reading (Selected Sources)

The concepts in this article draw on widely cited actuarial and machine learning references. If you'd like to go deeper:

Wüthrich, M. V., Richman, R., et al. (2026). AI Tools for Actuaries: Course Material. SSRN.

https://ssrn.com/abstract=5162304

Akaike, H. (1974). A New Look at Statistical Model Identification. IEEE Transactions on Automatic Control.

https://doi.org/10.1109/TAC.1974.1100705

Breiman, L., Friedman, J., Olshen, R., Stone, C. (1984). Classification and Regression Trees.

https://doi.org/10.1201/9781315139470

Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics.

https://www.jstor.org/stable/2699986

Goldburd, M., Khare, A., Tevet, D. (2020). Generalized Linear Models for Insurance Rating. CAS Monograph.

https://www.casact.org/sites/default/files/2021-01/05-Goldburd-Khare-Tevet.pdf

Ke, G., et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. NeurIPS.

https://proceedings.neurips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf

Lundberg, S. M., Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions (SHAP). NeurIPS.

https://proceedings.neurips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf

Ready to Integrate AI into Your Actuarial Practice?

Our team of experts can help you navigate the transition from GLMs to machine learning while maintaining actuarial discipline.