DevelopmentML

class chainladder.DevelopmentML(estimator_ml=None, y_ml=None, autoregressive=False, weighted_step=None, drop=None, drop_valuation=None, fit_incrementals=True)

Interface to scikit-learn estimators for loss development patterns.

DevelopmentML lets reserving workflows use any sklearn-compatible regressor (often inside a Pipeline). It converts a Triangle to a tabular design matrix, fits the ML model, predicts through the terminal development age to complete the lower triangle, and expresses the result as ldf_ for tails and IBNR methods. TweedieGLM is a special case with TweedieRegressor as the only ML step.
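
As a rough illustration of the "Triangle to tabular design matrix" step, the sketch below flattens a small cumulative triangle into one row per observed (origin, development) cell. The data, the `triangle` dict, and the helper `to_long_rows` are hypothetical and written in plain Python; they are assumptions for illustration, not chainladder's internal implementation.

```python
# Hypothetical cumulative triangle: origin year -> cumulative losses by age.
# None marks cells in the unobserved lower triangle.
triangle = {
    2021: [100.0, 150.0, 165.0],
    2022: [110.0, 168.0, None],
    2023: [120.0, None, None],
}
ages = [12, 24, 36]  # development ages in months (assumed for this sketch)


def to_long_rows(tri, dev_ages):
    """Return one dict per observed cell, the tabular shape a regressor expects."""
    rows = []
    for origin, values in tri.items():
        for age, value in zip(dev_ages, values):
            if value is not None:  # only observed cells are used for fitting
                rows.append({"origin": origin, "development": age, "loss": value})
    return rows


rows = to_long_rows(triangle, ages)
for row in rows:
    print(row)
```

In this shape, origin and development become ordinary feature columns, which is why they can be routed through encoders or passed through unchanged in a pipeline.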

Added in version 0.8.1.

Parameters:
estimator_ml: sklearn Estimator

Any sklearn-compatible regression estimator, including Pipelines.

y_ml: list or str or sklearn_transformer

The response column(s) for the machine learning algorithm. Each must be present within the Triangle.

autoregressive: list of tuple

Each tuple is (feature_name, lag, source_column). feature_name must also appear in the pipeline design matrix. DevelopmentML fills that column with lagged source_column values and, when projecting forward, replaces it with the prior development period’s prediction. Lags should be negative integers (for example -12 on a monthly triangle is one year).

weighted_step: str

Name of the step within estimator_ml that is weighted.

drop: tuple or list of tuples

Drops specific origin/development combination(s)

drop_valuation: str or list of str (default = None)

Drops specific valuation periods. Each str must be convertible to a date.

fit_incrementals: bool (default = True)

Whether the response variable should be converted to an incremental basis for fitting.
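
The roll-forward behaviour described for autoregressive can be pictured with a small plain-Python sketch. It assumes a hypothetical one-step model, `predict_next`, that maps the lagged value to the next incremental amount; the lag feature comes from observed data where it exists and from the model's own prior prediction once the projection moves past the data. None of these names come from chainladder.

```python
def predict_next(lagged_value):
    """Hypothetical fitted model: the next incremental is half the prior one."""
    return 0.5 * lagged_value


def project_row(observed, n_periods):
    """Extend one origin's incrementals to n_periods development periods.

    The lag feature for period k is the value at period k - 1: an observed
    value while data exists, otherwise the previous prediction.
    """
    values = list(observed)
    while len(values) < n_periods:
        lag_feature = values[-1]  # prior period, observed or predicted
        values.append(predict_next(lag_feature))
    return values


projected = project_row([100.0, 50.0], n_periods=4)
print(projected)
```

Because each projected period feeds the next one, prediction has to proceed one development period at a time rather than in a single batch.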

Attributes:
estimator_ml: Estimator

An sklearn-style estimator to predict development patterns

ldf_: Triangle

The estimated loss development patterns.

cdf_: Triangle

The estimated cumulative development patterns.
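
Assuming the usual actuarial convention that a cumulative development factor is the product of the age-to-age factors from a given age through the last one, the relationship between ldf_ and cdf_ can be sketched in plain Python. The factors below are illustrative and the helper `to_cdf` is hypothetical, not a chainladder function.

```python
def to_cdf(ldfs):
    """Cumulate age-to-age factors from the oldest age back to the youngest."""
    cdfs = []
    running = 1.0
    for factor in reversed(ldfs):
        running *= factor
        cdfs.append(running)
    return list(reversed(cdfs))


ldfs = [2.0, 1.5, 1.1]  # illustrative age-to-age factors
cdfs = to_cdf(ldfs)
print(cdfs)
```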

Examples

On multi-LOB triangles such as clrd, an actuary may want a single model fit to all lines at once while letting each line keep its own development pattern. This is the same business problem as the multi-LOB example in TweedieGLM; DevelopmentML generalizes it to any sklearn estimator. Features from any triangle axis can enter an sklearn ColumnTransformer or Pipeline: here, one-hot-encode LOB and development, pass origin through, and fit the same RandomForestRegressor the user guide uses. random_state pins the forest so the fitted pattern is reproducible.

import chainladder as cl
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

clrd = cl.load_sample("clrd").groupby("LOB").sum()["CumPaidLoss"]
design_matrix = ColumnTransformer(
    transformers=[
        ("dummy", OneHotEncoder(drop="first"), ["LOB", "development"]),
        ("passthrough", "passthrough", ["origin"]),
    ]
)
estimator_ml = Pipeline(
    steps=[
        ("design_matrix", design_matrix),
        ("model", RandomForestRegressor(random_state=42)),
    ]
)
m = cl.DevelopmentML(estimator_ml=estimator_ml, y_ml="CumPaidLoss").fit(
    clrd
)
print(m.ldf_.shape)
print(np.round(m.ldf_.values[0, 0, 0, :], 4))
(6, 1, 10, 9)
[2.5988 1.412  1.2037 1.11   1.0701 1.0434 1.0272 1.0366 1.0714]

An actuary who wants to test whether fitting on incremental versus cumulative losses changes the implied development can toggle fit_incrementals and compare the full ldf_ patterns. A log-link Poisson GLM (the same specification TweedieGLM uses) keeps the fitted values positive, so the implied factors stay above 1.0 on either basis. The two bases agree closely here, diverging modestly at the oldest ages.

import chainladder as cl
import numpy as np
from sklearn.linear_model import TweedieRegressor
from sklearn.pipeline import Pipeline

from chainladder.utils.utility_functions import PatsyFormula

tri = cl.load_sample("ukmotor")
pipe = Pipeline(
    steps=[
        ("design_matrix", PatsyFormula("C(development) + C(origin)")),
        ("model", TweedieRegressor(
            power=1, link="log", fit_intercept=False
        )),
    ]
)
m_incr = cl.DevelopmentML(
    pipe, y_ml=[tri.columns[0]], fit_incrementals=True
).fit(tri)
m_cum = cl.DevelopmentML(
    pipe, y_ml=[tri.columns[0]], fit_incrementals=False
).fit(tri)
print(np.round(m_incr.ldf_.values[0, 0, 0, :], 4))
print(np.round(m_cum.ldf_.values[0, 0, 0, :], 4))
[1.904  1.2858 1.1493 1.0989 1.0537 1.0333]
[1.892  1.2822 1.1453 1.1033 1.0547 1.0457]
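
To make the two bases concrete, the following plain-Python sketch converts a completed row of incremental losses to cumulative losses and reads off the implied age-to-age factors. The numbers and helper names are hypothetical; the sketch only illustrates the arithmetic and is not meant to reproduce how chainladder derives ldf_ internally.

```python
from itertools import accumulate


def to_cumulative(incrementals):
    """Running total of incremental losses."""
    return list(accumulate(incrementals))


def to_incremental(cumulatives):
    """First value, then period-over-period differences."""
    return [cumulatives[0]] + [
        later - earlier for earlier, later in zip(cumulatives, cumulatives[1:])
    ]


def age_to_age(cumulatives):
    """Ratio of each cumulative value to the one before it."""
    return [later / earlier for earlier, later in zip(cumulatives, cumulatives[1:])]


incr = [100.0, 50.0, 25.0, 5.0]  # illustrative completed row, incremental basis
cum = to_cumulative(incr)
factors = age_to_age(cum)
print(cum)
print([round(f, 4) for f in factors])
```

When every fitted incremental is positive, each cumulative value exceeds the previous one, so every implied factor is above 1.0.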

fit(X, y=None, sample_weight=None)

Fit the model with X.

Parameters:
X: Triangle-like

Triangle to which the estimator will be fit.

y: None

Ignored; use y_ml to set a response variable for the ML algorithm.

sample_weight: Triangle-like

Weights to use in the regression

Returns:
self: object

Returns the instance itself.

transform(X)

If X and self are of different shapes, align self to X, else return self.

Parameters:
X: Triangle

The triangle to be transformed

Returns:
X_new: New triangle with transformed attributes.

Inherited Methods

DevelopmentML.fit_transform

Fit to data, then transform it.

DevelopmentML.get_metadata_routing

Get metadata routing of this object.

DevelopmentML.get_params

Get parameters for this estimator.

DevelopmentML.pipe

Apply func(self, *args, **kwargs).

DevelopmentML.set_backend

Converts triangle array_backend.

DevelopmentML.set_output

Set output container.

DevelopmentML.set_params

Set the parameters of this estimator.

DevelopmentML.to_json

Serializes triangle object to json format

DevelopmentML.to_pickle

Serializes triangle object to pickle.