DevelopmentML

class chainladder.DevelopmentML(estimator_ml=None, y_ml=None, autoregressive=False, weighted_step=None, drop=None, drop_valuation=None, fit_incrementals=True)

Interface to scikit-learn estimators for loss development patterns.

DevelopmentML lets reserving workflows use any sklearn-compatible regressor (often wrapped in a Pipeline). It converts a Triangle into a tabular design matrix, fits the ML model, predicts through the terminal development age to complete the lower triangle, and exposes the result as ldf_ so that downstream tail and IBNR methods can consume it. TweedieGLM is a special case in which TweedieRegressor is the only ML step.
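The completion step can be pictured with plain NumPy. This is a minimal sketch, not the library's implementation: it assumes observed cells are kept, only the future (NaN) incremental cells are filled from model predictions, and the development pattern is read off as age-to-age ratios of the completed cumulative triangle. The arrays are made-up illustration data and `complete_and_link` is a hypothetical helper.

```python
import numpy as np

# Made-up observed incremental triangle (rows: origin, columns: development);
# NaN marks the unobserved lower triangle.
observed_incr = np.array([
    [100.0, 50.0, 20.0],
    [110.0, 55.0, np.nan],
    [120.0, np.nan, np.nan],
])
# Made-up predictions for every cell, as a fitted regressor might return them.
predicted_incr = np.array([
    [98.0, 51.0, 21.0],
    [112.0, 54.0, 22.0],
    [119.0, 60.0, 24.0],
])


def complete_and_link(observed, predicted):
    """Hypothetical helper: fill future cells with predictions, cumulate, return link ratios."""
    completed = np.where(np.isnan(observed), predicted, observed)  # keep observed cells
    cumulative = np.cumsum(completed, axis=1)  # incremental -> cumulative along development
    return cumulative[:, 1:] / cumulative[:, :-1]  # age-to-age factors per origin


ldf = complete_and_link(observed_incr, predicted_incr)
print(np.round(ldf, 4))
```

Because the made-up predictions continue the observed 100:50:20 shape, every origin ends up with the same factors: 1.5, then 17/15 (about 1.1333).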

Added in version 0.8.1.

Parameters:

estimator_ml: sklearn Estimator: Any sklearn-compatible regression estimator, including Pipelines.
y_ml: list or str or sklearn_transformer: The response column(s) for the machine learning algorithm. It must be present within the Triangle.
autoregressive: list of tuples (default = False): Each tuple is (feature_name, lag, source_column). feature_name must also appear in the pipeline design matrix. DevelopmentML fills that column with lagged source_column values and, when projecting forward, replaces it with the prior development period’s prediction. Lags should be negative integers (for example, -12 on a monthly triangle is one year).
weighted_step: str: Name of the step within estimator_ml that is weighted (that is, the step that receives the sample weights).
drop: tuple or list of tuples: Drops specific origin/development combination(s)
drop_valuation: str or list of str (default = None): Drops specific valuation periods. str must be date convertible.
fit_incrementals: bool (default = True): Whether the response variable is converted to an incremental basis before fitting.
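The recursion behind autoregressive can be sketched in a few lines. This is a hypothetical illustration, not DevelopmentML's code: `project_autoregressive` and the halving model are made up, and the lag here counts development periods (list positions), whereas DevelopmentML states lags on the triangle's own grain. The sketch keeps the negative-lag convention: the feature for the next period is the value `-lag` periods earlier, which becomes the model's own prediction once the projection moves past the observed data.

```python
import numpy as np


def project_autoregressive(history, n_future, lag, predict):
    """Hypothetical helper: extend `history` by `n_future` periods.

    `lag` is negative; the feature for each new period is the value `-lag`
    periods earlier, whether observed or previously predicted.
    """
    if lag >= 0 or -lag > len(history):
        raise ValueError("lag must be negative and within the observed history")
    values = list(history)
    for _ in range(n_future):
        lagged_feature = values[lag]  # observed value or an earlier prediction
        values.append(predict(lagged_feature))
    return np.array(values)


def halving(x):
    """Made-up model: the next value is half of the lagged value."""
    return 0.5 * x


one_back = project_autoregressive([100.0, 40.0], n_future=3, lag=-1, predict=halving)
two_back = project_autoregressive([100.0, 40.0], n_future=3, lag=-2, predict=halving)
print(one_back)
print(two_back)
```

With lag=-1 the projection runs 100, 40, 20, 10, 5; with lag=-2 each new value halves the one two periods back, giving 100, 40, 50, 20, 25. Either way, every projected value after the first feeds a later prediction, which is the feedback the parameter description refers to.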

Attributes:

estimator_ml: Estimator: An sklearn-style estimator to predict development patterns
ldf_: Triangle: The estimated loss development patterns.
cdf_: Triangle: The estimated cumulative development patterns.
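In chain-ladder terms, the cumulative factor at an age is the product of that age's age-to-age factor and every later one. A minimal NumPy sketch with made-up factors; whether chainladder computes cdf_ exactly this way internally is an assumption here.

```python
import numpy as np

ldf = np.array([1.5, 1.2, 1.1])  # made-up age-to-age factors, youngest age first

# Reverse, take the running product, then reverse back so that each age's
# cumulative factor is its own LDF times all later LDFs.
cdf = np.cumprod(ldf[::-1])[::-1]
print(cdf)
```

The first entry is 1.5 × 1.2 × 1.1 = 1.98, the second 1.2 × 1.1 = 1.32, and the last factor carries over unchanged.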

Examples

On multi-LOB triangles such as clrd, an actuary may want a single model fit to all lines at once while letting each line keep its own development pattern. This is the same business problem as the multi-LOB example in TweedieGLM; DevelopmentML generalizes it to any sklearn estimator. Features from any triangle axis can enter an sklearn ColumnTransformer or Pipeline: here, one-hot-encode LOB and development, pass origin through, and fit the same RandomForestRegressor the user guide uses. random_state pins the forest so the fitted pattern is reproducible.

import chainladder as cl
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

clrd = cl.load_sample("clrd").groupby("LOB").sum()["CumPaidLoss"]
design_matrix = ColumnTransformer(
    transformers=[
        ("dummy", OneHotEncoder(drop="first"), ["LOB", "development"]),
        ("passthrough", "passthrough", ["origin"]),
    ]
)
estimator_ml = Pipeline(
    steps=[
        ("design_matrix", design_matrix),
        ("model", RandomForestRegressor(random_state=42)),
    ]
)
m = cl.DevelopmentML(estimator_ml=estimator_ml, y_ml="CumPaidLoss").fit(
    clrd
)
print(m.ldf_.shape)
print(np.round(m.ldf_.values[0, 0, 0, :], 4))

(6, 1, 10, 9)
[2.5988 1.412  1.2037 1.11   1.0701 1.0434 1.0272 1.0366 1.0714]
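The ColumnTransformer above selects LOB, development, and origin by name, which works because DevelopmentML passes it a table with one row per triangle cell. The sketch below builds a comparable long table from a small made-up wide triangle with pandas. The column names mirror the example, but the exact frame DevelopmentML constructs (dtypes, extra columns, handling of future cells) is an assumption here.

```python
import numpy as np
import pandas as pd

# Made-up cumulative paid triangle for one LOB: rows are origin years,
# columns are development ages in months; NaN marks unobserved cells.
wide = pd.DataFrame(
    [[100.0, 150.0, 170.0], [110.0, 165.0, np.nan], [120.0, np.nan, np.nan]],
    index=pd.Index([2019, 2020, 2021], name="origin"),
    columns=[12, 24, 36],
)

long = (
    wide.reset_index()  # move origin into a regular column
    .melt(id_vars="origin", var_name="development", value_name="CumPaidLoss")
    .dropna(subset=["CumPaidLoss"])  # keep only observed cells for fitting
    .assign(LOB="comauto")  # made-up line-of-business label
    .reset_index(drop=True)
)
print(long)
```

Each row is one observed cell, so a one-hot encoder can expand LOB and development into dummy columns while origin passes through as a number, as in the pipeline above.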

An actuary who wants to test whether fitting on incremental versus cumulative losses changes the implied development can toggle fit_incrementals and compare the full ldf_ patterns. A log-link Poisson GLM (the same specification TweedieGLM uses) keeps the fitted values positive. On the incremental basis, positive predicted increments mean projected cumulative losses can only grow, so the projected factors exceed 1.0; here the cumulative fit also lands above 1.0. The two bases agree closely, diverging modestly at the oldest ages.

import chainladder as cl
import numpy as np
from sklearn.linear_model import TweedieRegressor
from sklearn.pipeline import Pipeline

from chainladder.utils.utility_functions import PatsyFormula

tri = cl.load_sample("ukmotor")
pipe = Pipeline(
    steps=[
        ("design_matrix", PatsyFormula("C(development) + C(origin)")),
        ("model", TweedieRegressor(
            power=1, link="log", fit_intercept=False
        )),
    ]
)
m_incr = cl.DevelopmentML(
    pipe, y_ml=[tri.columns[0]], fit_incrementals=True
).fit(tri)
m_cum = cl.DevelopmentML(
    pipe, y_ml=[tri.columns[0]], fit_incrementals=False
).fit(tri)
print(np.round(m_incr.ldf_.values[0, 0, 0, :], 4))
print(np.round(m_cum.ldf_.values[0, 0, 0, :], 4))

[1.904  1.2858 1.1493 1.0989 1.0537 1.0333]
[1.892  1.2822 1.1453 1.1033 1.0547 1.0457]
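The positivity argument for the incremental basis can be checked with plain arithmetic. The inverse of the log link is exp(η), which is positive for any linear predictor η. Positive predicted increments therefore make the projected cumulative path strictly increasing, so every projected age-to-age ratio exceeds 1. All numbers below are made up; this illustrates the arithmetic, not DevelopmentML internals.

```python
import numpy as np

# Made-up linear predictors (eta = X @ beta) for one origin's future periods;
# negative values are common on a log scale.
eta = np.array([3.2, 1.1, -0.4, -1.5])
mu = np.exp(eta)  # log-link inverse: predicted incrementals, always > 0

latest_cumulative = 500.0  # made-up latest observed cumulative loss for this origin
path = np.concatenate(([latest_cumulative], latest_cumulative + np.cumsum(mu)))
factors = path[1:] / path[:-1]  # projected age-to-age factors
print(np.round(factors, 4))
```

On the cumulative basis the same link only guarantees positive cumulative values, not increasing ones, so factors above 1.0 there are an empirical outcome rather than a guarantee.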

fit(X, y=None, sample_weight=None)

Fit the model with X.

Parameters:

X: Triangle-like: Triangle to which the estimator will be applied.
y: None: Ignored; use y_ml to set a response variable for the ML algorithm.
sample_weight: Triangle-like: Weights to use in the regression.

Returns:

self: object: Returns the instance itself.

transform(X)

If X and self are of different shapes, align self to X, else return self.

Parameters:

X: Triangle: The triangle to be transformed.

Returns:

X_new: Triangle: New triangle with transformed attributes.

Inherited Methods

`DevelopmentML.fit_transform`	Fit to data, then transform it.
`DevelopmentML.get_metadata_routing`	Get metadata routing of this object.
`DevelopmentML.get_params`	Get parameters for this estimator.
`DevelopmentML.pipe`	Apply `func(self, *args, **kwargs)`.
`DevelopmentML.set_backend`	Converts triangle array_backend.
`DevelopmentML.set_output`	Set output container.
`DevelopmentML.set_params`	Set the parameters of this estimator.
`DevelopmentML.to_json`	Serializes triangle object to json format
`DevelopmentML.to_pickle`	Serializes triangle object to pickle.
