DevelopmentML
- class chainladder.DevelopmentML(estimator_ml=None, y_ml=None, autoregressive=False, weighted_step=None, drop=None, drop_valuation=None, fit_incrementals=True)
Interface to scikit-learn estimators for loss development patterns.
DevelopmentML lets reserving workflows use any sklearn-compatible regressor (often inside a Pipeline). It converts a Triangle to a tabular design matrix, fits the ML model, predicts through the terminal development age to complete the lower triangle, and expresses the result as ldf_ for tails and IBNR methods. TweedieGLM is a special case with TweedieRegressor as the only ML step.
Added in version 0.8.1.
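The triangle-to-table step described above can be pictured with a minimal, library-free sketch. The helper name triangle_to_rows and the toy values are hypothetical, and this is not how chainladder implements the conversion internally; it only illustrates the long-format shape that a regressor expects, with one row per observed origin/development cell.

```python
from typing import List, Optional, Tuple


def triangle_to_rows(
    origins: List[int],
    ages: List[int],
    values: List[List[Optional[float]]],
) -> List[Tuple[int, int, float]]:
    """Flatten a triangle into (origin, development, value) rows.

    Hypothetical helper: unobserved cells (None) in the lower triangle
    are skipped, so only known values become training rows.
    """
    rows = []
    for origin, row in zip(origins, values):
        for age, value in zip(ages, row):
            if value is not None:
                rows.append((origin, age, value))
    return rows


# Toy 3x3 cumulative triangle; None marks the not-yet-observed lower triangle
origins = [2020, 2021, 2022]
ages = [12, 24, 36]
cumulative = [
    [100.0, 150.0, 165.0],
    [110.0, 168.0, None],
    [120.0, None, None],
]

rows = triangle_to_rows(origins, ages, cumulative)
for row in rows:
    print(row)
```

Each tuple corresponds to one row of the design matrix, with origin and development available as features and the value as the response.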
- Parameters:
- estimator_ml: sklearn Estimator
Any sklearn compatible regression estimator, including Pipelines and
- y_ml: list or str or sklearn_transformer
The response column(s) for the machine learning algorithm. It must be present within the Triangle.
- autoregressive: list of tuple
Each tuple is (feature_name, lag, source_column). feature_name must also appear in the pipeline design matrix. DevelopmentML fills that column with lagged source_column values and, when projecting forward, replaces it with the prior development period’s prediction. Lags should be negative integers (for example, -12 on a monthly triangle is a one-year lag).
- weighted_step: str
Step name within estimator_ml that is weighted
- drop: tuple or list of tuples
Drops specific origin/development combination(s)
- drop_valuation: str or list of str (default = None)
Drops specific valuation periods. str must be date convertible.
- fit_incrementals: bool (default = True)
Whether the response variable should be converted to an incremental basis for fitting.
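The autoregressive behaviour described in the parameter list (a lagged feature column that is refilled with the previous prediction while projecting forward) can be sketched without the library. Everything below is an assumption made for illustration: the helpers lagged and project are hypothetical, the lag is counted in positions rather than months, and the decay function stands in for a fitted model.

```python
def lagged(values, periods=1):
    """Shift a per-origin series so each age sees the value `periods` ages earlier."""
    return [None] * periods + list(values[:-periods])


def project(observed, n_future, predict):
    """Roll forward: each new prediction becomes the lag feature for the next age."""
    history = list(observed)
    for _ in range(n_future):
        history.append(predict(history[-1]))
    return history


# Observed incremental losses for a single origin period (toy values)
incrementals = [100.0, 50.0, 25.0]

# Lag feature used while fitting: the first age has no prior value
lag_feature = lagged(incrementals)


def decay(prior):
    """Hypothetical stand-in for a fitted model: half of the prior period."""
    return 0.5 * prior


completed = project(incrementals, n_future=2, predict=decay)
print(lag_feature)
print(completed)
```

The point of the sketch is the feedback loop in project: beyond the latest diagonal there are no observed values to lag, so the prior prediction takes their place.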
- Attributes:
- estimator_ml: Estimator
An sklearn-style estimator to predict development patterns
- ldf_: Triangle
The estimated loss development patterns.
- cdf_: Triangle
The estimated cumulative development patterns.
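As a rough illustration of how the two attributes relate, cumulative development factors are conventionally the running product of the age-to-age factors from each age out to the oldest one. The sketch below uses a hypothetical helper, ldf_to_cdf, and toy factors, and it assumes no tail factor beyond the last age; it is not the library's own computation.

```python
def ldf_to_cdf(ldf):
    """Accumulate age-to-age factors from the oldest age backwards.

    Assumes a tail factor of 1.0 after the last development age.
    """
    cdf = []
    running = 1.0
    for factor in reversed(ldf):
        running *= factor
        cdf.append(running)
    return cdf[::-1]


ldf = [2.0, 1.5, 1.1]  # toy age-to-age factors
cdf = ldf_to_cdf(ldf)
print([round(factor, 4) for factor in cdf])
```

The first entry is the factor that takes the youngest age to ultimate, and the last entry equals the final age-to-age factor.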
Examples
On multi-LOB triangles such as clrd, an actuary may want a single model fit to all lines at once while letting each line keep its own development pattern. This is the same business problem as the multi-LOB example in TweedieGLM; DevelopmentML generalizes it to any sklearn estimator. Features from any triangle axis can enter an sklearn ColumnTransformer or Pipeline: here, one-hot-encode LOB and development, pass origin through, and fit the same RandomForestRegressor the user guide uses. random_state pins the forest so the fitted pattern is reproducible.

```python
import numpy as np
import chainladder as cl
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Aggregate the sample data to one paid-loss triangle per line of business
clrd = cl.load_sample("clrd").groupby("LOB").sum()["CumPaidLoss"]

# One-hot-encode LOB and development; pass origin through unchanged
design_matrix = ColumnTransformer(
    transformers=[
        ("dummy", OneHotEncoder(drop="first"), ["LOB", "development"]),
        ("passthrough", "passthrough", ["origin"]),
    ]
)
estimator_ml = Pipeline(
    steps=[
        ("design_matrix", design_matrix),
        ("model", RandomForestRegressor(random_state=42)),
    ]
)
m = cl.DevelopmentML(estimator_ml=estimator_ml, y_ml="CumPaidLoss").fit(clrd)

print(m.ldf_.shape)
print(np.round(m.ldf_.values[0, 0, 0, :], 4))
```

```
(6, 1, 10, 9)
[2.5988 1.412 1.2037 1.11 1.0701 1.0434 1.0272 1.0366 1.0714]
```
An actuary who wants to test whether fitting on incremental versus cumulative losses changes the implied development can toggle fit_incrementals and compare the full ldf_ patterns. A log-link Poisson GLM (the same specification TweedieGLM uses) keeps the fitted values positive, so the implied factors stay above 1.0 on either basis. The two bases agree closely here, diverging modestly at the oldest ages.

```python
import numpy as np
import chainladder as cl
from sklearn.linear_model import TweedieRegressor
from sklearn.pipeline import Pipeline
from chainladder.utils.utility_functions import PatsyFormula

tri = cl.load_sample("ukmotor")

# Log-link Poisson GLM on categorical development and origin effects
pipe = Pipeline(
    steps=[
        ("design_matrix", PatsyFormula("C(development) + C(origin)")),
        ("model", TweedieRegressor(power=1, link="log", fit_intercept=False)),
    ]
)

# Same pipeline, fit once on incremental and once on cumulative losses
m_incr = cl.DevelopmentML(
    pipe, y_ml=[tri.columns[0]], fit_incrementals=True
).fit(tri)
m_cum = cl.DevelopmentML(
    pipe, y_ml=[tri.columns[0]], fit_incrementals=False
).fit(tri)

print(np.round(m_incr.ldf_.values[0, 0, 0, :], 4))
print(np.round(m_cum.ldf_.values[0, 0, 0, :], 4))
```

```
[1.904 1.2858 1.1493 1.0989 1.0537 1.0333]
[1.892 1.2822 1.1453 1.1033 1.0547 1.0457]
```
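For readers unfamiliar with the two bases compared above, the following library-free sketch shows how incremental and cumulative losses convert into each other and how age-to-age factors fall out of a completed cumulative row. The helper names and the toy row are hypothetical; chainladder performs these conversions itself, so this is only meant to make the arithmetic concrete.

```python
from itertools import accumulate


def to_incremental(cumulative):
    """Difference successive cumulative values; the first age is unchanged."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def to_cumulative(incremental):
    """Running sum of incremental values."""
    return list(accumulate(incremental))


def age_to_age(cumulative):
    """Ratio of each cumulative value to the one at the preceding age."""
    return [b / a for a, b in zip(cumulative, cumulative[1:])]


completed_row = [100.0, 150.0, 165.0]  # toy completed cumulative row
inc = to_incremental(completed_row)
factors = age_to_age(to_cumulative(inc))

print(inc)
print([round(factor, 4) for factor in factors])
```

Whichever basis the model is fit on, the predictions end up as a completed cumulative row, and the development factors are ratios of adjacent entries in that row.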
- fit(X, y=None, sample_weight=None)
Fit the model with X.
- Parameters:
- X: Triangle-like
Triangle to which the estimator will be fit.
- y: None
Ignored; use y_ml to set a response variable for the ML algorithm.
- sample_weight: Triangle-like
Weights to use in the regression
- Returns:
- self: object
Returns the instance itself.
Inherited Methods
- Fit to data, then transform it.
- Get metadata routing of this object.
- Get parameters for this estimator.
- Apply
- Converts triangle array_backend.
- Set output container.
- Set the parameters of this estimator.
- Serializes triangle object to json format.
- Serializes triangle object to pickle.