Development Tutorial#

Getting Started#

This tutorial focuses on selecting development factors.

Make sure your packages are up to date. For more information on how to update your packages, visit Keeping Packages Updated.

# Black linter, optional
%load_ext lab_black

import pandas as pd
import numpy as np
import chainladder as cl

print("pandas: " + pd.__version__)
print("numpy: " + np.__version__)
print("chainladder: " + cl.__version__)
pandas: 2.1.4
numpy: 1.24.3
chainladder: 0.8.18

Disclaimer#

Note that many of the examples shown might not be applicable in a real-world scenario; they are only meant to demonstrate some of the functionality included in the package. The user should always follow all applicable laws, the Code of Professional Conduct, applicable Actuarial Standards of Practice, and exercise their best actuarial judgment.

Testing for Violation of Chain Ladder’s Assumptions#

The chain ladder method is based on the strong assumptions of independence across origin periods and across valuation periods. Mack developed tests to verify if these assumptions hold, and these tests have been implemented in the chainladder package.

Before the chain ladder model can be used, we should verify that the data satisfies the underlying assumptions using tests at the desired confidence level. If the assumptions are violated, we should consider whether ultimates can be estimated using other models.

There are two main tests that we need to perform:

  • The valuation_correlation test:

    • This test checks the assumption of independence across accident years. Specifically, it tests for correlation across calendar periods (diagonals) and, by extension, origin periods (rows).

    • An additional parameter, total, can be passed, depending on if we want to calculate valuation correlation in total across all origins (True), or for each origin separately (False).

    • The test uses a Z-statistic.

  • The development_correlation test:

    • This test checks the chain ladder assumption that subsequent development factors (columns) are not correlated.

    • The test uses a T-statistic.

raa = cl.load_sample("raa")
print(
    "Are valuation years correlated? Or, are the origins correlated?",
    raa.valuation_correlation(p_critical=0.1, total=True).z_critical.values,
)
print(
    "Are development periods correlated?",
    raa.development_correlation(p_critical=0.5).t_critical.values,
)
Are valuation years correlated? Or, are the origins correlated? [[False]]
Are development periods correlated? [[ True]]

The above tests show that the raa triangle is independent in both cases, suggesting there is no evidence that the chain ladder method is inappropriate for developing the ultimate amounts. It is recommended to review Mack’s papers to ensure a proper understanding of the methodology and the choice of p_critical.

Mack also demonstrated that we can test for valuation years’ correlation. To test for each valuation year’s correlation individually, we set total to False.

raa.valuation_correlation(p_critical=0.1, total=False).z_critical
1982 1983 1984 1985 1986 1987 1988 1989 1990
1981 False False False False False False False False False

Note that the tests are run across all four dimensions of the triangle.

Estimator Basics#

All development methods follow the sklearn estimator API. These estimators have a few properties that are worth getting used to.

We instantiate the estimator with our choice of assumptions. If we don’t specify any, defaults are chosen for us.

At this point, we’ve chosen an estimator and assumptions (even if default), but we have not yet shown our estimator a Triangle. It is merely a set of instructions on how to fit development patterns; no patterns exist yet.

All estimators have a fit method and you can pass a triangle to your estimator. Let’s fit a Triangle in a Development estimator. Let’s also assign the estimator to a variable so we can reference attributes about it.

genins = cl.load_sample("genins")
dev = cl.Development().fit(genins)

Now that we have fit a Development estimator, it has many additional properties that didn’t exist before fitting. For example, we can view the ldf_

dev.ldf_
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7473 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177

We can view the cdf_

dev.cdf_
12-Ult 24-Ult 36-Ult 48-Ult 60-Ult 72-Ult 84-Ult 96-Ult 108-Ult
(All) 14.4466 4.1387 2.3686 1.6252 1.3845 1.2543 1.1547 1.0956 1.0177

We can also convert between LDFs and CDFs using incr_to_cum() and cum_to_incr() similar to triangles.

dev.ldf_.incr_to_cum()
12-Ult 24-Ult 36-Ult 48-Ult 60-Ult 72-Ult 84-Ult 96-Ult 108-Ult
(All) 14.4466 4.1387 2.3686 1.6252 1.3845 1.2543 1.1547 1.0956 1.0177
dev.cdf_.cum_to_incr()
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7473 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177

Notice these attributes have a trailing underscore (_). This is scikit-learn’s API convention; as its documentation states, “attributes that have been estimated from the data must always have a name ending with trailing underscore, for example the coefficients of some regression estimator would be stored in a coef_ attribute after fit has been called.” In short, the trailing underscore is a scikit-learn convention denoting that an attribute has been estimated from the data, i.e., that it is a fitted attribute.
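The convention can be illustrated with a toy estimator (a hypothetical class written for this example, not part of chainladder or scikit-learn):

```python
# Minimal sketch of the sklearn naming convention: parameters passed at
# construction keep their plain names, while attributes estimated during
# fit() get a trailing underscore.
class MeanEstimator:
    def __init__(self, weighted=False):
        self.weighted = weighted      # assumption parameter: no underscore

    def fit(self, X):
        self.mean_ = sum(X) / len(X)  # estimated from data: underscore
        return self

est = MeanEstimator().fit([1, 2, 3, 4])
print(est.weighted, est.mean_)  # False 2.5
```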

print("Assumption parameter (no underscore):", dev.average)
print("Estimated parameter (underscore):\n", dev.ldf_)
Assumption parameter (no underscore): volume
Estimated parameter (underscore):
           12-24     24-36     36-48     48-60     60-72     72-84     84-96    96-108   108-120
(All)  3.490607  1.747333  1.457413  1.173852  1.103824  1.086269  1.053874  1.076555  1.017725

Development Averaging#

Now that we have a grounding in triangle manipulation and the basics of estimators, we can start getting more creative with customizing our development factors.

The basic Development estimator uses a weighted regression through the origin for estimating parameters. Mack showed that using weighted regressions allows for:

  1. volume-weighted average development patterns

  2. simple average development factors

  3. an OLS regression estimate of the development factor, where the regression equation is Y = mX + 0

While he posited this framework to suggest the MackChainladder stochastic method, it is an elegant form even for deterministic development pattern selection.
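As a sketch of how these three averages differ under Mack’s weighted-regression framework, here is a hand calculation with toy numbers (not taken from any sample triangle):

```python
import numpy as np

# Toy cumulative values: two origin years observed at ages 12 and 24.
x = np.array([100.0, 200.0])  # cumulative losses at age 12
y = np.array([150.0, 340.0])  # cumulative losses at age 24

# Volume-weighted average LDF: total at 24 over total at 12.
volume = y.sum() / x.sum()

# Simple average LDF: unweighted mean of the individual link ratios.
simple = (y / x).mean()

# OLS regression through the origin (Y = mX + 0): slope estimate.
regression = (x * y).sum() / (x**2).sum()

print(volume, simple, regression)  # 1.6333... 1.6 1.66
```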

genins = cl.load_sample("genins")
genins
12 24 36 48 60 72 84 96 108 120
2001 357,848 1,124,788 1,735,330 2,218,270 2,745,596 3,319,994 3,466,336 3,606,286 3,833,515 3,901,463
2002 352,118 1,236,139 2,170,033 3,353,322 3,799,067 4,120,063 4,647,867 4,914,039 5,339,085
2003 290,507 1,292,306 2,218,525 3,235,179 3,985,995 4,132,918 4,628,910 4,909,315
2004 310,608 1,418,858 2,195,047 3,757,447 4,029,929 4,381,982 4,588,268
2005 443,160 1,136,350 2,128,333 2,897,821 3,402,672 3,873,311
2006 396,132 1,333,217 2,180,715 2,985,752 3,691,712
2007 440,832 1,288,463 2,419,861 3,483,130
2008 359,480 1,421,128 2,864,498
2009 376,686 1,363,294
2010 344,014

We can also print the age_to_age factors.

genins.age_to_age
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
2001 3.1432 1.5428 1.2783 1.2377 1.2092 1.0441 1.0404 1.0630 1.0177
2002 3.5106 1.7555 1.5453 1.1329 1.0845 1.1281 1.0573 1.0865
2003 4.4485 1.7167 1.4583 1.2321 1.0369 1.1200 1.0606
2004 4.5680 1.5471 1.7118 1.0725 1.0874 1.0471
2005 2.5642 1.8730 1.3615 1.1742 1.1383
2006 3.3656 1.6357 1.3692 1.2364
2007 2.9228 1.8781 1.4394
2008 3.9533 2.0157
2009 3.6192

And color-code them with heatmap().

genins.age_to_age.heatmap()
  12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
2001 3.1432 1.5428 1.2783 1.2377 1.2092 1.0441 1.0404 1.0630 1.0177
2002 3.5106 1.7555 1.5453 1.1329 1.0845 1.1281 1.0573 1.0865
2003 4.4485 1.7167 1.4583 1.2321 1.0369 1.1200 1.0606
2004 4.5680 1.5471 1.7118 1.0725 1.0874 1.0471
2005 2.5642 1.8730 1.3615 1.1742 1.1383
2006 3.3656 1.6357 1.3692 1.2364
2007 2.9228 1.8781 1.4394
2008 3.9533 2.0157
2009 3.6192
vol = cl.Development(average="volume").fit(genins).ldf_
vol
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7473 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177
sim = cl.Development(average="simple").fit(genins).ldf_
sim
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.5661 1.7456 1.4520 1.1810 1.1112 1.0848 1.0527 1.0748 1.0177

In most cases, estimator attributes are Triangles themselves and can be manipulated just like raw triangles.

print("LDF Type: ", type(vol))
print("Difference between volume and simple average:")
vol - sim
LDF Type:  <class 'chainladder.core.triangle.Triangle'>
Difference between volume and simple average:
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) -0.0755 0.0018 0.0055 -0.0071 -0.0074 0.0015 0.0011 0.0018

We can specify how the LDFs are averaged independently for each age-to-age period. For example, we can use volume averaging on the first pattern, simple on the second, regression on the third, and then repeat the cycle three times for the 9 age-to-age factors that we need. Note that the array of selected methods must have the same length as the number of age-to-age factors.

cl.Development(average=["volume", "simple", "regression"] * 3).fit(genins).ldf_
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7456 1.4619 1.1739 1.1112 1.0873 1.0539 1.0748 1.0177

Here is another example, using volume-weighting for the first factor, simple-weighting for the next five factors, and volume-weighting for the last three factors.

cl.Development(average=["volume"] + ["simple"] * 5 + ["volume"] * 3).fit(genins).ldf_
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7456 1.4520 1.1810 1.1112 1.0848 1.0539 1.0766 1.0177

Averaging Period#

Development comes with an n_periods parameter that allows you to select the latest n origin periods for fitting your development patterns. n_periods=-1 indicates that all available periods should be used, which is also the default if the parameter is not specified. The unit of n_periods follows the origin_grain of the underlying triangle.

cl.Development().fit(genins).ldf_
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7473 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177
cl.Development(n_periods=-1).fit(genins).ldf_
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7473 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177
cl.Development(n_periods=3).fit(genins).ldf_
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4604 1.8465 1.3920 1.1539 1.0849 1.0974 1.0539 1.0766 1.0177

Much like average, n_periods can also be set for each age-to-age period individually.

cl.Development(n_periods=[8, 2, 6, 5, -1, 2, -1, -1, 5]).fit(genins).ldf_
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.5325 1.9502 1.4808 1.1651 1.1038 1.0825 1.0539 1.0766 1.0177

Note that if we provide n_periods that is greater than what is available for any particular age-to-age period, all available periods will be used instead.

cl.Development(n_periods=[1, 2, 3, 4, 5, 6, 7, 8, 9]).fit(
    genins
).ldf_ == cl.Development(n_periods=[1, 2, 3, 4, 5, 4, 3, 2, 1]).fit(genins).ldf_
True

Transformers#

In sklearn, there are two types of estimators: transformers and predictors. A transformer transforms the input data (X) in some way, while a predictor predicts a new value (or values, Y) using the input data X.

Development is a transformer: the returned object is a means of creating development patterns, which are used to estimate ultimates, but it is not itself a reserving model (predictor).

Transformers come with the transform and fit_transform methods. These return a Triangle object, but augment it with additional information for use in a subsequent IBNR model (a predictor). drop_high (and drop_low) can take an array of booleans indicating whether the highest (or lowest) factor should be dropped for each LDF calculation.

transformed_triangle = cl.Development(drop_high=[True] * 4 + [False] * 5).fit_transform(
    genins
)
transformed_triangle
12 24 36 48 60 72 84 96 108 120
2001 357,848 1,124,788 1,735,330 2,218,270 2,745,596 3,319,994 3,466,336 3,606,286 3,833,515 3,901,463
2002 352,118 1,236,139 2,170,033 3,353,322 3,799,067 4,120,063 4,647,867 4,914,039 5,339,085
2003 290,507 1,292,306 2,218,525 3,235,179 3,985,995 4,132,918 4,628,910 4,909,315
2004 310,608 1,418,858 2,195,047 3,757,447 4,029,929 4,381,982 4,588,268
2005 443,160 1,136,350 2,128,333 2,897,821 3,402,672 3,873,311
2006 396,132 1,333,217 2,180,715 2,985,752 3,691,712
2007 440,832 1,288,463 2,419,861 3,483,130
2008 359,480 1,421,128 2,864,498
2009 376,686 1,363,294
2010 344,014
transformed_triangle.link_ratio
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
2001 3.1432 1.5428 1.2783 1.2092 1.0441 1.0404 1.0630 1.0177
2002 3.5106 1.7555 1.5453 1.1329 1.0845 1.1281 1.0573 1.0865
2003 4.4485 1.7167 1.4583 1.2321 1.0369 1.1200 1.0606
2004 1.5471 1.0725 1.0874 1.0471
2005 2.5642 1.8730 1.3615 1.1742 1.1383
2006 3.3656 1.6357 1.3692 1.2364
2007 2.9228 1.8781 1.4394
2008 3.9533
2009 3.6192

Our transformed triangle behaves like our original genins triangle. However, notice that the link ratios exclude any dropped values we specified.

transformed_triangle.link_ratio.heatmap()
  12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
2001 3.1432 1.5428 1.2783 1.2092 1.0441 1.0404 1.0630 1.0177
2002 3.5106 1.7555 1.5453 1.1329 1.0845 1.1281 1.0573 1.0865
2003 4.4485 1.7167 1.4583 1.2321 1.0369 1.1200 1.0606
2004 1.5471 1.0725 1.0874 1.0471
2005 2.5642 1.8730 1.3615 1.1742 1.1383
2006 3.3656 1.6357 1.3692 1.2364
2007 2.9228 1.8781 1.4394
2008 3.9533
2009 3.6192
print(type(transformed_triangle))
transformed_triangle.latest_diagonal
<class 'chainladder.core.triangle.Triangle'>
2010
2001 3,901,463
2002 5,339,085
2003 4,909,315
2004 4,588,268
2005 3,873,311
2006 3,691,712
2007 3,483,130
2008 2,864,498
2009 1,363,294
2010 344,014

However, it has other attributes that make it IBNR model-ready.

transformed_triangle.cdf_
12-Ult 24-Ult 36-Ult 48-Ult 60-Ult 72-Ult 84-Ult 96-Ult 108-Ult
(All) 13.1367 3.8870 2.2809 1.6131 1.3845 1.2543 1.1547 1.0956 1.0177

fit_transform() is equivalent to calling fit and transform in succession on the same triangle. Again, this should feel very familiar to the sklearn practitioner.

cl.Development().fit_transform(genins) == cl.Development().fit(genins).transform(genins)
True

You might want to use fit and transform separately when you want to apply development patterns to a different triangle. For example, we can:

  1. Extract the commercial auto triangles from the clrd dataset

  2. Summarize to an industry level and fit a Development object

  3. Transform the individual company triangles with the industry development patterns

clrd = cl.load_sample("clrd")
comauto = clrd[clrd["LOB"] == "comauto"]["CumPaidLoss"]

comauto_industry = comauto.sum()
industry_dev = cl.Development().fit(comauto_industry)

industry_dev.transform(comauto)
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (157, 1, 10, 10)
Index: [GRNAME, LOB]
Columns: [CumPaidLoss]

Working with Multidimensional Triangles#

Several (though not all) of the estimators in chainladder can be fit to several triangles simultaneously. While this can be a convenient shorthand, all these estimators use the same assumptions across every triangle.

clrd = cl.load_sample("clrd").groupby("LOB").sum()["CumPaidLoss"]
print("Fitting to " + str(len(clrd.index)) + " industries simultaneously.")
cl.Development().fit_transform(clrd).cdf_
Fitting to 6 industries simultaneously.
Triangle Summary
Valuation: 2261-12
Grain: OYDY
Shape: (6, 1, 1, 9)
Index: [LOB]
Columns: [CumPaidLoss]

For greater control, you can slice individual triangles out and fit separate patterns to each.

print(cl.Development(average="simple").fit(clrd.loc["wkcomp"]))
print(cl.Development(n_periods=4).fit(clrd.loc["ppauto"]))
print(cl.Development(average="regression", n_periods=6).fit(clrd.loc["comauto"]))
Development(average='simple')
Development(n_periods=4)
Development(average='regression', n_periods=6)