Development Tutorial

Getting Started

This tutorial focuses on selecting the development factors.

Make sure your packages are up to date. For more information on how to update your packages, visit Keeping Packages Updated.

# Black linter, optional
%load_ext lab_black

import pandas as pd
import numpy as np
import chainladder as cl

print("pandas: " + pd.__version__)
print("numpy: " + np.__version__)
print("chainladder: " + cl.__version__)
pandas: 1.4.2
numpy: 1.21.5
chainladder: 0.8.13

Disclaimer

Note that many of the examples shown might not be applicable in a real-world scenario and are only meant to demonstrate some of the functionality included in the package. The user should always follow all applicable laws, the Code of Professional Conduct, and applicable Actuarial Standards of Practice, and exercise their best actuarial judgement.

Testing for Violation of Chain Ladder’s Assumptions

The Chain Ladder method is based on the strong assumptions of independence across origin periods and across valuation periods. Mack developed tests to verify if these assumptions hold, and these tests have been implemented in chainladder.

Before the Chain-Ladder model can be used, we should verify that the data satisfies the underlying assumptions using tests at the desired confidence level. If the assumptions are violated, we should consider whether ultimates can be estimated using other models.

Let’s test for independence across origin and development periods. Note that for correlation across valuation periods, the Z-statistic is used, and for correlation across origin periods, the T-statistic is used. The valuation_correlation test takes an additional parameter, total, which controls whether the valuation correlation is calculated in total across all years (True), consistent with Mack (1993), or for each year separately (False), consistent with Mack (1997).

raa = cl.load_sample("raa")
print(
    "Correlation across valuation years? ",
    raa.valuation_correlation(p_critical=0.1, total=True).z_critical.values,
)
print(
    "Correlation across origin years? ",
    raa.development_correlation(p_critical=0.5).t_critical.values,
)
Correlation across valuation years?  [[False]]
Correlation across origin years?  [[False]]

Neither test rejects independence for the raa triangle, so there is no evidence that the Chain-Ladder model is an inappropriate method for developing the ultimate amounts. We suggest reviewing Mack (1993) and Mack (1997) to ensure a proper understanding of the methodology and the choice of p_critical.

Mack (1997) differs from Mack (1993) in how it tests for correlation across valuation years. The 1993 paper looks at the aggregate of all years, while the 1997 paper suggests checking independence for each valuation year. To test each valuation year separately, we set total to False.

raa.valuation_correlation(p_critical=0.1, total=False).z_critical
1982 1983 1984 1985 1986 1987 1988 1989 1990
1981 False False False False False False False False False

Please note that the tests are run across all 4 dimensions of the triangle.

Estimator Basics

All development methods follow the sklearn estimator API. These estimators have a few properties that are worth getting used to.

We instantiate an estimator with our choice of assumptions. Where we don’t specify an assumption, a default is chosen for us.

At this point, we’ve chosen an estimator and assumptions (even if default), but we have not shown our estimator a Triangle. The estimator is merely a set of instructions on how to fit development patterns; no patterns exist yet.

All estimators have a fit method that accepts a triangle. Let’s fit a Development estimator to a Triangle, and assign the estimator to a variable so we can reference its attributes.

genins = cl.load_sample("genins")
dev = cl.Development().fit(genins)

Now that we have fit a Development estimator, it has many additional properties that didn’t exist before fitting. For example, we can view the ldf_:

dev.ldf_
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7473 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177

We can also view the cdf_:

dev.cdf_
12-Ult 24-Ult 36-Ult 48-Ult 60-Ult 72-Ult 84-Ult 96-Ult 108-Ult
(All) 14.4466 4.1387 2.3686 1.6252 1.3845 1.2543 1.1547 1.0956 1.0177

We can also convert between LDFs and CDFs using incr_to_cum() and cum_to_incr() similar to triangles.

dev.ldf_.incr_to_cum()
12-Ult 24-Ult 36-Ult 48-Ult 60-Ult 72-Ult 84-Ult 96-Ult 108-Ult
(All) 14.4466 4.1387 2.3686 1.6252 1.3845 1.2543 1.1547 1.0956 1.0177
dev.cdf_.cum_to_incr()
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7473 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177
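
The relationship between the two sets of factors can also be checked by hand. The sketch below is plain Python and does not use chainladder: it rebuilds the CDFs as a running product of the LDFs, working backwards from the latest age. The LDF values are copied from the six-decimal ldf_ output printed in this tutorial, and the helper names ldfs_to_cdfs and cdfs_to_ldfs are ours, not part of the package.

```python
# LDFs for genins (12-24 through 108-120), copied from this tutorial's output.
ldfs = [3.490607, 1.747333, 1.457413, 1.173852, 1.103824,
        1.086269, 1.053874, 1.076555, 1.017725]


def ldfs_to_cdfs(ldfs):
    """Age-to-ultimate factors: each CDF is the product of its own and all later LDFs."""
    cdfs = []
    running = 1.0
    for factor in reversed(ldfs):
        running *= factor
        cdfs.append(running)
    return cdfs[::-1]


def cdfs_to_ldfs(cdfs):
    """Invert the running product: divide each CDF by the next one (1.0 after the last)."""
    return [current / following for current, following in zip(cdfs, cdfs[1:] + [1.0])]


cdfs = ldfs_to_cdfs(ldfs)
print([round(c, 4) for c in cdfs])
print([round(f, 4) for f in cdfs_to_ldfs(cdfs)])
```

Up to rounding, the first list should agree with the cdf_ output above and the second with the ldf_ output.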

Notice these attributes have a trailing underscore (_). This is scikit-learn’s API convention; as its documentation states, “attributes that have been estimated from the data must always have a name ending with trailing underscore, for example the coefficients of some regression estimator would be stored in a coef_ attribute after fit has been called.” In short, the trailing underscore is scikit-learn’s way of marking fitted attributes, that is, attributes estimated from the data.

print("Assumption parameter (no underscore):", dev.average)
print("Estimated parameter (underscore):\n", dev.ldf_)
Assumption parameter (no underscore): volume
Estimated parameter (underscore):
           12-24     24-36     36-48     48-60     60-72     72-84     84-96    96-108   108-120
(All)  3.490607  1.747333  1.457413  1.173852  1.103824  1.086269  1.053874  1.076555  1.017725
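
To make the convention concrete, here is a minimal sketch of an estimator written in the same style. ToyDevelopment is a hypothetical class invented for this illustration (it is not part of chainladder or scikit-learn), and the small triangle is made-up data. The point is only to show where assumptions live (plain attributes set in __init__) and where fitted results live (underscore attributes set in fit).

```python
class ToyDevelopment:
    """Hypothetical, minimal sklearn-style estimator (not part of chainladder)."""

    def __init__(self, average="volume"):
        # An assumption chosen by the user: stored as-is, no trailing underscore.
        self.average = average

    def fit(self, triangle):
        # triangle: list of rows of cumulative values; later origins have fewer ages.
        ldfs = []
        for age in range(len(triangle[0]) - 1):
            rows = [row for row in triangle if len(row) > age + 1]
            if self.average == "volume":
                ldfs.append(sum(r[age + 1] for r in rows) / sum(r[age] for r in rows))
            elif self.average == "simple":
                ldfs.append(sum(r[age + 1] / r[age] for r in rows) / len(rows))
            else:
                raise ValueError("unknown average: " + repr(self.average))
        # Estimated from the data: stored with a trailing underscore.
        self.ldf_ = ldfs
        # Returning self is what allows ToyDevelopment().fit(triangle).ldf_
        return self


# Made-up cumulative triangle: 3 origin periods, ages 12, 24 and 36.
tiny = [[100.0, 150.0, 165.0], [110.0, 170.0], [120.0]]

est = ToyDevelopment(average="simple")
fitted_before = hasattr(est, "ldf_")  # False: nothing has been estimated yet
est.fit(tiny)
fitted_after = hasattr(est, "ldf_")   # True: fit created the attribute
print(est.average, fitted_before, fitted_after, est.ldf_)
```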

Development Averaging

Now that we have a grounding in triangle manipulation and the basics of estimators, we can start getting more creative with customizing our development factors.

The basic Development estimator uses a weighted regression through the origin for estimating parameters. Mack showed that using weighted regressions allows for:

  1. volume weighted average development patterns

  2. simple average development factors

  3. the OLS regression estimate of the development factor, where the regression equation is Y = mX + 0

While he posited this framework to suggest the MackChainladder stochastic method, it is an elegant form even for deterministic development pattern selection.
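
The three cases above can be sketched in plain Python as one weighted least-squares fit through the origin with different weights. The weight choices used here (1/x for volume, 1/x^2 for simple, 1 for regression) are the usual reading of Mack’s framework; that chainladder applies exactly these weights internally is our assumption, although the results agree with the 36-48 factors printed elsewhere in this tutorial (1.4574, 1.4520 and 1.4619). The data are the age-36 and age-48 columns of the genins triangle, and the function name wls_through_origin is ours.

```python
# Cumulative losses at ages 36 and 48 for origins 2001-2007 (genins sample).
x = [1735330, 2170033, 2218525, 2195047, 2128333, 2180715, 2419861]
y = [2218270, 3353322, 3235179, 3757447, 2897821, 2985752, 3483130]


def wls_through_origin(x, y, weights):
    """Slope m that minimises sum(w * (y - m * x) ** 2), i.e. a fit of Y = mX + 0."""
    numerator = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    denominator = sum(w * xi * xi for w, xi in zip(weights, x))
    return numerator / denominator


volume = wls_through_origin(x, y, [1 / xi for xi in x])        # w = 1/x
simple = wls_through_origin(x, y, [1 / xi ** 2 for xi in x])   # w = 1/x^2
regression = wls_through_origin(x, y, [1.0] * len(x))          # w = 1 (plain OLS)

# With these weights the first two reduce to the familiar averages.
volume_direct = sum(y) / sum(x)
simple_direct = sum(yi / xi for xi, yi in zip(x, y)) / len(x)

print(round(volume, 4), round(simple, 4), round(regression, 4))
```

Seen this way, the choice of average is a choice about how much influence larger origin periods should have on the selected factor.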

genins = cl.load_sample("genins")
genins
12 24 36 48 60 72 84 96 108 120
2001 357,848 1,124,788 1,735,330 2,218,270 2,745,596 3,319,994 3,466,336 3,606,286 3,833,515 3,901,463
2002 352,118 1,236,139 2,170,033 3,353,322 3,799,067 4,120,063 4,647,867 4,914,039 5,339,085
2003 290,507 1,292,306 2,218,525 3,235,179 3,985,995 4,132,918 4,628,910 4,909,315
2004 310,608 1,418,858 2,195,047 3,757,447 4,029,929 4,381,982 4,588,268
2005 443,160 1,136,350 2,128,333 2,897,821 3,402,672 3,873,311
2006 396,132 1,333,217 2,180,715 2,985,752 3,691,712
2007 440,832 1,288,463 2,419,861 3,483,130
2008 359,480 1,421,128 2,864,498
2009 376,686 1,363,294
2010 344,014

We can also print the age_to_age factors.

genins.age_to_age
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
2001 3.1432 1.5428 1.2783 1.2377 1.2092 1.0441 1.0404 1.0630 1.0177
2002 3.5106 1.7555 1.5453 1.1329 1.0845 1.1281 1.0573 1.0865
2003 4.4485 1.7167 1.4583 1.2321 1.0369 1.1200 1.0606
2004 4.5680 1.5471 1.7118 1.0725 1.0874 1.0471
2005 2.5642 1.8730 1.3615 1.1742 1.1383
2006 3.3656 1.6357 1.3692 1.2364
2007 2.9228 1.8781 1.4394
2008 3.9533 2.0157
2009 3.6192

We can also color-code the factors with heatmap().

genins.age_to_age.heatmap()
  12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
2001 3.1432 1.5428 1.2783 1.2377 1.2092 1.0441 1.0404 1.0630 1.0177
2002 3.5106 1.7555 1.5453 1.1329 1.0845 1.1281 1.0573 1.0865
2003 4.4485 1.7167 1.4583 1.2321 1.0369 1.1200 1.0606
2004 4.5680 1.5471 1.7118 1.0725 1.0874 1.0471
2005 2.5642 1.8730 1.3615 1.1742 1.1383
2006 3.3656 1.6357 1.3692 1.2364
2007 2.9228 1.8781 1.4394
2008 3.9533 2.0157
2009 3.6192
vol = cl.Development(average="volume").fit(genins).ldf_
vol
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7473 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177
sim = cl.Development(average="simple").fit(genins).ldf_
sim
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.5661 1.7456 1.4520 1.1810 1.1112 1.0848 1.0527 1.0748 1.0177

In most cases, estimator attributes are Triangles themselves and can be manipulated just like raw triangles.

print("LDF Type: ", type(vol))
print("Difference between volume and simple average:")
vol - sim
LDF Type:  <class 'chainladder.core.triangle.Triangle'>
Difference between volume and simple average:
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) -0.0755 0.0018 0.0055 -0.0071 -0.0074 0.0015 0.0011 0.0018

We can specify how the LDFs are averaged independently for each age-to-age period. For example, we can use volume averaging on the first pattern, simple on the second, regression on the third, and then repeat the cycle three times to cover the 9 age-to-age factors that we need. Note that the list of selected methods must have the same length as the number of age-to-age factors.

cl.Development(average=["volume", "simple", "regression"] * 3).fit(genins).ldf_
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7456 1.4619 1.1739 1.1112 1.0873 1.0539 1.0748 1.0177

As another example, we can use volume weighting for the first factor, simple averaging for the next 5 factors, and volume weighting for the last 3 factors.

cl.Development(average=["volume"] + ["simple"] * 5 + ["volume"] * 3).fit(genins).ldf_
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7456 1.4520 1.1810 1.1112 1.0848 1.0539 1.0766 1.0177
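
For readers who want to see what a per-period selection amounts to, the following plain-Python sketch recomputes the volume/simple/regression cycle directly from the genins values shown in this tutorial. It does not call chainladder; the function select_ldfs and the AVERAGES lookup are names made up for this illustration, and the regression formula (sum of x*y over sum of x^2) is our assumption about what the regression option computes.

```python
# Cumulative genins triangle as printed in this tutorial (origins 2001-2010).
genins_rows = [
    [357848, 1124788, 1735330, 2218270, 2745596, 3319994, 3466336, 3606286, 3833515, 3901463],
    [352118, 1236139, 2170033, 3353322, 3799067, 4120063, 4647867, 4914039, 5339085],
    [290507, 1292306, 2218525, 3235179, 3985995, 4132918, 4628910, 4909315],
    [310608, 1418858, 2195047, 3757447, 4029929, 4381982, 4588268],
    [443160, 1136350, 2128333, 2897821, 3402672, 3873311],
    [396132, 1333217, 2180715, 2985752, 3691712],
    [440832, 1288463, 2419861, 3483130],
    [359480, 1421128, 2864498],
    [376686, 1363294],
    [344014],
]


def volume_average(pairs):
    return sum(y for _, y in pairs) / sum(x for x, _ in pairs)


def simple_average(pairs):
    return sum(y / x for x, y in pairs) / len(pairs)


def regression_average(pairs):
    # OLS slope of Y = mX + 0.
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)


AVERAGES = {"volume": volume_average, "simple": simple_average,
            "regression": regression_average}


def select_ldfs(triangle, averages):
    """One LDF per age-to-age period, each using its own averaging method."""
    n_factors = len(triangle[0]) - 1
    if len(averages) != n_factors:
        raise ValueError("need exactly %d averaging methods" % n_factors)
    ldfs = []
    for age, method in enumerate(averages):
        # Keep only the origins that have both the current and the next age.
        pairs = [(row[age], row[age + 1]) for row in triangle if len(row) > age + 1]
        ldfs.append(AVERAGES[method](pairs))
    return ldfs


mixed = select_ldfs(genins_rows, ["volume", "simple", "regression"] * 3)
print([round(f, 4) for f in mixed])
```

Up to rounding, the printed factors should line up with the first mixed-average example above.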

Averaging Period

Development comes with an n_periods parameter that allows you to select the latest n origin periods for fitting your development patterns. n_periods=-1 indicates that all available periods are used, which is also the default if the parameter is not specified. The units of n_periods follow the origin_grain of the underlying triangle.

cl.Development().fit(genins).ldf_
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7473 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177
cl.Development(n_periods=-1).fit(genins).ldf_
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7473 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177
cl.Development(n_periods=3).fit(genins).ldf_
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4604 1.8465 1.3920 1.1539 1.0849 1.0974 1.0539 1.0766 1.0177

Much like average, n_periods can also be set for each age-to-age period individually.

cl.Development(n_periods=[8, 2, 6, 5, -1, 2, -1, -1, 5]).fit(genins).ldf_
12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.5325 1.9502 1.4808 1.1651 1.1038 1.0825 1.0539 1.0766 1.0177

Note that if we provide an n_periods value that is greater than the number of periods available for a particular age-to-age period, all available periods will be used instead.

cl.Development(n_periods=[1, 2, 3, 4, 5, 6, 7, 8, 9]).fit(
    genins
).ldf_ == cl.Development(n_periods=[1, 2, 3, 4, 5, 4, 3, 2, 1]).fit(genins).ldf_
True
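
The effect of n_periods on a single factor can be sketched by hand. The plain-Python example below takes the age-12 and age-24 columns of genins (origins 2001-2009) and computes a volume-weighted 12-24 factor from only the latest n available link ratios. The helper latest_volume_ldf is a hypothetical name for this illustration, and treating n_periods as "the last n rows that have both ages" is our reading of the behaviour shown above, not a statement about the package internals.

```python
# genins cumulative losses at ages 12 and 24 for origins 2001-2009.
age_12 = [357848, 352118, 290507, 310608, 443160, 396132, 440832, 359480, 376686]
age_24 = [1124788, 1236139, 1292306, 1418858, 1136350, 1333217, 1288463, 1421128, 1363294]


def latest_volume_ldf(x, y, n_periods=-1):
    """Volume-weighted factor from the latest n_periods pairs (-1 means all)."""
    pairs = list(zip(x, y))
    if n_periods > 0:
        # Slicing with an n larger than the list returns the whole list,
        # which mirrors "use all available periods" when n is too large.
        pairs = pairs[-n_periods:]
    return sum(b for _, b in pairs) / sum(a for a, _ in pairs)


all_periods = latest_volume_ldf(age_12, age_24)          # default, every origin
latest_three = latest_volume_ldf(age_12, age_24, 3)      # origins 2007-2009 only
too_many = latest_volume_ldf(age_12, age_24, 20)         # more than are available

print(round(all_periods, 4), round(latest_three, 4), too_many == all_periods)
```

Up to rounding, the first two values should correspond to the 12-24 factors in the default and n_periods=3 outputs above.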

Transformers

In sklearn, there are two types of estimators: transformers and predictors. A transformer transforms the input data (X) in some ways, and a predictor predicts a new value (or values, Y) by using the input data X.

Development is a transformer: the returned object is a means of creating development patterns, which are used to estimate ultimates, but it is not itself a reserving model (predictor).

Transformers come with the transform and fit_transform methods. These return a Triangle object, but augment it with additional information for use in a subsequent IBNR model (a predictor). In the example below, drop_high (and, likewise, drop_low) takes a list of booleans indicating whether the highest (or lowest) factor should be dropped from each LDF calculation.

transformed_triangle = cl.Development(drop_high=[True] * 4 + [False] * 5).fit_transform(
    genins
)
transformed_triangle
12 24 36 48 60 72 84 96 108 120
2001 357,848 1,124,788 1,735,330 2,218,270 2,745,596 3,319,994 3,466,336 3,606,286 3,833,515 3,901,463
2002 352,118 1,236,139 2,170,033 3,353,322 3,799,067 4,120,063 4,647,867 4,914,039 5,339,085
2003 290,507 1,292,306 2,218,525 3,235,179 3,985,995 4,132,918 4,628,910 4,909,315
2004 310,608 1,418,858 2,195,047 3,757,447 4,029,929 4,381,982 4,588,268
2005 443,160 1,136,350 2,128,333 2,897,821 3,402,672 3,873,311
2006 396,132 1,333,217 2,180,715 2,985,752 3,691,712
2007 440,832 1,288,463 2,419,861 3,483,130
2008 359,480 1,421,128 2,864,498
2009 376,686 1,363,294
2010 344,014
transformed_triangle.link_ratio
      12-24   24-36   36-48   48-60   60-72   72-84   84-96   96-108  108-120
2001  3.1432  1.5428  1.2783          1.2092  1.0441  1.0404  1.0630  1.0177
2002  3.5106  1.7555  1.5453  1.1329  1.0845  1.1281  1.0573  1.0865
2003  4.4485  1.7167  1.4583  1.2321  1.0369  1.1200  1.0606
2004          1.5471          1.0725  1.0874  1.0471
2005  2.5642  1.8730  1.3615  1.1742  1.1383
2006  3.3656  1.6357  1.3692  1.2364
2007  2.9228  1.8781  1.4394
2008  3.9533
2009  3.6192

Our transformed triangle behaves like our original genins triangle. However, notice that its link_ratio excludes the dropped values we specified.

transformed_triangle.link_ratio.heatmap()
      12-24   24-36   36-48   48-60   60-72   72-84   84-96   96-108  108-120
2001  3.1432  1.5428  1.2783          1.2092  1.0441  1.0404  1.0630  1.0177
2002  3.5106  1.7555  1.5453  1.1329  1.0845  1.1281  1.0573  1.0865
2003  4.4485  1.7167  1.4583  1.2321  1.0369  1.1200  1.0606
2004          1.5471          1.0725  1.0874  1.0471
2005  2.5642  1.8730  1.3615  1.1742  1.1383
2006  3.3656  1.6357  1.3692  1.2364
2007  2.9228  1.8781  1.4394
2008  3.9533
2009  3.6192
print(type(transformed_triangle))
transformed_triangle.latest_diagonal
<class 'chainladder.core.triangle.Triangle'>
2010
2001 3,901,463
2002 5,339,085
2003 4,909,315
2004 4,588,268
2005 3,873,311
2006 3,691,712
2007 3,483,130
2008 2,864,498
2009 1,363,294
2010 344,014

However, it has other attributes that make it IBNR model-ready.

transformed_triangle.cdf_
12-Ult 24-Ult 36-Ult 48-Ult 60-Ult 72-Ult 84-Ult 96-Ult 108-Ult
(All) 13.1367 3.8870 2.2809 1.6131 1.3845 1.2543 1.1547 1.0956 1.0177
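
To see how dropping the highest factor feeds into these patterns, the plain-Python sketch below recomputes the 12-24 factor by hand. It removes the origin with the largest 12-24 link ratio and then takes a volume-weighted average of what remains. This is our reading of drop_high, written without chainladder, and the variable names are made up for the illustration. The result can be compared with the ratio of the 12-Ult and 24-Ult factors printed above.

```python
# genins cumulative losses at ages 12 and 24 for origins 2001-2009.
origins = [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009]
age_12 = [357848, 352118, 290507, 310608, 443160, 396132, 440832, 359480, 376686]
age_24 = [1124788, 1236139, 1292306, 1418858, 1136350, 1333217, 1288463, 1421128, 1363294]

# Individual link ratios, one per origin.
link_ratios = [later / earlier for earlier, later in zip(age_12, age_24)]

# Position of the highest link ratio in this column.
highest = max(range(len(link_ratios)), key=link_ratios.__getitem__)
dropped_origin = origins[highest]

# Volume-weighted average over every origin except the dropped one.
kept = [(a, b) for i, (a, b) in enumerate(zip(age_12, age_24)) if i != highest]
ldf_after_drop = sum(b for _, b in kept) / sum(a for a, _ in kept)

# Implied 12-24 factor from the cdf_ output above: 12-Ult divided by 24-Ult.
implied_from_cdf = 13.1367 / 3.8870

print(dropped_origin, round(ldf_after_drop, 4), round(implied_from_cdf, 4))
```

The two factors should agree up to the rounding of the printed CDFs, and the dropped origin is the one whose 12-24 cell is blank in the link_ratio output.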

fit_transform() is equivalent to calling fit and transform in succession on the same triangle. Again, this should feel very familiar to the sklearn practitioner.

cl.Development().fit_transform(genins) == cl.Development().fit(genins).transform(genins)
True

You might want to use fit and transform separately when you want to apply development patterns to a different triangle. For example, we can:

  1. Extract the commercial auto triangles from the clrd dataset

  2. Summarize to an industry level and fit a Development object

  3. Transform the individual company triangles with the industry development patterns

clrd = cl.load_sample("clrd")
comauto = clrd[clrd["LOB"] == "comauto"]["CumPaidLoss"]

comauto_industry = comauto.sum()
industry_dev = cl.Development().fit(comauto_industry)

industry_dev.transform(comauto)
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (157, 1, 10, 10)
Index: [GRNAME, LOB]
Columns: [CumPaidLoss]
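
The idea of fitting on one triangle and applying the patterns to another can be sketched without the package. In the plain-Python example below, two made-up company triangles are summed into an "industry" triangle, volume-weighted LDFs are fitted on the sum, and those LDFs are then applied to one company’s latest diagonal. All data and function names are hypothetical. Note also that chainladder’s transform only attaches the patterns to the triangle; the projection step here stands in for a later IBNR model.

```python
# Made-up cumulative paid triangles (rows = origins, columns = ages 12, 24, 36).
company_a = [[100.0, 150.0, 165.0], [110.0, 170.0], [120.0]]
company_b = [[200.0, 260.0, 273.0], [220.0, 280.0], [210.0]]


def add_triangles(first, second):
    """Cell-by-cell sum of two triangles with the same shape."""
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)]


def fit_volume_ldfs(triangle):
    """Volume-weighted LDF for each age-to-age period."""
    ldfs = []
    for age in range(len(triangle[0]) - 1):
        rows = [row for row in triangle if len(row) > age + 1]
        ldfs.append(sum(r[age + 1] for r in rows) / sum(r[age] for r in rows))
    return ldfs


def project_ultimates(triangle, ldfs):
    """Develop each origin's latest value to ultimate with the given LDFs."""
    ultimates = []
    for row in triangle:
        ultimate = row[-1]
        for factor in ldfs[len(row) - 1:]:
            ultimate *= factor
        ultimates.append(ultimate)
    return ultimates


industry = add_triangles(company_a, company_b)
industry_ldfs = fit_volume_ldfs(industry)                   # "fit" on the industry
ultimates_a = project_ultimates(company_a, industry_ldfs)   # apply to one company
own_ldfs_a = fit_volume_ldfs(company_a)                     # company's own patterns

print(industry_ldfs, own_ldfs_a, ultimates_a)
```

Using the aggregate patterns trades company-specific detail for stability, which is often the motivation when individual triangles are thin.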

Working with Multidimensional Triangles

Several (though not all) of the estimators in chainladder can be fit to several triangles simultaneously. While this can be a convenient shorthand, all these estimators use the same assumptions across every triangle.

clrd = cl.load_sample("clrd").groupby("LOB").sum()["CumPaidLoss"]
print("Fitting to " + str(len(clrd.index)) + " industries simultaneously.")
cl.Development().fit_transform(clrd).cdf_
Fitting to 6 industries simultaneously.
Triangle Summary
Valuation: 2261-12
Grain: OYDY
Shape: (6, 1, 1, 9)
Index: [LOB]
Columns: [CumPaidLoss]

For greater control, you can slice individual triangles out and fit separate patterns to each.

print(cl.Development(average="simple").fit(clrd.loc["wkcomp"]))
print(cl.Development(n_periods=4).fit(clrd.loc["ppauto"]))
print(cl.Development(average="regression", n_periods=6).fit(clrd.loc["comauto"]))
Development(average='simple')
Development(n_periods=4)
Development(average='regression', n_periods=6)