Online Sandbox Tutorial

Welcome! If you’ve come here to explore the capabilities of the chainladder-python package, you’ve landed in the perfect spot. This online sandbox tutorial is designed to provide you with a glimpse of the package’s functionalities.

We recommend setting aside about one hour to complete it.

Got stuck? Click here for the filled-in workbook. Have questions? Join the discussion on GitHub.

Setting Up

We will first need to install the package, as Google Colab’s default environment doesn’t have the chainladder package pre-installed.

Simply execute pip install chainladder. Colab recognizes that this is a shell command rather than Python code and runs it in the shell. FYI, pip is the package installer for Python. When you are ready to install the package on your own machine, run this command in your terminal instead of in a Python notebook.

pip install __fill_in_code__

Note: you may need to restart the kernel to use updated packages.

Other commonly used packages, such as pandas and matplotlib, are already pre-installed; we just need to load them into our environment.

import pandas as pd
import matplotlib.pyplot as plt
import chainladder as cl

print("chainladder", cl.__version__)
chainladder 0.9.2

Your Journey Begins

Let’s begin by looking at a sample dataset, called xyz, which is hosted on https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/xyz.csv.

Let’s load the dataset into memory with pandas, then inspect its “head”.

xyz_df = pd.read_csv(
    "https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/xyz.csv"
)
xyz_df.head()
AccidentYear DevelopmentYear Incurred Paid Reported Closed Premium
0 2002 2002 12811 2318 1342 203 61183
1 2003 2003 9651 1743 1373 181 69175
2 2004 2004 16995 2221 1932 235 99322
3 2005 2005 28674 3043 2067 295 138151
4 2006 2006 27066 3531 1473 307 107578

Can you list all of the unique accident years?

xyz_df["AccidentYear"].unique()
array([2002, 2003, 2004, 2005, 2006, 2007, 2008, 1998, 1999, 2000, 2001])

How many are there?

xyz_df["AccidentYear"].nunique()
11

Triangle Basics

Let’s load the data into the chainladder triangle format and call it xyz_tri.

xyz_tri = cl.Triangle(
    data=xyz_df,
    origin="AccidentYear",
    development="DevelopmentYear",
    columns=["Incurred", "Paid", "Reported", "Closed", "Premium"],
    cumulative=True,
)
xyz_tri
Triangle Summary
Valuation: 2008-12
Grain: OYDY
Shape: (1, 5, 11, 11)
Index: [Total]
Columns: [Incurred, Paid, Reported, Closed, Premium]

What does the incurred triangle look like?

xyz_tri["Incurred"]
          12      24      36      48      60      72      84      96     108     120     132
1998                  11,171  12,380  13,216  14,067  14,688  16,366  16,163  15,835  15,822
1999          13,255  16,405  19,639  22,473  23,764  25,094  24,795  25,071  25,107
2000  15,676  18,749  21,900  27,144  29,488  34,458  36,949  37,505  37,246
2001  11,827  16,004  21,022  26,578  34,205  37,136  38,541  38,798
2002  12,811  20,370  26,656  37,667  44,414  48,701  48,169
2003   9,651  16,995  30,354  40,594  44,231  44,373
2004  16,995  40,180  58,866  71,707  70,288
2005  28,674  47,432  70,340  70,655
2006  27,066  46,783  48,804
2007  19,477  31,732
2008  18,632

How about paid?

xyz_tri["Paid"]
          12      24      36      48      60      72      84      96     108     120     132
1998                   6,309   8,521  10,082  11,620  13,242  14,419  15,311  15,764  15,822
1999           4,666   9,861  13,971  18,127  22,032  23,511  24,146  24,592  24,817
2000   1,302   6,513  12,139  17,828  24,030  28,853  33,222  35,902  36,782
2001   1,539   5,952  12,319  18,609  24,387  31,090  37,070  38,519
2002   2,318   7,932  13,822  22,095  31,945  40,629  44,437
2003   1,743   6,240  12,683  22,892  34,505  39,320
2004   2,221   9,898  25,950  43,439  52,811
2005   3,043  12,219  27,073  40,026
2006   3,531  11,778  22,819
2007   3,529  11,865
2008   3,409

Pandas-like Operations

Let’s see how .iloc[...] and .loc[...] work, much as they do in pandas. They take 4 parameters: [index, column, origin, development].

What if we want the row from AY 1998 Incurred data?

xyz_tri.iloc[:, 0, 0, :]
          12      24      36      48      60      72      84      96     108     120     132
1998                  11,171  12,380  13,216  14,067  14,688  16,366  16,163  15,835  15,822

What if you only want the valuation at age 60 of AY 1998?

xyz_tri.iloc[:, 0, 0, 4]
60
1998 13,216

Let’s use .loc[...] to get the incurred triangle.

xyz_tri.loc[:, "Incurred", :, :]
          12      24      36      48      60      72      84      96     108     120     132
1998                  11,171  12,380  13,216  14,067  14,688  16,366  16,163  15,835  15,822
1999          13,255  16,405  19,639  22,473  23,764  25,094  24,795  25,071  25,107
2000  15,676  18,749  21,900  27,144  29,488  34,458  36,949  37,505  37,246
2001  11,827  16,004  21,022  26,578  34,205  37,136  38,541  38,798
2002  12,811  20,370  26,656  37,667  44,414  48,701  48,169
2003   9,651  16,995  30,354  40,594  44,231  44,373
2004  16,995  40,180  58,866  71,707  70,288
2005  28,674  47,432  70,340  70,655
2006  27,066  46,783  48,804
2007  19,477  31,732
2008  18,632

How do we get the latest Incurred diagonal only?

xyz_tri["Incurred"].latest_diagonal
2008
1998 15,822
1999 25,107
2000 37,246
2001 38,798
2002 48,169
2003 44,373
2004 70,288
2005 70,655
2006 48,804
2007 31,732
2008 18,632

Very often, we want incremental triangles instead. Let’s convert the Incurred triangle to the incremental form.

xyz_tri["Incurred"].cum_to_incr()
          12      24      36      48      60      72      84      96     108     120     132
1998                  11,171   1,209     836     851     621   1,678    -203    -328     -13
1999          13,255   3,150   3,234   2,834   1,291   1,330    -299     276      36
2000  15,676   3,073   3,151   5,244   2,344   4,970   2,491     556    -259
2001  11,827   4,177   5,018   5,556   7,627   2,931   1,405     257
2002  12,811   7,559   6,286  11,011   6,747   4,287    -532
2003   9,651   7,344  13,359  10,240   3,637     142
2004  16,995  23,185  18,686  12,841  -1,419
2005  28,674  18,758  22,908     315
2006  27,066  19,717   2,021
2007  19,477  12,255
2008  18,632
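As a hand check on the AY 1998 row, a cumulative row becomes incremental by differencing consecutive values. This is a minimal sketch in plain Python, not the package's implementation; the helper name cum_to_incr_row is our own and does not exist in chainladder.

```python
# Observed cumulative Incurred values for AY 1998, copied from the triangle above.
cumulative = [11171, 12380, 13216, 14067, 14688, 16366, 16163, 15835, 15822]


def cum_to_incr_row(row):
    """Hypothetical helper: keep the first value, then take consecutive differences."""
    return [row[0]] + [later - earlier for earlier, later in zip(row, row[1:])]


incremental = cum_to_incr_row(cumulative)
print(incremental)
```

The incremental values sum back to the latest cumulative value, which is a quick way to verify the conversion.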

We can also convert the triangle to the valuation format, the layout we often see on Schedule P.

xyz_tri["Incurred"].dev_to_val()
        1998    1999    2000    2001    2002    2003    2004    2005    2006    2007    2008
1998                  11,171  12,380  13,216  14,067  14,688  16,366  16,163  15,835  15,822
1999                  13,255  16,405  19,639  22,473  23,764  25,094  24,795  25,071  25,107
2000                  15,676  18,749  21,900  27,144  29,488  34,458  36,949  37,505  37,246
2001                          11,827  16,004  21,022  26,578  34,205  37,136  38,541  38,798
2002                                  12,811  20,370  26,656  37,667  44,414  48,701  48,169
2003                                           9,651  16,995  30,354  40,594  44,231  44,373
2004                                                  16,995  40,180  58,866  71,707  70,288
2005                                                          28,674  47,432  70,340  70,655
2006                                                                  27,066  46,783  48,804
2007                                                                          19,477  31,732
2008                                                                                  18,632

Another function that is often useful is the .heatmap() method. Let’s inspect the incurred amount and see if there are trends.

xyz_tri["Incurred"].heatmap()
          12      24      36      48      60      72      84      96     108     120     132
1998                  11,171  12,380  13,216  14,067  14,688  16,366  16,163  15,835  15,822
1999          13,255  16,405  19,639  22,473  23,764  25,094  24,795  25,071  25,107
2000  15,676  18,749  21,900  27,144  29,488  34,458  36,949  37,505  37,246
2001  11,827  16,004  21,022  26,578  34,205  37,136  38,541  38,798
2002  12,811  20,370  26,656  37,667  44,414  48,701  48,169
2003   9,651  16,995  30,354  40,594  44,231  44,373
2004  16,995  40,180  58,866  71,707  70,288
2005  28,674  47,432  70,340  70,655
2006  27,066  46,783  48,804
2007  19,477  31,732
2008  18,632

Development

How can we get the incurred link ratios?

xyz_tri["Incurred"].link_ratio
        12-24    24-36    36-48    48-60    60-72    72-84    84-96   96-108  108-120  120-132
1998                     1.1082   1.0675   1.0644   1.0441   1.1142   0.9876   0.9797   0.9992
1999            1.2376   1.1971   1.1443   1.0574   1.0560   0.9881   1.0111   1.0014
2000   1.1960   1.1681   1.2395   1.0864   1.1685   1.0723   1.0150   0.9931
2001   1.3532   1.3135   1.2643   1.2870   1.0857   1.0378   1.0067
2002   1.5900   1.3086   1.4131   1.1791   1.0965   0.9891
2003   1.7610   1.7861   1.3374   1.0896   1.0032
2004   2.3642   1.4651   1.2181   0.9802
2005   1.6542   1.4830   1.0045
2006   1.7285   1.0432
2007   1.6292
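Each link ratio is simply one cumulative value divided by the value one development period earlier. The following plain-Python sketch reproduces the AY 1998 row from its cumulative Incurred values; it is a hand check only and does not use chainladder.

```python
# Observed cumulative Incurred values for AY 1998, in development order.
cumulative = [11171, 12380, 13216, 14067, 14688, 16366, 16163, 15835, 15822]

# Age-to-age factor = later cumulative value / earlier cumulative value.
link_ratios = [
    round(later / earlier, 4) for earlier, later in zip(cumulative, cumulative[1:])
]
print(link_ratios)
```

A ratio below 1.0 means the cumulative incurred amount went down over that period, which matches the negative incremental values seen earlier.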

We can also apply .heatmap() to the link ratios, to help us visualize the highs and lows.

xyz_tri["Incurred"].link_ratio.heatmap()
        12-24    24-36    36-48    48-60    60-72    72-84    84-96   96-108  108-120  120-132
1998                     1.1082   1.0675   1.0644   1.0441   1.1142   0.9876   0.9797   0.9992
1999            1.2376   1.1971   1.1443   1.0574   1.0560   0.9881   1.0111   1.0014
2000   1.1960   1.1681   1.2395   1.0864   1.1685   1.0723   1.0150   0.9931
2001   1.3532   1.3135   1.2643   1.2870   1.0857   1.0378   1.0067
2002   1.5900   1.3086   1.4131   1.1791   1.0965   0.9891
2003   1.7610   1.7861   1.3374   1.0896   1.0032
2004   2.3642   1.4651   1.2181   0.9802
2005   1.6542   1.4830   1.0045
2006   1.7285   1.0432
2007   1.6292

Let’s get the volume-weighted average LDFs for our Incurred triangle.

cl.Development(average="volume").fit(__fill_in_code__).ldf_
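To see what the choice of average changes, here is a hand calculation for a single development period, 96 to 108 months, where three accident years have both values. It assumes the usual actuarial definitions (volume-weighted = sum of later values over sum of earlier values; simple = mean of the individual link ratios) and is a sketch, not the package's code.

```python
# (Incurred at 96 months, Incurred at 108 months) for AYs 1998, 1999 and 2000.
pairs = [(16366, 16163), (24795, 25071), (37505, 37246)]

# Simple average: every accident year's link ratio gets equal weight.
simple_ldf = sum(later / earlier for earlier, later in pairs) / len(pairs)

# Volume-weighted average: larger accident years carry more weight.
volume_ldf = sum(later for _, later in pairs) / sum(earlier for earlier, _ in pairs)

print(round(simple_ldf, 4), round(volume_ldf, 4))
```

The two averages differ only slightly here because the three link ratios are close to each other; they diverge more when large and small accident years develop differently.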

How about the CDFs? This time, use a simple average instead.

cl.Development(average=__fill_in_code__).fit(xyz_tri["Incurred"]).__fill_in_code__
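A CDF (cumulative development factor) at a given age is the product of that age's LDF and all later LDFs. The sketch below shows the arithmetic in plain Python; the LDF values are made up for illustration and are not taken from the xyz triangle.

```python
# Hypothetical age-to-age factors, ordered from the earliest age to the latest.
ldfs = [1.50, 1.20, 1.10, 1.00]

# Accumulate the product from the tail backwards, then restore the original order.
cdfs = []
running_product = 1.0
for ldf in reversed(ldfs):
    running_product *= ldf
    cdfs.append(running_product)
cdfs.reverse()

print([round(cdf, 4) for cdf in cdfs])
```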

We can also use only the latest 3 periods in the calculation of CDFs.

cl.Development(average="simple", n_periods=__fill_in_code__).fit(xyz_tri["Incurred"]).cdf_
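Restricting the number of periods means averaging only the most recent link ratios in each development column. As a hand check on the 12-24 column, the sketch below compares an all-years simple average with a latest-3 simple average. The helper simple_average and its n_periods=None convention are our own, not chainladder's API.

```python
# 12-24 link ratios for AYs 2000 through 2007, oldest first (from the table above).
ratios_12_24 = [1.1960, 1.3532, 1.5900, 1.7610, 2.3642, 1.6542, 1.7285, 1.6292]


def simple_average(ratios, n_periods=None):
    """Hypothetical helper: average the last n_periods ratios, or all if None."""
    selected = ratios if n_periods is None else ratios[-n_periods:]
    return sum(selected) / len(selected)


all_years = simple_average(ratios_12_24)
latest_three = simple_average(ratios_12_24, n_periods=3)
print(round(all_years, 4), round(latest_three, 4))
```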

Deterministic Models

Before we can build any models, we need to use fit_transform(), so that the object is actually modified with our selected development pattern(s).

Set the development of the triangle to use only 3 periods, calculated using the volume-weighted method.

cl.Development(average="volume", n_periods=__fill_in_code__).fit_transform(__fill_in_code__)

Let’s fit a chainladder model to our Incurred triangle.

cl_mod = cl.Chainladder().fit(__fill_in_code__)
cl_mod

How can we get the model’s ultimate estimate?

cl_mod.__fill_in_code__

How about just the IBNR?

cl_mod.__fill_in_code__
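Conceptually, the chain-ladder ultimate is the latest cumulative value multiplied by the CDF at its age, and the IBNR is the ultimate minus the latest value. The sketch below illustrates this in plain Python. The latest values come from the Incurred latest diagonal, but the CDFs are hypothetical placeholders, not the factors fitted from the xyz triangle.

```python
# Latest Incurred values for the three most recent accident years.
latest = {2006: 48804, 2007: 31732, 2008: 18632}

# Hypothetical CDFs to ultimate at each year's current age (assumption).
cdf = {2006: 1.20, 2007: 1.50, 2008: 2.50}

ultimate = {ay: latest[ay] * cdf[ay] for ay in latest}
ibnr = {ay: ultimate[ay] - latest[ay] for ay in latest}

for ay in latest:
    print(ay, round(ultimate[ay]), round(ibnr[ay]))
```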

Let’s fit an Expected Loss model, with an a priori loss ratio of 90% of Premium, and get its ultimates.

cl.ExpectedLoss(apriori=__fill_in_code__).fit(
    xyz_tri["Incurred"], sample_weight=xyz_tri["Premium"].latest_diagonal
).ultimate_
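In its textbook form, the expected loss method sets the ultimate to the a priori loss ratio times the exposure. The sketch below applies that formula by hand to the Premium values shown in the head() output; it is a simplified illustration, not the package's implementation.

```python
# Premium by accident year, as shown in the first rows of xyz_df.
premium = {2002: 61183, 2003: 69175, 2004: 99322}

apriori = 0.90  # a priori loss ratio

# Expected-loss ultimate = a priori loss ratio x premium.
expected_ultimate = {ay: apriori * prem for ay, prem in premium.items()}

for ay, ult in expected_ultimate.items():
    print(ay, round(ult, 1))
```

Note that the observed losses do not appear anywhere in this formula.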

Try it on the Paid triangle. Do you get the same ultimate?

cl.ExpectedLoss(apriori=0.90).fit(
    xyz_tri["Paid"], sample_weight=__fill_in_code__
).ultimate_

How about a Bornhuetter-Ferguson model?

cl.__fill_in_code__(apriori=0.90).fit(
    xyz_tri["Incurred"], sample_weight=xyz_tri["Premium"].latest_diagonal
).ultimate_
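The Bornhuetter-Ferguson method blends the two previous ideas: it keeps the losses observed so far and adds the a priori expected amount for the portion not yet emerged. A minimal sketch for AY 2004 follows. The latest Incurred and Premium values come from the data above, while the CDF of 1.25 is a hypothetical placeholder.

```python
latest = 70288   # latest Incurred for AY 2004
premium = 99322  # Premium for AY 2004
apriori = 0.90   # a priori loss ratio
cdf = 1.25       # hypothetical CDF to ultimate (assumption)

# Share of ultimate losses still expected to emerge.
pct_unreported = 1 - 1 / cdf

# BF ultimate = observed losses + unreported share of the a priori ultimate.
bf_ibnr = pct_unreported * apriori * premium
bf_ultimate = latest + bf_ibnr

print(round(bf_ibnr, 2), round(bf_ultimate, 2))
```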

How about Benktander, with 1 iteration, which is the same as BF?

cl.__fill_in_code__(apriori=0.90, n_iters=__fill_in_code__).fit(
    xyz_tri["Incurred"], sample_weight=xyz_tri["Premium"].latest_diagonal
).ultimate_
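The link between Benktander and BF can be checked by hand. In the standard formulation, each iteration re-applies the BF formula using the previous ultimate as the new a priori, so one iteration is BF and many iterations approach the chain-ladder ultimate. The function below is our own sketch with a hypothetical CDF, not chainladder's implementation.

```python
def benktander_ultimate(latest, cdf, apriori_ultimate, n_iters):
    """Hypothetical helper: iterate the BF formula n_iters times."""
    pct_unreported = 1 - 1 / cdf
    ultimate = apriori_ultimate
    for _ in range(n_iters):
        ultimate = latest + pct_unreported * ultimate
    return ultimate


latest = 70288                    # latest Incurred for AY 2004
cdf = 1.25                        # hypothetical CDF to ultimate (assumption)
apriori_ultimate = 0.90 * 99322   # a priori loss ratio x Premium for AY 2004

one_iteration = benktander_ultimate(latest, cdf, apriori_ultimate, n_iters=1)
many_iterations = benktander_ultimate(latest, cdf, apriori_ultimate, n_iters=200)

print(round(one_iteration, 2))    # same as the BF ultimate
print(round(many_iterations, 2))  # approaches latest * cdf, the chain-ladder ultimate
```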

How about Cape Cod?

cl.__fill_in_code__().fit(
    xyz_tri["Incurred"], sample_weight=xyz_tri["Premium"].latest_diagonal
).ultimate_
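Cape Cod needs no apriori argument because it estimates the expected loss ratio from the data: total observed losses divided by total "used-up" premium (premium divided by CDF). That ratio is then applied as in BF. The sketch below shows the basic version without trend or decay adjustments, using entirely hypothetical numbers.

```python
# Hypothetical inputs for three accident years (all values are assumptions).
latest = [800.0, 600.0, 300.0]
premium = [1000.0, 1000.0, 1000.0]
cdf = [1.00, 1.25, 2.00]

# Used-up premium: the share of premium corresponding to losses already emerged.
used_up_premium = [prem / c for prem, c in zip(premium, cdf)]

# One expected loss ratio estimated across all accident years.
elr = sum(latest) / sum(used_up_premium)

# Apply the BF formula with the estimated loss ratio as the a priori.
ultimate = [
    obs + (1 - 1 / c) * elr * prem for obs, prem, c in zip(latest, premium, cdf)
]

print(round(elr, 4), [round(u, 1) for u in ultimate])
```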

Let’s store the Cape Cod ultimates as cc_result. We can also use .to_frame() to leave chainladder and go to a DataFrame. Let’s plot the ultimates over origin years to see what they look like.

cc_result = (
    cl.CapeCod()
    .fit(xyz_tri["Incurred"], sample_weight=xyz_tri["Premium"].latest_diagonal)
    .ultimate_
).to_frame()

plt.plot(
    cc_result.index.year,
    cc_result[__fill_in_code__]
)

Stochastic Models

Mack’s Chainladder model is also available. Let’s use it on the Incurred triangle.

mcl_mod = cl.__fill_in_code__().fit(__fill_in_code__)
mcl_mod

There are many attributes that are available, such as full_std_err_, total_process_risk_, total_parameter_risk_, mack_std_err_ and total_mack_std_err_.

__fill_in_code__.full_std_err_

MackChainladder also has a summary_ attribute.

__fill_in_code__.summary_

Let’s make a graph that shows the Reported and IBNR amounts as stacked bars, with error bars showing the Mack standard errors.

plt.bar(
    mcl_mod.summary_.to_frame().index.year,
    mcl_mod.summary_.to_frame()[__fill_in_code__],
    label="Reported",
)
plt.bar(
    mcl_mod.summary_.to_frame().index.year,
    mcl_mod.summary_.to_frame()[__fill_in_code__],
    bottom=mcl_mod.summary_.to_frame()[__fill_in_code__],
    yerr=mcl_mod.summary_.to_frame()[__fill_in_code__],
    label="IBNR",
)
plt.legend(loc="upper left")

ODP Bootstrap is also available. Let’s build 10,000 sampled Incurred triangles.

xyz_tri_sampled = (
    cl.BootstrapODPSample(n_sims=__fill_in_code__).fit(__fill_in_code__).resampled_triangles_
)
xyz_tri_sampled

We can fit a basic chainladder to all sampled triangles. We now have 10,000 simulated chainladder models, all (most) with unique LDFs.

cl_mod_bootstrapped = cl.Chainladder().fit(xyz_tri_sampled)
cl_mod_bootstrapped
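With one ultimate per simulation, a common way to summarize the results is the mean and standard deviation across simulations for each origin year. The plain-Python sketch below shows the idea on a handful of made-up simulated ultimates for a single origin year; the numbers are hypothetical and far fewer than 10,000.

```python
import statistics

# Hypothetical simulated ultimates for one origin year (assumption, not model output).
simulated_ultimates = [100.0, 110.0, 90.0, 105.0, 95.0]

mean_ultimate = statistics.mean(simulated_ultimates)
std_ultimate = statistics.stdev(simulated_ultimates)  # sample standard deviation

print(mean_ultimate, round(std_ultimate, 4))
```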

Let’s make another graph.

plt.bar(
    cl_mod_bootstrapped.ultimate_.__fill_in_code__.to_frame().index.year,
    cl_mod_bootstrapped.ultimate_.__fill_in_code__.to_frame()["2261"],
    yerr=cl_mod_bootstrapped.ultimate_.__fill_in_code__.to_frame()["2261"],
)