Online Sandbox Tutorial

Welcome! If you’ve come here to explore the capabilities of the chainladder-python package, you’ve landed in the perfect spot. This online sandbox tutorial is designed to provide you with a glimpse of the package’s functionalities.

We recommend setting aside about one hour to complete it.

Got stuck? Click here for the filled-in workbook. Have questions? Join the discussion on GitHub.

Setting Up#

We first need to install the package, because Google Colab’s default environment doesn’t have chainladder pre-installed.

Simply execute pip install chainladder. Colab is smart enough to recognize that this is a shell command rather than Python code, and runs it in the shell. FYI, pip is the package installer for Python. When you are ready to install the package on your own machine, run this step in your terminal instead of a Python notebook.

pip install __fill_in_code__

ERROR: Invalid requirement: '__fill_in_code__': Expected package name at the start of dependency specifier
    __fill_in_code__
    ^

Note: you may need to restart the kernel to use updated packages.

Other commonly used packages, such as pandas and matplotlib, are already pre-installed. We just need to import them into our environment.

import pandas as pd
import matplotlib.pyplot as plt
import chainladder as cl

print("chainladder", cl.__version__)

chainladder 0.9.2

Your Journey Begins#

Let’s begin by looking at a sample dataset, called xyz, which is hosted on https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/xyz.csv.

Let’s load the dataset into memory with pandas, then inspect its “head”, the first five rows.

xyz_df = pd.read_csv(
    "https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/xyz.csv"
)
xyz_df.head()

	AccidentYear	DevelopmentYear	Incurred	Paid	Reported	Closed	Premium
0	2002	2002	12811	2318	1342	203	61183
1	2003	2003	9651	1743	1373	181	69175
2	2004	2004	16995	2221	1932	235	99322
3	2005	2005	28674	3043	2067	295	138151
4	2006	2006	27066	3531	1473	307	107578

Can you list all of the unique accident years?

xyz_df["AccidentYear"].unique()

array([2002, 2003, 2004, 2005, 2006, 2007, 2008, 1998, 1999, 2000, 2001])

How many are there?

xyz_df["AccidentYear"].nunique()

Triangle Basics#

Let’s load the data into chainladder’s Triangle format and call the result xyz_tri.

xyz_tri = cl.Triangle(
    data=xyz_df,
    origin="AccidentYear",
    development="DevelopmentYear",
    columns=["Incurred", "Paid", "Reported", "Closed", "Premium"],
    cumulative=True,
)
xyz_tri

	Triangle Summary
Valuation:	2008-12
Grain:	OYDY
Shape:	(1, 5, 11, 11)
Index:	[Total]
Columns:	[Incurred, Paid, Reported, Closed, Premium]

What does the incurred triangle look like?

xyz_tri["Incurred"]

	12	24	36	48	60	72	84	96	108	120	132
1998			11,171	12,380	13,216	14,067	14,688	16,366	16,163	15,835	15,822
1999		13,255	16,405	19,639	22,473	23,764	25,094	24,795	25,071	25,107
2000	15,676	18,749	21,900	27,144	29,488	34,458	36,949	37,505	37,246
2001	11,827	16,004	21,022	26,578	34,205	37,136	38,541	38,798
2002	12,811	20,370	26,656	37,667	44,414	48,701	48,169
2003	9,651	16,995	30,354	40,594	44,231	44,373
2004	16,995	40,180	58,866	71,707	70,288
2005	28,674	47,432	70,340	70,655
2006	27,066	46,783	48,804
2007	19,477	31,732
2008	18,632

How about paid?

xyz_tri["Paid"]

	12	24	36	48	60	72	84	96	108	120	132
1998			6,309	8,521	10,082	11,620	13,242	14,419	15,311	15,764	15,822
1999		4,666	9,861	13,971	18,127	22,032	23,511	24,146	24,592	24,817
2000	1,302	6,513	12,139	17,828	24,030	28,853	33,222	35,902	36,782
2001	1,539	5,952	12,319	18,609	24,387	31,090	37,070	38,519
2002	2,318	7,932	13,822	22,095	31,945	40,629	44,437
2003	1,743	6,240	12,683	22,892	34,505	39,320
2004	2,221	9,898	25,950	43,439	52,811
2005	3,043	12,219	27,073	40,026
2006	3,531	11,778	22,819
2007	3,529	11,865
2008	3,409

Pandas-like Operations#

Let’s see how .iloc[...] and .loc[...] work, similarly to pandas. They take 4 parameters, one per triangle axis: [index, column, origin, development].

What if we want the row from AY 1998 Incurred data?

xyz_tri.iloc[:, 0, 0, :]

	12	24	36	48	60	72	84	96	108	120	132
1998			11,171	12,380	13,216	14,067	14,688	16,366	16,163	15,835	15,822

What if you only want the valuation at age 60 of AY 1998?

xyz_tri.iloc[:, 0, 0, 4]

	60
1998	13,216
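To make the positional lookups above concrete, here is a minimal plain-Python sketch of how the last three positions passed to .iloc[...] map to labels. The label lists are copied from the triangle summary and tables above; the describe helper is hypothetical and is not part of chainladder. The index axis is left out because this triangle has a single index entry (Total).

```python
# Axis labels copied from the triangle summary and tables above.
columns = ["Incurred", "Paid", "Reported", "Closed", "Premium"]
origins = list(range(1998, 2009))   # accident years 1998..2008
ages = list(range(12, 133, 12))     # development ages 12..132 months


def describe(column_pos, origin_pos, development_pos):
    """Hypothetical helper: translate positional indexes into axis labels."""
    return columns[column_pos], origins[origin_pos], ages[development_pos]


# Same positions as xyz_tri.iloc[:, 0, 0, 4] above.
selected = describe(0, 0, 4)
print(selected)  # ('Incurred', 1998, 60)
```

With .loc[...], the same cell would be addressed by labels rather than positions.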

Let’s use .loc[...] to get the incurred triangle.

xyz_tri.loc[:, "Incurred", :, :]

	12	24	36	48	60	72	84	96	108	120	132
1998			11,171	12,380	13,216	14,067	14,688	16,366	16,163	15,835	15,822
1999		13,255	16,405	19,639	22,473	23,764	25,094	24,795	25,071	25,107
2000	15,676	18,749	21,900	27,144	29,488	34,458	36,949	37,505	37,246
2001	11,827	16,004	21,022	26,578	34,205	37,136	38,541	38,798
2002	12,811	20,370	26,656	37,667	44,414	48,701	48,169
2003	9,651	16,995	30,354	40,594	44,231	44,373
2004	16,995	40,180	58,866	71,707	70,288
2005	28,674	47,432	70,340	70,655
2006	27,066	46,783	48,804
2007	19,477	31,732
2008	18,632

How do we get the latest Incurred diagonal only?

xyz_tri["Incurred"].latest_diagonal

	2008
1998	15,822
1999	25,107
2000	37,246
2001	38,798
2002	48,169
2003	44,373
2004	70,288
2005	70,655
2006	48,804
2007	31,732
2008	18,632

Very often, we want incremental triangles instead of cumulative ones. Let’s convert the Incurred triangle to incremental form.

xyz_tri["Incurred"].cum_to_incr()

	12	24	36	48	60	72	84	96	108	120	132
1998			11,171	1,209	836	851	621	1,678	-203	-328	-13
1999		13,255	3,150	3,234	2,834	1,291	1,330	-299	276	36
2000	15,676	3,073	3,151	5,244	2,344	4,970	2,491	556	-259
2001	11,827	4,177	5,018	5,556	7,627	2,931	1,405	257
2002	12,811	7,559	6,286	11,011	6,747	4,287	-532
2003	9,651	7,344	13,359	10,240	3,637	142
2004	16,995	23,185	18,686	12,841	-1,419
2005	28,674	18,758	22,908	315
2006	27,066	19,717	2,021
2007	19,477	12,255
2008	18,632
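As a sanity check on the conversion, the sketch below reproduces the AY 2005 row by hand in plain Python: each incremental value is the difference between consecutive cumulative values. The cum_to_incr function here is a hypothetical stand-in written for illustration, not chainladder’s implementation.

```python
# Cumulative incurred values for AY 2005 at ages 12, 24, 36, 48 (from the table above).
cumulative_2005 = [28674, 47432, 70340, 70655]


def cum_to_incr(row):
    """Hypothetical helper: first value stays, the rest become differences."""
    return [row[0]] + [later - earlier for earlier, later in zip(row, row[1:])]


incremental_2005 = cum_to_incr(cumulative_2005)
print(incremental_2005)  # [28674, 18758, 22908, 315]
```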

We can also convert the triangle to the valuation format, which is what we often see on Schedule P.

xyz_tri["Incurred"].dev_to_val()

	2000	2001	2002	2003	2004	2005	2006	2007	2008
1998	11,171	12,380	13,216	14,067	14,688	16,366	16,163	15,835	15,822
1999	13,255	16,405	19,639	22,473	23,764	25,094	24,795	25,071	25,107
2000	15,676	18,749	21,900	27,144	29,488	34,458	36,949	37,505	37,246
2001		11,827	16,004	21,022	26,578	34,205	37,136	38,541	38,798
2002			12,811	20,370	26,656	37,667	44,414	48,701	48,169
2003				9,651	16,995	30,354	40,594	44,231	44,373
2004					16,995	40,180	58,866	71,707	70,288
2005						28,674	47,432	70,340	70,655
2006							27,066	46,783	48,804
2007								19,477	31,732
2008									18,632
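The relabeling behind this view can be sketched in a few lines. This assumes an annual grain in which age 12 is valued at the end of the origin year, which is consistent with the tables above; the valuation_year helper is hypothetical.

```python
def valuation_year(origin_year, age_months):
    """Hypothetical helper: calendar year in which an origin year reaches a given age."""
    return origin_year + age_months // 12 - 1


# AY 2005 has been observed at ages 12, 24, 36 and 48.
valuation_years_2005 = [valuation_year(2005, age) for age in (12, 24, 36, 48)]
print(valuation_years_2005)  # [2005, 2006, 2007, 2008]

# AY 1998 is first observed at age 36, so its row starts in the 2000 column.
first_valuation_1998 = valuation_year(1998, 36)
print(first_valuation_1998)  # 2000
```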

Another method that is often useful is .heatmap(). Let’s inspect the incurred amounts and see if there are any trends.

xyz_tri["Incurred"].heatmap()

	12	24	36	48	60	72	84	96	108	120	132
1998			11,171	12,380	13,216	14,067	14,688	16,366	16,163	15,835	15,822
1999		13,255	16,405	19,639	22,473	23,764	25,094	24,795	25,071	25,107
2000	15,676	18,749	21,900	27,144	29,488	34,458	36,949	37,505	37,246
2001	11,827	16,004	21,022	26,578	34,205	37,136	38,541	38,798
2002	12,811	20,370	26,656	37,667	44,414	48,701	48,169
2003	9,651	16,995	30,354	40,594	44,231	44,373
2004	16,995	40,180	58,866	71,707	70,288
2005	28,674	47,432	70,340	70,655
2006	27,066	46,783	48,804
2007	19,477	31,732
2008	18,632

Development#

How can we get the incurred link ratios?

xyz_tri["Incurred"].link_ratio

	12-24	24-36	36-48	48-60	60-72	72-84	84-96	96-108	108-120	120-132
1998			1.1082	1.0675	1.0644	1.0441	1.1142	0.9876	0.9797	0.9992
1999		1.2376	1.1971	1.1443	1.0574	1.0560	0.9881	1.0111	1.0014
2000	1.1960	1.1681	1.2395	1.0864	1.1685	1.0723	1.0150	0.9931
2001	1.3532	1.3135	1.2643	1.2870	1.0857	1.0378	1.0067
2002	1.5900	1.3086	1.4131	1.1791	1.0965	0.9891
2003	1.7610	1.7861	1.3374	1.0896	1.0032
2004	2.3642	1.4651	1.2181	0.9802
2005	1.6542	1.4830	1.0045
2006	1.7285	1.0432
2007	1.6292
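A link ratio is simply the cumulative value at one age divided by the value at the previous age. The plain-Python sketch below recomputes the AY 2005 row from the Incurred triangle shown earlier; it is an illustration, not the library’s code.

```python
# Cumulative incurred values for AY 2005 at ages 12, 24, 36, 48.
cumulative_2005 = [28674, 47432, 70340, 70655]

# Divide each value by the one before it: 12-24, 24-36, 36-48.
link_ratios_2005 = [
    later / earlier for earlier, later in zip(cumulative_2005, cumulative_2005[1:])
]
print([round(ratio, 4) for ratio in link_ratios_2005])  # [1.6542, 1.483, 1.0045]
```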

We can also apply .heatmap() to the link ratios to help us visualize the highs and lows.

xyz_tri["Incurred"].link_ratio.heatmap()

	12-24	24-36	36-48	48-60	60-72	72-84	84-96	96-108	108-120	120-132
1998			1.1082	1.0675	1.0644	1.0441	1.1142	0.9876	0.9797	0.9992
1999		1.2376	1.1971	1.1443	1.0574	1.0560	0.9881	1.0111	1.0014
2000	1.1960	1.1681	1.2395	1.0864	1.1685	1.0723	1.0150	0.9931
2001	1.3532	1.3135	1.2643	1.2870	1.0857	1.0378	1.0067
2002	1.5900	1.3086	1.4131	1.1791	1.0965	0.9891
2003	1.7610	1.7861	1.3374	1.0896	1.0032
2004	2.3642	1.4651	1.2181	0.9802
2005	1.6542	1.4830	1.0045
2006	1.7285	1.0432
2007	1.6292

Let’s get the volume-weighted average LDFs for our Incurred triangle.

cl.Development(average="volume").fit(__fill_in_code__).ldf_

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[18], line 1
----> 1 cl.Development(average="volume").fit(__fill_in_code__).ldf_

NameError: name '__fill_in_code__' is not defined
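To see what “volume-weighted” means, the sketch below computes the 108-120 factor by hand in two ways, using the two origin years observed at both ages. The volume-weighted average divides the sum of the later values by the sum of the earlier values, while the simple average takes the mean of the individual link ratios. This is plain Python for illustration only and does not call chainladder.

```python
# (incurred at age 108, incurred at age 120) for origins observed at both ages.
pairs_108_120 = {1998: (16163, 15835), 1999: (25071, 25107)}

# Volume-weighted: ratio of column sums, so larger years carry more weight.
volume_ldf = sum(later for _, later in pairs_108_120.values()) / sum(
    earlier for earlier, _ in pairs_108_120.values()
)

# Simple: unweighted mean of the individual link ratios.
ratios = [later / earlier for earlier, later in pairs_108_120.values()]
simple_ldf = sum(ratios) / len(ratios)

print(round(volume_ldf, 4), round(simple_ldf, 4))
```

Here the larger 1999 year pulls the volume-weighted factor closer to its own link ratio than the simple average does.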

How about the CDFs? This time, use a simple average instead.

cl.Development(average=__fill_in_code__).fit(xyz_tri["Incurred"]).__fill_in_code__

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[19], line 1
----> 1 cl.Development(average=__fill_in_code__).fit(xyz_tri["Incurred"]).__fill_in_code__

NameError: name '__fill_in_code__' is not defined
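A CDF is the product of all LDFs from a given age onward. The sketch below illustrates that accumulation step on the last three development intervals of the Incurred triangle. Two assumptions to note: it uses volume-weighted LDFs computed by hand (the averaging method changes the LDFs, not the accumulation), and it assumes no tail factor beyond age 132. The volume_ldf helper is hypothetical.

```python
# Cumulative incurred at ages 96, 108, 120, 132 for the three oldest origins.
# None marks ages that have not been observed yet.
ages = [96, 108, 120, 132]
rows = {
    1998: [16366, 16163, 15835, 15822],
    1999: [24795, 25071, 25107, None],
    2000: [37505, 37246, None, None],
}


def volume_ldf(col):
    """Hypothetical helper: volume-weighted LDF from column col to col + 1."""
    pairs = [
        (row[col], row[col + 1])
        for row in rows.values()
        if row[col] is not None and row[col + 1] is not None
    ]
    return sum(later for _, later in pairs) / sum(earlier for earlier, _ in pairs)


ldfs = [volume_ldf(col) for col in range(len(ages) - 1)]

# Accumulate from the oldest age backwards: CDF(age) = LDF(age) * CDF(next age).
cdfs = []
running = 1.0  # assumption: no development beyond the last observed age
for ldf in reversed(ldfs):
    running *= ldf
    cdfs.append(running)
cdfs.reverse()

for age, cdf in zip(ages, cdfs):
    print(f"{age}-Ult: {cdf:.4f}")
```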

We can also use only the latest 3 periods in the calculation of CDFs.

cl.Development(average="simple", n_periods=__fill_in_code__).fit(xyz_tri["Incurred"]).cdf_

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[20], line 1
----> 1 cl.Development(average="simple", n_periods=__fill_in_code__).fit(xyz_tri["Incurred"]).cdf_

NameError: name '__fill_in_code__' is not defined
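Restricting the number of periods means averaging only the most recent origin years in each development column. The plain-Python sketch below does this for the 12-24 column, comparing the simple average over all eight observed link ratios with the average over the latest three. It assumes that “latest” refers to the most recent origin periods, and it is not the library’s implementation.

```python
# (origin, incurred at age 12, incurred at age 24) from the Incurred triangle.
pairs_12_24 = [
    (2000, 15676, 18749),
    (2001, 11827, 16004),
    (2002, 12811, 20370),
    (2003, 9651, 16995),
    (2004, 16995, 40180),
    (2005, 28674, 47432),
    (2006, 27066, 46783),
    (2007, 19477, 31732),
]

ratios = [at_24 / at_12 for _, at_12, at_24 in pairs_12_24]

# Simple average over every observed link ratio.
all_periods_avg = sum(ratios) / len(ratios)

# Simple average over the latest 3 origin years only (2005, 2006, 2007).
latest_3 = ratios[-3:]
latest_3_avg = sum(latest_3) / len(latest_3)

print(round(all_periods_avg, 4), round(latest_3_avg, 4))
```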

Deterministic Models#

Before we can build any models, we need to use fit_transform(), so that the object is actually modified with our selected development pattern(s).

Set the development of the triangle to use only the latest 3 periods, calculated using the volume-weighted method.

cl.Development(average="volume", n_periods=__fill_in_code__).fit_transform(__fill_in_code__)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[21], line 1
----> 1 cl.Development(average="volume", n_periods=__fill_in_code__).fit_transform(__fill_in_code__)

NameError: name '__fill_in_code__' is not defined

Let’s fit a chainladder model to our Incurred triangle.

cl_mod = cl.Chainladder().fit(__fill_in_code__)
cl_mod

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[22], line 1
----> 1 cl_mod = cl.Chainladder().fit(__fill_in_code__)
      2 cl_mod

NameError: name '__fill_in_code__' is not defined

How can we get the model’s ultimate estimate?

cl_mod.__fill_in_code__

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[23], line 1
----> 1 cl_mod.__fill_in_code__

NameError: name 'cl_mod' is not defined

How about just the IBNR?

cl_mod.__fill_in_code__

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[24], line 1
----> 1 cl_mod.__fill_in_code__

NameError: name 'cl_mod' is not defined
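Conceptually, the chainladder ultimate for an origin year is its latest cumulative value multiplied by the age-to-ultimate factor, and the IBNR is the ultimate minus that latest value. The sketch below works this out by hand for AY 1999, assuming the only remaining development is the single observed 120-132 link ratio and that there is no tail factor. It illustrates the arithmetic only and is not a substitute for the model’s attributes.

```python
# Latest incurred for AY 1999, observed at age 120 (from the latest diagonal).
latest_1999 = 25107

# Age 120-to-ultimate factor: the one observed 120-132 ratio (AY 1998), no tail assumed.
cdf_120_to_ult = 15822 / 15835

ultimate_1999 = latest_1999 * cdf_120_to_ult
ibnr_1999 = ultimate_1999 - latest_1999

print(round(ultimate_1999, 2), round(ibnr_1999, 2))
```

The IBNR is slightly negative here because the incurred amounts for the oldest year developed downward between ages 120 and 132.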

Let’s fit an Expected Loss model, with an apriori of 90% on Premium, and get its ultimates.

cl.ExpectedLoss(apriori=__fill_in_code__).fit(
    xyz_tri["Incurred"], sample_weight=xyz_tri["Premium"].latest_diagonal
).ultimate_

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[25], line 1
----> 1 cl.ExpectedLoss(apriori=__fill_in_code__).fit(
      2     xyz_tri["Incurred"], sample_weight=xyz_tri["Premium"].latest_diagonal
      3 ).ultimate_

NameError: name '__fill_in_code__' is not defined
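The idea behind the Expected Loss method is that the ultimate is the a priori loss ratio applied to an exposure measure, here premium. The sketch below works one origin year by hand. It uses the Premium of 61,183 shown for AY 2002 in the head() output above and assumes the latest-diagonal premium for that year is the same figure, which this tutorial does not confirm.

```python
apriori = 0.90

# Premium for AY 2002 as shown in xyz_df.head(); assumed equal to the
# latest-diagonal premium for that origin year.
premium_2002 = 61183

expected_loss_ultimate_2002 = apriori * premium_2002
print(round(expected_loss_ultimate_2002, 1))  # 55064.7
```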

Try it on the Paid triangle. Do you get the same ultimate?

cl.ExpectedLoss(apriori=0.90).fit(
    xyz_tri["Paid"], sample_weight=__fill_in_code__
).ultimate_

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[26], line 2
      1 cl.ExpectedLoss(apriori=0.90).fit(
----> 2     xyz_tri["Paid"], sample_weight=__fill_in_code__
      3 ).ultimate_

NameError: name '__fill_in_code__' is not defined

How about a Bornhuetter-Ferguson model?

cl.__fill_in_code__(apriori=0.90).fit(
    xyz_tri["Incurred"], sample_weight=xyz_tri["Premium"].latest_diagonal
).ultimate_

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[27], line 1
----> 1 cl.__fill_in_code__(apriori=0.90).fit(
      2     xyz_tri["Incurred"], sample_weight=xyz_tri["Premium"].latest_diagonal
      3 ).ultimate_

AttributeError: module 'chainladder' has no attribute '__fill_in_code__'

How about Benktander, with 1 iteration, which is the same as BF?

cl.__fill_in_code__(apriori=0.90, n_iters=__fill_in_code__).fit(
    xyz_tri["Incurred"], sample_weight=xyz_tri["Premium"].latest_diagonal
).ultimate_

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[28], line 1
----> 1 cl.__fill_in_code__(apriori=0.90, n_iters=__fill_in_code__).fit(
      2     xyz_tri["Incurred"], sample_weight=xyz_tri["Premium"].latest_diagonal
      3 ).ultimate_

AttributeError: module 'chainladder' has no attribute '__fill_in_code__'
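The claim that Benktander with one iteration equals Bornhuetter-Ferguson can be checked with a small plain-Python sketch of the textbook formulas. Both functions are hypothetical helpers, not chainladder’s implementation. The inputs are partly assumptions: the latest incurred for AY 2002 comes from the triangle above, the a priori ultimate is 90% of the premium shown for AY 2002 in head(), and the CDF of 1.05 is a made-up value for illustration.

```python
def bornhuetter_ferguson(latest, cdf, apriori_ultimate):
    """Latest value plus the unreported share of the a priori ultimate."""
    return latest + (1 - 1 / cdf) * apriori_ultimate


def benktander(latest, cdf, apriori_ultimate, n_iters):
    """Re-apply the BF step n_iters times, feeding each ultimate back in."""
    ultimate = apriori_ultimate
    for _ in range(n_iters):
        ultimate = latest + (1 - 1 / cdf) * ultimate
    return ultimate


latest = 48169                   # AY 2002 incurred on the latest diagonal
cdf = 1.05                       # hypothetical age-to-ultimate factor
apriori_ultimate = 0.90 * 61183  # assumes AY 2002 premium of 61,183

bf_ultimate = bornhuetter_ferguson(latest, cdf, apriori_ultimate)
bk_one_iter = benktander(latest, cdf, apriori_ultimate, n_iters=1)
bk_many_iters = benktander(latest, cdf, apriori_ultimate, n_iters=200)

print(bf_ultimate == bk_one_iter)             # one iteration reproduces BF
print(round(bk_many_iters - latest * cdf, 6)) # many iterations approach latest * cdf
```

With more iterations, the estimate moves away from the a priori and toward the chainladder estimate, latest × CDF.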

How about Cape Cod?

cl.__fill_in_code__().fit(
    xyz_tri["Incurred"], sample_weight=xyz_tri["Premium"].latest_diagonal
).ultimate_

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[29], line 1
----> 1 cl.__fill_in_code__().fit(
      2     xyz_tri["Incurred"], sample_weight=xyz_tri["Premium"].latest_diagonal
      3 ).ultimate_

AttributeError: module 'chainladder' has no attribute '__fill_in_code__'

Let’s store the Cape Cod ultimates as cc_result. We can use .to_frame() to leave chainladder and go to a pandas DataFrame. Let’s plot the ultimates by origin year to see what they look like.

cc_result = (
    cl.CapeCod()
    .fit(xyz_tri["Incurred"], sample_weight=xyz_tri["Premium"].latest_diagonal)
    .ultimate_
).to_frame()

plt.plot(
    cc_result.index.year,
    cc_result[__fill_in_code__]
)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[30], line 9
      5 ).to_frame()
      6 
      7 plt.plot(
      8     cc_result.index.year,
----> 9     cc_result[__fill_in_code__]
     10 )

NameError: name '__fill_in_code__' is not defined

Stochastic Models#

Mack’s Chainladder model is also available. Let’s use it on the Incurred triangle.

mcl_mod = cl.__fill_in_code__().fit(__fill_in_code__)
mcl_mod

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[31], line 1
----> 1 mcl_mod = cl.__fill_in_code__().fit(__fill_in_code__)
      2 mcl_mod

AttributeError: module 'chainladder' has no attribute '__fill_in_code__'

There are many attributes that are available, such as full_std_err_, total_process_risk_, total_parameter_risk_, mack_std_err_ and total_mack_std_err_.

__fill_in_code__.full_std_err_

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[32], line 1
----> 1 __fill_in_code__.full_std_err_

NameError: name '__fill_in_code__' is not defined

MackChainladder also has a summary_ attribute.

__fill_in_code__.summary_

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[33], line 1
----> 1 __fill_in_code__.summary_

NameError: name '__fill_in_code__' is not defined

Let’s make a graph that shows the Reported and IBNR amounts as stacked bars, with error bars showing the Mack standard errors.

plt.bar(
    mcl_mod.summary_.to_frame().index.year,
    mcl_mod.summary_.to_frame()[__fill_in_code__],
    label="Reported",
)
plt.bar(
    mcl_mod.summary_.to_frame().index.year,
    mcl_mod.summary_.to_frame()[__fill_in_code__],
    bottom=mcl_mod.summary_.to_frame()[__fill_in_code__],
    yerr=mcl_mod.summary_.to_frame()[__fill_in_code__],
    label="IBNR",
)
plt.legend(loc="upper left")

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[34], line 2
      1 plt.bar(
----> 2     mcl_mod.summary_.to_frame().index.year,
      3     mcl_mod.summary_.to_frame()[__fill_in_code__],
      4     label="Reported",
      5 )

NameError: name 'mcl_mod' is not defined

ODP Bootstrap is also available. Let’s build 10,000 sampled Incurred triangles.

xyz_tri_sampled = (
    cl.BootstrapODPSample(n_sims=__fill_in_code__).fit(__fill_in_code__).resampled_triangles_
)
xyz_tri_sampled

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[35], line 2
      1 xyz_tri_sampled = (
----> 2     cl.BootstrapODPSample(n_sims=__fill_in_code__).fit(__fill_in_code__).resampled_triangles_
      3 )
      4 xyz_tri_sampled

NameError: name '__fill_in_code__' is not defined

We can fit a basic chainladder model to all of the sampled triangles. We now have 10,000 simulated chainladder models, most of them with unique LDFs.

cl_mod_bootstrapped = cl.Chainladder().fit(xyz_tri_sampled)
cl_mod_bootstrapped

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[36], line 1
----> 1 cl_mod_bootstrapped = cl.Chainladder().fit(xyz_tri_sampled)
      2 cl_mod_bootstrapped

NameError: name 'xyz_tri_sampled' is not defined

Let’s make another graph.

plt.bar(
    cl_mod_bootstrapped.ultimate_.__fill_in_code__.to_frame().index.year,
    cl_mod_bootstrapped.ultimate_.__fill_in_code__.to_frame()["2261"],
    yerr=cl_mod_bootstrapped.ultimate_.__fill_in_code__.to_frame()["2261"],
)