Applying Deterministic Methods#

Getting Started#

This tutorial focuses on using deterministic methods to square a triangle.

Be sure that your packages are up to date. For more information on how to update your packages, visit Keeping Packages Updated.

# Black linter, optional
%load_ext lab_black

import pandas as pd
import numpy as np
import chainladder as cl
import matplotlib.pyplot as plt

print("pandas: " + pd.__version__)
print("numpy: " + np.__version__)
print("chainladder: " + cl.__version__)
pandas: 2.1.4
numpy: 1.24.3
chainladder: 0.8.18

Disclaimer#

Note that many of the examples shown might not be applicable in a real-world scenario and are only meant to demonstrate some of the functionality included in the package. The user should always follow all applicable laws, the Code of Professional Conduct, applicable Actuarial Standards of Practice, and exercise their best actuarial judgement.

Chainladder Method#

The basic chainladder method is entirely specified by its development pattern selections. For this reason, the Chainladder estimator takes no additional assumptions, i.e. no additional arguments. Let’s start by loading an example dataset and creating a Triangle with Development patterns and a TailCurve. Recall that we can bundle these two estimators into a single Pipeline if we wish.

genins = cl.load_sample("genins")

genins_dev = cl.Pipeline(
    [("dev", cl.Development()), ("tail", cl.TailCurve())]
).fit_transform(genins)

We can now use the basic Chainladder estimator to estimate ultimate_ values of our Triangle.

genins_model = cl.Chainladder().fit(genins_dev)
genins_model.ultimate_
2261
2001 4,016,553
2002 5,594,009
2003 5,537,497
2004 5,454,190
2005 5,001,513
2006 5,261,947
2007 5,827,759
2008 6,984,945
2009 5,808,708
2010 5,116,430

We can also view the ibnr_. Technically, the term IBNR is reserved for "Incurred But Not Reported," but the chainladder models use it to describe the difference between the ultimate and the latest evaluation period.

genins_model.ibnr_
2261
2001 115,090
2002 254,924
2003 628,182
2004 865,922
2005 1,128,202
2006 1,570,235
2007 2,344,629
2008 4,120,447
2009 4,445,414
2010 4,772,416
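
As the definition suggests, we can reproduce the ibnr_ ourselves by subtracting the latest diagonal from the ultimate_ (a quick check that is expected to return True):

genins_model.ultimate_ - genins.latest_diagonal == genins_model.ibnr_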

It is often useful to see the completed Triangle and this can be accomplished by inspecting the full_triangle_. As with most other estimator properties, the full_triangle_ is itself a Triangle and can be manipulated as such.

genins
12 24 36 48 60 72 84 96 108 120
2001 357,848 1,124,788 1,735,330 2,218,270 2,745,596 3,319,994 3,466,336 3,606,286 3,833,515 3,901,463
2002 352,118 1,236,139 2,170,033 3,353,322 3,799,067 4,120,063 4,647,867 4,914,039 5,339,085
2003 290,507 1,292,306 2,218,525 3,235,179 3,985,995 4,132,918 4,628,910 4,909,315
2004 310,608 1,418,858 2,195,047 3,757,447 4,029,929 4,381,982 4,588,268
2005 443,160 1,136,350 2,128,333 2,897,821 3,402,672 3,873,311
2006 396,132 1,333,217 2,180,715 2,985,752 3,691,712
2007 440,832 1,288,463 2,419,861 3,483,130
2008 359,480 1,421,128 2,864,498
2009 376,686 1,363,294
2010 344,014
genins_model.full_triangle_
12 24 36 48 60 72 84 96 108 120 132 9999
2001 357,848 1,124,788 1,735,330 2,218,270 2,745,596 3,319,994 3,466,336 3,606,286 3,833,515 3,901,463 3,948,071 4,016,553
2002 352,118 1,236,139 2,170,033 3,353,322 3,799,067 4,120,063 4,647,867 4,914,039 5,339,085 5,433,719 5,498,632 5,594,009
2003 290,507 1,292,306 2,218,525 3,235,179 3,985,995 4,132,918 4,628,910 4,909,315 5,285,148 5,378,826 5,443,084 5,537,497
2004 310,608 1,418,858 2,195,047 3,757,447 4,029,929 4,381,982 4,588,268 4,835,458 5,205,637 5,297,906 5,361,197 5,454,190
2005 443,160 1,136,350 2,128,333 2,897,821 3,402,672 3,873,311 4,207,459 4,434,133 4,773,589 4,858,200 4,916,237 5,001,513
2006 396,132 1,333,217 2,180,715 2,985,752 3,691,712 4,074,999 4,426,546 4,665,023 5,022,155 5,111,171 5,172,231 5,261,947
2007 440,832 1,288,463 2,419,861 3,483,130 4,088,678 4,513,179 4,902,528 5,166,649 5,562,182 5,660,771 5,728,396 5,827,759
2008 359,480 1,421,128 2,864,498 4,174,756 4,900,545 5,409,337 5,875,997 6,192,562 6,666,635 6,784,799 6,865,853 6,984,945
2009 376,686 1,363,294 2,382,128 3,471,744 4,075,313 4,498,426 4,886,502 5,149,760 5,544,000 5,642,266 5,709,671 5,808,708
2010 344,014 1,200,818 2,098,228 3,057,984 3,589,620 3,962,307 4,304,132 4,536,015 4,883,270 4,969,825 5,029,196 5,116,430
genins_model.full_triangle_.dev_to_val()
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 ... 2012 2013 2014 2015 2016 2017 2018 2019 2020 2261
2001 357,848 1,124,788 1,735,330 2,218,270 2,745,596 3,319,994 3,466,336 3,606,286 3,833,515 3,901,463 ... 3,948,071 3,948,071 3,948,071 3,948,071 3,948,071 3,948,071 3,948,071 3,948,071 3,948,071 4,016,553
2002 352,118 1,236,139 2,170,033 3,353,322 3,799,067 4,120,063 4,647,867 4,914,039 5,339,085 ... 5,498,632 5,498,632 5,498,632 5,498,632 5,498,632 5,498,632 5,498,632 5,498,632 5,498,632 5,594,009
2003 290,507 1,292,306 2,218,525 3,235,179 3,985,995 4,132,918 4,628,910 4,909,315 ... 5,378,826 5,443,084 5,443,084 5,443,084 5,443,084 5,443,084 5,443,084 5,443,084 5,443,084 5,537,497
2004 310,608 1,418,858 2,195,047 3,757,447 4,029,929 4,381,982 4,588,268 ... 5,205,637 5,297,906 5,361,197 5,361,197 5,361,197 5,361,197 5,361,197 5,361,197 5,361,197 5,454,190
2005 443,160 1,136,350 2,128,333 2,897,821 3,402,672 3,873,311 ... 4,434,133 4,773,589 4,858,200 4,916,237 4,916,237 4,916,237 4,916,237 4,916,237 4,916,237 5,001,513
2006 396,132 1,333,217 2,180,715 2,985,752 3,691,712 ... 4,426,546 4,665,023 5,022,155 5,111,171 5,172,231 5,172,231 5,172,231 5,172,231 5,172,231 5,261,947
2007 440,832 1,288,463 2,419,861 3,483,130 ... 4,513,179 4,902,528 5,166,649 5,562,182 5,660,771 5,728,396 5,728,396 5,728,396 5,728,396 5,827,759
2008 359,480 1,421,128 2,864,498 ... 4,900,545 5,409,337 5,875,997 6,192,562 6,666,635 6,784,799 6,865,853 6,865,853 6,865,853 6,984,945
2009 376,686 1,363,294 ... 3,471,744 4,075,313 4,498,426 4,886,502 5,149,760 5,544,000 5,642,266 5,709,671 5,709,671 5,808,708
2010 344,014 ... 2,098,228 3,057,984 3,589,620 3,962,307 4,304,132 4,536,015 4,883,270 4,969,825 5,029,196 5,116,430

Notice the calendar year of our ultimates. While ultimates will generally be realized before this date, the chainladder package picks the highest allowable date available (the last complete year-end before pandas’ maximum Timestamp) for its ultimate_ valuation.

genins_model.full_triangle_.valuation_date
Timestamp('2261-12-31 23:59:59.999999999')

We can further manipulate the “triangle”, such as applying cum_to_incr().

genins_model.full_triangle_.dev_to_val().cum_to_incr()
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 ... 2012 2013 2014 2015 2016 2017 2018 2019 2020 2261
2001 357,848 766,940 610,542 482,940 527,326 574,398 146,342 139,950 227,229 67,948 ... 68,482
2002 352,118 884,021 933,894 1,183,289 445,745 320,996 527,804 266,172 425,046 ... 64,913 95,377
2003 290,507 1,001,799 926,219 1,016,654 750,816 146,923 495,992 280,405 ... 93,678 64,257 94,413
2004 310,608 1,108,250 776,189 1,562,400 272,482 352,053 206,286 ... 370,179 92,268 63,291 92,993
2005 443,160 693,190 991,983 769,488 504,851 470,639 ... 226,674 339,456 84,611 58,038 85,275
2006 396,132 937,085 847,498 805,037 705,960 ... 351,548 238,477 357,132 89,016 61,060 89,715
2007 440,832 847,631 1,131,398 1,063,269 ... 424,501 389,349 264,121 395,534 98,588 67,626 99,362
2008 359,480 1,061,648 1,443,370 ... 725,788 508,792 466,660 316,566 474,073 118,164 81,054 119,092
2009 376,686 986,608 ... 1,089,616 603,569 423,113 388,076 263,257 394,241 98,266 67,405 99,038
2010 344,014 ... 897,410 959,756 531,636 372,687 341,826 231,882 347,255 86,555 59,371 87,234

Another useful property is full_expectation_. Similar to full_triangle_, it “squares” the Triangle, but replaces the known data with expected values implied by the model and development pattern.

genins_model.full_expectation_
12 24 36 48 60 72 84 96 108 120 132 9999
2001 270,061 942,678 1,647,172 2,400,610 2,817,960 3,110,531 3,378,874 3,560,909 3,833,515 3,901,463 3,948,071 4,016,553
2002 376,125 1,312,904 2,294,081 3,343,423 3,924,682 4,332,157 4,705,889 4,959,416 5,339,085 5,433,719 5,498,632 5,594,009
2003 372,325 1,299,641 2,270,905 3,309,647 3,885,035 4,288,393 4,658,349 4,909,315 5,285,148 5,378,826 5,443,084 5,537,497
2004 366,724 1,280,089 2,236,741 3,259,856 3,826,587 4,223,877 4,588,268 4,835,458 5,205,637 5,297,906 5,361,197 5,454,190
2005 336,287 1,173,846 2,051,100 2,989,300 3,508,995 3,873,311 4,207,459 4,434,133 4,773,589 4,858,200 4,916,237 5,001,513
2006 353,798 1,234,970 2,157,903 3,144,956 3,691,712 4,074,999 4,426,546 4,665,023 5,022,155 5,111,171 5,172,231 5,261,947
2007 391,842 1,367,765 2,389,941 3,483,130 4,088,678 4,513,179 4,902,528 5,166,649 5,562,182 5,660,771 5,728,396 5,827,759
2008 469,648 1,639,355 2,864,498 4,174,756 4,900,545 5,409,337 5,875,997 6,192,562 6,666,635 6,784,799 6,865,853 6,984,945
2009 390,561 1,363,294 2,382,128 3,471,744 4,075,313 4,498,426 4,886,502 5,149,760 5,544,000 5,642,266 5,709,671 5,808,708
2010 344,014 1,200,818 2,098,228 3,057,984 3,589,620 3,962,307 4,304,132 4,536,015 4,883,270 4,969,825 5,029,196 5,116,430
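
Under the hood, each cell of the full_expectation_ is simply the ultimate_ divided by the appropriate cumulative development factor. As a rough check of the first column (a sketch that assumes the age-12-to-ultimate factor, tail included, sits first in the transformed triangle's cdf_):

# divide the ultimates by the age-12 CDF to recover the age-12 expectations
age_12_cdf = genins_dev.cdf_.values[0, 0, 0, 0]
genins_model.ultimate_ / age_12_cdf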

With some clever arithmetic, we can use these objects to give us other useful information. For example, we can retrospectively review the actual Triangle against its modeled expectation.

genins_model.full_triangle_ - genins_model.full_expectation_
12 24 36 48 60 72 84 96 108 120 132 9999
2001 87,787 182,110 88,158 -182,340 -72,364 209,463 87,462 45,377 0
2002 -24,007 -76,765 -124,048 9,899 -125,615 -212,094 -58,022 -45,377
2003 -81,818 -7,335 -52,380 -74,468 100,960 -155,475 -29,439 -0
2004 -56,116 138,769 -41,694 497,591 203,342 158,105 0
2005 106,873 -37,496 77,233 -91,479 -106,323 0 0 0
2006 42,334 98,247 22,812 -159,204 0 0
2007 48,990 -79,302 29,920 0 0 0 0
2008 -110,168 -218,227 0 0 0
2009 -13,875 0 0 0 0 0 0 0
2010 0 0 0 0 0

We can also filter out the lower right part of the triangle with [genins_model.full_triangle_.valuation <= genins.valuation_date].

(
    genins_model.full_triangle_[
        genins_model.full_triangle_.valuation <= genins.valuation_date
    ]
    - genins_model.full_expectation_[
        genins_model.full_triangle_.valuation <= genins.valuation_date
    ]
)
12 24 36 48 60 72 84 96 108 120
2001 87,787 182,110 88,158 -182,340 -72,364 209,463 87,462 45,377 0
2002 -24,007 -76,765 -124,048 9,899 -125,615 -212,094 -58,022 -45,377
2003 -81,818 -7,335 -52,380 -74,468 100,960 -155,475 -29,439
2004 -56,116 138,769 -41,694 497,591 203,342 158,105
2005 106,873 -37,496 77,233 -91,479 -106,323
2006 42,334 98,247 22,812 -159,204
2007 48,990 -79,302 29,920
2008 -110,168 -218,227
2009 -13,875
2010

Getting comfortable with manipulating Triangles will greatly improve our ability to extract value out of the chainladder package. Here is another way of getting the same answer.

genins_AvE = genins - genins_model.full_expectation_
genins_AvE[genins_AvE.valuation <= genins.valuation_date]
12 24 36 48 60 72 84 96 108 120
2001 87,787 182,110 88,158 -182,340 -72,364 209,463 87,462 45,377 0
2002 -24,007 -76,765 -124,048 9,899 -125,615 -212,094 -58,022 -45,377
2003 -81,818 -7,335 -52,380 -74,468 100,960 -155,475 -29,439
2004 -56,116 138,769 -41,694 497,591 203,342 158,105
2005 106,873 -37,496 77,233 -91,479 -106,323
2006 42,334 98,247 22,812 -159,204
2007 48,990 -79,302 29,920
2008 -110,168 -218,227
2009 -13,875
2010

We can also filter out the lower right part of the triangle with [genins_AvE.valuation <= genins.valuation_date] before applying the heatmap().

genins_AvE[genins_AvE.valuation <= genins.valuation_date].heatmap()
  12 24 36 48 60 72 84 96 108 120
2001 87,787 182,110 88,158 -182,340 -72,364 209,463 87,462 45,377 0
2002 -24,007 -76,765 -124,048 9,899 -125,615 -212,094 -58,022 -45,377
2003 -81,818 -7,335 -52,380 -74,468 100,960 -155,475 -29,439
2004 -56,116 138,769 -41,694 497,591 203,342 158,105
2005 106,873 -37,496 77,233 -91,479 -106,323
2006 42,334 98,247 22,812 -159,204
2007 48,990 -79,302 29,920
2008 -110,168 -218,227
2009 -13,875
2010

Can you figure out how to get the expected IBNR runoff in the upcoming year?

cal_yr_ibnr = genins_model.full_triangle_.dev_to_val().cum_to_incr()
cal_yr_ibnr[cal_yr_ibnr.valuation.year == 2011]
2011
2001 46,608
2002 94,634
2003 375,833
2004 247,190
2005 334,148
2006 383,287
2007 605,548
2008 1,310,258
2009 1,018,834
2010 856,804

Expected Loss Method#

Next, let’s talk about the expected loss method, where we already know the ultimate loss (but then why are you trying to estimate your ultimate losses?). The ExpectedLoss estimator has many of the same attributes as the Chainladder estimator. It comes with one input assumption, the a priori (apriori). This is a scalar multiplier that will be applied to an exposure vector, producing an a priori ultimate estimate vector that the model uses.

Earlier, we used the Chainladder method on the genins data. Let’s use the average of the Chainladder ultimates to help us derive an a priori for the expected loss method.

Below, we use genins_model.ultimate_ * 0 and add the mean to it to preserve the index values of years 2001-2010.

expected_loss_apriori = genins_model.ultimate_ * 0 + genins_model.ultimate_.mean()
expected_loss_apriori
2261
2001 5,460,355
2002 5,460,355
2003 5,460,355
2004 5,460,355
2005 5,460,355
2006 5,460,355
2007 5,460,355
2008 5,460,355
2009 5,460,355
2010 5,460,355

Let’s assume that we know our expected loss will be 95% of the a priori, which is common if the a priori is a function of something else, like earned premium. We set apriori to 0.95 in the estimator, then call fit on the data, genins, and assign the sample_weight with our vector of a prioris, expected_loss_apriori.

EL_model = cl.ExpectedLoss(apriori=0.95).fit(
    genins, sample_weight=expected_loss_apriori
)
EL_model.ultimate_
2261
2001 5,187,337
2002 5,187,337
2003 5,187,337
2004 5,187,337
2005 5,187,337
2006 5,187,337
2007 5,187,337
2008 5,187,337
2009 5,187,337
2010 5,187,337
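
Because the expected loss method ignores the losses observed to date, the ultimate is simply the apriori times the exposure vector (a check that is expected to return True):

# 95% of the 5,460,355 apriori gives the 5,187,337 shown above
expected_loss_apriori * 0.95 == EL_model.ultimate_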

Bornhuetter-Ferguson Method#

The BornhuetterFerguson estimator is another deterministic method having many of the same attributes as the Chainladder estimator. It comes with one input assumption, the a priori (apriori). This is a scalar multiplier that will be applied to an exposure vector, which will produce an a priori ultimate estimate vector that we can use for the model.

Since the CAS Loss Reserve Database has premium, we will use it as an example. Let’s grab the paid loss and net earned premium for the commercial auto line of business.

Remember that apriori is a scalar, which we apply to a vector of exposures. Let’s set an a priori loss ratio estimate of 75%.

The BornhuetterFerguson method, along with all other expected loss methods like CapeCod and Benktander (discussed later), needs to take in an exposure vector. The exposure vector has to be a Triangle itself. Remember that the Triangle class supports single exposure vectors.

comauto = cl.load_sample("clrd").groupby("LOB").sum().loc["comauto"]

bf_model = cl.BornhuetterFerguson(apriori=0.75)
bf_model.fit(
    comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)
BornhuetterFerguson(apriori=0.75)
bf_model.ultimate_
2261
1988 626,097
1989 679,224
1990 728,363
1991 729,927
1992 767,610
1993 833,686
1994 918,582
1995 954,377
1996 985,280
1997 1,031,637

Having an apriori that takes on only a constant for all origins can be limiting. This shouldn’t stop the practitioner from exploiting the fact that the apriori can be embedded directly in the exposure vector itself, allowing full customization of the apriori.

b1 = cl.BornhuetterFerguson(apriori=0.75).fit(
    comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)

b2 = cl.BornhuetterFerguson(apriori=1.00).fit(
    comauto["CumPaidLoss"],
    sample_weight=0.75 * comauto["EarnedPremNet"].latest_diagonal,
)

b1.ultimate_ == b2.ultimate_
True

If we need to create a new column, such as AdjEarnedPremNet with varying implied loss ratios, it is recommended that we perform any data modification in pandas rather than in Triangle form.
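
Here is a minimal sketch of that workflow; the AdjEarnedPremNet column and the loss-ratio vector are illustrative assumptions, not part of the dataset:

# pull the premium vector into pandas, adjust it, then rebuild a Triangle
prem_df = (
    comauto["EarnedPremNet"]
    .latest_diagonal.to_frame(origin_as_datetime=True)
    .reset_index()
)
prem_df.columns = ["origin", "EarnedPremNet"]
# hypothetical judgmental loss ratios, one per origin
prem_df["AdjEarnedPremNet"] = prem_df["EarnedPremNet"] * np.linspace(0.70, 0.80, len(prem_df))
prem_df["valuation"] = "1997-12-31"  # the triangle's valuation date

adj_exposure = cl.Triangle(
    prem_df,
    origin="origin",
    development="valuation",
    columns=["AdjEarnedPremNet"],
    cumulative=True,
)

The resulting adj_exposure.latest_diagonal can then be passed as the sample_weight with apriori=1, just like the b2 example above.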

Let’s perform the estimate using Chainladder and compare the results.

bf_model.ultimate_.to_frame(origin_as_datetime=True).index.year
Index([1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997], dtype='int32')
plt.plot(
    bf_model.ultimate_.to_frame(origin_as_datetime=True).index.year,
    bf_model.ultimate_.to_frame(origin_as_datetime=True),
    label="BF",
)

cl_model = cl.Chainladder().fit(comauto["CumPaidLoss"])
plt.plot(
    cl_model.ultimate_.to_frame(origin_as_datetime=True).index.year,
    cl_model.ultimate_.to_frame(origin_as_datetime=True),
    label="CL",
)

plt.legend(loc="upper left")
<matplotlib.legend.Legend at 0x7f9fdc2c6e50>

Benktander Method#

The Benktander method is similar to the BornhuetterFerguson method, but allows for the specification of one additional assumption, n_iters, the number of iterations to recalculate the ultimates. The Benktander method generalizes both the BornhuetterFerguson and the Chainladder estimator through this assumption.

  • When n_iters = 1, the result is equivalent to the BornhuetterFerguson estimator.

  • When n_iters is sufficiently large, the result converges to the Chainladder estimator.
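
As a quick check of both equivalences (a sketch reusing the comauto objects fitted above):

# n_iters=1 reproduces the Bornhuetter-Ferguson ultimates exactly
bk_one = cl.Benktander(apriori=0.75, n_iters=1).fit(
    comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)
bk_one.ultimate_ == bf_model.ultimate_  # expected to return True

# a large n_iters converges to the Chainladder ultimates
bk_many = cl.Benktander(apriori=0.75, n_iters=100).fit(
    comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)
bk_many.ultimate_ - cl_model.ultimate_  # approximately zero in every origin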

bk_model = cl.Benktander(apriori=0.75, n_iters=2)
bk_model.fit(
    comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)
Benktander(apriori=0.75, n_iters=2)

Fitting the Benktander method looks identical to the other methods.

bk_model.fit(
    X=comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)
Benktander(apriori=0.75, n_iters=2)
plt.plot(
    bf_model.ultimate_.to_frame(origin_as_datetime=True).index.year,
    bf_model.ultimate_.to_frame(origin_as_datetime=True),
    label="BF",
)
plt.plot(
    cl_model.ultimate_.to_frame(origin_as_datetime=True).index.year,
    cl_model.ultimate_.to_frame(origin_as_datetime=True),
    label="CL",
)
plt.plot(
    bk_model.ultimate_.to_frame(origin_as_datetime=True).index.year,
    bk_model.ultimate_.to_frame(origin_as_datetime=True),
    label="BK",
)
plt.legend(loc="upper left")
<matplotlib.legend.Legend at 0x7f9fdc1c2d90>

Cape Cod Method#

The CapeCod method is similar to the BornhuetterFerguson method, except its apriori is computed from the Triangle itself. Instead of specifying an apriori, decay and trend need to be specified.

  • decay is the rate that gives weight to earlier origin periods; this parameter is required by the Generalized Cape Cod method, as discussed in Using Best Practices to Determine a Best Reserve Estimate by Struzzieri and Hussian. As the decay factor approaches 1 (the default value), the result approaches the traditional Cape Cod method. As the decay factor approaches 0, the result approaches the Chainladder method.

  • trend is the trend rate along the origin axis to reflect systematic inflationary impacts on the a priori.

When we fit a CapeCod method, we can see the apriori it computes with the given decay and trend assumptions. Since it is an array of estimated parameters, this CapeCod attribute is called the apriori_, with a trailing underscore.

cc_model = cl.CapeCod()
cc_model.fit(
    comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)
CapeCod()

With decay=1, each origin period gets the same apriori_ (this is the traditional Cape Cod). The apriori_ is calculated using the latest diagonal over the used-up exposure, where the used-up exposure is the exposure vector / CDF. Let’s validate the calculation of the a priori.

latest_diagonal = comauto["CumPaidLoss"].latest_diagonal

cdf_as_origin_vector = (
    cl.Chainladder().fit(comauto["CumPaidLoss"]).ultimate_
    / comauto["CumPaidLoss"].latest_diagonal
)

latest_diagonal.sum() / (
    comauto["EarnedPremNet"].latest_diagonal / cdf_as_origin_vector
).sum()
0.6856862224535671
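
This matches the apriori_ stored on the fitted model, which is constant across all origin periods when decay=1:

cc_model.apriori_  # every origin carries the same ≈0.6857 value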

With decay=0, the apriori_ for each origin period stands on its own.

cc_model = cl.CapeCod(decay=0, trend=0).fit(
    X=comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)
cc_model.apriori_
2261
1988 0.6853
1989 0.7041
1990 0.6903
1991 0.6478
1992 0.6518
1993 0.6815
1994 0.6925
1995 0.7004
1996 0.7039
1997 0.7619

Doing the same on our manually calculated apriori_ yields the same result.

latest_diagonal / (comauto["EarnedPremNet"].latest_diagonal / cdf_as_origin_vector)
1997
1988 0.6853
1989 0.7041
1990 0.6903
1991 0.6478
1992 0.6518
1993 0.6815
1994 0.6925
1995 0.7004
1996 0.7039
1997 0.7619

Let’s verify this Cape Cod model’s result against the Chainladder’s.

cc_model.ultimate_ - cl_model.ultimate_
2261
1988
1989
1990
1991
1992 0.0000
1993 0.0000
1994
1995 0.0000
1996
1997 0.0000

We can examine the apriori_s to see whether they exhibit any trends over time.

plt.plot(
    cc_model.apriori_.to_frame(origin_as_datetime=True).index.year,
    cc_model.apriori_.to_frame(origin_as_datetime=True),
)
[<matplotlib.lines.Line2D at 0x7f9fdc0153d0>]

It looks like there is a small positive trend; let’s judgementally select a trend of 1%.

trended_cc_model = cl.CapeCod(decay=0, trend=0.01).fit(
    X=comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)

plt.plot(
    cc_model.apriori_.to_frame(origin_as_datetime=True).index.year,
    cc_model.apriori_.to_frame(origin_as_datetime=True),
    label="Untrended",
)
plt.plot(
    trended_cc_model.apriori_.to_frame(origin_as_datetime=True).index.year,
    trended_cc_model.apriori_.to_frame(origin_as_datetime=True),
    label="Trended",
)
plt.legend(loc="lower right")
<matplotlib.legend.Legend at 0x7f9fdc05b3d0>

We can of course utilize both the trend and the decay parameters together. Adding trend to the CapeCod method is intended to adjust the apriori_s to a common level. Once at a common level, the apriori_ can be estimated from multiple origin periods using the decay factor.

trended_cc_model = cl.CapeCod(decay=0, trend=0.01).fit(
    X=comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)

trended_decayed_cc_model = cl.CapeCod(decay=0.75, trend=0.01).fit(
    X=comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)

plt.plot(
    cc_model.apriori_.to_frame(origin_as_datetime=True).index.year,
    cc_model.apriori_.to_frame(origin_as_datetime=True),
    label="Untrended",
)
plt.plot(
    trended_cc_model.apriori_.to_frame(origin_as_datetime=True).index.year,
    trended_cc_model.apriori_.to_frame(origin_as_datetime=True),
    label="Trended",
)
plt.plot(
    trended_decayed_cc_model.apriori_.to_frame(origin_as_datetime=True).index.year,
    trended_decayed_cc_model.apriori_.to_frame(origin_as_datetime=True),
    label="Trended and Decayed",
)
plt.legend(loc="lower right")
<matplotlib.legend.Legend at 0x7f9fdb7aad90>

Once estimated, the apriori_s need to be detrended back to their untrended levels; these are contained in detrended_apriori_. It is the detrended_apriori_ that gets used in the calculation of ultimate_ losses.

plt.plot(
    trended_cc_model.apriori_.to_frame(origin_as_datetime=True).index.year,
    trended_cc_model.apriori_.to_frame(origin_as_datetime=True),
    label="Trended",
)
plt.plot(
    trended_cc_model.detrended_apriori_.to_frame(origin_as_datetime=True).index.year,
    trended_cc_model.detrended_apriori_.to_frame(origin_as_datetime=True),
    label="Detended to Original",
)
plt.legend(loc="lower right")
<matplotlib.legend.Legend at 0x7f9fdb663e50>

The detrended_apriori_ is a much smoother estimate of the initial expected ultimate_. With the detrended_apriori_ in hand, the CapeCod estimator behaves exactly like the BornhuetterFerguson model.

bf_model = cl.BornhuetterFerguson().fit(
    X=comauto["CumPaidLoss"],
    sample_weight=trended_cc_model.detrended_apriori_
    * comauto["EarnedPremNet"].latest_diagonal,
)

bf_model.ultimate_.sum() == trended_cc_model.ultimate_.sum()
True

Recap#

All the deterministic estimators have ultimate_, ibnr_, full_expectation_ and full_triangle_ attributes that are themselves Triangles. These can be manipulated in a variety of ways to gain additional insights from our model. The expected loss methods take in an exposure vector, which is itself a Triangle, through the sample_weight argument of the fit method. The CapeCod method has the additional attributes apriori_ and detrended_apriori_ to accommodate the selection of its trend and decay assumptions.

Finally, these estimators work very well with the transformers discussed in previous tutorials. Let’s demonstrate the compositional nature of these estimators.

wkcomp = (
    cl.load_sample("clrd")
    .groupby("LOB")
    .sum()
    .loc["wkcomp"][["CumPaidLoss", "EarnedPremNet"]]
)
wkcomp
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (1, 2, 10, 10)
Index: [LOB]
Columns: [CumPaidLoss, EarnedPremNet]

Let’s calculate the age-to-age factors:

  • Without the 1995 valuation period

  • Using a volume-weighted average for the first 5 factors, and a simple average for the next 4 factors (for a total of 9 age-to-age factors)

  • Using no more than 7 periods (with n_periods)

patterns = cl.Pipeline(
    [
        (
            "dev",
            cl.Development(
                average=["volume"] * 5 + ["simple"] * 4,
                n_periods=7,
                drop_valuation="1995",
            ),
        ),
        ("tail", cl.TailCurve(curve="inverse_power", extrap_periods=80)),
    ]
)
cc = cl.CapeCod(decay=0.8, trend=0.02).fit(
    X=patterns.fit_transform(wkcomp["CumPaidLoss"]),
    sample_weight=wkcomp["EarnedPremNet"].latest_diagonal,
)
cc.ultimate_
2261
1988 1,331,221
1989 1,416,505
1990 1,523,470
1991 1,581,962
1992 1,541,458
1993 1,484,168
1994 1,525,963
1995 1,548,534
1996 1,541,068
1997 1,507,592
plt.bar(
    cc.ultimate_.to_frame(origin_as_datetime=True).index.year,
    cc.ultimate_.to_frame(origin_as_datetime=True)["2261"],
)
<BarContainer object of 10 artists>

Voting Chainladder#

A VotingChainladder is an ensemble meta-estimator that fits several base chainladder methods, each on the whole triangle. Then it combines the individual predictions based on a matrix of weights to form a final prediction.

Let’s begin by loading the raa dataset.

raa = cl.load_sample("raa")

Instantiate the Chainladder estimator.

cl_mod = cl.Chainladder()

Instantiate the ExpectedLoss estimator.

el_mod = cl.ExpectedLoss(apriori=1)

Instantiate the BornhuetterFerguson estimator. Remember that it requires one argument, the apriori.

bf_mod = cl.BornhuetterFerguson(apriori=1)

Instantiate the CapeCod estimator with its required arguments.

cc_mod = cl.CapeCod(decay=1, trend=0)

Instantiate the Benktander estimator with its required arguments.

bk_mod = cl.Benktander(apriori=1, n_iters=2)

Let’s prepare the estimators variable. The estimators parameter in VotingChainladder must be a list of (estimator_name, estimator) tuples.

estimators = [
    ("cl", cl_mod),
    ("el", el_mod),
    ("bf", bf_mod),
    ("cc", cc_mod),
    ("bk", bk_mod),
]

Recall that some estimators (in this case, ExpectedLoss, BornhuetterFerguson, CapeCod, and Benktander) also require a sample_weight. Let’s use the mean of the Chainladder ultimate estimates.

sample_weight = cl_mod.fit(raa).ultimate_ * 0 + (
    float(cl_mod.fit(raa).ultimate_.sum()) / 10
)
sample_weight
2261
1981 21,312
1982 21,312
1983 21,312
1984 21,312
1985 21,312
1986 21,312
1987 21,312
1988 21,312
1989 21,312
1990 21,312
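
Next, let’s define the weight matrix: one row per origin period (1981-1990) and one column per estimator, in the order they appear in estimators. Here, the first four origins blend 60% Chainladder with 20% each of Bornhuetter-Ferguson and Cape Cod, the next three split evenly between Bornhuetter-Ferguson and Cape Cod, and the final three rely entirely on Cape Cod.
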
model_weights = np.array(
    [[0.6, 0, 0.2, 0.2, 0]] * 4 + [[0, 0, 0.5, 0.5, 0]] * 3 + [[0, 0, 0, 1, 0]] * 3
)

vot_mod = cl.VotingChainladder(estimators=estimators, weights=model_weights).fit(
    raa, sample_weight=sample_weight
)
vot_mod.ultimate_
2261
1981 18,834
1982 16,876
1983 24,059
1984 28,543
1985 28,237
1986 19,905
1987 18,947
1988 23,107
1989 20,005
1990 21,606
plt.plot(
    cl_mod.fit(raa).ultimate_.to_frame(origin_as_datetime=True).index.year,
    cl_mod.fit(raa).ultimate_.to_frame(origin_as_datetime=True),
    label="Chainladder",
    linestyle="dashed",
    marker="o",
)
plt.plot(
    el_mod.fit(raa, sample_weight=sample_weight)
    .ultimate_.to_frame(origin_as_datetime=True)
    .index.year,
    el_mod.fit(raa, sample_weight=sample_weight).ultimate_.to_frame(
        origin_as_datetime=True
    ),
    label="Expected Loss",
    linestyle="dashed",
    marker="o",
)
plt.plot(
    bf_mod.fit(raa, sample_weight=sample_weight)
    .ultimate_.to_frame(origin_as_datetime=True)
    .index.year,
    bf_mod.fit(raa, sample_weight=sample_weight).ultimate_.to_frame(
        origin_as_datetime=True
    ),
    label="Bornhuetter-Ferguson",
    linestyle="dashed",
    marker="o",
)
plt.plot(
    cc_mod.fit(raa, sample_weight=sample_weight)
    .ultimate_.to_frame(origin_as_datetime=True)
    .index.year,
    cc_mod.fit(raa, sample_weight=sample_weight).ultimate_.to_frame(
        origin_as_datetime=True
    ),
    label="Cape Cod",
    linestyle="dashed",
    marker="o",
)
plt.plot(
    bk_mod.fit(raa, sample_weight=sample_weight)
    .ultimate_.to_frame(origin_as_datetime=True)
    .index.year,
    bk_mod.fit(raa, sample_weight=sample_weight).ultimate_.to_frame(
        origin_as_datetime=True
    ),
    label="Benktander",
    linestyle="dashed",
    marker="o",
)
plt.plot(
    vot_mod.ultimate_.to_frame(origin_as_datetime=True).index.year,
    vot_mod.ultimate_.to_frame(origin_as_datetime=True),
    label="Selected",
)
plt.legend(loc="best")
<matplotlib.legend.Legend at 0x7f9fdb4dcd10>

We can also call the weights attribute to confirm the weights being used by the VotingChainladder ensemble model.

vot_mod.weights
array([[0.6, 0. , 0.2, 0.2, 0. ],
       [0.6, 0. , 0.2, 0.2, 0. ],
       [0.6, 0. , 0.2, 0.2, 0. ],
       [0.6, 0. , 0.2, 0.2, 0. ],
       [0. , 0. , 0.5, 0.5, 0. ],
       [0. , 0. , 0.5, 0.5, 0. ],
       [0. , 0. , 0.5, 0.5, 0. ],
       [0. , 0. , 0. , 1. , 0. ],
       [0. , 0. , 0. , 1. , 0. ],
       [0. , 0. , 0. , 1. , 0. ]])
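
As a final check (a sketch reusing the estimators fitted above), the first four origin years of vot_mod.ultimate_ should equal a 60/20/20 blend of the Chainladder, Bornhuetter-Ferguson, and Cape Cod ultimates:

blend = (
    0.6 * cl_mod.fit(raa).ultimate_
    + 0.2 * bf_mod.fit(raa, sample_weight=sample_weight).ultimate_
    + 0.2 * cc_mod.fit(raa, sample_weight=sample_weight).ultimate_
)
blend  # origins 1981-1984 match vot_mod.ultimate_ above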