Triangle#

Introduction#

With this User Guide, we will be covering all of the core functionality of chainladder. All important user functionality can be referenced from the top level of the library and so importing the library as the cl namespace is the preferred way of importing from chainladder.

import chainladder as cl

import matplotlib.pyplot as plt
plt.style.use('ggplot')
%config InlineBackend.figure_format = 'retina'

Analysis is the intersection of data, models, and assumptions.

Reserving analysis is no different and the loss Triangle is the most ubiquitous data construct used by actuaries today. The chainladder package has its own :class:Triangle data structure that behaves much like a pandas DataFrame.

Why not Pandas?#

This begs the question, why not just use pandas? There are several advantages over having a dedicated Triangle object:

  • Actuaries work with sets of triangles. DataFrames, being two dimensional, support single triangles with grace but become unwieldy with multiple triangles.

  • We can carry through the meaningful pandas functionality while also supporting triangle specific methods not found in pandas

  • Improved memory footprint with sparse array representation in backend

  • Calculated fields with “virtual” columns allows for lazy column evaluation of Triangles as well as improved memory footprint for larger triangles.

  • Ability to support GPU-based backends.

Ultimately, there are a lot of things pandas can do that are not relevant to reserving, and there are a lot of things a Triangle needs to do that are not handled easily with pandas.

Structure#

The :class:Triangle is the data structure of the chainladder package. Just as Scikit-learn likes to only consume numpy arrays, Chainladder only likes Triangles. It is a 4D data structure with labeled axes. These axes are its index, columns, origin, development.

index (axis 0): The index is the lowest grain at which you want to manage the triangle. These can be things like state or company. Like a pandas.multiIndex, you can throw more than one column into the index.

columns (axis 1): Columns are where you would want to store the different numeric values of your data. Paid, Incurred, Counts are all reasonable choices for the columns of your triangle.

origin (axis 2): The origin is the period of time from which your columns originate. It can be an Accident Month, Report Year, Policy Quarter or any other period-like vector.

development (axis 3): Development represents the development age or date of your triangle. Valuation Month, Valuation Year, Valuation Quarter are good choices.

Despite this structure, you interact with it in the style of pandas. You would use index and columns in the same way you would for a pandas DataFrame. You can think of the 4D structure as a pandas DataFrame where each cell (row, column) is its own triangle.

../_images/triangle_graphic.PNG

Like pandas, you can access the values property of a triangle to get its numpy representation, however the Triangle class provides many helper methods to keep the shape of the numpy representation in sync with the other Triangle properties.

Creating a Triangle#

Basic requirements#

You must have a pandas DataFrame on hand to create a triangle. While data can come in a variety of forms those formats should be coerced to a pandas DataFrame before creating a triangle. The DataFrame also must be in tabular (long) format, not triangle (wide) format:

../_images/triangle_bad_good.PNG

At a minimum, the DataFrame must also:

  1. have “date-like” columns for the origin and development period of the triangle.

  2. Have a numeric column(s) representing the amount(s) of the triangle.

The reason for these restriction is that the :class:Triangle infers a lot of useful properties from your DataFrame. For example, it will determine the grain and valuation_date of your triangle which in turn are used to derive many other properties of your triangle without further prompting from you.

Date Inference#

When instantiating a :class:Triangle, the origin and development arguments can take a str representing the column name in your pandas DataFrame that contains the relevant information. Alternatively, the arguments can also take a list in the case where your DataFrame includes multiple columns that represent the dimension, e.g. ['accident_year','accident_quarter'] can be supplied to create an origin dimension at the accident quarter grain.

cl.Triangle(data, origin='Acc Year', development=['Cal Year', 'Cal Month'], columns=['Paid Loss'])

The :class:Triangle relies heavily on pandas date inference. In fact, pd.to_datetime(date_like) is exactly how it works. While pandas is excellent at inference, it is not perfect. When initializing a Triangle you can always use the origin_format and/or development_format arguments to force the inference. For example, origin_format='%Y/%m/%d'

Multidimensional Triangle#

So far we’ve seen how to create a single Triangle, but as described in the Intro the Triangle class can hold multiple triangles at once. These triangles share the same origin and development axes and act as individual cells would in a pandas DataFrame. By specifying one or more column and one or more index, we can fill out the 4D triangle structure.

cl.Triangle(data, origin='Acc Year', development='Cal Year',
            columns=['Paid Loss', 'Incurred Loss'],
            index=['Line of Business', 'State'])

Sample Data#

The chainladder package has several sample triangles. Many of these come from existing papers and can be used to verify the results of those papers. Additionally, They are a quick way of exploring the functionality of the package. These triangles can be called by name using the :func:~chainladder.load_sample function.

cl.load_sample('clrd')
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (775, 6, 10, 10)
Index: [GRNAME, LOB]
Columns: [IncurLoss, CumPaidLoss, BulkLoss, EarnedPremDIR, EarnedPremCeded, EarnedPremNet]

Other Parameters#

Whether a triangle is cumulative or incremental in nature cannot be inferred from the “date-like” vectors of your DataFrame. You can optionally specify this property with the cumulative parameter.

cl.Triangle(
    data, origin='Acc Year', development=['Cal Year'], 
    columns=['PaidLoss'], cumulative=True)

Note

The cumulative parameter is completely optional. If it is not specified, the Triangle will infer its cumulative/incremental status at the point you call on the cum_to_incr or incr_to_cum methods discussed below. Some methods may not work until the cumulative/incremental status is known.

Backends

Triangle is built on numpy which serves as the array backend by default. However, you can now swap array_backend between numpy, cupy, and sparse to switch between CPU and GPU-based computations or dense and sparse backends.

Array backends can be set globally:

cl.options.set_option('array_backend', 'cupy')
cl.options.reset_option()

Alternatively, they can be set per Triangle instance.

cl.Triangle(..., array_backend='cupy')

Note

You must have a CUDA-enabled graphics card and CuPY installed to use the GPU backend. These are optional dependencies of chainladder.

Chainladder by default will swap between the numpy and sparse backends. This substantially improves the memory footprint of chainladder substantially beyond what can be achieved with pandas alone. When a Triangle becomes sufficiently large and has a lot of 0 or null entries, the triangle will silently swap between the numpy and sparse backends.

prism = cl.load_sample('prism')
prism
/home/docs/checkouts/readthedocs.org/user_builds/chainladder-python/conda/latest/lib/python3.11/site-packages/chainladder/core/base.py:250: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  arr = dict(zip(datetime_arg, pd.to_datetime(**item)))
/home/docs/checkouts/readthedocs.org/user_builds/chainladder-python/conda/latest/lib/python3.11/site-packages/chainladder/core/base.py:250: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  arr = dict(zip(datetime_arg, pd.to_datetime(**item)))
Triangle Summary
Valuation: 2017-12
Grain: OMDM
Shape: (34244, 4, 120, 120)
Index: [ClaimNo, Line, Type, ClaimLiability, Limit, Deductible]
Columns: [reportedCount, closedPaidCount, Paid, Incurred]
prism.array_backend
'sparse'
prism.sum().array_backend
'numpy'

You can globally disable the backend swapping by invoking auto_sparse(False). Any triangle with a cupy backend will not invoke auto_sparse. In the future it may be supported when there is better sparse-GPU array support.:

cl.options.set_option('auto_sparse', False)
prism = cl.load_sample('prism', array_backend='sparse')
prism.array_backend
/home/docs/checkouts/readthedocs.org/user_builds/chainladder-python/conda/latest/lib/python3.11/site-packages/chainladder/core/base.py:250: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  arr = dict(zip(datetime_arg, pd.to_datetime(**item)))
/home/docs/checkouts/readthedocs.org/user_builds/chainladder-python/conda/latest/lib/python3.11/site-packages/chainladder/core/base.py:250: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  arr = dict(zip(datetime_arg, pd.to_datetime(**item)))
'sparse'
prism.sum().array_backend
'numpy'

Warning

Loading ‘prism’ with the numpy backend will consume all of your systems memory.

Basic Functionality#

Representation#

The Triangle has two different representations. When only a single index AND single column is selected. The triangle is the typical 2-dimensional representation we typically think of.

triangle = cl.load_sample('ukmotor')
print(triangle.shape)
triangle
(1, 1, 7, 7)
12 24 36 48 60 72 84
2007 3,511 6,726 8,992 10,704 11,763 12,350 12,690
2008 4,001 7,703 9,981 11,161 12,117 12,746
2009 4,355 8,287 10,233 11,755 12,993
2010 4,295 7,750 9,773 11,093
2011 4,150 7,897 10,217
2012 5,102 9,650
2013 6,283

If more than one index or more than one column is present, then the Triangle takes on more of a summary view.

triangle = cl.load_sample('CLRD')
print(triangle.shape)
triangle
(775, 6, 10, 10)
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (775, 6, 10, 10)
Index: [GRNAME, LOB]
Columns: [IncurLoss, CumPaidLoss, BulkLoss, EarnedPremDIR, EarnedPremCeded, EarnedPremNet]

Valuation vs Development#

While most Estimators that use triangles expect the development period to be expressed as an origin age, it is possible to transform a triangle into a valuation triangle where the development periods are converted to valuation periods. Expressing triangles this way may provide a more convenient view of valuation slices. Switching between a development triangle and a valuation triangle can be accomplished with the method dev_to_val and its inverse val_to_dev.

cl.load_sample('raa').dev_to_val()
1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
1981 5,012 8,269 10,907 11,805 13,539 16,181 18,009 18,608 18,662 18,834
1982 106 4,285 5,396 10,666 13,782 15,599 15,496 16,169 16,704
1983 3,410 8,992 13,873 16,141 18,735 22,214 22,863 23,466
1984 5,655 11,555 15,766 21,266 23,425 26,083 27,067
1985 1,092 9,565 15,836 22,169 25,955 26,180
1986 1,513 6,445 11,702 12,935 15,852
1987 557 4,020 10,946 12,314
1988 1,351 6,947 13,112
1989 3,133 5,395
1990 2,063

Triangles have the is_val_tri property that denotes whether a triangle is in valuation mode. The latest diagonal of a Triangle is a valuation triangle.

cl.load_sample('raa').latest_diagonal.is_val_tri
True

Incremental vs Cumulative#

A triangle is either cumulative or incremental. The is_cumulative property will identify this trait. Accumulating an incremental triangle can be acomplished with incr_to_cum. The inverse operation is cum_to_incr.

raa = cl.load_sample('raa')
print(raa.is_cumulative)
raa.cum_to_incr()
True
12 24 36 48 60 72 84 96 108 120
1981 5,012 3,257 2,638 898 1,734 2,642 1,828 599 54 172
1982 106 4,179 1,111 5,270 3,116 1,817 -103 673 535
1983 3,410 5,582 4,881 2,268 2,594 3,479 649 603
1984 5,655 5,900 4,211 5,500 2,159 2,658 984
1985 1,092 8,473 6,271 6,333 3,786 225
1986 1,513 4,932 5,257 1,233 2,917
1987 557 3,463 6,926 1,368
1988 1,351 5,596 6,165
1989 3,133 2,262
1990 2,063

Triangle Grain#

If your triangle has origin and development grains that are more frequent then yearly, you can easily swap to a higher grain using the grain method of the Triangle. The grain method recognizes Yearly (Y), Quarterly (Q), and Monthly (M) grains for both the origin period and development period.

cl.load_sample('quarterly')
/home/docs/checkouts/readthedocs.org/user_builds/chainladder-python/conda/latest/lib/python3.11/site-packages/chainladder/core/base.py:250: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  arr = dict(zip(datetime_arg, pd.to_datetime(**item)))
/home/docs/checkouts/readthedocs.org/user_builds/chainladder-python/conda/latest/lib/python3.11/site-packages/chainladder/core/base.py:250: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  arr = dict(zip(datetime_arg, pd.to_datetime(**item)))
Triangle Summary
Valuation: 2006-03
Grain: OYDQ
Shape: (1, 2, 12, 45)
Index: [Total]
Columns: [incurred, paid]
cl.load_sample('quarterly').grain('OYDY')
/home/docs/checkouts/readthedocs.org/user_builds/chainladder-python/conda/latest/lib/python3.11/site-packages/chainladder/core/base.py:250: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  arr = dict(zip(datetime_arg, pd.to_datetime(**item)))
/home/docs/checkouts/readthedocs.org/user_builds/chainladder-python/conda/latest/lib/python3.11/site-packages/chainladder/core/base.py:250: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  arr = dict(zip(datetime_arg, pd.to_datetime(**item)))
Triangle Summary
Valuation: 2006-03
Grain: OYDY
Shape: (1, 2, 12, 12)
Index: [Total]
Columns: [incurred, paid]

It is generally a good practice to bring your data in at the lowest grain available, so that you have full flexibility in aggregating to the grain of your choosing for analysis and separately, the grain of your choosing for reporting and communication.

Commutativity#

Where possible, the triangle methods are designed to be commutative. For example, each of these operations is functionally equivalent.

tri = cl.load_sample('quarterly')
# Functionally equivalent transformations
tri.grain('OYDY').val_to_dev() == tri.val_to_dev().grain('OYDY')
tri.cum_to_incr().grain('OYDY').val_to_dev() == tri.val_to_dev().cum_to_incr().grain('OYDY')
tri.grain('OYDY').cum_to_incr().val_to_dev().incr_to_cum() == tri.val_to_dev().grain('OYDY')
/home/docs/checkouts/readthedocs.org/user_builds/chainladder-python/conda/latest/lib/python3.11/site-packages/chainladder/core/base.py:250: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  arr = dict(zip(datetime_arg, pd.to_datetime(**item)))
/home/docs/checkouts/readthedocs.org/user_builds/chainladder-python/conda/latest/lib/python3.11/site-packages/chainladder/core/base.py:250: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  arr = dict(zip(datetime_arg, pd.to_datetime(**item)))
True

Performance Tips#

Being mindful of commutativity and computational intensity can really help improve the performance of the package, particularly for really large triangles. Consider these examples that produce identical outputs but with drastically different performance. In general, aggregations reduce the number of cells in a Triangle and should come as early in your method chain as possible.

import timeit
prism = cl.load_sample('prism')
# Accumulation before aggregation - BAD
timeit.timeit(lambda : prism.incr_to_cum().sum(), number=1)
/home/docs/checkouts/readthedocs.org/user_builds/chainladder-python/conda/latest/lib/python3.11/site-packages/chainladder/core/base.py:250: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  arr = dict(zip(datetime_arg, pd.to_datetime(**item)))
/home/docs/checkouts/readthedocs.org/user_builds/chainladder-python/conda/latest/lib/python3.11/site-packages/chainladder/core/base.py:250: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  arr = dict(zip(datetime_arg, pd.to_datetime(**item)))
10.157972535999988
# Aggregation before accumulation - GOOD
timeit.timeit(lambda : prism.sum().incr_to_cum(), number=1)
0.02395860700016783

In other cases, querying the Triangle in clever ways can improve performance. Consider that the latest_diagonal of a cumulative Triangle is equal to the sum of its incremental values along the ‘development’ axis.

# Accumulating a large triangle to get latest_diagonal - BAD
timeit.timeit(lambda : prism.incr_to_cum().latest_diagonal, number=1)
9.175006923999945
# Summing incrementals of a large triangle to get latest_diagonal - GOOD
timeit.timeit(lambda : prism.sum('development'), number=1)
0.03220626000006632

Trend#

A uniform trend factor can also be applied to a Triangle. The trend can be applied along the origin or valuation axes.

tri = cl.load_sample('ukmotor')
# Dividing by original triangle to show the trend factor
tri.trend(0.05, axis='valuation')/tri
12 24 36 48 60 72 84
2007 1.3401 1.2763 1.2155 1.1576 1.1025 1.0500 1.0000
2008 1.2763 1.2155 1.1576 1.1025 1.0500 1.0000
2009 1.2155 1.1576 1.1025 1.0500 1.0000
2010 1.1576 1.1025 1.0500 1.0000
2011 1.1025 1.0500 1.0000
2012 1.0500 1.0000
2013 1.0000

While the trend method only allows for a single trend, you can create compound trends using start and end arguments and chaining them together.

tri.trend(0.05, axis='valuation', start=tri.valuation_date, end='2011-12-31') \
   .trend(0.10, axis='valuation', start='2011-12-31')/tri
12 24 36 48 60 72 84
2007 1.6142 1.4674 1.3340 1.2128 1.1025 1.0500 1.0000
2008 1.4674 1.3340 1.2128 1.1025 1.0500 1.0000
2009 1.3340 1.2127 1.1025 1.0500 1.0000
2010 1.2128 1.1025 1.0500 1.0000
2011 1.1025 1.0500 1.0000
2012 1.0500 1.0000
2013 1.0000

Correlation Tests#

The multiplicative chainladder method is based on the strong assumptions of independence across origin years and across valuation years. Mack developed tests to verify if these assumptions hold.

These tests are included as methods on the triangle class valuation_correlation and development_correlation. False indicates that correlation between years is not sufficiently large.

triangle = cl.load_sample('raa')
triangle.valuation_correlation().z_critical
/home/docs/checkouts/readthedocs.org/user_builds/chainladder-python/conda/latest/lib/python3.11/site-packages/numpy/lib/nanfunctions.py:1217: RuntimeWarning: All-NaN slice encountered
  return function_base._ureduce(a, func=_nanmedian, keepdims=keepdims,
1982 1983 1984 1985 1986 1987 1988 1989 1990
1981 False False False False False False False False False

triangle.development_correlation().t_critical

There are many properties of these correlation tests and they’ve been included as their own classes. Refer to ValuationCorrelation and DevelopmentCorrelation for additional information.

[Mack, 1994]

Pandas-style syntax#

We’ve chosen to keep as close as possible to pandas syntax for Triangle data manipulation. Relying on the most widely used data manipulation library in Python gives us two benefits. This not only allows for easier adoption, but also provides stability to the chainladder API.

Slicing and Filtering#

With a newly minted Triangle, individual triangles can be sliced out of the object using pandas-style loc/iloc or boolean filtering.

clrd = cl.load_sample('clrd')
clrd.iloc[0,1]
clrd[clrd['LOB']=='othliab']
clrd['EarnedPremDIR']
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (775, 1, 10, 10)
Index: [GRNAME, LOB]
Columns: [EarnedPremDIR]

Note

Boolean filtering on non-index columns in pandas feels natural. We’ve exposed the same syntax specifically for the index column(s) of the Triangle without the need for reset_index() or trying to boolean-filter a MultiIndex. This is a divergence from the pandas API.

As of version 0.7.6, four-dimensional slicing is supported:

clrd = cl.load_sample('clrd')
clrd.iloc[[0, 10, 3], 1:8, :5, :]
clrd.loc[:'Aegis Grp', 'CumPaidLoss':, '1990':'1994', :48]
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (5, 5, 5, 4)
Index: [GRNAME, LOB]
Columns: [CumPaidLoss, BulkLoss, EarnedPremDIR, EarnedPremCeded, EarnedPremNet]

As of version 0.8.3, .iat and at functionality have been added. Similar to pandas, one can use these for value assignment for a single cell of a Triangle. When a ‘sparse’ backend is in use, these accessors are the only way to modify individual cells of a triangle.

raa = cl.load_sample('raa').set_backend('sparse')
# To modify a sparse triangle, we need to use at or iat
raa.at['Total', 'values', '1985', 12] = 10000

Arithmetic#

Most arithmetic operations can be used to create new triangles within your triangle instance. Like with pandas, these can automatically be added as new columns to your Triangle.

clrd = cl.load_sample('clrd')
clrd['CaseIncur'] = clrd['IncurLoss']-clrd['BulkLoss']
clrd
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (775, 7, 10, 10)
Index: [GRNAME, LOB]
Columns: [IncurLoss, CumPaidLoss, BulkLoss, EarnedPremDIR, EarnedPremCeded, EarnedPremNet, CaseIncur]

For origin and development axes, arithmetic follows numpy broadcasting <https://numpy.org/doc/1.18/user/theory.broadcasting.html#array-broadcasting-in-numpy>_ rules. If broadcasting fails, arithmetic operations will rely on origin and development vectors to determine whether an operation is legal.

raa = cl.load_sample('raa')
# Allow for arithmetic beyond numpy broadcasting rules
raa[raa.origin<'1985']+raa[raa.origin>='1985']
12 24 36 48 60 72 84 96 108 120
1981 5,012 8,269 10,907 11,805 13,539 16,181 18,009 18,608 18,662 18,834
1982 106 4,285 5,396 10,666 13,782 15,599 15,496 16,169 16,704
1983 3,410 8,992 13,873 16,141 18,735 22,214 22,863 23,466
1984 5,655 11,555 15,766 21,266 23,425 26,083 27,067
1985 1,092 9,565 15,836 22,169 25,955 26,180
1986 1,513 6,445 11,702 12,935 15,852
1987 557 4,020 10,946 12,314
1988 1,351 6,947 13,112
1989 3,133 5,395
1990 2,063
# Numpy broadcasting equivalent fails
raa[raa.origin<'1985'].values+raa[raa.origin>='1985'].values

Arithmetic between two Triangles with different labels will align the axes of each Triangle consistent with arithmetic of a pandas Series. Bypassing index matching can be accomplished with arithmetic between an Triangle and an array.

s1 = cl.load_sample('clrd').iloc[:3]
s2 = s1.sort_index(ascending=False)
s1 + s2 == 2 * s1
True
s1 + s2.values == 2 * s1
False
s3 = s1.iloc[:, ::-1]
s1 + s3 == 2 * s1
True

Virtual Columns#

There are instances where we want to defer calculations, we can create “virtual” columns that defer calculation to when needed. These columns can be created by wrapping a normal column in a function. Lambda expressions work as a tidy representation of virtual columns.

clrd = cl.load_sample('clrd')
# A physical column with immediate evaluation
clrd['PaidLossRatio'] = clrd['CumPaidLoss'] / clrd['EarnedPremDIR']
# A virtual column with deferred evaluation
clrd['PaidLossRatio'] = lambda clrd : clrd['CumPaidLoss'] / clrd['EarnedPremDIR']
# Good - Defer loss ratio calculation until after summing premiums and losses
clrd.sum()['PaidLossRatio']
12 24 36 48 60 72 84 96 108 120
1988 0.2424 0.4783 0.5980 0.6682 0.7097 0.7327 0.7449 0.7515 0.7569 0.7591
1989 0.2517 0.4901 0.6115 0.6829 0.7240 0.7457 0.7576 0.7651 0.7687
1990 0.2548 0.4903 0.6114 0.6806 0.7168 0.7368 0.7488 0.7547
1991 0.2364 0.4558 0.5673 0.6311 0.6658 0.6839 0.6938
1992 0.2423 0.4601 0.5671 0.6266 0.6598 0.6765
1993 0.2463 0.4618 0.5644 0.6227 0.6538
1994 0.2523 0.4602 0.5592 0.6159
1995 0.2478 0.4445 0.5363
1996 0.2459 0.4280
1997 0.2383

Virtual column expressions should only reference other columns in the same triangle. A Triangle without all the underlying columns will fail.

# Eliminating EarnedPremDIR will result in a calculation failure
clrd[['CumPaidLoss', 'PaidLossRatio']].sum()['PaidLossRatio']

When used in tandem with the ‘sparse’ backend, virtual columns can also substantially reduce the memory footprint of your Triangle. This is because the calculation expression is the only thing in memory.

Aggregations#

It is generally good practice to bring your data into chainladder at a ganularity that is comfortably supported by your system RAM. This provides the greatest flexibility in analyzing your data within the chainladder framework. However, not everything needs to be analyzed at the most granular level. Like pandas, you can aggregate multiple triangles within a Triangle by using sum() which can optionally be coupled with groupby().

clrd = cl.load_sample('clrd')
clrd.sum()
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (1, 6, 10, 10)
Index: [GRNAME, LOB]
Columns: [IncurLoss, CumPaidLoss, BulkLoss, EarnedPremDIR, EarnedPremCeded, EarnedPremNet]
clrd.groupby('LOB').sum()
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (6, 6, 10, 10)
Index: [LOB]
Columns: [IncurLoss, CumPaidLoss, BulkLoss, EarnedPremDIR, EarnedPremCeded, EarnedPremNet]

By default, the aggregation will apply to the first axis with a length greater than 1. Alternatively, you can specify the axis using the axis argument of the aggregate method.

Like pandas, the groupby method supports any groupable list. This allows for complex groupings that can be derived dynamically.

clrd = cl.load_sample('clrd')
clrd = clrd[clrd['LOB']=='comauto']
# Identify the largest commercial auto carriers (by premium) for 1997
top_10 = clrd['EarnedPremDIR'].groupby('GRNAME').sum().latest_diagonal.loc[..., '1997', :].to_frame().nlargest(10)
# Group any companies together that are not in the top 10
clrd.groupby(clrd.index['GRNAME'].map(lambda x: x if x in top_10.index else 'Remainder')).sum()
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (11, 6, 10, 10)
Index: [GRNAME]
Columns: [IncurLoss, CumPaidLoss, BulkLoss, EarnedPremDIR, EarnedPremCeded, EarnedPremNet]

Converting to DataFrame#

When a triangle is presented with a single index level and single column, it becomes a 2D object. As such, its display format changes to that similar to a dataframe. These 2D triangles can easily be converted to a pandas dataframe using the to_frame method.

clrd = cl.load_sample('clrd')
clrd[clrd['LOB']=='ppauto']['CumPaidLoss'].sum().to_frame()
12 24 36 48 60 72 84 96 108 120
1988-01-01 3092818.0 5942711.0 7239089.0 7930109.0 8318795.0 8518201.0 8610355.0 8655509.0 8682451.0 8690036.0
1989-01-01 3556683.0 6753435.0 8219551.0 9018288.0 9441842.0 9647917.0 9753014.0 9800477.0 9823747.0 NaN
1990-01-01 4015052.0 7478257.0 9094949.0 9945288.0 10371175.0 10575467.0 10671988.0 10728411.0 NaN NaN
1991-01-01 4065571.0 7564284.0 9161104.0 10006407.0 10419901.0 10612083.0 10713621.0 NaN NaN NaN
1992-01-01 4551591.0 8344021.0 10047179.0 10901995.0 11336777.0 11555121.0 NaN NaN NaN NaN
1993-01-01 5020277.0 9125734.0 10890282.0 11782219.0 12249826.0 NaN NaN NaN NaN NaN
1994-01-01 5569355.0 9871002.0 11641397.0 12600432.0 NaN NaN NaN NaN NaN NaN
1995-01-01 5803124.0 10008734.0 11807279.0 NaN NaN NaN NaN NaN NaN NaN
1996-01-01 5835368.0 9900842.0 NaN NaN NaN NaN NaN NaN NaN NaN
1997-01-01 5754249.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN

From this point the results can be operated on directly in pandas. The to_frame functionality works when a Triangle is sliced down to any two axes and is not limited to just the index and column.

clrd['CumPaidLoss'].groupby('LOB').sum().latest_diagonal.to_frame()
origin 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997
LOB
comauto 626097.0 674441.0 718396.0 711762.0 731033.0 762039.0 768095.0 675166.0 510191.0 272342.0
medmal 217239.0 222707.0 235717.0 275923.0 267007.0 276235.0 252449.0 209222.0 107474.0 20361.0
othliab 317889.0 350684.0 361103.0 426085.0 389250.0 434995.0 402244.0 294332.0 191258.0 54130.0
ppauto 8690036.0 9823747.0 10728411.0 10713621.0 11555121.0 12249826.0 12600432.0 11807279.0 9900842.0 5754249.0
prodliab 110973.0 112614.0 121255.0 100276.0 76059.0 94462.0 111264.0 62018.0 28107.0 10682.0
wkcomp 1241715.0 1308706.0 1394675.0 1414747.0 1328801.0 1187581.0 1114842.0 962081.0 736040.0 340132.0

The entire 4D triangle can be flattened to a DataFrame in long format. This can be handy for moving back and forth between pandas and chainladder.

Because chainladder only supports valuation dates when creating new triangles, it is often helpful converting to a valuation format before moving to pandas.

clrd = cl.load_sample('clrd')
df = clrd.dev_to_val().cum_to_incr().to_frame()
df.head()
origin valuation IncurLoss CumPaidLoss BulkLoss EarnedPremDIR EarnedPremCeded EarnedPremNet
GRNAME LOB
Adriatic Ins Co othliab 1995-01-01 1995-12-31 23:59:59.999999999 8.0 NaN 8.0 139.0 131.0 8.0
othliab 1995-01-01 1996-12-31 23:59:59.999999999 3.0 NaN -4.0 NaN NaN NaN
othliab 1995-01-01 1997-12-31 23:59:59.999999999 -4.0 3.0 NaN NaN NaN NaN
othliab 1996-01-01 1996-12-31 23:59:59.999999999 40.0 NaN 40.0 410.0 359.0 51.0
othliab 1997-01-01 1997-12-31 23:59:59.999999999 67.0 NaN 31.0 458.0 425.0 33.0
cl.Triangle(
    df.reset_index(), index=['GRNAME', 'LOB'], 
    origin='origin', development='valuation',
    columns=['BulkLoss', 'CumPaidLoss', 'EarnedPremCeded', 
             'EarnedPremDIR', 'EarnedPremNet', 'IncurLoss']
).incr_to_cum()
/home/docs/checkouts/readthedocs.org/user_builds/chainladder-python/conda/latest/lib/python3.11/site-packages/chainladder/core/triangle.py:189: UserWarning: 
            The cumulative property of your triangle is not set. This may result in
            undesirable behavior. In a future release this will result in an error.
  warnings.warn(
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (775, 6, 10, 10)
Index: [GRNAME, LOB]
Columns: [BulkLoss, CumPaidLoss, EarnedPremCeded, EarnedPremDIR, EarnedPremNet, IncurLoss]

To enforce long format when moving to pandas the keepdims argument guarantees that all 4 dimensions of the Triangle will be preserved when moving to pandas.

cl.load_sample('raa').to_frame(keepdims=True).head()
origin development values
Total
Total 1981-01-01 12 5012.0
Total 1981-01-01 24 8269.0
Total 1981-01-01 36 10907.0
Total 1981-01-01 48 11805.0
Total 1981-01-01 60 13539.0

Exposing Pandas functionality#

The ability to move from a triangle to a pandas DataFrame opens up the full suite of pandas functionality to you. For the more commonly used functionality, we handle the to_frame() for you. For example, triangle.to_frame().plot() is equivalent to triangle.plot().

cl.load_sample('clrd').groupby('LOB').sum().loc['wkcomp', 'CumPaidLoss'].T.plot(
    marker='.',
    title='CAS Loss Reserve Database: Workers Compensation').set(
    xlabel='Development Period', ylabel='Cumulative Paid Loss');
../_images/7ffb1b70029eb82408fc9258e9db404634e1f7f630cb29e8cace7883eed4e98a.png

Many of the more commonly used pandas methods are passed through in this way allowing for working with triangles as DataFrames.

Aggregations

IO

Shaping

Other

sum

to_clipboard

unstack

plot

mean

to_csv

pivot

rename

median

to_excel

melt

pct_chg

max

to_json

T

round

min

to_html

drop

hvplot

prod

to_dict

dropna

drop_duplicates

var

describe

std

cumsum

quantile

Note

While some of these methods have been rewritten to return a Triangle, Many are pandas methods and will have return values consistent with pandas.

Accessors#

Like pandas .str and .dt accessor functions, you can also perform operations on the origin, development or valuation of a triangle. For example, all of these operations are legal.

raa = cl.load_sample('raa')
x = raa[raa.origin=='1986']
x = raa[(raa.development>=24)&(raa.development<=48)]
x = raa[raa.origin<='1985-JUN']
x = raa[raa.origin>'1987-01-01'][raa.development<=36]
x = raa[raa.valuation<raa.valuation_date]

These accessors apply boolean filtering along the origin or development of the triangle. Because boolean filtering can only work on one axis at a time, you may need to split up your indexing to achieve a desired result.

# Illegal use of boolean filtering of two different axes
x = raa[(raa.origin>'1987-01-01')&(raa.development<=36)]
# Instead, chain the boolean filters together.
x = raa[raa.origin>'1987-01-01'][raa.development<=36]

When using the accessors to filter a triangle, you may be left with empty portions of the triangle that need to be trimmed up. The dropna method will look for any origin periods or development periods that are fully empty at the edges of the triangle and eliminate them for you.

raa = cl.load_sample('raa')
raa[raa.origin>'1987-01-01']
12 24 36 48 60 72 84 96 108 120
1988 1,351 6,947 13,112
1989 3,133 5,395
1990 2,063

There are many more methods available to manipulate triangles. The complete list of methods is available under the Triangle docstrings.