PatsyFormula

class chainladder.PatsyFormula(formula=None)

A sklearn-style Transformer for patsy formulas.

PatsyFormula allows for R-style formula preprocessing of the design_matrix of a machine learning algorithm. It’s particularly useful with the DevelopmentML and TweedieGLM estimators.

Parameters:
formula: str

A patsy (R-style) formula string describing the X features of the regression model, for example "C(development) + C(origin)".

Attributes:
design_info_:

The patsy instructions for generating the design_matrix, X.

Examples

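If a development-only Poisson GLM produces residuals that vary systematically by accident year, adding C(origin) to the formula introduces origin-level effects and reduces that structure. The expanded model matrix has more columns, which PatsyFormula builds from the same R-style string. In the output below the coefficient count rises from 10 to 19: an intercept plus nine non-reference levels for each categorical term, since patsy's default treatment coding drops one reference level per term when an intercept is present.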

import chainladder as cl

genins = cl.load_sample("genins")
by_dev = cl.TweedieGLM(design_matrix="C(development)").fit(genins)
by_both = cl.TweedieGLM(
    design_matrix="C(development) + C(origin)"
).fit(genins)
print(len(by_dev.coef_))
print(len(by_both.coef_))
print(by_dev.ldf_.values[0, 0, 0, :].round(4))
print(by_both.ldf_.values[0, 0, 0, :].round(4))
10
19
[3.5085 1.7436 1.4379 1.1656 1.0991 1.0832 1.0511 1.0693 1.0135]
[3.491  1.7474 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177]
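The coefficient counts printed above (10 and 19) can be reproduced by counting columns by hand. The following is a minimal standard-library sketch of treatment coding with an intercept, not patsy's implementation. The helper names `treatment_columns` and `design_columns` are hypothetical, and the level labels are assumed for illustration; only the number of levels (ten per term) matters for the count.

```python
def treatment_columns(term, levels):
    """Column names for one categorical term under treatment coding.

    The first level acts as the reference and gets no column of its own.
    """
    return [f"C({term})[T.{level}]" for level in levels[1:]]


def design_columns(terms):
    """Intercept plus the non-reference columns of every categorical term."""
    columns = ["Intercept"]
    for term, levels in terms.items():
        columns.extend(treatment_columns(term, levels))
    return columns


# Assumed labels: ten development periods and ten origin years.
development = list(range(12, 132, 12))  # 12, 24, ..., 120
origin = list(range(2001, 2011))        # 2001, ..., 2010

dev_only = design_columns({"development": development})
dev_and_origin = design_columns(
    {"development": development, "origin": origin}
)

print(len(dev_only))        # 1 intercept + 9 development levels
print(len(dev_and_origin))  # 1 intercept + 9 development + 9 origin levels
```

Each added categorical term therefore contributes one column fewer than it has levels, which is why adding C(origin) adds nine coefficients rather than ten.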

When TweedieGLM is not flexible enough (for example, when you need a non-Tweedie model or a continuous origin term), build a custom DevelopmentML pipeline and use PatsyFormula as the preprocessing step with the same formula syntax.

import chainladder as cl
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from chainladder.utils.utility_functions import PatsyFormula

genins = cl.load_sample("genins")
col = genins.columns[0]
dev_only = cl.DevelopmentML(
    Pipeline(
        [
            ("design_matrix", PatsyFormula("C(development)")),
            ("model", LinearRegression(fit_intercept=False)),
        ]
    ),
    y_ml=col,
    fit_incrementals=False,
).fit(genins)
print(dev_only.ldf_.values[0, 0, 0, :].round(4))
[3.515  1.735  1.3993 1.152  1.0988 1.0926 1.0332 1.0245 0.8507]

fit(X, y=None, sample_weight=None)
transform(X)

Inherited Methods

PatsyFormula.fit_transform

Fit to data, then transform it.

PatsyFormula.get_metadata_routing

Get metadata routing of this object.

PatsyFormula.get_params

Get parameters for this estimator.

PatsyFormula.set_output

Set output container.

PatsyFormula.set_params

Set the parameters of this estimator.