GridSearch


class chainladder.GridSearch(estimator, param_grid, scoring, verbose=0, error_score='raise', n_jobs=None)

Exhaustive search over specified parameter values for an estimator. The important member is fit: for every parameter combination spanned by param_grid, GridSearch fits a separate copy of the estimator on X, evaluates each scoring callable on the fitted model, and records one row per combination in results_. Unlike scikit-learn's GridSearchCV, no cross-validation is performed and no predict, score, or transform methods are exposed; the search only tabulates the scoring values for each candidate.
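The fit loop can be sketched in plain Python. This is a minimal illustration, not chainladder's implementation: ToyEstimator, its scale parameter, and the grid_search helper are all hypothetical, copy.deepcopy stands in for scikit-learn's clone, and only a single param_grid dict is handled.

```python
import copy
import itertools


class ToyEstimator:
    """Hypothetical stand-in for a chainladder estimator (not a real class)."""

    def __init__(self, scale=1):
        self.scale = scale

    def set_params(self, **params):
        # Mimic the estimator interface: set attributes, return self.
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self, X):
        # "Fitting" just stores a value that depends on the parameter.
        self.total_ = self.scale * sum(X)
        return self


def grid_search(estimator, param_grid, scoring, X):
    """Fit one copy of estimator per parameter combination and score it."""
    keys = list(param_grid)
    rows = []
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        # Deep-copy so the caller's estimator is never mutated
        # (scikit-learn-style code would use sklearn.base.clone instead).
        model = copy.deepcopy(estimator).set_params(**params).fit(X)
        # Each scoring callable receives the fitted model.
        scores = {name: func(model) for name, func in scoring.items()}
        # Parameter columns first, score columns last.
        rows.append({**params, **scores})
    return rows


base = ToyEstimator()
rows = grid_search(base, {"scale": [1, 2]}, {"total": lambda m: m.total_}, [1.0, 2.0, 6.0])
for row in rows:
    print(row)
```

Because every candidate is fitted on its own copy, the estimator passed in keeps its original parameters after the search.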

Parameters:
estimator: estimator object.

This is assumed to implement the chainladder estimator interface.

param_grid: dict or list of dictionaries

Dictionary with parameter names (string) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.
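How a dict versus a list of dicts expands into candidates can be sketched as follows. The expand_grid helper is hypothetical and written in the spirit of scikit-learn's ParameterGrid (keys sorted, each dict's Cartesian product, a list treated as the union of its grids); it is not claimed to be the code chainladder runs. Keys such as dev__average address a Pipeline step's parameter, as in the Examples section.

```python
import itertools


def expand_grid(param_grid):
    """List every parameter combination spanned by a param_grid.

    Accepts a single dict or a list of dicts; a list yields the union of
    the grids spanned by each dict.
    """
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    combos = []
    for grid in grids:
        keys = sorted(grid)  # fixed key order keeps the output deterministic
        for values in itertools.product(*(grid[k] for k in keys)):
            combos.append(dict(zip(keys, values)))
    return combos


grid = [
    {"dev__average": ["simple", "volume"]},
    {"dev__average": ["volume"], "dev__n_periods": [3, 5]},
]
for combo in expand_grid(grid):
    print(combo)
```

The first dict contributes two candidates and the second contributes two more, so this grid produces four fits rather than the six a single combined dict would.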

scoring: callable or dict of callable(s)

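Should be of the form {‘name’: callable}. Each callable receives the fitted estimator as its only argument and should return a single value, which is stored in the results_ column called ‘name’.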
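The scoring contract can be illustrated with a small sketch. Both helpers are hypothetical, and the column name used for a bare (non-dict) callable, "score" here, is an assumption for illustration only; a types.SimpleNamespace plays the role of a fitted model exposing an ldf_-like attribute with made-up values.

```python
from types import SimpleNamespace


def normalize_scoring(scoring, default_name="score"):
    """Return scoring as a {name: callable} dict.

    default_name is a hypothetical column name used for a bare callable.
    """
    if callable(scoring):
        return {default_name: scoring}
    if isinstance(scoring, dict) and all(callable(f) for f in scoring.values()):
        return dict(scoring)
    raise TypeError("scoring must be a callable or a dict of callables")


def score_model(model, scoring):
    """Apply every scoring callable to one fitted model."""
    return {name: func(model) for name, func in normalize_scoring(scoring).items()}


# A fake "fitted model" exposing an ldf_-like attribute (hypothetical values).
model = SimpleNamespace(ldf_=[1.5, 1.2, 1.05])
print(score_model(model, {"first_ldf": lambda m: m.ldf_[0], "n_ages": lambda m: len(m.ldf_)}))
print(score_model(model, lambda m: m.ldf_[-1]))
```

A "single value" only means one object per candidate; it may itself be a list, as with the ldf_by_age scorer in the Examples section.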

verbose: integer

Controls the verbosity: the higher, the more messages.

error_score: ‘raise’ or numeric

Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, a FitFailedWarning is raised. Default is ‘raise’.
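The documented error_score behaviour can be sketched as a standalone helper. This is an assumption-laden illustration, not chainladder's code: fit_and_score is hypothetical, and FitFailedWarning is defined locally here (scikit-learn ships its own in sklearn.exceptions).

```python
import warnings


class FitFailedWarning(RuntimeWarning):
    """Local stand-in; scikit-learn defines its own in sklearn.exceptions."""


def fit_and_score(fit, score, error_score="raise"):
    """Call fit(), then score(model); handle fit errors per error_score."""
    try:
        model = fit()
    except Exception as exc:
        if error_score == "raise":
            raise  # re-raise the original fitting error unchanged
        if not isinstance(error_score, (int, float)):
            raise ValueError("error_score must be 'raise' or numeric") from exc
        # Numeric error_score: warn and use it as this candidate's score.
        warnings.warn(f"Fit failed, score set to {error_score}: {exc}", FitFailedWarning)
        return error_score
    return score(model)


def bad_fit():
    raise ValueError("not enough origin periods")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = fit_and_score(bad_fit, lambda m: m, error_score=float("nan"))
print(result, caught[0].category.__name__)
```

With the default ‘raise’, the same failing fit would propagate its ValueError instead of producing a score.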

n_jobs: int, default=None

The number of jobs to run in parallel when fitting the candidate parameter combinations. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
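The n_jobs convention (None meaning 1, -1 meaning all processors, and, following joblib's rule, more negative values meaning all but a few) can be sketched with the standard library. resolve_n_jobs and run_candidates are hypothetical helpers; the sketch ignores any joblib.parallel_backend context and uses threads only to stay stdlib-only and accept lambdas, whereas joblib's default backend uses separate processes, which suits CPU-bound fits better.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_n_jobs(n_jobs):
    """Translate an n_jobs value into a worker count (joblib-style rule)."""
    cpus = os.cpu_count() or 1
    if n_jobs is None:
        return 1  # ignores any joblib.parallel_backend context
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, and so on (never below 1).
        return max(cpus + 1 + n_jobs, 1)
    return n_jobs


def run_candidates(func, candidates, n_jobs=None):
    """Evaluate func on each candidate, in parallel when more than one worker.

    Results are returned in the same order as candidates.
    """
    workers = resolve_n_jobs(n_jobs)
    if workers == 1:
        return [func(c) for c in candidates]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, candidates))


print(run_candidates(lambda p: p["scale"] * 9.0, [{"scale": 1}, {"scale": 2}], n_jobs=2))
```

Since Executor.map preserves input order, the rows line up with the parameter combinations however the work is scheduled.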

Attributes:
results_: DataFrame

A DataFrame with one row per parameter combination, one column for each param_grid key, and one column per scoring callable (named by its key) after the parameter columns.

Examples

Suppose an actuary reserving the industry medical malpractice line wants to see how the choice between simple and volume-weighted averaging affects the fitted development pattern. GridSearch fits one pipeline per candidate in param_grid, and the scoring callables can record any fitted attribute, such as the full vector of ldf_ by development age.

import chainladder as cl

clrd = cl.load_sample("clrd")
medmal = clrd.groupby("LOB").sum().loc["medmal"]["CumPaidLoss"]
pipe = cl.Pipeline(
    [("dev", cl.Development()), ("cl", cl.Chainladder())]
)
param_grid = {"dev__average": ["simple", "volume"]}

def ldf_by_age(model):
    ldf = model.named_steps.dev.ldf_
    return ldf.values[0, 0, 0, :].round(3).tolist()

grid = cl.GridSearch(
    pipe, param_grid, scoring={"ldf": ldf_by_age}, n_jobs=1
).fit(medmal)
for _, row in grid.results_.iterrows():
    print(row["dev__average"], row["ldf"])
simple [6.076, 1.976, 1.384, 1.2, 1.102, 1.068, 1.039, 1.029, 1.018]
volume [5.856, 1.963, 1.376, 1.199, 1.099, 1.067, 1.039, 1.028, 1.018]

Because both rows are age-to-age loss development factors, they are directly comparable age by age. Simple averaging gives every origin year an equal vote, while volume weighting lets the origin years with the most losses dominate; for this triangle the distinction matters most at the immature 12-24 factor (6.076 versus 5.856). The two patterns converge at later ages, so the averaging choice mainly moves the reserve carried for the most recent origin years.
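The arithmetic behind the two averages can be worked through on a tiny triangle. This is a sketch using a hypothetical three-origin triangle, not the medmal data, and age_to_age_factors is an illustrative helper; it does not reproduce everything cl.Development supports (for example n_periods or dropping link ratios). Volume weighting is computed as the ratio of column sums, which equals weighting each link ratio by the age-j losses.

```python
def age_to_age_factors(triangle, average="volume"):
    """Age-to-age factors from a cumulative triangle stored as ragged rows.

    Row i holds one origin period's cumulative losses at successive ages;
    younger origins have shorter rows.
    """
    n_ages = max(len(row) for row in triangle)
    factors = []
    for j in range(n_ages - 1):
        # Only origins observed at both age j and age j + 1 contribute.
        pairs = [(row[j], row[j + 1]) for row in triangle if len(row) > j + 1]
        if average == "simple":
            # Each origin's link ratio gets an equal vote.
            f = sum(b / a for a, b in pairs) / len(pairs)
        elif average == "volume":
            # Equivalent to weighting each link ratio by the age-j losses.
            f = sum(b for _, b in pairs) / sum(a for a, _ in pairs)
        else:
            raise ValueError("average must be 'simple' or 'volume'")
        factors.append(f)
    return factors


# Hypothetical triangle: the large second origin has the lowest link ratio.
triangle = [
    [100.0, 300.0, 330.0],
    [400.0, 800.0],
    [50.0],
]
print(age_to_age_factors(triangle, "simple"))
print(age_to_age_factors(triangle, "volume"))
```

The first factor is 2.5 under simple averaging (the mean of 3.0 and 2.0) but 2.2 under volume weighting (1100 / 500), because the 400-loss origin dominates; the last factor rests on a single origin and is 1.1 either way, mirroring how the medmal candidates converge at mature ages.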

fit(X, y=None, **fit_params)

Fit the model with X.

Parameters:
X: Triangle-like

The triangle to search over; it is passed to the fit method of each candidate estimator.

y: Ignored
fit_params: (optional) dict of string -> object

Parameters passed to the fit method of the estimator for every candidate.

Returns:
self: object

Returns the instance itself.

Inherited Methods

GridSearch.get_metadata_routing

Get metadata routing of this object.

GridSearch.get_params

Get parameters for this estimator.

GridSearch.set_params

Set the parameters of this estimator.