lumin.optimisation package

Submodules

lumin.optimisation.features module

lumin.optimisation.features.get_rf_feat_importance(rf, inputs, targets, weights=None)[source]

Compute feature importance for a Random Forest model using rfpimp.

Parameters
  • rf (ForestRegressor) – trained Random Forest model

  • inputs (DataFrame) – input data as Pandas DataFrame

  • targets (ndarray) – target data as Numpy array

  • weights (Optional[ndarray]) – Optional data weights as Numpy array

Return type

DataFrame
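
Permutation importance, the technique rfpimp is built around, scores a feature by how much the model's performance degrades when that feature's column is shuffled. The following is a minimal plain-Python sketch of the idea only: permutation_importance, toy_model and the toy data are hypothetical names for illustration and are not part of LUMIN or rfpimp, and mean squared error is assumed as the metric.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def permutation_importance(predict: Callable[[Row], float], rows: List[Row],
                           targets: Sequence[float], feats: Sequence[str],
                           seed: int = 0) -> Dict[str, float]:
    """Importance of a feature = increase in mean squared error after shuffling it."""
    rng = random.Random(seed)

    def mse(data: List[Row]) -> float:
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(data)

    baseline = mse(rows)
    importances = {}
    for f in feats:
        # Shuffle one column, leaving all other columns untouched
        shuffled = [r[f] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, f: v} for r, v in zip(rows, shuffled)]
        importances[f] = mse(permuted) - baseline
    return importances


# Hypothetical stand-in for a trained model: it only uses feature 'a'
def toy_model(row: Row) -> float:
    return 2.0 * row["a"]


rows = [{"a": float(i), "b": float(i % 3)} for i in range(20)]
targets = [2.0 * r["a"] for r in rows]
imp = permutation_importance(toy_model, rows, targets, ["a", "b"])
```

Because the toy model ignores 'b', shuffling that column leaves the error unchanged, whereas shuffling 'a' increases it.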

lumin.optimisation.features.rf_rank_features(train_df, val_df, objective, train_feats, targ_name='gen_target', wgt_name=None, importance_cut=0.0, n_estimators=40, n_rfs=1, savename=None, plot_settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Compute the relative permutation importance of input features using Random Forests. A reduced set of ‘important features’ is obtained by cutting on relative importance, and a new model is then trained and evaluated on this reduced set. The RFs have their hyper-parameters roughly optimised, both when training on all features and when training on the important features. Relative importances may be computed multiple times (via n_rfs) and averaged, in which case the standard error is also computed.

Parameters
  • train_df (DataFrame) – training data as Pandas DataFrame

  • val_df (DataFrame) – validation data as Pandas DataFrame

  • objective (str) – string representation of objective: either ‘classification’ or ‘regression’

  • train_feats (List[str]) – complete list of training features

  • targ_name (str) – name of column containing target data

  • wgt_name (Optional[str]) – name of column containing weight data. If set, will use weights for training and evaluation, otherwise will not

  • importance_cut (float) – minimum importance required to be considered an ‘important feature’

  • n_estimators (int) – number of trees to use in each forest

  • n_rfs (int) – number of trainings to perform on all training features in order to compute importances

  • savename (Optional[str]) – Optional name of file to which to save the plot of feature importances

  • plot_settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

List[str]

Returns

List of features passing importance_cut, ordered by importance
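
The averaging and cutting steps described above can be sketched as follows. This is an illustration under stated assumptions rather than LUMIN's implementation: average_importances and select_features are hypothetical helpers, the per-run importances are made-up numbers, and a strict "greater than" comparison against importance_cut is assumed.

```python
import math
from typing import Dict, List, Tuple


def average_importances(runs: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and standard error of each feature's importance over repeated trainings."""
    n = len(runs)
    summary = {}
    for feat in runs[0]:
        vals = [run[feat] for run in runs]
        mean = sum(vals) / n
        if n > 1:
            var = sum((v - mean) ** 2 for v in vals) / (n - 1)  # sample variance
            sem = math.sqrt(var / n)  # standard error of the mean
        else:
            sem = 0.0  # a single training gives no spread estimate
        summary[feat] = (mean, sem)
    return summary


def select_features(summary: Dict[str, Tuple[float, float]],
                    importance_cut: float = 0.0) -> List[str]:
    """Features whose mean importance exceeds the cut, most important first."""
    ranked = sorted(summary, key=lambda f: summary[f][0], reverse=True)
    return [f for f in ranked if summary[f][0] > importance_cut]


# Made-up importances from two trainings (the equivalent of n_rfs=2)
runs = [{"pt": 0.5, "eta": 0.1, "phi": -0.02},
        {"pt": 0.7, "eta": 0.3, "phi": 0.0}]
summary = average_importances(runs)
important = select_features(summary, importance_cut=0.0)
```

Here 'phi' has a negative mean importance and is dropped, leaving 'pt' and 'eta' in descending order of importance.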

lumin.optimisation.hyper_param module

lumin.optimisation.hyper_param.get_opt_rf_params(x_trn, y_trn, x_val, y_val, objective, w_trn=None, w_val=None, params={'max_features': [0.3, 0.5, 0.7, 0.9], 'min_samples_leaf': [1, 3, 5, 10, 25, 50, 100]}, n_estimators=40, verbose=True)[source]

Use an ordered parameter-scan to roughly optimise Random Forest hyper-parameters.

Parameters
  • x_trn (ndarray) – training input data

  • y_trn (ndarray) – training target data

  • x_val (ndarray) – validation input data

  • y_val (ndarray) – validation target data

  • objective (str) – string representation of objective: either ‘classification’ or ‘regression’

  • w_trn (Optional[ndarray]) – training weights

  • w_val (Optional[ndarray]) – validation weights

  • params (OrderedDict) – ordered dictionary mapping parameters to optimise to list of values to consider

  • n_estimators (int) – number of trees to use in each forest

  • verbose (bool) – whether to print extra information and show a live plot of model performance

Returns

params: dictionary mapping parameters to their optimised values
rf: best performing Random Forest
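
An ordered parameter scan optimises one hyper-parameter at a time, in the order given, fixing each at its best value before moving on to the next. The sketch below shows that control flow with a stand-in scoring function; ordered_scan and toy_score are hypothetical names, and in practice the score would come from training a Random Forest and evaluating it on validation data.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict, List, Tuple


def ordered_scan(params: "OrderedDict[str, List[Any]]",
                 score_fn: Callable[[Dict[str, Any]], float]) -> Tuple[Dict[str, Any], float]:
    """Scan parameters in order, keeping any value that improves on the best score so far."""
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    for name, values in params.items():
        for value in values:
            # Earlier parameters stay fixed at their best values;
            # later parameters are left to score_fn's defaults
            trial = {**best_params, name: value}
            score = score_fn(trial)
            if score > best_score:
                best_score = score
                best_params[name] = value
    return best_params, best_score


# Hypothetical stand-in for "train a forest and score it on validation data"
def toy_score(p: Dict[str, Any]) -> float:
    max_features = p.get("max_features", 1.0)
    min_samples_leaf = p.get("min_samples_leaf", 1)
    return -(max_features - 0.5) ** 2 - 0.001 * (min_samples_leaf - 5) ** 2


scan = OrderedDict([("max_features", [0.3, 0.5, 0.7, 0.9]),
                    ("min_samples_leaf", [1, 3, 5, 10, 25, 50, 100])])
best, score = ordered_scan(scan, toy_score)
```

The scan evaluates the sum of the list lengths (11 models here) rather than their product (28 for a full grid), which is why the result is only a rough optimum.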

lumin.optimisation.hyper_param.fold_lr_find(fy, model_builder, bs, train_on_weights=True, shuffle_fold=True, n_folds=-1, lr_bounds=[1e-05, 10], callback_partials=None, plot_settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Wrapper function for training using LRFinder, which runs a Smith LR range test (https://arxiv.org/abs/1803.09820) using folds in FoldYielder. A model is trained for one fold while the LR is interpolated between the set bounds. This is repeated for each fold in FoldYielder, and the loss evolution is averaged.

Parameters
  • fy (FoldYielder) – FoldYielder providing training data

  • model_builder (ModelBuilder) – ModelBuilder providing networks and optimisers

  • bs (int) – batch size

  • train_on_weights (bool) – If weights are present, whether to use them for training

  • shuffle_fold (bool) – whether to shuffle data in folds

  • n_folds (int) – if >= 1, will only train n_folds number of models, otherwise will train one model per fold

  • lr_bounds (Tuple[float, float]) – starting and ending LR values

  • callback_partials (Optional[List[partial]]) – optional list of functools.partial, each of which will instantiate a Callback when called

  • plot_settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

List[LRFinder]

Returns

List of LRFinder which were used for each model trained
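
The two ingredients of the range test, an LR that moves between the bounds during one fold and a loss curve averaged over folds, can be sketched as below. This is not LUMIN's LRFinder: lr_schedule and average_losses are hypothetical helpers, geometric (log-spaced) interpolation is an assumption, and the per-fold losses are made-up numbers.

```python
from typing import List, Sequence, Tuple


def lr_schedule(lr_bounds: Tuple[float, float], n_steps: int) -> List[float]:
    """Geometrically interpolate the LR from lr_bounds[0] to lr_bounds[1] (assumed spacing)."""
    lo, hi = lr_bounds
    if n_steps == 1:
        return [lo]
    ratio = hi / lo
    return [lo * ratio ** (i / (n_steps - 1)) for i in range(n_steps)]


def average_losses(per_fold_losses: Sequence[Sequence[float]]) -> List[float]:
    """Average the loss at each step over folds, truncating to the shortest curve."""
    n_steps = min(len(losses) for losses in per_fold_losses)
    n_folds = len(per_fold_losses)
    return [sum(losses[i] for losses in per_fold_losses) / n_folds for i in range(n_steps)]


# One LR per minibatch: seven steps spanning the default bounds [1e-05, 10]
lrs = lr_schedule((1e-5, 10), 7)

# Made-up loss curves from two folds
mean_loss = average_losses([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
```

Plotting the averaged loss against the LR values then shows the range over which the loss falls before it diverges.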

lumin.optimisation.threshold module

lumin.optimisation.threshold.binary_class_cut(df, top_perc=5.0, min_pred=0.9, wgt_factor=1.0, br=0.0, syst_unc_b=0.0, pred_name='pred', targ_name='gen_target', wgt_name='gen_weight', plot_settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Attention

Deprecated, as it has been renamed to binary_class_cut_by_ams(). Will be removed in v0.4.

Return type

Tuple[float, float, float]

lumin.optimisation.threshold.binary_class_cut_by_ams(df, top_perc=5.0, min_pred=0.9, wgt_factor=1.0, br=0.0, syst_unc_b=0.0, pred_name='pred', targ_name='gen_target', wgt_name='gen_weight', plot_settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Optimise a cut on a signal-background classifier prediction by the Approximate Median Significance (AMS). Rather than using the single point of maximum AMS, the cut is taken as the mean class prediction of the top top_perc percent of points as ranked by AMS, which should generalise better.

Parameters
  • df (DataFrame) – Pandas DataFrame containing data

  • top_perc (float) – top percentage of events to consider as ranked by AMS

  • min_pred (float) – minimum prediction to consider

  • wgt_factor (float) – single multiplicative coefficient for rescaling signal and background weights before computing AMS

  • br (float) – background offset bias

  • syst_unc_b (float) – fractional systematic uncertainty on background

  • pred_name (str) – column to use as predictions

  • targ_name (str) – column to use as truth labels for signal and background

  • wgt_name (str) – column to use as weights for signal and background events

  • plot_settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

Tuple[float, float, float]

Returns

Optimised cut, AMS at cut, maximum AMS
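
The procedure can be illustrated with a small plain-Python sketch. It rests on several assumptions and is not LUMIN's implementation: ams and cut_by_ams are hypothetical helpers, the AMS uses the common form without systematic uncertainty (so syst_unc_b is omitted), only points with predictions at or above min_pred are ranked, and the data are made up.

```python
import math
from typing import Sequence, Tuple


def ams(s: float, b: float, br: float = 0.0) -> float:
    """AMS without systematic uncertainty; returns -1 where it is undefined."""
    if b + br <= 0:
        return -1.0
    radicand = 2 * ((s + b + br) * math.log(1 + s / (b + br)) - s)
    return math.sqrt(radicand) if radicand > 0 else -1.0


def cut_by_ams(preds: Sequence[float], targets: Sequence[int], weights: Sequence[float],
               top_perc: float = 5.0, min_pred: float = 0.9,
               wgt_factor: float = 1.0, br: float = 0.0) -> Tuple[float, float, float]:
    """Return (cut, AMS at cut, maximum AMS) for a binary classifier."""
    events = list(zip(preds, targets, weights))

    def ams_at(cut: float) -> float:
        # Weighted signal and background yields passing the cut
        s = wgt_factor * sum(w for p, t, w in events if p >= cut and t == 1)
        b = wgt_factor * sum(w for p, t, w in events if p >= cut and t == 0)
        return ams(s, b, br)

    # Treat each prediction as a candidate cut and rank candidates by AMS
    scored = sorted(((ams_at(p), p) for p, _, _ in events if p >= min_pred), reverse=True)
    if not scored:
        raise ValueError("no predictions at or above min_pred")
    n_top = max(1, int(len(scored) * top_perc / 100))
    cut = sum(p for _, p in scored[:n_top]) / n_top  # mean prediction of the top points
    return cut, ams_at(cut), scored[0][0]


# Made-up data: eight unit-weight events
preds = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
targets = [0, 0, 0, 1, 0, 1, 1, 1]
weights = [1.0] * 8
cut, ams_at_cut, max_ams = cut_by_ams(preds, targets, weights, top_perc=25, min_pred=0.0)
```

With these numbers the two highest-ranked candidate cuts are 0.4 and 0.3, so the returned cut is their mean, 0.35.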

Module contents
