lumin.nn.metrics package

Submodules

lumin.nn.metrics.class_eval module

class lumin.nn.metrics.class_eval.AMS(n_total, wgt_name, targ_name='targets', br=0, syst_unc_b=0, use_quick_scan=True)[source]

Bases: lumin.nn.metrics.eval_metric.EvalMetric

Class to compute maximum Approximate Median Significance (https://arxiv.org/abs/1007.1727) using a classifier which directly predicts the class of data in a binary classification problem. AMS is computed on a single fold of data provided by a FoldYielder, and the data are automatically reweighted by event multiplicity to account for missing weights.

Parameters
  • n_total (int) – total number of events in entire data set

  • wgt_name (str) – name of weight group in fold file to use. N.B. if you have reweighted to balance classes, be sure to use the un-reweighted weights.

  • targ_name (str) – name of target group in fold file

  • br (float) – constant bias offset for background yield

  • syst_unc_b (float) – fractional systematic uncertainty on background yield

  • use_quick_scan (bool) – whether to optimise AMS using the ams_scan_quick() method (fast, but suffers from limited floating-point precision); if False, ams_scan_slow() is used instead (slower but more accurate)
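For orientation, the basic AMS expression can be written out directly. The sketch below is not LUMIN's implementation: the function name `approx_median_significance` is hypothetical, `br` is assumed to enter as a constant added to the background yield (following the parameter description above), and the `syst_unc_b` term is left out.

```python
import math

def approx_median_significance(s: float, b: float, br: float = 0.0) -> float:
    """AMS for weighted signal yield s and background yield b (no systematic term)."""
    b_eff = b + br  # assumed: br acts as a constant offset on the background yield
    if b_eff <= 0:
        raise ValueError("background yield plus offset must be positive")
    radicand = 2.0 * ((s + b_eff) * math.log(1.0 + s / b_eff) - s)
    return math.sqrt(max(radicand, 0.0))

# For s << b the value approaches s / sqrt(b)
print(approx_median_significance(s=10.0, b=100.0))
# A positive offset inflates the background and lowers the significance
print(approx_median_significance(s=10.0, b=100.0, br=10.0))
```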

Examples::
>>> ams_metric = AMS(n_total=250000, br=10, wgt_name='gen_orig_weight')
>>>
>>> ams_metric = AMS(n_total=250000, syst_unc_b=0.1,
...                  wgt_name='gen_orig_weight', use_quick_scan=False)

evaluate(fy, idx, y_pred)[source]

Compute maximum AMS on fold using provided predictions.

Parameters
  • fy (FoldYielder) – FoldYielder interfacing to data

  • idx (int) – fold index corresponding to fold for which y_pred was computed

  • y_pred (ndarray) – predictions for fold

Return type

float

Returns

Maximum AMS computed on reweighted data from fold
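The idea of a maximum AMS on reweighted fold data can be illustrated with a small standalone scan. This is a sketch under stated assumptions, not the ams_scan_quick() or ams_scan_slow() code: the function names are hypothetical, the fold weights are assumed to be scaled by n_total divided by the fold size, and tied prediction values are not treated specially.

```python
import math

def ams(s, b, br=0.0):
    """Basic AMS without a systematic uncertainty term."""
    radicand = 2.0 * ((s + b + br) * math.log(1.0 + s / (b + br)) - s)
    return math.sqrt(radicand) if radicand > 0 else 0.0

def max_ams_on_fold(preds, targets, weights, n_total, br=0.0):
    """Return (maximum AMS, cut), scanning cuts from the highest prediction down."""
    scale = n_total / len(preds)  # assumed reweighting: fold yields scaled up to the full data set
    events = sorted(zip(preds, targets, weights), reverse=True)
    s = b = 0.0
    best, best_cut = 0.0, None
    for pred, targ, wgt in events:
        # Accept every event whose prediction is at least `pred`
        if targ == 1:
            s += wgt * scale
        else:
            b += wgt * scale
        if b + br > 0:  # AMS is undefined while no background has been accepted
            value = ams(s, b, br)
            if value > best:
                best, best_cut = value, pred
    return best, best_cut

best, cut = max_ams_on_fold(preds=[0.9, 0.8, 0.7, 0.4, 0.2, 0.1],
                            targets=[1, 1, 0, 1, 0, 0],
                            weights=[0.5, 0.5, 2.0, 0.5, 3.0, 3.0],
                            n_total=12)
print(best, cut)
```

Sorting once and accumulating yields keeps the scan at a single pass over the fold, at the cost of recomputing the AMS at every candidate cut.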

Examples::
>>> ams = ams_metric.evaluate(train_fy, val_id, val_preds)

class lumin.nn.metrics.class_eval.MultiAMS(n_total, wgt_name, targ_name, zero_preds, one_preds, br=0, syst_unc_b=0, use_quick_scan=True)[source]

Bases: lumin.nn.metrics.class_eval.AMS

Class to compute maximum Approximate Median Significance (https://arxiv.org/abs/1007.1727) using a classifier which predicts the class of data in a multiclass classification problem which can be reduced to a binary classification problem. AMS is computed on a single fold of data provided by a FoldYielder, and the data are automatically reweighted by event multiplicity to account for missing weights.

Parameters
  • n_total (int) – total number of events in entire data set

  • wgt_name (str) – name of weight group in fold file to use. N.B. if you have reweighted to balance classes, be sure to use the un-reweighted weights.

  • targ_name (str) – name of target group in fold file which indicates whether the event is signal or background

  • zero_preds (List[str]) – list of predicted classes which correspond to class 0 in the form pred_[i], where i is a NN output index

  • one_preds (List[str]) – list of predicted classes which correspond to class 1 in the form pred_[i], where i is a NN output index

  • br (float) – constant bias offset for background yield

  • syst_unc_b (float) – fractional systematic uncertainty on background yield

  • use_quick_scan (bool) – whether to optimise AMS using the ams_scan_quick() method (fast, but suffers from limited floating-point precision); if False, ams_scan_slow() is used instead (slower but more accurate)
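How zero_preds and one_preds are combined into a single binary score is not spelled out in the parameter descriptions. One plausible reduction, shown here purely as an assumption and not as LUMIN's method, is to take the share of the total score carried by the class-1 outputs. The helper `binary_score` is hypothetical and works on a plain dict standing in for one row of predictions.

```python
def binary_score(row, zero_preds, one_preds):
    """Collapse multiclass outputs into one signal-like score in [0, 1]."""
    one = sum(row[name] for name in one_preds)
    zero = sum(row[name] for name in zero_preds)
    total = one + zero
    # Guard against a row where every listed output is zero
    return one / total if total > 0 else 0.0

row = {'pred_0': 0.1, 'pred_1': 0.2, 'pred_2': 0.1, 'pred_3': 0.6}
score = binary_score(row,
                     zero_preds=['pred_0', 'pred_1', 'pred_2'],
                     one_preds=['pred_3'])
print(score)
```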

Examples::
>>> ams_metric = MultiAMS(n_total=250000, br=10, targ_name='gen_target',
...                       wgt_name='gen_orig_weight',
...                       zero_preds=['pred_0', 'pred_1', 'pred_2'],
...                       one_preds=['pred_3'])
>>>
>>> ams_metric = MultiAMS(n_total=250000, syst_unc_b=0.1,
...                       targ_name='gen_target',
...                       wgt_name='gen_orig_weight',
...                       use_quick_scan=False,
...                       zero_preds=['pred_0', 'pred_1', 'pred_2'],
...                       one_preds=['pred_3'])

evaluate(fy, idx, y_pred)[source]

Compute maximum AMS on fold using provided predictions.

Parameters
  • fy (FoldYielder) – FoldYielder interfacing to data

  • idx (int) – fold index corresponding to fold for which y_pred was computed

  • y_pred (ndarray) – predictions for fold

Return type

float

Returns

Maximum AMS computed on reweighted data from fold

Examples::
>>> ams = ams_metric.evaluate(train_fy, val_id, val_preds)

lumin.nn.metrics.eval_metric module

class lumin.nn.metrics.eval_metric.EvalMetric(targ_name, wgt_name=None)[source]

Bases: abc.ABC

Abstract class for evaluating performance of a model using some metric

Parameters
  • targ_name (str) – name of group in fold file containing regression targets

  • wgt_name (Optional[str]) – name of group in fold file containing datapoint weights

abstract evaluate(fy, idx, y_pred)[source]

Evaluate the required metric for a given fold and set of predictions

Parameters
  • fy (FoldYielder) – FoldYielder interfacing to data

  • idx (int) – fold index corresponding to fold for which y_pred was computed

  • y_pred (ndarray) – predictions for fold

Return type

float

Returns

metric value

get_df(fy, idx, y_pred)[source]

Returns a DataFrame for the given fold containing targets, weights, and predictions

Parameters
  • fy (FoldYielder) – FoldYielder interfacing to data

  • idx (int) – fold index corresponding to fold for which y_pred was computed

  • y_pred (ndarray) – predictions for fold

Return type

DataFrame

Returns

DataFrame for the given fold containing targets, weights, and predictions
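The pattern of an abstract evaluate() plus a shared data-gathering helper can be sketched without LUMIN installed. Everything below is a hypothetical stand-in: `MetricSketch` only mirrors the documented interface, `get_rows` plays the role of get_df but returns tuples rather than a DataFrame, and `fy` is a plain list of dicts rather than a FoldYielder.

```python
from abc import ABC, abstractmethod

class MetricSketch(ABC):
    """Hypothetical stand-in mirroring the documented EvalMetric interface."""

    def __init__(self, targ_name, wgt_name=None):
        self.targ_name, self.wgt_name = targ_name, wgt_name

    def get_rows(self, fy, idx, y_pred):
        # Stand-in for get_df: `fy` is a list of folds, each a dict of lists
        fold = fy[idx]
        wgts = fold[self.wgt_name] if self.wgt_name else [1.0] * len(y_pred)
        return list(zip(fold[self.targ_name], wgts, y_pred))

    @abstractmethod
    def evaluate(self, fy, idx, y_pred):
        """Return the metric value for fold `idx`."""

class WeightedAccuracy(MetricSketch):
    """Example concrete metric: weighted fraction of correct binary predictions."""

    def evaluate(self, fy, idx, y_pred):
        rows = self.get_rows(fy, idx, y_pred)
        correct = sum(w for t, w, p in rows if (p >= 0.5) == bool(t))
        return correct / sum(w for _, w, _ in rows)

folds = [{'targets': [1, 0, 1, 0], 'weights': [1.0, 1.0, 2.0, 4.0]}]
metric = WeightedAccuracy(targ_name='targets', wgt_name='weights')
accuracy = metric.evaluate(folds, 0, [0.9, 0.2, 0.4, 0.7])
print(accuracy)
```

Keeping the data gathering in the base class means a concrete metric only has to implement evaluate().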

lumin.nn.metrics.reg_eval module

class lumin.nn.metrics.reg_eval.RegPull(return_mean, use_bootstrap=False, use_weights=True, use_pull=True, targ_name='targets', wgt_name=None)[source]

Bases: lumin.nn.metrics.eval_metric.EvalMetric

Compute mean or standard deviation of delta or pull of some feature which is being directly regressed to. Optionally, use bootstrap resampling on validation data.

Parameters
  • return_mean (bool) – whether to return the mean or the standard deviation

  • use_bootstrap (bool) – whether to bootstrap resample the validation fold when computing the statistic

  • use_weights (bool) – whether to actually use weights if wgt_name is set

  • use_pull (bool) – whether to return the pull (differences / targets) or delta (differences)

  • targ_name (str) – name of group in fold file containing regression targets

  • wgt_name (Optional[str]) – name of group in fold file containing datapoint weights
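The distinction between delta and pull, and the effect of weights on the returned statistic, can be made concrete with a small standalone calculation. This is a sketch, not the RegPull code: the helper names are hypothetical, the difference is assumed to be prediction minus target, and the standard deviation is the weighted population form.

```python
import math

def pull_or_delta(preds, targets, use_pull=True):
    """Per-event delta (pred - target) or pull (delta / target)."""
    deltas = [p - t for p, t in zip(preds, targets)]
    return [d / t for d, t in zip(deltas, targets)] if use_pull else deltas

def weighted_mean_std(values, weights=None):
    """Weighted mean and (population) standard deviation."""
    weights = weights if weights is not None else [1.0] * len(values)
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, math.sqrt(var)

preds, targets = [110.0, 90.0, 130.0], [100.0, 100.0, 125.0]
pulls = pull_or_delta(preds, targets, use_pull=True)
mean, std = weighted_mean_std(pulls, weights=[1.0, 1.0, 2.0])
print(mean, std)
```

The pull is dimensionless, which makes it easier to compare across targets of different scale, whereas the delta keeps the units of the regressed feature.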

Examples::
>>> mean_pull = RegPull(return_mean=True, use_bootstrap=True,
...                     use_pull=True)
>>>
>>> std_delta = RegPull(return_mean=False, use_bootstrap=True,
...                     use_pull=False)
>>>
>>> mean_pull = RegPull(return_mean=True, use_bootstrap=False,
...                     use_pull=True, wgt_name='weights')

evaluate(fy, idx, y_pred)[source]

Compute statistic on fold using provided predictions.

Parameters
  • fy (FoldYielder) – FoldYielder interfacing to data

  • idx (int) – fold index corresponding to fold for which y_pred was computed

  • y_pred (ndarray) – predictions for fold

Return type

float

Returns

Statistic set in initialisation computed on the chosen fold
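Bootstrap resampling of a fold can be sketched with the standard library alone. The function below is hypothetical and makes two assumptions that the text here does not confirm: resamples have the same size as the original fold, and the final value is the mean of the per-resample estimates.

```python
import random
import statistics

def bootstrap_statistic(values, stat=statistics.mean, n_resamples=200, seed=0):
    """Mean of `stat` over resamples of `values` drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    estimates = [stat(rng.choices(values, k=n)) for _ in range(n_resamples)]
    return statistics.mean(estimates)

pulls = [0.10, -0.10, 0.04, 0.00, -0.02]
boot_mean = bootstrap_statistic(pulls, stat=statistics.mean)
boot_std = bootstrap_statistic(pulls, stat=statistics.pstdev)
print(boot_mean, boot_std)
```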

Examples::
>>> mean = mean_pull.evaluate(train_fy, val_id, val_preds)

class lumin.nn.metrics.reg_eval.RegAsProxyPull(proxy_func, return_mean, use_bootstrap=False, use_weights=True, use_pull=True, targ_name='targets', wgt_name=None)[source]

Bases: lumin.nn.metrics.reg_eval.RegPull

Compute mean or standard deviation of delta or pull of some feature which is being indirectly regressed to via a proxy function. Optionally, use bootstrap resampling on validation data.

Parameters
  • proxy_func (Callable[[DataFrame], None]) – function which acts on the regression predictions: it is passed a Pandas DataFrame containing the prediction columns pred_{i} and must add pred and gen_target columns to that DataFrame

  • return_mean (bool) – whether to return the mean or the standard deviation

  • use_bootstrap (bool) – whether to bootstrap resample the validation fold when computing the statistic

  • use_weights (bool) – whether to actually use weights if wgt_name is set

  • use_pull (bool) – whether to return the pull (differences / targets) or delta (differences)

  • targ_name (str) – name of group in fold file containing regression targets

  • wgt_name (Optional[str]) – name of group in fold file containing datapoint weights
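A proxy function only needs to add the pred and gen_target columns to the object it receives. The sketch below assumes a hypothetical setup in which pred_0 and pred_1 are regressed momentum components and the quantity of interest is their magnitude; the columns gen_px and gen_py are invented for illustration and may not exist in a real fold. A plain dict of lists stands in for the DataFrame so that the example runs without pandas.

```python
import math

def pt_proxy_func(df):
    """Add 'pred' and 'gen_target' columns in place, as described for proxy_func."""
    # Assumed: pred_0 and pred_1 are the regressed px and py components
    df['pred'] = [math.hypot(px, py)
                  for px, py in zip(df['pred_0'], df['pred_1'])]
    # Assumed: gen_px and gen_py are hypothetical true-value columns
    df['gen_target'] = [math.hypot(px, py)
                        for px, py in zip(df['gen_px'], df['gen_py'])]

frame = {'pred_0': [3.0, 6.0], 'pred_1': [4.0, 8.0],
         'gen_px': [3.0, 5.0], 'gen_py': [4.0, 12.0]}
pt_proxy_func(frame)
print(frame['pred'], frame['gen_target'])
```

The function returns nothing, matching the Callable[[DataFrame], None] signature: the result is communicated only through the added columns.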

Examples::
>>> def reg_proxy_func(df):
...     # Compute the pair mass from the regressed components pred_{i}
...     df['pred'] = calc_pair_mass(df, (1.77682, 1.77682),
...                                 {targ[targ.find('_t')+3:]:
...                                  f'pred_{i}' for i, targ
...                                  in enumerate(targ_feats)})
...     # Constant target value for every event
...     df['gen_target'] = 125
>>>
>>> std_delta = RegAsProxyPull(proxy_func=reg_proxy_func,
...                            return_mean=False, use_pull=False)

evaluate(fy, idx, y_pred)[source]

Compute statistic on fold using provided predictions.

Parameters
  • fy (FoldYielder) – FoldYielder interfacing to data

  • idx (int) – fold index corresponding to fold for which y_pred was computed

  • y_pred (ndarray) – predictions for fold

Return type

float

Returns

Statistic set in initialisation computed on the chosen fold

Examples::
>>> std = std_delta.evaluate(train_fy, val_id, val_preds)

Module contents
