lumin.nn.metrics package

Submodules

lumin.nn.metrics.class_eval module

class lumin.nn.metrics.class_eval.AMS(n_total, wgt_name, targ_name='targets', br=0, syst_unc_b=0, use_quick_scan=True)

Bases: lumin.nn.metrics.eval_metric.EvalMetric

Class to compute the maximum Approximate Median Significance (https://arxiv.org/abs/1007.1727) using a classifier which directly predicts the class of data in a binary classification problem. AMS is computed on a single fold of data provided by a FoldYielder, and the data are automatically reweighted by event multiplicity to account for missing weights.

Parameters

n_total (int) – total number of events in entire data set
wgt_name (str) – name of weight group in fold file to use. N.B. if you have reweighted to balance classes, be sure to use the un-reweighted weights.
targ_name (str) – name of target group in fold file
br (float) – constant bias offset for background yield
syst_unc_b (float) – fractional systematic uncertainty on background yield
use_quick_scan (bool) – whether to optimise AMS with the ams_scan_quick() method (fast, but subject to floating-point precision errors); if False, ams_scan_slow() is used instead (slower but more accurate)

Examples::

>>> ams_metric = AMS(n_total=250000, br=10, wgt_name='gen_orig_weight')
>>>
>>> ams_metric = AMS(n_total=250000, syst_unc_b=0.1,
...                  wgt_name='gen_orig_weight', use_quick_scan=False)
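
For orientation, the quantity being maximised can be sketched as below. This is a minimal sketch of the commonly quoted AMS expression with a constant offset added to the background yield. The function name `approx_median_significance` is hypothetical, the `syst_unc_b` correction is left out, and the library's own implementation may differ in detail.

```python
import math


def approx_median_significance(s, b, br=0.0):
    """AMS for a signal yield s and background yield b (hypothetical helper).

    br is a constant offset added to the background yield.
    Returns -1.0 when the value is undefined.
    """
    b_eff = b + br
    if b_eff <= 0 or s < 0:
        return -1.0
    radicand = 2.0 * ((s + b_eff) * math.log(1.0 + s / b_eff) - s)
    return math.sqrt(radicand) if radicand > 0 else -1.0


ams_no_offset = approx_median_significance(10.0, 100.0)
ams_with_offset = approx_median_significance(10.0, 100.0, br=10.0)
```

For small s/b the value approaches s/sqrt(b), and a positive `br` lowers the result, which makes the optimum less sensitive to cuts that leave very little background.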

evaluate(fy, idx, y_pred)

Compute maximum AMS on fold using provided predictions.

Parameters

fy (FoldYielder) – FoldYielder interfacing to data
idx (int) – fold index corresponding to fold for which y_pred was computed
y_pred (ndarray) – predictions for fold

Return type

float

Returns

Maximum AMS computed on reweighted data from fold

Examples::

>>> ams = ams_metric.evaluate(train_fy, val_id, val_preds)
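
The idea of a maximum AMS over a fold can be illustrated with a simple cut scan. The sketch below is not the library's ams_scan_quick() or ams_scan_slow(); `max_ams_scan` is a hypothetical helper. It assumes that reweighting by event multiplicity means scaling the fold weights by `n_total / len(fold)`, that targets are 0 (background) or 1 (signal), and that prediction values are distinct.

```python
import numpy as np


def max_ams_scan(preds, targets, weights, n_total, br=0.0):
    """Try every prediction value as a cut; return (max AMS, best cut)."""
    preds = np.asarray(preds, dtype=float)
    targets = np.asarray(targets)
    # Assumed reweighting: scale fold weights up to the full data-set size.
    weights = np.asarray(weights, dtype=float) * n_total / len(preds)

    order = np.argsort(-preds)  # most signal-like events first
    sig_w = np.where(targets[order] == 1, weights[order], 0.0)
    bkg_w = np.where(targets[order] == 0, weights[order], 0.0)
    s = np.cumsum(sig_w)        # signal yield passing each cut
    b = np.cumsum(bkg_w) + br   # background yield passing each cut

    ams = np.full(len(preds), -1.0)  # -1 marks cuts with no background
    ok = b > 0
    radicand = 2.0 * ((s[ok] + b[ok]) * np.log1p(s[ok] / b[ok]) - s[ok])
    ams[ok] = np.sqrt(np.clip(radicand, 0.0, None))

    best = int(np.argmax(ams))
    return float(ams[best]), float(preds[order][best])


best_ams, best_cut = max_ams_scan(
    preds=[0.9, 0.8, 0.3, 0.1], targets=[1, 1, 0, 0],
    weights=[1.0, 1.0, 1.0, 1.0], n_total=4)
```

Running sums keep the scan to a single sort plus one pass over the data; recomputing the yields from scratch at every cut would give the same cuts at a higher cost.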

class lumin.nn.metrics.class_eval.MultiAMS(n_total, wgt_name, targ_name, zero_preds, one_preds, br=0, syst_unc_b=0, use_quick_scan=True)

Bases: lumin.nn.metrics.class_eval.AMS

Class to compute the maximum Approximate Median Significance (https://arxiv.org/abs/1007.1727) using a classifier which predicts the class of data in a multiclass classification problem that can be reduced to a binary classification problem. AMS is computed on a single fold of data provided by a FoldYielder, and the data are automatically reweighted by event multiplicity to account for missing weights.

Parameters

n_total (int) – total number of events in entire data set
wgt_name (str) – name of weight group in fold file to use. N.B. if you have reweighted to balance classes, be sure to use the un-reweighted weights.
targ_name (str) – name of target group in fold file which indicates whether the event is signal or background
zero_preds (List[str]) – list of predicted classes which correspond to class 0 in the form pred_[i], where i is a NN output index
one_preds (List[str]) – list of predicted classes which correspond to class 1 in the form pred_[i], where i is a NN output index
br (float) – constant bias offset for background yield
syst_unc_b (float) – fractional systematic uncertainty on background yield
use_quick_scan (bool) – whether to optimise AMS with the ams_scan_quick() method (fast, but subject to floating-point precision errors); if False, ams_scan_slow() is used instead (slower but more accurate)

Examples::

>>> ams_metric = MultiAMS(n_total=250000, br=10, targ_name='gen_target',
...                       wgt_name='gen_orig_weight',
...                       zero_preds=['pred_0', 'pred_1', 'pred_2'],
...                       one_preds=['pred_3'])
>>>
>>> ams_metric = MultiAMS(n_total=250000, syst_unc_b=0.1,
...                       targ_name='gen_target',
...                       wgt_name='gen_orig_weight',
...                       use_quick_scan=False,
...                       zero_preds=['pred_0', 'pred_1', 'pred_2'],
...                       one_preds=['pred_3'])
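
The page does not state how the multiclass outputs are combined into a binary score. One plausible reduction, shown here purely as an assumption, is to take the share of the total score assigned to the class-1 outputs. The helper `reduce_to_binary` is hypothetical, and a dict of arrays stands in for the prediction columns.

```python
import numpy as np


def reduce_to_binary(preds, zero_preds, one_preds):
    """Collapse per-class scores into one signal-like score (assumed scheme)."""
    zero = sum(np.asarray(preds[name], dtype=float) for name in zero_preds)
    one = sum(np.asarray(preds[name], dtype=float) for name in one_preds)
    return one / (zero + one)


preds = {
    'pred_0': np.array([0.1, 0.5]),
    'pred_1': np.array([0.1, 0.2]),
    'pred_2': np.array([0.1, 0.2]),
    'pred_3': np.array([0.7, 0.1]),
}
binary_score = reduce_to_binary(
    preds, zero_preds=['pred_0', 'pred_1', 'pred_2'], one_preds=['pred_3'])
```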

evaluate(fy, idx, y_pred)

Compute maximum AMS on fold using provided predictions.

Parameters

fy (FoldYielder) – FoldYielder interfacing to data
idx (int) – fold index corresponding to fold for which y_pred was computed
y_pred (ndarray) – predictions for fold

Return type

float

Returns

Maximum AMS computed on reweighted data from fold

Examples::

>>> ams = ams_metric.evaluate(train_fy, val_id, val_preds)

lumin.nn.metrics.eval_metric module

class lumin.nn.metrics.eval_metric.EvalMetric(targ_name, wgt_name=None)

Bases: abc.ABC

Abstract class for evaluating performance of a model using some metric

Parameters

targ_name (str) – name of group in fold file containing regression targets
wgt_name (Optional[str]) – name of group in fold file containing datapoint weights

abstract evaluate(fy, idx, y_pred)

Evaluate the required metric for a given fold and set of predictions

Parameters

fy (FoldYielder) – FoldYielder interfacing to data
idx (int) – fold index corresponding to fold for which y_pred was computed
y_pred (ndarray) – predictions for fold

Return type

float

Returns

metric value

get_df(fy, idx, y_pred)

Returns a DataFrame for the given fold containing targets, weights, and predictions

Parameters

fy (FoldYielder) – FoldYielder interfacing to data
idx (int) – fold index corresponding to fold for which y_pred was computed
y_pred (ndarray) – predictions for fold

Return type

DataFrame

Returns

DataFrame for the given fold containing targets, weights, and predictions
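
The split between get_df() and the abstract evaluate() suggests a simple pattern for custom metrics: gather targets, weights, and predictions, then reduce them to a single float. The sketch below mirrors that pattern with a stand-in base class rather than the real EvalMetric. `MetricSketch`, `WeightedMSE`, and the column names `pred`, `gen_target`, and `gen_weight` are assumptions, a list of dicts replaces the FoldYielder, and a dict of arrays replaces the DataFrame.

```python
from abc import ABC, abstractmethod

import numpy as np


class MetricSketch(ABC):
    """Hypothetical stand-in mirroring the evaluate/get_df split."""

    def __init__(self, targ_name, wgt_name=None):
        self.targ_name, self.wgt_name = targ_name, wgt_name

    def get_df(self, fy, idx, y_pred):
        # fy is a list of dicts here, not a real FoldYielder.
        fold = fy[idx]
        df = {'pred': np.asarray(y_pred, dtype=float).ravel(),
              'gen_target': np.asarray(fold[self.targ_name], dtype=float)}
        if self.wgt_name is not None:
            df['gen_weight'] = np.asarray(fold[self.wgt_name], dtype=float)
        return df

    @abstractmethod
    def evaluate(self, fy, idx, y_pred):
        """Return the metric value as a float."""


class WeightedMSE(MetricSketch):
    """Example concrete metric: (optionally weighted) mean squared error."""

    def evaluate(self, fy, idx, y_pred):
        df = self.get_df(fy, idx, y_pred)
        sq_err = (df['pred'] - df['gen_target']) ** 2
        return float(np.average(sq_err, weights=df.get('gen_weight')))


fy = [{'targets': [1.0, 2.0, 3.0], 'weights': [1.0, 1.0, 2.0]}]
mse = WeightedMSE('targets', 'weights').evaluate(fy, 0, [1.0, 2.0, 5.0])
```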

lumin.nn.metrics.reg_eval module

class lumin.nn.metrics.reg_eval.RegPull(return_mean, use_bootstrap=False, use_weights=True, use_pull=True, targ_name='targets', wgt_name=None)

Bases: lumin.nn.metrics.eval_metric.EvalMetric

Compute mean or standard deviation of delta or pull of some feature which is being directly regressed to. Optionally, use bootstrap resampling on validation data.

Parameters

return_mean (bool) – whether to return the mean or the standard deviation
use_bootstrap (bool) – whether to bootstrap resample the validation fold when computing the statistic
use_weights (bool) – whether to actually use weights if wgt_name is set
use_pull (bool) – whether to return the pull (differences / targets) or delta (differences)
targ_name (str) – name of group in fold file containing regression targets
wgt_name (Optional[str]) – name of group in fold file containing datapoint weights

Examples::

>>> mean_pull  = RegPull(return_mean=True, use_bootstrap=True,
...                      use_pull=True)
>>>
>>> std_delta  = RegPull(return_mean=False, use_bootstrap=True,
...                      use_pull=False)
>>>
>>> mean_pull  = RegPull(return_mean=True, use_bootstrap=False,
...                      use_pull=True, wgt_name='weights')
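
The four combinations of mean or standard deviation and delta or pull can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the library's code. `pull_statistic` is a hypothetical helper, differences are taken as prediction minus target, the standard deviation is the weighted population form, and bootstrap results are combined by averaging the statistic over resamples.

```python
import numpy as np


def pull_statistic(preds, targets, weights=None, return_mean=True,
                   use_pull=True, n_bootstrap=0, seed=0):
    """Mean or standard deviation of the delta or pull (hypothetical helper)."""
    preds = np.asarray(preds, dtype=float)
    targets = np.asarray(targets, dtype=float)
    values = preds - targets            # delta
    if use_pull:
        values = values / targets       # pull = delta / target
    if weights is not None:
        weights = np.asarray(weights, dtype=float)

    def stat(idx):
        w = None if weights is None else weights[idx]
        mean = np.average(values[idx], weights=w)
        if return_mean:
            return mean
        return np.sqrt(np.average((values[idx] - mean) ** 2, weights=w))

    n = len(values)
    if n_bootstrap <= 0:
        return float(stat(np.arange(n)))
    rng = np.random.default_rng(seed)
    # Resample the fold with replacement and average the statistic.
    samples = [stat(rng.integers(0, n, size=n)) for _ in range(n_bootstrap)]
    return float(np.mean(samples))


preds, targets = [11.0, 18.0, 33.0], [10.0, 20.0, 30.0]
mean_pull_value = pull_statistic(preds, targets)
mean_delta_value = pull_statistic(preds, targets, use_pull=False)
std_delta_boot = pull_statistic(preds, targets, return_mean=False,
                                use_pull=False, n_bootstrap=50)
```

The pull normalises each difference by its target, so it is only meaningful when the targets are non-zero.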

evaluate(fy, idx, y_pred)

Compute statistic on fold using provided predictions.

Parameters

fy (FoldYielder) – FoldYielder interfacing to data
idx (int) – fold index corresponding to fold for which y_pred was computed
y_pred (ndarray) – predictions for fold

Return type

float

Returns

Statistic set in initialisation computed on the chosen fold

Examples::

>>> mean = mean_pull.evaluate(train_fy, val_id, val_preds)

class lumin.nn.metrics.reg_eval.RegAsProxyPull(proxy_func, return_mean, use_bootstrap=False, use_weights=True, use_pull=True, targ_name='targets', wgt_name=None)

Bases: lumin.nn.metrics.reg_eval.RegPull

Compute mean or standard deviation of delta or pull of some feature which is being indirectly regressed to via a proxy function. Optionally, use bootstrap resampling on validation data.

Parameters

proxy_func (Callable[[DataFrame], None]) – function which acts on regression predictions and adds pred and gen_target columns to the Pandas DataFrame it is passed which contains prediction columns pred_{i}
return_mean (bool) – whether to return the mean or the standard deviation
use_bootstrap (bool) – whether to bootstrap resample the validation fold when computing the statistic
use_weights (bool) – whether to actually use weights if wgt_name is set
use_pull (bool) – whether to return the pull (differences / targets) or delta (differences)
targ_name (str) – name of group in fold file containing regression targets
wgt_name (Optional[str]) – name of group in fold file containing datapoint weights

Examples::

>>> def reg_proxy_func(df):
...     df['pred'] = calc_pair_mass(df, (1.77682, 1.77682),
...                                 {targ[targ.find('_t')+3:]:
...                                  f'pred_{i}' for i, targ
...                                  in enumerate(targ_feats)})
...     df['gen_target'] = 125
>>>
>>> std_delta = RegAsProxyPull(proxy_func=reg_proxy_func,
...                            return_mean=False, use_pull=False)
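
A proxy function only has to follow the contract described above: read the pred_{i} columns, write pred and gen_target in place, and return nothing. The self-contained sketch below shows that contract with a simpler derived quantity than a pair mass. `magnitude_proxy` and the `gen_target_0` and `gen_target_1` columns are hypothetical, and a dict of arrays stands in for the DataFrame.

```python
import numpy as np


def magnitude_proxy(df):
    """Hypothetical proxy: combine two regressed components into a magnitude."""
    df['pred'] = np.hypot(df['pred_0'], df['pred_1'])
    df['gen_target'] = np.hypot(df['gen_target_0'], df['gen_target_1'])


# A dict of arrays stands in for the DataFrame in this sketch.
df = {
    'pred_0': np.array([3.0, 6.0]),
    'pred_1': np.array([4.0, 8.0]),
    'gen_target_0': np.array([3.0, 5.0]),
    'gen_target_1': np.array([4.0, 12.0]),
}
magnitude_proxy(df)
delta = df['pred'] - df['gen_target']
```

The statistic is then computed on the derived pred and gen_target columns rather than on the raw regression outputs.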

evaluate(fy, idx, y_pred)

Compute statistic on fold using provided predictions.

Parameters

fy (FoldYielder) – FoldYielder interfacing to data
idx (int) – fold index corresponding to fold for which y_pred was computed
y_pred (ndarray) – predictions for fold

Return type

float

Returns

Statistic set in initialisation computed on the chosen fold

Examples::

>>> mean = mean_pull.evaluate(train_fy, val_id, val_preds)
