lumin.utils package

Submodules

lumin.utils.data module

lumin.utils.data.check_val_set(train, val, test=None, n_folds=None)[source]

Method to check validation set suitability by seeing whether Random Forests can predict whether events belong to one dataset or another. If a FoldYielder is passed, then trainings are run once per fold and averaged. Will compute the ROC AUC for set discrimination (should be close to 0.5) and compute the feature importances to aid removal of discriminating features.

Parameters:
  • train (Union[DataFrame, ndarray, FoldYielder]) – training data

  • val (Union[DataFrame, ndarray, FoldYielder]) – validation data

  • test (Union[DataFrame, ndarray, FoldYielder, None]) – optional testing data

  • n_folds (Optional[int]) – if set and if passed a FoldYielder, will only use the first n_folds folds

Return type:

None

lumin.utils.misc module

class lumin.utils.misc.BackwardHook(module, hook_fn=None)[source]

Bases: ForwardHook

Create a hook for performing an action based on the backward pass through a nn.Module

Parameters:
  • module (Module) – nn.Module to hook

  • hook_fn (Optional[Callable[[Module, Union[Tensor, Tuple[Tensor]], Union[Tensor, Tuple[Tensor]]], None]]) – Optional function to perform. Default is to record input and output of module

Examples::
>>> hook = BackwardHook(model.tail.dense)
>>> model.predict(inputs)
>>> print(hook.inputs)

class lumin.utils.misc.ForwardHook(module, hook_fn=None)[source]

Bases: object

Create a hook for performing an action based on the forward pass through a nn.Module

Parameters:
  • module (Module) – nn.Module to hook

  • hook_fn (Optional[Callable[[Module, Union[Tensor, Tuple[Tensor]], Union[Tensor, Tuple[Tensor]]], None]]) – Optional function to perform. Default is to record input and output of module

Examples::
>>> hook = ForwardHook(model.tail.dense)
>>> model.predict(inputs)
>>> print(hook.inputs)

hook_fn(module, input, output)[source]

Default hook function records inputs and outputs of module

Parameters:
  • module (Module) – nn.Module to hook

  • input (Union[Tensor, Tuple[Tensor]]) – input tensor

  • output (Union[Tensor, Tuple[Tensor]]) – output tensor of module

Return type:

None

remove()[source]

Call when finished to remove hook

Return type:

None

lumin.utils.misc.hard_identity(x)[source]

A hardcoded identity function to replace lambda x: x

Parameters:

x (Any) – anything

Return type:

Any

Returns:

input

lumin.utils.misc.ids2unique(ids)[source]

Map a permutation of integers to a unique number, or a 2D array of integers to unique numbers by row. Returned numbers are unique for a given permutation of integers. This is achieved by computing the product of primes raised to powers equal to the integers. Because of this, it is easy to produce numbers that are too large to be stored if many (large) integers are passed.

Parameters:

ids (Union[List[int], ndarray]) – (array of) permutation(s) of integers to map

Return type:

ndarray

Returns:

(Array of) unique id(s) for given permutation(s)
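
The prime-product encoding described above can be sketched in plain Python. This is an illustration only, not LUMIN's implementation: the helper names `first_primes` and `encode_ids` are hypothetical, and the sketch assumes that the i-th prime is raised to the i-th integer and that the integers are non-negative.

```python
from typing import List, Sequence


def first_primes(n: int) -> List[int]:
    """Return the first n primes by trial division (fine for short id lists)."""
    primes: List[int] = []
    candidate = 2
    while len(primes) < n:
        # A number is prime if no smaller prime divides it
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def encode_ids(ids: Sequence[int]) -> int:
    """Encode an ordered sequence of non-negative ints as prod(p_i ** ids[i])."""
    code = 1
    for prime, power in zip(first_primes(len(ids)), ids):
        code *= prime ** power
    return code


# Different orderings of the same integers give different codes
code_a = encode_ids([1, 2, 3])  # 2**1 * 3**2 * 5**3
code_b = encode_ids([3, 2, 1])  # 2**3 * 3**2 * 5**1

# Ten ids of 40 already need far more than 64 bits
n_bits = encode_ids([40] * 10).bit_length()
```

Python integers grow as needed, so the sketch never overflows. A fixed-width integer array would, which is the storage limit the description warns about.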

lumin.utils.misc.is_partially(var)[source]

Returns true if var is a partial, function, or class, else false.

Parameters:

var (Any) – variable to inspect

Return type:

bool

Returns:

true if var is partial or partialler, else false
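
A check of this kind can be sketched with the standard library alone. The function name `looks_uninstantiated` is hypothetical, and the sketch assumes the test amounts to `isinstance` checks against `functools.partial`, plain Python functions, and classes, as the summary line suggests. LUMIN's own check may differ.

```python
from functools import partial
from types import FunctionType
from typing import Any


def looks_uninstantiated(var: Any) -> bool:
    """True for partials, plain Python functions, and classes; False otherwise."""
    return isinstance(var, (partial, FunctionType, type))


is_partial = looks_uninstantiated(partial(int, base=2))  # a functools.partial
is_function = looks_uninstantiated(lambda x: x)          # a plain function
is_class = looks_uninstantiated(dict)                    # a class
is_instance = looks_uninstantiated(dict())               # an instance, not a class
```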

lumin.utils.misc.str2bool(string)[source]

Convert string representation of Boolean to bool

Parameters:

string (Union[str, bool]) – string representation of Boolean (or a Boolean)

Return type:

bool

Returns:

The input unchanged if a bool was passed; otherwise True if the lowercased string is one of (“yes”, “true”, “t”, “1”), else False
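
The conversion rule above can be sketched as follows. The name `parse_bool` is hypothetical and the code is a stand-alone illustration of the stated rule, not LUMIN's source.

```python
from typing import Union

# Strings treated as True, as listed in the description above
TRUE_STRINGS = ("yes", "true", "t", "1")


def parse_bool(value: Union[str, bool]) -> bool:
    """Pass bools through; otherwise compare the lowercased string to TRUE_STRINGS."""
    if isinstance(value, bool):
        return value
    return value.lower() in TRUE_STRINGS


flag_yes = parse_bool("Yes")   # case-insensitive match
flag_no = parse_bool("no")     # anything outside TRUE_STRINGS is False
flag_bool = parse_bool(False)  # bools are returned unchanged
```

A function like this is typically useful as an `argparse` type, since `bool("False")` is True in Python.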

lumin.utils.misc.subsample_df(df, objective, targ_name, n_samples=None, replace=False, strat_key=None, wgt_name=None)[source]

Subsamples, or samples with replacement, a DataFrame. Will automatically reweight data such that weight sums remain the same as the original DataFrame (per class)

Parameters:
  • df (DataFrame) – DataFrame to sample

  • objective (str) – string representation of objective: either ‘classification’ or ‘regression’

  • targ_name (str) – name of column containing target data

  • n_samples (Optional[int]) – If set, will sample that number of data points; otherwise will sample, with replacement, a new DataFrame of the same size as the original

  • replace (bool) – whether to sample with replacement

  • strat_key (Optional[str]) – column name to use for stratified subsampling, if desired

  • wgt_name (Optional[str]) – name of column containing weight data. If set, will reweight subsampled data, otherwise will not

Return type:

DataFrame

lumin.utils.misc.to_binary_class(df, zero_preds, one_preds)[source]

Map class predictions back to a binary prediction. The maximum prediction for features listed in zero_preds is treated as the prediction for class 0, and vice versa for one_preds. The binary prediction is added to df in place as column ‘pred’

Parameters:
  • df (DataFrame) – DataFrame containing prediction features

  • zero_preds (List[str]) – list of column names for predictions associated with class 0

  • one_preds (List[str]) – list of column names for predictions associated with class 1

Return type:

None

lumin.utils.misc.to_device(x, device=device(type='cpu'))[source]

Recursively place Tensor(s) onto device

Parameters:

x (Union[Tensor, List[Tensor]]) – Tensor(s) to place on device

Return type:

Union[Tensor, List[Tensor]]

Returns:

Tensor(s) on device

lumin.utils.misc.to_np(x)[source]

Convert Tensor x to a Numpy array

Parameters:

x (Tensor) – Tensor to convert

Return type:

ndarray

Returns:

x as a Numpy array

lumin.utils.misc.to_tensor(x)[source]

Convert Numpy array to Tensor with possibility of a None being passed

Parameters:

x (Optional[ndarray]) – Numpy array or None

Return type:

Optional[Tensor]

Returns:

x as Tensor or None

lumin.utils.multiprocessing module

lumin.utils.multiprocessing.mp_run(args, func)[source]

Run multiple instances of a function simultaneously using a list of argument dictionaries. Runs the given function once per entry in the args list.

Important

The function should put a dictionary of results into the mp.Queue, and each result key should be unique; otherwise results will overwrite one another.

Parameters:
  • args (List[Dict[Any, Any]]) – list of dictionaries of arguments

  • func (Callable[[Any], Any]) – function to which to pass dictionary arguments

Return type:

Dict[Any, Any]

Returns:

Dictionary of results
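
The key-uniqueness requirement can be illustrated without starting any processes. The sketch below is not `mp_run`: it calls the workers one after another and uses a plain `queue.Queue` in place of `mp.Queue`, purely to show how result dictionaries are merged. It assumes a worker is called as `func(args_dict, out_q)`, which matches the `bootstrap_stats(args, out_q)` signature but is not confirmed on this page. All names are hypothetical.

```python
import queue
from typing import Any, Callable, Dict, List


def square_worker(args: Dict[str, Any], out_q) -> None:
    """Hypothetical worker: stores its result under a key unique to this call."""
    out_q.put({f"{args['name']}_square": args["x"] ** 2})


def clashing_worker(args: Dict[str, Any], out_q) -> None:
    """Hypothetical worker that reuses the same key on every call."""
    out_q.put({"square": args["x"] ** 2})


def run_sequentially(args: List[Dict[str, Any]], func: Callable) -> Dict[Any, Any]:
    """Sequential stand-in for a parallel runner: same merge step, no processes."""
    out_q: queue.Queue = queue.Queue()
    for arg in args:
        func(arg, out_q)
    results: Dict[Any, Any] = {}
    for _ in args:
        # Merging dictionaries: a repeated key replaces the earlier value
        results.update(out_q.get())
    return results


jobs = [{"name": "a", "x": 2}, {"name": "b", "x": 3}]
results = run_sequentially(jobs, square_worker)    # one entry per job
clashed = run_sequentially(jobs, clashing_worker)  # a single surviving entry
```

With unique keys both results survive the merge. With a shared key only one does, and in a genuinely parallel run which one survives would depend on completion order.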

lumin.utils.statistics module

lumin.utils.statistics.bootstrap_stats(args, out_q=None)[source]

Computes statistics and KDEs of data via sampling with replacement

Parameters:
  • args (Dict[str, Any]) – dictionary of arguments. Possible keys are:
      data - data to resample
      name - name prepended to returned keys in result dict
      weights - array of weights matching length of data to use for weighted resampling
      n - number of times to resample data
      x - points at which to compute the kde values of resample data
      kde - whether to compute the kde values at x-points for resampled data
      mean - whether to compute the means of the resampled data
      std - whether to compute standard deviation of resampled data
      c68 - whether to compute the width of the absolute central 68.2 percentile of the resampled data

  • out_q (Optional[Queue]) – if using multiprocessing, the result dictionary can be placed in the provided queue

Return type:

Optional[Dict[str, Any]]

Returns:

Result dictionary if out_q is None else None.
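
Sampling with replacement and collecting per-resample statistics can be sketched with the standard library. This is a reduced illustration covering only the mean and standard deviation, without weights or KDEs. The function name `bootstrap_mean_std`, the `seed` argument, and the `"{name}_mean"` / `"{name}_std"` key format are assumptions made for the example, not LUMIN's documented output.

```python
import random
import statistics
from typing import Dict, List, Sequence


def bootstrap_mean_std(data: Sequence[float], n: int = 100,
                       name: str = "", seed: int = 0) -> Dict[str, List[float]]:
    """Resample data n times with replacement; record mean and std of each resample."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    means: List[float] = []
    stds: List[float] = []
    for _ in range(n):
        # Draw len(data) points with replacement
        resample = rng.choices(data, k=len(data))
        means.append(statistics.fmean(resample))
        stds.append(statistics.stdev(resample))
    # Key format is hypothetical: the name is prepended to each statistic
    return {f"{name}_mean": means, f"{name}_std": stds}


data = [float(i) for i in range(1, 11)]
stats = bootstrap_mean_std(data, n=200, name="feat")
avg_of_means = statistics.fmean(stats["feat_mean"])
```

The spread of the recorded means gives an estimate of the uncertainty on the mean of the original sample.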

lumin.utils.statistics.get_moments(arr)[source]

Computes mean and std of data, and their associated uncertainties

Parameters:

arr (ndarray) – univariate data

Return type:

Tuple[float, float, float, float]

Returns:

  • mean

  • statistical uncertainty of mean

  • standard deviation

  • statistical uncertainty of standard deviation
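
The four returned quantities can be sketched under a simplifying assumption. The uncertainty on the standard deviation below uses the Gaussian approximation s / sqrt(2(n - 1)), which may not be the estimator LUMIN uses. The function name `moments_with_uncertainty` is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def moments_with_uncertainty(arr: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (mean, mean uncertainty, std, std uncertainty) for univariate data."""
    n = len(arr)
    mean = statistics.fmean(arr)
    std = statistics.stdev(arr)  # sample standard deviation (n - 1 denominator)
    mean_unc = std / math.sqrt(n)  # standard error of the mean
    # Assumption: data are roughly Gaussian, so the std error is s / sqrt(2(n - 1))
    std_unc = std / math.sqrt(2 * (n - 1))
    return mean, mean_unc, std, std_unc


mean, mean_unc, std, std_unc = moments_with_uncertainty([1.0, 2.0, 3.0, 4.0, 5.0])
```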

lumin.utils.statistics.uncert_round(value, uncert)[source]

Round value according to given uncertainty using one significant figure of the uncertainty

Parameters:
  • value (float) – value to round

  • uncert (float) – uncertainty of value

Return type:

Tuple[float, float]

Returns:

  • rounded value

  • rounded uncertainty
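
Rounding to one significant figure of the uncertainty can be sketched as below. The name `round_to_uncertainty` is hypothetical and the sketch uses built-in float rounding, so it assumes a positive uncertainty and may behave differently from LUMIN's function in edge cases such as uncertainties that sit on a rounding boundary.

```python
import math
from typing import Tuple


def round_to_uncertainty(value: float, uncert: float) -> Tuple[float, float]:
    """Round value and uncert to the decimal place of uncert's leading digit."""
    if uncert <= 0:
        raise ValueError("uncertainty must be positive")
    # Position of the leading significant digit, e.g. 0.0321 -> -2, 12.0 -> 1
    exponent = math.floor(math.log10(uncert))
    # round() takes the number of decimal places; negative values round to tens etc.
    digits = -exponent
    return round(value, digits), round(uncert, digits)


small = round_to_uncertainty(1.2345, 0.0321)  # uncertainty in the second decimal
large = round_to_uncertainty(123.4, 12.0)     # uncertainty in the tens
```

The value is rounded to the same decimal place as the uncertainty, so digits that the uncertainty cannot support are dropped.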

Module contents
