lumin.utils package¶

Submodules¶

lumin.utils.data module¶

lumin.utils.data.check_val_set(train, val, test=None, n_folds=None)[source]¶

Checks validation-set suitability by testing whether Random Forests can predict which dataset an event belongs to. If a FoldYielder is passed, one training is run per fold and the results are averaged. Computes the ROC AUC for set discrimination (which should be close to 0.5) and the feature importances, to aid removal of discriminating features.

Parameters

train (Union[DataFrame, ndarray, FoldYielder]) – training data
val (Union[DataFrame, ndarray, FoldYielder]) – validation data
test (Union[DataFrame, ndarray, FoldYielder, None]) – optional testing data
n_folds (Optional[int]) – if set and if passed a FoldYielder, will only use the first n_folds folds

Return type

None

lumin.utils.misc module¶

lumin.utils.misc.to_np(x)[source]¶

Convert Tensor x to a Numpy array

Parameters: x (Tensor) – Tensor to convert
Return type: ndarray
Returns: x as a Numpy array

lumin.utils.misc.to_device(x, device=device(type='cpu'))[source]¶

Recursively place Tensor(s) onto device

Parameters: x (Union[Tensor, List[Tensor]]) – Tensor(s) to place on device
Return type: Union[Tensor, List[Tensor]]
Returns: Tensor(s) on device

lumin.utils.misc.to_tensor(x)[source]¶

Convert a Numpy array to a Tensor, allowing for None to be passed

Parameters: x (Optional[ndarray]) – Numpy array or None
Return type: Optional[Tensor]
Returns: x as Tensor or None

lumin.utils.misc.str2bool(string)[source]¶

Convert string representation of Boolean to bool

Parameters: string (Union[str, bool]) – string representation of Boolean (or a Boolean)
Return type: bool
Returns: the input itself if a bool was passed; otherwise True if the lowercased string is in (“yes”, “true”, “t”, “1”), else False
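The behaviour described above can be sketched in a few lines. This is an illustrative re-implementation based only on the docstring, not the library source; the name str2bool_sketch is hypothetical.

```python
from typing import Union


def str2bool_sketch(string: Union[str, bool]) -> bool:
    """Illustrative sketch of the documented behaviour (hypothetical name)."""
    if isinstance(string, bool):
        return string  # Booleans pass straight through
    # Any string outside the accepted set, e.g. "no" or "false", maps to False
    return string.lower() in ("yes", "true", "t", "1")


print(str2bool_sketch("True"))
print(str2bool_sketch("no"))
print(str2bool_sketch(False))
```

A converter of this kind is typically useful where Boolean options arrive as text, for example from command-line arguments.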

lumin.utils.misc.to_binary_class(df, zero_preds, one_preds)[source]¶

Map class predictions back to a binary prediction. The maximum prediction for features listed in zero_preds is treated as the prediction for class 0, and vice versa for one_preds. The binary prediction is added to df in place as column ‘pred’

Parameters

df (DataFrame) – DataFrame containing prediction features
zero_preds (List[str]) – list of column names for predictions associated with class 0
one_preds (List[str]) – list of column names for predictions associated with class 1

Return type

None

lumin.utils.misc.ids2unique(ids)[source]¶

Map a permutation of integers to a unique number, or a 2D array of integers to unique numbers by row. Returned numbers are unique for a given permutation of integers. This is achieved by computing the product of primes raised to powers equal to the integers. Because of this, it is easy to produce numbers that are too large to be stored if many (large) integers are passed.

Parameters: ids (Union[List[int], ndarray]) – (array of) permutation(s) of integers to map
Return type: ndarray
Returns: (Array of) unique id(s) for given permutation(s)
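The prime-product encoding described above can be sketched for a single permutation as follows. This is a minimal pure-Python illustration that assumes non-negative integers; first_primes and ids_to_unique_sketch are hypothetical names, not part of lumin.

```python
from math import prod
from typing import List, Sequence


def first_primes(n: int) -> List[int]:
    """Return the first n primes by trial division (adequate for small n)."""
    primes: List[int] = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime if no smaller prime divides it
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def ids_to_unique_sketch(ids: Sequence[int]) -> int:
    """Encode ids as 2**ids[0] * 3**ids[1] * 5**ids[2] * ..."""
    primes = first_primes(len(ids))
    return prod(p ** i for p, i in zip(primes, ids))


print(ids_to_unique_sketch([1, 2, 3]))  # 2**1 * 3**2 * 5**3 = 2250
print(ids_to_unique_sketch([3, 2, 1]))  # 2**3 * 3**2 * 5**1 = 360
```

Python integers have arbitrary precision, so this sketch does not overflow. With fixed-width NumPy integer arrays the same product can exceed the representable range, which is the limitation noted in the description.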

class lumin.utils.misc.ForwardHook(module, hook_fn=None)[source]¶

Bases: object

Create a hook for performing an action based on the forward pass through a nn.Module

Parameters

module (Module) – nn.Module to hook
hook_fn (Optional[Callable[[Module, Union[Tensor, Tuple[Tensor]], Union[Tensor, Tuple[Tensor]]], None]]) – Optional function to perform. Default is to record input and output of module

Examples::

>>> hook = ForwardHook(model.tail.dense)
>>> model.predict(inputs)
>>> print(hook.inputs)

hook_fn(module, input, output)[source]¶

Default hook function records inputs and outputs of module

Parameters

module (Module) – nn.Module to hook
input (Union[Tensor, Tuple[Tensor]]) – input tensor
output (Union[Tensor, Tuple[Tensor]]) – output tensor of module

Return type

None

remove()[source]¶

Call when finished to remove hook

Return type: None

class lumin.utils.misc.BackwardHook(module, hook_fn=None)[source]¶

Bases: lumin.utils.misc.ForwardHook

Create a hook for performing an action based on the backward pass through a nn.Module

Parameters

module (Module) – nn.Module to hook
hook_fn (Optional[Callable[[Module, Union[Tensor, Tuple[Tensor]], Union[Tensor, Tuple[Tensor]]], None]]) – Optional function to perform. Default is to record input and output of module

Examples::

>>> hook = BackwardHook(model.tail.dense)
>>> model.predict(inputs)
>>> print(hook.inputs)

lumin.utils.misc.subsample_df(df, objective, targ_name, n_samples=None, replace=False, strat_key=None, wgt_name=None)[source]¶

Subsamples, or samples with replacement, a DataFrame. Will automatically reweight data such that weight sums remain the same as the original DataFrame (per class)

Parameters

df (DataFrame) – DataFrame to sample
objective (str) – string representation of objective: either ‘classification’ or ‘regression’
targ_name (str) – name of column containing target data
n_samples (Optional[int]) – If set, will sample that number of data points, otherwise will sample with replacement a new DataFrame of the same size as the original
replace (bool) – whether to sample with replacement
strat_key (Optional[str]) – column name to use for stratified subsampling, if desired
wgt_name (Optional[str]) – name of column containing weight data. If set, will reweight subsampled data, otherwise will not

Return type

DataFrame

lumin.utils.misc.is_partially(var)[source]¶

Returns true if var is partial, function, or class, else false.

Parameters: var (Any) – variable to inspect
Return type: bool
Returns: true if var is partial or partialler, else false

lumin.utils.multiprocessing module¶

lumin.utils.multiprocessing.mp_run(args, func)[source]¶

Run multiple instances of a function simultaneously, using a list of argument dictionaries. Runs the given function once per entry in the args list.

Important

Function should put a dictionary of results into the mp.Queue and each result key should be unique otherwise they will overwrite one another.

Parameters

args (List[Dict[Any, Any]]) – list of dictionaries of arguments
func (Callable[[Any], Any]) – function to which to pass dictionary arguments

Return type

Dict[Any, Any]

Returns

Dictionary of results
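The unique-key requirement can be illustrated with a single-process stand-in. The sketch below is not mp_run: it uses queue.Queue in place of mp.Queue and calls the function sequentially, and it assumes the function receives its argument dictionary followed by the queue. The names square and run_sequentially are hypothetical.

```python
import queue
from typing import Any, Callable, Dict, List


def square(args: Dict[str, Any], out_q: "queue.Queue") -> None:
    # Prefix the result key with a per-call name so results do not collide
    out_q.put({f"{args['name']}_square": args["x"] ** 2})


def run_sequentially(arg_list: List[Dict[str, Any]],
                     func: Callable[..., None]) -> Dict[Any, Any]:
    """Single-process stand-in: call func once per entry and merge results."""
    out_q: "queue.Queue" = queue.Queue()
    for args in arg_list:
        func(args, out_q)
    results: Dict[Any, Any] = {}
    while not out_q.empty():
        results.update(out_q.get())  # identical keys overwrite one another here
    return results


results = run_sequentially([{"name": "a", "x": 2}, {"name": "b", "x": 3}], square)
print(results)

# Reusing the same name gives the same key, so only the last result survives
clash = run_sequentially([{"name": "a", "x": 2}, {"name": "a", "x": 3}], square)
print(clash)
```

Building each key from something unique to the call, such as a name entry in the argument dictionary, is one simple way to meet the requirement.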

lumin.utils.statistics module¶

lumin.utils.statistics.bootstrap_stats(args, out_q=None)[source]¶

Computes statistics and KDEs of data via sampling with replacement

Parameters

args (Dict[str, Any]) – dictionary of arguments. Possible keys are:
    data - data to resample
    name - name prepended to returned keys in result dict
    weights - array of weights matching length of data to use for weighted resampling
    n - number of times to resample data
    x - points at which to compute the kde values of resample data
    kde - whether to compute the kde values at x-points for resampled data
    mean - whether to compute the means of the resampled data
    std - whether to compute standard deviation of resampled data
    c68 - whether to compute the width of the absolute central 68.2 percentile of the resampled data
out_q (Optional[mp.Queue]) – if using multiprocessing, can place result dictionary in provided queue

Return type

Union[None, Dict[str, Any]]

Returns

Result dictionary if out_q is None else None.
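The resampling idea and the return convention can be sketched with the standard library alone. This is a simplified, unweighted illustration without KDEs; the function name bootstrap_sketch, the seed argument, and the exact result-key format are assumptions rather than the library's behaviour.

```python
import random
import statistics
from typing import Any, Dict, List, Optional, Sequence


def bootstrap_sketch(data: Sequence[float], name: str = "", n: int = 100,
                     seed: int = 0, out_q: Optional[Any] = None
                     ) -> Optional[Dict[str, List[float]]]:
    """Resample data with replacement n times; record mean and std of each resample."""
    rng = random.Random(seed)  # seeded for reproducibility (assumption)
    means: List[float] = []
    stds: List[float] = []
    for _ in range(n):
        sample = rng.choices(data, k=len(data))  # sampling with replacement
        means.append(statistics.fmean(sample))
        stds.append(statistics.stdev(sample))
    # Key format is hypothetical: name is prepended to each statistic
    result = {f"{name}_mean": means, f"{name}_std": stds}
    if out_q is None:
        return result
    out_q.put(result)  # hand the dictionary to the caller's queue instead
    return None


result = bootstrap_sketch([1.0, 2.0, 3.0, 4.0], name="x", n=50)
print(sorted(result))
```

The spread of the resampled means gives an estimate of the uncertainty on the mean of the original data.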

lumin.utils.statistics.get_moments(arr)[source]¶

Computes mean and std of data, and their associated uncertainties

Parameters

arr (ndarray) – univariate data

Return type

Tuple[float, float, float, float]

Returns

mean
statistical uncertainty of mean
standard deviation
statistical uncertainty of standard deviation
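As a rough guide to the four returned quantities, the sketch below uses textbook estimators: the standard error of the mean, and the Gaussian approximation for the uncertainty of the standard deviation. Whether lumin uses these exact formulas is an assumption; moments_sketch is a hypothetical name.

```python
import math
import statistics
from typing import Sequence, Tuple


def moments_sketch(arr: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return mean, its uncertainty, std, and its uncertainty (textbook estimators)."""
    n = len(arr)
    mean = statistics.fmean(arr)
    std = statistics.stdev(arr)  # sample standard deviation (n - 1 denominator)
    mean_unc = std / math.sqrt(n)  # standard error of the mean
    std_unc = std / math.sqrt(2 * (n - 1))  # valid for Gaussian data (assumption)
    return mean, mean_unc, std, std_unc


mean, mean_unc, std, std_unc = moments_sketch([1.0, 2.0, 3.0, 4.0, 5.0])
print(mean, mean_unc, std, std_unc)
```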

lumin.utils.statistics.uncert_round(value, uncert)[source]¶

Round value according to given uncertainty using one significant figure of the uncertainty

Parameters

value (float) – value to round
uncert (float) – uncertainty of value

Return type

Tuple[float, float]

Returns

rounded value
rounded uncertainty
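Rounding to one significant figure of the uncertainty can be sketched as follows: find the decimal position of the uncertainty's leading digit, then round both numbers at that position. This is an illustrative implementation, not the library's; uncert_round_sketch is a hypothetical name.

```python
import math
from typing import Tuple


def uncert_round_sketch(value: float, uncert: float) -> Tuple[float, float]:
    """Round value and uncert at the leading significant digit of uncert."""
    if uncert == 0 or not math.isfinite(uncert):
        return value, uncert  # nothing sensible to round against
    # Decimal exponent of the leading significant digit, e.g. 0.0234 -> -2
    exponent = math.floor(math.log10(abs(uncert)))
    return round(value, -exponent), round(uncert, -exponent)


print(uncert_round_sketch(3.14159, 0.0234))
print(uncert_round_sketch(1234.5, 56.0))
```

In the first call the uncertainty's leading digit is in the second decimal place, so both numbers are rounded to two decimals. In the second it is in the tens place, so both are rounded to the nearest ten.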
