lumin.utils package
Submodules
lumin.utils.data module
- lumin.utils.data.check_val_set(train, val, test=None, n_folds=None)
Check the suitability of a validation set by testing whether Random Forests can predict which dataset each event belongs to. If a FoldYielder is passed, one training is run per fold and the results are averaged. Computes the ROC AUC for set discrimination (this should be close to 0.5 for a suitable validation set) and the feature importances, to aid removal of discriminating features.
- Parameters:
  - train (Union[DataFrame, ndarray, FoldYielder]) – training data
  - val (Union[DataFrame, ndarray, FoldYielder]) – validation data
  - test (Union[DataFrame, ndarray, FoldYielder, None]) – optional testing data
  - n_folds (Optional[int]) – if set, and if a FoldYielder is passed, only the first n_folds folds will be used
- Return type:
None
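The check above is a form of adversarial validation. The sketch below shows the idea for a single pair of NumPy arrays using scikit-learn's RandomForestClassifier and roc_auc_score. The helper name adversarial_auc, the 70/30 hold-out split and the forest settings are assumptions for illustration; the per-fold averaging over a FoldYielder that check_val_set performs is not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def adversarial_auc(train: np.ndarray, val: np.ndarray, seed: int = 0):
    """Hypothetical helper: can a Random Forest tell training events from validation events?"""
    # Label training events 0 and validation events 1, then pool them
    x = np.vstack([train, val])
    y = np.concatenate([np.zeros(len(train)), np.ones(len(val))])
    # Hold out part of the pooled data so the AUC is not measured on fitted events
    x_fit, x_test, y_fit, y_test = train_test_split(
        x, y, test_size=0.3, random_state=seed, stratify=y
    )
    rf = RandomForestClassifier(n_estimators=100, min_samples_leaf=5, random_state=seed)
    rf.fit(x_fit, y_fit)
    # An AUC near 0.5 means the two sets cannot be told apart
    auc = roc_auc_score(y_test, rf.predict_proba(x_test)[:, 1])
    # Large importances point at the features that discriminate between the sets
    return auc, rf.feature_importances_


rng = np.random.default_rng(0)
train = rng.normal(size=(2000, 3))
val = rng.normal(size=(2000, 3))
val[:, 2] += 2.0  # deliberately shift one feature in the validation set
auc, importances = adversarial_auc(train, val)
```

Here the shifted feature should stand out with a high AUC and the largest importance; dropping or correcting it would be the natural next step.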
lumin.utils.misc module
- class lumin.utils.misc.BackwardHook(module, hook_fn=None)
Bases: ForwardHook
Create a hook for performing an action based on the backward pass through an nn.Module.
- Parameters:
  - module (Module) – nn.Module to hook
  - hook_fn (Optional[Callable[[Module, Union[Tensor, Tuple[Tensor]], Union[Tensor, Tuple[Tensor]]], None]]) – optional function to perform. Default is to record the input and output of the module
- Examples:
  >>> hook = BackwardHook(model.tail.dense)
  >>> model.predict(inputs)
  >>> print(hook.inputs)
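For readers new to PyTorch hooks, the sketch below shows the general mechanism a backward hook relies on: torch's register_full_backward_hook, which calls hook(module, grad_input, grad_output) while gradients are being computed. The class GradRecorder is hypothetical and is not LUMIN's BackwardHook; it only illustrates that in plain PyTorch such a hook fires when a backward pass is run.

```python
import torch
from torch import nn


class GradRecorder:
    """Hypothetical stand-in for a backward hook: records gradients flowing through a module."""

    def __init__(self, module: nn.Module):
        self.grad_inputs, self.grad_outputs = [], []
        # register_full_backward_hook calls fn(module, grad_input, grad_output) during backward
        self.handle = module.register_full_backward_hook(self.hook_fn)

    def hook_fn(self, module, grad_input, grad_output):
        # Both arguments are tuples of gradient tensors
        self.grad_inputs.append(grad_input)
        self.grad_outputs.append(grad_output)

    def remove(self):
        # Detach the hook so later passes are no longer recorded
        self.handle.remove()


model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
rec = GradRecorder(model[2])
x = torch.randn(5, 4)
model(x).sum().backward()  # the hook fires here, during the backward pass
```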
- class lumin.utils.misc.ForwardHook(module, hook_fn=None)
Bases: object
Create a hook for performing an action based on the forward pass through an nn.Module.
- Parameters:
  - module (Module) – nn.Module to hook
  - hook_fn (Optional[Callable[[Module, Union[Tensor, Tuple[Tensor]], Union[Tensor, Tuple[Tensor]]], None]]) – optional function to perform. Default is to record the input and output of the module
- Examples:
  >>> hook = ForwardHook(model.tail.dense)
  >>> model.predict(inputs)
  >>> print(hook.inputs)
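The recording behaviour described above can be sketched with PyTorch's register_forward_hook, which calls hook(module, inputs, output) each time the module's forward runs. ForwardRecorder is a hypothetical illustration of the mechanism, not LUMIN's ForwardHook class.

```python
import torch
from torch import nn


class ForwardRecorder:
    """Hypothetical sketch of a forward hook that records a module's inputs and outputs."""

    def __init__(self, module: nn.Module):
        self.inputs, self.outputs = None, None
        # register_forward_hook calls fn(module, inputs, output) after module.forward runs
        self.handle = module.register_forward_hook(self.hook_fn)

    def hook_fn(self, module, inputs, output):
        # inputs is a tuple of the positional arguments passed to forward
        self.inputs, self.outputs = inputs, output

    def remove(self):
        # Detach the hook so later forward passes are not recorded
        self.handle.remove()


model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
rec = ForwardRecorder(model[0])
with torch.no_grad():
    out = model(torch.randn(5, 4))
```

Keeping the handle returned by register_forward_hook makes it possible to remove the hook once the activations of interest have been captured.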
- lumin.utils.misc.hard_identity(x)
A hardcoded identity function to replace lambda x: x.
- Parameters:
  - x (Any) – anything
- Return type:
Any
- Returns:
input
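The docs do not say why a named identity is preferred over lambda x: x. One likely reason, offered here as an assumption, is serialisation: the standard pickle module stores functions by importable name, so a module-level def can be pickled while a lambda cannot. The sketch below demonstrates that difference with a hypothetical identity function.

```python
import pickle


def identity(x):
    """A module-level identity function; pickle can store it by reference."""
    return x


identity_lambda = lambda x: x  # kept only to show the contrast

# A def-ined function round-trips through pickle to the same object
restored = pickle.loads(pickle.dumps(identity))

# The standard pickle module cannot serialise a lambda: it has no importable name
try:
    pickle.dumps(identity_lambda)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
```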
- lumin.utils.misc.ids2unique(ids)
Map a permutation of integers to a unique number, or a 2D array of integers to unique numbers by row. Returned numbers are unique for a given permutation of integers. This is achieved by computing the product of primes raised to powers equal to the integers. Because of this, it is easy to produce numbers that are too large to be stored if many (or large) integers are passed.
- Parameters:
  - ids (Union[List[int], ndarray]) – (array of) permutation(s) of integers to map
- Return type:
ndarray
- Returns:
(Array of) unique id(s) for given permutation(s)
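If each position in the sequence is given its own prime, the fundamental theorem of arithmetic guarantees that distinct sequences of non-negative integers map to distinct numbers, and reordering the entries changes the result. The sketch below illustrates that reading in NumPy. The names first_primes and ids_to_unique are hypothetical, primes come from simple trial division, and results are stored as int64, which overflows silently for large inputs.

```python
from typing import List, Union

import numpy as np


def first_primes(n: int) -> List[int]:
    """Return the first n primes by trial division (fine for short ID vectors)."""
    primes: List[int] = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime if no smaller prime up to its square root divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def ids_to_unique(ids: Union[List[int], np.ndarray]) -> np.ndarray:
    """Encode each row of non-negative integers as prod_i p_i ** ids[i], p_i the i-th prime."""
    arr = np.atleast_2d(np.asarray(ids, dtype=np.int64))
    primes = np.array(first_primes(arr.shape[1]), dtype=np.int64)
    # Broadcasting raises the i-th prime to the i-th integer of every row
    return (primes ** arr).prod(axis=-1)


single = ids_to_unique([1, 2, 3])                # 2**1 * 3**2 * 5**3
rows = ids_to_unique([[1, 2, 3], [3, 2, 1]])     # one id per row
```

For example, [1, 2, 3] maps to 2·9·125 = 2250 while its reversal [3, 2, 1] maps to 8·9·5 = 360.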
- lumin.utils.misc.is_partially(var)
Returns True if var is a partial, function, or class, else False.
- Parameters:
  - var (Any) – variable to inspect
- Return type:
bool
- Returns:
true if var is partial or partialler, else false
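A check of this kind can be sketched with functools.partial and the inspect module. The function below is a hypothetical approximation: it covers partials, Python functions and classes, but omits LUMIN's own partialler type, which the Returns entry above also mentions. Note that inspect.isfunction is False for built-ins such as len.

```python
from functools import partial
from inspect import isclass, isfunction
from typing import Any


def is_partially_sketch(var: Any) -> bool:
    """True for functools.partial objects, Python functions and classes (partialler omitted)."""
    return isinstance(var, partial) or isfunction(var) or isclass(var)


checks = {
    "partial": is_partially_sketch(partial(int, base=2)),
    "lambda": is_partially_sketch(lambda x: x),
    "class": is_partially_sketch(dict),
    "int": is_partially_sketch(3),
}
```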
- lumin.utils.misc.str2bool(string)
Convert a string representation of a Boolean to bool.
- Parameters:
  - string (Union[str, bool]) – string representation of a Boolean (or a Boolean)
- Return type:
bool
- Returns:
the input unchanged if a bool was passed; otherwise True if the lowercased string is in (“yes”, “true”, “t”, “1”), else False
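The documented behaviour fits in a few lines. The sketch below is a hypothetical re-implementation written from the description above, not LUMIN's source.

```python
from typing import Union


def str2bool_sketch(string: Union[str, bool]) -> bool:
    """Pass bools through unchanged; otherwise match against a set of truthy strings."""
    if isinstance(string, bool):
        return string
    return string.lower() in ("yes", "true", "t", "1")


flags = [str2bool_sketch(s) for s in ("Yes", "TRUE", "t", "1", "no", "0", "false")]
```

A converter like this can be passed as type= to argparse's add_argument, since that parameter accepts any callable taking a string; this avoids the pitfall that bool("False") is True.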
- lumin.utils.misc.subsample_df(df, objective, targ_name, n_samples=None, replace=False, strat_key=None, wgt_name=None)
Subsamples, or samples with replacement, a DataFrame. Will automatically reweight data such that weight sums remain the same as in the original DataFrame (per class).
- Parameters:
  - df (DataFrame) – DataFrame to sample
  - objective (str) – string representation of objective: either ‘classification’ or ‘regression’
  - targ_name (str) – name of column containing target data
  - n_samples (Optional[int]) – if set, will sample that number of data points; otherwise will sample with replacement a new DataFrame of the same size as the original
  - replace (bool) – whether to sample with replacement
  - strat_key (Optional[str]) – column name to use for stratified subsampling, if desired
  - wgt_name (Optional[str]) – name of column containing weight data. If set, will reweight subsampled data, otherwise will not
- Return type:
DataFrame
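The per-class reweighting can be sketched with pandas as below, for the classification case only. The helper subsample_reweight and the column names target and weight are hypothetical, and stratified sampling via strat_key is not covered; each class present in the sample has its weights rescaled so that its weight sum matches the original DataFrame.

```python
from typing import Optional

import numpy as np
import pandas as pd


def subsample_reweight(df: pd.DataFrame, targ_name: str, wgt_name: str,
                       n_samples: Optional[int] = None, replace: bool = False,
                       seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: (sub)sample a classification DataFrame, then rescale
    weights so each class keeps its original weight sum."""
    if n_samples is None:
        # Default described above: resample with replacement to the original size
        n_samples, replace = len(df), True
    # reset_index avoids duplicate labels when sampling with replacement
    out = df.sample(n=n_samples, replace=replace, random_state=seed).reset_index(drop=True)
    for cls in out[targ_name].unique():
        mask = out[targ_name] == cls
        orig_sum = df.loc[df[targ_name] == cls, wgt_name].sum()
        out.loc[mask, wgt_name] *= orig_sum / out.loc[mask, wgt_name].sum()
    return out


rng = np.random.default_rng(0)
data = pd.DataFrame({"feat": rng.normal(size=100),
                     "target": np.repeat([0, 1], 50),
                     "weight": rng.uniform(0.5, 1.5, size=100)})
small = subsample_reweight(data, "target", "weight", n_samples=20)
```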
- lumin.utils.misc.to_binary_class(df, zero_preds, one_preds)
Map class predictions back to a binary prediction. The maximum prediction among the columns listed in zero_preds is treated as the prediction for class 0, and likewise the maximum among one_preds for class 1. The binary prediction is added to df in place as column ‘pred’.
- Parameters:
  - df (DataFrame) – DataFrame containing prediction features
  - zero_preds (List[str]) – list of column names for predictions associated with class 0
  - one_preds (List[str]) – list of column names for predictions associated with class 1
- Return type:
None
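The description does not say how the two maxima are combined into one binary score. The sketch below assumes one plausible rule: if the class-1 group wins, report its score, otherwise report one minus the class-0 score, so values near 0 still mean "confidently class 0". The function name and the column names are hypothetical.

```python
from typing import List

import numpy as np
import pandas as pd


def to_binary_sketch(df: pd.DataFrame, zero_preds: List[str], one_preds: List[str]) -> None:
    """Hypothetical mapping of multiclass scores to a class-1 score, written in place to df['pred']."""
    zero = df[zero_preds].to_numpy().max(axis=1)  # best score among class-0 columns
    one = df[one_preds].to_numpy().max(axis=1)    # best score among class-1 columns
    # Assumed rule: keep the class-1 score if it wins, else flip the class-0 score
    df["pred"] = np.where(one >= zero, one, 1.0 - zero)


preds = pd.DataFrame({"pred_0": [0.6, 0.1], "pred_1": [0.1, 0.2], "pred_2": [0.3, 0.7]})
to_binary_sketch(preds, zero_preds=["pred_0", "pred_1"], one_preds=["pred_2"])
```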
- lumin.utils.misc.to_device(x, device=device(type='cpu'))
Recursively place Tensor(s) onto device.
- Parameters:
  - x (Union[Tensor, List[Tensor]]) – Tensor(s) to place on device
- Return type:
Union[Tensor, List[Tensor]]
- Returns:
Tensor(s) on device
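The recursion can be sketched as follows: lists are walked element by element and every leaf tensor is moved with Tensor.to, so nested structure is preserved. to_device_sketch is a hypothetical re-implementation; it handles only tensors and (nested) lists, matching the types documented above.

```python
from typing import List, Union

import torch


def to_device_sketch(x: Union[torch.Tensor, List], device: torch.device = torch.device("cpu")):
    """Hypothetical: move a tensor, or an arbitrarily nested list of tensors, to device."""
    if isinstance(x, list):
        # Recurse so nested lists keep their structure
        return [to_device_sketch(o, device) for o in x]
    return x.to(device)


device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
batch = [torch.zeros(2), [torch.ones(3), torch.arange(4)]]
moved = to_device_sketch(batch, device)
```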
lumin.utils.multiprocessing module
- lumin.utils.multiprocessing.mp_run(args, func)
Run multiple instances of a function simultaneously using a list of argument dictionaries. The given function is run once per entry in the args list.
Important
The function should put a dictionary of results into the mp.Queue, and each result key should be unique, otherwise results will overwrite one another.
- Parameters:
  - args (List[Dict[Any, Any]]) – list of dictionaries of arguments
  - func (Callable[[Any], Any]) – function to which to pass dictionary arguments
- Return type:
Dict[Any, Any]
- Returns:
Dictionary of results
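A minimal sketch of the pattern described above (one process per argument dictionary, each putting a dictionary of results into a shared queue), written with the standard multiprocessing module. The worker square_worker and the helper run_parallel are hypothetical. The explicit "fork" start method is an assumption that only holds on POSIX systems; with "spawn" the calling code would need an if __name__ == "__main__" guard.

```python
import multiprocessing as mp
from typing import Any, Callable, Dict, List


def square_worker(args: Dict[str, Any], out_q) -> None:
    # Hypothetical worker: puts a dict whose key is unique to this process
    out_q.put({args["name"]: args["x"] ** 2})


def run_parallel(args: List[Dict[str, Any]], func: Callable) -> Dict[Any, Any]:
    ctx = mp.get_context("fork")  # assumption: POSIX system where fork is available
    out_q = ctx.Queue()
    procs = [ctx.Process(target=func, args=(a, out_q)) for a in args]
    for p in procs:
        p.start()
    results: Dict[Any, Any] = {}
    for _ in procs:
        # Drain the queue before joining, otherwise a child can block on a full pipe
        results.update(out_q.get())
    for p in procs:
        p.join()
    return results


results = run_parallel([{"name": "a", "x": 2}, {"name": "b", "x": 3}], square_worker)
```

Because results are merged with dict.update, a repeated key from two workers would silently keep only the later value, which is why unique keys matter.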
lumin.utils.statistics module
- lumin.utils.statistics.bootstrap_stats(args, out_q=None)
Computes statistics and KDEs of data via sampling with replacement.
- Parameters:
  - args (Dict[str, Any]) – dictionary of arguments. Possible keys are:
    - data – data to resample
    - name – name prepended to returned keys in result dict
    - weights – array of weights matching length of data to use for weighted resampling
    - n – number of times to resample data
    - x – points at which to compute the KDE values of resampled data
    - kde – whether to compute the KDE values at x-points for resampled data
    - mean – whether to compute the means of the resampled data
    - std – whether to compute the standard deviation of the resampled data
    - c68 – whether to compute the width of the absolute central 68.2 percentile of the resampled data
  - out_q (Optional[Queue]) – if using multiprocessing, the result dictionary can be placed in the provided queue
- Return type:
Optional[Dict[str, Any]]
- Returns:
Result dictionary if out_q is None else None.
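The resampling loop at the core of such a function can be sketched in NumPy: draw len(data) points with replacement n times, optionally weighted, and record a statistic per resample. bootstrap_sketch is hypothetical; the key suffixes _mean and _std are assumptions, and the KDE and c68 options are omitted.

```python
from typing import Any, Dict, Optional

import numpy as np


def bootstrap_sketch(data: np.ndarray, name: str, n: int = 100,
                     weights: Optional[np.ndarray] = None, seed: int = 0) -> Dict[str, Any]:
    """Hypothetical bootstrap: resample data with replacement n times, recording mean and std."""
    rng = np.random.default_rng(seed)
    p = None if weights is None else weights / weights.sum()  # weighted resampling
    means, stds = [], []
    for _ in range(n):
        sample = rng.choice(data, size=len(data), replace=True, p=p)
        means.append(sample.mean())
        stds.append(sample.std())
    # Keys are prefixed with name; the suffixes are assumed for illustration
    return {f"{name}_mean": means, f"{name}_std": stds}


res = bootstrap_sketch(np.arange(100.0), name="x", n=200)
spread = np.std(res["x_mean"])  # bootstrap estimate of the standard error of the mean
```

For the integers 0 to 99 the spread should land near σ/√n ≈ 28.9/10 ≈ 2.9, the analytic standard error of the mean.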
- lumin.utils.statistics.get_moments(arr)
Computes mean and std of data, and their associated uncertainties.
- Parameters:
  - arr (ndarray) – univariate data
- Return type:
Tuple[float, float, float, float]
- Returns:
mean
statistical uncertainty of mean
standard deviation
statistical uncertainty of standard deviation
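The docs do not state which uncertainty estimators are used. A common choice, assumed here, is the standard error of the mean, s/√n, and the normal-approximation standard error of the standard deviation, s/√(2(n−1)), which follows from Var(s²) = 2σ⁴/(n−1) via the delta method. The sketch below computes both; LUMIN's own estimators may differ, especially for non-Gaussian data.

```python
from typing import Tuple

import numpy as np


def moments_sketch(arr: np.ndarray) -> Tuple[float, float, float, float]:
    """Mean, its standard error, std, and a Gaussian-approximation standard error on the std."""
    n = len(arr)
    mean = float(np.mean(arr))
    std = float(np.std(arr, ddof=1))  # sample standard deviation
    se_mean = std / np.sqrt(n)
    # Valid for normally distributed data; heavy tails need a fourth-moment correction
    se_std = std / np.sqrt(2 * (n - 1))
    return mean, se_mean, std, se_std


m, sm, s, ss = moments_sketch(np.array([1.0, 2.0, 3.0, 4.0, 5.0]))
```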
- lumin.utils.statistics.uncert_round(value, uncert)
Round value according to given uncertainty, using one significant figure of the uncertainty.
- Parameters:
  - value (float) – value to round
  - uncert (float) – uncertainty of value
- Return type:
Tuple[float, float]
- Returns:
rounded value
rounded uncertainty
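One way to implement this rule: find the decimal position of the uncertainty's leading digit with log10, then round both numbers to that position. The sketch below is a hypothetical version; returning zero or non-finite uncertainties unchanged is an assumption, and LUMIN's handling of such edge cases may differ.

```python
import math
from typing import Tuple


def uncert_round_sketch(value: float, uncert: float) -> Tuple[float, float]:
    """Round uncert to one significant figure and value to the same decimal place."""
    if uncert <= 0 or not math.isfinite(uncert):
        return value, uncert  # nothing sensible to round against
    # Decimal position of the uncertainty's leading digit, e.g. 0.0321 -> 2, 56.0 -> -1
    digits = -math.floor(math.log10(uncert))
    return round(value, digits), round(uncert, digits)


small_case = uncert_round_sketch(1.23456, 0.0321)
large_case = uncert_round_sketch(1234.5, 56.0)
```

For example, 1.23456 ± 0.0321 becomes 1.23 ± 0.03, and 1234.5 ± 56 becomes 1230 ± 60.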