lumin.utils package
Submodules
lumin.utils.data module
-
lumin.utils.data.check_val_set(train, val, test=None, n_folds=None)
Method to check validation set suitability by seeing whether Random Forests can predict whether events belong to one dataset or another. If a FoldYielder is passed, then trainings are run once per fold and averaged. Will compute the ROC AUC for set discrimination (should be close to 0.5) and compute the feature importances to aid removal of discriminating features.
- Parameters
train (Union[DataFrame, ndarray, FoldYielder]) – training data
val (Union[DataFrame, ndarray, FoldYielder]) – validation data
test (Union[DataFrame, ndarray, FoldYielder, None]) – optional testing data
n_folds (Optional[int]) – if set and if passed a FoldYielder, will only use the first n_folds folds
- Return type
None
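The reason a ROC AUC near 0.5 indicates a suitable validation set can be illustrated without the library. The sketch below is not the check_val_set implementation: instead of a Random Forest it uses a single feature as the discriminating score, and the helper name roc_auc and the toy Gaussian data are assumptions made for illustration.

```python
import random


def roc_auc(scores_a, scores_b):
    """Probability that a random sample from b scores higher than one from a (ties count half)."""
    wins = 0.0
    for b in scores_b:
        for a in scores_a:
            if b > a:
                wins += 1.0
            elif b == a:
                wins += 0.5
    return wins / (len(scores_a) * len(scores_b))


rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(500)]
val_same = [rng.gauss(0.0, 1.0) for _ in range(500)]     # drawn like the training data
val_shifted = [rng.gauss(1.5, 1.0) for _ in range(500)]  # systematically different

auc_same = roc_auc(train, val_same)        # close to 0.5: sets are indistinguishable
auc_shifted = roc_auc(train, val_shifted)  # well above 0.5: sets can be told apart
print(f"same: {auc_same:.3f}, shifted: {auc_shifted:.3f}")
```

A feature that produces an AUC far from 0.5 on its own would be a candidate for the discriminating features that the importances are meant to expose.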
lumin.utils.misc module
-
lumin.utils.misc.to_np(x)
Convert Tensor x to a Numpy array
- Parameters
x (Tensor) – Tensor to convert
- Return type
ndarray
- Returns
x as a Numpy array
-
lumin.utils.misc.to_device(x, device=device(type='cpu'))
Recursively place Tensor(s) onto device
- Parameters
x (Union[Tensor, List[Tensor]]) – Tensor(s) to place on device
- Return type
Union[Tensor, List[Tensor]]
- Returns
Tensor(s) on device
-
lumin.utils.misc.to_tensor(x)
Convert Numpy array to Tensor with possibility of a None being passed
- Parameters
x (Optional[ndarray]) – Numpy array or None
- Return type
Optional[Tensor]
- Returns
x as Tensor or None
-
lumin.utils.misc.str2bool(string)
Convert string representation of Boolean to bool
- Parameters
string (Union[str, bool]) – string representation of Boolean (or a Boolean)
- Return type
bool
- Returns
The bool itself if a bool was passed; otherwise True if the lowercased string is in (“yes”, “true”, “t”, “1”), else False
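The documented behaviour can be sketched as follows. This is a re-implementation written from the description above, not the library source; the name str2bool_sketch is hypothetical.

```python
from typing import Union


def str2bool_sketch(string: Union[str, bool]) -> bool:
    """Return bools unchanged; otherwise test the lowercased string against the accepted values."""
    if isinstance(string, bool):
        return string
    return string.lower() in ("yes", "true", "t", "1")


print(str2bool_sketch("Yes"))   # True
print(str2bool_sketch("0"))     # False
print(str2bool_sketch(False))   # False
```

Any string outside the accepted set, including "no", "false" and the empty string, maps to False.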
-
lumin.utils.misc.to_binary_class(df, zero_preds, one_preds)
Map class predictions back to a binary prediction. The maximum prediction for features listed in zero_preds is treated as the prediction for class 0, and vice versa for one_preds. The binary prediction is added to df in place as column ‘pred’
- Parameters
df (DataFrame) – DataFrame containing prediction features
zero_preds (List[str]) – list of column names for predictions associated with class 0
one_preds (List[str]) – list of column names for predictions associated with class 1
- Return type
None
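A minimal sketch of the mapping, using plain dictionaries as stand-ins for DataFrame rows. The description above does not say how the two maxima are combined into ‘pred’; this sketch assumes a hard 0/1 label chosen by whichever maximum is larger, which may differ from what the library stores. The function and column names are hypothetical.

```python
from typing import Dict, List


def to_binary_sketch(rows: List[Dict[str, float]], zero_preds: List[str], one_preds: List[str]) -> None:
    """Add a 'pred' entry to each row in place (assumed rule: 1 if the class-1 maximum wins)."""
    for row in rows:
        zero = max(row[c] for c in zero_preds)  # strongest class-0 prediction
        one = max(row[c] for c in one_preds)    # strongest class-1 prediction
        row["pred"] = 1 if one > zero else 0


rows = [
    {"bkg_a": 0.70, "bkg_b": 0.10, "sig_a": 0.15, "sig_b": 0.05},
    {"bkg_a": 0.10, "bkg_b": 0.20, "sig_a": 0.60, "sig_b": 0.10},
]
to_binary_sketch(rows, zero_preds=["bkg_a", "bkg_b"], one_preds=["sig_a", "sig_b"])
print([row["pred"] for row in rows])  # [0, 1]
```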
-
lumin.utils.misc.ids2unique(ids)
Map a permutation of integers to a unique number, or a 2D array of integers to unique numbers by row. Returned numbers are unique for a given permutation of integers. This is achieved by computing the product of primes raised to powers equal to the integers. Because of this, it can be easy to produce numbers which are too large to be stored if many (large) integers are passed.
- Parameters
ids (Union[List[int], ndarray]) – (array of) permutation(s) of integers to map
- Return type
ndarray
- Returns
(Array of) unique id(s) for given permutation(s)
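The prime-power encoding can be sketched in pure Python. This assumes the i-th integer becomes the exponent of the i-th prime, which matches the description above but is not taken from the library source; first_primes and unique_id are hypothetical names.

```python
from typing import List


def first_primes(n: int) -> List[int]:
    """Return the first n primes by trial division against the primes found so far."""
    primes: List[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def unique_id(ids: List[int]) -> int:
    """Multiply the i-th prime raised to the i-th integer."""
    result = 1
    for prime, power in zip(first_primes(len(ids)), ids):
        result *= prime ** power
    return result


print(unique_id([1, 2, 3]))  # 2**1 * 3**2 * 5**3 = 2250
print(unique_id([3, 2, 1]))  # 2**3 * 3**2 * 5**1 = 360
```

Python integers have arbitrary precision, so this sketch never overflows. Fixed-width NumPy integers do, which is the size limitation noted in the description.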
-
class lumin.utils.misc.ForwardHook(module, hook_fn=None)
Bases: object
Create a hook for performing an action based on the forward pass through a nn.Module
- Parameters
module (Module) – nn.Module to hook
hook_fn (Optional[Callable[[Module, Union[Tensor, Tuple[Tensor]], Union[Tensor, Tuple[Tensor]]], None]]) – Optional function to perform. Default is to record input and output of module
- Examples
>>> hook = ForwardHook(model.tail.dense)
>>> model.predict(inputs)
>>> print(hook.inputs)
-
class lumin.utils.misc.BackwardHook(module, hook_fn=None)
Bases: lumin.utils.misc.ForwardHook
Create a hook for performing an action based on the backward pass through a nn.Module
- Parameters
module (Module) – nn.Module to hook
hook_fn (Optional[Callable[[Module, Union[Tensor, Tuple[Tensor]], Union[Tensor, Tuple[Tensor]]], None]]) – Optional function to perform. Default is to record input and output of module
- Examples
>>> hook = BackwardHook(model.tail.dense)
>>> model.predict(inputs)
>>> print(hook.inputs)
-
lumin.utils.misc.subsample_df(df, objective, targ_name, n_samples=None, replace=False, strat_key=None, wgt_name=None)
Subsamples, or samples with replacement, a DataFrame. Will automatically reweight data such that weight sums remain the same as the original DataFrame (per class)
- Parameters
df (DataFrame) – DataFrame to sample
objective (str) – string representation of objective: either ‘classification’ or ‘regression’
targ_name (str) – name of column containing target data
n_samples (Optional[int]) – If set, will sample that number of data points, otherwise will sample with replacement a new DataFrame of the same size as the original
replace (bool) – whether to sample with replacement
strat_key (Optional[str]) – column name to use for stratified subsampling, if desired
wgt_name (Optional[str]) – name of column containing weight data. If set, will reweight subsampled data, otherwise will not
- Return type
DataFrame
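The per-class reweighting can be illustrated with a small pure-Python sketch in which dictionaries stand in for DataFrame rows. It covers only sampling without replacement for a classification target. The function name, the "target" and "weight" keys, and the seed argument are assumptions for illustration, not part of the library.

```python
import random
from collections import defaultdict


def subsample_and_reweight(rows, n_samples, seed=0):
    """Draw n_samples rows without replacement, then rescale weights so each class keeps its original weight sum."""
    rng = random.Random(seed)
    sample = [dict(r) for r in rng.sample(rows, n_samples)]  # copy rows so the input is untouched

    orig_sums = defaultdict(float)
    for r in rows:
        orig_sums[r["target"]] += r["weight"]

    new_sums = defaultdict(float)
    for r in sample:
        new_sums[r["target"]] += r["weight"]

    for r in sample:
        r["weight"] *= orig_sums[r["target"]] / new_sums[r["target"]]
    return sample


rows = [{"target": i % 2, "weight": 1.0 + 0.1 * i} for i in range(20)]
sample = subsample_and_reweight(rows, n_samples=10)
for cls in sorted({r["target"] for r in sample}):
    total = sum(r["weight"] for r in sample if r["target"] == cls)
    print(cls, round(total, 6))
```

After rescaling, each class present in the sample carries the same total weight as it did in the full dataset, even though fewer rows remain.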
lumin.utils.multiprocessing module
-
lumin.utils.multiprocessing.mp_run(args, func)
Run multiple instances of a function simultaneously by using a list of argument dictionaries. Runs the given function once per entry in the args list.
Important
The function should put a dictionary of results into the mp.Queue, and each result key should be unique; otherwise results will overwrite one another.
- Parameters
args (List[Dict[Any, Any]]) – list of dictionaries of arguments
func (Callable[[Any], Any]) – function to which to pass dictionary arguments
- Return type
Dict[Any, Any]
- Returns
Dictionary of results
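The requirement for unique result keys can be demonstrated with a sequential stand-in. To stay runnable anywhere, the sketch below calls the workers one after another against a queue.Queue rather than starting real processes with an mp.Queue, so it shows only how results are merged, not the parallelism. The (args, out_q) worker signature mirrors bootstrap_stats from this package; collect and both worker functions are hypothetical.

```python
import queue


def worker(args, out_q):
    # Prefix the result key with the job's name so every job writes a distinct key.
    out_q.put({f"{args['name']}_sum": sum(args["data"])})


def colliding_worker(args, out_q):
    # Every job writes the same key, so later results replace earlier ones on merge.
    out_q.put({"sum": sum(args["data"])})


def collect(arg_list, func):
    """Run func once per argument dictionary and merge all result dictionaries."""
    out_q = queue.Queue()
    for args in arg_list:
        func(args, out_q)
    results = {}
    while not out_q.empty():
        results.update(out_q.get())
    return results


jobs = [{"name": "a", "data": [1, 2, 3]}, {"name": "b", "data": [10, 20]}]
good = collect(jobs, worker)
bad = collect(jobs, colliding_worker)
print(good)  # {'a_sum': 6, 'b_sum': 30}
print(bad)   # {'sum': 30}
```

Including a per-job name in each key, as the name argument of bootstrap_stats allows, is one way to keep results from overwriting one another.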
lumin.utils.statistics module
-
lumin.utils.statistics.bootstrap_stats(args, out_q=None)
Computes statistics and KDEs of data via sampling with replacement
- Parameters
args (Dict[str, Any]) – dictionary of arguments. Possible keys are:
data - data to resample
name - name prepended to returned keys in result dict
weights - array of weights matching length of data to use for weighted resampling
n - number of times to resample data
x - points at which to compute the KDE values of resampled data
kde - whether to compute the KDE values at x-points for resampled data
mean - whether to compute the means of the resampled data
std - whether to compute standard deviation of resampled data
c68 - whether to compute the width of the absolute central 68.2 percentile of the resampled data
out_q (Optional[Queue]) – if using multiprocessing, the result dictionary can be placed in the provided queue
- Return type
Union[None, Dict[str, Any]]
- Returns
Result dictionary if out_q is None else None.
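A minimal sketch of bootstrap resampling, covering only the mean and std options and omitting the KDE, c68, and queue handling. It is not the library implementation: the result key format ("<name>_mean", "<name>_std") and the "seed" key are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_sketch(args):
    """Resample args['data'] with replacement args['n'] times and record the mean and std of each resample."""
    rng = random.Random(args.get("seed", 0))  # 'seed' is a hypothetical key for reproducibility
    data = args["data"]
    name = args.get("name", "")
    n = args.get("n", 100)
    means, stds = [], []
    for _ in range(n):
        # weights=None gives uniform resampling; otherwise resampling is weighted
        resample = rng.choices(data, weights=args.get("weights"), k=len(data))
        means.append(statistics.fmean(resample))
        stds.append(statistics.stdev(resample))
    return {f"{name}_mean": means, f"{name}_std": stds}


rng = random.Random(1)
data = [rng.gauss(5.0, 2.0) for _ in range(200)]
result = bootstrap_sketch({"data": data, "name": "x", "n": 50})
print(len(result["x_mean"]), len(result["x_std"]))  # 50 50
```

The spread of the 50 resampled means gives an estimate of the uncertainty on the mean of the original data.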
-
lumin.utils.statistics.get_moments(arr)
Computes mean and std of data, and their associated uncertainties
- Parameters
arr (ndarray) – univariate data
- Return type
Tuple[float, float, float, float]
- Returns
mean
statistical uncertainty of mean
standard deviation
statistical uncertainty of standard deviation
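The four returned quantities can be sketched with the standard library. The formulas for the uncertainties are assumptions: the standard error of the mean, and the Gaussian approximation for the uncertainty of the standard deviation. The library may use a different estimator, and moments_sketch is a hypothetical name.

```python
import math
import statistics


def moments_sketch(arr):
    """Return (mean, uncertainty of mean, std, uncertainty of std) for univariate data."""
    n = len(arr)
    mean = statistics.fmean(arr)
    std = statistics.stdev(arr)             # sample standard deviation (n - 1 denominator)
    mean_unc = std / math.sqrt(n)           # standard error of the mean
    std_unc = std / math.sqrt(2 * (n - 1))  # assumes approximately Gaussian data
    return mean, mean_unc, std, std_unc


mean, mean_unc, std, std_unc = moments_sketch([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"{mean:.3f} +- {mean_unc:.3f}, {std:.3f} +- {std_unc:.3f}")
```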
-
lumin.utils.statistics.uncert_round(value, uncert)
Round value according to given uncertainty using one significant figure of the uncertainty
- Parameters
value (float) – value to round
uncert (float) – uncertainty of value
- Return type
Tuple[float, float]
- Returns
rounded value
rounded uncertainty
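The rounding rule can be sketched as follows: find the decimal position of the leading digit of the uncertainty, then round both numbers at that position. This is a sketch of the rule as described, not the library code. The name uncert_round_sketch and the handling of a zero uncertainty are assumptions.

```python
import math


def uncert_round_sketch(value, uncert):
    """Round value and uncert to the decimal place of the uncertainty's first significant figure."""
    if uncert == 0:
        return value, uncert  # assumed behaviour: nothing to round against
    exponent = math.floor(math.log10(abs(uncert)))  # position of the leading digit
    return round(value, -exponent), round(uncert, -exponent)


print(uncert_round_sketch(3.14159, 0.0234))  # (3.14, 0.02)
print(uncert_round_sketch(1234.5, 67.8))     # (1230.0, 70.0)
```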