lumin.nn.models package
Submodules
lumin.nn.models.helpers module
class lumin.nn.models.helpers.CatEmbedder(cat_names, cat_szs, emb_szs=None, max_emb_sz=50, emb_load_path=None)
Bases: object

Helper class for embedding categorical features. Designed to be passed to ModelBuilder. Note that the classmethod from_fy() may be used to instantiate a CatEmbedder from a FoldYielder.

Parameters:
- cat_names (List[str]) – list of names of categorical features, in the order in which they will be passed as input columns
- cat_szs (List[int]) – list of cardinalities (number of unique elements) for each feature
- emb_szs (Optional[List[int]]) – optional list of embedding sizes for each feature. If None, will use min(max_emb_sz, (1+sz)//2)
- max_emb_sz (int) – maximum size of embedding if emb_szs is None
- emb_load_path (Union[Path, str, None]) – if not None, will cause ModelBuilder to attempt to load pretrained embeddings from path
Examples::
    >>> cat_embedder = CatEmbedder(cat_names=['n_jets', 'channel'], cat_szs=[5, 3])
    >>>
    >>> cat_embedder = CatEmbedder(cat_names=['n_jets', 'channel'], cat_szs=[5, 3], emb_szs=[2, 2])
    >>>
    >>> cat_embedder = CatEmbedder(cat_names=['n_jets', 'channel'], cat_szs=[5, 3], emb_szs=[2, 2], emb_load_path=Path('weights'))
calc_emb_szs()
Method used to set the size of the embedding for each categorical feature when no embedding sizes are explicitly passed. Uses the rule of thumb min(max_emb_sz, (1+cardinality)//2), where max_emb_sz defaults to 50.
Return type: None
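The default sizing rule can be checked by hand with a few lines of plain Python. This is a minimal illustration of the min(max_emb_sz, (1+sz)//2) rule documented above, not LUMIN's own code; default_emb_szs is a hypothetical helper name.

```python
from typing import List


def default_emb_szs(cat_szs: List[int], max_emb_sz: int = 50) -> List[int]:
    """Hypothetical helper: embedding size per categorical feature.

    Applies the rule of thumb min(max_emb_sz, (1 + cardinality) // 2)
    to each feature cardinality in cat_szs.
    """
    return [min(max_emb_sz, (1 + sz) // 2) for sz in cat_szs]


# Cardinalities matching the CatEmbedder examples: n_jets (5) and channel (3)
print(default_emb_szs([5, 3]))        # [3, 2]
# A high-cardinality feature is capped at max_emb_sz
print(default_emb_szs([1000], 50))    # [50]
```

Integer division keeps the sizes whole numbers, and the cap stops very high-cardinality features from producing oversized embeddings.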
classmethod from_fy(fy, emb_szs=None, max_emb_sz=50, emb_load_path=None)
Instantiate a CatEmbedder from a FoldYielder, i.e. avoid having to pass cat_names and cat_szs.

Parameters:
- fy (FoldYielder) – FoldYielder with training data
- emb_szs (Optional[List[int]]) – optional list of embedding sizes for each feature. If None, will use min(max_emb_sz, (1+sz)//2)
- max_emb_sz (int) – maximum size of embedding if emb_szs is None
- emb_load_path (Union[Path, str, None]) – if not None, will cause ModelBuilder to attempt to load pretrained embeddings from path
Returns: instantiated CatEmbedder
Examples::
    >>> cat_embedder = CatEmbedder.from_fy(train_fy)
    >>>
    >>> cat_embedder = CatEmbedder.from_fy(train_fy, emb_szs=[2, 2])
    >>>
    >>> cat_embedder = CatEmbedder.from_fy(train_fy, emb_szs=[2, 2], emb_load_path=Path('weights'))
lumin.nn.models.helpers.Embedder(cat_names, cat_szs, emb_szs=None, max_emb_sz=50, emb_load_path=None)
Attention: Deprecated in favour of CatEmbedder and will be removed in v0.4.
lumin.nn.models.initialisations module

lumin.nn.models.initialisations.lookup_normal_init(act, fan_in=None, fan_out=None)
Lookup for weight initialisation using Normal distributions.

Parameters:
- act (str) – string representation of activation function
- fan_in (Optional[int]) – number of inputs to neuron
- fan_out (Optional[int]) – number of outputs from neuron
Return type: Callable[[Tensor], None]
Returns: callable to initialise weight tensor
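The pattern of looking up an initialiser from the activation name and returning a callable that fills a weight tensor in place can be sketched with the standard library. The activation mapping below (He/Kaiming normal for ReLU-like activations, LeCun normal for SELU, Glorot/Xavier normal otherwise) is an assumption chosen for illustration and may differ from LUMIN's actual lookup; the helper names are hypothetical, and a 2D list stands in for a tensor.

```python
import math
import random
from typing import Callable, List, Optional


def normal_std(act: str, fan_in: int, fan_out: Optional[int] = None) -> float:
    """Hypothetical: standard deviation for a Normal initialisation given an activation."""
    if act in ('relu', 'prelu'):
        return math.sqrt(2.0 / fan_in)            # He (Kaiming) normal
    if act == 'selu':
        return math.sqrt(1.0 / fan_in)            # LeCun normal
    if fan_out is None:
        raise ValueError('fan_out is required for Glorot initialisation')
    return math.sqrt(2.0 / (fan_in + fan_out))    # Glorot (Xavier) normal


def lookup_normal_init_sketch(act: str, fan_in: int,
                              fan_out: Optional[int] = None) -> Callable[[List[List[float]]], None]:
    """Return a callable that fills a 2D list (stand-in for a weight tensor) in place."""
    std = normal_std(act, fan_in, fan_out)

    def init(weight: List[List[float]]) -> None:
        for row in weight:
            for j in range(len(row)):
                row[j] = random.gauss(0.0, std)
    return init


w = [[0.0] * 4 for _ in range(3)]          # 3 outputs x 4 inputs
lookup_normal_init_sketch('relu', fan_in=4)(w)
print(round(normal_std('relu', 4), 4))     # 0.7071
```

Returning a closure rather than initialising directly mirrors the documented Callable[[Tensor], None] return type: the caller decides when, and to which weight, the initialiser is applied.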
lumin.nn.models.initialisations.lookup_uniform_init(act, fan_in=None, fan_out=None)
Lookup for weight initialisation using Uniform distributions.

Parameters:
- act (str) – string representation of activation function
- fan_in (Optional[int]) – number of inputs to neuron
- fan_out (Optional[int]) – number of outputs from neuron
Return type: Callable[[Tensor], None]
Returns: callable to initialise weight tensor
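For Uniform distributions the same choices are usually expressed as a bound a, sampling from U(-a, a). Since the variance of U(-a, a) is a²/3, setting a = sqrt(3) × std matches the Normal variant. A minimal sketch under the same assumed activation mapping as a Normal lookup (not necessarily LUMIN's); the helper names are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional


def uniform_bound(act: str, fan_in: int, fan_out: Optional[int] = None) -> float:
    """Hypothetical: bound a for U(-a, a), matching the variance of the Normal choice."""
    if act in ('relu', 'prelu'):
        return math.sqrt(6.0 / fan_in)              # He (Kaiming) uniform
    if act == 'selu':
        return math.sqrt(3.0 / fan_in)              # LeCun uniform
    if fan_out is None:
        raise ValueError('fan_out is required for Glorot initialisation')
    return math.sqrt(6.0 / (fan_in + fan_out))      # Glorot (Xavier) uniform


def lookup_uniform_init_sketch(act: str, fan_in: int,
                               fan_out: Optional[int] = None) -> Callable[[List[List[float]]], None]:
    """Return a callable that fills a 2D list (stand-in for a weight tensor) in place."""
    a = uniform_bound(act, fan_in, fan_out)

    def init(weight: List[List[float]]) -> None:
        for row in weight:
            for j in range(len(row)):
                row[j] = random.uniform(-a, a)
    return init


print(uniform_bound('sigmoid', fan_in=2, fan_out=1))   # 1.4142135623730951
```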
lumin.nn.models.model module

class lumin.nn.models.model.Model(model_builder=None)
Bases: lumin.nn.models.abs_model.AbsModel

Wrapper class to handle training and inference of NNs created via a ModelBuilder. Note that saved models can be instantiated directly via the from_save() classmethod.

Parameters:
- model_builder (Optional[ModelBuilder]) – ModelBuilder which will construct the network, loss, and optimiser

Examples::
    >>> model = Model(model_builder)
evaluate(inputs, targets, weights=None, callbacks=None, mask_inputs=True)
Compute loss on provided data.

Parameters:
- inputs (Tensor) – input data as tensor on device
- targets (Tensor) – targets as tensor on device
- weights (Optional[Tensor]) – optional weights as tensor on device
- callbacks (Optional[List[AbsCallback]]) – list of any callbacks to use during evaluation
- mask_inputs (bool) – whether to apply input mask if one has been set
Return type: float
Returns: (weighted) loss of model predictions on provided data
export2onnx(name, bs=1)
Export network to ONNX format. Note that ONNX expects a fixed batch size (bs), which is the number of datapoints you wish to pass through the model concurrently.

Parameters:
- name (str) – filename for exported file
- bs (int) – batch size for exported models
Return type: None
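Because the exported model expects a fixed batch size, a caller holding an arbitrary number of datapoints typically has to split them into chunks of exactly bs rows, padding the final chunk and discarding the padded predictions afterwards. A minimal, library-free sketch of that bookkeeping; fixed_size_batches is a hypothetical helper, not part of LUMIN or ONNX, and padding by repeating the last row is just one possible choice.

```python
from typing import Iterator, List, Sequence, Tuple


def fixed_size_batches(rows: Sequence[List[float]],
                       bs: int) -> Iterator[Tuple[List[List[float]], int]]:
    """Yield (batch, n_real) pairs where every batch has exactly bs rows.

    The last batch is padded by repeating its final row; n_real tells the
    caller how many leading predictions in that batch are genuine.
    """
    if bs < 1:
        raise ValueError('bs must be positive')
    for start in range(0, len(rows), bs):
        batch = [list(r) for r in rows[start:start + bs]]
        n_real = len(batch)
        batch.extend([list(batch[-1]) for _ in range(bs - n_real)])
        yield batch, n_real


data = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
for batch, n_real in fixed_size_batches(data, bs=2):
    print(len(batch), n_real)   # prints "2 2", then "2 1"
```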
export2tfpb(name, bs=1)
Export network to TensorFlow Protocol Buffer format, via ONNX. Note that ONNX expects a fixed batch size (bs), which is the number of datapoints you wish to pass through the model concurrently.

Parameters:
- name (str) – filename for exported file
- bs (int) – batch size for exported models
Return type: None
fit(batch_yielder, callbacks=None)
Fit network for one complete iteration of a BatchYielder, i.e. one (sub-)epoch.

Parameters:
- batch_yielder (BatchYielder) – BatchYielder providing training data in the form of a tuple of inputs, targets, and weights as tensors on device
- callbacks (Optional[List[AbsCallback]]) – list of AbsCallback to be used during training
Return type: float
Returns: loss on training data averaged across all minibatches
classmethod from_save(name, model_builder)
Instantiate a Model and load saved state from file.

Parameters:
- name (str) – name of file containing saved state
- model_builder (ModelBuilder) – ModelBuilder which was used to construct the network
Return type: AbsModel
Returns: instantiated Model with network weights, optimiser state, and input mask loaded from saved state

Examples::
    >>> model = Model.from_save('weights/model.h5', model_builder)
get_feat_importance(fy, eval_metric=None)
Call get_nn_feat_importance(), passing this Model and the provided arguments.

Parameters:
- fy (FoldYielder) – FoldYielder interfacing to data on which to evaluate importance
- eval_metric (Optional[EvalMetric]) – optional EvalMetric to use for quantifying performance
Return type: DataFrame
get_lr()
Get learning rate of optimiser.
Return type: float
Returns: learning rate of optimiser
get_mom()
Get momentum/beta_1 of optimiser.
Return type: float
Returns: momentum/beta_1 of optimiser
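Which hyperparameter counts as the "momentum" depends on the optimiser: SGD stores a momentum value, whereas Adam-style optimisers store a pair of betas whose first element plays that role. PyTorch optimisers keep these settings in optimizer.param_groups, a list of dicts. The sketch below mimics that structure with plain dicts; get_mom_sketch is hypothetical and only illustrates the lookup, not LUMIN's implementation.

```python
from typing import Any, Dict, List


def get_mom_sketch(param_groups: List[Dict[str, Any]]) -> float:
    """Return momentum (SGD-style) or beta_1 (Adam-style) from the first param group."""
    group = param_groups[0]
    if 'betas' in group:
        return group['betas'][0]      # Adam/RAdam-style: (beta_1, beta_2)
    if 'momentum' in group:
        return group['momentum']      # SGD-style
    raise KeyError('optimiser has no momentum or betas hyperparameter')


adam_groups = [{'lr': 1e-3, 'betas': (0.9, 0.999)}]
sgd_groups = [{'lr': 1e-2, 'momentum': 0.8}]
print(get_mom_sketch(adam_groups), get_mom_sketch(sgd_groups))   # 0.9 0.8
```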
get_out_size()
Get number of outputs of model.
Return type: int
Returns: number of outputs of model
get_param_count(trainable=True)
Return number of parameters in model.

Parameters:
- trainable (bool) – if True (default), only count trainable parameters
Return type: int
Returns: number of (trainable) parameters in model
get_weights()
Get state_dict of weights for network.
Return type: OrderedDict
Returns: state_dict of weights for network
load(name, model_builder=None)
Load model, optimiser, and input mask states from file.

Parameters:
- name (str) – name of save file
- model_builder (Optional[ModelBuilder]) – if Model was not initialised with a ModelBuilder, you will need to pass one here
Return type: None
predict(inputs, as_np=True, pred_name='pred')
Apply model to input data and compute predictions. A compatibility method to call predict_array() or predict_folds(), depending on input type.

Parameters:
- inputs (Union[ndarray, DataFrame, Tensor, FoldYielder]) – input data as Numpy array, Pandas DataFrame, or tensor on device, or FoldYielder interfacing to data
- as_np (bool) – whether to return predictions as Numpy array (otherwise tensor) if inputs are a Numpy array, Pandas DataFrame, or tensor
- pred_name (str) – name of group to which to save predictions if inputs are a FoldYielder
Return type: Union[ndarray, Tensor, None]
Returns: if inputs are a Numpy array, Pandas DataFrame, or tensor, will return predictions as either array or tensor; if inputs are a FoldYielder, predictions are saved to the fold file and None is returned
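The type-based dispatch performed by predict() can be illustrated with a small, self-contained sketch. Everything here is hypothetical: FoldYielderStub, predict_array_stub, and predict_folds_stub stand in for LUMIN's FoldYielder, predict_array(), and predict_folds(), and the as_np option is omitted.

```python
from typing import Any, Dict, List, Optional


class FoldYielderStub:
    """Hypothetical stand-in for a FoldYielder; only used for type dispatch."""
    def __init__(self, n_folds: int) -> None:
        self.n_folds = n_folds
        self.saved: Dict[str, bool] = {}


def predict_array_stub(inputs: List[List[float]]) -> List[float]:
    # Stand-in "network": sum each row of inputs
    return [sum(row) for row in inputs]


def predict_folds_stub(fy: FoldYielderStub, pred_name: str) -> None:
    # Stand-in: record that predictions were saved under pred_name
    fy.saved[pred_name] = True


def predict_sketch(inputs: Any, pred_name: str = 'pred') -> Optional[List[float]]:
    """Dispatch on input type, mirroring the behaviour described for predict()."""
    if isinstance(inputs, FoldYielderStub):
        predict_folds_stub(inputs, pred_name)   # predictions saved, nothing returned
        return None
    return predict_array_stub(inputs)           # array-like data: predictions returned


print(predict_sketch([[1.0, 2.0], [3.0, 4.0]]))   # [3.0, 7.0]
fy = FoldYielderStub(n_folds=10)
print(predict_sketch(fy), fy.saved)               # None {'pred': True}
```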
predict_array(inputs, as_np=True, mask_inputs=True)
Pass inputs through network and obtain predictions.

Parameters:
- inputs (Union[ndarray, DataFrame, Tensor]) – input data as Numpy array, Pandas DataFrame, or tensor on device
- as_np (bool) – whether to return predictions as Numpy array (otherwise tensor)
- mask_inputs (bool) – whether to apply input mask if one has been set
Return type: Union[ndarray, Tensor]
Returns: model prediction(s) per datapoint
predict_folds(fy, pred_name='pred')
Apply model to all data accessed by a FoldYielder and save predictions as a new group in the fold file.

Parameters:
- fy (FoldYielder) – FoldYielder interfacing to data
- pred_name (str) – name of group to which to save predictions
Return type: None
save(name)
Save model, optimiser, and input mask states to file.

Parameters:
- name (str) – name of save file
Return type: None
set_input_mask(mask)
Mask input columns by only using input columns whose indices are listed in mask.

Parameters:
- mask (ndarray) – array of column indices to use from all input columns
Return type: None
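The effect of the mask is to keep, for every datapoint, only the input columns whose indices appear in mask, in the order given. A minimal pure-Python sketch of that selection; apply_input_mask is a hypothetical helper, and with a NumPy array the equivalent selection is inputs[:, mask].

```python
from typing import List, Sequence


def apply_input_mask(inputs: Sequence[Sequence[float]],
                     mask: Sequence[int]) -> List[List[float]]:
    """Keep only the columns listed in mask, for every row of inputs."""
    return [[row[i] for i in mask] for row in inputs]


x = [[10.0, 11.0, 12.0, 13.0],
     [20.0, 21.0, 22.0, 23.0]]
print(apply_input_mask(x, [0, 2]))   # [[10.0, 12.0], [20.0, 22.0]]
```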
set_lr(lr)
Set learning rate of optimiser.

Parameters:
- lr (float) – learning rate of optimiser
Return type: None
lumin.nn.models.model_builder module

class lumin.nn.models.model_builder.ModelBuilder(objective, n_out, cont_feats=None, model_args=None, opt_args=None, cat_embedder=None, loss='auto', head=<class 'lumin.nn.models.blocks.head.CatEmbHead'>, body=<class 'lumin.nn.models.blocks.body.FullyConnected'>, tail=<class 'lumin.nn.models.blocks.tail.ClassRegMulti'>, lookup_init=<function lookup_normal_init>, lookup_act=<function lookup_act>, pretrain_file=None, freeze_head=False, freeze_body=False, freeze_tail=False, cat_args=None, n_cont_in=None)
Bases: object

Class to build models to a specified architecture on demand, along with an optimiser.

Attention: cat_args is now deprecated in favour of cat_embedder and will be removed in v0.4.

Attention: n_cont_in is now deprecated in favour of cont_feats and will be removed in v0.4.

Parameters:
- objective (str) – string representation of network objective, i.e. ‘classification’, ‘regression’, ‘multiclass’
- n_out (int) – number of outputs required
- cont_feats (Optional[List[str]]) – list of names of continuous input features
- model_args (Optional[Dict[str, Dict[str, Any]]]) – dictionary of dictionaries of keyword arguments to pass to head, body, and tail to control architecture
- opt_args (Optional[Dict[str, Any]]) – dictionary of arguments to pass to optimiser. Missing kwargs will be filled with default values. Currently, only ADAM (default), RAdam, Ranger, and SGD are available.
- cat_embedder (Optional[CatEmbedder]) – CatEmbedder for embedding categorical inputs
- loss (Any) – either an uninstantiated loss class, or leave as ‘auto’ to select loss according to objective
- head (AbsHead) – uninstantiated class which can receive input data and upscale it to model width
- body (AbsBody) – uninstantiated class which implements the main bulk of the model’s hidden layers
- tail (AbsTail) – uninstantiated class which scales the body to the required number of outputs and implements any final activation function and output scaling
- lookup_init (Callable[[str, Optional[int], Optional[int]], Callable[[Tensor], None]]) – function taking choice of activation function, number of inputs, and number of outputs and returning a function to initialise layer weights
- lookup_act (Callable[[str], Module]) – function taking choice of activation function and returning an activation function layer
- pretrain_file (Optional[str]) – if set, will load saved parameters for entire network from saved model
- freeze_head (bool) – whether to start with the head parameters set to untrainable
- freeze_body (bool) – whether to start with the body parameters set to untrainable
- freeze_tail (bool) – whether to start with the tail parameters set to untrainable
- cat_args (Optional[Dict[str, Any]]) – deprecated in favour of cat_embedder
- n_cont_in (Optional[int]) – deprecated in favour of cont_feats
Examples::
    >>> model_builder = ModelBuilder(objective='classifier',
    >>>                              cont_feats=cont_feats, n_out=1,
    >>>                              model_args={'body':{'depth':4,
    >>>                                                  'width':100}})
    >>>
    >>> min_targs = np.min(targets, axis=0).reshape(targets.shape[1],1)
    >>> max_targs = np.max(targets, axis=0).reshape(targets.shape[1],1)
    >>> min_targs[min_targs > 0] *=0.8
    >>> min_targs[min_targs < 0] *=1.2
    >>> max_targs[max_targs > 0] *=1.2
    >>> max_targs[max_targs < 0] *=0.8
    >>> y_range = np.hstack((min_targs, max_targs))
    >>> model_builder = ModelBuilder(
    >>>     objective='regression', cont_feats=cont_feats, n_out=6,
    >>>     cat_embedder=CatEmbedder.from_fy(train_fy),
    >>>     model_args={'body':{'depth':4, 'width':100},
    >>>                 'tail':{'y_range':y_range}})
    >>>
    >>> model_builder = ModelBuilder(objective='multiclassifier',
    >>>                              cont_feats=cont_feats, n_out=5,
    >>>                              model_args={'body':{'width':100,
    >>>                                                  'depth':6,
    >>>                                                  'do':0.1,
    >>>                                                  'res':True}})
    >>>
    >>> model_builder = ModelBuilder(objective='classifier',
    >>>                              cont_feats=cont_feats, n_out=1,
    >>>                              model_args={'body':{'depth':4,
    >>>                                                  'width':100}},
    >>>                              opt_args={'opt':'sgd',
    >>>                                        'momentum':0.8,
    >>>                                        'weight_decay':1e-5},
    >>>                              loss=partial(SignificanceLoss,
    >>>                                           sig_weight=sig_weight,
    >>>                                           bkg_weight=bkg_weight,
    >>>                                           func=calc_ams_torch))
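The y_range construction in the regression example widens each target's observed range by 20%: minima are multiplied by 0.8 or 1.2 and maxima by 1.2 or 0.8 depending on sign, so that both bounds always move away from the data. The same arithmetic in plain Python, per target column; padded_y_range is a hypothetical helper that returns (low, high) pairs rather than a NumPy array.

```python
from typing import List, Sequence, Tuple


def padded_y_range(targets: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return one (low, high) pair per target column, widened away from the data."""
    n_cols = len(targets[0])
    ranges = []
    for c in range(n_cols):
        col = [row[c] for row in targets]
        lo, hi = min(col), max(col)
        lo *= 0.8 if lo > 0 else 1.2   # push the lower bound down
        hi *= 1.2 if hi > 0 else 0.8   # push the upper bound up
        ranges.append((lo, hi))
    return ranges


targets = [[1.0, -5.0], [2.0, -1.0], [4.0, -3.0]]
print(padded_y_range(targets))
```

For these targets the first column becomes roughly (0.8, 4.8) and the second roughly (-6.0, -0.8). A bound of exactly zero is left unchanged by either multiplier.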
build_model()
Construct entire network module.
Return type: Module
Returns: instantiated nn.Module
classmethod from_model_builder(model_builder, pretrain_file=None, freeze_head=False, freeze_body=False, freeze_tail=False, loss=None, opt_args=None)
Instantiate a ModelBuilder from an existing ModelBuilder, but with options to adjust loss, optimiser, pretraining, and module freezing.

Parameters:
- model_builder – existing ModelBuilder or filename for a pickled ModelBuilder
- pretrain_file (Optional[str]) – if set, will load saved parameters for entire network from saved model
- freeze_head (bool) – whether to start with the head parameters set to untrainable
- freeze_body (bool) – whether to start with the body parameters set to untrainable
- freeze_tail (bool) – whether to start with the tail parameters set to untrainable
- loss (Optional[Any]) – either an uninstantiated loss class, or leave as ‘auto’ to select loss according to objective
- opt_args (Optional[Dict[str, Any]]) – dictionary of arguments to pass to optimiser. Missing kwargs will be filled with default values. The optimiser (‘opt’ keyword) can be chosen either by passing its string name (e.g. ‘adam’), though only ADAM and SGD are available this way, or by passing an uninstantiated optimiser (e.g. torch.optim.Adam). If no optimiser is set, it defaults to ADAM. Additional keyword arguments can be set, and these will be passed to the optimiser during instantiation.
Returns: instantiated ModelBuilder
Examples::
    >>> new_model_builder = ModelBuilder.from_model_builder(
    >>>     model_builder)
    >>>
    >>> new_model_builder = ModelBuilder.from_model_builder(
    >>>     model_builder, loss=partial(
    >>>         SignificanceLoss, sig_weight=sig_weight,
    >>>         bkg_weight=bkg_weight, func=calc_ams_torch))
    >>>
    >>> new_model_builder = ModelBuilder.from_model_builder(
    >>>     'weights/model_builder.pkl',
    >>>     opt_args={'opt':'sgd', 'momentum':0.8, 'weight_decay':1e-5})
    >>>
    >>> new_model_builder = ModelBuilder.from_model_builder(
    >>>     'weights/model_builder.pkl',
    >>>     opt_args={'opt':torch.optim.Adam,
    ...               'momentum':0.8,
    ...               'weight_decay':1e-5})
get_body(n_in, feat_map)
Construct body module.
Return type: AbsBody
Returns: instantiated body nn.Module
get_model()
Construct model, loss, and optimiser, optionally loading pretrained weights.
Return type: Tuple[Module, Optimizer, Any]
Returns: instantiated network, optimiser linked to model parameters, and uninstantiated loss
get_out_size()
Get number of outputs of model.
Return type: int
Returns: number of outputs of network
get_tail(n_in)
Construct tail module.
Return type: Module
Returns: instantiated tail nn.Module
load_pretrained(model)
Load model weights from pretrained file.

Parameters:
- model (Module) – instantiated model, i.e. the return value of build_model()
Returns: model with weights loaded