lumin.plotting package
Submodules
lumin.plotting.data_viewing module
- lumin.plotting.data_viewing.compare_events(events)
Plots at least two events side by side in their transverse and longitudinal projections.
- Parameters:
events (list) – list of DataFrames containing vector coordinates for 3-momenta
- Return type:
None
- lumin.plotting.data_viewing.plot_binary_sample_feat(df, feat, targ_name='gen_target', wgt_name='gen_weight', sample_name='gen_sample', wgt_scale=1, bins=None, log_y=False, lim_x=None, density=True, feat_name=None, units=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)
More advanced plotter for feature distributions in a binary class problem, with stacked distributions for backgrounds and user-defined binning. Note that plotting colours can be controlled by setting the settings.sample2col dictionary.
- Parameters:
df (DataFrame) – DataFrame with targets and predictions
feat (str) – name of column to plot the distribution of
targ_name (str) – name of column to use as targets
wgt_name (str) – name of column to use as sample weights
sample_name (str) – name of column to use as process names
wgt_scale (float) – applies a global multiplicative rescaling to sample weights. Default 1 = no rescaling. Only applicable when density = False
bins (Union[int, List[int], None]) – either the number of bins to use for a uniform binning, or a list of bin edges for a variable-width binning
log_y (bool) – whether to use a log scale for the y-axis
lim_x (Optional[Tuple[float, float]]) – limit for plotting on the x-axis
density – whether to normalise each distribution to one, or keep set to sum of weights / datapoints
feat_name (Optional[str]) – name of feature to put on x-axis, can be in LaTeX
units (Optional[str]) – units used to measure feature, if applicable. Can be in LaTeX, but should not include '$'
savename (Optional[str]) – optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance
- Return type:
None
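The interplay between density and wgt_scale described above can be illustrated with a minimal, library-free sketch. The helper weighted_hist below is hypothetical (it is not part of lumin), and it assumes that "normalise to one" means a unit sum of bin contents; the real function may normalise by area instead.

```python
from typing import List, Sequence


def weighted_hist(values: Sequence[float], weights: Sequence[float],
                  edges: Sequence[float], wgt_scale: float = 1.0,
                  density: bool = True) -> List[float]:
    """Sum weights per bin; normalise to unit sum if density, else apply wgt_scale."""
    counts = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # Half-open bins, except the last bin which includes its upper edge
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[i + 1]):
                counts[i] += w
                break
    if density:
        total = sum(counts)
        return [c / total for c in counts] if total > 0 else counts
    # wgt_scale only has an effect when the distribution is not normalised
    return [c * wgt_scale for c in counts]


values = [0.1, 0.4, 0.6, 0.9]
weights = [1.0, 3.0, 2.0, 2.0]
edges = [0.0, 0.5, 1.0]  # a list of bin edges, as accepted by `bins`

normalised = weighted_hist(values, weights, edges, density=True)
scaled = weighted_hist(values, weights, edges, wgt_scale=10, density=False)
```

With density=True the global rescaling cancels out in the normalisation, which is consistent with the note that wgt_scale is only applicable when density = False.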
- lumin.plotting.data_viewing.plot_feat(df, feat, wgt_name=None, cuts=None, labels='', plot_bulk=True, n_samples=100000, plot_params=None, size='mid', show_moments=True, ax_labels={'x': None, 'y': None}, log_x=False, log_y=False, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)
A flexible function to provide indicative information about the 1D distribution of a feature. By default it will produce a weighted KDE+histogram for the [1,99] percentile of the data, as well as compute the mean and standard deviation of the data in this region. Distributions are weighted by sampling the data with replacement, with probabilities proportional to the sample weights. By passing a list of cuts and labels, it will plot multiple distributions of the same feature for different cuts. Since it is designed to provide quick, indicative information, more specific functions (such as plot_kdes_from_bs) should be used to provide final results.
Important
NaN and Inf values are removed prior to plotting and no attempt is made to coerce them to real numbers.
- Parameters:
df (DataFrame) – Pandas DataFrame containing data
feat (str) – column name to plot
wgt_name (Optional[str]) – if set, will use column to weight data
cuts (Optional[List[Series]]) – optional list of cuts to apply to feature. Will add one KDE+hist for each cut listed on the same plot
labels (Optional[List[str]]) – optional list of labels for each KDE+hist
plot_bulk (bool) – whether to plot the [1,99] percentile of the data, or all of it
n_samples (int) – if plotting weighted distributions, how many samples to use
plot_params (Union[Dict[str, Any], List[Dict[str, Any]], None]) – optional list of arguments to pass to Seaborn distplot for each KDE+hist
size (str) – string to pass to str2sz() to determine size of plot
show_moments (bool) – whether to compute and display the mean and standard deviation
ax_labels (Dict[str, Any]) – dictionary of x and y axes labels
log_x (bool) – if true, will use log scale for x-axis
log_y (bool) – if true, will use log scale for y-axis
savename (Optional[str]) – optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance
- Return type:
None
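The preprocessing described for plot_feat (dropping NaN/Inf, resampling with replacement according to weights, and keeping the [1,99] percentile bulk) can be sketched without any plotting. The helpers weighted_resample and bulk are hypothetical names, and the nearest-rank percentile used here is an assumption; lumin's own implementation may differ in detail.

```python
import math
import random
from typing import List, Sequence


def weighted_resample(values: Sequence[float], weights: Sequence[float],
                      n_samples: int, seed: int = 0) -> List[float]:
    """Sample with replacement, with probability proportional to weight."""
    rng = random.Random(seed)
    return rng.choices(values, weights=weights, k=n_samples)


def bulk(values: Sequence[float], lo: float = 1.0, hi: float = 99.0) -> List[float]:
    """Keep values between the lo-th and hi-th percentiles (nearest rank)."""
    ordered = sorted(values)
    n = len(ordered)
    lo_val = ordered[int(round(lo / 100 * (n - 1)))]
    hi_val = ordered[int(round(hi / 100 * (n - 1)))]
    return [v for v in values if lo_val <= v <= hi_val]


data = [float(i) for i in range(101)] + [float("nan"), float("inf")]
wgts = [1.0] * len(data)

# NaN and Inf are removed, not coerced
finite = [(v, w) for v, w in zip(data, wgts) if math.isfinite(v)]
clean_vals = [v for v, _ in finite]
clean_wgts = [w for _, w in finite]

sample = weighted_resample(clean_vals, clean_wgts, n_samples=1000)
core = bulk(sample)
```

Resampling turns a weighted dataset into an unweighted one, so that tools without weight support (such as a KDE) can be applied directly to sample.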
- lumin.plotting.data_viewing.plot_kdes_from_bs(x, bs_stats, name2args, feat, units=None, moments=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>, show_plot=True)
Plots KDEs computed via bootstrap_stats().
- Parameters:
x (ndarray) – x-axis values
bs_stats (Dict[str, Any]) – (filtered) dictionary returned by bootstrap_stats()
name2args (Dict[str, Dict[str, Any]]) – dictionary mapping names of different distributions to arguments to pass to seaborn tsplot
feat (str) – name of feature being plotted (for axis labels)
units (Optional[str]) – optional units to show on axes
moments – whether to display mean and standard deviation of each distribution
savename (Optional[str]) – optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance
show_plot (bool) – whether to show the plot, or just save it
- Return type:
None
- lumin.plotting.data_viewing.plot_rank_order_dendrogram(df, threshold=0.8, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)
Plots a dendrogram of features in df clustered via Spearman's rank correlation coefficient. Also returns sets of features with correlation coefficients greater than the threshold.
- Parameters:
df (DataFrame) – Pandas DataFrame containing data
threshold (float) – threshold on correlation coefficient
savename (Optional[str]) – optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance
- Return type:
Dict[str, Union[List[str], float]]
- Returns:
Dict of sets of features with correlation coefficients greater than the threshold and cluster distance
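As a rough illustration of the quantity behind the dendrogram, the sketch below computes Spearman's rank correlation for each pair of features and keeps the pairs above a threshold. It does not perform the hierarchical clustering itself, and all function names are hypothetical rather than lumin's.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def correlated_pairs(columns: Dict[str, Sequence[float]],
                     threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Feature pairs whose rank correlation exceeds the threshold."""
    pairs = []
    for f1, f2 in combinations(columns, 2):
        rho = spearman(columns[f1], columns[f2])
        if rho > threshold:
            pairs.append((f1, f2, rho))
    return pairs


cols = {"pt": [1, 2, 3, 4, 5], "pt_sq": [1, 4, 9, 16, 25], "phi": [3, 1, 4, 1, 5]}
pairs = correlated_pairs(cols, threshold=0.8)
```

Because pt_sq is a monotonic function of pt, the two have a rank correlation of one and are flagged, whereas a linear (Pearson) measure on the raw values would report a lower value.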
lumin.plotting.interpretation module
- lumin.plotting.interpretation.plot_1d_partial_dependence(model, df, feat, train_feats, ignore_feats=None, input_pipe=None, sample_sz=None, wgt_name=None, n_clusters=10, n_points=20, pdp_isolate_kargs=None, pdp_plot_kargs=None, y_lim=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)
Wrapper for PDPbox to plot 1D dependence of specified feature using provided NN or RF. If features have been preprocessed using an SK-Learn Pipeline, then that can be passed in order to rescale the x-axis back to its original values.
- Parameters:
model (Any) – any trained model with a .predict method
df (DataFrame) – DataFrame containing training data
feat (str) – feature for which to evaluate the partial dependence of the model
train_feats (List[str]) – list of all training features including ones which were later ignored, i.e. input features considered when input_pipe was fitted
ignore_feats (Optional[List[str]]) – features present in training data which were not used to train the model (necessary to correctly deprocess feature using input_pipe)
input_pipe (Optional[Pipeline]) – SK-Learn Pipeline which was used to process the training data
sample_sz (Optional[int]) – if set, will only compute partial dependence on a random sample with replacement of the training data, sampled according to weights (if set). Speeds up computation and allows weighted partial dependencies to be computed
wgt_name (Optional[str]) – optional column name to use as sampling weights
n_points (int) – number of points at which to evaluate the model output, passed to pdp_isolate as num_grid_points
n_clusters (Optional[int]) – number of clusters in which to group dependency lines. Set to None to show all lines
pdp_isolate_kargs (Optional[Dict[str, Any]]) – optional dictionary of keyword arguments to pass to pdp_isolate
pdp_plot_kargs (Optional[Dict[str, Any]]) – optional dictionary of keyword arguments to pass to pdp_plot
y_lim (Union[Tuple[float, float], List[float], None]) – if set, will limit y-axis plot range to tuple
savename (Optional[str]) – optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance
- Return type:
None
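For readers unfamiliar with partial dependence, the idea can be sketched in a few lines: fix the chosen feature to each grid value in turn for every row, and average the model output. ToyModel and partial_dependence_1d are hypothetical stand-ins, and the uniform grid is an assumption made for simplicity; PDPbox chooses its own grid and offers many more options.

```python
from typing import Dict, List, Sequence, Tuple


class ToyModel:
    """Stand-in for any trained model exposing a .predict method."""

    def predict(self, rows: Sequence[Dict[str, float]]) -> List[float]:
        return [2.0 * r["x"] + r["y"] for r in rows]


def partial_dependence_1d(model, rows: Sequence[Dict[str, float]], feat: str,
                          n_points: int = 20) -> Tuple[List[float], List[float]]:
    """Mean model output with `feat` forced to each grid value in turn."""
    lo = min(r[feat] for r in rows)
    hi = max(r[feat] for r in rows)
    grid = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    curve = []
    for g in grid:
        # Overwrite only the feature of interest; other features keep their values
        modified = [{**r, feat: g} for r in rows]
        preds = model.predict(modified)
        curve.append(sum(preds) / len(preds))
    return grid, curve


rows = [{"x": 0.0, "y": 1.0}, {"x": 1.0, "y": 3.0}, {"x": 2.0, "y": 5.0}]
grid, curve = partial_dependence_1d(ToyModel(), rows, "x", n_points=3)
```

For this linear toy model the curve rises by 2 per unit of x, which is exactly the coefficient of x: the dependence on y has been averaged out.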
- lumin.plotting.interpretation.plot_2d_partial_dependence(model, df, feats, train_feats, ignore_feats=None, input_pipe=None, sample_sz=None, wgt_name=None, n_points=[20, 20], pdp_interact_kargs=None, pdp_interact_plot_kargs=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)
Wrapper for PDPbox to plot 2D dependence of specified pair of features using provided NN or RF. If features have been preprocessed using an SK-Learn Pipeline, then that can be passed in order to rescale them back to their original values.
- Parameters:
model (Any) – any trained model with a .predict method
df (DataFrame) – DataFrame containing training data
feats (Tuple[str, str]) – pair of features for which to evaluate the partial dependence of the model
train_feats (List[str]) – list of all training features including ones which were later ignored, i.e. input features considered when input_pipe was fitted
ignore_feats (Optional[List[str]]) – features present in training data which were not used to train the model (necessary to correctly deprocess feature using input_pipe)
input_pipe (Optional[Pipeline]) – SK-Learn Pipeline which was used to process the training data
sample_sz (Optional[int]) – if set, will only compute partial dependence on a random sample with replacement of the training data, sampled according to weights (if set). Speeds up computation and allows weighted partial dependencies to be computed
wgt_name (Optional[str]) – optional column name to use as sampling weights
n_points (Tuple[int, int]) – pair of numbers of points at which to evaluate the model output, passed to pdp_interact as num_grid_points
pdp_interact_kargs – optional dictionary of keyword arguments to pass to pdp_interact
pdp_interact_plot_kargs – optional dictionary of keyword arguments to pass to pdp_interact_plot
savename (Optional[str]) – optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance
- Return type:
None
- lumin.plotting.interpretation.plot_bottleneck_weighted_inputs(model, bottleneck_idx, inputs, log_y=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)
Interpret how a single-neuron bottleneck in a MultiBlock relies on input features by plotting the absolute values of the features times their associated weight for a given set of input data.
- Parameters:
model (AbsModel) – model to interpret
bottleneck_idx (int) – index of the bottleneck to interpret, i.e. model.body.bottleneck_blocks[bottleneck_idx]
inputs (Union[ndarray, Tensor]) – input data to use for interpretation
log_y (bool) – whether to plot a log scale for the y-axis
savename (Optional[str]) – optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance
- Return type:
None
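The quantity being plotted, the absolute value of each input feature times its associated weight, can be illustrated on plain lists. This is only a sketch of the arithmetic under the assumption of one weight per input feature; weighted_input_magnitudes is a hypothetical helper, and the per-feature mean at the end is just one possible summary, not necessarily what lumin displays.

```python
from typing import List, Sequence


def weighted_input_magnitudes(inputs: Sequence[Sequence[float]],
                              weights: Sequence[float]) -> List[List[float]]:
    """Return |x_ij * w_j| for every datum i and input feature j."""
    return [[abs(x * w) for x, w in zip(row, weights)] for row in inputs]


inputs = [[1.0, -2.0, 0.5], [3.0, 2.0, -0.5]]  # two data points, three features
weights = [0.5, -1.0, 4.0]                     # one weight per input feature

magnitudes = weighted_input_magnitudes(inputs, weights)
# Average over data points to get one summary value per feature
mean_per_feature = [sum(col) / len(col) for col in zip(*magnitudes)]
```

Taking absolute values means that a feature with a large negative contribution ranks as highly as one with a large positive contribution.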
- lumin.plotting.interpretation.plot_embedding(embed, feat, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)
Visualise weights in provided categorical entity-embedding matrix.
- Parameters:
embed (OrderedDict) – state_dict of trained nn.Embedding
feat (str) – name of feature embedded
savename (Optional[str]) – optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance
- Return type:
None
- lumin.plotting.interpretation.plot_importance(df, feat_name='Feature', imp_name='Importance', unc_name='Uncertainty', threshold=None, x_lbl='Importance via feature permutation', savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)
Plot feature importances as computed via get_nn_feat_importance, get_ensemble_feat_importance, or rf_rank_features.
- Parameters:
df (DataFrame) – DataFrame containing columns of features, importances and, optionally, uncertainties
feat_name (str) – column name for features
imp_name (str) – column name for importances
unc_name (str) – column name for uncertainties (if present)
threshold (Optional[float]) – if set, will draw a line at the threshold used for feature importance
x_lbl (str) – label to put on the x-axis
savename (Optional[str]) – optional name of file to which to save the plot of feature importances
settings (PlotSettings) – PlotSettings class to control figure appearance
- Return type:
None
- lumin.plotting.interpretation.plot_multibody_weighted_outputs(model, inputs, block_names=None, use_mean=False, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)
Interpret how a model relies on the outputs of each block in a MultiBlock by plotting the outputs of each block as weighted by the tail block. This function currently only supports models whose tail block contains a single neuron in the first dense layer. Input data is passed through the model and the absolute sums of the weighted block outputs are computed per datum, and optionally averaged over the number of block outputs.
- Parameters:
model (AbsModel) – model to interpret
inputs (Union[ndarray, Tensor]) – input data to use for interpretation
block_names (Optional[List[str]]) – names for each block to use when plotting
use_mean (bool) – if True, will average the weighted outputs over the number of output neurons in each block
savename (Optional[str]) – optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance
- Return type:
None
lumin.plotting.plot_settings module
- class lumin.plotting.plot_settings.PlotSettings(**kargs)
Bases: object
Class to provide control over plot appearances. Default parameters are set automatically, and can be adjusted by passing values as keyword arguments during initialisation (or changed after instantiation).
- Parameters:
keyword arguments – used to set relevant plotting parameters
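The "defaults, overridable by keyword or after instantiation" pattern can be sketched as below. SketchSettings is a hypothetical stand-in, not the real PlotSettings: only the attribute names targ2class and sample2col are taken from this page, the default for targ2class is the one quoted under plot_binary_class_pred, and the None default for sample2col is an assumption.

```python
class SketchSettings:
    """Hypothetical illustration of a settings object with overridable defaults."""

    def __init__(self, **kargs):
        # Defaults are set first...
        self.targ2class = {0: "Background", 1: "Signal"}
        self.sample2col = None  # assumed default, for illustration only
        # ...then any keyword arguments overwrite them
        self.__dict__.update(kargs)


# Override at initialisation
settings = SketchSettings(targ2class={0: "QCD", 1: "Higgs"})
# Or change after instantiation
settings.sample2col = {"ttbar": "red", "signal": "blue"}

defaults = SketchSettings()
```

An object configured this way would then be handed to a plotting function through its settings argument.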
lumin.plotting.results module
- lumin.plotting.results.plot_binary_class_pred(df, pred_name='pred', targ_name='gen_target', wgt_name=None, wgt_scale=1, log_y=False, lim_x=(0, 1), density=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)
Basic plotter for prediction distribution in a binary classification problem. Note that labels are set using the settings.targ2class dictionary, which by default is {0: 'Background', 1: 'Signal'}.
- Parameters:
df (DataFrame) – DataFrame with targets and predictions
pred_name (str) – name of column to use as predictions
targ_name (str) – name of column to use as targets
wgt_name (Optional[str]) – optional name of column to use as sample weights
wgt_scale (float) – applies a global multiplicative rescaling to sample weights. Default 1 = no rescaling
log_y (bool) – whether to use a log scale for the y-axis
lim_x (Tuple[float, float]) – limit for plotting on the x-axis
density – whether to normalise each distribution to one, or keep set to sum of weights / datapoints
savename (Optional[str]) – optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance
- Return type:
None
- lumin.plotting.results.plot_roc(data, pred_name='pred', targ_name='gen_target', wgt_name=None, labels=None, plot_params=None, n_bootstrap=0, log_x=False, plot_baseline=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)
Plot receiver operating characteristic curve(s), optionally using bootstrap resampling.
- Parameters:
data (Union[DataFrame, List[DataFrame]]) – (list of) DataFrame(s) from which to draw predictions and targets
pred_name (str) – name of column to use as predictions
targ_name (str) – name of column to use as targets
wgt_name (Optional[str]) – optional name of column to use as sample weights
labels (Union[str, List[str], None]) – (list of) label(s) for plot legend
plot_params (Union[Dict[str, Any], List[Dict[str, Any]], None]) – (list of) dictionar[y/ies] of argument(s) to pass to line plot
n_bootstrap (int) – if greater than 0, will bootstrap resample the data that many times when computing the ROC AUC. Currently, this does not affect the shape of the lines, which are based on computing the ROC for the entire dataset as is
log_x (bool) – whether to use a log scale for plotting the x-axis, useful for high-AUC lines
plot_baseline (bool) – whether to plot a dotted line for AUC=0.5. Currently incompatible with log_x=True
savename (Optional[str]) – optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance
- Return type:
Dict[str, Union[float, Tuple[float, float]]]
- Returns:
Dictionary mapping data labels to AUCs (and uncertainties if n_bootstrap > 0)
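To make the role of n_bootstrap concrete, here is a minimal, unweighted sketch of a bootstrapped ROC AUC. The functions roc_auc and bootstrap_auc are hypothetical; the AUC is computed through its pairwise-ranking interpretation rather than by integrating a curve, and sample weights (wgt_name) are deliberately left out.

```python
import random
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def roc_auc(preds: Sequence[float], targets: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for p, t in zip(preds, targets) if t == 1]
    neg = [p for p, t in zip(preds, targets) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(preds: Sequence[float], targets: Sequence[int],
                  n_bootstrap: int, seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of the AUC over bootstrap resamples."""
    rng = random.Random(seed)
    idxs = range(len(preds))
    aucs: List[float] = []
    while len(aucs) < n_bootstrap:
        sample = rng.choices(idxs, k=len(preds))  # resample rows with replacement
        t = [targets[i] for i in sample]
        if 0 < sum(t) < len(t):  # skip resamples containing only one class
            aucs.append(roc_auc([preds[i] for i in sample], t))
    return mean(aucs), stdev(aucs)


preds = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
targets = [0, 0, 1, 0, 1, 0, 1, 1]

auc = roc_auc(preds, targets)
auc_mean, auc_unc = bootstrap_auc(preds, targets, n_bootstrap=200)
```

The pair (auc_mean, auc_unc) corresponds in spirit to the Tuple[float, float] entries of the returned dictionary when n_bootstrap > 0, while the plain float corresponds to the n_bootstrap = 0 case.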
- lumin.plotting.results.plot_sample_pred(df, pred_name='pred', targ_name='gen_target', wgt_name='gen_weight', sample_name='gen_sample', wgt_scale=1, bins=35, log_y=True, lim_x=(0, 1), density=False, zoom_args=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)
More advanced plotter for prediction distribution in a binary class problem, with stacked distributions for backgrounds and user-defined binning. Can also zoom in to specified parts of the plot. Note that plotting colours can be controlled by setting the settings.sample2col dictionary.
- Parameters:
df (DataFrame) – DataFrame with targets and predictions
pred_name (str) – name of column to use as predictions
targ_name (str) – name of column to use as targets
wgt_name (str) – name of column to use as sample weights
sample_name (str) – name of column to use as process names
wgt_scale (float) – applies a global multiplicative rescaling to sample weights. Default 1 = no rescaling
bins (Union[int, List[int]]) – either the number of bins to use for a uniform binning, or a list of bin edges for a variable-width binning
log_y (bool) – whether to use a log scale for the y-axis
lim_x (Tuple[float, float]) – limit for plotting on the x-axis
density – whether to normalise each distribution to one, or keep set to sum of weights / datapoints
zoom_args (Optional[Dict[str, Any]]) – arguments to control the optional zoomed-in section, e.g. {'x':(0.4,0.45), 'y':(0.2, 1500), 'anchor':(0,0.25,0.95,1), 'width_scale':1, 'width_zoom':4, 'height_zoom':3}
savename (Optional[str]) – optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance
- Return type:
None
lumin.plotting.training module
- lumin.plotting.training.plot_lr_finders(lr_finders, lr_range=None, loss_range='auto', log_y='auto', savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>, show_plot=True)
Plot mean loss evolution against learning rate for several fold_lr_find runs.
- Parameters:
lr_finders (List[AbsCallback]) – list of callbacks from fold_lr_find
lr_range (Union[float, Tuple, None]) – limits the range of learning rates plotted on the x-axis: if float, maximum LR; if tuple, minimum & maximum LR
loss_range (Union[float, Tuple, str, None]) – limits the range of losses plotted on the y-axis: if float, maximum loss; if tuple, minimum & maximum loss; if None, no limits; if 'auto', computes an upper limit automatically
log_y (Union[str, bool]) – whether to plot y-axis as log. If 'auto', will set to log if maximal fractional difference in loss values is greater than 50
savename (Optional[str]) – optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance
show_plot (bool) – whether to show the plot, or just save it
- Return type:
None
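The float-or-tuple convention shared by lr_range and loss_range can be sketched as a small normalisation step. parse_range and within are hypothetical helpers; the 'auto' option of loss_range is intentionally not implemented, since its rule is not described here.

```python
from typing import List, Optional, Tuple, Union

Limit = Tuple[Optional[float], Optional[float]]


def parse_range(rng: Union[float, Tuple[float, float], None]) -> Limit:
    """Normalise a range argument to (minimum, maximum); None means unbounded."""
    if rng is None:
        return (None, None)
    if isinstance(rng, str):
        raise ValueError("string options such as 'auto' are not covered by this sketch")
    if isinstance(rng, (int, float)):
        return (None, float(rng))  # a single number is an upper limit
    lo, hi = rng
    return (float(lo), float(hi))


def within(x: float, limit: Limit) -> bool:
    lo, hi = limit
    return (lo is None or x >= lo) and (hi is None or x <= hi)


lrs = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]

max_only: List[float] = [lr for lr in lrs if within(lr, parse_range(1e-2))]
windowed: List[float] = [lr for lr in lrs if within(lr, parse_range((1e-5, 1e-1)))]
unbounded: List[float] = [lr for lr in lrs if within(lr, parse_range(None))]
```

Normalising to a (minimum, maximum) pair up front keeps the filtering logic identical for all three accepted forms.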
- lumin.plotting.training.plot_train_history(histories, savename=None, ignore_trn=False, settings=<lumin.plotting.plot_settings.PlotSettings object>, show=True, xlow=0, log_y=False)
Plot histories object returned by train_models(), showing the loss evolution over time per model trained.
- Parameters:
histories (List[OrderedDict]) – list of dictionaries mapping loss type to values at each (sub)-epoch
savename (Optional[str]) – optional name of file to which to save the plot
ignore_trn (bool) – whether to ignore training loss
settings (PlotSettings) – PlotSettings class to control figure appearance
show (bool) – whether to show the plot, or just save it
xlow (int) – if set, will cut out the first given number of epochs
log_y (bool) – whether to plot the y-axis with a log scale
- Return type:
None