lumin.plotting package

Submodules

lumin.plotting.data_viewing module

lumin.plotting.data_viewing.plot_feat(df, feat, wgt_name=None, cuts=None, labels='', plot_bulk=True, n_samples=100000, plot_params=None, size='mid', show_moments=True, ax_labels={'x': None, 'y': 'Density'}, log_x=False, log_y=False, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)

A flexible function to provide indicative information about the 1D distribution of a feature. By default it will produce a weighted KDE+histogram for the [1,99] percentile of the data, as well as compute the mean and standard deviation of the data in this region. Distributions are weighted by resampling the data with replacement, with probabilities proportional to the sample weights. By passing a list of cuts and labels, it will plot multiple distributions of the same feature for different cuts. Since it is designed to provide quick, indicative information, more specific functions (such as plot_kdes_from_bs) should be used to provide final results.

Important

NaN and Inf values are removed prior to plotting and no attempt is made to coerce them to real numbers

Parameters

df (DataFrame) – Pandas DataFrame containing data
feat (str) – column name to plot
wgt_name (Optional[str]) – if set, will use column to weight data
cuts (Optional[List[Series]]) – optional list of cuts to apply to feature. Will add one KDE+hist for each cut listed on the same plot
labels (Optional[List[str]]) – optional list of labels for each KDE+hist
plot_bulk (bool) – whether to plot the [1,99] percentile of the data, or all of it
n_samples (int) – if plotting weighted distributions, how many samples to use
plot_params (Union[Dict[str, Any], List[Dict[str, Any]], None]) – optional dictionary, or list of dictionaries, of arguments to pass to Seaborn Distplot for each KDE+hist
size (str) – string to pass to str2sz() to determine the size of the plot
show_moments (bool) – whether to compute and display the mean and standard deviation
ax_labels (Dict[str, Any]) – dictionary of x and y axes labels
log_x (bool) – if true, will use log scale for x-axis
log_y (bool) – if true, will use log scale for y-axis
savename (Optional[str]) – Optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None
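
The weighting and bulk selection described above can be illustrated with a minimal standard-library sketch. It is not the lumin implementation: the helper name bulk_weighted_sample and the toy data are hypothetical, and it only assumes that non-finite values are dropped, the data are resampled with replacement in proportion to their weights, and the moments are computed on the [1,99] percentile range.

```python
import math
import random
import statistics


def bulk_weighted_sample(values, weights, n_samples=1000, seed=0):
    """Hypothetical helper: weighted resample, then keep the [1, 99] percentile bulk."""
    # Drop NaN and Inf values rather than coercing them to real numbers
    pairs = [(v, w) for v, w in zip(values, weights) if math.isfinite(v)]
    vals = [v for v, _ in pairs]
    wgts = [w for _, w in pairs]
    # Sampling with replacement, with probability proportional to the weight
    rng = random.Random(seed)
    sample = rng.choices(vals, weights=wgts, k=n_samples)
    # quantiles(..., n=100) returns the 99 percentile cut points
    cuts = statistics.quantiles(sample, n=100)
    low, high = cuts[0], cuts[-1]
    bulk = [v for v in sample if low <= v <= high]
    return bulk, statistics.mean(bulk), statistics.stdev(bulk)


# Toy data: the upper half of the range carries three times the weight
values = [float(i) for i in range(100)] + [float("nan")]
weights = [1.0 if v < 50 else 3.0 for v in values[:-1]] + [1.0]
bulk, mean, std = bulk_weighted_sample(values, weights)
print(f"{len(bulk)} points kept, mean={mean:.1f}, std={std:.1f}")
```

Because the resample is unweighted once drawn, any ordinary KDE or histogram routine can then be applied to it directly.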

lumin.plotting.data_viewing.compare_events(events)

Plots at least two events side by side in their transverse and longitudinal projections

Parameters: events (list) – list of DataFrames containing vector coordinates for 3 momenta
Return type: None

lumin.plotting.data_viewing.plot_rank_order_dendrogram(df, threshold=0.8, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)

Plots a dendrogram of features in df clustered via Spearman’s rank correlation coefficient. Also returns sets of features with correlation coefficients greater than the threshold.

Parameters

df (DataFrame) – Pandas DataFrame containing data
threshold (float) – Threshold on correlation coefficient
savename (Optional[str]) – Optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

Dict[str, Union[List[str], float]]

Returns

Dict of sets of features with correlation coefficients greater than the threshold and cluster distance
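
As a rough illustration of the quantity being thresholded, the sketch below computes Spearman's rank correlation coefficient between pairs of features using only the standard library and lists the pairs above the threshold. It does not perform the hierarchical clustering or draw the dendrogram, and the helper names and toy feature values are hypothetical.

```python
from itertools import combinations


def ranks(xs):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    # Spearman's coefficient is the Pearson correlation of the ranks
    return pearson(ranks(a), ranks(b))


# Toy features: pt_sq is a monotonic function of pt, phi is unrelated
data = {
    "pt": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "pt_sq": [1.0, 4.0, 9.0, 16.0, 25.0, 36.0],
    "phi": [0.3, -1.2, 2.5, 0.1, -0.7, 1.9],
}
threshold = 0.8
pairs = {(f1, f2): spearman(data[f1], data[f2]) for f1, f2 in combinations(data, 2)}
correlated = [pair for pair, rho in pairs.items() if rho > threshold]
print(correlated)
```

Since the coefficient depends only on ranks, any monotonic relationship between two features scores 1, which is why pt and pt_sq are flagged here despite being non-linearly related.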

lumin.plotting.data_viewing.plot_kdes_from_bs(x, bs_stats, name2args, feat, units=None, moments=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)

Plots KDEs computed via bootstrap_stats()

Parameters

bs_stats (Dict[str, Any]) – (filtered) dictionary returned by bootstrap_stats()
name2args (Dict[str, Dict[str, Any]]) – Dictionary mapping names of different distributions to arguments to pass to seaborn tsplot
feat (str) – Name of feature being plotted (for axis labels)
units (Optional[str]) – Optional units to show on axes
moments – whether to display mean and standard deviation of each distribution
savename (Optional[str]) – Optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None

lumin.plotting.data_viewing.plot_binary_sample_feat(df, feat, targ_name='gen_target', wgt_name='gen_weight', sample_name='gen_sample', wgt_scale=1, bins=None, log_y=False, lim_x=None, density=True, feat_name=None, units=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)

More advanced plotter for feature distributions in a binary class problem, with stacked distributions for backgrounds and user-defined binning. Note that plotting colours can be controlled by setting the settings.sample2col dictionary.

Parameters

df (DataFrame) – DataFrame with targets and predictions
feat (str) – name of column to plot the distribution of
targ_name (str) – name of column to use as targets
wgt_name (str) – name of column to use as sample weights
sample_name (str) – name of column to use as process names
wgt_scale (float) – applies a global multiplicative rescaling to sample weights. Default 1 = no rescaling. Only applicable when density = False
bins (Union[int, List[int], None]) – either the number of bins to use for a uniform binning, or a list of bin edges for a variable-width binning
log_y (bool) – whether to use a log scale for the y-axis
lim_x (Optional[Tuple[float, float]]) – limit for plotting on the x-axis
density – whether to normalise each distribution to one, or keep set to sum of weights / datapoints
feat_name (Optional[str]) – Name of feature to put on x-axis, can be in LaTeX.
units (Optional[str]) – units used to measure feature, if applicable. Can be in LaTeX, but should not include ‘$’.
savename (Optional[str]) – Optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None
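
The interplay of bins, density and wgt_scale can be sketched with a small weighted histogram. This is an illustration under stated assumptions rather than the library's code: weighted_hist is a hypothetical helper, an integer bins is taken to mean uniform bins across lim_x (or the data range), and wgt_scale is applied only when density is False.

```python
from bisect import bisect_right


def weighted_hist(values, weights, bins, lim_x=None, density=True, wgt_scale=1.0):
    """Hypothetical helper: weighted histogram with uniform or variable-width bins."""
    if isinstance(bins, int):
        # Uniform binning: derive equally spaced edges
        lo, hi = lim_x if lim_x is not None else (min(values), max(values))
        edges = [lo + i * (hi - lo) / bins for i in range(bins + 1)]
    else:
        # Variable-width binning: use the supplied edges as they are
        edges = list(bins)
    counts = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        if v < edges[0] or v > edges[-1]:
            continue  # outside the plotted range
        # The last bin is closed on the right, the others on the left
        idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += w
    if density:
        # Normalise so the histogram integrates to one
        total = sum(counts)
        return edges, [c / (total * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]
    # Otherwise keep the sum of weights, with the global rescaling applied
    return edges, [c * wgt_scale for c in counts]


values = [0.5, 1.5, 1.5, 3.0, 7.0]
weights = [1.0, 2.0, 1.0, 4.0, 2.0]
edges, dens = weighted_hist(values, weights, bins=[0, 2, 4, 8], density=True)
_, raw = weighted_hist(values, weights, bins=[0, 2, 4, 8], density=False, wgt_scale=10)
print(dens, raw)
```

With density enabled the bin heights depend on bin width, so a wide bin with the same weight sum as a narrow one is drawn lower.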

lumin.plotting.interpretation module

lumin.plotting.interpretation.plot_importance(df, feat_name='Feature', imp_name='Importance', unc_name='Uncertainty', threshold=None, x_lbl='Importance via feature permutation', savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)

Plot feature importances as computed via get_nn_feat_importance, get_ensemble_feat_importance, or rf_rank_features

Parameters

df (DataFrame) – DataFrame containing columns of features, importances and, optionally, uncertainties
feat_name (str) – column name for features
imp_name (str) – column name for importances
unc_name (str) – column name for uncertainties (if present)
threshold (Optional[float]) – if set, will draw a line at the threshold used for feature importance
x_lbl (str) – label to put on the x-axis
savename (Optional[str]) – Optional name of file to which to save the plot of feature importances
settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None

lumin.plotting.interpretation.plot_embedding(embed, feat, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)

Visualise weights in provided categorical entity-embedding matrix

Parameters

embed (OrderedDict) – state_dict of trained nn.Embedding
feat (str) – name of feature embedded
savename (Optional[str]) – Optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None

lumin.plotting.interpretation.plot_1d_partial_dependence(model, df, feat, train_feats, ignore_feats=None, input_pipe=None, sample_sz=None, wgt_name=None, n_clusters=10, n_points=20, pdp_isolate_kargs=None, pdp_plot_kargs=None, y_lim=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)

Wrapper for PDPbox to plot 1D dependence of specified feature using provided NN or RF. If features have been preprocessed using an SK-Learn Pipeline, then that can be passed in order to rescale the x-axis back to its original values.

Parameters

model (Any) – any trained model with a .predict method
df (DataFrame) – DataFrame containing training data
feat (str) – feature for which to evaluate the partial dependence of the model
train_feats (List[str]) – list of all training features including ones which were later ignored, i.e. input features considered when input_pipe was fitted
ignore_feats (Optional[List[str]]) – features present in training data which were not used to train the model (necessary to correctly deprocess feature using input_pipe)
input_pipe (Optional[Pipeline]) – SK-Learn Pipeline which was used to process the training data
sample_sz (Optional[int]) – if set, will only compute partial dependence on a random sample with replacement of the training data, sampled according to weights (if set). Speeds up computation and allows weighted partial dependencies to be computed.
wgt_name (Optional[str]) – Optional column name to use as sampling weights
n_points (int) – number of points at which to evaluate the model output, passed to pdp_isolate as num_grid_points
n_clusters (Optional[int]) – number of clusters in which to group dependency lines. Set to None to show all lines
pdp_isolate_kargs (Optional[Dict[str, Any]]) – optional dictionary of keyword arguments to pass to pdp_isolate
pdp_plot_kargs (Optional[Dict[str, Any]]) – optional dictionary of keyword arguments to pass to pdp_plot
y_lim (Union[Tuple[float, float], List[float], None]) – If set, will limit y-axis plot range to tuple
savename (Optional[str]) – Optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None
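
For readers unfamiliar with partial dependence, the quantity being plotted can be sketched without PDPbox. The code below is a simplified illustration, not what lumin or PDPbox do internally: ToyModel and partial_dependence_1d are hypothetical, and the evaluation grid is assumed to be uniform between the feature's minimum and maximum.

```python
class ToyModel:
    """Hypothetical stand-in for any trained model with a .predict method."""

    def predict(self, rows):
        return [2.0 * r["x"] + r["y"] ** 2 for r in rows]


def partial_dependence_1d(model, rows, feat, n_points=20):
    """Average model output as `feat` is swept over a grid, other features held fixed."""
    vals = [r[feat] for r in rows]
    lo, hi = min(vals), max(vals)
    grid = [lo + i * (hi - lo) / (n_points - 1) for i in range(n_points)]
    curve = []
    for g in grid:
        # Overwrite the feature with the grid value for every datum
        modified = [{**r, feat: g} for r in rows]
        preds = model.predict(modified)
        curve.append(sum(preds) / len(preds))
    return grid, curve


rows = [{"x": 0.0, "y": 1.0}, {"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 3.0}]
grid, curve = partial_dependence_1d(ToyModel(), rows, "x", n_points=3)
print(grid, curve)
```

The per-datum prediction curves, before averaging, are the individual dependency lines that n_clusters groups together.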

lumin.plotting.interpretation.plot_2d_partial_dependence(model, df, feats, train_feats, ignore_feats=None, input_pipe=None, sample_sz=None, wgt_name=None, n_points=[20, 20], pdp_interact_kargs=None, pdp_interact_plot_kargs=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)

Wrapper for PDPbox to plot 2D dependence of specified pair of features using provided NN or RF. If features have been preprocessed using an SK-Learn Pipeline, then that can be passed in order to rescale them back to their original values.

Parameters

model (Any) – any trained model with a .predict method
df (DataFrame) – DataFrame containing training data
feats (Tuple[str, str]) – pair of features for which to evaluate the partial dependence of the model
train_feats (List[str]) – list of all training features including ones which were later ignored, i.e. input features considered when input_pipe was fitted
ignore_feats (Optional[List[str]]) – features present in training data which were not used to train the model (necessary to correctly deprocess feature using input_pipe)
input_pipe (Optional[Pipeline]) – SK-Learn Pipeline which was used to process the training data
sample_sz (Optional[int]) – if set, will only compute partial dependence on a random sample with replacement of the training data, sampled according to weights (if set). Speeds up computation and allows weighted partial dependencies to be computed.
wgt_name (Optional[str]) – Optional column name to use as sampling weights
n_points (Tuple[int, int]) – pair of numbers of points at which to evaluate the model output, passed to pdp_interact as num_grid_points
pdp_interact_kargs – optional dictionary of keyword arguments to pass to pdp_interact
pdp_interact_plot_kargs – optional dictionary of keyword arguments to pass to pdp_interact_plot
savename (Optional[str]) – Optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None

lumin.plotting.interpretation.plot_multibody_weighted_outputs(model, inputs, block_names=None, use_mean=False, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)

Interpret how a model relies on the outputs of each block in a MultiBlock by plotting the outputs of each block as weighted by the tail block. This function currently only supports models whose tail block contains a single neuron in the first dense layer. Input data is passed through the model and the absolute sums of the weighted block outputs are computed per datum, and optionally averaged over the number of block outputs.

Parameters

model (AbsModel) – model to interpret
inputs (Union[ndarray, Tensor]) – input data to use for interpretation
block_names (Optional[List[str]]) – names for each block to use when plotting
use_mean (bool) – if True, will average the weighted outputs over the number of output neurons in each block
savename (Optional[str]) – Optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None

lumin.plotting.interpretation.plot_bottleneck_weighted_inputs(model, bottleneck_idx, inputs, log_y=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)

Interpret how a single-neuron bottleneck in a MultiBlock relies on input features by plotting the absolute values of the features times their associated weight for a given set of input data.

Parameters

model (AbsModel) – model to interpret
bottleneck_idx (int) – index of the bottleneck to interpret, i.e. model.body.bottleneck_blocks[bottleneck_idx]
inputs (Union[ndarray, Tensor]) – input data to use for interpretation
log_y (bool) – whether to plot a log scale for the y-axis
savename (Optional[str]) – Optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None
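
The quantity plotted for a single-neuron bottleneck, the absolute value of each feature times its weight, is simple enough to sketch directly. The example below uses plain lists and made-up weights; weighted_inputs is a hypothetical helper, and in practice the weights would be read from the bottleneck's layer in the trained model. Averaging over the data at the end is only one possible summary and is an assumption here.

```python
def weighted_inputs(inputs, weights):
    """Per-datum |feature value * weight| for each input feature."""
    return [[abs(x * w) for x, w in zip(row, weights)] for row in inputs]


inputs = [[1.0, -2.0, 0.5], [3.0, 0.0, -0.5]]  # two data points, three features
weights = [0.5, -1.0, 4.0]  # hypothetical bottleneck weights, one per feature
contribs = weighted_inputs(inputs, weights)

# One possible summary: average each feature's contribution over the data
mean_contrib = [sum(col) / len(col) for col in zip(*contribs)]
print(contribs, mean_contrib)
```

Taking absolute values means a feature with a large negative weight counts as much as one with a large positive weight, which suits a plot with a log-scale y-axis.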

lumin.plotting.plot_settings module

class lumin.plotting.plot_settings.PlotSettings(**kargs)

Bases: object

Class to provide control over plot appearances. Default parameters are set automatically, and can be adjusted by passing values as keyword arguments during initialisation (or changed after instantiation)

Parameters: arguments (keyword) – used to set relevant plotting parameters

str2sz(sz, ax)

Used to map requested plot sizes to actual dimensions

Parameters

sz (str) – string representation of size
ax (str) – axis dimension requested

Return type

float

Returns

width of plot dimension

lumin.plotting.results module

lumin.plotting.results.plot_roc(data, pred_name='pred', targ_name='gen_target', wgt_name=None, labels=None, plot_params=None, n_bootstrap=0, log_x=False, plot_baseline=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)

Plot receiver operating characteristic curve(s), optionally using bootstrap resampling

Parameters

data (Union[DataFrame, List[DataFrame]]) – (list of) DataFrame(s) from which to draw predictions and targets
pred_name (str) – name of column to use as predictions
targ_name (str) – name of column to use as targets
wgt_name (Optional[str]) – optional name of column to use as sample weights
labels (Union[str, List[str], None]) – (list of) label(s) for plot legend
plot_params (Union[Dict[str, Any], List[Dict[str, Any]], None]) – (list of) dictionar[y/ies] of argument(s) to pass to line plot
n_bootstrap (int) – if greater than 0, will bootstrap resample the data that many times when computing the ROC AUC. Currently, this does not affect the shape of the lines, which are based on computing the ROC for the entire dataset as is.
log_x (bool) – whether to use a log scale for plotting the x-axis, useful for high-AUC lines
plot_baseline (bool) – whether to plot a dotted line for AUC=0.5. Currently incompatible with log_x=True
savename (Optional[str]) – Optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

Dict[str, Union[float, Tuple[float, float]]]

Returns

Dictionary mapping data labels to aucs (and uncertainties if n_bootstrap > 0)
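
To make the returned AUC and its bootstrap uncertainty concrete, here is a small unweighted sketch using only the standard library. It is not the lumin implementation: roc_auc and bootstrap_auc are hypothetical helpers, sample weights are ignored, and the uncertainty is assumed to be the standard deviation of the AUC over the resamples.

```python
import random
import statistics


def roc_auc(preds, targets):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [p for p, t in zip(preds, targets) if t == 1]
    neg = [p for p, t in zip(preds, targets) if t == 0]
    # Ties between a positive and a negative count as half a win
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(preds, targets, n_bootstrap=100, seed=0):
    """Mean and standard deviation of the AUC over bootstrap resamples."""
    rng = random.Random(seed)
    idxs = range(len(preds))
    aucs = []
    while len(aucs) < n_bootstrap:
        sample = rng.choices(idxs, k=len(preds))  # resample with replacement
        ts = [targets[i] for i in sample]
        if 0 < sum(ts) < len(ts):  # skip resamples missing one of the classes
            aucs.append(roc_auc([preds[i] for i in sample], ts))
    return statistics.mean(aucs), statistics.stdev(aucs)


preds = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6]
targets = [0, 0, 1, 1, 1, 0, 1, 0]
full_auc = roc_auc(preds, targets)
mean_auc, unc = bootstrap_auc(preds, targets)
print(full_auc, mean_auc, unc)
```

As with n_bootstrap in plot_roc, the resampling here only affects the quoted AUC and its uncertainty; a curve drawn from the full dataset would be unchanged.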

lumin.plotting.results.plot_binary_class_pred(df, pred_name='pred', targ_name='gen_target', wgt_name=None, wgt_scale=1, log_y=False, lim_x=(0, 1), density=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)

Basic plotter for prediction distribution in a binary classification problem. Note that labels are set using the settings.targ2class dictionary, which by default is {0: 'Background', 1: 'Signal'}.

Parameters

df (DataFrame) – DataFrame with targets and predictions
pred_name (str) – name of column to use as predictions
targ_name (str) – name of column to use as targets
wgt_name (Optional[str]) – optional name of column to use as sample weights
wgt_scale (float) – applies a global multiplicative rescaling to sample weights. Default 1 = no rescaling
log_y (bool) – whether to use a log scale for the y-axis
lim_x (Tuple[float, float]) – limit for plotting on the x-axis
density – whether to normalise each distribution to one, or keep set to sum of weights / datapoints
savename (Optional[str]) – Optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None

lumin.plotting.results.plot_sample_pred(df, pred_name='pred', targ_name='gen_target', wgt_name='gen_weight', sample_name='gen_sample', wgt_scale=1, bins=35, log_y=True, lim_x=(0, 1), density=False, zoom_args=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)

More advanced plotter for prediction distribution in a binary class problem, with stacked distributions for backgrounds and user-defined binning. Can also zoom in to specified parts of the plot. Note that plotting colours can be controlled by setting the settings.sample2col dictionary.

Parameters

df (DataFrame) – DataFrame with targets and predictions
pred_name (str) – name of column to use as predictions
targ_name (str) – name of column to use as targets
wgt_name (str) – name of column to use as sample weights
sample_name (str) – name of column to use as process names
wgt_scale (float) – applies a global multiplicative rescaling to sample weights. Default 1 = no rescaling
bins (Union[int, List[int]]) – either the number of bins to use for a uniform binning, or a list of bin edges for a variable-width binning
log_y (bool) – whether to use a log scale for the y-axis
lim_x (Tuple[float, float]) – limit for plotting on the x-axis
density – whether to normalise each distribution to one, or keep set to sum of weights / datapoints
zoom_args (Optional[Dict[str, Any]]) – arguments to control the optional zoomed in section, e.g. {'x':(0.4,0.45), 'y':(0.2, 1500), 'anchor':(0,0.25,0.95,1), 'width_scale':1, 'width_zoom':4, 'height_zoom':3}
savename (Optional[str]) – Optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None

lumin.plotting.training module

lumin.plotting.training.plot_train_history(histories, savename=None, ignore_trn=False, settings=<lumin.plotting.plot_settings.PlotSettings object>, show=True, xlow=0, log_y=False)

Plot histories object returned by train_models() showing the loss evolution over time per model trained.

Parameters

histories (List[OrderedDict]) – list of dictionaries mapping loss type to values at each (sub)-epoch
savename (Optional[str]) – Optional name of file to which to save the plot
ignore_trn (bool) – whether to ignore training loss
settings (PlotSettings) – PlotSettings class to control figure appearance
show (bool) – whether or not to show the plot, or just save it
xlow (int) – if set, will cut out the first given number of epochs
log_y (bool) – whether to plot the y-axis with a log scale

Return type

None

lumin.plotting.training.plot_lr_finders(lr_finders, lr_range=None, loss_range='auto', log_y='auto', savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)

Plot mean loss evolution against learning rate for several LR finders, e.g. those returned by fold_lr_find.

Parameters

lr_finders (List[AbsCallback]) – list of LR finders, e.g. as returned by fold_lr_find
lr_range (Union[float, Tuple, None]) – limits the range of learning rates plotted on the x-axis: if float, maximum LR; if tuple, minimum & maximum LR
loss_range (Union[float, Tuple, str, None]) – limits the range of losses plotted on the y-axis: if float, maximum loss; if tuple, minimum & maximum loss; if None, no limits; if 'auto', computes an upper limit automatically
log_y (Union[str, bool]) – whether to plot y-axis as log. If 'auto', will set to log if maximal fractional difference in loss values is greater than 50
savename (Optional[str]) – Optional name of file to which to save the plot
settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None
