lumin.plotting package

Submodules

lumin.plotting.data_viewing module

lumin.plotting.data_viewing.compare_events(events)[source]

Plots at least two events side by side in their transverse and longitudinal projections

Parameters:

events (list) – list of DataFrames containing vector coordinates for 3 momenta

Return type:

None

lumin.plotting.data_viewing.plot_binary_sample_feat(df, feat, targ_name='gen_target', wgt_name='gen_weight', sample_name='gen_sample', wgt_scale=1, bins=None, log_y=False, lim_x=None, density=True, feat_name=None, units=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

More advanced plotter for feature distributions in a binary class problem, with stacked distributions for backgrounds and user-defined binning. Note that plotting colours can be controlled by setting the settings.sample2col dictionary.

Parameters:
  • df (DataFrame) – DataFrame with targets and predictions

  • feat (str) – name of column to plot the distribution of

  • targ_name (str) – name of column to use as targets

  • wgt_name (str) – name of column to use as sample weights

  • sample_name (str) – name of column to use as process names

  • wgt_scale (float) – applies a global multiplicative rescaling to sample weights. Default 1 = no rescaling. Only applicable when density = False

  • bins (Union[int, List[int], None]) – either the number of bins to use for a uniform binning, or a list of bin edges for a variable-width binning

  • log_y (bool) – whether to use a log scale for the y-axis

  • lim_x (Optional[Tuple[float, float]]) – limit for plotting on the x-axis

  • density (bool) – whether to normalise each distribution to one, or keep set to sum of weights / datapoints

  • feat_name (Optional[str]) – Name of feature to put on x-axis, can be in LaTeX.

  • units (Optional[str]) – units used to measure feature, if applicable. Can be in LaTeX, but should not include ‘$’.

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type:

None
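
Example: the interplay between bins, wgt_scale and density can be illustrated without LUMIN. The sketch below uses hypothetical helpers (make_edges, weighted_hist) and assumes that density=True normalises each histogram to unit area, so that wgt_scale only matters when density=False. It is an illustration of the documented behaviour, not LUMIN's implementation.

```python
from bisect import bisect_right
from typing import List, Sequence, Union


def make_edges(bins: Union[int, Sequence[float]], lo: float, hi: float) -> List[float]:
    """Return bin edges: uniform if `bins` is an int, otherwise the edges as given."""
    if isinstance(bins, int):
        width = (hi - lo) / bins
        return [lo + i * width for i in range(bins + 1)]
    return list(bins)


def weighted_hist(values, weights, bins, lo, hi, wgt_scale=1.0, density=True):
    """Hypothetical helper: weighted histogram with optional unit-area normalisation."""
    edges = make_edges(bins, lo, hi)
    counts = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        if edges[0] <= v <= edges[-1]:
            # The last bin is closed on the right so that v == hi is counted
            idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
            counts[idx] += w
    if density:
        total = sum(counts)
        # Divide by total weight and bin width so the histogram integrates to one
        return [c / (total * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]
    # Assumption: the global rescaling is applied only to unnormalised histograms
    return [c * wgt_scale for c in counts]


values = [0.1, 0.2, 0.6, 0.9]
weights = [1.0, 1.0, 2.0, 4.0]
# Uniform binning from a bin count, unnormalised, weights scaled by 10
print(weighted_hist(values, weights, bins=2, lo=0.0, hi=1.0, wgt_scale=10, density=False))
# Explicit bin edges, normalised to unit area
print(weighted_hist(values, weights, bins=[0.0, 0.5, 1.0], lo=0.0, hi=1.0, density=True))
```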

lumin.plotting.data_viewing.plot_feat(df, feat, wgt_name=None, cuts=None, labels='', plot_bulk=True, n_samples=100000, plot_params=None, size='mid', show_moments=True, ax_labels={'x': None, 'y': None}, log_x=False, log_y=False, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

A flexible function to provide indicative information about the 1D distribution of a feature. By default it will produce a weighted KDE+histogram for the [1,99] percentile of the data, as well as compute the mean and standard deviation of the data in this region. Distributions are weighted by sampling the data with replacement, with probabilities proportional to the sample weights. By passing a list of cuts and labels, it will plot multiple distributions of the same feature for different cuts. Since it is designed to provide quick, indicative information, more specific functions (such as plot_kdes_from_bs) should be used to provide final results.

Important

NaN and Inf values are removed prior to plotting and no attempt is made to coerce them to real numbers

Parameters:
  • df (DataFrame) – Pandas DataFrame containing data

  • feat (str) – column name to plot

  • wgt_name (Optional[str]) – if set, will use column to weight data

  • cuts (Optional[List[Series]]) – optional list of cuts to apply to feature. Will add one KDE+hist for each cut listed on the same plot

  • labels (Optional[List[str]]) – optional list of labels for each KDE+hist

  • plot_bulk (bool) – whether to plot the [1,99] percentile of the data, or all of it

  • n_samples (int) – if plotting weighted distributions, how many samples to use

  • plot_params (Union[Dict[str, Any], List[Dict[str, Any]], None]) – optional list of arguments to pass to Seaborn distplot for each KDE+hist

  • size (str) – string to pass to str2sz() to determine size of plot

  • show_moments (bool) – whether to compute and display the mean and standard deviation

  • ax_labels (Dict[str, Any]) – dictionary of x and y axes labels

  • log_x (bool) – if true, will use log scale for x-axis

  • log_y (bool) – if true, will use log scale for y-axis

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type:

None
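
Example: the weighting scheme described above (resampling with replacement, with probabilities proportional to the sample weights, after dropping non-finite values) can be sketched with the standard library alone. The helper name resample_bulk is hypothetical, and the order of the steps (resample first, then keep the [1,99] percentile range) is an assumption rather than a statement about LUMIN's internals.

```python
import math
import random
import statistics


def resample_bulk(values, weights, n_samples=100_000, plot_bulk=True, seed=0):
    """Hypothetical helper mimicking the documented preprocessing steps."""
    # Drop NaN and Inf entries instead of coercing them to real numbers
    pairs = [(v, w) for v, w in zip(values, weights) if math.isfinite(v)]
    vals, wgts = zip(*pairs)
    # Sample with replacement, with probability proportional to the weights
    rng = random.Random(seed)
    sample = rng.choices(vals, weights=wgts, k=n_samples)
    if plot_bulk:
        # 99 cut points: the first is the 1st percentile, the last the 99th
        cuts = statistics.quantiles(sample, n=100)
        lo, hi = cuts[0], cuts[-1]
        sample = [s for s in sample if lo <= s <= hi]
    return sample


values = [0.0, 1.0, 2.0, float("nan"), float("inf")]
weights = [1.0, 1.0, 8.0, 1.0, 1.0]
sample = resample_bulk(values, weights, n_samples=10_000)
# The moments of the resampled data approximate the weighted moments
print(statistics.mean(sample), statistics.stdev(sample))
```

Because the moments are computed on the resampled data, they fluctuate slightly with n_samples and the random seed.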

lumin.plotting.data_viewing.plot_kdes_from_bs(x, bs_stats, name2args, feat, units=None, moments=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>, show_plot=True)[source]

Plots KDEs computed via bootstrap_stats()

Parameters:
  • x (ndarray) – x-axis values

  • bs_stats (Dict[str, Any]) – (filtered) dictionary returned by bootstrap_stats()

  • name2args (Dict[str, Dict[str, Any]]) – Dictionary mapping names of different distributions to arguments to pass to seaborn tsplot

  • feat (str) – Name of feature being plotted (for axis labels)

  • units (Optional[str]) – Optional units to show on axes

  • moments (bool) – whether to display mean and standard deviation of each distribution

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

  • show_plot (bool) – whether to show the plot, or just save it

Return type:

None

lumin.plotting.data_viewing.plot_rank_order_dendrogram(df, threshold=0.8, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Plots a dendrogram of features in df clustered via Spearman’s rank correlation coefficient. Also returns sets of features with correlation coefficients greater than the threshold.

Parameters:
  • df (DataFrame) – Pandas DataFrame containing data

  • threshold (float) – Threshold on correlation coefficient

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type:

Dict[str, Union[List[str], float]]

Returns:

Dict of sets of features with correlation coefficients greater than the threshold and cluster distance
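
Example: the thresholding on Spearman's rank correlation coefficient can be illustrated in plain Python. The helpers below (ranks, pearson, spearman, correlated_pairs) are hypothetical names, and the sketch only finds feature pairs above the threshold; it does not perform the hierarchical clustering behind the dendrogram, nor does it claim to match LUMIN's return format.

```python
from itertools import combinations
from statistics import mean


def ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman's coefficient is the Pearson coefficient of the ranks."""
    return pearson(ranks(a), ranks(b))


def correlated_pairs(table, threshold=0.8):
    """Map each feature pair whose coefficient exceeds the threshold to that coefficient."""
    out = {}
    for f1, f2 in combinations(table, 2):
        rho = spearman(table[f1], table[f2])
        if rho > threshold:
            out[(f1, f2)] = rho
    return out


# Toy data: pt_sq is a monotonic function of pt, phi is unrelated
table = {"pt": [1, 2, 3, 4, 5], "pt_sq": [1, 4, 9, 16, 25], "phi": [3, 1, 4, 1, 5]}
pairs = correlated_pairs(table, threshold=0.8)
print(pairs)
```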

lumin.plotting.interpretation module

lumin.plotting.interpretation.plot_1d_partial_dependence(model, df, feat, train_feats, ignore_feats=None, input_pipe=None, sample_sz=None, wgt_name=None, n_clusters=10, n_points=20, pdp_isolate_kargs=None, pdp_plot_kargs=None, y_lim=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Wrapper for PDPbox to plot 1D dependence of specified feature using provided NN or RF. If features have been preprocessed using an SK-Learn Pipeline, then that can be passed in order to rescale the x-axis back to its original values.

Parameters:
  • model (Any) – any trained model with a .predict method

  • df (DataFrame) – DataFrame containing training data

  • feat (str) – feature for which to evaluate the partial dependence of the model

  • train_feats (List[str]) – list of all training features including ones which were later ignored, i.e. input features considered when input_pipe was fitted

  • ignore_feats (Optional[List[str]]) – features present in training data which were not used to train the model (necessary to correctly deprocess feature using input_pipe)

  • input_pipe (Optional[Pipeline]) – SK-Learn Pipeline which was used to process the training data

  • sample_sz (Optional[int]) – if set, will only compute partial dependence on a random sample with replacement of the training data, sampled according to weights (if set). Speeds up computation and allows weighted partial dependencies to be computed.

  • wgt_name (Optional[str]) – Optional column name to use as sampling weights

  • n_points (int) – number of points at which to evaluate the model output, passed to pdp_isolate as num_grid_points

  • n_clusters (Optional[int]) – number of clusters in which to group dependency lines. Set to None to show all lines

  • pdp_isolate_kargs (Optional[Dict[str, Any]]) – optional dictionary of keyword arguments to pass to pdp_isolate

  • pdp_plot_kargs (Optional[Dict[str, Any]]) – optional dictionary of keyword arguments to pass to pdp_plot

  • y_lim (Union[Tuple[float, float], List[float], None]) – If set, will limit y-axis plot range to tuple

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type:

None
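
Example: the quantity being plotted, the 1D partial dependence, can be sketched independently of PDPbox. For each grid value the chosen feature is overwritten in every row, the model is evaluated, and the predictions are averaged. The function partial_dependence_1d and the toy model are hypothetical, and the evenly spaced grid is a simplifying assumption; PDPbox may choose its grid points differently.

```python
from statistics import mean


def partial_dependence_1d(predict, rows, feat, n_points=20):
    """Average prediction as a function of one feature, other features held as observed."""
    lo = min(r[feat] for r in rows)
    hi = max(r[feat] for r in rows)
    # Assumption: evenly spaced grid between the observed minimum and maximum
    grid = [lo + i * (hi - lo) / (n_points - 1) for i in range(n_points)]
    curve = []
    for g in grid:
        # Overwrite the feature in every row, keeping all other features fixed
        preds = [predict({**r, feat: g}) for r in rows]
        curve.append(mean(preds))
    return grid, curve


def toy_model(row):
    """Stand-in for a trained model's .predict, acting on a single row."""
    return 2.0 * row["x"] + row["y"]


rows = [{"x": 0.0, "y": 1.0}, {"x": 1.0, "y": 3.0}, {"x": 2.0, "y": 5.0}]
grid, curve = partial_dependence_1d(toy_model, rows, feat="x", n_points=3)
print(grid)   # [0.0, 1.0, 2.0]
print(curve)  # [3.0, 5.0, 7.0]
```

For this linear toy model the curve recovers the slope of 2 in x, offset by the mean of y.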

lumin.plotting.interpretation.plot_2d_partial_dependence(model, df, feats, train_feats, ignore_feats=None, input_pipe=None, sample_sz=None, wgt_name=None, n_points=[20, 20], pdp_interact_kargs=None, pdp_interact_plot_kargs=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Wrapper for PDPbox to plot 2D dependence of specified pair of features using provided NN or RF. If features have been preprocessed using an SK-Learn Pipeline, then that can be passed in order to rescale them back to their original values.

Parameters:
  • model (Any) – any trained model with a .predict method

  • df (DataFrame) – DataFrame containing training data

  • feats (Tuple[str, str]) – pair of features for which to evaluate the partial dependence of the model

  • train_feats (List[str]) – list of all training features including ones which were later ignored, i.e. input features considered when input_pipe was fitted

  • ignore_feats (Optional[List[str]]) – features present in training data which were not used to train the model (necessary to correctly deprocess feature using input_pipe)

  • input_pipe (Optional[Pipeline]) – SK-Learn Pipeline which was used to process the training data

  • sample_sz (Optional[int]) – if set, will only compute partial dependence on a random sample with replacement of the training data, sampled according to weights (if set). Speeds up computation and allows weighted partial dependencies to be computed.

  • wgt_name (Optional[str]) – Optional column name to use as sampling weights

  • n_points (Tuple[int, int]) – pair of numbers of points at which to evaluate the model output, passed to pdp_interact as num_grid_points

  • pdp_interact_kargs – optional dictionary of keyword arguments to pass to pdp_interact

  • pdp_interact_plot_kargs – optional dictionary of keyword arguments to pass to pdp_interact_plot

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type:

None

lumin.plotting.interpretation.plot_bottleneck_weighted_inputs(model, bottleneck_idx, inputs, log_y=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Interpret how a single-neuron bottleneck in a MultiBlock relies on input features by plotting the absolute values of the features times their associated weight for a given set of input data.

Parameters:
  • model (AbsModel) – model to interpret

  • bottleneck_idx (int) – index of the bottleneck to interpret, i.e. model.body.bottleneck_blocks[bottleneck_idx]

  • inputs (Union[ndarray, Tensor]) – input data to use for interpretation

  • log_y (bool) – whether to plot a log scale for the y-axis

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type:

None
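
Example: the plotted quantity, the absolute value of each feature multiplied by its bottleneck weight, can be sketched as follows. The helper weighted_input_magnitudes is hypothetical, and averaging over the input data is an assumption made to get one number per feature; the actual plot may show the full distribution.

```python
from statistics import mean


def weighted_input_magnitudes(inputs, weights):
    """Mean of |x_i * w_i| over the data, for each input feature i."""
    n_feats = len(weights)
    return [mean(abs(row[i] * weights[i]) for row in inputs) for i in range(n_feats)]


# Two data points with two features each, and one weight per feature
inputs = [[1.0, -2.0], [3.0, 4.0]]
weights = [0.5, -1.0]
print(weighted_input_magnitudes(inputs, weights))  # [1.0, 3.0]
```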

lumin.plotting.interpretation.plot_embedding(embed, feat, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Visualise weights in provided categorical entity-embedding matrix

Parameters:
  • embed (OrderedDict) – state_dict of trained nn.Embedding

  • feat (str) – name of feature embedded

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type:

None

lumin.plotting.interpretation.plot_importance(df, feat_name='Feature', imp_name='Importance', unc_name='Uncertainty', threshold=None, x_lbl='Importance via feature permutation', savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Plot feature importances as computed via get_nn_feat_importance, get_ensemble_feat_importance, or rf_rank_features

Parameters:
  • df (DataFrame) – DataFrame containing columns of features, importances and, optionally, uncertainties

  • feat_name (str) – column name for features

  • imp_name (str) – column name for importances

  • unc_name (str) – column name for uncertainties (if present)

  • threshold (Optional[float]) – if set, will draw a line at the threshold used for feature importance

  • x_lbl (str) – label to put on the x-axis

  • savename (Optional[str]) – Optional name of file to which to save the plot of feature importances

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type:

None

lumin.plotting.interpretation.plot_multibody_weighted_outputs(model, inputs, block_names=None, use_mean=False, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Interpret how a model relies on the outputs of each block in a MultiBlock by plotting the outputs of each block as weighted by the tail block. This function currently only supports models whose tail block contains a single neuron in the first dense layer. Input data is passed through the model and the absolute sums of the weighted block outputs are computed per datum, and optionally averaged over the number of block outputs.

Parameters:
  • model (AbsModel) – model to interpret

  • inputs (Union[ndarray, Tensor]) – input data to use for interpretation

  • block_names (Optional[List[str]]) – names for each block to use when plotting

  • use_mean (bool) – if True, will average the weighted outputs over the number of output neurons in each block

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type:

None
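
Example: the per-datum computation described above can be sketched for a single data point. The helper block_contributions is hypothetical, and "absolute sum" is interpreted here as the absolute value of the summed weighted outputs of each block, which is an assumption about the wording rather than a confirmed detail of LUMIN.

```python
def block_contributions(block_outputs, tail_weights, use_mean=False):
    """Contribution of each block to a single tail neuron, for one datum.

    block_outputs: one list of output values per block
    tail_weights:  matching lists of tail-neuron weights, one per block
    """
    contribs = []
    for outs, wgts in zip(block_outputs, tail_weights):
        total = abs(sum(o * w for o, w in zip(outs, wgts)))
        # Optionally average over the number of outputs in the block,
        # so that wide blocks are not favoured purely by their size
        contribs.append(total / len(outs) if use_mean else total)
    return contribs


block_outputs = [[1.0, 2.0], [0.5, -0.5, 1.0, 3.0]]
tail_weights = [[0.5, -1.0], [1.0, 1.0, 1.0, 1.0]]
print(block_contributions(block_outputs, tail_weights))                 # [1.5, 4.0]
print(block_contributions(block_outputs, tail_weights, use_mean=True))  # [0.75, 1.0]
```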

lumin.plotting.plot_settings module

class lumin.plotting.plot_settings.PlotSettings(**kargs)[source]

Bases: object

Class to provide control over plot appearances. Default parameters are set automatically, and can be adjusted by passing values as keyword arguments during initialisation (or changed after instantiation)

Parameters:

kargs – keyword arguments used to set relevant plotting parameters

str2sz(sz, ax)[source]

Used to map requested plot sizes to actual dimensions

Parameters:
  • sz (str) – string representation of size

  • ax (str) – axis dimension requested

Return type:

float

Returns:

width of plot dimension

lumin.plotting.results module

lumin.plotting.results.plot_binary_class_pred(df, pred_name='pred', targ_name='gen_target', wgt_name=None, wgt_scale=1, log_y=False, lim_x=(0, 1), density=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Basic plotter for prediction distribution in a binary classification problem. Note that labels are set using the settings.targ2class dictionary, which by default is {0: ‘Background’, 1: ‘Signal’}.

Parameters:
  • df (DataFrame) – DataFrame with targets and predictions

  • pred_name (str) – name of column to use as predictions

  • targ_name (str) – name of column to use as targets

  • wgt_name (Optional[str]) – optional name of column to use as sample weights

  • wgt_scale (float) – applies a global multiplicative rescaling to sample weights. Default 1 = no rescaling

  • log_y (bool) – whether to use a log scale for the y-axis

  • lim_x (Tuple[float, float]) – limit for plotting on the x-axis

  • density (bool) – whether to normalise each distribution to one, or keep set to sum of weights / datapoints

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type:

None

lumin.plotting.results.plot_roc(data, pred_name='pred', targ_name='gen_target', wgt_name=None, labels=None, plot_params=None, n_bootstrap=0, log_x=False, plot_baseline=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Plot receiver operating characteristic curve(s), optionally using bootstrap resampling

Parameters:
  • data (Union[DataFrame, List[DataFrame]]) – (list of) DataFrame(s) from which to draw predictions and targets

  • pred_name (str) – name of column to use as predictions

  • targ_name (str) – name of column to use as targets

  • wgt_name (Optional[str]) – optional name of column to use as sample weights

  • labels (Union[str, List[str], None]) – (list of) label(s) for plot legend

  • plot_params (Union[Dict[str, Any], List[Dict[str, Any]], None]) – (list of) dictionar[y/ies] of argument(s) to pass to line plot

  • n_bootstrap (int) – if greater than 0, will bootstrap resample the data that many times when computing the ROC AUC. Currently, this does not affect the shape of the lines, which are based on computing the ROC for the entire dataset as is.

  • log_x (bool) – whether to use a log scale for plotting the x-axis, useful for high-AUC lines

  • plot_baseline (bool) – whether to plot a dotted line for AUC=0.5. Currently incompatible with log_x=True

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type:

Dict[str, Union[float, Tuple[float, float]]]

Returns:

Dictionary mapping data labels to AUCs (and uncertainties if n_bootstrap > 0)
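
Example: how a bootstrap-resampled ROC AUC with an uncertainty can be obtained is sketched below using only the standard library. The helpers roc_auc and bootstrap_auc are hypothetical, sample weights are ignored, and reporting the mean and standard deviation of the resampled AUCs is an assumption; LUMIN's own estimator may differ.

```python
import random
from statistics import mean, stdev


def roc_auc(preds, targets):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [p for p, t in zip(preds, targets) if t == 1]
    neg = [p for p, t in zip(preds, targets) if t == 0]
    # Ties between a positive and a negative count as half a win
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(preds, targets, n_bootstrap=100, seed=0):
    """Mean and standard deviation of the AUC over bootstrap resamples."""
    rng = random.Random(seed)
    idxs = range(len(preds))
    aucs = []
    while len(aucs) < n_bootstrap:
        # Resample data points with replacement
        sample = rng.choices(idxs, k=len(preds))
        t = [targets[i] for i in sample]
        if 0 < sum(t) < len(t):  # skip resamples that lack one of the classes
            aucs.append(roc_auc([preds[i] for i in sample], t))
    return mean(aucs), stdev(aucs)


preds = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
targets = [0, 0, 1, 1, 1, 0]
print(roc_auc(preds, targets))                     # AUC on the full dataset
print(bootstrap_auc(preds, targets, n_bootstrap=50))  # (mean, std) over resamples
```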

lumin.plotting.results.plot_sample_pred(df, pred_name='pred', targ_name='gen_target', wgt_name='gen_weight', sample_name='gen_sample', wgt_scale=1, bins=35, log_y=True, lim_x=(0, 1), density=False, zoom_args=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

More advanced plotter for prediction distribution in a binary class problem, with stacked distributions for backgrounds and user-defined binning. Can also zoom in to specified parts of the plot. Note that plotting colours can be controlled by setting the settings.sample2col dictionary.

Parameters:
  • df (DataFrame) – DataFrame with targets and predictions

  • pred_name (str) – name of column to use as predictions

  • targ_name (str) – name of column to use as targets

  • wgt_name (str) – name of column to use as sample weights

  • sample_name (str) – name of column to use as process names

  • wgt_scale (float) – applies a global multiplicative rescaling to sample weights. Default 1 = no rescaling

  • bins (Union[int, List[int]]) – either the number of bins to use for a uniform binning, or a list of bin edges for a variable-width binning

  • log_y (bool) – whether to use a log scale for the y-axis

  • lim_x (Tuple[float, float]) – limit for plotting on the x-axis

  • density (bool) – whether to normalise each distribution to one, or keep set to sum of weights / datapoints

  • zoom_args (Optional[Dict[str, Any]]) – arguments to control the optional zoomed in section, e.g. {‘x’:(0.4,0.45), ‘y’:(0.2, 1500), ‘anchor’:(0,0.25,0.95,1), ‘width_scale’:1, ‘width_zoom’:4, ‘height_zoom’:3}

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type:

None

lumin.plotting.training module

lumin.plotting.training.plot_lr_finders(lr_finders, lr_range=None, loss_range='auto', log_y='auto', savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>, show_plot=True)[source]

Plot mean loss evolution against learning rate for several LR-finder callbacks, e.g. those returned by fold_lr_find.

Parameters:
  • lr_finders (List[AbsCallback]) – list of LR-finder callbacks, e.g. as returned by fold_lr_find

  • lr_range (Union[float, Tuple, None]) – limits the range of learning rates plotted on the x-axis: if float, maximum LR; if tuple, minimum & maximum LR

  • loss_range (Union[float, Tuple, str, None]) – limits the range of losses plotted on the y-axis: if float, maximum loss; if tuple, minimum & maximum loss; if None, no limits; if ‘auto’, computes an upper limit automatically

  • log_y (Union[str, bool]) – whether to plot y-axis as log. If ‘auto’, will set to log if maximal fractional difference in loss values is greater than 50

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

  • show_plot (bool) – whether to show the plot, or just save it

Return type:

None

lumin.plotting.training.plot_train_history(histories, savename=None, ignore_trn=False, settings=<lumin.plotting.plot_settings.PlotSettings object>, show=True, xlow=0, log_y=False)[source]

Plot histories object returned by train_models() showing the loss evolution over time per model trained.

Parameters:
  • histories (List[OrderedDict]) – list of dictionaries mapping loss type to values at each (sub)-epoch

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • ignore_trn (bool) – whether to ignore training loss

  • settings (PlotSettings) – PlotSettings class to control figure appearance

  • show (bool) – whether or not to show the plot, or just save it

  • xlow (int) – if set, will cut out the first given number of epochs

  • log_y (bool) – whether to plot the y-axis with a log scale

Return type:

None

Module contents
