lumin.plotting package

Submodules

lumin.plotting.data_viewing module

lumin.plotting.data_viewing.plot_feat(df, feat, wgt_name=None, cuts=None, labels='', plot_bulk=True, n_samples=100000, plot_params=None, size='mid', show_moments=True, ax_labels={'x': None, 'y': 'Density'}, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

A flexible function to provide indicative information about the 1D distribution of a feature. By default it will produce a weighted KDE+histogram for the [1,99] percentile of the data, as well as compute the mean and standard deviation of the data in this region. Distributions are weighted by sampling the data with replacement, with probabilities proportional to the sample weights. By passing a list of cuts and labels, it will plot multiple distributions of the same feature for different cuts. Since it is designed to provide quick, indicative information, more specific functions (such as plot_kdes_from_bs) should be used to provide final results.

Parameters
  • df (DataFrame) – Pandas DataFrame containing data

  • feat (str) – column name to plot

  • wgt_name (Optional[str]) – if set, will use column to weight data

  • cuts (Optional[List[Series]]) – optional list of cuts to apply to feature. Will add one KDE+hist for each cut listed on the same plot

  • labels (Optional[List[str]]) – optional list of labels for each KDE+hist

  • plot_bulk (bool) – whether to plot the [1,99] percentile of the data, or all of it

  • n_samples (int) – if plotting weighted distributions, how many samples to use

  • plot_params (Union[Dict[str, Any], List[Dict[str, Any]], None]) – optional list of arguments to pass to Seaborn Distplot for each KDE+hist

  • size (str) – string to pass to str2sz() to determine size of plot

  • show_moments (bool) – whether to compute and display the mean and standard deviation

  • ax_labels (Dict[str, Any]) – dictionary of x and y axes labels

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None
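The weighting and bulk selection described for plot_feat can be sketched with the standard library alone. This is a minimal illustration, not LUMIN's implementation: the helper names weighted_resample and bulk, the nearest-rank percentile rule, and the toy data are all assumptions made for the example.

```python
import random
import statistics

def weighted_resample(values, weights, n_samples, seed=0):
    """Sample with replacement, with probability proportional to each weight."""
    rng = random.Random(seed)
    return rng.choices(values, weights=weights, k=n_samples)

def bulk(values, lo=1, hi=99):
    """Keep values inside the [lo, hi] percentile range (nearest-rank percentiles)."""
    ordered = sorted(values)
    n = len(ordered)
    lo_val = ordered[int(round(lo / 100 * (n - 1)))]
    hi_val = ordered[int(round(hi / 100 * (n - 1)))]
    return [v for v in values if lo_val <= v <= hi_val]

# Toy feature: values 0..100, with the upper half carrying three times the weight
values = [float(i) for i in range(101)]
weights = [1.0 if v < 50 else 3.0 for v in values]

sample = weighted_resample(values, weights, n_samples=10000)
core = bulk(sample)  # analogous to plot_bulk=True
mean, std = statistics.mean(core), statistics.stdev(core)
```

Because the resampled data already follow the weighted distribution, the moments and any histogram or KDE can then be computed as if the data were unweighted.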

lumin.plotting.data_viewing.compare_events(events)[source]

Plot at least two events side by side in their transverse and longitudinal projections

Parameters

events (list) – list of DataFrames containing vector coordinates for 3 momenta

Return type

None

lumin.plotting.data_viewing.plot_rank_order_dendrogram(df, threshold=0.8, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Plot dendrogram of features in df clustered via Spearman’s rank correlation coefficient. Also returns a list of pairs of features with correlation coefficients greater than the threshold

Parameters
  • df (DataFrame) – Pandas DataFrame containing data

  • threshold (float) – Threshold on correlation coefficient

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

List[List[str]]

Returns

List of pairs of features with correlation coefficients greater than the threshold
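The returned list of feature pairs can be illustrated with a small pure-Python sketch of Spearman's rank correlation (Pearson correlation of the ranks). The function names and toy columns are hypothetical, the dendrogram clustering itself is not reproduced, and comparing the signed coefficient (rather than its absolute value) against the threshold is an assumption.

```python
from itertools import combinations

def ranks(xs):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def spearman(a, b):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(a), ranks(b))

def correlated_pairs(columns, threshold=0.8):
    """List pairs of column names whose Spearman coefficient exceeds the threshold."""
    pairs = []
    for f1, f2 in combinations(columns, 2):
        if spearman(columns[f1], columns[f2]) > threshold:
            pairs.append([f1, f2])
    return pairs

# Hypothetical toy features: pt_squared is a monotonic function of pt
columns = {
    "pt": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pt_squared": [1.0, 4.0, 9.0, 16.0, 25.0],
    "phi": [0.3, -1.2, 2.5, 0.1, -0.4],
}
pairs = correlated_pairs(columns)
```

Since the coefficient depends only on ranks, any monotonic transformation of a feature is flagged as fully correlated with the original, which is what makes this useful for spotting redundant inputs.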

lumin.plotting.data_viewing.plot_kdes_from_bs(x, bs_stats, name2args, feat, units=None, moments=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Plot KDEs computed via bootstrap_stats()

Parameters
  • bs_stats (Dict[str, Any]) – (filtered) dictionary returned by bootstrap_stats()

  • name2args (Dict[str, Dict[str, Any]]) – Dictionary mapping names of different distributions to arguments to pass to seaborn tsplot

  • feat (str) – Name of feature being plotted (for axis labels)

  • units (Optional[str]) – Optional units to show on axes

  • moments – whether to display mean and standard deviation of each distribution

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None

lumin.plotting.interpretation module

lumin.plotting.interpretation.plot_importance(df, feat_name='Feature', imp_name='Importance', unc_name='Uncertainty', threshold=None, x_lbl='Importance via feature permutation', savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Plot feature importances as computed via get_nn_feat_importance, get_ensemble_feat_importance, or rf_rank_features

Parameters
  • df (DataFrame) – DataFrame containing columns of features, importances and, optionally, uncertainties

  • feat_name (str) – column name for features

  • imp_name (str) – column name for importances

  • unc_name (str) – column name for uncertainties (if present)

  • threshold (Optional[float]) – if set, will draw a line at the threshold used for feature importance

  • x_lbl (str) – label to put on the x-axis

  • savename (Optional[str]) – Optional name of file to which to save the plot of feature importances

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None

lumin.plotting.interpretation.plot_embedding(embed, feat, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Visualise weights in provided categorical entity-embedding matrix

Parameters
  • embed (OrderedDict) – state_dict of trained nn.Embedding

  • feat (str) – name of feature embedded

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None

lumin.plotting.interpretation.plot_1d_partial_dependence(model, df, feat, train_feats, ignore_feats=None, input_pipe=None, sample_sz=None, wgt_name=None, n_clusters=10, n_points=20, pdp_isolate_kargs=None, pdp_plot_kargs=None, y_lim=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Wrapper for PDPbox to plot 1D dependence of specified feature using provided NN or RF. If features have been preprocessed using an SK-Learn Pipeline, then that can be passed in order to rescale the x-axis back to its original values.

Parameters
  • model (Any) – any trained model with a .predict method

  • df (DataFrame) – DataFrame containing training data

  • feat (str) – feature for which to evaluate the partial dependence of the model

  • train_feats (List[str]) – list of all training features including ones which were later ignored, i.e. input features considered when input_pipe was fitted

  • ignore_feats (Optional[List[str]]) – features present in training data which were not used to train the model (necessary to correctly deprocess feature using input_pipe)

  • input_pipe (Optional[Pipeline]) – SK-Learn Pipeline which was used to process the training data

  • sample_sz (Optional[int]) – if set, will only compute partial dependence on a random sample with replacement of the training data, sampled according to weights (if set). Speeds up computation and allows weighted partial dependencies to be computed.

  • wgt_name (Optional[str]) – Optional column name to use as sampling weights

  • n_points (int) – number of points at which to evaluate the model output, passed to pdp_isolate as num_grid_points

  • n_clusters (Optional[int]) – number of clusters in which to group dependency lines. Set to None to show all lines

  • pdp_isolate_kargs (Optional[Dict[str, Any]]) – optional dictionary of keyword arguments to pass to pdp_isolate

  • pdp_plot_kargs (Optional[Dict[str, Any]]) – optional dictionary of keyword arguments to pass to pdp_plot

  • y_lim (Union[Tuple[float, float], List[float], None]) – If set, will limit y-axis plot range to tuple

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None
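The quantity that a 1D partial dependence plot shows can be sketched without PDPbox: for each grid value, the chosen feature is overwritten in every row and the model predictions are averaged. The sketch below is illustrative only; the function names and toy model are hypothetical, and the evenly spaced grid is an assumption (PDPbox chooses its own grid from num_grid_points).

```python
def partial_dependence_1d(predict, rows, feat_idx, grid):
    """Average prediction over all rows with one feature fixed to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)          # copy so the input data stay untouched
            modified[feat_idx] = value
            preds.append(predict(modified))
        curve.append(sum(preds) / len(preds))
    return curve

def linspace(lo, hi, n):
    """n evenly spaced points from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def toy_model(row):
    """Hypothetical stand-in for a trained model's per-row prediction."""
    return 2.0 * row[0] + row[1]

rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
grid = linspace(0.0, 2.0, 5)              # plays the role of n_points=5
curve = partial_dependence_1d(toy_model, rows, feat_idx=0, grid=grid)
```

For this toy model the curve is 2 * value + 3, where 3 is the mean of the second feature; the per-row lines before averaging correspond to the individual dependency lines that n_clusters groups together.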

lumin.plotting.interpretation.plot_2d_partial_dependence(model, df, feats, train_feats, ignore_feats=None, input_pipe=None, sample_sz=None, wgt_name=None, n_points=[20, 20], pdp_interact_kargs=None, pdp_interact_plot_kargs=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Wrapper for PDPbox to plot 2D dependence of specified pair of features using provided NN or RF. If features have been preprocessed using an SK-Learn Pipeline, then that can be passed in order to rescale them back to their original values.

Parameters
  • model (Any) – any trained model with a .predict method

  • df (DataFrame) – DataFrame containing training data

  • feats (Tuple[str, str]) – pair of features for which to evaluate the partial dependence of the model

  • train_feats (List[str]) – list of all training features including ones which were later ignored, i.e. input features considered when input_pipe was fitted

  • ignore_feats (Optional[List[str]]) – features present in training data which were not used to train the model (necessary to correctly deprocess feature using input_pipe)

  • input_pipe (Optional[Pipeline]) – SK-Learn Pipeline which was used to process the training data

  • sample_sz (Optional[int]) – if set, will only compute partial dependence on a random sample with replacement of the training data, sampled according to weights (if set). Speeds up computation and allows weighted partial dependencies to be computed.

  • wgt_name (Optional[str]) – Optional column name to use as sampling weights

  • n_points (Tuple[int, int]) – pair of numbers of points at which to evaluate the model output, passed to pdp_interact as num_grid_points

  • pdp_interact_kargs – optional dictionary of keyword arguments to pass to pdp_interact

  • pdp_interact_plot_kargs – optional dictionary of keyword arguments to pass to pdp_interact_plot

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None
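The 2D case averages predictions over a grid of value pairs instead of single values. The following is a minimal sketch of that idea, not PDPbox's or LUMIN's code; the function name, toy model, and grid values are hypothetical.

```python
def partial_dependence_2d(predict, rows, idx_pair, grids):
    """Average prediction with two features fixed to every combination of grid values."""
    i, j = idx_pair
    surface = []
    for vi in grids[0]:
        line = []
        for vj in grids[1]:
            total = 0.0
            for row in rows:
                modified = list(row)      # copy so the input data stay untouched
                modified[i], modified[j] = vi, vj
                total += predict(modified)
            line.append(total / len(rows))
        surface.append(line)
    return surface

def toy_model(row):
    """Hypothetical model with an interaction between the first two features."""
    return row[0] * row[1] + row[2]

rows = [[0.0, 0.0, 1.0], [5.0, 5.0, 3.0]]
grids = ([0.0, 1.0, 2.0], [10.0, 20.0])   # plays the role of n_points=[3, 2]
surface = partial_dependence_2d(toy_model, rows, (0, 1), grids)
```

The result has one row per value of the first feature and one column per value of the second, so its shape follows n_points.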

lumin.plotting.interpretation.plot_multibody_weighted_outputs(model, inputs, block_names=None, use_mean=False, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Interpret how a model relies on the outputs of each block in a MultiBlock by plotting the outputs of each block as weighted by the tail block. This function currently only supports models whose tail block contains a single neuron in the first dense layer. Input data is passed through the model and the absolute sums of the weighted block outputs are computed per datum, and optionally averaged over the number of block outputs.

Parameters
  • model (AbsModel) – model to interpret

  • inputs (Union[ndarray, Tensor]) – input data to use for interpretation

  • block_names (Optional[List[str]]) – names for each block to use when plotting

  • use_mean (bool) – if True, will average the weighted outputs over the number of output neurons in each block

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None
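The per-datum quantity described for plot_multibody_weighted_outputs can be sketched as follows. This is one reading of the description, not the library's code: "absolute sum" is taken here to mean the absolute value of the weighted sum, and the function name, dictionary layout, block names, and numbers are all hypothetical.

```python
def weighted_block_outputs(block_outputs, tail_weights, use_mean=False):
    """
    block_outputs: {block name: list of per-datum output vectors}
    tail_weights:  {block name: weights the single tail neuron applies to that block}
    Returns {block name: list of per-datum |sum(output * weight)| values}.
    """
    result = {}
    for name, data in block_outputs.items():
        w = tail_weights[name]
        values = []
        for outs in data:
            total = abs(sum(o * wi for o, wi in zip(outs, w)))
            # Optionally average over the number of output neurons in the block
            values.append(total / len(w) if use_mean else total)
        result[name] = values
    return result

# Two data points; block_a has two output neurons, block_b has one
block_outputs = {"block_a": [[1.0, -2.0], [0.5, 0.5]], "block_b": [[3.0], [-1.0]]}
tail_weights = {"block_a": [0.5, 1.0], "block_b": [-2.0]}

summed = weighted_block_outputs(block_outputs, tail_weights)
averaged = weighted_block_outputs(block_outputs, tail_weights, use_mean=True)
```

Averaging with use_mean=True makes blocks of different widths comparable, since a wide block would otherwise tend to produce larger sums simply by having more outputs.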

lumin.plotting.interpretation.plot_bottleneck_weighted_inputs(model, bottleneck_idx, inputs, log_y=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Interpret how a single-neuron bottleneck in a MultiBlock relies on input features by plotting the absolute values of the features times their associated weight for a given set of input data.

Parameters
  • model (AbsModel) – model to interpret

  • bottleneck_idx (int) – index of the bottleneck to interpret, i.e. model.body.bottleneck_blocks[bottleneck_idx]

  • inputs (Union[ndarray, Tensor]) – input data to use for interpretation

  • log_y (bool) – whether to plot a log scale for the y-axis

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None

lumin.plotting.plot_settings module

class lumin.plotting.plot_settings.PlotSettings(**kargs)[source]

Bases: object

Class to provide control over plot appearances. Default parameters are set automatically, and can be adjusted by passing values as keyword arguments during initialisation (or changed after instantiation)

Parameters

arguments (keyword) – used to set relevant plotting parameters

str2sz(sz, ax)[source]

Used to map requested plot sizes to actual dimensions

Parameters
  • sz (str) – string representation of size

  • ax (str) – axis dimension requested

Return type

float

Returns

width of plot dimension
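The pattern described for PlotSettings (defaults set automatically, overridable via keyword arguments or after instantiation, plus a size-string lookup) can be sketched with a stand-in class. Everything below is hypothetical: the class name SketchSettings, the attribute names, the default values, and the size table are invented for illustration and are not the real PlotSettings attributes.

```python
class SketchSettings:
    """Hypothetical stand-in illustrating the pattern; not the real PlotSettings."""

    def __init__(self, **kargs):
        # Defaults (all invented for this sketch)
        self.title_size = 16
        self.palette = "tab10"
        self.sizes = {"small": (4.0, 4.0), "mid": (8.0, 8.0), "large": (12.0, 9.0)}
        # Keyword arguments override the defaults
        for name, value in kargs.items():
            setattr(self, name, value)

    def str2sz(self, sz, ax):
        """Map a size string and an axis ('x' or 'y') to a plot dimension."""
        width, height = self.sizes[sz]
        return width if ax == "x" else height

settings = SketchSettings(title_size=20)  # override at initialisation
settings.palette = "Set2"                 # or change after instantiation
figsize = (settings.str2sz("large", "x"), settings.str2sz("large", "y"))
```

Keeping appearance options in one object means a single settings instance can be passed to every plotting call to give a consistent look across figures.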

lumin.plotting.results module

lumin.plotting.results.plot_roc(data, pred_name='pred', targ_name='gen_target', wgt_name=None, labels=None, plot_params=None, n_bootstrap=0, log_x=False, plot_baseline=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Plot receiver operating characteristic curve(s), optionally using bootstrap resampling

Parameters
  • data (Union[DataFrame, List[DataFrame]]) – (list of) DataFrame(s) from which to draw predictions and targets

  • pred_name (str) – name of column to use as predictions

  • targ_name (str) – name of column to use as targets

  • wgt_name (Optional[str]) – optional name of column to use as sample weights

  • labels (Union[str, List[str], None]) – (list of) label(s) for plot legend

  • plot_params (Union[Dict[str, Any], List[Dict[str, Any]], None]) – (list of) dictionar[y/ies] of argument(s) to pass to line plot

  • n_bootstrap (int) – if greater than 0, will bootstrap resample the data that many times when computing the ROC AUC. Currently, this does not affect the shape of the lines, which are based on computing the ROC for the entire dataset as is.

  • log_x (bool) – whether to use a log scale for plotting the x-axis, useful for high AUC line

  • plot_baseline (bool) – whether to plot a dotted line for AUC=0.5. Currently incompatible with log_x=True

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

Dict[str, Union[float, Tuple[float, float]]]

Returns

Dictionary mapping data labels to aucs (and uncertainties if n_bootstrap > 0)
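Bootstrap resampling of the ROC AUC, as controlled by n_bootstrap, can be sketched in pure Python. This is an illustration under stated assumptions rather than LUMIN's implementation: sample weights are ignored, the uncertainty is taken to be the standard deviation of the resampled AUCs, and the function names and toy data are hypothetical.

```python
import random
import statistics

def roc_auc(preds, targets):
    """AUC as the probability that a random positive outranks a random negative (ties count half)."""
    pos = [p for p, t in zip(preds, targets) if t == 1]
    neg = [p for p, t in zip(preds, targets) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(preds, targets, n_bootstrap, seed=0):
    """Mean and standard deviation of the AUC over resamples drawn with replacement."""
    rng = random.Random(seed)
    idxs = range(len(preds))
    aucs = []
    while len(aucs) < n_bootstrap:
        pick = rng.choices(idxs, k=len(preds))
        t = [targets[i] for i in pick]
        if 0 < sum(t) < len(t):  # skip resamples that lack one of the classes
            aucs.append(roc_auc([preds[i] for i in pick], t))
    return statistics.mean(aucs), statistics.stdev(aucs)

preds = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6]
targets = [0, 0, 1, 1, 1, 0, 1, 0]

auc = roc_auc(preds, targets)
mean_auc, std_auc = bootstrap_auc(preds, targets, n_bootstrap=200)
```

The pairwise AUC computation is quadratic in the number of events, which is fine for a sketch but too slow for large datasets; a sort-based computation would be used in practice.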

lumin.plotting.results.plot_binary_class_pred(df, pred_name='pred', targ_name='gen_target', wgt_name=None, wgt_scale=1, log_y=False, lim_x=(0, 1), density=True, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Basic plotter for prediction distribution in a binary classification problem. Note that labels are set using the settings.targ2class dictionary, which by default is {0: 'Background', 1: 'Signal'}.

Parameters
  • df (DataFrame) – DataFrame with targets and predictions

  • pred_name (str) – name of column to use as predictions

  • targ_name (str) – name of column to use as targets

  • wgt_name (Optional[str]) – optional name of column to use as sample weights

  • wgt_scale (float) – applies a global multiplicative rescaling to sample weights. Default 1 = no rescaling

  • log_y (bool) – whether to use a log scale for the y-axis

  • lim_x (Tuple[float, float]) – limit for plotting on the x-axis

  • density – whether to normalise each distribution to one, or keep set to sum of weights / datapoints

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None

lumin.plotting.results.plot_sample_pred(df, pred_name='pred', targ_name='gen_target', wgt_name='gen_weight', sample_name='gen_sample', wgt_scale=1, bins=35, log_y=True, lim_x=(0, 1), density=False, zoom_args=None, savename=None, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

More advanced plotter for the prediction distribution in a binary classification problem, with stacked distributions for backgrounds and user-defined binning. Can also zoom in to specified parts of the plot. Note that plotting colours can be controlled by setting the settings.sample2col dictionary.

Parameters
  • df (DataFrame) – DataFrame with targets and predictions

  • pred_name (str) – name of column to use as predictions

  • targ_name (str) – name of column to use as targets

  • wgt_name (str) – name of column to use as sample weights

  • sample_name (str) – name of column to use as process names

  • wgt_scale (float) – applies a global multiplicative rescaling to sample weights. Default 1 = no rescaling

  • bins (Union[int, List[int]]) – either the number of bins to use for a uniform binning, or a list of bin edges for a variable-width binning

  • log_y (bool) – whether to use a log scale for the y-axis

  • lim_x (Tuple[float, float]) – limit for plotting on the x-axis

  • density – whether to normalise each distribution to one, or keep set to sum of weights / datapoints

  • zoom_args (Optional[Dict[str, Any]]) – arguments to control the optional zoomed in section, e.g. {'x':(0.4,0.45), 'y':(0.2, 1500), 'anchor':(0,0.25,0.95,1), 'width_scale':1, 'width_zoom':4, 'height_zoom':3}

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None
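How bins, wgt_scale, and density interact can be sketched with a small weighted histogram. This is not the plotting code itself: the function names are hypothetical, and density=True is assumed to follow the common convention of normalising the histogram so that its integral is one.

```python
def make_edges(bins, lim_x=(0.0, 1.0)):
    """An int gives uniform bins across lim_x; a list is used directly as bin edges."""
    if isinstance(bins, int):
        lo, hi = lim_x
        return [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    return list(bins)

def weighted_hist(preds, weights, bins, lim_x=(0.0, 1.0), wgt_scale=1.0, density=False):
    """Sum of (rescaled) weights per bin, optionally normalised to unit integral."""
    edges = make_edges(bins, lim_x)
    counts = [0.0] * (len(edges) - 1)
    for p, w in zip(preds, weights):
        for b in range(len(counts)):
            last = b == len(counts) - 1
            # Bins are half-open, except the last one which includes its upper edge
            if edges[b] <= p < edges[b + 1] or (last and p == edges[-1]):
                counts[b] += w * wgt_scale
                break
    if density:
        total = sum(counts)
        counts = [c / (total * (edges[b + 1] - edges[b])) for b, c in enumerate(counts)]
    return edges, counts

preds = [0.05, 0.15, 0.55, 0.95, 1.0]
weights = [1.0, 2.0, 1.0, 0.5, 0.5]

_, uniform = weighted_hist(preds, weights, bins=2)
_, scaled = weighted_hist(preds, weights, bins=2, wgt_scale=10)
_, variable = weighted_hist(preds, weights, bins=[0.0, 0.1, 1.0])
dens_edges, dens = weighted_hist(preds, weights, bins=2, density=True)
```

With density=False the bin contents stay equal to the sum of weights, so wgt_scale changes the y-axis values; with density=True a global rescaling cancels out.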

lumin.plotting.training module

lumin.plotting.training.plot_train_history(histories, savename=None, ignore_trn=True, settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Plot histories object returned by fold_train_ensemble() showing the loss evolution over time per model trained.

Parameters
  • histories (List[Dict[str, List[float]]]) – list of dictionaries mapping loss type to values at each (sub)-epoch

  • savename (Optional[str]) – Optional name of file to which to save the plot

  • ignore_trn – whether to ignore training loss

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None

lumin.plotting.training.plot_lr_finders(lr_finders, lr_range=None, loss_range='auto', settings=<lumin.plotting.plot_settings.PlotSettings object>)[source]

Plot mean loss evolution against learning rate for several LRFinder objects, such as those returned by fold_lr_find.

Parameters
  • lr_finders (List[LRFinder]) – list of LRFinder objects, such as those returned by fold_lr_find

  • lr_range (Union[float, Tuple, None]) – limits the range of learning rates plotted on the x-axis: if float, maximum LR; if tuple, minimum & maximum LR

  • loss_range (Union[float, Tuple, str, None]) – limits the range of losses plotted on the y-axis: if float, maximum loss; if tuple, minimum & maximum loss; if None, no limits; if 'auto', computes an upper limit automatically

  • settings (PlotSettings) – PlotSettings class to control figure appearance

Return type

None
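The float / tuple / None convention shared by lr_range and loss_range can be sketched as a small helper that turns the argument into explicit axis limits. The function name resolve_range is hypothetical, treating None as "use the full extent of the data" is an assumption, and the 'auto' option is deliberately left out because the documentation does not state how that limit is computed.

```python
def resolve_range(limit, values):
    """Return (low, high) axis limits from a float, a (low, high) tuple, or None."""
    if isinstance(limit, str):
        # 'auto' is not reproduced here: its rule is not documented
        raise ValueError("string options are not handled in this sketch")
    if limit is None:
        return min(values), max(values)    # no limits: full extent of the data
    if isinstance(limit, (int, float)):
        return min(values), float(limit)   # a single number caps the maximum
    low, high = limit                      # a pair gives both bounds
    return low, high

lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]

full = resolve_range(None, lrs)
capped = resolve_range(1e-2, lrs)
window = resolve_range((1e-4, 1e-2), lrs)
```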

Module contents
