giotto.diagrams.Scaler

class giotto.diagrams.Scaler(metric='bottleneck', metric_params=None, function=numpy.max, n_jobs=None)

Linear scaling of persistence diagrams.

A positive scale factor is calculated during fit by considering all available persistence diagrams and homology dimensions. During transform, all birth-death pairs are divided by this factor.

The value of the scale factor depends on two things:

  • A way of computing, for each homology dimension, the amplitude in that dimension of a persistence diagram consisting of birth-death-dimension triples [b, d, q]. Together, metric and metric_params define this in the same way as in Amplitude.

  • A scalar-valued function which is applied to the resulting two-dimensional array of amplitudes.
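The two ingredients above can be sketched with plain NumPy. This is an illustration rather than the library's implementation: `bottleneck_amplitudes` is a hypothetical helper, and it assumes that the bottleneck amplitude of a subdiagram is half of its largest lifetime (the library may normalise differently).

```python
import numpy as np

def bottleneck_amplitudes(X, homology_dimensions):
    """Return an (n_samples, n_dimensions) array of amplitudes.

    Assumption: the amplitude of a subdiagram is half of its largest
    lifetime d - b, one common convention for the bottleneck distance
    to the diagonal diagram.
    """
    amplitudes = np.zeros((X.shape[0], len(homology_dimensions)))
    for j, q in enumerate(homology_dimensions):
        # Lifetimes of the triples in dimension q; other triples count as 0.
        lifetimes = np.where(X[:, :, 2] == q, X[:, :, 1] - X[:, :, 0], 0.0)
        amplitudes[:, j] = lifetimes.max(axis=1) / 2.0
    return amplitudes

# Two diagrams, each with three [b, d, q] triples.
X = np.array([
    [[0.0, 2.0, 0], [0.0, 1.0, 0], [1.0, 4.0, 1]],
    [[0.0, 6.0, 0], [2.0, 3.0, 1], [0.0, 0.0, 1]],
])
homology_dimensions = sorted(set(X[:, :, 2].ravel().tolist()))

# Step 1: one amplitude per diagram and per homology dimension.
amplitudes = bottleneck_amplitudes(X, homology_dimensions)

# Step 2: a scalar-valued function reduces the 2D array to the scale factor.
scale = np.max(amplitudes)
```

With the default function, the scale factor is the largest amplitude found in any diagram and any homology dimension.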

Parameters
metric : 'bottleneck' | 'wasserstein' | 'landscape' | 'betti' | 'heat', optional, default: 'bottleneck'

Distance or dissimilarity function used to define the amplitude of a subdiagram as its distance from the diagonal diagram:

  • 'bottleneck' and 'wasserstein' refer to the identically named perfect-matching–based notions of distance.

  • 'landscape' refers to the \(L^p\) distance between persistence landscapes.

  • 'betti' refers to the \(L^p\) distance between Betti curves.

  • 'heat' refers to the \(L^p\) distance between Gaussian-smoothed diagrams.

metric_params : dict or None, optional, default: None

Additional keyword arguments for the metric function:

  • If metric == 'bottleneck' there are no available arguments.

  • If metric == 'wasserstein' the only argument is p (int, default: 2).

  • If metric == 'betti' the available arguments are p (float, default: 2.) and n_values (int, default: 100).

  • If metric == 'landscape' the available arguments are p (float, default: 2.), n_values (int, default: 100) and n_layers (int, default: 1).

  • If metric == 'heat' the available arguments are p (float, default: 2.), sigma (float, default: 1.) and n_values (int, default: 100).
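The defaults listed above can be collected in a small lookup table. The sketch below shows one plausible way of overlaying user-supplied values on those defaults; `DEFAULT_METRIC_PARAMS` and `resolve_metric_params` are hypothetical names, and the library's own validation may behave differently.

```python
# Defaults transcribed from the list above.
DEFAULT_METRIC_PARAMS = {
    "bottleneck": {},
    "wasserstein": {"p": 2},
    "betti": {"p": 2.0, "n_values": 100},
    "landscape": {"p": 2.0, "n_values": 100, "n_layers": 1},
    "heat": {"p": 2.0, "sigma": 1.0, "n_values": 100},
}

def resolve_metric_params(metric, metric_params=None):
    """Hypothetical helper: overlay user values on the documented defaults."""
    if metric not in DEFAULT_METRIC_PARAMS:
        raise ValueError(f"Unknown metric: {metric!r}")
    defaults = DEFAULT_METRIC_PARAMS[metric]
    user = dict(metric_params or {})
    # Reject arguments that the chosen metric does not accept.
    unknown = set(user) - set(defaults)
    if unknown:
        raise ValueError(
            f"Unsupported arguments for {metric!r}: {sorted(unknown)}"
        )
    return {**defaults, **user}

# Override only sigma; p and n_values keep their documented defaults.
params = resolve_metric_params("heat", {"sigma": 0.5})
```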

function : callable, optional, default: numpy.max

Function used to extract a positive scalar from the collection of amplitude vectors in fit.
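Any callable that maps a two-dimensional array to a positive scalar can play this role. The sketch below compares a few candidates on a made-up amplitude array; the array values and the percentile-based choice are illustrative assumptions, not recommendations from the library.

```python
from functools import partial

import numpy as np

# Made-up amplitudes: rows are diagrams, columns are homology dimensions.
amplitudes = np.array([[1.0, 1.5],
                       [3.0, 0.5]])

# Default choice: the largest amplitude overall.
scale_max = np.max(amplitudes)

# A less outlier-sensitive alternative: the mean amplitude.
scale_mean = np.mean(amplitudes)

# A custom callable built with functools.partial.
percentile_90 = partial(np.percentile, q=90)
scale_p90 = percentile_90(amplitudes)
```

Whichever callable is chosen, it should return a strictly positive value, since every birth and death value is later divided by it.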

n_jobs : int or None, optional, default: None

The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

Attributes
effective_metric_params_ : dict

Dictionary containing all information present in metric_params as well as any relevant quantities computed in fit.

homology_dimensions_ : list

Homology dimensions seen in fit, sorted in ascending order.

scale_ : float

Scale factor computed in fit. transform divides all birth and death values by it, and inverse_transform multiplies by it.

Notes

To compute the scale factor without first splitting the computation between homology dimensions, first transform the data with an instance of ForgetDimension.
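The effect of that preprocessing step can be imitated in NumPy. The sketch below assumes that forgetting dimensions amounts to overwriting every q with one common placeholder value (numpy.inf is used here), so that all triples of a diagram end up in a single subdiagram; `forget_dimension` is a hypothetical stand-in, not the ForgetDimension class itself.

```python
import numpy as np

def forget_dimension(X, placeholder=np.inf):
    """Hypothetical stand-in: give every triple the same dimension label."""
    Xf = X.astype(float, copy=True)
    Xf[:, :, 2] = placeholder
    return Xf

X = np.array([
    [[0.0, 2.0, 0], [1.0, 4.0, 1]],
    [[0.0, 6.0, 0], [2.0, 3.0, 1]],
])
Xf = forget_dimension(X)

# Only one "dimension" remains, so amplitudes form a single column.
remaining_dimensions = np.unique(Xf[:, :, 2])
```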

Methods

fit(self, X[, y])

Store all observed homology dimensions in homology_dimensions_ and compute scale_.

fit_transform(self, X[, y])

Fit to data, then transform it.

get_params(self[, deep])

Get parameters for this estimator.

inverse_transform(self, X[, copy])

Scale back the data to the original representation.

set_params(self, **params)

Set the parameters of this estimator.

transform(self, X[, y])

Divide all birth and death values in X by scale_.

__init__(self, metric='bottleneck', metric_params=None, function=numpy.max, n_jobs=None)

Initialize self. See help(type(self)) for accurate signature.

fit(self, X, y=None)

Store all observed homology dimensions in homology_dimensions_ and compute scale_. Then, return the estimator.

Parameters
X : ndarray, shape (n_samples, n_features, 3)

Input data. Array of persistence diagrams, each a collection of triples [b, d, q] representing persistent topological features through their birth (b), death (d) and homology dimension (q).

y : None

There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns
self : object

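The input layout expected by fit can be illustrated with a small hand-made array. The values below are invented for the example, and treating a zero-lifetime triple (b equal to d) as padding is an assumption of this sketch.

```python
import numpy as np

# Two diagrams (n_samples = 2) with three triples each (n_features = 3).
# The last triple of the second diagram has b == d and carries no
# persistence; it only serves to give both diagrams the same length.
X = np.array([
    [[0.0, 2.0, 0], [0.0, 1.0, 0], [1.0, 4.0, 1]],
    [[0.0, 6.0, 0], [2.0, 3.0, 1], [0.0, 0.0, 1]],
])

# Columns of the last axis: birth, death, homology dimension.
births, deaths, dims = X[:, :, 0], X[:, :, 1], X[:, :, 2]

# Distinct homology dimensions in ascending order.
homology_dimensions = sorted({int(q) for q in dims.ravel()})
```
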
fit_transform(self, X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : mapping of string to any

Parameter names mapped to their values.

inverse_transform(self, X, copy=None)

Scale back the data to the original representation. Multiplies by the scale found in fit.

Parameters
X : ndarray, shape (n_samples, n_features, 3)

Data to apply the inverse transform to.

Returns
Xs : ndarray, shape (n_samples, n_features, 3)

Rescaled diagrams.
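
The relationship between transform and inverse_transform can be checked with a NumPy round trip. `scale_diagrams` and `unscale_diagrams` are hypothetical helpers written for this sketch, and the scale value is made up; only the birth and death columns are touched, while the homology dimension column is left as is.

```python
import numpy as np

def scale_diagrams(X, scale):
    """Divide births and deaths by scale, keeping the q column unchanged."""
    Xs = X.astype(float, copy=True)
    Xs[:, :, :2] /= scale
    return Xs

def unscale_diagrams(Xs, scale):
    """Multiply births and deaths by scale, undoing scale_diagrams."""
    X = Xs.astype(float, copy=True)
    X[:, :, :2] *= scale
    return X

X = np.array([
    [[0.0, 2.0, 0], [1.0, 4.0, 1]],
    [[0.0, 6.0, 0], [2.0, 3.0, 1]],
])
scale = 3.0  # made-up value standing in for scale_

Xs = scale_diagrams(X, scale)
X_back = unscale_diagrams(Xs, scale)
```

Up to floating-point error, `X_back` equals `X`.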

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns
self
transform(self, X, y=None)

Divide all birth and death values in X by scale_.

Parameters
X : ndarray, shape (n_samples, n_features, 3)

Input data. Array of persistence diagrams, each a collection of triples [b, d, q] representing persistent topological features through their birth (b), death (d) and homology dimension (q).

y : None

There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns
Xs : ndarray, shape (n_samples, n_features, 3)

Rescaled diagrams.