giotto.diagrams.PairwiseDistance
-
class giotto.diagrams.PairwiseDistance(metric='landscape', metric_params=None, order=2.0, n_jobs=None)
Distances between pairs of persistence diagrams, constructed from the distances between their respective subdiagrams with constant homology dimension.
Given two collections of persistence diagrams consisting of birth-death-dimension triples [b, d, q], a collection of distance matrices or a single distance matrix between pairs of diagrams is calculated according to the following steps:
1. All diagrams are partitioned into subdiagrams corresponding to distinct homology dimensions.
2. Pairwise distances between subdiagrams of equal homology dimension are calculated according to the parameters metric and metric_params. This gives a collection of distance matrices, \(\mathbf{D} = (D_{q_1}, \ldots, D_{q_n})\).
3. The final result is either \(\mathbf{D}\) itself as a three-dimensional array, or a single distance matrix constructed by taking norms of the vectors of distances between diagram pairs.
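The three steps can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper names (split_by_dimension, betti_curve, betti_distance, pairwise_distance) are hypothetical, the Betti-curve distance is approximated by a Riemann sum on a fixed, evenly spaced grid, and the way the library samples filtration values may differ.

```python
from collections import defaultdict


def split_by_dimension(diagram):
    """Step 1: group [b, d, q] triples into subdiagrams keyed by dimension q."""
    subdiagrams = defaultdict(list)
    for birth, death, dim in diagram:
        subdiagrams[int(dim)].append((birth, death))
    return subdiagrams


def betti_curve(points, grid):
    """Count the features alive (birth <= t < death) at each grid value t."""
    return [sum(1 for b, d in points if b <= t < d) for t in grid]


def betti_distance(points_a, points_b, grid, p=2.0):
    """Riemann-sum approximation of the L^p distance between two Betti curves."""
    step = grid[1] - grid[0]  # assumes an evenly spaced grid
    curve_a = betti_curve(points_a, grid)
    curve_b = betti_curve(points_b, grid)
    total = sum(abs(x - y) ** p for x, y in zip(curve_a, curve_b)) * step
    return total ** (1.0 / p)


def pairwise_distance(diagrams_1, diagrams_2, dims, grid, order=2.0):
    """Steps 2 and 3: per-dimension distances, then an optional p-norm."""
    result = []
    for diag_1 in diagrams_1:
        sub_1 = split_by_dimension(diag_1)
        row = []
        for diag_2 in diagrams_2:
            sub_2 = split_by_dimension(diag_2)
            # One distance per homology dimension (step 2).
            vector = [betti_distance(sub_1[q], sub_2[q], grid) for q in dims]
            if order is None:
                row.append(vector)  # keep the per-dimension vector
            else:
                # Collapse the vector with the p-norm, p = order (step 3).
                row.append(sum(abs(v) ** order for v in vector) ** (1.0 / order))
        result.append(row)
    return result


# Two toy diagrams, each with one feature in dimension 0 and one in dimension 1.
diag_a = [[0.0, 2.0, 0], [0.0, 1.0, 1]]
diag_b = [[0.0, 1.0, 0], [0.0, 3.0, 1]]
grid = [0.0, 1.0, 2.0, 3.0]

per_dimension = pairwise_distance([diag_a], [diag_b], [0, 1], grid, order=None)
single_matrix = pairwise_distance([diag_a], [diag_b], [0, 1], grid, order=2.0)
```

With these toy inputs, per_dimension[0][0] is approximately [1.0, 1.414] and single_matrix[0][0] is approximately 1.732, the Euclidean norm of that vector.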
- Parameters
- metric : 'bottleneck' | 'wasserstein' | 'landscape' | 'betti' | 'heat', optional, default: 'landscape'
Distance or dissimilarity function between subdiagrams:
- 'bottleneck' and 'wasserstein' refer to the identically named perfect-matching–based notions of distance.
- 'landscape' refers to the \(L^p\) distance between persistence landscapes.
- 'betti' refers to the \(L^p\) distance between Betti curves.
- 'heat' refers to the \(L^p\) distance between Gaussian-smoothed diagrams.
- metric_params : dict or None, optional, default: None
Additional keyword arguments for the metric function:
- If metric == 'bottleneck', the only argument is delta (float, default: 0.01). When equal to 0., an exact algorithm is used; otherwise, a faster approximate algorithm is used.
- If metric == 'wasserstein', the available arguments are p (int, default: 2) and delta (float, default: 0.01). Unlike the case of 'bottleneck', delta cannot be set to 0. and an exact algorithm is not available.
- If metric == 'betti', the available arguments are p (float, default: 2.) and n_values (int, default: 100).
- If metric == 'landscape', the available arguments are p (float, default: 2.), n_values (int, default: 100) and n_layers (int, default: 1).
- If metric == 'heat', the available arguments are p (float, default: 2.), sigma (float, default: 1.) and n_values (int, default: 100).
- order : float or None, optional, default: 2.
If None, transform returns for each pair of diagrams a vector of distances corresponding to the dimensions in homology_dimensions_. Otherwise, the \(p\)-norm of these vectors with \(p\) equal to order is taken.
- n_jobs : int or None, optional, default: None
The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
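As a compact summary of the metric_params options, the documented defaults can be written as a plain dictionary. The helper merge_metric_params below is hypothetical and only illustrates one way to overlay user values on those defaults; how the library itself merges or validates metric_params is not described in this section.

```python
# Defaults copied from the metric_params description; not read from the library.
DEFAULT_METRIC_PARAMS = {
    "bottleneck": {"delta": 0.01},
    "wasserstein": {"p": 2, "delta": 0.01},
    "betti": {"p": 2.0, "n_values": 100},
    "landscape": {"p": 2.0, "n_values": 100, "n_layers": 1},
    "heat": {"p": 2.0, "sigma": 1.0, "n_values": 100},
}


def merge_metric_params(metric, metric_params=None):
    """Hypothetical helper: overlay user-supplied values on the documented defaults."""
    if metric not in DEFAULT_METRIC_PARAMS:
        raise ValueError(f"unknown metric: {metric!r}")
    defaults = DEFAULT_METRIC_PARAMS[metric]
    params = dict(metric_params or {})
    unknown = set(params) - set(defaults)
    if unknown:
        raise ValueError(f"unexpected arguments for {metric!r}: {sorted(unknown)}")
    # The description states that delta cannot be 0. for 'wasserstein'.
    if metric == "wasserstein" and params.get("delta", defaults["delta"]) == 0.0:
        raise ValueError("delta cannot be 0. for 'wasserstein'")
    return {**defaults, **params}


heat_params = merge_metric_params("heat", {"sigma": 0.5})

# Possible use with the estimator (not executed here):
# PairwiseDistance(metric="heat", metric_params=heat_params, order=None)
```

Here heat_params contains all three 'heat' arguments, with sigma overridden to 0.5 and the remaining values left at their documented defaults.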
- Attributes
See also
Notes
To compute distances without first splitting the computation between different homology dimensions, data should be first transformed by an instance of ForgetDimension.
Hera is used as a C++ backend for computing bottleneck and Wasserstein distances between persistence diagrams. Python bindings were modified for performance from the Dionysus 2 package.
Methods
- fit(self, X[, y]): Store all observed homology dimensions in homology_dimensions_ and compute effective_metric_params.
- fit_transform(self, X[, y]): Fit to data, then transform it.
- get_params(self[, deep]): Get parameters for this estimator.
- set_params(self, **params): Set the parameters of this estimator.
- transform(self, X[, y]): Computes a distance or vector of distances between the diagrams in X and the diagrams seen in fit.
-
__init__(self, metric='landscape', metric_params=None, order=2.0, n_jobs=None)
Initialize self. See help(type(self)) for accurate signature.
-
fit(self, X, y=None)
Store all observed homology dimensions in homology_dimensions_ and compute effective_metric_params. Then, return the estimator.
This method is there to implement the usual scikit-learn API and hence work in pipelines.
- Parameters
- X : ndarray, shape (n_samples_fit, n_features, 3)
Input data. Array of persistence diagrams, each a collection of triples [b, d, q] representing persistent topological features through their birth (b), death (d) and homology dimension (q).
- y : None
There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- Returns
- self : object
-
fit_transform(self, X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X : numpy array of shape [n_samples, n_features]
Training set.
- y : numpy array of shape [n_samples]
Target values.
- Returns
- X_new : numpy array of shape [n_samples, n_features_new]
Transformed array.
-
get_params(self, deep=True)
Get parameters for this estimator.
- Parameters
- deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params : mapping of string to any
Parameter names mapped to their values.
-
set_params(self, **params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Returns
- self
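The <component>__<parameter> naming convention can be illustrated with a small pure-Python sketch. This is not scikit-learn's implementation; split_nested_params is a hypothetical helper and "distance" is an assumed name for a pipeline step holding a PairwiseDistance instance.

```python
def split_nested_params(params):
    """Group '<component>__<parameter>' keys by component; plain keys go under ''."""
    grouped = {}
    for key, value in params.items():
        # Split on the first double underscore only.
        component, delimiter, sub_key = key.partition("__")
        if delimiter:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped


nested = split_nested_params(
    {"order": None, "distance__metric": "betti", "distance__n_jobs": -1}
)
```

In this sketch, nested maps "" to {"order": None} and "distance" to {"metric": "betti", "n_jobs": -1}, which mirrors how a nested object routes each parameter to the component it belongs to.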
-
transform(self, X, y=None)
Computes a distance or vector of distances between the diagrams in X and the diagrams seen in fit.
- Parameters
- X : ndarray, shape (n_samples, n_features, 3)
Input data. Array of persistence diagrams, each a collection of triples [b, d, q] representing persistent topological features through their birth (b), death (d) and homology dimension (q).
- y : None
There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- Returns
- Xt : ndarray, shape (n_samples_fit, n_samples, n_homology_dimensions) if order is None, else (n_samples_fit, n_samples)
Distance matrix or collection of distance matrices between diagrams in X and diagrams seen in fit. When a collection is returned (order is None), index i along axis 2 corresponds to the i-th homology dimension in homology_dimensions_.