giotto.homology.ConsistentRescaling

class giotto.homology.ConsistentRescaling(metric='euclidean', metric_params={}, neighbor_rank=1, n_jobs=None)

Rescaling of distances between pairs of points by the geometric mean of the distances to the respective \(k\)-th nearest neighbours.

Based on ideas in [1]. The computation during transform depends on the nature of the array X. If each entry in X along axis 0 represents a distance matrix \(D\), then the corresponding entry in the transformed array is the distance matrix \(D'_{ij} = D_{ij}/\sqrt{D_{ik_i}D_{jk_j}}\), where \(k_i\) is the index of the \(k\)-th smallest off-diagonal value in row \(i\), that is, the index of the \(k\)-th nearest neighbour of point \(i\) (and similarly for \(j\)), with \(k\) given by neighbor_rank. If the entries in X represent point clouds, their distance matrices are first computed and then rescaled according to the same formula.
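The rescaling formula can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not the library's implementation: the helper name consistent_rescaling is hypothetical, and the points are assumed to be distinct, so that the only zero in each row of \(D\) is the self-distance. The point cloud is the one used in the Examples section.

```python
import numpy as np


def consistent_rescaling(D, neighbor_rank=1):
    """Rescale a square distance matrix D by the geometric mean of the
    distances to each point's neighbor_rank-th nearest neighbour.

    Hypothetical helper for illustration only.
    """
    D = np.asarray(D, dtype=float)
    # Sort each row in ascending order. Column 0 holds the zero
    # self-distance, so column `neighbor_rank` holds the distance to the
    # k-th nearest neighbour (assuming no duplicate points).
    kth = np.sort(D, axis=1)[:, neighbor_rank]
    # D'_ij = D_ij / sqrt(D_{i k_i} * D_{j k_j})
    return D / np.sqrt(np.outer(kth, kth))


# Euclidean distance matrix of a three-point cloud, via broadcasting
points = np.array([[0, 0], [1, 2], [5, 6]], dtype=float)
diff = points[:, None, :] - points[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

D_rescaled = consistent_rescaling(D, neighbor_rank=1)
print(np.round(D_rescaled, 3))
```

Points 0 and 1 are each other's nearest neighbour, so their rescaled distance is exactly 1. The result stays symmetric with a zero diagonal.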

Parameters
metric : string or callable, optional, default: 'euclidean'

If set to 'precomputed', each entry in X along axis 0 is interpreted as a distance matrix. Otherwise, entries are interpreted as feature arrays, and metric determines the rule used to calculate distances between pairs of instances (i.e. rows) in these arrays. If metric is a string, it must be one of the options allowed by scipy.spatial.distance.pdist for its metric parameter, or a metric listed in sklearn.metrics.pairwise.PAIRWISE_DISTANCE_FUNCTIONS, including “euclidean”, “manhattan” or “cosine”. If metric is a callable, it is called on each pair of instances and the resulting value is recorded. The callable should take two arrays from the entry in X as input, and return a value indicating the distance between them.

metric_params : dict, optional, default: {}

Additional keyword arguments for the metric function.

neighbor_rank : int, optional, default: 1

Rank of the neighbors used to modify the metric structure according to the “consistent rescaling” procedure.

n_jobs : int or None, optional, default: None

The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
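The callable form of metric described above can be illustrated with scipy.spatial.distance.pdist, which accepts the same kind of two-argument function. This is only a sketch of the callable contract: the function name max_coordinate_gap is hypothetical, and whether ConsistentRescaling calls pdist internally is not stated here.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def max_coordinate_gap(u, v):
    # Hypothetical custom metric: takes two 1-D arrays (two rows of a
    # point cloud) and returns a single non-negative distance value.
    return float(np.max(np.abs(u - v)))


points = np.array([[0.0, 0.0], [1.0, 2.0], [5.0, 6.0]])

# Pairwise distances with the callable, expanded to a square matrix
D_custom = squareform(pdist(points, metric=max_coordinate_gap))

# The same metric is available by name as 'chebyshev'
D_builtin = squareform(pdist(points, metric="chebyshev"))

print(np.allclose(D_custom, D_builtin))
```

A string metric is usually faster than an equivalent callable, since the callable is invoked once per pair of rows.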

References

[1] T. Berry and T. Sauer, “Consistent manifold representation for topological data analysis”; Foundations of Data Science 1, pp. 1–38, 2019; doi: 10.3934/fods.2019001.

Examples

>>> import numpy as np
>>> from giotto.homology import ConsistentRescaling
>>> X = np.array([[[0, 0], [1, 2], [5, 6]]])
>>> cr = ConsistentRescaling()
>>> X_rescaled = cr.fit_transform(X)
>>> print(X_rescaled.shape)
(1, 3, 3)

Methods

fit(self, X[, y])

Do nothing and return the estimator unchanged.

fit_transform(self, X[, y])

Fit to data, then transform it.

get_params(self[, deep])

Get parameters for this estimator.

set_params(self, **params)

Set the parameters of this estimator.

transform(self, X[, y])

For each entry in the input data array X, find the metric structure after consistent rescaling and encode it as a distance matrix.

__init__(self, metric='euclidean', metric_params={}, neighbor_rank=1, n_jobs=None)

Initialize self. See help(type(self)) for accurate signature.

fit(self, X, y=None)

Do nothing and return the estimator unchanged.

This method exists to implement the usual scikit-learn API, so that the estimator works in pipelines.

Parameters
X : ndarray, shape (n_samples, n_points, n_points) or (n_samples, n_points, n_dimensions)

Input data. If metric == 'precomputed', the input should be an ndarray in which each entry along axis 0 is a distance matrix of shape (n_points, n_points). Otherwise, each such entry will be interpreted as an array of n_points row vectors in n_dimensions-dimensional space.

y : None

There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns
self : object

fit_transform(self, X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : mapping of string to any

Parameter names mapped to their values.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
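The <component>__<parameter> naming can be sketched with a scikit-learn Pipeline. The two scalers are stand-ins chosen because they ship with scikit-learn, and the step names "standardize" and "squash" are hypothetical. The same pattern would apply to a pipeline step holding this transformer.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# A nested object: a pipeline with two named components
pipe = Pipeline([
    ("standardize", StandardScaler()),
    ("squash", MinMaxScaler()),
])

# Update one parameter of one component using <component>__<parameter>;
# set_params returns the estimator itself
returned = pipe.set_params(standardize__with_mean=False)

# Read the value back through the matching get_params key
with_mean = pipe.get_params()["standardize__with_mean"]
print(with_mean)
```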

Returns
self

transform(self, X, y=None)

For each entry in the input data array X, find the metric structure after consistent rescaling and encode it as a distance matrix. Then, arrange all results in a single ndarray of appropriate shape.

Parameters
X : ndarray, shape (n_samples, n_points, n_points) or (n_samples, n_points, n_dimensions)

Input data. If metric == 'precomputed', the input should be an ndarray in which each entry along axis 0 is a distance matrix of shape (n_points, n_points). Otherwise, each such entry will be interpreted as an array of n_points row vectors in n_dimensions-dimensional space.

y : None

There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns
Xt : ndarray, shape (n_samples, n_points, n_points)

Array containing (as entries along axis 0) the distance matrices after consistent rescaling.
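The two input layouts accepted by transform can be sketched with NumPy alone. The sizes and the random data are arbitrary assumptions made for illustration. The second array is the kind of input that would be passed together with metric='precomputed'.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_points, n_dimensions = 4, 10, 3

# Layout 1: a stack of point clouds,
# shape (n_samples, n_points, n_dimensions)
X_clouds = rng.normal(size=(n_samples, n_points, n_dimensions))

# Layout 2: a stack of Euclidean distance matrices,
# shape (n_samples, n_points, n_points), built by broadcasting
diff = X_clouds[:, :, None, :] - X_clouds[:, None, :, :]
X_dist = np.sqrt((diff ** 2).sum(axis=-1))

print(X_clouds.shape)  # (4, 10, 3)
print(X_dist.shape)    # (4, 10, 10)
```

In both cases the output described above has shape (n_samples, n_points, n_points), here (4, 10, 10).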