
Manifold Hypothesis


What is the Manifold Hypothesis?

The Manifold Hypothesis states that real-world high-dimensional data lie on low-dimensional manifolds embedded within the high-dimensional space.

This hypothesis is best explained with an example.

Let's tackle the "embedded manifold" bit first, before we get to how it applies to machine learning and data.

A manifold is really just a technical term used to classify spaces of arbitrary dimension. For every whole number there exists a flat space called Euclidean space that behaves very much like the Cartesian plane: the Pythagorean theorem holds, and thus the shortest distance between points is a straight line (in contrast, this is not true on a circle or sphere). The dimension of a Euclidean space is essentially the number of independent degrees of freedom, i.e. the number of orthogonal directions in which one can "move" inside the space. A line has one such dimension, an infinite plane has two, an infinite volume has three, and so on. A manifold is essentially a generalization of Euclidean space: small regions of it look approximately like Euclidean space, but the space as a whole need not share the global properties of Euclidean space. This framework allows mathematicians and other quantitatively minded scientists to describe spaces like spheres, tori (donut-shaped spaces), and Möbius bands in a precise way, and it even allows a whole plethora of mathematical machinery, including calculus, to be used in a meaningful way. The upshot is that the class of spaces on which calculus makes sense is expanded to include spaces that may be curved in arbitrary ways, or even have holes, like the torus.

So now we take this idea and apply it to high-dimensional data. Imagine we are interested in classifying all black-and-white images with m×n pixels. Each pixel has a numerical value, and each can vary depending on what the image is, which could correspond to anything from an award-winning photo to meaningless noise. The point is that we have m×n degrees of freedom, so we can treat an image of m×n pixels as a single point living in a space (manifold) of dimension N = mn. Now, imagine the set of all m×n images that are photos of Einstein. Clearly we now have some restriction on the choice of values for the pixels if we want the images to be photos of Einstein rather than something else; obviously random choices will not generate such images. Therefore, we expect there to be far less freedom of choice and hence:

The manifold hypothesis states that this subset should actually live on a manifold of dimension much, much smaller than N, embedded in the ambient N-dimensional space.

Why Is This Hypothesis Important in Artificial Intelligence?

The Manifold Hypothesis explains (heuristically) why machine learning techniques are able to find useful features and produce accurate predictions from datasets that have a potentially large number of dimensions (variables). The fact that the data of interest actually live in a space of low dimension means that a given machine learning model only needs to learn to focus on a few key features of the dataset to make decisions. However, these key features may turn out to be complicated functions of the original variables. Many of the algorithms behind machine learning techniques focus on ways to determine these (embedding) functions.
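A small, purely illustrative Python sketch of this idea (not part of the original glossary entry), using scikit-learn's 8×8 digits dataset: each image is a point in a 64-dimensional ambient space, yet a handful of directions already capture most of the variation. PCA only measures linear structure, so this is just a crude proxy for the non-linear picture the hypothesis describes.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # each 8x8 image is a point in R^64
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
# How many linear directions are needed to capture 95% of the variance?
# Typically far fewer than the 64 ambient dimensions.
print(int(np.searchsorted(cumulative, 0.95)) + 1)
```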

MIT has an excellent paper on testing the hypothesis. We also recommend checking out Colah’s blog.


2.2. Manifold learning #


Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high.

2.2.1. Introduction #

High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show the inherent structure of the data, equivalent high-dimensional plots are much less intuitive. To aid visualization of the structure of a dataset, the dimension must be reduced in some way.

The simplest way to accomplish this dimensionality reduction is by taking a random projection of the data. Though this allows some degree of visualization of the data structure, the randomness of the choice leaves much to be desired. In a random projection, it is likely that the more interesting structure within the data will be lost.


To address this concern, a number of supervised and unsupervised linear dimensionality reduction frameworks have been designed, such as Principal Component Analysis (PCA), Independent Component Analysis, Linear Discriminant Analysis, and others. These algorithms define specific rubrics to choose an “interesting” linear projection of the data. These methods can be powerful, but often miss important non-linear structure in the data.


Manifold Learning can be thought of as an attempt to generalize linear frameworks like PCA to be sensitive to non-linear structure in data. Though supervised variants exist, the typical manifold learning problem is unsupervised: it learns the high-dimensional structure of the data from the data itself, without the use of predetermined classifications.

See Manifold learning on handwritten digits: Locally Linear Embedding, Isomap… for an example of dimensionality reduction on handwritten digits.

See Comparison of Manifold Learning methods for an example of dimensionality reduction on a toy “S-curve” dataset.

The manifold learning implementations available in scikit-learn are summarized below.

2.2.2. Isomap #

One of the earliest approaches to manifold learning is the Isomap algorithm, short for Isometric Mapping. Isomap can be viewed as an extension of Multi-dimensional Scaling (MDS) or Kernel PCA. Isomap seeks a lower-dimensional embedding which maintains geodesic distances between all points. Isomap can be performed with the object Isomap .
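As a minimal sketch (the parameter values here are illustrative, not tuned), Isomap can be applied to the toy S-curve dataset mentioned above:

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

X, color = make_s_curve(n_samples=1000, random_state=0)
# Embed into 2 dimensions while approximately preserving geodesic distances.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(X_iso.shape)  # (1000, 2)
```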


The Isomap algorithm comprises three stages:

Nearest neighbor search. Isomap uses BallTree for efficient neighbor search. The cost is approximately \(O[D \log(k) N \log(N)]\) , for \(k\) nearest neighbors of \(N\) points in \(D\) dimensions.

Shortest-path graph search. The most efficient known algorithms for this are Dijkstra’s Algorithm , which is approximately \(O[N^2(k + \log(N))]\) , or the Floyd-Warshall algorithm , which is \(O[N^3]\) . The algorithm can be selected by the user with the path_method keyword of Isomap . If unspecified, the code attempts to choose the best algorithm for the input data.

Partial eigenvalue decomposition. The embedding is encoded in the eigenvectors corresponding to the \(d\) largest eigenvalues of the \(N \times N\) isomap kernel. For a dense solver, the cost is approximately \(O[d N^2]\) . This cost can often be improved using the ARPACK solver. The eigensolver can be specified by the user with the eigen_solver keyword of Isomap . If unspecified, the code attempts to choose the best algorithm for the input data.

The overall complexity of Isomap is \(O[D \log(k) N \log(N)] + O[N^2(k + \log(N))] + O[d N^2]\) .

\(N\) : number of training data points

\(D\) : input dimension

\(k\) : number of nearest neighbors

\(d\) : output dimension

“A global geometric framework for nonlinear dimensionality reduction” Tenenbaum, J.B.; De Silva, V.; & Langford, J.C. Science 290 (5500)

2.2.3. Locally Linear Embedding #

Locally linear embedding (LLE) seeks a lower-dimensional projection of the data which preserves distances within local neighborhoods. It can be thought of as a series of local Principal Component Analyses which are globally compared to find the best non-linear embedding.

Locally linear embedding can be performed with function locally_linear_embedding or its object-oriented counterpart LocallyLinearEmbedding .
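For instance, a minimal usage sketch (illustrative parameters) on the toy swiss-roll dataset:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method='standard')
X_lle = lle.fit_transform(X)
print(X_lle.shape, lle.reconstruction_error_)  # embedding plus its reconstruction error
```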


The standard LLE algorithm comprises three stages:

Nearest Neighbors Search . See discussion under Isomap above.

Weight Matrix Construction . \(O[D N k^3]\) . The construction of the LLE weight matrix involves the solution of a \(k \times k\) linear equation for each of the \(N\) local neighborhoods.

Partial Eigenvalue Decomposition . See discussion under Isomap above.

The overall complexity of standard LLE is \(O[D \log(k) N \log(N)] + O[D N k^3] + O[d N^2]\) .

“Nonlinear dimensionality reduction by locally linear embedding” Roweis, S. & Saul, L. Science 290:2323 (2000)

2.2.4. Modified Locally Linear Embedding #

One well-known issue with LLE is the regularization problem. When the number of neighbors is greater than the number of input dimensions, the matrix defining each local neighborhood is rank-deficient. To address this, standard LLE applies an arbitrary regularization parameter \(r\) , which is chosen relative to the trace of the local weight matrix. Though it can be shown formally that as \(r \to 0\) , the solution converges to the desired embedding, there is no guarantee that the optimal solution will be found for \(r > 0\) . This problem manifests itself in embeddings which distort the underlying geometry of the manifold.

One method to address the regularization problem is to use multiple weight vectors in each neighborhood. This is the essence of modified locally linear embedding (MLLE). MLLE can be performed with function locally_linear_embedding or its object-oriented counterpart LocallyLinearEmbedding , with the keyword method = 'modified' . It requires n_neighbors > n_components .
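A minimal sketch of the modified variant (illustrative parameters, satisfying n_neighbors > n_components):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
mlle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method='modified')
X_mlle = mlle.fit_transform(X)
print(X_mlle.shape)
```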


The MLLE algorithm comprises three stages:

Nearest Neighbors Search . Same as standard LLE

Weight Matrix Construction . Approximately \(O[D N k^3] + O[N (k-D) k^2]\) . The first term is exactly equivalent to that of standard LLE. The second term has to do with constructing the weight matrix from multiple weights. In practice, the added cost of constructing the MLLE weight matrix is relatively small compared to the cost of stages 1 and 3.

Partial Eigenvalue Decomposition . Same as standard LLE

The overall complexity of MLLE is \(O[D \log(k) N \log(N)] + O[D N k^3] + O[N (k-D) k^2] + O[d N^2]\) .

“MLLE: Modified Locally Linear Embedding Using Multiple Weights” Zhang, Z. & Wang, J.

2.2.5. Hessian Eigenmapping #

Hessian Eigenmapping (also known as Hessian-based LLE: HLLE) is another method of solving the regularization problem of LLE. It revolves around a hessian-based quadratic form at each neighborhood which is used to recover the locally linear structure. Though other implementations note its poor scaling with data size, sklearn implements some algorithmic improvements which make its cost comparable to that of other LLE variants for small output dimension. HLLE can be performed with function locally_linear_embedding or its object-oriented counterpart LocallyLinearEmbedding , with the keyword method = 'hessian' . It requires n_neighbors > n_components * (n_components + 3) / 2 .
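A minimal sketch (illustrative parameters, satisfying n_neighbors > n_components * (n_components + 3) / 2):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
hlle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, method='hessian')
X_hlle = hlle.fit_transform(X)
print(X_hlle.shape)
```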


The HLLE algorithm comprises three stages:

Nearest Neighbors Search . Same as standard LLE

Weight Matrix Construction . Approximately \(O[D N k^3] + O[N d^6]\) . The first term reflects a similar cost to that of standard LLE. The second term comes from a QR decomposition of the local Hessian estimator.

Partial Eigenvalue Decomposition . Same as standard LLE

The overall complexity of standard HLLE is \(O[D \log(k) N \log(N)] + O[D N k^3] + O[N d^6] + O[d N^2]\) .

“Hessian Eigenmaps: Locally linear embedding techniques for high-dimensional data” Donoho, D. & Grimes, C. Proc Natl Acad Sci USA. 100:5591 (2003)

2.2.6. Spectral Embedding #

Spectral Embedding is an approach to calculating a non-linear embedding. Scikit-learn implements Laplacian Eigenmaps, which finds a low dimensional representation of the data using a spectral decomposition of the graph Laplacian. The graph generated can be considered as a discrete approximation of the low dimensional manifold in the high dimensional space. Minimization of a cost function based on the graph ensures that points close to each other on the manifold are mapped close to each other in the low dimensional space, preserving local distances. Spectral embedding can be performed with the function spectral_embedding or its object-oriented counterpart SpectralEmbedding .
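A minimal sketch (illustrative parameters):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
# Laplacian Eigenmaps on a nearest-neighbors affinity graph.
X_se = SpectralEmbedding(n_components=2, n_neighbors=10, random_state=0).fit_transform(X)
print(X_se.shape)
```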

The Spectral Embedding (Laplacian Eigenmaps) algorithm comprises three stages:

Weighted Graph Construction . Transform the raw input data into a graph representation using an affinity (adjacency) matrix.

Graph Laplacian Construction . The unnormalized graph Laplacian is constructed as \(L = D - A\) and the normalized one as \(L = D^{-\frac{1}{2}} (D - A) D^{-\frac{1}{2}}\) .

Partial Eigenvalue Decomposition . Eigenvalue decomposition is performed on the graph Laplacian.

The overall complexity of spectral embedding is \(O[D \log(k) N \log(N)] + O[D N k^3] + O[d N^2]\) .

“Laplacian Eigenmaps for Dimensionality Reduction and Data Representation” M. Belkin, P. Niyogi, Neural Computation, June 2003; 15 (6):1373-1396

2.2.7. Local Tangent Space Alignment #

Though not technically a variant of LLE, Local tangent space alignment (LTSA) is algorithmically similar enough to LLE that it can be put in this category. Rather than focusing on preserving neighborhood distances as in LLE, LTSA seeks to characterize the local geometry at each neighborhood via its tangent space, and performs a global optimization to align these local tangent spaces to learn the embedding. LTSA can be performed with function locally_linear_embedding or its object-oriented counterpart LocallyLinearEmbedding , with the keyword method = 'ltsa' .
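A minimal sketch (illustrative parameters):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
ltsa = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method='ltsa')
X_ltsa = ltsa.fit_transform(X)
print(X_ltsa.shape)
```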


The LTSA algorithm comprises three stages:

Nearest Neighbors Search . Same as standard LLE

Weight Matrix Construction . Approximately \(O[D N k^3] + O[k^2 d]\) . The first term reflects a similar cost to that of standard LLE.

Partial Eigenvalue Decomposition . Same as standard LLE

The overall complexity of standard LTSA is \(O[D \log(k) N \log(N)] + O[D N k^3] + O[k^2 d] + O[d N^2]\) .

“Principal manifolds and nonlinear dimensionality reduction via tangent space alignment” Zhang, Z. & Zha, H. Journal of Shanghai Univ. 8:406 (2004)

2.2.8. Multi-dimensional Scaling (MDS) #

Multidimensional scaling ( MDS ) seeks a low-dimensional representation of the data in which the distances respect well the distances in the original high-dimensional space.

In general, MDS is a technique used for analyzing similarity or dissimilarity data. It attempts to model similarity or dissimilarity data as distances in a geometric space. The data can be ratings of similarity between objects, interaction frequencies of molecules, or trade indices between countries.

There exist two types of MDS algorithms: metric and non-metric. In scikit-learn, the class MDS implements both. In metric MDS, the input similarity matrix arises from a metric (and thus respects the triangular inequality), and the distances between two output points are then set to be as close as possible to the similarity or dissimilarity data. In the non-metric version, the algorithm will try to preserve the order of the distances, and hence seek a monotonic relationship between the distances in the embedded space and the similarities/dissimilarities.
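A minimal sketch contrasting the two variants on a precomputed dissimilarity matrix (illustrative parameters; the normalized_stress keyword assumes a reasonably recent scikit-learn):

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

X, _ = make_s_curve(n_samples=300, random_state=0)
D = pairwise_distances(X)  # dissimilarities between input points

X_metric = MDS(n_components=2, dissimilarity='precomputed',
               random_state=0).fit_transform(D)
X_nonmetric = MDS(n_components=2, metric=False, dissimilarity='precomputed',
                  normalized_stress=True, random_state=0).fit_transform(D)
print(X_metric.shape, X_nonmetric.shape)
```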


Let \(S\) be the similarity matrix, and \(X\) the coordinates of the \(n\) input points. Disparities \(\hat{d}_{ij}\) are a transformation of the similarities, chosen in some optimal way. The objective, called the stress, is then defined by \(\sum_{i < j} (d_{ij}(X) - \hat{d}_{ij})^2\) .

In the simplest metric MDS model, called absolute MDS , disparities are defined by \(\hat{d}_{ij} = S_{ij}\) . With absolute MDS, the value \(S_{ij}\) should then correspond exactly to the distance between points \(i\) and \(j\) in the embedding space.

Most commonly, disparities are set to \(\hat{d}_{ij} = b S_{ij}\) .

Non metric MDS focuses on the ordination of the data. If \(S_{ij} > S_{jk}\) , then the embedding should enforce \(d_{ij} < d_{jk}\) . For this reason, we discuss it in terms of dissimilarities ( \(\delta_{ij}\) ) instead of similarities ( \(S_{ij}\) ). Note that dissimilarities can easily be obtained from similarities through a simple transform, e.g. \(\delta_{ij}=c_1-c_2 S_{ij}\) for some real constants \(c_1, c_2\) . A simple algorithm to enforce proper ordination is to use a monotonic regression of \(d_{ij}\) on \(\delta_{ij}\) , yielding disparities \(\hat{d}_{ij}\) in the same order as \(\delta_{ij}\) .

A trivial solution to this problem is to set all the points at the origin. In order to avoid that, the disparities \(\hat{d}_{ij}\) are normalized. Note that since we only care about relative ordering, our objective should be invariant to simple translation and scaling; however, the stress used in metric MDS is sensitive to scaling. To address this, non-metric MDS may use a normalized stress, known as Stress-1, defined as

\[\sqrt{\frac{\sum_{i < j} (d_{ij}(X) - \hat{d}_{ij})^2}{\sum_{i < j} d_{ij}(X)^2}}\]

The use of normalized Stress-1 can be enabled by setting normalized_stress=True , however it is only compatible with the non-metric MDS problem and will be ignored in the metric case.


“Modern Multidimensional Scaling - Theory and Applications” Borg, I.; Groenen P. Springer Series in Statistics (1997)

“Nonmetric multidimensional scaling: a numerical method” Kruskal, J. Psychometrika, 29 (1964)

“Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis” Kruskal, J. Psychometrika, 29, (1964)

2.2.9. t-distributed Stochastic Neighbor Embedding (t-SNE) #

t-SNE ( TSNE ) converts affinities of data points to probabilities. The affinities in the original space are represented by Gaussian joint probabilities and the affinities in the embedded space are represented by Student’s t-distributions. This allows t-SNE to be particularly sensitive to local structure and has a few other advantages over existing techniques:

Revealing the structure at many scales on a single map

Revealing data that lie in multiple, different, manifolds or clusters

Reducing the tendency to crowd points together at the center

While Isomap, LLE and variants are best suited to unfold a single continuous low dimensional manifold, t-SNE will focus on the local structure of the data and will tend to extract clustered local groups of samples as highlighted on the S-curve example. This ability to group samples based on the local structure might be beneficial to visually disentangle a dataset that comprises several manifolds at once as is the case in the digits dataset.

The Kullback-Leibler (KL) divergence of the joint probabilities in the original space and the embedded space will be minimized by gradient descent. Note that the KL divergence is not convex, i.e. multiple restarts with different initializations will end up in local minima of the KL divergence. Hence, it is sometimes useful to try different seeds and select the embedding with the lowest KL divergence.

The disadvantages to using t-SNE are roughly:

t-SNE is computationally expensive, and can take several hours on million-sample datasets where PCA will finish in seconds or minutes

The Barnes-Hut t-SNE method is limited to two or three dimensional embeddings.

The algorithm is stochastic and multiple restarts with different seeds can yield different embeddings. However, it is perfectly legitimate to pick the embedding with the least error.

Global structure is not explicitly preserved. This problem is mitigated by initializing points with PCA (using init='pca' ).


The main purpose of t-SNE is visualization of high-dimensional data. Hence, it works best when the data will be embedded on two or three dimensions.

Optimizing the KL divergence can be a little bit tricky sometimes. There are five parameters that control the optimization of t-SNE and therefore possibly the quality of the resulting embedding:

perplexity

early exaggeration factor

learning rate

maximum number of iterations

angle (not used in the exact method)

The perplexity is defined as \(k=2^{(S)}\) where \(S\) is the Shannon entropy of the conditional probability distribution. The perplexity of a \(k\) -sided die is \(k\) , so that \(k\) is effectively the number of nearest neighbors t-SNE considers when generating the conditional probabilities. Larger perplexities lead to more nearest neighbors and are less sensitive to small structure. Conversely, a lower perplexity considers a smaller number of neighbors, and thus ignores more global information in favour of the local neighborhood. As dataset sizes get larger more points will be required to get a reasonable sample of the local neighborhood, and hence larger perplexities may be required. Similarly noisier datasets will require larger perplexity values to encompass enough local neighbors to see beyond the background noise.

The maximum number of iterations is usually high enough and does not need any tuning. The optimization consists of two phases: the early exaggeration phase and the final optimization. During early exaggeration the joint probabilities in the original space will be artificially increased by multiplication with a given factor. Larger factors result in larger gaps between natural clusters in the data. If the factor is too high, the KL divergence could increase during this phase. Usually it does not have to be tuned. A critical parameter is the learning rate. If it is too low gradient descent will get stuck in a bad local minimum. If it is too high the KL divergence will increase during optimization. A heuristic suggested in Belkina et al. (2019) is to set the learning rate to the sample size divided by the early exaggeration factor. We implement this heuristic as learning_rate='auto' argument. More tips can be found in Laurens van der Maaten’s FAQ (see references). The last parameter, angle, is a tradeoff between performance and accuracy. Larger angles imply that we can approximate larger regions by a single point, leading to better speed but less accurate results.
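The sketch below shows where these parameters appear in practice; the values are illustrative, and learning_rate='auto' and init='pca' assume a reasonably recent scikit-learn.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
tsne = TSNE(n_components=2, perplexity=30, early_exaggeration=12,
            learning_rate='auto', init='pca', random_state=0)
X_tsne = tsne.fit_transform(X)
# When comparing several seeds, prefer the run with the lowest KL divergence.
print(tsne.kl_divergence_)
```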

“How to Use t-SNE Effectively” provides a good discussion of the effects of the various parameters, as well as interactive plots to explore the effects of different parameters.

The Barnes-Hut t-SNE that has been implemented here is usually much slower than other manifold learning algorithms. The optimization is quite difficult and the computation of the gradient is \(O[d N \log(N)]\) , where \(d\) is the number of output dimensions and \(N\) is the number of samples. The Barnes-Hut method improves on the exact method where t-SNE complexity is \(O[d N^2]\) , but has several other notable differences:

The Barnes-Hut implementation only works when the target dimensionality is 3 or less. The 2D case is typical when building visualizations.

Barnes-Hut only works with dense input data. Sparse data matrices can only be embedded with the exact method, or they can be approximated by a dense low-rank projection, for instance using PCA.

Barnes-Hut is an approximation of the exact method. The approximation is parameterized with the angle parameter, therefore the angle parameter is unused when method=”exact”

Barnes-Hut is significantly more scalable. Barnes-Hut can be used to embed hundreds of thousands of data points, while the exact method can handle thousands of samples before becoming computationally intractable.

For visualization purposes (which is the main use case of t-SNE), using the Barnes-Hut method is strongly recommended. The exact t-SNE method is useful for checking the theoretical properties of the embedding, possibly in higher dimensional space, but is limited to small datasets due to computational constraints.

Also note that the digits labels roughly match the natural grouping found by t-SNE while the linear 2D projection of the PCA model yields a representation where label regions largely overlap. This is a strong clue that this data can be well separated by non linear methods that focus on the local structure (e.g. an SVM with a Gaussian RBF kernel). However, failing to visualize well separated homogeneously labeled groups with t-SNE in 2D does not necessarily imply that the data cannot be correctly classified by a supervised model. It might be the case that 2 dimensions are not high enough to accurately represent the internal structure of the data.

“Visualizing High-Dimensional Data Using t-SNE” van der Maaten, L.J.P.; Hinton, G. Journal of Machine Learning Research (2008)

“t-Distributed Stochastic Neighbor Embedding” van der Maaten, L.J.P.

“Accelerating t-SNE using Tree-Based Algorithms” van der Maaten, L.J.P.; Journal of Machine Learning Research 15(Oct):3221-3245, 2014.

“Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets” Belkina, A.C., Ciccolella, C.O., Anno, R., Halpert, R., Spidlen, J., Snyder-Cappione, J.E., Nature Communications 10, 5415 (2019).

2.2.10. Tips on practical use #

Make sure the same scale is used over all features. Because manifold learning methods are based on a nearest-neighbor search, the algorithm may perform poorly otherwise. See StandardScaler for convenient ways of scaling heterogeneous data.
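For example, a minimal sketch of scaling followed by a manifold method in a single pipeline (the choice of estimator and parameters is illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)
pipe = make_pipeline(StandardScaler(), Isomap(n_neighbors=10, n_components=2))
X_embedded = pipe.fit_transform(X)  # features are scaled before the neighbor search
print(X_embedded.shape)
```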

The reconstruction error computed by each routine can be used to choose the optimal output dimension. For a \(d\) -dimensional manifold embedded in a \(D\) -dimensional parameter space, the reconstruction error will decrease as n_components is increased until n_components == d .
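A minimal sketch of this procedure with LLE, whose fitted estimator exposes a reconstruction_error_ attribute (parameter values are illustrative):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
for d in range(1, 5):
    lle = LocallyLinearEmbedding(n_neighbors=12, n_components=d).fit(X)
    # The error should stop decreasing noticeably once n_components reaches
    # the intrinsic dimension of the manifold (here 2).
    print(d, lle.reconstruction_error_)
```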

Note that noisy data can “short-circuit” the manifold, in essence acting as a bridge between parts of the manifold that would otherwise be well-separated. Manifold learning on noisy and/or incomplete data is an active area of research.

Certain input configurations can lead to singular weight matrices, for example when more than two points in the dataset are identical, or when the data is split into disjointed groups. In this case, solver='arpack' will fail to find the null space. The easiest way to address this is to use solver='dense' which will work on a singular matrix, though it may be very slow depending on the number of input points. Alternatively, one can attempt to understand the source of the singularity: if it is due to disjoint sets, increasing n_neighbors may help. If it is due to identical points in the dataset, removing these points may help.
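In code, the corresponding constructor keyword in LocallyLinearEmbedding (and Isomap) is eigen_solver; a minimal sketch:

```python
from sklearn.manifold import LocallyLinearEmbedding

# Fall back to the dense eigensolver when ARPACK fails on a (near-)singular
# weight matrix; slower, but tolerant of the singularity.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, eigen_solver='dense')
```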

Totally Random Trees Embedding can also be useful to derive non-linear representations of feature space, though it does not perform dimensionality reduction.

Manifold-Learning

Introduction to manifold learning - mathematical theory and applied python examples (Multidimensional Scaling, Isomap, Locally Linear Embedding, Spectral Embedding/Laplacian Eigenmaps)

Manifold Learning: Introduction and Foundational Algorithms

Mathematical theory with examples and applications in python.

  • Overview of manifolds and the basic topology of data
  • Statistical learning and intrinsic dimensionality
  • The manifold hypothesis
  • Classical, metric, and non-metric MDS algorithms
  • Example applications to quantitative psychology and social science
  • Geodesic distances and the isometric mapping algorithm
  • Implementation details and applications with facial images and coil-100 object images
  • Locally linear reconstructions and optimization problems
  • Example applications with image data
  • From the general to the discrete Laplacian operators
  • Visualizing spectral embedding with the networkx library
  • Spectral embedding with NLTK and the Brown text corpus

Neural Networks, Manifolds, and Topology

Posted on April 6, 2014

Recently, there’s been a great deal of excitement and interest in deep neural networks because they’ve achieved breakthrough results in areas such as computer vision. 1

However, there remain a number of concerns about them. One is that it can be quite challenging to understand what a neural network is really doing. If one trains it well, it achieves high quality results, but it is challenging to understand how it is doing so. If the network fails, it is hard to understand what went wrong.

While it is challenging to understand the behavior of deep neural networks in general, it turns out to be much easier to explore low-dimensional deep neural networks – networks that only have a few neurons in each layer. In fact, we can create visualizations to completely understand the behavior and training of such networks. This perspective will allow us to gain deeper intuition about the behavior of neural networks and observe a connection linking neural networks to an area of mathematics called topology.

A number of interesting things follow from this, including fundamental lower-bounds on the complexity of a neural network capable of classifying certain datasets.

A Simple Example

Let’s begin with a very simple dataset, two curves on a plane. The network will learn to classify points as belonging to one or the other.

The obvious way to visualize the behavior of a neural network – or any classification algorithm, for that matter – is to simply look at how it classifies every possible data point.

We’ll start with the simplest possible class of neural network, one with only an input layer and an output layer. Such a network simply tries to separate the two classes of data by dividing them with a line.

That sort of network isn’t very interesting. Modern neural networks generally have multiple layers between their input and output, called “hidden” layers. At the very least, they have one.

As before, we can visualize the behavior of this network by looking at what it does to different points in its domain. It separates the data with a more complicated curve than a line.

With each layer, the network transforms the data, creating a new representation . 2 We can look at the data in each of these representations and how the network classifies them. When we get to the final representation, the network will just draw a line through the data (or, in higher dimensions, a hyperplane).

In the previous visualization, we looked at the data in its “raw” representation. You can think of that as us looking at the input layer. Now we will look at it after it is transformed by the first layer. You can think of this as us looking at the hidden layer.

Each dimension corresponds to the firing of a neuron in the layer.

Continuous Visualization of Layers

In the approach outlined in the previous section, we learn to understand networks by looking at the representation corresponding to each layer. This gives us a discrete list of representations.

The tricky part is in understanding how we go from one to another. Thankfully, neural network layers have nice properties that make this very easy.

There are a variety of different kinds of layers used in neural networks. We will talk about tanh layers for a concrete example. A tanh layer \(\tanh(Wx+b)\) consists of:

  • A linear transformation by the “weight” matrix \(W\)
  • A translation by the vector \(b\)
  • Point-wise application of tanh.

We can visualize this as a continuous transformation, as follows:

The story is much the same for other standard layers, consisting of an affine transformation followed by pointwise application of a monotone activation function.
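As a concrete, illustrative sketch in numpy, a tanh layer is just the composition of these three steps; the values of W, b, and x below are made up:

```python
import numpy as np

def tanh_layer(x, W, b):
    # Linear transformation, then translation, then pointwise tanh.
    return np.tanh(W @ x + b)

W = np.array([[1.0, -0.5],
              [0.3,  1.2]])
b = np.array([0.1, -0.2])
x = np.array([0.5, 1.0])
print(tanh_layer(x, W, b))
```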

We can apply this technique to understand more complicated networks. For example, the following network classifies two spirals that are slightly entangled, using four hidden layers. Over time, we can see it shift from the “raw” representation to higher level ones it has learned in order to classify the data. While the spirals are originally entangled, by the end they are linearly separable.

On the other hand, the following network, also using multiple layers, fails to classify two spirals that are more entangled.

It is worth explicitly noting here that these tasks are only somewhat challenging because we are using low-dimensional neural networks. If we were using wider networks, all this would be quite easy.

(Andrej Karpathy has made a nice demo based on ConvnetJS that allows you to interactively explore networks with this sort of visualization of training!)

Topology of tanh Layers

Each layer stretches and squishes space, but it never cuts, breaks, or folds it. Intuitively, we can see that it preserves topological properties. For example, a set will be connected afterwards if it was before (and vice versa).

Transformations like this, which don’t affect topology, are called homeomorphisms. Formally, they are bijections that are continuous functions both ways.

Theorem : Layers with \(N\) inputs and \(N\) outputs are homeomorphisms, if the weight matrix, \(W\) , is non-singular. (Though one needs to be careful about domain and range.)

Proof : Let’s consider this step by step:

  • Let’s assume \(W\) has a non-zero determinant. Then it is a bijective linear function with a linear inverse. Linear functions are continuous. So, multiplying by \(W\) is a homeomorphism.
  • Translations are homeomorphisms
  • tanh (and sigmoid and softplus but not ReLU) are continuous functions with continuous inverses. They are bijections if we are careful about the domain and range we consider. Applying them pointwise is a homeomorphism

Thus, if \(W\) has a non-zero determinant, our layer is a homeomorphism. ∎

This result continues to hold if we compose arbitrarily many of these layers together.

Topology and Classification

Consider a two dimensional dataset with two classes \(A, B \subset \mathbb{R}^2\) :

\[A = \{x | d(x,0) < 1/3\}\]

\[B = \{x | 2/3 < d(x,0) < 1\}\]

Claim : It is impossible for a neural network to classify this dataset without having a layer that has 3 or more hidden units, regardless of depth.

As mentioned previously, classification with a sigmoid unit or a softmax layer is equivalent to trying to find a hyperplane (or in this case a line) that separates \(A\) and \(B\) in the final representation. With only two hidden units, a network is topologically incapable of separating the data in this way, and doomed to failure on this dataset.

In the following visualization, we observe a hidden representation while a network trains, along with the classification line. As we watch, it struggles and flounders trying to learn a way to do this.

In the end it gets pulled into a rather unproductive local minimum. Although, it’s actually able to achieve \(\sim 80\%\) classification accuracy.

This example only had one hidden layer, but it would fail regardless.

Proof : Either each layer is a homeomorphism, or the layer’s weight matrix has determinant 0. If it is a homeomorphism, \(A\) is still surrounded by \(B\) , and a line can’t separate them. But suppose it has a determinant of 0: then the dataset gets collapsed on some axis. Since we’re dealing with something homeomorphic to the original dataset, \(A\) is surrounded by \(B\) , and collapsing on any axis means we will have some points of \(A\) and \(B\) mix and become impossible to distinguish. ∎

If we add a third hidden unit, the problem becomes trivial. The neural network learns the following representation:

With this representation, we can separate the datasets with a hyperplane.

To get a better sense of what’s going on, let’s consider an even simpler dataset that’s 1-dimensional:

\[A = [-\frac{1}{3}, \frac{1}{3}]\]

\[B = [-1, -\frac{2}{3}] \cup [\frac{2}{3}, 1]\]

Without using a layer of two or more hidden units, we can’t classify this dataset. But if we use one with two units, we learn to represent the data as a nice curve that allows us to separate the classes with a line:

What’s happening? One hidden unit learns to fire when \(x > -\frac{1}{2}\) and one learns to fire when \(x > \frac{1}{2}\) . When the first one fires, but not the second, we know that we are in A.
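Here is a small numpy sketch of that hand-built solution (not the trained network from the post): steep tanh units stand in for the "firing" behavior, and the weights are chosen by hand rather than learned.

```python
import numpy as np

def hidden(x):
    h1 = np.tanh(20 * (x + 0.5))   # roughly +1 when x > -1/2
    h2 = np.tanh(20 * (x - 0.5))   # roughly +1 when x >  1/2
    return h1, h2

for x in [-0.9, -0.2, 0.0, 0.2, 0.9]:
    h1, h2 = hidden(x)
    in_A = (h1 - h2) > 1.0         # first unit fires but not the second => class A
    print(f"x = {x:+.1f}  in A: {in_A}")
```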

The Manifold Hypothesis

Is this relevant to real world data sets, like image data? If you take the manifold hypothesis really seriously, I think it bears consideration.

The manifold hypothesis is that natural data forms lower-dimensional manifolds in its embedding space. There are both theoretical 3 and experimental 4 reasons to believe this to be true. If you believe this, then the task of a classification algorithm is fundamentally to separate a bunch of tangled manifolds.

In the previous examples, one class completely surrounded another. However, it doesn’t seem very likely that the dog image manifold is completely surrounded by the cat image manifold. But there are other, more plausible topological situations that could still pose an issue, as we will see in the next section.

Links And Homotopy

Another interesting dataset to consider is two linked tori, \(A\) and \(B\) .

Much like the previous datasets we considered, this dataset can’t be separated without using \(n+1\) dimensions, namely a \(4\) th dimension.

Links are studied in knot theory, an area of topology. Sometimes when we see a link, it isn’t immediately obvious whether it’s an unlink (a bunch of things that are tangled together, but can be separated by continuous deformation) or not.

If a neural network using layers with only 3 units can classify it, then it is an unlink. (Question: Can all unlinks be classified by a network with only 3 units, theoretically?)

From this knot perspective, our continuous visualization of the representations produced by a neural network isn’t just a nice animation, it’s a procedure for untangling links. In topology, we would call it an ambient isotopy between the original link and the separated ones.

Formally, an ambient isotopy between manifolds \(A\) and \(B\) is a continuous function \(F: [0,1] \times X \to Y\) such that each \(F_t\) is a homeomorphism from \(X\) to its range, \(F_0\) is the identity function, and \(F_1\) maps \(A\) to \(B\) . That is, \(F_t\) continuously transitions from mapping \(A\) to itself to mapping \(A\) to \(B\) .

Theorem : There is an ambient isotopy between the input and a network layer’s representation if: a) \(W\) isn’t singular, b) we are willing to permute the neurons in the hidden layer, and c) there is more than 1 hidden unit.

Proof : Again, we consider each stage of the network individually:

  • The hardest part is the linear transformation. In order for this to be possible, we need \(W\) to have a positive determinant. Our premise is that it isn’t zero, and we can flip the sign if it is negative by switching two of the hidden neurons, and so we can guarantee the determinant is positive. The space of positive determinant matrices is path-connected , so there exists \(p: [0,1] \to GL_n(\mathbb{R})\) 5 such that \(p(0) = Id\) and \(p(1) = W\) . We can continually transition from the identity function to the \(W\) transformation with the function \(x \to p(t)x\) , multiplying \(x\) at each point in time \(t\) by the continuously transitioning matrix \(p(t)\) .
  • We can continually transition from the identity function to the \(b\) translation with the function \(x \to x + tb\) .
  • We can continually transition from the identity function to the pointwise use of σ with the function: \(x \to (1-t)x + tσ(x)\) . ∎

I imagine there is probably interest in programs automatically discovering such ambient isotopies and automatically proving the equivalence of certain links, or that certain links are separable. It would be interesting to know if neural networks can beat whatever the state of the art is there.

(Apparently determining if knots are trivial is NP. This doesn’t bode well for neural networks.)

The sort of links we’ve talked about so far don’t seem likely to turn up in real world data, but there are higher dimensional generalizations. It seems plausible such things could exist in real world data.

Links and knots are \(1\) -dimensional manifolds, but we need 4 dimensions to be able to untangle all of them. Similarly, one can need yet higher dimensional space to be able to unknot \(n\) -dimensional manifolds. All \(n\) -dimensional manifolds can be untangled in \(2n+2\) dimensions. 6

(I know very little about knot theory and really need to learn more about what’s known regarding dimensionality and links. If we know a manifold can be embedded in n-dimensional space, instead of the dimensionality of the manifold, what limit do we have?)

The Easy Way Out

The natural thing for a neural net to do, the very easy route, is to try and pull the manifolds apart naively and stretch the parts that are tangled as thin as possible. While this won’t be anywhere close to a genuine solution, it can achieve relatively high classification accuracy and be a tempting local minimum.

It would present itself as very high derivatives on the regions it is trying to stretch, and sharp near-discontinuities. We know these things happen. 7 Contractive penalties, penalizing the derivatives of the layers at data points, are the natural way to fight this. 8

Since these sort of local minima are absolutely useless from the perspective of trying to solve topological problems, topological problems may provide a nice motivation to explore fighting these issues.

On the other hand, if we only care about achieving good classification results, it seems like we might not care. If a tiny bit of the data manifold is snagged on another manifold, is that a problem for us? It seems like we should be able to get arbitrarily good classification results despite this issue.

(My intuition is that trying to cheat the problem like this is a bad idea: it’s hard to imagine that it won’t be a dead end. In particular, in an optimization problem where local minima are a big problem, picking an architecture that can’t genuinely solve the problem seems like a recipe for bad performance.)

Better Layers for Manipulating Manifolds?

The more I think about standard neural network layers – that is, with an affine transformation followed by a point-wise activation function – the more disenchanted I feel. It’s hard to imagine that these are really very good for manipulating manifolds.

Perhaps it might make sense to have a very different kind of layer that we can use in composition with more traditional ones?

The thing that feels natural to me is to learn a vector field with the direction we want to shift the manifold:

And then deform space based on it:

One could learn the vector field at fixed points (just take some fixed points from the training set to use as anchors) and interpolate in some manner. The vector field above is of the form:

\[F(x) = \frac{v_0f_0(x) + v_1f_1(x)}{1+f_0(x)+f_1(x)}\]

Where \(v_0\) and \(v_1\) are vectors and \(f_0(x)\) and \(f_1(x)\) are n-dimensional gaussians. This is inspired a bit by radial basis functions .
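A small numpy sketch of this vector field (purely illustrative: the anchor points c0, c1, the vectors v0, v1, and the gaussian width are made-up values, not learned):

```python
import numpy as np

def gaussian(x, center, sigma=1.0):
    return np.exp(-np.sum((x - center) ** 2) / (2 * sigma ** 2))

def F(x, v0, v1, c0, c1):
    # F(x) = (v0*f0(x) + v1*f1(x)) / (1 + f0(x) + f1(x))
    f0, f1 = gaussian(x, c0), gaussian(x, c1)
    return (v0 * f0 + v1 * f1) / (1.0 + f0 + f1)

v0, v1 = np.array([1.0, 0.0]), np.array([0.0, -1.0])   # shift directions (hypothetical)
c0, c1 = np.array([-1.0, 0.0]), np.array([1.0, 0.0])   # anchor points (hypothetical)
x = np.array([0.2, 0.3])
print(x + 0.1 * F(x, v0, v1, c0, c1))  # deform space a small step along the field
```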

K-Nearest Neighbor Layers

I’ve also begun to think that linear separability may be a huge, and possibly unreasonable, amount to demand of a neural network. In some ways, it feels like the natural thing to do would be to use k-nearest neighbors (k-NN). However, k-NN’s success is greatly dependent on the representation it classifies data from, so one needs a good representation before k-NN can work well.

As a first experiment, I trained some MNIST networks (two-layer convolutional nets, no dropout) that achieved \(\sim 1\%\) test error. I then dropped the final softmax layer and used the k-NN algorithm. I was able to consistently achieve a reduction in test error of 0.1-0.2%.

Still, this doesn’t quite feel like the right thing. The network is still trying to do linear classification, but since we use k-NN at test time, it’s able to recover a bit from mistakes it made.

k-NN is differentiable with respect to the representation it’s acting on, because of the 1/distance weighting. As such, we can train a network directly for k-NN classification. This can be thought of as a kind of “nearest neighbor” layer that acts as an alternative to softmax.

We don’t want to feedforward our entire training set for each mini-batch because that would be very computationally expensive. I think a nice approach is to classify each element of the mini-batch based on the classes of other elements of the mini-batch, giving each one a weight of 1/(distance from classification target). 9
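A rough numpy sketch of this idea (an illustrative reconstruction, not the author's Theano implementation): each element of a mini-batch is classified from the labels of the other elements, weighted by 1/distance in the learned representation.

```python
import numpy as np

def knn_layer_probs(reps, labels, n_classes, eps=1e-8):
    # Pairwise distances between mini-batch representations.
    dists = np.linalg.norm(reps[:, None, :] - reps[None, :, :], axis=-1)
    weights = 1.0 / (dists + eps)
    np.fill_diagonal(weights, 0.0)            # a point does not vote for itself
    onehot = np.eye(n_classes)[labels]
    scores = weights @ onehot                 # inverse-distance-weighted votes
    return scores / scores.sum(axis=1, keepdims=True)

reps = np.random.randn(8, 4)                  # hypothetical mini-batch representations
labels = np.random.randint(0, 3, size=8)
print(knn_layer_probs(reps, labels, n_classes=3).shape)  # (8, 3)
```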

Sadly, even with sophisticated architecture, using k-NN only gets down to 4-5% test error – and using simpler architectures gets worse results. However, I’ve put very little effort into playing with hyper-parameters.

Still, I really aesthetically like this approach, because it seems like what we’re “asking” the network to do is much more reasonable. We want points of the same manifold to be closer than points of others, as opposed to the manifolds being separable by a hyperplane. This should correspond to inflating the space between manifolds for different categories and contracting the individual manifolds. It feels like simplification.

Topological properties of data, such as links, may make it impossible to linearly separate classes using low-dimensional networks, regardless of depth. Even in cases where it is technically possible, such as spirals, it can be very challenging to do so.

To accurately classify data with neural networks, wide layers are sometimes necessary. Further, traditional neural network layers do not seem to be very good at representing important manipulations of manifolds; even if we were to cleverly set weights by hand, it would be challenging to compactly represent the transformations we want. New layers, specifically motivated by the manifold perspective of machine learning, may be useful supplements.

(This is a developing research project. It’s posted as an experiment in doing research openly. I would be delighted to have your feedback on these ideas: you can comment inline or at the end. For typos, technical errors, or clarifications you would like to see added, you are encouraged to make a pull request on github .)

Acknowledgments

Thank you to Yoshua Bengio, Michael Nielsen, Dario Amodei, Eliana Lorch, Jacob Steinhardt, and Tamsyn Waterhouse for their comments and encouragement.

This seems to have really kicked off with Krizhevsky et al. , (2012) , who put together a lot of different pieces to achieve outstanding results. Since then there’s been a lot of other exciting work. ↩

These representations, hopefully, make the data “nicer” for the network to classify. There has been a lot of work exploring representations recently. Perhaps the most fascinating has been in Natural Language Processing: the representations we learn of words, called word embeddings, have interesting properties. See Mikolov et al. (2013) , Turian et al. (2010) , and, Richard Socher’s work . To give you a quick flavor, there is a very nice visualization associated with the Turian paper. ↩

A lot of the natural transformations you might want to perform on an image, like translating or scaling an object in it, or changing the lighting, would form continuous curves in image space if you performed them continuously. ↩

Carlsson et al. found that local patches of images form a klein bottle. ↩

\(GL_n(\mathbb{R})\) is the set of invertible \(n \times n\) matrices on the reals, formally called the general linear group of degree \(n\) . ↩

This result is mentioned in Wikipedia’s subsection on Isotopy versions . ↩

See Szegedy et al. , where they are able to modify data samples and find slight modifications that cause some of the best image classification neural networks to misclassify the data. It’s quite troubling. ↩

Contractive penalties were introduced in contractive autoencoders. See Rifai et al. (2011) . ↩

I used a slightly less elegant, but roughly equivalent algorithm because it was more practical to implement in Theano: feedforward two different batches at the same time, and classify them based on each other. ↩


Annual Review of Statistics and Its Application

Volume 11, 2024, review article, open access: Manifold Learning: What, How, and Why

  • Marina Meilă (Department of Statistics, University of Washington, Seattle, Washington, USA) and Hanyu Zhang (ByteDance, Inc., Bellevue, Washington, USA)
  • Vol. 11:393-417 (volume publication date April 2024), https://doi.org/10.1146/annurev-statistics-040522-115238; first published as a Review in Advance on November 29, 2023
  • Copyright © 2024 by the author(s). This work is licensed under a Creative Commons Attribution 4.0 International License.

Manifold learning (ML), also known as nonlinear dimension reduction, is a set of methods to find the low-dimensional structure of data. Dimension reduction for large, high-dimensional data is not merely a way to reduce the data; the new representations and descriptors obtained by ML reveal the geometric shape of high-dimensional point clouds and allow one to visualize, denoise, and interpret them. This review presents the underlying principles of ML, its representative methods, and their statistical foundations, all from a practicing statistician's perspective. It describes the trade-offs and what theory tells us about the parameter and algorithmic choices we make in order to obtain reliable conclusions.

  • Jacomy M , Venturini T , Heymann S , Bastian M. 2014 . . ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. . PLOS ONE 9 : ( 6 ): e98679 [Crossref] [Google Scholar]
  • Jolliffe IT. 2002 . . Principal Component Analysis . New York: : Springer [Google Scholar]
  • Joncas D , Meilă M , McQueen J . 2017 . . Improved graph Laplacian via geometric self-consistency. . In Advances in Neural Information Processing Systems 30 (NIPS 2017) , ed. I Guyon, UV Luxburg, S Bengio, H Wallach, R Fergus, S Vishwanathan, R Garnett , pp. 4457 – 66 . Red Hook, NY: : Curran [Google Scholar]
  • Kim J , Rinaldo A , Wasserman LA. 2019 . . Minimax rates for estimating the dimension of a manifold. . J. Comput. Geom. 10 : ( 1 ): 42 – 95 [Google Scholar]
  • Kirichenko A , van Zanten H. 2017 . . Estimating a smooth function on a large graph by Bayesian Laplacian regularisation. . Electron. J. Stat. 11 : ( 1 ): 891 – 915 [Crossref] [Google Scholar]
  • Kleindessner M , von Luxburg U. 2015 . . Dimensionality estimation without distances. . J. Mach. Learn. Res. 38 : : 471 – 79 [Google Scholar]
  • Kobak D , Linderman G , Steinerberger S , Kluger Y , Berens P . 2020 . . Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations. . In Machine Learning and Knowledge Discovery in Databases , ed. U Brefeld, E Fromont, A Hotho, A Knobbe, M Maathuis, C Robardet , pp. 124 – 39 . New York: : Springer [Google Scholar]
  • Koelle SJ , Zhang H , Meilă M , Chen YC. 2022 . . Manifold coordinates with physical meaning. . J. Mach. Learn. Res. 23 : ( 133 ): 1 – 57 [Google Scholar]
  • Kohli D , Cloninger A , Mishne G. 2021 . . LDLE: low distortion local eigenmaps. . J. Mach. Learn. Res. 22 : ( 282 ): 1 – 64 [Google Scholar]
  • Koltchinskii VI. 2000 . . Empirical geometry of multivariate data: a deconvolution approach. . Ann. Stat. 28 : ( 2 ): 591 – 629 [Crossref] [Google Scholar]
  • Lee JM. 2003 . . Introduction to Smooth Manifolds . New York: : Springer-Verlag [Google Scholar]
  • Levina E , Bickel PJ . 2004 . . Maximum likelihood estimation of intrinsic dimension. . In Advances in Neural Information Processing Systems 17 (NIPS 2004) , ed. L Saul, Y Weiss, L Bottou , pp. 777 – 84 . Red Hook, NY: : Curran [Google Scholar]
  • Lin B , He X , Zhang C , Ji M . 2013 . . Parallel vector field embedding. . J. Mach. Learn. Res. 14 : ( 90 ): 2945 – 77 [Google Scholar]
  • Linderman GC , Steinerberger S. 2019 . . Clustering with t-SNE, provably. . SIAM J. Math. Data Sci. 1 : ( 2 ): 313 – 32 [Crossref] [Google Scholar]
  • Luo C , Safa I , Wang Y. 2009 . . Approximating gradients for meshes and point clouds via diffusion metric. . Comput. Graph. Forum 28 : ( 5 ): 1497 – 508 [Crossref] [Google Scholar]
  • McInnes L , Healy J , Saul N , Grossberger L. 2018 . . UMAP: uniform manifold approximation and projection. . J. Open Source Softw. 3 : ( 29 ): 861 [Crossref] [Google Scholar]
  • McQueen J , Meilă M , Joncas D . 2016 . . Nearly isometric embedding by relaxation. . In Advances in Neural Information Processing Systems 29 (NIPS 2016) , ed. D Lee, M Sugiyama, U Luxburg, I Guyon, R Garnett , pp. 2631 – 39 . Red Hook, NY: : Curran [Google Scholar]
  • McQueen J , Meilă M , VanderPlas J , Zhang Z. 2016a . . Megaman: manifold learning with millions of points. . arXiv:1603.02763 [cs.LG]
  • McQueen J , Meilă M , VanderPlas J , Zhang Z. 2016b . . Megaman: scalable manifold learning in Python. . J. Mach. Learn. Res. 17 : ( 148 ): 1 – 5 [Google Scholar]
  • Meilă M . 2015 . . Spectral clustering. . In Handbook of Cluster Analysis , ed. C Hennig, M Meilă, F Murtagh, R Rocci , pp. 125 – 39 . New York: : Chapman and Hall/CRC [Google Scholar]
  • Meilă M , Shi J. 2001 . . A random walks view of spectral segmentation. . Proc. Mach. Learn. Res. R3 : : 203 – 8 [Google Scholar]
  • Meilă M , Zhang H. 2023 . . Manifold learning: what, how, and why. . arXiv:2311.03757 [stat.ML]
  • Mohammed K , Narayanan H. 2017 . . Manifold learning using kernel density estimation and local principal components analysis. . arXiv:1709.03615 [math.ST]
  • Nadler B , Lafon S , Coifman R , Kevrekidis I. 2006 . . Diffusion Maps, spectral clustering and eigenfunctions of Fokker-Planck operators. . In Advances in Neural Information Processing Systems 18 (NIPS 2005) , ed. Y Weiss, B Schölkopf, J Platt , pp. 955 – 62 . Cambridge, MA: : MIT Press [Google Scholar]
  • Ng A , Jordan M , Weiss Y . 2001 . . On spectral clustering: Analysis and an algorithm. . In Advances in Neural Information Processing Systems 14 (NIPS 2001) , ed. T Dietterich, S Becker, Z Ghahramani , pp. 849 – 56 . Cambridge, MA: : MIT Press [Google Scholar]
  • Noé F , Clementi C. 2017 . . Collective variables for the study of long-time kinetics from molecular trajectories: theory and methods. . Curr. Opin. Struct. Biol. 43 : : 141 – 47 [Crossref] [Google Scholar]
  • Ozertem U , Erdogmus D. 2011 . . Locally defined principal curves and surfaces. . J. Mach. Learn. Res. 12 : ( 34 ): 1249 – 86 [Google Scholar]
  • Perrault-Joncas D , Meilă M. 2013 . . Non-linear dimensionality reduction: Riemannian metric estimation and the problem of geometric discovery. . arXiv:1305.7255 [stat.ML]
  • Perrault-Joncas D , Meilă M , McQueen J . 2017 . . Improved graph Laplacian via geometric self-consistency. . In NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems , ed. U von Luxburg, I Guyon, S Bengio, H Wallach, R Fergus , pp. 4457 – 66 . Red Hook, NY: : Curran [Google Scholar]
  • Pettis KW , Bailey TA , Jain AK , Dubes RC. 1979 . . An intrinsic dimensionality estimator from near-neighbor information. . IEEE Trans. Pattern Anal. Mach. Intell. 1 : ( 1 ): 25 – 37 [Crossref] [Google Scholar]
  • Poličar PG , Stražar M , Zupan B. 2019 . . opentSNE: a modular Python library for t-SNE dimensionality reduction and embedding. . bioRxiv 731877. https://doi.org/10.1101/731877
  • Portegies JW. 2016 . . Embeddings of Riemannian manifolds with heat kernels and eigenfunctions. . Commun. Pure Appl. Math. 69 : ( 3 ): 478 – 518 [Crossref] [Google Scholar]
  • Ram P , Lee D , March W , Gray A . 2009 . . Linear-time algorithms for pairwise statistical problems. . In Advances in Neural Information Processing Systems 22 (NIPS 2009) , ed. Y Bengio, D Schuurmans, J Lafferty, C Williams, A Culotta , pp. 1527 – 35 . Red Hook, NY: : Curran [Google Scholar]
  • Rohrdanz MA , Zheng W , Maggioni M , Clementi C. 2011 . . Determination of reaction coordinates via locally scaled diffusion map. . J. Chem. Phys. 134 : ( 12 ): 124116 [Crossref] [Google Scholar]
  • Rosenberg S. 1997 . . The Laplacian on a Riemannian Manifold: An Introduction to Analysis on Manifolds . Cambridge, UK: : Cambridge Univ. Press [Google Scholar]
  • Roweis S , Saul L. 2000 . . Nonlinear dimensionality reduction by locally linear embedding. . Science 290 : ( 5500 ): 2323 – 26 [Crossref] [Google Scholar]
  • Sha F , Saul LK. 2005 . . Analysis and Extension of Spectral Methods for Nonlinear Dimensionality Reduction (ICML'05) . New York: : ACM [Google Scholar]
  • Shi J , Malik J. 2000 . . Normalized cuts and image segmentation. . IEEE Trans. Pattern Anal. Mach. Intell. 22 : ( 8 ): 888 – 905 [Crossref] [Google Scholar]
  • Singer A. 2006 . . From graph to manifold Laplacian: the convergence rate. . Appl. Comput. Harmon. Anal. 21 : ( 1 ): 128 – 34 [Crossref] [Google Scholar]
  • Singer A , Wu HT. 2012 . . Vector Diffusion Maps and the connection Laplacian. . Commun. Pure Appl. Math. 65 : ( 8 ): 1067 – 144 [Crossref] [Google Scholar]
  • Slepčev D , Thorpe M. 2019 . . Analysis of p -Laplacian regularization in semisupervised learning. . SIAM J. Math. Anal. 51 : ( 3 ): 2085 – 120 [Crossref] [Google Scholar]
  • Sogge CD. 2014 . . Hangzhou Lectures on Eigenfunctions of the Laplacian . Princeton, NJ: : Princeton Univ. Press [Google Scholar]
  • Tenenbaum JB , de Silva V , Langford JC. 2000 . . A global geometric framework for nonlinear dimensionality reduction. . Science 290 : ( 5500 ): 2319 – 23 [Crossref] [Google Scholar]
  • Ting D , Huang L , Jordan MI. 2010 . . An analysis of the convergence of graph Laplacians. . In ICML'10: Proceedings of the 27th International Conference on Machine Learning , ed. J Fürnkranz, T Joachims , pp. 1079 – 86 . Madison, WI: : Omnipress [Google Scholar]
  • Ting D , Jordan MI. 2018 . . On nonlinear dimensionality reduction, linear smoothing and autoencoding. . arXiv:1803.02432 [stat.ML]
  • Ting D , Jordan MI. 2020 . . Manifold learning via manifold deflation. . arXiv:2007.03315 [stat.ML]
  • Tribello GA , Ceriotti M , Parrinello M. 2012 . . Using sketch-map coordinates to analyze and bias molecular dynamics simulations. . Proc. Natl. Acad. Sci . 109 : : 5196 – 201 [Crossref] [Google Scholar]
  • van der Maaten L. 2014 . . Accelerating t-SNE using tree-based algorithms. . J. Mach. Learn. Res. 15 : ( 93 ): 3221 – 45 [Google Scholar]
  • van der Maaten L , Hinton G. 2008 . . Visualizing data using t-SNE. . J. Mach. Learn. Res. 9 : : 2579 – 605 [Google Scholar]
  • Vanderplas J , Connolly A. 2009 . . Reducing the dimensionality of data: locally linear embedding of Sloan galaxy spectra. . Astron. J. 138 : ( 5 ): 1365 [Crossref] [Google Scholar]
  • Verma N. 2011 . . Towards an algorithmic realization of Nash's embedding theorem . Work. Pap. , Univ. Calif. , San Diego: [Google Scholar]
  • von Luxburg U. 2007 . . A tutorial on spectral clustering. . Stat. Comput. 17 : ( 4 ): 395 – 416 [Crossref] [Google Scholar]
  • Wasserman L. 2018 . . Topological data analysis. . Annu. Rev. Stat. Appl. 5 : : 501 – 32 [Crossref] [Google Scholar]
  • Weinberger KQ , Saul LK. 2006 . . An introduction to nonlinear dimensionality reduction by maximum variance unfolding. . In Proceedings of the Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference , pp. 1683 – 86 . Washington, DC: : AAAI Press [Google Scholar]
  • Yu K , Zhang T. 2010 . . Improved local coordinate coding using local tangents. . In ICML'10: Proceedings of the 27th International Conference on International Conference on Machine Learning , ed. J Fürnkranz, T Joachims , pp. 1215 – 22 . Madison, WI: : Omnipress [Google Scholar]
  • Zhang Y , Gilbert AC , Steinerberger S. 2022 . . May the force be with you. . In 58th Annual Allerton Conference on Communication, Control, and Computing , pp. 1 – 8 . Piscataway, NJ: : IEEE [Google Scholar]
  • Zhang Y , Steinerberger S. 2022 . . t-SNE, forceful colorings and mean field limits. . Res. Math. Sci. 9 : : 42 [Crossref] [Google Scholar]

Manifold Hypothesis

  • Manifold Wikipedia
  • Manifolds and Neural Activity: An Introduction | Kevin Luxem - Towards Data Science

The Manifold Hypothesis states that real-world high-dimensional data (images, neural activity) lie on low-dimensional manifolds embedded within the high-dimensional space. Manifolds are topological spaces that look locally like Euclidean spaces.


Manifold Learning for Data Visualization - Stefan Kühn
Dimensionality reduction methods like PCA (Principal Component Analysis) are widely used in Machine Learning for a variety of tasks. But besides the well-known standard methods, there are many more tools available, especially in the area of Manifold Learning. We will interactively explore these tools and present applications for Data Visualization and Feature Engineering using scikit-learn.


Kartik C

  • 1 What is a Manifold?
  • 2 Manifold Learning
  • 3 Manifold Alignment
  • 4 Manifold - Defenses Against Adversarial Attacks
  • 5 Einstein Manifolds

What is a Manifold?

Manifolds are a complex topic, but they are very important in mathematics and physics; by understanding them, we can better understand the world around us. A manifold is a space, such as a curved surface, that looks like a flat plane when you zoom in. For example, the surface of a sphere is a manifold: it looks curved from a distance, but a small enough patch of it looks flat. Manifolds matter because they let us describe curved spaces using flat geometry, which is much simpler to work with than curved geometry.

For example, we can use manifolds to describe the space around us. Even though that space may be curved, we can describe it with flat geometry by breaking it up into many small patches; each patch looks approximately flat, so flat geometry works well within it.
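A quick way to see this numerically, as a minimal sketch (the unit sphere, the patch size, and the sample count below are illustrative choices, not anything prescribed in the text): sample points on a sphere, which is a two-dimensional manifold sitting in three-dimensional space, and run PCA on one small patch. Nearly all of the local variance is captured by just two directions, which is exactly what "looks flat when you zoom in" means.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Sample 20,000 points uniformly on the unit sphere (a 2-manifold in R^3).
points = rng.normal(size=(20_000, 3))
points /= np.linalg.norm(points, axis=1, keepdims=True)

# Take a small patch: the 200 points closest to an arbitrary base point.
base = points[0]
patch = points[np.argsort(np.linalg.norm(points - base, axis=1))[:200]]

# PCA on the patch: almost all of the variance lies in two directions,
# so the patch is well approximated by a flat 2D plane.
ratios = PCA(n_components=3).fit(patch).explained_variance_ratio_
print("local variance captured per direction:", np.round(ratios, 4))
print("captured by the first two directions:", round(float(ratios[:2].sum()), 4))
```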

Manifolds are also important in relativity. Relativity is a theory of physics that describes space and time. In relativity, space and time are combined into a four-dimensional manifold.

This manifold is curved, but, as before, we can describe it by covering it with small, approximately flat patches and applying flat geometry within each one. Here is a simple analogy to help you understand manifolds:

Imagine you have a map of the world. The map is flat, but the Earth is a sphere. This means that the map is distorted. For example, Greenland appears to be larger than Africa on a map, but in reality Africa is much larger.

  • A manifold is like a way to unfold the map so that it accurately represents the surface of the Earth. This would allow you to compare the distances between different places on the Earth more accurately.

Manifold Learning

  • Representation Learning

Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high. (Manifold learning | scikit-learn)

Manifold learning is based on the assumption that many high-dimensional datasets lie on a low-dimensional manifold, a curved surface embedded in a higher-dimensional space, and it remains useful even when that manifold is non-linear. Manifold learning algorithms aim to find a low-dimensional embedding of the data that preserves the intrinsic geometry of the manifold, which is useful for visualization, data analysis, and downstream machine learning tasks. Some popular manifold learning algorithms include the following (a short scikit-learn sketch follows the list):

  • Locally Linear Embedding (LLE): LLE constructs a local linear model around each data point and then finds a low-dimensional embedding that preserves that local structure.
  • Isomap: Isomap constructs a neighborhood graph of the data points, estimates geodesic distances via shortest paths on the graph, and then uses multidimensional scaling (MDS) to find a low-dimensional embedding that preserves those distances.
  • t-Distributed Stochastic Neighbor Embedding (t-SNE): t-SNE uses a probabilistic approach to find a low-dimensional embedding that preserves the local neighborhood structure of the data; it is mainly used for visualization and preserves global structure only loosely.
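To make the list concrete, here is a minimal scikit-learn sketch; the S-curve toy dataset and the parameter values are arbitrary illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_s_curve
from sklearn.manifold import LocallyLinearEmbedding, Isomap, TSNE

# A toy 2D manifold (an S-shaped sheet) embedded in 3D.
X, color = make_s_curve(n_samples=1500, random_state=0)

models = {
    "LLE": LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0),
    "Isomap": Isomap(n_neighbors=12, n_components=2),
    "t-SNE": TSNE(n_components=2, perplexity=30, init="pca", random_state=0),
}

for name, model in models.items():
    Y = model.fit_transform(X)   # each row of Y is a 2D coordinate for one sample
    print(f"{name}: embedded {X.shape} -> {Y.shape}")
```

Each method maps the 3D points down to 2D coordinates; plotting Y colored by the position along the curve (the color array) shows how well each algorithm unrolls the manifold.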

Manifold learning algorithms have been used in a wide variety of applications, including:

  • Image processing : Manifold learning can be used to reduce the dimensionality of images without losing important information. This can be useful for image compression, classification, and retrieval.
  • Natural Language Processing (NLP) : Manifold learning can be used to reduce the dimensionality of text data without losing important information. This can be useful for text classification, clustering, and machine translation.
  • Bioinformatics : Manifold learning can be used to reduce the dimensionality of biological data, such as gene expression data and protein structure data. This can be useful for identifying disease biomarkers and developing new drugs.

Manifold Alignment

  • Manifold alignment
  • Manifold Alignment | C. Wang, P. Krafft, & S. Mahadevan - UMass Amherst

Imagine you have two different datasets, one of images of cats and one of images of dogs. Both datasets are high-dimensional, meaning that each image is represented by a long list of numbers.

Manifold alignment is a technique that can be used to find a common representation of these two datasets, even though they are different. It does this by assuming that the two datasets lie on a common manifold.

A manifold is a curved surface that locally resembles a flat plane. For example, the Earth's surface is a manifold. It is curved, but if you look at a small enough patch, it looks flat.

Manifold alignment works by finding a projection from each dataset to the manifold. This projection maps each image to a point on the manifold. The goal is to find projections that preserve the relationships between the images in each dataset.

Once the projections have been found, the two datasets can be represented in the same space. This makes it possible to compare the images in the two datasets directly.
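As a hedged illustration only, the sketch below is not the manifold alignment algorithm cited above but a toy linear stand-in that assumes a few known corresponding points between the two datasets: each dataset is embedded separately, and one embedding is then rotated onto the other with an orthogonal Procrustes step, which captures the basic idea of putting both datasets into one shared space.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two synthetic datasets that share a hidden 2D structure but live in
# different high-dimensional feature spaces (stand-ins for the cat/dog sets).
latent = rng.normal(size=(300, 2))
X = latent @ rng.normal(size=(2, 50)) + 0.02 * rng.normal(size=(300, 50))
Y = latent @ rng.normal(size=(2, 80)) + 0.02 * rng.normal(size=(300, 80))

# Step 1: project each dataset separately onto a low-dimensional embedding.
emb_x = PCA(n_components=2, whiten=True).fit_transform(X)
emb_y = PCA(n_components=2, whiten=True).fit_transform(Y)

# Step 2: use a handful of known correspondences (here, the first 20 rows)
# to find the rotation that best aligns the second embedding with the first.
R, _ = orthogonal_procrustes(emb_y[:20], emb_x[:20])
emb_y_aligned = emb_y @ R

# After alignment the two datasets share one coordinate system, so held-out
# corresponding points (rows 20 onward) should end up close together.
before = np.linalg.norm(emb_y[20:] - emb_x[20:], axis=1).mean()
after = np.linalg.norm(emb_y_aligned[20:] - emb_x[20:], axis=1).mean()
print(f"mean gap between corresponding points: before={before:.3f}, after={after:.3f}")
```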

For example, manifold alignment could be used to develop a system that can identify cats and dogs in images, even if the images are taken from different angles or in different lighting conditions.

Here is a simple analogy to help you understand manifold alignment:

Manifold alignment is like taking two maps of the Earth drawn with different projections and lining them up so that the same city lands in the same place on both. Once the maps are aligned, you can compare locations across them directly.

Manifold alignment is a powerful technique that can be used to solve a variety of problems in machine learning. It is often used in applications such as image recognition, natural language processing, and data visualization.

Manifold - Defenses Against Adversarial Attacks

  • Defenses Against Adversarial Attacks

An adversarial example is a specially crafted input that is designed to fool a machine learning model. Adversarial examples are often created by adding small perturbations to normal inputs. These perturbations are imperceptible to humans, but they can cause the model to make incorrect predictions.

The manifold of normal examples is a mathematical object that represents the set of all normal inputs. It is a curved surface, and each normal input corresponds to a point on the manifold.

The reformer network is a type of neural network that can be used to defend against adversarial attacks. It works by moving adversarial examples towards the manifold of normal examples. This makes it more likely that the model will classify the adversarial examples correctly.

To understand how the reformer network works, it is helpful to think about an analogy. Imagine that the manifold of normal examples is a mountain range. The reformer network is like a force that pushes adversarial examples uphill, towards the highest peaks of the mountain range. The higher up the mountain an adversarial example is, the more likely it is to be classified correctly.

The reformer network is most effective against adversarial examples with small perturbations: such examples start close to the manifold of normal examples, so only a small correction is needed to move them back onto it, after which the model is much more likely to classify them correctly.

Here is a concrete example of how the reformer network could be used to defend against an adversarial attack on an image classification model:

Suppose an attacker creates an adversarial image of a cat that is classified as a dog by the model. The reformer network could be used to move the adversarial image towards the manifold of normal images of cats. This would make it more likely that the model will classify the adversarial image correctly, as a cat.
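In MagNet-style defenses the reformer is usually an autoencoder trained on clean data; the sketch below is a simplified, hedged stand-in that uses a linear "reformer" (reconstruction from the principal subspace of clean data) and random noise in place of a crafted attack. The digits dataset, the perturbation size, and the number of components are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# "Normal" examples: 8x8 digit images, flattened and scaled to [0, 1].
X, y = load_digits(return_X_y=True)
X = X / 16.0
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# The classifier under attack (a stand-in for the DNN in the text).
clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)

# Linear stand-in for the reformer: the subspace spanned by clean data.
reformer = PCA(n_components=20).fit(X_tr)

def reform(Z):
    """Project inputs onto the clean-data subspace and reconstruct them."""
    return reformer.inverse_transform(reformer.transform(Z))

# Crude "adversarial" inputs: clean test images plus random perturbations.
# (Real attacks are carefully crafted; this only illustrates the geometry.)
X_adv = X_te + 0.5 * rng.normal(size=X_te.shape)

print("accuracy on clean inputs:          ", round(clf.score(X_te, y_te), 3))
print("accuracy on perturbed inputs:      ", round(clf.score(X_adv, y_te), 3))
print("accuracy after reforming perturbed:", round(clf.score(reform(X_adv), y_te), 3))
```

Because the reconstruction discards the part of the perturbation that points away from the clean-data subspace, the reformed inputs typically recover some of the accuracy lost to the perturbation.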

The reformer network is a promising defense against adversarial attacks. It is still under active development, but it has shown encouraging results in experiments.

Einstein Manifolds

Einstein manifolds are special types of curved surfaces that are important in the theory of general relativity. General relativity is a theory of gravity that describes space and time as a curved four-dimensional surface. Einstein manifolds are surfaces that satisfy the Einstein field equations, which are the equations that govern gravity.

One way to think about Einstein manifolds is to imagine a trampoline with a bowling ball in the middle. The bowling ball will cause the trampoline to bend. This is similar to how gravity causes spacetime to bend.

If you roll a marble across the trampoline, it will follow a curved path. This is because the marble is following the curvature of the trampoline. In the same way, objects in spacetime follow curved paths because of the curvature of spacetime.

Einstein manifolds are mathematical models of bent spacetime. By studying Einstein manifolds, we can learn more about the nature of gravity and the structure of the universe.

Here is a simpler analogy:

Imagine you have a piece of paper. You can draw a straight line on the paper. This is like a flat surface.

Now, imagine you crumple the piece of paper into a ball. You can still draw a line on the paper, but it will be curved. This is like a curved surface.

Einstein manifolds are like curved surfaces, but the ones used in general relativity are four-dimensional: they have four independent directions instead of just two.

Einstein manifolds are important because they are models of the spacetime in which we live. Spacetime is the fabric of the universe, and it is curved because of gravity. By studying Einstein manifolds, we can learn more about the nature of gravity and how it works.


DeepSense: test prioritization for neural network based on multiple mutation and manifold spatial distribution

  • Research Paper
  • Published: 30 July 2024


FengYu Yang, YuAn Chen, Tong Chen, Ying Ma & Jie Liao

Deep learning systems have been used extensively in several fields in recent years, but deep neural networks (DNNs) can also make incorrect decisions and cause significant losses. Testing tasks for DNN systems often require annotating test data to obtain oracle information, yet collecting and annotating large amounts of disparate data from application scenarios is very expensive and time consuming. We therefore propose DeepSense, an effective neural network test prioritization technique that selects, from an unlabeled test dataset, the inputs most likely to reveal neural network faults as early as possible. DeepSense considers the full range of model fault-detection capabilities, including faults near the decision boundary and near class centroids. Specifically, we first design multiple mutation features and extract them from the neural network model and the input samples using different mutation operators. We then extract sample embedding features to construct an undirected weighted graph, perform a random walk to compute distance similarity and manifold similarity, and from these design and extract spatial nearest-neighbor features. Finally, an MLP combines the mutation features and the spatial nearest-neighbor features to predict each input sample's fault-revealing ability and to set priorities accordingly. We evaluate DeepSense on four popular image datasets, and the results show that DeepSense significantly outperforms existing test input prioritization techniques.
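DeepSense's own pipeline is not reproduced here; as a hedged sketch of one ingredient the abstract describes (an undirected weighted graph over sample embeddings plus a random walk that yields a manifold-aware similarity), the snippet below builds a k-nearest-neighbor graph, converts it into a transition matrix, and diffuses a few random-walk steps. The dataset, the value of k, the kernel weighting, and the number of steps are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neighbors import kneighbors_graph

# Embeddings of (unlabeled) test inputs; raw pixels stand in here for the
# learned embedding features a real test-prioritization pipeline would use.
X, _ = load_digits(return_X_y=True)

# Undirected weighted k-NN graph: symmetrize, then turn distances into weights.
W = kneighbors_graph(X, n_neighbors=10, mode="distance")
W = W.maximum(W.T).toarray()
W[W > 0] = np.exp(-(W[W > 0] / W[W > 0].mean()) ** 2)   # closer => heavier weight

# Row-normalize into a random-walk transition matrix P.
P = W / W.sum(axis=1, keepdims=True)

# Manifold similarity of every sample to sample 0: probability mass after a few
# random-walk steps, which follows the graph structure rather than straight-line
# Euclidean distance.
sim = np.zeros(len(X))
sim[0] = 1.0
for _ in range(5):
    sim = sim @ P

print("samples most similar to sample 0 along the data manifold:",
      np.argsort(-sim)[:10])
```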



Availability of data and materials
The image data supporting the experiments are publicly available in the MNIST, Fashion-MNIST, CIFAR-10, and SVHN repositories: MNIST: http://yann.lecun.com/exdb/mnist/; Fashion-MNIST: https://yann.lecun.com/exdb/mnist/; CIFAR-10: http://www.cs.toronto.edu/~kriz/cifar.html; SVHN: http://ufldl.stanford.edu/housenumbers/



Yang, F., Chen, Y., Chen, T. et al. DeepSense: test prioritization for neural network based on multiple mutation and manifold spatial distribution. Evol. Intel. (2024). https://doi.org/10.1007/s12065-024-00961-4



Angle Estimation Using Learning-Based Doppler Deconvolution in Beamspace with Forward-Looking Radar


1. Introduction

  • An end-to-end learning-based FLR target angle estimation framework is established, which omits computationally expensive iterative optimization.
  • A regression strategy is adopted, which avoids off-grid effects.
  • A semi-supervised mechanism is introduced into learning through the manifold regularization framework, so that an overly large FLR target-positioning dataset does not have to be built (a toy sketch of the manifold-regularization idea follows this list).
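This is not the paper's SSL-FAE model; it is a generic toy of the manifold-regularization idea the last bullet refers to: fit a handful of labeled samples while penalizing predictions that change sharply between neighboring samples on a graph built from labeled and unlabeled data together. The synthetic curve, the graph size, and the regularization weight are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)

# Toy stand-in for beamspace snapshots: 300 samples on a 1-D curve embedded in
# 10-D, with the regression target (the "angle") varying smoothly along it.
t = np.sort(rng.uniform(-1.0, 1.0, size=300))        # hidden manifold coordinate
curve = np.stack([np.cos(2 * t), np.sin(2 * t)], axis=1)
X = np.hstack([curve, 0.02 * rng.normal(size=(300, 8))])
y = t                                                 # target to regress

# Only a few samples carry labels; every sample shapes the neighborhood graph.
labeled = np.sort(rng.choice(len(X), size=12, replace=False))

# Symmetric k-NN graph and its (unnormalized) Laplacian L = D - W.
knn = kneighbors_graph(X, n_neighbors=8, mode="connectivity")
W = knn.maximum(knn.T).toarray()
L = np.diag(W.sum(axis=1)) - W

# Manifold-regularized estimate f: fit the labels, stay smooth on the graph:
#   minimize  sum_{i labeled} (f_i - y_i)^2  +  gamma * f^T L f
S = np.zeros((len(X), len(X)))
S[labeled, labeled] = 1.0
gamma = 0.1
f = np.linalg.solve(S + gamma * L, S @ y)

unlabeled = np.setdiff1d(np.arange(len(X)), labeled)
rmse = np.sqrt(np.mean((f[unlabeled] - y[unlabeled]) ** 2))
print("RMSE on the unlabeled samples:", round(float(rmse), 3))
```

The unlabeled samples never contribute a fitting term, yet the graph-smoothness penalty spreads the few labels along the data manifold, which is the core of the semi-supervised mechanism.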

Article outline (remaining sections):
2.1. Mathematical Model
2.2. Angle Estimation Using SSL-FAE
2.3. Training SSL-FAE
3. Numerical Simulations and Results (3.1. Experiment 1, Cases 1–5; 3.2. Experiment 2; 3.3. Experiment 3; 3.4. Experiment 4)
4. Conclusions
Back matter: Author Contributions, Data Availability Statement, Acknowledgments, Conflicts of Interest, Abbreviations

Abbreviations
FLR: Forward-looking radar
SVR: Support vector regression
SNR: Signal-to-noise ratio
APD: Antenna pattern deconvolution
SVM: Support vector machines
ASP: Array signal processing
SSL-FAE: Semi-supervised learning framework for FLR angle estimation
MVDR: Minimum variance distortionless response
CS: Compressive sensing
RKHS: Reproducing kernel Hilbert space
RBF: Radial basis function
SCR: Signal-to-clutter ratio

Simulation parameters:
Parameter | Symbol | Value
frequency | - | 77 GHz
bandwidth | - | 2 GHz
pulse width | T | 2 s
pulse repetition frequency | PRF | 10 kHz
altitude | H | 100 m
velocity | v | 0–250 km/h

[Table: SVR vs. SSL-FAE angle-estimation results at platform velocities of 10, 40, 70, 100, 130, and 160 m/s (values not shown).]

Runtime comparison:
Method | Training time | Estimation time
MVDR | - | 0.0874 s
Bayesian | - | 0.0055 s
Doppler–MVDR | - | 0.4118 s
Doppler–Bayesian | - | 0.0073 s
SVR | 0.3399 s | 0.0016 s
SSL-FAE | 1.5508 s | 0.0028 s

Li, W.; Xu, X.; Xu, Y.; Luan, Y.; Tang, H.; Chen, L.; Zhang, F.; Liu, J.; Yu, J. Angle Estimation Using Learning-Based Doppler Deconvolution in Beamspace with Forward-Looking Radar. Remote Sens. 2024, 16, 2840. https://doi.org/10.3390/rs16152840


Statistics > Machine Learning

Title: Transient Anisotropic Kernel for Probabilistic Learning on Manifolds

Abstract: PLoM (Probabilistic Learning on Manifolds) is a method introduced in 2016 for handling small training datasets by projecting an Itô equation from a stochastic dissipative Hamiltonian dynamical system, acting as the MCMC generator, for which the KDE-estimated probability measure with the training dataset is the invariant measure. PLoM performs a projection on a reduced-order vector basis related to the training dataset, using the diffusion maps (DMAPS) basis constructed with a time-independent isotropic kernel. In this paper, we propose a new ISDE projection vector basis built from a transient anisotropic kernel, providing an alternative to the DMAPS basis to improve statistical surrogates for stochastic manifolds with heterogeneous data. The construction ensures that for times near the initial time, the DMAPS basis coincides with the transient basis. For larger times, the differences between the two bases are characterized by the angle of their spanned vector subspaces. The optimal instant yielding the optimal transient basis is determined using an estimation of mutual information from Information Theory, which is normalized by the entropy estimation to account for the effects of the number of realizations used in the estimations. Consequently, this new vector basis better represents statistical dependencies in the learned probability measure for any dimension. Three applications with varying levels of statistical complexity and data heterogeneity validate the proposed theory, showing that the transient anisotropic kernel improves the learned probability measure.
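The transient anisotropic kernel proposed in the paper is not reproduced here; for context, the sketch below constructs the standard DMAPS reduced-order basis that the abstract contrasts it with, using a time-independent isotropic Gaussian kernel. The toy dataset, the bandwidth, and the basis size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training dataset: N points in R^n (a noisy circle in R^3 as a stand-in).
theta = rng.uniform(0, 2 * np.pi, size=400)
X = np.stack([np.cos(theta), np.sin(theta), 0.05 * rng.normal(size=400)], axis=1)

# Time-independent isotropic Gaussian kernel on pairwise distances.
eps = 0.5                                            # kernel bandwidth (illustrative)
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared pairwise distances
K = np.exp(-D2 / (4.0 * eps))

# Normalize to a diffusion (Markov) operator and take its leading eigenvectors;
# these vectors form the reduced-order DMAPS basis used for the projection.
P = K / K.sum(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eig(P)
order = np.argsort(-eigvals.real)
m = 6                                                # size of the reduced basis
basis = eigvecs[:, order[:m]].real                   # N x m DMAPS vector basis

print("leading diffusion eigenvalues:", np.round(eigvals.real[order[:m]], 4))
print("reduced basis shape:", basis.shape)
```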
Comments: 44 pages, 14 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
MSC classes: 68Q32, 68T05, 62R30, 60J20
ACM classes: G.3




COMMENTS

  1. Manifold hypothesis

    The manifold hypothesis is related to the effectiveness of nonlinear dimensionality reduction techniques in machine learning. Many techniques of dimensional reduction make the assumption that data lies along a low-dimensional submanifold, such as manifold sculpting, manifold alignment, and manifold regularization .

  2. Manifold Hypothesis Definition

    The Manifold Hypothesis explains ( heuristically) why machine learning techniques are able to find useful features and produce accurate predictions from datasets that have a potentially large number of dimensions ( variables). The fact that the actual data set of interest actually lives on in a space of low dimension, means that a given machine ...

  3. 2.2. Manifold learning

    Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high. ...

  4. Manifold Learning: Introduction and Foundational Algorithms

    The manifold hypothesis. Chapter 1: Multidimensional Scaling. Classical, metric, and non-metric MDS algorithms. Example applications to quantitative psychology and social science. Chapter 2: ISOMAP. Geodesic distances and the isometric mapping algorithm. Implementation details and applications with facial images and coil-100 object images.

  5. Neural Networks, Manifolds, and Topology -- colah's blog

    The manifold hypothesis is that natural data forms lower-dimensional manifolds in its embedding space. ... New layers, specifically motivated by the manifold perspective of machine learning, may be useful supplements. (This is a developing research project. It's posted as an experiment in doing research openly.)

  6. [2311.03757] Manifold learning: what, how, and why

    Manifold learning (ML), known also as non-linear dimension reduction, is a set of methods to find the low dimensional structure of data. Dimension reduction for large, high dimensional data is not merely a way to reduce the data; the new representations and descriptors obtained by ML reveal the geometric shape of high dimensional point clouds, and allow one to visualize, de-noise and interpret ...

  7. Deep Generative Models through the Lens of the Manifold Hypothesis: A

    In recent years there has been increased interest in understanding the interplay between deep generative models (DGMs) and the manifold hypothesis. Research in this area focuses on understanding the reasons why commonly-used DGMs succeed or fail at learning distributions supported on unknown low-dimensional manifolds, as well as developing new models explicitly designed to account for manifold ...

  8. Manifold Learning: What, How, and Why

    Manifold learning (ML), also known as nonlinear dimension reduction, is a set of methods to find the low-dimensional structure of data. Dimension reduction for large, high-dimensional data is not merely a way to reduce the data; the new representations and descriptors obtained by ML reveal the geometric shape of high-dimensional point clouds and allow one to visualize, denoise, and interpret them.

  9. Manifold Learning

    In its early days, the primary applications of manifold learning were to reduce the representation dimension, to beat the curse of dimensionality in machine learning applications such as face recognition (cf. []). Another parallel stream of work has been to leverage these methods for understanding the structure of data by embedding into 2D or 3D maps.

  10. PDF Testing the manifold hypothesis

    TESTING THE MANIFOLD HYPOTHESIS. Charles Fefferman, Sanjoy Mitter, and Hariharan Narayanan. Contents: 1. Introduction; 1.1. Definitions; 1.2. Constants; 1.3. d-planes; ... to the underlying hypothesis as the "manifold hypothesis." Manifold learning, in particular, fitting low dimensional nonlinear manifolds to sampled data points in ...

  11. [2011.01307] The Mathematical Foundations of Manifold Learning

    Manifold learning is a popular and quickly-growing subfield of machine learning based on the assumption that one's observed data lie on a low-dimensional manifold embedded in a higher-dimensional space. This thesis presents a mathematical perspective on manifold learning, delving into the intersection of kernel learning, spectral graph theory, and differential geometry. Emphasis is placed on ...

  12. Manifold Hypothesis

    The Manifold Hypothesis explains (heuristically) why machine learning techniques are able to find useful features and produce accurate predictions from datasets that have a potentially large number of dimensions (variables). The fact that the data set of interest actually lives in a space of low dimension means that a given machine learning model only needs to learn to focus on a ...

  13. PDF Sample complexity of testing the manifold hypothesis

    2. We obtain a minimax lower bound on the sample complexity of any rule for learning a manifold from F in Theorem 6, showing that for a fixed error, the dependence of the sample complexity on intrinsic dimension, curvature, and volume must be at least exponential, polynomial, and linear, respectively.

  14. PDF The Manifold Hypothesis for Gradient-Based Explanations

    ...ents are being projected on the data manifold [14, 40]. In explainable machine learning, it has been shown that explanations can be manipulated by modifying the model outside of the image manifold, and that one can defend against such attacks by projecting the explanations back onto the manifold [13]. The hypothesis that natural image data ... (A sketch of this kind of tangent-space projection appears after this list.)

  15. Machine learning-accelerated computational fluid dynamics

    Significance. Accurate simulation of fluids is important for many science and engineering problems but is very computationally demanding. In contrast, machine-learning models can approximate physics very quickly but at the cost of accuracy. Here we show that using machine learning inside traditional fluid simulations can improve both accuracy ...

  16. [2208.11665] Statistical exploration of the Manifold Hypothesis

    The Manifold Hypothesis is a widely accepted tenet of Machine Learning which asserts that nominally high-dimensional data are in fact concentrated near a low-dimensional manifold, embedded in high-dimensional space. This phenomenon is observed empirically in many real world situations, has led to development of a wide range of statistical methods in the last few decades, and has been suggested ...

  17. Information geometry in optimization, machine learning and statistical

    The present article gives an introduction to information geometry and surveys its applications in the area of machine learning, optimization and statistical inference. Information geometry is explained intuitively by using divergence functions introduced in a manifold of probability distributions and other general manifolds. They give a Riemannian structure together with a pair of dual ...

  18. A Review of Physics-Informed Machine Learning in Fluid Mechanics

    Physics-informed machine-learning (PIML) enables the integration of domain knowledge with machine learning (ML) algorithms, which results in higher data efficiency and more stable predictions. This provides opportunities for augmenting—and even replacing—high-fidelity numerical simulations of complex turbulent flows, which are often expensive due to the requirement of high temporal and ...

  19. Manifold Learning in Regression Tasks

    The paper presents a new geometrically motivated method for non-linear regression based on Manifold learning technique. The regression problem is to construct a predictive function which estimates an unknown smooth mapping f from q-dimensional inputs to m-dimensional outputs based on a training data set consisting of given 'input-output' pairs.

  20. Efficient aerodynamic shape optimization by using unsupervised manifold

    Unsupervised non-linear manifold learning (Izenman, 2012; Fukami & Taira, 2023) is a novel approach for capturing geometric features. It treats the data as distributions on low-dimensional manifolds, and uses the non-Euclidean distance to capture the intrinsic attributes of features.

  21. Manifold Hypothesis in Data Analysis: Double Geometrically

    The manifold hypothesis states that data points in high-dimensional space actually lie in the close vicinity of a manifold of much lower dimension. In many cases this hypothesis has been empirically verified and used to enhance unsupervised and semi-supervised learning. Here we present a new approach to manifold hypothesis checking and underlying manifold dimension estimation. In order to do it we use two ... (A simple intrinsic-dimension estimator in this spirit is sketched after this list.)

  22. The Manifold Hypothesis for Gradient-Based Explanations

    Some form of adjustment to the model architecture or training algorithm is necessary, since we show that generalization of neural networks alone does not imply the alignment of model gradients with the data manifold. Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV) Cite as: arXiv:2206.07387 [cs.LG]

  23. DeepSense: test prioritization for neural network based on ...

    Deep learning systems have been used extensively in several fields in recent years, but deep neural network (DNN) can also make incorrect decisions and lead to significant losses. Testing tasks based on DNN systems often require annotation of test data to obtain oracle information. However, collecting and annotating large amounts of disparate data from application scenarios is very expensive ...

  24. Angle Estimation Using Learning-Based Doppler Deconvolution in ...

    The measurement of the target azimuth angle using forward-looking radar (FLR) is widely applied in unmanned systems, such as obstacle avoidance and tracking applications. This paper proposes a semi-supervised support vector regression (SVR) method to solve the problem of small sample learning of the target angle with FLR. This method utilizes function approximation to solve the problem of ...

  25. Transient anisotropic kernel for probabilistic learning on manifolds

    PLoM (Probabilistic Learning on Manifolds) is a method introduced in 2016 for handling small training datasets by projecting an Itô equation from a stochastic dissipative Hamiltonian dynamical system, acting as the MCMC generator, for which the KDE-estimated probability measure with the training dataset is the invariant measure. PLoM performs a projection on a reduced-order vector basis ...
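
Entries 3, 4, and 6 above describe manifold learning methods such as MDS and ISOMAP, as implemented in libraries like scikit-learn. As a minimal illustration only (not code from any of the cited works), the sketch below uses scikit-learn's standard Isomap estimator and swiss-roll generator to recover the two intrinsic coordinates of a surface embedded in three ambient dimensions; the neighbor count of 10 is an arbitrary illustrative choice.

    # Minimal sketch: recovering a 2-D parameterization of a 3-D "swiss roll"
    # with Isomap (geodesic-distance-based manifold learning). Illustrative only.
    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import Isomap

    # 2000 points sampled (with noise) from a 2-D manifold embedded in R^3
    X, color = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)

    # Embed into 2 dimensions using geodesic distances estimated on a
    # k-nearest-neighbor graph (k = 10 here is an arbitrary choice).
    embedding = Isomap(n_neighbors=10, n_components=2)
    X_2d = embedding.fit_transform(X)

    print(X.shape, "->", X_2d.shape)   # (2000, 3) -> (2000, 2)

If the manifold hypothesis holds for a dataset, low-dimensional coordinates obtained this way preserve most of the structure needed for downstream learning.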
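
Entries 14 and 22 discuss whether model gradients align with the data manifold and mention projecting explanations back onto it. The cited papers do not prescribe the code below; it is only a sketch, under the assumption that the tangent space at a data point can be approximated by local PCA over its nearest neighbors, of how a gradient vector could then be orthogonally projected onto that estimated tangent space.

    # Sketch: project a gradient onto an estimated tangent space of the data
    # manifold at a point x. Assumption (not from the cited papers): the tangent
    # space is spanned by the top principal directions of x's neighborhood.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def project_onto_tangent_space(grad, X, x, n_neighbors=20, intrinsic_dim=2):
        # find the neighborhood of x within the dataset X
        nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X)
        _, idx = nn.kneighbors(x.reshape(1, -1))
        neighborhood = X[idx[0]]
        # local PCA: leading right-singular vectors span the tangent estimate
        centered = neighborhood - neighborhood.mean(axis=0)
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        T = Vt[:intrinsic_dim]              # (intrinsic_dim, ambient_dim) basis
        # orthogonal projection of the gradient onto span(T)
        return T.T @ (T @ grad)

    # toy usage: data near a 2-D plane inside R^10, arbitrary "gradient" vector
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 10))
    grad = rng.normal(size=10)
    g_on_manifold = project_onto_tangent_space(grad, X, X[0])
    print(np.linalg.norm(g_on_manifold) <= np.linalg.norm(grad))  # True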
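
Entries 16 and 21 concern statistically testing the manifold hypothesis and estimating the intrinsic dimension of data; their procedures are considerably more involved than what follows. As a hedged illustration only, the sketch below implements a simple two-nearest-neighbor maximum-likelihood estimate of intrinsic dimension: under the manifold hypothesis, the ratio of each point's second- to first-nearest-neighbor distance reflects the dimension of the underlying manifold rather than the ambient dimension.

    # Minimal sketch of a two-nearest-neighbor intrinsic-dimension estimate.
    # If data are sampled from a d-dimensional manifold, the ratio
    # mu_i = r2_i / r1_i of second- to first-nearest-neighbor distances follows
    # (approximately) a Pareto law with exponent d, giving the maximum-likelihood
    # estimate d_hat = N / sum(log mu_i). Illustrative only.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def twonn_dimension(X):
        # distances to self, 1st, and 2nd nearest neighbors
        dist, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
        r1, r2 = dist[:, 1], dist[:, 2]
        mu = r2 / r1
        return len(mu) / np.sum(np.log(mu))

    # 5000 points with intrinsic dimension 2, ambient dimension 50
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(5000, 2))          # intrinsic coordinates
    A = rng.normal(size=(2, 50))            # random linear embedding into R^50
    print(round(twonn_dimension(Z @ A), 2)) # close to 2, not 50

An estimate far below the ambient dimension is consistent with the data concentrating near a low-dimensional manifold.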