t-SNE Intuition



Like PCA, t-SNE, or t-distributed Stochastic Neighborhood Embedding, is a visualization and dimensionality reduction algorithm. However, unlike PCA, t-SNE is a highly advanced State of the Art (SOTA) algorithm which beautifully summarizes high dimensional data into interpretable visualizations, a feat PCA can barely achieve especially on clustered data.

Courtesy: distill.pub


t-SNE is a relatively new algorithm formulated in 2008, and thus, uses very advanced mathematics involving kernels and statistical distributions. To be able to understand the mathematical intuition behind t-SNE, a very clear concept of advanced mathematics is mandatory. Thus, within the scope of this article, the mathematics behind the t-SNE architecture will be avoided and will be dealt with in another post.

PCA tries to preserve the global structure of data points (Refer to this article: ) PCA’s main objective does not lie with preserving the relative distance between points but with the overall variance along axes.

On the other hand, t-SNE aims to preserve local structure. This implies that it cares more about the relative distance between points rather than the overall variance.

t-SNE works by creating embeddings of points on a higher dimension to a lower dimension (2 dimensions for visualization purposes). Since its prime objective is preserving local structure, it effectively takes care of neighborhoods but fails to preserve distances between different neighborhoods. Also, the overall structure/size of the neighborhood cannot be preserved by t-SNE.
A prime proof against validity of the t-SNE concept is the Crowding-Problem. This problem arises when points are distributed in the corners of a square, cube or hyper cube. Considering a square, where two opposite points are different neighborhoods, and two simultaneous corners are in the same neighborhood, we aim to reduce the dimensionality from 2-D to 1-D.


To execute this, we place three points successfully. However, when we try to place the 4th point (X4), the distance between the 4th points and its two neighbors (X2, X3) cannot be preserved since these neighbors are shared by the opposing corner point (X1) and thus, are placed on either side of X1.
Thus, it proves that sometimes it is impossible to preserve distances in neighborhoods when the data points are placed as above.

The design of Stochastic Neighborhood Embedding gave rise to the Crowding problem, and t-Distribution was used to resolve this very problem. It does not guarantee complete resolution of the problem but tries to achieve the best possible result.

t-SNE is an iterative algorithm, which means it keeps adjusting the embeddings until a stable structure is achieved in the required dimension.

Two most important parameters of t-SNE are:


i)                    Perplexity: It is roughly the number of neighbors which are required to be preserved while altering dimensions.

·         Low perplexity: Results are ambiguous and not useful.
·         Increases gradually: Results improve and a certain stability is achieved in the embedding structure.
·         High Perplexity (greater than # of samples): messy plot, cannot provide insights

ii)                   Steps: Number of iterations for running t-SNE algorithm.

Both these parameters need to be adjusted several times to get a consistent result. Refer to this live example of t-SNE to understand the functioning of these parameters visually.

t-SNE is a probabilistic algorithm which means that every time it is ran, it will give tend to give a slightly different result.

t-SNE’s incompatibility with cluster analysis:


v  Cluster size in a t-SNE representation does not mean much
v  A sparse and dense cluster in the original dimension might end up with similar sized cluster representations in the lower dimension. This is because it is a property of the t-SNE algorithm to contract sparse points and expand dense points.
v  Thus, looking at the t-SNE representation of clusters one cannot conclude anything about the cluster size or even the distance between the clusters (t-SNE does not preserve distance between different neighborhoods)

Summarizing:

  • Never conclude after running t-SNE only once with a fixed perplexity or step parameter.
  • The perplexity parameter must always be between 2 and the number of samples/points.
  •  Expect varying, but consistent in characteristic, results every time the algorithm runs due to its probabilistic nature.
  • The algorithm groups points based on visual similarity when it comes to image processing.
  • t-SNE cannot be used to interpret cluster size and intra-cluster distances.


Comments

Brands Worked with or Featured On

Brands Worked with or Featured On

Popular Posts