A Nonlinear Mapping for Data Structure AnalysisA Nonlinear Mapping for Data Structure AnalysisSammon, J. W.1969
Paper summaryjoecohenThis paper presents what is known as `Sammon's mapping`. This method produces points in any $\mathbb{R}^n$ space using only a distance function between points. You can define any distance function $d^*$ that represents relationships between points. This function can even be non-symmetric. The power is that any relationship encoded into a distance function or distance matrix can be visualized.
For mapping $n$ points from some dimension in another the algorithm starts by generating $n$ random points in the space (called d-space) that you would like to map the points to. You can just pick these at random because they will be moved later.
The algorithm then performs gradient decent to minimize the *Sammon's stress* which can also be called the objective function.
$$
\text{Sammon's stress} = \frac{1}{
\sum\limits_{i<j} d^{*}_{ij}}
\sum_{i<j}
\frac{ ( d^{*}_{ij}-d_{ij})^2}
{d^{*}_{ij}}
$$
To minimize this objective function a partial derivative is taken with respect to each dimension of each point in d-space. For each dimension $y$ the distance between points $p$ and $q$ are modified using a scaled partial derivative and a learning rate. The paper calls this a "magic factor" MF but it is referred to today as a learning rate $\lambda$.
$$y_{pq}' = y_{pq}-\lambda \Delta_{pq}$$
The partial is scaled by the second derivative:
$$\Delta_{pq}=\left.\frac{\partial E}{\partial y_{pq}} \middle/ \frac{\partial^2 E}{\partial y_{pq}^2}\right.$$
Using the second derivative might be overkill for this. The objective function should also be minimized using only the first derivative. Possibly using new update rules for stochastic optimization like \cite{conf/colt/DuchiHS10} or \cite{journals/corr/KingmaB14} may be more efficient.
This paper presents what is known as `Sammon's mapping`. This method produces points in any $\mathbb{R}^n$ space using only a distance function between points. You can define any distance function $d^*$ that represents relationships between points. This function can even be non-symmetric. The power is that any relationship encoded into a distance function or distance matrix can be visualized.
For mapping $n$ points from some dimension in another the algorithm starts by generating $n$ random points in the space (called d-space) that you would like to map the points to. You can just pick these at random because they will be moved later.
The algorithm then performs gradient decent to minimize the *Sammon's stress* which can also be called the objective function.
$$
\text{Sammon's stress} = \frac{1}{
\sum\limits_{i<j} d^{*}_{ij}}
\sum_{i<j}
\frac{ ( d^{*}_{ij}-d_{ij})^2}
{d^{*}_{ij}}
$$
To minimize this objective function a partial derivative is taken with respect to each dimension of each point in d-space. For each dimension $y$ the distance between points $p$ and $q$ are modified using a scaled partial derivative and a learning rate. The paper calls this a "magic factor" MF but it is referred to today as a learning rate $\lambda$.
$$y_{pq}' = y_{pq}-\lambda \Delta_{pq}$$
The partial is scaled by the second derivative:
$$\Delta_{pq}=\left.\frac{\partial E}{\partial y_{pq}} \middle/ \frac{\partial^2 E}{\partial y_{pq}^2}\right.$$
Using the second derivative might be overkill for this. The objective function should also be minimized using only the first derivative. Possibly using new update rules for stochastic optimization like \cite{conf/colt/DuchiHS10} or \cite{journals/corr/KingmaB14} may be more efficient.