*Ordering points to identify the clustering structure* (OPTICS) is a data clustering algorithm presented by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander in 1999 (Kriegel et al., 2011). It builds on the ideas of DBSCAN (Ester et al., 1996), which I described in a previous post. But let’s cut the intro and dive right into it.

# Idea

Let $(X,d)$ be a finite discrete metric space, i.e. let $X$ be a data set on which we have a notion of similarity expressed in terms of a suitable distance function $d$. Given a positive integer $m \in \mathbb{N}$ we can define a notion of *density* on points of our data set, which in turn can be used to define a *perturbed metric*. Given a starting point $x_0 \in X$ the OPTICS-algorithm iteratively defines a linear ordering on $X$ in terms of a filtration $X_0 \subset X_1 \subset \ldots \subset X_n$ of $X$, where $X_{n+1}$ is obtained from $X_n$ by appending the closest element (with respect to the perturbed metric) in its complement.

The original OPTICS-algorithm also takes an additional parameter $\varepsilon > 0$. However this is only needed to reduce the time complexity and is ignored in our discussion for now – to be more precise, we implicitly set $\varepsilon = \infty$.

# Definitions

Let $(X,d)$ be a finite discrete metric space and $m \in \mathbb{N}$ a positive integer. We define the **(co-)density $\delta_m(x)$ of a point $x$** by

where $\mathfrak{nn}_m(x) $ is an $m$-nearest-neighbor of $x$. Loosely speaking: the lower the value $\delta_m(x)$ the closer the neighbors of $x$ are distributed around $x$. Note that with $\varepsilon = \delta_m(x)$ the point $x$ is a *core point* in the sense of (Ester et al., 1996) – in a previous post about DBSCAN we called such a point *$\varepsilon$-dense*. That is why in the literature $\delta_m(x)$ is referred to as **core-distance**. I however like to think of it, and hence refer to it, as a notion of co-density.
Given the co-density $\delta_m(.)$ we can define the **reachability distance $r_x(y)$ of $y$ from $x$** by

Note that $r_x(y)$ is not symmetric, since the density of $x$ and $y$ may differ.

# Sketch of the algorithm

Choose a starting point $x_0 \in X$. Then we can iteratively define a filtration $X_0 \subset \ldots \subset X_n$ of the data set $X$ by

\[X_0 := \{ x_0 \} \ \ \text{and} \ \ X_{k+1} := X_k \cup \{ x_{k+1} \},\]where $x_{k+1}$ minimizes $r_{X_k}(.)$ over $X \setminus X_k$. In parallel we define a sequence $(r_n)$ of distances by

\[r_0 = 0 \ \ \text{and} \ \ r_{k+1} := r_{X_k}(x_{k+1}).\]Note that a small distance $r_k$ may be understood as $x_k$ being close to a rather dense region. Therefore the filtration tries to stay in dense regions for as long as possible, before it passes a less dense region. The cluster structure can now be extracted by analyzing the *reachability-plot*, that is a $2$-dimensional plot, with the ordered $x_k$ on the $x$-axis and the associated distances $r_k$ on the $y$-axis. By the above considerations it should be clear that clusters show up as valleys in the reachability plot. The deeper the valley, the denser the cluster.

# Example

Consider the data points $X\subset \mathbb{R}^2$ in the euclidian plane given in the following figure:

Below you see the reachability plot corresponding the OPTICS-algorithm applied to the above data set with $m=4$. The starting point is chosen among the group of points on the left.

# References

- Kriegel, H.-P., Kröger, P., Sander, J., & Zimek, A. (2011). Density-based clustering.
*WIREs Data Mining and Knowledge Discovery*,*1*(3), 231–240. https://doi.org/https://doi.org/10.1002/widm.30 - Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.
*Proceedings of the Second International Conference on Knowledge Discovery and Data Mining*, 226–231.