Back to Overview

Mapper - A discrete generalization of the Reeb graph

— 8 December 2014

Mapper is a construction that uses a given cluster-algorithm to associate a simplicial complex to a reference map on a given data set. It was introduced by Carlsson–Mémoli–Singh in (Singh et al., 2007) and lies at the core of the of the topological data analysis startup Ayasdi. A good reference, I personally enjoyed reading, is (Carlsson, 2009). Mapper is closely related to the Reeb graph of a real-valued function on a topological space. Just as the Reeb graph it is able to capture certain topological and geometrical features of the underlying space.

The nerve of a cover

Let $X$ be a topological space. We associate to each open cover \(\mathcal{U}=\{U_i\}_I\) a simplicial complex \(\check N(\mathcal{U})\) called the nerve of $\mathcal{U}$ defined as follows: there is a vertex $i$ for each $U_i$, and a $k$-simplex spanned by $i_0,…,i_k$ whenever the intersection $U_{i_0} \cap \ldots \cap U_{i_k}$ of the corresponding sets is nonempty.

Reference maps, filters, and lenses

Let $f:X \to Y$ be a continuous map between topological spaces $X$ and $Y$. Any open cover \(\mathcal{V}\) of $Y$ induces an open cover \(f^*\mathcal{V}\) of $X$ obtained by the pullback under $f$, i.e.

\[f^*\mathcal{V} := \big\{ f^{-1}(V) \mid {V \in \mathcal{V}} \big\}.\]

For example set $Y = \mathbb{R}$ covered by equally sized overlapping intervals, and think of $f$ as a height function on $X$. In the context of the Mapper construction these maps are called filters or lenses.

Cluster functions

For a given space $X$ let \(\mathcal{P}(X)\) denote its power set. Then we call an assignment

\[X \leadsto \pi(X)\subset \mathcal{P}(X),\]

that associates to a space $X$ a family $\pi(X)$ of subsets of $X$, a cluster function or cluster algorithm – note that this is not really a function in the mathematical sense, but rather in the sense of computer science and programming. We refer to an element in $\pi(X)$ as a cluster. We don’t require $\pi(X)$ to satisfy any properties at that point – it obviously would make sense to require $X = \dot\bigcup_{\pi(X)} U$ in the cluster context. From a programming perspective think of it as the signature of a function implementing a certain cluster algorithm.

Given a family of (open) sets \(\mathcal{U}=\{ U_i \}_I\) we define the associated family \(\pi_*(\mathcal{U})\) of cluster sets by applying $\pi$ to each set $U_i$ and collecting the results, i.e. we define

\[\pi_*(\mathcal{U}) := \bigcup_{i \in I} \pi(U_i).\]

We now have everything in place to define the Mapper-construction.

Put it all together

Let $X$ be a topological space (the data set), $\pi$ be a cluster algorithm, $f:X \to Y$ a reference map to a topological space $Y$, and $\mathcal{V}$ an open cover of $Y$. Then the result of Mapper applied to this triple is the simplicial complex $\mathcal{M}(\pi, f, \mathcal{V})$ defined by

\[\mathcal{M}(\pi, f, \mathcal{V}) := \big(\check{N} \circ \pi_* \circ f^* \big) (\mathcal{V}) = \check{N} \big( \pi_*(f^*\mathcal{V}) \big).\]

References

  1. Singh, G., Memoli, F., & Carlsson, G. (2007). Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. In M. Botsch, R. Pajarola, B. Chen, & M. Zwicker (Eds.), Eurographics Symposium on Point-Based Graphics. The Eurographics Association. https://doi.org/10.2312/SPBG/SPBG07/091-100
  2. Carlsson, G. (2009). Topology and Data. Bull. Amer. Math. Soc., 46, 255–308.