A gentle introduction to Graph based SLAM

This blog post introduces Simultaneous Localization and Mapping from first principles. It covers necessary topics such as Unimodal Gaussian distributions, Factor Graphs, Matrix Factorization techniques, and the Bayes Tree


The problem of Simultaneous Localization and Mapping (SLAM) has a rich history in robotics dating back to 1986 . Briefly, it is the problem of localizing – locating – the robot in an unknown map of the environment, a fundamental problem for any mobile robot. Since the location of the robot is specified against an unknown map, the localization task takes the form of estimating a piece-wise linear trajectory for the robot, as it moves around in the environment from its arbitrary origin. The map may be represented as a set of distinct landmarks (called landmark-based SLAM) or as a dense representation of the environment such as a voxel map (called dense SLAM).

Most recent literature (post 2000s) partition the problem into the frontend and backend.

When all the variables to be estimated are Special Euclidean \(\text{SE}(n)\) poses, and the sensors generate relative pose measurements, the optimization problem is called Pose Graph Optimization (PGO) or Synchronization over the Special Euclidean Group . If all the variables are Special orthogonal \(\text{SO}(n)\) rotations, then the problem is also called rotation averaging .

This article addresses the backend optimization and motivates the problem formulation from first principles.


Filtering vs Smoothing

Most methods prior to 2000s were predominantly filtering methods. In fact, most courses [1] [2] that deal with state estimation also start the course with filtering techniques.

When a robot moves around in an environment, it’s state space (the variables it estimates) grows, i.e., for every time-step, a new pose variable needs to be estimated. Loosely speaking, filtering methods only update the variables at the current time step, while smoothing methods may update all the variables, past and current. Chapter 3 and 4 of Probabilistic Robotics is a good treatment of Filtering methods relevant to the problem of SLAM.

Here, I address Smoothing techniques.

Gaussian Random Variables

A random variable is an outcome of a random event, and has an associated probability distribution. In our context, the act of measurement is the random event, which produces an output that is a random variable. A classic instance that is cited in many books is the example of an odometry measurement. Perhaps due to wheel slip or inconsistent hall-effect, or some other physical property, the sensor measurement may be noisy (random). In most cases, we expect the sensor to reproduce measurements faithful to its underlying state, but many a time, it may not. This property may be mathematically modeled by a Gaussian random variable.

\[\begin{align} \mathbf{z} &= h(\mathbf{x}) + \nu \\ \nu &\sim \mathcal{N}(0, \Sigma) \end{align}\]

Here, equation (1) means that the state \(\mathbf{x}\) is transformed by a measurement function \(h\) which is corrupted by inherent additive noise \(\nu\) in the sensor to produce a measurement \(\mathbf{z}\). We model this noise \(\nu\) as a random variable sampled from a Gaussian probability distribution. If \(h(\mathbf{x})\) were the true underlying measurement, then if we sampled from the sensor, we get noise added to the measurement and probability distribution of the noise as follows

\[\begin{align} \nu &\sim \mathbf{z} - h(\mathbf{x}) \\ p(\nu) &= \frac{1}{\sqrt{2 \pi \Sigma}} \exp( \| \nu \|^2_{\Sigma} ) \end{align}\]
Figure 1: The Normal distribution or the Gaussian probability distribution illustrated with different means ($ \mu $) and variances ($ \sigma^2 $). Image taken ingloriously from Wikipedia.

The choice of the probability distribution is rather curious. The Gaussian distribution is part of the exponential family of distributions, and has some convenient algebraic properties:

  1. It is fully described by its sufficient statistic. For instance, even though the Gaussian distribution has probability mass almost everywhere in its domain (infinite support), the distribution is fully described by its mean and covariance.
  2. The product of Gaussian distributions results in another Gaussian distribution i.e., the conjugate prior of a Gaussian distribution is another Gaussian.

Conditional Independence of Random variables

In Probability theory, two random variables \(\mathbf{x}\) and \(\mathbf{y}\) are said to be independent, if their joint distribution equals the product of their probabilities. Intuitively, it means that if the value of the random variable \(\mathbf{x}\) is observed, then the probability of \(\mathbf{y}\) is unaffected:

\[\begin{align} p(\mathbf{x}, \mathbf{y}) = p(\mathbf{x}) p(\mathbf{y}) \end{align}\]

Similarly, if we consider three random variables \(\mathbf{x}\), \(\mathbf{y}\) and \(\mathbf{z}\), then they are said to be conditionally independent, if \(\begin{align} p(\mathbf{x}, \mathbf{y} | \mathbf{z}) = p(\mathbf{x} | \mathbf{z}) p(\mathbf{y} | \mathbf{z}) \end{align}\)

Conditional independence in the context of SLAM is hopefully illustrated with the following example.

Example 1:

Consider a robot in a one-dimensional world as in Figure 2 (a). At time \(t_0\), if we know the location \(\mathbf{x}_0\) and \(\mathbf{x}_1\), then the robot samples from the odometry sensor a measurement \(\mathbf{z}_0\) of (say 50.3 m) as follows

\[\begin{align} \mathbf{z}_0 \sim h_{\text{odometry}}(\mathbf{x}_0, \mathbf{x}_1) + \nu_{\text{odometry}} \end{align}\]

Similarly, at time \(t=1\), if we know that the robot is at 50 meters from origin, and then it samples from the range sensor two measurements \(\mathbf{z}_1\) and \(\mathbf{z}_2\) as follows:

\[\begin{align} \mathbf{z}_1 \sim h_{\text{range}}(\mathbf{x}_1, \mathbf{l}_0) + \nu_{\text{range}}\\ \mathbf{z}_2 \sim h_{\text{range}}(\mathbf{x}_1, \mathbf{l}_1) + \nu_{\text{range}} \end{align}\]
Figure 2: Illustration of a 1-D robot that observes two landmarks after moving forward by 50 meters at timestep 1.

Figure 3(a) is a graphical representation of the above robot, where the state variables are denoted with large circles annotated with variable names, and measurements are denoted with edges with a filled dot.

Next, consider the selected portion – in dotted lines – of Figure 3(a) for simplicity. Here we see that, to sample measurements $\mathbf{z}_0$ and $\mathbf{z}_1$, we need to sample from the joint probability distribution i.e., \(p(\mathbf{z}_0, \mathbf{z}_1, \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0)\).

\[p(\mathbf{z}_0, \mathbf{z}_1, \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) = p(\mathbf{z}_0, \mathbf{z}_1 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) p(\mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) \\\]

Now consider the conditional distribution over the two measurements \(\mathbf{z}_0, \mathbf{z}_1\). From Equation (6) and (7) we know \(\mathbf{z}_0\) only depends on \(\mathbf{x}_0\) and \(\mathbf{x_1}\), and from Equation (8) a similar argument can be made for \(\mathbf{z_1}\). Therefore, we can safely assume the following:

\[\begin{align*} p(\mathbf{z}_0, \mathbf{z}_1 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) &= p(\mathbf{z}_0 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) p(\mathbf{z}_1 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0)~~~~\text{conditionally independent}\\ p(\mathbf{z}_0, \mathbf{z}_1 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) &= p(\mathbf{z}_0 | \mathbf{x}_0, \mathbf{x}_1) p(\mathbf{z}_1 | \mathbf{x}_1, \mathbf{l}_0)~~~~~~~~~~~~~~~\text{conditioned on redundant variables} \end{align*}\]

The above assumption of conditional independence between measurements given state variables is key in reducing the complexity of estimation.

Why Factor Graphs?

Factor graphs are bipartite graphs – graphs containing two types of random variables – that describe the factorization of a probability distribution. Factorizing a joint probability distribution amounts to finding the subsets of random variables that are conditionally independent of each other and therefore be written as a product.

For example 1, Figure 3(a) concisely represented the factorization of the joint probability distribution between all the random variables.

Figure 3: (a) Factor Graph representation for Example 1. (b) Equivalent factor graph illustrating the bipartite nature of the graph