This blog post introduces Simultaneous Localization and Mapping from first principles. It covers necessary topics such as Unimodal Gaussian distributions, Factor Graphs, Matrix Factorization techniques, and the Bayes Tree.
The problem of Simultaneous Localization and Mapping (SLAM) has a rich history in robotics dating back to 1986.
Most recent literature (post 2000s) partitions the problem into a frontend and a backend.
When all the variables to be estimated are Special Euclidean \(\text{SE}(n)\) poses, and the sensors generate relative pose measurements, the optimization problem is called Pose Graph Optimization (PGO).
This article addresses the backend optimization and motivates the problem formulation from first principles.
Most methods prior to the 2000s were predominantly filtering methods. In fact, most courses [1] [2] that deal with state estimation also begin with filtering techniques.
When a robot moves around in an environment, its state space (the variables it estimates) grows, i.e., for every time-step, a new pose variable needs to be estimated. Loosely speaking, filtering methods only update the variables at the current time step, while smoothing methods may update all the variables, past and current. Chapters 3 and 4 of Probabilistic Robotics cover these filtering techniques in detail.
Here, I address Smoothing techniques.
A random variable is an outcome of a random event, and has an associated probability distribution. In our context, the act of measurement is the random event, which produces an output that is a random variable. A classic instance that is cited in many books is the additive-noise measurement model: \[\begin{align} \mathbf{z} = h(\mathbf{x}) + \nu \end{align}\]
Here, equation (1) means that the state \(\mathbf{x}\) is transformed by a measurement function \(h\) and corrupted by the sensor's inherent additive noise \(\nu\) to produce a measurement \(\mathbf{z}\). We model this noise \(\nu\) as a random variable drawn from a Gaussian probability distribution. If \(h(\mathbf{x})\) is the true underlying measurement, then sampling from the sensor yields that measurement plus noise, where the noise and its probability distribution are as follows:
\[\begin{align} \nu &= \mathbf{z} - h(\mathbf{x}) \\ p(\nu) &= \frac{1}{\sqrt{|2 \pi \Sigma|}} \exp\left( -\frac{1}{2} \| \nu \|^2_{\Sigma} \right) \end{align}\]Here \(\| \nu \|^2_{\Sigma} = \nu^{\top} \Sigma^{-1} \nu\) is the squared Mahalanobis distance. The choice of the probability distribution is rather curious. The Gaussian distribution is part of the exponential family of distributions, and has some convenient algebraic properties:
In probability theory, two random variables \(\mathbf{x}\) and \(\mathbf{y}\) are said to be independent if their joint distribution equals the product of their marginal distributions. Intuitively, it means that observing the value of the random variable \(\mathbf{x}\) does not change the probability of \(\mathbf{y}\):
\[\begin{align} p(\mathbf{x}, \mathbf{y}) = p(\mathbf{x}) p(\mathbf{y}) \end{align}\]Similarly, if we consider three random variables \(\mathbf{x}\), \(\mathbf{y}\) and \(\mathbf{z}\), then \(\mathbf{x}\) and \(\mathbf{y}\) are said to be conditionally independent given \(\mathbf{z}\) if \[\begin{align} p(\mathbf{x}, \mathbf{y} | \mathbf{z}) = p(\mathbf{x} | \mathbf{z}) p(\mathbf{y} | \mathbf{z}) \end{align}\]
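To make the definition concrete, here is a small numeric check in Python (a toy discrete distribution made up for illustration; the numbers are not from the SLAM setting): two noisy binary readings \(\mathbf{x}\) and \(\mathbf{y}\) of the same hidden bit \(\mathbf{z}\) are conditionally independent given \(\mathbf{z}\), yet dependent when \(\mathbf{z}\) is unknown.

```python
# Toy (assumed) distribution: p(x, y, z) = p(z) p(x|z) p(y|z),
# where x and y are two noisy binary readings of the hidden bit z.
import numpy as np

p_z = np.array([0.5, 0.5])                      # p(z)
p_x_given_z = np.array([[0.9, 0.1],             # rows index z, columns index x
                        [0.2, 0.8]])
p_y_given_z = np.array([[0.7, 0.3],             # rows index z, columns index y
                        [0.4, 0.6]])

# Joint p(x, y, z), indexed as [x, y, z].
joint = np.einsum('z,zx,zy->xyz', p_z, p_x_given_z, p_y_given_z)

# Conditioned on z, the joint factorizes: p(x, y | z) == p(x | z) p(y | z).
p_xy_given_z = joint / joint.sum(axis=(0, 1), keepdims=True)
factored = np.einsum('zx,zy->xyz', p_x_given_z, p_y_given_z)
print(np.allclose(p_xy_given_z, factored))      # True

# Without conditioning, x and y are NOT independent: p(x, y) != p(x) p(y).
p_xy = joint.sum(axis=2)
print(np.allclose(p_xy, np.outer(p_xy.sum(axis=1), p_xy.sum(axis=0))))  # False
```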
Conditional independence in the context of SLAM is hopefully illustrated with the following example.
Consider a robot in a one-dimensional world, as in Figure 2(a). At time \(t_0\), if we know the locations \(\mathbf{x}_0\) and \(\mathbf{x}_1\), then the robot samples a measurement \(\mathbf{z}_0\) (say, 50.3 m) from the odometry sensor as follows:
\[\begin{align} \mathbf{z}_0 \sim h_{\text{odometry}}(\mathbf{x}_0, \mathbf{x}_1) + \nu_{\text{odometry}} \end{align}\]Similarly, at time \(t_1\), if we know that the robot is 50 meters from the origin, then it samples two measurements \(\mathbf{z}_1\) and \(\mathbf{z}_2\) from the range sensor as follows:
\[\begin{align} \mathbf{z}_1 \sim h_{\text{range}}(\mathbf{x}_1, \mathbf{l}_0) + \nu_{\text{range}}\\ \mathbf{z}_2 \sim h_{\text{range}}(\mathbf{x}_1, \mathbf{l}_1) + \nu_{\text{range}} \end{align}\]Figure 3(a) is a graphical representation of the above setup, where state variables are denoted by large circles annotated with variable names, and measurements are denoted by edges with a filled dot.
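To make these generative models concrete, here is a minimal simulation sketch of the one-dimensional example. The specific pose, landmark, and noise values below are assumptions for illustration, not values taken from the figure.

```python
# Rough simulation of the 1-D example: two robot positions, two landmarks,
# one odometry measurement and two range measurements (values assumed).
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth state, in meters.
x0, x1 = 0.0, 50.0        # robot positions at t0 and t1
l0, l1 = 60.0, 85.0       # landmark positions

# Measurement functions: odometry is relative displacement,
# range is the distance from the robot to a landmark.
h_odometry = lambda xa, xb: xb - xa
h_range = lambda x, l: abs(l - x)

sigma_odom, sigma_range = 0.5, 0.2   # assumed noise standard deviations

# Sampling a measurement = evaluating h(.) and adding Gaussian noise,
# exactly the generative models written above.
z0 = h_odometry(x0, x1) + rng.normal(0.0, sigma_odom)   # close to 50 m
z1 = h_range(x1, l0) + rng.normal(0.0, sigma_range)     # close to 10 m
z2 = h_range(x1, l1) + rng.normal(0.0, sigma_range)     # close to 35 m
print(z0, z1, z2)
```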
Next, consider the selected portion (in dotted lines) of Figure 3(a) for simplicity. Here we see that, to sample measurements \(\mathbf{z}_0\) and \(\mathbf{z}_1\), we need to sample from the joint probability distribution, i.e., \(p(\mathbf{z}_0, \mathbf{z}_1, \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0)\).
\[p(\mathbf{z}_0, \mathbf{z}_1, \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) = p(\mathbf{z}_0, \mathbf{z}_1 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) \, p(\mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0)\]Now consider the conditional distribution over the two measurements \(\mathbf{z}_0, \mathbf{z}_1\). From the odometry model we know that \(\mathbf{z}_0\) depends only on \(\mathbf{x}_0\) and \(\mathbf{x}_1\), and from the range model a similar argument can be made for \(\mathbf{z}_1\). Therefore, we can safely assume the following:
\[\begin{align*} p(\mathbf{z}_0, \mathbf{z}_1 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) &= p(\mathbf{z}_0 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) p(\mathbf{z}_1 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0)~~~~\text{conditionally independent}\\ p(\mathbf{z}_0, \mathbf{z}_1 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) &= p(\mathbf{z}_0 | \mathbf{x}_0, \mathbf{x}_1) p(\mathbf{z}_1 | \mathbf{x}_1, \mathbf{l}_0)~~~~~~~~~~~~~~~\text{drop redundant conditioning variables} \end{align*}\]The above assumption of conditional independence between measurements given the state variables is key in reducing the complexity of estimation.
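One way to see why this matters computationally is to take the negative log of the factored likelihood: it becomes a sum of small per-measurement costs, each involving only the handful of variables that measurement touches. Below is a rough sketch, continuing the hypothetical one-dimensional example and its assumed noise values; this sum-of-squared-residuals form is exactly what the backend later minimizes as a nonlinear least-squares problem.

```python
# Negative log of the factored measurement likelihood for the selected portion:
# -log p(z0, z1 | x0, x1, l0) = -log p(z0 | x0, x1) - log p(z1 | x1, l0),
# up to constants. Each term touches only two variables.

def gaussian_nll(residual, sigma):
    # Negative log of a 1-D Gaussian density, dropping the constant term.
    return 0.5 * (residual / sigma) ** 2

def total_cost(x0, x1, l0, z0, z1, sigma_odom=0.5, sigma_range=0.2):
    c_odom = gaussian_nll(z0 - (x1 - x0), sigma_odom)        # touches x0, x1
    c_range = gaussian_nll(z1 - abs(l0 - x1), sigma_range)   # touches x1, l0
    return c_odom + c_range

print(total_cost(0.0, 50.0, 60.0, z0=50.3, z1=10.1))
```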
Factor graphs are bipartite graphs (graphs containing two types of nodes: variable nodes and factor nodes) that describe the factorization of a probability distribution. Factorizing a joint probability distribution amounts to finding subsets of random variables that are conditionally independent of each other, so that the joint can be written as a product of factors.
For the example above, Figure 3(a) concisely represents the factorization of the joint probability distribution over all the random variables.
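To ground the terminology, here is a bare-bones sketch of such a bipartite structure in code: variables are plain keys, and each factor records which variables it connects and how to compute its cost. The class and function names are mine, and this is purely pedagogical; real factor graph libraries such as GTSAM are far more capable.

```python
# Minimal factor graph: a list of factors, each connected to a subset of
# variables. Evaluating the total cost sums per-factor costs, mirroring the
# factorized (negative log) probability above.
from dataclasses import dataclass, field

@dataclass
class Factor:
    variables: tuple      # keys of the variables this factor connects
    error: callable       # cost function over those variables

@dataclass
class FactorGraph:
    factors: list = field(default_factory=list)

    def add(self, variables, error):
        self.factors.append(Factor(tuple(variables), error))

    def total_cost(self, values):
        # Each factor reads only its own variables; the graph is bipartite.
        return sum(f.error(*(values[k] for k in f.variables))
                   for f in self.factors)

# The selected portion of Figure 3(a): one odometry factor and one range factor.
graph = FactorGraph()
graph.add(("x0", "x1"), lambda x0, x1: 0.5 * ((50.3 - (x1 - x0)) / 0.5) ** 2)
graph.add(("x1", "l0"), lambda x1, l0: 0.5 * ((10.1 - abs(l0 - x1)) / 0.2) ** 2)
print(graph.total_cost({"x0": 0.0, "x1": 50.0, "l0": 60.0}))
```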