A gentle introduction to Graph based SLAM

This blog post introduces Simultaneous Localization and Mapping from first principles. It covers necessary topics such as Unimodal Gaussian distributions, Factor Graphs, Matrix Factorization techniques, and the Bayes Tree

Introduction

The problem of Simultaneous Localization and Mapping (SLAM) has a rich history in robotics dating back to 1986 . Briefly, it is the problem of localizing – locating – the robot in an unknown map of the environment, a fundamental problem for any mobile robot. Since the location of the robot is specified against an unknown map, the localization task takes the form of estimating a piece-wise linear trajectory for the robot, as it moves around in the environment from its arbitrary origin. The map may be represented as a set of distinct landmarks (landmark-based SLAM) or as a dense representation of the environment such as a voxel map (dense SLAM).

Most recent literature (post 2000s) partition the problem into the frontend and backend.

When all the variables to be estimated are Special Euclidean \(\text{SE}(n)\) poses, and the sensors generate relative pose measurements, the optimization problem is called Pose Graph Optimization (PGO) or Synchronization over the Special Euclidean Group . If all the variables are Special orthogonal \(\text{SO}(n)\) rotations, then the problem is also called rotation averaging .

This article addresses the backend optimization and motivates the problem formulation from first principles.

Preliminaries

Filtering vs Smoothing

Most methods prior to 2000s, were predominantly filtering methods. In fact, most courses [1] [2] that deal with state estimation also start the course with filtering techniques.

When a robot moves around in an environment, it’s state space (the variables it estimates) grows, i.e., for every time-step a new pose variable needs to be estimated. Loosely speaking, filtering methods only update the variables at the current time step, while smoothing methods may update all variables, past and current. Chapter 3 and 4 of Probabilistic Robotics is a good treatment of Filtering methods relevant to the problem of SLAM.

Here, I address Smoothing techniques.

Gaussian Random Variables

A random variable is an outcome of a random event. In our context, the act of measurement is the random event, which produces an output that is a random variable. A classic instance that is cited in many books is the example of an odometry measurement. Perhaps due to wheel slip or inconsistent hall-effect, or some other physical property, the sensor measurement may be noisy (random). In most cases, we expect the sensor to reproduce measurements faithful to its underlying state, but many a time, it may not. This property may be mathematically modeled by a Gaussian random variable.

\[\begin{align} \mathbf{z} &= h(\mathbf{x}) + \nu \\ \nu &\sim \mathcal{N}(0, \Sigma) \end{align}\]

Here, equation (1) means that the state \(\mathbf{x}\) is transformed by a measurement function \(h\) which is corrupted by inherent additive noise \(\nu\) in the sensor to produce a measurement \(\mathbf{z}\). We model this noise \(\nu\) as a random variable sampled from a Gaussian probability distribution. If \(h(\mathbf{x})\) were the underlying true measurement, then if we sample from the sensor, we get noise and probability distribution of the noise as follows

\[\begin{align} \nu &\sim \mathbf{z} - h(\mathbf{x}) \\ p(\nu) &= \frac{1}{\sqrt{2 \pi \Sigma}} \exp( \| \nu \|^2_{\Sigma} ) \end{align}\]

The choice of the probability distribution is rather curious. The Gaussian distribution is part of the exponential family of distributions, and has some convenient algebraic properties:

  1. It is fully described by its sufficient statistic. For instance, even though the Gaussian distribution has probability mass almost everywhere in its domain (infinite support), the distribution is fully described by its mean and covariance.
  2. The product of Gaussian distributions results in another Gaussian distribution i.e., the conjugate prior of a Gaussian distribution is another Gaussian.

Conditional Independence of Random variables

In Probability theory, two random variables \(\mathbf{x}\) and \(\mathbf{y}\) are said to be independent, if their joint distribution equals the product of their probabilities. Intuitively, it means that if the value of \(\mathbf{x}\) is observed, then the probability of \(\mathbf{y}\) is unaffected.

\[\begin{align} p(\mathbf{x}, \mathbf{y}) = p(\mathbf{x}) p(\mathbf{y}) \end{align}\]

Similarly, if we consider three random variables \(\mathbf{x}\), \(\mathbf{y}\) and \(\mathbf{z}\), then they are said to be conditionally independent, if \(\begin{align} p(\mathbf{x}, \mathbf{y} | \mathbf{z}) = p(\mathbf{x} | \mathbf{z}) p(\mathbf{y} | \mathbf{z}) \end{align}\)

Conditional independence in the context of SLAM is hopefully illustrated with the following example:

Consider a robot in a one dimensional world as in Figure 1 (a). At time \(t_0\), if we know the location \(\mathbf{x}_0\) and \(\mathbf{x}_1\), then the robot samples from the odometry sensor a measurement \(\mathbf{z}_0\) of (say 50.3 m) as follows

\[\begin{align} \mathbf{z}_0 \sim h_{\text{odometry}}(\mathbf{x}_0, \mathbf{x}_1) + \nu_{\text{odometry}} \end{align}\]

Similarly, at time \(t=1\), if we know that the robot is at 50 meters from origin, and then it samples from the range sensor two measurements \(\mathbf{z}_1\) and \(\mathbf{z}_2\) as follows:

\[\begin{align} \mathbf{z}_1 \sim h_{\text{range}}(\mathbf{x}_1, \mathbf{l}_0) + \nu_{\text{range}}\\ \mathbf{z}_2 \sim h_{\text{range}}(\mathbf{x}_1, \mathbf{l}_1) + \nu_{\text{range}} \end{align}\]
Figure 1: (a) Illustration of a 1-D robot that observes two landmarks after moving forward by 50 meters at timestep 1. (b) Graphical model representation of the 1-D robot setting.

Next, consider the selected portion – in dotted lines – of Figure 1(b) for simplicity. Here we see that, to sample a measurement, we need to sample from the joint probability distribution i.e., \(p(\mathbf{z}_0, \mathbf{z}_1, \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0)\).

\[p(\mathbf{z}_0, \mathbf{z}_1, \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) = p(\mathbf{z}_0, \mathbf{z}_1 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) p(\mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) \\\]

Now consider the conditional probability distribution over the two measurements \(\mathbf{z}_0, \mathbf{z}_1\). From Equation (5) and (4) we know \(\mathbf{z}_0\) only depends on \(\mathbf{x}_0\) and \(\mathbf{x_1}\), and from Equation (6) a similar argument can be made for \(\mathbf{z_1}\). Therefore we can safely assume the following:

\[\begin{align*} p(\mathbf{z}_0, \mathbf{z}_1 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) &= p(\mathbf{z}_0 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) p(\mathbf{z}_1 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0)~~~~\text{conditionally independent}\\ p(\mathbf{z}_0, \mathbf{z}_1 | \mathbf{x}_0, \mathbf{x}_1, \mathbf{l}_0) &= p(\mathbf{z}_0 | \mathbf{x}_0, \mathbf{x}_1) p(\mathbf{z}_1 | \mathbf{x}_1, \mathbf{l}_0)~~~~~~~~~~~~~~~\text{conditioned on redundant variables} \end{align*}\]

The above assumption of conditional independence between measurements given state variables is key in reducing the complexity of estimation. It can also be observed that Figure 1(b) implicitly encodes the conditional independence between the measurements.

MAP estimation over Factor Graphs

Why Factor Graphs?

This theme supports rendering beautiful math in inline and display modes using MathJax 3 engine. You just need to surround your math expression with $$, like $$ E = mc^2 $$. If you leave it inside a paragraph, it will produce an inline expression, just like \(E = mc^2\).

To use display mode, again surround your expression with $$ and place it as a separate paragraph. Here is an example:

\[\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right)\]

Note that MathJax 3 is a major re-write of MathJax that brought a significant improvement to the loading and rendering speed, which is now on par with KaTeX.


Citations

Citations are then used in the article body with the <d-cite> tag. The key attribute is a reference to the id provided in the bibliography. The key attribute can take multiple ids, separated by commas.

The citation is presented inline like this: (a number that displays more information on hover). If you have an appendix, a bibliography is automatically created and populated in it.

Distill chose a numerical inline citation style to improve readability of citation dense articles and because many of the benefits of longer citations are obviated by displaying more information on hover. However, we consider it good style to mention author last names if you discuss something at length and it fits into the flow well — the authors are human and it’s nice for them to have the community associate them with their work.


Footnotes

Just wrap the text you would like to show up in a footnote in a <d-footnote> tag. The number of the footnote will be automatically generated.This will become a hoverable footnote.


Code Blocks

Syntax highlighting is provided within <d-code> tags. An example of inline code snippets: <d-code language="html">let x = 10;</d-code>. For larger blocks of code, add a block attribute:

var x = 25; function(x) { return x * x; }

Note: <d-code> blocks do not look well in the dark mode. You can always use the default code-highlight using the highlight liquid tag:

var x = 25;
function(x) {
  return x * x;
}

Layouts

The main text column is referred to as the body. It is the assumed layout of any direct descendants of the d-article element.

.l-body

For images you want to display a little larger, try .l-page:

.l-page

All of these have an outset variant if you want to poke out from the body text a little bit. For instance:

.l-body-outset

.l-page-outset

Occasionally you’ll want to use the full browser width. For this, use .l-screen. You can also inset the element a little from the edge of the browser by using the inset variant.

.l-screen

.l-screen-inset

The final layout is for marginalia, asides, and footnotes. It does not interrupt the normal flow of .l-body sized text except on mobile screen sizes.

.l-gutter


Other Typography

Emphasis, aka italics, with asterisks (*asterisks*) or underscores (_underscores_).

Strong emphasis, aka bold, with asterisks or underscores.

Combined emphasis with asterisks and underscores.

Strikethrough uses two tildes. Scratch this.

  1. First ordered list item
  2. Another item ⋅⋅* Unordered sub-list.
  3. Actual numbers don’t matter, just that it’s a number ⋅⋅1. Ordered sub-list
  4. And another item.

⋅⋅⋅You can have properly indented paragraphs within list items. Notice the blank line above, and the leading spaces (at least one, but we’ll use three here to also align the raw Markdown).

⋅⋅⋅To have a line break without a paragraph, you will need to use two trailing spaces.⋅⋅ ⋅⋅⋅Note that this line is separate, but within the same paragraph.⋅⋅ ⋅⋅⋅(This is contrary to the typical GFM line break behaviour, where trailing spaces are not required.)

I’m an inline-style link

I’m an inline-style link with title

I’m a reference-style link

I’m a relative reference to a repository file

You can use numbers for reference-style link definitions

Or leave it empty and use the link text itself.

URLs and URLs in angle brackets will automatically get turned into links. http://www.example.com or http://www.example.com and sometimes example.com (but not on Github, for example).

Some text to show that the reference links can follow later.

Here’s our logo (hover to see the title text):

Inline-style: alt text

Reference-style: alt text

Inline code has back-ticks around it.

1
2
var s = "JavaScript syntax highlighting";
alert(s);
1
2
s = "Python syntax highlighting"
print s
1
2
No language indicated, so no syntax highlighting.
But let's throw in a <b>tag</b>.

Colons can be used to align columns.

Tables Are Cool
col 3 is right-aligned $1600
col 2 is centered $12
zebra stripes are neat $1

There must be at least 3 dashes separating each header cell. The outer pipes (|) are optional, and you don’t need to make the raw Markdown line up prettily. You can also use inline Markdown.

Markdown Less Pretty
Still renders nicely
1 2 3

Blockquotes are very handy in email to emulate reply text. This line is part of the same quote.

Quote break.

This is a very long line that will still be quoted properly when it wraps. Oh boy let’s keep writing to make sure this is long enough to actually wrap for everyone. Oh, you can put Markdown into a blockquote.

Here’s a line for us to start with.

This line is separated from the one above by two newlines, so it will be a separate paragraph.

This line is also a separate paragraph, but… This line is only separated by a single newline, so it’s a separate line in the same paragraph.