The Monte Carlo Integration method uses random numbers to approximate the area of pretty much any shape we choose. The Metropolis algorithm {{ "metropolis1953equation" | cite }} is a slightly more advanced Monte Carlo method which uses random numbers to approximate a probability distribution:

$$ P(\mathbf{x}) = \frac{f(\mathbf{x})}{\displaystyle\int_D f(\mathbf{x})\,d\mathbf{x}}, $$

where $$f(\mathbf{x})$$ is an arbitrary non-negative function and $$D$$ is the domain over which it is defined. In practice, it's often easy for us to know $$f(\mathbf{x})$$, but the integral in the denominator can be quite difficult to calculate, even numerically.
One example of a complicated probability function arises when considering a physical system of $$N$$ particles. The total energy of such a system,

$$ E(\mathbf{x}) = E(x_1, x_2, \ldots, x_N), $$

depends on the coordinates of every particle at once, where $$x_i$$ denotes the position of particle $$i$$.
The physicist Ludwig Boltzmann {{ "ludwig_boltzmann_wiki" | cite }} discovered that when such a system is in equilibrium at some temperature $$T$$, the probability density for finding the system at any particular set of coordinates $$\mathbf{x}$$ is

$$ P(\mathbf{x}) = \frac{\exp\left[\dfrac{-E(\mathbf{x})}{k_B T}\right]}{Q}, $$

where the numerator is called the Boltzmann factor, $$k_B$$ is the Boltzmann constant, and $$Q$$ is the normalization constant,

$$ Q = \int_D \exp\left[\dfrac{-E(\mathbf{x})}{k_B T}\right] d\mathbf{x}. $$

We can see now that the probability density function is a difficult calculation, particularly because of $$Q$$. To see that calculating $$Q$$ is typically unfeasible, imagine there are just $$10$$ particles, all confined to a $$1$$D line segment.
Let's assume that the particles interact, meaning that the position of one particle affects that of another.
This could be the case, for example, if all the particles were charged, and so they would be repelling or attracting each other.
This means that the energy $$E(\mathbf{x}) = E(x_1, \ldots, x_{10})$$ of the system is a $$10$$D function, and the normalization constant becomes a $$10$$D integral,

$$ Q = \int \cdots \int \exp\left[\dfrac{-E(x_1, \ldots, x_{10})}{k_B T}\right] dx_1 \cdots dx_{10}. $$
In most cases, there is no known analytical expression for the above integral, so it has to be done numerically.
To do so, imagine that we divide the $$1$$D line segment into only a modest number of intervals – say, $$50$$ – so that each particle can occupy one of $$50$$ discrete positions. Evaluating the integrand once per joint configuration would then require $$50^{10} \approx 10^{17}$$ evaluations of $$E(\mathbf{x})$$, which is completely unfeasible.
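To get a feel for these numbers, here is the counting argument as code. The grid size of $$50$$ positions per particle is a placeholder for illustration, not a prescribed value:

```python
# Brute-force cost of the integral for Q: with each of the 10
# particles restricted to one of 50 grid positions, the energy
# must be evaluated once per joint configuration.
n_positions = 50
n_particles = 10
evaluations = n_positions ** n_particles
print(f"{evaluations:.3e}")  # 9.766e+16 evaluations of E
```

Even at a billion energy evaluations per second, this integral would take roughly three years of compute time.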
What's really powerful about the Metropolis approach is that you don't need to know the probability function itself. Instead, you just need a function which is proportional to it. What this means for the Boltzmann distribution is that you only need to know the term,

$$ f(\mathbf{x}) = \exp\left[\dfrac{-E(\mathbf{x})}{k_B T}\right]. $$

The Metropolis algorithm can bypass the calculation of $$Q$$ altogether and use $$f(\mathbf{x})$$ to generate samples of $$\mathbf{x}$$ whose distribution follows the probability density $$P(\mathbf{x})$$.
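The reason the normalization never matters is that the algorithm only ever looks at *ratios* of probabilities, in which $$Q$$ cancels. A minimal sketch of that cancellation for the Boltzmann case (function name is mine, not from the chapter's code):

```python
import math

def acceptance_ratio(E_old, E_new, kT=1.0):
    # Ratio of Boltzmann probabilities P(x') / P(x_t):
    #   (exp(-E_new/kT) / Q) / (exp(-E_old/kT) / Q)
    # The unknown normalization Q cancels, leaving only the
    # energy difference -- Q never has to be computed.
    return min(1.0, math.exp(-(E_new - E_old) / kT))
```

A move that lowers the energy is always accepted (ratio clipped to 1), while an uphill move of size $$\Delta E$$ is accepted with probability $$e^{-\Delta E / k_B T}$$.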
Finally, the Metropolis algorithm can be modified or implemented in other methods, and forms the basis of many advanced sampling algorithms. The most popular is probably the Metropolis-Hastings algorithm {{ "hastings1970monte" | cite }}, which is fundamentally the same. Some other algorithms that use this method are the Metropolis-adjusted Langevin algorithm {{ "mala_wiki" | cite }} and Hamiltonian Monte Carlo {{ "hmc_wiki" | cite }}, to name a few. They are often used for physical systems that follow a Boltzmann distribution.
In the rest of this chapter, we will look at $$1$$D examples to understand the Metropolis algorithm. Although the algorithm is not particularly efficient in just one dimension, it is much easier to understand in one dimension than in multiple dimensions. The Metropolis algorithm is very similar to a random walk, so let's first see how we can get a distribution from a random walk.
The dot in the figure above is a "walker", whose initial position is $$x = 0$$. At every time step, the walker moves by a random amount drawn from the interval $$(-1, 1)$$, and every proposed move is accepted.
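The walker in the animation can be sketched in a few lines (a hypothetical helper, not the chapter's imported code):

```python
import random

def random_walk(n_steps, x0=0.0, seed=None):
    # A plain 1D random walk: every step, drawn uniformly from
    # (-1, 1), is accepted unconditionally.
    rng = random.Random(seed)
    positions = [x0]
    for _ in range(n_steps):
        positions.append(positions[-1] + rng.uniform(-1.0, 1.0))
    return positions
```

Note that a histogram of these positions keeps spreading out as the walk diffuses; without an accept/reject rule there is no target distribution for it to converge to.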
The Metropolis algorithm works in a similar way to the random walk, but differs crucially in one way – after choosing a random step for the walker, a decision is made about whether to accept or reject the step based on the function $$f(x)$$.
The animation below shows a walker following this accept/reject rule for an example $$f(x)$$ with three peaks.
Although convergence occurs eventually, not all parts of the distribution achieve convergence quickly.
Note from the animation above that the walker very quickly replicates the distribution of the two peaks on the left, but takes quite a while to even reach the third peak to the right.
This is because there is a long low probability region between the third peak and second peak that acts as a "barrier."
This may not necessarily be a bad thing – sometimes one might want to estimate how long something takes to transition from one state to another, and often these peaks represent such 'states'.
So averaging over many Metropolis runs may give some estimate of these transition times.
If global sampling is the goal, the process of exploration could be sped up by choosing larger step sizes for the walker, for example by drawing each step from an interval wider than $$(-1, 1)$$.
Now let's dive into the actual algorithm with some example code!
Let our target distribution be

$$ P(x) = \frac{f(x)}{\int_{-10}^{10} f(x)\,dx}, $$

where $$f(x)$$ is a non-negative function defined on the domain $$[-10, 10]$$.
The code for defining this function is given below.
{% method %} {% sample lang="py" %} import:4-18, lang:"python" {% endmethod %}
Since this is an easy function to integrate – and hence an easy target distribution $$P(x)$$ to obtain directly – we can later use $$P(x)$$ to verify that the Metropolis algorithm samples it correctly.
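One simple way to get that normalization is a midpoint-rule quadrature; dividing $$f(x)$$ by the result yields $$P(x)$$. This is a sketch with an assumed interval and grid resolution, not the chapter's actual integration routine:

```python
import math

def normalization(f, a=-10.0, b=10.0, n=20000):
    # Midpoint-rule estimate of the integral of f over [a, b];
    # dividing f(x) by this constant yields the target P(x).
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h
```

For a smooth, well-behaved $$f$$ on a $$1$$D interval this is more than accurate enough; for example, `normalization(lambda x: math.exp(-x * x))` recovers $$\sqrt{\pi}$$ to several decimal places.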
Next, we define our walker's symmetric step generating function.
As in the random walk example, we will use a random real number between $$-1$$ and $$+1$$ for the step size.
{% method %} {% sample lang="py" %} import:20-22, lang:"python" {% endmethod %}
However, $$g$$ could be any function symmetric about $$0$$. Here are a couple of things to keep in mind when choosing it:

- If the function $$g$$ is discrete, you will only sample discrete values. For example, if $$g$$ returns only $$-1$$ or $$+1$$, and nothing in between, you will sample only integer steps away from the initial $$x_0$$.
- The average step size really matters! A small step size means the walker will carefully sample nearby regions more, but will walk more slowly, so it might not be good at exploring far and wide. On the other hand, a walker with a large step size may not sample nearby regions accurately – and actually has a higher chance of being rejected if the walker is already in a high probability region, since the acceptance ratio is more drastic for large steps. The effect of step size on the walker's efficiency is far from obvious!
The question of how to choose an optimal step distribution for a given problem is beyond the scope of this chapter, so we will simply stick with uniform steps in $$(-1, 1)$$.
After choosing $$g$$, we initialize the walker at some starting position $$x_0$$ and set $$t = 0$$:
{% method %} {% sample lang="py" %} import:70-71, lang:"python" {% endmethod %}
1. Generate a new proposed position, $$x' = x_t + g$$.
2. Calculate the acceptance probability,
$$ A = \min\left(1, \frac{f(x')}{f(x_t)}\right). $$
3. Accept the proposal, $$x'$$, with probability $$A$$. If your programming language doesn't have a built-in method for this:
   - Generate a random number $$u$$ between $$0$$ and $$1$$.
   - If $$u \leq A$$, then accept the move, and set the new position, $$x_{t+1} = x'$$.
   - Otherwise, reject the move, and set the new position to the current position, $$x_{t+1} = x_t$$.
4. Increment $$t \rightarrow t + 1$$ and repeat from step 1.
The code for steps 1 to 3 is:
{% method %} {% sample lang="py" %} import:34-42, lang:"python" {% endmethod %}
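The snippet above is imported from the repository's source file; for readers without the full code at hand, here is a minimal self-contained sketch of the same three steps (function and variable names are my own, not the repository's):

```python
import math
import random

def metropolis_iterate(f, x0, n_steps, step=1.0, seed=None):
    # One walker following the steps above:
    #   1. propose x' = x_t + g, with g uniform in (-step, step)
    #   2. acceptance probability A = min(1, f(x') / f(x_t))
    #   3. accept x' with probability A, else stay at x_t
    rng = random.Random(seed)
    x = x0
    positions = [x]
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        A = min(1.0, f(x_new) / f(x))
        if rng.random() <= A:
            x = x_new
        positions.append(x)
    return positions
```

Running this with, say, `f = lambda x: math.exp(-x * x / 2)` produces samples whose mean and variance approach those of the standard normal distribution as the number of steps grows.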
The following plot shows the result of running the algorithm for different numbers of iterations, with the resulting histograms compared against the exact distribution $$P(x)$$.
The following code puts everything together, and runs the Metropolis algorithm for a number of steps given by `num_steps`. All the positions visited by the algorithm are then written to a file, which can later be read and fed into a histogram or other density calculating scheme. The code also incorporates a few tests of the algorithm using the `test_metropolis_iterate` method. This test will create a normalized density histogram from the generated data and compare it to $$P(x)$$ directly.
{% method %} {% sample lang="py" %} import, lang:"python" {% endmethod %}
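The actual test lives in the imported file; a minimal version of the normalized-histogram step it describes might look like this (the bin count and interval are assumptions):

```python
def density_histogram(samples, a=-10.0, b=10.0, bins=100):
    # Normalized histogram: counts / (N * bin_width), so the bars
    # integrate to 1 and can be compared to P(x) bin by bin.
    width = (b - a) / bins
    counts = [0] * bins
    for x in samples:
        if a <= x < b:
            counts[int((x - a) / width)] += 1
    n = len(samples)
    return [c / (n * width) for c in counts]
```

A test would then walk over the bins and check that each bar is close to $$P(x)$$ evaluated at the bin center, within some statistical tolerance.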
<script> MathJax.Hub.Queue(["Typeset",MathJax.Hub]); </script>{% references %} {% endreferences %}
The code examples are licensed under the MIT license (found in LICENSE.md).
- The animation "Animated Random Walk" was created by K. Shudipto Amin and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
- The animation "Animated Metropolis" was created by K. Shudipto Amin and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
- The image "Plot of P(x)" was created by K. Shudipto Amin and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
- The image "Multiple Histograms" was created by K. Shudipto Amin and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
The text of this chapter was written by K. Shudipto Amin and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
After initial licensing (#560), the following pull requests have modified the text or graphics of this chapter:
- none