Hamiltonian Monte Carlo: A Practical Tutorial

by Faj Lennon

Hamiltonian Monte Carlo (HMC) is a powerful Markov Chain Monte Carlo (MCMC) algorithm that leverages Hamiltonian dynamics to efficiently explore complex probability distributions. Guys, if you're looking to level up your Bayesian inference game, understanding HMC is a must! This tutorial will guide you through the core concepts of HMC, its advantages, and how you can implement it. We'll break down the math, explain the intuition, and provide practical examples to get you started.

What is Hamiltonian Monte Carlo?

At its heart, Hamiltonian Monte Carlo is a clever algorithm designed to sample from a target probability distribution, often a posterior distribution in Bayesian statistics. Unlike simpler MCMC methods such as random-walk Metropolis or Gibbs sampling, HMC uses the gradient of the log target density to propose new states more efficiently. This leads to faster convergence and better exploration of the sample space, especially in high-dimensional problems.

The Hamiltonian Framework

The magic of HMC lies in borrowing concepts from physics, specifically Hamiltonian dynamics. Imagine a particle moving on a potential energy surface. The potential energy is related to the negative log of our target probability distribution. The particle's movement is also governed by its kinetic energy, which depends on its momentum. The total energy of the system, which is the sum of potential and kinetic energy, is called the Hamiltonian.

Mathematically, the Hamiltonian H(q, p) is defined as:

H(q, p) = U(q) + K(p)

Where:

  • q represents the position (our parameters of interest).
  • p represents the momentum (auxiliary variables).
  • U(q) is the potential energy (related to the negative log of the target distribution).
  • K(p) is the kinetic energy (typically defined as pᵀM⁻¹p / 2, where M is a mass matrix).
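
As a quick illustration (a minimal sketch of my own with made-up numbers, not part of the tutorial's later example), here is how the Hamiltonian could be evaluated for a two-dimensional q and p with a diagonal mass matrix:

import numpy as np

# Hypothetical 2D example: U(q) = q^T q / 2 (a standard Gaussian target),
# K(p) = p^T M^{-1} p / 2 with an assumed diagonal mass matrix M.
M = np.diag([1.0, 2.0])
M_inv = np.linalg.inv(M)

def U(q):
    # Potential energy: negative log density of a standard 2D Gaussian (up to a constant)
    return 0.5 * q @ q

def K(p):
    # Kinetic energy: p^T M^{-1} p / 2
    return 0.5 * p @ M_inv @ p

def H(q, p):
    # Total energy of the state (q, p)
    return U(q) + K(p)

q = np.array([0.5, -1.0])   # position (parameters of interest)
p = np.array([1.0, 0.3])    # momentum (auxiliary variables)
print(H(q, p))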

The key idea is that if we simulate the motion of this particle according to Hamilton's equations, we can generate new samples that are likely to have high probability under our target distribution. Hamilton's equations describe how the position and momentum of the particle change over time:

  • dq/dt = ∂H/∂p
  • dp/dt = -∂H/∂q

Why Hamiltonian Dynamics?

Why go through all this physics stuff? The beauty of Hamiltonian dynamics is that it conserves energy. This means that, in theory, if we start with a particular energy level, the particle will stay on that energy level as it moves. In the context of HMC, this translates to proposing new states that have similar probability densities to the current state, making them more likely to be accepted. This efficient proposal mechanism allows HMC to move through the sample space much more quickly than other MCMC methods.
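
To make this concrete, take the 1D standard Gaussian target used later in this tutorial, where U(q) = q²/2 and K(p) = p²/2 with unit mass. Hamilton's equations reduce to dq/dt = p and dp/dt = -q, a harmonic oscillator whose exact solution is q(t) = q₀ cos t + p₀ sin t and p(t) = p₀ cos t - q₀ sin t. The short sketch below (my own check, not part of the original code) verifies that H stays exactly constant along this trajectory, which is why an exact simulation of the dynamics would be accepted with probability 1:

import numpy as np

# Exact Hamiltonian flow for U(q) = q^2/2, K(p) = p^2/2 (harmonic oscillator)
def exact_trajectory(q0, p0, t):
    q_t = q0 * np.cos(t) + p0 * np.sin(t)
    p_t = p0 * np.cos(t) - q0 * np.sin(t)
    return q_t, p_t

q0, p0 = 1.0, 0.5                      # arbitrary starting point in phase space
t = np.linspace(0.0, 10.0, 101)
q_t, p_t = exact_trajectory(q0, p0, t)

H_t = 0.5 * q_t**2 + 0.5 * p_t**2      # Hamiltonian along the trajectory
print(np.allclose(H_t, H_t[0]))        # True: energy is conserved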

The Leapfrog Integrator

In practice, we can't solve Hamilton's equations analytically for complex problems. Instead, we use a numerical integration method called the leapfrog integrator. The leapfrog integrator is a symplectic integrator, which means it approximately preserves the energy of the system over time. This is crucial for the accuracy and stability of HMC. The leapfrog integrator updates the position and momentum in a staggered fashion:

  1. Update momentum halfway: p(t + ε/2) = p(t) - (ε/2) * ∂U/∂q(q(t))
  2. Update position: q(t + ε) = q(t) + ε * M⁻¹p(t + ε/2)
  3. Update momentum the other half: p(t + ε) = p(t + ε/2) - (ε/2) * ∂U/∂q(q(t + ε))

Where ε is the step size. By repeating these steps L times, we simulate the Hamiltonian dynamics for a trajectory of length Lε. This trajectory gives us a new proposed state.
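
As a quick worked example (with arbitrary numbers of my own), take the 1D standard Gaussian target with U(q) = q²/2, so ∂U/∂q = q, and use ε = 0.1, M = 1, starting at q = 1, p = 0. The half momentum update gives p = 0 - 0.05 * 1 = -0.05, the position update gives q = 1 + 0.1 * (-0.05) = 0.995, and the second half momentum update gives p = -0.05 - 0.05 * 0.995 ≈ -0.0998. The Hamiltonian moves from 0.5 to roughly 0.49999, showing how little energy a single leapfrog step loses even with a fairly coarse step size.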

Acceptance/Rejection Step

Even with the leapfrog integrator, there's still a chance that the numerical integration will introduce some error and violate energy conservation. To correct for this, HMC includes an acceptance/rejection step based on the Metropolis criterion. We calculate the change in the Hamiltonian (the energy) between the initial state and the proposed state:

ΔH = H(q_new, p_new) - H(q_old, p_old)

The acceptance probability is then given by:

α = min(1, exp(-ΔH))

We accept the proposed state with probability α. If the proposed state is rejected, we stay at the current state. This acceptance/rejection step ensures that the HMC algorithm samples from the correct target distribution.
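
One practical detail worth knowing (a small sketch of my own, not something the algorithm requires): when -ΔH is large, exp(-ΔH) can overflow in floating point, so the acceptance test is often done in log space, which is mathematically equivalent because the uniform draw u is always below 1:

import numpy as np

def metropolis_accept(dH):
    # Accept with probability min(1, exp(-dH)).
    # Comparing log(u) < -dH is equivalent and avoids overflow for large -dH.
    return np.log(np.random.rand()) < -dH

The simple implementation later in this tutorial uses the direct form α = min(1, exp(-ΔH)), which is fine for this example.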

Advantages of Hamiltonian Monte Carlo

So, why should you use HMC over other MCMC methods? Here are some key advantages:

  • Faster Convergence: HMC's use of gradient information allows it to explore the sample space more efficiently, leading to faster convergence, especially in high-dimensional problems. This is because it avoids the random walk behavior of simpler methods.
  • Better Exploration: The Hamiltonian dynamics let HMC follow long, directed trajectories rather than diffusing slowly, so it explores broad, correlated regions of the sample space effectively, and the momentum can carry it over moderate energy barriers (though, like most MCMC methods, it can still struggle with widely separated modes). Because the leapfrog integrator is symplectic, even long trajectories stay close to the true dynamics and remain useful proposals.
  • Reduced Autocorrelation: HMC typically produces samples with lower autocorrelation than other MCMC methods. This means that the samples are more independent, which leads to more accurate estimates of the target distribution.
  • Scalability: Each HMC iteration is more expensive because it requires gradient evaluations, but HMC generally scales better with dimension than random-walk methods, so the extra cost usually pays off in high-dimensional problems. Techniques like stochastic gradient HMC can further improve its scalability to large datasets.

In essence, HMC is a powerful tool that can significantly improve the efficiency and accuracy of Bayesian inference. By leveraging Hamiltonian dynamics, it overcomes many of the limitations of traditional MCMC methods.

Implementing Hamiltonian Monte Carlo

Now, let's get our hands dirty and see how to implement HMC. We'll use Python and the NumPy library for numerical computation. Guys, don't worry if you're not a Python expert; the code is relatively straightforward and well-commented.

Example: Sampling from a Gaussian Distribution

Let's start with a simple example: sampling from a Gaussian distribution with mean μ = 0 and standard deviation σ = 1. First, we need to define the potential energy function, which is the negative log of the Gaussian probability density (up to an additive constant, which doesn't affect the sampler):

import numpy as np

def potential_energy(q, mu=0, sigma=1):
    # Negative log density of N(mu, sigma^2), up to an additive constant
    return 0.5 * ((q - mu) / sigma)**2

def gradient_potential_energy(q, mu=0, sigma=1):
    # Derivative of the potential energy with respect to q
    return (q - mu) / sigma**2

Next, we need to implement the leapfrog integrator:

def leapfrog_integrator(q, p, grad_U, step_size, L, mass=1):
    q_new = q
    p_new = p

    # Half step for momentum
    p_new = p_new - (step_size / 2) * grad_U(q_new)

    for _ in range(L - 1):
        # Full step for position
        q_new = q_new + step_size * p_new / mass
        # Full step for momentum
        p_new = p_new - step_size * grad_U(q_new)

    # Full step for position
    q_new = q_new + step_size * p_new / mass
    # Half step for momentum
    p_new = p_new - (step_size / 2) * grad_U(q_new)

    return q_new, p_new
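
Before wiring this into the full sampler, it can be reassuring to check that the integrator approximately conserves energy on our Gaussian example (a quick sanity check of my own, with arbitrary starting values):

# Sanity check: the energy drift over one trajectory should be tiny
q0, p0 = 1.0, 0.5
q1, p1 = leapfrog_integrator(q0, p0, gradient_potential_energy, step_size=0.1, L=10)

H0 = potential_energy(q0) + 0.5 * p0**2
H1 = potential_energy(q1) + 0.5 * p1**2
print(abs(H1 - H0))  # should be close to zero for a reasonable step size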

Finally, we can implement the HMC algorithm:

def hamiltonian_monte_carlo(U, grad_U, epsilon, L, num_samples, initial_q, mass=1):
    samples = np.zeros(num_samples)
    q = initial_q

    for i in range(num_samples):
        # Sample random momentum
        p = np.random.normal(0, np.sqrt(mass))
        q_current = q
        p_current = p

        # Perform leapfrog integration
        q_new, p_new = leapfrog_integrator(q_current, p_current, grad_U, epsilon, L, mass)

        # Compute Hamiltonian change
        H_current = U(q_current) + 0.5 * (p_current**2) / mass
        H_new = U(q_new) + 0.5 * (p_new**2) / mass
        dH = H_new - H_current

        # Accept or reject the sample
        alpha = min(1, np.exp(-dH))
        if np.random.rand() < alpha:
            q = q_new

        samples[i] = q

    return samples

Now, let's run the HMC algorithm and plot the results:

# Set parameters
epsilon = 0.1  # Step size
L = 10         # Number of leapfrog steps
num_samples = 10000  # Number of samples
initial_q = 0.0  # Initial position

# Run HMC
samples = hamiltonian_monte_carlo(potential_energy, gradient_potential_energy, epsilon, L, num_samples, initial_q)

# Plot the results
import matplotlib.pyplot as plt

plt.hist(samples, bins=50, density=True)
plt.title("HMC Samples from Gaussian Distribution")
plt.xlabel("x")
plt.ylabel("Density")
plt.show()

This code will generate a histogram of the samples drawn from the Gaussian distribution using HMC. You should see that the histogram closely resembles the shape of a Gaussian distribution with mean 0 and standard deviation 1. This demonstrates that the HMC algorithm is working correctly.
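
You can also put some quick numbers on this (an optional check of my own, not part of the original example): the sample mean and standard deviation should be close to 0 and 1, and the lag-1 autocorrelation should be low for a well-tuned chain.

print("mean:", np.mean(samples))                      # should be close to 0
print("std:", np.std(samples))                        # should be close to 1
print("lag-1 autocorrelation:",
      np.corrcoef(samples[:-1], samples[1:])[0, 1])   # should be small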

Tuning HMC Parameters

The performance of HMC depends crucially on the choice of two parameters: the step size ε and the number of leapfrog steps L. Tuning these parameters can be challenging, but here are some general guidelines:

  • Step Size (ε): A larger step size allows the algorithm to explore the sample space more quickly, but it can also lead to larger errors in the numerical integration and a higher rejection rate. A smaller step size reduces the integration error but can slow down the exploration. A good starting point is to choose a step size that results in an acceptance rate between 0.6 and 0.9 (a rough way to estimate the acceptance rate is sketched after this list).
  • Number of Leapfrog Steps (L): The number of leapfrog steps determines the length of the trajectory. A longer trajectory lets the algorithm travel further through the sample space in each iteration, but it also increases the computational cost, and an overly long trajectory can waste work by doubling back on itself. A shorter trajectory leads to more random-walk behavior. A reasonable rule of thumb is to choose L so that the trajectory length Lε moves the sampler well away from its starting point without becoming too computationally expensive.
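
Because the sampler above does not report how many proposals it accepted, one rough proxy (a sketch of my own; it works here because a rejection leaves the chain exactly where it was, while an accepted move on a continuous target changes the value) is the fraction of iterations in which the chain moved:

# Rough acceptance-rate proxy: fraction of iterations where the chain moved
acceptance_rate = np.mean(samples[1:] != samples[:-1])
print("approximate acceptance rate:", acceptance_rate)

If this is well below the 0.6 to 0.9 range, decrease ε; if it is essentially 1.0, you can usually afford a larger ε or a longer trajectory.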

Automated methods for tuning these parameters are often used in practice: dual averaging adapts the step size toward a target acceptance rate during warm-up, and the No-U-Turn Sampler (NUTS) chooses the trajectory length automatically. Together, these methods let the HMC algorithm tune itself as it runs.

Conclusion

Hamiltonian Monte Carlo is a powerful and versatile MCMC algorithm that can significantly improve the efficiency and accuracy of Bayesian inference. By leveraging Hamiltonian dynamics, HMC overcomes many of the limitations of traditional MCMC methods. While implementing and tuning HMC can be challenging, the benefits in terms of faster convergence, better exploration, and reduced autocorrelation make it a valuable tool for any statistician or data scientist. So, guys, dive in, experiment, and unlock the power of HMC for your own Bayesian adventures!