Stein Variational Inference: Approximate Learning and Inference with Stein's Method

Machine learning and statistics are, at their core, about matching observed data with mathematical models (typically probabilistic models). Mathematical tools that measure and optimize discrepancies between data and distributions therefore play a central role in all aspects of statistical learning: estimating parameters and models from empirical data, simulating and reasoning with given models (e.g., Bayesian posterior inference), and evaluating the goodness-of-fit between data and models; see Figure 1, left. However, as the scale and complexity of the probabilistic models powering modern AI, machine learning, and statistics grow, classical discrepancy tools, many of which center around the Kullback-Leibler (KL) divergence, become computationally infeasible.

Stein variational inference, consisting of kernelized Stein discrepancy (KSD) and Stein variational gradient descent (SVGD), is a collection of computational devices for evaluating and optimizing discrepancies for intractable models. It draws key insights from Stein's method, due to Charles M. Stein, a remarkably powerful set of theoretical techniques originally developed for proving approximation and limit theorems in probability theory. Stein variational methods turn Stein's method into practical computational tools for handling complex data and models in computational statistics and machine learning.


Figure 1. Left: Probabilistic learning, inference, and evaluation tasks can be viewed as evaluating or minimizing discrepancies between data and distributions. Kernelized Stein discrepancy (KSD) provides a unified tool for all three tasks on unnormalized distributions. Right: Key theoretical concepts in Stein variational inference (lower panel), which can be viewed as a "kernelized" counterpart of the classical Wasserstein gradient flow theory behind Langevin Monte Carlo (upper panel).

Kernelized Stein Discrepancy

Kernelized Stein discrepancy (KSD), which combines the classical Stein discrepancy with reproducing kernel Hilbert spaces (RKHS), allows us to assess the compatibility between empirical data and probability distributions, and provides a powerful tool for developing algorithms for model evaluation (goodness-of-fit testing), as well as learning and inference in general. Unlike traditional divergence measures (such as the KL or chi-square divergence), KSD does not require evaluating the normalization constant of the distribution, and can be applied even to the intractable, unnormalized distributions widely used in modern machine learning and statistics.
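
As a concrete illustration, the sketch below estimates the squared KSD with a U-statistic and an RBF kernel, using only samples and the target's score function ∇ log p(x), so the normalization constant is never needed. This is a minimal NumPy sketch, assuming a standard Gaussian target (whose score is simply -x); the function names, the fixed kernel bandwidth, and the sample size are illustrative choices, not the released matlab/R code.

```python
# Minimal sketch: KSD^2 U-statistic with an RBF kernel (illustrative, not the released code).
import numpy as np

def ksd_u_statistic(X, score, bandwidth=1.0):
    """Estimate KSD^2 between samples X (n x d) and a target given by its score function."""
    n, d = X.shape
    S = score(X)                                  # s_p(x_i) = grad log p(x_i), shape (n, d)
    diff = X[:, None, :] - X[None, :, :]          # x_i - x_j, shape (n, n, d)
    sqdist = np.sum(diff ** 2, axis=-1)           # ||x_i - x_j||^2
    K = np.exp(-sqdist / (2 * bandwidth ** 2))    # RBF kernel k(x_i, x_j)

    # Stein kernel u_p(x, x') = s(x)^T s(x') k + s(x)^T grad_x' k + s(x')^T grad_x k + tr(grad_x grad_x' k)
    grad_xj_K = diff / bandwidth ** 2 * K[:, :, None]            # gradient of k w.r.t. its second argument
    term1 = (S @ S.T) * K
    term2 = np.einsum('id,ijd->ij', S, grad_xj_K)
    term3 = np.einsum('jd,ijd->ij', S, -grad_xj_K)               # gradient w.r.t. first argument = -grad_xj_K
    term4 = (d / bandwidth ** 2 - sqdist / bandwidth ** 4) * K   # trace of the mixed second derivative
    U = term1 + term2 + term3 + term4

    np.fill_diagonal(U, 0.0)                      # U-statistic: average over i != j only
    return U.sum() / (n * (n - 1))

# Samples drawn from the target itself should give an estimate close to zero.
X = np.random.randn(200, 2)
print(ksd_u_statistic(X, score=lambda x: -x))     # score of a standard Gaussian is -x
```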

A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation

Liu, Lee, Jordan; ICML, 2016 [A short note] [code: matlab, R]

[See more details here>>].

Stein Variational Gradient Descent

Stein variational gradient descent (SVGD) is a nonparametric variational inference algorithm that combines the advantages of variational inference, Monte Carlo, quasi-Monte Carlo, and gradient-based optimization, by exploiting an interesting connection between Stein discrepancy and KL divergence. It is a deterministic sampling algorithm that evolves a set of interacting particles to form a sample-efficient particle approximation of the given distribution. It is both 1) a gradient flow of the KL divergence w.r.t. a "kernelized" Wasserstein metric on the space of distributions, and 2) a numerical quadrature method that arranges a set of points to match the expectations of a set of basis functions induced by the Stein operator.
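
As a concrete illustration, the sketch below performs SVGD updates with an RBF kernel: each particle is driven toward high-probability regions by a kernel-weighted average of the score ∇ log p, while a repulsive term built from kernel gradients keeps the particles spread out. This is a minimal NumPy sketch assuming a fixed kernel bandwidth (the released implementation instead adapts it with a median heuristic) and a toy standard Gaussian target; the function names are illustrative.

```python
# Minimal sketch: SVGD particle update with an RBF kernel (illustrative, not the released code).
import numpy as np

def svgd_update(X, grad_log_p, bandwidth=1.0, step_size=0.1):
    """One SVGD step: move particles X (n x d) along the kernelized Stein direction phi*."""
    n, d = X.shape
    diff = X[:, None, :] - X[None, :, :]          # x_i - x_j, shape (n, n, d)
    sqdist = np.sum(diff ** 2, axis=-1)
    K = np.exp(-sqdist / (2 * bandwidth ** 2))    # k(x_i, x_j)

    # phi*(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    driving = K @ grad_log_p(X)                                        # pulls particles toward high density
    repulsive = np.sum(diff / bandwidth ** 2 * K[:, :, None], axis=1)  # pushes particles apart
    return X + step_size * (driving + repulsive) / n

# Transport particles from a uniform initialization toward a standard 2-D Gaussian (score = -x).
X = np.random.uniform(-5, 5, size=(100, 2))
for _ in range(500):
    X = svgd_update(X, grad_log_p=lambda x: -x, step_size=0.05)
print(X.mean(axis=0), X.std(axis=0))              # should approach mean 0 and std 1
```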

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Liu, Wang; NeurIPS, 2016 [code]

Stein Variational Gradient Descent as Gradient Flow

Liu; NeurIPS, 2017

Stein Variational Gradient Descent as Moment Matching

Liu, Wang; NeurIPS, 2018

[See more details here>>].

Slides/Notes

Probabilistic Learning and Inference Using Stein's Method [slides, slides]

A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation [ICML2016 slides]

Short notes:

A Short Note on Kernelized Stein Discrepancy, 2016

Stein Variational Gradient Descent: Theory and Applications, 2016

Learning to Sample Using Stein Discrepancy, 2016

Papers

A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation

Liu, Lee, Jordan; ICML, 2016 [code: matlab, R]

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Liu, Wang; NeurIPS, 2016 [code]

Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning

Wang, Liu; preprint, 2016 [code]

Two methods for Wild Variational Inference

Liu, Feng; preprint, 2016

Black-box Importance Sampling

Liu, Lee; AISTATS, 2017

Learning Deep Energy Models: Contrastive Divergence vs. Amortized MLE

Liu, Wang; 2017

Learning to Draw Samples with Amortized Stein Variational Gradient Descent

Feng et al.; UAI, 2017

Stein Variational Gradient Descent as Gradient Flow

Liu; NeurIPS, 2017

Stein Variational Policy Gradient

Yang et al.; UAI, 2017

Stein Variational Gradient Descent as Moment Matching

Liu, Wang; NeurIPS, 2018

Stein Variational Gradient Descent Without Gradient

Han, Liu; ICML, 2018

Goodness-of-Fit Testing for Discrete Distributions via Stein Discrepancy

Yang et al.; ICML, 2018

Stein Variational Gradient Descent with Matrix-Valued Kernels

Wang, Tang, Bajaj, Liu; NeurIPS, 2019

Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models

Wang, Liu; ICML, 2019

Learning Self-Imitating Diverse Policies

Gangwani et al.; ICLR, 2019

Stein Variational Inference for Discrete Distributions

Han et al.; AISTATS, 2020

Profiling Pareto Front With Multi-Objective Stein Variational Gradient Descent

Liu et al.; NeurIPS, 2021

Sampling with Trustworthy Constraints: A Variational Gradient Framework

Liu et al.; NeurIPS, 2021

Sampling in Constrained Domains with Orthogonal-Space Variational Gradient Descent

Zhang et al.; NeurIPS, 2022

Stein’s Method Meets Computational Statistics: A Review of Some Recent Developments

Anastasiou et al.; 2022