A framework for sampling from posterior distributions of parameters of artificial neural networks is presented. The idea is to couple the posterior with an auxiliary random variable such that both the forward distribution (of the auxiliary variables given the parameters) and the reverse distribution (of the parameters given the auxiliary variables) are fast to sample. Particularly useful is the case in which the conditional distribution of the auxiliary variables given the parameters is Gaussian with independent coordinates. We show that, in our construction, the reverse distribution of the parameters given the auxiliary variables is log-concave. This permits accurate computation of the score, which is the gradient of the log density of the auxiliary random variables. Using this score as the drift function, one may run a stochastic (Langevin) diffusion to sample from the auxiliary distribution. A draw from the log-concave conditional for the parameters then yields a sample from their posterior distribution. Along with these algorithmic developments, we present corresponding bounds on statistical risk and on online-learning regret for predictions based on these neural network fits.
This is based on joint work with Curtis McDonald of Yale University.
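The following is a minimal toy sketch, in Python/NumPy, of the sampling scheme described above. It is an illustration under simplifying assumptions, not the construction from the talk: the coupling is taken to be xi | w ~ N(w, delta^2 I), the log-posterior gradient is a placeholder (a standard Gaussian standing in for a neural-network posterior), and the step sizes and iteration counts are arbitrary. Under that Gaussian coupling, the score of the auxiliary marginal is grad log p(xi) = (E[w | xi] - xi) / delta^2, and the conditional mean is estimated from draws of the (assumed log-concave) conditional p(w | xi).

import numpy as np

rng = np.random.default_rng(0)

def log_posterior_grad(w):
    # Placeholder gradient of the log posterior over parameters w.
    # A standard Gaussian stands in here for a neural-network posterior.
    return -w

def sample_w_given_xi(xi, delta, n_steps=200, step=0.01):
    # Unadjusted Langevin sampler for the (assumed log-concave) conditional
    # p(w | xi) proportional to exp(log_post(w) - ||w - xi||^2 / (2 delta^2)).
    w = xi.copy()
    for _ in range(n_steps):
        grad = log_posterior_grad(w) - (w - xi) / delta**2
        w = w + step * grad + np.sqrt(2 * step) * rng.standard_normal(w.shape)
    return w

def score_xi(xi, delta, n_inner=10):
    # Score of the auxiliary marginal: grad log p(xi) = (E[w | xi] - xi) / delta^2,
    # with the conditional mean estimated by averaging draws from p(w | xi).
    draws = np.stack([sample_w_given_xi(xi, delta) for _ in range(n_inner)])
    return (draws.mean(axis=0) - xi) / delta**2

def sample_posterior(dim=5, delta=1.0, n_outer=200, step=0.01):
    # Outer Langevin diffusion on the auxiliary variable xi, using the estimated
    # score as the drift, followed by one draw of w from the conditional p(w | xi).
    xi = rng.standard_normal(dim)
    for _ in range(n_outer):
        xi = xi + step * score_xi(xi, delta) + np.sqrt(2 * step) * rng.standard_normal(dim)
    return sample_w_given_xi(xi, delta)

print(sample_posterior())

The log-concavity of the conditional p(w | xi) is what makes the inner sampling step, and hence the score estimate, reliable; in this toy example the conditional is trivially log-concave by construction.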