Jonathan Ho, Ajay Jain, Pieter Abbeel
We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256 × 256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at https://github.com/hojonathanho/diffusion.
Deep generative models of all kinds have recently exhibited high quality samples in a wide variety of data modalities. G...
Diffusion models [53] are latent variable models of the form $p_\theta(\mathbf{x}_0) := \int p_\theta(\mathbf{x}_{0:T}) \, d\mathbf{x}_{1:T}$, where $\mathbf{x}_1, \ldots, \mathbf{x}_T$ are latents of the same dimensionality as the data ...
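To make the forward process concrete, the following is a minimal NumPy sketch of a single transition $q(\mathbf{x}_t|\mathbf{x}_{t-1}) = \mathcal{N}(\mathbf{x}_t; \sqrt{1-\beta_t}\,\mathbf{x}_{t-1}, \beta_t \mathbf{I})$; the function name and the `rng` argument (a `numpy.random.Generator`) are ours, not part of the released implementation.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """One transition of the fixed forward (diffusion) process:
    q(x_t | x_{t-1}) = N(x_t; sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise
```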
Diffusion models might appear to be a restricted class of latent variable models, but they allow a large number of degre...
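One such property is that the forward process admits sampling $\mathbf{x}_t$ at an arbitrary timestep $t$ in closed form: with $\alpha_t := 1 - \beta_t$ and $\bar\alpha_t := \prod_{s=1}^t \alpha_s$, we have $q(\mathbf{x}_t|\mathbf{x}_0) = \mathcal{N}(\mathbf{x}_t; \sqrt{\bar\alpha_t}\,\mathbf{x}_0, (1-\bar\alpha_t)\mathbf{I})$. A sketch, with helper names that are ours:

```python
import numpy as np

def q_sample(x0, t, alpha_bar, rng):
    """Closed-form sample from q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I),
    where alpha_bar[t-1] = prod_{s<=t} (1 - beta_s). Returns x_t and the noise."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t - 1]) * x0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
    return x_t, eps
```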
We ignore the fact that the forward process variances $\beta_t$ are learnable by reparameterization and instead fix them...
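Concretely, the fixed schedule used in our experiments increases linearly from $\beta_1 = 10^{-4}$ to $\beta_T = 0.02$. A short NumPy sketch of the schedule and its derived quantities (variable names are ours):

```python
import numpy as np

T = 1000                            # number of diffusion steps used in the paper
betas = np.linspace(1e-4, 0.02, T)  # linear variance schedule beta_1 .. beta_T
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)      # \bar{alpha}_t = prod_{s=1}^{t} alpha_s
# The endpoint leaves x_T close to a standard normal: alpha_bar[-1] is about 4e-5.
```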
Now we discuss our choices in $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t) = \mathcal{N}(\mathbf{x}_{t-1}; \boldsymbol{\mu}_...
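The $\boldsymbol{\epsilon}$-prediction parameterization leads to the simplified training criterion of Algorithm 1: sample a timestep and noise, form $\mathbf{x}_t$ in closed form, and regress the network output onto the noise. A minimal sketch, assuming `eps_model(x, t)` stands in for $\boldsymbol{\epsilon}_\theta$ (a placeholder, not the released API):

```python
import numpy as np

def simplified_loss(eps_model, x0, alpha_bar, rng):
    """One Monte Carlo sample of L_simple (Algorithm 1):
    t ~ Uniform({1..T}), eps ~ N(0, I),
    loss = ||eps - eps_theta(sqrt(abar_t) x0 + sqrt(1 - abar_t) eps, t)||^2."""
    T = len(alpha_bar)
    t = rng.integers(1, T + 1)                       # uniform random timestep
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t - 1]) * x0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
    # Mean squared error over pixels; any positive scaling leaves the optimum unchanged.
    return np.mean((eps - eps_model(x_t, t)) ** 2)
```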
We assume that image data consists of integers in $\{0, 1, \ldots, 255\}$ scaled linearly to $[-1, 1]$. This ensures tha...
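The resulting decoder assigns each pixel the Gaussian mass of its discretization bin, with the edge bins extended to infinity. A sketch of the corresponding log-likelihood, assuming SciPy for the normal CDF (helper names are ours):

```python
import numpy as np
from scipy.stats import norm

def discretized_gaussian_loglik(x0, mu, sigma):
    """Log-likelihood of the independent discretized Gaussian decoder:
    pixels lie on a grid in [-1, 1] with bin half-width 1/255; the bins at
    -1 and 1 extend to -inf and +inf respectively."""
    half_bin = 1.0 / 255.0
    upper = np.where(x0 >= 1.0 - 1e-6, np.inf, x0 + half_bin)
    lower = np.where(x0 <= -1.0 + 1e-6, -np.inf, x0 - half_bin)
    cdf_up = norm.cdf(upper, loc=mu, scale=sigma)
    cdf_lo = norm.cdf(lower, loc=mu, scale=sigma)
    return np.sum(np.log(np.maximum(cdf_up - cdf_lo, 1e-12)))
```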
With the reverse process and decoder defined above, the variational bound, consisting of terms derived from Eqs. (12) an...
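As a sketch of the $t$-th bound term in its $\boldsymbol{\epsilon}$-prediction form, the coefficient below is exactly what the simplified objective discards; `sigma2` fixes $\sigma_t^2 = \beta_t$, one of the two untrained choices discussed above (helper names are ours):

```python
import numpy as np

def weighted_eps_loss(eps, eps_pred, t, betas, alphas, alpha_bar):
    """The t-th variational-bound term, up to an additive constant:
    coef_t * ||eps - eps_theta(x_t, t)||^2, with
    coef_t = beta_t^2 / (2 * sigma_t^2 * alpha_t * (1 - alpha_bar_t))."""
    i = t - 1
    sigma2 = betas[i]
    coef = betas[i] ** 2 / (2.0 * sigma2 * alphas[i] * (1.0 - alpha_bar[i]))
    return coef * np.sum((eps - eps_pred) ** 2)
```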
We set $T = 1000$ for all experiments so that the number of neural network evaluations needed during sampling matches pr...
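A minimal sketch of the ancestral sampler (Algorithm 2), again assuming `eps_model(x, t)` approximates $\boldsymbol{\epsilon}_\theta$ and `rng` is a `numpy.random.Generator`:

```python
import numpy as np

def sample(eps_model, shape, betas, alphas, alpha_bar, rng):
    """Ancestral sampling: start from x_T ~ N(0, I) and apply the learned
    reverse transitions p_theta(x_{t-1} | x_t) down to x_0."""
    x = rng.standard_normal(shape)
    for t in range(len(betas), 0, -1):
        i = t - 1
        z = rng.standard_normal(shape) if t > 1 else 0.0   # no noise at the last step
        mean = (x - betas[i] / np.sqrt(1.0 - alpha_bar[i]) * eps_model(x, t)) \
               / np.sqrt(alphas[i])
        x = mean + np.sqrt(betas[i]) * z                   # sigma_t^2 = beta_t variant
    return x
```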
Table 1 shows Inception scores, FID scores, and negative log likelihoods (lossless codelengths) on CIFAR10. With our FID...
In Table 2, we show the sample quality effects of reverse process parameterizations and training objectives (Section 3.2...
Table 1 also shows the codelengths of our CIFAR10 models. The gap between train and test is at most 0.03 bits per dimens...
We can interpolate source images $\mathbf{x}_0, \mathbf{x}_0' \sim q(\mathbf{x}_0)$ in latent space using $q$ as a stoch...
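A sketch of this procedure, reusing the closed-form encoder and the reverse-process loop from the sampler above (function names are ours; the released code may organize this differently):

```python
import numpy as np

def interpolate(eps_model, x0_a, x0_b, t, lam, betas, alphas, alpha_bar, rng):
    """Stochastically encode two images to timestep t with q(x_t | x_0),
    linearly interpolate the latents, then decode with the reverse process."""
    i = t - 1
    xt_a = np.sqrt(alpha_bar[i]) * x0_a \
        + np.sqrt(1.0 - alpha_bar[i]) * rng.standard_normal(x0_a.shape)
    xt_b = np.sqrt(alpha_bar[i]) * x0_b \
        + np.sqrt(1.0 - alpha_bar[i]) * rng.standard_normal(x0_b.shape)
    x = (1.0 - lam) * xt_a + lam * xt_b        # linear interpolation in latent space
    for s in range(t, 0, -1):                  # reverse process from step t down to 0
        j = s - 1
        z = rng.standard_normal(x.shape) if s > 1 else 0.0
        mean = (x - betas[j] / np.sqrt(1.0 - alpha_bar[j]) * eps_model(x, s)) \
               / np.sqrt(alphas[j])
        x = mean + np.sqrt(betas[j]) * z
    return x
```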
While diffusion models might resemble flows [9, 46, 10, 32, 5, 16, 23] and VAEs [33, 47, 37], diffusion models are desig...
We have presented high quality image samples using diffusion models, and we have found connections among diffusion model...
Our work on diffusion models takes on a similar scope as existing work on other types of deep generative models, such as...
Below is a derivation of Eq. (5), the reduced variance variational bound for diffusion models. This material is from Soh...
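For reference, the bound that the derivation arrives at is Eq. (5), restated here with its $L_T$, $L_{t-1}$, and $L_0$ labels:

$$
L = \mathbb{E}_q\bigg[ \underbrace{D_{\mathrm{KL}}\big(q(\mathbf{x}_T|\mathbf{x}_0) \,\|\, p(\mathbf{x}_T)\big)}_{L_T} + \sum_{t>1} \underbrace{D_{\mathrm{KL}}\big(q(\mathbf{x}_{t-1}|\mathbf{x}_t,\mathbf{x}_0) \,\|\, p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)\big)}_{L_{t-1}} \underbrace{- \log p_\theta(\mathbf{x}_0|\mathbf{x}_1)}_{L_0} \bigg]
$$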
Our neural network architecture follows the backbone of PixelCNN++ [52], which is a U-Net [48] based on a Wide ResNet [7...
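The network conditions on the diffusion step $t$ through a Transformer-style sinusoidal embedding fed into each residual block. A minimal NumPy sketch of that embedding, under the frequency convention we use (assumes an even `dim` of at least 4):

```python
import numpy as np

def timestep_embedding(t, dim):
    """Sinusoidal embedding of the diffusion step t, as in the Transformer
    position encoding: half sine channels, half cosine channels."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / (half - 1))
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])
```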
Our model architecture, forward process definition, and prior differ from NCSN [55, 56] in subtle but important ways tha...
Additional samples: Figures 11, 13, 16, 17, 18, and 19 show uncurated samples from the diffusion models trained on CelebA...