Jonathan Ho, Ajay Jain, Pieter Abbeel
We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256 × 256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at https://github.com/hojonathanho/diffusion.
Deep generative models of all kinds have recently exhibited high quality samples in a wide variety of data modalities. G...
Diffusion models [53] are latent variable models of the form $p_\theta(\mathbf{x}_0) := \int p_\theta(\mathbf{x}_{0:T}) \, d\mathbf{x}_{1:T}$, where $\mathbf{x}_1, \ldots, \mathbf{x}_T$ are latents of the same dimensionality as the data ...
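To make the forward process concrete, the following is a minimal NumPy sketch of a single transition $q(\mathbf{x}_t|\mathbf{x}_{t-1}) = \mathcal{N}(\mathbf{x}_t; \sqrt{1-\beta_t}\,\mathbf{x}_{t-1}, \beta_t \mathbf{I})$; the function name and the `rng` argument (a `numpy.random.Generator`) are ours, not part of the released implementation.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """One transition of the fixed forward (diffusion) process:
    q(x_t | x_{t-1}) = N(x_t; sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise
```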
Diffusion models might appear to be a restricted class of latent variable models, but they allow a large number of degre...
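One such property is that the forward process admits sampling $\mathbf{x}_t$ at an arbitrary timestep $t$ in closed form: with $\alpha_t := 1 - \beta_t$ and $\bar\alpha_t := \prod_{s=1}^t \alpha_s$, we have $q(\mathbf{x}_t|\mathbf{x}_0) = \mathcal{N}(\mathbf{x}_t; \sqrt{\bar\alpha_t}\,\mathbf{x}_0, (1-\bar\alpha_t)\mathbf{I})$. A sketch, with helper names that are ours:

```python
import numpy as np

def q_sample(x0, t, alpha_bar, rng):
    """Closed-form sample from q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I),
    where alpha_bar[t-1] = prod_{s<=t} (1 - beta_s). Returns x_t and the noise."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t - 1]) * x0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
    return x_t, eps
```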
We ignore the fact that the forward process variances $\beta_t$ are learnable by reparameterization and instead fix them...
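Concretely, the fixed schedule used in our experiments increases linearly from $\beta_1 = 10^{-4}$ to $\beta_T = 0.02$. A short NumPy sketch of the schedule and its derived quantities (variable names are ours):

```python
import numpy as np

T = 1000                            # number of diffusion steps used in the paper
betas = np.linspace(1e-4, 0.02, T)  # linear variance schedule beta_1 .. beta_T
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)      # \bar{alpha}_t = prod_{s=1}^{t} alpha_s
# The endpoint leaves x_T close to a standard normal: alpha_bar[-1] is about 4e-5.
```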
Now we discuss our choices in $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t) = \mathcal{N}(\mathbf{x}_{t-1}; \boldsymbol{\mu}_...
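The $\boldsymbol{\epsilon}$-prediction parameterization leads to the simplified training criterion of Algorithm 1: sample a timestep and noise, form $\mathbf{x}_t$ in closed form, and regress the network output onto the noise. A minimal sketch, assuming `eps_model(x, t)` stands in for $\boldsymbol{\epsilon}_\theta$ (a placeholder, not the released API):

```python
import numpy as np

def simplified_loss(eps_model, x0, alpha_bar, rng):
    """One Monte Carlo sample of L_simple (Algorithm 1):
    t ~ Uniform({1..T}), eps ~ N(0, I),
    loss = ||eps - eps_theta(sqrt(abar_t) x0 + sqrt(1 - abar_t) eps, t)||^2."""
    T = len(alpha_bar)
    t = rng.integers(1, T + 1)                       # uniform random timestep
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t - 1]) * x0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
    # Mean squared error over pixels; any positive scaling leaves the optimum unchanged.
    return np.mean((eps - eps_model(x_t, t)) ** 2)
```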
We assume that image data consists of integers in $\{0, 1, \ldots, 255\}$ scaled linearly to $[-1, 1]$. This ensures tha...
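The resulting decoder assigns each pixel the Gaussian mass of its discretization bin, with the edge bins extended to infinity. A sketch of the corresponding log-likelihood, assuming SciPy for the normal CDF (helper names are ours):

```python
import numpy as np
from scipy.stats import norm

def discretized_gaussian_loglik(x0, mu, sigma):
    """Log-likelihood of the independent discretized Gaussian decoder:
    pixels lie on a grid in [-1, 1] with bin half-width 1/255; the bins at
    -1 and 1 extend to -inf and +inf respectively."""
    half_bin = 1.0 / 255.0
    upper = np.where(x0 >= 1.0 - 1e-6, np.inf, x0 + half_bin)
    lower = np.where(x0 <= -1.0 + 1e-6, -np.inf, x0 - half_bin)
    cdf_up = norm.cdf(upper, loc=mu, scale=sigma)
    cdf_lo = norm.cdf(lower, loc=mu, scale=sigma)
    return np.sum(np.log(np.maximum(cdf_up - cdf_lo, 1e-12)))
```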
With the reverse process and decoder defined above, the variational bound, consisting of terms derived from Eqs. (12) an...
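As a sketch of the $t$-th bound term in its $\boldsymbol{\epsilon}$-prediction form, the coefficient below is exactly what the simplified objective discards; `sigma2` fixes $\sigma_t^2 = \beta_t$, one of the two untrained choices discussed above (helper names are ours):

```python
import numpy as np

def weighted_eps_loss(eps, eps_pred, t, betas, alphas, alpha_bar):
    """The t-th variational-bound term, up to an additive constant:
    coef_t * ||eps - eps_theta(x_t, t)||^2, with
    coef_t = beta_t^2 / (2 * sigma_t^2 * alpha_t * (1 - alpha_bar_t))."""
    i = t - 1
    sigma2 = betas[i]
    coef = betas[i] ** 2 / (2.0 * sigma2 * alphas[i] * (1.0 - alpha_bar[i]))
    return coef * np.sum((eps - eps_pred) ** 2)
```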
We set $T = 1000$ for all experiments so that the number of neural network evaluations needed during sampling matches pr...
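A minimal sketch of the ancestral sampler (Algorithm 2), again assuming `eps_model(x, t)` approximates $\boldsymbol{\epsilon}_\theta$ and `rng` is a `numpy.random.Generator`:

```python
import numpy as np

def sample(eps_model, shape, betas, alphas, alpha_bar, rng):
    """Ancestral sampling: start from x_T ~ N(0, I) and apply the learned
    reverse transitions p_theta(x_{t-1} | x_t) down to x_0."""
    x = rng.standard_normal(shape)
    for t in range(len(betas), 0, -1):
        i = t - 1
        z = rng.standard_normal(shape) if t > 1 else 0.0   # no noise at the last step
        mean = (x - betas[i] / np.sqrt(1.0 - alpha_bar[i]) * eps_model(x, t)) \
               / np.sqrt(alphas[i])
        x = mean + np.sqrt(betas[i]) * z                   # sigma_t^2 = beta_t variant
    return x
```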
Table 1 shows Inception scores, FID scores, and negative log likelihoods (lossless codelengths) on CIFAR10. With our FID...
In Table 2, we show the sample quality effects of reverse process parameterizations and training objectives (Section 3.2...
Table 1 also shows the codelengths of our CIFAR10 models. The gap between train and test is at most 0.03 bits per dimens...
We can interpolate source images $\mathbf{x}_0, \mathbf{x}_0' \sim q(\mathbf{x}_0)$ in latent space using $q$ as a stoch...
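A sketch of this procedure, reusing the closed-form encoder and the reverse-process loop from the sampler above (function names are ours; the released code may organize this differently):

```python
import numpy as np

def interpolate(eps_model, x0_a, x0_b, t, lam, betas, alphas, alpha_bar, rng):
    """Stochastically encode two images to timestep t with q(x_t | x_0),
    linearly interpolate the latents, then decode with the reverse process."""
    i = t - 1
    xt_a = np.sqrt(alpha_bar[i]) * x0_a \
        + np.sqrt(1.0 - alpha_bar[i]) * rng.standard_normal(x0_a.shape)
    xt_b = np.sqrt(alpha_bar[i]) * x0_b \
        + np.sqrt(1.0 - alpha_bar[i]) * rng.standard_normal(x0_b.shape)
    x = (1.0 - lam) * xt_a + lam * xt_b        # linear interpolation in latent space
    for s in range(t, 0, -1):                  # reverse process from step t down to 0
        j = s - 1
        z = rng.standard_normal(x.shape) if s > 1 else 0.0
        mean = (x - betas[j] / np.sqrt(1.0 - alpha_bar[j]) * eps_model(x, s)) \
               / np.sqrt(alphas[j])
        x = mean + np.sqrt(betas[j]) * z
    return x
```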
While diffusion models might resemble flows [9, 46, 10, 32, 5, 16, 23] and VAEs [33, 47, 37], diffusion models are desig...
We have presented high quality image samples using diffusion models, and we have found connections among diffusion model...
Our work on diffusion models takes on a similar scope as existing work on other types of deep generative models, such as...
Below is a derivation of Eq. (5), the reduced variance variational bound for diffusion models. This material is from Soh...
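For reference, the bound that the derivation arrives at is Eq. (5), restated here with its $L_T$, $L_{t-1}$, and $L_0$ labels:

$$
L = \mathbb{E}_q\bigg[ \underbrace{D_{\mathrm{KL}}\big(q(\mathbf{x}_T|\mathbf{x}_0) \,\|\, p(\mathbf{x}_T)\big)}_{L_T} + \sum_{t>1} \underbrace{D_{\mathrm{KL}}\big(q(\mathbf{x}_{t-1}|\mathbf{x}_t,\mathbf{x}_0) \,\|\, p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)\big)}_{L_{t-1}} \underbrace{- \log p_\theta(\mathbf{x}_0|\mathbf{x}_1)}_{L_0} \bigg]
$$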
Our neural network architecture follows the backbone of PixelCNN++ [52], which is a U-Net [48] based on a Wide ResNet [7...
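The network conditions on the diffusion step $t$ through a Transformer-style sinusoidal embedding fed into each residual block. A minimal NumPy sketch of that embedding, under the frequency convention we use (assumes an even `dim` of at least 4):

```python
import numpy as np

def timestep_embedding(t, dim):
    """Sinusoidal embedding of the diffusion step t, as in the Transformer
    position encoding: half sine channels, half cosine channels."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / (half - 1))
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])
```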
Our model architecture, forward process definition, and prior differ from NCSN [55, 56] in subtle but important ways tha...
Additional samples: Figures 11, 13, 16, 17, 18, and 19 show uncurated samples from the diffusion models trained on CelebA...