Implementing Variational Autoencoders in Keras: Beyond the Quickstart Tutorial

Keras is awesome. It is a very well-designed library that clearly abides by its guiding principles of modularity and extensibility, enabling us to easily assemble powerful, complex models from primitive building blocks. This has been demonstrated in numerous blog posts and tutorials, in particular, the excellent tutorial on Building Autoencoders in Keras. As the name suggests, that tutorial provides examples of how to implement various kinds of autoencoders in Keras, including the variational autoencoder (VAE) [1].

[Figure: Visualization of the 2D manifold of MNIST digits (left) and the representation of digits in latent space, colored according to their digit labels (right).]

Like all autoencoders, the variational autoencoder is primarily used for unsupervised learning of hidden representations. However, VAEs are fundamentally different from the usual neural network-based autoencoders in that they approach the problem from a probabilistic perspective: they specify a joint distribution over the observed and latent variables, and approximate the intractable posterior density over the latent variables with variational inference, using an inference network [2] [3] (or, more classically, a recognition model [4]) to amortize the cost of inference.
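To make the amortized-inference idea concrete, the sketch below shows the ingredient that distinguishes a VAE encoder from an ordinary one in Keras: an inference network that outputs a mean and log-variance, and a Lambda layer implementing the reparameterization trick z = mu + sigma * eps. This is a minimal illustrative sketch, not the implementation from the tutorial or from this post; the layer sizes and names (original_dim, the 256-unit hidden layer, latent_dim) are assumptions.

import keras.backend as K

from keras.layers import Input, Dense, Lambda
from keras.models import Model

original_dim = 784   # e.g. flattened 28x28 MNIST digits (assumed)
latent_dim = 2       # dimensionality of the latent space (assumed)

# inference (recognition) network: x -> q(z|x) = N(mu(x), diag(sigma^2(x)))
x = Input(shape=(original_dim,))
h = Dense(256, activation='relu')(x)
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

def sampling(args):
    # reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)
    z_mean, z_log_var = args
    eps = K.random_normal(shape=K.shape(z_mean))
    return z_mean + K.exp(.5 * z_log_var) * eps

z = Lambda(sampling)([z_mean, z_log_var])

encoder = Model(x, z)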

Read more…

Working with Samples of Distributions over Convolutional Kernels

In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
In [2]:
import numpy as np
import tensorflow as tf

import matplotlib.pyplot as plt

from tensorflow.examples.tutorials.mnist import input_data as mnist_data
In [3]:
tf.__version__
Out[3]:
'1.2.1'
In [4]:
sess = tf.InteractiveSession()
In [5]:
mnist = mnist_data.read_data_sets("/home/tiao/Desktop/MNIST")
Extracting /home/tiao/Desktop/MNIST/train-images-idx3-ubyte.gz
Extracting /home/tiao/Desktop/MNIST/train-labels-idx1-ubyte.gz
Extracting /home/tiao/Desktop/MNIST/t10k-images-idx3-ubyte.gz
Extracting /home/tiao/Desktop/MNIST/t10k-labels-idx1-ubyte.gz
In [6]:
# 50 single-channel (grayscale) 28x28 images
x = mnist.train.images[:50].reshape(-1, 28, 28, 1)
x.shape
Out[6]:
(50, 28, 28, 1)
In [7]:
fig, ax = plt.subplots(figsize=(5, 5))

# showing an arbitrarily chosen image
ax.imshow(np.squeeze(x[5], axis=-1), cmap='gray')

plt.show()

Standard 2D Convolution with conv2d

In [8]:
# 32 kernels of size 5x5x1
kernel = tf.truncated_normal([5, 5, 1, 32], stddev=0.1)
kernel.get_shape().as_list()
Out[8]:
[5, 5, 1, 32]
In [9]:
x_conved = tf.nn.conv2d(x, kernel, 
                        strides=[1, 1, 1, 1], 
                        padding='SAME')
x_conved.get_shape().as_list()
Out[9]:
[50, 28, 28, 32]
In [10]:
x_conved[5, ..., 0].eval().shape
Out[10]:
(28, 28)
In [11]:
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(9, 4))

# kernel is a random op, so each .eval() draws a fresh sample;
# fetch the filter and the convolved image in a single run so they match
kernel_val, x_conved_val = sess.run([kernel[..., 0, 0], x_conved[5, ..., 0]])

# showing what the 0th filter looks like
ax1.imshow(kernel_val, cmap='gray')

# show the previous arbitrarily chosen image
# convolved with the 0th filter
ax2.imshow(x_conved_val, cmap='gray')

plt.show()

Sample from a Distribution over Kernels

In [12]:
# 8 samples of 32 kernels, each of size 5x5x1
kernels = tf.truncated_normal([8, 5, 5, 1, 32], stddev=0.1)
kernels.get_shape().as_list()
Out[12]:
[8, 5, 5, 1, 32]

Approach 1: Map over samples with conv2d

In [13]:
# replicate the batch of images once for each of the 8 kernel samples
x_tiled = tf.tile(tf.expand_dims(x, 0), [8, 1, 1, 1, 1])
x_tiled.get_shape().as_list()
Out[13]:
[8, 50, 28, 28, 1]
In [19]:
tf.nn.conv2d(x_tiled[0], kernels[0], 
             strides=[1, 1, 1, 1], 
             padding='SAME').get_shape().as_list()
Out[19]:
[50, 28, 28, 32]
In [15]:
# map conv2d over the leading "sample" axis of (x_tiled, kernels)
x_conved1 = tf.map_fn(lambda args: tf.nn.conv2d(*args, strides=[1, 1, 1, 1], padding='SAME'),
                      elems=(x_tiled, kernels), dtype=tf.float32)
x_conved1.get_shape().as_list()
Out[15]:
[8, 50, 28, 28, 32]

Approach 2: Flattening

In [16]:
# move the sample axis last and fold it into the output-channel axis
kernels_flat = tf.reshape(tf.transpose(kernels, 
                                       perm=(1, 2, 3, 4, 0)), 
                          shape=(5, 5, 1, 32*8))
kernels_flat.get_shape().as_list()
Out[16]:
[5, 5, 1, 256]
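To see how this flattened bank is laid out: the transpose puts the sample axis last, so after the row-major reshape, output channel f*8 + s of kernels_flat holds filter f from kernel sample s. A quick sanity check of that bookkeeping (an added check, not part of the original notebook), here for sample 3 and filter 5:

# channel 5*8 + 3 of the flattened bank should be filter 5 of kernel sample 3
tf.reduce_all(tf.equal(kernels_flat[..., 5*8 + 3], kernels[3, ..., 5])).eval()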
In [17]:
# convolve once with all 8*32 filters, then split the channels back into
# (filters, samples) and move the sample axis to the front
x_conved2 = tf.transpose(tf.reshape(tf.nn.conv2d(x, kernels_flat, 
                                                 strides=[1, 1, 1, 1], 
                                                 padding='SAME'), 
                                    shape=(50, 28, 28, 32, 8)), 
                         perm=(4, 0, 1, 2, 3))
x_conved2.get_shape().as_list()
Out[17]:
[8, 50, 28, 28, 32]
In [18]:
tf.reduce_all(tf.equal(x_conved1, x_conved2)).eval()
Out[18]:
True
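Exact elementwise equality happens to hold here, but when comparing two different orderings of the same floating-point computation, a tolerance-based check is generally the safer default. A small alternative check (not from the original notebook):

# largest absolute discrepancy between the two approaches (should be ~0)
tf.reduce_max(tf.abs(x_conved1 - x_conved2)).eval()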

Variational Inference with Implicit Approximate Inference Models - @fhuszar's Explaining Away Example Pt. 1 (WIP)

In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
In [70]:
import numpy as np
import keras.backend as K

import matplotlib.pyplot as plt
import seaborn as sns

from scipy.stats import logistic, multivariate_normal, norm
from scipy.special import expit

from keras.models import Model, Sequential
from keras.layers import Activation, Add, Dense, Dot, Input
from keras.optimizers import Adam
from keras.utils.vis_utils import model_to_dot

from mpl_toolkits.mplot3d import Axes3D
from matplotlib.animation import FuncAnimation

from IPython.display import HTML, SVG, display_html
from tqdm import tnrange, tqdm_notebook
In [3]:
# display animation inline
plt.rc('animation', html='html5')
plt.style.use('seaborn-notebook')
sns.set_context('notebook')
In [4]:
np.set_printoptions(precision=2,
                    edgeitems=3,
                    linewidth=80,
                    suppress=True)
In [5]:
K.tf.__version__
Out[5]:
'1.2.1'
In [6]:
LATENT_DIM = 2
NOISE_DIM = 3
BATCH_SIZE = 200
PRIOR_VARIANCE = 2.
LEARNING_RATE = 3e-3
PRETRAIN_EPOCHS = 60

Bayesian Logistic Regression (Synthetic Data)

In [7]:
z_min, z_max = -5, 5
In [8]:
z1, z2 = np.mgrid[z_min:z_max:300j, z_min:z_max:300j]
In [9]:
z_grid = np.dstack((z1, z2))
z_grid.shape
Out[9]:
(300, 300, 2)
In [10]:
prior = multivariate_normal(mean=np.zeros(LATENT_DIM), 
                            cov=PRIOR_VARIANCE)
In [11]:
log_prior = prior.logpdf(z_grid)
log_prior.shape
Out[11]:
(300, 300)
In [13]:
np.allclose(log_prior, 
            -.5*np.sum(z_grid**2, axis=2)/PRIOR_VARIANCE \
            -np.log(2*np.pi*PRIOR_VARIANCE))
Out[13]:
True
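The closed form being checked above is simply the log-density of a d-dimensional isotropic Gaussian with covariance $\sigma^2 I$, here with $d = 2$ and $\sigma^2 = $ PRIOR_VARIANCE:

$$
\log \mathcal{N}(\mathbf{z} \mid \mathbf{0}, \sigma^2 I)
= -\frac{1}{2\sigma^2} \lVert \mathbf{z} \rVert^2 - \frac{d}{2} \log(2\pi\sigma^2)
\;\overset{d=2}{=}\;
-\frac{\lVert \mathbf{z} \rVert^2}{2\sigma^2} - \log(2\pi\sigma^2).
$$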
In [15]:
fig, ax = plt.subplots(figsize=(5, 5))

ax.contourf(z1, z2, log_prior, cmap='magma')

ax.set_xlabel('$z_1$')
ax.set_ylabel('$z_2$')

ax.set_xlim(z_min, z_max)
ax.set_ylim(z_min, z_max)

plt.show()
In [16]:
x = np.array([0, 5, 8, 12, 50])
In [37]:
def log_likelihood(z, x, beta_0=3., beta_1=1.):
    # exponential likelihood with mean beta(z) = beta_0 + beta_1 * sum_k max(0, z_k^3)
    beta = beta_0 + np.sum(beta_1*np.maximum(0, z**3), axis=-1)
    return -np.log(beta) - x/beta
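Reading this off the code, it is the log-density of an exponential likelihood whose mean $\beta(\mathbf{z})$ is a nonlinear function of the latent variables:

$$
p(x \mid \mathbf{z}) = \frac{1}{\beta(\mathbf{z})} \exp\!\left(-\frac{x}{\beta(\mathbf{z})}\right),
\qquad
\beta(\mathbf{z}) = \beta_0 + \beta_1 \sum_k \max(0, z_k^3),
$$

so that $\log p(x \mid \mathbf{z}) = -\log \beta(\mathbf{z}) - x/\beta(\mathbf{z})$.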
In [44]:
llhs = log_likelihood(z_grid, x.reshape(-1, 1, 1))
llhs.shape
Out[44]:
(5, 300, 300)
In [59]:
fig, axes = plt.subplots(ncols=len(x), nrows=1, figsize=(20, 4))
fig.tight_layout()

for i, ax in enumerate(axes):

    ax.contourf(z1, z2, llhs[i], cmap='magma')

    ax.set_xlim(z_min, z_max)
    ax.set_ylim(z_min, z_max)

    ax.set_title(r'$p(x = {{{0}}} \mid z)$'.format(x[i]))
    ax.set_xlabel('$z_1$')

    if not i:
        ax.set_ylabel('$z_2$')

plt.show()
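Since the observations are conditionally independent given z, the (unnormalized) log-posterior on the grid is just the log-prior plus the summed log-likelihoods. A brief sketch, using only the quantities defined above, of how one might visualize the inference target that the variational approximation will later be fitted to:

# unnormalized log-posterior: log p(z) + sum_n log p(x_n | z), on the grid
log_posterior_unnorm = log_prior + llhs.sum(axis=0)

fig, ax = plt.subplots(figsize=(5, 5))

ax.contourf(z1, z2, log_posterior_unnorm, cmap='magma')

ax.set_xlabel('$z_1$')
ax.set_ylabel('$z_2$')

ax.set_xlim(z_min, z_max)
ax.set_ylim(z_min, z_max)

plt.show()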