Visualizing and Animating Optimization Algorithms with Matplotlib

In this series of notebooks, we demonstrate some useful patterns and recipes for visualizing and animating optimization algorithms using Matplotlib.

In [1]:
%matplotlib inline
In [2]:
import matplotlib.pyplot as plt
import autograd.numpy as np

from mpl_toolkits.mplot3d import Axes3D
from matplotlib.colors import LogNorm
from matplotlib import animation
from IPython.display import HTML

from autograd import elementwise_grad, value_and_grad
from scipy.optimize import minimize
from collections import defaultdict
from itertools import zip_longest
from functools import partial

We shall restrict our attention to 3-dimensional problems for now (i.e. optimizing over only 2 parameters), though what follows can be extended to higher dimensions by plotting all pairs of parameters against each other, effectively projecting the problem into 3 dimensions (see the sketch below).
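Such pairwise panels are straightforward to generate with itertools.combinations. Below is a minimal sketch, assuming a path array of shape (n_params, n_steps); plot_pairwise_path is a hypothetical helper of our own and is not used in the rest of this notebook.

from itertools import combinations

def plot_pairwise_path(path, names=None):
    """Plot a (n_params, n_steps) optimization path projected onto every parameter pair."""
    pairs = list(combinations(range(path.shape[0]), 2))
    fig, axes = plt.subplots(1, len(pairs), figsize=(4 * len(pairs), 4), squeeze=False)
    for ax, (i, j) in zip(axes[0], pairs):
        ax.plot(path[i], path[j], 'b.-')  # projection of the path onto parameters i and j
        if names is not None:
            ax.set_xlabel(names[i])
            ax.set_ylabel(names[j])
    plt.show()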

The Wikipedia article on Test functions for optimization has a few functions that are useful for evaluating optimization algorithms. In particular, we shall look at Beale's function:

$$ f(x, y) = (1.5 - x + xy)^2 + (2.25 - x + xy^2)^2 + (2.625 - x + xy^3)^2 $$
In [3]:
f = lambda x, y: (1.5 - x + x*y)**2 + (2.25 - x + x*y**2)**2 + (2.625 - x + x*y**3)**2
In [4]:
xmin, xmax, xstep = -4.5, 4.5, .2
ymin, ymax, ystep = -4.5, 4.5, .2
In [5]:
x, y = np.meshgrid(np.arange(xmin, xmax + xstep, xstep), np.arange(ymin, ymax + ystep, ystep))
In [6]:
z = f(x, y)

We know the global minimum is at $(3, 0.5)$

In [7]:
minima = np.array([3., .5])
In [8]:
f(*minima)
Out[8]:
0.0
In [9]:
minima_ = minima.reshape(-1, 1)  # as a column vector, so unpacking with * yields array arguments for the plotting calls
minima_
Out[9]:
array([[ 3. ],
       [ 0.5]])
In [10]:
f(*minima_)
Out[10]:
array([ 0.])

3D Surface Plot

In [11]:
fig = plt.figure(figsize=(8, 5))
ax = plt.axes(projection='3d', elev=50, azim=-50)

ax.plot_surface(x, y, z, norm=LogNorm(), rstride=1, cstride=1, 
                edgecolor='none', alpha=.8, cmap=plt.cm.jet)
ax.plot(*minima_, f(*minima_), 'r*', markersize=10)

ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
ax.set_zlabel('$z$')

ax.set_xlim((xmin, xmax))
ax.set_ylim((ymin, ymax))

plt.show()

2D Contour Plot and Gradient Vector Field

We use autograd to compute the gradient vector field, and plot it with Matplotlib's quiver method.
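For reference, the partial derivatives that autograd computes here have a simple closed form, obtained by the chain rule:

$$ \frac{\partial f}{\partial x} = 2(1.5 - x + xy)(y - 1) + 2(2.25 - x + xy^2)(y^2 - 1) + 2(2.625 - x + xy^3)(y^3 - 1) $$
$$ \frac{\partial f}{\partial y} = 2(1.5 - x + xy)\,x + 2(2.25 - x + xy^2)\,2xy + 2(2.625 - x + xy^3)\,3xy^2 $$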

In [12]:
dz_dx = elementwise_grad(f, argnum=0)(x, y)
dz_dy = elementwise_grad(f, argnum=1)(x, y)
In [13]:
fig, ax = plt.subplots(figsize=(10, 6))

ax.contour(x, y, z, levels=np.logspace(0, 5, 35), norm=LogNorm(), cmap=plt.cm.jet)
ax.quiver(x, y, x - dz_dx, y - dz_dy, alpha=.5)
ax.plot(*minima_, 'r*', markersize=18)

ax.set_xlabel('$x$')
ax.set_ylabel('$y$')

ax.set_xlim((xmin, xmax))
ax.set_ylim((ymin, ymax))

plt.show()

Gradient-based Optimization

For the purposes of this demonstration, we use SciPy's optimization methods. The code and patterns for producing these visualizations generalize easily to other optimization tools and libraries, as the sketch below suggests.
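To make that concrete, here is a minimal sketch of a hand-rolled gradient descent that records its iterates in the same (2, n_steps) layout used throughout this notebook; gradient_descent, the learning rate, and the iteration count are illustrative choices of ours, not part of SciPy or the demonstration below.

from autograd import grad

def gradient_descent(fun, start, lr=1e-3, n_iter=100):
    # differentiate fun(x, y) with respect to the packed parameter vector
    df = grad(lambda args: fun(*args))
    path = [np.asarray(start, dtype=float)]
    for _ in range(n_iter):
        path.append(path[-1] - lr * df(path[-1]))  # plain gradient descent step
    return np.array(path).T  # shape (2, n_iter + 1)

Any optimizer that can produce such an array can reuse the plotting and animation code below unchanged.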

We set the starting point to $(3, 4)$, since it is challenging for algorithms with a little too much momentum in the gradient descent update rule: they may overshoot and end up at a local minimum.

In [14]:
x0 = np.array([3., 4.])

Again, we use autograd to compute the gradients, and augment the function slightly to adhere to SciPy's optimization interface: value_and_grad returns a (value, gradient) pair, which minimize consumes when we pass jac=True.

In [15]:
func = value_and_grad(lambda args: f(*args))

The first method we use is Newton-CG, with the callback simply set to print. Note that we can only do this in Python 3, where print is an actual function.

In [16]:
res = minimize(func, x0=x0, method='Newton-CG',
               jac=True, tol=1e-20, callback=print)
[ 2.71113991  3.35161828]
[ 2.48008912  2.78955116]
[ 2.29965866  2.30123678]
[ 2.16373347  1.8756312 ]
[ 2.06741079  1.50235414]
[ 2.00766238  1.17079384]
[ 1.98485905  0.86972447]
[ 2.00511126  0.59071489]
[ 2.07692544  0.34891823]
[ 2.17857778  0.21644485]
[ 2.55966682  0.38003383]
[ 2.80228089  0.44954972]
[ 2.94477854  0.48765376]
[ 2.94564749  0.48601427]
[ 2.95359059  0.48810805]
[ 2.97113927  0.49269804]
[ 2.99870879  0.49976069]
[ 2.99999481  0.49999876]
[ 3.00000001  0.49999999]
[ 3.   0.5]
[ 3.   0.5]
In [17]:
dict(res)
Out[17]:
{'fun': 2.5770606809684326e-18,
 'jac': array([  8.13600382e-10,   1.86117137e-09]),
 'message': 'Optimization terminated successfully.',
 'nfev': 22,
 'nhev': 0,
 'nit': 21,
 'njev': 104,
 'status': 0,
 'success': True,
 'x': array([ 3. ,  0.5])}

The results look plausibly good, but would be more convincing with some visualization. Let us define a new callback function that appends the intermediate values to a list instead of simply printing them.

In [18]:
def make_minimize_cb(path=[]):
    # the mutable default argument is shared across calls, so we always pass a list in explicitly

    def minimize_cb(xk):
        # append a copy of xk, since the optimizer may reuse the array between iterations
        path.append(np.copy(xk))

    return minimize_cb

We initialize the list with the starting value.

In [19]:
path_ = [x0]
In [20]:
res = minimize(func, x0=x0, method='Newton-CG',
               jac=True, tol=1e-20, callback=make_minimize_cb(path_))
In [21]:
dict(res)
Out[21]:
{'fun': 2.5770606809684326e-18,
 'jac': array([  8.13600382e-10,   1.86117137e-09]),
 'message': 'Optimization terminated successfully.',
 'nfev': 22,
 'nhev': 0,
 'nit': 21,
 'njev': 104,
 'status': 0,
 'success': True,
 'x': array([ 3. ,  0.5])}

We cast the list to a NumPy array and transpose it so it's easier and more natural to work with.

In [22]:
path = np.array(path_).T
path.shape
Out[22]:
(2, 22)

Static Quiver Plot of Path on 2D Contour Plot

Again using the quiver method, but in a slightly different way than before, we can represent each step, its length and its direction, with arrows.

In [23]:
fig, ax = plt.subplots(figsize=(10, 6))

ax.contour(x, y, z, levels=np.logspace(0, 5, 35), norm=LogNorm(), cmap=plt.cm.jet)
ax.quiver(path[0,:-1], path[1,:-1], path[0,1:]-path[0,:-1], path[1,1:]-path[1,:-1], scale_units='xy', angles='xy', scale=1, color='k')
ax.plot(*minima_, 'r*', markersize=18)

ax.set_xlabel('$x$')
ax.set_ylabel('$y$')

ax.set_xlim((xmin, xmax))
ax.set_ylim((ymin, ymax))
Out[23]:
(-4.5, 4.5)

Static Quiver Plot of Path on 3D Surface Plot

However, this is slightly less useful when plotted against a 3D surface plot...

In [24]:
fig = plt.figure(figsize=(8, 5))
ax = plt.axes(projection='3d', elev=50, azim=-50)

ax.plot_surface(x, y, z, norm=LogNorm(), rstride=1, cstride=1, edgecolor='none', alpha=.8, cmap=plt.cm.jet)
ax.quiver(path[0, :-1], path[1, :-1], f(*path[:, :-1]),
          path[0, 1:] - path[0, :-1], path[1, 1:] - path[1, :-1],
          f(*path[:, 1:]) - f(*path[:, :-1]),  # change in f between successive steps
          color='k')
ax.plot(*minima_, f(*minima_), 'r*', markersize=10)

ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
ax.set_zlabel('$z$')

ax.set_xlim((xmin, xmax))
ax.set_ylim((ymin, ymax))
Out[24]:
(-4.5, 4.5)

Animating Single Path on 2D Contour Plot

We can also animate the trajectory of the optimization algorithm using the excellent FuncAnimation class. First we draw the 2D contour plot as we did before, and initialize the line and point (which are Line2D objects). Guides on how to use the FuncAnimation class can be found in tutorials such as Jake Vanderplas' Matplotlib Animation Tutorial, so we will not go into too much detail here.

In [25]:
fig, ax = plt.subplots(figsize=(10, 6))

ax.contour(x, y, z, levels=np.logspace(0, 5, 35), norm=LogNorm(), cmap=plt.cm.jet)
ax.plot(*minima_, 'r*', markersize=18)

line, = ax.plot([], [], 'b', label='Newton-CG', lw=2)
point, = ax.plot([], [], 'bo')

ax.set_xlabel('$x$')
ax.set_ylabel('$y$')

ax.set_xlim((xmin, xmax))
ax.set_ylim((ymin, ymax))

ax.legend(loc='upper left')
Out[25]:
<matplotlib.legend.Legend at 0x113d9fd30>
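The next step is to wire this figure into FuncAnimation. Below is a minimal sketch of the init and animate callbacks, assuming the line, point, and path objects defined above; the frame timings (interval=60, repeat_delay=5) are arbitrary choices.

def init():
    line.set_data([], [])
    point.set_data([], [])
    return line, point

def animate(i):
    # draw the path up to frame i and mark the current position
    line.set_data(path[0, :i], path[1, :i])
    point.set_data(path[0, i-1:i], path[1, i-1:i])
    return line, point

anim = animation.FuncAnimation(fig, animate, init_func=init,
                               frames=path.shape[1], interval=60,
                               repeat_delay=5, blit=True)

HTML(anim.to_html5_video())  # rendering to HTML5 video requires ffmpeg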