# Visualizing and Animating Optimization Algorithms with Matplotlib

In this series of notebooks, we demonstrate some useful patterns and recipes for visualizing and animating optimization algorithms with Matplotlib.

In :
%matplotlib inline

In :
import matplotlib.pyplot as plt
import autograd.numpy as np

from mpl_toolkits.mplot3d import Axes3D
from matplotlib.colors import LogNorm
from matplotlib import animation
from IPython.display import HTML

from autograd import elementwise_grad, value_and_grad
from scipy.optimize import minimize
from collections import defaultdict
from itertools import zip_longest
from functools import partial


We shall restrict our attention to 3-dimensional problems for now (i.e. optimizing over only 2 parameters), though what follows can be extended to higher dimensions by plotting all pairs of parameters against each other, effectively projecting the problem into three dimensions.
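The pairwise projection idea might be sketched as follows. This is a minimal illustration, not part of the notebook proper: the 4-parameter function `g` and the reference values for the held-fixed parameters are hypothetical stand-ins.

```python
import numpy as np
from itertools import combinations

# Hypothetical 4-parameter objective, used only to illustrate the idea.
def g(w):
    return np.sum((w - np.array([1., 2., 3., 4.]))**2)

ref = np.zeros(4)  # values at which the non-plotted parameters are fixed
grids = {}
for i, j in combinations(range(4), 2):
    # Evaluate g over a grid in the (i, j) plane, other parameters fixed.
    u = np.linspace(-5, 5, 11)
    U, V = np.meshgrid(u, u)
    Z = np.empty_like(U)
    for a in range(U.shape[0]):
        for b in range(U.shape[1]):
            w = ref.copy()
            w[i], w[j] = U[a, b], V[a, b]
            Z[a, b] = g(w)
    grids[(i, j)] = Z  # each slice can then be contour-plotted as below
```

Each of the six `(i, j)` slices can then be visualized with exactly the same contour and quiver recipes developed in the rest of this notebook.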

The Wikipedia article on Test functions for optimization has a few functions that are useful for evaluating optimization algorithms. In particular, we shall look at Beale's function:

$$f(x, y) = (1.5 - x + xy)^2 + (2.25 - x + xy^2)^2 + (2.625 - x + xy^3)^2$$

In :
f  = lambda x, y: (1.5 - x + x*y)**2 + (2.25 - x + x*y**2)**2 + (2.625 - x + x*y**3)**2

In :
xmin, xmax, xstep = -4.5, 4.5, .2
ymin, ymax, ystep = -4.5, 4.5, .2

In :
x, y = np.meshgrid(np.arange(xmin, xmax + xstep, xstep), np.arange(ymin, ymax + ystep, ystep))

In :
z = f(x, y)


We know the global minimum is at $(3, 0.5)$

In :
minima = np.array([3., .5])

In :
f(*minima)

Out:
0.0
In :
minima_ = minima.reshape(-1, 1)
minima_

Out:
array([[ 3. ],
       [ 0.5]])
In :
f(*minima_)

Out:
array([ 0.])

#### 3D Surface Plot

In :
fig = plt.figure(figsize=(8, 5))
ax = plt.axes(projection='3d', elev=50, azim=-50)

ax.plot_surface(x, y, z, norm=LogNorm(), rstride=1, cstride=1,
                edgecolor='none', alpha=.8, cmap=plt.cm.jet)
ax.plot(*minima_, f(*minima_), 'r*', markersize=10)

ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
ax.set_zlabel('$z$')

ax.set_xlim((xmin, xmax))
ax.set_ylim((ymin, ymax))

plt.show()

#### 2D Contour Plot and Gradient Vector Field

We use autograd to compute the gradient vector field, and plot it with Matplotlib's quiver method.

In :
dz_dx = elementwise_grad(f, argnum=0)(x, y)
dz_dy = elementwise_grad(f, argnum=1)(x, y)

In :
fig, ax = plt.subplots(figsize=(10, 6))

ax.contour(x, y, z, levels=np.logspace(0, 5, 35), norm=LogNorm(), cmap=plt.cm.jet)
ax.quiver(x, y, x - dz_dx, y - dz_dy, alpha=.5)
ax.plot(*minima_, 'r*', markersize=18)

ax.set_xlabel('$x$')
ax.set_ylabel('$y$')

ax.set_xlim((xmin, xmax))
ax.set_ylim((ymin, ymax))

plt.show()

For the purposes of this demonstration, we use SciPy's optimization methods. It goes without saying that the code and patterns for producing these visualizations generalize easily to other optimization tools and libraries.

We set the starting point as $(3, 4)$, since it is challenging for algorithms with a little too much momentum in the gradient descent update rule, as they may overshoot and end up in a local minimum.

In :
x0 = np.array([3., 4.])


Again, we use autograd to compute the gradients, and augment the function slightly to adhere to SciPy's optimization interface.

In :
func = value_and_grad(lambda args: f(*args))


The first method we use is Newton-CG, with the callback set simply to print. Note that we can only do this in Python 3, where print is an actual function.

In :
res = minimize(func, x0=x0, method='Newton-CG',
               jac=True, tol=1e-20, callback=print)

[ 2.71113991  3.35161828]
[ 2.48008912  2.78955116]
[ 2.29965866  2.30123678]
[ 2.16373347  1.8756312 ]
[ 2.06741079  1.50235414]
[ 2.00766238  1.17079384]
[ 1.98485905  0.86972447]
[ 2.00511126  0.59071489]
[ 2.07692544  0.34891823]
[ 2.17857778  0.21644485]
[ 2.55966682  0.38003383]
[ 2.80228089  0.44954972]
[ 2.94477854  0.48765376]
[ 2.94564749  0.48601427]
[ 2.95359059  0.48810805]
[ 2.97113927  0.49269804]
[ 2.99870879  0.49976069]
[ 2.99999481  0.49999876]
[ 3.00000001  0.49999999]
[ 3.   0.5]
[ 3.   0.5]

In :
dict(res)

Out:
{'fun': 2.5770606809684326e-18,
'jac': array([  8.13600382e-10,   1.86117137e-09]),
'message': 'Optimization terminated successfully.',
'nfev': 22,
'nhev': 0,
'nit': 21,
'njev': 104,
'status': 0,
'success': True,
'x': array([ 3. ,  0.5])}

The results look plausibly good, but would be more convincing with some visualization. Let us define a new callback function that appends the intermediate values to a list instead of simply printing them.

In :
def make_minimize_cb(path=[]):

    def minimize_cb(xk):
        # note that we make a copy of xk, since the
        # optimizer may modify the same array in place
        path.append(np.copy(xk))

    return minimize_cb


We initialize the list with the starting value.

In :
path_ = [x0]

In :
res = minimize(func, x0=x0, method='Newton-CG',
               jac=True, tol=1e-20, callback=make_minimize_cb(path_))

In :
dict(res)

Out:
{'fun': 2.5770606809684326e-18,
'jac': array([  8.13600382e-10,   1.86117137e-09]),
'message': 'Optimization terminated successfully.',
'nfev': 22,
'nhev': 0,
'nit': 21,
'njev': 104,
'status': 0,
'success': True,
'x': array([ 3. ,  0.5])}

We cast the list to a NumPy array and transpose it so it's easier and more natural to work with.

In :
path = np.array(path_).T
path.shape

Out:
(2, 22)

#### Static Quiver Plot of Path on 2D Contour Plot

Again using the quiver method, but in a slightly different way than before, we can represent each step, its length, and its direction with arrows.
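The arrow components passed to quiver are just successive differences along the path. As a small standalone check of that arithmetic (toy path values, chosen only for illustration):

```python
import numpy as np

# A path of shape (2, n): row 0 holds x-coordinates, row 1 holds y-coordinates.
path = np.array([[3.0, 2.7, 2.5],
                 [4.0, 3.35, 2.8]])

steps = path[:, 1:] - path[:, :-1]       # per-step displacement vectors, (2, n-1)
lengths = np.linalg.norm(steps, axis=0)  # Euclidean length of each step
```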

In :
fig, ax = plt.subplots(figsize=(10, 6))

ax.contour(x, y, z, levels=np.logspace(0, 5, 35), norm=LogNorm(), cmap=plt.cm.jet)
ax.quiver(path[0,:-1], path[1,:-1], path[0,1:]-path[0,:-1], path[1,1:]-path[1,:-1], scale_units='xy', angles='xy', scale=1, color='k')
ax.plot(*minima_, 'r*', markersize=18)

ax.set_xlabel('$x$')
ax.set_ylabel('$y$')

ax.set_xlim((xmin, xmax))
ax.set_ylim((ymin, ymax))

Out:
(-4.5, 4.5)

#### Static Quiver Plot of Path on 3D Surface Plot

However, this is slightly less useful when plotted against a 3D surface plot...

In :
fig = plt.figure(figsize=(8, 5))
ax = plt.axes(projection='3d', elev=50, azim=-50)

ax.plot_surface(x, y, z, norm=LogNorm(), rstride=1, cstride=1, edgecolor='none', alpha=.8, cmap=plt.cm.jet)
ax.quiver(path[0,:-1], path[1,:-1], f(*path[::,:-1]),
          path[0,1:]-path[0,:-1], path[1,1:]-path[1,:-1],
          f(*path[::,1:]) - f(*path[::,:-1]),  # change in z along each step
          color='k')
ax.plot(*minima_, f(*minima_), 'r*', markersize=10)

ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
ax.set_zlabel('$z$')

ax.set_xlim((xmin, xmax))
ax.set_ylim((ymin, ymax))

Out:
(-4.5, 4.5)

#### Animating Single Path on 2D Contour Plot

We can also animate the trajectory of the optimization algorithm using the excellent FuncAnimation class. First we draw the 2D contour plot as we did before, and initialize the line and point (which are Line2D objects). Guides on how to use the FuncAnimation class can be found in tutorials such as Jake Vanderplas' Matplotlib Animation Tutorial, so we will not go into too much detail here.

In :
fig, ax = plt.subplots(figsize=(10, 6))

ax.contour(x, y, z, levels=np.logspace(0, 5, 35), norm=LogNorm(), cmap=plt.cm.jet)
ax.plot(*minima_, 'r*', markersize=18)

line, = ax.plot([], [], 'b', label='Newton-CG', lw=2)
point, = ax.plot([], [], 'bo')

ax.set_xlabel('$x$')
ax.set_ylabel('$y$')

ax.set_xlim((xmin, xmax))
ax.set_ylim((ymin, ymax))

ax.legend(loc='upper left')

Out:
<matplotlib.legend.Legend at 0x113d9fd30>
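The animation itself might then be driven along these lines: an `init` function that clears the artists and an `animate` function that reveals one more step of the path per frame. This is a sketch; the minimal `fig`, `line`, `point`, and `path` objects below are stand-ins for the ones created in the cells above, and the `interval` and `repeat_delay` values are arbitrary.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from matplotlib import animation

# Stand-ins for the fig, line, point, and path objects created above.
fig, ax = plt.subplots()
line, = ax.plot([], [], 'b', lw=2)
point, = ax.plot([], [], 'bo')
path = np.array([[3.0, 2.7, 2.5, 3.0],
                 [4.0, 3.35, 2.8, 0.5]])

def init():
    line.set_data([], [])
    point.set_data([], [])
    return line, point

def animate(i):
    # reveal one more step of the path on each frame
    line.set_data(*path[::, :i])
    point.set_data(*path[::, i-1:i])
    return line, point

anim = animation.FuncAnimation(fig, animate, init_func=init,
                               frames=path.shape[1], interval=60,
                               repeat_delay=5, blit=True)
```

Since `init` and `animate` return the artists they modified, `blit=True` lets FuncAnimation redraw only those artists rather than the whole figure. The resulting `anim` can be displayed inline with `HTML(anim.to_html5_video())`.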