Computing Jacobians

When computing Jacobians by hand, it is common to use the so-called “index notation”. However, directly using the definition of total derivative can be faster.

What is the Jacobian?

Let $f \colon \R^n \to \R^m$ be an arbitrary function

$$f(\bm{x}) = \begin{bmatrix} f_1(\bm{x}) \\ \vdots \\ f_m(\bm{x}) \end{bmatrix},$$

where $f_i \colon \R^n \to \R$. We define its Jacobian matrix at a point as follows:

$$J = \begin{bmatrix} \rule[0.5ex]{5mm}{0.1pt} & \nabla f_1 & \rule[0.5ex]{5mm}{0.1pt} \\ & \vdots & \\ \rule[0.5ex]{5mm}{0.1pt} & \nabla f_m & \rule[0.5ex]{5mm}{0.1pt} \\ \end{bmatrix} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_n} \\ & \vdots & \\ \frac{\partial f_m}{\partial x_1} & \dots & \frac{\partial f_m}{\partial x_n} \\ \end{bmatrix} \in \R^{m \times n}.$$
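Before deriving Jacobians by hand, it helps to have a numerical check available. Below is a minimal sketch of a central-difference Jacobian estimator in plain Python (the helper name `numerical_jacobian` is my own choice, not from the text):

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Estimate the m-by-n Jacobian of f at x via central differences.

    `f` maps a list of n floats to a list of m floats.
    """
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(m):
            # Central differences are accurate to O(eps^2).
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J
```

Every Jacobian derived below can be spot-checked against this estimator at a few test points.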

Computing the Jacobian using index notation

Example 1

Let us calculate the Jacobian of $f \colon \R^n \to \R^m$ given by $f(\bm{x}) = A\bm{x}$. Note that $f$ can be expressed as

$$f(\bm{x}) = \begin{bmatrix} A_{11} & \dots & A_{1n} \\ & \vdots & \\ A_{m1} & \dots & A_{mn} \\ \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.$$

Thus, $f_i(\bm{x}) = \sum_{k = 1}^n A_{ik}x_k$. Taking the derivative with respect to $x_j$, we obtain $\frac{\partial f_i}{\partial x_j} = A_{ij}$. Hence, the Jacobian is given by $J = A$.
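As a sanity check, a few lines of Python confirm that every partial $\frac{\partial f_i}{\partial x_j}$ matches $A_{ij}$ numerically; the $2 \times 3$ matrix and the test point below are arbitrary illustrative choices:

```python
# f(x) = A x for a fixed (arbitrary) 2x3 example matrix; the claim is J = A.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

def matvec(A, x):
    # f_i(x) = sum_k A_ik x_k for each row i.
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]

# Central-difference estimate of each partial df_i/dx_j should equal A_ij.
x = [0.5, -1.0, 2.0]
eps = 1e-6
for i in range(2):
    for j in range(3):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        d = (matvec(A, xp)[i] - matvec(A, xm)[i]) / (2 * eps)
        assert abs(d - A[i][j]) < 1e-6
```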

Example 2

Let us now compute the Jacobian of $f \colon \R^n \to \R$ given by $f(\bm{x}) = \bm{x}^T A \bm{x}$. First we will need to do some gymnastics to get a closed form for $f$:

$$\begin{align*} f(\bm{x}) &= \begin{bmatrix} x_1 & \dots & x_n \end{bmatrix} \begin{bmatrix} A_{11} & \dots & A_{1n} \\ & \vdots & \\ A_{n1} & \dots & A_{nn} \\ \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \\ &= \begin{bmatrix} x_1 & \dots & x_n \end{bmatrix} \begin{bmatrix} \sum_{k = 1}^n A_{1k} x_k \\ \vdots \\ \sum_{k = 1}^n A_{nk} x_k \end{bmatrix} \\ &= \sum_{l, k = 1}^n A_{lk} x_l x_k \end{align*}$$

Let us take the derivative with respect to $x_j$:

$$\begin{align*} \frac{\partial f}{\partial x_j} &= \sum_{k \neq j}^n A_{kj} x_k + \sum_{k \neq j}^n A_{jk} x_k + 2A_{jj} x_j \\ &= \sum_{k = 1}^n (A_{kj} + A_{jk}) x_k \end{align*}$$

As we can (not very easily) see, since $f$ is scalar-valued the Jacobian is the row vector whose $j$-th entry is this sum, namely $\bm{x}^T (A + A^T)$.
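This formula is easy to get wrong (e.g. dropping the $A^T$ term when $A$ is not symmetric), so here is a quick numerical check with an arbitrary non-symmetric $2 \times 2$ matrix:

```python
# f(x) = x^T A x written as the double sum from the derivation above.
A = [[1.0, 2.0],
     [3.0, 4.0]]  # deliberately non-symmetric
n = 2

def quad(x):
    return sum(A[l][k] * x[l] * x[k] for l in range(n) for k in range(n))

x = [0.5, -1.5]
# Hand-derived gradient: entry j is sum_k (A_kj + A_jk) x_k, i.e. x^T (A + A^T).
grad = [sum((A[k][j] + A[j][k]) * x[k] for k in range(n)) for j in range(n)]

# Central-difference check of each partial derivative.
eps = 1e-6
for j in range(n):
    xp, xm = list(x), list(x)
    xp[j] += eps
    xm[j] -= eps
    assert abs((quad(xp) - quad(xm)) / (2 * eps) - grad[j]) < 1e-6
```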

Computing the Jacobian using the definition of total derivative

The total derivative of $f$ at a point is the best linear approximation of the function near that point. If $f$ is differentiable, the total derivative is represented exactly by the Jacobian matrix:

$$f(\bm{x} + \bm{h}) = f(\bm{x}) + J(\bm{x}) \bm{h} + o(\lVert \bm{h} \rVert)$$

Thus, we can compute the Jacobian by expressing $f(\bm{x} + \bm{h}) - f(\bm{x})$ as $J(\bm{x}) \bm{h} + o(\lVert \bm{h} \rVert)$. As we will see, this is sometimes significantly faster than using index notation.

Example 1 revisited

For $f(\bm{x}) = A\bm{x}$, we compute

$$f(\bm{x} + \bm{h}) - f(\bm{x}) = A(\bm{x} + \bm{h}) - A\bm{x} = A\bm{h}.$$

Thus, the Jacobian is $A$.
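One nice feature of the linear case: the remainder is not merely $o(\lVert \bm{h} \rVert)$, it is exactly zero. A short check illustrates this (the matrix and vectors below are arbitrary):

```python
# For linear f(x) = A x, f(x+h) - f(x) = A h holds exactly -- no remainder term.
A = [[1.0, 2.0],
     [3.0, 4.0]]

def mv(v):
    return [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]

x, h = [0.5, -1.5], [0.3, 0.7]
lhs = [mv([x[0] + h[0], x[1] + h[1]])[i] - mv(x)[i] for i in range(2)]
rhs = mv(h)
# lhs and rhs agree up to floating-point rounding.
```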

Example 2 revisited

For $f(\bm{x}) = \bm{x}^T A \bm{x}$, we can compute

$$\begin{align*} f(\bm{x} + \bm{h}) - f(\bm{x}) &= (\bm{x} + \bm{h})^T A (\bm{x} + \bm{h}) - \bm{x}^T A \bm{x} \\ &= \bm{x}^T A \bm{h} + \bm{h}^T A \bm{x} + \bm{h}^T A \bm{h} \\ &= \bm{x}^T (A + A^T) \bm{h} + o(\lVert \bm{h} \rVert) \end{align*}$$

We can immediately see that the Jacobian is $\bm{x}^T (A + A^T)$.
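We can also watch the $o(\lVert \bm{h} \rVert)$ remainder at work numerically: for this $f$ the residual $f(\bm{x} + \bm{h}) - f(\bm{x}) - \bm{x}^T (A + A^T) \bm{h}$ equals $\bm{h}^T A \bm{h}$, so halving $\bm{h}$ should roughly quarter it. The matrix, point, and direction below are arbitrary illustrative choices:

```python
A = [[1.0, 2.0],
     [3.0, 4.0]]
x = [0.5, -1.5]

def quad(v):
    # f(v) = v^T A v as a double sum.
    return sum(A[l][k] * v[l] * v[k] for l in range(2) for k in range(2))

def residual(scale):
    # Remainder of the linear approximation along the fixed direction (1, -2).
    h = [scale * 1.0, scale * -2.0]
    jac_h = sum((A[k][j] + A[j][k]) * x[k] * h[j]
                for j in range(2) for k in range(2))
    return abs(quad([x[0] + h[0], x[1] + h[1]]) - quad(x) - jac_h)

r1, r2 = residual(1e-3), residual(5e-4)
# r1 / r2 is close to 4: quadratic decay in ||h||, hence o(||h||).
```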