When computing Jacobians by hand, it is common to use so-called "index notation". However, directly applying the definition of the total derivative can be faster.
What is the Jacobian?
Let $f \colon \R^n \to \R^m$ be an arbitrary function
\[
f(\bm{x}) =
\begin{bmatrix}
f_1(\bm{x}) \\
\vdots \\
f_m(\bm{x})
\end{bmatrix},
\]
where $f_i \colon \R^n \to \R$. We define its Jacobian matrix at a point as follows:
\[
J =
\begin{bmatrix}
\rule[0.5ex]{5mm}{0.1pt} & \nabla f_1 & \rule[0.5ex]{5mm}{0.1pt} \\
& \vdots & \\
\rule[0.5ex]{5mm}{0.1pt} & \nabla f_m & \rule[0.5ex]{5mm}{0.1pt} \\
\end{bmatrix} =
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_n} \\
& \vdots & \\
\frac{\partial f_m}{\partial x_1} & \dots & \frac{\partial f_m}{\partial x_n} \\
\end{bmatrix} \in \R^{m \times n}.
\]
Computing the Jacobian using index notation
Example 1
Let us calculate the Jacobian of $f \colon \R^n \to \R^m$ given by $f(\bm{x}) = A\bm{x}$. Note that $f$ can be expressed as
\[
f(\bm{x}) =
\begin{bmatrix}
A_{11} & \dots & A_{1n} \\
& \vdots & \\
A_{m1} & \dots & A_{mn} \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
\vdots \\
x_n
\end{bmatrix}.
\]
Thus, $f_i(\bm{x}) = \sum_{k = 1}^n A_{ik} x_k$. Taking the derivative with respect to $x_j$, we obtain $\frac{\partial f_i}{\partial x_j} = A_{ij}$. Thus, the Jacobian is given by $J = A$.
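The identity $J = A$ is easy to confirm numerically. A sketch with NumPy; the matrix dimensions, random seed, and test point are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # arbitrary m-by-n matrix
x = rng.standard_normal(4)       # arbitrary test point

f = lambda v: A @ v  # f(x) = Ax

# central-difference Jacobian: one column per coordinate direction
eps = 1e-6
J = np.column_stack([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(4)
])

print(np.allclose(J, A, atol=1e-6))  # the numerical Jacobian matches A
```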
Example 2
Let us now compute the Jacobian of $f \colon \R^n \to \R$ given by $f(\bm{x}) = \bm{x}^T A \bm{x}$. First we will need to do some gymnastics to get a closed form for $f$:
\begin{align*}
f(\bm{x}) &=
\begin{bmatrix}
x_1 & \dots & x_n
\end{bmatrix}
\begin{bmatrix}
A_{11} & \dots & A_{1n} \\
& \vdots & \\
A_{n1} & \dots & A_{nn} \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
\vdots \\
x_n
\end{bmatrix} \\
&=
\begin{bmatrix}
x_1 & \dots & x_n
\end{bmatrix}
\begin{bmatrix}
\sum_{k = 1}^n A_{1k} x_k \\
\vdots \\
\sum_{k = 1}^n A_{nk} x_k
\end{bmatrix} \\
&= \sum_{l, k = 1}^n A_{lk} x_l x_k
\end{align*}
Let us take the derivative with respect to $x_j$, collecting the terms with $k = j$, the terms with $l = j$, and the diagonal term $A_{jj} x_j^2$ (which contributes $2A_{jj} x_j$):
\begin{align*}
\frac{\partial f}{\partial x_j} &= \sum_{k \neq j} A_{kj} x_k + \sum_{k \neq j} A_{jk} x_k + 2A_{jj} x_j \\
&= \sum_{k = 1}^n (A_{kj} + A_{jk}) x_k
\end{align*}
As we can (not very easily) see, the Jacobian — here a row vector, since $f$ is scalar-valued — is given by $\bm{x}^T (A + A^T)$.
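Since this derivation is easy to get wrong, it is worth checking the formula $\bm{x}^T (A + A^T)$ numerically. A sketch with NumPy; the matrix size, seed, and test point are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

f = lambda v: v @ A @ v  # f(x) = x^T A x, a scalar

# numerical gradient (the row Jacobian) via central differences
eps = 1e-6
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(4)])

print(np.allclose(grad, x @ (A + A.T), atol=1e-5))  # matches x^T (A + A^T)
```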
Computing the Jacobian using the definition of total derivative
The total derivative of $f$ at a point is the best linear approximation of the function near that point. If $f$ is differentiable, the matrix of this linear map is exactly the Jacobian:
\[
f(\bm{x} + \bm{h}) = f(\bm{x}) + J(\bm{x}) \bm{h} + o(\lVert \bm{h} \rVert).
\]
Thus, we can compute the Jacobian by expressing $f(\bm{x} + \bm{h}) - f(\bm{x})$ as $J(\bm{x}) \bm{h} + o(\lVert \bm{h} \rVert)$. As we will see, this is sometimes significantly faster than using index notation.
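The $o(\lVert \bm{h} \rVert)$ condition can be made concrete numerically: the remainder of the linear approximation, divided by $\lVert \bm{h} \rVert$, should shrink to zero as $\bm{h}$ does. A sketch using the quadratic form from Example 2 (whose Jacobian $\bm{x}^T (A + A^T)$ we derived above); the random matrix, point, and direction are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)
h = rng.standard_normal(4)  # a fixed direction, scaled by t below

f = lambda v: v @ A @ v  # f(x) = x^T A x
J = x @ (A + A.T)        # Jacobian at x, from Example 2

# remainder of the linear approximation, relative to ||t*h||
ratios = []
for t in [1e-1, 1e-2, 1e-3]:
    r = f(x + t * h) - f(x) - J @ (t * h)
    ratios.append(abs(r) / (t * np.linalg.norm(h)))

print(ratios)  # each ratio is about 10x smaller than the previous one
```

Here the remainder is exactly $t^2 \bm{h}^T A \bm{h}$, so the ratio decreases linearly in $t$, confirming the $o(\lVert \bm{h} \rVert)$ behavior.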
Example 1 revisited
For $f(\bm{x}) = A\bm{x}$, we compute
\[
f(\bm{x} + \bm{h}) - f(\bm{x}) = A(\bm{x} + \bm{h}) - A\bm{x} = A\bm{h}.
\]
Thus, the Jacobian is $A$ (here the remainder term is identically zero, since $f$ is already linear).
Example 2 revisited
For $f(\bm{x}) = \bm{x}^T A \bm{x}$, we can compute
\begin{align*}
f(\bm{x} + \bm{h}) - f(\bm{x}) &= (\bm{x} + \bm{h})^T A (\bm{x} + \bm{h}) - \bm{x}^T A \bm{x} \\
&= \bm{x}^T A \bm{h} + \bm{h}^T A \bm{x} + \bm{h}^T A \bm{h} \\
&= \bm{x}^T (A + A^T) \bm{h} + o(\lVert \bm{h} \rVert),
\end{align*}
where we used that the scalar $\bm{h}^T A \bm{x}$ equals its transpose $\bm{x}^T A^T \bm{h}$, and that $\bm{h}^T A \bm{h} = O(\lVert \bm{h} \rVert^2) = o(\lVert \bm{h} \rVert)$.
We can immediately see that the Jacobian is $\bm{x}^T (A + A^T)$.
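A special case worth noting: when $A$ is symmetric, $A + A^T = 2A$, so the Jacobian reduces to $2\bm{x}^T A$. A numerical sketch (the matrix and test point are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((3, 3))
A = (M + M.T) / 2  # symmetric A, so A + A^T = 2A
x = rng.standard_normal(3)

f = lambda v: v @ A @ v  # f(x) = x^T A x

# numerical gradient via central differences
eps = 1e-6
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)])

print(np.allclose(grad, 2 * A @ x, atol=1e-5))  # matches 2 x^T A
```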