Generalized Gauss-Newton matrix

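Decomposition of the Hessian

Let \(f(\theta)\) be the model output for parameters \(\theta\), and let \(L\) be the loss evaluated on that output. Applying the chain rule and then the product rule to \(L(f(\theta))\) gives the entries of the Hessian: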

\[\begin{align} H_{ij} &= \frac{\partial}{\partial \theta_{j}} \frac{\partial L(f(\theta))}{\partial\theta_{i}}\\ &= \frac{\partial}{\partial \theta_{j}}\left(\sum_{k}\frac{\partial L}{\partial f(\theta)_{k}}\frac{\partial f(\theta)_{k}}{\partial\theta_{i}}\right)\\ &= \sum_{k}\left(\frac{\partial}{\partial \theta_{j}}\left(\frac{\partial L}{\partial f(\theta)_{k}}\right)\frac{\partial f(\theta)_{k}}{\partial \theta_{i}} +\frac{\partial L}{\partial f(\theta)_{k}} \frac{\partial^{2} f(\theta)_{k}}{\partial\theta_{i}\partial\theta_{j}}\right)\\ &= \sum_{k}\left(\sum_{l}\frac{\partial^{2} L}{\partial f(\theta)_{k} \partial f(\theta)_{l}}\frac{\partial f(\theta)_{l}}{\partial \theta_{j}}\frac{\partial f(\theta)_{k}}{\partial \theta_{i}} +\frac{\partial L}{\partial f(\theta)_{k}} \frac{\partial^{2} f(\theta)_{k}}{\partial\theta_{i} \partial \theta_{j}}\right)\\ &= \sum_{k}\sum_{l}\frac{\partial f(\theta)_{k}}{\partial \theta_{i}}\frac{\partial^{2} L}{\partial f(\theta)_{k} \partial f(\theta)_{l}}\frac{\partial f(\theta)_{l}}{\partial \theta_{j}} +\sum_{k}\frac{\partial L}{\partial f(\theta)_{k}} \frac{\partial^{2} f(\theta)_{k}}{\partial\theta_{i}\partial\theta_{j}} \end{align} \]

Written in matrix form, with \(\frac{\partial f}{\partial\theta}\) the Jacobian of the model outputs with respect to the parameters, the result is

\[\begin{equation} H = \frac{\partial f}{\partial\theta}^{T} \frac{\partial^{2} L}{\partial f\partial f} \frac{\partial f}{\partial\theta} + \sum_{k} \frac{\partial L}{\partial f_{k}}\frac{\partial^{2} f_{k}}{\partial\theta\partial\theta} \end{equation} \]
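The decomposition can be checked numerically. The sketch below is only an illustration under assumptions that are not in the text above: a hypothetical two-parameter, two-output toy model `f`, a squared-error loss with arbitrary targets `Y`, and hand-written derivatives. It compares the sum of the two terms against a finite-difference Hessian of the loss.

```python
import math

# Hypothetical toy model (an assumption for illustration): 2 parameters, 2 outputs.
def f(theta):
    a, b = theta
    return [a * b, math.sin(a)]

Y = [0.3, -0.2]  # arbitrary targets, chosen only for this example

def loss(theta):
    # Squared error: L(f) = 0.5 * ||f - Y||^2
    return 0.5 * sum((fk - yk) ** 2 for fk, yk in zip(f(theta), Y))

def jacobian(theta):
    # J[k][i] = d f_k / d theta_i, derived by hand for the toy model
    a, b = theta
    return [[b, a], [math.cos(a), 0.0]]

def model_hessians(theta):
    # D2[k][i][j] = d^2 f_k / (d theta_i d theta_j), derived by hand
    a, _ = theta
    return [[[0.0, 1.0], [1.0, 0.0]],
            [[-math.sin(a), 0.0], [0.0, 0.0]]]

def hessian_terms(theta):
    J = jacobian(theta)
    D2 = model_hessians(theta)
    grad_f = [fk - yk for fk, yk in zip(f(theta), Y)]  # dL/df_k for squared error
    n, m = len(theta), len(grad_f)
    # First term: J^T (d2L/df df) J, where d2L/df df is the identity here
    first = [[sum(J[k][i] * J[k][j] for k in range(m)) for j in range(n)]
             for i in range(n)]
    # Second term: sum_k dL/df_k * d2 f_k / (d theta d theta)
    second = [[sum(grad_f[k] * D2[k][i][j] for k in range(m)) for j in range(n)]
              for i in range(n)]
    return first, second

def numeric_hessian(fn, theta, h=1e-4):
    # Central finite differences; used only as an independent reference
    n = len(theta)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def at(si, sj):
                t = list(theta)
                t[i] += si * h
                t[j] += sj * h
                return fn(t)
            H[i][j] = (at(1, 1) - at(1, -1) - at(-1, 1) + at(-1, -1)) / (4 * h * h)
    return H

theta = [0.7, -1.1]
first, second = hessian_terms(theta)
H_ref = numeric_hessian(loss, theta)
max_err = max(abs(H_ref[i][j] - (first[i][j] + second[i][j]))
              for i in range(2) for j in range(2))
print("max |H - (first + second)| =", max_err)
```

The printed difference should be small, limited only by the finite-difference step `h`.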

Generalized Gauss-Newton

Dropping the second term of equation (6), which contains the second derivatives of the model outputs, gives the generalized Gauss-Newton matrix

\[\begin{equation} G = \frac{\partial f}{\partial\theta}^{T} \frac{\partial^{2} L}{\partial f\partial f} \frac{\partial f}{\partial\theta} \end{equation} \]
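As a worked special case, assume a squared-error loss with a target \(y\) (an assumption made only for this example; the definition of \(G\) does not require it). The curvature of the loss with respect to the outputs is then the identity, and \(G\) reduces to the classical Gauss-Newton matrix:

\[\begin{aligned} L(f) &= \tfrac{1}{2}\lVert f - y\rVert^{2} \\ \frac{\partial^{2} L}{\partial f\partial f} &= I \\ G &= \frac{\partial f}{\partial\theta}^{T} \frac{\partial f}{\partial\theta} \end{aligned}\]

More generally, \(G\) is positive semidefinite whenever \(\frac{\partial^{2} L}{\partial f\partial f}\) is, because \(v^{T} G v = \left(\frac{\partial f}{\partial\theta} v\right)^{T} \frac{\partial^{2} L}{\partial f\partial f} \left(\frac{\partial f}{\partial\theta} v\right) \ge 0\) for any vector \(v\).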

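The Gauss-Newton matrix equals the Hessian exactly when the dropped term \(\sum_{k} \frac{\partial L}{\partial f_{k}}\frac{\partial^{2} f_{k}}{\partial\theta\partial\theta}\) vanishes. This happens if

  1. The model fits the data perfectly, so that the loss gradient with respect to the outputs vanishes: \(\frac{\partial L}{\partial f_{k}} = 0\) for every \(k\)
  2. Or, the model is linear in \(\theta\) (for example, after linearization), so that \(\frac{\partial^{2} f_{k}}{\partial\theta\partial\theta} = 0\) for every \(k\)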
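Both conditions can be illustrated with a small numerical experiment. This is a sketch under assumptions that are not in the text above: a squared-error loss (so that \(G = J^{T} J\)), a hypothetical nonlinear toy model `f_nonlinear`, a hypothetical linear model defined by a made-up matrix `A`, and a finite-difference Hessian as the reference.

```python
import math

def numeric_hessian(fn, theta, h=1e-4):
    # Central finite-difference Hessian of a scalar function fn at theta
    n = len(theta)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def at(si, sj):
                t = list(theta)
                t[i] += si * h
                t[j] += sj * h
                return fn(t)
            H[i][j] = (at(1, 1) - at(1, -1) - at(-1, 1) + at(-1, -1)) / (4 * h * h)
    return H

def gauss_newton(J):
    # G = J^T J: the Gauss-Newton matrix when d2L/df df is the identity
    n = len(J[0])
    return [[sum(row[i] * row[j] for row in J) for j in range(n)] for i in range(n)]

def squared_error(f, y):
    # Returns the loss theta -> 0.5 * ||f(theta) - y||^2
    return lambda theta: 0.5 * sum((fk - yk) ** 2 for fk, yk in zip(f(theta), y))

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

# Hypothetical nonlinear model and its hand-derived Jacobian
def f_nonlinear(theta):
    a, b = theta
    return [a * b, math.sin(a)]

def jac_nonlinear(theta):
    a, b = theta
    return [[b, a], [math.cos(a), 0.0]]

# Hypothetical linear model f(theta) = A theta; its Jacobian is A itself
A = [[1.0, 2.0], [-0.5, 0.3], [0.0, 1.5]]

def f_linear(theta):
    return [sum(a_ki * t_i for a_ki, t_i in zip(row, theta)) for row in A]

theta = [0.7, -1.1]
G_nonlinear = gauss_newton(jac_nonlinear(theta))

# Case 1: perfect fit. Targets equal the model outputs, so dL/df = 0.
y_fit = f_nonlinear(theta)
gap_perfect_fit = max_abs_diff(
    numeric_hessian(squared_error(f_nonlinear, y_fit), theta), G_nonlinear)

# Case 2: linear model. Second derivatives of f vanish for any targets.
gap_linear = max_abs_diff(
    numeric_hessian(squared_error(f_linear, [4.0, -3.0, 2.0]), theta),
    gauss_newton(A))

# Neither condition holds: nonlinear model with nonzero residuals.
gap_generic = max_abs_diff(
    numeric_hessian(squared_error(f_nonlinear, [0.3, -0.2]), theta), G_nonlinear)

print("perfect fit:", gap_perfect_fit)
print("linear model:", gap_linear)
print("generic case:", gap_generic)
```

In the first two cases the gap between the Hessian and \(G\) should be at the level of finite-difference error, while in the generic case the dropped term is clearly visible.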

References

  1. https://www.cs.toronto.edu/~rgrosse/courses/csc2541_2022/
  2. https://andrew.gibiansky.com/blog/machine-learning/gauss-newton-matrix/