Generalized Gauss-Newton matrix
Decomposition of the Hessian
\[\begin{align}
H_{ij} &= \frac{\partial}{\partial \theta_{j}} \frac{\partial L(f(\theta))}{\partial\theta_{i}}\\
&= \frac{\partial}{\partial \theta_{j}}\left(\sum_{k}\frac{\partial L}{\partial f(\theta)_{k}}\frac{\partial f(\theta)_{k}}{\partial\theta_{i}}\right)\\
&= \sum_{k}\left(\frac{\partial}{\partial \theta_{j}}\left(\frac{\partial L}{\partial f(\theta)_{k}}\right)\frac{\partial f(\theta)_{k}}{\partial \theta_{i}} +\frac{\partial L}{\partial f(\theta)_{k}}\frac{\partial^{2} f(\theta)_{k}}{\partial\theta_{i}\partial\theta_{j}}\right)\\
&= \sum_{k}\left(\sum_{l}\frac{\partial^{2} L}{\partial f(\theta)_{k} \partial f(\theta)_{l}}\frac{\partial f(\theta)_{l}}{\partial \theta_{j}}\frac{\partial f(\theta)_{k}}{\partial \theta_{i}} +\frac{\partial L}{\partial f(\theta)_{k}}\frac{\partial^{2} f(\theta)_{k}}{\partial\theta_{i} \partial \theta_{j}}\right)\\
&= \sum_{k}\sum_{l}\frac{\partial f(\theta)_{k}}{\partial \theta_{i}}\frac{\partial^{2} L}{\partial f(\theta)_{k} \partial f(\theta)_{l}}\frac{\partial f(\theta)_{l}}{\partial \theta_{j}} +\sum_{k}\frac{\partial L}{\partial f(\theta)_{k}}\frac{\partial^{2} f(\theta)_{k}}{\partial\theta_{i}\partial\theta_{j}}
\end{align}
\]
Here all derivatives of \(L\) are evaluated at \(f(\theta)\). Written in matrix form, the result is
\[\begin{equation}
H = \left(\frac{\partial f}{\partial\theta}\right)^{T} \frac{\partial^{2} L}{\partial f\partial f} \frac{\partial f}{\partial\theta} + \sum_{k} \frac{\partial L}{\partial f_{k}}\frac{\partial^{2} f_{k}}{\partial\theta\partial\theta}
\end{equation}
\]
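The decomposition can be checked numerically. The following is a minimal JAX sketch, not a reference implementation: the model `f`, the loss `loss`, the target values and the point `theta` are all hypothetical choices made only for illustration. It builds both terms of the matrix form separately and compares their sum with the Hessian computed directly by automatic differentiation.

```python
import jax
import jax.numpy as jnp


def f(theta):
    # Hypothetical nonlinear model: 3 parameters -> 2 outputs.
    return jnp.array([
        jnp.tanh(theta[0] * theta[1]),
        theta[1] ** 2 + jnp.sin(theta[2]),
    ])


def loss(z):
    # Hypothetical squared-error loss against a fixed target.
    target = jnp.array([0.3, -0.2])
    return 0.5 * jnp.sum((z - target) ** 2)


theta = jnp.array([0.5, -1.0, 2.0])
z = f(theta)

J = jax.jacobian(f)(theta)      # df/dtheta, shape (2, 3)
H_L = jax.hessian(loss)(z)      # d^2 L / df df, shape (2, 2)
g = jax.grad(loss)(z)           # dL/df, shape (2,)
H_f = jax.hessian(f)(theta)     # d^2 f_k / dtheta dtheta, shape (2, 3, 3)

first_term = J.T @ H_L @ J                      # curvature of the loss
second_term = jnp.einsum("k,kij->ij", g, H_f)   # curvature of the model

# Hessian of the composite function L(f(theta)), computed directly.
H = jax.hessian(lambda t: loss(f(t)))(theta)

max_abs_error = float(jnp.max(jnp.abs(H - (first_term + second_term))))
print("max |H - (first + second)| =", max_abs_error)
```

With JAX's default single precision the two sides should agree up to rounding error.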
Generalized Gauss-Newton
Dropping the second term in equation (6) gives the generalized Gauss-Newton matrix
\[\begin{equation}
G = \left(\frac{\partial f}{\partial\theta}\right)^{T} \frac{\partial^{2} L}{\partial f\partial f} \frac{\partial f}{\partial\theta}
\end{equation}
\]
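Because \(G\) is a product of three factors, a product \(Gv\) can be evaluated right to left without ever forming \(G\). The sketch below shows one way this might be done in JAX, using a Jacobian-vector product, a Hessian-vector product of the loss, and a vector-Jacobian product. The model `f`, the loss `loss`, and the helper name `ggn_vector_product` are assumptions introduced for this example; the dense matrix is built only to check the result.

```python
import jax
import jax.numpy as jnp


def f(theta):
    # Hypothetical nonlinear model: 3 parameters -> 2 outputs.
    return jnp.array([
        jnp.tanh(theta[0] * theta[1]),
        theta[1] ** 2 + jnp.sin(theta[2]),
    ])


def loss(z):
    # Hypothetical cross-entropy-style loss with class 0 as the label:
    # log-sum-exp of the outputs minus the output of the labelled class.
    return jnp.log(jnp.sum(jnp.exp(z))) - z[0]


def ggn_vector_product(theta, v):
    # Step 1: J v, a forward-mode Jacobian-vector product through f.
    z, Jv = jax.jvp(f, (theta,), (v,))
    # Step 2: H_L (J v), a Hessian-vector product of the loss at z = f(theta).
    _, HJv = jax.jvp(jax.grad(loss), (z,), (Jv,))
    # Step 3: J^T (H_L J v), a reverse-mode vector-Jacobian product through f.
    _, vjp_fn = jax.vjp(f, theta)
    (JtHJv,) = vjp_fn(HJv)
    return JtHJv


theta = jnp.array([0.5, -1.0, 2.0])
v = jnp.array([1.0, 0.5, -2.0])

# Dense reference, feasible only because this example is tiny.
J = jax.jacobian(f)(theta)
H_L = jax.hessian(loss)(f(theta))
G = J.T @ H_L @ J

matrix_free = ggn_vector_product(theta, v)
max_abs_diff = float(jnp.max(jnp.abs(matrix_free - G @ v)))
print("max |Gv (matrix-free) - Gv (dense)| =", max_abs_diff)
```

The matrix-free form only needs vectors of the size of \(\theta\) and of \(f(\theta)\), which is the reason it is attractive when \(\theta\) is large.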
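The Gauss-Newton matrix equals the Hessian whenever the dropped second term vanishes. This happens if
- the model fits the data perfectly, so that \(\frac{\partial L}{\partial f_{k}} = 0\) for every \(k\), or
- the model is linear in \(\theta\) (for example, after linearization), so that \(\frac{\partial^{2} f_{k}}{\partial\theta\partial\theta} = 0\) for every \(k\).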
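Both conditions can be illustrated with a small JAX experiment. This is a sketch under stated assumptions: the models `nonlinear_f` and `linear_f`, the losses, and the helper `gap` are hypothetical and chosen only to make the two cases visible. The perfect-fit case uses a squared error whose target is set to the model's own output, and the linear case uses a non-quadratic loss so that the agreement is not trivial.

```python
import jax
import jax.numpy as jnp


def gap(f, loss, theta):
    # Largest absolute entry of G - H for the composite loss(f(theta)).
    J = jax.jacobian(f)(theta)
    H_L = jax.hessian(loss)(f(theta))
    G = J.T @ H_L @ J
    H = jax.hessian(lambda t: loss(f(t)))(theta)
    return float(jnp.max(jnp.abs(G - H)))


def nonlinear_f(t):
    # Hypothetical nonlinear model: 3 parameters -> 2 outputs.
    return jnp.array([jnp.tanh(t[0] * t[1]), t[1] ** 2 + jnp.sin(t[2])])


A = jnp.array([[1.0, -2.0, 0.5],
               [0.0, 3.0, 1.0]])


def linear_f(t):
    # Hypothetical model that is linear in its parameters.
    return A @ t


def make_squared_error(target):
    return lambda z: 0.5 * jnp.sum((z - target) ** 2)


def log_cosh_loss(z):
    # A smooth, non-quadratic loss.
    return jnp.sum(jnp.log(jnp.cosh(z - 1.0)))


theta = jnp.array([0.5, -1.0, 2.0])

# Case 1: perfect fit, so dL/df = 0 and the second term vanishes.
perfect_fit_gap = gap(nonlinear_f, make_squared_error(nonlinear_f(theta)), theta)

# Case 2: linear model, so d^2 f / dtheta^2 = 0 and the second term vanishes.
linear_gap = gap(linear_f, log_cosh_loss, theta)

# Contrast: nonlinear model with a nonzero residual, where G and H differ.
generic_gap = gap(nonlinear_f, make_squared_error(jnp.array([0.3, -0.2])), theta)

print("perfect fit :", perfect_fit_gap)
print("linear model:", linear_gap)
print("generic case:", generic_gap)
```

In the first two cases the gap should be at the level of floating-point rounding, while in the generic case it is clearly nonzero.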
References
- https://www.cs.toronto.edu/~rgrosse/courses/csc2541_2022/
- https://andrew.gibiansky.com/blog/machine-learning/gauss-newton-matrix/