Kalman Filter
Consider the Hidden Markov Model structure, there are latent variables \(\{x_t\}^T_{t=1}\) and observation variables \(\{y_t\}^T_{t=1}\), where \(x_t,y_t\in \mathcal{R}^D\). The followings are the transition between latents, and the emission between latent and observed states,
$$
x_t=Ax_{t-1}+\epsilon\leftarrow\epsilon\sim\mathcal{N}(0,Q) \\
p(x_t|x_{t-1}) = \mathcal{N}(x_t;Ax_{t-1},Q)\\
y_t=Cx_t+\sigma\leftarrow\sigma\sim\mathcal{N}(0,R) \\
p(y_t|x_t) = \mathcal{N}(y_t,Cx_t,R)
$$
Update procedures, firstly predict the hidden state,
\begin{align*}
p(x_{t-1}|y_{1:t-1}) &= \mathcal{N}(x_{t-1};x^{t-1}_{t-1},V^{t-1}_{t-1}) \\
p(x_t|y_{1:t-1})&=\int p(x_t,x_{t-1}|y_{1:t-1}) dx_{t-1}=\int p(x_t|x_{t-1})p(x_{t-1}|y_{1:t-1}) dx_{t-1} \\
&=\mathcal{N}(x_t;x_t^{t-1},V_t^{t-1})
\end{align*}
Derive the expressions of parameters, \(dx_t=d\epsilon\)
\begin{align*}
&x_t^{t-1}=\int dx_t\int dx_{t-1}x_tp(x_t|x_{t-1})p(x_{t-1}|y_{1:t-1}) \\
&=\int d\epsilon\int dx_{t-1}(Ax_{t-1}+\epsilon)\mathcal{N}(\epsilon;0,Q)\mathcal{N}(x_{t-1};x^{t-1}_{t-1},V^{t-1}_{t-1})\\
&=Ax^{t-1}_{t-1}\\
&V_t^{t-1}=\int dx_t\int dx_{t-1}x_tx_t^\top p(x_t|x_{t-1})p(x_{t-1}|y_{1:t-1}) \\
&=\int d\epsilon\int dx_{t-1}(Ax_{t-1}x_{t-1}^\top A^\top+\epsilon\epsilon^\top)\mathcal{N}(\epsilon;0,Q)\mathcal{N}(x_{t-1};x^{t-1}_{t-1},V^{t-1}_{t-1})\\
&=AV^{t-1}_{t-1}A^\top+Q\\
\end{align*}
Update the observed state,
\begin{align*}
p(x_t|y_{1:t})&=p(x_t|y_t,y_{1:t-1})\propto\underbrace{p(y_t|x_t)}_{\text{likelihood}}\underbrace{p(x_t|y_{1:t-1}}_{\text{prior}})=\mathcal{N}(x_t;x_t^t,V^t_t)
\end{align*}
First derive \(V^t_t\), if consider the log-product of likelihood and prior, and log-probability of marginal filter,
\begin{align*}
&L.H.S = log(\mathcal{N}(y_t;Cx_t,R)\mathcal{N}(x_t;x_t^{t-1},V_t^{t-1})) \\
&\approx(y_t-Cx_t)^\top R^{-1}(y_t-Cx_t)+(x_t-x_t^{t-1})^\top (V_t^{t-1})^{-1}(x_t-x_t^{t-1})\\
&\approx x^\top_tC^\top R^{-1}Cx_t-2x^\top_tC^\top R^{-1}y_t+x^\top_t(V_t^{t-1})^{-1}x_t-2x^\top_t(V_t^{t-1})^{-1}x_t^{t-1}\\
&=x^\top_t((V_t^{t-1})^{-1}+C^\top R^{-1}C)x_t-2x_t^\top(C^\top R^{-1}y_t+(V_t^{t-1})^{-1}x_t^{t-1}) \\
&R.H.S=log\mathcal{N}(x_t;x^t_t,V^t_t)\approx (x_t-x_t^t)^\top (V^t_t)^{-1}(x_t-x^t_t) \\
&\approx x_t^\top (V^t_t)^{-1}x_t-2x_t^\top(V^t_t)^{-1}x^t_t
\end{align*}
So,
\begin{align*}
&V^t_t=((V_t^{t-1})^{-1}+C^\top R^{-1}C)^{-1}=V_t^{t-1}-V_t^{t-1}C^\top(CV_t^{t-1}C^\top+R)^{-1}CV_t^{t-1}\\
&=V_t^{t-1}-K_tCV_t^{t-1}\\
&K_t = V_t^{t-1}C^\top(CV_t^{t-1}C^\top+R)^{-1}
\end{align*}
For \(x^t_t\),
\begin{align*}
&\arg\max_{x_t}p(x_t|y_{1:t})=\arg\min_{x_t}(y_t-Cx_t)^\top R^{-1}(y_t-Cx_t)+(x_t-x_t^{t-1})^\top (V_t^{t-1})^{-1}(x_t-x_t^{t-1})\\
&\approx\arg\min_{x_t}x^\top_t((V_t^{t-1})^{-1}+C^\top R^{-1}C)x_t-2x_t^\top(C^\top R^{-1}y_t+(V_t^{t-1})^{-1}x_t^{t-1}) \\
\end{align*}
To find the expression, differentiate the equation (laxy typing \(V=V_t^{t-1},x=x_t^{t-1}\)),
\begin{align*}
&\frac{\partial}{\partial x_t}\big(x^\top_t((V_t^{t-1})^{-1}+C^\top R^{-1}C)x_t-2x_t^\top(C^\top R^{-1}y_t+(V_t^{t-1})^{-1}x_t^{t-1})\big) \\
&=2(V^{-1}+C^\top R^{-1}C)x_t-2(C^\top R^{-1}y_t+V^{-1}x)=0\\
&x_t=(V^{-1}+C^\top R^{-1}C)^{-1}(C^\top R^{-1}y_t+V^{-1}x)\\
&=(V-VC^\top(R+CVC^\top)^{-1}CV)(C^\top R^{-1}y_t+V^{-1}x)\\
&=(V-K_tCV)(C^\top R^{-1}y_t+V^{-1}x)\\
&=x-K_tCx+(V-K_tCV)C^\top R^{-1}y_t\\
\end{align*}
Consider the form of \(K_t\),
\begin{align*}
K_t &= VC^\top(CVC^\top+R)^{-1} \\
K_t(CVC^\top+R) &=K_tCVC^\top+K_tR =VC^\top \\
(K_tC-I)VC^\top+K_tR &= 0 \\
K_tR &= (I-K_tC)VC^\top\\
K_t &=(V-K_tCV)C^\top R^{-1}\\
\end{align*}
So,
\begin{align*}
x_t &= x-K_tCx+K_ty_t = x+K_t(y_t-Cx)\\
x_t &= x^{t-1}_t+K_t(y_t-Cx_t^{t-1})
\end{align*}
References