Define $a_{0}$ as the bias, the subscript ij represents the direction from output nodes i: $a^{(n)}_{i}$ to input nodes j: $z^{(n+1)}_{j}$, the superscript represents the layer index
$$z^{(2)} = \theta^{(1)}a^{(1)} $$For example,
$$\left[ \begin{array}{} z^{(2)}_{1} \\ z^{(2)}_{2} \\ \end{array} \right] = \left[ \begin{array}{} \theta^{(1)}_{01} & \theta^{(1)}_{11} & \theta^{(1)}_{21} \\ \theta^{(1)}_{02} & \theta^{(1)}_{12} & \theta^{(1)}_{22} \\ \end{array} \right] \space \left[ \begin{array}{} a^{(1)}_{0}\\ a^{(1)}_{1}\\ a^{(1)}_{2}\\ \end{array} \right] $$Bias will be added back before next layer of matrix multiplication.
$$a^{(2)} = \left[ \begin{array}{} a^{(2)}_{1}\\ a^{(2)}_{2}\\ \end{array} \right] \to \left[ \begin{array}{} a^{(2)}_{0}\\ a^{(2)}_{1}\\ a^{(2)}_{2}\\ \end{array} \right]$$Similarily for layer 3,
$$z^{(3)} = \theta^{(2)}a^{(2)} $$ $$\left[ \begin{array}{} z^{(3)}_{1} \\ z^{(3)}_{2} \\ \end{array} \right] = \left[ \begin{array}{} \theta^{(2)}_{01} & \theta^{(2)}_{11} & \theta^{(2)}_{21} \\ \theta^{(2)}_{02} & \theta^{(2)}_{12} & \theta^{(2)}_{22} \\ \end{array} \right] \space \left[ \begin{array}{} a^{(2)}_{0}\\ a^{(2)}_{1}\\ a^{(2)}_{2}\\ \end{array} \right] $$ $$a^{(3)} = g(z^{(3)}) = \left[ \begin{array}{} \frac{1}{1+ e^{-z^{(3)}_{1}}}\\ \frac{1}{1+ e^{-z^{(3)}_{2}}}\\ \end{array} \right] = \left[ \begin{array}{} a^{(3)}_{1}\\ a^{(3)}_{2}\\ \end{array} \right]$$But since layer 3 is the last layer, no bias will be added back.
Generally,
$$\frac{\partial J(\theta)}{\partial \theta^{(\ell)}_{ij}} = a^{(\ell)}_{i}g'(z^{\ell + 1}_{j})\delta^{\ell + 1}_{j} $$For $\theta^{(1)}$,
$$\frac{\partial \Delta}{\partial \theta^{(1)}_{ij}} = \bigg( \frac{\mathrm d \Delta}{\mathrm d \Delta_1}\frac{\mathrm d \Delta_1}{\mathrm d \delta^{(3)}_{1}}\frac{\partial \delta^{(3)}_{1}}{\partial a^{(3)}_{1}}\frac{\mathrm d a^{(3)}_{1}}{\mathrm d z^{(3)}_{1}} \frac{\partial z^{(3)}_{1}}{\partial a^{(2)}_{j}} + \frac{\mathrm d \Delta}{\mathrm d \Delta_2}\frac{\mathrm d \Delta_2}{\mathrm d \delta^{(3)}_{2}}\frac{\partial \delta^{(3)}_{2}}{\partial a^{(3)}_{2}}\frac{\mathrm d a^{(3)}_{2}}{\mathrm d z^{(3)}_{2}} \frac{\partial z^{(3)}_{2}}{\partial a^{(2)}_{j}} \bigg) \frac{\mathrm d a^{(2)}_{j}}{\mathrm d z^{(2)}_{j}} \frac{\partial z^{(2)}_{j}}{\partial \theta^{(1)}_{ij}} $$ $$\frac{\partial \Delta}{\partial \theta^{(1)}_{11}} = \bigg( \frac{\mathrm d \Delta}{\mathrm d \Delta_1}\frac{\mathrm d \Delta_1}{\mathrm d \delta^{(3)}_{1}}\frac{\partial \delta^{(3)}_{1}}{\partial a^{(3)}_{1}}\frac{\mathrm d a^{(3)}_{1}}{\mathrm d z^{(3)}_{1}} \frac{\partial z^{(3)}_{1}}{\partial a^{(2)}_{1}} + \frac{\mathrm d \Delta}{\mathrm d \Delta_2}\frac{\mathrm d \Delta_2}{\mathrm d \delta^{(3)}_{2}}\frac{\partial \delta^{(3)}_{2}}{\partial a^{(3)}_{2}}\frac{\mathrm d a^{(3)}_{2}}{\mathrm d z^{(3)}_{2}} \frac{\partial z^{(3)}_{2}}{\partial a^{(2)}_{1}} \bigg) \frac{\mathrm d a^{(2)}_{1}}{\mathrm d z^{(2)}_{1}} \frac{\partial z^{(2)}_{1}}{\partial \theta^{(1)}_{11}} = \big(\delta^{(3)}_{1}g'(z^{(3)}_{1})\theta^{(2)}_{11} + \delta^{(3)}_{2}g'(z^{(3)}_{2})\theta^{(2)}_{12}\big)g'(z^{(2)}_{1})a^{(1)}_1 = a^{(1)}_{1}g'(z^{(2)}_{1})\delta^{(2)}_{1}$$ $$\frac{\partial \Delta}{\partial \theta^{(1)}_{12}} = \bigg( \frac{\mathrm d \Delta}{\mathrm d \Delta_1}\frac{\mathrm d \Delta_1}{\mathrm d \delta^{(3)}_{1}}\frac{\partial \delta^{(3)}_{1}}{\partial a^{(3)}_{1}}\frac{\mathrm d a^{(3)}_{1}}{\mathrm d z^{(3)}_{1}} \frac{\partial z^{(3)}_{1}}{\partial a^{(2)}_{2}} + \frac{\mathrm d \Delta}{\mathrm d \Delta_2}\frac{\mathrm d \Delta_2}{\mathrm d \delta^{(3)}_{2}}\frac{\partial \delta^{(3)}_{2}}{\partial a^{(3)}_{2}}\frac{\mathrm d a^{(3)}_{2}}{\mathrm d z^{(3)}_{2}} \frac{\partial z^{(3)}_{2}}{\partial a^{(2)}_{2}} \bigg) \frac{\mathrm d a^{(2)}_{2}}{\mathrm d z^{(2)}_{2}} \frac{\partial z^{(2)}_{2}}{\partial \theta^{(1)}_{12}} = \big(\delta^{(3)}_{1}g'(z^{(3)}_{1})\theta^{(2)}_{21} + \delta^{(3)}_{2}g'(z^{(3)}_{2})\theta^{(2)}_{22}\big)g'(z^{(2)}_{2})a^{(1)}_1 = a^{(1)}_{1}g'(z^{(2)}_{2})\delta^{(2)}_{2}$$ $$\frac{\partial \Delta}{\partial \theta^{(1)}_{21}} = \bigg( \frac{\mathrm d \Delta}{\mathrm d \Delta_1}\frac{\mathrm d \Delta_1}{\mathrm d \delta^{(3)}_{1}}\frac{\partial \delta^{(3)}_{1}}{\partial a^{(3)}_{1}}\frac{\mathrm d a^{(3)}_{1}}{\mathrm d z^{(3)}_{1}} \frac{\partial z^{(3)}_{1}}{\partial a^{(2)}_{1}} + \frac{\mathrm d \Delta}{\mathrm d \Delta_2}\frac{\mathrm d \Delta_2}{\mathrm d \delta^{(3)}_{2}}\frac{\partial \delta^{(3)}_{2}}{\partial a^{(3)}_{2}}\frac{\mathrm d a^{(3)}_{2}}{\mathrm d z^{(3)}_{2}} \frac{\partial z^{(3)}_{2}}{\partial a^{(2)}_{1}} \bigg) \frac{\mathrm d a^{(2)}_{1}}{\mathrm d z^{(2)}_{1}} \frac{\partial z^{(2)}_{1}}{\partial \theta^{(1)}_{21}} = a^{(1)}_{2}g'(z^{(2)}_{1})\delta^{(2)}_{1}$$ $$\frac{\partial \Delta}{\partial \theta^{(1)}_{22}} = \bigg( \frac{\mathrm d \Delta}{\mathrm d \Delta_1}\frac{\mathrm d \Delta_1}{\mathrm d \delta^{(3)}_{1}}\frac{\partial \delta^{(3)}_{1}}{\partial a^{(3)}_{1}}\frac{\mathrm d a^{(3)}_{1}}{\mathrm d z^{(3)}_{1}} \frac{\partial z^{(3)}_{1}}{\partial a^{(2)}_{2}} + \frac{\mathrm d \Delta}{\mathrm d \Delta_2}\frac{\mathrm d \Delta_2}{\mathrm d \delta^{(3)}_{2}}\frac{\partial \delta^{(3)}_{2}}{\partial a^{(3)}_{2}}\frac{\mathrm d a^{(3)}_{2}}{\mathrm d z^{(3)}_{2}} \frac{\partial z^{(3)}_{2}}{\partial a^{(2)}_{2}} \bigg) \frac{\mathrm d a^{(2)}_{2}}{\mathrm d z^{(2)}_{2}} \frac{\partial z^{(2)}_{2}}{\partial \theta^{(1)}_{22}} = a^{(1)}_{2}g'(z^{(2)}_{2})\delta^{(2)}_{2}$$ $$ \delta^{(2)} = \left[ \begin{array}{} \delta^{(2)}_{1}\\ \delta^{(2)}_{2}\\ \end{array}\right] = \left[ \begin{array}{} \theta^{(2)}_{11}\delta^{(3)}_1g'(z^{(3)}_1) + \theta^{(2)}_{12}\delta^{(3)}_2g'(z^{(3)}_2) \\ \theta^{(2)}_{21}\delta^{(3)}_1g'(z^{(3)}_1) + \theta^{(2)}_{22}\delta^{(3)}_2g'(z^{(3)}_2) \\ \end{array}\right] = \left[ \begin{array}{} \theta^{(2)}_{11} & \theta^{(2)}_{12} \\ \theta^{(2)}_{21} & \theta^{(2)}_{22} \\ \end{array}\right] \space \left[ \begin{array}{} \delta^{(3)}_1g'(z^{(3)}_1) \\ \delta^{(3)}_2g'(z^{(3)}_2) \\ \end{array}\right] = (\theta^{(2)})^{T} \delta^{(3)}.*g'(z^{(3)})$$Generally,
$$\delta^{(\ell)}= \begin{cases} a^{(\ell)} - y & \text{for} \space \ell = L\\ (\theta^{(\ell)})^{T}\delta^{(\ell + 1)}.*g'(z^{(\ell + 1)}) & \text{for} \space 1<\ell \leq L-1 \end{cases} $$where L = last layer
The form of weight matrix is given by,
$$\theta^{(\ell)} = \left[ \begin{array}{} \theta^{(\ell)}_{11} & \theta^{(\ell)}_{21} & \cdots & \theta^{(\ell)}_{n1} \\ \theta^{(\ell)}_{12} & \theta^{(\ell)}_{22} & ... & \theta^{(\ell)}_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ \theta^{(\ell)}_{1n} & \cdots & \cdots & \theta^{(\ell)}_{nn} \end{array}\right] $$