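The Pearson correlation coefficient is calculated as the covariance of the two variables divided by the product of their standard deviations. It measures linear correlation and assumes that the samples are generated by a Gaussian-like distribution. Its value ranges from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship).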
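The computation described above can be sketched in plain Python. No third-party libraries are assumed, and `pearson` is an illustrative name, not a library function. The covariance and the two standard deviations share the same 1/n normalisation, which cancels in the ratio, so the sketch works with raw sums of deviations:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of the same length (n >= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Sums of (products of) deviations; the common 1/n (or 1/(n-1)) factors
    # of the covariance and the standard deviations cancel in the ratio.
    cov = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    # sqrt(ss_x * ss_y) == sqrt(ss_x) * sqrt(ss_y): the product of the
    # (unnormalised) standard deviations.
    return cov / math.sqrt(ss_x * ss_y)


print(pearson([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
print(pearson([1, 2, 3], [3, 2, 1]))        # -1.0
```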
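$$c_p = \frac{cov_{x,y}}{\sigma(x)\,\sigma(y)}$$

Spearman's rank correlation coefficient drops the Gaussian assumption, which makes it suitable for non-Gaussian data. It is the Pearson coefficient computed on the ranks \(r(x)\) and \(r(y)\) of the samples rather than on their raw values, so it captures any monotonic relationship, not only a linear one.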
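Spearman's rank correlation works in two steps: replace each sample by its ranks, then apply the Pearson formula to those ranks. This plain-Python sketch assumes that tied values receive the average of the ranks they span, which is a common convention. The names `average_ranks`, `pearson` and `spearman` are illustrative:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance over the product of standard deviations.

    Raises ZeroDivisionError if either sample is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman(x, y):
    """Spearman coefficient: the Pearson coefficient of the ranks r(x), r(y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of the same length (n >= 2)")
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman(x, y))    # 1.0
print(pearson(x, y))     # roughly 0.943, below 1 because y is not linear in x
```

Because y = x³ is monotonic in x, the two rank vectors agree exactly and the rank-based coefficient is 1. The Pearson coefficient of the raw values is lower because the relationship is not linear.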
$$c_s = \frac{cov_{r(x),r(y)}}{\sigma(r(x))\,\sigma(r(y))}$$

Cosine similarity is a measure of how similar two documents (or, more generally, two vectors) are, irrespective of their size. Mathematically, it is the cosine of the angle between the two vectors in a multi-dimensional space. It therefore depends only on the vectors' direction, not on their magnitude.
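The following is a minimal plain-Python sketch. It assumes each document is represented as a vector of term counts over a shared vocabulary; the vocabulary and counts are made up for illustration, and `cosine_similarity` is an illustrative name:

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||) for equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical term counts over the vocabulary ["data", "model", "loss"].
short_doc = [1, 2, 0]
long_doc = [10, 20, 0]  # same proportions, ten times longer
other_doc = [0, 0, 5]   # no terms in common with short_doc

print(cosine_similarity(short_doc, long_doc))   # approximately 1.0
print(cosine_similarity(short_doc, other_doc))  # 0.0
```

The longer document has the same term proportions, so its similarity to the short one is 1 despite the size difference. A document with no shared terms scores 0. With non-negative term counts the value stays in [0, 1], while general real-valued vectors can give values down to -1.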
It follows directly from the definition of the dot product:
$$\cos\theta = \frac{\vec{a}\cdot\vec{b}}{\|\vec{a}\|\|\vec{b}\|}$$

The triplet loss is defined in terms of squared Euclidean distances between embeddings. It is commonly used to train embedding networks, such as face-embedding models:
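$$\mathcal{L}(A,P,N) = \max\bigg(\|f(A)-f(P)\|^2-\|f(A)-f(N)\|^2+\alpha,0\bigg)$$

where A is an anchor input (a reference face), P is a positive input of the same class as A, N is a negative input of a different class from A, \(\alpha\) is the margin between positive and negative pairs, and f is the embedding function. The loss is zero once the negative is farther from the anchor than the positive by at least the margin, that is, when \(\|f(A)-f(N)\|^2 \ge \|f(A)-f(P)\|^2 + \alpha\).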
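Here is a minimal plain-Python sketch of the loss for a single triplet. It assumes the embeddings f(A), f(P) and f(N) have already been computed by some network (not shown). The default margin `alpha=0.2` is an illustrative assumption, and `triplet_loss` and `squared_distance` are hypothetical names:

```python
def squared_distance(u, v):
    """Squared Euclidean distance ||u - v||^2."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha, 0) for one triplet.

    f_a, f_p, f_n are the already-computed embeddings f(A), f(P), f(N).
    alpha=0.2 is an illustrative default, not a value taken from the text.
    """
    return max(squared_distance(f_a, f_p) - squared_distance(f_a, f_n) + alpha, 0.0)


anchor = [0.0, 0.0]
positive = [1.0, 0.0]       # ||f(A) - f(P)||^2 = 1.0
easy_negative = [3.0, 0.0]  # ||f(A) - f(N)||^2 = 9.0, beyond the margin
hard_negative = [0.5, 0.0]  # ||f(A) - f(N)||^2 = 0.25, closer than the positive

print(triplet_loss(anchor, positive, easy_negative))  # 0.0
print(triplet_loss(anchor, positive, hard_negative))  # about 0.95
```

In training, the per-triplet losses are typically summed or averaged over a batch. The network's parameters are then updated by backpropagating through this expression, which requires a framework with automatic differentiation rather than plain Python.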