The PageRank of a page is the probability I would end up on that page if I surfed the Internet randomly for an infinite amount of time.
The transition probability matrix is
$$A_{ij}=p(x_t=j|x_{t-1}=i)$$
It is a stochastic (Markov) matrix, so each row sums to 1:
$$\sum^M_{j=1}A_{ij}=\sum^M_{j=1}p(x_t=j|x_{t-1}=i)=1$$
To estimate the transition probabilities from counts, we use add-1 or epsilon smoothing, so that unseen transitions do not get zero probability and a state with no observed outgoing transitions does not cause a division by zero.
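$$p(x_t=j|x_{t-1}=i)=\frac{C(i\rightarrow j)+1}{C(i)+V}$$
$$p(x_t=j|x_{t-1}=i)=\frac{C(i\rightarrow j)+\epsilon}{C(i)+\epsilon V}$$
where \(C(i\rightarrow j)\) is the number of observed transitions from i to j, \(C(i)\) is the number of observed transitions out of i, and V is the vocabulary size (number of unique words). The expression is the posterior mean under a symmetric Dirichlet prior, which reduces to the Beta posterior mean when there are only two outcomes.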
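The smoothed estimate above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `smoothed_transitions`, the toy `tokens` sequence, and the state list are all assumptions made for the example. Note that \(C(i)\) is counted as the number of transitions leaving i, which is what makes each row sum to 1.

```python
from collections import Counter


def smoothed_transitions(sequence, states, eps=1.0):
    """Estimate p(x_t=j | x_{t-1}=i) with add-epsilon smoothing.

    eps=1.0 gives add-1 smoothing. `sequence` and `states` are
    hypothetical inputs chosen for this sketch.
    """
    V = len(states)
    pair_counts = Counter(zip(sequence, sequence[1:]))  # C(i -> j)
    from_counts = Counter(sequence[:-1])                # C(i): transitions out of i
    return {
        i: {
            j: (pair_counts[(i, j)] + eps) / (from_counts[i] + eps * V)
            for j in states
        }
        for i in states
    }


# Toy data: "d" is in the vocabulary but never observed.
tokens = ["a", "b", "a", "c", "a", "b"]
A = smoothed_transitions(tokens, states=["a", "b", "c", "d"])

for state, row in A.items():
    print(state, row, "row sum:", sum(row.values()))
```

The row for the unseen state "d" falls back to a uniform distribution instead of dividing by zero, which is the behaviour the smoothing is meant to provide.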
The state probability distribution at time \(t\) is the row vector \(\pi_t=[\{p(x_t=i)\}_{i=1}^M]\), with one entry for each of the M states.
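As a short worked step, and assuming the row-vector convention that the stationary equation later in this section uses, the law of total probability shows how the state distribution moves from one time step to the next:

$$p(x_t=j)=\sum^M_{i=1}p(x_t=j|x_{t-1}=i)\,p(x_{t-1}=i)=\sum^M_{i=1}p(x_{t-1}=i)A_{ij}$$
$$\pi_t=\pi_{t-1}A \qquad \pi_t=\pi_0A^t$$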
This gives us the PageRank model. Each state is a page, each transition is a link from one page to another, and the transition probability becomes
$$p(x_t=j|x_{t-1}=i)=\frac{1}{n(i)}$$
if page i links to page j, where \(n(i)\) is the number of links on page i; otherwise the probability is 0. However, zero entries dominate this matrix: there are billions of pages on the Internet, and any one page links to only a tiny fraction of them, so smoothing is required.
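To make the link-based transition probabilities concrete, here is a small sketch that builds the matrix from a toy link graph. The function name `link_matrix` and the three-page graph are hypothetical. Duplicate links on a page are counted once, which is an assumption of this sketch, and pages with no outgoing links are left as all-zero rows, since the text does not say how to treat them.

```python
def link_matrix(links, pages):
    """Build A with A[i][j] = 1/n(i) if page i links to page j, else 0.

    `links` maps each page to the pages it links to (hypothetical input).
    Pages without outgoing links keep an all-zero row in this sketch.
    """
    index = {page: k for k, page in enumerate(pages)}
    M = len(pages)
    A = [[0.0] * M for _ in range(M)]
    for page, outlinks in links.items():
        targets = set(outlinks)  # count each distinct link once
        for target in targets:
            A[index[page]][index[target]] = 1.0 / len(targets)
    return A


# Toy graph: a links to b and c, b links to c, c links to a.
pages = ["a", "b", "c"]
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
A = link_matrix(links, pages)

for page, row in zip(pages, A):
    print(page, row)
```

Even in this tiny graph, four of the nine entries are zero; at web scale almost every entry is.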
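If A and U are valid Markov matrices, then so is G, since each row of G is a convex combination of two rows that each sum to 1:
$$G = .85A+.15U \qquad U_{ij}=\frac{1}{M} \qquad \forall i,j=1\cdots M $$
$$\pi_{\infty}=\pi_{\infty}G$$
The stationary probability of each state is the ranking value of the respective page. Because a page's rank depends on the rank of the pages that link to it, and not just on how many links point to it, this helps address the problem of spamming by creating fake pages.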
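One common way to find the stationary distribution is power iteration: start from a uniform distribution and multiply by G repeatedly until the vector stops changing. The sketch below assumes this approach. The function name `pagerank`, the tolerance, the iteration cap, and the toy matrix are illustrative choices, not something the text prescribes.

```python
def pagerank(A, damping=0.85, tol=1e-12, max_iter=1000):
    """Approximate the stationary row vector of G by power iteration.

    G = damping * A + (1 - damping) * U with U_ij = 1/M.
    `tol` and `max_iter` are assumptions of this sketch.
    """
    M = len(A)
    G = [
        [damping * A[i][j] + (1.0 - damping) / M for j in range(M)]
        for i in range(M)
    ]
    pi = [1.0 / M] * M  # start from the uniform distribution
    for _ in range(max_iter):
        # Row vector times matrix: new_pi[j] = sum_i pi[i] * G[i][j]
        new_pi = [sum(pi[i] * G[i][j] for i in range(M)) for j in range(M)]
        if sum(abs(new - old) for new, old in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi


# Toy link matrix: page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
A = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
ranks = pagerank(A)
print(ranks)
```

In this toy graph page 2 receives links from both other pages and ends up with the highest rank, while page 1, which only receives half of page 0's links, ends up with the lowest. Because every entry of G is positive, the iteration converges regardless of the starting distribution.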