Introduction to Recommendation System

Naive Methods

Popularity

Count of views/ visits to the website, music, movie, etc. Problem is that it is unfair to new product.

Average Ratings

Average of ratings given by the users. Problem: unable to calculate without buyers' reviews or ratings.

A/B Testing

A/B testing (also known as split testing or bucket testing) is a method of comparing two versions of a webpage or app against each other to determine which one performs better. AB testing is essentially an experiment where two or more variants of a page are shown to users at random, and statistical analysis is used to determine which variation performs better for a given conversion goal.

There are not much people willing to revisit the site, once the website is not appealing. You can potentially lose clients at the beginning.

Product Associations

Intuition: if someone loves buying art books, you should recommend more art books rather than bike.

It is written as the conditional probability, $\frac{C(A,B)}{C(B)}=\frac{p(A,B)}{p(B)}=p(A|B)$

Problem: over-associate two products

Lift is formulated as,

$$Lift = \frac{p(A,B)}{p(A)p(B)} = \frac{p(A|B)}{p(A)} = \frac{p(B|A)}{p(B)}$$

If two variables are independent, $p(A|B)=p(A)$, $\frac{p(A|B)}{p(A)}=1$. Only if they are dependent, buying B increases the probability to buy A, then lift is greater than 1.

Balancing popularity with age

Basic

$$\frac{f(popularity)}{g(age)}$$

Higher popularity leads to higher ranking, Older age lower ranking.

Hacker News Formula

$$s = \frac{(u-d-1)^{0.8}}{(a+2)^g}p$$

where s is score, g gravity = 1.8 greater than .8, this means age is more important to up down votes. u number of up votes, d no of down. p is the penalty (controversy, commercial, etc). Sublinear function is applied to popularity.

Reddit Formula

$$s = \frac{u-d}{|u-d|}\log\big(\max(1,|u-d|)\big)+\frac{a}{45000}$$

Sublinear log since first 100 votes are more important at the beginning. Scores increase over time.

Rating

Binary outcomes (-1/1) as classification
5 star ratings as regression

Confidence Intervals

$$\bar{X}=\frac{1}{N}\sum^N_{i=1}X_i \sim \mathcal{N}(\mu,\frac{\sigma^2}{N})$$ $$X\sim\mathcal{N}(\mu,\sigma^2)$$

Normal approximation

$$CI_{95\%} = [\bar{X}+z_{left}\frac{s}{\sqrt{N}},\bar{X}+z_{right}\frac{s}{\sqrt{N}}]$$

where s is the standard deviation, $z_{left}=\Phi^{-1}(.025)$, $z_{left}=\Phi^{-1}(.975)$, $\Phi$ CDF.

Bernoulli CI Approximation

$$\hat{p}=\frac{\#success}{N} \sim probability$$ $$CI_{95\%} = [\hat{p}-1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{N}},\hat{p}+1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{N}}]$$

Wilson Interval

$$\frac{\hat{p}+\frac{z^2}{2N}}{1+\frac{z^2}{N}}\pm \frac{z}{1+\frac{z^2}{N}}\sqrt{\frac{\hat{p}(1-\hat{p})}{N}+\frac{z^2}{4N^2}}$$

Then sort by lower bound. This ensures same average ratings with little votes will have lower score than those with more votes, since the lower bound of confidence interval with high variance will have a much smaller lower bound than that with low variance (more samples). But it is a pessimistic approach.

Smoothing (/Damping)

To fix the zero division problem of the average rating, the rating can be rewritten as,

$$r = \frac{1}{N+\lambda}(\sum^N_{i=1}X_i+\lambda\mu_0)$$

where $\mu_0$ is the global average.

Explore-Exploit Dilemma

To exploit, I need more confidence, so take more samples to explore. But exploring may cause loss, I need to exploit, exploitation can make me not to discover better strategy, so explore again.

References

A/B Testing by Optimizely