While I have searched online quite a bit, I didn’t find a satisfying, simple derivation of the Kelly Bet formula. So I worked it out on my own, and it turns out the derivation can be quite simple.

Assume X is the return of one bet per unit of capital invested (a random variable with some distribution), and d is the fraction of capital we invest in each bet (what percentage of net worth you want to put in). Assume we bet N times.

Each bet multiplies our capital by: 1 + d*X

The total return after N bets (the product of all individual returns) is:

product(1 + d*X)

Since log is monotonically increasing, maximizing the product is the same as maximizing its log: log(product(1+d*X)) = sum(log(1+d*X))

So our goal is to maximize this sum, in order to maximize total return.
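To make the objective concrete, here is a minimal Python sketch, assuming X takes finitely many values with known probabilities (the helper names `expected_log_growth` and `best_fraction_by_grid` are hypothetical, for illustration only). Dividing the sum by N and letting N grow turns it into the expected log growth per bet, E[log(1 + d*X)], which can be maximized by brute force over a grid of d values without any approximation.

```python
import math


def expected_log_growth(d, outcomes, probs):
    """Expected log growth per bet, E[log(1 + d*X)], for a discrete X.

    Hypothetical helper: `outcomes` are the possible returns X per unit
    staked, `probs` are their probabilities.
    """
    total = 0.0
    for x, p in zip(outcomes, probs):
        wealth_factor = 1.0 + d * x
        if wealth_factor <= 0.0:
            return -math.inf  # ruin: capital wiped out on this outcome
        total += p * math.log(wealth_factor)
    return total


def best_fraction_by_grid(outcomes, probs, steps=100):
    """Brute-force the d in [0, 1) that maximizes expected log growth."""
    candidates = [i / steps for i in range(steps)]
    return max(candidates, key=lambda d: expected_log_growth(d, outcomes, probs))


# Even-money game: win +1 with probability 0.6, lose -1 with probability 0.4.
print(best_fraction_by_grid([1.0, -1.0], [0.6, 0.4]))
```

A grid search is crude, but it needs no Taylor expansion, so it gives a reference point for checking the approximate formula derived next.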

If we apply a Taylor expansion to log(1+d*X) and keep the first two terms (assuming d*X is small enough that we can drop the third-order term):

sum(log(1+d*X)) ≈ sum(d*X - d*d*X*X/2) = d*sum(X) - d*d*sum(X*X)/2

≈ d*m*N - d*d*(V + m*m)*N/2

(Here m is the mean of X and V is its variance. For a large number of bets N, sum(X)/N ≈ m and sum(X*X)/N ≈ V + m*m, since V = E[X*X] - m*m; hence sum(X) ≈ m*N and sum(X*X) ≈ (V + m*m)*N.)
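The two steps above can be checked numerically. This sketch uses a hypothetical sample of ten even-money bets (six wins, four losses). It confirms that, with V taken as the population variance of the sample, sum(X*X) = (V + m*m)*N holds, and that the two-term approximation d*m - d*d*(V + m*m)/2 tracks the exact per-bet average of log(1 + d*X) when d is small but drifts away as d grows.

```python
import math
import statistics

# A hypothetical sample of N = 10 outcomes: six wins (+1) and four losses (-1).
xs = [1.0] * 6 + [-1.0] * 4
n = len(xs)
m = statistics.fmean(xs)        # sample mean
v = statistics.pvariance(xs)    # population variance: sum((x - m)**2) / N

# Identity used in the derivation: sum(X*X) = (V + m*m) * N.
print(sum(x * x for x in xs), (v + m * m) * n)


def exact_growth(d):
    # Per-bet average of log(1 + d*X) over the sample.
    return sum(math.log(1.0 + d * x) for x in xs) / n


def quadratic_growth(d):
    # Two-term Taylor approximation: d*m - d*d*(V + m*m)/2.
    return d * m - d * d * (v + m * m) / 2.0


for d in (0.05, 0.2, 0.5):
    print(d, exact_growth(d), quadratic_growth(d))
```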

To maximize this value, we set its first derivative with respect to d to zero:

first derivative = m*N - d*(V+m*m)*N = 0

==> **d = m/(V+m*m)**

When m*m << V, this simplifies to **d = m/V (mean/variance)**.

Here we can see that d = m/(V+m*m) is another simple, yet more accurate, formula for the Kelly Bet. Many articles use d = m/V, where V is the variance (the second central moment), but the denominator really should be the second non-central (raw) moment, E[X*X] = V + m*m, as shown in the Wiki: http://en.wikipedia.org/wiki/Kelly_criterion
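A small sketch of the two formulas side by side (the function names are hypothetical). Since m/V = m/(V + m*m) * (1 + m*m/V), the mean/variance version always bets more when m > 0, and the two agree only when m*m << V.

```python
def kelly_raw_moment(m, v):
    """d = m / (V + m*m): mean over the second raw moment E[X*X]."""
    return m / (v + m * m)


def kelly_mean_variance(m, v):
    """d = m / V: the common mean/variance approximation."""
    return m / v


# Even-money 60/40 game (m = 0.2, V = 0.96): the two formulas differ noticeably.
print(kelly_raw_moment(0.2, 0.96), kelly_mean_variance(0.2, 0.96))

# Hypothetical small-edge trade (m = 0.01, V = 0.04), where m*m << V:
print(kelly_raw_moment(0.01, 0.04), kelly_mean_variance(0.01, 0.04))
```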

Example:

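A binary game has a 60% chance of winning and a 40% chance of losing, with even payoffs: we win or lose the amount bet, so X is +1 or -1. This is a (shifted) Bernoulli distribution. The mean is 0.6 - 0.4 = 0.2, and since X*X is always 1, the variance is 1 - 0.2*0.2 = 0.96.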

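From the formula d = m/(V+m*m), d = 0.2/(0.96 + 0.04) = 0.2, which matches the exact Kelly fraction for an even-money bet, p - q = 0.6 - 0.4 = 0.2. A numerical test shows that this formula is correct, and more accurate than the d = m/V formula, which gives 0.2/0.96 ≈ 0.208.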
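One way to run such a numerical test, sketched in Python: maximize the exact expected log growth 0.6*log(1 + d) + 0.4*log(1 - d) over a fine grid of d values and compare the result with both approximate formulas. Setting the derivative 0.6/(1 + d) - 0.4/(1 - d) to zero gives the exact optimum d = 0.2.

```python
import math

P_WIN = 0.6  # probability of winning; X = +1 on a win, -1 on a loss


def growth(d):
    # Exact expected log growth per bet for the even-money game.
    return P_WIN * math.log(1.0 + d) + (1.0 - P_WIN) * math.log(1.0 - d)


# Grid search over d in [0, 1) with step 1e-4.
grid = [i / 10_000 for i in range(10_000)]
d_exact = max(grid, key=growth)

m = 2 * P_WIN - 1           # mean of X
v = 1 - m * m               # variance of X, since E[X*X] = 1
d_raw = m / (v + m * m)     # d = m / (V + m*m)
d_mv = m / v                # d = m / V

print(d_exact, d_raw, d_mv)
```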

What are the assumptions we have used?

1. We assumed the distribution is known and doesn’t change.

2. We assumed we can make a large number of bets within the time frame we care about. (Otherwise, sum(X) is itself a random variable and is not the same as m*N.) For trading, this is not a problem. For long-term investment it might be, but with enough diversification and a fairly long horizon (10-20 years) it should be OK. For example, with 7 positions at any time and an average holding period of 1 year per position, we get 70 bets in 10 years: not ideal, but enough for the sum to be roughly normally distributed. Still, for long-term fundamental investing we have to be more cautious about using the Kelly Bet as is, since fluctuations within 2-3 years can still be quite large, and human psychology may not be able to sustain them.

3. We assumed d*X is small, so we can ignore the third term of the Taylor expansion. How small does it have to be? The third term, (d*X)^3/3, is 2*|d*X|/3 times the second term, (d*X)^2/2, so keeping it within 5% to 10% of the second term requires |d*X| < 0.075 to 0.15. Normally, for fundamental investment or short-term trading, this should be approximately satisfied, since trading has small returns per bet and fundamental investment puts a small percentage of capital in each position. However, this condition is very important: if it is not satisfied, the final conclusion is often completely wrong (for example, more upside may result in less capital allocation).
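Assumptions #2 and #3 can both be probed with a short Python sketch. The first part assumes the even-money 60/40 game from the example above (not real stock returns): over N bets the average return sum(X)/N is itself random, with standard deviation sqrt(V/N), which for V = 0.96 and N = 70 is about 0.117, large compared with the mean m = 0.2. The second part uses a hypothetical bet where X is -1 or +u with probability 0.5 each. The exact Kelly fraction for it, (u - 1)/(2u), grows with the upside u, while d = m/(V + m*m) = (u - 1)/(u*u + 1) shrinks once u exceeds about 2.4, because the extra upside inflates E[X*X] faster than the mean. In that case |d*X| is far outside the small regime, and the quadratic approximation breaks down.

```python
import random
import statistics

# Assumption #2: with few bets, the average return is itself noisy.
P_WIN = 0.6
M, V = 0.2, 0.96  # mean and variance of X for the 60/40 even-money game


def sample_mean_return(n_bets, rng):
    # Average of n_bets draws of X, where X = +1 w.p. P_WIN and -1 otherwise.
    wins = sum(1 for _ in range(n_bets) if rng.random() < P_WIN)
    return (2 * wins - n_bets) / n_bets


rng = random.Random(42)  # fixed seed so the run is reproducible
for n_bets in (70, 2800):
    means = [sample_mean_return(n_bets, rng) for _ in range(1000)]
    print(n_bets, statistics.stdev(means), (V / n_bets) ** 0.5)


# Assumption #3: outside the small-|d*X| regime the quadratic formula misleads.
def exact_kelly(u):
    # Maximizer of 0.5*log(1 + d*u) + 0.5*log(1 - d) for X in {-1, +u}.
    return (u - 1) / (2 * u)


def quadratic_kelly(u):
    # d = m / E[X*X] with m = (u - 1)/2 and E[X*X] = (u*u + 1)/2.
    return ((u - 1) / 2) / ((u * u + 1) / 2)


for u in (3, 5, 10):
    print(u, round(exact_kelly(u), 3), round(quadratic_kelly(u), 3))
```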

So the really questionable assumption here is #1: in practice the distribution is unknown. We don’t know the distribution, not even its mean or variance. **This means that to claim the Kelly Bet as the optimal bet size, we have to be very conservative in estimating the mean and variance.** **Any mistake on the aggressive side is much more devastating than one on the conservative side. In other words, it is “better safe than sorry”.**
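A sketch of that asymmetry for the even-money game, with hypothetical numbers. Suppose we estimate the win probability as 0.6 and bet d = 0.2 when it is really 0.55 (true Kelly d = 0.1), versus the opposite mistake. Overbetting by a factor of two wipes out all of the growth (it turns slightly negative), while underbetting by half still keeps about three quarters of the optimal growth.

```python
import math


def growth(d, p_win):
    # Exact expected log growth per bet for an even-money game (X = +1 or -1).
    return p_win * math.log(1.0 + d) + (1.0 - p_win) * math.log(1.0 - d)


# Too aggressive: we believe p = 0.6 and bet 0.2, but the true p is 0.55.
print("overbet ", growth(0.2, 0.55), "vs optimal", growth(0.1, 0.55))

# Too conservative: we believe p = 0.55 and bet 0.1, but the true p is 0.6.
print("underbet", growth(0.1, 0.6), "vs optimal", growth(0.2, 0.6))
```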

For stocks, the return distribution is certainly not Bernoulli; returns are closer to log-normal. However, the formula above does not assume any particular distribution: we only need to know the mean and variance.
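Because only the mean and the second moment enter the formula, it can be applied directly to a sample of returns without fitting a distribution. Here is a minimal sketch, with two assumptions. First, `returns` holds per-period returns as fractions (e.g. 0.05 for +5%). Second, past returns are representative of future ones, which is exactly assumption #1 above. The helper name `kelly_from_returns` and the simulated log-normal parameters are hypothetical.

```python
import random
import statistics


def kelly_from_returns(returns):
    """d = m / (V + m*m), i.e. sample mean over the mean of squared returns."""
    m = statistics.fmean(returns)
    v = statistics.pvariance(returns, mu=m)
    return m / (v + m * m)


# Sanity check on the even-money game: six wins, four losses.
print(kelly_from_returns([1.0] * 6 + [-1.0] * 4))

# Hypothetical log-normal price ratios, converted to returns X = ratio - 1.
rng = random.Random(7)
simulated = [rng.lognormvariate(0.0, 0.1) - 1.0 for _ in range(10_000)]
print(kelly_from_returns(simulated))
```

With a small true mean, the estimate from even 10,000 samples can move noticeably between seeds. This is one more reason to stay conservative with the inputs.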