A simple derivation of Kelly Bet

I searched online quite a bit but couldn’t find a satisfying, simple derivation of the Kelly Bet formula. So I worked it out on my own, and the derivation turns out to be quite simple.

Assume X is the investment return of a single bet (a random variable), d is the fraction of capital invested in each bet (the percentage of net worth you want to put in), and we bet N times.

Each bet gives return: 1 + d*X

Total return (multiply all individual returns) after N times is:

product(1 + d*X)

Apply the log function (which is monotonic) to this: log(product(1+d*X)) = sum(log(1+d*X))

So our goal is to maximize this sum, in order to maximize total return.

If we apply the Taylor expansion to log(1+d*X) and keep the first two terms (assuming d*X is small enough that we don’t need the third-order term):

sum(log(1+d*X)) ≈ sum(d*X – d*d*X*X/2) = d*sum(X) – d*d*sum(X*X)/2

= d*m*N – d*d*(V + m*m)*N/2

(Here m is the mean of X, and V is the variance of X. And we used the fact that m = sum(X)/N, V = sum(X*X)/N – m*m  ==> sum(X*X) = (V+m*m)*N )

To maximize this value, we set the first derivative with respect to d to zero:

first derivative = m*N – d*(V+m*m)*N = 0

==> d = m/(V+m*m)  

When m*m << V, this can be simplified to d = m/V  (mean/variance)

Here we can see that d = m/(V+m*m) is another simple yet more accurate formula for the Kelly Bet. Many articles use d = m/V, where V is the variance (the second central moment), but it really should be the second non-central moment, as shown on Wikipedia: http://en.wikipedia.org/wiki/Kelly_criterion


Example: a binary game (win or lose the full stake) has a 60% chance of winning and a 40% chance of losing. The return follows a two-point (Bernoulli-type) distribution with values +1 and –1. The mean is 0.6 – 0.4 = 0.2, and the variance is 1 – 0.2*0.2 = 0.96.

From the formula d = m/(V+m*m), d is 0.2/(0.96 + 0.04) = 0.2. A numerical test shows that this formula is correct, and more accurate than the d = m/V formula.
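As a rough numerical check of this example, the Python sketch below compares both formulas with a brute-force search for the bet fraction that maximizes the expected log return of the 60/40 game. The function name and the grid resolution are illustrative choices, not part of the derivation.

```python
import math

def expected_log_growth(d, p_win=0.6):
    """Expected log return per bet for a +1/-1 binary game at bet fraction d."""
    return p_win * math.log(1 + d) + (1 - p_win) * math.log(1 - d)

p_win = 0.6
m = p_win - (1 - p_win)          # mean return, about 0.2
V = 1 - m * m                    # variance, about 0.96

d_noncentral = m / (V + m * m)   # mean / second non-central moment
d_central = m / V                # mean / variance, about 0.2083

# Brute-force search over bet fractions strictly between 0 and 1.
grid = [i / 10000 for i in range(1, 10000)]
d_best = max(grid, key=expected_log_growth)

print(f"m/(V+m*m) = {d_noncentral:.4f}")
print(f"m/V       = {d_central:.4f}")
print(f"grid best = {d_best:.4f}")
```

For this particular game the second non-central moment is exactly 1, so m/(V+m*m) coincides with the exact optimum; for other distributions it is still an approximation.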

What are the assumptions we have used?

1.  We assumed the distribution is known and doesn’t change.

2. We assumed there is a large number of bets we can make within the time frame we care about. (Otherwise, sum(X) is itself a random variable, and is not the same as mean*N.) For trading, this is not a problem. For long-term investment it might be, but with enough diversification and a fairly long horizon (10-20 years) it should be OK. If we hold 7 positions at any time and each position’s average holding time is 1 year, we make 70 bets in 10 years in total: not ideal, but still enough for the sum to be close to normally distributed. Still, for long-term fundamental investing we have to be more cautious about using the Kelly Bet as is, since fluctuations within 2-3 years can still be pretty big, and human psychology may not withstand such big fluctuations.

3. We assumed d*X is small, so we can ignore the third term in the Taylor expansion. How small does it have to be? The third term, (d*X)^3/3, is 2*d*X/3 times the second term, (d*X)^2/2. If the third term is 5% to 10% of the second term, it may be small enough, and that requires d*X < 0.075 to 0.15. Normally, for fundamental investment or short-term trading, this is roughly satisfied, since trading has small returns and fundamental investment has a small capital percentage per position. However, this condition is very important: if it is not satisfied, the final conclusion is often completely wrong (for example, more upside may result in a smaller capital allocation).
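To get a feel for assumption 3, here is a small Python sketch that compares the two-term Taylor approximation with the exact log for a few values of d*X. The sample values are arbitrary illustrations, not recommendations.

```python
import math

def taylor2(x):
    """First two terms of the Taylor series of log(1 + x)."""
    return x - x * x / 2

# Compare the approximation with the exact value as d*X grows.
for x in (0.05, 0.15, 0.3, 0.6):
    exact = math.log(1 + x)
    approx = taylor2(x)
    rel_err = (exact - approx) / exact
    print(f"d*X = {x:.2f}: exact = {exact:.5f}, "
          f"two-term = {approx:.5f}, relative error = {rel_err:.1%}")
```

The error grows quickly with d*X, which is why the approximation should only be trusted for small bets or small returns.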

So the really questionable assumption here is #1: that the distribution is known. We don’t know the distribution, not even its mean or variance. This means that to claim the Kelly Bet as the optimal bet size, we have to be very conservative in estimating the mean and variance. Any mistake on the aggressive side is much more devastating than one on the conservative side. In other words, it is “better safe than sorry”.

For stocks, the distribution is certainly not a Bernoulli distribution; the returns are closer to a log-normal distribution. However, the formula above doesn’t assume any particular distribution: we only need to know the mean and variance.


Posted in Uncategorized | Leave a comment

Kelly Bet explained

There are two reasons for being risk averse.

The first reason comes from “Utility Theory”. If you only have $1000, losing $100 is a big deal, but if you have $1,000,000, losing $100 is nothing. So the same amount of money is less valuable when you have more money. Gaining 10% therefore always adds less value than losing 10% takes away, because the bigger the asset, the less the same incremental amount is worth.

The second reason is the asymmetry of gain and loss. If you lose 50%, you have to gain 100% to get back to even. This is because returns are multiplicative, not additive. In other words, we want to maximize the expected log-asset, not the expected asset.

The basic Kelly bet formula is just  capital_allocation_percentage = mean/variance. (“Mean” is expected excess return)

However, we can’t simply use it as is for the following reasons:

1. Kelly bet only gives you an “upper bound”, and that upper bound is very big. 

Example: if you are playing a binary game (double or lose everything), and you have a 60% chance to win and a 40% chance to lose, your expected return is 0.6 – 0.4 = 0.2. Your variance is then 0.6*1*1 + 0.4*(-1)*(-1) – 0.2*0.2 = 0.96, and the percentage of capital allocation is 0.2 / 0.96 ≈ 21%. In other words, you should bet about 20% of all your money on one single bet. That is really aggressive.

This number is still valuable though, since it tells you that no matter how aggressive you are, betting more than this will only bring you a worse result, not a better one. It also tells you that if you have a very large number of bets at the same odds, this is the optimal bet size. Betting less than this will NOT make it safer for you, only give you lower returns.
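A quick way to see how the bet size affects long-run growth in this 60/40 game is to compute the expected log growth per bet at several bet fractions. This is a minimal Python sketch; the function name and the chosen fractions are illustrative assumptions.

```python
import math

def log_growth(d, p_win=0.6):
    """Expected log growth per bet for a double-or-nothing game at bet fraction d."""
    return p_win * math.log(1 + d) + (1 - p_win) * math.log(1 - d)

for d in (0.1, 0.2, 0.3, 0.4):
    print(f"bet {d:.0%}: expected log growth per bet = {log_growth(d):+.4f}")
```

Under these assumptions the growth rate peaks near 20%, is lower on both sides of it, and turns negative by the time the bet reaches 40% of capital.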

Now what does this mean for stocks? Assume BRKB is very undervalued, and there is 50% upside, so the mean return is 50%. Now assume BRKB is so safe that the standard deviation is only 35% of the stock price. The Kelly bet is 0.5 / (0.35 * 0.35) ≈ 4, or 400% of your capital!!

This means that not only should you put all your money into it, you should also borrow 3 times your capital to bet on it. That is assuming there are no margin (liquidation) calls when it goes down.

Clearly this number is too big. We can’t use it in practice, and almost nobody does except in high-frequency trading.

That said, it does give a general sense of how capital should be allocated. For example, if a stock is twice as risky, then we should give it only 25% of the position size.
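The BRKB arithmetic and the "twice the risk, a quarter of the size" rule can be sketched in Python as follows. The helper name `kelly_mean_variance` is made up for illustration; it is just the basic mean/variance formula.

```python
def kelly_mean_variance(mean, std):
    """Basic Kelly fraction: expected excess return divided by variance."""
    return mean / (std * std)

base = kelly_mean_variance(0.5, 0.35)      # 50% upside, 35% standard deviation
doubled = kelly_mean_variance(0.5, 0.70)   # same upside, twice the standard deviation

print(f"Kelly fraction at 35% std: {base:.2f}")
print(f"Kelly fraction at 70% std: {doubled:.2f}")
print(f"ratio: {doubled / base:.2f}")
```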

For example, if BRKB is $130 right now, and we think its share price in a year is equally likely to be anywhere between $100 and $180, then the mean is $140, the expected return is about 7.7%, and the variance of the return is about 0.032 (a standard deviation of about 18%). The Kelly bet is then 0.077 / 0.032, or about 240% of capital. Some people choose to use half Kelly, or about 120%. For a value investor, even a small fraction of this is still too much, since there is only 7.7% upside. Nobody should put a big share of capital into a stock with only 7.7% upside, right? Especially when the likely downside is about 12% (half of the maximum drop).

Even though Kelly bet is too big for practical cases, the math here remains true. Meaning we should use a bet that is inversely proportional to the square of risk, and proportional to the mean return.

2. Kelly bet assumes you get a large number of opportunities within a reasonable time frame, all with known, fixed odds.

3. We have to be careful about using leverage and the maximum downside.

If the bet can ever lead to a complete loss (usually due to leverage), then since anything times zero is zero, the eventual result is zero. This is certainly not the optimal result. So whatever distribution we use, we must make sure it can never get to zero. Even if the distribution we use doesn’t lead to zero, in real life the distribution is never fixed, so it is always possible to end up at zero when the distribution changes temporarily.

In math, when it gets to zero or negative number, the log function applied on it will be undefined.

The beauty of the Kelly Formula is that it gives an optimal bet no matter how much risk appetite a person has. So it is independent of the risk aversion factor or utility function.

However, if the asset could ever get to zero (no matter how small the chance is), the final result is zero. Clearly, in this case, it does matter to be more risk averse.

In other words, in the classic Kelly setup, you take “no risk” if you play the game long enough, since the final result is almost a given. But if the asset could end up at zero at any point, the final result is not a given. In fact, if you play an infinite number of times, the final result is always zero!
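To illustrate why even a small wipe-out chance dominates in the long run, here is a minimal Python sketch of the probability of never being wiped out over many independent bets. The 1% per-bet wipe-out chance is a hypothetical number chosen only for illustration.

```python
def survival_probability(p_wipeout, n_bets):
    """Probability of avoiding a wipe-out in every one of n independent bets."""
    return (1 - p_wipeout) ** n_bets

# Hypothetical 1% chance of a total loss on each bet.
for n in (10, 100, 1000):
    print(f"{n:5d} bets: survival probability = {survival_probability(0.01, n):.5f}")
```

As the number of bets grows, the survival probability goes to zero, which is the sense in which the final result is "always zero".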

Another way to think about this: Kelly tries to maximize the expected log-asset, and once the asset is zero, log(0) is undefined.

So the question is why do we have such a big difference between Kelly bet and realistic bet we should use? Didn’t I just say less than Kelly bet will not make it safer, only reduce overall returns?

Here are the reasons:

a. Kelly bet assumes you can bet many times (> 100, or at least > 40) within a reasonable time frame, and that each bet has the same known odds. This is OK for the casino case, but not for long-term investing. In trading you can bet many times, but each bet may still have different odds since the market is always changing. If you can’t bet many times, let’s say you bet only 3 times within 10 years, you will not have “the law of large numbers” to help you, and the fluctuations caused by bad luck will be really damaging to you. In that case you need to be much more cautious. In real life we don’t know the odds, and we can be overly optimistic, especially in fundamental investing (as opposed to technical trading), where there is no historical data to back up our estimate. Even if we know the odds, they will be different for different stocks at different times. However, among the three factors (unknown odds/variance, small number of available bets, different odds over time), the unknown odds and variance are usually the most significant, which means we had better use pretty conservative odds/variance estimates before we apply the Kelly Bet!


b. The mean/variance formula is not precise; it is an approximation, though usually a good enough one. With high leverage, however, it gets more complicated. Since the Kelly bet optimizes the expected log asset, it is undefined if the asset gets to zero or goes negative!! Any time we apply leverage, there is a chance for the asset to go to zero, and therefore the Kelly bet may not apply. That is why using the Kelly formula can be risky or conceptually wrong for highly leveraged bets.

c. The psychological challenge is too big if the fluctuation is too big.

d. As mentioned above, we have to care about the possibility of a change in the distribution, which may cause a wipe-out event. Any possibility of a wipe-out event breaks the promise of an optimal result given by the Kelly Bet.

What about multiple stocks in a portfolio? The good news is that the Kelly bet is additive. So if for BRKB the Kelly bet says you should put in 20%, and for UBNT it says you should put in 10%, you should just do so. This assumes the sum of the two proportions is less than 100%. In this case, the sum is 20% + 10% = 30%.

What if the Kelly bet says to put 80% into BRKB and 70% into UBNT? If the sum goes beyond 100% and you don’t want to use leverage, you can scale it down accordingly. So use 80% / 1.5 = 53% for BRKB and 70% / 1.5 = 47% for UBNT. However, this is not really optimal for long-term growth. For optimal long-term growth, we usually need to put more into the one that gives the higher expected excess return, and in this case we may need to put more into UBNT since it gives a higher return. But this would certainly increase risk. A linear scale-down usually gives a better/optimal risk profile within a single period, but sacrifices long-term growth.
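The linear scale-down described here is simple to express in code. The following Python sketch uses a hypothetical helper, `scale_to_full_investment`, which leaves allocations alone when they sum to 100% or less and otherwise divides each by the total.

```python
def scale_to_full_investment(allocations):
    """Scale Kelly fractions down proportionally so they sum to at most 100%."""
    total = sum(allocations.values())
    if total <= 1.0:
        return dict(allocations)   # no leverage needed, keep as is
    return {name: weight / total for name, weight in allocations.items()}

scaled = scale_to_full_investment({"BRKB": 0.80, "UBNT": 0.70})
for name, weight in scaled.items():
    print(f"{name}: {weight:.0%}")
```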

The above assumes there is no correlation between stocks. If you have strongly correlated stocks, such as two banks in the same industry, the correct formula is to multiply the inverse of the covariance matrix by the expected return vector. That math gets a bit more complicated.
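For two stocks, the inverse-covariance formula can be written out by hand. The Python sketch below is only an illustration: the function name and the sample numbers (10% expected excess return, 30% standard deviation, 0.8 correlation) are assumptions, not estimates for any real stock.

```python
def kelly_two_assets(means, stds, corr):
    """Kelly fractions for two assets: inverse covariance matrix times mean vector."""
    m1, m2 = means
    s1, s2 = stds
    c11, c22 = s1 * s1, s2 * s2
    c12 = corr * s1 * s2
    det = c11 * c22 - c12 * c12
    # Inverse of [[c11, c12], [c12, c22]] is (1/det) * [[c22, -c12], [-c12, c11]].
    w1 = (c22 * m1 - c12 * m2) / det
    w2 = (-c12 * m1 + c11 * m2) / det
    return w1, w2

# Hypothetical inputs: two similar stocks, first uncorrelated, then strongly correlated.
print(kelly_two_assets((0.10, 0.10), (0.30, 0.30), 0.0))
print(kelly_two_assets((0.10, 0.10), (0.30, 0.30), 0.8))
```

With zero correlation each stock gets its own mean/variance allocation; with strong correlation each allocation shrinks, because the two positions are partly the same bet.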

One more observation, unrelated to this: since the average correlation between stocks is about 15%, the math shows that over-diversification will not help. In terms of risk diversification, a portfolio of 7 stocks is not much different from a portfolio of 700 stocks, assuming these 7 stocks have average correlation (15%) and equal-sized positions.
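The diversification claim can be checked with the standard formula for an equal-weight portfolio, assuming every stock has the same standard deviation and every pair has the same correlation. This is a simplified Python sketch under those assumptions; the function name is illustrative.

```python
import math

def portfolio_std_ratio(n, corr):
    """Std of an equal-weight n-stock portfolio relative to a single stock's std,
    assuming equal stds and the same pairwise correlation for all stocks."""
    return math.sqrt(1 / n + (1 - 1 / n) * corr)

for n in (1, 7, 30, 700):
    print(f"{n:4d} stocks: portfolio std = {portfolio_std_ratio(n, 0.15):.3f} x single-stock std")
```

The ratio can never fall below sqrt(0.15), so the numbers show how much of the achievable risk reduction a small portfolio already captures.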



How Engineers do Business

After studying Elon Musk and Robert Pera a lot, I found they have a lot in common.

First, they are both engineers who later went into business.

The way they do business is quite different from what traditional business schools teach.

Why would Tesla choose not to make money on car service, and give supercharging away for free? Why would Ubiquiti price something that could easily sell for $1500 – $2000 (still a lot cheaper than competitors) at $1000?

Why would Tesla sell its luxury car at the same price in China as in the US? Didn’t Musk know that China’s demand is already so high, and that reducing the price of a luxury good won’t necessarily boost demand? Plus, other luxury brands often sell at twice the price in China.

Why would Ubiquiti give out software for free? Wouldn’t it be nice to have a steady revenue stream by charging annual license fees like other big companies do?

Instead of milking every penny out of their customers, they pass the value on to them.

Instead of playing all kinds of tricks and adding hidden charges, they keep everything straightforward with just one price tag.

Instead of just getting excited by the profit growth, they also get excited by the value they create.

Instead of getting the best margin possible, they focus on getting the best efficiency possible.

Instead of following everyone else on conducting business, they have their own business model.

What happens then? Is that a good choice or a bad one?

With Tesla and Ubiquiti, we see trust from customers, fans all around the world, efficient corporate structure and production, and amazing growth that is rocketing way above expectations or even imagination.

The disruption they have brought is profound, and it goes well beyond any particular industry. It brings a new perspective on how we conduct business, and it is what the engineering mind brings to the business world.



The flaw of traditional valuation model

The traditional valuation model, whether it is a DCF model or Wall Street’s popular P/E multiple method, requires estimating a fair value for a stock, and then either using that value to see the upside potential (potential return on investment) or applying a discount as a “margin of safety”.

This model, if used simply as it is, is inherently flawed. The reason is that it doesn’t consider the possible variation of the fair value. Or, in the language of statistics, it doesn’t consider the “variance” around the expected mean value.

Some people may argue that the “riskiness” of the investment is already compensated for when we change the discount rate (or cost of capital) for different situations. If it is a highly leveraged company, we will use a higher cost of capital, a higher discount rate, or a lower P/E multiple.

However, this kind of “compensating” method is pretty ambiguous, because the compensating higher margin of safety really consists of two parts:

1. The lower expected mean value:

If a company has high financial leverage or operating leverage, it may have a 20% chance of going bankrupt in the next 5 years. So there is an 80% chance we have a $100 share price and a 20% chance we have a $0 share price, and the expected mean value is an $80 share price. In this case, the additional discount required is 20%.

2. The additional return required to compensate for the additional risk:

This is also a part of the higher discount we need, but it is hard to quantify either by intuition or by math, since it really depends on personal risk tolerance and the position size.

So if a company has a debt-to-equity ratio of 2 to 1, should we use a 12% discount rate? Or 10%? Or 15%?

What is the justification for that percentage? And how much of that percentage is part 1 of the premium, and how much is part 2?

If a company has a clearly bad CEO who is very likely to make a very expensive acquisition to waste all the cash on the balance sheet, how much discount should you apply to that fair value?

That is why I call the traditional method very “ambiguous”.

In modern finance (quant finance), the situation is much clearer, at least in theory. We estimate a mean value over different scenarios, assigning a probability to each scenario, and then estimate the variance around that mean given those scenarios. Then we apply a discount based on the variance.
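The scenario approach can be sketched in a few lines of Python. The example reuses the bankruptcy scenario from earlier in this post (80% chance of $100, 20% chance of $0); the function name `scenario_stats` is made up for illustration.

```python
import math

def scenario_stats(scenarios):
    """Mean and variance of a value given (probability, value) scenarios."""
    mean = sum(p * v for p, v in scenarios)
    variance = sum(p * (v - mean) ** 2 for p, v in scenarios)
    return mean, variance

# 80% chance of a $100 share price, 20% chance of bankruptcy ($0).
mean, variance = scenario_stats([(0.8, 100.0), (0.2, 0.0)])
print(f"mean = ${mean:.0f}, variance = {variance:.0f}, std = ${math.sqrt(variance):.0f}")
```

The mean alone gives the 20% discount from part 1; the standard deviation is the extra information the traditional model leaves out.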

Despite the additional clarity, there are still two challenges:

1. How much additional discount we need for a given variance still depends on personal risk tolerance.

This problem is not really an issue in two cases:

a. If the position size is very small, and the individual company’s risk can be fully diversified away, we can simply ignore the variance completely. In practice, though, manual selection of stocks requires a lot of research and follow-up, and the best opportunities are very rare, so this kind of massive diversification is not practical. Still, if the position is smaller, the variance matters less, and we can require less of a discount on the risky stock.

b. There is a well defined upper bound of max position size for each stock given its variance and expected return.

The beauty here is that this upper bound (as defined in “Kelly bet”) does not depend on individual’s risk tolerance. No matter how aggressive you are, once you risk more than this limit, you are simply wrong!

“Kelly bet” defines the upper bound, but it doesn’t tell you the “right” bet size, since the practical situation is quite different from the theoretical setup: you don’t have a large number of identical bets waiting for you, and you don’t know the risk-reward ratio in advance. So the “right” bet size still depends on personal risk appetite, but at least we have some theoretical groundwork here, and some people just choose “half Kelly bet”.

In the Kelly bet, the right position size is inversely proportional to variance and proportional to expected return. Remember that variance is the square of standard deviation: this means that if a stock is twice as risky, we should take only 25% of the position size, or for the same position size require 4 times the expected return.

For example, suppose buying BRKB has a 15% standard deviation and buying Sears has a 60% standard deviation, and I am willing to put 60% of my net worth in BRKB if it is 33% discounted (50% expected return). Then I can only put 60% / (4^2) ≈ 4% of my net worth into Sears if its expected return is also 50%. Or I have to require a 50% * (4^2) = 800% expected return to put 60% of my net worth into Sears.
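The quadratic scaling in this example can be written as two small helpers. This is a minimal Python sketch; the function names are illustrative, and the inputs are the standard deviations and targets from the example above.

```python
def scaled_position(base_position, base_std, new_std):
    """Position size for the same expected return when risk changes (inverse-variance scaling)."""
    return base_position * (base_std / new_std) ** 2

def required_return(base_return, base_std, new_std):
    """Expected return needed to justify the same position size when risk changes."""
    return base_return * (new_std / base_std) ** 2

# BRKB: 15% std, 60% position, 50% expected return. Sears: 60% std.
print(f"Sears position at 50% expected return: {scaled_position(0.60, 0.15, 0.60):.2%}")
print(f"Sears return needed for a 60% position: {required_return(0.50, 0.15, 0.60):.0%}")
```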

This is why I found it wrong when many value investors estimated Sears’ real estate asset value and bet big on it. Yes, it does have a lot of asset value, but given its deeply troubled retail business, the uncertainty about how many years the bleeding will continue, the cost of liquidation, and the illiquid nature of real estate assets (it is hard to sell a lot of them in the short term), the variance is very big. Therefore, applying a 33% discount or even a 50% discount may not be enough, especially when someone tries to bet big on it.

2. It is already very hard to estimate fair value, it would be even harder to estimate variance.

This is true, but a general sense would still help here. Plus, if we can list a few scenarios and their probabilities, we can get a very rough estimate of the variance.

For quite some time, I was puzzled about how much certainty I would need before making an investment, since “certainty” is what Buffett cares about most. If I require too much certainty, I will find very few opportunities. Too little, and I am risking too much.

Later I figured out that certainty is important, but we can’t expect too much of it. There is always risk in an investment. What we should do is adjust the position size and/or the margin of safety to compensate for it.

A word of caution: the impact of risk grows quadratically, not linearly. So twice the risk (standard deviation) requires 4 times the potential return or a quarter of the position size. Since it is very unlikely to get that much potential return, what we usually end up with is a much smaller position size.

Clearly, this makes a very risky investment not worthwhile at some point, since we don’t want to diversify too much and we have limited time to do the research and follow-ups.

In conclusion, I think we should not neglect “variance” when we evaluate a stock, as it is just as critical as the fair value. We should also remember that higher variance doesn’t always require a much higher expected return or a higher discount; it really depends on how diversified we are. This concept from quant finance can help refine the traditional valuation model.


Three Barriers for Tesla

Tesla is getting hot these days! People are falling in love with it, and the stock is flying high too.

However, there are quite a few naysayers. Some say competition from the i3 will be a serious threat; others say fuel cell cars will kill Tesla and other electric cars too.

These people are really missing the basics.

Comparing “fuel cell” with “battery” is missing the point, because the game is not about new energy, not about saving the environment, and not about protecting the planet. None of them is the center of the game.

Yes, people all say we need to protect our environment, but how many of them would spend $10,000 more to actually do that? That is exactly why no electric cars were successful before Tesla.

While Nissan Leaf got some sales, it has pretty much relied on government incentives. That is not sustainable.

The real game here is all about “cost” and “value”; that is what business is always about. Here the value could be the car’s performance, look and feel, and brand name: basically any kind of user experience.

The reason Tesla is successful is not because it is an electric car, it is because it has so many innovations and it paid so much attention to user experience. The price is high, but the value is higher.

So comparing Tesla with fuel cell cars is missing the point, since Tesla’s success has nothing to do with being an electric car. In fact, if Tesla were a hybrid or gasoline car, it would be even more successful.

So what are the barriers to Tesla’s eventual take-over now? I believe there are 3 barriers:

1. In the short term and mid-term, it is the battery production capacity.

Since orders and demand are not a problem, the current Model S production is limited only by the number of batteries Tesla can get.

2. In the mid-term and long term, it is the price of the car.

In fact, this is the most important barrier of all, and it is what has haunted every new-energy car all along. Tesla could get away with it only because Musk was smart enough to provide a car with such high value. Eventually, though, it has to bring the cost down in order to bring EVs into regular people’s lives.

3. In the long term, it is the charging network.

For now, Tesla can build a bare-bones network to connect the US or maybe other countries, but once the third generation is out, it won’t be enough to support all the cars on the road. Either Tesla has to spend a lot more to build charging stations, or third parties have to do it.

Interestingly, Elon Musk is addressing all 3 barriers head on! (Clearly he is fully aware of them.)

He is building the gigafactory to solve the first problem. He has planned the third generation to solve the second problem. And he just shared his supercharger patents to address the third problem.

It is a smart move to share supercharger technology, since we need third parties to build chargers rather than relying on Tesla alone. Chargers from other parties could charge a small fee ($5 for example): nobody needs to use them very often, only when going on long trips, and $5 is more than enough to cover the cost of building the stations. Waiting 20 minutes or so to get enough energy to reach the destination is not that bad, but waiting in a long line to get charged is a big problem ahead for EV owners.

Still, #3 is not the largest barrier, since many families have multiple cars and can leave the Tesla at home when going on long trips.

With all the information I currently have, I just don’t see how the big car companies can catch up with Tesla before its third-generation car is out. And once it is out, we will see a major take-over throughout the world, as so many people are eagerly waiting for it now. At that time, it is quite possible we will be back to constraint #1 (the battery production problem) again.



Think as a private owner

A common analogy Buffett used was to compare owning a stock to owning a farm. He said: "If you own a farm, you don’t think you need a quote on the farm selling price every day, and you don’t think the farm’s value gets down 80% because a couple of bad years, or the quote is low."

That analogy reflects the right attitude, and it was first made by Ben Graham. However, there is one big difference between owning a farm and owning a stock. In the first case, as the owner, you have full control over everything about the business. Not only do you make every decision on operations and finances, you also have full knowledge of the business itself. That gives you full confidence and lets you focus on the earning power of the business rather than the market quotation.

On the other hand, people care about the market quotation of a stock because they don’t have full control over, or full knowledge of, the company. Therefore, their only confidence lies in the ability to sell the stock "today" at the market price.

That doesn’t mean it is the right way to think about stocks. This is just to explain why people naturally think that way. The right way is to acquire full control, or let someone you can fully trust take full control of the company, and try to acquire as much information as you can about the company.

That is why good management is very, very important.

Thinking as a private business owner, you only need three things to succeed:

1. A good average estimated return on the investment.
2. Control of the business, or someone you can trust controlling the business.
3. Diversification.

Quantitative analysis and Qualitative analysis

A common mistake beginners make in value investing is to focus too much on "Quantitative Analysis" while completely ignoring "Qualitative Analysis".

Basically, the first one is putting all the numbers together. The P/E ratio is the most commonly used figure. Of course, it would be very naive to use just one number to decide the fair value of a stock. Even if you uncover all the hidden facts behind the current earnings and get to the bottom of the future earning power, there are other factors on the balance sheet to consider. Among them, three are most important:

1. Leverage.
Very high leverage puts the company at huge risk. If the business environment or the industry outlook changes, this is the path that leads to certain death.

2. Capital requirement.
Some businesses require more capital input to fuel growth. A good business, such as food, has a low capital requirement. But a business like an airline requires huge capital input, which usually also leads to high debt and leverage. Another way to look at the capital requirement is to look at the "Return on Equity".

3. Operating Margin.
This is especially important in cyclical industries, because when the industry slows down, only the companies with the highest operating margins can survive. For example, when housing starts decreased 66% and the whole wallboard industry was close to bankruptcy, Eagle Materials still earned a good profit.

Even these three traits on the balance sheet and income statement are still part of "Quantitative Analysis". The other part, "Qualitative Analysis", is very important too.

"Qualitative Analysis" focuses on traits that are not reflected in any numbers, such as:

1. Management stewardship and ability.
2. Sustainable competitive advantages.
3. Industry outlook and economic outlook.

These traits are usually hidden behind the scenes, and they require a substantial understanding of the company and its business. They are the proven indicators of the future outlook, because all the numbers we get describe only the past and present; without these indicators to provide certainty, the future could be wildly different from those numbers.

As an example, Bear Stearns was very profitable every year for several decades, but eventually collapsed suddenly. Without understanding the management and the nature of the business, it is very hard to forecast this kind of change based only on past earnings figures.