Posted on June 28, 2013 @ 09:51:00 AM by Paul Meagher
An additional improvement we can make to the revenue model for our lobster fishing example involves a more realistic distribution of prices per lb for lobster. Previously we assumed that prices were normally distributed around a mean of $3.50 a lb with a standard deviation of 25 cents per lb. This means that prices can range continuously around $3.50 with the probability of lower or higher prices falling off as we deviate from $3.50. A price as extreme as $3.00 a lb is 2 standard deviations away from the mean price and is expected to occur for some catches when we represent lobster prices with a normal distribution with these parameter settings (i.e., mean=3.50, stdev=0.25).
The problem is that a price of $3.00 never occurred during the season. Also, the prices never varied continuously. Instead the prices only varied in 25 cent increments so there were really only 4 price points in our price distribution; namely, 3.25, 3.50, 3.75, and 4.00. Also, the lower prices in this range occurred more often than the higher prices. Using a normal distribution to capture the distribution of prices is a simple first approximation and may put us in the ballpark for the average prices over the season, but it does not reflect the true state of affairs with respect to what prices were possible and their probabilities of occurrence.
In order to construct a more realistic price distribution, we can opt to represent the distribution of prices with a categorical distribution. I also considered calling this distribution a discrete distribution or a multinomial distribution, but these terms carry a bit of extra baggage that I did not want to commit to at this time.
A categorical distribution consists of labels along with the associated probability of each label. Collectively the probabilities of all labels should sum to 1. The main problem I want to solve in today's blog is how to generate a random label from this distribution. My approach involves constructing a new distribution called CategoricalDistribution.php and developing a random number generator for that distribution.
Here is what a "bare bones" version of a CategoricalDistribution.php object looks like:
One important aspect of this code to take note of is that the distribution includes a general ProbabilityDistribution.php object (located in the Probability Distributions Library that I developed). This object contains a random number generator function, called RNG(), that calls the private _getRNG() method in this code. The parent RNG() method (in the ProbabilityDistribution.php object) can call this private method multiple times if it is supplied with a number specifying how many times to call the private method; otherwise, it just returns one random category label. The CategoricalDistribution.php object is "bare bones" because it does not include a variety of other methods/functions that are usually included in probability distribution objects, such as methods for returning the mean, standard deviation, pdf values, cdf values, etc. for a distribution. Perhaps in the future I will add these additional methods after I have reviewed the literature, but for current purposes I only needed to implement the private _getRNG() method (private methods are often prefixed with an underscore to make their status as private methods more obvious).
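To make the idea of drawing a random label concrete, here is a minimal standalone sketch. It does not use the Probability Distributions Library and is not necessarily how the _getRNG() method is written; it simply assumes the common cumulative-probability approach, and the function name sample_category() is a hypothetical one chosen for illustration.

```php
<?php
// Standalone sketch: does NOT use the PDL classes described above.
// sample_category() is a hypothetical helper name.

/**
 * Draw one label from a categorical distribution.
 *
 * @param array $distribution label => probability (should sum to 1)
 * @return string the sampled label
 */
function sample_category(array $distribution)
{
    // Uniform random number in [0, 1).
    $u = mt_rand() / (mt_getrandmax() + 1);

    // Walk the cumulative probabilities until we pass $u.
    $cumulative = 0.0;
    foreach ($distribution as $label => $probability) {
        $cumulative += $probability;
        if ($u < $cumulative) {
            return (string) $label;
        }
    }

    // Guard against rounding error when the probabilities sum to just under 1.
    $labels = array_keys($distribution);
    return (string) end($labels);
}

$price_distribution = array('3.25' => 0.4, '3.50' => 0.3, '3.75' => 0.2, '4.00' => 0.1);

// Generate 30 random prices.
$prices = array();
for ($i = 0; $i < 30; $i++) {
    $prices[] = sample_category($price_distribution);
}
print_r($prices);
```

Labels with a larger probability occupy a wider slice of the interval from 0 to 1, so a uniform random number lands in them more often.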
To verify that the random variate generator for categorically distributed variables works as it should, I have developed a random_prices_test.php script that is supplied with a distribution of price points and their corresponding probabilities (i.e., $price_distribution = array('3.25'=>0.4, '3.50'=>0.3, '3.75'=>0.2, '4.00'=>0.1)). It generates 30 random prices and then outputs this array of 30 random prices.
Here is what the output of this script looks like:
 => 3.50
 => 3.50
 => 3.75
 => 3.25
 => 3.50
 => 3.75
 => 3.25
 => 4.00
 => 3.75
 => 3.25
 => 3.50
 => 3.75
 => 3.50
 => 3.25
 => 3.25
 => 3.75
 => 3.25
 => 3.50
 => 3.25
 => 3.50
 => 3.50
 => 3.75
 => 3.25
 => 3.50
 => 3.25
 => 3.25
 => 3.75
 => 3.50
 => 4.00
 => 4.00
Notice how the lower prices occur more often than the higher ones: 3.25 and 3.50 each occur 10 times, 3.75 occurs 7 times, and 4.00 occurs the least often, just 3 times. With only 30 draws we should not expect the counts to match the probabilities exactly, but the random variate generator appears to be working as it should (more sophisticated tests than the eyeball method are possible but I'll spare you the details). We can use the CategoricalDistribution.php object in our next blog to construct a more realistic model of lobster fishing revenue.
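For readers who want one step beyond the eyeball method, a Pearson chi-square goodness-of-fit statistic is one common option. The sketch below is only an illustration: the counts are tallied from the 30 prices listed above, the helper name chi_square_statistic() is hypothetical, and with an expected count as low as 3 for the 4.00 price point the test should be read as a rough check rather than a rigorous one.

```php
<?php
// Observed counts tallied from the 30 prices listed above.
$observed      = array('3.25' => 10, '3.50' => 10, '3.75' => 7, '4.00' => 3);
$probabilities = array('3.25' => 0.4, '3.50' => 0.3, '3.75' => 0.2, '4.00' => 0.1);

/**
 * Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
 * Hypothetical helper name, not part of the PDL.
 */
function chi_square_statistic(array $observed, array $probabilities)
{
    $n = array_sum($observed);
    $statistic = 0.0;
    foreach ($probabilities as $label => $p) {
        $expected = $n * $p; // e.g. 30 * 0.4 = 12 for the 3.25 price point
        $statistic += pow($observed[$label] - $expected, 2) / $expected;
    }
    return $statistic;
}

$statistic = chi_square_statistic($observed, $probabilities);

// 7.815 is the 5% critical value of the chi-square distribution with
// 3 degrees of freedom (4 categories minus 1).
printf("chi-square = %.3f (5%% critical value for 3 df: 7.815)\n", $statistic);
```

The statistic comes out at about 0.611, far below the critical value, so these 30 draws give no reason to doubt the generator.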
In today's blog I introduced you to the idea of a categorical probability distribution because it is required in order to develop a more realistic model of lobster fishing revenue (i.e., lobster price points are distributed in a categorical manner). In general, when modelling revenue from a line-of-business you not only need to give careful consideration to whether the factors determining revenue are stationary or non-stationary (which I discussed in my last blog), but also whether the factor values are distributed in a continuous (e.g., lobster catch size) or a discrete manner (e.g., lobster prices). If you keep these two critical distinctions in mind then you should be able to generate much more realistic revenue models for your own line-of-business.
From a lobster buyer's point of view the price they are willing to pay for lobsters varies according to supply and demand aspects of their business. If we knew what these factors were, we might be able to develop a non-stationary representation of lobster prices that takes these factors into account. Because sellers are often in the dark about the buyer's business, we must often be content to use a stationary price distribution to reflect our uncertainty regarding the price we can sell our goods for.
A Non-Stationary Revenue Model
Posted on June 27, 2013 @ 09:10:00 AM by Paul Meagher
The underlying process that generates revenue for a line-of-business can be stationary or non-stationary in nature. A stationary revenue process is one in which the parameters for the probability distribution representing a revenue factor (e.g., average lobster price) are specified once for the period that your revenue model covers. A non-stationary revenue process is one in which you must specify the parameters multiple times (e.g., average lobster catch size) during the period that your revenue model covers. In my last blog I argued that in order to construct a more realistic revenue model for a lobster fishing season, we should take into account the fact that lobster catch sizes diminish over the course of a season because the rate of extraction from the lobster fishing grounds exceeds the rate of replenishment as the season progresses. I argued that an exponential decay function is a useful function to use to represent how the catch size diminishes over the course of a season. I showed a worked example of the math required to estimate the relevant decay parameter k that captures how the catch size decreases over the course of the season.
In this blog, I want to illustrate how to integrate this non-stationary factor (i.e., catch size) into our lobster fishing revenue model. The essential observation is that we cannot be content to set the parameters of our catch size distribution (average catch size and standard deviation) only once, but instead need to set these parameters multiple times as we iterate through each catch of the season. In other words, the catch size distribution that we sample from changes after each catch; specifically, the mean and standard deviation parameters are reduced by a constant percentage after each catch. This better represents the true state of affairs with respect to how revenue is generated over the course of a lobster fishing season. Our first attempt at creating a revenue model for a lobster fishing season assumed that the lobster catch-size distribution was stationary over the course of a fishing season. Now we are assuming it is non-stationary so that we can construct a more realistic revenue model.
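The idea of re-setting the catch parameters on every trip can be sketched as follows. This is a minimal standalone illustration, not the library-based script: the helper names sample_normal() and season_revenue() are hypothetical, the normal sampler is a plain Box-Muller transform, clamping negative catches to zero is an added assumption, and the starting values (1000 lbs mean, 250 lbs standard deviation, k = -0.03087, price of $3.50 with a 25 cent standard deviation) are the ones worked out in the earlier posts.

```php
<?php
// Standalone sketch; sample_normal() and season_revenue() are hypothetical
// helpers, not part of the Probability Distributions Library.

/** Draw one normally distributed value using the Box-Muller transform. */
function sample_normal($mean, $stdev)
{
    // (mt_rand() + 1) / (max + 1) lies in (0, 1], so log() never sees 0.
    $u1 = (mt_rand() + 1) / (mt_getrandmax() + 1);
    $u2 = mt_rand() / (mt_getrandmax() + 1);
    $z  = sqrt(-2.0 * log($u1)) * cos(2.0 * M_PI * $u2);
    return $mean + $stdev * $z;
}

/** Simulate one season; returns the revenue for each trip. */
function season_revenue($trips, $k, $initial_mean, $initial_stdev)
{
    // Stationary factor: price parameters are set once, outside the loop.
    $price_mean  = 3.50;
    $price_stdev = 0.25;

    $revenues = array();
    for ($t = 0; $t < $trips; $t++) {
        // Non-stationary factor: catch parameters are re-set on every trip.
        $catch_mean  = $initial_mean * exp($k * $t);
        $catch_stdev = $initial_stdev * exp($k * $t);

        // Assumption: a sampled catch below zero is treated as an empty catch.
        $catch = max(0.0, sample_normal($catch_mean, $catch_stdev));
        $price = sample_normal($price_mean, $price_stdev);
        $revenues[] = $catch * $price;
    }
    return $revenues;
}

$revenues = season_revenue(40, -0.03087, 1000, 250);
foreach ($revenues as $i => $revenue) {
    printf("Trip %2d: \$%9.2f\n", $i + 1, $revenue);
}
printf("Season total: \$%.2f\n", array_sum($revenues));
```

Each run produces different numbers, but revenue per trip should trend downward as the season progresses.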
The new script that I developed to model lobster fishing is called lobster_revenue_with_decay.php and the code for that model is illustrated below. I was also informed that instead of the lobster season consisting of 28 trips, it will consist of around 40 trips this season so that is one other change to the revenue model I presented previously.
What is critical to note in this code is that we set the parameters of our $lobster_catch_distribution multiple times according to our exponential decay function for the mean catch size and the standard deviation of the catch size. In general, a non-stationary process involves re-setting parameters inside the loop that generates revenue for each time unit in your model. In contrast, the parameters for the $lobster_price_distribution are set only once, outside the loop, and remain constant for each time unit of the model. This structure will be common to all revenue models that consist of stationary or non-stationary factors that determine revenue.
This is what the output of our new lobster fishing revenue model looks like.
What you should note about this output is how the revenue is greater at the beginning of the season than towards the end of the season. This gives us a better sense of what type of cashflow to expect over the season. It conforms better to the day-to-day revenue expectations a fisherman has over the course of a lobster fishing season.
In this blog I've illustrated how stationary and non-stationary factors are included in a revenue model. Although I am focusing on a lobster fishing example, the programmatic lessons about how to incorporate stationary and non-stationary factors into a revenue model are more general and you can use this model as a template for constructing a revenue model for your own line-of-business. When specifying the non-stationary component of your revenue model you have many choices as to what function might be used to determine the parameter settings for the probability distribution representing that component. I used an exponential function but there are a large number of other possible functions you might use. One common function would be a sine wave function if your revenue model has a seasonal component. Gompertz functions are often used in situations where the sales are brisk at first then die off rapidly thereafter, such as happens when a new movie is released. Piecewise linear functions are also very useful and flexible.
I'm not quite done with revenue modelling as there is one more aspect that I want to add to the lobster-fishing model to make it more realistic. Stay tuned for my next blog to find out what else we might do to make our revenue models more realistic.
Stationary and Non-Stationary Revenue Models
Posted on June 26, 2013 @ 12:31:00 PM by Paul Meagher
In my last blog I proposed a simple revenue model for lobster fishing. In today's blog I want to refine that model to make it more realistic. The particular refinement that I want to make to the model today involves taking into account the fact that lobster catches tend to decline over the course of a lobster fishing season. The number of lobsters available to catch declines after each catch because you are extracting lobsters from the grounds and they are not being replaced with new lobsters at a rate that equals the extraction rate. That means you have fewer lobsters to catch after each running of the lobster traps.
Our simple revenue model for lobster fishing assumed that the size of the catch stayed the same from the beginning of the season to the end of the season. I selected a catch size that was somewhere between the largest and smallest expected catch sizes in order to wash out these differences; however, it would be better if I acknowledged that the catch size distribution I was sampling from was not "stationary" over the course of the fishing season, but rather is "non-stationary", with the mean catch size decreasing over the course of the lobster fishing season. By acknowledging the non-stationary nature of the lobster biomass over the course of a fishing season, we can better estimate what lobster fishing revenue looks like over the course of a season instead of assuming that it is roughly constant around a single mean with some variation due to chance factors.
In general, when you are modelling revenue you need to think deeply about whether the revenue generating process is stationary over time or is non-stationary. When selling gardening supplies, for example, we might expect there to be minor sales outside of the growing season with sales picking up at the beginning of the growing season and then tapering off towards the end. Your revenue model for garden supply sales might be best captured by factors in your revenue modelling equations that take into account the seasonal and bursty nature of such sales.
In the case of lobster fishing we will attempt to capture the non-stationary nature of the catch size distribution (and therefore the revenue distribution) by assuming that the availability of lobsters to catch decays in an exponential manner. Exponential decay of lobster catch size means that there is a percentage decrease in available lobsters after each catch which will lead to lower mean catch sizes as the season progresses.
Exponential Decay Formula
The exponential decay/growth formula looks like this:
\[ N = N_0 \, e^{k t} \]
The symbols in the equation have the following meaning:
N is the amount of some quantity at time t.
N0 is the initial amount of some quantity at time 0.
e is the base of the natural logarithm, which is roughly equal to 2.718.
t is the amount of time elapsed.
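k is the rate constant per unit of time: negative for decay, positive for growth. For small values of k it is approximately the fractional (percentage) change per unit of time.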
To use this formula in the context of my lobster fishing revenue model I need to figure out what values to plug into this equation. What I have to work with is some estimates of what the lobster catch will be at the beginning of the season (1000 lbs), what it might be at the end of the season (300 lbs), and how many trips in the boat they will make to the lobster fishing grounds during the lobster fishing season (40 trips - Note: the number of trips will be the t value in our exponential formula). Given these values, we can figure out what the decay rate should be so that we begin the season with a catch of 1000 lbs and end the season with a catch of 300 lbs and we do this over 40 fishing trips.
So here is the exponential growth/decay formula (depending on the sign of the k term):
\[ N = N_0 \, e^{k t} \]
Now substitute in our values:
\[ 300 = 1000 \, e^{k \cdot 39} \]
I use 39 for the value of t rather than 40 because there is no decay in lobster catch size for the first trip. The decay only kicks in on the subsequent 39 trips. The math works out correctly this way as you will see later.
To solve for k, we need to rearrange some terms:
\[ \frac{300}{1000} = e^{k \cdot 39} \]
Applying the natural logarithm function, ln(), to both sides allows us to get rid of the natural exponent e as follows:
\[ \ln\left(\frac{300}{1000}\right) = \ln\left(e^{k \cdot 39}\right) \]
Which evaluates to:
-1.20397 = k * 39
The solution looks like this:
-1.20397/39 = k
Doing the division, we are left with:
k = -0.03087
So the exponential decay formula for the mean catch size looks like this:
\[ N = 1000 \, e^{-0.03087 \, t} \]
Where t is the trip number which varies between 1 (for the second trip of the season) and 39 (for the 40th trip of the season).
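The derivation above can be reproduced in a few lines of PHP. This is a minimal sketch using only built-in math functions; the helper names decay_rate() and expected_catch() are hypothetical ones chosen for illustration.

```php
<?php
// decay_rate() and expected_catch() are hypothetical helper names.

/** Solve N = N0 * e^(k*t) for k, given a start value, an end value and t steps. */
function decay_rate($initial, $final, $steps)
{
    return log($final / $initial) / $steps;
}

/** Expected mean catch after $t decay steps. */
function expected_catch($initial, $k, $t)
{
    return $initial * exp($k * $t);
}

// 1000 lbs on the first trip, 300 lbs on the 40th trip, 39 decay steps in between.
$k = decay_rate(1000, 300, 39);

printf("k = %.5f\n", $k);
printf("First trip (t = 0):  %.1f lbs\n", expected_catch(1000, $k, 0));
printf("Last trip  (t = 39): %.1f lbs\n", expected_catch(1000, $k, 39));
```

Because k is computed from the two endpoints, the expected catch starts at 1000 lbs at t = 0 and comes back to 300 lbs at t = 39.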
Testing The Formula
To test this formula we can plug in the relevant values and verify that in our last catch of the season our expected catch would be 300 lbs. The PHP program below verifies that this is the case.
The output of the script tells us that on the final trip we expect to catch 300 lbs of lobster which is how we want our decay function to work.
The mean catch size is not the only parameter that we might expect to vary through the lobster fishing season; we might also expect the standard deviation in catch sizes to decrease along with the smaller catch sizes. A simple and reasonable approach is to decrease the standard deviation over catches/trips using the same exponential decay formula, with the initial standard deviation (250) as the N0 initial value (i.e., \( N = 250 \, e^{-0.03087 \, t} \)).
The script below is used to verify that our expected catch sizes and expected standard deviation in catch sizes start and end at appropriate values.
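A check along these lines can be sketched as a small parameter schedule. The helper name catch_parameters() is hypothetical, and the sketch assumes, as described above, that the same decay factor is applied to both the mean (starting at 1000 lbs) and the standard deviation (starting at 250 lbs).

```php
<?php
// Hypothetical helper: parameter schedule for a decaying catch size distribution.
function catch_parameters($trips, $k, $initial_mean, $initial_stdev)
{
    $schedule = array();
    for ($t = 0; $t < $trips; $t++) {
        $factor = exp($k * $t); // same decay factor for both parameters
        $schedule[] = array(
            'trip'  => $t + 1,
            'mean'  => $initial_mean * $factor,
            'stdev' => $initial_stdev * $factor,
        );
    }
    return $schedule;
}

$schedule = catch_parameters(40, -0.03087, 1000, 250);
foreach ($schedule as $row) {
    printf(
        "Trip %2d: mean = %7.1f lbs, stdev = %6.1f lbs\n",
        $row['trip'],
        $row['mean'],
        $row['stdev']
    );
}
```

The first row shows the starting values of 1000 and 250, and the 40th row ends at roughly 300 lbs and 75 lbs.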
So what we have done in today's blog is to come up with an exponential decay formula that we will be using to define the mean and standard deviation values that we will plug into our catch size distribution function which we assume to be normally distributed. For our lobster fishing revenue model to become more realistic we have to acknowledge that the revenue obtained from lobster fishing is non-stationary through the season, in particular, that we generate less income each time we go out fishing because the available stock of lobsters is reduced after each catch. We can model this decreased revenue by sampling from a catch size distribution that has a smaller mean and standard deviation after each trip. In my next blog I will show you how the PHP routine above can be incorporated into our lobster fishing revenue model to provide us with a more realistic revenue model, one that might provide us with more realistic expectations regarding cash flow through the season.
As a final note, the lobster fishing season this year is quite unusual as catches are larger than normal and they are still getting good catches towards the end of the season (800 lbs on their last trip). The revenue model ignores certain unusual aspects of this season which might make it a better revenue model for predicting lobster fishing revenue next season. The model does not attempt to overfit the data from this season because the numbers are quite unusual (they might, however, reflect the effect of better conservation measures which might persist in their effects). Predictive revenue modelling can be more of an art than a science as it involves judgement calls regarding what is signal and what is noise.
Posted on June 25, 2013 @ 06:41:00 AM by Paul Meagher
In a previous blog on profit distributions, I suggested that we could forecast profit distributions for a line-of-business rather than make point estimates of how much the line-of-business might make over a specified period of time. A forecasted profit distribution can be viewed as arising from probabilistic factors that jointly produce the range of possible profit outcomes.
In order to compute profits, we need to estimate the revenue term in the profit equation (profits = revenue - costs). In today's blog, I want to focus on revenue modelling because it can provide insight into how our profit distribution forecasts might arise and also because revenue modelling is a useful and interesting intellectual exercise.
To make a revenue model for a line-of-business that can be used to account for some of the variability in a profit distribution, we need to specify the primary factors that generate revenue and how they might interact to produce the expected range of possible revenue outcomes. We can go into more or less detail on what these primary factors are and how they interact, but initially we should be content to deal with just two or three factors until we master that level of model complexity.
Lobster Fishing Revenue Model
To illustrate what a revenue model for a line-of-business might look like, I will take the example of lobster fishing which I am somewhat familiar with because my in-laws are fishermen (they fish lobster and crabs and grow oysters). I will construct a revenue model that might account for this season's revenue from the lobster fishing line-of-business.
A season of lobster fishing has a start date and an end date, in this case from May 1 to June 30, 2013. Between these dates, fishermen can set traps for lobsters and keep those that fall within conservation guidelines. The lobsters are not fished every day; usually you leave the traps for a day and come back the second day to harvest your traps (fyi, they harvest their crab traps on off days until they complete their crab quota). The harvest you obtain from your traps is called your "catch" and the season can be viewed as consisting of the number and sizes of catches you made over the season, and the prices you obtained for your various catches. So the two primary factors we will use to characterize a catch are the size of the catch in pounds (lbs) and the price per lb that was received for that catch. We can compute the revenue for a season of fishing by adding up the revenue per catch for all the catches in that season.
What I want to do in this week's blogging is construct a simple probabilistic revenue model for lobster fishing and then explore a set of refinements I might make in order to make it a more realistic revenue model. This might inspire you to construct a simple revenue model for your own line-of-business and to consider what additional factors you might want to take into account to make it more realistic.
Revenue Model Implementation
You can implement a revenue model using whatever programming language you are comfortable with. In the example below I use PHP because I have developed a Probability Distributions Library (https://github.com/mrdealflow/PDL) that I find useful for injecting uncertainty into the model. I inject uncertainty by assuming that the catch size is normally distributed with a mean catch size of 500 lbs and a standard deviation of 150 lbs. This means that as my program iterates through all the catches it generates a possible catch size by sampling from a normal distribution of values with a mean of 500 lbs and a standard deviation of 150 lbs. This results in catch sizes that vary quite a bit from catch to catch. I also inject uncertainty into the revenue model by assuming that the price per lb for live lobster is normally distributed with a mean of $3.50 per lb and a standard deviation of 25 cents from catch to catch. So as we iterate through each catch we sample from a catch size distribution and a price per lb distribution and multiply the sampled values together to compute the revenue generated for that catch. The revenue generated for each catch is primarily a function of the catch size random variable and the price per lb random variable. Here is what the model looks like:
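As a rough standalone approximation of such a model (not the lobster_fishing.php listing itself, and not using the PDL classes), the sketch below samples a catch size and a price for each catch and multiplies them. The helper names sample_normal() and simulate_season() are hypothetical, the normal sampler is a plain Box-Muller transform, the 28 catches reflect the season length assumed for this first model, and clamping negative catches to zero is an added assumption.

```php
<?php
// Standalone sketch; helper names are hypothetical and this is not the
// original lobster_fishing.php listing or the PDL API.

/** Draw one normally distributed value using the Box-Muller transform. */
function sample_normal($mean, $stdev)
{
    // (mt_rand() + 1) / (max + 1) lies in (0, 1], so log() never sees 0.
    $u1 = (mt_rand() + 1) / (mt_getrandmax() + 1);
    $u2 = mt_rand() / (mt_getrandmax() + 1);
    $z  = sqrt(-2.0 * log($u1)) * cos(2.0 * M_PI * $u2);
    return $mean + $stdev * $z;
}

/** Simulate a season with stationary catch size and price distributions. */
function simulate_season($catches, $catch_mean, $catch_stdev, $price_mean, $price_stdev)
{
    $season = array();
    for ($i = 1; $i <= $catches; $i++) {
        // Assumption: a sampled catch below zero is treated as an empty catch.
        $lbs   = max(0.0, sample_normal($catch_mean, $catch_stdev));
        $price = sample_normal($price_mean, $price_stdev);
        $season[] = array(
            'catch'   => $i,
            'lbs'     => $lbs,
            'price'   => $price,
            'revenue' => $lbs * $price,
        );
    }
    return $season;
}

$season = simulate_season(28, 500, 150, 3.50, 0.25);
$total = 0.0;
foreach ($season as $row) {
    printf(
        "Catch %2d: %6.1f lbs x \$%.2f = \$%9.2f\n",
        $row['catch'],
        $row['lbs'],
        $row['price'],
        $row['revenue']
    );
    $total += $row['revenue'];
}
printf("Season revenue: \$%.2f\n", $total);
```

With these parameters the season total should land somewhere around 28 x 500 x 3.50 = $49,000, varying from run to run.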
And here is the output that the lobster_fishing.php revenue model generates:
In my next two blogs I'll be exploring a couple of refinements to this revenue model that are designed to make the revenue model more realistic and also to give you more modelling ideas you might use to construct more realistic revenue models for your own
Is Successful Business Investing a Non-Self-Weighted Process?
Posted on June 21, 2013 @ 07:05:00 PM by Paul Meagher
In my last blog, I suggested that to be a better business investor, you should focus on making your investing process more skillful rather than focusing on short term results because business investments are subject to significant "luck" or "chance" factors that are not fully under an investor's control. If an investor focuses on improving their investment process rather than focusing exclusively on short-term results, then over the long haul it might produce better returns and result in less short-term anguish over outcomes.
So what does a skillful gambling process consist of, and, by extension, what might a skillful investing process look like?
Some advice from gambling theory is that profitable gambling processes are non-self-weighting.
This terminology is due to Mason Malmuth from his book Gambling Theory and Other Topics, 2004, p. 16.
Quickly recapping, self-weighting gambling strategies are those in which many plays are made for similar-sized bets, while successful non-self-weighting strategies attempt to identify where the gambler has the best of it and then to make the most of it. As already noted, only non-self-weighting strategies, where appropriately applicable, are profitable for the gambler.
To properly grasp the concept of "self-weighting" I think it helps to formalize the concept a bit.
A perfectly "self-weighting" (SW) betting process is one that consists of N betting events wherein you bet the same amount on each bet for all N betting events (e.g., each hand of poker). The individual bet would be equal to the mean bet for all betting events (e.g., bet $20) producing 0 dispersion among betting events. Someone who is reluctant to bet more on hands with good odds tends towards the ideal of a "self-weighting" betting process. They are not likely to be successful gamblers.
A "non-self-weighting" betting process (NSW) is one in which there is significant variation in bets across events and significant non-participation in some betting events. When a skillful gambler earns a profit after a round of poker, this could be indicative of playing the betting odds successfully, not participating in some hands, and not recklessly going "all in" on any one bet.
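One way to make the distinction concrete is to measure the dispersion of bet sizes and the share of events actually played. The sketch below is only an illustration of that formalization: the helper name bet_profile() is hypothetical, the two bet sequences are made up, and a bet of 0 is used to stand for sitting out an event.

```php
<?php
// Hypothetical helper: summarize how "self-weighting" a sequence of bets is.
// A bet of 0 means the player sat the event out.
function bet_profile(array $bets)
{
    $n    = count($bets);
    $mean = array_sum($bets) / $n;

    $squared = 0.0;
    foreach ($bets as $bet) {
        $squared += pow($bet - $mean, 2);
    }
    $stdev = sqrt($squared / $n); // population standard deviation

    $played = count(array_filter($bets, function ($bet) {
        return $bet > 0;
    }));

    return array(
        'mean_bet'      => $mean,
        'dispersion'    => $stdev,
        'participation' => $played / $n,
    );
}

// Illustrative, made-up bet sequences with the same mean bet of $20.
$self_weighting     = array(20, 20, 20, 20, 20, 20);
$non_self_weighting = array(0, 0, 60, 0, 10, 50);

print_r(bet_profile($self_weighting));
print_r(bet_profile($non_self_weighting));
```

Both sequences have the same mean bet, but the first has zero dispersion and full participation, while the second has a dispersion of about $25 and plays only half of the events.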
The successful gambler minimizes risk by playing the odds successfully. This consists of dropping out of many hands and betting more in those hands which have favorable odds for winning. Over the long haul, this can produce profits for a gambler provided they know how to also manage their bankroll to stay in the game (e.g., don't go "all in" and lose your bankroll). On each bet/investment you are managing "money" in the short term, but your "bankroll" in the long term. Bankroll management is of more concern to the successful gambler than money management.
I am not suggesting that you treat business investing as equivalent to poker betting. What I am suggesting is that the theory of gambling has some concepts that might be useful for thinking about the fundamental nature of successful business investing, namely, that it is a non-self-weighted process.
Lessons from Poker about Investing
Posted on June 19, 2013 @ 09:27:00 AM by Paul Meagher
If you are a good poker player you will still experience losing streaks. If you are an astute Angel Investor you will still make some bad investments.
In the case of poker, you can play well and win; play well and lose; play badly and lose; play badly and win. In other words, there is no one-to-one correspondence between the process employed and the results obtained because winning and losing are not just a matter of playing skillfully or not; it is also a matter of luck. However, over time, if you play skillfully, you can expect the effect of the luck factor to diminish in importance and the effect of the skill factor to emerge in importance. In the long haul, there is some correlation between the process followed and the results obtained.
Likewise, in the case of Angel Investing, we can invest well and win, invest well and lose, invest badly and lose, and invest badly and win. In the long haul, however, if we are skillfully identifying good companies to invest in, we might expect that some of these fluctuations would wash out and we would see better than normal returns on our Angel Investments.
What this suggests is that investing is less about results obtained in the short term and more about the process followed in the long term.
When we play poker, we control our decision-making process but not how the cards come down. If you correctly identify an opponent's bluff, but he gets a lucky card and wins the hand anyway, you should be pleased rather than angry because you played the hand as well as you could. The irony is that by being less focused on your results, you may achieve better ones. ~ Nate Silver, The Signal and the Noise, 2012, p. 328.
To become a good poker player these days involves reading a lot about game strategy and hand probabilities. What it takes to be a good poker player today is different from what it took to be a top poker player 30 years ago because poker players today are more educated about the formal aspects of playing poker and they are playing against similarly educated poker players. We might expect that business investing will move in a similar direction and that some investors will improve relative to others based upon whether they are able to incorporate more formalized knowledge about how to make good angel investment decisions. These are business investors who are more focused on the process used to make business investments and less focused on short term results.
If you don't accept the irreducible role of luck and chance in business investing, then you will likely focus more on results than how skillful your investment process is.
Profit Generating Functions
Posted on June 13, 2013 @ 04:51:00 PM by Paul Meagher
In this blog, I want to get under the hood of what causes a profit distribution (which I have discussed in my last three blogs).
One cause of a Profit Distribution Function (PDF) is one or more Profit Generating Functions (PGF).
A profit generating function simulates expected profits based upon a set of parameters that are fed into it.
An example would be a line-of-business that involves shearing sheep for the wool fiber they produce. If you are at the beginning of the sheep shearing season, and are trying to estimate your profits for the end of the upcoming sheep shearing season, you would need to estimate how much money you might make per kg of wool fiber, how much wool fiber each sheep might produce (affected by heat, rain, nutrition, genetics), how many sheep you will have to shear at the future date, the fixed costs of raising your sheep, and the variable costs of raising each sheep. Each of these factors will have a range of uncertainty associated with it. The uncertainty associated with the price per kg and amount of wool in kgs per sheep is illustrated in the tree diagram below.
The full calculation of how much you will make at the end of a season is a function of the values that each of these parameters might reasonably attain over the forecast period. A profit generating function will sample from each pool of uncertainty according to the distributional characteristics of that parameter and then use some arithmetic to generate a single possible profit value. When the profit generating function is re-run many times, it will generate a large number of possible values that can be graphed and this graph would look like your estimated profit distribution, or something that approximates it.
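A minimal sketch of such a profit generating function for the wool example might look like the following. Everything specific in it is an assumption made for illustration: the helper names, the uniform distributions, and all of the parameter ranges and cost figures are made up rather than taken from real sheep farming data.

```php
<?php
// All parameter ranges below are made-up illustrations, not real sheep-farming figures.

/** Uniform random float in [$min, $max]. */
function sample_uniform($min, $max)
{
    return $min + ($max - $min) * (mt_rand() / mt_getrandmax());
}

/** One run of a hypothetical profit generating function for a wool business. */
function wool_profit()
{
    $price_per_kg   = sample_uniform(2.00, 4.00); // $ per kg of wool
    $kg_per_sheep   = sample_uniform(3.0, 5.0);   // kg of wool per sheep
    $sheep          = mt_rand(90, 110);           // flock size at shearing time
    $fixed_costs    = 500.0;                      // $ per season
    $cost_per_sheep = 5.0;                        // $ variable cost per sheep

    $revenue = $price_per_kg * $kg_per_sheep * $sheep;
    $costs   = $fixed_costs + $cost_per_sheep * $sheep;
    return $revenue - $costs;
}

/** Re-run the function many times and count the results per profit interval. */
function profit_distribution($runs, $interval_width)
{
    $histogram = array();
    for ($i = 0; $i < $runs; $i++) {
        // Lower edge of the interval this run falls into.
        $lower = (int) floor(wool_profit() / $interval_width) * $interval_width;
        if (!isset($histogram[$lower])) {
            $histogram[$lower] = 0;
        }
        $histogram[$lower]++;
    }
    ksort($histogram);
    return $histogram;
}

$runs = 10000;
foreach (profit_distribution($runs, 250) as $lower => $count) {
    printf("%6d to %6d: %5.1f%%\n", $lower, $lower + 250, 100 * $count / $runs);
}
```

Each line of output is one profit interval with its estimated probability, which is the tabular form of a profit distribution.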
When estimating the probability to assign to each profit interval for Google (see Google 2013 Profit Distribution), we could constrain our estimates based upon the profit generating functions we believed were critical to generating the actual amount of profit they might attain. The profit generating function for adwords might include the estimated average cost per click and the volume of clicks over a given period (among other factors). Or, we could ignore the profit generating function and estimate our values on something less concrete but still significant - the level of goodwill that will exist towards Google over the forecast period (e.g., big brother privacy concerns creating negative sentiment), or social network rivals taking more of the advertising budget of companies, or search engine rivals like Yahoo gaining more market share, etc. As a Bayesian you are free to base your subjective estimates upon whatever factors you feel are the most critical to determining the actual profit of Google. In certain cases, you might want to rely more upon what your profit generating functions might be telling you. It could be argued that it is always a good idea to construct profit generating functions for a company just so you understand in concrete terms how the company makes money. Then you can choose to ignore them in your profit forecasts, or not, or base your estimate on a blend of profit generating functions modified by subjective Bayesian factors.
What I am here calling a Profit Generating Function, is somewhat akin to what I have referred to as a Business Model in the past. If you want some ideas for how profit generating functions could be implemented, I would encourage you to examine my blog entitled A Complete and Profitable Business Model. Perhaps in a future blog I will try my hand at implementing a profit generating function that samples from several pools of uncertainty to deliver a forecast profit, and which will generate a profit distribution when re-run many times.
Shapes of Uncertainty
Posted on June 9, 2013 @ 04:38:00 PM by Paul Meagher
In my last 2 blogs, I discussed the idea of a profit distribution. I argued that it is better to estimate profit using a profit distribution rather than a single most-likely value (e.g., we should make 100k next year). A distribution is more epistemically informative than a single most-likely value. I'll illustrate what I mean by this in today's blog on the shapes of uncertainty.
In this blog, I want to focus on what to look for in a profit distribution. A profit distribution can have many shapes and these shapes are quite informative about the type and level of uncertainty involved in an estimate.
To demonstrate why profit distribution shapes matter, I have prepared 3 new Google profit distributions for your consideration.
- A flat distribution.
- A peaked distribution.
- A distribution with a reduced x range, or a "shrunk" distribution.
It is useful to acquire the skill of reading and interpreting a profit distribution. That skill involves attending to significant aspects of the distribution shape and understanding what the shapes mean.
Flat Profit Distribution
If our profit distribution for Google was flat, this would mean that our level of uncertainty was the same over all the profit intervals. In the graph below, the estimated profit could fall anywhere within the full range of values, with the same probability (i.e., 1 in 6, or about 16.7%) of being in any interval. Some Bayesian textbooks advise that you start with a flat distribution if you have no strong convictions about where an estimated parameter might lie.
Peaked Profit Distribution
In a peaked profit distribution, one of the intervals has significantly more probability mass than the other profit intervals. This reflects an increased level of certainty that the estimated profit will actually be within that interval. As we acquire more information about the company and its lines of business (e.g., second quarter financials), we might expect that our profit distribution estimate would begin to change shape in this manner first.
Shrunk Profit Distribution
As we learn even more about a company and its lines of business, the range of possible profit outcomes should be reduced so that instead of a Google profit range running from 10.0b to 12.4b, perhaps it only covers the range from 10.8b to 12.0b (see below). We show our confidence in our prediction by how narrow our profit distribution is. This does not necessarily change the shape of the profit distribution; it changes the x axis of the profit distribution (both shapes might be peaked, but they would sit on x axes with different ranges of possible values).
The shape of a profit distribution tells us a lot about the nature of the uncertainty surrounding our estimate of profit. We have seen that our confidence in an estimate is reflected in how peaked our profit distribution is and how shrunk the range of possible profits is. This suggests strategies one might adopt to increase confidence in an estimate: gather information that helps you establish a more peaked profit distribution and that helps you reduce the range of the profit distribution.
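The three shapes can be compared numerically with a short Python sketch. Only the flat distribution (six equal intervals from 10.0b to 12.4b) and the shrunk range (10.8b to 12.0b) follow directly from the text; the probabilities in the peaked and shrunk examples are hypothetical values chosen for illustration, and the describe helper is an assumed name.

```python
# Lower edges (in billions) of the six 0.4b-wide intervals from 10.0b to 12.4b.
full_range = [10.0, 10.4, 10.8, 11.2, 11.6, 12.0]

# Flat: the same probability (1/6) in every interval.
flat = {edge: 1 / 6 for edge in full_range}

# Peaked: hypothetical probabilities with one dominant interval.
peaked = dict(zip(full_range, [0.05, 0.10, 0.15, 0.20, 0.45, 0.05]))

# Shrunk: hypothetical probabilities over the narrower 10.8b to 12.0b range.
shrunk = {10.8: 0.20, 11.2: 0.30, 11.6: 0.50}

def describe(dist, width=0.4):
    """Summarize the range covered and how much mass sits in the tallest interval."""
    peak_edge = max(dist, key=dist.get)
    return {
        "range": (min(dist), round(max(dist) + width, 1)),
        "peak_interval": (peak_edge, round(peak_edge + width, 1)),
        "peak_mass": dist[peak_edge],
    }

for name, dist in [("flat", flat), ("peaked", peaked), ("shrunk", shrunk)]:
    print(name, describe(dist))
```

A higher peak mass and a narrower range both signal more confidence in the estimate, which are the two aspects of shape discussed above.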
In this article we have examined three ways in which a profit distribution can appear on a graph: flat, peaked, or shrunk. There are other aspects of shape that we have not examined, namely skewness and kurtosis (the third and fourth standardized moments of the distribution). Using these shape controls, we might be able to approximate the peaked distribution above with a theoretical curve whose skew and kurtosis are set to match the estimated profit distribution. A plain normal distribution is fixed by its mean and standard deviation alone, so matching skew would call for a generalization of it (a skew-normal distribution, for example). Such a distribution is an example of a function that generates points on a probability curve (with a total area of 1) based upon the values fed into it (i.e., mean, standard deviation, skew, kurtosis, x-values). We might want to take this additional step of creating a profit distribution function if we thought it would simplify calculations (or thinking) or if we thought it was a better representation of the data than a discrete histogram of possible profit intervals. Step functions are potentially limited as a means of representing the actual shape of our uncertainty about a parameter.
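As a rough illustration of these shape measures, the Python sketch below computes the mean, standard deviation, skewness, and kurtosis of a discrete profit histogram by treating each interval's midpoint as its representative value. The shape_stats name and the peaked probabilities are assumptions made for the example, not values from the graphs.

```python
import math

def shape_stats(midpoints, probs):
    """Mean, standard deviation, skewness and kurtosis of a discrete distribution."""
    mean = sum(x * p for x, p in zip(midpoints, probs))
    variance = sum(p * (x - mean) ** 2 for x, p in zip(midpoints, probs))
    sd = math.sqrt(variance)
    # Skewness and kurtosis are the third and fourth standardized moments.
    skew = sum(p * ((x - mean) / sd) ** 3 for x, p in zip(midpoints, probs))
    kurt = sum(p * ((x - mean) / sd) ** 4 for x, p in zip(midpoints, probs))
    return mean, sd, skew, kurt

# Midpoints (in billions) of the six 0.4b-wide intervals from 10.0b to 12.4b.
midpoints = [10.2, 10.6, 11.0, 11.4, 11.8, 12.2]

flat = [1 / 6] * 6
peaked = [0.05, 0.10, 0.15, 0.20, 0.45, 0.05]  # hypothetical probabilities

for name, probs in [("flat", flat), ("peaked", peaked)]:
    mean, sd, skew, kurt = shape_stats(midpoints, probs)
    print(f"{name}: mean={mean:.2f} sd={sd:.2f} skew={skew:.2f} kurtosis={kurt:.2f}")
```

A symmetric distribution such as the flat one has a skewness of zero, while a distribution with a longer tail towards lower profits has negative skewness. For reference, a normal distribution has a kurtosis of 3.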
Google 2013 Profit Distribution
Posted on June 5, 2013 @ 10:37:00 AM by Paul Meagher
My last article on the concept of a profit distribution was a bit abstract and lacked a graphic. I wanted to correct this situation by constructing a profit distribution for a company we can all relate to - Google.
In order to make this example realistic, I wanted to know how profitable Google is on a year-to-year basis. To find this info, I consulted Google's investor relations area, specifically their 2013 Financial Tables. Here I learned that the net income for Google in 2011 was approx. $9.7 billion and in 2012 it was approx. $10.7 billion. I used these values to come up with some reasonable bounds for their expected profit in 2013 (e.g., between 10 billion and 12.4 billion). I divided up this range in units of .4 billion and estimated the probability that Google's net income (or profit) would fall in each interval. This is what I came up with.
The shape of the profit distribution function reflects my belief that Google will continue to grow and that my best guess is that they will grow by another billion in profit next year. I also believe that there is a greater chance they will earn less than this amount than that they will earn more than this amount.
Notice that if you sum the percentages (e.g., by converting 45% to .45), they sum to 1, as all good probability distributions should. My uncertainty regarding the expected profit of Google in 2013 is better captured by a range of probability assignments to profit intervals than by a single point estimate of how much they might make next year. I don't know that much about Google's business lines and how they will perform this year, but I'm able to use my general knowledge and recently acquired financial statements to come up with a 2013 Profit Distribution for Google. This could be considered my "prior" distribution for Google, one that can be updated according to Bayesian logic as more information comes in.
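Here is a minimal Python sketch of how such a prior could be updated as information comes in. It assumes a simple discrete form of Bayes' rule (posterior proportional to prior times likelihood). Apart from the 45% figure mentioned above, the prior probabilities are hypothetical, as are the likelihood values, which stand in for how probable some piece of news (say, a strong quarterly report) would be under each profit interval.

```python
def bayes_update(prior, likelihood):
    """Return the posterior: prior times likelihood, renormalized to sum to 1."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)  # total probability of the observed evidence
    return [u / evidence for u in unnormalized]

# Profit intervals in billions, in units of 0.4b from 10.0b to 12.4b.
intervals = ["10.0-10.4", "10.4-10.8", "10.8-11.2", "11.2-11.6", "11.6-12.0", "12.0-12.4"]

# Hypothetical prior; it sums to 1 like any probability distribution.
prior = [0.05, 0.10, 0.15, 0.20, 0.45, 0.05]

# Hypothetical P(evidence | profit falls in the interval) for each interval.
likelihood = [0.1, 0.2, 0.5, 0.8, 0.9, 0.6]

posterior = bayes_update(prior, likelihood)
for label, before, after in zip(intervals, prior, posterior):
    print(f"{label}b  prior={before:.2f}  posterior={after:.2f}")
```

Intervals that make the evidence more likely gain probability mass and the rest lose it, while the posterior still sums to 1 and can serve as the prior for the next update.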
I used the JpGraph library to generate this graph. I modified an example graph from the JpGraph site. FYI, here is the code I used to generate the graph.
Profit Distribution Function
Posted on June 3, 2013 @ 09:06:00 AM by Paul Meagher
One factor that an investor takes into account when deciding whether or not to invest in a company is the expected profit that company might make in the near and the longer term.
So how should we represent the expected profit of a company?
One approach that I think might be useful involves diagramming the expected profit distribution of the company. The profit distribution graph would consist of a subjective estimate of the probability that the company will make a given amount of profit over a specified time frame. The Y axis of the graph is labelled "Probability". The X axis of the graph is labelled "Profit". Constructing the graph involves estimating the probability that the company will make specific amounts of profit (e.g., 10k to 20k, 20k to 30k, 30k to 40k, 40k to 50k, 50k to 60k, 60k to 70k). So we assign a probability to the event that a company will make 10k to 20k in profit next year. Then we assign a probability to the event that a company will make between 20k and 30k, and so on up to our 70k limit (the range and intervals chosen will vary by company). In this manner we can construct a profit distribution.
The profit distribution that is constructed should be constrained so that the mass of the probability distribution sums to 1. If you constrain it in this manner then you can potentially do Bayesian inference upon the profit distribution. This could be in the form of conditionalizations that involve saying that given some factor A (e.g., money invested) the profit distribution function will shift - the mean of the profit distribution would ideally go up by an amount greater than the
So far in my discussions of Bayesian Angel Investing, I have used Bayesian techniques in an objective manner. The inputs into Bayes' formula were objectively measurable entities. In the case of generating the profit distribution function for a company, we are subjectively assigning probabilities to possible outcomes. There is no set of trials we can rerun to establish an objective probability function for the profit distribution of a company (i.e., the relative frequency of different profit levels for the same company repeated many times with profit levels measured). The probability that is assigned to a particular profit level should reflect your best estimate of how likely a given profit level is for the company within a particular timeframe. So, what is the probability that Google will make between X1 billion and X2 billion next year (e.g., .10)? What is the probability that Google will make between X2 and X3 (e.g., .40)? Assign mass to the intervals in such a way that the probability mass of all the intervals sums to 1. Then you will meet all the technical requirements for a distribution to be considered a probability distribution. All the probability axioms are satisfied.
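The technical requirements can be checked mechanically. The Python sketch below tests a set of subjective interval probabilities for non-negativity and for a total mass of 1. The function name is_probability_distribution and the example probabilities (beyond the .10 and .40 used as examples in this post) are hypothetical.

```python
def is_probability_distribution(probs, tol=1e-9):
    """True if every probability is non-negative and the total mass is 1."""
    non_negative = all(p >= 0 for p in probs)
    sums_to_one = abs(sum(probs) - 1.0) <= tol  # tolerance absorbs float rounding
    return non_negative and sums_to_one

# Profit intervals in thousands of dollars, from 10k up to the 70k limit.
intervals = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]

# Hypothetical subjective probabilities, one per interval.
probs = [0.10, 0.40, 0.25, 0.15, 0.07, 0.03]

for (low, high), p in zip(intervals, probs):
    print(f"{low}k to {high}k: {p:.2f}")
print("valid distribution:", is_probability_distribution(probs))
```

If the check fails because the estimates do not sum to 1, one option is to divide each estimate by the total so the relative weights are preserved.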
Why go through all this bother to estimate how profitable a company might be? Why not just ball-park a value that you think is most likely and leave it at that?
One reason is because one number does not adequately represent your state of uncertainty about the
Another reason has to do with modelling risk. Usually when you model risk you don't use one number to do so. Those modelling risk usually like to work with probability distributions, not simple point estimates of the most likely outcome. A distribution provides a more informative model of the uncertainty associated with a forecast.
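A small Python sketch shows the kind of risk question a distribution can answer and a point estimate cannot. The interval probabilities and the prob_below helper are hypothetical, chosen only to illustrate the idea.

```python
def prob_below(dist, threshold):
    """Probability that profit lands in an interval lying entirely below threshold."""
    return sum(p for (low, high), p in dist.items() if high <= threshold)

# Hypothetical profit distribution; intervals are in thousands of dollars.
dist = {
    (10, 20): 0.10,
    (20, 30): 0.40,
    (30, 40): 0.25,
    (40, 50): 0.15,
    (50, 60): 0.07,
    (60, 70): 0.03,
}

# The point estimate only reports the single most likely interval.
most_likely = max(dist, key=dist.get)
print("most likely interval:", most_likely)

# The distribution also gives the downside risk of missing a 30k target.
print("P(profit below 30k):", round(prob_below(dist, 30), 2))
```

The most likely interval alone says nothing about how much probability sits below a target, which is exactly the information a risk model needs.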
Also, if you are constructing a profit distribution function for a company there is no reason to hide that information from the company you want to invest in or from co-investors. The profit distribution function, because it is inspectable, can be updated with new information from the company and other investors who might offer strategic capabilities. So the transparency and inspectability of the uncertainty model are also useful features of this approach.