# What’s complex and what’s simple in an exponential model?

In the post on estimating the rate of spread in the current ebola epidemic, a commenter stated that using a monthly rate of disease spread in Liberia was a “simpler” model than what I had done, which was based on a daily rate. This is not correct and I want to clarify why here.

In fact I used a very simple model: an exponential model, which has the form y = b^(ax). You can't get any simpler than a one-parameter model, and that fact doesn't change just because you alter the value of the base b. Any base can model an exponential increase; changing it just requires a corresponding change in the parameter a, for a given pair of y and x variables. The choice of base ought to carry some meaning. For example, if you're inherently interested in the doubling time of something, then 2 is the logical choice*. But when no particular base value is obvious, it's still best if the base itself expresses the rate of change per unit of x, i.e. the case where a = 1.0, presuming that x is measured on some scale of inherent interest. In my case, that's the per-day increase in ebola cases.
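As a quick numeric check of the point that any base will do (a sketch of my own, not part of the original fitting), here the same growth curve is written with base 2 and with base e, the parameter a absorbing the difference:

```python
import math

# The same exponential growth written two ways:
# y = 2**x (doubling per unit x) is identical to y = e**(a*x) with a = ln(2).
a = math.log(2)  # ~0.693

for x in [0, 1, 5, 10]:
    y_base2 = 2 ** x
    y_base_e = math.e ** (a * x)
    assert math.isclose(y_base2, y_base_e)
```

The curve itself doesn't care which base you pick; only the interpretability of a and b changes.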

However, if you fit an exponential model to some data, most programs will use a base of e (≈2.718) or 10 by default; the base is fixed and the rate of change is then expressed through the exponent ax. That's a bit backwards frankly, but not a big deal, because the fitted base can easily be converted to whatever base is more meaningful for the data at hand. Say, for example, that your model-fitting procedure gives y = e^(3.2x), where b = e and a = 3.2. If your x variable is recorded in, say, days, a natural-log rate of 3.2 is not what you're after: you want to know the per-day rate of change. Well, y = e^(ax) is simply y = (e^a)^x, and so in this case b = e^3.2 ≈ 24.5; it takes a larger base to return a given y value when the exponent is smaller. It's just a straight mathematical transformation (e^a), where a is whatever value the exponential model fitting returns. It has nothing to do with model complexity; it has to do with scaling, ease of interpretation and convenience.
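That base conversion is a one-liner; here's a sketch using the a = 3.2 value from the example above:

```python
import math

# Suppose a fit returned y = e**(3.2 * x) with x measured in days.
a = 3.2
b = math.e ** a  # per-day base: b = e**a

# y = e**(a*x) == (e**a)**x == b**x, so both forms give identical predictions:
for x in [0.5, 1, 2, 3]:
    assert math.isclose(math.e ** (a * x), b ** x)

print(round(b, 1))  # 24.5
```

Same model, same single fitted parameter; only the bookkeeping of base versus exponent differs.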

The relevance to the ebola transmission rate modeling and the original comment is that those rates could very well change within a month’s time due to radical changes in the population’s behavior (critical), or perhaps drug availability (unlikely in this case). In a disease epidemic what happens from day to day is critical. So you want to use a time scale that allows you to detect system changes quickly, while (in this case) also acknowledging the noise generated by the data reporting process (which complicates things and was the whole point of using loess to smooth the raw data before making the estimates). Note that I’ve not gone into the issue of how to detect when an exponential growth rate has changed to some other type of growth. That’s much more difficult.

*Exponential functions are also useful for analyzing outcomes of trials with categorical variables, where a = 1 and b defines the number of possible outcomes of some repeated process. For example, y = 2^25 gives the total number of possible outcome sequences for 25 trials of an event having two possible outcomes. But that's a different application than modeling a change rate (unless you want to consider the increase in the number of possible sequences a rate).
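The counting use of the same form is trivial to verify (a sketch for the footnote's 2^25 example):

```python
# Counting application of y = b**x with a = 1:
# b = 2 possible outcomes per trial, x = 25 trials.
n_sequences = 2 ** 25
print(n_sequences)  # 33554432
```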

## 4 thoughts on “What’s complex and what’s simple in an exponential model?”

1. I understood parts of this 😉

What happens when a week or more goes by between updated numbers from official sources? Obviously the rate of change is harder to track… But is there any particular way(s) that it’s likely to be tracked incorrectly with less “resolution” in the input data?

Thanks!

• I’m glad it wasn’t a total failure. An abrupt change in how the symptomatic are cared for would be one, although it wouldn’t manifest itself for a week or two, assuming that to be the length of the asymptomatic period. So roughly weekly resolution data is probably about optimal, although we really do not know what’s going on with the data collecting and reporting schedule. So far, six days has been the longest interval.

2. OK, Jim, I will bite. Mine was simpler because:

1) it took me one sentence to describe and enumerate the process, calculation, & model vs your longer and involved explication & process
2) I used the empirical smoothing process provided with one month nibbles of the actual data (for better & worse) rather than adding another step calculating a daily smoothing. After all, symptoms display in 2 – 21 days, the data rarely comes in daily, and so on.
3) There is no need for more accuracy if the aim is a rough estimate of when the epidemic might become beyond control and my model process naturally dropped into actual and easily understood dates.

Given the 2 x 4 times potential variance of the data, who cares about non-existent accuracy ? I will stipulate all of the various potential better micro-tracking, potential errors, better granularity, etc. You are certainly correct in that regard and there is a long history of how it is normally done. But a simpler process overall is simpler. Not better. Both have their place. Mine ain’t gonna be published, Jim. Don’t confuse the good-enuf with the better.

BTW, great url for actual daily reports (& lots more) from the countries involved given to WHO: