Estimating the spread rate in the current ebola epidemic

I’ve now written several articles on the West African ebola outbreak (see e.g. here, here, here, and here). This time I want to get more analytical, by describing how I estimated the ebola basic reproduction rate Ro (“R zero”), i.e. the rate of infection spread. Almost certainly, various people are making these estimates, but I’ve not seen any yet, including at the WHO and CDC websites or the few articles that have come out to date.

Some background first. Ro is a fundamental parameter in epidemiology, conceptually similar to r, the “intrinsic rate of increase”, in population biology. In epidemiology, it’s defined as the (mean) number of secondary disease cases arising from some primary case. When an individual gets infected, he or she is a secondary case relative to the primary case that infected him or her, and in turn becomes a primary case capable of spreading the disease to others. It’s a lineage in that respect, and fractal. I’ll refer to it simply as R here.

The value of R depends strongly on the biology of the virus and the behavior of the infected. It is thus more context dependent than the r parameter of population biology, which is an idealized, or optimum, rate of population growth determined by intrinsic reproductive parameters (e.g. age to reproductive maturity, mean litter size, gestation time). Diseases which are highly contagious, such as measles, smallpox and the flu, have R values in the range of 3 to 8 or even much more, whereas those that require direct exchange of body fluids, like HIV, have much lower rates.

To slow an epidemic of any disease it is necessary to lower the value of R; to stop it completely, R must be brought to zero. Any value of R > 0.0 indicates a disease with at least some activity in the target population. When R = 1.0, there is a steady increase in the cumulative number of cases, but no change in the rate of infection (new cases per unit time): each infected person infects (on average) exactly 1.0 other person before either recovering or dying. Any R > 1.0 indicates a (necessarily exponential) increase in the infection rate, that is, the rate of new cases per unit time (not just the total number of cases), is increasing.

Ebola virus can apparently remain infective outside a warm body for a little while and can also spread via aerosols, but the dominant transmission mode is (by far) via direct body and body fluid contact. This fact should tend to favor generally lower R values. However, it’s also a very new disease, known only since 1976, and as far as West Africa is concerned, entirely unfamiliar–without any immune resistance in the human population. The same held true when it first showed up in Zaire and Uganda previously–it’s apparently a new disease everywhere it has shown up. Therefore, one might expect significantly higher spread rates than for a viral disease having the same basic biology but with which the population has some residual immunity.

So, on to making the estimates and considerations therein. There are four main issues here: (1) smoothing the data, (2) choosing an optimal mathematical model, (3) estimating that model’s parameter(s), and (4) estimating R from these values. I’ll take them in turn.

A complication in this situation involves the data gathering and reporting timelines, namely that the reported daily new case and death rates jump wildly from one report to the next. There is no biological reason whatsoever to expect this: it is almost certainly due to how the data are being collected and reported. For tabulating the total number of cases this is not a major issue, but for estimating rates it certainly is. One has to smooth these fluctuations out, and I’ve done this using loess, or locally weighted regression. Loess is subjective however, because you have to choose how “locally” you want the regression to be weighted.

As for issue (2), we know two important facts: (1) R > 1.0 (the number of new cases and deaths per unit time is increasing), and (2) R > 1.0 strongly implies an exponential function (a steadily but non-exponentially increasing case rate is possible but unstable, and thus unlikely over an extended period). Exponential functions are of the form y = b^ax, where b is some chosen base, usually e (~2.718) or 10, and a is the single estimated parameter.

Issue (3) necessary involves fitting the model to the data, and thus potentially involves issue (1). Should we fit the model to the raw data, or to the smoothed data (the latter being much more likely to reflect the actual biology, rather than artifacts)? If we fit it to raw data that is greatly affected by spikes in case reports, or other issues, that’s going to induce some error.

Issue (4) involves very simple math in this case, simply converting one base to another; easy (but critical).

In a nutshell, this is what I did. First, I waited for the next WHO report (containing a large case spike) to come out, assuming that whenever such a large spike is shown, that this is an accurate reflection of total cases to date. I then computed the total case and death rates from the shown data, and smoothed these using a relatively stiff “loess” smoothing function, exactly as in the many graphs I’ve shown before. In the R computing language which I use, the stiffness of the resulting loess smoothing is controlled by the “span” argument, with values of ~ 1.0 giving relatively stiff curves that are minimally influenced by the large variations in reported cases.

Eyeballing the loess smooth confirmed that it was concave over the last three months, and therefore that an exponential growth model was indeed appropriate. The standard way to proceed would be to then fit an exponential model to the data, but in the R program this is done using the nlm or optim functions, neither of which ever seems to work well (and which I was in no mood for this time, if ever). So I worked around it using a brute force approach, but one which also has some other advantages, such as the ability to evaluate the loess function’s performance.

So then, what I did was to:
(1) take the loess-estimated new case rate from the latest WHO report (Aug. 22),
(2) compute the per-day growth rate that would be required to achieve that value over the last ~ three months, and,(3) used that value as a starting point for systematic searches of values giving even better estimates (as evaluated via the residual sums of squares).

A key point in step (2) is that the parameter estimated (in model y = b^ax), is not in fact a, but rather the base b. It is easier (and more informative) to estimate b, the per-day growth rate of infected people in the population, as b = y^(1/x), where y is obtained from step (1), a is set to 1.0, and x is the number of days from whenever the primary cases were identified. The conversion from the population growth rate, b, to the per-person spread rate, R, is simple if we know a little about ebola disease progression, which we do.

With a starting point estimate of b, which from eyeballing I have confidence is reasonable, I then compute the sums of squares for all values of b between 0.5b and 1.5b, in 200 increments (the brute force part). Choosing the best fitting value of b, I then convert that to an estimate of R under the following logic. The value of R is time-independent, the total number of secondary cases arising from some set of primary cases. Therefore, I really just need to know how long ebola patients are infectious and the daily growth rate of ebola infection. From what I have read, patients are infectious for roughly 6 to 12 days, before either dying or recovering (they are assumed to be uninfectious (or nearly so) before this time, but to retain some infectiousness after recovery). In the absence of better data, R is therefore estimated simply by R = b^d, where d ranges from 6-12 days.

So finally, here are the results, over all cases in all countries, using a time zero of mid-May (that’s when the initial outbreak of March/April appeared to have subsided, before skyrocketing from June until now). I’ve assumed that case identification accuracy is high, i.e. that “probable” and “suspected” cases are largely correct. I’ve also assumed that spread from any animal vectors to humans is not an important component of the epidemic, once started.

The initial estimate of b was 1.042. The (maximum likelihood) rate obtained from step (3) above was very close to this: 1.043, meaning that for these data, the choice of a relatively stiff loess data smoothing parameter (span = 1.0) gave a very accurate representation of the underlying growth rate. Lastly, the estimate of R, the per capita disease transmission rate, thus ranges from 1.29 to 1.66 for d = 6 and 12 respectively, as computed from data over the last ~ 3 months. It will be interesting to see how these rates change with time, and how they compare to estimates from past epidemics, if they’ve been made, and also between the three countries in this outbreak.

22 thoughts on “Estimating the spread rate in the current ebola epidemic

  1. Pingback: Liberian ebola rate jumps | Ecologically Orientated

  2. Careful with the term “aerosol”. CDC apparently wishes to reserve that term for free floating infectious particles versus those suspended in droplets. IMO, the distinction is essentially meaningless, but the CDC has been on record denying Ebola has airborne infection by slicing and dicing aerosol vs droplets. So there you go, lol. Flu is spread by aerosols but Ebola is not, they insist, at least not at the moment.

    The only mechanism to effectively slow R for Ebola is contact tracing and effective quarantine protocols for each contact. That is pretty much impossible at least in Liberia at the moment, and since the disease is now out of control in a large urban population(s) there we can expect continued increase in cases, likely in some increasing exponential rate. But the reality of 25 – 50% of the cases reported, if true, means such estimates may not reflect reality.

    • I think that distinction is important actually, and the WHO defn. accords with the common definition in atmospheric science of aerosols as small, dry airborne particles, generally. Keep in mind that although the viral load is enormous in sick patients, it is not heavy in the lungs and there is also not a lot of congestion in the lungs and resultant coughing. If ebola were easily spread via the air, the R value would be higher–probably way higher–than it is.

      I don’t think we have much of a handle on the percentage of cases being reported, although some of the on-the-ground people probably do. We do know that in Guinea, about 26 villages that had been off-limits disclosed their cases last week. That caused a spike in Guinea’s numbers, but it was due mainly to unreported cases from weeks past, not to new ones.

      As for movement via hitchiking on flying insects etc, yeah seems possible. Lots and lots and lot of bleach and spray bottles needed.

  3. Oh, one other point. Apparently, the infectious nature of the sick person increases as time goes by, with the corpse the most infectious state since the viral load increases exponentially. We may also need to consider infection via insect vectors like houseflies landing on infected fluids and then vectoring the viral particles hither and yon, either landing on the uninfected or on surfaces. Window screens are almost non-existent and treatments centers are often in tents.

    • About the aerosols vs droplets, yeah, supposedly low infection rate of pulmonary tissue is the meme but I surely would not presume zero blood contamination …. was thinking mostly of the vomitus & sweat, both are apparently very common symptoms & supposedly have heavy viral loads. CDC is still playing games with the droplets thang, & has hidden former advisories mentioning droplet transmission down another few layers. Would not want panic.

      BTW, if you have not seen this page, you might want to check it out – has case count progression data w/ links.

  4. Thanks for deriving an estimate of the spread rate, Jim. For purposes of calculating future total infections, can the time independent spread rate R be used in the same way as a fertility rate might be used for estimating a future population? For instance, one person infects 1.3 people, who in turn infect 1.69 people, and the newly infected 1.69 people go on to infect 2.20 people, etc., etc. This model assumes no counter-measures to slow the rate of infection.

    If true, assuming an R of 1.48 (approx. mid-point for your range), it would take about 15 iterations (or generations so to speak) of infections to get to Liberia’s current total of 1,062. And all of this took place in three to six months?

    (I built a simple spreadsheet in LibreOffice to handle the iterations.)

    At 30 iterations, the spreadsheet is showing about 400K total (accumulated) infections. 36 iterations is enough to infect every person in Liberia. (My apologies if I have misunderstood how to apply R here. I am approaching this topic from a layman’s perspective.)

    • Harold, I had a long comment written and lost it. Here’s the gist.

      What you are describing in your 1st paragraph is an accelerating infection rate, not an accelerating new case rate. Big difference. This can be a little tricky and hopefully I didn’t confuse it too badly.

      To get an accelerating new case rate, all one needs is R > 1.0, which will give exponentially increasing rates, scaling with the magnitude above 1.0. In your example, if R = 1.3, then 1.69 is the number of new cases relative to some initial case. That case infects 1.3 people and those 1.3 in turn infect 1.3 others, not 1.69 others. 1.69 is the number of new cases per unit time, and it’s difference from 1.30 (0.39) is greater than 1.3-1.0 (0.30); hence, a growing new case rate. The infection rate itself is not growing, but constant. An accelerating infection rate would lead to a super-accelerating new case rate and would indicate something completely out of control. What we should expect in this case is a decelerating infection rate, due to people becoming better informed and changing their behaviors. But when it will decelerate is the big question.

      Raising infection rates to powers, as I did, does not involve changing infection rates themselves, but only the time scale by which they are defined. What I did was compute a per-day rate of spread of the epidemic, and then converted that to a per 6-12 day rate, which must then be approximately the number of infections caused by each symptomatic person (on average). Note that when I said that the defn of R is time independent, there’s still an implicit time factor involved, because infectivity is time limited. The generalization here is that I could define the rate of ebola spread on any chosen time scale–daily, weekly, etc. For daily, the base is 1.043, for weekly the base is 1.043^7 (= 1.343). Thus after say 13 weeks, we’d have a new case rate of 1.343^13 = 1.043^(13*7) = 46.2 cases/day or 323 per week.

      Regarding your example, if the population of Liberia is 4.2 million, with a per-day rate of spread of 1.043 it would take about 286 days for everyone to become infected. Of course that assumes no change in spread. Note also that 1.043 is for all countries; Liberia’s rate will differ a little.

    • I prefer to keep models dead simple.
      Liberia: 22 June [51] 20 July [224] 20 Aug [1082]
      monthly increase multiplier: 4.4 4.8

      call it 4.6 X increase per month – therefore:
      Sept {4,977} Oct {22,895} Nov {105,317) Dec {484,461} Jan {2,228,519}
      somewhere Dec – Jan the outbreak could start decreasing / burn out as the potential uninfected pool becomes too small to support the prior 4.6 multiplier rate.

      We will soon see if it’s closer to 150 days or 286.

    • A couple of points on that. First, my model really is about as simple as you can get. Second, I think using a monthly base is too coarse for this. Too many things can happen in a month’s time that can change the rate of spread, including not just the behavior of the infected and those who care for them, but also potentially drug availability and other wild cards. I would go no coarser than a weekly base. Third, I used the rate for all three countries, because I was just trying to illustrate the larger concepts involved, not actually make a prediction as to what would happen. I could restate them with Liberia-specific numbers, but the bigger point is this is a hypothetical only. I don’t in any way believe that the population of Liberia is actually going to be wiped out–nothing even close to that is likely to happen.
      Also, late January 2015 from late June 2014 would be 7 months or ~ 210 days, not 150. Just as back of the envelope, if use a daily spread rate there of about 1.06, which is entirely reasonable, I get that value–210 days.

      Nevertheless, given that it’s now been fully five days since the last update report (whereas the interval over the last couple weeks has been 1-3 days), I’m very worried that we’re going to see a very large spike in the Liberian and Sierra Leonian cases in the next report. Indeed, I expect it.

    • Monthly data is undoubtedly too coarse but it DOES simplify the model. Given that the medical people with experience on the ground claim the reported WHO numbers represent at best only 25 – 50% of reality, I don’t think it really matters much. It suits the 1st order approximation that fits the lack of quality data, and the gross estimation of when the outbreak in Liberia is likely to peak and start a decline also suitable for the data quality. Nobody could possibly claim any precision nor the fitness of assumptions with future behavior of the figures that result and that’s fine because I certainly don’t. It’s simple arithmetic based on gross empirical observation and does not pretend to be anything else.

      Given Liberia’s current situation and the lack of serious action to date by NGOs & governments, I don’t see any reason why Ebola could not eventually infect nearly everyone who resides there subject to the inherent transmission limitations. There are something under 100 beds available for 1 million people in Monrovia, effectively all of the NGOs & western trained medical professionals have left the country or died, there is no contact tracing, minimal quarantine which is not likely to continue as the outbreak intensifies.

      Yeah, I was counting 150 days from post date, not from June nor from the 1st case in March, which was pretty obvious. I missed any actual concrete dates in your post that would have enabled me to discern when your time span started. It hardly matters. I would not rely on my ‘model’ or claim it has any sort of accuracy since it probably doesn’t have much. For all that, the numbers tell the story and given the assumptions the Liberia outbreak may well start to burn out in Dec-Jan because of lack of uninfected hosts.

      I also wonder what is going on with WHO’s normal report schedule. Something is up.

    • It doesn’t simplify the model–it just changes the base period, that’s all. The model structure is the same: y = b^x, as is the method for estimating b, the single parameter. You’re just changing b, and not for the better. By lengthening it to 30 days, you gain nothing but lose any ability to detect finer scale rate changes, which, granted, is not easy due to the data reporting issues, but which is at least possible.

      I don’t think we really have much of an idea of what’s going on in Monrovia, or the fraction of cases being reported, outside of hearsay and media stories of unknown veracity. There’s no way the entire country of Liberia is going to be infected by this; the world community will not allow that to happen.

      update: I did find this statement in the new WHO ebola “Roadmap”: “This Roadmap assumes that in many areas of intense transmission the actual number of cases may be 2-4 fold higher than that currently reported.” However, whether the statement is a description of the current reality, as based on evidence from the epidemic, or is simply a worse case scenario considered for modeling and planning purposes, is not made clear.

    • Jim, I think we will need to just agree to disagree on the modelling topic since we seem to be talking past each other and that’s OK.

      There are people with first hand experience who were and are there and DO have an idea and have told us what IS happening, as I posted previously. If you accept the WHO Epidemiology and Surveillance numbers, then you should more or less accept the rest of their statements so see below. You really should read the 1 page PDF of Frank Glover’s congressional eyewitness testimony linked previously if you have not already.

      “We believe the reported numbers only show 25-50% of the cases.”
      Ken Isaacs , Samaritan’s Purse, Congressional testimony 7 Aug

      “in many areas of intense transmission the actual number of cases may be 2-4 fold higher than that currently reported.” WHO 28 Aug Roadmap

      “the world community will not allow that to happen.”

      The US has certainly responded sending in 65 personnel (21 to Liberia) and $14.5 million in material support to the 4 countries. Obviously too little and too late. It would be wonderful news if the world DID provide the 489 million and 750 foreign personnel WHO says they now need. Instead we have WHO closing down their Sierra Leone testing center & Canadians pulling out their people today after their healthcare professionals contracted Ebola. Foreign support is mostly pulling out not adding people. Efforts to get the world off the dime are being met with statements minimizing the problem, handwaving, explaining how difficult Ebola transmission is, continuing efforts to hide and obscure information, and complaining how alarmists are trying to spread panic. The UN and government are not exactly known for solving problems expeditiously.

      There are just a few more months needed to solve the problem in the out-of-control outbreak areas the old fashioned & real-world way no matter which model is employed. We seem to be better at cleaning up our messes after the fact than preventing them. We won’t have too long to wait to see if optimism or pessimism was warranted. How does it go, the stages of loss & grief: denial, anger, bargaining, depression, acceptance.

      I think I will bow out of your blog now. Always good times to talk to a fellow biologist.

    • Jim, thanks for the in depth clarification. I’ve updated my spreadsheet to use the spread rate.

      Googling the two quarantined areas, Dolo/Dolo’s Town and West Point, produces interesting articles at, especially the articles that quote the inhabitants who are living there. These areas are functioning as little more than controlled “burnout” zones. Implications for the inhabitants are very ominous at this point.

      WHO data is now eight days old. Wonder when the next update will come.

    • Oh, never mind on the update. Liberia now has 1378 cases and 694 deaths as of August 26. WHO is encoding the update date in the URL. I thought I was looking at the most recent report.

      The most recent DONs are listed here:

      It’s either that or change the date in the URL.

    • Yes, new update today. I’ll post new graphs later–they are essentially indistinguishable from the previous report’s–not the big spike I expected.

      Thanks for the reference, I’ll check it out later.

  5. “However, it’s also a very new disease, known only since 1976, and as far as West Africa is concerned, entirely new, unexpected, unfamiliar and without resistance in the human population. The same held when it showed up in Zaire and Uganda previously. Therefore, one might expect significantly higher spread rates than for a viral disease having the same basic biology but with which the population has historical familiarity.”

    This family of virus is only known to modern science since 1976, but it’s possible (even likely) that the disease has a long history with people in central Africa. So the ≈55% fatality rate that’s being cited (for this particular viral strain) could be much higher in human populations of non-African descent. In other words, what if it’s killing ≈55% of a population that DOES have some level of “historical familiarity” with the virus?

    I’d be curious to see if/how fatality rates differ between locals and non-locals (not counting patients who are transported out of Africa, where the level of care is presumably higher, for now). Of course, Ebola survivors have already being released in the US, which makes me curious (concerned?) about them being infectious/carriers for 2-3 months after surviving… We may soon have an answer to that one :/

    It may be like Guns Germs and Steel, but this time wiping out disproportionately large percentages of the “civilised” world…

    The Story Of… Smallpox – and other Deadly Eurasian Germs

    • @ atom – considering the large mortality rate variance, small numbers of cases and local sporadic nature of outbreaks before present, it would seem to be doubtful that Ebola has engendered any significant selective pressure on any human populations other than perhaps the immediate residents of the area of disease endemism, if that.

      For a single mutation to proliferate in a human population, 25 generations x 18 (age of 1st reproduction) = 450 years of continuous selective pressure would be required, assuming that the SNP which conferred survival was already present in the population.

    • Keep in mind that several previous outbreaks, of both Zaire EBOV and Marburg virus, had 80% or higher mortality rates. So whatever resistance there is there, it’s not high. But yes it certainly can’t be expected to be any higher outside of equatorial Africa.

      I also think the data show that the death rate is being under-estimated, but I need to look more closely at it and haven’t had time.

  6. Pingback: What’s complex and what’s simple in an exponential model? | Ecologically Orientated

  7. Given that the symptoms are similar to other diseases including malaria, that the average level of trust in government appears to be low, and that (with the exception of health workers) infection appears to be predominantly among the poor, I don’t see how the officially reported figures (already highly biased towards laboratory-confirmations) can be anything but low. It is also important to remember that the Ebola viruses can be expected to change over time and any changes that increase transmission in an otherwise dead-end host will be selected for.

    Re aerosols. True, WHO is technically correct, but if someone with EVD coughs on you, I would recommend immediate quarantine (I wonder if some of the unexplained health care provider deaths are not the result of something similar). If any Ebola victim is on a plane, then (based on current knowledge) it is true that it is unlikely that the virus would have been spread throughout the plane. However, those sitting in the victim’s immediate vicinity should be quarantined and the entire plane thoroughly sanitized.

    Re virulence. Anecdotal information seems to indicate that hydration therapy can be effective. Given the symptoms, this make sense. So, I suspect that mortality rates above 60% are more about medical care than strain virulence, at least in this outbreak (as I recall Reston was thought to be low virulence – but transmitted by aerosol.)

    Looks like the new outbreak in Zaire is being dealt with in the usual manner (strict quarantine of an isolated area). Probably there won’t be much data available, but it would be interesting to compare the results to the west African outbreak. Along with Putin giving veiled threats of nuclear retaliation re the Ukraine, we certainly do live in interesting times.

Have at it

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s