Estimating the spread rate in the current ebola epidemic

I’ve now written several articles on the West African ebola outbreak (see e.g. here, here, here, and here). This time I want to get more analytical, by describing how I estimated the ebola basic reproduction rate Ro (“R zero”), i.e. the rate of infection spread. Almost certainly various people are making these estimates, but I’ve not seen any yet, including at the WHO and CDC websites or the few articles that have come out to date.

Some background first. Ro is a fundamental parameter in epidemiology, conceptually similar to r, the “intrinsic rate of increase”, in population biology (I’ll refer to it as just R here). It’s defined as the mean number of secondary disease cases arising from a primary case. When an individual is infected, he or she is a secondary case relative to whoever infected him or her, and in turn becomes a primary case capable of spreading the disease to others. Estimates of R depend strongly on the both the biology of the virus, and the behavior of the infected. It is thus more context dependent than population biology’s r parameter, which assumes idealized conditions and depends more strictly on biologically limiting parameters (lifespan, age to first reproduction, gestation time etc.). Diseases which are highly contagious, like measles, smallpox and the flu, have relatively high R values, whereas those requiring direct contact or exchange of body fluids, like HIV, have rates which are at least potentially much lower, depending on the behavior of the infected.

To stop an epidemic of any disease, it is necessary to first lower R, and eventually bring it near zero. Any value of R > 0.0 indicates a disease with at least some activity in the population of concern. When R = 1.0, there is a steady increase in the total number of cases, but no change in the rate of infection (new cases per unit time): each infected person infects (on average) exactly 1.0 other person. Any R > 1.0 indicates a (necessarily exponential) increase in the infection rate, that is, the rate of new cases per unit time (not just the total number of cases), is increasing. It’s also possible to get a constant, rather than accelerating, increase in the number of new cases, but that’s an unstable equilibrium requiring a steady decrease of R from values > 1.0, and is thus uncommon.

Ebola virus can remain infective outside a warm body for a while, and also spread via aerosol mists, but the strongly dominant transmission mode (thankfully) is via direct contact. This fact will tend to favor relatively low R values, all else equal. However, it’s also a very new disease, known only since 1976, and as far as West Africa is concerned, entirely new (at least in the modern record), unexpected, unfamiliar and likely without much resistance in the human population. The same held when it showed up in Zaire and Uganda previously. Therefore, one might expect significantly higher spread rates than for a viral disease having the same basic biology but with which the population has historical familiarity, i.e. some degree of immunity or resistance.

So, on to making the estimates and the considerations thereto. There are four main issues here: (1) smoothing the data, (2) choosing an optimal mathematical model, (3) estimating that model’s parameter(s), and (4) converting these to an estimate of R, if necessary. I’ll take them in turn.

One always needs good data of course, and a major complication in this situation involves the data gathering and reporting timelines. It’s very clear that the reported new case and death rates jump fairly greatly between consecutive reports. There is no biological reason whatsoever to expect this; it is almost certainly due to how the data are being collected and reported. For estimating the total number of cases this is no big issue, but for estimating infection rates it certainly is. One has to smooth this noise out, and I’ve done this using loess, or locally weighted regression. Loess smoothing is subjective however, because you have to choose how “locally” you want the regression to be weighted, which is very ad-hoc.

As for issue (2), we know two very important facts here: (1) R > 1.0 (the number of new cases and deaths per unit time is increasing), and (2) R > 1.0 very strongly implies an exponential function (as mentioned above, a steadily but non-exponentially increasing case rate is unstable and thus unlikely, except for brief periods, such as at the inflection point of a logistic curve). Exponential functions where y = f(x) are of the structural form y = b^ax, where b is some chosen base, usually e (~2.78) or 10, but anything will work, with a being the single estimated parameter. In this case, the base chosen has direct relevance to estimates of R, as discussed below, and it’s actually better not to fit a model having a base of either e or 10.

Issue (3) necessary involves fitting the model to the data, and thus potentially involves issue (1). Should we fit the model to the raw data, or to the smoothed data (the latter being much more likely to reflect the actual biology rather than artifacts)?. If to the raw data, and those are greatly affected by a spike in case reports, that’s going to induce some error. So, should we wait for the next report then, or just smooth with a very stiff loess function or?… etc. Issue (4) involves very simple math in this case, simply converting one base to another; easy (but important).

In a nutshell then, this is what I did. First of all, I waited for the next WHO report after that of August 20 (with it’s enormous spike) to come out, because the typical pattern to date has been a spike followed by much lower numbers in the next report. Sure enough, this pattern was again followed, and fortunately I only had to wait two days. I then computed the recent new case and death rates from the raw data, and smoothed these using a relatively stiff loess function, exactly as in the many graphs I’ve now shown. In the R language, stiffness is controlled by the “span” argument of the loess function. I did a little experimenting with different span values to get the sense that span = 1.0 was relatively stiff, and hence minimally influenced by spikes in the numbers due to reporting issues.

Eyeballing the loess smooth confirmed that it was concave over the last 3 months or so, and thus that an exponential model was indeed reasonable. The standard way to proceed at this point would be to simply fit an exponential model to the raw data. This requires non-linear optimization, since there is no analytical solution. In R, this is done using either the “nlm” or “optim” functions, and for whatever reason both of these always give me headaches, which I was in no mood for, not that I ever am. So I worked around it using a “brute force” approach, but one which also has some very definite advantages, one of which in this case is information on the ability of a stiff loess function to capture the underlying epidemiology. Which, since loess is a really nice, easy and useful tool, is a good thing to know.

What I did was to (1) take the loess-estimated new case rate from the latest WHO report (Aug. 22), (2) compute the per-day rate that would be required to achieve that value over the last ~ three months, and (3) used that value as a starting point estimate for systematic searches for values giving better estimates (as evaluated using standard goodness of fit measures, i.e. the smallest sum of squared deviations from the model-estimated values). A key point in step (2) is that the parameter estimated (in model y = b^ax), is not in fact a, but is rather the base b; I do not assume a given base and then estimate a. I instead set a = 1.0, and since x is the number of days from the defined starting date, I estimate b, which gives the per-day transmission rate of ebola in the population. This is given simply by b = y^(1/x), with y as given by step (1). This does not give us R directly, but that conversion is simple if we know a little about ebola disease progression, which we do.

From an initial estimate of b, which from eyeballing I have confidence is entirely reasonable, I then compute the sums of squares for all values between 0.5b and 1.5b, in 200 increments (I had to experiment a little with the number of increments to use), which is where the so-called brute force comes in. Choosing the value giving the best fit, I then convert that to an estimate of R under the following logic. Since R as commonly defined is time-independent, being the mean of the total number of secondary cases initiated by each primary case, I simply need to know how many days ebola patients are infectious on average. They are presumed to be uninfectious before presenting symptoms. From what I have read, infectiousness ranges from 6-12 days; patients either die or recover fairly quickly after that time (although there is reportedly a long tail for infectiousness of some body fluids after recovery, up to 2-3 months). In the absence of any more precised data, R is therefore estimated simply by R = b^d, where d ranges from 6-12 days.

So, here are the results. These are for all three countries combined, with time zero of mid-May. That’s when the initial outbreak of March/April appeared to have subsided, before the increase from June until now. I’ve assumed that case identification accuracy is high, i.e. that “probable” and “suspected” cases are correct a high percentage of the time, which I think is entirely reasonable. I’ve also assumed that spread from any animal vectors to humans is not an important component of the epidemic, other than having started it of course.

The initial estimate of b, the per-day infection rate, was 1.042. The (maximum likelihood) rate obtained from step (3) above was very close to this: 1.043, meaning that for these data, the relatively stiff loess data smoothing (span = 1.0) of the raw data provides a very reasonable representation of the actual epidemiology. Lastly, the estimate of R, the time-independent, per-individual spread rate, thus ranges from 1.29 to 1.66 for d = 6 and 12 respectively, over the last ~ 3 months. It will be interesting to see how these rates change with time, and how they compare to estimates from past epidemics, if they’ve been made, and also between the three countries in this outbreak. What everyone involved desperately wants to see, is a transition to a logistic form, indicating a slowdown in spread.


22 thoughts on “Estimating the spread rate in the current ebola epidemic

  1. Pingback: Liberian ebola rate jumps | Ecologically Orientated

  2. Careful with the term “aerosol”. CDC apparently wishes to reserve that term for free floating infectious particles versus those suspended in droplets. IMO, the distinction is essentially meaningless, but the CDC has been on record denying Ebola has airborne infection by slicing and dicing aerosol vs droplets. So there you go, lol. Flu is spread by aerosols but Ebola is not, they insist, at least not at the moment.

    The only mechanism to effectively slow R for Ebola is contact tracing and effective quarantine protocols for each contact. That is pretty much impossible at least in Liberia at the moment, and since the disease is now out of control in a large urban population(s) there we can expect continued increase in cases, likely in some increasing exponential rate. But the reality of 25 – 50% of the cases reported, if true, means such estimates may not reflect reality.

    • I think that distinction is important actually, and the WHO defn. accords with the common definition in atmospheric science of aerosols as small, dry airborne particles, generally. Keep in mind that although the viral load is enormous in sick patients, it is not heavy in the lungs and there is also not a lot of congestion in the lungs and resultant coughing. If ebola were easily spread via the air, the R value would be higher–probably way higher–than it is.

      I don’t think we have much of a handle on the percentage of cases being reported, although some of the on-the-ground people probably do. We do know that in Guinea, about 26 villages that had been off-limits disclosed their cases last week. That caused a spike in Guinea’s numbers, but it was due mainly to unreported cases from weeks past, not to new ones.

      As for movement via hitchiking on flying insects etc, yeah seems possible. Lots and lots and lot of bleach and spray bottles needed.

  3. Oh, one other point. Apparently, the infectious nature of the sick person increases as time goes by, with the corpse the most infectious state since the viral load increases exponentially. We may also need to consider infection via insect vectors like houseflies landing on infected fluids and then vectoring the viral particles hither and yon, either landing on the uninfected or on surfaces. Window screens are almost non-existent and treatments centers are often in tents.

    • About the aerosols vs droplets, yeah, supposedly low infection rate of pulmonary tissue is the meme but I surely would not presume zero blood contamination …. was thinking mostly of the vomitus & sweat, both are apparently very common symptoms & supposedly have heavy viral loads. CDC is still playing games with the droplets thang, & has hidden former advisories mentioning droplet transmission down another few layers. Would not want panic.

      BTW, if you have not seen this page, you might want to check it out – has case count progression data w/ links.

  4. Thanks for deriving an estimate of the spread rate, Jim. For purposes of calculating future total infections, can the time independent spread rate R be used in the same way as a fertility rate might be used for estimating a future population? For instance, one person infects 1.3 people, who in turn infect 1.69 people, and the newly infected 1.69 people go on to infect 2.20 people, etc., etc. This model assumes no counter-measures to slow the rate of infection.

    If true, assuming an R of 1.48 (approx. mid-point for your range), it would take about 15 iterations (or generations so to speak) of infections to get to Liberia’s current total of 1,062. And all of this took place in three to six months?

    (I built a simple spreadsheet in LibreOffice to handle the iterations.)

    At 30 iterations, the spreadsheet is showing about 400K total (accumulated) infections. 36 iterations is enough to infect every person in Liberia. (My apologies if I have misunderstood how to apply R here. I am approaching this topic from a layman’s perspective.)

    • Harold, I had a long comment written and lost it. Here’s the gist.

      What you are describing in your 1st paragraph is an accelerating infection rate, not an accelerating new case rate. Big difference. This can be a little tricky and hopefully I didn’t confuse it too badly.

      To get an accelerating new case rate, all one needs is R > 1.0, which will give exponentially increasing rates, scaling with the magnitude above 1.0. In your example, if R = 1.3, then 1.69 is the number of new cases relative to some initial case. That case infects 1.3 people and those 1.3 in turn infect 1.3 others, not 1.69 others. 1.69 is the number of new cases per unit time, and it’s difference from 1.30 (0.39) is greater than 1.3-1.0 (0.30); hence, a growing new case rate. The infection rate itself is not growing, but constant. An accelerating infection rate would lead to a super-accelerating new case rate and would indicate something completely out of control. What we should expect in this case is a decelerating infection rate, due to people becoming better informed and changing their behaviors. But when it will decelerate is the big question.

      Raising infection rates to powers, as I did, does not involve changing infection rates themselves, but only the time scale by which they are defined. What I did was compute a per-day rate of spread of the epidemic, and then converted that to a per 6-12 day rate, which must then be approximately the number of infections caused by each symptomatic person (on average). Note that when I said that the defn of R is time independent, there’s still an implicit time factor involved, because infectivity is time limited. The generalization here is that I could define the rate of ebola spread on any chosen time scale–daily, weekly, etc. For daily, the base is 1.043, for weekly the base is 1.043^7 (= 1.343). Thus after say 13 weeks, we’d have a new case rate of 1.343^13 = 1.043^(13*7) = 46.2 cases/day or 323 per week.

      Regarding your example, if the population of Liberia is 4.2 million, with a per-day rate of spread of 1.043 it would take about 286 days for everyone to become infected. Of course that assumes no change in spread. Note also that 1.043 is for all countries; Liberia’s rate will differ a little.

    • I prefer to keep models dead simple.
      Liberia: 22 June [51] 20 July [224] 20 Aug [1082]
      monthly increase multiplier: 4.4 4.8

      call it 4.6 X increase per month – therefore:
      Sept {4,977} Oct {22,895} Nov {105,317) Dec {484,461} Jan {2,228,519}
      somewhere Dec – Jan the outbreak could start decreasing / burn out as the potential uninfected pool becomes too small to support the prior 4.6 multiplier rate.

      We will soon see if it’s closer to 150 days or 286.

    • A couple of points on that. First, my model really is about as simple as you can get. Second, I think using a monthly base is too coarse for this. Too many things can happen in a month’s time that can change the rate of spread, including not just the behavior of the infected and those who care for them, but also potentially drug availability and other wild cards. I would go no coarser than a weekly base. Third, I used the rate for all three countries, because I was just trying to illustrate the larger concepts involved, not actually make a prediction as to what would happen. I could restate them with Liberia-specific numbers, but the bigger point is this is a hypothetical only. I don’t in any way believe that the population of Liberia is actually going to be wiped out–nothing even close to that is likely to happen.
      Also, late January 2015 from late June 2014 would be 7 months or ~ 210 days, not 150. Just as back of the envelope, if use a daily spread rate there of about 1.06, which is entirely reasonable, I get that value–210 days.

      Nevertheless, given that it’s now been fully five days since the last update report (whereas the interval over the last couple weeks has been 1-3 days), I’m very worried that we’re going to see a very large spike in the Liberian and Sierra Leonian cases in the next report. Indeed, I expect it.

    • Monthly data is undoubtedly too coarse but it DOES simplify the model. Given that the medical people with experience on the ground claim the reported WHO numbers represent at best only 25 – 50% of reality, I don’t think it really matters much. It suits the 1st order approximation that fits the lack of quality data, and the gross estimation of when the outbreak in Liberia is likely to peak and start a decline also suitable for the data quality. Nobody could possibly claim any precision nor the fitness of assumptions with future behavior of the figures that result and that’s fine because I certainly don’t. It’s simple arithmetic based on gross empirical observation and does not pretend to be anything else.

      Given Liberia’s current situation and the lack of serious action to date by NGOs & governments, I don’t see any reason why Ebola could not eventually infect nearly everyone who resides there subject to the inherent transmission limitations. There are something under 100 beds available for 1 million people in Monrovia, effectively all of the NGOs & western trained medical professionals have left the country or died, there is no contact tracing, minimal quarantine which is not likely to continue as the outbreak intensifies.

      Yeah, I was counting 150 days from post date, not from June nor from the 1st case in March, which was pretty obvious. I missed any actual concrete dates in your post that would have enabled me to discern when your time span started. It hardly matters. I would not rely on my ‘model’ or claim it has any sort of accuracy since it probably doesn’t have much. For all that, the numbers tell the story and given the assumptions the Liberia outbreak may well start to burn out in Dec-Jan because of lack of uninfected hosts.

      I also wonder what is going on with WHO’s normal report schedule. Something is up.

    • It doesn’t simplify the model–it just changes the base period, that’s all. The model structure is the same: y = b^x, as is the method for estimating b, the single parameter. You’re just changing b, and not for the better. By lengthening it to 30 days, you gain nothing but lose any ability to detect finer scale rate changes, which, granted, is not easy due to the data reporting issues, but which is at least possible.

      I don’t think we really have much of an idea of what’s going on in Monrovia, or the fraction of cases being reported, outside of hearsay and media stories of unknown veracity. There’s no way the entire country of Liberia is going to be infected by this; the world community will not allow that to happen.

      update: I did find this statement in the new WHO ebola “Roadmap”: “This Roadmap assumes that in many areas of intense transmission the actual number of cases may be 2-4 fold higher than that currently reported.” However, whether the statement is a description of the current reality, as based on evidence from the epidemic, or is simply a worse case scenario considered for modeling and planning purposes, is not made clear.

    • Jim, I think we will need to just agree to disagree on the modelling topic since we seem to be talking past each other and that’s OK.

      There are people with first hand experience who were and are there and DO have an idea and have told us what IS happening, as I posted previously. If you accept the WHO Epidemiology and Surveillance numbers, then you should more or less accept the rest of their statements so see below. You really should read the 1 page PDF of Frank Glover’s congressional eyewitness testimony linked previously if you have not already.

      “We believe the reported numbers only show 25-50% of the cases.”
      Ken Isaacs , Samaritan’s Purse, Congressional testimony 7 Aug

      “in many areas of intense transmission the actual number of cases may be 2-4 fold higher than that currently reported.” WHO 28 Aug Roadmap

      “the world community will not allow that to happen.”

      The US has certainly responded sending in 65 personnel (21 to Liberia) and $14.5 million in material support to the 4 countries. Obviously too little and too late. It would be wonderful news if the world DID provide the 489 million and 750 foreign personnel WHO says they now need. Instead we have WHO closing down their Sierra Leone testing center & Canadians pulling out their people today after their healthcare professionals contracted Ebola. Foreign support is mostly pulling out not adding people. Efforts to get the world off the dime are being met with statements minimizing the problem, handwaving, explaining how difficult Ebola transmission is, continuing efforts to hide and obscure information, and complaining how alarmists are trying to spread panic. The UN and government are not exactly known for solving problems expeditiously.

      There are just a few more months needed to solve the problem in the out-of-control outbreak areas the old fashioned & real-world way no matter which model is employed. We seem to be better at cleaning up our messes after the fact than preventing them. We won’t have too long to wait to see if optimism or pessimism was warranted. How does it go, the stages of loss & grief: denial, anger, bargaining, depression, acceptance.

      I think I will bow out of your blog now. Always good times to talk to a fellow biologist.

    • Jim, thanks for the in depth clarification. I’ve updated my spreadsheet to use the spread rate.

      Googling the two quarantined areas, Dolo/Dolo’s Town and West Point, produces interesting articles at, especially the articles that quote the inhabitants who are living there. These areas are functioning as little more than controlled “burnout” zones. Implications for the inhabitants are very ominous at this point.

      WHO data is now eight days old. Wonder when the next update will come.

    • Oh, never mind on the update. Liberia now has 1378 cases and 694 deaths as of August 26. WHO is encoding the update date in the URL. I thought I was looking at the most recent report.

      The most recent DONs are listed here:

      It’s either that or change the date in the URL.

    • Yes, new update today. I’ll post new graphs later–they are essentially indistinguishable from the previous report’s–not the big spike I expected.

      Thanks for the reference, I’ll check it out later.

  5. “However, it’s also a very new disease, known only since 1976, and as far as West Africa is concerned, entirely new, unexpected, unfamiliar and without resistance in the human population. The same held when it showed up in Zaire and Uganda previously. Therefore, one might expect significantly higher spread rates than for a viral disease having the same basic biology but with which the population has historical familiarity.”

    This family of virus is only known to modern science since 1976, but it’s possible (even likely) that the disease has a long history with people in central Africa. So the ≈55% fatality rate that’s being cited (for this particular viral strain) could be much higher in human populations of non-African descent. In other words, what if it’s killing ≈55% of a population that DOES have some level of “historical familiarity” with the virus?

    I’d be curious to see if/how fatality rates differ between locals and non-locals (not counting patients who are transported out of Africa, where the level of care is presumably higher, for now). Of course, Ebola survivors have already being released in the US, which makes me curious (concerned?) about them being infectious/carriers for 2-3 months after surviving… We may soon have an answer to that one :/

    It may be like Guns Germs and Steel, but this time wiping out disproportionately large percentages of the “civilised” world…

    The Story Of… Smallpox – and other Deadly Eurasian Germs

    • @ atom – considering the large mortality rate variance, small numbers of cases and local sporadic nature of outbreaks before present, it would seem to be doubtful that Ebola has engendered any significant selective pressure on any human populations other than perhaps the immediate residents of the area of disease endemism, if that.

      For a single mutation to proliferate in a human population, 25 generations x 18 (age of 1st reproduction) = 450 years of continuous selective pressure would be required, assuming that the SNP which conferred survival was already present in the population.

    • Keep in mind that several previous outbreaks, of both Zaire EBOV and Marburg virus, had 80% or higher mortality rates. So whatever resistance there is there, it’s not high. But yes it certainly can’t be expected to be any higher outside of equatorial Africa.

      I also think the data show that the death rate is being under-estimated, but I need to look more closely at it and haven’t had time.

  6. Pingback: What’s complex and what’s simple in an exponential model? | Ecologically Orientated

  7. Given that the symptoms are similar to other diseases including malaria, that the average level of trust in government appears to be low, and that (with the exception of health workers) infection appears to be predominantly among the poor, I don’t see how the officially reported figures (already highly biased towards laboratory-confirmations) can be anything but low. It is also important to remember that the Ebola viruses can be expected to change over time and any changes that increase transmission in an otherwise dead-end host will be selected for.

    Re aerosols. True, WHO is technically correct, but if someone with EVD coughs on you, I would recommend immediate quarantine (I wonder if some of the unexplained health care provider deaths are not the result of something similar). If any Ebola victim is on a plane, then (based on current knowledge) it is true that it is unlikely that the virus would have been spread throughout the plane. However, those sitting in the victim’s immediate vicinity should be quarantined and the entire plane thoroughly sanitized.

    Re virulence. Anecdotal information seems to indicate that hydration therapy can be effective. Given the symptoms, this make sense. So, I suspect that mortality rates above 60% are more about medical care than strain virulence, at least in this outbreak (as I recall Reston was thought to be low virulence – but transmitted by aerosol.)

    Looks like the new outbreak in Zaire is being dealt with in the usual manner (strict quarantine of an isolated area). Probably there won’t be much data available, but it would be interesting to compare the results to the west African outbreak. Along with Putin giving veiled threats of nuclear retaliation re the Ukraine, we certainly do live in interesting times.

Have at it

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s