It’s unlikely, but not impossible, that many or most people have already been infected
16 April 2020 (minor updates 17 April 2020)
Contagious (and possibly dangerous) memes
Many false and/or dangerous memes have spread about coronavirus (SARS-CoV-2) and COVID-19. These include that 5G cell phone towers cause COVID-19, that people of certain ethnic backgrounds don’t get sick, that certain drugs are miracle cures, that the virus was deliberately created by [China to attack the US / the US to attack China], and more.
(As an aside, mathematical epidemiologist Adam Kucharski’s The Rules of Contagion is a great and highly topical read, though it is more about the application of epidemiological concepts to spaces other than infectious diseases, including meme propagation.)
One framework for thinking about these memes is against a two-dimensional space, with one axis being how likely they are to be true, and the other being how helpful or dangerous they would be if they gained widespread acceptance.

Obviously false but harmless ideas can be laughed at; obviously false but dangerous ideas should be argued against using data and logic; and true, useful ideas should be widely propagated.
The most concerning memes are those that meet all of the following criteria: they’re likely to be false (but can’t be demonstrated to be false); people are inclined to believe that they’re true; and their widespread acceptance could be dangerous. We can’t yet definitely disprove them, and need to study them dispassionately, but we don’t want everyone to change their behavior under the assumption that they are true.
With that in mind, there’s an idea that has been circulating for at least a month, and appears to be gaining momentum, that really concerns me.
The idea is this: that the rate of SARS-CoV-2 infection is much, much higher than we currently think, in some or in many countries.
For simplicity, I’ll call this the “ubiquity hypothesis.”
By the way, if you’re not already familiar with the definitions of and the differences between Case Fatality Ratio (CFR) and Infection Fatality Ratio (IFR), read this. In short, CFR measures the fatality ratio among confirmed cases of COVID-19; IFR measures the fatality ratio among all those infected (including asymptomatic or otherwise unconfirmed / unknown). We can approximately measure CFR, but we really care about IFR.
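To make the distinction concrete, here is a minimal Python sketch. The function names and every input number are hypothetical, chosen only to illustrate the definitions; none of them is real data.

```python
def case_fatality_ratio(deaths: int, confirmed_cases: int) -> float:
    """CFR: deaths divided by confirmed cases only."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return deaths / confirmed_cases


def infection_fatality_ratio(deaths: int, total_infections: int) -> float:
    """IFR: deaths divided by all infections, confirmed or not."""
    if total_infections <= 0:
        raise ValueError("total_infections must be positive")
    return deaths / total_infections


# Hypothetical inputs (not real data).
deaths = 500
confirmed_cases = 10_000
total_infections = 100_000  # assumes 10x underreporting of infections

cfr = case_fatality_ratio(deaths, confirmed_cases)
ifr = infection_fatality_ratio(deaths, total_infections)
print(f"CFR = {cfr:.1%}, IFR = {ifr:.1%}")
```

With the same number of deaths, the IFR is lower than the CFR by exactly the underreporting factor, which is why the size of the unobserved denominator matters so much.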
The argument for the ubiquity hypothesis
Here’s the logic for this idea in a nutshell.
- We don’t yet have a definitive way of knowing who currently has COVID-19, or has previously had it and recovered.
- That’s both because the rate of PCR testing (tests for who is currently infected) has been so low in most countries,
- and because we don’t yet have widespread, reliable serological testing (tests for who has previously been infected).
- What we do know with a somewhat higher degree of confidence (but still very significant error bars) is how many people have died while infected with COVID-19.
- But the data on reported, confirmed cases and of deaths attributed to COVID-19 are in theory consistent with two very different scenarios.
- The first is (I’m pretty confident) the mainstream, consensus view among epidemiologists and public health officials: in most countries and most locales, only a relatively small percentage of the population (usually <10%, sometimes much less than this, with a few notable exceptions in certain highly impacted regions like Lombardy in Northern Italy) has been infected so far. Yes, there is significant underreporting, but this position takes that into account. I’ll call this the “mainstream view.”
- The second view, a minority view which appears to be gaining some momentum, is that a much higher share of the population has already been infected. (One blog I follow and (usually) respect argued that 30% of the US population may have already been infected.)
- On this view, the vast majority of cases are asymptomatic or at least very mild, and therefore don’t get tested or picked up.
- An important consequence of this second view, the ubiquity hypothesis, would be that the Infection Fatality Ratio would be at least an order of magnitude lower than the kinds of figures we hear for the Case Fatality Ratio (see here for definitions), potentially as low as for seasonal flu or even lower.
I want to pause to outline a few things to consider here.
- Who actually holds the ubiquity hypothesis?
- What evidence is there for and against the ubiquity hypothesis, and how can we decide if it’s true?
- How should we proceed given that there is uncertainty?
Who actually holds the ubiquity hypothesis?
Here are just a few.
The blog I mentioned above, arguing that >100M Americans have already been infected, is here. Dr Baker (whose blog I read, admire, and sometimes cite) posted a follow-up here; I was one of the people who wrote to him to argue against his view and to whom he was responding in that post.
The epidemiologist Marc Lipsitch (who I do not think subscribes to the ubiquity hypothesis) wrote an excellent opinion piece for the New York Times this week (13 April), primarily focusing on the question of immunity. Dr Lipsitch’s article, while apparently subscribing to the mainstream view (underreporting is up to 10x), allowed that at least one paper (a pre-print here) argues for the ubiquity hypothesis. This paper argues that CDC data on “Influenza-Like Illness” correlates with COVID-19 clusters, and is consistent with “at least 28 million presumed symptomatic SARS-CoV-2 patients across the US during the three week period from March 8 to March 28;” it then argues that the number of cases could have continued to double every 3.5 days from there. If that’s true, we could have well in excess of 100 million cases (current and recovered) in the US.
(Here is an Economist article about the same pre-print.)
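The doubling arithmetic behind that pre-print's claim can be checked with a short sketch. The 28 million starting figure and the 3.5-day doubling time are the values quoted above; the function names are my own, and the model assumes uninterrupted exponential growth, which is the pre-print's premise rather than an established fact.

```python
import math


def projected_cases(initial_cases: float, doubling_days: float, days: float) -> float:
    """Cumulative cases after `days` of uninterrupted exponential doubling."""
    return initial_cases * 2 ** (days / doubling_days)


def days_to_reach(initial_cases: float, doubling_days: float, target_cases: float) -> float:
    """Days of doubling needed to grow from initial_cases to target_cases."""
    return doubling_days * math.log2(target_cases / initial_cases)


initial = 28_000_000  # figure quoted from the pre-print
doubling = 3.5        # doubling time in days, as quoted

one_week = projected_cases(initial, doubling, 7)  # 7 days = two doublings
print(f"After 7 days: {one_week / 1e6:.0f} million cases")
print(f"Days to pass 100 million: {days_to_reach(initial, doubling, 100_000_000):.1f}")
```

Two doublings in one week turn 28 million into 112 million, which is where "well in excess of 100 million" comes from if the premise holds.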
In late March, there was a much-cited pre-print from a modeling group at Oxford arguing that as much as half of the UK population had already been infected. (This paper was also much-criticised; Adam Kucharski’s article and this article from LiveScience are worth reading.)
This video from a German doctor has been making the rounds, arguing that we are over-reacting and that if we let the disease rip, we’ll face at worst 30 extra deaths per day in Germany versus a baseline of 2,200.
This video from a Swedish epidemiologist claims that 50% of the populations of the UK and Sweden have already been infected, that COVID-19 is a “mild disease”, and that we should just let it rip.
What evidence is there for and against this view?
Current estimates of CFRs might overstate the IFR significantly, because CFRs decline over time
The excellent and cautious Our World in Data (who does not argue for the ubiquity hypothesis) points out that estimates of Case Fatality Ratios usually decline over time as we learn more about the true extent of infection, and that estimated CFRs have declined in some cases for COVID-19. Other sources discuss how this has been true in past epidemics.
Most commentators agree that currently measured CFRs (sometimes called the “crude” rate) are unreliable, both because we don’t know the true size of the denominator (which pushes the crude figure too high) and because there is a lag between infection and death (which, while case counts are still growing, pushes it too low).
Thus, everyone acknowledges that the true denominator is higher than the reported case count, and therefore that IFR will be lower than measured CFRs. So this doesn’t argue in favor of either view.
Many studies that try to estimate the true incidence of the disease are consistent with the mainstream view, not the ubiquity hypothesis
One such study — very good, and regularly updated, from the CMMID — is here.
Most estimates of the true rate of infection for a given country (and therefore the rate of underreporting) that I’ve come across are based on the assumption that the fatality rate is known and more or less constant in most situations.
But this of course risks having an element of circularity to it. If we use an estimate of the IFR and the attributed deaths to estimate the true incidence of the disease, then our estimate of the incidence is only as good as our estimate of the IFR. And we can’t estimate IFR without estimating the true incidence.
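A small sketch shows how strongly the back-calculated incidence depends on the assumed IFR. The death count and population below are hypothetical round numbers, not data for any real country, and the function name is my own.

```python
def implied_infections(deaths: float, assumed_ifr: float) -> float:
    """Back out total infections from deaths under an assumed IFR."""
    if not 0 < assumed_ifr <= 1:
        raise ValueError("assumed_ifr must be in (0, 1]")
    return deaths / assumed_ifr


# Hypothetical inputs (not real data).
deaths = 10_000
population = 100_000_000

for assumed_ifr in (0.01, 0.005, 0.0005):
    infections = implied_infections(deaths, assumed_ifr)
    share = infections / population
    print(
        f"assumed IFR {assumed_ifr:.2%} -> "
        f"{infections:,.0f} infections ({share:.0%} of population)"
    )
```

The same death count is consistent with 1% of the population infected under a 1% IFR, or 20% infected under a 0.05% IFR. The deaths alone cannot distinguish between the two.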
One way out of that problem is to look at special situations where we can estimate the IFR with high confidence. Several examples where a high percentage of the relevant population was tested, or where there was a meaningful random or quasi-random sample, include the Diamond Princess cruise ship; early evacuation flights from China; and extensive analysis of how the disease evolved in Wuhan.
These tend to converge on estimates of Case Fatality Ratios (calculated based on those who show symptoms) of 1.0-1.5%.
We also know much more now about the ratio of cases that are asymptomatic. I’ve seen recent estimates clustering in the 40-50% range, though some estimates are lower. Even if 50% of cases were asymptomatic, this would put the IFR at 0.5-0.75%. Those figures are consistent with the mainstream view, not the ubiquity hypothesis.
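The adjustment from a symptomatic CFR to an IFR is simple arithmetic, sketched below. It assumes that asymptomatic infections are never fatal, which is the simplification implicit in the figures above; the function name is my own.

```python
def ifr_from_symptomatic_cfr(symptomatic_cfr: float, asymptomatic_fraction: float) -> float:
    """IFR implied by a CFR among symptomatic cases.

    Assumes asymptomatic infections contribute no deaths.
    """
    if not 0 <= asymptomatic_fraction < 1:
        raise ValueError("asymptomatic_fraction must be in [0, 1)")
    return symptomatic_cfr * (1 - asymptomatic_fraction)


# CFR range and asymptomatic fractions quoted in the text.
for symptomatic_cfr in (0.010, 0.015):
    for asymptomatic_fraction in (0.4, 0.5):
        ifr = ifr_from_symptomatic_cfr(symptomatic_cfr, asymptomatic_fraction)
        print(
            f"CFR {symptomatic_cfr:.1%}, asymptomatic {asymptomatic_fraction:.0%} "
            f"-> IFR {ifr:.2%}"
        )
```

For the IFR to fall by a further order of magnitude, the asymptomatic share would have to be 95% or more under this model, far above the 40-50% estimates.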
A study in the Lancet (summarized here) tries to adjust for asymptomatic cases, and concludes that the IFR is 0.66%.
One very recent study from Iceland, published in the New England Journal of Medicine, is particularly valuable because it is one of the only ones I know of to try to estimate the incidence of infection in the general population. There were three groups in this study: one drawn from high-risk individuals; and two drawn from the general population using different strategies (not strictly random since individuals could choose to participate or not). For those two groups selected from the general population for PCR testing, the incidence rate was well under 1%, and the rate did not increase over the 20-day duration of screening. Other nuggets from the study:
- 57% of those in the overall population group reported symptoms. 29% of those who tested negative reported symptoms.
- 43% of participants who tested positive (across the three groups) reported no symptoms.
Reportedly, a study using serological tests against 500 residents in Gangelt, Germany — hit hard by COVID-19 after many were exposed at Carnival — found that 15% of residents had antibodies and therefore had had COVID-19 at some point. This study found a 0.37% IFR (note that Germany has consistently had a lower CFR than most countries, discussed here.)
On a smaller scale, and earlier, doctors tested all 3,000 inhabitants in the town of Vo, in Northern Italy, and found 66 positives (2.2% attack rate).
Are we under- or over-attributing deaths to COVID-19?
Another source of uncertainty is that the reported deaths from COVID-19 may, for various reasons, over- or under-attribute deaths. (I discussed this in some detail here.)
We know the official statistics often exclude deaths that did not occur in hospitals; and almost always exclude deaths where there was not a positive COVID-19 test (perhaps no test was administered, or there was a false negative). These are reasons to believe in under-attribution.
There are arguments for over-attribution too. For example, those who die in a hospital with a COVID-19-positive test might have died in any case; we know that there is a high incidence of co-morbidity, often with serious pre-existing conditions.
One way to try to get to the bottom of this is to look at the overall death rate in a given population and see if it is higher or lower than normal.
Two articles that try to do this in different places: the New York Times article, Deaths in New York City Are More Than Double the Usual Total; and the Economist article, Covid-19’s death toll appears higher than official figures suggest.
Here is a very good Twitter thread making similar arguments and estimating true IFR at 0.5%: “…numbers aggregated by country can be very misleading and mask the severity of #COVID19 in heavily affected communities. A community doing well is likely due to at most a few percent having been infected so far. More viral spread means more morbidity.”
While this evidence doesn’t definitively say that we are under-attributing deaths, it makes me very wary of arguments that claim that there are far fewer “real” COVID-19 deaths than we think.
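The excess-mortality comparison described above can be sketched as follows. All three input numbers are hypothetical and do not come from the articles cited; the function names are my own.

```python
def excess_deaths(observed_deaths: int, baseline_deaths: int) -> int:
    """Deaths above the level expected for the same period in a normal year."""
    return observed_deaths - baseline_deaths


def attribution_gap(observed_deaths: int, baseline_deaths: int, reported_covid_deaths: int) -> int:
    """Excess deaths not explained by officially reported COVID-19 deaths.

    Positive values hint at under-attribution; negative values hint at
    over-attribution (or at fewer deaths from other causes).
    """
    return excess_deaths(observed_deaths, baseline_deaths) - reported_covid_deaths


# Hypothetical inputs for one city over one month (not real data).
observed = 12_000        # all-cause deaths this year
baseline = 5_000         # average all-cause deaths, same month, prior years
reported_covid = 5_500   # officially attributed COVID-19 deaths

excess = excess_deaths(observed, baseline)
gap = attribution_gap(observed, baseline, reported_covid)
print(f"Excess deaths: {excess:,}; not officially attributed: {gap:,}")
```

This is only a rough check: the baseline itself shifts during a lockdown (fewer traffic deaths, possibly more untreated heart attacks), so the gap is a signal, not a precise count.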
How should we proceed given that there is uncertainty?
I’ve laid out above the reasons why the ubiquity hypothesis is unlikely to be true. But it’s impossible, at this point, to say that it definitely isn’t true.
I think that points to several conclusions.
First, even those who hold the ubiquity hypothesis must admit (and many do, to be fair) that there is a reasonable chance that it is false; and vice-versa. So, as with so many aspects of this pandemic, we need to be humble about the degree of uncertainty we face.
Second, if there is a good chance that the true IFR is even 0.5%, let alone 1-2%, we can’t allow the disease to run unchecked through the population. Say the final attack rate to achieve herd immunity is 50% (and there are arguments for 20%-70%); 8 billion * 50% * 0.5% = 20 million deaths.
Third, this uncertainty — and the growing pressure to ease control measures and allow a significant degree of economic activity to resume — makes it all the more urgent that we have not one, but many serological testing-based studies to get a better understanding of the true rate of incidence.
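The arithmetic in the second point can be extended across the ranges mentioned there. This is a minimal sketch: the 8 billion population is the round figure used above, the attack rates and IFRs are the ones quoted, and the model ignores age structure, healthcare capacity, and everything else that would matter in a real projection.

```python
def projected_deaths(population: float, attack_rate: float, ifr: float) -> float:
    """Deaths if `attack_rate` of the population is infected at a given IFR."""
    return population * attack_rate * ifr


WORLD_POPULATION = 8e9  # round figure used in the text

for attack_rate in (0.2, 0.5, 0.7):
    for ifr in (0.005, 0.01, 0.02):
        deaths = projected_deaths(WORLD_POPULATION, attack_rate, ifr)
        print(
            f"attack rate {attack_rate:.0%}, IFR {ifr:.1%}: "
            f"{deaths / 1e6:.0f} million deaths"
        )
```

Even the most optimistic combination here (20% attack rate, 0.5% IFR) gives about 8 million deaths, and the 50% / 0.5% case reproduces the 20 million figure above.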