I have to share this cute little bit of mathematics with you

Nov 16, 2021

“The first principle is that you must not fool yourself - and you are the easiest to fool.” ~Richard Feynman

I’ll say up front here and with complete sincerity that I do not know how much, if at all, the considerations at the link below explain present-day data. But it is certainly an illustration of how easily you can be fooled by the data if you don't examine it in detail. Let me give you the link, but then read my explanation before you click over because I think it might be more understandable than his own.

So here it is, from what appears to be a statistics professor in the UK: “Is vaccine efficacy a statistical illusion?”.

So what this fellow did is run a very simple spreadsheet model of a hypothetical disease that killed, every week, 15 out of every 100,000 people. Then he modeled rolling out a vaccine to the population that did literally nothing: it was pure placebo and didn't affect the death rate at all. And then, and this is the critical fact, he supposed a one-week delay in death reporting, so that the vaccinated/unvaccinated deaths that occurred in the prior week get compared to the vaccinated/unvaccinated populations this week. If you then plot the vaccinated and unvaccinated death rates as a function of time, you get a spike in the unvaccinated death rate (and a depression in the vaccinated death rate) that looks very convincing but is in fact entirely fictitious. It is purely an artifact of the one-week delay in reporting combined with the fact that the vaccinated population has been increasing while the unvaccinated population has been decreasing: last week's unvaccinated deaths came from a larger group than the one they get divided by, and last week's vaccinated deaths came from a smaller one. (And then of course, as time extends to infinity, both curves gradually return to a death rate of 15 per 100,000.)

Now again, I don’t know how much these considerations actually matter for the real data, but at a minimum, it’s a great illustration of how easy it would be to be fooled, even if you don’t think we’re being fooled right now. And it’s such a simple model that a high school student could build it in a few minutes with spreadsheet software… I love things like that. It would be a great classroom illustration.

Arne

Nov 17, 2021

Related to the professor's model is the issue about the control groups for the vaccine trials last year being wiped out because they weren't kept from getting vaccinated after the trials ended. In the U.S., we don't have a good statistical basis for comparing these 3 groups: vaccinated, unvaccinated and uninfected, and unvaccinated and infected.

I know terrible data quality is one of the common complaints that people make about the CDC and the state health agencies.


Edward Hamilton

Justin Hart finds that this kind of reporting lag error for deaths is probably on the scale of multiple weeks, if not months. There are still deaths being reported from 2020!

https://covidreason.substack.com/p/cdc-data-almost-half-of-newly-reported?r=7ikwa&utm_campaign=post&utm_medium=web&utm_source=

This simulated data set reminds me of the small denominator instability problems associated with integrating dynamical equations in non-Cartesian spaces in the vicinity of a geometric singularity. I remember trying to implement time-dependent dynamics simulations for the 3D rotation of asymmetric solid rotors and seeing the really ugly divergences around the poles. There are ways to smooth the divergences (e.g. using quaternions) but if you aren't aware of the problem, you'll just integrate right through the unstable region and permanently ruin your time series.

This feels like the same sort of thing. There are effective divergences due to unstable denominators (here the small population of vaccinated at the start of the series, and of unvaccinated at the end of the series). This makes these portions of the data the worst for evaluating the efficacy of the vaccine. The best data comes if you stabilize somewhere in the middle (around 50% vaccination), where both denominators are sizable relative to their rate of change. *Even if* the lag were perfectly corrected, the intrinsic noisiness of the data would be magnified by the small denominator problem. Relevant quote: "There could, of course, be reasons other than just delays in death reporting or misclassification. For example, any systematic underestimation of the actual proportion who remain unvaccinated would lead to a higher mortality rate for unvaccinated higher than that for the vaccinated, even if the mortality rates were equal in each category." This problem gets *worse* as you approach full vaccination, because the unvaccinated denominator goes imperfectly toward zero!
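To put rough numbers on the quoted point, here is a small sketch of what a fixed undercount of the unvaccinated does to the apparent rates when the true rate is 15 per 100,000 in both groups. The function name `apparent_rates` and the one-percentage-point undercount are hypothetical, chosen only to illustrate the effect; nothing here is an estimate of any real error.

```python
def apparent_rates(true_unvax, undercount=0.01, rate_per_100k=15.0):
    """Apparent (vaccinated, unvaccinated) rates per 100,000.

    `true_unvax` is the real unvaccinated fraction of the population;
    the estimated fraction is too low by `undercount` (an assumed error).
    Deaths follow the true group sizes; rates use the estimated ones.
    """
    est_unvax = true_unvax - undercount
    if not 0.0 < est_unvax < 1.0:
        raise ValueError("estimated unvaccinated fraction must be in (0, 1)")
    unvax_rate = rate_per_100k * true_unvax / est_unvax
    vax_rate = rate_per_100k * (1.0 - true_unvax) / (1.0 - est_unvax)
    return vax_rate, unvax_rate


if __name__ == "__main__":
    for true_unvax in (0.50, 0.20, 0.10, 0.05, 0.02):
        vax, unvax = apparent_rates(true_unvax)
        print(f"unvaccinated {true_unvax:4.0%}: "
              f"vaccinated {vax:5.2f}, unvaccinated {unvax:5.2f}")
```

With the same absolute error, the distortion is mild near 50% vaccination and grows quickly as the unvaccinated fraction shrinks, since the error becomes a larger share of the denominator.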


Connecting Thoughts
