Health officials at the epicenter of the coronavirus outbreak reported a surge in new infections Thursday after changing how they diagnose the illness. The announcement adds to signs that the virus data out of China is flawed.
Officials in Hubei province, home to 60 million people, reported more than 13,000 new cases when they expanded how they test patients, moving from throat swabs to more thorough examinations. The new cases brought the total count to nearly 60,000 infections from 45,000 on Wednesday. Most of those cases and all but one death have been in China.
The sharp revision out of Hubei casts further doubt on the reliability of the numbers China is reporting to the World Health Organization. Virus data from China have been in focus in recent weeks as economists and investors try to gauge the economic toll of the outbreak, which has already slowed the world’s second-largest economy and led many U.S. companies operating there to suspend operations. After hitting record highs Wednesday, U.S. stock indexes dipped Thursday on the new revelations.
Anomalies had shown up in China’s coronavirus numbers even before the change in methodology. For instance, the number of deaths reported appeared to follow a simple mathematical formula with very high accuracy, according to a quantitative-finance specialist who ran a regression of the data for Barron’s. A near-perfect 99.99% of variance is explained by the equation, this person said, referring to a statistical measure known as r-squared. That’s a fancy way of saying that each update to the number of deaths was almost perfectly predictable. “This never happens with real data, which is always noisy,” the person said.
China’s U.S. embassy didn’t immediately respond to requests for comment.
Barron’s re-created the regression analysis of total deaths caused by the virus, which first emerged in the central Chinese city of Wuhan at the end of last year, and found the same r-squared. We ran it by Melody Goodman, associate professor of biostatistics at New York University’s School of Global Public Health.
“I have never in my years seen an r-squared of 0.99,” Goodman said. “As a statistician it makes me question the data.”
For context, Goodman said a “really good” r-squared, in terms of public health data, would be 0.7. “Anything like 0.99,” she said, “would make me think that someone is simulating data. It would mean you already know what is going to happen.”
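For readers who want to see what the statistic measures, here is a minimal sketch in Python. It is not the regression Barron’s or the specialist ran, and it uses no Chinese or WHO figures: both series are hypothetical, the straight-line formula and the noise level are assumptions chosen for illustration, and the helper names are ours. It fits a line to a series generated exactly by a formula and to the same trend with random noise added, then reports the r-squared of each fit.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(ys, predicted):
    """Share of the variance in ys that the predictions explain."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def fit_and_score(xs, ys):
    """Fit a line to (xs, ys) and return the r-squared of that fit."""
    a, b = fit_line(xs, ys)
    return r_squared(ys, [a + b * x for x in xs])


# Hypothetical data only: 20 days of a made-up daily count.
days = list(range(1, 21))
rng = random.Random(0)  # fixed seed so the run is repeatable

# Series produced exactly by a formula (assumed: 10 + 5 * day).
formula_series = [10 + 5 * d for d in days]

# Same trend with random noise added (assumed standard deviation: 30).
noisy_series = [10 + 5 * d + rng.gauss(0, 30) for d in days]

formula_r2 = fit_and_score(days, formula_series)
noisy_r2 = fit_and_score(days, noisy_series)

print("formula-generated series r-squared:", round(formula_r2, 4))
print("noisy series r-squared:", round(noisy_r2, 4))
```

The formula-generated series scores essentially 1 because the fitted line reproduces every point, while the noisy series scores lower because the line cannot account for the random scatter. How much lower depends entirely on the assumed noise level, so the sketch illustrates the measure and says nothing about what real outbreak data should look like.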
There’s one scenario where the data could be understandably manipulated, Goodman said. Because there are privacy concerns around public health data, it’s conceivable that someone would simulate the data based on real data, so that individuals could not be identified. But even then, the r-squared in this case is extraordinarily high. Moreover, Goodman said, when data are manipulated to protect privacy, that needs to be disclosed; there is no such disclosure on the WHO site.
Some economists say there is a longstanding measurement problem in China, irrespective of the coronavirus. Official economic statistics often differ from private attempts to replicate the results. The government-created purchasing-managers index was stronger than a closely watched private version during every month of 2019.
“It’s an emerging economy,” said Carl Weinberg, chief economist at High Frequency Economics. “There’s a natural roughness to the data.”
But questionable data makes it that much harder to forecast the severity of the virus and the economic hit in China and beyond.
Torsten Slok, chief economist at Deutsche Bank Securities, said he expects the outbreak to shave 1.5 percentage points off Chinese gross domestic product this year. He recently revised his 2020 GDP estimate for China to 4.6% from 6.1%, and he said he thinks the virus will take 0.5 percentage point off global growth this year.
Estimates like Slok’s, of course, are based in part on data being supplied to the WHO by China.
—Al Root contributed to this story.
By Lisa Beilfuss