A pandemic-related study recently published by the state University at Albany’s School of Public Health is marred by factual errors, inconsistencies and methodological issues that raise doubts about its findings – and questions about the process by which it was reviewed.
Billed as the largest study to date of COVID-19 in health-care workers, the paper appeared in the August 2022 issue of Emerging Infectious Diseases, a widely cited journal published by the CDC.
The study found that health-care workers who were male, over 50, Black or Asian experienced higher death rates than other demographic groups through the first 19 months of the pandemic.
While some of those findings are generally consistent with previous research, the paper includes facts and figures that are far out of line with other sources.
For example, one of its tables puts the total number of COVID deaths as of Oct. 12, 2021, at 1.4 million. That’s almost double the CDC’s published death toll for that point of the pandemic – and well above its current toll of 1.1 million.
At the same time, the paper suggests the number of cases recorded by October 2021 was 6.3 million, which is about one-seventh of the CDC’s official count at that time.
At one point, the paper says the COVID “fatality rate” for non-health-care workers was 24.64 percent, or almost one in four. That figure – which appears to result from dividing the overstated tally of deaths by the understated count of cases – is about 23 times the observed fatality rate nationwide of just over 1 percent.
Later, the paper cites a “fatality rate in the U.S. general population” of 2.48 percent without explaining the 10-fold disparity with the statistic given earlier.
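The arithmetic behind these comparisons can be checked in a few lines. The sketch below uses only the rounded figures quoted in this post; the variable names are illustrative. The first ratio covers all cases in the paper's table, not just non-health-care workers, so it would not be expected to match the 24.64 percent figure exactly, only to land in the same range.

```python
# Rounded figures as quoted in this post; variable names are illustrative.
paper_deaths = 1_400_000   # total COVID deaths listed in the paper's table
paper_cases = 6_300_000    # cases the paper suggests were recorded
cdc_cases = 45_000_000     # approximate CDC case count for the same period

# Deaths divided by cases, using the paper's own totals.
implied_rate_pct = 100 * paper_deaths / paper_cases

# How many times larger the CDC case count is than the paper's.
cases_ratio = cdc_cases / paper_cases

# Gap between the two fatality rates the paper cites.
rate_disparity = 24.64 / 2.48

print(f"Deaths / cases in the paper: {implied_rate_pct:.1f} percent")
print(f"CDC cases / paper cases: {cases_ratio:.1f}")
print(f"24.64 percent vs. 2.48 percent: {rate_disparity:.1f}-fold")
```

With these inputs, the paper's totals imply a fatality rate of roughly 22 percent, the CDC case count is about seven times the paper's, and the two cited rates differ by a factor of almost 10.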
In another odd claim, the paper says “the highest death numbers occurred in the first surge (April 2020).” According to the CDC, however, the nation’s highest daily death toll was 23,372 on Jan. 13, 2021. That was 51 percent greater than the first wave’s peak on April 15, 2020.
Some of the paper’s shortcomings appear to derive from gaps in the CDC’s COVID-19 “case surveillance” data, which was the basis for the study’s analysis.
Starting in the spring of 2020, the CDC mandated that public health officials record each case of COVID-19 using a 33-page form. The form sought a range of information, including onset date, symptoms, underlying conditions, age, gender, ethnicity and whether the patient was a health-care worker. To protect patient privacy, the most detailed data are available only to authorized researchers, including the authors of the UAlbany study.
The data set contains records of more than 90 million cases, 45 million of them recorded before Oct. 12, 2021, the end point of the period examined by the UAlbany study.
However, most of the records in the CDC data set are incomplete. Fifty-one percent of all data fields were left blank when the forms were filled out, and for another 8 percent the answer was given as “unknown.”
The lack of information is particularly common with respect to two questions that were central to the paper’s analysis: Only 14 percent of the records identified whether the infected person was a health-care worker or not, and only 35 percent indicated whether the patient died or survived.
The researchers would have had to disregard the vast majority of records because they were missing one or both of these facts, which appears to explain why the study focused on only 6.3 million of almost 45 million recorded cases during the study period.
As the researchers drew comparisons based on gender, ethnicity, age and other factors, they would have had to leave out additional records that lacked those demographic details. The study’s analysis of disparities among racial and ethnic groups is based on a sample of fewer than 3 million records, or 7 percent of the overall data set.
Although the UAlbany paper alludes to missing data, it does not discuss the issue in detail or explain that the problem affected the overwhelming majority of cases. The authors open their methodology section by saying, “Our study population included all COVID-19 infection cases reported by the CDC,” and that they “obtained demographic and medical information for each record in this dataset” (emphasis added).
The paper’s major findings were based on comparing a demographic group’s share of “non-deaths” – that is, COVID survivors – to that same group’s share of COVID fatalities.
For example, among the health-care worker cases reviewed, women represented 81 percent of non-deaths vs. 61 percent of deaths, suggesting that female workers were at relatively low risk of death from COVID. By contrast, male workers accounted for 19 percent of non-deaths and 39 percent of deaths, suggesting that they were at higher risk.
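One way to turn these shares into a single measure of relative risk is an odds ratio. The sketch below is an illustration built from the four percentages quoted above; it is not necessarily the statistic the paper's authors computed, and the function name is hypothetical. It works because a group's odds of death are its deaths divided by its survivors, and the overall totals of deaths and survivors cancel out when two groups are compared.

```python
def odds_ratio_from_shares(deaths_a, nondeaths_a, deaths_b, nondeaths_b):
    """Odds of death for group A relative to group B.

    Each argument is a group's share of all deaths or of all non-deaths
    in the sample. The sample-wide totals cancel in the ratio, so shares
    are enough.
    """
    return (deaths_a / nondeaths_a) / (deaths_b / nondeaths_b)


# Shares among health-care worker cases, as quoted in this post.
male_vs_female = odds_ratio_from_shares(
    deaths_a=0.39, nondeaths_a=0.19,   # men
    deaths_b=0.61, nondeaths_b=0.81,   # women
)

print(f"Odds of death, men relative to women: {male_vs_female:.2f}")
```

By this measure, the odds of death among the male health-care workers in the sample were roughly 2.7 times those of the female workers. The figure is only as reliable as the sample it comes from.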
Similar signs of elevated risk were found for workers who were older than 50, Black or Asian.
However, the fact that the vast majority of records had to be omitted from the analysis due to missing data raises doubt about the validity of these findings – especially since the gaps were more common for some groups than others.
In answer to whether the patient died from COVID, for instance, the publicly available version of the data set includes 678,000 records with the answer “yes” through mid-October 2021, which is 91 percent of the death toll for that period. But it includes 15.9 million with the answer “no,” which is only 35 percent of the people who were known to be infected with COVID and survived.
In other words, patients who died are overrepresented in the sample relative to survivors.
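The size of that distortion can be estimated from the figures above. The following is a back-of-the-envelope sketch that treats the rounded numbers in this post as exact and assumes the 91 percent and 35 percent capture rates apply uniformly; the variable names are illustrative.

```python
# Records in the public data set with a known outcome, through mid-October 2021.
sample_deaths = 678_000
sample_survivors = 15_900_000

# Share of actual deaths and actual survivors those records represent.
capture_deaths = 0.91
capture_survivors = 0.35

# Fatality rate among records with a known outcome.
sample_rate_pct = 100 * sample_deaths / (sample_deaths + sample_survivors)

# Deaths are captured more completely than survivors, so the odds of
# death in the sample are inflated by the ratio of the capture rates.
odds_inflation = capture_deaths / capture_survivors

# Undo the inflation to estimate the rate among all known cases.
sample_odds = sample_deaths / sample_survivors
implied_odds = sample_odds / odds_inflation
implied_rate_pct = 100 * implied_odds / (1 + implied_odds)

print(f"Fatality rate in the sample: {sample_rate_pct:.1f} percent")
print(f"Inflation of the odds of death: {odds_inflation:.1f}x")
print(f"Implied rate after correction: {implied_rate_pct:.1f} percent")
```

Under these assumptions, the odds of death among records with a known outcome are about 2.6 times what they would be if deaths and survivors had been captured equally.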
Also, the data gaps were unevenly distributed among demographic categories. Death information was missing or unknown for 58 percent of identified white patients, 62 percent of Black patients, 74 percent of Hispanic/Latino patients, and 76 percent of Asian patients.
The many gaps in the CDC’s case surveillance data limit its value for statistical analysis, especially when analyzing disparities among racial groups.
The paper also finds that the fatality rate in the general population was seven times higher than it was among health-care workers, and suggests that this might be related to the workers’ “better access to healthcare and treatment.”
It does not mention another likely explanation, which is that many health-care workers were routinely tested for COVID on the job – in some cases twice a week. This means that infections among these workers were more likely to be caught, even if they were mild or asymptomatic, which in turn would reduce their apparent fatality rate.
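A simple illustration shows how much testing frequency alone can move an apparent fatality rate. Every number in the sketch below is hypothetical and chosen only to show the mechanism: two groups are given the same true risk of death, and the only difference is how many of their infections are detected. It also assumes that every death is recorded as a case.

```python
def apparent_fatality_rate(infections, true_death_risk, detected_share):
    """Deaths divided by detected cases.

    Assumes every death is counted as a case, while milder infections
    are counted only if testing catches them.
    """
    deaths = infections * true_death_risk
    detected_cases = infections * detected_share
    return deaths / detected_cases


# Hypothetical inputs: identical true risk, different testing intensity.
TRUE_DEATH_RISK = 0.01

routinely_tested = apparent_fatality_rate(100_000, TRUE_DEATH_RISK, detected_share=0.90)
rarely_tested = apparent_fatality_rate(100_000, TRUE_DEATH_RISK, detected_share=0.30)

print(f"Routinely tested group: {100 * routinely_tested:.2f} percent")
print(f"Rarely tested group: {100 * rarely_tested:.2f} percent")
print(f"Ratio: {rarely_tested / routinely_tested:.1f}")
```

In this made-up scenario, the rarely tested group appears three times as likely to die even though the underlying risk is identical, because fewer of its mild infections enter the denominator.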
Many of the paper’s shortcomings derive from the significant flaws in the CDC’s data set, which are emblematic of a weak point in the nation’s public health system. The federal government’s inability to gather timely, accurate information frequently hampered its response to the pandemic.
The CDC also publishes the journal in which the UAlbany paper appeared and could have identified weaknesses and inaccuracies during its editing and review process.
The Empire Center has been in contact with the paper’s lead author, Dr. Shao Lin, for reaction to some of these concerns.
Her initial comments noted, among other points, that the researchers had used a restricted-access version of the CDC data that might differ in some ways from what’s available to the public, and that their analysis had been checked by experienced statisticians and subject to peer review before publication.
Dr. Lin said she and her team would need more time to respond in full. This post will be updated as appropriate when that response becomes available.