Is SARS-CoV-2 viral load lower in young children than adults? Jones et al provide evidence that it is (in spite of their claims to the contrary).

Kevin McConway (The Open University) and David Spiegelhalter (University of Cambridge).**

Summary

The article ‘An analysis of SARS-CoV-2 viral load by patient age’ by Jones et al. claims that “viral loads in the very young do not differ significantly from those of adults”, and the authors “caution against an unlimited re-opening of schools and kindergartens in the present situation. Children may be as infectious as adults.” It has been widely reported as implying that viral loads in children are similar to those in adults, and yet the data in the article show children aged between 1 and 10 having on average 27% (conservative 95% interval 8% to 91%) of the viral load of adults aged over 20. We show how inappropriate statistical analysis led to the authors’ unjustified conclusions: essentially, in spite of initially finding a statistically significant difference between subgroups, they made it disappear by carrying out a large number of additional and uninteresting comparisons. We recommend that the error be acknowledged and the paper withdrawn from circulation.
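
The 27% figure and its interval are ratios of viral loads, whereas the analysis works with log10 viral loads. As a rough check, the sketch below takes only the three numbers quoted above and shows that they are symmetric on the log10 scale, which is what one would expect if the interval was formed as a difference in mean log10 viral load plus or minus a margin and then back-transformed. That construction is an assumption here; the function name `back_transform` is purely illustrative.

```python
import math

# Values quoted in the summary: ratio of child to adult viral load, with interval.
ratio, lower, upper = 0.27, 0.08, 0.91

# Move to the log10 scale, where the analysis is assumed to have been done.
log_ratio = math.log10(ratio)
log_lower = math.log10(lower)
log_upper = math.log10(upper)

# If the interval is "difference +/- margin" on the log10 scale,
# the point estimate should sit at the midpoint of the two limits.
midpoint = (log_lower + log_upper) / 2
half_width = (log_upper - log_lower) / 2


def back_transform(diff_log10, margin):
    """Turn a log10 difference and margin into a ratio and its interval."""
    return (10 ** diff_log10,
            10 ** (diff_log10 - margin),
            10 ** (diff_log10 + margin))


print(f"log10 ratio: {log_ratio:.3f}, midpoint of limits: {midpoint:.3f}")
print(f"half-width on log10 scale: {half_width:.3f}")
print("back-transformed:", back_transform(midpoint, half_width))
```

An interval that is symmetric on the log scale becomes strongly asymmetric on the ratio scale, which is why 8% to 91% is not centred on 27%.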

Figure 1. Log-10 viral loads in samples of different age-categories: 1 (ages 1–10), 2 (11–20), 3 (21–30), …, 10 (91–100). Figure A3 in Jones et al.
Figure 2. The same data as in Figure 1, arranged into alternative age-categories: Adult (ages 26–45), Grade School (GS, 7–11), High School (HS, 12–19), Kindergarten (KG, 0–6), Mature (over 45), University (Uni, 20–25). Figure A4 in Jones et al.
Figure 3. Some summary statistics for log10 transformed viral loads in different age categories. (We term this a ‘figure’, as it is a direct reproduction of Table 2 of Jones et al, including the excessive number of decimal places.)
Table 1. Statistical summaries of alternative comparisons
  • Problem 2. They assumed age as a categorical variable, rather than continuous or ordered. The title of the paper concerns the relationship between viral load and age, and since age is a continuous variable available on all subjects, a standard statistical approach would be to fit some form of regression line. Held has done so using the summary statistics provided, and found a statistically significant linear relationship of viral load with age. An alternative would be to allow a non-linear relationship.
  • Instead, the authors treated the age-groups as non-ordered categories, as if they were say different countries, and used a nonparametric Kruskal-Wallis test to compare viral loads, finding P values of 0.008 and 0.010 for the two age-classifications, indicating that the age-groups do not all have the same viral load. This is reasonable, but the next step essentially, and inappropriately, over-rode that conclusion.
  • Problem 3. They carried out an unnecessary number of multiple comparisons. Having found a statistically significant difference between the groups, Jones et al went on to test a large number of ‘post-hoc’ hypotheses, meaning they were not pre-specified. For the age-categorisation into ten age-groups, they used a variety of methods to test every group against every other: a total of 45 comparisons, none of which were of particular interest, such as comparing people aged between 21 and 30 with those aged between 61 and 70. Since so many hypothesis tests are being carried out, it is well known that it is extremely difficult for any particular comparison to stand out, and it is perhaps remarkable that for one method (Dunn’s test) they did find one statistically significant difference between the Kindergarten (0–6) and the Mature (>45) groups. But they say that the “overwhelming conclusion from the three post hoc testing methods is that no significant differences in viral load exists between any subgroups in either categorization”. Taken together with the results of their Kruskal-Wallis tests, this conclusion is inappropriate; a fairer one would have been “We have found statistically significant evidence that average viral loads vary with age; however, our further investigation has not tied down precisely the nature of that association.”
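
To see why so many comparisons make it hard for any single one to stand out, the sketch below counts the pairwise comparisons for each age-classification and shows what a simple Bonferroni correction would do to the per-test threshold. Bonferroni is used here only as a familiar illustration; it is not claimed to be one of the methods Jones et al. applied, and the family-wise error calculation assumes independent tests, which pairwise comparisons are not.

```python
from math import comb


def n_pairwise(k: int) -> int:
    """Number of distinct pairs among k groups."""
    return comb(k, 2)


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance threshold after a Bonferroni correction for m tests."""
    return alpha / m


def familywise_error(alpha: float, m: int) -> float:
    """Chance of at least one false positive in m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


for label, k in [("ten age-groups", 10), ("six school-stage groups", 6)]:
    m = n_pairwise(k)
    print(f"{label}: {m} comparisons, "
          f"Bonferroni threshold {bonferroni_threshold(0.05, m):.4f}, "
          f"uncorrected family-wise error {familywise_error(0.05, m):.2f}")
```

With 45 comparisons, the corrected per-test threshold falls to roughly 0.001, so a difference that would be convincing as a single pre-specified comparison can easily fail to reach it.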

Technical appendix

As a check on the appropriateness of treating the sample means of the data from the 1–10 and 0–6 (Kindergarten) age groups as having approximately normal distributions, we estimated the distribution of the sample mean for these groups. Since we do not at present have access to the raw data, we read the numbers of cases off the histograms provided in Figures 1 and 2 and corrected the ‘data’ to have the correct mean and standard deviation. This provides a sufficiently close approximation to the raw data to check the adequacy of a normal assumption for the distribution of the sample mean, by creating 10,000 bootstrap samples and plotting the resulting sample means. The resulting distributions are very similar to normal (Figures 4 and 5).
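
The procedure just described can be sketched as follows. This is a minimal illustration, not the code used for Figures 4 and 5: the histogram counts, the target mean and the target standard deviation below are all hypothetical placeholders, and the helper names `rescale` and `bootstrap_means` are our own.

```python
import random
import statistics


def rescale(values, target_mean, target_sd):
    """Shift and scale values so they have exactly the target mean and sample SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [target_mean + (v - m) * target_sd / s for v in values]


def bootstrap_means(values, n_boot=10_000, seed=0):
    """Resample with replacement n_boot times and return each resample's mean."""
    rng = random.Random(seed)
    n = len(values)
    return [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)]


# HYPOTHETICAL histogram: bin midpoints (log10 viral load) mapped to counts.
bins = {3.0: 4, 4.0: 9, 5.0: 12, 6.0: 10, 7.0: 7, 8.0: 5, 9.0: 2}
pseudo_data = [mid for mid, count in bins.items() for _ in range(count)]

# HYPOTHETICAL published summary statistics to match.
pseudo_data = rescale(pseudo_data, target_mean=5.5, target_sd=1.6)

means = bootstrap_means(pseudo_data)

# Under a normal approximation the sample mean has SD close to s / sqrt(n).
se_theory = statistics.stdev(pseudo_data) / len(pseudo_data) ** 0.5
print(f"bootstrap mean of means: {statistics.fmean(means):.3f}")
print(f"bootstrap SD of means:   {statistics.stdev(means):.3f}")
print(f"s / sqrt(n):             {se_theory:.3f}")
```

Plotting a density estimate of `means` against a normal curve with the same mean and standard deviation gives the kind of comparison shown in the figures.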

Figure 4. Estimated probability density of the sample means for ages 1–10 (in black) compared to the corresponding normal distribution (in blue).
Figure 5. Estimated probability density of the sample means for ages 0–6 (in black) compared to the corresponding normal distribution (in blue).
Figure 6. Sample mean, and mean ± 2×SE, for log-transformed viral load for all C1 age classes.
Figure 7. Sample mean, and mean ± 2×SE, for log-transformed viral load for all C2 age classes.
Table 2. Statistical summaries of further comparisons
Figure 8. Sample mean, and mean ± 2×SE, for log-transformed viral load for certain age classes.
** Professor Sir David Spiegelhalter is Chairman of the Winton Centre for Risk and Evidence Communication and Fellow of Churchill College in the University of Cambridge, having recently retired from his position as Winton Professor of Public Understanding of Risk. He was President of the Royal Statistical Society from 2017 to 2018. Email: david@statslab.cam.ac.uk. Twitter: @d_spiegel.

