Is SARS-CoV-2 viral load lower in young children than adults? Jones et al provide evidence that it is (in spite of their claims to the contrary).
Kevin McConway (The Open University) and David Spiegelhalter (University of Cambridge).**
The article ‘An analysis of SARS-CoV-2 viral load by patient age’ by Jones et al. claims that “viral loads in the very young do not differ significantly from those of adults”, and the authors “caution against an unlimited re-opening of schools and kindergartens in the present situation. Children may be as infectious as adults.” It has been widely reported as implying that viral loads in children are similar to those in adults, and yet the data in the article show children between 1 and 10 having on average 27% (conservative 95% interval 8% to 91%) of the viral load of adults aged over 20. We show how inappropriate statistical analysis led to the authors’ unjustified conclusions: essentially, in spite of initially finding a statistically significant difference between subgroups, they made it disappear by doing so many additional and uninteresting comparisons. We recommend that the error is acknowledged and the paper is withdrawn from circulation.
What is the question, and what is the relevant data to help answer this question?
As the re-opening of schools is being considered or taking place, the ability of children to infect others becomes a crucial issue. Since infectivity may be linked to the SARS-CoV-2 viral load carried by infected children, a question of interest is:
Does the viral load of young children differ from that of adults?
A primary reference on this issue is the pre-print by Jones et al. (a slightly revised version is also available), whose abstract claims that “viral loads in the very young do not differ significantly from those of adults”; the authors also “caution against an unlimited re-opening of schools and kindergartens in the present situation. Children may be as infectious as adults”. Since being placed on the website of the German research network on zoonotic infectious diseases, it has attracted extensive coverage, for example articles in Nature, academic articles, and media stories claiming “Covid patients found to have similar virus levels across ages”, “children and adults who are infected have similar viral loads”, and “children as infectious as adults”. Media reports, of course, often exaggerate scientific claims and downplay uncertainty, but the statistical analysis has also been criticised by statisticians Held and Liebl; in particular, our argument closely follows a pre-print by Jörg Stoye and a re-analysis by Curtis.
If we are interested in answering the question above, we can use the data summaries provided by Jones et al and reproduced in the Figures below.
The question posed above requires a definition of ‘young children’ and ‘adult’, which must be made independently of the data. Defining ‘adult’ as ‘aged > 20’ seems uncontroversial. However, ‘young children’ may depend on the context, and we look at both ages 0–6 (Kindergarten, KG) and ages 1–10.
We can easily calculate that the ‘adult’ group comprises 3,585 samples, with mean 5.21, standard deviation 1.91 and standard error 0.032. We first use Category C1 to compare viral loads in ages 1–10 with those in the over 20s. The observed difference between groups is -0.57 (corresponding to a viral-load ratio of 10^(–0.57) = 0.27), and with standard error 0.26, as shown in Table 1. The statistical challenge is to obtain a reasonable confidence interval for this difference.
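The arithmetic in this paragraph can be checked in a few lines. This is a minimal sketch using only the summary figures quoted above; the variable names are ours.

```python
import math

# Summary statistics for the 'adult' (aged > 20) group, as quoted in the text.
n_adult = 3585
sd_adult = 1.91

# Standard error of a sample mean: SD / sqrt(n).
se_adult = sd_adult / math.sqrt(n_adult)

# Observed difference in mean log10 viral load (ages 1-10 minus adults).
diff_log10 = -0.57

# A difference on the log10 scale corresponds to a ratio on the original scale.
ratio = 10 ** diff_log10

print(f"Standard error of adult mean: {se_adult:.3f}")  # 0.032
print(f"Viral-load ratio:             {ratio:.2f}")     # 0.27
```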
A naïve confidence interval would use the Central Limit Theorem to argue that the difference between two sample means has an approximately normal distribution, even though the raw data are clearly non-normal. This leads to a naïve confidence interval of +/- 1.96 standard errors, or -1.09 to -0.055. But we should take into account that the standard errors have been estimated, and the Welch-Satterthwaite t-distribution gives a slightly wider interval, as shown in Table 1.
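The two intervals can be sketched as follows. Several inputs here are assumptions rather than figures taken from Table 1: the size of the 1–10 group is taken to be 49 (one of the example sample sizes mentioned in this article), the standard error for that group is back-calculated from the rounded values 0.26 and 0.032, and the t quantile uses a Cornish-Fisher series approximation rather than an exact routine. Because the inputs are rounded, the endpoints may differ slightly in the second decimal place from those quoted.

```python
import math
from statistics import NormalDist

diff = -0.57       # difference in mean log10 viral load (from the text)
se_diff = 0.26     # standard error of the difference (from the text)
se_adult = 0.032   # standard error of the adult mean (from the text)
n_adult = 3585
n_child = 49       # ASSUMPTION: size of the 1-10 group

# Back-calculate the child-group SE from se_diff^2 = se_child^2 + se_adult^2.
se_child = math.sqrt(se_diff**2 - se_adult**2)

# Naive interval: normal quantile times the standard error.
z = NormalDist().inv_cdf(0.975)
naive = (diff - z * se_diff, diff + z * se_diff)

# Welch-Satterthwaite approximate degrees of freedom.
df = se_diff**4 / (se_child**4 / (n_child - 1) + se_adult**4 / (n_adult - 1))

def t_quantile_approx(z, df):
    """Cornish-Fisher series for a Student-t quantile (approximation only)."""
    return (z
            + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))

t_crit = t_quantile_approx(z, df)
welch = (diff - t_crit * se_diff, diff + t_crit * se_diff)

print(f"Naive 95% interval: {naive[0]:.3f} to {naive[1]:.3f}")
print(f"Welch df: {df:.1f}, critical value: {t_crit:.3f}")
print(f"Welch 95% interval: {welch[0]:.3f} to {welch[1]:.3f}")
print(f"As ratios: {10**welch[0]:.2f} to {10**welch[1]:.2f}")
```

The degrees of freedom are dominated by the much smaller child group, which is why the Welch interval is only slightly wider than the naive one.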
However this procedure is formally based on an assumption of normally distributed data, which is clearly not the case here. Fortunately the robustness of the Welch t-distribution to non-normality has been extensively studied, and is generally found to be good except when there is severe skewness in the data. The key point is that this is a comparison of means, and means from even relatively small samples (e.g. of size 49, or 37) have a distribution that is closely approximated by a normal distribution even for quite markedly non-normal distributions in the raw data, as pointed out by Held. An approximate bootstrap analysis described in the Technical Appendix supports this.
If we are interested in the ‘very young’, an alternative is to compare Kindergarten children (aged under 7) with ‘adults’ aged over 20. This gives broadly similar results, and the difference in average viral loads is even larger than for those aged 1–10.
The conclusion seems straightforward: using either categorisation there is a practically and statistically significant smaller viral load in young children than adults.
So how did Jones et al end up claiming there was no evidence for a difference?
We see three main problems in their analysis:
- Problem 1. Their analysis was not aimed at a primary question of interest. Despite the clear prior interest in how viral loads in children compared with those of adults, the authors did not pick out that comparison specifically, but treated all comparisons between all the age groups in each of their two classifications on an equal footing.
- Problem 2. They assumed age as a categorical variable, rather than continuous or ordered. The title of the paper concerns the relationship between viral load and age, and since age is a continuous variable available on all subjects, a standard statistical approach would be to fit some form of regression line. Held has done so using the summary statistics provided, and found a statistically significant linear relationship of viral load with age. An alternative would be to allow a non-linear relationship.
- Instead, the authors treated the age-groups as non-ordered categories, as if they were say different countries, and used a nonparametric Kruskal-Wallis test to compare viral loads, finding P values of 0.008 and 0.010 for the two age-classifications, indicating that the age-groups do not all have the same viral load. This is reasonable, but the next step essentially, and inappropriately, over-rode that conclusion.
- Problem 3. They carried out an unnecessary number of multiple comparisons. Having found a statistically significant difference between the groups, Jones et al went on to test a large number of ‘post-hoc’ hypotheses, meaning they were not pre-specified. For the age-categorisation into ten age-groups, they used a variety of methods to test every group against every other: a total of 45 comparisons, none of which were of particular interest, such as comparing people aged between 21 and 30 with those aged between 61 and 70. Since so many hypothesis tests are being carried out, it is well-known that it is extremely difficult for any particular comparison to stand out, and it is perhaps remarkable that for one method (Dunn’s test) they did find one statistically significant difference between the Kindergarten (0–6) and the Mature (>45) groups. But they say that the “overwhelming conclusion from the three post hoc testing methods is that no significant differences in viral load exists between any subgroups in either categorization”. Taken together with the results of their Kruskal-Wallis tests, this conclusion is inappropriate; a fairer one would have been “We have found statistically significant evidence that average viral loads vary with age; however, our further investigation has not tied down precisely the nature of that association.”
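To illustrate why 45 comparisons make it so hard for any single one to stand out, the sketch below uses a Bonferroni correction. This is our stand-in for illustration only: it is not necessarily one of the post hoc methods Jones et al. used, and the p-value is the naive normal-approximation one based on the rounded difference (-0.57) and standard error (0.26) quoted earlier.

```python
import math
from statistics import NormalDist

# Ten age groups, every group tested against every other.
n_groups = 10
n_comparisons = math.comb(n_groups, 2)   # 45 pairwise comparisons

# Bonferroni: split the overall 5% error rate across all comparisons.
alpha = 0.05
alpha_per_test = alpha / n_comparisons

# A single pre-specified comparison (ages 1-10 versus adults over 20).
z = -0.57 / 0.26
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(f"Pairwise comparisons:        {n_comparisons}")
print(f"Per-comparison threshold:    {alpha_per_test:.4f}")
print(f"p-value, single comparison:  {p_two_sided:.3f}")
print(f"Significant if pre-specified: {p_two_sided < alpha}")
print(f"Significant as one of 45:     {p_two_sided < alpha_per_test}")
```

The same observed difference clears the conventional 5% threshold when it is the one question asked in advance, but not when it is treated as just one of 45 interchangeable comparisons.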
They also disregarded parametric methods in favour of non-parametric ones although, as we argue above, a parametric approach would be reasonable for these data. But this would have made little difference to the conclusions.
We have shown that an externally-set question of primary public interest, whether the viral loads in young children are lower than those in adults, can be confidently answered in the affirmative, at least for the set of samples provided in Jones et al. Essentially their inappropriate statistical analysis meant that, in spite of initially finding a statistically significant difference between subgroups, they made it disappear by doing so many additional and uninteresting comparisons.
We must be cautious, however, in generalising from this particular group to the general population and influencing any policy, as we cannot assume that the patients who provided the data are by any means a representative sample of the general population of Berlin, or Germany, or anywhere else — and the authors of the paper make no claim that they do. They provided the data because they were tested, and they were tested for reasons, in part at least, to do with the symptoms of COVID-19 that they were showing. Selecting cases on the basis of symptoms can make observational data appear to show associations that do not really exist, and are certainly not causal, and also to fail to show associations that really do exist and may be causal.
We suggest the pre-print should be withdrawn from the website, and the inappropriate analysis acknowledged.
As a check on the appropriateness of treating the sample mean of the data from the 1–10 and 0–6 (Kindergarten) age groups as having approximately normal distributions, we estimated the distribution of the sample mean for these groups. Since we do not at present have access to the raw data, we read the numbers of cases off the histograms provided in Figure 1 and corrected the ‘data’ to have the correct mean and standard deviation. This provides a sufficiently close approximation to the raw data to check the adequacy of a normal assumption for the distribution for the sample mean, by creating 10,000 bootstrap samples and plotting the resulting sample means. The resulting distributions are very similar to normal (Figures 4 and 5).
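The procedure can be sketched as below. The histogram counts are hypothetical placeholders, not the values read from Jones et al.; the target mean (4.64) and standard deviation (1.81) for the 1–10 group are our back-calculations from the rounded figures quoted earlier, assuming a group size of 49. Instead of plotting, the sketch compares the skewness of the raw values with that of the bootstrap means.

```python
import random
import statistics

random.seed(1)

# HYPOTHETICAL histogram: log10 viral load bin -> number of cases (total 49).
hypothetical_counts = {3: 12, 4: 14, 5: 9, 6: 6, 7: 4, 8: 2, 9: 1, 10: 1}
raw = [float(v) for v, c in hypothetical_counts.items() for _ in range(c)]

# ASSUMED targets, back-calculated from rounded summary figures.
target_mean, target_sd = 4.64, 1.81

def rescale(values, mean, sd):
    """Shift and scale values so they have exactly the given mean and SD."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return [mean + sd * (v - m) / s for v in values]

def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

data = rescale(raw, target_mean, target_sd)
n = len(data)

# 10,000 bootstrap samples: resample with replacement, record each mean.
boot_means = [statistics.fmean(random.choices(data, k=n))
              for _ in range(10_000)]

print(f"Skewness of raw values:      {skewness(data):.2f}")
print(f"Skewness of bootstrap means: {skewness(boot_means):.2f}")
print(f"SD of bootstrap means:       {statistics.stdev(boot_means):.3f}")
```

Even with markedly skewed raw values, the skewness of the resampled means should be far closer to zero, which is the pattern that a histogram of the bootstrap means would show.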
As a final check on the appropriateness of the confidence intervals in Table 1, we used a bootstrap method, based again on an approximation to the original data from Figure 1, but not making any specific assumptions about the shapes of the distributions involved. This resulted in a 95% confidence interval for the difference in means (on the log scale) comparing age 1–10 with adults (>20) from -1.11 to -0.054. For the comparison between age 0–6 and adults, the corresponding interval runs from -1.36 to -0.32. Neither is importantly different from the version in Table 1. We are satisfied that the comparisons shown in Figure 1 are appropriate and do not distort the message of the data from Jones et al.
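A percentile bootstrap for the difference in means can be sketched as follows. It is an illustration of the general technique, not a reproduction of our analysis: both samples are synthetic, right-skewed stand-ins rescaled to assumed summary statistics (49 children with mean 4.64 and SD 1.81, back-calculated from rounded figures; 3,585 adults with mean 5.21 and SD 1.91), and the number of resamples is chosen arbitrarily to keep the run time short.

```python
import random
import statistics

rng = random.Random(2020)

def make_sample(n, mean, sd):
    """Draw n right-skewed values, then rescale to the given mean and SD."""
    raw = [rng.gammavariate(2.0, 1.0) for _ in range(n)]
    m, s = statistics.fmean(raw), statistics.stdev(raw)
    return [mean + sd * (x - m) / s for x in raw]

# SYNTHETIC stand-ins for the log10 viral loads (not the real data).
children = make_sample(49, 4.64, 1.81)
adults = make_sample(3585, 5.21, 1.91)

def bootstrap_diff_ci(a, b, n_boot=2000, level=0.95):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    diffs = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        mean_a = statistics.fmean(rng.choices(a, k=len(a)))
        mean_b = statistics.fmean(rng.choices(b, k=len(b)))
        diffs.append(mean_a - mean_b)
    diffs.sort()
    tail = (1 - level) / 2
    lower = diffs[round(tail * n_boot)]
    upper = diffs[round((1 - tail) * n_boot) - 1]
    return lower, upper

lower, upper = bootstrap_diff_ci(children, adults)
print(f"Bootstrap 95% interval for the difference: {lower:.2f} to {upper:.2f}")
```

No distributional shape is assumed at any point; the interval comes directly from the spread of the resampled differences.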
Figures 6 and 7 show the sample means of the log transformed viral loads for each of the two categorizations, together with approximate confidence intervals (calculated as mean ± 2 × SE).
Because all the groups have been considered separately in Figures 6 and 7, it is almost impossible to use them to evaluate the question of primary interest about viral loads in children compared to adults. The diagrams also make it clear, because of the differences in the sizes of the confidence intervals, how much less informative the data for the children (age groups 1–10, 11–20, KG, and particularly GS) are compared to the data for adult age groups (except 91–100). This is simply because of the much smaller numbers of samples in the child age groups compared to the adult age groups. In particular, there were only 16 children in the GS (age 7–11) group, so the average viral load for that age group was estimated only very imprecisely.
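The effect of sample size on interval width can be made concrete with the mean ± 2 × SE rule. In this sketch the group of 16 is given the same mean and standard deviation as the adult group purely for illustration; the actual GS summary statistics are not used here.

```python
import math

def approx_ci(mean, sd, n):
    """Approximate interval: mean plus or minus 2 standard errors."""
    se = sd / math.sqrt(n)
    return mean - 2 * se, mean + 2 * se

# Adult group summary statistics, as quoted earlier in the article.
adult = approx_ci(5.21, 1.91, 3585)

# ASSUMPTION: a group of 16 with the same mean and SD, for comparison only.
small = approx_ci(5.21, 1.91, 16)

adult_width = adult[1] - adult[0]
small_width = small[1] - small[0]

print(f"n = 3585: {adult[0]:.2f} to {adult[1]:.2f} (width {adult_width:.2f})")
print(f"n = 16:   {small[0]:.2f} to {small[1]:.2f} (width {small_width:.2f})")
print(f"Width ratio: {small_width / adult_width:.1f}")
```

With the same spread in the data, the interval width scales with 1/√n, so a group of 16 gives an interval roughly fifteen times wider than a group of 3,585.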
Although we are content that our two comparisons between young children and adults, shown in Table 1, deal appropriately with the primary question of interest, we also performed a sensitivity analysis by exploring alternative definitions of ‘young people’, which are shown in Table 2. (Note that the summary statistics in Jones et al. indicate that there were no samples from children aged 0, or people aged 20.) In each case, there is evidence that the average viral load in children is less than that in adults.
A plot showing the sample means and approximate confidence intervals for the age groups compared in Table 2 is given in Figure 8. Note that, in this Figure, the individuals in the four young age classes overlap, so that it is not helpful to compare those four classes with one another on the basis of the diagram.
- ** Professor Kevin McConway is Emeritus Professor of Applied Statistics at the Open University. He was a Vice-President of the Royal Statistical Society from 2012 to 2015, and is currently a trustee and member of the Advisory Committee of the Science Media Centre. Email firstname.lastname@example.org Twitter @kjm2
- ** Professor Sir David Spiegelhalter is Chairman of the Winton Centre for Risk and Evidence Communication and Fellow of Churchill College in the University of Cambridge, having recently retired from his position as Winton Professor of Public Understanding of Risk. He was President of the Royal Statistical Society from 2017 to 2018. Email email@example.com Twitter @d_spiegel