Friday, 10 April 2020

Campaign on a Representative Sample of the Population

How to Better Evaluate the Spread of SARS-CoV-2




Coordination: Josselin Garnier[1]

 

Summary
A virus-screening test campaign is proposed, based on random, unbiased samples representative of the general population, in order to significantly improve our understanding of the present and future epidemic situation through better-calibrated mathematical models of infectious disease.

Assessment
Various mathematical models have been proposed to predict the progression of the Covid-19 epidemic at the national scale. Most of these models describe the temporal evolution of the epidemic in terms of a population divided into compartments associated with the different possible disease statuses: susceptible, infected, or recovered. These models can take stratification into account, for example by age or by region. The laws governing the evolution of the distribution are written as coupled differential equations, which represent the mechanisms and phenomena at work and are deduced from epidemiological data. These models are generally calibrated, meaning that the free parameters (for example, the infection rates) are adjusted so as to reproduce the available data (detected cases and deaths). The models are then used to make predictions, for example of the date of the infection peak, or of the impact of policy measures such as confinement or physical distancing (see, among other publications, those of Imperial College[2] or of the Université de Bordeaux[3]).
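To make the compartmental description concrete, here is a minimal sketch of the simplest such model (SIR, with susceptible, infected, and recovered compartments) in Python; the infection and recovery rates are illustrative placeholders, not values calibrated to Covid-19 data.

```python
# Minimal SIR sketch: three compartments (susceptible, infected,
# recovered) coupled by ordinary differential equations. The rates
# beta and gamma are illustrative, not calibrated estimates.
import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, gamma):
    s, i, r = y
    return [-beta * s * i,             # susceptibles become infected
            beta * s * i - gamma * i,  # infected recover at rate gamma
            gamma * i]

t = np.linspace(0, 180, 181)           # days
y0 = [0.999, 0.001, 0.0]               # initial fractions of the population
beta, gamma = 0.3, 0.1                 # hypothetical rates (R0 = beta/gamma = 3)
s, i, r = odeint(sir, y0, t, args=(beta, gamma)).T
print(f"peak infection: day {t[i.argmax()]:.0f}, {i.max():.1%} of the population")
```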

Statistical studies reveal, however, that the predictions of such models can be very unreliable. Indeed, different sets of free parameters can often be found that are all compatible with the available data, but which lead to very different predictions. Even with very simple models (for example, one national compartment for each category), the uncertainties are very large, implying that epidemiological predictions should be used with extreme caution. Uncertainty quantification procedures (known and used for assessing the quality and robustness of large numerical simulation codes) make it possible to highlight these prediction uncertainties. They also make it possible to identify, through sensitivity analysis, the critical model parameters: those to which the models are sensitive, but which cannot be extracted from the available data. For example, the proportion of asymptomatic carriers is not measured even though it is a critical parameter.[4]
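This identifiability problem can be illustrated with the toy SIR model above: the early exponential growth rate of cases is approximately beta - gamma, so several (beta, gamma) pairs reproduce the same early case counts while predicting very different peaks. The sketch below, with purely illustrative values, makes this explicit.

```python
# Toy illustration of non-identifiability: all (beta, gamma) pairs
# below share the same early growth rate beta - gamma, hence fit the
# same early case data, yet their long-term predictions diverge widely.
import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, gamma):
    s, i, r = y
    return [-beta * s * i, beta * s * i - gamma * i, gamma * i]

t = np.linspace(0, 300, 301)
y0 = [0.999, 0.001, 0.0]
growth = 0.2                              # common early growth rate (per day)
for gamma in (0.05, 0.1, 0.2, 0.4):       # hypothetical recovery rates
    sol = odeint(sir, y0, t, args=(gamma + growth, gamma))
    i = sol[:, 1]
    print(f"gamma={gamma:.2f}: peak on day {t[i.argmax()]:.0f}, "
          f"peak size {i.max():.1%}, final attack rate {1 - sol[-1, 0]:.1%}")
```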

Objectives
In order to build mathematical models capable of more robust predictions, it is necessary to obtain information about the critical model input parameters. Using sensitivity analysis, many parameters can be set to their most likely values, allowing us to focus on the most important unknowns. Some of them, such as the proportion of asymptomatic carriers, can best be estimated by means of a specific test campaign, carried out on a random and unbiased sample of the population. These data would complement those already available (detected cases and deaths), allowing us to improve the robustness of the models and to strengthen their predictive capability. Uncertainty could be significantly reduced by better estimating the proportion of asymptomatic carriers, by integrating a spatial dimension, and by determining the immunity already acquired.

Feasibility
Such test campaigns can be organized immediately. They are already being carried out in three departments of the Ile-de-France region. It is not necessary to test the whole population (which would be ideal but is impractical at present), nor to make full individual diagnoses of a small segment of the population. Instead, our strategy is to obtain statistical information on the current state of susceptibility of the population. Two types of tests – one detecting the viral load, the other detecting antibodies – provide two different pieces of information, which can be integrated into the models to refine the predictions. The information from tests carried out at different times can also be integrated, because the models describe the temporal evolution of the epidemic.

Reliability
The reliability of tests is defined by their sensitivity and their specificity: sensitivity measures the occurrence of false negatives, and specificity that of false positives. If these are well known, this information can be used to improve the statistical information on the state of the whole population. The impact of test reliability is different in the case of statistical sampling than in the case of individual diagnoses, because the sensitivity and specificity can be integrated into the statistical processing and use of the data. In addition, if the sensitivity and specificity are under control, pooled tests can be performed (a technique known as group testing), reducing the number of tests that must be performed compared to the number of samples.
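As an illustration of how known test characteristics enter the statistical processing, here is a minimal sketch of the classical Rogan-Gladen correction, which recovers the true prevalence from the raw positive rate; the function name and the numerical values are illustrative, not taken from the proposal.

```python
# Minimal sketch: correcting a raw positive rate for known test
# sensitivity and specificity (the classical Rogan-Gladen estimator).
def corrected_prevalence(raw_rate, sensitivity, specificity):
    """Invert E[raw rate] = sensitivity*p + (1-specificity)*(1-p) for p."""
    p = (raw_rate + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(max(p, 0.0), 1.0)  # clip to the admissible range [0, 1]

# Example: 3% of samples test positive with a test of 90% sensitivity
# and 98% specificity; the corrected prevalence is about 1.1%.
print(corrected_prevalence(0.03, sensitivity=0.90, specificity=0.98))
```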

Implementation
The implementation of such a test campaign requires:
1) Performing viral-load tests and, if possible, simultaneous serological tests; tests with very high specificity should be favored. It is important to have sufficient information on the patient cohorts used to assess the specificity and sensitivity properties.
2) Recruiting researchers capable of constructing the random cohorts from which the samples will be taken (depending on the number and reliability of the tests available) and of processing the data.[5]
3) Setting up specific procedures for collecting consent forms, administering a medical questionnaire, and taking samples. Ideally, the samples should be random and the results anonymized, but with spatial (at the scale of the municipality) and temporal (at the scale of the sampling date) tags, so that they can be taken into account sequentially in the models.


This proposal is made by a group of mathematicians from the Saclay plateau (ENS Paris Saclay, Inria Saclay and Polytechnique).













[1] Ecole polytechnique, Centre Cournot, LabEx Hadamard
[2] Imperial College COVID-19 Response Team, Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries, 30 March 2020.
[3] Magal P., Webb G., Predicting the number of reported and unreported cases for the COVID-19 epidemic in South Korea, Italy, France and Germany, 2020, https://doi.org/10.1101/2020.03.21.20040154
[4] A Bayesian estimation of the parameters of the models proposed in the literature shows that the posterior distribution of this parameter, given the current data, remains almost equal to its prior distribution. In other words, the parameter cannot be estimated from these data, although the predictions (of the intensity of the peak, for example) depend strongly on it. Other types of data are therefore needed to estimate it.
[5] The basic idea is to draw representative samples from the general population. To illustrate, in the simplest framework, if we expect a rate of positive test results of the order of p in a stratum, then we must test a sample of size N = 1/(p a²) to estimate p with relative precision a (for example, if p = 1% and a = 20%, then N = 2500). We can also draw fewer individuals in certain strata or sub-strata where more positive tests are expected, in order to decrease the variance of the estimates (this is a classic method in stratified sampling). We can also test mixtures of samples, which increases the number of samples covered while keeping the number of tests performed significantly lower when p is low (this is also fairly standard, known as group testing).
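The two calculations sketched in this footnote can be checked numerically. The following Python lines use illustrative values and assume the simple Dorfman pooling scheme (one pooled test per group, followed by individual retests of positive pools); they are a sketch, not part of the proposal's protocol.

```python
# Numerical check of the footnote, with illustrative values.
import math

def sample_size(p, a):
    """Sample size N = 1/(p*a^2) to estimate prevalence p with relative precision a."""
    return math.ceil(1.0 / (p * a**2))

def tests_per_sample(p, k):
    """Expected tests per sample under Dorfman pooling with pools of size k:
    one pooled test per k samples, plus k individual retests if the pool is positive."""
    return 1.0 / k + 1.0 - (1.0 - p)**k

print(sample_size(0.01, 0.20))   # 2500, as in the example above
best_k = min(range(2, 101), key=lambda k: tests_per_sample(0.01, k))
print(best_k, round(tests_per_sample(0.01, best_k), 3))  # pool size 11, ~0.2 tests/sample
```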
