The objective of this paper is to apply data-driven discovery of dynamics to obtain a system of differential equations that describes the transmission dynamics of Covid-19 from the numbers of confirmed cases and deaths reported daily. The methodology was applied to four countries: Brazil, Colombia, Venezuela, and the United States. Its main advantage is that a single system of differential equations is enough to characterize the dynamics of Covid-19, without any prior mathematical assumption.
Academic Editor: Qianqian Song, Wake Forest School of Medicine, Wake Forest Baptist Comprehensive Cancer Center, Medical Center Boulevard, Winston-Salem, NC 27157.
Checked for plagiarism: Yes
Review by: Single-blind
Copyright © 2021 Raúl Isea
The authors have declared that no competing interests exist.
Considerable effort has gone into explaining the transmission dynamics of Covid-19 with mathematical models since it was declared a pandemic in March 2020 [1]. For example, a Google Scholar search with the keywords "Mathematics + Covid-19" returned 17,900 results dated from 2020 onward, as of November 2021. This indicates the great diversity of results obtained in this important field of work.
Most of these papers are devoted to describing the outbreak in a particular part of the world. For example, Isea described the dynamics in Venezuela [2], Tang et al. in Brazil [3], and so on. For that reason, it is necessary to develop a methodology that describes the Covid-19 epidemic directly from the data and, above all, with a single mathematical model.
In the last decade, computational methodologies have been developed to obtain the nonlinear differential equations that govern a dynamical system. One such technique is data-driven discovery of dynamics [4, 5, 6, 7], which is based on Sparse Identification of Nonlinear Dynamics (SINDy). It is usually implemented in Python [8] or Mathematica [9].
The SINDy methodology has already been applied to Covid-19 in the scientific literature (see, for example, [10]). Unlike those publications, we obtain a polynomial differential equation based on the confirmed cases and deaths reported daily, as described in the next section.
Data-driven discovery of equations is a computational methodology that applies techniques from Data Science, Machine Learning, and Artificial Intelligence, as shown by Brunton et al. [7]. The methodology is displayed in Figure 1, where only polynomial functions are considered.
As can be seen in Figure 1, the first step is to build a matrix whose columns are the time-dependent input data, i.e., the numbers of confirmed cases (I) and deaths (D) reported daily. The next step is to build a library of candidate nonlinear functions based on polynomials, as indicated in the figure, where the degree of the polynomial is denoted by U. For example, for U = 2 the library is [1, I, D, I^2, D^2, ID], where 1 represents a constant term.
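As an illustration of the library step, the following minimal Python sketch builds the U = 2 library row by row, one row per time sample. The function name `polynomial_library_u2` and the toy values are hypothetical and are not taken from this work; plain Python lists are used only to keep the sketch free of dependencies.

```python
def polynomial_library_u2(I, D):
    """Build the degree-2 candidate library, one row per time sample.

    Column order follows the text: [1, I, D, I^2, D^2, I*D].
    """
    if len(I) != len(D):
        raise ValueError("I and D must have the same number of samples")
    return [[1.0, i, d, i * i, d * d, i * d] for i, d in zip(I, D)]


# Hypothetical toy values, not data from the paper.
I_toy = [1.0, 2.0, 3.0]
D_toy = [0.5, 1.0, 2.0]
theta = polynomial_library_u2(I_toy, D_toy)

for row in theta:
    print(row)
```

Each row of `theta` corresponds to one day, and each column to one candidate term of the differential equations.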
The dynamics are described by the following equation (the dot over X denotes the derivative with respect to time), and the sparse coefficient vectors are [x1, x2], which correspond to [I, D], respectively (according to Brunton's methodology) [7].
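For reference, the relation described here is usually written in the SINDy literature in the form below. The symbols Θ (the library matrix evaluated on the data) and Ξ (the matrix that collects the sparse coefficient vectors) are the conventional notation and are assumed here; they do not appear in the text.

$$
\dot{X} = \Theta(X)\,\Xi, \qquad X = \begin{bmatrix} I & D \end{bmatrix}, \qquad \Xi = \begin{bmatrix} x_1 & x_2 \end{bmatrix}
$$

Under this convention each column of Ξ holds the coefficients of one equation, and most of its entries are expected to be zero.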
The third step is an optimization process in which the parameters are calculated with the Least Absolute Shrinkage and Selection Operator (LASSO) [14], i.e., least-squares regression with an L1-norm penalty on the coefficients. In future papers, other methods such as the Scaled Sequential Threshold Least Squares (S2TLS) algorithm [15] will be implemented to compare results. This step is the most important of all: the degree (U) of the library is selected automatically by the program so as to minimize the error in the optimization step.
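A minimal sketch of the LASSO step is given below, assuming the common objective (1/(2n)) ||y - A w||^2 + alpha ||w||_1 solved by cyclic coordinate descent. Here A plays the role of the library matrix and y the time derivative of one state variable. The solver, the scaling of the objective, the penalty weight `alpha`, and the toy data are all assumptions made for illustration, not the implementation used in this work.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal map of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(A, y, alpha, n_iter=500):
    """Minimize (1/(2n)) * ||y - A w||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(A), len(A[0])
    w = [0.0] * p
    residual = list(y)  # equals y - A w, since w starts at zero
    # Mean squared value of each column of A.
    col_ms = [sum(A[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ms[j] == 0.0:
                continue  # an all-zero column cannot explain anything
            # Correlation of column j with the residual, with its own contribution added back.
            rho = sum(A[i][j] * (residual[i] + A[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_ms[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= A[i][j] * delta
                w[j] = new_wj
    return w


# Hypothetical toy problem: y depends only on the first column (true weight 3).
A_toy = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y_toy = [3.0, 3.0, -3.0, -3.0]
w_toy = lasso_coordinate_descent(A_toy, y_toy, alpha=0.5)
print(w_toy)
```

In this toy problem the columns of A are orthogonal, so the effect of the penalty is easy to read: the active coefficient is shrunk from 3 to 2.5 and the inactive one is set exactly to zero, which is the sparsity that SINDy relies on.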
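Finally, the last step is to obtain the differential equations. For U = 2, these are
$$
\frac{dI}{dt} = a_1 + a_2 I + a_3 D + a_4 I^2 + a_5 D^2 + a_6 I D, \qquad
\frac{dD}{dt} = b_1 + b_2 I + b_3 D + b_4 I^2 + b_5 D^2 + b_6 I D, \tag{1}
$$
where a_i and b_i (i = 1, ..., 6) are the constant coefficients to be calculated for each country.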
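Once the coefficients are known, a degree-2 system of this kind can be integrated numerically. The sketch below assumes that the six coefficients of each equation follow the library order [1, I, D, I^2, D^2, ID] and uses the classical fourth-order Runge-Kutta scheme. The function names, the choice of integrator, and the toy coefficients are assumptions for illustration and are not values from Table 1.

```python
def rhs_u2(state, a, b):
    """Right-hand side of the degree-2 system: returns (dI/dt, dD/dt)."""
    I, D = state
    terms = [1.0, I, D, I * I, D * D, I * D]  # assumed library order
    dI = sum(c * t for c, t in zip(a, terms))
    dD = sum(c * t for c, t in zip(b, terms))
    return dI, dD


def rk4_integrate(a, b, state0, dt, n_steps):
    """Integrate the system with the classical fourth-order Runge-Kutta scheme."""
    I, D = state0
    trajectory = [(I, D)]
    for _ in range(n_steps):
        k1 = rhs_u2((I, D), a, b)
        k2 = rhs_u2((I + 0.5 * dt * k1[0], D + 0.5 * dt * k1[1]), a, b)
        k3 = rhs_u2((I + 0.5 * dt * k2[0], D + 0.5 * dt * k2[1]), a, b)
        k4 = rhs_u2((I + dt * k3[0], D + dt * k3[1]), a, b)
        I += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        D += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        trajectory.append((I, D))
    return trajectory


# Hypothetical coefficients chosen only to exercise the code:
# dI/dt = -0.1 * I  and  dD/dt = 0.05 * I
a_toy = [0.0, -0.1, 0.0, 0.0, 0.0, 0.0]
b_toy = [0.0, 0.05, 0.0, 0.0, 0.0, 0.0]
traj = rk4_integrate(a_toy, b_toy, state0=(1.0, 0.0), dt=0.1, n_steps=100)
print(traj[-1])
```

The toy system has a closed-form solution, I(t) = exp(-0.1 t), which makes it convenient for checking the integrator before using fitted coefficients.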
The data were obtained from the Johns Hopkins University portal, available at coronavirus.jhu.edu. Four countries were selected: Brazil, Colombia, Venezuela, and the United States. For each country, the daily numbers of contagions (I) and deaths (D) were retrieved from March 27, 2020 to June 14, 2021 (a total of 445 records).
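The record count can be checked directly from the two dates, counting both end points. The short Python snippet below is only an arithmetic check; the variable names are illustrative.

```python
from datetime import date

start = date(2020, 3, 27)
end = date(2021, 6, 14)

# Both end points are included in the series, hence the +1.
n_records = (end - start).days + 1
print(n_records)  # 445
```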
The next step was to normalize the data by subtracting the mean value and dividing by the standard deviation. The results are shown in Figure 2, where the data are represented by blue symbols and the results obtained by a dashed black line. The dispersion of the data for Brazil is worth noting.
Figure 2. Daily case records versus time (t) in the United States, Brazil, Colombia, and Venezuela, shown as blue points; the dashed line shows the results obtained after normalizing the data.
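The normalization described above is the usual z-score. A minimal Python sketch follows; the function name `standardize` and the toy series are hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(v - mu) / sigma for v in values]


# Hypothetical toy series (mean 5, population standard deviation 2).
daily_cases_toy = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(daily_cases_toy)
print(z)
```

After this transformation the series has zero mean and unit standard deviation, so the coefficients fitted for different countries refer to comparable scales.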
The next step was to calculate the parameters from the normalized data, following the methodology described in Figure 1, with a library of candidate nonlinear functions based on polynomials. The coefficients obtained for each country are shown in Table 1. The degree of the differential equations obtained was three for all countries (U = 3, error less than 0.001). It is also interesting to note how different the parameters in this table are, because each result depends on the country's response measures to Covid-19.
Table 1. Coefficients obtained for the system of differential equations (1) for Brazil (BRA), the United States (USA), Venezuela (VEN), and Colombia (COL), divided into two sections corresponding to dI/dt and dD/dt, respectively.
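For a polynomial library of degree U in the two variables I and D, the number of candidate terms per equation is (U+1)(U+2)/2, i.e., 10 terms for U = 3. The sketch below enumerates them. The function names are hypothetical, and the ordering by total degree is an assumption that differs slightly from the U = 2 ordering quoted in the text.

```python
def monomial_exponents(max_degree):
    """List exponent pairs (p, q) of the monomials I**p * D**q with p + q <= max_degree."""
    return [(total - q, q)
            for total in range(max_degree + 1)
            for q in range(total + 1)]


def evaluate_library_row(i_value, d_value, exponents):
    """Evaluate every monomial of the library at one sample (I, D)."""
    return [i_value ** p * d_value ** q for p, q in exponents]


exponents_u3 = monomial_exponents(3)
print(len(exponents_u3))  # 10
print(exponents_u3)

# One library row at a hypothetical sample I = 2, D = 3.
row_u3 = evaluate_library_row(2.0, 3.0, exponents_u3)
```

With U = 3, each of the two equations therefore has up to 10 coefficients, of which the sparse regression is expected to keep only a few.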
Finally, Figure 3 depicts the results obtained for (A) the United States and (B) Venezuela using the coefficients in Table 1 (shown with a red line). The results show that this methodology cannot make accurate predictions when the number of cases varies widely (see, for example, the US case), while the prediction for Venezuela reproduces the observed cases better.
Figure 3. Results obtained for (A) the United States and (B) Venezuela. Daily case records are shown as blue points, the result obtained with the SINDy methodology as a dashed black line (labeled SG), and the values obtained from the coefficients in Table 1 in red.
This paper proposes a system of polynomial differential equations that characterizes the transmission dynamics of Covid-19 in any country since the beginning of the pandemic. The main advantage of this methodology is that a single system of differential equations is enough to explain the dynamics of contagion by SARS-CoV-2. Further numerical calculations are still needed to generalize these conclusions.
I would like to thank Rafael Mayo-Garcia and Jesus Isea for their comments on this manuscript.
This paper is dedicated to the memory of Gloria Teresa Villegas, who died on October 21, 2021. Her husband, Raimundo Villegas, also died on October 21. Thank you for your friendship.