The goal of this paper is to analyze the registered cases of Covid-19 infection from throughout the world, using a digital forensic analysis technique based on Benford's Law. Twenty-three countries were randomly chosen for this analysis: China, India, Germany, Brazil, Venezuela, Netherlands, Italy, Colombia, Russia, Norway, South Africa, Portugal, Singapore, United Kingdom, Chile, Ecuador, Egypt, Denmark, Ireland, France, Belgium, Australia and Croatia. We calculate the p-values of the Pearson χ² test and the Mantissa Arc Test from the results obtained with the first digit. If a country fails both tests, a third test, the Freedman-Watson test, is carried out. The results indicate that the data from Italy, Portugal, Netherlands, United Kingdom, Denmark, Belgium and Chile are suspected of manipulation, because the numbers do not conform to Benford's Law according to the results obtained up to April 30, 2020. However, further studies in these countries are necessary to establish whether they manipulated or altered the information.
Academic Editor: Sasho Stoleski, Institute of Occupational Health of R. Macedonia, WHO CC and Ga2len CC, Macedonia
Checked for plagiarism: Yes
Review by: Single-blind
Copyright © 2020 Raul Isea
The authors have declared that no competing interests exist.
In December 2019, the first cases of a new coronavirus (2019-nCoV) responsible for atypical pneumonia began to be registered in Wuhan (China). As of April 30, 2020, more than three million people have been infected and there have been almost 230,000 deaths in 180 countries throughout the world. On March 11, the disease was declared a pandemic by the World Health Organization.
There is currently no vaccine against this disease, and social distancing measures have been the main recommendation of the World Health Organization to prevent its spread. Recently, a study (written in Spanish) based on differential equations that simulate the transmission dynamics of the disease was presented, using the reported cases of infection in four different countries according to data recorded at Johns Hopkins University [1]. That study concluded that the success of the model depends on the quality of the data.
For this reason, it is necessary to validate the reported data on Covid-19 infections in order to check that they have not been altered, manipulated or poorly transcribed for unknown reasons. Benford's Law has been used in various scenarios to detect, for example, fraud in campaign finances [2], in governmental economic data [3], in accounting data [4] and in scientific data [5], among others [6, 7].
In the scientific literature, we found only one paper, published in the arXiv repository, in which the author used Benford's Law to study the first outbreaks that occurred in China up to February 13, 2020 [8]. That manuscript concluded that, up to that date, there was no evidence of alteration or manipulation of the cases registered in China.
We therefore carry out a more complete study to determine whether the data on people infected with Covid-19 can be validated using Benford's Law, based on the Pearson χ² test and the Mantissa Arc Test and, where needed, the Freedman-Watson test, to verify that the data have not been manipulated.
The data on infected cases were obtained from the Johns Hopkins University database (available at coronavirus.jhu.edu), from December 31, 2019 to April 30, 2020. The next step was to determine the frequency of appearance of the first digit according to Benford's Law. To do that, we used an algorithm in R based on the benford.analysis library, according to the following equation:
$$b(i) = \log_{10}\left(1 + \frac{1}{i}\right)$$
where i corresponds to the values that go from 1 to 9 (see [9] for details). With this distribution, we calculate the Pearson χ² value, which is the goodness-of-fit statistic, according to this equation, where N is the number of data values:
$$\chi^2 = N \sum_{k=1}^{9} \frac{\left(P(k) - b(k)\right)^2}{b(k)}$$
where P(k) and b(k) are the proportions obtained from the data and from Benford's Law, respectively. The p-value is the probability of obtaining a χ² value at least as large as the observed one if the data follow Benford's Law, as explained in [9]; a p-value greater than 0.05 is taken to indicate that the numbers have not been altered or manipulated. In addition, the Pearson χ² value should tend to zero.
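The first-digit comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than the R code used in the paper: the function names are hypothetical, the input is assumed to be positive integer counts, and the p-value uses the closed-form survival function of the χ² distribution, which is valid only for an even number of degrees of freedom (8 here).

```python
import math
from collections import Counter


def benford_proportions():
    """Expected first-digit proportions b(k) = log10(1 + 1/k) for k = 1..9."""
    return {k: math.log10(1 + 1 / k) for k in range(1, 10)}


def first_digit(count):
    """Leading digit of a positive integer count (assumption: integer data)."""
    return int(str(int(count))[0])


def chi2_survival_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only valid for positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


def pearson_benford_test(counts):
    """Return (chi2, p_value) for the first digits of positive integer counts."""
    digits = [first_digit(c) for c in counts if c >= 1]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one positive count")
    observed = Counter(digits)
    expected = benford_proportions()
    # chi2 = N * sum over k of (P(k) - b(k))^2 / b(k), with proportions P(k), b(k)
    chi2 = n * sum(
        (observed.get(k, 0) / n - expected[k]) ** 2 / expected[k]
        for k in range(1, 10)
    )
    return chi2, chi2_survival_even_df(chi2, df=8)


# Illustrative input only (not Covid-19 data): powers of two are a classic
# sequence whose first digits follow Benford's Law closely.
chi2, p_value = pearson_benford_test([2 ** k for k in range(1, 201)])
print(f"chi2 = {chi2:.3f}, p-value = {p_value:.3f}")
```

A series whose first digits are evenly spread over 1 to 9, such as the integers from 100 to 999, gives a large χ² and a p-value far below 0.05 with the same function.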
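In the Mantissa Arc Test, the mantissa of each data value (the fractional part of its base-10 logarithm) is placed on the unit circle, and the center of mass of the resulting set of points is calculated. The coordinates of the center of mass are given by:
$$x_c = \frac{1}{N}\sum_{j=1}^{N}\cos\left(2\pi\left(\log_{10} x_j \bmod 1\right)\right), \qquad y_c = \frac{1}{N}\sum_{j=1}^{N}\sin\left(2\pi\left(\log_{10} x_j \bmod 1\right)\right)$$
where x1, x2, …, xN are the data values.
The next step is to determine the squared length of the mean vector, L², which is given as
$$L^2 = x_c^2 + y_c^2$$
And finally, the p-value is simply
$$p = e^{-N L^2}$$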
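The Mantissa Arc Test steps can be sketched as follows. This is an illustrative Python version, not the R routine used in the paper; the function name is hypothetical, and the p-value is assumed to take the form exp(-N·L²), which corresponds to comparing 2·N·L² with a χ² distribution with 2 degrees of freedom.

```python
import math


def mantissa_arc_test(values):
    """Return (L2, p_value) of the Mantissa Arc Test for positive values."""
    # The mantissa is the fractional part of log10(x), a number in [0, 1).
    mantissas = [math.log10(v) % 1.0 for v in values if v > 0]
    n = len(mantissas)
    if n == 0:
        raise ValueError("need at least one positive value")
    # Center of mass of the mantissas placed on the unit circle.
    x_c = sum(math.cos(2 * math.pi * m) for m in mantissas) / n
    y_c = sum(math.sin(2 * math.pi * m) for m in mantissas) / n
    # Squared length of the mean vector.
    l2 = x_c ** 2 + y_c ** 2
    # Assumed form of the p-value: exp(-N * L2).
    p_value = math.exp(-n * l2)
    return l2, p_value


# Illustrative input only (not Covid-19 data): powers of two have mantissas
# spread almost uniformly around the circle, so L2 is close to zero.
l2, p_value = mantissa_arc_test([2 ** k for k in range(1, 201)])
print(f"L2 = {l2:.6f}, p-value = {p_value:.3f}")
```

When all mantissas coincide, for example for the values 5, 50, 500 and 5000, the center of mass lies on the circle itself, L² is close to 1, and the p-value drops toward zero as N grows.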
Finally, to verify whether a country really fails Benford's Law, we apply a third test, the Freedman-Watson test [10]. The equation of its statistic is lengthy to explain, so we refer the reader to [10] for the details.
As in the previous tests, a p-value greater than 0.05 indicates that the data have not been altered or manipulated.
Finally, the calculations were carried out for twenty-three countries, from December 29, 2019 to April 30, 2020: China, India, Germany, Brazil, Venezuela, Netherlands, Italy, Colombia, Russia, Norway, South Africa, Portugal, Singapore, United Kingdom, Chile, Ecuador, Egypt, Denmark, Ireland, France, Belgium, Australia and Croatia. The results are explained in the next section.
In Table 1, we summarize the results obtained with the two tests according to the data available up to April 30, 2020. The results were randomly grouped into three blocks, and the numbers of degrees of freedom in the Pearson χ² test and the Mantissa Arc Test were 8 and 2, respectively. In addition, we indicate the number of data points for each country (the results were verified with another R package called BenfordTests).
Table 1. Results obtained according to Benford's Law (see text for more details).
The countries that pass both tests, meaning that both p-values are greater than 0.05, are China, Germany, Brazil, Venezuela, Norway, South Africa, Singapore, Ecuador, Egypt, Ireland, France and Australia. This means that the information from these countries is valid. In fact, China, Singapore and Australia agree very closely with Benford's Law. On the other hand, Colombia, India, Russia and Croatia pass at least one of the two tests, as shown in Table 1, so we conclude that these countries did not manipulate the data.
However, Italy, Portugal, Netherlands, United Kingdom, Denmark, Belgium and Chile do not pass either of the two tests (their values are highlighted in red in Table 1). For these countries, we calculated the p-value according to the Freedman-Watson test (using the benford.analysis library), and the results obtained were $10^{-3}$, $10^{-16}$, $10^{-4}$, $10^{-16}$, $10^{-10}$, $10^{-16}$ and $10^{-4}$ for Italy, Portugal, Netherlands, United Kingdom, Denmark, Belgium and Chile, respectively. Therefore, three different tests indicate that these countries may have manipulated or altered the data, because the data could not be validated with any of the three tests.
However, it is necessary to wait until the end of the pandemic, when all the data can be analyzed, to establish whether these countries manipulated the data or whether the failures are due to the omission of registered cases.
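The screening rule applied in this section can be summarized in a short Python sketch. The function name, the labels and the p-values in the usage lines are hypothetical and serve only to illustrate the rule; they are not values from Table 1.

```python
ALPHA = 0.05  # significance threshold used throughout the paper


def screen_country(p_chi2, p_mat, p_fw=None, alpha=ALPHA):
    """Classify a country from its p-values.

    p_chi2: p-value of the Pearson chi-square test
    p_mat:  p-value of the Mantissa Arc Test
    p_fw:   p-value of the Freedman-Watson test, computed only when
            both of the previous tests are failed
    """
    passed = [p > alpha for p in (p_chi2, p_mat)]
    if all(passed):
        return "passes both tests"
    if any(passed):
        return "passes one of the two tests"
    if p_fw is None:
        return "fails both tests, third test required"
    if p_fw > alpha:
        return "fails two tests, passes the third"
    return "fails all three tests"


# Made-up p-values for illustration only.
print(screen_country(0.40, 0.22))
print(screen_country(0.30, 0.01))
print(screen_country(1e-4, 1e-6, p_fw=1e-16))
```

Only the last outcome corresponds to the countries flagged above as possibly having altered data.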
The analysis of Covid-19 infection cases based on Benford's Law indicates that China, Germany, Brazil, Venezuela, Norway, South Africa, Singapore, Ecuador, Egypt, Ireland, France, Australia, Colombia, India, Russia and Croatia did not manipulate the information registered in the Johns Hopkins dataset. However, Italy, Portugal, Netherlands, United Kingdom, Denmark, Belgium and Chile do not pass any of the three tests carried out in this paper, and therefore further studies in these countries are necessary to establish whether they manipulated or altered the information.
In fact, we consider that we must wait until the end of the pandemic, when all cases have been registered in all countries, before drawing conclusions about the lack of credibility of the data provided by any given country.
I would like to thank Karl E. Longreen for comments on this manuscript.