The authors have declared that no competing interests exist.
Rapidly spreading Covid19 virus and its variants, especially in metropoli tan areas around the world, became a major health public concern. The tendency of Covid19 pandemic and statistical modelling represent an urgent challenge in the United States for which there are few solutions. In this paper, we demonstrate com bining Fourier terms for capturing seasonality with ARIMA errors and other dynamics in the data. Therefore, we have analyzed 156 weeks COVID19 dataset on national level using Dynamic Harmonic Regression model, including simulation analysis and ac curacy improvement from 2020 to 2023. Most importantly, we provide a new advanced pathways which may serve as targets for developing new solutions and approaches.
The COVID19 pandemic has had a tremendous impact on the world for 3 years and spread to more than 200 countries worldwide, leading to more than 36 million confirmed cases as of October 10, 2020. Some wellrespected organizations such as Johns Hopkins University, the Centers for Disease Control and Prevention, the World Health Organization and the United States Census Bureau are involved in the study and tracking of the Covid19 pandemic
To respond this urgent public health concern, we used 156 weekly time series datasets to evaluate the seasonal patterns of COVID19 cases and mortality in the United States with the objective to determine the tendency of Covid19 pandemic. Besides, the implantation of R and simulation analysis can improve the forecasting accuracy Given my prospective research interest in Data Science, smart data analytics is giving profes sionals and public more insight into the factors impacting than ever before. From assessing risks to analyzing evolving trends, we are now able to anticipate the success of a property more accurately thanks to the abundance of information available to academics and professionals. Our analysis can help in understanding the trends of the disease outbreak and provide suggestions and instructions of adopted countries.
Based on complex nature of virus transformation, traditional epidemic models such as Regression and ARIMA methods have been applied for prediction of its spread. Particularly, Dynamic Harmonic Regression (DHR) approaches were used to predict the spreading trends of COVID19, such as new cases and deaths. We reviewed studies that implemented these strategies ^{10}.
Dynamic Harmonic Regression (DHR) is a nonstationary timeseries analysis approach used to identify trends, seasonal, cyclical and irregular components within a state space framework. Many re searchers studied about this forecasting methods. Dr.Kumar and Dr.Suan (2020) use ARIMA model and day level information of COVID19 spread for cumulative cases from whole world and 10 mostly affected countries to forecast the impact of the virus in the affected countries and worldwide
In early December 2019, an outbreak of coronavirus disease 2019 (COVID19) caused by a novel severe acute respiratory syndrome coronavirus 2 (SARSCoV2), occurred in Wuhan City, Hubei Province, China.
On January 30, 2020 the World Health Organization declared the outbreak as a Public Health Emergency of International Concern (PHEIC).
As of February 14, 2020, 49,053 laboratoryconfirmed and 1,381 deaths have been reported globally.
On March 2020, the Journal of the American Medical Association Ophthalmology reported that COVID19 can be transmitted through the eye. One of the first warnings of the emergence of the SARSCoV2 virus came late in 2019 from a Chinese ophthalmologist, Li Wenliang, MD, who treated patients in Wuhan and later died at age 34 from COVID19.
On December 18, 2020, after demonstrating 94 percent efficacy, the NIHModerna vaccine was authorized by the U.S. Food and Drug Administration (FDA) for emergency use. Just days earlier, the similar Pfizer/BioNTech vaccine had become the first COVID19 vaccine to be authorized for use in the United States.
In the late summer and fall of 2021, the delta variant was the dominate strain of COVID19 in the U.S.
On 26 November 2021, WHO designated the variant B.1.1.529 a variant of concern, named Omicron.
Director of the National Institute of Allergy and Infectious Diseases Anthony Fauci gave an update on the Omicron COVID19 variant during the daily press briefing at the White House on December 1, 2021 in Washington, DC. He said that we will likely learn to live with COVID19 like we do with the common cold and flu ^{10}.
Globally, as of 6:32pm CET, 27 January 2023, there have been 752,517,552 confirmed cases of COVID19, including 6,804,491 deaths, reported to WHO. As of 24 January 2023, a total of13,156,047,747 vaccine doses have been administered.
The data for the ongoing Covid19 outbreak in the United States is collected from the Centers for Disease Control and Prevention. The columns of this dataset include the Total number of weekly cases, Weekly Death and Weekly tests volume of Covid19 patients accumulating all the states, on a weekly basis from 29th Jan 2020 to 18th Jan 2023. The total cases per 100,000, allow for comparisons between areas with different population sizes.
Weekly data is difficult to work with because the seasonal period (the number of weeks in a year) is both large and noninteger, like stock prices, employment numbers, or other economic indicators. The average number of weeks in a year is 52.18. Most of the methods we have considered require the seasonal period to be an integer. Even if we approximate it by 52, most of the methods will not handle such a large seasonal period efficiently.
So far, many publications and researchers have considered relatively simple seasonal patterns, such as quarterly and monthly data. However, higher frequency time series often exhibit more com plicated seasonal patterns. For example, daily data may have a weekly pattern as well as an annual pattern. Hourly data usually has three types of seasonality: a daily pattern, a weekly pattern, and an annual pattern. Even weekly data can be challenging to forecast as it typically has an annual pattern with seasonal period of 365
Exponential smoothing model didn’t seem applicable, and ARIMA modelling is poor working with high integer seasonal periods (e.g. days/weeks rather than months/quarters), and also struggles with a noninteger seasonal period (i.e. 52 weeks some years, 53 weeks other years).
There are several methods for incorporating seasonality into a forecasting model. One common approach is to use timeseries models such as SARIMA (Seasonal Autoregressive Integrated Moving Average) or Seasonal Exponential Smoothing. These models can capture the seasonal patterns in the data and adjust the forecast accordingly.
The time series processes are usually all stationary processes, but many applied time series, par ticularly those arising from economic and business areas are nonstationary. With respect to the class of covariance stationary processes, nonstationary time series can occur in many different ways. They could have nonconstant means
or both of these properties
When applied to Covid19 data, taking the natural logarithm of the number of cases or deaths can help stabilize the variance of the data and make the trend more apparent, especially in the early stages of the pandemic when the growth was exponential. This can also help identify if there are any underlying patterns or seasonality in the data. After applying the log transformation, the resulting data will have a more linear trend and a constant variance, which makes it easier to model using standard statistical techniques such as linear regression or ARIMA models.
Many models used in practice are of the simple ARIMA type, which has a long history and was formalized in Box and Jenkins
Since we are also taking into account the seasonal pattern even if it is weak, we should also examine the seasonal ARIMA process. This model is built by adding seasonal terms in the non seasonal ARIMA model we mentioned before. One shorthand notation for the model is
{(p, d, q)} : nonseasonal part
(P, D, Q)m}: seasonal part.
P = seasonal AR order,
D = seasonal differencing,
Q = seasonal MA order
m: the number of observations before the next year starts; seasonal period ^{12}.
The seasonal parts have term nonseasonal components with backshifts of the seasonal period. For instance, we take {ARIMA(p, d, q)(P, D, Q)m} model for weekly data (m=52). Without differencing operations, this process can be formally written as:
A seasonal ARIMA model inc
The time series models in ARIMA model and Exponential Smoothing model allow for the inclu sion of information from past observations of a series, but not for the inclusion of other information that may also be relevant. For example, the effects of holidays, competitor activity, changes in the law, the wider economy, or other external variables may explain some of the historical variation and may lead to more accurate forecasts. On the other hand, the regression models allow for the inclusion of a lot of relevant information from predictor variables but do not allow for the subtle time series dynamics that can be handled with ARIMA models.
An alternative approach uses a dynamic harmonic regression model. Next, we tried to extend ARIMA models in order to allow other information to be included in the models. Firstly, we consid ered regression model
The system composed by four components: trend (T), sustained cyclical (C) with period different to the seasonality, seasonal (S) and white noise (
The measured values of y are the output (observations) series of a system of stochastic state space equations, which can then be broken down to allow for estimation of the four components.
So for such time series, we prefer a harmonic regression approach where the seasonal pattern is modelled using Fourier terms with shortterm time series dynamics handled by an ARIMA error.
In the following example, the number of Fourier terms was selected by minimising the AICc. The order of the ARIMA model is also selected by minimising the AICc although that is done within the auto.arima() function in R.
Dynamic harmonic regression is based on the principal that a combination of sine and cosine functions can approximate any periodic function.
Where m is the seasonal period,
The fitted model has 18 pairs of Fourier terms and can be written as
Where ηt is an ARIMA(4,1,1) process. Because
The advantages of this approach are :
Flexibility: DHR model can be used to model data with various levels of complexity, including data with multiple seasonal patterns, irregular patterns, and nonstationary patterns. It allows
any length seasonality; The shortterm dynamics are easily handled with a simple ARIMA error. Especially, for data with more than one seasonal period, Fourier terms of different frequencies can be included;
The smoothness of the seasonal pattern can be controlled by K, the number of Fourier sin and cos pairs – the seasonal pattern is smoother for smaller values of K ;
The only real disadvantage (compared to a seasonal ARIMA model) is that the seasonality is assumed to be fixed  the seasonal pattern is not allowed to change over time. But in practice, seasonality is usually remarkably constant so this is not a big disadvantage except for long time series.
Time series analysis and forecasting are an active research area over the last five decades. Thus, various kinds of forecasting models have been developed and researchers have relied on statistical techniques to predict time series data. The accuracy of time series forecasting is fundamental to many decisions processes, and hence the research for improving the performance of forecasting mod els has never been stopped. However, the time series datasets are often nonlinear and irregular
The author measured forecasting performance by the mean absolute error (MAE), root mean square error (RMSE), root relative squared error (RSE), and mean absolute percentage error (MAPE). The MAE criterion is most appropriate when the cost of a forecast error rises proportionally with respect to the absolute size of the error. With RMSE, the cost of the error rises as the square of the error, and so large errors can be weighted far more than proportionally. Whether MAE or RMSE is most appropriate surely varies according to circumstances and individual institutions, and in any case we will find that the several measures pick the same model in all but several instances
These measures were calculated by using the following Equations.
time t,
where k is the number of parameters and n the number of samples.
It is important to note that these information criteria tend not to be good guides to selecting the appropriate order of differencing (d) of a model, but only for selecting the values of p and q. This is because the differencing changes the data on which the likelihood is computed, making the AIC values between models with different orders of differencing not comparable
In this section, the focus is on statistical methodology and forecasting results on time series datasets regarding Covid19 pandemic. The comparison









DHR with ARIMA(2,0,1) error  8447.324  148729.5  92906.71  43052.44  48766.5  0.1582  18.38 
ARIMA(2,1,0)(0,1,0)^{52}  4511.181  132336.8  57721.63  3.0858  8.7082  0.0983  1711.99 
Dynamic Regression  16520.74  162314.7  94507.58  0.1878  19.2053  0.1609  3105.5 
with ARIMA (2,1,3)error  
Log transformation  0.00654  0.25964  0.18395  0.26225  1.8929  0.10279  419.08 
ARIMA (1,1,5)(0,0,1)^{52}  
Log transformation DHR  0.01285  0.1753  0.13024  0.34485  1.4169  0.0728  15.88 
Collectively, these models are capable of identification of learning parameters that affect dissimi larities in COVID19 spread across various regions or populations, combining numerous intervention methods and implementing whatif scenarios by integrating data from diseases having analogous trends with COVID19 pandemic
As it was the case with the forecast in
regression with ARIMA error. We can easily confirm from the above results that the transformation improves the accuracy if the time series have an unstabilized variance. It also shows that when there are long seasonal periods, a dynamic regression with Fourier terms is often better than other models we have considered from the raw datasets.







2023.01.04  11.84703  2.16397924  21.53008  2.9619173  26.65597 
2023.01.11  11.67934  1.86601883  21.49266  3.3288382  26.68751 
2023.01.18  11.39147  1.44959306  21.33336  3.813321  26.59627 
2023.01.25  11.09728  1.02847775  21.16608  4.3016243  26.49618 
2023.02.01  11.01559  0.821447  21.20973  4.5750067  26.60619 
2023.02.08  11.27106  0.95309604  21.58902  4.5089033  27.05102 
2023.02.15  11.77707  1.33675702  22.21738  4.1900106  27.74415 
2023.02.22  12.34798  1.7867302  22.90922  3.8040555  28.50001 
2023.03.01  12.83167  2.15085746  23.51248  3.5032215  29.16656 
2023.03.08  13.12814  2.32908592  23.92719  3.3875856  29.64386 
2023.03.15  13.20719  2.29118114  24.1232  3.487405  29.90179 
2023.03.22  13.14645  2.11472479  24.17818  3.7251195  30.01803 
2023.03.29  13.05819  1.91194524  24.20444  3.9885213  30.10491 
2023.04.05  12.95955  1.69995251  24.21915  4.2605198  30.17963 
2023.04.12  12.79333  1.42150431  24.16515  4.5983756  30.18503 
2023.04.19  12.55773  1.07477713  24.04068  5.0039298  30.11939 
2023.04.26  12.31002  0.71700654  23.90303  5.4199636  30.04 
2023.05.03  12.06197  0.35992833  23.76401  5.834757  29.95869 
2023.05.10  11.79296  0.01709568  23.60302  6.2689634  29.85489 
2023.05.17  11.55598  0.36111708  23.47308  6.669649  29.78162 
2023.05.24  11.44662  0.57657226  23.4698  6.9412638  29.8345 
2023.05.31  11.46867  0.65967812  23.59702  7.0800382  30.01738 







2023.01.04  7.881919  5.361387  10.402452  4.027098  11.736741 
2023.01.11  7.231386  4.707106  9.755666  3.370833  11.091939 
2023.01.18  7.014583  4.49027  9.538896  3.153979  10.875187 
2023.01.25  7.316785  4.790167  9.843403  3.452656  11.180913 
2023.02.01  7.972997  5.438274  10.50772  4.096473  11.849521 
2023.02.08  8.628184  6.080713  11.175655  4.732163  12.524205 
2023.02.15  8.983049  6.422724  11.543374  5.067369  12.898729 
2023.02.22  8.973543  6.404637  11.54245  5.04474  12.902347 
2023.03.01  8.738027  6.166017  11.310036  4.804477  12.671576 
2023.03.08  8.455716  5.883601  11.02783  4.522006  12.389426 
2023.03.15  8.228145  5.654814  10.801476  4.292575  12.163715 
2023.03.22  8.086148  5.50775  10.664546  4.142829  12.029467 
2023.03.29  8.056633  5.469725  10.643541  4.100298  12.012967 
2023.04.05  8.171459  5.575543  10.767374  4.201348  12.141569 
2023.04.12  8.408955  5.806693  11.011218  4.429138  12.388773 
2023.04.19  8.675691  6.070895  11.280486  4.691999  12.659382 
2023.04.26  8.875202  6.270236  11.480169  4.89125  12.859154 
2023.05.03  8.969148  6.363566  11.57473  4.984254  12.954042 
2023.05.10  8.95649  6.347756  11.565223  4.966776  12.946203 
2023.05.17  8.833789  6.21938  11.448199  4.835395  12.832183 
2023.05.24  8.605682  5.984966  11.226399  4.597643  12.613722 
2023.05.31  8.304007  5.678611  10.929403  4.28881  12.319204 
The trend analysis shows unstable situation in the infected cases and weekly deaths and predic tion study shows increase in the expected active and death cases nationally. However, the time series datasets are often nonlinear and irregular. This data has been used by researchers, policymakers, and others to better understand and respond to the effects of the pandemic.
The objective in providing crucial statistical techniques is to enable government and public to make informed decisions regarding Covid19. Most importantly, we obtain how to add value to public health and apply skills in a real world environment. These models are essential for informing public health decisionmaking and resource allocation, as well as for predicting future trends in the spread of the disease.
The author would like to thank some comments and constructive suggestions from Dr.Olusegun Michael Otunuga from the college of Science and Math and Dr.Hinton Romana from Writing Center in Augusta University. Several stimulating discussions and comments allowed me to develop original ideas and improve my paper.