A Novel Deep Learning Algorithm to Calculate and Model the Age-Standardized COVID-19 Mortality Rate of a Subpopulation When Compared to a Standard Population

March 16, 2022
Mayur T. Talele, Herricks High School

Abstract: Coronavirus disease 2019 (COVID-19) has gained widespread interest in the field of mathematical epidemiology, where the aim is to inform the public of basic statistics surrounding the disease. However, age-standardized mortality rates (ASMRs), which adjust for age and population discrepancies between different regions by comparing a subpopulation to a standard population, have not been made publicly available. COVID-19 ASMRs are usually not calculated because of the lengthy process required; where they have occasionally been calculated, their effectiveness has been hindered by the use of a hand-written formula and manual graphing methods. My study involved the development of a deep learning algorithm to calculate ASMR and to instantly graph the ASMR of a subpopulation against the crude mortality rate of the standard population. This algorithm was used to compare the ASMRs for COVID-19 in American states to the crude mortality rate of the standard population, the United States. In this study, the algorithm shows efficiency with a consistent runtime of t ≤ 5 seconds, with trial means falling within one another's 95% confidence interval error bars. ASMRs show statistically significant differences in expected COVID-19 deaths among most populations. There is at least 95% confidence (p ≤ 0.05) that differences in ASMR are independent of age and population distributions. These findings suggest that factors other than age discrepancy affect COVID-19 mortality rates.

Keywords: COVID-19, Age-Standardization, Mortality Rate, Algorithm, Deep Learning


I. Introduction

Age-standardized mortality rates (ASMRs) are calculated and modeled as a way of comparing the mortality rate of a subpopulation to that of a standard population by adjusting the subpopulation to match the standard population’s size and age distribution. Coronavirus disease 2019 (COVID-19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread to over 100 countries in rapid succession and is therefore classified as a global pandemic [1-2,10]. Because of this status, COVID-19 has received widespread attention in the field of mathematical epidemiology, where the aim is to inform the public of basic statistics such as the number of COVID-19 deaths, infected patients, patients treated in a hospital, patients treated in intensive care units, related infections, and related deaths [3-4].

However, public data sets have been criticized for not including more detailed statistics, notably the comparison of COVID-19 deaths in different populations after removing age as a confounding variable [4,6]. In the United States, there has been a lack of age-specific data regarding COVID-19 [9]. ASMRs for COVID-19 have previously been calculated for some regions in the United States; however, their effectiveness has been hindered by the use of indirect standardization rather than direct standardization, and by reliance on a hand-applied formula and a manual graphing method, both of which take considerable time to complete [5,8].

Therefore, a fast and fully functional deep learning computer algorithm that is consistent, is easily debuggable, calculates age-standardized mortality rates, instantaneously graphs newly calculated data, and uses direct age standardization is the most effective and efficient method of adjusting for age so that it is no longer a confounding variable. Python, an open-source programming language that is precise, is fast, can serve as a calculator, and can graph data, would allow the time-consuming and user-dependent process of manually calculating and graphing ASMR to be bypassed [8]. Hence, a deep learning algorithm coded in Python has the potential to calculate and graph ASMR at high speed. This study proposes such an algorithm as a method of removing the confounding variable of age with high speed and minimal user dependency. The algorithm will be used to compare the COVID-19 ASMR of each subpopulation (state) in the United States to the standard population, the United States as a whole. I present a protocol for the development, iteration, application, and examination of the deep learning algorithm to provide greater statistical insight into COVID-19. Success in the application of this algorithm would provide a novel tool for calculating and graphing ASMR.

ASMRs are a vital measure for comparing the mortality rates of a subpopulation and a standard population because they adjust for age and population discrepancies between different regions [7]. Removing age and population differences as confounding variables gives greater capacity to identify other variables that contribute to the mortality rates within the respective populations. Because of these benefits, ASMR calculations have been of growing interest in the field of computational and mathematical biology. However, the efficiency of calculating and graphing ASMRs has been hindered in recent years by the lengthy calculations and by the manual graphing needed to visualize the results [5,8]. As presently applied, ASMR calculation is a relatively slow and inefficient way of obtaining mortality rates with age standardized. Therefore, this study develops a deep learning algorithm with the ability to instantly calculate and graph the ASMR of a subpopulation when compared to a standard population, which would significantly increase the speed and efficiency of calculating and graphing ASMR.
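For reference, the textbook form of direct age standardization can be written as below. This is the standard epidemiological definition and is offered only as a sketch; the symbols are assumed notation that does not appear in the original study: d_i and n_i are the deaths and population of the subpopulation in age range i, and N_i is the standard population in the same age range.

```latex
% Expected deaths in age range i if the subpopulation's age-specific
% rate were applied to the standard population (assumed notation)
E_i = \frac{d_i}{n_i}\, N_i

% Age-standardized mortality rate per 100,000 people
\mathrm{ASMR} = \frac{\sum_i E_i}{\sum_i N_i} \times 100{,}000
```

When the subpopulation is the standard population itself (n_i = N_i), each E_i reduces to d_i, so the ASMR equals the crude mortality rate.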

II. Methodology

Run the Program Using Publicly Available Datasets for the United States

Public datasets were obtained from the CDC’s Provisional COVID-19 Death Counts by Sex, Age, and State [16]. Population data for each state (subpopulation) and for the United States (standard population) were obtained from World Population Review [17]. All dataset information used in this study is current as of October 28, 2020. The newly developed algorithm was run to find the age-standardized COVID-19 mortality rate for each state studied, compared with the crude COVID-19 mortality rate of the United States. During this phase, I gave the algorithm its input (Fig. 5): COVID-19 death statistics and population statistics, which the algorithm used to calculate and graph the COVID-19 ASMR for the subpopulation and the crude COVID-19 mortality rate for the standard population.
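The calculation described above can be sketched in Python as follows. This is a minimal illustration of textbook direct standardization, not the study's actual program, which is not reproduced here. The function names `direct_asmr` and `crude_rate` are hypothetical, and the input numbers are made up for demonstration; they are not CDC or World Population Review figures.

```python
from typing import List, Sequence, Tuple


def direct_asmr(
    sub_deaths: Sequence[float],
    sub_population: Sequence[float],
    std_population: Sequence[float],
    per: int = 100_000,
) -> Tuple[float, List[float]]:
    """Directly standardize a subpopulation's mortality to a standard population.

    Each sequence holds one value per age range, in the same order.
    Returns the ASMR per `per` people and the expected deaths per age range.
    """
    if not (len(sub_deaths) == len(sub_population) == len(std_population)):
        raise ValueError("all inputs must cover the same age ranges")
    if any(p <= 0 for p in sub_population):
        raise ValueError("subpopulation counts must be positive")

    # Apply each age-specific death rate of the subpopulation to the
    # standard population in the same age range.
    expected_deaths = [
        (deaths / population) * standard
        for deaths, population, standard in zip(
            sub_deaths, sub_population, std_population
        )
    ]
    asmr = sum(expected_deaths) / sum(std_population) * per
    return asmr, expected_deaths


def crude_rate(
    deaths: Sequence[float], population: Sequence[float], per: int = 100_000
) -> float:
    """Crude mortality rate: total deaths over total population."""
    return sum(deaths) / sum(population) * per


# Made-up numbers for three age ranges (hypothetical, for illustration only).
state_deaths = [10, 50, 200]
state_population = [100_000, 80_000, 20_000]
standard_population = [1_000_000, 1_000_000, 500_000]

asmr, expected = direct_asmr(state_deaths, state_population, standard_population)
print(f"ASMR per 100,000: {asmr:.1f}")
print(f"Crude rate per 100,000: {crude_rate(state_deaths, state_population):.1f}")
```

Passing the standard population as its own subpopulation returns its crude rate, which is a quick sanity check on any implementation.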

Statistical Significance Analysis: Standard Deviation, Standard Error, and Confidence Interval for ASMR Comparison

To assess the significance of the results, statistical tests were run using the free version of Google Sheets. First, after the crude COVID-19 mortality rate is identified, the standard deviation is found. Using the standard deviation, the standard error of the mean (SEM) is calculated, with the sample size being the number of age ranges input to the algorithm. Then, taking ±2 SEM around the rate gives the error bars. The same process applies to the subpopulations whose age-standardized mortality rates are being calculated. If the error bars of the subpopulation and the standard population overlap, the difference in mortality rate between the two populations is not statistically significant. This would suggest, with 95% confidence, that age discrepancy between populations is the only variable affecting COVID-19 mortality rates. If the two error bars do not overlap, there is 95% confidence that the higher mortality rate in one population is due to a variable other than age discrepancy. Therefore, running the statistical analyses is crucial to establishing whether the results are statistically significant.
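The error-bar comparison above was carried out in Google Sheets; the same arithmetic can be sketched in Python as below. This is an illustration under stated assumptions: the helper names `two_sem_interval` and `intervals_overlap` are hypothetical, the sample standard deviation (n − 1 denominator) is assumed because the text does not say which form was used, and the per-age-range values are invented.

```python
import math
import statistics
from typing import Sequence, Tuple


def two_sem_interval(center: float, values: Sequence[float]) -> Tuple[float, float]:
    """Return (low, high) error-bar limits of center ± 2 SEM.

    `values` holds one number per age range, so the sample size is the
    number of age ranges. Sample standard deviation is an assumption.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two age ranges")
    sem = statistics.stdev(values) / math.sqrt(n)
    return center - 2 * sem, center + 2 * sem


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True when two (low, high) error bars share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Invented per-age-range values for a standard population and a subpopulation.
standard_bar = two_sem_interval(64.0, [5.0, 20.0, 60.0, 110.0, 125.0])
state_bar = two_sem_interval(250.0, [40.0, 120.0, 260.0, 380.0, 450.0])

if intervals_overlap(standard_bar, state_bar):
    print("Error bars overlap: difference not statistically significant")
else:
    print("Error bars do not overlap: difference is statistically significant")
```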

III. Results

ASMR due to COVID-19 per 100,000 people in each Population

In the ASMR comparison bar graph, the expected number of deaths from COVID-19 per 100,000 people is graphed for each population. The United States, being the standard population, must have an ASMR equal to its crude COVID-19 mortality rate of 64.0199. If New York’s population were adjusted to the same size and age distribution as the United States, its COVID-19 mortality rate per 100,000 people would be expected to be 932.0452. By contrast, all other states in this study show less than half of New York’s ASMR per 100,000 people. Texas has the next-highest ASMR per 100,000 people at 447.5445, followed by Florida at 362.742, California at 253.0757, and lastly the United States at 64.0199.

Expected Number of Deaths due to COVID-19 in each Population if Adjusted to Standard Population

The data table shown (Fig 9) lists the standard population, the United States, as well as the subpopulations: New York, Texas, California, and Florida. After the deep learning algorithm calculated the expected number of deaths within each age range of each subpopulation, the expected deaths were summed to give a total number of expected deaths for each population. Then, the error bars, representing 95% confidence intervals, were calculated by finding ±2 SEM. As shown (Fig 9), New York has, by far, the highest expected number of deaths when adjusted to the population size and age distribution of the United States. By contrast, Florida has the lowest expected deaths when adjusted for age and population.

IV. Discussion

Overall, this study found that the newly developed computer program, with its deep learning algorithm, is successful in its consistency, functionality, efficiency, calculations, and graphing capabilities. The efficiency and consistency of the algorithm were a key focus of this study, as shown in Figure 6a and Figure 6b. Efficiency was tested by measuring the runtime of the program. In this study, runtime was defined as the amount of time taken for the program to start running. A common threshold for an efficient program’s runtime is t ≤ 5 seconds, where t is time [12]. As shown in Figure 6a, the algorithm is consistently below the 5.0-second tick mark, which suggests that the program is efficient. Consistency was tested by making 10 runs of the program in each of three trials. As shown in Figure 6b, the mean runtime for each trial was near t = 3.5 seconds. Furthermore, the error bars, indicating 95% confidence intervals, all overlap the other trials’ means and error bars, which means that the differences in each trial’s mean runtime occurred by chance and that there is no causal agent. Because no causal agent increases the runtime of the program, the program can be said to run consistently as well as efficiently.
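A runtime check of this kind can be sketched in Python as follows. This is a hedged illustration, not the measurement code used in the study: it times the wall-clock duration of a callable with `time.perf_counter`, which may differ from the study's definition of runtime, and the names `time_runs` and `mean_with_error_bar` as well as the stand-in workload are hypothetical.

```python
import math
import statistics
import time
from typing import Callable, List, Sequence, Tuple


def time_runs(func: Callable[[], object], runs: int = 10) -> List[float]:
    """Run `func` repeatedly and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def mean_with_error_bar(durations: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, half_width), where half_width is 2 SEM."""
    sem = statistics.stdev(durations) / math.sqrt(len(durations))
    return statistics.mean(durations), 2 * sem


def workload() -> int:
    """Stand-in for the ASMR program; replace with the real entry point."""
    return sum(range(100_000))


# Three trials of ten runs each, mirroring the design described above.
for trial in range(1, 4):
    mean, half_width = mean_with_error_bar(time_runs(workload, runs=10))
    within_threshold = mean + half_width <= 5.0
    print(f"Trial {trial}: {mean:.4f} s ± {half_width:.4f} s, "
          f"under 5 s: {within_threshold}")
```

Comparing the printed intervals across trials corresponds to the overlap check on the error bars in Figure 6b.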

V. Conclusion

In this study, it was hypothesized that variables other than age discrepancies do not have a significant impact on the mortality rate due to COVID-19. However, refuting the initial hypothesis, one of the main findings of this study is that at least one factor other than population size and age distribution had a significant impact on the COVID-19 mortality rate in various populations. This is illustrated in Figure 10, where multiple populations show statistically significant differences in the expected number of COVID-19 deaths once adjusted to the standard population, the United States. Another key finding of this study is that each subpopulation had a higher ASMR than crude mortality rate (Fig 7, 8), which shows that each state would have suffered more deaths than the United States if it were to have the same age distribution as the United States. This supports the idea that statewide deaths are related not solely to age and population, but also to preexisting conditions and the environment in which people live [14,15].

Another crucial finding of this study is that the deep learning algorithm within the computer program functions both consistently and efficiently. The efficiency of the algorithm can be seen in its runtime of t ≤ 5 seconds (Fig 6a). In addition, each trial fell within the error bars of the others, which means that the algorithm has a consistently low runtime, because there is no statistically significant difference in runtime from one run to the next (Fig 6b). This has strong implications for future use by the public in the form of a publicly available web application.

To refine the conclusions of this study regarding the impact of age distribution on COVID-19 deaths, more experiments can be done in which more U.S. states are compared to the standard population of the United States. Moreover, the algorithm is versatile, so the subpopulations and the standard population can be changed entirely to focus on another region. To refine the conclusions about the newly developed computer program, more trials can be conducted to ensure that the runtimes found in the first three trials are not outliers but are representative of the efficiency and consistency of the program.


References

  1. Wang, D., Li, Z., & Liu, Y. (2020). An overview of the safety, clinical application and antiviral research of the COVID-19 therapeutics. Journal of Infection and Public Health. doi:10.1016/j.jiph.2020.07.004
  2. Brown, S. M., Doom, J. R., Lechuga-Peña, S., Watamura, S. E., & Koppels, T. (2020). Stress and parenting during the global COVID-19 pandemic. Child Abuse & Neglect. doi:10.1016/j.chiabu.2020.104699
  3. Overton, C. E., Stage, H. B., Ahmad, S., Curran-Sebastian, J., Dark, P., Das, R., . . . Webb, L. (2020). Using statistics and mathematical modelling to understand infectious disease outbreaks: COVID-19 as an example. Infectious Disease Modelling, 5, 409-441. doi:10.1016/j.idm.2020.06.008
  4. Tiirinki, H., Tynkkynen, L., Sovala, M., Atkins, S., Koivusalo, M., Rautiainen, P., . . . Keskimäki, I. (2020). COVID-19 pandemic in Finland – preliminary analysis on health system response and economic consequences. Health Policy and Technology. doi:10.1016/j.hlpt.2020.08.005
  5. Russell, T. W., Hellewell, J., Jarvis, C. I., Zandvoort, K. V., Abbott, S., Ratnayake, R., . . . Kucharski, A. J. (2020). Estimating the infection and case fatality ratio for coronavirus disease (COVID-19) using age-adjusted data from the outbreak on the Diamond Princess cruise ship, February 2020. Eurosurveillance, 25(12). doi:10.2807/1560-7917.es.2020.25.12.2000256
  6. Bernardino, G., Benkarim, O., Garza, M. S., Prat-Gonzàlez, S., Sepulveda-Martinez, A., Crispi, F., . . . Ballester, M. A. (2020). Handling confounding variables in statistical shape analysis - application to cardiac remodelling. Medical Image Analysis, 65. doi:10.1016/j.media.2020.101792
  7. Xu, L., Polya, D. A., Li, Q., & Mondal, D. (2020). Association of low-level inorganic arsenic exposure from rice with age-standardized mortality risk of cardiovascular disease (CVD) in England and Wales. Science of The Total Environment, 743. doi:10.1016/j.scitotenv.2020.140534
  8. Shende, R., Gupta, G., & Macherla, S. (2019). Determination of an inflection point for a dosimetric analysis of unflattened beam using the first principle of derivatives by python code programming. Reports of Practical Oncology & Radiotherapy, 24(5), 432-442. doi:10.1016/j.rpor.2019.07.009
  9. Mohamed, M. O., Gale, C. P., Kontopantelis, E., Doran, T., Belder, M. D., Asaria, M., . . . Mamas, M. A. (2020). Sex-differences in mortality rates and underlying conditions for COVID-19 deaths in England and Wales. Mayo Clinic Proceedings. doi:10.1016/j.mayocp.2020.07.009
  10. Kavadi, D. P., Patan, R., Ramachandran, M., & Gandomi, A. H. (2020). Partial derivative Nonlinear Global Pandemic Machine Learning prediction of COVID 19. Chaos, Solitons & Fractals, 139. doi:10.1016/j.chaos.2020.110056
  11. Minicozzi, P., Cassetti, T., Vener, C., & Sant, M. (2018). Analysis of incidence, mortality and survival for pancreatic and biliary tract cancers across Europe, with assessment of influence of revised European age standardisation on estimates. Cancer Epidemiology, 55, 52-60. doi:10.1016/j.canep.2018.04.011
  12. Bosch, Jaume, et al. “Asynchronous Runtime with Distributed Manager for Task-Based Programming Models.” Parallel Computing, vol. 97, 2020, p. 102664, doi:10.1016/j.parco.2020.102664.
  13. Rodriguez-Diaz, Carlos E., et al. “Risk for COVID-19 Infection and Death among Latinos in the United States: Examining Heterogeneity in Transmission Dynamics.” Annals of Epidemiology, 23 July 2020, doi:10.1016/j.annepidem.2020.07.007.
  14. Wiemers, Emily, et al. “Disparities in Vulnerability to Severe Complications from COVID-19 in the United States.” Research in Social Stratification and Mobility, vol. 69, 2020, doi:10.3386/w27294.
  15. Etkin, Yana, et al. “Acute Arterial Thromboembolism in Patients with COVID-19 in the New York City Area.” Annals of Vascular Surgery, 28 Aug. 2020, doi:10.1016/j.avsg.2020.08.085.
  16. Centers for Disease Control and Prevention. www.cdc.gov/.
  17. 2020 World Population by Country, worldpopulationreview.com/.