Classification and Analysis of Patients with COVID-19 Using Machine Learning
Abstract
The rapid spread of coronavirus disease (COVID-19) has prompted research in many fields in the search for treatments, vaccines and preventive measures. The pandemic is especially challenging because of the substantial demand it places on medical infrastructure. In this context, this paper applies Machine Learning methods to classify and analyse the outcome of patients with COVID-19 as discharge or death, and to describe the profile of patients infected by the coronavirus. The dataset consists of clinical data from Sírio Libanês Hospital, available in the FAPESP repository (2020). Results indicate that, among all tested classifiers, the Naive Bayes algorithm performs best and best represents the phenomenon under study, both in classification and in the numerical induction analysis of the epidemiological phenomenon for COVID-19.
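As a rough illustration of the kind of classifier named in the abstract, the following is a minimal sketch of a Gaussian Naive Bayes model that labels a patient record as discharge or death. It is not the authors' implementation: the feature names (age and a generic laboratory value), the function names and the six toy records are hypothetical and synthetic, chosen only to make the example self-contained.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior, feature means and feature variances for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A small variance floor avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in c) / len(c) + 1e-9
            for c, m in zip(cols, means)
        ]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior under the independence assumption."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Synthetic toy records (hypothetical): [age in years, generic lab value].
X = [[35, 1.0], [42, 1.2], [50, 0.9], [78, 3.1], [83, 2.8], [71, 3.5]]
y = ["discharge", "discharge", "discharge", "death", "death", "death"]

model = fit_gaussian_nb(X, y)
younger_patient = predict(model, [40, 1.1])
older_patient = predict(model, [80, 3.0])
```

The sketch treats each feature as conditionally independent given the outcome, which is the defining assumption of Naive Bayes. A real analysis would need held-out evaluation and the clinical variables actually present in the dataset.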
This work is licensed under a Creative Commons Attribution 4.0 International License.
References
Acter, T., Uddin, N., Das, J., Akhter, A., Choudhury, T.R., Kim, S. Evolution of severe acute respiratory syndrome coronavirus 2 (sars-cov-2) as coronavirus disease 2019 (covid-19) pandemic: a global health emergency. Science of the Total Environment. 730, e138996 (2020).
Aggarwal, C.C. Data classification: algorithms and applications. (CRC Press, Yorktown Heights, New York, USA, 2014).
Ahmad, A., Garhwal, S., Ray, S.K., Kumar, G., Malebary, S.J., Barukab, O.M. The number of confirmed cases of covid-19 by using machine learning: methods and challenges. Archives of Computational Methods in Engineering. 28, 2645-2653 (2020).
Alelyani, S., Liu, H., Wang, L. The effect of the characteristics of the dataset on the selection stability. In: INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE. IEEE. Proceedings. 970-977 (2011).
Alimadadi, A., Aryal, S., Manandhar, I., Munroe, P.B., Joe, B., Cheng, X. Artificial intelligence and machine learning to fight covid-19. American Physiological Society Bethesda, MD, (2020).
Allam, M., Cai, S., Ganesh, S., Venkatesan, M., Doodhwala, S., Song, Z., Hu, T., Kumar, A., Heit, J., Coskun, A.F., et al. Covid-19 diagnostics, tools, and prevention. Diagnostics. 10, 1-33 (2020).
Beckmann, J.S., Lew, D. Reconciling evidence-based medicine and precision medicine in the era of big data: challenges and opportunities. Genome medicine. 8, 1-11 (2016).
Box, G.E.P., Tiao, G.C. Bayesian inference in statistical analysis. (John Wiley and Sons, Canada, 1992).
Brazil. Ministry of Health. Coronavirus Panel Brazil. Available at <https://covid.saude.gov.br>, (accessed on April 26, 2022).
Breiman, L. Random forests. Machine learning. 45, 5-32 (2001).
Dong, E., Du, H., Gardner, L. An interactive web-based dashboard to track covid-19 in real time. The Lancet infectious diseases. 20, 533-534 (2020).
Dougherty, G. Pattern recognition and classification: an introduction. (Springer Science & Business Media, California, USA, 2012).
FAPESP. FAPESP COVID-19 Data Sharing/BR, (2020).
Gao, Y., Cai, G.Y., Fang, W., Li, H.Y., Wang, S.Y., Chen, L., Yu, Y., Liu, D., Xu, S., Cui, P.F., et al. Machine learning based early warning system enables accurate mortality risk prediction for covid-19. Nature communications. 11, 1-10 (2020).
Gordis, L. Epidemiology. (Elsevier Saunders, Philadelphia, PA, 2014).
Grzybowski, J.M.V., Da Silva, R.V., Rafikov, M. Expanded seircq model applied to covid-19 epidemic control strategy design and medical infrastructure planning. Mathematical Problems in Engineering. e8198563 (2020).
Han, J., Kamber, M., Pei, J. Data mining: concepts and techniques. (Morgan Kaufmann, Burlington, MA, USA, 2012).
Hou, W., Zhao, Z., Chen, A., Li, H., Duong, T.Q. Machining learning predicts the need for escalated care and mortality in COVID-19 patients from clinical variables. International Journal of Medical Sciences. 18 (8), 1739-1745 (2021).
Lalmuanawma, S., Hussain, J., Chhakchhuak, L. Applications of machine learning and artificial intelligence for covid-19 (sars-cov-2) pandemic: a review. Chaos, Solitons & Fractals, e110059 (2020).
Maimon, O.Z., Rokach, L. Data mining with decision trees: theory and applications. (World Scientific, 2014).
Mello, L. E. et al. Opening Brazilian COVID-19 patient data to support world research on pandemics. Zenodo, (2020).
Nemati, M., Ansary, J., Nemati, N. Machine-learning approaches in covid-19 survival analysis and discharge-time likelihood prediction using clinical data. Patterns. 1, e100074 (2020).
R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/, (2020).
Ranzani, O.T., Bastos, L.S.L., Gelli, J.G.M., Marchesi, J.F., Baião, F., Hamacher, S., Bozza, F.A. Characterisation of the first 250 000 hospital admissions for covid-19 in Brazil: a retrospective analysis of nationwide data. The Lancet Respiratory Medicine. (2021).
Rodríguez-Morales, A., Macgregor, K., Kanagarajah, S., Patel, D., Schlagenhauf, P. Going global – travel and the 2019 novel coronavirus. Travel medicine and infectious disease. 33, e101578 (2020).
Steyerberg, E.W., Vickers, A.J., Cook, N.R., Gerds, T., Gonen, M., Obuchowski, N., Pencina, M.J., Kattan, M.W. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 21, 128-138 (2010).
Vere, J., Gibson, B. Evidence-based medicine as science. Journal of Evaluation in Clinical Practice. 25, 997-1002 (2019).
Yadav, M., Perumal, M., Srinivas, M. Analysis on novel coronavirus (covid-19) using machine learning methods. Chaos, Solitons & Fractals. 139, 110050 (2020).
Zeng, X., Zhang, Y., Kwong, J.S.W., Zhang, C., Li, S., Sun, F., Niu, Y., Du, L. The methodological quality assessment tools for preclinical and clinical studies, systematic review and meta-analysis, and clinical practice guideline: a systematic review. Journal of evidence-based medicine. 8, 2-10 (2015).