Improving the Accuracy of Early Diagnosis of Dengue Hemorrhagic Fever Based on Clinical Symptoms Using Random Forest

Suci Lestari; Rudi Kurniawan; Bani Nurhakim; Ahmad Rifa'I; Ryan Hamonangan

doi:10.59934/jaiea.v5i2.2170

Authors

Suci Lestari STMIK IKMI Cirebon
Rudi Kurniawan STMIK IKMI Cirebon
Bani Nurhakim STMIK IKMI Cirebon
Ahmad Rifa'I STMIK IKMI Cirebon
Ryan Hamonangan STMIK IKMI Cirebon

Keywords:

Dengue Hemorrhagic Fever, Clinical Diagnosis, Machine Learning, Random Forest, SHAP

Abstract

Advances in machine learning for healthcare offer important opportunities to improve the accuracy of disease diagnosis, including that of dengue hemorrhagic fever (DHF), which remains a major health problem in Indonesia. This study develops an early DHF diagnosis model based on the Random Forest algorithm, using clinical symptom data from patients at the Kosasih Group Clinic. The research follows a quantitative approach structured around the CRISP-DM stages: data acquisition, validation, cleaning, and preprocessing, the last of which covers missing-value handling, normalization, and class-imbalance management with SMOTE. The dataset was then divided using stratified sampling to preserve class proportions, and a Random Forest model was trained with hyperparameters tuned by Bayesian Optimization. Performance was evaluated with accuracy, precision, recall, F1-score, and ROC-AUC, and validated with stratified k-fold cross-validation to check model stability. Model interpretability was analyzed using SHAP and LIME to identify the contribution of each clinical symptom to the prediction. The results showed high classification performance, with increased sensitivity to DHF cases after applying SMOTE, and consistent interpretations that highlighted clinical symptoms such as fever, joint pain, and nausea. These findings confirm the potential of Random Forest as a reliable model to support AI-based clinical decision support systems (CDSS) for early diagnosis of DHF in primary health care facilities.
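
To make the stratified sampling and SMOTE steps concrete, the following is a minimal pure-Python sketch rather than the study's actual implementation. The function names `stratified_split` and `smote`, the toy feature layout, and all parameter values are assumptions introduced for illustration only. The sketch also assumes that oversampling is applied to the training portion alone, a common precaution against leakage that may differ from the order of steps used in the study.

```python
import math
import random
from collections import defaultdict


def stratified_split(X, y, test_ratio=0.2, seed=0):
    """Split row indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_ratio))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


def smote(X, y, minority_label, k=3, seed=0):
    """Oversample the minority class until it matches the largest class.

    Each synthetic row lies on the segment between a minority row and one
    of its k nearest minority neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    counts = defaultdict(int)
    for label in y:
        counts[label] += 1
    n_needed = max(counts.values()) - counts[minority_label]
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        X_out.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
        y_out.append(minority_label)
    return X_out, y_out


# Hypothetical toy data: [temperature_c, joint_pain (0/1), nausea (0/1)]
X = [[36.8, 0, 0], [37.0, 0, 0], [37.2, 0, 1], [36.9, 1, 0],
     [37.1, 0, 0], [37.4, 0, 0], [36.7, 0, 1], [37.3, 1, 0],
     [39.1, 1, 1], [38.8, 1, 1], [39.4, 1, 0], [38.9, 0, 1]]
y = [0] * 8 + [1] * 4  # 1 = DHF, 0 = non-DHF (labels are illustrative)

train_idx, test_idx = stratified_split(X, y, test_ratio=0.25)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]

# Balance only the training rows; the test rows stay untouched.
X_bal, y_bal = smote(X_train, y_train, minority_label=1, k=2)
```

Plain interpolation turns binary symptom flags into fractional values, so for symptom data of this kind a variant designed for mixed categorical and continuous features, such as SMOTE-NC, may be a better fit. In practice a maintained library implementation would normally be preferred over a hand-written one like this.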
