Interpretable machine learning approach for early cancer detection

Babajide Olakunle Afeni 1, Peter Adetola Adetunji 2, *, Ibrahim Olakunle Yakub 3 and Philip Adetayo Adetunji 4

1 School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, TS1 3BX, UK.
2 School of Computing and Engineering, University of Huddersfield, Huddersfield, HD1 3DH, UK.
3 Faculty of Engineering and Digital Technologies, University of Bradford, BD7 1DP.
4 Department of Computer Science, Federal University Lokoja, PMB 1154.
 
Review
International Journal of Science and Research Archive, 2024, 13(02), 3402-3413.
Article DOI: 10.30574/ijsra.2024.13.2.2586
Publication history: 
Received on 14 November 2024; revised on 22 December 2024; accepted on 25 December 2024
 
Abstract: 
Early cancer detection significantly improves treatment outcomes and patient survival rates. This study compares the efficacy of several machine learning models, including Logistic Regression, Support Vector Machine (SVM), Random Forest, XGBoost, and a Neural Network, in predicting early-stage cancer. We apply the Local Interpretable Model-agnostic Explanations (LIME) approach to explain the models' predictions, supporting the transparency and interpretability that clinical application requires. The models were evaluated on a dataset whose key features include cancer history, gender, smoking status, age, BMI, genetic risk, alcohol intake, and physical activity. Random Forest and XGBoost performed best, achieving the highest balanced accuracy and AUC scores. LIME visualizations showed that cancer history and gender were the most influential features across all models, with further contributions from smoking status, age, and BMI. The study highlights the potential of tree-based models for accurate and interpretable cancer detection that gives clinicians actionable insights. Our findings support integrating these models into clinical practice to enable early intervention and personalized treatment strategies. Further research is recommended to validate the models in larger and more diverse populations and to explore whether additional medical data improves predictive accuracy.
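For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how balanced accuracy and AUC can be computed for a binary classifier. It is an illustration only, not the study's evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions introduced here, and no values from the study are reproduced.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    return 0.5 * (tp / pos + tn / neg)


def roc_auc(y_true, scores):
    """Probability that a random positive case scores above a random negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = cancer, 0 = no cancer (not taken from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.6, 0.05]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("balanced accuracy:", round(balanced_accuracy(y_true, y_pred), 4))
print("AUC:", round(roc_auc(y_true, scores), 4))
```

Balanced accuracy weights both classes equally, which matters when cancer cases are the minority class. AUC is independent of the threshold, so the two metrics describe different aspects of model performance.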
 
Keywords: 
Cancer Detection; Machine Learning Model; LIME Models; Medical History Analysis; Early Diagnosis
 