CapsuleVision: An Interpretable Deep Learning Framework For Wireless Capsule Endoscopy Image Classification

14 Apr

Authors: Mrs. N. V. S. Sowjanya, Dammala Bhanu Durgesh, Chodavarapu Sriram, Vara Bhanu Prasad, Penta Rameswar, Sheik Shameerulla

Abstract: Deep learning has significantly advanced medical imaging and computer-aided diagnosis (CAD), enabling accurate disease detection. However, the limited interpretability of deep learning (DL) models restricts their clinical adoption. To address this, Explainable Artificial Intelligence (XAI) techniques are used to better understand model decisions. In endoscopic imaging, diagnosis relies mainly on manual visual inspection, which can be time-consuming and subjective; integrating automated DL systems can improve both accuracy and efficiency. In this study, multiple transfer learning models are applied to a balanced subset of the Kvasir-Capsule dataset consisting of the top nine classes. The Vision Transformer (ViT) achieves the best performance with an F1-score of 97% ± 1%, outperforming existing approaches. Other models, including MobileNetV3Large and ResNet152V2, also achieve F1-scores above 90%. To enhance interpretability, XAI techniques such as Grad-CAM, Grad-CAM++, Layer-CAM, LIME, and SHAP are used to generate heatmaps highlighting important regions in the images. These visual explanations provide insight into model decisions and reduce the black-box nature of DL models. Overall, this work combines high accuracy with improved transparency, contributing to more reliable and trustworthy medical AI systems.
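The Grad-CAM heatmaps mentioned in the abstract reduce to a simple computation: channel-importance weights from gradients, a weighted sum of activation maps, then a ReLU. A minimal NumPy sketch of that core step follows; the activation and gradient arrays here are synthetic stand-ins, whereas in the study they would come from the last convolutional layer of a model such as ResNet152V2.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM core for one image.
    activations, gradients: arrays of shape (channels, H, W), where
    gradients holds d(class score)/d(activations).
    Returns a heatmap in [0, 1] of shape (H, W)."""
    # Channel weights: global-average-pooled gradients (Grad-CAM's alpha_k).
    weights = gradients.mean(axis=(1, 2))                          # (C,)
    # Weighted sum of activation maps, keeping only positive evidence.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)
    # Normalise to [0, 1] for display as a heatmap overlay.
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam

# Synthetic example: 16 feature maps of size 8x8.
rng = np.random.default_rng(0)
acts = rng.random((16, 8, 8))
grads = rng.standard_normal((16, 8, 8))
heatmap = grad_cam(acts, grads)   # upsample to image size before overlaying
```

Grad-CAM++ and Layer-CAM differ only in how the weights are formed (higher-order gradient terms, or element-wise positive gradients), so the same skeleton applies.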

DOI: