Authors: Ashmi Jomon
Abstract: Pneumonia remains one of the leading causes of mortality worldwide, necessitating accurate and timely diagnostic tools. Conventional diagnostic approaches often rely on a single modality, such as chest X-rays or CT scans, each providing valuable but distinct clinical information. This paper presents a multimodal deep learning framework that integrates three complementary diagnostic modalities (Chest X-Ray images, Chest CT Scan images, and Lung Sound audio recordings) for robust and flexible pneumonia detection. Three independent deep learning models are developed: a DenseNet121 architecture for Chest X-Ray classification, a ResNet50 architecture for CT Scan analysis, and a custom Convolutional Neural Network (CNN) for Lung Sound classification, in which raw audio recordings are converted into Mel spectrogram images prior to inference. An attention-based Late Fusion mechanism dynamically combines the probability outputs of the individual models, assigning learned trust weights to each modality through an attention network and producing a final consensus prediction via a dedicated consensus network. The complete system is deployed as a Flask-based web application supporting both single-modality and comprehensive multimodal prediction modes, enabling adaptability across different clinical scenarios. Experimental evaluation using standard metrics, including Accuracy, Precision, Recall, F1-Score, and ROC-AUC, demonstrates that the proposed system supports reliable predictions from individual modalities while enabling enhanced inference when multiple modalities are available. The system demonstrates significant potential as an accessible and clinically meaningful decision support tool for early pneumonia detection.
International Journal of Science, Engineering and Technology
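The attention-based Late Fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-modality probabilities and the attention weights (`W_att`) are hypothetical placeholders standing in for the trained attention and consensus networks.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical per-modality outputs [P(normal), P(pneumonia)] from the
# X-ray, CT, and lung-sound models described in the abstract.
p_xray  = np.array([0.20, 0.80])
p_ct    = np.array([0.35, 0.65])
p_sound = np.array([0.10, 0.90])
probs = np.stack([p_xray, p_ct, p_sound])   # shape (3, 2): one row per modality

# Stand-in for the learned attention network: a linear scoring of each
# modality's probability vector (weights here are illustrative, not trained).
W_att = np.array([[1.0, 2.0],
                  [1.0, 2.0],
                  [1.0, 2.0]])
scores = (probs * W_att).sum(axis=1)        # one raw score per modality
trust = softmax(scores)                     # learned "trust" weights, sum to 1

# Consensus step: trust-weighted combination of the modality outputs.
fused = trust @ probs                       # shape (2,), still a distribution
label = "pneumonia" if fused[1] > fused[0] else "normal"
print(trust, fused, label)
```

Because the trust weights form a convex combination, the fused output remains a valid probability distribution, and modalities whose outputs the attention network scores higher dominate the consensus.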