Authors: Dr. A. A. Khatri, Professor V. A. Karad, Shreyash S. Andre, Akash S. Mundhe, Ashish C. Nalawade, Professor Bangar A.P
Abstract: The rapid growth of imaging data, electronic health records, lab reports, and patient-doctor conversations has created a pressing need for intelligent systems that can interpret disparate biomedical data in a unified manner. Conventional unimodal diagnostic systems, which rely solely on textual reports or on medical images, do not capture the complete clinical context needed for sound decision-making. This study explores the application of multimodal machine learning (MML) to clinical diagnostic problems that involve medical images. We describe a framework with four components: a health chatbot built on Google's Gemini API; an X-ray and medical-image analyzer based on a convolutional neural network (CNN); an audio-based health assistant that transcribes and summarizes doctor-patient conversations; and a combined report generator that merges heterogeneous PDFs into a single intelligible document. The components of the proposed MML framework, and their fusion, are demonstrated in a set of experiments. In human trials, the system's diagnoses were indistinguishable from those of human clinicians. The framework was implemented using Python, Streamlit, PyPDF2, ReportLab, Pandas, Matplotlib, and the Google Gemini Pro Vision API. It was evaluated on a curated dataset of 1,000 chest X-ray images spanning five diagnostic classes (Normal, Pneumonia, Tuberculosis, COVID-19, and Lung Cancer), 500 corresponding clinical reports, and 200 audio consultations. The multimodal system achieves an overall accuracy of 94.6%, with a precision of 93.8%, recall of 94.1%, F1-score of 93.9%, and AUC of 0.96. These values are significantly higher than those of the unimodal SVM baseline (78.4%). The results indicate that fusing text, image, and audio modalities, together with explainable visualization, substantially improves diagnostic accuracy, clinical workflow efficiency, and patient understanding.
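As a quick consistency check on the reported metrics: the F1-score is the harmonic mean of precision and recall. The following minimal sketch is not part of the authors' framework; it only recomputes F1 from the precision and recall stated in the abstract. The function name `f1_from_precision_recall` is illustrative.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Degenerate case: no true positives at all.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
reported_precision = 0.938
reported_recall = 0.941

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.939"
```

The result agrees with the reported F1-score of 93.9%. If the reported figures are macro-averaged over the five classes, the averaged F1 need not equal the harmonic mean of the averaged precision and recall exactly, so this is a plausibility check only.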
International Journal of Science, Engineering and Technology