Towards Trustworthy AI-Assisted Healthcare: Review on Studies Integrating LLMs, Multimodal Analysis, and Collaborative Filtering for Personalized Diagnosis
| International Journal of Computer Science and Engineering |
| © 2026 by SSRG - IJCSE Journal |
| Volume 13 Issue 1 |
| Year of Publication : 2026 |
| Authors : Savithri, A V L N Sujith |
How to Cite?
Savithri, A V L N Sujith, "Towards Trustworthy AI-Assisted Healthcare: Review on Studies Integrating LLMs, Multimodal Analysis, and Collaborative Filtering for Personalized Diagnosis," SSRG International Journal of Computer Science and Engineering, vol. 13, no. 1, pp. 1-9, 2026. Crossref, https://doi.org/10.14445/23488387/IJCSE-V13I1P101
Abstract:
The accelerated growth of digital health records, multimodal patient data, and unstructured clinical narratives has overwhelmed conventional healthcare recommendation systems, which cannot handle complex long-term histories, contextual reasoning, or multimodal integration. Although Large Language Models (LLMs) have improved natural language understanding and decision support, hallucinations, limited interpretability, safety risks, domain bias, and inconsistent responses continue to undermine their clinical reliability. This study introduces a hybrid architecture that integrates LLMs (for contextual reasoning), multimodal modules (for clinical image and report analysis), and graph-based collaborative filtering (for learning longitudinal patient interactions and collaborative cues). To address hallucinations and uncertainty, the framework incorporates retrieval-augmented generation, multi-LLM ensemble uncertainty quantification, and knowledge-grounded verification. Explainability is built in through reasoning traces, uncertainty maps, and clinician-facing justifications that support trust and validation. The system is rigorously evaluated on real clinical data and established benchmarks (MedQA, MultiMedQA, MedHalu, and emerging suites such as HealthBench and DiagnosisArena). Early findings show higher diagnostic accuracy, fewer hallucinations (a rate below 2 percent), greater safety under adversarial use, and better personalization than either standalone LLMs or conventional methods. By bridging the gap between knowledge-grounded reasoning and collaborative longitudinal recommendation, this work paves the way for safe, equitable, and clinically viable decision support tools that augment rather than replace human expertise.
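The abstract describes a multi-LLM ensemble with uncertainty quantification as one of the framework's hallucination-mitigation mechanisms. The sketch below illustrates one common way such an ensemble can be realized, using answer agreement across models as a confidence proxy; the stub model callables and the 0.7 review threshold are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: multi-LLM ensemble uncertainty quantification via answer agreement.
# The model callables and the agreement threshold below are hypothetical stand-ins,
# not the paper's actual implementation.
from collections import Counter
from typing import Callable, List

def ensemble_answer(question: str,
                    models: List[Callable[[str], str]],
                    min_agreement: float = 0.7) -> dict:
    """Query each model, then treat the fraction of models agreeing on the
    top answer as a rough confidence score; low agreement flags the case
    for clinician review rather than autonomous output."""
    answers = [m(question) for m in models]
    counts = Counter(a.strip().lower() for a in answers)
    top_answer, top_count = counts.most_common(1)[0]
    agreement = top_count / len(answers)
    return {
        "answer": top_answer,
        "agreement": agreement,                          # 1.0 = unanimous ensemble
        "needs_clinician_review": agreement < min_agreement,
        "all_answers": answers,                          # kept for the audit trail
    }

if __name__ == "__main__":
    # Hypothetical stand-ins for independent medical LLM endpoints.
    stub_models = [lambda q: "pneumonia",
                   lambda q: "pneumonia",
                   lambda q: "bronchitis"]
    result = ensemble_answer("Most likely diagnosis for the vignette?", stub_models)
    print(result)  # agreement = 2/3 -> flagged for clinician review
```

In a deployed system, the stubs would be replaced with calls to independently trained or prompted medical LLMs, and flagged cases would be routed to a clinician, consistent with the human-in-the-loop stance the abstract takes.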
Keywords:
Recommendation Systems, AI in Healthcare, LLM, GPT Models.
References:
[1] Takanobu Hirosawa et al., “Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study,” International Journal of Environmental Research and Public Health, vol. 20, no. 4, pp. 1-10, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[2] Alec Radford et al., “Language Models are Unsupervised Multitask Learners,” OpenAI Blog, pp. 1-24, 2019.
[Google Scholar] [Publisher Link]
[3] Karan Singhal et al., “Large Language Models Encode Clinical Knowledge (Version 1),” arXiv:2212.13138, pp. 1-44, 2022.
[CrossRef] [Google Scholar] [Publisher Link]
[4] Sébastien Bubeck et al., “Sparks of Artificial General Intelligence: Early Experiments with GPT-4 (Version 1),” arXiv:2303.12712, pp. 1-155, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[5] Zahra Atf et al., “The Challenge of Uncertainty Quantification of Large Language Models in Medicine (Version 1),” arXiv:2504.05278, pp. 1-25, 2025.
[CrossRef] [Google Scholar] [Publisher Link]
[6] Jason Wei et al., “Emergent Abilities of Large Language Models,” Transactions on Machine Learning Research, pp. 1-30, 2022.
[CrossRef] [Google Scholar] [Publisher Link]
[7] Vibhor Agarwal et al., “MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models (Version 2),” arXiv:2409.19492, pp. 1-13, 2025.
[CrossRef] [Google Scholar] [Publisher Link]
[8] Josh Achiam et al., “GPT-4 Technical Report (Version 1),” arXiv:2303.08774, pp. 1-100, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[9] Yubo Ma et al., “SciAgent: Tool-Augmented Language Models for Scientific Reasoning,” arXiv:2402.11451, pp. 1-34, 2024.
[CrossRef] [Google Scholar] [Publisher Link]
[10] Yujia Qin et al., “Tool Learning with Foundation Models (Version 1),” arXiv:2304.08354, pp. 1-75, 2024.
[CrossRef] [Google Scholar] [Publisher Link]
[11] Sijia Chen et al., “CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs (Version 1),” arXiv:2505.11413, pp. 1-31, 2025.
[CrossRef] [Google Scholar] [Publisher Link]
[12] Harsha Nori et al., “Capabilities of GPT-4 on Medical Challenge Problems (Version 1),” arXiv:2303.13375, pp. 1-35, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[13] Guangyu Wang et al., “ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation (Version 1),” arXiv:2306.09968, pp. 1-11, 2023.
[CrossRef] [Google Scholar] [Publisher Link]
[14] Long Ouyang et al., “Training Language Models to Follow Instructions with Human Feedback,” Advances in Neural Information Processing Systems, vol. 35, pp. 27730-27744, 2022.
[Google Scholar] [Publisher Link]
[15] Yexiao He et al., “MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility (Version 1),” arXiv:2506.00235, pp. 1-21, 2025.
[CrossRef] [Google Scholar] [Publisher Link]
[16] Congzhen Shi et al., “Towards Trustworthy Foundation Models for Medical Image Analysis,” arXiv:2407.15851, pp. 1-44, 2024.
[Google Scholar] [Publisher Link]
