A Hybrid CNN-Multi-Class SVM Framework for Biomedical Document Gene-Disease Datasets Classification

International Journal of Electronics and Communication Engineering
© 2023 by SSRG - IJECE Journal
Volume 10, Issue 12
Year of Publication: 2023
Authors : Jose Mary Golamari, D. Haritha
How to Cite:

Jose Mary Golamari, D. Haritha, "A Hybrid CNN-Multi-Class SVM Framework for Biomedical Document Gene-Disease Datasets Classification," SSRG International Journal of Electronics and Communication Engineering, vol. 10, no. 12, pp. 73-82, 2023. Crossref, https://doi.org/10.14445/23488549/IJECE-V10I12P107

Abstract:

Healthcare investigators and clinicians need biomedical document classification to organize and manage the large volume of biomedical literature. Conventional classification methods rely on manually designed features, which can be time-consuming to build and may not capture the complexity of biomedical text. The high dimensionality and sparsity of biomedical data also challenge current approaches. CNNs are computationally costly on large datasets, so making feature extraction more efficient shortens training and inference times. The proposed method aims to considerably improve the accuracy of biomedical document classification. It works in two stages: feature extraction and classification. It takes a hybrid approach that focuses on the intricate interactions between genes, diseases, and chemical treatments, combining a CNN with a Multi-class Support Vector Machine (M-SVM): the CNN extracts features, and the M-SVM serves as the classifier. This work discusses improved CNNs, which can extract more discriminative and informative features from the input data and therefore represent the underlying patterns and connections more accurately. An M-SVM based on Error-Correcting Output Coding (ECOC) handles noisy data by combining the outputs of many binary classifiers, which lets it recover from errors made by individual classifiers and thereby lowers the risk of overfitting. The results show that the proposed model is effective, achieving an accuracy of 99.28% and an F1-score of 99.84% on biomedical document datasets.
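
The two stages named above can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: the abstract does not specify the improved CNN's architecture, the embedding size, the number of classes, or the ECOC codebook. The function `cnn_features` shows a generic text-CNN step (convolution over token embeddings, ReLU, global max pooling), and `ecoc_decode` shows how an ECOC scheme combines binary classifier outputs by Hamming decoding so that one faulty classifier does not change the prediction. The function names, the four placeholder labels `class_0` to `class_3`, the 7-bit exhaustive codebook, and every numeric value are hypothetical; the decision values are made-up numbers standing in for the outputs of trained binary SVMs, which this sketch does not train.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def cnn_features(tokens: Sequence[Vector],
                 filters: Sequence[Sequence[Vector]]) -> Vector:
    """Generic text-CNN feature extraction: slide each filter over the
    token-embedding sequence, apply ReLU, then take the global maximum."""
    features: Vector = []
    for filt in filters:
        width = len(filt)
        pooled = 0.0  # starting at 0.0 applies ReLU before max pooling
        for start in range(len(tokens) - width + 1):
            window = tokens[start:start + width]
            activation = sum(
                w * x
                for w_row, x_row in zip(filt, window)
                for w, x in zip(w_row, x_row)
            )
            pooled = max(pooled, activation)
        features.append(pooled)
    return features


# Hypothetical exhaustive ECOC codebook for four placeholder classes:
# one row per class, one column per binary classifier (+1 / -1 targets).
# Rows differ pairwise in 4 positions, so one wrong bit is corrected.
CODEBOOK: Dict[str, List[int]] = {
    "class_0": [+1, +1, +1, +1, +1, +1, +1],
    "class_1": [-1, -1, -1, -1, +1, +1, +1],
    "class_2": [-1, -1, +1, +1, -1, -1, +1],
    "class_3": [-1, +1, -1, +1, -1, +1, -1],
}


def ecoc_decode(decision_values: Sequence[float]) -> str:
    """Hamming decoding: return the class whose codeword disagrees with the
    signs of the binary decision values in the fewest positions."""
    signs = [1 if v >= 0 else -1 for v in decision_values]

    def hamming(label: str) -> int:
        return sum(s != c for s, c in zip(signs, CODEBOOK[label]))

    return min(CODEBOOK, key=hamming)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings for a three-token document.
    doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    filters = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, -1.0]]]
    print(cnn_features(doc, filters))  # [2.0, 0.0]

    # Made-up decision values standing in for seven trained binary SVMs;
    # the first value has the wrong sign, yet decoding still gives class_2.
    print(ecoc_decode([0.2, -1.2, 0.9, 1.1, -0.7, -1.3, 0.6]))  # class_2
```

The exhaustive code is chosen here because, for four classes, its seven columns are all distinct binary splits of the classes and its rows are pairwise four bits apart. A single wrong binary prediction is therefore always corrected, while two simultaneous errors can already produce a tie.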

Keywords:

Biomedical documents, Gene data, Feature extraction, Classification, CNN, M-SVM.
