TL-BERT: An Anti-Phishing Model Based on Transfer Learning and Transformer Mechanisms for Protective Social Networking

Manoj Kumar Prabakaran; Abinaya Devi Chandrasekar; Santhi Selvaraj; Abinaya Pandiarajan

doi:10.14445/23488549/IJECE-V13I1P103

International Journal of Electronics and Communication Engineering

Volume 13 Issue 1

Year of Publication: 2026

Authors: Manoj Kumar Prabakaran, Abinaya Devi Chandrasekar, Santhi Selvaraj, Abinaya Pandiarajan

How to Cite?

Manoj Kumar Prabakaran, Abinaya Devi Chandrasekar, Santhi Selvaraj, Abinaya Pandiarajan, "TL-BERT: An Anti-Phishing Model Based on Transfer Learning and Transformer Mechanisms for Protective Social Networking," SSRG International Journal of Electronics and Communication Engineering, vol. 13, no. 1, pp. 27-45, 2026. Crossref, https://doi.org/10.14445/23488549/IJECE-V13I1P103

Abstract:

Cybercrimes are growing exponentially in the digital era, and hackers continue to devise sophisticated cyber threats to gain unauthorized access. Among these threats, phishing remains one of the most prevalent and deceptive techniques for exploiting unsuspecting users. Although researchers have proposed various preventive measures over the past few decades, phishers continually adopt new strategies, deploying varied forms of phishing URLs and webpage content that are difficult to detect in real time. To address this issue, this work proposes TL-BERT, an anti-phishing model that integrates Transfer Learning (TL) with the Bidirectional Encoder Representations from Transformers (BERT) architecture. The model employs TL-adapted autoencoders to extract URL-based features and applies BERT to capture textual features from a website's HTML. The two feature sets are concatenated and classified by a Deep Neural Network (DNN). Experiments were conducted on the benchmark ISCXURL2016 dataset, which contains 54,300 URL samples. The results indicate that TL-BERT attains a detection accuracy of 99.08% with a false positive rate of 1.01%. Because it is built from carefully selected lightweight architectures, the proposed model is suitable for real-time deployment.
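The abstract describes a late-fusion design: URL features from a TL-adapted autoencoder and HTML text features from BERT are concatenated and passed to a deep neural network, and performance is reported as accuracy and false positive rate. The sketch below illustrates only the fusion-and-classify step and those two metrics in plain Python. The feature values, layer sizes, weights, and function names (`concat_features`, `dense`, `phishing_score`, `detection_metrics`) are hypothetical placeholders, not the authors' trained model or code, and phishing is assumed to be the positive class.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def concat_features(url_feats: Sequence[float], html_feats: Sequence[float]) -> Vector:
    """Late fusion by concatenation: the URL vector followed by the HTML vector."""
    return list(url_feats) + list(html_feats)


def dense(x: Sequence[float], weights: List[Vector], bias: Sequence[float],
          activation: Callable[[float], float]) -> Vector:
    """Fully connected layer; `weights` holds one row per output unit."""
    if len(weights) != len(bias):
        raise ValueError("need one bias per output unit")
    out = []
    for row, b in zip(weights, bias):
        if len(row) != len(x):
            raise ValueError("weight row length must match input length")
        out.append(activation(sum(w * xi for w, xi in zip(row, x)) + b))
    return out


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def phishing_score(url_feats: Sequence[float], html_feats: Sequence[float],
                   params: Dict[str, list]) -> float:
    """Concatenate both feature sets, apply one ReLU hidden layer, return P(phishing)."""
    x = concat_features(url_feats, html_feats)
    hidden = dense(x, params["W1"], params["b1"], relu)
    (p,) = dense(hidden, params["W2"], params["b2"], sigmoid)
    return p


def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (accuracy, false positive rate) with phishing as the positive class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    fpr = fp / (fp + tn)  # share of legitimate sites wrongly flagged as phishing
    return accuracy, fpr


# Toy usage with hypothetical numbers (not the paper's data or weights).
params = {
    "W1": [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0]],
    "b1": [0.0, 0.0],
    "W2": [[1.0, 1.0]],
    "b2": [-1.1],
}
score = phishing_score([0.2, 0.7], [0.1, 0.4, 0.9], params)  # 0.5
acc, fpr = detection_metrics(tp=90, tn=95, fp=5, fn=10)       # (0.925, 0.05)
```

In practice both feature vectors would be far longer (BERT-base, for instance, produces 768-dimensional hidden states), and the weights would be learned rather than hand-set. Concatenation keeps each feature set intact and leaves it to the dense layers to learn interactions between URL and HTML evidence; the cost is that the classifier's input size is the sum of both feature dimensions.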

Keywords:

Bidirectional Encoder Representations from Transformers, Hypertext Markup Language, Phishing detection, Transfer Learning, Uniform Resource Locator.

References:

[1] Ike Vayansky, and Sathish Kumar, “Phishing – Challenges and Solutions,” Computer Fraud & Security, vol. 2018, no. 1, pp. 15-20, 2018.
[2] Janos Szurdi et al., “The Long ‘Taile’ of Typosquatting Domain Names,” Proceedings of the 23rd USENIX Security Symposium, San Diego, CA, pp. 191-206, 2014.
[3] Anti-Phishing Working Group, “Phishing Activity Trends Report, 3rd Quarter 2024,” Unifying the Global Response to Cybercrime, pp. 1-11, 2024.
[4] Tara Baniya, Dipesh Gautam, and Yoohwan Kim, “Safeguarding Web Surfing with URL Blacklisting,” 2015 12th International Conference on Information Technology - New Generations, Las Vegas, NV, USA, pp. 157-162, 2015.
[5] Steve Sheng et al., “An Empirical Analysis of Phishing Blacklists,” Proceedings of the 6th Conference on Email and Anti-spam (CEAS), Mountain View, California, USA, pp. 1-10, 2009.
[6] Ammar Odeh, Ismail Keshta, and Eman Abdelfattah, “Machine Learning Techniques for Detection of Website Phishing: A Review for Promises and Challenges,” 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), NV, USA, pp. 813-818, 2021.
[7] Shamshair Ali et al., “Comparative Evaluation of AI-Based Techniques for Zero-Day Attacks Detection,” Electronics, vol. 11, no. 23, pp. 1-25, 2022.
[8] Manu J. Pillai et al., “Evasion Attacks and Defense Mechanisms for Machine Learning-Based Web Phishing Classifiers,” IEEE Access, vol. 12, pp. 19375-19387, 2024.
[9] Naya Nagy et al., “Phishing URLs Detection Using Sequential and Parallel ML Techniques: Comparative Analysis,” Sensors, vol. 23, no. 7, pp. 1-17, 2023.
[10] Abdul Karim et al., “Phishing Detection System through Hybrid Machine Learning Based on URL,” IEEE Access, vol. 11, pp. 36805-36822, 2023.
[11] Alsharif Abuadbba et al., “Towards Web Phishing Detection Limitations and Mitigation,” arXiv Preprint, pp. 1-12, 2022.
[12] Subhash Ariyadasa, Shantha Fernando, and Subha Fernando, “Combining Long-Term Recurrent Convolutional and Graph Convolutional Networks to Detect Phishing Sites Using URL and HTML,” IEEE Access, vol. 10, pp. 82355-82375, 2022.
[13] Chenguang Wang, and Yuanyuan Chen, “TCURL: Exploring Hybrid Transformer and Convolutional Neural Network on Phishing URL Detection,” Knowledge-Based Systems, vol. 258, 2022.
[14] Subhash Ariyadasa, Subha Fernando, and Shantha Fernando, “Detecting Phishing Attacks Using a Combined Model of LSTM and CNN,” International Journal of Advanced and Applied Sciences, vol. 7, no. 7, pp. 56-67, 2020.
[15] Chidimma Opara, Yingke Chen, and Bo Wei, “Look Before You Leap: Detecting Phishing Web Pages by Exploiting Raw URL and HTML Characteristics,” Expert Systems with Applications, vol. 236, pp. 1-13, 2024.
[16] Stay safe on eBay, eBay. [Online]. Available: https://pages.ebay.com/securitycenter/
[17] Netcraft Anti-Phishing Toolbar, Netcraft. [Online]. Available: https://toolbar.netcraft.com
[18] WOT: Web of Trust – Website Reputation and Security, Web of Trust. [Online]. Available: https://www.mywot.com
[19] Google Safe Browsing: Protecting Users from Phishing and Malware, Google Security Blog. [Online]. Available: https://safebrowsing.google.com
[20] McAfee SiteAdvisor: Website Safety Ratings and Security Analysis, McAfee Security. [Online]. Available: https://www.mcafee.com/en-in/safe-browser/mcafee-webadvisor.html
[21] Microsoft Defender SmartScreen: Protection against Phishing and Malware, Microsoft Security. [Online]. Available: https://learn.microsoft.com/en-us/windows/security/operating-system-security/virus-and-threat-protection/microsoft-defender-smartscreen/
[22] Forcepoint ThreatSeeker, Forcepoint. [Online]. Available: https://www.forcepoint.com/product/feature/threatseeker
[23] Mahmoud Khonji, Youssef Iraqi, and Andrew Jones, “Phishing Detection: A Literature Survey,” IEEE Communications Surveys & Tutorials, vol. 15, no. 4, pp. 2091-2121, 2013.
[24] Lizhen Tang, and Qusay H. Mahmoud, “A Survey of Machine Learning-Based Solutions for Phishing Website Detection,” Machine Learning and Knowledge Extraction, vol. 3, no. 3, pp. 672-694, 2021.
[25] Doyen Sahoo, Chenghao Liu, and Steven C.H. Hoi, “Malicious URL Detection using Machine Learning: A Survey,” arXiv Preprint, pp. 1-37, 2017.
[26] Brij B. Gupta et al., “A Novel Approach for Phishing URLs Detection Using Lexical Based Machine Learning in a Real-Time Environment,” Computer Communications, vol. 175, pp. 47-57, 2021.
[27] Sajjad Jalil, Muhammad Usman, and Alvis Fong, “Highly Accurate Phishing URL Detection Based on Machine Learning,” Journal of Ambient Intelligence and Humanized Computing, vol. 14, no. 7, pp. 9233-9251, 2023.
[28] Ankit Kumar Jain, and B.B. Gupta, “A Machine Learning Based Approach for Phishing Detection Using Hyperlinks Information,” Journal of Ambient Intelligence and Humanized Computing, vol. 10, pp. 2015-2028, 2019.
[29] M.A. Adebowale et al., “Intelligent Web-Phishing Detection and Protection Scheme using Integrated Features of Images, Frames and Text,” Expert Systems with Applications, vol. 115, pp. 300-313, 2019.
[30] Rizka Widyarini Purwanto et al., “PhishSim: Aiding Phishing Website Detection with a Feature-Free Tool,” IEEE Transactions on Information Forensics and Security, vol. 17, pp. 1497-1512, 2022.
[31] Sultan Asiri et al., “A Survey of Intelligent Detection Designs of HTML URL Phishing Attacks,” IEEE Access, vol. 11, pp. 6421-6443, 2023.
[32] Huaping Yuan et al., “Detecting Phishing Websites and Targets Based on URLs and Webpage Links,” 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, pp. 3669-3674, 2018.
[33] Ali Aljofey et al., “An Effective Detection Approach for Phishing Websites Using URL and HTML Features,” Scientific Reports, vol. 12, pp. 1-19, 2022.
[34] Peng Yang, Guangzhen Zhao, and Peng Zeng, “Phishing Website Detection Based on Multidimensional Features Driven by Deep Learning,” IEEE Access, vol. 7, pp. 15196-15209, 2019.
[35] Wenhao Li et al., “A State-of-the-Art Review on Phishing Website Detection Techniques,” IEEE Access, vol. 12, pp. 187976-188012, 2024.
[36] Xi Xiao et al., “CNN–MHSA: A Convolutional Neural Network and Multi-Head Self-Attention Combined Approach for Detecting Phishing Websites,” Neural Networks, vol. 125, pp. 303-312, 2020.
[37] Erzhou Zhu et al., “CCBLA: A Lightweight Phishing Detection Model Based on CNN, BiLSTM, and Attention Mechanism,” Cognitive Computation, vol. 15, pp. 1320-1333, 2023.
[38] Pranav Maneriker et al., “URLTran: Improving Phishing URL Detection Using Transformers,” MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM), San Diego, CA, USA, pp. 197-204, 2021.
[39] Katherine Haynes, Hossein Shirazi, and Indrakshi Ray, “Lightweight URL-Based Phishing Detection Using Natural Language Processing Transformers for Mobile Devices,” Procedia Computer Science, vol. 191, pp. 127-134, 2021.
[40] Nguyet Quang Do et al., “An Integrated Model Based on Deep Learning Classifiers and Pre-Trained Transformer for Phishing URL Detection,” Future Generation Computer Systems, vol. 161, pp. 269-285, 2024.
[41] Mayu Sakurada, and Takehisa Yairi, “Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction,” Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, Gold Coast, QLD, Australia, pp. 4-11, 2014.
[42] T. Berners-Lee, L. Masinter, and M. McCahill, “Uniform Resource Locators (URL),” IETF RFC 1738, pp. 1-25, 1994.
[43] Allen Chieng Hoon Choong, and Nung Kion Lee, “Evaluation of Convolutionary Neural Networks Modeling of DNA Sequences Using Ordinal Versus One-Hot Encoding Method,” 2017 International Conference on Computer and Drone Applications (IConDA), Kuching, Malaysia, pp. 60-65, 2017.
[44] Dor Bank, Noam Koenigstein, and Raja Giryes, “Autoencoders,” Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook, pp. 353-374, 2023.
[45] Shuteng Niu et al., “A Decade Survey of Transfer Learning (2010–2020),” IEEE Transactions on Artificial Intelligence, vol. 1, no. 2, pp. 151-166, 2021.
[46] Jacob Devlin et al., “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding,” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Minneapolis, Minnesota, pp. 4171-4186, 2019.
[47] Mustafa Nabeel Salim, and Ban Shareef Mustafa, “A Survey on Word Representation in Natural Language Processing,” AIP Conference Proceedings: 1st Samarra International Conference for Pure and Applied Sciences, Samarra, Iraq, vol. 2394, no. 1, 2022.
[48] Weiping Wang et al., “PDRCNN: Precise Phishing Detection with Recurrent Convolutional Neural Networks,” Security and Communication Networks, vol. 2019, pp. 1-15, 2019.
[49] Chidimma Opara, Bo Wei, and Yingke Chen, “HTMLPhish: Enabling Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis,” 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, pp. 1-8, 2020.