Email Spam And Phishing Classifier With Pretrained Language Models

9 Jun

Authors: Prof. Nethra H L, Vedant Sachin Nagare, Tejas Nag NK, Shivam Raj, Mokshit S

Abstract: Spam and phishing emails pose a significant threat to digital security, and filtering must act before malicious messages reach users' inboxes. This study proposes an email spam and phishing classifier built on pretrained language models (PLMs) such as BERT, RoBERTa, and GPT-4, integrated into a pre-inbox filtering system. Using transformer-based architectures and hybrid feature engineering, the system classifies emails as spam, phishing, or ham (legitimate mail) with high precision. The methodology combines semantic embeddings, metadata analysis, and concept drift detection to keep performance robust against evolving threats. Experimental results indicate an accuracy of up to 99.8% on benchmark datasets, with real-time filtering suitable for integration with email servers. This work highlights the efficacy of PLMs in proactive email security and addresses challenges such as adversarial attacks and multilingual spam detection, offering a scalable solution for modern cybersecurity needs.
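
The hybrid feature engineering mentioned in the abstract can be pictured as concatenating a semantic embedding of the message text with a few metadata signals. The sketch below is illustrative only and is not the authors' implementation. The `Email` fields, the three metadata features, and `toy_embed` are all assumptions made for this example. `toy_embed` is a hashed bag-of-words stand-in so the code runs without model downloads; in a real system it would be replaced by an embedding from a PLM such as BERT or RoBERTa.

```python
import re
from dataclasses import dataclass

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


@dataclass
class Email:
    # Hypothetical minimal message structure for this sketch.
    sender: str
    reply_to: str
    subject: str
    body: str


def domain(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def metadata_features(email: Email) -> list[float]:
    """Hand-picked header/body signals (assumed, not taken from the paper)."""
    return [
        float(len(URL_RE.findall(email.body))),                  # URL count
        float(domain(email.sender) != domain(email.reply_to)),   # Reply-To mismatch
        float(email.subject.isupper()),                          # all-caps subject
    ]


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Stand-in for a PLM embedding: hashed bag of words, L1-normalised."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[sum(token.encode()) % dim] += 1.0   # deterministic bucket per token
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def hybrid_features(email: Email, embed=toy_embed) -> list[float]:
    """Concatenate the text embedding with the metadata features."""
    text = email.subject + "\n" + email.body
    return list(embed(text)) + metadata_features(email)
```

The resulting vector would then be passed to a downstream classifier that assigns one of the three labels (spam, phishing, or ham). Keeping the embedding function as a parameter makes it straightforward to swap the stand-in for a real model without touching the metadata logic.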