Authors: Shivam Devidas Gawade, Professor Nishant Rathod
Abstract: The rapid growth of digital communication has produced a huge volume of email, including unsolicited spam, which causes serious problems such as fraud, wasted time, and difficulty identifying useful messages. The aim of this study is to develop a pattern-based machine learning approach that accurately detects and filters spam emails. By leveraging algorithms that analyze email content, sender information, and metadata attributes, we address the growing need for an efficient, scalable solution to this problem. Our approach pre-processes email data through tokenization, stopword removal, stemming, and vectorization, followed by feature extraction focused on content-based, metadata, and behavioral attributes. We compare the performance of several machine learning models, including Naive Bayes, Random Forest, and Gradient Boosting. Model performance is evaluated using standard classification metrics, including the F1-score. The study concludes that ensemble methods, especially Random Forest, offer a solution that balances accuracy and computational efficiency. Although deep learning models such as CNNs and NLP-based transformers provide good detection capabilities, their inherent complexity limits their practical use in small-scale applications. Future work should focus on further integration of advanced natural language processing techniques to improve the effectiveness and efficiency of spam email detection.
DOI: 10.61463/ijset.vol.13.issue3.177
International Journal of Science, Engineering and Technology
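
The pipeline summarized in the abstract (tokenization, stopword removal, stemming, vectorization, then classification) can be illustrated with a minimal, dependency-free sketch. It is not the study's implementation: the function names, the tiny stopword list, the suffix-stripping stemmer (a stand-in for a real stemmer such as Porter's), and the six training messages are all assumptions made for illustration. Only Naive Bayes, one of the models named in the abstract, is shown, using bag-of-words counts with Laplace smoothing.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the study's list).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and",
             "in", "for", "you", "your", "at", "on"}


def crude_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase and tokenize, drop stopwords, then stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def train_naive_bayes(samples):
    """Fit multinomial Naive Bayes on (text, label) pairs.

    Vectorization here is a plain bag-of-words count per class.
    """
    class_docs = Counter()   # number of training messages per label
    word_counts = {}         # label -> Counter of token frequencies
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in preprocess(text):
            counts[tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for the message."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    # Tokens never seen in training are ignored.
    tokens = [t for t in preprocess(text) if t in vocab]
    best_label, best_score = None, -math.inf
    for label in class_docs:
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # Laplace smoothing
        score = math.log(class_docs[label] / total_docs)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, invented for this sketch.
TRAIN = [
    ("Win a free prize now, click here", "spam"),
    ("Claim your free cash reward today", "spam"),
    ("Cheap loans approved, click now", "spam"),
    ("Meeting agenda for the project review", "ham"),
    ("Please review the attached project report", "ham"),
    ("Lunch at noon tomorrow?", "ham"),
]

model = train_naive_bayes(TRAIN)
```

Calling `predict(model, "Click now to claim a free prize")` on this toy model classifies the message as spam, while `predict(model, "Project meeting report attached")` classifies it as ham. A realistic system would add the sender, metadata, and behavioral features the abstract mentions and would be trained on a labeled corpus rather than a handful of sentences.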