Authors: Sowmiya R, Divakar P, Ragul M, Anbu D
Abstract: Phishing attacks are among the most widespread and dangerous cyber threats, tricking users into revealing sensitive information by imitating legitimate websites. As these attacks grow more sophisticated, traditional detection methods often fail to identify them accurately. This project proposes a hybrid machine learning (ML)-based system that improves phishing URL detection by analyzing both the structural and semantic features of URLs. The system extracts a rich set of features, including domain names, subdomains, URL paths, query parameters, and overall URL structure, which serve as indicators of hidden phishing patterns. To optimize detection performance, the system combines two robust ML models: Random Forest (RF) and Support Vector Machine (SVM). RF functions as both a feature selector and a classifier, using its ensemble learning mechanism to improve accuracy while limiting overfitting. SVM, known for its effectiveness on high-dimensional data, constructs an optimal hyperplane to separate legitimate URLs from phishing ones. Combining RF and SVM improves the system's precision, robustness, and overall detection capability. This dual-model system addresses the shortcomings of conventional and single-model techniques and helps prevent data breaches and financial losses. By applying ML techniques to analyze URL characteristics in depth, the proposed method offers a scalable and efficient solution for real-world phishing detection.
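As an illustration of the structural features named in the abstract (domain, subdomains, path, query parameters, and overall URL structure), the following is a minimal sketch of URL feature extraction using only the Python standard library. The function name `extract_url_features`, the feature names, and the rule of treating the last two host labels as the domain are assumptions made for this example; they are not taken from the proposed system. Numeric features of this kind could then be passed to the RF and SVM models.

```python
from urllib.parse import urlsplit, parse_qs


def extract_url_features(url: str) -> dict:
    """Return a small dictionary of structural features for one URL.

    Hypothetical helper for illustration only; the feature set is an
    assumption, not the one used by the proposed system.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""          # lower-cased host without port
    labels = host.split(".") if host else []

    # Assumption: treat the last two labels as the domain. This ignores
    # multi-part public suffixes such as "co.uk".
    domain = ".".join(labels[-2:]) if len(labels) >= 2 else host
    subdomains = labels[:-2] if len(labels) > 2 else []

    path_segments = [s for s in parts.path.split("/") if s]
    query = parse_qs(parts.query, keep_blank_values=True)

    return {
        "url_length": len(url),
        "domain": domain,
        "num_subdomains": len(subdomains),
        "path_depth": len(path_segments),
        "num_query_params": len(query),       # distinct parameter names
        "uses_https": parts.scheme == "https",
        "has_at_symbol": "@" in url,
        "num_digits_in_host": sum(c.isdigit() for c in host),
        "num_hyphens_in_host": host.count("-"),
    }


if __name__ == "__main__":
    example = "http://login.secure-paypal.example.com/account/verify?user=1&token=abc"
    for name, value in extract_url_features(example).items():
        print(f"{name}: {value}")
```

Categorical fields such as `domain` would need encoding before they are given to a classifier; the counts and boolean flags can be used directly.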