IRJEAS

Volume 13, Issue 3, July-September 2025

Review Article

A Comprehensive Survey of Machine Learning Techniques for Phishing Email Detection: Architectures, Challenges and Future Directions

Country: India

Arun Lodhi, Dilip Ahirwar

Paper ID: IRJEAS04V13I3007

Published: June 2025

Journal: IRJEAS, Volume 13, Issue 3

Pages: 24-31

Abstract:

The exponential growth of email as a primary communication channel has been paralleled by a rise in its exploitation for cybercrime, with phishing representing a particularly pervasive and damaging threat. Phishing attacks, designed to deceive users into surrendering sensitive information, continuously evolve to bypass traditional rule-based and heuristic filters. This evolution has necessitated the adoption of more intelligent, adaptive solutions. Machine Learning (ML) has emerged as a powerful paradigm for detecting these sophisticated attacks by learning, from email data, complex patterns indicative of malicious intent. This paper presents a comprehensive survey of the state of the art in ML-based phishing email detection. We provide a detailed taxonomy of phishing attacks, a thorough analysis of the email ecosystem and its vulnerabilities, and a critical review of feature extraction techniques, ranging from URL analysis and header inspection to linguistic and behavioral features. We systematically categorize and evaluate a wide range of ML algorithms, including Naïve Bayes, Support Vector Machines (SVM), Random Forests, and neural networks, and discuss their respective strengths and limitations. Furthermore, we analyze hybrid and ensemble approaches that combine multiple models to enhance performance and robustness. The survey also examines persistent challenges such as data scarcity, feature engineering complexity, model adaptability, and real-world deployment issues. Finally, we outline promising future research directions, including the application of deep learning, explainable AI (XAI), and adversarial training, toward next-generation phishing detection systems that are both highly accurate and resilient.
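To make the feature-extraction step mentioned in the abstract more concrete, the following is a minimal Python sketch (standard library only) of how a few header, URL, and linguistic cues might be turned into numeric features. It is illustrative only, not the method of any surveyed system: the function name `extract_features`, the feature names, and the `URGENCY_WORDS` list are hypothetical choices, and the sketch assumes a raw, non-multipart plain-text message.

```python
"""Hypothetical sketch: hand-crafted phishing-email features (illustrative only)."""
import ipaddress
import re
from email import message_from_string
from email.utils import parseaddr
from urllib.parse import urlparse

# Hypothetical urgency cues; real systems curate or learn such lists.
URGENCY_WORDS = ("urgent", "verify", "suspended", "immediately", "password")
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def _domain(addr_header: str) -> str:
    """Return the lowercase domain of an address header, or '' if none."""
    _, addr = parseaddr(addr_header or "")
    return addr.rpartition("@")[2].lower() if "@" in addr else ""


def _is_ip_host(host: str) -> bool:
    """True if the URL host is a literal IP address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(raw_email: str) -> dict:
    """Map a raw message to a small dict of numeric (hypothetical) features."""
    msg = message_from_string(raw_email)

    # Header inspection: does Reply-To point to a different domain than From?
    from_dom = _domain(msg.get("From", ""))
    reply_dom = _domain(msg.get("Reply-To", ""))
    reply_mismatch = int(bool(reply_dom) and reply_dom != from_dom)

    # For simplicity, only a non-multipart payload is scanned;
    # multipart messages yield no body features in this sketch.
    body = msg.get_payload() if not msg.is_multipart() else ""

    # URL analysis: extract http(s) links and their host names.
    urls = URL_RE.findall(body)
    hosts = [urlparse(u).hostname or "" for u in urls]

    # Linguistic cue: count occurrences of urgency words (case-insensitive).
    lowered = body.lower()
    return {
        "reply_to_mismatch": reply_mismatch,
        "num_urls": len(urls),
        "ip_url": int(any(_is_ip_host(h) for h in hosts)),
        "at_in_url": int(any("@" in u for u in urls)),
        "max_url_len": max((len(u) for u in urls), default=0),
        "urgency_terms": sum(lowered.count(w) for w in URGENCY_WORDS),
    }


# Usage with a made-up message (reserved example domains, TEST-NET address).
sample = (
    "From: Support <support@example.com>\n"
    "Reply-To: help@example.net\n"
    "Subject: Account notice\n"
    "\n"
    "Your account is suspended. Verify immediately at http://192.0.2.10/login\n"
)
print(extract_features(sample))
```

In a full pipeline, dictionaries like this would be vectorized and passed to any of the classifiers the survey covers (for example Naïve Bayes, SVM, or Random Forest); learned text representations would typically replace or complement such hand-crafted cues.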