Plenty of Smish in the Sea - Time to Cast the PhishNet


Social engineering attacks aim to gain unauthorized access or sensitive information by exploiting human vulnerabilities and trust. With recent advances in natural language processing (NLP), threat actors can now use generative language models, such as GPT-based ones (e.g., WormGPT and FraudGPT), to produce phishing messages that are convincing, persuasive, and coherent. One particularly concerning trend is the rise of smishing (SMS phishing). This attack vector has become more prevalent due to the widespread use of mobile devices and the trust people place in text messages, especially when they appear to come from a known sender (via spoofing). For instance, in April 2023, the National Cyber Security Centre Finland (NCSC-FI) at Traficom reported aggressive smishing campaigns that aimed to collect payment and card information by posing as tax-return notices, bank messages, and delivery service (OmaPosti) notifications; similar incidents have been observed globally. To counter smishing attacks effectively and protect users from falling victim to them, we propose PhishNet, a novel anti-smishing solution that employs NLP and open knowledge bases to identify common persuasion techniques and well-known brands and organizations, and to provide contextual analysis that determines the content and intent of an SMS.
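To make the idea of tagging persuasion techniques and brands more concrete, here is a minimal sketch that uses plain keyword matching as a stand-in. It is not PhishNet's method, which relies on NLP models and open knowledge bases; the cue lexicons `PERSUASION_CUES` and `KNOWN_BRANDS` and the function `tag_message` are hypothetical illustrations.

```python
import re

# Hypothetical cue lexicons: they only illustrate tagging persuasion cues,
# they are not the models or lists used by PhishNet.
PERSUASION_CUES = {
    "urgency": ["immediately", "within 24 hours", "last chance", "urgent"],
    "authority": ["tax office", "bank", "police", "customs"],
    "reward": ["refund", "prize", "won", "bonus"],
}

# Hypothetical brand list; the talk describes enriching entities via WikiData instead.
KNOWN_BRANDS = ["OmaPosti", "Traficom", "PayPal"]


def tag_message(text: str) -> dict:
    """Return the persuasion techniques and brand names found in an SMS."""
    lowered = text.lower()
    # A technique is tagged if any of its cues appears as a whole word/phrase.
    techniques = sorted(
        name
        for name, cues in PERSUASION_CUES.items()
        if any(re.search(r"\b" + re.escape(cue) + r"\b", lowered) for cue in cues)
    )
    # Case-insensitive substring match against the brand list.
    brands = [brand for brand in KNOWN_BRANDS if brand.lower() in lowered]
    return {"techniques": techniques, "brands": brands}


if __name__ == "__main__":
    sms = "OmaPosti: your parcel is on hold. Pay the fee immediately: http://example.test/pay"
    print(tag_message(sms))
```

Keyword lists are brittle against paraphrase, which is one reason a production system would use trained NLP models rather than fixed lexicons.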

Our anti-smishing solution consists of five steps: input preparation, data and feature extraction, URL analysis, content analysis, and decision-making. Briefly, the first step validates the input and ensures that the relevant information is present, while the second extracts the desired information and features. Next, if a URL in the message is fresh and unknown to our security cloud, the solution conducts a comprehensive analysis of the message content using various NLP and machine learning (ML) techniques. We also leverage WikiData to enrich the named entities found. Based on all the analyses and extracted facts, a final decision is reached through an ensemble model combined with logical conditions. If needed, the content of the linked (potentially phishing) page is also assessed using ML models.
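The five-step flow can be sketched as a small pipeline. This is a hypothetical illustration under stated assumptions, not F-Secure's implementation: `KNOWN_URLS` stands in for the security cloud's URL reputation, three toy scorers stand in for the NLP/ML content models, the ensemble is a simple average (soft voting), and WikiData enrichment and landing-page analysis are omitted.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for the security cloud: known URL -> verdict.
KNOWN_URLS = {"http://bad.example/pay": "malicious", "https://good.example/": "clean"}
URL_RE = re.compile(r"https?://\S+")


@dataclass
class Features:
    text: str
    sender: str
    urls: list = field(default_factory=list)


def prepare_input(text: str, sender: str) -> Features:
    # Step 1: validate the input and make sure there is something to analyze.
    if not text or not text.strip():
        raise ValueError("empty SMS body")
    return Features(text=text.strip(), sender=sender.strip())


def extract(features: Features) -> Features:
    # Step 2: extract URLs (named-entity extraction and enrichment omitted).
    features.urls = URL_RE.findall(features.text)
    return features


def url_verdict(features: Features) -> Optional[str]:
    # Step 3: reputation lookup; None means no URL or a fresh/unknown URL.
    verdicts = [KNOWN_URLS.get(url) for url in features.urls]
    if "malicious" in verdicts:
        return "malicious"
    if verdicts and all(v == "clean" for v in verdicts):
        return "clean"
    return None


def content_scores(features: Features) -> list:
    # Step 4: toy scorers standing in for the NLP/ML models (all hypothetical).
    text = features.text.lower()
    urgency = 1.0 if re.search(r"\b(urgent|immediately|now)\b", text) else 0.0
    payment = 1.0 if re.search(r"\b(pay|fee|card|refund)\b", text) else 0.0
    has_link = 1.0 if features.urls else 0.0
    return [urgency, payment, has_link]


def decide(features: Features, threshold: float = 0.5) -> str:
    # Step 5: a known verdict wins; otherwise average the scores and
    # apply an example logical condition on top of the ensemble.
    known = url_verdict(features)
    if known is not None:
        return "smishing" if known == "malicious" else "benign"
    if not features.urls:
        return "benign"  # hypothetical rule: no link, nothing to click
    scores = content_scores(features)
    ensemble = sum(scores) / len(scores)
    return "smishing" if ensemble >= threshold else "benign"


def classify(text: str, sender: str) -> str:
    return decide(extract(prepare_input(text, sender)))


if __name__ == "__main__":
    print(classify("URGENT: pay the customs fee now http://new.example/x", "Posti"))
```

A known reputation verdict short-circuits the content models here, mirroring the idea that deep content analysis is reserved for fresh, unknown URLs; the "no link" rule is only an example of a logical condition layered on the ensemble.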

We rigorously evaluated the solution on a private dataset of real-world benign and smishing messages, where it achieved high accuracy while maintaining a low false positive rate. These results indicate that PhishNet is a robust and reliable defense against the escalating threat of smishing, and that deploying it can substantially protect individuals and organizations while preserving user privacy and minimizing disruption to legitimate SMS communications. In this talk, I will walk through the details of each step of the process, how we evaluated the solution and its effectiveness, and how we continuously improve it.
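The two headline metrics come from a confusion matrix: accuracy is (TP + TN) / N, and the false positive rate is FP / (FP + TN), i.e. the share of benign messages wrongly flagged. The `evaluate` helper and the toy labels below are illustrative only; they are not the private dataset or the results reported in the talk.

```python
def evaluate(y_true, y_pred, positive="smishing"):
    """Compute accuracy and false positive rate for binary SMS labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # benign message wrongly flagged
        elif truth == positive:
            fn += 1  # smishing message missed
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    # FPR = FP / (FP + TN); defined as 0.0 when there are no benign messages.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"accuracy": accuracy, "fpr": fpr, "tp": tp, "fp": fp, "tn": tn, "fn": fn}


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = ["smishing", "benign", "benign", "smishing", "benign"]
    preds = ["smishing", "benign", "smishing", "smishing", "benign"]
    print(evaluate(truth, preds))
```

Reporting the false positive rate alongside accuracy matters because benign SMS usually far outnumber smishing, so a classifier can score high accuracy while still flagging many legitimate messages.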

Dr. Khalid Alnajjar

Khalid Alnajjar meticulously analyzes threat data and develops AI models at F-Secure to strengthen security against emerging threats, particularly in threat detection and mitigation. Holding a doctorate in NLP and AI, having completed postdoctoral research in the field, and bringing a strong cybersecurity background, he has demonstrated significant prowess in both academic and professional settings. With over a decade of experience in AI and software development, Khalid has a proven track record that includes notable publications in prestigious conferences and journals and leadership of innovative AI projects from inception to completion. His expertise also extends to MBA-level business principles, leadership, and management, a well-rounded skill set crucial for advancing AI solutions and fostering a secure digital environment.