Harnessing Language Models for Detection of Evasive Malicious Email Attachments

The HP Q3 2023 Threat Report highlights that 80% of malware is delivered via email, with 12% bypassing detection technologies to reach endpoints. The 2023 Verizon Data Breach Investigations Report also indicates that 35% of ransomware infections originated from email. Two primary factors contribute to evasion. First, the volume and cost of sandbox scanning lead to selective scanning and inadvertent bypasses. Second, detection technologies such as signature-based methods, sandboxing, and machine learning rely on the final malicious payload for decision-making, yet evasive multi-stage malware and phishing URLs often lack a malicious payload when analyzed by these technologies. Additionally, generative AI tools like FraudGPT and WormGPT facilitate the creation of new malicious payloads and phishing pages, further enabling malware to evade defenses and reach endpoints.

To address the challenge of detecting evasive malware and malicious URLs without requiring the final malicious payload, we will share the detailed design of an Interpretative Processor Engine (IPE). The IPE detects malicious attachments, URLs, and identity-based attacks by understanding the semantics of the email and using them as features, rather than relying on the final malicious payload for its decisions. It takes a layered approach, employing supervised and unsupervised transformer-based models to derive the deeper meaning embedded in the email’s body, its subject, and the text of its attachments.

We will first dive into the details of the semantics commonly used by threat actors to deliver malicious attachments, which lays the foundation of our approach. These details were derived from the analysis of a dataset of malicious emails. The text from the body of the email was extracted to create embeddings. UMAP aided in dimensionality reduction, and clusters were generated based on their density in the high-dimensional embedding space. These clusters represent different types of semantics employed by threat actors to deliver malicious attachments.
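The density-based clustering step can be illustrated with a minimal sketch. It assumes the email-body embeddings have already been reduced to two dimensions (the abstract uses UMAP for this; the reduction is not reproduced here), and it uses a small pure-Python DBSCAN as a stand-in, since the abstract does not name the clustering algorithm. The sample points and the `eps` and `min_pts` values are hypothetical.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# Hypothetical 2-D points standing in for reduced email embeddings:
# two dense groups (two kinds of lure text) and one outlier.
reduced = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
labels = dbscan(reduced, eps=1.5, min_pts=3)
# labels == [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Each non-negative label would correspond to one family of lure semantics. Points labelled -1 are emails that do not fall into any dense region.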

In the presentation, we will share the details of our approach, in which every incoming email undergoes zero-shot semantic analysis using Llama-3 to determine whether it contains semantics typically used by threat actors to deliver malicious attachments. The email’s body is further analyzed for secondary semantics, including tone, sentiment, and other nuanced elements. Once the semantics are identified, hierarchical topic modeling is applied to uncover the relationships between the various topics.
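As a rough illustration of zero-shot semantic analysis, the sketch below builds a classification prompt and parses the model's reply. The label set, the prompt wording, and the `generate` callable are all hypothetical: the abstract does not describe the actual prompt or categories, and a stub function stands in for a call to Llama-3 so the example runs on its own.

```python
from typing import Callable, List

# Hypothetical label set; the abstract does not list the real categories.
LABELS: List[str] = [
    "invoice_or_payment",
    "shipping_notice",
    "account_verification",
    "none",
]

def build_prompt(subject: str, body: str, labels: List[str] = LABELS) -> str:
    """Compose a zero-shot prompt: no examples, only the candidate labels."""
    options = ", ".join(labels)
    return (
        "Classify the email into exactly one of these categories: "
        f"{options}.\n"
        "Answer with the category name only.\n\n"
        f"Subject: {subject}\n"
        f"Body: {body}\n"
        "Category:"
    )

def parse_label(response: str, labels: List[str] = LABELS) -> str:
    """Normalize the first line of the reply; reject anything off-list."""
    stripped = response.strip()
    first_line = stripped.splitlines()[0].strip().lower() if stripped else ""
    return first_line if first_line in labels else "unparsed"

def classify(subject: str, body: str, generate: Callable[[str], str]) -> str:
    """`generate` is any function that maps a prompt to model text."""
    return parse_label(generate(build_prompt(subject, body)))

def fake_llm(prompt: str) -> str:
    """Stub standing in for a real Llama-3 call (assumption for the demo)."""
    return " Invoice_or_Payment\n"

result = classify("Overdue invoice", "Please see the attached invoice.", fake_llm)
```

Keeping the model behind a plain callable means the parsing logic can be tested without a model, and replies outside the label set are flagged as `unparsed` rather than silently trusted.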

Primary and secondary semantics from the email, along with hierarchical topic modeling, deep file parsing results of attachments, and email headers, are sent to the expert system. This system combines the information using rules to determine if the email (with attachments or URLs) is malicious or benign.
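A rule-based combination of this kind might look like the following minimal sketch. The signal names and the two rules are hypothetical, chosen only to show how semantic, file-parsing, and header signals could be combined; the abstract does not disclose the actual rules or features.

```python
from dataclasses import dataclass

@dataclass
class EmailSignals:
    """Hypothetical inputs to the expert system, one field per source."""
    primary_semantic: str        # from zero-shot analysis, "none" if no lure
    urgent_tone: bool            # a secondary semantic
    attachment_has_script: bool  # from deep file parsing
    spf_pass: bool               # from email headers

def verdict(s: EmailSignals) -> str:
    """Apply illustrative rules in order; the first match decides."""
    lure = s.primary_semantic != "none"
    # Rule 1 (assumed): lure text plus an attachment carrying script content.
    if lure and s.attachment_has_script:
        return "malicious"
    # Rule 2 (assumed): lure text, urgent tone, and failed sender checks.
    if lure and s.urgent_tone and not s.spf_pass:
        return "malicious"
    return "benign"

sample = EmailSignals(
    primary_semantic="invoice_or_payment",
    urgent_tone=True,
    attachment_has_script=True,
    spf_pass=True,
)
decision = verdict(sample)  # "malicious" via rule 1
```

Neither rule inspects a final payload: the decision rests on what the email says and how the attachment is structured, which is the property the approach depends on.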

This comprehensive approach identifies malicious content without depending on the final payload, a capability that is crucial for any detection technology.

Our presentation will show how LLMs can effectively detect evasive malicious attachments without depending on the analysis of the malicious payload, which typically occurs in the later stages of attachment analysis. Our approach is exemplified by our success in defending against real-world threats, including HTML smuggling campaigns, Microsoft credential phishing scams, MS Office remote template injection attacks, and even a new APT attack targeting a defense-related organization.

Using insights from our case studies, our presentation will detail an APT attack on a defense-related organization and explain how leveraging semantic analysis as a feature set successfully detects such attacks. The presentation will conclude with results observed in production traffic.

Abhishek Singh – InceptionCyber.ai

Abhishek Singh is the Founder and CTO of InceptionCyber.ai. He is a security R&D leader with 15+ years of experience, a passion for the field, and a proven track record of driving AI and cybersecurity research and engineering that solves complex problems, yielding winning technology and revenue gains at Cisco, FireEye, and Microsoft. He holds 39 patents, has authored 17 research papers and seven technical white papers, has contributed to three books, and has presented his research at Virus Bulletin (2023, 2020, 2019), Black Hat (2022, 2013), RSA (2016), CanSecWest (2009), AVAR (2023), and ACSA.

Patents and papers detail work in algorithms, generative and predictive AI-based approaches to detect advanced threats, and architecture of technologies such as the virtual machine-based approach for threat analysis, EDR, RASP, DAST, Active Defense (Deception), email, web, and IPS.

Many algorithms and preventive features Abhishek has designed are key concepts in technologies like RASP and Active Defense (Deception). His notable recognitions include the following:

2019 Reboot Leadership Award (Innovators Category): SC Media
Shortlisted for Virus Bulletin’s 2018 Péter Szőr Award
Cyber Security Professional of the Year – North America (Silver Winner): Cyber Security Excellence Awards 2020

He holds a Double Master of Science in Computer Science and Information Security from the prestigious College of Computing, Georgia Tech, and a B.Tech in Electrical Engineering from the prestigious Indian Institute of Technology, IIT-BHU. He has also completed a Master of Engineering Leadership (ELPP++) from UC Berkeley and Postgraduate AI and Deep Learning Courses from the Indian Institute of Technology, IIT-Guwahati.

Kalpesh Mantri – InceptionCyber.ai

Kalpesh Mantri is the Founding Principal Research Engineer at InceptionCyber.ai, bringing over 12 years of expertise in Cybersecurity Research and Development. He spearheads pioneering research initiatives and develops innovative, patented solutions with a focus on investigating email threats, particularly within the phishing and malspam landscape.

Before joining InceptionCyber.ai, Kalpesh held the position of Senior Security Engineer, where he specialized in malware reverse engineering, threat hunting, and advanced detection techniques. He played a pivotal role in investigating APT (Advanced Persistent Threat) attacks, notably contributing to the exposure of critical operations such as ‘Operation SideCopy’ and ‘Operation HoneyTrap,’ which targeted defense sectors.

Kalpesh is an active member of the cybersecurity community and a regular speaker at prominent security conferences. His presentations have been featured at Virus Bulletin (2020), AVAR (2023, 2016, 2015), and CARO Workshop (2020, 2017).

He holds both Bachelor’s and Master’s degrees in Computer Science and has completed a Professional Certificate Programme in Applied Data Science and Machine Learning from the Indian Institute of Management Kozhikode (IIM Kozhikode).