
Adaptive File Analyzer: NLP combined with Heuristic analysis to detect malicious email attachments.

A key finding of the 2023 Verizon Data Breach Investigations Report (DBIR) was that email attachments are a top malware delivery vector. Malicious email attachments are usually convicted using static or dynamic analysis. Instead of keeping the whole payload in the one file attached to the email, most malware spreads the payload across several files, turning the attack into a multi-stage attack. The first stage, which is attached to the email, is the downloader or dropper. The second and subsequent stages carry the malicious payload.

Let’s take the example of an HTML smuggling sample (SHA256 [1]), a multi-stage payload. The initial payload contains an obfuscated script that uses techniques such as setTimeout() and debugger detection, which make dynamic analysis harder. The HTML file embeds a base64-encoded ZIP file. The ZIP file is password protected, which makes it challenging for dynamic analysis to extract its contents. The ZIP file contains an ISO image. When the ISO image is mounted, it exposes an LNK file that executes a JS file, which in turn drops a DLL. The DLL issues a ping command to check for internet connectivity and injects itself into the Windows Error Manager.
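As an illustration of what a static check for the embedded archive described above might look like, the following is a minimal Python sketch. It is not the scanner presented in the talk. The function name `find_embedded_zips`, the 200-character minimum length, and the shape of the returned records are assumptions made for this example. The sketch looks for long base64 runs in the HTML, decodes them, and reports any that start with the ZIP signature, along with whether an entry has the encryption flag set.

```python
import base64
import binascii
import io
import re
import zipfile

# Long runs of base64 characters. The 200-character minimum is an
# arbitrary illustrative threshold, not a value from the talk.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{200,}={0,2}")
ZIP_MAGIC = b"PK\x03\x04"   # local file header signature
ENCRYPTED_FLAG = 0x1        # bit 0 of the general purpose bit flag


def find_embedded_zips(html: str) -> list[dict]:
    """Return one record per base64 blob in `html` that decodes to a ZIP."""
    findings = []
    for match in B64_RUN.finditer(html):
        try:
            data = base64.b64decode(match.group(0), validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid base64 after all
        if not data.startswith(ZIP_MAGIC):
            continue
        try:
            # Entry names and flags live in the central directory,
            # which can be read without knowing the password.
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                entries = archive.infolist()
        except zipfile.BadZipFile:
            continue
        findings.append({
            "offset": match.start(),
            "names": [entry.filename for entry in entries],
            "encrypted": any(entry.flag_bits & ENCRYPTED_FLAG for entry in entries),
        })
    return findings
```

Because the entry names are readable even when the contents are encrypted, a check of this kind can flag a password-protected archive holding an ISO image without extracting or executing anything.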

Detecting multi-stage malware is challenging for both static and dynamic analysis. Not only does it require capturing every downloader and dropper stage, but the malware also employs evasion techniques, such as extended sleep calls, checks for a debugger, and checks for the lack of a proper environment [2], which are commonly used to keep dynamic analysis from capturing the malware's behavior.
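Some of these evasion techniques leave traces that can be seen without running the sample. The following Python sketch is only an illustration under stated assumptions: the function `evasion_indicators` and the indicator names `timer_delay` and `debugger_check` are hypothetical and do not come from the talk. It flags scripts that call setTimeout() or contain a `debugger` statement.

```python
import re

# Both patterns are deliberately simple; real obfuscated scripts may hide
# these tokens, so a miss here does not mean the technique is absent.
SET_TIMEOUT_CALL = re.compile(r"\bsetTimeout\s*\(")
DEBUGGER_STATEMENT = re.compile(r"\bdebugger\b")


def evasion_indicators(script: str) -> set[str]:
    """Return the (hypothetical) indicator names found in a script body."""
    found = set()
    if SET_TIMEOUT_CALL.search(script):
        found.add("timer_delay")
    if DEBUGGER_STATEMENT.search(script):
        found.add("debugger_check")
    return found
```

Benign pages use setTimeout() routinely, so indicators like these are weak on their own and only become meaningful when combined with other evidence.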

We designed an adaptive file analyzer to detect multi-stage malware without capturing every stage, for the non-PE file formats usually seen in email traffic, such as HTML, OLE, archives, and PDF. In the first part of the presentation, we share the details of the document modeling that the adaptive file analyzer uses to understand the context in which threat actors send emails with attachments. Once the context of the email has been computed, the file attached to the email, which is the first stage of the malware, undergoes lightweight scanning. In the second part of the presentation, we dive into the details of the correlation engine, which takes the email context from document modeling as input and correlates it with the results of the lightweight file scan to determine whether the attachment is malicious or benign.
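The abstract does not specify how the correlation engine combines its two inputs, so the following Python sketch is purely hypothetical. The class names `EmailContext` and `ScanResult`, the intent labels, the indicator names, the rule table, and the 0.7 confidence threshold are all assumptions made for illustration. It shows one simple way a context label could be correlated with lightweight scan indicators to reach a verdict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailContext:
    """Assumed output of document modeling: an intent label and a confidence."""
    intent: str
    confidence: float


@dataclass(frozen=True)
class ScanResult:
    """Assumed output of lightweight scanning of the first-stage file."""
    file_type: str
    indicators: frozenset[str]


# Hypothetical rule table: indicators treated as suspicious for a given
# email context. Neither the labels nor the pairings come from the talk.
SUSPICIOUS_IN_CONTEXT = {
    "invoice": frozenset({"encrypted_embedded_archive", "debugger_check"}),
    "shipping_notice": frozenset({"encrypted_embedded_archive"}),
}


def correlate(context: EmailContext, scan: ScanResult,
              min_confidence: float = 0.7) -> str:
    """Return "malicious" or "benign" from the context and the scan result."""
    if context.confidence < min_confidence:
        # Illustrative choice: without a confident context, do not convict.
        return "benign"
    suspicious = SUSPICIOUS_IN_CONTEXT.get(context.intent, frozenset())
    return "malicious" if scan.indicators & suspicious else "benign"
```

For example, under these assumed rules an email classified as an invoice with confidence 0.9 whose HTML attachment carries an encrypted embedded archive would be convicted, while the same attachment in an unrecognized context would not.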

In the last part of the presentation, we share the results of running the adaptive file analyzer on actual customer traffic. The adaptive file analyzer has an inherent advantage: it combines the context in which the email was sent with lightweight scanning of the first stage to determine whether the attachment is malicious or benign, without analyzing the second and subsequent stages of the malware.

Mr. Kalpesh Mantri

Kalpesh Mantri joined Cisco in 2022 as a Security Research Engineer for Talos. He has over a decade of experience in the field of cyber security. In his current research, he leads forward-looking projects and builds innovative prototypes for investigating email threats, with a particular focus on the malspam landscape.

Prior to joining Cisco, Kalpesh worked as a Senior Malware Analyst and Security Software Developer, focusing on malware reversing, threat hunting, and detection techniques, as well as APT attack investigations. Kalpesh aided authorities by uncovering many critical APT operations, including the notable ‘Operation SideCopy’ and ‘Operation HoneyTrap’, which target defence sectors. Kalpesh is very active in the cybersecurity community and regularly presents at security conferences. His previous conference presentations include Virus Bulletin, AVAR, and CARO Workshop events.


Mr. Abhishek Singh

Abhishek Singh is a security R&D leader with 15+ years of experience and a proven track record of driving research and threat detection engineering that solves complex problems and has produced winning technologies and revenue gains at Cisco, FireEye, and Microsoft. He holds 36 (approved/pending) patents, has authored 17 research papers and seven technical white papers, and has contributed to three books. His patents and papers detail work on algorithms, analytics, and machine learning-based approaches to detecting advanced threats, and on the architecture of technologies such as virtual machine-based threat analysis, EDR, RASP, DAST, Active Defense (Deception), email, web, and IPS.

Many of the algorithms and preventive features that Abhishek has designed are key concepts in technologies such as RASP and Active Defense (Deception). His notable recognitions include the following:

  • 2019 Reboot Leadership Award (Innovators Category): SC Media
  • Shortlisted for Virus Bulletin’s 2018 Péter Szőr Award
  • Cyber Security Professional of the Year – North America (Silver Winner) Cyber Security Excellence Awards 2020