Applying Deep Learning for PE-malware detection

Ajay Shinde & Krishna Sawe, Quick Heal

At the root of most malware attacks lie Portable Executable (PE) files, which cause the resulting damage. A typical attack begins with the download of a PE file via email, a website, or other common delivery mechanisms. Traditional methods of detecting such malicious PE files range from signature-based static methods to behavior-based dynamic methods.
In the recent past, Machine Learning (ML)-based detection methods have been gaining traction. This approach involves collecting samples, extracting valuable features from them, performing feature selection, and then training the ML model. Feature generation and selection require human expertise and effort. Meanwhile, malware authors continually devise innovative and sophisticated techniques to evade these detection mechanisms: PE files can easily be obfuscated or mutated to look like clean files while retaining the ability to perform malicious activities. This makes it challenging even for experts to generate new features and retrain ML models on them, which in turn increases response time. As a result, millions of malicious PE files go undetected every day.

In this work, we address this challenge by using transfer learning, a Deep Learning (DL) technique, to add new, optimized mechanisms for generically detecting malicious PE files. Image classification using DL has matured in recent years and has the potential to make a meaningful impact in this field. A variety of models are available for computer vision tasks such as face recognition and object classification, and many more models based on image classification algorithms are expected to emerge in the near future. A major advantage of DL is that manual feature selection is not required for training, which minimizes human intervention.

Transfer learning captures the knowledge gained while solving one problem and applies it to a different but related problem. For PE file classification, we use DL models such as VGG16 that are already trained on various image classification problems. These models are multi-layer Convolutional Neural Networks (CNNs) with good accuracy. We represented a large set of PE files (both benign and malicious) as images and generated new models using these images and the weights of the pretrained models. By combining the results of supervised ML classifiers, such as Random Forest and Support Vector Machines (SVM), with the CNN classifiers in a layered detection mechanism, we achieved very encouraging results with good efficiency.
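The step of representing a PE file as an image can be sketched as follows. This is a minimal sketch assuming one common representation, not necessarily the one used in this work: the file's raw bytes are read as 8-bit grayscale pixels laid out in rows of a fixed width, and the result is resized to the 224x224, three-channel input that VGG16 expects. The function names `bytes_to_image` and `to_cnn_input`, the 256-byte row width, and nearest-neighbour resizing are hypothetical, illustrative choices.

```python
import numpy as np


def bytes_to_image(data: bytes, width: int = 256) -> np.ndarray:
    """Interpret raw file bytes as rows of 8-bit grayscale pixels.

    Assumption: a fixed row width; the last row is zero-padded so the
    byte count becomes a multiple of `width`.
    """
    if not data:
        raise ValueError("empty input")
    buf = np.frombuffer(data, dtype=np.uint8)
    rows = -(-buf.size // width)  # ceiling division
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[: buf.size] = buf
    return padded.reshape(rows, width)


def to_cnn_input(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Nearest-neighbour resize to size x size, then copy the gray channel
    into 3 channels to match VGG16's RGB input shape."""
    r_idx = np.arange(size) * img.shape[0] // size  # source row per output row
    c_idx = np.arange(size) * img.shape[1] // size  # source column per output column
    resized = img[np.ix_(r_idx, c_idx)]
    return np.repeat(resized[:, :, None], 3, axis=2).astype(np.float32)


if __name__ == "__main__":
    # Stand-in bytes (1026 bytes starting with the "MZ" DOS header magic), not a real PE file.
    sample = b"MZ" + bytes(range(256)) * 4
    img = bytes_to_image(sample)
    print(img.shape)  # (5, 256)
    print(to_cnn_input(img).shape)  # (224, 224, 3)
```

Before feeding such an array to a pretrained VGG16 in Keras, one would typically also apply `tf.keras.applications.vgg16.preprocess_input`. A fixed row width is a simplification; some published variants choose the width based on file size.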

Our evaluation results show that CNN-based transfer learning models can be leveraged to improve the detection of malicious PE files. Our detectors provide high recall while maintaining a very low false positive rate, and they are potent additions to our arsenal for fending off malicious actors.
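For reference, the two metrics cited above can be computed as in the following minimal Python sketch, treating malicious as the positive class (label 1) and benign as label 0. The function name `recall_and_fpr` and the toy labels are hypothetical and are not drawn from our evaluation data.

```python
def recall_and_fpr(y_true, y_pred):
    """Return (recall, false positive rate) for binary labels.

    recall = TP / (TP + FN): share of malicious files that were flagged.
    FPR    = FP / (FP + TN): share of benign files wrongly flagged.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


if __name__ == "__main__":
    # Toy labels: 4 malicious, 6 benign; one miss and one false alarm.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(recall_and_fpr(y_true, y_pred))  # recall = 0.75, FPR = 1/6
```

Because benign files usually vastly outnumber malicious ones in practice, even a small FPR can translate into many false alarms, which is why the two metrics are reported together.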

Ajay Shinde

A senior scan engine developer at Quick Heal Security Labs.
He has 6 years of experience in reverse engineering, malware analysis, and the development of various binary analysis tools.
He now leads our Machine Learning efforts to fight today’s complex malware using various Machine Learning and Deep Learning algorithms.

Sandeep Pimpale

Currently working as technical lead for a cloud-based security product.
Has 10 years of experience in security product development at Quick Heal.
Has worked on developing the Windows, Mac and Android scan engines.
Has experience with AWS services and big data technologies.