Applying Deep Learning for PE-malware detection

Ajay Shinde & Krishna Sawe, Quick Heal

At the root of most malware attacks lie Portable Executable (PE) files, which cause the resulting damage. A typical attack begins with the download of a PE file via email, a website or another commonly used channel. Traditional methods of detecting such malicious PE files range from signature-based static methods to behavior-based dynamic methods.
In the recent past, Machine Learning (ML) based detection methods have been gaining traction. The ML process includes collecting samples, extracting valuable features from them, performing feature selection, and then training the model on the selected features. Feature generation and selection require human intelligence and effort. Meanwhile, malware authors continually devise innovative and sophisticated ways to evade these detection mechanisms. PE files can easily be obfuscated or mutated to look like clean files while retaining the ability to perform malicious activities. This makes it harder, even for experts, to generate new features and retrain ML models on them, which in turn increases response time. As a result, millions of malicious PE files go undetected every day.

In this work, we address the challenge of adding new and optimized mechanisms to generically detect malicious PE files by using the Transfer Learning aspect of Deep Learning (DL). Image classification using DL has matured in recent times and has the potential to make a meaningful impact in this field. A variety of models are available for computer vision, face recognition, object classification and so on, and many more models based on image classification algorithms are expected to emerge in the near future. One of the major advantages of DL is that manual feature selection is not required for training, which minimizes human intervention.

Transfer learning focuses on capturing the knowledge gained while solving one problem and applying it to a different but related problem. In our approach to PE file classification, we target DL models such as VGG16 that are already trained for various image classification problems. These models use multi-layered Convolutional Neural Networks (CNNs) and achieve good accuracy. We converted a large set of PE files (both benign and malicious) into images and generated new models from these images and the weights of the already trained models. By combining the results of supervised ML classifiers such as Random Forest and Support Vector Machines (SVM) with CNN classifiers in a layered detection mechanism, we achieved very encouraging results with good efficiency.
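
The abstract does not say how PE files were turned into images. One common approach, shown here only as a minimal sketch under that assumption, maps each byte of the file to one grayscale pixel and wraps the byte stream into fixed-width rows. The function names, the row width and the zero padding are illustrative choices, not details from the talk.

```python
import math
from typing import List, Tuple


def bytes_to_grayscale_rows(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte to one pixel (0-255) and wrap the stream into rows.

    Assumption: the last row is zero-padded so every row has `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))  # zero-fill the tail
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def to_three_channels(rows: List[List[int]]) -> List[List[Tuple[int, int, int]]]:
    """Replicate each gray value into three channels.

    Models pre-trained on colour images, such as VGG16, take 3-channel input.
    """
    return [[(p, p, p) for p in row] for row in rows]


# Hypothetical stand-in for the first bytes of a PE file ("MZ" header magic).
sample = b"MZ\x90\x00\x03"
rows = bytes_to_grayscale_rows(sample, width=4)
rgb = to_three_channels(rows)
print(rows)
```

The resulting pixel grid would still need to be resized to the input shape the pre-trained network expects before it can be fed to a model whose convolutional weights are reused.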

Our evaluation results show that CNN-based transfer learning models can be leveraged to improve detection of malicious PE files. Our detectors provide high recall while maintaining a very low false positive rate, and they are potent additions to our arsenal for fending off malicious actors.
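
For readers unfamiliar with the two metrics named above, the sketch below shows how recall and false positive rate are computed from confusion-matrix counts. The counts are made-up placeholders for illustration only and are not results from this work.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of malicious files that the detector flags."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Fraction of benign files that the detector wrongly flags."""
    total = false_positives + true_negatives
    return false_positives / total if total else 0.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
tp, fn = 90, 10    # malicious files: detected vs. missed
fp, tn = 1, 999    # benign files: wrongly flagged vs. correctly passed

print(recall(tp, fn))
print(false_positive_rate(fp, tn))
```

A detector is useful in practice only when both numbers are good at once: high recall means few malicious files slip through, and a low false positive rate means few clean files are blocked.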

Ajay Shinde

Krishna Shriram Sawe

1. A young professional keen on developing a career in the field of Machine Learning, with a can-do and proactive attitude.
2. Working at Quick Heal Technologies as an Associate Security Researcher for more than 2 years.
3. 2+ years of experience in malware reverse engineering and in developing Machine Learning and Deep Learning models for malware detection.
4. Good knowledge of programming languages such as Python, C++, Java etc.
5. Good understanding of Machine Learning, Deep Learning, Image Processing and Text Processing.
6. Very good understanding of popular malware families and how they work. Knowledge of static analysis, dynamic analysis and malware debugging.
7. Excellent analytical, problem-solving, technical and communication skills, with the ability to work well as a team player.

The Dynamic Security Ecosystem