Malware Classification Using Deep Convolutional Neural Networks

Title: Malware Classification Using Deep Convolutional Neural Networks
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Kornish, D., Geary, J., Sansing, V., Ezekiel, S., Pearlstein, L., Njilla, L.
Conference Name: 2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)
Date Published: Oct. 2018
ISBN Number: 978-1-5386-9306-3
Keywords: accuracy levels, classification, classifier, convolutional neural nets, convolutional neural network, convolutional neural networks, Correlation, DCNN classification, deep convolutional neural networks, deep learning techniques, feature extraction, Gray-scale, Hamming distance, Human Behavior, image classification, image features, image representation, image type, improved image representation, invasive software, learning (artificial intelligence), learning models, machine learning, Malware, malware binaries, malware classification, malware images, Metrics, Neural Network, object detection, Pattern recognition, privacy, pubcrawl, resilience, Resiliency, support vector machine, support vector machine classifier training, Support vector machines, transfer learning, visual patterns, visualization

In recent years, deep convolutional neural networks (DCNNs) have won many contests in machine learning, object detection, and pattern recognition. Deep learning techniques have also achieved exceptional performance in image classification, in some cases reaching accuracy levels beyond human capability. Malware variants from similar categories often share similarities due to code reuse. Converting malware samples into images can cause these shared patterns to manifest as image features, which a DCNN can exploit for classification. Techniques for converting malware binaries into images for visualization and classification have been reported in the literature; while these methods reach high classification accuracy on training datasets, they tend to overfit and perform poorly on previously unseen samples. In this paper, we explore and document a variety of techniques for representing malware binaries as images, with the goal of discovering a format best suited for deep learning. We implement a database of malware binaries from several families, stored in hexadecimal format. These samples are converted into images using various approaches, and the images are used to train a neural network to recognize visual patterns in the input and classify malware based on the resulting feature vectors. Each image type is assessed with several learning models, such as transfer learning with existing DCNN architectures and feature extraction for support vector machine (SVM) classifier training. Each technique is evaluated in terms of classification accuracy, result consistency, and time per trial. Our preliminary results indicate that an improved image representation has the potential to enable more effective classification of new malware.
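The basic byte-to-image conversion described above can be illustrated with a minimal Python sketch. It assumes each hexadecimal sample is a plain hex string (whitespace and line breaks allowed), maps every byte to one 8-bit grayscale pixel, and lays the pixels out row by row at a fixed width. The names `hex_to_grayscale` and `resize_nearest`, the default width of 16, zero padding of the last row, and the nearest-neighbour resize (standing in for the fixed input size that pretrained DCNNs typically expect) are all hypothetical choices for illustration, not the representations evaluated in the paper.

```python
from typing import List

Image = List[List[int]]


def hex_to_grayscale(hex_text: str, width: int = 16, pad_value: int = 0) -> Image:
    """Map each byte of a hex-encoded binary to one grayscale pixel (0..255).

    Hypothetical helper: assumes a plain hex string input and a fixed-width,
    row-major layout. Not the paper's exact representation.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # bytes.fromhex skips ASCII whitespace (Python 3.7+), so spaced or
    # line-broken hex dumps parse directly; each byte is already 0..255.
    pixels = list(bytes.fromhex(hex_text))
    # Pad the final row so every row has exactly `width` pixels
    # (assumption; truncating the tail would be an equally valid choice).
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([pad_value] * (width - remainder))
    # Slice the flat pixel sequence into rows of the chosen width.
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize to a fixed size, e.g. a DCNN input shape.

    Hypothetical helper: each output pixel copies the input pixel whose
    scaled row/column index it falls on.
    """
    if not image or not image[0]:
        raise ValueError("image must be non-empty")
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    sample = "4D 5A 90 00 03 00 00 00\n04 00 00 00 FF FF"
    img = hex_to_grayscale(sample, width=8)
    print(img)  # two rows of 8 pixels; the last two are zero padding
    print(resize_nearest(img, 4, 4))
```

A fixed row width keeps byte offsets vertically aligned, so code blocks reused across variants tend to appear as similar textures that a convolutional network can pick up; the paper compares several such representations rather than committing to a single layout.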

Citation Key: kornish_malware_2018