Towards Efficient Malware Detection and Classification using Multilayered Random Forest Ensemble Technique

Title: Towards Efficient Malware Detection and Classification using Multilayered Random Forest Ensemble Technique
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Roseline, S. Abijah; Sasisri, A. D.; Geetha, S.; Balasubramanian, C.
Conference Name: 2019 International Carnahan Conference on Security Technology (ICCST)
ISBN Number: 978-1-7281-1576-4
Keywords: Adaptation models, Computational modeling, cyber security, deep learning models, deep neural networks, Ensemble forest, feature extraction, Forestry, Gray-scale, Human Behavior, hybrid model, invasive software, learning (artificial intelligence), Malware, malware authors, malware classification, malware detection, malware images, malware patterns, Metrics, Microsoft Windows, multilayered random forest ensemble technique, pattern classification, privacy, pubcrawl, resilience, Resiliency, traditional malware, Vision-based malware analysis

The exponential growth of malware is a significant security concern for computer users and for private and government organizations. Traditional malware detection methods employ static and dynamic analysis, which are ineffective at identifying unknown malware. Malware authors develop new malware by applying polymorphic and evasion techniques to existing malware, and thereby escape detection. Newly arriving malware are variants of existing malware, so their patterns can be analyzed using a vision-based method: malware patterns are visualized as images and their features are characterized. Malware is then classified by alternately generating class vectors and feature vectors with ensemble forests across multiple sequential layers. This paper proposes a hybrid stacked multilayered ensembling approach that is more robust and efficient than deep learning models. The proposed model outperforms the machine learning and deep learning models, with an accuracy of 98.91%. The proposed system works well on both small-scale and large-scale data because it sets its parameters (the number of sequential levels) adaptively. It is computationally efficient in terms of resources and time, and it uses far fewer hyperparameters than deep neural networks.
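As a rough illustration of the vision-based step described in the abstract, the sketch below lays out a file's raw bytes as a grayscale pixel grid and derives a simple feature vector from it. This is not the authors' implementation: the function names `bytes_to_grayscale` and `intensity_histogram`, the fixed row width, the zero-padding of the last row, and the histogram feature are all assumptions chosen for brevity, and the abstract does not specify any of them.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 16) -> List[List[int]]:
    """Lay out raw bytes row by row as a 2-D grid of 0-255 pixel intensities."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        # Zero-pad the final row so every row has the same length (an assumption).
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


def intensity_histogram(image: List[List[int]], bins: int = 8) -> List[float]:
    """Normalized histogram of pixel intensities (a hypothetical stand-in feature)."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            # Map an intensity in 0..255 to one of `bins` equal-width buckets.
            counts[pixel * bins // 256] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * bins


# Hypothetical stand-in for the contents of a binary file.
sample = bytes(range(40))
image = bytes_to_grayscale(sample, width=16)
features = intensity_histogram(image)
```

In a pipeline of the kind the abstract outlines, vectors like `features` would be passed to the layered forest ensemble. The forests themselves are omitted here because the abstract does not describe their configuration.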

Citation Key: roseline_towards_2019