Large-Scale Identification of Malicious Singleton Files

Title: Large-Scale Identification of Malicious Singleton Files
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Li, Bo; Roundy, Kevin; Gates, Chris; Vorobeychik, Yevgeniy
Conference Name: Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4523-1
Keywords: composability, cyber physical systems, False Data Detection, Human Behavior, large-scale malware detection, machine learning, pubcrawl, resilience, Resiliency, security, singleton files

We study a dataset of billions of program binary files that appeared on 100 million computers over the course of 12 months, and find that 94% of these files were present on a single machine. Although malware polymorphism is one cause of the large number of singleton files, other factors must also contribute to their abundance, given that the ratio of benign to malicious singleton files is 80:1. The huge number of benign singletons makes it challenging to reliably identify the minority of malicious singletons. We present a large-scale study of the properties, characteristics, and distribution of benign and malicious singleton files. We leverage the insights from this study to build a classifier based purely on static features that identifies 92% of the remaining malicious singletons at a 1.4% false positive rate, despite heavy use of obfuscation and packing techniques by most malicious singleton files, which we make no attempt to de-obfuscate. Finally, we demonstrate the robustness of our classifier to important classes of automated evasion attacks.
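
To illustrate why an 80:1 benign-to-malicious ratio makes detection hard, the following minimal sketch computes the precision a classifier would achieve at the quoted detection and false positive rates. It assumes, purely for illustration, that the classifier sees files at the raw 80:1 ratio; the abstract refers to the "remaining" malicious singletons, so the population the authors actually classify may differ, and the resulting figure is not a result from the paper. The function name and parameters are hypothetical.

```python
def precision_at_base_rate(recall, false_positive_rate, benign_per_malicious):
    """Precision when every malicious file comes with `benign_per_malicious` benign files.

    Illustrative helper only; not taken from the paper.
    """
    # Expected true positives per single malicious file.
    true_positives = recall
    # Expected false positives among the accompanying benign files.
    false_positives = false_positive_rate * benign_per_malicious
    return true_positives / (true_positives + false_positives)


# Figures quoted in the abstract: 92% detection, 1.4% false positives, 80:1 ratio.
# Assumption: the 80:1 ratio applies unchanged to the classifier's input.
precision = precision_at_base_rate(
    recall=0.92,
    false_positive_rate=0.014,
    benign_per_malicious=80,
)
print(f"precision under the assumed base rate: {precision:.3f}")
```

Under this assumption, for every malicious singleton there are 0.92 expected true alarms against 80 x 0.014 = 1.12 expected false alarms, so fewer than half of all flagged files would be malicious. This shows how even a low false positive rate is amplified by a large benign majority.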

Citation Key: li_large-scale_2017