Malware Defense via Download Provenance Classification


Modern malware developers make extensive use of sophisticated obfuscation tools, causing a steady decline in the detection capabilities of anti-virus (AV) file scanners. This motivates the need for new ways to detect malware without relying on the inspection of a file's content. As most modern malware is distributed through network downloads, we should aim to complement AV scanners with systems that detect malware files based on where they come from, rather than only on what they look like.

To this end, we present a novel system that detects malware downloads based on their provenance. Currently, our system monitors all HTTP traffic in a large academic network and identifies when a Windows portable executable (PE) file is being downloaded by a network host. We reconstruct and copy all downloaded PE files off the network in real time, and store them along with provenance information such as the URL from which they were downloaded, the server host name and IP address, and the referrer associated with the HTTP request. We then employ multiple AV scanners to label known malware and known benign PE files, and use this labeled dataset as the ground truth to train a provenance classifier. In essence, the provenance classifier learns to identify differences in the provenance characteristics of malware and benign downloads. After training, the classifier can be deployed to flag PE file downloads that AV scanners may not detect, but that can be regarded as malicious with high confidence because of their download provenance characteristics. Results from our preliminary experiments are promising: the provenance classifier achieves a true positive rate of more than 96% at a false positive rate of less than 0.1%.
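The pipeline described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names (looks_like_pe, provenance_features) and the specific features chosen are assumptions made for illustration, based only on the provenance fields named in the abstract (URL, server host name and IP address, referrer). The PE check relies on the standard PE layout, in which the file starts with "MZ" and the 4-byte little-endian value at offset 0x3C points to the "PE\0\0" signature.

```python
import struct
from urllib.parse import urlsplit


def looks_like_pe(payload: bytes) -> bool:
    """Return True if the reconstructed HTTP body starts like a Windows PE file."""
    # A PE file begins with the DOS header magic "MZ".
    if len(payload) < 0x40 or payload[:2] != b"MZ":
        return False
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", payload, 0x3C)
    return payload[pe_offset:pe_offset + 4] == b"PE\x00\x00"


def provenance_features(url: str, referrer: str, server_ip: str) -> dict:
    """Build a hypothetical provenance feature record for one download."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    referrer_host = (urlsplit(referrer).hostname or "") if referrer else ""
    # Count the non-empty path segments, e.g. "/files/setup.exe" has 2.
    path_depth = len([segment for segment in parts.path.split("/") if segment])
    return {
        "host": host,
        "server_ip": server_ip,
        "path_depth": path_depth,
        "has_referrer": bool(referrer),
        "referrer_same_host": bool(referrer_host) and referrer_host == host,
    }
```

Records like these, labeled using the AV scanner verdicts, would form the kind of training set the abstract describes; the choice of classifier and the full feature set are not specified in the source.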

Award ID: 1149051

Creative Commons 2.5
