Visible to the public Polymorphic Worms, 2014

SoS Newsletter- Advanced Book Block


SoS Logo

Polymorphic Worms, 2014


Polymorphic worms pose a serious threat to Internet security with their ability to rapidly propagate, exploit unknown vulnerabilities, and change their own representations on each new infection or encrypt their payloads using a different key per infection. They have many variations in the signatures of the same worm making their fingerprinting very difficult. Signature-based defenses and traditional security layers miss these stealthy and persistent threats. The research presented here identifies alternative methods for identifying and responding to these worms. All citations are from 2014.

Ali Zand, Giovanni Vigna, Xifeng Yan, Christopher Kruegel. “Extracting Probable Command and Control Signatures for Detecting Botnets.” SAC '14 Proceedings of the 29th Annual ACM Symposium on Applied Computing, March, 2014, Pages 1657-1662. doi:10.1145/2554850.2554896
Abstract: Botnets, which are networks of compromised machines under the control of a single malicious entity, are a serious threat to online security. The fact that botnets, by definition, receive their commands from a single entity can be leveraged to fight them. To this end, one requires techniques that can detect command and control (C&C) traffic, as well as the servers that host C&C services. Given the knowledge of a C&C server's IP address, one can use this information to detect all hosts that attempt to contact such a server, and subsequently disinfect, disable, or block the infected machines. This information can also be used by law enforcement to take down the C&C server.  In this paper, we present a new botnet C&C signature extraction approach that can be used to find C&C communication in traffic generated by executing malware samples in a dynamic analysis system. This approach works in two steps. First, we extract all frequent strings seen in the network traffic. Second, we use a function that assigns a score to each string. This score represents the likelihood that the string is indicative of C&C traffic. This function allows us to rank strings and focus our attention on those that likely represent good C&C signatures. We apply our technique to almost 2.6 million network connections produced by running more than 1.4 million malware samples. Using our technique, we were able to automatically extract a set of signatures that are able to identify C&C traffic. Furthermore, we compared our signatures with those used by existing tools, such as Snort and BotHunter.
Keywords: (not provided) (ID#: 15-5967)


Shahid Alam, Issa Traore, Ibrahim Sogukpinar. “Current Trends and the Future of Metamorphic Malware Detection.”  SIN '14 Proceedings of the 7th International Conference on Security of Information and Networks, September 2014, Pages 411. doi:10.1145/2659651.2659670
Abstract: Dynamic binary obfuscation or metamorphism is a technique where a malware never keeps the same sequence of opcodes in the memory. This stealthy mutation technique helps a malware evade detection by today's signature-based anti-malware programs. This paper analyzes the current trends, provides future directions and reasons about some of the basic characteristics of a system for providing real-time detection of metamorphic malware. Our emphasis is on the most recent advancements and the potentials available in metamorphic malware detection, so we only cover some of the major academic research efforts carried out, including and after, the year 2006. The paper not only serves as a collection of recent references and information for easy comparison and analysis, but also as a motivation for improving the current and developing new techniques for metamorphic malware detection.
Keywords: End point security, Malware detection, Metamorphic malware, Obfuscations (ID#: 15-5968)


Hongyu Gao, Yi Yang, Kai Bu, Yan Chen, Doug Downey, Kathy Lee, Alok Choudhary. “Spam ain't as Diverse as It Seems: Throttling OSN Spam with Templates Underneath.” ACSAC '14 Proceedings of the 30th Annual Computer Security Applications Conference, December 2014, Pages 76-85. doi:10.1145/2664243.2664251
Abstract: In online social networks (OSNs), spam originating from friends and acquaintances not only reduces the joy of Internet surfing but also causes damage to less security-savvy users. Prior countermeasures combat OSN spam from different angles. Due to the diversity of spam, there is hardly any existing method that can independently detect the majority or most of OSN spam. In this paper, we empirically analyze the textual pattern of a large collection of OSN spam. An inspiring finding is that the majority (63.0%) of the collected spam is generated with underlying templates. We therefore propose extracting templates of spam detected by existing methods and then matching messages against the templates toward accurate and fast spam detection. We implement this insight through Tangram, an OSN spam filtering system that performs online inspection on the stream of user-generated messages. Tangram automatically divides OSN spam into segments and uses the segments to construct templates to filter future spam. Experimental results show that Tangram is highly accurate and can rapidly generate templates to throttle newly emerged campaigns. Specifically, Tangram detects the most prevalent template-based spam with 95.7% true positive rate, whereas the existing template generation approach detects only 32.3%. The integration of Tangram and its auxiliary spam filter achieves an overall accuracy of 85.4% true positive rate and 0.33% false positive rate.
Keywords: online social networks, spam, spam campaigns (ID#: 15-5969)


Blake Anderson, Curtis Storlie, Micah Yates, Aaron McPhall. “Automating Reverse Engineering with Machine Learning Techniques.” AISec '14 Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop, November 2014, Pages 103-112. doi:10.1145/2666652.2666665
Abstract: Malware continues to be an ongoing threat, with millions of unique variants created every year. Unlike the majority of this malware, Advanced Persistent Threat (APT) malware is created to target a specific network or set of networks and has a precise objective, e.g. exfiltrating sensitive data. While 0-day malware detectors are a good start, they do not help the reverse engineers better understand the threats attacking their networks. Understanding the behavior of malware is often a time sensitive task, and can take anywhere between several hours to several weeks. Our goal is to automate the task of identifying the general function of the subroutines in the function call graph of the program to aid the reverse engineers. Two approaches to model the subroutine labels are investigated, a multiclass Gaussian process and a multiclass support vector machine. The output of these methods is the probability that the subroutine belongs to a certain class of functionality (e.g., file I/O, exploit, etc.). Promising initial results, illustrating the efficacy of this method, are presented on a sample of 201 subroutines taken from two malicious families.
Keywords: computer security, gaussian processes, machine learning, malware, multiple kernel learning, support vector machines (ID#: 15-5970)


Yiming Jing, Ziming Zhao, Gail-Joon Ahn, Hongxin Hu. “Morpheus: Automatically Generating Heuristics to Detect Android Emulators.” ACSAC '14 Proceedings of the 30th Annual Computer Security Applications Conference, December 2014, Pages 216-225. doi:10.1145/2664243.2664250
Abstract: Emulator-based dynamic analysis has been widely deployed in Android application stores. While it has been proven effective in vetting applications on a large scale, it can be detected and evaded by recent Android malware strains that carry detection heuristics. Using such heuristics, an application can check the presence or contents of certain artifacts and infer the presence of emulators. However, there exists little work that systematically discovers those heuristics that would be eventually helpful to prevent malicious applications from bypassing emulator-based analysis. To cope with this challenge, we propose a framework called Morpheus that automatically generates such heuristics. Morpheus leverages our insight that an effective detection heuristic must exploit discrepancies observable by an application. To this end, Morpheus analyzes the application sandbox and retrieves observable artifacts from both Android emulators and real devices. Afterwards, Morpheus further analyzes the retrieved artifacts to extract and rank detection heuristics. The evaluation of our proof-of-concept implementation of Morpheus reveals more than 10,000 novel detection heuristics that can be utilized to detect existing emulator-based malware analysis tools. We also discuss the discrepancies in Android emulators and potential countermeasures.
Keywords: Android, emulator, malware (ID#: 15-5971)


Jannik Pewny, Felix Schuster, Lukas Bernhard, Thorsten Holz, Christian Rossow. “Leveraging Semantic Signatures for Bug Search in Binary Programs.” ACSAC '14 Proceedings of the 30th Annual Computer Security Applications Conference, December 2014, Pages 406-415. doi:10.1145/2664243.2664269
Abstract: Software vulnerabilities still constitute a high security risk and there is an ongoing race to patch known bugs. However, especially in closed-source software, there is no straightforward way (in contrast to source code analysis) to find buggy code parts, even if the bug was publicly disclosed. To tackle this problem, we propose a method called Tree Edit Distance Based Equational Matching (TEDEM) to automatically identify binary code regions that are "similar" to code regions containing a reference bug. We aim to find bugs both in the same binary as the reference bug and in completely unrelated binaries (even compiled for different operating systems). Our method even works on proprietary software systems, which lack source code and symbols. The analysis task is split into two phases. In a preprocessing phase, we condense the semantics of a given binary executable by symbolic simplification to make our approach robust against syntactic changes across different binaries. Second, we use tree edit distances as a basic block-centric metric for code similarity. This allows us to find instances of the same bug in different binaries and even spotting its variants (a concept called vulnerability extrapolation). To demonstrate the practical feasibility of the proposed method, we implemented a prototype of TEDEM that can find real-world security bugs across binaries and even across OS boundaries, such as in MS Word and the popular messengers Pidgin (Linux) and Adium (Mac OS).
Keywords: (not provided) (ID#: 15-5972)


Yaniv David, Eran Yahav.; “Tracelet-Based Code Search in Executables.” PLDI '14 Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2014, Pages 349-360. doi:10.1145/2594291.2594343
Abstract: We address the problem of code search in executables. Given a function in binary form and a large code base, our goal is to statically find similar functions in the code base. Towards this end, we present a novel technique for computing similarity between functions. Our notion of similarity is based on decomposition of functions into tracelets: continuous, short, partial traces of an execution. To establish tracelet similarity in the face of low-level compiler transformations, we employ a simple rewriting engine. This engine uses constraint solving over alignment constraints and data dependencies to match registers and memory addresses between tracelets, bridging the gap between tracelets that are otherwise similar. We have implemented our approach and applied it to find matches in over a million binary functions. We compare tracelet matching to approaches based on n-grams and graphlets and show that tracelet matching obtains dramatically better precision and recall.
Keywords: static binary analysis, x86, x86-64 (ID#: 15-5973)


Yinzhi Cao, Xiang Pan, Yan Chen, Jianwei Zhuge. “JShield: Towards Real-Time and Vulnerability-Based Detection of Polluted Drive-By Download Attacks.” ACSAC '14 Proceedings of the 30th Annual Computer Security Applications Conference, December 2014, Pages 466-475. doi:10.1145/2664243.2664256
Abstract: Drive-by download attacks, which exploit vulnerabilities of web browsers to control client computers, have become a major venue for attackers. To detect such attacks, researchers have proposed many approaches such as anomaly-based [22, 23] and vulnerability-based [44, 50] detections. However, anomaly-based approaches are vulnerable to data pollution, and existing vulnerability-based approaches cannot accurately describe the vulnerability condition of all the drive-by download attacks.  In this paper, we propose a vulnerability-based approach, namely JShield, which uses novel opcode vulnerability signature, a deterministic finite automaton (DFA) with a variable pool at opcode level, to match drive-by download vulnerabilities. We investigate all the JavaScript engine vulnerabilities of web browsers from 2009 to 2014, as well as those of portable document files (PDF) readers from 2007 to 2014. JShield is able to match all of those vulnerabilities; furthermore, the overall evaluation shows that JShield is so lightweight that it only adds 2.39 percent of overhead to original execution as the median among top 500 Alexa web sites.
Keywords: (not provided) (ID#: 15-5974)


Smita Naval, Vijay Laxmi, Neha Gupta, Manoj Singh Gaur, Muttukrishnan Rajarajan. “Exploring Worm Behaviors using DTW.” SIN '14 Proceedings of the 7th International Conference on Security of Information and Networks, September 2014, Pages 379. doi:10.1145/2659651.2659737
Abstract: Worms are becoming a potential threat to Internet users across the globe. The financial damages due to computer worms increased significantly in past few years. Analyzing these hazardous worm attacks has become a crucial issue to be addressed. Given the fact that worm analysts would prefer to analyze classes of worms rather than individual files, their task will be significantly reduced. In this paper, we have proposed a dynamic host-based worm categorization approach to segregate worms. These groups indicate that worm samples constitute different behavior according to their infection and anti-detection vectors. Our proposed approach utilizes system-call traces and computes a distance matrix using Dynamic Time Warping (DTW) algorithm to form these groups. In conjunction to that, the proposed approach also discriminates worm and benign executables. The constructed model is further evaluated with unknown instances of real-world worms.
Keywords: Behavior Monitoring, DTW, System-calls (ID#: 15-5975)


Battista Biggio, Konrad Rieck, Davide Ariu, Christian Wressnegger, Igino Corona, Giorgio Giacinto, Fabio Roli. “Poisoning Behavioral Malware Clustering.” AISec '14 Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop, November 2014, Pages 27-36. doi:10.1145/2666652.2666666
Abstract: Clustering algorithms have become a popular tool in computer security to analyze the behavior of malware variants, identify novel malware families, and generate signatures for antivirus systems. However, the suitability of clustering algorithms for security-sensitive settings has been recently questioned by showing that they can be significantly compromised if an attacker can exercise some control over the input data. In this paper, we revisit this problem by focusing on behavioral malware clustering approaches, and investigate whether and to what extent an attacker may be able to subvert these approaches through a careful injection of samples with poisoning behavior. To this end, we present a case study on Malheur, an open-source tool for behavioral malware clustering. Our experiments not only demonstrate that this tool is vulnerable to poisoning attacks, but also that it can be significantly compromised even if the attacker can only inject a very small percentage of attacks into the input data. As a remedy, we discuss possible countermeasures and highlight the need for more secure clustering algorithms.
Keywords: adversarial machine learning, clustering, computer security, malware detection, security evaluation, unsupervised learning (ID#: 15-5976)


Shahid Alam, Ibrahim Sogukpinar, Issa Traore, Yvonne Coady. “In-Cloud Malware Analysis and Detection: State of the Art.” SIN '14 Proceedings of the 7th International Conference on Security of Information and Networks, September 2014, Pages 473. doi:10.1145/2659651.2659730
Abstract: With the advent of Internet of Things, we are facing another wave of malware attacks, that encompass intelligent embedded devices. Because of the limited energy resources, running a complete malware detector on these devices is quite challenging. There is a need to devise new techniques to detect malware on these devices. Malware detection is one of the services that can be provided as an in-cloud service. This paper reviews current such systems, discusses there pros and cons, and recommends an improved in-cloud malware analysis and detection system. We introduce a new three layered hybrid system with a lightweight antimalware engine. These features can provide faster malware detection response time, shield the client from malware and reduce the bandwidth between the client and the cloud, compared to other such systems. The paper serves as a motivation for improving the current and developing new techniques for in-cloud malware analysis and detection system.
Keywords: Cloud computing, In-cloud services, Malware analysis, Malware detection (ID#: 15-5977)


M. Zubair Rafique, Ping Chen, Christophe Huygens, Wouter Joosen. “Evolutionary Algorithms for Classification of Malware Families Through Different Network Behaviors.” GECCO '14 Proceedings of the 2014 Conference on Genetic and Evolutionary Computation, July 2014, Pages 1167-1174. doi:10.1145/2576768.2598238
Abstract: The staggering increase of malware families and their diversity poses a significant threat and creates a compelling need for automatic classification techniques. In this paper, we first analyze the role of network behavior as a powerful technique to automatically classify malware families and their polymorphic variants. Afterwards, we present a framework to efficiently classify malware families by modeling their different network behaviors (such as HTTP, SMTP, UDP, and TCP). We propose protocol-aware and state-space modeling schemes to extract features from malware network behaviors. We analyze the applicability of various evolutionary and non-evolutionary algorithms for our malware family classification framework. To evaluate our framework, we collected a real-world dataset of 6,000 unique and active malware samples belonging to 20 different malware families. We provide a detailed analysis of network behaviors exhibited by these prevalent malware families. The results of our experiments shows that evolutionary algorithms, like sUpervised Classifier System (UCS), can effectively classify malware families through different network behaviors in real-time. To the best of our knowledge, the current work is the first malware classification framework based on evolutionary classifier that uses different network behaviors.
Keywords: machine learning, malware classification, network behaviors (ID#: 15-5978)


Luke Deshotels, Vivek Notani, Arun Lakhotia. “DroidLegacy: Automated Familial Classification of Android Malware.” PPREW'14 Proceedings of ACM SIGPLAN on Program Protection and Reverse Engineering Workshop, January 2014, Article No. 3. doi:10.1145/2556464.2556467
Abstract: We present an automated method for extracting familial signatures for Android malware, i.e., signatures that identify malware produced by piggybacking potentially different benign applications with the same (or similar) malicious code. The APK classes that constitute malware code in a repackaged application are separated from the benign code and the Android API calls used by the malicious modules are extracted to create a signature. A piggybacked malicious app can be detected by first decomposing it into loosely coupled modules and then matching the Android API calls called by each of the modules against the signatures of the known malware families. Since the signatures are based on Android API calls, they are related to the core malware behavior, and thus are more resilient to obfuscations.  In triage, AV companies need to automatically classify large number of samples so as to optimize assignment of human analysts. They need a system that gives low false negatives even if it is at the cost of higher false positives. Keeping this goal in mind, we fine tuned our system and used standard 10 fold cross validation over a dataset of 1,052 malicious APKs and 48 benign APKs to verify our algorithm. Results show that we have 94% accuracy, 97% precision, and 93% recall when separating benign from malware. We successfully classified our entire malware dataset into 11 families with 98% accuracy, 87% precision, and 94% recall.
Keywords: Android malware, class dependence graphs, familial classification, malware detection, module generation, piggybacked malware, signature generation, static analysis (ID#: 15-5979)


Ashish Saini, Ekta Gandotra, Divya Bansal, Sanjeev Sofat. “Classification of PE Files using Static Analysis.” SIN '14 Proceedings of the 7th International Conference on Security of Information and Networks, September 2014, Pages 429. doi:10.1145/2659651.2659679
Abstract: Malware is one of the most terrible and major security threats facing the Internet today. Anti-malware vendors are challenged to identify, classify and counter new malwares due to the obfuscation techniques being used by malware authors. In this paper, we present a simple, fast and scalable method of differentiating malwares from cleanwares on the basis of features extracted from Windows PE files. The features used in this work are Suspicious Section Count and Function Call Frequency. After automatically extracting features of executables, we use machine learning algorithms available in WEKA library to classify them into malwares and cleanwares. Our experimental results provide an accuracy of over 98% for a data set of 3,087 executable files including 2,460 malwares and 627 cleanwares. Based on the results obtained, we conclude that the Function Call Frequency feature derived from the static analysis method plays a significant role in distinguishing malware files from benign ones.
Keywords: Classification, Machine Learning, Static Malware Analysis (ID#: 15-5980)


Ekta Gandotra, Divya Bansal, Sanjeev Sofat. “Integrated Framework for Classification of Malwares.” SIN '14 Proceedings of the 7th International Conference on Security of Information and Networks, September 2014, Pages 417. doi:10.1145/2659651.2659738
Abstract: Malware is one of the most terrible and major security threats facing the Internet today. It is evolving, becoming more sophisticated and using new ways to target computers and mobile devices. The traditional defences like antivirus softwares typically rely on signature based methods and are unable to detect previously unseen malwares. Machine learning approaches have been adopted to classify malwares based on the features extracted using static or dynamic analysis. Both type of malware analysis have their pros and cons. In this paper, we propose a classification framework which uses integration of both static and dynamic features for distinguishing malwares from clean files. A real world corpus of recent malwares is used to validate the proposed approach. The experimental results, based on a dataset of 998 malwares and 428 cleanware files provide an accuracy of 99.58% indicating that the hybrid approach enhances the accuracy rate of malware detection and classification over the results obtained when these features are considered separately.
Keywords: Classification, Dynamic Analysis, Machine Learning, Malware, Static Analysis (ID#: 15-5981)


Jing Qiu, Babak Yadegari, Brian Johannesmeyer, Saumya Debray, Xiaohong Su. “A Framework for Understanding Dynamic Anti-Analysis Defenses.” PPREW-4 Proceedings of the 4th Program Protection and Reverse Engineering Workshop, December 2014, Article No. 2. doi:10.1145/2689702.2689704
Abstract:  Malicious code often use a variety of anti-analysis and anti-tampering defenses to hinder analysis. Researchers trying to understand the internal logic of the malware have to penetrate these defenses. Existing research on such anti-analysis defenses tends to study them in isolation, thereby failing to see underlying conceptual similarities between different kinds of anti-analysis defenses. This paper proposes an information-flow-based framework that encompasses a wide variety of anti-analysis defenses. We illustrate the utility of our approach using two different instances of this framework: self-checksumming-based anti-tampering defenses and timing-based emulator detection. Our approach can provide insights into the underlying structure of various anti-analysis defenses and thereby help devise techniques for neutralizing them.
Keywords: Anti-analysis Defense, Self-checksumming, Taint analysis, Timing defense (ID#: 15-5983)


Mordechai Guri, Gabi Kedma, Buky Carmeli, Yuval Elovici. “Limiting Access to Unintentionally Leaked Sensitive Documents Using Malware Signatures.” SACMAT '14 Proceedings of the 19th ACM Symposium on Access Control Models and Technologies, June 2014, Pages 129-140. doi:10.1145/2613087.2613103
Abstract: Organizations are repeatedly embarrassed when their sensitive digital documents go public or fall into the hands of adversaries, often as a result of unintentional or inadvertent leakage. Such leakage has been traditionally handled either by preventive means, which are evidently not hermetic, or by punitive measures taken after the main damage has already been done. Yet, the challenge of preventing a leaked file from spreading further among computers and over the Internet is not resolved by existing approaches. This paper presents a novel method, which aims at reducing and limiting the potential damage of a leakage that has already occurred. The main idea is to tag sensitive documents within the organization's boundaries by attaching a benign detectable malware signature (DMS). While the DMS is masked inside the organization, if a tagged document is somehow leaked out of the organization's boundaries, common security services such as Anti-Virus (AV) programs, firewalls or email gateways will detect the file as a real threat and will consequently delete or quarantine it, preventing it from spreading further. This paper discusses various aspects of the DMS, such as signature type and attachment techniques, along with proper design considerations and implementation issues. The proposed method was implemented and successfully tested on various file types including documents, spreadsheets, presentations, images, executable binaries and textual source code. The evaluation results have demonstrated its effectiveness in limiting the spread of leaked documents.
Keywords: anti-virus program, data leakage, detectable malware signature, sensitive document (ID#: 15-5984)


Mu Zhang, Yue Duan, Heng Yin, Zhiruo Zhao. “Semantics-Aware Android Malware Classification Using Weighted Contextual API Dependency Graphs.” CCS '14 Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, November 2014, Pages 1105-1116. doi:10.1145/2660267.2660359
Abstract: The drastic increase of Android malware has led to a strong interest in developing methods to automate the malware analysis process. Existing automated Android malware detection and classification methods fall into two general categories: 1) signature-based and 2) machine learning-based. Signature-based approaches can be easily evaded by bytecode-level transformation attacks. Prior learning-based works extract features from application syntax, rather than program semantics, and are also subject to evasion. In this paper, we propose a novel semantic-based approach that classifies Android malware via dependency graphs. To battle transformation attacks, we extract a weighted contextual API dependency graph as program semantics to construct feature sets. To fight against malware variants and zero-day malware, we introduce graph similarity metrics to uncover homogeneous application behaviors while tolerating minor implementation differences. We implement a prototype system, DroidSIFT, in 23 thousand lines of Java code. We evaluate our system using 2200 malware samples and 13500 benign samples. Experiments show that our signature detection can correctly label 93\% of malware instances; our anomaly detector is capable of detecting zero-day malware with a low false negative rate (2\%) and an acceptable false positive rate (5.15\%) for a vetting purpose.
Keywords: android, anomaly detection, graph similarity, malware classification, semantics-aware, signature detection (ID#: 15-5985)



Articles listed on these pages have been found on publicly available internet pages and are cited with links to those pages. Some of the information included herein has been reprinted with permission from the authors or data repositories. Direct any requests via Email to for removal of the links or modifications to specific citations. Please include the ID# of the specific citation in your correspondence.