Biblio
Modern JavaScript applications extensively depend on third-party libraries. Especially for the Node.js platform, vulnerabilities can have severe consequences to the security of applications, resulting in, e.g., cross-site scripting and command injection attacks. Existing static analysis tools that have been developed to automatically detect such issues are either too coarse-grained, looking only at package dependency structure while ignoring dataflow, or rely on manually written taint specifications for the most popular libraries to ensure analysis scalability. In this work, we propose a technique for automatically extracting taint specifications for JavaScript libraries, based on a dynamic analysis that leverages the existing test suites of the libraries and their available clients in the npm repository. Due to the dynamic nature of JavaScript, mapping observations from dynamic analysis to taint specifications that fit into a static analysis is non-trivial. Our main insight is that this challenge can be addressed by a combination of an access path mechanism that identifies entry and exit points, and the use of membranes around the libraries of interest. We show that our approach is effective at inferring useful taint specifications at scale. Our prototype tool automatically extracts 146 additional taint sinks and 7 840 propagation summaries spanning 1 393 npm modules. By integrating the extracted specifications into a commercial, state-of-the-art static analysis, 136 new alerts are produced, many of which correspond to likely security vulnerabilities. Moreover, many important specifications that were originally manually written are among the ones that our tool can now extract automatically.
Mobile and IoT operating systems–and their ensuing software updates–are usually distributed as binary files. Given that these binary files are commonly closed source, users or businesses who want to assess the security of the software need to rely on reverse engineering. Further, verifying the correct application of the latest software patches in a given binary is an open problem. The regular application of software patches is a central pillar for improving mobile and IoT device security. This requires developers, integrators, and vendors to propagate patches to all affected devices in a timely and coordinated fashion. In practice, vendors follow different and sometimes improper security update agendas for both mobile and IoT products. Moreover, previous studies revealed the existence of a hidden patch gap: several vendors falsely reported that they patched vulnerabilities. Therefore, techniques to verify whether vulnerabilities have been patched or not in a given binary are essential. Deep learning approaches have shown to be promising for static binary analyses with respect to inferring binary similarity as well as vulnerability detection. However, these approaches fail to capture the dynamic behavior of these systems, and, as a result, they may inundate the analysis with false positives when performing vulnerability discovery in the wild. In particular, they cannot capture the fine-grained characteristics necessary to distinguish whether a vulnerability has been patched or not. In this paper, we present PATCHECKO, a vulnerability and patch presence detection framework for executable binaries. PATCHECKO relies on a hybrid, cross-platform binary code similarity analysis that combines deep learning-based static binary analysis with dynamic binary analysis. PATCHECKO does not require access to the source code of the target binary nor that of vulnerable functions. We evaluate PATCHECKO on the most recent Google Pixel 2 smartphone and the Android Things IoT firmware images, within which 25 known CVE vulnerabilities have been previously reported and patched. Our deep learning model shows a vulnerability detection accuracy of over 93%. We further prune the candidates found by the deep learning stage–which includes false positives–via dynamic binary analysis. Consequently, PATCHECKO successfully identifies the correct matches among the candidate functions in the top 3 ranked outcomes 100% of the time. Furthermore, PATCHECKO's differential engine distinguishes between functions that are still vulnerable and those that are patched with an accuracy of 96%.
This paper presents TrustSign, a novel, trusted automatic malware signature generation method based on high-level deep features transferred from a VGG-19 neural network model pre-trained on the ImageNet dataset. While traditional automatic malware signature generation techniques rely on static or dynamic analysis of the malware's executable, our method overcomes the limitations associated with these techniques by producing signatures based on the presence of the malicious process in the volatile memory. Signatures generated using TrustSign well represent the real malware behavior during runtime. By leveraging the cloud's virtualization technology, TrustSign analyzes the malicious process in a trusted manner, since the malware is unaware and cannot interfere with the inspection procedure. Additionally, by removing the dependency on the malware's executable, our method is capable of signing fileless malware. Thus, we focus our research on in-browser cryptojacking attacks, which current antivirus solutions have difficulty to detect. However, TrustSign is not limited to cryptojacking attacks, as our evaluation included various ransomware samples. TrustSign's signature generation process does not require feature engineering or any additional model training, and it is done in a completely unsupervised manner, obviating the need for a human expert. Therefore, our method has the advantage of dramatically reducing signature generation and distribution time. The results of our experimental evaluation demonstrate TrustSign's ability to generate signatures invariant to the process state over time. By using the signatures generated by TrustSign as input for various supervised classifiers, we achieved 99.5% classification accuracy.
Cryptojacking is the permissionless use of a target device to covertly mine cryptocurrencies. With cryptojacking attackers use malicious JavaScript codes to force web browsers into solving proof-of-work puzzles, thus making money by exploiting resources of the website visitors. To understand and counter such attacks, we systematically analyze the static, dynamic, and economic aspects of in-browser cryptojacking. For static analysis, we perform content-, currency-, and code-based categorization of cryptojacking samples to 1) measure their distribution across websites, 2) highlight their platform affinities, and 3) study their code complexities. We apply unsupervised learning to distinguish cryptojacking scripts from benign and other malicious JavaScript samples with 96.4% accuracy. For dynamic analysis, we analyze the effect of cryptojacking on critical system resources, such as CPU and battery usage. Additionally, we perform web browser fingerprinting to analyze the information exchange between the victim node and the dropzone cryptojacking server. We also build an analytical model to empirically evaluate the feasibility of cryptojacking as an alternative to online advertisement. Our results show a large negative profit and loss gap, indicating that the model is economically impractical. Finally, by leveraging insights from our analyses, we build countermeasures for in-browser cryptojacking that improve upon the existing remedies.
Java is a safe programming language by providing bytecode verification and enforcing memory protection. For instance, programmers cannot directly access the memory but have to use object references. Yet, the Java runtime provides an Unsafe API as a backdoor for the developers to access the low- level system code. Whereas the Unsafe API is designed to be used by the Java core library, a growing community of third-party libraries use it to achieve high performance. The Unsafe API is powerful, but dangerous, which leads to data corruption, resource leaks and difficult-to-diagnose JVM crash if used improperly. In this work, we study the Unsafe crash patterns and propose a memory checker to enforce memory safety, thus avoiding the JVM crash caused by the misuse of the Unsafe API at the bytecode level. We evaluate our technique on real crash cases from the openJDK bug system and real-world applications from AJDK. Our tool reduces the efforts from several days to a few minutes for the developers to diagnose the Unsafe related crashes. We also evaluate the runtime overhead of our tool on projects using intensive Unsafe operations, and the result shows that our tool causes a negligible perturbation to the execution of the applications.
Ransomware is currently one of the most significant cyberthreats to both national infrastructure and the individual, often requiring severe treatment as an antidote. Triaging ran-somware based on its similarity with well-known ransomware samples is an imperative preliminary step in preventing a ransomware pandemic. Selecting the most appropriate triaging method can improve the precision of further static and dynamic analysis in addition to saving significant t ime a nd e ffort. Currently, the most popular and proven triaging methods are fuzzy hashing, import hashing and YARA rules, which can ascertain whether, or to what degree, two ransomware samples are similar to each other. However, the mechanisms of these three methods are quite different and their comparative assessment is difficult. Therefore, this paper presents an evaluation of these three methods for triaging the four most pertinent ransomware categories WannaCry, Locky, Cerber and CryptoWall. It evaluates their triaging performance and run-time system performance, highlighting the limitations of each method.
In this paper, we consider one of the approaches to the study of the characteristics of an information system that is under the influence of various factors, and their management using neural networks and wavelet transforms based on determining the relationship between the modified state of the information system and the possibility of dynamic analysis of effects. At the same time, the process of influencing the information system includes the following components: impact on the components providing the functions of the information system; determination of the result of exposure; analysis of the result of exposure; response to the result of exposure. As an input signal, the characteristics of the means that affect are taken. The system includes an adaptive response unit, the input of which receives signals about the prerequisites for changes, and at the output, this unit generates signals for the inclusion of appropriate means to eliminate or compensate for these prerequisites or directly the changes in the information system.
Dynamic data race detectors are valuable tools for testing and validating concurrent software, but to achieve good performance they are typically implemented using sophisticated concurrent algorithms. Thus, they are ironically prone to the exact same kind of concurrency bugs they are designed to detect. To address these problems, we have developed VerifiedFT, a clean slate redesign of the FastTrack race detector [19]. The VerifiedFT analysis provides the same precision guarantee as FastTrack, but is simpler to implement correctly and efficiently, enabling us to mechanically verify an implementation of its core algorithm using CIVL [27]. Moreover, VerifiedFT provides these correctness guarantees without sacrificing any performance over current state-of-the-art (but complex and unverified) FastTrack implementations for Java.
Malware detection is an indispensable factor in security of internet oriented machines. The combinations of different features are used for dynamic malware analysis. The different combinations are generated from APIs, Summary Information, DLLs and Registry Keys Changed. Cuckoo sandbox is used for dynamic malware analysis, which is customizable, and provide good accuracy. More than 2300 features are extracted from dynamic analysis of malware and 92 features are extracted statically from binary malware using PEFILE. Static features are extracted from 39000 malicious binaries and 10000 benign files. Dynamically 800 benign files and 2200 malware files are analyzed in Cuckoo Sandbox and 2300 features are extracted. The accuracy of dynamic malware analysis is 94.64% while static analysis accuracy is 99.36%. The dynamic malware analysis is not effective due to tricky and intelligent behaviours of malwares. The dynamic analysis has some limitations due to controlled network behavior and it cannot be analyzed completely due to limited access of network.
The increasing amount of malware variants seen in the wild is causing problems for Antivirus Software vendors, unable to keep up by creating signatures for each. The methods used to develop a signature, static and dynamic analysis, have various limitations. Machine learning has been used by Antivirus vendors to detect malware based on the information gathered from the analysis process. However, adversarial examples can cause machine learning algorithms to miss-classify new data. In this paper we describe a method for malware analysis by converting malware binaries to images and then preparing those images for training within a Generative Adversarial Network. These unsupervised deep neural networks are not susceptible to adversarial examples. The conversion to images from malware binaries should be faster than using dynamic analysis and it would still be possible to link malware families together. Using the Generative Adversarial Network, malware detection could be much more effective and reliable.
De-anonymizing the authors of anonymous code (i.e., code stylometry) entails significant privacy and security implications. Most existing code stylometry methods solely rely on static (e.g., lexical, layout, and syntactic) features extracted from source code, while neglecting its key difference from regular text – it is executable! In this paper, we present Sundae, a novel code de-anonymization framework that integrates both static and dynamic stylometry analysis. Compared with the existing solutions, Sundae departs in significant ways: (i) it requires much less number of static, hand-crafted features; (ii) it requires much less labeled data for training; and (iii) it can be readily extended to new programmers once their stylometry information becomes available Through extensive evaluation on benchmark datasets, we demonstrate that Sundae delivers strong empirical performance. For example, under the setting of 229 programmers and 9 problems, it outperforms the state-of-art method by a margin of 45.65% on Python code de-anonymization. The empirical results highlight the integration of static and dynamic analysis as a promising direction for code stylometry research.