Visible to the public Biblio

Filters: Keyword is Binary Analysis  [Clear All Filters]
Chen, Ligeng, He, Zhongling, Wu, Hao, Xu, Fengyuan, Qian, Yi, Mao, Bing.  2022.  DIComP: Lightweight Data-Driven Inference of Binary Compiler Provenance with High Accuracy. 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). :112–122.
Binary analysis is pervasively utilized to assess software security and test vulnerabilities without accessing source codes. The analysis validity is heavily influenced by the inferring ability of information related to the code compilation. Among the compilation information, compiler type and optimization level, as the key factors determining how binaries look like, are still difficult to be inferred efficiently with existing tools. In this paper, we conduct a thorough empirical study on the binary's appearance under various compilation settings and propose a lightweight binary analysis tool based on the simplest machine learning method, called DIComP to infer the compiler and optimization level via most relevant features according to the observation. Our comprehensive evaluations demonstrate that DIComP can fully recognize the compiler provenance, and it is effective in inferring the optimization levels with up to 90% accuracy. Also, it is efficient to infer thousands of binaries at a millisecond level with our lightweight machine learning model (1MB).
Li, Ziqing, Feng, Guiling.  2020.  Inter-Language Static Analysis for Android Application Security. 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE). :647–650.

The Android application market will conduct various security analysis on each application to predict its potential harm before put it online. Since almost all the static analysis tools can only detect malicious behaviors in the Java layer, more and more malwares try to avoid static analysis by taking the malicious codes to the Native layer. To provide a solution for the above situation, there's a new research aspect proposed in this paper and defined as Inter-language Static Analysis. As all the involved technologies are introduced, the current research results of them will be captured in this paper, such as static analysis in Java layer, binary analysis in Native layer, Java-Native penetration technology, etc.

Andarzian, Seyed Behnam, Ladani, Behrouz Tork.  2020.  Compositional Taint Analysis of Native Codes for Security Vetting of Android Applications. 2020 10th International Conference on Computer and Knowledge Engineering (ICCKE). :567–572.
Security vetting of Android applications is one of the crucial aspects of the Android ecosystem. Regarding the state of the art tools for this goal, most of them doesn't consider analyzing native codes and only analyze the Java code. However, Android concedes its developers to implement a part or all of their applications using C or C++ code. Thus, applying conservative manners for analyzing Android applications while ignoring native codes would lead to less precision in results. Few works have tried to analyze Android native codes, but only JN-SAF has applied taint analysis using static techniques such as symbolic execution. However, symbolic execution has some problems when is used in large programs. One of these problems is the exponential growth of program paths that would raise the path explosion issue. In this work, we have tried to alleviate this issue by introducing our new tool named CTAN. CTAN applies new symbolic execution methods to angr in a particular way that it can make JN-SAF more efficient and faster. We have introduced compositional taint analysis in CTAN by combining satisfiability modulo theories with symbolic execution. Our experiments show that CTAN is 26 percent faster than its previous work JN-SAF and it also leads to more precision by detecting more data-leakage in large Android native codes.
Tai, Zeming, Washizaki, Hironori, Fukazawa, Yoshiaki, Fujimatsu, Yurie, Kanai, Jun.  2020.  Binary Similarity Analysis for Vulnerability Detection. 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). :1121–1122.
Binary similarity has been widely used in function recognition and vulnerability detection. How to define a proper similarity is the key element in implementing a fast detection method. We proposed a scalable method to detect binary vulnerabilities based on similarity. Procedures lifted from binaries are divided into several comparable strands by data dependency, and those strands are transformed into a normalized form by our tool named VulneraBin, so that similarity can be determined between two procedures through a hash value comparison. The low computational complexity allows semantically equivalent code to be identified in binaries compiled from million lines of source code in a fast and accurate way.
Calhoun, C. S., Reinhart, J., Alarcon, G. A., Capiola, A..  2020.  Establishing Trust in Binary Analysis in Software Development and Applications. 2020 IEEE International Conference on Human-Machine Systems (ICHMS). :1–4.
The current exploratory study examined software programmer trust in binary analysis techniques used to evaluate and understand binary code components. Experienced software developers participated in knowledge elicitations to identify factors affecting trust in tools and methods used for understanding binary code behavior and minimizing potential security vulnerabilities. Developer perceptions of trust in those tools to assess implementation risk in binary components were captured across a variety of application contexts. The software developers reported source security and vulnerability reports provided the best insight and awareness of potential issues or shortcomings in binary code. Further, applications where the potential impact to systems and data loss is high require relying on more than one type of analysis to ensure the binary component is sound. The findings suggest binary analysis is viable for identifying issues and potential vulnerabilities as part of a comprehensive solution for understanding binary code behavior and security vulnerabilities, but relying simply on binary analysis tools and binary release metadata appears insufficient to ensure a secure solution.
Pfeffer, Tobias, Göthel, Thomas, Glesner, Sabine.  2019.  Automatic Analysis of Critical Sections for Efficient Secure Multi-Execution. 2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS). :318–325.

Enforcement of hypersafety security policies such as noninterference can be achieved through Secure Multi-Execution (SME). While this is typically very resource-intensive, more efficient solutions such as Demand-Driven Secure Multi-Execution (DDSME) exist. Here, the resource requirements are reduced by restricting multi-execution enforcement to critical sections in the code. However, the current solution requires manual binary analysis. In this paper, we propose a fully automatic critical section analysis. Our analysis extracts a context-sensitive boundary of all nodes that handle information from the reachability relation implied by the control-flow graph. We also provide evaluation results, demonstrating the correctness and acceleration of DDSME with our analysis.

Guo, Wenbo, Mu, Dongliang, Xu, Jun, Su, Purui, Wang, Gang, Xing, Xinyu.  2018.  LEMNA: Explaining Deep Learning Based Security Applications. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. :364–379.
While deep learning has shown a great potential in various domains, the lack of transparency has limited its application in security or safety-critical areas. Existing research has attempted to develop explanation techniques to provide interpretable explanations for each classification decision. Unfortunately, current methods are optimized for non-security tasks ( e.g., image analysis). Their key assumptions are often violated in security applications, leading to a poor explanation fidelity. In this paper, we propose LEMNA, a high-fidelity explanation method dedicated for security applications. Given an input data sample, LEMNA generates a small set of interpretable features to explain how the input sample is classified. The core idea is to approximate a local area of the complex deep learning decision boundary using a simple interpretable model. The local interpretable model is specially designed to (1) handle feature dependency to better work with security applications ( e.g., binary code analysis); and (2) handle nonlinear local boundaries to boost explanation fidelity. We evaluate our system using two popular deep learning applications in security (a malware classifier, and a function start detector for binary reverse-engineering). Extensive evaluations show that LEMNA's explanation has a much higher fidelity level compared to existing methods. In addition, we demonstrate practical use cases of LEMNA to help machine learning developers to validate model behavior, troubleshoot classification errors, and automatically patch the errors of the target models.
Aigner, Alexander.  2018.  FALKE-MC: A Neural Network Based Approach to Locate Cryptographic Functions in Machine Code. Proceedings of the 13th International Conference on Availability, Reliability and Security. :2:1–2:8.
The localization and classification of cryptographic functions in binary files is a growing challenge in information security, not least because of the increasing use of such functions in malware. Nevertheless, it is still a time consuming and laborious task. Some of the most commonly used techniques are based on dynamic methods, signatures or manual reverse engineering. In this paper we present FALKE-MC, a novel framework that creates classifiers for arbitrary cryptographic algorithms from sample binaries. It processes multiple file formats and architectures and is easily expandable due to its modular design. Functions are automatically detected and features as well as constants are extracted. They are used to train a neural network, which can then be applied to classify functions in unknown binary files. The framework is fully automated, from the input of binary files and the creation of a classifier through to the output of classification results. In addition to that, it can deal with class imbalance between cryptographic and non-cryptographic samples during training. Our evaluation shows that this approach offers a high detection rate in combination with a low false positive rate. We are confident that FALKE-MC can accelerate the localization and classification of cryptographic functions in practice.
Schwartz, Edward J., Cohen, Cory F., Duggan, Michael, Gennari, Jeffrey, Havrilla, Jeffrey S., Hines, Charles.  2018.  Using Logic Programming to Recover C++ Classes and Methods from Compiled Executables. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. :426–441.
High-level C++ source code abstractions such as classes and methods greatly assist human analysts and automated algorithms alike when analyzing C++ programs. Unfortunately, these abstractions are lost when compiling C++ source code, which impedes the understanding of C++ executables. In this paper, we propose a system, OOAnalyzer, that uses an innovative new design to statically recover detailed C++ abstractions from executables in a scalable manner. OOAnalyzer's design is motivated by the observation that many human analysts reason about C++ programs by recognizing simple patterns in binary code and then combining these findings using logical inference, domain knowledge, and intuition. We codify this approach by combining a lightweight symbolic analysis with a flexible Prolog-based reasoning system. Unlike most existing work, OOAnalyzer is able to recover both polymorphic and non-polymorphic C++ classes. We show in our evaluation that OOAnalyzer assigns over 78% of methods to the correct class on our test corpus, which includes both malware and real-world software such as Firefox and MySQL. These recovered abstractions can help analysts understand the behavior of C++ malware and cleanware, and can also improve the precision of program analyses on C++ executables.
Ispoglou, Kyriakos K., AlBassam, Bader, Jaeger, Trent, Payer, Mathias.  2018.  Block Oriented Programming: Automating Data-Only Attacks. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. :1868-1882.

With the widespread deployment of Control-Flow Integrity (CFI), control-flow hijacking attacks, and consequently code reuse attacks, are significantly more difficult. CFI limits control flow to well-known locations, severely restricting arbitrary code execution. Assessing the remaining attack surface of an application under advanced control-flow hijack defenses such as CFI and shadow stacks remains an open problem. We introduce BOPC, a mechanism to automatically assess whether an attacker can execute arbitrary code on a binary hardened with CFI/shadow stack defenses. BOPC computes exploits for a target program from payload specifications written in a Turing-complete, high-level language called SPL that abstracts away architecture and program-specific details. SPL payloads are compiled into a program trace that executes the desired behavior on top of the target binary. The input for BOPC is an SPL payload, a starting point (e.g., from a fuzzer crash) and an arbitrary memory write primitive that allows application state corruption. To map SPL payloads to a program trace, BOPC introduces Block Oriented Programming (BOP), a new code reuse technique that utilizes entire basic blocks as gadgets along valid execution paths in the program, i.e., without violating CFI or shadow stack policies. We find that the problem of mapping payloads to program traces is NP-hard, so BOPC first reduces the search space by pruning infeasible paths and then uses heuristics to guide the search to probable paths. BOPC encodes the BOP payload as a set of memory writes. We execute 13 SPL payloads applied to 10 popular applications. BOPC successfully finds payloads and complex execution traces – which would likely not have been found through manual analysis – while following the target's Control-Flow Graph under an ideal CFI policy in 81% of the cases.

Searles, R., Xu, L., Killian, W., Vanderbruggen, T., Forren, T., Howe, J., Pearson, Z., Shannon, C., Simmons, J., Cavazos, J..  2017.  Parallelization of Machine Learning Applied to Call Graphs of Binaries for Malware Detection. 2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). :69–77.

Malicious applications have become increasingly numerous. This demands adaptive, learning-based techniques for constructing malware detection engines, instead of the traditional manual-based strategies. Prior work in learning-based malware detection engines primarily focuses on dynamic trace analysis and byte-level n-grams. Our approach in this paper differs in that we use compiler intermediate representations, i.e., the callgraph representation of binaries. Using graph-based program representations for learning provides structure of the program, which can be used to learn more advanced patterns. We use the Shortest Path Graph Kernel (SPGK) to identify similarities between call graphs extracted from binaries. The output similarity matrix is fed into a Support Vector Machine (SVM) algorithm to construct highly-accurate models to predict whether a binary is malicious or not. However, SPGK is computationally expensive due to the size of the input graphs. Therefore, we evaluate different parallelization methods for CPUs and GPUs to speed up this kernel, allowing us to continuously construct up-to-date models in a timely manner. Our hybrid implementation, which leverages both CPU and GPU, yields the best performance, achieving up to a 14.2x improvement over our already optimized OpenMP version. We compared our generated graph-based models to previously state-of-the-art feature vector 2-gram and 3-gram models on a dataset consisting of over 22,000 binaries. We show that our classification accuracy using graphs is over 19% higher than either n-gram model and gives a false positive rate (FPR) of less than 0.1%. We are also able to consider large call graphs and dataset sizes because of the reduced execution time of our parallelized SPGK implementation.

Jeong, Junho, Son, Yunsik, Oh, Seman.  2017.  The X86/64 Binary Code to Smart Intermediate Language Translation for Software Weakness. Proceedings of the International Conference on Advances in Image Processing. :129–134.

Today, the proportion of software in society as a whole is steadily increasing. In addition to size of software increasing, the number of cases dealing with personal information is also increasing. This shows the importance of weekly software security verification. However, software security is very difficult in cases where libraries do not have source code. To solve this problem, it is necessary to develop a technique for checking existing binary security weaknesses. To this end, techniques for analyzing security weaknesses using intermediate languages are actively being discussed. In this paper, we propose a system that translate binary code to intermediate language to effectively analyze existing security weaknesses within binary code.

Yadegari, Babak, Stephens, Jon, Debray, Saumya.  2017.  Analysis of Exception-Based Control Transfers. Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy. :205–216.
Dynamic taint analysis and symbolic execution find many important applications in security-related program analyses. However, current techniques for such analyses do not take proper account of control transfers due to exceptions. As a result, they can fail to account for implicit flows arising from exception-based control transfers, leading to loss of precision and potential false negatives in analysis results. While the idea of using exceptions for obfuscating (unconditional) control transfers is well known, we are not aware of any prior work discussing the use of exceptions to implement conditional control transfers and implicit information flows. This paper demonstrates the problems that can arise in existing dynamic taint analysis and symbolic execution systems due to exception-based implicit information flows and proposes a generic architecture-agnostic solution for reasoning about the behavior of code using user-defined exception handlers. Experimental results from a prototype implementation indicate that the ideas described produce better results than current state-of-the-art systems.
Zhang, B., Ye, J., Feng, C., Tang, C..  2017.  S2F: Discover Hard-to-Reach Vulnerabilities by Semi-Symbolic Fuzz Testing. 2017 13th International Conference on Computational Intelligence and Security (CIS). :548–552.
Fuzz testing is a popular program testing technique. However, it is difficult to find hard-to-reach vulnerabilities that are nested with complex branches. In this paper, we propose semi-symbolic fuzz testing to discover hard-to-reach vulnerabilities. Our method groups inputs into high frequency and low frequency ones. Then symbolic execution is utilized to solve only uncovered branches to mitigate the path explosion problem. Especially, in order to play the advantages of fuzz testing, our method locates critical branch for each low frequency input and corrects the generated test cases to comfort the branch condition. We also implemented a prototype\textbackslashtextbarS2F, and the experimental results show that S2F can gain 17.70% coverage performance and discover more hard-to-reach vulnerabilities than other vulnerability detection tools for our benchmark.