Visible to the public Biblio

Filters: Keyword is static code analysis  [Clear All Filters]
Pereira, José D’Abruzzo.  2020.  Techniques and Tools for Advanced Software Vulnerability Detection. 2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). :123—126.
Software is frequently deployed with vulnerabilities that may allow hackers to gain access to the system or information, leading to money or reputation losses. Although there are many techniques to detect software vulnerabilities, their effectiveness is far from acceptable, especially in large software projects, as shown by several research works. This Ph.D. aims to study the combination of different techniques to improve the effectiveness of vulnerability detection (increasing the detection rate and decreasing the number of false-positives). Static Code Analysis (SCA) has a good detection rate and is the central technique of this work. However, as SCA reports many false-positives, we will study the combination of various SCA tools and the integration with other detection approaches (e.g., software metrics) to improve vulnerability detection capabilities. We will also study the use of such combination to prioritize the reported vulnerabilities and thus guide the development efforts and fixes in resource-constrained projects.
Visalli, Nicholas, Deng, Lin, Al-Suwaida, Amro, Brown, Zachary, Joshi, Manish, Wei, Bingyang.  2019.  Towards Automated Security Vulnerability and Software Defect Localization. 2019 IEEE 17th International Conference on Software Engineering Research, Management and Applications (SERA). :90–93.

Security vulnerabilities and software defects are prevalent in software systems, threatening every aspect of cyberspace. The complexity of modern software makes it hard to secure systems. Security vulnerabilities and software defects become a major target of cyberattacks which can lead to significant consequences. Manual identification of vulnerabilities and defects in software systems is very time-consuming and tedious. Many tools have been designed to help analyze software systems and to discover vulnerabilities and defects. However, these tools tend to miss various types of bugs. The bugs that are not caught by these tools usually include vulnerabilities and defects that are too complicated to find or do not fall inside of an existing rule-set for identification. It was hypothesized that these undiscovered vulnerabilities and defects do not occur randomly, rather, they share certain common characteristics. A methodology was proposed to detect the probability of a bug existing in a code structure. We used a comprehensive experimental evaluation to assess the methodology and report our findings.

Talukder, Md Arabin Islam, Shahriar, Hossain, Qian, Kai, Rahman, Mohammad, Ahamed, Sheikh, Wu, Fan, Agu, Emmanuel.  2019.  DroidPatrol: A Static Analysis Plugin For Secure Mobile Software Development. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC). 1:565–569.

While the number of mobile applications are rapidly growing, these applications are often coming with numerous security flaws due to the lack of appropriate coding practices. Security issues must be addressed earlier in the development lifecycle rather than fixing them after the attacks because the damage might already be extensive. Early elimination of possible security vulnerabilities will help us increase the security of our software and mitigate or reduce the potential damages through data losses or service disruptions caused by malicious attacks. However, many software developers lack necessary security knowledge and skills required at the development stage, and Secure Mobile Software Development (SMSD) is not yet well represented in academia and industry. In this paper, we present a static analysis-based security analysis approach through design and implementation of a plugin for Android Development Studio, namely DroidPatrol. The proposed plugins can support developers by providing list of potential vulnerabilities early.

Rahman, Md Rayhanur, Rahman, Akond, Williams, Laurie.  2019.  Share, But Be Aware: Security Smells in Python Gists. 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME). :536–540.

Github Gist is a service provided by Github which is used by developers to share code snippets. While sharing, developers may inadvertently introduce security smells in code snippets as well, such as hard-coded passwords. Security smells are recurrent coding patterns that are indicative of security weaknesses, which could potentially lead to security breaches. The goal of this paper is to help software practitioners avoid insecure coding practices through an empirical study of security smells in publicly-available GitHub Gists. Through static analysis, we found 13 types of security smells with 4,403 occurrences in 5,822 publicly-available Python Gists. 1,817 of those Gists, which is around 31%, have at least one security smell including 689 instances of hard-coded secrets. We also found no significance relation between the presence of these security smells and the reputation of the Gist author. Based on our findings, we advocate for increased awareness and rigorous code review efforts related to software security for Github Gists so that propagation of insecure coding practices are mitigated.

Rahman, Akond, Parnin, Chris, Williams, Laurie.  2019.  The Seven Sins: Security Smells in Infrastructure as Code Scripts. 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). :164–175.

Practitioners use infrastructure as code (IaC) scripts to provision servers and development environments. While developing IaC scripts, practitioners may inadvertently introduce security smells. Security smells are recurring coding patterns that are indicative of security weakness and can potentially lead to security breaches. The goal of this paper is to help practitioners avoid insecure coding practices while developing infrastructure as code (IaC) scripts through an empirical study of security smells in IaC scripts. We apply qualitative analysis on 1,726 IaC scripts to identify seven security smells. Next, we implement and validate a static analysis tool called Security Linter for Infrastructure as Code scripts (SLIC) to identify the occurrence of each smell in 15,232 IaC scripts collected from 293 open source repositories. We identify 21,201 occurrences of security smells that include 1,326 occurrences of hard-coded passwords. We submitted bug reports for 1,000 randomly-selected security smell occurrences. We obtain 212 responses to these bug reports, of which 148 occurrences were accepted by the development teams to be fixed. We observe security smells can have a long lifetime, e.g., a hard-coded secret can persist for as long as 98 months, with a median lifetime of 20 months.

Pfeffer, Tobias, Göthel, Thomas, Glesner, Sabine.  2019.  Automatic Analysis of Critical Sections for Efficient Secure Multi-Execution. 2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS). :318–325.

Enforcement of hypersafety security policies such as noninterference can be achieved through Secure Multi-Execution (SME). While this is typically very resource-intensive, more efficient solutions such as Demand-Driven Secure Multi-Execution (DDSME) exist. Here, the resource requirements are reduced by restricting multi-execution enforcement to critical sections in the code. However, the current solution requires manual binary analysis. In this paper, we propose a fully automatic critical section analysis. Our analysis extracts a context-sensitive boundary of all nodes that handle information from the reachability relation implied by the control-flow graph. We also provide evaluation results, demonstrating the correctness and acceleration of DDSME with our analysis.

Marin, M\u ad\u alina Angelica, Carabas, Costin, Deaconescu, R\u azvan, T\u apus, Nicolae.  2019.  Proactive Secure Coding for iOS Applications. 2019 18th RoEduNet Conference: Networking in Education and Research (RoEduNet). :1–5.

In this paper we propose a solution to support iOS developers in creating better applications, to use static analysis to investigate source code and detect secure coding issues while simultaneously pointing out good practices and/or secure APIs they should use.

Ding, Steven H. H., Fung, Benjamin C. M., Charland, Philippe.  2019.  Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization. 2019 IEEE Symposium on Security and Privacy (SP). :472–489.

Reverse engineering is a manually intensive but necessary technique for understanding the inner workings of new malware, finding vulnerabilities in existing systems, and detecting patent infringements in released software. An assembly clone search engine facilitates the work of reverse engineers by identifying those duplicated or known parts. However, it is challenging to design a robust clone search engine, since there exist various compiler optimization options and code obfuscation techniques that make logically similar assembly functions appear to be very different. A practical clone search engine relies on a robust vector representation of assembly code. However, the existing clone search approaches, which rely on a manual feature engineering process to form a feature vector for an assembly function, fail to consider the relationships between features and identify those unique patterns that can statistically distinguish assembly functions. To address this problem, we propose to jointly learn the lexical semantic relationships and the vector representation of assembly functions based on assembly code. We have developed an assembly code representation learning model \textbackslashemphAsm2Vec. It only needs assembly code as input and does not require any prior knowledge such as the correct mapping between assembly functions. It can find and incorporate rich semantic relationships among tokens appearing in assembly code. We conduct extensive experiments and benchmark the learning model with state-of-the-art static and dynamic clone search approaches. We show that the learned representation is more robust and significantly outperforms existing methods against changes introduced by obfuscation and optimizations.

Cheng, Xiao, Wang, Haoyu, Hua, Jiayi, Zhang, Miao, Xu, Guoai, Yi, Li, Sui, Yulei.  2019.  Static Detection of Control-Flow-Related Vulnerabilities Using Graph Embedding. 2019 24th International Conference on Engineering of Complex Computer Systems (ICECCS). :41–50.

Static vulnerability detection has shown its effectiveness in detecting well-defined low-level memory errors. However, high-level control-flow related (CFR) vulnerabilities, such as insufficient control flow management (CWE-691), business logic errors (CWE-840), and program behavioral problems (CWE-438), which are often caused by a wide variety of bad programming practices, posing a great challenge for existing general static analysis solutions. This paper presents a new deep-learning-based graph embedding approach to accurate detection of CFR vulnerabilities. Our approach makes a new attempt by applying a recent graph convolutional network to embed code fragments in a compact and low-dimensional representation that preserves high-level control-flow information of a vulnerable program. We have conducted our experiments using 8,368 real-world vulnerable programs by comparing our approach with several traditional static vulnerability detectors and state-of-the-art machine-learning-based approaches. The experimental results show the effectiveness of our approach in terms of both accuracy and recall. Our research has shed light on the promising direction of combining program analysis with deep learning techniques to address the general static analysis challenges.

Ashfaq, Qirat, Khan, Rimsha, Farooq, Sehrish.  2019.  A Comparative Analysis of Static Code Analysis Tools That Check Java Code Adherence to Java Coding Standards. 2019 2nd International Conference on Communication, Computing and Digital Systems (C-CODE). :98–103.

Java programming language is considered highly important due to its extensive use in the development of web, desktop as well as handheld devices applications. Implementing Java Coding standards on Java code has great importance as it creates consistency and as a result better development and maintenance. Finding bugs and standard's violations is important at an early stage of software development than at a later stage when the change becomes impossible or too expensive. In the paper, some tools and research work done on Coding Standard Analyzers is reviewed. These tools are categorized based on the type of rules they cheeked, namely: style, concurrency, exceptions, and quality, security, dependency and general methods of static code analysis. Finally, list of Java Coding Standards Enforcing Tools are analyzed against certain predefined parameters that are limited by the scope of research paper under study. This review will provide the basis for selecting a static code analysis tool that enforce International Java Coding Standards such as the Rule of Ten and the JPL Coding Standards. Such tools have great importance especially in the development of mission/safety critical system. This work can be very useful for developers in selecting a good tool for Java code analysis, according to their requirements.

Bakour, K., Ünver, H. M., Ghanem, R..  2018.  The Android Malware Static Analysis: Techniques, Limitations, and Open Challenges. 2018 3rd International Conference on Computer Science and Engineering (UBMK). :586-593.

This paper aims to explain static analysis techniques in detail, and to highlight the weaknesses and challenges which face it. To this end, more than 80 static analysis-based framework have been studied, and in their light, the process of detecting malicious applications has been divided into four phases that were explained in a schematic manner. Also, the features that is used in static analysis were discussed in detail by dividing it into four categories namely, Manifest-based features, code-based features, semantic features and app's metadata-based features. Also, the challenges facing methods based on static analysis were discussed in detail. Finally, a case study was conducted to test the strength of some known commercial antivirus and one of the stat-of-art academic static analysis frameworks against obfuscation techniques used by developers of malicious applications. The results showed a significant impact on the performance of the most tested antiviruses and frameworks, which is reflecting the urgent need for more accurately tools.

Gauthier, F., Keynes, N., Allen, N., Corney, D., Krishnan, P..  2018.  Scalable Static Analysis to Detect Security Vulnerabilities: Challenges and Solutions. 2018 IEEE Cybersecurity Development (SecDev). :134-134.

Parfait [1] is a static analysis tool originally developed to find implementation defects in C/C++ systems code. Parfait's focus is on proving both high precision (low false positives) as well as scaling to systems with millions of lines of code (typically requiring 10 minutes of analysis time per million lines). Parfait has since been extended to detect security vulnerabilities in applications code, supporting the Java EE and PL/SQL server stack. In this abstract we describe some of the challenges we encountered in this process including some of the differences seen between the applications code being analysed, our solutions that enable us to analyse a variety of applications, and a summary of the challenges that remain.

Novikov, A. S., Ivutin, A. N., Troshina, A. G., Vasiliev, S. N..  2018.  Detecting the Use of Unsafe Data in Software of Embedded Systems by Means of Static Analysis Methodology. 2018 7th Mediterranean Conference on Embedded Computing (MECO). :1-4.

The article considers the approach to identifying potentially unsafe data in program code of embedded systems which can lead to errors and fails in the functioning of equipment. The sources of invalid data are revealed and the process of changing the status of this data in process of static code analysis is shown. The mechanism for annotating functions that operate on unsafe data is described, which allows to control the entire process of using them and thus it will improve the quality of the output code.

Yamamoto, Ryota, Yoshida, Norihiro, Takada, Hiroaki.  2018.  Towards Static Recovery of Micro State Transitions from Legacy Embedded Code. Proceedings of the 1st ACM SIGSOFT International Workshop on Automated Specification Inference. :1-4.

During the development of an embedded system, state transition models are frequently used for modeling at several abstraction levels. Unfortunately, specification documents including such model are often lost or not up to date during maintenance/reuse. Based on our experience in industrial collaboration, we present Micro State Transition Table (MSTT) to help developers understanding embedded code based on a fine-grained state transition model. We also discuss the challenges of static recovery of an MSTT.

Gao, Qing, Ma, Sen, Shao, Sihao, Sui, Yulei, Zhao, Guoliang, Ma, Luyao, Ma, Xiao, Duan, Fuyao, Deng, Xiao, Zhang, Shikun et al..  2018.  CoBOT: Static C/C++ Bug Detection in the Presence of Incomplete Code. Proceedings of the 26th Conference on Program Comprehension. :385-388.

To obtain precise and sound results, most of existing static analyzers require whole program analysis with complete source code. However, in reality, the source code of an application always interacts with many third-party libraries, which are often not easily accessible to static analyzers. Worse still, more than 30% of legacy projects [1] cannot be compiled easily due to complicated configuration environments (e.g., third-party libraries, compiler options and macros), making ideal "whole-program analysis" unavailable in practice. This paper presents CoBOT [2], a static analysis tool that can detect bugs in the presence of incomplete code. It analyzes function APIs unavailable in application code by either using function summarization or automatically downloading and analyzing the corresponding library code as inferred from the application code and its configuration files. The experiments show that CoBOT is not only easy to use, but also effective in detecting bugs in real-world programs with incomplete code. Our demonstration video is at:

Verriet, Jacques, Dankers, Reinier, Somers, Lou.  2018.  Performance Prediction for Families of Data-Intensive Software Applications. Companion of the 2018 ACM/SPEC International Conference on Performance Engineering. :189-194.

Performance is a critical system property of any system, in particular of data-intensive systems, such as image processing systems. We describe a performance engineering method for families of data-intensive systems that is both simple and accurate; the performance of new family members is predicted using models of existing family members. The predictive models are calibrated using static code analysis and regression. Code analysis is used to extract performance profiles, which are used in combination with regression to derive predictive performance models. A case study presents the application for an industrial image processing case, which revealed as benefits the easy application and identification of code performance optimization points. 

Nguyen Quang Do, Lisa, Bodden, Eric.  2018.  Gamifying Static Analysis. Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. :714-718.

In the past decades, static code analysis has become a prevalent means to detect bugs and security vulnerabilities in software systems. As software becomes more complex, analysis tools also report lists of increasingly complex warnings that developers need to address on a daily basis. The novel insight we present in this work is that static analysis tools and video games both require users to take on repetitive and challenging tasks. Importantly, though, while good video games manage to keep players engaged, static analysis tools are notorious for their lacking user experience, which prevents developers from using them to their full potential, frequently resulting in dissatisfaction and even tool abandonment. We show parallels between gaming and using static analysis tools, and advocate that the user-experience issues of analysis tools can be addressed by looking at the analysis tooling system as a whole, and by integrating gaming elements that keep users engaged, such as providing immediate and clear feedback, collaborative problem solving, or motivators such as points and badges.

Querel, Louis-Philippe, Rigby, Peter C..  2018.  WarningsGuru: Integrating Statistical Bug Models with Static Analysis to Provide Timely and Specific Bug Warnings. Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. :892-895.

The detection of bugs in software systems has been divided into two research areas: static code analysis and statistical modeling of historical data. Static analysis indicates precise problems on line numbers but has the disadvantage of suggesting many warning which are often false positives. In contrast, statistical models use the history of the system to suggest which files or commits are likely to contain bugs. These course-grained predictions do not indicate to the developer the precise reasons for the bug prediction. We combine static analysis with statistical bug models to limit the number of warnings and provide specific warnings information at the line level. Previous research was able to process only a limited number of releases, our tool, WarningsGuru, can analyze all commits in a source code repository and we currently have processed thousands of commits and warnings. Since we process every commit, we present developers with more precise information about when a warning is introduced allowing us to show recent warnings that are introduced in statistically risky commits. Results from two OSS projects show that CommitGuru's statistical model flags 25% and 29% of all commits as risky. When we combine this with static analysis in WarningsGuru the number of risky commits with warnings is 20% for both projects and the number commits with new warnings is only 3% and 6%. We can drastically reduce the number of commits and warnings developers have to examine. The tool, source code, and demo is available at

Ludwig, Jeremy, Xu, Steven, Webber, Frederick.  2018.  Static Software Metrics for Reliability and Maintainability. Proceedings of the 2018 International Conference on Technical Debt. :53-54.

This paper identifies a small, essential set of static software code metrics linked to the software product quality characteristics of reliability and maintainability and to the most commonly identified sources of technical debt. An open-source plug-in is created for the Understand code analysis tool that calculates and visualizes these metrics. The plug-in was developed as a first step in an ongoing project aimed at applying case-based reasoning to the issue of software product quality.1

Ferenc, Rudolf, Tóth, Zoltán, Ladányi, Gergely, Siket, István, Gyimóthy, Tibor.  2018.  A Public Unified Bug Dataset for Java. Proceedings of the 14th International Conference on Predictive Models and Data Analytics in Software Engineering. :12-21.

Background: Bug datasets have been created and used by many researchers to build bug prediction models. Aims: In this work we collected existing public bug datasets and unified their contents. Method: We considered 5 public datasets which adhered to all of our criteria. We also downloaded the corresponding source code for each system in the datasets and performed their source code analysis to obtain a common set of source code metrics. This way we produced a unified bug dataset at class and file level that is suitable for further research (e.g. to be used in the building of new bug prediction models). Furthermore, we compared the metric definitions and values of the different bug datasets. Results: We found that (i) the same metric abbreviation can have different definitions or metrics calculated in the same way can have different names, (ii) in some cases different tools give different values even if the metric definitions coincide because (iii) one tool works on source code while the other calculates metrics on bytecode, or (iv) in several cases the downloaded source code contained more files which influenced the afferent metric values significantly. Conclusions: Apart from all these imprecisions, we think that having a common metric set can help in building better bug prediction models and deducing more general conclusions. We made the unified dataset publicly available for everyone. By using a public dataset as an input for different bug prediction related investigations, researchers can make their studies reproducible, thus able to be validated and verified.

Gharibi, Gharib, Tripathi, Rashmi, Lee, Yugyung.  2018.  Code2Graph: Automatic Generation of Static Call Graphs for Python Source Code. Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. :880-883.

A static call graph is an imperative prerequisite used in most interprocedural analyses and software comprehension tools. However, there is a lack of software tools that can automatically analyze the Python source-code and construct its static call graph. In this paper, we introduce a prototype Python tool, named code2graph, which automates the tasks of (1) analyzing the Python source-code and extracting its structure, (2) constructing static call graphs from the source code, and (3) generating a similarity matrix of all possible execution paths in the system. Our goal is twofold: First, assist the developers in understanding the overall structure of the system. Second, provide a stepping stone for further research that can utilize the tool in software searching and similarity detection applications. For example, clustering the execution paths into a logical workflow of the system would be applied to automate specific software tasks. Code2graph has been successfully used to generate static call graphs and similarity matrices of the paths for three popular open-source Deep Learning projects (TensorFlow, Keras, PyTorch). A tool demo is available at

Kronjee, Jorrit, Hommersom, Arjen, Vranken, Harald.  2018.  Discovering Software Vulnerabilities Using Data-flow Analysis and Machine Learning. Proceedings of the 13th International Conference on Availability, Reliability and Security. :6:1–6:10.

We present a novel method for static analysis in which we combine data-flow analysis with machine learning to detect SQL injection (SQLi) and Cross-Site Scripting (XSS) vulnerabilities in PHP applications. We assembled a dataset from the National Vulnerability Database and the SAMATE project, containing vulnerable PHP code samples and their patched versions in which the vulnerability is solved. We extracted features from the code samples by applying data-flow analysis techniques, including reaching definitions analysis, taint analysis, and reaching constants analysis. We used these features in machine learning to train various probabilistic classifiers. To demonstrate the effectiveness of our approach, we built a tool called WIRECAML, and compared our tool to other tools for vulnerability detection in PHP code. Our tool performed best for detecting both SQLi and XSS vulnerabilities. We also tried our approach on a number of open-source software applications, and found a previously unknown vulnerability in a photo-sharing web application.

Reynolds, Z. P., Jayanth, A. B., Koc, U., Porter, A. A., Raje, R. R., Hill, J. H..  2017.  Identifying and Documenting False Positive Patterns Generated by Static Code Analysis Tools. 2017 IEEE/ACM 4th International Workshop on Software Engineering Research and Industrial Practice (SER IP). :55–61.

This paper presents our results from identifying anddocumenting false positives generated by static code analysistools. By false positives, we mean a static code analysis toolgenerates a warning message, but the warning message isnot really an error. The goal of our study is to understandthe different kinds of false positives generated so we can (1)automatically determine if an error message is truly indeed a truepositive, and (2) reduce the number of false positives developersand testers must triage. We have used two open-source tools andone commercial tool in our study. The results of our study haveled to 14 core false positive patterns, some of which we haveconfirmed with static code analysis tool developers.

Novikov, A. S., Ivutin, A. N., Troshina, A. G., Vasiliev, S. N..  2017.  The approach to finding errors in program code based on static analysis methodology. 2017 6th Mediterranean Conference on Embedded Computing (MECO). :1–4.

The article considers the approach to static analysis of program code and the general principles of static analyzer operation. The authors identify the most important syntactic and semantic information in the programs, which can be used to find errors in the source code. The general methodology for development of diagnostic rules is proposed, which will improve the efficiency of static code analyzers.

Obster, M., Kowalewski, S..  2017.  A live static code analysis architecture for PLC software. 2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA). :1–4.

Static code analysis is a convenient technique to support the development of software. Without prior test setup, information about a later runtime behavior can be inferred and errors in the code can be found before using a regular compiler. Solutions to apply static code analysis to PLC software following the IEC 61131-3 already exist, but using these separate tools usually creates a gap in the development process. In this paper we introduce an architecture to use static analysis directly in a development environment and give instant feedback to the developer while he is still editing the PLC software.