Biblio

Jonathan Aldrich, Alex Potanin.  2016.  Naturally Embedded DSLs. Systems, Programming, Languages and Applications: Software for Humanity (SPLASH).

Domain-specific languages can be embedded in a variety of ways within a host language. The choice of embedding approach entails significant tradeoffs in the usability of the embedded DSL. We argue that embedding DSLs naturally within the host language results in the best experience for end users of the DSL. A naturally embedded DSL is one that uses natural syntax, static semantics, and dynamic semantics for the DSL, all of which may differ from those of the host language. Furthermore, it must be possible to use DSLs together naturally, meaning that different DSLs cannot conflict and the programmer can easily tell which code is written in which language.

Esther Wang, Jonathan Aldrich.  2016.  Capability Safe Reflection for the Wyvern Language. SPLASH 2016.

Reflection allows a program to examine and even modify itself, but its power can also lead to violations of encapsulation and even security vulnerabilities. The Wyvern language leverages static types for encapsulation and provides security through an object capability model. We present a design for reflection in Wyvern which respects capability safety and type-based encapsulation. This is accomplished through a mirror-based design, with the addition of a mechanism to constrain the visible type of a reflected object. In this way, we ensure that the programmer cannot use reflection to violate basic encapsulation and security guarantees.

Alireza Sadeghi, Hamid Bagheri, Joshua Garcia, Sam Malek.  2016.  A Taxonomy and Qualitative Comparison of Program Analysis Techniques for Security Assessment of Android Software. IEEE Transactions on Software Engineering. 99

In parallel with the meteoric rise of mobile software, we are witnessing an alarming escalation in the number and sophistication of security threats targeting mobile platforms, particularly Android, the dominant platform. While existing research has made significant progress toward the detection and mitigation of Android security threats, gaps and challenges remain. This paper contributes a comprehensive taxonomy to classify and characterize the state-of-the-art research in this area. We carefully followed the systematic literature review process and analyzed the results of more than 300 research papers, resulting in the most comprehensive and elaborate investigation of the literature in this area of research. The systematic analysis of the research literature has revealed patterns, trends, and gaps in

Jafar Al-Kofahi, Tien Nguyen, Christian Kästner.  2016.  Escaping AutoHell: a vision for automated analysis and migration of autotools build systems. RELENG 2016 Proceedings of the 4th International Workshop on Release Engineering.

GNU Autotools is a widely used build tool in the open source community. As open source projects grow more complex, maintaining their build systems becomes more challenging, due to the lack of tool support. In this paper, we propose a platform to build support tools for GNU Autotools build systems. The platform provides an abstraction of the build system to be used in different analysis techniques.

Meng Meng, Jens Meinicke, Chu-Pan Wong, Eric Walkingshaw, Christian Kästner.  2017.  A Choice of Variational Stacks: Exploring Variational Data Structures. 11th International Workshop on Variability Modelling of Software-intensive Systems (VAMOS).

Many applications require not only representing variability in software and data, but also computing with it. To do so efficiently requires variational data structures that make the variability explicit in the underlying data and the operations used to manipulate it. Variational data structures have been developed ad hoc for many applications, but there is little general understanding of how to design them or what tradeoffs exist among them. In this paper, we strive for a more systematic exploration and analysis of a variational data structure. We want to know how different design decisions affect the performance and scalability of a variational data structure, and what properties of the underlying data and operation sequences need to be considered. Specifically, we study several alternative designs of a variational stack, a data structure that supports efficiently representing and computing with multiple variants of a plain stack, and that is a common building block in many algorithms. The different variational stacks are presented as a small product line organized by three design decisions. We analyze how these design decisions affect the performance of a variational stack with different usage profiles. Finally, we evaluate how these design decisions affect the performance of the variational stack in a real-world scenario: in the interpreter VarexJ when executing real software containing variability. 
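To make the idea of a variational stack concrete, here is a minimal Python sketch of one possible design, a "stack of choices" in which each element carries a presence condition. It is not the VarexJ implementation or any specific variant from the paper; the names `VStack` and `all_configs`, and the encoding of a presence condition as an explicit set of configurations, are assumptions chosen for illustration. An explicit set only scales to a handful of features; a scalable design would use a symbolic representation such as BDDs.

```python
from itertools import combinations
from typing import Any, Dict, FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical encoding: a configuration is the set of enabled features, and a
# presence condition is the explicit set of configurations in which something exists.
Config = FrozenSet[str]
Cond = FrozenSet[Config]


def all_configs(features: Iterable[str]) -> Cond:
    """Enumerate every configuration over the given features (2^n of them)."""
    fs = list(features)
    return frozenset(
        frozenset(combo) for r in range(len(fs) + 1) for combo in combinations(fs, r)
    )


class VStack:
    """One shared list whose elements are tagged with presence conditions."""

    def __init__(self, universe: Cond) -> None:
        self.universe = universe
        self.items: List[Tuple[Cond, Any]] = []  # bottom ... top

    def push(self, value: Any, cond: Optional[Cond] = None) -> None:
        cond = self.universe if cond is None else cond
        if cond:  # pushing under an unsatisfiable (empty) condition is a no-op
            self.items.append((cond, value))

    def pop(self, cond: Optional[Cond] = None) -> Dict[Cond, Any]:
        """Pop under `cond`; returns a choice {condition: value} covering `cond`."""
        remaining = self.universe if cond is None else cond
        items = list(self.items)  # work on a copy so a failed pop changes nothing
        result: Dict[Cond, Any] = {}
        for i in range(len(items) - 1, -1, -1):  # walk from the top down
            if not remaining:
                break
            elem_cond, value = items[i]
            overlap = elem_cond & remaining
            if overlap:
                result[overlap] = value
                items[i] = (elem_cond - overlap, value)  # removed only where popped
                remaining = remaining - overlap
        if remaining:
            raise IndexError("pop from empty stack in some configurations")
        self.items = [(c, v) for c, v in items if c]  # drop elements present nowhere
        return result

    def project(self, config: Config) -> List[Any]:
        """The plain stack seen by a single configuration (bottom to top)."""
        return [v for c, v in self.items if config in c]


# Usage with two hypothetical features A and B.
U = all_configs(["A", "B"])
with_a = frozenset(c for c in U if "A" in c)

s = VStack(U)
s.push(1)          # present in every configuration
s.push(2, with_a)  # present only where A is enabled
print(s.project(frozenset({"A"})))  # [1, 2]
print(s.project(frozenset()))       # [1]

popped = s.pop()   # pops 2 where A is enabled and 1 everywhere else
print(s.project(frozenset({"A"})))  # [1]
print(s.project(frozenset()))       # []
```

Because all variants share one list, a value pushed unconditionally is stored once; the cost is that `pop` may split an element's condition, which is one of the tradeoffs the design decisions above are about.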

Aiping Xiong, Robert W. Proctor, Ninghui Li, Weining Yang.  2016.  Use of Warnings for Instructing Users How to Detect Phishing Webpages. 46th Annual Meeting of the Society for Computers in Psychology.

The ineffectiveness of phishing warnings has been attributed to users' poor comprehension of the warning. However, the effectiveness of a phishing warning is typically evaluated at the time when users interact with a suspected phishing webpage, which we call the effect with the phishing warning. Nevertheless, the ultimate goal in preventing users from falling for phishing scams is improved phishing detection when the warning is absent, which we call the effect of the warning. We conducted an online study to evaluate the effect with and the effect of several phishing warning variations, varying the point at which the warning was presented and whether procedural knowledge instruction was included in the warning interface. The current Chrome phishing warning was also included as a control. In the first phase, 360 Amazon Mechanical Turk workers made decisions about 10 login webpages (8 authentic, 2 fraudulent) with the aid of a warning. After a short distracting task, the workers made the same decisions about 10 different login webpages (8 authentic, 2 fraudulent) without a warning. In phase one, the compliance rates with the two proposed warning interfaces (98% and 94%) were similar to that of the Chrome warning (98%), regardless of when the warning was presented. In phase two (without a warning), performance was better for the condition in which the warning with procedural knowledge instruction was presented before the phishing webpage in phase one, suggesting a stronger effect of the warning than in the other conditions. With procedural knowledge of how to determine a webpage’s legitimacy, users identified phishing webpages more accurately even without the warning being presented.

Jing Chen, Robert W. Proctor, Ninghui Li.  2016.  Human Trust in Automation in a Phishing Context. 46th Annual Meeting of the Society for Computers in Psychology.

Many previous studies have shown that trust in automation mediates the effectiveness of automation in maintaining performance, and one critical factor that affects trust is the reliability of the automated system. In the cyber domain, automated systems are pervasive, yet the role of human trust has not been studied as extensively as in other domains, such as transportation.

In the current study, we used a phishing email identification task (with an automated phishing detection assistant) as a testbed to study human trust in automation in the cyber domain. More specifically, we systematically investigated the influence of “description” (i.e., whether the user was informed about the actual reliability of the automated system) and “experience” (i.e., whether the user was provided feedback on their choices), in addition to the reliability level of the automated phishing detection system. These factors were varied across conditions of response bias (false alarms vs. misses) and task difficulty (easy vs. difficult), which a pilot study found may be critical. Measures of user performance and trust were compared across conditions. The measures of interest were human trust in the warning (a subjective rating of how trustworthy the warning system is), human reliance on the automated system (an objective measure of whether participants complied with the system’s warnings), and performance (the overall quality of the decisions made).

Aiping Xiong, R. W. Proctor, Weining Yang, Ninghui Li.  2017.  Is Domain Highlighting Actually Helpful in Identifying Phishing Webpages? Human Factors: The Journal of the Human Factors and Ergonomics Society.

Objective: To evaluate the effectiveness of domain highlighting in helping users identify whether webpages are legitimate or spurious.

Background: As a component of the URL, a domain name can be overlooked. Consequently, browsers highlight the domain name to help users identify which website they are visiting. Nevertheless, few studies have assessed the effectiveness of domain highlighting, and the only formal study confounded highlighting with instructions to look at the address bar. 

Method: We conducted two phishing detection experiments. Experiment 1 was run online: Participants judged the legitimacy of webpages in two phases. In phase one, participants were to judge legitimacy based on any information on the webpage, whereas in phase two they were to focus on the address bar. Whether the domain was highlighted was also varied. Experiment 2 was conducted similarly but with participants in a laboratory setting, which allowed tracking of eye fixations.

Results: Participants differentiated the legitimate and fraudulent webpages better than chance. There was some benefit of attending to the address bar, but domain highlighting did not provide effective protection against phishing attacks. Analysis of eye-gaze fixation measures was in agreement with the task performance, but heat-map results revealed that participants’ visual attention was attracted by the highlighted domains.

Conclusion: Failure to detect many fraudulent webpages even when the domain was highlighted implies that users lacked knowledge of webpage security cues or how to use those cues.

Jaspreet Bhatia, Travis Breaux, Joel Reidenberg, Thomas Norton.  2016.  A Theory of Vagueness and Privacy Risk Perception. 2016 IEEE 24th International Requirements Engineering Conference (RE).

Ambiguity arises in requirements when a statement is incomplete or missing information, whether unintentionally or otherwise, or when a word or phrase has more than one possible meaning. For web-based and mobile information systems, ambiguity, and vagueness in particular, undermines the ability of organizations to align their privacy policies with their data practices, which can confuse or mislead users and thus increase privacy risk. In this paper, we introduce a theory of vagueness for privacy policy statements based on a taxonomy of vague terms derived from an empirical content analysis of 15 privacy policies. The taxonomy was evaluated in a paired comparison experiment, and the results were analyzed using the Bradley-Terry model to yield a rank order of vague terms both in isolation and in composition. The theory predicts how vague modifiers to information actions and information types can be composed to increase or decrease overall vagueness. We further provide empirical evidence based on factorial vignette surveys to show how increases in vagueness will decrease users' acceptance of privacy risk and thus decrease users' willingness to share personal information.
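The Bradley-Terry model mentioned above assigns each item i a positive strength p_i such that the probability that i is preferred over j in a paired comparison is p_i / (p_i + p_j). The Python sketch below fits the strengths with the standard minorization-maximization (MM) iteration. It is an illustration, not the authors' analysis code; the labels X, Y, Z and their comparison counts are hypothetical toy inputs, not data from the study.

```python
from typing import Dict, List, Tuple

Wins = Dict[Tuple[str, str], int]  # (i, j) -> times i was judged "more vague" than j


def bradley_terry(items: List[str], wins: Wins,
                  iters: int = 1000, tol: float = 1e-10) -> Dict[str, float]:
    """Fit Bradley-Terry strengths with the MM update
        p_i <- W_i / sum_{j != i} n_ij / (p_i + p_j),
    where W_i is i's total number of wins and n_ij the number of i-vs-j comparisons.
    Assumes the directed "beats" graph is strongly connected; otherwise the
    maximum-likelihood estimate does not exist."""
    p = {i: 1.0 for i in items}
    for _ in range(iters):
        new = {}
        for i in items:
            w_i = sum(wins.get((i, j), 0) for j in items if j != i)
            denom = sum(
                (wins.get((i, j), 0) + wins.get((j, i), 0)) / (p[i] + p[j])
                for j in items if j != i
            )
            new[i] = w_i / denom
        total = sum(new.values())
        new = {i: v * len(items) / total for i, v in new.items()}  # fix scale: mean 1
        done = max(abs(new[i] - p[i]) for i in items) < tol
        p = new
        if done:
            break
    return p


# Hypothetical paired-comparison counts among three vague terms X, Y, Z.
items = ["X", "Y", "Z"]
wins = {("X", "Y"): 7, ("Y", "X"): 3,
        ("Y", "Z"): 6, ("Z", "Y"): 4,
        ("X", "Z"): 8, ("Z", "X"): 2}
p = bradley_terry(items, wins)
ranking = sorted(items, key=p.get, reverse=True)  # most to least vague
print(ranking)
print(round(p["X"] / (p["X"] + p["Y"]), 3))  # fitted P(X judged vaguer than Y)
```

Only the ratios of the strengths matter, so the sketch rescales them to mean 1 after each step; the resulting order is the kind of rank order of vague terms the abstract describes.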

Rui Shu, Xiaohui Gu, William Enck.  2017.  A Study of Security Vulnerabilities on Docker Hub. Proceedings of the ACM Conference on Data and Application Security and Privacy (CODASPY).

Mohammed Alsaleh, Ehab Al-Shaer.  2016.  Towards Automated Verification of Active Cyber Defense Strategies on Software Defined Networks. ACM CCS SafeConfig '16 Proceedings of the 2016 ACM Workshop on Automated Decision Making for Active Cyber Defense.

Lichao Sun, Zhiqiang Li, Qiben Yan, Witawas Srisa-an, Yu Pan.  2016.  SigPID: Significant Permission Identification for Android Malware Detection. 11th International Conference on Malicious and Unwanted Software (MALCON 2016).

A recent report indicates that a newly developed malicious app for Android is introduced every 11 seconds. To combat this alarming rate of malware creation, we need a scalable malware detection approach that is effective and efficient. In this paper, we introduce SIGPID, a malware detection system based on permission analysis, to cope with the rapid increase in the number of Android malware apps. Instead of analyzing all 135 Android permissions, our approach applies 3-level pruning by mining the permission data to identify only the significant permissions that can be effective in distinguishing benign and malicious apps. SIGPID then uses classification algorithms to classify different families of malware and benign apps. Our evaluation finds that only 22 out of 135 permissions are significant. We then compare the performance of our approach, using only 22 permissions, against a baseline approach that analyzes all permissions. The results indicate that when a Support Vector Machine (SVM) is used as the classifier, we can achieve over 90% precision, recall, accuracy, and F-measure, which are about the same as those produced by the baseline approach, while incurring analysis times that are 4 to 32 times shorter than those of using all 135 permissions. When we compare the detection effectiveness of SIGPID to that of other approaches, SIGPID can detect 93.62% of the malware in the data set and 91.4% of unknown malware.

Zhiqiang Li, Lichao Sun, Qiben Yan, Witawas Srisa-an, Zhenxiang Chen.  2016.  DroidClassifier: Efficient Adaptive Mining of Application-Layer Header for Classifying Android Malware. 12th EAI International Conference on Security and Privacy in Communication Networks.

A recent report has shown that more than 5,000 malicious applications are created for Android devices each day. This creates a need for researchers to develop effective and efficient malware classification and detection approaches. To address this need, we introduce DroidClassifier: a systematic framework for classifying network traffic generated by mobile malware. Our approach uses network traffic analysis to construct multiple models in an automated fashion using a supervised method over a set of labeled malware network traffic (the training dataset). Each model is built by extracting common identifiers from multiple HTTP header fields. Adaptive thresholds are designed to capture the disparate characteristics of different malware families. Clustering is then used to improve the classification efficiency. Finally, we aggregate the multiple models to construct a holistic model to conduct cluster-level malware classification. We then perform a comprehensive evaluation of DroidClassifier by using 706 malware samples as the training set and 657 malware samples and 5,215 benign apps as the testing set. Collectively, these malicious and benign apps generate 17,949 network flows. The results show that DroidClassifier successfully identifies over 90% of different families of malware with more than 90% accuracy at an acceptable computational cost. Thus, DroidClassifier can facilitate network management in a large network and enable unobtrusive detection of mobile malware. By focusing on analyzing network behaviors, we expect DroidClassifier to work with reasonable accuracy for other mobile platforms, such as iOS and Windows Mobile, as well.

Supat Rattanasuksun, Tingting Yu, Witawas Srisa-an, Gregg Rothermel.  2016.  RRF: A Race Reproduction Framework for Use in Debugging Process-Level Races. 27th International Symposium on Software Reliability Engineering (ISSRE).

Process-level races are endemic in modern systems. These races are difficult to debug because they are sensitive to execution events such as interrupts and scheduling. Unless a process interleaving that can result in the race can be found, the race cannot be reproduced and cannot be corrected. In practice, however, the number of interleavings that can occur among processes is large, and the patterns of interleavings can be complex. Thus, approaches for reproducing process-level races have, to date, often been ineffective. In this paper, we present RRF, a race reproduction framework that can help software engineers reproduce reported process-level races, potentially enabling them to debug these races. RRF performs a hybrid analysis by leveraging existing static program analysis tools, dynamic kernel event reporting tools, and yield points to provide the observability and controllability needed to reproduce races. We conducted an empirical study to evaluate RRF; our results show that RRF can be effective for reproducing races.

Alain Forget, Sarah Pearman, Jeremy Thomas, Alessandro Acquisti, Nicolas Christin, Lorrie Cranor, Serge Egelman, Marian Harbach, Rahul Telang.  2016.  Do or Do Not, There Is No Try: User Engagement May Not Improve Security Outcomes. Proceedings of the Twelfth Symposium on Usable Privacy and Security (SOUPS 2016).

Computer security problems often occur when there are disconnects between users’ understanding of their role in computer security and what is expected of them. To help users make good security decisions more easily, we need insights into the challenges they face in their daily computer usage. We built and deployed the Security Behavior Observatory (SBO) to collect data on user behavior and machine configurations from participants’ home computers. Combining SBO data with user interviews, this paper presents a qualitative study comparing users’ attitudes, behaviors, and understanding of computer security to the actual states of their computers. Qualitative inductive thematic analysis of the interviews produced “engagement” as the overarching theme, whereby participants with greater engagement in computer security and maintenance did not necessarily have more secure computer states. Thus, user engagement alone may not be predictive of computer security. We identify several other themes that inform future directions for better design and research into security interventions. Our findings emphasize the need for better understanding of how users’ computers get infected, so that we can more effectively design user-centered mitigations.

Gabriel Ferreira, Momin Malik, Christian Kästner, Jurgen Pfeffer, Sven Apel.  2016.  Do #ifdefs Influence the Occurrence of Vulnerabilities? An Empirical Study of the Linux Kernel. SPLC '16 Proceedings of the 20th International Systems and Software Product Line Conference. :65-73.

Preprocessors support the diversification of software products with #ifdefs, but also require additional effort from developers to maintain and understand variable code. We conjecture that #ifdefs cause developers to produce more vulnerable code because they are required to reason about multiple features simultaneously and maintain complex mental models of dependencies of configurable code.

We extracted a variational call graph across all configurations of the Linux kernel and used configuration complexity metrics to compare vulnerable and non-vulnerable functions, considering their vulnerability history. Our goal was to learn whether we can observe a measurable influence of configuration complexity on the occurrence of vulnerabilities.

Our results suggest, among other things, that vulnerable functions have higher variability than non-vulnerable ones and are also constrained by fewer configuration options. This suggests that developers are inclined to notice functions that appear in frequently compiled product variants. We aim to raise developers' awareness of the need to address variability more systematically, since configuration complexity is an important but often ignored aspect of software product lines.

Jens Meinicke, Chu-Pan Wong, Christian Kästner, Thomas Thum, Gunter Saake.  2016.  On essential configuration complexity: measuring interactions in highly-configurable systems. ASE 2016 Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. :483-494.

Quality assurance for highly-configurable systems is challenging due to the exponentially growing configuration space. Interactions among multiple options can lead to surprising behaviors, bugs, and security vulnerabilities. Analyzing all configurations systematically might nevertheless be possible if most options do not interact or if interactions follow specific patterns that can be exploited by analysis tools. To better understand interactions in practice, we analyze program traces to characterize and identify where interactions occur in control flow and data. To this end, we developed a dynamic analysis for Java based on variability-aware execution and monitored executions of multiple small to medium-sized programs. We find that the essential configuration complexity of these programs is indeed much lower than the combinatorial explosion of the configuration space indicates. However, we also discover that the interaction characteristics that allow scalable and complete analyses are more nuanced than what is exploited by existing state-of-the-art quality assurance strategies.

Christopher Bogart, Christian Kästner, James Herbsleb, Ferdian Thung.  2016.  How to break an API: cost negotiation and community values in three software ecosystems. FSE 2016 Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering.

Change introduces conflict into software ecosystems: breaking changes may ripple through the ecosystem and trigger rework for users of a package, but developers can often invest additional effort or accept opportunity costs to alleviate or delay downstream costs. We performed a multiple case study of three software ecosystems with different tooling and philosophies toward change (Eclipse, R/CRAN, and Node.js/npm) to understand how developers make decisions about change and change-related costs and what practices, tooling, and policies are used. We found that the three ecosystems differ substantially in their practices and expectations toward change and that those differences can be explained largely by different community values in each ecosystem. Our results illustrate that there is a large design space in how to build an ecosystem, its policies, and its supporting infrastructure, and that there is value in making community values and accepted tradeoffs explicit and transparent in order to resolve conflicts and negotiate change-related costs.

Flavio Medeiros, Christian Kästner, Marcio Ribeiro, Rohit Gheyi, Sven Apel.  2016.  A comparison of 10 sampling algorithms for configurable systems. ICSE '16 Proceedings of the 38th International Conference on Software Engineering. :643-654.

Almost every software system provides configuration options to tailor the system to the target platform and application scenario. Often, this configurability renders the analysis of every individual system configuration infeasible. To address this problem, researchers have proposed a diverse set of sampling algorithms. We present a comparative study of 10 state-of-the-art sampling algorithms regarding their fault-detection capability and the size of their sample sets. The former is important to improve software quality and the latter to reduce the time of analysis. In a nutshell, we found that sampling algorithms with larger sample sets are able to detect higher numbers of faults, but simple algorithms with small sample sets, such as most-enabled-disabled, are the most efficient in most contexts. Furthermore, we observed that the limiting assumptions made in previous work influence the number of detected faults, the size of sample sets, and the ranking of algorithms. Finally, we have identified a number of technical challenges when trying to avoid the limiting assumptions, which call into question the practicality of certain sampling algorithms.
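As a concrete illustration of the most-enabled-disabled algorithm named above, which selects one valid configuration with as many options enabled as possible and one with as many options disabled as possible, here is a brute-force Python sketch. The option names and the constraint are hypothetical, and enumerating all 2^n configurations only works for a few options; a practical implementation would typically delegate the search to a SAT solver.

```python
from itertools import product
from typing import Callable, Dict, List

Config = Dict[str, bool]  # option name -> enabled?


def most_enabled_disabled(options: List[str],
                          is_valid: Callable[[Config], bool]) -> List[Config]:
    """Return [most-enabled, most-disabled] among the valid configurations.

    Brute force over all 2^n assignments; ties are broken by enumeration order."""
    valid: List[Config] = []
    for bits in product([False, True], repeat=len(options)):
        cfg = dict(zip(options, bits))
        if is_valid(cfg):
            valid.append(cfg)
    if not valid:
        raise ValueError("no valid configuration")
    most_enabled = max(valid, key=lambda c: sum(c.values()))
    most_disabled = min(valid, key=lambda c: sum(c.values()))
    return [most_enabled, most_disabled]


# Hypothetical feature model: DEBUG requires LOG, and SSL excludes DEBUG.
def valid(cfg: Config) -> bool:
    return (not cfg["DEBUG"] or cfg["LOG"]) and not (cfg["SSL"] and cfg["DEBUG"])


sample = most_enabled_disabled(["LOG", "SSL", "DEBUG"], valid)
for cfg in sample:
    print(cfg)
```

Here the all-enabled configuration is invalid, so the "most-enabled" sample enables only two of the three options; this shows why the algorithm needs the constraints rather than simply turning everything on and off.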

Christian Kästner, Jurgen Pfeffer.  2014.  Limiting Recertification in Highly Configurable Systems: Analyzing Interactions and Isolation among Configuration Options. HotSoS '14 Proceedings of the 2014 Symposium and Bootcamp on the Science of Security.

In highly configurable systems, the configuration space is too large to (re-)certify every configuration in isolation. In this project, we combine software analysis with network analysis to detect which configuration options interact and which have local effects. Instead of analyzing systems such as Linux and SELinux one combination of configuration settings at a time (more than 10^2000 combinations, even considering compile-time configurations only), we analyze the effect of each configuration option once for the entire configuration space. The analysis will guide us toward designs that separate interacting configuration options into a core system and isolate orthogonal and less trusted configuration options from this core.

Hanan Hibshi, Travis Breaux, Christian Wagner.  2016.  Improving Security Requirements Adequacy: An Interval Type 2 Fuzzy Logic Security Assessment System. 2016 IEEE Symposium Series on Computational Intelligence.

Organizations rely on security experts to improve the security of their systems. These professionals use background knowledge and experience to align known threats and vulnerabilities before selecting mitigation options. The substantial depth of expertise in any one area (e.g., databases, networks, operating systems) precludes the possibility that an expert would have complete knowledge about all threats and vulnerabilities. To begin addressing this problem of distributed knowledge, we investigate the challenge of developing a security requirements rule base that mimics human expert reasoning to enable new decision-support systems. In this paper, we show how to collect relevant information from cyber security experts to enable the generation of: (1) interval type-2 fuzzy sets that capture intra- and inter-expert uncertainty around vulnerability levels; and (2) fuzzy logic rules underpinning the decision-making process within the requirements analysis. The proposed method relies on comparative ratings of security requirements in the context of concrete vignettes, providing a novel, interdisciplinary approach to knowledge generation for fuzzy logic systems. The proposed approach is tested by evaluating 52 scenarios with 13 experts to compare their assessments to those of the fuzzy logic decision support system. The initial results show that the system provides reliable assessments to the security analysts, in particular, generating more conservative assessments in 19% of the test scenarios compared to the experts’ ratings. 
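An interval type-2 fuzzy set assigns each input an interval of membership grades rather than a single grade, bounded by a lower and an upper membership function. The Python sketch below shows one simple way to build such a set from several experts' triangular estimates, by taking the pointwise minimum and maximum across experts. The triangle parameters and the 0-10 rating scale are hypothetical, and the sketch captures only inter-expert disagreement; it is not the paper's method for eliciting intra- and inter-expert uncertainty.

```python
from typing import Callable, List, Tuple

MF = Callable[[float], float]  # a type-1 membership function


def triangular(a: float, b: float, c: float) -> MF:
    """Type-1 triangular membership function with feet a, c and peak b (a < b < c)."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


def interval_type2(experts: List[MF]) -> Callable[[float], Tuple[float, float]]:
    """Footprint of uncertainty spanned by the experts' sets:
    lower MF = pointwise min, upper MF = pointwise max."""
    def membership(x: float) -> Tuple[float, float]:
        grades = [mf(x) for mf in experts]
        return (min(grades), max(grades))
    return membership


# Three hypothetical experts' notions of "high vulnerability" on a 0-10 scale.
high = interval_type2([triangular(5, 7, 9), triangular(6, 8, 10), triangular(4, 7, 10)])
print(high(7))  # (0.5, 1.0)
print(high(2))  # (0.0, 0.0)
```

The width of the returned interval grows where the experts disagree, which is the kind of uncertainty a type-2 system can carry into its rules instead of averaging it away.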

Rocky Slavin, Xiaoyin Wang, Mitra Bokaei Hosseini, James Hester, Ram Krishnan, Jaspreet Bhatia, Travis Breaux, Jianwei Niu.  2016.  Toward a framework for detecting privacy policy violations in android application code. ICSE '16 Proceedings of the 38th International Conference on Software Engineering.

Mobile applications frequently access sensitive personal information to meet user or business requirements. Because such information is sensitive in general, regulators increasingly require mobile-app developers to publish privacy policies that describe what information is collected. Furthermore, regulators have fined companies when these policies are inconsistent with the actual data practices of mobile apps. To help mobile-app developers check their privacy policies against their apps' code for consistency, we propose a semi-automated framework that consists of a policy terminology-API method map that links policy phrases to API methods that produce sensitive information, and information flow analysis to detect misalignments. We present an implementation of our framework based on a privacy-policy-phrase ontology and a collection of mappings from API methods to policy phrases. Our empirical evaluation on 477 top Android apps discovered 341 potential privacy policy violations.

Travis Breaux, Daniel Smullen, Hanan Hibshi.  2015.  Detecting Repurposing and Over-Collection in Multi-party Privacy Requirements Specifications. RE 2015: Requirements Engineering Conference.

Mobile and web applications increasingly leverage service-oriented architectures in which developers integrate third-party services into end-user applications. These include identity management, mapping and navigation, cloud storage, and advertising services, among others. While service reuse reduces development time, it introduces new privacy and security risks due to data repurposing and over-collection, as data is shared among multiple parties who lack transparency into third-party data practices. To address this challenge, we propose new techniques based on Description Logic (DL) for modeling multi-party data flow requirements and verifying the purpose specification and collection and use limitation principles, which are prominent privacy properties found in international standards and guidelines. We evaluate our techniques in an empirical case study that examines the data practices of the Waze mobile application and three of its service providers: Facebook Login, Amazon Web Services (a cloud storage provider), and (a popular mobile analytics and advertising platform). The study results include detected conflicts and violations of the principles, as well as two patterns for balancing privacy and data use flexibility in requirements specifications. Analysis of automated reasoning over the DL models shows that reasoning over complex compositions of multi-party systems is feasible within exponential asymptotic timeframes proportional to the policy size and the number of expressed data, and orthogonal to the number of conflicts found.