Biblio
Programming languages often include specialized syntax for common datatypes e.g. lists and some also build in support for specific specialized datatypes e.g. regular expressions, but user-defined types must use general-purpose syntax. Frustration with this causes developers to use strings, rather than structured data, with alarming frequency, leading to correctness, performance, security, and usability issues. Allowing library providers to modularly extend a language with new syntax could help address these issues. Unfortunately, prior mechanisms either limit expressiveness or are not safely composable: individually unambiguous extensions can still cause ambiguities when used together. We introduce type-specific languages TSLs: logic associated with a type that determines how the bodies of generic literals, able to contain arbitrary syntax, are parsed and elaborated, hygienically. The TSL for a type is invoked only when a literal appears where a term of that type is expected, guaranteeing non-interference. We give evidence supporting the applicability of this approach and formally specify it with a bidirectionally typed elaboration semantics for the Wyvern programming language.
Programming languages often include specialized syntax for common
datatypes (e.g. lists) and some also build in support for specific specialized
datatypes (e.g. regular expressions), but user-defined types must use generalpurpose
syntax. Frustration with this causes developers to use strings, rather than
structured data, with alarming frequency, leading to correctness, performance,
security, and usability issues. Allowing library providers to modularly extend a
language with new syntax could help address these issues. Unfortunately, prior
mechanisms either limit expressiveness or are not safely composable: individually
unambiguous extensions can still cause ambiguities when used together.
We introduce type-specific languages (TSLs): logic associated with a type that
determines how the bodies of generic literals, able to contain arbitrary syntax,
are parsed and elaborated, hygienically. The TSL for a type is invoked only
when a literal appears where a term of that type is expected, guaranteeing noninterference.
We give evidence supporting the applicability of this approach and
formally specify it with a bidirectionally typed elaboration semantics for the
Wyvern programming language.
The use of shared mutable state, commonly seen in object-oriented systems, is often problematic due to the potential conflicting interactions between aliases to the same state. We present a substructural type system outfitted with a novel lightweight interference control mechanism, rely-guarantee protocols, that enables controlled aliasing of shared resources. By assigning each alias separate roles, encoded in a novel protocol abstraction in the spirit of rely-guarantee reasoning, our type system ensures that challenging uses of shared state will never interfere in an unsafe fashion. In particular, rely-guarantee protocols ensure that each alias will never observe an unexpected value, or type, when inspecting shared memory regardless of how the changes to that shared state (originating from potentially unknown program contexts) are interleaved at run-time.
Application Programming Interfaces (APIs) often define protocols --- restrictions on the order of client calls to API methods. API protocols are common and difficult to use, which has generated tremendous research effort in alternative specification, implementation, and verification techniques. However, little is understood about the barriers programmers face when using these APIs, and therefore the research effort may be misdirected.
To understand these barriers better, we perform a two-part qualitative study. First, we study developer forums to identify problems that developers have with protocols. Second, we perform a think-aloud observational study, in which we systematically observe professional programmers struggle with these same problems to get more detail on the nature of their struggles and how they use available resources. In our observations, programmer time was spent primarily on four types of searches of the protocol state space. These observations suggest protocol-targeted tools, languages, and verification techniques will be most effective if they enable programmers to efficiently perform state search.
Hadoop is a map-reduce implementation that rapidly processes data in parallel. Cloud provides reliability, flexibility, scalability, elasticity and cost saving to customers. Moving Hadoop into Cloud can be beneficial to Hadoop users. However, Hadoop has two vulnerabilities that can dramatically impact its security in a Cloud. The vulnerabilities are its overloaded authentication key, and the lack of fine-grained access control at the data access level. We propose and develop a security enhancement for Cloud-based Hadoop.
This paper presents an approach for securing software application chains in cloud environments. We use the concept of workflow management systems to explain the model. Our prototype is based on the Kepler scientific workflow system enhanced with security analytics package.
Much of the data researchers usually collect about users’ privacy and security behavior comes from short-term studies and focuses on specific, narrow activities. We present a design architecture for the Security Behavior Observatory (SBO), a client-server infrastructure designed to collect a wide array of data on user and computer behavior from a panel of hundreds of participants over several years. The SBO infrastructure had to be carefully designed to fulfill several requirements. First, the SBO must scale with the desired length, breadth, and depth of data collection. Second, we must take extraordinary care to ensure the security and privacy of the collected data, which will inevitably include intimate details about our participants’ behavior. Third, the SBO must serve our research interests, which will inevitably change over the course of the study, as collected data is analyzed, interpreted, and suggest further lines of inquiry. We describe in detail the SBO infrastructure, its secure data collection methods, the benefits of our design and implementation, as well as the hurdles and tradeoffs to consider when designing such a data collection system.
Much of the data researchers usually collect about users’ privacy and security behavior comes from short-term studies and focuses on specific, narrow activities. We present a design architecture for the Security Behavior Observatory (SBO), a client-server infrastructure designed to collect a wide array of data on user and computer behavior from a panel of hundreds of participants over several years. The SBO infrastructure had to be carefully designed to fulfill several requirements. First, the SBO must scale with the desired length, breadth, and depth of data collection. Second, we must take extraordinary care to ensure the security and privacy of the collected data, which will inevitably include intimate details about our participants' behavior. Third, the SBO must serve our research interests, which will inevitably change over the course of the study, as collected data is analyzed, interpreted, and suggest further lines of inquiry. We describe in detail the SBO infrastructure, its secure data collection methods, the benefits of our design and implementation, as well as the hurdles and tradeoffs to consider when designing such a data collection system. - See more at: https://www.cylab.cmu.edu/research/techreports/2014/tr_cylab14009.html#sthash.vsO39UdR.dpuf
The process of software development and evolution has proven difficult to improve. For example, well documented security issues such as SQL injection (SQLi), after more than a decade, still top most vulnerability lists. Quantitative security process and quality metrics are often subdued due to lack of time and resources. Security problems are hard to quantify and even harder to predict or relate to any process improvement activity. The goal of this thesis is to assess usefulness of “classical” software reliability engineering (SRE) models in the context of open source software security, the conditions under which they may be useful, and the information that they can provide with respect to the security quality of a software product. We start with security problem reports for open source Fedora series of software releases.We illustrate how one can learn from normal operational profile about the non-operational processes related to security problems. One aspect is classification of security problems based on the human traits that contribute to the injection of problems into code, whether due to poor practices or limited knowledge (epistemic errors), or due to random accidental events (aleatoric errors). Knowing the distribution aids in development of an attack profile. In the case of Fedora, the distribution of security problems found post-release was consistent across four different releases of the software. The security problem discovery rate appears to be roughly constant but much lower than the initial non-security problem discovery rate. Previous work has shown that non-operational testing can help accelerate and focus the problem discovery rate and that it can be successfully modeled.We find that some classical reliability models can be used with success to estimate the residual number of security problems, and through that provide a measure of the security characteristics of the software. We propose an agile software testing process that combines operational and non-operational (or attack related) testing with the intent of finding more security problems faster.
Security requirements engineering ideally combines expertise in software security with proficiency in requirements engineering to provide a foundation for developing secure systems. However, security requirements are often inadequately understood and improperly specified, often due to lack of security expertise and a lack of emphasis on security during early stages of system development. Software systems often have common and recurrent security requirements in addition to system-specific security needs. Security requirements patterns can provide a means of capturing common security requirements while documenting the context in which a requirement manifests itself and the tradeoffs involved. The objective of this paper is to aid in understanding of the process for pattern development and provide considerations for writing effective security requirements patterns. We analyzed existing literature on software patterns, problem solving and cognition to outline the process for developing software patterns. We also reviewed strategies for specifying reusable security requirements and security requirements patterns. Our proposed considerations can aid pattern writers in capturing necessary contextual information when documenting security requirements patterns to facilitate application and integration of security requirements.
A fundamental problem in the specification of regulatory privacy policies such as the Health Insurance Portability and Accountability Act (HIPAA) in a computer system is to state the policies precisely, consistent with their high-level intuition. In this paper, we propose UML sequence diagrams as a practical means to graphically express privacy policies. A graphical representation allows decision-makers such as application domain experts and security architects to easily verify and confirm the expected behavior. Once intuitively confirmed, our work in this article introduces an algorithmic approach to formalizing the semantics of sequence diagrams in terms of linear temporal logic (LTL) templates. In all the templates, different semantic aspects are expressed as separate, yet simple LTL formulas that can be composed to define the complex semantics of sequence diagrams. The formalization enables us to leverage the analytical powers of automated decision procedures for LTL formulas to determine if a collection of sequence diagrams is consistent, independent, etc. and also to verify if a system design conforms to the privacy policies. We evaluate our approach by modeling and analyzing a substantial subset of HIPAA rules using sequence diagrams.
Pervasiveness of smartphones and the vast number of corresponding apps have underlined the need for applicable automated software testing techniques. A wealth of research has been focused on either unit or GUI testing of smartphone apps, but little on automated support for end-to-end system testing. This paper presents SIG-Droid, a framework for system testing of Android apps, backed with automated program analysis to extract app models and symbolic execution of source code guided by such models for obtaining test inputs that ensure covering each reachable branch in the program. SIG-Droid leverages two automatically extracted models: Interface Model and Behavior Model. The Interface Model is used to find values that an app can receive through its interfaces. Those values are then exchanged with symbolic values to deal with constraints with the help of a symbolic execution engine. The Behavior Model is used to drive the apps for symbolic execution and generate sequences of events. We provide an efficient implementation of SIG-Droid based in part on Symbolic PathFinder, extended in this work to support automatic testing of Android apps. Our experiments show SIG-Droid is able to achieve significantly higher code coverage than existing automated testing tools targeted for Android.
A recent report indicates that a newly developed mali- cious app for Android is introduced every 11 seconds. To combat this alarming rate of malware creation, we need a scalable malware detection approach that is effective and efficient. In this paper, we introduce SIGPID, a malware detection system based on permission analysis to cope with the rapid increase in the number of Android malware. In- stead of analyzing all 135 Android permissions, our ap- proach applies 3-level pruning by mining the permission data to identify only significant permissions that can be ef- fective in distinguishing benign and malicious apps. SIG- PID then utilizes classification algorithms to classify differ- ent families of malware and benign apps. Our evaluation finds that only 22 out of 135 permissions are significant. We then compare the performance of our approach, using only
22 permissions, against a baseline approach that analyzes all permissions. The results indicate that when Support Vec- tor Machine (SVM) is used as the classifier, we can achieve over 90% of precision, recall, accuracy, and F-measure, which are about the same as those produced by the base- line approach while incurring the analysis times that are 4 to 32 times smaller that those of using all 135 permissions. When we compare the detection effectiveness of SIGPID to those of other approaches, SIGPID can detect 93.62% of malware in the data set, and 91.4% unknown malware.
In a multiagent system, a (social) norm describes what the agents may expect from each other. Norms promote autonomy (an agent need not comply with a norm) and heterogeneity (a norm describes interactions at a high level independent of implementation details). Researchers have studied norm emergence through social learning where the agents interact repeatedly in a graph structure.
In contrast, we consider norm emergence in an open system, where membership can change, and where no predetermined graph structure exists. We propose Silk, a mechanism wherein a generator monitors interactions among member agents and recommends norms to help resolve conflicts. Each member decides on whether to accept or reject a recommended norm. Upon exiting the system, a member passes its experience along to incoming members of the same type. Thus, members develop norms in a hybrid manner to resolve conflicts.
We evaluate Silk via simulation in the traffic domain. Our results show that social norms promoting conflict resolution emerge in both moderate and selfish societies via our hybrid mechanism.
Concurrent programs are prone to various classes of difficult-to- detect faults, of which data races are particularly prevalent. Prior work has attempted to increase the cost-effectiveness of approaches for testing for data races by employing race detection techniques, but to date, no work has considered cost-effective approaches for re-testing for races as programs evolve. In this paper we present SIMRT, an automated regression testing framework for use in de- tecting races introduced by code modifications. SIMRT employs a regression test selection technique, focused on sets of program ele- ments related to race detection, to reduce the number of test cases that must be run on a changed program to detect races that occur due to code modifications, and it employs a test case prioritiza- tion technique to improve the rate at which such races are detected. Our empirical study of SIMRT reveals that it is more efficient and effective for revealing races than other approaches, and that its con- stituent test selection and prioritization components each contribute to its performance.
Concurrent programs are prone to various classes of difficult-to-detect faults, of which data races are particularly prevalent. Prior work has attempted to increase the cost-effectiveness of approaches for testing for data races by employing race detection techniques, but to date, no work has considered cost-effective approaches for re-testing for races as programs evolve. In this paper we present SimRT, an automated regression testing framework for use in detecting races introduced by code modifications. SimRT employs a regression test selection technique, focused on sets of program elements related to race detection, to reduce the number of test cases that must be run on a changed program to detect races that occur due to code modifications, and it employs a test case prioritization technique to improve the rate at which such races are detected. Our empirical study of SimRT reveals that it is more efficient and effective for revealing races than other approaches, and that its constituent test selection and prioritization components each contribute to its performance.
Parallel garbage collection has been used to speedup the collection process on multicore architectures. Similar to other parallel techniques, balancing the workload among threads is critical to ensuring good overall collection performance. To this end, work stealing is employed by the current stateof-the-art Java Virtual Machine, OpenJDK, to keep GC threads from idling during a collection process. However, we found that the current algorithm is not efficient. Its usage can often cause GC performance to be worse than when work stealing is not used. In this paper, we identify three factors that affect work stealing efficiency: determining tasks that can benefit from stealing, frequency with which to attempt stealing, and performance impacts of failed stealing attempts. Based on this analysis, we propose SmartStealing, a new algorithm that can automatically decide whether to attempt stealing at a particular point during execution. If stealing is attempted, it can efficiently identify a task to steal from. We then compare the collection performances when (i) the default work stealing algorithm is used, (ii) work stealing is not used at all, and (iii) the SmartStealing approach is used. Without modifying the remaining garbage collection system, the evaluation result shows that SmartStealing can reduce the parallel GC execution time for 19 of the 21 benchmarks. The average reduction is 50.4% and the highest reduction is 78.7%. We also investigate the performances of SmartStealing on NUMA and UMA architectures.
Decision makers need capabilities to quickly model and effectively assess consequences of actions and reactions in crisis de-escalation environments. The creation and what-if exercising of such models has traditionally had onerous resource requirements. This research demonstrates fast and viable ways to build such models in operational environments. Through social network extraction from texts, network analytics to identify key actors, and then simulation to assess alternative interventions, advisors can support practicing and execution of crisis de-escalation activities. We describe how we used this approach as part of a scenario-driven modeling effort. We demonstrate the strength of moving from data to models and the advantages of data-driven simulation, which allow for iterative refinement. We conclude with a discussion of the limitations of this approach and anticipated future work.
Certification schemes exist to regulate software systems and prevent them from being deployed before they are judged fit to use. However, practitioners are often unsatisfied with the efficiency of certification standards and processes. In this study, we analyzed two certification standards, Common Criteria and DO-178C, and collected insights from literature and from interviews with subject-matter experts to identify concepts affecting the efficiency of certification processes. Our results show that evaluation time, reusability of evaluation artifacts, and composition of systems and certified artifacts are barriers to achieve efficient certification.
The goal of this roadmap paper is to summarize the stateof-the-art and identify research challenges when developing, deploying and managing self-adaptive software systems. Instead of dealing with a wide range of topics associated with the field, we focus on four essential topics of self-adaptation: design space for self-adaptive solutions, software engineering processes for self-adaptive systems, from centralized to decentralized control, and practical run-time verification & validation for self-adaptive systems. For each topic, we present an overview, suggest future directions, and focus on selected challenges. This paper complements and extends a previous roadmap on software engineering for self-adaptive systems published in 2009 covering a different set of topics, and reflecting in part on the previous paper. This roadmap is one of the many results of the Dagstuhl Seminar 10431 on Software Engineering for Self-Adaptive Systems, which took place in October 2010.
Massively Open Online Courses (MOOCs) provide a unique opportunity to reach out to students who would not normally be reached by alleviating the need to be physically present in the classroom. However, teaching software security coursework outside of a classroom setting can be challenging. What are the challenges when converting security material from an on-campus course to the MOOC format? The goal of this research is to assist educators in constructing software security coursework by providing a comparison of classroom courses and MOOCs. In this work, we compare demographic information, student motivations, and student results from an on-campus software security course and a MOOC version of the same course. We found that the two populations of students differed, with the MOOC reaching a more diverse set of students than the on-campus course. We found that students in the on-campus course had higher quiz scores, on average, than students in the MOOC. Finally, we document our experience running the courses and what we would do differently to assist future educators constructing similar MOOC's.