Biblio

C
Cyrus Omar, Ian Voysey, Michael Hilton, Joshua Sunshine, Claire Le Goues, Jonathan Aldrich, Matthew Hammer.  2017.  Toward Semantic Foundations for Program Editors. 2nd Summit on Advances in Programming Languages (SNAPL 2017).

Programming language definitions assign formal meaning to complete programs. Programmers, however, spend a substantial amount of time interacting with incomplete programs -- programs with holes, type inconsistencies and binding inconsistencies -- using tools like program editors and live programming environments (which interleave editing and evaluation). Semanticists have done comparatively little to formally characterize (1) the static and dynamic semantics of incomplete programs; (2) the actions available to programmers as they edit and inspect incomplete programs; and (3) the behavior of editor services that suggest likely edit actions to the programmer based on semantic information extracted from the incomplete program being edited, and from programs that the system has encountered in the past. As such, each tool designer has largely been left to develop their own ad hoc heuristics. 
This paper serves as a vision statement for a research program that seeks to develop these "missing" semantic foundations. Our hope is that these contributions, which will take the form of a series of simple formal calculi equipped with a tractable metatheory, will guide the design of a variety of current and future interactive programming tools, much as various lambda calculi have guided modern language designs. Our own research will apply these principles in the design of Hazel, an experimental live lab notebook programming environment designed for data science tasks. We plan to co-design the Hazel language with the editor so that we can explore concepts such as edit-time semantic conflict resolution mechanisms and mechanisms that allow library providers to install library-specific editor services.

Cyrus Omar, Chenglong Wang, Jonathan Aldrich.  2015.  Composable and Hygienic Typed Syntax Macros. Symposium on Applied Computing (SAC).

Syntax extension mechanisms are powerful, but reasoning about syntax extensions can be difficult. Recent work on type-specific languages (TSLs) addressed reasoning about composition, hygiene and typing for extensions introducing new literal forms. We supplement TSLs with typed syntax macros (TSMs), which, unlike TSLs, are explicitly invoked to give meaning to delimited segments of arbitrary syntax. To maintain a typing discipline, we describe two flavors of term-level TSMs: synthetic TSMs specify the type of term that they generate, while analytic TSMs can generate terms of arbitrary type, but can only be used in positions where the type is otherwise known. At the level of types, we describe a third flavor of TSM that generates a type of a specified kind along with its TSL and show interesting use cases where the two mechanisms operate in concert.

Cyrus Omar, Ian Voysey, Michael Hilton, Jonathan Aldrich, Matthew Hammer.  2017.  Hazelnut: a bidirectionally typed structure editor calculus. POPL 2017 Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages.

Structure editors allow programmers to edit the tree structure of a program directly. This can have cognitive benefits, particularly for novice and end-user programmers. It also simplifies matters for tool designers, because they do not need to contend with malformed program text. This paper introduces Hazelnut, a structure editor based on a small bidirectionally typed lambda calculus extended with holes and a cursor. Hazelnut goes one step beyond syntactic well-formedness: its edit actions operate over statically meaningful incomplete terms. Naïvely, this would force the programmer to construct terms in a rigid “outside-in” manner. To avoid this problem, the action semantics automatically places terms assigned a type that is inconsistent with the expected type inside a hole. This meaningfully defers the type consistency check until the term inside the hole is finished. Hazelnut is not intended as an end-user tool itself. Instead, it serves as a foundational account of typed structure editing. To that end, we describe how Hazelnut’s rich metatheory, which we have mechanized using the Agda proof assistant, serves as a guide when we extend the calculus to include binary sum types. We also discuss various interpretations of holes, and in so doing reveal connections with gradual typing and contextual modal type theory, the Curry-Howard interpretation of contextual modal logic. Finally, we discuss how Hazelnut’s semantics lends itself to implementation as an event-based functional reactive program. Our simple reference implementation is written using js_of_ocaml. 
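
The deferral described in this abstract can be illustrated with a minimal Python sketch. It is not the Hazelnut calculus or its Agda mechanization: the names `Num`, `Arrow`, `Hole`, `NonEmptyHole`, `consistent` and `place` are hypothetical, and only the idea of placing a type-inconsistent term inside a hole is taken from the abstract.

```python
from dataclasses import dataclass

# Types: a number type, function types, and the unknown (hole) type.
@dataclass(frozen=True)
class Num:
    pass

@dataclass(frozen=True)
class Arrow:
    dom: object
    cod: object

@dataclass(frozen=True)
class Hole:
    pass

@dataclass(frozen=True)
class NonEmptyHole:
    """A hole wrapping a term whose type does not fit its position."""
    inner: object

def consistent(a, b):
    """Type consistency: the hole type is consistent with every type."""
    if isinstance(a, Hole) or isinstance(b, Hole):
        return True
    if isinstance(a, Arrow) and isinstance(b, Arrow):
        return consistent(a.dom, b.dom) and consistent(a.cod, b.cod)
    return a == b

def place(term, synthesized, expected):
    """Put `term` into a position that expects type `expected`.

    Hypothetical helper: if the synthesized type is inconsistent with the
    expected one, the term is wrapped in a hole instead of being rejected,
    which defers the consistency check until the term is finished.
    """
    if consistent(synthesized, expected):
        return term
    return NonEmptyHole(term)
```

Under these assumptions, `place("f", Arrow(Num(), Num()), Num())` returns `NonEmptyHole("f")`, while `place("x", Num(), Num())` returns `"x"` unchanged.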

Cyrus Omar, Benjamin Chung, Darya Kurilova, Alex Potanin, Jonathan Aldrich.  2013.  Type-directed, whitespace-delimited parsing for embedded DSLs. GlobalDSL '13 Proceedings of the First Workshop on the Globalization of Domain Specific Languages.

Domain-specific languages improve ease-of-use, expressiveness and verifiability, but defining and using different DSLs within a single application remains difficult. We introduce an approach for embedded DSLs where 1) whitespace delimits DSL-governed blocks, and 2) the parsing and type checking phases occur in tandem so that the expected type of the block determines which domain-specific parser governs that block. We argue that this approach occupies a sweet spot, providing high expressiveness and ease-of-use while maintaining safe composability. We introduce the design, provide examples and describe an ongoing implementation of this strategy in the Wyvern programming language. We also discuss how a more conventional keyword-directed strategy for parsing of DSLs can arise as a special case of this type-directed strategy. 

Cyrus Omar, Darya Kurilova, Ligia Nistor, Benjamin Chung, Alex Potanin, Jonathan Aldrich.  2014.  Safely Composable Type-Specific Languages. ECOOP 2014: Proceedings of the 28th European Conference on Object-Oriented Programming.

Programming languages often include specialized syntax for common datatypes (e.g. lists), and some also build in support for specific specialized datatypes (e.g. regular expressions), but user-defined types must use general-purpose syntax. Frustration with this causes developers to use strings, rather than structured data, with alarming frequency, leading to correctness, performance, security, and usability issues. Allowing library providers to modularly extend a language with new syntax could help address these issues. Unfortunately, prior mechanisms either limit expressiveness or are not safely composable: individually unambiguous extensions can still cause ambiguities when used together. We introduce type-specific languages (TSLs): logic associated with a type that determines how the bodies of generic literals, able to contain arbitrary syntax, are parsed and elaborated, hygienically. The TSL for a type is invoked only when a literal appears where a term of that type is expected, guaranteeing non-interference. We give evidence supporting the applicability of this approach and formally specify it with a bidirectionally typed elaboration semantics for the Wyvern programming language.
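
The type-directed dispatch described in this abstract might look roughly like the following Python sketch. This is not Wyvern syntax or the paper's elaboration semantics: `TSL_REGISTRY` and `elaborate_literal` are hypothetical names, and the two standard-library parsers merely stand in for library-provided TSLs.

```python
import re
from urllib.parse import urlsplit

# Hypothetical registry: each type name maps to the parser (its "TSL")
# that gives meaning to a generic literal body when that type is expected.
TSL_REGISTRY = {
    "Regex": re.compile,
    "URL": urlsplit,
}

def elaborate_literal(body, expected_type):
    """Parse a generic literal body with the TSL of the expected type.

    Only the parser registered for `expected_type` is ever consulted, so
    two independently defined parsers cannot make the literal ambiguous.
    """
    try:
        parser = TSL_REGISTRY[expected_type]
    except KeyError:
        raise TypeError(f"no TSL is associated with type {expected_type!r}") from None
    return parser(body)
```

For example, `elaborate_literal("a+b", "Regex")` yields a compiled pattern, whereas `elaborate_literal("http://example.com/a", "URL")` yields a split URL; the expected type alone decides which parser runs.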

Cyrus Omar, Jonathan Aldrich.  2016.  Programmable semantic fragments: the design and implementation of typy. GPCE 2016 Proceedings of the 2016 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences.

This paper introduces typy, a statically typed programming language embedded by reflection into Python. typy features a fragmentary semantics, i.e. it delegates semantic control over each term, drawn from Python's fixed concrete and abstract syntax, to some contextually relevant user-defined semantic fragment. The delegated fragment programmatically 1) typechecks the term (following a bidirectional protocol); and 2) assigns dynamic meaning to the term by computing a translation to Python.

We argue that this design is expressive with examples of fragments that express the static and dynamic semantics of 1) functional records; 2) labeled sums (with nested pattern matching a la ML); 3) a variation on JavaScript's prototypal object system; and 4) typed foreign interfaces to Python and OpenCL. These semantic structures are, or would need to be, defined primitively in conventionally structured languages.

We further argue that this design is compositionally well-behaved. It avoids the expression problem and the problems of grammar composition because the syntax is fixed. Moreover, programs are semantically stable under fragment composition (i.e. defining a new fragment will not change the meaning of existing program components).

Coblenz, Michael, Aldrich, Jonathan, Myers, Bradley, Sunshine, Joshua.  2014.  Considering Productivity Effects of Explicit Type Declarations. Workshop on Evaluation and Usability of Programming Languages and Tools (PLATEAU), 2014.

Static types may be used both by the language implementation and directly by the user as documentation. Though much existing work focuses primarily on the implications of static types on the semantics of programs, relatively little work considers the impact on usability that static types provide. Though the omission of static type information may decrease program length and thereby improve readability, it may also decrease readability because users must then frequently derive type information manually while reading programs. As type inference becomes more popular in languages that are in widespread use, it is important to consider whether the adoption of type inference may impact productivity of developers.

Claus Hunsen, Bo Zhang, Janet Siegmund, Christian Kästner, Olaf Lebenich, Martin Becker, Sven Apel.  2015.  Preprocessor-based variability in open-source and industrial software systems: An empirical study. Empirical Software Engineering. 20:1-34.

Almost every sufficiently complex software system today is configurable. Conditional compilation is a simple variability-implementation mechanism that is widely used in open-source projects and industry. Especially, the C preprocessor (CPP) is very popular in practice, but it is also gaining (again) interest in academia. Although there have been several attempts to understand and improve CPP, there is a lack of understanding of how it is used in open-source and industrial systems and whether different usage patterns have emerged. The background is that much research on configurable systems and product lines concentrates on open-source systems, simply because they are available for study in the first place. This leads to the potentially problematic situation that it is unclear whether the results obtained from these studies are transferable to industrial systems. We aim at lowering this gap by comparing the use of CPP in open-source projects and industry—especially from the embedded-systems domain—based on a substantial set of subject systems and well-known variability metrics, including size, scattering, and tangling metrics. A key result of our empirical study is that, regarding almost all aspects we studied, the analyzed open-source systems and the considered embedded systems from industry are similar regarding most metrics, including systems that have been developed in industry and made open source at some point. So, our study indicates that, regarding CPP as variability-implementation mechanism, insights, methods, and tools developed based on studies of open-source systems are transferable to industrial systems—at least, with respect to the metrics we considered.

Christopher Theisen, Brendan Murphy, Kim Herzig, Laurie Williams.  Submitted.  Risk-Based Attack Surface Approximation: How Much Data is Enough? International Conference on Software Engineering (ICSE) Software Engineering in Practice (SEIP) 2017.

Proactive security reviews and test efforts are a necessary component of the software development lifecycle. Resource limitations often preclude reviewing the entire code base. Making informed decisions on what code to review can improve a team’s ability to find and remove vulnerabilities. Risk-based attack surface approximation (RASA) is a technique that uses crash dump stack traces to predict what code may contain exploitable vulnerabilities. The goal of this research is to help software development teams prioritize security efforts by the efficient development of a risk-based attack surface approximation. We explore the use of RASA using Mozilla Firefox and Microsoft Windows stack traces from crash dumps. We create RASA at the file level for Firefox, in which the 15.8% of the files that were part of the approximation contained 73.6% of the vulnerabilities seen for the product. We also explore the effect of random sampling of crashes on the approximation, as it may be impractical for organizations to store and process every crash received. We find that 10-fold random sampling of crashes at a rate of 10% resulted in 3% fewer vulnerabilities identified than using the entire set of stack traces for Mozilla Firefox. Sampling crashes in Windows 8.1 at a rate of 40% resulted in insignificant differences in vulnerability and file coverage as compared to a rate of 100%.
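
A file-level approximation of this kind, and the effect of sampling crashes, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' tooling: each crash is assumed to be a list of file names taken from its stack trace, and `attack_surface`, `coverage` and `sampled_surface` are hypothetical helper names.

```python
import random

def attack_surface(stack_traces):
    """Approximate the attack surface as every file seen in any crash trace."""
    surface = set()
    for trace in stack_traces:
        surface.update(trace)
    return surface

def coverage(surface, all_files, vulnerable_files):
    """Return (share of all files on the surface, share of vulnerable files covered)."""
    file_share = len(surface & all_files) / len(all_files)
    vuln_share = len(surface & vulnerable_files) / len(vulnerable_files)
    return file_share, vuln_share

def sampled_surface(stack_traces, rate, seed=0):
    """Build the approximation from a random sample of the crashes."""
    rng = random.Random(seed)
    k = max(1, round(len(stack_traces) * rate))
    return attack_surface(rng.sample(stack_traces, k))
```

Comparing `coverage(sampled_surface(traces, 0.1), ...)` against `coverage(attack_surface(traces), ...)` over several seeds gives a rough picture of how much vulnerability coverage is lost at a given sampling rate.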

Christopher Bogart, Christian Kästner, James Herbsleb, Ferdian Thung.  2016.  How to break an API: cost negotiation and community values in three software ecosystems. FSE 2016 Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering.

Change introduces conflict into software ecosystems: breaking changes may ripple through the ecosystem and trigger rework for users of a package, but often developers can invest additional effort or accept opportunity costs to alleviate or delay downstream costs. We performed a multiple case study of three software ecosystems with different tooling and philosophies toward change, Eclipse, R/CRAN, and Node.js/npm, to understand how developers make decisions about change and change-related costs and what practices, tooling, and policies are used. We found that all three ecosystems differ substantially in their practices and expectations toward change and that those differences can be explained largely by different community values in each ecosystem. Our results illustrate that there is a large design space in how to build an ecosystem, its policies and its supporting infrastructure; and there is value in making community values and accepted tradeoffs explicit and transparent in order to resolve conflicts and negotiate change-related costs.

Christian Kästner, Jurgen Pfeffer.  2014.  Limiting Recertification in Highly Configurable Systems: Analyzing Interactions and Isolation among Configuration Options. HotSoS '14 Proceedings of the 2014 Symposium and Bootcamp on the Science of Security.

In highly configurable systems the configuration space is too big for (re-)certifying every configuration in isolation. In this project, we combine software analysis with network analysis to detect which configuration options interact and which have local effects. Instead of analyzing a system such as Linux and SELinux for every combination of configuration settings one by one (>10^2000 even considering compile-time configurations only), we analyze the effect of each configuration option once for the entire configuration space. The analysis will guide us to designs separating interacting configuration options in a core system and isolating orthogonal and less trusted configuration options from this core.
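
To give a sense of scale, assume the options are independent Boolean switches (a simplification the abstract does not state). Then n options yield 2^n configurations, and a space of more than 10^2000 configurations needs only a few thousand options. The helper name `min_boolean_options` below is hypothetical.

```python
import math

def min_boolean_options(exponent):
    """Smallest n with 2**n > 10**exponent, for independent Boolean options."""
    n = math.floor(exponent / math.log10(2)) + 1
    # Confirm with exact integer arithmetic rather than trusting floats.
    assert 2 ** n > 10 ** exponent >= 2 ** (n - 1)
    return n

print(min_boolean_options(2000))  # 6644
```

Under that assumption, analyzing each option once means on the order of thousands of analyses, whereas enumerating combinations one by one means more than 10^2000 of them.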

Christian Kästner, Jurgen Pfeffer.  2014.  Analyzing Interactions and Isolation among Configuration Options. HotSoS '14 Proceedings of the 2014 Symposium and Bootcamp on the Science of Security.

In highly configurable systems the configuration space is too big for (re-)certifying every configuration in isolation. In this project, we combine software analysis with network analysis to detect which configuration options interact and which have local effects. Instead of analyzing a system such as Linux and SELinux for every combination of configuration settings one by one (>10^2000 even considering compile-time configurations only), we analyze the effect of each configuration option once for the entire configuration space. The analysis will guide us to designs separating interacting configuration options in a core system and isolating orthogonal and less trusted configuration options from this core.

Casey Canfield, Alex Davis, Baruch Fischhoff, Alain Forget, Sarah Pearman, Jeremy Thomas.  2017.  Replication: Challenges in Using Data Logs to Validate Phishing Detection Ability Metrics. 13th Symposium on Usable Privacy and Security (SOUPS).

The Security Behavior Observatory (SBO) is a longitudinal field-study of computer security habits that provides a novel dataset for validating computer security metrics. This paper demonstrates a new strategy for validating phishing detection ability metrics by comparing performance on a phishing signal detection task with data logs found in the SBO. We report: (1) a test of the robustness of performance on the signal detection task by replicating Canfield, Fischhoff and Davis (2016), (2) an assessment of the task's construct validity, and (3) evaluation of its predictive validity using data logs. We find that members of the SBO sample had similar signal detection ability compared to members of the previous mTurk sample and that performance on the task correlated with the Security Behavior Intentions Scale (SeBIS). However, there was no evidence of predictive validity, as the signal detection task performance was unrelated to computer security outcomes in the SBO, including the presence of malicious URLs, malware, and malicious files. We discuss the implications of these findings and the challenges of comparing behavior on structured experimental tasks to behavior in complex real-world settings.
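
For readers unfamiliar with signal detection measures, the following Python sketch computes the textbook equal-variance sensitivity d' and criterion c from a hit rate and a false-alarm rate. Whether the paper uses exactly these estimators is an assumption; the function names and the example rates are hypothetical and are not data from the study.

```python
from statistics import NormalDist

_Z = NormalDist().inv_cdf  # inverse of the standard normal CDF

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity: how well phishing is told apart from legitimate email."""
    return _Z(hit_rate) - _Z(false_alarm_rate)

def criterion(hit_rate, false_alarm_rate):
    """Response bias: positive values mean a reluctance to say 'phishing'."""
    return -0.5 * (_Z(hit_rate) + _Z(false_alarm_rate))

# Hypothetical participant: flags 80% of phishing emails (hits) and
# 20% of legitimate emails (false alarms).
print(round(d_prime(0.8, 0.2), 2))
```

Rates of exactly 0 or 1 have no finite z-score, so `inv_cdf` raises an error for them; in practice such rates are usually adjusted before these formulas are applied.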

C. Theisen, L. Williams, K. Oliver, E. Murphy-Hill.  2016.  Software Security Education at Scale. 2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C). :346-355.

Massively Open Online Courses (MOOCs) provide a unique opportunity to reach out to students who would not normally be reached by alleviating the need to be physically present in the classroom. However, teaching software security coursework outside of a classroom setting can be challenging. What are the challenges when converting security material from an on-campus course to the MOOC format? The goal of this research is to assist educators in constructing software security coursework by providing a comparison of classroom courses and MOOCs. In this work, we compare demographic information, student motivations, and student results from an on-campus software security course and a MOOC version of the same course. We found that the two populations of students differed, with the MOOC reaching a more diverse set of students than the on-campus course. We found that students in the on-campus course had higher quiz scores, on average, than students in the MOOC. Finally, we document our experience running the courses and what we would do differently to assist future educators constructing similar MOOCs.

C. Theisen, K. Herzig, B. Murphy, L. Williams.  2017.  Risk-based attack surface approximation: how much data is enough? 2017 IEEE/ACM 39th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP). :273-282.

Proactive security reviews and test efforts are a necessary component of the software development lifecycle. Resource limitations often preclude reviewing the entire code base. Making informed decisions on what code to review can improve a team's ability to find and remove vulnerabilities. Risk-based attack surface approximation (RASA) is a technique that uses crash dump stack traces to predict what code may contain exploitable vulnerabilities. The goal of this research is to help software development teams prioritize security efforts by the efficient development of a risk-based attack surface approximation. We explore the use of RASA using Mozilla Firefox and Microsoft Windows stack traces from crash dumps. We create RASA at the file level for Firefox, in which the 15.8% of the files that were part of the approximation contained 73.6% of the vulnerabilities seen for the product. We also explore the effect of random sampling of crashes on the approximation, as it may be impractical for organizations to store and process every crash received. We find that 10-fold random sampling of crashes at a rate of 10% resulted in 3% fewer vulnerabilities identified than using the entire set of stack traces for Mozilla Firefox. Sampling crashes in Windows 8.1 at a rate of 40% resulted in insignificant differences in vulnerability and file coverage as compared to a rate of 100%.