Spotlight on Lablet Research

Spotlight on Lablet Research #1 - Analytics for Cyber-Physical Systems Cybersecurity

Project: Analytics for Cyber-Physical Systems Cybersecurity

Lablet: Vanderbilt University
Participating Sub-Lablet: Massachusetts Institute of Technology

Mounting concerns about safety and security have produced an intricate ecosystem of guidelines, compliance measures, directives, and policy reports for the cybersecurity of critical infrastructure. Because such guidelines and policies are written as linear, sequential text, they are difficult to integrate and obscure the policy-technology-security interactions, which limits their relevance for the science of security. The challenge is to develop a structured system model from text-based policy guidelines and directives in order to identify major policy-defined, system-wide parameters, situate vulnerabilities, map security requirements to security objectives, and advance research on how multiple system features respond to diverse policy controls, thereby strengthening the security fundamentals of cyber-physical systems.

Project research draws on major reports presented by the National Institute of Standards and Technology (NIST) as the source of the data. While some efforts have already been made to mine NIST materials, few exploit the value of multi-methods for knowledge mining and analytical tools to support user understanding, analysis, and eventually action. The team's approach learns from, and transcends, these efforts by developing a platform for multi-methods cybersecurity analytics based entirely on the contents of policy documents. The case application focuses on cybersecurity of the smart grid for electric power systems.

The overarching purpose of this project is to support the national strategy for cybersecurity, as outlined in Presidential Executive Orders (EXORD) and the National Defense Authorization Acts (NDAAs). Operationally, the goal is to develop analytics for cybersecurity policies and guidelines targeted specifically to (a) overcome the limitations of text-based guidelines, (b) extract the knowledge embedded in policy guidelines, and (c) assist the user community, analysts, and operators in implementation. Another goal is to construct new tools that are applicable to policy directives, regulations, and guidelines for diverse issue areas. The tools will enable users to explore mission-related properties, concerns, or contingencies. The Cybersecurity Framework (CSF) is mandatory in the public sector and strongly encouraged for the private sector, but it provides only general guidance and directives of a broadly defined nature. Mission-specific application is left to the user, who must proceed as best determined with only the CSF's general guidance.

Led by Principal Investigator Nazli Choucri, MIT, the research team has aligned the project vision and mission to National Cybersecurity Policy and identified the policy-relevant ecosystem. In focusing on national cybersecurity policies for securing Cyber-Physical Systems (CPS), the researchers identified core policy documents for smart grid CPS and the research design. They extracted data and created a Dependency Structure Matrix (DSM) of the "as-is" smart grid NIST reference model. They also completed the design and operational strategy of the data extraction and linkage method, including identifying and extracting the value added by policy documents and guidelines and developing the process for moving from "policy-as-text" to "text-as-data" in constructing the Platform for Cyber Analytics.

Rules and methods developed for extracting data from key documents and creating the linked database allowed the team to build initial exploratory tools for analyzing system information and a core DSM of the CPS system that identifies first-level information dependencies. The dependency matrix will be examined closely, validated, and transformed as needed into clusters and partitions of structure and process in order to explore properties that reveal interconnections and "hidden features". It is also the basis upon which additional policy imperatives, also in text form, will later be incorporated into expanded DSM forms.
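
To make the DSM idea concrete, the sketch below builds a small boolean dependency matrix from extracted "depends-on" pairs and uses a transitive closure to surface indirect dependencies of the kind the expanded DSM analysis is meant to reveal. The element names and dependency pairs are invented for illustration and are not drawn from the NIST documents.

    # Hypothetical sketch: build a small Dependency Structure Matrix (DSM) from
    # extracted "element A depends on element B" pairs and expose indirect links.
    import numpy as np

    elements = ["access-control", "key-management", "audit-logging", "incident-response"]
    pairs = [("access-control", "key-management"),      # invented extraction output
             ("audit-logging", "access-control"),
             ("incident-response", "audit-logging")]

    idx = {name: i for i, name in enumerate(elements)}
    dsm = np.zeros((len(elements), len(elements)), dtype=bool)
    for src, dst in pairs:
        dsm[idx[src], idx[dst]] = True                   # direct, first-level dependency

    # Transitive closure (Warshall) reveals indirect dependencies hidden in the text.
    reach = dsm.copy()
    for k in range(len(elements)):
        reach |= reach[:, k:k + 1] & reach[k:k + 1, :]

    for i, src in enumerate(elements):
        for j, dst in enumerate(elements):
            if reach[i, j] and not dsm[i, j]:
                print(f"indirect: {src} -> {dst}")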

Additional details on the project can be found here.

Spotlight on Lablet Research #2 - Automated Synthesis Framework for Network Security and Resilience

Project: Automated Synthesis Framework for Network Security and Resilience

Lablet: University of Illinois at Urbana-Champaign
Participating Sub-Lablet: Illinois Institute of Technology

This project proposes to develop the analysis methodology needed to support scientific reasoning about the resilience and security of networks, with a particular focus on network control and information/data flow. The core of this vision is an Automated Synthesis Framework (ASF), which will automatically derive network state and repairs from a set of specified correctness requirements and security policies. ASF consists of a set of techniques for performing and integrating security and resilience analyses applied at different layers (i.e., data forwarding, network control, programming language, and application software) in a real-time and automated fashion. The ASF approach is exciting because developing it adds to the theoretical underpinnings of SoS, while using it supports the practice of SoS.

This project is led by Principal Investigator (PI) Matt Caesar and Co-PI Dong Jin. As has been the case since the start of the project, the technology developed as a result of the research is transferred to industry through interactions with Veriflow, a startup company commercializing verification technology that came out of the project's SoS Lablet funding. In September 2019, Veriflow was sold to VMware, and the technology is slated to be incorporated into VMware NSX, a very widely used virtualization platform in industry. Collaborations with VMware continue by incorporating the project platform with NSX, targeting deployment of verification technology to distributed cloud environments, and integrating real-time traffic data into the analysis.

In continuing the researchers' investigation of automated synthesis of network control to preserve desired security policies and network invariants, they have designed a list of approximately 30 important and useful invariants to showcase the functionality of the system, as well as to test it in practical use. They have completed the development of parsing infrastructure and performed a large-scale evaluation of the approach on real network data.
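
The individual invariants are detailed in the project's publications; as a rough illustration of what checking a single data-plane invariant involves, the sketch below verifies loop-freedom and reachability over a snapshot of forwarding state. The topology and forwarding rules are made up for this example.

    # Illustrative sketch: check two simple data-plane invariants on a snapshot of
    # forwarding state -- (1) no forwarding loops, (2) a host pair stays reachable.

    # Hypothetical forwarding behavior: next hop per (device, destination prefix).
    next_hop = {
        ("r1", "10.0.2.0/24"): "r2",
        ("r2", "10.0.2.0/24"): "r3",
        ("r3", "10.0.2.0/24"): "h2",   # h2 is the destination host
    }

    def trace(src, prefix):
        """Follow next hops for a prefix; return (path, looped?)."""
        path, node, seen = [src], src, {src}
        while (node, prefix) in next_hop:
            node = next_hop[(node, prefix)]
            path.append(node)
            if node in seen:
                return path, True          # loop invariant violated
            seen.add(node)
        return path, False

    path, looped = trace("r1", "10.0.2.0/24")
    assert not looped, "forwarding loop detected"
    assert path[-1] == "h2", "reachability invariant violated"
    print(" -> ".join(path))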

They are developing a simulation/emulation-based platform for Cyber-Physical System (CPS) resilience and security evaluation. The platform combines physical computing and networking hardware for the cyber components while allowing offline simulation and computation of the physical world. It includes a distributed virtual time kernel module for efficient synchronization between simulation and network emulation/hardware; the module enables processes and their clocks to be paused, resumed, and dilated across embedded Linux devices through the use of hardware interrupts.
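
The kernel module itself operates below the OS scheduler; the toy user-space sketch below is not the project's implementation and only illustrates the underlying idea of pausing, resuming, and dilating a process clock by a time dilation factor.

    # Toy illustration of clock dilation: virtual time advances 1/TDF as fast as
    # wall-clock time while a process is "running", and not at all while paused.
    import time

    class VirtualClock:
        def __init__(self, tdf=4.0):            # time dilation factor (assumed value)
            self.tdf = tdf
            self.virtual = 0.0
            self.last = None                    # wall-clock time of last resume

        def resume(self):
            self.last = time.monotonic()

        def pause(self):
            if self.last is not None:
                self.virtual += (time.monotonic() - self.last) / self.tdf
                self.last = None

        def now(self):
            extra = (time.monotonic() - self.last) / self.tdf if self.last is not None else 0.0
            return self.virtual + extra

    clock = VirtualClock(tdf=4.0)
    clock.resume()
    time.sleep(0.2)                             # 0.2 s of wall time ...
    clock.pause()
    print(f"virtual elapsed: {clock.now():.3f} s")   # ... ~0.05 s of virtual time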

The team is also exploring self-healing network management to address the resilient architecture hard problem, applying the methods to cyber-physical energy systems, in particular phasor measurement unit (PMU) networks in the electric grid. They developed an efficient algorithm that considers both power system observability and communication network characteristics; the algorithm assigns weights to the end devices according to selected power system metrics before performing the optimization. A proof-of-concept system was evaluated using the IEEE 14-, 30-, and 118-bus systems.
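
The team's algorithm is not reproduced here; the sketch below is only a simplified greedy illustration, with an invented bus topology and invented metric weights, of placing PMUs so that every bus is observed while preferring buses with higher power-system weights.

    # Toy sketch (not the team's algorithm): greedy PMU placement that covers all
    # buses, preferring candidate buses with higher assumed power-system weights.
    adjacency = {1: {2, 5}, 2: {1, 3, 4, 5}, 3: {2, 4}, 4: {2, 3, 5}, 5: {1, 2, 4}}
    weight = {1: 0.4, 2: 1.0, 3: 0.3, 4: 0.8, 5: 0.6}   # hypothetical metric weights

    def observed(bus):
        return {bus} | adjacency[bus]           # a PMU sees its own bus plus neighbors

    uncovered, placement = set(adjacency), []
    while uncovered:
        # pick the bus whose PMU covers the most uncovered buses, weighted by metric
        best = max(adjacency, key=lambda b: weight[b] * len(observed(b) & uncovered))
        placement.append(best)
        uncovered -= observed(best)

    print("PMU buses:", placement)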

Other efforts with industry include a collaboration with AT&T to customize and deploy the technology in their environments, and with Boeing on constructing a resilient IoT platform for the battlefield.

Project research has passed extensive peer review, and researchers have published over fifteen works at top conferences. In addition, they have regularly released software packages and data sets used in the publications, which have been used by a number of other researchers.

Additional details on this project can be found here.

Spotlight on Lablet Research #3 - Predicting the Difficulty of Compromise through How Attackers Discover Vulnerabilities

Project: Predicting the Difficulty of Compromise through How Attackers Discover Vulnerabilities

Lablet: North Carolina State University
Participating Sub-Lablet: Rochester Institute of Technology

The goal of this project is to provide actionable feedback on the discoverability of a vulnerability. This feedback is useful for in-process software risk assessment, incident response, and the vulnerabilities equities process. The approach is to combine the attack surface metaphor with attacker behavior to estimate how attackers will approach discovering a vulnerability. The researchers aim to develop metrics that are useful and to improve the metric formulation based on qualitative and quantitative feedback.

Led by Principal Investigator (PI) Andy Meneely and Co-PI Laurie Williams, this project focuses on the attack surface, based on the notion that pathways into the system enable attackers to discover vulnerabilities. This knowledge is important to software developers, architects, system administrators, and users. A literature review to classify attack surface definitions led to six clusters of definitions that differ significantly (methods, avenues, flows, features, barriers, and vulnerabilities). The methodology used to discover the attack surface (mining stack traces from thousands of crash reports), together with what the attack surface means in the context of metric actionability, will inform evolving the models for a risky walk and deploying a human-in-the-loop study. Attacker behavior data is gathered from the 2018 and 2019 National Collegiate Penetration Testing Competition (CPTC).

The national collegiate penetration testing competition allowed researchers to collect a massive data set. Nine teams from schools across the United States performed coordinated sets of network and device penetration attempts during the 2018 CPTC. During the ten-hour attack window, the teams generated an aggregate of more than 300GB of alert logs across duplicate networks consisting of 252 virtual machines in total. Researchers captured and made available full images of 99 virtual machines (as .vmdk) and approximately 200GB of alert log data (as JSON) for the six teams that consented to have their data published in this academic research. The inclusion of virtual machine instances along with network alert data is a novel contribution to the research community.

This CPTC data set enables the research team to provide a fine-grained history of vulnerability discovery and exploitation. With this data, they can enrich their models of the attack surface, which will, in turn, lead to more robust metrics of difficulty to compromise. Because the data come from a competition in which teams were assigned the same systems and evaluated on their attacks, difficulty-to-compromise estimates can be derived by correlating competition data with the alert and virtual machine data.

In completing the collection and analysis of the CPTC data set, project researchers manually annotated and curated the data, logging over 400 events and 79 reported vulnerabilities. They classified each event within the MITRE ATT&CK framework, with reviewer agreement kappa scores over 80%, and have begun their attacker behavior model analysis. As part of their submission to the ESEM 2019 New Ideas and Emerging Results (NIER) track, they honed their annotation process by curating one team's entire timeline; then, to scale up the approach, they hired a CPTC-trained competitor to capture the timelines and map the events to MITRE ATT&CK. The resulting data set is a fine-grained log of events that maps the techniques attackers used to break into systems in the competition environment; a single team, for example, had 47 relevant events and approximately 17 vulnerabilities found, filtered from millions of events in the Splunk monitoring system. The next step for this data set will be to create a probabilistic model that estimates the probability of discovery for a given vulnerability based on attacker behavior.
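
As a small illustration of how reviewer agreement on ATT&CK annotations can be quantified, the sketch below computes Cohen's kappa by comparing observed agreement with chance agreement; the events and technique labels are invented for this example.

    # Minimal sketch: Cohen's kappa for two annotators mapping the same events to
    # MITRE ATT&CK techniques (labels below are made up for illustration).
    from collections import Counter

    annotator_a = ["T1190", "T1078", "T1021", "T1190", "T1059", "T1078"]
    annotator_b = ["T1190", "T1078", "T1190", "T1190", "T1059", "T1078"]

    n = len(annotator_a)
    observed = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n

    # Expected agreement by chance, from each annotator's label distribution.
    pa, pb = Counter(annotator_a), Counter(annotator_b)
    expected = sum((pa[t] / n) * (pb[t] / n) for t in set(pa) | set(pb))

    kappa = (observed - expected) / (1 - expected)
    print(f"observed={observed:.2f} expected={expected:.2f} kappa={kappa:.2f}")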

The researchers also collected data from the 2019 CPTC competition: over 7TB of logs including command histories, process information, network metadata, and a variety of other fine-grained measurements from the Splunk incident response system. Their techniques for gathering data and creating timelines were optimized ahead of this competition through research assistants volunteering on the competition's monitoring team; the researchers created example infrastructures and tested monitoring tools to ensure reliable data collection. With this data, they can re-create the timelines and deliver them to the research community faster.

Defects in Infrastructure as Code (IaC) scripts can have serious consequences for organizations that adopt DevOps. While developing IaC scripts, practitioners may inadvertently introduce security smells: recurring coding patterns that are indicative of security weakness and can potentially lead to security breaches. The goal of this work is to help practitioners avoid insecure coding practices in IaC scripts through an empirical study of security smells. The team expanded the scale and depth of previously reported security smell work by increasing the number of languages studied to three: Ansible, Chef, and Puppet (prior reported results were for the Puppet language only). They identified nine security smells for IaC scripts.
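
The full smell catalogue is defined in the team's publications; the sketch below only illustrates how one such smell, a hard-coded secret, might be flagged in script text. The detection patterns and the example snippet are hypothetical.

    # Rough sketch of flagging one IaC security smell -- hard-coded secrets -- in
    # script text. Patterns and the example snippet are illustrative only.
    import re

    SECRET_KEYS = re.compile(
        r"(password|passwd|secret|api_key|token)\s*(?:=>|[:=])\s*['\"]?([^'\"\s]+)",
        re.IGNORECASE)

    def find_hardcoded_secrets(script_text):
        smells = []
        for lineno, line in enumerate(script_text.splitlines(), start=1):
            m = SECRET_KEYS.search(line)
            if m and not m.group(2).startswith(("{{", "$")):   # skip templated values
                smells.append((lineno, line.strip()))
        return smells

    example = """\
    user { 'deploy':
      password => 'sup3rs3cret',
    }
    """
    for lineno, line in find_hardcoded_secrets(example):
        print(f"line {lineno}: possible hard-coded secret: {line}")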

Security testers often perform a series of operations to exploit vulnerabilities in software source code. Project researchers are conducting a study to aid testers in exploiting vulnerabilities by synthesizing attack mechanisms used against open-source software. By characterizing these operations, they will identify patterns of tools and techniques used to exploit a vulnerability. This information will help security testers detect vulnerabilities before code changes are integrated and deployed. They are currently investigating security reports archived in open-source software bug reports, such as the Bugzilla reports hosted by Mozilla, to characterize what steps are executed to exploit a vulnerability. They are using the Structured Threat Information eXpression (STIX) format to record and standardize the steps needed to exploit a vulnerability.

The work done under this project will lead to a new set of metrics for measuring exploitability using the attack surface. These metrics are based on the behavior of penetration testers observed in a competitive environment. The intrusion detection data collected from the CPTC provide detailed timelines of how attackers find, exploit, and pivot through vulnerabilities. By studying how attackers work with the known attack surface, researchers will develop metrics that show which vulnerabilities are at the highest risk based on the current deployment.

Additional details on this project can be found here.

Spotlight on Lablet Research #4 - Characterizing User Behavior and Anticipating its Effects on Computer Security with a Security Behavior Observatory

Project: Characterizing User Behavior and Anticipating its Effects on Computer Security with a Security Behavior Observatory

Lablet: Carnegie Mellon University

This research aims to characterize users' choices of password tools and what influence the tools have on password practices. The results of this study will contribute to finding usable solutions for managing authentication.

Technically secure systems may still be exploited if users behave in unsafe ways. Most studies of user behavior are in controlled laboratory settings or in large-scale between-subjects measurements in the field. Both methods have shortcomings: lab experiments are not in natural environments and therefore may not accurately capture real-world behaviors (i.e., low ecological validity), whereas large-scale measurement studies do not allow the researchers to probe user intent or gather explanatory data for observed behaviors, and they offer limited control for confounding factors.

The CMU research team, led by Principal Investigator (PI) Lorrie Cranor and Co-PI Nicolas Christin, uses a multi-purpose observational resource, the Security Behavior Observatory (SBO), which was developed to collect data from Windows home computers. The SBO collects a wide array of system, network, and browser data from over 500 home Windows computer users (who participate as human subjects), and this data can be used to investigate a number of aspects of computer security that are especially affected by the hard problem of understanding and accounting for human behavior. Password authentication is one aspect of computer security that is heavily affected by human behavior, since human tendencies and capabilities tend to be directly at odds with what are considered the most secure password practices.

This team has previously used data from the SBO to investigate password practices, including the prevalence of password reuse and the potential effects of password management tools on password habits, and this year they published a follow-up study that used interviews to understand why users are choosing various existing password tools and why those tools are or are not leading to more secure password practices. The team is also conducting ongoing work on a number of related research questions, including investigating how people learn about online data breaches and actions people take in the aftermath of those breaches.

To follow up on previous findings that suggested that people in the SBO sample using password managers did not necessarily have stronger passwords or decreased password reuse, the team conducted interviews with a separate sample of 30 participants to better understand password habits and choices of password management tools. These results were published at the Symposium on Usable Privacy and Security (SOUPS). The results suggested that users of built-in password managers may have different underlying motivations for using password tools (i.e., mostly focused on convenience) and may thus use those tools to aid their insecure password habits, whereas people using separately installed password managers seem to be more motivated to prioritize security.

The team is also conducting ongoing research using the SBO dataset to study how people learn about breaches online and the actions people take in the aftermath of breaches. They are studying to what extent participants come across breach information in their browsing, and what aspects of people or their browsing increase their likelihood of reading about breach information. They are also studying the methods by which people come across breach-related pages and the characteristics of those methods and pages; what influences people to learn more about breaches or take action (either in the form of more browsing or changing passwords); how often people actually change their passwords in the aftermath of a breach; and how constructive those changes are.

The Security Behavior Observatory addresses the hard problem of "Understanding and Accounting for Human Behavior" by collecting data directly from people's own home computers, thereby capturing people's computing behavior "in the wild." This data is the closest to the ground truth of the users' everyday security and privacy challenges that the research community has ever collected. The insights discovered by analyzing this data will profoundly impact multiple research domains, including but not limited to behavioral sciences, computer security and privacy, economics, and human-computer interaction.

The SBO dataset has been used in multiple related projects since its inception and will continue to provide researchers across the community with users' actual practices in a variety of security and privacy applications.

Additional details on this project can be found here.

Spotlight on Lablet Research #5 - Side-Channel Attack Resistance

Project: Side-Channel Attack Resistance

Lablet: University of Kansas

Cyber-Physical Systems (CPS)--cars, airplanes, power plants, etc.--are increasingly dependent on powerful and complex hardware for higher intelligence and functionalities. However, this complex hardware may also introduce new attack vectors--hardware side-channels--which can be exploited by attackers to steal sensitive information, to disrupt the timing of time-critical functions that interact with the physical plants, or to break memory protection mechanisms in modern computers. Because these attacks target hardware, even logically safe and secure software, such as a formally verified OS, could still be vulnerable. Given the safety-critical nature of CPS, hardware side-channels should be thoroughly analyzed and prevented in CPS. This project focuses on micro-architectural side-channels in embedded multicore computing hardware and aims to develop fundamental OS and architecture designs that minimize or completely eliminate the possibility of hardware-level side-channel attacks. Led by Principal Investigator Heechul Yun, researchers are seeking to fundamentally reduce or completely eradicate these micro-architectural side-channels by introducing new OS abstractions and minimally modifying the micro-architecture and OS. Successful completion of this project will result in empirical studies on micro-architectural side-channels in safety-critical CPS and in criticality-aware OS and architecture prototypes for side-channel attack resistant CPS.

In a paper entitled "Denial-of-Service Attacks on Shared Cache in Multicore: Analysis and Prevention," the research team demonstrated the feasibility and severity of micro-architectural DoS attacks on shared caches in widely used contemporary COTS multicore processors. The paper won an Outstanding Paper Award at IEEE RTAS 2019, and the code was released to reproduce the results.

Researchers developed a comprehensive OS-level scheduling framework, RT-Gang, to mitigate timing-related micro-architectural DoS attacks on multicore platforms. The work was also published at IEEE RTAS 2019, and the code was released as open-source. They are currently extending the capability of the RT-Gang framework to improve real-time schedulability and isolation guarantees.

Researchers successfully integrated a quad-core RISC-V SoC and an NVDLA DNN accelerator on the Amazon FPGA cloud environment, and plan to use this platform for micro-architectural side-channel research in the future. The integration and some preliminary results were published at the EMC^2 workshop. The integrated RISC-V SoC testbed, called FireSim-NVDLA, was released as open-source and has received significant attention from industry practitioners and academic researchers. For example, Nvidia revealed in its official developer blog that it uses FireSim-NVDLA to evaluate the software release for its open-source deep neural network hardware accelerator, the Nvidia Deep Learning Accelerator (NVDLA).

The research team also successfully developed SpectreGuard, a software/hardware collaborative defense mechanism against Spectre attacks. The work leverages software-provided information to mitigate Spectre attacks at low hardware and performance cost. The work was published at ACM/IEEE DAC 2019, and the code was released as open-source.

The KU researchers have successfully developed a small hardware unit, the Bandwidth Regulation Unit (BRU), that regulates memory traffic at the on-chip interconnect level within a RISC-V multicore. The work was accepted for publication at IEEE RTAS 2020, and the research team is currently preparing the camera-ready version of the paper and an open-source release of the BRU.

The team's work in developing a series of resilient OS and hardware architecture prototypes--by extending open-source OS (Linux) and hardware (RISC-V SoC)--that can defend against micro-architectural attacks with minimal impacts on performance has real potential to influence the broader computer industry.

Additional details on the project can be found here.

Spotlight on Lablet Research #6 - Contextual Integrity for Computer Systems 

Project: Contextual Integrity for Computer Systems

Lablet: International Computer Science Institute
Participating Sub-Lablet: Cornell Tech

The overall goal of this research is to convert the philosophical theory of Contextual Integrity (CI) into terms computer scientists can use. Philosophers and computer scientists have different understandings of context, with philosophers focusing on abstract spheres of life and computer scientists focusing on the concrete. The goal is to develop models of context and contextual integrity that meet computer scientists on their own terms.

Relevant research questions include accounting for privacy in the design of multi-use computer systems that cut across contexts; modeling the adaptation of contexts to changes in technologies; and determining how contextual integrity relates to differential privacy. The current organizing hypothesis is that contexts are defined by a purpose, that the privacy norms of a context promote that purpose, and that purpose restrictions are ubiquitous. There are several possible models, including game models, Markov decision process (MDP) models, partially observable Markov decision process models, and multi-agent influence diagrams. Some of the challenges are that contexts don't exist in a vacuum, contexts might be in competition, privacy is multifaceted, and people often disagree. Potential outcomes are progress on defining privacy, furthering accountability for big data systems that cut across contexts, and enabling policy-governed privacy with respect to collaboration.

The research, led by Principal Investigator (PI) Michael Tschantz and Co-PI Helen Nissenbaum, seeks to create a formal representation of the contexts found in Contextual Integrity. Prior work has shown that the term "context" has been interpreted in a wide range of ways. The representation produced will serve as a reference model not just for comparing different interpretations but also for expressing what Helen Nissenbaum, the creator of Contextual Integrity, sees as the precise form of contexts in her theory. The representation will also serve as a starting point for adapting Contextual Integrity to the changing needs of computer science. The current focus is on how a context can be formed by smaller "sub-contexts" composing together. The working hypothesis is that the "values" of a sub-context may come from the purpose of the super-context.

The research team has started to translate the concept of context into a formal representation and has been working on a model of context similar to an MDP. After finding that this model was not flexible enough to accommodate disagreements about the state of a system, the researchers developed a new, more flexible model by adding a layer of interpreted predicates between norms and states. The new model also includes the possibility of values coming from a global context, to model deontological approaches to ethics.

The work surfaced the fact that the notion of norms found in the theory of Contextual Integrity is difficult to pin down precisely, and the researchers will look more closely at the process that legitimates norms in hopes of providing an operational definition for at least the legitimate ones. Their work on sub-contexts has found that a similar concept of sub-goals may be conceptually primary, and they are working toward a model of this prerequisite.

The CI framework is being used to abstract real-world communication exchanges into formally defined information flows where privacy policies describe sequences of admissible flows. CI enables decoupling (1) the syntactic extraction of flows from information exchanges; and (2) the enforcement of privacy policies on these flows. As an alternative to predominant approaches to privacy, which were ineffective against novel information practices enabled by IT, CI was able both to pinpoint sources of disruption and provide grounds for either accepting or rejecting them. Growing challenges from a burgeoning array of networked, sensor-enabled devices (IoT) and data-ravenous machine learning systems, similar in form though magnified in scope, call for renewed attention to theory.
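
As a minimal illustration of this decoupling, a flow can be represented by the five CI parameters and checked against a list of admissible norms. The context, roles, and norms below are invented and do not reflect the project's formalization.

    # Minimal sketch of a Contextual Integrity (CI) flow check: a flow is a
    # five-parameter tuple, and a policy is a set of admissible norms.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Flow:
        sender: str
        recipient: str
        subject: str
        info_type: str
        transmission_principle: str

    ANY = "*"

    def matches(norm, flow):
        return all(getattr(norm, f) in (ANY, getattr(flow, f))
                   for f in ("sender", "recipient", "subject", "info_type",
                             "transmission_principle"))

    norms = [  # admissible flows in a (hypothetical) healthcare context
        Flow("patient", "physician", "patient", "medical-history", "with-consent"),
        Flow("physician", "specialist", "patient", "medical-history", "for-treatment"),
    ]

    flow = Flow("physician", "advertiser", "patient", "medical-history", "for-profit")
    print("admissible" if any(matches(n, flow) for n in norms) else "potential CI violation")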

Additional details on the project can be found here.

Spotlight on Lablet Research #7 - Foundations of Cyber-Physical Systems Resilience

Project: Foundations of Cyber-Physical Systems Resilience

Lablet: Vanderbilt University

The goal of this project is to develop the principles and methods for designing and analyzing resilient Cyber-Physical Systems (CPS) architectures that deliver required service in the face of compromised components. A fundamental challenge is understanding the basic tenets of CPS resilience and how they can be used in developing resilient architectures. CPS are ubiquitous in critical application domains, which necessitates that systems demonstrate resiliency under cyber-attacks. The researchers' proposed approach integrates redundancy, diversity, and hardening methods for designing both passive resilience methods that are inherently robust against attacks and active resilience methods that allow responding to attacks.

As CPS become more prevalent in critical application domains, ensuring security and resilience in the face of cyber-attacks is becoming an issue of paramount importance. Cyber-attacks against critical infrastructures, smart water-distribution, and transportation systems, for example, pose serious threats to public health and safety. Owing to the severity of these threats, a variety of security techniques are available. However, no single technique can address the whole spectrum of cyber-attacks that may be launched by a determined and resourceful attacker. In light of this, the research team, led by Principal Investigator (PI) Xenofon Koutsoukos, adopted a multi-pronged approach for designing secure and resilient CPS, which integrates redundancy, diversity, and hardening techniques for designing both passive resilience methods that are inherently robust against attacks and active resilience methods that allow responding to attacks. They also introduced a framework for quantifying cyber-security risks and optimizing the system design by determining security investments in redundancy, diversity, and hardening. To demonstrate the applicability of the framework, they used a modeling and simulation integration platform for experimentation and evaluation of resilient CPS in application domains such as power, transportation, and water distribution systems.

Adversaries may cause significant damage to smart infrastructure using malicious attacks. To detect and mitigate such attacks before they can cause physical damage, operators can deploy Anomaly Detection Systems (ADS), which alert operators to suspicious activities. However, detection thresholds of an ADS need to be configured properly: an oversensitive detector raises a prohibitively large number of false alarms, while an undersensitive detector may miss actual attacks. Using a game-theoretic approach, researchers formulated the problem of computing optimal detection thresholds, which minimize both the number of false alarms and the probability of missing actual attacks, as a two-player Stackelberg security game.
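
A highly simplified sketch of the threshold-selection game is shown below; the cost functions, attack set, and numbers are invented for illustration and are not the model from the team's work.

    # Simplified sketch of the threshold-selection game: the defender commits to a
    # detection threshold, the attacker best-responds with the attack causing the
    # most expected undetected damage, and the defender picks the threshold that
    # minimizes total loss (false alarms plus expected damage). Numbers are invented.
    import numpy as np

    thresholds = np.linspace(0.5, 5.0, 46)

    def false_alarm_rate(tau):          # assumed: alarms drop as the threshold rises
        return np.exp(-tau)

    def detection_prob(tau, magnitude): # assumed: bigger attacks are easier to detect
        return 1.0 - np.exp(-magnitude / tau)

    attacks = [("small stealthy", 0.5, 3.0), ("medium", 1.5, 6.0), ("large", 4.0, 10.0)]
    COST_PER_FALSE_ALARM = 10.0

    def defender_loss(tau):
        # Attacker best response: maximize expected undetected damage at this tau.
        damage = max((1 - detection_prob(tau, mag)) * dmg for _, mag, dmg in attacks)
        return COST_PER_FALSE_ALARM * false_alarm_rate(tau) + damage

    best = min(thresholds, key=defender_loss)
    print(f"optimal threshold ~ {best:.1f}, loss ~ {defender_loss(best):.2f}")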

The research team seeks to improve the structural robustness in networks using the notions of diversity and trustiness. Diversity means that nodes in a network are of different types and have many variants. Trustiness means that a small subset of nodes is immune to failures and attacks. They have shown that by combining diversity and trustiness within the network, they can significantly limit the attacker's ability to change the underlying network structure by strategically removing nodes.

Non-control data attacks have become widely popular for circumventing authentication mechanisms in websites, servers, and personal computers. In the context of CPS, attacks can be executed against not only authentication but also safety. Moving Target Defense (MTD) techniques such as Data Space Randomization (DSR) can be effective for protecting against various types of memory corruption attacks, including non-control data attacks. The team's work addressed the problem of maintaining system stability and security properties of a CPS in the face of non-control data attacks by developing a DSR approach for randomizing binaries at runtime, creating a variable redundancy-based detection algorithm for identifying variable integrity violations, and integrating a control reconfiguration architecture for maintaining safe and reliable operation.

With the increasingly connected nature of CPS, new attack vectors are emerging that were previously not considered in the design process. Autonomous vehicles are among the most at-risk CPS applications, with challenges such as a large amount of legacy software, non-trusted third-party applications, and remote communication interfaces. With zero-day vulnerabilities constantly being discovered, an attacker can exploit such vulnerabilities to inject malicious code or even leverage existing legitimate code to take over the cyber part of a CPS. Due to the tightly coupled nature of CPS, this can alter physical behavior in an undesirable or devastating manner. It is therefore no longer effective to harden systems reactively; a more proactive approach must be taken. MTD techniques such as Instruction Set Randomization (ISR) and Address Space Randomization (ASR) have been shown to be effective against code injection and code reuse attacks. However, these MTD techniques can cause the control system to crash, which is unacceptable in CPS applications since such crashes may have catastrophic consequences. It is therefore crucial for MTD techniques to be complemented by control reconfiguration to maintain system availability in the event of a cyber-attack.

Recent work addressed the problem of maintaining system and security properties of a CPS under attack by integrating MTD techniques with detection and recovery mechanisms to ensure safe, reliable, and predictable system operation. Specifically, the researchers considered the problem of detecting code injection as well as code reuse attacks, and reconfiguring fast enough to ensure that the safety and stability of autonomous vehicle controllers are maintained. By using MTD such as ISR and ASR, their approach prevents attackers from obtaining the reconnaissance knowledge necessary to perform code injection and code reuse attacks, preventing attackers from finding vulnerabilities in the first place. The system implementation includes a combination of runtime MTD utilizing AES-256 ISR and fine-grained ASR, as well as control management that utilizes attack detection and reconfiguration capabilities. They evaluated the developed security architecture in an autonomous vehicle case study, utilizing a custom-developed hardware-in-the-loop testbed.

Technological advancements in today's electrical grids give rise to new vulnerabilities and increase the potential attack surface for cyber-attacks that can severely affect the grid's resilience. Cyber-attacks are increasing both in number and in sophistication. These attacks can be strategically organized in chronological order (dynamic attacks), where they can be instantiated at different time instants. The chronological order of attacks enables the uncovering of those attack combinations that can cause severe system damage, but this concept remained unexplored due to the lack of dynamic attack models. Motivated by this idea, researchers considered a game-theoretic approach to design a new attacker-defender model for power systems. Here, the attacker can strategically identify the chronological order in which the critical substations and their protection assemblies can be attacked in order to maximize the overall system damage, while the defender can intelligently identify the critical substations to protect so that the system damage is minimized. The research team applied the developed algorithms to the IEEE 39- and 57-bus systems with finite attacker/defender budgets. Their results show the effectiveness of these models in improving system resilience under dynamic attacks.

Additional details on the project can be found here.

Spotlight on Lablet Research #8 - Uncertainty in Security Analysis

Project: Uncertainty in Security Analysis

Lablet: University of Illinois at Urbana-Champaign

The goal of this project is to develop a mathematical basis for describing and analyzing the ability of an adversary to laterally traverse networks in the presence of uncertainty about connections and uncertainty about exploitable vulnerabilities. The research team will then use this basis to develop algorithms for quantified risk analysis of Cyber-Physical Systems (CPS).

Cyber-security vulnerabilities in CPS allow an adversary to remotely reach and damage physical infrastructure. Following the initial point of entry, the adversary may move laterally through the computer network using connections that are allowed by the access control but which give access to services with exploitable vulnerabilities. Using lateral movement, the adversary may eventually have control of monitors and actuators in the CPS, corrupt data being reported and/or issue malicious control commands, the consequences of which may inflict significant damage. Analyses of the risk of such attacks are known, under the assumption that all vulnerabilities and all connections in the cyber-system are known perfectly. They aren't. The research team, led by Principal Investigator (PI) David Nicol, is interested in developing the mathematical basis for describing the ability of the adversary to reach actuators in the presence of uncertainty with respect to the connections and the vulnerabilities which enable lateral movement.

Edges derived from the topological analysis may be thought of as having "exploitation probabilities," which quantify with a single probability the possibility of the adversary traversing that edge in a lateral movement. An edge probability models the possibility of an adversary on one host A being able to connect to another host B and exploit a vulnerability there, enabling the adversary to launch further attacks from B. In a previous study, researchers used expressions of Boolean random variables to describe these probabilities, in order to escape the otherwise necessary assumption of independence among edges. Since edges quantify the likelihood of an adversary exploiting a vulnerability, distinct edges that describe the same vulnerability will not have independent probabilities; using Boolean expressions enables researchers to describe those correlations they know must exist. The current investigation generalizes this model. Point probabilities implicitly assert certainty in the probability, yet there are different reasons why an exploitation probability may be non-zero, and some uncertainty in knowing just what causes the variability in connection. The new work replaces edge probabilities with edge probability distributions, which allows greater flexibility in expressing the certainty or uncertainty of the edge probability. Using the beta distribution, one set of parameters can create a very spiked distribution centered on the mean, while another set of parameters can create a distribution with the same mean but with the probability mass spread so that greater variance is captured. However, while beta distributions are closed under some operations, they are not closed under others, and so the new work considers how to compute the parameters of approximating betas with good accuracy.
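
To illustrate the flavor of the approximation with invented parameters, a Beta can be fit by moment matching so that two edges share a mean but differ in variance, and a non-Beta quantity such as the traversal probability of a two-edge path can be approximated by matching its sampled moments:

    # Sketch of the idea behind the Beta approximation: edge "exploitation
    # probabilities" become Beta distributions, and quantities that are not Beta
    # (e.g., the product along a two-edge path) are approximated by fitting a Beta
    # to sampled moments. Parameters below are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)

    def beta_params_from_moments(mean, var):
        """Method-of-moments fit: return (alpha, beta) with the given mean/variance."""
        common = mean * (1 - mean) / var - 1
        return mean * common, (1 - mean) * common

    # Two edges with the same mean exploitation probability but different certainty.
    spiked  = beta_params_from_moments(0.3, 0.002)   # tightly concentrated around 0.3
    diffuse = beta_params_from_moments(0.3, 0.03)    # same mean, much wider spread

    # Probability of traversing both edges of a path (not itself Beta-distributed).
    samples = rng.beta(*spiked, 20000) * rng.beta(*diffuse, 20000)
    approx = beta_params_from_moments(samples.mean(), samples.var())
    print("spiked:", spiked, "diffuse:", diffuse)
    print("two-edge path approximated by Beta with (alpha, beta) =", approx)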

In investigating the potential computational benefit of the Beta approximation result, the research showed that with a small number of samples (fewer than a few thousand), computing the parameters of an approximating Beta yields a significantly better estimate of the reliability distribution than constructing the empirical distribution. The researchers used the proposed model to study the reliability of two realistic systems: a distributed system with redundant deployment and a gas distribution network. They also completed the evaluation of the simulation-based experiments. Numerical results from Monte Carlo simulation of an approximation scheme and from the two case studies strongly support these observations, especially for non-corner cases where the model parameters do not take extreme values. The next phase of the project is to embed the developed security model in a risk assessment framework. More specifically, the research team is systematically surveying the literature on risk assessment in SCADA systems to identify the common approaches, their abilities, and their limitations. The lessons learned will help them build their own risk assessment framework.

This research intersects the predictive security metric problem since researchers are attempting to predict uncertainty associated with a system model. It also intersects with resilience as a system's resilience will be established by analysis of some model and decisions (e.g., how significant the breach may be, whether to interdict and where, where to focus recovery activity) will be made as a result. Those decisions will be better informed when some notion of uncertainty is built into the model predictions, or accompanies those model predictions.

Additional details on this project can be found here.

Spotlight on Lablet Research #9 - Coordinated Machine Learning-Based Vulnerability and Security Patching for Resilient Virtual Computing Infrastructure

Project: Coordinated Machine Learning-Based Vulnerability and Security Patching for Resilient Virtual Computing Infrastructure

Lablet: North Carolina State University

This research aims to assist administrators of virtualized computing infrastructures in making services more resilient to security attacks. This is done through applying machine learning to reduce both security and functionality risks in software patching by continually monitoring patched and unpatched software to discover vulnerabilities and triggering proper security updates.

The existing approach to making services more resilient to security attacks is static security analysis and scheduled patching. NCSU researchers, led by Principal Investigator (PI) Xiaohui (Helen) Gu, determined in their experiments that this approach fails to detect 90% of vulnerabilities, produces high false alarm rates, and causes memory inflation through unnecessary security patching. This research project focuses on runtime vulnerability detection using online machine learning methods and just-in-time security patching. Just-in-time security patching includes applying patches intentionally after attacks are detected, enforcing update validation, making intelligent decisions on updating versus rebuilding, and adhering to system operational constraints.

Containers have become increasingly popular for deploying applications in cloud computing infrastructures. However, the researchers' previous studies showed that containers are prone to various security attacks. The research team is continuing research on real-time container vulnerability discovery and refining approaches for aggregated learning and application classification. They refined the dynamic vulnerability exploit detection system (changing the algorithm from a self-organizing map (SOM) to an autoencoder) and collected results over a set of real-world exploits. Initial results using a limited set of exploits show that this approach to exploit detection has a 100% detection rate and less than 1% false alarms.

Researchers implemented their aggregated behavior learning framework, which leverages learning data from different containers running the same application together. After conducting dynamic container vulnerability exploit detection experiments over 31 vulnerability exploits (e.g., returning a shell, executing arbitrary code, privilege escalation) detected in 23 commonly used server applications (e.g., Apache, CouchDB, Bash, ElasticSearch, JBoss, OpenSSH), they compared the detection accuracy of aggregated behavior learning versus individual behavior learning. Their experiments show that aggregated learning can improve the accuracy and detection lead time in 70% of tested exploits.

The researchers further conducted an initial study on a container classification scheme that recognizes containers running the same application using runtime monitoring data; initial experiments show over 80% classification accuracy, with improved detection results when training over a larger number of containers. They further refined the application classification algorithm using random forest learning methods and can now achieve over 96% accuracy. They also completed the attack type classification implementation by extracting the most frequently used system calls during the attack period and conducted extensive experiments; the resulting attack signatures achieve 90% classification accuracy over all 31 tested vulnerability exploits. Finally, they started to implement the on-demand targeted patching (OPatch) system, and initial experiments show that OPatch can reduce the memory overhead incurred by patching by up to 84% and the disk overhead by up to 40% compared to the existing whole-software-upgrade approach.
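
As a conceptual sketch of the detection idea only, and not the team's system, an autoencoder trained on benign system-call frequency profiles can flag monitoring windows whose reconstruction error exceeds a threshold; the data and model choices below are illustrative.

    # Conceptual sketch: train an autoencoder on system-call frequency vectors from
    # benign container runs, then flag windows with high reconstruction error.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(1)
    benign = rng.poisson(lam=[20, 5, 3, 0.5, 1], size=(500, 5)).astype(float)  # 5 syscall counts
    benign /= benign.sum(axis=1, keepdims=True)                                 # normalize per window

    autoencoder = MLPRegressor(hidden_layer_sizes=(3,), max_iter=2000, random_state=0)
    autoencoder.fit(benign, benign)                    # learn to reconstruct benign behavior

    def reconstruction_error(x):
        return np.mean((autoencoder.predict(x) - x) ** 2, axis=1)

    threshold = np.percentile(reconstruction_error(benign), 99)

    suspicious = np.array([[0.05, 0.05, 0.1, 0.6, 0.2]])   # unusual syscall mix (hypothetical)
    print("alert" if reconstruction_error(suspicious)[0] > threshold else "normal")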

Additional details on this project can be found here.

Spotlight on Lablet Research #10 - Model-Based Explanation for Human-in-the-Loop Security

Project: Model-Based Explanation for Human-in-the-Loop Security

Lablet: Carnegie Mellon University

An effective response to security attacks often requires a combination of both automated and human-mediated actions. Currently, we lack adequate methods to reason about such human-system coordination, including ways to determine when to allocate tasks to each party and how to gain assurance that automated mechanisms are appropriately aligned with organizational needs and policies. This project focuses on combining human and automated actions in response to security attacks and will show how probabilistic models and model checkers can be used both to synthesize complex plans that involve a combination of human and automated actions, as well as to provide human-understandable explanations of mitigation plans proposed or carried out by the system.

Models that support attack-resiliency in systems need to address the allocation of tasks to humans and systems, and how the mechanisms align with organizational policies. These models include, for example, identification of when and how systems and humans should cooperate, how to provide self-explanation to support human hand-offs, and ways to assess the overall effectiveness of coordinated human-system approaches for mitigating sophisticated threats. In this project, the research team, led by Principal Investigator (PI) David Garlan, developed a model-based approach to: 1) reason about when and how systems and humans should cooperate with each other; 2) improve human understanding and trust in automated behavior through self-explanation; and 3) provide mechanisms for humans to correct a system's automated behavior when it is inappropriate. The team explored the effectiveness of the techniques in the context of coordinated system-human approaches for mitigating Advanced Persistent Threats (APTs).

As self-security becomes more automated, it becomes harder for humans who interact with the autonomous system to understand the behavior of the systems. Particularly while optimizing for multiple quality objectives and acting under uncertainty, it can be difficult for humans to understand the system behavior generated by an automated planner. The CMU researchers developed an approach with tool support that aims at clarifying system behavior through interactive explanation by allowing end-users to ask Why and Why-Not questions about specific behaviors of the system, and providing answers in the form of contrastive explanation. They further designed and piloted a human study to understand the effectiveness of explanations for human operators.

The research team conducted a human-subject experiment to evaluate the effectiveness of their explainable planning approach. The experimental design is based on one of the envisioned use cases of the generated explanation: addressing potential misalignment between the end-user's objectives and the planning agent's reward function, particularly in multi-objective planning settings, by enabling the end-user of a planning-based system to identify preference alignment or misalignment between their objectives and the agent's reward function. The main hypothesis of this experimental study is that the explainable planning approach improves the end-user's ability to identify such preference (mis)alignment in multi-objective planning settings. This was divided into two sub-hypotheses: H1, participants who receive the explanations are significantly more likely to correctly determine whether the robot's proposed plan is in line with their (given) preference; and H2, participants who receive the explanations have significantly more reliable confidence in their determination (i.e., have higher confidence-weighted scores).

In the study, each participant is prescribed a fixed preference over the three objectives. The participant is told that the robot may or may not be optimizing for a reward function that aligns with their prescribed preference. When presented with a navigation plan from the robot, the participant is asked to indicate whether they think the robot's plan is the best available option with respect to their prescribed preference. The study participants were divided into two groups: the control group did not receive an explanation from the robot, beyond the prediction of how much time the navigation would take and how safe and intrusive it would be, while the experimental group received a full explanation from the robot. The team measured the participants' accuracy in identifying preference alignment and misalignment, the confidence in their answers, and the amount of time they took to reach an answer. The experiment was conducted using Amazon's Mechanical Turk, and the results received were compelling:

  • H1: The experiment showed that the team's explanation has a significant effect on the participants' correctness. Overall, the odds of the participants in the treatment group being correct are on average 3.8 times higher than those of the participants in the control group, with 95% confidence interval [2.04, 7.07]. (The fixed-effect logistic regression coefficient estimate is 1.335 and the standard error is 0.317; a quick arithmetic check appears after this list.) Overall, H1 is supported.
  • H2: The team's explanation had a significant effect on the participants' reliable confidence. Overall, the confidence-weighted score of the participants in the treatment group is on average 1.73 higher than that of the participants in the control group, on a scale of -4 to +4, with 95% confidence interval [1.04, 2.42]. (The fixed-effect linear regression coefficient estimate is 1.727 and the standard error is 0.351.) Overall, H2 is supported.
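
The reported effect sizes follow directly from the quoted regression estimates; the quick arithmetic check below reproduces the odds ratio and its confidence interval from the coefficient and standard error.

    # Quick arithmetic check of the reported effect sizes from the regression
    # estimates quoted above (coefficient 1.335, standard error 0.317).
    import math

    coef, se = 1.335, 0.317
    odds_ratio = math.exp(coef)                                   # ~3.8
    ci = (math.exp(coef - 1.96 * se), math.exp(coef + 1.96 * se)) # ~[2.04, 7.07]
    print(f"odds ratio {odds_ratio:.1f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")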

The researchers are also continuing to work on how to use graceful degradation to respond to security attacks, using Datalog as a reasoning engine. They improved the performance of the code by a factor of roughly 100-500 by adding richness and realism to the architectural styles, as well as rewriting much of the attack trace generation code. Attack traces now overlay data flows across networks, so that the effects of attacks are evaluated on data flows (and their associated confidentiality, integrity, and availability security attributes) rather than simply on network components, bringing the modeling much closer to the proper way to understand the impact of an attack. The researchers also added richness to a new Functional Perspective, refining and improving on aspects of the DoD Architecture Framework (DoDAF).

In the context of Markov Decision Process (MDP) planning, manually inspecting the solution policy and its value function to understand the planner's behavior is infeasible due to the lack of domain semantics and concepts in which the end-users are interested. End-users also lack information about which, if any, of the objectives conflict in a problem instance, and what compromises had to be made. The team investigated an approach for generating an automated explanation of an MDP policy that is based on: (i) describing the expected consequences of the policy in terms of domain-specific, human-concept values, and relating those values to the overall expected cost of the policy, and (ii) explaining any tradeoff by contrasting the policy to counterfactual solutions (i.e., alternative policies that were not generated as a solution) on the basis of their human-concept values and the corresponding costs. The team demonstrated the approach on MDP problems with two different cost criteria, namely the expected total-cost and average-cost criteria. Such an approach enhances resilient architectures by helping stakeholders explore and understand the decision-making process that goes into automated planning for maintaining system resilience.

Additional detail on this project can be found here.

Spotlight on Lablet Research #11 - Cloud-Assisted IoT Systems Privacy

Project: Cloud-Assisted IoT Systems Privacy

Lablet: University of Kansas

The goal of this project is to develop principles and methods to model privacy requirements, threats, and protection mechanisms in cloud-assisted IoT systems.

The key to realizing the smart functionalities envisioned through the Internet of Things (IoT) is to securely and efficiently communicate, maintain, and analyze the tremendous amount of data generated by IoT devices. Therefore, integrating IoT with the cloud platform to utilize its computing and big data analysis capabilities becomes critically important, since IoT devices are computational units with strict performance and energy constraints. However, when data is transferred among connected devices or to the cloud, new security and privacy issues arise. In this project, the University of Kansas researchers, led by Principal Investigator (PI) Fengjun Li and Co-PI Bo Luo, investigated the privacy threats in the cloud-assisted IoT systems, in which heterogeneous and distributed data are collected, integrated, and analyzed by different IoT applications. The research aim is to develop a privacy threat analysis framework for modeling privacy threats in the cloud-assisted IoT systems and provide a holistic solution toward privacy protection.

The number of IoT devices is expected to reach 125 billion by 2030. These devices collect a variety of data that may contain privacy-sensitive information about their users, yet it is difficult to quantitatively assess the privacy leakage of a text snippet. The researchers are among the first to develop a context-aware, text-based quantitative model for private information assessment to address this problem. The team developed a computational framework using natural language processing and deep neural networks to train prediction models, which can be used to measure privacy scores of texts from a social network dataset. These models serve as the foundation for user alerting mechanisms that warn users when they attempt to disseminate sensitive information online.

To support big data analytics over IoT-generated data while protecting IoT data privacy, the KU researchers developed two privacy-preserving learning frameworks for collaborative learning tasks in cloud-assisted IoT systems. First, for scenarios in which clients (e.g., IoT devices) hold horizontally partitioned data (i.e., data with the same distribution or feature space), they developed a privacy-preserving incremental learning protocol. The protocol allows clients to train Support Vector Machines (SVMs) using linear or polynomial kernel functions and to securely update the SVM model by exchanging the kernel matrix under homomorphic encryption. As IoT devices continuously generate new data, this privacy-preserving incremental learning scheme can actively update the learning model by integrating the new data into the quadratic program and modifying the kernel parameters when necessary, which is critical for prediction tasks in IoT applications.
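
A simplified sketch of the underlying idea of exchanging kernel entries under homomorphic encryption is shown below, using the python-paillier (`phe`) library as an additively homomorphic scheme. The vectors and the key-handling shortcut (one client holding the private key) are illustrative assumptions and do not reproduce the team's protocol or its incremental-update steps.

```python
# Illustrative sketch of computing a linear-kernel entry under additively
# homomorphic (Paillier) encryption; not the KU protocol itself.
# Requires the python-paillier package: pip install phe
from phe import paillier

# Client A encrypts its feature vector with its public key.
pub, priv = paillier.generate_paillier_keypair(n_length=1024)
x_a = [1.0, 2.0, 3.0]
enc_x_a = [pub.encrypt(v) for v in x_a]

# Client B computes the kernel entry k(x_a, x_b) = <x_a, x_b> directly on
# ciphertexts: an encrypted value times a plaintext scalar stays encrypted.
x_b = [0.5, -1.0, 2.0]
enc_kernel_entry = sum(c * v for c, v in zip(enc_x_a, x_b))

# Only client A (holder of the private key) can decrypt the kernel entry,
# which it then contributes to the shared kernel matrix for SVM training.
print(priv.decrypt(enc_kernel_entry))  # 0.5 - 2.0 + 6.0 = 4.5
```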

The researchers also designed a blockchain-based framework to support privacy-preserving and accountable federated learning tasks in IoT applications. As an emerging collaborative learning technique, Federated Learning (FL) allows high-quality, personalized models to be learned from data across distributed sources without transmitting the raw data off the devices. This facilitates large-scale collaborative IoT applications by not only leveraging computational resources on IoT devices but also improving user privacy protection. To prevent privacy leakage, model updates (e.g., stochastic gradient updates) are securely exchanged among the clients using homomorphic encryption, multi-party computation, and differential privacy schemes. Because the IoT devices are only loosely federated in FL tasks, some of them may act maliciously by cheating or colluding during model updates. The researchers developed a blockchain-based framework that provides an accountable federated learning service, leveraging the immutability and decentralized trust properties of the blockchain to provide provenance of model updates.
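
The sketch below illustrates the accountability idea in miniature: each federated averaging round commits digests of the client updates and the resulting global model to a hash-chained log, which stands in for the blockchain. The update values and chain format are made up for illustration and are not the team's design.

```python
# Illustrative sketch of accountable federated averaging: each round's client
# updates and the resulting global model are committed to a hash-chained log
# (a stand-in for the blockchain-based provenance the KU team describes).
import hashlib, json
import numpy as np

ledger = []  # each entry links to the previous via its hash

def commit(record):
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    payload = json.dumps(record, sort_keys=True)
    entry = {"record": record, "prev_hash": prev_hash,
             "hash": hashlib.sha256((prev_hash + payload).encode()).hexdigest()}
    ledger.append(entry)

def federated_round(global_model, client_updates):
    """FedAvg step: average client updates and log provenance of the round."""
    new_model = global_model + np.mean(client_updates, axis=0)
    commit({
        "update_digests": [hashlib.sha256(u.tobytes()).hexdigest() for u in client_updates],
        "model_digest": hashlib.sha256(new_model.tobytes()).hexdigest(),
    })
    return new_model

model = np.zeros(4)
for _ in range(3):
    updates = [np.random.randn(4) * 0.1 for _ in range(5)]  # simulated client updates
    model = federated_round(model, updates)

# Anyone can later verify the chain and attribute each round's contribution.
print(len(ledger), "rounds committed; head:", ledger[-1]["hash"][:16])
```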

Successful completion of this project will result in: 1) a systematic methodology to model privacy threats in data communication, storage, and analysis processes in IoT applications; 2) a privacy threat analysis framework with an extensive catalogue of application-specific privacy needs and privacy-specific threat categorization; and 3) a privacy protection framework that maps existing Privacy Enhancing Technologies (PETs) to the identified privacy needs and threats of IoT applications to simplify the selection of sound privacy protection countermeasures.

Additional details on the project can be found here.

Spotlight on Lablet Research #12 - Operationalizing Contextual Integrity

Spotlight on Lablet Research #12 -

Project: Operationalizing Contextual Integrity

Lablet: International Computer Science Institute
Sub-Lablet: Cornell Tech

The ultimate goal of this research is to design new privacy controls that are grounded in the theory of Contextual Integrity (CI) so that they can automatically infer contextual norms and handle data-sharing and disclosure on a per-use basis. Another goal is to examine how policies surrounding the acceptable use of personal data can be adapted to support the theory of contextual integrity. Overall, the aim is to design and develop future privacy controls that have high usability because their design principles are informed by empirical research.

This project centers on work on mobile device apps, which forms the basis for planned future work addressing privacy as contextual integrity. Inappropriate data flows violate contextual information norms; such norms are modeled using five parameters: data subject, data sender, data recipient, information type, and transmission principle (constraints). A minimal sketch of this flow model appears after the list below. For user-centered design, this suggests that an app should provide notice only when reasonable privacy expectations are likely to be violated. The next steps to determine which parameters actually matter to users are:

  • Phase 1: Factorial vignette studies (interviews, surveys; randomly generated scenarios based on controlled parameters)
  • Phase 2: Observational studies (instrument phones, detect parameters and resulting behaviors)
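
As referenced above, here is a minimal sketch of the five-parameter flow model; the field names, example norm, and matching rule are hypothetical simplifications, not the project's implementation.

```python
# Minimal sketch (hypothetical names and values) of a contextual-integrity
# information flow and a check against a contextual norm.
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    subject: str      # whom the data is about
    sender: str       # who transmits the data
    recipient: str    # who receives the data
    info_type: str    # e.g., "location", "contacts"
    principle: str    # transmission principle / constraint, e.g., "with consent"

# A contextual norm: flows matching this pattern are appropriate in context.
NORMS = [
    Flow("user", "weather app", "weather service", "location", "while app in use"),
]

def violates_norms(flow, norms=NORMS):
    """A flow is inappropriate (and may warrant a notice) if no norm matches."""
    return flow not in norms

f = Flow("user", "weather app", "ad network", "location", "while app in use")
print(violates_norms(f))  # True -> reasonable privacy expectations likely violated
```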

The research team, led by Principal Investigator (PI) Serge Egelman and Co-PI Helen Nissenbaum, is working on improving infrastructure to allow them to study privacy behaviors in situ, long-term project planning to examine new ways of applying the theory of contextual integrity to privacy controls for emergent technologies (e.g., in-home IoT devices), and constructing educational materials for use in the classroom based on the research findings.

In addition to the publications and engagement events, they are designing new studies to better understand users' privacy perceptions surrounding in-home voice assistants and voice capture in general. The goal is to gather data that can be used to predict privacy-sensitive events based on contextual data.

To that end, the team deployed a study to examine contextual norms around in-home audio monitoring, which is likely to proliferate as new devices appear on the market. They recruited users of both the Google Home and Amazon Echo to answer questions about previously-recorded audio from their devices. Both manufacturers make audio recordings accessible to device owners through a web portal, so the study used a browser extension to present these clips to users at random and then had them answer questions about the circumstances surrounding the recordings. The research seeks to determine whether users were aware that the recordings were made, how sensitive the content was, and what preferences participants have for various data retention and sharing policies.

In another set of studies, the team examined existing audio corpora and then used crowdworkers to identify sensitive conversations that could then be labeled and used to train a classifier. The goal is to design devices that can predict when they should not be recording or sharing data. The study was deployed to several hundred participants, and the data is under review.

Additional details on the project can be found here.

Spotlight on Lablet Research #13 - Multi-Model Testbed for the Simulation-Based Evaluation of Resilience

Spotlight on Lablet Research #13 -

Project: Multi-Model Testbed for the Simulation-Based Evaluation of Resilience

Lablet: Vanderbilt University

The goal of the Multi-model Testbed is to provide a collaborative design tool for evaluating various cyberattack/defense strategies and their effects on the physical infrastructure. The web-based, cloud-hosted environment integrates state-of-the-art simulation engines for the different CPS domains and presents interesting research challenges as ready-to-use scenarios. Input data, model parameters, and simulation results are archived and versioned, with a strong emphasis on repeatability and provenance.

Earlier, researchers developed the Science of SecUre and REsilient Cyber-Physical Systems (SURE) platform, a modeling and simulation integration testbed for evaluating the resilience of complex CPS. Previous efforts resulted in a web-based collaborative design environment for attack-defense scenarios supported by a cloud-deployed simulation engine for executing and evaluating the scenarios. Led by PI Peter Volgyesi and Co-PI Himanshu Neema, the VU research team seeks to significantly extend these design and simulation capabilities to better understand the security and resilience aspects of CPS. These improvements include first-class support for the design of experiments (exploring different parameters and/or strategies), support for alternative CPS domains (connected vehicles, railway systems, and smart grids), incorporation of models of human behavior, and execution of multistage games. The researchers are also integrating state-of-the-art machine learning libraries and workflows to support security research on AI-assisted CPS applications. To achieve these goals, they introduced significant changes to the SURE Testbed architecture, replacing the HLA-based C2 Wind Tunnel federated simulation engine with a more lightweight integration approach within WebGME and DeepForge.

Testbed efforts are focused on developing a fully integrated workflow in DeepForge, targeting the smart grid CPS domain. This work has two major goals: 1) developing a complete set of prediction, attack, and detection models for load forecasting applications; and 2) generalizing several building blocks of these models--most notably for gradient-based deep neural network attacks--to form the basis of a future library of reusable components for creating SoS experiments involving learning-enabled components. The Testbed development effort focuses on the initial integration of two existing design studios: (1) DeepForge, a collaborative deep neural network experimentation platform with TensorFlow/Keras backend support, and (2) GridLAB-D Design Studio, for configuring and executing smart power grid simulation models through a web-based interface.
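
As a rough illustration of the gradient-based attack building block mentioned above, the sketch below applies an FGSM-style perturbation to a toy Keras load-forecasting model. The model, data, and epsilon value are placeholders and not the testbed's actual DeepForge artifacts.

```python
# Illustrative sketch of a gradient-based (FGSM-style) attack on a Keras load
# forecasting model, the kind of reusable building block described above.
import numpy as np
import tensorflow as tf

# Toy forecaster: predict the next hour's load from the previous 24 hours.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(24,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

history = np.random.rand(256, 24).astype("float32")   # simulated load history
target = history.mean(axis=1, keepdims=True).astype("float32")
model.fit(history, target, epochs=5, verbose=0)

def fgsm_perturb(x, y_true, epsilon=0.05):
    """Perturb sensor readings in the direction that maximizes forecast error."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.mse(y_true, model(x))
    grad = tape.gradient(loss, x)
    return x + epsilon * tf.sign(grad)

x_adv = fgsm_perturb(history[:1], target[:1])
print("clean forecast:", float(model(history[:1])[0, 0]),
      "attacked forecast:", float(model(x_adv)[0, 0]))
```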

The VU team has established a collaborative and technical exchange with the Cybersecurity Research Group at Fujitsu System Integration Laboratories Ltd. This group uses WebGME, DeepForge, and technology elements of the SURE Testbed to develop their Cyber Range product.

A publication based on the research, "Simulation Testbed for Railway Infrastructure Security and Resilience Evaluation," by Himanshu Neema, Xenofon Koutsoukos, Bradley Potteiger, Cheeyee Tang, and Keith Stouffer, won the Best Paper Award at the 7th Annual Symposium on the Science of Security (HotSoS), held virtually in September 2020.

Additional details on the project can be found here.

Spotlight on Lablet Research #14 - A Human Agent-Focused Approach to Security Modeling

Spotlight on Lablet Research #14 -

Project: A Human Agent-Focused Approach to Security Modeling

Lablet: University of Illinois at Urbana-Champaign

The aim of this project, which concluded in 2020, was to make fundamental advances in scientifically-motivated techniques to aid risk assessment for computer security through the development of a general-purpose, easy-to-use formalism. The formalism is intended to allow realistic modeling of cyber systems and all the human agents that interact with them, with the ultimate goal of generating quantitative results that help system architects make better design decisions.

The hypothesis is that models that incorporate all human agents who interact with the system will produce insightful metrics. System architects can leverage the results to build more resilient systems that are able to achieve their mission objectives despite attacks.

Researchers began by conducting a literature review with the goal of constructing a high-quality case study to exercise the human-centric cybersecurity modeling formalism being developed. Their case study focused on comparing the security and usability of different password policies (e.g., password length, time until the password expires, etc.) that a hypothetical institution might enact. They constructed submodels of the institution, its employees and customers, and the adversaries. They then composed these submodels and studied their interaction to gain insight into the relative strengths and weaknesses of the password policies. The model was validated against previously conducted studies of human behavior with regard to passwords.

UIUC researchers extended their work on a metamodeling-based approach to sensitivity analysis and uncertainty quantification in complex security models. Many realistic security models run slowly and have input variables whose values are uncertain, which makes it difficult to conduct sensitivity analysis and uncertainty quantification. Using machine learning techniques, it is possible to create metamodels of the base security model that trade some accuracy for speed. The researchers had earlier investigated this method by applying it to a previously-published model of the growth of peer-to-peer botnets, and they then applied it to two new models to test its general applicability.
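
A minimal sketch of the metamodeling idea follows: a fast machine-learning surrogate is fit to a modest number of runs of a slow model and then used for cheap sensitivity analysis. The "slow model" and its input names are stand-ins, and a random-forest surrogate with permutation importance is only one possible instantiation, not the stacked ensemble used in the team's work.

```python
# Illustrative sketch: fit a fast surrogate ("metamodel") to samples of a slow
# security model, then use the surrogate for cheap sensitivity analysis.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

def slow_model(x):
    """Pretend this is an expensive simulation (e.g., a botnet growth model)."""
    infection_rate, patch_rate, scan_rate = x
    return infection_rate * scan_rate / (patch_rate + 0.1)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))      # sampled uncertain inputs
y = np.array([slow_model(x) for x in X])      # a few hundred expensive runs

surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Sensitivity analysis on the cheap surrogate instead of the slow model.
imp = permutation_importance(surrogate, X, y, n_repeats=20, random_state=0)
for name, score in zip(["infection_rate", "patch_rate", "scan_rate"], imp.importances_mean):
    print(f"{name}: {score:.3f}")
```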

The researchers also investigated two ways to solve an issue with applying the metamodeling approach to certain models that contained a mix of quantitative and qualitative input variables. The two approaches were one-hot encoding and splitting. They implemented the two approaches and evaluated them on an AMI ADVISE model, and found, at least in that one case, that splitting substantially outperformed one-hot encoding. This work can help modelers apply the metamodeling approach that was developed to a broader class of security models. The metamodeling approach helps modelers perform sensitivity analysis and uncertainty quantification on complex slow-running security models that contain uncertain input variables.
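
The sketch below illustrates the two encodings on a made-up metamodel with one qualitative input; it is not the AMI ADVISE model or the team's evaluation, just a compact way to see the difference between one-hot encoding and splitting.

```python
# Minimal illustration of the two encodings discussed above for a metamodel
# with one qualitative input (attacker "skill") and one quantitative input.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
skills = rng.choice(["low", "medium", "high"], size=300)
effort = rng.uniform(0.0, 1.0, size=300)
risk = {"low": 0.2, "medium": 0.5, "high": 0.9}      # made-up ground truth
y = np.array([risk[s] * e for s, e in zip(skills, effort)])

# Option 1: one-hot encode the qualitative variable, train a single metamodel.
onehot = np.array([[s == "low", s == "medium", s == "high"] for s in skills], dtype=float)
X_onehot = np.column_stack([onehot, effort])
single = RandomForestRegressor(random_state=0).fit(X_onehot, y)

# Option 2: "splitting" -- train one metamodel per qualitative value,
# each over the quantitative inputs only.
split = {
    s: RandomForestRegressor(random_state=0).fit(
        effort[skills == s].reshape(-1, 1), y[skills == s])
    for s in ["low", "medium", "high"]
}

# Query both for a "high"-skill attacker expending 0.8 effort.
print(single.predict([[0.0, 0.0, 1.0, 0.8]])[0], split["high"].predict([[0.8]])[0])
```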

Researcher Michael Rausch and Principal Investigator William Sanders were awarded the Best Paper Award for "Sensitivity Analysis and Uncertainty Quantification of State-Based Discrete-Event Simulation Models Through a Stacked Ensemble of Metamodels" at the 17th Annual International Conference on Quantitative Evaluation of SysTems (QEST), 2020.

Spotlight on Lablet Research #15 - Reasoning about Accidental and Malicious Misuse via Formal Methods

Spotlight on Lablet Research #15 -

Project: Reasoning about Accidental and Malicious Misuse via Formal Methods

Lablet: North Carolina State University

This project seeks to aid security analysts in identifying and protecting against accidental and malicious actions by users or software through automated reasoning on unified representations of user expectations and software implementations to identify misuses sensitive to usage and machine context.

Led by Principal Investigator (PI) Munindar Singh and Co-PIs William Enck and Laurie Williams, this research project deals with accidental and malicious misuse case discovery in sociotechnical systems. System misuse is conceptually a violation of a stakeholder's expectation of how a system should operate. Whereas existing tools make security decisions using the context of usage, including environment, time, and execution state, they lack an ability to reason about the underlying stakeholder expectations, which are often crucial to identifying misuses. The researchers believe that if existing tools making security decisions could reason about expectations, they could automatically prevent, discover, and mitigate misuse. Unfortunately, the automatic extraction of stakeholder expectations remains ineffective. This led the researchers to identify the following research questions: What are the key components of stakeholders' expectations, and how may they be represented computationally? How would we identify the relevant stakeholder expectations? In what ways can we employ reasoning about expectations to inform the specification of sociotechnical systems to promote security?

The NCSU research team has studied these research questions through case studies on mobile applications as a basis for examining accidental and malicious misuse in a practical setting. Through manual collection and examination of app reviews that describe spying activities, they determined the necessity of considering app reviews when identifying apps that can aid spying, whether explicitly or through misuse. They are specifically concerned with spying in the form of intimate partner surveillance. Based on this understanding, they are developing a computational framework for spotting such apps: they first identify apps that can potentially be misused for spying based on their metadata (e.g., their descriptions and permissions), then collect their reviews, and finally determine each app's spying capability based on user-reported stories.

The research team started building a computational framework that, by analyzing app reviews, identifies whether an app facilitates spying activity; they then conducted a preliminary investigation to identify app reviews relevant to spying. They observed that relevant app reviews differ greatly in the severity of the problem they describe, leading the team to investigate how to automatically determine the severity of an app's spying capability as described in a review. They are designing an annotation scheme for crowdsourcing the annotation of reviews based on severity.

As part of their research, they performed a systematic analysis of the network protocol exchanges used by Payment Service Providers (PSPs) and, through formal modeling with the Tamarin Prover, identified four vulnerabilities in PSP Software Development Kits (SDKs), demonstrating proof-of-concept exploits for four payment service providers. The vulnerabilities have been reported to the providers. The researchers continued the analysis of PSP Application Programming Interfaces (APIs), developing models for analyzing the security of SDK code. They also completed a systematic literature review of research on mining threat intelligence from unstructured textual data.

The team extended their scope from spying to Unexpected Information Gathering (UIG) in mobile apps and identified 124 UIG-enabling apps in their current dataset of apps. They identified an additional 131 UIG-enabling apps through snowball sampling.

Healthcare professionals use mobile apps to store patient information and communicate with their patients, but not all such apps are HIPAA compliant. The NCSU team began investigating the HIPAA compliance of medical mobile apps on the Apple App Store. In a preliminary investigation, they identified 899 medical apps that were potentially relevant but did not mention HIPAA compliance in their descriptions, and they are investigating these apps further to determine their compliance with HIPAA.

Background on this project can be found here.

Spotlight on Lablet Research #16 - Securing Safety-Critical Machine Learning Algorithms

Spotlight on Lablet Research #16 -

Project: Securing Safety-Critical Machine Learning Algorithms

Lablet: Carnegie Mellon University

This project seeks to understand how classifiers can be spoofed, including in ways that are not apparent to human observers, and how the robustness of classifiers can be enhanced, including through explanations of model behavior.

Machine-learning algorithms, especially classifiers, are becoming prevalent in safety- and security-critical applications. The susceptibility of some types of classifiers to evasion by adversarial input data has been explored in domains such as spam filtering, but the rapid growth in the adoption of machine learning across application domains amplifies the extent and severity of this vulnerability landscape. The CMU research team is led by Principal Investigator (PI) Lujo Bauer and Co-PI Matt Fredrikson and includes researchers from the Sub-Lablet University of North Carolina. The team proposes to 1) develop predictive metrics that characterize the degree to which a neural-network-based image classifier used in domains such as face recognition can be evaded through attacks that are both practically realizable and inconspicuous, and 2) develop methods that make these classifiers, and the applications that incorporate them, robust to such interference. They first examine how to manipulate images to fool classifiers in various ways, and how to do so in a way that escapes the suspicion of even human onlookers; they then develop explanations of model behavior to help identify the presence of a likely attack. They will then generalize these explanations to harden models against future attacks.

The researchers have continued their study of network pruning techniques to enhance robustness. Their approach is based on attribution measurements of internal neurons and aims to identify features that are pivotal for adversarial examples but not necessary for the correct classification of normal inputs. Experiments to date suggest that it is possible to identify and remove such non-robust features for norm-bounded attacks, but further suggest that physical attacks may rely on different sets of features that cannot be pruned without significant impact on model performance.
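
The sketch below gives a rough, hypothetical rendering of attribution-guided pruning on a toy untrained network: it scores hidden units by gradient-times-activation on clean versus (stand-in) adversarial inputs and zeroes out units that matter mainly for the adversarial ones. The network, inputs, and pruning threshold are illustrative assumptions, not the researchers' method.

```python
# Illustrative sketch of attribution-guided pruning; not the CMU approach.
import numpy as np
import tensorflow as tf

# Toy classifier with one hidden layer whose units we may prune.
inputs = tf.keras.Input(shape=(32,))
hidden = tf.keras.layers.Dense(64, activation="relu", name="hidden")(inputs)
outputs = tf.keras.layers.Dense(10, activation="softmax")(hidden)
model = tf.keras.Model(inputs, outputs)
feat_model = tf.keras.Model(inputs, [hidden, outputs])

def neuron_attribution(x, labels):
    """Mean |activation * d(loss)/d(activation)| per hidden unit."""
    with tf.GradientTape() as tape:
        feats, preds = feat_model(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(labels, preds)
    grads = tape.gradient(loss, feats)
    return tf.reduce_mean(tf.abs(feats * grads), axis=0).numpy()

x_clean = np.random.rand(128, 32).astype("float32")
labels = np.random.randint(0, 10, size=128)
x_adv = x_clean + 0.1 * np.sign(np.random.randn(128, 32)).astype("float32")  # stand-in

attr_clean = neuron_attribution(x_clean, labels)
attr_adv = neuron_attribution(x_adv, labels)

# Prune units much more important for adversarial than clean inputs by zeroing
# their incoming weights and biases in the hidden layer.
to_prune = np.where(attr_adv > 2.0 * attr_clean + 1e-6)[0]
w, b = model.get_layer("hidden").get_weights()
w[:, to_prune], b[to_prune] = 0.0, 0.0
model.get_layer("hidden").set_weights([w, b])
print("pruned units:", len(to_prune))
```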

The team's research focused on revising and extending previous results on n-ML (which provides robustness to evasion attacks via ensembles of topologically diversified classifiers) and on attacks on malware detection. They discovered that the utility of, and best approaches for tuning, n-ML differ depending on the dataset and its complexity; e.g., tunings of n-ML that lead to particularly good performance for MNIST lead to suboptimal performance for GTSRB, and vice versa. In the process, they are discovering tunings of n-ML that further improve its performance compared to other approaches for making classifiers more robust. The team has improved their algorithm for attacking malware classifiers to better gauge the impact of small changes to a binary (e.g., swapping a pair of instructions) on whether the binary is classified as benign or malware. They have also identified engineering errors in libraries that their code builds on; fixing these should result in significantly lower resource usage, enabling more comprehensive experiments.

In addition to continuing research on n-ML and on evasion attacks against malware classifiers, they are also working on improving experimental infrastructure and methodology, which will enable more automated, much quicker experiments with malware evasion and a more comprehensive examination of the effects of hyperparameter tuning on both attacks and defenses.

The team is also continuing research on investigating the leakage of training data from models and is currently examining the increased risk that techniques from explainability pose to this leakage, as well as the role that robustness plays in this risk. They seek to determine the feasibility of leakage attacks in black-box settings, where explainability methods are most likely to be used.

Background on this project can be found here.

Spotlight on Lablet Research #17 - Scalable Trust Semantics and Infrastructure

Spotlight on Lablet Research #17 -

Project: Scalable Trust Semantics and Infrastructure

Lablet: University of Kansas

Remote attestation has enormous potential for establishing trust in highly distributed IoT and cyber-physical systems. However, significant work remains to define a unifying semantics of remote attestation that precisely defines guarantees and scales to large, heterogeneous systems. Successful completion of this project will result in a science of trust and remote attestation, as well as prototype infrastructure for building remote attestation systems.

Remote attestation provides boot-time and run-time capabilities for attesting to system behavior and establishing trust. When remote attestation is used, an appraiser requests evidence from a target, which responds by performing measurements to gather evidence and adding cryptographic signatures to assure integrity and authenticity. The appraiser receives and appraises the evidence to determine whether the target is who it claims to be and is behaving as expected.
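
A stripped-down illustration of this exchange appears below; an HMAC over a nonce and a simulated measurement stands in for real attestation evidence, signatures, and hardware-backed keys, and none of the names reflect the project's actual infrastructure.

```python
# Stripped-down illustration of the appraiser/target exchange described above.
import hashlib, hmac, os

SHARED_KEY = os.urandom(32)                 # stand-in for provisioned attestation keys
GOLDEN_HASH = hashlib.sha256(b"expected firmware image").hexdigest()

def target_attest(nonce):
    """Target: measure itself and return 'signed' evidence."""
    measurement = hashlib.sha256(b"expected firmware image").hexdigest()
    mac = hmac.new(SHARED_KEY, (nonce + measurement).encode(), hashlib.sha256).hexdigest()
    return {"nonce": nonce, "measurement": measurement, "mac": mac}

def appraise(evidence):
    """Appraiser: check integrity/authenticity, then compare to the golden value."""
    expected_mac = hmac.new(SHARED_KEY, (evidence["nonce"] + evidence["measurement"]).encode(),
                            hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected_mac, evidence["mac"]):
        return "evidence rejected: bad signature"
    if evidence["measurement"] != GOLDEN_HASH:
        return "target not in expected state"
    return "target trusted"

nonce = os.urandom(16).hex()                # freshness: prevents replayed evidence
print(appraise(target_attest(nonce)))
```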

University of Kansas (KU) researchers, led by Principal Investigator (PI) Perry Alexander, developed an initial specification of an Attestation Monad that describes the formal semantics of attestation. The Attestation Monad is defined as a state monad with exceptions that provides a stateful environment for executing Copland phrases, and it serves as the basic building block for constructing various elements of remote attestation systems. Specified in Coq, the Attestation Monad has a well-defined formal semantics that defines requirements for specifying and verifying systems.

Progress on formalizing the Attestation Monad resulted in significant changes to the overall architecture. Each place in a Copland phrase had consistently been defined as an attestation manager modeled as a single Attestation Monad. As the research team began modeling attestation patterns and examples, they realized that a place is actually a collection of such monads for negotiation, attestation, and appraisal; thus, each attestation manager is a collection of negotiation, attestation, and appraisal services. Decomposing the definitions of place and attestation manager in this way maximizes flexibility, allowing an attestation manager to perform only a subset of attestation and appraisal operations or to use a collection of Attestation Monads as flexible infrastructure. Initial models of attestation patterns suggest this approach has great promise in achieving the goal of scalable trust. Researchers extended the attestation manager design to include late launch via key release, in which attestation manager boot releases keys as trust is established.

KU researchers continued the development and verification of the Attestation Virtual Machine and a compiler from Copland to the AVM. The basic compiler is largely complete and verified in Coq.

Working with MITRE, JHUAPL, and NSA, the KU team has identified an initial set of attestation architecture constructs that include: a) delegated appraisal; b) cached; c) layered; and d) mutual attestation. The expanded team has been working with a set of specific architectures and is moving towards a collection of architecture building-blocks that assemble to define architectures. For example, a layered attestation involves an appraiser delegating appraisal tasks to its subsystems. A cached attestation involves evidence gathered prior to the attestation request. Composing layered and cached attestation results in an architecture where subcomponents update an attestation agent rather than waiting for requests. A layered mutual attestation allows for several mutual attestation steps to build a full attestation result. Researchers are working to identify and specify a set of such architecture constructions that allow defining attestation systems over large systems.

The team has made significant progress on attestation protocol negotiation and on policies for defining negotiation and selection. The goal is to assure that the protocol negotiated is the best one for both appraiser and target that satisfies their local policies. Researchers began defining a lattice of protocols in which the partial ordering captures preference for the appraiser or target. Given an attestation request, the best protocol is the local maximum of the intersection of the sublattices formed by the protocols that satisfy the attestation request while respecting local policy. This is important because it captures formally what "best" means, providing a verification condition as the work moves toward a more complete model.
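
The toy sketch below conveys the negotiation intuition, with a simple ranked list standing in for the protocol lattice and its partial order; the protocol names and policies are hypothetical and the real formalization lives in Coq, not Python.

```python
# Toy illustration of negotiation: intersect the protocols each side's policy
# permits, then pick the most-preferred element of that intersection.
appraiser_policy = {"full-boot+runtime", "boot-only", "runtime-only"}
target_policy = {"boot-only", "runtime-only"}   # e.g., will not expose full state

# Appraiser's preference, strongest evidence first (a chain, for simplicity;
# the project uses a lattice with a partial order instead).
preference = ["full-boot+runtime", "runtime-only", "boot-only"]

def negotiate(appraiser, target, preference):
    acceptable = appraiser & target             # protocols satisfying both policies
    for proto in preference:                    # most-preferred acceptable protocol
        if proto in acceptable:
            return proto
    return None                                 # negotiation fails

print(negotiate(appraiser_policy, target_policy, preference))  # runtime-only
```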

Additional details on the project can be found here.

Spotlight on Lablet Research #18 - Scalable Privacy Analysis

Spotlight on Lablet Research #18 -

Project: Scalable Privacy Analysis

Lablet: International Computer Science Institute (ICSI)

ICSI researchers led by Principal Investigator (PI) Serge Egelman and Co-PI Narseo Vallina-Rodriguez have constructed a toolchain that allows them to automatically perform dynamic analysis on mobile apps to monitor what sensitive personal information the apps attempt to access, and then to whom they transmit it. This is allowing the researchers to perform large-scale studies of the privacy behaviors of the mobile app ecosystem, as well as devise new methods of protecting user privacy.

Governments and private organizations codify expectations of privacy into enforceable policy. These policies have taken such forms as legislation, contracts, and best practices, among others. Common to these rules are definitions of what constitutes private information, and which uses of that information are appropriate or inappropriate. Additionally, policies might place restrictions on what pieces of data may be collected, for what purposes it may be used, how long that data may be retained for yet-unspecified future applications, and under which circumstances (if any) are disclosure and dissemination to other parties permitted.

Different motivations drive different policies. There are procedures and restrictions meant to maintain strategic advantages for holders of sensitive information. The United States government, for instance, routinely classifies information based on the amount of harm to national interests its disclosure would bring. Other policies on data usage seek to protect vulnerable populations by establishing rules limiting how information from those individuals is collected and used: the Family Educational Rights and Privacy Act (FERPA) requires appropriate consent before an individual's educational records are disclosed; the Health Insurance Portability and Accountability Act (HIPAA) regulates the use of Protected Health Information (PHI) by defining what is considered PHI and how individual patients should be de-identified in records prior to aggregation for research purposes; and the Children's Online Privacy Protection Act (COPPA) prohibits the collection of personal information (e.g., contact information and audio/visual recordings) by online services from users under 13 years of age.

The problem is that the constraints for data usage stated in policies--be they stated privacy practices, regulations, or laws--cannot easily be compared against the technologies that they govern. To that end, ICSI researchers propose a framework to automatically compare policy against practice. Broadly, this involves identifying the relevant data usage policies and practices in a given domain, then measuring the real-world exchanges of data restricted by those rules. The results of such a method will then be used to measure and predict the harms brought upon the data's subjects and holders in the event of its unauthorized usage. In doing so, researchers will be able to infer which specific protected pieces of information, individual prohibited operations on that data, and aggregations thereof pose the highest risks compared to other items covered by the policy. This will shed light on the relationship between the unwanted collection of data, its usage and dissemination, and the resulting negative consequences. Researchers are currently building a taxonomy of the ways in which apps attempt to detect whether or not they are being monitored (specifically, whether they are running on jailbroken/rooted devices).

The PIs have given numerous talks and media interviews about this project, specifically how apps are tracking users. In 2020, for example, PI Egelman was interviewed by publications including the Washington Post, Consumer Reports, and CNET.

Additional details on the project can be found here.