Visible to the public Biblio

Filters: Keyword is fault localization  [Clear All Filters]
2020-10-06
Zaman, Tarannum Shaila, Han, Xue, Yu, Tingting.  2019.  SCMiner: Localizing System-Level Concurrency Faults from Large System Call Traces. 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). :515—526.

Localizing concurrency faults that occur in production is hard because, (1) detailed field data, such as user input, file content and interleaving schedule, may not be available to developers to reproduce the failure; (2) it is often impractical to assume the availability of multiple failing executions to localize the faults using existing techniques; (3) it is challenging to search for buggy locations in an application given limited runtime data; and, (4) concurrency failures at the system level often involve multiple processes or event handlers (e.g., software signals), which can not be handled by existing tools for diagnosing intra-process(thread-level) failures. To address these problems, we present SCMiner, a practical online bug diagnosis tool to help developers understand how a system-level concurrency fault happens based on the logs collected by the default system audit tools. SCMiner achieves online bug diagnosis to obviate the need for offline bug reproduction. SCMiner does not require code instrumentation on the production system or rely on the assumption of the availability of multiple failing executions. Specifically, after the system call traces are collected, SCMiner uses data mining and statistical anomaly detection techniques to identify the failure-inducing system call sequences. It then maps each abnormal sequence to specific application functions. We have conducted an empirical study on 19 real-world benchmarks. The results show that SCMiner is both effective and efficient at localizing system-level concurrency faults.

2019-06-28
Gulzar, Muhammad Ali.  2018.  Interactive and Automated Debugging for Big Data Analytics. Proceedings of the 40th International Conference on Software Engineering: Companion Proceeedings. :509-511.

An abundance of data in many disciplines of science, engineering, national security, health care, and business has led to the emerging field of Big Data Analytics that run in a cloud computing environment. To process massive quantities of data in the cloud, developers leverage Data-Intensive Scalable Computing (DISC) systems such as Google's MapReduce, Hadoop, and Spark. Currently, developers do not have easy means to debug DISC applications. The use of cloud computing makes application development feel more like batch jobs and the nature of debugging is therefore post-mortem. Developers of big data applications write code that implements a data processing pipeline and test it on their local workstation with a small sample data, downloaded from a TB-scale data warehouse. They cross fingers and hope that the program works in the expensive production cloud. When a job fails or they get a suspicious result, data scientists spend hours guessing at the source of the error, digging through post-mortem logs. In such cases, the data scientists may want to pinpoint the root cause of errors by investigating a subset of corresponding input records. The vision of my work is to provide interactive, real-time and automated debugging services for big data processing programs in modern DISC systems with minimum performance impact. My work investigates the following research questions in the context of big data analytics: (1) What are the necessary debugging primitives for interactive big data processing? (2) What scalable fault localization algorithms are needed to help the user to localize and characterize the root causes of errors? (3) How can we improve testing efficiency during iterative development of DISC applications by reasoning the semantics of dataflow operators and user-defined functions used inside dataflow operators in tandem? To answer these questions, we synthesize and innovate ideas from software engineering, big data systems, and program analysis, and coordinate innovations across the software stack from the user-facing API all the way down to the systems infrastructure.

2019-03-25
Refaat, S. S., Mohamed, A., Kakosimos, P..  2018.  Self-Healing control strategy; Challenges and opportunities for distribution systems in smart grid. 2018 IEEE 12th International Conference on Compatibility, Power Electronics and Power Engineering (CPE-POWERENG 2018). :1–6.
Implementation of self-healing control system in smart grid is a persisting challenge. Self-Healing control strategy is the important guarantee to implement the smart grid. In addition, it is the support of achieving the secure operation, improving the reliability and security of distribution grid, and realizing the smart distribution grid. Although self-healing control system concept is presented in smart grid context, but the complexity of distribution network structure recommended to choose advanced control and protection system using a self-healing, this system must be able to heal any disturbance in the distribution system of smart grid to improve efficiency, resiliency, continuity, and reliability of the smart grid. This review focuses mostly on the key technology of self-healing control, gives an insight into the role of self-healing in distribution system advantages, study challenges and opportunities in the prospect of utilities. The main contribution of this paper is demonstrating proposed architecture, control strategy for self-healing control system includes fault detection, fault localization, faulted area isolation, and power restoration in the electrical distribution system.
2018-05-16
Abbas, Waseem, Perelman, Lina Sela, Amin, Saurabh, Koutsoukos, Xenofon.  2017.  Resilient Sensor Placement for Fault Localization in Water Distribution Networks. Proceedings of the 8th International Conference on Cyber-Physical Systems. :165–174.

In this paper, we study the sensor placement problem in urban water networks that maximizes the localization of pipe failures given that some sensors give incorrect outputs. False output of a sensor might be the result of degradation in sensor's hardware, software fault, or might be due to a cyber attack on the sensor. Incorrect outputs from such sensors can have any possible values which could lead to an inaccurate localization of a failure event. We formulate the optimal sensor placement problem with erroneous sensors as a set multicover problem, which is NP-hard, and then discuss a polynomial time heuristic to obtain efficient solutions. In this direction, we first examine the physical model of the disturbance propagating in the network as a result of a failure event, and outline the multi-level sensing model that captures several event features. Second, using a combinatorial approach, we solve the problem of sensor placement that maximizes the localization of pipe failures by selecting m sensors out of which at most e give incorrect outputs. We propose various localization performance metrics, and numerically evaluate our approach on a benchmark and a real water distribution network. Finally, using computational experiments, we study relationships between design parameters such as the total number of sensors, the number of sensors with errors, and extracted signal features.

2018-05-09
Gulzar, Muhammad Ali, Interlandi, Matteo, Han, Xueyuan, Li, Mingda, Condie, Tyson, Kim, Miryung.  2017.  Automated Debugging in Data-Intensive Scalable Computing. Proceedings of the 2017 Symposium on Cloud Computing. :520–534.

Developing Big Data Analytics workloads often involves trial and error debugging, due to the unclean nature of datasets or wrong assumptions made about data. When errors (e.g., program crash, outlier results, etc.) arise, developers are often interested in identifying a subset of the input data that is able to reproduce the problem. BigSift is a new faulty data localization approach that combines insights from automated fault isolation in software engineering and data provenance in database systems to find a minimum set of failure-inducing inputs. BigSift redefines data provenance for the purpose of debugging using a test oracle function and implements several unique optimizations, specifically geared towards the iterative nature of automated debugging workloads. BigSift improves the accuracy of fault localizability by several orders-of-magnitude ($\sim$103 to 107×) compared to Titian data provenance, and improves performance by up to 66× compared to Delta Debugging, an automated fault-isolation technique. For each faulty output, BigSift is able to localize fault-inducing data within 62% of the original job running time.

2016-04-08
Abbas, Waseem, Perelman, Lina Sela, Amin, Saurabh, Koutsoukos, Xenofon.  2015.  An Efficient Approach to Fault Identification in Urban Water Networks Using Multi-Level Sensing. Proceedings of the 2Nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments. :147–156.

The objective of this work is to develop an efficient and practical sensor placement method for the failure detection and localization in water networks. We formulate the problem as the minimum test cover problem (MTC) with the objective of selecting the minimum number of sensors required to uniquely identify and localize pipe failure events. First, we summarize a single-level sensing model and discuss an efficient fast greedy approach for solving the MTC problem. Simulation results on benchmark test networks demonstrate the efficacy of the fast greedy algorithm. Second, we develop a multi-level sensing model that captures additional physical features of the disturbance event, such as the time lapsed between the occurrence of disturbance and its detection by the sensor. Our sensor placement approach using MTC extends to the multi-level sensing model and an improved identification performance is obtained via reduced number of sensors (in comparison to single-level sensing model). In particular, we investigate the bi-level sensing model to illustrate the efficacy of employing multi-level sensors for the identification of failure events. Finally, we suggest extensions of our approach for the deployment of heterogeneous sensors in water networks by exploring the trade-off between cost and performance (measured in terms of the identification score of pipe/link failures).