Biblio

Filters: Keyword is Fault tolerant systems
Torkura, Kennedy A., Sukmana, Muhammad I.H., Cheng, Feng, Meinel, Christoph.  2019.  Security Chaos Engineering for Cloud Services: Work In Progress. 2019 IEEE 18th International Symposium on Network Computing and Applications (NCA). :1–3.
The majority of security breaches in cloud infrastructure in recent years are caused by human errors and misconfigured resources. Novel security models are imperative to overcome these issues. Such models must be customer-centric, continuous, not focused on traditional security paradigms like intrusion detection and adopt proactive techniques. Thus, this paper proposes CloudStrike, a cloud security system that implements the principles of Chaos Engineering to enable the aforementioned properties. Chaos Engineering is an emerging discipline employed to prevent non-security failures in cloud infrastructure via Fault Injection Testing techniques. CloudStrike employs similar techniques with a focus on injecting failures that impact security i.e. integrity, confidentiality and availability. Essentially, CloudStrike leverages the relationship between dependability and security models. Preliminary experiments provide insightful and prospective results.
Liem, Clifford, Murdock, Dan, Williams, Andrew, Soukup, Martin.  2019.  Highly Available, Self-Defending, and Malicious Fault-Tolerant Systems for Automotive Cybersecurity. 2019 IEEE 19th International Conference on Software Quality, Reliability and Security Companion (QRS-C). :24–27.
With the growing number of electronic features in cars and their connections to the cloud, smartphones, road-side equipment, and neighboring cars, the need for effective cybersecurity is paramount. Beyond the concern of brand degradation, warranty fraud, and recalls, what keeps manufacturers up at night is the threat of malicious attacks which can affect the safety of vehicles on the road. Would any single protection technique provide the security needed over the long lifetime of a vehicle? We present a new methodology for automotive cybersecurity where the designs are made to withstand attacks in the future based on the concepts of high availability and malicious fault-tolerance through self-defending techniques. When a system has an intrusion, self-defending technologies work to contain the breach using integrity verification, self-healing, and fail-over techniques to keep the system running.
Babasaheb, Desai Rahul, Raman, Indhumathi.  2018.  Survey on Fault Tolerance and Security in Mobile Ad Hoc Networks (MANETs). 2018 3rd International Conference for Convergence in Technology (I2CT). :1–5.
Providing fault tolerance in Mobile Ad hoc Networks (MANETs) is a very tricky activity, as nodes migrate from one place to another and change the network topology. MANETs are also very susceptible to various attacks such as DoS attacks, so providing security in a MANET is also a very difficult job. Multipath protocols provide better results than unipath protocols. Multipath protocols provide fault tolerance, but many multipath protocols for MANETs do not target security issues. Distributed and cooperative security, that is, an Intrusion Detection System (IDS), gives better security to MANETs. In this paper we discuss many challenges and concerns regarding fault tolerance and IDS.
Ma, Siyou, Yan, Yunqiang.  2018.  Simulation Testing of Fault-Tolerant CPS Based on Hierarchical Adaptive Policies. 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C). :443–449.

Cyber physical systems (CPS) are often deployed in safety-critical key infrastructures and fields, and fault tolerance policies are extensively applied in CPS to improve their credibility; the same physical backup of hardware redundancy (SPB) technology is frequently used for its simple and reliable implementation. To resolve the challenges faced in simulation testing of SPB-CPS, this paper dynamically determines the test resources matched to the CPS scale by using adaptive allocation policies, and establishes hierarchical models and an inter-layer message transmission mechanism. Meanwhile, a collaborative simulation time sequence push strategy and a node activity test mechanism based on a sliding window are designed to improve the execution efficiency of the simulation test. In order to validate the effectiveness of the proposed method, we built a fault-tolerant CPS simulation platform. Experiments showed that it can improve the SPB-CPS simulation test efficiency.

Xu, Zheng, Abraham, Jacob.  2019.  Resilient Reorder Buffer Design for Network-on-Chip. 20th International Symposium on Quality Electronic Design (ISQED). :92–97.

Functionally safe control logic design without full duplication is difficult due to the complexity of random control logic. The Reorder buffer (ROB) is a control logic function commonly used in high performance computing systems. In this study, we focus on a safe ROB design used in an industry quality Network-on-Chip (NoC) Advanced eXtensible Interface (AXI) Network Interface (NI) block. We developed and applied area efficient safe design techniques including partial duplication, Error Detection Code (EDC) and invariance checking with formal proofs and showed that we can achieve a desired safe Diagnostic Coverage (DC) requirement with small area and power overheads and no performance degradation.

Gotsman, Alexey, Lefort, Anatole, Chockler, Gregory.  2019.  White-Box Atomic Multicast. 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :176–187.

Atomic multicast is a communication primitive that delivers messages to multiple groups of processes according to some total order, with each group receiving the projection of the total order onto messages addressed to it. To be scalable, atomic multicast needs to be genuine, meaning that only the destination processes of a message should participate in ordering it. In this paper we propose a novel genuine atomic multicast protocol that in the absence of failures takes as low as 3 message delays to deliver a message when no other messages are multicast concurrently to its destination groups, and 5 message delays in the presence of concurrency. This improves the latencies of both the fault-tolerant version of classical Skeen's multicast protocol (6 or 12 message delays, depending on concurrency) and its recent improvement by Coelho et al. (4 or 8 message delays). To achieve such low latencies, we depart from the typical way of guaranteeing fault-tolerance by replicating each group with Paxos. Instead, we weave Paxos and Skeen's protocol together into a single coherent protocol, exploiting opportunities for white-box optimisations. We experimentally demonstrate that the superior theoretical characteristics of our protocol are reflected in practical performance pay-offs.

Sengupta, Anirban, Kachave, Deepak.  2018.  Integrating Compiler Driven Transformation and Simulated Annealing Based Floorplan for Optimized Transient Fault Tolerant DSP Cores. 2018 IEEE International Symposium on Smart Electronic Systems (iSES) (Formerly iNiS). :17–20.
Reliability of electronic devices in sub-nanometer technology scale has become a major concern. However, demand for battery operated, low power, high performance devices necessitates technology scaling. To meet these contradictory design goals, optimization and reliability must be addressed simultaneously. This paper proposes integrating compiler driven transformation and a simulated annealing based optimization process to generate an optimized, low cost, transient fault tolerant DSP core. The case study on an FIR filter shows improved performance (in terms of reduced area and delay) of the proposed approach in comparison to a state-of-the-art transient fault tolerant approach.
Lawson, M., Lofstead, J..  2018.  Using a Robust Metadata Management System to Accelerate Scientific Discovery at Extreme Scales. 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage Data Intensive Scalable Computing Systems (PDSW-DISCS). :13–23.
Our previous work, which can be referred to as EMPRESS 1.0, showed that rich metadata management provides a relatively low-overhead approach to facilitating insight from scale-up scientific applications. However, this system did not provide the functionality needed for a viable production system or address whether such a system could scale. Therefore, we have extended our previous work to create EMPRESS 2.0, which incorporates the features required for a useful production system. Through a discussion of EMPRESS 2.0, this paper explores how to incorporate rich query functionality, fault tolerance, and atomic operations into a scalable, storage system independent metadata management system that is easy to use. This paper demonstrates that such a system offers significant performance advantages over HDF5, providing metadata querying that is 150X to 650X faster, and can greatly accelerate post-processing. Finally, since the current implementation of EMPRESS 2.0 relies on an RDBMS, this paper demonstrates that an RDBMS is a viable technology for managing data-oriented metadata.
Popov, P..  2017.  Models of Reliability of Fault-Tolerant Software Under Cyber-Attacks. 2017 IEEE 28th International Symposium on Software Reliability Engineering (ISSRE). :228–239.

This paper offers a new approach to modelling the effect of cyber-attacks on reliability of software used in industrial control applications. The model is based on the view that successful cyber-attacks introduce failure regions, which are not present in non-compromised software. The model is then extended to cover a fault tolerant architecture, such as the 1-out-of-2 software, popular for building industrial protection systems. The model is used to study the effectiveness of software maintenance policies such as patching and "cleansing" ("proactive recovery") under different adversary models ranging from independent attacks to sophisticated synchronized attacks on the channels. We demonstrate that the effect of attacks on reliability of diverse software significantly depends on the adversary model. Under synchronized attacks system reliability may be more than an order of magnitude worse than under independent attacks on the channels. These findings, although not surprising, highlight the importance of using an adequate adversary model in the assessment of how effective various cyber-security controls are.

Resch, S., Paulitsch, M..  2017.  Using TLA+ in the Development of a Safety-Critical Fault-Tolerant Middleware. 2017 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). :146–152.

Creating and implementing fault-tolerant distributed algorithms is a challenging task in highly safety-critical industries. Using formal methods supports design and development of complex algorithms. However, formal methods are often perceived as an unjustifiable overhead. This paper presents the experience and insights when using TLA+ and PlusCal to model and develop fault-tolerant and safety-critical modules for TAS Control Platform, a platform for railway control applications up to safety integrity level (SIL) 4. We show how formal methods helped us improve the correctness of the algorithms, improved development efficiency and how part of the gap between model and implementation has been closed by translation to C code. Additionally, we describe how we gained trust in the formal model and tools by following a specific design process called property-driven design, which also implicitly addresses software quality metrics such as code coverage metrics.

Guan, Z., Si, G., Du, X., Liu, P., Zhang, Z., Zhou, Z..  2017.  Protecting User Privacy Based on Secret Sharing with Fault Tolerance for Big Data in Smart Grid. 2017 IEEE International Conference on Communications (ICC). :1–6.

In smart grid, large quantities of data are collected from various applications, such as smart metering, substation state monitoring, electric energy data acquisition, and smart home. Big data acquired in smart grid applications is usually sensitive. For instance, in order to dispatch accurately and support dynamic pricing, many smart meters are installed at users' houses to collect real-time data, but all of this collected data is related to user privacy. In this paper, we propose a data aggregation scheme based on secret sharing with fault tolerance in smart grid, which ensures that the control center gets the integrated data without revealing users' privacy. Meanwhile, we also consider fault tolerance during the data aggregation. Finally, we analyze the security of our scheme and carry out experiments to validate the results.
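The abstract does not say which secret-sharing construction the scheme uses. As an illustration only, the sketch below assumes Shamir (t, n) threshold sharing over a prime field; the names `PRIME`, `make_shares`, `aggregate`, and `reconstruct` are hypothetical and not taken from the paper. Each meter splits its reading into shares, each aggregator sums the shares it holds, and the control center recovers only the total, even when some aggregators are unavailable.

```python
import secrets

# Field modulus (assumption): a Mersenne prime larger than any possible sum of readings.
PRIME = 2**61 - 1


def make_shares(secret, threshold, num_shares):
    """Split `secret` into Shamir shares; any `threshold` of them reconstruct it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Recover the polynomial's constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def aggregate(per_meter_shares):
    """Aggregator k adds up the k-th share of every meter; no reading is revealed."""
    totals = []
    for k in range(len(per_meter_shares[0])):
        x = per_meter_shares[0][k][0]
        y = sum(shares[k][1] for shares in per_meter_shares) % PRIME
        totals.append((x, y))
    return totals


if __name__ == "__main__":
    readings = [12, 30, 7]  # hypothetical meter readings
    per_meter = [make_shares(r, threshold=3, num_shares=5) for r in readings]
    totals = aggregate(per_meter)
    print(reconstruct(totals[:3]))  # total recovered from aggregators 1-3
    print(reconstruct(totals[2:]))  # same total if aggregators 1 and 2 have failed
```

Because Shamir shares are additively homomorphic, summing shares point by point yields shares of the summed reading, so any 3 of the 5 aggregators suffice in this example. That is one way fault tolerance and privacy can coexist; the paper's actual protocol may differ.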

Chatti, S., Ounelli, H..  2017.  Fault Tolerance in a Cloud of Databases Environment. 2017 31st International Conference on Advanced Information Networking and Applications Workshops (WAINA). :166–171.

We focus on the concept of serializability in order to ensure the correct processing of transactions. However, both serializability and relevant properties within transaction-based applications might be affected. Ensuring transaction serialization in corrupt systems is one of the demands for properly handling interrelated transactions, which prevents blocking situations that involve the inability to commit either a transaction or its related sub-transactions. In addition, some transactions may be marked as malicious, and they compromise the serialization of the running system. In this context, this paper proposes an approach for the processing of transactions in a cloud of databases environment that is able to secure serializability in running transactions whether the system is compromised or not. We also propose an intrusion tolerant scheme to ensure the continuity of the running transactions. A case study and a simulation result are shown to illustrate the capabilities of the suggested system.

Mondal, S. K., Sabyasachi, A. S., Muppala, J. K..  2017.  On Dependability, Cost and Security Trade-Off in Cloud Data Centers. 2017 IEEE 22nd Pacific Rim International Symposium on Dependable Computing (PRDC). :11–19.

The performance, dependability, and security of cloud service systems are vital for their ongoing operation, control, and support. Thus, controlled improvement in service requires a comprehensive analysis and systematic identification of the fundamental underlying constituents of the cloud using a rigorous discipline. In this paper, we introduce a framework which helps identify areas for potential cloud service enhancements. A cloud service cannot be completed if there is a failure in any of its underlying resources. In addition, resources are kept offline for scheduled maintenance. We use redundant resources to mitigate the impact of failures and maintenance in order to ensure performance and dependability, which helps enhance security as well. For example, at least 4 replicas are required to defend against the intrusion of a single instance or a single malicious attack/fault as defined by Byzantine Fault Tolerance (BFT). Data centers with high performance, dependability, and security are outsourced to the cloud computing environment with greater flexibility in the cost of owning the computing infrastructure. In this paper, we analyze the effectiveness of redundant resource usage in terms of a dependability metric and the cost of service deployment based on the priority of service requests. The trade-off among dependability, cost, and security under different redundancy schemes is characterized through comprehensive analytical models.
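The figure of 4 replicas follows from the classical Byzantine fault tolerance bound n ≥ 3f + 1 for f faulty replicas. A minimal sketch of that arithmetic, with illustrative function names that are not taken from the paper:

```python
def min_replicas_bft(f: int) -> int:
    """Minimum number of replicas needed to tolerate f Byzantine (arbitrary) faults."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def quorum_size_bft(f: int) -> int:
    """Quorum size commonly used with n = 3f + 1 replicas (any two quorums share a correct replica)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1


if __name__ == "__main__":
    for f in (1, 2, 3):
        print(f, min_replicas_bft(f), quorum_size_bft(f))
```

With f = 1 this gives the 4 replicas mentioned in the abstract; each additional tolerated fault adds three replicas, which is the cost side of the trade-off the paper analyzes.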

He, S., Shu, Y., Cui, X., Wei, C., Chen, J., Shi, Z..  2017.  A Trust Management Based Framework for Fault-Tolerant Barrier Coverage in Sensor Networks. 2017 IEEE Wireless Communications and Networking Conference (WCNC). :1–6.

Barrier coverage has been widely adopted to prevent unauthorized invasion of important areas in sensor networks. As sensors are typically placed outdoors, they are susceptible to faults. Previous works assumed that faulty sensors are easy to recognize, e.g., they may stop functioning or output apparently deviant sensory data. In practice, however, it is extremely difficult to recognize faulty sensors as well as their invalid output. In this paper, we propose a novel fault-tolerant intrusion detection algorithm (TrusDet) based on trust management to address this challenging issue. TrusDet comprises three steps: i) sensor-level detection, ii) sink-level decision by collective voting, and iii) trust management and fault determination. In Steps i) and ii), TrusDet divides the surveillance area into a set of fine-grained subareas and exploits temporal and spatial correlation of sensory output among sensors in different subareas to yield a more accurate and robust performance of barrier coverage. In Step iii), TrusDet builds a trust management based framework to determine the confidence level of sensors being faulty. We implement TrusDet on HC-SR501 infrared sensors and demonstrate that TrusDet has a desired performance.

Duan, S., Li, Y., Levitt, K..  2016.  Cost sensitive moving target consensus. 2016 IEEE 15th International Symposium on Network Computing and Applications (NCA). :272–281.

Consensus is a fundamental approach to implementing fault-tolerant services through replication. It is well known that there exists a tradeoff between the cost and the resilience. For instance, Crash Fault Tolerant (CFT) protocols have a low cost but can only handle crash failures while Byzantine Fault Tolerant (BFT) protocols handle arbitrary failures but have a higher cost. Hybrid protocols enjoy the benefits of both high performance without failures and high resiliency under failures by switching among different subprotocols. However, it is challenging to determine which subprotocols should be used. We propose a moving target approach to switch among protocols according to the existing system and network vulnerability. At the core of our approach is a formalized cost model that evaluates the vulnerability and performance of consensus protocols based on real-time Intrusion Detection System (IDS) signals. Based on the evaluation results, we demonstrate that a safe, cheap, and unpredictable protocol is always used and a high IDS error rate can be tolerated.

Zheng, J., Okamura, H., Dohi, T..  2016.  Performance Evaluation of VM-based Intrusion Tolerant Systems with Poisson Arrivals. 2016 Fourth International Symposium on Computing and Networking (CANDAR). :181–187.

Computer security has become an increasingly important topic in the computer and communication industry, since it is important to support critical business processes and to protect personal and sensitive information. Computer security aims to preserve the security attributes (confidentiality, integrity and availability) of computer systems, which face threats such as denial-of-service (DoS), viruses and intrusion. To ensure high computer security, the intrusion tolerance technique based on a fault-tolerant scheme has been widely applied. This paper presents a quantitative performance evaluation of a virtual machine (VM) based intrusion tolerant system. Concretely, two security measures are derived: MTTSF (mean time to security failure) and the effective traffic intensity. The mathematical analysis is achieved by using Laplace-Stieltjes transforms according to the analysis of the M/G/1 queueing system.
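For reference, the standard M/G/1 results that this kind of analysis builds on can be written as follows. These are textbook queueing formulas, not the paper's MTTSF derivation. The notation is assumed here: λ is the Poisson arrival rate, S the service time, B*(s) the Laplace-Stieltjes transform of the service-time distribution, and W_q the waiting time in queue.

```latex
\rho = \lambda\,\mathbb{E}[S], \qquad
W_q^{*}(s) = \frac{(1-\rho)\,s}{s - \lambda + \lambda B^{*}(s)}, \qquad
\mathbb{E}[W_q] = \frac{\lambda\,\mathbb{E}[S^{2}]}{2\,(1-\rho)}, \qquad \rho < 1 .
```

The first expression is the traffic intensity, the second is the Pollaczek-Khinchine transform of the waiting time, and the third is the corresponding mean waiting time.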

Regainia, L., Salva, S., Ecuhcurs, C..  2016.  A classification methodology for security patterns to help fix software weaknesses. 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA). :1–8.

Security patterns are generic solutions that can be applied from the early stages of the software life cycle to overcome recurrent security weaknesses. Their generic nature and growing number make their choice difficult, even for experts in system design. To help them with the pattern choice, this paper proposes a semi-automatic methodology of classification and the classification itself, which exposes relationships among software weaknesses, security principles and security patterns. It expresses which patterns remove a given weakness with respect to the security principles that have to be addressed to fix the weakness. The methodology is based on seven steps, which anatomize patterns and weaknesses into a set of more precise sub-properties that are associated through a hierarchical organization of security principles. These steps provide the detailed justifications of the resulting classification and allow its upgrade. Without loss of generality, this classification has been established for Web applications and covers 185 software weaknesses, 26 security patterns and 66 security principles. Research supported by the industrial chair on Digital Confidence (

Ansari, M. R., Yu, S., Yu, Q..  2015.  IntelliCAN: Attack-resilient Controller Area Network (CAN) for secure automobiles. 2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS). :233–236.

Controller Area Network (CAN) is the main bus network that connects electronic control units in automobiles. Although CAN protocols have been revised to improve vehicle safety, the security weaknesses of CAN have not been fully addressed. Security threats to automobiles might come from external wireless communication or from internal malicious CAN nodes mounted on the CAN bus. Despite the various threat sources, the security weakness of CAN is the root of the security problems. Due to the limited computation power and storage capacity of each CAN node, there is a lack of hardware-efficient protection methods for the CAN system that do not lose compatibility with CAN protocols. To save cost and maintain compatibility, we propose to exploit the built-in CAN fault confinement mechanism to detect masquerade attacks originating from malicious CAN devices on the CAN bus. Simulation results show that our method achieves an attack misdetection rate on the order of 10^-5 and reduces the encryption latency by up to 68% over the complete frame encryption method.

Rui Zhou, Rong Min, Qi Yu, Chanjuan Li, Yong Sheng, Qingguo Zhou, Xuan Wang, Kuan-Ching Li.  2014.  Formal Verification of Fault-Tolerant and Recovery Mechanisms for Safe Node Sequence Protocol. Advanced Information Networking and Applications (AINA), 2014 IEEE 28th International Conference on. :813-820.

Fault tolerance has a huge impact on embedded safety-critical systems. The Safe Node Sequence Protocol (SNSP) is designed as a technology that contributes to this impact. In this paper, we present a mechanism for fault tolerance and recovery based on SNSP to strengthen system robustness, and the correctness of a fault-tolerant prototype system is analyzed and verified. In order to verify the correctness of more than thirty failure modes, we have partitioned the complete protocol state machine into several subsystems, followed by the injection of corresponding fault classes into dedicated independent models. Experiments demonstrate that this method effectively reduces the size of the overall state space, and verification results indicate that the protocol is able to recover from the fault model in a fault-tolerant system and continue to operate as errors occur.

Turguner, C..  2014.  Secure fault tolerance mechanism of wireless Ad-Hoc networks with mobile agents. Signal Processing and Communications Applications Conference (SIU), 2014 22nd. :1620-1623.

Mobile Ad-Hoc Networks are dynamic, wireless, self-organizing networks in which many mobile nodes are weakly connected to each other. Compared with traditional networks, they suffer failures that prevent the system from working properly. In addition, we have to cope with many security issues such as unauthorized attempts, security threats and reliability. Using mobile agents in ad-hoc networks with low-level fault tolerance provides fault masking that users never notice. Mobile agent migration among nodes, autonomous choice of alternative paths, and high-level fault tolerance make networks with low bandwidth and a high failure ratio more reliable. In this paper we describe the fault tolerance characteristics of mobile agents and existing fault tolerance methods based on mobile agents. Also, for ad-hoc networks that need security precautions beyond fault tolerance, we present a new model: the Secure Mobile Agent Based Fault Tolerance Model.

Liu, J.N.K., Yanxing Hu, You, J.J., Yulin He.  2014.  An advancing investigation on reduct and consistency for decision tables in Variable Precision Rough Set models. Fuzzy Systems (FUZZ-IEEE), 2014 IEEE International Conference on. :1496-1503.

The Variable Precision Rough Set (VPRS) model is one of the most important extensions of the Classical Rough Set (RS) theory. It employs a majority inclusion relation mechanism in order to make the Classical RS model more fault tolerant, and therefore the generalization of the model is improved. This paper can be viewed as an extension of previous investigations on the attribute reduction problem in the VPRS model. In our investigation, we illustrate with examples that the previously proposed reduct definitions may spoil the hidden classification ability of a knowledge system by ignoring certain essential attributes in some circumstances. Consequently, by proposing a new β-consistent notion, we analyze the relationship between the structures of a Decision Table (DT) and different definitions of reduct in the VPRS model. Then we give a new notion of β-complement reduct that can avoid the defects of the reduct notions defined in previous literature. We also supply a method to obtain the β-complement reduct using a decision table splitting algorithm, and finally demonstrate the feasibility of our approach with sample instances.

Di Benedetto, M.D., D'Innocenzo, A., Smarra, F..  2014.  Fault-tolerant control of a wireless HVAC control system. Communications, Control and Signal Processing (ISCCSP), 2014 6th International Symposium on. :235-238.

In this paper we address the problem of designing a fault tolerant control scheme for an HVAC control system where sensing and actuation data are exchanged with a centralized controller via a wireless sensors and actuators network where the communication nodes are subject to permanent failures and malicious intrusions.

Hua Chai, Wenbing Zhao.  2014.  Towards trustworthy complex event processing. Software Engineering and Service Science (ICSESS), 2014 5th IEEE International Conference on. :758-761.

Complex event processing has become an important technology for big data and intelligent computing because it facilitates the creation of actionable, situational knowledge from a potentially large number of events in soft realtime. Complex event processing can be instrumental for many mission-critical applications, such as business intelligence, algorithmic stock trading, and intrusion detection. Hence, the servers that carry out complex event processing must be made trustworthy. In this paper, we present a threat analysis of complex event processing systems and describe a set of mechanisms that can be used to control various threats. By exploiting the application semantics of typical event processing operations, we are able to design lightweight mechanisms that incur minimal runtime overhead, appropriate for soft realtime computing.

Wenbing Zhao.  2014.  Application-Aware Byzantine Fault Tolerance. Dependable, Autonomic and Secure Computing (DASC), 2014 IEEE 12th International Conference on. :45-50.

Byzantine fault tolerance has been intensively studied over the past decade as a way to enhance the intrusion resilience of computer systems. However, state-machine-based Byzantine fault tolerance algorithms require deterministic application processing and sequential execution of totally ordered requests. One way of increasing the practicality of Byzantine fault tolerance is to exploit the application semantics, which we refer to as application-aware Byzantine fault tolerance. Application-aware Byzantine fault tolerance makes it possible to facilitate concurrent processing of requests, to minimize the use of Byzantine agreement, and to identify and control replica nondeterminism. In this paper, we provide an overview of recent works on application-aware Byzantine fault tolerance techniques. We elaborate the need for exploiting application semantics for Byzantine fault tolerance and the benefits of doing so, provide a classification of various approaches to application-aware Byzantine fault tolerance, and outline the mechanisms used in achieving application-aware Byzantine fault tolerance according to our classification.
