A Knowledge Representation and Information Fusion Framework for Decision Making in Complex Cyber-Physical Systems Poster.pdf

Performance monitoring data (e.g., measurements, logs, events) are becoming increasingly abundant and inexpensive to collect in modern distributed complex systems such as computer systems and networks, integrated buildings, industrial systems, transportation networks and power grids. With efficient exploration of such data, health monitoring, diagnosis and prognosis can be greatly improved beyond the current state of the art. From the perspective of monitoring, anomaly detection and root-cause analysis of such systems, technical challenges arise from the large number of subsystems that are highly interactive and operate in diverse modes.

A semi-supervised tool for root-cause analysis in complex systems has been proposed and validated. It builds on a data-driven framework for system-wide time-series anomaly detection in distributed complex systems and uses a spatiotemporal feature extraction scheme, built on the concept of symbolic dynamics, to discover and represent causal interactions between subsystems. The proposed tool aims to (i) capture multiple operational modes as nominal in complex CPSs, (ii) train the model using only nominal data and artificially generated fault data, without requiring true labeled anomaly data, and (iii) perform root-cause analysis in a semi-supervised way across a diversity of faults (e.g., one failed pattern, multiple failed patterns, one fault node, and multiple fault nodes). We present two approaches for root-cause analysis: sequential state switching (S3, based on the free energy concept of a Restricted Boltzmann Machine, RBM) and artificial anomaly association (A3, a multi-label classification framework using deep neural networks, DNN). Synthetic data from cases with failed pattern(s) and a faulty node are simulated to validate the proposed approaches, which are then compared with a vector autoregressive (VAR) model in terms of root-cause analysis performance. On this synthetic data, both approaches demonstrate high accuracy in finding failed patterns and diagnosing the anomalous node.
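To make the two building blocks named above more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes equal-frequency partitioning for the symbolic dynamics step and a one-step transition matrix between two subsystems as the interaction feature. The abstract specifies neither choice, and the function names `symbolize`, `cross_transition_matrix` and `rbm_free_energy` are hypothetical. The free energy expression is the standard one for an RBM with binary units.

```python
import numpy as np

def symbolize(x, n_symbols=4):
    """Map a real-valued series to integer symbols 0..n_symbols-1.

    Assumption: equal-frequency (quantile) partitioning; the abstract
    does not say which partitioning scheme is used.
    """
    x = np.asarray(x, dtype=float)
    # Interior quantile edges split the data into n_symbols bins.
    edges = np.quantile(x, np.linspace(0, 1, n_symbols + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")

def cross_transition_matrix(sym_a, sym_b, n_symbols=4):
    """Estimate P(next symbol of b | current symbol of a).

    Add-one smoothing keeps every row a valid distribution even when
    a symbol of series a is never observed.
    """
    counts = np.ones((n_symbols, n_symbols))
    for s, t in zip(sym_a[:-1], sym_b[1:]):
        counts[s, t] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def rbm_free_energy(v, W, b, c):
    """Free energy of a binary RBM for visible vector v.

    F(v) = -b.v - sum_j log(1 + exp(c_j + (v W)_j)),
    with W of shape (n_visible, n_hidden), visible bias b, hidden bias c.
    """
    v = np.asarray(v, dtype=float)
    return -v @ b - np.sum(np.logaddexp(0.0, c + v @ W))
```

A short usage example on made-up data:

```python
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b_series = np.roll(a, 1) + 0.1 * rng.normal(size=500)  # b lags a by one step
P = cross_transition_matrix(symbolize(a), symbolize(b_series))
```

How S3 uses the free energy is not detailed in this abstract. One plausible reading is that a binary vector of pattern states is scored by an RBM trained on nominal data, so that unusually high free energy flags a departure from nominal behavior.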

While the current work focuses on validating the method across a large variety of scenarios and on diagnosing changes of patterns, future work will pursue the following: (i) an optimal reasoning strategy for node failure inference, covering both single and multiple nodes, and (ii) detection and root-cause analysis of simultaneous multiple faults in distributed complex systems.
This project is supported by the National Science Foundation under Grant No. CNS-1464279.

Creative Commons 2.5