# Biblio

Traffic of Industrial Control System (ICS) between the Human Machine Interface (HMI) and the Programmable Logic Controller (PLC) is known to be highly periodic. However, it is sometimes multiplexed, due to asynchronous scheduling. Modeling the network traffic patterns of multiplexed ICS streams using Deterministic Finite Automata (DFA) for anomaly detection typically produces a very large DFA and a high false-alarm rate. In this article, we introduce a new modeling approach that addresses this gap. Our Statechart DFA modeling includes multiple DFAs, one per cyclic pattern, together with a DFA-selector that de-multiplexes the incoming traffic into sub-channels and sends them to their respective DFAs. We demonstrate how to automatically construct the statechart from a captured traffic stream. Our unsupervised learning algorithms first build a Discrete-Time Markov Chain (DTMC) from the stream. Next, we split the symbols into sets, one per multiplexed cycle, based on symbol frequencies and node degrees in the DTMC graph. Then, we create a sub-graph for each cycle and extract Euler cycles for each sub-graph. The final statechart is comprised of one DFA per Euler cycle. The algorithms allow for non-unique symbols, which appear in more than one cycle, and also for symbols that appear more than once in a cycle. We evaluated our solution on traces from a production ICS using the Siemens S7-0x72 protocol. We also stress-tested our algorithms on a collection of synthetically-generated traces that simulated multiplexed ICS traces with varying levels of symbol uniqueness and time overlap. The algorithms were able to split the symbols into sets with 99.6% accuracy. The resulting statechart modeled the traces with a median false-alarm rate of as low as 0.483%. In all but the most extreme scenarios, the Statechart model drastically reduced both the false-alarm rate and the learned model size in comparison with the naive single-DFA model.

Traffic of Industrial Control System (ICS) between the Human Machine Interface (HMI) and the Programmable Logic Controller (PLC) is highly periodic. However, it is sometimes multiplexed, due to multi-threaded scheduling. In previous work we introduced a Statechart model which includes multiple Deterministic Finite Automata (DFA), one per cyclic pattern. We demonstrated that Statechart-based anomaly detection is highly effective on multiplexed cyclic traffic when the individual cyclic patterns are known. The challenge is to construct the Statechart, by unsupervised learning, from a captured trace of the multiplexed traffic, especially when the same symbols (ICS messages) can appear in multiple cycles, or multiple times in a cycle. Previously we suggested a combinatorial approach for the Statechart construction, based on Euler cycles in the Discrete Time Markov Chain (DTMC) graph of the trace. This combinatorial approach worked well in simple scenarios, but produced a false-alarm rate that was excessive on more complex multiplexed traffic. In this paper we suggest a new Statechart construction method, based on spectral analysis. We use the Fourier transform to identify the dominant periods in the trace. Our algorithm then associates a set of symbols with each dominant period, identifies the order of the symbols within each period, and creates the cyclic DFAs and the Statechart. We evaluated our solution on long traces from two production ICS: one using the Siemens S7-0x72 protocol and the other using Modbus. We also stress-tested our algorithms on a collection of synthetically-generated traces that simulate multiplexed ICS traces with varying levels of symbol uniqueness and time overlap. The resulting Statecharts model the traces with an overall median false-alarm rate as low as 0.16% on the synthetic datasets, and with zero false-alarms on production S7-0x72 traffic. Moreover, the spectral analysis Statecharts consistently out-performed the previous combinatorial Statecharts, exhibiting significantly lower false alarm rates and more compact model sizes.

In Industrial Control Systems (ICS/SCADA), machine to machine data traffic is highly periodic. Past work showed that in many cases, it is possible to model the traffic between each individual Programmable Logic Controller (PLC) and the SCADA server by a cyclic Deterministic Finite Automaton (DFA), and to use the model to detect anomalies in the traffic. However, a recent analysis of network traffic in a water facility in the U.S, showed that cyclic-DFA models have limitations. In our research, we examine the same data corpus; our study shows that the communication on all of the channels in the network is done in bursts of packets, and that the bursts have semantic meaning---the order within a burst depends on the messages. Using these observations, we suggest a new burst-DFA model that fits the data much better than previous work. Our model treats the traffic on each channel as a series of bursts, and matches each burst to the DFA, taking the burst's beginning and end into account. Our burst-DFA model successfully explains between 95% and 99% of the packets in the data-corpus, and goes a long way toward the construction of a practical anomaly detection system.