Stream-Based Machine Learning for Network Security and Anomaly Detection

Title: Stream-Based Machine Learning for Network Security and Anomaly Detection
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Mulinka, Pavol; Casas, Pedro
Conference Name: Proceedings of the 2018 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks
ISBN Number: 978-1-4503-5904-7
Keywords: Data Stream mining, high-dimensional data, machine learning, network attacks, pubcrawl, Resiliency, Scalability, security, Stochastic computing

Data stream machine learning is rapidly gaining popularity within the network monitoring community, as the big data produced by network devices and end-user terminals exceeds the memory constraints of standard monitoring equipment. Critical network monitoring applications, such as the detection of anomalies, network attacks, and intrusions, require fast and continuous mechanisms for on-line analysis of data streams. In this paper we consider a stream-based machine learning approach for network security and anomaly detection, applying and evaluating multiple machine learning algorithms in the analysis of continuously evolving network data streams. The continuous evolution of data stream analysis algorithms from the data stream mining domain, together with the multiple evaluation approaches conceived for benchmarking such algorithms, makes it difficult to choose the appropriate machine learning model. Results obtained with different evaluation approaches may differ significantly, and it is crucial to determine which approach best reflects algorithm performance. We therefore compare and analyze the results of the most recent evaluation approaches for sequential data on commonly used batch-based machine learning algorithms and their corresponding stream-based extensions, for the specific problem of on-line network security and anomaly detection. Consistent with our previous findings on off-line machine learning approaches for network security and anomaly detection, our results suggest that adaptive random forests and stochastic gradient descent models are able to keep up with important concept drifts in the underlying network data streams, maintaining high accuracy through continuous re-training at concept drift detection times.
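As a rough illustration of what evaluating a stream-based learner under concept drift can look like, the sketch below runs a prequential (test-then-train) loop, one commonly used evaluation scheme for sequential data. It is not the paper's method or data: the abstract does not say which evaluation schemes or datasets were used, so the synthetic two-feature stream, the plain SGD logistic regression model, the learning rate, and all names (`OnlineLogisticSGD`, `synthetic_stream`, `prequential_accuracy`) are assumptions made for illustration. The model here adapts through continuous incremental updates only; it has no explicit drift detector and is not an adaptive random forest.

```python
import math
import random


def sigmoid(z):
    # Clamp the score so math.exp cannot overflow on extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class OnlineLogisticSGD:
    """Logistic regression updated one sample at a time with SGD (illustrative)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def _score(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def predict(self, x):
        return 1 if sigmoid(self._score(x)) >= 0.5 else 0

    def learn(self, x, y):
        # Gradient of the log-loss for a single sample.
        err = sigmoid(self._score(x)) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def synthetic_stream(n, drift_at, seed=0):
    """Yield (x, y) pairs; the labelling rule is inverted from index drift_at on.

    This abrupt label flip is a hypothetical stand-in for concept drift.
    """
    rng = random.Random(seed)
    for i in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = 1 if x[0] + x[1] > 0 else 0
        if i >= drift_at:
            y = 1 - y
        yield x, y


def prequential_accuracy(model, stream, window=200):
    """Test-then-train: predict each sample first, then learn from it.

    Returns the accuracy of each consecutive, non-overlapping window.
    """
    accuracies = []
    correct = 0
    for i, (x, y) in enumerate(stream, start=1):
        correct += int(model.predict(x) == y)  # test on the unseen sample
        model.learn(x, y)                      # then train on it
        if i % window == 0:
            accuracies.append(correct / window)
            correct = 0
    return accuracies


window_acc = prequential_accuracy(
    OnlineLogisticSGD(n_features=2),
    synthetic_stream(4000, drift_at=2000),
)
for k, acc in enumerate(window_acc):
    print(f"window {k:2d}: accuracy {acc:.3f}")
```

In this toy setting the windowed accuracy drops in the window that follows the drift point and then recovers as the model keeps updating. That only shows the general behaviour the abstract attributes to continuously re-trained models, and it says nothing about the accuracy figures reported in the paper.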

Citation Key: mulinka_stream-based_2018