A user-centric machine learning framework for cyber security operations center

Title: A user-centric machine learning framework for cyber security operations center
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Feng, C., Wu, S., Liu, N.
Conference Name: 2017 IEEE International Conference on Intelligence and Security Informatics (ISI)
Keywords: artificial intelligence security, computer security, cyber security operation center, cyber security operations center, cyber security practitioners, Data collection, Data models, data scientists, false positive rate reduction, feature engineering, flag alerts, Human Behavior, label creation, learning (artificial intelligence), Learning systems, machine learning algorithm selection, machine learning algorithms, machine learning researchers, machine learning system, malicious attacks, Mathematical model, Metrics, model performance evaluations, Predictive models, preventive technologies, pubcrawl, Resiliency, risk score generation, risky user detection, Scalability, security event normalization, security information and event management system, security of data, SIEM, SOC analyst productivity, Symantec SOC production environment, user-centric, user-centric machine learning framework

To assure the cyber security of an enterprise, a SIEM (Security Information and Event Management) system is typically in place to normalize security events from different preventive technologies and flag alerts. Analysts in the security operations center (SOC) investigate the alerts to decide whether each one is truly malicious. However, the number of alerts is generally overwhelming: most of them are false positives, and their volume exceeds the SOC's capacity to handle them all. As a result, potential malicious attacks and compromised hosts may be missed. Machine learning is a viable approach to reducing the false positive rate and improving the productivity of SOC analysts. In this paper, we develop a user-centric machine learning framework for the cyber security operations center in a real enterprise environment. We discuss the typical data sources in a SOC, their workflow, and how to leverage and process these data sets to build an effective machine learning system. The paper is targeted towards two groups of readers. The first group is data scientists or machine learning researchers who do not have cyber security domain knowledge but want to build machine learning systems for security operations centers. The second group is cyber security practitioners who have deep knowledge and expertise in cyber security but do not have machine learning experience and wish to build such a system themselves. Throughout the paper, we use the system we built in the Symantec SOC production environment as an example to demonstrate the complete steps, from data collection, label creation, feature engineering, and machine learning algorithm selection to model performance evaluation and risk score generation.
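The abstract names the pipeline stages (labels, features, algorithm, evaluation, risk score) without detailing them. The following is a minimal, hypothetical sketch of how those stages might connect; it is not the system built in the Symantec SOC. It assumes that alerts have already been normalized by a SIEM into (user, alert_type) pairs, that labels come from analyst verdicts on past investigations, and that per-user alert counts serve as features. A hand-rolled logistic regression stands in for whatever algorithm the authors actually selected. All function names, alert types, and data are illustrative.

```python
# Hypothetical sketch: per-user alert counts -> logistic regression -> 0-100 risk score.
# Not the authors' implementation; names, alert types, and data are illustrative only.
import math
from collections import defaultdict


def user_features(alerts, alert_types):
    """Aggregate (user, alert_type) pairs into per-user count vectors."""
    index = {t: i for i, t in enumerate(alert_types)}
    counts = defaultdict(lambda: [0] * len(alert_types))
    for user, alert_type in alerts:
        if alert_type in index:  # ignore alert types outside the feature set
            counts[user][index[alert_type]] += 1
    return dict(counts)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression weights with plain batch gradient descent."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        m = len(X)
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def risk_scores(features, w, b):
    """Scale each user's predicted probability of being risky to 0-100."""
    return {
        user: round(100 * sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b), 1)
        for user, x in features.items()
    }


# Toy data: normalized alerts and analyst verdicts (1 = confirmed incident).
ALERT_TYPES = ["malware", "auth_fail", "proxy_block"]
alerts = [
    ("alice", "malware"), ("alice", "malware"), ("alice", "auth_fail"),
    ("bob", "proxy_block"),
    ("carol", "malware"), ("carol", "auth_fail"), ("carol", "auth_fail"),
    ("dave", "proxy_block"), ("dave", "proxy_block"),
    ("erin", "malware"), ("erin", "malware"), ("erin", "malware"),
]
labels = {"alice": 1, "bob": 0, "carol": 1, "dave": 0}  # erin is unlabeled

features = user_features(alerts, ALERT_TYPES)
labeled = sorted(labels)
w, b = train_logistic([features[u] for u in labeled], [labels[u] for u in labeled])
scores = risk_scores(features, w, b)

# Analysts would work the queue from the highest risk score down.
for user, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{user}: {score}")
```

In this sketch, aggregating alerts per user (one possible reading of "user-centric") means analysts review a ranked list of users instead of individual alerts. The unlabeled user `erin` also receives a score from the trained model.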

Citation Key: feng_user-centric_2017