Practical and White-Box Anomaly Detection through Unsupervised and Active Learning

Title: Practical and White-Box Anomaly Detection through Unsupervised and Active Learning
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Wang, Y., Wang, Z., Xie, Z., Zhao, N., Chen, J., Zhang, W., Sui, K., Pei, D.
Conference Name: 2020 29th International Conference on Computer Communications and Networks (ICCCN)
Keywords: active learning, anomaly detection, composability, Forestry, iRRCF, key performance indicators, KPI anomaly detection framework, Labeling, Metrics, Monitoring, Neural networks, pubcrawl, random forests, resilience, Resiliency, robust random cut forest, RRCF, RRCF algorithm, security, security of data, supervised learning, time series, Time series analysis, Unsupervised Anomaly Detection, unsupervised learning, user experience, white box, White Box Security, white-box anomaly detection

To ensure quality of service and user experience, large Internet companies often monitor various Key Performance Indicators (KPIs) of their systems so that they can detect anomalies and identify failures in real time. However, because of the large number and variety of KPIs and the lack of high-quality labels, existing KPI anomaly detection approaches either perform well only on certain types of KPIs or consume excessive resources. Therefore, to realize generic and practical KPI anomaly detection in the real world, we propose a KPI anomaly detection framework named iRRCF-Active, which contains an unsupervised and white-box anomaly detector based on Robust Random Cut Forest (RRCF), and an active learning component. Specifically, we propose an improved RRCF (iRRCF) algorithm to overcome the drawbacks of applying the original RRCF to KPI anomaly detection. In addition, we incorporate active learning so that the model benefits from high-quality labels given by experienced operators. We conduct extensive experiments on a large-scale public dataset and a private dataset collected from a large commercial bank. The experimental results demonstrate that iRRCF-Active performs better than existing traditional statistical methods, unsupervised learning methods, and supervised learning methods. Moreover, each component of iRRCF-Active is shown to be effective and indispensable.

Citation Key: wang_practical_2020