A Learning-based Data Augmentation for Network Anomaly Detection

Title: A Learning-based Data Augmentation for Network Anomaly Detection
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Olaimat, M. Al, Lee, D., Kim, Y., Kim, J., Kim, J.
Conference Name: 2020 29th International Conference on Computer Communications and Networks (ICCCN)
Date Published: August
Keywords: anomaly detection, attack instances, augments data, class imbalance problem, data augmentation, data handling, data instances, Data models, Divide-Augment-Combine, divide-augment-combine strategy, Gallium nitride, Generative Adversarial Learning, generative adversarial model, generative adversarial network, generative adversarial networks, Generators, high-quality data, learning (artificial intelligence), learning-based data augmentation, machine learning, network anomaly detection, network traffic, network traffic traces, neural nets, pattern classification, Predictive Metrics, pubcrawl, public network datasets, Resiliency, sampling methods, Scalability, security of data, statistical sampling, Support vector machines, synthetic instances
Abstract: While machine learning technologies have advanced remarkably over the past several years, one of the fundamental requirements for the success of learning-based approaches is the availability of high-quality data that thoroughly represent the individual classes in a problem space. Unfortunately, it is not uncommon to observe a significant degree of class imbalance, with only a few instances for minority classes, in many datasets. Network traffic traces are one example: they are highly skewed toward normal connections and contain very few attack instances. A well-known approach to addressing the class imbalance problem is data augmentation, which generates synthetic instances belonging to minority classes. However, traditional statistical techniques may be limited, since data extended through statistical sampling should have the same density as the original data instances, with only a minor degree of variation. This paper takes a learning-based approach to data augmentation to enable effective network anomaly detection. One of the critical challenges for the learning-based approach is the mode collapse problem, which results in limited sample diversity and which we also observed in our preliminary experiments. To this end, we present a novel "Divide-Augment-Combine" (DAC) strategy, which groups the instances based on their characteristics and augments data on a group basis, using a generative adversarial model to represent each subset independently. Our experiments with two recently collected public network datasets (UNSW-NB15 and IDS-2017) show that the proposed technique improves performance by up to 21.5% in identifying network anomalies.
Citation Key: olaimat_learning-based_2020
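
The "Divide-Augment-Combine" strategy described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation. The record does not say how instances are grouped, so a small k-means is assumed here. The per-group generator is a simple per-feature Gaussian sampler that stands in for the generative adversarial model the paper trains on each group. All function names and parameters (`divide`, `augment`, `divide_augment_combine`, `k`, `n_new_per_group`) are hypothetical.

```python
import random
from statistics import mean, pstdev


def divide(instances, k, iters=20, seed=0):
    """Divide: group instances with a tiny k-means (an assumed grouping criterion)."""
    rng = random.Random(seed)
    centers = rng.sample(instances, k)
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in instances:
            # Assign each instance to its nearest center (squared Euclidean distance).
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])),
            )
            groups[j].append(x)
        # Recompute each center; keep the old one if its group is empty.
        centers = [
            tuple(mean(col) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return [g for g in groups if g]


def augment(group, n_new, rng):
    """Augment: placeholder generator fitted to ONE group.

    Assumption: a per-feature Gaussian replaces the generative adversarial
    model that the paper trains independently for each group.
    """
    cols = list(zip(*group))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]
    return [
        tuple(rng.gauss(m, s) for m, s in zip(mus, sigmas)) for _ in range(n_new)
    ]


def divide_augment_combine(minority, k, n_new_per_group, seed=0):
    """Combine: original minority instances plus synthetic ones from every group."""
    rng = random.Random(seed)
    combined = list(minority)
    for group in divide(minority, k, seed=seed):
        combined.extend(augment(group, n_new_per_group, rng))
    return combined
```

Fitting one generator per group, rather than one generator for the whole minority class, is the part of this sketch that corresponds to the abstract's remedy for mode collapse: each group contributes its own synthetic instances to the combined set.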