Biblio

Filters: Keyword is sampling methods
Olaimat, M. Al, Lee, D., Kim, Y., Kim, J., Kim, J..  2020.  A Learning-based Data Augmentation for Network Anomaly Detection. 2020 29th International Conference on Computer Communications and Networks (ICCCN). :1–10.
While machine learning technologies have advanced remarkably over the past several years, one of the fundamental requirements for the success of learning-based approaches is the availability of high-quality data that thoroughly represents the individual classes in a problem space. Unfortunately, many datasets exhibit a significant degree of class imbalance, with only a few instances of the minority classes; network traffic traces, for example, are highly skewed toward a large number of normal connections, with very few attack instances. A well-known approach to the class imbalance problem is data augmentation, which generates synthetic instances belonging to the minority classes. However, traditional statistical techniques may be limited, since data extended through statistical sampling has the same density as the original instances, with only a minor degree of variation. This paper takes a learning-based approach to data augmentation to enable effective network anomaly detection. One of the critical challenges for the learning-based approach is the mode collapse problem, which results in a limited diversity of samples and which we also observed in our preliminary experiments. To this end, we present a novel "Divide-Augment-Combine" (DAC) strategy, which groups instances by their characteristics and augments the data on a per-group basis, representing each subset independently using a generative adversarial model. Our experimental results on two recently collected public network datasets (UNSW-NB15 and IDS-2017) show that the proposed technique improves anomaly-identification performance by up to 21.5%.
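As an illustration of the "Divide-Augment-Combine" idea described in the abstract above, the following minimal sketch groups minority-class samples and augments each group independently. It substitutes simple within-group interpolation for the per-group generative adversarial model used in the paper, so it shows only the divide/augment/combine structure, not the paper's method; all names here are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def dac_augment(X_min, n_groups=3, n_new=30, seed=0):
    """Divide-Augment-Combine sketch: cluster minority samples into
    groups, then synthesize new points inside each group independently
    (interpolation stands in for the per-group GAN) and combine."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_groups, n_init=10,
                    random_state=seed).fit_predict(X_min)
    synthetic = []
    for g in range(n_groups):
        group = X_min[labels == g]
        if len(group) < 2:
            continue  # too few samples to interpolate within this group
        for _ in range(n_new // n_groups):
            i, j = rng.choice(len(group), size=2, replace=False)
            t = rng.random()
            synthetic.append(group[i] + t * (group[j] - group[i]))
    if not synthetic:
        return X_min
    return np.vstack([X_min, np.array(synthetic)])
```

A per-group generator is what keeps the augmentation from collapsing all synthetic samples onto a single mode of the minority class.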
Wang, Hui, Yan, Qiurong, Li, Bing, Yuan, Chenglong, Wang, Yuhao.  2019.  Sampling Time Adaptive Single-Photon Compressive Imaging. IEEE Photonics Journal. 11:1–10.
We propose a time-adaptive sampling method and demonstrate a sampling-time-adaptive single-photon compressive imaging system. To achieve self-adapting adjustment of the sampling time, a threshold on the accuracy of light-intensity estimation is derived. Based on this threshold, a sampling control module built on a field-programmable gate array is developed. Finally, the advantage of the time-adaptive sampling method is demonstrated experimentally. Imaging experiments show that the method automatically adjusts the sampling time in response to changes in the light intensity of the imaged object, yielding images of better quality while avoiding guesswork in the selection of sampling time.
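The core control loop described above can be sketched in a few lines: for Poisson photon counting, the relative error of an intensity estimate shrinks as 1/sqrt(N), so sampling can stop once a count-based accuracy threshold is met. This is a simulation-only sketch (the paper implements the loop on an FPGA); the rates and thresholds here are made-up illustrative values.

```python
import numpy as np

def adaptive_dwell(rate_hz, rel_err=0.05, dt=1e-3, max_time=10.0, seed=0):
    """Accumulate simulated photon counts until the estimated relative
    error of the intensity estimate (~1/sqrt(N) for Poisson counting)
    drops below rel_err, then stop; returns (intensity, dwell time)."""
    rng = np.random.default_rng(seed)
    counts, t = 0, 0.0
    while t < max_time:
        counts += rng.poisson(rate_hz * dt)  # photons in one time slice
        t += dt
        if counts > 0 and 1.0 / np.sqrt(counts) < rel_err:
            break  # accuracy threshold reached: adapt dwell time down
    return counts / t, t

# brighter scenes satisfy the accuracy threshold sooner
_, t_bright = adaptive_dwell(50000)
_, t_dim = adaptive_dwell(2000)
```

The point of the adaptation is visible in the two calls: the bright scene needs a much shorter dwell time than the dim one to reach the same estimation accuracy.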
Hasanin, Tawfiq, Khoshgoftaar, Taghi M., Leevy, Joffrey L..  2019.  A Comparison of Performance Metrics with Severely Imbalanced Network Security Big Data. 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI). :83–88.

Severe class imbalance between the majority and minority classes in large datasets can bias machine learning classifiers toward the majority class. Our work uniquely consolidates two case studies, each utilizing three learners implemented within an Apache Spark framework, six sampling methods, and five sampling distribution ratios, to analyze the effect of severe class imbalance on big data analytics. We evaluate this study with three performance metrics: Area Under the Receiver Operating Characteristic Curve, Area Under the Precision-Recall Curve, and Geometric Mean. In the first case study, models were trained on one dataset (POST) and tested on another (SlowlorisBig). In the second case study, the training and testing dataset roles were switched. Our comparison of performance metrics shows that Area Under the Precision-Recall Curve and Geometric Mean are sensitive to changes in the sampling distribution ratio, whereas Area Under the Receiver Operating Characteristic Curve is relatively unaffected. In addition, we demonstrate that when comparing sampling methods, borderline-SMOTE2 outperforms the other methods in the first case study, and Random Undersampling is the top performer in the second case study.
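The three metrics compared in this study can be computed directly. A minimal sketch using scikit-learn follows, with the area under the precision-recall curve approximated by average precision (a common approximation, not necessarily the exact estimator the paper used) and the geometric mean taken over the true-positive and true-negative rates:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, confusion_matrix

def evaluate(y_true, y_score, threshold=0.5):
    """Compute AUC-ROC, AUC-PR (via average precision), and the
    Geometric Mean of TPR and TNR for binary classification scores."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tpr = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    tnr = tn / (tn + fp) if tn + fp else 0.0  # specificity
    return {
        "auc_roc": roc_auc_score(y_true, y_score),
        "auc_pr": average_precision_score(y_true, y_score),
        "g_mean": np.sqrt(tpr * tnr),
    }
```

Note that the two threshold-free metrics (AUC-ROC, AUC-PR) rank the scores, while G-mean depends on the chosen decision threshold, which is one reason the metrics can react differently to resampling.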

Ullah, Imtiaz, Mahmoud, Qusay H..  2019.  A Two-Level Hybrid Model for Anomalous Activity Detection in IoT Networks. 2019 16th IEEE Annual Consumer Communications Networking Conference (CCNC). :1–6.
In this paper we propose a two-level hybrid anomalous activity detection model for intrusion detection in IoT networks. The level-1 model uses flow-based anomaly detection and classifies network traffic as normal or anomalous; the flow-based features are extracted from the CICIDS2017 and UNSW-NB15 datasets. If anomalous activity is detected, the flow is forwarded to the level-2 model, which finds the category of the anomaly by deeply examining the contents of the packet. The level-2 model uses Recursive Feature Elimination (RFE) to select significant features, the Synthetic Minority Over-sampling Technique (SMOTE) for oversampling, and Edited Nearest Neighbors (ENN) for cleaning the CICIDS2017 and UNSW-NB15 datasets. The level-1 model's precision, recall, and F score were measured at 100% for the CICIDS2017 dataset and 99% for the UNSW-NB15 dataset, while the level-2 model's were measured at 100% for the CICIDS2017 dataset and 97% for the UNSW-NB15 dataset. The predictor introduced in this paper provides a solid framework for the development of malicious activity detection in IoT networks.
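The level-2 preprocessing pipeline (RFE for feature selection, SMOTE-style oversampling) can be sketched as below. This is an illustrative stand-in on synthetic data, not the paper's pipeline: the SMOTE step is hand-rolled to keep the sketch self-contained (imbalanced-learn's `SMOTEENN` provides the full SMOTE+ENN combination), the ENN cleaning step is omitted for brevity, and the logistic-regression base estimator and all data shapes are assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE-style oversampling: synthesize points on segments
    between a minority sample and one of its k nearest minority
    neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=min(k + 1, len(X_min))).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i][rng.integers(1, idx.shape[1])]  # skip self at position 0
        new.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.array(new)

# sketch of the level-2 preprocessing: select features, then rebalance
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 1.2).astype(int)  # rare class
selector = RFE(LogisticRegression(max_iter=1000),
               n_features_to_select=4).fit(X, y)
X_sel = selector.transform(X)
X_min = X_sel[y == 1]
X_syn = smote_like(X_min, n_new=(y == 0).sum() - (y == 1).sum())
```

After this step the minority class is brought up to parity before the level-2 classifier is trained.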
Zhang, F., Chan, P. P. K., Tang, T. Q..  2015.  L-GEM based robust learning against poisoning attack. 2015 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR). :175–178.

A poisoning attack, in which an adversary misleads the learning process by manipulating its training set, significantly degrades the performance of classifiers in security applications. This paper proposes a robust learning method that reduces the influence of attack samples on learning. The sensitivity, defined as the fluctuation of the output under small perturbations of the input, in the Localized Generalization Error Model (L-GEM) is measured for each training sample. The classifier's output on attack samples may be sensitive and inaccurate, since these samples differ from other, untainted samples. An importance score is assigned to each sample according to its localized generalization error bound, and the classifier is trained on a new training set obtained by resampling the samples according to their importance scores. An RBF neural network (RBFNN) is applied as the classifier in the experimental evaluation. The proposed model outperforms the traditional one under well-known label-flip poisoning attacks, including nearest-first and farthest-first flip attacks.
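The sensitivity-weighted resampling idea can be sketched as follows. This is only an approximation of the approach: it estimates each sample's output fluctuation empirically by perturbing the inputs (rather than via the L-GEM bound), and it uses a logistic-regression classifier as a stand-in for the paper's RBFNN; the importance formula is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sensitivity_resample(model, X, y, eps=0.05, n_perturb=20, seed=0):
    """Estimate each training sample's output fluctuation under small
    input perturbations, assign low importance to highly sensitive
    samples, and redraw the training set with those weights."""
    rng = np.random.default_rng(seed)
    p0 = model.predict_proba(X)[:, 1]
    sens = np.zeros(len(X))
    for _ in range(n_perturb):
        Xp = X + rng.normal(scale=eps, size=X.shape)
        sens += np.abs(model.predict_proba(Xp)[:, 1] - p0)
    sens /= n_perturb
    importance = 1.0 / (1e-8 + sens)   # sensitive samples get low weight
    importance /= importance.sum()
    idx = rng.choice(len(X), size=len(X), replace=True, p=importance)
    return X[idx], y[idx]
```

Retraining the classifier on the resampled set is what damps the influence of flipped-label samples, which tend to sit in sensitive regions of the decision surface.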

Clark, M., Lampe, L..  2015.  Single-channel compressive sampling of electrical data for non-intrusive load monitoring. 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP). :790–794.

Non-intrusive load monitoring (NILM) extracts information about how energy is being used in a building from electricity measurements collected at a single location. Obtaining measurements at only one location is attractive because it is inexpensive and convenient, but it can result in large amounts of data from high frequency electrical measurements. Different ways to compress or selectively measure this data are therefore required for practical implementations of NILM. We explore the use of random filtering and random demodulation, techniques that are closely related to compressed sensing, to offer a computationally simple way of compressing the electrical data. We show how these techniques can allow one to reduce the sampling rate of the electricity measurements, while requiring only one sampling channel and allowing accurate NILM performance. Our tests are performed using real measurements of electrical signals from a public data set, thus demonstrating their effectiveness on real appliances and allowing for reproducibility and comparison with other data management strategies for NILM.
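The measurement side of random demodulation, one of the two compression techniques explored above, can be sketched in a few lines: the high-rate signal is multiplied by a pseudo-random ±1 chipping sequence and then integrate-and-dumped over fixed windows, so only one low-rate sampling channel is needed. The signal, rates, and decimation factor below are illustrative, and the sparse-recovery step that reconstructs the signal from these measurements is omitted.

```python
import numpy as np

def random_demodulate(x, decimation, seed=0):
    """Random-demodulation measurement sketch: mix the signal with a
    pseudo-random +/-1 chip sequence, then integrate-and-dump over
    windows of `decimation` samples (one low-rate output channel)."""
    rng = np.random.default_rng(seed)
    chips = rng.choice([-1.0, 1.0], size=len(x))
    mixed = x * chips
    n = (len(x) // decimation) * decimation
    return mixed[:n].reshape(-1, decimation).sum(axis=1)

# toy current waveform: 60 Hz fundamental plus a third harmonic
fs = 12000
t = np.arange(fs) / fs
current = np.sin(2 * np.pi * 60 * t) + 0.3 * np.sin(2 * np.pi * 180 * t)
y = random_demodulate(current, decimation=20)  # 20x fewer samples
```

Because electrical load signatures are approximately sparse in a suitable basis, compressed-sensing solvers can recover the information NILM needs from the decimated stream `y`.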

Baraldi, A., Boschetti, L., Humber, M.L..  2014.  Probability Sampling Protocol for Thematic and Spatial Quality Assessment of Classification Maps Generated From Spaceborne/Airborne Very High Resolution Images. IEEE Transactions on Geoscience and Remote Sensing. 52:701–760.

To deliver sample estimates with the necessary probability foundation to permit generalization from the sample data subset to the whole target population being sampled, probability sampling strategies are required to satisfy three necessary, but not sufficient, conditions: 1) all inclusion probabilities must be greater than zero in the target population to be sampled; if some sampling units have an inclusion probability of zero, then a map accuracy assessment does not represent the entire target region depicted in the map to be assessed. 2) The inclusion probabilities must be a) knowable for nonsampled units and b) known for the units selected in the sample: since the inclusion probability determines the weight attached to each sampling unit in the accuracy estimation formulas, if the inclusion probabilities are unknown, so are the estimation weights. This original work presents a novel (to the best of these authors' knowledge, the first) probability sampling protocol for quality assessment and comparison of thematic maps generated from spaceborne/airborne very high resolution images, in which: 1) an original Categorical Variable Pair Similarity Index (proposed in two different formulations) is estimated as a fuzzy degree of match between a reference and a test semantic vocabulary, which may not coincide, and 2) both symbolic pixel-based thematic quality indicators (TQIs) and sub-symbolic object-based spatial quality indicators (SQIs) are estimated with a degree of uncertainty in measurement, in compliance with the well-known Quality Assurance Framework for Earth Observation (QA4EO) guidelines. Like a decision tree, any protocol (guidelines for best practice) comprises a set of rules, equivalent to structural knowledge, and an order of presentation of the rule set, known as procedural knowledge; the combination of these two levels of knowledge makes an original protocol worth more than the sum of its parts.
The several novel aspects of the proposed probability sampling protocol are highlighted in this paper, at the levels of both structural and procedural knowledge, in comparison with related multi-disciplinary works selected from the existing literature. In the experimental session, the proposed protocol is tested for accuracy validation of preliminary classification maps automatically generated by the Satellite Image Automatic Mapper (SIAM™) software product from two WorldView-2 images and one QuickBird-2 image provided by DigitalGlobe for testing purposes. In these experiments, the collected TQIs and SQIs are statistically valid, statistically significant, consistent across maps, and in agreement with theoretical expectations, visual (qualitative) evidence, and the quantitative quality indexes of operativeness (OQIs) claimed for SIAM™ by related papers. As a subsidiary conclusion, the statistically consistent and statistically significant accuracy validation of the SIAM™ pre-classification maps proposed in this contribution, together with the OQIs claimed for SIAM™ by related works, makes the operational (automatic, accurate, near real-time, robust, scalable) SIAM™ software product eligible for opening up new inter-disciplinary research and market opportunities, in accordance with the visionary goal of the Global Earth Observation System of Systems initiative and the QA4EO international guidelines.
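The role of the inclusion probabilities stressed in this abstract is easiest to see in the classical Horvitz-Thompson estimator, where each sampled unit is weighted by the inverse of its inclusion probability. The sketch below is a generic illustration of that weighting, not the paper's map-accuracy estimation formulas:

```python
import numpy as np

def horvitz_thompson_total(values, incl_prob):
    """Horvitz-Thompson estimate of a population total: weight each
    sampled unit by 1/(its inclusion probability). This is why every
    inclusion probability must be known and greater than zero."""
    incl_prob = np.asarray(incl_prob, dtype=float)
    if np.any(incl_prob <= 0):
        raise ValueError("all inclusion probabilities must be > 0")
    return float(np.sum(np.asarray(values, dtype=float) / incl_prob))
```

For an equal-probability sample of n units from a population of N, every inclusion probability is n/N and the estimator reduces to N times the sample mean; unequal (but known) probabilities simply change the per-unit weights.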