Visible to the public Biblio

Filters: Keyword is Training data  [Clear All Filters]
Y. Cao, J. Yang.  2015.  Towards Making Systems Forget with Machine Unlearning. 2015 IEEE Symposium on Security and Privacy. :463-480.
Today's systems produce a rapidly exploding amount of data, and the data further derives more data, forming a complex data propagation network that we call the data's lineage. There are many reasons that users want systems to forget certain data including its lineage. From a privacy perspective, users who become concerned with new privacy risks of a system often want the system to forget their data and lineage. From a security perspective, if an attacker pollutes an anomaly detector by injecting manually crafted data into the training data set, the detector must forget the injected data to regain security. From a usability perspective, a user can remove noise and incorrect entries so that a recommendation engine gives useful recommendations. Therefore, we envision forgetting systems, capable of forgetting certain data and their lineages, completely and quickly. This paper focuses on making learning systems forget, the process of which we call machine unlearning, or simply unlearning. We present a general, efficient unlearning approach by transforming learning algorithms used by a system into a summation form. To forget a training data sample, our approach simply updates a small number of summations – asymptotically faster than retraining from scratch. Our approach is general, because the summation form is from the statistical query learning in which many machine learning algorithms can be implemented. Our approach also applies to all stages of machine learning, including feature selection and modeling. Our evaluation, on four diverse learning systems and real-world workloads, shows that our approach is general, effective, fast, and easy to use.
Z. Abaid, M. A. Kaafar, S. Jha.  2017.  Quantifying the impact of adversarial evasion attacks on machine learning based android malware classifiers. 2017 IEEE 16th International Symposium on Network Computing and Applications (NCA). :1-10.
With the proliferation of Android-based devices, malicious apps have increasingly found their way to user devices. Many solutions for Android malware detection rely on machine learning; although effective, these are vulnerable to attacks from adversaries who wish to subvert these algorithms and allow malicious apps to evade detection. In this work, we present a statistical analysis of the impact of adversarial evasion attacks on various linear and non-linear classifiers, using a recently proposed Android malware classifier as a case study. We systematically explore the complete space of possible attacks varying in the adversary's knowledge about the classifier; our results show that it is possible to subvert linear classifiers (Support Vector Machines and Logistic Regression) by perturbing only a few features of malicious apps, with more knowledgeable adversaries degrading the classifier's detection rate from 100% to 0% and a completely blind adversary able to lower it to 12%. We show non-linear classifiers (Random Forest and Neural Network) to be more resilient to these attacks. We conclude our study with recommendations for designing classifiers to be more robust to the attacks presented in our work.