Visible to the public Biblio

Filters: Author is Li, Y.  [Clear All Filters]
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
Chang, B., Zhang, F., Chen, B., Li, Y., Zhu, W., Tian, Y., Wang, Z., Ching, A..  2018.  MobiCeal: Towards Secure and Practical Plausibly Deniable Encryption on Mobile Devices. 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :454–465.

We introduce MobiCeal, the first practical Plausibly Deniable Encryption (PDE) system for mobile devices that can defend against strong coercive multi-snapshot adversaries, who may examine the storage medium of a user's mobile device at different points of time and force the user to decrypt data. MobiCeal relies on "dummy write" to obfuscate the differences between multiple snapshots of storage medium due to existence of hidden data. By incorporating PDE in block layer, MobiCeal supports a broad deployment of any block-based file systems on mobile devices. More importantly, MobiCeal is secure against side channel attacks which pose a serious threat to existing PDE schemes. A proof of concept implementation of MobiCeal is provided on an LG Nexus 4 Android phone using Android 4.2.2. It is shown that the performance of MobiCeal is significantly better than prior PDE systems against multi-snapshot adversaries.

Duan, S., Li, Y., Levitt, K..  2016.  Cost sensitive moving target consensus. 2016 IEEE 15th International Symposium on Network Computing and Applications (NCA). :272–281.

Consensus is a fundamental approach to implementing fault-tolerant services through replication. It is well known that there exists a tradeoff between the cost and the resilience. For instance, Crash Fault Tolerant (CFT) protocols have a low cost but can only handle crash failures while Byzantine Fault Tolerant (BFT) protocols handle arbitrary failures but have a higher cost. Hybrid protocols enjoy the benefits of both high performance without failures and high resiliency under failures by switching among different subprotocols. However, it is challenging to determine which subprotocols should be used. We propose a moving target approach to switch among protocols according to the existing system and network vulnerability. At the core of our approach is a formalized cost model that evaluates the vulnerability and performance of consensus protocols based on real-time Intrusion Detection System (IDS) signals. Based on the evaluation results, we demonstrate that a safe, cheap, and unpredictable protocol is always used and a high IDS error rate can be tolerated.

Huo, T., Wang, W., Zhao, P., Li, Y., Wang, T., Li, M..  2020.  TEADS: A Defense-Aware Framework for Synthesizing Transient Execution Attacks. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :320—327.

Since 2018, a broad class of microarchitectural attacks called transient execution attacks (e.g., Spectre and Meltdown) have been disclosed. By abusing speculative execution mechanisms in modern CPUs, these attacks enable adversaries to leak secrets across security boundaries. A transient execution attack typically evolves through multiple stages, termed the attack chain. We find that current transient execution attacks usually rely on static attack chains, resulting in that any blockage in an attack chain may cause the failure of the entire attack. In this paper, we propose a novel defense-aware framework, called TEADS, for synthesizing transient execution attacks dynamically. The main idea of TEADS is that: each attacking stage in a transient execution attack chain can be implemented in several ways, and the implementations used in different attacking stages can be combined together under certain constraints. By constructing an attacking graph representing combination relationships between the implementations and testing available paths in the attacking graph dynamically, we can finally synthesize transient execution attacks which can bypass the imposed defense techniques. Our contributions include: (1) proposing an automated defense-aware framework for synthesizing transient execution attacks, even though possible combinations of defense strategies are enabled; (2) presenting an attacking graph extension algorithm to detect potential attack chains dynamically; (3) implementing TEADS and testing it on several modern CPUs with different protection settings. Experimental results show that TEADS can bypass the defenses equipped, improving the adaptability and durability of transient execution attacks.

Li, W., Song, T., Li, Y., Ma, L., Yu, J., Cheng, X..  2017.  A Hierarchical Game Framework for Data Privacy Preservation in Context-Aware IoT Applications. 2017 IEEE Symposium on Privacy-Aware Computing (PAC). :176–177.

Due to the increasing concerns of securing private information, context-aware Internet of Things (IoT) applications are in dire need of supporting data privacy preservation for users. In the past years, game theory has been widely applied to design secure and privacy-preserving protocols for users to counter various attacks, and most of the existing work is based on a two-player game model, i.e., a user/defender-attacker game. In this paper, we consider a more practical scenario which involves three players: a user, an attacker, and a service provider, and such a complicated system renders any two-player model inapplicable. To capture the complex interactions between the service provider, the user, and the attacker, we propose a hierarchical two-layer three-player game framework. Finally, we carry out a comprehensive numerical study to validate our proposed game framework and theoretical analysis.

Li, Y., Zhou, W., Wang, H..  2020.  F-DPC: Fuzzy Neighborhood-Based Density Peak Algorithm. IEEE Access. 8:165963–165972.
Clustering is a concept in data mining, which divides a data set into different classes or clusters according to a specific standard, making the similarity of data objects in the same cluster as large as possible. Clustering by fast search and find of density peaks (DPC) is a novel clustering algorithm based on density. It is simple and novel, only requiring fewer parameters to achieve better clustering effect, without the requirement for iterative solution. And it has expandability and can detect the clustering of any shape. However, DPC algorithm still has some defects, such as it employs the clear neighborhood relations to calculate local density, so it cannot identify the neighborhood membership of different values of points from the distance of points and It is impossible to accurately cluster the data of the multi-density peak. The fuzzy neighborhood density peak clustering algorithm is proposed for this shortcoming (F-DPC): novel local density is defined by the fuzzy neighborhood relationship. The fuzzy set theory can be used to make the fuzzy neighborhood function of local density more sensitive, so that the clustering for data set of various shapes and densities is more robust. Experiments show that the algorithm has high accuracy and robustness.
Li, Y., Ji, X., Li, C., Xu, X., Yan, W., Yan, X., Chen, Y., Xu, W..  2020.  Cross-domain Anomaly Detection for Power Industrial Control System. 2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC). :383—386.

In recent years, artificial intelligence has been widely used in the field of network security, which has significantly improved the effect of network security analysis and detection. However, because the power industrial control system is faced with the problem of shortage of attack data, the direct deployment of the network intrusion detection system based on artificial intelligence is faced with the problems of lack of data, low precision, and high false alarm rate. To solve this problem, we propose an anomaly traffic detection method based on cross-domain knowledge transferring. By using the TrAdaBoost algorithm, we achieve a lower error rate than using LSTM alone.

Li, Y., Zhou, Y., Hu, K., Sun, N., Ke, K..  2020.  A Security Situation Prediction Method Based on Improved Deep Belief Network. 2020 IEEE 2nd International Conference on Civil Aviation Safety and Information Technology (ICCASIT. :594–598.
With the rapid development of smart grids and the continuous deepening of informatization, while realizing remote telemetry and remote control of massive data-based grid operation, electricity information network security problems have become more serious and prominent. A method for electricity information network security situation prediction method based on improved deep belief network is proposed in this paper. Firstly, the affinity propagation clustering algorithm is used to determine the depth of the deep belief network and the number of hidden layer nodes based on sample parameters. Secondly, continuously adjust the scaling factor and crossover probability in the differential evolution algorithm according to the population similarity. Finally, a chaotic search method is used to perform a second search for the best individuals and similarity centers of each generation of the population. Simulation experiments show that the proposed algorithm not only enhances the generalization ability of electricity information network security situation prediction, but also has higher prediction accuracy.
Li, Y., Guan, Z., Xu, C..  2018.  Digital Image Self Restoration Based on Information Hiding. 2018 37th Chinese Control Conference (CCC). :4368–4372.
With the rapid development of computer networks, multimedia information is widely used, and the security of digital media has drawn much attention. The revised photo as a forensic evidence will distort the truth of the case badly tampered pictures on the social network can have a negative impact on the parties as well. In order to ensure the authenticity and integrity of digital media, self-recovery of digital images based on information hiding is studied in this paper. Jarvis half-tone change is used to compress the digital image and obtain the backup data, and then spread the backup data to generate the reference data. Hash algorithm aims at generating hash data by calling reference data and original data. Reference data and hash data together as a digital watermark scattered embedded in the digital image of the low-effective bits. When the image is maliciously tampered with, the hash bit is used to detect and locate the tampered area, and the image self-recovery is performed by extracting the reference data hidden in the whole image. In this paper, a thorough rebuild quality assessment of self-healing images is performed and better performance than the traditional DCT(Discrete Cosine Transform)quantization truncation approach is achieved. Regardless of the quality of the tampered content, a reference authentication system designed according to the principles presented in this paper allows higher-quality reconstruction to recover the original image with good quality even when the large area of the image is tampered.
Li, Y., Liu, X., Tian, H., Luo, C..  2018.  Research of Industrial Control System Device Firmware Vulnerability Mining Technology Based on Taint Analysis. 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS). :607-610.

Aiming at the problem that there is little research on firmware vulnerability mining and the traditional method of vulnerability mining based on fuzzing test is inefficient, this paper proposed a new method of mining vulnerabilities in industrial control system firmware. Based on taint analysis technology, this method can construct test cases specifically for the variables that may trigger vulnerabilities, thus reducing the number of invalid test cases and improving the test efficiency. Experiment result shows that this method can reduce about 23 % of test cases and can effectively improve test efficiency.

Li, Y., Zhang, T., Han, X., Qi, Y..  2018.  Image Style Transfer in Deep Learning Networks. 2018 5th International Conference on Systems and Informatics (ICSAI). :660–664.

Since Gatys et al. proved that the convolution neural network (CNN) can be used to generate new images with artistic styles by separating and recombining the styles and contents of images. Neural Style Transfer has attracted wide attention of computer vision researchers. This paper aims to provide an overview of the style transfer application deep learning network development process, and introduces the classical style migration model, on the basis of the research on the migration of style of the deep learning network for collecting and organizing, and put forward related to gathered during the investigation of the problem solution, finally some classical model in the image style to display and compare the results of migration.

Li, Y., Chang, T.-H., Chi, C.-Y..  2020.  Secure Federated Averaging Algorithm with Differential Privacy. 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP). :1–6.
Federated learning (FL), as a recent advance of distributed machine learning, is capable of learning a model over the network without directly accessing the client's raw data. Nevertheless, the clients' sensitive information can still be exposed to adversaries via differential attacks on messages exchanged between the parameter server and clients. In this paper, we consider the widely used federating averaging (FedAvg) algorithm and propose to enhance the data privacy by the differential privacy (DP) technique, which obfuscates the exchanged messages by properly adding Gaussian noise. We analytically show that the proposed secure FedAvg algorithm maintains an O(l/T) convergence rate, where T is the total number of stochastic gradient descent (SGD) updates for local model parameters. Moreover, we demonstrate how various algorithm parameters can impact on the algorithm communication efficiency. Experiment results are presented to justify the obtained analytical results on the performance of the proposed algorithm in terms of testing accuracy.
Li, Y., Yang, X., Sun, P., Qi, H., Lyu, S..  2020.  Celeb-DF: A Large-Scale Challenging Dataset for DeepFake Forensics. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). :3204—3213.
AI-synthesized face-swapping videos, commonly known as DeepFakes, is an emerging problem threatening the trustworthiness of online information. The need to develop and evaluate DeepFake detection algorithms calls for datasets of DeepFake videos. However, current DeepFake datasets suffer from low visual quality and do not resemble DeepFake videos circulated on the Internet. We present a new large-scale challenging DeepFake video dataset, Celeb-DF, which contains 5,639 high-quality DeepFake videos of celebrities generated using improved synthesis process. We conduct a comprehensive evaluation of DeepFake detection methods and datasets to demonstrate the escalated level of challenges posed by Celeb-DF.
Li, Y., Yang, Y., Yu, X., Yang, T., Dong, L., Wang, W..  2020.  IoT-APIScanner: Detecting API Unauthorized Access Vulnerabilities of IoT Platform. 2020 29th International Conference on Computer Communications and Networks (ICCCN). :1—5.

The Internet of Things enables interaction between IoT devices and users through the cloud. The cloud provides services such as account monitoring, device management, and device control. As the center of the IoT platform, the cloud provides services to IoT devices and IoT applications through APIs. Therefore, the permission verification of the API is essential. However, we found that some APIs are unverified, which allows unauthorized users to access cloud resources or control devices; it could threaten the security of devices and cloud. To check for unauthorized access to the API, we developed IoT-APIScanner, a framework to check the permission verification of the cloud API. Through observation, we found there is a large amount of interactive information between IoT application and cloud, which include the APIs and related parameters, so we can extract them by analyzing the code of the IoT application, and use this for mutating API test cases. Through these test cases, we can effectively check the permissions of the API. In our research, we extracted a total of 5 platform APIs. Among them, the proportion of APIs without permission verification reached 13.3%. Our research shows that attackers could use the API without permission verification to obtain user privacy or control of devices.

Li, Y., Chen, J., Li, Q., Liu, A..  2020.  Differential Privacy Algorithm Based on Personalized Anonymity. 2020 5th IEEE International Conference on Big Data Analytics (ICBDA). :260—267.

The existing anonymized differential privacy model adopts a unified anonymity method, ignoring the difference of personal privacy, which may lead to the problem of excessive or insufficient protection of the original data [1]. Therefore, this paper proposes a personalized k-anonymity model for tuples (PKA) and proposes a differential privacy data publishing algorithm (DPPA) based on personalized anonymity, firstly based on the tuple personality factor set by the user in the original data set. The values are classified and the corresponding privacy protection relevance is calculated. Then according to the tuple personality factor classification value, the data set is clustered by clustering method with different anonymity, and the quasi-identifier attribute of each cluster is aggregated and noise-added to realize anonymized differential privacy; finally merge the subset to get the data set that meets the release requirements. In this paper, the correctness of the algorithm is analyzed theoretically, and the feasibility and effectiveness of the proposed algorithm are verified by comparison with similar algorithms.

Li, Y., Liu, Y., Wang, Y., Guo, Z., Yin, H., Teng, H..  2020.  Synergetic Denial-of-Service Attacks and Defense in Underwater Named Data Networking. IEEE INFOCOM 2020 - IEEE Conference on Computer Communications. :1569–1578.
Due to the harsh environment and energy limitation, maintaining efficient communication is crucial to the lifetime of Underwater Sensor Networks (UWSN). Named Data Networking (NDN), one of future network architectures, begins to be applied to UWSN. Although Underwater Named Data Networking (UNDN) performs well in data transmission, it still faces some security threats, such as the Denial-of-Service (DoS) attacks caused by Interest Flooding Attacks (IFAs). In this paper, we present a new type of DoS attacks, named as Synergetic Denial-of-Service (SDoS). Attackers synergize with each other, taking turns to reply to malicious interests as late as possible. SDoS attacks will damage the Pending Interest Table, Content Store, and Forwarding Information Base in routers with high concealment. Simulation results demonstrate that the SDoS attacks quadruple the increased network traffic compared with normal IFAs and the existing IFA detection algorithm in UNDN is completely invalid to SDoS attacks. In addition, we analyze the infection problem in UNDN and propose a defense method Trident based on carefully designed adaptive threshold, burst traffic detection, and attacker identification. Experiment results illustrate that Trident can effectively detect and resist both SDoS attacks and normal IFAs. Meanwhile, Trident can robustly undertake burst traffic and congestion.
Lin, X., Zhang, Z., Chen, M., Sun, Y., Li, Y., Liu, M., Wang, Y., Liu, M..  2020.  GDGCA: A Gene Driven Cache Scheduling Algorithm in Information-Centric Network. 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE). :167–172.
The disadvantages and inextensibility of the traditional network require more novel thoughts for the future network architecture, as for ICN (Information-Centric Network), is an information centered and self-caching network, ICN is deeply rooted in the 5G era, of which concept is user-centered and content-centered. Although the ICN enables cache replacement of content, an information distribution scheduling algorithm is still needed to allocate resources properly due to its limited cache capacity. This paper starts with data popularity, information epilepsy and other data related attributes in the ICN environment. Then it analyzes the factors affecting the cache, proposes the concept and calculation method of Gene value. Since the ICN is still in a theoretical state, this paper describes an ICN scenario that is close to the reality and processes a greedy caching algorithm named GDGCA (Gene Driven Greedy Caching Algorithm). The GDGCA tries to design an optimal simulation model, which based on the thoughts of throughput balance and satisfaction degree (SSD), then compares with the regular distributed scheduling algorithm in related research fields, such as the QoE indexes and satisfaction degree under different Poisson data volumes and cycles, the final simulation results prove that GDGCA has better performance in cache scheduling of ICN edge router, especially with the aid of Information Gene value.
Liu, D., Li, Y., Tang, Y., Wang, B., Xie, W..  2018.  VMPBL: Identifying Vulnerable Functions Based on Machine Learning Combining Patched Information and Binary Comparison Technique by LCS. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :800-807.

Nowadays, most vendors apply the same open source code to their products, which is dangerous. In addition, when manufacturers release patches, they generally hide the exact location of the vulnerabilities. So, identifying vulnerabilities in binaries is crucial. However, just searching source program has a lower identifying accuracy of vulnerability, which requires operators further to differentiate searched results. Under this context, we propose VMPBL to enhance identifying the accuracy of vulnerability with the help of patch files. VMPBL, compared with other proposed schemes, uses patched functions according to its vulnerable functions in patch file to further distinguish results. We establish a prototype of VMPBL, which can effectively identify vulnerable function types and get rid of safe functions from results. Firstly, we get the potential vulnerable-patched functions by binary comparison technique based on K-Trace algorithm. Then we combine the functions with vulnerability and patch knowledge database to classify these function pairs and identify the possible vulnerable functions and the vulnerability types. Finally, we test some programs containing real-world CWE vulnerabilities, and one of the experimental results about CWE415 shows that the results returned from only searching source program are about twice as much as the results from VMPBL. We can see that using VMPBL can significantly reduce the false positive rate of discovering vulnerabilities compared with analyzing source files alone.

Liu, J., Xiao, K., Luo, L., Li, Y., Chen, L..  2020.  An intrusion detection system integrating network-level intrusion detection and host-level intrusion detection. 2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS). :122—129.
With the rapid development of Internet, the issue of cyber security has increasingly gained more attention. An intrusion Detection System (IDS) is an effective technique to defend cyber-attacks and reduce security losses. However, the challenge of IDS lies in the diversity of cyber-attackers and the frequently-changing data requiring a flexible and efficient solution. To address this problem, machine learning approaches are being applied in the IDS field. In this paper, we propose an efficient scalable neural-network-based hybrid IDS framework with the combination of Host-level IDS (HIDS) and Network-level IDS (NIDS). We applied the autoencoders (AE) to NIDS and designed HIDS using word embedding and convolutional neural network. To evaluate the IDS, many experiments are performed on the public datasets NSL-KDD and ADFA. It can detect many attacks and reduce the security risk with high efficiency and excellent scalability.
Liu, Y., Yuan, X., Li, M., Zhang, W., Zhao, Q., Zhong, J., Cao, Y., Li, Y., Chen, L., Li, H. et al..  2018.  High Speed Device-Independent Quantum Random Number Generation without Detection Loophole. 2018 Conference on Lasers and Electro-Optics (CLEO). :1–2.

We report a an experimental study of device-independent quantum random number generation based on an detection-loophole free Bell test with entangled photons. After considering statistical fluctuations and applying an 80 Gb × 45.6 Mb Toeplitz matrix hashing, we achieve a final random bit rate of 114 bits/s, with a failure probability less than 10-5.

Ma, J., Feng, Z., Li, Y., Sun, X..  2020.  Topologically Protected Acoustic Wave Amplification in an Optomechanical Array. 2020 Conference on Lasers and Electro-Optics (CLEO). :1–2.
By exploiting the simultaneous particle-conserving and particle-nonconserving phonon-photon interactions in an optomechanical array, we find a topologically protected edge state for phonons that can be parametrically amplified when all the bulk states remain stable.
Peng, Y., Yue, M., Li, H., Li, Y., Li, C., Xu, H., Wu, Q., Xi, W..  2018.  The Effect of Easy Axis Deviations on the Magnetization Reversal of Co Nanowire. IEEE Transactions on Magnetics. 54:1–5.
Macroscopic hysteresis loops and microscopic magnetic moment distributions have been determined by 3-D model for Co nanowire with various easy axis deviations from applied field. It is found that both the coercivity and the remanence decrease monotonously with the increase of easy axis deviation as well as the maximum magnetic product, indicating the large impact of the easy axis orientation on the magnetic performance. Moreover, the calculated angular distributions and the evolution of magnetic moments have been shown to explain the magnetic reversal process. It is demonstrated that the large demagnetization field in the two ends of the nanowire makes the occurrence of reversal domain nucleation easier, hence the magnetic reversal. In addition, the magnetic reversal was illustrated in terms of the analysis of the energy evolution.
Su, C., Santoso, B., Li, Y., Deng, R. H., Huang, X..  2017.  Universally Composable RFID Mutual Authentication. IEEE Transactions on Dependable and Secure Computing. 14:83–94.

Universally Composable (UC) framework provides the strongest security notion for designing fully trusted cryptographic protocols, and it is very challenging on applying UC security in the design of RFID mutual authentication protocols. In this paper, we formulate the necessary conditions for achieving UC secure RFID mutual authentication protocols which can be fully trusted in arbitrary environment, and indicate the inadequacy of some existing schemes under the UC framework. We define the ideal functionality for RFID mutual authentication and propose the first UC secure RFID mutual authentication protocol based on public key encryption and certain trusted third parties which can be modeled as functionalities. We prove the security of our protocol under the strongest adversary model assuming both the tags' and readers' corruptions. We also present two (public) key update protocols for the cases of multiple readers: one uses Message Authentication Code (MAC) and the other uses trusted certificates in Public Key Infrastructure (PKI). Furthermore, we address the relations between our UC framework and the zero-knowledge privacy model proposed by Deng et al. [1].

Wang, H., Li, Y., Wang, Y., Hu, H., Yang, M.-H..  2020.  Collaborative Distillation for Ultra-Resolution Universal Style Transfer. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). :1857–1866.
Universal style transfer methods typically leverage rich representations from deep Convolutional Neural Network (CNN) models (e.g., VGG-19) pre-trained on large collections of images. Despite the effectiveness, its application is heavily constrained by the large model size to handle ultra-resolution images given limited memory. In this work, we present a new knowledge distillation method (named Collaborative Distillation) for encoder-decoder based neural style transfer to reduce the convolutional filters. The main idea is underpinned by a finding that the encoder-decoder pairs construct an exclusive collaborative relationship, which is regarded as a new kind of knowledge for style transfer models. Moreover, to overcome the feature size mismatch when applying collaborative distillation, a linear embedding loss is introduced to drive the student network to learn a linear embedding of the teacher's features. Extensive experiments show the effectiveness of our method when applied to different universal style transfer approaches (WCT and AdaIN), even if the model size is reduced by 15.5 times. Especially, on WCT with the compressed models, we achieve ultra-resolution (over 40 megapixels) universal style transfer on a 12GB GPU for the first time. Further experiments on optimization-based stylization scheme show the generality of our algorithm on different stylization paradigms. Our code and trained models are available at
Wang, J., Shi, D., Li, Y., Chen, J., Duan, X..  2017.  Realistic measurement protection schemes against false data injection attacks on state estimators. 2017 IEEE Power Energy Society General Meeting. :1–5.
False data injection attacks (FDIA) on state estimators are a kind of imminent cyber-physical security issue. Fortunately, it has been proved that if a set of measurements is strategically selected and protected, no FDIA will remain undetectable. In this paper, the metric Return on Investment (ROI) is introduced to evaluate the overall returns of the alternative measurement protection schemes (MPS). By setting maximum total ROI as the optimization objective, the previously ignored cost-benefit issue is taken into account to derive a realistic MPS for power utilities. The optimization problem is transformed into the Steiner tree problem in graph theory, where a tree pruning based algorithm is used to reduce the computational complexity and find a quasi-optimal solution with acceptable approximations. The correctness and efficiency of the algorithm are verified by case studies.
Wu, M., Li, Y..  2018.  Adversarial mRMR against Evasion Attacks. 2018 International Joint Conference on Neural Networks (IJCNN). :1–6.

Machine learning (ML) algorithms provide a good solution for many security sensitive applications, they themselves, however, face the threats of adversary attacks. As a key problem in machine learning, how to design robust feature selection algorithms against these attacks becomes a hot issue. The current researches on defending evasion attacks mainly focus on wrapped adversarial feature selection algorithm, i.e., WAFS, which is dependent on the classification algorithms, and time cost is very high for large-scale data. Since mRMR (minimum Redundancy and Maximum Relevance) algorithm is one of the most popular filter algorithms for feature selection without considering any classifier during feature selection process. In this paper, we propose a novel adversary-aware feature selection algorithm under filter model based on mRMR, named FAFS. The algorithm, on the one hand, takes the correlation between a single feature and a label, and the redundancy between features into account; on the other hand, when selecting features, it not only considers the generalization ability in the absence of attack, but also the robustness under attack. The performance of four algorithms, i.e., mRMR, TWFS (Traditional Wrapped Feature Selection algorithm), WAFS, and FAFS is evaluated on spam filtering and PDF malicious detection in the Perfect Knowledge attack scenarios. The experiment results show that FAFS has a better performance under evasion attacks with less time complexity, and comparable classification accuracy.