Visible to the public Biblio

Filters: Author is Zhang, Jun  [Clear All Filters]
2016
Zhang, Jun, Xiao, Xiaokui, Xie, Xing.  2016.  PrivTree: A Differentially Private Algorithm for Hierarchical Decompositions. Proceedings of the 2016 International Conference on Management of Data. :155–170.

Given a set D of tuples defined on a domain Omega, we study differentially private algorithms for constructing a histogram over Omega to approximate the tuple distribution in D. Existing solutions for the problem mostly adopt a hierarchical decomposition approach, which recursively splits Omega into sub-domains and computes a noisy tuple count for each sub-domain, until all noisy counts are below a certain threshold. This approach, however, requires that we (i) impose a limit h on the recursion depth in the splitting of Omega and (ii) set the noise in each count to be proportional to h. The choice of h is a serious dilemma: a small h makes the resulting histogram too coarse-grained, while a large h leads to excessive noise in the tuple counts used in deciding whether sub-domains should be split. Furthermore, h cannot be directly tuned based on D; otherwise, the choice of h itself reveals private information and violates differential privacy. To remedy the deficiency of existing solutions, we present PrivTree, a histogram construction algorithm that adopts hierarchical decomposition but completely eliminates the dependency on a pre-defined h. The core of PrivTree is a novel mechanism that (i) exploits a new analysis on the Laplace distribution and (ii) enables us to use only a constant amount of noise in deciding whether a sub-domain should be split, without worrying about the recursion depth of splitting. We demonstrate the application of PrivTree in modelling spatial data, and show that it can be extended to handle sequence data (where the decision in sub-domain splitting is not based on tuple counts but a more sophisticated measure). Our experiments on a variety of real datasets show that PrivTree considerably outperforms the states of the art in terms of data utility.

2017
Zhang, Jun, Cormode, Graham, Procopiuc, Cecilia M., Srivastava, Divesh, Xiao, Xiaokui.  2017.  PrivBayes: Private Data Release via Bayesian Networks. ACM Trans. Database Syst.. 42:25:1–25:41.
Privacy-preserving data publishing is an important problem that has been the focus of extensive study. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection without making restrictive assumptions about the adversary. Existing techniques using differential privacy, however, cannot effectively handle the publication of high-dimensional data. In particular, when the input dataset contains a large number of attributes, existing methods require injecting a prohibitive amount of noise compared to the signal in the data, which renders the published data next to useless. To address the deficiency of the existing methods, this paper presents PrivBayes, a differentially private method for releasing high-dimensional data. Given a dataset D, PrivBayes first constructs a Bayesian network N, which (i) provides a succinct model of the correlations among the attributes in D and (ii) allows us to approximate the distribution of data in D using a set P of low-dimensional marginals of D. After that, PrivBayes injects noise into each marginal in P to ensure differential privacy and then uses the noisy marginals and the Bayesian network to construct an approximation of the data distribution in D. Finally, PrivBayes samples tuples from the approximate distribution to construct a synthetic dataset, and then releases the synthetic data. Intuitively, PrivBayes circumvents the curse of dimensionality, as it injects noise into the low-dimensional marginals in P instead of the high-dimensional dataset D. Private construction of Bayesian networks turns out to be significantly challenging, and we introduce a novel approach that uses a surrogate function for mutual information to build the model more accurately. We experimentally evaluate PrivBayes on real data and demonstrate that it significantly outperforms existing solutions in terms of accuracy.
2018
Liu, Shigang, Zhang, Jun, Wang, Yu, Zhou, Wanlei, Xiang, Yang, Vel., Olivier De.  2018.  A Data-driven Attack Against Support Vectors of SVM. Proceedings of the 2018 on Asia Conference on Computer and Communications Security. :723–734.
Machine learning (ML) is commonly used in multiple disciplines and real-world applications, such as information retrieval, financial systems, health, biometrics and online social networks. However, their security profiles against deliberate attacks have not often been considered. Sophisticated adversaries can exploit specific vulnerabilities exposed by classical ML algorithms to deceive intelligent systems. It is emerging to perform a thorough security evaluation as well as potential attacks against the machine learning techniques before developing novel methods to guarantee that machine learning can be securely applied in adversarial setting. In this paper, an effective attack strategy for crafting foreign support vectors in order to attack a classic ML algorithm, the Support Vector Machine (SVM) has been proposed with mathematical proof. The new attack can minimize the margin around the decision boundary and maximize the hinge loss simultaneously. We evaluate the new attack in different real-world applications including social spam detection, Internet traffic classification and image recognition. Experimental results highlight that the security of classifiers can be worsened by poisoning a small group of support vectors.
Monteuuis, Jean-Philippe, Boudguiga, Aymen, Zhang, Jun, Labiod, Houda, Servel, Alain, Urien, Pascal.  2018.  SARA: Security Automotive Risk Analysis Method. Proceedings of the 4th ACM Workshop on Cyber-Physical System Security. :3-14.

Connected and automated vehicles aim to improve the comfort and the safety of the driver and passengers. To this end, car manufacturers continually improve actual standardized methods to ensure their customers safety, privacy, and vehicles security. However, these methods do not support fully autonomous vehicles, linkability and confusion threats. To address such gaps, we propose a systematic threat analysis and risk assessment framework, SARA, which comprises an improved threat model, a new attack method/asset map, the involvement of the attacker in the attack tree, and a new driving system observation metric. Finally, we demonstrate its feasibility in assessing risk with two use cases: Vehicle Tracking and Comfortable Emergency Brake Failure.

2020
Shen, Jian, Gui, Ziyuan, Chen, Xiaofeng, Zhang, Jun, Xiang, Yang.  2020.  Lightweight and Certificateless Multi-Receiver Secure Data Transmission Protocol for Wireless Body Area Networks. IEEE Transactions on Dependable and Secure Computing. :1–1.
The rapid development of low-power integrated circuits, wireless communication, intelligent sensors and microelectronics has allowed the realization of wireless body area networks (WBANs), which can monitor patients' vital body parameters remotely in real time to offer timely treatment. These vital body parameters are related to patients' life and health; and these highly private data are subject to many security threats. To guarantee privacy, many secure communication protocols have been proposed. However, most of these protocols have a one-to-one structure in extra-body communication and cannot support multidisciplinary team (MDT). Hence, we propose a lightweight and certificateless multi-receiver secure data transmission protocol for WBANs to support MDT treatment in this paper. In particular, a novel multi-receiver certificateless generalized signcryption (MR-CLGSC) scheme is proposed that can adaptively use only one algorithm to implement one of three cryptographic primitives: signature, encryption or signcryption. Then, a multi-receiver secure data transmission protocol based on the MR-CLGSC scheme with many security properties, such as data integrity and confidentiality, non-repudiation, anonymity, forward and backward secrecy, unlinkability and data freshness, is designed. Both security analysis and performance analysis show that the proposed protocol for WBANs is secure, efficient and highly practical.
Li, Meng, Zhong, Qi, Zhang, Leo Yu, Du, Yajuan, Zhang, Jun, Xiang, Yong.  2020.  Protecting the Intellectual Property of Deep Neural Networks with Watermarking: The Frequency Domain Approach. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :402–409.
Similar to other digital assets, deep neural network (DNN) models could suffer from piracy threat initiated by insider and/or outsider adversaries due to their inherent commercial value. DNN watermarking is a promising technique to mitigate this threat to intellectual property. This work focuses on black-box DNN watermarking, with which an owner can only verify his ownership by issuing special trigger queries to a remote suspicious model. However, informed attackers, who are aware of the watermark and somehow obtain the triggers, could forge fake triggers to claim their ownerships since the poor robustness of triggers and the lack of correlation between the model and the owner identity. This consideration calls for new watermarking methods that can achieve better trade-off for addressing the discrepancy. In this paper, we exploit frequency domain image watermarking to generate triggers and build our DNN watermarking algorithm accordingly. Since watermarking in the frequency domain is high concealment and robust to signal processing operation, the proposed algorithm is superior to existing schemes in resisting fraudulent claim attack. Besides, extensive experimental results on 3 datasets and 8 neural networks demonstrate that the proposed DNN watermarking algorithm achieves similar performance on functionality metrics and better performance on security metrics when compared with existing algorithms.
Coulter, Rory, Zhang, Jun, Pan, Lei, Xiang, Yang.  2020.  Unmasking Windows Advanced Persistent Threat Execution. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :268—276.

The advanced persistent threat (APT) landscape has been studied without quantifiable data, for which indicators of compromise (IoC) may be uniformly analyzed, replicated, or used to support security mechanisms. This work culminates extensive academic and industry APT analysis, not as an incremental step in existing approaches to APT detection, but as a new benchmark of APT related opportunity. We collect 15,259 APT IoC hashes, retrieving subsequent sandbox execution logs across 41 different file types. This work forms an initial focus on Windows-based threat detection. We present a novel Windows APT executable (APT-EXE) dataset, made available to the research community. Manual and statistical analysis of the APT-EXE dataset is conducted, along with supporting feature analysis. We draw upon repeat and common APT paths access, file types, and operations within the APT-EXE dataset to generalize APT execution footprints. A baseline case analysis successfully identifies a majority of 117 of 152 live APT samples from campaigns across 2018 and 2019.