Visible to the public Biblio

Filters: Author is Gao, Jing  [Clear All Filters]
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
Li, Yaliang, Miao, Chenglin, Su, Lu, Gao, Jing, Li, Qi, Ding, Bolin, Qin, Zhan, Ren, Kui.  2018.  An Efficient Two-Layer Mechanism for Privacy-Preserving Truth Discovery. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. :1705–1714.
Soliciting answers from online users is an efficient and effective solution to many challenging tasks. Due to the variety in the quality of users, it is important to infer their ability to provide correct answers during aggregation. Therefore, truth discovery methods can be used to automatically capture the user quality and aggregate user-contributed answers via a weighted combination. Despite the fact that truth discovery is an effective tool for answer aggregation, existing work falls short of the protection towards the privacy of participating users. To fill this gap, we propose perturbation-based mechanisms that provide users with privacy guarantees and maintain the accuracy of aggregated answers. We first present a one-layer mechanism, in which all the users adopt the same probability to perturb their answers. Aggregation is then conducted on perturbed answers but the aggregation accuracy could drop accordingly. To improve the utility, a two-layer mechanism is proposed where users are allowed to sample their own probabilities from a hyper distribution. We theoretically compare the one-layer and two-layer mechanisms, and prove that they provide the same privacy guarantee while the two-layer mechanism delivers better utility. This advantage is brought by the fact that the two-layer mechanism can utilize the estimated user quality information from truth discovery to reduce the accuracy loss caused by perturbation, which is confirmed by experimental results on real-world datasets. Experimental results also demonstrate the effectiveness of the proposed two-layer mechanism in privacy protection with tolerable accuracy loss in aggregation.
Wan, Mengting, Chen, Xiangyu, Kaplan, Lance, Han, Jiawei, Gao, Jing, Zhao, Bo.  2016.  From Truth Discovery to Trustworthy Opinion Discovery: An Uncertainty-Aware Quantitative Modeling Approach. Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :1885–1894.

In this era of information explosion, conflicts are often encountered when information is provided by multiple sources. Traditional truth discovery task aims to identify the truth the most trustworthy information, from conflicting sources in different scenarios. In this kind of tasks, truth is regarded as a fixed value or a set of fixed values. However, in a number of real-world cases, objective truth existence cannot be ensured and we can only identify single or multiple reliable facts from opinions. Different from traditional truth discovery task, we address this uncertainty and introduce the concept of trustworthy opinion of an entity, treat it as a random variable, and use its distribution to describe consistency or controversy, which is particularly difficult for data which can be numerically measured, i.e. quantitative information. In this study, we focus on the quantitative opinion, propose an uncertainty-aware approach called Kernel Density Estimation from Multiple Sources (KDEm) to estimate its probability distribution, and summarize trustworthy information based on this distribution. Experiments indicate that KDEm not only has outstanding performance on the classical numeric truth discovery task, but also shows good performance on multi-modality detection and anomaly detection in the uncertain-opinion setting.

Zhang, Chenwei, Xie, Sihong, Li, Yaliang, Gao, Jing, Fan, Wei, Yu, Philip S..  2016.  Multi-source Hierarchical Prediction Consolidation. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. :2251–2256.
In big data applications such as healthcare data mining, due to privacy concerns, it is necessary to collect predictions from multiple information sources for the same instance, with raw features being discarded or withheld when aggregating multiple predictions. Besides, crowd-sourced labels need to be aggregated to estimate the ground truth of the data. Due to the imperfection caused by predictive models or human crowdsourcing workers, noisy and conflicting information is ubiquitous and inevitable. Although state-of-the-art aggregation methods have been proposed to handle label spaces with flat structures, as the label space is becoming more and more complicated, aggregation under a label hierarchical structure becomes necessary but has been largely ignored. These label hierarchies can be quite informative as they are usually created by domain experts to make sense of highly complex label correlations such as protein functionality interactions or disease relationships. We propose a novel multi-source hierarchical prediction consolidation method to effectively exploits the complicated hierarchical label structures to resolve the noisy and conflicting information that inherently originates from multiple imperfect sources. We formulate the problem as an optimization problem with a closed-form solution. The consolidation result is inferred in a totally unsupervised, iterative fashion. Experimental results on both synthetic and real-world data sets show the effectiveness of the proposed method over existing alternatives.
Yuan, Xu, Zhang, Jianing, Chen, Zhikui, Gao, Jing, Li, Peng.  2019.  Privacy-Preserving Deep Learning Models for Law Big Data Feature Learning. 2019 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :128–134.
Nowadays, a massive number of data, referred as big data, are being collected from social networks and Internet of Things (IoT), which are of tremendous value. Many deep learning-based methods made great progress in the extraction of knowledge of those data. However, the knowledge extraction of the law data poses vast challenges on the deep learning, since the law data usually contain the privacy information. In addition, the amount of law data of an institution is not large enough to well train a deep model. To solve these challenges, some privacy-preserving deep learning are proposed to capture knowledge of privacy data. In this paper, we review the emerging topics of deep learning for the feature learning of the privacy data. Then, we discuss the problems and the future trend in deep learning for privacy-preserving feature learning on law data.
Xiong, Chen, Chen, Hua, Cai, Ming, Gao, Jing.  2019.  A Vehicle Trajectory Adversary Model Based on VLPR Data. 2019 5th International Conference on Transportation Information and Safety (ICTIS). :903–912.
Although transport agency has employed desensitization techniques to deal with the privacy information when publicizing vehicle license plate recognition (VLPR) data, the adversaries can still eavesdrop on vehicle trajectories by certain means and further acquire the associated person and vehicle information through background knowledge. In this work, a privacy attacking method by using the desensitized VLPR data is proposed to link the vehicle trajectory. First the road average speed is evaluated by analyzing the changes of traffic flow, which is used to estimate the vehicle's travel time to the next VLPR system. Then the vehicle suspicion list is constructed through the time relevance of neighboring VLPR systems. Finally, since vehicles may have the same features like color, type, etc, the target trajectory will be located by filtering the suspected list by the rule of qualified identifier (QI) attributes and closest time method. Based on the Foshan City's VLPR data, the method is tested and results show that correct vehicle trajectory can be linked, which proves that the current VLPR data publication way has the risk of privacy disclosure. At last, the effects of related parameters on the proposed method are discussed and effective suggestions are made for publicizing VLPR date in the future.