Differential privacy-based data de-identification protection and risk evaluation system

Title: Differential privacy-based data de-identification protection and risk evaluation system
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Tsou, Y., Chen, H., Chen, J., Huang, Y., Wang, P.
Conference Name: 2017 International Conference on Information and Communication Technology Convergence (ICTC)
Date Published: October
ISBN Number: 978-1-5090-4032-2
Keywords: Big Data, composability, data de-identification process, data disclosure estimation system, data mining, data privacy, data protection, data query, de-identification, Differential privacy, differential privacy-based data de-identification protection, Human Behavior, native differential privacy, privacy protection issues, privacy-sensitive data, pubcrawl, query processing, Resiliency, risk evaluation system, Scalability, Synthetic Dataset, value added analysis

As more and more technologies for storing and analyzing massive amounts of data become available, it is extremely important to de-identify privacy-sensitive data so that further analysis can be conducted by different parties. For example, data needs to go through a data de-identification process before being transferred to institutes for further value-added analysis. As such, privacy protection issues associated with the release of data and with data mining have become a popular field of study in the domain of big data. As a strict and verifiable definition of privacy, differential privacy (DP) has attracted noteworthy attention and widespread research in recent years. Nevertheless, differential privacy is impractical for most applications because of the performance of its synthetic dataset generation for data queries. Moreover, the definition of data protection by randomized noise in native differential privacy is abstract to users. Therefore, we design a pragmatic DP-based system for data de-identification protection and data disclosure risk estimation, in which a DP-based noise addition mechanism is applied to generate synthetic datasets. Furthermore, the disclosure risk of these synthetic datasets can be evaluated before they are released to buyers or consumers.
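
The abstract does not specify the noise addition mechanism, so the following is only a minimal sketch of one common approach and not the authors' method: add Laplace noise to the histogram of a single categorical attribute, then sample synthetic records from the noisy histogram. The function names, the parameters (`epsilon`, `domain`, `n_synthetic`), and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_synthetic_dataset(records, domain, epsilon, n_synthetic, seed=None):
    """Hypothetical helper: sample synthetic values from a noisy histogram.

    Assumes neighbouring datasets differ by adding or removing one record,
    so each histogram count changes by at most 1 (L1 sensitivity 1).
    """
    rng = random.Random(seed)
    counts = Counter(records)
    scale = 1.0 / epsilon  # Laplace scale = sensitivity / epsilon

    # Perturb every bin in the public domain, including empty ones,
    # then clamp negatives to zero (post-processing of the noisy counts).
    noisy = [max(0.0, counts.get(v, 0) + laplace_noise(scale, rng)) for v in domain]

    # Fall back to uniform weights if every noisy count was clamped to zero.
    if sum(noisy) == 0:
        noisy = [1.0] * len(domain)

    # Draw synthetic records in proportion to the noisy counts.
    return rng.choices(domain, weights=noisy, k=n_synthetic)


if __name__ == "__main__":
    original = ["a"] * 50 + ["b"] * 30 + ["c"] * 20  # toy data, not from the paper
    synthetic = dp_synthetic_dataset(original, ["a", "b", "c"], epsilon=1.0,
                                     n_synthetic=100, seed=0)
    print(Counter(synthetic))
```

In this sketch a smaller `epsilon` yields a larger noise scale and therefore stronger protection at the cost of fidelity. Clamping and sampling operate only on the already-noised counts, so they do not consume additional privacy budget.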

Citation Key: tsou_differential_2017