Visible to the public Differentially Private K-Means Clustering

TitleDifferentially Private K-Means Clustering
Publication TypeConference Paper
Year of Publication2016
AuthorsSu, Dong, Cao, Jianneng, Li, Ninghui, Bertino, Elisa, Jin, Hongxia
Conference NameProceedings of the Sixth ACM Conference on Data and Application Security and Privacy
Conference LocationNew York, NY, USA
ISBN Number978-1-4503-3935-3
Keywordscontrol theory, Differential privacy, k-means clustering, privacy, private data publishing, pubcrawl, Resiliency

There are two broad approaches for differentially private data analysis. The interactive approach aims at developing customized differentially private algorithms for various data mining tasks. The non-interactive approach aims at developing differentially private algorithms that can output a synopsis of the input dataset, which can then be used to support various data mining tasks. In this paper we study the effectiveness of the two approaches on differentially private k-means clustering. We develop techniques to analyze the empirical error behaviors of the existing interactive and non-interactive approaches. Based on the analysis, we propose an improvement of DPLloyd which is a differentially private version of the Lloyd algorithm. We also propose a non-interactive approach EUGkM which publishes a differentially private synopsis for k-means clustering. Results from extensive and systematic experiments support our analysis and demonstrate the effectiveness of our improvement on DPLloyd and the proposed EUGkM algorithm.

Citation Keysu_differentially_2016