International Conferences: Privacy and Security of Big Data, Shanghai, China, 2014

Privacy and Security of Big Data
Shanghai, China


The First International Workshop on Privacy and Security of Big Data (PSBD '14) was held in Shanghai, China, on November 3-7, 2014, concurrently with the 2014 ACM Conference on Information and Knowledge Management. The research cited here was presented at the workshop.


Mário J. Silva, Pedro Rijo, Alexandre Francisco; Evaluating the Impact of Anonymization on Large Interaction Network Datasets; PSBD '14 Proceedings of the First International Workshop on Privacy and Security of Big Data, November 2014, Pages 3-10. Doi: 10.1145/2663715.2669610

Abstract: We address the publication of a large academic information dataset while addressing privacy issues. We evaluate anonymization techniques that achieve the intended protection while retaining the utility of the anonymized data. The released data could help infer behaviors and subsequently find solutions for daily planning activities, such as cafeteria attendance, cleaning schedules or student performance, or help study interaction patterns among an academic population. However, the nature of the academic data is such that many implicit social interaction networks can be derived from the anonymized datasets, raising the need for research into how anonymity can be assessed in this setting.

Keywords: academic data publishing, interaction network inference, privacy of big data, privacy-preserving data publishing   (ID#:15-3937)



Peter Christen; Privacy Aspects in Big Data Integration: Challenges and Opportunities; PSBD '14 Proceedings of the First International Workshop on Privacy and Security of Big Data, November 2014, Pages 1-1. Doi: 10.1145/2663715.2669615

Abstract: Big Data projects often require data from several sources to be integrated before they can be used for analysis. Once data have been integrated, they allow more detailed analysis that would otherwise not be possible. Accordingly, recent years have seen an increasing interest in techniques that facilitate the integration of data from diverse sources. Whenever data about individuals, or otherwise sensitive data, are to be integrated across organizations, privacy and confidentiality have to be considered. Domains where privacy preservation during data integration is of importance include business collaborations, health research, national censuses, the social sciences, crime and fraud detection, and homeland security. Increasingly, applications in these domains require data from diverse sources (both internal and external to an organization) to be integrated. Consequently, in the past decade, various techniques have been developed that aim to facilitate data integration without revealing any private or confidential information about the databases and records that are integrated. These techniques either provably prevent leakage of any private information, or they provide some empirical numerical measure of the risk of disclosure of private information. In the first part of this presentation we provide a background on data integration, and illustrate the importance of preserving privacy during data integration with several application scenarios. We then give an overview of the main concepts and techniques that have been developed to facilitate data integration in such ways that no private or confidential information is being revealed. We focus on privacy-preserving record linkage (PPRL), where so far most research has been conducted. We describe the basic protocols used in PPRL, and several key technologies employed in these protocols. Finally, we discuss the challenges privacy poses to data integration in the era of Big Data, and we discuss directions and opportunities in this research area.

Keywords: data matching, multi-party, privacy techniques, privacy-preserving record linkage, scalability   (ID#:15-3938)



Kangsoo Jung, Sehwa Park, Seog Park;  Hiding A Needle In A Haystack: Privacy Preserving A Priori Algorithm In MapReduce Framework; PSBD '14 Proceedings of the First International Workshop on Privacy and Security of Big Data, November 2014, Pages 11-17. Doi: 10.1145/2663715.2669611

Abstract: In the last few years, Hadoop has become a "de facto" standard for processing large-scale data as an open source distributed system. Combined with data mining techniques, Hadoop improves the utility of data analysis. For this reason, a considerable amount of research has studied how to apply data mining techniques to the MapReduce framework in Hadoop. However, data mining can cause privacy violations, and this threat is a major obstacle to data mining with Hadoop. Numerous studies have been conducted to solve this problem, but they were insufficient and had several drawbacks. In this paper, we propose a privacy-preserving data mining technique for Hadoop that resolves privacy violations without utility degradation. We focus on association rule mining, a representative data mining algorithm. We validate through experimental results that the proposed technique satisfies performance requirements and preserves data privacy.

Keywords: HADOOP, association rule mining, privacy-preserving data mining   (ID#:15-3939)
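
For readers unfamiliar with association rule mining, the frequent-itemset search that underlies it can be sketched as below. This is a plain, single-machine illustration of the classic level-wise (Apriori-style) search, not the authors' MapReduce or privacy-preserving technique; the function name `apriori`, the `min_support` parameter, and the sample baskets are assumptions made for this sketch.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that appears anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical sample data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=2)
```

With these baskets and a support threshold of 2, the pair {milk, butter} is dropped (it occurs once), so the three-item set is pruned without ever being counted. That pruning is what keeps the level-wise search tractable.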




Avinash Srinivasan, Jie Wu, Wen Zhu;  SAFE: Secure and Big Data-Adaptive Framework for Efficient Cross-Domain Communication; PSBD '14 Proceedings of the First International Workshop on Privacy and Security of Big Data, November 2014, Pages 19-28. Doi: 10.1145/2663715.2669612

Abstract: Today's Cross Domain Communication (CDC) infrastructure primarily consists of vendor-specific guard products that have little inter-domain coordination at runtime. Unaware of the context and the semantics of the CDC message that is being processed, the guard heavily relies on rudimentary filtering techniques. Consequently, the information domains are rendered vulnerable to an array of attacks, and countering these attacks often necessitates time-consuming human intervention to adjudicate messages in order to meet the desired security and privacy requirements of the communicating domains. Subsequently, this causes significant performance bottlenecks. In this paper, we present a set of key requirements and design principles for a service oriented CDC security infrastructure in the form of a CDC Reference Architecture, featuring Domain Associated Guards (DOGs) as active workflow participants. Our proposed framework, SAFE, is secure and adaptable. SAFE also provides the foundation for the development of protocols and ontologies enabling run-time coordination among CDC elements. This enables more flexible, interoperable, and efficient CDC designs to serve mission needs, specifically among critical infrastructure domains as well as domains with significantly differing security and privacy vocabulary. To the best of our knowledge, SAFE is the first effort to employ DOG for secure CDC, unlike existing solutions with link-associated guards. Because of the DOG approach, SAFE overcomes the scalability problems encountered by existing solutions.

Keywords: big data, cross domain communication, ontology, privacy, protocol, reference architecture, security, security guard   (ID#:15-3940)



Joanna Biega, Ida Mele, Gerhard Weikum; Probabilistic Prediction of Privacy Risks in User Search Histories; PSBD '14 Proceedings of the First International Workshop on Privacy and Security of Big Data, November 2014, Pages 29-36. Doi: 10.1145/2663715.2669609

Abstract:  This paper proposes a new model of user-centric, global, probabilistic privacy, geared for today's challenges of helping users to manage their privacy-sensitive information across a wide variety of social networks, online communities, QA forums, and search histories. Our approach anticipates an adversary that harnesses global background knowledge and rich statistics in order to make educated guesses, that is, probabilistic inferences at sensitive data. We aim for a tool that simulates such a powerful adversary, predicts privacy risks, and guides the user. In this paper, our framework is specialized for the case of Internet search histories. We present preliminary experiments that demonstrate how estimators of global correlations among sensitive and non-sensitive key-value items can be fed into a probabilistic graphical model in order to compute meaningful measures of privacy risk.

Keywords: privacy risk prediction, probabilistic privacy, query logs, user-centric privacy   (ID#:15-3941)



Suvarna Bothe, Alfredo Cuzzocrea, Panagiotis Karras, Akrivi Vlachou; Skyline Query Processing over Encrypted Data: An Attribute-Order-Preserving-Free Approach;  PSBD '14 Proceedings of the First International Workshop on Privacy and Security of Big Data, November 2014, Pages 37-43. Doi: 10.1145/2663715.2669613

Abstract: Reconciling the need for efficient relational query processing over Clouds with the security of the data themselves is emerging as one of the most challenging research problems in the Big Data era. Indeed, in actual analytics-oriented engines, such as Google Analytics and Amazon S3, where key-value storage-representation and efficient-management models are employed to cope with the simultaneous processing of billions of transactions, querying encrypted data is becoming one of the most annoying problems, which has also attracted a great deal of attention from the research community. While this issue has been studied for a large variety of data formats, e.g. relational, RDF and multidimensional data, very few initiatives have addressed skyline query processing over encrypted data, which is, indeed, relevant for database analytics. In order to fill this methodological and technological gap, in this paper we present eSkyline, a prototype system and query interface that enables the processing of skyline queries over encrypted data, even without preserving the order on each attribute as order-preserving encryption would do. Our system comprises an encryption scheme that facilitates the evaluation of domination relationships, and hence allows state-of-the-art skyline processing algorithms to be used. In order to demonstrate the effectiveness and the reliability of our system, we also provide the details of the underlying encryption scheme, plus a suitable GUI that allows a user to interact with a server and showcases the efficiency of computing skyline queries and decrypting the results.

Keywords: database security, querying encrypted data, skyline queries over encrypted data   (ID#:15-3942)
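
As background on the "domination relationships" mentioned in the abstract above, a skyline query returns the points that no other point dominates. The sketch below shows this on plaintext data only; it does not reproduce eSkyline or its encryption scheme. The names `dominates` and `skyline`, the smaller-is-better convention, and the sample hotel data are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is at least as good as q on every attribute and strictly
    better on at least one (assuming smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Naive quadratic skyline: keep each point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to beach); lower is better on both.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (90, 9)]
print(skyline(hotels))  # prints [(50, 8), (60, 2), (80, 1)]
```

Here (70, 5) is dominated by (60, 2) and (90, 9) by (50, 8), so neither appears in the result. The only operation the algorithm needs from the data is the pairwise `dominates` test, which is why a scheme that can evaluate domination without revealing attribute order is sufficient for skyline processing.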



Alfredo Cuzzocrea; Privacy and Security of Big Data: Current Challenges and Future Research Perspectives; PSBD '14 Proceedings of the First International Workshop on Privacy and Security of Big Data, November 2014, Pages 45-47. Doi: 10.1145/2663715.2669614

Abstract: Privacy and security of Big Data is gaining momentum in the research community, also due to emerging technologies like Cloud Computing, analytics engines and social networks. In response to this novel research challenge, several models, techniques and algorithms for the privacy and security of big data have been proposed recently, mostly adhering to algorithmic paradigms or model-oriented paradigms. Following this major trend, in this paper we provide an overview of state-of-the-art research issues and achievements in the field of privacy and security of big data, highlighting open problems and current research trends, and outlining novel research directions in this field.

Keywords: privacy of big data, privacy-preserving analytics over big data, secure query processing over big data, security of big data   (ID#:15-3943)



Articles listed on these pages have been found on publicly available internet pages and are cited with links to those pages. Some of the information included herein has been reprinted with permission from the authors or data repositories. Direct any requests via Email to for removal of the links or modifications to specific citations. Please include the ID# of the specific citation in your correspondence.