Data Sanitization

SoS Newsletter - Advanced Book Block


For security researchers, privacy protection during data mining is a major concern. Sharing information over the Internet or holding it in a database requires methods of sanitizing data so that personal information cannot be obtained. Topics addressed in the articles listed here include defenses against SQL injection, provenance workflows, itemset hiding, differential privacy, and a framework for mathematical definitions of privacy.

  • Mihai Maruseac, Gabriel Ghinita, Razvan Rughinis, "Privacy-preserving Publication Of Provenance Workflows," CODASPY '14 Proceedings of the 4th ACM Conference on Data and Application Security and Privacy, March 2014, (Pages 159-162). (ID#:14-1558) Available at: Provenance workflows capture the data movement and the operations changing the data in complex applications such as scientific computations, document management in large organizations, content generation in social media, etc. Provenance is essential to understand the processes and operations that data undergo, and many research efforts focused on modeling, capturing and analyzing provenance information. Sharing provenance brings numerous benefits, but may also disclose sensitive information, such as secret processes of synthesizing chemical substances, confidential business practices and private details about social media participants' lives. In this paper, we study privacy-preserving provenance workflow publication using differential privacy. We adapt techniques designed for sanitization of multi-dimensional spatial data to the problem of provenance workflows. Experimental results show that such an approach is feasible to protect provenance workflows, while at the same time retaining a significant amount of utility for queries. In addition, we identify influential factors and trade-offs that emerge when sanitizing provenance workflows. Keywords: privacy, provenance
  • Vasileios Kagklis, Vassilios S. Verykios, Giannis Tzimas, Athanasios K. Tsakalidis, "Knowledge Sanitization on the Web," WIMS '14 Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14), June 2014, Article No. 4. (ID#:14-1559) Available at: The widespread use of the Internet caused the rapid growth of data on the Web. As data on the Web grew, so did the risks arising from data mining applications. Privacy preserving data mining (PPDM) is the field that investigates techniques to preserve the privacy of data and patterns. Knowledge Hiding, a subfield of PPDM, aims at preserving the sensitive patterns included in the data, which are going to be published. A wide variety of techniques fall under the umbrella of Knowledge Hiding, such as frequent pattern hiding, sequence hiding, classification rule hiding and so on. In this tutorial we create a taxonomy for the frequent itemset hiding techniques. For each category, we provide recent representative works as examples. Then, we focus on a detailed overview of a specific category, the so-called linear programming-based techniques. Finally, we make a quantitative and qualitative comparison among some of the existing techniques that are classified into this category. Keywords: Frequent Itemset Hiding, Knowledge Hiding, LP-Based Hiding Approaches, Privacy Preserving Data Mining
  • Madhushri Banerjee, Zhiyuan Chen, Aryya Gangopadhyay, "A Generic and Distributed Privacy Preserving Classification Method with A Worst-Case Privacy Guarantee," Distributed and Parallel Databases, Volume 32, Issue 1, March 2014, (Pages 5-35). (ID#:14-1560) Available at: This paper discusses the development of privacy preserving distributed data mining in response to the security risks involved with data mining. Current methods can either handle only a single mining task, or incur increased overhead when attempting multiple tasks. This paper takes these challenges into consideration and explores a generic approach to efficient privacy preserving classification. Keywords: Classification, Data mining, Privacy preserving data mining
  • Christos Kalloniatis, Haralambos Mouratidis, Manousakis Vassilis, Shareeful Islam, Stefanos Gritzalis, Evangelia Kavakli, "Towards the Design Of Secure And Privacy-Oriented Information Systems In The Cloud: Identifying The Major Concepts," Computer Standards & Interfaces, Volume 36, Issue 4, June 2014, (Pages 759-775). (ID#:14-1561) Available at: This paper emphasizes the different security challenges between cloud architecture and traditional distributed systems. The authors stress the imperative nature of thoroughly understanding security in the cloud environment in order to design secure cloud systems. Keywords: Cloud computing, Concepts, Privacy, Requirements, Security, Security and Privacy Issues
  • Daniel Kifer, Ashwin Machanavajjhala, "Pufferfish: A Framework for Mathematical Privacy Definitions," ACM Transactions on Database Systems (TODS), Volume 39, Issue 1, January 2014, Article No. 3. (ID#:14-1562) Available at: In this article, we introduce a new and general privacy framework called Pufferfish. The Pufferfish framework can be used to create new privacy definitions that are customized to the needs of a given application. The goal of Pufferfish is to allow experts in an application domain, who frequently do not have expertise in privacy, to develop rigorous privacy definitions for their data sharing needs. In addition to this, the Pufferfish framework can also be used to study existing privacy definitions. We illustrate the benefits with several applications of this privacy framework: we use it to analyze differential privacy and formalize a connection to attackers who believe that the data records are independent; we use it to create a privacy definition called hedging privacy, which can be used to rule out attackers whose prior beliefs are inconsistent with the data; we use the framework to define and study the notion of composition in a broader context than before; we show how to apply the framework to protect unbounded continuous attributes and aggregate information; and we show how to use the framework to rigorously account for prior data releases. Keywords: Privacy, differential privacy
  • Younsung Choi, Donghoon Lee, Woongryul Jeon, Dongho Won, "Password-based Single-File Encryption and Secure Data Deletion for Solid-State Drive," ICUIMC '14 Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication, January 2014, Article No. 5. (ID#:14-1563) Available at: SSD sales have recently risen steadily because SSDs are faster and smaller than HDDs, making them a typical alternative to HDDs. SSDs largely emulate HDD technology, such as the communication protocol and hardware interfaces, so HDD techniques can be quickly adapted to SSDs. However, SSDs differ slightly from HDDs in the way data are stored, managed, and accessed. Because of this difference, HDD techniques and commands may not operate correctly on SSDs, and various problems, including with encryption and deletion, have gradually emerged. Solving these problems requires analyzing methods of data encryption and secure data deletion suited to SSDs. In this paper, we examine significant technologies and security problems relevant to SSDs. On the basis of this analysis, we propose password-based single-file encryption and secure data deletion for SSDs and compare the proposed method with previous research. Keywords: encryption, secure deletion, solid state drive
  • Xiaowei Li, Yuan Xue, "A Survey On Server-Side Approaches To Securing Web Applications," ACM Computing Surveys (CSUR), Volume 46, Issue 4, April 2014, Article No. 54. (ID#:14-1564) Available at: Web applications are one of the most prevalent platforms for information and service delivery over the Internet today. As they are increasingly used for critical services, web applications have become a popular and valuable target for security attacks. Although a large body of techniques have been developed to fortify web applications and mitigate attacks launched against them, there has been little effort devoted to drawing connections among these techniques and building the big picture of web application security research. This article surveys the area of securing web applications from the server side, with the aim of systematizing the existing techniques into a big picture that promotes future research. We first present the unique aspects of the web application development that cause inherent challenges in building secure web applications. We then discuss three commonly seen security vulnerabilities within web applications: input validation vulnerabilities, session management vulnerabilities, and application logic vulnerabilities, along with attacks that exploit these vulnerabilities. We organize the existing techniques along two dimensions: (1) the security vulnerabilities and attacks that they address and (2) the design objective and the phases of a web application during which they can be carried out. These phases are secure construction of new web applications, security analysis/testing of legacy web applications, and runtime protection of legacy web applications. Finally, we summarize the lessons learned and discuss future research opportunities in this area. Keywords: Web application security, application logic vulnerability, input validation vulnerability, session management vulnerability
  • Prithvi Bisht, Timothy Hinrichs, Nazari Skrupsky, V. N. Venkatakrishnan, "Automated Detection Of Parameter Tampering Opportunities And Vulnerabilities In Web Applications," Journal of Computer Security, Volume 22, Issue 3, May 2014, (Pages 415-465). (ID#:14-1565) Available at: This paper reviews the definition of parameter tampering vulnerabilities and presents an overview of an approach for parameter tampering detection. The challenges of this approach are explored, and both blackbox and whitebox settings are considered in terms of a detection solution. Testing results are explained, along with a survey of current defense methods and their capabilities. Keywords: Dynamic Monitoring, Parameter Tampering Attacks, Symbolic Evaluation
  • Julian Thome, Alessandra Gorla, Andreas Zeller, "Search-based Security Testing of Web Applications," SBST 2014 Proceedings of the 7th International Workshop on Search-Based Software Testing, June 2014, (Pages 5-14). (ID#:14-1566) Available at: SQL injections are still the most exploited web application vulnerabilities. We present a technique to automatically detect such vulnerabilities through targeted test generation. Our approach uses search-based testing to systematically evolve inputs to maximize their potential to expose vulnerabilities. Starting from an entry URL, our BIOFUZZ prototype systematically crawls a web application and generates inputs whose effects on the SQL interaction are assessed at the interface between Web server and database. By evolving those inputs whose resulting SQL interactions show best potential, BIOFUZZ exposes vulnerabilities on real-world Web applications within minutes. As a black-box approach, BIOFUZZ requires neither analysis nor instrumentation of server code; however, it even outperforms state-of-the-art white-box vulnerability scanners. Keywords: SQL injections, Search-based testing, Security testing
  • Riboni, D.; Villani, A.; Vitali, D.; Bettini, C.; Mancini, L.V., "Obfuscation of Sensitive Data for Incremental Release of Network Flows," Networking, IEEE/ACM Transactions on, vol. PP, no. 99, pp. 1-1, March 2014. (ID#:14-1567) Available at: Large datasets of real network flows acquired from the Internet are an invaluable resource for the research community. Applications include network modeling and simulation, identification of security attacks, and validation of research results. Unfortunately, network flows carry extremely sensitive information, and this discourages the publication of those datasets. Indeed, existing techniques for network flow sanitization are vulnerable to different kinds of attacks, and solutions proposed for microdata anonymity cannot be directly applied to network traces. In our previous research, we proposed an obfuscation technique for network flows, providing formal confidentiality guarantees under realistic assumptions about the adversary's knowledge. In this paper, we identify the threats posed by the incremental release of network flows, we propose a novel defense algorithm, and we formally prove the achieved confidentiality guarantees. An extensive experimental evaluation of the algorithm for incremental obfuscation, carried out with billions of real Internet flows, shows that our obfuscation technique preserves the utility of flows for network traffic analysis. Keywords: Data privacy; Encryption; IP networks; Knowledge engineering; Privacy; Uncertainty; Data sharing; network flow analysis; privacy; security
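Several of the entries above (Maruseac et al. and Kifer & Machanavajjhala) build on differential privacy. As a rough, self-contained illustration of the underlying mechanism only, not of any cited paper's algorithm, the following Python sketch answers a count query under epsilon-differential privacy via the Laplace mechanism; the dataset, predicate, and epsilon value are hypothetical.

```python
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise as the difference of two
    exponential draws (a standard stdlib-only construction)."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon: float) -> float:
    """Epsilon-differentially-private count. A count query has
    sensitivity 1, so Laplace noise with scale 1/epsilon suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical data: node labels from a provenance workflow
workflow_nodes = ["synthesize", "filter", "publish", "synthesize"]
noisy = private_count(workflow_nodes, lambda n: n == "synthesize", epsilon=0.5)
```

Smaller epsilon values produce noisier answers and stronger privacy; the cited works extend this basic mechanism to structured data such as workflow graphs.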


Articles listed on these pages have been found on publicly available internet pages and are cited with links to those pages. Some of the information included herein has been reprinted with permission from the authors or data repositories. To request removal of the links or modifications to specific citations, email SoS.Project (at) and include the ID# of the specific citation in your correspondence.