Bootstrapping Privacy Compliance in Big Data Systems

Title: Bootstrapping Privacy Compliance in Big Data Systems
Publication Type: Conference Paper
Year of Publication: 2014
Authors: Sen, S., Guha, S., Datta, A., Rajamani, S.K., Tsai, J., Wing, J.M.
Conference Name: Security and Privacy (SP), 2014 IEEE Symposium on
Date Published: May
Keywords: advertising, automatic privacy policy compliance checking, Big Data, Bing, business imperative privacy policies, cloud computing, cloud services, code-level schema element mapping, compliance, computer bootstrapping, conformance testing, data privacy, datatypes, Grok data inventory, information flow, information flow types, IP networks, Lattices, Legalease language, Map-Reduce-like Big Data systems, minimal human input, parallel programming, personalized user experiences, policy, privacy, privacy compliance bootstrapping, privacy policy specification, program analysis, program annotation, search engines, Semantics, source code, source code (software), user data handling, user trust, web services

With the rapid increase in cloud services collecting and using user data to offer personalized experiences, ensuring that these services comply with their privacy policies has become a business imperative for building user trust. However, most compliance efforts in industry today rely on manual review processes and audits designed to safeguard user data, and are therefore resource-intensive and lack coverage. In this paper, we present our experience building and operating a system to automate privacy policy compliance checking in Bing. Central to the design of the system are (a) Legalease, a language for specifying privacy policies that impose restrictions on how user data is handled, and (b) Grok, a data inventory for Map-Reduce-like big data systems that tracks how user data flows among programs. Grok maps code-level schema elements to datatypes in Legalease, in essence annotating existing programs with information flow types with minimal human input. Compliance checking is thus reduced to information flow analysis of Big Data systems. The system, bootstrapped by a small team, checks on a daily basis the compliance of millions of lines of ever-changing source code written by several thousand developers.
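The reduction of compliance checking to information flow analysis can be illustrated with a small sketch. Everything in it is a simplifying assumption rather than the paper's design: the datatype hierarchy in `PARENT`, the `DenyClause` model with datatype exceptions, the `Job` record, and the hand-written column-to-datatype labels (standing in for the mapping Grok infers) are all hypothetical, and the code does not reproduce Legalease syntax or Grok's implementation. Labels are propagated conservatively through jobs listed in dependency order, and a flow is flagged when a job with a denied purpose reads data whose label falls under a denied datatype.

```python
from dataclasses import dataclass

# Hypothetical datatype hierarchy (child -> parent); "AnyData" is the top element.
# Names are illustrative only, not the paper's actual lattice.
PARENT = {
    "IPAddress": "PersonalData",
    "SearchQuery": "PersonalData",
    "PersonalData": "AnyData",
}


def covers(general, specific):
    """Return True if `specific` is `general` or sits below it in PARENT."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False


@dataclass
class DenyClause:
    """Simplified, hypothetical clause: deny `datatype` for `purpose`, minus exceptions."""
    datatype: str
    purpose: str
    except_datatypes: frozenset = frozenset()

    def forbids(self, label, purpose):
        if purpose != self.purpose or not covers(self.datatype, label):
            return False
        # An exception that covers the label re-allows the flow.
        return not any(covers(exc, label) for exc in self.except_datatypes)


@dataclass
class Job:
    """A Map-Reduce-style job that reads and writes named columns for one purpose."""
    name: str
    purpose: str
    inputs: list
    outputs: list


def check(jobs, column_labels, policy):
    """Propagate datatype labels through jobs (given in dependency order) and
    return sorted (job name, datatype) pairs that some clause forbids."""
    labels = {col: set(dts) for col, dts in column_labels.items()}
    violations = set()
    for job in jobs:
        # Conservative flow: every output may carry every input's datatypes.
        flowing = set()
        for col in job.inputs:
            flowing |= labels.get(col, set())
        for out in job.outputs:
            labels.setdefault(out, set()).update(flowing)
        for label in flowing:
            if any(clause.forbids(label, job.purpose) for clause in policy):
                violations.add((job.name, label))
    return sorted(violations)


# Usage with made-up columns, jobs, and a single policy clause.
column_labels = {"logs.client_ip": {"IPAddress"}, "logs.query": {"SearchQuery"}}
jobs = [
    Job("sessionize", "Analytics", ["logs.client_ip", "logs.query"], ["sessions.key"]),
    Job("ad_model", "Advertising", ["sessions.key"], ["ads.features"]),
]
policy = [DenyClause("PersonalData", "Advertising", frozenset({"SearchQuery"}))]
print(check(jobs, column_labels, policy))  # [('ad_model', 'IPAddress')]
```

Here the IP address reaches the advertising job indirectly through `sessions.key`, which is the kind of transitive flow a per-program review can miss; the search query reaches the same job but is covered by the clause's exception. Copying every input label to every output over-approximates the real flows, so a checker built this way can report false positives but will not miss a flow through the listed jobs.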

Citation Key: 6956573