
Spotlight on Lablet Research #25 - Governance for Big Data

Lablet: International Computer Science Institute (ICSI)
Participating Sub-Lablet: University of California at Berkeley

This project aims to synthesize computer science abstractions with governance goals. The risk in governance for big data is that access control does not capture privacy requirements. Sensitive inferences and reidentification are a particular concern: it is difficult to redact sensitive information from rich data sets, and sensitive data can often be reidentified using proxies or additional information from outside the data set. Machine learning may find such correlations automatically, and binary allow/deny access control fails to capture this risk well. Limiting sensitive inferences touches on several related areas, including differential privacy, encryption and access control, and fairness.
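The reidentification risk described above can be illustrated with a minimal linkage sketch. The records, field names, and the `link` helper below are hypothetical toy examples, not data or code from the project. The released data set has its direct identifier (name) removed, which is what an allow/deny rule on a "name" column would enforce, yet joining it with a public auxiliary data set on shared quasi-identifiers (ZIP code, birth year, sex) still attaches names to diagnoses whenever a combination is unique.

```python
# Hypothetical "de-identified" release: names removed, quasi-identifiers kept.
released = [
    {"zip": "94704", "birth_year": 1985, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "94704", "birth_year": 1990, "sex": "M", "diagnosis": "asthma"},
    {"zip": "94720", "birth_year": 1985, "sex": "F", "diagnosis": "hypertension"},
]

# Hypothetical public auxiliary data (e.g. a directory) sharing the same fields.
auxiliary = [
    {"name": "Alice", "zip": "94704", "birth_year": 1985, "sex": "F"},
    {"name": "Bob", "zip": "94704", "birth_year": 1990, "sex": "M"},
    {"name": "Carol", "zip": "94720", "birth_year": 1985, "sex": "F"},
]

# Fields present in both data sets that together can single out a person.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def qi_key(record):
    """Return the tuple of quasi-identifier values for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(released_rows, auxiliary_rows):
    """Map names to sensitive values wherever the quasi-identifiers match exactly one person."""
    index = {}
    for person in auxiliary_rows:
        index.setdefault(qi_key(person), []).append(person["name"])

    matches = {}
    for row in released_rows:
        names = index.get(qi_key(row), [])
        if len(names) == 1:  # a unique match reidentifies the row
            matches[names[0]] = row["diagnosis"]
    return matches


print(link(released, auxiliary))
# {'Alice': 'diabetes', 'Bob': 'asthma', 'Carol': 'hypertension'}
```

Every row in this toy release is reidentified even though no one was ever granted access to names. If two auxiliary records shared the same quasi-identifiers, the sketch would leave that row unlinked, which is the intuition behind generalizing or perturbing such fields rather than relying on access control alone.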

Software development teams are responsible for making and implementing software design decisions that directly impact end-user privacy, a challenging task to do well. Privacy Champions within organizations (people who strongly care about advocating for privacy) play a useful role in supporting privacy-respecting development cultures. To understand their motivations, challenges, and strategies for protecting end-user privacy, the research team led by Principal Investigator (PI) Serge Egelman conducted 12 interviews with Privacy Champions in software development teams. The interviews revealed that common barriers to implementing privacy in software design include negative privacy culture, internal prioritization tensions, limited tool support, unclear evaluation metrics, and technical complexity. To promote privacy, Privacy Champions regularly rely on informal discussions, management support, communication among stakeholders, and documentation and guidelines. They perceive code reviews and practical training as more instructive than general privacy awareness and onboarding training. This study is a first step toward understanding how Privacy Champions work to strengthen their organizations' privacy practices and the privacy of end-user products.

With colleagues at the University of Edinburgh, the research team designed and conducted a study of how the privacy information and choices presented to mobile app developers influence their decisions about integrating personalized vs. non-personalized ads into their apps. Mobile advertising networks promote personalized advertisements to developers as a way to increase revenue. These ads use data about users to select potentially more relevant content. However, the way these choices are framed also affects app developers' decisions, which in turn affect their users' privacy. Currently, ad networks provide choices in developer-facing dashboards that control the types of information the ad network collects as well as how users will be asked for consent. Framing and nudging have been shown to affect users' choices about privacy, and they are believed to have a similar effect on choices made by developers. The researchers conducted a survey-based online experiment with 400 participants with experience in mobile app development. Across six conditions, the researchers varied the framing of the options around ad personalization. Participants in the condition where the options highlighted the privacy consequences of ad personalization were 11.06 times more likely to choose non-personalized ads than participants in the Control condition, which gave no information about privacy; this difference was statistically significant. Participants' choice of ad type was driven by its impact on revenue, user privacy, and relevance to users. The findings suggest that developers are influenced by the interfaces they are presented with and need transparent options.

With colleagues at the University of Bristol, the research team is initiating a study to examine how developers of mobile health apps approach privacy for those apps, including developers' expectations and beliefs about legal protections for health data, technical possibilities for protecting privacy, and users' expectations and preferences. As a first step, researchers are analyzing posts on Stack Overflow to develop an initial sketch of the technical concerns and issues that health app developers have about the privacy of the data they handle, and who or what is driving those concerns. For example, many questions posted in the last few years relate to the permission requirements for the health-specific APIs provided by mobile platforms (Apple's HealthKit, Google Fit, Samsung Health). The research team is wrapping up the analysis of Stack Overflow posts, examining other textual sources, and turning to the design of a human subjects study with members of health app product teams. This next step will involve interviews with app developers to broaden the researchers' view of how developers approach health data.

The team is also designing a vignette-style survey study to analyze users' privacy expectations about consumer health apps and what those expectations are based on. The study compares different types of apps that users may perceive as more or less likely to be constrained by laws regulating the collection and sharing of health data, for example telemedicine apps, workout trackers, and dating apps for people with certain medical conditions.

Finally, with colleagues at Aalto University in Finland, the research team is preparing to conduct a related study of healthcare professionals' views of consumer health apps. The study design has been developed and submitted to Aalto University for ethics review. Research questions include how healthcare professionals' views and expectations about the collection of health data by consumer health apps relate to the policies and governance structures for health data in medical settings. The plan includes interviews with healthcare professionals in several countries (the U.S., Finland, Sweden, Sri Lanka, and Singapore) that differ considerably in their legal and ethical protections for medical data, the degree of centralization and control in the systems where patients' medical data resides, and the availability of government-sponsored health self-management platforms and apps.
