Visible to the public TWC: Small: CrowdVerify: Using the Crowd to Summarize Web Site Privacy Policies and Terms of Use PoliciesConflict Detection Enabled

Project Details

Lead PI

Performance Period

Oct 01, 2014 - Sep 30, 2017


Carnegie-Mellon University

Award Number

Outcomes Report URL

Everyday web users have little guidance in handling the growing number of privacy issues they face when they go online. Many web sites - some legitimate, some less so - have behaviors many would consider unexpected or undesirable. These include popular and well-known web sites, as well as web sites that aim to dupe customers with "free" trials. These kinds of sites often detail their behaviors in privacy policies and terms of use pages, but these policies are rarely read, hard to understand, and sometimes intentionally obfuscated with legal jargon, small text, and pale fonts. The goal of this research is to develop new techniques to pinpoint and summarize the most surprising and most important parts of policies. The results of this research will be made publicly available on a web site and through web browser extensions.

The major research activity for this research will be to design, implement, and evaluate CrowdVerify, a system that combines crowdsourcing with machine learning techniques to flag the most important and unexpected behaviors of web sites. The core idea is to slice up a given policy into smaller text segments, have crowd workers compare different segments, and then aggregate the results together. A number of competitor scoring systems will also be evaluated for rating the importance of segments, including ELO, Glicko, and TrueSkill. Using these results, computational models will be built that can predict what people find most surprising as well as most important in web policies.