SaTC: CORE: Small: Models and Measurements for Website Fingerprinting

Project Details

Performance Period

Sep 01, 2018 - Aug 31, 2021


University of Minnesota - Twin Cities

Award Number

Many private interactions between individuals and their friends, families, employers, and institutions are now carried out on the Internet; disclosure of the contents of these interactions, or even of the mere associations between these parties, can expose people to real financial or physical risks. As a result, encryption and services that conceal the connection between a user and the websites they visit, such as virtual private networks or the Tor project, are growing in popularity. Website fingerprinting attacks use information that these techniques do not conceal, such as file sizes and download times, to re-identify the websites a user visits. Although these attacks work in a lab environment, it is a challenge to evaluate them in practical settings and to develop effective protections against them. This project will apply statistics and machine learning to develop new probabilistic models of the "fingerprint" of a website, metrics of the amount of information these models reveal, and privacy-preserving algorithms and datasets to test these models. The results of these models and tests will be used to assess the threat posed by website fingerprinting and to inform the design of new defense mechanisms. Users will thus benefit from improved protection techniques, while other researchers can use the resulting models, datasets, and metrics to study the effectiveness of website fingerprinting defenses. The work and data will also support undergraduate and graduate education through courses and research training.

This project will seek to address three key challenges in website fingerprinting research -- privacy-preserving characterization of background traffic, maintaining fingerprint databases, and evaluation and comparison of defenses -- by developing new representations of website fingerprints that can assign a likelihood to any fingerprint being generated by a specific type of download. Using these representations, the project will pursue four main thrusts. First, the project will use these models to determine the extent to which website fingerprints can be used to directly infer identifying features of a download without requiring a database of all possible web pages, moving from existing closed-world approaches to website fingerprinting toward more broadly applicable open-world approaches. Second, the project will develop algorithms to train the models on live traffic while preserving the privacy of individual users, using concepts from differential privacy. Third, using the trained models, the project will provide the first assessment of attacks and defenses on realistic data, and will use metrics from information theory to compare those attacks and defenses on an equal footing. Finally, the project will use the results of these evaluations to develop new defensive techniques that can be applied directly to the content of privacy-sensitive sites and to systems designed to protect users' downloads, such as the Tor network.
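
As a rough illustration of how an information-theoretic metric could compare defenses on an equal footing, the sketch below estimates the mutual information, in bits, between the visited site and one observed fingerprint feature. It is not the project's method: the function name, the single coarse "size bucket" feature, and the toy traces are all assumptions made for this example, and the plug-in estimator used here is biased on small samples.

```python
import math
from collections import Counter


def mutual_information_bits(observations):
    """Estimate I(site; feature) in bits from (site, feature) pairs.

    Uses the plug-in (empirical frequency) estimator. This is only an
    illustrative choice; it is biased when samples are scarce.
    """
    n = len(observations)
    if n == 0:
        raise ValueError("need at least one observation")
    joint = Counter(observations)
    site_counts = Counter(site for site, _ in observations)
    feature_counts = Counter(feature for _, feature in observations)
    mi = 0.0
    for (site, feature), count in joint.items():
        # Each term is p(s, f) * log2( p(s, f) / (p(s) * p(f)) ).
        ratio = count * n / (site_counts[site] * feature_counts[feature])
        mi += (count / n) * math.log2(ratio)
    return mi


# Hypothetical toy traces: (site label, download-size bucket).
undefended = [("site_a", "small"), ("site_a", "small"),
              ("site_b", "large"), ("site_b", "large")]
# Same visits after a hypothetical padding defense that makes sizes equal.
padded = [("site_a", "large"), ("site_a", "large"),
          ("site_b", "large"), ("site_b", "large")]

leak_undefended = mutual_information_bits(undefended)
leak_padded = mutual_information_bits(padded)
print(f"undefended leakage: {leak_undefended:.3f} bits")
print(f"padded leakage:     {leak_padded:.3f} bits")
```

In this toy case the feature fully separates the two sites without the defense and reveals nothing with it, so the two leakage values bracket the range that a real defense would fall within. Realistic evaluations would need many features, far more traces, and a less naive estimator.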