Visible to the public Biblio

Filters: Keyword is Computing Theory and Privacy  [Clear All Filters]
Gu, Bruce, Wang, Xiaodong, Qu, Youyang, Jin, Jiong, Xiang, Yong, Gao, Longxiang.  2019.  Context-Aware Privacy Preservation in a Hierarchical Fog Computing System. ICC 2019 - 2019 IEEE International Conference on Communications (ICC). :1–6.
Fog computing faces various security and privacy threats. Internet of Things (IoTs) devices have limited computing, storage, and other resources. They are vulnerable to attack by adversaries. Although the existing privacy-preserving solutions in fog computing can be migrated to address some privacy issues, specific privacy challenges still exist because of the unique features of fog computing, such as the decentralized and hierarchical infrastructure, mobility, location and content-aware applications. Unfortunately, privacy-preserving issues and resources in fog computing have not been systematically identified, especially the privacy preservation in multiple fog node communication with end users. In this paper, we propose a dynamic MDP-based privacy-preserving model in zero-sum game to identify the efficiency of the privacy loss and payoff changes to preserve sensitive content in a fog computing environment. First, we develop a new dynamic model with MDP-based comprehensive algorithms. Then, extensive experimental results identify the significance of the proposed model compared with others in more effectively and feasibly solving the discussed issues.
Oya, Simon, Troncoso, Carmela, Pèrez-Gonzàlez, Fernando.  2019.  Rethinking Location Privacy for Unknown Mobility Behaviors. 2019 IEEE European Symposium on Security and Privacy (EuroS P). :416–431.
Location Privacy-Preserving Mechanisms (LPPMs) in the literature largely consider that users' data available for training wholly characterizes their mobility patterns. Thus, they hardwire this information in their designs and evaluate their privacy properties with these same data. In this paper, we aim to understand the impact of this decision on the level of privacy these LPPMs may offer in real life when the users' mobility data may be different from the data used in the design phase. Our results show that, in many cases, training data does not capture users' behavior accurately and, thus, the level of privacy provided by the LPPM is often overestimated. To address this gap between theory and practice, we propose to use blank-slate models for LPPM design. Contrary to the hardwired approach, that assumes known users' behavior, blank-slate models learn the users' behavior from the queries to the service provider. We leverage this blank-slate approach to develop a new family of LPPMs, that we call Profile Estimation-Based LPPMs. Using real data, we empirically show that our proposal outperforms optimal state-of-the-art mechanisms designed on sporadic hardwired models. On non-sporadic location privacy scenarios, our method is only better if the usage of the location privacy service is not continuous. It is our hope that eliminating the need to bootstrap the mechanisms with training data and ensuring that the mechanisms are lightweight and easy to compute help fostering the integration of location privacy protections in deployed systems.
Chen, Lvhao, Liao, Xiaofeng, Mu, Nankun, Wu, Jiahui, Junqing, Junqing.  2019.  Privacy-Preserving Fuzzy Multi-Keyword Search for Multiple Data Owners in Cloud Computing. 2019 IEEE Symposium Series on Computational Intelligence (SSCI). :2166–2171.
With cloud computing's development, more users are decide to store information on the cloud server. Owing to the cloud server's insecurity, many documents should be encrypted to avoid information leakage before being sent to the cloud. Nevertheless, it leads to the problem that plaintext search techniques can not be directly applied to the ciphertext search. In this case, many searchable encryption schemes based on single data owner model have been proposed. But, the actual situation is that users want to do research with encrypted documents originating from various data owners. This paper puts forward a privacy-preserving scheme that is based on fuzzy multi-keyword search (PPFMKS) for multiple data owners. For the sake of espousing fuzzy multi-keyword and accurate search, secure indexes on the basis of Locality-Sensitive Hashing (LSH) and Bloom Filter (BF)are established. To guarantee the search privacy under multiple data owners model, a new encryption method allowing that different data owners have diverse keys to encrypt files is proposed. This method also solves the high cost caused by inconvenience of key management.
Zhang, Shuaipeng, Liu, Hong.  2019.  Environment Aware Privacy-Preserving Authentication with Predictability for Medical Edge Computing. 2019 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). :90–96.
With the development of IoT, smart health has significantly improved the quality of people's life. A large amount of smart health monitoring system has been proposed, which provides an opportunity for timely and efficient diagnosis. Nevertheless, most of them ignored the impact of environment on patients' health. Due to the openness of the communication channel, data security and privacy preservation are crucial problems to be solved. In this work, an environment aware privacy-preserving authentication protocol based on the fuzzy extractor and elliptic curve cryptography (ecc) is designed for health monitoring system with mutual authentication and anonymity. Edge computing unit can authenticate all environmental sensors at one time. Fuzzy synthetic evaluation model is utilized to evaluate the environment equality with the patients' temporal health index (THI) as an assessment factor, which can help to predict the appropriate environment. The session key is established for secure communication based on the predicted result. Through security analysis, the proposed protocol can prevent common attacks. Moreover, performance analysis shows that the proposed protocol is applicable for resource-limited smart devices in edge computing health monitoring system.
Becher, Kilian, Beck, Martin, Strufe, Thorsten.  2019.  An Enhanced Approach to Cloud-based Privacy-preserving Benchmarking. 2019 International Conference on Networked Systems (NetSys). :1–8.
Benchmarking is an important measure for companies to investigate their performance and to increase efficiency. As companies usually are reluctant to provide their key performance indicators (KPIs) for public benchmarks, privacy-preserving benchmarking systems are required. In this paper, we present an enhanced privacy-preserving benchmarking protocol, which we implemented and evaluated based on the real-world scenario of product cost optimisation. It is based on homomorphic encryption and enables cloud-based KPI comparison, providing a variety of statistical measures. The theoretical and empirical evaluation of our benchmarking system underlines its practicability.
Chertchom, Prajak, Tanimoto, Shigeaki, Konosu, Tsutomu, Iwashita, Motoi, Kobayashi, Toru, Sato, Hiroyuki, Kanai, Atsushi.  2019.  Data Management Portfolio for Improvement of Privacy in Fog-to-cloud Computing Systems. 2019 8th International Congress on Advanced Applied Informatics (IIAI-AAI). :884–889.
With the challenge of the vast amount of data generated by devices at the edge of networks, new architecture needs a well-established data service model that accounts for privacy concerns. This paper presents an architecture of data transmission and a data portfolio with privacy for fog-to-cloud (DPPforF2C). We would like to propose a practical data model with privacy from a digitalized information perspective at fog nodes. In addition, we also propose an architecture for implicating the privacy of DPPforF2C used in fog computing. Technically, we design a data portfolio based on the Message Queuing Telemetry Transport (MQTT) and the Advanced Message Queuing Protocol (AMQP). We aim to propose sample data models with privacy architecture because there are some differences in the data obtained from IoT devices and sensors. Thus, we propose an architecture with the privacy of DPPforF2C for publishing data from edge devices to fog and to cloud servers that could be applied to fog architecture in the future.
Dong, Guishan, Chen, Yuxiang, Fan, Jia, Liu, Dijun, Hao, Yao, Wang, Zhen.  2018.  A Privacy-User-Friendly Scheme for Wearable Smart Sensing Devices Based on Blockchain. 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). :481–486.
Wearable smart sensing devices presently become more and more popular in people's daily life, which also brings serious problems related to personal data privacy. In order to provide users better experiences, wearable smart sensing devices are collecting users' personal data all the time and uploading the data to service provider to get computing services, which objectively let service provider master each user's condition and cause a lot of problems such as spam, harassing call, etc. This paper designs a blockchain based scheme to solve such problems by cutting off the association between user identifier and its sensing data from perspective of shielding service providers and adversaries. Firstly, privacy requirements and situations in smart sensing area are reviewed. Then, three key technologies are introduced in the scheme including its theories, purposes and usage. Next, the designed protocol is shown and analyzed in detail. Finally, security analysis and engineering feasibility of the scheme are given. This scheme will give user better experience from privacy protection perspective in smart sensing area.
Zhang, Xueru, Khalili, Mohammad Mahdi, Liu, Mingyan.  2018.  Recycled ADMM: Improve Privacy and Accuracy with Less Computation in Distributed Algorithms. 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton). :959–965.
Alternating direction method of multiplier (ADMM) is a powerful method to solve decentralized convex optimization problems. In distributed settings, each node performs computation with its local data and the local results are exchanged among neighboring nodes in an iterative fashion. During this iterative process the leakage of data privacy arises and can accumulate significantly over many iterations, making it difficult to balance the privacy-utility tradeoff. In this study we propose Recycled ADMM (R-ADMM), where a linear approximation is applied to every even iteration, its solution directly calculated using only results from the previous, odd iteration. It turns out that under such a scheme, half of the updates incur no privacy loss and require much less computation compared to the conventional ADMM. We obtain a sufficient condition for the convergence of R-ADMM and provide the privacy analysis based on objective perturbation.
Kohli, Nitin, Laskowski, Paul.  2018.  Epsilon Voting: Mechanism Design for Parameter Selection in Differential Privacy. 2018 IEEE Symposium on Privacy-Aware Computing (PAC). :19–30.
The behavior of a differentially private system is governed by a parameter epsilon which sets a balance between protecting the privacy of individuals and returning accurate results. While a system owner may use a number of heuristics to select epsilon, existing techniques may be unresponsive to the needs of the users who's data is at risk. A promising alternative is to allow users to express their preferences for epsilon. In a system we call epsilon voting, users report the parameter values they want to a chooser mechanism, which aggregates them into a single value. We apply techniques from mechanism design to ask whether such a chooser mechanism can itself be truthful, private, anonymous, and also responsive to users. Without imposing restrictions on user preferences, the only feasible mechanisms belong to a class we call randomized dictatorships with phantoms. This is a restrictive class in which at most one user has any effect on the chosen epsilon. On the other hand, when users exhibit single-peaked preferences, a broader class of mechanisms - ones that generalize the median and other order statistics - becomes possible.
Fimiani, Gianluca.  2018.  Supporting Privacy in a Cloud-Based Health Information System by Means of Fuzzy Conditional Identity-Based Proxy Re-encryption (FCI-PRE). 2018 32nd International Conference on Advanced Information Networking and Applications Workshops (WAINA). :569–572.
Healthcare is traditionally a data-intensive domain, where physicians needs complete and updated anamnesis of their patients to take the best medical decisions. Dematerialization of the medical documents and the consequent health information systems to share electronic health records among healthcare providers are paving the way to an effective solution to this issue. However, they are also paving the way of non-negligible privacy issues that are limiting the full application of these technologies. Encryption is a valuable means to resolve such issues, however the current schemes are not able to cope with all the needs and challenges that the cloud-based sharing of electronic health records imposes. In this work we have investigated the use of a novel scheme where encryption is combined with biometric authentication, and defines a preliminary solution.
Liu, Qin, Pei, Shuyu, Xie, Kang, Wu, Jie, Peng, Tao, Wang, Guojun.  2018.  Achieving Secure and Effective Search Services in Cloud Computing. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :1386–1391.
One critical challenge of today's cloud services is how to provide an effective search service while preserving user privacy. In this paper, we propose a wildcard-based multi-keyword fuzzy search (WMFS) scheme over the encrypted data, which tolerates keyword misspellings by exploiting the indecomposable property of primes. Compared with existing secure fuzzy search schemes, our WMFS scheme has the following merits: 1) Efficiency. It eliminates the requirement of a predefined dictionary and thus supports updates efficiently. 2) High accuracy. It eliminates the false positive and false negative introduced by specific data structures and thus allows the user to retrieve files as accurate as possible. 3) Flexibility. It gives the user great flexibility to specify different search patterns including keyword and substring matching. Extensive experiments on a real data set demonstrate the effectiveness and efficiency of our scheme.
Gao, Meng-Qi, Han, Jian-Min, Lu, Jian-Feng, Peng, Hao, Hu, Zhao-Long.  2018.  Incentive Mechanism for User Collaboration on Trajectory Privacy Preservation. 2018 IEEE SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computing, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). :1976–1981.
Collaborative trajectory privacy preservation (CTPP) scheme is an effective method for continuous queries. However, collaborating with other users need pay some cost. Therefore, some rational and selfish users will not choose collaboration, which will result in users' privacy disclosing. To solve the problem, this paper proposes a collaboration incentive mechanism by rewarding collaborative users and punishing non-collaborative users. The paper models the interactions of users participating in CTPP as a repeated game and analysis the utility of participated users. The analytical results show that CTPP with the proposed incentive mechanism can maximize user's payoffs. Experiments show that the proposed mechanism can effectively encourage users' collaboration behavior and effectively preserve the trajectory privacy for continuous query users.
Han, Xu, Liu, Yanheng, Wang, Jian.  2018.  Modeling and analyzing privacy-awareness social behavior network. IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). :7–12.
The increasingly networked human society requires that human beings have a clear understanding and control over the structure, nature and behavior of various social networks. There is a tendency towards privacy in the study of network evolutions because privacy disclosure behavior in the network has gradually developed into a serious concern. For this purpose, we extended information theory and proposed a brand-new concept about so-called “habitual privacy” to quantitatively analyze privacy exposure behavior and facilitate privacy computation. We emphasized that habitual privacy is an inherent property of the user and is correlated with their habitual behaviors. The widely approved driving force in recent modeling complex networks is originated from activity. Thus, we propose the privacy-driven model through synthetically considering the activity impact and habitual privacy underlying the decision process. Privacy-driven model facilitates to more accurately capture highly dynamical network behaviors and figure out the complex evolution process, allowing a profound understanding of the evolution of network driven by privacy.
Li, Wei, Hu, Chunqiang, Song, Tianyi, Yu, Jiguo, Xing, Xiaoshuang, Cai, Zhipeng.  2018.  Privacy-Preserving Data Collection in Context-Aware Applications. 2018 IEEE Symposium on Privacy-Aware Computing (PAC). :75–85.
Thanks to the development and popularity of context-aware applications, the quality of users' life has been improved through a wide variety of customized services. Meanwhile, users are suffering severe risk of privacy leakage and their privacy concerns are growing over time. To tackle the contradiction between the serious privacy issues and the growing privacy concerns in context-aware applications, in this paper, we propose a privacy-preserving data collection scheme by incorporating the complicated interactions among user, attacker, and service provider into a three-antithetic-party game. Under such a novel game model, we identify and rigorously prove the best strategies of the three parties and the equilibriums of the games. Furthermore, we evaluate the performance of our proposed data collection game by performing extensive numerical experiments, confirming that the user's data privacy can be effective preserved.
Zhang, Xuejun, Chen, Qian, Peng, Xiaohui, Jiang, Xinlong.  2019.  Differential Privacy-Based Indoor Localization Privacy Protection in Edge Computing. 2019 IEEE SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computing, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). :491–496.

With the popularity of smart devices and the widespread use of the Wi-Fi-based indoor localization, edge computing is becoming the mainstream paradigm of processing massive sensing data to acquire indoor localization service. However, these data which were conveyed to train the localization model unintentionally contain some sensitive information of users/devices, and were released without any protection may cause serious privacy leakage. To solve this issue, we propose a lightweight differential privacy-preserving mechanism for the edge computing environment. We extend ε-differential privacy theory to a mature machine learning localization technology to achieve privacy protection while training the localization model. Experimental results on multiple real-world datasets show that, compared with the original localization technology without privacy-preserving, our proposed scheme can achieve high accuracy of indoor localization while providing differential privacy guarantee. Through regulating the value of ε, the data quality loss of our method can be controlled up to 8.9% and the time consumption can be almost negligible. Therefore, our scheme can be efficiently applied in the edge networks and provides some guidance on indoor localization privacy protection in the edge computing.

Ding, Hongfa, Peng, Changgen, Tian, Youliang, Xiang, Shuwen.  2019.  A Game Theoretical Analysis of Risk Adaptive Access Control for Privacy Preserving. 2019 International Conference on Networking and Network Applications (NaNA). :253–258.

More and more security and privacy issues are arising as new technologies, such as big data and cloud computing, are widely applied in nowadays. For decreasing the privacy breaches in access control system under opening and cross-domain environment. In this paper, we suggest a game and risk based access model for privacy preserving by employing Shannon information and game theory. After defining the notions of Privacy Risk and Privacy Violation Access, a high-level framework of game theoretical risk based access control is proposed. Further, we present formulas for estimating the risk value of access request and user, construct and analyze the game model of the proposed access control by using a multi-stage two player game. There exists sub-game perfect Nash equilibrium each stage in the risk based access control and it's suitable to protect the privacy by limiting the privacy violation access requests.

Takbiri, Nazanin, Shao, Xiaozhe, Gao, Lixin, Pishro-Nik, Hossein.  2019.  Improving Privacy in Graphs Through Node Addition. 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton). :487–494.

The rapid growth of computer systems which generate graph data necessitates employing privacy-preserving mechanisms to protect users' identity. Since structure-based de-anonymization attacks can reveal users' identity's even when the graph is simply anonymized by employing naïve ID removal, recently, k- anonymity is proposed to secure users' privacy against the structure-based attack. Most of the work ensured graph privacy using fake edges, however, in some applications, edge addition or deletion might cause a significant change to the key property of the graph. Motivated by this fact, in this paper, we introduce a novel method which ensures privacy by adding fake nodes to the graph. First, we present a novel model which provides k- anonymity against one of the strongest attacks: seed-based attack. In this attack, the adversary knows the partial mapping between the main graph and the graph which is generated using the privacy-preserving mechanisms. We show that even if the adversary knows the mapping of all of the nodes except one, the last node can still have k- anonymity privacy. Then, we turn our attention to the privacy of the graphs generated by inter-domain routing against degree attacks in which the degree sequence of the graph is known to the adversary. To ensure the privacy of networks against this attack, we propose a novel method which tries to add fake nodes in a way that the degree of all nodes have the same expected value.

Li, Xianxian, Luo, Chunfeng, Liu, Peng, Wang, Li-e.  2019.  Information Entropy Differential Privacy: A Differential Privacy Protection Data Method Based on Rough Set Theory. 2019 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :918–923.

Data have become an important asset for analysis and behavioral prediction, especially correlations between data. Privacy protection has aroused academic and social concern given the amount of personal sensitive information involved in data. However, existing works assume that the records are independent of each other, which is unsuitable for associated data. Many studies either fail to achieve privacy protection or lead to excessive loss of information while applying data correlations. Differential privacy, which achieves privacy protection by injecting random noise into the statistical results given the correlation, will improve the background knowledge of adversaries. Therefore, this paper proposes an information entropy differential privacy solution for correlation data privacy issues based on rough set theory. Under the solution, we use rough set theory to measure the degree of association between attributes and use information entropy to quantify the sensitivity of the attribute. The information entropy difference privacy is achieved by clustering based on the correlation and adding personalized noise to each cluster while preserving the correlations between data. Experiments show that our algorithm can effectively preserve the correlation between the attributes while protecting privacy.

Cerf, Sophie, Robu, Bogdan, Marchand, Nicolas, Mokhtar, Sonia Ben, Bouchenak, Sara.  2018.  A Control-Theoretic Approach for Location Privacy in Mobile Applications. 2018 IEEE Conference on Control Technology and Applications (CCTA). :1488-1493.

The prevalent use of mobile applications using location information to improve the quality of their service has arisen privacy issues, particularly regarding the extraction of user's points on interest. Many studies in the literature focus on presenting algorithms that allow to protect the user of such applications. However, these solutions often require a high level of expertise to be understood and tuned properly. In this paper, the first control-based approach of this problem is presented. The protection algorithm is considered as the ``physical'' plant and its parameters as control signals that enable to guarantee privacy despite user's mobility pattern. The following of the paper presents the first control formulation of POI-related privacy measure, as well as dynamic modeling and a simple yet efficient PI control strategy. The evaluation using simulated mobility records shows the relevance and efficiency of the presented approach.

Martiny, Karsten, Elenius, Daniel, Denker, Grit.  2018.  Protecting Privacy with a Declarative Policy Framework. 2018 IEEE 12th International Conference on Semantic Computing (ICSC). :227–234.

This article describes a privacy policy framework that can represent and reason about complex privacy policies. By using a Common Data Model together with a formal shareability theory, this framework enables the specification of expressive policies in a concise way without burdening the user with technical details of the underlying formalism. We also build a privacy policy decision engine that implements the framework and that has been deployed as the policy decision point in a novel enterprise privacy prototype system. Our policy decision engine supports two main uses: (1) interfacing with user interfaces for the creation, validation, and management of privacy policies; and (2) interfacing with systems that manage data requests and replies by coordinating privacy policy engine decisions and access to (encrypted) databases using various privacy enhancing technologies.

Yuan, Ganzhao, Yang, Yin, Zhang, Zhenjie, Hao, Zhifeng.  2016.  Convex Optimization for Linear Query Processing Under Approximate Differential Privacy. Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :2005–2014.

Differential privacy enables organizations to collect accurate aggregates over sensitive data with strong, rigorous guarantees on individuals' privacy. Previous work has found that under differential privacy, computing multiple correlated aggregates as a batch, using an appropriate strategy, may yield higher accuracy than computing each of them independently. However, finding the best strategy that maximizes result accuracy is non-trivial, as it involves solving a complex constrained optimization program that appears to be non-convex. Hence, in the past much effort has been devoted in solving this non-convex optimization program. Existing approaches include various sophisticated heuristics and expensive numerical solutions. None of them, however, guarantees to find the optimal solution of this optimization problem. This paper points out that under (ε, ཬ)-differential privacy, the optimal solution of the above constrained optimization problem in search of a suitable strategy can be found, rather surprisingly, by solving a simple and elegant convex optimization program. Then, we propose an efficient algorithm based on Newton's method, which we prove to always converge to the optimal solution with linear global convergence rate and quadratic local convergence rate. Empirical evaluations demonstrate the accuracy and efficiency of the proposed solution.

Malluhi, Qutaibah M., Shikfa, Abdullatif, Trinh, Viet Cuong.  2016.  An Efficient Instance Hiding Scheme. Proceedings of the Seventh Symposium on Information and Communication Technology. :388–395.

Delegating computation, which is applicable to many practical contexts such as cloud computing or pay-TV system, concerns the task where a computationally weak client wants to securely compute a very complex function f on a given input with the help of a remote computationally strong but untrusted server. The requirement is that the computation complexity of the client is much more efficient than that of f, ideally it should be in constant time or in NC0. This task has been investigated in several contexts such as instance hiding, randomized encoding, fully homomorphic encryption, garbling schemes, and verifiable scheme. In this work, we specifically consider the context where only the client has an input and gets an output, also called instance hiding. Concretely, we first give a survey of delegating computation, we then propose an efficient instance hiding scheme with passive input privacy. In our scheme, the computation complexity of the client is in NC0 and that of the server is exactly the same as the original function f. Regarding communication complexity, the client in our scheme just needs to transfer 4textbarftextbar + textbarxtextbar bits to the server, where textbarftextbar is the size of the circuit representing f and textbarxtextbar is the length of the input of f.

Bassily, Raef, Nissim, Kobbi, Smith, Adam, Steinke, Thomas, Stemmer, Uri, Ullman, Jonathan.  2016.  Algorithmic Stability for Adaptive Data Analysis. Proceedings of the Forty-eighth Annual ACM Symposium on Theory of Computing. :1046–1059.

Adaptivity is an important feature of data analysis - the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated a general formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution P and a set of n independent samples x is drawn from P. We seek an algorithm that, given x as input, accurately answers a sequence of adaptively chosen ``queries'' about the unknown distribution P. How many samples n must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? In this work we make two new contributions towards resolving this question: We give upper bounds on the number of samples n that are needed to answer statistical queries. The bounds improve and simplify the work of Dwork et al. (STOC, 2015), and have been applied in subsequent work by those authors (Science, 2015; NIPS, 2015). We prove the first upper bounds on the number of samples required to answer more general families of queries. These include arbitrary low-sensitivity queries and an important class of optimization queries (alternatively, risk minimization queries). As in Dwork et al., our algorithms are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that the stability notion guaranteed by differential privacy implies low generalization error. We also show that weaker stability guarantees such as bounded KL divergence and total variation distance lead to correspondingly weaker generalization guarantees.

Cummings, Rachel, Ligett, Katrina, Radhakrishnan, Jaikumar, Roth, Aaron, Wu, Zhiwei Steven.  2016.  Coordination Complexity: Small Information Coordinating Large Populations. Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science. :281–290.

We initiate the study of a quantity that we call coordination complexity. In a distributed optimization problem, the information defining a problem instance is distributed among n parties, who need to each choose an action, which jointly will form a solution to the optimization problem. The coordination complexity represents the minimal amount of information that a centralized coordinator, who has full knowledge of the problem instance, needs to broadcast in order to coordinate the n parties to play a nearly optimal solution. We show that upper bounds on the coordination complexity of a problem imply the existence of good jointly differentially private algorithms for solving that problem, which in turn are known to upper bound the price of anarchy in certain games with dynamically changing populations. We show several results. We fully characterize the coordination complexity for the problem of computing a many-to-one matching in a bipartite graph. Our upper bound in fact extends much more generally to the problem of solving a linearly separable convex program. We also give a different upper bound technique, which we use to bound the coordination complexity of coordinating a Nash equilibrium in a routing game, and of computing a stable matching.