Biblio
Preserving privacy is extremely important in data publishing. Existing privacy-preserving models are mostly oriented toward a single sensitive attribute and cannot be applied to settings with multiple sensitive attributes. Moreover, they do not consider the semantic similarity between sensitive attribute values and may be vulnerable to similarity attacks. In this paper, we propose an (l, m, d)-anonymity model to resist similarity attacks on multiple sensitive attributes, where m is the number of sensitive-attribute dimensions. The model uses a semantic hierarchical tree to analyze and compute the semantic dissimilarity between sensitive attribute values, and each equivalence class must contain at least l sensitive attribute values that are d-different on each sensitive-attribute dimension. Meanwhile, to keep the published data highly usable, the model adopts a distance-based measure to partition equivalence classes. We carry out extensive experiments to show that the (l, m, d)-anonymity model can significantly reduce the probability of sensitive-information leakage and protect individual privacy more effectively.
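A minimal sketch of the tree-based dissimilarity idea described above: sensitive values are placed in a small semantic hierarchy, their dissimilarity is a normalized path distance, and an equivalence class can be checked against l and d. The toy disease hierarchy, the normalization, and the greedy check are illustrative assumptions, not the paper's actual taxonomy or algorithm.

```python
# Toy sketch of tree-based semantic dissimilarity between sensitive values.
# The hierarchy, node names, and normalization are illustrative assumptions.

TREE = {  # child -> parent
    "gastritis": "digestive", "ulcer": "digestive",
    "flu": "respiratory", "pneumonia": "respiratory",
    "digestive": "disease", "respiratory": "disease",
    "disease": None,
}

def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = TREE[node]
    return path

HEIGHT = max(len(path_to_root(n)) - 1 for n in TREE)  # tree height

def dissimilarity(a, b):
    """Normalized path distance between two values (0 = identical)."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = len(set(pa) & set(pb))          # shared ancestors
    steps = (len(pa) - common) + (len(pb) - common)
    return steps / (2 * HEIGHT)

def satisfies_l_d(values, l, d):
    """One dimension of an equivalence class: does it contain at least
    l values that are pairwise at least d-dissimilar?"""
    chosen = []
    for v in values:
        if all(dissimilarity(v, c) >= d for c in chosen):
            chosen.append(v)
    return len(chosen) >= l

print(dissimilarity("gastritis", "ulcer"))            # 0.5 (same branch)
print(dissimilarity("gastritis", "flu"))              # 1.0 (different branches)
print(satisfies_l_d(["gastritis", "ulcer", "flu"], l=3, d=0.5))  # True
```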
Aliasing is a known source of challenges in imperative object-oriented languages, and these challenges have led to important advances in type systems for aliasing control. However, large-scale adoption of such type systems has turned out to be surprisingly difficult. While new language designs show promise, they do not address the need for aliasing control in existing languages. This paper presents a new approach to isolation and uniqueness in an existing, widely used language, Scala. The approach is unique in the way it addresses some of the most important obstacles to the adoption of type system extensions for aliasing control. First, adaptation of existing code requires only a minimal set of annotations: only a single bit of information is required per class. Surprisingly, the paper shows that this information can be provided by the object-capability discipline, which is widely used in program security. We formalize our approach as a type system and prove key soundness theorems. The type system is implemented for the full Scala language, providing, for the first time, a sound integration with Scala's local type inference. Finally, we empirically evaluate the conformity of existing open-source Scala code on a corpus of over 75,000 LOC.
There are seemingly many advantages to being able to identify, document, test, and trace single or "atomic" requirements. Why, then, has there been little attention to the topic and no widely used definition or process for defining atomic requirements? Definitions of requirements and standards focus on user needs, system capabilities, or functions; some definitions include making individual requirements singular or avoiding conjunctions. In a few cases there have been descriptions of atomic system events or requirements. That work is surveyed here, although there is no well-accepted, widely used best practice for generating atomic requirements. Because of their importance in software engineering, quality and metrics for requirements have received considerable attention. In their seminal paper on software requirements quality, Davis et al. proposed specific metrics, including the "unambiguous quality factor" and the "verifiable quality factor"; these and other metrics work best with a clearly enumerable list of single requirements. An atomic requirement is defined here as a natural language statement that completely describes a single system function, feature, need, or capability, including all information, details, limits, and characteristics. A typical user login screen is used as an example of an atomic requirement, which can include both functional and non-functional requirements. Individual atomic requirements are supported by a system glossary, references to applicable industry standards, mock-ups of the user interface, etc. One way to identify such atomic requirements is through use case or system event analysis. This definition of atomic requirements is still a work in progress and is offered to prompt discussion. Atomic requirements allow clear naming or numbering of requirements for traceability, change management, and importance ranking. Further, atomic requirements defined in this manner are suitable for rapid implementation approaches (implementing one requirement at a time), enable good test planning (testing can clearly indicate pass or fail of the whole requirement), and offer other management advantages in project control.
In recent years, RFID security and privacy have become a growing concern. To prevent tags from being cloned, physically unclonable functions (PUFs) have been proposed. In each PUF-enabled tag, the responses of the PUF depend on structural disorder that cannot be cloned or reproduced. As a result, many authentication protocols require a large number of PUF responses to be stored in the database during the initialization phase. In supply chains, the owners of PUF-enabled tags change frequently, and many authentication and delegation protocols have been proposed for this setting. In this paper, a new lightweight authentication and delegation protocol for RFID tags (LADP) is proposed. The new protocol does not require many PUF responses to be pre-stored in the database. When the authentication messages are exchanged, the next PUF response is passed to the reader secretly. During ownership transfer, the new owner does not obtain information about the original owner's interactions, which protects the privacy of the original owner. Meanwhile, the original owner can no longer access or track the tag, which protects the privacy of the new owner. In terms of efficiency, the new protocol replaces the pseudorandom number generator with the randomness of the PUF, making it suitable for low-cost tags. The computation and communication costs are reduced and compare favorably with other protocols.
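The abstract does not spell out LADP's messages, so the following is only a toy illustration of the general idea of chaining PUF responses, in which the next response is masked with a value derived from the currently shared one so it never travels in the clear; the hash construction, the nonce use, and the ToyPUF stand-in are all assumptions.

```python
# Toy illustration (NOT the LADP protocol): the tag masks its next PUF
# response with a keyed hash of the current shared response, so the reader
# learns R_next without it ever being sent in the clear.
import hashlib, os

def H(*parts):
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

class ToyPUF:
    """Software stand-in for a physical PUF: deterministic per-device responses."""
    def __init__(self, device_secret):
        self.secret = device_secret
    def response(self, challenge):
        return H(self.secret, challenge)

# Setup: reader already knows R0 for challenge c0 (stored at enrolment).
puf = ToyPUF(os.urandom(16))
c0, c1 = b"challenge-0", b"challenge-1"
r0 = puf.response(c0)                      # shared with the reader at enrolment

# Tag side: send the next response masked with a value derived from r0.
nonce = os.urandom(16)
masked = xor(puf.response(c1), H(r0, nonce))

# Reader side: recover R1 using its stored r0 and the received nonce.
r1_recovered = xor(masked, H(r0, nonce))
assert r1_recovered == puf.response(c1)
```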
Keystroke dynamics can be used as an unobtrusive method to enhance password authentication by checking the typing rhythm of the user. Fixed passwords give an attacker the possibility to learn to mimic the typing behaviour of a victim. In this paper we investigate the performance of a keystroke dynamics (KD) system when users have to type given (English) words. Under the assumption that it is easy to type words in one's native language and difficult in a foreign language, we also test the performance of such a challenge-based KD system when the challenges are not common English words but words in the native language of the user. We collected data from participants with 6 different native language backgrounds and had them type random 8-12 character words in each of the 6 languages. The participants also typed random English words and random French words. English was assumed to be a language familiar to all participants, while French was not the native language of any participant, and most participants were likely not fluent in it. Analysis showed that using language-dependent words gave better performance for the challenge-based KD system than an all-English challenge-based system. When words from a given native language were used, participants whose mother tongue matched that language performed similarly to the all-English challenge-based system, but the non-native speakers had an FMR that was significantly lower than that of the native speakers. We found that native Telugu speakers had an FMR of less than 1% when typing Spanish or Slovak words. We also found that duration features were best for recognizing genuine users, while latency features performed best for recognizing non-native impostor users.
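A minimal sketch of the two feature families compared in the study, hold durations and inter-key (down-down) latencies, computed from per-key press/release timestamps; the event format and sample values are assumptions for illustration.

```python
# Minimal sketch of keystroke dynamics features from
# (key, press_time, release_time) events in seconds; the event
# format below is an assumption for illustration.

def keystroke_features(events):
    """Return hold durations and down-down latencies for one typed word."""
    durations = [release - press for _, press, release in events]
    latencies = [events[i + 1][1] - events[i][1] for i in range(len(events) - 1)]
    return durations, latencies

sample = [("c", 0.00, 0.09), ("a", 0.18, 0.26), ("t", 0.33, 0.40)]
durations, latencies = keystroke_features(sample)
print([round(d, 2) for d in durations])   # [0.09, 0.08, 0.07]
print([round(l, 2) for l in latencies])   # [0.18, 0.15]
```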
Hardware Trojan horses and active fault attacks are a threat to the safety and security of electronic systems. Through such manipulations, an attacker can extract sensitive information or disturb the functionality of a device. Therefore, several protections against malicious inclusions have been devised in recent years. A prominent technique to detect abnormal behavior in the field is run-time verification. It relies on dedicated monitoring circuits and on verification rules generated from a set of temporal properties. An important question when dealing with such protections is their effectiveness against unknown attacks. In this paper, we present a methodology based on automatic generation of monitors and on formal verification techniques that can be used to validate and analyze the quality of a set of temporal properties when used as protection against generic attackers of variable strength.
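As a hedged illustration of the kind of monitor such a methodology generates, the sketch below checks a single bounded-response property ("every req is followed by ack within N cycles") over a cycle-by-cycle trace; the property, the event encoding, and the one-ack-discharges-all simplification are assumptions, not the paper's rule set.

```python
# Sketch of a run-time monitor for one bounded-response temporal property.
# The property and the trace encoding are illustrative assumptions.

class BoundedResponseMonitor:
    def __init__(self, bound):
        self.bound = bound
        self.pending = []          # cycles elapsed since each open req

    def step(self, req, ack):
        """Advance one clock cycle; return False once the property is violated."""
        if ack:
            self.pending.clear()   # simplification: one ack discharges all reqs
        if req:
            self.pending.append(0)
        self.pending = [age + 1 for age in self.pending]
        return all(age <= self.bound for age in self.pending)

mon = BoundedResponseMonitor(bound=3)
trace = [(1, 0), (0, 0), (0, 1), (1, 0), (0, 0), (0, 0), (0, 0)]  # (req, ack)
print([mon.step(req, ack) for req, ack in trace])
# [True, True, True, True, True, True, False] -> violation at the last cycle
```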
In the last few years, a shift from mass production to mass customisation has been observed in industry. Easily reprogrammable robots that can perform a wide variety of tasks are desired to keep up with the trend of mass customisation while saving costs and development time. Learning by Demonstration (LfD) is an intuitive way to program robots and provides a solution to this problem. In this work, we discuss and evaluate LAP, a three-stage LfD method that conforms to the criteria for high-mix, low-volume (HMLV) industrial settings. The algorithm learns a trajectory in the task space, after which small segments can be adapted on the fly using a human-in-the-loop approach. The human operator acts as a high-level adaptation, correction and evaluation mechanism to guide the robot. This way, no sensors or complex feedback algorithms are needed to improve robot behaviour, so errors and inaccuracies induced by these subsystems are avoided. Once the system performs at a satisfactory level after adaptation, the operator is removed from the loop. The robot then proceeds in a feed-forward fashion to optimise for speed. We demonstrate this method by simulating an industrial painting application. A KUKA LBR iiwa is taught how to draw a figure eight, which is reshaped by the operator during adaptation.
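A rough numpy sketch of the on-the-fly segment adaptation idea: an operator correction is blended into a short segment of the learned task-space trajectory with a smooth window so the segment stays continuous with the rest of the path. The cosine window and the curve used here are illustrative assumptions, not LAP's actual adaptation rule.

```python
# Sketch of blending a human correction into one segment of a learned
# trajectory; the cosine blending window is an illustrative assumption.
import numpy as np

def adapt_segment(trajectory, start, end, offset):
    """Shift trajectory[start:end] towards `offset` with a smooth window
    so the segment stays continuous with the rest of the path."""
    adapted = trajectory.copy()
    n = end - start
    # Weight is 0 at both segment boundaries and 1 in the middle.
    weights = 0.5 * (1 - np.cos(2 * np.pi * np.arange(n) / (n - 1)))
    adapted[start:end] += weights[:, None] * offset
    return adapted

# Learned XY curve (a stand-in for the taught figure eight).
t = np.linspace(0, 2 * np.pi, 200)
learned = np.stack([np.sin(t), np.sin(t) * np.cos(t)], axis=1)

# Operator pushes points 60..100 by 5 cm in +Y during execution.
corrected = adapt_segment(learned, 60, 100, np.array([0.0, 0.05]))
```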
This work presents the design and implementation of a large curved display system in a virtual reality (VR) environment that supports visualization of 2D datasets (e.g., images, buttons, and text). Using this system, users can interact with data across a wide field of view and gain a high level of perceived immersion. We exhibit two use cases of this system: (1) a virtual image wall as the display component of a 3D user interface, and (2) an inventory interface for a VR-based educational game. The use cases demonstrate the capability and flexibility of curved displays in supporting varied purposes of data interaction within virtual environments.
We present a novel multimodal fusion model for affective content analysis, combining visual, audio, and deep visual-sentiment descriptors from the media content with automated facial action measurements from naturalistic responses to the media. We collected a dataset of 48,867 facial responses to 384 media clips and extracted a rich feature set from the facial responses and media content. The stimulus videos were validated to be informative, inspiring, persuasive, sentimental, or amusing. By combining the features, we obtained a classification accuracy of 63% (weighted F1-score: 0.62) on a five-class task, a significant improvement over using the media content features alone. By analyzing the feature sets independently, we found that the informed and persuaded states were difficult to differentiate from facial responses alone due to the presence of similar sets of action units in each state (AU 2 occurring frequently in both cases). Facial actions were beneficial in differentiating between amused and informed states, whereas media content features alone performed less well due to similarities in the visual and audio make-up of the content. We highlight examples of content and reactions from each class. This is the first affective content analysis based on the reactions of tens of thousands of people.
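A minimal sketch of the feature-level fusion step, assuming scikit-learn is available: per-modality feature vectors are concatenated and fed to a single classifier. The feature dimensions and the synthetic data are placeholders, not the paper's descriptors or results.

```python
# Early (feature-level) fusion sketch: concatenate per-modality features
# and train one classifier. All data below is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
media_visual = rng.normal(size=(n, 64))    # e.g. visual / sentiment descriptors
media_audio  = rng.normal(size=(n, 32))    # e.g. audio descriptors
face_aus     = rng.normal(size=(n, 17))    # e.g. facial action unit statistics
labels = rng.integers(0, 5, size=n)        # five affective classes

fused = np.hstack([media_visual, media_audio, face_aus])
X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))  # ~chance here, since data is random
```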
Recently, proactive systems such as Google Now and Microsoft Cortana have become increasingly popular, reshaping the way users access information on mobile devices. In these systems, relevant content is presented to users based on their context without a query, in the form of information cards that do not require a click to satisfy the user. As a result, prior approaches based on clicks cannot provide reliable measurements of user satisfaction with such systems. It is also unclear how much of the previous findings regarding good abandonment in reactive Web search can be applied to these proactive systems, due to the intrinsic difference in user intent and the greater variety of content types and their presentations. In this paper, we present the first large-scale analysis of viewing behavior based on the viewport (the visible fraction of a Web page) of mobile devices, towards measuring user satisfaction with the information cards of mobile proactive systems. In particular, we identified and analyzed a variety of factors that may influence viewing behavior, including biases from ranking positions, the types and attributes of the information cards, and touch interactions with the mobile devices. We show that by modeling these factors we can better measure user satisfaction with mobile proactive systems, enabling stronger statistical power in large-scale online A/B testing.
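A small sketch of how viewport logs can be turned into a per-card visibility signal, computing how long each information card overlapped the visible viewport; the event format, card layout, and pixel geometry are assumptions for illustration.

```python
# Sketch of deriving per-card viewport dwell time from scroll events;
# the event and card layout formats are illustrative assumptions.

def card_dwell_times(viewport_events, cards):
    """viewport_events: list of (timestamp_s, viewport_top_px, viewport_height_px).
    cards: dict name -> (card_top_px, card_height_px).
    Returns seconds each card was at least partially visible."""
    dwell = {name: 0.0 for name in cards}
    for (t0, top, height), (t1, _, _) in zip(viewport_events, viewport_events[1:]):
        for name, (c_top, c_height) in cards.items():
            visible = min(top + height, c_top + c_height) - max(top, c_top)
            if visible > 0:
                dwell[name] += t1 - t0
    return dwell

cards = {"weather": (0, 300), "stocks": (300, 300), "news": (600, 300)}
events = [(0.0, 0, 600), (2.0, 300, 600), (5.0, 300, 600)]
print(card_dwell_times(events, cards))
# {'weather': 2.0, 'stocks': 5.0, 'news': 3.0}
```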
Linking the growing IPv6 deployment to existing IPv4 addresses is an interesting field of research, be it for network forensics, structural analysis, or reconnaissance. In this work, we focus on classifying pairs of server IPv6 and IPv4 addresses as siblings, i.e., running on the same machine. Our methodology leverages active measurements of TCP timestamps and other network characteristics, which we measure against a diverse ground truth of 682 hosts. We define and extract a set of features, including estimation of variable (as opposed to constant) remote clock skew. On these features, we train a manually crafted algorithm as well as a machine-learned decision tree. By conducting several measurement runs and training in cross-validation rounds, we aim to create models that generalize well and do not overfit our training data. We find both models to exceed 99% precision in training and test performance. We validate scalability by classifying 149k siblings in a large-scale measurement of 371k sibling candidates. We argue that this methodology, thoroughly cross-validated and likely to generalize well, can aid comparative studies of IPv6 and IPv4 behavior in the Internet. Striving for applicability and replicability, we release ready-to-use source code and raw data from our study.
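A sketch of the clock-skew feature on which such sibling classification typically relies: the remote TCP timestamp offset is regressed against local receive time, and the slope approximates the skew in ppm. The synthetic probe data and the assumed timestamp frequency are illustrative; a real pipeline parses the TCP timestamp option from active measurements.

```python
# Clock-skew estimation sketch from TCP timestamps (synthetic data).
import numpy as np

def estimate_skew(local_t, remote_ticks, hz=1000):
    """local_t: local receive times (s); remote_ticks: remote TCP timestamp
    values; hz: assumed remote timestamp clock frequency. Returns skew in ppm."""
    remote_t = (remote_ticks - remote_ticks[0]) / hz
    offset = remote_t - (local_t - local_t[0])
    slope, _ = np.polyfit(local_t - local_t[0], offset, 1)
    return slope * 1e6

local = np.linspace(0, 600, 120)                  # 10 minutes of probes
ticks = 5_000_000 + np.round(local * 1000 * (1 + 50e-6)).astype(int)  # ~50 ppm fast
print(round(estimate_skew(local, ticks), 1), "ppm")   # ~50.0 ppm
```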
We transfer a key idea from the field of sentiment analysis to a new domain: community question answering (cQA). The cQA task we are interested in is the following: given a question and a thread of comments, we want to re-rank the comments so that the ones that are good answers to the question are ranked higher than the bad ones. We notice that good vs. bad comments use specific vocabulary and that one can often predict the goodness/badness of a comment even ignoring the question, based on the comment contents only. This leads us to the idea of building a good/bad polarity lexicon as an analogy to the positive/negative sentiment polarity lexicons commonly used in sentiment analysis. In particular, we use pointwise mutual information in order to build large-scale goodness polarity lexicons in a semi-supervised manner, starting with a small number of initial seeds. The evaluation results show an improvement of 0.7 MAP points absolute over a very strong baseline, and state-of-the-art performance on SemEval-2016 Task 3.
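A minimal sketch of building such a goodness polarity lexicon with pointwise mutual information over comments labelled Good/Bad; the toy corpus, tokenization, and add-one smoothing are illustrative assumptions, not the paper's semi-supervised setup.

```python
# PMI-based goodness polarity lexicon sketch over a toy labelled corpus.
import math
from collections import Counter

comments = [
    ("you can fix it by reinstalling the driver", "Good"),
    ("try the settings menu and update it", "Good"),
    ("thanks a lot my friend", "Bad"),
    ("lol no idea", "Bad"),
]

word_good, word_count = Counter(), Counter()
n_good = sum(1 for _, lab in comments if lab == "Good")
for text, lab in comments:
    for w in set(text.split()):
        word_count[w] += 1
        if lab == "Good":
            word_good[w] += 1

def goodness_pmi(word, n_docs=len(comments)):
    """PMI(word, Good) with add-one smoothing."""
    p_w = (word_count[word] + 1) / (n_docs + 2)
    p_g = n_good / n_docs
    p_wg = (word_good[word] + 1) / (n_docs + 2)
    return math.log(p_wg / (p_w * p_g), 2)

lexicon = {w: goodness_pmi(w) for w in word_count}
print(sorted(lexicon, key=lexicon.get, reverse=True)[:3])  # most "good"-ish words
```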
We study a dataset of billions of program binary files that appeared on 100 million computers over the course of 12 months, discovering that 94% of these files were present on a single machine. Though malware polymorphism is one cause of the large number of singleton files, additional factors also contribute, given that the ratio of benign to malicious singleton files is 80:1. The huge number of benign singletons makes it challenging to reliably identify the minority of malicious singletons. We present a large-scale study of the properties, characteristics, and distribution of benign and malicious singleton files. We leverage the insights from this study to build a classifier based purely on static features that identifies 92% of the remaining malicious singletons at a 1.4% false positive rate, despite heavy use of obfuscation and packing techniques by most malicious singleton files, which we make no attempt to de-obfuscate. Finally, we demonstrate the robustness of our classifier to important classes of automated evasion attacks.
This paper is the first work to perform spatio-temporal mapping of human activity using the visual content of geo-tagged videos. We utilize a recent deep-learning-based video analysis framework, termed hidden two-stream networks, to recognize a range of activities in YouTube videos. This framework is efficient and can run in real time or faster, which is important for recognizing events as they occur in streaming video and for reducing latency when analyzing already captured video. This is, in turn, important for using video in smart-city applications. We perform a series of experiments to show that our approach is able to map activities both spatially and temporally.
Online privacy policies notify users of a Website how their personal information is collected, processed, and stored. Against the background of rising privacy concerns, privacy policies seem to represent an influential instrument for increasing customer trust and loyalty. In practice, however, consumers seem to read privacy policies only in rare cases, possibly reflecting the common assumption that policies are hard to comprehend. By designing and implementing an automated extraction and readability-analysis toolset that embodies a diversity of established readability measures, we present the first large-scale study providing current empirical evidence on the readability of nearly 50,000 privacy policies of popular English-language Websites. The results empirically confirm that, on average, current privacy policies are still hard to read. Furthermore, this study presents new theoretical insights for readability research, in particular on the extent to which practical readability measures are correlated. Specifically, it shows the redundancy of several well-established readability metrics such as SMOG, RIX, LIX, GFI, FKG, ARI, and FRES, thus easing future choices and comparisons between readability studies, and calling for research towards a readability-measures framework. Moreover, a more sophisticated privacy-policy extractor and analyzer as well as a solid policy text corpus for further research are provided.
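For reference, a small sketch of three of the measures named above (FKG, ARI, LIX) applied to a short policy snippet; the vowel-group syllable counter is a rough heuristic, so the scores are approximate.

```python
# Sketch of three standard readability measures; the syllable counter is a
# rough vowel-group heuristic, so scores are approximate.
import re

def count_syllables(word):
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    n_chars = sum(len(w) for w in words)
    n_syll = sum(count_syllables(w) for w in words)
    n_long = sum(1 for w in words if len(w) > 6)     # "long words" for LIX
    return {
        "FKG": 0.39 * n_words / sentences + 11.8 * n_syll / n_words - 15.59,
        "ARI": 4.71 * n_chars / n_words + 0.5 * n_words / sentences - 21.43,
        "LIX": n_words / sentences + 100.0 * n_long / n_words,
    }

policy = ("We collect personal information to provide and improve our services. "
          "Your data may be shared with affiliated third parties as described herein.")
print(readability(policy))
```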
Satellite networks play an important role in combining space networks with ground networks and in achieving global coverage of the Internet. However, due to limited bandwidth resources, space backbone networks are more likely than ground networks to become victims of DDoS attacks. We therefore hypothesize an attack scenario in which DDoS attackers launch reflection amplification attacks, colluding with terminal devices that access the space backbone network, to exhaust bandwidth resources, resulting in degraded data transmission and service delivery. Finally, we propose some simple countermeasures to provide starting points for future researchers.
Recent work proposed the concept of backdoor attacks on deep neural networks (DNNs), where misclassification rules are hidden inside normal models, only to be triggered by very specific inputs. However, these "traditional" backdoors assume a context where users train their own models from scratch, which rarely occurs in practice. Instead, users typically customize "Teacher" models already pretrained by providers like Google, through a process called transfer learning. This customization process introduces significant changes to models and disrupts hidden backdoors, greatly reducing the actual impact of backdoors in practice. In this paper, we describe latent backdoors, a more powerful and stealthy variant of backdoor attacks that functions under transfer learning. Latent backdoors are incomplete backdoors embedded into a "Teacher" model and automatically inherited by multiple "Student" models through transfer learning. If any Student model includes the label targeted by the backdoor, then its customization process completes the backdoor and makes it active. We show that latent backdoors can be quite effective in a variety of application contexts, and validate their practicality through real-world attacks against traffic sign recognition, iris identification of volunteers, and facial recognition of public figures (politicians). Finally, we evaluate 4 potential defenses and find that only one is effective in disrupting latent backdoors, but it might incur a cost in classification accuracy as a tradeoff.
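A generic PyTorch sketch of the transfer-learning customization step this attack exploits: the Student starts from the Teacher's weights, freezes the early layers, and retrains only a new head, so anything embedded in the frozen layers survives customization. The tiny model and layer split are placeholders, not the paper's networks or the attack itself.

```python
# Generic transfer-learning customization sketch: freeze the Teacher's early
# layers and retrain only the head. Frozen layers survive customization
# unchanged, which is the property latent backdoors rely on. The tiny model
# below is a placeholder, not the paper's teacher network.
import copy
import torch
import torch.nn as nn

teacher = nn.Sequential(              # pretend this is a pretrained Teacher
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 10),                # Teacher's classification head
)

student = copy.deepcopy(teacher)      # Student starts from the Teacher's weights
for p in student[:4].parameters():    # freeze every layer except the head
    p.requires_grad = False
student[4] = nn.Linear(32, 5)         # replace head for the Student's 5 classes

optimizer = torch.optim.Adam(
    [p for p in student.parameters() if p.requires_grad], lr=1e-3)

x, y = torch.randn(16, 128), torch.randint(0, 5, (16,))
loss = nn.functional.cross_entropy(student(x), y)
loss.backward()                       # gradients flow only into the new head
optimizer.step()
```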
The use of typing biometrics (the characteristic typing patterns of individual keyboard users) has been studied extensively in the context of enhancing multi-factor authentication services. The key starting point for such work has been the collection of high-fidelity local timing data, and the key (implicit) security assumption has been that such biometrics could not be obtained by other means. We show the latter assumption to be false: it is entirely feasible to obtain useful typing biometric signatures from third-party timing logs. Specifically, we show that the logs produced by real-time collaboration services during their normal operation are of sufficient fidelity to successfully impersonate a user using remote data only. Since the logs are routinely shared as a byproduct of the services' operation, this creates an entirely new avenue of attack that few users would be aware of. As a proof of concept, we construct successful biometric attacks using only the log-based structure (complete editing history) of a shared Google Docs or Zoho Writer document, which is readily available to all contributing parties. Using the largest available public dataset of typing biometrics, we are able to create successful forgeries 100% of the time against a commercial biometric service. Our results suggest that typing biometrics are not robust against practical forgeries and should not be given the same weight as other authentication factors. Another important implication is that the routine collection of detailed timing logs by various online services also inherently (and implicitly) contains biometrics. This not only raises obvious privacy concerns, but may also undermine the effectiveness of network anonymization solutions, such as Tor, when used with existing services.
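A small sketch of the log-based reconstruction idea: given an edit log of (timestamp, inserted character) pairs, per-digraph latencies can be recovered and compared against a typing-biometric template. The log format shown is an assumption for illustration, not either service's actual schema.

```python
# Sketch of recovering keystroke timing features from a collaboration
# service's edit log; the (timestamp_ms, inserted_char) format is an
# assumption, not the actual log schema of any service.

def digraph_latencies(edit_log):
    """Return {(char_i, char_j): [latencies_ms]} for consecutive insertions."""
    latencies = {}
    for (t0, c0), (t1, c1) in zip(edit_log, edit_log[1:]):
        latencies.setdefault((c0, c1), []).append(t1 - t0)
    return latencies

log = [(0, "h"), (142, "e"), (301, "l"), (420, "l"), (575, "o")]
print(digraph_latencies(log))
# {('h', 'e'): [142], ('e', 'l'): [159], ('l', 'l'): [119], ('l', 'o'): [155]}
```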
The growing computerization of critical infrastructure as well as the pervasiveness of computing in everyday life has led to increased interest in secure application development. We observe a flurry of new security technologies like ARM TrustZone and Intel SGX, but a lack of a corresponding architectural vision. We are convinced that point solutions are not sufficient to address the overall challenge of secure system design. In this paper, we outline our take on a trusted component ecosystem of small individual building blocks with strong isolation. In our view, applications should no longer be designed as massive stacks of vertically layered frameworks, but instead as horizontal aggregates of mutually isolated components that collaborate across machine boundaries to provide a service. Lateral thinking is needed to build secure systems going forward.