Visible to the public Biblio

Filters: Keyword is Writing  [Clear All Filters]
Chen, Shuo-Han, Yang, Ming-Chang, Chang, Yuan-Hao, Wu, Chun-Feng.  2019.  Enabling File-Oriented Fast Secure Deletion on Shingled Magnetic Recording Drives. 2019 56th ACM/IEEE Design Automation Conference (DAC). :1—6.

Existing secure deletion approaches are inefficient in erasing data permanently because file systems have no knowledge of the data layout on the storage device, nor is the storage device aware of file information within the file systems. This inefficiency is exaggerated on the emerging shingled magnetic recording (SMR) drive due to its inherent sequential-write constraint. On SMR drives, secure deletion requests may lead to serious write amplification and performance degradation if the data layout is not properly configured. Such observation motivates us to propose a file-oriented fast secure deletion (FFSD) strategy to alleviate the negative impacts of SMR drives' sequential-write constraint and improve the efficiency of secure deletion operations on SMR drives. A series of experiments was conducted to demonstrate the capability of the proposed strategy on improving the efficiency of secure deletion on SMR drives.

Miao, Hui, Deshpande, Amol.  2019.  Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :1710–1713.
Increasingly modern data science platforms today have non-intrusive and extensible provenance ingestion mechanisms to collect rich provenance and context information, handle modifications to the same file using distinguishable versions, and use graph data models (e.g., property graphs) and query languages (e.g., Cypher) to represent and manipulate the stored provenance/context information. Due to the schema-later nature of the metadata, multiple versions of the same files, and unfamiliar artifacts introduced by team members, the resulting "provenance graphs" are quite verbose and evolving; further, it is very difficult for the users to compose queries and utilize this valuable information just using standard graph query model. In this paper, we propose two high-level graph query operators to address the verboseness and evolving nature of such provenance graphs. First, we introduce a graph segmentation operator, which queries the retrospective provenance between a set of source vertices and a set of destination vertices via flexible boundary criteria to help users get insight about the derivation relationships among those vertices. We show the semantics of such a query in terms of a context-free grammar, and develop efficient algorithms that run orders of magnitude faster than state-of-the-art. Second, we propose a graph summarization operator that combines similar segments together to query prospective provenance of the underlying project. The operator allows tuning the summary by ignoring vertex details and characterizing local structures, and ensures the provenance meaning using path constraints. We show the optimal summary problem is PSPACE-complete and develop effective approximation algorithms. We implement the operators on top of Neo4j, evaluate our query techniques extensively, and show the effectiveness and efficiency of the proposed methods.
Naik, Nitin, Jenkins, Paul, Savage, Nick, Yang, Longzhi.  2019.  Cyberthreat Hunting - Part 2: Tracking Ransomware Threat Actors Using Fuzzy Hashing and Fuzzy C-Means Clustering. 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). :1–6.

Threat actors are constantly seeking new attack surfaces, with ransomeware being one the most successful attack vectors that have been used for financial gain. This has been achieved through the dispersion of unlimited polymorphic samples of ransomware whilst those responsible evade detection and hide their identity. Nonetheless, every ransomware threat actor adopts some similar style or uses some common patterns in their malicious code writing, which can be significant evidence contributing to their identification. he first step in attempting to identify the source of the attack is to cluster a large number of ransomware samples based on very little or no information about the samples, accordingly, their traits and signatures can be analysed and identified. T herefore, this paper proposes an efficient fuzzy analysis approach to cluster ransomware samples based on the combination of two fuzzy techniques fuzzy hashing and fuzzy c-means (FCM) clustering. Unlike other clustering techniques, FCM can directly utilise similarity scores generated by a fuzzy hashing method and cluster them into similar groups without requiring additional transformational steps to obtain distance among objects for clustering. Thus, it reduces the computational overheads by utilising fuzzy similarity scores obtained at the time of initial triaging of whether the sample is known or unknown ransomware. The performance of the proposed fuzzy method is compared against k-means clustering and the two fuzzy hashing methods SSDEEP and SDHASH which are evaluated based on their FCM clustering results to understand how the similarity score affects the clustering results.

Todorov, Vassil, Taha, Safouan, Boulanger, Frédéric, Hernandez, Armando.  2019.  Improved Invariant Generation for Industrial Software Model Checking of Time Properties. 2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS). :334–341.
Modern automotive embedded software is mostly designed using model-based design tools such as Simulink or SCADE, and source code is generated automatically from the models. Formal proof using symbolic model checking has been integrated in these tools and can provide a higher assurance by proving safety-critical properties. Our experience shows that proving properties involving time is rather challenging when they involve long durations and timers. These properties are generally not inductive and even advanced techniques such as PDR/IC3 are unable to handle them on production models in reasonable time. In this paper, we first present our industrial use case and comment on the results obtained with the existing model checkers. Then we present our invariant generator and methodology for selecting invariants according to physical dimensions. They enable the proof of properties with long-running timers. Finally, we discuss their implementation and benchmarks.
Abate, Carmine, Blanco, Roberto, Garg, Deepak, Hritcu, Catalin, Patrignani, Marco, Thibault, Jérémy.  2019.  Journey Beyond Full Abstraction: Exploring Robust Property Preservation for Secure Compilation. 2019 IEEE 32nd Computer Security Foundations Symposium (CSF). :256–25615.
Good programming languages provide helpful abstractions for writing secure code, but the security properties of the source language are generally not preserved when compiling a program and linking it with adversarial code in a low-level target language (e.g., a library or a legacy application). Linked target code that is compromised or malicious may, for instance, read and write the compiled program's data and code, jump to arbitrary memory locations, or smash the stack, blatantly violating any source-level abstraction. By contrast, a fully abstract compilation chain protects source-level abstractions all the way down, ensuring that linked adversarial target code cannot observe more about the compiled program than what some linked source code could about the source program. However, while research in this area has so far focused on preserving observational equivalence, as needed for achieving full abstraction, there is a much larger space of security properties one can choose to preserve against linked adversarial code. And the precise class of security properties one chooses crucially impacts not only the supported security goals and the strength of the attacker model, but also the kind of protections a secure compilation chain has to introduce. We are the first to thoroughly explore a large space of formal secure compilation criteria based on robust property preservation, i.e., the preservation of properties satisfied against arbitrary adversarial contexts. We study robustly preserving various classes of trace properties such as safety, of hyperproperties such as noninterference, and of relational hyperproperties such as trace equivalence. This leads to many new secure compilation criteria, some of which are easier to practically achieve and prove than full abstraction, and some of which provide strictly stronger security guarantees. For each of the studied criteria we propose an equivalent “property-free” characterization that clarifies which proof techniques apply. For relational properties and hyperproperties, which relate the behaviors of multiple programs, our formal definitions of the property classes themselves are novel. We order our criteria by their relative strength and show several collapses and separation results. Finally, we adapt existing proof techniques to show that even the strongest of our secure compilation criteria, the robust preservation of all relational hyperproperties, is achievable for a simple translation from a statically typed to a dynamically typed language.
Jiao, Y., Hohlfield, J., Victora, R. H..  2018.  Understanding Transition and Remanence Noise in HAMR. IEEE Transactions on Magnetics. 54:1–5.

Transition noise and remanence noise are the two most important types of media noise in heat-assisted magnetic recording. We examine two methods (spatial splitting and principal components analysis) to distinguish them: both techniques show similar trends with respect to applied field and grain pitch (GP). It was also found that PW50can be affected by GP and reader design, but is almost independent of write field and bit length (larger than 50 nm). Interestingly, our simulation shows a linear relationship between jitter and PW50NSRrem, which agrees qualitatively with experimental results.

Khatchadourian, R., Tang, Y., Bagherzadeh, M., Ahmed, S..  2019.  Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams. 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). :619-630.

Streaming APIs are becoming more pervasive in mainstream Object-Oriented programming languages. For example, the Stream API introduced in Java 8 allows for functional-like, MapReduce-style operations in processing both finite and infinite data structures. However, using this API efficiently involves subtle considerations like determining when it is best for stream operations to run in parallel, when running operations in parallel can be less efficient, and when it is safe to run in parallel due to possible lambda expression side-effects. In this paper, we present an automated refactoring approach that assists developers in writing efficient stream code in a semantics-preserving fashion. The approach, based on a novel data ordering and typestate analysis, consists of preconditions for automatically determining when it is safe and possibly advantageous to convert sequential streams to parallel and unorder or de-parallelize already parallel streams. The approach was implemented as a plug-in to the Eclipse IDE, uses the WALA and SAFE analysis frameworks, and was evaluated on 11 Java projects consisting of ?642K lines of code. We found that 57 of 157 candidate streams (36.31%) were refactorable, and an average speedup of 3.49 on performance tests was observed. The results indicate that the approach is useful in optimizing stream code to their full potential.

Ölvecký, M., Gabriška, D..  2018.  Wiping Techniques and Anti-Forensics Methods. 2018 IEEE 16th International Symposium on Intelligent Systems and Informatics (SISY). :000127–000132.

This paper presents a theoretical background of main research activity focused on the evaluation of wiping/erasure standards which are mostly implemented in specific software products developed and programming for data wiping. The information saved in storage devices often consists of metadata and trace data. Especially but not only these kinds of data are very important in the process of forensic analysis because they sometimes contain information about interconnection on another file. Most people saving their sensitive information on their local storage devices and later they want to secure erase these files but usually there is a problem with this operation. Secure file destruction is one of many Anti-forensics methods. The outcome of this paper is to define the future research activities focused on the establishment of the suitable digital environment. This environment will be prepared for testing and evaluating selected wiping standards and appropriate eraser software.

Gaston, J., Narayanan, M., Dozier, G., Cothran, D. L., Arms-Chavez, C., Rossi, M., King, M. C., Xu, J..  2018.  Authorship Attribution vs. Adversarial Authorship from a LIWC and Sentiment Analysis Perspective. 2018 IEEE Symposium Series on Computational Intelligence (SSCI). :920-927.

Although Stylometry has been effectively used for Authorship Attribution, there is a growing number of methods being developed that allow authors to mask their identity [2, 13]. In this paper, we investigate the usage of non-traditional feature sets for Authorship Attribution. By using non-traditional feature sets, one may be able to reveal the identity of adversarial authors who are attempting to evade detection from Authorship Attribution systems that are based on more traditional feature sets. In addition, we demonstrate how GEFeS (Genetic & Evolutionary Feature Selection) can be used to evolve high-performance hybrid feature sets composed of two non-traditional feature sets for Authorship Attribution: LIWC (Linguistic Inquiry & Word Count) and Sentiment Analysis. These hybrids were able to reduce the Adversarial Effectiveness on a test set presented in [2] by approximately 33.4%.

Cui, S., Asghar, M. R., Russello, G..  2018.  Towards Blockchain-Based Scalable and Trustworthy File Sharing. 2018 27th International Conference on Computer Communication and Networks (ICCCN). :1-2.

In blockchain-based systems, malicious behaviour can be detected using auditable information in transactions managed by distributed ledgers. Besides cryptocurrency, blockchain technology has recently been used for other applications, such as file storage. However, most of existing blockchain- based file storage systems can not revoke a user efficiently when multiple users have access to the same file that is encrypted. Actually, they need to update file encryption keys and distribute new keys to remaining users, which significantly increases computation and bandwidth overheads. In this work, we propose a blockchain and proxy re-encryption based design for encrypted file sharing that brings a distributed access control and data management. By combining blockchain with proxy re-encryption, our approach not only ensures confidentiality and integrity of files, but also provides a scalable key management mechanism for file sharing among multiple users. Moreover, by storing encrypted files and related keys in a distributed way, our method can resist collusion attacks between revoked users and distributed proxies.

Vassena, M., Breitner, J., Russo, A..  2017.  Securing Concurrent Lazy Programs Against Information Leakage. 2017 IEEE 30th Computer Security Foundations Symposium (CSF). :37–52.
Many state-of-the-art information-flow control (IFC) tools are implemented as Haskell libraries. A distinctive feature of this language is lazy evaluation. In his influencal paper on why functional programming matters, John Hughes proclaims:,,Lazy evaluation is perhaps the most powerful tool for modularization in the functional programmer's repertoire.,,Unfortunately, lazy evaluation makes IFC libraries vulnerable to leaks via the internal timing covert channel. The problem arises due to sharing, the distinguishing feature of lazy evaluation, which ensures that results of evaluated terms are stored for subsequent re-utilization. In this sense, the evaluation of a term in a high context represents a side-effect that eludes the security mechanisms of the libraries. A naïve approach to prevent that consists in forcing the evaluation of terms before entering a high context. However, this is not always possible in lazy languages, where terms often denote infinite data structures. Instead, we propose a new language primitive, lazyDup, which duplicates terms lazily. By using lazyDup to duplicate terms manipulated in high contexts, we make the security library MAC robust against internal timing leaks via lazy evaluation. We show that well-typed programs satisfy progress-sensitive non-interference in our lazy calculus with non-strict references. Our security guarantees are supported by mechanized proofs in the Agda proof assistant.
Rocha, A., Scheirer, W. J., Forstall, C. W., Cavalcante, T., Theophilo, A., Shen, B., Carvalho, A. R. B., Stamatatos, E..  2017.  Authorship Attribution for Social Media Forensics. IEEE Transactions on Information Forensics and Security. 12:5–33.

The veil of anonymity provided by smartphones with pre-paid SIM cards, public Wi-Fi hotspots, and distributed networks like Tor has drastically complicated the task of identifying users of social media during forensic investigations. In some cases, the text of a single posted message will be the only clue to an author's identity. How can we accurately predict who that author might be when the message may never exceed 140 characters on a service like Twitter? For the past 50 years, linguists, computer scientists, and scholars of the humanities have been jointly developing automated methods to identify authors based on the style of their writing. All authors possess peculiarities of habit that influence the form and content of their written works. These characteristics can often be quantified and measured using machine learning algorithms. In this paper, we provide a comprehensive review of the methods of authorship attribution that can be applied to the problem of social media forensics. Furthermore, we examine emerging supervised learning-based methods that are effective for small sample sizes, and provide step-by-step explanations for several scalable approaches as instructional case studies for newcomers to the field. We argue that there is a significant need in forensics for new authorship attribution algorithms that can exploit context, can process multi-modal data, and are tolerant to incomplete knowledge of the space of all possible authors at training time.

Thankaraj, A., Nair, A. J., Vasudevan, N., Pathari, V..  2017.  Misclassifications: The Missing Link. 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). :1719–1722.

The notion of style is pivotal to literature. The choice of a certain writing style moulds and enhances the overall character of a book. Stylometry uses statistical methods to analyze literary style. This work aims to build a recommendation system based on the similarity in stylometric cues of various authors. The problem at hand is in close proximity to the author attribution problem. It follows a supervised approach with an initial corpus of books labelled with their respective authors as training set and generate recommendations based on the misclassified books. Results in book similarity are substantiated by domain experts.

Faust, C., Dozier, G., Xu, J., King, M. C..  2017.  Adversarial Authorship, Interactive Evolutionary Hill-Climbing, and Author CAAT-III. 2017 IEEE Symposium Series on Computational Intelligence (SSCI). :1–8.

We are currently witnessing the development of increasingly effective author identification systems (AISs) that have the potential to track users across the internet based on their writing style. In this paper, we discuss two methods for providing user anonymity with respect to writing style: Adversarial Stylometry and Adversarial Authorship. With Adversarial Stylometry, a user attempts to obfuscate their writing style by consciously altering it. With Adversarial Authorship, a user can select an author cluster target (ACT) and write toward this target with the intention of subverting an AIS so that the user's writing sample will be misclassified Our results show that Adversarial Authorship via interactive evolutionary hill-climbing outperforms Adversarial Stylometry.

Keerthana, S., Monisha, C., Priyanka, S., Veena, S..  2017.  De Duplication Scalable Secure File Sharing on Untrusted Storage in Big Data. 2017 International Conference on Information Communication and Embedded Systems (ICICES). :1–6.

Data Deduplication provides lots of benefits to security and privacy issues which can arise as user's sensitive data at risk of within and out of doors attacks. Traditional secret writing that provides knowledge confidentiality is incompatible with knowledge deduplication. Ancient secret writing wants completely different users to encode their knowledge with their own keys. Thus, identical knowledge copies of completely different various users can result in different ciphertexts that makes Deduplication not possible. Convergent secret writing has been planned to enforce knowledge confidentiality whereas creating Deduplication possible. It encrypts/decrypts a knowledge copy with a confluent key, that is obtained by computing the cryptographical hash price of the content of the information copy. Once generation of key and encryption, the user can retain the keys and send ciphertext to cloud.

Bours, P., Brahmanpally, S..  2017.  Language Dependent Challenge-Based Keystroke Dynamics. 2017 International Carnahan Conference on Security Technology (ICCST). :1–6.

Keystroke Dynamics can be used as an unobtrusive method to enhance password authentication, by checking the typing rhythm of the user. Fixed passwords will give an attacker the possibility to try to learn to mimic the typing behaviour of a victim. In this paper we will investigate the performance of a keystroke dynamic (KD) system when the users have to type given (English) words. Under the assumption that it is easy to type words in your native language and difficult in a foreign language will we also test the performance of such a challenge-based KD system when the challenges are not common English words, but words in the native language of the user. We collected data from participants with 6 different native language backgrounds and had them type random 8-12 character words in each of the 6 languages. The participants also typed random English words and random French words. English was assumed to be a language familiar to all participants, while French was not a native language to any participant and most likely most participants were not fluent in French. Analysis showed that using language dependent words gave a better performance of the challenge-based KD compared to an all English challenge-based system. When using words in a native language, then the performance of the participants with their mother-tongue equal to that native language had a similar performance compared to the all English challenge-based system, but the non-native speakers had an FMR that was significantly lower than the native language speakers. We found that native Telugu speakers had an FMR of less than 1% when writing Spanish or Slovak words. We also found that duration features were best to recognize genuine users, but latency features performed best to recognize non-native impostor users.

Chammas, E., Mokbel, C., Likforman-Sulem, L..  2015.  Arabic handwritten document preprocessing and recognition. 2015 13th International Conference on Document Analysis and Recognition (ICDAR). :451–455.

Arabic handwritten documents present specific challenges due to the cursive nature of the writing and the presence of diacritical marks. Moreover, one of the largest labeled database of Arabic handwritten documents, the OpenHart-NIST database includes specific noise, namely guidelines, that has to be addressed. We propose several approaches to process these documents. First a guideline detection approach has been developed, based on K-means, that detects the documents that include guidelines. We then propose a series of preprocessing at text-line level to reduce the noise effects. For text-lines including guidelines, a guideline removal preprocessing is described and existing keystroke restoration approaches are assessed. In addition, we propose a preprocessing that combines noise removal and deskewing by removing line fragments from neighboring text lines, while searching for the principal orientation of the text-line. We provide recognition results, showing the significant improvement brought by the proposed processings.

Johnson, R., Kiourtis, N., Stavrou, A., Sritapan, V..  2015.  Analysis of content copyright infringement in mobile application markets. 2015 APWG Symposium on Electronic Crime Research (eCrime). :1–10.

As mobile devices increasingly become bigger in terms of display and reliable in delivering paid entertainment and video content, we also see a rise in the presence of mobile applications that attempt to profit by streaming pirated content to unsuspected end-users. These applications are both paid and free and in the case of free applications, the source of funding appears to be advertisements that are displayed while the content is streamed to the device. In this paper, we assess the extent of content copyright infringement for mobile markets that span multiple platforms (iOS, Android, and Windows Mobile) and cover both official and unofficial mobile markets located across the world. Using a set of search keywords that point to titles of paid streaming content, we discovered 8,592 Android, 5,550 iOS, and 3,910 Windows mobile applications that matched our search criteria. Out of those applications, hundreds had links to either locally or remotely stored pirated content and were not developed, endorsed, or, in many cases, known to the owners of the copyrighted contents. We also revealed the network locations of 856,717 Uniform Resource Locators (URLs) pointing to back-end servers and cyber-lockers used to communicate the pirated content to the mobile application.

Marukatat, R., Somkiadcharoen, R., Nalintasnai, R., Aramboonpong, T..  2014.  Authorship Attribution Analysis of Thai Online Messages. Information Science and Applications (ICISA), 2014 International Conference on. :1-4.

This paper presents a framework to identify the authors of Thai online messages. The identification is based on 53 writing attributes and the selected algorithms are support vector machine (SVM) and C4.5 decision tree. Experimental results indicate that the overall accuracies achieved by the SVM and the C4.5 were 79% and 75%, respectively. This difference was not statistically significant (at 95% confidence interval). As for the performance of identifying individual authors, in some cases the SVM was clearly better than the C4.5. But there were also other cases where both of them could not distinguish one author from another.