Visible to the public Biblio

Found 744 results

Filters: Keyword is machine learning  [Clear All Filters]
2017-02-23
B. Yang, E. Martiri.  2015.  "Using Honey Templates to Augment Hash Based Biometric Template Protection". 2015 IEEE 39th Annual Computer Software and Applications Conference. 3:312-316.

Hash based biometric template protection schemes (BTPS), such as fuzzy commitment, fuzzy vault, and secure sketch, address the privacy leakage concern on the plain biometric template storage in a database through using cryptographic hash calculation for template verification. However, cryptographic hashes have only computational security whose being cracked shall leak the biometric feature in these BTPS; and furthermore, existing BTPS are rarely able to detect during a verification process whether a probe template has been leaked from the database or not (i.e., being used by an imposter or a genuine user). In this paper we tailor the "honeywords" idea, which was proposed to detect the hashed password cracking, to enable the detectability of biometric template database leakage. However, unlike passwords, biometric features encoded in a template cannot be renewed after being cracked and thus not straightforwardly able to be protected by the honeyword idea. To enable the honeyword idea on biometrics, diversifiability (and thus renewability) is required on the biometric features. We propose to use BTPS for his purpose in this paper and present a machine learning based protected template generation protocol to ensure the best anonymity of the generated sugar template (from a user's genuine biometric feature) among other honey ones (from synthesized biometric features).

H. M. Ruan, M. H. Tsai, Y. N. Huang, Y. H. Liao, C. L. Lei.  2015.  "Discovery of De-identification Policies Considering Re-identification Risks and Information Loss". 2015 10th Asia Joint Conference on Information Security. :69-76.

In data analysis, it is always a tough task to strike the balance between the privacy and the applicability of the data. Due to the demand for individual privacy, the data are being more or less obscured before being released or outsourced to avoid possible privacy leakage. This process is so called de-identification. To discuss a de-identification policy, the most important two aspects should be the re-identification risk and the information loss. In this paper, we introduce a novel policy searching method to efficiently find out proper de-identification policies according to acceptable re-identification risk while retaining the information resided in the data. With the UCI Machine Learning Repository as our real world dataset, the re-identification risk can therefore be able to reflect the true risk of the de-identified data under the de-identification policies. Moreover, using the proposed algorithm, one can then efficiently acquire policies with higher information entropy.

2017-02-14
B. C. M. Cappers, J. J. van Wijk.  2015.  "SNAPS: Semantic network traffic analysis through projection and selection". 2015 IEEE Symposium on Visualization for Cyber Security (VizSec). :1-8.

Most network traffic analysis applications are designed to discover malicious activity by only relying on high-level flow-based message properties. However, to detect security breaches that are specifically designed to target one network (e.g., Advanced Persistent Threats), deep packet inspection and anomaly detection are indispensible. In this paper, we focus on how we can support experts in discovering whether anomalies at message level imply a security risk at network level. In SNAPS (Semantic Network traffic Analysis through Projection and Selection), we provide a bottom-up pixel-oriented approach for network traffic analysis where the expert starts with low-level anomalies and iteratively gains insight in higher level events through the creation of multiple selections of interest in parallel. The tight integration between visualization and machine learning enables the expert to iteratively refine anomaly scores, making the approach suitable for both post-traffic analysis and online monitoring tasks. To illustrate the effectiveness of this approach, we present example explorations on two real-world data sets for the detection and understanding of potential Advanced Persistent Threats in progress.

C. H. Hsieh, C. M. Lai, C. H. Mao, T. C. Kao, K. C. Lee.  2015.  "AD2: Anomaly detection on active directory log data for insider threat monitoring". 2015 International Carnahan Conference on Security Technology (ICCST). :287-292.

What you see is not definitely believable is not a rare case in the cyber security monitoring. However, due to various tricks of camouflages, such as packing or virutal private network (VPN), detecting "advanced persistent threat"(APT) by only signature based malware detection system becomes more and more intractable. On the other hand, by carefully modeling users' subsequent behaviors of daily routines, probability for one account to generate certain operations can be estimated and used in anomaly detection. To the best of our knowledge so far, a novel behavioral analytic framework, which is dedicated to analyze Active Directory domain service logs and to monitor potential inside threat, is now first proposed in this project. Experiments on real dataset not only show that the proposed idea indeed explores a new feasible direction for cyber security monitoring, but also gives a guideline on how to deploy this framework to various environments.

2016-04-07
Laszka, Aron, Vorobeychik, Yevgeniy, Koutsoukos, Xenofon.  2015.  Optimal Personalized Filtering Against Spear-phishing Attacks. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. :958–964.

To penetrate sensitive computer networks, attackers can use spear phishing to sidestep technical security mechanisms by exploiting the privileges of careless users. In order to maximize their success probability, attackers have to target the users that constitute the weakest links of the system. The optimal selection of these target users takes into account both the damage that can be caused by a user and the probability of a malicious e-mail being delivered to and opened by a user. Since attackers select their targets in a strategic way, the optimal mitigation of these attacks requires the defender to also personalize the e-mail filters by taking into account the users' properties.

In this paper, we assume that a learned classifier is given and propose strategic per-user filtering thresholds for mitigating spear-phishing attacks. We formulate the problem of filtering targeted and non-targeted malicious e-mails as a Stackelberg security game. We characterize the optimal filtering strategies and show how to compute them in practice. Finally, we evaluate our results using two real-world datasets and demonstrate that the proposed thresholds lead to lower losses than nonstrategic thresholds.

2015-05-06
Boukhtouta, A., Lakhdari, N.-E., Debbabi, M..  2014.  Inferring Malware Family through Application Protocol Sequences Signature. New Technologies, Mobility and Security (NTMS), 2014 6th International Conference on. :1-5.

The dazzling emergence of cyber-threats exert today's cyberspace, which needs practical and efficient capabilities for malware traffic detection. In this paper, we propose an extension to an initial research effort, namely, towards fingerprinting malicious traffic by putting an emphasis on the attribution of maliciousness to malware families. The proposed technique in the previous work establishes a synergy between automatic dynamic analysis of malware and machine learning to fingerprint badness in network traffic. Machine learning algorithms are used with features that exploit only high-level properties of traffic packets (e.g. packet headers). Besides, the detection of malicious packets, we want to enhance fingerprinting capability with the identification of malware families responsible in the generation of malicious packets. The identification of the underlying malware family is derived from a sequence of application protocols, which is used as a signature to the family in question. Furthermore, our results show that our technique achieves promising malware family identification rate with low false positives.

Boukhtouta, A., Lakhdari, N.-E., Debbabi, M..  2014.  Inferring Malware Family through Application Protocol Sequences Signature. New Technologies, Mobility and Security (NTMS), 2014 6th International Conference on. :1-5.

The dazzling emergence of cyber-threats exert today's cyberspace, which needs practical and efficient capabilities for malware traffic detection. In this paper, we propose an extension to an initial research effort, namely, towards fingerprinting malicious traffic by putting an emphasis on the attribution of maliciousness to malware families. The proposed technique in the previous work establishes a synergy between automatic dynamic analysis of malware and machine learning to fingerprint badness in network traffic. Machine learning algorithms are used with features that exploit only high-level properties of traffic packets (e.g. packet headers). Besides, the detection of malicious packets, we want to enhance fingerprinting capability with the identification of malware families responsible in the generation of malicious packets. The identification of the underlying malware family is derived from a sequence of application protocols, which is used as a signature to the family in question. Furthermore, our results show that our technique achieves promising malware family identification rate with low false positives.

Chongxi Bao, Forte, D., Srivastava, A..  2014.  On application of one-class SVM to reverse engineering-based hardware Trojan detection. Quality Electronic Design (ISQED), 2014 15th International Symposium on. :47-54.

Due to design and fabrication outsourcing to foundries, the problem of malicious modifications to integrated circuits known as hardware Trojans has attracted attention in academia as well as industry. To reduce the risks associated with Trojans, researchers have proposed different approaches to detect them. Among these approaches, test-time detection approaches have drawn the greatest attention and most approaches assume the existence of a “golden model”. Prior works suggest using reverse-engineering to identify such Trojan-free ICs for the golden model but they did not state how to do this efficiently. In this paper, we propose an innovative and robust reverseengineering approach to identify the Trojan-free ICs. We adapt a well-studied machine learning method, one-class support vector machine, to solve our problem. Simulation results using state-of-the-art tools on several publicly available circuits show that our approach can detect hardware Trojans with high accuracy rate across different modeling and algorithm parameters.

Haddadi, F., Morgan, J., Filho, E.G., Zincir-Heywood, A.N..  2014.  Botnet Behaviour Analysis Using IP Flows: With HTTP Filters Using Classifiers. Advanced Information Networking and Applications Workshops (WAINA), 2014 28th International Conference on. :7-12.

Botnets are one of the most destructive threats against the cyber security. Recently, HTTP protocol is frequently utilized by botnets as the Command and Communication (C&C) protocol. In this work, we aim to detect HTTP based botnet activity based on botnet behaviour analysis via machine learning approach. To achieve this, we employ flow-based network traffic utilizing NetFlow (via Softflowd). The proposed botnet analysis system is implemented by employing two different machine learning algorithms, C4.5 and Naive Bayes. Our results show that C4.5 learning algorithm based classifier obtained very promising performance on detecting HTTP based botnet activity.

Stevanovic, M., Pedersen, J.M..  2014.  An efficient flow-based botnet detection using supervised machine learning. Computing, Networking and Communications (ICNC), 2014 International Conference on. :797-801.

Botnet detection represents one of the most crucial prerequisites of successful botnet neutralization. This paper explores how accurate and timely detection can be achieved by using supervised machine learning as the tool of inferring about malicious botnet traffic. In order to do so, the paper introduces a novel flow-based detection system that relies on supervised machine learning for identifying botnet network traffic. For use in the system we consider eight highly regarded machine learning algorithms, indicating the best performing one. Furthermore, the paper evaluates how much traffic needs to be observed per flow in order to capture the patterns of malicious traffic. The proposed system has been tested through the series of experiments using traffic traces originating from two well-known P2P botnets and diverse non-malicious applications. The results of experiments indicate that the system is able to accurately and timely detect botnet traffic using purely flow-based traffic analysis and supervised machine learning. Additionally, the results show that in order to achieve accurate detection traffic flows need to be monitored for only a limited time period and number of packets per flow. This indicates a strong potential of using the proposed approach within a future on-line detection framework.

2015-05-05
Koch, S., John, M., Worner, M., Muller, A., Ertl, T..  2014.  VarifocalReader #x2014; In-Depth Visual Analysis of Large Text Documents. Visualization and Computer Graphics, IEEE Transactions on. 20:1723-1732.

Interactive visualization provides valuable support for exploring, analyzing, and understanding textual documents. Certain tasks, however, require that insights derived from visual abstractions are verified by a human expert perusing the source text. So far, this problem is typically solved by offering overview-detail techniques, which present different views with different levels of abstractions. This often leads to problems with visual continuity. Focus-context techniques, on the other hand, succeed in accentuating interesting subsections of large text documents but are normally not suited for integrating visual abstractions. With VarifocalReader we present a technique that helps to solve some of these approaches' problems by combining characteristics from both. In particular, our method simplifies working with large and potentially complex text documents by simultaneously offering abstract representations of varying detail, based on the inherent structure of the document, and access to the text itself. In addition, VarifocalReader supports intra-document exploration through advanced navigation concepts and facilitates visual analysis tasks. The approach enables users to apply machine learning techniques and search mechanisms as well as to assess and adapt these techniques. This helps to extract entities, concepts and other artifacts from texts. In combination with the automatic generation of intermediate text levels through topic segmentation for thematic orientation, users can test hypotheses or develop interesting new research questions. To illustrate the advantages of our approach, we provide usage examples from literature studies.

Baughman, A.K., Chuang, W., Dixon, K.R., Benz, Z., Basilico, J..  2014.  DeepQA Jeopardy! Gamification: A Machine-Learning Perspective. Computational Intelligence and AI in Games, IEEE Transactions on. 6:55-66.

DeepQA is a large-scale natural language processing (NLP) question-and-answer system that responds across a breadth of structured and unstructured data, from hundreds of analytics that are combined with over 50 models, trained through machine learning. After the 2011 historic milestone of defeating the two best human players in the Jeopardy! game show, the technology behind IBM Watson, DeepQA, is undergoing gamification into real-world business problems. Gamifying a business domain for Watson is a composite of functional, content, and training adaptation for nongame play. During domain gamification for medical, financial, government, or any other business, each system change affects the machine-learning process. As opposed to the original Watson Jeopardy!, whose class distribution of positive-to-negative labels is 1:100, in adaptation the computed training instances, question-and-answer pairs transformed into true-false labels, result in a very low positive-to-negative ratio of 1:100 000. Such initial extreme class imbalance during domain gamification poses a big challenge for the Watson machine-learning pipelines. The combination of ingested corpus sets, question-and-answer pairs, configuration settings, and NLP algorithms contribute toward the challenging data state. We propose several data engineering techniques, such as answer key vetting and expansion, source ingestion, oversampling classes, and question set modifications to increase the computed true labels. In addition, algorithm engineering, such as an implementation of the Newton-Raphson logistic regression with a regularization term, relaxes the constraints of class imbalance during training adaptation. We conclude by empirically demonstrating that data and algorithm engineering are complementary and indispensable to overcome the challenges in this first Watson gamification for real-world business problems.

2015-05-04
Boukhtouta, A., Lakhdari, N.-E., Debbabi, M..  2014.  Inferring Malware Family through Application Protocol Sequences Signature. New Technologies, Mobility and Security (NTMS), 2014 6th International Conference on. :1-5.

The dazzling emergence of cyber-threats exert today's cyberspace, which needs practical and efficient capabilities for malware traffic detection. In this paper, we propose an extension to an initial research effort, namely, towards fingerprinting malicious traffic by putting an emphasis on the attribution of maliciousness to malware families. The proposed technique in the previous work establishes a synergy between automatic dynamic analysis of malware and machine learning to fingerprint badness in network traffic. Machine learning algorithms are used with features that exploit only high-level properties of traffic packets (e.g. packet headers). Besides, the detection of malicious packets, we want to enhance fingerprinting capability with the identification of malware families responsible in the generation of malicious packets. The identification of the underlying malware family is derived from a sequence of application protocols, which is used as a signature to the family in question. Furthermore, our results show that our technique achieves promising malware family identification rate with low false positives.

2015-05-01
Yuxi Liu, Hatzinakos, D..  2014.  Earprint: Transient Evoked Otoacoustic Emission for Biometrics. Information Forensics and Security, IEEE Transactions on. 9:2291-2301.

Biometrics is attracting increasing attention in privacy and security concerned issues, such as access control and remote financial transaction. However, advanced forgery and spoofing techniques are threatening the reliability of conventional biometric modalities. This has been motivating our investigation of a novel yet promising modality transient evoked otoacoustic emission (TEOAE), which is an acoustic response generated from cochlea after a click stimulus. Unlike conventional modalities that are easily accessible or captured, TEOAE is naturally immune to replay and falsification attacks as a physiological outcome from human auditory system. In this paper, we resort to wavelet analysis to derive the time-frequency representation of such nonstationary signal, which reveals individual uniqueness and long-term reproducibility. A machine learning technique linear discriminant analysis is subsequently utilized to reduce intrasubject variability and further capture intersubject differentiation features. Considering practical application, we also introduce a complete framework of the biometric system in both verification and identification modes. Comparative experiments on a TEOAE data set of biometric setting show the merits of the proposed method. Performance is further improved with fusion of information from both ears.

2015-04-30
El Masri, A., Wechsler, H., Likarish, P., Kang, B.B..  2014.  Identifying users with application-specific command streams. Privacy, Security and Trust (PST), 2014 Twelfth Annual International Conference on. :232-238.

This paper proposes and describes an active authentication model based on user profiles built from user-issued commands when interacting with GUI-based application. Previous behavioral models derived from user issued commands were limited to analyzing the user's interaction with the *Nix (Linux or Unix) command shell program. Human-computer interaction (HCI) research has explored the idea of building users profiles based on their behavioral patterns when interacting with such graphical interfaces. It did so by analyzing the user's keystroke and/or mouse dynamics. However, none had explored the idea of creating profiles by capturing users' usage characteristics when interacting with a specific application beyond how a user strikes the keyboard or moves the mouse across the screen. We obtain and utilize a dataset of user command streams collected from working with Microsoft (MS) Word to serve as a test bed. User profiles are first built using MS Word commands and identification takes place using machine learning algorithms. Best performance in terms of both accuracy and Area under the Curve (AUC) for Receiver Operating Characteristic (ROC) curve is reported using Random Forests (RF) and AdaBoost with random forests.

Dai, Y. S., Xiang, Y. P., Pan, Y..  2014.  Bionic Autonomic Nervous Systems for Self-Defense Against DoS, Spyware, Malware, Virus, and Fishing. ACM Trans. Auton. Adapt. Syst.. 9:4:1–4:20.

Computing systems and networks become increasingly large and complex with a variety of compromises and vulnerabilities. The network security and privacy are of great concern today, where self-defense against different kinds of attacks in an autonomous and holistic manner is a challenging topic. To address this problem, we developed an innovative technology called Bionic Autonomic Nervous System (BANS). The BANS is analogous to biological nervous system, which consists of basic modules like cyber axon, cyber neuron, peripheral nerve and central nerve. We also presented an innovative self-defense mechanism which utilizes the Fuzzy Logic, Neural Networks, and Entropy Awareness, etc. Equipped with the BANS, computer and network systems can intelligently self-defend against both known and unknown compromises/attacks including denial of services (DoS), spyware, malware, and virus. BANS also enabled multiple computers to collaboratively fight against some distributed intelligent attacks like DDoS. We have implemented the BANS in practice. Some case studies and experimental results exhibited the effectiveness and efficiency of the BANS and the self-defense mechanism.

2014-09-26
Sommer, R., Paxson, V..  2010.  Outside the Closed World: On Using Machine Learning for Network Intrusion Detection. Security and Privacy (SP), 2010 IEEE Symposium on. :305-316.

In network intrusion detection research, one popular strategy for finding attacks is monitoring a network's activity for anomalies: deviations from profiles of normality previously learned from benign traffic, typically identified using tools borrowed from the machine learning community. However, despite extensive academic research one finds a striking gap in terms of actual deployments of such systems: compared with other intrusion detection approaches, machine learning is rarely employed in operational "real world" settings. We examine the differences between the network intrusion detection problem and other areas where machine learning regularly finds much more success. Our main claim is that the task of finding attacks is fundamentally different from these other applications, making it significantly harder for the intrusion detection community to employ machine learning effectively. We support this claim by identifying challenges particular to network intrusion detection, and provide a set of guidelines meant to strengthen future research on anomaly detection.

Dyer, K.P., Coull, S.E., Ristenpart, T., Shrimpton, T..  2012.  Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail. Security and Privacy (SP), 2012 IEEE Symposium on. :332-346.

We consider the setting of HTTP traffic over encrypted tunnels, as used to conceal the identity of websites visited by a user. It is well known that traffic analysis (TA) attacks can accurately identify the website a user visits despite the use of encryption, and previous work has looked at specific attack/countermeasure pairings. We provide the first comprehensive analysis of general-purpose TA countermeasures. We show that nine known countermeasures are vulnerable to simple attacks that exploit coarse features of traffic (e.g., total time and bandwidth). The considered countermeasures include ones like those standardized by TLS, SSH, and IPsec, and even more complex ones like the traffic morphing scheme of Wright et al. As just one of our results, we show that despite the use of traffic morphing, one can use only total upstream and downstream bandwidth to identify – with 98% accuracy - which of two websites was visited. One implication of what we find is that, in the context of website identification, it is unlikely that bandwidth-efficient, general-purpose TA countermeasures can ever provide the type of security targeted in prior work.

2014-09-17
Colbaugh, R., Glass, K..  2013.  Moving target defense for adaptive adversaries. Intelligence and Security Informatics (ISI), 2013 IEEE International Conference on. :50-55.

Machine learning (ML) plays a central role in the solution of many security problems, for example enabling malicious and innocent activities to be rapidly and accurately distinguished and appropriate actions to be taken. Unfortunately, a standard assumption in ML - that the training and test data are identically distributed - is typically violated in security applications, leading to degraded algorithm performance and reduced security. Previous research has attempted to address this challenge by developing ML algorithms which are either robust to differences between training and test data or are able to predict and account for these differences. This paper adopts a different approach, developing a class of moving target (MT) defenses that are difficult for adversaries to reverse-engineer, which in turn decreases the adversaries' ability to generate training/test data differences that benefit them. We leverage the coevolutionary relationship between attackers and defenders to derive a simple, flexible MT defense strategy which is optimal or nearly optimal for a broad range of security problems. Case studies involving two distinct cyber defense applications demonstrate that the proposed MT algorithm outperforms standard static methods, offering effective defense against intelligent, adaptive adversaries.