Biblio

Filters: Keyword is attribution
Avellaneda, Florent, Alikacem, El-Hackemi, Jaafar, Femi.  2019.  Using Attack Pattern for Cyber Attack Attribution. 2019 International Conference on Cybersecurity (ICoCSec). :1–6.

A cyber attack is a malicious and deliberate attempt by an individual or organization to breach the integrity, confidentiality, and/or availability of data or services of an information system of another individual or organization. Attributing a cyber attack is crucial for security, but it is also known to be a difficult problem. The main reason why there is currently no solution that automatically identifies the initiator of an attack is that attackers usually use proxies, i.e., intermediate nodes that relay a host over the network. In this paper, we propose to formalize the problem of identifying the initiator of a cyber attack. We show that if the attack scenario used by the attacker is known, then we are able to resolve the cyber attribution problem. Indeed, we propose a model to formalize these attack scenarios, which we call attack patterns, and give an efficient algorithm to search for attack patterns in a communication history. Finally, we experimentally show the relevance of our approach.
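To make the idea of searching a communication history concrete, here is a minimal sketch of a naive backward trace through relay hops. It is not the algorithm from the paper: the `Flow` record, the `trace_back` function, the `max_delay` window, and the host names are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Flow(NamedTuple):
    """One logged communication (hypothetical record format)."""
    time: float  # seconds since the start of the capture
    src: str
    dst: str


def trace_back(history: List[Flow], victim: str, attack_time: float,
               max_delay: float = 5.0) -> List[str]:
    """Walk backward from the victim through relays.

    At each step, pick the most recent flow that reached the current host
    no later than the flow it is assumed to have triggered, and no more
    than `max_delay` seconds before it.
    """
    path = [victim]
    current, deadline = victim, attack_time
    while True:
        candidates = [
            f for f in history
            if f.dst == current
            and deadline - max_delay <= f.time <= deadline
            and f.src not in path  # avoid looping through a host twice
        ]
        if not candidates:
            break
        latest = max(candidates, key=lambda f: f.time)
        path.append(latest.src)
        current, deadline = latest.src, latest.time
    path.reverse()  # initiator first, victim last
    return path


history = [
    Flow(1.0, "attacker", "proxy1"),
    Flow(1.5, "proxy1", "proxy2"),
    Flow(2.0, "proxy2", "victim"),
    Flow(50.0, "bystander", "proxy1"),  # unrelated later traffic
]
print(trace_back(history, "victim", attack_time=2.0))
```

A real attack pattern would constrain more than timing (ports, payload sizes, protocol steps); the sketch only shows why a known scenario makes the chain of relays searchable.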

Parafita, Álvaro, Vitrià, Jordi.  2019.  Explaining Visual Models by Causal Attribution. 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). :4167–4175.

Model explanations based on pure observational data cannot compute the effects of features reliably, due to their inability to estimate how each factor alteration could affect the rest. We argue that explanations should be based on the causal model of the data and the derived intervened causal models, that represent the data distribution subject to interventions. With these models, we can compute counterfactuals, new samples that will inform us how the model reacts to feature changes on our input. We propose a novel explanation methodology based on Causal Counterfactuals and identify the limitations of current Image Generative Models in their application to counterfactual creation.

Perry, Lior, Shapira, Bracha, Puzis, Rami.  2019.  NO-DOUBT: Attack Attribution Based On Threat Intelligence Reports. 2019 IEEE International Conference on Intelligence and Security Informatics (ISI). :80–85.

The task of attack attribution, i.e., identifying the entity responsible for an attack, is complicated and usually requires the involvement of an experienced security expert. Prior attempts to automate attack attribution apply various machine learning techniques on features extracted from the malware's code and behavior in order to identify other similar malware whose authors are known. However, the same malware can be reused by multiple actors, and the actor who performed an attack using a malware might differ from the malware's author. Moreover, information collected during an incident may contain many clues about the identity of the attacker in addition to the malware used. In this paper, we propose a method of attack attribution based on textual analysis of threat intelligence reports, using state of the art algorithms and models from the fields of machine learning and natural language processing (NLP). We have developed a new text representation algorithm which captures the context of the words and requires minimal feature engineering. Our approach relies on vector space representation of incident reports derived from a small collection of labeled reports and a large corpus of general security literature. Both datasets have been made available to the research community. Experimental results show that the proposed representation can attribute attacks more accurately than the baselines' representations. In addition, we show how the proposed approach can be used to identify novel previously unseen threat actors and identify similarities between known threat actors.

Traylor, Terry, Straub, Jeremy, Gurmeet, Snell, Nicholas.  2019.  Classifying Fake News Articles Using Natural Language Processing to Identify In-Article Attribution as a Supervised Learning Estimator. 2019 IEEE 13th International Conference on Semantic Computing (ICSC). :445–449.

Intentionally deceptive content presented under the guise of legitimate journalism is a worldwide information accuracy and integrity problem that affects opinion forming, decision making, and voting patterns. Most so-called 'fake news' is initially distributed over social media conduits like Facebook and Twitter and later finds its way onto mainstream media platforms such as traditional television and radio news. The fake news stories that are initially seeded over social media platforms share key linguistic characteristics such as making excessive use of unsubstantiated hyperbole and non-attributed quoted content. In this paper, the results of a fake news identification study that documents the performance of a fake news classifier are presented. The Textblob, Natural Language, and SciPy Toolkits were used to develop a novel fake news detector that uses quoted attribution in a Bayesian machine learning system as a key feature to estimate the likelihood that a news article is fake. The resultant process precision is 63.333% effective at assessing the likelihood that an article with quotes is fake. This process is called influence mining and this novel technique is presented as a method that can be used to enable fake news and even propaganda detection. In this paper, the research process, technical analysis, technical linguistics work, and classifier performance and results are presented. The paper concludes with a discussion of how the current system will evolve into an influence mining system.

Boughaci, Dalila, Benmesbah, Mounir, Zebiri, Aniss.  2019.  An improved N-grams based Model for Authorship Attribution. 2019 International Conference on Computer and Information Sciences (ICCIS). :1–6.

Authorship attribution is the problem of studying an anonymous text and finding the corresponding author in a set of candidate authors. In this paper, we propose a method based on an N-gram model for the problem of authorship attribution. Several measures are used to assign an anonymous text to an author. The different variants of the proposed method are implemented and validated on PAN benchmarks. The numerical results are encouraging and demonstrate the benefit of the proposed idea.
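As a rough illustration of N-gram based attribution in general, the sketch below builds character trigram profiles and assigns an anonymous text to the candidate whose profile is most similar under cosine similarity. The abstract does not say which N-gram type or which measures the authors use, so the trigram choice, the cosine measure, and the function names `ngram_profile`, `cosine`, and `attribute` are assumptions.

```python
import math
from collections import Counter
from typing import Dict


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(anonymous: str, candidates: Dict[str, str], n: int = 3) -> str:
    """Return the candidate author whose known text is most similar."""
    target = ngram_profile(anonymous, n)
    return max(
        candidates,
        key=lambda author: cosine(target, ngram_profile(candidates[author], n)),
    )


known_texts = {
    "alice": "the cat sat on the mat and the cat slept",
    "bob": "quantum flux zigzag vortex jumbled xylophone",
}
print(attribute("the cat on the mat", known_texts))
```

Swapping `cosine` for another measure is a one-line change, which is one way to compare several measures as the abstract describes.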

Khomytska, Iryna, Teslyuk, Vasyl.  2019.  Mathematical Methods Applied for Authorship Attribution on the Phonological Level. 2019 IEEE 14th International Conference on Computer Sciences and Information Technologies (CSIT). 3:7–11.

The proposed combination of statistical methods has proved efficient for authorship attribution. The complex analysis method based on the proposed combination of statistical methods has made it possible to minimize the number of phoneme groups by which the authorial differentiation of texts has been done.

Gopinath, Divya, Pasareanu, Corina S., Wang, Kaiyuan, Zhang, Mengshi, Khurshid, Sarfraz.  2019.  Symbolic Execution for Attribution and Attack Synthesis in Neural Networks. 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). :282–283.

This paper introduces DeepCheck, a new approach for validating Deep Neural Networks (DNNs) based on core ideas from program analysis, specifically from symbolic execution. DeepCheck implements techniques for lightweight symbolic analysis of DNNs and applies them in the context of image classification to address two challenging problems: 1) identification of important pixels (for attribution and adversarial generation); and 2) creation of adversarial attacks. Experimental results using the MNIST data-set show that DeepCheck's lightweight symbolic analysis provides a valuable tool for DNN validation.

Karaküçük, Ahmet, Dirik, A. Emir.  2019.  Source Device Attribution of Thermal Images Captured with Handheld IR Cameras. 2019 11th International Conference on Electrical and Electronics Engineering (ELECO). :547–551.

Source camera attribution of digital images has been a hot research topic in digital forensics literature. However, thermal cameras and the radiometric data they generate have remained a nascent topic, as such devices are expensive and tailored for specific use cases, not adopted by the masses. This has changed dramatically with the low-cost, pluggable thermal-camera add-ons for smartphones and similar low-cost pocket-size thermal cameras recently introduced to consumers, which have made thermal imaging devices available to the masses. In this paper, we investigate the use of an established source device attribution method on radiometric data produced with a consumer-level, low-cost handheld thermal camera. The results we present in this paper are promising and show that it is quite possible to attribute thermal images to their source camera.

Jafariakinabad, Fereshteh, Hua, Kien A.  2019.  Style-Aware Neural Model with Application in Authorship Attribution. 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). :325–328.

Writing style is a combination of consistent decisions associated with a specific author at different levels of language production, including lexical, syntactic, and structural. In this paper, we introduce a style-aware neural model to encode document information from three stylistic levels and evaluate it in the domain of authorship attribution. First, we propose a simple way to jointly encode syntactic and lexical representations of sentences. Subsequently, we employ an attention-based hierarchical neural network to encode the syntactic and semantic structure of sentences in documents while rewarding the sentences which contribute more to capturing the writing style. Our experimental results, based on four benchmark datasets, reveal the benefits of encoding document information from all three stylistic levels when compared to the baseline methods in the literature.

Khomytska, Iryna, Teslyuk, Vasyl.  2019.  The Software for Authorship and Style Attribution. 2019 IEEE 15th International Conference on the Experience of Designing and Application of CAD Systems (CADSM). :1–4.

A new program has been developed for style and authorship attribution. Differentiation of styles by transcription symbols has proved to be efficient. The novel approach involves a combination of two ways of transforming texts into their transcription variants. The Java programming language makes it possible to improve the efficiency of style and authorship attribution.

Giles, Keir, Hartmann, Kim.  2019.  “Silent Battle” Goes Loud: Entering a New Era of State-Avowed Cyber Conflict. 2019 11th International Conference on Cyber Conflict (CyCon). 900:1–13.

The unprecedented transparency shown by the Netherlands intelligence services in exposing Russian GRU officers in October 2018 is indicative of a number of new trends in state handling of cyber conflict. US public indictments of foreign state intelligence officials, and the UK's deliberate provision of information allowing the global media to “dox” GRU officers implicated in the Salisbury poison attack in early 2018, set a precedent for revealing information that previously would have been confidential. This is a major departure from previous practice where the details of state-sponsored cyber attacks would only be discovered through lengthy investigative journalism (as with Stuxnet) or through the efforts of cybersecurity corporations (as with Red October). This paper uses case studies to illustrate the nature of this departure and consider its impact, including potentially substantial implications for state handling of cyber conflict. The paper examines these implications, including:
· The effect of transparency on perception of conflict. Greater public knowledge of attacks will lead to greater public acceptance that countermeasures should be taken. This may extend to public preparedness to accept that a state of declared or undeclared war exists with a cyber aggressor.
· The resulting effect on legality. This adds a new element to the long-running debates on the legality of cyber attacks or counter-attacks, by affecting the point at which a state of conflict is politically and socially, even if not legally, judged to exist.
· The further resulting effect on permissions and authorities to conduct cyber attacks, in the form of adjustment to the glaring imbalance between the means and methods available to aggressors (especially those who believe themselves already to be in conflict) and defenders. Greater openness has already intensified public and political questioning of the restraint shown by NATO and EU nations in responding to Russian actions; this trend will continue.
· Consequences for deterrence, both specifically within cyber conflict and also more broadly deterring hostile actions.
In sum, the paper brings together the direct and immediate policy implications, for a range of nations and for NATO, of the new apparent policy of transparency.

An, Ning, Jiang, Siyuan, Yang, Jiaoyun, Li, Lian.  2018.  Simplex Based Vector Mapping for Categorical Attributes Clustering. Proceedings of the 2018 International Conference on Computational Intelligence and Intelligent Systems. :56–60.

When clustering unlabeled data, categorical attributes are usually treated differently from numerical attributes because of their unique characteristics, which introduces difficulties in clustering data with both types of attributes. In this paper, we propose a strategy to map categorical attributes to high dimensional vectors based on the Simplex Theory, so that categorical attributes can be handled the same as numerical attributes. To achieve identical distances between any two values under Euclidean distance, we theoretically prove that a categorical attribute with n types of values should be mapped to at least n–1 dimensional vectors. Furthermore, numerical vector mapping solutions are provided on condition of 0 normalized constraint. Experimentally, we show that integrating our vector mapping strategy with the K-means algorithm achieves better accuracy than integrating similarities for categorical attributes with the K-modes algorithm on four datasets.

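The equal-distance property can be illustrated with the vertices of a regular simplex. The sketch below is one standard construction, not necessarily the mapping derived in the paper: it places n unit vectors in n-1 dimensions so that every pair has dot product -1/(n-1), which makes all pairwise Euclidean distances identical. The function name `simplex_vectors` and the colour example are assumptions.

```python
import math
from itertools import combinations
from typing import List


def simplex_vectors(n: int) -> List[List[float]]:
    """Return n unit vectors in n-1 dimensions with equal pairwise distances."""
    if n < 2:
        raise ValueError("need at least two categorical values")
    dim = n - 1
    target_dot = -1.0 / dim  # required dot product between any two vertices
    vecs = [[0.0] * dim for _ in range(n)]
    for i in range(dim):
        # Choose coordinate i of vertex i so the vertex has unit length.
        vecs[i][i] = math.sqrt(1.0 - sum(x * x for x in vecs[i][:i]))
        # Choose coordinate i of every later vertex to hit the target dot product.
        for j in range(i + 1, n):
            partial = sum(vecs[i][k] * vecs[j][k] for k in range(i))
            vecs[j][i] = (target_dot - partial) / vecs[i][i]
    return vecs


# Map a three-valued categorical attribute to 2-dimensional vectors.
encoding = dict(zip(["red", "green", "blue"], simplex_vectors(3)))
for (a, va), (b, vb) in combinations(encoding.items(), 2):
    print(a, b, round(math.dist(va, vb), 6))
```

The resulting vectors can be concatenated with numerical attributes before running K-means, so that no pair of category values is treated as closer than any other.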
Rubio-Medrano, Carlos E., Zhao, Ziming, Ahn, Gail-Joon.  2018.  RiskPol: A Risk Assessment Framework for Preventing Attribute-Forgery Attacks to ABAC Policies. Proceedings of the Third ACM Workshop on Attribute-Based Access Control. :54–60.

Recently, attribute-based access control (ABAC) has emerged as a convenient paradigm for specifying, enforcing and maintaining rich and flexible authorization policies, leveraging attributes originated from multiple sources, e.g., operative systems, software modules, remote services, etc. However, attackers may try to bypass ABAC policies by compromising such sources to forge the attributes they provide, e.g., by deliberately manipulating the data contained within those attributes at will, in an effort to gain unintended access to sensitive resources as a result. In such a context, performing a proper risk assessment of ABAC policies, taking into account their enlisted attributes as well as their corresponding sources, becomes highly convenient to overcome zero-day security incidents or vulnerabilities, before they can be later exploited by attackers. With this in mind, we introduce RiskPol, an automated risk assessment framework for ABAC policies based on dynamically combining previously-assigned trust scores for each attribute source, such that overall scores at the policy level can be later obtained and used as a reference for performing a risk assessment on each policy. In this paper, we detail the general intuition behind our approach, its current status, as well as our plans for future work.

Iqbal, A., Mahmood, F., Shalaginov, A., Ekstedt, M..  2018.  Identification of Attack-based Digital Forensic Evidences for WAMPAC Systems. 2018 IEEE International Conference on Big Data (Big Data). :3079–3087.

The power systems domain has generally been very conservative in terms of conducting digital forensic investigations, especially so since the advent of smart grids. This lack of research, due to a multitude of challenges, has resulted in the absence of a knowledge base and resources to facilitate such investigations. Digitalization in the form of smart grids is upon us, but in the case of cyber-attacks, attributing such attacks is difficult if not impossible. In this research, we have identified digital forensic artifacts resulting from a cyber-attack on Wide Area Monitoring, Protection and Control (WAMPAC) systems, which will help an investigator attribute an attack using the identified evidence. The research also shows the usage of sandboxing for digital forensics along with a hardware-in-the-loop (HIL) setup. This is a first-of-its-kind effort to identify and acquire all the digital forensic evidence for WAMPAC systems, which will ultimately help in building a body of knowledge and taxonomy for power system forensics.

[Anonymous].  2018.  A Systems Approach to Indicators of Compromise Utilizing Graph Theory. 2018 IEEE International Symposium on Technologies for Homeland Security (HST). :1–6.

It is common to record indicators of compromise (IoC) in order to describe a particular breach and to attempt to attribute a breach to a specific threat actor. However, many network security breaches actually involve multiple diverse modalities using a variety of attack vectors. Measuring and recording IoCs in isolation does not provide an accurate view of the actual incident, and thus does not facilitate attribution. A systems approach that describes the entire intrusion as an IoC would be more effective. Graph theory has been utilized to model complex systems of varying types, and this provides a mathematical tool for modeling systems indicators of compromise. This paper describes the applications of graph theory to creating systems-based indicators of compromise. A complete methodology is presented for developing systems IoCs that fully describe a complex network intrusion.

Herald, N. E., David, M. W..  2018.  A Framework for Making Effective Responses to Cyberattacks. 2018 IEEE International Conference on Big Data (Big Data). :4798–4805.

The process for determining how to respond to a cyberattack involves evaluating many factors, including some with competing risks. Consequently, decision makers in the private sector and policymakers in the U.S. government (USG) need a framework in order to make effective response decisions. The authors' research identified two competing risks: 1) the risk of not responding forcefully enough to deter a suspected attacker, and 2) the risk of responding in a manner that escalates a situation with an attacker. The authors also identified three primary factors that influence these risks: attribution confidence/time, the scale of the attack, and the relationship with the suspected attacker. This paper provides a framework to help decision makers understand how these factors interact to influence the risks associated with potential response options to cyberattacks. The views expressed do not reflect the official policy or position of the National Intelligence University, the Department of Defense, the U.S. Intelligence Community, or the U.S. Government.

Aborisade, O., Anwar, M..  2018.  Classification for Authorship of Tweets by Comparing Logistic Regression and Naive Bayes Classifiers. 2018 IEEE International Conference on Information Reuse and Integration (IRI). :269–276.

At a time when all it takes to open a Twitter account is a mobile phone, the act of authenticating information encountered on social media becomes very complex, especially when we lack measures to verify digital identities in the first place. Because the platform supports anonymity, fake news generated by dubious sources has been observed to travel much faster and farther than real news. Hence, we need valid measures to identify authors of misinformation to avert these consequences. Researchers propose different authorship attribution techniques to approach this kind of problem. However, because tweets are made up of only 280 characters, finding a suitable authorship attribution technique is a challenge. This research aims to classify authors of tweets by comparing machine learning methods like logistic regression and naive Bayes. The processes of this application are fetching of tweets, pre-processing, feature extraction, and developing a machine learning model for classification. This paper illustrates the text classification for authorship process using machine learning techniques. In total, there were 46,895 tweets used as both training and testing data, and unique features specific to Twitter were extracted. Several steps were performed in the pre-processing phase, including removal of short texts, removal of stop-words and punctuation, and tokenizing and stemming of texts. This approach transforms the pre-processed data into a set of feature vectors in Python. Logistic regression and naive Bayes algorithms were applied to the set of feature vectors for the training and testing of the classifier. The logistic regression based classifier gave the highest accuracy of 91.1% compared to the naive Bayes classifier with 89.8%.

Alsadhan, A. F., Alhussein, M. A..  2018.  Deleted Data Attribution in Cloud Computing Platforms. 2018 1st International Conference on Computer Applications Information Security (ICCAIS). :1–6.

The introduction of Cloud-based storage represents one of the most discussed challenges among digital forensic professionals. In a 2014 report, the National Institute of Standards and Technology (NIST) highlighted the various forensic challenges created as a consequence of sharing storage area among cloud users. One critical issue discussed in the report is how to recognize a file's owner after the file has been deleted. When a file is deleted, the cloud system also deletes the file metadata. After metadata has been deleted, no one can know who owned the file. This critical issue has introduced some difficulties in the deleted data acquisition process. For example, if a cloud user accidentally deletes a file, it is difficult to recover the file. More importantly, it is even more difficult to identify the actual cloud user that owned the file. In addition, forensic investigators encounter numerous obstacles if a deleted file is to be used as evidence against a crime suspect. Unfortunately, few studies have been conducted to solve this matter. As a result, this work presents our proposed solution to the challenge of attributing deleted files to their specific users. We call this the “user signature” approach. This approach aims to enhance the deleted data acquisition process in cloud computing environments by specifically attributing files to the corresponding user.

Gugelmann, D., Sommer, D., Lenders, V., Happe, M., Vanbever, L..  2018.  Screen watermarking for data theft investigation and attribution. 2018 10th International Conference on Cyber Conflict (CyCon). :391–408.
Organizations not only need to defend their IT systems against external cyber attackers, but also from malicious insiders, that is, agents who have infiltrated an organization or malicious members stealing information for their own profit. In particular, malicious insiders can leak a document by simply opening it and taking pictures of the document displayed on the computer screen with a digital camera. Using a digital camera allows a perpetrator to easily avoid a log trail that results from using traditional communication channels, such as sending the document via email. This makes it difficult to identify and prove the identity of the perpetrator. Even a policy prohibiting the use of any device containing a camera cannot eliminate this threat since tiny cameras can be hidden almost everywhere. To address this leakage vector, we propose a novel screen watermarking technique that embeds hidden information on computer screens displaying text documents. The watermark is imperceptible during regular use, but can be extracted from pictures of documents shown on the screen, which allows an organization to reconstruct the place and time of the data leak from recovered leaked pictures. Our approach takes advantage of the fact that the human eye is less sensitive to small luminance changes than digital cameras. We devise a symbol shape that is invisible to the human eye, but still robust to the image artifacts introduced when taking pictures. We complement this symbol shape with an error correction coding scheme that can handle very high bit error rates and retrieve watermarks from cropped and compressed pictures. We show in an experimental user study that our screen watermarks are not perceivable by humans and analyze the robustness of our watermarks against image modifications.
Han, C., Zhao, C., Zou, Z., Tang, H., You, J..  2018.  PATIP-TREE: An Efficient Method to Look up the Network Address Attribution Information. 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). :466–473.

The IP address attribution information includes the geographical information, the network routing information, the agency information, Internet Content Provider (ICP) information, etc. Nowadays, the attribution information is important to network traffic engineering and needs to be obtained in real time in network traffic analysis systems. Existing methods for IP address attribution information lookup cannot be employed efficiently in actual systems due to their low scalability or poor performance. They cannot address the backbone network's requirements for real-time IP address attribution information lookup, and most lookup methods do not support custom IP address attribution lookup. In response to these challenges, we propose a novel high-speed approach for IP address attribution information lookup. We first devise a data structure, the IP address attribution information search tree (PATIP-TREE), to store custom IP address attribution information. Based on the PATIP-TREE, an effective algorithm for IP information lookup is proposed, which can support custom IP address attribution information lookup in real time. The experimental results show that our method outperforms the existing methods in terms of efficiency. Our approach also provides high scalability and is suitable for many kinds of network addresses, such as IPv4 addresses, IPv6 addresses, and named data networking addresses.

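The abstract does not describe the internal layout of the PATIP-TREE, so the following is only a generic stand-in: a plain binary trie with longest-prefix matching, built on Python's standard `ipaddress` module. The class name `PrefixTrie`, its methods, and the sample attribution strings are assumptions; the sketch shows how custom attribution records for IPv4 and IPv6 prefixes can be stored and looked up by address.

```python
import ipaddress
from typing import Optional


class PrefixTrie:
    """Binary trie mapping CIDR prefixes to attribution info (illustrative)."""

    def __init__(self) -> None:
        # One root per IP version so IPv4 and IPv6 bit strings never mix.
        self.roots = {4: {}, 6: {}}

    def insert(self, cidr: str, info: str) -> None:
        net = ipaddress.ip_network(cidr)
        width = net.max_prefixlen  # 32 for IPv4, 128 for IPv6
        bits = format(int(net.network_address), f"0{width}b")[:net.prefixlen]
        node = self.roots[net.version]
        for bit in bits:
            node = node.setdefault(bit, {})
        node["info"] = info

    def lookup(self, address: str) -> Optional[str]:
        addr = ipaddress.ip_address(address)
        bits = format(int(addr), f"0{addr.max_prefixlen}b")
        node = self.roots[addr.version]
        best = node.get("info")  # matches a default route, if one was inserted
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            if "info" in node:
                best = node["info"]  # remember the longest match so far
        return best


trie = PrefixTrie()
trie.insert("10.0.0.0/8", "corp network")
trie.insert("10.1.0.0/16", "lab subnet")
trie.insert("2001:db8::/32", "documentation range")
print(trie.lookup("10.1.2.3"))
print(trie.lookup("2001:db8::1"))
```

Each lookup touches at most one node per address bit, which is why trie-style structures are a common starting point for real-time prefix lookup.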
Zhu, Z., Jiang, R., Jia, Y., Xu, J., Li, A..  2018.  Cyber Security Knowledge Graph Based Cyber Attack Attribution Framework for Space-ground Integration Information Network. 2018 IEEE 18th International Conference on Communication Technology (ICCT). :870–874.

Compared with the traditional Internet, the space-ground integration information network has a more complicated topology and a wider coverage area, which makes it more difficult to find the source of attacks. In this paper, a cyber attack attribution framework is proposed to trace the attack source in the space-ground integration information network. First, we construct a cyber security knowledge graph for the space-ground integration information network. Then, an automated attribution framework for cyber attacks is proposed. It attributes the source of the attack by querying the cyber security knowledge graph we constructed. Experiments show that the proposed framework can attribute network attacks simply, effectively, and automatically.

Greenstadt, Rachel.  2017.  Using Stylometry to Attribute Programmers and Writers. Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security. :91–91.

In this talk, I will discuss my lab's work in the emerging field of adversarial stylometry and machine learning. Machine learning algorithms are increasingly being used in security and privacy domains, in areas that go beyond intrusion or spam detection. For example, in digital forensics, questions often arise about the authors of documents: their identity, demographic background, and whether they can be linked to other documents. The field of stylometry uses linguistic features and machine learning techniques to answer these questions. We have applied stylometry to difficult domains such as underground hacker forums, open source projects (code), and tweets. I will discuss our Doppelgänger Finder algorithm, which enables us to group Sybil accounts on underground forums and detect blogs from Twitter feeds and reddit comments. In addition, I will discuss our work attributing unknown source code and binaries.

Krupp, Johannes, Backes, Michael, Rossow, Christian.  2016.  Identifying the Scan and Attack Infrastructures Behind Amplification DDoS Attacks. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :1426–1437.

Amplification DDoS attacks have gained popularity and become a serious threat to Internet participants. However, little is known about where these attacks originate, and revealing the attack sources is a non-trivial problem due to the spoofed nature of the traffic. In this paper, we present novel techniques to uncover the infrastructures behind amplification DDoS attacks. We follow a two-step approach to tackle this challenge: First, we develop a methodology to impose a fingerprint on scanners that perform the reconnaissance for amplification attacks that allows us to link subsequent attacks back to the scanner. Our methodology attributes over 58% of attacks to a scanner with a confidence of over 99.9%. Second, we use Time-to-Live-based trilateration techniques to map scanners to the actual infrastructures launching the attacks. Using this technique, we identify 34 networks as being the source for amplification attacks at 98% certainty.
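The intuition behind TTL-based location can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes senders start from one of the common initial TTL values (64, 128, or 255), estimates the hop count seen at several sensors, and picks the candidate network whose known hop distances to those sensors match best. The names `estimate_hops`, `hop_distance`, `closest_candidate`, and all sensor and network labels are hypothetical.

```python
from typing import Dict

COMMON_INITIAL_TTLS = (64, 128, 255)  # typical operating-system defaults


def estimate_hops(observed_ttl: int) -> int:
    """Guess the hop count by assuming the smallest plausible initial TTL."""
    if not 0 < observed_ttl <= 255:
        raise ValueError("TTL must be in the range 1..255")
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl


def hop_distance(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Mean absolute hop-count difference over the sensors both profiles share."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("profiles share no sensors")
    return sum(abs(a[s] - b[s]) for s in shared) / len(shared)


def closest_candidate(attack_ttls: Dict[str, int],
                      candidates: Dict[str, Dict[str, int]]) -> str:
    """Return the candidate whose hop profile best matches the attack traffic."""
    attack_hops = {s: estimate_hops(ttl) for s, ttl in attack_ttls.items()}
    return min(candidates, key=lambda name: hop_distance(attack_hops, candidates[name]))


# TTLs of spoofed attack packets as received at three sensors (made-up values).
attack_ttls = {"sensor1": 52, "sensor2": 49, "sensor3": 57}
# Measured hop counts from each candidate network to the same sensors.
candidates = {
    "net_a": {"sensor1": 12, "sensor2": 14, "sensor3": 7},
    "net_b": {"sensor1": 20, "sensor2": 9, "sensor3": 15},
}
print(closest_candidate(attack_ttls, candidates))
```

Spoofing changes the source address of a packet but not the number of routers it crosses, which is why hop counts observed from several vantage points can narrow down where traffic really originates.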

Vazirian, Samane, Zahedi, Morteza.  2016.  A modified language modeling method for authorship attribution. :32–37.

This paper presents an approach to a closed-class authorship attribution (AA) problem. It is based on language modeling for classification and is called modified language modeling. Modified language modeling aims to offer a solution to the AA problem by combining bigram word weighting and unigram word weighting. It makes the relation between unseen text and training documents clearer by giving an extra reward to training documents that include bigram words as well as unigram words. Moreover, the IDF value multiplied by the related word probability is used instead of removing the stop words provided by a stop-word list. We evaluate four approaches experimentally (unigram, bigram, trigram, and modified language modeling) using two Persian poem corpora, the WMPR-AA2016-A dataset and the WMPR-AA2016-B dataset. Results show that modified language modeling attributes authors better than the other approaches. The result on WMPR-AA2016-B, which is the bigger dataset, is much better than on the other dataset for all approaches. This may indicate that if adequate data is provided to train the language model, modified language modeling can be a good solution to the AA problem.

Han, Yufei, Shen, Yun.  2016.  Accurate Spear Phishing Campaign Attribution and Early Detection. Proceedings of the 31st Annual ACM Symposium on Applied Computing. :2079–2086.

There is growing evidence that spear phishing campaigns are increasingly pervasive, sophisticated, and remain the starting points of more advanced attacks. The current campaign identification and attribution process relies heavily on manual effort and is inefficient in gathering intelligence in a timely manner. Ideally, we could automatically attribute spear phishing emails to known campaigns and achieve early detection of new campaigns using limited labelled emails as the seeds. In this paper, we introduce four categories of email profiling features that capture various characteristics of spear phishing emails. Building on these features, we implement and evaluate an affinity graph based semi-supervised learning model for campaign attribution and detection. We demonstrate that our system, using only 25 labelled emails, achieves 0.9 F1 score with a 0.01 false positive rate in known campaign attribution, and is able to detect previously unknown spear phishing campaigns, achieving 100% 'darkmoon', over 97% of 'samkams' and 91% of 'bisrala' campaign detection using 246 labelled emails in our experiments.
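A generic form of affinity-graph semi-supervised learning is label propagation, sketched below. The abstract does not specify the authors' model, so this is an assumed stand-in: emails are represented as feature vectors, edge weights come from a Gaussian kernel on Euclidean distance, a few seed emails keep their campaign labels, and the remaining emails repeatedly take the weighted average of their neighbours' label scores. The function `propagate_labels`, the kernel width `sigma`, and the toy feature vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def propagate_labels(points: Sequence[Sequence[float]], seeds: Dict[int, str],
                     sigma: float = 1.0, iterations: int = 50) -> List[str]:
    """Spread seed labels over a Gaussian affinity graph."""
    labels = sorted(set(seeds.values()))
    n = len(points)
    # Affinity matrix: nearby feature vectors get weights close to 1.
    weights = [
        [0.0 if i == j else
         math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
         for j in range(n)]
        for i in range(n)
    ]
    # Seeds start with full confidence in their own label, others with none.
    scores = [{lab: 1.0 if seeds.get(i) == lab else 0.0 for lab in labels}
              for i in range(n)]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in seeds:  # labelled seeds stay clamped
                updated.append(scores[i])
                continue
            total = sum(weights[i])
            updated.append({
                lab: (sum(weights[i][j] * scores[j][lab] for j in range(n)) / total
                      if total else 0.0)
                for lab in labels
            })
        scores = updated
    return [max(labels, key=lambda lab: s[lab]) for s in scores]


# Two well-separated groups of toy email feature vectors, one seed per group.
emails = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5)]
seeds = {0: "campaign_a", 3: "campaign_b"}
print(propagate_labels(emails, seeds))
```

With only one labelled seed per group, every email still receives the label of its own cluster, which mirrors the appeal of needing few labelled emails.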