Visible to the public Biblio

Filters: Keyword is semi-supervised learning  [Clear All Filters]
Li, Wenyue, Yin, Jihao, Han, Bingnan, Zhu, Hongmei.  2019.  Generative Adversarial Network with Folded Spectrum for Hyperspectral Image Classification. IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium. :883—886.

Hyperspectral image (HSIs) with abundant spectral information but limited labeled dataset endows the rationality and necessity of semi-supervised spectral-based classification methods. Where, the utilizing approach of spectral information is significant to classification accuracy. In this paper, we propose a novel semi-supervised method based on generative adversarial network (GAN) with folded spectrum (FS-GAN). Specifically, the original spectral vector is folded to 2D square spectrum as input of GAN, which can generate spectral texture and provide larger receptive field over both adjacent and non-adjacent spectral bands for deep feature extraction. The generated fake folded spectrum, the labeled and unlabeled real folded spectrum are then fed to the discriminator for semi-supervised learning. A feature matching strategy is applied to prevent model collapse. Extensive experimental comparisons demonstrate the effectiveness of the proposed method.

Liu, Junfu, Chen, Keming, Xu, Guangluan, Li, Hao, Yan, Menglong, Diao, Wenhui, Sun, Xian.  2019.  Semi-Supervised Change Detection Based on Graphs with Generative Adversarial Networks. IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium. :74—77.

In this paper, we present a semi-supervised remote sensing change detection method based on graph model with Generative Adversarial Networks (GANs). Firstly, the multi-temporal remote sensing change detection problem is converted as a problem of semi-supervised learning on graph where a majority of unlabeled nodes and a few labeled nodes are contained. Then, GANs are adopted to generate samples in a competitive manner and help improve the classification accuracy. Finally, a binary change map is produced by classifying the unlabeled nodes to a certain class with the help of both the labeled nodes and the unlabeled nodes on graph. Experimental results carried on several very high resolution remote sensing image data sets demonstrate the effectiveness of our method.

Zügner, Daniel, Akbarnejad, Amir, Günnemann, Stephan.  2018.  Adversarial Attacks on Neural Networks for Graph Data. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. :2847-2856.
Deep learning models for graphs have achieved strong performance for the task of node classification. Despite their proliferation, currently there is no study of their robustness to adversarial attacks. Yet, in domains where they are likely to be used, e.g. the web, adversaries are common. Can deep learning models for graphs be easily fooled? In this work, we introduce the first study of adversarial attacks on attributed graphs, specifically focusing on models exploiting ideas of graph convolutions. In addition to attacks at test time, we tackle the more challenging class of poisoning/causative attacks, which focus on the training phase of a machine learning model.We generate adversarial perturbations targeting the node's features and the graph structure, thus, taking the dependencies between instances in account. Moreover, we ensure that the perturbations remain unnoticeable by preserving important data characteristics. To cope with the underlying discrete domain we propose an efficient algorithm Nettack exploiting incremental computations. Our experimental study shows that accuracy of node classification significantly drops even when performing only few perturbations. Even more, our attacks are transferable: the learned attacks generalize to other state-of-the-art node classification models and unsupervised approaches, and likewise are successful even when only limited knowledge about the graph is given.
Ghosh, Shalini, Das, Ariyam, Porras, Phil, Yegneswaran, Vinod, Gehani, Ashish.  2017.  Automated Categorization of Onion Sites for Analyzing the Darkweb Ecosystem. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :1793–1802.

Onion sites on the darkweb operate using the Tor Hidden Service (HS) protocol to shield their locations on the Internet, which (among other features) enables these sites to host malicious and illegal content while being resistant to legal action and seizure. Identifying and monitoring such illicit sites in the darkweb is of high relevance to the Computer Security and Law Enforcement communities. We have developed an automated infrastructure that crawls and indexes content from onion sites into a large-scale data repository, called LIGHTS, with over 100M pages. In this paper we describe Automated Tool for Onion Labeling (ATOL), a novel scalable analysis service developed to conduct a thematic assessment of the content of onion sites in the LIGHTS repository. ATOL has three core components – (a) a novel keyword discovery mechanism (ATOLKeyword) which extends analyst-provided keywords for different categories by suggesting new descriptive and discriminative keywords that are relevant for the categories; (b) a classification framework (ATOLClassify) that uses the discovered keywords to map onion site content to a set of categories when sufficient labeled data is available; (c) a clustering framework (ATOLCluster) that can leverage information from multiple external heterogeneous knowledge sources, ranging from domain expertise to Bitcoin transaction data, to categorize onion content in the absence of sufficient supervised data. The paper presents empirical results of ATOL on onion datasets derived from the LIGHTS repository, and additionally benchmarks ATOL's algorithms on the publicly available 20 Newsgroups dataset to demonstrate the reproducibility of its results. On the LIGHTS dataset, ATOLClassify gives a 12% performance gain over an analyst-provided baseline, while ATOLCluster gives a 7% improvement over state-of-the-art semi-supervised clustering algorithms. We also discuss how ATOL has been deployed and externally evaluated, as part of the LIGHTS system.

Meng, Q., Shameng, Wen, Chao, Feng, Chaojing, Tang.  2016.  Predicting buffer overflow using semi-supervised learning. 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). :1959–1963.

As everyone knows vulnerability detection is a very difficult and time consuming work, so taking advantage of the unlabeled data sufficiently is needed and helpful. According the above reality, in this paper a method is proposed to predict buffer overflow based on semi-supervised learning. We first employ Antlr to extract AST from C/C++ source files, then according to the 22 buffer overflow attributes taxonomies, a 22-dimension vector is extracted from every function in AST, at last, the vector is leveraged to train a classifier to predict buffer overflow vulnerabilities. The experiment and evaluation indicate our method is correct and efficient.

Alabdulmohsin, Ibrahim, Han, YuFei, Shen, Yun, Zhang, XiangLiang.  2016.  Content-Agnostic Malware Detection in Heterogeneous Malicious Distribution Graph. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. :2395–2400.

Malware detection has been widely studied by analysing either file dropping relationships or characteristics of the file distribution network. This paper, for the first time, studies a global heterogeneous malware delivery graph fusing file dropping relationship and the topology of the file distribution network. The integration offers a unique ability of structuring the end-to-end distribution relationship. However, it brings large heterogeneous graphs to analysis. In our study, an average daily generated graph has more than 4 million edges and 2.7 million nodes that differ in type, such as IPs, URLs, and files. We propose a novel Bayesian label propagation model to unify the multi-source information, including content-agnostic features of different node types and topological information of the heterogeneous network. Our approach does not need to examine the source codes nor inspect the dynamic behaviours of a binary. Instead, it estimates the maliciousness of a given file through a semi-supervised label propagation procedure, which has a linear time complexity w.r.t. the number of nodes and edges. The evaluation on 567 million real-world download events validates that our proposed approach efficiently detects malware with a high accuracy.

Han, YuFei, Shen, Yun.  2016.  Accurate Spear Phishing Campaign Attribution and Early Detection. Proceedings of the 31st Annual ACM Symposium on Applied Computing. :2079–2086.

There is growing evidence that spear phishing campaigns are increasingly pervasive, sophisticated, and remain the starting points of more advanced attacks. Current campaign identification and attribution process heavily relies on manual efforts and is inefficient in gathering intelligence in a timely manner. It is ideal that we can automatically attribute spear phishing emails to known campaigns and achieve early detection of new campaigns using limited labelled emails as the seeds. In this paper, we introduce four categories of email profiling features that capture various characteristics of spear phishing emails. Building on these features, we implement and evaluate an affinity graph based semi-supervised learning model for campaign attribution and detection. We demonstrate that our system, using only 25 labelled emails, achieves 0.9 F1 score with a 0.01 false positive rate in known campaign attribution, and is able to detect previously unknown spear phishing campaigns, achieving 100% 'darkmoon', over 97% of 'samkams' and 91% of 'bisrala' campaign detection using 246 labelled emails in our experiments.