Biblio

Filters: Keyword is pubcrawl and Keyword is Natural Language Processing and Year is 2020
2021-04-08
Bouzar-Benlabiod, L., Rubin, S. H., Belaidi, K., Haddar, N. E..  2020.  RNN-VED for Reducing False Positive Alerts in Host-based Anomaly Detection Systems. 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI). :17–24.
Host-based Intrusion Detection Systems (HIDS) are often based on anomaly detection. Several studies deal with anomaly detection by analyzing the system-call traces and get good detection rates but also a high rate of false positives. In this paper, we propose a new anomaly detection approach applied on the system-call traces. The normal behavior learning is done using a sequence-to-sequence model based on a Variational Encoder-Decoder (VED) architecture that integrates Recurrent Neural Network (RNN) cells. We exploit the semantics behind the invoking order of system calls, which are then seen as sentences. A preprocessing phase is added to structure and optimize the model input-data representation. After the learning step, a one-class classification is run to categorize the sequences as normal or abnormal. The architecture may be used for predicting abnormal behaviors. The tests are carried out on the ADFA-LD dataset.
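As an illustration of treating system-call traces as sentences and then running a one-class decision, here is a minimal sketch that swaps in a much simpler technique than the paper's RNN-VED model: a sliding-window n-gram lookup. The function names, the window size, the threshold, and the toy traces are all assumptions made for this example; none of them come from the paper or from ADFA-LD.

```python
from typing import Iterable, List, Set, Tuple

Gram = Tuple[int, ...]


def windows(trace: List[int], n: int) -> List[Gram]:
    """Split a system-call trace into overlapping n-grams ("sentences")."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def learn_normal(traces: Iterable[List[int]], n: int) -> Set[Gram]:
    """Collect every n-gram observed in traces known to be normal."""
    normal: Set[Gram] = set()
    for trace in traces:
        normal.update(windows(trace, n))
    return normal


def anomaly_score(trace: List[int], normal: Set[Gram], n: int) -> float:
    """Fraction of the trace's n-grams that never appeared in training."""
    grams = windows(trace, n)
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if g not in normal)
    return unseen / len(grams)


def is_abnormal(trace: List[int], normal: Set[Gram], n: int = 3,
                threshold: float = 0.5) -> bool:
    """One-class decision: flag traces whose unseen-n-gram ratio is high."""
    return anomaly_score(trace, normal, n) > threshold


# Hypothetical toy data: integers stand in for system-call identifiers.
NORMAL_TRACES = [[3, 4, 5, 3, 4, 5, 6], [3, 4, 5, 6, 3, 4, 5]]
NORMAL_GRAMS = learn_normal(NORMAL_TRACES, n=3)

if __name__ == "__main__":
    print(is_abnormal([3, 4, 5, 6, 3], NORMAL_GRAMS))      # familiar ordering
    print(is_abnormal([3, 4, 11, 11, 5], NORMAL_GRAMS))    # unfamiliar ordering
```

A lookup like this only knows exact n-grams it has seen, which is one plausible source of false positives on unseen but benign orderings; a learned sequence model is one way to generalize beyond exact matches.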
2021-03-29
Chauhan, R., Heydari, S. Shah.  2020.  Polymorphic Adversarial DDoS attack on IDS using GAN. 2020 International Symposium on Networks, Computers and Communications (ISNCC). :1–6.
Intrusion Detection Systems (IDS) are important tools in preventing malicious traffic from penetrating into networks and systems. Recently, Intrusion Detection Systems have been rapidly enhancing their detection capabilities using machine learning algorithms. However, these algorithms are vulnerable to new unknown types of attacks that can evade machine learning IDS. In particular, they may be vulnerable to attacks based on Generative Adversarial Networks (GAN). GANs have been widely used in domains such as image processing and natural language processing to generate adversarial data of different types such as graphics, videos, texts, etc. We propose a model using GAN to generate adversarial DDoS attacks that can change the attack profile and can go undetected. Our simulation results indicate that by continuously changing the attack profile, defensive systems that use incremental learning will still be vulnerable to new attacks.
2021-03-18
Bi, X., Liu, X..  2020.  Chinese Character Captcha Sequential Selection System Based on Convolutional Neural Network. 2020 International Conference on Computer Vision, Image and Deep Learning (CVIDL). :554–559.

To ensure security, Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is widely used in people's online lives. This paper presents a Chinese character captcha sequential selection system based on convolutional neural network (CNN). Captchas composed of English and digits can already be identified with extremely high accuracy, but Chinese character captcha recognition is still challenging. The task we need to complete is to identify Chinese characters with different colors and different fonts that are not on a straight line with rotation and affine transformation on pictures with complex backgrounds, and then perform word order restoration on the identified Chinese characters. We divide the task into several sub-processes: Chinese character detection based on Faster R-CNN, Chinese character recognition and word order recovery based on N-Gram. In the Chinese character recognition sub-process, we have made outstanding contributions. We constructed a single Chinese character data set and built a 10-layer convolutional neural network. Eventually we achieved an accuracy of 98.43%, and completed the task perfectly.
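The word-order recovery step can be pictured with a small sketch: given the characters already recognized, score every ordering with bigram counts and keep the best one. This is an assumption about how an N-gram model might be applied, not the authors' implementation; the function names and the bigram counts below are invented for illustration.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple

Bigram = Tuple[str, str]


def sequence_score(seq: Sequence[str], bigram_counts: Dict[Bigram, int]) -> int:
    """Sum the counts of all adjacent character pairs in the sequence."""
    return sum(bigram_counts.get((a, b), 0) for a, b in zip(seq, seq[1:]))


def restore_order(chars: Sequence[str], bigram_counts: Dict[Bigram, int]) -> str:
    """Return the permutation of chars with the highest bigram score.

    Brute force is acceptable here because a captcha holds only a few
    characters; the number of permutations grows factorially.
    """
    best = max(permutations(chars),
               key=lambda p: sequence_score(p, bigram_counts))
    return "".join(best)


# Hypothetical bigram counts, as if gathered from a text corpus.
TOY_BIGRAMS: Dict[Bigram, int] = {
    ("天", "气"): 50,
    ("气", "预"): 5,
    ("预", "报"): 40,
    ("气", "报"): 1,
    ("报", "天"): 2,
}

if __name__ == "__main__":
    # Characters as they might come out of the recognition stage, unordered.
    print(restore_order(["报", "气", "天", "预"], TOY_BIGRAMS))
```

With these toy counts the ordering 天气预报 scores 95, ahead of the runner-up 预报天气 at 92, so the close margin also shows why a real system would need counts from a large corpus.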

2021-02-22
Hirlekar, V. V., Kumar, A..  2020.  Natural Language Processing based Online Fake News Detection Challenges – A Detailed Review. 2020 5th International Conference on Communication and Electronics Systems (ICCES). :748–754.
Online social media plays an important role during real-world events such as natural calamities, elections, social movements, etc. As social media usage has increased, fake news has grown. Social media is often used to spread misinformation by modifying true news or creating fake news. The creation and distribution of fake news poses major threats in several respects from a national security point of view. Hence, fake news identification becomes an essential goal for enhancing the trustworthiness of the information shared on online social networks. Over time, many researchers have used different methods, algorithms, tools and techniques to identify fake news content from online social networks. The aim of this paper is to review and examine these methodologies, different tools, and browser extensions, and to analyze the degree of output in question. In addition, this paper discusses the general approach of fake news detection as well as the taxonomy of feature extraction, which plays an important role in achieving maximum accuracy with the help of different Machine Learning and Natural Language Processing algorithms.
Koda, S., Kambara, Y., Oikawa, T., Furukawa, K., Unno, Y., Murakami, M..  2020.  Anomalous IP Address Detection on Traffic Logs Using Novel Word Embedding. 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). :1504–1509.
This paper presents an anomalous IP address detection algorithm for network traffic logs. It is based on word embedding techniques derived from natural language processing to extract the representative features of IP addresses. However, the features extracted from vanilla word embeddings are not always compatible with machine learning-based anomaly detection algorithms. Therefore, we developed an algorithm that enables the extraction of more compatible features of IP addresses for anomaly detection than conventional methods. The proposed algorithm optimizes the objective functions of word embedding-based feature extraction and anomaly detection, simultaneously. According to the experimental results, the proposed algorithm outperformed conventional approaches; it improved the detection performance from 0.876 to 0.990 in the area under the curve criterion in a task of detecting the IP addresses of attackers from network traffic logs.
Bhagat, V., J, B. R..  2020.  Natural Language Processing on Diverse Data Layers Through Microservice Architecture. 2020 IEEE International Conference for Innovation in Technology (INOCON). :1–6.
With the rapid growth in Natural Language Processing (NLP), all types of industries find a need for analyzing a massive amount of data. Sentiment analysis is becoming a more exciting area for the businessmen and researchers in Text mining & NLP. This process includes the calculation of various sentiments with the help of text mining. Supplementary to this, the world is connected through Information Technology and, businesses are moving toward the next step of the development to make their system more intelligent. Microservices have fulfilled the need for development platforms which help the developers to use various development tools (Languages and applications) efficiently. With the consideration of data analysis for business growth, data security becomes a major concern in front of developers. This paper gives a solution to keep the data secured by providing required access to data scientists without disturbing the base system software. This paper has discussed data storage and exchange policies of microservices through common JavaScript Object Notation (JSON) response which performs the sentiment analysis of customer's data fetched from various microservices through secured APIs.
Si, Y., Zhou, W., Gai, J..  2020.  Research and Implementation of Data Extraction Method Based on NLP. 2020 IEEE 14th International Conference on Anti-counterfeiting, Security, and Identification (ASID). :11–15.
In order to accurately extract the data from unstructured Chinese text, this paper proposes a rule-based method based on natural language processing and regular expression. This method makes use of the language expression rules of the data in the text and other related knowledge to form the feature word lists and rule template to match the text. Experimental results show that the accuracy of the designed algorithm is 94.09%.
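A rule-based extractor of this kind can be sketched as a feature word list plugged into a regular-expression template. The example below is only an illustration under stated assumptions: it uses English text for readability (the paper targets Chinese), and the word list, template, and function names are hypothetical rather than taken from the paper.

```python
import re
from typing import Dict, List

# Hypothetical feature word list: the data fields we want to extract.
FEATURE_WORDS = ["revenue", "profit", "headcount"]

# Hypothetical rule template: <feature word> <verb> <number> [<unit>].
RULE_TEMPLATE = (
    r"(?P<field>{words})\s+(?:was|is|reached)\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>%|million|people)?"
)


def build_rule(feature_words: List[str]) -> "re.Pattern[str]":
    """Fill the template with an alternation of the escaped feature words."""
    alternation = "|".join(re.escape(w) for w in feature_words)
    return re.compile(RULE_TEMPLATE.format(words=alternation), re.IGNORECASE)


def extract(text: str, rule: "re.Pattern[str]") -> List[Dict[str, str]]:
    """Return one record per match, with an empty unit when none is given."""
    return [
        {
            "field": m.group("field").lower(),
            "value": m.group("value"),
            "unit": m.group("unit") or "",
        }
        for m in rule.finditer(text)
    ]


RULE = build_rule(FEATURE_WORDS)

if __name__ == "__main__":
    sample = "Revenue reached 12.5 million in 2019, while profit was 8%."
    for record in extract(sample, RULE):
        print(record)
```

Keeping the word list separate from the template means new fields can be added without touching the pattern, which is one reason rule templates are convenient for this kind of task.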
Alzahrani, A., Feki, J..  2020.  Toward a Natural Language-Based Approach for the Specification of Decisional-Users Requirements. 2020 3rd International Conference on Computer Applications Information Security (ICCAIS). :1–6.
The number of organizations adopting Data Warehouse (DW) technology along with data analytics in order to improve the effectiveness of their decision-making processes is steadily increasing. Despite the efforts invested, DW design remains a challenging research domain. More precisely, the design quality of the DW depends on several aspects; among them, the requirement-gathering phase is a critical and complex task. In this context, we propose a Natural Language (NL) template-based design approach, which is twofold. Firstly, it facilitates the involvement of decision-makers in the early step of the DW design; indeed, using NL is a good and natural means to encourage the decision-makers to express their requirements as query-like English sentences. Secondly, our approach aims to generate a DW multidimensional schema from a set of gathered requirements (as OLAP: On-Line-Analytical-Processing queries, written according to the suggested NL templates). This approach articulates around: (i) two NL-templates for specifying multidimensional components, and (ii) a set of five heuristic rules for extracting the multidimensional concepts from requirements. We are developing a software prototype that accepts the decision-makers' requirements and then automatically identifies the multidimensional components of the DW model.
Martinelli, F., Marulli, F., Mercaldo, F., Marrone, S., Santone, A..  2020.  Enhanced Privacy and Data Protection using Natural Language Processing and Artificial Intelligence. 2020 International Joint Conference on Neural Networks (IJCNN). :1–8.

Artificial Intelligence systems have enabled significant benefits for users and society, but as the data feeding them keeps increasing, the door is opened to privacy and security leaks. The severe vulnerabilities to the right to privacy obliged governments to enact specific regulations to ensure privacy preservation in any kind of transaction involving sensitive information. In the case of digital and/or physical documents comprising sensitive information, the right to privacy can be preserved by data obfuscation procedures. The capability of recognizing sensitive information for obfuscation is typically entrusted to the experience of human experts, who are overwhelmed by the ever-increasing amount of documents to process. Artificial intelligence could proficiently mitigate the effort of the human officers and speed up processes. However, until enough knowledge is available in a machine-readable format, automatic and effectively working systems cannot be developed. In this work we propose a methodology for transferring and leveraging general knowledge across specific-domain tasks. We built, from scratch, specific-domain knowledge data sets for training artificial intelligence models supporting human experts in privacy-preserving tasks. We exploited a mixture of natural language processing techniques applied to unlabeled domain-specific document corpora to automatically obtain labeled documents, in which sensitive information is recognized and tagged. We performed preliminary tests on just over 10,000 documents from the healthcare and justice domains. Human experts supported us during the validation. The results we obtained, estimated in terms of precision, recall and F1-score metrics across these two domains, were promising and encouraged us to pursue further investigations.

Rivera, S., Fei, Z., Griffioen, J..  2020.  POLANCO: Enforcing Natural Language Network Policies. 2020 29th International Conference on Computer Communications and Networks (ICCCN). :1–9.
Network policies govern the use of an institution's networks, and are usually written in a high-level human-readable natural language. Normally these policies are enforced by low-level, technically detailed network configurations. The translation from network policies into network configurations is a tedious, manual and error-prone process. To address this issue, we propose a new intermediate language called POlicy LANguage for Campus Operations (POLANCO), which is a human-readable network policy definition language intended to approximate natural language. Because POLANCO is a high-level language, the translation from natural language policies to POLANCO is straightforward. Despite being a high-level human readable language, POLANCO can be used to express network policies in a technically precise way so that policies written in POLANCO can be automatically translated into a set of software defined networking (SDN) rules and actions that enforce the policies. Moreover, POLANCO is capable of incorporating information about the current network state, reacting to changes in the network and adjusting SDN rules to ensure network policies continue to be enforced correctly. We present policy examples found on various public university websites and show how they can be written as simplified human-readable statements using POLANCO and how they can be automatically translated into SDN rules that correctly enforce these policies.
Eftimie, S., Moinescu, R., Rǎcuciu, C..  2020.  Insider Threat Detection Using Natural Language Processing and Personality Profiles. 2020 13th International Conference on Communications (COMM). :325–330.
This work represents an interdisciplinary effort to proactively identify insider threats, using natural language processing and personality profiles. Profiles were developed for the relevant insider threat types using the five-factor model of personality and were used in a proof-of-concept detection system. The system employs a third-party cloud service that uses natural language processing to analyze personality profiles based on personal content. Finally, the feasibility of the system was assessed using a public dataset.
Lansley, M., Kapetanakis, S., Polatidis, N..  2020.  SEADer++ v2: Detecting Social Engineering Attacks using Natural Language Processing and Machine Learning. 2020 International Conference on INnovations in Intelligent SysTems and Applications (INISTA). :1–6.
Social engineering attacks are well-known attacks in cyberspace and are relatively easy to try and implement because no technical knowledge is required. In various online environments, such as business domains where customers talk with employees through a chat service, or in social networks, potential hackers can try to manipulate other people by employing social attacks against them to gain information that will benefit them in future attacks. Thus, we have used a number of natural language processing steps and a machine learning algorithm to identify potential attacks. The proposed method has been tested on a semi-synthetic dataset and is shown to be both practical and effective.
2021-02-08
Zhang, J..  2020.  DeepMal: A CNN-LSTM Model for Malware Detection Based on Dynamic Semantic Behaviours. 2020 International Conference on Computer Information and Big Data Applications (CIBDA). :313–316.
Malware refers to any software accessing or being installed in a system without the authorisation of administrators. Various malware has been widely used by cyber-criminals to accomplish their evil intentions and goals. To combat the increasing amount and reduce the threat of malicious programs, a novel deep learning framework is proposed that draws on NLP techniques and combines CNN and LSTM neurones to capture local spatial correlations and learn from sequential long-term dependencies. Hence, high-level abstractions and representations are automatically extracted for the malware classification task. The classification accuracy improves from 0.81 (best one by Random Forest) to approximately 1.0.
2021-02-01
Wu, L., Chen, X., Meng, L., Meng, X..  2020.  Multitask Adversarial Learning for Chinese Font Style Transfer. 2020 International Joint Conference on Neural Networks (IJCNN). :1–8.
Style transfer between Chinese fonts is challenging due to both the complexity of Chinese characters and the significant difference between fonts. Existing algorithms for this task typically learn a mapping between the reference and target fonts for each character. Subsequently, this mapping is used to generate the characters that do not exist in the target font. However, the characters available for training are unlikely to cover all fine-grained parts of the missing characters, leading to the overfitting problem. As a result, the generated characters of the target font may suffer problems of incomplete or even radicals and dirty dots. To address this problem, this paper presents a multi-task adversarial learning approach, termed MTfontGAN, to generate more vivid Chinese characters. MTfontGAN learns to transfer a reference font to multiple target ones simultaneously. An alignment is imposed on the encoders of different tasks to make them focus on the important parts of the characters in general style transfer. Such cross-task interactions at the feature level effectively improve the generalization capability of MTfontGAN. The performance of MTfontGAN is evaluated on three Chinese font datasets. Experimental results show that MTfontGAN outperforms the state-of-the-art algorithms in a single-task setting. More importantly, increasing the number of tasks leads to better performance in all of them.
2021-01-28
He, H. Y., Yang, Z. Guo, Chen, X. N..  2020.  PERT: Payload Encoding Representation from Transformer for Encrypted Traffic Classification. 2020 ITU Kaleidoscope: Industry-Driven Digital Transformation (ITU K). :1–8.

Traffic identification becomes more important yet more challenging as related encryption techniques are rapidly developing. In contrast to recent deep learning methods that apply image processing to solve such encrypted traffic problems, in this paper we propose a method named Payload Encoding Representation from Transformer (PERT) to perform automatic traffic feature extraction using a state-of-the-art dynamic word embedding technique. Based on this, we further provide a traffic classification framework in which unlabeled traffic is utilized to pre-train an encoding network that learns the contextual distribution of traffic payload bytes. Then, the downward classification reuses the pre-trained network to obtain an enhanced classification result. By running experiments on a public encrypted traffic data set and our captured Android HTTPS traffic, we show that the proposed method achieves clearly better effectiveness than the compared baselines. To the best of our knowledge, this is the first time that encrypted traffic classification with dynamic word embedding along with its pre-training strategy has been addressed.