Visible to the public Biblio

Found 155 results

Filters: Keyword is natural language processing  [Clear All Filters]
2023-02-03
Oldal, Laura Gulyás, Kertész, Gábor.  2022.  Evaluation of Deep Learning-based Authorship Attribution Methods on Hungarian Texts. 2022 IEEE 10th Jubilee International Conference on Computational Cybernetics and Cyber-Medical Systems (ICCC). :000161–000166.
The range of text analysis methods in the field of natural language processing (NLP) has become more and more extensive thanks to the increasing computational resources of the 21st century. As a result, many deep learning-based solutions have been proposed for the purpose of authorship attribution, as they offer more flexibility and automated feature extraction compared to traditional statistical methods. A number of solutions have appeared for the attribution of English texts, however, the number of methods designed for Hungarian language is extremely small. Hungarian is a morphologically rich language, sentence formation is flexible and the alphabet is different from other languages. Furthermore, a language specific POS tagger, pretrained word embeddings, dependency parser, etc. are required. As a result, methods designed for other languages cannot be directly applied on Hungarian texts. In this paper, we review deep learning-based authorship attribution methods for English texts and offer techniques for the adaptation of these solutions to Hungarian language. As a part of the paper, we collected a new dataset consisting of Hungarian literary works of 15 authors. In addition, we extensively evaluate the implemented methods on the new dataset.
Ouamour, S., Sayoud, H..  2022.  Computational Identification of Author Style on Electronic Libraries - Case of Lexical Features. 2022 5th International Symposium on Informatics and its Applications (ISIA). :1–4.
In the present work, we intend to present a thorough study developed on a digital library, called HAT corpus, for a purpose of authorship attribution. Thus, a dataset of 300 documents that are written by 100 different authors, was extracted from the web digital library and processed for a task of author style analysis. All the documents are related to the travel topic and written in Arabic. Basically, three important rules in stylometry should be respected: the minimum document size, the same topic for all documents and the same genre too. In this work, we made a particular effort to respect those conditions seriously during the corpus preparation. That is, three lexical features: Fixed-length words, Rare words and Suffixes are used and evaluated by using a centroid based Manhattan distance. The used identification approach shows interesting results with an accuracy of about 0.94.
Praveen, Sivakami, Dcouth, Alysha, Mahesh, A S.  2022.  NoSQL Injection Detection Using Supervised Text Classification. 2022 2nd International Conference on Intelligent Technologies (CONIT). :1–5.
For a long time, SQL injection has been considered one of the most serious security threats. NoSQL databases are becoming increasingly popular as big data and cloud computing technologies progress. NoSQL injection attacks are designed to take advantage of applications that employ NoSQL databases. NoSQL injections can be particularly harmful because they allow unrestricted code execution. In this paper we use supervised learning and natural language processing to construct a model to detect NoSQL injections. Our model is designed to work with MongoDB, CouchDB, CassandraDB, and Couchbase queries. Our model has achieved an F1 score of 0.95 as established by 10-fold cross validation.
2022-12-01
Yu, Jialin, Cristea, Alexandra I., Harit, Anoushka, Sun, Zhongtian, Aduragba, Olanrewaju Tahir, Shi, Lei, Moubayed, Noura Al.  2022.  INTERACTION: A Generative XAI Framework for Natural Language Inference Explanations. 2022 International Joint Conference on Neural Networks (IJCNN). :1—8.
XAI with natural language processing aims to produce human-readable explanations as evidence for AI decision-making, which addresses explainability and transparency. However, from an HCI perspective, the current approaches only focus on delivering a single explanation, which fails to account for the diversity of human thoughts and experiences in language. This paper thus addresses this gap, by proposing a generative XAI framework, INTERACTION (explain aNd predicT thEn queRy with contextuAl CondiTional varIational autO-eNcoder). Our novel framework presents explanation in two steps: (step one) Explanation and Label Prediction; and (step two) Diverse Evidence Generation. We conduct intensive experiments with the Transformer architecture on a benchmark dataset, e-SNLI [1]. Our method achieves competitive or better performance against state-of-the-art baseline models on explanation generation (up to 4.7% gain in BLEU) and prediction (up to 4.4% gain in accuracy) in step one; it can also generate multiple diverse explanations in step two.
2022-09-09
Khan, Aazar Imran, Jain, Samyak, Sharma, Purushottam, Deep, Vikas, Mehrotra, Deepti.  2021.  Stylometric Analysis of Writing Patterns Using Artificial Neural Networks. 2021 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). :29—35.
Plagiarism checkers have been widely used to verify the authenticity of dissertation/project submissions. However, when non-verbatim plagiarism or online examinations are considered, this practice is not the best solution. In this work, we propose a better authentication system for online examinations that analyses the submitted text's stylometry for a match of writing pattern of the author by whom the text was submitted. The writing pattern is analyzed over many indicators (i.e., features of one's writing style). This model extracts 27 such features and stores them as the writing pattern of an individual. Stylometric Analysis is a better approach to verify a document's authorship as it doesn't check for plagiarism, but verifies if the document was written by a particular individual and hence completely shuts down the possibility of using text-convertors or translators. This paper also includes a brief comparative analysis of some simpler algorithms for the same problem statement. These algorithms yield results that vary in precision and accuracy and hence plotting a conclusion from the comparison shows that the best bet to tackle this problem is through Artificial Neural Networks.
Frankel, Sophia F., Ghosh, Krishnendu.  2021.  Machine Learning Approaches for Authorship Attribution using Source Code Stylometry. 2021 IEEE International Conference on Big Data (Big Data). :3298—3304.
Identification of source code authorship is vital for attribution. In this work, a machine learning framework is described to identify source code authorship. The framework integrates the features extracted using natural language processing based approaches and abstract syntax tree of the code. We evaluate the methodology on Google Code Jam dataset. We present the performance measures of the logistic regression and deep learning on the dataset.
2022-08-26
Christopherjames, Jim Elliot, Saravanan, Mahima, Thiyam, Deepa Beeta, S, Prasath Alias Surendhar, Sahib, Mohammed Yashik Basheer, Ganapathi, Manju Varrshaa, Milton, Anisha.  2021.  Natural Language Processing based Human Assistive Health Conversational Agent for Multi-Users. 2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC). :1414–1420.
Background: Most of the people are not medically qualified for studying or understanding the extremity of their diseases or symptoms. This is the place where natural language processing plays a vital role in healthcare. These chatbots collect patients' health data and depending on the data, these chatbot give more relevant data to patients regarding their body conditions and recommending further steps also. Purposes: In the medical field, AI powered healthcare chatbots are beneficial for assisting patients and guiding them in getting the most relevant assistance. Chatbots are more useful for online search that users or patients go through when patients want to know for their health symptoms. Methods: In this study, the health assistant system was developed using Dialogflow application programming interface (API) which is a Google's Natural language processing powered algorithm and the same is deployed on google assistant, telegram, slack, Facebook messenger, and website and mobile app. With this web application, a user can make health requests/queries via text message and might also get relevant health suggestions/recommendations through it. Results: This chatbot acts like an informative and conversational chatbot. This chatbot provides medical knowledge such as disease symptoms and treatments. Storing patients personal and medical information in a database for further analysis of the patients and patients get real time suggestions from doctors. Conclusion: In the healthcare sector AI-powered applications have seen a remarkable spike in recent days. This covid crisis changed the whole healthcare system upside down. So this NLP powered chatbot system reduced office waiting, saving money, time and energy. Patients might be getting medical knowledge and assisting ourselves within their own time and place.
2022-08-12
Maruyama, Yoshihiro.  2021.  Learning, Development, and Emergence of Compositionality in Natural Language Processing. 2021 IEEE International Conference on Development and Learning (ICDL). :1–7.
There are two paradigms in language processing, as characterised by symbolic compositional and statistical distributional modelling, which may be regarded as based upon the principles of compositionality (or symbolic recursion) and of contextuality (or the distributional hypothesis), respectively. Starting with philosophy of language as in Frege and Wittgenstein, we elucidate the nature of language and language processing from interdisciplinary perspectives across different fields of science. At the same time, we shed new light on conceptual issues in language processing on the basis of recent advances in Transformer-based models such as BERT and GPT-3. We link linguistic cognition with mathematical cognition through these discussions, explicating symbol grounding/emergence problems shared by both of them. We also discuss whether animal cognition can develop recursive compositional information processing.
2022-07-15
Wang, Shilei, Wang, Hui, Yu, Hongtao, Zhang, Fuzhi.  2021.  Detecting shilling groups in recommender systems based on hierarchical topic model. 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA). :832—837.
In a group shilling attack, attackers work collaboratively to inject fake profiles aiming to obtain desired recommendation result. This type of attacks is more harmful to recommender systems than individual shilling attacks. Previous studies pay much attention to detect individual attackers, and little work has been done on the detection of shilling groups. In this work, we introduce a topic modeling method of natural language processing into shilling attack detection and propose a shilling group detection method on the basis of hierarchical topic model. First, we model the given dataset to a series of user rating documents and use the hierarchical topic model to learn the specific topic distributions of each user from these rating documents to describe user rating behaviors. Second, we divide candidate groups based on rating value and rating time which are not involved in the hierarchical topic model. Lastly, we calculate group suspicious degrees in accordance with several indicators calculated through the analysis of user rating distributions, and use the k-means clustering algorithm to distinguish shilling groups. The experimental results on the Netflix and Amazon datasets show that the proposed approach performs better than baseline methods.
2022-06-06
Hung, Benjamin W.K., Muramudalige, Shashika R., Jayasumana, Anura P., Klausen, Jytte, Libretti, Rosanne, Moloney, Evan, Renugopalakrishnan, Priyanka.  2019.  Recognizing Radicalization Indicators in Text Documents Using Human-in-the-Loop Information Extraction and NLP Techniques. 2019 IEEE International Symposium on Technologies for Homeland Security (HST). :1–7.
Among the operational shortfalls that hinder law enforcement from achieving greater success in preventing terrorist attacks is the difficulty in dynamically assessing individualized violent extremism risk at scale given the enormous amount of primarily text-based records in disparate databases. In this work, we undertake the critical task of employing natural language processing (NLP) techniques and supervised machine learning models to classify textual data in analyst and investigator notes and reports for radicalization behavioral indicators. This effort to generate structured knowledge will build towards an operational capability to assist analysts in rapidly mining law enforcement and intelligence databases for cues and risk indicators. In the near-term, this effort also enables more rapid coding of biographical radicalization profiles to augment a research database of violent extremists and their exhibited behavioral indicators.
2022-05-19
Anusha, M, Leelavathi, R.  2021.  Analysis on Sentiment Analytics Using Deep Learning Techniques. 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :542–547.
Sentiment analytics is the process of applying natural language processing and methods for text-based information to define and extract subjective knowledge of the text. Natural language processing and text classifications can deal with limited corpus data and more attention has been gained by semantic texts and word embedding methods. Deep learning is a powerful method that learns different layers of representations or qualities of information and produces state-of-the-art prediction results. In different applications of sentiment analytics, deep learning methods are used at the sentence, document, and aspect levels. This review paper is based on the main difficulties in the sentiment assessment stage that significantly affect sentiment score, pooling, and polarity detection. The most popular deep learning methods are a Convolution Neural Network and Recurrent Neural Network. Finally, a comparative study is made with a vast literature survey using deep learning models.
Zhang, Cheng, Yamana, Hayato.  2021.  Improving Text Classification Using Knowledge in Labels. 2021 IEEE 6th International Conference on Big Data Analytics (ICBDA). :193–197.
Various algorithms and models have been proposed to address text classification tasks; however, they rarely consider incorporating the additional knowledge hidden in class labels. We argue that hidden information in class labels leads to better classification accuracy. In this study, instead of encoding the labels into numerical values, we incorporated the knowledge in the labels into the original model without changing the model architecture. We combined the output of an original classification model with the relatedness calculated based on the embeddings of a sequence and a keyword set. A keyword set is a word set to represent knowledge in the labels. Usually, it is generated from the classes while it could also be customized by the users. The experimental results show that our proposed method achieved statistically significant improvements in text classification tasks. The source code and experimental details of this study can be found on Github11https://github.com/HeroadZ/KiL.
Kuilboer, Jean-Pierre, Stull, Tristan.  2021.  Text Analytics and Big Data in the Financial domain. 2021 16th Iberian Conference on Information Systems and Technologies (CISTI). :1–4.
This research attempts to provide some insights on the application of text mining and Natural Language Processing (NLP). The application domain is consumer complaints about financial institutions in the USA. As an advanced analytics discipline embedded within the Big Data paradigm, the practice of text analytics contains elements of emergent knowledge processes. Since our experiment should be able to scale up we make use of a pipeline based on Spark-NLP. The usage scenario is adapting the model to a specific industrial context and using the dataset offered by the "Consumer Financial Protection Bureau" to illustrate the application.
2022-04-25
Khalil, Hady A., Maged, Shady A..  2021.  Deepfakes Creation and Detection Using Deep Learning. 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC). :1–4.
Deep learning has been used in a wide range of applications like computer vision, natural language processing and image detection. The advancement in deep learning algorithms in image detection and manipulation has led to the creation of deepfakes, deepfakes use deep learning algorithms to create fake images that are at times very hard to distinguish from real images. With the rising concern around personal privacy and security, Many methods to detect deepfake images have emerged, in this paper the use of deep learning for creating as well as detecting deepfakes is explored, this paper also propose the use of deep learning image enhancement method to improve the quality of deepfakes created.
Ren, Jing, Xia, Feng, Liu, Yemeng, Lee, Ivan.  2021.  Deep Video Anomaly Detection: Opportunities and Challenges. 2021 International Conference on Data Mining Workshops (ICDMW). :959–966.
Anomaly detection is a popular and vital task in various research contexts, which has been studied for several decades. To ensure the safety of people’s lives and assets, video surveillance has been widely deployed in various public spaces, such as crossroads, elevators, hospitals, banks, and even in private homes. Deep learning has shown its capacity in a number of domains, ranging from acoustics, images, to natural language processing. However, it is non-trivial to devise intelligent video anomaly detection systems cause anomalies significantly differ from each other in different application scenarios. There are numerous advantages if such intelligent systems could be realised in our daily lives, such as saving human resources in a large degree, reducing financial burden on the government, and identifying the anomalous behaviours timely and accurately. Recently, many studies on extending deep learning models for solving anomaly detection problems have emerged, resulting in beneficial advances in deep video anomaly detection techniques. In this paper, we present a comprehensive review of deep learning-based methods to detect the video anomalies from a new perspective. Specifically, we summarise the opportunities and challenges of deep learning models on video anomaly detection tasks, respectively. We put forth several potential future research directions of intelligent video anomaly detection system in various application domains. Moreover, we summarise the characteristics and technical problems in current deep learning methods for video anomaly detection.
2022-04-19
Srinivasan, Sudarshan, Begoli, Edmon, Mahbub, Maria, Knight, Kathryn.  2021.  Nomen Est Omen - The Role of Signatures in Ascribing Email Author Identity with Transformer Neural Networks. 2021 IEEE Security and Privacy Workshops (SPW). :291–297.
Authorship attribution, an NLP problem where anonymous text is matched to its author, has important, cross-disciplinary applications, particularly those concerning cyber-defense. Our research examines the degree of sensitivity that attention-based models have to adversarial perturbations. We ask, what is the minimal amount of change necessary to maximally confuse a transformer model? In our investigation we examine a balanced subset of emails from the Enron email dataset, calculating the performance of our model before and after email signatures have been perturbed. Results show that the model's performance changed significantly in the absence of a signature, indicating the importance of email signatures in email authorship detection. Furthermore, we show that these models rely on signatures for shorter emails much more than for longer emails. We also indicate that additional research is necessary to investigate stylometric features and adversarial training to further improve classification model robustness.
2022-03-10
Pölöskei, István.  2021.  Continuous natural language processing pipeline strategy. 2021 IEEE 15th International Symposium on Applied Computational Intelligence and Informatics (SACI). :000221—000224.
Natural language processing (NLP) is a division of artificial intelligence. The constructed model's quality is entirely reliant on the training dataset's quality. A data streaming pipeline is an adhesive application, completing a managed connection from data sources to machine learning methods. The recommended NLP pipeline composition has well-defined procedures. The implemented message broker design is a usual apparatus for delivering events. It makes it achievable to construct a robust training dataset for machine learning use-case and serve the model's input. The reconstructed dataset is a valid input for the machine learning processes. Based on the data pipeline's product, the model recreation and redeployment can be scheduled automatically.
Sanyal, Hrithik, Shukla, Sagar, Agrawal, Rajneesh.  2021.  Natural Language Processing Technique for Generation of SQL Queries Dynamically. 2021 6th International Conference for Convergence in Technology (I2CT). :1—6.
Natural Language Processing is being used in every field of human to machine interaction. Database queries although have a confined set of instructions, but still found to be complex and dedicated human resources are required to write, test, optimize and execute structured query language statements. This makes it difficult, time-consuming and many a time inaccurate too. Such difficulties can be overcome if the queries are formed dynamically with standard procedures. In this work, parsing, lexical analysis, synonym detection and formation processes of the natural language processing are being proposed to be used for dynamically generating SQL queries and optimization of them for fast processing with high accuracy. NLP parsing of the user inputted text for retrieving, creation and insertion of data are being proposed to be created dynamically from English text inputs. This will help users of the system to generate reports from the data as per the requirement without the complexities of SQL. The proposed system will not only generate queries dynamically but will also provide high accuracy and performance.
Ozan, Şükrü, Taşar, D. Emre.  2021.  Auto-tagging of Short Conversational Sentences using Natural Language Processing Methods. 2021 29th Signal Processing and Communications Applications Conference (SIU). :1—4.
In this study, we aim to find a method to autotag sentences specific to a domain. Our training data comprises short conversational sentences extracted from chat conversations between company's customer representatives and web site visitors. We manually tagged approximately 14 thousand visitor inputs into ten basic categories, which will later be used in a transformer-based language model with attention mechanisms for the ultimate goal of developing a chatbot application that can produce meaningful dialogue.We considered three different stateof- the-art models and reported their auto-tagging capabilities. We achieved the best performance with the bidirectional encoder representation from transformers (BERT) model. Implementation of the models used in these experiments can be cloned from our GitHub repository and tested for similar auto-tagging problems without much effort.
Qin, Shuangling, Xu, Chaozhi, Zhang, Fang, Jiang, Tao, Ge, Wei, Li, Jihong.  2021.  Research on Application of Chinese Natural Language Processing in Constructing Knowledge Graph of Chronic Diseases. 2021 International Conference on Communications, Information System and Computer Engineering (CISCE). :271—274.
Knowledge Graph can describe the concepts in the objective world and the relationships between these concepts in a structured way, and identify, discover and infer the relationships between things and concepts. It has been developed in the field of medical and health care. In this paper, the method of natural language processing has been used to build chronic disease knowledge graph, such as named entity recognition, relationship extraction. This method is beneficial to forecast analysis of chronic disease, network monitoring, basic education, etc. The research of this paper can greatly help medical experts in the treatment of chronic disease treatment, and assist primary clinicians with making more scientific decision, and can help Patients with chronic diseases to improve medical efficiency. In the end, it also has practical significance for clinical scientific research of chronic disease.
Gupta, Subhash Chand, Singh, Nidhi Raj, Sharma, Tulsi, Tyagi, Akshita, Majumdar, Rana.  2021.  Generating Image Captions using Deep Learning and Natural Language Processing. 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). :1—4.
In today's world, there is rapid progress in the field of artificial intelligence and image captioning. It becomes a fascinating task that has saw widespread interest. The task of image captioning comprises image description engendered based on the hybrid combination of deep learning, natural language processing, and various approaches of machine learning and computer vision. In this work authors emphasize on how the model generates a short description as an output of the input image using the functionalities of Deep Learning and Natural Language Processing, for helping visually impaired people, and can also be cast-off in various web sites to automate the generation of captions reducing the task of recitation with great ease.
Zhang, Zhongtang, Liu, Shengli, Yang, Qichao, Guo, Shichen.  2021.  Semantic Understanding of Source and Binary Code based on Natural Language Processing. 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). 4:2010—2016.
With the development of open source projects, a large number of open source codes will be reused in binary software, and bugs in source codes will also be introduced into binary codes. In order to detect the reused open source codes in binary codes, it is sometimes necessary to compare and analyze the similarity between source codes and binary codes. One of the main challenge is that the compilation process can generate different binary code representations for the same source code, such as different compiler versions, compilation optimization options and target architectures, which greatly increases the difficulty of semantic similarity detection between source code and binary code. In order to solve the influence of the compilation process on the comparison of semantic similarity of codes, this paper transforms the source code and binary code into LLVM intermediate representation (LLVM IR), which is a universal intermediate representation independent of source code and binary code. We carry out semantic feature extraction and embedding training on LLVM IR based on natural language processing model. Experimental results show that LLVM IR eliminates the influence of compilation on the syntax differences between source code and binary code, and the semantic features of code are well represented and preserved.
Ge, Xin.  2021.  Internet of things device recognition method based on natural language processing and text similarity. 2021 4th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE). :137—140.
Effective identification of Internet of things devices in cyberspace is of great significance to the protection of Cyberspace Security. However, there are a large number of such devices in cyberspace, which can not be identified by the existing methods of identifying IoT devices because of the lack of key information such as manufacturer name and device name in the response message. Their existence brings hidden danger to Cyberspace Security. In order to identify the IoT devices with missing key information in these response messages, this paper proposes an IoT device identification method, IoTCatcher. IoTCatcher uses HTTP response message and the structure and style characteristics of HTML document, and based on natural language processing technology and text similarity technology, classifies and compares the IoT devices whose response message lacks key information, so as to generate their device finger information. This paper proves that the recognition precision of IoTCatcher is 95.29%, and the recall rate is 91.01%. Compared with the existing methods, the overall performance is improved by 38.83%.
Yang, Mengde.  2021.  A Survey on Few-Shot Learning in Natural Language Processing. 2021 International Conference on Artificial Intelligence and Electromechanical Automation (AIEA). :294—297.
The annotated dataset is the foundation for Supervised Natural Language Processing. However, the cost of obtaining dataset is high. In recent years, the Few-Shot Learning has gradually attracted the attention of researchers. From the definition, in this paper, we conclude the difference in Few-Shot Learning between Natural Language Processing and Computer Vision. On that basis, the current Few-Shot Learning on Natural Language Processing is summarized, including Transfer Learning, Meta Learning and Knowledge Distillation. Furthermore, we conclude the solutions to Few-Shot Learning in Natural Language Processing, such as the method based on Distant Supervision, Meta Learning and Knowledge Distillation. Finally, we present the challenges facing Few-Shot Learning in Natural Language Processing.
Ahirrao, Mayur, Joshi, Yash, Gandhe, Atharva, Kotgire, Sumeet, Deshmukh, Rohini G..  2021.  Phrase Composing Tool using Natural Language Processing. 2021 International Conference on Intelligent Technologies (CONIT). :1—4.
In this fast-running world, machine communication plays a vital role. To compete with this world, human-machine interaction is a necessary thing. To enhance this, Natural Language Processing technique is used widely. Using this technique, we can reduce the interaction gap between the machine and human. Till now, many such applications are developed which are using this technique.This tool deals with the various methods which are used for development of grammar error correction. These methods include rule-based method, classifier-based method and machine translation-based method. Also, models regarding the Natural Language Processing (NLP) pipeline are trained and implemented in this project accordingly. Additionally, the tool can also perform speech to text operation.