Visible to the public Biblio

Filters: Keyword is information retrieval  [Clear All Filters]
2021-06-24
Moran, Kevin, Palacio, David N., Bernal-Cárdenas, Carlos, McCrystal, Daniel, Poshyvanyk, Denys, Shenefiel, Chris, Johnson, Jeff.  2020.  Improving the Effectiveness of Traceability Link Recovery using Hierarchical Bayesian Networks. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :873—885.
Traceability is a fundamental component of the modern software development process that helps to ensure properly functioning, secure programs. Due to the high cost of manually establishing trace links, researchers have developed automated approaches that draw relationships between pairs of textual software artifacts using similarity measures. However, the effectiveness of such techniques are often limited as they only utilize a single measure of artifact similarity and cannot simultaneously model (implicit and explicit) relationships across groups of diverse development artifacts. In this paper, we illustrate how these limitations can be overcome through the use of a tailored probabilistic model. To this end, we design and implement a HierarchiCal PrObabilistic Model for SoftwarE Traceability (Comet) that is able to infer candidate trace links. Comet is capable of modeling relationships between artifacts by combining the complementary observational prowess of multiple measures of textual similarity. Additionally, our model can holistically incorporate information from a diverse set of sources, including developer feedback and transitive (often implicit) relationships among groups of software artifacts, to improve inference accuracy. We conduct a comprehensive empirical evaluation of Comet that illustrates an improvement over a set of optimally configured baselines of ≈14% in the best case and ≈5% across all subjects in terms of average precision. The comparative effectiveness of Comet in practice, where optimal configuration is typically not possible, is likely to be higher. Finally, we illustrate Comet's potential for practical applicability in a survey with developers from Cisco Systems who used a prototype Comet Jenkins plugin.
2021-04-08
Guo, T., Zhou, R., Tian, C..  2020.  On the Information Leakage in Private Information Retrieval Systems. IEEE Transactions on Information Forensics and Security. 15:2999—3012.
We consider information leakage to the user in private information retrieval (PIR) systems. Information leakage can be measured in terms of individual message leakage or total leakage. Individual message leakage, or simply individual leakage, is defined as the amount of information that the user can obtain on any individual message that is not being requested, and the total leakage is defined as the amount of information that the user can obtain about all the other messages except the one being requested. In this work, we characterize the tradeoff between the minimum download cost and the individual leakage, and that for the total leakage, respectively. Coding schemes are proposed to achieve these optimal tradeoffs, which are also shown to be optimal in terms of the message size. We further characterize the optimal tradeoff between the minimum amount of common randomness and the total leakage. Moreover, we show that under individual leakage, common randomness is in fact unnecessary when there are more than two messages.
2021-03-22
Fan, X., Zhang, F., Turamat, E., Tong, C., Wu, J. H., Wang, K..  2020.  Provenance-based Classification Policy based on Encrypted Search. 2020 2nd International Conference on Industrial Artificial Intelligence (IAI). :1–6.
As an important type of cloud data, digital provenance is arousing increasing attention on improving system performance. Currently, provenance has been employed to provide cues regarding access control and to estimate data quality. However, provenance itself might also be sensitive information. Therefore, provenance might be encrypted and stored in the Cloud. In this paper, we provide a mechanism to classify cloud documents by searching specific keywords from their encrypted provenance, and we prove our scheme achieves semantic security. In term of application of the proposed techniques, considering that files are classified to store separately in the cloud, in order to facilitate the regulation and security protection for the files, the classification policies can use provenance as conditions to determine the category of a document. Such as the easiest sample policy goes like: the documents have been reviewed twice can be classified as “public accessible”, which can be accessed by the public.
2021-02-10
Kerschbaumer, C., Ritter, T., Braun, F..  2020.  Hardening Firefox against Injection Attacks. 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS PW). :653—663.
Web browsers display content in the form of HTML, CSS and JavaScript retrieved from the world wide web. The loaded content is subject to the web security model and considered untrusted and potentially malicious. To complicate security matters, Firefox uses the same technologies to render its user interface as it does to render untrusted web content which blurs the distinction between the two privilege levels.Getting interactions between the two correct turns out to be complicated and has led to numerous real-world security vulnerabilities. We study those vulnerabilities to discover common threats and explain how we address them systematically to harden Firefox.
2021-02-01
Li, R., Ishimaki, Y., Yamana, H..  2020.  Privacy Preserving Calculation in Cloud using Fully Homomorphic Encryption with Table Lookup. 2020 5th IEEE International Conference on Big Data Analytics (ICBDA). :315–322.
To protect data in cloud servers, fully homomorphic encryption (FHE) is an effective solution. In addition to encrypting data, FHE allows a third party to evaluate arithmetic circuits (i.e., computations) over encrypted data without decrypting it, guaranteeing protection even during the calculation. However, FHE supports only addition and multiplication. Functions that cannot be directly represented by additions or multiplications cannot be evaluated with FHE. A naïve implementation of such arithmetic operations with FHE is a bit-wise operation that encrypts numerical data as a binary string. This incurs huge computation time and storage costs, however. To overcome this limitation, we propose an efficient protocol to evaluate multi-input functions with FHE using a lookup table. We extend our previous work, which evaluates a single-integer input function, such as f(x). Our extended protocol can handle multi-input functions, such as f(x,y). Thus, we propose a new method of constructing lookup tables that can evaluate multi-input functions to handle general functions. We adopt integer encoding rather than bit-wise encoding to speed up the evaluations. By adopting both permutation operations and a private information retrieval scheme, we guarantee that no information from the underlying plaintext is leaked between two parties: a cloud computation server and a decryptor. Our experimental results show that the runtime of our protocol for a two-input function is approximately 13 minutes, when there are 8,192 input elements in the lookup table. By adopting a multi-threading technique, the runtime can be further reduced to approximately three minutes with eight threads. Our work is more practical than a previously proposed bit-wise implementation, which requires 60 minutes to evaluate a single-input function.
2020-12-11
Liu, F., Li, J., Wang, Y., Li, L..  2019.  Kubestorage: A Cloud Native Storage Engine for Massive Small Files. 2019 6th International Conference on Behavioral, Economic and Socio-Cultural Computing (BESC). :1—4.
Cloud Native, the emerging computing infrastructure has become a new trend for cloud computing, especially after the development of containerization technology such as docker and LXD, and the orchestration system for them like Kubernetes and Swarm. With the growing popularity of Cloud Native, the following problems have been raised: (i) most Cloud Native applications were designed for making full use of the cloud platform, but their file storage has not been completely optimized for adapting it. (ii) the traditional file system is designed as a utility for storing and retrieving files, usually built into the kernel of the operating systems. But when placing it to a large-scale condition, like a network storage server shared by thousands of computing instances, and stores millions of files, it will be slow and even unstable. (iii) most storage solutions use metadata for faster tracking of files, but the metadata itself will take up a lot of space, and the capacity of it is usually limited. If the file system store metadata directly into hard disk without caching, the tracking of massive small files will be a lot slower. (iv) The traditional object storage solution can't provide enough features to make itself more practical on the cloud such as caching and auto replication. This paper proposes a new storage engine based on the well-known Haystack storage engine, optimized in terms of service discovery and Automated fault tolerance, make it more suitable for Cloud Native infrastructure, deployment and applications. We use the object storage model to solve the large and high-frequency file storage needs, offering a simple and unified set of APIs for application to access. We also take advantage of Kubernetes' sophisticated and automated toolchains to make cloud storage easier to deploy, more flexible to scale, and more stable to run.
2020-09-11
Kim, Donghoon, Sample, Luke.  2019.  Search Prevention with Captcha Against Web Indexing: A Proof of Concept. 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC). :219—224.
A website appears in search results based on web indexing conducted by a search engine bot (e.g., a web crawler). Some webpages do not want to be found easily because they include sensitive information. There are several methods to prevent web crawlers from indexing in search engine database. However, such webpages can still be indexed by malicious web crawlers. Through this study, we explore a paradox perspective on a new use of captchas for search prevention. Captchas are used to prevent web crawlers from indexing by converting sensitive words to captchas. We have implemented the web-based captcha conversion tool based on our search prevention algorithm. We also describe our proof of concept with the web-based chat application modified to utilize our algorithm. We have conducted the experiment to evaluate our idea on Google search engine with two versions of webpages, one containing plain text and another containing sensitive words converted to captchas. The experiment results show that the sensitive words on the captcha version of the webpages are unable to be found by Google's search engine, while the plain text versions are.
2020-08-28
Singh, Kuhu, Sajnani, Anil Kumar, Kumar Khatri, Sunil.  2019.  Data Security Enhancement in Cloud Computing Using Multimodel Biometric System. 2019 3rd International conference on Electronics, Communication and Aerospace Technology (ICECA). :175—179.
Today, data is all around us, every device that has computation power is generating the data and we can assume that in today's world there is about 2 quintillion bytes of data is been generating every day. as data increase in the database of the world servers so as the risk of data leak where we are talking about unlimited confidential data that is available online but as humans are developing their data online so as its security, today we've got hundreds of way to secure out data but not all are very successful or compatible there the big question arises that how to secure our data to hide our all the confidential information online, in other words one's all life work can be found online which is on risk of leak. all that says is today we have cloud above all of our data centers that stores all the information so that one can access anything from anywhere. in this paper we are introducing a new multimodal biometric system that is possible for the future smartphones to be supported where one can upload, download or modify the files using cloud without worrying about the unauthorized access of any third person as this security authentication uses combination of multiple security system available today that are not easy to breach such as DNA encryption which mostly is based on AES cipher here in this paper there we have designed triple layer of security.
Yau, Yiu Chung, Khethavath, Praveen, Figueroa, Jose A..  2019.  Secure Pattern-Based Data Sensitivity Framework for Big Data in Healthcare. 2019 IEEE International Conference on Big Data, Cloud Computing, Data Science Engineering (BCD). :65—70.
With the exponential growth in the usage of electronic medical records (EMR), the amount of data generated by the healthcare industry has too increased exponentially. These large amounts of data, known as “Big Data” is mostly unstructured. Special big data analytics methods are required to process the information and retrieve information which is meaningful. As patient information in hospitals and other healthcare facilities become increasingly electronic, Big Data technologies are needed now more than ever to manage and understand this data. In addition, this information tends to be quite sensitive and needs a highly secure environment. However, current security algorithms are hard to be implemented because it would take a huge amount of time and resources. Security protocols in Big data are also not adequate in protecting sensitive information in the healthcare. As a result, the healthcare data is both heterogeneous and insecure. As a solution we propose the Secure Pattern-Based Data Sensitivity Framework (PBDSF), that uses machine learning mechanisms to identify the common set of attributes of patient data, data frequency, various patterns of codes used to identify specific conditions to secure sensitive information. The framework uses Hadoop and is built on Hadoop Distributed File System (HDFS) as a basis for our clusters of machines to process Big Data, and perform tasks such as identifying sensitive information in a huge amount of data and encrypting data that are identified to be sensitive.
2020-08-13
Wang, Tianyi, Chow, Kam Pui.  2019.  Automatic Tagging of Cyber Threat Intelligence Unstructured Data using Semantics Extraction. 2019 IEEE International Conference on Intelligence and Security Informatics (ISI). :197—199.
Threat intelligence, information about potential or current attacks to an organization, is an important component in cyber security territory. As new threats consecutively occurring, cyber security professionals always keep an eye on the latest threat intelligence in order to continuously lower the security risks for their organizations. Cyber threat intelligence is usually conveyed by structured data like CVE entities and unstructured data like articles and reports. Structured data are always under certain patterns that can be easily analyzed, while unstructured data have more difficulties to find fixed patterns to analyze. There exists plenty of methods and algorithms on information extraction from structured data, but no current work is complete or suitable for semantics extraction upon unstructured cyber threat intelligence data. In this paper, we introduce an idea of automatic tagging applying JAPE feature within GATE framework to perform semantics extraction upon cyber threat intelligence unstructured data such as articles and reports. We extract token entities from each cyber threat intelligence article or report and evaluate the usefulness of them. A threat intelligence ontology then can be constructed with the useful entities extracted from related resources and provide convenience for professionals to find latest useful threat intelligence they need.
2020-07-30
Zhang, Jin, Jin, Dahai, Gong, Yunzhan.  2018.  File Similarity Determination Based on Function Call Graph. 2018 IEEE International Conference on Electronics and Communication Engineering (ICECE). :55—59.
The similarity detection of the program has important significance in code reuse, plagiarism detection, intellectual property protection and information retrieval methods. Attribute counting methods cannot take into account program semantics. The method based on syntax tree or graph structure has a very high construction cost and low space efficiency. So it is difficult to solve problems in large-scale software systems. This paper uses different decision strategies for different levels, then puts forward a similarity detection method at the file level. This method can make full use of the features of the program and take into account the space-time efficiency. By using static analysis methods, we get function features and control flow features of files. And based on this, we establish the function call graph. The similar degree between two files can be measured with the two graphs. Experimental results show the method can effectively detect similar files. Finally, this paper discusses the direction of development of this method.
2020-07-13
Agrawal, Shriyansh, Sanagavarapu, Lalit Mohan, Reddy, YR.  2019.  FACT - Fine grained Assessment of web page CredibiliTy. TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON). :1088–1097.
With more than a trillion web pages, there is a plethora of content available for consumption. Search Engine queries invariably lead to overwhelming information, parts of it relevant and some others irrelevant. Often the information provided can be conflicting, ambiguous, and inconsistent contributing to the loss of credibility of the content. In the past, researchers have proposed approaches for credibility assessment and enumerated factors influencing the credibility of web pages. In this work, we detailed a WEBCred framework for automated genre-aware credibility assessment of web pages. We developed a tool based on the proposed framework to extract web page features instances and identify genre a web page belongs to while assessing it's Genre Credibility Score ( GCS). We validated our approach on `Information Security' dataset of 8,550 URLs with 171 features across 7 genres. The supervised learning algorithm, Gradient Boosted Decision Tree classified genres with 88.75% testing accuracy over 10 fold cross-validation, an improvement over the current benchmark. We also examined our approach on `Health' domain web pages and had comparable results. The calculated GCS correlated 69% with crowdsourced Web Of Trust ( WOT) score and 13% with algorithm based Alexa ranking across 5 Information security groups. This variance in correlation states that our GCS approach aligns with human way ( WOT) as compared to algorithmic way (Alexa) of web assessment in both the experiments.
2020-06-02
Zewail, Ahmed A., Yener, Aylin.  2019.  Secure Caching and Delivery for Combination Networks with Asymmetric Connectivity. 2019 IEEE Information Theory Workshop (ITW). :1—5.

We consider information theoretic security in a two-hop combination network where there are groups of end users with distinct degrees of connectivity served by a layer of relays. The model represents a network set up with users having access to asymmetric resources, here the number of relays that they are connected to, yet demand security guarantees uniformly. We study two security constraints separately and simultaneously: secure delivery where the information must be kept confidential from an external entity that wiretaps the delivery phase; and secure caching where each cache-aided end-user can retrieve the file it requests and cannot obtain any information on files it does not. The achievable schemes we construct are multi-stage where each stage completes requests by a class of users.

2020-05-22
Platonov, A.V., Poleschuk, E.A., Bessmertny, I. A., Gafurov, N. R..  2018.  Using quantum mechanical framework for language modeling and information retrieval. 2018 IEEE 12th International Conference on Application of Information and Communication Technologies (AICT). :1—4.

This article shows the analogy between natural language texts and quantum-like systems on the example of the Bell test calculating. The applicability of the well-known Bell test for texts in Russian is investigated. The possibility of using this test for the text separation on the topics corresponding to the user query in information retrieval system is shown.

Yang, Jiacheng, Chen, Bin, Xia, Shu-Tao.  2019.  Mean-Removed Product Quantization for Approximate Nearest Neighbor Search. 2019 International Conference on Data Mining Workshops (ICDMW). :711—718.
Product quantization (PQ) and its variations are popular and attractive in approximate nearest neighbor search (ANN) due to their lower memory usage and faster retrieval speed. PQ decomposes the high-dimensional vector space into several low-dimensional subspaces, and quantizes each sub-vector in their subspaces, separately. Thus, PQ can generate a codebook containing an exponential number of codewords or indices by a Cartesian product of the sub-codebooks from different subspaces. However, when there is large variance in the average amplitude of the components of the data points, directly utilizing the PQ on the data points would result in poor performance. In this paper, we propose a new approach, namely, mean-removed product quantization (MRPQ) to address this issue. In fact, the average amplitude of a data point or the mean of a date point can be regarded as statistically independent of the variation of the vector, that is, of the way the components vary about this average. Then we can learn a separate scalar quantizer of the means of the data points and apply the PQ to their residual vectors. As shown in our comprehensive experiments on four large-scale public datasets, our approach can achieve substantial improvements in terms of Recall and MAP over some known methods. Moreover, our approach is general which can be combined with PQ and its variations.
Abdelhadi, Ameer M.S., Bouganis, Christos-Savvas, Constantinides, George A..  2019.  Accelerated Approximate Nearest Neighbors Search Through Hierarchical Product Quantization. 2019 International Conference on Field-Programmable Technology (ICFPT). :90—98.
A fundamental recurring task in many machine learning applications is the search for the Nearest Neighbor in high dimensional metric spaces. Towards answering queries in large scale problems, state-of-the-art methods employ Approximate Nearest Neighbors (ANN) search, a search that returns the nearest neighbor with high probability, as well as techniques that compress the dataset. Product-Quantization (PQ) based ANN search methods have demonstrated state-of-the-art performance in several problems, including classification, regression and information retrieval. The dataset is encoded into a Cartesian product of multiple low-dimensional codebooks, enabling faster search and higher compression. Being intrinsically parallel, PQ-based ANN search approaches are amendable for hardware acceleration. This paper proposes a novel Hierarchical PQ (HPQ) based ANN search method as well as an FPGA-tailored architecture for its implementation that outperforms current state of the art systems. HPQ gradually refines the search space, reducing the number of data compares and enabling a pipelined search. The mapping of the architecture on a Stratix 10 FPGA device demonstrates over ×250 speedups over current state-of-the-art systems, opening the space for addressing larger datasets and/or improving the query times of current systems.
2020-03-30
Abdolahi, Mahssa, Jiang, Hao, Kaminska, Bozena.  2019.  Robust data retrieval from high-security structural colour QR codes via histogram equalization and decorrelation stretching. 2019 IEEE 10th Annual Ubiquitous Computing, Electronics Mobile Communication Conference (UEMCON). :0340–0346.
In this work, robust readout of the data (232 English characters) stored in high-security structural colour QR codes, was achieved by using multiple image processing techniques, specifically, histogram equalization and decorrelation stretching. The decoded structural colour QR codes are generic diffractive RGB-pixelated periodic nanocones selectively activated by laser exposure to obtain the particular design of interest. The samples were imaged according to the criteria determined by the diffraction grating equation for the lighting and viewing angles given the red, green, and blue periodicities of the grating. However, illumination variations all through the samples, cross-module and cross-channel interference effects result in acquiring images with dissimilar lighting conditions which cannot be directly retrieved by the decoding script and need significant preprocessing. According to the intensity plots, even if the intensity values are very close (above 200) at some typical regions of the images with different lighting conditions, their inconsistencies (below 100) at the pixels of one representative region may lead to the requirement for using different methods for recovering the data from all red, green, and blue channels. In many cases, a successful data readout could be achieved by downscaling the images to 300-pixel dimensions (along with bilinear interpolation resampling), histogram equalization (HE), linear spatial low-pass mean filtering, and gamma function, each used either independently or with other complementary processes. The majority of images, however, could be fully decoded using decorrelation stretching (DS) either as a standalone or combinational process for obtaining a more distinctive colour definition.
2020-03-18
Padmashree, M G, Khanum, Shahela, Arunalatha, J S, Venugopal, K R.  2019.  SIRLC: Secure Information Retrieval using Lightweight Cryptography in HIoT. TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON). :269–273.

Advances in new Communication and Information innovations has led to a new paradigm known as Internet of Things (IoT). Healthcare environment uses IoT technologies for Patients care which can be used in various medical applications. Patient information is encrypted consistently to maintain the access of therapeutic records by authoritative entities. Healthcare Internet of Things (HIoT) facilitate the access of Patient files immediately in emergency situations. In the proposed system, the Patient directly provides the Key to the Doctor in normal care access. In Emergency care, a Patient shares an Attribute based Key with a set of Emergency Supporting Representatives (ESRs) and access permission to the Doctor for utilizing Emergency key from ESR. The Doctor decrypts the medical records by using Attribute based key and Emergency key to save the Patient's life. The proposed model Secure Information Retrieval using Lightweight Cryptography (SIRLC) reduces the secret key generation time and cipher text size. The performance evaluation indicates that SIRLC is a better option to utilize in Healthcare IoT than Lightweight Break-glass Access Control(LiBAC) with enhanced security and reduced computational complexity.

Shah, Meet D., Mohanty, Manoranjan, Atrey, Pradeep K..  2019.  SecureCSearch: Secure Searching in PDF Over Untrusted Cloud Servers. 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). :347–352.
The usage of cloud for data storage has become ubiquitous. To prevent data leakage and hacks, it is common to encrypt the data (e.g. PDF files) before sending it to a cloud. However, this limits the search for specific files containing certain keywords over an encrypted cloud data. The traditional method is to take down all files from a cloud, store them locally, decrypt and then search over them, defeating the purpose of using a cloud. In this paper, we propose a method, called SecureCSearch, to perform keyword search operations on the encrypted PDF files over cloud in an efficient manner. The proposed method makes use of Shamir's Secret Sharing scheme in a novel way to create encrypted shares of the PDF file and the keyword to search. We show that the proposed method maintains the security of the data and incurs minimal computation cost.
2020-03-12
Wu, Hanqing, Cao, Jiannong, Yang, Yanni, Tung, Cheung Leong, Jiang, Shan, Tang, Bin, Liu, Yang, Wang, Xiaoqing, Deng, Yuming.  2019.  Data Management in Supply Chain Using Blockchain: Challenges and a Case Study. 2019 28th International Conference on Computer Communication and Networks (ICCCN). :1–8.

Supply chain management (SCM) is fundamental for gaining financial, environmental and social benefits in the supply chain industry. However, traditional SCM mechanisms usually suffer from a wide scope of issues such as lack of information sharing, long delays for data retrieval, and unreliability in product tracing. Recent advances in blockchain technology show great potential to tackle these issues due to its salient features including immutability, transparency, and decentralization. Although there are some proof-of-concept studies and surveys on blockchain-based SCM from the perspective of logistics, the underlying technical challenges are not clearly identified. In this paper, we provide a comprehensive analysis of potential opportunities, new requirements, and principles of designing blockchain-based SCM systems. We summarize and discuss four crucial technical challenges in terms of scalability, throughput, access control, data retrieval and review the promising solutions. Finally, a case study of designing blockchain-based food traceability system is reported to provide more insights on how to tackle these technical challenges in practice.

2020-02-18
Liu, Ying, He, Qiang, Zheng, Dequan, Zhang, Mingwei, Chen, Feifei, Zhang, Bin.  2019.  Data Caching Optimization in the Edge Computing Environment. 2019 IEEE International Conference on Web Services (ICWS). :99–106.

With the rapid increase in the use of mobile devices in people's daily lives, mobile data traffic is exploding in recent years. In the edge computing environment where edge servers are deployed around mobile users, caching popular data on edge servers can ensure mobile users' fast access to those data and reduce the data traffic between mobile users and the centralized cloud. Existing studies consider the data cache problem with a focus on the reduction of network delay and the improvement of mobile devices' energy efficiency. In this paper, we attack the data caching problem in the edge computing environment from the service providers' perspective, who would like to maximize their venues of caching their data. This problem is complicated because data caching produces benefits at a cost and there usually is a trade-off in-between. In this paper, we formulate the data caching problem as an integer programming problem, and maximizes the revenue of the service provider while satisfying a constraint for data access latency. Extensive experiments are conducted on a real-world dataset that contains the locations of edge servers and mobile users, and the results reveal that our approach significantly outperform the baseline approaches.

2020-02-17
Rodriguez, Ariel, Okamura, Koji.  2019.  Generating Real Time Cyber Situational Awareness Information Through Social Media Data Mining. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC). 2:502–507.
With the rise of the internet many new data sources have emerged that can be used to help us gain insights into the cyber threat landscape and can allow us to better prepare for cyber attacks before they happen. With this in mind, we present an end to end real time cyber situational awareness system which aims to efficiently retrieve security relevant information from the social networking site Twitter.com. This system classifies and aggregates the data retrieved and provides real time cyber situational awareness information based on sentiment analysis and data analytics techniques. This research will assist security analysts to evaluate the level of cyber risk in their organization and proactively take actions to plan and prepare for potential attacks before they happen as well as contribute to the field through a cybersecurity tweet dataset.
2020-02-10
Pan, Yuyang, Yin, Yanzhao, Zhao, Yulin, Wu, Liji, Zhang, Xiangmin.  2019.  A New Information Extractor for Profiled DPA and Implementation of High Order Masking Circuit. 2019 IEEE 13th International Conference on Anti-counterfeiting, Security, and Identification (ASID). :258–262.
Profiled DPA is a new method combined with machine learning method in side channel attack which is put forward by Whitnall in CHES 2015.[1]The most important part lies in effectiveness of extracting information. This paper introduces a new rule Explained Local Variance (ELV) to extract information in profiled stage for profiled DPA. It attracts information effectively and shields noise to get better accuracy than the original rule. The ELV enables an attacker to use less power traces to get the same result as before. It also leads to 94.6% space reduction and 29.2% time reduction for calculation. For security circuit implementation, a high order masking scheme in modelsim is implemented. A new exchange network is put forward. 96.9% hardware resource is saved due to the usage of this network.
2019-12-16
Karve, Shreya, Nagmal, Arati, Papalkar, Sahil, Deshpande, S. A..  2018.  Context Sensitive Conversational Agent Using DNN. 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA). :475–478.
We investigate a method of building a closed domain intelligent conversational agent using deep neural networks. A conversational agent is a dialog system intended to converse with a human, with a coherent structure. Our conversational agent uses a retrieval based model that identifies the intent of the input user query and maps it to a knowledge base to return appropriate results. Human conversations are based on context, but existing conversational agents are context insensitive. To overcome this limitation, our system uses a simple stack based context identification and storage system. The conversational agent generates responses according to the current context of conversation. allowing more human-like conversations.
2019-11-25
Zuin, Gianlucca, Chaimowicz, Luiz, Veloso, Adriano.  2018.  Learning Transferable Features For Open-Domain Question Answering. 2018 International Joint Conference on Neural Networks (IJCNN). :1–8.

Corpora used to learn open-domain Question-Answering (QA) models are typically collected from a wide variety of topics or domains. Since QA requires understanding natural language, open-domain QA models generally need very large training corpora. A simple way to alleviate data demand is to restrict the domain covered by the QA model, leading thus to domain-specific QA models. While learning improved QA models for a specific domain is still challenging due to the lack of sufficient training data in the topic of interest, additional training data can be obtained from related topic domains. Thus, instead of learning a single open-domain QA model, we investigate domain adaptation approaches in order to create multiple improved domain-specific QA models. We demonstrate that this can be achieved by stratifying the source dataset, without the need of searching for complementary data unlike many other domain adaptation approaches. We propose a deep architecture that jointly exploits convolutional and recurrent networks for learning domain-specific features while transferring domain-shared features. That is, we use transferable features to enable model adaptation from multiple source domains. We consider different transference approaches designed to learn span-level and sentence-level QA models. We found that domain-adaptation greatly improves sentence-level QA performance, and span-level QA benefits from sentence information. Finally, we also show that a simple clustering algorithm may be employed when the topic domains are unknown and the resulting loss in accuracy is negligible.