Visible to the public Biblio

Found 161 results

Filters: Keyword is Data analysis  [Clear All Filters]
Jaafar, Fehmi, Avellaneda, Florent, Alikacem, El-Hackemi.  2020.  Demystifying the Cyber Attribution: An Exploratory Study. 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :35–40.
Current cyber attribution approaches proposed to use a variety of datasets and analytical techniques to distill the information that will be useful to identify cyber attackers. In contrast, practitioners and researchers in cyber attribution face several technical and regulation challenges. In this paper, we describe the main challenges of cyber attribution and present a state of the art of used approaches to face these challenges. Then, we will present an exploratory study to perform cyber attacks attribution based on pattern recognition from real data. In our study, we are using attack pattern discovery and identification based on real data collection and analysis.
Khalid, O., Senthilananthan, S..  2020.  A review of data analytics techniques for effective management of big data using IoT. 2020 5th International Conference on Innovative Technologies in Intelligent Systems and Industrial Applications (CITISIA). :1—10.
IoT and big data are energetic technology of the world for quite a time, and both of these have become a necessity. On the one side where IoT is used to connect different objectives via the internet, the big data means having a large number of the set of structured, unstructured, and semi-structured data. The device used for processing based on the tools used. These tools help provide meaningful information used for effective management in different domains. Some of the commonly faced issues with the inadequate about the technologies are related to data privacy, insufficient analytical capabilities, and this issue is faced by in different domains related to the big data. Data analytics tools help discover the pattern of data and consumer preferences which is resulting in better decision making for the organizations. The major part of this work is to review different types of data analytics techniques for the effective management of big data using IoT. For the effective management of the ABD solution collection, analysis and control are used as the components. Each of the ingredients is described to find an effective way to manage big data. These components are considered and used in the validation criteria. The solution of effective data management is a stage towards the management of big data in IoT devices which will help the user to understand different types of elements of data management.
Yang, Y., Lu, K., Cheng, H., Fu, M., Li, Z..  2020.  Time-controlled Regular Language Search over Encrypted Big Data. 2020 IEEE 9th Joint International Information Technology and Artificial Intelligence Conference (ITAIC). 9:1041—1045.

The rapid development of cloud computing and the arrival of the big data era make the relationship between users and cloud closer. Cloud computing has powerful data computing and data storage capabilities, which can ubiquitously provide users with resources. However, users do not fully trust the cloud server's storage services, so lots of data is encrypted and uploaded to the cloud. Searchable encryption can protect the confidentiality of data and provide encrypted data retrieval functions. In this paper, we propose a time-controlled searchable encryption scheme with regular language over encrypted big data, which provides flexible search pattern and convenient data sharing. Our solution allows users with data's secret keys to generate trapdoors by themselves. And users without data's secret keys can generate trapdoors with the help of a trusted third party without revealing the data owner's secret key. Our system uses a time-controlled mechanism to collect keywords queried by users and ensures that the querying user's identity is not directly exposed. The obtained keywords are the basis for subsequent big data analysis. We conducted a security analysis of the proposed scheme and proved that the scheme is secure. The simulation experiment and comparison of our scheme show that the system has feasible efficiency.

Syafalni, I., Fadhli, H., Utami, W., Dharma, G. S. A., Mulyawan, R., Sutisna, N., Adiono, T..  2020.  Cloud Security Implementation using Homomorphic Encryption. 2020 IEEE International Conference on Communication, Networks and Satellite (Comnetsat). :341—345.

With the advancement of computing and communication technologies, data transmission in the internet are getting bigger and faster. However, it is necessary to secure the data to prevent fraud and criminal over the internet. Furthermore, most of the data related to statistics requires to be analyzed securely such as weather data, health data, financial and other services. This paper presents an implementation of cloud security using homomorphic encryption for data analytic in the cloud. We apply the homomorphic encryption that allows the data to be processed without being decrypted. Experimental results show that, for the polynomial degree 26, 28, and 210, the total executions are 2.2 ms, 4.4 ms, 25 ms per data, respectively. The implementation is useful for big data security such as for environment, financial and hospital data analytics.

Peng, X., Hongmei, Z., Lijie, C., Ying, H..  2020.  Analysis of Computer Network Information Security under the Background of Big Data. 2020 5th International Conference on Smart Grid and Electrical Automation (ICSGEA). :409—412.
In today's society, under the comprehensive arrival of the Internet era, the rapid development of technology has facilitated people's production and life, but it is also a “double-edged sword”, making people's personal information and other data subject to a greater threat of abuse. The unique features of big data technology, such as massive storage, parallel computing and efficient query, have created a breakthrough opportunity for the key technologies of large-scale network security situational awareness. On the basis of big data acquisition, preprocessing, distributed computing and mining and analysis, the big data analysis platform provides information security assurance services to the information system. This paper will discuss the security situational awareness in large-scale network environment and the promotion of big data technology in security perception.
Grundy, J..  2020.  Human-centric Software Engineering for Next Generation Cloud- and Edge-based Smart Living Applications. 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID). :1—10.

Humans are a key part of software development, including customers, designers, coders, testers and end users. In this keynote talk I explain why incorporating human-centric issues into software engineering for next-generation applications is critical. I use several examples from our recent and current work on handling human-centric issues when engineering various `smart living' cloud- and edge-based software systems. This includes using human-centric, domain-specific visual models for non-technical experts to specify and generate data analysis applications; personality impact on aspects of software activities; incorporating end user emotions into software requirements engineering for smart homes; incorporating human usage patterns into emerging edge computing applications; visualising smart city-related data; reporting diverse software usability defects; and human-centric security and privacy requirements for smart living systems. I assess the usefulness of these approaches, highlight some outstanding research challenges, and briefly discuss our current work on new human-centric approaches to software engineering for smart living applications.

Liao, Q., Gu, Y., Liao, J., Li, W..  2020.  Abnormal transaction detection of Bitcoin network based on feature fusion. 2020 IEEE 9th Joint International Information Technology and Artificial Intelligence Conference (ITAIC). 9:542—549.

Anomaly detection is one of the research hotspots in Bitcoin transaction data analysis. In view of the existing research that only considers the transaction as an isolated node when extracting features, but has not yet used the network structure to dig deep into the node information, a bitcoin abnormal transaction detection method that combines the node’s own features and the neighborhood features is proposed. Based on the formation mechanism of the interactive relationship in the transaction network, first of all, according to a certain path selection probability, the features of the neighbohood nodes are extracted by way of random walk, and then the node’s own features and the neighboring features are fused to use the network structure to mine potential node information. Finally, an unsupervised detection algorithm is used to rank the transaction points on the constructed feature set to find abnormal transactions. Experimental results show that, compared with the existing feature extraction methods, feature fusion improves the ability to detect abnormal transactions.

Kuppa, A., Le-Khac, N.-A..  2020.  Black Box Attacks on Explainable Artificial Intelligence(XAI) methods in Cyber Security. 2020 International Joint Conference on Neural Networks (IJCNN). :1–8.

Cybersecurity community is slowly leveraging Machine Learning (ML) to combat ever evolving threats. One of the biggest drivers for successful adoption of these models is how well domain experts and users are able to understand and trust their functionality. As these black-box models are being employed to make important predictions, the demand for transparency and explainability is increasing from the stakeholders.Explanations supporting the output of ML models are crucial in cyber security, where experts require far more information from the model than a simple binary output for their analysis. Recent approaches in the literature have focused on three different areas: (a) creating and improving explainability methods which help users better understand the internal workings of ML models and their outputs; (b) attacks on interpreters in white box setting; (c) defining the exact properties and metrics of the explanations generated by models. However, they have not covered, the security properties and threat models relevant to cybersecurity domain, and attacks on explainable models in black box settings.In this paper, we bridge this gap by proposing a taxonomy for Explainable Artificial Intelligence (XAI) methods, covering various security properties and threat models relevant to cyber security domain. We design a novel black box attack for analyzing the consistency, correctness and confidence security properties of gradient based XAI methods. We validate our proposed system on 3 security-relevant data-sets and models, and demonstrate that the method achieves attacker's goal of misleading both the classifier and explanation report and, only explainability method without affecting the classifier output. Our evaluation of the proposed approach shows promising results and can help in designing secure and robust XAI methods.

Alzahrani, A., Feki, J..  2020.  Toward a Natural Language-Based Approach for the Specification of Decisional-Users Requirements. 2020 3rd International Conference on Computer Applications Information Security (ICCAIS). :1–6.
The number of organizations adopting the Data Warehouse (DW) technology along with data analytics in order to improve the effectiveness of their decision-making processes is permanently increasing. Despite the efforts invested, the DW design remains a great challenge research domain. More accurately, the design quality of the DW depends on several aspects; among them, the requirement-gathering phase is a critical and complex task. In this context, we propose a Natural language (NL) NL-template based design approach, which is twofold; firstly, it facilitates the involvement of decision-makers in the early step of the DW design; indeed, using NL is a good and natural means to encourage the decision-makers to express their requirements as query-like English sentences. Secondly, our approach aims to generate a DW multidimensional schema from a set of gathered requirements (as OLAP: On-Line-Analytical-Processing queries, written according to the NL suggested templates). This approach articulates around: (i) two NL-templates for specifying multidimensional components, and (ii) a set of five heuristic rules for extracting the multidimensional concepts from requirements. Really, we are developing a software prototype that accepts the decision-makers' requirements then automatically identifies the multidimensional components of the DW model.
Wang, Y., Kjerstad, E., Belisario, B..  2020.  A Dynamic Analysis Security Testing Infrastructure for Internet of Things. 2020 Sixth International Conference on Mobile And Secure Services (MobiSecServ). :1—6.
IoT devices such as Google Home and Amazon Echo provide great convenience to our lives. Many of these IoT devices collect data including Personal Identifiable Information such as names, phone numbers, and addresses and thus IoT security is important. However, conducting security analysis on IoT devices is challenging due to the variety, the volume of the devices, and the special skills required for hardware and software analysis. In this research, we create and demonstrate a dynamic analysis security testing infrastructure for capturing network traffic from IoT devices. The network traffic is automatically mirrored to a server for live traffic monitoring and offline data analysis. Using the dynamic analysis security testing infrastructure, we conduct extensive security analysis on network traffic from Google Home and Amazon Echo. Our testing results indicate that Google Home enforces tighter security controls than Amazon Echo while both Google and Amazon devices provide the desired security level to protect user data in general. The dynamic analysis security testing infrastructure presented in the paper can be utilized to conduct similar security analysis on any IoT devices.
Hu, X., Deng, C., Yuan, B..  2020.  Reduced-Complexity Singular Value Decomposition For Tucker Decomposition: Algorithm And Hardware. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :1793–1797.
Tensors, as the multidimensional generalization of matrices, are naturally suited for representing and processing high-dimensional data. To date, tensors have been widely adopted in various data-intensive applications, such as machine learning and big data analysis. However, due to the inherent large-size characteristics of tensors, tensor algorithms, as the approaches that synthesize, transform or decompose tensors, are very computation and storage expensive, thereby hindering the potential further adoptions of tensors in many application scenarios, especially on the resource-constrained hardware platforms. In this paper, we propose a reduced-complexity SVD (Singular Vector Decomposition) scheme, which serves as the key operation in Tucker decomposition. By using iterative self-multiplication, the proposed scheme can significantly reduce the storage and computational costs of SVD, thereby reducing the complexity of the overall process. Then, corresponding hardware architecture is developed with 28nm CMOS technology. Our synthesized design can achieve 102GOPS with 1.09 mm2 area and 37.6 mW power consumption, and thereby providing a promising solution for accelerating Tucker decomposition.
Uzhga-Rebrov, O., Kuleshova, G..  2020.  Using Singular Value Decomposition to Reduce Dimensionality of Initial Data Set. 2020 61st International Scientific Conference on Information Technology and Management Science of Riga Technical University (ITMS). :1–4.
The purpose of any data analysis is to extract essential information implicitly present in the data. To do this, it often seems necessary to transform the initial data into a form that allows one to identify and interpret the essential features of their structure. One of the most important tasks of data analysis is to reduce the dimension of the original data. The paper considers an approach to solving this problem based on singular value decomposition (SVD).
Mathur, G., Pandey, A., Goyal, S..  2020.  Immutable DNA Sequence Data Transmission for Next Generation Bioinformatics Using Blockchain Technology. 2nd International Conference on Data, Engineering and Applications (IDEA). :1–6.
In recent years, there is fast growth in the high throughput DNA sequencing technology, and also there is a reduction in the cost of genome-sequencing, that has led to a advances in the genetic industries. However, the reduction in cost and time required for DNA sequencing there is still an issue of managing such large amount of data. Also, the security and transmission of such huge amount of DNA sequence data is still an issue. The idea is to provide a secure storage platform for future generation bioinformatics systems for both researchers and healthcare user. Secure data sharing strategies, that can permit the healthcare providers along with their secured substances for verifying the accuracy of data, are crucial for ensuring proper medical services. In this paper, it has been surveyed about the applications of blockchain technology for securing healthcare data, where the recorded information is encrypted so that it becomes difficult to penetrate or being removed, as the primary goals of block-chaining technology is to make data immutable.
Clark, D. J., Turnbull, B..  2020.  Experiment Design for Complex Immersive Visualisation. 2020 Military Communications and Information Systems Conference (MilCIS). :1—5.

Experimentation focused on assessing the value of complex visualisation approaches when compared with alternative methods for data analysis is challenging. The interaction between participant prior knowledge and experience, a diverse range of experimental or real-world data sets and a dynamic interaction with the display system presents challenges when seeking timely, affordable and statistically relevant experimentation results. This paper outlines a hybrid approach proposed for experimentation with complex interactive data analysis tools, specifically for computer network traffic analysis. The approach involves a structured survey completed after free engagement with the software platform by expert participants. The survey captures objective and subjective data points relating to the experience with the goal of making an assessment of software performance which is supported by statistically significant experimental results. This work is particularly applicable to field of network analysis for cyber security and also military cyber operations and intelligence data analysis.

Powley, B. T..  2020.  Exploring Immersive and Non-Immersive Techniques for Geographic Data Visualization. 2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). :1—2.

Analyzing multi-dimensional geospatial data is difficult and immersive analytics systems are used to visualize geospatial data and models. There is little previous work evaluating when immersive and non-immersive visualizations are the most suitable for data analysis and more research is needed.

Zhang, Y., Liu, Y., Chung, C.-L., Wei, Y.-C., Chen, C.-H..  2020.  Machine Learning Method Based on Stream Homomorphic Encryption Computing. 2020 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan). :1–2.
This study proposes a machine learning method based on stream homomorphic encryption computing for improving security and reducing computational time. A case study of mobile positioning based on k nearest neighbors ( kNN) is selected to evaluate the proposed method. The results showed the proposed method can save computational resources than others.
Alghamdi, A. A., Reger, G..  2020.  Pattern Extraction for Behaviours of Multi-Stage Threats via Unsupervised Learning. 2020 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA). :1—8.
Detection of multi-stage threats such as Advanced Persistent Threats (APT) is extremely challenging due to their deceptive approaches. Sequential events of threats might look benign when performed individually or from different addresses. We propose a new unsupervised framework to identify patterns and correlations of malicious behaviours by analysing heterogeneous log-files. The framework consists of two main phases of data analysis to extract inner-behaviours of log-files and then the patterns of those behaviours over analysed files. To evaluate the framework we have produced a (publicly available) labelled version of the SotM43 dataset. Our results demonstrate that the framework can (i) efficiently cluster inner-behaviours of log-files with high accuracy and (ii) extract patterns of malicious behaviour and correlations between those patterns from real-world data.
Lobo-Vesga, E., Russo, A., Gaboardi, M..  2020.  A Programming Framework for Differential Privacy with Accuracy Concentration Bounds. 2020 IEEE Symposium on Security and Privacy (SP). :411–428.
Differential privacy offers a formal framework for reasoning about privacy and accuracy of computations on private data. It also offers a rich set of building blocks for constructing private data analyses. When carefully calibrated, these analyses simultaneously guarantee the privacy of the individuals contributing their data, and the accuracy of the data analyses results, inferring useful properties about the population. The compositional nature of differential privacy has motivated the design and implementation of several programming languages aimed at helping a data analyst in programming differentially private analyses. However, most of the programming languages for differential privacy proposed so far provide support for reasoning about privacy but not for reasoning about the accuracy of data analyses. To overcome this limitation, in this work we present DPella, a programming framework providing data analysts with support for reasoning about privacy, accuracy and their trade-offs. The distinguishing feature of DPella is a novel component which statically tracks the accuracy of different data analyses. In order to make tighter accuracy estimations, this component leverages taint analysis for automatically inferring statistical independence of the different noise quantities added for guaranteeing privacy. We evaluate our approach by implementing several classical queries from the literature and showing how data analysts can figure out the best manner to calibrate privacy to meet the accuracy requirements.
Lee, H., Cho, S., Seong, J., Lee, S., Lee, W..  2020.  De-identification and Privacy Issues on Bigdata Transformation. 2020 IEEE International Conference on Big Data and Smart Computing (BigComp). :514—519.

As the number of data in various industries and government sectors is growing exponentially, the `7V' concept of big data aims to create a new value by indiscriminately collecting and analyzing information from various fields. At the same time as the ecosystem of the ICT industry arrives, big data utilization is treatened by the privacy attacks such as infringement due to the large amount of data. To manage and sustain the controllable privacy level, there need some recommended de-identification techniques. This paper exploits those de-identification processes and three types of commonly used privacy models. Furthermore, this paper presents use cases which can be adopted those kinds of technologies and future development directions.

Yang, H., Huang, L., Luo, C., Yu, Q..  2020.  Research on Intelligent Security Protection of Privacy Data in Government Cyberspace. 2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA). :284—288.

Based on the analysis of the difficulties and pain points of privacy protection in the opening and sharing of government data, this paper proposes a new method for intelligent discovery and protection of structured and unstructured privacy data. Based on the improvement of the existing government data masking process, this method introduces the technologies of NLP and machine learning, studies the intelligent discovery of sensitive data, the automatic recommendation of masking algorithm and the full automatic execution following the improved masking process. In addition, the dynamic masking and static masking prototype with text and database as data source are designed and implemented with agent-based intelligent masking middleware. The results show that the recognition range and protection efficiency of government privacy data, especially government unstructured text have been significantly improved.

Cuzzocrea, A., Maio, V. De, Fadda, E..  2020.  Experimenting and Assessing a Distributed Privacy-Preserving OLAP over Big Data Framework: Principles, Practice, and Experiences. 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). :1344—1350.
OLAP is an authoritative analytical tool in the emerging big data analytics context, with particular regards to the target distributed environments (e.g., Clouds). Here, privacy-preserving OLAP-based big data analytics is a critical topic, with several amenities in the context of innovative big data application scenarios like smart cities, social networks, bio-informatics, and so forth. The goal is that of providing privacy preservation during OLAP analysis tasks, with particular emphasis on the privacy of OLAP aggregates. Following this line of research, in this paper we provide a deep contribution on experimenting and assessing a state-of-the-art distributed privacy-preserving OLAP framework, named as SPPOLAP, whose main benefit is that of introducing a completely-novel privacy notion for OLAP data cubes.
Chaves, A., Moura, Í, Bernardino, J., Pedrosa, I..  2020.  The privacy paradigm : An overview of privacy in Business Analytics and Big Data. 2020 15th Iberian Conference on Information Systems and Technologies (CISTI). :1—6.
In this New Age where information has an indispensable value for companies and data mining technologies are growing in the area of Information Technology, privacy remains a sensitive issue in the approach to the exploitation of the large volume of data generated and processed by companies. The way data is collected, handled and destined is not yet clearly defined and has been the subject of constant debate by several areas of activity. This literature review gives an overview of privacy in the era of Business Analytics and Big Data in different timelines, the opportunities and challenges faced, aiming to broaden discussions on a subject that deserves extreme attention and aims to show that, despite measures for data protection have been created, there is still a need to discuss the subject among the different parties involved in the process to achieve a positive ideal for both users and companies.
Enkhtaivan, B., Inoue, A..  2020.  Mediating Data Trustworthiness by Using Trusted Hardware between IoT Devices and Blockchain. 2020 IEEE International Conference on Smart Internet of Things (SmartIoT). :314–318.
In recent years, with the progress of data analysis methods utilizing artificial intelligence (AI) technology, concepts of smart cities collecting data from IoT devices and creating values by analyzing it have been proposed. However, making sure that the data is not tampered with is of the utmost importance. One way to do this is to utilize blockchain technology to record and trace the history of the data. Park and Kim proposed ensuring the trustworthiness of the data by utilizing an IoT device with a trusted execution environment (TEE). Also, Guan et al. proposed authenticating an IoT device and mediating data using a TEE. For the authentication, they use the physically unclonable function of the IoT device. Usually, IoT devices suffer from the lack of resources necessary for creating transactions for the blockchain ledger. In this paper, we present a secure protocol in which a TEE acts as a proxy to the IoT devices and creates the necessary transactions for the blockchain. We use an authenticated encryption method on the data transmission between the IoT device and TEE to authenticate the device and ensure the integrity and confidentiality of the data generated by the IoT devices.
Kumar, S., Vasthimal, D. K..  2019.  Raw Cardinality Information Discovery for Big Datasets. 2019 IEEE 5th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS). :200—205.
Real-time discovery of all different types of unique attributes within unstructured data is a challenging problem to solve when dealing with multiple petabytes of unstructured data volume everyday. Popular discovery solutions such as the creation of offline jobs to uniquely identify attributes or running aggregation queries on raw data sets limits real time discovery use-cases and often results into poor resource utilization. The discovery information must be treated as a parallel problem to just storing raw data sets efficiently onto back-end big data systems. Solving the discovery problem by creating a parallel discovery data store infrastructure has multiple benefits as it allows such to channel the actual search queries against the raw data set in much more funneled manner instead of being widespread across the entire data sets. Such focused search queries and data separation are far more performant and requires less compute and memory footprint.
Islam, M. S., Verma, H., Khan, L., Kantarcioglu, M..  2019.  Secure Real-Time Heterogeneous IoT Data Management System. 2019 First IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA). :228–235.
The growing adoption of IoT devices in our daily life engendered a need for secure systems to safely store and analyze sensitive data as well as the real-time data processing system to be as fast as possible. The cloud services used to store and process sensitive data are often come out to be vulnerable to outside threats. Furthermore, to analyze streaming IoT data swiftly, they are in need of a fast and efficient system. The Paper will envision the aspects of complexity dealing with real time data from various devices in parallel, building solution to ingest data from different IOT devices, forming a secure platform to process data in a short time, and using various techniques of IOT edge computing to provide meaningful intuitive results to users. The paper envisions two modules of building a real time data analytics system. In the first module, we propose to maintain confidentiality and integrity of IoT data, which is of paramount importance, and manage large-scale data analytics with real-time data collection from various IoT devices in parallel. We envision a framework to preserve data privacy utilizing Trusted Execution Environment (TEE) such as Intel SGX, end-to-end data encryption mechanism, and strong access control policies. Moreover, we design a generic framework to simplify the process of collecting and storing heterogeneous data coming from diverse IoT devices. In the second module, we envision a drone-based data processing system in real-time using edge computing and on-device computing. As, we know the use of drones is growing rapidly across many application domains including real-time monitoring, remote sensing, search and rescue, delivery of goods, security and surveillance, civil infrastructure inspection etc. This paper demonstrates the potential drone applications and their challenges discussing current research trends and provide future insights for potential use cases using edge and on-device computing.