Biblio

Filters: Keyword is metadata
2021-08-05
Bogatu, Alex, Fernandes, Alvaro A. A., Paton, Norman W., Konstantinou, Nikolaos.  2020.  Dataset Discovery in Data Lakes. 2020 IEEE 36th International Conference on Data Engineering (ICDE). :709–720.
Data analytics stands to benefit from the increasing availability of datasets that are held without their conceptual relationships being explicitly known. When collected, these datasets form a data lake from which, by processes like data wrangling, specific target datasets can be constructed that enable value-adding analytics. Given the potential vastness of such data lakes, the issue arises of how to pull out of the lake those datasets that might contribute to wrangling out a given target. We refer to this as the problem of dataset discovery in data lakes and this paper contributes an effective and efficient solution to it. Our approach uses features of the values in a dataset to construct hash-based indexes that map those features into a uniform distance space. This makes it possible to define similarity distances between features and to take those distances as measurements of relatedness w.r.t. a target table. Given the latter (and exemplar tuples), our approach returns the most related tables in the lake. We provide a detailed description of the approach and report on empirical results for two forms of relatedness (unionability and joinability) comparing them with prior work, where pertinent, and showing significant improvements in all of precision, recall, target coverage, indexing and discovery times.
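The hash-based indexing idea in this abstract, mapping column-value features into a space where distances measure relatedness, can be sketched with MinHash signatures: the fraction of matching signature slots between two columns estimates their Jaccard overlap, a common unionability proxy. This is an illustrative Python sketch, not the paper's implementation, and the example columns are invented:

```python
import hashlib

def minhash_signature(values, num_hashes=64):
    """MinHash signature of a set of column values. Columns with high
    Jaccard overlap get signatures that agree in many slots, so the
    signatures can serve as hash-based index keys for relatedness."""
    sig = []
    for i in range(num_hashes):
        # Each slot keeps the minimum of a (seeded) hash over the values.
        sig.append(min(
            int(hashlib.md5(f"{i}:{v}".encode()).hexdigest(), 16)
            for v in values
        ))
    return sig

def estimated_similarity(sig_a, sig_b):
    """Fraction of agreeing slots estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Invented example columns: two overlapping "city" columns and one "year" column.
cities_a = {"london", "paris", "berlin", "madrid", "rome"}
cities_b = {"london", "paris", "berlin", "madrid", "oslo"}
years = {"1990", "1991", "1992", "1993", "1994"}

sig_a = minhash_signature(cities_a)
sig_b = minhash_signature(cities_b)
sig_y = minhash_signature(years)
```

The overlapping city columns should score far higher than cities versus years, which is the signal a dataset-discovery index exploits.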
2021-05-13
Jaafar, Fehmi, Avellaneda, Florent, Alikacem, El-Hackemi.  2020.  Demystifying the Cyber Attribution: An Exploratory Study. 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :35–40.
Current cyber attribution approaches propose to use a variety of datasets and analytical techniques to distill information that is useful for identifying cyber attackers. In practice, however, practitioners and researchers in cyber attribution face several technical and regulatory challenges. In this paper, we describe the main challenges of cyber attribution and present a state of the art of the approaches used to face these challenges. We then present an exploratory study that performs cyber attack attribution based on pattern recognition from real data. In our study, we use attack pattern discovery and identification based on real data collection and analysis.
2021-08-05
Alecakir, Huseyin, Kabukcu, Muhammet, Can, Burcu, Sen, Sevil.  2020.  Discovering Inconsistencies between Requested Permissions and Application Metadata by using Deep Learning. 2020 International Conference on Information Security and Cryptology (ISCTURKEY). :56–56.
Android gives us the opportunity to extract meaningful information from metadata. From the security point of view, missing important information in the metadata of an application could be a sign of a suspicious application, which could then be directed for extensive analysis. In particular, the usage of dangerous permissions is expected to be explained in app descriptions. The permission-to-description fidelity problem in the literature aims to discover such inconsistencies between the usage of permissions and descriptions. This study proposes a new method based on natural language processing and recurrent neural networks. The effect of user reviews on finding such inconsistencies is also investigated in addition to application descriptions. The experimental results show that high precision is obtained by the proposed solution, and that the proposed method could be used for triage of Android applications.
Ramasubramanian, Muthukumaran, Muhammad, Hassan, Gurung, Iksha, Maskey, Manil, Ramachandran, Rahul.  2020.  ES2Vec: Earth Science Metadata Keyword Assignment using Domain-Specific Word Embeddings. 2020 SoutheastCon. :1–6.
Earth science metadata keyword assignment is a challenging problem. Dataset curators select appropriate keywords from the Global Change Master Directory (GCMD) set of keywords. The keywords are an integral part of the search and discovery of these datasets. Hence, the selection of keywords is crucial in increasing the discoverability of datasets. Utilizing machine learning techniques, we provide users with automated keyword suggestions as an improved approach to complement manual selection. We trained a machine learning model that leverages the semantic embedding ability of Word2Vec models to process abstracts and suggest relevant keywords. A user interface tool we built to assist data curators in the assignment of such keywords is also described.
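The keyword-suggestion pipeline the abstract describes, embedding an abstract and comparing it against keyword embeddings, reduces to averaging word vectors and ranking by cosine similarity. Below is a toy sketch with hand-made 3-dimensional vectors standing in for a trained Word2Vec model; the vocabulary and GCMD-style keyword categories are invented for illustration:

```python
import math

# Toy embeddings standing in for a trained domain-specific Word2Vec model.
EMBEDDINGS = {
    "ocean":       [0.9, 0.1, 0.0],
    "sea":         [0.8, 0.2, 0.1],
    "temperature": [0.1, 0.9, 0.1],
    "aerosol":     [0.0, 0.1, 0.9],
}
# Hypothetical GCMD-style keywords, each described by a few seed words.
KEYWORDS = {"OCEANS": ["ocean", "sea"], "ATMOSPHERE": ["aerosol"]}

def embed(words):
    """Average the vectors of known words (bag-of-words document embedding)."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    return [sum(component) / len(vecs) for component in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def suggest(abstract_words):
    """Suggest the keyword whose embedding is closest to the abstract's."""
    doc = embed(abstract_words)
    scores = {k: cosine(doc, embed(ws)) for k, ws in KEYWORDS.items()}
    return max(scores, key=scores.get)
```

A real system would load trained vectors and rank the full keyword hierarchy; the decision rule, however, is exactly this nearest-embedding comparison.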
2021-05-18
Wei, Hanlin, Bai, Guangdong, Luo, Zongwei.  2020.  Foggy: A New Anonymous Communication Architecture Based on Microservices. 2020 25th International Conference on Engineering of Complex Computer Systems (ICECCS). :135–144.
This paper presents Foggy, an anonymous communication system focusing on providing users with anonymous web browsing. Foggy provides a microservice-based proxy for web browsing and other low-latency network activities without exposing users' metadata and browsed content to adversaries. It is designed with decentralized information management, web caching, and configurable service selection. Although Foggy seems to be more centralized compared with Tor, it gains an advantage in manageability while retaining anonymity. Foggy can be deployed by several agencies to become more decentralized. We prototype Foggy and test its performance. Our experiments show Foggy's low latency and deployability, demonstrating its potential to be a commercial solution for real-world deployment.
2021-01-11
Papadogiannaki, E., Deyannis, D., Ioannidis, S..  2020.  Head(er)Hunter: Fast Intrusion Detection using Packet Metadata Signatures. 2020 IEEE 25th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD). :1–6.
More than 75% of Internet traffic is now encrypted, and this percentage is constantly increasing. The majority of communications are secured using common encryption protocols such as SSL/TLS and IPsec to ensure security and protect the privacy of Internet users. Yet, encryption can be exploited to hide malicious activities. Traditionally, network traffic inspection is based on techniques like deep packet inspection (DPI). Common applications for DPI include but are not limited to firewalls, intrusion detection and prevention systems, L7 filtering and packet forwarding. The core functionality of such DPI implementations is based on pattern matching that enables searching for specific strings or regular expressions inside the packet contents. With the widespread adoption of network encryption, though, DPI tools that rely on packet payload content are becoming less effective, demanding the development of more sophisticated techniques in order to adapt to current network encryption trends. In this work, we present HeaderHunter, a fast signature-based intrusion detection system that remains effective even on encrypted network traffic. We generate signatures using only network packet metadata extracted from packet headers. Also, to cope with ever-increasing network speeds, we accelerate the inner computations of our proposed system using off-the-shelf GPUs.
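Signature matching on packet-header metadata, as opposed to payload bytes, can be illustrated as searching a flow for a characteristic sequence of (direction, size) pairs. The signature format, tolerance value, and traffic below are assumptions made for illustration, not HeaderHunter's actual signature language:

```python
def matches(signature, flow):
    """Check whether `flow` contains `signature` as a contiguous
    subsequence of (direction, payload_size) pairs, with a per-packet
    size tolerance since encrypted payload lengths vary slightly."""
    TOL = 16  # bytes of slack per packet (assumed value)
    n = len(signature)
    for i in range(len(flow) - n + 1):
        window = flow[i:i + n]
        if all(d == sig_d and abs(s - sig_s) <= TOL
               for (d, s), (sig_d, sig_s) in zip(window, signature)):
            return True
    return False

# Hypothetical signature: client sends ~512 B, server replies ~1400 B twice.
SIG = [("out", 512), ("in", 1400), ("in", 1400)]
benign = [("out", 100), ("in", 300), ("out", 90)]
suspect = [("out", 80), ("out", 520), ("in", 1395), ("in", 1410), ("in", 60)]
```

The point of the metadata approach is that directions and sizes survive encryption, so this kind of matching still works where payload pattern matching does not.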
2021-05-18
Niloy, Nishat Tasnim, Islam, Md. Shariful.  2020.  IntellCache: An Intelligent Web Caching Scheme for Multimedia Contents. 2020 Joint 9th International Conference on Informatics, Electronics Vision (ICIEV) and 2020 4th International Conference on Imaging, Vision Pattern Recognition (icIVPR). :1–6.
The traditional reactive web caching system is getting less popular day by day due to its inefficiency in handling the overwhelming requests for multimedia content. An intelligent web caching system intends to take optimal cache decisions by predicting future popular contents (FPC) proactively. In recent years, a few approaches have proposed intelligent caching systems concerned with proactive caching. Those works intensified the importance of FPC prediction using prediction models. However, FPC prediction alone may not yield the optimal solution in every scenario. In this paper, a technique named IntellCache has been proposed that increases caching efficiency by taking a cache decision, i.e., a content storing decision, before storing the predicted FPC. Different deep learning models, such as the multilayer perceptron (MLP), long short-term memory (LSTM) recurrent neural networks (RNN), and ConvLSTM, a combination of LSTM and convolutional neural networks (CNN), are compared to identify the most efficient model for FPC. The information on the contents of 18 years from the MovieLens data repository has been mined to evaluate the proposed approach. Results show that the proposed scheme outperforms previous solutions by achieving a higher cache hit ratio and lower average delay, and thus ensures users' satisfaction.
2021-08-05
Wang, Xiaowen, Huang, Yan.  2020.  Research on Semantic Based Metadata Method of SWIM Information Service. 2020 IEEE 2nd International Conference on Civil Aviation Safety and Information Technology (ICCASIT). :1121–1125.
Semantic metadata is an important means to promote the integration of information and services and improve the level of search and discovery automation. Aiming at the problems that machines have difficulty handling service metadata descriptions and that information metadata descriptions are lacking in current SWIM information services, this paper analyzes methods of metadata semantic empowerment and mainstream semantic metadata standards related to air traffic control systems, and constructs a SWIM information and service semantic metadata model based on semantic expansion. A method of semantic metadata model mapping is given from the two aspects of service and data, which can be used to improve the level of information sharing and intelligent processing.
2021-02-23
Patil, A., Jha, A., Mulla, M. M., Narayan, D. G., Kengond, S..  2020.  Data Provenance Assurance for Cloud Storage Using Blockchain. 2020 International Conference on Advances in Computing, Communication Materials (ICACCM). :443–448.

Cloud forensics investigates crimes committed over cloud infrastructures, like SLA violations and storage privacy breaches. Cloud storage forensics is the process of recording the history of the creation of and operations performed on a cloud data object and investigating it. Secure data provenance in the Cloud is crucial for data accountability, forensics, and privacy. Towards this, we present a Cloud-based data provenance framework using Blockchain, which traces data record operations and generates provenance data. Initially, we design a Dropbox-like application using AWS S3 storage. The application creates a cloud storage application for the students and faculty of the university, thereby making the storage and sharing of work and resources efficient. Later, we design a data provenance mechanism for confidential files of users using the Ethereum blockchain. We also evaluate the proposed system using performance parameters like query and transaction latency by varying the load and number of nodes of the blockchain network.

2021-08-12
Jaigirdar, Fariha Tasmin, Rudolph, Carsten, Bain, Chris.  2020.  Prov-IoT: A Security-Aware IoT Provenance Model. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1360–1367.
A successful application of an Internet of Things (IoT) based network depends on the accurate and successful delivery of a large amount of data collected from numerous sources. However, the highly dynamic nature of IoT networks prevents the establishment of clear security perimeters and hampers the understanding of security aspects. Risk assessment in such networks requires good situational awareness with respect to security. Therefore, a comprehensive view of data propagation including information on security controls can improve security analysis and risk assessment in each layer of data propagation in an IoT architecture. Documentation of metadata is already used in data provenance to identify who generates which data, how, and when. However, documentation of security information is not seen as relevant for data provenance graphs. In this paper, we discuss the importance of adding security metadata to a data provenance graph. We propose a novel IoT provenance model, Prov-IoT, which documents the history of data records considering data processing and aggregation along with security metadata to enable a foundation for trust in data. The model portrays a comprehensive framework and outlines the identification of information to be included in designing a security-aware provenance graph. This can be beneficial for uncovering system faults or intrusions. Also, it can be useful for decision-based systems for security analysis and risk estimation. We design an associated class diagram for the Prov-IoT model. Finally, we use an IoT healthcare example scenario to demonstrate the impact of the proposed model.
2021-05-05
Tabiban, Azadeh, Jarraya, Yosr, Zhang, Mengyuan, Pourzandi, Makan, Wang, Lingyu, Debbabi, Mourad.  2020.  Catching Falling Dominoes: Cloud Management-Level Provenance Analysis with Application to OpenStack. 2020 IEEE Conference on Communications and Network Security (CNS). :1—9.

The dynamicity and complexity of clouds highlight the importance of automated root cause analysis solutions for explaining what might have caused a security incident. Most existing works focus on either locating malfunctioning cloud components, e.g., switches, or tracing changes at lower abstraction levels, e.g., system calls. On the other hand, a management-level solution can provide a big picture about the root cause in a more scalable manner. In this paper, we propose DOMINOCATCHER, a novel provenance-based solution for explaining the root cause of security incidents in terms of management operations in clouds. Specifically, we first define our provenance model to capture the interdependencies between cloud management operations, virtual resources and inputs. Based on this model, we design a framework to intercept cloud management operations and to extract and prune provenance metadata. We implement DOMINOCATCHER on the OpenStack platform as an attached middleware and validate its effectiveness using security incidents based on real-world attacks. We also evaluate the performance through experiments on our testbed, and the results demonstrate that DOMINOCATCHER incurs insignificant overhead and is scalable for clouds.

2021-03-15
Perkins, J., Eikenberry, J., Coglio, A., Rinard, M..  2020.  Comprehensive Java Metadata Tracking for Attack Detection and Repair. 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :39–51.

We present ClearTrack, a system that tracks metadata for each primitive value in Java programs to detect and nullify a range of vulnerabilities such as integer overflow/underflow and SQL/command injection vulnerabilities. Contributions include new techniques for eliminating false positives associated with benign integer overflows and underflows, new metadata-aware techniques for detecting and nullifying SQL/command injection attacks, and results from an independent evaluation team. These results show that 1) ClearTrack operates successfully on Java programs comprising hundreds of thousands of lines of code (including instrumented jar files and Java system libraries, the majority of the applications comprise over 3 million lines of code), 2) because of computations such as cryptography and hash table calculations, these applications perform millions of benign integer overflows and underflows, and 3) ClearTrack successfully detects and nullifies all tested integer overflow/underflow and SQL/command injection vulnerabilities in the benchmark applications.
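The benign-overflow problem described above can be illustrated by tagging fixed-width arithmetic with an overflow flag, so a runtime can record the event without immediately reporting it. This is a minimal Python sketch of the general idea, not ClearTrack's JVM instrumentation:

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def checked_add(a, b):
    """Tagged 32-bit addition: returns (result, overflowed).

    The result wraps like Java `int` arithmetic, but the flag lets a
    tracking runtime distinguish benign wraps (e.g. inside a hash
    computation) from overflows that flow into security-sensitive sinks."""
    r = a + b
    if r < INT32_MIN or r > INT32_MAX:
        r = (r - INT32_MIN) % 2**32 + INT32_MIN  # wrap into 32-bit range
        return r, True
    return r, False
```

Propagating such per-value flags through a program, rather than aborting on every wrap, is what allows millions of benign overflows to coexist with precise attack detection.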

2021-08-31
Natarajan, K., Shaik, Vaheedbasha.  2020.  Transparent Data Encryption: Comparative Analysis and Performance Evaluation of Oracle Databases. 2020 Fifth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN). :137–142.
Transparent Data Encryption (TDE) can provide enormous benefits to relational databases in the aspects of data security, cryptographic encryption, and compliance. For every transaction, the stored data must be decrypted before updates are applied and encrypted again before being permanently stored back at the storage level. With this extra functionality added to the database, the general expectation is that the database (DB) will incur some performance overhead at the CPU and storage levels. However, Oracle Corporation has claimed that the TDE feature of its latest Oracle DB version 19c provides significant improvement in CPU optimization and no overhead at the storage level for data processing. Impressively, this is true, as the results of this paper also prove. Most interestingly, the results also reveal highly impacted components in the servers that have not been disclosed in any previous research work. This paper concentrates on CPU, IO, and RAM performance analysis and on identifying the bottlenecks along with possible solutions.
2021-03-22
Ogiso, S., Mohri, M., Shiraishi, Y..  2020.  Transparent Provable Data Possession Scheme for Cloud Storage. 2020 International Symposium on Networks, Computers and Communications (ISNCC). :1–5.
Provable Data Possession (PDP) is one of the data security techniques used to make sure that data stored in cloud storage actually exists. In PDP, the integrity of the data stored in the cloud storage is probabilistically verified by the user or a third-party auditor. In conventional PDP, the user creates the metadata used for auditing. From the viewpoint of user convenience, it is desirable to be able to audit without any operations other than uploading. In other words, the challenge is to provide a transparent PDP that verifies the integrity of files according to the general cloud storage system model so as not to add operations for users. We propose a scheme in which the cloud generates the metadata used during verification, and the user only uploads files. It is shown that the proposed scheme is resistant to forgery of the cloud's proof and to acquisition of the data by a third-party auditor.
2020-03-30
Bharati, Aparna, Moreira, Daniel, Brogan, Joel, Hale, Patricia, Bowyer, Kevin, Flynn, Patrick, Rocha, Anderson, Scheirer, Walter.  2019.  Beyond Pixels: Image Provenance Analysis Leveraging Metadata. 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). :1692–1702.
Creative works, whether paintings or memes, follow unique journeys that result in their final form. Understanding these journeys, a process known as "provenance analysis," provides rich insights into the use, motivation, and authenticity underlying any given work. The application of this type of study to the expanse of unregulated content on the Internet is what we consider in this paper. Provenance analysis provides a snapshot of the chronology and validity of content as it is uploaded, re-uploaded, and modified over time. Although still in its infancy, automated provenance analysis for online multimedia is already being applied to different types of content. Most current works seek to build provenance graphs based on the shared content between images or videos. This can be a computationally expensive task, especially when considering the vast influx of content that the Internet sees every day. Utilizing non-content-based information, such as timestamps, geotags, and camera IDs can help provide important insights into the path a particular image or video has traveled during its time on the Internet without large computational overhead. This paper tests the scope and applicability of metadata-based inferences for provenance graph construction in two different scenarios: digital image forensics and cultural analytics.
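A minimal example of the metadata-only inference the paper tests: within a cluster of related assets, ordering by timestamp alone already yields candidate provenance edges, with no content comparison. The asset records and grouping below are invented; real pipelines would combine timestamps with geotags, camera IDs, and content analysis:

```python
from collections import defaultdict

def provenance_edges(assets):
    """Build a provenance chain per content group using only metadata.

    Each asset is a dict with "id", "group" (e.g. a near-duplicate
    cluster id), and "timestamp". Within a group, earlier timestamps
    are assumed to precede later re-uploads/modifications."""
    groups = defaultdict(list)
    for asset in assets:
        groups[asset["group"]].append(asset)
    edges = []
    for members in groups.values():
        members.sort(key=lambda a: a["timestamp"])
        edges += [(u["id"], v["id"]) for u, v in zip(members, members[1:])]
    return edges
```

The appeal, as the abstract notes, is the negligible computational cost compared with shared-content matching across every pair of images.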
2020-12-11
Sabek, I., Chandramouli, B., Minhas, U. F..  2019.  CRA: Enabling Data-Intensive Applications in Containerized Environments. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :1762—1765.
Today, a modern data center hosts a wide variety of applications comprising batch, interactive, machine learning, and streaming applications. In this paper, we factor out the commonalities in a large majority of these applications into a generic dataflow layer called Common Runtime for Applications (CRA). In parallel, another trend, containerization technologies (e.g., Docker), has taken a serious hold on cloud-scale data centers, with direct implications on building the next generation of data center applications. Container orchestrators (e.g., Kubernetes) have made deployment a lot easier, and they solve many infrastructure-level problems, e.g., service discovery, auto-restart, and replication. For best-in-class performance, there is a need to marry the next generation of applications with containerization technologies. To that end, CRA leverages and builds upon the containerization and resource orchestration capabilities of Kubernetes/Docker, and makes it easy to build a wide range of cloud-edge applications on top. To the best of our knowledge, we are the first to present a cloud-native runtime for building data center applications. We show the efficiency of CRA through various micro-benchmarking experiments.
2020-11-30
Georgakopoulos, D..  2019.  A Global IoT Device Discovery and Integration Vision. 2019 IEEE 5th International Conference on Collaboration and Internet Computing (CIC). :214–221.
This paper presents the vision of establishing a global service for Global IoT Device Discovery and Integration (GIDDI). The establishment of a GIDDI will: (1) make IoT application development more efficient and cost-effective by enabling sharing and reuse of existing IoT devices owned and maintained by different providers, and (2) promote deployment of new IoT devices supported by a revenue generation scheme for their providers. More specifically, this paper proposes a distributed IoT blockchain ledger that is specifically designed for managing the metadata needed to describe IoT devices and the data they produce. This GIDDI Blockchain is Internet-owned (i.e., it is not controlled by any individual or organization) and Internet-scaled (i.e., it can support the discovery and reuse of billions of IoT devices). The paper also proposes a GIDDI Marketplace that provides the functionality needed for IoT device registration, query, integration, payment, and security via the proposed GIDDI Blockchain. We outline the GIDDI Blockchain and Marketplace implementation. We also discuss ongoing research on automatically mining the metadata needed for IoT device query and integration from the data the devices produce. This significantly reduces the need for IoT device providers to supply metadata descriptions of the devices and the data they produce when registering IoT devices in the GIDDI Blockchain.
2020-09-14
Wu, Pengfei, Deng, Robert, Shen, Qingni, Liu, Ximeng, Li, Qi, Wu, Zhonghai.  2019.  ObliComm: Towards Building an Efficient Oblivious Communication System. IEEE Transactions on Dependable and Secure Computing. :1–1.
Anonymous Communication (AC) hides traffic patterns and protects message metadata from being leaked during message transmission. Many practical AC systems have been proposed aiming to reduce communication latency and support a large number of users. However, how to design AC systems that possess strong security properties and at the same time achieve optimal performance (i.e., the lowest latency or highest horizontal scalability) has been a challenging problem. In this paper, we propose the ObliComm framework, which consists of six modular AC subroutines. We also present a strong security definition for AC, named oblivious communication, encompassing confidentiality, unobservability, and a new requirement, sending-and-receiving operation hiding. The AC subroutines in ObliComm allow for modular construction of oblivious communication systems in different network topologies. All constructed systems satisfy the oblivious communication definition and can be proven secure in the universal composability (UC) framework. Additionally, we model the relationship between network topology and communication measurements by queuing theory, which enables the system's efficiency to be optimized and estimated by quantitative analysis and calculation. Through theoretical analyses and empirical experiments, we demonstrate the efficiency of our scheme and the soundness of the queuing model.
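As a taste of the queuing-theory modeling mentioned in the abstract, the closed-form means of a single M/M/1 queue show how latency can be estimated from arrival and service rates. This is a textbook formula used for illustration, not ObliComm's full multi-node topology model:

```python
def mm1_metrics(arrival_rate, service_rate):
    """Mean metrics of an M/M/1 queue (Poisson arrivals, exponential service).

    Returns (rho, W, L): utilization, mean time in system, and mean
    number in system (via Little's law, L = lambda * W)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be < service rate")
    rho = arrival_rate / service_rate
    W = 1.0 / (service_rate - arrival_rate)  # mean sojourn time
    L = arrival_rate * W                     # Little's law
    return rho, W, L
```

Composing such per-node estimates over a topology is what lets a designer trade latency against scalability before deploying anything.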
2020-12-11
Kumar, S., Vasthimal, D. K..  2019.  Raw Cardinality Information Discovery for Big Datasets. 2019 IEEE 5th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS). :200–205.
Real-time discovery of all the different types of unique attributes within unstructured data is a challenging problem to solve when dealing with multiple petabytes of unstructured data volume every day. Popular discovery solutions, such as the creation of offline jobs to uniquely identify attributes or running aggregation queries on raw data sets, limit real-time discovery use-cases and often result in poor resource utilization. The discovery information must be treated as a parallel problem to just storing raw data sets efficiently on back-end big data systems. Solving the discovery problem by creating a parallel discovery data store infrastructure has multiple benefits, as it allows the actual search queries against the raw data set to be channeled in a much more funneled manner instead of being widespread across the entire data set. Such focused search queries and data separation are far more performant and require less compute and memory footprint.
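The parallel discovery-store idea, maintaining a small index of unique attributes alongside raw ingestion so that search can be funneled to the relevant datasets, can be sketched as follows. The structure and method names are illustrative, not the authors' system:

```python
class DiscoveryStore:
    """Discovery index maintained in parallel with raw ingestion.

    Tracks which unique attribute names have been seen and in which
    datasets, so a search for an attribute can be routed only to the
    datasets that actually contain it, instead of scanning everything."""

    def __init__(self):
        self.attributes = {}  # attribute name -> set of dataset ids

    def ingest(self, dataset_id, record):
        """Called on the ingestion path: register each attribute of a record."""
        for attr in record:
            self.attributes.setdefault(attr, set()).add(dataset_id)

    def datasets_with(self, attr):
        """Funnel step: which datasets should a query for `attr` touch?"""
        return self.attributes.get(attr, set())
```

The index is tiny relative to the raw data (one entry per unique attribute), which is where the resource-utilization benefit comes from.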
2020-02-10
Prout, Andrew, Arcand, William, Bestor, David, Bergeron, Bill, Byun, Chansup, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jones, Michael, Klein, Anna et al..  2019.  Securing HPC using Federated Authentication. 2019 IEEE High Performance Extreme Computing Conference (HPEC). :1–7.
Federated authentication can drastically reduce the overhead of basic account maintenance while simultaneously improving overall system security. Integrating with the user's more frequently used account at their primary organization both provides a better experience to the end user and makes account compromise or changes in affiliation more likely to be noticed and acted upon. Additionally, with many organizations transitioning to multi-factor authentication for all account access, the ability to leverage external federated identity management systems provides the benefit of their efforts without the additional overhead of separately implementing a distinct multi-factor authentication process. This paper describes our experiences and the lessons we learned by enabling federated authentication with the U.S. Government PKI and the InCommon Federation, scaling it up to the user base of a production HPC system, and the motivations behind those choices. We have received only positive feedback from our users.
2020-02-18
Hasslinger, Gerhard, Ntougias, Konstantinos, Hasslinger, Frank, Hohlfeld, Oliver.  2019.  Fast and Efficient Web Caching Methods Regarding the Size and Performance Measures per Data Object. 2019 IEEE 24th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD). :1–7.

Caching methods have been developed for 50 years for paging in CPU and database systems, and for 25 years for web caching as a main application area, among others. Pages of uniform size are usual in CPU caches, whereas web caches store data chunks of different sizes in a widely varying range. We study the impact of different object sizes on the performance and the overhead of web caching. This entails different caching goals, starting from the byte and object hit ratio to a generalized value hit ratio for optimized costs and benefits of caching regarding traffic engineering (TE), reduced delays, and other QoS measures. The selection of the cache contents turns out to be crucial for web cache efficiency, with awareness of the size and other properties in a score for each object. We introduce a new class of rank exchange caching methods and show how their performance compares to other strategies, with extensions needed to include the size and scores for QoS and TE caching goals. Finally, we derive bounds on the object, byte, and value hit ratio for the independent request model (IRM) based on optimum knapsack solutions of the cache content.
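The value-hit-ratio view of cache selection is a knapsack problem: each object has a size and a score, and the cache capacity is the knapsack bound. A greedy score-density heuristic sketches the idea (the bounds in the paper use exact knapsack solutions under IRM; this is only the standard approximation):

```python
def select_cache_contents(objects, capacity):
    """Greedy knapsack heuristic for cache content selection.

    Rank objects by score density (value per byte) and fill the cache
    up to `capacity`. Each object is a dict with "id", "size", "score"
    (the score could combine popularity, QoS, and TE benefit)."""
    chosen, used = [], 0
    for obj in sorted(objects, key=lambda o: o["score"] / o["size"],
                      reverse=True):
        if used + obj["size"] <= capacity:
            chosen.append(obj["id"])
            used += obj["size"]
    return chosen
```

With uniform object sizes this degenerates to ranking by score alone, which is why size-awareness matters specifically for web caches.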

2020-03-30
Miao, Hui, Deshpande, Amol.  2019.  Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :1710–1713.
Increasingly, modern data science platforms today have non-intrusive and extensible provenance ingestion mechanisms to collect rich provenance and context information, handle modifications to the same file using distinguishable versions, and use graph data models (e.g., property graphs) and query languages (e.g., Cypher) to represent and manipulate the stored provenance/context information. Due to the schema-later nature of the metadata, multiple versions of the same files, and unfamiliar artifacts introduced by team members, the resulting "provenance graphs" are quite verbose and evolving; further, it is very difficult for users to compose queries and utilize this valuable information using just the standard graph query model. In this paper, we propose two high-level graph query operators to address the verboseness and evolving nature of such provenance graphs. First, we introduce a graph segmentation operator, which queries the retrospective provenance between a set of source vertices and a set of destination vertices via flexible boundary criteria to help users get insight about the derivation relationships among those vertices. We show the semantics of such a query in terms of a context-free grammar, and develop efficient algorithms that run orders of magnitude faster than the state-of-the-art. Second, we propose a graph summarization operator that combines similar segments together to query prospective provenance of the underlying project. The operator allows tuning the summary by ignoring vertex details and characterizing local structures, and ensures the provenance meaning using path constraints. We show the optimal summary problem is PSPACE-complete and develop effective approximation algorithms. We implement the operators on top of Neo4j, evaluate our query techniques extensively, and show the effectiveness and efficiency of the proposed methods.
2020-12-11
Zhang, W., Byna, S., Niu, C., Chen, Y..  2019.  Exploring Metadata Search Essentials for Scientific Data Management. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC). :83–92.

Scientific experiments and observations store massive amounts of data in various scientific file formats. Metadata, which describes the characteristics of the data, is commonly used to sift through massive datasets in order to locate data of interest to scientists. Several indexing data structures (such as hash tables, tries, self-balancing search trees, sparse arrays, etc.) have been developed as part of efforts to provide an efficient method for locating target data. However, the efficient determination of an indexing data structure remains unclear in the context of scientific data management, due to the lack of investigation of metadata, metadata queries, and corresponding data structures. In this study, we perform a systematic study of the metadata search essentials in the context of scientific data management. We study a real-world astronomy observation dataset and explore the characteristics of the metadata in the dataset. We also study possible metadata queries based on the discovered metadata characteristics and evaluate different data structures for various types of metadata attributes. Our evaluation on a real-world dataset suggests that a trie is a suitable data structure when prefix/suffix queries are required; otherwise a hash table should be used. We conclude our study with a summary of our findings. These findings provide a guideline and offer insights for developing metadata indexing methodologies for scientific applications.
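The prefix-query advantage of tries reported in the evaluation is easy to see in a minimal implementation: a hash table answers exact lookups in O(1) but must scan every key for a prefix match, while a trie walks directly to the prefix node and enumerates only its subtree. An illustrative sketch with invented astronomy-style keys:

```python
class Trie:
    """Minimal trie supporting the prefix queries that favor tries
    over hash tables for metadata attribute search."""

    def __init__(self):
        self.root = {}

    def insert(self, key):
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node["$"] = True  # end-of-key marker

    def keys_with_prefix(self, prefix):
        # Walk down to the node for `prefix`, then collect its subtree.
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        out = []

        def walk(n, acc):
            if "$" in n:
                out.append(prefix + acc)
            for ch, child in n.items():
                if ch != "$":
                    walk(child, acc + ch)

        walk(node, "")
        return out
```

For exact lookups only, a plain dict would do the same job with less memory, which matches the study's "otherwise use a hash table" conclusion.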

2020-03-30
Mao, Huajian, Chi, Chenyang, Yu, Jinghui, Yang, Peixiang, Qian, Cheng, Zhao, Dongsheng.  2019.  QRStream: A Secure and Convenient Method for Text Healthcare Data Transferring. 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). :3458–3462.
With increasing health awareness, users are becoming more and more interested in their daily health information and the healthcare activity results from healthcare organizations, and they try to collect these together for better usage. Traditionally, healthcare data is delivered in paper format from the healthcare organizations, which is not easy or convenient for data usage and management. Users would have to translate these data on paper to a digital version, which would probably introduce mistakes into the data. A secure and convenient method for electronic health data transfer between users and healthcare organizations is therefore necessary. However, because of security and privacy problems, almost no healthcare organization provides a stable and full service for health data delivery. In this paper, we propose a secure and convenient method, QRStream, which splits original health data and loads it onto a QR code frame stream for data transfer. The results show that QRStream can transfer text health data smoothly with acceptable performance, for example, transferring 10K of data in 10 seconds.
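The splitting step the abstract describes, dividing health data across a stream of QR frames with enough header information to reassemble them, can be sketched as follows. The frame format and frame size here are assumptions for illustration, and the actual QR encoding/decoding is out of scope:

```python
import base64
import math

def split_frames(data: bytes, frame_size: int = 256):
    """Split data into numbered text frames sized to fit one QR code each.

    Each frame carries an "index/total|" header so the receiver can
    detect missing frames and reassemble in order regardless of the
    order in which frames are scanned."""
    payload = base64.b64encode(data).decode()
    total = math.ceil(len(payload) / frame_size)
    return [f"{i}/{total}|{payload[i * frame_size:(i + 1) * frame_size]}"
            for i in range(total)]

def reassemble(frames):
    """Rebuild the original bytes from frames received in any order."""
    parts, total = {}, None
    for frame in frames:
        header, chunk = frame.split("|", 1)
        i, total = (int(x) for x in header.split("/"))
        parts[i] = chunk
    assert total is not None and len(parts) == total, "missing frames"
    return base64.b64decode("".join(parts[i] for i in range(total)))
```

Keeping the index/total header inside every frame is what makes the stream robust to dropped or out-of-order scans on the receiving camera.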
2020-07-10
Ra, Gyeong-Jin, Lee, Im-Yeong.  2019.  A Study on Hybrid Blockchain-based XGS (XOR Global State) Injection Technology for Efficient Contents Modification and Deletion. 2019 Sixth International Conference on Software Defined Systems (SDS). :300–305.

Blockchain is a database technology that provides integrity and trust in the system: being an append-only distributed ledger, it does not permit arbitrary modifications and deletions. That is, blockchain follows not a modify-or-delete model but a CRAB (Create-Retrieve-Append-Burn) method in which data can be read and written according to a legitimate user's access right (for example, the owner's private key). However, data that has been created can never be deleted, which causes problems such as privacy breaches. In this paper, we propose an on-off-chain Hybrid Blockchain system that separates the data from the ledger and saves the connection history to the blockchain. In addition, the state is changed in a distributed database separately from the ledger record, by generating an arbitrary injection in XOR form, so that the history of modification/deletion on the off-chain side can be efficiently retrieved.
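One way to read the XOR-injection idea: keep only masked content plus an XOR pad off-chain, so that "burning" the pad makes the record unrecoverable while every ledger entry remains append-only. The sketch below follows that interpretation and is not the paper's exact XGS construction:

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class OffChainStore:
    """Off-chain store holding content XORed with a per-record pad.

    The blockchain (not modeled here) would keep only the immutable
    connection history. "Burn" discards the pad: the masked record and
    all ledger entries stay append-only, but the plaintext becomes
    unrecoverable, approximating deletion for privacy purposes."""

    def __init__(self):
        self.records = {}  # record id -> (masked content, pad or None)

    def create(self, rec_id, content: bytes):
        pad = secrets.token_bytes(len(content))
        self.records[rec_id] = (xor_bytes(content, pad), pad)

    def retrieve(self, rec_id):
        masked, pad = self.records[rec_id]
        return None if pad is None else xor_bytes(masked, pad)

    def burn(self, rec_id):
        masked, _ = self.records[rec_id]
        self.records[rec_id] = (masked, None)  # pad destroyed
```

Because a one-time XOR pad is information-theoretically hiding, destroying the pad is as good as destroying the record, without ever rewriting the append-only ledger.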