Visible to the public Biblio

Filters: Keyword is Provenance  [Clear All Filters]
2019-09-23
Ramijak, Dusan, Pal, Amitangshu, Kant, Krishna.  2018.  Pattern Mining Based Compression of IoT Data. Proceedings of the Workshop Program of the 19th International Conference on Distributed Computing and Networking. :12:1–12:6.
The increasing proliferation of the Internet of Things (IoT) devices and systems result in large amounts of highly heterogeneous data to be collected. Although at least some of the collected sensor data is often consumed by the real-time decision making and control of the IoT system, that is not the only use of such data. Invariably, the collected data is stored, perhaps in some filtered or downselected fashion, so that it can be used for a variety of lower-frequency operations. It is expected that in a smart city environment with numerous IoT deployments, the volume of such data can become enormous. Therefore, mechanisms for lossy data compression that provide a trade-off between compression ratio and data usefulness for offline statistical analysis becomes necessary. In this paper, we discuss several simple pattern mining based compression strategies for multi-attribute IoT data streams. For each method, we evaluate the compressibility of the method vs. the level of similarity between original and compressed time series in the context of the home energy management system.
Psallidas, Fotis, Wu, Eugene.  2018.  Provenance for Interactive Visualizations. Proceedings of the Workshop on Human-In-the-Loop Data Analytics. :9:1–9:8.
We highlight the connections between data provenance and interactive visualizations. To do so, we first incrementally add interactions to a visualization and show how these interactions are readily expressible in terms of provenance. We then describe how an interactive visualization system that natively supports provenance can be easily extended with novel interactions.
Kalokyri, Varvara, Borgida, Alexander, Marian, Amélie.  2018.  YourDigitalSelf: A Personal Digital Trace Integration Tool. Proceedings of the 27th ACM International Conference on Information and Knowledge Management. :1963–1966.
Personal information is typically fragmented across multiple, heterogeneous, distributed sources and saved as small, heterogeneous data objects, or traces. The DigitalSelf project at Rutgers University focuses on developing tools and techniques to manage (organize, search, summarize, make inferences on and personalize) such heterogeneous collections of personal digital traces. We propose to demonstrate YourDigitalSelf, a mobile phone-based personal information organization application developed as part of the DigitalSelf project. The demonstration will use a sample user data set to show how several disparate data traces can be integrated and combined to create personal narratives, or coherent episodes, of the user's activities. Conference attendees will be given the option to install YourDigitalSelf on their own devices to interact with their own data.
Pham, Quan, Malik, Tanu, That, Dai Hai Ton, Youngdahl, Andrew.  2018.  Improving Reproducibility of Distributed Computational Experiments. Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems. :2:1–2:6.
Conference and journal publications increasingly require experiments associated with a submitted article to be repeatable. Authors comply to this requirement by sharing all associated digital artifacts, i.e., code, data, and environment configuration scripts. To ease aggregation of the digital artifacts, several tools have recently emerged that automate the aggregation of digital artifacts by auditing an experiment execution and building a portable container of code, data, and environment. However, current tools only package non-distributed computational experiments. Distributed computational experiments must either be packaged manually or supplemented with sufficient documentation. In this paper, we outline the reproducibility requirements of distributed experiments using a distributed computational science experiment involving use of message-passing interface (MPI), and propose a general method for auditing and repeating distributed experiments. Using Sciunit we show how this method can be implemented. We validate our method with initial experiments showing application re-execution runtime can be improved by 63% with a trade-off of longer run-time on initial audit execution.
Stodden, Victoria, Krafczyk, Matthew S., Bhaskar, Adhithya.  2018.  Enabling the Verification of Computational Results: An Empirical Evaluation of Computational Reproducibility. Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems. :3:1–3:5.
The ability to independently regenerate published computational claims is widely recognized as a key component of scientific reproducibility. In this article we take a narrow interpretation of this goal, and attempt to regenerate published claims from author-supplied information, including data, code, inputs, and other provided specifications, on a different computational system than that used by the original authors. We are motivated by Claerbout and Donoho's exhortation of the importance of providing complete information for reproducibility of the published claim. We chose the Elsevier journal, the Journal of Computational Physics, which has stated author guidelines that encourage the availability of computational digital artifacts that support scholarly findings. In an IRB approved study at the University of Illinois at Urbana-Champaign (IRB \#17329) we gathered artifacts from a sample of authors who published in this journal in 2016 and 2017. We then used the ICERM criteria generated at the 2012 ICERM workshop "Reproducibility in Computational and Experimental Mathematics" to evaluate the sufficiency of the information provided in the publications and the ease with which the digital artifacts afforded computational reproducibility. We find that, for the articles for which we obtained computational artifacts, we could not easily regenerate the findings for 67% of them, and we were unable to easily regenerate all the findings for any of the articles. We then evaluated the artifacts we did obtain (55 of 306 articles) and find that the main barriers to computational reproducibility are inadequate documentation of code, data, and workflow information (70.9%), missing code function and setting information, and missing licensing information (75%). We recommend improvements based on these findings, including the deposit of supporting digital artifacts for reproducibility as a condition of publication, and verification of computational findings via re-execution of the code when possible.
Whittaker, Michael, Teodoropol, Cristina, Alvaro, Peter, Hellerstein, Joseph M..  2018.  Debugging Distributed Systems with Why-Across-Time Provenance. Proceedings of the ACM Symposium on Cloud Computing. :333–346.
Systematically reasoning about the fine-grained causes of events in a real-world distributed system is challenging. Causality, from the distributed systems literature, can be used to compute the causal history of an arbitrary event in a distributed system, but the event's causal history is an over-approximation of the true causes. Data provenance, from the database literature, precisely describes why a particular tuple appears in the output of a relational query, but data provenance is limited to the domain of static relational databases. In this paper, we present wat-provenance: a novel form of provenance that provides the benefits of causality and data provenance. Given an arbitrary state machine, wat-provenance describes why the state machine produces a particular output when given a particular input. This enables system developers to reason about the causes of events in real-world distributed systems. We observe that automatically extracting the wat-provenance of a state machine is often infeasible. Fortunately, many distributed systems components have simple interfaces from which a developer can directly specify wat-provenance using a technique we call wat-provenance specifications. Leveraging the theoretical foundations of wat-provenance, we implement a prototype distributed debugging framework called Watermelon.
Psallidas, Fotis, Wu, Eugene.  2018.  Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications. Proceedings of the 2018 International Conference on Management of Data. :1781–1784.
Data lineage is a fundamental type of information that describes the relationships between input and output data items in a workflow. As such, an immense amount of data-intensive applications with logic over the input-output relationships can be expressed declaratively in lineage terms. Unfortunately, many applications resort to hand-tuned implementations because either lineage systems are not fast enough to meet their requirements or due to no knowledge of the lineage capabilities. Recently, we introduced a set of implementation design principles and associated techniques to optimize lineage-enabled database engines and realized them in our prototype database engine, namely, Smoke. In this demonstration, we showcase lineage as the building block across a variety of data-intensive applications, including tooltips and details on demand; crossfilter; and data profiling. In addition, we show how Smoke outperforms alternative lineage systems to meet or improve on existing hand-tuned implementations of these applications.
Chen, W., Liang, X., Li, J., Qin, H., Mu, Y., Wang, J..  2018.  Blockchain Based Provenance Sharing of Scientific Workflows. 2018 IEEE International Conference on Big Data (Big Data). :3814–3820.
In a research community, the provenance sharing of scientific workflows can enhance distributed research cooperation, experiment reproducibility verification and experiment repeatedly doing. Considering that scientists in such a community are often in a loose relation and distributed geographically, traditional centralized provenance sharing architectures have shown their disadvantages in poor trustworthiness, reliabilities and efficiency. Additionally, they are also difficult to protect the rights and interests of data providers. All these have been largely hindering the willings of distributed scientists to share their workflow provenance. Considering the big advantages of blockchain in decentralization, trustworthiness and high reliability, an approach to sharing scientific workflow provenance based on blockchain in a research community is proposed. To make the approach more practical, provenance is handled on-chain and original data is delivered off-chain. A kind of block structure to support efficient provenance storing and retrieving is designed, and an algorithm for scientists to search workflow segments from provenance as well as an algorithm for experiments backtracking are provided to enhance the experiment result sharing, save computing resource and time cost by avoiding repeated experiments as far as possible. Analyses show that the approach is efficient and effective.
Yazici, I. M., Karabulut, E., Aktas, M. S..  2018.  A Data Provenance Visualization Approach. 2018 14th International Conference on Semantics, Knowledge and Grids (SKG). :84–91.
Data Provenance has created an emerging requirement for technologies that enable end users to access, evaluate, and act on the provenance of data in recent years. In the era of Big Data, the amount of data created by corporations around the world has grown each year. As an example, both in the Social Media and e-Science domains, data is growing at an unprecedented rate. As the data has grown rapidly, information on the origin and lifecycle of the data has also grown. In turn, this requires technologies that enable the clarification and interpretation of data through the use of data provenance. This study proposes methodologies towards the visualization of W3C-PROV-O Specification compatible provenance data. The visualizations are done by summarization and comparison of the data provenance. We facilitated the testing of these methodologies by providing a prototype, extending an existing open source visualization tool. We discuss the usability of the proposed methodologies with an experimental study; our initial results show that the proposed approach is usable, and its processing overhead is negligible.
Suriarachchi, I., Withana, S., Plale, B..  2018.  Big Provenance Stream Processing for Data Intensive Computations. 2018 IEEE 14th International Conference on e-Science (e-Science). :245–255.
In the business and research landscape of today, data analysis consumes public and proprietary data from numerous sources, and utilizes any one or more of popular data-parallel frameworks such as Hadoop, Spark and Flink. In the Data Lake setting these frameworks co-exist. Our earlier work has shown that data provenance in Data Lakes can aid with both traceability and management. The sheer volume of fine-grained provenance generated in a multi-framework application motivates the need for on-the-fly provenance processing. We introduce a new parallel stream processing algorithm that reduces fine-grained provenance while preserving backward and forward provenance. The algorithm is resilient to provenance events arriving out-of-order. It is evaluated using several strategies for partitioning a provenance stream. The evaluation shows that the parallel algorithm performs well in processing out-of-order provenance streams, with good scalability and accuracy.
Zheng, N., Alawini, A., Ives, Z. G..  2019.  Fine-Grained Provenance for Matching ETL. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :184–195.
Data provenance tools capture the steps used to produce analyses. However, scientists must choose among workflow provenance systems, which allow arbitrary code but only track provenance at the granularity of files; provenance APIs, which provide tuple-level provenance, but incur overhead in all computations; and database provenance tools, which track tuple-level provenance through relational operators and support optimization, but support a limited subset of data science tasks. None of these solutions are well suited for tracing errors introduced during common ETL, record alignment, and matching tasks - for data types such as strings, images, etc. Scientists need new capabilities to identify the sources of errors, find why different code versions produce different results, and identify which parameter values affect output. We propose PROVision, a provenance-driven troubleshooting tool that supports ETL and matching computations and traces extraction of content within data objects. PROVision extends database-style provenance techniques to capture equivalences, support optimizations, and enable selective evaluation. We formalize our extensions, implement them in the PROVision system, and validate their effectiveness and scalability for common ETL and matching tasks.
2019-03-04
Krishnamurthy, R., Meinel, M., Haupt, C., Schreiber, A., Mader, P..  2018.  DLR Secure Software Engineering. 2018 IEEE/ACM 1st International Workshop on Security Awareness from Design to Deployment (SEAD). :49–50.
DLR as research organization increasingly faces the task to share its self-developed software with partners or publish openly. Hence, it is very important to harden the softwares to avoid opening attack vectors. Especially since DLR software is typically not developed by software engineering or security experts. In this paper we describe the data-oriented approach of our new found secure software engineering group to improve the software development process towards more secure software. Therefore, we have a look at the automated security evaluation of software as well as the possibilities to capture information about the development process. Our aim is to use our information sources to improve software development processes to produce high quality secure software.
2019-02-08
Ioini, N. E., Pahl, C..  2018.  Trustworthy Orchestration of Container Based Edge Computing Using Permissioned Blockchain. 2018 Fifth International Conference on Internet of Things: Systems, Management and Security. :147-154.
The need to process the verity, volume and velocity of data generated by today's Internet of Things (IoT) devices has pushed both academia and the industry to investigate new architectural alternatives to support the new challenges. As a result, Edge Computing (EC) has emerged to address these issues, by placing part of the cloud resources (e.g., computation, storage, logic) closer to the edge of the network, which allows faster and context dependent data analysis and storage. However, as EC infrastructures grow, different providers who do not necessarily trust each other need to collaborate in order serve different IoT devices. In this context, EC infrastructures, IoT devices and the data transiting the network all need to be subject to identity and provenance checks, in order to increase trust and accountability. Each device/data in the network needs to be identified and the provenance of its actions needs to be tracked. In this paper, we propose a blockchain container based architecture that implements the W3C-PROV Data Model, to track identities and provenance of all orchestration decisions of a business network. This architecture provides new forms of interaction between the different stakeholders, which supports trustworthy transactions and leads to a new decentralized interaction model for IoT based applications.
2018-07-06
Baracaldo, Nathalie, Chen, Bryant, Ludwig, Heiko, Safavi, Jaehoon Amir.  2017.  Mitigating Poisoning Attacks on Machine Learning Models: A Data Provenance Based Approach. Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. :103–110.
The use of machine learning models has become ubiquitous. Their predictions are used to make decisions about healthcare, security, investments and many other critical applications. Given this pervasiveness, it is not surprising that adversaries have an incentive to manipulate machine learning models to their advantage. One way of manipulating a model is through a poisoning or causative attack in which the adversary feeds carefully crafted poisonous data points into the training set. Taking advantage of recently developed tamper-free provenance frameworks, we present a methodology that uses contextual information about the origin and transformation of data points in the training set to identify poisonous data, thereby enabling online and regularly re-trained machine learning applications to consume data sources in potentially adversarial environments. To the best of our knowledge, this is the first approach to incorporate provenance information as part of a filtering algorithm to detect causative attacks. We present two variations of the methodology - one tailored to partially trusted data sets and the other to fully untrusted data sets. Finally, we evaluate our methodology against existing methods to detect poison data and show an improvement in the detection rate.
2018-03-26
Pandey, M., Pandey, R., Chopra, U. K..  2017.  Rendering Trustability to Semantic Web Applications-Manchester Approach. 2017 International Conference on Infocom Technologies and Unmanned Systems (Trends and Future Directions) (ICTUS). :255–259.

The Semantic Web today is a web that allows for intelligent knowledge retrieval by means of semantically annotated tags. This web also known as Intelligent web aims to provide meaningful information to man and machines equally. However, the information thus provided lacks the component of trust. Therefore we propose a method to embed trust in semantic web documents by the concept of provenance which provides answers to who, when, where and by whom the documents were created or modified. This paper demonstrates the same using the Manchester approach of provenance implemented in a University Ontology.

2017-12-12
Miller, J. A., Peng, H., Cotterell, M. E..  2017.  Adding Support for Theory in Open Science Big Data. 2017 IEEE World Congress on Services (SERVICES). :71–75.

Open Science Big Data is emerging as an important area of research and software development. Although there are several high quality frameworks for Big Data, additional capabilities are needed for Open Science Big Data. These include data provenance, citable reusable data, data sources providing links to research literature, relationships to other data and theories, transparent analysis/reproducibility, data privacy, new optimizations/advanced algorithms, data curation, data storage and transfer. An important part of science is explanation of results, ideally leading to theory formation. In this paper, we examine means for supporting the use of theory in big data analytics as well as using big data to assist in theory formation. One approach is to fit data in a way that is compatible with some theory, existing or new. Functional Data Analysis allows precise fitting of data as well as penalties for lack of smoothness or even departure from theoretical expectations. This paper discusses principal differential analysis and related techniques for fitting data where, for example, a time-based process is governed by an ordinary differential equation. Automation in theory formation is also considered. Case studies in the fields of computational economics and finance are considered.

Jiang, L., Kuhn, W., Yue, P..  2017.  An interoperable approach for Sensor Web provenance. 2017 6th International Conference on Agro-Geoinformatics. :1–6.

The Sensor Web is evolving into a complex information space, where large volumes of sensor observation data are often consumed by complex applications. Provenance has become an important issue in the Sensor Web, since it allows applications to answer “what”, “when”, “where”, “who”, “why”, and “how” queries related to observations and consumption processes, which helps determine the usability and reliability of data products. This paper investigates characteristics and requirements of provenance in the Sensor Web and proposes an interoperable approach to building a provenance model for the Sensor Web. Our provenance model extends the W3C PROV Data Model with Sensor Web domain vocabularies. It is developed using Semantic Web technologies and thus allows provenance information of sensor observations to be exposed in the Web of Data using the Linked Data approach. A use case illustrates the applicability of the approach.

Suh, Y. K., Ma, J..  2017.  SuperMan: A Novel System for Storing and Retrieving Scientific-Simulation Provenance for Efficient Job Executions on Computing Clusters. 2017 IEEE 2nd International Workshops on Foundations and Applications of Self* Systems (FAS*W). :283–288.

Compute-intensive simulations typically charge substantial workloads on an online simulation platform backed by limited computing clusters and storage resources. Some (or most) of the simulations initiated by users may accompany input parameters/files that have been already provided by other (or same) users in the past. Unfortunately, these duplicate simulations may aggravate the performance of the platform by drastic consumption of the limited resources shared by a number of users on the platform. To minimize or avoid conducting repeated simulations, we present a novel system, called SUPERMAN (SimUlation ProvEnance Recycling MANager) that can record simulation provenances and recycle the results of past simulations. This system presents a great opportunity to not only reutilize existing results but also perform various analytics helpful for those who are not familiar with the platform. The system also offers interoperability across other systems by collecting the provenances in a standardized format. In our simulated experiments we found that over half of past computing jobs could be answered without actual executions by our system.

Stephan, E., Raju, B., Elsethagen, T., Pouchard, L., Gamboa, C..  2017.  A scientific data provenance harvester for distributed applications. 2017 New York Scientific Data Summit (NYSDS). :1–9.

Data provenance provides a way for scientists to observe how experimental data originates, conveys process history, and explains influential factors such as experimental rationale and associated environmental factors from system metrics measured at runtime. The US Department of Energy Office of Science Integrated end-to-end Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD) project has developed a provenance harvester that is capable of collecting observations from file based evidence typically produced by distributed applications. To achieve this, file based evidence is extracted and transformed into an intermediate data format inspired in part by W3C CSV on the Web recommendations, called the Harvester Provenance Application Interface (HAPI) syntax. This syntax provides a general means to pre-stage provenance into messages that are both human readable and capable of being written to a provenance store, Provenance Environment (ProvEn). HAPI is being applied to harvest provenance from climate ensemble runs for Accelerated Climate Modeling for Energy (ACME) project funded under the U.S. Department of Energy's Office of Biological and Environmental Research (BER) Earth System Modeling (ESM) program. ACME informally provides provenance in a native form through configuration files, directory structures, and log files that contain success/failure indicators, code traces, and performance measurements. Because of its generic format, HAPI is also being applied to harvest tabular job management provenance from Belle II DIRAC scheduler relational database tables as well as other scientific applications that log provenance related information.

Dai, D., Chen, Y., Carns, P., Jenkins, J., Ross, R..  2017.  Lightweight Provenance Service for High-Performance Computing. 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT). :117–129.

Provenance describes detailed information about the history of a piece of data, containing the relationships among elements such as users, processes, jobs, and workflows that contribute to the existence of data. Provenance is key to supporting many data management functionalities that are increasingly important in operations such as identifying data sources, parameters, or assumptions behind a given result; auditing data usage; or understanding details about how inputs are transformed into outputs. Despite its importance, however, provenance support is largely underdeveloped in highly parallel architectures and systems. One major challenge is the demanding requirements of providing provenance service in situ. The need to remain lightweight and to be always on often conflicts with the need to be transparent and offer an accurate catalog of details regarding the applications and systems. To tackle this challenge, we introduce a lightweight provenance service, called LPS, for high-performance computing (HPC) systems. LPS leverages a kernel instrument mechanism to achieve transparency and introduces representative execution and flexible granularity to capture comprehensive provenance with controllable overhead. Extensive evaluations and use cases have confirmed its efficiency and usability. We believe that LPS can be integrated into current and future HPC systems to support a variety of data management needs.

Bertino, E., Kantarcioglu, M..  2017.  A Cyber-Provenance Infrastructure for Sensor-Based Data-Intensive Applications. 2017 IEEE International Conference on Information Reuse and Integration (IRI). :108–114.

Summary form only given. Strong light-matter coupling has been recently successfully explored in the GHz and THz [1] range with on-chip platforms. New and intriguing quantum optical phenomena have been predicted in the ultrastrong coupling regime [2], when the coupling strength Ω becomes comparable to the unperturbed frequency of the system ω. We recently proposed a new experimental platform where we couple the inter-Landau level transition of an high-mobility 2DEG to the highly subwavelength photonic mode of an LC meta-atom [3] showing very large Ω/ωc = 0.87. Our system benefits from the collective enhancement of the light-matter coupling which comes from the scaling of the coupling Ω ∝ √n, were n is the number of optically active electrons. In our previous experiments [3] and in literature [4] this number varies from 104-103 electrons per meta-atom. We now engineer a new cavity, resonant at 290 GHz, with an extremely reduced effective mode surface Seff = 4 × 10-14 m2 (FE simulations, CST), yielding large field enhancements above 1500 and allowing to enter the few (textless;100) electron regime. It consist of a complementary metasurface with two very sharp metallic tips separated by a 60 nm gap (Fig.1(a, b)) on top of a single triangular quantum well. THz-TDS transmission experiments as a function of the applied magnetic field reveal strong anticrossing of the cavity mode with linear cyclotron dispersion. Measurements for arrays of only 12 cavities are reported in Fig.1(c). On the top horizontal axis we report the number of electrons occupying the topmost Landau level as a function of the magnetic field. At the anticrossing field of B=0.73 T we measure approximately 60 electrons ultra strongly coupled (Ω/ω- textbartextbar

Polyzos, G. C., Fotiou, N..  2017.  Blockchain-Assisted Information Distribution for the Internet of Things. 2017 IEEE International Conference on Information Reuse and Integration (IRI). :75–78.

The Internet of Things (IoT) is envisioned to include billions of pervasive and mission-critical sensors and actuators connected to the (public) Internet. This network of smart devices is expected to generate and have access to vast amounts of information, creating unique opportunities for novel applications but, at the same time raising significant privacy and security concerns that impede its further adoption and development. In this paper, we explore the potential of a blockchain-assisted information distribution system for the IoT. We identify key security requirements of such a system and we discuss how they can be satisfied using blockchains and smart contracts. Furthermore, we present a preliminary design of the system and we identify enabling technologies.

Davis, D. B., Featherston, J., Fukuda, M., Asuncion, H. U..  2017.  Data Provenance for Multi-Agent Models. 2017 IEEE 13th International Conference on e-Science (e-Science). :39–48.

Multi-agent simulations are useful for exploring collective patterns of individual behavior in social, biological, economic, network, and physical systems. However, there is no provenance support for multi-agent models (MAMs) in a distributed setting. To this end, we introduce ProvMASS, a novel approach to capture provenance of MAMs in a distributed memory by combining inter-process identification, lightweight coordination of in-memory provenance storage, and adaptive provenance capture. ProvMASS is built on top of the Multi-Agent Spatial Simulation (MASS) library, a framework that combines multi-agent systems with large-scale fine-grained agent-based models, or MAMs. Unlike other environments supporting MAMs, MASS parallelizes simulations with distributed memory, where agents and spatial data are shared application resources. We evaluate our approach with provenance queries to support three use cases and performance measures. Initial results indicate that our approach can support various provenance queries for MAMs at reasonable performance overhead.

That, D. H. T., Fils, G., Yuan, Z., Malik, T..  2017.  Sciunits: Reusable Research Objects. 2017 IEEE 13th International Conference on e-Science (e-Science). :374–383.

Science is conducted collaboratively, often requiring knowledge sharing about computational experiments. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets, but more often includes software, its past execution, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregation and identification of diverse elements of computational experiments. While a necessary method, mere aggregation is not sufficient for the sharing of computational experiments. Other users must be able to easily recompute on these shared research objects. In this paper, we present the sciunit, a reusable research object in which aggregated content is recomputable. We describe a Git-like client that efficiently creates, stores, and repeats sciunits. We show through analysis that sciunits repeat computational experiments with minimal storage and processing overhead. Finally, we provide an overview of sharing and reproducible cyberinfrastructure based on sciunits gaining adoption in the domain of geosciences.

Sowmyadevi, D., Karthikeyan, K..  2017.  Merkle-Hellman knapsack-side channel monitoring based secure scheme for detecting provenance forgery and selfish nodes in wireless sensor networks. 2017 Second International Conference on Electrical, Computer and Communication Technologies (ICECCT). :1–8.

Provenance counterfeit and packet loss assaults are measured as threats in the large scale wireless sensor networks which are engaged for diverse application domains. The assortments of information source generate necessitate promising the reliability of information such as only truthful information is measured in the decision procedure. Details about the sensor nodes play an major role in finding trust value of sensor nodes. In this paper, a novel lightweight secure provenance method is initiated for improving the security of provenance data transmission. The anticipated system comprises provenance authentication and renovation at the base station by means of Merkle-Hellman knapsack algorithm based protected provenance encoding in the Bloom filter framework. Side Channel Monitoring (SCM) is exploited for noticing the presence of selfish nodes and packet drop behaviors. This lightweight secure provenance method decreases the energy and bandwidth utilization with well-organized storage and secure data transmission. The investigational outcomes establishes the efficacy and competence of the secure provenance secure system by professionally noticing provenance counterfeit and packet drop assaults which can be seen from the assessment in terms of provenance confirmation failure rate, collection error, packet drop rate, space complexity, energy consumption, true positive rate, false positive rate and packet drop attack detection.