Filters: Keyword is Provenance
2020-04-13
R P, Jagadeesh Chandra Bose, Singi, Kapil, Kaulgud, Vikrant, Phokela, Kanchanjot Kaur, Podder, Sanjay.  2019.  Framework for Trustworthy Software Development. 2019 34th IEEE/ACM International Conference on Automated Software Engineering Workshop (ASEW). :45–48.
Intelligent software applications are becoming ubiquitous and pervasive, affecting various aspects of our lives and livelihoods. At the same time, the risks to which these systems expose organizations and end users are growing dramatically. Trustworthiness of software applications is becoming a paramount necessity. Trust is to be regarded as a first-class citizen in the total product life cycle and should be addressed across all stages of software development. Trust can be looked at from two facets: one at an algorithmic level (e.g., bias-free, discrimination-aware, explainable and interpretable techniques) and the other at a process level, by making development processes more transparent, auditable, and adherent to regulations and best practices. In this paper, we address the latter and propose a blockchain-enabled governance framework for building trustworthy software. Our framework supports the recording, monitoring, and analysis of various activities throughout the application development life cycle, thereby bringing in transparency and auditability. It facilitates the specification of regulations and best practices, verifies adherence to them, raises alerts on non-compliance, and prescribes remedial measures.
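To make the recording-and-auditing idea concrete, the following minimal Python sketch hash-chains development-lifecycle events so that tampering is detectable; the actors, stages, and fields are hypothetical, and this is not the authors' framework:

    import hashlib, json, time

    def record_activity(chain, actor, stage, action):
        # Append a lifecycle activity as a hash-chained record; a toy
        # stand-in for the paper's blockchain-backed audit trail.
        prev_hash = chain[-1]["hash"] if chain else "0" * 64
        record = {"actor": actor, "stage": stage, "action": action,
                  "timestamp": time.time(), "prev_hash": prev_hash}
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        chain.append(record)
        return record

    def verify_chain(chain):
        # Re-derive every hash; a single altered record breaks the chain.
        prev = "0" * 64
        for rec in chain:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if rec["prev_hash"] != prev or rec["hash"] != hashlib.sha256(
                    json.dumps(body, sort_keys=True).encode()).hexdigest():
                return False
            prev = rec["hash"]
        return True

    chain = []
    record_activity(chain, "dev-1", "code-review", "approved pull request")
    record_activity(chain, "ci-bot", "build", "static analysis passed")
    assert verify_chain(chain)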
2020-03-30
Bharati, Aparna, Moreira, Daniel, Brogan, Joel, Hale, Patricia, Bowyer, Kevin, Flynn, Patrick, Rocha, Anderson, Scheirer, Walter.  2019.  Beyond Pixels: Image Provenance Analysis Leveraging Metadata. 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). :1692–1702.
Creative works, whether paintings or memes, follow unique journeys that result in their final form. Understanding these journeys, a process known as "provenance analysis," provides rich insights into the use, motivation, and authenticity underlying any given work. The application of this type of study to the expanse of unregulated content on the Internet is what we consider in this paper. Provenance analysis provides a snapshot of the chronology and validity of content as it is uploaded, re-uploaded, and modified over time. Although still in its infancy, automated provenance analysis for online multimedia is already being applied to different types of content. Most current works seek to build provenance graphs based on the shared content between images or videos. This can be a computationally expensive task, especially when considering the vast influx of content that the Internet sees every day. Utilizing non-content-based information, such as timestamps, geotags, and camera IDs, can help provide important insights into the path a particular image or video has traveled during its time on the Internet without large computational overhead. This paper tests the scope and applicability of metadata-based inferences for provenance graph construction in two different scenarios: digital image forensics and cultural analytics.
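A rough sketch of the metadata-only idea: link images that share a camera ID and direct the links by timestamp, with no pixel comparison at all. The rule and the records are invented for illustration; the paper's inference methods are more elaborate.

    import networkx as nx

    # Toy image records: metadata only, no pixel content.
    images = [
        {"id": "a", "camera": "cam-1", "timestamp": 100},
        {"id": "b", "camera": "cam-1", "timestamp": 160},
        {"id": "c", "camera": "cam-2", "timestamp": 220},
    ]

    G = nx.DiGraph()
    for img in images:
        G.add_node(img["id"], **img)

    # Hypothetical rule: same camera implies a candidate derivation,
    # directed from the earlier upload to the later one.
    for x in images:
        for y in images:
            if (x["id"] != y["id"] and x["camera"] == y["camera"]
                    and x["timestamp"] < y["timestamp"]):
                G.add_edge(x["id"], y["id"], evidence="shared camera ID")

    print(list(G.edges(data=True)))  # [('a', 'b', {'evidence': 'shared camera ID'})]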
Scherzinger, Stefanie, Seifert, Christin, Wiese, Lena.  2019.  The Best of Both Worlds: Challenges in Linking Provenance and Explainability in Distributed Machine Learning. 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). :1620–1629.
Machine learning experts prefer to think of their input as a single, homogeneous, and consistent data set. However, when analyzing large volumes of data, the entire data set may not be manageable on a single server, but must be stored on a distributed file system instead. Moreover, with the pressing demand to deliver explainable models, the experts may no longer focus on the machine learning algorithms in isolation, but must take into account the distributed nature of the data stored, as well as the impact of any data pre-processing steps upstream in their data analysis pipeline. In this paper, we make the point that even basic transformations during data preparation can impact the model learned, and that this is exacerbated in a distributed setting. We then sketch our vision of end-to-end explainability of the model learned, taking the pre-processing into account. In particular, we point out the potential of linking the contributions of research on data provenance with the efforts on explainability in machine learning. In doing so, we highlight pitfalls we may experience in a distributed system on the way to generating more holistic explanations for our machine learning models.
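The paper's core observation, that a basic preparation step can silently change meaning in a distributed setting, can be reproduced in a few lines: min-max scaling applied per partition yields different features than scaling applied globally. The data and the pipeline are invented for illustration.

    # Two partitions of the "same" data set on a distributed file system.
    part1, part2 = [1.0, 2.0, 3.0], [100.0, 200.0, 300.0]

    def normalize(xs):  # min-max scaling, a basic preparation step
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) for x in xs]

    # What a naive distributed pipeline computes (per partition):
    local = normalize(part1) + normalize(part2)
    # What the ML expert probably intended (global):
    global_ = normalize(part1 + part2)

    print(local[:3])    # [0.0, 0.5, 1.0]  -- partition 1 now looks like partition 2
    print(global_[:3])  # [0.0, 0.0033..., 0.0066...] -- very different features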
Thida, Aye, Shwe, Thanda.  2020.  Process Provenance-based Trust Management in Collaborative Fog Environment. 2020 IEEE Conference on Computer Applications(ICCA). :1–5.
With the increasing popularity and adoption of IoT technology, fog computing has been used as an advancement of cloud computing. Although trust management issues in the cloud have been addressed, there are still very few studies in the fog domain. Trust is needed for collaboration among fog nodes, and it can further improve reliability by assisting in the selection of fog nodes to collaborate with. To address this issue, we present a provenance-based trust mechanism that traces the behavior of processes among fog nodes. Our approach adopts the completion rate and failure rate of computing workloads as the process provenance in trust scores, as these are obvious measures of trustworthiness. Simulation results demonstrate that the proposed system can effectively be used for collaboration in a fog environment.
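A minimal sketch of how completion and failure rates might feed a trust score; the weighting and the formula below are our illustrative assumptions, not the paper's exact model.

    def trust_score(completed, failed, prior=0.5, alpha=0.7):
        # Blend the observed completion rate of delegated workloads
        # (process provenance) with the node's prior score. Hypothetical.
        total = completed + failed
        completion_rate = completed / total if total else prior
        return alpha * completion_rate + (1 - alpha) * prior

    # A fog node that completed 18 of 20 delegated tasks:
    print(trust_score(completed=18, failed=2))  # 0.78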
Souza, Renan, Azevedo, Leonardo, Lourenço, Vítor, Soares, Elton, Thiago, Raphael, Brandão, Rafael, Civitarese, Daniel, Brazil, Emilio, Moreno, Marcio, Valduriez, Patrick et al..  2019.  Provenance Data in the Machine Learning Lifecycle in Computational Science and Engineering. 2019 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS). :1–10.
Machine Learning (ML) has become essential in several industries. In Computational Science and Engineering (CSE), the complexity of the ML lifecycle comes from the large variety of data, scientists' expertise, tools, and workflows. If data are not tracked properly during the lifecycle, it becomes infeasible to recreate an ML model from scratch or to explain to stakeholders how it was created. The main limitation of provenance tracking solutions is that they cannot cope with provenance capture and integration of domain and ML data processed in the multiple workflows in the lifecycle, while keeping the provenance capture overhead low. To handle this problem, in this paper we contribute a detailed characterization of provenance data in the ML lifecycle in CSE; a new provenance data representation, called PROV-ML, built on top of W3C PROV and ML Schema; and extensions to a system that tracks provenance from multiple workflows to address the characteristics of ML and CSE, and to allow for provenance queries with a standard vocabulary. We show a practical use in a real case in the O&G industry, along with its evaluation using 239,616 CUDA cores in parallel.
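Since PROV-ML builds on W3C PROV, the flavor of such records can be sketched with the Python prov library; the namespace and artifact names below are illustrative, and the ML Schema terms that PROV-ML adds are omitted.

    from prov.model import ProvDocument  # pip install prov (W3C PROV library)

    doc = ProvDocument()
    doc.add_namespace("ex", "http://example.org/mlprov/")

    data   = doc.entity("ex:seismic-training-set")
    params = doc.entity("ex:hyperparameters-run-42")
    train  = doc.activity("ex:training-run-42")
    model  = doc.entity("ex:model-v1")

    doc.used(train, data)             # the run consumed data and parameters
    doc.used(train, params)
    doc.wasGeneratedBy(model, train)  # ...and generated the model

    print(doc.get_provn())            # serialize in PROV-N notation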
Miao, Hui, Deshpande, Amol.  2019.  Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :1710–1713.
Increasingly, modern data science platforms have non-intrusive and extensible provenance ingestion mechanisms to collect rich provenance and context information, handle modifications to the same file using distinguishable versions, and use graph data models (e.g., property graphs) and query languages (e.g., Cypher) to represent and manipulate the stored provenance/context information. Due to the schema-later nature of the metadata, multiple versions of the same files, and unfamiliar artifacts introduced by team members, the resulting "provenance graphs" are quite verbose and evolving; further, it is very difficult for users to compose queries and utilize this valuable information using just a standard graph query model. In this paper, we propose two high-level graph query operators to address the verboseness and evolving nature of such provenance graphs. First, we introduce a graph segmentation operator, which queries the retrospective provenance between a set of source vertices and a set of destination vertices via flexible boundary criteria, to help users gain insight into the derivation relationships among those vertices. We show the semantics of such a query in terms of a context-free grammar, and develop efficient algorithms that run orders of magnitude faster than the state of the art. Second, we propose a graph summarization operator that combines similar segments together to query the prospective provenance of the underlying project. The operator allows tuning the summary by ignoring vertex details and characterizing local structures, and ensures the provenance meaning using path constraints. We show the optimal summary problem is PSPACE-complete and develop effective approximation algorithms. We implement the operators on top of Neo4j, evaluate our query techniques extensively, and show the effectiveness and efficiency of the proposed methods.
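A plain-Cypher approximation of the segmentation question, asked through the official Neo4j Python driver: which artifacts lie on derivation paths between a chosen source and destination? The labels, relationship types, and path bound are hypothetical; the paper's operator and algorithms are considerably richer.

    from neo4j import GraphDatabase  # official Neo4j Python driver

    query = """
    MATCH p = (src:Artifact {name: $src})-[:DERIVED_FROM|USED*1..6]->
              (dst:Artifact {name: $dst})
    UNWIND nodes(p) AS n
    RETURN DISTINCT n.name AS on_path
    """

    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))
    with driver.session() as session:
        for record in session.run(query, src="raw.csv", dst="figure3.png"):
            print(record["on_path"])
    driver.close()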
Jentzsch, Sophie F., Hochgeschwender, Nico.  2019.  Don't Forget Your Roots! Using Provenance Data for Transparent and Explainable Development of Machine Learning Models. 2019 34th IEEE/ACM International Conference on Automated Software Engineering Workshop (ASEW). :37–40.
Explaining the reasoning and behaviour of artificial intelligent systems to human users becomes increasingly urgent, especially in the field of machine learning. Many recent contributions approach this issue with post-hoc methods, meaning they consider the final system and its outcomes, while the roots of the included artefacts are widely neglected. However, we argue in this position paper that there needs to be a stronger focus on the development process. Without insights into specific design decisions and the meta information that accrues during development, an accurate explanation of the resulting model is hardly possible. To remedy this situation we propose to increase process transparency by applying provenance methods, which also serves as a basis for increased explainability.
Narendra, Nanjangud C., Shukla, Anshu, Nayak, Sambit, Jagadish, Asha, Kalkur, Rachana.  2019.  Genoma: Distributed Provenance as a Service for IoT-based Systems. 2019 IEEE 5th World Forum on Internet of Things (WF-IoT). :755–760.
One of the key aspects of IoT-based systems, which we believe has not been getting the attention it deserves, is provenance. Provenance refers to those actions that record the usage of data in the system, along with the rationale for said usage. Historically, most provenance methods in distributed systems have been tightly coupled with those of the underlying data processing frameworks in such systems. However, in this paper, we argue that IoT provenance requires a different treatment, given the heterogeneity and dynamism of IoT-based systems. In particular, provenance in IoT-based systems should be decoupled as far as possible from the underlying data processing substrates. To that end, in this paper, we present Genoma, our ongoing work on a system for provenance-as-a-service in IoT-based systems. By "provenance-as-a-service" we mean the following: distributed provenance across IoT devices, edge and cloud; and agnostic of the underlying data processing substrate. Genoma comprises a set of services that act together to provide useful provenance information to users across the system. We also show how we are realizing Genoma via an implementation prototype built on Apache Atlas and TinkerGraph, through which we are investigating several key research issues in distributed IoT provenance.
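Since the prototype builds on Apache Atlas, the register-an-entity step can be sketched against the Atlas v2 REST API; the entity type and attributes below are hypothetical and would have to be defined in Atlas first.

    import requests

    ATLAS = "http://atlas-host:21000/api/atlas/v2"

    entity = {
        "entity": {
            "typeName": "iot_sensor_reading",  # hypothetical Atlas type
            "attributes": {
                "qualifiedName": "building-7/temp-sensor-3/reading-991",
                "name": "reading-991",
                "device": "temp-sensor-3",
                "processedAt": "edge-node-12",  # edge/cloud hop
            },
        }
    }

    resp = requests.post(f"{ATLAS}/entity", json=entity,
                         auth=("admin", "admin"), timeout=10)
    resp.raise_for_status()
    print(resp.json())  # Atlas returns GUIDs of created/updated entities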
Kim, Sejin, Oh, Jisun, Kim, Yoonhee.  2019.  Data Provenance for Experiment Management of Scientific Applications on GPU. 2019 20th Asia-Pacific Network Operations and Management Symposium (APNOMS). :1–4.
Graphics Processing Units (GPUs) are increasingly utilized for multi-purpose applications in order to exploit highly parallel computation. As memory virtualization methods in GPU nodes are not provided efficiently enough to deal with the diverse memory usage patterns of these applications, the success of their execution depends on exclusive and limited use of physical memory in GPU environments. Therefore, it is important to predict changes in the pattern of GPU memory usage during the runtime execution of an application. Data provenance, extracted from application characteristics, GPU runtime environments, inputs, and execution patterns from runtime monitoring, is defined to support application management: setting runtime configurations, predicting experimental results, and utilizing resources with co-located applications. In this paper, we define the data provenance of an application on GPUs and manage data by profiling the execution of CUDA scientific applications. Data provenance management helps to predict the execution patterns of other similar experiments and to plan efficient resource configurations.
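One ingredient of such provenance, the GPU memory-usage trace gathered by runtime monitoring, can be sketched with NVML's Python bindings; the sampling loop and record schema are our own illustration, not the paper's system.

    import time
    import pynvml  # pip install nvidia-ml-py

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    samples = []
    for _ in range(5):  # sample once a second for ~5 s
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        samples.append({"t": time.time(),
                        "used_bytes": info.used,
                        "total_bytes": info.total})
        time.sleep(1.0)

    pynvml.nvmlShutdown()
    # Hypothetical provenance record combining app identity and the trace:
    provenance_record = {"app": "cuda-sim", "gpu_mem_trace": samples}
    print(provenance_record)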
Tabassum, Anika, Nady, Anannya Islam, Rezwanul Huq, Mohammad.  2019.  Mathematical Formulation and Implementation of Query Inversion Techniques in RDBMS for Tracking Data Provenance. 2019 7th International Conference on Information and Communication Technology (ICoICT). :1–6.
Nowadays, massive amounts of data are produced from different sources, and many applications process these data to discover insights. Sometimes we may get unexpected results from these applications, and it is not feasible to trace back to the data origin manually to find the source of errors. To avoid this problem, data must be accompanied by the context of how they are processed and analyzed. In particular, data-intensive applications like e-Science always require transparency, and therefore we need to understand how data has been processed and transformed. In this paper, we propose a mathematical formulation and implementation of query inversion techniques to trace the provenance of data in a relational database management system (RDBMS). We build mathematical formulations of inverse queries for most of the relational algebra operations and show the formula for join operations in this paper. We then implement these inversion techniques, and our experiments show that the proposed inverse queries can successfully trace back to the original data, i.e., find the data provenance.
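The flavor of the join inversion can be shown with a toy example: each tuple of a natural join splits back into the source tuples that must exist in the two inputs. Schemas and data are invented; the paper gives the general formulations.

    # Inversion of R |><| S (natural join on dept_id).
    R_schema = ("emp_id", "name", "dept_id")
    S_schema = ("dept_id", "dept_name")

    def inverse_join(output_rows):
        # Recover the R-side and S-side tuples from each joined row.
        r_side, s_side = set(), set()
        for row in output_rows:
            r_side.add(tuple(row[a] for a in R_schema))
            s_side.add(tuple(row[a] for a in S_schema))
        return r_side, s_side

    out = [{"emp_id": 7, "name": "Ada", "dept_id": 2, "dept_name": "R&D"}]
    print(inverse_join(out))
    # ({(7, 'Ada', 2)}, {(2, 'R&D')}) -- the provenance of the output row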
2020-03-09
Flores, Denys A., Jhumka, Arshad.  2019.  Hybrid Logical Clocks for Database Forensics: Filling the Gap between Chain of Custody and Database Auditing. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :224–231.
Database audit records are important for investigating suspicious actions against transactional databases. Their admissibility as digital evidence depends on satisfying Chain of Custody (CoC) properties during their generation, collection and preservation in order to prevent their modification, guarantee action accountability, and allow third-party verification. However, their production has relied on auditing capabilities provided by commercial database systems, which may not be effective if malicious users (or insiders) misuse their privileges to disable audit controls and compromise their admissibility. Hence, in this paper, we propose a forensically-aware distributed database architecture that implements CoC properties as functional requirements to produce admissible audit records. The novelty of our proposal is the use of hybrid logical clocks which, compared with a previous centralised vector-clock architecture, has evident advantages as it (i) allows for more accurate provenance and causality tracking of insider actions, (ii) is more scalable in terms of system size, and (iii) although latency is higher (as expected in distributed environments), 70 per cent of user transactions are executed within acceptable latency intervals.
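The hybrid logical clock at the heart of the proposal follows the standard HLC rules (a logical part l that tracks the largest physical time seen, plus a counter c that orders ties); the sketch below illustrates those rules and is not the paper's implementation.

    import time

    class HLC:
        def __init__(self):
            self.l = 0  # logical part: max physical time observed
            self.c = 0  # counter: orders events within the same l

        def _pt(self):
            return int(time.time() * 1000)  # physical time in ms

        def send(self):
            # Timestamp a local or send event (e.g., a new audit record).
            pt, l_old = self._pt(), self.l
            self.l = max(l_old, pt)
            self.c = self.c + 1 if self.l == l_old else 0
            return (self.l, self.c)

        def recv(self, l_m, c_m):
            # Merge the timestamp carried by an incoming message.
            pt, l_old = self._pt(), self.l
            self.l = max(l_old, l_m, pt)
            if self.l == l_old and self.l == l_m:
                self.c = max(self.c, c_m) + 1
            elif self.l == l_old:
                self.c += 1
            elif self.l == l_m:
                self.c = c_m + 1
            else:
                self.c = 0
            return (self.l, self.c)

    node = HLC()
    t1 = node.send()              # local audit record
    t2 = node.recv(t1[0] + 5, 0)  # record from a node with a faster clock
    assert t2 > t1                # tuple comparison respects causality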
2019-09-23
Ramijak, Dusan, Pal, Amitangshu, Kant, Krishna.  2018.  Pattern Mining Based Compression of IoT Data. Proceedings of the Workshop Program of the 19th International Conference on Distributed Computing and Networking. :12:1–12:6.
The increasing proliferation of Internet of Things (IoT) devices and systems results in large amounts of highly heterogeneous data to be collected. Although at least some of the collected sensor data is often consumed by the real-time decision making and control of the IoT system, that is not the only use of such data. Invariably, the collected data is stored, perhaps in some filtered or downselected fashion, so that it can be used for a variety of lower-frequency operations. It is expected that in a smart city environment with numerous IoT deployments, the volume of such data can become enormous. Therefore, mechanisms for lossy data compression that provide a trade-off between compression ratio and data usefulness for offline statistical analysis become necessary. In this paper, we discuss several simple pattern mining based compression strategies for multi-attribute IoT data streams. For each method, we evaluate the compressibility of the method vs. the level of similarity between the original and compressed time series in the context of a home energy management system.
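A minimal pattern-mining compressor in this spirit: mine the most frequent multi-attribute tuples, keep the top k as a dictionary, and replace their occurrences with references. The stream and the top-k policy are invented for illustration.

    from collections import Counter

    # A multi-attribute IoT stream: (device, state, rounded power draw).
    stream = [("hvac", "on", 1200), ("tv", "on", 90), ("hvac", "on", 1200),
              ("hvac", "on", 1200), ("tv", "off", 0), ("hvac", "on", 1200)]

    K = 1  # dictionary size: the knob trading compression against fidelity
    dictionary = [p for p, _ in Counter(stream).most_common(K)]

    compressed = [("P", dictionary.index(t)) if t in dictionary else t
                  for t in stream]
    print(dictionary)  # [('hvac', 'on', 1200)]
    print(compressed)  # [('P', 0), ('tv', 'on', 90), ('P', 0), ('P', 0), ...]
    # A lossy variant would additionally drop or coarsen the rare tuples.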
Psallidas, Fotis, Wu, Eugene.  2018.  Provenance for Interactive Visualizations. Proceedings of the Workshop on Human-In-the-Loop Data Analytics. :9:1–9:8.
We highlight the connections between data provenance and interactive visualizations. To do so, we first incrementally add interactions to a visualization and show how these interactions are readily expressible in terms of provenance. We then describe how an interactive visualization system that natively supports provenance can be easily extended with novel interactions.
Kalokyri, Varvara, Borgida, Alexander, Marian, Amélie.  2018.  YourDigitalSelf: A Personal Digital Trace Integration Tool. Proceedings of the 27th ACM International Conference on Information and Knowledge Management. :1963–1966.
Personal information is typically fragmented across multiple, heterogeneous, distributed sources and saved as small, heterogeneous data objects, or traces. The DigitalSelf project at Rutgers University focuses on developing tools and techniques to manage (organize, search, summarize, make inferences on and personalize) such heterogeneous collections of personal digital traces. We propose to demonstrate YourDigitalSelf, a mobile phone-based personal information organization application developed as part of the DigitalSelf project. The demonstration will use a sample user data set to show how several disparate data traces can be integrated and combined to create personal narratives, or coherent episodes, of the user's activities. Conference attendees will be given the option to install YourDigitalSelf on their own devices to interact with their own data.
Pham, Quan, Malik, Tanu, That, Dai Hai Ton, Youngdahl, Andrew.  2018.  Improving Reproducibility of Distributed Computational Experiments. Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems. :2:1–2:6.
Conference and journal publications increasingly require experiments associated with a submitted article to be repeatable. Authors comply with this requirement by sharing all associated digital artifacts, i.e., code, data, and environment configuration scripts. To ease this, several tools have recently emerged that automate the aggregation of digital artifacts by auditing an experiment's execution and building a portable container of code, data, and environment. However, current tools only package non-distributed computational experiments. Distributed computational experiments must either be packaged manually or supplemented with sufficient documentation. In this paper, we outline the reproducibility requirements of distributed experiments using a distributed computational science experiment involving the message-passing interface (MPI), and propose a general method for auditing and repeating distributed experiments. Using Sciunit we show how this method can be implemented. We validate our method with initial experiments showing that application re-execution runtime can be improved by 63%, with a trade-off of a longer runtime on the initial audit execution.
Stodden, Victoria, Krafczyk, Matthew S., Bhaskar, Adhithya.  2018.  Enabling the Verification of Computational Results: An Empirical Evaluation of Computational Reproducibility. Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems. :3:1–3:5.
The ability to independently regenerate published computational claims is widely recognized as a key component of scientific reproducibility. In this article we take a narrow interpretation of this goal, and attempt to regenerate published claims from author-supplied information, including data, code, inputs, and other provided specifications, on a different computational system than that used by the original authors. We are motivated by Claerbout and Donoho's exhortation of the importance of providing complete information for reproducibility of the published claim. We chose the Elsevier journal, the Journal of Computational Physics, which has stated author guidelines that encourage the availability of computational digital artifacts that support scholarly findings. In an IRB approved study at the University of Illinois at Urbana-Champaign (IRB #17329) we gathered artifacts from a sample of authors who published in this journal in 2016 and 2017. We then used the ICERM criteria generated at the 2012 ICERM workshop "Reproducibility in Computational and Experimental Mathematics" to evaluate the sufficiency of the information provided in the publications and the ease with which the digital artifacts afforded computational reproducibility. We find that, for the articles for which we obtained computational artifacts, we could not easily regenerate the findings for 67% of them, and we were unable to easily regenerate all the findings for any of the articles. We then evaluated the artifacts we did obtain (55 of 306 articles) and find that the main barriers to computational reproducibility are inadequate documentation of code, data, and workflow information (70.9%), missing code function and setting information, and missing licensing information (75%). We recommend improvements based on these findings, including the deposit of supporting digital artifacts for reproducibility as a condition of publication, and verification of computational findings via re-execution of the code when possible.
Whittaker, Michael, Teodoropol, Cristina, Alvaro, Peter, Hellerstein, Joseph M..  2018.  Debugging Distributed Systems with Why-Across-Time Provenance. Proceedings of the ACM Symposium on Cloud Computing. :333–346.
Systematically reasoning about the fine-grained causes of events in a real-world distributed system is challenging. Causality, from the distributed systems literature, can be used to compute the causal history of an arbitrary event in a distributed system, but the event's causal history is an over-approximation of the true causes. Data provenance, from the database literature, precisely describes why a particular tuple appears in the output of a relational query, but data provenance is limited to the domain of static relational databases. In this paper, we present wat-provenance: a novel form of provenance that provides the benefits of causality and data provenance. Given an arbitrary state machine, wat-provenance describes why the state machine produces a particular output when given a particular input. This enables system developers to reason about the causes of events in real-world distributed systems. We observe that automatically extracting the wat-provenance of a state machine is often infeasible. Fortunately, many distributed systems components have simple interfaces from which a developer can directly specify wat-provenance using a technique we call wat-provenance specifications. Leveraging the theoretical foundations of wat-provenance, we implement a prototype distributed debugging framework called Watermelon.
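A wat-provenance specification for the simplest component, a key-value store: the cause of a GET's output is exactly the latest prior SET to that key, not the whole causal history. The code is our illustration of that idea, not the Watermelon prototype.

    class KVStore:
        def __init__(self):
            self.log = []   # full input trace: ("set", k, v) / ("get", k)
            self.data = {}

        def set(self, k, v):
            self.log.append(("set", k, v))
            self.data[k] = v

        def get(self, k):
            self.log.append(("get", k))
            return self.data.get(k)

        def wat_provenance(self, k):
            # Hand-written specification: why does get(k) return its
            # value? Because of the most recent set(k, ...), alone.
            for i in range(len(self.log) - 1, -1, -1):
                op = self.log[i]
                if op[0] == "set" and op[1] == k:
                    return [(i, op)]
            return []

    s = KVStore()
    s.set("x", 1); s.set("x", 2)
    assert s.get("x") == 2
    print(s.wat_provenance("x"))  # [(1, ('set', 'x', 2))] -- not the stale set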
Psallidas, Fotis, Wu, Eugene.  2018.  Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications. Proceedings of the 2018 International Conference on Management of Data. :1781–1784.
Data lineage is a fundamental type of information that describes the relationships between input and output data items in a workflow. As such, an immense number of data-intensive applications with logic over the input-output relationships can be expressed declaratively in lineage terms. Unfortunately, many applications resort to hand-tuned implementations, either because lineage systems are not fast enough to meet their requirements or because their lineage capabilities are not known. Recently, we introduced a set of implementation design principles and associated techniques to optimize lineage-enabled database engines, and realized them in our prototype database engine, namely Smoke. In this demonstration, we showcase lineage as the building block across a variety of data-intensive applications, including tooltips and details on demand; crossfilter; and data profiling. In addition, we show how Smoke outperforms alternative lineage systems to meet or improve on existing hand-tuned implementations of these applications.
Chen, W., Liang, X., Li, J., Qin, H., Mu, Y., Wang, J..  2018.  Blockchain Based Provenance Sharing of Scientific Workflows. 2018 IEEE International Conference on Big Data (Big Data). :3814–3820.
In a research community, the sharing of scientific workflow provenance can enhance distributed research cooperation, verification of experiment reproducibility, and the re-running of experiments. Considering that scientists in such a community are often loosely related and distributed geographically, traditional centralized provenance sharing architectures have shown their disadvantages in poor trustworthiness, reliability and efficiency. Additionally, they struggle to protect the rights and interests of data providers. All of this has largely hindered the willingness of distributed scientists to share their workflow provenance. Considering the big advantages of blockchain in decentralization, trustworthiness and high reliability, an approach to sharing scientific workflow provenance based on blockchain in a research community is proposed. To make the approach more practical, provenance is handled on-chain and original data is delivered off-chain. A block structure to support efficient provenance storage and retrieval is designed, and an algorithm for scientists to search workflow segments from provenance, as well as an algorithm for experiment backtracking, are provided to enhance experiment result sharing and to save computing resources and time by avoiding repeated experiments as far as possible. Analyses show that the approach is efficient and effective.
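The on-chain/off-chain split can be sketched in a few lines: the chain stores the provenance record and a content hash, while the (possibly large) original data stays off-chain at a URI. Field names and the block layout are our illustrative assumptions, not the paper's design.

    import hashlib, json

    def provenance_block(prev_block, step, data_uri, raw_bytes):
        payload = {
            "step": step,          # e.g. "align-genome:v2"
            "data_uri": data_uri,  # off-chain location of the data
            "data_hash": hashlib.sha256(raw_bytes).hexdigest(),
            "prev": prev_block["hash"] if prev_block else "0" * 64,
        }
        payload["hash"] = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()).hexdigest()
        return payload

    genesis = provenance_block(None, "ingest-samples", "s3://lab/run1", b"...")
    block2 = provenance_block(genesis, "align-genome:v2", "s3://lab/run2", b"...")
    # Consumers verify off-chain data later by re-hashing and comparing.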
Yazici, I. M., Karabulut, E., Aktas, M. S..  2018.  A Data Provenance Visualization Approach. 2018 14th International Conference on Semantics, Knowledge and Grids (SKG). :84–91.
In recent years, data provenance has created an emerging requirement for technologies that enable end users to access, evaluate, and act on the provenance of data. In the era of Big Data, the amount of data created by corporations around the world has grown each year. As an example, in both the Social Media and e-Science domains, data is growing at an unprecedented rate. As the data has grown rapidly, information on the origin and lifecycle of the data has also grown. In turn, this requires technologies that enable the clarification and interpretation of data through the use of data provenance. This study proposes methodologies for the visualization of provenance data compatible with the W3C PROV-O Specification. The visualizations are done by summarization and comparison of the data provenance. We facilitated the testing of these methodologies by providing a prototype, extending an existing open source visualization tool. We discuss the usability of the proposed methodologies with an experimental study; our initial results show that the proposed approach is usable, and its processing overhead is negligible.
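The summarization step can be sketched with rdflib: load a PROV-O document and count node and edge types, the kind of aggregate a summary visualization is drawn from. The input file is hypothetical.

    from rdflib import Graph, Namespace, RDF

    PROV = Namespace("http://www.w3.org/ns/prov#")

    g = Graph()
    g.parse("workflow-provenance.ttl", format="turtle")  # hypothetical file

    summary = {
        "entities":   len(set(g.subjects(RDF.type, PROV.Entity))),
        "activities": len(set(g.subjects(RDF.type, PROV.Activity))),
        "agents":     len(set(g.subjects(RDF.type, PROV.Agent))),
        "used":           len(list(g.subject_objects(PROV.used))),
        "wasGeneratedBy": len(list(g.subject_objects(PROV.wasGeneratedBy))),
    }
    print(summary)  # feeds a summary view instead of the full graph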
Suriarachchi, I., Withana, S., Plale, B..  2018.  Big Provenance Stream Processing for Data Intensive Computations. 2018 IEEE 14th International Conference on e-Science (e-Science). :245–255.
In the business and research landscape of today, data analysis consumes public and proprietary data from numerous sources, and utilizes one or more of the popular data-parallel frameworks such as Hadoop, Spark and Flink. In the Data Lake setting these frameworks co-exist. Our earlier work has shown that data provenance in Data Lakes can aid with both traceability and management. The sheer volume of fine-grained provenance generated in a multi-framework application motivates the need for on-the-fly provenance processing. We introduce a new parallel stream processing algorithm that reduces fine-grained provenance while preserving backward and forward provenance. The algorithm is resilient to provenance events arriving out of order. It is evaluated using several strategies for partitioning a provenance stream. The evaluation shows that the parallel algorithm performs well in processing out-of-order provenance streams, with good scalability and accuracy.
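The reduction idea, collapsing chains of fine-grained events so that only input-to-output provenance remains, can be sketched sequentially; the paper's contribution is doing this in parallel over an out-of-order stream.

    # (derived, source) provenance events from a multi-framework pipeline.
    events = [("tmp1", "in1"), ("tmp2", "tmp1"), ("out1", "tmp2"),
              ("out2", "in2")]
    inputs, outputs = {"in1", "in2"}, {"out1", "out2"}

    parents = {}
    for derived, source in events:
        parents.setdefault(derived, set()).add(source)

    def roots(node):
        # Backward provenance: trace any node to the workflow inputs.
        if node in inputs:
            return {node}
        return set().union(*(roots(p) for p in parents.get(node, set())))

    reduced = {out: roots(out) for out in outputs}
    print(reduced)  # {'out1': {'in1'}, 'out2': {'in2'}}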
Zheng, N., Alawini, A., Ives, Z. G..  2019.  Fine-Grained Provenance for Matching ETL. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :184–195.
Data provenance tools capture the steps used to produce analyses. However, scientists must choose among workflow provenance systems, which allow arbitrary code but only track provenance at the granularity of files; provenance APIs, which provide tuple-level provenance, but incur overhead in all computations; and database provenance tools, which track tuple-level provenance through relational operators and support optimization, but support a limited subset of data science tasks. None of these solutions are well suited for tracing errors introduced during common ETL, record alignment, and matching tasks - for data types such as strings, images, etc. Scientists need new capabilities to identify the sources of errors, find why different code versions produce different results, and identify which parameter values affect output. We propose PROVision, a provenance-driven troubleshooting tool that supports ETL and matching computations and traces extraction of content within data objects. PROVision extends database-style provenance techniques to capture equivalences, support optimizations, and enable selective evaluation. We formalize our extensions, implement them in the PROVision system, and validate their effectiveness and scalability for common ETL and matching tasks.
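Tuple-level provenance through a matching step can be sketched by annotating every derived value with the IDs of the records it came from; the toy matcher and records below are invented, and PROVision's machinery (equivalences, optimization, selective evaluation) is not shown.

    raw = [{"id": 1, "rec": "Smith, J.; b.1970"},
           {"id": 2, "rec": "smith, j; b.1970"},
           {"id": 3, "rec": "Doe, A.; b.1985"}]

    def extract_name(row):
        # ETL extraction; the provenance set travels with the value.
        name = row["rec"].split(";")[0].strip().lower().replace(".", "")
        return {"name": name, "prov": {row["id"]}}

    def match(a, b):
        # Toy matcher: a merged output unions the provenance of both sides.
        if a["name"] == b["name"]:
            return {"name": a["name"], "prov": a["prov"] | b["prov"]}

    names = [extract_name(r) for r in raw]
    pairs = [m for i, a in enumerate(names)
             for b in names[i + 1:] if (m := match(a, b))]
    print(pairs)  # [{'name': 'smith, j', 'prov': {1, 2}}]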
2019-03-04
Krishnamurthy, R., Meinel, M., Haupt, C., Schreiber, A., Mader, P..  2018.  DLR Secure Software Engineering. 2018 IEEE/ACM 1st International Workshop on Security Awareness from Design to Deployment (SEAD). :49–50.
DLR, as a research organization, increasingly faces the task of sharing its self-developed software with partners or publishing it openly. Hence, it is very important to harden the software to avoid opening attack vectors, especially since DLR software is typically not developed by software engineering or security experts. In this paper we describe the data-oriented approach of our newly founded secure software engineering group to improve the software development process towards more secure software. To this end, we look at the automated security evaluation of software as well as the possibilities to capture information about the development process. Our aim is to use our information sources to improve software development processes to produce high-quality secure software.
2019-02-08
Ioini, N. E., Pahl, C..  2018.  Trustworthy Orchestration of Container Based Edge Computing Using Permissioned Blockchain. 2018 Fifth International Conference on Internet of Things: Systems, Management and Security. :147–154.
The need to process the variety, volume and velocity of data generated by today's Internet of Things (IoT) devices has pushed both academia and industry to investigate new architectural alternatives to support the new challenges. As a result, Edge Computing (EC) has emerged to address these issues, by placing part of the cloud resources (e.g., computation, storage, logic) closer to the edge of the network, which allows faster and context-dependent data analysis and storage. However, as EC infrastructures grow, different providers who do not necessarily trust each other need to collaborate in order to serve different IoT devices. In this context, EC infrastructures, IoT devices and the data transiting the network all need to be subject to identity and provenance checks, in order to increase trust and accountability. Each device/data item in the network needs to be identified, and the provenance of its actions needs to be tracked. In this paper, we propose a blockchain container based architecture that implements the W3C-PROV Data Model to track the identities and provenance of all orchestration decisions of a business network. This architecture provides new forms of interaction between the different stakeholders, which supports trustworthy transactions and leads to a new decentralized interaction model for IoT-based applications.
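One orchestration decision, expressed with W3C-PROV concepts (entity, activity, agent) and hashed as it might be before submission as a blockchain transaction; the JSON layout and identifiers are our illustration, not the paper's schema.

    import hashlib, json, time

    decision = {
        "activity": {"id": "orchestrate:move-container-42",
                     "startedAtTime": time.time()},
        "agent":    {"id": "provider:edge-operator-A",
                     "actedOnBehalfOf": "tenant:iot-app-7"},
        "entity":   {"id": "container:42",
                     "wasGeneratedBy": "orchestrate:move-container-42",
                     "atLocation": "edge-node-12"},
    }
    tx = {"payload": decision,
          "txid": hashlib.sha256(
              json.dumps(decision, sort_keys=True).encode()).hexdigest()}
    print(tx["txid"])  # content-addressed, auditable decision record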