Visible to the public Biblio

Filters: Keyword is Provenance  [Clear All Filters]
Wittek, Kevin, Wittek, Neslihan, Lawton, James, Dohndorf, Iryna, Weinert, Alexander, Ionita, Andrei.  2021.  A Blockchain-Based Approach to Provenance and Reproducibility in Research Workflows. 2021 IEEE International Conference on Blockchain and Cryptocurrency (ICBC). :1–6.
The traditional Proof of Existence blockchain service on the Bitcoin network can be used to verify the existence of any research data at a specific point of time, and to validate the data integrity, without revealing its content. Several variants of the blockchain service exist to certify the existence of data relying on cryptographic fingerprinting, thus enabling an efficient verification of the authenticity of such certifications. However, nowadays research data is continuously changing and being modified through different processing steps in most scientific research workflows such that certifications of individual data objects seem to be constantly outdated in this setting. This paper describes how the blockchain and distributed ledger technology can be used to form a new certification model, that captures the research process as a whole in a more meaningful way, including the description of the used data through its different stages and the associated computational pipeline, code for analysis and the experimental design. The scientific blockchain infrastructure bloxberg, together with a deep learning based analysis from the behavioral science field are used to show the applicability of the approach.
Sadineni, Lakshminarayana, Pilli, Emmanuel S., Battula, Ramesh Babu.  2021.  Ready-IoT: A Novel Forensic Readiness Model for Internet of Things. 2021 IEEE 7th World Forum on Internet of Things (WF-IoT). :89–94.
Internet of Things (IoT) networks are often attacked to compromise the security and privacy of application data and disrupt the services offered by them. The attacks are being launched at different layers of IoT protocol stack by exploiting their inherent weaknesses. Forensic investigations need substantial artifacts and datasets to support the decisions taken during analysis and while attributing the attack to the adversary. Network provenance plays a crucial role in establishing the relationships between network entities. Hence IoT networks can be made forensic ready so that network provenance may be collected to help in constructing these artifacts. The paper proposes Ready-IoT, a novel forensic readiness model for IoT environment to collect provenance from the network which comprises of both network parameters and traffic. A link layer dataset, Link-IoT Dataset is also generated by querying provenance graphs. Finally, Link-IoT dataset is compared with other IoT datasets to draw a line of difference and applicability to IoT environments. We believe that the proposed features have the potential to detect the attacks performed on the IoT network.
Patil, Sonali, Kadam, Sarika, Katti, Jayashree.  2021.  Security Enhancement of Forensic Evidences Using Blockchain. 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV). :263–268.

In today's digital era, data is most important in every phase of work. The storage and processing on data with security is the need of each and every application field. Data need to be tamper resistant due to possibility of alteration. Data can be represented and stored in heterogeneous format. There are chances of attack on information which is vital for particular organization. With rapid increase in cyber crime, attackers behave maliciously to alter those data. But it is having great impact on forensic evidences which is required for provenance. Therefore, it is required to maintain the reliability and provenance of digital evidences as it travels through various stages during forensic investigation. In this approach, there is a forensic chain in which generated report passes through various levels or intermediaries such as pathology laboratory, doctor, police department etc. To build the transparent system with immutability of forensic evidences, blockchain technology is more suitable. Blockchain technology provides the transfer of assets or evidence reports in transparent environment without central authority. In this paper blockchain based secure system for forensic evidences is proposed. The proposed system is implemented on Ethereum platform. The tampering of forensic evidence can be easily traced at any stage by anyone in the forensic chain. The security enhancement of forensic evidences is achieved through implementation on Ethereum platform with high integrity, traceability and immutability.

Wilms, Daniel, Stoecker, Carsten, Caballero, Juan.  2021.  Data Provenance in Vehicle Data Chains. 2021 IEEE 93rd Vehicular Technology Conference (VTC2021-Spring). :1–5.
With almost every new vehicle being connected, the importance of vehicle data is growing rapidly. Many mobility applications rely on the fusion of data coming from heterogeneous data sources, like vehicle and "smart-city" data or process data generated by systems out of their control. This external data determines much about the behaviour of the relying applications: it impacts the reliability, security and overall quality of the application's input data and ultimately of the application itself. Hence, knowledge about the provenance of that data is a critical component in any data-driven system. The secure traceability of the data handling along the entire processing chain, which passes through various distinct systems, is critical for the detection and avoidance of misuse and manipulation. In this paper, we introduce a mechanism for establishing secure data provenance in real time, demonstrating an exemplary use-case based on a machine learning model that detects dangerous driving situations. We show with our approach based on W3C decentralized identity standards that data provenance in closed data systems can be effectively achieved using technical standards designed for an open data approach.
Abdelnabi, Sahar, Fritz, Mario.  2021.  Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding. 2021 IEEE Symposium on Security and Privacy (SP). :121–140.
Recent advances in natural language generation have introduced powerful language models with high-quality output text. However, this raises concerns about the potential misuse of such models for malicious purposes. In this paper, we study natural language watermarking as a defense to help better mark and trace the provenance of text. We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training that, given an input text and a binary message, generates an output text that is unobtrusively encoded with the given message. We further study different training and inference strategies to achieve minimal changes to the semantics and correctness of the input text.AWT is the first end-to-end model to hide data in text by automatically learning -without ground truth- word substitutions along with their locations in order to encode the message. We empirically show that our model is effective in largely preserving text utility and decoding the watermark while hiding its presence against adversaries. Additionally, we demonstrate that our method is robust against a range of attacks.
Sebastian-Cardenas, D., Gourisetti, S., Mylrea, M., Moralez, A., Day, G., Tatireddy, V., Allwardt, C., Singh, R., Bishop, R., Kaur, K. et al..  2021.  Digital data provenance for the power grid based on a Keyless Infrastructure Security Solution. 2021 Resilience Week (RWS). :1–10.
In this work a data provenance system for grid-oriented applications is presented. The proposed Keyless Infrastructure Security Solution (KISS) provides mechanisms to store and maintain digital data fingerprints that can later be used to validate and assert data provenance using a time-based, hash tree mechanism. The developed solution has been designed to satisfy the stringent requirements of the modern power grid including execution time and storage necessities. Its applicability has been tested using a lab-scale, proof-of-concept deployment that secures an energy management system against the attack sequence observed on the 2016 Ukrainian power grid cyberattack. The results demonstrate a strong potential for enabling data provenance in a wide array of applications, including speed-sensitive applications such as those found in control room environments.
Phua, Thye Way, Patros, Panos, Kumar, Vimal.  2021.  Towards Embedding Data Provenance in Files. 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC). :1319–1325.
Data provenance (keeping track of who did what, where, when and how) boasts of various attractive use cases for distributed systems, such as intrusion detection, forensic analysis and secure information dependability. This potential, however, can only be realized if provenance is accessible by its primary stakeholders: the end-users. Existing provenance systems are designed in a `all-or-nothing' fashion, making provenance inaccessible, difficult to extract and crucially, not controlled by its key stakeholders. To mitigate this, we propose that provenance be separated into system, data-specific and file-metadata provenance. Furthermore, we expand data-specific provenance as changes at a fine-grain level, or provenance-per-change, that is recorded alongside its source. We show that with the use of delta-encoding, provenance-per-change is viable, asserting our proposed architecture to be effectively realizable.
Schreiber, Andreas, Sonnekalb, Tim, Kurnatowski, Lynn von.  2021.  Towards Visual Analytics Dashboards for Provenance-driven Static Application Security Testing. 2021 IEEE Symposium on Visualization for Cyber Security (VizSec). :42–46.
The use of static code analysis tools for security audits can be time consuming, as the many existing tools focus on different aspects and therefore development teams often use several of these tools to keep code quality high and prevent security issues. Displaying the results of multiple tools, such as code smells and security warnings, in a unified interface can help developers get a better overview and prioritize upcoming work. We present visualizations and a dashboard that interactively display results from static code analysis for “interesting” commits during development. With this, we aim to provide an effective visual analytics tool for code security analysis results.
Jaigirdar, Fariha Tasmin, Rudolph, Carsten, Bain, Chris.  2021.  Risk and Compliance in IoT- Health Data Propagation: A Security-Aware Provenance based Approach. 2021 IEEE International Conference on Digital Health (ICDH). :27–37.
Data generated from various dynamic applications of Internet of Things (IoT) based healthcare technology is effectively used for decision-making, providing reliable and smart healthcare services to the elderly and patients with chronic diseases. Since these precious data are susceptible to various security attacks, continuous monitoring of the system's compliance and identification of security risks in IoT data propagation is essential through potentially several layers of applications. This paper pinpoints how security-aware data provenance graphs can support compliance checking and risk estimation by including sufficient information on security controls and other security-relevant evidence. Real-time analysis of these security evidence to enable a step-wise validation and providing the evidence of this validation to end-users is currently not possible with the available data. This paper analyzes the security concerns in different phases of data propagation in a designed IoT-health scenario and promotes step-wise validation of security evidence. It proposes a system model with a novel protocol that documents and verifies evidence for security controls for data-object relations in data provenance graphs to assist compliance checking of security regulation of healthcare systems. With this regard, this paper discusses the proposed system model design with the requirements for technical safeguards of the Health Insurance Portability and Accountability Act (HIPAA). Based on the verification output at each phase, the proposed protocol reports this chain of verification by creating certain security tokens. Finally, the paper provides a formal security validation and security design analysis to show the applicability of this step-wise validation within the proposed system model.
Lee, Hyo-Cheol, Lee, Seok-Won.  2021.  Towards Provenance-based Trust-aware Model for Socio-Technically Connected Self-Adaptive System. 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC). :761—767.
In a socio-technically connected environment, self-adaptive systems need to cooperate with others to collect information to provide context-dependent functionalities to users. A key component of ensuring safe and secure cooperation is finding trustworthy information and its providers. Trust is an emerging quality attribute that represents the level of belief in the cooperative environments and serves as a promising solution in this regard. In this research, we will focus on analyzing trust characteristics and defining trust-aware models through the trust-aware goal model and the provenance model. The trust-aware goal model is designed to represent the trust-related requirements and their relationships. The provenance model is analyzed as trust evidence to be used for the trust evaluation. The proposed approach contributes to build a comprehensive understanding of trust and design a trust-aware self-adaptive system. In order to show the feasibility of the proposed approach, we will conduct a case study with the crowd navigation system for an unmanned vehicle system.
Dawit, Nahom Aron, Mathew, Sujith Samuel, Hayawi, Kadhim.  2020.  Suitability of Blockchain for Collaborative Intrusion Detection Systems. 2020 12th Annual Undergraduate Research Conference on Applied Computing (URC). :1–6.
Cyber-security is indispensable as malicious incidents are ubiquitous on the Internet. Intrusion Detection Systems have an important role in detecting and thwarting cyber-attacks. However, it is more effective in a centralized system but not in peer-to-peer networks which makes it subject to central point failure, especially in collaborated intrusion detection systems. The novel blockchain technology assures a fully distributed security system through its powerful features of transparency, immutability, decentralization, and provenance. Therefore, in this paper, we investigate and demonstrate several methods of collaborative intrusion detection with blockchain to analyze the suitability and security of blockchain for collaborative intrusion detection systems. We also studied the difference between the existing means of the integration of intrusion detection systems with blockchain and categorized the major vulnerabilities of blockchain with their potential losses and current enhancements for mitigation.
Sun, Yuxin, Zhang, Yingzhou, Zhu, Linlin.  2020.  An Anti-Collusion Fingerprinting based on CFF Code and RS Code. 2020 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). :56–63.
Data security is becoming more and more important in data exchange. Once the data is leaked, it will pose a great threat to the privacy and property security of users. Copyright authentication and data provenance have become an important requirement of the information security defense mechanism. In order to solve the collusion leakage of the data distributed by organization and the low efficiency of tracking the leak provenance after the data is destroyed, this paper proposes a concatenated-group digital fingerprint coding based on CFF code and Reed-solomon (RS) that can resist collusion attacks and corresponding detection algorithm. The experiments based on an asymmetric anti-collusion fingerprint protocol show that the proposed method has better performance to resist collusion attacks than similar non-grouped fingerprint coding and effectively reduces the percentage of misjudgment, which verifies the availability of the algorithm and enriches the means of organization data security audit.
Henry, Wayne C., Peterson, Gilbert L..  2020.  Exploring Provenance Needs in Software Reverse Engineering. 2020 13th International Conference on Systematic Approaches to Digital Forensic Engineering (SADFE). :57–65.
Reverse engineers are in high demand in digital forensics for their ability to investigate malicious cyberspace threats. This group faces unique challenges due to the security-intensive environment, such as working in isolated networks, a limited ability to share files with others, immense time pressure, and a lack of cognitive support tools supporting the iterative exploration of binary executables. This paper presents an exploratory study that interviewed experienced reverse engineers' work processes, tools, challenges, and visualization needs. The findings demonstrate that engineers have difficulties managing hypotheses, organizing results, and reporting findings during their analysis. By considering the provenance support techniques of existing research in other domains, this study contributes new insights about the needs and opportunities for reverse engineering provenance tools.
Hassan, Wajih Ul, Bates, Adam, Marino, Daniel.  2020.  Tactical Provenance Analysis for Endpoint Detection and Response Systems. 2020 IEEE Symposium on Security and Privacy (SP). :1172–1189.
Endpoint Detection and Response (EDR) tools provide visibility into sophisticated intrusions by matching system events against known adversarial behaviors. However, current solutions suffer from three challenges: 1) EDR tools generate a high volume of false alarms, creating backlogs of investigation tasks for analysts; 2) determining the veracity of these threat alerts requires tedious manual labor due to the overwhelming amount of low-level system logs, creating a "needle-in-a-haystack" problem; and 3) due to the tremendous resource burden of log retention, in practice the system logs describing long-lived attack campaigns are often deleted before an investigation is ever initiated.This paper describes an effort to bring the benefits of data provenance to commercial EDR tools. We introduce the notion of Tactical Provenance Graphs (TPGs) that, rather than encoding low-level system event dependencies, reason about causal dependencies between EDR-generated threat alerts. TPGs provide compact visualization of multi-stage attacks to analysts, accelerating investigation. To address EDR's false alarm problem, we introduce a threat scoring methodology that assesses risk based on the temporal ordering between individual threat alerts present in the TPG. In contrast to the retention of unwieldy system logs, we maintain a minimally-sufficient skeleton graph that can provide linkability between existing and future threat alerts. We evaluate our system, RapSheet, using the Symantec EDR tool in an enterprise environment. Results show that our approach can rank truly malicious TPGs higher than false alarm TPGs. Moreover, our skeleton graph reduces the long-term burden of log retention by up to 87%.
Sharma, Rohit, Pawar, Siddhesh, Gurav, Siddhita, Bhavathankar, Prasenjit.  2020.  A Unique Approach towards Image Publication and Provenance using Blockchain. 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT). :311–314.
The recent spurt of incidents related to copyrights and security breaches has led to the monetary loss of several digital content creators and publishers. These incidents conclude that the existing system lacks the ability to uphold the integrity of their published content. Moreover, some of the digital content owners rely on third parties, results in lack of ability to provide provenance of digital media. The question that needs to be addressed today is whether modern technologies can be leveraged to suppress such incidents and regain the confidence of creators and the audience. Fortunately, this paper presents a unique framework that empowers digital content creators to have complete control over the place of its origin, accessibility and impose restrictions on unauthorized alteration of their content. This framework harnesses the power of the Ethereum platform, a part of Blockchain technology, and uses S mart Contracts as a key component empowering the creators with enhanced control of their content and the corresponding audience.
Kashliev, Andrii.  2020.  Storage and Querying of Large Provenance Graphs Using NoSQL DSE. 2020 IEEE 6th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS). :260–262.
Provenance metadata captures history of derivation of an entity, such as a dataset obtained through numerous data transformations. It is of great importance for science, among other fields, as it enables reproducibility and greater intelligibility of research results. With the avalanche of provenance produced by today's society, there is a pressing need for storing and low-latency querying of large provenance graphs. To address this need, in this paper we present a scalable approach to storing and querying provenance graphs using a popular NoSQL column family database system called DataStax Enterprise (DSE). Specifically, we i) propose a storage scheme, including two novel indices that enable efficient traversal of provenance graphs along causality lines, ii) present an algorithm for building our proposed indices for a given provenance graph, iii) implement our algorithm and conduct a performance study in which we store and query a provenance graph with over five million vertices using a DSE cluster running in AWS cloud. Our performance study results further validate scalability and performance efficiency of our approach.
Jaigirdar, Fariha Tasmin, Rudolph, Carsten, Bain, Chris.  2020.  Prov-IoT: A Security-Aware IoT Provenance Model. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1360—1367.
A successful application of an Internet of Things (IoT) based network depends on the accurate and successful delivery of a large amount of data collected from numerous sources. However, the highly dynamic nature of IoT network prevents the establishment of clear security perimeters and hampers the understanding of security aspects. Risk assessment in such networks requires good situational awareness with respect to security. Therefore, a comprehensive view of data propagation including information on security controls can improve security analysis and risk assessment in each layer of data propagation in an IoT architecture. Documentation of metadata is already used in data provenance to identify who generates which data, how, and when. However, documentation of security information is not seen as relevant for data provenance graphs. In this paper, we discuss the importance of adding security metadata in a data provenance graph. We propose a novel IoT Provenance model, Prov-IoT, which documents the history of data records considering data processing and aggregation along with security metadata to enable a foundation for trust in data. The model portrays a comprehensive framework and outlines the identification of information to be included in designing a security-aware provenance graph. This can be beneficial for uncovering system fault or intrusion. Also, it can be useful for decision-based systems for security analysis and risk estimation. We design an associated class diagram for the Prov-IoT model. Finally, we use an IoT healthcare example scenario to demonstrate the impact of the proposed model.
Tabiban, Azadeh, Jarraya, Yosr, Zhang, Mengyuan, Pourzandi, Makan, Wang, Lingyu, Debbabi, Mourad.  2020.  Catching Falling Dominoes: Cloud Management-Level Provenance Analysis with Application to OpenStack. 2020 IEEE Conference on Communications and Network Security (CNS). :1—9.

The dynamicity and complexity of clouds highlight the importance of automated root cause analysis solutions for explaining what might have caused a security incident. Most existing works focus on either locating malfunctioning clouds components, e.g., switches, or tracing changes at lower abstraction levels, e.g., system calls. On the other hand, a management-level solution can provide a big picture about the root cause in a more scalable manner. In this paper, we propose DOMINOCATCHER, a novel provenance-based solution for explaining the root cause of security incidents in terms of management operations in clouds. Specifically, we first define our provenance model to capture the interdependencies between cloud management operations, virtual resources and inputs. Based on this model, we design a framework to intercept cloud management operations and to extract and prune provenance metadata. We implement DOMINOCATCHER on OpenStack platform as an attached middleware and validate its effectiveness using security incidents based on real-world attacks. We also evaluate the performance through experiments on our testbed, and the results demonstrate that DOMINOCATCHER incurs insignificant overhead and is scalable for clouds.

Fan, X., Zhang, F., Turamat, E., Tong, C., Wu, J. H., Wang, K..  2020.  Provenance-based Classification Policy based on Encrypted Search. 2020 2nd International Conference on Industrial Artificial Intelligence (IAI). :1–6.
As an important type of cloud data, digital provenance is arousing increasing attention on improving system performance. Currently, provenance has been employed to provide cues regarding access control and to estimate data quality. However, provenance itself might also be sensitive information. Therefore, provenance might be encrypted and stored in the Cloud. In this paper, we provide a mechanism to classify cloud documents by searching specific keywords from their encrypted provenance, and we prove our scheme achieves semantic security. In term of application of the proposed techniques, considering that files are classified to store separately in the cloud, in order to facilitate the regulation and security protection for the files, the classification policies can use provenance as conditions to determine the category of a document. Such as the easiest sample policy goes like: the documents have been reviewed twice can be classified as “public accessible”, which can be accessed by the public.
Ayoade, G., Akbar, K. A., Sahoo, P., Gao, Y., Agarwal, A., Jee, K., Khan, L., Singhal, A..  2020.  Evolving Advanced Persistent Threat Detection using Provenance Graph and Metric Learning. 2020 IEEE Conference on Communications and Network Security (CNS). :1—9.

Advanced persistent threats (APT) have increased in recent times as a result of the rise in interest by nation-states and sophisticated corporations to obtain high profile information. Typically, APT attacks are more challenging to detect since they leverage zero-day attacks and common benign tools. Furthermore, these attack campaigns are often prolonged to evade detection. We leverage an approach that uses a provenance graph to obtain execution traces of host nodes in order to detect anomalous behavior. By using the provenance graph, we extract features that are then used to train an online adaptive metric learning. Online metric learning is a deep learning method that learns a function to minimize the separation between similar classes and maximizes the separation between dis-similar instances. We compare our approach with baseline models and we show our method outperforms the baseline models by increasing detection accuracy on average by 11.3 % and increases True positive rate (TPR) on average by 18.3 %.

Agadakos, I., Ciocarlie, G. F., Copos, B., George, J., Leslie, N., Michaelis, J..  2019.  Security for Resilient IoBT Systems: Emerging Research Directions. IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). :1—6.

Continued advances in IoT technology have prompted new investigation into its usage for military operations, both to augment and complement existing military sensing assets and support next-generation artificial intelligence and machine learning systems. Under the emerging Internet of Battlefield Things (IoBT) paradigm, a multitude of operational conditions (e.g., diverse asset ownership, degraded networking infrastructure, adversary activities) necessitate the development of novel security techniques, centered on establishment of trust for individual assets and supporting resilience of broader systems. To advance current IoBT efforts, a set of research directions are proposed that aim to fundamentally address the issues of trust and trustworthiness in contested battlefield environments, building on prior research in the cybersecurity domain. These research directions focus on two themes: (1) Supporting trust assessment for known/unknown IoT assets; (2) Ensuring continued trust of known IoBT assets and systems.

Agadakos, I., Ciocarlie, G. F., Copos, B., Emmi, M., George, J., Leslie, N., Michaelis, J..  2019.  Application of Trust Assessment Techniques to IoBT Systems. MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM). :833—840.

Continued advances in IoT technology have prompted new investigation into its usage for military operations, both to augment and complement existing military sensing assets and support next-generation artificial intelligence and machine learning systems. Under the emerging Internet of Battlefield Things (IoBT) paradigm, current operational conditions necessitate the development of novel security techniques, centered on establishment of trust for individual assets and supporting resilience of broader systems. To advance current IoBT efforts, a collection of prior-developed cybersecurity techniques is reviewed for applicability to conditions presented by IoBT operational environments (e.g., diverse asset ownership, degraded networking infrastructure, adversary activities) through use of supporting case study examples. The research techniques covered focus on two themes: (1) Supporting trust assessment for known/unknown IoT assets; (2) ensuring continued trust of known IoT assets and IoBT systems.

R P, Jagadeesh Chandra Bose, Singi, Kapil, Kaulgud, Vikrant, Phokela, Kanchanjot Kaur, Podder, Sanjay.  2019.  Framework for Trustworthy Software Development. 2019 34th IEEE/ACM International Conference on Automated Software Engineering Workshop (ASEW). :45–48.
Intelligent software applications are becoming ubiquitous and pervasive affecting various aspects of our lives and livelihoods. At the same time, the risks to which these systems expose the organizations and end users are growing dramatically. Trustworthiness of software applications is becoming a paramount necessity. Trust is to be regarded as a first-class citizen in the total product life cycle and should be addressed across all stages of software development. Trust can be looked at from two facets: one at an algorithmic level (e.g., bias-free, discrimination-aware, explainable and interpretable techniques) and the other at a process level by making development processes more transparent, auditable, and adhering to regulations and best practices. In this paper, we address the latter and propose a blockchain enabled governance framework for building trustworthy software. Our framework supports the recording, monitoring, and analysis of various activities throughout the application development life cycle thereby bringing in transparency and auditability. It facilitates the specification of regulations and best practices and verifies for its adherence raising alerts of non-compliance and prescribes remedial measures.
Bharati, Aparna, Moreira, Daniel, Brogan, Joel, Hale, Patricia, Bowyer, Kevin, Flynn, Patrick, Rocha, Anderson, Scheirer, Walter.  2019.  Beyond Pixels: Image Provenance Analysis Leveraging Metadata. 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). :1692–1702.
Creative works, whether paintings or memes, follow unique journeys that result in their final form. Understanding these journeys, a process known as "provenance analysis," provides rich insights into the use, motivation, and authenticity underlying any given work. The application of this type of study to the expanse of unregulated content on the Internet is what we consider in this paper. Provenance analysis provides a snapshot of the chronology and validity of content as it is uploaded, re-uploaded, and modified over time. Although still in its infancy, automated provenance analysis for online multimedia is already being applied to different types of content. Most current works seek to build provenance graphs based on the shared content between images or videos. This can be a computationally expensive task, especially when considering the vast influx of content that the Internet sees every day. Utilizing non-content-based information, such as timestamps, geotags, and camera IDs can help provide important insights into the path a particular image or video has traveled during its time on the Internet without large computational overhead. This paper tests the scope and applicability of metadata-based inferences for provenance graph construction in two different scenarios: digital image forensics and cultural analytics.
Scherzinger, Stefanie, Seifert, Christin, Wiese, Lena.  2019.  The Best of Both Worlds: Challenges in Linking Provenance and Explainability in Distributed Machine Learning. 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). :1620–1629.
Machine learning experts prefer to think of their input as a single, homogeneous, and consistent data set. However, when analyzing large volumes of data, the entire data set may not be manageable on a single server, but must be stored on a distributed file system instead. Moreover, with the pressing demand to deliver explainable models, the experts may no longer focus on the machine learning algorithms in isolation, but must take into account the distributed nature of the data stored, as well as the impact of any data pre-processing steps upstream in their data analysis pipeline. In this paper, we make the point that even basic transformations during data preparation can impact the model learned, and that this is exacerbated in a distributed setting. We then sketch our vision of end-to-end explainability of the model learned, taking the pre-processing into account. In particular, we point out the potentials of linking the contributions of research on data provenance with the efforts on explainability in machine learning. In doing so, we highlight pitfalls we may experience in a distributed system on the way to generating more holistic explanations for our machine learning models.