Black Box Attacks on Explainable Artificial Intelligence(XAI) methods in Cyber Security

Title: Black Box Attacks on Explainable Artificial Intelligence(XAI) methods in Cyber Security
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Kuppa, A., Le-Khac, N.-A.
Conference Name: 2020 International Joint Conference on Neural Networks (IJCNN)
Keywords: adversarial attack, Analytical models, artificial intelligence, artificial intelligence security, binary output, black box attack, black box encryption, black box settings, black-box models, composability, computer security, cyber security, cybersecurity domain, Data analysis, Data models, Deep Learning, domain experts, exact properties, explainable artificial intelligence, explainable artificial intelligence methods, gradient-based XAI, learning (artificial intelligence), Metrics, ML models, Predictive models, privacy, pubcrawl, Resiliency, Robustness, Scalability, security, security domain, security of data, security-relevant data-sets, threat models, white box, White Box Security, white box setting, xai, XAI methods

The cybersecurity community is slowly leveraging Machine Learning (ML) to combat ever-evolving threats. One of the biggest drivers for successful adoption of these models is how well domain experts and users are able to understand and trust their functionality. As these black-box models are being employed to make important predictions, stakeholders' demand for transparency and explainability is increasing. Explanations supporting the output of ML models are crucial in cyber security, where experts require far more information from the model than a simple binary output for their analysis. Recent approaches in the literature have focused on three different areas: (a) creating and improving explainability methods which help users better understand the internal workings of ML models and their outputs; (b) attacks on interpreters in the white box setting; (c) defining the exact properties and metrics of the explanations generated by models. However, they have not covered the security properties and threat models relevant to the cybersecurity domain, or attacks on explainable models in black box settings. In this paper, we bridge this gap by proposing a taxonomy for Explainable Artificial Intelligence (XAI) methods, covering various security properties and threat models relevant to the cyber security domain. We design a novel black box attack for analyzing the consistency, correctness and confidence security properties of gradient-based XAI methods. We validate our proposed system on three security-relevant data-sets and models, and demonstrate that the method achieves the attacker's goals of misleading both the classifier and the explanation report, and of misleading only the explainability method without affecting the classifier output. Our evaluation of the proposed approach shows promising results and can help in designing secure and robust XAI methods.

Citation Key: kuppa_black_2020