Test-Driven Anonymization for Artificial Intelligence

Title: Test-Driven Anonymization for Artificial Intelligence
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Augusto, Cristian; Morán, Jesús; De La Riva, Claudio; Tuya, Javier
Conference Name: 2019 IEEE International Conference On Artificial Intelligence Testing (AITest)
Date Published: April
Keywords: AI context, anonymization, anonymization efforts, anonymization techniques, anonymized data, artificial intelligence, artificial intelligence security, artificial intelligence technique, artificial intelligence tools, classification AIs, Data models, data privacy, data protection, functional quality, functional suitability, Health Care, Insurance, internal requirements, k-anonymity, nonfunctional quality, pattern classification, Predictive models, Privacy Protections, Production, pubcrawl, regulatory requirements, security of data, Software Testing, test-driven anonymization approach, Testing, Tools
Abstract: In recent years, data published and shared with third parties to develop artificial intelligence (AI) tools and services has significantly increased. When there are regulatory or internal requirements regarding privacy of data, anonymization techniques are used to maintain privacy by transforming the data. The side-effect is that the anonymization may lead to useless data to train and test the AI because it is highly dependent on the quality of the data. To overcome this problem, we propose a test-driven anonymization approach for artificial intelligence tools. The approach tests different anonymization efforts to achieve a trade-off in terms of privacy (non-functional quality) and functional suitability of the artificial intelligence technique (functional quality). The approach has been validated by means of two real-life datasets in the domains of healthcare and health insurance. Each of these datasets is anonymized with several privacy protections and then used to train classification AIs. The results show how we can anonymize the data to achieve an adequate functional suitability in the AI context while maintaining the privacy of the anonymized data as high as possible.
Citation Key: augusto_test-driven_2019