Data Anonymization: K-anonymity Sensitivity Analysis

Title: Data Anonymization: K-anonymity Sensitivity Analysis
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Santos, W., Sousa, G., Prata, P., Ferrão, M. E.
Conference Name: 2020 15th Iberian Conference on Information Systems and Technologies (CISTI)
Date Published: June 2020
ISBN Number: 978-989-54659-0-3
Keywords: anonymity, ARX, ARX k-anonymization, Brazilian higher education evaluation system, central governments, composability, data anonymization, data privacy, data protection, data usability, digitization process, Education, European General Data Protection Regulation, further education, GDPR, government data processing, Human Behavior, k-anonymity, k-anonymity sensitivity analysis, local authorities, Metrics, open government data, personal data privacy, personal data protection, pubcrawl, resilience, Resiliency, sensitivity analysis, social justice, Sociology, Statistics, Tools

Digitization is now pervasive, extending to central governments and local authorities. It is hoped that using open government data for scientific research can advance the public good and social justice. Given the recently adopted European General Data Protection Regulation (GDPR), the key challenge in Portugal and other European countries is how to strike the right balance between personal data privacy and the value of data for research. This work presents a sensitivity analysis of a data anonymization procedure applied to real open government data from the Brazilian higher education evaluation system. The k-anonymization algorithm of the ARX tool was applied both with and without generalization of some variables of research value. The analysis of information loss and re-identification risk suggests that the anonymization process may lead to the under-representation of minorities and socio-demographically disadvantaged groups. These findings are intended to help scientists better balance re-identification risk, data usability, and contributions to public-good policies and practices.
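The under-representation effect described in the abstract can be illustrated with a minimal Python sketch. This is not ARX's API and not the authors' pipeline. It assumes a hypothetical toy table with two quasi-identifiers (`age`, `group`), a fixed generalization of age into 10-year bands, and suppression of every record whose equivalence class has fewer than k members. Re-identification risk is measured here as 1 divided by the size of the smallest equivalence class (the worst case under the prosecutor model), which is one common choice among several.

```python
from collections import Counter

# Hypothetical toy data (not the Brazilian dataset): "B" is the smaller group.
RECORDS = [{"age": a, "group": g} for a, g in [
    (21, "A"), (21, "A"), (22, "A"), (22, "A"), (23, "A"), (24, "A"),
    (25, "B"), (27, "B"), (31, "B"),
]]
QIS = ("age", "group")  # assumed quasi-identifiers


def generalize_age(age, width):
    """Replace an exact age with a band of the given width, e.g. 23 -> '20-29'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


def generalize(records, width):
    """Return copies of the records with age generalized into bands."""
    return [{**r, "age": generalize_age(r["age"], width)} for r in records]


def qi_key(record, quasi_identifiers):
    """Tuple of quasi-identifier values that defines a record's equivalence class."""
    return tuple(record[q] for q in quasi_identifiers)


def k_anonymize(records, quasi_identifiers, k):
    """Keep only records whose equivalence class has at least k members (suppress the rest)."""
    counts = Counter(qi_key(r, quasi_identifiers) for r in records)
    return [r for r in records if counts[qi_key(r, quasi_identifiers)] >= k]


def max_reidentification_risk(records, quasi_identifiers):
    """Worst-case (prosecutor-model) risk: 1 / size of the smallest equivalence class."""
    counts = Counter(qi_key(r, quasi_identifiers) for r in records)
    return 1 / min(counts.values()) if counts else 0.0


def share(records, group):
    """Fraction of records belonging to the given group."""
    return sum(r["group"] == group for r in records) / len(records) if records else 0.0


raw_kept = k_anonymize(RECORDS, QIS, k=2)                  # no generalization
gen_kept = k_anonymize(generalize(RECORDS, 10), QIS, k=2)  # 10-year age bands

print(len(raw_kept), share(raw_kept, "B"))  # prints: 4 0.0
print(len(gen_kept), share(gen_kept, "B"))  # prints: 8 0.25
```

With k = 2, suppression alone keeps 4 of 9 records and removes every member of group `B`, whose original share is about 33%. Generalizing age first keeps 8 of 9 records, but `B`'s share still drops to 25%, because its lone 31-year-old falls into a singleton class. In both runs the worst-case risk of the released data falls from 1.0 to 0.5, so on this toy table lower risk is bought with a disproportionate loss of minority records.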

Citation Key: santos_data_2020