Automated Extraction of Regulated Information Types using Hyponymy Relations

Title: Automated Extraction of Regulated Information Types using Hyponymy Relations
Publication Type: Conference Proceedings
Year of Publication: 2016
Authors: Jaspreet Bhatia, Morgan Evans, Sudarshan Wadkar, Travis Breaux
Conference Name: 2016 RE: Requirements Engineering Conference
Series Title: 3rd International Workshop on Artificial Intelligence for Requirements Engineering
Date Published: 09/2016
Conference Location: Beijing, China
Keywords: CMU, compliance, hypernym, Hyponym, natural language processing, Oct'16, Ontology, privacy policy

Requirements analysts can model regulated data practices to identify and reason about risks of noncompliance. If terminology is inconsistent or ambiguous, however, these models and their conclusions will be unreliable. To study this problem, we investigated an approach to automatically construct an information type ontology by identifying information type hyponymy in privacy policies using Tregex patterns. Tregex is a utility to match regular expressions against constituency parse trees, which are hierarchical representations of natural language clauses, including noun and verb phrases. We discovered the Tregex patterns by applying content analysis to 15 privacy policies from three domains (shopping, telecommunication, and social networks) to identify all instances of information type hyponymy. From this dataset, three semantic and four syntactic categories of hyponymy emerged based on category completeness and word order. Among these, we identified and empirically evaluated 26 Tregex patterns to automate the extraction of hyponyms from privacy policies. The patterns identify information type hypernym-hyponym pairs with an average precision of 0.83 and recall of 0.52 across our dataset of 15 policies.
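
To make the idea of matching hyponymy patterns against constituency parse trees more concrete, the following is a minimal Python sketch. It does not use Tregex and does not reproduce any of the 26 patterns from the paper. Instead, it hand-codes one hypothetical "X, such as A, B and C" pattern over a parse tree written by hand as nested tuples, then scores the extracted pairs against a made-up gold set. The example sentence, the tree, the gold pairs, and all function names are assumptions for illustration only.

```python
# A constituency parse written by hand as nested tuples: (label, child, ...).
# A leaf is (tag, word). This tree is an assumption, not parser output.
TREE = (
    "NP",
    ("NP", ("NN", "contact"), ("NN", "information")),
    (",", ","),
    ("PP",
        ("JJ", "such"), ("IN", "as"),
        ("NP",
            ("NP", ("NN", "email"), ("NN", "address")),
            (",", ","),
            ("NP", ("NN", "phone"), ("NN", "number")),
            ("CC", "and"),
            ("NP", ("JJ", "postal"), ("NN", "address")))),
)


def is_leaf(node):
    """A leaf has the shape (tag, word), so its second element is a string."""
    return isinstance(node[1], str)


def words(node):
    """Return the words covered by a node, in left-to-right order."""
    if is_leaf(node):
        return [node[1]]
    out = []
    for child in node[1:]:
        out.extend(words(child))
    return out


def subtrees(node):
    """Yield the node and every node below it."""
    yield node
    if not is_leaf(node):
        for child in node[1:]:
            yield from subtrees(child)


def such_as_pairs(tree):
    """Hypothetical pattern: an NP made of a head NP and a 'such as' PP."""
    pairs = set()
    for node in subtrees(tree):
        if is_leaf(node) or node[0] != "NP":
            continue
        children = [c for c in node[1:] if c[0] != ","]  # ignore commas
        if len(children) != 2:
            continue
        head, pp = children
        if head[0] != "NP" or pp[0] != "PP" or is_leaf(pp):
            continue
        if words(pp)[:2] != ["such", "as"]:
            continue
        listing = pp[-1]
        if is_leaf(listing) or listing[0] != "NP":
            continue
        # Coordinated NPs are separate hyponyms; otherwise the NP itself is one.
        items = [c for c in listing[1:] if c[0] == "NP"] or [listing]
        hypernym = " ".join(words(head))
        for item in items:
            pairs.add((hypernym, " ".join(words(item))))
    return pairs


def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against a gold set of pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


predicted = such_as_pairs(TREE)
# Made-up gold annotations; the last pair is one this single pattern cannot find.
gold = predicted | {("contact information", "fax number")}
precision, recall = precision_recall(predicted, gold)
```

In this toy setup every extracted pair is correct but one annotated pair is missed, which mirrors in miniature the trade-off the abstract reports: pattern-based extraction tends to be precise, while recall depends on how many syntactic variants the pattern set covers.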

Citation Key: node-30311

Other available formats:

Bhatia_Automated_extraction_TB.pdf (PDF document, 327.04 KB)