Semantics-Aware Security Research
Abstract: The pervasiveness of human-generated text data presents a unique opportunity for security researchers to better understand and more effectively defend against emerging threats. For instance, tweets, technical posts, white papers, and research articles in the public domain are a gold mine of valuable cyber threat intelligence. Many infected websites can be identified efficiently and accurately by analyzing their text content for inconsistencies between the semantics of the injected text they host (e.g., text selling Viagra) and the content expected on such sites (e.g., under an .edu domain). Further, analyzing the semantics of text data can help detect discrepancies between a mobile app's description and its actual behavior, recover a website's privacy policies from its text content, and capture potential exposure of classified information in messages (e.g., emails) that are about to be publicly released. Increasingly, the security community has come to realize that such semantic processing can significantly enhance security protection and has the potential to completely change the landscape of security technologies. However, unlike its successful application to biomedical and molecular biology research, semantic analysis has so far been applied to security domains only in an ad hoc way, without systematic methods or core technologies that enable security-specific semantic approaches to be reused across disparate security studies. The purpose of this breakout session is to discuss research on such foundations, as well as new application domains, for this emerging “Security-NLP” area.