Text Analytics for Mobile App Security and Beyond


Mobile apps are accompanied by a rich amount of natural language text: app descriptions, user reviews, update/release notes, etc. Such text is essential for conveying important information about the apps (such as their expected functionality), and this information is not easily obtained from the apps' structured artifacts (such as source or binary code and execution traces). Given the overwhelming amount of available natural language text, there is high demand for text analytics, including natural language processing (NLP) and text mining techniques, that automatically analyze this text to improve mobile app security. The application of NLP and text mining techniques to software artifacts dates back about a decade. Only recently, however, has text analytics for software artifacts such as mobile app artifacts become an emerging research area in the security community. Our recent work has shown that automated analysis of natural language text can help improve mobile app security, and software security in general.

This talk highlights our WHYPER framework, published at USENIX Security 2013. While recent work has developed various techniques to determine what mobile apps do, no prior work has provided a technical approach to answering the question: what do users expect? Our WHYPER framework serves as the first step toward addressing this challenge. Focusing on the permissions requested by a given mobile app, WHYPER examines whether the app description provides any indication of why the app needs each permission. It uses NLP techniques to identify sentences in an app description that describe the need for a given permission. The evaluation results demonstrate great promise in using NLP techniques to bridge the semantic gap between user expectations and app functionality, further aiding the risk assessment of mobile apps.
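To make the task concrete, here is a deliberately naive sketch of "find the sentences in a description that justify a permission." It is not WHYPER's technique, which relies on deeper NLP analysis than keyword lookup; it simply splits a description into sentences and flags those containing keywords associated with a permission. The permission names echo Android's permission naming, but the keyword lists, the function names, and the sample description are hypothetical assumptions for illustration.

```python
import re

# Hypothetical keyword lists per permission: illustrative assumptions,
# NOT the semantic models WHYPER actually uses.
PERMISSION_KEYWORDS = {
    "READ_CONTACTS": {"contact", "contacts", "address book", "phonebook"},
    "ACCESS_FINE_LOCATION": {"location", "gps", "nearby", "map", "maps"},
    "RECORD_AUDIO": {"record", "microphone", "voice", "audio"},
}


def split_sentences(text: str) -> list[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentences_for_permission(description: str, permission: str) -> list[str]:
    """Return the sentences that mention any keyword tied to `permission`."""
    keywords = PERMISSION_KEYWORDS.get(permission, set())
    matches = []
    for sentence in split_sentences(description):
        lowered = sentence.lower()
        # Whole-word match, so "map" does not fire on "mapping".
        if any(re.search(r"\b" + re.escape(k) + r"\b", lowered) for k in keywords):
            matches.append(sentence)
    return matches
```

For example, given the description "Share photos with friends. Invite people from your address book to join. Find events nearby on a map.", asking about `READ_CONTACTS` returns only the address-book sentence, while `RECORD_AUDIO` returns nothing, which is the kind of gap between requested permissions and described functionality that WHYPER aims to surface with far more robust analysis.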

This talk also presents our Text2Policy approach (published at FSE 2012), a text analytics approach that helps assure application security during application development. Text2Policy extracts Access Control Policies (ACPs) from natural language software documents such as requirements documents. ACPs specify which principals (such as users) have access to which resources. Ensuring the correctness and consistency of ACPs is crucial to preventing security vulnerabilities. In practice, however, ACPs are commonly written in natural language and buried in large documents such as requirements documents, where they are not amenable to automated checking for correctness and consistency. To address these issues, Text2Policy automatically extracts ACPs from natural language software documents, and resource-access information from natural language scenario-based functional requirements.
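As a rough illustration of what an extracted ACP might look like, the sketch below represents an ACP as a (principal, action, resource) triple plus an allow/deny flag, and matches one hypothetical sentence pattern, "A/An/The <principal> can|cannot <action> <resource>." Text2Policy's actual extraction uses richer linguistic analysis than a single regular expression; the `AccessControlPolicy` type, its fields, the pattern, and the sample sentences are all assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessControlPolicy:
    """Hypothetical ACP record: may `principal` perform `action` on `resource`?"""
    principal: str
    action: str
    resource: str
    allowed: bool


# One hypothetical sentence pattern (an assumption, not Text2Policy's grammar):
# "A|An|The <principal> can|cannot|can't <action> [the|a|an] <resource>[.]"
ACP_PATTERN = re.compile(
    r"^(?:an?|the)\s+(?P<principal>\w+)\s+(?P<modal>can|cannot|can't)\s+"
    r"(?P<action>\w+)\s+(?:the\s+|a\s+|an\s+)?(?P<resource>[\w\s]+?)\.?$",
    re.IGNORECASE,
)


def extract_acp(sentence: str) -> Optional[AccessControlPolicy]:
    """Return an ACP if the sentence matches the pattern, else None."""
    match = ACP_PATTERN.match(sentence.strip())
    if match is None:
        return None
    return AccessControlPolicy(
        principal=match.group("principal").lower(),
        action=match.group("action").lower(),
        resource=match.group("resource").strip().lower(),
        # "can" grants access; "cannot" / "can't" denies it.
        allowed=match.group("modal").lower() == "can",
    )
```

With this sketch, "A nurse can view the patient record." yields an allowing policy for principal `nurse`, action `view`, resource `patient record`, while a sentence outside the pattern returns `None`. Once policies are structured this way, checking consistency (for example, the same triple appearing with both `allowed=True` and `allowed=False`) becomes a mechanical comparison, which is the kind of automated checking that natural-language ACPs resist.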

Finally, this talk summarizes other major research in the area of text analytics for software security, outlines future research directions, and highlights the research challenges in this emerging area.

Speaker Bio

Tao Xie has been an Associate Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign, USA, since July 2013. Before then, he was an Associate Professor in the Department of Computer Science at North Carolina State University. He received his Ph.D. in Computer Science from the University of Washington in 2005, advised by David Notkin. Before that, he received an M.S. in Computer Science from the University of Washington in 2002, an M.S. in Computer Science from Peking University in 2000, advised by Hong Mei, and a B.S. in Computer Science from Fudan University in 1997. He has worked as a visiting researcher at Microsoft Research Redmond and Microsoft Research Asia. His research interests are in Software Engineering, with a focus on Software Testing, Debugging, and Analysis, Software Analytics, Software Security, Software Engineering for Mobile/Internet Computing, and Educational Software Engineering. He leads the Automated Software Engineering Research Group at Illinois, and is a member of the Programming Languages, Formal Methods, and Software Engineering (PL-FM-SE) area at Illinois.

He has contributed to the broad software engineering and computing research communities through extensive professional service. He has served as the ACM SIGSOFT History Liaison in the SIGSOFT Executive Committee, as well as a member and the SGB Liaison of the ACM History Committee. He is an ACM Distinguished Speaker and an IEEE Computer Society Distinguished Visitor. He received a National Science Foundation Faculty Early Career Development (CAREER) Award in 2009. He received a 2011 Microsoft Research Software Engineering Innovation Foundation (SEIF) Award, 2007 and 2013 Microsoft Research Awards, 2008, 2009, and 2010 IBM Faculty Awards, and a 2008 IBM Jazz Innovation Award. He received the 2010 North Carolina State University Sigma Xi Faculty Research Award. He received the IEEE Software Best Software Engineering in Practice (SEIP) Paper Award at ICSE 2013, the ASE 2009 Best Paper Award, and an ACM SIGSOFT Distinguished Paper Award. His research has been supported by NSF, NIST, ARO, IBM, Microsoft Research, Fujitsu Lab, and ABB Research. He was Program Co-Chair of the 2009 IEEE International Conference on Software Maintenance (ICSM) and Program Co-Chair of the 2011 and 2012 International Working Conference on Mining Software Repositories (MSR). He is the Program Chair of the 2015 International Symposium on Software Testing and Analysis (ISSTA). He has served on the program committees of various conferences, including ICSE, ASE, ISSTA, and WWW.

Creative Commons 2.5
