EAGER: Exploring the Feasibility of Phoneme Sound Origins to Enhance Mobile Authentication

Project Details

Lead PI:
Performance Period: Aug 01, 2018 - Jul 31, 2020
Institution: Florida State University
Award Number:
Using mobile devices to authenticate a person's identity, both for access to the device itself and as a platform for verifying access to other nearby devices, is an important problem in building secure and private computing systems. This project seeks to improve voice recognition as an authentication tool by developing physical models of people's vocal tracts, which uniquely affect how individuals produce sounds. These novel biometric traits will be captured and inferred using a variety of sensors present on many mobile devices, and studied for their potential both to uniquely identify individuals and to be practically usable in real contexts. The work also includes systematic studies of how differences in device characteristics, user behavior, and users' physical state (for instance, having a cold) affect voice production, the ability to model it, and its use as a biometric identifier, and of how methods for inferring voice characteristics can account for these differences. The work will contribute both to the science of voice production and to broader questions about leveraging the unique characteristics of physical systems; it also has potential practical applications in authentication and offers opportunities to support both undergraduate and graduate education.

The proposed research investigates how human physiology and mobile sensing can be combined to enhance mobile authentication. The work on modeling physical voice production will focus on the phoneme sound origin, i.e., how different sounds are produced at different places in the human vocal tract. These differences in individual physiology are analogous to techniques that use small variations in the physical characteristics of computing devices to generate a unique hardware-based signature for each device. They will be sensed with signal processing algorithms that leverage time differences in sound capture across multiple microphones, and evaluated, both individually and in combination with other biometric features, by authentication error rates across datasets of different sizes. The next phase of the work will examine how the context of capture affects biometric quality. These contextual factors include the microphone placement, audio chipset, and sampling rate of a variety of devices; a person's grip on and interaction with the device, its location relative to their mouth, and their posture and motion; and their physiological (e.g., sickness) and psychological state (elicited through standard video-based emotion-induction techniques). Finally, to adapt models across contexts, the project team will develop techniques to sense pose, distance, and emotional state, and will evaluate statistical learning methods suited to relative rather than absolute data values, such as correlation analysis and Gaussian Mixture Models, to address these contextual variations.
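The abstract mentions sensing sound origins through time differences in capture across multiple microphones. A minimal sketch of that idea, the time difference of arrival (TDOA) between two microphones estimated from the peak of their cross-correlation, might look as follows; the function name, signals, and sample rate are illustrative assumptions, not the project's actual algorithm:

```python
import numpy as np

def estimate_tdoa(sig_a, sig_b, fs):
    """Estimate time difference of arrival between two microphone signals.

    Returns the delay of sig_a relative to sig_b in seconds (positive
    means sig_a lags sig_b), found at the peak of the cross-correlation.
    Illustrative sketch only; real systems typically use generalized
    cross-correlation with noise-robust weighting.
    """
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)  # lag in samples
    return lag / fs

# Synthetic demo: the same smooth pulse arrives 5 samples later at mic B,
# as it would if mic B were slightly farther from the sound origin.
fs = 16000
pulse = np.hanning(32)
mic_a = np.zeros(256); mic_a[100:132] = pulse
mic_b = np.zeros(256); mic_b[105:137] = pulse
delay = estimate_tdoa(mic_b, mic_a, fs)  # about 5 / 16000 seconds
```

With two or more such pairwise delays and known microphone positions, the apparent origin of each phoneme can be triangulated, which is the physical quantity the project proposes to use as a biometric feature.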
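The closing sentence motivates statistical methods that work on relative rather than absolute values, naming correlation analysis as one example. A small sketch of why correlation helps across capture contexts: a Pearson correlation score compares the shape of a feature vector, so a per-device gain or offset (e.g., a different audio chipset) leaves the score unchanged. The feature vectors here are hypothetical placeholders:

```python
import numpy as np

def correlation_score(enrolled, probe):
    """Pearson correlation between an enrolled feature vector and a probe.

    Centering and normalizing makes the score invariant to any constant
    offset and positive scaling of the probe, i.e., it compares relative
    rather than absolute feature values. Illustrative sketch only.
    """
    e = enrolled - enrolled.mean()
    p = probe - probe.mean()
    return float(e @ p / (np.linalg.norm(e) * np.linalg.norm(p)))

# Hypothetical enrolled features, and a probe of the same pattern
# captured on a different device (scaled by gain 3.0, offset by 0.5).
enrolled = np.array([0.2, 0.9, 0.4, 0.7, 0.1])
probe = 3.0 * enrolled + 0.5
score = correlation_score(enrolled, probe)  # approximately 1.0
```

A plain Euclidean distance between these two vectors would be large despite the identical underlying pattern, which is the contextual-variation problem the abstract's final sentence targets.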