UNIVERSAL WAKE WORD ENGINE CREATION WITH VERY LITTLE HUMAN INVOLVEMENT

10 December 2019


Despite advances in neural network architectures for universal Wake Word Engines (WWEs), achieving higher accuracy still depends on curating the right training dataset, a task that traditionally requires a significant amount of time and human interaction. In this paper, we present novel techniques to create a WWE from synthetically generated data and data obtained from public sources (e.g., YouTube). We demonstrate performance on par with WWEs trained on data obtained through traditional methods (e.g., Mechanical Turk data collection), while cutting the cycle time for data gathering from a few weeks to a few hours and requiring very little human involvement. This is achieved with data pipelines built on top of YouTube and a state-of-the-art Text-to-Speech (TTS) deep learning model.