Augmenting Conversational Agents with Ambient Acoustic Contexts
05 October 2020
Conversational agents are rich in content today. However, they are entirely oblivious to the user's situational context, which limits their ability to adapt their responses and interaction style. To this end, we explore the design space for a context-augmented conversational agent, including an analysis of input-segment dynamics and computational alternatives. Building on these, we propose a solution that intelligently redesigns the input segment for ambient context recognition, achieved through a two-step inference pipeline. First, we separate the non-speech segment from the acoustic signal, and then we use a neural network to infer diverse ambient contexts. To build the network, we curated a public audio dataset through crowdsourcing. Our experimental results demonstrate that the proposed network can distinguish between 10 ambient contexts with an average F1 score of 0.79 and a computational latency of 3 milliseconds. We also build a compressed neural network optimised for both accuracy and latency. Finally, we present a concrete manifestation of our solution in the design of a context-aware conversational agent and demonstrate use cases.
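The two-step pipeline described above can be illustrated with a minimal sketch. This is not the paper's implementation: the speech/non-speech separation here is a crude frame-energy threshold, and the classifier is a random linear placeholder standing in for the trained neural network; the function names, thresholds, and toy signal are all assumptions for illustration.

```python
import numpy as np

def split_non_speech(signal, frame_len=400, energy_thresh=0.01):
    """Step 1 (stand-in): split a mono signal into frames and keep the
    low-energy frames as 'non-speech'. The paper's actual separation
    step is more sophisticated; this is an energy-based sketch."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energies = np.mean(frames ** 2, axis=1)
    return frames[energies < energy_thresh]

def classify_ambient(frames, n_contexts=10):
    """Step 2 (placeholder): map each non-speech frame to one of
    n_contexts labels. A random linear model over a mean-energy
    feature stands in for the trained ambient-context network."""
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(1, n_contexts))
    feats = np.mean(frames ** 2, axis=1, keepdims=True)
    return np.argmax(feats @ weights, axis=1)

# Toy input: quiet ambient noise with a loud "speech" burst in the middle.
rng = np.random.default_rng(1)
audio = rng.normal(0.0, 0.05, 16000)
audio[6000:10000] += np.sin(np.linspace(0, 200 * np.pi, 4000))  # loud burst

non_speech = split_non_speech(audio)   # drops the 10 loud frames
labels = classify_ambient(non_speech)  # one context label per kept frame
print(len(non_speech), labels.shape)
```

In this sketch, the 4000-sample burst spans exactly 10 of the 40 frames, so 30 non-speech frames survive step 1 and each receives one of the 10 hypothetical context labels in step 2.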