Pushing the Decision Boundaries: Discovering New Classes in Audio Data

We usually need new data to train or fine-tune machine learning models for {\em new tasks}. However, previously collected data might already contain enough relevant information to learn the desired tasks. In this paper, we explore discovering new classes in audio data by extending a recent vision-based task discovery framework with an audio processing pipeline. Our proposed pipeline aims to find new class boundaries on specific acoustic components, such as speech and background noise, allowing the framework to handle audio data effectively. Furthermore, we introduce a new metric for assessing the clarity of newly discovered class boundaries.
We show that, compared to the baseline task discovery framework, we can discover new classes with 21% higher clarity on average.