Data augmentation by guided deep interpolation

01 November 2021

New Image

State-of-the-art machine learning algorithms require large amount of high quality data. In practice, however, the sample size is commonly low and data is imbalanced along different class labels. Low sample size and imbalanced class distribution can significantly deteriorate the predictive performance of machine learning models. In order to overcome data quality issues, we propose a novel data augmentation method, Guided Deep Interpolation (GDI). It is based on a convolutional auto-encoder network, which is equipped with an auxiliary linear self-expressive layer. The network is trained by minimizing a composite objective function so that to extract the underlying clustered structure of semantic similarities of data points while high reconstruction quality is also preserved. The trained network is used to define a sampling strategy and a synthetic data generation procedure. Making use of the weights of the self-expressive layer, we introduce a measure of semantic variability to quantify how similar a data point to other data points on average.