Improving Data Generalization with Variational Autoencoders for Network Traffic Anomaly Detection

09 April 2021


Deep generative models have become increasingly popular in domains such as image processing, yet they rarely appear in cybersecurity. While the main application of these models is dimensionality reduction, they have only marginally been applied to challenges such as data generalization and the overfitting inherited from feature selection methods. To address these issues, we propose a combined architecture comprising a Conditional Variational AutoEncoder (CVAE) and a Random Forest (RF) classifier that automatically learns similarity among input features, provides a data distribution from which discriminative features are extracted from the original features, and finally classifies various types of attacks. The CVAE introduces the packet labels into the latent space in order to better learn the variation among input samples and distinguish the data characteristics of each class. This avoids confusion between classes while learning the whole data distribution. Compared with feature selection mechanisms such as Support Vector Machine Online (SVMo), and across various evaluation metrics, the proposed architecture demonstrates considerable improvements in performance with a significant testing time. To verify the versatility of the proposed architecture, two publicly available datasets were used in the experiments.
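To make the label-conditioning idea concrete, the following is a minimal, untrained sketch of a CVAE forward pass in plain Python. It is not the authors' implementation: the class name `TinyCVAE`, the single linear layers, the layer sizes, and the toy input are all assumptions chosen for illustration. It only shows where the one-hot label enters the encoder and decoder, how a latent vector is sampled, and how the KL term against a standard normal prior is computed.

```python
import math
import random


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases for one linear layer (illustrative)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Compute W x + b for a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class TinyCVAE:
    """Hypothetical, untrained CVAE with one linear layer per mapping."""

    def __init__(self, n_features, n_classes, n_latent, seed=0):
        self.rng = random.Random(seed)
        self.n_classes = n_classes
        enc_in = n_features + n_classes  # features concatenated with label
        dec_in = n_latent + n_classes    # latent code concatenated with label
        self.w_mu, self.b_mu = init_layer(n_latent, enc_in, self.rng)
        self.w_lv, self.b_lv = init_layer(n_latent, enc_in, self.rng)
        self.w_dec, self.b_dec = init_layer(n_features, dec_in, self.rng)

    def encode(self, x, label):
        """Return mean and log-variance of q(z | x, y)."""
        h = list(x) + one_hot(label, self.n_classes)
        return linear(self.w_mu, self.b_mu, h), linear(self.w_lv, self.b_lv, h)

    def sample(self, mu, logvar):
        """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)."""
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z, label):
        """Reconstruct the features from the latent code and the label."""
        return linear(self.w_dec, self.b_dec, list(z) + one_hot(label, self.n_classes))


def kl_to_standard_normal(mu, logvar):
    """KL divergence between N(mu, diag(exp(logvar))) and N(0, I)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


# Toy usage: 4 input features, 3 traffic classes, 2 latent dimensions (all assumed).
model = TinyCVAE(n_features=4, n_classes=3, n_latent=2)
x = [0.2, 0.7, 0.1, 0.9]
mu, logvar = model.encode(x, label=1)
z = model.sample(mu, logvar)
x_hat = model.decode(z, label=1)
kl = kl_to_standard_normal(mu, logvar)
```

In a full pipeline, the weights would be trained by minimizing a reconstruction loss plus the KL term, and a representation such as `mu` is one plausible choice of features to hand to a Random Forest classifier. That classification step, and how labels are handled for unseen traffic, are outside this sketch.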