ChaT: Failure Detection using Reconfigurations and Perturbations in Distributed Network Systems

07 December 2021


Detecting network faults and assessing whether a network system is in a normal state can be challenging, but doing so promises to increase service quality and efficiency. This paper proposes ChaT, a testing framework applied under system reconfigurations and perturbations on distributed network systems to identify and discriminate between safe and failure behavior under different testing scenarios. Motivated by the metamorphic testing (MT) technique, which removes the burden of defining software oracles, ChaT correlates system inputs and outputs using the proposed metamorphic relationships (MRs) to find execution patterns. Two machine learning techniques, principal component analysis (PCA) and support vector machine (SVM), are used to identify system states based on the testing scenario. Several anomaly detection techniques, namely isolation forest (IF), one-class SVM (OCSVM), local outlier factor (LOF), and robust covariance (RC), categorize experiments as belonging to either safe or failure-prone states, thus validating the findings. We apply ChaT to our media streaming application, in which clients request video data from the servers while an identity provider server and a reverse proxy server sit between them to provide secure data exchange and load balancing. The simulation results show that ChaT achieves its goals of classifying execution contexts and detecting failure-prone experiments, as supported by statistical analysis.
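To illustrate how a metamorphic relationship can stand in for a software oracle, the following is a minimal sketch and not ChaT's actual implementation. It assumes a toy round-robin dispatcher in place of the reverse proxy, and the two relationships shown (total served requests are preserved under reconfiguration, and load stays balanced) are hypothetical examples rather than the MRs proposed in the paper. All function names are illustrative.

```python
from typing import Callable, List

Balancer = Callable[[int, int], List[int]]


def round_robin(num_requests: int, num_servers: int) -> List[int]:
    """Toy stand-in for a reverse proxy: return the request count per server."""
    loads = [0] * num_servers
    for i in range(num_requests):
        loads[i % num_servers] += 1
    return loads


def faulty_balancer(num_requests: int, num_servers: int) -> List[int]:
    """Hypothetical failure: the last server silently drops its requests."""
    loads = round_robin(num_requests, num_servers)
    loads[-1] = 0
    return loads


def mr_total_preserved(balancer: Balancer, num_requests: int,
                       servers_before: int, servers_after: int) -> bool:
    """Assumed MR: changing the server count must not change the total served."""
    before = sum(balancer(num_requests, servers_before))
    after = sum(balancer(num_requests, servers_after))
    return before == after


def mr_balanced(balancer: Balancer, num_requests: int, num_servers: int) -> bool:
    """Assumed MR: per-server loads differ by at most one request."""
    loads = balancer(num_requests, num_servers)
    return max(loads) - min(loads) <= 1


if __name__ == "__main__":
    for name, balancer in [("round_robin", round_robin),
                           ("faulty_balancer", faulty_balancer)]:
        safe = (mr_total_preserved(balancer, 100, 2, 3)
                and mr_balanced(balancer, 100, 3))
        print(name, "safe" if safe else "failure-prone")
```

Neither check needs to know the correct per-server load in advance; each one only compares outputs across related runs, which is the property that makes MT usable when no oracle is available.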