RobOps: Robust control for cloud services
Online automated resource provisioning for cloud services must be robust in order to fulfil minimum Service Level Objectives (SLOs). Resource provisioning is mostly carried out by solutions based on scaling policies, probably the most popular example being Amazon Web Services AutoScaling. However, the validity of this approach is compromised as the complexity of the service being provisioned increases, since it lacks a service model and cannot capture service dynamics. Although more complex solutions based on control theory or other approaches have been proposed, their robustness is questionable in the presence of traffic surges or interference that affects the underlying service. In this paper we propose RobOps, a framework for automated resource provisioning that provides this robustness. RobOps uses online system identification (SID) to identify the best model for a service out of a preselected set. Moreover, it can adaptively adjust this model upon changes in the service. In addition, RobOps combines feedforward and feedback control loops to provision the service in an agile manner regardless of variations in the service workload. We validate its performance using a real enterprise communications service, showing how RobOps adapts its model upon changes in the service and how it can reduce SLO violations by more than 80% even in the presence of traffic surges.
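To make the two ideas in the abstract concrete, the following is a minimal sketch, not the RobOps implementation. It assumes a hypothetical candidate set of two one-parameter latency models (linear and quadratic in load per instance), a least-squares criterion for picking between them, and a simple integral term as the feedback loop. All names (`CANDIDATES`, `identify`, `feedforward`, `Provisioner`) and the gain `ki` are illustrative assumptions.

```python
import math

# Hypothetical preselected model set: latency = theta * f(load per instance).
CANDIDATES = {
    "linear": lambda x: x,
    "quadratic": lambda x: x * x,
}

def fit(feature, xs, ys):
    """Least-squares fit of y = theta * feature(x); returns (theta, sse)."""
    phi = [feature(x) for x in xs]
    denom = sum(p * p for p in phi)
    theta = sum(p * y for p, y in zip(phi, ys)) / denom if denom else 0.0
    sse = sum((y - theta * p) ** 2 for p, y in zip(phi, ys))
    return theta, sse

def identify(xs, ys):
    """Pick the candidate with the smallest squared error on recent samples."""
    best = min(CANDIDATES, key=lambda name: fit(CANDIDATES[name], xs, ys)[1])
    theta, _ = fit(CANDIDATES[best], xs, ys)
    return best, theta

def feedforward(name, theta, workload, target):
    """Invert the identified model: fewest instances predicted to meet target."""
    if theta <= 0:
        raise ValueError("model gain must be positive")
    ratio = target / theta
    max_load = ratio if name == "linear" else math.sqrt(ratio)
    return max(1, math.ceil(workload / max_load))

class Provisioner:
    """Feedforward sizing from the model plus an integral feedback correction."""

    def __init__(self, target, ki=1.0):
        self.target = target      # SLO target, e.g. latency in ms
        self.ki = ki              # assumed integral gain
        self.integral = 0.0

    def step(self, name, theta, workload, measured):
        # Feedback: accumulate the relative SLO error left by model mismatch.
        self.integral += self.ki * (measured - self.target) / self.target
        base = feedforward(name, theta, workload, self.target)
        return max(1, base + round(self.integral))
```

In this sketch, calling `identify` again on a sliding window of recent samples is what would let the model follow changes in the service, while the feedforward term reacts to workload before the SLO is violated and the feedback term absorbs whatever error the model leaves behind.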