Redesigning performance verification for a new product architecture
25 October 2016
At the end of 2015 our business line started to develop a new product that would break completely with its roots. The main idea was to get rid of as much legacy code as possible, keeping only the business logic, and to introduce a more infrastructure-independent solution. Along the way we also intended to move our processes towards continuous delivery or even DevOps. At a very early stage of development we realized that our current methods and practices of performance verification would be a hindering factor both in actual development and in process renewal. We had tools and practices dating back over 10 years, which were simply insufficient to serve our goals and give us the speed to get to market in time. The problems lay in test environment creation, test automation and test design alike.

To tackle this complex problem we took two parallel paths. On one hand, we started redefining how we do performance tests, reducing test and SUT complexity to the smallest possible increment. On the other hand, we put heavy automation work into configuration and execution. We created a scalable dynamic deployment framework (AvED - Automated virtual Environment Deployment), with which we can carry out on-demand SUT and non-SUT deployment in a fraction of the original configuration time. Using AvED and scalable systems, we are currently able to deploy 48 parallel test environments to our available cloud capacity, on both VMware and OpenStack infrastructure. To test the performance of different possible cloud variants we created a Python-based measurement system to establish a performance baseline on a reference configuration, and developed a Python, PowerCLI and Bash based performance predictor tool which utilizes IPSL and Gatling. The results of performance prediction can be further refined and expanded with the Clover (Cloud Verification) tool.
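The gist of deploying many test environments in parallel instead of one after another can be sketched as follows. This is a minimal illustration, not AvED itself: `deploy_environment` is a hypothetical placeholder for the actual VMware/OpenStack deployment call, and the environment count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def deploy_environment(env_id):
    # Placeholder for an on-demand SUT/non-SUT deployment; a real
    # implementation would drive the VMware or OpenStack APIs here.
    return f"env-{env_id:02d} ready"


def deploy_parallel(count):
    # Fan the deployments out over a thread pool so the total wall-clock
    # time approaches that of a single deployment rather than count times it.
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(deploy_environment, range(1, count + 1)))


results = deploy_parallel(4)
print(results[0])  # env-01 ready
```

The same fan-out pattern scales to the 48 parallel environments mentioned above, bounded only by the available cloud capacity.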
With the predictor we can already characterize VNF performance on a never-before-tested infrastructure without having to deploy the VNF itself. In our earlier practice, test cases ran for a couple of hours at minimum, even when it was visible after a few minutes that the result would be a failure. To solve this problem of futile test execution we implemented a continuous test supervision system, which oversees all our executions and starts tearing an execution down as soon as a fail criterion has been met. To automate test result analysis, we are developing a SUT behaviour and data discrepancy recognition framework along with selective log analytics.

When designing the test cases we needed to fully understand what the new infrastructure means and how we could test it in a meaningful way, and earlier in the product lifecycle. To follow the possibilities of containerization we had to design test cases which - contrary to previous practice - do not test the whole system, but only one or a few parts of it. We designed test cases to identify load-related problems and performance bottlenecks at the lowest possible level of the architecture, and only verify the results on a full-scale environment. To align with our business process renewal we had to design the new test cases to provide feedback as fast as one CI cycle. With our renewed test cases and increased automation we significantly decreased performance verification turnaround time from months to weeks or days, and we are able to provide load/mass traffic testing feedback to development even within the desired two-hour CI cycle. With the reduction of testing cycle times we are able to introduce new types of tests into our delivery process, such as new kinds of chaos and robustness tests with Pumba or Faulty Cat, which ultimately leads to higher coverage and quality for all stakeholders throughout our VNF delivery.
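The early-teardown idea behind continuous test supervision can be sketched in a few lines. The names, metrics and thresholds below are illustrative assumptions, not our actual implementation; the point is only the control flow: evaluate fail criteria after every step and abort instead of running the full duration.

```python
import time


def supervise(run_step, fail_criteria, max_steps, interval=0.0):
    """Supervise a running test: after each step, evaluate every fail
    criterion against the current metrics and tear the run down early
    if any criterion is met, instead of letting a doomed run finish."""
    for step in range(1, max_steps + 1):
        metrics = run_step(step)
        failed = [name for name, check in fail_criteria.items() if check(metrics)]
        if failed:
            # Early teardown: no point burning hours on a failed run.
            return {"step": step, "verdict": "failed", "criteria": failed}
        time.sleep(interval)
    return {"step": max_steps, "verdict": "passed", "criteria": []}


# Hypothetical usage: latency grows each step and crosses its threshold
# at step 3 of a nominal 100-step run.
def fake_step(step):
    return {"error_rate": 0.0, "p99_latency_ms": 50 * step}


criteria = {
    "latency": lambda m: m["p99_latency_ms"] > 120,
    "errors": lambda m: m["error_rate"] > 0.01,
}
print(supervise(fake_step, criteria, max_steps=100))
# {'step': 3, 'verdict': 'failed', 'criteria': ['latency']}
```

In the example the supervisor stops after 3 steps rather than 100, which is exactly the saving that turns multi-hour futile executions into minutes.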
In my presentation I would give a general overview of how we redesigned our test cases to open them up to automation, and would introduce our dynamic deployment framework and its connections to the SCM/CI pipeline and the automated test execution frame. I would also demonstrate the key benefits of simplification in automation, through which we were able to create a modular, generic system that can be applied company-wide or even outside Nokia.