Keys to virtual network functions assurance
Virtual network functions (VNF) assurance in the management lifecycle is critical to operators as they take advantage of the dynamic application provisioning made possible by moving to the cloud. And while network functions virtualization (NFV) offers benefits to network operators and their customers, the scope of operations and management system tools is a concern:
- Will they be able to keep track of service performance and the rapidly changing state of network resources on a large scale?
- Will they be comprehensive and intelligent enough to enable the level of visibility and traceability required for monitoring and troubleshooting?
To address new network and service assurance challenges, operators need to evolve their network operations tools for NFV through tighter coupling with virtual network functions management. (This tighter coupling is described in the Converged NMS / Virtual Network Functions Manager section of our recent vEPC-related TechBlog.)
In addition to the ETSI Management and Orchestration (MANO) architecture, progress has been made in the ETSI specification that defines NFV Service Quality Metrics. The specification aims to enable better engineering of user service quality for virtual network functions, more efficient fault localization and mitigation, and faster identification of the true root cause of service impairment so that corrective actions can be taken promptly.
As NFV service quality metrics and traditional network service performance are continuously monitored, a service-aware infrastructure relationship model within a network operations tool will be important. Such a model correlates events to the true root cause of service-impacting problems without requiring operators to develop and pre-configure volumes of custom handling policy rules and scripts. In addition, this model allows operators to assess the service impact of network events under investigation more rapidly, and to speed fault isolation and resolution.
And to make this more advanced fault management meaningful for network operators, assurance visualization will provide intuitive views that show how a multitude of events and key quality indicators (KQIs) relate to each other, with clear visibility into the root cause of problems. It will also let operators follow the timeline of events and state changes in the network that indicate causes and possible effects.
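The idea of a relationship model can be illustrated with a minimal sketch. It assumes a hand-written dependency map between services, VNFs, virtual machines and hosts; the entity names and the functions `root_causes` and `impacted_services` are hypothetical and are not taken from any product or specification. An alarmed entity is treated as a probable root cause only if nothing it depends on is also alarmed.

```python
# Hypothetical dependency model: each entity lists the entities it depends on.
DEPENDS_ON = {
    "service:vEPC-data": ["vnf:mme-1", "vnf:sgw-1"],
    "vnf:mme-1": ["vm:mme-1a"],
    "vnf:sgw-1": ["vm:sgw-1a"],
    "vm:mme-1a": ["host:compute-7"],
    "vm:sgw-1a": ["host:compute-7"],
    "host:compute-7": [],
}


def root_causes(alarmed, depends_on):
    """Return alarmed entities with no alarmed (transitive) dependency."""
    def has_alarmed_dependency(entity, seen):
        for dep in depends_on.get(entity, []):
            if dep in seen:
                continue
            seen.add(dep)
            if dep in alarmed or has_alarmed_dependency(dep, seen):
                return True
        return False

    return sorted(e for e in alarmed if not has_alarmed_dependency(e, set()))


def impacted_services(entity, depends_on):
    """Return the services that depend, directly or indirectly, on entity."""
    # Invert the map so we can walk from infrastructure up to services.
    dependents = {}
    for parent, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(parent)

    impacted, stack, seen = set(), [entity], {entity}
    while stack:
        current = stack.pop()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
                if parent.startswith("service:"):
                    impacted.add(parent)
    return sorted(impacted)


alarmed = {"host:compute-7", "vm:mme-1a", "vnf:mme-1", "service:vEPC-data"}
print(root_causes(alarmed, DEPENDS_ON))
print(impacted_services("host:compute-7", DEPENDS_ON))
```

In this sketch the symptomatic alarms on the VM, the VNF and the service collapse onto the host alarm without any per-alarm rule, which is the behavior the relationship model is meant to provide. A real tool would build the map from inventory and orchestration data rather than from a static dictionary.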
This article is the second in TechZine that discusses the evolution of network and service assurance. The earlier blog gives a general overview of how network operations tools can be more efficient.
Converged NMS and VNF manager provides full lifecycle management for both PNFs and VNFs, including advanced fault management and assurance visualization
ASSURING THE EVER-CHANGING STATE OF THE VIRTUAL NETWORK
Virtual network function configurations will be far more dynamic than those of physical network functions (PNFs). This presents new challenges for network operations tools, which must keep pace with events related to highly dynamic network state changes and elastic scaling.
Manual processes that piece together assurance data from disparate views will not be sufficient to keep pace in this highly dynamic NFV environment. And traditional real-time-only monitoring and assurance views will not be effective when a VNF could be here one moment and then scaled down and gone the next. This means that both current and historical event and state information need to be intelligently processed with near real-time performance, and at a large scale.
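To make the need for historical state concrete, here is a minimal sketch, assuming the tool records when each VNF instance was created and removed. The instance names, timestamps and the function `instances_alive_at` are hypothetical illustrations only.

```python
from datetime import datetime

# Hypothetical lifetime records: (instance, created, removed).
# removed is None while the instance is still running.
LIFETIMES = [
    ("vnf-a", datetime(2015, 1, 1, 10, 0), None),
    ("vnf-b", datetime(2015, 1, 1, 10, 5), datetime(2015, 1, 1, 10, 20)),
    ("vnf-c", datetime(2015, 1, 1, 10, 30), None),
]


def instances_alive_at(lifetimes, when):
    """Return the names of instances that existed at the given time."""
    return sorted(
        name
        for name, created, removed in lifetimes
        if created <= when and (removed is None or when < removed)
    )


# vnf-b no longer exists, but it can still be found for a past fault.
print(instances_alive_at(LIFETIMES, datetime(2015, 1, 1, 10, 10)))
```

A real-time-only view would show only the instances running now. Keeping the lifetimes lets an operator ask what the network looked like when a past event was raised.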
Consider how much more meaningful assurance views would be for network operators if they were more intuitive, and if it were easier to understand how all the network events and MANO-related KQIs relate to each other.
For example, wouldn’t it be more insightful for operators troubleshooting a service performance issue to have a timeline that shows the service-impacting threshold-crossing alerts, and whether orchestration or network events occurred in the same general timeframe?
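The timeline idea can be sketched as a simple time-window lookup. This is an illustration under stated assumptions: the `Event` record, the source labels ("service", "orchestration", "network") and the five-minute default window are all hypothetical choices, not part of any specification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    time: datetime
    source: str  # assumed labels: "service", "orchestration", "network"
    text: str


def events_near(alert, events, window=timedelta(minutes=5)):
    """Return orchestration/network events within `window` of the alert, oldest first."""
    nearby = [
        e
        for e in events
        if e.source != "service" and abs(e.time - alert.time) <= window
    ]
    return sorted(nearby, key=lambda e: e.time)


t0 = datetime(2015, 1, 1, 12, 0)
alert = Event(t0, "service", "threshold crossing alert: session setup latency")
events = [
    alert,
    Event(t0 + timedelta(minutes=1), "network", "link utilization alarm"),
    Event(t0 - timedelta(minutes=2), "orchestration", "scale-in of VNF instance"),
    Event(t0 - timedelta(minutes=30), "network", "port flap"),
]

for e in events_near(alert, events):
    print(e.time.isoformat(), e.source, e.text)
```

Events that fall in the same window are only candidates for cause or effect. The sketch orders them for display and leaves the judgment to the operator or to a correlation model.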
ENHANCING NFV ASSURANCE WITH SERVICE QUALITY METRICS
As the number of virtual network function deployments increases, network operations tools will need to evolve with new NFV service quality metric definitions, and provide intelligence for correlating all the different events coming from the various types of NFV infrastructure and MANO elements. Operators need service-aware visibility and traceability into the various layers that could impact service quality. Specifically, the capabilities for troubleshooting and root cause analysis should work in coordination with VNF management for the full lifecycle, which includes assurance.
For operations to be effective in a highly dynamic environment, with network services that depend on both VNFs and PNFs for underlying network infrastructure, there must be a service-aware understanding of the relationships between them and the services. There must also be a mapping of how service quality events triggered by virtual machines, virtual network functions, and orchestration layers impact or trigger changes in dependent layers.
For example, when there are issues with virtual network provisioning latency, reliability, or diversity compliance, these conditions may trigger actions within the orchestration layer. Network operators' primary concerns are then:
- How will these actions impact service quality?
- How will they impact the virtual network?
- How will the VNF manager react?
Without a network operations tool to provide this type of intelligence for assuring virtual network functions, operators won’t have the visibility necessary to understand whether a problem is within their control. And this type of information would be highly valuable not only for troubleshooting, but more broadly for clarifying accountability for a localized problem across organizational groups, from IT to the different network domain groups.
Operators require a unified network operations tool that has evolved with the intelligence to meet all of these new NFV-related assurance challenges. This tool must possess a service-aware model unified with NFV lifecycle management. It must scale and be able to track huge numbers of events that reflect the continual state of flux across the layers that impact service quality. (For more examples of service quality metrics that provide requirements for assuring virtual networks, refer to the ETSI specification cited previously.)
EVOLVING ASSURANCE WITH ADVANCED FAULT MANAGEMENT
Operators deploying NFV require advanced fault management that provides both current and historical visibility for root cause analysis, so that active and past faults can be correlated as the state of the network changes. This historical fault correlation is essential to identify the root cause of problems in the highly dynamic virtualized network, where MANO-triggered corrective actions could make intermittently recurring, customer-impacting issues difficult to investigate.
And network and service assurance tools in the cloud/NFV era must scale to track the full history of related service-impacting events so network operators can perform both real-time troubleshooting and trend analysis.
Tools also need the intelligence to detect recurring problems. Specifically, operators require a tool that can help them assess whether automated corrective actions are succeeding or failing. And if they are failing, whether the failures are persistent or intermittent, and whether there is an actionable probable cause in the network infrastructure within the scope of the network operator’s control.
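The distinction between successful, persistently failing and intermittently failing corrective actions can be sketched as follows. It assumes the tool keeps, per fault, a time-ordered history of whether each automated action cleared the problem; the labels and function names are hypothetical.

```python
from collections import defaultdict


def classify_remediation(outcomes):
    """Classify a time-ordered list of booleans (True = action cleared the fault)."""
    if not outcomes:
        return "no-data"
    failures = outcomes.count(False)
    if failures == 0:
        return "successful"
    if failures == len(outcomes):
        return "persistent-failure"
    return "intermittent-failure"


def classify_by_fault(records):
    """Group (fault_key, cleared) records by fault and classify each history."""
    history = defaultdict(list)
    for fault_key, cleared in records:
        history[fault_key].append(cleared)
    return {key: classify_remediation(values) for key, values in history.items()}


records = [
    ("vm-restart:mme-1a", True),
    ("scale-out:sgw-1", False),
    ("vm-restart:mme-1a", False),
    ("scale-out:sgw-1", False),
]
print(classify_by_fault(records))
```

This sketch only labels the pattern. Deciding whether a probable cause is actionable and within the operator's control would need the dependency and ownership information held in the tool's service model.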
And amongst the high volumes of events, there will also be a need to suppress or filter out events that don’t require action by the network operations team.
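One simple way to express such suppression is as a list of rules applied to each event. The sketch below is an assumption-laden illustration: the event fields (`severity`, `root_cause_id`, `in_maintenance`) and the rules themselves are hypothetical, chosen only to show the shape of the filter.

```python
# Hypothetical suppression rules: an event is dropped if any rule matches.
SUPPRESS_RULES = [
    lambda e: e.get("severity") == "info",           # informational only
    lambda e: e.get("root_cause_id") is not None,    # symptom of a known fault
    lambda e: e.get("in_maintenance", False),        # planned work in progress
]


def actionable(events, rules=SUPPRESS_RULES):
    """Return the events that no suppression rule matches."""
    return [e for e in events if not any(rule(e) for rule in rules)]


events = [
    {"id": 1, "severity": "info"},
    {"id": 2, "severity": "major", "root_cause_id": 7},
    {"id": 3, "severity": "critical"},
    {"id": 4, "severity": "minor", "in_maintenance": True},
]
print([e["id"] for e in actionable(events)])
```

Suppressed events would normally still be stored so that they remain available for historical correlation and trend analysis.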
A service-aware network operations tool that is converged with the VNF manager function, and that provides advanced fault management with network assurance visualization, lets operators address the challenges of virtual network functions assurance and perform the following network operations tasks more efficiently:
- Service impact assessment
- Fault localization
- Identification of the true root cause (from symptomatic faults)
- Corrective actions for fast problem resolution
An OpenStack integrated VNF Manager for the Nokia VSR and EPC VNFs application note
5620 Service Aware Manager webpage