Jun 13 2016

Optimize capacity with lean cloud computing

Read the book to learn more.

Lean cloud computing offers ICT service providers a means to eliminate waste from the value chain and transition to an efficient, demand-driven operating model. By applying lean principles to capacity management, service providers can minimize wasted capacity, improve operational efficiency, and generate sustainable cost savings.

Published in April 2016, Lean Computing for the Cloud analyzes the potentially disruptive impact of cloud computing and network functions virtualization (NFV). It methodically applies lean manufacturing and just-in-time inventory management principles to cloud capacity management to explore how lean operations will transform ICT.

The book considers the fundamental capacity challenges facing cloud infrastructure and application service providers. Infrastructure providers are concerned about both physical and virtual resource capacity. They need to know how much infrastructure to deploy and how much equipment should be powered up and in service at any given time. Application service providers need to know how much online capacity should be available for each application at any given time.

These challenges are connected. By applying lean principles across the cloud service delivery chain, both application service providers and infrastructure service providers can reduce waste and improve their performance. Instead of trying to shift costs to other parties, lean organizations manage capacity more effectively across the whole chain so that all parties benefit.

Each party in the cloud service delivery chain has its own incentives relative to capacity. Lean cloud computing enables application and cloud service provider organizations to work together to sustainably achieve the shortest lead time, best quality and value, and highest customer delight at the lowest cost (Figure 1).

Figure 1. Lean cloud computing visualization

Respect and continuous improvement are the two pillars of lean cloud computing. Respect means working across the service delivery chain to eliminate wasteful work from the value stream, both within organizations and between partners. Continuous improvement methodically identifies and eliminates waste from the service delivery chain.

The transition to lean cloud computing requires commitment from management and a willingness to question everything and embrace change. A successful transition will enable cloud application and infrastructure providers to:

  • Optimize physical resource, virtual resource, and application capacity management by shortening lead times for capacity fulfillment actions and making frequent, demand-driven capacity planning decisions
  • Boost operational efficiency with demand management that flattens demand peaks, shifts demand to off-peak periods, reduces physical capacity requirements, and boosts physical resource utilization
  • Generate sustainable cost savings by methodically squeezing waste and non-value-adding activities out of the service value chain

Lean cloud computing will transform ICT by enabling user demand to pull service capacity rather than forcing service providers to push existing supply onto the market. This shift in focus will empower providers to be more agile and responsive to the needs of users. It will also help them avoid pushing work, inconvenience, or waste onto other value stream players.

Reimagining application capacity management

Just-in-time inventory management principles can help application providers determine how much capacity they need to serve user demand and provide an adequate reserve margin. As shown in Figure 2, cloud capacity management is analogous to inventory management. Although application service providers offer virtual goods, their challenges and business objectives are similar to those of physical goods providers. For example:

  • Working capacity is like cycle stock. It is productive capacity that serves normal demand.
  • Reserve capacity is like safety stock. It covers demand surges, failures, and other contingencies. It is an overhead factor that should be minimized.
  • Excess capacity is like overstock. It is pure waste that should be eliminated.

Figure 2. Comparing capacity for inventory management and cloud computing
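The inventory analogy can be made concrete with a small calculation. The following Python sketch splits a provider's online capacity into working, reserve, and excess portions. The function name, the `CapacitySplit` type, and the policy of expressing the reserve as a fraction of demand are illustrative assumptions, not definitions from the book.

```python
from dataclasses import dataclass


@dataclass
class CapacitySplit:
    working: float    # capacity serving current demand (like cycle stock)
    reserve: float    # margin held for surges and failures (like safety stock)
    excess: float     # capacity beyond demand plus reserve (like overstock)
    shortfall: float  # demand plus required reserve not covered by online capacity


def split_capacity(online: float, demand: float, reserve_margin: float) -> CapacitySplit:
    """Split online capacity into working, reserve, and excess portions.

    reserve_margin is a fraction of demand (0.2 means 20%). This is an
    assumed policy chosen for illustration only.
    """
    if online < 0 or demand < 0 or reserve_margin < 0:
        raise ValueError("inputs must be non-negative")
    required_reserve = demand * reserve_margin
    working = min(online, demand)
    reserve = min(max(online - demand, 0.0), required_reserve)
    excess = max(online - demand - required_reserve, 0.0)
    shortfall = max(demand + required_reserve - online, 0.0)
    return CapacitySplit(working, reserve, excess, shortfall)


if __name__ == "__main__":
    # 150 units online, 100 units of demand, 20% reserve margin.
    print(split_capacity(online=150, demand=100, reserve_margin=0.2))
```

In this example, 100 units are working capacity, 20 units are reserve, and the remaining 30 units are excess capacity that a lean provider would aim to release.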

ICT service providers have traditionally attempted to match capacity to demand by executing capacity decision, planning, and fulfillment cycles a few times a year. In contrast, electric power utilities routinely do day-ahead capacity planning for electricity production and make so-called real-time capacity adjustments to power generation every five minutes.

Lean service providers will operate more like electric utilities and execute very frequent capacity decision, planning, and fulfillment cycles. This approach will minimize wasted capacity by enabling online capacity to track actual user demand.

Cloud application and infrastructure service providers can use similar strategies to address their capacity management challenges. Application service providers must make the right capacity decisions for their applications. Infrastructure service providers must decide how much physical infrastructure to deploy. They must also determine how much physical capacity they need powered up and online to provide virtual resource capacity that will serve aggregate demand. With lean operations based on respect and continuous improvement, both application service providers and infrastructure service providers can minimize capacity fulfillment lead times and cost while maximizing quality, value, and customer satisfaction.

Providers that fulfill capacity change orders quickly and reliably can follow actual demand more closely than providers with long fulfillment lead times. Capacity planning decisions need to consider likely service demand only a few fulfillment lead-time intervals into the future, so short lead times mean a short planning horizon. This short planning horizon offers important advantages:

  • Application providers can hold just enough online application capacity to cover actual demand plus an adequate reserve margin.
  • Infrastructure providers can keep just enough equipment powered on to serve demand plus an adequate reserve margin.
  • Infrastructure providers can deploy enough physical equipment to serve likely peak load for the short to medium term. They can then use demand management techniques to smooth the workload and shave traffic peaks.
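To illustrate why a short planning horizon matters, here is a minimal Python sketch of a demand-driven capacity decision. It assumes a hypothetical per-interval demand forecast, a reserve margin expressed as a fraction of demand, and capacity that is added in fixed-size units (for example, application instances). None of these names or policies come from the book.

```python
import math
from typing import Sequence


def plan_units(
    forecast: Sequence[float],
    horizon: int,
    reserve_margin: float,
    unit_capacity: float,
) -> int:
    """Return how many capacity units to keep online.

    forecast[i] is the predicted demand for the i-th upcoming fulfillment
    lead-time interval. Only the first `horizon` intervals are considered,
    because capacity for later intervals can still be ordered in time.
    """
    if horizon < 1 or unit_capacity <= 0 or reserve_margin < 0:
        raise ValueError("invalid planning parameters")
    window = list(forecast[:horizon])
    if not window:
        raise ValueError("forecast must cover at least one interval")
    # Cover the highest forecast demand in the window plus the reserve margin.
    target = max(window) * (1.0 + reserve_margin)
    return math.ceil(target / unit_capacity)


if __name__ == "__main__":
    forecast = [400, 450, 520, 900]  # hypothetical demand per interval
    print(plan_units(forecast, horizon=2, reserve_margin=0.2, unit_capacity=100))
    print(plan_units(forecast, horizon=4, reserve_margin=0.2, unit_capacity=100))
```

With this hypothetical forecast, a provider that only needs to look two intervals ahead holds 6 units online, while one that must look four intervals ahead holds 11 units to cover a peak that has not yet arrived.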

Infrastructure capacity management: Striking the right balance

Analysis of the power industry’s energy balance model yields insights and tactics that could be useful for optimizing cloud capacity management. Like power companies, cloud service providers are concerned with balancing the supply of online service capacity against actual customer demand (Figure 3). Imbalances create similar problems for both types of organizations: Insufficient online capacity yields poor service quality for customers, while excess online capacity wastes resources and increases costs for the service provider.

Figure 3. Infrastructure capacity management as a balancing problem

To balance virtual resource consumption and online virtual resource capacity, infrastructure providers need to manage cyclical application demand. They also need to manage random demand fluctuations over a range of time periods, from microseconds to minutes to months. By borrowing lean techniques from the power industry, infrastructure providers can level their workload and use resource capacity more efficiently. They can smooth demand variations through a variety of actions, including resource scheduling, resource curtailment, and resource pricing.

Demand management actions require careful consideration. For example, well-designed resource pricing structures can shape demand, provide more attractive pricing for customers, and smooth the workload for service providers. Ineffective demand management can lead to unacceptable resource curtailment and disappointed customers. It’s essential to think about patterns of demand and different sensitivities before acting. For example, load balancers can help uphold service quality standards when throughput to individual resource instances is curtailed.
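As a simple illustration of workload smoothing through scheduling, the following Python sketch defers demand above a capacity cap to later intervals. It assumes that all demand above the cap is deferrable work (such as batch jobs) that can wait without violating service quality. That assumption, the function name, and the sample numbers are illustrative and would not hold for interactive traffic.

```python
from typing import List, Sequence, Tuple


def shave_peaks(demand: Sequence[float], cap: float) -> Tuple[List[float], float]:
    """Shift demand above `cap` into later intervals.

    Returns the load served in each interval and any backlog still
    waiting after the last interval.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    served: List[float] = []
    backlog = 0.0
    for new_demand in demand:
        offered = new_demand + backlog  # fresh demand plus deferred work
        load = min(offered, cap)        # never serve more than the cap
        served.append(load)
        backlog = offered - load        # carry the remainder forward
    return served, backlog


if __name__ == "__main__":
    hourly_demand = [50, 120, 130, 60, 40]  # hypothetical workload
    served, leftover = shave_peaks(hourly_demand, cap=100)
    print(served, leftover)
```

In this example, the peak of 130 units is flattened to the cap of 100, and the deferred work is absorbed in the following off-peak intervals, so the provider needs less online capacity to serve the same total demand.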

Pursuing perfect capacity

The electric power industry’s “perfect dispatch” methodology offers another useful model for continuously improving cloud capacity management. This methodology plots a previous day’s actual demand and actual online capacity on a single chart, and overlays two concepts:

  • Technically perfect capacity – The hypothetical capacity plan that would have tracked actual demand as closely as possible. The plan assumes that the service provider had perfect knowledge of actual demand and adhered to reserve capacity requirements (e.g., by maintaining sufficient spare online capacity to withstand any single failure event).
  • Economically perfect capacity – The hypothetical capacity plan that would minimize the provider’s operating expenses, assuming that the service provider had perfect knowledge of actual demand and adhered to reserve capacity requirements. Note that an organization’s cost structure, operational model, and policies can make the cheapest (i.e., economically perfect) capacity plan materially different from the theoretically optimal (i.e., technically perfect) capacity plan.

Figure 4 provides a sample visualization that compares a day’s actual online capacity and demand with economically and technically perfect capacity.

Figure 4. Sample perfect capacity visualization

A perfect capacity diagram such as the one in Figure 4 visualizes targets for continuous improvements in operational efficiency:

  • The gap between economically perfect capacity and actual online capacity represents opportunities for operational improvements, such as better demand forecasting.
  • The gap between economically perfect capacity and technically perfect capacity represents opportunities to adjust operational capabilities and cost structures to eliminate waste.
  • The gap between technically perfect capacity and actual demand represents opportunities to make architectural refinements that eliminate wasted capacity.
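The three gaps can be quantified once the four curves are available as time series. The Python sketch below sums each gap over a day's intervals. The function name, the dictionary keys, and the sample values are hypothetical, and the sketch takes the technically and economically perfect plans as given inputs rather than deriving them.

```python
from typing import Dict, Sequence


def perfect_capacity_gaps(
    demand: Sequence[float],
    technically_perfect: Sequence[float],
    economically_perfect: Sequence[float],
    actual_online: Sequence[float],
) -> Dict[str, float]:
    """Sum each improvement gap over all intervals.

    Results are in capacity-intervals (for example, instance-hours when
    each sample covers one hour).
    """
    n = len(demand)
    if not (len(technically_perfect) == len(economically_perfect)
            == len(actual_online) == n):
        raise ValueError("all series must have the same length")
    return {
        # Actual vs. economically perfect: operational improvements.
        "operational": sum(a - e for a, e in zip(actual_online, economically_perfect)),
        # Economically vs. technically perfect: capabilities and cost structure.
        "cost_structure": sum(e - t for e, t in zip(economically_perfect, technically_perfect)),
        # Technically perfect vs. demand: architectural refinements.
        "architectural": sum(t - d for t, d in zip(technically_perfect, demand)),
    }


if __name__ == "__main__":
    gaps = perfect_capacity_gaps(
        demand=[10, 20, 30],
        technically_perfect=[12, 22, 32],
        economically_perfect=[15, 25, 32],
        actual_online=[20, 30, 40],
    )
    print(gaps)
```

The three gaps add up to the total difference between actual online capacity and actual demand, so tracking them day over day shows where continuous improvement effort is paying off.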

Summary

Cloud and NFV technologies will bring fundamental changes to ICT. Most significantly, they will decouple hardware and software, support lifecycle automation, and enable dynamic operations. These changes will provide opportunities for service providers to move away from traditional economy-of-scale operating models and embrace demand-driven operations.

Lean Computing for the Cloud offers an in-depth look at how service providers can apply just-in-time delivery principles from other industries to optimize service delivery on cloud infrastructure. It shows that with methodical use of these principles, cloud application and infrastructure service providers can deliver excellent service quality, achieve sustainable cost savings, and eliminate waste.

About Eric Bauer
Eric Bauer is reliability engineering manager in the IP Platforms Group of Alcatel-Lucent. He currently focuses on reliability and availability of Alcatel-Lucent's cloud-related offerings, IMS, and other solutions. Before focusing on reliability engineering topics, Mr. Bauer spent two decades designing and developing embedded firmware, networked operating systems, IP PBXs, internet platforms, and optical transmission systems. He has been awarded more than a dozen US patents, authored four engineering books, and published several papers in the Bell Labs Technical Journal. Mr. Bauer holds a BS in Electrical Engineering from Cornell University, Ithaca, New York, and an MS in Electrical Engineering from Purdue University, West Lafayette, Indiana. He lives in Freehold, New Jersey.