April 30, 2012
Every server in a data center runs on an allotted power cap that is programmed to withstand the peak-hour power consumption level. When an unexpected event causes a power spike, however, data center managers can be faced with serious problems. For example, in the summer of 2011, unusually high temperatures in Texas created havoc in data centers. The increased operation of air conditioning units affected data center servers that were already running close to capacity.
Preparedness for unexpected power events requires the ability to rapidly identify the individual servers at risk of power overload or failure. A variety of proactive energy management best practices can not only provide insights into the power patterns leading up to problematic events, but can offer remedial controls that avoid equipment failures and service disruptions.
Best Practice: Gaining Real-Time Visibility
Dealing with power surges requires a full understanding of your nominal data center power and thermal conditions. Unfortunately, many facilities and IT teams have only minimal monitoring in place, often focusing solely on return air temperature at the air-conditioning units.
The first step toward efficient energy management is to take advantage of all the power and thermal data provided by today’s hardware. This includes real-time server inlet temperatures and power consumption data from rack servers, blade servers, and the power-distribution units (PDUs) and uninterrupted power supplies (UPSs) related to those servers. Data center energy monitoring solutions are available for aggregating this hardware data and for providing views of conditions at the individual server or rack level or for user-defined groups of devices.
Unlike predictive models that are based on static data sets, real-time energy monitoring solutions can uncover hot spots and computer-area air handler (CRAH) failures early, when proactive actions can be taken.
By aggregating server inlet temperatures, an energy monitoring solution can help data center managers create real-time thermal maps of the data center. The solutions can also feed data into logs to be used for trending analysis as well as in-depth airflow studies for improving thermal profiles and for avoiding over- or undercooling. With adequate granularity and accuracy, an energy monitoring solution makes it possible to fine-tune power and cooling systems, instead of necessitating designs to accommodate the worst-case or spike conditions.
Best Practice: Shifting From Reactive to Proactive Energy Management
Accurate, real-time power and thermal usage data also makes it possible to set thresholds and alerts, and it introduce controls that enforce policies for optimized service and efficiencies. Real-time server data provides immediate feedback about power and thermal conditions that can affect server performance and ultimately end-user services.
Proactively identifying hot spots before they reach critical levels allows data center managers to take preventative actions and also creates a foundation for the following:
- Managing and billing for services based on actual energy use
- Automating actions relating to power management in order to minimize the impact on IT or facilities teams
- Integrating data center energy management with other data center and facilities management consoles.
Best Practice: Non-Invasive Monitoring
To avoid affecting the servers and end-user services, data center managers should look for energy management solutions that support agentless operation. Advanced solutions facilitate integration, with full support for Web Services Description Language (WSDL) APIs, and they can coexist with other applications on the designated host server or virtual machine.
Today’s regulated data centers also require that an energy management solution offer APIs designed for secure communications with managed nodes.
Best Practice: Holistic Energy Optimization
Real-time monitoring provides a solid foundation for energy controls, and state-of-the-art energy management systems provide enable dynamic adjustment of the internal power states of data center servers. The control functions support the optimal balance of server performance and power—and keep power under the cap to avoid spikes that would otherwise exceed equipment limits or energy budgets.
Intelligent aggregation of data center power and thermal data can be used to drive optimal power management policies across servers and storage area networks. In real-world use cases, intelligent energy management solutions are producing 20–40 percent reductions in energy waste.
These increases in efficiency ameliorate the conditions that may lead to power spikes, and they also enable other high-value benefits including prolonged business continuity (by up to 25 percent) when a power outage occurs. Power can also be allocated on a priority basis during an outage, giving maximum protection to business-critical services.
Intelligent power management for servers can also dramatically increase rack density without exceeding existing rack-level power caps. Some companies are also using intelligent energy management approaches to introduce power-based metering and energy cost charge-backs to motivate conservation and more fairly assign costs to organizational units.
Best Practice: Decreasing Data Center Power Without Affecting Performance
A crude energy management solution might mitigate power surges by simply capping the power consumption of individual servers or groups of servers. Because performance is directly tied to power, an intelligent energy management solution dynamically balances power and performance in accordance with the priorities set by the particular business.
The features required for fine-tuning power in relation to server performance include real-time monitoring of actual power consumption and the ability to maintain maximum performance by dynamically adjusting the processor operating frequencies. This requires a tightly integrated solution that can interact with the server operating system or hypervisor using threshold alerts.
Field tests of state-of-the-art energy management solutions have proven the efficacy of an intelligent approach for lowering server power consumption by as much as 20 percent without reducing performance. At BMW Group,[1]for example, a proof-of-concept exercise determined that an energy management solution could lower consumption by 18 percent and increase server efficiency by approximately 19 percent.
Similarly, by adjusting the performance levels, data center managers can more dramatically lower power to mitigate periods of power surges or to adjust server allocations on the basis of workloads and priorities.
Conclusions
Today, the motivations for avoiding power spikes include improving the reliability of data center services and curbing runaway energy costs. In the future, energy management will likely become more critical with the consumerization of IT, cloud computing and other trends that put increased service—and, correspondingly, energy—demands on the data center.
Bottom line, intelligent energy management is a critical first step to gaining control of the fastest-increasing operating cost for the data center. Plus, it puts a data center on a transition path towards more comprehensive IT asset management. Besides avoiding power spikes, energy management solutions provide in-depth knowledge for data center “right-sizing” and accurate equipment scheduling to meet workload demands.
Power data can also contribute to more-efficient cooling and air-flow designs and to space analysis for site expansion studies. Power is at the heart of optimized resource balancing in the data center; as such, the intelligent monitoring and management of power typically yields significant ROI for best-in-class energy management technology.