DCIM Yields Return on Investment

By: Michael Potts

As with any investment in the data center, the question of the return on the investment should be raised before purchasing a Data Center Infrastructure Management (DCIM) solution. In the APC white paper, “How Data Center Infrastructure Management Software Improves Planning and Cuts Operational Costs,” the authors highlight the savings from a DCIM solution saying, “The deployment of modern planning tools can result in hundreds of man-hours saved per year and thousands of dollars saved in averted downtime costs.”

DCIM will not transform your data center overnight, but it will begin the process. While it isn’t necessary to reach the full level of maturity before seeing benefits, the areas of benefit are significant and can bring results in the short term. The three primary ways in which DCIM provides ROI are:

Improved Energy Efficiency
Improved Availability
Improved Manageability

DCIM LEADS TO IMPROVED ENERGY EFFICIENCY

In his blog, Dan Fry gets right to the heart of DCIM’s role in improving energy efficiency when he says, “To improve energy efficiency inside the data center, IT executives need comprehensive information, not isolated data. They need to be able to ‘see’ the problem in order to manage and correct it because, as we all know, you can’t manage what you don’t understand.”

The information provided by DCIM can help data center managers reduce energy consumption in the following ways:

MATCHING SUPPLY WITH DEMAND

Oversizing is one of the biggest roadblocks to energy efficiency in the data center. In an APC survey of data center utilization, only 20 percent of respondents had a utilization of 60 percent or more, while 50 percent had a utilization of 30 percent or less. One of the primary factors for oversizing is the lack of power and cooling data to help make informed decisions on the amount of infrastructure required. DCIM solutions can provide information on both demand and supply to allow you to “right-size” the infrastructure, reducing overall energy costs by as much as 30 percent.

IDENTIFYING UNDER-UTILIZED SERVERS

As many as 10 percent of servers are estimated to be “ghost servers,” which run no applications yet still consume 70 percent or more of the resources of a fully utilized server. DCIM solutions can help to find these under-utilized servers, which could be decommissioned, repurposed or consolidated, as well as servers that do not have power management functionality enabled. Acting on this information reduces IT energy usage and delays the purchase of additional servers.
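
As a rough illustration of how such a check might work, the Python sketch below flags servers whose average CPU utilization is near zero while their measured power draw stays at or above 70 percent of their fully utilized draw. The `ServerSample` fields, the 5 percent CPU threshold and the sample fleet are assumptions made for this example; they are not taken from any particular DCIM product.

```python
from dataclasses import dataclass


@dataclass
class ServerSample:
    """Hypothetical per-server averages collected over a monitoring window."""
    name: str
    avg_cpu_pct: float   # average CPU utilization, 0-100
    avg_power_w: float   # average measured power draw, in watts
    full_load_w: float   # power draw when fully utilized, in watts


def find_ghost_candidates(servers, cpu_threshold_pct=5.0, power_ratio=0.7):
    """Return names of servers that look idle yet still draw significant power.

    Both thresholds are illustrative assumptions, not vendor defaults.
    """
    return [
        s.name
        for s in servers
        if s.avg_cpu_pct < cpu_threshold_pct
        and s.avg_power_w >= power_ratio * s.full_load_w
    ]


fleet = [
    ServerSample("app-01", avg_cpu_pct=55.0, avg_power_w=280.0, full_load_w=300.0),
    ServerSample("old-07", avg_cpu_pct=1.2, avg_power_w=240.0, full_load_w=300.0),
    ServerSample("test-03", avg_cpu_pct=2.0, avg_power_w=90.0, full_load_w=300.0),
]
print(find_ghost_candidates(fleet))  # ['old-07']
```

A candidate list like this would only be a starting point; each server still needs to be confirmed as unused before it is decommissioned.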

MEASURING THE IMPACT OF INFRASTRUCTURE CHANGES

DCIM tools can measure energy efficiency metrics such as Power Usage Effectiveness (PUE), Data Center Infrastructure Efficiency (DCiE) and Corporate Average Datacenter Efficiency (CADE). These metrics serve to focus attention on increasing the energy efficiency of data centers and to measure the results of changes to the infrastructure. In the white paper “Green Grid Data Center Power Efficiency Metrics: PUE and DCiE,” the authors lay out the case for the introduction of metrics to measure energy efficiency in the data center. The Green Grid believes that several metrics can help IT organizations better understand and improve the energy efficiency of their existing data centers as well as help them make smarter decisions on new data center deployments. In addition, these metrics provide a dependable way to measure their results against comparable IT organizations.
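
For reference, PUE is the ratio of total facility power to IT equipment power, and DCiE is its reciprocal expressed as a percentage. The short Python sketch below computes both from two power readings. The function names and the sample readings are illustrative assumptions, not values from the white paper.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def dcie_pct(total_facility_kw, it_equipment_kw):
    """Data Center Infrastructure Efficiency: IT power as a percentage of total."""
    if total_facility_kw <= 0:
        raise ValueError("total facility power must be positive")
    return 100.0 * it_equipment_kw / total_facility_kw


# Hypothetical readings: 2,000 kW at the utility meter, 1,000 kW at the IT load.
print(pue(2000, 1000))       # 2.0
print(dcie_pct(2000, 1000))  # 50.0
```

Recomputing both figures before and after an infrastructure change gives a simple before-and-after comparison, provided the readings are taken at the same measurement points.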

IMPROVED AVAILABILITY

DCIM solutions can improve availability in the following areas:

Understanding the Relationship Between Devices
A DCIM solution can help to answer questions such as “What systems will be impacted if I take the UPS down for maintenance?” It does this by understanding the relationship between devices, including the ability to track power and network chains. This information can be used to identify single points of failure and reduce downtime due to both planned and unplanned events.
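
The “what will be impacted” question can be sketched as a walk over the power chain. The Python example below assumes a hypothetical `feeds` mapping from each device to the set of upstream devices that power it, and treats a device as impacted only when every one of its feeds is offline, so dual-corded equipment survives the loss of one path. The device names are invented for illustration.

```python
def impacted_devices(feeds, down):
    """Return devices that lose all power when the devices in `down` go offline.

    feeds: dict mapping device -> set of upstream devices that power it
           (a hypothetical data model, not a specific DCIM schema).
    down:  set of devices taken out of service.
    """
    offline = set(down)
    changed = True
    while changed:  # repeat until no further device loses its last feed
        changed = False
        for device, sources in feeds.items():
            if device not in offline and sources and sources <= offline:
                offline.add(device)
                changed = True
    return offline - set(down)


feeds = {
    "pdu-a": {"ups-1"},
    "pdu-b": {"ups-2"},
    "web-01": {"pdu-a"},           # single-corded
    "db-01": {"pdu-a", "pdu-b"},   # dual-corded
}
print(sorted(impacted_devices(feeds, {"ups-1"})))  # ['pdu-a', 'web-01']
```

In this sample, `web-01` depends on a single path and is therefore a single point of failure candidate, while `db-01` stays up through `pdu-b`.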

Improved Change Management
When investigating an issue, examination of the asset’s change log allows problem managers to recommend a fix over 80 percent of the time, with a first fix rate of over 90 percent. This reduces the mean time to repair and increases system availability. DCIM systems which automate the change management process will log both authorized and unauthorized changes, increasing the data available to the problem manager and increasing the chances the issue can be quickly resolved.

Root Cause Analysis
One of the problems sometimes faced by data center managers is too much data. Disconnecting a router from the network might cause tens or hundreds of link-lost alarms for the downstream devices. It is often difficult to find the root cause amidst all of the “noise” associated with cascading events. By understanding the relationship between devices, a DCIM solution can help to narrow the focus to the single device (the router, in this case) that is causing the problem. By directing focus to the root cause, the problem can be resolved more quickly, reducing the associated downtime.
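
One simple way to cut through cascading alarms, sketched below in Python, is to keep only those alarming devices whose upstream device is not itself alarming. The `upstream` mapping and the device names are hypothetical, and the sketch assumes the failed router also appears in the alarm set (for example, as unreachable). Real correlation engines are considerably more elaborate.

```python
def root_causes(alarming, upstream):
    """Return alarming devices whose upstream device is not also alarming.

    alarming: iterable of device names currently raising alarms.
    upstream: dict mapping device -> the device it connects through
              (devices at the top of the chain have no entry).
    """
    alarming = set(alarming)
    return sorted(d for d in alarming if upstream.get(d) not in alarming)


upstream = {
    "sw-1": "router-1",
    "sw-2": "router-1",
    "srv-1": "sw-1",
    "srv-2": "sw-2",
}
alarms = ["srv-1", "srv-2", "sw-1", "sw-2", "router-1"]
print(root_causes(alarms, upstream))  # ['router-1']
```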

IMPROVED MANAGEABILITY

DCIM solutions can improve manageability in the following areas:

Data Center Audits
Regulations such as Sarbanes-Oxley, HIPAA and 21 CFR Part 11 increase the requirements for physical equipment audits. DCIM solutions provide a single source of data that greatly reduces the time and cost to complete the audits. DCIM tools that use asset auto-discovery and asset location mechanisms such as RFID can further reduce the effort to perform a physical audit.

Asset Management
DCIM can be used to determine the best place to deploy new equipment based on the availability of rack space, power, cooling and network ports. It then can be used to track all of the changes from the initial request through deployment, system moves and changes, all the way through to decommissioning. The DCIM solution can provide detailed information on thousands of assets in the data center including location, system configuration, how much power it is drawing, relationship to other devices, and so on, without having to rely on spreadsheets or home-grown tools.

Capacity Planning
With a new or expanded data center representing a substantial capital investment, the ability to postpone new data center builds could save millions of dollars. DCIM solutions can be used to reclaim capacity at the server, rack and data center levels to maximize space, power and cooling resources. Using actual device power readings instead of the overly conservative nameplate values will allow an increase in the number of servers supported by a PDU without sacrificing availability. DCIM tools can track resource usage over time and provide much more accurate estimates of when additional equipment needs to be purchased.
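
The effect of using measured readings instead of nameplate values can be shown with simple arithmetic. In the Python sketch below, the 5,000 W PDU capacity, the 500 W nameplate rating, the 300 W measured peak and the 80 percent headroom policy are all assumed figures chosen for illustration.

```python
def servers_supported(pdu_capacity_w, per_server_w, headroom_pct=80):
    """Number of servers a PDU can carry while loading it to headroom_pct.

    All arguments are whole watts or whole percentages; the 80 percent
    default is an illustrative assumption, not a standard.
    """
    usable_w = pdu_capacity_w * headroom_pct // 100
    return usable_w // per_server_w


PDU_CAPACITY_W = 5000    # hypothetical PDU rating
NAMEPLATE_W = 500        # hypothetical nameplate value per server
MEASURED_PEAK_W = 300    # hypothetical measured peak draw per server

print(servers_supported(PDU_CAPACITY_W, NAMEPLATE_W))      # 8
print(servers_supported(PDU_CAPACITY_W, MEASURED_PEAK_W))  # 13
```

With these assumed numbers, planning against measured peak draw fits five more servers on the same PDU while keeping the same headroom.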

This is the fifth article in the Data Center Knowledge Guide to DCIM series.

Jeffrey S. Klaus April 30, 2012

Every server in a data center runs on an allotted power cap that is programmed to withstand the peak-hour power consumption level. When an unexpected event causes a power spike, however, data center managers can be faced with serious problems. For example, in the summer of 2011, unusually high temperatures in Texas created havoc in data centers. The increased operation of air conditioning units affected data center servers that were already running close to capacity.

Preparedness for unexpected power events requires the ability to rapidly identify the individual servers at risk of power overload or failure. A variety of proactive energy management best practices can not only provide insights into the power patterns leading up to problematic events, but can offer remedial controls that avoid equipment failures and service disruptions.

Best Practice: Gaining Real-Time Visibility

Dealing with power surges requires a full understanding of your nominal data center power and thermal conditions. Unfortunately, many facilities and IT teams have only minimal monitoring in place, often focusing solely on return air temperature at the air-conditioning units.

The first step toward efficient energy management is to take advantage of all the power and thermal data provided by today’s hardware. This includes real-time server inlet temperatures and power consumption data from rack servers, blade servers, and the power-distribution units (PDUs) and uninterruptible power supplies (UPSs) related to those servers. Data center energy monitoring solutions are available for aggregating this hardware data and for providing views of conditions at the individual server or rack level or for user-defined groups of devices.

Unlike predictive models that are based on static data sets, real-time energy monitoring solutions can uncover hot spots and computer room air handler (CRAH) failures early, when proactive actions can be taken.

By aggregating server inlet temperatures, an energy monitoring solution can help data center managers create real-time thermal maps of the data center. The solutions can also feed data into logs to be used for trending analysis as well as in-depth airflow studies for improving thermal profiles and for avoiding over- or undercooling. With adequate granularity and accuracy, an energy monitoring solution makes it possible to fine-tune power and cooling systems, instead of necessitating designs to accommodate the worst-case or spike conditions.
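
A minimal sketch of that aggregation step, assuming readings arrive as hypothetical `(rack_id, server_id, inlet_temp_c)` tuples, might group inlet temperatures by rack and keep the maximum and average for each one. The rack names and temperatures are invented sample data.

```python
from collections import defaultdict


def rack_thermal_map(readings):
    """Summarize server inlet temperatures per rack.

    readings: iterable of (rack_id, server_id, inlet_temp_c) tuples,
              a hypothetical format chosen for this example.
    """
    by_rack = defaultdict(list)
    for rack_id, _server_id, temp_c in readings:
        by_rack[rack_id].append(temp_c)
    return {
        rack_id: {"max_c": max(temps), "avg_c": sum(temps) / len(temps)}
        for rack_id, temps in by_rack.items()
    }


readings = [
    ("A1", "srv-01", 22.0),
    ("A1", "srv-02", 24.0),
    ("B4", "srv-03", 31.0),
]
thermal_map = rack_thermal_map(readings)
print(thermal_map["A1"])  # {'max_c': 24.0, 'avg_c': 23.0}
print(thermal_map["B4"])  # {'max_c': 31.0, 'avg_c': 31.0}
```

Logging these per-rack summaries at regular intervals would provide the raw material for the trending analysis described above.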

Best Practice: Shifting From Reactive to Proactive Energy Management

Accurate, real-time power and thermal usage data also makes it possible to set thresholds and alerts, and to introduce controls that enforce policies for optimized service and efficiencies. Real-time server data provides immediate feedback about power and thermal conditions that can affect server performance and ultimately end-user services.
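
Threshold-based alerting of this kind can be sketched in a few lines of Python. The warning and critical temperatures below (27 and 32 degrees Celsius) and the server names are illustrative assumptions; actual thresholds should come from the equipment specifications and site policy.

```python
def classify_inlet(temp_c, warn_c=27.0, crit_c=32.0):
    """Map an inlet temperature to 'ok', 'warning' or 'critical'.

    The default thresholds are assumed values for illustration only.
    """
    if temp_c >= crit_c:
        return "critical"
    if temp_c >= warn_c:
        return "warning"
    return "ok"


def inlet_alerts(inlet_temps_c, warn_c=27.0, crit_c=32.0):
    """Return only the servers whose inlet temperature needs attention."""
    levels = {
        server: classify_inlet(temp_c, warn_c, crit_c)
        for server, temp_c in inlet_temps_c.items()
    }
    return {server: level for server, level in levels.items() if level != "ok"}


sample = {"srv-01": 23.5, "srv-02": 28.0, "srv-03": 33.1}
print(inlet_alerts(sample))  # {'srv-02': 'warning', 'srv-03': 'critical'}
```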

Proactively identifying hot spots before they reach critical levels allows data center managers to take preventative actions and also creates a foundation for the following:

Managing and billing for services based on actual energy use
Automating actions relating to power management in order to minimize the impact on IT or facilities teams
Integrating data center energy management with other data center and facilities management consoles

Best Practice: Non-Invasive Monitoring

To avoid affecting the servers and end-user services, data center managers should look for energy management solutions that support agentless operation. Advanced solutions facilitate integration, with full support for Web Services Description Language (WSDL) APIs, and they can coexist with other applications on the designated host server or virtual machine.

Today’s regulated data centers also require that an energy management solution offer APIs designed for secure communications with managed nodes.

Best Practice: Holistic Energy Optimization

Real-time monitoring provides a solid foundation for energy controls, and state-of-the-art energy management systems enable dynamic adjustment of the internal power states of data center servers. The control functions support the optimal balance of server performance and power, and they keep power under the cap to avoid spikes that would otherwise exceed equipment limits or energy budgets.

Intelligent aggregation of data center power and thermal data can be used to drive optimal power management policies across servers and storage area networks. In real-world use cases, intelligent energy management solutions are producing 20–40 percent reductions in energy waste.

These increases in efficiency ameliorate the conditions that may lead to power spikes, and they also enable other high-value benefits including prolonged business continuity (by up to 25 percent) when a power outage occurs. Power can also be allocated on a priority basis during an outage, giving maximum protection to business-critical services.

Intelligent power management for servers can also dramatically increase rack density without exceeding existing rack-level power caps. Some companies are also using intelligent energy management approaches to introduce power-based metering and energy cost charge-backs to motivate conservation and more fairly assign costs to organizational units.
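
As one possible charge-back scheme, the Python sketch below splits a shared energy bill across organizational units in proportion to their metered consumption. The unit names, kWh figures and bill amount are invented, and proportional allocation is only one of several reasonable policies.

```python
def allocate_energy_bill(total_bill, kwh_by_unit):
    """Split total_bill across units in proportion to metered energy use.

    kwh_by_unit: dict mapping organizational unit -> metered kWh
                 (hypothetical metering data for this example).
    """
    total_kwh = sum(kwh_by_unit.values())
    if total_kwh <= 0:
        raise ValueError("total metered energy must be positive")
    return {
        unit: total_bill * kwh / total_kwh
        for unit, kwh in kwh_by_unit.items()
    }


usage_kwh = {"finance": 1000, "engineering": 3000}
print(allocate_energy_bill(1000.0, usage_kwh))
# {'finance': 250.0, 'engineering': 750.0}
```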

Best Practice: Decreasing Data Center Power Without Affecting Performance

A crude energy management solution might mitigate power surges by simply capping the power consumption of individual servers or groups of servers. Because performance is directly tied to power, however, an intelligent energy management solution instead dynamically balances power and performance in accordance with the priorities set by the particular business.

The features required for fine-tuning power in relation to server performance include real-time monitoring of actual power consumption and the ability to maintain maximum performance by dynamically adjusting the processor operating frequencies. This requires a tightly integrated solution that can interact with the server operating system or hypervisor using threshold alerts.
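
The feedback loop described here can be pictured with a deliberately simplified Python sketch: if measured power exceeds the cap, step the processor frequency down one level; if power is comfortably below the cap, step it back up. The frequency steps, the 10 W margin and the function itself are hypothetical. Production systems rely on platform firmware and operating system or hypervisor interfaces rather than code like this.

```python
def next_frequency_mhz(current_mhz, measured_w, cap_w, steps_mhz, margin_w=10.0):
    """Pick the next frequency step from measured power and the power cap.

    steps_mhz must be sorted in ascending order and contain current_mhz.
    All values are illustrative assumptions, not a real control interface.
    """
    i = steps_mhz.index(current_mhz)
    if measured_w > cap_w and i > 0:
        return steps_mhz[i - 1]   # over the cap: slow down one step
    if measured_w < cap_w - margin_w and i < len(steps_mhz) - 1:
        return steps_mhz[i + 1]   # clear headroom: speed up one step
    return current_mhz            # within the margin: hold steady


STEPS_MHZ = [1200, 1800, 2400, 3000]
print(next_frequency_mhz(3000, measured_w=320.0, cap_w=300.0, steps_mhz=STEPS_MHZ))  # 2400
print(next_frequency_mhz(2400, measured_w=250.0, cap_w=300.0, steps_mhz=STEPS_MHZ))  # 3000
print(next_frequency_mhz(2400, measured_w=295.0, cap_w=300.0, steps_mhz=STEPS_MHZ))  # 2400
```

The margin keeps the loop from oscillating between two steps when power sits just under the cap.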

Field tests of state-of-the-art energy management solutions have proven the efficacy of an intelligent approach for lowering server power consumption by as much as 20 percent without reducing performance. At BMW Group,[1] for example, a proof-of-concept exercise determined that an energy management solution could lower consumption by 18 percent and increase server efficiency by approximately 19 percent.

Similarly, by adjusting the performance levels, data center managers can more dramatically lower power to mitigate periods of power surges or to adjust server allocations on the basis of workloads and priorities.

Conclusions

Today, the motivations for avoiding power spikes include improving the reliability of data center services and curbing runaway energy costs. In the future, energy management will likely become more critical with the consumerization of IT, cloud computing and other trends that put increased service—and, correspondingly, energy—demands on the data center.

Bottom line, intelligent energy management is a critical first step to gaining control of the fastest-increasing operating cost for the data center. Plus, it puts a data center on a transition path towards more comprehensive IT asset management. Besides avoiding power spikes, energy management solutions provide in-depth knowledge for data center “right-sizing” and accurate equipment scheduling to meet workload demands.

Power data can also contribute to more-efficient cooling and air-flow designs and to space analysis for site expansion studies. Power is at the heart of optimized resource balancing in the data center; as such, the intelligent monitoring and management of power typically yields significant ROI for best-in-class energy management technology.

DCIM DataCenter Blog

Wednesday, June 6, 2012