By Patrick Schmidt
Back in early summer, I blogged about Heatwaves, Electrical Outages, and SMEs. No, I’m not going to talk about the weather this time. We’ve had enough of that… for now. Instead, I will revisit the topic of subject matter expertise with some support from our friends at the Uptime Institute (UI) and their UI Intelligence practice.
The Uptime Institute Global Data Center Survey 2021, now in its eleventh year, covers a wide variety of topics of interest to data center owners and operators. Want to know about trends in power usage effectiveness (PUE)? The survey covers it. Curious about the number of outages and their severities? That’s in there too.
In past years, my blogs about the UI Survey simply summarized the executive summary and pointed to a few specific statistics. This year I want to take a different approach and consider one set of findings in more depth: why are outages happening?
Equipment is more reliable than ever, and systems designed with redundancy can keep an application running even when failures occur. One would think that the number of outages should be near zero. That is not the case, as 69% of respondents to the UI Survey experienced some sort of outage in the past 12 months. In addition, the number of severe outages is holding steady. Why is that?
According to the survey, the answer appears to lie with the data center managers and employees themselves. In the UI's words, “Our survey suggests a high and growing percentage of failures could have been prevented by better management and processes.” Further, 76% of the survey respondents believe that their most recent downtime event could have been prevented with better management, processes, or procedures. These are all human factors.
More than three-quarters (79%) of the responses indicated there was some sort of human failure in their last outage. So, are these facilities simply understaffed? Apparently, most did not think so. Only 18% of those citing human error as a factor said insufficient staff was a cause.
What were the leading factors? Failure to follow procedures was cited as a cause 48% of the time, and use of incorrect procedures 41% of the time.
Over a third (36%) believed that in-service issues like inadequate maintenance or equipment adjustments were the root of downtime.
Other issues played a role as well. Data center design errors or omissions were cited 22% of the time, and issues with the frequency of preventative maintenance came in at 20%.
I promised to not just rattle off statistics, so let’s get back to the blog title. Every one of these human failures has something in common. Their negative effects can be mitigated or eliminated by engaging the right subject matter expert (SME) at the right time.
Failure to follow procedures can be a training issue and incorrect procedures a documentation issue. The right SME can “do it with you” for training purposes and develop the correct runbooks for documentation issues.
What about the 36% who believed inadequate maintenance or equipment adjustments were causes? When was the last time you looked at your contracts, assets, and their lifecycles? SMEs who provide services like the LRS Technology Lifecycle Management Review can help ensure your enterprise is covered properly and code levels are up to date.
Engaging an SME early in any process can boost your chances of success and smooth operation. Data center design errors or omissions can be eliminated by engaging a seasoned expert. In addition, consulting with maintenance experts about the type and frequency of services can save you time and money.
Finally, sometimes you will need to augment your team due to insufficient personnel. LRS IT Solutions offers a wide range of SME services from both OEMs and our own certified staff. We offer expertise in hardware, software, and a variety of services from analytics to security.
To find out more about how LRS IT Solutions subject matter experts can help you reach your goals, contact us by filling out the form below and we will match you with one of our seasoned experts.
About the author
Patrick Schmidt is a Technology Lifecycle Management Specialist with LRS IT Solutions. For more than 21 years, he has been helping customers get a firm grasp on their asset and contract management with a combination of comprehensive service level analysis and lifecycle management best practices.