Reducing IT fire-fighting and improving business agility

By Mark Thiele April 10, 2014

Too much time spent fighting fires of maintenance and support and not enough time solving the underlying issues
Good leadership, excellent process/automation and solid training essential to reduce risk

Is it reasonable to assume that if you were buying a safe for all your valuables, you’d buy the one that offers the best combination of security and cost?

This combination of security and cost would be driven by your budget and the value (intrinsic or sentimental) of your precious items. I would guess that the same principle of budget versus value would apply to protecting your IT environments.

The normal enterprise IT environment is filled with hundreds of applications. In most cases each of these applications is supported by a unique design at the hardware and software level, if not also at the network layer.

The fact that there is so much uniqueness about our IT environments means we expend inordinate amounts of time dealing with common problems in 100 unique ways. Maintaining these environments has become the bane of enterprise IT groups.

By now, we’ve all heard the story of how keeping the lights on consumes 70-80% of the IT budget, leaving only a small amount for much-needed innovation.

Keeping the lights on has several meanings, including the mundane but critical 'general maintenance and support' of each environment. However, keeping the lights on could also mean avoiding outages.

Generically speaking, all of us in IT attempt to build and maintain environments with the highest possible availability (within budget and available resources).

The problem is that we’re often spending too much time fighting fires of ‘maintenance and support’ and not enough time solving the underlying issues that cause many of the fires or, in this case, many of the outages (same as a fire, only worse).

Where should IT focus its attention when it comes to avoiding outages and/or reducing the number of fires?

If you can’t focus on everything

Few IT organizations have the luxury of being able to throw as much money or as many bodies at a problem as they’d like. So, if you have to pick which efforts will provide the most ‘keeping the lights on’ value for the dollar, you should pick something that can be fixed for everything, and fixed once.

How is that possible? How can I fix something for everything? Isn’t that the opposite of focusing on something important and avoiding getting lost in the fire? The simple answer is no.

Data Centre as a Platform (DCaaP)

Only a few services and solutions affect everything in IT. Two of them are the data centre and people, and these are the areas where the most value can be gained in virtually any IT organization when it comes to reducing risk and the threat of fires.

That’s right, just two areas, both of which can be worked on with minimal impact to active production environments and in most cases without too much additional expense.

It’s well documented that humans are almost always the biggest single risk factor to the availability of systems. The more humans need to be involved, the more likely a mistake will get made and a failure will occur.

We all talk about hardware failure and power failures, even viruses and software bugs, but if you want to reduce risk, you reduce the human touch factor.
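The arithmetic behind the human touch factor can be sketched in a few lines. This is only an illustration, not a measured model: the function name chance_of_any_error, the 1% per-step error rate, and the assumption that each manual step fails independently are all hypothetical choices made for the example.

```python
def chance_of_any_error(manual_steps: int, error_rate: float) -> float:
    """Probability that at least one manual step goes wrong.

    Assumes (for illustration only) that every step has the same
    error rate and that steps fail independently of one another.
    """
    if manual_steps < 0:
        raise ValueError("manual_steps must be non-negative")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    # P(at least one error) = 1 - P(every step succeeds)
    return 1.0 - (1.0 - error_rate) ** manual_steps


if __name__ == "__main__":
    # Hypothetical 1% chance of a slip on any single manual step.
    for steps in (50, 10, 1):
        risk = chance_of_any_error(steps, 0.01)
        print(f"{steps:>2} manual steps: {risk:.1%} chance of at least one mistake")
```

Under these assumed numbers, a change that needs 50 manual steps carries roughly a 40% chance of at least one mistake, while automating it down to 10 manual steps brings that below 10%. The real figures will differ for every shop; the point is only that risk compounds with every human touch.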

Reducing the human touch factor takes a combination of three things: good leadership, excellent process/automation, and solid training.

When it comes to owning and operating a data centre as a system, it begins to get a little more complex. Most organizations fail to treat the data centre as a system and are constantly dealing with components or services independent of the DCaaP.

While there are hundreds of discrete components and services that make up a functioning data centre, it is no different than how you might work on a car. You don’t talk about replacing the tires on your car without considering whether they will fit on the rims, fit in the wheel well, or cause the handling to change.

The same holds true with a data centre, as there is virtually nothing in a data centre that can be changed without having some effect on the performance of the system.

Some of the more well-known discrete components of a data centre include power, HVAC, security, water, and environment (i.e., humidity, cleanliness, and temperature).

Any one of these areas could be a cause of fires in IT, but all of them together add up to high overhead if they aren’t expertly managed. Oft overlooked is networking/connectivity, which is part of the data centre but is also an underlying service bridging all applications.

All of the aforementioned DC services, including connectivity, should be considered part of DCaaP. Imagine if, instead of buying a bunch of air conditioners, routers, UPS units, PDUs, racks, sensors, ducting, cable, ladder rack, etc., you could buy a package.

This likely isn’t news to anyone, but that’s what a colocation provider is supposed to offer – Data Centre-as-a-Platform.

Not all colocation providers are created equal, so just moving from your data centre to theirs won’t necessarily solve any problems and in fact could cause new ones.

The real opportunity is moving to a colocation provider that can improve your level of service by driving the risk of fires down toward zero and increasing your ability to address new opportunities.

Imagine improving customer and employee satisfaction, while giving your employer better tools, simultaneously lowering costs and reducing your carbon footprint.

Remove the issue and focus on business enabling priorities

Generally speaking, I don’t believe in washing my hands of a problem; I prefer fixing it. However, building and operating data centres at the highest levels of efficiency, performance and availability isn’t for the faint of heart.

As mentioned earlier, the majority of businesses don’t have the organizational alignment that allows for building and running first-class data centres. It’s also true that building a data centre means locking in a 15-year business plan and capital expenditure (CapEx) investment that isn’t easily adjusted on the fly.

In the modern IT space, building a 15-year business plan and locking in a bunch of CapEx isn’t conducive to agility.

Reduce your overhead and fire risks while improving agility and lowering costs

Find a data centre partner that makes it their life’s work to provide customers the equivalent of an Indy car for agility, a Tesla for efficiency and safety, and an armoured car for security. In practice, that means the most efficient data centres with Tier IV Gold availability ratings, combined with connectivity options beyond compare.

What else can you do in IT that will reduce your overhead and put out many of your fires while also improving agility and lowering costs?

Another way of looking at this opportunity is that you’re helping to ‘future proof’ your investments. When it comes to value, what better way to obtain value than by actually improving your operational capability and agility? Wouldn’t you rather focus on capability and agility first and have efficiency go up while costs go down, all as a side effect?

Mark Thiele is the executive vice-president of Data Center Tech at Switch Communications. He is a long-time blogger and enthusiastic industry evangelist. He maintains a regular blog for Switch as the SwitchScribe, where this article originally appeared. It is re-published here with his permission.
