There’s a dogmatic view that every complex IT service management problem has a single root cause, when very often there are actually multiple causes.
When 21 power stations shut down in just three minutes in 2003, causing blackouts of up to 48 hours for 55 million people in the northeastern US and Canada, the blame fell on one power station in Akron, Ohio. However, further investigation revealed a host of contributory causes involving people, overgrown foliage, outdated technology and faulty monitoring.
So, why are complex problems so challenging to resolve?
We don’t make it easy for ourselves. Quite often, organizations have embraced a (silly) mindset and service level agreement of completing root cause analysis for a complex problem within 24 hours. This drives people to focus on the “trigger” (like the Ohio power station) and resolve it while – like looking at the tip of an iceberg – missing everything else going on below.
This approach essentially resolves the trigger but ignores all of the underlying problems that need exploring further. Sometimes, this focus on speed reflects an attitude of proving that “it wasn’t our fault” rather than finding a higher-quality resolution.
Avoiding the risk of people allocating blame and vindicating their own actions requires a change in approach to complex problem resolution, and ITIL® 4’s guiding principles provide one.
Employing ITIL 4’s seven guiding principles
ITIL 4’s guiding principles combine concepts from a number of best practices – including Agile, Lean and DevOps – and are based in reality, pragmatic and proven.
There are five steps for complex problem resolution where the principles make sense:
- Simplifying the focus
Can you deploy a minimum viable fix to resolve the problem, then focus on making the fix more effective at a later date? This uses the principles of keep it simple and practical and focus on value.
- Decide what “fixed” looks like
We can’t always “fix” something, particularly if it involves third-party software or infrastructure. So, ask yourself if “fixed” means reducing the risk of a problem happening, minimizing its impact or making it easier to spot before it happens again. This is a great example of starting where you are.
- Solve complex problems in stages
Break down your problem into its component parts rather than trying to solve one big problem at once. This is about progressing iteratively with feedback. For example, in one retail setting an application sending out product price updates to stores in batches was failing. Testing the updates by sending them individually showed what worked and what didn’t. Breaking the problem down helped us find the underlying cause and fix it.
- Know that your fix will work
If you know beforehand the outcome you want, what indicators will help make sure a problem resolution will work? An analogy is losing weight: rather than waiting to get on the scales (a lagging indicator), ask your gym to help you measure how much you exercise, or use an app to measure the calories you are eating. Both are early indicators of success! So, by collaborating and promoting visibility, you can access metrics and feedback that will help give advance notice of success in problem resolution.
- Trial and error
Many organizations are risk averse but, sometimes, you have to try a potential resolution to see if it works. For example, if replacing a part or component fixes a problem, you can track back from this to identify potential causes. This is another example of progressing iteratively with feedback.
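To make the retail example above more concrete, here is a minimal Python sketch of splitting a failing batch into individual sends to see which updates work and which don’t. It is an illustration only, not the code used in that engagement: the `isolate_failures` helper, the `send_price_update` stand-in, the record fields (`sku`, `price`) and the validation rule are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple


def isolate_failures(
    updates: Sequence[dict],
    send: Callable[[dict], None],
) -> Tuple[List[dict], List[Tuple[dict, str]]]:
    """Send each update on its own and record which ones fail and why."""
    succeeded: List[dict] = []
    failed: List[Tuple[dict, str]] = []
    for update in updates:
        try:
            send(update)
        except Exception as exc:
            # Keep the error text so the failure can be traced to a cause.
            failed.append((update, str(exc)))
        else:
            succeeded.append(update)
    return succeeded, failed


def send_price_update(update: dict) -> None:
    """Hypothetical stand-in for the real store interface (assumption)."""
    if update["price"] is None or update["price"] < 0:
        raise ValueError(f"invalid price for {update['sku']}")


# Hypothetical batch: one record has a missing price.
batch = [
    {"sku": "A100", "price": 4.99},
    {"sku": "B200", "price": None},
    {"sku": "C300", "price": 12.50},
]

ok, bad = isolate_failures(batch, send_price_update)
for update, reason in bad:
    print(f"{update['sku']} failed: {reason}")
```

Sending items one at a time is slower than batching, so a sketch like this suits diagnosis rather than normal operation; once the failing records are known, the batch process itself can be fixed.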
Reducing risk before problem resolution
IT problems in 2021 are far more complex and involved than they were 20 years ago, and today’s technologies are tightly intertwined with our everyday activities.
If a resolution is not immediately apparent, then aim to reduce the chance of the problem recurring, limit its impact or make it easier to spot next time. Finally, please stop promising root cause analysis in 24 hours!
News by Barry Corless – ITSM expert