How do you identify the reason an incident occurred by asking just 5 questions? (And these are not the 5 WHYs...)
Our approach is simple, single-minded and focused: every action is based on finding the CORRECT FAULT! Almost every time our consultants join a Bridge Team that is trying to restore a service, we find the team struggling to identify the correct fault. The only thing we do differently from what has been tried before is ask the subject matter experts (SMEs) probing questions that focus on the fault.
Here are the two success factors:
- Identifying the CORRECT FAULT – This is easier said than done. In my last 50 sessions, the "bridge team" was working on the wrong fault more than 95% of the time! The tendency is to deal with incidents at the wrong level. A typical example is a team that was struggling with a "Website is SLOW" complaint. Complaints like this are among the most common reasons a team fails to make quick progress on a restoration.
Why is that? Because "webpage is slow" is too vague and generic. A myriad of causes could have triggered the incident, and this vagueness impedes the team significantly. When we question a team about "SLOW", the usual response is "slow is slow – what don't you understand about that?" and that is where the conversation normally ends.
However, when we questioned this team more deeply, they arrived at "the Next Button not activating." That statement left the team with only three possible technical reasons that could have triggered the incident. Our focused, probing questions reduce even the most complex incident to a simple, single fault every time, and quickly.
- Identifying the CORRECT SMEs - Ask the right question of the right subject matter expert to get the right answer! Putting our uniquely structured questions to the right person delivers the right answer. In the example above, the SMEs we would work with on "Webpage is slow" are different from the SMEs we worked with on the alternate Incident Statement, "Next Button not activating." This approach normally reduces the number of staff on the bridge significantly.
Have you ever had the benefit of learning how to ask 5 very specific questions about the fault?
Questions such as:
- What is the most specific fault we are dealing with? (Fault Drill)
- What is unique about this fault (there are six possible dimensions)?
- What is the virtual object that is associated with this fault?
- What could explain this uniqueness?
- Which explanation is the most probable event that triggered the incident?
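For teams that like to track this on the bridge, the five questions can be held in a simple worksheet that always shows which question to ask next. The following is only a minimal sketch in Python, not our template: the `FaultWorksheet` class and its method names are hypothetical, and the question texts are abridged from the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The five questions, abridged from the list above.
QUESTIONS = (
    "What is the most specific fault we are dealing with?",
    "What is unique about this fault?",
    "What is the virtual object that is associated with this fault?",
    "What could explain this uniqueness?",
    "Which explanation is the most probable event that triggered the incident?",
)


@dataclass
class FaultWorksheet:
    """Hypothetical record of a bridge team's answers, one slot per question."""

    answers: List[Optional[str]] = field(
        default_factory=lambda: [None] * len(QUESTIONS)
    )

    def answer(self, number: int, text: str) -> None:
        """Record the answer to question `number` (1-based)."""
        self.answers[number - 1] = text.strip()

    def next_question(self) -> Optional[str]:
        """Return the first unanswered question, or None when all are answered."""
        for question, given in zip(QUESTIONS, self.answers):
            if not given:
                return question
        return None


# Usage: the team has drilled "Website is SLOW" down to a specific fault.
sheet = FaultWorksheet()
sheet.answer(1, "Next Button not activating")
print(sheet.next_question())  # prints "What is unique about this fault?"
```

The sketch keeps the questions in a fixed order on purpose: the fault has to be stated at the right level of specificity before asking what is unique about it makes sense.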
This approach has delivered the following results to our users:
- Reduced MTR by at least 80%.
- Reduced workaround "Band-Aid" restorations by at least 50%.
- Stopped the "blame game" and had teams collaborating properly.
- Executed a clear and seamless handover between Incident & Problem management.
- Enabled and promoted a "thinking on your feet" behavior.
- Harnessed the power of a fast, structured approach with a common language.
This structured thinking approach helps Incident Investigation & Restoration staff start their analysis at the right level of specificity and reach a restoration quickly and efficiently. The approach is supported by agile templates.
In summary, if the MIM and/or investigation team can reach the right level of specificity regarding the fault, identify what is unique about this fault and use SME expertise to explain how this unique fault occurred, they will be problem solvers on steroids! Knowing what really happened technically enables an effective restoration the first time, every time.