Why it is so difficult to find an accurate and verified root cause… and what to do about it. (Updated version of blog originally seen here.)
Firstly, there is a great deal of confusion about a Root Cause as opposed to a Technical Cause. Secondly, you cannot really determine a verifiable Root Cause before first identifying a Technical Cause.
We’ve been working with Root Cause Analysis practices for the last 29 years and have never found so many different views on Root Cause as there are in the IT Industry. The term “root cause” is used widely and loosely to describe many diverse situations and causes, so we need to clarify our terms first. The CIO of a global automobile manufacturer relayed the following story to us: “When I asked my team how we were conducting our RCAs, I got eight different responses. In effect, we had no RCA.”
I think we can all agree that there are direct and indirect causes for an incident, accident or problem. In fact, for every incident there is at least one technical cause and one root cause. They are not the same thing. The confusion we see in Root Cause Analysis practices is that not all IT Professionals understand this difference.
Here at KEPNERandFOURIE we emphasize this distinction by introducing a term that makes the difference easier to understand. The term is TECHNICAL CAUSE. The technical cause is the cause that is directly responsible for the incident: the “final straw that breaks the camel’s back.” It is normally something physical or technical, hence the term “technical cause”.
When you research the term “root cause”, you will find the following alternative concepts being used: source, root, origin, basis and foundation, all of which describe a root cause accurately. We explain this type of cause as a systemic reason, basis or origin that caused something else to happen. The root cause is an indirect, hidden or underlying reason that caused something else (the technical cause) to happen.
WHAT IS THE DIFFERENCE?
The best way to make sense of this is to remember the following:
- A TECHNICAL CAUSE is “an event in time”: something happened that should not have happened, or something did not happen that should have. That is why most investigators look for changes, or one specific change, that could have caused the incident. In other words, “out of date documentation” cannot be a technical cause, because it is not an event in time. However, “increased volume” or “volume spikes” can be a technical cause, because something happened and it constitutes a change.
- A ROOT CAUSE, on the other hand, is “a condition that exists”. It is a condition that was created or has been there for some time, is still in place, and will stay that way for the foreseeable future. So “increased volume” or “operator error” cannot be a root cause, because something changed at a point in time. However, “out of date documentation”, “legacy software”, “poor procedures” and “hardware specs” are good examples of root causes. Unless they are changed, these conditions will always have some impact on operations.
- There is a definite relationship between Technical Causes and Root Causes. The root cause is the trigger for the occurrence of the technical cause; the technical cause is the reason for the incident. So you need to work your way backwards through this relationship. Start by accurately identifying the specific fault in the incident. Once you have identified the correct fault, you can then, and only then, determine the technical cause: look for the “event in time” or change that would explain the fault. Only once you have identified and, most importantly, verified the technical cause can you use the 5 WHY technique to drill down to a “condition that exists” and thus determine the accurate Root Cause.
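The rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not a KEPNERandFOURIE tool: the names `Kind`, `Cause`, `cause_type` and `find_root_cause` are hypothetical, and the sample incident is made up from the examples used in this post. The sketch labels each cause as an “event in time” or a “condition that exists”, refuses to drill down until the technical cause has been verified, and then walks the answers to successive “why?” questions until it reaches a standing condition.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    EVENT = "event in time"              # something changed at a point in time
    CONDITION = "condition that exists"  # persists until someone changes it


@dataclass
class Cause:
    description: str
    kind: Kind
    verified: bool = False


def cause_type(cause: Cause) -> str:
    """Events are technical causes; standing conditions are root causes."""
    return "technical cause" if cause.kind is Kind.EVENT else "root cause"


def find_root_cause(technical: Cause, whys: List[Cause]) -> Optional[Cause]:
    """Return the first answer in the "why?" chain that is a condition.

    `whys` holds the answers to successive "why?" questions, in order,
    starting from the technical cause. Returns None if no answer is a
    condition, meaning the drill-down has not reached a root cause yet.
    """
    if technical.kind is not Kind.EVENT:
        raise ValueError("a technical cause must be an event in time")
    if not technical.verified:
        raise ValueError("verify the technical cause before asking why")
    for answer in whys:
        if answer.kind is Kind.CONDITION:
            return answer
    return None


# Hypothetical incident, built from the examples in this post.
spike = Cause("volume spike", Kind.EVENT, verified=True)
whys = [
    Cause("operator error during the spike", Kind.EVENT),
    Cause("out of date documentation", Kind.CONDITION),
]
root = find_root_cause(spike, whys)
print(cause_type(spike), "->", spike.description)
print(cause_type(root), "->", root.description)
```

The verification check is the important design choice here: the drill-down fails loudly if it is started from an unverified technical cause, which mirrors the order of work described above.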
WHAT ARE THE BENEFITS?
- Reduced cycle times with MTTR and MTR
- Reduced list of unresolved problems
- Reduction of recurring incidents
- Ultimately reduced DOWNTIME and REWORK
TAKE ACTION ON THIS OPPORTUNITY
We have trained countless IT Professionals on how to draw the distinction between a Technical and a Root Cause. They know it is imperative to first determine the Technical Cause before proceeding to the Root Cause. This approach changed their view of incidents and problems and how to deal with them successfully. It has resulted in greatly reduced MTR and MTTR.
Click through to this page if you are interested in learning more about this approach.
If you would like to speak to someone about tailoring this approach to your specific needs, contact the following person for your country:
For US : Bill Dunn (firstname.lastname@example.org)
For UK : John (email@example.com)
For Australia : Andrew Sauter (firstname.lastname@example.org)
For Singapore : Steven Loo (email@example.com)