UNDERSTANDING THE REAL MEANING OF A ROOT CAUSE
Most IT Professionals do not distinguish between the Technical Cause of an incident and its Root Cause. The terms are often used interchangeably or, worse, treated as if they were the same thing. The IT Professional who “gets” this distinction has a major advantage over his/her colleagues in finding a Root Cause more quickly and cheaply, and in resolving it permanently.
Is that the root cause… really?
I’ve been working with Root Cause Analysis practices for the last 29 years and have never found as many different views on Root Cause as in the IT industry. The term “root cause” is used loosely to describe many kinds of causes, and we need to clarify this first.
I think we can all agree that there are direct and indirect causes for an incident, accident or problem. In fact, for every incident there is at least one direct and one indirect cause. The confusion we see in Root Cause Analysis practices arises because not all IT Professionals understand this difference.
Implications of this confusion…
The biggest negative impact is that IT staff on the same site use the same words but mean different things. They might name “a volume spike” as the cause identified, yet some of them mean the direct cause while others see it as the root of the situation. So, let’s try to define this difference accurately and see how it affects “restoration” and “repair” efforts.
Here at KEPNERandFOURIE we have introduced another term to make the difference easier for everyone to understand: TECHNICAL CAUSE. The technical cause is the cause directly responsible for the incident, the final straw that broke the camel’s back. It is normally something technical that broke, hence the term “technical cause”. It is what others typically call the INCIDENT CAUSE, DIRECT CAUSE or TRUE CAUSE.
When you look up the term “cause” in a thesaurus, you will find the following alternatives: source, root, origin, basis and foundation. Together these describe a root cause accurately. We explain this type of cause as a systemic reason, basis or origin within the company that caused something else to happen. It is normally an indirect or underlying reason that causes something technical to happen.
What is the difference?
The best way to make sense of this is to remember the following:
- A TECHNICAL CAUSE is “an event in time.” Something happened, which is why most investigators look for changes, or one specific change, that could have caused the incident. We prefer to refer to this as the “event that triggered” the incident. In other words, “out-of-date documentation” cannot be a technical cause, because it is not an “event in time.” However, “increased volume” is truly a technical cause, because something happened; it constitutes a change and could easily be the “event that triggered the incident.”
- A ROOT CAUSE, on the other hand, is “a condition that exists.” It has been that way for some time, it is still that way, and it will stay that way for the foreseeable future. So “increased volume” or “operator error” cannot be a root cause, because each describes something that changed. However, “out-of-date documentation,” “legacy software,” “poor procedures” and “hardware specs” are good examples of root causes: unless they are changed, they will always have some impact on operations.
PS: The above are general guidelines and, as always, there are exceptions to this self-made rule.
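For readers who log incident findings in a tool or script, the rule of thumb above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Finding`, `CauseType`, `classify` and the flag `is_event_in_time` are hypothetical and not part of any KEPNERandFOURIE method or product, and a person still has to judge whether a finding is an event or a standing condition.

```python
from dataclasses import dataclass
from enum import Enum


class CauseType(Enum):
    TECHNICAL_CAUSE = "technical cause"  # an event in time (the trigger)
    ROOT_CAUSE = "root cause"            # a condition that exists


@dataclass
class Finding:
    description: str
    # Hypothetical flag set by the investigator: True if something happened
    # or changed at a point in time, False if it is a standing condition.
    is_event_in_time: bool


def classify(finding: Finding) -> CauseType:
    """Apply the rule of thumb: events are technical causes, conditions are root causes."""
    if finding.is_event_in_time:
        return CauseType.TECHNICAL_CAUSE
    return CauseType.ROOT_CAUSE


# Examples taken from the text above.
findings = [
    Finding("increased volume", True),
    Finding("operator error", True),
    Finding("out-of-date documentation", False),
    Finding("legacy software", False),
]

for f in findings:
    print(f"{f.description}: {classify(f).value}")
```

The sketch deliberately leaves the hard part, deciding whether something is an event or a condition, to the investigator. As the PS says, exceptions exist, so the output is best treated as a prompt for discussion, not a verdict.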