I’ve been working with Root Cause Analysis practices for the last 29 years and have never found so many different views on root cause as in the IT industry. The term “root cause” is used loosely to describe many kinds of causes, and we need to clarify this first.
I think we can all agree that there are direct and indirect causes for an incident, accident or problem. In fact, for every incident there is at least one direct and one indirect cause. The confusion we see in Root Cause Analysis practice arises because not all IT professionals understand this difference.
Here at KEPNERandFOURIE we’ve decided to introduce another term that makes the difference easier for everyone to understand: TECHNICAL CAUSE. The technical cause is the cause directly responsible for the incident, the final straw that broke the camel’s back. It is normally something technical that broke, hence the term “technical cause”. This would typically be the DIRECT CAUSE or TRUE CAUSE.
When you look up the term “cause” in a thesaurus you will find the following alternatives: source, root, origin, basis and foundation. Together they describe a root cause accurately. We explain this type of cause as a systemic reason, basis or origin within the company that caused something else to happen. It is normally an indirect or underlying reason that caused something else (technical) to happen.
WHAT IS THE DIFFERENCE?
The best way to make sense of this is to remember the following:
- A TECHNICAL CAUSE is “an event in time.” Something happened, which is why most investigators look for changes, or one specific change, that could have caused the incident. In other words, “out of date documentation” cannot be a technical cause, because it is not an event in time. However, “increased volume” is truly a technical cause, because something happened and it constitutes a change.
- A ROOT CAUSE, on the other hand, is “a condition that exists.” It has been that way for some time, is still that way, and will remain that way for the foreseeable future. So “increased volume” or “operator error” cannot be a root cause, because something changed. However, “out of date documentation,” “legacy software,” “poor procedures” and “hardware specs” are good examples of root causes: unless changed, they will always have some impact on operations.
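For teams that record findings in an incident tool, the distinction above can be sketched in a few lines of Python. This is only an illustration, not a KEPNERandFOURIE method: the names `Finding`, `CauseType` and `classify` are hypothetical, and the `is_event_in_time` flag is assumed to be a judgment the investigator makes, not something the code can work out by itself.

```python
from dataclasses import dataclass
from enum import Enum


class CauseType(Enum):
    TECHNICAL = "technical cause"  # an event in time: something happened or changed
    ROOT = "root cause"            # a condition that exists and persists


@dataclass(frozen=True)
class Finding:
    """One finding from an investigation (hypothetical structure)."""
    description: str
    is_event_in_time: bool  # investigator's judgment, assumed to be supplied by hand


def classify(finding: Finding) -> CauseType:
    """Apply the rule of thumb: events are technical causes, conditions are root causes."""
    return CauseType.TECHNICAL if finding.is_event_in_time else CauseType.ROOT


# Examples taken from the text above.
FINDINGS = [
    Finding("increased volume", is_event_in_time=True),
    Finding("operator error", is_event_in_time=True),
    Finding("out of date documentation", is_event_in_time=False),
    Finding("legacy software", is_event_in_time=False),
]

if __name__ == "__main__":
    for finding in FINDINGS:
        print(f"{finding.description}: {classify(finding).value}")
```

The value of writing it down this way is that every finding has to answer one question, "did something happen, or has it been like this all along?", before it can be labeled.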
PS: The above are general guidelines and, as always, there are exceptions to this self-made rule.