What Is Unique About This Fault?

Mar 11, 2015 4:26:00 PM

This will get you and your team to the core in minutes!


If you could not restore a service within three hours, there is something unique about the fault being experienced. Do you know what it is?


It is fair to assume that when an incident is typical and has the usual causes, all you have to do is find which cause is the culprit this time. An example is being unable to log in to your usual email service. You know from experience that only a few things can block your access, so you quickly check these usual factors until you have found the cause and “solved” the situation.


However, did you know that if you knew the unique factor about your fault, you could have gone directly to the one or two typical causes that could explain that uniqueness?
This would have let you avoid testing every possibility before reaching the one that caused the incident.


This scenario is even more important in any IT/Business relationship, where looking for the unique factors first could save you and your business colleagues a lot of effort, time and money. Again, let’s look at an example. You keep getting booted off a website while doing quotes. Here are the typical reasons why this could happen:

  1. Your browser has a particular time out setting
  2. You are using an unauthorized key or character
  3. There is too much traffic on the web and you are getting bumped off
  4. There is a certain field that has a compatibility issue
  5. There is a corrupted file in the application and you need to reboot
  6. The search engine is having intermittent problems, causing you to be booted off


I think you would agree that it would take a long time to work through all of these possible reasons for getting booted off. However, let me ask the uniqueness question.
“What is unique about the fault of being booted off?” The uniqueness could be in the location, timing, type of user or size of fault. In this case I only get booted off after 4pm, every afternoon. That is what is unique about my situation.
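
For readers who keep incident records, the uniqueness question can be asked of the data itself. The following is only a rough Python sketch: the records, the field names (`when`, `location`, `user_type`) and the two-hour spread threshold are all made up for illustration and do not come from any real system. It simply reports which dimensions stay constant across every occurrence of the fault.

```python
from datetime import datetime

# Made-up sample records for illustration only; not real incident data.
INCIDENTS = [
    {"when": datetime(2015, 3, 9, 16, 12), "location": "Branch A", "user_type": "broker"},
    {"when": datetime(2015, 3, 10, 16, 40), "location": "Branch B", "user_type": "agent"},
    {"when": datetime(2015, 3, 11, 17, 5), "location": "Branch A", "user_type": "agent"},
]


def unique_factors(incidents, max_hour_spread=2):
    """Return the dimensions that are the same for every incident.

    max_hour_spread is an assumed threshold: if all incidents start within
    this many hours of each other (by hour of day), timing counts as unique.
    """
    if not incidents:
        return {}

    factors = {}

    # A dimension is "unique" here if every incident shares one value for it.
    for key in ("location", "user_type"):
        values = {record[key] for record in incidents}
        if len(values) == 1:
            factors[key] = next(iter(values))

    # Timing: check whether the incidents cluster in a narrow daily window.
    hours = [record["when"].hour for record in incidents]
    if max(hours) - min(hours) <= max_hour_spread:
        factors["timing"] = f"between {min(hours)}:00 and {max(hours) + 1}:00"

    return factors


print(unique_factors(INCIDENTS))
```

With these sample records, location and type of user both vary, so only the timing is reported as the unique factor.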

So, which of the six possible reasons listed above could explain why I only get booted off at the end of the day? The aim is to find the one or two reasons that explain this uniqueness and then to focus on these to restore my service. The following is a rudimentary explanation for the sake of this example:


[Image: uniquefault]


Looking at the example above, the only reason that could even remotely explain the situation is the one time-related reason, since it alone could account for the fault occurring after 4pm every afternoon. This should be easy to check with Networks and, if confirmed, would need a “workaround” suggested by Networks.
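
The same filtering can be written down as a small Python sketch. Everything here is hypothetical: the `Cause` class, the `causes_explaining` helper and, above all, the dimension tags attached to each reason are assumptions made for illustration. The article does not say which tags belong to which reason; tagging the traffic reason with timing is a guess based on the mention of Networks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cause:
    description: str
    # Dimensions (location, timing, user, size) along which this cause could
    # make the fault vary. These tags are illustrative assumptions.
    explains: frozenset


CAUSES = [
    Cause("Browser has a particular time-out setting", frozenset({"user"})),
    Cause("Unauthorized key or character used", frozenset({"user"})),
    Cause("Too much traffic on the web", frozenset({"timing", "location"})),
    Cause("A certain field has a compatibility issue", frozenset({"location"})),
    Cause("Corrupted file in the application", frozenset()),
    Cause("Search engine having intermittent problems", frozenset()),
]


def causes_explaining(dimension, causes=CAUSES):
    """Keep only the causes that could explain uniqueness in this dimension."""
    return [cause for cause in causes if dimension in cause.explains]


# The fault only occurs after 4pm, so the unique dimension is timing.
for cause in causes_explaining("timing"):
    print(cause.description)
```

Under these assumed tags, the six reasons shrink to a shortlist of one, which is the reason to check first before trying anything else.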

So, the conclusion is that if we know what the unique component of our fault is, we can home in on the most probable cause quickly and restore service quickly and accurately, without having to perform too many “trial & error” fixes.


 


Written by Mat-thys Fourie

Washington, DC, United States | Founder & Chairman of Thinking Dimensions Global
Mr. Fourie is a thought leader on how IT professionals apply Incident Investigation techniques on a repeatable and sustainable basis within their organizations. His strength lies in customizing and embedding the various techniques within existing CSI, Incident and Problem Management practices.
