TDX: What is an Incident vs a Problem?

Summary

• An incident is an unplanned disruption or degradation of service.

• A problem is a cause of one or more incidents.

Body

Quite often in operations, these two terms are used interchangeably, which causes a lot of confusion. Sometimes people add a third term, "issue", to mean the same thing.

What is an Incident?

Based on the definition above, an incident is something that needs to be resolved immediately, whether through a permanent fix, a workaround, or a temporary fix. An example of an incident would be a server crash that disrupts a business process. If a server is used only during office hours, a crash after office hours is, strictly by the definition, not yet an incident, since no service was affected. It becomes an incident only when the outage extends into the hours of use.

If a disruption is planned, for example scheduled maintenance, it is not an incident, and the outage should not be counted as unavailability. If the outage runs past the planned schedule, the time beyond the schedule becomes an incident.
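The two rules above (hours of use, and planned versus unplanned time) can be sketched as a small calculation. This is only an illustration, not part of any ticketing tool: the names overlap and incident_time are hypothetical, and the sketch assumes a single service window and at most one planned maintenance window. It also assumes that an overrun of planned maintenance counts only while the service is in use, which the text does not state explicitly.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

Interval = Tuple[datetime, datetime]  # (start, end)


def overlap(a: Interval, b: Interval) -> timedelta:
    """Length of time the two intervals have in common (zero if none)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(end - start, timedelta(0))


def incident_time(outage: Interval,
                  service_hours: Interval,
                  planned: Optional[Interval] = None) -> timedelta:
    """Portion of an outage that counts as an incident (hypothetical helper)."""
    # Only the part of the outage inside the hours of use affects the service.
    in_use = (max(outage[0], service_hours[0]),
              min(outage[1], service_hours[1]))
    if in_use[0] >= in_use[1]:
        return timedelta(0)  # outage never touched the hours of use
    affected = in_use[1] - in_use[0]
    if planned is None:
        return affected
    # Time covered by the planned window is not an incident; only the
    # overrun beyond it is counted.
    return affected - overlap(in_use, planned)


day = datetime(2024, 1, 8)
office = (day.replace(hour=9), day.replace(hour=17))

# Crash after office hours, restored the same evening: no incident time.
after_hours = incident_time((day.replace(hour=20), day.replace(hour=22)), office)

# Maintenance planned for 12:00-13:00 that actually ran until 13:45.
overrun = incident_time((day.replace(hour=12), day.replace(hour=13, minute=45)),
                        office,
                        planned=(day.replace(hour=12), day.replace(hour=13)))
```

Here after_hours is zero, while overrun is the 45 minutes that exceeded the planned window.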

If an incident requires changes, the emergency change process is normally followed, especially if the service level is critical.

What is a Problem?

Problems, however, are not incidents. An incident can raise a problem, especially if there is a high possibility that the incident might happen again. In the case of a server crash after office hours, the crash is a problem. It is a high-priority problem because, if it is not resolved, it will cause an incident when the hours of use begin.

An incident does not become a problem. A problem may be raised because of an incident and, as the previous example shows, a problem may cause an incident.

You may raise a problem ticket and link it to the related incident.

The root cause of a problem may or may not be known. In either case, the following actions may be taken:

  • Do nothing if the problem does not affect the business, or if the cost of fixing it exceeds the benefits.

  • Deploy a workaround if the cost of determining the root cause exceeds the benefits.

  • Determine the root cause and fix the problem if the benefit is worth it.
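The three options above amount to a cost-benefit decision, which can be sketched roughly as follows. Everything here is an assumption made for illustration: the function name choose_problem_action, the idea that costs and benefits are expressed as comparable numbers, and in particular the order in which the checks run, since the text does not say which rule wins when several apply.

```python
from enum import Enum


class ProblemAction(Enum):
    DO_NOTHING = "do nothing"
    WORKAROUND = "deploy a workaround"
    FIX_ROOT_CAUSE = "determine root cause and fix"


def choose_problem_action(affects_business: bool,
                          benefit: float,
                          root_cause_cost: float,
                          fix_cost: float) -> ProblemAction:
    """Pick one of the three actions (hypothetical helper, assumed ordering)."""
    # A problem with no business impact is left alone.
    if not affects_business:
        return ProblemAction.DO_NOTHING
    # Finding the root cause costs more than it is worth: work around it.
    if root_cause_cost > benefit:
        return ProblemAction.WORKAROUND
    # The fix itself costs more than it is worth: leave the problem open.
    if fix_cost > benefit:
        return ProblemAction.DO_NOTHING
    # Otherwise the benefit justifies finding and fixing the root cause.
    return ProblemAction.FIX_ROOT_CAUSE
```

In practice the inputs are rarely hard numbers. The value of writing the decision down this way is that it forces the cost and benefit estimates to be stated before a problem ticket is parked or worked.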

Incident vs Problem

To illustrate this further, let's take a practical example.

You are driving your car and you get a flat tire. This is an incident because it disrupts the service - transportation to a destination. You fix it by either changing the tire yourself or calling roadside assistance. Once the tire has been changed, the incident is closed. But now you have a problem - you are running on your spare tire.

To fix the problem, you need to repair the flat tire and put it back.

As another example, suppose you are driving on an almost bald tire. This is a problem. If you continue to drive on that tire, you are bound to have an incident.

Normally, an incident needs to be fixed within a specific timeline. Problems can be left indefinitely until an incident happens.

Questions to Help Identify Incident and Problem

I work in a maintenance shop and, quite often, there is much discussion about whether something is an incident or a problem. There is only one question to ask: should this be fixed now? Of course, some people will always say yes. So, to narrow it down, I ask the following questions:

  1. Is the service unusable?

  2. Is there a degradation of the service?

  3. Is the business process affected greatly?

  4. Are service levels affected?

If you answer yes to any of these questions, it is probably an incident.
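The four questions can be written down as a simple checklist. The sketch below is hypothetical (the Triage class and classify function are not part of any tool), and it assumes that a ticket with four "no" answers is treated as a problem, which is the implied alternative in this article rather than something the questions themselves establish.

```python
from dataclasses import dataclass


@dataclass
class Triage:
    """Answers to the four triage questions (hypothetical structure)."""
    service_unusable: bool = False
    service_degraded: bool = False
    business_process_greatly_affected: bool = False
    service_levels_affected: bool = False


def classify(answers: Triage) -> str:
    """Return 'incident' if any question is answered yes, else 'problem'."""
    yes_to_any = any((
        answers.service_unusable,
        answers.service_degraded,
        answers.business_process_greatly_affected,
        answers.service_levels_affected,
    ))
    # Assumption: anything that is not an incident is handled as a problem.
    return "incident" if yes_to_any else "problem"


flat_tire = classify(Triage(service_unusable=True))  # car cannot be driven
bald_tire = classify(Triage())                       # still drivable for now
```

With the tire examples from earlier, flat_tire comes out as "incident" and bald_tire as "problem". The result is only a first guess, in keeping with the "probably" above.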

Details

Article ID: 1290
Created
Thu 4/3/14 11:56 PM
Modified
Fri 8/16/24 12:31 PM