When your incident investigation ends with 'Fix/Replace'

Katie Wohlust



How to Turn a Basic “Fix/Replace” Solution into Valuable Action Items

For our third blog in the incident investigation solutions series, we are going to talk about fixing or replacing a broken piece of equipment without digging any deeper into why that failure occurred. Although Cause Mapping has three main parts (problem, analysis, and solutions), the end goal of an investigation is to find solutions that prevent similar events or reduce the risk of them happening again. If we simply fix or replace equipment without considering why it failed, then we are setting ourselves up for future equipment failure incidents.

Generic solution #3: Fix/Replace…

A common cause in an incident investigation is “equipment failure,” and that is about as specific as “human error.” A big-picture preventive approach to equipment failure would be to determine exactly why it failed (failure analysis), check whether similar equipment in use is near failure (audits), and identify compromised equipment before it fails (hazard ID). I would like to propose several ideas that can make your incident investigation and solution selection more thorough.

Failure Analysis

When you get to “equipment failure” in a root cause analysis, consider asking the following questions: Why did that piece of equipment fail? Was it too big or too small for the application? Are we using it past the manufacturer’s recommended life expectancy? Did we follow the manufacturer’s maintenance recommendations? Are we using it in conditions outside its recommended specification (e.g., temperature and pressure)? Do we even still need this piece of equipment? These are all questions to ask during a failure analysis. Depending on the scale of the impact this failure had on your organization, you might consider sending the failed piece of equipment to a third-party subject matter expert to conduct the failure analysis.

Conduct an audit

After one piece of equipment fails, it might be a good idea to look for similar equipment being used throughout your organization’s operations. For example, let’s say a gasket failed and that caused a loss of primary containment. There may be many more gaskets that are the same style and the same age being used in your process. If we are going to replace the one failed gasket, then maybe we could consider inspecting all the other gaskets to see if the failure was an isolated incident. If there are thousands of these gaskets in use, you may want to consider taking a random sample to test for integrity.
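If your equipment inventory lives in a spreadsheet or database, drawing that random sample can be a few lines of code. The sketch below is only an illustration: the gasket IDs, the inventory size of 2,000, and the sample size of 50 are all made-up assumptions, and the right sample size for your process is a question for your reliability or quality team, not something this example answers.

```python
import random


def pick_inspection_sample(gasket_ids, sample_size, seed=None):
    """Return a simple random sample of gasket IDs to inspect.

    Passing a seed makes the selection repeatable, so the list can be
    regenerated later for the investigation record.
    """
    rng = random.Random(seed)
    # Never ask for more items than exist in the inventory.
    k = min(sample_size, len(gasket_ids))
    return sorted(rng.sample(gasket_ids, k))


# Hypothetical inventory: 2,000 gaskets of the same style and age.
inventory = [f"GSK-{n:04d}" for n in range(1, 2001)]

# Sample size of 50 is an arbitrary placeholder, not a recommendation.
to_inspect = pick_inspection_sample(inventory, sample_size=50, seed=42)
print(len(to_inspect), "gaskets selected for inspection")
```

Every gasket has the same chance of being picked, which helps keep the inspection from being skewed toward the units that are easiest to reach.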

Hazard ID

A truly valuable action item to consider is developing a process for employees to report equipment that they have identified as potentially compromised. The employees working with the equipment day and night are in the perfect place to detect if something sounds, looks, or feels different. This might be an indication that further inspection is needed. Giving employees the tools to tell the right person in a reasonable amount of time might prevent future incidents. In a previous blog, we discussed empowering employees to expose weaknesses within an organization. That same concept applies here.

When developing solutions after equipment failure, consider complementing the “fix/replace broken…” solution with additional solution(s) that focus on finding similar equipment that may be close to failure. If one thing breaks in the field, it may be a sign that other equipment is nearing failure too. It might also be a symptom of a larger problem. Although it may take more time to dig deeper into these possibilities, if you prevent future failures then I think it is time well spent.

