What is the focus of investigations within your organization? Are you looking to identify the person or group responsible? Do you ask “who did it?” Do you work to identify THE root cause? Do you often identify the root cause as “procedure not followed” or “human error”?
If you answered yes to any of the questions above, then your organization may have a blame culture. A blame culture focuses on blaming humans for errors instead of looking for weak links in the company’s processes and procedures. Whether your organization has a blame culture or a prevention culture will have a huge impact on the effectiveness of your investigations and your overall reliability.
The Blame Game
Recently, a user on Reddit shared a story right out of every employee’s worst nightmare. (Admittedly, this story is impossible to fact check, but it can still serve as a good example even if it is embellished.) According to the post, a user with the handle cscareerthrowaway567 made a copy-and-paste error and accidentally deleted the company’s entire production database on the first day on the job as a junior software developer.
So how did the company handle the problem? The new employee was told to leave and never come back. According to the post, the employee pleaded for a chance to help fix the error, but was immediately fired and told to expect a call from the legal department.
The company’s approach appeared to be out of the blame game playbook: get rid of the employee responsible and the problem is “fixed”. While that one employee will never make the mistake again, the system will remain vulnerable to the same type of error if additional changes are not made. The incident also raises many questions. For example, is it wise to let a first-day employee work on anything where a single small error can have a huge impact?
A Prevention Perspective
In a previous blog, I wrote about an Amazon employee who made a $150 million typo when a command was entered incorrectly during planned maintenance and a large set of servers on the S3 platform was taken down for hours. So how did Amazon react? The employee is reportedly still working for Amazon, which is a good indication that the company takes a different approach to dealing with an employee’s expensive mistake. That approach is out of the prevention playbook.
Rather than blaming the individual who entered the incorrect command, Amazon focused on improving the overall reliability of its system and worked to minimize the consequences of a similar error in the future. The system was modified so that servers would be removed more slowly, and additional safeguards were added so that no subsystem would be reduced below its minimum required capacity level. Additionally, audits were performed to ensure that no other operational tools had similar vulnerabilities. Amazon focused on prevention and worked to make the design of the system itself more robust, reducing the chance of the same or a similar incident happening in the future.
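To make the two safeguards described above concrete, here is a minimal sketch of what a capacity-removal tool with a minimum-capacity floor and a slow, batched removal rate might look like. This is not Amazon’s actual tooling; the names Subsystem, CapacityError, remove_servers and max_per_step are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Subsystem:
    """Hypothetical model of a subsystem and its capacity floor."""
    name: str
    active_servers: int
    min_required: int  # assumed minimum capacity needed to stay healthy


class CapacityError(Exception):
    """Raised when a removal request would breach a safeguard."""


def remove_servers(subsystem, requested, max_per_step=2):
    """Remove servers in small batches, refusing unsafe requests.

    Returns the list of batch sizes that were removed.
    """
    if requested < 0:
        raise CapacityError("requested must be non-negative")

    # Safeguard 1: reject the whole request up front if it would take
    # the subsystem below its minimum required capacity.
    remaining = subsystem.active_servers - requested
    if remaining < subsystem.min_required:
        raise CapacityError(
            f"{subsystem.name}: removing {requested} would leave "
            f"{remaining}, below the minimum of {subsystem.min_required}"
        )

    # Safeguard 2: remove slowly, at most max_per_step servers per batch.
    # A real tool would pause and check system health between batches.
    batches = []
    while requested > 0:
        step = min(max_per_step, requested)
        subsystem.active_servers -= step
        requested -= step
        batches.append(step)
    return batches
```

With a sketch like this, a mistyped request (say, 50 servers instead of 5) is rejected before anything is removed, so one person’s typo no longer has the power to take the system down.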
Fewer missed opportunities
When companies focus on blame, they miss opportunities to make work procedures and systems more reliable. The input from the people most familiar with the work is needed to understand exactly why an error or incident occurred. This insight is invaluable for learning how to reduce the risk and/or consequences of a similar error occurring in the future. An organization with a blame culture tends to focus on punishing individuals who make mistakes. The fear of disciplinary action will often make employees reluctant to share information. Organizations with a prevention culture work to make it easier for employees to share what they know, and that knowledge can be used to improve the reliability of work procedures. A prevention culture allows for continued process improvement. Whether it is a sales, production or service company, the business as a whole is sure to benefit.
Imagine a situation where an employee puts the wrong oil in a very expensive piece of machinery, but realizes the error before the machine is turned on and informs a supervisor. Employees who feel comfortable enough to admit when they have made an error can help reduce the consequences. Organizations with a blame culture will generally have fewer self-reported errors, so problems won’t be caught as early.
Companies could even go so far as to have a process for employees to make management aware of weaknesses in a system before a mistake happens. The people doing the work with boots on the ground are usually in the best position to make suggestions for improvements. Don’t wait until a mistake happens. Empower employees to expose flaws and limitations and allow the employee to work to fix them before an incident occurs.
More robust systems and procedures
An organization where a single failure or mistake by one employee can result in a huge consequence is never going to be very reliable. By focusing on preventing errors and reducing the consequences of mistakes, an organization will improve its overall reliability. Obviously, there are cases where employees are a bad fit for a particular role. When that happens, dig into the details of how employees are hired and trained, and work to improve the hiring processes so that employees are more likely to be placed into appropriate roles.
If an employee makes a mistake, then involve them in the investigation. Make it safe for them to share the details and learn from it. Developing a culture of prevention keeps the focus on improving work processes. Getting rid of an employee does not get rid of problems in a system that is vulnerable to a single point of failure.
(And if you are worried about the Reddit user from our first example, according to an update, he was able to land a new job relatively quickly with a different company that “knows what happened and no one blames me”.)