One of the most common things I hear from organizations is:
“We keep having the same issues. We send out safety flashes and hold stand-downs, but the same incidents keep happening.”
At ThinkReliability, we find that thoroughly analyzing a single incident can reveal much more than people expect. One event often exposes process breakdowns, gaps in documentation, or flawed assumptions that exist elsewhere too. You don’t need to wait for a pattern to develop—the first occurrence is often enough to show where the system is vulnerable.
The Boeing 737 MAX 8 crashes are a clear example. In late 2018, Lion Air Flight 610 crashed into the Java Sea. Just a few months later, Ethiopian Airlines Flight 302 went down under nearly identical circumstances. These two crashes killed 346 people and grounded the 737 MAX 8 fleet, but the warning signs were there after the first crash. This blog outlines four key lessons from that failure to learn.
The 737 MAX 8 was Boeing’s answer to the Airbus A320neo, a fuel-efficient aircraft offering lower operating costs and a longer range. To compete with Airbus, Boeing updated its long-running 737 series with larger, more efficient engines: the CFM LEAP-1B. These new engines had to be mounted higher and farther forward on the wing due to their size. This change subtly altered the aircraft's aerodynamic profile, particularly during high angle-of-attack situations.
To address this, Boeing introduced a new software-based system to the aircraft: the Maneuvering Characteristics Augmentation System (MCAS). MCAS was designed to automatically push the nose of the aircraft down if it sensed the plane was nearing a stall. The goal of adding this automated tech was to preserve the feel of previous 737s, so pilots wouldn’t need costly simulator training.
But there were two key problems. First, MCAS relied on input from a single angle-of-attack sensor. Second, MCAS could activate repeatedly without providing clear feedback to the pilots. These problems would prove catastrophic.
On October 29, 2018, Lion Air 610 crashed into the Java Sea shortly after takeoff from Jakarta, killing all 189 people on board. It was the first public sign of a problem with the new Boeing 737 MAX 8. On the Lion Air flight, a faulty angle-of-attack sensor fed incorrect data to MCAS, which repeatedly pushed the aircraft’s nose down despite the pilots' attempts to counter it.
It's important to note that in aviation, sensor failures are routine and usually manageable. The issue was that MCAS could activate based on a single sensor—and when it did, it took control away from the pilots without clearly communicating what was happening.
We start all our investigations with a problem outline. This one was built from Boeing’s perspective, capturing the impacts to its business.
Lion Air 610 Problem Outline
We’ll start building our Cause Map™ diagram with the most severely impacted goal, the safety goal. Why was Boeing’s safety goal impacted? Because 189 lives were lost. We continue asking why to build an initial 5-Why analysis.
Lion Air 610 5-Why Cause Map™ Diagram
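To make the structure of that chain concrete, here is a minimal sketch in Python. It is purely illustrative: the wording of each box is paraphrased from the narrative above (not taken from an actual Cause Map diagram or any ThinkReliability tooling), and the chain is just an ordered list read from the impacted goal down to deeper causes, where each adjacent pair is one “why?” step.

```python
# Illustrative sketch only: a 5-Why chain as an ordered list, read from the
# impacted goal down to deeper causes. Labels are paraphrased from this post,
# not copied from the actual Cause Map diagram.
five_why_chain = [
    "Safety goal impacted",
    "189 lives lost",
    "Lion Air 610 crashed into the Java Sea",
    "Aircraft repeatedly nosed down after takeoff",
    "MCAS activated on faulty angle-of-attack data",
    "MCAS relied on a single angle-of-attack sensor",
]

def print_why_chain(chain):
    """Print each effect followed by the cause uncovered by asking 'why?'."""
    for effect, cause in zip(chain, chain[1:]):
        print(f"{effect}\n  why? -> {cause}")

print_why_chain(five_why_chain)
```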
After the crash, internal discussions at Boeing, FAA reviews, and investigations by Lion Air revealed concerns about MCAS. But Boeing neither issued a software fix nor recommended grounding the aircraft. Instead, the company issued guidance suggesting that pilots could manage the issue by following standard runaway trim procedures.
This is the critical point: after the first crash, the information needed to prevent another crash was already available. The failure mode was known. The design vulnerability had been exposed. Yet, Boeing only released an Operations Manual Bulletin (OMB)¹ that reiterated existing procedures. The key message from Boeing was that if MCAS activated unexpectedly, pilots could counter it by following the standard runaway stabilizer checklist. The bulletin emphasized that pilots could override MCAS using the trim switches on the control column, and if needed, disable the system using the stabilizer cutout switches.
But Ethiopian Airlines, an organization that prided itself on its safety performance, had questions. After reviewing the OMB, their flight operations team sent a set of pointed questions to Boeing², which I’ve summarized below:
Boeing's response³ was... less than confidence-inspiring. Due to their involvement in the ongoing investigation, they declined to address the first two questions. On the third, they simply reiterated the language from the bulletin: “The pilot always has trim authority to override both the Speed Trim and MCAS flight control laws with the control wheel electric trim switches and ultimate authority to power off the entire stabilizer trim system using the Stabilizer Cutout Switches.”
Ethiopian Airlines raised legitimate questions that pointed to the need for a more systemic fix. Boeing’s response made it clear that no such fix was coming.
The cost of that decision would soon become tragically clear.
On March 10, 2019, less than five months after the Lion Air tragedy, Ethiopian Airlines Flight 302 crashed under eerily similar circumstances. Again, faulty angle-of-attack sensor data triggered MCAS. Again, the pilots struggled to regain control. And again, lives were lost.
If you look closely at the frequency in our second problem outline, you’ll see that this is now the second time an aircraft was lost as a result of MCAS.
Ethiopian Airlines 302 Problem Outline
*Cost listed is an estimate based on the average 737 MAX 8 cost.
Again, starting from the safety goal, we’ll build our 5-Why analysis of the Ethiopian Airlines crash.
Ethiopian Airlines 302 5-Why Cause Map™ Diagram
The second crash confirmed what should have been clear after the first: This was not a one-off. The same system failed in the same way, with the same deadly outcome. The tragedy wasn’t just that MCAS failed; it was that a known failure mode went uncorrected.
At this point, it’s worth stepping back to ask: How does a company fail to act on a known risk? To answer this question, let’s bring in a fundamental cause-and-effect relationship: Degree of Consequence.
Degree of Consequence Fundamental Relationship
The Degree of Consequence relationship describes what happens when an event occurs and the course of action chosen in response (or, in this case, inaction) causes a worse event. This relationship lets us expand our analysis from a 5-Why to a 6-Why: the Ethiopian Airlines flight crashed both because the aircraft nosed down unexpectedly and because the 737 MAX 8 was not grounded after the first crash.
What is fascinating about the second 737 MAX 8 crash is that the airline involved, Ethiopian Airlines, was the same airline that proactively asked Boeing the right questions. Their flight operations team raised concerns about MCAS and how it was supposed to be identified in flight. But Boeing, citing the ongoing Lion Air investigation, stated they were “unable to answer questions directly related to this event.”
The airline that tried to prevent the next crash became the next crash.
Ethiopian Airlines 302 6-Why Cause Map™ Diagram
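Structurally, that expansion is simply an effect fed by two cause paths. The sketch below is another illustrative Python rendering (the labels are my paraphrases, not the wording of the actual diagram) showing the crash caused both by the technical nose-down path and by the decision path of not grounding the fleet.

```python
# Illustrative sketch only: the Degree of Consequence relationship gives the
# crash two cause paths, expanding the 5-Why into a 6-Why. Labels are
# paraphrased from this post, not copied from the actual Cause Map diagram.
causes = {
    "Safety goal impacted": ["Lives lost"],
    "Lives lost": ["Ethiopian Airlines 302 crashed"],
    "Ethiopian Airlines 302 crashed": [
        "Aircraft nosed down unexpectedly",              # technical cause path
        "737 MAX 8 not grounded after the first crash",  # Degree of Consequence
    ],
    "Aircraft nosed down unexpectedly": ["MCAS activated on faulty sensor data"],
}

def walk(effect, depth=0):
    """Recursively print an effect and every cause that feeds into it."""
    print("  " * depth + effect)
    for cause in causes.get(effect, []):
        walk(cause, depth + 1)

walk("Safety goal impacted")
```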
As we expand the Cause Map diagram to a 12-Why, we can see how the Degree of Consequence relationship connects the first and second crashes. Note that in this map, I’ve chosen to present the two crashes as two separate incidents. There’s a reason for that.
Ethiopian Airlines 302 12-Why Cause Map™ Diagram
When you have two or more incidents with the same piece of equipment, you may consider building a Cumulative Cause Map diagram. A Cumulative Cause Map diagram brings together all known failure modes, causes, evidence, and action items into one analysis. It’s a valuable input for a failure modes and effects analysis (FMEA).
However, combining incidents tends to highlight technical issues, which may obscure how process breakdowns allowed a known failure to repeat. By analyzing each incident independently first, we can better see how consequences escalated, where interventions were missed, and how one failure set the stage for the next.
This initial analysis focuses on the sequence of events and decision-making that allowed a known failure mode to repeat. In a future blog, we’ll take a deeper look at the technical aspects of the incident, including how Boeing’s design philosophy interacted with automation in ways that left crews vulnerable.
The Boeing 737 MAX tragedy is a case study in how inadequate learning from one incident can set the stage for a second, often more damaging one.
Here are four lessons that apply far beyond aviation:
One failure is a signal. You don’t need a string of incidents to justify action. A single event can expose a critical weakness. When the MCAS failure occurred, it should have been treated as a high-consequence event, regardless of whether it aligned with past patterns.
Procedural fixes are not always enough. Telling people to “follow the checklist” doesn’t solve the problem if the checklist doesn’t match how the failure occurs in real-world conditions. Improvement requires understanding not just what went wrong, but why the existing defenses didn’t work.
Cross-functional learning is critical. Engineering, operations, customer support, and regulatory affairs all saw different pieces of the puzzle. But without integrated learning across teams and organizations, the full risk was never addressed.
Delays compound risk. Every day that goes by without addressing a known issue increases the chance that it will happen again, especially when the system (like a global aircraft fleet) is large and distributed.
The first crash revealed the failure mode. The second crash revealed the failure to learn.
Stay tuned for our next blog on the 737 MAX 8 crashes, where we’ll dig into the technical design behind MCAS and how it was shaped by management systems at Boeing.
1 - Boeing's Operations Manual Bulletin
2 - Investigation Report on Accident to the B737 MAX 8, Reg. ET-AVJ, Operated by Ethiopian Airlines (questions on pages 290-291)
3 - Investigation Report on Accident to the B737 MAX 8, Reg. ET-AVJ, Operated by Ethiopian Airlines (Boeing's response on pages 289-290)