A Valentine’s Day love note from an ex nine months too late. A note from a lost loved one. On the morning of Nov. 7, 2019, thousands of people received text messages that had been sent nearly nine months earlier. Multiple carriers, including Sprint, T-Mobile, AT&T and Verizon, and both iOS and Android users were impacted. Some of the more than 160,000 messages that were delayed were from exes, coworkers and even deceased friends and relatives. (Just imagine what it would feel like to receive a text from a parent who had recently died.)
The senders’ accounts gave no indication that anything had just been sent, so people had no idea that a message they had written months earlier, and had probably forgotten all about, had finally been delivered. The delayed messages caused widespread confusion and resulted in many awkward conversations as people received text messages completely out of context.
How Did This Happen?
Sending a text message seems pretty straightforward. You push the send button on your phone and the text appears nearly instantly on the recipient’s phone, but the process isn’t as simple as it appears. The messages that were delayed were sent using short message service (SMS). Phone carriers use third-party vendors to deliver SMS text messages, and the messages can be routed through multiple servers before reaching their destination.
The text messages were delayed when a server malfunctioned at a company that provides networking services for multiple carriers. Over 160,000 text messages were trapped on the server when it malfunctioned on Feb. 14, 2019. According to the company, the server came back online after routine maintenance on Nov. 7, 2019, and the trapped text messages were then sent. No details were released about why the server malfunctioned or why it took so long to correct the problem.
Find the Causes, not THE Root Cause
In this example, it might be tempting to say that the failed server was the root cause, but focusing the investigation on a single cause means additional opportunities to reduce the risk of something similar occurring in the future could be missed.
A Cause Map™ diagram, a visual tool for performing a root cause analysis, can be built to intuitively lay out the causes that contributed to an issue and show the cause-and-effect relationships between them. Working to determine all of the causes (plural), rather than focusing on finding a single root cause, naturally expands the solutions that are considered. Coming up with a solution to address the server malfunction is important, but there are other causes that can also be addressed to reduce risk. In this example, a more thorough investigation might consider what caused the server to malfunction, why the problem with the service wasn’t identified sooner and why the text messages were delivered instead of being deleted.
Click on the thumbnail above to see an example of a Cause Map™ diagram for this case study.
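One of the questions raised in this section, why the trapped messages were delivered instead of being deleted, can be made concrete with a small sketch. The Python below is purely illustrative and is not the vendor's actual system, about which no details were released. The names `QueuedMessage`, `flush_queue` and the three-day `VALIDITY_PERIOD` are all hypothetical; the idea is simply that a server holding messages for later delivery could check each message's age when it comes back online and discard anything too old to still make sense.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical cutoff chosen for illustration; a real carrier or vendor
# would set its own limit.
VALIDITY_PERIOD = timedelta(days=3)


@dataclass
class QueuedMessage:
    sender: str
    recipient: str
    body: str
    submitted_at: datetime


def flush_queue(queue, now, validity=VALIDITY_PERIOD):
    """Split held messages into those to deliver and those that have expired."""
    deliver, expired = [], []
    for msg in queue:
        if now - msg.submitted_at <= validity:
            deliver.append(msg)
        else:
            expired.append(msg)
    return deliver, expired


# A message trapped on Feb. 14, 2019 and one sent the night before the restart.
stuck = QueuedMessage("alice", "bob", "Happy Valentine's Day!",
                      datetime(2019, 2, 14, 9, 0))
recent = QueuedMessage("carol", "dave", "Running late",
                       datetime(2019, 11, 6, 22, 0))

deliver, expired = flush_queue([stuck, recent], now=datetime(2019, 11, 7, 2, 0))

# Days between the malfunction and the day the server came back online.
age_days = (datetime(2019, 11, 7) - datetime(2019, 2, 14)).days

print(len(deliver), len(expired), age_days)  # 1 1 266
```

With a check like this in place, the 266-day-old message would be dropped (or flagged for review) while the message from the previous evening would still go through. Whether silently deleting old messages is the right solution is itself a judgment call, which is why it helps to list it as one cause among several rather than treating the server failure as the only thing to fix.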
The Importance of Communication
Understanding the technical issues of an investigation is important, but I think the most powerful takeaway from this case study is about communication. It’s easy to forget that two people can be saying two different things, and both can be correct. You can accurately state that you didn't send someone a text last night, but that doesn't mean someone is wrong when they say they received a text from you. This case study is a good reminder to keep an open mind during an investigation and try not to jump to conclusions.