A software bug in a telecom provider's phone number blacklisting system caused the largest telephony outage in US history, according to a report released by the US Federal Communications Commission (FCC) at the start of the month.
The telco is Level 3, now part of CenturyLink, and the outage took place on October 4, 2016.
According to the FCC's investigation, the outage began after a Level 3 employee entered phone numbers suspected of malicious activity in the company's network management software.
The employee wanted to block incoming phone calls from these numbers and had entered each number in fields provided by the software's GUI.
The problem arose when the Level 3 technician left a field empty, without entering a number. Unbeknownst to the employee, the buggy software didn't ignore the empty field, as most software does, but instead treated the empty space as a "wildcard" character that matches any number.
As soon as the technician submitted his input, Level 3's network began blocking all telephone calls.
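The FCC report does not publish the vendor's code, so the following is only a minimal Python sketch of the failure mode described above, under stated assumptions. The function names (`is_blocked_buggy`, `clean_blocklist`, `is_blocked`), the digits-only number format, and the sample numbers are all hypothetical and are not taken from Level 3's software. The sketch contrasts a matcher that treats an empty field as a wildcard with one that discards blank fields before matching.

```python
def is_blocked_buggy(caller: str, fields: list[str]) -> bool:
    """Hypothetical buggy matcher: an empty field acts as a wildcard."""
    for entry in fields:
        # The empty string matches every caller, so one blank field
        # is enough to block all calls.
        if entry == "" or entry == caller:
            return True
    return False


def clean_blocklist(fields: list[str]) -> list[str]:
    """Drop blank fields and reject anything that is not a plain digit string."""
    cleaned = []
    for field in fields:
        number = field.strip()
        if not number:
            continue  # ignore empty or whitespace-only fields
        if not number.isdigit():
            raise ValueError(f"not a valid phone number: {field!r}")
        cleaned.append(number)
    return cleaned


def is_blocked(caller: str, fields: list[str]) -> bool:
    """Matcher that only blocks exact numbers from the cleaned list."""
    return caller in set(clean_blocklist(fields))


# Form input with one field accidentally left blank (fictional numbers).
form_fields = ["2025550101", "", "2025550102"]

print(is_blocked_buggy("3035550199", form_fields))  # unrelated caller is blocked
print(is_blocked("3035550199", form_fields))        # unrelated caller passes
print(is_blocked("2025550101", form_fields))        # listed number is blocked
```

Skipping blank fields mirrors the behavior the article says most software has. A stricter design could instead reject the whole submission when a field is empty, forcing the operator to confirm the input.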
The event had massive repercussions, affecting the entire US. For 84 minutes between 10:06 and 11:30 AM Eastern Daylight Time (EDT), Level 3's network blocked all calls, a total of 111 million, 109 million of which were VoIP-based.
Approximately 29.4 million VoIP users and around 2.3 million wireless users were affected.
The FCC said the event had "nationwide impact" and called it "the largest [outage] reported in the Federal Communications Commission’s Network Outage Reporting System (NORS)" in the system's history.
Calls to 911 were also blocked, but due to the emergency system's redundancy, only 15 of 117 calls failed to connect to a public safety answering point (PSAP).
The outage could have lasted longer had Level 3 not had systems in place that alerted operators to abnormal activity. The FCC says Level 3 became aware of the incident four minutes after it started.
"The technician was unaware of the consequences of leaving a field in the network management software blank," the FCC concluded in its report, absolving the employee and company of guilt. "Level 3 personnel had not previously observed or experienced this behavior in their network management software. According to Level 3, this was the first time that anti-fraud operations in network equipment caused an outage."
The FCC report also mentions that Level 3 used "vendor-supplied network management software," but did not name the supplier.
As part of subsequent corrective measures, Level 3 now uses a provisioning system to handle phone number bans. The company also removed over 800 technician accounts that had access to various networking systems they shouldn't have had.