Facebook outage a prime example of insider threat by machine

A buggy automated audit tool and human error took Facebook offline for six hours. Key lesson for CISOs: Look for single points of failure and hedge your bets.

Filo / Getty

The longest six hours in Facebook’s history took place on October 4, 2021, when Facebook and its sister properties went dark in a catastrophic outage. The only silver lining, if there is one, is that the cause wasn’t malicious actors. Rather, it was a self-inflicted wound by Facebook’s own network engineering team.

According to Facebook’s first engineering blog post on October 4, the culprit was “configuration changes on the backbone routers that coordinated network traffic between our data centers,” which “caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.”

Facebook followed up on October 5 with more details: “A command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all connections in our backbone network, disconnecting Facebook data centers globally.” The post explained that its systems have fail-safe processes in place to prevent this type of mistake, but “a bug in that audit tool prevented it from properly stopping the command.”
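Facebook has not published the audit tool itself, so the following is a minimal, purely hypothetical sketch of the failure mode it describes: a maintenance command is supposed to pass a pre-execution audit check, but a bug in the check lets the destructive command through. The function names, the command string, and the fail-open exception handling are all illustrative assumptions, not Facebook’s actual code.

```python
# Hypothetical sketch of a fail-open audit gate -- not Facebook's actual tool.

def audit_command(command: str) -> bool:
    """Pre-execution check meant to block commands that would withdraw
    all backbone connectivity at once. Imagine a bug here, such as an
    unhandled parsing error on this particular command syntax."""
    raise ValueError("unrecognized command syntax")  # the hypothetical bug


def run_maintenance(command: str) -> None:
    try:
        allowed = audit_command(command)
    except Exception:
        # Failing open: an error in the audit is treated as "no objection",
        # so the destructive command proceeds anyway.
        allowed = True

    if allowed:
        print(f"executing: {command}")  # in reality, pushed to backbone routers
    else:
        print(f"blocked: {command}")


run_maintenance("assess global backbone capacity")
```

The safer design is to fail closed: if the audit check cannot render a verdict, the command should be refused rather than waved through.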

Yes, it’s yet another instance in which the machine turned out to be the insider that caused the havoc.
