The longest six hours in Facebook's history took place on October 4, 2021, as Facebook and its sister properties went dark. The social network suffered a catastrophic outage. The only silver lining, if there is one, is that the outage wasn't caused by malicious actors. Rather, it was a self-inflicted wound caused by Facebook's own network engineering team.

According to Facebook's first engineering blog post on October 4, the company pointed to "configuration changes on the backbone routers that coordinated network traffic between our data centers [that] caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt."

They followed up with a second blog post on October 5 offering more details: "A command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all connections in our backbone network, disconnecting Facebook data centers globally." The post explained that their systems have fail-safe processes in place to prevent this type of mistake, but "a bug in that audit tool prevented it from properly stopping the command."

Yes, yet another instance where the machines turned out to be the insider that caused the havoc.

Impact of a machine-based insider event

When the configuration change took effect, Facebook's Border Gateway Protocol (BGP) route advertisements were withdrawn, which in turn made its Domain Name System (DNS) servers unreachable. To the rest of the internet, Facebook, Instagram, and WhatsApp had effectively vanished. Because the audit tool failed to stop the command, the platforms themselves were unreachable. The company wasn't able to operate remotely, so all work had to be managed locally.
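The chain of failure, route withdrawal first, DNS failure second, can be illustrated with a toy model. This is a deliberately simplified sketch, not Facebook's actual tooling; the prefix, next-hop name, and IP address below are illustrative assumptions.

```python
# Toy model of the outage mechanism: a backbone router holds routes,
# and name resolution only works while a route to the DNS servers exists.
# All prefixes, hop names, and addresses are hypothetical.

class BackboneRouter:
    def __init__(self):
        self.routes = {}  # prefix -> next hop

    def advertise(self, prefix, next_hop):
        self.routes[prefix] = next_hop

    def withdraw_all(self):
        # The effect of the faulty maintenance command: every route gone.
        self.routes.clear()

    def reachable(self, prefix):
        return prefix in self.routes


def resolve(router, dns_prefix, hostname):
    """Resolution succeeds only if a route to the DNS servers exists."""
    if not router.reachable(dns_prefix):
        raise RuntimeError(f"SERVFAIL: no route to DNS servers for {hostname}")
    return "157.240.0.35"  # placeholder answer


router = BackboneRouter()
router.advertise("129.134.0.0/16", "edge-1")  # hypothetical DNS prefix
print(resolve(router, "129.134.0.0/16", "facebook.com"))  # resolves normally

router.withdraw_all()  # the outage: backbone routes withdrawn globally
try:
    resolve(router, "129.134.0.0/16", "facebook.com")
except RuntimeError as err:
    print(err)  # resolution now fails, the site is simply "gone"
```

The point of the sketch: nothing was wrong with the DNS data itself; once the routes were withdrawn, no query could reach the servers holding it.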
Imagine the gyrations that were necessary to manually bypass all the technological barriers to entry that were in place and were now defaulting to their error status.

Additionally, it was widely reported that the same internal infrastructure supported various internet of things (IoT) devices and services within the company itself, including access control, company email, and employee online workspaces, all managed in house, and these were affected as well.

The impact went beyond Facebook's 3.5 billion users eager to share their photos, opinions, and recipes. Third-party entities that tied their authentication process to Facebook had clients, customers, and employees unable to access their accounts. Individual users who opted to use their Facebook account as their log-in were likewise left twiddling their thumbs waiting for the outage to end, as access to their desired domains was blocked by the unavailability of the authentication process.

Lessons for CISOs from the Facebook outage

Is this an instance of technical decisions being made by non-technical leaders? Cary Conrad, chief development officer at SilverSky, comments that the self-inflicted outage is "emblematic of a broader leadership issue in the tech world." He observes that over more than 20 years he has seen how "Good management trumps good technology every time, yet due to the ever-changing threatscape of the tech industry, inexperienced leadership is oftentimes relied upon for the sake of expediency." He continues that within the world of cybersecurity, "The Peter Principle is in full effect. People progress to their level of incompetence, meaning a lot of people in leadership within cyber have risen to a level that is difficult for them to execute and often lack formal technical training.
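The third-party login problem is a textbook single point of failure: one unreachable identity provider locks users out of unrelated services. A minimal sketch of the mitigation, trying an ordered list of providers rather than depending on one, is below; the provider names, token, and verifier functions are all hypothetical, not any real OAuth API.

```python
# Hypothetical sketch of multi-provider authentication fallback.
# Each provider is (name, verify_fn); verify_fn raises ConnectionError
# when the provider is unreachable. Names and claims are illustrative.

def authenticate(token, providers):
    """Return (provider_name, claims) from the first reachable provider."""
    last_error = None
    for name, verify in providers:
        try:
            return name, verify(token)
        except ConnectionError as err:  # provider down, try the next one
            last_error = err
    raise RuntimeError("all identity providers unavailable") from last_error


def facebook_verify(token):
    # Simulating October 4: the provider cannot be reached at all.
    raise ConnectionError("facebook.com unreachable")


def local_verify(token):
    # In-house fallback directory that keeps log-ins working.
    return {"sub": "user-42", "via": "local"}


providers = [("facebook", facebook_verify), ("local", local_verify)]
print(authenticate("opaque-token", providers))
# Falls back to the local provider while Facebook is dark.
```

A real deployment would need consistent account linking across providers, but even this simple ordering removes the hard dependency on a single external service.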
As a CISO, there is a need to configure, identify, and negotiate the cost of protecting an organization, and without adequate experience or a disciplined approach, this mission is executed poorly."

While the knee-jerk reaction may be to punish the engineer who issued the update command, that would be misdirected ire. The real culprit, in this instance, is Facebook's own architecture. It allowed the network to violate the most basic of network tenets: do not allow a single point of failure. Facebook's infrastructure collapsed when the automated audit process failed due to an undetected (or known but not yet mitigated) bug.

Tom Krazit and Joe Williams hit the nail on the head with their summation, published in Protocol, of the three learning opportunities for CISOs that come out of Facebook's outage:

Plan for the worst. Enterprises need a contingency plan for the complete loss of their computing resources or network connection, not just the loss of a data center or cloud region.

Hedge your bets. It's extremely unlikely that the entire internet will go down at the same time; hedging at least a few bets across multiple service providers could be worth the effort.

Check your priorities. There's no way to run an operation the size of Facebook without a serious amount of automation, which means code-auditing tools like the one that failed to stop this outage need extra attention.

October 4 was a bad day for Facebook, and a tweet from Jonathan Zittrain, professor at Harvard Law School and the Harvard School of Engineering and Applied Sciences, wryly summarized it: Facebook basically locked its keys in the car.