Facebook Was Brought to Its Knees by ‘The Dark Magic of the Internet’

Photo: Chip Somodevilla (Getty Images)

techno.rentetan.com – Here’s a look at how a key routing protocol played a part in the social media giant’s Monday snafu. On Monday, Facebook and all of its connected companies and services abruptly vanished from the internet, resulting in a five-hour outage that left users unable to access their Facebook, WhatsApp, or Instagram accounts. Rumors and conspiracy theories quickly circulated that the social media behemoth had been hacked, or that it was attempting to divert attention away from its impending congressional troubles.

Now we know the true cause: the company published a statement on Tuesday offering further details about the outage and stating that the entire worldwide blackout was caused by a “faulty configuration change” made during routine maintenance. The misconfiguration unintentionally shut down Facebook’s backbone, a globally distributed network of fiber optic cables that connects all of the company’s data centers. As a result, the much-maligned social media behemoth vanished from the internet for the better part of a day, providing much-needed relief from its poisonous influence.

Of course, the specifics of what occurred are more complex. One of the most intriguing aspects of the whole affair is the role of Border Gateway Protocol, or “BGP,” a powerful but little-known routing protocol. Web experts widely assumed that BGP had played a role in the incident, and Facebook has now confirmed it. So what exactly is BGP?

BGP, in a nutshell

It’s been dubbed “glue” because it ties the web together. Others refer to it as the “post office” or “air traffic controller” of the internet. Stripe CEO Patrick Collison referred to BGP as “the black magic of the internet” after Facebook vanished from the face of the Earth on Monday. BGP is a complicated system that “no one completely understands.” Its purpose is simple and clear, but grasping it requires understanding the broad strokes of how the web works, which is admittedly difficult.

In a nutshell, BGP is one of the numerous protocols that help organize the web’s vast network of interconnected networks. BGP, in particular, helps route traffic to and from the largest online entities, known as “autonomous systems.” An AS is shorthand for a big network or collection of networks, which can belong to a university, an ISP, a government organization, or, among other things, a very large tech firm like Facebook. Autonomous systems are in charge of maintaining up-to-date information on the best routes for sending data packets to and from their networks.

BGP is then used to convey that routing information to other networks, and hence to the rest of the internet. In this way, BGP essentially makes it possible to route data across the web.
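To make the idea of exchanging routes a little more concrete, here is a minimal sketch in Python. It is a toy model, not real BGP: the RoutingTable class, its method names, and the shortest-path rule are assumptions made for illustration (real BGP weighs many more attributes and policies). The AS numbers and the 203.0.113.0/24 prefix come from ranges reserved for documentation and do not belong to Facebook or anyone else.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingTable:
    """Toy routing table for one autonomous system (illustrative, not real BGP)."""
    local_as: int
    # prefix -> {neighbor AS number: AS path learned from that neighbor}
    routes: dict = field(default_factory=dict)

    def receive_advertisement(self, prefix, as_path):
        """Record a route; as_path[0] is the neighbor that told us about it."""
        if self.local_as in as_path:
            return  # our own AS is already in the path, so accepting it would loop
        neighbor = as_path[0]
        self.routes.setdefault(prefix, {})[neighbor] = list(as_path)

    def receive_withdrawal(self, prefix, neighbor):
        """Forget the route that a neighbor previously advertised."""
        paths = self.routes.get(prefix, {})
        paths.pop(neighbor, None)
        if not paths:
            self.routes.pop(prefix, None)

    def best_path(self, prefix):
        """Pick the shortest known AS path, or None if the prefix is unreachable."""
        paths = self.routes.get(prefix)
        if not paths:
            return None
        return min(paths.values(), key=len)


if __name__ == "__main__":
    table = RoutingTable(local_as=64500)
    # Two neighbors advertise paths to the same hypothetical destination network.
    table.receive_advertisement("203.0.113.0/24", [64501, 64510])
    table.receive_advertisement("203.0.113.0/24", [64502, 64503, 64510])
    print(table.best_path("203.0.113.0/24"))

    # Once every advertisement is withdrawn, the destination is simply gone.
    table.receive_withdrawal("203.0.113.0/24", 64501)
    table.receive_withdrawal("203.0.113.0/24", 64502)
    print(table.best_path("203.0.113.0/24"))
```

The point of the sketch is the last step: a network that stops being advertised is not slow or broken from the outside, it is simply absent from everyone else's map.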

This is where the metaphor of the “post office” comes into play. BGP is in charge of determining and exchanging the most effective routes for relaying data (such as mail) between particular destinations. Others have referred to it as a map, one that is continuously updated to reflect the changing state of the internet. An analysis by the security firm Imperva compares BGP to your car’s GPS system in yet another clever metaphor:

…the BGP routing protocol is like your dependable GPS guide. The best route is determined by various factors, such as traffic congestion, roads temporarily closed for maintenance, and so on, similar to Google’s Waze app. The path is dynamically generated based on the state of the network nodes, which are similar to highways and junctions on a GPS map.

There’s a lot more to say about BGP, but the short version is that if an autonomous system’s BGP isn’t configured correctly, data can’t be routed successfully to and from its network, and users can’t reach it. This appears to be part of what brought Facebook down.

How Does BGP Relate to Facebook’s Bad Day?

BGP misconfigurations have a history of producing “spectacular episodes of widespread outages,” preventing users from accessing internet services. Facebook has now acknowledged BGP’s role in its shittiest of days, revealing in a recent update how a backbone issue led to the withdrawal of its BGP “advertisements,” basically the mechanism that informs other online entities that it exists:

Our DNS servers disable those BGP advertisements if they themselves are unable to communicate with our data centers, as this is an indicator of a faulty network connection. The whole backbone was taken offline during the recent outage, causing these locations to declare themselves unhealthy and withdraw their BGP advertisements. Our DNS servers were unreachable as a result, despite the fact that they were still operating. The rest of the internet was unable to locate our servers as a result of this.
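The logic Facebook describes can be sketched in a few lines of Python. This is a simplified illustration based only on the statement above, not Facebook's actual code: the DnsSite class, the site names, the backbone_up flag, and the documentation prefixes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DnsSite:
    """A hypothetical DNS location that advertises its own prefix via BGP."""
    name: str
    prefix: str
    server_running: bool = True

    def is_healthy(self, backbone_up: bool) -> bool:
        # A site counts as healthy only if its server is running AND it can
        # still reach the data centers over the backbone.
        return self.server_running and backbone_up


def advertised_prefixes(sites, backbone_up):
    """Return the prefixes that remain advertised to the rest of the internet."""
    return {site.prefix for site in sites if site.is_healthy(backbone_up)}


if __name__ == "__main__":
    sites = [
        DnsSite("site-a", "192.0.2.0/24"),
        DnsSite("site-b", "198.51.100.0/24"),
    ]
    # Normal operation: both sites advertise themselves.
    print(advertised_prefixes(sites, backbone_up=True))
    # Backbone down: every server is still running, yet nothing is advertised.
    print(advertised_prefixes(sites, backbone_up=False))
```

Under these assumptions, a safety check that makes sense for one unhealthy location removes every location at once when the failure is in the shared backbone.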

Notably, Facebook’s BGP advertisements were withdrawn as a result of a bigger, more systemic issue. The event, however, highlights BGP’s critical role in how the web operates, while also recalling earlier instances in which BGP failures or misconfigurations caused major disruptions.

When asked about the downtime, Usman Muzaffar, SVP, Engineering at Cloudflare, said in a statement shared with Gizmodo on Monday, “In our experience, these are generally errors, not attacks.” Experts say that such an outage isn’t unheard of, though the extent and duration of Facebook’s outage are noteworthy. Cloudflare has done its own analysis of how the BGP misconfiguration may have occurred.

The Electronic Frontier Foundation’s senior staff technologist, Jacob Hoffman-Andrews, stated, “It’s not that odd.” “The large Internet firms experience outages like this on a regular basis,” he added, citing a well-known BGP incident in 2008 in which Pakistan’s state-owned telecom managed to unintentionally shut down YouTube by co-opting traffic destined for the video-sharing site. And in 2018, a major portion of Google’s services went down for approximately an hour when a BGP failure routed a large share of web traffic through Russia, China, and other places it wasn’t meant to go.
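One well-understood way a network can co-opt another's traffic is by advertising a more specific address block than the rightful owner does, because routers generally prefer the most specific matching route. The Python sketch below illustrates that preference only; it is not a reconstruction of either incident. The select_route function, the origin labels, and the documentation prefixes are assumptions made for the example.

```python
import ipaddress


def select_route(destination, routes):
    """Return the origin whose advertised prefix most specifically covers destination.

    routes maps a prefix string to a label for the network advertising it.
    """
    address = ipaddress.ip_address(destination)
    matches = []
    for prefix, origin in routes.items():
        network = ipaddress.ip_network(prefix)
        if address in network:
            matches.append((network.prefixlen, origin))
    if not matches:
        return None  # no advertised route covers this address
    # The longest prefix (largest prefix length) is the most specific match.
    return max(matches)[1]


if __name__ == "__main__":
    routes = {
        "203.0.113.0/24": "legitimate origin",
        "203.0.113.128/25": "mistaken origin",  # narrower block advertised in error
    }
    print(select_route("203.0.113.200", routes))  # covered by both prefixes
    print(select_route("203.0.113.10", routes))   # covered only by the /24
```

In this toy setup, addresses inside the narrower block follow the mistaken advertisement even though the legitimate one is still in place, which is why a single bad announcement can pull traffic away from its intended destination.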

Will There Be Another Incident Like This?

In short, yes. Most certainly. If not Facebook, BGP will almost certainly tangle with another popular platform you use frequently. Experts say there’s no need to be concerned, but the incident is an excellent example of the web’s fallibility, demonstrating how much of it can be knocked down by something as simple as one company’s technical blunder.

In a blog post about the incident, Cloudflare analysts said, “Today’s events are a gentle reminder that the Internet is a very complex and interdependent system of millions of systems and protocols working together.” “Trust, standardization, and cross-entity cooperation are at the heart of making it function for almost five billion active users throughout the world.”