The Facebook BGP Day
Not for weak minds.
Facebook went down today. All their sites went down. We still haven't seen anyone offer an explanation but the outage reeks of BGP fuckups from the 'old days'.
Border Gateway Protocol (BGP) tables are sensitive creatures. You'd have wanted to ask our dear friend 'Sargon' (RIP), who helped build the Internet backbone in several countries with Vint Cerf. BGP tables take time to propagate.
So if somebody fucked up a BGP table, it'd take time for the damage to show up. Conversely, if you fixed a BGP error, it'd take time for the fix to take effect everywhere.
This is close to rocket science. According to Sargon, one of the worst errors made is letting wimps take over BGP updates.
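To make the propagation point concrete, here is a toy model of how a change (a bad announcement, or the fix for it) spreads outward from the network where it was made. It is a minimal sketch under loud assumptions: the AS names, the topology and the fixed per-hop delay are all hypothetical, and real BGP adds path selection, advertisement timers and route flap damping on top. It only shows why the far edge of the Internet hears about both the breakage and the repair later than the near edge.

```python
from collections import deque


def propagation_times(links, origin, hop_delay):
    """Return the earliest time (seconds) each AS learns of a change made at origin.

    Toy model (an assumption, not real BGP): every AS passes the update on to
    each neighbour after the same fixed hop_delay.
    """
    # Build an undirected adjacency list from the list of peering links.
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    # Breadth-first search: with a uniform per-hop delay, the first time an
    # AS is reached is also the earliest time it learns about the change.
    learned = {origin: 0.0}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for peer in neighbours.get(current, []):
            if peer not in learned:
                learned[peer] = learned[current] + hop_delay
                queue.append(peer)
    return learned


if __name__ == "__main__":
    # Hypothetical topology: AS-D sits three hops away from AS-A.
    links = [("AS-A", "AS-B"), ("AS-B", "AS-C"), ("AS-C", "AS-D"), ("AS-B", "AS-E")]
    print(propagation_times(links, "AS-A", 30.0))
    # → {'AS-A': 0.0, 'AS-B': 30.0, 'AS-C': 60.0, 'AS-E': 60.0, 'AS-D': 90.0}
```

Run the same function again for the correction and you get the same delays. That is the point: the fix has to crawl across the same hops the mistake did.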
The outage was six hours, according to RT.
Wall Street went berserk - over an outage.
Snowden's quips were best.
This one was especially enjoyed.
'Facebook and Instagram go mysteriously offline and, for one shining day, the world becomes a healthier place.'
This sketch was amongst the best.
For Facebookers of course panicked and ran to - Twitter.
But they ran to other places too, such as Jason Miller's GETTR. Although it may be coincidental rather than caused by a traffic surge, GETTR is still suffering from severe issues which began at the same time. Pinned posts are showing up, but nothing else.
And GETTR chose this particular time to nag users about backup email addresses. The code wasn't too cool: it looped back and nagged over and over again. Attempts to resubmit were met with 'that address is already taken'.
Although the 'culture' at GETTR seems better than GAB's, and although it's garnering more attention than Minds, the code is, to say the least, strange. Script code monitors keystrokes, watching out for 'land mines' that the later text parser cannot (but should) handle. Putting a '?' at the end of a line emits weird stuff. Putting '??!?' in the same place is even worse. Putting punctuation after an account handle results in a space character. And so forth. The informal comment from 'support' is 'we're a new site and we need time to work out the kinks'. But crappy ideas like that should never be in the code to start with.
Jason's been trying to get Donald Trump to sign on to the site. With issues like the above, Trump and people with reputations of any sort should for now stay away.
Per a new blog post, the FB crack team seem to have isolated the flaw.
'Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.'
So yes, sounds like FB fucked up the BGP tables.
UPDATE - CLOUDFLARE
Here's an excellent and thorough explanation of how and why bad BGP can screw things up.
For fun, check out Cloudflare's DNS resolver 1.1.1.1.
The outage is discussed on Hacker News.