You may have noticed that parts of the internet went quiet yesterday. For users in the Northeastern United States, for a little less than two hours, many of the sites you access every day were, well, inaccessible.
The short version of what happened is that a small network services provider called DQE Communications erroneously shared inaccurate routing information with Verizon, which then passed it along to the greater network.
As a major internet transit provider, Verizon should have filtered out the bad information but didn't, which resulted in roughly two percent of IP address prefixes being redirected through a steel plant in a small town in Pennsylvania. That included sites like Facebook, Amazon and thousands of others that use services like Cloudflare and Amazon Web Services to host or run their sites.
Ironically, one of the sites dedicated to tracking outages like this, DownDetector, was, in fact, down.
Look, sometimes things happen. In this case, it wasn't the direct fault of anyone at Verizon. The company didn't make the change; it simply passed along the routing change from another provider. It is, however, Verizon's problem, and the company's response is, well, lacking.
In fact, so far, the only response I've found that the company has issued was a statement to The Register, saying "There was an intermittent disruption in internet service for some [Verizon] FiOS customers earlier this morning. Our engineers resolved the issue around 9 am ET."
Cloudflare's CEO, Matthew Prince, was understandably frustrated, since his was one of the major companies affected, and he put it more bluntly in a tweet yesterday:
"The teams at @verizon and @noction should be incredibly embarrassed at their failings this morning which impacted @Cloudflare and other large chunks of the Internet. It's absurd BGP is so fragile. It's more absurd Verizon would blindly accept routes without basic filters." -- Matthew Prince (@eastdakota) June 24, 2019
Verizon should have filters in place to prevent this sort of thing from disrupting the internet across vast areas of the country. And so far Verizon's response hasn't exactly taken responsibility or even acknowledged that there is a problem with the company's protocol for handling events like this.
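To make "basic filters" a little more concrete, here is a minimal sketch in Python of the kind of check a transit provider could apply to routes announced by a customer. It is an illustration only, not Verizon's configuration or any router vendor's syntax. The function names, the allowlist, the `/24` length cutoff, and the 100-route limit are all assumptions made for the example, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical allowlist: the address blocks this customer is registered to announce.
CUSTOMER_ALLOWED = [ipaddress.ip_network("198.51.100.0/24")]


def accept_route(announced, allowed_blocks, max_prefix_len=24):
    """Return True if a single announced prefix passes both checks."""
    route = ipaddress.ip_network(announced)
    # Check 1: reject prefixes more specific than the cutoff (assumed /24 here).
    if route.prefixlen > max_prefix_len:
        return False
    # Check 2: the prefix must fall inside a block the customer may announce.
    return any(
        route.version == block.version and route.subnet_of(block)
        for block in allowed_blocks
    )


def filter_session(announcements, allowed_blocks, max_routes=100):
    """Return the accepted prefixes, or nothing if the customer sends too many."""
    # A sudden flood of routes from a small customer is itself a warning sign,
    # so this sketch drops everything once the (assumed) limit is exceeded.
    if len(announcements) > max_routes:
        return []
    return [p for p in announcements if accept_route(p, allowed_blocks)]


if __name__ == "__main__":
    incoming = ["198.51.100.0/24", "203.0.113.0/24", "198.51.100.0/25"]
    print(filter_session(incoming, CUSTOMER_ALLOWED))
```

In this sketch only the first announcement survives: the second is for address space the customer isn't registered to announce, and the third is more specific than the cutoff allows. Real networks express these rules in router configuration rather than a script, but the logic is about this simple.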
When mistakes happen, even when they aren't your fault, it's still your problem, especially when millions of people are depending on your business to get it right.
Even though taking responsibility seems to have gone out of style, it's actually one of the best ways to get past a problem. People will respect you and your company more when you are willing to stand up and say "this is what happened, and we're sorry. Here's how we're making sure it won't happen again."
The words themselves aren't complicated, but for some reason, they seem to be very difficult to spit out. It might be a good idea to practice because there will absolutely come a time when your business has to use them.
I reached out to Verizon for comment, but the company did not immediately respond.
A final thought and a lesson.
This event is another reminder that the technology we depend on to "just work" often doesn't, and the implications of that are far more significant than we realize.
Just last week, Target suffered an unrelated outage on two different days that left customers unable to check out in its stores. In the grand scheme of things, both of these outages were relatively minor compared to what could happen if the basic technology we count on to work fails due to either error or malicious attack.
If your business is responsible for providing a service that has become so ingrained in the daily lives of millions of people, you should absolutely have systems and processes in place to deal with failures. While the outage lasted less than two hours, it could have--and should have--been prevented entirely if Verizon had had a process in place to reject the erroneous routing information in the first place.
Sometimes an ounce of prevention could, you know, prevent the entire internet from crashing for millions of people.