I frequently encounter customers, sales folks, and even engineers who ask me: why is computer networking so hard? I’ve found a simple metaphor that explains it well.
Imagine you need to make a trip from a gated community somewhere in rural Canada, and your destination is another gated community somewhere in the rural United States.
You need to plan a route. But before you can call up your favorite booking site, your Travel Safety Team says, “You can’t just go via any route and stay in whatever motel you find; bad stuff will happen. You have to use only approved travel providers or show that no such provider exists. You may only use pre-vetted lodging destinations. And we’re happy to pay for the extra expense because that’s how important it is.”
So you diligently dig in. After several days of tediously plotting options and confirming approved layover points, you come up with a plan that beautifully conforms to your team’s guidelines. It is both safe and efficient: it uses private travel carriers and highly reviewed, approved lodgings at your layover points, and it minimizes your own travel time. Your Travel Safety Team looks at the budget estimate and their faces go pale. “We need you to be safe, but we’re not made of money. Isn’t there a cheaper alternative?”
You have to drive. Due to your team’s restrictions, you have to talk to the travel and road-planning commissions for every major municipality between you and your destination, so you need to do the following:
- Figure out what the local ordinances are: some towns allow you to come in for as long as you like and then return home, but not to pass through. Staying there overnight on your way to somewhere else is illegal.
- Find out if any private roads will grant you through-access or if you need to charter a helicopter for certain legs of the trip.
- Negotiate with your Travel Safety Team: for places that don’t have private roads or helicopters, what if you use a dedicated lane? They’re going to ask you if it has double-white lines, double-yellow lines, medians, or something else. And you’re going to have to research that.
- Know exactly what luggage you’ll be carrying, because certain checkpoints will only allow bags of a certain count, size, or weight. Frequently these restrictions are not publicly known or even documented, and you may have to redistribute your luggage when you arrive.
- Obtain detailed interior maps—including cross streets and one-way street indications—of the gated communities that you’re departing and arriving at because the neighborhoods are labyrinthine.
- Understand how the address/house numbers at each major municipality relate to “general” addresses because every place seems to have its own system.
- Pre-write scripts to help you carefully explain your destination and purpose of visit to each of the border security agents along the way, each time using a location address that each particular agent will understand.
- Make sure you have passports, contact names, and gate codes for every stop along the way.
Then, to top it all off, just when you think you’ve nearly completed your plan, you find out you need to make another stop at a third gated community along the way… and it’s nowhere along your existing route.
Every single one of the above real-world examples maps back to a specific network challenge that you could face when trying to establish routing from your local offices to the destination of your choice:
- Your corporate LAN might feel like a sprawling local neighborhood with specialized address numbering; the broad private IP spaces and network address translated (NAT’ed) subnets can make the “simple” act of leaving your on-premises network a hurdle unto itself.
- Carefully planned dialogs with border security agents are just like the negotiations between VPN routers or transit through firewalls: even completely valid traffic must be thoughtful about how to avoid rejection at the border crossing.
- Local city ordinances around through traffic are similar to SaaS vendors’ transit-routing policies: many network providers are fine with traffic that arrives and then returns home but have special (and often expensive) requirements for traffic that needs to be passed onward.
- Unpublished luggage restrictions are like Maximum Transmission Unit limits, with individual routers along a path unexpectedly rejecting packets that are “too big” until a compromise in size can be negotiated.
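Two of these challenges are easy to check on paper before any packet leaves home. The sketch below is only an illustration, not how any particular product does it: the subnet lists and link MTU values are made-up assumptions, and the helper names (`overlapping_subnets`, `path_mtu`, `max_tcp_payload`) are hypothetical. It uses Python's standard `ipaddress` module to spot private address ranges that collide (a common reason NAT becomes necessary), and it computes the largest TCP payload that fits through the most restrictive link on a path, assuming minimal 20-byte IPv4 and TCP headers with no options.

```python
import ipaddress


def overlapping_subnets(local_cidrs, remote_cidrs):
    """Return (local, remote) CIDR pairs whose address ranges overlap.

    Assumes every CIDR is the same IP version (all IPv4 here).
    """
    pairs = []
    for local in local_cidrs:
        for remote in remote_cidrs:
            if ipaddress.ip_network(local).overlaps(ipaddress.ip_network(remote)):
                pairs.append((local, remote))
    return pairs


def path_mtu(link_mtus):
    """A path can carry only what its most restrictive link allows."""
    return min(link_mtus)


def max_tcp_payload(link_mtus, ip_header=20, tcp_header=20):
    """Largest TCP payload per packet, assuming headers without options."""
    return path_mtu(link_mtus) - ip_header - tcp_header


# Hypothetical example values, chosen only for illustration.
OFFICE_SUBNETS = ["10.0.0.0/16", "192.168.1.0/24"]
CLOUD_SUBNETS = ["10.0.4.0/24", "172.16.0.0/24"]
LINK_MTUS = [1500, 1400, 1500]  # the middle link stands in for a tunnel

collisions = overlapping_subnets(OFFICE_SUBNETS, CLOUD_SUBNETS)
payload = max_tcp_payload(LINK_MTUS)
```

With these sample values, `collisions` contains the single pair `("10.0.0.0/16", "10.0.4.0/24")`, and `payload` is 1360 bytes. In practice the smallest link MTU is often exactly the "unpublished luggage restriction": you may not know it until packets start disappearing.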
It gets worse: unlike a human traveler, who can improvise when challenges arise along the way, a network packet taking such a route has no agency or autonomy. If it gets stuck somewhere, it’s done. It can’t even call home to get help. It just languishes wherever it lands, never to be heard from again. It’s a good thing packets are so highly replaceable, unlike you and your luggage.
So even though you, as a traveler, might be able to tell me all about the difference between a freeway, a highway, and a toll road; know all about overpasses and underpasses and on-ramps; know the purpose of immigration checkpoints; and be able to describe at length how gate codes work, all of that is only the beginning: you have to know the SPECIFIC on-ramps, freeways, immigration checkpoint routines, surface streets, addresses, and gate codes, all planned perfectly in advance, in order to make it all work.
And even though I, as a Network Engineer, might be able to tell you all about Skytap’s ability to facilitate your network needs, whether that be a VPN directly to your corporate premises, an ExpressRoute to Azure, or multiple cloned NAT’ed environment subnets, there’s no “right answer” to the question of how your application should route its connectivity into Skytap. In addition to your application’s specific destination and performance needs, your networking and security teams will have their own requirements, and your project will have its own specific budget targets. The right solution will be a specific network path that blends (or at least finds an acceptable compromise between) all of these constraints.
When each new journey requires this level of detailed knowledge, planning, and flawless execution, it should come as no surprise that establishing new network routes is so difficult.
So what do we do about it?
For starters, accept that networking is hard. Rather than denying or fighting it, attempt to mitigate the difficulty. Networking technology has had more than fifty years of brilliant engineers working hard to design systems to speed our traffic along its merry way. Unfortunately, we’ve also had equal time in which malicious actors have been working to exploit these systems for their own benefit. All of this time and tension has resulted in layer upon layer of complexity, checks, and balances, which need to be thought through carefully, with no one-size-fits-all solutions. Making sure that your team has the right stakeholders and support is critical. Missing knowledgeable stakeholders from certain legs of the trip will only result in flying blind.
Secondly, just as your original application wasn’t built in a day, your new network route won’t be either. In the best traditions of software development, it’s going to require iterative rounds of careful requirements gathering, planning, implementation, and testing to succeed. That kind of effort takes time and cannot be crammed into a two-week intensive effort. Plan accordingly.
Next up, remember that, unlike you, network packets are dumb and cannot improvise. And you’re not the one making this trip: your packets are. You’re spinning up a fleet of little robot drones to make the journey over and over. It’s natural that in the first few (maybe even several) attempts, there will be wrong turns and unexpected blockages. Plan for these failures to occur and build in ways to monitor the truth on the ground: finding ways to get visibility on the real status and location of your packets at each leg of the journey will be critical to diagnosing wrong turns and getting back on the right path. And just as an unexpected snowstorm or power outage can occasionally waylay even the most seasoned traveler, unexpected changes will cause delays and reroutes, and then you’ll need to improvise again. You’ll want to be able to switch your network visibility monitors back on, even after weeks or months of flawless autopilot.
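One modest way to get that per-leg visibility is to probe each stop along the route, in order, and report the first one that does not answer. The sketch below is a minimal illustration under stated assumptions, not a monitoring product: the leg names and addresses are hypothetical, `tcp_reachable` and `first_failing_leg` are names invented here, and a successful TCP connection only shows that something answered on that port, not that your application traffic will get through.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def first_failing_leg(legs, probe=tcp_reachable):
    """Probe legs in path order; return the first failing leg's name, or None.

    Each leg is a (name, host, port) tuple. The probe function is injectable
    so the path logic can be exercised without touching a real network.
    """
    for name, host, port in legs:
        if not probe(host, port):
            return name
    return None


# Hypothetical path, listed in the order traffic would cross it.
LEGS = [
    ("office gateway", "10.0.1.1", 443),
    ("vpn endpoint", "10.0.2.1", 443),
    ("application", "10.0.3.1", 443),
]
```

Calling `first_failing_leg(LEGS)` on a schedule, or on demand when something breaks, tells you where to start looking. Because it is cheap to rerun, it is also easy to switch back on months later when a route that had been flawless suddenly is not.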
Finally, the point about understanding the full path bears repeating. It’s so fundamental, it may seem obvious; but when a driver is relying on you for directions and you fall asleep during a leg of the trip, it’s always possible you’ll wake up to find yourself in the wrong place. Gather your requirements thoroughly, do your research, find out exactly how all of the points connect. If there’s a leg of the trip that seems mysterious or a bit of a black box, stop and ask about it, until you know exactly how it fits into the plan. It’ll save you from going through the same trouble later on, when “everything was working great before” but now you’re under much more pressure to get it magically working again.