Networking Perspectives: How It’s Done at Skytap
In the first part of this piece, Networking Perspectives: OpenStack v CloudStack, I discussed some of the primary differences between the network architectures of OpenStack and CloudStack. A key point was what we here at Skytap believe are the three core tenets of cloud networking:
- Clouds are inherently multi-tenant, and network definition and provisioning should be completely self-service for all users.
- The network should be a service that is independent of the compute service and the underlying hypervisor technology.
- The cloud edge should be a highly dynamic and scalable software-controlled service running on a largely static core network infrastructure.
I talked about how OpenStack is doing it right at layer 2 with their Quantum service, and how CloudStack is doing it right at layer 3 with their virtual gateway model. In this post, I will go into more detail about how we are following these three core tenets at Skytap.
Networking as a Service
Skytap started building out a cloud platform five years ago, and we have learned a lot along the way. We have watched the development of things like OpenStack and CloudStack, and we’ve seen them making and learning from some of the same mistakes that we have already made and learned from. One of these was not treating the network as a full-fledged service, a peer to the compute and storage services. We realized a few years ago that the network needed to be managed as a separate service in order to scale properly over time. This is tenet #2.
This is actually part of a larger mindset we have here at Skytap, which is that everything should be a service. While this sounds a little hyperbolic, it is largely true. As your cloud scales, both in terms of capacity and engineering, having monolithic systems becomes untenable. Frankly, Amazon is the king of this, and it has allowed them to scale tremendously. This is not a lesson that has been lost on us.
Layer 2 Resource Pool
At Skytap, we use VLANs to provide layer 2 network isolation. The VLANs, however, come from a pool [1] and do not need to be explicitly assigned to the network by an administrator. They are simply another dynamic resource in the system. When a network is deployed in our system, the control software assigns a VLAN to it at that time. In this way, we are able to follow tenet #1, allowing networks to come and go based on user demand. We are able to do this because we also follow tenet #3, relying on our core switching infrastructure to be statically configured to handle any VLAN at any time on any port that will host customer traffic.
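To make the pool idea concrete, here is a minimal sketch in Python of VLANs treated as a dynamic resource. It is an illustration only, not Skytap's control software: the names `VlanPool`, `deploy_network`, and `PoolExhausted` are hypothetical, and the "first pool with free capacity" policy is an assumption. The one hard fact it relies on is that 802.1Q VLAN IDs are 12 bits wide with 0 and 4095 reserved, leaving 1 through 4094 usable. Taking a list of pools reflects the multiple pools mentioned in the first footnote.

```python
# Hypothetical sketch: VLANs handed out at deploy time from per-core pools.
# Class and function names are illustrative, not Skytap's actual API.

FIRST_VLAN_ID = 1      # 802.1Q reserves ID 0
LAST_VLAN_ID = 4094    # 802.1Q reserves ID 4095


class PoolExhausted(Exception):
    """Raised when no VLAN is free in the pool(s) consulted."""


class VlanPool:
    """A set of VLAN IDs associated with one core infrastructure."""

    def __init__(self, core_name, first=FIRST_VLAN_ID, last=LAST_VLAN_ID):
        if not (FIRST_VLAN_ID <= first <= last <= LAST_VLAN_ID):
            raise ValueError("VLAN range must fall within 1..4094")
        self.core_name = core_name
        self._free = set(range(first, last + 1))
        self._in_use = {}  # vlan_id -> network_id

    @property
    def free_count(self):
        return len(self._free)

    def allocate(self, network_id):
        """Take the lowest free VLAN ID and record which network owns it."""
        if not self._free:
            raise PoolExhausted(self.core_name)
        vlan_id = min(self._free)
        self._free.remove(vlan_id)
        self._in_use[vlan_id] = network_id
        return vlan_id

    def release(self, vlan_id):
        """Return a VLAN ID to the pool when its network is torn down."""
        if vlan_id not in self._in_use:
            raise KeyError(f"VLAN {vlan_id} is not allocated in {self.core_name}")
        del self._in_use[vlan_id]
        self._free.add(vlan_id)


def deploy_network(pools, network_id):
    """Assign a VLAN from the first pool that still has capacity.

    Returns a (core_name, vlan_id) pair. No administrator picks the VLAN;
    it is chosen at deploy time.
    """
    for pool in pools:
        if pool.free_count:
            return pool.core_name, pool.allocate(network_id)
    raise PoolExhausted("all pools")
```

With two small pools, `VlanPool("core-a", 100, 101)` and `VlanPool("core-b", 100, 101)`, the first two deployments land on core-a, the third spills over to core-b, and releasing a VLAN makes it available again. The same VLAN ID can be in use on both cores at once, which is why adding pools adds capacity.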
Our choice to use VLANs is one that is largely made for us by virtue of the fact that we are using VMware ESXi as the underlying hypervisor. This is, unfortunately, counter to tenet #2, but we live and work in an imperfect world. ESXi only supports VLANs at this time and doesn’t provide for things like VEPA, OpenFlow, or NVGRE as options [2]. Since VMware is closed source, we don’t have the option of doing it ourselves. Until they decide to provide support for the other solutions in the ESXi vSwitch, we live with what we have. However, using VLANs is not a bad thing by any stretch. There is a fair amount of FUD circulating about VLANs as a solution for network isolation in cloud environments. It’s a complicated issue, however, and warrants its own separate discussion, which I will address in the future. I will make this point though: Even if we had the option to use things like VEPA or OpenFlow in ESXi, we would likely continue to use VLANs for some time because, quite simply, it is a tried and true technology. VMware’s implementation may leave some things to be desired, but the basic tech is not a problem.
Layer 3 Virtualization
Having virtual gateways is a model we decided to pursue a few years ago, and it has proven itself in our production environment. CloudStack’s decision to follow this model in their Acton release is a big step in the right direction for virtual networking in open source stacks. Like any business or technology model, if it’s a good idea, someone else is doing it, too. Calling it simply ‘virtual gateways’ is selling it a bit short, however. At Skytap, in addition to the Internet gateway, we virtualize internal IP routing, network services like DHCP, DNS, and file sharing, and site-to-site IPsec VPNs. All of these services are completely virtualized and software controlled, following tenet #1, allowing for complex network topologies to be designed and implemented in the cloud.
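As a rough picture of what "more than a gateway" can mean in software, the sketch below models one virtual gateway per customer network as plain data, bundling Internet access, DHCP, DNS, internal routes, and VPN tunnels. Everything here is an assumption made for illustration: the names `VirtualGateway`, `VpnTunnel`, and `connect`, the default DNS domain, and the convention that the gateway takes the first host address with DHCP leasing the rest. It says nothing about how Skytap actually implements these services.

```python
# Hypothetical data model for a software-defined layer 3 edge.
# All names and addressing conventions are illustrative assumptions.
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv4Network


@dataclass
class VpnTunnel:
    """One site-to-site IPsec tunnel to a remote network."""
    remote_peer: IPv4Address
    remote_subnet: IPv4Network


@dataclass
class VirtualGateway:
    """Per-network gateway bundling routing, DHCP, DNS and VPN settings."""
    subnet: IPv4Network
    internet_access: bool = True
    dhcp_enabled: bool = True
    dns_domain: str = "example.internal"
    routes: list = field(default_factory=list)       # subnets reachable internally
    vpn_tunnels: list = field(default_factory=list)  # VpnTunnel entries

    def __post_init__(self):
        # Need room for network, gateway, at least one lease, and broadcast.
        if self.subnet.prefixlen > 30:
            raise ValueError("subnet too small for a gateway plus DHCP leases")

    @property
    def gateway_address(self):
        # Assumed convention: the gateway owns the first host address.
        return self.subnet.network_address + 1

    def dhcp_range(self):
        # Assumed convention: every remaining host address is leasable.
        return (self.subnet.network_address + 2,
                self.subnet.broadcast_address - 1)


def connect(a, b):
    """Route between two virtual networks; refuse overlapping subnets."""
    if a.subnet.overlaps(b.subnet):
        raise ValueError("cannot route between overlapping subnets")
    a.routes.append(b.subnet)
    b.routes.append(a.subnet)
```

For a gateway on `10.0.0.0/24`, `gateway_address` is `10.0.0.1` and `dhcp_range()` spans `10.0.0.2` to `10.0.0.254`. Because the whole topology is ordinary data, creating, connecting, or removing networks requires no change to the physical core.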
We rely on our core layer 3 infrastructure only to be statically configured, leaving the dynamism and mobility of the virtual gateways and their services completely up to the software. As with our layer 2 solution, following tenet #3 is what makes tenet #1 achievable.
Having a fully virtualized layer 3 edge also enables the development of some very cool new technology. Some of the most exciting things we are doing here at Skytap will be coming out of this area in the future.
Operations in the Cloud
One last point I’d like to make about tenet #3 is that it not only allows you to achieve tenet #1, but it also removes unnecessary dependencies that prevent the company from being agile. Hardware-based infrastructure is still slow to procure, configure, and deploy. Changes made to core infrastructure are difficult to isolate or release in a “beta” fashion to only part of your customer base. Avoiding these constraints is one of the key reasons why people move workloads to cloud providers like Skytap in the first place. By removing dependencies on the features and functions of the core infrastructure, we enable that infrastructure to be:
- Selected independently of the cloud application on a schedule that suits operational needs and requirements.
- Upgraded independently of the cloud application on a schedule that suits operational needs and requirements.
- Managed and maintained by an operations team independently of the application.
In much the same way that having “everything as a service” removes dependencies and allows the application and the engineering organization to scale, following tenet #3 frees up both operations and engineering to perform their duties and scale independently. By building our solution around dynamic software-controlled services, Skytap is able to advance and improve just as quickly as we enable our customers to do.
In this post, I reiterated the three core tenets of cloud networking that we believe in here at Skytap and described how our architecture follows them. As we look forward, we see exciting things happening industry-wide. We are keenly interested in the development of OpenFlow-based systems. Protocols like VEPA offer the ability to remove dependency on the software switching layer in the hypervisor. NVGRE and VXLAN offer solutions for extending layer 2 networks across static core infrastructures. We clearly see a wealth of opportunity in the virtualization of the gateway and routing layers. Cloud computing is driving innovation in a way that we haven’t seen since the explosion of the Internet drove it in the nineties. It is a fantastic time to be in networking right now.
1. The architecture actually allows for many pools of VLANs, each associated with a separate core infrastructure. This allows us to scale beyond the 4094 usable VLAN IDs that a single 802.1Q domain provides.
2. VMware’s vCloud Director supports MAC-in-MAC network isolation as part of vCDNI, which attempts to solve the network isolation issue without VLANs. This design has its own problems, the most serious of which is a packet-injection security flaw. This is moot for us, though, since we do not use vCloud Director.