We talk a lot about modernization at Skytap; it’s our business. But we haven’t shared much about how we are continually modernizing Skytap Cloud itself. This blog is the first in a series that will share our experiences evolving our infrastructure and software development processes. Our goal with this series is to give our customers, and the broader open source and cloud communities, useful insight into how to leverage our successes and avoid our mistakes in your own modernization process.
Skytap’s engineering team has spent the past two years building a sophisticated approach to container implementation. Today, the majority of our production services—the unique capabilities that deliver our global cloud—are containerized using Docker and orchestrated through a large Kubernetes cluster. Much like the enterprises that are our customers, we began modernizing what was once relatively traditional infrastructure and had to iteratively progress to our current way of doing things.
Our Challenge
Skytap was founded almost 12 years ago, so we have built up technical debt over time, just like so many other organizations. A few years ago, we knew change was necessary. Our virtual machines and process runtimes were experiencing scaling pains. Our service deployment model was beginning to strain under rapidly increasing usage, and we needed more transparency across service and operations teams into service consumption, deployment, and alerting.
Docker and Kubernetes
Enter Docker containers, and right after containers, Kubernetes. Once we were introduced to containers, we saw the potential they held for our services. We could package a service and its operating system dependencies into a single Docker image, so whatever infrastructure hosted that service would only need the Docker runtime to support it. Containers would also address reliability issues from development through production by eliminating environment drift. Of course, we also saw how these same capabilities could become a major headache, with containers spun up ad hoc and sprawling even more than in large VMware deployments.
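To make the packaging idea concrete, here is a minimal sketch of what a Dockerfile for a containerized service can look like. The service name, base image, packages, and entry point are assumptions for illustration only, not our actual configuration.

```dockerfile
# Hypothetical example: bundle a service and its OS-level dependencies
# into one image, so the host only needs the Docker runtime.
FROM ubuntu:16.04

# Install the operating system packages the service depends on.
RUN apt-get update && \
    apt-get install -y --no-install-recommends python ca-certificates && \
    rm -rf /var/lib/apt/lists/*

# Copy the service code into the image.
COPY ./service /opt/example-service

# The same image runs unchanged in development, test, and production,
# which is what eliminates environment drift.
CMD ["python", "/opt/example-service/run.py"]
```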
So, we needed a management plane. Although these were early days for Kubernetes, before its v1.0 release, we worked with it extensively and determined that it offered the best combination of flexibility and usability of anything available at the time. That turned out to be an investment that has paid major dividends for Skytap. We also knew that successfully introducing not one but two new open source tools, and building the right practices, processes, and organizational structure around them, would take time.
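For context on the role a management plane like Kubernetes plays, the sketch below shows a simple Deployment manifest that keeps a fixed number of containerized service replicas running and replaces them if they fail. The names, image, and replica count are illustrative assumptions, and the manifest uses today's API versions rather than the pre-1.0 ones we started with.

```yaml
# Hypothetical example: declare the desired state for a containerized
# service and let Kubernetes keep three replicas running.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
    spec:
      containers:
      - name: example-service
        image: registry.example.com/example-service:1.0.0
        ports:
        - containerPort: 8080
```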
Managing Expectations
Our adoption began with managing expectations. “Change is hard” is not a cliché; it is a truth. Questions abounded, from the very specific and individual to broad, sweeping queries about strategy. We divided and conquered, testing a variety of approaches to everything from packaging services to sharing service ownership. Engineering leadership acknowledged the gaps in our knowledge and the rough edges to expect before we even got going, then addressed concerns clearly, so we all understood the risks, the rewards, and the long road ahead.
It’s worth pointing out here that we experienced something we see in our customers again and again: most teams don’t want to burn everything to the ground. Many on our team had committed years of their careers to building Skytap Cloud. It had many unique and exceptional qualities, and tearing it down to rebuild it with some new toys didn’t make sense. What we needed was to balance the great things we had already built with a scalable strategy for the future.
The Way Forward
Our road—from traditional VMs to running our cloud on a massive production cluster of Kubernetes nodes orchestrating services packaged into Docker containers—is proof that hard work upfront can really lighten the load going forward. In our next post, we’ll dig into that work, which continues to pay dividends today. We’ll also share details about our containers and Kubernetes implementation, including processes and tips for success.
Until the next post, if you’re interested in details on our approach to continuous integration, check out these two blogs.