For this edition of the ‘Ask a Skytap Engineer’ series, I spoke with Lonnie Hutchinson, Senior Architect with Skytap. Lonnie focuses on provisioning and orchestration. Prior to joining Skytap, he worked on a purpose-built database for storing high volumes of time-based performance metrics for HP. In his spare time Lonnie dabbles in amateur astronomy, metalworking, and urban farming.
1) What sort of development problem was Skytap Cloud created to solve?
It wasn’t built to solve a development problem specifically. It was created as a platform for the complex, dynamic cloud infrastructure needed not only during development, but also during testing, staging, and ultimately production.
The dynamic nature of Skytap Cloud, relative to EC2, Azure, or Rackspace, enables developers to build the systems they need when they need them. When you realize you need to multi-home your service to solve a security issue, you can go for it—you don’t need to wait for IT to run a network drop, plug it in, or reconfigure networking.
2) As Skytap Cloud grew and we started reaching scalability limits of the initial architecture—what was the plan?
By mid-2010, Skytap had started to reach the scalability limits of the initial architecture for two of the four main components in the system. System reliability was suffering due to single points of failure—sometimes due to bugs, sometimes due to expected hardware failures. There were several conditions that threatened to prevent feature development, so it was time to rewrite.
Developers use the best tools available to them, and at Skytap that includes our own product. One of the requirements of the rewrite was to remove single points of failure by introducing load-balanced clusters of services behind virtual IPs (VIPs) with failover. Also, despite introducing more layers and more complexity into the system, we couldn’t let performance degrade. At the time, Skytap was a fairly small company that needed to continue delivering features, so we couldn’t devote the entire team to the effort; the rewrite had to happen in parallel with feature work.
Skytap Cloud was invaluable during the rewrite in two ways. First, being able to run identical machines in isolation from each other made development more efficient, particularly when done in parallel. Second, the highly dynamic nature of Skytap configurations, particularly around networking, reduced development delays as the system rapidly evolved.
During the rewrite, I got to the point where I would have several change sets backed up waiting for a build. They would then all be built together, which made it harder to determine which change set had caused which failure.
To help with this, I cloned my build server and ran two identical copies of it. This way I could start builds for specific change sets when I wanted to, rather than waiting for the resources to become available. Setup was trivial since Skytap can run identical isolated copies of the same virtual machine with the same network settings. All I had to do was give the clone a unique public IP so I could access the machine remotely.
Another substantial benefit of Skytap over traditional on-premises labs and other large cloud providers was the ability to quickly build the deployment environment needed to develop and test the system. As the various components were rewritten, or extended to support fault tolerance and failover, I needed to run my development stack with the new features. Because fault tolerance and load balancing are inherently distributed, I needed to add the new load balancers and virtual IPs in front of the cluster, and run more than one member in the cluster. In a traditional IT environment, I would have needed to work with IT to procure and provision the new hosts and network infrastructure, and most critically, wait for that request to complete. Skytap Cloud put me in the driver’s seat and gave me the flexibility I needed as a developer to quickly evaluate different deployment strategies, load balancers, and failover solutions.
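[Editor’s note: For readers unfamiliar with the pattern Lonnie describes, here is a minimal sketch of a cluster behind a virtual IP with failover. It is an illustration only, not Skytap’s implementation. The `Member` and `VirtualIP` classes, the address, and the service names are all hypothetical, and a real load balancer would detect failures with health checks rather than a manually set flag.]

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One back-end service instance in the cluster (hypothetical)."""
    name: str
    healthy: bool = True


class VirtualIP:
    """Hypothetical front end: a single address shared by several members."""

    def __init__(self, address, members):
        self.address = address
        self.members = list(members)
        self._next = 0  # round-robin cursor

    def route(self):
        """Return the next healthy member, skipping any that have failed."""
        for _ in range(len(self.members)):
            member = self.members[self._next % len(self.members)]
            self._next += 1
            if member.healthy:
                return member
        raise RuntimeError("no healthy members behind " + self.address)


# Two members behind one address: neither is a single point of failure.
vip = VirtualIP("10.0.0.100", [Member("svc-a"), Member("svc-b")])
first = vip.route().name

# Simulate a failure of svc-a; traffic keeps flowing to svc-b.
vip.members[0].healthy = False
after_failure = [vip.route().name for _ in range(3)]
```

Clients only ever see the virtual address, so a member can fail or be replaced without callers changing anything. Testing that behavior needs at least two members plus the front end, which is the extra infrastructure Lonnie mentions having to stand up.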
3) Time for the trivia round: Lonnie, what is the best music to write code to?
Well, we have a stereo in the engineering wing of our office. Anyone can put on what they want, and anyone can turn it off at any time. I’m usually one of the first ones in, so we tend to listen to Phish (“Run Like an Antelope” is particularly good) as well as Grateful Dead and hippie bluegrass (String Cheese Incident and Yonder Mountain String Band).
[Editor’s note: I’m sure this question could be debated as having multiple possible correct answers. So, let the scientific discourse begin!]