Continuous Delivery of Fully Functional Environments

At Skytap, we face the same challenges as our customers in developing and releasing high-quality software quickly.

Often, the most painful stages of the software development lifecycle are the integration phase (when independently developed components run together for the first time) and the release phase (when new features face production workloads for the first time). Like other organizations, we’ve evolved a variety of DevOps techniques for mitigating these problems.

We currently use continuous integration (CI) and continuous delivery (CD) techniques, along with several of the unique features of the Skytap platform itself, to create entire environments that resemble the production platform. Copies of these ready-to-use environments are started on-demand and used by engineers to test code in a production-like environment, and in a variety of (possibly destructive) test scenarios. This frees developers to more thoroughly test their code, which reduces the QA burden and speeds up the release process.

This article outlines the continuous delivery workflow that we use to produce these environments and discusses how this workflow has improved our release velocity and reliability. Part two will be a deeper dive into how CI and CD are implemented with Skytap.

Our Problem Scope

The challenges that we face in scaling our software and organization aren’t unique. As with many organizations, as Skytap grows, the breadth and complexity of our platform necessarily increase. Additionally, as we add more engineers, we are able to develop features faster than they can be tested and released under a strictly linear, gated process. This introduces a bottleneck, which may leave developers in a wait state during the test and release phases. It’s counterproductive to have chunks of an engineering organization idling while they wait to ship their work, and it’s expensive to coordinate complex scenarios where more and more features are added to the release.

Our Solution

To remove bottlenecks and improve the flow of code, we reviewed our integration and delivery processes. We decided that ideally, developers should strive to check in code frequently and that code should be built immediately (continuous integration). The result of that build—the artifact—should pass some test criteria and should be automatically packaged as a discrete piece of software. This packaged artifact then becomes a deployment candidate (continuous delivery).
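The gating described above can be sketched in a few lines of Python. This is only an illustration, not our build system: the `Artifact` type, the `run_pipeline` function, and the stand-in build, test, and package stages are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Artifact:
    """Hypothetical packaged build output; the fields are illustrative."""
    name: str
    version: str


def run_pipeline(
    commit: str,
    build: Callable[[str], bool],
    test: Callable[[str], bool],
    package: Callable[[str], Artifact],
) -> Optional[Artifact]:
    """Return a deployment candidate, or None if any gate fails."""
    if not build(commit):
        return None  # the build broke, so there is nothing to test or package
    if not test(commit):
        return None  # test criteria not met, so no candidate is produced
    return package(commit)  # every gate passed: this artifact is the candidate


# Stand-in stages (assumptions for the example, not real build steps).
def always_ok(commit: str) -> bool:
    return True


def always_fail(commit: str) -> bool:
    return False


def make_artifact(commit: str) -> Artifact:
    return Artifact(name="service", version=commit)


candidate = run_pipeline("abc123", always_ok, always_ok, make_artifact)
rejected = run_pipeline("def456", always_ok, always_fail, make_artifact)
```

Here `candidate` is a packaged artifact, while `rejected` is `None` because its tests failed. The point of the shape is that packaging only ever happens for code that has already built and passed its tests.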

We decided that to truly leverage the benefits of CI and CD, we needed to provide developers and testers with easy, safe access to their own clone of our production environment. We expected that this would enable engineers to:

Improve code quality

Fewer false positives and false negatives in comparative testing: Skytap environments, while virtualized, very closely mimic the behavior of physical environments, and automated configuration management decreases drift between production and pre-prod environments
Access to, and visibility into, all affected portions of the stack allow engineers to better understand the system-wide impact of their changes
Engineers are able to run more extensive automated tests, test against other features, run simulated production workloads, and perform dangerous or destructive experiments, among other things
We check in code to main project branches more frequently; smaller incremental changes have fewer compounding issues
When problems do occur, they are visible earlier in the process and are smaller in scope

Enhance cross-functional collaboration

Test burden is shifted, in part, to the developers, who are likely to be most familiar with causes and solutions to problems that appear with their code; this frees up QA teams to spend more time developing robust test scenarios and to advise development on effective test techniques
Frequent delivery of discrete, fully functional environments with continuously integrated changes allows teams to use each other’s work quickly, instead of waiting for delivery to a shared integration environment
The self-service nature of pre-packaged environments allows developers and operations to focus on the most important interactions between the platform and infrastructure. This helps to identify potential issues well before code reaches the production environment, and reduces the operational load inherent in maintaining multiple pre-prod environments

Increase release speed

Front-loading the effort of addressing bugs makes them cheaper and faster to fix. This reduces the QA time spent on final integration testing and pre-release verification
Releases are less risky, as changes have already been tested in the context of the platform at large. The development of smaller viable changes is simpler; a small change can be tested in a production environment clone by smaller teams, with simpler cross-team collaboration

Conceptually, this sounds great. To achieve this, we needed to combine a set of tools and a process that could scale with us, and this process should be largely automatic and easy to replicate.

Our Tool Set

Like most software companies, we heavily leverage distributed source control (Mercurial and Git in our case) and configurable build servers (Jenkins). We manage our build jobs with configuration and a job construction tool (Jenkins Job Builder). We make use of configuration management tools (like Ansible and Puppet), and we modularize our platform services with containerization tools (Docker, Kubernetes).

We already had many of the pieces in place to begin delivering full environments to engineers. To pull everything together, we needed to integrate these tools with our internally developed automated environment construction tool (Jenga) and add the real secret sauce: Skytap Templates.

With these tools and the CI/CD techniques we’ll explore in depth in part two, we are now able to produce several nightly caches of our full stack (including the supporting infrastructure for each) and save these as Skytap Templates.

Each template contains a production environment clone with (currently) anywhere from around 40 VMs and three networks, up to around 200 VMs with six networks, and each captures a fully functional snapshot of the Skytap platform (which, conveniently, also runs on the Skytap platform – we’re testing all the way down!). We’ve abstracted one step further away from continuous delivery of software artifacts; we’re instead delivering entire environments running the full platform.

What Does All of That Get Us?

If you’re a Skytap engineer, you simply need to copy one of those Jenga-constructed Skytap templates and run it. This provides an advantage over using provisioning tools to produce environments on the fly because provisioning environments is a slow, complex process, and a lot can go wrong. Engineers should have fast, easy access to production environment clones, and should be able to treat them as disposable when problems inevitably appear—otherwise, you’re losing one of the primary benefits of virtualization.
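To make the copy-and-discard idea concrete, here is a small in-memory sketch. It does not call the Skytap API; the `Template`, `Environment`, and `Catalog` classes and their methods are hypothetical stand-ins that only model the behavior described above, where each copy is isolated from the template and from other copies.

```python
import copy
import itertools
from dataclasses import dataclass


@dataclass
class Template:
    """Hypothetical model of a saved, ready-to-run environment snapshot."""
    name: str
    vms: dict  # VM name -> state dictionary


@dataclass
class Environment:
    """A running copy of a template; safe to break and throw away."""
    id: int
    source: str
    vms: dict


class Catalog:
    """Tracks environments copied from templates (illustrative only)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.environments = {}

    def copy_template(self, template: Template) -> Environment:
        # Deep-copy the VM state so changes never leak back into the template.
        env = Environment(next(self._ids), template.name, copy.deepcopy(template.vms))
        self.environments[env.id] = env
        return env

    def destroy(self, env: Environment) -> None:
        self.environments.pop(env.id, None)


nightly = Template("nightly", {"web": {"healthy": True}, "db": {"healthy": True}})
catalog = Catalog()

broken = catalog.copy_template(nightly)
broken.vms["db"]["healthy"] = False  # a destructive experiment
catalog.destroy(broken)              # discard it instead of repairing it

fresh = catalog.copy_template(nightly)  # a new copy starts from the known-good state
```

After the destructive experiment, `fresh` still starts with a healthy `db`, because the damage stayed inside the discarded copy.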

In just a few minutes, our engineers have an environment running a fully functional instance of Skytap. This is an incredible productivity boost for many common activities:

Comparing two releases to understand regressions
Producing a development environment that matches production
Testing release scenarios
Destructive testing and experimentation: if your cost to produce a new environment is nearly zero, you can break whatever you want!
Integration testing
Platform exploration—On-boarding new engineers is simplified because they can be trained in disposable environments

In part 2 of this series, we’ll dive deeper into how our build system creates these nightly templates.

The Results

Ultimately, we’ve been able to increase our release cadence and reliability, without sacrificing the ability of discrete teams to work independently of each other. Continuous Integration at the team level allows changes to be immediately merged into mainline development, and this, in turn, surfaces problems while the change is an active work item. Raising problems early in development simplifies resolution because the context surrounding the issue is still fresh in everyone’s mind.

Decoupling code check-ins and integration from the release process has given us ancillary benefits in code reviews. Our reviews are focused on professional growth and code quality, rather than being a gate that blocks our check-in. It’s always unfortunate when you make a mistake and break a build, but if your builds are cheap and frequent with fast and visible feedback, developers can safely treat check-in and integration as a separate activity from review. For us, this has been a boon to our culture of collaboration: code reviews are about feedback and growth, rather than being release-oriented transactions or gates.

Automating environment construction and making access to environments a cheap self-service task has allowed us to significantly reduce the load on our operations team. Without these tools, operations might be required to service requests to create and maintain dev/test environments. Additionally, it’s much simpler for our dev and ops teams to collaborate.

With functional environments that can break without impacting operational integrity, operations can advise development without worry that this advice will be misapplied to the production environment, and without the burden of resolving problems in shared dev/test environments when something goes wrong (again, we can just throw away the environment and start fresh). Constant operational support of non-prod environments doesn’t scale well and can lead to hostility between development and operations. DevOps is about the opposite!

Continuously delivering changes from each individual line of development into freshly constructed environments each day has helped us surface deployment and service integration issues more quickly. Your project’s CI process may complete successfully and the code may pass review, but you’re still likely to discover problems that only occur when you plug your service into the platform. By continuously exercising these changes together, we’re now able to see both the adverse effects and the added value of current work very quickly.

Additionally, disconnecting the running environment from the process used to produce working environments has allowed us to easily clone environment state. These clones simplify A/B comparison (e.g., “did this problem exist before, or is it new?”). Guaranteeing the same state in various clones also makes it simple to do destructive testing without impacting other teams. If you’ve ever had dev and test stalled because a shared integration environment was down, you’ll understand how much we like being able to let individuals and teams create and destroy environments at will. Plus, we’re able to run automated system or acceptance testing in single-purpose environments, without tests being ruined by activity in shared environments!
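As a rough illustration of the A/B question above, the following sketch compares test results from two clones, one running the previous release and one running the candidate. The `classify_failures` function and the test names are hypothetical; the only assumption is that each environment yields a mapping of test name to pass/fail.

```python
def classify_failures(baseline: dict, candidate: dict) -> dict:
    """Compare pass/fail results from two environment clones.

    Both arguments map a test name to True (passed) or False (failed).
    Only tests present in both runs are compared.
    """
    result = {"new": [], "pre_existing": [], "fixed": []}
    for test in sorted(set(baseline) & set(candidate)):
        before, after = baseline[test], candidate[test]
        if before and not after:
            result["new"].append(test)           # regression introduced by the change
        elif not before and not after:
            result["pre_existing"].append(test)  # was already failing before the change
        elif not before and after:
            result["fixed"].append(test)         # the change repaired it
    return result


# Hypothetical results gathered from two clones of the same template.
baseline_results = {"login": True, "provision_vm": False, "billing": False}
candidate_results = {"login": False, "provision_vm": False, "billing": True}

report = classify_failures(baseline_results, candidate_results)
```

In this example, `login` is a new failure, `provision_vm` was already failing, and `billing` was fixed. Because both clones start from the same state, a difference between the two runs can be attributed to the change rather than to environment drift.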

Finally, by leveraging Skytap Templates to continuously deliver fully functional snapshots of the current and upcoming platform, we’ve significantly reduced the amount of time it takes to get a functional clone of most environments. Even with powerful tools like Jenga, producing an environment from scratch often requires a lot of expertise about the toolchain (dealing with Puppet errors, for example), and a lot of knowledge about the infrastructure.

While slogging through these problems can be an instructive exercise, it’s time-consuming and can place a heavy support load on our infrastructure and provisioning experts. Delivering functional templates has reduced the time it takes to get a working dev stack for an individual from days or weeks, down to about an hour. Our engineers are free to spend time developing features, not wrangling their bespoke environments – and since they’re using automatically constructed clones, their environmental assumptions are far more likely to match the reality of integration and production environments when it comes time to release their changes.

Building and maintaining all of this CI and CD infrastructure takes time and effort, of course, but it pays strong dividends in the form of increased individual productivity, increased ability for teams to work in parallel without creating integration headaches, and increased confidence and predictability of our releases. One of the great things about being a cloud provider is that we frequently face the same challenges as our customers. It’s a constant pleasure to know that we’re able to use the very tools we provide as key components in our solutions to these problems.

We encourage you to check out Part 2 to this story to explore some of these solutions in more detail, with the hope that our success will be your success as well!
