How Skytap Makes Creating Production Clones Easy

In previous blog posts, we covered Jenga (our environment construction tool) and did a deep dive into how we use Jenga as a component in delivering fully functional clones of our production environment.

We’re very happy with this delivery pipeline, and the test environments it produces. Our engineers always have up-to-date templates for development or testing work, and the disposable nature of pre-built copies of environments encourages a culture of productive experimentation. Engineers have significant opportunities to explore components of the platform that are peripheral to their specialization, CI/CD increases confidence in our releases, and easy duplication of continuously delivered templates ensures that everyone has access to the most recent work done by our various internal teams.

Even with automatic delivery, however, these test environments still had some rough edges, particularly in our two primary categories of concern: test environments should be easy to use, and they should be as production-ready as possible.

We had a few high-level problems to address:

Each host in each environment should have a hostname and FQDN that can be resolved by DNS, including any virtualized hosts that are invisible to the automatic network, so that we can avoid configuration that relies on NAT IP addresses.
Each host should be able to resolve domain names for records that exist only on our internal corporate DNS servers — the automatic network does not use our internal DNS as its upstream provider.
Environments should have something like the ‘organic’ data that comes about with real use of the platform.
Environments should have a TLS termination proxy, so that we no longer rely on workarounds that let clients communicate in the clear, or on self-signed certificates.
Containerized services should use load-balancing pools for Kubernetes pods as we do in production, instead of directly addressing a socket on a Kubernetes node.
Every environment should be accessible through a single, static domain; this eliminates the need for end-users to make any local configuration changes to access their environments.
End-users should be able to start up an environment, with all of the above problems solved, in a single step.

Breaking Up the Problem Space

While deciding where to apply effort to solve each of these problems, we took a look at how templates for test environments are constructed. We determined that some solutions made sense to apply early in the build process, and others made sense to apply later — and finally that some of the problems required external infrastructure, along with automation that should be applied when the cloned environment is launched from a template.

We broke the build process down into four phases. Phases 1-3 are ‘construction’ or delivery oriented; the last (launching) is related to load-balancing infrastructure and will be discussed in detail in an upcoming article, Dynamically Multiplexing Disposable Prod Clones.

The four phases of a test environment:

Provisioning—base templates are cloned, VMs are launched, and hosts are configured with Puppet. Automatic network settings are replaced with environment-specific static networking configuration.
Bootstrapping—services are launched and configured according to the platform configuration. Additional virtualized infrastructure (hypervisors, virtual networking, etc) is configured and started within the test environment.
Post-boot configuration—we adjust settings in the environment using its now-functional API and install a startup script that the end-user will use to restart the environment from a template. At the end of this phase, we shut down the services, shut down the VMs, and create a template from the stopped environment.
Launching—in this phase, the pre-constructed template is copied and started by the end-user. This is where we want single-step operation: the end-user should be able to run a single script, and then be ready to go.

Enhancements at Each Phase

During provisioning (phase 1) we address the problems of resolving domain names within the test environment’s sandbox, and of enabling upstream DNS resolution for hosts within the Skytap Cloud corporate network.

During this phase, VMs are launched and their initial network configuration is bootstrapped from the automatic network provided by the Skytap Cloud platform. Puppet then configures each host depending on its type, and the Jenga build process applies additional configuration that allows the platform to operate inside of the sandbox of a Skytap Cloud Environment.

Once provisioning for each node is complete, we pull a switcheroo and replace the automatic network (DHCP) with a static, custom configuration. We provision our puppetmaster host to include a DNSMasq server. DNSMasq is configured with the static IP of our internal DNS server as its upstream provider, which solves our problem of resolving corporate-network domains from within the test environment sandbox. Then, for each host in the test environment, we add a record to the DNSMasq configuration. DHCP is disabled on each host, and each host’s /etc/resolv.conf is updated to use our puppetmaster host as its DNS provider.
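To make the DNS switch concrete, here is a minimal Python sketch of the two files involved. It is an illustration under stated assumptions, not our actual Puppet or Jenga code: the function names, the example domain, and the IP addresses are hypothetical. The dnsmasq options it emits (no-resolv, server, and host-record) are standard dnsmasq settings.

```python
from ipaddress import ip_address
from typing import Mapping


def render_dnsmasq_conf(upstream_dns: str, domain: str, hosts: Mapping[str, str]) -> str:
    """Render a dnsmasq config with one upstream server and a record per host.

    `hosts` maps a short hostname to its static IP (hypothetical input shape).
    """
    ip_address(upstream_dns)  # raises ValueError on a malformed address
    lines = [
        "no-resolv",               # do not read upstream servers from /etc/resolv.conf
        f"server={upstream_dns}",  # forward unknown names to the internal DNS server
    ]
    for name, ip in sorted(hosts.items()):
        ip_address(ip)
        # host-record answers for both the FQDN and the short name.
        lines.append(f"host-record={name}.{domain},{name},{ip}")
    return "\n".join(lines) + "\n"


def render_resolv_conf(dns_host_ip: str, domain: str) -> str:
    """Render the resolv.conf a host would use once DHCP is disabled."""
    ip_address(dns_host_ip)
    return f"search {domain}\nnameserver {dns_host_ip}\n"


if __name__ == "__main__":
    print(render_dnsmasq_conf("10.0.0.2", "test.example", {"web1": "192.168.0.10"}))
    print(render_resolv_conf("192.168.0.1", "test.example"))
```

Validating every address before writing the file means a typo fails the build early, rather than surfacing later as a host that cannot resolve names.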

The bootstrap phase (phase 2) happens after all of the hosts are provisioned and the network is reconfigured. Services haven’t started up in the platform yet, so this is a convenient time to address our need to have usable platform data in the environment.

Our approach is to create database fixtures describing customers and users and inject them into the clean database. The fixtures should appear reasonably organic.

To achieve this effect, we gather a list of our Engineering users from LDAP, then feed this to a Rake task that creates the expected customers and users in the database. Additional Rake tasks configure a customer with appropriate administrative privileges and provide a functional configuration for the fixture customer so that environments, templates, VMs, networks, and storage will have usable quotas set for each environment cloned from this build.
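The shape of that transformation can be sketched in a few lines of Python. This is not the Rake task itself (which is Ruby and is not shown here); the field names, quota values, and the build_fixture helper are all hypothetical and only illustrate turning a list of LDAP-style entries into one administrative customer with users and quotas.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Quotas:
    # Hypothetical limits; the real quota values are not described in this post.
    environments: int = 10
    templates: int = 10
    vms: int = 50
    networks: int = 20
    storage_gb: int = 500


@dataclass
class CustomerFixture:
    name: str
    is_admin: bool
    quotas: Quotas = field(default_factory=Quotas)
    users: List[Dict[str, str]] = field(default_factory=list)


def build_fixture(customer_name: str, ldap_entries: Iterable[Dict[str, str]]) -> CustomerFixture:
    """Build one admin customer whose users mirror the given LDAP-style entries.

    Entries missing a uid or mail are skipped, and repeated uids are dropped.
    """
    customer = CustomerFixture(name=customer_name, is_admin=True)
    seen = set()
    for entry in ldap_entries:
        uid, mail = entry.get("uid"), entry.get("mail")
        if not uid or not mail or uid in seen:
            continue
        seen.add(uid)
        customer.users.append(
            {"login": uid, "email": mail, "name": entry.get("cn", uid)}
        )
    return customer
```

Deriving fixtures from a real directory is what makes the data look organic: the logins and names are ones engineers already recognize.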

Post-bootstrap configuration (phase 3) occurs after all of the platform services have started up; at this point, the environment has a functional API. We have an opportunity to do some real work with the proto-platform before we shut everything down and create a template, and to exercise the API to ensure that basic functionality is intact. So, we use the API to set up VM import jobs, transfer VM images to the FTP server in the environment with LFTP, and run those import jobs. At the end of this stage, the fresh test environment is decked out with usable VM templates that can be launched and used like any other VM, the only difference being that they exist inside the sandbox of a test environment.
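The create, upload, run, and wait sequence might be sketched as follows. Everything here is an assumption made for illustration: the ImportClient interface, its method names, and the status strings are hypothetical and are not the Skytap API, and the upload callable merely stands in for the LFTP transfer.

```python
import time
from typing import Callable, Dict, Iterable, Protocol


class ImportClient(Protocol):
    """Hypothetical client interface; NOT the real Skytap API."""

    def create_import_job(self, image_name: str) -> str: ...
    def start_import_job(self, job_id: str) -> None: ...
    def job_status(self, job_id: str) -> str: ...


FINISHED = ("complete", "failed")  # hypothetical terminal states


def import_images(
    client: ImportClient,
    images: Iterable[str],
    upload: Callable[[str, str], None],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Create a job per image, upload it, start the import, then poll.

    Returns a mapping of job id to final status, or raises TimeoutError
    if any job is still unfinished after max_polls rounds.
    """
    jobs: Dict[str, str] = {}
    for image in images:
        job_id = client.create_import_job(image)
        upload(image, job_id)  # stand-in for the LFTP transfer
        client.start_import_job(job_id)
        jobs[job_id] = "pending"

    for _ in range(max_polls):
        for job_id, status in jobs.items():
            if status not in FINISHED:
                jobs[job_id] = client.job_status(job_id)
        if all(status in FINISHED for status in jobs.values()):
            return jobs
        sleep(poll_seconds)
    raise TimeoutError("import jobs did not finish in time")
```

Because a failed import is returned rather than raised, the caller can decide whether a single bad image should block template creation.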

We also configure projects and users that are appropriate for automatic testing. We can prepare the environment to run a suite of automated functional tests — so hey, why not go ahead and run that test suite and generate a report?

Finally, we install an environment startup script (and supporting modules) into the environment, then shut everything down and produce a template. We discussed this template delivery pipeline in detail in previous posts.

Test Environment Startup (phase 4) is distinct from the test environment template construction that occurs in phases 1-3. This is where our ‘single step startup’ problem is solved. In this phase, the end-user creates a copy of the previously constructed environment and runs the startup script to safely and automatically start up services.

The start script will also attach the new environment to the test environment support infrastructure (an external load balancer). A few additional niceties are applied: hosts in the environment are configured with a .hgrc containing the environment owner’s Mercurial username, and the ‘suspend on idle timeout’ period is adjusted.
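As a small illustration of the .hgrc nicety, the file only needs a [ui] section with a username entry, which is Mercurial’s standard setting for commit identity. The sketch below is not our startup script; render_hgrc and its parameters are hypothetical names.

```python
import configparser
import io


def render_hgrc(owner_name: str, owner_email: str) -> str:
    """Return .hgrc contents that set the Mercurial commit username."""
    # interpolation=None keeps a literal '%' in a name from being rejected.
    parser = configparser.ConfigParser(interpolation=None)
    parser["ui"] = {"username": f"{owner_name} <{owner_email}>"}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_hgrc("Alice Example", "alice@example.com"))
```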

At the end of the startup phase, the environment is usable just like the production environment, either through our REST API or through the web UI. A web browser (or REST client) pointed at https://www-1234.test_envs.internal.skytap.com from within the corporate network will function like the production platform, assuming that ‘1234’ is the configuration ID of the newly-started test environment.
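The naming convention in that URL is simple enough to express in code. The helpers below are a hypothetical sketch that assumes only the pattern shown above (www-, then the configuration ID, then the static domain); they are not part of our tooling.

```python
import re

# Static domain taken from the example URL above.
TEST_ENV_DOMAIN = "test_envs.internal.skytap.com"
_HOST_PATTERN = re.compile(r"^www-(\d+)\." + re.escape(TEST_ENV_DOMAIN) + r"$")


def environment_url(config_id: int) -> str:
    """Build the web UI / REST base URL for a test environment."""
    if config_id <= 0:
        raise ValueError("configuration ID must be a positive integer")
    return f"https://www-{config_id}.{TEST_ENV_DOMAIN}"


def config_id_from_host(host: str) -> int:
    """Recover the configuration ID from a test environment hostname."""
    match = _HOST_PATTERN.match(host)
    if match is None:
        raise ValueError(f"not a test environment hostname: {host!r}")
    return int(match.group(1))
```

Encoding the configuration ID in the hostname is what lets a single static domain serve every environment: the ID is all that is needed to tell one environment from another.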

Phase four is simple for our end-user — but there’s a lot of stuff that needs to happen to make this function transparently for all of the concurrently running test environments. In the upcoming blog entry, Dynamically Multiplexing Disposable Prod Clones, we’ll discuss the details of how this works.

Did this really make life easier?

In short: yes, and it continues to improve.

The usability enhancements have been delivered iteratively over the lifecycle of the Jenga project, so our users haven’t always enjoyed ease of use in all of the problem areas we outlined. Setting up a basic customer account began relatively early in the project, for example, but injecting fixtures for user data didn’t happen until much later. With each milestone, however, ease of use and production parity have improved.

Until recently (early 2017), test environment owners were expected to configure local hosts files to use the web UI and REST API for their environment and to whitelist self-signed SSL certificates. This was tedious, and made people reluctant to discard their ‘pet’ environment in favor of launching from a fresh template; enhancements in the environment startup phase have greatly improved this experience, and consequently, engineers are more likely to be running an up-to-date environment.

Prior to our post-bootstrap configuration enhancements, testing things like VM hosting (which, unsurprisingly, is a primary function of the Skytap Cloud platform) required the environment owner to import a VM image, which is time-consuming. Having the ability to launch VMs and use the platform to manage those VMs without any preliminary work has greatly improved usability, and has given us additional confidence that the platform in the freshly delivered templates is at least minimally functional.

Without organic user data in environments, functional tests would require either manual administrative action, or some external test driver to establish fixture data—baking this into each template eliminates the need to repeat this for every environment, and again, improves usability.

Finally, by swapping out the automatic network that’s available during provisioning with a static configuration and sandboxed DNS, we improved our flexibility to prototype new components within the environment. We can, for example, provide an FQDN to VMs that are hosted within the test environment, or establish custom routing channels for things like testing the effects of intermittent network failures.

With each of these accommodations in place, we’ve seen two behaviors emerge for test environments. First, they’re used much more extensively (to the point where our efforts are shifting away from usability and towards capacity management). Second, problems are being identified and corrected very early in the lifecycle of a given release.

Even within Skytap Cloud, we continue to be impressed by the flexibility that our platform can deliver. Acting as our own customer and continuously producing and using complex environments on the Skytap Cloud platform itself has afforded us opportunities to understand how to build and deliver software with Skytap Cloud as a cornerstone. Enhanced usability for the product that we’re running in Skytap Cloud has, in turn, exposed us to the sorts of challenges that our customers may face (and solve) with our platform.

We look forward to hearing more about the amazing things our customers build using Skytap Cloud!
