How our Docker and Kubernetes deployment grew up

My colleague Jonathan recently kicked off a recurring blog series about our internal adoption and use of Docker containers and Kubernetes. His post gave a brief history of how we came to introduce these technologies as part of ongoing modernization, and detailed how we addressed people and process changes along the way.

In this post, I’m going to take things a step (or two) further with a deeper look into how we went from cloud services running on traditional VMs to containerized services orchestrated by Kubernetes in production. It was a major undertaking for our whole engineering team, with the goal of improving observability, manageability, and reliability as our cloud platform grew.

The familiar challenges of VM-based services

Like so many of our enterprise customers, we originally ran our services on a traditional VM-based infrastructure, with individual service owners responsible for the infrastructure powering their service. This approach worked when we were small, but complexity spread as our cloud service expanded and customers began consuming services en masse.

In our previous architecture, building a new service first required an engineer to specify hardware requirements, then engage our infrastructure team to request the necessary VMs. It was not an unusual process, but gauging requirements for services still in development was challenging, and engineers would often overestimate what they needed. Not only did this approach waste dollars and hours, it also made resources difficult to reclaim once deployed. The challenges didn’t stop there, however. Once the VMs were identified, we had to configure the prerequisites to run a given service, which introduced another scaling issue: VM provisioning trended toward a superset of all packages and dependencies necessary for a wide range of services, and we couldn’t easily track or audit which service drove which requirement.

This challenge became untenable as our customer base went global, making even provisioning infrastructure and calculating new hardware requirements a recurring struggle. We built alerts for critical services to help us monitor the state of our cloud and notify us when an error occurred. Still, our cloud kept growing and manual service remediation became unwieldy. For instance, when a service experienced an out-of-memory kill, we first had to restart it manually and then, if multiple services shared the VM, investigate each one by hand to determine the root cause.

Moving to containerized services

When we evaluated Docker containers and Kubernetes (the latter was pre-1.0 at the time) as an alternative to traditional VMs, it became clear that such a dramatic shift would require major changes. We were confident in the long-term benefits, but we were also apprehensive about the journey ahead, from reskilling teams to the unforeseen issues almost certain to arise.

We decided a measured, iterative approach was the best way forward, and formed a small group of engineers to lead the effort. Their first job was to consolidate a set of best practices for modifying existing services to run in containers, and for monitoring and testing these new containerized services. We built our own internal tools with several goals in mind:

Ease the adoption path for service teams
Orchestrate and operationalize usage of containers
Update our existing deployment tools to support containerized deploys

We were confident that containers orchestrated with Kubernetes would improve deployment, observability, manageability, and reliability for our cloud service. Achieving these goals meant working cross-functionally using DevOps methods. For us, that simply meant the infrastructure team could write scripts and create manifests, while developers had greater infrastructure responsibility, sharing the operational load and proactively managing the growth of our cloud platform.

Before we containerized anything, we evaluated existing services to identify those that were less critical and required minimal refactoring. We used these services, which were primarily Python-based and communicated asynchronously, as a proving ground. Container manifests provided an auditable, consistent, and repeatable template for every build. However, as we containerized more complex services, there was still significant administration required due to the unique nature of our cloud.
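
To make the idea of a manifest as a repeatable template more concrete, here is a minimal sketch in Python that builds a Kubernetes Deployment manifest as a plain dictionary and serializes it to JSON. It is an illustration only: the helper build_deployment, the service name, the image path, and the resource figures are all hypothetical, and the field layout follows today's apps/v1 Deployment schema rather than the pre-1.0 API we started with.

```python
import json


def build_deployment(name, image, replicas, cpu, memory):
    """Return a Kubernetes apps/v1 Deployment manifest as a dict.

    Hypothetical helper for illustration; not one of our internal tools.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Declaring resources up front replaces the old
                            # habit of guessing at VM sizes.
                            "resources": {
                                "requests": {"cpu": cpu, "memory": memory},
                                "limits": {"cpu": cpu, "memory": memory},
                            },
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # All values below are made up for the example.
    manifest = build_deployment(
        name="example-service",
        image="registry.example.com/example-service:1.0.0",
        replicas=2,
        cpu="250m",
        memory="256Mi",
    )
    print(json.dumps(manifest, indent=2))
```

Because every service is described by the same small set of parameters, the generated manifests can be checked into version control and reviewed like any other code, which is where the auditability comes from.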

In-depth service monitoring

As our container adoption and approach matured, so did our team. Service owners now had a range of sophisticated analytics and monitoring available to them, giving them greater responsibility for their service profiles. Meanwhile, the infrastructure team had much more visibility into resources, thanks to service-level monitoring and a custom-built assessor we created to calculate when a service would need more capacity. Together, these changes led to dramatic efficiency gains.
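
The sketch below is not the assessor itself. It only illustrates one simple way such a calculation could work, assuming we have one peak-usage sample per day for a service and a fixed capacity limit. It fits a least-squares line through the samples and estimates how many days remain until the line reaches the limit. The function name days_until_capacity and the sample numbers are hypothetical.

```python
def days_until_capacity(daily_usage, capacity):
    """Estimate days until usage reaches capacity using a least-squares line.

    daily_usage: one usage sample per day, oldest first (same unit as capacity).
    Returns 0.0 if the latest sample already meets or exceeds capacity,
    None if the fitted trend is flat or falling, otherwise the estimated
    number of days after the last sample.
    """
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two samples")
    if daily_usage[-1] >= capacity:
        return 0.0

    # Ordinary least-squares fit of usage against day index 0..n-1.
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_usage))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no growth, so no projected crossing

    intercept = mean_y - slope * mean_x
    # Day index where the fitted line crosses capacity, relative to the last sample.
    crossing = (capacity - intercept) / slope
    return max(crossing - (n - 1), 0.0)


if __name__ == "__main__":
    # Made-up memory usage in MiB over four days, against a 100 MiB limit.
    print(days_until_capacity([10, 20, 30, 40], 100))
```

A straight-line fit is a deliberately crude choice. It ignores seasonality and sudden traffic spikes, so a real assessor would likely need a richer model, but it shows how usage history can turn capacity planning from a guess into a calculation.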

In the next post we’ll share more details on our service-level monitoring tools and processes. The combination of what Kubernetes provides “out of the box” and our homegrown tooling has delivered dramatic efficiency gains we hope others can achieve, too. Stay tuned!
