Welcome to the first “Special Draft” edition of the Dev-Hops Podcast, in which we focus on real-world stories of process and technology transformation from the people who lived them. Rather than having a conversation, we’ll ask seasoned pros to relate their own case studies of big changes they were a part of. What went right, and what went terribly wrong? We can all benefit from this kind of insight, and it’s kind of nice to sit back and listen to someone’s story exactly as they want to tell it.
On board for today’s episode is Paul Farrall, Skytap’s VP of IT Operations. When he’s not busy managing our global cloud infrastructure, Paul is also a leading advocate for DevOps transformation and for better collaboration between development and operations across the software delivery lifecycle. He contributed to the widely read DevOps novel “The Phoenix Project” and is the current president of the Seattle chapter of the Society for Information Management (SIM).
Before Skytap, Paul was part of a DevOps-style transformation in software delivery at a leading gaming and entertainment firm, Big Fish Inc. He and his colleagues hit a lot of bumps on the road to DevOps, but like anything worthwhile, it took persistence and patience to ultimately improve. So now, with his Dev-Hops Story “Fishing for DevOps Success,” here’s Paul Farrall.
Can’t listen right now? Below is a transcript of Paul’s story for your reading enjoyment!
As Jason mentioned, my name is Paul Farrall, and I am currently the VP of Operations here at Skytap, where I am responsible for production operations at Skytap’s six data centers in Seattle, Dallas, Virginia, London, Singapore and Sydney, Australia, as well as for internal corporate IT services. Jason asked me to talk a little bit about my personal experiences with DevOps. At Skytap, we live and breathe DevOps. It’s how we work internally, and our product is designed to help IT departments implement their own DevOps efforts. In fact, this is one of the main reasons I joined Skytap.
I’ve been involved in the DevOps community since before it was called DevOps. I was a pre-publication reviewer for The Phoenix Project, and I’m a pre-publication reviewer for The DevOps Handbook, which is about to go to press. In fact, I think I have a case study included in that book, if it made the final editing cut. Skytap is a bit of a unicorn example, since it’s a cloud services provider that builds solutions for DevOps implementers, so rather than talk to you today about DevOps at Skytap, I thought I would relate a couple of examples from my experiences prior to Skytap, when I worked in a more traditional IT shop. Before I joined Skytap, I was the VP of Operations at Big Fish Games here in Seattle for five years.
At Big Fish Games, I had overall responsibility for production operations at Big Fish’s three production data centers in Seattle, Virginia and Luxembourg. I also had responsibility for internal corporate IT and information security there. A few quick facts about Big Fish, in case people aren’t familiar with it: it’s a publisher of casual games for PC, Mac and mobile platforms. When I was at Big Fish, we had around 3,000 games in our catalog, developed by internal development teams and by a little over 500 third-party development studios located around the globe.
We did around a million game downloads a day. Games were localized into 11 different languages. We processed essentially every major currency, and right before I left Big Fish, we had in fact started accepting Bitcoin as well. From an IT perspective, it was a global eCommerce company selling digital widgets on the internet. We had around 800 employees in our five offices in Seattle, Ireland, Oakland, Luxembourg and Vancouver, BC.
I’m going to give you a few examples from my DevOps experience at Big Fish Games, but first, I want to take a minute to define what DevOps is. People frequently joke nowadays that “DevOps means whatever you want it to mean,” and I think this is a little sad.
Lately, I see people making two mistakes. The first is defining it too broadly. In other words, they define everything good as DevOps: everything you do that is awesome is DevOps, and everything you do that is not awesome is not DevOps. That’s not really a useful definition. The second, which I’ve also seen recently, is defining it too narrowly, where DevOps is simply a synonym for automated code deployment. DevOps was originally supposed to be much more than that. Here’s my simple two-part definition. Number one, DevOps is a focus on overall system throughput, instead of localized, siloed optimizations.
Number two, it’s a collection of techniques designed to help development, operations, QA, and InfoSec teams work together to improve overall product development flow while simultaneously increasing quality and reliability. Now, whatever your definition of DevOps is, one of the attributes consistently mentioned as both a prerequisite for and an outcome of DevOps is a high level of trust between development and operations organizations. Increased trust between development and Ops organizations is indeed a common outcome of DevOps initiatives.
I’ve personally experienced this, but you do need a certain minimum level of trust to get started. This trust is important because you’re blurring traditional boundaries between development and operations responsibilities. Involving development in production operations can be scary for Ops teams. It just feels wrong from a historical, ITIL-type perspective, and Ops teams may also know development as the team that always breaks things. Likewise, it can be scary for development to involve operations in the development process, because they may know Ops as the team that always says, “No.”
They may fear that involving operations in the development process will leave the development team stuck between product management pushing them to deliver features and the operations team holding up those features. Trust is a requirement to succeed at DevOps, but we’re handicapped right out of the gate by fuzzy DevOps definitions and by the dangerous territory of blurring the boundaries between development and operations teams. My first attempt to introduce DevOps at Big Fish failed miserably, and I’m going to talk about it because it’s instructive. It was a trust failure.
I failed miserably due to cultural misunderstandings and a fundamental lack of trust between the development and operations teams. Here’s a simple, if embarrassing, story to illustrate this point. At Big Fish Games here in Seattle, application development teams equated the term DevOps with developers carrying pagers, an idea they really did not like, and they were highly resistant to it. Every time I brought up DevOps, they thought I was engaged in a secret plot to make the developers carry pagers. What was interesting is that I had no intention of asking developers to carry pagers.
The term “DevOps” covers a large range of continuous improvement techniques. Some organizations have developers carry pagers, and it works for them, but this is certainly not a prerequisite for DevOps. What’s worse, I had no idea they were thinking this, because they were afraid to bring it up for discussion. We failed in our first attempts to implement DevOps because the trust level between development and operations teams was so low that we couldn’t even tell each other what we were thinking. One of my responsibilities at Big Fish Games was managing the release engineering team.
Here’s another simple failure example. While managing the release efforts, I noticed that release schedules for the various development teams ranged from a couple of weeks to a couple of months; in other words, too long for a hyper-competitive industry like gaming. I asked the development teams whether they would like their release engineering team to make it possible for them to do lightweight code releases every day, or even multiple times per day. I thought this was an innocuous question. In fact, I thought I was being a little clever by playing dumb here. I assumed I’d get an enthusiastic high-five: “Yes, we’d love that!”
Instead, to my surprise, I got a violent “No! We don’t want that!” This confused me for a while, until I realized that the development team thought I was in cahoots with the product management team to put the squeeze on them. In my mind, I was proposing to make their release process more lightweight so they could break up their work into smaller, more frequent releases, with less stress for everyone, but I didn’t understand their fears and their problems. I wasn’t looking at it from their perspective. They also didn’t trust Ops, so they didn’t take us at face value.
I pictured them being able to release code whenever they wanted, which seemed like something they would like. In their minds, they pictured a product manager leaning over their shoulders every day, randomizing them with demands for new features to be released right then, while the product manager was standing there. We were coming from two different worlds, and this was not the kind of problem you could resolve with logic. The solutions and working procedures I was proposing were so far outside the development team’s experience that they couldn’t extrapolate from their current state to the world I was describing through simple logic.
It was going to require some trust to get us over this barrier and into this new world order. How did we get past this impasse? To be honest, there was no silver bullet. It required years of conversations, education and trustworthy actions. There were, though, three specific strategies that I employed consistently over time, and they were pretty successful. Number one, I somewhat sneakily stopped using the term DevOps as soon as I discovered that it was a scary term for them. Instead, I just talked about continuous improvement and the specific continuous improvement methodologies we wanted to implement.
Number two, when deciding which of these DevOps (a.k.a. “continuous improvement”) methodologies to implement, I specifically chose ones that would benefit development teams, as opposed to Ops or other teams. The development teams noticed this over time. They could see that I was going out of my way to help them, and this helped build trust. This wasn’t a one-time thing; we did it consistently. The third strategy was to take every opportunity to maximize the number of daily interactions between development and operations team members. It’s easy to distrust or hate on someone you don’t know personally, or don’t have to see every day. It’s a lot harder to hate on someone you have to work with on a daily basis.
Let me give you a couple of success examples. One of the early DevOps techniques I implemented was a dedicated Ops liaison embedded in each development team. This liaison acted as that development team’s ambassador to Ops and helped smaller development teams navigate our large, complex and sometimes opaque Ops organization. The Ops liaison would attend the team’s daily standups, team meetings and project meetings. This was a big success because the development teams could see that this Ops team member was helping expedite their work through the operations organization.
After the development teams got used to working with this person on a daily basis, we pushed a little further and got the development teams to agree to include Ops officially in architectural discussions at the beginning of application design. At first, development teams were skeptical that Ops could provide useful input here, but the important thing is that we had gained enough trust that they weren’t fearful of including Ops in the discussions. They were skeptical that it would add value, but they weren’t afraid that we would cause problems. This was a big win.
It was a little unusual at first for development teams to include Ops in design discussions, but they quickly saw how getting input from operations early on, before a product was built, resulted in better quality and fewer headaches for everyone when we launched into production. I also made sure that we put our most senior and diplomatic Ops team members into this role. After a while, we included InfoSec, too.
Here’s a trivial but illustrative example of how embedding Ops into product design worked. We had a development team responsible for building a mobile iOS client that needed to fetch content from one of our two primary data centers, in Seattle and Virginia.
These two data centers were identical, and the Ops team had implemented global load balancing in front of them to geolocate customers and direct traffic to one data center or the other, based on which one was closer to the customer or available. The mobile development team, frankly, didn’t understand what global load balancing was, and they didn’t know we had this infrastructure, so instead, they had built a complicated load balancing algorithm into the iPhone client. The Ops team had no idea they had done this until the team’s Ops liaison was invited to a planning meeting where the developers were discussing some problems with the load balancing algorithm in their iPhone client.
The Ops liaison said, “Hey, why don’t you guys just point at the global load balancing IP address for our data centers and let the global load balancing infrastructure handle all the routing decisions?” The development team had no awareness of this infrastructure and didn’t know this was possible. Long story short, a simple five-minute conversation and one sentence from the Ops liaison eliminated all of the load balancing code on the client, which saved the development team a large amount of work and ultimately resulted in a much simpler and more robust service for our customers.
I’ll give one more example. A larger project we took on after we started gaining momentum with this DevOps effort was a continuous delivery framework, built by our release engineering team, that allowed game development teams to publish their code directly to production without direct involvement from operations. This was a big win for everyone. It wasn’t something we could have taken on immediately, because we didn’t have the process or the bidirectional trust necessary to implement it, but once we got going, we reached a state where we could deploy something like this.
We built this tool and released it to the game developers. Game development teams were very happy because they could push code to production as fast as they wanted without operations being a bottleneck. The operations team was happy because they were no longer in the spotlight as a bottleneck, and designing and building release tools was much more satisfying work for the Ops release engineering team than manually pushing code to production. The business stakeholders were happy because, in the end, game features were getting to production much faster.
It felt a little weird at first, letting developers push code straight to production, but the operations release engineering team maintained the release tools, so all the necessary controls, safeguards, audit trails, etc. were built into them, and we quickly got used to having the developers push the release button. Later, the operations team extended this self-service theme, and we built an internal private cloud infrastructure that allowed people to spin up their own virtual infrastructure without any Ops team involvement. I think I’ll save that story for another day.
We hope you enjoyed this Special Draft premium content on Skytap’s Dev-Hops Podcast. Dev-Hops. Where innovation flows freely, and fresh ideas about software delivery are always on tap. Thanks for joining us.