The temporary suspension of trading on the NYSE this past week made me think of May 6, 2010, the day of the infamous “Flash Crash” in which US equity markets went into an inconceivable tailspin. I was working at the New York Stock Exchange at the time and distinctly remember the speculation as to what was causing such an incredible volume of absurd trade orders to be submitted to the various US equity exchanges. To this day there are several theories as to what precipitated the Flash Crash, though many experts now agree that there was no single root cause but rather a confluence of disparate events.
Regardless of cause, the Flash Crash presented some unique challenges for all US equity exchanges due to the sheer volume of orders and trading that day as well as the large number of cancelled trades. At that time, I was responsible for NYSE historical data products, which are essential to customers who, for example, employ traditional technical analysis or develop highly complex trading algorithms.
Because my focus was on historical data – as opposed to real-time data used to actively trade throughout the day – I worked exclusively with teams managing post-trade systems. Managing data for such an atypical trading day required not just diligent effort from the operations and development teams responsible for post-trade data at NYSE, but also the creative engineering and ultimate production support provided by those same teams.
Today, we would call this DevOps: the “practice of operations and development engineers participating together in the entire service lifecycle, from design through the development process to production support.” Following the Flash Crash, the post-trade business, development, and operations teams involved took responsibility for proposing creative solutions, designing and developing those solutions, and then supporting the implementation. This is the power of DevOps. By taking a “whole team” approach, everyone has a stake in success and rightfully a sense of pride when solutions come together.
NYSE has reported that the outage this week was due to a “configuration issue.” In the coming days and weeks, I’m sure we’ll hear from many armchair sys-admins, operational and software engineers regarding NYSE’s outage. Given the strict regulation under which NYSE operates, more details on the outage are very likely to emerge.
But the recent NYSE outage, like the 2013 NASDAQ outage, is instructive in two ways. First, technology, no matter how well designed, supported, and deployed, will have problems. Regardless of planning, redundancy, disaster recovery schemes, and any other measure used to approach 100% uptime, problems will appear. The goal, therefore, is to reduce the impact of technology problems by eliminating, as far as possible, points of failure within complex systems, and by predicting problems in safe dev/test environments before they can escape to production.
Second, development and operations teams must be well prepared, equipped, and jointly incentivized to address technology problems before they arise. We often attribute system failures to the introduction of a new piece of code or software version, but any aspect of the entire software/hardware stack could become suspect when system updates are made. The adoption of DevOps, along with a more agile approach to system operations that keeps up with the rate of change within software development, offers a powerful methodology and mindset that moves teams closer to this ideal.