Most of us have heard about the crisis Southwest Airlines had over the holidays. Most articles cite “problems related to legacy systems…” and “outdated scheduling software called SkySolver.” Southwest received a lot of attention for this incident and has offered affected customers loyalty points for the resulting disruptions, but the legacy software problems it encountered are shared by companies across many industries, and they offer lessons for all of them.
The old saying “if it works, don’t mess with it…” is good advice in many situations in life, but not all. For example, if your car works fine, you still need to change the oil and do basic maintenance. If the roof of your house doesn’t leak, you still need to ensure your rain gutters are working and that moss is not growing on your shingles.
The same is true of IT systems. Because a system “works” today, you might not be inclined to do regular basic maintenance when more pressing projects are on the table. Or, if it is a legacy system, maybe you’ve lost the specialized IT skills to perform the work. “Don’t mess with it” works for a while, but if left long enough, a bigger problem happens. The system becomes “unfixable” or, at least, very difficult to fix.
Southwest is probably in the “very difficult to fix” bucket.
I worked on a similar project for a company in the healthcare industry. It had a very old legacy application that its business relied on every day. The application had actual “hard-coded” IP addresses in the source code, which the customer could not modify since its original software vendor was no longer around. Its data center was closing down, so there was no time to refactor, re-write, etc. It had to move to the cloud, but wanted to keep the legacy software exactly as it was. On top of that, it needed to perform an “incremental” migration to the cloud. That meant the network subnet with the hard-coded IP addresses had to simultaneously exist and be active on-prem and in the cloud. Not quite to the point of “unfixable,” but very close. Skytap was able to sort it out with some clever networking during the migration.
It was another example of an “it works, don’t mess with it” situation that got out of hand over time.
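To make the constraint concrete, here is a minimal sketch of the kind of sanity check an incremental migration like this implies: every hard-coded address has to sit inside the one subnet that spans both sites, and each address is active at exactly one site at a time. The subnet, the addresses, and the check_placement helper are all hypothetical and are not taken from the actual project.

```python
import ipaddress

# Hypothetical subnet for illustration; not the customer's real network.
STRETCHED_SUBNET = ipaddress.ip_network("10.20.30.0/24")
VALID_SITES = ("on-prem", "cloud")


def check_placement(placement):
    """Return a list of problems for an {address: site} mapping.

    Each hard-coded address must fall inside the stretched subnet and be
    assigned to exactly one known site. Using a dict keyed by address
    means an address cannot be listed at two sites at once.
    """
    problems = []
    for address, site in placement.items():
        if ipaddress.ip_address(address) not in STRETCHED_SUBNET:
            problems.append(f"{address} is outside {STRETCHED_SUBNET}")
        if site not in VALID_SITES:
            problems.append(f"{address} has unknown site {site!r}")
    return problems
```

During a migration wave, a mapping such as {"10.20.30.5": "on-prem", "10.20.30.6": "cloud"} would pass the check, since both hosts keep their original addresses while living at different sites.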
The healthcare company needed to “do something.” That ended up being a lift-and-shift to the cloud to get out of the immediate emergency and then, once things had stabilized, working out how to begin the modernization process over time. That is probably what Southwest should do as well.
Refactoring a legacy application will be time-consuming and expensive. The technical debt will also make it high risk if you think about the tentacles an older application will have with a large number of other older applications that make up a company’s entire app ecosystem.
Southwest’s stated multi-cloud strategy is undoubtedly reasonable for new applications or for applications where refactoring makes sense. However, for very high technical debt applications like SkySolver, Southwest should attempt a lower-risk approach based on lift-and-shift.
Sometimes antiquated legacy applications depend on older operating systems or other outdated software, which becomes the “anchor” for why the application is difficult to modernize. This problem can be addressed with multi-cloud technology like VMware or ESX-compatible hosting services. You run the same “older” guest operating system, but on modern hardware that has increased performance, reliability, and scalability. Cloud-native VMs don’t support ancient operating systems, but cloud services based on VMware can often run old x86 Windows or Windows Server operating systems. Most major cloud vendors can run VMware as a service or as a native hypervisor, so SkySolver running on modern VMware would become multi-cloud agnostic.
Now, for argument’s sake, suppose SkySolver has in fact been lifted and shifted to the cloud. Now what?
The airline would immediately reduce a core business risk if it could migrate SkySolver and then successfully cut over using a lift-and-shift cloud strategy. The process would be similar to a traditional DR (disaster recovery) scenario, where operations switch to the “secondary” site because the primary site was lost. The same style of architectural thinking could be used to move SkySolver from on-prem to the cloud.
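As a rough illustration of that DR-style thinking, the sketch below treats the cloud copy as the “secondary” site and only switches to it once it passes a health check. The Site class, the is_healthy probe, and cut_over are hypothetical names made up for this example; a real cutover would also involve data synchronization, DNS or network changes, and a rollback plan.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Site:
    name: str
    is_healthy: Callable[[], bool]  # hypothetical health probe for the site


def cut_over(active: Site, target: Site) -> Site:
    """Return the site that should serve traffic after a cutover attempt.

    Traffic moves to the target only if it reports healthy; otherwise the
    currently active site keeps serving, as in a failed DR rehearsal.
    """
    if target.is_healthy():
        return target
    return active
```

The design choice here is that the decision is reversible: calling cut_over again with the roles swapped gives the same protection on the way back.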
The last part of the plan for SkySolver, if it were now running “as-is” but in the cloud, would be to somehow isolate any interfaces to it so that digital transformation could happen. For this, Southwest should use the “Strangler Pattern,” in which new services are placed in front of the legacy application and take over its functions one at a time until little or nothing of the original remains. This approach provides a “safe path to the cloud” and allows for incremental transformation rather than a “big bang” cloud-native re-write that would be risky and costly. This type of strategy would provide Southwest with “multi-cloud” compatibility. It could pick and choose cloud services, implement them at its own pace, and incrementally and safely digitally transform SkySolver or other similar legacy applications.
Southwest’s recent challenges have definitely given organizations a lot to think about with respect to the unforeseen risk that legacy systems and applications pose.
Addendum (1/11/2023): This morning, an FAA systems outage caused massive flight delays across much of the US. The problem was attributed to “outdated computer systems…” So, as mentioned above, legacy software and technical debt have far-reaching consequences that go beyond any one company and are often prevalent throughout an entire industry. At Skytap, we have both ideas and advice on how your organization can move forward with legacy modernization to avoid a similar situation.
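For readers who have not seen the Strangler Pattern in practice, here is a minimal sketch of the idea: a facade sits in front of the legacy application, sends every request to it by default, and lets individual operations be moved to new services one at a time. The class, the handler functions, and the operation names are hypothetical and say nothing about how SkySolver is actually structured.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


def legacy_app(request: dict) -> str:
    """Stand-in for the unchanged legacy application (hypothetical)."""
    return f"legacy handled {request['operation']}"


class StranglerFacade:
    """Routes each request to a migrated handler if one exists, else to legacy."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, operation: str, handler: Handler) -> None:
        # Move one operation to a new service; everything else is untouched.
        self._migrated[operation] = handler

    def handle(self, request: dict) -> str:
        handler = self._migrated.get(request["operation"], self._legacy)
        return handler(request)
```

Callers only ever talk to the facade, so each call to migrate is a small, reversible step rather than a “big bang” switch: removing the entry from the routing table sends that operation back to the legacy application.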
Skytap team members that contributed to this blog:
Tony Perez – Cloud Solutions Architect at Skytap
Matthew Romero – Technical Product Marketing Manager at Skytap