Cloud Taxonomy for Non-Production Use Cases for the Cloud
The following are potential ways to use the cloud for non-production work.
The key is to quickly deliver the infrastructure needed for a specific need or task. Waiting weeks for infrastructure delivery should be considered an “anti-pattern,” since the cumulative waiting time over the course of a project can be considerable. Building internal resource delivery processes with slow delivery times goes against the concepts described in works such as “The Phoenix Project,” “The Goal,” and “Healthcare Digital Transformation.”
Here are a few ideas:
Developers need a way to create their own “environment” of components that represent the target system (see Gene Kim on “Access to Production Like Environments” for developers). Providing “representative” environments, instead of mock environments running on local workstations or laptops, is a key indicator of a team's ability to reliably prove out the many architectural assumptions being made in the project.
Classical QA Automation testing
Instead of sharing a limited number of environments that commonly develop “configuration drift” away from the current versions of components, QA should be able to easily create any number of QA environments: “QA-1”, “QA-2”, “QA-3”, ... “QA-n.” Leveraging cloud representations of the target system allows QA to destroy and rebuild the correct target environment from scratch within minutes or hours. No more scripting to back out changes or reset test data. No more “reset” scripts to return configurations to a starting state. The QA environment is completely ephemeral (short-lived) and may only exist for the duration of the test run. If tests fail, the entire environment is “saved” as a complete snapshot, aka a ‘Template,’ and attached to the defect report so that ENG can reconstitute it when diagnosing the problem. For the next test run, a completely new environment is generated from the Template, separate from the environments used in past test runs that may contain defects.
By leveraging cloud resources for testing, more tests can be run, potentially at a faster pace and with higher-quality results, which should reduce the number of defects that slip through to the final product. Reducing defects is also described in the “seven wastes” referenced previously.
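The create, test, snapshot-on-failure, destroy cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cloud provider is replaced by in-memory dictionaries, and every name here (`GOLDEN_TEMPLATE`, `run_test`, `live_environments`, `saved_snapshots`) is hypothetical rather than part of any real tool.

```python
import copy
import itertools

# Hypothetical golden Template describing the target system (an assumption).
GOLDEN_TEMPLATE = {"components": {"app": "1.4.2", "db": "12.3"}, "data": []}

_env_ids = itertools.count(1)
live_environments = {}  # stand-in for the cloud provider's inventory
saved_snapshots = {}    # snapshots that would be attached to defect reports


def run_test(test, template=GOLDEN_TEMPLATE):
    """Run one test in a brand-new environment, then destroy the environment."""
    name = f"QA-{next(_env_ids)}"
    env = copy.deepcopy(template)  # fresh copy, so nothing leaks between runs
    live_environments[name] = env
    snapshot = None
    try:
        test(env)
        passed = True
    except AssertionError:
        passed = False
        # Preserve the failed state exactly as the test left it.
        snapshot = copy.deepcopy(env)
        saved_snapshots[name] = snapshot
    finally:
        # The environment is ephemeral: it never outlives the test run.
        del live_environments[name]
    return {"env": name, "passed": passed, "snapshot": snapshot}
```

A failing test leaves behind only its snapshot; the next call to `run_test` starts again from an untouched copy of the Template, so no reset or back-out scripts are needed.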
Traditional Enterprise thinking historically creates a limited number of Integration test environments shared among many groups. Because of this, the Integration environment is often broken, misconfigured, out of date, stale, or unusable in some way. Environment “drift” becomes a barrier to doing regular testing.
Applying cloud thinking to the building of on-prem systems allows different Integration testing approaches to be used economically and efficiently. The cloud can create multiple Integration environments that are all “identical” based on the current target goal. R&D and ENG subgroups can each have their own dedicated Integration environment that combines work from multiple squads of the same discipline without colliding with other system components. For example, all the teams working on database changes can first integrate their work in a localized Integration testing environment. Once that succeeds, the combined modifications move up to the higher level where all system components are being combined.
Chaos Engineering, “What if we take server X offline?”
“Chaos Engineering for Traditional Applications” documents the justification and use cases where cloud-native Chaos Engineering theory can be applied to traditional applications running on-prem. Server consolidation projects have a clear need to apply “Chaos Testing” to the intermediate release candidates being built. Combining multiple systems down into a smaller number raises a new level of “What if XYZ happens?” questions. Some categories of problem areas that need “random failure testing” include:
- Low memory
- Not enough CPU
- Full disk volumes
- Low network bandwidth, high latency
- Hardware failures like a failed disk drive, failed server, disconnected network
And not so obvious ones could be:
- Database/server process down
- Microservice down
- Application code failure
- Expired Certificate(s)
And even less obvious:
- Is there sufficient monitoring, and have alarms been validated?
Each category of items requires the execution of multiple “experiments” to understand how the overall system reacts when a chaotic event is introduced into the system. The cloud can then be used to recover the system back to a stable state. Quick environment recovery allows for the execution of multiple, potentially destructive experiments.
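The experiment loop described above (start from a stable state, inject one chaotic event, observe, recover) might look like the following Python sketch. It is a toy model, not a real chaos tool: the system is a plain dictionary, the thresholds are invented, and all names (`BASELINE`, `EXPERIMENTS`, `alarms`, `monitoring_gaps`) are assumptions made for illustration. The certificate alarm is deliberately left out to show how an experiment can expose unvalidated monitoring.

```python
import copy

# Hypothetical healthy state of the system under test (values are invented).
BASELINE = {
    "free_memory_mb": 4096,
    "disk_used_pct": 40,
    "db_process_up": True,
    "cert_days_left": 90,
}


def alarms(state):
    """Return the alarms that the (simulated) monitoring would fire."""
    fired = []
    if state["free_memory_mb"] < 512:
        fired.append("low_memory")
    if state["disk_used_pct"] >= 95:
        fired.append("disk_full")
    if not state["db_process_up"]:
        fired.append("db_down")
    # No certificate check here on purpose: this is the monitoring gap.
    return fired


def inject_low_memory(state):
    state["free_memory_mb"] = 128


def inject_full_disk(state):
    state["disk_used_pct"] = 100


def inject_db_down(state):
    state["db_process_up"] = False


def inject_expired_cert(state):
    state["cert_days_left"] = 0


EXPERIMENTS = {
    "low_memory": inject_low_memory,
    "full_disk": inject_full_disk,
    "db_process_down": inject_db_down,
    "expired_certificate": inject_expired_cert,
}


def run_experiments(baseline, experiments):
    """Run each experiment against a fresh copy of the stable baseline."""
    results = {}
    for name, inject in experiments.items():
        state = copy.deepcopy(baseline)  # "recover" by rebuilding from baseline
        inject(state)
        results[name] = alarms(state)
    return results


def monitoring_gaps(results):
    """Experiments whose fault raised no alarm at all."""
    return [name for name, fired in results.items() if not fired]
```

Because every experiment starts from its own copy of the baseline, destructive faults never accumulate, which is the property that quick cloud recovery provides for real environments.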
Creating a consolidated server solution creates a whole new application architecture that did not exist before. This brings up the question of how Disaster Recovery (DR) will be implemented for the new system. The DR requirements for a consolidated system are even higher than those of the individual, disparate systems that existed before consolidation. The consolidated system creates an “all or nothing” approach to DR, since the DR mechanism now holds all of the previously individual components as a single unit of failure. Before consolidation, one of the individual components could fail without impacting the others. After consolidation, all components fail together. So a DR event may cause more “ripples in the pond” than it would have before consolidation.
But once again, a cloud-based thinking model allows for experimentation and trial and error during the design of the DR implementation. Viable approaches can be tested in production-like mock environments running in the cloud, which stand in for the traditional on-prem systems that will eventually replace them. The cloud becomes the DR “sandbox” so that the right approach can be validated without purchasing non-disposable fixed assets.
Need to train employees on a new application package where each user needs to create their own data?
For example, all students have directions like “Create a new customer with these unique parameters: Name, Date of Birth, Social Security Number, etc.” If all students try to create the same customer in the same data set, you will most likely have duplicate data collisions.
You can solve this problem by giving each student their own unique environment of data and application components so each student has their own “world” to work in. On traditional on-prem systems, this is very difficult. For the cloud, it is a piece of cake. You simply create a unique compute/data environment for each student. The students do their work, and when the class is over you delete their environments. If you have a “golden template” that is used as a starting point, resetting everything for the next class should be easy.
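As a rough Python sketch of the per-student approach, assuming the “golden template” can be modeled as a plain dictionary and each environment as a deep copy of it (all names here, such as `CLASS_TEMPLATE` and `create_class_environments`, are hypothetical):

```python
import copy

# Hypothetical golden template: an application data set with no customers yet.
CLASS_TEMPLATE = {"customers": []}


def create_class_environments(students, template):
    """Give every student a private copy of the golden template."""
    return {student: copy.deepcopy(template) for student in students}


def create_customer(env, name, date_of_birth):
    """Add a customer, rejecting duplicates within the same environment."""
    for customer in env["customers"]:
        if (customer["name"], customer["date_of_birth"]) == (name, date_of_birth):
            raise ValueError(f"duplicate customer: {name}")
    env["customers"].append({"name": name, "date_of_birth": date_of_birth})


def delete_class_environments(environments):
    """Tear everything down when the class is over."""
    environments.clear()
```

Two students can both follow the instruction to create the same customer because each works in a separate copy; the duplicate check only triggers inside a single environment. Calling `create_class_environments` again for the next class starts everyone from the unchanged template.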
There are various use cases for the cloud that do not technically fall into the category of “classical production”. Just some imagination and a little ingenuity are required to satisfy a lot of non-prod use cases.
Oh yeah, here is one more…Need to test your virus mitigation tools in a way that won’t risk infecting your actual production user base? … Use the cloud.