I spoke on a panel about a month ago at a Mitchell developer’s conference, where someone asked how Mitchell avoids cloud cost overruns. I thought this was a good question, and, after reading a recent LinkedIn article on out-of-control cloud OpEx, I wanted to present my thoughts on the matter in a quick post.
Most cloud providers, such as AWS, do not provide the basic controls companies need to ensure predictable expenses. At Skytap, we believe it is the cloud provider’s responsibility to offer both visibility into usage and controls over it, so that companies can avoid moving their on-premises VM sprawl problem to the cloud.
In on-premises environments, teams typically camp on VMs because they were so difficult to acquire. Forms were filled out, justifications written, and approvals secured, all before the gear was ordered and shipped to the loading dock. Even with virtualization, these practices continued, and teams waited days or weeks for the resources they needed. It’s no wonder they were reluctant to give resources up, and this has led to astoundingly low utilization in on-premises development and testing environments.
In the case of cloud, resources have now become easily available. Self-service access to computing is one of the core tenets of the cloud, but sprawl still occurs for two primary reasons:
- It takes time to create and stitch together all of the components needed for a fully functional environment. Unless you have invested in scripting or infrastructure as code methods, you don’t want to get rid of the environment simply because it takes time to re-create.
- You simply forget you have resources running. In an on-premises lab, it does not cost an end user anything to keep their VMs running, but in the cloud, this is not the case. End users often forget that keeping things running all the time costs their company money.
Avoiding sprawl and runaway costs really comes down to a few key capabilities, which we offer with Skytap cloud:
- Ability to suspend an environment. By “suspend,” I mean “capture the in-memory running state of the application, just like it works on your laptop.” How many of you reboot your laptop on a regular basis? And yet, a full shutdown and restart is how most clouds work today. By suspending the environment, the resources running it can be released, and only storage costs accrue to the camped environment. If teams can quickly resume an environment and get back to the previous state without having to rebuild it from scratch, they are less likely to want to keep it running.
- User and Department Quotas. I know it sounds old-school and “ITish,” but simply put, if you can assign quotas to individuals at smaller companies, or to departments at larger ones, they will self-police their usage. To do this reasonably, you need coarse-grained controls like VM RAM concurrency or storage. In the case of AWS, the number of widgets you have access to is very high, so this would likely translate to some kind of dollar-based quota, but those are more difficult for end users to manage.
- Reporting and Notifications. Finally, you need good tools for reporting and notification on usage. If you can provide usage reporting based on departments, groups, projects, etc., you can ensure the right funds are being used appropriately. With notifications, you can avoid placing hard limits like quotas, but still be involved when usage reaches certain thresholds.
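To make the quota and notification ideas concrete, here is a minimal Python sketch of a coarse-grained RAM concurrency quota with a soft notification threshold. It is purely illustrative and is not how Skytap or any other provider implements these features. The `Quota` class, the `check_usage` and `can_start` functions, the 64 GB limit, and the 80% threshold are all hypothetical names and numbers chosen for the example. Only running VMs are counted, on the assumption that a suspended environment releases its RAM.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Quota:
    """Hypothetical coarse-grained limit for one user or department."""
    owner: str
    ram_gb_limit: float            # max total RAM of concurrently running VMs
    notify_fraction: float = 0.8   # assumed soft threshold for a notification


def check_usage(quota: Quota, running_vm_ram_gb: Sequence[float]) -> str:
    """Classify current usage as 'ok', 'notify', or 'over_quota'."""
    used = sum(running_vm_ram_gb)
    if used > quota.ram_gb_limit:
        return "over_quota"
    if used >= quota.notify_fraction * quota.ram_gb_limit:
        return "notify"
    return "ok"


def can_start(quota: Quota, running_vm_ram_gb: Sequence[float],
              new_vm_ram_gb: float) -> bool:
    """Hard-limit check: would starting one more VM stay within the quota?"""
    return sum(running_vm_ram_gb) + new_vm_ram_gb <= quota.ram_gb_limit


qa_team = Quota(owner="qa-team", ram_gb_limit=64)

# 24 GB running: well under the soft threshold.
print(check_usage(qa_team, [8, 16]))
# 56 GB running: past 80% of the limit, so someone gets notified.
print(check_usage(qa_team, [16, 16, 16, 8]))
# With 48 GB running, a 32 GB VM would exceed the quota and is refused.
print(can_start(qa_team, [32, 16], 32))
```

The two functions correspond to the two styles described above: `can_start` is a hard quota that blocks new usage, while the `"notify"` result from `check_usage` lets someone get involved at a threshold without blocking anyone.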
We encourage you to learn more about some of the features that make Skytap the global public cloud for running traditional enterprise applications.