The development team finished the product 3 weeks ago, but it’s still not in production. You released a new feature and half the website went down. A significant amount of your day is spent dealing with production outages or issues. No matter what your role is, if you work in a technology organization, chances are you’ve heard the term DevOps. If any of the above problems sound familiar, chances are your company could benefit from adopting DevOps practices.
Before we dive deeper into other common problems that DevOps tries to solve, let’s start with a brief history of what DevOps is and where it originated. If you’re looking for a more in-depth understanding of DevOps philosophies, keep an eye out for Part 2 of this series, where we’ll explore the key pillars of DevOps. This article focuses on the common problems that DevOps aims to solve.
What is DevOps?
DevOps is a philosophy that arose circa 2009, following the success of other frameworks for optimizing software delivery such as Agile, Scrum, and Extreme Programming (XP). All of these frameworks share a similar goal: developing higher-quality software faster. While these earlier frameworks concentrate mostly on the development process, DevOps takes a more holistic approach by applying its philosophies to the entire value stream. This includes everything from development to QA to production releases and beyond. Let’s explore some of the leading indicators that your organization could benefit from DevOps.
Common Problems DevOps Aims to Solve
Value-Providing Features Take Too Long to Reach Production
This problem can take many forms, but it can be summed up by saying that it takes too long for new value-providing features to make their way to production. In other words, there is too much lead time in your processes, and you’d like to reduce your time to market. Common scenarios that impact lead time include:
- Unplanned Work. The agile software development process is well defined and works great when things run smoothly. However, you may find that a large portion of your team’s time is spent dealing with unplanned work. This work may come in the form of production issues, tech debt, or inefficiencies in the deployment pipeline.
- Red Tape. If it takes more than a couple of minutes for a developer’s code to make its way to production, chances are you need to evaluate your deployment pipeline. Overburdened change management processes riddled with multiple handoffs and gatekeepers that take days, weeks, or even months are a key indication that something has gone wrong. These handoffs usually involve significant manual quality assurance (QA) and user acceptance testing (UAT). If bugs are found during the process, developers are asked to switch context to revisit a problem they haven’t worked on in days, weeks, or months. This creates more Unplanned Work and is detrimental to productivity.
- Lacking Access to Production-like Environments. We’ve all been part of conversations where code fails to work as expected during deployment and the developer casually mentions that it works on their machine. If developers cannot test their code in a production-like environment early and often, these types of delays will be abundant when it comes time to release your new software or feature. This leads back to Unplanned Work, as the development team must switch contexts again to solve the issue that is currently delaying the release. Ideally, production-like environments can be created on demand in minutes. Unfortunately, many organizations still require weeks of lead time to create adequate test environments for development teams.
Absence of Production Metrics
One strong indicator that you lack production metrics is that users are the ones telling you about software problems. If help desk tickets are the norm for detecting production issues, your software is most likely frustrating users on a daily basis. In today’s world, users expect software and the internet to just work, and many will not take the time to file a help desk ticket when something breaks. There are many cases of organizations missing out on tens of thousands of dollars in revenue because of a bug they didn’t know about. Mature organizations have found ways to monitor user activity in production and detect anomalous behavior before users report it. This allows the company to proactively fix problems before many users are even aware there is an issue. Even if you aren’t able to fix the problem immediately, you can signal to users that a fix is in progress.
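To make "detecting anomalous behavior before users report it" a little more concrete, here is a minimal sketch that flags an error-rate sample sitting far above its recent baseline. It assumes you already collect a per-minute error rate (failed requests divided by total requests); the function name `is_anomalous` and the default threshold of three standard deviations are hypothetical choices for illustration. Real monitoring systems typically use more robust techniques, so treat this as an illustration of the idea rather than a recommended alerting rule.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """Return True if `current` is more than `threshold` standard deviations above the baseline.

    `history` holds recent samples of a metric (here, a hypothetical per-minute error rate).
    """
    if len(history) < 2:
        return False  # not enough data to establish a baseline yet
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return current > baseline  # flat history: any increase stands out
    return (current - baseline) / spread > threshold


# Error rates for the previous ten minutes, one sample per minute (illustrative values).
recent_error_rates = [0.010, 0.012, 0.009, 0.011, 0.010, 0.013, 0.008, 0.011, 0.010, 0.012]
print(is_anomalous(recent_error_rates, 0.011))  # False: within normal variation
print(is_anomalous(recent_error_rates, 0.080))  # True: a spike worth alerting on
```

An alert like this fires as soon as the error rate jumps, which is usually well before the first help desk ticket arrives.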
Tracking errors in production is important, but errors are only one of the four golden signals. The other three are throughput (also called traffic), latency, and saturation. Armed with these metrics, you can make critical decisions about the health of your production system and thereby improve the experience for the end user. For example, high latency means everything the user does takes too long, which leads to an unsatisfactory experience. Measuring throughput helps you understand how many users are using your application at any given time and whether your infrastructure can support that load. Finally, saturation measures how much of your infrastructure’s capacity is being utilized; this is important for optimizing costs within your organization.
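As a rough illustration of how the four golden signals relate to raw production data, the sketch below summarizes one time window of requests. It assumes each request record carries a latency and a success flag, and it uses CPU usage as the saturation resource; the `Request` type, the `golden_signals` function, and the choice of the 95th percentile for latency are all hypothetical. Real systems usually compute these in a metrics backend rather than in application code.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Request:
    latency_ms: float  # time taken to serve the request
    ok: bool           # False if the request failed (e.g. returned an HTTP 5xx)


def golden_signals(requests: list[Request], window_seconds: float,
                   cpu_used: float, cpu_capacity: float) -> dict[str, float]:
    """Summarize one time window of traffic as the four golden signals.

    Assumes at least two requests in the window and uses CPU as the
    (hypothetical) saturation resource.
    """
    if len(requests) < 2:
        raise ValueError("need at least two requests to estimate a percentile")
    latencies = [r.latency_ms for r in requests]
    return {
        # Latency: 95th-percentile duration, which captures the slow tail users feel.
        "latency_p95_ms": quantiles(latencies, n=100, method="inclusive")[94],
        # Traffic / throughput: requests served per second in this window.
        "throughput_rps": len(requests) / window_seconds,
        # Errors: fraction of requests that failed.
        "error_rate": sum(not r.ok for r in requests) / len(requests),
        # Saturation: share of available capacity currently in use.
        "saturation": cpu_used / cpu_capacity,
    }


# Twenty requests over ten seconds, two of which failed (illustrative values).
sample = [Request(latency_ms=10.0 * (i + 1), ok=i not in (3, 7)) for i in range(20)]
print(golden_signals(sample, window_seconds=10.0, cpu_used=3.0, cpu_capacity=4.0))
```

A percentile is used for latency rather than an average because a handful of very slow requests can ruin the experience for some users while barely moving the mean.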
Deploying to Production is Scary
It’s that time of the week / month / quarter / year again when you deploy the new features your development team has been diligently working on for the past several sprints. To minimize risk, you’ve calculated when your website sees the least traffic. All the relevant parties are in the room: operations engineers, QA, developers, and product managers. There is excitement in the air, but also quite a bit of apprehension. These production deployments don’t usually go well and often spiral into hours of overtime and late nights. The operations team starts the deployment process, and within a few minutes you begin to see the warning signs that things aren’t going well. Help desk tickets start to trickle in as things break. Unfortunately, there is no way to reverse this process once it has started, so the only path is forward. You settle in for the long haul, where tempers will run short and emotions high as everyone points fingers at each other. This is a common scenario in organizations that have not adopted DevOps principles; many of these organizations don’t recognize that there is a better way.
Inability to Scale to Demand
Companies often have peak traffic times for their applications. For a ticket sales website, this is when a new event goes on sale. For retail companies, this happens over the holidays and may peak on Black Friday. Many organizations spend a lot of time and money preparing for these peak times. In a DevOps environment, scaling applications to handle a larger workload can be as easy as pressing a button. In sophisticated organizations that run on public or private clouds, scaling often happens automatically based on traffic. Automating these types of tasks frees up valuable resources to work on projects that deliver business value instead of maintaining infrastructure.
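Automatic scaling based on traffic usually comes down to a control loop that compares observed load with a target and adjusts capacity. The sketch below shows a proportional rule similar in spirit to the formula documented for Kubernetes’ Horizontal Pod Autoscaler (desired replicas = ceil(current replicas × current metric / target metric)). The function name `desired_replicas`, the load figures, and the minimum and maximum bounds are hypothetical; real autoscalers also add tolerances and cooldown periods so the replica count does not oscillate.

```python
import math


def desired_replicas(current_replicas: int, load_per_replica: float,
                     target_load_per_replica: float,
                     min_replicas: int = 2, max_replicas: int = 20) -> int:
    """Return how many replicas to run so each one sits near the target load.

    Scales proportionally: if each replica handles 1.5x its target load,
    ask for 1.5x as many replicas (rounded up), then clamp to the allowed range.
    """
    raw = math.ceil(current_replicas * load_per_replica / target_load_per_replica)
    return max(min_replicas, min(max_replicas, raw))


# Traffic spike: 4 replicas each serving 150 req/s against a 100 req/s target.
print(desired_replicas(4, 150.0, 100.0))   # 6
# Quiet period: load drops, but never scale below the minimum.
print(desired_replicas(4, 30.0, 100.0))    # 2
# Extreme peak (think Black Friday): the maximum caps infrastructure spend.
print(desired_replicas(10, 400.0, 100.0))  # 20
```

The minimum keeps some headroom for sudden bursts, while the maximum puts a ceiling on cost if traffic (or a bug) drives load unexpectedly high.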
If your organization is suffering from any of the above, it is likely you could benefit from implementing DevOps practices. DevOps can significantly improve your engineering team’s ability to adapt to market changes and release the features that give you an edge over your competition. Adopting a DevOps culture improves company morale and stops the blame game when it comes to effectively releasing software.