SRE (site reliability engineering) is a field defined by its constant evolution – from Google’s in-house secret recipe to the hottest new practice for the biggest enterprises to a diverse and holistic mentality practiced by companies of all sizes.
In our State of SRE survey, we predicted that the skillsets and responsibilities of people in the SRE position would become more diverse in 2022. Indeed, we’ve seen SREs fill more roles beyond development and operations, with some SREs focusing entirely on process, strategy, or culture. This expansion has given the field even greater potential. We’re excited to speculate on what 2023 will bring for SRE.
1. Economic factors will force companies to look for more efficient ways of managing reliability
As the global economic situation weakens, organizations will have to learn to do more with fewer hires. Organizations in this position will prioritize SRE functions to ensure stability in the face of turmoil.
Consider some of the problems that could occur during the downturn:
- The disappearance of necessary tacit knowledge
- Fewer engineers on-call
- Decelerating development velocity requiring new prioritization
These problems, and others like them, are best addressed by SRE processes: breaking down silos of tacit knowledge, balancing on-call load with the help of deeper incident investigation, and aligning development goals with the highest customer impact. Organizations will find that investing in SRE skill sets and tools is a good use of their limited resources.
2. SRE will be valuable insurance for experimentation
Whether it’s AI assistance, VR immersion, or web3 decentralization, 2023 will continue to push organizations to adopt cutting-edge technology. It’s a challenge to guess which of these ideas will flourish and which will flounder, but either way, having a reliable foundation will be necessary. Adopting even the most successful new ideas at scale will bring new obstacles and types of incidents. These growing pains of new technologies will require new approaches.
As organizations experience these growing pains, they’ll turn to SRE to keep their customers happy while they adjust. Incident retrospectives can help teams handle new sources of incidents quickly, while a reliability mindset can keep customer happiness the number one priority.
3. A more holistic definition of reliability will emerge
Reliability is the subjective experience of users based on their expectations of the service. While this is a helpful way to align priorities with customer needs, 2023 will bring an even more holistic definition of reliability. Organizations will start thinking about the reliability of their system, not just in terms of their users’ experiences, but as a complete package covering everything starting from development ideation.
This new socio-technical definition of reliability will encompass the system’s health, the users’ expectations and experiences, and your team’s resilience in the face of adversity. As organizations face increasingly complex systems, greater user reliance, and more strained personnel resources, a definition of reliability that addresses all of these challenges will become necessary.
4. Reliability will be a growing priority for teams outside of engineering
SRE was already expanding beyond development roles, but with this new socio-technical definition of reliability, teams entirely outside engineering will prioritize reliability in their processes and culture.
Consider a customer-facing team like sales. When landing new clients, the team must ensure continuity even if people are out of the office. They need consistency in their messaging and engagement to prevent any deals from falling through the cracks. They must manage unplanned work interfering with planned work, just like engineers dealing with incidents. The reliability mindset, along with processes and tooling, is the best way for these teams to uplevel these skills.
5. Organizations will confront the build vs. buy dilemma
In our digital-first era, when a consistently available service is a baseline customer expectation, teams of all sizes need a process to handle incidents and downtime efficiently. Before, many teams got away with ad-hoc, improvised processes scattered across shared docs. These homegrown processes may suffice to handle occasional incidents, but we predict that in 2023, organizations will hit their limitations.
With limited budgets and growing expectations, companies must diligently evaluate their incident management solution. Investing in a vendor’s solution comes with upfront costs, but maintaining a homegrown solution carries an ongoing cost in engineering time and, therefore, in money. A purpose-built incident solution allows you to improve after each incident, continually reducing your downtime even as your service grows in complexity. A solution that’s “good enough” is no longer good enough.
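To make the trade-off concrete, here is a minimal sketch of a break-even calculation. It assumes a simplified model: the vendor option has a one-time upfront cost plus a flat monthly fee, and the homegrown option costs a fixed number of engineering hours per month. The function name, parameters, and all figures are hypothetical and chosen only for illustration; they are not drawn from any survey or vendor pricing.

```python
from typing import Optional


def break_even_month(
    vendor_upfront: float,
    vendor_monthly: float,
    homegrown_hours_per_month: float,
    hourly_cost: float,
    horizon_months: int = 60,
) -> Optional[int]:
    """Return the first month in which the cumulative vendor cost is no
    higher than the cumulative homegrown cost, or None if that never
    happens within the horizon. All inputs are illustrative assumptions."""
    # Ongoing cost of maintaining the homegrown process, in money.
    homegrown_monthly = homegrown_hours_per_month * hourly_cost
    for month in range(1, horizon_months + 1):
        vendor_total = vendor_upfront + vendor_monthly * month
        homegrown_total = homegrown_monthly * month
        if vendor_total <= homegrown_total:
            return month
    return None


# Hypothetical figures: 20,000 upfront and 1,000 per month for the vendor,
# versus 40 engineering hours per month at 75 per hour for the homegrown option.
print(break_even_month(20_000, 1_000, 40, 75))
```

In a model like this, the answer depends almost entirely on how honestly the maintenance hours are estimated. It also leaves out the cost of downtime itself, which a real evaluation would need to include.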
Transformations can happen every year in a relatively new and quickly growing practice like SRE. It’s the thrill of being on the cutting edge.