A principal promise of Kubernetes is that it will automate much of the operational toil that comes with modern applications and infrastructure. Kubernetes Operators can be thought of as a means of delivering on that promise.
What is a Kubernetes Operator?
Let’s back up for a moment to define what a Kubernetes Operator is and give an example of how one works.
“Operators are clients of the Kubernetes API that control custom resources,” says Matthew Dresden, director of DevOps at Nexient. “This capability enables automation of tasks like deployments, backups, and upgrades by watching events without editing Kubernetes code.”
That’s no small capability, especially when it comes to running containerized workloads and services: This is a path toward more fully automating an application running on Kubernetes.
As Red Hat product manager Rob Szumski notes in a blog, “The key attribute of an Operator is the active, ongoing management of the application, including failover, backups, upgrades, and autoscaling, just like a cloud service. Of course, if your app doesn’t store stateful data, a backup might not be applicable to you, but log processing or alerting might be important. The important user experience that the Operator model aims for is getting that cloud-like, self-managing experience with knowledge baked in from the experts.”
If you can’t fully automate, you’re undermining the potential of containers and other cloud-native technologies.
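To make Dresden’s definition a little more concrete, here is a minimal sketch in plain Python of the pattern he describes: a client receives watch events for a custom resource and works out what to do to bring the running application in line with it. It does not talk to a real cluster. The `DatabaseSpec` and `ObservedState` classes, their fields, and the action strings are all hypothetical; only the event types (`ADDED`, `MODIFIED`, `DELETED`) mirror the ones the Kubernetes watch API uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatabaseSpec:
    """Desired state, as a user might declare it in a custom resource (hypothetical fields)."""
    replicas: int
    version: str


@dataclass
class ObservedState:
    """What is actually running right now (hypothetical fields)."""
    replicas: int
    version: str


def reconcile(spec: DatabaseSpec, observed: ObservedState) -> List[str]:
    """Compare desired and observed state; return the actions needed to converge."""
    actions = []
    if observed.replicas < spec.replicas:
        actions.append(f"scale up by {spec.replicas - observed.replicas}")
    elif observed.replicas > spec.replicas:
        actions.append(f"scale down by {observed.replicas - spec.replicas}")
    if observed.version != spec.version:
        # Back up before upgrading, as a careful human operator would.
        actions.append("take backup")
        actions.append(f"rolling upgrade {observed.version} -> {spec.version}")
    return actions


def handle_event(event: dict, observed: ObservedState) -> List[str]:
    """React to one watch event for the custom resource."""
    if event["type"] == "DELETED":
        return ["tear down database"]
    spec = DatabaseSpec(**event["object"]["spec"])
    return reconcile(spec, observed)


if __name__ == "__main__":
    event = {"type": "MODIFIED", "object": {"spec": {"replicas": 3, "version": "2.0"}}}
    for action in handle_event(event, ObservedState(replicas=2, version="1.9")):
        print(action)
```

A production Operator would typically be built with a client library or a framework rather than by hand, but the shape is the same: watch, compare, act.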
What is a Kubernetes Operator example?
Dresden from Nexient shares a scenario of how an Operator works: “For example, if Kubernetes detects the loss of a node, an Operator could automatically replicate data from another cluster node running in Kubernetes while maintaining a quorum to bring the cluster back to the desired level of parity.” It happens automatically. The example might be reasonably straightforward, but its implications are anything but.
“It’s hard to overstate the importance of Operators in full-blown production,” Dresden says. “Consider what it takes to manage a highly available, fault-tolerant, multi-region database that uses stateful sets and requires backups, rolling upgrades, and scaling. Now imagine running it in Kubernetes and trying to do all that while wrangling several hundred microservices, each dynamically scaling up and down in seconds.”
This is not child’s play. Operators, however, can help make this kind of work more manageable.
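Dresden’s node-loss scenario can be sketched as a small decision function. This is an illustration under stated assumptions, not how any particular Operator works: it assumes a simple majority quorum, and the `Member` class, `plan_recovery` function, and step strings are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    """One member of a replicated data cluster (hypothetical model)."""
    name: str
    healthy: bool


def quorum_size(cluster_size: int) -> int:
    """Majority quorum: more than half of the configured members."""
    return cluster_size // 2 + 1


def plan_recovery(members: List[Member]) -> List[str]:
    """Plan how to replace lost members, but only while a quorum survives."""
    healthy = [m for m in members if m.healthy]
    lost = [m for m in members if not m.healthy]
    if not lost:
        return []  # nothing to do
    if len(healthy) < quorum_size(len(members)):
        # Without a majority, automated repair is unsafe; hand off to a human.
        return ["quorum lost: escalate to a human operator"]
    source = healthy[0]  # any healthy member can seed the replacement
    steps = []
    for member in lost:
        steps.append(f"remove {member.name} from the cluster")
        steps.append(f"add replacement for {member.name}, replicating from {source.name}")
    return steps


if __name__ == "__main__":
    cluster = [Member("db-0", True), Member("db-1", False), Member("db-2", True)]
    for step in plan_recovery(cluster):
        print(step)
```

The quorum check is the design choice that matters here: the sketch repairs the cluster on its own only when a majority of members is still healthy, and otherwise stops and asks for help.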
Kubernetes Operators: What IT leaders should know
“It’s a daunting prospect for even the best operations teams,” Dresden says. “Operators make large-scale Kubernetes deployments practical by automating difficult, error-prone work that humans – the original ‘operators’ – would otherwise need to carry out.”
If you want more definitions of Kubernetes Operators and a deeper dive into their history, check out our primer, How to explain Kubernetes Operators in plain English. Today, we’re here to share four important things IT leaders should know about Operators as their usage grows in concert with overall Kubernetes adoption.
1. Kubernetes Operators are increasing in quantity and popularity
As Kubernetes environments grow, so too does the interest in Operators. CoreOS first introduced Operators back in 2016, and they got a big boost with the launch of the Operator Framework in March 2018. (Red Hat acquired CoreOS in January 2018, expanding the capabilities of the OpenShift container platform.)
There’s been a noticeable bump in the interest in and implementation of Operators of late, according to Liz Rice, VP of open source engineering at Aqua Security. Rice also chairs the Cloud Native Computing Foundation’s technical oversight committee.
“At the CNCF, we’re seeing interest in projects related to managing and discovering Kubernetes Operators, as well as observing an explosion in the number of Operators being implemented,” Rice says. “Project maintainers and vendors are building Operators to make it easier for people to use their projects or products within a Kubernetes deployment.”
This growing menu of Operators means there’s a need for a, well, menu. “This proliferation of Operators has created a gap for directories or discovery mechanisms to help people find and easily install what’s available,” Rice says.
The relatively new OperatorHub.io is one place where Kubernetes community members can find existing Operators or share their own. (Red Hat launched OperatorHub.io in conjunction with Amazon, Microsoft, and Google.)
2. Kubernetes Operators need ongoing attention
Even as you use Operators to more fully automate a Kubernetes application, you’ll need to keep tabs on them.
“It’s important to understand that – like Kubernetes itself is – Kubernetes Operators are constantly evolving,” says Ben Bromhead, CTO at Instaclustr. “Developer teams attempting to manage them in-house really must keep close track of Operator changes and proactively plan for upgrades. They will absolutely need to manually intervene when things go wrong since Kubernetes Operators can and will get into weird states that require hands-on attention.”
Bromhead recommends figuring out a plan for ongoing oversight of your Operators before you go gung-ho. Human expertise is still very much needed.
“This workflow should be figured out before putting it into production, or you could find yourself in trouble,” Bromhead says. “You still need to manage the Operator, ensuring it’s up-to-date, that it’s properly patched, et cetera. Don’t come in thinking that using Kubernetes Operators is a ‘set-it-and-forget’ type of situation.”
An Operator, in this sense, is like any other non-trivial application: “Operators can’t be written once and forgotten,” Dresden says. “Continuous improvement is a must.”
Moreover, Dresden points out that you’re automating complexity, not eliminating it. Things can still go wrong.
“Complexity doesn’t disappear just because it’s automated,” Dresden says. “Problems have a way of popping up at the worst possible times in ways that are often unexpected. As the level of automation increases, it can become impossible for a person to keep up with the state of everything where change is happening so fast.”
Comprehensive telemetry is key here, according to Dresden; otherwise, you’re going to find out about a bottlenecked service when customers complain, rather than by monitoring and mitigating such issues proactively. Dresden also emphasizes the importance of writing declarative Operators, which work toward a described end state, rather than making the “beginner’s mistake” of writing imperative Operators, which run a fixed series of steps. It may require a bit more time up-front, but it will be worth it.
“Operators can generate a huge boost in productivity, but only if both the providers and consumers of the managed service have visibility into what the Operators are doing,” Dresden says.
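One way to read Dresden’s distinction between declarative and imperative Operators is in terms of what happens when events are duplicated or missed. The sketch below is an interpretation, not Dresden’s own example; the function names and the `replica_lost` and `resync` event names are hypothetical. The imperative version runs a fixed step for each event it sees, while the declarative version treats every event only as a trigger to compare the actual state with the desired state.

```python
from typing import List


def run_imperative(running: int, events: List[str]) -> int:
    """Imperative style: each event triggers a fixed step, whatever the current state."""
    for event in events:
        if event == "replica_lost":
            running += 1  # "a replica died, so start one"
    return running


def run_declarative(running: int, events: List[str], desired: int) -> int:
    """Declarative style: any event is just a cue to converge on the desired state."""
    for _ in events:
        delta = desired - running  # how far we are from where we should be
        running += delta           # start or stop replicas to close the gap
    return running


if __name__ == "__main__":
    desired = 3
    duplicated = ["replica_lost", "replica_lost"]  # same loss reported twice
    print("imperative, duplicate event:", run_imperative(2, duplicated))
    print("declarative, duplicate event:", run_declarative(2, duplicated, desired))

    missed = ["resync"]  # the loss event never arrived; only a periodic resync did
    print("imperative, missed event:", run_imperative(2, missed))
    print("declarative, missed event:", run_declarative(2, missed, desired))
```

With two replicas running and three desired, the imperative version overshoots to four when the loss is reported twice and stays at two when the loss event is missed. The declarative version ends at three in both cases.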
Are Operators mainly for databases? No. Let’s dispel that and one other misconception: