Mastering Apache Kafka on Kubernetes — Strimzi K8s operator

Arek Borucki
Beamery Hacking Talent
5 min read · Nov 25, 2021


In the article that follows, we present the first part of our blog series describing how we leverage the power of Kubernetes here at Beamery and how it helps us manage our datastore technologies at scale. Kubernetes, in its own words, is an open-source system for automating deployment, scaling and management of containerized applications. Applications can be stateless or stateful; an example of the latter, and a key focus of our post, is Apache Kafka.

Apache Kafka is a distributed data streaming platform that can publish, subscribe to, store and process streams of records in real time, and is designed to handle data streams from multiple sources and deliver them to multiple consumers. However, the power of Apache Kafka needs to be balanced against its operational difficulty: as a system, its full utilisation requires a lot of dedicated knowledge, time and effort. It was these pain points that led us to research and implement a more efficient way to manage our Kafka clusters.

Our chosen solution was to use Kubernetes Operators as a way to launch our stateful applications. Operators are the recommended way to run datastore technologies on Kubernetes clusters. An operator extends the Kubernetes API via technology-specific objects and orchestrates the management life cycle. One doesn't need deep technology-specific knowledge to start running a datastore on Kubernetes, as the operator takes care of implementation and management out of the box. Of the various flavours available, we settled upon and implemented the Strimzi Kafka operator.

Strimzi is an open-source project that provides container images and Kubernetes operators for running Apache Kafka on Kubernetes and Red Hat OpenShift. It supports Kafka Connect (which we require for integration with MongoDB; more on this in our next post), is simple to use, supports TLS, SCRAM-SHA authentication and certificate management, and streamlines Kafka upgrades by managing the rolling upgrade for the user. Strimzi provides, and documents, the following operators for managing a Kafka cluster running within a Kubernetes cluster:

  • Cluster Operator — Deploys and manages Apache Kafka and ZooKeeper clusters, Kafka Connect, Kafka MirrorMaker, Kafka Bridge, Kafka Exporter, and the Entity Operator
  • Entity Operator — Comprises the Topic Operator and User Operator
  • Topic Operator — Manages Kafka topics
  • User Operator — Manages Kafka users

The deployment of Kafka components to a Kubernetes cluster using Strimzi is highly configurable through the application of custom resources. Custom Resource Definitions (CRDs) extend the existing Kubernetes API and act as configuration instructions describing the custom resources available in a cluster; custom resources are then created as instances of the API they define. Strimzi provides CRDs for each Kafka component used in a deployment, as well as for users and topics. Custom resources and CRDs are defined as YAML files. CRDs require a one-time installation in a cluster to define the schemas used to instantiate and manage Strimzi-specific resources. A CRD defines a new resource `kind` within Kubernetes, such as `kind: Kafka` or `kind: KafkaTopic`.

A visual representation of our deployment

To begin, we install Strimzi with all required operators via `kubectl create`. This allows us to start using the new API objects to manage our Kafka deployments.
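A minimal sketch of that step, following the Strimzi quickstart (the `kafka` namespace is our choice, and in production you would likely pin a specific Strimzi version rather than `latest`):

```shell
# Create a namespace for the operator and the Kafka clusters it will manage
kubectl create namespace kafka

# Install the Strimzi CRDs, RBAC resources and the Cluster Operator deployment
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
```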

Looking at what we deployed:
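A quick check of the namespace (pod names and replica counts vary with the Strimzi release):

```shell
kubectl get deployments -n kafka
kubectl get pods -n kafka
```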

We created two operator pods. These pods run the Strimzi operators, while the CRDs installed alongside them extend the native Kubernetes API with new Kafka-related `kind`s such as `Kafka`, `KafkaTopic`, `KafkaUser`, `KafkaConnect` and `KafkaConnector`.
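The full set can be listed directly, since each new kind is backed by a CRD whose name ends in `strimzi.io`:

```shell
kubectl get crds | grep strimzi.io
```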

These new objects allow us to take advantage of those CRDs and build new clusters.

The `Kafka` custom resource (an example, adapted from the Strimzi GitHub repository, is shown below) orchestrates the Kafka and ZooKeeper installation and management process, including the persistent storage element of the Kafka brokers (a StatefulSet). As users, all we need to describe is the desired state of our Kafka cluster; Kubernetes, with the help of the operator, will build it for us.
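A sketch of such a resource (the cluster name, Kafka version, replica counts and storage sizes are illustrative and should be adjusted to your own needs):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.0.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
    storage:
      type: persistent-claim   # persistent volumes backing the brokers
      size: 100Gi
      deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:              # deploys the Topic and User Operators
    topicOperator: {}
    userOperator: {}
```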

Using `kubectl apply`, our new Kafka cluster will be installed on Kubernetes via Strimzi within minutes.
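Assuming the manifest above was saved as `kafka-cluster.yaml` (the file name is ours), something like:

```shell
kubectl apply -f kafka-cluster.yaml -n kafka

# Optionally wait until the Cluster Operator reports the cluster as Ready
kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n kafka
```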

Looking at the results:
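For example (pod names follow Strimzi's `<cluster-name>-<component>` convention, so the exact names depend on the cluster name chosen above):

```shell
kubectl get kafka -n kafka
kubectl get pods -n kafka -l strimzi.io/cluster=my-cluster
```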

The Kafka cluster is up and running (together with ZooKeeper) and our application can take advantage of this cluster immediately. The installation and configuration process has been orchestrated by Strimzi, which then continually supervises the state of the cluster.

The `KafkaTopic` custom resource (an example, adapted from the Strimzi GitHub repository, is shown below) is used to configure topics, including the number of partitions and replicas. When we create, modify or delete a topic using the `KafkaTopic` resource, the Topic Operator ensures those changes are reflected in the Kafka cluster.
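A sketch of a `KafkaTopic` resource (topic name, partition count, replication factor and config values are illustrative; the `strimzi.io/cluster` label ties the topic to a specific Kafka cluster):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster   # the Kafka cluster this topic belongs to
spec:
  partitions: 3
  replicas: 3
  config:
    retention.ms: 604800000          # keep messages for 7 days
    segment.bytes: 1073741824        # 1 GiB log segments
```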

The `KafkaUser` custom resource (an example, adapted from the Strimzi GitHub repository, is shown below) allows us to declare a Kafka user as part of the application deployment. We can specify the `authentication` and `authorization` mechanisms for the user and configure quotas that control their usage of Kafka.
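A sketch of a `KafkaUser` resource (the user name, ACLs and quota values are illustrative):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: my-user
  labels:
    strimzi.io/cluster: my-cluster   # the Kafka cluster this user belongs to
spec:
  authentication:
    type: scram-sha-512              # credentials are generated into a Secret
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: my-topic
          patternType: literal
        operation: Read
      - resource:
          type: topic
          name: my-topic
          patternType: literal
        operation: Write
  quotas:
    producerByteRate: 1048576        # ~1 MiB/s per client
    consumerByteRate: 2097152        # ~2 MiB/s per client
```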

Strimzi's custom resources are managed with standard `kubectl` commands, which makes day-to-day inspection straightforward. Below we demonstrate some we find particularly useful (we assume throughout that the cluster lives in the `kafka` namespace).

Display the Strimzi cluster (Kafka + ZooKeeper)
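The `kafka` resource represents the whole cluster (brokers plus ZooKeeper):

```shell
kubectl get kafka -n kafka
```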

Display Kafka topics
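Topics managed by the Topic Operator appear as `kafkatopics`:

```shell
kubectl get kafkatopics -n kafka
```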

Display KafkaConnect resources
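Kafka Connect clusters are a resource of their own:

```shell
kubectl get kafkaconnect -n kafka
```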

Display KafkaConnector resources
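As are the individual connectors running inside them:

```shell
kubectl get kafkaconnectors -n kafka
```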

And to conclude, the full list of Strimzi resource types, their kinds and their short names can be pulled straight from the cluster:
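A sketch of the command (the exact set of resources depends on the Strimzi version installed):

```shell
kubectl api-resources --api-group=kafka.strimzi.io
```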

Thank you for reading; we hope you enjoyed this first part. In the next part, we describe our use case in detail: data replication between MongoDB and Cloud Spanner via Strimzi-managed Kafka, with the help of the MongoDB Kafka Connector and KEDA (Kubernetes Event-driven Autoscaling).

Want to work alongside some great humans and inspiring leaders to solve big problems? We’re hiring! Click here to join the #BeamTeam and change the future of work.
