Building a self-hosted Apache Kafka server on a Kubernetes cluster running on Red Hat servers is a robust and scalable solution. This guide will walk you through the process, focusing on best practices and leveraging Kubernetes’ strengths.
We’ll primarily use Strimzi, which is a Kubernetes Operator that simplifies the deployment and management of Kafka on Kubernetes. Strimzi is widely adopted and recommended for production Kafka deployments on Kubernetes.
Here’s a breakdown of the steps:
1. Prerequisites: Red Hat Servers & Kubernetes Cluster
Before deploying Kafka, you need a functioning Kubernetes cluster running on your Red Hat servers.
1.1 Red Hat Server Preparation
- Operating System: Red Hat Enterprise Linux (RHEL 8 or 9) or CentOS Stream 8/9.
- System Updates:
sudo dnf update -y
sudo reboot # If kernel updated
- Firewall: Ensure necessary ports are open for Kubernetes components (typical kubeadm ports are shown in the sketch after this list).
sudo systemctl enable --now firewalld
# Allow SSH
sudo firewall-cmd --permanent --add-service=ssh
# Allow KubeAPI server, Kubelet ports etc. (specific to your K8s setup)
# If using OpenShift, it manages its own firewall rules
sudo firewall-cmd --reload
- SELinux: SELinux can be tricky. While for a quick test you might disable it (setenforce 0 and edit /etc/selinux/config), for production it's best to configure proper policies or use an environment like OpenShift that handles container SELinux contexts.
- Swap: Kubernetes generally requires swap to be disabled.
sudo swapoff -a
# To make it persistent across reboots: comment out swap lines in /etc/fstab
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
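For a kubeadm-based cluster, the ports below are the ones upstream Kubernetes documents. This is a sketch to adapt to your topology; the control-plane ports go on the control-plane node, the NodePort range on workers:

sudo firewall-cmd --permanent --add-port=6443/tcp        # Kubernetes API server
sudo firewall-cmd --permanent --add-port=2379-2380/tcp   # etcd client/peer
sudo firewall-cmd --permanent --add-port=10250/tcp       # kubelet API (all nodes)
sudo firewall-cmd --permanent --add-port=10257/tcp       # kube-controller-manager
sudo firewall-cmd --permanent --add-port=10259/tcp       # kube-scheduler
sudo firewall-cmd --permanent --add-port=30000-32767/tcp # NodePort range (worker nodes)
sudo firewall-cmd --reload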
1.2 Kubernetes Cluster Setup on Red Hat
You have several options for setting up Kubernetes on RHEL:
Option A: OpenShift Container Platform (Recommended for RHEL Environments)
OpenShift is Red Hat’s enterprise-grade Kubernetes distribution, offering enhanced security, developer tools, and a more integrated experience.
- Advantages: Enterprise support, integrated registry, CI/CD, monitoring, advanced networking (OVN-Kubernetes), and built-in StorageClasses via OpenShift Container Storage (OCS, now OpenShift Data Foundation), which is Ceph-based.
- Installation: OpenShift installation is complex and often automated.
- Installer-Provisioned Infrastructure (IPI): The installer deploys the underlying infrastructure (e.g., on AWS, Azure, vSphere) and then OpenShift.
- User-Provisioned Infrastructure (UPI): You prepare the underlying RHEL VMs/bare metal, and then the OpenShift installer deploys OpenShift on top.
- Resources: Refer to the official Red Hat OpenShift documentation for detailed installation guides: OpenShift Docs
Option B: Kubeadm (Open Source Kubernetes)
If you prefer pure upstream Kubernetes:
- Container Runtime: Install containerd (recommended).
  - Refer to the Kubernetes documentation for containerd installation on RHEL/CentOS: Install containerd
- Kubeadm, Kubelet, Kubectl:
sudo dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
sudo systemctl enable --now kubelet
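Note: the install above assumes the Kubernetes package repository is already configured. A minimal sketch using the upstream pkgs.k8s.io repo (the v1.29 stream is an assumption; substitute your target release):

cat <<'EOF' | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF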
- Initialize Control Plane: On your master node:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 # For Flannel, adjust for others
# After init, follow the printed instructions to set up kubectl for the current user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
- Install CNI (Container Network Interface):
  - Flannel:
    kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
  - Calico: Refer to Calico docs.
- Join Worker Nodes: On worker nodes, run the kubeadm join command provided by kubeadm init (its general shape is shown in the sketch after this list).
- StorageClass: You'll need a PersistentVolume provisioner.
- Rook-Ceph: For highly available, distributed storage: Rook.io
- NFS CSI: If you have an NFS server: NFS CSI Driver
- Local Path Provisioner: For single-node or testing (not production): Local Path Provisioner
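The join command printed by kubeadm init has this general shape (placeholders only, not real values):

sudo kubeadm join <CONTROL_PLANE_IP>:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>
# If you lost the output, regenerate it on the control plane:
sudo kubeadm token create --print-join-command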
1.3 Essential Tools
- kubectl: For interacting with the Kubernetes cluster (installed with kubeadm).
- helm: For deploying Strimzi and Kafka.
  curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
  chmod 700 get_helm.sh
  ./get_helm.sh
- git: To clone Strimzi examples (optional, but helpful).
2. Deploying Strimzi Kafka Operator
Strimzi makes deploying and managing Kafka on Kubernetes easy by using the Operator pattern.
2.1 Add Strimzi Helm Repository
helm repo add strimzi https://strimzi.io/charts/
helm repo update
2.2 Install Strimzi Operator
The Strimzi Operator watches for Custom Resources (CRs) such as Kafka, KafkaTopic, and KafkaUser, and manages the actual Kafka and ZooKeeper deployments.
You can install it in a specific namespace (e.g., kafka-operator) or cluster-wide. For simplicity, let’s install it in its own namespace and grant it cluster-wide permissions to manage Kafka resources across all namespaces.
kubectl create namespace kafka-operator
helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
--namespace kafka-operator \
--set watchAnyNamespace=true \
--set createClusterRoles=true
- watchAnyNamespace=true: Allows the operator to manage Kafka clusters in any namespace.
- createClusterRoles=true: Ensures the necessary RBAC roles are created for cluster-wide management.
Verify Operator Deployment:
kubectl get pods -n kafka-operator
# You should see 'strimzi-cluster-operator-...' running
3. Deploying Kafka Cluster with Strimzi
Now that the operator is running, you can define your Kafka cluster using a Kafka Custom Resource.
3.1 Choose a StorageClass
Before defining Kafka, ensure you have a StorageClass defined in your Kubernetes cluster. Kafka brokers and ZooKeeper nodes need persistent storage.
- Check existing StorageClasses:
  kubectl get sc
- Identify a default or create one: On OpenShift, ocs-storagecluster-cephfs or ocs-storagecluster-ceph-rbd might be available. For kubeadm, if you installed Rook-Ceph, it will have rook-ceph-block or similar. If you're using local-path-provisioner for testing, its name is typically standard or local-path. You can also mark a class as the cluster default, as shown below.
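If the class you plan to use isn't the cluster default, marking it as such is a one-liner (sketch; replace the class name):

kubectl patch storageclass <YOUR_STORAGE_CLASS_NAME> -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'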
3.2 Create the Kafka Custom Resource (CR)
Create a YAML file (e.g., my-kafka-cluster.yaml) to define your Kafka cluster. This example uses three Kafka brokers and three ZooKeeper nodes, with SCRAM-SHA-512 authentication and TLS on the external listener.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
name: my-kafka-cluster
namespace: kafka
spec:
kafka:
version: 3.6.1 # Specify your desired Kafka version
replicas: 3
listeners:
# Plain listener for internal K8s communication
- name: plain
port: 9092
type: internal
tls: false
# TLS listener for internal K8s communication (recommended for production)
- name: tls
port: 9093
type: internal
tls: true
# External listener for clients outside the Kubernetes cluster
- name: external
port: 9094
type: route # Use 'route' for OpenShift, 'loadbalancer' or 'nodeport' for others
tls: true
authentication:
type: scram-sha-512 # Recommended authentication
    rack:
      topologyKey: topology.kubernetes.io/zone # Enables rack awareness; Strimzi sets broker.rack per broker from this node label
config:
offsets.topic.replication.factor: 3
transaction.state.log.replication.factor: 3
transaction.state.log.min.isr: 2
      log.message.format.version: "3.6"
      inter.broker.protocol.version: "3.6"
auto.create.topics.enable: "false" # Prevent accidental topic creation
storage:
type: persistent-claim
class: <YOUR_STORAGE_CLASS_NAME> # e.g., 'ocs-storagecluster-ceph-rbd', 'standard', 'rook-ceph-block'
size: 100Gi # Adjust size as needed per broker
deleteClaim: false # Set to true to delete PVCs when Kafka is deleted
resources:
requests:
memory: 4Gi
cpu: 2000m
limits:
memory: 4Gi
cpu: 2000m
zookeeper:
replicas: 3
storage:
type: persistent-claim
class: <YOUR_STORAGE_CLASS_NAME> # Same as Kafka storage class
size: 10Gi
deleteClaim: false
resources:
requests:
memory: 2Gi
cpu: 1000m
limits:
memory: 2Gi
cpu: 1000m
entityOperator:
topicOperator: {} # Manages Kafka topics via KafkaTopic CRs
userOperator: {} # Manages Kafka users via KafkaUser CRs
Key Configuration Details:
- apiVersion: kafka.strimzi.io/v1beta2: The Strimzi API version.
- metadata.name: my-kafka-cluster: Name of your Kafka cluster.
- namespace: kafka: Create this namespace: kubectl create namespace kafka.
- kafka.version: Specify your desired Kafka version.
- kafka.replicas: Number of Kafka brokers (typically 3 or more for production).
- listeners:
  - plain: For applications inside the Kubernetes cluster, often used for simplicity (e.g., testing or internal microservices).
  - tls: Also for applications inside the cluster, but with TLS encryption (recommended).
  - external: For applications outside the Kubernetes cluster.
    - type: route: For OpenShift, exposes via an OpenShift Route.
    - type: loadbalancer: If your Kubernetes cluster provides a cloud LoadBalancer (e.g., AWS, Azure, GCP). For on-prem, MetalLB can provide this.
    - type: nodeport: Exposes via a static port on each node. Clients need to know the node IPs and ports. Less ideal for production with multiple brokers.
  - authentication.type: scram-sha-512: Strongly recommended for external access. Strimzi will create secrets for users.
- rack.topologyKey: Enables rack awareness. Strimzi sets broker.rack for each broker from the node's topology.kubernetes.io/zone label, which also informs anti-affinity and zone-aware replica placement.
- kafka.config: Standard Kafka broker configurations. auto.create.topics.enable: "false" is a good production practice.
- kafka.storage: Define persistent storage for Kafka logs.
  - type: persistent-claim: Uses Kubernetes PVCs.
  - class: Crucial! Replace <YOUR_STORAGE_CLASS_NAME> with an available StorageClass.
  - size: Allocate enough storage.
  - deleteClaim: false: Prevents PVCs from being deleted if the Kafka CR is deleted (data safety).
- zookeeper: Similar configuration for ZooKeeper (or use KRaft if supported by your Kafka and Strimzi versions and you configure it).
- entityOperator: Strimzi's operators for managing Kafka topics and users via Kubernetes CRs.
3.3 Deploy the Kafka Cluster
kubectl create namespace kafka # If you haven't already
kubectl apply -f my-kafka-cluster.yaml
Verify Cluster Deployment:
It will take some time for the pods to spin up, PVCs to be provisioned, and Kafka to become ready.
kubectl get kafka -n kafka
kubectl get pods -n kafka
kubectl get pvc -n kafka
kubectl get svc -n kafka
kubectl get routes -n kafka # If using OpenShift and 'route' listener
Wait until all Kafka and ZooKeeper pods are in a Running state. The kubectl get kafka -n kafka command should show READY status.
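Rather than polling, you can block until the cluster reports Ready (the timeout value is arbitrary):

kubectl wait kafka/my-kafka-cluster --for=condition=Ready --timeout=600s -n kafka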
4. Managing Kafka Resources (Topics, Users)
Strimzi’s Entity Operator includes a Topic Operator and a User Operator to manage Kafka topics and users directly through Kubernetes Custom Resources.
4.1 Create a Kafka Topic
Create a KafkaTopic CR (e.g., my-topic.yaml):
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
name: my-first-topic
namespace: kafka
labels:
strimzi.io/cluster: my-kafka-cluster # Link to your Kafka cluster
spec:
partitions: 3
replicas: 3
config:
retention.ms: 604800000 # 7 days
segment.bytes: 1073741824 # 1GB
Apply it:
kubectl apply -f my-topic.yaml
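To confirm the Topic Operator reconciled the topic, check the CR status (the output shape below is approximate):

kubectl get kafkatopic my-first-topic -n kafka
# NAME             CLUSTER            PARTITIONS   REPLICATION FACTOR   READY
# my-first-topic   my-kafka-cluster   3            3                    True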
4.2 Create a Kafka User
Create a KafkaUser CR (e.g., my-user.yaml) to define a user with SCRAM-SHA-512 authentication and ACLs.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
name: my-app-user
namespace: kafka
labels:
strimzi.io/cluster: my-kafka-cluster # Link to your Kafka cluster
spec:
authentication:
type: scram-sha-512
authorization:
type: simple
acls:
# Allow producing to 'my-first-topic'
- resource:
type: topic
name: my-first-topic
patternType: literal
operation: Write
host: "*"
# Allow consuming from 'my-first-topic'
- resource:
type: topic
name: my-first-topic
patternType: literal
operation: Read
host: "*"
- resource:
type: topic
name: my-first-topic
patternType: literal
operation: Describe
host: "*"
# Required for consumer groups
- resource:
type: group
name: my-consumer-group
patternType: literal
operation: Read
host: "*"
Apply it:
kubectl apply -f my-user.yaml
Strimzi will create a Kubernetes Secret containing the username and password for this user.
kubectl get secret my-app-user -n kafka -o yaml
# You'll find the 'password' key in base64-encoded format; the username is the KafkaUser name, 'my-app-user'.
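Before wiring up external clients, you can smoke-test from inside the cluster over the plain internal listener (which has no authentication in this example). The image tag below is an assumption; match it to your Strimzi and Kafka versions:

kubectl -n kafka run kafka-producer -ti --rm=true --restart=Never \
  --image=quay.io/strimzi/kafka:0.39.0-kafka-3.6.1 \
  -- bin/kafka-console-producer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 --topic my-first-topic

In a second terminal, consume what you type:

kubectl -n kafka run kafka-consumer -ti --rm=true --restart=Never \
  --image=quay.io/strimzi/kafka:0.39.0-kafka-3.6.1 \
  -- bin/kafka-console-consumer.sh --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 --topic my-first-topic --from-beginning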
5. Connecting Clients to Kafka
The way clients connect depends on the external listener type you chose.
5.1 For OpenShift (type: route)
- Get the Bootstrap URL:
oc get route my-kafka-cluster-kafka-external-bootstrap -n kafka -o jsonpath='{.spec.host}{"\n"}'
# Example output: my-kafka-cluster-kafka-external-bootstrap-kafka.apps.mycluster.example.com
- Get User Credentials:
kubectl get secret my-app-user -n kafka -o jsonpath='{.data.password}' | base64 -d
# And the username is 'my-app-user'
- Get CA Certificate: Strimzi automatically generates TLS certificates. Clients need the CA cert to trust the Kafka brokers.
kubectl get secret my-kafka-cluster-cluster-ca-cert -n kafka -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt
- Client Configuration Example (Java/Spring Boot properties):
Note: For OpenShift Routes, the port is usually 443; the Route passes the TLS connection through to the brokers rather than terminating it.
spring.kafka.bootstrap-servers=<YOUR_BOOTSTRAP_URL>:443
spring.kafka.security.protocol=SASL_SSL
spring.kafka.ssl.trust-store-location=file:/path/to/ca.crt
spring.kafka.ssl.trust-store-type=PEM # Or JKS if you convert (see below)
spring.kafka.properties.sasl.mechanism=SCRAM-SHA-512
spring.kafka.properties.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="my-app-user" password="<YOUR_PASSWORD>";
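If your client stack expects a JKS truststore rather than PEM, one way to convert the CA cert (keytool ships with the JDK; the store password is your choice):

keytool -importcert -alias strimzi-cluster-ca -file ca.crt -keystore truststore.jks -storepass <STORE_PASSWORD> -noprompt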
5.2 For Kubeadm (type: loadbalancer or type: nodeport)
- LoadBalancer:
  - Get the External IP:
    kubectl get svc my-kafka-cluster-kafka-external-bootstrap -n kafka
  - Use that IP and port 9094.
- NodePort:
  - Get NodePort:
    kubectl get svc my-kafka-cluster-kafka-external-bootstrap -n kafka -o jsonpath='{.spec.ports[?(@.name=="external")].nodePort}{"\n"}'
  - Clients need any node IP and the nodePort. Strimzi exposes a single bootstrap service plus a separate NodePort per broker.
  - Bootstrap Servers: Use NODE_IP:NODE_PORT for the bootstrap server; the client will then discover all brokers on their respective NodePorts.
The authentication and TLS steps for getting the user password and CA certificate remain the same as for OpenShift (get from secrets my-app-user and my-kafka-cluster-cluster-ca-cert).
6. Monitoring and Management
- Kafka Exporter: Strimzi includes kafka-exporter, which provides Prometheus metrics for Kafka brokers and topics; it is enabled via the Kafka CR (see the sketch after this list). You can integrate this with Prometheus and Grafana for comprehensive monitoring.
- Kafka UIs:
- Kafka UI or AKHQ (formerly KafkaHQ): Popular web-based UIs for managing Kafka clusters. You can deploy one as another Kubernetes deployment.
- Control Center (Confluent): More feature-rich but requires a license for advanced features.
- Strimzi Documentation: The official Strimzi documentation is an excellent resource for advanced configurations, upgrades, and troubleshooting.
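For reference, Kafka Exporter is enabled by adding a kafkaExporter block to the Kafka CR. A minimal sketch (the catch-all regexes are illustrative; narrow them for large clusters):

spec:
  # ...existing kafka and zookeeper sections...
  kafkaExporter:
    topicRegex: ".*"  # export metrics for all topics
    groupRegex: ".*"  # export metrics for all consumer groups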
7. Security Best Practices
- TLS Everywhere: Always use TLS for communication between clients and brokers, and ideally for inter-broker communication too (Strimzi handles this by default).
- Authentication & Authorization: Use SCRAM-SHA-512 or mTLS for authentication and ACLs (configured via KafkaUser CRs) for fine-grained authorization.
- Network Policies: Implement Kubernetes Network Policies to restrict network access to your Kafka pods (a sketch follows this list).
- Resource Limits: Set CPU and memory limits/requests for Kafka and ZooKeeper pods to prevent resource contention.
- SELinux: If not using OpenShift, ensure your host SELinux policies are correctly configured to allow container operations without disabling it entirely.
- Secrets Management: Use Kubernetes Secrets to store sensitive data like user passwords. Strimzi does this automatically.
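Strimzi generates NetworkPolicies for its listeners, so the idiomatic way to restrict access is networkPolicyPeers on each listener in the Kafka CR. A sketch for the tls listener (the kafka-client namespace label is a hypothetical convention you would apply to trusted client namespaces):

    listeners:
      - name: tls
        port: 9093
        type: internal
        tls: true
        networkPolicyPeers:
          - namespaceSelector:
              matchLabels:
                kafka-client: "true"  # hypothetical label on trusted namespaces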
8. Scalability and Resilience
- Strimzi Scaling: You can scale your Kafka brokers by modifying the replicas field in the Kafka CR (e.g., the one-line patch after this list). Strimzi handles the rolling updates.
- Topic Partitions & Replicas: Design your topics with appropriate numbers of partitions and replication factors (e.g., 3 replicas for fault tolerance).
- Anti-Affinity: Strimzi automatically configures anti-affinity rules to ensure Kafka and ZooKeeper pods are spread across different nodes for high availability.
- Persistent Storage: Using reliable persistent storage (like Rook-Ceph or cloud block storage) is critical for data durability.
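For instance, scaling to five brokers can be a one-line patch. Note that Strimzi rolls the new brokers in, but existing partitions are not rebalanced onto them automatically; Strimzi's Cruise Control integration can handle that:

kubectl patch kafka my-kafka-cluster -n kafka --type merge -p '{"spec":{"kafka":{"replicas":5}}}'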
Conclusion
Building a self-hosted Kafka server on a Red Hat Kubernetes cluster with Strimzi provides a powerful, scalable, and manageable messaging platform. While the initial setup involves a few steps, Strimzi significantly simplifies the ongoing operations, making it an excellent choice for production environments. Always refer to the official Strimzi and Kubernetes documentation for the most up-to-date and detailed information.

