In the world of modern data pipelines and microservices, Apache Kafka stands out as a go-to solution for handling real-time event streams and integrating distributed systems. As more engineering teams move toward scalable, event-driven architectures, Kafka becomes central to reliable data movement.
Alongside this trend, Docker has emerged as the leading way for developers to manage, share, and deploy complex services without worrying about inconsistent environments or local issues.
By using Docker to run Kafka, teams can quickly spin up production-like clusters for testing, proof-of-concept demos, or even production workloads.
This article is aimed at intermediate backend developers, DevOps engineers, and anyone involved in managing data platforms who wants to make working with Kafka straightforward and reproducible.
If you are new to Kafka or Docker, be sure to refer to our Introduction to Apache Kafka course or Docker for Beginners: A Practical Guide to Containers.
What is Kafka, and Why Use Docker?
Apache Kafka is a distributed streaming platform designed for high-throughput, fault-tolerant, and scalable messaging.
It acts as a durable, high-speed pipeline between data producers and consumers. Whenever you need to connect microservices, orchestrate data flows, or process real-time analytics, Kafka is often the tool of choice.
You can learn how to build real-time data processing applications with Kafka Streams with our Kafka Streams Tutorial. The tutorial covers core concepts, Java & Python implementations, and step-by-step examples for building scalable streaming applications.
Running Kafka can be complex. Even experienced engineers find the complexity of brokers, topics, partitions, and necessary components like Zookeeper or KRaft frustrating.
Docker simplifies the process by packaging up binaries, dependencies, and configurations into containers.
With Docker, engineers can start a Kafka cluster on any machine, share identical setups with their colleagues, and avoid the common "works on my machine" headache. Dockerized Kafka is especially popular for:
- Local development and testing, where fast spin-up and teardown are invaluable
- Isolated integration environments in CI/CD pipelines
- Mimicking multi-broker clusters on a single host
- Training and demonstration setups
Explore Apache Kafka with our Apache Kafka for Beginners guide. Learn the basics, get started, and uncover advanced features and real-world applications of this powerful event-streaming platform.
Fundamentals of Kafka in Docker
Before diving into orchestration, it helps to break down both Kafka’s technical pieces and how Docker helps recreate its distributed nature on a laptop or server.
As you will learn in our Learn Docker from Scratch tutorial, Docker is a popular tool for simplifying the deployment, scaling, and management of applications such as Kafka using containerization.
Kafka architecture basics
Kafka’s backbone is its broker: a server process that receives, stores, and serves messages. Each cluster can consist of multiple brokers, distributing data and load.
Inside each broker are topics (named channels for organizing messages) and partitions (sub-channels that spread events across many servers for parallelism and durability). Producers write data to topics, while consumers subscribe to and process those messages.
Coordination is key. Traditionally, Kafka has relied on Zookeeper to handle broker metadata, partition leadership, and overall cluster health. Newer deployments may use KRaft mode, which internalizes this coordination into Kafka itself, removing the Zookeeper dependency. ZooKeeper was deprecated in Kafka 3.5 and removed entirely in Kafka 4.0.
Learn how to containerize machine learning applications with Docker and Kubernetes using our How to Containerize an Application Using Docker tutorial.
KRaft mode: Simplifying Kafka's architecture
Starting with version 2.8, Kafka introduced KRaft mode, which lets it manage metadata internally without relying on Zookeeper; KRaft was declared production-ready in version 3.3. This shift simplifies deployments and reduces the number of components to manage.
To set up Kafka in KRaft mode using Docker Compose:
version: '3.8'
services:
  kafka:
    image: apache/kafka:latest
    container_name: kafka
    ports:
      - "9092:9092"
      - "9093:9093"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@localhost:9093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LOG_DIRS: /var/lib/kafka/data
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_LOG_RETENTION_HOURS: 168
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      CLUSTER_ID: "Mk3OEYBSD34fcwNTJENDM2Qk"
    volumes:
      - ./data:/var/lib/kafka/data
This configuration sets up a single-node Kafka cluster in KRaft mode, eliminating the need for Zookeeper.
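Once the container is up, you can quickly verify that the broker responds. This check assumes the apache/kafka image's standard script location under /opt/kafka/bin:
# List topics to confirm the broker is reachable (a fresh cluster has none yet)
docker exec -it kafka /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092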
Docker essentials
To run Kafka with Docker, you only need a few basics: Docker Engine (the runtime) and often Docker Compose for multi-container management.
Compose files allow you to specify several services (Kafka, Zookeeper, Kafka UI, etc.), their networks, environment variables, and any storage mounts. Docker’s networking makes it possible to emulate multi-broker clusters, even on a single machine, by giving each broker its own container, hostname, and ports.
One of the best ways to get familiar with Docker is through projects. Practice using these 10 Docker Project Ideas.
Setting Up Kafka with Docker Compose
Docker Compose is often the preferred way to run applications that require multiple interacting containers. Instead of juggling shell scripts or manual commands, Compose lets you define all services and their links in a single YAML file.
This approach reduces configuration drift, speeds up onboarding, and guarantees everyone can use the same stack for development.
Minimal Compose setup
Here’s a basic docker-compose.yml that starts Kafka and Zookeeper for local development:
version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
    ports:
      - "2181:2181"
  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    ports:
      - "9092:9092"
    volumes:
      - kafka_data:/var/lib/kafka/data
volumes:
  kafka_data:
This minimal setup maps the relevant ports, links Kafka to Zookeeper, and mounts a Docker volume, so messages do not vanish if the containers restart. Be sure to use environment variables to configure broker identity, listener addresses, and connection endpoints between services.
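To start the stack and confirm that both containers are healthy, the standard Compose workflow applies:
# Start Zookeeper and Kafka in the background
docker-compose up -d

# Check that both services are running
docker-compose ps

# Follow the broker logs while it starts up
docker-compose logs -f kafka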
A thorough understanding of Kafka is essential in acing a data engineering interview. Prepare for your next data engineering interview with our extensive list of Kafka interview questions and answers using these 20 Kafka Interview Questions for Data Engineers.
Extended configuration
Professional setups often add tools such as Kafka UI, Schema Registry, or REST Proxy for easier management and inspection of messages. These can be added to the services block in your compose file. For example:
  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    ports:
      - "8080:8080"
    environment:
      KAFKA_CLUSTERS_0_NAME: "Local"
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: "kafka:9092"
Persisting Kafka topics and logs across restarts is essential for testing recovery scenarios or long-running jobs. Always mount Docker volumes to /var/lib/kafka/data for each broker and /var/lib/zookeeper for Zookeeper, if in use.
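You can confirm where Docker stores this data on the host. Note that Compose prefixes volume names with the project name, so myproject_kafka_data below is illustrative:
# List volumes, then inspect the Kafka data volume
docker volume ls
docker volume inspect myproject_kafka_data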
Choosing the right image is key. Here is a quick summary:
| Kafka Docker Image | Maintainer | Features | Best Use Case |
| --- | --- | --- | --- |
| Confluent | Confluent | Full set, many add-ons | Production-like setups, advanced testing |
| Bitnami | Bitnami | Clean, minimal | Local development, resource-constrained environments |
| Apache Kafka (KRaft) | Apache | Zookeeper-less, simplified setup | Modern deployments, simplified architecture |
The How to Learn Apache Kafka in 2025 tutorial goes into more detail about Kafka, including:
- What makes Apache Kafka popular?
- Main features of Apache Kafka
- Various applications of Apache Kafka
Interacting with Kafka in Docker
Once Kafka is running via Docker Compose, you can immediately create topics and send data, just as you would with any standard deployment.
CLI access and Kafka commands
Use docker-compose exec or docker exec to run Kafka CLI tools. Note that the Confluent images ship the tools without the .sh suffix, while the Apache image uses the .sh scripts under /opt/kafka/bin. For example, to create a topic with the Confluent image from the Compose setup above, you can execute:
docker-compose exec kafka kafka-topics --create --topic demo --bootstrap-server localhost:9092
You can also use kafka-console-producer and kafka-console-consumer in the same way. This provides a fast feedback loop for integration testing or experimentation without polluting your local environment.
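For example, a quick round trip on the demo topic, run in two terminals:
# Terminal 1: produce messages interactively (Ctrl+C to exit)
docker-compose exec kafka kafka-console-producer --topic demo --bootstrap-server localhost:9092

# Terminal 2: consume everything from the beginning of the topic
docker-compose exec kafka kafka-console-consumer --topic demo --from-beginning --bootstrap-server localhost:9092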
Programmatic access from apps
Applications and scripts can connect to your containerized Kafka using the host’s advertised bootstrap servers. For local apps, set bootstrap.servers=localhost:9092 or the equivalent.
Common clients include the official Kafka libraries for Python, Java, Node.js, and Go. Make sure your application’s network stack can reach the advertised ports and addresses.
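Before debugging client code, a quick reachability check from the host can rule out port-mapping problems; this assumes the netcat (nc) utility is installed:
# Verify the mapped Kafka port is reachable from the host
nc -vz localhost 9092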
Docker Networking and Kafka Connectivity
Kafka's networking configuration frequently presents challenges. Understanding internal and external listeners and how to expose services is vital.
Internal vs external listeners
Kafka brokers use listeners to control client connections. Two common setups are:
- PLAINTEXT://:9092 for local, unsecured traffic, especially for dev
- SSL or SASL_SSL listeners for encrypted or authenticated connections
In docker-compose, expose the right ports and ensure KAFKA_ADVERTISED_LISTENERS matches the actual host and port your apps use. If running in a VM or cloud, set this configuration to the public IP and mapped port.
| Aspect | Internal Listener | External Listener |
| --- | --- | --- |
| Common Protocol | PLAINTEXT://:9092 | SSL://, SASL_SSL://, or mapped PLAINTEXT:// |
| Security | Unencrypted, unauthenticated | Encrypted and/or authenticated |
| Typical Usage | Local development, intra-container traffic | Remote clients, cloud VMs, production access |
| Docker Compose Setup | Expose port 9092 inside network | Map 9092 to host, set KAFKA_ADVERTISED_LISTENERS to public IP |
| Configuration Focus | Simplicity and speed | Reliability, security, public access |
| Networking Scope | Localhost or internal Docker network | Public IP or external-facing domain |
Configuring multiple listeners for diverse client access
In Docker environments, it's common to set up multiple listeners to handle different client access scenarios:
environment:
  KAFKA_LISTENERS: INTERNAL://0.0.0.0:29092,EXTERNAL://0.0.0.0:9092
  KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka:29092,EXTERNAL://localhost:9092
  KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
  KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
This setup allows internal Docker clients to connect via the INTERNAL listener and external clients to connect via the EXTERNAL listener.
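Assuming the Compose service is named kafka and uses the Confluent image, the same topic listing can be requested through each listener to confirm both paths work:
# Via the EXTERNAL listener, the address host-side apps use
docker-compose exec kafka kafka-topics --list --bootstrap-server localhost:9092

# Via the INTERNAL listener, using the Docker service DNS name
docker-compose exec kafka kafka-topics --list --bootstrap-server kafka:29092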
Troubleshooting connectivity
Common errors include ‘broker not available’, ‘connection refused’, or client timeouts. Verify:
- Ports are exposed and mapped correctly
- Broker advertised listeners match your client’s target address
- All containers are healthy, using docker-compose ps
- DNS resolution works between containers (use the service name, e.g., kafka:9092)
- Cross-container links resolve; docker network inspect helps debug them (see the commands below)
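A few commands cover most of these checks; the network name again depends on your Compose project, so myproject_default is a placeholder:
# Confirm container state and port mappings
docker-compose ps

# Inspect the Compose network to see which containers are attached
docker network inspect myproject_default

# Tail broker logs for bind or listener errors
docker-compose logs --tail=50 kafka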
Kafka Docker in Production-Like Environments
Beyond local testing, Dockerized Kafka environments also help with staging and CI/CD.
Container orchestration strategies
Orchestrators like Docker Swarm and Kubernetes can manage multi-broker Kafka setups, rolling updates, and service discovery. Each broker gets its own container, with attached persistent storage.
In Kubernetes, StatefulSets handle ordered deployment and create stable DNS names for each broker.
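As a rough sketch of the shape such a manifest takes (not a production-ready configuration: the image, replica count, and storage size are illustrative, broker env vars are omitted, and real deployments often use an operator such as Strimzi):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka      # a headless Service gives each pod a stable DNS name: kafka-0.kafka, kafka-1.kafka, ...
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: apache/kafka:latest   # illustrative; real clusters pin a version
          ports:
            - containerPort: 9092
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:    # one PersistentVolumeClaim per broker pod
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi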
Logging and monitoring
Direct Kafka logs to the Docker log driver or external storage for analysis. Many teams pipe logs to Elasticsearch, Loki, or Splunk for troubleshooting. Pair Dockerized Kafka with Prometheus and Grafana for cluster monitoring.
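At a minimum, you can cap local log growth with Docker’s logging options in the Compose file; a small sketch using the default json-file driver, to be merged into the kafka service definition:
  kafka:
    logging:
      driver: json-file
      options:
        max-size: "10m"   # rotate after 10 MB
        max-file: "3"     # keep at most three rotated files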
Optimization Tips for Kafka in Docker
Kafka’s performance depends on careful resource and storage management.
Resource allocation
Give each broker container enough CPU, RAM, and disk to mimic production as closely as possible. Tune Docker resource limits and pass JVM options with KAFKA_HEAP_OPTS (heap sizing) and KAFKA_JVM_PERFORMANCE_OPTS (garbage collector and other performance flags).
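A hedged Compose fragment illustrating the idea; the specific limits and JVM flags are examples to adapt, and deploy.resources limits require a recent Docker Compose (older versions used the mem_limit and cpus keys instead):
  kafka:
    deploy:
      resources:
        limits:
          cpus: "2"       # cap the broker at two CPUs
          memory: 4g      # and 4 GB of RAM
    environment:
      KAFKA_HEAP_OPTS: "-Xms2g -Xmx2g"   # fixed-size heap avoids resize pauses
      KAFKA_JVM_PERFORMANCE_OPTS: "-XX:+UseG1GC -XX:MaxGCPauseMillis=20"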
Persistent storage
Always use Docker volumes or bind mounts for /var/lib/kafka/data and /var/lib/zookeeper; storing data only inside the container means you lose it on restart, which defeats any durability tests. High disk throughput is critical for sustained performance, especially in CI/CD load testing.
Best Practices and Common Pitfalls
Stability, maintainability, and team productivity all benefit from a few best practices.
Configuration management
Manage secrets and configurations externally with .env files or mounted config directories. Never hardcode sensitive data in your compose YAML. For reused setups, consider templating tools or config management frameworks.
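For instance, Compose automatically reads a .env file in the project directory, so values like ports or image tags can live outside the YAML; the variable names here are illustrative:
# .env (keep out of version control if it holds secrets)
KAFKA_PORT=9092
KAFKA_IMAGE_TAG=latest
The compose file then references these variables:
  kafka:
    image: confluentinc/cp-kafka:${KAFKA_IMAGE_TAG}
    ports:
      - "${KAFKA_PORT}:9092"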
Avoiding common mistakes
- Do not store data inside containers; always mount volumes
- Check for port conflicts on your host
- Avoid using default admin passwords; secure exposed ports, even for local development
- Keep Docker images up to date; patch for CVEs and deprecated dependencies
Strengthening Kafka Security
To secure your Kafka deployment:
- Enable TLS Encryption: Protect data in transit by configuring SSL/TLS for all connections.
- Implement Authentication: Use SASL mechanisms (e.g., SCRAM, GSSAPI) to authenticate clients.
- Set Up Authorization: Define Access Control Lists (ACLs) to control client permissions.
- Regularly Rotate Credentials: Change passwords and keys periodically to minimize risk.
- Monitor and Audit: Enable audit logs to track access and changes within the cluster.
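As a concrete example of the authentication and authorization items, Kafka’s own CLI can create SCRAM credentials and ACLs. This is a sketch that assumes a broker already configured with SASL/SCRAM and an authorizer; the user name, password, and topic are placeholders:
# Create SCRAM-SHA-256 credentials for a user
docker-compose exec kafka kafka-configs --bootstrap-server localhost:9092 \
  --alter --add-config 'SCRAM-SHA-256=[password=alice-secret]' \
  --entity-type users --entity-name alice

# Allow that user to read from the demo topic
docker-compose exec kafka kafka-acls --bootstrap-server localhost:9092 \
  --add --allow-principal User:alice --operation Read --topic demo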
Conclusion
Docker makes it remarkably easy to start, tweak, and experiment with Apache Kafka clusters for integration, learning, or even production-like testing. Containers insulate you from host OS issues and dependency mismatches, providing confidence that your dev and test environments match what you expect. As you grow from simple local clusters to orchestrated Kafka platforms, investing in robust Docker setups and best practices will save your team time and frustration.
Explore Kafka and Docker further with one of our comprehensive courses.
Docker Kafka FAQs
Do I need Zookeeper to run Kafka in Docker?
Not necessarily. While traditional Kafka setups rely on Zookeeper, newer versions support KRaft mode, which eliminates the need for Zookeeper by internalizing cluster coordination. However, many Docker Compose examples still use Zookeeper by default for compatibility and stability. ZooKeeper was deprecated in Kafka 3.5 and removed entirely in Kafka 4.0.
Can I use Kafka in Docker for production workloads?
It is possible, but not always recommended without orchestration tools like Kubernetes or Docker Swarm. For production, consider persistence, monitoring, security, scaling, and failover support. Docker is ideal for development and testing, but production use requires careful planning.
Why is my Kafka client getting connection errors when using Docker?
Most client connection issues stem from incorrect KAFKA_ADVERTISED_LISTENERS settings. Make sure the listener value reflects the address the client uses to connect (e.g., localhost:9092 for local dev) and that Docker ports are exposed correctly.
Which Kafka Docker image should I use, Confluent, Bitnami, or Apache?
Use Confluent’s image for full-featured setups and advanced testing, Bitnami’s for straightforward local development, and Apache’s for minimal, customizable builds. Choose based on your use case, resource needs, and security requirements.
How do I persist data across Kafka container restarts?
Always use Docker volumes or bind mounts mapped to /var/lib/kafka/data and /var/lib/zookeeper (if used). Container filesystems are ephemeral; without external storage, all messages and topics will be lost on restart.
