How to Install a ZooKeeper and Kafka Cluster

Larry Deng
Apr 17, 2023

This article explains how to install a ZooKeeper cluster and a Kafka cluster. For convenience, all the nodes run on a single machine, but the steps are the same when they are spread across multiple servers.

Introduction

Key Concepts of ZooKeeper

ZooKeeper is a distributed coordination service that provides a hierarchical key-value store used to maintain configuration information, provide distributed synchronization, and offer group services. The key concepts of ZooKeeper are:

  1. Nodes: ZooKeeper stores data in a hierarchical namespace similar to a file system. Each node in the namespace is called a “znode” and is meant to hold a small amount of data; by default, a znode can hold at most about 1 MB.
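  2. Watches: Clients can set watches on znodes to receive notifications when they change. A standard watch is a one-time trigger that fires when the znode's data changes, when the znode is deleted, or, for a watch set on its children, when a child is added or removed. ZooKeeper 3.6 and later also support persistent watches that stay registered after firing.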
  3. Quorums: ZooKeeper is designed to run in replicated mode, as an ensemble of servers, which provides high availability and fault tolerance. It uses a consensus protocol called ZAB (ZooKeeper Atomic Broadcast) to keep all servers consistent: one server is elected leader, and every change must be acknowledged by a quorum, that is, a strict majority of the servers. This is why ensembles usually have an odd number of servers; a three-server ensemble keeps working if one server fails.
  4. Sessions: When a client connects to ZooKeeper, it establishes a session. The session ties the client to the ensemble, and the client's watches and ephemeral znodes belong to it. Each session has a timeout: the client library keeps the session alive by sending heartbeats, and if the ensemble hears nothing from the client within the timeout, the session expires and its ephemeral znodes are deleted.
  5. ACLs: ZooKeeper provides access control lists (ACLs) to control access to znodes. ACLs can be used to restrict access to certain znodes or to certain operations on znodes.
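
The majority rule in the Quorums item is easy to check with a little arithmetic. The sketch below uses two hypothetical shell helpers, quorum_size and max_failures, to compute, for an ensemble of n servers, how many servers form a quorum and how many can fail while a quorum is still reachable:

```shell
#!/usr/bin/env bash
# Sketch: majority-quorum arithmetic for a ZooKeeper ensemble of n servers.
# quorum_size and max_failures are hypothetical helper names for illustration.

# Smallest number of servers that forms a strict majority of n.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority is still reachable.
max_failures() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 3 4 5; do
  echo "ensemble=$n quorum=$(quorum_size "$n") tolerated_failures=$(max_failures "$n")"
done
```

For 3, 4, and 5 servers it reports 1, 1, and 2 tolerated failures: adding a fourth server raises the quorum from 2 to 3 without letting the ensemble survive any additional failure, which is why odd sizes are the usual choice.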

Overall, ZooKeeper provides a simple and reliable way to coordinate distributed systems by offering a shared, consistent view of configuration information along with synchronization primitives. It is widely used in production; Kafka, for example, stores its broker, topic, and controller metadata in ZooKeeper when it runs in ZooKeeper mode, as it does in this article.

Key Concepts of Kafka

Kafka is a distributed streaming platform that is used for building real-time data pipelines and streaming applications. The key concepts of Kafka are:

  1. Topics: A topic is a category or feed name to which messages are published. A topic is divided into partitions, which allows for scalability and parallelism.
  2. Partitions: A partition is an ordered, append-only sequence of messages in a topic. Each partition is stored on a broker as a directory of log segment files, and messages within a partition are ordered by their offset.
  3. Brokers: A broker is a Kafka server that receives messages from producers, stores them, and serves them to consumers. A Kafka cluster consists of one or more brokers.
  4. Producers: Producers are processes that write messages to Kafka topics. They can specify the partition to write to, or rely on the default partitioner to choose one; for a message with a key, it picks the partition by hashing the key, so messages with the same key go to the same partition.
  5. Consumers: Consumers are processes that read messages from Kafka topics. They can read from one or more partitions, and can maintain their own offset in each partition they consume from.
  6. Consumer Groups: A consumer group is a set of consumers that work together to consume a topic. Each partition is assigned to exactly one consumer in the group, so each message is processed by only one member of the group, and adding consumers (up to the number of partitions) increases parallelism.
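  7. Offsets: An offset is a sequential ID that uniquely identifies each message within a partition. Consumers track their progress by committing offsets (the committed offset is the position of the next message to read), usually to Kafka's internal __consumer_offsets topic, so they can resume where they left off.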
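  8. Replication: Kafka replicates partitions for fault tolerance. Each partition can have multiple replicas, each stored on a different broker. One replica is the leader and handles all writes and, by default, all reads, while the others follow it; the replicas that are fully caught up with the leader form the in-sync replica set (ISR). This provides high availability and durability of data.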
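
To make the consumer-group rule concrete, the sketch below hands out partitions to the members of a group so that each partition has exactly one owner. The modulo assignment is a hypothetical simplification for illustration, not one of the assignors Kafka actually uses (those are selected with the consumer's partition.assignment.strategy setting):

```shell
#!/usr/bin/env bash
# Simplified illustration of a consumer group: every partition is owned by
# exactly one consumer. The modulo rule is a hypothetical stand-in, NOT the
# assignor Kafka uses. assign_partitions is a hypothetical helper name.
assign_partitions() {
  local partitions=$1 consumers=$2 p=0
  while [ "$p" -lt "$partitions" ]; do
    echo "partition $p -> consumer $(( p % consumers ))"
    p=$(( p + 1 ))
  done
}

# Three partitions, two consumers: consumer 0 gets two partitions, consumer 1 gets one.
assign_partitions 3 2
# Three partitions, four consumers: consumer 3 receives nothing and sits idle.
assign_partitions 3 4
```

With four consumers and only three partitions, one consumer stays idle, because a partition is never shared between members of the same group.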

Overall, Kafka provides a scalable, fault-tolerant, distributed messaging system that can handle large volumes of data in real time.

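Install ZooKeeper Cluster

Download ZooKeeper

Download the ZooKeeper binary package (the -bin tarball; the other tarball on the download page contains only the source code). If the link no longer works, older releases remain available on archive.apache.org: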

curl https://dlcdn.apache.org/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz -o apache-zookeeper-3.7.1-bin.tar.gz

Extract the package:

tar xvf apache-zookeeper-3.7.1-bin.tar.gz

Create Configuration

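Create a directory named zk1 for the first server and add two files to it. The myid file contains only the server's ID, which must match the number in its server.N line, and it must sit in the server's dataDir (here, the zk1 directory itself). Replace the dataDir path with an absolute path on your own machine: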

zk1/myid:

1

zk1/zk.config:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/Users/larry/IdeaProjects/pkslow-samples/other/install-kafka-cluster/src/main/zookeeper/zk1
clientPort=2181

server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890

Repeat for servers 2 and 3, changing myid, dataDir, and clientPort; the server.N lines are the same in every file:

zk2/myid:

2

zk2/zk.config:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/Users/larry/IdeaProjects/pkslow-samples/other/install-kafka-cluster/src/main/zookeeper/zk2
clientPort=2182

server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890

zk3/myid:

3

zk3/zk.config:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/Users/larry/IdeaProjects/pkslow-samples/other/install-kafka-cluster/src/main/zookeeper/zk3
clientPort=2183

server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
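
The three configurations differ only in myid, dataDir, and clientPort, so they can also be generated. In these files, tickTime is the basic time unit in milliseconds, initLimit and syncLimit are counted in ticks (20 and 10 seconds here), and in server.N=host:port1:port2 the first port carries follower-to-leader traffic while the second is used for leader election. The sketch below assumes a hypothetical helper named generate_zk_configs that takes the parent directory as its only argument:

```shell
#!/usr/bin/env bash
# Sketch: create zk1..zk3, each with a myid file and a zk.config.
# generate_zk_configs is a hypothetical helper. Its argument is the parent
# directory (created if missing); each zkN directory becomes that server's
# dataDir, so myid ends up where ZooKeeper looks for it.
generate_zk_configs() {
  mkdir -p "$1"
  local base id dir
  base="$(cd "$1" && pwd)"   # dataDir should be an absolute path
  for id in 1 2 3; do
    dir="$base/zk$id"
    mkdir -p "$dir"
    echo "$id" > "$dir/myid"   # must match the N of this server's server.N line
    cat > "$dir/zk.config" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$dir
clientPort=$((2180 + id))

server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
EOF
  done
}

# Usage: generate_zk_configs "$PWD"
```

Run from the working directory, it produces files with the same contents as those above, except that dataDir points at your own directory.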

Start the servers

Start the 3 servers:

$ ./apache-zookeeper-3.7.1-bin/bin/zkServer.sh start ./zk1/zk.config 
ZooKeeper JMX enabled by default
Using config: ./zk1/zk.config
Starting zookeeper ... STARTED

$ ./apache-zookeeper-3.7.1-bin/bin/zkServer.sh start ./zk2/zk.config
ZooKeeper JMX enabled by default
Using config: ./zk2/zk.config
Starting zookeeper ... STARTED

$ ./apache-zookeeper-3.7.1-bin/bin/zkServer.sh start ./zk3/zk.config
ZooKeeper JMX enabled by default
Using config: ./zk3/zk.config
Starting zookeeper ... STARTED
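
Starting, checking, and stopping the servers one by one gets repetitive. The sketch below wraps zkServer.sh in a hypothetical zk_all function that runs one action against all three config files. It assumes a ZK_HOME variable (also hypothetical) pointing at the extracted apache-zookeeper-3.7.1-bin directory, falling back to the relative path used above, and that zk1 to zk3 are in the current directory:

```shell
#!/usr/bin/env bash
# Sketch: run one zkServer.sh action (start, status, stop, ...) for all three servers.
# Assumptions: ZK_HOME (hypothetical) points at apache-zookeeper-3.7.1-bin,
# and the zk1..zk3 directories are in the current working directory.
zk_all() {
  local action=$1 zk_home="${ZK_HOME:-./apache-zookeeper-3.7.1-bin}" id
  for id in 1 2 3; do
    "$zk_home/bin/zkServer.sh" "$action" "./zk$id/zk.config"
  done
}

# Usage:
#   zk_all start
#   zk_all status
#   zk_all stop
```

zkServer.sh accepts start, stop, status, and restart among its commands, so zk_all status covers the health check that follows.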

Health check

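Check the status of each server. Exactly one server should report Mode: leader and the others Mode: follower; which server wins the election can differ between runs: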

$ ./apache-zookeeper-3.7.1-bin/bin/zkServer.sh status ./zk1/zk.config 
ZooKeeper JMX enabled by default
Using config: ./zk1/zk.config
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower


$ ./apache-zookeeper-3.7.1-bin/bin/zkServer.sh status ./zk2/zk.config
ZooKeeper JMX enabled by default
Using config: ./zk2/zk.config
Client port found: 2182. Client address: localhost. Client SSL: false.
Mode: leader


$ ./apache-zookeeper-3.7.1-bin/bin/zkServer.sh status ./zk3/zk.config
ZooKeeper JMX enabled by default
Using config: ./zk3/zk.config
Client port found: 2183. Client address: localhost. Client SSL: false.
Mode: follower
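
Rather than reading the three status outputs by eye, the check can be scripted. The sketch below is a hypothetical helper, not part of ZooKeeper: count_leaders counts the Mode: leader lines in zkServer.sh status output, and check_ensemble runs the status command for the zk1 to zk3 configs (using a hypothetical ZK_HOME variable) and fails unless exactly one leader is reported:

```shell
#!/usr/bin/env bash
# Sketch: verify that the ensemble elected exactly one leader.
# count_leaders and check_ensemble are hypothetical helpers; ZK_HOME is assumed
# to point at apache-zookeeper-3.7.1-bin, with zk1..zk3 in the current directory.

# Reads `zkServer.sh status` output on stdin and prints the number of leaders.
count_leaders() {
  grep -c '^Mode: leader$' || true   # grep -c prints 0 (and exits 1) on no match
}

check_ensemble() {
  local zk_home="${ZK_HOME:-./apache-zookeeper-3.7.1-bin}" leaders
  leaders="$(
    for id in 1 2 3; do
      "$zk_home/bin/zkServer.sh" status "./zk$id/zk.config" 2>&1
    done | count_leaders
  )"
  if [ "$leaders" -eq 1 ]; then
    echo "OK: exactly one leader"
  else
    echo "Expected 1 leader, found $leaders" >&2
    return 1
  fi
}
```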

Connect to one server with zkCli.sh and create two znodes:

$ ./apache-zookeeper-3.7.1-bin/bin/zkCli.sh -server localhost:2181

[zk: localhost:2181(CONNECTED) 0] create /pkslow
Created /pkslow
[zk: localhost:2181(CONNECTED) 1] create /pkslow/website www.pkslow.com
Created /pkslow/website

Connect to a different server and read the data back to confirm that it was replicated across the ensemble:

$ ./apache-zookeeper-3.7.1-bin/bin/zkCli.sh -server localhost:2182

[zk: localhost:2182(CONNECTED) 1] get /pkslow/website
www.pkslow.com

Install Kafka Cluster

Download Kafka

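Download the package (Kafka 3.4.0 built for Scala 2.13). If the link no longer works, older releases remain available on archive.apache.org: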

curl https://downloads.apache.org/kafka/3.4.0/kafka_2.13-3.4.0.tgz -o kafka_2.13-3.4.0.tgz

Extract the package:

tar -xzf kafka_2.13-3.4.0.tgz

Configuration

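Create kafka1/server.properties for broker 1. Each broker needs a unique broker.id, its own listener port, and its own log.dirs, while all brokers share the same zookeeper.connect string listing the three ZooKeeper servers. Replace the log.dirs path with your own. The listeners setting defines the port; the older port property is deprecated and is only used when listeners is not set: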

broker.id=1
listeners=PLAINTEXT://:9091
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
log.dirs=/Users/larry/IdeaProjects/pkslow-samples/other/install-kafka-cluster/src/main/kafka/kafka1/kafka-logs

Configuration for broker 2 (kafka2/server.properties):

broker.id=2
listeners=PLAINTEXT://:9092
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
log.dirs=/Users/larry/IdeaProjects/pkslow-samples/other/install-kafka-cluster/src/main/kafka/kafka2/kafka-logs

Configuration for broker 3 (kafka3/server.properties):

broker.id=3
listeners=PLAINTEXT://:9093
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
log.dirs=/Users/larry/IdeaProjects/pkslow-samples/other/install-kafka-cluster/src/main/kafka/kafka3/kafka-logs
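
As with ZooKeeper, the three broker files differ only in broker.id, the listener port, and log.dirs. The sketch below generates them with a hypothetical generate_kafka_configs helper that takes the parent directory as its only argument; it leaves out the deprecated port property because listeners already sets the port:

```shell
#!/usr/bin/env bash
# Sketch: create kafka1..kafka3/server.properties for three brokers on one host.
# generate_kafka_configs is a hypothetical helper; its argument is the parent
# directory (created if missing).
generate_kafka_configs() {
  mkdir -p "$1"
  local base id
  base="$(cd "$1" && pwd)"   # absolute log.dirs, as in the article's config
  for id in 1 2 3; do
    mkdir -p "$base/kafka$id"
    cat > "$base/kafka$id/server.properties" <<EOF
broker.id=$id
listeners=PLAINTEXT://:$((9090 + id))
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
log.dirs=$base/kafka$id/kafka-logs
EOF
  done
}

# Usage: generate_kafka_configs "$PWD"
```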

Start the brokers

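Make sure the ZooKeeper ensemble is running, then start the Kafka brokers. Each command runs in the foreground, so run each one in its own terminal, or add the -daemon option (for example, kafka-server-start.sh -daemon ./kafka1/server.properties) to run it in the background: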

./kafka_2.13-3.4.0/bin/kafka-server-start.sh ./kafka1/server.properties
./kafka_2.13-3.4.0/bin/kafka-server-start.sh ./kafka2/server.properties
./kafka_2.13-3.4.0/bin/kafka-server-start.sh ./kafka3/server.properties

Check and Test

Create a topic with three partitions, each replicated to all three brokers:

$ kafka_2.13-3.4.0/bin/kafka-topics.sh --create --topic pkslow-topic --bootstrap-server localhost:9091,localhost:9092,localhost:9093 --partitions 3 --replication-factor 3
Created topic pkslow-topic.

List the topics:

$ kafka_2.13-3.4.0/bin/kafka-topics.sh --list --bootstrap-server localhost:9091,localhost:9092,localhost:9093
pkslow-topic

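Describe the topic. For each partition, Leader is the broker that currently leads it, Replicas lists the brokers holding a copy, and Isr lists the replicas that are in sync; here every replica is in sync: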

$ kafka_2.13-3.4.0/bin/kafka-topics.sh --describe --topic pkslow-topic --bootstrap-server localhost:9091,localhost:9092,localhost:9093
Topic: pkslow-topic TopicId: 7CLy7iZeRvm8rCrn8Dw_mA PartitionCount: 3 ReplicationFactor: 3 Configs:
Topic: pkslow-topic Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: pkslow-topic Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: pkslow-topic Partition: 2 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
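
A partition is under-replicated when its Isr list is shorter than its Replicas list, for example after a broker goes down. kafka-topics.sh can report such partitions directly with --describe --under-replicated-partitions; the awk sketch below, a hypothetical under_replicated helper, shows the same comparison applied to the describe output:

```shell
#!/usr/bin/env bash
# Sketch: read `kafka-topics.sh --describe` output on stdin and print every
# partition whose ISR has fewer members than its replica list.
# under_replicated is a hypothetical helper name.
under_replicated() {
  awk '
    /Partition:/ {
      part = ""; reps = ""; isr = ""
      for (i = 1; i <= NF; i++) {
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Replicas:")  reps = $(i + 1)
        if ($i == "Isr:")       isr  = $(i + 1)
      }
      # split() returns the number of comma-separated broker IDs
      if (split(isr, a, ",") < split(reps, b, ",")) print "partition " part
    }
  '
}
```

Fed the output above, it prints nothing, because every partition lists all three replicas in its Isr. The topic summary line is skipped since PartitionCount: does not match the Partition: pattern.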

Start a console producer and type a few messages; each line is sent as one message:

$ kafka_2.13-3.4.0/bin/kafka-console-producer.sh --bootstrap-server localhost:9091,localhost:9092,localhost:9093 --topic pkslow-topic
>My name is Larry Deng.
>My website is www.pkslow.com.
>

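In another terminal, start a console consumer that reads the topic from the beginning. Kafka guarantees ordering only within a partition, so messages that landed in different partitions may be printed in a different order from the one in which they were sent: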

$ kafka_2.13-3.4.0/bin/kafka-console-consumer.sh --bootstrap-server localhost:9091,localhost:9092,localhost:9093 --topic pkslow-topic --from-beginning
My name is Larry Deng.
My website is www.pkslow.com.

Code

The complete configuration is available in the pkslow-samples repository on GitHub.
