Hands-on

Running on AWS EC2 (ubuntu, t2.micro)
Running on Mac
Test that the Kafka broker is working

Configuring the Broker

General Broker Parameters

Why use a Chroot path?

It is generally considered good practice to use a chroot path for the Kafka cluster. This allows the ZooKeeper ensemble to be shared with other applications, including other Kafka clusters, without conflict. It is also best to specify multiple ZooKeeper servers in this configuration, so that the Kafka broker can connect to another member of the ZooKeeper ensemble in the event of a server failure.
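A hedged sketch of such a setting in server.properties (the hostnames and the /kafka chroot path are illustrative):

```properties
# Multiple ensemble members, all sharing the /kafka chroot path;
# the broker can fail over to another member if one goes down
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/kafka
```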


Topic Defaults Parameters

Summary as Ankicard

💡 What are the two simple commands to test a Kafka server?

kafka-console-producer --broker-list localhost:9092 --topic test
kafka-console-consumer --bootstrap-server localhost:9092 --topic test --from-beginning

💡 What is broker.id config?

Every broker must have an integer identifier, which must be unique within the cluster

💡 What is log.dirs config?

It specifies where the log segments are stored. A single directory can be set with log.dir, but for multiple directories log.dirs is preferable.
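A hedged server.properties sketch (the mount paths are illustrative):

```properties
# Comma-separated list of directories; Kafka spreads partitions across them
log.dirs=/data/kafka1,/data/kafka2,/data/kafka3
```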

💡 How will Kafka broker store the partitions in multiple log directories if log.dirs is configured?

Kafka will store the partitions in a “least-used” fashion: a new partition is placed in the directory that currently holds the fewest partitions, not the one with the most free disk space
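The placement rule can be sketched as follows (a simplified illustration, not Kafka's actual implementation; the paths are made up):

```python
def pick_log_dir(partition_counts):
    """Return the log directory currently holding the fewest partitions.

    partition_counts maps directory path -> number of partitions stored there.
    Note: the choice is by partition count, not by free disk space.
    """
    return min(partition_counts, key=partition_counts.get)


counts = {"/data/kafka1": 12, "/data/kafka2": 7, "/data/kafka3": 9}
print(pick_log_dir(counts))  # prints /data/kafka2, the least-used directory
```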

💡 What configuration prevents Kafka from automatically creating a topic when a message is sent to a non-existent topic?

auto.create.topics.enable (set it to false)
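In server.properties this looks like (the default is true, so auto-creation is on unless disabled):

```properties
# Do not auto-create topics when a producer writes to, a consumer reads from,
# or a client requests metadata for a non-existent topic
auto.create.topics.enable=false
```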

💡 Which parameter configures how many partitions a new topic is created with?

num.partitions
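A hedged server.properties example (the value 6 is arbitrary; the default is 1):

```properties
# Topics created without an explicit partition count get this many partitions
num.partitions=6
```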

💡 Can the number of partitions for a topic be decreased?

No, the number of partitions for a topic can only be increased, never decreased.
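The increase-only invariant can be illustrated with a small sketch (not Kafka code; the function name is made up for illustration):

```python
def alter_partition_count(current, requested):
    """Validate a partition-count change: Kafka only allows increases."""
    if requested <= current:
        raise ValueError(
            f"cannot change partitions from {current} to {requested}; "
            "only increases are allowed"
        )
    return requested


print(alter_partition_count(3, 8))  # prints 8
```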

💡 What is the replication factor configured by default.replication.factor?

It is the number of copies of each partition's data kept across brokers. It should be greater than 1 to ensure reliability.

💡 What is the min.insync.replicas config and how does it ensure reliability?

It is the minimum number of replicas that must be in sync for a partition to accept writes from producers using acks=all. You should set it to at least 2 to ensure that at least two replicas are caught up and “in sync” with the producer. This enables a semi-sync replication strategy.
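A hedged sketch of the broker side of this setting (the value is illustrative; it takes effect for producers that use acks=all):

```properties
# Broker or topic level: writes with acks=all succeed only if
# at least 2 replicas (including the leader) are in sync
min.insync.replicas=2
```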

💡 How do log.retention.ms and log.retention.bytes work differently as Kafka’s retention policies?

log.retention.ms is the most common retention policy: it specifies how long Kafka retains messages, i.e. the amount of time after which messages may be deleted. log.retention.bytes applies per partition: once a partition grows beyond this size, its oldest log segments become eligible for deletion. Note that retention is enforced on closed log segments; a segment is closed and a new one opened when it reaches log.segment.bytes (defaults to 1 GB), and only then can its messages expire.
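For example, in server.properties (values illustrative; the defaults are 7 days for time-based retention and unlimited, i.e. -1, for size-based retention):

```properties
# Time-based retention: delete messages older than 7 days
log.retention.ms=604800000
# Size-based retention, per partition: keep at most ~8 GB
log.retention.bytes=8589934592
```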
