Saturday, 8 June 2019

Apache Kafka

As mentioned at https://kafka.apache.org/intro, Apache Kafka is a distributed streaming platform.
Kafka works on a publisher/subscriber model.
Publishers publish records to a Kafka topic, and every consumer that is listening to this topic will consume those records.
Kafka is widely used for real-time data streaming.


Kafka Internal Processing

Kafka has four core APIs:
1.      Producer API –
Publish a stream of records to one or more Kafka topics.
2.      Consumer API –
Subscribe to one or more topics and process the stream of records.
3.      Streams API –
Transform input streams into output streams.
4.      Connector API –
Build reusable producers or consumers that connect Kafka topics to existing applications or data systems.
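The producer/consumer flow behind these APIs can be sketched with a toy in-memory model. This is only an illustration of the publish/subscribe concept — the real Kafka clients talk to a broker over the network, and the `Topic` class here is invented for the sketch:

```python
# Toy in-memory model of Kafka's publish/subscribe flow.
# Illustrative only -- real producers and consumers connect to a broker.

class Topic:
    def __init__(self, name):
        self.name = name
        self.records = []          # append-only log of published records
        self.subscribers = []      # consumer callbacks listening on the topic

    def publish(self, record):
        self.records.append(record)
        for consume in self.subscribers:
            consume(record)        # every subscriber sees every record

    def subscribe(self, consume):
        self.subscribers.append(consume)

received = []
topic = Topic("orders")
topic.subscribe(received.append)   # a consumer listening on the topic
topic.publish("order-1")           # a producer publishing records
topic.publish("order-2")
print(received)                    # ['order-1', 'order-2']
```

Note how publishing is decoupled from consuming: the producer only knows the topic, not who is listening.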

Topic
A topic is a feed name to which records are published. Topics in Kafka are always multi-subscriber: one or more consumers can subscribe to a topic.

A topic can have multiple partitions. Refer to the diagram below.


In the diagram above, the topic has two partitions.
Kafka uses partitions to scale a topic across many servers for producer writes. In addition, partitions enable parallel consumption: consumers can consume records in parallel, up to the number of partitions.
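How records map to partitions can be sketched as follows. Kafka's default partitioner hashes the record key (using murmur2) modulo the partition count; the sketch below substitutes crc32 for the hash (an assumption for simplicity), but it shows the key property: records with the same key always land in the same partition, so their relative order is preserved:

```python
import zlib

NUM_PARTITIONS = 2

def partition_for(key: str) -> int:
    # Kafka's default partitioner uses murmur2; crc32 stands in for it here.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# Route some keyed records into partitions, as a producer would.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in [("user-a", "v1"), ("user-b", "v2"), ("user-a", "v3")]:
    partitions[partition_for(key)].append((key, value))

# All records for "user-a" sit in one partition, in publish order.
print(partitions[partition_for("user-a")])
```

Because each partition can live on a different server and be read by a different consumer, this keyed routing is what lets writes and reads scale out.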

Kafka Topic Log Partition’s Ordering and Cardinality

Kafka maintains record order only within a single partition. A partition is an ordered, immutable sequence of records; Kafka continually appends to it, using the partition as a structured commit log. Records in a partition are assigned a sequential id number called the offset, which identifies each record's location within the partition.

Topic partitions allow a Kafka log to scale beyond a size that fits on a single server. An individual partition must fit on the server that hosts it, but a topic can span many partitions hosted on many servers.

In addition, topic partitions are a unit of parallelism: within a consumer group, only one consumer can work on a given partition at a time. Consumers can run in their own process or their own thread. If a consumer stops, Kafka spreads its partitions across the remaining consumers in the same consumer group.
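The consumer-group behavior described above can be sketched with a simple round-robin assignment. Real Kafka rebalancing is coordinated by the broker's group coordinator and is configurable, so this is only a conceptual model of how partitions are spread over consumers and redistributed when one stops:

```python
# Toy sketch of consumer-group partition assignment (round-robin style).
# Real Kafka rebalancing is driven by the broker's group coordinator.

def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        # Each partition gets exactly one owner within the group.
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = ["p0", "p1", "p2", "p3"]

# Two consumers in the group share the four partitions.
print(assign(partitions, ["c1", "c2"]))   # {'c1': ['p0', 'p2'], 'c2': ['p1', 'p3']}

# If c2 stops, its partitions are spread over the remaining consumers.
print(assign(partitions, ["c1"]))         # {'c1': ['p0', 'p1', 'p2', 'p3']}
```

The invariant to notice: at any moment, a partition is consumed by at most one member of the group, which is why per-partition ordering survives rebalancing.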

Kafka can replicate partitions across a configurable number of Kafka servers for fault tolerance. Each partition has one leader server and zero or more follower servers; the leader handles all read and write requests for the partition.
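Leader failover for a replicated partition can be sketched as follows. In real Kafka the controller elects the new leader from the partition's in-sync replicas; the broker names and the "first in-sync replica wins" rule below are simplifications for illustration:

```python
# Toy sketch of leader failover for one replicated partition.
# In real Kafka the controller elects a leader from the in-sync replicas.

in_sync_replicas = ["broker-1", "broker-2", "broker-3"]

def leader(isr):
    # The leader serves all reads and writes for the partition.
    return isr[0]

print(leader(in_sync_replicas))        # broker-1

# If the leader fails, a follower from the in-sync set takes over.
in_sync_replicas.remove("broker-1")
print(leader(in_sync_replicas))        # broker-2
```

With a replication factor of 3 as above, the partition stays available as long as at least one in-sync replica survives.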


Kafka Setup

1.      Install the Java JDK.
2.      Set JAVA_HOME using the command below:
setx JAVA_HOME "Path" /m
Replace "Path" with your Java installation path.
3.      Install ZooKeeper: download the tar file from http://zookeeper.apache.org/releases.html
4.      Start the ZooKeeper server: zkServer.bat
5.      Start zkCli
6.      Download Apache Kafka: https://kafka.apache.org/quickstart
7.      Start the Kafka server with the command below:
kafka-server-start.bat ..\..\config\server.properties
8.      Now open a separate command prompt and run the command below to create a topic:
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic <topic-name>
9.      Use the command below to list all topics:
kafka-topics.bat --list --zookeeper localhost:2181
10.  To publish data from the command line, use the command below:
kafka-console-producer.bat --broker-list localhost:9092 --topic <topic-name>
<enter message here>
11.  To consume messages from a Kafka topic, use the command below:
kafka-console-consumer.bat --bootstrap-server localhost:9092 --from-beginning --topic <topic-name>
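Step 7 reads its settings from config/server.properties. The stock file works for a local test; the keys below are the ones most often checked or adjusted (values shown are typical local-setup values, not necessarily your file's contents):

```properties
# config/server.properties -- commonly adjusted settings for a local setup

# Unique id of this broker within the cluster
broker.id=0

# Address clients use to connect to this broker
listeners=PLAINTEXT://localhost:9092

# Directory where partition logs are stored
log.dirs=/tmp/kafka-logs

# Default partition count for newly created topics
num.partitions=1

# ZooKeeper connection used for cluster metadata
zookeeper.connect=localhost:2181
```

Note that zookeeper.connect must match the ZooKeeper address used in the topic commands above (localhost:2181), and listeners must match the broker address given to the console producer and consumer (localhost:9092).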

