Getting Started With Kafka
By Diwanshu Shekhar
Apache Kafka is an open-source framework for developing real-time streaming applications. In this article, I will jot down some points that may save you time and frustration if you're just learning about Apache Kafka. First of all, when setting up a development Kafka environment, it will save you a lot of hassle to use the Confluent distribution of Kafka rather than the native Apache version. Download the Confluent Platform from https://docs.confluent.io/current/. The Confluent distribution of Kafka comes with the Confluent CLI, which streamlines the administrative work of setting up a Kafka server. It also comes with a Control Center that lets you monitor the Kafka server from a browser. The coolest thing is that all the enterprise-level features are available for free on a single-broker Kafka server.
How to quickly generate data to your Kafka Topic
Although you can use the Kafka Connect Datagen connector to generate data, I find the command line the quickest way to get data into your Kafka topics. All you need to do is run the following command and boom - you have data in your Kafka topic -
ksql-datagen quickstart=users format=avro topic=topic3 maxInterval=100
You can also generate data in JSON format by passing json as the value of the format parameter, or provide pageviews as the value of the quickstart parameter to generate a different kind of data.
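For example, the following generates JSON-formatted pageview events (the topic name here is just an illustrative choice):

ksql-datagen quickstart=pageviews format=json topic=pageviews_topic maxInterval=500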
Understanding Kafka Connect
Kafka Connect is part of Apache Kafka®, providing streaming integration between external data stores and Kafka. The figure below shows architectural diagrams for importing data from MySQL into Kafka and exporting data from Kafka to HDFS.
A converter sits right before data enters or leaves Kafka. When data is ingested into Kafka, the converter converts the data from its original format to byte[]. When data is exported out of Kafka, it converts the byte[] data back to the format it was in before it was imported into Kafka. Common converters include:
io.confluent.connect.avro.AvroConverter
org.apache.kafka.connect.storage.StringConverter
org.apache.kafka.connect.json.JsonConverter
org.apache.kafka.connect.converters.ByteArrayConverter
com.blueapron.connect.protobuf.ProtobufConverter
Converters can be set with the value.converter parameter in the properties file. When consuming JSON data from Kafka, if the data doesn't have a schema attached to it, you should tell Kafka Connect not to look for a schema by setting value.converter.schemas.enable=false in the properties file, as in the sketch below.
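As a minimal sketch (the connector name, topic, and file path are hypothetical), a file sink connector configured for schemaless JSON values might look like this:

# hypothetical file-sink.properties
name=local-file-sink
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=1
topics=mytopic5
file=/tmp/mytopic5.txt
# JSON values with no embedded schema
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false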
To see what parameters are available for Kafka Connect, go here.
Commonly used Confluent CLI commands
# start all servers for Kafka
confluent start
# check the log of a particular service, such as Kafka Connect
confluent log connect
# consume from a topic on the command line
confluent consume mytopic5 --from-beginning
# check installed connectors
confluent list connectors
# unload a connector such as hdfs-sink
confluent unload hdfs-sink
# load a connector
confluent load <name of connector> -d <path to properties file>
# check status of loaded connectors
confluent status connectors
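Putting the load command to work, here is a minimal sketch of what an hdfs-sink properties file might contain (the topic name, HDFS URL, and flush size are placeholder assumptions; check the HDFS connector documentation for the full set of options):

# hypothetical hdfs-sink.properties
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=mytopic5
hdfs.url=hdfs://localhost:9000
flush.size=3

You would then load it with confluent load hdfs-sink -d hdfs-sink.properties.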
Application Development
So far, I have talked about the administrative aspects of Kafka. But to develop Kafka applications, you need to have a good understanding of the concepts of Producers, Consumers, and Kafka Streams. I recommend two books - Kafka: The Definitive Guide and Kafka Streams in Action - that will get you going on your application development. Additionally, this webpage from Confluent is a good resource to go through.
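To give a flavor of the Producer API, below is a minimal Java sketch (the broker address and topic name are assumptions for a local single-broker setup) that sends one record and exits:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // assumes a single broker running locally on the default port
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // try-with-resources closes the producer and flushes pending records
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // topic name is hypothetical; use one that exists on your cluster
            producer.send(new ProducerRecord<>("mytopic5", "key1", "hello kafka"));
        }
    }
}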