Setting up Kafka and Storm

This blog is about setting up of Kafka Storm cluster based on my experience

Prerequisites
  • JDK 1.7

 

Setting up Kafka 0.8.x and Storm 0.9.2

Setting up kafka 0.8.x and storm 0.9.2 has become relatively straightforward.

Steps to setup kafka

  • Download kafka from https://www.apache.org/dyn/closer.cgi?path=/kafka/0.8.1.1/kafka_2.9.2-0.8.1.1.tgz
  • gunzip kafka_2.9.2-0.8.1.1.tgz
  • tar -xvf kafka_2.9.2-0.8.1.1.tar
  • cd kafka_2.9.2-0.8.1.1

Starting Kafka

  • ./zookeeper-server-start.sh ../config/zookeeper.properties &
  •  ./kafka-server-start.sh ../config/server.properties &c

Testing Kafka Setup

  • ./kafka-console-producer.sh –zookeeper localhost:2181 –topic test
    • Start providing messages
  • ./kafka-console-consumer.sh –zookeeper localhost:2181 –topic test –from-beginning
    • You should be able to get the messages

 

Setting up Storm-Kafka 

Storm kafka is being maintained part of storm release itself.

1) Since, oracle jars are not part of maven repository, download the following jars and install it in your server

mvn install:install-file -Dfile=jms-1.1.jar -DgroupId=javax.jms -DartifactId=jms -Dversion=1.1 -Dpackaging=jar
mvn install:install-file -Dfile=jmxtools-1.2.1.jar -DgroupId=com.sun.jdmk -DartifactId=jmxtools -Dversion=1.2.1 -Dpackaging=jar
mvn install:install-file -Dfile=jmxri-1.2.1.jar -DgroupId=com.sun.jmx -DartifactId=jmxri -Dversion=1.2.1 -Dpackaging=jar

2) if you are writing your own maven pom.xml, make sure to exclude zookeeper and log4j like this

<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.9.2</artifactId>
<version>0.8.1.1</version>
<exclusions>
<exclusion>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>

Sample Code 

This code should help you get going https://github.com/sathya21/kafka-storm-implementation 

 

Setting up Kafka 0.7 and Storm 0.9

Though the current Kafka version is 0.8.0, kafka storm works with Kafka 0.7.2 compiled with scala 2.9.2

Steps to setup Kafka

  • project.name=Kafka
  • sbt.version=0.7.5
  • project.version=0.7.2
  • build.scala.versions=2.9.2
  • contrib.root.dir=contrib
  • lib.dir=lib
  • target.dir=target/scala_2.9.2
  • dist.dir=dist
  • ./sbt update
  • ./sbt “++2.9.2 package”
  • open kafka_run_class.sh and replace reference to 2.8.0 scala directory to 2.9.2 scala directory (%s/2.8.0/2.9.2)

Starting Kafka

  • ./zookeeper-server-start.sh ../config/zookeeper.properties &
  •  ./kafka-server-start.sh ../config/server.properties &c

Testing Kafka Setup

  • ./kafka-console-producer.sh –zookeeper localhost:2181 –topic test
    • Start providing messages
  • ./kafka-console-consumer.sh –zookeeper localhost:2181 –topic test –from-beginning
    • You should be able to get the messages

Setting up Storm-Kafka 

<dependency>
<groupId>storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>0.9.0-wip16a-scala292</version>
</dependency>

  • and change the version of storm
                <dependency>
<groupId>storm</groupId>
<artifactId>storm</artifactId>
<version>0.9.0-rc2</version>
<scope>provided</scope>
</dependency>
  •  vi ./src/jvm/storm/starter/KafkaTopology.java
  •  Change the host to point to ZooKeeper
    •         SpoutConfig kafkaConf = new SpoutConfig(new SpoutConfig.ZkHosts(“localhost:2181″,”/brokers”), “test”, “/kafkastorm”, “discovery”);
  • Run it using r mvn -f m2-pom.xml compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=storm.starter.KafkaTopology

Testing the setup

  • Use the kafka-console-producer.sh to produce data to “test” topic and you can see the data printed on your KafkaTopology console.

References

 

Advertisements

2 thoughts on “Setting up Kafka and Storm

  1. I finally managed to run Storm thanks to your tutorial! Thank you very much.

    But I still have a problem: The kafkaTopology doesn’t get any information from (connect to) the kafka producer. I am running kafka 2.9.2 from another website (https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.8+Quick+Start) because the apache links are down (??). The zookeeper recognises that Storm connected to it.

    Do you know what the problem can be?
    Do you have the kafka version (tgz) that you used for this tutorial?

    Thanks in advance!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s