Apache Kafka is a powerful distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. In this guide, we will walk you through the step-by-step process of installing and setting up Apache Kafka on Ubuntu.
Prerequisites
Before you begin, ensure that you have the following:
- A machine running Ubuntu (preferably a current LTS release).
- Administrative access to the machine (you can execute sudo commands).
Step 1: Install Java
Apache Kafka requires Java to run. You can install the default Java Runtime Environment (JRE) using the following commands:
sudo apt update
sudo apt install default-jre -y
Verify the installation by checking the Java version:
java -version
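Note that java -version prints its banner to standard error, which matters when a script needs to capture it. The following is a minimal sketch, not part of the original steps, of a helper that pulls the major version number out of that banner. The function name java_major_version is hypothetical, and the Java 8 minimum used in the usage lines is the one given in the Kafka 3.x quickstart.

```shell
# Hypothetical helper: print the major Java version found in a
# "java -version" banner passed as the first argument.
# Handles the legacy scheme ("1.8.0_392" -> 8) and the modern one ("17.0.9" -> 17).
java_major_version() {
  local banner="$1" version
  # The version string is the quoted token on the first matching line.
  version=$(printf '%s\n' "$banner" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$version" in
    1.*) version=${version#1.}; printf '%s\n' "${version%%.*}" ;;
    *)   printf '%s\n' "${version%%[.-]*}" ;;
  esac
}

# Usage (commented out so the snippet has no side effects):
# major=$(java_major_version "$(java -version 2>&1)")
# [ "$major" -ge 8 ] && echo "Java $major is new enough for Kafka 3.x"
```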
Step 2: Download Apache Kafka
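Visit the official Kafka download page to get the latest stable version. Alternatively, use the wget command to download Kafka directly. The URL below is for version 3.7.1; Apache moves superseded releases to archive.apache.org/dist/kafka/, so adjust the URL if the download fails: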
wget https://downloads.apache.org/kafka/3.7.1/kafka_2.13-3.7.1.tgz
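It is worth checking the archive's integrity before extracting it. The sketch below assumes that a .sha512 file is published next to the archive and that it may use a "filename: HEX HEX ..." layout spread over several lines, which sha512sum -c does not accept directly. Both function names are hypothetical.

```shell
# Hypothetical helper: reduce a checksum file read from stdin to a bare
# lowercase hex digest. Assumes an optional "filename:" prefix followed by
# hex groups separated by spaces and newlines.
normalize_digest() {
  sed 's/^[^:]*://' | tr -d ' \t\r\n' | tr 'A-F' 'a-f'
}

# Succeeds only if the SHA-512 of the archive ($1) matches the checksum file ($2).
verify_sha512() {
  local archive="$1" checksum_file="$2" expected actual
  expected=$(normalize_digest < "$checksum_file")
  actual=$(sha512sum "$archive" | cut -d' ' -f1)
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage (commented out; the .sha512 URL is an assumption derived from the archive URL):
# wget https://downloads.apache.org/kafka/3.7.1/kafka_2.13-3.7.1.tgz.sha512
# verify_sha512 kafka_2.13-3.7.1.tgz kafka_2.13-3.7.1.tgz.sha512 && echo "checksum OK"
```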
Step 3: Extract Kafka
Extract the downloaded Kafka archive:
tar -xzf kafka_2.13-3.7.1.tgz
Navigate to the Kafka directory:
cd kafka_2.13-3.7.1
Step 4: Start the Kafka Server
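This guide runs Kafka in ZooKeeper mode, in which ZooKeeper manages and coordinates the Kafka brokers. Kafka 3.x can also run without ZooKeeper in KRaft mode, and Kafka 4.0 removes ZooKeeper support entirely, so the steps here apply to the 3.x release downloaded above. Start the ZooKeeper service first and then the Kafka server, each in its own terminal window or tab.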
1. Start ZooKeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
2. Start the Kafka server:
bin/kafka-server-start.sh config/server.properties
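Both services take a few seconds to come up, and later commands fail if the broker is not yet listening. As a rough sketch, the hypothetical helper below polls a TCP port until it accepts connections. It assumes the default ports from the shipped configuration files (2181 for ZooKeeper, 9092 for the broker) and relies on bash's /dev/tcp feature, so it needs bash rather than plain sh.

```shell
# Hypothetical helper: wait until a TCP port accepts connections.
# Arguments: host, port, optional timeout in seconds (default 30).
# Relies on bash's /dev/tcp redirection, so run it with bash, not plain sh.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}" elapsed=0
  # The subshell opens (and on exit closes) a connection; failure means "not up yet".
  while ! (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    elapsed=$((elapsed + 1))
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
  done
  return 0
}

# Usage (commented out so the snippet has no side effects):
# wait_for_port localhost 2181 && echo "ZooKeeper is listening"
# wait_for_port localhost 9092 && echo "Kafka broker is listening"
```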
Step 5: Create a Kafka Topic
In a new terminal window or tab, create a topic named “my_topic” with a replication factor of 1 and a partition count of 1:
bin/kafka-topics.sh --create --topic my_topic --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1
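To make topic creation repeatable and confirm the result, the same tool can be called with --if-not-exists and then --describe. The wrapper below is only a sketch: the function name ensure_topic and the KAFKA_BIN variable are hypothetical, and it assumes you run it from the Kafka directory (or point KAFKA_BIN at its bin folder).

```shell
# Hypothetical location of the Kafka scripts; defaults to ./bin as in the steps above.
KAFKA_BIN="${KAFKA_BIN:-bin}"

# Create a single-partition, single-replica topic if it is missing, then describe it.
# Arguments: topic name, optional bootstrap server (default localhost:9092).
ensure_topic() {
  local topic="$1" server="${2:-localhost:9092}"
  "$KAFKA_BIN/kafka-topics.sh" --create --if-not-exists --topic "$topic" \
    --bootstrap-server "$server" --replication-factor 1 --partitions 1 &&
  "$KAFKA_BIN/kafka-topics.sh" --describe --topic "$topic" --bootstrap-server "$server"
}

# Usage (commented out so the snippet has no side effects):
# ensure_topic my_topic
```

The --describe output lists the partition count, replication factor, and leader for the topic, which is a quick way to confirm the create step worked.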
Step 6: Test Apache Kafka
To ensure Kafka is working correctly, create a producer to send messages to the “my_topic” topic and a consumer to read messages from the topic.
1. Start a Kafka producer in a new terminal window or tab:
bin/kafka-console-producer.sh --topic my_topic --bootstrap-server localhost:9092
Type messages in the producer terminal. These messages will be sent to Kafka.
2. Start a Kafka consumer in another terminal window or tab:
bin/kafka-console-consumer.sh --topic my_topic --from-beginning --bootstrap-server localhost:9092
The consumer will display messages sent by the producer.
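The same round trip can be scripted instead of typed by hand. The sketch below pipes one line into the console producer and then reads the topic back with --timeout-ms so the consumer exits on its own. The function name smoke_test and the KAFKA_BIN variable are hypothetical, and the 10-second timeout is an arbitrary choice.

```shell
# Hypothetical location of the Kafka scripts; defaults to ./bin as in the steps above.
KAFKA_BIN="${KAFKA_BIN:-bin}"

# Send one unique message to a topic and succeed only if it can be read back.
# Arguments: topic name, optional bootstrap server (default localhost:9092).
smoke_test() {
  local topic="$1" server="${2:-localhost:9092}" message="smoke-test-$$" matches
  # The console producer reads messages from standard input, one per line.
  printf '%s\n' "$message" |
    "$KAFKA_BIN/kafka-console-producer.sh" --topic "$topic" --bootstrap-server "$server"
  # Count exact-line matches; the consumer stops after 10 s without new messages.
  matches=$("$KAFKA_BIN/kafka-console-consumer.sh" --topic "$topic" --from-beginning \
    --bootstrap-server "$server" --timeout-ms 10000 2>/dev/null | grep -Fxc -- "$message" || true)
  [ "${matches:-0}" -ge 1 ]
}

# Usage (commented out so the snippet has no side effects):
# smoke_test my_topic && echo "Kafka round trip OK"
```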
Frequently Asked Questions (FAQs)
1. What is Apache Kafka?
Apache Kafka is a distributed event streaming platform capable of handling high-throughput, low-latency data feeds. It is commonly used for building real-time data pipelines and streaming applications.
2. Why do I need Java to run Apache Kafka?
Apache Kafka is written in Java and Scala and runs on the Java Virtual Machine (JVM), so it requires a Java Runtime Environment (JRE). Installing Java provides the necessary environment for Kafka to operate.
3. How can I verify if Java is installed correctly on my Ubuntu machine?
You can verify the Java installation by running the following command:
java -version
This command should display the installed Java version.
4. Where can I download the latest version of Apache Kafka?
You can download the latest stable version of Apache Kafka from the official Kafka download page.
5. What is ZooKeeper, and why is it needed for Kafka?
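ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. In the setup described in this guide, Kafka uses ZooKeeper to manage and coordinate the Kafka brokers. Kafka 3.x can also run without it in KRaft mode, and Kafka 4.0 drops ZooKeeper entirely.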
6. How do I start ZooKeeper and Kafka on Ubuntu?
To start ZooKeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
To start Kafka:
bin/kafka-server-start.sh config/server.properties
These commands should be run in separate terminal windows or tabs.
7. What is a Kafka topic?
A Kafka topic is a named category or feed to which records are published and in which they are stored. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.
8. How can I create a topic in Kafka?
You can create a topic named “my_topic” with the following command:
bin/kafka-topics.sh --create --topic my_topic --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1
9. How do I test if Kafka is working correctly?
To test Kafka, you can create a producer to send messages to a topic and a consumer to read messages from that topic.
Start a Kafka producer:
bin/kafka-console-producer.sh --topic my_topic --bootstrap-server localhost:9092
Start a Kafka consumer:
bin/kafka-console-consumer.sh --topic my_topic --from-beginning --bootstrap-server localhost:9092
Messages typed into the producer terminal should appear in the consumer terminal.
10. Can I run Kafka on Windows or macOS?
Yes, Apache Kafka can run on Windows, macOS, and Linux. The installation steps vary slightly for different operating systems, so refer to the official Kafka documentation for instructions specific to your OS.
11. What should I do if Kafka fails to start?
If Kafka fails to start, check the following:
- Ensure that ZooKeeper is running.
- Verify the configuration files (zookeeper.properties and server.properties) are correct.
- Check for any error messages in the terminal or Kafka logs.
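For the last check, a small filter can surface the most recent errors from a log file. This is a sketch only: the function name recent_errors is hypothetical, and the default path logs/server.log assumes the broker was started with the bundled scripts from the Kafka directory and that log lines use the "[timestamp] LEVEL message" layout.

```shell
# Hypothetical helper: print the most recent ERROR or FATAL lines from a Kafka log.
# Arguments: optional log file (default logs/server.log), optional line count (default 20).
recent_errors() {
  local log_file="${1:-logs/server.log}" count="${2:-20}"
  # Match the level that follows the closing bracket of the timestamp.
  grep -E '\] (ERROR|FATAL) ' "$log_file" | tail -n "$count"
}

# Usage (commented out so the snippet has no side effects):
# recent_errors logs/server.log 5
```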
12. How do I stop the Kafka server and ZooKeeper?
To stop the Kafka server, press Ctrl+C in the terminal where it is running. Then stop ZooKeeper the same way, by pressing Ctrl+C in its terminal. Stop Kafka first so the broker can shut down cleanly while ZooKeeper is still available.
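If the services are running in the background or you no longer have their terminals open, the Kafka distribution also ships stop scripts. The wrapper below is a minimal sketch that calls them in broker-first order. The function name stop_kafka_stack, the KAFKA_BIN variable, and the default 5-second pause are hypothetical choices; the stop scripts only send a termination signal, so a longer pause may be needed in practice.

```shell
# Hypothetical location of the Kafka scripts; defaults to ./bin as in the steps above.
KAFKA_BIN="${KAFKA_BIN:-bin}"

# Stop the broker first, pause, then stop ZooKeeper.
# Argument: optional pause in seconds between the two stops (default 5).
stop_kafka_stack() {
  local pause="${1:-5}"
  "$KAFKA_BIN/kafka-server-stop.sh" || echo "kafka-server-stop.sh reported a failure" >&2
  # Give the broker time to shut down while ZooKeeper is still reachable.
  sleep "$pause"
  "$KAFKA_BIN/zookeeper-server-stop.sh" || echo "zookeeper-server-stop.sh reported a failure" >&2
}

# Usage (commented out so the snippet has no side effects):
# stop_kafka_stack
```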
13. Where can I find more information and support for Kafka?
For more information, refer to the official Kafka documentation. You can also seek support from the Kafka community through forums, mailing lists, and Stack Overflow.
Conclusion
Congratulations! You have successfully installed and set up Apache Kafka on your Ubuntu machine. You can now leverage Kafka’s powerful features to build real-time data pipelines and streaming applications. For more advanced configurations and usage, refer to the official Kafka documentation.