Streaming data is often too big for any one machine. Apache Kafka is a popular streaming platform that uses publish-subscribe patterns:
- Producers publish streaming data to topics
- Consumers subscribe to topics to process data in real time
We'll write Python producers and consumers to work with Kafka topics.
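To make the publish-subscribe idea concrete before involving Kafka, here is a toy in-memory sketch. The class name `TinyBroker`, the topic name `buzz`, and the consumer names are all made up for illustration. It only mimics the idea of append-only topics with per-consumer read positions and is not how Kafka itself is implemented.

```python
from collections import defaultdict


class TinyBroker:
    """Toy in-memory stand-in for a broker (illustration only, not Kafka)."""

    def __init__(self):
        # Each topic is an append-only list of messages.
        self.topics = defaultdict(list)
        # Each (consumer, topic) pair remembers how far it has read.
        self.offsets = defaultdict(int)

    def publish(self, topic, message):
        """Producer side: append a message to the end of a topic."""
        self.topics[topic].append(message)

    def poll(self, consumer, topic):
        """Consumer side: return unseen messages, then advance the offset."""
        start = self.offsets[(consumer, topic)]
        new_messages = self.topics[topic][start:]
        self.offsets[(consumer, topic)] = len(self.topics[topic])
        return new_messages


broker = TinyBroker()
broker.publish("buzz", "hello")
broker.publish("buzz", "world")
first = broker.poll("consumer-a", "buzz")   # sees both messages
broker.publish("buzz", "again")
second = broker.poll("consumer-a", "buzz")  # sees only the new message
late = broker.poll("consumer-b", "buzz")    # a new subscriber reads from the start
print(first, second, late)
```

Because publishing never removes anything, each consumer can read at its own pace, and a consumer that joins late can still read earlier messages.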
Kafka is a large install, so make sure you have enough free disk space.
Kafka also comes from the Linux world, so on Windows machines we'll run it in WSL (Windows Subsystem for Linux).
- Copy/fork this project into your GitHub account and create your own version of this project to run and experiment with.
- Name it buzzline-02-yourname, where yourname is something unique to you.
Before starting, ensure you have completed the setup tasks in https://github.com/denisecase/buzzline-01-case first. Python 3.11 is required.
In this task, we will download, install, configure, and start a local Kafka service.
- Install Windows Subsystem for Linux (Windows machines only)
- Install Kafka Streaming Platform
- Start the Kafka service (leave the terminal open).
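Once the service is started, a quick way to confirm that something is listening is to try a TCP connection from Python. This is a minimal sketch that assumes the broker uses Kafka's default port 9092 on localhost. The function name `broker_is_reachable` is hypothetical, and a successful connection only shows that the port is open, not that Kafka is fully healthy.

```python
import socket


def broker_is_reachable(host="localhost", port=9092, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Assumption: the broker listens on the default localhost:9092.
if broker_is_reachable():
    print("Something is listening on localhost:9092.")
else:
    print("Nothing is listening on localhost:9092. Is the Kafka service running?")
```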
For detailed instructions, see:
Open your project in VS Code and use the commands for your operating system to:
- Create a Python virtual environment
- Activate the virtual environment
- Upgrade pip
- Install from requirements.txt
Open PowerShell terminal in VS Code (Terminal / New Terminal / PowerShell).
py -3.11 -m venv .venv
.venv\Scripts\Activate.ps1
py -m pip install --upgrade pip wheel setuptools
py -m pip install --upgrade -r requirements.txt
If you get an execution policy error, run this first:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Mac/Linux: open a terminal in VS Code (Terminal / New Terminal).
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade -r requirements.txt
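After running the setup commands, a short check can confirm that the terminal is using the virtual environment and the required Python version. This is a sketch using only the standard library. The helper names `in_virtual_environment` and `version_matches` are made up for this example.

```python
import sys


def in_virtual_environment():
    """True when running inside a venv (sys.prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


def version_matches(required=(3, 11), actual=None):
    """True when the major.minor version equals the required pair."""
    actual = sys.version_info if actual is None else actual
    return tuple(actual[:2]) == tuple(required)


print("Virtual environment active:", in_virtual_environment())
print("Python 3.11 in use:", version_matches())
```

If the first line prints False, activate .venv again before installing packages or running the project.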
Producers generate streaming data for our topics.
In VS Code, open a terminal in your root project folder. Use the commands below to activate .venv and start the producer.
Windows PowerShell:
.venv\Scripts\activate
py -m producers.kafka_producer_case
Mac/Linux:
source .venv/bin/activate
python3 -m producers.kafka_producer_case
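For orientation, a producer can be sketched roughly as follows. This is not the code in producers/kafka_producer_case. It assumes the kafka-python client package, a broker at localhost:9092, and a made-up topic name `buzz_topic`. The message text and function names are also hypothetical.

```python
import time


def generate_messages(count=3):
    """Yield a few placeholder messages (hypothetical content)."""
    for i in range(1, count + 1):
        yield f"Buzz message {i}"


def run_producer(topic="buzz_topic", server="localhost:9092", interval_secs=1.0):
    """Send each generated message to a Kafka topic. Needs a running broker."""
    # Assumption: the kafka-python package is installed in .venv.
    # Imported here so the generator above works without it.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=server,
        # Kafka stores bytes, so encode each string before sending.
        value_serializer=lambda text: text.encode("utf-8"),
    )
    try:
        for message in generate_messages():
            producer.send(topic, value=message)
            print(f"Sent: {message}")
            time.sleep(interval_secs)
    finally:
        # Make sure buffered messages are delivered before exiting.
        producer.flush()
        producer.close()
```

Calling `run_producer()` with the Kafka service running would send three messages, one per second.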
Consumers process data from topics or logs in real time.
In VS Code, open a NEW terminal in your root project folder (leave the producer running in the first one). Use the commands below to activate .venv and start the consumer.
Windows PowerShell:
.venv\Scripts\activate
py -m consumers.kafka_consumer_case
Mac/Linux:
source .venv/bin/activate
python3 -m consumers.kafka_consumer_case
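The consumer side can be sketched in the same spirit. Again, this is not the code in consumers/kafka_consumer_case. It assumes the kafka-python client, a broker at localhost:9092, and the made-up names `buzz_topic` and `buzz_group`. The processing step is a placeholder.

```python
def process_message(message):
    """Hypothetical processing step: tidy the text and report its length."""
    text = message.strip()
    return f"Processed: {text} ({len(text)} chars)"


def run_consumer(topic="buzz_topic", server="localhost:9092", group="buzz_group"):
    """Read messages from a Kafka topic until interrupted. Needs a running broker."""
    # Assumption: the kafka-python package is installed in .venv.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=server,
        group_id=group,
        # Start from the oldest message when this group has no saved position.
        auto_offset_reset="earliest",
        # Kafka delivers bytes, so decode each value back to a string.
        value_deserializer=lambda raw: raw.decode("utf-8"),
    )
    try:
        for record in consumer:
            print(process_message(record.value))
    except KeyboardInterrupt:
        print("Consumer stopped.")
    finally:
        consumer.close()
```

Calling `run_consumer()` blocks and prints each message as it arrives. Press Ctrl+C to stop it.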
When resuming work on this project:
- Open the project folder in VS Code.
- Start the Kafka service (in WSL if Windows).
- Activate your local project virtual environment (.venv) in your OS-specific terminal.
To save disk space, you can delete the .venv folder when not actively working on this project. You can always recreate it, activate it, and reinstall the necessary packages later. Managing Python virtual environments is a valuable skill.
This project is licensed under the MIT License as an example project. You are encouraged to fork, copy, explore, and modify the code as you like. See the LICENSE file for more.