This repository helps you understand the basic components needed to build a data pipeline for IoT data and how they work together. Use this setup to test individual components or see how they function as a complete system. You can also expand this setup to create a more complex pipeline and deploy it to cloud platforms like AWS, Azure, or Google Cloud.
I chose Docker Compose for local deployment to focus on understanding the components and their interactions without the complexity of cloud providers. This approach also makes it easy to share the setup and run it on any machine with minimal effort.
The pipeline and infrastructure include:
- MQTT Broker
- MQTT Agent/Application
- Data Lake (MinIO)
- Database (Cassandra)
- REST API (FastAPI)
- Orchestration (Airflow)
- Transformation (ELT)
The components are connected as follows:
- The MQTT Broker is the entry point for the data. It receives data from the IoT devices and publishes it to a topic.
- The MQTT Agent subscribes to the topic and writes the incoming messages to the Data Lake.
- The Data Lake stores raw data and acts as the source for the Transformation component.
- The Transformation component reads raw data from the Data Lake, processes it, and writes the results to the Database. Airflow orchestrates this workflow.
Once you have clean data in the database, you can use it for analytics, machine learning, or other applications.
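To make the agent's role concrete, a minimal subscriber that forwards every MQTT message into the data lake could look roughly like the sketch below. The topic, bucket name, and credentials are illustrative assumptions, not the repository's actual configuration.

```python
# Hedged sketch of an MQTT agent: subscribe to a topic and store every
# message in MinIO as a raw object. Not the repository's actual code.
import io
import time

import paho.mqtt.client as mqtt  # with paho-mqtt >= 2.0, pass mqtt.CallbackAPIVersion.VERSION1 to Client()
from minio import Minio

BUCKET = "raw-data"  # assumed bucket name

minio_client = Minio("localhost:9000",
                     access_key="minio", secret_key="minio123", secure=False)
if not minio_client.bucket_exists(BUCKET):
    minio_client.make_bucket(BUCKET)

def on_message(client, userdata, msg):
    # Store each payload under a timestamped key so nothing gets overwritten.
    key = f"{msg.topic}/{int(time.time() * 1000)}.json"
    minio_client.put_object(BUCKET, key, io.BytesIO(msg.payload), len(msg.payload))

mqtt_client = mqtt.Client()
mqtt_client.on_message = on_message
mqtt_client.connect("localhost", 1883)
mqtt_client.subscribe("test")  # topic used in the broker test below
mqtt_client.loop_forever()
```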
Prerequisites
- Docker
- Docker Compose
Component Testing
- Clone this repository
git clone https://github.com/daleonpz/iot_cloud_test.git
cd iot_cloud_test
- Build and Run the MQTT Broker
cd mqtt
docker build -t my-broker .
docker run -d --name my-broker -p 1883:1883 my-broker
- Test MQTT Broker
- Subscribe to Topic
docker exec -it my-broker mosquitto_sub -h localhost -t test
- Publish to Topic
In another terminal:
docker exec -it my-broker mosquitto_pub -h localhost -t test -m "hello"
- Build and Run the Data Lake (MinIO)
cd datalake
docker build -t my-datalake .
docker run -d --name my-datalake -p 9000:9000 -p 9001:9001 -e "MINIO_ACCESS_KEY=minio" -e "MINIO_SECRET_KEY=minio123" my-datalake server /data --console-address ":9001"
- Access Data Lake
Open http://localhost:9001 (the MinIO console) in your browser; the S3 API itself is served on port 9000.
- Access Key: minio
- Secret Key: minio123
If the console is not reachable via localhost, check the container logs for the URLs (including the container's IP address) that MinIO prints at startup:
docker logs my-datalake
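You can also verify access from Python with the minio client (pip install minio); this is just an optional sanity check using the credentials above.

```python
# Optional sanity check: list buckets on the local MinIO instance.
from minio import Minio

client = Minio("localhost:9000",
               access_key="minio", secret_key="minio123", secure=False)
print(client.list_buckets())  # an empty list on a fresh instance
```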
- Build and Run the Database (Cassandra)
cd database
docker build -t my-db .
docker run -d --name my-db -p 9042:9042 my-db
- Test Cassandra with cqlsh
docker exec -it my-db cqlsh localhost
- Run the following commands in cqlsh:
CREATE KEYSPACE iot WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE iot;
CREATE TABLE measurements (id UUID PRIMARY KEY, temperature float, battery_level float);
INSERT INTO measurements (id, temperature, battery_level) VALUES (uuid(), 25.0, 50.0);
SELECT * FROM measurements;
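The same insert and select can be done from Python with the DataStax driver (pip install cassandra-driver), assuming the keyspace and table created above:

```python
# Hedged sketch: talk to the local Cassandra container from Python.
import uuid
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"], port=9042)
session = cluster.connect("iot")

session.execute(
    "INSERT INTO measurements (id, temperature, battery_level) VALUES (%s, %s, %s)",
    (uuid.uuid4(), 23.5, 80.0),
)
for row in session.execute("SELECT * FROM measurements"):
    print(row.id, row.temperature, row.battery_level)

cluster.shutdown()
```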
- Build and Run the REST API
cd restapi
docker build -t api .
docker run -d --name api -p 8000:8000 --link my-db:my-db api
- Test API
For debugging, start the container with an interactive shell instead of the API process:
docker run -it --name api -p 8000:8000 --link my-db:my-db api bash
- Send Data to the Database
curl -X POST "http://localhost:8000/data/{id}" -H "accept: application/json" -H "Content-Type: application/json" -d '{"temperature": 25.0, "battery_level": 50.0}'
- Get Data from the Database
curl -X GET "http://localhost:8000/data/{id}" -H "accept: application/json"
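For reference, the endpoints exercised by these requests could be defined roughly as follows. This is only a sketch of what a FastAPI service backed by the Cassandra table above might look like, not the repository's actual restapi code.

```python
# Hedged sketch of the REST API (FastAPI + cassandra-driver), not the actual implementation.
import uuid

from cassandra.cluster import Cluster
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# "my-db" resolves inside the container thanks to --link my-db:my-db.
session = Cluster(["my-db"], port=9042).connect("iot")

class Measurement(BaseModel):
    temperature: float
    battery_level: float

@app.post("/data/{id}")
def create_measurement(id: uuid.UUID, m: Measurement):
    session.execute(
        "INSERT INTO measurements (id, temperature, battery_level) VALUES (%s, %s, %s)",
        (id, m.temperature, m.battery_level),
    )
    return {"id": str(id)}

@app.get("/data/{id}")
def read_measurement(id: uuid.UUID):
    row = session.execute("SELECT * FROM measurements WHERE id = %s", (id,)).one()
    return row._asdict() if row else {}
```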
- Build and Run the ELT Test
docker-compose -f docker-compose.yml.etl_test up --build
- Verify Data
docker exec -it my-db cqlsh localhost
- Run the following commands in cqlsh:
USE iot;
SELECT * FROM measurements;
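In outline, the transformation step does the reverse of the MQTT agent: it reads raw objects from the data lake, parses them, and loads rows into Cassandra. A hedged sketch, where the bucket name, payload fields, and hostnames are assumptions:

```python
# Hedged sketch of the ELT step: MinIO (raw JSON objects) -> Cassandra rows.
# Hostnames depend on where this runs (localhost here, service names inside Compose).
import json
import uuid

from cassandra.cluster import Cluster
from minio import Minio

BUCKET = "raw-data"  # assumed bucket name

minio_client = Minio("localhost:9000",
                     access_key="minio", secret_key="minio123", secure=False)
session = Cluster(["127.0.0.1"], port=9042).connect("iot")

for obj in minio_client.list_objects(BUCKET, recursive=True):
    record = json.loads(minio_client.get_object(BUCKET, obj.object_name).read())
    session.execute(
        "INSERT INTO measurements (id, temperature, battery_level) VALUES (%s, %s, %s)",
        (uuid.uuid4(), float(record["temperature"]), float(record["battery_level"])),
    )
```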
- Build and Run the MQTT Application Test
docker-compose -f docker-compose.yml.mqtt_app_test up --build
- Publish Test Data
cd mqtt/
python mqtt_publisher_test.py
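mqtt_publisher_test.py ships with the repository; if you want to send your own messages instead, a minimal publisher with paho-mqtt might look like this (topic and payload fields are assumptions):

```python
# Hedged sketch of a test publisher, not the repository's mqtt_publisher_test.py.
import json
import random
import time

import paho.mqtt.client as mqtt  # with paho-mqtt >= 2.0, pass mqtt.CallbackAPIVersion.VERSION1 to Client()

client = mqtt.Client()
client.connect("localhost", 1883)
client.loop_start()

for _ in range(10):
    payload = json.dumps({
        "temperature": round(random.uniform(20.0, 30.0), 1),
        "battery_level": round(random.uniform(0.0, 100.0), 1),
    })
    client.publish("test", payload)  # assumed topic
    time.sleep(1)

client.loop_stop()
client.disconnect()
```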
- Build and Run the Full Pipeline
docker-compose -f docker-compose.yml up --build
- Publish Test Data
cd mqtt/
python mqtt_publisher_test.py
- Access Airflow
Log in to http://localhost:8080 with:
- Username: airflow
- Password: airflow
Trigger the DAG:
- Click on "transform_data" under the "DAG" tab.
- Click "Trigger DAG" or the "Play" button.
- Verify Data
docker exec -it my-db cqlsh localhost
- Run the following commands in cqlsh:
USE iot;
SELECT * FROM measurements;
- Remove All Containers
./tools/delete_containers.sh
- Delete All Images
./tools/delete_docker_images.sh
- There is a .env file in the root directory that sets environment variables for the services. You can modify this file as needed.
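For example, it might contain entries like these (illustrative values taken from the commands above; check the actual file for the variables each service expects):

```
# Illustrative only - see the repository's .env for the real variable names and values.
MINIO_ACCESS_KEY=minio
MINIO_SECRET_KEY=minio123
```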