This project analyzes the sentiment of IRC chat messages from the osu! community's #osu general chat channel. It uses Docker, Kafka, Logstash, Elasticsearch, Kibana, and Spark to process and visualize the chat data.
graph LR
  A[osu! IRC Server] -->|Chat Messages| B[Ingestion Service]
  B -->|Forward Messages| C[Logstash]
  C -->|Forward Messages| D[Kafka]
  D -->|Consume Messages| E[Spark]
  E -->|Sentiment Analysis| F[Elasticsearch]
  F -->|Store Data| G[Kibana]
  subgraph Data Ingestion
    B
  end
  subgraph Data Processing
    C
    D
    E
  end
  subgraph Data Storage & Visualization
    F
    G
  end
.env.example
.gitignore
docker-compose.yml
ingestion/
    .env
    Dockerfile
    ingestion.py
Kibana/
    dashboard.ndjson
logstash/
    config/
        logstash.conf
    Dockerfile
spark/
    Dockerfile
    sentiment_analysis.py
    worker.Dockerfile
The ingestion service connects to the osu! IRC server, collects chat messages, and sends them to Logstash.
- Dockerfile: Defines the Docker image for the ingestion service.
- ingestion.py: The main script for connecting to the IRC server and sending messages to Logstash.
- .env: Contains environment variables for the IRC connection.
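The ingestion step described above can be sketched roughly as follows. This is a minimal illustration, not the contents of ingestion.py: the host `irc.ppy.sh` and port 6667, the Logstash host `logstash` and port 5000, the newline-delimited JSON format, and the field names `user`, `channel`, `message`, and `timestamp` are all assumptions made for the example.

```python
import json
import re
import socket
from datetime import datetime, timezone

# Matches ":nick!user@host PRIVMSG #channel :message text"
PRIVMSG_RE = re.compile(
    r"^:(?P<nick>[^!\s]+)!\S+ PRIVMSG (?P<channel>\S+) :(?P<text>.*)$"
)


def parse_privmsg(line):
    """Return a dict for a PRIVMSG line, or None for any other IRC line."""
    match = PRIVMSG_RE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return {
        "user": match.group("nick"),
        "channel": match.group("channel"),
        "message": match.group("text"),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def to_json_line(event):
    """Serialize one event as a newline-terminated JSON document."""
    return (json.dumps(event, ensure_ascii=False) + "\n").encode("utf-8")


def run(nick, password, irc_host="irc.ppy.sh", irc_port=6667,
        logstash_host="logstash", logstash_port=5000, channel="#osu"):
    """Read chat lines from IRC and forward parsed messages to Logstash.

    All default hosts and ports here are assumptions for illustration.
    """
    with socket.create_connection((irc_host, irc_port)) as irc, \
            socket.create_connection((logstash_host, logstash_port)) as out:
        # A fuller client would wait for the welcome reply before JOIN.
        for command in (f"PASS {password}", f"NICK {nick}",
                        f"USER {nick} 0 * :{nick}", f"JOIN {channel}"):
            irc.sendall((command + "\r\n").encode("utf-8"))
        buffer = b""
        while True:
            chunk = irc.recv(4096)
            if not chunk:
                break  # server closed the connection
            buffer += chunk
            # Keep the trailing partial line in the buffer for the next read
            *lines, buffer = buffer.split(b"\r\n")
            for raw in lines:
                line = raw.decode("utf-8", errors="replace")
                if line.startswith("PING"):
                    # Answer keep-alives so the server does not drop us
                    reply = line.replace("PING", "PONG", 1)
                    irc.sendall(reply.encode("utf-8") + b"\r\n")
                    continue
                event = parse_privmsg(line)
                if event is not None:
                    out.sendall(to_json_line(event))
```

Keeping `parse_privmsg` separate from the socket loop means the parsing can be exercised without a network connection, for example `parse_privmsg(":peppy!cho@ppy.sh PRIVMSG #osu :hello world")`.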
Logstash receives chat messages from the ingestion service and forwards them to Kafka.
- Dockerfile: Defines the Docker image for Logstash.
- logstash.conf: Configuration file for Logstash.
Kafka acts as the message broker between Logstash and Spark, buffering the chat messages until Spark consumes them.
Spark processes the chat messages, performs sentiment analysis, and sends the results to Elasticsearch.
- Dockerfile: Defines the Docker image for the Spark master and Spark submit services.
- worker.Dockerfile: Defines the Docker image for the Spark worker service.
- sentiment_analysis.py: The main script for performing sentiment analysis on chat messages.
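This README does not say which sentiment method sentiment_analysis.py uses, so the following is only a stand-in to show the general shape of a per-message scoring step. It uses a tiny lexicon-based approach; the word lists, the `sentiment_score` and `sentiment_label` names, and the threshold are hypothetical.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"good", "great", "nice", "love", "fun", "gg"}
NEGATIVE = {"bad", "hate", "awful", "lag", "toxic", "boring"}


def sentiment_score(message):
    """Return a score in [-1.0, 1.0]: (positive - negative) / matched words."""
    words = re.findall(r"[a-z']+", message.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    matched = positive + negative
    if matched == 0:
        return 0.0  # no known sentiment words: treat as neutral
    return (positive - negative) / matched


def sentiment_label(score, threshold=0.25):
    """Map a numeric score to a coarse label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


print(sentiment_label(sentiment_score("gg that was a great map")))  # positive
```

In a Spark job, a pure function like this could plausibly be applied to each message as a user-defined function, with the score and label written alongside the original text.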
Elasticsearch stores the processed chat messages and their sentiment scores.
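To make the stored data concrete, here is a hedged sketch of what a processed message could look like as an Elasticsearch document, serialized in the newline-delimited format that the `_bulk` endpoint accepts. The index name `osu-chat-sentiment` and the document fields are assumptions; the project's Spark job may write to Elasticsearch through a different mechanism.

```python
import json


def build_bulk_body(documents, index="osu-chat-sentiment"):
    """Build a newline-delimited request body for Elasticsearch's _bulk API."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc, ensure_ascii=False))       # document source
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


# Hypothetical document shape: the original message plus its sentiment score.
example = {
    "user": "peppy",
    "channel": "#osu",
    "message": "gg that was a great map",
    "sentiment_score": 1.0,
}
print(build_bulk_body([example]), end="")
```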
Kibana is used to visualize the chat messages and their sentiment scores.
- dashboard.ndjson: Contains the Kibana dashboard configuration.
- Clone the repository and change into its directory:
    git clone https://github.com/yourusername/osu-chat-sentiment-analysis.git
    cd osu-chat-sentiment-analysis
- Copy the example environment file:
    cp .env.example ingestion/.env
- Edit ingestion/.env with your osu! IRC credentials.
- Build and start the Docker containers:
    docker-compose up --build
- The ingestion service connects to the osu! IRC server and collects chat messages.
- Logstash receives the messages and forwards them to Kafka.
- Spark processes the messages, performs sentiment analysis, and sends the results to Elasticsearch.
- Kibana visualizes the chat messages and their sentiment scores.
You can access the Kibana dashboard at http://localhost:5601 to see the visualizations of the chat messages and their sentiment scores.
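Kibana can take a while to become reachable after the containers start. As an optional convenience, a small polling helper like the one below could be used to wait for it. This is a sketch under assumptions: it polls Kibana's `/api/status` endpoint on the default port, and the function names and retry settings are made up for the example.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def wait_until_ready(url, attempts=30, delay=2.0,
                     fetch=http_status, sleep=time.sleep):
    """Poll url until it returns HTTP 200; return False if attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no need to sleep after the final attempt
    return False


if __name__ == "__main__":
    ready = wait_until_ready("http://localhost:5601/api/status")
    print("Kibana is ready" if ready else "Kibana did not become ready")
```

The `fetch` and `sleep` parameters exist only so the retry logic can be tested without a running Kibana instance.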
Contributions are welcome! Please open an issue or submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.