Nimbus is a blog recommendation system designed to help users stay updated with the latest insights in technology. It aggregates blog updates from various sources, processes the data, and provides personalized recommendations through natural language queries.
- Automated Blog Fetching: Periodically collects blog updates from RSS/Atom feeds
- Content Processing: Generates summaries and embeddings using OpenAI API
- Vector Search: Fast similarity search using DuckDB with VSS extension
- REST API: Real-time blog recommendations based on natural language queries
- CLI Tool: Command-line interface for searching blogs
- Interactive Mode: User-friendly interface with search history
Nimbus is designed as a serverless and cost-effective system deployed on Google Cloud Platform (GCP). The system consists of three main services:
Blog Fetcher (blog_fetcher):
- Purpose: Fetches blog updates from RSS/Atom feeds
- Technology: Python with feedparser, BeautifulSoup
- Storage: BigQuery for structured data storage
- Deployment: Cloud Run Job triggered by Cloud Scheduler
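To make the fetch step concrete, here is a minimal sketch of turning an RSS document into structured entries. It uses only the standard library (xml.etree) rather than feedparser so that it runs without extra dependencies; the sample feed, the `parse_rss` helper, and the output field names are illustrative assumptions, not the service's actual code or schema.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed; a real run would download this from a feed URL.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Tech Blog</title>
    <item>
      <title>Scaling Cloud Run Jobs</title>
      <link>https://example.com/posts/cloud-run-jobs</link>
      <pubDate>Mon, 06 Jan 2025 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Vector Search Basics</title>
      <link>https://example.com/posts/vector-search</link>
      <pubDate>Tue, 07 Jan 2025 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return one dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        entries.append(
            {
                "title": item.findtext("title", default="").strip(),
                "link": item.findtext("link", default="").strip(),
                # RSS dates use the RFC 822 format, which email.utils can parse.
                "published": parsedate_to_datetime(item.findtext("pubDate")),
            }
        )
    return entries


if __name__ == "__main__":
    for entry in parse_rss(SAMPLE_RSS):
        print(entry["published"].date(), entry["title"], entry["link"])
```

Rows shaped like these could then be written to a table for the next stage to read. A dedicated feed library is likely the better choice in practice because it also handles Atom and malformed feeds.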
Blog Preprocessor (blog_preprocessor):
- Purpose: Processes blog entries to generate summaries and embeddings
- Technology:
  - OpenAI API for summarization (GPT-4o-mini)
  - OpenAI Embeddings API (text-embedding-3-small) for vectorization
  - DuckDB with VSS extension for vector storage
- Features: Async processing, supports both local and GCS storage
- Deployment: Cloud Run Job
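The async processing mentioned above can be sketched as a bounded-concurrency loop. This is an illustration only: `summarize` is a hypothetical stand-in that sleeps instead of calling the OpenAI API, and the entry fields and the `max_concurrency` parameter are assumptions rather than the service's real interface.

```python
import asyncio


async def summarize(text):
    """Hypothetical stand-in for a summarization API call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return text[:40]


async def process_entries(entries, max_concurrency=3):
    """Summarize entries concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(entry):
        async with semaphore:
            summary = await summarize(entry["content"])
        return {"id": entry["id"], "summary": summary}

    # gather returns results in the same order as the input entries.
    return await asyncio.gather(*(worker(entry) for entry in entries))


def run_demo():
    entries = [{"id": i, "content": f"Post {i} body text"} for i in range(5)]
    return asyncio.run(process_entries(entries))


if __name__ == "__main__":
    for result in run_demo():
        print(result)
```

Capping concurrency with a semaphore is one common way to stay under an API rate limit while still overlapping network waits.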
Blog Recommender (blog_recommender):
- Purpose: Provides REST API for blog recommendations
- Technology:
  - FastAPI for REST API
  - DuckDB VSS for vector similarity search
  - HNSW index with cosine similarity
- Features:
  - Real-time search with natural language queries
  - Complete metadata retrieval without BigQuery dependency
  - CLI tool with multiple output formats
- Deployment: Cloud Run Service with auto-scaling
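To show what cosine-similarity ranking computes, the sketch below scores every stored vector against a query vector and returns the best matches. It is a brute-force scan in plain Python, not the DuckDB VSS query the service uses; an HNSW index exists to avoid this full scan by searching approximately. The three-dimensional vectors and document IDs are made-up toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, docs, k=2):
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy embeddings; real embeddings have many more dimensions.
DOCS = {
    "k8s-security": [0.9, 0.1, 0.0],
    "python-async": [0.1, 0.9, 0.2],
    "docker-intro": [0.7, 0.2, 0.1],
}

if __name__ == "__main__":
    for doc_id, score in top_k([1.0, 0.0, 0.0], DOCS):
        print(f"{doc_id}: {score:.3f}")
```

In the real pipeline the query vector would come from embedding the user's natural language query with the same model used for the stored blog entries, so that both live in the same vector space.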
- Python 3.13+
- uv (Python package manager)
- Task (Taskfile)
- Google Cloud SDK (for deployment)
- Clone the repository
git clone https://github.com/uu64/nimbus.git
cd nimbus
- Set up each service
# Blog Fetcher
cd blog_fetcher
task install
task test
# Blog Preprocessor
cd ../blog_preprocessor
task install
cp .env.local.example .env.local # Configure your OpenAI API key
task test
# Blog Recommender
cd ../blog_recommender
task install
task test
- Run the services
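# In separate terminals, each starting from the repository root.
# Run the fetcher first, then the preprocessor, since it reads what the fetcher stored: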
# Terminal 1: Run blog fetcher (one-time)
cd blog_fetcher
task run
# Terminal 2: Run blog preprocessor (one-time)
cd blog_preprocessor
task run
# Terminal 3: Run blog recommender API
cd blog_recommender
task run
- Use the CLI tool
cd blog_recommender
# Search for blogs
uv run nimbus-cli search "Kubernetes security best practices"
# Interactive mode
uv run nimbus-cli interactive
# Check API health
uv run nimbus-cli health
# Search with natural language query
nimbus-cli search "Python async programming tips"
# Search with options
nimbus-cli search "Docker tutorials" --limit 5 --days 30
# Get different output formats
nimbus-cli search "React hooks" --format json
nimbus-cli search "Vue.js" --format simple
# Open first result in browser
nimbus-cli search "TypeScript" --open
# Get blog details
nimbus-cli detail <blog-id>
nimbus-cli interactive
# In interactive mode:
> Kubernetes security # Search
> 1 # Show details for result #1
> open 2 # Open result #2 in browser
> help # Show available commands
> exit # Exit
Each service uses environment variables for configuration:
- blog_fetcher: Uses the ENV variable (local/production)
- blog_preprocessor: Requires a .env file with the OpenAI API key
- blog_recommender: Optional NIMBUS_API_URL for the CLI tool
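As a rough illustration of how such variables might be read, the sketch below resolves ENV and NIMBUS_API_URL with fallbacks. The `load_settings` helper, the default values ("local" and http://localhost:8000), and the validation rule are assumptions for this example; the services' actual defaults may differ.

```python
import os

ALLOWED_ENVS = ("local", "production")


def load_settings(environ=None):
    """Read settings from an environment mapping (defaults to os.environ)."""
    env = os.environ if environ is None else environ

    mode = env.get("ENV", "local")  # assumed default
    if mode not in ALLOWED_ENVS:
        raise ValueError(f"ENV must be one of {ALLOWED_ENVS}, got {mode!r}")

    # Assumed default URL; the trailing slash is stripped so paths can be appended.
    api_url = env.get("NIMBUS_API_URL", "http://localhost:8000").rstrip("/")
    return {"env": mode, "api_url": api_url}


if __name__ == "__main__":
    print(load_settings())
```

Passing the mapping as a parameter keeps the function easy to test without modifying the real process environment.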
- blog_fetcher → BigQuery (feed entries)
- BigQuery → blog_preprocessor → DuckDB (embeddings + metadata)
- DuckDB → blog_recommender → REST API/CLI
- Create a GCP project
- Enable required APIs: Cloud Run, BigQuery, Cloud Storage, Cloud Scheduler
- Set up service accounts with appropriate permissions
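# Deploy each service (run from the repository root)
# Subshells keep the working directory at the root between commands
(cd blog_fetcher && task deploy)
(cd blog_preprocessor && task deploy)
(cd blog_recommender && task deploy)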
- blog_fetcher: Schedule with Cloud Scheduler (e.g., daily)
- blog_preprocessor: Schedule after blog_fetcher completes
- Languages: Python 3.13+
- Frameworks: FastAPI, Typer, Rich
- Databases: BigQuery, DuckDB with VSS
- AI/ML: OpenAI API (GPT-4o-mini, text-embedding-3-small)
- Cloud: Google Cloud Platform (Cloud Run, BigQuery, Cloud Storage)
- Tools: uv, Task, pytest, ruff
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: task test
- Format code: task fmt
- Submit a pull request
Nimbus: Keeping you updated with the latest in tech, effortlessly.