
nimbus

Nimbus is a blog recommendation system designed to help users stay updated with the latest insights in technology. It aggregates blog updates from various sources, processes the data, and provides personalized recommendations through natural language queries.

Features

  • Automated Blog Fetching: Periodically collects blog updates from RSS/Atom feeds
  • Content Processing: Generates summaries and embeddings using OpenAI API
  • Vector Search: Fast similarity search using DuckDB with VSS extension
  • REST API: Real-time blog recommendations based on natural language queries
  • CLI Tool: Command-line interface for searching blogs
  • Interactive Mode: User-friendly interface with search history

Architecture Overview

Nimbus is designed as a serverless and cost-effective system deployed on Google Cloud Platform (GCP). The system consists of three main services:

1. Blog Fetcher

  • Purpose: Fetches blog updates from RSS/Atom feeds
  • Technology: Python with feedparser, BeautifulSoup
  • Storage: BigQuery for structured data storage
  • Deployment: Cloud Run Job triggered by Cloud Scheduler
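At its core, the fetch step parses each feed and strips markup from entry summaries. A minimal standard-library sketch of that idea (the actual service uses feedparser and BeautifulSoup, which handle Atom feeds, encodings, and malformed markup far more robustly):

```python
import html
import re
import xml.etree.ElementTree as ET

def parse_rss(xml_text: str) -> list[dict]:
    """Extract title/link/summary from a plain RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        def text(tag: str) -> str:
            node = item.find(tag)
            return node.text.strip() if node is not None and node.text else ""
        # Strip HTML tags from the description, as BeautifulSoup would
        summary = re.sub(r"<[^>]+>", "", html.unescape(text("description")))
        entries.append({"title": text("title"), "link": text("link"), "summary": summary})
    return entries

feed = """<rss version="2.0"><channel><title>Example Blog</title>
<item><title>Post 1</title><link>https://example.com/1</link>
<description>&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</description></item>
</channel></rss>"""

print(parse_rss(feed))
```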

2. Blog Preprocessor

  • Purpose: Processes blog entries to generate summaries and embeddings
  • Technology:
    • OpenAI API for summarization (GPT-4o-mini)
    • OpenAI Embeddings API (text-embedding-3-small) for vectorization
    • DuckDB with VSS extension for vector storage
  • Features: Async processing, supports both local and GCS storage
  • Deployment: Cloud Run Job
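"Async processing" here typically means fanning out many API calls while capping concurrency to stay within rate limits. A minimal sketch of that pattern with asyncio (`process_entry` is a stand-in; the real job would call the OpenAI summarization and embedding APIs inside it):

```python
import asyncio

async def process_entry(entry: dict) -> dict:
    # Placeholder for the real work: summarize the entry with GPT-4o-mini
    # and embed the summary with text-embedding-3-small.
    await asyncio.sleep(0)  # simulate an I/O-bound API call
    return {**entry, "summary": entry["body"][:60], "embedding": [0.0] * 4}

async def process_all(entries: list[dict], max_concurrency: int = 5) -> list[dict]:
    # A semaphore caps in-flight API calls, a common pattern for rate limits.
    sem = asyncio.Semaphore(max_concurrency)
    async def bounded(e: dict) -> dict:
        async with sem:
            return await process_entry(e)
    return await asyncio.gather(*(bounded(e) for e in entries))

entries = [{"id": i, "body": f"post {i} text"} for i in range(3)]
results = asyncio.run(process_all(entries))
print([r["summary"] for r in results])
```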

3. Blog Recommender

  • Purpose: Provides REST API for blog recommendations
  • Technology:
    • FastAPI for REST API
    • DuckDB VSS for vector similarity search
    • HNSW index with cosine similarity
  • Features:
    • Real-time search with natural language queries
    • Complete metadata retrieval without BigQuery dependency
    • CLI tool with multiple output formats
  • Deployment: Cloud Run Service with auto-scaling
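Under the hood, a recommendation is a nearest-neighbor search over embeddings: the query is embedded, then blogs are ranked by cosine similarity. A pure-Python sketch of that ranking (the actual service delegates it to DuckDB's HNSW index, which answers the same query approximately and much faster):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec: list[float], blogs: list[dict], k: int = 2) -> list[dict]:
    # Rank blog embeddings by cosine similarity to the query embedding.
    scored = sorted(blogs, key=lambda b: cosine_similarity(query_vec, b["embedding"]),
                    reverse=True)
    return scored[:k]

blogs = [
    {"title": "K8s security", "embedding": [1.0, 0.0, 0.1]},
    {"title": "Cooking pasta", "embedding": [0.0, 1.0, 0.0]},
    {"title": "Container hardening", "embedding": [0.9, 0.1, 0.2]},
]
print([b["title"] for b in top_k([1.0, 0.0, 0.0], blogs)])
```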

Quick Start

Prerequisites

  • Python 3.13+
  • uv (Python package manager)
  • Task (Taskfile)
  • Google Cloud SDK (for deployment)

Local Development

  1. Clone the repository
git clone https://github.com/uu64/nimbus.git
cd nimbus
  2. Set up each service
# Blog Fetcher
cd blog_fetcher
task install
task test

# Blog Preprocessor
cd ../blog_preprocessor
task install
cp .env.local.example .env.local  # Configure your OpenAI API key
task test

# Blog Recommender
cd ../blog_recommender
task install
task test
  3. Run the services
# In separate terminals:

# Terminal 1: Run blog fetcher (one-time)
cd blog_fetcher
task run

# Terminal 2: Run blog preprocessor (one-time)
cd blog_preprocessor
task run

# Terminal 3: Run blog recommender API
cd blog_recommender
task run
  4. Use the CLI tool
cd blog_recommender

# Search for blogs
uv run nimbus-cli search "Kubernetes security best practices"

# Interactive mode
uv run nimbus-cli interactive

# Check API health
uv run nimbus-cli health

CLI Usage

Basic Commands

# Search with natural language query
nimbus-cli search "Python async programming tips"

# Search with options
nimbus-cli search "Docker tutorials" --limit 5 --days 30

# Get different output formats
nimbus-cli search "React hooks" --format json
nimbus-cli search "Vue.js" --format simple

# Open first result in browser
nimbus-cli search "TypeScript" --open

# Get blog details
nimbus-cli detail <blog-id>

Interactive Mode

nimbus-cli interactive

# In interactive mode:
> Kubernetes security          # Search
> 1                           # Show details for result #1
> open 2                      # Open result #2 in browser
> help                        # Show available commands
> exit                        # Exit

Configuration

Environment Variables

Each service uses environment variables for configuration:

  • blog_fetcher: Uses the ENV variable to switch between local and production settings
  • blog_preprocessor: Requires .env file with OpenAI API key
  • blog_recommender: Optional NIMBUS_API_URL for CLI tool
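For blog_preprocessor, a minimal .env.local could contain just the key below. OPENAI_API_KEY is the variable name the official OpenAI SDK reads; copy any further settings from .env.local.example.

```
# Required: read by the OpenAI SDK for summarization and embeddings
OPENAI_API_KEY=sk-...
```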

Data Flow

  1. blog_fetcher → BigQuery (feed entries)
  2. BigQuery → blog_preprocessor → DuckDB (embeddings + metadata)
  3. DuckDB → blog_recommender → REST API/CLI

Deployment

Google Cloud Platform Setup

  1. Create a GCP project
  2. Enable required APIs: Cloud Run, BigQuery, Cloud Storage, Cloud Scheduler
  3. Set up service accounts with appropriate permissions

Deploy Services

# Deploy each service
cd blog_fetcher && task deploy
cd blog_preprocessor && task deploy
cd blog_recommender && task deploy

Schedule Jobs

  • blog_fetcher: Schedule with Cloud Scheduler (e.g., daily)
  • blog_preprocessor: Schedule after blog_fetcher completes
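Assuming the Cloud Run Jobs are named blog-fetcher and blog-preprocessor (names illustrative, as are PROJECT_ID, REGION, and SA_EMAIL), a daily trigger can be created with Cloud Scheduler's HTTP target against the Cloud Run Admin API:

```
# Illustrative: run the fetcher job daily at 06:00
gcloud scheduler jobs create http nimbus-fetch-daily \
  --location=REGION \
  --schedule="0 6 * * *" \
  --uri="https://REGION-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/PROJECT_ID/jobs/blog-fetcher:run" \
  --http-method=POST \
  --oauth-service-account-email=SA_EMAIL
```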

Technology Stack

  • Languages: Python 3.13+
  • Frameworks: FastAPI, Typer, Rich
  • Databases: BigQuery, DuckDB with VSS
  • AI/ML: OpenAI API (GPT-4o-mini, text-embedding-3-small)
  • Cloud: Google Cloud Platform (Cloud Run, BigQuery, Cloud Storage)
  • Tools: uv, Task, pytest, ruff

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests: task test
  5. Format code: task fmt
  6. Submit a pull request

License

MIT License


Nimbus: Keeping you updated with the latest in tech, effortlessly.
