A Retrieval-Augmented Generation (RAG) system combines information retrieval with language generation. It works by searching a collection of documents to find relevant content for a user's question, then uses that information to generate a helpful answer.
In this project, the RAG system reads a PDF document, retrieves the most relevant sections based on your query, and presents the answer in a user-friendly web app.
See deployed app version here: https://compliance-rag.streamlit.app/
I used the Cursor agent on Auto model select to help build this project.
Benefits of automating this process include:
- Faster information retrieval: Instantly find relevant content without manual searching.
- Improved accuracy: Reduces the risk of missing important details in large documents.
- Consistency: Ensures the same process is followed every time, minimizing human error.
- Reduced Hallucination: The RAG system only returns information that actually exists in your documents, minimizing the risk of generating made-up or inaccurate answers. This means you get reliable, document-grounded responses instead of AI "hallucinations."
- Time savings: Frees up time for more valuable tasks by automating repetitive lookups.
- Scalability: Easily handles multiple or large documents as your needs grow.
- Checklist Tracking: The system can help maintain a checklist of compliance tasks or requirements, marking off what has already been completed. This ensures nothing is overlooked and provides a clear record of progress.
- Completely Free - No API costs or cloud services required
- Local Processing - Everything runs on your machine
- Simple Setup - Minimal dependencies and configuration
- Persistent Storage - Documents are saved locally
- Python 3.8+
- pip package manager
- virtual environment
- Install dependencies:
pip install -r requirements.txt
- Create sample pdf:
python scripts/create_sample_pdf.py
- Run unit tests (optional):
python -m unittest tests.test_create_sample_pdf -v
- Run the demo app:
streamlit run rag_system/web_app.py
rag-tutorial/
├── rag_system/
│ ├── core.py # RAG core logic
│ ├── utils.py # RAG utilities
│ └── web_app.py # Streamlit web app
├── scripts/
│ └── create_sample_pdf.py # Script to generate sample PDF for testing
├── tests/
│ └── test_create_sample_pdf.py # Unit tests for PDF creation
├── data/
│ └── documents/ # Folder for PDF documents
├── requirements.txt # Python dependencies
├── README.md # This file
└── ARCHITECTURE.md # Project architecture
The project includes unit tests to verify that the PDF creation functionality works correctly:
- Test PDF creation: Verifies that the PDF file is generated successfully
- Test content verification: Ensures that expected content appears in the generated PDF
- Test multiple content items: Validates that various sections are included
- Test page count: Confirms the PDF has the expected number of pages
Run all tests with:
python -m unittest tests.test_create_sample_pdf -v