-
Notifications
You must be signed in to change notification settings - Fork 587
fix: run cognee in Docker [COG-1961] (#775) #779
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
<!-- .github/pull_request_template.md --> ## Description Resolve issue with .venv being broken when using docker compose with Cognee ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
Please make sure all the checkboxes are checked:
|
1 similar comment
Please make sure all the checkboxes are checked:
|
WalkthroughThis update introduces several changes across the project. Two new Alembic migration scripts are added: one for initial database creation and another for inserting a default user. The Dockerfile is revised to clarify and correct package installation steps, and the build context is reorganized to explicitly include migration files and a new entrypoint script. The entrypoint script is introduced to manage database migrations and conditional debug server startup. Documentation is updated with development instructions for using a local build of the Changes
Sequence Diagram(s)sequenceDiagram
participant EntrypointScript as entrypoint.sh
participant Alembic as Alembic (DB migration)
participant Server as Cognee MCP Server
EntrypointScript->>Alembic: Run migrations
Alembic-->>EntrypointScript: Migration result (success or handled error)
EntrypointScript->>EntrypointScript: Wait 2 seconds
EntrypointScript->>Server: Start server (with or without debugger)
Possibly related PRs
Suggested labels
Poem
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
Dockerfile (1)
28-32
: 🛠️ Refactor suggestionConsider combining apt-get update and install in the same RUN instruction.
Separating
apt-get update
fromapt-get install
can lead to inconsistency issues if the Docker build cache is used, as you might end up installing packages from an outdated package list. While this change fixes the previous syntax, it introduces a potential problem.-RUN apt-get update - -RUN apt-get install -y \ - gcc \ - libpq-dev +RUN apt-get update && apt-get install -y \ + gcc \ + libpq-dev
🧹 Nitpick comments (12)
cognee-mcp/README.md (1)
88-105
: Development section instructions look good.The new development section provides clear steps for using a local build of the cognee library. The instructions are comprehensive and will improve the developer experience when working with local changes.
Consider these minor improvements:
-In order to use local cognee build, run in root of the cognee repo: +To use a local cognee build, run in root of the cognee repo: -After that add the following snippet to the same file (`cognee-mcp/pyproject.toml`). +After that, add the following snippet to the same file (`cognee-mcp/pyproject.toml`).🧰 Tools
🪛 LanguageTool
[style] ~88-~88: Consider a shorter alternative to avoid wordiness.
Context: ...mcp dev src/server.py` ### Development In order to use local cognee build, run in root of ...(IN_ORDER_TO_PREMIUM)
[uncategorized] ~101-~101: A comma might be missing here.
Context: ...codegraph,gemini,huggingface] ``` After that add the following snippet to the same f...(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
alembic/versions/8057ae7329c2_initial_migration.py (1)
1-27
: Initial migration implementation looks correct.The migration properly handles database creation and deletion using the sqlalchemy engine. The use of
await_only
correctly bridges the gap between the synchronous Alembic framework and asynchronous database operations.Consider adding minimal error handling and logging for better diagnostics during migration:
def upgrade() -> None: db_engine = get_relational_engine() - await_only(db_engine.create_database()) + try: + await_only(db_engine.create_database()) + print("Database created successfully") + except Exception as e: + print(f"Error creating database: {e}") + raise def downgrade() -> None: db_engine = get_relational_engine() - await_only(db_engine.delete_database()) + try: + await_only(db_engine.delete_database()) + print("Database deleted successfully") + except Exception as e: + print(f"Error deleting database: {e}") + raisealembic/versions/482cd6517ce4_add_default_user.py (3)
16-20
: Potential inconsistency in revision dependenciesI noticed that both
down_revision
anddepends_on
are set to the same value "8057ae7329c2". Typically,down_revision
is used to specify the previous revision in a linear migration path, whiledepends_on
is used for non-linear dependencies. Having both set to the same value is redundant.- down_revision: Union[str, None] = "8057ae7329c2" - depends_on: Union[str, Sequence[str], None] = "8057ae7329c2" + down_revision: Union[str, None] = "8057ae7329c2" + depends_on: Union[str, Sequence[str], None] = None
27-28
: Consider adding error handling for user deletionThe downgrade function should handle the case where the user doesn't exist.
def downgrade() -> None: - await_only(delete_user("default_user@example.com")) + try: + await_only(delete_user("default_user@example.com")) + except Exception as e: + if "UserNotFound" in str(e) or "User not found" in str(e): + # User might have been manually deleted, this is fine + pass + else: + # Re-raise other exceptions + raise
13-13
: Consider using a constant for the default emailThe default email "default_user@example.com" is used in the downgrade function but not defined as a constant, creating potential for inconsistency with the create_default_user function.
from cognee.modules.users.methods import create_default_user, delete_user + +# This should match the default in create_default_user +DEFAULT_USER_EMAIL = "default_user@example.com" # ... def downgrade() -> None: - await_only(delete_user("default_user@example.com")) + await_only(delete_user(DEFAULT_USER_EMAIL))cognee-mcp/entrypoint.sh (4)
3-3
: The script usesset -e
withoutset -o pipefail
While
set -e
ensures the script exits on error, it doesn't catch errors in pipelines. This could lead to silent failures in piped commands.- set -e # Exit on error + set -e # Exit on error + set -o pipefail # Exit on error in pipeline
16-24
: Consider additional retries for database migrationsThe script handles the case where the default user already exists but doesn't implement retries for other temporary database connection issues that might occur during migration.
Consider adding a retry mechanism for other transient errors:
- MIGRATION_OUTPUT=$(uv run alembic upgrade head 2>&1) || { - if [[ $MIGRATION_OUTPUT == *"UserAlreadyExists"* ]] || [[ $MIGRATION_OUTPUT == *"User default_user@example.com already exists"* ]]; then - echo "Warning: Default user already exists, continuing startup..." - else - echo "Migration failed with unexpected error:" - echo "$MIGRATION_OUTPUT" - exit 1 - fi - } + # Initialize retry counter + RETRY_COUNT=0 + MAX_RETRIES=3 + + while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do + MIGRATION_OUTPUT=$(uv run alembic upgrade head 2>&1) && break + + # Check for known acceptable errors + if [[ $MIGRATION_OUTPUT == *"UserAlreadyExists"* ]] || [[ $MIGRATION_OUTPUT == *"User default_user@example.com already exists"* ]]; then + echo "Warning: Default user already exists, continuing startup..." + break + # Check for connection errors that might be temporary + elif [[ $MIGRATION_OUTPUT == *"connection"* ]] || [[ $MIGRATION_OUTPUT == *"timeout"* ]]; then + RETRY_COUNT=$((RETRY_COUNT+1)) + echo "Database connection issue. Retry $RETRY_COUNT of $MAX_RETRIES in 5 seconds..." + sleep 5 + else + echo "Migration failed with unexpected error:" + echo "$MIGRATION_OUTPUT" + exit 1 + fi + done + + if [ $RETRY_COUNT -eq $MAX_RETRIES ]; then + echo "Failed to run migrations after $MAX_RETRIES retries. Last error:" + echo "$MIGRATION_OUTPUT" + exit 1 + fi
31-41
: Consider checking for required environment variablesThe script doesn't validate if required environment variables are set before starting the application.
Add environment variable validation:
+ # Check for required environment variables + REQUIRED_VARS=("DATABASE_URL" "SECRET_KEY") + MISSING_VARS=false + + for VAR in "${REQUIRED_VARS[@]}"; do + if [ -z "${!VAR}" ]; then + echo "Error: Required environment variable $VAR is not set" + MISSING_VARS=true + fi + done + + if [ "$MISSING_VARS" = true ]; then + exit 1 + fi + # Modified Gunicorn startup with error handling if [ "$ENVIRONMENT" = "dev" ] || [ "$ENVIRONMENT" = "local" ]; then if [ "$DEBUG" = "true" ]; then
32-41
: Redundant exec cognee call for production environmentThe script has duplicate
exec cognee
statements for the production path.if [ "$ENVIRONMENT" = "dev" ] || [ "$ENVIRONMENT" = "local" ]; then if [ "$DEBUG" = "true" ]; then echo "Waiting for the debugger to attach..." exec python -m debugpy --wait-for-client --listen 0.0.0.0:5678 -m cognee else exec cognee fi else - exec cognee + # Production mode + exec cognee ficognee-mcp/Dockerfile (3)
25-25
: Multiple files copied to the same destinationThe COPY command is copying multiple files to the same destination (./), which can be confusing and error-prone.
Split this into separate COPY commands for clarity:
- COPY ./cognee-mcp/pyproject.toml ./cognee-mcp/uv.lock ./cognee-mcp/entrypoint.sh ./ + COPY ./cognee-mcp/pyproject.toml ./pyproject.toml + COPY ./cognee-mcp/uv.lock ./uv.lock + COPY ./cognee-mcp/entrypoint.sh ./entrypoint.sh
34-36
: Consider using relative paths for Alembic configurationThe Alembic configuration is copied using absolute paths, which might cause issues if the directory structure changes.
- # Copy Alembic configuration - COPY alembic.ini /app/alembic.ini - COPY alembic/ /app/alembic + # Copy Alembic configuration + COPY alembic.ini ./alembic.ini + COPY alembic/ ./alembic
20-22
: Consider using multi-stage build for system dependenciesThe current Dockerfile installs system dependencies in the build stage, but these might also be needed in the final image.
To ensure consistency, consider installing the same dependencies in both stages:
# Install system dependencies RUN apt-get update && apt-get install -y \ gcc \ libpq-dev # ... (later in the file) FROM python:3.12-slim-bookworm WORKDIR /app + +# Install runtime dependencies +RUN apt-get update && apt-get install -y \ + libpq-dev \ + && rm -rf /var/lib/apt/lists/* COPY --from=uv /root/.local /root/.local
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
.github/workflows/dockerhub-mcp.yml
is excluded by!**/*.yml
docker-compose.yml
is excluded by!**/*.yml
📒 Files selected for processing (7)
Dockerfile
(1 hunks)alembic/versions/482cd6517ce4_add_default_user.py
(1 hunks)alembic/versions/8057ae7329c2_initial_migration.py
(1 hunks)cognee-mcp/Dockerfile
(3 hunks)cognee-mcp/README.md
(1 hunks)cognee-mcp/entrypoint.sh
(1 hunks)cognee-mcp/src/server.py
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
cognee-mcp/entrypoint.sh (2)
alembic/versions/482cd6517ce4_add_default_user.py (1)
upgrade
(23-24)alembic/versions/8057ae7329c2_initial_migration.py (1)
upgrade
(20-22)
alembic/versions/8057ae7329c2_initial_migration.py (3)
cognee/infrastructure/databases/relational/get_relational_engine.py (1)
get_relational_engine
(5-8)alembic/versions/482cd6517ce4_add_default_user.py (2)
upgrade
(23-24)downgrade
(27-28)cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2)
create_database
(309-318)delete_database
(320-347)
alembic/versions/482cd6517ce4_add_default_user.py (2)
cognee/modules/users/methods/create_default_user.py (1)
create_default_user
(5-19)alembic/versions/8057ae7329c2_initial_migration.py (2)
upgrade
(20-22)downgrade
(25-27)
🪛 LanguageTool
cognee-mcp/README.md
[style] ~88-~88: Consider a shorter alternative to avoid wordiness.
Context: ...mcp dev src/server.py` ### Development In order to use local cognee build, run in root of ...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~101-~101: A comma might be missing here.
Context: ...codegraph,gemini,huggingface] ``` After that add the following snippet to the same f...
(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: End-to-End Tests / Server Start Test
- GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
- GitHub Check: End-to-End Tests / S3 Bucket Test
- GitHub Check: End-to-End Tests / Deletion Test
- GitHub Check: End-to-End Tests / Deduplication Test
- GitHub Check: Basic Tests / Run Unit Tests
- GitHub Check: Basic Tests / Run Simple Examples
- GitHub Check: Basic Tests / Run Basic Graph Tests
- GitHub Check: Basic Tests / Run Integration Tests
- GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
- GitHub Check: End-to-End Tests / Deduplication Test
- GitHub Check: Basic Tests / Run Unit Tests
- GitHub Check: Publish Cognee Docker image
🔇 Additional comments (3)
cognee-mcp/src/server.py (1)
39-43
: Clear field descriptions improve usability.Adding "(Optional)" to the description fields provides better clarity to API users about which parameters they must provide. This is a good documentation improvement.
cognee-mcp/Dockerfile (2)
51-51
: LGTM: Making the entrypoint script executableMaking the entrypoint script executable is a good practice. This ensures the script can be executed without relying on the shell's interpretation.
63-63
: LGTM: Using a script as entrypointUsing a script as the container entrypoint is a good practice. It allows for more complex initialization logic like running migrations and checking database readiness.
Description
Resolve issue with .venv being broken when using docker compose with Cognee
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.