Skip to content

Conversation

borisarzentar
Copy link
Member

Description

Resolve issue with .venv being broken when using docker compose with Cognee

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.


Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

<!-- .github/pull_request_template.md -->

## Description
Resolve issue with .venv being broken when using docker compose with
Cognee

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
@borisarzentar borisarzentar self-assigned this Apr 23, 2025
Copy link

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

1 similar comment
Copy link

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

Copy link
Contributor

coderabbitai bot commented Apr 23, 2025

Walkthrough

This update introduces several changes across the project. Two new Alembic migration scripts are added: one for initial database creation and another for inserting a default user. The Dockerfile is revised to clarify and correct package installation steps, and the build context is reorganized to explicitly include migration files and a new entrypoint script. The entrypoint script is introduced to manage database migrations and conditional debug server startup. Documentation is updated with development instructions for using a local build of the cognee library. Additionally, the descriptions of some input properties in the server schema are clarified as optional.

Changes

File(s) Change Summary
alembic/versions/482cd6517ce4_add_default_user.py, alembic/versions/8057ae7329c2_initial_migration.py Added Alembic migration scripts for initial database setup and default user insertion, with upgrade/downgrade logic.
Dockerfile, cognee-mcp/Dockerfile Refactored Dockerfile(s) for clearer package installation, explicit build context, and inclusion of migration files.
cognee-mcp/entrypoint.sh Added a new entrypoint script handling Alembic migrations, error handling, and conditional debug server startup.
cognee-mcp/README.md Updated documentation with instructions for using a local build of the cognee library in development.
cognee-mcp/src/server.py Updated schema property descriptions to indicate optional fields.

Sequence Diagram(s)

sequenceDiagram
    participant EntrypointScript as entrypoint.sh
    participant Alembic as Alembic (DB migration)
    participant Server as Cognee MCP Server

    EntrypointScript->>Alembic: Run migrations
    Alembic-->>EntrypointScript: Migration result (success or handled error)
    EntrypointScript->>EntrypointScript: Wait 2 seconds
    EntrypointScript->>Server: Start server (with or without debugger)
Loading

Possibly related PRs

Suggested labels

run-checks

Poem

A hop, a skip, a Docker build anew,
With Alembic scripts and entrypoints too!
The server now starts with a cautious glance,
Handling migrations, giving code a chance.
Docs for devs, and schema hints so clear—
This rabbit’s proud to see such progress here!
🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
Dockerfile (1)

28-32: 🛠️ Refactor suggestion

Consider combining apt-get update and install in the same RUN instruction.

Separating apt-get update from apt-get install can lead to inconsistency issues if the Docker build cache is used, as you might end up installing packages from an outdated package list. While this change fixes the previous syntax, it introduces a potential problem.

-RUN apt-get update
-
-RUN apt-get install -y \
-  gcc \
-  libpq-dev
+RUN apt-get update && apt-get install -y \
+  gcc \
+  libpq-dev
🧹 Nitpick comments (12)
cognee-mcp/README.md (1)

88-105: Development section instructions look good.

The new development section provides clear steps for using a local build of the cognee library. The instructions are comprehensive and will improve the developer experience when working with local changes.

Consider these minor improvements:

-In order to use local cognee build, run in root of the cognee repo:
+To use a local cognee build, run in root of the cognee repo:

-After that add the following snippet to the same file (`cognee-mcp/pyproject.toml`).
+After that, add the following snippet to the same file (`cognee-mcp/pyproject.toml`).
🧰 Tools
🪛 LanguageTool

[style] ~88-~88: Consider a shorter alternative to avoid wordiness.
Context: ...mcp dev src/server.py` ### Development In order to use local cognee build, run in root of ...

(IN_ORDER_TO_PREMIUM)


[uncategorized] ~101-~101: A comma might be missing here.
Context: ...codegraph,gemini,huggingface] ``` After that add the following snippet to the same f...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)

alembic/versions/8057ae7329c2_initial_migration.py (1)

1-27: Initial migration implementation looks correct.

The migration properly handles database creation and deletion using the sqlalchemy engine. The use of await_only correctly bridges the gap between the synchronous Alembic framework and asynchronous database operations.

Consider adding minimal error handling and logging for better diagnostics during migration:

 def upgrade() -> None:
     db_engine = get_relational_engine()
-    await_only(db_engine.create_database())
+    try:
+        await_only(db_engine.create_database())
+        print("Database created successfully")
+    except Exception as e:
+        print(f"Error creating database: {e}")
+        raise


 def downgrade() -> None:
     db_engine = get_relational_engine()
-    await_only(db_engine.delete_database())
+    try:
+        await_only(db_engine.delete_database())
+        print("Database deleted successfully")
+    except Exception as e:
+        print(f"Error deleting database: {e}")
+        raise
alembic/versions/482cd6517ce4_add_default_user.py (3)

16-20: Potential inconsistency in revision dependencies

I noticed that both down_revision and depends_on are set to the same value "8057ae7329c2". Typically, down_revision is used to specify the previous revision in a linear migration path, while depends_on is used for non-linear dependencies. Having both set to the same value is redundant.

- down_revision: Union[str, None] = "8057ae7329c2"
- depends_on: Union[str, Sequence[str], None] = "8057ae7329c2"
+ down_revision: Union[str, None] = "8057ae7329c2"
+ depends_on: Union[str, Sequence[str], None] = None

27-28: Consider adding error handling for user deletion

The downgrade function should handle the case where the user doesn't exist.

def downgrade() -> None:
-    await_only(delete_user("default_user@example.com"))
+    try:
+        await_only(delete_user("default_user@example.com"))
+    except Exception as e:
+        if "UserNotFound" in str(e) or "User not found" in str(e):
+            # User might have been manually deleted, this is fine
+            pass
+        else:
+            # Re-raise other exceptions
+            raise

13-13: Consider using a constant for the default email

The default email "default_user@example.com" is used in the downgrade function but not defined as a constant, creating potential for inconsistency with the create_default_user function.

from cognee.modules.users.methods import create_default_user, delete_user
+
+# This should match the default in create_default_user
+DEFAULT_USER_EMAIL = "default_user@example.com"

# ...

def downgrade() -> None:
-    await_only(delete_user("default_user@example.com"))
+    await_only(delete_user(DEFAULT_USER_EMAIL))
cognee-mcp/entrypoint.sh (4)

3-3: The script uses set -e without set -o pipefail

While set -e ensures the script exits on error, it doesn't catch errors in pipelines. This could lead to silent failures in piped commands.

- set -e  # Exit on error
+ set -e      # Exit on error
+ set -o pipefail  # Exit on error in pipeline

16-24: Consider additional retries for database migrations

The script handles the case where the default user already exists but doesn't implement retries for other temporary database connection issues that might occur during migration.

Consider adding a retry mechanism for other transient errors:

- MIGRATION_OUTPUT=$(uv run alembic upgrade head 2>&1) || {
-     if [[ $MIGRATION_OUTPUT == *"UserAlreadyExists"* ]] || [[ $MIGRATION_OUTPUT == *"User default_user@example.com already exists"* ]]; then
-         echo "Warning: Default user already exists, continuing startup..."
-     else
-         echo "Migration failed with unexpected error:"
-         echo "$MIGRATION_OUTPUT"
-         exit 1
-     fi
- }
+ # Initialize retry counter
+ RETRY_COUNT=0
+ MAX_RETRIES=3
+ 
+ while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
+     MIGRATION_OUTPUT=$(uv run alembic upgrade head 2>&1) && break
+     
+     # Check for known acceptable errors
+     if [[ $MIGRATION_OUTPUT == *"UserAlreadyExists"* ]] || [[ $MIGRATION_OUTPUT == *"User default_user@example.com already exists"* ]]; then
+         echo "Warning: Default user already exists, continuing startup..."
+         break
+     # Check for connection errors that might be temporary
+     elif [[ $MIGRATION_OUTPUT == *"connection"* ]] || [[ $MIGRATION_OUTPUT == *"timeout"* ]]; then
+         RETRY_COUNT=$((RETRY_COUNT+1))
+         echo "Database connection issue. Retry $RETRY_COUNT of $MAX_RETRIES in 5 seconds..."
+         sleep 5
+     else
+         echo "Migration failed with unexpected error:"
+         echo "$MIGRATION_OUTPUT"
+         exit 1
+     fi
+ done
+ 
+ if [ $RETRY_COUNT -eq $MAX_RETRIES ]; then
+     echo "Failed to run migrations after $MAX_RETRIES retries. Last error:"
+     echo "$MIGRATION_OUTPUT"
+     exit 1
+ fi

31-41: Consider checking for required environment variables

The script doesn't validate if required environment variables are set before starting the application.

Add environment variable validation:

+ # Check for required environment variables
+ REQUIRED_VARS=("DATABASE_URL" "SECRET_KEY")
+ MISSING_VARS=false
+ 
+ for VAR in "${REQUIRED_VARS[@]}"; do
+     if [ -z "${!VAR}" ]; then
+         echo "Error: Required environment variable $VAR is not set"
+         MISSING_VARS=true
+     fi
+ done
+ 
+ if [ "$MISSING_VARS" = true ]; then
+     exit 1
+ fi
+
# Modified Gunicorn startup with error handling
if [ "$ENVIRONMENT" = "dev" ] || [ "$ENVIRONMENT" = "local" ]; then
    if [ "$DEBUG" = "true" ]; then

32-41: Redundant exec cognee call for production environment

The script has duplicate exec cognee statements for the production path.

if [ "$ENVIRONMENT" = "dev" ] || [ "$ENVIRONMENT" = "local" ]; then
    if [ "$DEBUG" = "true" ]; then
        echo "Waiting for the debugger to attach..."
        exec python -m debugpy --wait-for-client --listen 0.0.0.0:5678 -m cognee
    else
        exec cognee
    fi
else
-    exec cognee
+    # Production mode
+    exec cognee
fi
cognee-mcp/Dockerfile (3)

25-25: Multiple files copied to the same destination

The COPY command is copying multiple files to the same destination (./), which can be confusing and error-prone.

Split this into separate COPY commands for clarity:

- COPY ./cognee-mcp/pyproject.toml ./cognee-mcp/uv.lock ./cognee-mcp/entrypoint.sh ./
+ COPY ./cognee-mcp/pyproject.toml ./pyproject.toml
+ COPY ./cognee-mcp/uv.lock ./uv.lock
+ COPY ./cognee-mcp/entrypoint.sh ./entrypoint.sh

34-36: Consider using relative paths for Alembic configuration

The Alembic configuration is copied using absolute paths, which might cause issues if the directory structure changes.

- # Copy Alembic configuration
- COPY alembic.ini /app/alembic.ini
- COPY alembic/ /app/alembic
+ # Copy Alembic configuration
+ COPY alembic.ini ./alembic.ini
+ COPY alembic/ ./alembic

20-22: Consider using multi-stage build for system dependencies

The current Dockerfile installs system dependencies in the build stage, but these might also be needed in the final image.

To ensure consistency, consider installing the same dependencies in both stages:

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    libpq-dev

# ... (later in the file)

FROM python:3.12-slim-bookworm

WORKDIR /app
+
+# Install runtime dependencies
+RUN apt-get update && apt-get install -y \
+    libpq-dev \
+    && rm -rf /var/lib/apt/lists/*

COPY --from=uv /root/.local /root/.local
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49f2e51 and 98a1b79.

⛔ Files ignored due to path filters (2)
  • .github/workflows/dockerhub-mcp.yml is excluded by !**/*.yml
  • docker-compose.yml is excluded by !**/*.yml
📒 Files selected for processing (7)
  • Dockerfile (1 hunks)
  • alembic/versions/482cd6517ce4_add_default_user.py (1 hunks)
  • alembic/versions/8057ae7329c2_initial_migration.py (1 hunks)
  • cognee-mcp/Dockerfile (3 hunks)
  • cognee-mcp/README.md (1 hunks)
  • cognee-mcp/entrypoint.sh (1 hunks)
  • cognee-mcp/src/server.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
cognee-mcp/entrypoint.sh (2)
alembic/versions/482cd6517ce4_add_default_user.py (1)
  • upgrade (23-24)
alembic/versions/8057ae7329c2_initial_migration.py (1)
  • upgrade (20-22)
alembic/versions/8057ae7329c2_initial_migration.py (3)
cognee/infrastructure/databases/relational/get_relational_engine.py (1)
  • get_relational_engine (5-8)
alembic/versions/482cd6517ce4_add_default_user.py (2)
  • upgrade (23-24)
  • downgrade (27-28)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2)
  • create_database (309-318)
  • delete_database (320-347)
alembic/versions/482cd6517ce4_add_default_user.py (2)
cognee/modules/users/methods/create_default_user.py (1)
  • create_default_user (5-19)
alembic/versions/8057ae7329c2_initial_migration.py (2)
  • upgrade (20-22)
  • downgrade (25-27)
🪛 LanguageTool
cognee-mcp/README.md

[style] ~88-~88: Consider a shorter alternative to avoid wordiness.
Context: ...mcp dev src/server.py` ### Development In order to use local cognee build, run in root of ...

(IN_ORDER_TO_PREMIUM)


[uncategorized] ~101-~101: A comma might be missing here.
Context: ...codegraph,gemini,huggingface] ``` After that add the following snippet to the same f...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)

⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: End-to-End Tests / Server Start Test
  • GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
  • GitHub Check: End-to-End Tests / S3 Bucket Test
  • GitHub Check: End-to-End Tests / Deletion Test
  • GitHub Check: End-to-End Tests / Deduplication Test
  • GitHub Check: Basic Tests / Run Unit Tests
  • GitHub Check: Basic Tests / Run Simple Examples
  • GitHub Check: Basic Tests / Run Basic Graph Tests
  • GitHub Check: Basic Tests / Run Integration Tests
  • GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
  • GitHub Check: End-to-End Tests / Deduplication Test
  • GitHub Check: Basic Tests / Run Unit Tests
  • GitHub Check: Publish Cognee Docker image
🔇 Additional comments (3)
cognee-mcp/src/server.py (1)

39-43: Clear field descriptions improve usability.

Adding "(Optional)" to the description fields provides better clarity to API users about which parameters they must provide. This is a good documentation improvement.

cognee-mcp/Dockerfile (2)

51-51: LGTM: Making the entrypoint script executable

Making the entrypoint script executable is a good practice. This ensures the script can be executed without relying on the shell's interpretation.


63-63: LGTM: Using a script as entrypoint

Using a script as the container entrypoint is a good practice. It allows for more complex initialization logic like running migrations and checking database readiness.

@borisarzentar borisarzentar merged commit b9a7c53 into main Apr 23, 2025
62 of 95 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants