
Conversation

borisarzentar
Member

Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Vasilije1990 and others added 30 commits April 18, 2025 16:31
- Resolve issue with .venv being broken when using Docker Compose with Cognee. (Co-authored-by: Boris Arzentar)
- … 1947 (#760): no description provided. (Co-authored-by: Boris, Igor Ilic)
- Add support for uv and Poetry package management.
- Switch typing from str to UUID for NetworkX node_id.
- Add both SSE and stdio support for Cognee MCP.
- …83] (#782): Add log handling options for cognee exceptions.
- Fix issue with failing versions GitHub Actions.
- No description provided. (Co-authored-by: Vasilije)
- No description provided.
- No description provided. (Co-authored-by: Vasilije)
- No description provided. (Co-authored-by: Hande, Vasilije)
- No description provided. (Co-authored-by: Hande, Vasilije)
- Add support for the Memgraph graph database, following the [graph database integration guide](https://docs.cognee.ai/contributing/adding-providers/graph-db/graph-database-integration): implemented `MemgraphAdapter` for interfacing with Memgraph, updated `get_graph_engine.py` to return `MemgraphAdapter` when appropriate, added a test script (`test_memgraph.py`), and created a dedicated test workflow (`.github/workflows/test_memgraph.yml`). (Co-authored-by: Vasilije, Boris)
- refactor: Handle boto3 and s3fs dependencies better.
- No description provided.
- Update LanceDB and rewrite data points to run async. (Co-authored-by: Boris, Boris Arzentar)
- No description provided.
- No description provided.
- Add a short demo, as discussed with @hande-k and Lazar, illustrating how to get PageRank rankings from the knowledge graph using the NetworkX engine; a POC and a first step toward solving #643. (Co-authored-by: Boris, Hande, Vasilije)
- Added tools to check current cognify and codify status.
dexters1 and others added 8 commits May 13, 2025 12:12
- Move CollectionNotFoundError logging to Debug level.
- No description provided.
- No description provided.
- No description provided.
@borisarzentar borisarzentar self-assigned this May 15, 2025

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

Contributor

coderabbitai bot commented May 15, 2025

Walkthrough

This update introduces broad architectural, interface, and usability changes to the Cognee codebase. It adds Memgraph support, refactors pipeline and task execution with context propagation, modularizes observability, and standardizes dataset identification. Numerous adapters are updated for unified search and error handling. The MCP server is overhauled, and documentation and UI are improved.
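
The context-propagation change is easiest to see in miniature. A rough sketch of the idea, not the actual run_tasks_base code: a task's executable may optionally declare a context parameter, and the runner inspects the signature and forwards the context only when it is accepted.

```python
# Sketch of optional context propagation; the real runner differs in detail.
import inspect

async def run_task(executable, args: list, context=None):
    # Forward `context` only to executables that declare the parameter.
    has_context = "context" in inspect.signature(executable).parameters
    if has_context:
        return await executable(*args, context=context)
    return await executable(*args)
```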

Changes

| File(s) / Group | Change Summary |
| --- | --- |
| Dockerfile, entrypoint.sh | Refactored Dockerfile to a multi-stage build using uv for dependency management; entrypoint script now runs Alembic migrations directly and updates server startup commands. |
| README.md, assets/graph_visualization.html, notebooks/github_graph_visualization.html, notebooks/hr_demo.ipynb | Updated documentation to introduce the cognee UI, changed the Colab link, and removed graph visualization assets and the HR demo notebook. |
| cognee-mcp/src/server.py | Refactored MCP server to use FastMCP, with modular async tool registration, async task launching, and new status tools. |
| cognee/infrastructure/databases/graph/get_graph_engine.py, cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py | Added Memgraph graph database support with a new adapter and engine factory branch. |
| cognee/infrastructure/databases/graph/networkx/adapter.py | Updated method signatures to use UUIDs for node identifiers. |
| cognee/infrastructure/databases/vector/ (ChromaDBAdapter.py, LanceDBAdapter.py, MilvusAdapter.py, PGVectorAdapter.py, QDrantAdapter.py, WeaviateAdapter.py, exceptions/exceptions.py) | Centralized collection retrieval with existence checks, standardized the search method, removed deprecated distance methods, improved error handling, updated type annotations, and made adapters fully async where needed. |
| cognee/modules/observability/get_observe.py, observers.py, base_config.py, infrastructure/llm/openai/adapter.py, infrastructure/llm/gemini/adapter.py | Modularized observability with a new Observer enum and dynamic decorator retrieval. |
| cognee/modules/data/methods/ (create_dataset.py, get_unique_dataset_id.py, __init__.py) | Standardized dataset creation with unique ID generation based on user and name. |
| cognee/modules/pipelines/models/PipelineRun.py, operations/__init__.py, operations/get_pipeline_status.py, operations/log_pipeline_run_initiated.py, operations/pipeline.py | Enhanced pipeline run tracking with explicit statuses, logging, and pipeline-name filtering. |
| cognee/modules/pipelines/operations/run_tasks.py, run_tasks_base.py | Added context propagation through the task execution pipeline and updated function signatures. |
| cognee/modules/retrieval/exceptions/ (exceptions.py, __init__.py), graph_completion_retriever.py, utils/brute_force_triplet_search.py | Removed deprecated distance error; updated error handling and search logic for missing collections. |
| cognee/modules/graph/cognee_graph/CogneeGraph.py | Updated vector distance mapping to use the new search interface. |
| cognee/modules/settings/get_settings.py | Made LLMConfig fields endpoint and api_version optional. |
| cognee/shared/data_models.py | Removed deprecated MonitoringTool enum. |
| cognee/shared/logging_utils.py | Broadened SQLAlchemy warning suppression to log levels above DEBUG. |
| cognee/api/v1/ (add/add.py, cognify/code_graph_pipeline.py, cognify/cognify.py, datasets/datasets.py) | Reorganized imports, updated pipeline calls for explicit naming, and improved status queries. |
| cognee/exceptions/exceptions.py | Enhanced exception logging flexibility with new parameters. |
| cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py | Simplified schema handling in database deletion logic. |
| cognee/tasks/ingestion/ingest_data.py | Updated dataset creation to use the user object. |
| cognee/tasks/temporal_awareness/index_graphiti_objects.py | Unified query parameter usage and streamlined data access. |
| Frontend (cognee-frontend/src/app/page.tsx, wizard/CognifyStep/CognifyStep.tsx, wizard/ExploreStep/ExploreStep.tsx, wizard/WizardPage.tsx, modules/datasets/cognifyDataset.ts, modules/exploration/getExplorationGraphUrl.ts, modules/ingestion/DataView/DataView.tsx, ui/Partials/Explorer/Explorer.tsx, ui/Partials/SearchView/SearchView.tsx, utils/fetch.ts) | Refactored dataset identification to use names, enhanced DataView with a Cognify action, improved input handling, and updated API URLs. |
| Examples & Notebooks (examples/data/car_and_tech_companies.txt, python/graphiti_example.py, notebooks/cognee_demo.ipynb, cognee_graphiti_demo.ipynb, cognee_llama_index.ipynb, cognee_simple_demo.ipynb, graphrag_vs_rag.ipynb, llama_index_cognee_integration.ipynb) | Added new demo data, updated user handling in examples, standardized notebook metadata, and fixed imports and function signatures. |
| Tests (cognee/tests/integration/run_toy_tasks/conftest.py, test_memgraph.py, test_neo4j.py, test_weaviate.py, unit/modules/pipelines/run_tasks_test.py, run_tasks_with_context_test.py, retrieval/chunks_retriever_test.py, retrieval/graph_completion_retriever_test.py, retrieval/summaries_retriever_test.py, retrieval/utils/brute_force_triplet_search_test.py) | Added a Memgraph integration test, updated test logic for new interfaces, removed obsolete tests, and consolidated async test execution. |
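
For the dataset-identification change, the key idea is that a dataset ID is now derived deterministically from the owning user and the dataset name, so repeated ingestion into the same named dataset resolves to the same record. A minimal sketch, assuming a uuid5 construction (the real get_unique_dataset_id helper may hash a different string):

```python
# Deterministic dataset IDs: same user + dataset name -> same UUID.
from uuid import NAMESPACE_OID, UUID, uuid5

def get_unique_dataset_id(dataset_name: str, user_id: UUID) -> UUID:
    return uuid5(NAMESPACE_OID, f"{dataset_name}{user_id}")
```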

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant Pipeline
    participant DB

    User->>Frontend: Add data / Click Cognify
    Frontend->>API: POST /add or /cognify (with dataset name)
    API->>Pipeline: cognee_pipeline(tasks, datasets, user, pipeline_name)
    Pipeline->>DB: Create or fetch dataset (get_unique_dataset_id)
    Pipeline->>Pipeline: Log pipeline run initiated (with status)
    Pipeline->>Pipeline: Check pipeline run status
    alt Not already processing/completed
        Pipeline->>Pipeline: Run tasks (with context)
        Pipeline->>DB: Update pipeline run status
    else Already processing/completed
        Pipeline-->>API: Return early
    end
    API-->>Frontend: Response (status, result)
    Frontend-->>User: Show notification / result
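
A minimal, self-contained sketch of the status-gated flow in the first diagram; the in-memory dict stands in for the persisted pipeline-run table, and names such as cognee_pipeline and the status strings follow this PR, while the bodies are illustrative:

```python
# Status-gated pipeline run: skip work that is already in flight or done.
from uuid import UUID, uuid4

PIPELINE_STATUS: dict[UUID, str] = {}  # stand-in for persisted pipeline runs

async def log_pipeline_run_initiated(pipeline_name: str, dataset_id: UUID) -> UUID:
    PIPELINE_STATUS.setdefault(dataset_id, "DATASET_PROCESSING_INITIATED")
    return uuid4()

async def cognee_pipeline(tasks, dataset_id: UUID, pipeline_name: str):
    await log_pipeline_run_initiated(pipeline_name, dataset_id)
    if PIPELINE_STATUS[dataset_id] in (
        "DATASET_PROCESSING_STARTED",
        "DATASET_PROCESSING_COMPLETED",
    ):
        return  # already processing/completed: return early, as in the diagram
    PIPELINE_STATUS[dataset_id] = "DATASET_PROCESSING_STARTED"
    for task in tasks:
        await task()  # context propagation omitted for brevity
    PIPELINE_STATUS[dataset_id] = "DATASET_PROCESSING_COMPLETED"
```
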
sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant MCP Server

    User->>Frontend: Submit tool action (e.g., cognify)
    Frontend->>API: Call MCP endpoint
    API->>MCP Server: Invoke tool (e.g., @mcp.tool('cognify'))
    MCP Server->>MCP Server: Launch async task
    MCP Server-->>API: Immediate confirmation (background processing)
    API-->>Frontend: Show status / instructions
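
The second diagram's fire-and-forget tool pattern could look roughly like this with FastMCP from the Python MCP SDK; the tool body and helper name are illustrative, not the actual cognee-mcp implementation:

```python
# MCP tool that launches work in the background and confirms immediately.
import asyncio
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("cognee")

async def run_cognify_in_background(text: str) -> None:
    ...  # long-running pipeline work would go here

@mcp.tool()
async def cognify(text: str) -> str:
    asyncio.create_task(run_cognify_in_background(text))  # do not await
    return "Cognify started; processing continues in the background."
```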

Possibly related PRs

  • topoteretes/cognee#675: Adds and handles a pipeline_name attribute in pipeline run code, directly related to enhanced pipeline run tracking and status filtering in this PR.
  • topoteretes/cognee#788: Propagates an optional context parameter through task execution, which matches the context propagation changes in this PR.
  • topoteretes/cognee#751: Introduces MemgraphAdapter and updates the graph engine factory for Memgraph support, directly related to the Memgraph integration in this PR.

Suggested reviewers

  • borisarzentar

Poem

A hop, a skip, a Memgraph leap,
Pipelines now with context deep.
Datasets named, IDs unique,
Observers watch, adapters sleek.
MCP tools run in the night,
Cognee’s future’s looking bright!
🐇✨



📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad0bb0c and 7ac5761.

📒 Files selected for processing (1)
  • README.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • README.md

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Contributor

coderabbitai bot left a comment

Actionable comments posted: 20

🔭 Outside diff range comments (5)
cognee/shared/logging_utils.py (1)

315-325: ⚠️ Potential issue

Reassess the broadened warning-suppression threshold.

The condition now silences SAWarning whenever the log level is above DEBUG (including INFO), which may hide important warnings. Recommend suppressing only when the configured level is WARNING or higher, or refining the filter to specific modules:

-    if log_level > logging.DEBUG:
+    if log_level >= logging.WARNING:
         import warnings
         from sqlalchemy.exc import SAWarning

         warnings.filterwarnings(
             "ignore", category=SAWarning, module="dlt.destinations.impl.sqlalchemy.merge_job"
         )
         warnings.filterwarnings(
             "ignore", category=SAWarning, module="dlt.destinations.impl.sqlalchemy.load_jobs"
         )

This ensures you don’t unintentionally swallow informative messages.

cognee/exceptions/exceptions.py (1)

38-44: ⚠️ Potential issue

Child exception classes don't forward new logging parameters.

ServiceError and other subclasses don't pass the new log and log_level parameters to the parent constructor, which means the new logging controls won't work for these exception types.

Update the subclass constructors to forward the new parameters:

def __init__(
    self,
    message: str = "Service is unavailable.",
    name: str = "ServiceError",
    status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
+   log=True,
+   log_level="ERROR",
):
-   super().__init__(message, name, status_code)
+   super().__init__(message, name, status_code, log, log_level)

This change should be applied to all child exception classes (ServiceError, InvalidValueError, InvalidAttributeError).

cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)

210-237: ⚠️ Potential issue

closest_items type confusion causes attribute errors

session.execute(query) returns a SQLAlchemy Result.
Iterating over closest_items.all() yields Row objects; the PGVectorDataPoint entity and the labelled similarity column must be accessed by index or key (e.g. row[0] or row.similarity) rather than as attributes of the data point, so vector.id on line 244 will raise AttributeError.

Suggested fix:

-        for vector in closest_items.all():
-            vector_list.append(
-                {
-                    "id": parse_id(str(vector.id)),
-                    "payload": vector.payload,
-                    "_distance": vector.similarity,
-                }
-            )
+        for row in closest_items.all():
+            data_point = row[0]          # PGVectorDataPoint instance
+            similarity = row.similarity  # labelled column
+            vector_list.append(
+                {
+                    "id": parse_id(str(data_point.id)),
+                    "payload": data_point.payload,
+                    "_distance": similarity,
+                }
+            )
cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (1)

155-205: 🛠️ Refactor suggestion

Double-closing the Qdrant client & missing query_vector fallback

  1. await client.close() is invoked inside the try block (line 183) and again in the finally block (line 204).
    While idempotent, this is redundant and obscures intent.

  2. When query_vector is supplied but query_text is also given, the latter is ignored. Clarify precedence or raise ValueError.

Minimal patch:

@@
-            await client.close()
+            # close handled in finally

Optionally document parameter precedence.

cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)

239-241: ⚠️ Potential issue

Exceptions are swallowed, callers receive an empty list

Catching all exceptions and returning [] hides real problems (network issues, authentication, malformed queries, etc.). Let the exception propagate or wrap it in a domain-specific error; otherwise upstream code cannot distinguish “no results” from “search failed”.

-        except Exception as e:
-            logger.error(f"Error in search: {str(e)}")
-            return []
+        except Exception as err:
+            logger.error("Search failed: %s", err, exc_info=True)
+            raise
🧹 Nitpick comments (27)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)

332-334: Schema handling changed to use a fixed list instead of dynamic retrieval.

The code now uses a fixed schema list ["public", "public_staging"] instead of dynamically retrieving all schemas with get_schema_list(). This makes the deletion process more predictable but might not handle custom schemas that aren't in this list.

This change aligns with the broader infrastructure updates in the PR to stabilize database schema handling across relational and graph database adapters.

Consider adding a comment explaining why a fixed list is preferred over dynamically retrieving schemas, to help future developers understand the design decision.
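
One way the suggested explanatory comment could read, as a sketch (the variable name is illustrative):

```python
# NOTE: Schemas are listed explicitly rather than discovered via
# get_schema_list(). A fixed list keeps the deletion path predictable and
# avoids dropping custom or third-party schemas in the same database.
schemas_to_delete = ["public", "public_staging"]
```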

examples/data/car_and_tech_companies.txt (2)

17-17: Fix grammatical number agreement in summary sentence.

The phrase “Each of these car manufacturer contributes…” mismatches plural “these” with singular “manufacturer.”
Apply this diff:

- Each of these car manufacturer contributes to Germany's reputation as a leader in the global automotive industry...
+ Each of these car manufacturers contributes to Germany's reputation as a leader in the global automotive industry...


1-37: Consider externalizing large static text for maintainability.

Embedding lengthy multi-line strings directly in a .txt file can make updates cumbersome. You might move these into a structured JSON/YAML resource (e.g., examples/data/car_and_tech_companies.json) and load them at runtime.
This change will:

  • Improve readability of code that ingests this data
  • Enable syntax validation and diffing in structured formats
  • Simplify future updates and translations

Would you like assistance converting this to a JSON/YAML resource?
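
A sketch of what the externalized resource and its loader might look like; the JSON path and the text_1/text_2 keys mirror the current file's variables but are otherwise illustrative:

```python
# Load demo texts from a structured JSON resource instead of a .txt file.
import json
from pathlib import Path

def load_example_texts(
    path: str = "examples/data/car_and_tech_companies.json",  # hypothetical file
) -> dict[str, str]:
    # Expected shape: {"text_1": "...", "text_2": "..."}
    return json.loads(Path(path).read_text(encoding="utf-8"))
```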


cognee/tests/unit/modules/pipelines/run_tasks_test.py (1)

53-55: Optional: Reevaluate the __main__ guard for test consistency.

Adding:

if __name__ == "__main__":
    test_run_tasks()

enables standalone execution but is nonstandard when using frameworks like pytest or unittest CLI. Consider relying solely on your test runner for consistency across CI and local runs.

cognee/modules/pipelines/operations/__init__.py (1)

1-1: Update the import to follow package conventions

The newly added import log_pipeline_run_initiated is not being used within this file and isn't exposed through an __all__ declaration. This creates "dead imports" as flagged by the static analysis tool.

Choose one of these approaches:

  1. Add an __all__ list to explicitly define the public API:
from .log_pipeline_run_initiated import log_pipeline_run_initiated
from .log_pipeline_run_start import log_pipeline_run_start
from .log_pipeline_run_complete import log_pipeline_run_complete
from .log_pipeline_run_error import log_pipeline_run_error
from .pipeline import cognee_pipeline

+__all__ = [
+    "log_pipeline_run_initiated",
+    "log_pipeline_run_start",
+    "log_pipeline_run_complete",
+    "log_pipeline_run_error",
+    "cognee_pipeline",
+]
  2. Or add a comment explaining the import pattern if this is an intentional design choice for implicit exports.

examples/python/graphiti_example.py (1)

31-33: User context integration is correct but could use clarification

Adding user initialization to align with updated pipeline requirements is good. Consider adding a brief comment explaining why user context is necessary for task execution in the updated system.

 # Initialize default user
 user = await get_default_user()
+# User context is required for pipeline task execution in the updated system
cognee/modules/retrieval/exceptions/__init__.py (1)

7-7: Consider addressing the unused imports in the module

The static analysis tool has identified that SearchTypeNotSupported and CypherSearchError are imported but not used within this module. If the intent is to re-export these exceptions at the package level, consider adding them to an __all__ list.

from .exceptions import SearchTypeNotSupported, CypherSearchError

+__all__ = ["SearchTypeNotSupported", "CypherSearchError"]

cognee/modules/observability/get_observe.py (1)

1-11: Add docstring for better documentation

The function lacks a docstring explaining its purpose, parameters, return values, and possible exceptions. Since this is a utility function that will be used throughout the codebase, proper documentation is essential.

Add a docstring to clarify the function's purpose:

def get_observe():
+    """
+    Get the appropriate observe decorator based on the configured monitoring tool.
+    
+    Returns:
+        The observe decorator from the configured monitoring tool package.
+        
+    Raises:
+        ValueError: If the configured monitoring tool is not supported.
+    """
    monitoring = get_base_config().monitoring_tool

    if monitoring == Observer.LANGFUSE:
        from langfuse.decorators import observe

        return observe
cognee/modules/observability/observers.py (1)

7-9: Consider adding docstrings for individual enum values

While the class has a brief docstring, adding documentation for each enum value would improve clarity about what each monitoring tool represents and potentially any specific requirements or behaviors associated with it.

class Observer(str, Enum):
    """Monitoring tools"""

-    LANGFUSE = "langfuse"
-    LLMLITE = "llmlite"
-    LANGSMITH = "langsmith"
+    LANGFUSE = "langfuse"  # Langfuse observability platform
+    LLMLITE = "llmlite"    # LLMLite monitoring tool
+    LANGSMITH = "langsmith"  # LangSmith tracing and evaluation platform
cognee/tests/test_neo4j.py (1)

77-86: Consider adding a TODO to revisit this test

While commenting out the test with a clear explanation is reasonable, it would be helpful to add a TODO or FIXME to ensure this test is revisited in the future when the issue with weak LLM models is addressed.

-# NOTE: Due to the test failing often on weak LLM models we've removed this test for now
+# TODO: Revisit this test when more reliable LLM models are available or implement a more robust test
+# NOTE: Test removed temporarily as it fails frequently on weaker LLM models
cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (2)

6-14: Add docstring to improve function documentation

The function lacks documentation explaining its purpose, parameters, and return value. Adding a docstring would improve code maintainability.

 async def log_pipeline_run_initiated(pipeline_id: str, pipeline_name: str, dataset_id: UUID):
+    """
+    Creates and persists a new pipeline run record indicating the initiation of dataset processing.
+    
+    Args:
+        pipeline_id: The ID of the pipeline
+        pipeline_name: The name of the pipeline
+        dataset_id: The UUID of the dataset being processed
+        
+    Returns:
+        PipelineRun: The created pipeline run object
+    """
     pipeline_run = PipelineRun(
         pipeline_run_id=uuid4(),
         pipeline_name=pipeline_name,

13-14: Consider adding basic metadata to initial run_info

The run_info is initialized as an empty dictionary. Consider adding basic metadata, like a timestamp or the initiator, to provide more context (the snippet below assumes datetime and os are imported at module level).

         pipeline_id=pipeline_id,
         status=PipelineRunStatus.DATASET_PROCESSING_INITIATED,
         dataset_id=dataset_id,
-        run_info={},
+        run_info={
+            "initiated_at": datetime.datetime.utcnow().isoformat(),
+            "environment": os.environ.get("ENVIRONMENT", "unknown")
+        },
     )
cognee/infrastructure/databases/graph/get_graph_engine.py (1)

71-72: Consider more specific error messages for missing credentials

The error message could be more specific about which Memgraph credential is missing to aid in debugging configuration issues.

     elif graph_database_provider == "memgraph":
-        if not (graph_database_url and graph_database_username and graph_database_password):
-            raise EnvironmentError("Missing required Memgraph credentials.")
+        missing_credentials = []
+        if not graph_database_url:
+            missing_credentials.append("graph_database_url")
+        if not graph_database_username:
+            missing_credentials.append("graph_database_username")
+        if not graph_database_password:
+            missing_credentials.append("graph_database_password")
+        
+        if missing_credentials:
+            raise EnvironmentError(f"Missing required Memgraph credentials: {', '.join(missing_credentials)}")
cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (1)

175-190: Clean up unused search type handling

The component no longer includes "INSIGHTS" in the search options (as mentioned in the AI summary), but the convertToSearchTypeOutput function still handles cases for "INSIGHTS", "SUMMARIES", and "CHUNKS" which may no longer be used.

Consider cleaning up the unused cases in this switch statement to match the available search options:

switch (searchType) {
-    case 'INSIGHTS':
-      return systemMessages.map((message: InsightMessage) => {
-        const [node1, relationship, node2] = message;
-        if (node1.name && node2.name) {
-          return `${node1.name} ${relationship.relationship_name} ${node2.name}.`;
-        }
-        return '';
-      }).join('\n');
-    case 'SUMMARIES':
-      return systemMessages.map((message: { text: string }) => message.text).join('\n');
-    case 'CHUNKS':
-      return systemMessages.map((message: { text: string }) => message.text).join('\n');
    default:
      return "";
}
cognee/exceptions/exceptions.py (1)

23-30: Consider validating the log_level parameter.

The code doesn't validate that log_level is one of the supported values. An invalid value would result in silent failure (no logging) without any error.

Consider adding validation:

def __init__(
    self,
    message: str = "Service is unavailable.",
    name: str = "Cognee",
    status_code=status.HTTP_418_IM_A_TEAPOT,
    log=True,
-   log_level="ERROR",
+   log_level: Literal["ERROR", "WARNING", "INFO", "DEBUG"] = "ERROR",
):
    self.message = message
    self.name = name
    self.status_code = status_code

+   valid_log_levels = ["ERROR", "WARNING", "INFO", "DEBUG"]
+   if log and log_level not in valid_log_levels:
+       logger.warning(f"Invalid log_level '{log_level}'. Using 'ERROR' instead.")
+       log_level = "ERROR"

    # Automatically log the exception details
    if log and (log_level == "ERROR"):

You'll need to add from typing import Literal at the top of the file.

cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (2)

60-70: Good implementation of button state management.

The implementation correctly:

  • Uses the useBoolean hook for managing button state
  • Disables the button during the async operation
  • Ensures the button is re-enabled in the finally block

Consider adding error handling to display feedback to the user if the cognify operation fails.

const handleCognify = () => {
  disableCognifyButton();
  onCognify()
+   .catch(error => {
+     // Display error to user, e.g. with a toast notification
+     console.error('Cognify operation failed:', error);
+   })
    .finally(() => enableCognifyButton());
};

36-36: Component name shadows global property.

The component name 'DataView' shadows a global property, which could lead to confusion.

Consider renaming the component to avoid shadowing:

-export default function DataView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {
+export default function DatasetView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {

Remember to update the component name in any importing files as well.


cognee/modules/pipelines/operations/run_tasks_base.py (1)

31-33: Optimize dictionary key lookup.

The current implementation uses .keys() which creates an unnecessary view object when checking for key existence.

-    has_context = any(
-        [key == "context" for key in inspect.signature(running_task.executable).parameters.keys()]
-    )
+    has_context = "context" in inspect.signature(running_task.executable).parameters

Dockerfile (1)

33-34: Consider consolidating dependency installation.

The dependency installation is performed twice with the same parameters. Consider consolidating into a single installation after copying all source code, or clearly documenting why two separate installations are necessary.

Also applies to: 43-44

cognee/tests/test_memgraph.py (2)

14-14: Add docstring to the main function.

Consider adding a docstring to explain the purpose and flow of this integration test for better clarity and documentation.

 async def main():
+    """
+    Integration test for Memgraph adapter in Cognee.
+    
+    This test configures Cognee with Memgraph, loads test data, performs
+    cognification, executes various search queries, verifies history tracking,
+    and performs cleanup to ensure proper isolation.
+    """
     cognee.config.set_graph_database_provider("memgraph")

57-60: Enhance search result assertions.

The current assertions only check if results are non-empty. Consider adding more specific assertions to verify the quality and correctness of search results, such as checking for expected content or structural properties.

 search_results = await cognee.search(
     query_type=SearchType.INSIGHTS, query_text=random_node_name
 )
 assert len(search_results) != 0, "The search results list is empty."
+# Verify result quality
+assert any("quantum" in str(result).lower() for result in search_results), "Expected 'quantum' keyword not found in results"

Also applies to: 65-69, 71-77, 79-86

cognee/modules/pipelines/operations/pipeline.py (1)

173-175: Mismatch between validation logic and error message

The validation currently rejects dots "." and spaces " " in a dataset name, yet the error message claims that underscores are disallowed.
This will certainly confuse users and may lead to unnecessary support questions.

-    if "." in dataset_name or " " in dataset_name:
-        raise ValueError("Dataset name cannot contain spaces or underscores")
+    if "." in dataset_name or " " in dataset_name:
+        raise ValueError("Dataset name cannot contain spaces or dots (.)")
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)

250-253: Early return path is good but misses with_vector flag

When no results are found the method returns an empty list, which is good.
However, if with_vector=True the caller might still expect the vector field in each ScoredResult. Consider documenting this behaviour or returning a structure that makes the absence explicit.

cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (1)

37-47: Repeated connect() on every call – cache connection status

await self.client.connect() is executed on each get_client() invocation.
If the SDK treats connect() as idempotent this is only wasted latency; if not, it may raise AlreadyConnected or leak resources.

 async def get_client(self):
-    await self.client.connect()
-    return self.client
+    if not getattr(self.client, "_is_connected", False):
+        await self.client.connect()
+        # flag set by us; adjust to real SDK attribute if it exists
+        self.client._is_connected = True
+    return self.client
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)

205-211: Serial deletes can be slow – batch predicate is simpler

Deleting rows one at a time incurs a round trip per ID and can be slow on large lists.
Since IDs are primary keys, a single predicate delete is supported:

-for data_point_id in data_point_ids:
-    await collection.delete(f"id = '{data_point_id}'")
+ids = "', '".join(data_point_ids)
+await collection.delete(f"id IN ('{ids}')")

This cuts network chatter and lock contention.

cognee-mcp/src/server.py (1)

79-86: all(results) is an unreliable success criterion

run_code_graph_pipeline appears to yield complex objects; relying on truthiness may mark partial failures as success (e.g., non-empty dicts are truthy). Consider:

  1. Inspect returned objects for explicit success flags, or
  2. Catch exceptions inside the loop instead of inferring success by truthiness, or
  3. Log and propagate the first exception.

Without this, the “Codify process failed” branch may never execute even when errors occur.
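
A sketch of option 2, inferring success from the absence of exceptions rather than from truthiness; it assumes run_code_graph_pipeline is an async generator, and the import paths follow this PR's file list:

```python
from cognee.api.v1.cognify.code_graph_pipeline import run_code_graph_pipeline
from cognee.shared.logging_utils import get_logger

logger = get_logger()

async def codify(repo_path: str) -> None:
    try:
        async for _ in run_code_graph_pipeline(repo_path):
            pass  # consume results; partial values no longer decide success
        logger.info("Codify process finished.")
    except Exception as error:
        logger.error("Codify process failed: %s", error, exc_info=True)
        raise
```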

cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)

66-73: Added defensive error handling for entity not found.

The code now gracefully handles cases where entities don't exist in the database by catching EntityNotFoundError and continuing execution rather than failing. This improves robustness when working with incomplete data.

Consider using contextlib.suppress(EntityNotFoundError) for a more concise implementation:

-    try:
-        await memory_fragment.project_graph_from_db(
-            graph_engine,
-            node_properties_to_project=properties_to_project,
-            edge_properties_to_project=["relationship_name"],
-        )
-    except EntityNotFoundError:
-        pass
+    from contextlib import suppress
+    with suppress(EntityNotFoundError):
+        await memory_fragment.project_graph_from_db(
+            graph_engine,
+            node_properties_to_project=properties_to_project,
+            edge_properties_to_project=["relationship_name"],
+        )

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2faf238 and ad0bb0c.

⛔ Files ignored due to path filters (10)
  • .github/actions/cognee_setup/action.yml is excluded by !**/*.yml
  • .github/workflows/python_version_tests.yml is excluded by !**/*.yml
  • .github/workflows/test_memgraph.yml is excluded by !**/*.yml
  • assets/cognee_benefits.png is excluded by !**/*.png
  • assets/graph_visualization.png is excluded by !**/*.png
  • cognee-mcp/pyproject.toml is excluded by !**/*.toml
  • cognee-mcp/uv.lock is excluded by !**/*.lock
  • poetry.lock is excluded by !**/*.lock
  • pyproject.toml is excluded by !**/*.toml
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (78)
  • CONTRIBUTING.md (1 hunks)
  • Dockerfile (1 hunks)
  • README.md (2 hunks)
  • assets/graph_visualization.html (0 hunks)
  • cognee-frontend/src/app/page.tsx (3 hunks)
  • cognee-frontend/src/app/wizard/CognifyStep/CognifyStep.tsx (1 hunks)
  • cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (1 hunks)
  • cognee-frontend/src/app/wizard/WizardPage.tsx (1 hunks)
  • cognee-frontend/src/modules/datasets/cognifyDataset.ts (1 hunks)
  • cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts (1 hunks)
  • cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (4 hunks)
  • cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx (1 hunks)
  • cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (2 hunks)
  • cognee-frontend/src/utils/fetch.ts (1 hunks)
  • cognee-mcp/src/server.py (3 hunks)
  • cognee/api/v1/add/add.py (1 hunks)
  • cognee/api/v1/cognify/code_graph_pipeline.py (3 hunks)
  • cognee/api/v1/cognify/cognify.py (1 hunks)
  • cognee/api/v1/datasets/datasets.py (1 hunks)
  • cognee/base_config.py (1 hunks)
  • cognee/exceptions/exceptions.py (1 hunks)
  • cognee/infrastructure/databases/graph/get_graph_engine.py (1 hunks)
  • cognee/infrastructure/databases/graph/graph_db_interface.py (2 hunks)
  • cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1 hunks)
  • cognee/infrastructure/databases/graph/networkx/adapter.py (7 hunks)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2 hunks)
  • cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (8 hunks)
  • cognee/infrastructure/databases/vector/exceptions/exceptions.py (1 hunks)
  • cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (8 hunks)
  • cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (9 hunks)
  • cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (5 hunks)
  • cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (6 hunks)
  • cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (9 hunks)
  • cognee/infrastructure/llm/gemini/adapter.py (1 hunks)
  • cognee/infrastructure/llm/openai/adapter.py (2 hunks)
  • cognee/modules/data/methods/__init__.py (1 hunks)
  • cognee/modules/data/methods/create_dataset.py (2 hunks)
  • cognee/modules/data/methods/get_unique_dataset_id.py (1 hunks)
  • cognee/modules/graph/cognee_graph/CogneeGraph.py (1 hunks)
  • cognee/modules/observability/get_observe.py (1 hunks)
  • cognee/modules/observability/observers.py (1 hunks)
  • cognee/modules/pipelines/models/PipelineRun.py (1 hunks)
  • cognee/modules/pipelines/operations/__init__.py (1 hunks)
  • cognee/modules/pipelines/operations/get_pipeline_status.py (2 hunks)
  • cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1 hunks)
  • cognee/modules/pipelines/operations/pipeline.py (3 hunks)
  • cognee/modules/pipelines/operations/run_tasks.py (4 hunks)
  • cognee/modules/pipelines/operations/run_tasks_base.py (4 hunks)
  • cognee/modules/retrieval/exceptions/__init__.py (1 hunks)
  • cognee/modules/retrieval/exceptions/exceptions.py (0 hunks)
  • cognee/modules/retrieval/graph_completion_retriever.py (1 hunks)
  • cognee/modules/retrieval/utils/brute_force_triplet_search.py (4 hunks)
  • cognee/modules/settings/get_settings.py (2 hunks)
  • cognee/shared/data_models.py (0 hunks)
  • cognee/shared/logging_utils.py (1 hunks)
  • cognee/tasks/ingestion/ingest_data.py (1 hunks)
  • cognee/tasks/temporal_awareness/index_graphiti_objects.py (2 hunks)
  • cognee/tests/integration/run_toy_tasks/conftest.py (0 hunks)
  • cognee/tests/test_memgraph.py (1 hunks)
  • cognee/tests/test_neo4j.py (1 hunks)
  • cognee/tests/test_weaviate.py (1 hunks)
  • cognee/tests/unit/modules/pipelines/run_tasks_test.py (1 hunks)
  • cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (1 hunks)
  • cognee/tests/unit/modules/retrieval/chunks_retriever_test.py (4 hunks)
  • cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py (1 hunks)
  • cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1 hunks)
  • cognee/tests/unit/modules/retrieval/utils/brute_force_triplet_search_test.py (0 hunks)
  • entrypoint.sh (3 hunks)
  • examples/data/car_and_tech_companies.txt (1 hunks)
  • examples/python/graphiti_example.py (2 hunks)
  • notebooks/cognee_demo.ipynb (3 hunks)
  • notebooks/cognee_graphiti_demo.ipynb (4 hunks)
  • notebooks/cognee_llama_index.ipynb (2 hunks)
  • notebooks/cognee_simple_demo.ipynb (7 hunks)
  • notebooks/github_graph_visualization.html (0 hunks)
  • notebooks/graphrag_vs_rag.ipynb (7 hunks)
  • notebooks/hr_demo.ipynb (0 hunks)
  • notebooks/llama_index_cognee_integration.ipynb (5 hunks)
💤 Files with no reviewable changes (7)
  • cognee/modules/retrieval/exceptions/exceptions.py
  • cognee/shared/data_models.py
  • cognee/tests/unit/modules/retrieval/utils/brute_force_triplet_search_test.py
  • cognee/tests/integration/run_toy_tasks/conftest.py
  • notebooks/github_graph_visualization.html
  • assets/graph_visualization.html
  • notebooks/hr_demo.ipynb
🧰 Additional context used
🧬 Code Graph Analysis (26)
cognee/tests/unit/modules/pipelines/run_tasks_test.py (1)
cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (1)
  • test_run_tasks (42-43)
cognee/modules/data/methods/get_unique_dataset_id.py (1)
cognee/modules/users/models/User.py (1)
  • User (12-39)
cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1)
cognee/modules/retrieval/summaries_retriever.py (1)
  • SummariesRetriever (9-33)
cognee/api/v1/cognify/cognify.py (2)
cognee/modules/pipelines/operations/pipeline.py (1)
  • cognee_pipeline (30-103)
cognee/api/v1/datasets/datasets.py (1)
  • datasets (7-40)
cognee/infrastructure/llm/openai/adapter.py (1)
cognee/modules/observability/get_observe.py (1)
  • get_observe (5-11)
cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (1)
cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx (1)
  • Explorer (15-61)
cognee/tests/test_neo4j.py (1)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (12-37)
cognee/modules/data/methods/__init__.py (1)
cognee/modules/data/methods/get_unique_dataset_id.py (1)
  • get_unique_dataset_id (5-6)
cognee/tasks/ingestion/ingest_data.py (1)
cognee/modules/data/methods/create_dataset.py (1)
  • create_dataset (11-33)
cognee/modules/pipelines/operations/get_pipeline_status.py (1)
cognee/modules/pipelines/models/PipelineRun.py (1)
  • PipelineRun (15-27)
cognee/modules/pipelines/operations/__init__.py (1)
cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1)
  • log_pipeline_run_initiated (6-22)
cognee/modules/observability/get_observe.py (2)
cognee/base_config.py (1)
  • get_base_config (29-30)
cognee/modules/observability/observers.py (1)
  • Observer (4-9)
cognee/infrastructure/databases/graph/get_graph_engine.py (1)
cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1)
  • MemgraphAdapter (20-690)
cognee/base_config.py (1)
cognee/modules/observability/observers.py (1)
  • Observer (4-9)
cognee-frontend/src/modules/datasets/cognifyDataset.ts (1)
cognee-frontend/src/utils/fetch.ts (1)
  • fetch (3-12)
cognee/api/v1/datasets/datasets.py (1)
cognee/modules/pipelines/operations/get_pipeline_status.py (1)
  • get_pipeline_status (8-35)
cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (5)
cognee/modules/pipelines/tasks/task.py (1)
  • Task (5-97)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (12-37)
cognee/modules/pipelines/operations/run_tasks_base.py (1)
  • run_tasks_base (66-82)
cognee/infrastructure/databases/relational/create_db_and_tables.py (1)
  • create_db_and_tables (4-7)
cognee/tests/unit/modules/pipelines/run_tasks_test.py (1)
  • run_and_check_tasks (10-46)
cognee/tests/unit/modules/retrieval/chunks_retriever_test.py (1)
cognee/api/v1/config/config.py (2)
  • config (15-194)
  • system_root_directory (17-29)
cognee/tests/test_weaviate.py (1)
cognee/infrastructure/databases/vector/get_vector_engine.py (1)
  • get_vector_engine (5-6)
cognee/tests/test_memgraph.py (7)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (12-37)
cognee/shared/logging_utils.py (1)
  • get_logger (137-158)
cognee/modules/search/types/SearchType.py (1)
  • SearchType (4-13)
cognee/api/v1/config/config.py (4)
  • config (15-194)
  • set_graph_database_provider (57-59)
  • data_root_directory (32-34)
  • system_root_directory (17-29)
cognee/infrastructure/databases/vector/get_vector_engine.py (1)
  • get_vector_engine (5-6)
cognee/infrastructure/databases/graph/get_graph_engine.py (1)
  • get_graph_engine (9-21)
cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1)
  • get_graph_data (455-483)
cognee/modules/data/methods/create_dataset.py (2)
cognee/modules/data/methods/get_unique_dataset_id.py (1)
  • get_unique_dataset_id (5-6)
cognee/modules/users/models/User.py (1)
  • User (12-39)
cognee/exceptions/exceptions.py (1)
cognee/shared/logging_utils.py (4)
  • error (127-128)
  • warning (124-125)
  • info (121-122)
  • debug (133-134)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2)
cognee/infrastructure/databases/vector/exceptions/exceptions.py (1)
  • CollectionNotFoundError (5-14)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
  • get_async_session (34-40)
cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (5)
cognee/shared/logging_utils.py (1)
  • get_logger (137-158)
cognee/exceptions/exceptions.py (1)
  • InvalidValueError (47-54)
cognee/infrastructure/engine/models/DataPoint.py (1)
  • DataPoint (16-96)
cognee/infrastructure/engine/utils/parse_id.py (1)
  • parse_id (4-10)
cognee/infrastructure/databases/vector/exceptions/exceptions.py (1)
  • CollectionNotFoundError (5-14)
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (5)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (4)
  • get_collection (121-126)
  • has_collection (111-113)
  • get_connection (99-106)
  • create_data_points (128-144)
cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (3)
  • get_collection (75-80)
  • has_collection (51-53)
  • create_data_points (82-132)
cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (2)
  • has_collection (74-78)
  • create_data_points (99-130)
cognee/infrastructure/databases/vector/exceptions/exceptions.py (1)
  • CollectionNotFoundError (5-14)
cognee/infrastructure/engine/models/DataPoint.py (1)
  • DataPoint (16-96)
cognee/infrastructure/databases/graph/networkx/adapter.py (4)
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (7)
  • has_node (66-75)
  • get_edges (264-275)
  • extract_node (121-124)
  • extract_nodes (126-136)
  • get_neighbors (381-383)
  • get_node (385-392)
  • get_nodes (394-402)
cognee/infrastructure/databases/graph/kuzu/adapter.py (7)
  • has_node (167-171)
  • get_edges (439-475)
  • extract_node (284-304)
  • extract_nodes (306-325)
  • get_neighbors (479-481)
  • get_node (483-502)
  • get_nodes (504-521)
cognee/infrastructure/databases/graph/graph_db_interface.py (4)
  • get_edges (177-179)
  • get_neighbors (182-184)
  • get_node (125-127)
  • get_nodes (130-132)
cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (2)
  • extract_node (235-238)
  • extract_nodes (240-241)
🪛 Ruff (0.8.2)
cognee/modules/retrieval/exceptions/__init__.py

7-7: .exceptions.SearchTypeNotSupported imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


7-7: .exceptions.CypherSearchError imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

cognee/modules/data/methods/__init__.py

10-10: .get_unique_dataset_id.get_unique_dataset_id imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

cognee/modules/pipelines/operations/__init__.py

1-1: .log_pipeline_run_initiated.log_pipeline_run_initiated imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

cognee/modules/retrieval/utils/brute_force_triplet_search.py

66-73: Use contextlib.suppress(EntityNotFoundError) instead of try-except-pass

Replace with contextlib.suppress(EntityNotFoundError)

(SIM105)

cognee/modules/pipelines/operations/run_tasks_base.py

32-32: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

cognee-mcp/src/server.py

48-48: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py

430-430: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🪛 Biome (1.9.4)
cognee-frontend/src/modules/ingestion/DataView/DataView.tsx

[error] 36-36: Do not shadow the global "DataView" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

🪛 LanguageTool
examples/data/car_and_tech_companies.txt

[duplication] ~2-~2: Possible typo: you repeated a word.
Context: text_1 = """ 1. Audi Audi is known for its modern designs and adv...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~5-~5: Possible typo: you repeated a word.
Context: ...ns to high-performance sports cars. 2. BMW BMW, short for Bayerische Motoren Werke, is...

(ENGLISH_WORD_REPEAT_RULE)


[style] ~6-~6: Consider using a more concise synonym.
Context: ... reflects that commitment. BMW produces a variety of cars that combine luxury with sporty pe...

(A_VARIETY_OF)


[duplication] ~8-~8: Possible typo: you repeated a word.
Context: ...ine luxury with sporty performance. 3. Mercedes-Benz Mercedes-Benz is synonymous with luxury and quality. ...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~11-~11: Possible typo: you repeated a word.
Context: ... catering to a wide range of needs. 4. Porsche Porsche is a name that stands for high-performa...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~14-~14: Possible typo: you repeated a word.
Context: ...o value both performance and style. 5. Volkswagen Volkswagen, which means "people's car" in German, ...

(ENGLISH_WORD_REPEAT_RULE)


[grammar] ~17-~17: The plural determiner ‘these’ does not agree with the singular noun ‘car’.
Context: ...nce practicality with quality. Each of these car manufacturer contributes to Germany's r...

(THIS_NNS)


[uncategorized] ~17-~17: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...cality with quality. Each of these car manufacturer contributes to Germany's reputation as ...

(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)


[duplication] ~21-~21: Possible typo: you repeated a word.
Context: ...design excellence. """ text_2 = """ 1. Apple Apple is renowned for its innovative consumer...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~27-~27: Possible typo: you repeated a word.
Context: ... in shaping the internet landscape. 3. Microsoft Microsoft Corporation has been a dominant force i...

(ENGLISH_WORD_REPEAT_RULE)


[style] ~28-~28: Consider using a synonym to be more concise.
Context: ...n both business and personal computing. In recent years, Microsoft has expanded into cloud comp...

(IN_RECENT_STYLE)


[uncategorized] ~31-~31: You might be missing the article “the” here.
Context: ...or innovation continues to reshape both retail and technology sectors. 5. Meta Meta, ...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[duplication] ~33-~33: Possible typo: you repeated a word.
Context: ...both retail and technology sectors. 5. Meta Meta, originally known as Facebook, revoluti...

(ENGLISH_WORD_REPEAT_RULE)

🔇 Additional comments (119)
cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1)

130-130: Good parameter naming alignment with the implementation.

The change from limit=20 to top_k=20 correctly aligns the test with the actual implementation of SummariesRetriever, which accepts top_k as a constructor parameter and then uses it internally as the limit parameter when calling the vector engine. This change is part of the broader standardization of parameter naming conventions across retrieval modules.
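A minimal sketch of that parameter flow, with the engine injected and the collection name assumed for illustration (the real class resolves its own engine):

class SummariesRetriever:
    # Hypothetical shape: top_k in the constructor becomes the engine's limit.
    def __init__(self, vector_engine, top_k: int = 5):
        self.vector_engine = vector_engine
        self.top_k = top_k

    async def get_context(self, query: str):
        # "text_summary_text" is an assumed collection name, not confirmed by the PR.
        return await self.vector_engine.search(
            "text_summary_text", query_text=query, limit=self.top_k
        )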

cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py (1)

157-162: Good refactoring of test execution pattern!

Converting individual test calls to a sequential async main function improves test organization and efficiency. This approach:

  • Ensures tests run in a predictable order
  • Uses a single event loop for all tests rather than creating/destroying multiple loops
  • Follows the async/await pattern consistently throughout the test file
  • Matches the pattern used in other retrieval test modules
cognee/tests/unit/modules/retrieval/chunks_retriever_test.py (4)

19-19: Good test isolation improvement

Using a test-specific directory path (test_chunks_context_simple) provides better isolation between test cases and prevents data leakage between tests.

Also applies to: 23-23


76-77: Good test isolation improvement

Using a test-specific directory path (test_chunk_context_complex) ensures that this test's data doesn't interfere with other tests. This also makes debugging easier when a specific test fails.

Also applies to: 80-81


165-166: Good test isolation improvement

Using a dedicated directory path (test_chunk_context_empty) for the empty graph test ensures accurate testing of the empty state behavior without interference from other test data.

Also applies to: 169-170


193-198: Good refactoring of test execution pattern!

Converting multiple individual test calls to a single async main function is an improvement (sketched after the list below) that:

  • Ensures consistent use of the async/await pattern
  • Prevents multiple event loop creation/destruction cycles
  • Makes test execution more predictable and maintainable
  • Aligns with the pattern used in other retrieval test modules
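A minimal sketch of the pattern, with placeholder test bodies:

import asyncio

async def test_chunk_context_simple():
    ...  # placeholder body

async def test_chunk_context_complex():
    ...  # placeholder body

async def main():
    # One event loop for the whole run; tests execute in a fixed order.
    await test_chunk_context_simple()
    await test_chunk_context_complex()

if __name__ == "__main__":
    asyncio.run(main())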
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)

72-72: Consistency fix in comment formatting.

Fixed apostrophe style in comment from curly to straight, improving code consistency.

cognee/modules/pipelines/models/PipelineRun.py (1)

9-9: Verify database migration for new enum member.

Adding DATASET_PROCESSING_INITIATED to PipelineRunStatus may require an Alembic migration if using a PostgreSQL ENUM type. Please confirm that your migration scripts include this new member, or update the type with an ALTER TYPE statement.
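For illustration, a hedged Alembic sketch; the PostgreSQL enum type name pipelinerunstatus is an assumption, not taken from the migration scripts:

from alembic import op

def upgrade() -> None:
    # ADD VALUE cannot run inside a transaction block on older PostgreSQL
    # versions, so this may need autocommit; note that removing an enum
    # value again is not supported, so downgrade is typically a no-op.
    op.execute(
        "ALTER TYPE pipelinerunstatus ADD VALUE IF NOT EXISTS 'DATASET_PROCESSING_INITIATED'"
    )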

cognee-frontend/src/utils/fetch.ts (1)

4-4: LGTM on URL standardization

Changing from IP address to hostname is a good standardization practice. The functionality remains the same as both "127.0.0.1" and "localhost" resolve to the local machine.

For future improvements, consider using environment variables for the API base URL:

- return global.fetch('http://localhost:8000/api' + url, {
+ return global.fetch((process.env.NEXT_PUBLIC_API_BASE_URL || 'http://localhost:8000/api') + url, {

This would make the application more configurable across different environments.

cognee/tests/test_weaviate.py (1)

85-85: LGTM on adding await keyword

The addition of the await keyword is necessary and correct since the list_all() method has been converted to an asynchronous function as part of the broader WeaviateAdapter refactoring.

This change ensures the test properly waits for the asynchronous operation to complete before asserting that the collections list is empty.

cognee-frontend/src/app/wizard/WizardPage.tsx (1)

32-32: LGTM on standardizing dataset identification

Changing from id to name aligns with the broader standardization of dataset identification throughout the frontend codebase.

This change is consistent with updates in related components like CognifyStep and ExploreStep, which now expect a dataset prop with a name property instead of an id property.

cognee/api/v1/cognify/cognify.py (1)

37-39: Pipeline naming standardization looks good

The addition of the pipeline_name="cognify_pipeline" parameter helps standardize pipeline identification across the system, allowing for better tracking and filtering of pipeline executions. This aligns well with the changes in the datasets module where the same pipeline name is used for status retrieval.

cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx (1)

10-10: Dataset identification change is aligned with broader architecture updates

Changing the dataset property from { id: string } to { name: string } reflects a shift in how datasets are identified throughout the application. This is consistent with similar changes in other frontend components. The Explorer component correctly handles this change since it only passes the dataset object to other functions without directly accessing specific properties.

examples/python/graphiti_example.py (1)

41-41: Pipeline execution with user context looks good

The updated run_tasks call with the user parameter aligns with the broader updates to the pipeline task execution system.

cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (2)

5-5: LGTM! Dataset identification unified by name.

The dataset prop type change from { id: string } to { name: string } aligns with the broader frontend refactoring to standardize dataset identification by name rather than ID.


11-11: Compatible with corresponding Explorer component changes.

Removed the non-null assertion operator (!) when passing the dataset prop to the Explorer component, which is consistent with the updated type system and how Explorer handles the dataset prop.

README.md (2)

61-61: LGTM! Updated Colab notebook link.

The Google Colab notebook link has been updated to a new URL, ensuring users have access to the latest getting started materials.


128-135: Great addition highlighting the new UI capabilities.

The new "cognee UI" section enhances the README by providing users with information about the visual interface, which aligns with the broader frontend improvements in this PR. The screenshot and local setup link make it easy for users to get started with the UI.

cognee/api/v1/add/add.py (1)

2-6: Improved import organization.

The reordering of imports provides better logical grouping of related modules, improving code readability:

  1. Pipeline imports grouped together
  2. User model below Task
  3. Task operations below pipeline definition

No functional changes were made to the code logic, making this a safe improvement.

cognee/infrastructure/llm/openai/adapter.py (2)

5-5: Simplified imports.

Removed unused imports and kept only the essential Type import from the typing module, improving code cleanliness.


18-20: Centralized observability implementation.

Replaced conditional import logic with a centralized approach using the new get_observe() function. This change:

  1. Improves code consistency across different LLM adapters
  2. Centralizes monitoring tool configuration logic
  3. Makes the code more maintainable by reducing duplicated conditional logic

This is a good architectural improvement that aligns with the broader observability refactoring in the codebase.

cognee/modules/retrieval/graph_completion_retriever.py (1)

78-78:

✅ Verification successful

Error handling pattern has changed.

The try-except block catching EntityNotFoundError has been removed. Ensure that error handling for entity-not-found scenarios is now properly managed in the get_triplets method or that this error condition is no longer relevant for this code path.


🏁 Script executed:

#!/bin/bash
# Search for EntityNotFoundError handling in the get_triplets method or brute_force_triplet_search
rg -A 5 "EntityNotFoundError" cognee/modules/retrieval/utils/brute_force_triplet_search.py

Length of output: 593


🏁 Script executed:

#!/bin/bash
echo "1. Locating get_triplets definition in graph_completion_retriever.py"
rg -n "def get_triplets" cognee/modules/retrieval/graph_completion_retriever.py

echo "2. Showing the first 200 lines of graph_completion_retriever.py for context"
sed -n '1,200p' cognee/modules/retrieval/graph_completion_retriever.py

echo "3. Searching for brute_force_triplet_search usage in graph_completion_retriever.py"
rg -n "brute_force_triplet_search" cognee/modules/retrieval/graph_completion_retriever.py

Length of output: 5994


Error handling for missing entities is covered upstream
The EntityNotFoundError is caught and suppressed within brute_force_triplet_search, so there’s no need for an additional try-except in get_triplets.

cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts (1)

3-3:

✅ Verification successful

Breaking change: parameter type changed from id to name.

The function parameter has been changed from { id: string } to { name: string }. This is part of a broader refactor to use dataset names instead of IDs. Ensure all callers of this function have been updated to provide a name property instead of an id.


🏁 Script executed:

#!/bin/bash
# Search for instances where getExplorationGraphUrl is called to verify they use name
rg -A 2 "getExplorationGraphUrl" cognee-frontend/src

Length of output: 1050


Verified: getExplorationGraphUrl callers updated

  • The only invocation found in cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx now passes the full dataset object (which includes name).
  • No other call sites of getExplorationGraphUrl remain in the codebase.

No further changes required.

cognee/tasks/ingestion/ingest_data.py (1)

107-107: LGTM!

The change correctly passes the full user object instead of just the user ID to the create_dataset function, which aligns with the updated function signature that now expects a User object as the second parameter.

cognee/infrastructure/databases/graph/graph_db_interface.py (3)

61-61: Logging level changed from error to debug.

The change from logger.error to logger.debug for relationship errors reduces log noise for non-critical issues that are already being handled with rollback. This is a good practice as these exceptions are caught and handled properly within the code flow.


81-81: Consistent logging level adjustment for edge relationships.

Similarly to the previous change, the logging level for edge relationship errors has been appropriately reduced from error to debug, maintaining consistency in how these handled exceptions are logged.


88-88: Session commit errors now logged at debug level.

The session commit errors are now logged at debug level instead of error level. This change aligns with the overall approach to reduce noise in error logs for handled exceptions.

cognee/modules/graph/cognee_graph/CogneeGraph.py (1)

131-135:

❓ Verification inconclusive

API call updated to use the new unified search interface.

The code has been modernized to use the search method with explicit keyword arguments instead of the deprecated get_distance_from_collection_elements method. This change aligns with broader refactoring across vector database adapters to use a unified search interface.

The parameter limit=0 might need verification - typically a limit of 0 might mean "no results" in some systems while in others it could mean "no limit".


🏁 Script executed:

#!/bin/bash
# Check if the search method implementation treats limit=0 appropriately
rg -A 10 "async def search" --type py cognee/infrastructure/databases/vector/

Length of output: 7548


I’m inspecting how each adapter handles a limit=0 parameter to confirm that it either returns all results or is treated safely. Running a scan of all search implementations now.


Confirm limit=0 Semantics Across All Adapters

The switch from the deprecated get_distance_from_collection_elements to the unified search() interface looks correct. However, passing limit=0 can mean “no results” in some clients rather than “unlimited.” Please verify that each adapter treats limit=0 as intended (e.g. return all matches or a sensible default) and adjust if needed (a normalization sketch follows the list below):

• cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py
• cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py
• cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py
• cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py
• cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py
• cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py
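One defensive option is to normalize the limit once before dispatching to the client, mirroring the non-positive-limit handling the Milvus and Weaviate adapters already use (see their comments below):

from typing import Optional

def normalize_limit(limit: int) -> Optional[int]:
    # Treat non-positive limits as "no limit"; clients that interpret 0 as
    # "no results" then never receive it.
    return limit if limit > 0 else None

assert normalize_limit(0) is None
assert normalize_limit(15) == 15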

cognee/modules/settings/get_settings.py (2)

2-2: Added import for Optional type.

The import for Optional from the typing module has been added to support the type changes in the LLMConfig model.


24-25:

✅ Verification successful

LLMConfig fields made optional.

The endpoint and api_version fields in the LLMConfig model have been changed from required strings to optional strings. This provides more flexibility in configuration, especially for LLM providers that might not require these fields.

Make sure that any code using these fields handles the case where they might be None.


🏁 Script executed:

#!/bin/bash
# Check usages of endpoint and api_version to ensure they handle None values
rg -A 5 -B 5 "endpoint|api_version" --type py cognee/infrastructure/llm/

Length of output: 15993


🏁 Script executed:

#!/bin/bash
# Display the first 200 lines of the settings file to understand context
sed -n '1,200p' cognee/modules/settings/get_settings.py

Length of output: 4448


Approved: Optional endpoint/api_version in Settings LLMConfig Are Safe

The endpoint and api_version fields in cognee/modules/settings/get_settings.py are only used to shape the settings payload returned to the UI. Since they’re now Optional[str], Pydantic’s BaseModel will accept None and emit null in the JSON—there’s no dereferencing or logic applied to these values elsewhere in the codebase.

No further changes are required.

cognee-frontend/src/app/wizard/CognifyStep/CognifyStep.tsx (1)

10-10:

✅ Verification successful

Dataset identification changed from id to name.

The dataset prop type has been updated from { id: string } to { name: string }, which aligns with a broader standardization of dataset identification across the application. This change ensures consistent dataset referencing throughout the frontend and backend pipeline.

Verify that the cognifyDataset function properly handles this change in the dataset identification scheme:


🏁 Script executed:

#!/bin/bash
# Check the implementation of cognifyDataset to ensure it uses dataset.name
cat src/modules/datasets/cognifyDataset.ts

Length of output: 115


🏁 Script executed:

#!/usr/bin/env bash
# Locate the implementation of cognifyDataset to verify it uses dataset.name
rg -n "cognifyDataset" .

Length of output: 584


🏁 Script executed:

#!/usr/bin/env bash
# Display the implementation of cognifyDataset to confirm it references dataset.name
sed -n '1,200p' cognee-frontend/src/modules/datasets/cognifyDataset.ts

Length of output: 427


Dataset identification change verified in cognifyDataset

The dataset prop has been updated to { name: string } in CognifyStep.tsx, and in src/modules/datasets/cognifyDataset.ts we see:

body: JSON.stringify({
  datasets: [dataset.id || dataset.name],
}),

This ensures that when id is absent, name is used. No further changes are needed.

cognee/modules/observability/observers.py (1)

4-9: LGTM! Good use of string-based enum

The Observer enum inheriting from str and Enum is a good choice, allowing for direct string comparisons without explicit conversions. The enumeration is clear and concise.
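A self-contained sketch of why the str mixin is convenient; the member values here are assumptions, not the actual ones:

from enum import Enum

class Observer(str, Enum):
    LANGFUSE = "langfuse"
    LLMLITE = "llmlite"
    LANGSMITH = "langsmith"

# str inheritance lets members compare directly against plain strings:
assert Observer.LANGFUSE == "langfuse"
assert Observer("langsmith") is Observer.LANGSMITH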

cognee/api/v1/datasets/datasets.py (1)

31-31: LGTM! Correct pipeline name specification

The update correctly specifies the pipeline name parameter when calling get_pipeline_status, which aligns with the changes in the pipeline operations module. This maintains the same interface for callers while adapting to the enhanced pipeline management system.

cognee/tests/test_neo4j.py (1)

90-90: LGTM! Correctly updated assertion

The assertion has been properly updated to reflect the removal of the natural language search test. The expected history length is now 6 instead of 8, which is consistent with the commented-out search operation.

cognee/base_config.py (2)

5-5: Consistent enum migration for observability tools.

The import change to use the new Observer enum aligns with the broader refactoring of observability tooling across the codebase.


11-11: Type consistency maintained in monitoring tool default.

The change from MonitoringTool.LANGFUSE to Observer.LANGFUSE preserves the same functionality while using the newly defined enum from the observability module.

cognee/modules/pipelines/operations/get_pipeline_status.py (2)

8-8: Function signature updated to support pipeline-specific filtering.

Adding the pipeline_name parameter allows callers to specify which pipeline's status to retrieve, improving the query's specificity.


23-23: Enhanced query filtering by pipeline name.

This new filter restricts the status retrieval to runs of the specified pipeline, improving the accuracy of pipeline status reporting.

cognee-frontend/src/app/page.tsx (2)

46-46: Improved user guidance in notification message.

The updated notification clearly informs users of the next step after adding data.


106-106: UI enhancement for direct cognification.

Adding the onCognify prop to DataView enables the new UI workflow for triggering dataset cognification directly from the data view.

cognee/infrastructure/llm/gemini/adapter.py (2)

6-7: Well-structured imports for improved observability

The centralized import of the observability module is a good approach, aligning with the broader refactoring of observability tooling across the codebase.


17-17: Good centralization of observability configuration

Replacing conditional imports with a centralized function call improves maintainability and consistency. This approach makes it easier to switch monitoring tools if needed in the future.

entrypoint.sh (4)

16-16: Direct Alembic usage aligned with dependency management changes

This change aligns with the shift from Poetry to uv for dependency management, simplifying the migration command.


25-25: Helpful migration completion message

Adding this message improves visibility into the startup process.


27-27: Simplified startup message

The more generic "Starting server..." message is appropriate since it's no longer specifically referring to Gunicorn.


36-37:

❓ Verification inconclusive

Removal of exec might affect signal handling in containers

Removing exec from the server startup commands changes process hierarchy and signal propagation. In containerized environments, this might affect how signals (like SIGTERM) are handled.

Also applies to: 38-39, 41-42


🏁 Script executed:

#!/bin/bash
# This script checks if the application is running in a container environment
# and warns about potential signal handling implications

if [ -f /.dockerenv ] || grep -q docker /proc/self/cgroup 2>/dev/null || grep -q kubernetes /proc/self/cgroup 2>/dev/null; then
  echo "WARNING: Application appears to be running in a container environment"
  echo "Without 'exec' prefix, signals may not propagate correctly to the application process."
  echo "This could affect graceful shutdown during container orchestration."
fi

# Check if container orchestration tools are being used
if [ -n "$KUBERNETES_SERVICE_HOST" ] || [ -n "$DOCKER_HOST" ]; then
  echo "Container orchestration detected. Signal handling is particularly important."
fi

Length of output: 130


Add exec to server startup for correct signal handling in containers

To ensure the startup process becomes PID 1 and properly receives SIGTERM/SIGINT in Docker/Kubernetes (allowing graceful shutdown), prepend exec to your debugpy/gunicorn commands:

• entrypoint.sh, Lines 36–37, 38–39, 41–42
• Example diff for the debugpy+gunicorn invocation:

-    debugpy --wait-for-client --listen 0.0.0.0:5678 \
-      -m gunicorn -w 3 -k uvicorn.workers.UvicornWorker \
-      -t 30000 --bind=0.0.0.0:8000 --log-level debug \
-      --reload cognee.api.client:app
+    exec debugpy --wait-for-client --listen 0.0.0.0:5678 \
+      -m gunicorn -w 3 -k uvicorn.workers.UvicornWorker \
+      -t 30000 --bind=0.0.0.0:8000 --log-level debug \
+      --reload cognee.api.client:app

Please verify that graceful shutdown behavior in your container orchestration works as expected.

cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1)

16-21: LGTM: Database session handling follows best practices

The database session handling with async context manager follows best practices for asynchronous database operations.
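The general shape, shown as a self-contained sketch against an in-memory SQLite engine rather than cognee's actual session factory (requires the aiosqlite driver):

import asyncio
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

engine = create_async_engine("sqlite+aiosqlite:///:memory:")
async_session = async_sessionmaker(engine, expire_on_commit=False)

async def main():
    async with async_session() as session:   # session is closed on exit
        async with session.begin():          # commit on success, rollback on error
            pass                             # session.add(pipeline_run) would go here

asyncio.run(main())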

cognee/infrastructure/databases/graph/get_graph_engine.py (1)

70-80: Good implementation of Memgraph support

The addition of Memgraph support follows the established pattern of the factory function, maintaining consistency with other database providers.

notebooks/cognee_demo.ipynb (3)

470-470: Import path updated to match updated module structure.

The import path for the Task class has been updated to reflect the lowercase module naming convention (task instead of Task), aligning with Python's standard naming practices.


508-508: Parameter added to support user context in pipeline execution.

The run_tasks function call has been updated to include the user parameter, which aligns with the broader architectural changes for context propagation in task execution. This change correctly implements the updated function signature.


532-536: Explicit user context retrieval implemented.

The code now explicitly retrieves the user context in two steps:

  1. Getting the default user
  2. Fetching the complete user object with the default user's ID

This approach aligns with the codebase's architectural changes to standardize user context handling across the application.

cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (2)

81-82: Improved user experience with immediate input clearing

Moving the input clearing to happen immediately after submitting the search rather than waiting for the response provides better feedback to users that their query has been submitted.


108-110: Good error handling to preserve user input

Adding error handling to restore the input value if the fetch request fails is an excellent improvement to prevent users from losing their search query when errors occur.

cognee/api/v1/cognify/code_graph_pipeline.py (4)

5-7: Improved observability module organization

Reorganizing the imports for observability-related modules enhances code organization. The get_observe import from a dedicated module suggests a more centralized approach to observability management.


16-16: Added dataset ID generation capability

Adding the import for get_unique_dataset_id aligns with the asynchronous dataset ID generation approach and enables user-specific dataset IDs.


25-25: Centralized observability configuration

Replacing conditional imports with a centralized call to get_observe() improves code maintainability and provides a cleaner abstraction for observability tooling.


68-68: Enhanced dataset ID generation with user context

Switching to an asynchronous call to get_unique_dataset_id with user context is a good improvement that allows for user-specific dataset IDs and better integration with the user management system.

cognee/infrastructure/databases/vector/exceptions/exceptions.py (1)

9-14: Improved exception handling and logging configuration

The changes to the CollectionNotFoundError class enhance error handling by:

  1. Making the default exception name consistent with the class name
  2. Adding flexible logging control parameters
  3. Updating the superclass constructor appropriately

This approach aligns with best practices for exception handling and allows for more granular control over error reporting.

notebooks/cognee_graphiti_demo.ipynb (4)

18-18: Fixed typo in markdown

Simple readability improvement in the notebook documentation.


27-29: Updated imports for required modules

Adding the asyncio import and updating the other imports to include get_llm_client and get_default_user properly prepares the notebook for the subsequent code changes.

Also applies to: 42-43


131-137: Integrated user context initialization

Simplifying the pruning call and adding the default user initialization is a good improvement that aligns with the broader changes to incorporate user context throughout the system.


143-148: Updated task execution with user context

Passing the user object to run_tasks properly integrates the user context into the pipeline execution, ensuring consistent handling of user-related data throughout the system.

cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (2)

1-47: Well-structured test for the new context propagation feature.

This test effectively validates that contextual information can be passed through pipeline tasks. The calculation (starting with 5, adding 7, multiplying by 2, then raising to power 7) correctly results in the expected value of 4,586,471,424.
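Worked through: 5 + 7 = 12, 12 × 2 = 24, and 24^7 = 4,586,471,424, matching the asserted result.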


14-21: Clear task definitions with proper context usage.

Good implementation of the test tasks with varied parameter patterns:

  • task_1 uses both input and context
  • task_2 only uses input
  • task_3 uses input and context again

This variety helps validate that the context is properly maintained throughout the pipeline execution regardless of whether individual tasks use it.

cognee/modules/data/methods/create_dataset.py (3)

7-8: Appropriate imports for the new implementation.

These imports correctly support the updated function signature that now accepts a User object instead of just an ID.


11-12: Good refactoring to accept full User objects.

This change aligns with the broader architectural improvements to propagate context throughout the application. Extracting the owner_id from the user object maintains compatibility with the existing database queries.


24-26: Improved dataset ID generation with proper modularization.

Moving the ID generation logic to a separate async helper function improves code organization while maintaining the same uniqueness guarantees (based on dataset name and user ID).

cognee/exceptions/exceptions.py (2)

15-16: Good addition of logging flexibility parameters.

These new parameters allow callers to control whether and at what level exceptions are logged, enhancing observability control.


23-30: Improved conditional logging based on log level.

The implementation properly handles different log levels, providing greater flexibility in how exceptions are reported.
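A hedged sketch of the mechanism; the parameter names follow this review, while the class body and logger wiring are illustrative:

import logging

logger = logging.getLogger(__name__)

class CogneeApiError(Exception):
    def __init__(self, message: str, log: bool = True, log_level: str = "ERROR"):
        super().__init__(message)
        if log:
            # Dispatch to the requested level; unknown levels fall back to error.
            getattr(logger, log_level.lower(), logger.error)(message)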

cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (3)

9-10: Appropriate UI component imports.

The added imports support the new Cognify functionality with proper UI components.


33-33: Well-defined prop interface for the new cognify functionality.

The Promise return type allows for proper async handling of the cognify operation.


80-84: Good UI integration of the Cognify button.

The button is well-positioned next to related actions and has appropriate disabled state handling.

notebooks/graphrag_vs_rag.ipynb (5)

56-57: Updated cognee package version.

Updated to use the newer cognee package version (0.1.39), ensuring compatibility with the latest features and API changes.


152-153: Updated import path and API usage pattern.

Changed the import for SearchType from cognee.modules.search.types to cognee.api.v1.search and updated the function call to use keyword arguments. This reflects a move to a more stable and explicit API.


173-173: Fixed search type constant usage.

Correctly using SearchType.RAG_COMPLETION instead of the previous SearchType.COMPLETION, which is more explicit and accurately describes the search functionality being used.


202-202: Updated search function call to use keyword arguments.

Standardized the search function call to use keyword arguments, improving code clarity and alignment with the updated API.


246-248: Updated kernel metadata for consistency.

Ensured the notebook's kernel metadata is consistent and compatible with current Python environments, which is good practice for notebook portability.

cognee/modules/pipelines/operations/run_tasks_base.py (1)

17-17: Good implementation of context propagation.

The addition of the optional context parameter with inspection-based conditional passing provides a flexible way to propagate context through the task execution pipeline. This approach maintains backward compatibility while enabling more sophisticated inter-task communication.

Also applies to: 31-36, 66-66, 80-81
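A self-contained sketch of the inspection idea (names are illustrative, not the cognee runner itself):

import inspect

def run_step(task_fn, *args, context=None):
    # Forward `context` only to callables whose signature declares it,
    # so existing context-free tasks keep working unchanged.
    if "context" in inspect.signature(task_fn).parameters:
        return task_fn(*args, context=context)
    return task_fn(*args)

def add(x, context=None):
    return x + (context or 0)

def double(x):
    return x * 2

assert run_step(add, 1, context=10) == 11
assert run_step(double, 1, context=10) == 2  # context not forwarded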

Dockerfile (1)

1-61: Excellent improvement with multi-stage build and modern dependency management.

The refactored Dockerfile adopts several best practices:

  1. Multi-stage build to minimize the final image size
  2. Using uv for faster dependency management instead of Poetry
  3. Proper layer caching with separated dependency installation
  4. Single RUN command for system dependencies with apt cache cleanup
  5. Mount caching for uv operations to speed up builds

These changes should result in faster builds, smaller images, and better maintainability.

cognee/tests/test_memgraph.py (1)

14-108: Well-structured integration test for Memgraph.

This test provides comprehensive coverage of the Memgraph adapter functionality including configuration, data loading, cognification, search operations across multiple query types, and proper cleanup. The test isolation with custom paths is a good practice.

cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (3)

122-125: Good error handling improvement.

Adding specific exception handling for CollectionNotExistException and raising a custom CollectionNotFoundError provides clearer error information and aligns with error handling standardization across adapters. This makes debugging and error recovery more straightforward.

Also applies to: 161-164, 211-214


148-148: Type annotation refinement from str to UUID.

Changing data_point_ids type from list[str] to list[UUID] improves type safety and makes the interface more precise. This helps catch potential type errors earlier in the development process.

Also applies to: 236-236


176-176: Search behavior improvements.

The increased default limit (5 → 15) provides more comprehensive search results, and handling non-positive limits by passing None is a robust approach that prevents unexpected behavior.

Also applies to: 196-196

cognee/modules/pipelines/operations/pipeline.py (1)

142-155: get_pipeline_status return-type assumption can break the status checks

task_status is assumed to be a mapping (dict-like) whose keys are str(dataset_id), yet the function call on line 142 provides a list of dataset IDs and—depending on its implementation—could plausibly return:

  • a list[PipelineRunStatus],
  • a dict[str, PipelineRunStatus], or
  • something else entirely.

If it returns a non-mapping type, the subsequent containment check (if str(dataset_id) in task_status:) will raise TypeError or always evaluate to False, allowing duplicate pipeline executions.

Please double-check get_pipeline_status’s signature and update the logic accordingly, or cast the result to the expected structure.

cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)

186-188: Great! Raising CollectionNotFoundError improves diagnostics

Explicitly surfacing a typed exception with a controllable log level is a solid improvement over returning None or raising generic errors.
No further action required.

cognee/infrastructure/databases/graph/networkx/adapter.py (1)

139-142: Edge retrieval logic looks good

The switch to UUID propagates correctly into the in_edges/out_edges queries; behaviour is intact.

cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (2)

75-81: collections.get may require await – double-check SDK

The async SDK exposes both sync & async variants.
If client.collections.get() returns a coroutine, omitting await will hand back an un-awaited coroutine to downstream code.

Please confirm with the SDK docs; apply await if necessary.


167-174: Limit parameter handled nicely

Raising the default search limit to 15 and forwarding None when limit<=0 is clean and keeps API flexible.

cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)

76-82: await connection.open_table() – confirm it really is a coroutine

AsyncConnection.open_table is async in current LanceDB (≥0.4), so awaiting is correct.
If your pinned version is older, this will raise TypeError: object is not awaitable.
Double-check the project’s LanceDB version; downgrade compatibility might need:

table = connection.open_table(collection_name)
if inspect.iscoroutine(table):
    table = await table
cognee/modules/retrieval/utils/brute_force_triplet_search.py (4)

4-6: Updated imports for improved error handling.

Added imports for EntityNotFoundError and CollectionNotFoundError exceptions, which are now used for more graceful error handling in the functions below. This aligns with the broader error handling improvements across the codebase.


146-153: Extracted search function with improved error handling.

Created a new helper function search_in_collection that encapsulates the vector search call and properly handles the case where a collection doesn't exist. This is a good refactoring that isolates error handling and makes the code more resilient.
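Its likely shape, sketched with the engine passed in so the snippet stays self-contained (the real helper resolves the engine itself):

class CollectionNotFoundError(Exception):
    pass  # stub standing in for cognee's typed exception

async def search_in_collection(vector_engine, collection_name: str, query: str, limit: int):
    # A missing collection becomes an empty result instead of an exception.
    try:
        return await vector_engine.search(collection_name, query_text=query, limit=limit)
    except CollectionNotFoundError:
        return []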


156-157: Improved handling of empty search results.

The code now uses the new helper function for searches and returns an empty list when no results are found instead of potentially raising an exception. This provides more predictable behavior and a better API experience for callers.

Also applies to: 159-161


173-174: Added top-level error handling for collection not found.

The function now catches CollectionNotFoundError at the top level and returns an empty list, further improving resilience. This ensures the function doesn't fail even if there are unexpected collection issues.

cognee/modules/pipelines/operations/run_tasks.py (4)

23-25: Context parameter added to enable task context propagation

The addition of the optional context parameter enhances the pipeline's capability by allowing contextual data to be passed through to tasks that need it.


41-41: Context propagation properly passed to run_tasks_base

The context is correctly forwarded to the underlying function, ensuring seamless propagation throughout the execution chain.


77-77: Context parameter added to run_tasks function

Adding the optional context parameter here completes the context propagation path through the pipeline execution functions.


87-93: Updated function call with named parameters

The function call is updated with named parameters, making the code more readable and explicitly showing the context parameter being passed. This is good practice when functions have multiple parameters.

cognee/tasks/temporal_awareness/index_graphiti_objects.py (6)

23-23: Empty params parameter added for consistency

Adding explicit params={} ensures consistent parameter passing across all graph queries, which is important for standardization and compatibility with different graph database providers.


27-29: Query parameters standardized

The relationship name assignment and explicit empty params parameter follow the standardization pattern, ensuring uniform query interfaces across the codebase.


30-32: New query added with consistent parameter style

This new query ensures nodes have a text field by coalescing summary and content fields, maintaining the standardized query pattern with explicit params.
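A hedged illustration of the coalescing idea (the exact Cypher from the PR is not reproduced here, and graph_engine is assumed to expose the standardized query/params interface):

async def ensure_text_field(graph_engine):
    # coalesce() picks the first non-null of summary/content per node.
    await graph_engine.query(
        "MATCH (n) SET n.text = coalesce(n.summary, n.content)",
        params={},
    )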


34-34: Updated method for retrieving graph data

Changing from get_model_independent_graph_data() to get_graph_data() aligns with the new unified graph engine interface that supports multiple graph database providers including the new Memgraph support.


36-40: Simplified node data iteration

The iteration pattern has been improved to directly use key-value pairs from the graph data structure, making the code more readable and less prone to indexing errors.


64-65: Simplified edge relationship extraction

The code now directly accesses the relationship name at index 2 of each edge tuple, simplifying the previous more complex filtering and indexing logic. This makes the code more maintainable.

notebooks/cognee_llama_index.ipynb (2)

15-17: Notebook cell format standardization looks good.

The source field for this code cell has been updated to use a list format instead of a single string, which aligns with standard Jupyter notebook JSON conventions. This is a purely structural change that doesn't affect code execution or functionality.


123-125: Structural field reordering approved.

The metadata and outputs fields in this cell have been reordered, which is again just a format standardization that aligns with Jupyter notebook conventions. This change doesn't affect the code execution or functionality of the notebook.

notebooks/cognee_simple_demo.ipynb (9)

17-19: Package version upgrade to cognee 0.1.39.

The cognee package has been updated from version 0.1.36 to 0.1.39. This appears to be part of the broader architectural changes mentioned in the PR summary.


13-16: Notebook metadata standardization looks good.

The execution_count and outputs fields have been properly standardized for this cell. This change improves notebook compatibility without affecting functionality.


33-41: Cell metadata formatting standardized.

The execution_count and outputs fields have been standardized for consistency with other notebook cells.


53-60: Cell metadata formatting standardized.

The execution_count and outputs fields have been standardized for consistent notebook formatting.


72-80: Cell metadata formatting standardized.

The execution_count and outputs fields have been standardized. These changes are structural only and don't affect the code's functionality.


92-98: Cell metadata formatting standardized.

The execution_count and outputs fields have been standardized in this cell.


102-108: Cell metadata formatting standardized.

The execution_count and outputs fields have been standardized in this cell.


112-118: Cell metadata formatting standardized.

The execution_count and outputs fields have been standardized in this cell.


130-143: Cell metadata formatting standardized.

The execution_count and outputs fields have been standardized in this visualization cell.

notebooks/llama_index_cognee_integration.ipynb (5)

6-8: Markdown cell formatting standardized.

The source field for this markdown cell has been converted from a single string to a list containing that string, which is a standard Jupyter notebook format. This change doesn't affect functionality.


62-64: Code cell formatting standardized.

The source field for this pip install cell has been converted to a list format, which aligns with Jupyter notebook standards. This is a formatting change only.


199-201: Markdown cell formatting standardized.

The source field for this markdown cell has been converted to list format, following standard Jupyter notebook conventions.


220-221: Markdown cell formatting standardized.

The source field for this conclusion markdown cell has been converted to list format, aligning with Jupyter notebook conventions.


282-283: Python version updated in kernel metadata.

The Python version in the language_info metadata has been updated to 3.12.9. This ensures the notebook is associated with the correct Python version.

@@ -97,7 +97,7 @@ git push origin feature/your-feature-name

2. Create a Pull Request:
- Go to the [**cognee** repository](https://github.com/topoteretes/cognee)
- Click "Compare & Pull Request" and make sure to open PR against dev branch
- Click "Compare & Pull Request"

⚠️ Potential issue

Ensure PRs target the dev branch: The removal of the explicit instruction to open pull requests against the dev branch may lead contributors to accidentally target main. Please reintroduce guidance to verify that the base branch is set to dev when clicking "Compare & Pull Request".

🤖 Prompt for AI Agents
In CONTRIBUTING.md at line 100, reintroduce the instruction reminding
contributors to ensure their pull requests target the `dev` branch. Add a clear
note or step near the "Click 'Compare & Pull Request'" instruction that
explicitly tells users to verify the base branch is set to `dev` before
submitting the PR.

@@ -7,6 +7,7 @@
from .get_datasets_by_name import get_datasets_by_name
from .get_dataset_data import get_dataset_data
from .get_data import get_data
from .get_unique_dataset_id import get_unique_dataset_id

🛠️ Refactor suggestion

Address unused import in __init__.py.

The import of get_unique_dataset_id is never referenced within this file, causing an F401 lint error. Since we want to expose it as part of the public API, add a same-line # noqa: F401 or define an __all__ list.

For example, to suppress the warning:

-from .get_unique_dataset_id import get_unique_dataset_id
+from .get_unique_dataset_id import get_unique_dataset_id  # noqa: F401
📝 Committable suggestion


Suggested change
from .get_unique_dataset_id import get_unique_dataset_id

from .get_unique_dataset_id import get_unique_dataset_id  # noqa: F401
🧰 Tools
🪛 Ruff (0.8.2)

10-10: .get_unique_dataset_id.get_unique_dataset_id imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

🤖 Prompt for AI Agents
In cognee/modules/data/methods/__init__.py at line 10, the import of
get_unique_dataset_id is unused and triggers a F401 lint error. To fix this,
either add a comment # noqa: F401 at the end of the import line to suppress the
warning or define an __all__ list including 'get_unique_dataset_id' to
explicitly declare it as part of the public API.

Comment on lines +5 to +6
async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")

🛠️ Refactor suggestion

Unnecessary async declaration.

The function is declared as async but doesn't perform any asynchronous operations. Consider removing the async keyword unless you plan to add asynchronous operations later.

-async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
+def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")

Additionally, consider adding input validation for empty dataset names and checking that the user and user.id are not None.

-async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
-    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
+async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
+    if not dataset_name:
+        raise ValueError("Dataset name cannot be empty")
+    if not user or not user.id:
+        raise ValueError("User or user ID cannot be None")
+    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
📝 Committable suggestion


Suggested change
async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")

Suggested change
async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    if not dataset_name:
        raise ValueError("Dataset name cannot be empty")
    if not user or not user.id:
        raise ValueError("User or user ID cannot be None")
    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
🤖 Prompt for AI Agents
In cognee/modules/data/methods/get_unique_dataset_id.py at lines 5 to 6, remove
the async keyword from the function declaration since it does not perform any
asynchronous operations. Additionally, add input validation to check that
dataset_name is not empty and that user and user.id are not None, raising
appropriate exceptions if these conditions are not met.

Comment on lines +5 to +11
def get_observe():
    monitoring = get_base_config().monitoring_tool

    if monitoring == Observer.LANGFUSE:
        from langfuse.decorators import observe

        return observe

⚠️ Potential issue

Missing handling for other Observer types

The function only handles the case when monitoring == Observer.LANGFUSE, but doesn't handle the other enum values (LLMLITE, LANGSMITH) defined in the Observer enum. There's also no fallback implementation or error handling if an unsupported monitoring tool is configured.

I recommend adding handling for other observer types or at least a fallback:

def get_observe():
    monitoring = get_base_config().monitoring_tool

    if monitoring == Observer.LANGFUSE:
        from langfuse.decorators import observe
        return observe
+    elif monitoring == Observer.LLMLITE:
+        # Import and return LLMLITE observe decorator
+        pass
+    elif monitoring == Observer.LANGSMITH:
+        # Import and return LANGSMITH observe decorator
+        pass
+    else:
+        # Provide a fallback or raise a meaningful error
+        raise ValueError(f"Unsupported monitoring tool: {monitoring}")

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cognee/modules/observability/get_observe.py around lines 5 to 11, the
get_observe function only handles the case when monitoring equals
Observer.LANGFUSE and lacks handling for other Observer enum values like LLMLITE
and LANGSMITH. To fix this, add conditional branches to handle these other
observer types appropriately, or provide a fallback implementation or raise a
clear error if the monitoring tool is unsupported, ensuring all enum cases are
covered.

Comment on lines +63 to +69
const onCognify = useCallback(() => {
  const dataset = datasets.find((dataset) => dataset.id === selectedDataset);
  return onDatasetCognify({
    id: dataset!.id,
    name: dataset!.name,
  });
}, [datasets, onDatasetCognify, selectedDataset]);

🛠️ Refactor suggestion

Consider adding null check before accessing dataset properties.

The function doesn't validate that a dataset was found before accessing its properties with non-null assertions (!). This could cause runtime errors if selectedDataset doesn't match any dataset in the array.

  const onCognify = useCallback(() => {
    const dataset = datasets.find((dataset) => dataset.id === selectedDataset);
+   if (!dataset) {
+     showNotification("No dataset selected. Please select a dataset first.", 5000);
+     return Promise.reject(new Error("No dataset selected"));
+   }
    return onDatasetCognify({
-     id: dataset!.id,
-     name: dataset!.name,
+     id: dataset.id,
+     name: dataset.name,
    });
  }, [datasets, onDatasetCognify, selectedDataset]);
🤖 Prompt for AI Agents
In cognee-frontend/src/app/page.tsx around lines 63 to 69, the onCognify
function uses non-null assertions on dataset properties without checking if
dataset is null or undefined. To fix this, add a null check after finding the
dataset to ensure it exists before accessing its id and name. If dataset is not
found, handle the case appropriately, such as returning early or showing an
error, to prevent runtime errors.

Comment on lines +82 to +88
query = """
UNWIND $nodes AS node
MERGE (n {id: node.node_id})
ON CREATE SET n:node.label, n += node.properties, n.updated_at = timestamp()
ON MATCH SET n:node.label, n += node.properties, n.updated_at = timestamp()
RETURN ID(n) AS internal_id, n.id AS nodeId
"""

⚠️ Potential issue

Dynamic labels cannot be parameterised – Cypher syntax error

n:node.label attempts to use a parameter for the label, which Cypher/Memgraph doesn’t allow. The query will raise a syntax error.

-MERGE (n {id: node.node_id})
-ON CREATE SET n:node.label, n += node.properties, n.updated_at = timestamp()
-ON MATCH SET n:node.label, n += node.properties, n.updated_at = timestamp()
+MERGE (n {id: node.node_id})
+WITH n, node
+CALL apoc.create.addLabels(n, [node.label]) YIELD node AS n2   // requires APOC; otherwise build separate queries per label
+SET n2 += node.properties,
+    n2.updated_at = timestamp()

Alternatively, group nodes by label and execute one query per label.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 82 to 88, the Cypher query incorrectly uses a parameter for a node label
with syntax like n:node.label, which is not allowed. To fix this, refactor the
code to group nodes by their label and run separate queries for each label,
embedding the label directly in the query string instead of using a parameter.
This avoids dynamic label parameterization errors by executing one query per
label with the label hardcoded in the Cypher MERGE statement.
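
A minimal sketch of that per-label approach, assuming the adapter exposes a query coroutine as used elsewhere in this file (the method name and payload shape here are illustrative):

from collections import defaultdict


async def add_nodes(self, nodes: list[dict]):
    # Group nodes by label so each label can be embedded literally in its
    # own Cypher statement (labels cannot be query parameters).
    nodes_by_label = defaultdict(list)
    for node in nodes:
        nodes_by_label[node["label"]].append(node)

    results = []
    for label, batch in nodes_by_label.items():
        # Validate `label` upstream; backticks only guard odd characters,
        # they do not make string interpolation safe on their own.
        query = f"""
        UNWIND $nodes AS node
        MERGE (n:`{label}` {{id: node.node_id}})
        ON CREATE SET n += node.properties, n.updated_at = timestamp()
        ON MATCH SET n += node.properties, n.updated_at = timestamp()
        RETURN ID(n) AS internal_id, n.id AS nodeId
        """
        results.extend(await self.query(query, {"nodes": batch}))
    return results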

Comment on lines +380 to +394
predecessors, successors = await asyncio.gather(
    self.query(predecessors_query, dict(node_id=str(node_id))),
    self.query(successors_query, dict(node_id=str(node_id))),
)

connections = []

for neighbour in predecessors:
    neighbour = neighbour["relation"]
    connections.append((neighbour[0], {"relationship_name": neighbour[1]}, neighbour[2]))

for neighbour in successors:
    neighbour = neighbour["relation"]
    connections.append((neighbour[0], {"relationship_name": neighbour[1]}, neighbour[2]))


⚠️ Potential issue

get_connections builds invalid tuples – relation is not subscriptable

neighbour = neighbour["relation"] returns a neo4j.graph.Relationship object; indexing it (neighbour[0]) will raise TypeError. Consider extracting properties explicitly:

-    for neighbour in predecessors:
-        neighbour = neighbour["relation"]
-        connections.append((neighbour[0], {"relationship_name": neighbour[1]}, neighbour[2]))
+    for rec in predecessors:
+        rel = rec["relation"]
+        connections.append((rel.start_node["id"],
+                            {"relationship_name": rel.type},
+                            rel.end_node["id"]))

Also remove the shadowing of the neighbour loop variable.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 380 to 394, the code incorrectly treats the 'relation' object as
subscriptable, causing a TypeError. To fix this, avoid indexing the 'relation'
object directly; instead, explicitly extract the start node ID, relationship
type, and end node ID using the appropriate properties or methods of the
Relationship object. Also, rename the inner loop variable to prevent shadowing
the outer 'neighbour' variable.
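
Put together, a sketch of the corrected block, assuming the driver returns Relationship objects (with start_node, end_node, and type attributes) under the relation key:

predecessors, successors = await asyncio.gather(
    self.query(predecessors_query, dict(node_id=str(node_id))),
    self.query(successors_query, dict(node_id=str(node_id))),
)

connections = []

# Distinct names (record/rel) avoid re-binding the loop variable, and the
# Relationship's attributes are read explicitly instead of by index.
for record in [*predecessors, *successors]:
    rel = record["relation"]
    connections.append(
        (rel.start_node["id"], {"relationship_name": rel.type}, rel.end_node["id"])
    )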

Comment on lines +120 to +124
sanitized_id = node_id.replace(":", "_")

query = "MATCH (node: {{id: $node_id}}) DETACH DELETE node"
params = {"node_id": sanitized_id}


⚠️ Potential issue

delete_node Cypher syntax error (: before {)

MATCH (node: {{id: $node_id}}) results in MATCH (node: {id: $node_id}), which is invalid. The colon introduces a label, not properties. Replace with:

-query = "MATCH (node: {{id: $node_id}}) DETACH DELETE node"
+query = "MATCH (node {id: $node_id}) DETACH DELETE node"
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 120 to 124, the Cypher query syntax is incorrect because it uses a colon
before a property map, which is invalid. Replace the query string to remove the
colon and use parentheses with a property map directly, like MATCH (node {id:
$node_id}), to correctly match nodes by property.

Comment on lines +415 to +418
UNWIND $node_ids AS id
MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)
DELETE r;
"""

⚠️ Potential issue

Incorrect f-string placeholders introduce labels named after IDs

MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)

produces labels like :123e4567-e89b... which is almost certainly unintended. Use a property match instead:

-MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)
+MATCH (node {id: id})<-[r:`{edge_label}`]-(successor)

(here id is the alias bound by the UNWIND clause above; inside an f-string the property braces must be doubled, i.e. {{id: id}})

(again, consider validating edge_label and using parameterised Cypher).

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 415 to 418, the Cypher query incorrectly uses f-string placeholders to
insert node IDs as labels, resulting in unintended labels like
:123e4567-e89b.... Replace the label match with a property match on the node,
such as matching nodes by an ID property instead of using the ID as a label.
Also, validate the edge_label input and use parameterized Cypher queries to
prevent injection risks.
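
A minimal sketch of the validation idea, assuming labels are restricted to word characters (the whitelist pattern is an assumption, not project policy):

import re

EDGE_LABEL_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_delete_edges_query(edge_label: str) -> str:
    # Relationship types cannot be passed as Cypher parameters, so the
    # string must be vetted before it is interpolated into the query text.
    if not EDGE_LABEL_PATTERN.match(edge_label):
        raise ValueError(f"Invalid edge label: {edge_label!r}")
    return f"""
    UNWIND $node_ids AS id
    MATCH (node {{id: id}})<-[r:`{edge_label}`]-(successor)
    DELETE r
    """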

Comment on lines +430 to +444
def serialize_properties(self, properties=dict()):
    serialized_properties = {}

    for property_key, property_value in properties.items():
        if isinstance(property_value, UUID):
            serialized_properties[property_key] = str(property_value)
            continue

        if isinstance(property_value, dict):
            serialized_properties[property_key] = json.dumps(property_value, cls=JSONEncoder)
            continue

        serialized_properties[property_key] = property_value

    return serialized_properties

🛠️ Refactor suggestion

Mutable default argument violates best practices

serialize_properties(self, properties=dict()) uses a mutable default; callers mutating the dict will affect subsequent calls.

-    def serialize_properties(self, properties=dict()):
+    def serialize_properties(self, properties: Optional[dict] = None):
+        if properties is None:
+            properties = {}
🧰 Tools
🪛 Ruff (0.8.2)

430-430: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 430 to 444, the method serialize_properties uses a mutable default
argument (properties=dict()), which can lead to unexpected behavior if the
dictionary is modified. Change the default value to None and inside the method
initialize properties to an empty dictionary if it is None before proceeding.
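
For illustration, a tiny standalone example of why the mutable default is risky (generic Python, not cognee code):

def append_item(item, bucket=[]):
    # The SAME list object is reused across every call that omits `bucket`.
    bucket.append(item)
    return bucket


print(append_item(1))  # [1]
print(append_item(2))  # [1, 2]  <- state leaked from the first call


def append_item_fixed(item, bucket=None):
    if bucket is None:
        bucket = []  # a fresh list on each call
    bucket.append(item)
    return bucket


print(append_item_fixed(1))  # [1]
print(append_item_fixed(2))  # [2]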

@Vasilije1990 Vasilije1990 self-requested a review May 15, 2025 11:15
@Vasilije1990 Vasilije1990 merged commit 0aac93e into main May 15, 2025
59 of 94 checks passed
@coderabbitai coderabbitai bot mentioned this pull request May 16, 2025