
Conversation

borisarzentar
Member

Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

@borisarzentar borisarzentar self-assigned this May 12, 2025

pull-checklist bot commented May 12, 2025

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

Contributor

coderabbitai bot commented May 12, 2025

Walkthrough

This update introduces a new Memgraph graph database adapter, unifies collection error handling across vector database adapters, and propagates an optional context parameter through pipeline task execution. It removes deprecated methods and error classes, updates documentation and test scripts, and refactors observability configuration for monitoring tools. Several files are added, deleted, or refactored for consistency and robustness.

Changes

Files/Groups and their change summaries:

  • Dockerfile: Installs build-essential, updates Poetry install command to include dev dependencies.
  • README.md, community/README.zh.md: Adds language translation links, updates Colab link, introduces cognee UI section; adds Chinese README.
  • assets/graph_visualization.html: Removes interactive D3.js graph visualization HTML asset.
  • cognee-mcp/src/server.py: Refactors server to use decorator-based tool registration, async background tasks, argparse for CLI; removes central dispatcher.
  • cognee/api/v1/cognify/code_graph_pipeline.py, cognee/infrastructure/llm/gemini/adapter.py, cognee/infrastructure/llm/openai/adapter.py: Refactors observability setup to use get_observe(); removes conditional imports and monitoring tool checks.
  • cognee/base_config.py, cognee/shared/data_models.py, cognee/modules/observability/observers.py, cognee/modules/observability/get_observe.py: Replaces MonitoringTool enum with new Observer enum; centralizes observability decorator retrieval.
  • cognee/exceptions/exceptions.py, cognee/infrastructure/databases/vector/exceptions/exceptions.py: Adds logging control to exception constructors; corrects error names and logging parameters.
  • cognee/infrastructure/databases/graph/get_graph_engine.py, cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py: Adds Memgraph graph database support with new async adapter; updates factory to handle Memgraph credentials.
  • cognee/infrastructure/databases/graph/graph_db_interface.py: Lowers error logging level from error to debug in relationship/session handling.
  • cognee/infrastructure/databases/graph/networkx/adapter.py: Updates method signatures to use UUID instead of str for node IDs.
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py: Restricts schema deletion to public and public_staging schemas only.
  • cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py, cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py, cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py, cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py, cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py, cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py: Centralizes collection retrieval, improves error handling for missing collections, removes deprecated methods, adjusts default search limits, and ensures async consistency.
  • cognee/modules/graph/cognee_graph/CogneeGraph.py: Updates vector distance mapping to use unified search method with explicit keyword arguments.
  • cognee/modules/pipelines/operations/run_tasks.py, cognee/modules/pipelines/operations/run_tasks_base.py: Adds optional context parameter to task execution pipeline; propagates it through all relevant functions.
  • cognee/modules/retrieval/exceptions/__init__.py, cognee/modules/retrieval/exceptions/exceptions.py: Removes unused CollectionDistancesNotFoundError exception.
  • cognee/modules/retrieval/graph_completion_retriever.py, cognee/modules/retrieval/utils/brute_force_triplet_search.py: Refactors error handling to suppress or return empty results on missing entities/collections.
  • cognee/shared/logging_utils.py: Changes SQLAlchemy warning suppression to apply for log levels above DEBUG.
  • cognee/tests/integration/run_toy_tasks/conftest.py, cognee/tests/unit/modules/retrieval/utils/brute_force_triplet_search_test.py: Removes unused test fixtures and unit tests related to deprecated error handling.
  • cognee/tests/test_memgraph.py: Adds new async integration test for Memgraph adapter, including data setup, search, and cleanup.
  • cognee/tests/test_neo4j.py: Comments out natural language search test, updates search history assertion.
  • cognee/tests/test_weaviate.py: Awaits async collection listing after pruning.
  • cognee/tests/unit/modules/pipelines/run_tasks_test.py: Adds script entry point for running test directly.
  • cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py: Adds new async test for pipeline execution with context propagation.
  • cognee/tests/unit/modules/retrieval/chunks_retriever_test.py, cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py, cognee/tests/unit/modules/retrieval/summaries_retriever_test.py: Refactors test execution blocks for async consistency and updates parameter names.
  • examples/data/car_and_tech_companies.txt: Adds text data describing car and tech companies for use in examples/tests.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Pipeline
    participant Task
    participant Context

    User->>Pipeline: run_tasks(tasks, data, user, context)
    Pipeline->>Task: handle_task(task, args, ..., context)
    Task->>Task: executable(..., context) if accepted
    Task-->>Pipeline: result
    Pipeline-->>User: final output
sequenceDiagram
    participant App
    participant GraphEngineFactory
    participant MemgraphAdapter

    App->>GraphEngineFactory: create_graph_engine(provider="memgraph", ...)
    GraphEngineFactory->>MemgraphAdapter: __init__(...)
    MemgraphAdapter-->>GraphEngineFactory: instance
    GraphEngineFactory-->>App: MemgraphAdapter instance

Possibly related PRs

  • topoteretes/cognee#798: Also modifies the Dockerfile to install build-essential and changes Poetry install flags, matching the Dockerfile changes in this PR.
  • topoteretes/cognee#788: Adds a context parameter to pipeline task execution modules, similar to the context propagation changes in this PR.
  • topoteretes/cognee#145: Updates the README.md Colab link; related to the documentation updates in this PR.

Suggested labels

run-checks

Suggested reviewers

  • dexters1

Poem

A rabbit hops through code so bright,
Adapters bloom for Memgraph’s might.
Context now flows through each task,
Collections checked—no need to ask!
Docs in Chinese, tests anew,
Pipelines run with context too.
🐇✨ Change is sweet—Cognee grew!


@borisarzentar borisarzentar changed the base branch from main to dev May 12, 2025 16:51
@borisarzentar borisarzentar requested a review from soobrosa May 12, 2025 16:52
Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently, it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 12

🔭 Outside diff range comments (3)
cognee/exceptions/exceptions.py (1)

34-54: 🛠️ Refactor suggestion

Update subclasses to support new logging parameters

The subclasses like ServiceError and InvalidValueError don't accept or pass the new log and log_level parameters to the superclass constructor. They will always use default values regardless of what callers provide.

 class ServiceError(CogneeApiError):
     """Failures in external services or APIs, like a database or a third-party service"""
 
     def __init__(
         self,
         message: str = "Service is unavailable.",
         name: str = "ServiceError",
         status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
+        log=True,
+        log_level="ERROR",
     ):
-        super().__init__(message, name, status_code)
+        super().__init__(message, name, status_code, log, log_level)

Similar updates should be made to InvalidValueError and InvalidAttributeError classes to maintain consistency across all exception types.

cognee/infrastructure/databases/graph/networkx/adapter.py (1)

139-143: 🛠️ Refactor suggestion

ID type inconsistencies ripple through multiple APIs

The following methods now demand UUID while other parts of the adapter (e.g. has_edge, add_edge) still accept str:

  • get_edges
  • extract_node / extract_nodes
  • get_neighbors
  • get_node / get_nodes

In addition, get_disconnected_nodes (165-175) promises List[str] but will return UUID objects.

Uneven typing will confuse callers, especially when IDs originate from external sources such as REST/JSON where they arrive as strings.

Recommendation:

-async def get_edges(self, node_id: UUID):
+async def get_edges(self, node_id: str | UUID):
     node_id = UUID(node_id) if isinstance(node_id, str) else node_id

…and apply the same coercion pattern everywhere, or standardise once in the interface.

Also applies to: 177-185, 624-637

cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (1)

151-165: 🛠️ Refactor suggestion

retrieve returns raw Weaviate objects – breaks adapter contract

Other adapters return a List[ScoredResult]. Returning driver-specific objects
forces callers to add special-case handling.

Either map to ScoredResult, or update the interface & downstream code.
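
A minimal sketch of such a mapping, assuming the v4 client objects expose uuid and properties and that ScoredResult takes id, payload, and score; the dataclass below is only a stand-in for the real cognee class:

from dataclasses import dataclass
from typing import Any, Dict, List
from uuid import UUID


@dataclass
class ScoredResult:
    """Stand-in for cognee's ScoredResult; in the adapter, import the real class instead."""

    id: UUID
    payload: Dict[str, Any]
    score: float = 0.0


def to_scored_results(weaviate_objects) -> List[ScoredResult]:
    """Map raw Weaviate objects onto the adapter-wide ScoredResult shape."""
    return [
        ScoredResult(
            id=obj.uuid,             # Weaviate object UUID
            payload=obj.properties,  # stored properties become the payload
            score=0.0,               # retrieve() carries no distance, so use a neutral score
        )
        for obj in weaviate_objects
    ]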

🧹 Nitpick comments (16)
examples/data/car_and_tech_companies.txt (1)

20-37: Streamline text_2 entries and enhance readability.

Similarly, remove repetitive headings and tighten phrasing for better flow. Consider replacing verbose constructions with more concise alternatives.

text_2 = """
-1. Apple
-Apple is renowned for its innovative consumer electronics and software. Its product lineup includes the iPhone, iPad, Mac computers, and wearables like the Apple Watch. Known for its emphasis on sleek design and user-friendly interfaces, Apple has built a loyal customer base and created a seamless ecosystem that integrates hardware, software, and services.
+1. Apple is renowned for its innovative consumer electronics and software. Its product lineup includes the iPhone, iPad, Mac computers, and wearables like the Apple Watch. Known for its emphasis on sleek design and user-friendly interfaces, Apple has built a loyal customer base and created a seamless ecosystem that integrates hardware, software, and services.
 
-2. Google
-Founded in 1998, Google started as a search engine and quickly became the go-to resource for finding information online. Over the years, the company has diversified its offerings to include digital advertising, cloud computing, mobile operating systems (Android), and various web services like Gmail and Google Maps. Google's innovations have played a major role in shaping the internet landscape.
+2. Google started as a search engine in 1998 and quickly became the go-to resource for finding information online. Over the years, the company has diversified its offerings to include digital advertising, cloud computing, mobile operating systems (Android), and various web services like Gmail and Google Maps. Google's innovations have played a major role in shaping the internet landscape.
 
-3. Microsoft
-Microsoft Corporation has been a dominant force in software for decades. In recent years, Microsoft has expanded into cloud computing with Azure, gaming with the Xbox platform, and even hardware through products like the Surface line. This evolution has helped the company maintain its relevance in a rapidly changing tech world.
+3. Microsoft Corporation has been a dominant force in software for decades. Recently, Microsoft has expanded into cloud computing with Azure, gaming with the Xbox platform, and even hardware through products like the Surface line. This evolution has helped the company maintain its relevance in a rapidly changing tech world.
 
-4. Amazon
-What began as an online bookstore has grown into one of the largest e-commerce platforms globally. Amazon is known for its vast online marketplace, but its influence extends far beyond retail. With Amazon Web Services (AWS), the company has become a leader in cloud computing, offering robust solutions that power websites, applications, and businesses around the world. Amazon's constant drive for innovation continues to reshape both retail and technology sectors.
+4. Amazon has grown from an online bookstore into one of the largest e-commerce platforms globally. Known for its vast online marketplace, Amazon's influence extends far beyond retail. With Amazon Web Services (AWS), the company is a leader in cloud computing, offering robust solutions that power websites, applications, and businesses worldwide. Amazon's constant drive for innovation continues to reshape both retail and technology sectors.
 
-5. Meta
-Meta, originally known as Facebook, revolutionized social media by connecting billions of people worldwide. Beyond its core social networking service, Meta is investing in the next generation of digital experiences through virtual and augmented reality technologies, with projects like Oculus. The company's efforts signal a commitment to evolving digital interaction and building the metaverse—a shared virtual space where users can connect and collaborate.
+5. Meta, originally known as Facebook, revolutionized social media by connecting billions of people worldwide. Beyond its core social networking service, Meta is investing in the next generation of digital experiences through virtual and augmented reality technologies, with projects like Oculus. The company's efforts signal a commitment to evolving digital interaction and building the metaverse—a shared virtual space where users can connect and collaborate.
"""
🧰 Tools
🪛 LanguageTool

[duplication] ~21-~21: Possible typo: you repeated a word.
Context: ...design excellence. """ text_2 = """ 1. Apple Apple is renowned for its innovative consumer...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~27-~27: Possible typo: you repeated a word.
Context: ... in shaping the internet landscape. 3. Microsoft Microsoft Corporation has been a dominant force i...

(ENGLISH_WORD_REPEAT_RULE)


[style] ~28-~28: Consider using a synonym to be more concise.
Context: ...n both business and personal computing. In recent years, Microsoft has expanded into cloud comp...

(IN_RECENT_STYLE)


[uncategorized] ~31-~31: You might be missing the article “the” here.
Context: ...or innovation continues to reshape both retail and technology sectors. 5. Meta Meta, ...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[duplication] ~33-~33: Possible typo: you repeated a word.
Context: ...both retail and technology sectors. 5. Meta Meta, originally known as Facebook, revoluti...

(ENGLISH_WORD_REPEAT_RULE)

cognee/modules/retrieval/exceptions/__init__.py (1)

7-7: Address unused imports flagged by static analysis.

The imports SearchTypeNotSupported and CypherSearchError are flagged as unused in this module. Since this is an __init__.py file, these are likely intended to be re-exported at the package level.

To properly re-export these exceptions and silence the static analysis warnings, add an __all__ declaration:

-from .exceptions import SearchTypeNotSupported, CypherSearchError
+from .exceptions import SearchTypeNotSupported, CypherSearchError
+
+__all__ = ["SearchTypeNotSupported", "CypherSearchError"]
🧰 Tools
🪛 Ruff (0.8.2)

7-7: .exceptions.SearchTypeNotSupported imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


7-7: .exceptions.CypherSearchError imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (2)

14-21: Add comment explaining task signature differences

The test effectively validates context propagation, but task_2 has a different signature from the other tasks. Consider adding a comment to clarify that this is intentional, demonstrating that tasks can optionally accept or ignore the context parameter.

 def task_1(num, context):
     return num + context

+# Intentionally omitting context parameter to test tasks that don't need context
 def task_2(num):
     return num * 2

 def task_3(num, context):
     return num**context

37-39: Add validation comments for clarity

The test assertions would be more informative with comments explaining how the expected value was calculated.

-    final_result = 4586471424
+    # Calculate expected result: 5 + 7 = 12, 12 * 2 = 24, 24^7 = 4,586,471,424
+    final_result = 4586471424
     async for result in pipeline:
         assert result == final_result
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)

66-73: Refactor using contextlib.suppress for cleaner code

The try-except-pass pattern can be simplified using Python's contextlib module.

-try:
-    await memory_fragment.project_graph_from_db(
-        graph_engine,
-        node_properties_to_project=properties_to_project,
-        edge_properties_to_project=["relationship_name"],
-    )
-except EntityNotFoundError:
-    pass
+from contextlib import suppress
+
+with suppress(EntityNotFoundError):
+    await memory_fragment.project_graph_from_db(
+        graph_engine,
+        node_properties_to_project=properties_to_project,
+        edge_properties_to_project=["relationship_name"],
+    )
🧰 Tools
🪛 Ruff (0.8.2)

66-73: Use contextlib.suppress(EntityNotFoundError) instead of try-except-pass

Replace with contextlib.suppress(EntityNotFoundError)

(SIM105)

cognee/modules/pipelines/operations/run_tasks_base.py (1)

31-34: Optimize dict key checking

The key lookup can be simplified for better performance.

-has_context = any(
-    [key == "context" for key in inspect.signature(running_task.executable).parameters.keys()]
-)
+has_context = "context" in inspect.signature(running_task.executable).parameters
🧰 Tools
🪛 Ruff (0.8.2)

32-32: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (1)

179-180: Limit handling in the Qdrant search call

A negative or zero limit passed straight to the Qdrant client could lead to unexpected behavior. Line 179 already guards against this:

                limit=limit if limit > 0 else None,

The code passes the limit only when it is greater than zero and None otherwise, which is the right approach, so no change is needed here.

cognee-mcp/src/server.py (1)

170-191: Graceful shutdown signal missing

If mcp.run_stdio_async()/run_sse_async() are cancelled (e.g. SIGINT), the background asyncio.create_task jobs keep running and the process may exit abruptly, losing work.
Consider attaching them to a supervisor task or using asyncio.TaskGroup / add_done_callback to capture and flush logs before exit.
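
One possible shape for that supervision, sketched with hypothetical helper names (spawn, run_server) rather than the actual functions in server.py:

import asyncio
from typing import Awaitable, Coroutine

background_tasks: set[asyncio.Task] = set()


def spawn(coro: Coroutine) -> asyncio.Task:
    """Create a background task and keep a reference so it can be flushed on shutdown."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return task


async def run_server(server_coro: Awaitable) -> None:
    """Run the MCP transport and wait for background jobs before the process exits."""
    try:
        await server_coro  # e.g. mcp.run_stdio_async() or mcp.run_sse_async()
    finally:
        if background_tasks:
            # Surface errors from in-flight cognify/codify jobs instead of losing them.
            await asyncio.gather(*background_tasks, return_exceptions=True)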

cognee/infrastructure/databases/graph/networkx/adapter.py (1)

268-285: Edge-removal loops are O(N×E) and synchronous

For large node_ids this double-nested loop becomes costly.
Consider using NetworkX’s vectorised helpers (e.g. edges(node_id, data=True) + list comprehension) or batching removals, then committing a single save_graph_to_file at the end.
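
A hedged sketch of one batched alternative; it assumes the adapter keeps a NetworkX MultiDiGraph in self.graph and persists via save_graph_to_file, as the surrounding code suggests:

import networkx as nx
from typing import List
from uuid import UUID


def remove_incident_edges(graph: nx.MultiDiGraph, node_ids: List[UUID]) -> None:
    """Drop every edge touching the given nodes in one sweep instead of a nested loop."""
    node_set = set(node_ids)
    edges_to_remove = [
        (source, target, key)
        for source, target, key in graph.edges(keys=True)
        if source in node_set or target in node_set
    ]
    graph.remove_edges_from(edges_to_remove)

# Usage inside the adapter would look roughly like:
#   remove_incident_edges(self.graph, node_ids)
#   await self.save_graph_to_file(self.filename)   # single persist at the end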

cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (2)

78-84: get_collection re-checks existence but opens the table anyway

The second call to await connection.open_table(collection_name) can still fail if the collection was deleted between the two operations (TOCTOU window).
To make this atomic, try to open and catch the provider-specific “not found” error instead of issuing two separate RPCs.
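
A sketch of the atomic variant; the get_connection() helper, the import path, and the caught exception types are assumptions and should be matched against the actual adapter and LanceDB behaviour:

# Import path is an assumption based on the exceptions module touched in this PR.
from cognee.infrastructure.databases.vector.exceptions.exceptions import CollectionNotFoundError


async def get_collection(self, collection_name: str):
    """Open the table in one call; a missing table surfaces as CollectionNotFoundError."""
    connection = await self.get_connection()
    try:
        # Single operation instead of exists-check + open, closing the TOCTOU window.
        return await connection.open_table(collection_name)
    except (FileNotFoundError, ValueError) as error:
        # The exception types LanceDB raises for a missing table are a guess here;
        # narrow this tuple to whatever open_table actually raises in practice.
        raise CollectionNotFoundError(f"Collection '{collection_name}' not found!") from error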


208-213: Serial deletion can be slow – leverage a bulk filter

Deleting each ID one-by-one incurs N round-trips.
LanceDB supports filtering; a single call is faster and transactional:

-        for data_point_id in data_point_ids:
-            await collection.delete(f"id = '{data_point_id}'")
+        ids_tuple = tuple(data_point_ids)
+        await collection.delete(f"id IN {ids_tuple}")
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2)

239-241: Catching Exception broadly obscures real problems

Swallowing every exception logs the message but returns an empty list – masking functional bugs, mis-configurations, or network issues. Restrict the except clause to the concrete errors you expect (e.g. CollectionNotFoundError, chromadb.errors.ChromaError) and let the rest propagate.
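
A sketch of the narrower handling, with a hypothetical parameter list and import path; only the explicit "collection missing" case maps to an empty result:

# Import path is an assumption; use wherever CollectionNotFoundError actually lives.
from cognee.infrastructure.databases.vector.exceptions.exceptions import CollectionNotFoundError


async def search(self, collection_name: str, query_text: str, limit: int = 15):
    try:
        collection = await self.get_collection(collection_name)
    except CollectionNotFoundError:
        # A genuinely missing collection is the one case where "no results" is correct.
        return []

    # Anything else (bad config, network failures, Chroma errors) propagates to the caller
    # instead of being silently converted into an empty result list.
    ...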


243-249: Search-limit defaults are now inconsistent (15 vs 5)

search() defaults to limit=15, but batch_search() still defaults to 5. For a predictable API keep them aligned:

-        limit: int = 5,
+        limit: int = 15,
cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1)

430-444: Mutable default argument (dict()) in serialize_properties

Using a mutable default causes shared state between calls.

-    def serialize_properties(self, properties=dict()):
+    def serialize_properties(self, properties: dict | None = None):
     ...
-        for property_key, property_value in properties.items():
+        for property_key, property_value in (properties or {}).items():
🧰 Tools
🪛 Ruff (0.8.2)

430-430: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (2)

89-100: list.index() inside loop gives O(n²) behaviour

convert_to_weaviate_data_points calls data_points.index(...) for each item,
turning an otherwise linear transformation into quadratic time for large
batches.

-        def convert_to_weaviate_data_points(data_point: DataPoint):
-            vector = data_vectors[data_points.index(data_point)]
+        def convert_to_weaviate_data_points(idx: int, data_point: DataPoint):
+            vector = data_vectors[idx]
 ...
-data_points = [convert_to_weaviate_data_points(dp) for dp in data_points]
+data_points = [
+    convert_to_weaviate_data_points(i, dp) for i, dp in enumerate(data_points)
+]

43-46: Repeated await client.connect() on every call is wasteful

use_async_with_weaviate_cloud keeps its own connection pool. Calling connect
each time adds latency and may exhaust sockets. Connect once in __init__() or
guard with an internal flag.

🛑 Comments failed to post (12)
examples/data/car_and_tech_companies.txt (1)

1-18: 🛠️ Refactor suggestion

Remove redundant company name headings and correct pluralization in text_1.

Each bullet entry repeats the company name on the next line, which is redundant. Consolidating the name and description into one line improves readability. Also fix the subject-verb agreement in the closing sentence.

text_1 = """
-1. Audi
-Audi is known for its modern designs and advanced technology. Founded in the early 1900s, the brand has earned a reputation for precision engineering and innovation. With features like the Quattro all-wheel-drive system, Audi offers a range of vehicles from stylish sedans to high-performance sports cars.
+1. Audi is known for its modern designs and advanced technology. Founded in the early 1900s, the brand has earned a reputation for precision engineering and innovation. With features like the Quattro all-wheel-drive system, Audi offers a range of vehicles from stylish sedans to high-performance sports cars.
 
-2. BMW
-BMW, short for Bayerische Motoren Werke, is celebrated for its focus on performance and driving pleasure. The company's vehicles are designed to provide a dynamic and engaging driving experience, and their slogan, "The Ultimate Driving Machine," reflects that commitment. BMW produces a variety of cars that combine luxury with sporty performance.
+2. BMW, short for Bayerische Motoren Werke, is celebrated for its focus on performance and driving pleasure. The company's vehicles are designed to provide a dynamic and engaging driving experience, and their slogan, "The Ultimate Driving Machine," reflects that commitment. BMW produces a variety of cars that combine luxury with sporty performance.
 
-3. Mercedes-Benz
-Mercedes-Benz is synonymous with luxury and quality. With a history dating back to the early 20th century, the brand is known for its elegant designs, innovative safety features, and high-quality engineering. Mercedes-Benz manufactures not only luxury sedans but also SUVs, sports cars, and commercial vehicles, catering to a wide range of needs.
+3. Mercedes-Benz is synonymous with luxury and quality. With a history dating back to the early 20th century, the brand is known for its elegant designs, innovative safety features, and high-quality engineering. Mercedes-Benz manufactures not only luxury sedans but also SUVs, sports cars, and commercial vehicles, catering to a wide range of needs.
 
-4. Porsche
-Porsche is a name that stands for high-performance sports cars. Founded in 1931, the brand has become famous for models like the iconic Porsche 911. Porsche cars are celebrated for their speed, precision, and distinctive design, appealing to car enthusiasts who value both performance and style.
+4. Porsche is a name that stands for high-performance sports cars. Founded in 1931, the brand has become famous for models like the iconic Porsche 911. Porsche cars are celebrated for their speed, precision, and distinctive design, appealing to car enthusiasts who value both performance and style.
 
-5. Volkswagen
-Volkswagen, which means "people's car" in German, was established with the idea of making affordable and reliable vehicles accessible to everyone. Over the years, Volkswagen has produced several iconic models, such as the Beetle and the Golf. Today, it remains one of the largest car manufacturers in the world, offering a wide range of vehicles that balance practicality with quality.
+5. Volkswagen, which means "people's car" in German, was established with the idea of making affordable and reliable vehicles accessible to everyone. Over the years, Volkswagen has produced several iconic models, such as the Beetle and the Golf. Today, it remains one of the largest car manufacturers in the world, offering a wide range of vehicles that balance practicality with quality.
 
-Each of these car manufacturer contributes to Germany's reputation as a leader in the global automotive industry, showcasing a blend of innovation, performance, and design excellence.
+Each of these car manufacturers contributes to Germany's reputation as a leader in the global automotive industry, showcasing a blend of innovation, performance, and design excellence.
"""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

text_1 = """
1. Audi is known for its modern designs and advanced technology. Founded in the early 1900s, the brand has earned a reputation for precision engineering and innovation. With features like the Quattro all-wheel-drive system, Audi offers a range of vehicles from stylish sedans to high-performance sports cars.

2. BMW, short for Bayerische Motoren Werke, is celebrated for its focus on performance and driving pleasure. The company's vehicles are designed to provide a dynamic and engaging driving experience, and their slogan, "The Ultimate Driving Machine," reflects that commitment. BMW produces a variety of cars that combine luxury with sporty performance.

3. Mercedes-Benz is synonymous with luxury and quality. With a history dating back to the early 20th century, the brand is known for its elegant designs, innovative safety features, and high-quality engineering. Mercedes-Benz manufactures not only luxury sedans but also SUVs, sports cars, and commercial vehicles, catering to a wide range of needs.

4. Porsche is a name that stands for high-performance sports cars. Founded in 1931, the brand has become famous for models like the iconic Porsche 911. Porsche cars are celebrated for their speed, precision, and distinctive design, appealing to car enthusiasts who value both performance and style.

5. Volkswagen, which means "people's car" in German, was established with the idea of making affordable and reliable vehicles accessible to everyone. Over the years, Volkswagen has produced several iconic models, such as the Beetle and the Golf. Today, it remains one of the largest car manufacturers in the world, offering a wide range of vehicles that balance practicality with quality.

Each of these car manufacturers contributes to Germany's reputation as a leader in the global automotive industry, showcasing a blend of innovation, performance, and design excellence.
"""
🧰 Tools
🪛 LanguageTool

[duplication] ~2-~2: Possible typo: you repeated a word.
Context: text_1 = """ 1. Audi Audi is known for its modern designs and adv...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~5-~5: Possible typo: you repeated a word.
Context: ...ns to high-performance sports cars. 2. BMW BMW, short for Bayerische Motoren Werke, is...

(ENGLISH_WORD_REPEAT_RULE)


[style] ~6-~6: Consider using a more concise synonym.
Context: ... reflects that commitment. BMW produces a variety of cars that combine luxury with sporty pe...

(A_VARIETY_OF)


[duplication] ~8-~8: Possible typo: you repeated a word.
Context: ...ine luxury with sporty performance. 3. Mercedes-Benz Mercedes-Benz is synonymous with luxury and quality. ...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~11-~11: Possible typo: you repeated a word.
Context: ... catering to a wide range of needs. 4. Porsche Porsche is a name that stands for high-performa...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~14-~14: Possible typo: you repeated a word.
Context: ...o value both performance and style. 5. Volkswagen Volkswagen, which means "people's car" in German, ...

(ENGLISH_WORD_REPEAT_RULE)


[grammar] ~17-~17: The plural determiner ‘these’ does not agree with the singular noun ‘car’.
Context: ...nce practicality with quality. Each of these car manufacturer contributes to Germany's r...

(THIS_NNS)


[uncategorized] ~17-~17: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...cality with quality. Each of these car manufacturer contributes to Germany's reputation as ...

(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)

🤖 Prompt for AI Agents (early access)
In examples/data/car_and_tech_companies.txt lines 1 to 18, remove the redundant
company name headings that appear as separate lines before each description by
merging the company name with the first sentence of its description. Also,
correct the pluralization in the last sentence by changing "Each of these car
manufacturer contributes" to "Each of these car manufacturers contributes" to
fix subject-verb agreement.
cognee/modules/observability/get_observe.py (1)

1-11: 💡 Verification agent

🧩 Analysis chain

Well-structured observability decorator provider

This new function correctly centralizes the logic for obtaining the appropriate observability decorator based on the configured monitoring tool. However, there's no default return value if the monitoring tool isn't LANGFUSE.

Consider adding a default return value to handle cases where the monitoring tool isn't LANGFUSE:

def get_observe():
    monitoring = get_base_config().monitoring_tool

    if monitoring == Observer.LANGFUSE:
        from langfuse.decorators import observe

        return observe
+    elif monitoring == Observer.LLMLITE:
+        # Import and return LLMLITE observe decorator
+        pass
+    elif monitoring == Observer.LANGSMITH:
+        # Import and return LANGSMITH observe decorator
+        pass
+    else:
+        # Return a no-op decorator as fallback
+        def noop_observe(*args, **kwargs):
+            def decorator(func):
+                return func
+            return decorator(args[0]) if args and callable(args[0]) else decorator
+        
+        return noop_observe

Let's verify if there are any potential issues with missing the other observer types:


🏁 Script executed:

#!/bin/bash
# Check if the other observer types are being used anywhere in the codebase
echo "Checking for LLMLITE usage:"
rg "Observer\.LLMLITE" -A 2 -B 2

echo "Checking for LANGSMITH usage:"
rg "Observer\.LANGSMITH" -A 2 -B 2

echo "Checking if monitoring_tool is set to values other than LANGFUSE:"
rg "monitoring_tool\s*=\s*" -A 2 -B 2

echo "Finding potential error cases for get_observe():"
rg "get_observe\(\)" -A 5 -B 5

Length of output: 3613


I’d like to inspect the Observer enum to confirm which monitoring types are defined. Then we can decide on the appropriate fallback behavior.


🏁 Script executed:

#!/bin/bash
set -e
# Locate and display the Observer enum definitions
file=$(fd -e py "observers.py" | head -n1)
echo "Found observers file: $file"
sed -n '1,200p' "$file"

Length of output: 446


Add fallback and handle all Observer types in get_observe()

Currently, get_observe() only returns a decorator for Observer.LANGFUSE—for any other value (including LLMLITE and LANGSMITH), it returns None and will break when used as a decorator. We should:

  • Provide a no-op fallback decorator so code doesn’t error out.
  • Stub out branches for the other enum values (to be implemented when those integrations land).

File: cognee/modules/observability/get_observe.py

 def get_observe():
     monitoring = get_base_config().monitoring_tool

     if monitoring == Observer.LANGFUSE:
         from langfuse.decorators import observe
         return observe
+    # TODO: implement LLMLITE observability
+    elif monitoring == Observer.LLMLITE:
+        # from llmlite.decorators import observe
+        # return observe
+        pass
+    # TODO: implement LANGSMITH observability
+    elif monitoring == Observer.LANGSMITH:
+        # from langsmith.decorators import observe
+        # return observe
+        pass
+    # Fallback no-op decorator
+    def noop_observe(*args, **kwargs):
+        def decorator(func):
+            return func
+        return decorator(args[0]) if args and callable(args[0]) else decorator
+    return noop_observe

– Ensures that observe = get_observe() is always a callable decorator
– Leaves clear TODOs for LLMLITE/LANGSMITH support

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from cognee.base_config import get_base_config
from .observers import Observer


def get_observe():
    monitoring = get_base_config().monitoring_tool

    if monitoring == Observer.LANGFUSE:
        from langfuse.decorators import observe
        return observe
    # TODO: implement LLMLITE observability
    elif monitoring == Observer.LLMLITE:
        # from llmlite.decorators import observe
        # return observe
        pass
    # TODO: implement LANGSMITH observability
    elif monitoring == Observer.LANGSMITH:
        # from langsmith.decorators import observe
        # return observe
        pass
    # Fallback no-op decorator
    def noop_observe(*args, **kwargs):
        def decorator(func):
            return func
        return decorator(args[0]) if args and callable(args[0]) else decorator
    return noop_observe
🤖 Prompt for AI Agents (early access)
In cognee/modules/observability/get_observe.py around lines 1 to 11, the
get_observe() function only returns a decorator for Observer.LANGFUSE and
returns None otherwise, which will cause errors when used as a decorator. Fix
this by adding a no-op fallback decorator that simply returns the original
function unchanged, ensuring get_observe() always returns a callable. Also, add
stub branches for Observer.LLMLITE and Observer.LANGSMITH with TODO comments for
future implementation, so all enum values are handled explicitly.
cognee-mcp/src/server.py (2)

44-46: 🛠️ Refactor suggestion

⚠️ Potential issue

Preserve original traceback when re-raising the exception

Re-throwing with a plain ValueError masks the original stack-trace, making debugging harder.
Use exception chaining so the root cause is retained:

-            except Exception as e:
-                logger.error("Cognify process failed.")
-                raise ValueError(f"Failed to cognify: {str(e)}")
+            except Exception as err:
+                logger.error("Cognify process failed.")
+                raise ValueError(f"Failed to cognify: {err}") from err
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

            except Exception as err:
                logger.error("Cognify process failed.")
                raise ValueError(f"Failed to cognify: {err}") from err
🧰 Tools
🪛 Ruff (0.8.2)

46-46: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents (early access)
In cognee-mcp/src/server.py around lines 44 to 46, the current code raises a new
ValueError without preserving the original traceback, which obscures the root
cause. Modify the raise statement to use exception chaining by adding "from e"
after the new exception to retain the original exception context and stack trace
for better debugging.

104-128: ⚠️ Potential issue

search_type is not validated – may explode with KeyError

SearchType[search_type.upper()] will raise a KeyError for any invalid input (e.g. typos or future enum changes).
Fail fast with a friendly message before dereferencing:

-            search_results = await cognee.search(
-                query_type=SearchType[search_type.upper()], query_text=search_query
-            )
+            try:
+                query_type = SearchType[search_type.upper()]
+            except KeyError:
+                raise ValueError(
+                    f"Unsupported search_type '{search_type}'. "
+                    f"Expected one of {[t.name.lower() for t in SearchType]}"
+                )
+
+            search_results = await cognee.search(
+                query_type=query_type, query_text=search_query
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

async def search(search_query: str, search_type: str) -> list:
    async def search_task(search_query: str, search_type: str) -> str:
        """Search the knowledge graph"""
        # NOTE: MCP uses stdout to communicate, we must redirect all output
        #       going to stdout ( like the print function ) to stderr.
        with redirect_stdout(sys.stderr):
-            search_results = await cognee.search(
-                query_type=SearchType[search_type.upper()], query_text=search_query
-            )
+            try:
+                query_type = SearchType[search_type.upper()]
+            except KeyError:
+                raise ValueError(
+                    f"Unsupported search_type '{search_type}'. "
+                    f"Expected one of {[t.name.lower() for t in SearchType]}"
+                )
+
+            search_results = await cognee.search(
+                query_type=query_type, query_text=search_query
+            )

            if search_type.upper() == "CODE":
                return json.dumps(search_results, cls=JSONEncoder)
            elif (
                search_type.upper() == "GRAPH_COMPLETION" or search_type.upper() == "RAG_COMPLETION"
            ):
                return search_results[0]
            elif search_type.upper() == "CHUNKS":
                return str(search_results)
            elif search_type.upper() == "INSIGHTS":
                results = retrieved_edges_to_string(search_results)
                return results
            else:
                return str(search_results)

    search_results = await search_task(search_query, search_type)
🤖 Prompt for AI Agents (early access)
In cognee-mcp/src/server.py around lines 104 to 128, the code uses
SearchType[search_type.upper()] without validating if search_type.upper() is a
valid key, which can cause a KeyError. To fix this, add a validation step before
accessing the enum: check if search_type.upper() is in SearchType.__members__,
and if not, raise a clear and friendly exception or return an error message.
This ensures the function fails fast with a helpful message instead of crashing
unexpectedly.
cognee/infrastructure/databases/graph/networkx/adapter.py (1)

45-47: ⚠️ Potential issue

Public method signatures diverge from the declared interface

GraphDBInterface.has_node (see graph_db_interface.py) expects node_id: str, but the adapter now declares node_id: UUID.
This breaks LSP compliance, static type-checking and any upstream code still sending str IDs.

Either:

  1. Keep str in the signature and convert inside the method, or
  2. Change the interface (and every other adapter) to UUID.

Mixing the two will cause subtle TypeError/AttributeError at runtime.
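
If option 1 is chosen, a sketch of the coercion for has_node, mirroring the pattern suggested above for get_edges (self.graph is assumed to be the adapter's NetworkX graph):

from uuid import UUID


async def has_node(self, node_id: str | UUID) -> bool:
    """Accept both str and UUID so existing callers keep working while types converge."""
    node_id = UUID(node_id) if isinstance(node_id, str) else node_id
    return self.graph.has_node(node_id)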

🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/graph/networkx/adapter.py around lines 45 to
47, the has_node method uses node_id of type UUID, which conflicts with the
GraphDBInterface declaration expecting a str type. To fix this, change the
method signature to accept node_id as a str and convert it to UUID inside the
method if needed, or alternatively update the interface and all adapters to use
UUID consistently. Ensure the method signature matches the interface to maintain
LSP compliance and avoid runtime type errors.
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)

163-168: ⚠️ Potential issue

limit == 0 edge case may still call vector_search.limit(0)

If the collection is empty, await collection.count_rows() returns 0, leading to vector_search.limit(0) which typically raises an error (“limit must be >=1”).
Guard against this:

-        if limit == 0:
-            limit = await collection.count_rows()
+        if limit == 0:
+            limit = await collection.count_rows()
+            if limit == 0:  # collection empty
+                return []
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        collection = await self.get_collection(collection_name)

        if limit == 0:
            limit = await collection.count_rows()
            if limit == 0:  # collection empty
                return []

        results = await collection.vector_search(query_vector).limit(limit).to_pandas()
🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py around lines
163 to 168, the code sets limit to the collection row count if limit is 0, but
if the collection is empty, this results in limit being 0, causing
vector_search.limit(0) which raises an error. Fix this by adding a check after
setting limit to ensure it is at least 1 before calling
vector_search.limit(limit), for example by setting limit to 1 if it is 0.
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)

308-312: ⚠️ Potential issue

Undefined list_collections() call will raise AttributeError at runtime

self.list_collections() no longer exists after the refactor; the correct helper is now get_collection_names().
Any call path that reaches prune() will currently blow up before doing its job.

-        collections = await self.list_collections()
+        collections = await self.get_collection_names()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        client = await self.get_connection()
-       collections = await self.list_collections()
+       collections = await self.get_collection_names()
        for collection_name in collections:
            await client.delete_collection(collection_name)
        return True
🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py around
lines 308 to 312, the method prune() calls self.list_collections(), which no
longer exists and will cause an AttributeError. Replace the call to
self.list_collections() with self.get_collection_names() to correctly retrieve
the collection names before deleting them.
cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (4)

400-405: ⚠️ Potential issue

Unescaped f-string braces introduce runtime errors & injection risk

Inside an f-string, {id: nid} is treated as a Python format expression rather than Cypher, so the query string is not built as intended and fails at runtime.
Additionally, string interpolation of Cypher fragments opens the door to injection.

-        MATCH (node {id: nid})-[r]->(predecessor)
+        MATCH (node {{id: nid}})-[r]->(predecessor)

Better: parameterise rather than interpolating:

query = """
UNWIND $node_ids AS nid
MATCH (node {id: nid})-[r]->(predecessor)
WHERE type(r) = $edge_label
DELETE r
"""
🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 400 to 405, the f-string uses unescaped braces in {id: nid}, causing a
NameError and potential injection risk. To fix this, remove the f-string
formatting and write the query as a plain string with parameter placeholders,
using $node_ids and $edge_label for parameters instead of interpolating values
directly into the query string.

414-418: ⚠️ Potential issue

Formatted label & relationship names break Cypher and leak variables

The f-string injects {id} and {edge_label} directly into the query.
Escape the braces or, preferably, use parameters:

query = """
UNWIND $node_ids AS nid
MATCH (node {id: nid})<-[r]-(successor)
WHERE type(r) = $edge_label
DELETE r
"""
🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 414 to 418, the Cypher query incorrectly injects the node label and edge
label directly using f-string formatting, which breaks the query and risks
variable leakage. To fix this, remove the f-string and instead use parameterized
queries by passing node IDs as a parameter and filtering relationships by type
using a WHERE clause with a parameter for edge_label. This avoids direct
injection of labels and relationship types into the query string.

594-605: 🛠️ Refactor suggestion

Metrics queries assume labels Node/EDGE that are never created

Nodes are labelled with the Python class name (e.g. MyDataPoint) and edges with
dynamic relationship types, not Node/EDGE. The metrics queries will always
return empty results.

Refactor the queries to be label-agnostic, e.g.:

MATCH (n) RETURN count(n)
MATCH ()-[r]->() RETURN count(r)

and adjust subsequent sub-queries accordingly.
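
For instance, the two headline counts could be computed label-agnostically as sketched here; the record shape returned by self.query() (a list of dict-like rows) is an assumption:

async def get_graph_metrics(self):
    """Label-agnostic counts that work for dynamically labelled nodes and edges."""
    num_nodes = (await self.query("MATCH (n) RETURN count(n) AS num_nodes"))[0]["num_nodes"]
    num_edges = (await self.query("MATCH ()-[r]->() RETURN count(r) AS num_edges"))[0]["num_edges"]
    return {"num_nodes": num_nodes, "num_edges": num_edges}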

🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 594 to 605, the Cypher query uses fixed labels `Node` and `EDGE` which do
not exist, causing empty results. Refactor the query to be label-agnostic by
removing specific labels and relationship types, for example, use `MATCH (n)`
instead of `MATCH (n:Node)` and `MATCH path = (n)-[:EDGE*0..]-()` should be
adjusted to `MATCH path = (n)-[*0..]-()`. Update all parts of the query and
subsequent code to work without relying on fixed labels or relationship types.

120-125: ⚠️ Potential issue

Cypher syntax error in delete_node query

MATCH (node: {{id: $node_id}}) is invalid Cypher and will throw immediately.
Use property syntax, not double-braced template literals:

-        query = "MATCH (node: {{id: $node_id}}) DETACH DELETE node"
+        query = "MATCH (node {id: $node_id}) DETACH DELETE node"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        sanitized_id = node_id.replace(":", "_")

        query = "MATCH (node {id: $node_id}) DETACH DELETE node"
        params = {"node_id": sanitized_id}

        return await self.query(query, params)
🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 120 to 125, the Cypher query uses invalid syntax with double-braced
template literals in MATCH (node: {{id: $node_id}}). Replace this with correct
property syntax by using MATCH (node {id: $node_id}) to properly match nodes by
their id property. Update the query string accordingly to fix the syntax error.
cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (1)

115-118: ⚠️ Potential issue

Possible missing await on collection.data.exists

collection.data.exists is an async method in the Weaviate async client; calling
it without await will return a coroutine and the subsequent if check will
always be truthy.

-                if collection.data.exists(data_point.uuid):
+                if await collection.data.exists(data_point.uuid):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

                data_point: DataObject = data_points[0]
-                if collection.data.exists(data_point.uuid):
+                if await collection.data.exists(data_point.uuid):
                    return await collection.data.update(
                        uuid=data_point.uuid,
🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py around
lines 115 to 118, the call to collection.data.exists is missing an await
keyword. Since collection.data.exists is an async method, you need to prefix it
with await to properly get the boolean result before the if check. Add await
before collection.data.exists to fix the issue.

@Vasilije1990 Vasilije1990 self-requested a review May 13, 2025 12:21
@borisarzentar borisarzentar merged commit f93463e into dev May 13, 2025
49 checks passed
@borisarzentar borisarzentar deleted the fix/make-onnxruntime-flexible branch May 13, 2025 12:35
@coderabbitai coderabbitai bot mentioned this pull request May 13, 2025