-
Notifications
You must be signed in to change notification settings - Fork 589
fix: make onnxruntime flexible #815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Please make sure all the checkboxes are checked:
|
WalkthroughThis update introduces a new Memgraph graph database adapter, unifies collection error handling across vector database adapters, and propagates an optional context parameter through pipeline task execution. It removes deprecated methods and error classes, updates documentation and test scripts, and refactors observability configuration for monitoring tools. Several files are added, deleted, or refactored for consistency and robustness. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Pipeline
participant Task
participant Context
User->>Pipeline: run_tasks(tasks, data, user, context)
Pipeline->>Task: handle_task(task, args, ..., context)
Task->>Task: executable(..., context) if accepted
Task-->>Pipeline: result
Pipeline-->>User: final output
sequenceDiagram
participant App
participant GraphEngineFactory
participant MemgraphAdapter
App->>GraphEngineFactory: create_graph_engine(provider="memgraph", ...)
GraphEngineFactory->>MemgraphAdapter: __init__(...)
MemgraphAdapter-->>GraphEngineFactory: instance
GraphEngineFactory-->>App: MemgraphAdapter instance
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Tip ⚡️ Faster reviews with caching
Enjoy the performance boost—your workflow just got faster. ✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 12
🔭 Outside diff range comments (3)
cognee/exceptions/exceptions.py (1)
34-54
: 🛠️ Refactor suggestionUpdate subclasses to support new logging parameters
The subclasses like
ServiceError
andInvalidValueError
don't accept or pass the newlog
andlog_level
parameters to the superclass constructor. They will always use default values regardless of what callers provide.class ServiceError(CogneeApiError): """Failures in external services or APIs, like a database or a third-party service""" def __init__( self, message: str = "Service is unavailable.", name: str = "ServiceError", status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, + log=True, + log_level="ERROR", ): - super().__init__(message, name, status_code) + super().__init__(message, name, status_code, log, log_level)Similar updates should be made to
InvalidValueError
andInvalidAttributeError
classes to maintain consistency across all exception types.cognee/infrastructure/databases/graph/networkx/adapter.py (1)
139-143
: 🛠️ Refactor suggestionID type inconsistencies ripple through multiple APIs
The following methods now demand
UUID
while other parts of the adapter (e.g.has_edge
,add_edge
) still acceptstr
:
get_edges
extract_node
/extract_nodes
get_neighbors
get_node
/get_nodes
In addition,
get_disconnected_nodes
(165-175) promisesList[str]
but will returnUUID
objects.Uneven typing will confuse callers, especially when IDs originate from external sources such as REST/JSON where they arrive as strings.
Recommendation:
-async def get_edges(self, node_id: UUID): +async def get_edges(self, node_id: str | UUID): node_id = UUID(node_id) if isinstance(node_id, str) else node_id…and apply the same coercion pattern everywhere, or standardise once in the interface.
Also applies to: 177-185, 624-637
cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (1)
151-165
: 🛠️ Refactor suggestion
retrieve
returns raw Weaviate objects – breaks adapter contractOther adapters return a
List[ScoredResult]
. Returning driver-specific objects
forces callers to add special-case handling.Either map to
ScoredResult
, or update the interface & downstream code.
🧹 Nitpick comments (16)
examples/data/car_and_tech_companies.txt (1)
20-37
: Streamlinetext_2
entries and enhance readability.Similarly, remove repetitive headings and tighten phrasing for better flow. Consider replacing verbose constructions with more concise alternatives.
text_2 = """ -1. Apple -Apple is renowned for its innovative consumer electronics and software. Its product lineup includes the iPhone, iPad, Mac computers, and wearables like the Apple Watch. Known for its emphasis on sleek design and user-friendly interfaces, Apple has built a loyal customer base and created a seamless ecosystem that integrates hardware, software, and services. +1. Apple is renowned for its innovative consumer electronics and software. Its product lineup includes the iPhone, iPad, Mac computers, and wearables like the Apple Watch. Known for its emphasis on sleek design and user-friendly interfaces, Apple has built a loyal customer base and created a seamless ecosystem that integrates hardware, software, and services. -2. Google -Founded in 1998, Google started as a search engine and quickly became the go-to resource for finding information online. Over the years, the company has diversified its offerings to include digital advertising, cloud computing, mobile operating systems (Android), and various web services like Gmail and Google Maps. Google's innovations have played a major role in shaping the internet landscape. +2. Google started as a search engine in 1998 and quickly became the go-to resource for finding information online. Over the years, the company has diversified its offerings to include digital advertising, cloud computing, mobile operating systems (Android), and various web services like Gmail and Google Maps. Google's innovations have played a major role in shaping the internet landscape. -3. Microsoft -Microsoft Corporation has been a dominant force in software for decades. In recent years, Microsoft has expanded into cloud computing with Azure, gaming with the Xbox platform, and even hardware through products like the Surface line. This evolution has helped the company maintain its relevance in a rapidly changing tech world. +3. Microsoft Corporation has been a dominant force in software for decades. Recently, Microsoft has expanded into cloud computing with Azure, gaming with the Xbox platform, and even hardware through products like the Surface line. This evolution has helped the company maintain its relevance in a rapidly changing tech world. -4. Amazon -What began as an online bookstore has grown into one of the largest e-commerce platforms globally. Amazon is known for its vast online marketplace, but its influence extends far beyond retail. With Amazon Web Services (AWS), the company has become a leader in cloud computing, offering robust solutions that power websites, applications, and businesses around the world. Amazon's constant drive for innovation continues to reshape both retail and technology sectors. +4. Amazon has grown from an online bookstore into one of the largest e-commerce platforms globally. Known for its vast online marketplace, Amazon's influence extends far beyond retail. With Amazon Web Services (AWS), the company is a leader in cloud computing, offering robust solutions that power websites, applications, and businesses worldwide. Amazon's constant drive for innovation continues to reshape both retail and technology sectors. -5. Meta -Meta, originally known as Facebook, revolutionized social media by connecting billions of people worldwide. Beyond its core social networking service, Meta is investing in the next generation of digital experiences through virtual and augmented reality technologies, with projects like Oculus. The company's efforts signal a commitment to evolving digital interaction and building the metaverse—a shared virtual space where users can connect and collaborate. +5. Meta, originally known as Facebook, revolutionized social media by connecting billions of people worldwide. Beyond its core social networking service, Meta is investing in the next generation of digital experiences through virtual and augmented reality technologies, with projects like Oculus. The company's efforts signal a commitment to evolving digital interaction and building the metaverse—a shared virtual space where users can connect and collaborate. """🧰 Tools
🪛 LanguageTool
[duplication] ~21-~21: Possible typo: you repeated a word.
Context: ...design excellence. """ text_2 = """ 1. Apple Apple is renowned for its innovative consumer...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~27-~27: Possible typo: you repeated a word.
Context: ... in shaping the internet landscape. 3. Microsoft Microsoft Corporation has been a dominant force i...(ENGLISH_WORD_REPEAT_RULE)
[style] ~28-~28: Consider using a synonym to be more concise.
Context: ...n both business and personal computing. In recent years, Microsoft has expanded into cloud comp...(IN_RECENT_STYLE)
[uncategorized] ~31-~31: You might be missing the article “the” here.
Context: ...or innovation continues to reshape both retail and technology sectors. 5. Meta Meta, ...(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[duplication] ~33-~33: Possible typo: you repeated a word.
Context: ...both retail and technology sectors. 5. Meta Meta, originally known as Facebook, revoluti...(ENGLISH_WORD_REPEAT_RULE)
cognee/modules/retrieval/exceptions/__init__.py (1)
7-7
: Address unused imports flagged by static analysis.The imports
SearchTypeNotSupported
andCypherSearchError
are flagged as unused in this module. Since this is an__init__.py
file, these are likely intended to be re-exported at the package level.To properly re-export these exceptions and silence the static analysis warnings, add an
__all__
declaration:-from .exceptions import SearchTypeNotSupported, CypherSearchError +from .exceptions import SearchTypeNotSupported, CypherSearchError + +__all__ = ["SearchTypeNotSupported", "CypherSearchError"]🧰 Tools
🪛 Ruff (0.8.2)
7-7:
.exceptions.SearchTypeNotSupported
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
7-7:
.exceptions.CypherSearchError
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (2)
14-21
: Add comment explaining task signature differencesThe test effectively validates context propagation, but task_2 has a different signature from the other tasks. Consider adding a comment to clarify that this is intentional, demonstrating that tasks can optionally accept or ignore the context parameter.
def task_1(num, context): return num + context +# Intentionally omitting context parameter to test tasks that don't need context def task_2(num): return num * 2 def task_3(num, context): return num**context
37-39
: Add validation comments for clarityThe test assertions would be more informative with comments explaining how the expected value was calculated.
- final_result = 4586471424 + # Calculate expected result: 5 + 7 = 12, 12 * 2 = 24, 24^7 = 4,586,471,424 + final_result = 4586471424 async for result in pipeline: assert result == final_resultcognee/modules/retrieval/utils/brute_force_triplet_search.py (1)
66-73
: Refactor using contextlib.suppress for cleaner codeThe try-except-pass pattern can be simplified using Python's contextlib module.
-try: - await memory_fragment.project_graph_from_db( - graph_engine, - node_properties_to_project=properties_to_project, - edge_properties_to_project=["relationship_name"], - ) -except EntityNotFoundError: - pass +from contextlib import suppress + +with suppress(EntityNotFoundError): + await memory_fragment.project_graph_from_db( + graph_engine, + node_properties_to_project=properties_to_project, + edge_properties_to_project=["relationship_name"], + )🧰 Tools
🪛 Ruff (0.8.2)
66-73: Use
contextlib.suppress(EntityNotFoundError)
instead oftry
-except
-pass
Replace with
contextlib.suppress(EntityNotFoundError)
(SIM105)
cognee/modules/pipelines/operations/run_tasks_base.py (1)
31-34
: Optimize dict key checkingThe key lookup can be simplified for better performance.
-has_context = any( - [key == "context" for key in inspect.signature(running_task.executable).parameters.keys()] -) +has_context = "context" in inspect.signature(running_task.executable).parameters🧰 Tools
🪛 Ruff (0.8.2)
32-32: Use
key in dict
instead ofkey in dict.keys()
Remove
.keys()
(SIM118)
cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (1)
179-180
: Consider conditional limit parameter in SQL queryThe limit parameter is passed directly to the Qdrant client even when its value is negative or zero, which could lead to unexpected behavior. While line 179 checks if the limit is positive, consider improving this by only passing the limit parameter when it's greater than zero.
- limit=limit if limit > 0 else None, + limit=limit if limit > 0 else None,This is actually already correctly implemented - the code checks if the limit is greater than zero and passes None otherwise, which is the right approach.
cognee-mcp/src/server.py (1)
170-191
: Graceful shutdown signal missingIf
mcp.run_stdio_async()
/run_sse_async()
are cancelled (e.g. SIGINT), the backgroundasyncio.create_task
jobs keep running and the process may exit abruptly, losing work.
Consider attaching them to a supervisor task or usingasyncio.TaskGroup
/add_done_callback
to capture and flush logs before exit.cognee/infrastructure/databases/graph/networkx/adapter.py (1)
268-285
: Edge-removal loops are O(N×E) and synchronousFor large
node_ids
this double-nested loop becomes costly.
Consider using NetworkX’s vectorised helpers (e.g.edges(node_id, data=True)
+ list comprehension) or batching removals, then committing a singlesave_graph_to_file
at the end.cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (2)
78-84
:get_collection
re-checks existence but opens the table anywayThe second call to
await connection.open_table(collection_name)
can still fail if the collection was deleted between the two operations (TOCTOU window).
To make this atomic, try to open and catch the provider-specific “not found” error instead of issuing two separate RPCs.
208-213
: Serial deletion can be slow – leverage a bulk filterDeleting each ID one-by-one incurs N round-trips.
LanceDB supports filtering; a single call is faster and transactional:- for data_point_id in data_point_ids: - await collection.delete(f"id = '{data_point_id}'") + ids_tuple = tuple(data_point_ids) + await collection.delete(f"id IN {ids_tuple}")cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2)
239-241
: CatchingException
broadly obscures real problemsSwallowing every exception logs the message but returns an empty list – masking functional bugs, mis-configurations, or network issues. Restrict the
except
clause to the concrete errors you expect (e.g.CollectionNotFoundError
,chromadb.errors.ChromaError
) and let the rest propagate.
243-249
: Search-limit defaults are now inconsistent (15 vs 5)
search()
defaults tolimit=15
, butbatch_search()
still defaults to 5. For a predictable API keep them aligned:- limit: int = 5, + limit: int = 15,cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1)
430-444
: Mutable default argument (dict()
) inserialize_properties
Using a mutable default causes shared state between calls.
- def serialize_properties(self, properties=dict()): + def serialize_properties(self, properties: dict | None = None): ... - for property_key, property_value in properties.items(): + for property_key, property_value in (properties or {}).items():🧰 Tools
🪛 Ruff (0.8.2)
430-430: Do not use mutable data structures for argument defaults
Replace with
None
; initialize within function(B006)
cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (2)
89-100
:list.index()
inside loop gives O(n²) behaviour
convert_to_weaviate_data_points
callsdata_points.index(...)
for each item,
turning an otherwise linear transformation into quadratic time for large
batches.- def convert_to_weaviate_data_points(data_point: DataPoint): - vector = data_vectors[data_points.index(data_point)] + def convert_to_weaviate_data_points(idx: int, data_point: DataPoint): + vector = data_vectors[idx] ... -data_points = [convert_to_weaviate_data_points(dp) for dp in data_points] +data_points = [ + convert_to_weaviate_data_points(i, dp) for i, dp in enumerate(data_points) +]
43-46
: Repeatedawait client.connect()
on every call is wasteful
use_async_with_weaviate_cloud
keeps its own connection pool. Callingconnect
each time adds latency and may exhaust sockets. Connect once in__init__()
or
guard with an internal flag.
🛑 Comments failed to post (12)
examples/data/car_and_tech_companies.txt (1)
1-18: 🛠️ Refactor suggestion
Remove redundant company name headings and correct pluralization in
text_1
.Each bullet entry repeats the company name on the next line, which is redundant. Consolidating the name and description into one line improves readability. Also fix the subject-verb agreement in the closing sentence.
text_1 = """ -1. Audi -Audi is known for its modern designs and advanced technology. Founded in the early 1900s, the brand has earned a reputation for precision engineering and innovation. With features like the Quattro all-wheel-drive system, Audi offers a range of vehicles from stylish sedans to high-performance sports cars. +1. Audi is known for its modern designs and advanced technology. Founded in the early 1900s, the brand has earned a reputation for precision engineering and innovation. With features like the Quattro all-wheel-drive system, Audi offers a range of vehicles from stylish sedans to high-performance sports cars. -2. BMW -BMW, short for Bayerische Motoren Werke, is celebrated for its focus on performance and driving pleasure. The company's vehicles are designed to provide a dynamic and engaging driving experience, and their slogan, "The Ultimate Driving Machine," reflects that commitment. BMW produces a variety of cars that combine luxury with sporty performance. +2. BMW, short for Bayerische Motoren Werke, is celebrated for its focus on performance and driving pleasure. The company's vehicles are designed to provide a dynamic and engaging driving experience, and their slogan, "The Ultimate Driving Machine," reflects that commitment. BMW produces a variety of cars that combine luxury with sporty performance. -3. Mercedes-Benz -Mercedes-Benz is synonymous with luxury and quality. With a history dating back to the early 20th century, the brand is known for its elegant designs, innovative safety features, and high-quality engineering. Mercedes-Benz manufactures not only luxury sedans but also SUVs, sports cars, and commercial vehicles, catering to a wide range of needs. +3. Mercedes-Benz is synonymous with luxury and quality. With a history dating back to the early 20th century, the brand is known for its elegant designs, innovative safety features, and high-quality engineering. Mercedes-Benz manufactures not only luxury sedans but also SUVs, sports cars, and commercial vehicles, catering to a wide range of needs. -4. Porsche -Porsche is a name that stands for high-performance sports cars. Founded in 1931, the brand has become famous for models like the iconic Porsche 911. Porsche cars are celebrated for their speed, precision, and distinctive design, appealing to car enthusiasts who value both performance and style. +4. Porsche is a name that stands for high-performance sports cars. Founded in 1931, the brand has become famous for models like the iconic Porsche 911. Porsche cars are celebrated for their speed, precision, and distinctive design, appealing to car enthusiasts who value both performance and style. -5. Volkswagen -Volkswagen, which means "people's car" in German, was established with the idea of making affordable and reliable vehicles accessible to everyone. Over the years, Volkswagen has produced several iconic models, such as the Beetle and the Golf. Today, it remains one of the largest car manufacturers in the world, offering a wide range of vehicles that balance practicality with quality. +5. Volkswagen, which means "people's car" in German, was established with the idea of making affordable and reliable vehicles accessible to everyone. Over the years, Volkswagen has produced several iconic models, such as the Beetle and the Golf. Today, it remains one of the largest car manufacturers in the world, offering a wide range of vehicles that balance practicality with quality. -Each of these car manufacturer contributes to Germany's reputation as a leader in the global automotive industry, showcasing a blend of innovation, performance, and design excellence. +Each of these car manufacturers contribute to Germany's reputation as a leader in the global automotive industry, showcasing a blend of innovation, performance, and design excellence. """📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.text_1 = """ 1. Audi is known for its modern designs and advanced technology. Founded in the early 1900s, the brand has earned a reputation for precision engineering and innovation. With features like the Quattro all-wheel-drive system, Audi offers a range of vehicles from stylish sedans to high-performance sports cars. 2. BMW, short for Bayerische Motoren Werke, is celebrated for its focus on performance and driving pleasure. The company's vehicles are designed to provide a dynamic and engaging driving experience, and their slogan, "The Ultimate Driving Machine," reflects that commitment. BMW produces a variety of cars that combine luxury with sporty performance. 3. Mercedes-Benz is synonymous with luxury and quality. With a history dating back to the early 20th century, the brand is known for its elegant designs, innovative safety features, and high-quality engineering. Mercedes-Benz manufactures not only luxury sedans but also SUVs, sports cars, and commercial vehicles, catering to a wide range of needs. 4. Porsche is a name that stands for high-performance sports cars. Founded in 1931, the brand has become famous for models like the iconic Porsche 911. Porsche cars are celebrated for their speed, precision, and distinctive design, appealing to car enthusiasts who value both performance and style. 5. Volkswagen, which means "people's car" in German, was established with the idea of making affordable and reliable vehicles accessible to everyone. Over the years, Volkswagen has produced several iconic models, such as the Beetle and the Golf. Today, it remains one of the largest car manufacturers in the world, offering a wide range of vehicles that balance practicality with quality. Each of these car manufacturers contribute to Germany's reputation as a leader in the global automotive industry, showcasing a blend of innovation, performance, and design excellence. """
🧰 Tools
🪛 LanguageTool
[duplication] ~2-~2: Possible typo: you repeated a word.
Context: text_1 = """ 1. Audi Audi is known for its modern designs and adv...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~5-~5: Possible typo: you repeated a word.
Context: ...ns to high-performance sports cars. 2. BMW BMW, short for Bayerische Motoren Werke, is...(ENGLISH_WORD_REPEAT_RULE)
[style] ~6-~6: Consider using a more concise synonym.
Context: ... reflects that commitment. BMW produces a variety of cars that combine luxury with sporty pe...(A_VARIETY_OF)
[duplication] ~8-~8: Possible typo: you repeated a word.
Context: ...ine luxury with sporty performance. 3. Mercedes-Benz Mercedes-Benz is synonymous with luxury and quality. ...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~11-~11: Possible typo: you repeated a word.
Context: ... catering to a wide range of needs. 4. Porsche Porsche is a name that stands for high-performa...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~14-~14: Possible typo: you repeated a word.
Context: ...o value both performance and style. 5. Volkswagen Volkswagen, which means "people's car" in German, ...(ENGLISH_WORD_REPEAT_RULE)
[grammar] ~17-~17: The plural determiner ‘these’ does not agree with the singular noun ‘car’.
Context: ...nce practicality with quality. Each of these car manufacturer contributes to Germany's r...(THIS_NNS)
[uncategorized] ~17-~17: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...cality with quality. Each of these car manufacturer contributes to Germany's reputation as ...(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
🤖 Prompt for AI Agents (early access)
In examples/data/car_and_tech_companies.txt lines 1 to 18, remove the redundant company name headings that appear as separate lines before each description by merging the company name with the first sentence of its description. Also, correct the pluralization in the last sentence by changing "Each of these car manufacturer contributes" to "Each of these car manufacturers contributes" to fix subject-verb agreement.
cognee/modules/observability/get_observe.py (1)
1-11: 💡 Verification agent
🧩 Analysis chain
Well-structured observability decorator provider
This new function correctly centralizes the logic for obtaining the appropriate observability decorator based on the configured monitoring tool. However, there's no default return value if the monitoring tool isn't LANGFUSE.
Consider adding a default return value to handle cases where the monitoring tool isn't LANGFUSE:
def get_observe(): monitoring = get_base_config().monitoring_tool if monitoring == Observer.LANGFUSE: from langfuse.decorators import observe return observe + elif monitoring == Observer.LLMLITE: + # Import and return LLMLITE observe decorator + pass + elif monitoring == Observer.LANGSMITH: + # Import and return LANGSMITH observe decorator + pass + else: + # Return a no-op decorator as fallback + def noop_observe(*args, **kwargs): + def decorator(func): + return func + return decorator if args and callable(args[0]) else decorator + + return noop_observeLet's verify if there are any potential issues with missing the other observer types:
🏁 Script executed:
#!/bin/bash # Check if the other observer types are being used anywhere in the codebase echo "Checking for LLMLITE usage:" rg "Observer\.LLMLITE" -A 2 -B 2 echo "Checking for LANGSMITH usage:" rg "Observer\.LANGSMITH" -A 2 -B 2 echo "Checking if monitoring_tool is set to values other than LANGFUSE:" rg "monitoring_tool\s*=\s*" -A 2 -B 2 echo "Finding potential error cases for get_observe():" rg "get_observe\(\)" -A 5 -B 5Length of output: 3613
I’d like to inspect the
Observer
enum to confirm which monitoring types are defined. Then we can decide on the appropriate fallback behavior.
🏁 Script executed:
#!/bin/bash set -e # Locate and display the Observer enum definitions file=$(fd -e py "observers.py" | head -n1) echo "Found observers file: $file" sed -n '1,200p' "$file"Length of output: 446
Add fallback and handle all Observer types in get_observe()
Currently,
get_observe()
only returns a decorator forObserver.LANGFUSE
—for any other value (includingLLMLITE
andLANGSMITH
), it returnsNone
and will break when used as a decorator. We should:
- Provide a no-op fallback decorator so code doesn’t error out.
- Stub out branches for the other enum values (to be implemented when those integrations land).
File:
cognee/modules/observability/get_observe.py
def get_observe(): monitoring = get_base_config().monitoring_tool if monitoring == Observer.LANGFUSE: from langfuse.decorators import observe return observe + # TODO: implement LLMLITE observability + elif monitoring == Observer.LLMLITE: + # from llmlite.decorators import observe + # return observe + pass + # TODO: implement LANGSMITH observability + elif monitoring == Observer.LANGSMITH: + # from langsmith.decorators import observe + # return observe + pass + # Fallback no-op decorator + def noop_observe(*args, **kwargs): + def decorator(func): + return func + return decorator if args and callable(args[0]) else decorator + return noop_observe– Ensures that
observe = get_observe()
is always a callable decorator
– Leaves clear TODOs for LLMLITE/LANGSMITH support📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.from cognee.base_config import get_base_config from .observers import Observer def get_observe(): monitoring = get_base_config().monitoring_tool if monitoring == Observer.LANGFUSE: from langfuse.decorators import observe return observe # TODO: implement LLMLITE observability elif monitoring == Observer.LLMLITE: # from llmlite.decorators import observe # return observe pass # TODO: implement LANGSMITH observability elif monitoring == Observer.LANGSMITH: # from langsmith.decorators import observe # return observe pass # Fallback no-op decorator def noop_observe(*args, **kwargs): def decorator(func): return func return decorator if args and callable(args[0]) else decorator return noop_observe
🤖 Prompt for AI Agents (early access)
In cognee/modules/observability/get_observe.py around lines 1 to 11, the get_observe() function only returns a decorator for Observer.LANGFUSE and returns None otherwise, which will cause errors when used as a decorator. Fix this by adding a no-op fallback decorator that simply returns the original function unchanged, ensuring get_observe() always returns a callable. Also, add stub branches for Observer.LLMLITE and Observer.LANGSMITH with TODO comments for future implementation, so all enum values are handled explicitly.
cognee-mcp/src/server.py (2)
44-46: 🛠️ Refactor suggestion
⚠️ Potential issuePreserve original traceback when re-raising the exception
Re-throwing with a plain
ValueError
masks the original stack-trace, making debugging harder.
Use exception chaining so the root cause is retained:- except Exception as e: - logger.error("Cognify process failed.") - raise ValueError(f"Failed to cognify: {str(e)}") + except Exception as err: + logger.error("Cognify process failed.") + raise ValueError(f"Failed to cognify: {err}") from err📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.except Exception as err: logger.error("Cognify process failed.") raise ValueError(f"Failed to cognify: {err}") from err
🧰 Tools
🪛 Ruff (0.8.2)
46-46: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
🤖 Prompt for AI Agents (early access)
In cognee-mcp/src/server.py around lines 44 to 46, the current code raises a new ValueError without preserving the original traceback, which obscures the root cause. Modify the raise statement to use exception chaining by adding "from e" after the new exception to retain the original exception context and stack trace for better debugging.
104-128:
⚠️ Potential issue
search_type
is not validated – may explode withKeyError
SearchType[search_type.upper()]
will raise aKeyError
for any invalid input (e.g. typos or future enum changes).
Fail fast with a friendly message before dereferencing:- search_results = await cognee.search( - query_type=SearchType[search_type.upper()], query_text=search_query - ) + try: + query_type = SearchType[search_type.upper()] + except KeyError: + raise ValueError( + f"Unsupported search_type '{search_type}'. " + f"Expected one of {[t.name.lower() for t in SearchType]}" + ) + + search_results = await cognee.search( + query_type=query_type, query_text=search_query + )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.async def search(search_query: str, search_type: str) -> list: async def search_task(search_query: str, search_type: str) -> str: """Search the knowledge graph""" # NOTE: MCP uses stdout to communicate, we must redirect all output # going to stdout ( like the print function ) to stderr. with redirect_stdout(sys.stderr): - search_results = await cognee.search( - query_type=SearchType[search_type.upper()], query_text=search_query - ) + try: + query_type = SearchType[search_type.upper()] + except KeyError: + raise ValueError( + f"Unsupported search_type '{search_type}'. " + f"Expected one of {[t.name.lower() for t in SearchType]}" + ) + + search_results = await cognee.search( + query_type=query_type, query_text=search_query + ) if search_type.upper() == "CODE": return json.dumps(search_results, cls=JSONEncoder) elif ( search_type.upper() == "GRAPH_COMPLETION" or search_type.upper() == "RAG_COMPLETION" ): return search_results[0] elif search_type.upper() == "CHUNKS": return str(search_results) elif search_type.upper() == "INSIGHTS": results = retrieved_edges_to_string(search_results) return results else: return str(search_results) search_results = await search_task(search_query, search_type)
🤖 Prompt for AI Agents (early access)
In cognee-mcp/src/server.py around lines 104 to 128, the code uses SearchType[search_type.upper()] without validating if search_type.upper() is a valid key, which can cause a KeyError. To fix this, add a validation step before accessing the enum: check if search_type.upper() is in SearchType.__members__, and if not, raise a clear and friendly exception or return an error message. This ensures the function fails fast with a helpful message instead of crashing unexpectedly.
cognee/infrastructure/databases/graph/networkx/adapter.py (1)
45-47:
⚠️ Potential issuePublic method signatures diverge from the declared interface
GraphDBInterface.has_node
(seegraph_db_interface.py
) expectsnode_id: str
, but the adapter now declaresnode_id: UUID
.
This breaks LSP compliance, static type-checking and any upstream code still sendingstr
IDs.Either:
- Keep
str
in the signature and convert inside the method, or- Change the interface (and every other adapter) to
UUID
.Mixing the two will cause subtle
TypeError
/AttributeError
at runtime.🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/graph/networkx/adapter.py around lines 45 to 47, the has_node method uses node_id of type UUID, which conflicts with the GraphDBInterface declaration expecting a str type. To fix this, change the method signature to accept node_id as a str and convert it to UUID inside the method if needed, or alternatively update the interface and all adapters to use UUID consistently. Ensure the method signature matches the interface to maintain LSP compliance and avoid runtime type errors.
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
163-168:
⚠️ Potential issue
limit == 0
edge case may still callvector_search.limit(0)
If the collection is empty,
await collection.count_rows()
returns0
, leading tovector_search.limit(0)
which typically raises an error (“limit must be >=1”).
Guard against this:- if limit == 0: - limit = await collection.count_rows() + if limit == 0: + limit = await collection.count_rows() + if limit == 0: # collection empty + return []📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.collection = await self.get_collection(collection_name) if limit == 0: limit = await collection.count_rows() if limit == 0: # collection empty return [] results = await collection.vector_search(query_vector).limit(limit).to_pandas()
🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py around lines 163 to 168, the code sets limit to the collection row count if limit is 0, but if the collection is empty, this results in limit being 0, causing vector_search.limit(0) which raises an error. Fix this by adding a check after setting limit to ensure it is at least 1 before calling vector_search.limit(limit), for example by setting limit to 1 if it is 0.
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
308-312:
⚠️ Potential issueUndefined
list_collections()
call will raiseAttributeError
at runtime
self.list_collections()
no longer exists after the refactor; the correct helper is nowget_collection_names()
.
Any call path that reachesprune()
will currently blow up before doing its job.- collections = await self.list_collections() + collections = await self.get_collection_names()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.client = await self.get_connection() - collections = await self.list_collections() + collections = await self.get_collection_names() for collection_name in collections: await client.delete_collection(collection_name) return True
🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py around lines 308 to 312, the method prune() calls self.list_collections(), which no longer exists and will cause an AttributeError. Replace the call to self.list_collections() with self.get_collection_names() to correctly retrieve the collection names before deleting them.
cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (4)
400-405:
⚠️ Potential issueUnescaped f-string braces introduce runtime errors & injection risk
Inside an f-string,
{id: nid}
is interpreted by Python, producingNameError
.
Additionally, string interpolation of Cypher fragments opens the door to injection.- MATCH (node {id: nid})-[r]->(predecessor) + MATCH (node {id: nid})-[r]->(predecessor)Better: parameterise rather than interpolating:
query = """ UNWIND $node_ids AS nid MATCH (node {id: nid})-[r]->(predecessor) WHERE type(r) = $edge_label DELETE r """🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 400 to 405, the f-string uses unescaped braces in {id: nid}, causing a NameError and potential injection risk. To fix this, remove the f-string formatting and write the query as a plain string with parameter placeholders, using $node_ids and $edge_label for parameters instead of interpolating values directly into the query string.
414-418:
⚠️ Potential issueFormatted label & relationship names break Cypher and leak variables
The f-string injects
{id}
and{edge_label}
directly into the query.
Escape the braces or, preferably, use parameters:query = """ UNWIND $node_ids AS nid MATCH (node {id: nid})<-[r]-(successor) WHERE type(r) = $edge_label DELETE r """🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 414 to 418, the Cypher query incorrectly injects the node label and edge label directly using f-string formatting, which breaks the query and risks variable leakage. To fix this, remove the f-string and instead use parameterized queries by passing node IDs as a parameter and filtering relationships by type using a WHERE clause with a parameter for edge_label. This avoids direct injection of labels and relationship types into the query string.
594-605: 🛠️ Refactor suggestion
Metrics queries assume labels
Node
/EDGE
that are never createdNodes are labelled with the Python class name (e.g.
MyDataPoint
) and edges with
dynamic relationship types, notNode
/EDGE
. The metrics queries will always
return empty results.Refactor the queries to be label-agnostic, e.g.:
MATCH (n) RETURN count(n) MATCH ()-[r]->() RETURN count(r)and adjust subsequent sub-queries accordingly.
🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 594 to 605, the Cypher query uses fixed labels `Node` and `EDGE` which do not exist, causing empty results. Refactor the query to be label-agnostic by removing specific labels and relationship types, for example, use `MATCH (n)` instead of `MATCH (n:Node)` and `MATCH path = (n)-[:EDGE*0..]-()` should be adjusted to `MATCH path = (n)-[*0..]-()`. Update all parts of the query and subsequent code to work without relying on fixed labels or relationship types.
120-125:
⚠️ Potential issueCypher syntax error in
delete_node
query
MATCH (node: {{id: $node_id}})
is invalid Cypher and will throw immediately.
Use property syntax, not double-braced template literals:- query = "MATCH (node: {{id: $node_id}}) DETACH DELETE node" + query = "MATCH (node {id: $node_id}) DETACH DELETE node"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.sanitized_id = node_id.replace(":", "_") query = "MATCH (node {id: $node_id}) DETACH DELETE node" params = {"node_id": sanitized_id} return await self.query(query, params)
🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 120 to 125, the Cypher query uses invalid syntax with double-braced template literals in MATCH (node: {{id: $node_id}}). Replace this with correct property syntax by using MATCH (node {id: $node_id}) to properly match nodes by their id property. Update the query string accordingly to fix the syntax error.
cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (1)
115-118:
⚠️ Potential issuePossible missing
await
oncollection.data.exists
collection.data.exists
is an async method in the Weaviate async client; calling
it withoutawait
will return a coroutine and the subsequentif
check will
always be truthy.- if collection.data.exists(data_point.uuid): + if await collection.data.exists(data_point.uuid):📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.data_point: DataObject = data_points[0] - if collection.data.exists(data_point.uuid): + if await collection.data.exists(data_point.uuid): return await collection.data.update( uuid=data_point.uuid,
🤖 Prompt for AI Agents (early access)
In cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py around lines 115 to 118, the call to collection.data.exists is missing an await keyword. Since collection.data.exists is an async method, you need to prefix it with await to properly get the boolean result before the if check. Add await before collection.data.exists to fix the issue.
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.