Merge dev to main #827
Conversation
Description: Resolve issue with .venv being broken when using docker compose with Cognee.

DCO Affirmation: I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
Commits merged from dev (each commit message carries the same Topoteretes DCO affirmation):

- … 1947 (#760). No description provided. Co-authored-by: Boris <boris@topoteretes.com>, Igor Ilic <30923996+dexters1@users.noreply.github.com>, Igor Ilic <igorilic03@gmail.com>
- Add support for UV and for Poetry package management.
- Switch typing from str to UUID for NetworkX node_id.
- Add both sse and stdio support for Cognee MCP.
- …83] (#782): Add log handling options for cognee exceptions.
- Fix issue with failing versions gh actions.
- No description provided. Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
- No description provided.
- No description provided. Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
- No description provided. Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com>, Vasilije <8619304+Vasilije1990@users.noreply.github.com>
- No description provided. Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com>, Vasilije <8619304+Vasilije1990@users.noreply.github.com>
- Add support for the Memgraph graph database following the [graph database integration guide](https://docs.cognee.ai/contributing/adding-providers/graph-db/graph-database-integration): implemented `MemgraphAdapter` for interfacing with Memgraph, updated `get_graph_engine.py` to return MemgraphAdapter when appropriate, added a test script `test_memgraph.py`, and created a dedicated test workflow `.github/workflows/test_memgraph.yml`. Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>, Boris <boris@topoteretes.com>
- refactor: Handle boto3 s3fs dependencies better.
- No description provided.
- Update LanceDB and rewrite data points to run async. Co-authored-by: Boris <boris@topoteretes.com>, Boris Arzentar <borisarzentar@gmail.com>
- No description provided.
- No description provided.
- As discussed with @hande-k and Lazar, a short demo to illustrate how to get the pagerank rankings from the knowledge graph given the nx engine. This is a POC, and a first step towards solving #643. Co-authored-by: Boris <boris@topoteretes.com>, Hande <159312713+hande-k@users.noreply.github.com>, Vasilije <8619304+Vasilije1990@users.noreply.github.com>
- Added tools to check current cognify and codify status.
- Move CollectionNotFoundError logging to be at Debug level.
- No description provided.
- No description provided.
- This reverts commit c058219.
- No description provided.
Walkthrough

This update introduces broad architectural, interface, and usability changes to the Cognee codebase. It adds Memgraph support, refactors pipeline and task execution with context propagation, modularizes observability, and standardizes dataset identification. Numerous adapters are updated for unified search and error handling. The MCP server is overhauled, and documentation and UI are improved.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant Pipeline
    participant DB
    User->>Frontend: Add data / Click Cognify
    Frontend->>API: POST /add or /cognify (with dataset name)
    API->>Pipeline: cognee_pipeline(tasks, datasets, user, pipeline_name)
    Pipeline->>DB: Create or fetch dataset (get_unique_dataset_id)
    Pipeline->>Pipeline: Log pipeline run initiated (with status)
    Pipeline->>Pipeline: Check pipeline run status
    alt Not already processing/completed
        Pipeline->>Pipeline: Run tasks (with context)
        Pipeline->>DB: Update pipeline run status
    else Already processing/completed
        Pipeline-->>API: Return early
    end
    API-->>Frontend: Response (status, result)
    Frontend-->>User: Show notification / result
```

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant MCP Server
    User->>Frontend: Submit tool action (e.g., cognify)
    Frontend->>API: Call MCP endpoint
    API->>MCP Server: Invoke tool (e.g., @mcp.tool('cognify'))
    MCP Server->>MCP Server: Launch async task
    MCP Server-->>API: Immediate confirmation (background processing)
    API-->>Frontend: Show status / instructions
```
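For orientation, a minimal sketch of the name-based dataset identification the first diagram references. The actual body of `get_unique_dataset_id` is not shown in this review (it is listed as a two-line function), so the `uuid5` derivation below is an assumption:

```python
from uuid import NAMESPACE_OID, UUID, uuid5


def get_unique_dataset_id(dataset_name: str, owner_id: UUID) -> UUID:
    # Hypothetical derivation: a deterministic UUID from owner and dataset
    # name, so "create or fetch dataset" resolves repeat runs to the same row.
    return uuid5(NAMESPACE_OID, f"{dataset_name}{owner_id}")
```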
Actionable comments posted: 20
🔭 Outside diff range comments (5)
cognee/shared/logging_utils.py (1)
315-325: ⚠️ Potential issue

Reassess broadened warning suppression threshold.

The condition now silences `SAWarning` for all levels above `DEBUG` (including `INFO`), which may hide important warnings. Recommend reverting to suppress only levels above `WARNING`, or refining to specific modules:

```diff
- if log_level > logging.DEBUG:
+ if log_level >= logging.WARNING:
      import warnings
      from sqlalchemy.exc import SAWarning

      warnings.filterwarnings(
          "ignore", category=SAWarning, module="dlt.destinations.impl.sqlalchemy.merge_job"
      )
      warnings.filterwarnings(
          "ignore", category=SAWarning, module="dlt.destinations.impl.sqlalchemy.load_jobs"
      )
```

This ensures you don't unintentionally swallow informative messages.
cognee/exceptions/exceptions.py (1)
38-44: ⚠️ Potential issue

Child exception classes don't forward new logging parameters.

ServiceError and other subclasses don't pass the new `log` and `log_level` parameters to the parent constructor, which means the new logging controls won't work for these exception types. Update the subclass constructors to forward the new parameters:

```diff
 def __init__(
     self,
     message: str = "Service is unavailable.",
     name: str = "ServiceError",
     status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
+    log=True,
+    log_level="ERROR",
 ):
-    super().__init__(message, name, status_code)
+    super().__init__(message, name, status_code, log, log_level)
```

This change should be applied to all child exception classes (ServiceError, InvalidValueError, InvalidAttributeError).
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
210-237: ⚠️ Potential issue

`closest_items` type confusion causes attribute errors.

`session.execute(query)` returns a SQLAlchemy `Result`. Iterating over `closest_items.all()` yields `Row` objects whose columns are available via index or key access (e.g. `row[0]` or `row.PGVectorDataPoint`), not via attributes on the row itself. Therefore `vector.id` on line 244 will raise `AttributeError`.

Suggested fix:

```diff
-            for vector in closest_items.all():
-                vector_list.append(
-                    {
-                        "id": parse_id(str(vector.id)),
-                        "payload": vector.payload,
-                        "_distance": vector.similarity,
-                    }
-                )
+            for row in closest_items.all():
+                data_point = row[0]  # PGVectorDataPoint instance
+                similarity = row.similarity  # labelled column
+                vector_list.append(
+                    {
+                        "id": parse_id(str(data_point.id)),
+                        "payload": data_point.payload,
+                        "_distance": similarity,
+                    }
+                )
```

cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (1)
155-205: 🛠️ Refactor suggestion

Double-closing the Qdrant client and missing `query_vector` fallback.

- `await client.close()` is invoked inside the `try` block (line 183) and again in the `finally` block (line 204). While idempotent, this is redundant and obscures intent.
- When `query_vector` is supplied but `query_text` is also given, the latter is ignored. Clarify precedence or raise `ValueError`.

Minimal patch:

```diff
@@
-        await client.close()
+        # close handled in finally
```

Optionally document parameter precedence.
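If raising is preferred over silent precedence, a small hedged sketch of an explicit guard; the parameter names come from the review, while the helper itself is hypothetical:

```python
def resolve_query(query_text=None, query_vector=None):
    # Hypothetical helper: make precedence explicit instead of silently
    # ignoring query_text when query_vector is present.
    if query_text is not None and query_vector is not None:
        raise ValueError("Provide either query_text or query_vector, not both.")
    if query_text is None and query_vector is None:
        raise ValueError("One of query_text or query_vector is required.")
    return query_text if query_text is not None else query_vector
```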
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
239-241: ⚠️ Potential issue

Exceptions are swallowed, callers receive an empty list.

Catching all exceptions and returning `[]` hides real problems (network issues, authentication, malformed queries, etc.). Let the exception propagate or wrap it in a domain-specific error; otherwise upstream code cannot distinguish "no results" from "search failed".

```diff
-        except Exception as e:
-            logger.error(f"Error in search: {str(e)}")
-            return []
+        except Exception as err:
+            logger.error("Search failed: %s", err, exc_info=True)
+            raise
```
🧹 Nitpick comments (27)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
332-334: Schema handling changed to use a fixed list instead of dynamic retrieval.

The code now uses a fixed schema list `["public", "public_staging"]` instead of dynamically retrieving all schemas with `get_schema_list()`. This makes the deletion process more predictable but might not handle custom schemas that aren't in this list.

This change aligns with the broader infrastructure updates in the PR to stabilize database schema handling across relational and graph database adapters.

Consider adding a comment explaining why a fixed list is preferred over dynamically retrieving schemas, to help future developers understand the design decision; see the sketch below.
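A minimal sketch of such an annotation, assuming the surrounding deletion logic (which the review does not show):

```python
# Deletion is intentionally restricted to a fixed, known set of schemas
# rather than the result of get_schema_list(): touching only schemas that
# Cognee itself manages keeps the operation predictable and avoids dropping
# tables from custom schemas created outside of Cognee.
schema_list = ["public", "public_staging"]
```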
examples/data/car_and_tech_companies.txt (2)
17-17: Fix grammatical number agreement in summary sentence.

The phrase "Each of these car manufacturer contributes…" mismatches plural "these" with singular "manufacturer." Apply this diff:

```diff
- Each of these car manufacturer contributes to Germany's reputation as a leader in the global automotive industry...
+ Each of these car manufacturers contributes to Germany's reputation as a leader in the global automotive industry...
```

🧰 Tools
🪛 LanguageTool
[grammar] ~17-~17: The plural determiner ‘these’ does not agree with the singular noun ‘car’. Context: ...nce practicality with quality. Each of these car manufacturer contributes to Germany's r... (THIS_NNS)
[uncategorized] ~17-~17: The grammatical number of this noun doesn’t look right. Consider replacing it. Context: ...cality with quality. Each of these car manufacturer contributes to Germany's reputation as ... (AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
1-37: Consider externalizing large static text for maintainability.

Embedding lengthy multi-line strings directly in a `.txt` file can make updates cumbersome. You might move these into a structured JSON/YAML resource (e.g., `examples/data/car_and_tech_companies.json`) and load them at runtime.
This change will:
- Improve readability of code that ingests this data
- Enable syntax validation and diffing in structured formats
- Simplify future updates and translations
Would you like assistance converting this to a JSON/YAML resource?
🧰 Tools
🪛 LanguageTool
[duplication] ~2-~2: Possible typo: you repeated a word.
Context: text_1 = """ 1. Audi Audi is known for its modern designs and adv...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~5-~5: Possible typo: you repeated a word.
Context: ...ns to high-performance sports cars. 2. BMW BMW, short for Bayerische Motoren Werke, is...(ENGLISH_WORD_REPEAT_RULE)
[style] ~6-~6: Consider using a more concise synonym.
Context: ... reflects that commitment. BMW produces a variety of cars that combine luxury with sporty pe...(A_VARIETY_OF)
[duplication] ~8-~8: Possible typo: you repeated a word.
Context: ...ine luxury with sporty performance. 3. Mercedes-Benz Mercedes-Benz is synonymous with luxury and quality. ...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~11-~11: Possible typo: you repeated a word.
Context: ... catering to a wide range of needs. 4. Porsche Porsche is a name that stands for high-performa...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~14-~14: Possible typo: you repeated a word.
Context: ...o value both performance and style. 5. Volkswagen Volkswagen, which means "people's car" in German, ...(ENGLISH_WORD_REPEAT_RULE)
[grammar] ~17-~17: The plural determiner ‘these’ does not agree with the singular noun ‘car’.
Context: ...nce practicality with quality. Each of these car manufacturer contributes to Germany's r...(THIS_NNS)
[uncategorized] ~17-~17: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...cality with quality. Each of these car manufacturer contributes to Germany's reputation as ...(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
[duplication] ~21-~21: Possible typo: you repeated a word.
Context: ...design excellence. """ text_2 = """ 1. Apple Apple is renowned for its innovative consumer...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~27-~27: Possible typo: you repeated a word.
Context: ... in shaping the internet landscape. 3. Microsoft Microsoft Corporation has been a dominant force i...(ENGLISH_WORD_REPEAT_RULE)
[style] ~28-~28: Consider using a synonym to be more concise.
Context: ...n both business and personal computing. In recent years, Microsoft has expanded into cloud comp...(IN_RECENT_STYLE)
[uncategorized] ~31-~31: You might be missing the article “the” here.
Context: ...or innovation continues to reshape both retail and technology sectors. 5. Meta Meta, ...(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[duplication] ~33-~33: Possible typo: you repeated a word.
Context: ...both retail and technology sectors. 5. Meta Meta, originally known as Facebook, revoluti...(ENGLISH_WORD_REPEAT_RULE)
cognee/tests/unit/modules/pipelines/run_tasks_test.py (1)
53-55: Optional: Reevaluate the `__main__` guard for test consistency.

Adding:

```python
if __name__ == "__main__":
    test_run_tasks()
```

enables standalone execution but is nonstandard when using frameworks like `pytest` or the `unittest` CLI. Consider relying solely on your test runner for consistency across CI and local runs.

cognee/modules/pipelines/operations/__init__.py (1)
1-1: Update the import to follow package conventions.

The newly added import `log_pipeline_run_initiated` is not being used within this file and isn't exposed through an `__all__` declaration. This creates "dead imports" as flagged by the static analysis tool. Choose one of these approaches:

- Add an `__all__` list to explicitly define the public API:

```diff
 from .log_pipeline_run_initiated import log_pipeline_run_initiated
 from .log_pipeline_run_start import log_pipeline_run_start
 from .log_pipeline_run_complete import log_pipeline_run_complete
 from .log_pipeline_run_error import log_pipeline_run_error
 from .pipeline import cognee_pipeline

+__all__ = [
+    "log_pipeline_run_initiated",
+    "log_pipeline_run_start",
+    "log_pipeline_run_complete",
+    "log_pipeline_run_error",
+    "cognee_pipeline",
+]
```

- Or add a comment explaining the import pattern if this is an intentional design choice for implicit exports.

🧰 Tools
🪛 Ruff (0.8.2)
1-1: `.log_pipeline_run_initiated.log_pipeline_run_initiated` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
examples/python/graphiti_example.py (1)
31-33: User context integration is correct but could use clarification.

Adding user initialization to align with updated pipeline requirements is good. Consider adding a brief comment explaining why user context is necessary for task execution in the updated system:

```diff
 # Initialize default user
 user = await get_default_user()
+# User context is required for pipeline task execution in the updated system
```
cognee/modules/retrieval/exceptions/__init__.py (1)
7-7: Consider addressing the unused imports in the module.

The static analysis tool has identified that `SearchTypeNotSupported` and `CypherSearchError` are imported but not used within this module. If the intent is to re-export these exceptions at the package level, consider adding them to an `__all__` list:

```diff
 from .exceptions import SearchTypeNotSupported, CypherSearchError

+__all__ = ["SearchTypeNotSupported", "CypherSearchError"]
```

🧰 Tools
🪛 Ruff (0.8.2)
7-7: `.exceptions.SearchTypeNotSupported` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
7-7: `.exceptions.CypherSearchError` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
cognee/modules/observability/get_observe.py (1)
1-11: Add docstring for better documentation.

The function lacks a docstring explaining its purpose, parameters, return values, and possible exceptions. Since this is a utility function that will be used throughout the codebase, proper documentation is essential. Add a docstring to clarify the function's purpose:

```diff
 def get_observe():
+    """
+    Get the appropriate observe decorator based on the configured monitoring tool.
+
+    Returns:
+        The observe decorator from the configured monitoring tool package.
+
+    Raises:
+        ValueError: If the configured monitoring tool is not supported.
+    """
     monitoring = get_base_config().monitoring_tool

     if monitoring == Observer.LANGFUSE:
         from langfuse.decorators import observe

         return observe
```

cognee/modules/observability/observers.py (1)
7-9: Consider adding docstrings for individual enum values.

While the class has a brief docstring, adding documentation for each enum value would improve clarity about what each monitoring tool represents and potentially any specific requirements or behaviors associated with it:

```diff
 class Observer(str, Enum):
     """Monitoring tools"""

-    LANGFUSE = "langfuse"
-    LLMLITE = "llmlite"
-    LANGSMITH = "langsmith"
+    LANGFUSE = "langfuse"  # Langfuse observability platform
+    LLMLITE = "llmlite"  # LLMLite monitoring tool
+    LANGSMITH = "langsmith"  # LangSmith tracing and evaluation platform
```

cognee/tests/test_neo4j.py (1)
77-86: Consider adding a TODO to revisit this test.

While commenting out the test with a clear explanation is reasonable, it would be helpful to add a TODO or FIXME to ensure this test is revisited in the future when the issue with weak LLM models is addressed:

```diff
-# NOTE: Due to the test failing often on weak LLM models we've removed this test for now
+# TODO: Revisit this test when more reliable LLM models are available or implement a more robust test
+# NOTE: Test removed temporarily as it fails frequently on weaker LLM models
```

cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (2)
6-14: Add docstring to improve function documentation.

The function lacks documentation explaining its purpose, parameters, and return value. Adding a docstring would improve code maintainability:

```diff
 async def log_pipeline_run_initiated(pipeline_id: str, pipeline_name: str, dataset_id: UUID):
+    """
+    Creates and persists a new pipeline run record indicating the initiation of dataset processing.
+
+    Args:
+        pipeline_id: The ID of the pipeline
+        pipeline_name: The name of the pipeline
+        dataset_id: The UUID of the dataset being processed
+
+    Returns:
+        PipelineRun: The created pipeline run object
+    """
     pipeline_run = PipelineRun(
         pipeline_run_id=uuid4(),
         pipeline_name=pipeline_name,
```

13-14: Consider adding basic metadata to initial run_info.

The run_info is initialized as an empty dictionary. Consider adding basic metadata like timestamp or initiator to provide more context (note that this suggestion assumes `datetime` and `os` are imported in the module):

```diff
     pipeline_id=pipeline_id,
     status=PipelineRunStatus.DATASET_PROCESSING_INITIATED,
     dataset_id=dataset_id,
-    run_info={},
+    run_info={
+        "initiated_at": datetime.datetime.utcnow().isoformat(),
+        "environment": os.environ.get("ENVIRONMENT", "unknown"),
+    },
 )
```

cognee/infrastructure/databases/graph/get_graph_engine.py (1)
71-72: Consider more specific error messages for missing credentials.

The error message could be more specific about which Memgraph credential is missing to aid in debugging configuration issues:

```diff
 elif graph_database_provider == "memgraph":
-    if not (graph_database_url and graph_database_username and graph_database_password):
-        raise EnvironmentError("Missing required Memgraph credentials.")
+    missing_credentials = []
+    if not graph_database_url:
+        missing_credentials.append("graph_database_url")
+    if not graph_database_username:
+        missing_credentials.append("graph_database_username")
+    if not graph_database_password:
+        missing_credentials.append("graph_database_password")
+
+    if missing_credentials:
+        raise EnvironmentError(f"Missing required Memgraph credentials: {', '.join(missing_credentials)}")
```

cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (1)
175-190: Clean up unused search type handling.

The component no longer includes "INSIGHTS" in the search options (as mentioned in the AI summary), but the `convertToSearchTypeOutput` function still handles cases for "INSIGHTS", "SUMMARIES", and "CHUNKS" which may no longer be used. Consider cleaning up the unused cases in this switch statement to match the available search options:

```diff
 switch (searchType) {
-  case 'INSIGHTS':
-    return systemMessages.map((message: InsightMessage) => {
-      const [node1, relationship, node2] = message;
-      if (node1.name && node2.name) {
-        return `${node1.name} ${relationship.relationship_name} ${node2.name}.`;
-      }
-      return '';
-    }).join('\n');
-  case 'SUMMARIES':
-    return systemMessages.map((message: { text: string }) => message.text).join('\n');
-  case 'CHUNKS':
-    return systemMessages.map((message: { text: string }) => message.text).join('\n');
   default:
     return "";
 }
```

cognee/exceptions/exceptions.py (1)
23-30: Consider validating the log_level parameter.

The code doesn't validate that log_level is one of the supported values. An invalid value would result in silent failure (no logging) without any error. Consider adding validation (warn before reassigning, so the invalid value is actually reported):

```diff
 def __init__(
     self,
     message: str = "Service is unavailable.",
     name: str = "Cognee",
     status_code=status.HTTP_418_IM_A_TEAPOT,
     log=True,
-    log_level="ERROR",
+    log_level: Literal["ERROR", "WARNING", "INFO", "DEBUG"] = "ERROR",
 ):
     self.message = message
     self.name = name
     self.status_code = status_code

+    valid_log_levels = ["ERROR", "WARNING", "INFO", "DEBUG"]
+    if log and log_level not in valid_log_levels:
+        logger.warning(f"Invalid log_level '{log_level}'. Using 'ERROR' instead.")
+        log_level = "ERROR"

     # Automatically log the exception details
     if log and (log_level == "ERROR"):
```

You'll need to add `from typing import Literal` at the top of the file.

cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (2)
60-70: Good implementation of button state management.

The implementation correctly:

- Uses the useBoolean hook for managing button state
- Disables the button during the async operation
- Ensures the button is re-enabled in the finally block

Consider adding error handling to display feedback to the user if the cognify operation fails:

```diff
 const handleCognify = () => {
   disableCognifyButton();

   onCognify()
+    .catch(error => {
+      // Display error to user, e.g. with a toast notification
+      console.error('Cognify operation failed:', error);
+    })
     .finally(() => enableCognifyButton());
 };
```

36-36: Component name shadows global property.

The component name 'DataView' shadows a global property, which could lead to confusion. Consider renaming the component to avoid shadowing:

```diff
-export default function DataView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {
+export default function DatasetView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {
```

Remember to update the component name in any importing files as well.

🧰 Tools
🪛 Biome (1.9.4)
[error] 36-36: Do not shadow the global "DataView" property. Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global. (lint/suspicious/noShadowRestrictedNames)
cognee/modules/pipelines/operations/run_tasks_base.py (1)
31-33: Optimize dictionary key lookup.

The current implementation uses `.keys()`, which creates an unnecessary view object when checking for key existence:

```diff
-    has_context = any(
-        [key == "context" for key in inspect.signature(running_task.executable).parameters.keys()]
-    )
+    has_context = "context" in inspect.signature(running_task.executable).parameters
```

🧰 Tools
🪛 Ruff (0.8.2)
32-32: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
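For context, a hedged sketch of what this detection enables: a task opts in to pipeline context simply by declaring the parameter. The context payload itself is not shown in this review, so only the signature check is demonstrated:

```python
import inspect


async def summarize(chunks):  # does not receive pipeline context
    ...


async def enrich(chunks, context=None):  # opts in by declaring the parameter
    ...


# Mirrors the detection in run_tasks_base.
print("context" in inspect.signature(enrich).parameters)     # True
print("context" in inspect.signature(summarize).parameters)  # False
```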
Dockerfile (1)
33-34: Consider consolidating dependency installation.

The dependency installation is performed twice with the same parameters. Consider consolidating into a single installation after copying all source code, or clearly documenting why two separate installations are necessary.

Also applies to: 43-44
cognee/tests/test_memgraph.py (2)
14-14: Add docstring to the main function.

Consider adding a docstring to explain the purpose and flow of this integration test for better clarity and documentation:

```diff
 async def main():
+    """
+    Integration test for Memgraph adapter in Cognee.
+
+    This test configures Cognee with Memgraph, loads test data, performs
+    cognification, executes various search queries, verifies history tracking,
+    and performs cleanup to ensure proper isolation.
+    """
     cognee.config.set_graph_database_provider("memgraph")
```

57-60: Enhance search result assertions.

The current assertions only check if results are non-empty. Consider adding more specific assertions to verify the quality and correctness of search results, such as checking for expected content or structural properties:

```diff
 search_results = await cognee.search(
     query_type=SearchType.INSIGHTS, query_text=random_node_name
 )
 assert len(search_results) != 0, "The search results list is empty."
+
+# Verify result quality
+assert any(
+    "quantum" in str(result).lower() for result in search_results
+), "Expected 'quantum' keyword not found in results"
```

Also applies to: 65-69, 71-77, 79-86
cognee/modules/pipelines/operations/pipeline.py (1)
173-175: Mismatch between validation logic and error message.

The validation currently rejects dots (".") and spaces (" ") in a dataset name, yet the error message claims that underscores are disallowed. This will certainly confuse users and may lead to unnecessary support questions.

```diff
 if "." in dataset_name or " " in dataset_name:
-    raise ValueError("Dataset name cannot contain spaces or underscores")
+    raise ValueError("Dataset name cannot contain spaces or dots (.)")
```

cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
250-253: Early return path is good but misses `with_vector` flag.

When no results are found the method returns an empty list, which is great. However, if `with_vector=True` the caller might still expect the vector field in each `ScoredResult`. Consider documenting this behaviour or returning a structure that makes the absence explicit.

cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (1)
37-47: Repeated `connect()` on every call; cache connection status.

`await self.client.connect()` is executed on each `get_client()` invocation. If the SDK treats `connect()` as idempotent this is only wasted latency; if not, it may raise `AlreadyConnected` or leak resources.

```diff
 async def get_client(self):
-    await self.client.connect()
-    return self.client
+    if not getattr(self.client, "_is_connected", False):
+        await self.client.connect()
+        # flag set by us; adjust to real SDK attribute if it exists
+        self.client._is_connected = True
+    return self.client
```

cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
205-211: Serial deletes can be slow; a batch predicate is simpler.

Looping one-by-one round-trips to delete rows may crawl on large lists. Since IDs are primary keys, a single predicate delete is supported:

```diff
-for data_point_id in data_point_ids:
-    await collection.delete(f"id = '{data_point_id}'")
+ids = "', '".join(data_point_ids)
+await collection.delete(f"id IN ('{ids}')")
```

This cuts network chatter and lock contention.
cognee-mcp/src/server.py (1)
79-86: `all(results)` is an unreliable success criterion.

`run_code_graph_pipeline` appears to yield complex objects; relying on truthiness may mark partial failures as success (e.g., non-empty dicts are truthy). Consider one of the following (the second option is sketched after this list):

- Inspect returned objects for explicit success flags, or
- Catch exceptions inside the loop instead of inferring success by truthiness, or
- Log and propagate the first exception.

Without this, the "Codify process failed" branch may never execute even when errors occur.
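A minimal sketch of the second option, assuming `run_code_graph_pipeline` is an async generator and `logger` is the module's logger; both names appear in the review context, but the exact signatures are assumptions:

```python
async def codify(repo_path: str) -> str:
    # Treat a raised exception, not result truthiness, as the failure signal.
    try:
        async for _result in run_code_graph_pipeline(repo_path):
            pass  # consume results; partially-empty objects no longer decide success
        return "Codify process finished."
    except Exception as error:
        logger.error("Codify process failed: %s", error, exc_info=True)
        return "Codify process failed."
```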
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)
66-73: Added defensive error handling for entity not found.

The code now gracefully handles cases where entities don't exist in the database by catching `EntityNotFoundError` and continuing execution rather than failing. This improves robustness when working with incomplete data.

Consider using `contextlib.suppress(EntityNotFoundError)` for a more concise implementation:

```diff
-    try:
-        await memory_fragment.project_graph_from_db(
-            graph_engine,
-            node_properties_to_project=properties_to_project,
-            edge_properties_to_project=["relationship_name"],
-        )
-    except EntityNotFoundError:
-        pass
+    from contextlib import suppress
+
+    with suppress(EntityNotFoundError):
+        await memory_fragment.project_graph_from_db(
+            graph_engine,
+            node_properties_to_project=properties_to_project,
+            edge_properties_to_project=["relationship_name"],
+        )
```

🧰 Tools
🪛 Ruff (0.8.2)
66-73: Use `contextlib.suppress(EntityNotFoundError)` instead of `try`-`except`-`pass` (SIM105)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (10)

- .github/actions/cognee_setup/action.yml is excluded by !**/*.yml
- .github/workflows/python_version_tests.yml is excluded by !**/*.yml
- .github/workflows/test_memgraph.yml is excluded by !**/*.yml
- assets/cognee_benefits.png is excluded by !**/*.png
- assets/graph_visualization.png is excluded by !**/*.png
- cognee-mcp/pyproject.toml is excluded by !**/*.toml
- cognee-mcp/uv.lock is excluded by !**/*.lock
- poetry.lock is excluded by !**/*.lock
- pyproject.toml is excluded by !**/*.toml
- uv.lock is excluded by !**/*.lock
📒 Files selected for processing (78)

- CONTRIBUTING.md (1 hunks)
- Dockerfile (1 hunks)
- README.md (2 hunks)
- assets/graph_visualization.html (0 hunks)
- cognee-frontend/src/app/page.tsx (3 hunks)
- cognee-frontend/src/app/wizard/CognifyStep/CognifyStep.tsx (1 hunks)
- cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (1 hunks)
- cognee-frontend/src/app/wizard/WizardPage.tsx (1 hunks)
- cognee-frontend/src/modules/datasets/cognifyDataset.ts (1 hunks)
- cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts (1 hunks)
- cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (4 hunks)
- cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx (1 hunks)
- cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (2 hunks)
- cognee-frontend/src/utils/fetch.ts (1 hunks)
- cognee-mcp/src/server.py (3 hunks)
- cognee/api/v1/add/add.py (1 hunks)
- cognee/api/v1/cognify/code_graph_pipeline.py (3 hunks)
- cognee/api/v1/cognify/cognify.py (1 hunks)
- cognee/api/v1/datasets/datasets.py (1 hunks)
- cognee/base_config.py (1 hunks)
- cognee/exceptions/exceptions.py (1 hunks)
- cognee/infrastructure/databases/graph/get_graph_engine.py (1 hunks)
- cognee/infrastructure/databases/graph/graph_db_interface.py (2 hunks)
- cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1 hunks)
- cognee/infrastructure/databases/graph/networkx/adapter.py (7 hunks)
- cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2 hunks)
- cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (8 hunks)
- cognee/infrastructure/databases/vector/exceptions/exceptions.py (1 hunks)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (8 hunks)
- cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (9 hunks)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (5 hunks)
- cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (6 hunks)
- cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (9 hunks)
- cognee/infrastructure/llm/gemini/adapter.py (1 hunks)
- cognee/infrastructure/llm/openai/adapter.py (2 hunks)
- cognee/modules/data/methods/__init__.py (1 hunks)
- cognee/modules/data/methods/create_dataset.py (2 hunks)
- cognee/modules/data/methods/get_unique_dataset_id.py (1 hunks)
- cognee/modules/graph/cognee_graph/CogneeGraph.py (1 hunks)
- cognee/modules/observability/get_observe.py (1 hunks)
- cognee/modules/observability/observers.py (1 hunks)
- cognee/modules/pipelines/models/PipelineRun.py (1 hunks)
- cognee/modules/pipelines/operations/__init__.py (1 hunks)
- cognee/modules/pipelines/operations/get_pipeline_status.py (2 hunks)
- cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1 hunks)
- cognee/modules/pipelines/operations/pipeline.py (3 hunks)
- cognee/modules/pipelines/operations/run_tasks.py (4 hunks)
- cognee/modules/pipelines/operations/run_tasks_base.py (4 hunks)
- cognee/modules/retrieval/exceptions/__init__.py (1 hunks)
- cognee/modules/retrieval/exceptions/exceptions.py (0 hunks)
- cognee/modules/retrieval/graph_completion_retriever.py (1 hunks)
- cognee/modules/retrieval/utils/brute_force_triplet_search.py (4 hunks)
- cognee/modules/settings/get_settings.py (2 hunks)
- cognee/shared/data_models.py (0 hunks)
- cognee/shared/logging_utils.py (1 hunks)
- cognee/tasks/ingestion/ingest_data.py (1 hunks)
- cognee/tasks/temporal_awareness/index_graphiti_objects.py (2 hunks)
- cognee/tests/integration/run_toy_tasks/conftest.py (0 hunks)
- cognee/tests/test_memgraph.py (1 hunks)
- cognee/tests/test_neo4j.py (1 hunks)
- cognee/tests/test_weaviate.py (1 hunks)
- cognee/tests/unit/modules/pipelines/run_tasks_test.py (1 hunks)
- cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (1 hunks)
- cognee/tests/unit/modules/retrieval/chunks_retriever_test.py (4 hunks)
- cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py (1 hunks)
- cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1 hunks)
- cognee/tests/unit/modules/retrieval/utils/brute_force_triplet_search_test.py (0 hunks)
- entrypoint.sh (3 hunks)
- examples/data/car_and_tech_companies.txt (1 hunks)
- examples/python/graphiti_example.py (2 hunks)
- notebooks/cognee_demo.ipynb (3 hunks)
- notebooks/cognee_graphiti_demo.ipynb (4 hunks)
- notebooks/cognee_llama_index.ipynb (2 hunks)
- notebooks/cognee_simple_demo.ipynb (7 hunks)
- notebooks/github_graph_visualization.html (0 hunks)
- notebooks/graphrag_vs_rag.ipynb (7 hunks)
- notebooks/hr_demo.ipynb (0 hunks)
- notebooks/llama_index_cognee_integration.ipynb (5 hunks)
💤 Files with no reviewable changes (7)
- cognee/modules/retrieval/exceptions/exceptions.py
- cognee/shared/data_models.py
- cognee/tests/unit/modules/retrieval/utils/brute_force_triplet_search_test.py
- cognee/tests/integration/run_toy_tasks/conftest.py
- notebooks/github_graph_visualization.html
- assets/graph_visualization.html
- notebooks/hr_demo.ipynb
🧰 Additional context used
🧬 Code Graph Analysis (26)
cognee/tests/unit/modules/pipelines/run_tasks_test.py (1)
- cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (1): test_run_tasks (42-43)

cognee/modules/data/methods/get_unique_dataset_id.py (1)
- cognee/modules/users/models/User.py (1): User (12-39)

cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1)
- cognee/modules/retrieval/summaries_retriever.py (1): SummariesRetriever (9-33)

cognee/api/v1/cognify/cognify.py (2)
- cognee/modules/pipelines/operations/pipeline.py (1): cognee_pipeline (30-103)
- cognee/api/v1/datasets/datasets.py (1): datasets (7-40)

cognee/infrastructure/llm/openai/adapter.py (1)
- cognee/modules/observability/get_observe.py (1): get_observe (5-11)

cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (1)
- cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx (1): Explorer (15-61)

cognee/tests/test_neo4j.py (1)
- cognee/modules/users/methods/get_default_user.py (1): get_default_user (12-37)

cognee/modules/data/methods/__init__.py (1)
- cognee/modules/data/methods/get_unique_dataset_id.py (1): get_unique_dataset_id (5-6)

cognee/tasks/ingestion/ingest_data.py (1)
- cognee/modules/data/methods/create_dataset.py (1): create_dataset (11-33)

cognee/modules/pipelines/operations/get_pipeline_status.py (1)
- cognee/modules/pipelines/models/PipelineRun.py (1): PipelineRun (15-27)

cognee/modules/pipelines/operations/__init__.py (1)
- cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1): log_pipeline_run_initiated (6-22)

cognee/modules/observability/get_observe.py (2)
- cognee/base_config.py (1): get_base_config (29-30)
- cognee/modules/observability/observers.py (1): Observer (4-9)

cognee/infrastructure/databases/graph/get_graph_engine.py (1)
- cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1): MemgraphAdapter (20-690)

cognee/base_config.py (1)
- cognee/modules/observability/observers.py (1): Observer (4-9)

cognee-frontend/src/modules/datasets/cognifyDataset.ts (1)
- cognee-frontend/src/utils/fetch.ts (1): fetch (3-12)

cognee/api/v1/datasets/datasets.py (1)
- cognee/modules/pipelines/operations/get_pipeline_status.py (1): get_pipeline_status (8-35)

cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (5)
- cognee/modules/pipelines/tasks/task.py (1): Task (5-97)
- cognee/modules/users/methods/get_default_user.py (1): get_default_user (12-37)
- cognee/modules/pipelines/operations/run_tasks_base.py (1): run_tasks_base (66-82)
- cognee/infrastructure/databases/relational/create_db_and_tables.py (1): create_db_and_tables (4-7)
- cognee/tests/unit/modules/pipelines/run_tasks_test.py (1): run_and_check_tasks (10-46)

cognee/tests/unit/modules/retrieval/chunks_retriever_test.py (1)
- cognee/api/v1/config/config.py (2): config (15-194), system_root_directory (17-29)

cognee/tests/test_weaviate.py (1)
- cognee/infrastructure/databases/vector/get_vector_engine.py (1): get_vector_engine (5-6)

cognee/tests/test_memgraph.py (7)
- cognee/modules/users/methods/get_default_user.py (1): get_default_user (12-37)
- cognee/shared/logging_utils.py (1): get_logger (137-158)
- cognee/modules/search/types/SearchType.py (1): SearchType (4-13)
- cognee/api/v1/config/config.py (4): config (15-194), set_graph_database_provider (57-59), data_root_directory (32-34), system_root_directory (17-29)
- cognee/infrastructure/databases/vector/get_vector_engine.py (1): get_vector_engine (5-6)
- cognee/infrastructure/databases/graph/get_graph_engine.py (1): get_graph_engine (9-21)
- cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1): get_graph_data (455-483)

cognee/modules/data/methods/create_dataset.py (2)
- cognee/modules/data/methods/get_unique_dataset_id.py (1): get_unique_dataset_id (5-6)
- cognee/modules/users/models/User.py (1): User (12-39)

cognee/exceptions/exceptions.py (1)
- cognee/shared/logging_utils.py (4): error (127-128), warning (124-125), info (121-122), debug (133-134)

cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2)
- cognee/infrastructure/databases/vector/exceptions/exceptions.py (1): CollectionNotFoundError (5-14)
- cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1): get_async_session (34-40)

cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (5)
- cognee/shared/logging_utils.py (1): get_logger (137-158)
- cognee/exceptions/exceptions.py (1): InvalidValueError (47-54)
- cognee/infrastructure/engine/models/DataPoint.py (1): DataPoint (16-96)
- cognee/infrastructure/engine/utils/parse_id.py (1): parse_id (4-10)
- cognee/infrastructure/databases/vector/exceptions/exceptions.py (1): CollectionNotFoundError (5-14)

cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (5)
- cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (4): get_collection (121-126), has_collection (111-113), get_connection (99-106), create_data_points (128-144)
- cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (3): get_collection (75-80), has_collection (51-53), create_data_points (82-132)
- cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (2): has_collection (74-78), create_data_points (99-130)
- cognee/infrastructure/databases/vector/exceptions/exceptions.py (1): CollectionNotFoundError (5-14)
- cognee/infrastructure/engine/models/DataPoint.py (1): DataPoint (16-96)

cognee/infrastructure/databases/graph/networkx/adapter.py (4)
- cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (7): has_node (66-75), get_edges (264-275), extract_node (121-124), extract_nodes (126-136), get_neighbors (381-383), get_node (385-392), get_nodes (394-402)
- cognee/infrastructure/databases/graph/kuzu/adapter.py (7): has_node (167-171), get_edges (439-475), extract_node (284-304), extract_nodes (306-325), get_neighbors (479-481), get_node (483-502), get_nodes (504-521)
- cognee/infrastructure/databases/graph/graph_db_interface.py (4): get_edges (177-179), get_neighbors (182-184), get_node (125-127), get_nodes (130-132)
- cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (2): extract_node (235-238), extract_nodes (240-241)
🪛 Ruff (0.8.2)

cognee/modules/retrieval/exceptions/__init__.py
- 7-7: `.exceptions.SearchTypeNotSupported` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
- 7-7: `.exceptions.CypherSearchError` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)

cognee/modules/data/methods/__init__.py
- 10-10: `.get_unique_dataset_id.get_unique_dataset_id` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)

cognee/modules/pipelines/operations/__init__.py
- 1-1: `.log_pipeline_run_initiated.log_pipeline_run_initiated` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)

cognee/modules/retrieval/utils/brute_force_triplet_search.py
- 66-73: Use `contextlib.suppress(EntityNotFoundError)` instead of `try`-`except`-`pass` (SIM105)

cognee/modules/pipelines/operations/run_tasks_base.py
- 32-32: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)

cognee-mcp/src/server.py
- 48-48: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py
- 430-430: Do not use mutable data structures for argument defaults; replace with `None` and initialize within the function (B006)
🔇 Additional comments (119)
cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1)
130-130: Good parameter naming alignment with the implementation.

The change from `limit=20` to `top_k=20` correctly aligns the test with the actual implementation of `SummariesRetriever`, which accepts `top_k` as a constructor parameter and then uses it internally as the `limit` parameter when calling the vector engine. This change is part of the broader standardization of parameter naming conventions across retrieval modules.

cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py (1)
157-162: Good refactoring of test execution pattern!

Converting individual test calls to a sequential async main function improves test organization and efficiency. This approach:
- Ensures tests run in a predictable order
- Uses a single event loop for all tests rather than creating/destroying multiple loops
- Follows the async/await pattern consistently throughout the test file
- Matches the pattern used in other retrieval test modules
cognee/tests/unit/modules/retrieval/chunks_retriever_test.py (4)
19-19: Good test isolation improvement.

Using a test-specific directory path (`test_chunks_context_simple`) provides better isolation between test cases and prevents data leakage between tests.

Also applies to: 23-23

76-77: Good test isolation improvement.

Using a test-specific directory path (`test_chunk_context_complex`) ensures that this test's data doesn't interfere with other tests. This also makes debugging easier when a specific test fails.

Also applies to: 80-81

165-166: Good test isolation improvement.

Using a dedicated directory path (`test_chunk_context_empty`) for the empty graph test ensures accurate testing of the empty state behavior without interference from other test data.

Also applies to: 169-170

193-198: Good refactoring of test execution pattern!

Converting multiple individual test calls to a single async main function is an improvement that:
- Ensures consistent use of the async/await pattern
- Prevents multiple event loop creation/destruction cycles
- Makes test execution more predictable and maintainable
- Aligns with the pattern used in other retrieval test modules
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
72-72: Consistency fix in comment formatting.

Fixed the apostrophe style in a comment from curly to straight, improving code consistency.
cognee/modules/pipelines/models/PipelineRun.py (1)
9-9: Verify database migration for new enum member.

Adding `DATASET_PROCESSING_INITIATED` to `PipelineRunStatus` may require an Alembic migration if using a PostgreSQL `ENUM` type. Please confirm that your migration scripts include this new member, or update the type with an `ALTER TYPE` statement, as sketched below.
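A minimal Alembic sketch of the `ALTER TYPE` approach. The PostgreSQL enum type name `pipelinerunstatus` and the stored value string are assumptions (SQLAlchemy's default lowercased class name); match them to the actual schema:

```python
"""Add DATASET_PROCESSING_INITIATED to the pipeline run status enum."""

from alembic import op

# Revision identifiers below are placeholders.
revision = "xxxxxxxxxxxx"
down_revision = "yyyyyyyyyyyy"


def upgrade() -> None:
    # IF NOT EXISTS keeps the migration idempotent; PostgreSQL 12+ also
    # allows ADD VALUE inside a transaction.
    op.execute(
        "ALTER TYPE pipelinerunstatus ADD VALUE IF NOT EXISTS 'DATASET_PROCESSING_INITIATED'"
    )


def downgrade() -> None:
    # PostgreSQL cannot drop an enum value; recreate the type if a true
    # downgrade is ever needed.
    pass
```

cognee-frontend/src/utils/fetch.ts (1)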
4-4: LGTM on URL standardization.

Changing from an IP address to a hostname is a good standardization practice. The functionality remains the same, as both "127.0.0.1" and "localhost" resolve to the local machine. For future improvements, consider using environment variables for the API base URL:

```diff
- return global.fetch('http://localhost:8000/api' + url, {
+ return global.fetch((process.env.NEXT_PUBLIC_API_BASE_URL || 'http://localhost:8000/api') + url, {
```

This would make the application more configurable across different environments.
For future improvements, consider using environment variables for the API base URL:
- return global.fetch('http://localhost:8000/api' + url, { + return global.fetch((process.env.NEXT_PUBLIC_API_BASE_URL || 'http://localhost:8000/api') + url, {This would make the application more configurable across different environments.
cognee/tests/test_weaviate.py (1)
85-85
: LGTM on adding await keywordThe addition of the
await
keyword is necessary and correct since thelist_all()
method has been converted to an asynchronous function as part of the broader WeaviateAdapter refactoring.This change ensures the test properly waits for the asynchronous operation to complete before asserting that the collections list is empty.
cognee-frontend/src/app/wizard/WizardPage.tsx (1)
32-32
: LGTM on standardizing dataset identificationChanging from
id
toname
aligns with the broader standardization of dataset identification throughout the frontend codebase.This change is consistent with updates in related components like CognifyStep and ExploreStep, which now expect a dataset prop with a
name
property instead of anid
property.cognee/api/v1/cognify/cognify.py (1)
37-39: Pipeline naming standardization looks good.

The addition of the `pipeline_name="cognify_pipeline"` parameter helps standardize pipeline identification across the system, allowing for better tracking and filtering of pipeline executions. This aligns well with the changes in the datasets module, where the same pipeline name is used for status retrieval; a sketch of the resulting call shape follows.
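A hedged sketch of the standardized call shape, combining the signature shown in the sequence diagram above with the parameter named here; keyword usage and awaitability are assumptions:

```python
# Sketch only: argument names follow cognee_pipeline(tasks, datasets, user, pipeline_name).
pipeline_run = await cognee_pipeline(
    tasks=tasks,
    datasets=datasets,
    user=user,
    pipeline_name="cognify_pipeline",  # a stable name enables per-pipeline status lookups
)
```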
10-10
: Dataset identification change is aligned with broader architecture updatesChanging the
dataset
property from{ id: string }
to{ name: string }
reflects a shift in how datasets are identified throughout the application. This is consistent with similar changes in other frontend components. The Explorer component correctly handles this change since it only passes the dataset object to other functions without directly accessing specific properties.examples/python/graphiti_example.py (1)
41-41
: Pipeline execution with user context looks goodThe updated
run_tasks
call with the user parameter aligns with the broader updates to the pipeline task execution system.cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (2)
5-5
: LGTM! Dataset identification unified by name. The dataset prop type change from { id: string } to { name: string } aligns with the broader frontend refactoring to standardize dataset identification by name rather than ID.
11-11
: Compatible with corresponding Explorer component changes. Removed the non-null assertion operator (!) when passing the dataset prop to the Explorer component, which is consistent with the updated type system and how Explorer handles the dataset prop.
README.md (2)
61-61
: LGTM! Updated Colab notebook link. The Google Colab notebook link has been updated to a new URL, ensuring users have access to the latest getting-started materials.
128-135
: Great addition highlighting the new UI capabilities. The new "cognee UI" section enhances the README by providing users with information about the visual interface, which aligns with the broader frontend improvements in this PR. The screenshot and local setup link make it easy for users to get started with the UI.
cognee/api/v1/add/add.py (1)
2-6
: Improved import organization. The reordering of imports provides better logical grouping of related modules, improving code readability:
- Pipeline imports grouped together
- User model below Task
- Task operations below pipeline definition
No functional changes were made to the code logic, making this a safe improvement.
cognee/infrastructure/llm/openai/adapter.py (2)
5-5
: Simplified imports. Removed unused imports and kept only the essential Type import from the typing module, improving code cleanliness.
18-20
: Centralized observability implementation. Replaced conditional import logic with a centralized approach using the new get_observe() function. This change:
- Improves code consistency across different LLM adapters
- Centralizes monitoring tool configuration logic
- Makes the code more maintainable by reducing duplicated conditional logic
This is a good architectural improvement that aligns with the broader observability refactoring in the codebase.
cognee/modules/retrieval/graph_completion_retriever.py (1)
78-78
:✅ Verification successful
Error handling pattern has changed.
The try-except block catching EntityNotFoundError has been removed. Ensure that error handling for entity-not-found scenarios is now properly managed in the get_triplets method, or that this error condition is no longer relevant for this code path.
🏁 Script executed:
```bash
#!/bin/bash
# Search for EntityNotFoundError handling in the get_triplets method or brute_force_triplet_search
rg -A 5 "EntityNotFoundError" cognee/modules/retrieval/utils/brute_force_triplet_search.py
```
Length of output: 593
🏁 Script executed:
```bash
#!/bin/bash
echo "1. Locating get_triplets definition in graph_completion_retriever.py"
rg -n "def get_triplets" cognee/modules/retrieval/graph_completion_retriever.py
echo "2. Showing the first 200 lines of graph_completion_retriever.py for context"
sed -n '1,200p' cognee/modules/retrieval/graph_completion_retriever.py
echo "3. Searching for brute_force_triplet_search usage in graph_completion_retriever.py"
rg -n "brute_force_triplet_search" cognee/modules/retrieval/graph_completion_retriever.py
```
Length of output: 5994
Error handling for missing entities is covered upstream
The EntityNotFoundError is caught and suppressed within brute_force_triplet_search, so there is no need for an additional try-except in get_triplets.
cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts (1)
3-3
:✅ Verification successful
Breaking change: parameter type changed from id to name.
The function parameter has been changed from { id: string } to { name: string }. This is part of a broader refactor to use dataset names instead of IDs. Ensure all callers of this function have been updated to provide a name property instead of an id property.
🏁 Script executed:
```bash
#!/bin/bash
# Search for instances where getExplorationGraphUrl is called to verify they use name
rg -A 2 "getExplorationGraphUrl" cognee-frontend/src
```
Length of output: 1050
Verified: getExplorationGraphUrl callers updated
- The only invocation found in cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx now passes the full dataset object (which includes name).
- No other call sites of getExplorationGraphUrl remain in the codebase.
No further changes required.
cognee/tasks/ingestion/ingest_data.py (1)
107-107
: LGTM! The change correctly passes the full user object instead of just the user ID to the create_dataset function, which aligns with the updated function signature that now expects a User object as the second parameter.
cognee/infrastructure/databases/graph/graph_db_interface.py (3)
61-61
: Logging level changed from error to debug. The change from logger.error to logger.debug for relationship errors reduces log noise for non-critical issues that are already handled with a rollback. This is good practice, as these exceptions are caught and handled properly within the code flow.
81-81
: Consistent logging level adjustment for edge relationships. As with the previous change, the logging level for edge relationship errors has been appropriately reduced from error to debug, maintaining consistency in how these handled exceptions are logged.
88-88
: Session commit errors now logged at debug level. Session commit errors are now logged at debug level instead of error level, aligning with the overall approach of reducing noise in error logs for handled exceptions.
cognee/modules/graph/cognee_graph/CogneeGraph.py (1)
131-135
:❓ Verification inconclusive
API call updated to use the new unified search interface.
The code has been modernized to use the search method with explicit keyword arguments instead of the deprecated get_distance_from_collection_elements method. This change aligns with broader refactoring across vector database adapters toward a unified search interface. The parameter limit=0 might need verification: a limit of 0 means "no results" in some systems, while in others it means "no limit".
🏁 Script executed:
```bash
#!/bin/bash
# Check if the search method implementation treats limit=0 appropriately
rg -A 10 "async def search" --type py cognee/infrastructure/databases/vector/
```
Length of output: 7548
I’m inspecting how each adapter handles a limit=0 parameter to confirm that it either returns all results or is treated safely. Running a scan of all search implementations now.
Confirm limit=0 Semantics Across All Adapters
The switch from the deprecated get_distance_from_collection_elements to the unified search() interface looks correct. However, passing limit=0 can mean "no results" in some clients rather than "unlimited." Please verify that each adapter treats limit=0 as intended (e.g. returns all matches or a sensible default) and adjust if needed:
• cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py
• cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py
• cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py
• cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py
• cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py
• cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py
cognee/modules/settings/get_settings.py (2)
2-2
: Added import for Optional type. The import for Optional from the typing module has been added to support the type changes in the LLMConfig model.
24-25
:✅ Verification successful
LLMConfig fields made optional.
The endpoint and api_version fields in the LLMConfig model have been changed from required strings to optional strings. This provides more flexibility in configuration, especially for LLM providers that might not require these fields. Make sure that any code using these fields handles the case where they might be None.
🏁 Script executed:
```bash
#!/bin/bash
# Check usages of endpoint and api_version to ensure they handle None values
rg -A 5 -B 5 "endpoint|api_version" --type py cognee/infrastructure/llm/
```
Length of output: 15993
🏁 Script executed:
```bash
#!/bin/bash
# Display the first 200 lines of the settings file to understand context
sed -n '1,200p' cognee/modules/settings/get_settings.py
```
Length of output: 4448
Approved: Optional endpoint/api_version in Settings LLMConfig Are Safe
The endpoint and api_version fields in cognee/modules/settings/get_settings.py are only used to shape the settings payload returned to the UI. Since they're now Optional[str], Pydantic's BaseModel will accept None and emit null in the JSON; there's no dereferencing or logic applied to these values elsewhere in the codebase.
No further changes are required.
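For illustration, the optional-field shape described above reduces to a small Pydantic model like the sketch below (field names from the review; defaults and the Pydantic v2 API are assumptions):

```python
from typing import Optional

from pydantic import BaseModel


# Minimal sketch of the LLMConfig shape discussed above; other fields omitted.
class LLMConfig(BaseModel):
    endpoint: Optional[str] = None      # some LLM providers don't need this
    api_version: Optional[str] = None   # serialized as null when left unset


# None values round-trip as JSON null with no extra handling.
print(LLMConfig().model_dump_json())  # {"endpoint":null,"api_version":null}
```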
cognee-frontend/src/app/wizard/CognifyStep/CognifyStep.tsx (1)
10-10
:✅ Verification successful
Dataset identification changed from id to name.
The dataset prop type has been updated from { id: string } to { name: string }, which aligns with a broader standardization of dataset identification across the application. This change ensures consistent dataset referencing throughout the frontend and backend pipeline. Verify that the cognifyDataset function properly handles this change in the dataset identification scheme:
🏁 Script executed:
```bash
#!/bin/bash
# Check the implementation of cognifyDataset to ensure it uses dataset.name
cat src/modules/datasets/cognifyDataset.ts
```
Length of output: 115
🏁 Script executed:
```bash
#!/usr/bin/env bash
# Locate the implementation of cognifyDataset to verify it uses dataset.name
rg -n "cognifyDataset" .
```
Length of output: 584
🏁 Script executed:
```bash
#!/usr/bin/env bash
# Display the implementation of cognifyDataset to confirm it references dataset.name
sed -n '1,200p' cognee-frontend/src/modules/datasets/cognifyDataset.ts
```
Length of output: 427
Dataset identification change verified in cognifyDataset
The dataset prop has been updated to { name: string } in CognifyStep.tsx, and in src/modules/datasets/cognifyDataset.ts we see:
```ts
body: JSON.stringify({
  datasets: [dataset.id || dataset.name],
}),
```
This ensures that when id is absent, name is used. No further changes are needed.
cognee/modules/observability/observers.py (1)
4-9
: LGTM! Good use of a string-based enum. The Observer enum inheriting from str and Enum is a good choice, allowing direct string comparisons without explicit conversions. The enumeration is clear and concise.
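To illustrate the benefit, a str-backed enum compares directly against plain strings; in this sketch only the member names come from the review, while the values are assumed:

```python
from enum import Enum


class Observer(str, Enum):
    LANGFUSE = "langfuse"
    LLMLITE = "llmlite"
    LANGSMITH = "langsmith"


# Direct comparison against configuration strings, no .value needed.
assert Observer.LANGFUSE == "langfuse"
assert Observer("langsmith") is Observer.LANGSMITH
```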
cognee/api/v1/datasets/datasets.py (1)
31-31
: LGTM! Correct pipeline name specification. The update correctly specifies the pipeline name parameter when calling get_pipeline_status, which aligns with the changes in the pipeline operations module. This maintains the same interface for callers while adapting to the enhanced pipeline management system.
cognee/tests/test_neo4j.py (1)
90-90
: LGTM! Correctly updated assertion. The assertion has been properly updated to reflect the removal of the natural-language search test. The expected history length is now 6 instead of 8, which is consistent with the commented-out search operation.
cognee/base_config.py (2)
5-5
: Consistent enum migration for observability tools. The import change to use the new Observer enum aligns with the broader refactoring of observability tooling across the codebase.
11-11
: Type consistency maintained in monitoring tool default. The change from MonitoringTool.LANGFUSE to Observer.LANGFUSE preserves the same functionality while using the newly defined enum from the observability module.
cognee/modules/pipelines/operations/get_pipeline_status.py (2)
8-8
: Function signature updated to support pipeline-specific filtering. Adding the pipeline_name parameter allows callers to specify which pipeline's status to retrieve, improving the query's specificity.
23-23
: Enhanced query filtering by pipeline name. This new filter restricts status retrieval to runs of the specified pipeline, improving the accuracy of pipeline status reporting.
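A minimal sketch of what such a filter typically looks like in SQLAlchemy; the model columns below are stand-ins, not cognee's actual PipelineRun definition:

```python
from sqlalchemy import select
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


# Stand-in for the real PipelineRun model; actual columns may differ.
class PipelineRun(Base):
    __tablename__ = "pipeline_runs"
    id: Mapped[int] = mapped_column(primary_key=True)
    dataset_id: Mapped[str] = mapped_column()
    pipeline_name: Mapped[str] = mapped_column()
    status: Mapped[str] = mapped_column()


def build_status_query(dataset_ids: list[str], pipeline_name: str):
    # The added filter restricts results to runs of the named pipeline.
    return (
        select(PipelineRun)
        .where(PipelineRun.dataset_id.in_(dataset_ids))
        .where(PipelineRun.pipeline_name == pipeline_name)
    )
```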
cognee-frontend/src/app/page.tsx (2)
46-46
: Improved user guidance in the notification message. The updated notification clearly informs users of the next step after adding data.
106-106
: UI enhancement for direct cognification. Adding the onCognify prop to DataView enables the new UI workflow for triggering dataset cognification directly from the data view.
cognee/infrastructure/llm/gemini/adapter.py (2)
6-7
: Well-structured imports for improved observability. The centralized import of the observability module is a good approach, aligning with the broader refactoring of observability tooling across the codebase.
17-17
: Good centralization of observability configuration. Replacing conditional imports with a centralized function call improves maintainability and consistency. This approach makes it easier to switch monitoring tools in the future.
entrypoint.sh (4)
16-16
: Direct Alembic usage aligned with dependency management changes. This change aligns with the shift from Poetry to uv for dependency management, simplifying the migration command.
25-25
: Helpful migration completion message. Adding this message improves visibility into the startup process.
27-27
: Simplified startup message. The more generic "Starting server..." message is appropriate since it no longer refers specifically to Gunicorn.
36-37
:❓ Verification inconclusive
Removal of exec might affect signal handling in containers
Removing exec from the server startup commands changes the process hierarchy and signal propagation. In containerized environments, this might affect how signals (like SIGTERM) are handled.
Also applies to: 38-39, 41-42
🏁 Script executed:
```bash
#!/bin/bash
# This script checks if the application is running in a container environment
# and warns about potential signal handling implications
if [ -f /.dockerenv ] || grep -q docker /proc/self/cgroup 2>/dev/null || grep -q kubernetes /proc/self/cgroup 2>/dev/null; then
  echo "WARNING: Application appears to be running in a container environment"
  echo "Without 'exec' prefix, signals may not propagate correctly to the application process."
  echo "This could affect graceful shutdown during container orchestration."
fi

# Check if container orchestration tools are being used
if [ -n "$KUBERNETES_SERVICE_HOST" ] || [ -n "$DOCKER_HOST" ]; then
  echo "Container orchestration detected. Signal handling is particularly important."
fi
```
Length of output: 130
Add exec to server startup for correct signal handling in containers
To ensure the startup process becomes PID 1 and properly receives SIGTERM/SIGINT in Docker/Kubernetes (allowing graceful shutdown), prepend exec to your debugpy/gunicorn commands:
• entrypoint.sh, Lines 36–37, 38–39, 41–42
• Example diff for the debugpy+gunicorn invocation:
```diff
- debugpy --wait-for-client --listen 0.0.0.0:5678 \
-     -m gunicorn -w 3 -k uvicorn.workers.UvicornWorker \
-     -t 30000 --bind=0.0.0.0:8000 --log-level debug \
-     --reload cognee.api.client:app
+ exec debugpy --wait-for-client --listen 0.0.0.0:5678 \
+     -m gunicorn -w 3 -k uvicorn.workers.UvicornWorker \
+     -t 30000 --bind=0.0.0.0:8000 --log-level debug \
+     --reload cognee.api.client:app
```
Please verify that graceful shutdown behavior in your container orchestration works as expected.
cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1)
16-21
: LGTM: Database session handling follows best practices. The database session handling with an async context manager follows best practices for asynchronous database operations.
cognee/infrastructure/databases/graph/get_graph_engine.py (1)
70-80
: Good implementation of Memgraph support. The addition of Memgraph support follows the established pattern of the factory function, maintaining consistency with other database providers.
notebooks/cognee_demo.ipynb (3)
470-470
: Import path updated to match the updated module structure. The import path for the Task class has been updated to reflect the lowercase module naming convention (task instead of Task), aligning with Python's standard naming practices.
508-508
: Parameter added to support user context in pipeline execution. The run_tasks function call has been updated to include the user parameter, which aligns with the broader architectural changes for context propagation in task execution. This change correctly implements the updated function signature.
532-536
: Explicit user context retrieval implemented. The code now explicitly retrieves the user context in two steps:
- Getting the default user
- Fetching the complete user object with the default user's ID
This approach aligns with the codebase's architectural changes to standardize user context handling across the application.
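A minimal sketch of that two-step pattern; get_default_user is named in the notebook's imports, while the fetch-by-id helper and the import path here are assumptions:

```python
# Hypothetical reconstruction of the two-step user retrieval.
from cognee.modules.users.methods import get_default_user, get_user  # paths assumed


async def resolve_user():
    default_user = await get_default_user()   # step 1: obtain the default user
    return await get_user(default_user.id)    # step 2: fetch the full user by ID
```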
cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (2)
81-82
: Improved user experience with immediate input clearing. Clearing the input immediately after the search is submitted, rather than waiting for the response, gives users better feedback that their query has been received.
108-110
: Good error handling to preserve user input. Adding error handling to restore the input value if the fetch request fails is an excellent improvement, preventing users from losing their search query when errors occur.
cognee/api/v1/cognify/code_graph_pipeline.py (4)
5-7
: Improved observability module organization. Reorganizing the imports for observability-related modules enhances code organization. The get_observe import from a dedicated module suggests a more centralized approach to observability management.
16-16
: Added dataset ID generation capability. Adding the import for get_unique_dataset_id aligns with the asynchronous dataset ID generation approach and enables user-specific dataset IDs.
25-25
: Centralized observability configuration. Replacing conditional imports with a centralized call to get_observe() improves code maintainability and provides a cleaner abstraction for observability tooling.
68-68
: Enhanced dataset ID generation with user context. Switching to an asynchronous call to get_unique_dataset_id with user context is a good improvement that allows for user-specific dataset IDs and better integration with the user management system.
cognee/infrastructure/databases/vector/exceptions/exceptions.py (1)
9-14
: Improved exception handling and logging configuration. The changes to the CollectionNotFoundError class enhance error handling by:
- Making the default exception name consistent with the class name
- Adding flexible logging control parameters
- Updating the superclass constructor appropriately
This approach aligns with best practices for exception handling and allows for more granular control over error reporting.
notebooks/cognee_graphiti_demo.ipynb (4)
18-18
: Fixed typo in markdown. A simple readability improvement in the notebook documentation.
27-29
: Updated imports for required modules. Adding the asyncio import and updating the other imports to include get_llm_client and get_default_user properly prepares the notebook for the subsequent code changes.
Also applies to: 42-43
131-137
: Integrated user context initialization. Simplifying the pruning call and adding the default user initialization is a good improvement that aligns with the broader changes to incorporate user context throughout the system.
143-148
: Updated task execution with user context. Passing the user object to run_tasks properly integrates the user context into pipeline execution, ensuring consistent handling of user-related data throughout the system.
cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (2)
1-47
: Well-structured test for the new context propagation feature. This test effectively validates that contextual information can be passed through pipeline tasks. The calculation (starting with 5, adding 7, multiplying by 2, then raising to the power 7) correctly results in the expected value of 4,586,471,424.
14-21
: Clear task definitions with proper context usage. Good implementation of the test tasks with varied parameter patterns:
- task_1 uses both input and context
- task_2 only uses input
- task_3 uses input and context again
This variety helps validate that the context is properly maintained throughout the pipeline execution regardless of whether individual tasks use it.
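A hedged reconstruction of the arithmetic this test encodes; the task bodies below are illustrative stand-ins, with only the operations and expected total taken from the review:

```python
# Illustrative stand-ins for the three pipeline tasks described above.
def task_1(value: int, context: dict) -> int:
    return value + context["increment"]    # 5 + 7 = 12 (uses context)


def task_2(value: int) -> int:
    return value * 2                       # 12 * 2 = 24 (input only)


def task_3(value: int, context: dict) -> int:
    return value ** context["exponent"]    # 24 ** 7 (uses context)


context = {"increment": 7, "exponent": 7}
result = task_3(task_2(task_1(5, context)), context)
assert result == 4_586_471_424
```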
cognee/modules/data/methods/create_dataset.py (3)
7-8
: Appropriate imports for the new implementation. These imports correctly support the updated function signature, which now accepts a User object instead of just an ID.
11-12
: Good refactoring to accept full User objects. This change aligns with the broader architectural improvements to propagate context throughout the application. Extracting the owner_id from the user object maintains compatibility with the existing database queries.
24-26
: Improved dataset ID generation with proper modularization. Moving the ID generation logic to a separate async helper function improves code organization while maintaining the same uniqueness guarantees (based on dataset name and user ID).
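A rough sketch of the refactored shape; the session handling and Dataset lookup are placeholders, while get_unique_dataset_id matches the helper quoted later in this review:

```python
from uuid import NAMESPACE_OID, UUID, uuid5


async def get_unique_dataset_id(dataset_name: str, user) -> UUID:
    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")


# Hypothetical outline of create_dataset accepting the full User object.
async def create_dataset(dataset_name: str, user, session):
    owner_id = user.id  # extracted from the user, keeping existing queries intact
    dataset_id = await get_unique_dataset_id(dataset_name, user)
    # ... look up or insert the Dataset row using dataset_id and owner_id ...
    return dataset_id
```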
cognee/exceptions/exceptions.py (2)
15-16
: Good addition of logging flexibility parameters. These new parameters allow callers to control whether, and at what level, exceptions are logged, enhancing observability control.
23-30
: Improved conditional logging based on log level. The implementation properly handles different log levels, providing greater flexibility in how exceptions are reported.
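A rough sketch of the pattern being described; the class name and parameter defaults are assumptions based on the review, not the exact code:

```python
import logging

logger = logging.getLogger(__name__)


# Hypothetical base exception with caller-controlled logging.
class CogneeError(Exception):
    def __init__(self, message: str, name: str = "CogneeError",
                 log: bool = True, log_level: str = "ERROR"):
        self.message = message
        self.name = name
        super().__init__(message)
        if log:
            if log_level == "DEBUG":
                logger.debug(f"{name}: {message}")
            elif log_level == "WARNING":
                logger.warning(f"{name}: {message}")
            else:
                logger.error(f"{name}: {message}")


# An expected error can then opt into quieter reporting:
# raise CogneeError("collection missing", log_level="DEBUG")
```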
cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (3)
9-10
: Appropriate UI component imports. The added imports support the new Cognify functionality with proper UI components.
33-33
: Well-defined prop interface for the new cognify functionality. The Promise return type allows proper async handling of the cognify operation.
80-84
: Good UI integration of the Cognify button. The button is well-positioned next to related actions and has appropriate disabled-state handling.
notebooks/graphrag_vs_rag.ipynb (5)
56-57
: Updated cognee package version. Updated to the newer cognee package version (0.1.39), ensuring compatibility with the latest features and API changes.
152-153
: Updated import path and API usage pattern. Changed the import for SearchType from cognee.modules.search.types to cognee.api.v1.search and updated the function call to use keyword arguments. This reflects a move to a more stable and explicit API.
173-173
: Fixed search type constant usage. Correctly uses SearchType.RAG_COMPLETION instead of the previous SearchType.COMPLETION, which is more explicit and accurately describes the search functionality being used.
202-202
: Updated search function call to use keyword arguments. Standardized the search function call to use keyword arguments, improving code clarity and alignment with the updated API.
246-248
: Updated kernel metadata for consistency. Ensured the notebook's kernel metadata is consistent and compatible with current Python environments, which is good practice for notebook portability.
cognee/modules/pipelines/operations/run_tasks_base.py (1)
17-17
: Good implementation of context propagation. The addition of the optional context parameter with inspection-based conditional passing provides a flexible way to propagate context through the task execution pipeline. This approach maintains backward compatibility while enabling more sophisticated inter-task communication.
Also applies to: 31-36, 66-66, 80-81
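A minimal sketch of the inspection-based pattern; the call shape below is an assumption about the surrounding pipeline code, not the file's exact implementation:

```python
import inspect


# Pass context only to task callables that declare a `context` parameter,
# so existing context-free tasks keep working unchanged.
async def run_single_task(task_callable, *args, context=None):
    accepts_context = "context" in inspect.signature(task_callable).parameters
    if accepts_context and context is not None:
        return await task_callable(*args, context=context)
    return await task_callable(*args)
```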
Dockerfile (1)
1-61
: Excellent improvement with multi-stage build and modern dependency management. The refactored Dockerfile adopts several best practices:
- Multi-stage build to minimize the final image size
- Using uv for faster dependency management instead of Poetry
- Proper layer caching with separated dependency installation
- Single RUN command for system dependencies with apt cache cleanup
- Mount caching for uv operations to speed up builds
These changes should result in faster builds, smaller images, and better maintainability.
cognee/tests/test_memgraph.py (1)
14-108
: Well-structured integration test for Memgraph. This test provides comprehensive coverage of Memgraph adapter functionality, including configuration, data loading, cognification, search operations across multiple query types, and proper cleanup. The test isolation with custom paths is good practice.
cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (3)
122-125
: Good error handling improvement. Adding specific exception handling for CollectionNotExistException and raising a custom CollectionNotFoundError provides clearer error information and aligns with error-handling standardization across adapters. This makes debugging and error recovery more straightforward.
Also applies to: 161-164, 211-214
148-148
: Type annotation refinement from str to UUID. Changing the data_point_ids type from list[str] to list[UUID] improves type safety and makes the interface more precise. This helps catch potential type errors earlier in the development process.
Also applies to: 236-236
176-176
: Search behavior improvements. The increased default limit (5 → 15) provides more comprehensive search results, and handling non-positive limits by passing None is a robust approach that prevents unexpected behavior.
Also applies to: 196-196
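The normalization the adapters converge on reduces to a one-line guard; the helper below is an illustrative sketch, not the adapter's actual signature:

```python
from typing import Optional


# Treat non-positive limits as "no limit" before calling the vector client.
def normalize_limit(limit: int) -> Optional[int]:
    return None if limit <= 0 else limit


assert normalize_limit(0) is None     # caller asked for everything
assert normalize_limit(-5) is None
assert normalize_limit(15) == 15      # the new default
```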
cognee/modules/pipelines/operations/pipeline.py (1)
142-155
: get_pipeline_status return-type assumption can break the status checks
task_status is assumed to be a mapping (dict-like) whose keys are str(dataset_id), yet the function call on line 142 provides a list of dataset IDs and, depending on its implementation, could plausibly return:
- a list[PipelineRunStatus],
- a dict[str, PipelineRunStatus], or
- something else entirely.
If it returns a non-mapping type, the subsequent containment check (if str(dataset_id) in task_status:) will raise TypeError or always evaluate to False, allowing duplicate pipeline executions.
Please double-check get_pipeline_status's signature and update the logic accordingly, or cast the result to the expected structure.
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
186-188
: Great! Raising CollectionNotFoundError improves diagnostics. Explicitly surfacing a typed exception with a controllable log level is a solid improvement over returning None or raising generic errors.
No further action required.
cognee/infrastructure/databases/graph/networkx/adapter.py (1)
139-142
: Edge retrieval logic looks good. The switch to UUID propagates correctly into the in_edges/out_edges queries; behaviour is intact.
cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (2)
75-81
: collections.get may require await – double-check SDK
The async SDK exposes both sync and async variants. If client.collections.get() returns a coroutine, omitting await will hand back an un-awaited coroutine to downstream code. Please confirm with the SDK docs; apply await if necessary.
167-174
: Limit parameter handled nicely. Raising the default search limit to 15 and forwarding None when limit<=0 is clean and keeps the API flexible.
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
76-82
: await connection.open_table() – confirm it really is a coroutine
AsyncConnection.open_table is async in current LanceDB (≥0.4), so awaiting is correct. If your pinned version is older, this will raise TypeError: object is not awaitable. Double-check the project's LanceDB version; downgrade compatibility might need:
```python
table = connection.open_table(collection_name)
if inspect.iscoroutine(table):
    table = await table
```
4-6
: Updated imports for improved error handling. Added imports for the EntityNotFoundError and CollectionNotFoundError exceptions, which are now used for more graceful error handling in the functions below. This aligns with the broader error-handling improvements across the codebase.
146-153
: Extracted search function with improved error handling. Created a new helper function search_in_collection that encapsulates the vector search call and properly handles the case where a collection doesn't exist. This is a good refactoring that isolates error handling and makes the code more resilient.
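The helper presumably looks something like this sketch; the engine parameter and search signature are assumptions based on the surrounding review:

```python
from cognee.infrastructure.databases.vector.exceptions.exceptions import (
    CollectionNotFoundError,  # import path taken from the file reviewed above
)


# Hypothetical shape of the helper: swallow missing-collection errors
# and report "no results" instead of failing the whole retrieval.
async def search_in_collection(vector_engine, collection_name: str,
                               query_text: str, limit: int):
    try:
        return await vector_engine.search(
            collection_name=collection_name,
            query_text=query_text,
            limit=limit,
        )
    except CollectionNotFoundError:
        return []  # a missing collection is treated as an empty result set
```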
156-157
: Improved handling of empty search results. The code now uses the new helper function for searches and returns an empty list when no results are found, instead of potentially raising an exception. This provides more predictable behavior and a better API experience for callers.
Also applies to: 159-161
173-174
: Added top-level error handling for collection not found. The function now catches CollectionNotFoundError at the top level and returns an empty list, further improving resilience. This ensures the function doesn't fail even if there are unexpected collection issues.
cognee/modules/pipelines/operations/run_tasks.py (4)
23-25
: Context parameter added to enable task context propagation. The addition of the optional context parameter enhances the pipeline's capability by allowing contextual data to be passed through to tasks that need it.
41-41
: Context propagation properly passed to run_tasks_base. The context is correctly forwarded to the underlying function, ensuring seamless propagation throughout the execution chain.
77-77
: Context parameter added to the run_tasks function. Adding the optional context parameter here completes the context propagation path through the pipeline execution functions.
87-93
: Updated function call with named parameters. The function call is updated with named parameters, making the code more readable and explicitly showing the context parameter being passed. This is good practice when functions have multiple parameters.
cognee/tasks/temporal_awareness/index_graphiti_objects.py (6)
23-23
: Empty params parameter added for consistency. Adding an explicit params={} ensures consistent parameter passing across all graph queries, which is important for standardization and compatibility with different graph database providers.
27-29
: Query parameters standardized. The relationship-name assignment and explicit empty params parameter follow the standardization pattern, ensuring uniform query interfaces across the codebase.
30-32
: New query added with consistent parameter style. This new query ensures nodes have a text field by coalescing the summary and content fields, maintaining the standardized query pattern with explicit params.
34-34
: Updated method for retrieving graph data. Changing from get_model_independent_graph_data() to get_graph_data() aligns with the new unified graph engine interface, which supports multiple graph database providers, including the new Memgraph support.
36-40
: Simplified node data iteration. The iteration pattern has been improved to use key-value pairs from the graph data structure directly, making the code more readable and less prone to indexing errors.
64-65
: Simplified edge relationship extraction. The code now directly accesses the relationship name at index 2 of each edge tuple, simplifying the previously more complex filtering and indexing logic. This makes the code more maintainable.
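For illustration, assuming edges are tuples with the relationship name at index 2 (the layout implied above), the extraction reduces to a direct index:

```python
# Assumed edge layout: (source_id, target_id, relationship_name, properties)
edges = [
    ("a", "b", "MENTIONS", {}),
    ("b", "c", "RELATES_TO", {}),
]

# Collect the distinct relationship names for downstream indexing.
relationship_names = {edge[2] for edge in edges}
assert relationship_names == {"MENTIONS", "RELATES_TO"}
```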
notebooks/cognee_llama_index.ipynb (2)
15-17
: Notebook cell format standardization looks good. The source field for this code cell has been updated to use a list format instead of a single string, which aligns with standard Jupyter notebook JSON conventions. This is a purely structural change that doesn't affect code execution or functionality.
123-125
: Structural field reordering approved. The metadata and outputs fields in this cell have been reordered, which again is just format standardization that aligns with Jupyter notebook conventions. This change doesn't affect the code execution or functionality of the notebook.
notebooks/cognee_simple_demo.ipynb (9)
17-19
: Package version upgrade to cognee 0.1.39. The cognee package has been updated from version 0.1.36 to 0.1.39. This appears to be part of the broader architectural changes mentioned in the PR summary.
13-16
: Notebook metadata standardization looks good. The execution_count and outputs fields have been properly standardized for this cell. This change improves notebook compatibility without affecting functionality.
33-41
: Cell metadata formatting standardized. The execution_count and outputs fields have been standardized for consistency with other notebook cells.
53-60
: Cell metadata formatting standardized. The execution_count and outputs fields have been standardized for consistent notebook formatting.
72-80
: Cell metadata formatting standardized. The execution_count and outputs fields have been standardized. These changes are structural only and don't affect the code's functionality.
92-98
: Cell metadata formatting standardized. The execution_count and outputs fields have been standardized in this cell.
102-108
: Cell metadata formatting standardized. The execution_count and outputs fields have been standardized in this cell.
112-118
: Cell metadata formatting standardized. The execution_count and outputs fields have been standardized in this cell.
130-143
: Cell metadata formatting standardized. The execution_count and outputs fields have been standardized in this visualization cell.
notebooks/llama_index_cognee_integration.ipynb (5)
6-8
: Markdown cell formatting standardized. The source field for this markdown cell has been converted from a single string to a list containing that string, which is a standard Jupyter notebook format. This change doesn't affect functionality.
62-64
: Code cell formatting standardized. The source field for this pip install cell has been converted to a list format, which aligns with Jupyter notebook standards. This is a formatting change only.
199-201
: Markdown cell formatting standardized. The source field for this markdown cell has been converted to list format, following standard Jupyter notebook conventions.
220-221
: Markdown cell formatting standardized. The source field for this conclusion markdown cell has been converted to list format, aligning with Jupyter notebook conventions.
282-283
: Python version updated in kernel metadata. The Python version in the language_info metadata has been updated to 3.12.9. This ensures the notebook is associated with the correct Python version.
```diff
@@ -97,7 +97,7 @@ git push origin feature/your-feature-name

 2. Create a Pull Request:
    - Go to the [**cognee** repository](https://github.com/topoteretes/cognee)
-   - Click "Compare & Pull Request" and make sure to open PR against dev branch
+   - Click "Compare & Pull Request"
```
Ensure PRs target the dev branch: The removal of the explicit instruction to open pull requests against the dev branch may lead contributors to accidentally target main. Please reintroduce guidance to verify that the base branch is set to dev when clicking "Compare & Pull Request".
🤖 Prompt for AI Agents
In CONTRIBUTING.md at line 100, reintroduce the instruction reminding
contributors to ensure their pull requests target the `dev` branch. Add a clear
note or step near the "Click 'Compare & Pull Request'" instruction that
explicitly tells users to verify the base branch is set to `dev` before
submitting the PR.
```diff
@@ -7,6 +7,7 @@
 from .get_datasets_by_name import get_datasets_by_name
 from .get_dataset_data import get_dataset_data
 from .get_data import get_data
+from .get_unique_dataset_id import get_unique_dataset_id
```
🛠️ Refactor suggestion
Address unused import in __init__.py. The import of get_unique_dataset_id is never referenced within this file, causing an F401 lint error. Since we want to expose it as part of the public API, add a # noqa: F401 or define an __all__ list.
For example, to suppress the warning (note the noqa comment must sit on the import line itself to take effect):
```diff
-from .get_unique_dataset_id import get_unique_dataset_id
+from .get_unique_dataset_id import get_unique_dataset_id  # noqa: F401
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from .get_unique_dataset_id import get_unique_dataset_id  # noqa: F401
```
🧰 Tools
🪛 Ruff (0.8.2)
10-10: .get_unique_dataset_id.get_unique_dataset_id imported but unused; consider removing, adding to __all__, or using a redundant alias (F401)
🤖 Prompt for AI Agents
In cognee/modules/data/methods/__init__.py at line 10, the import of
get_unique_dataset_id is unused and triggers a F401 lint error. To fix this,
either add a comment # noqa: F401 at the end of the import line to suppress the
warning or define an __all__ list including 'get_unique_dataset_id' to
explicitly declare it as part of the public API.
```python
async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
```
🛠️ Refactor suggestion
Unnecessary async declaration. The function is declared as async but doesn't perform any asynchronous operations. Consider removing the async keyword unless you plan to add asynchronous operations later.
```diff
-async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
+def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
     return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
```
Additionally, consider adding input validation for empty dataset names and checking that the user and user.id are not None.
```diff
-async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
-    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
+async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
+    if not dataset_name:
+        raise ValueError("Dataset name cannot be empty")
+    if not user or not user.id:
+        raise ValueError("User or user ID cannot be None")
+    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Option 1 (drop the async keyword):
```python
def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
```
Option 2 (keep async, add validation):
```python
async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    if not dataset_name:
        raise ValueError("Dataset name cannot be empty")
    if not user or not user.id:
        raise ValueError("User or user ID cannot be None")
    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
```
🤖 Prompt for AI Agents
In cognee/modules/data/methods/get_unique_dataset_id.py at lines 5 to 6, remove
the async keyword from the function declaration since it does not perform any
asynchronous operations. Additionally, add input validation to check that
dataset_name is not empty and that user and user.id are not None, raising
appropriate exceptions if these conditions are not met.
```python
def get_observe():
    monitoring = get_base_config().monitoring_tool

    if monitoring == Observer.LANGFUSE:
        from langfuse.decorators import observe

        return observe
```
Missing handling for other Observer types
The function only handles the case when monitoring == Observer.LANGFUSE, but doesn't handle the other enum values (LLMLITE, LANGSMITH) defined in the Observer enum. There's also no fallback implementation or error handling if an unsupported monitoring tool is configured.
I recommend adding handling for other observer types or at least a fallback:
```diff
 def get_observe():
     monitoring = get_base_config().monitoring_tool
     if monitoring == Observer.LANGFUSE:
         from langfuse.decorators import observe
         return observe
+    elif monitoring == Observer.LLMLITE:
+        # Import and return LLMLITE observe decorator
+        pass
+    elif monitoring == Observer.LANGSMITH:
+        # Import and return LANGSMITH observe decorator
+        pass
+    else:
+        # Provide a fallback or raise a meaningful error
+        raise ValueError(f"Unsupported monitoring tool: {monitoring}")
```
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/modules/observability/get_observe.py around lines 5 to 11, the
get_observe function only handles the case when monitoring equals
Observer.LANGFUSE and lacks handling for other Observer enum values like LLMLITE
and LANGSMITH. To fix this, add conditional branches to handle these other
observer types appropriately, or provide a fallback implementation or raise a
clear error if the monitoring tool is unsupported, ensuring all enum cases are
covered.
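For completeness, a hedged sketch of one possible fallback: a no-op decorator that keeps instrumented code working when no supported tool is configured (an illustrative choice, not the review's recommendation):

```python
# No-op decorator supporting both @observe and @observe(as_type=...) usage.
def _noop_observe(*dargs, **dkwargs):
    if len(dargs) == 1 and callable(dargs[0]) and not dkwargs:
        return dargs[0]          # used as a bare decorator
    def wrapper(func):
        return func              # used as a decorator factory
    return wrapper


def get_observe():
    monitoring = get_base_config().monitoring_tool
    if monitoring == Observer.LANGFUSE:
        from langfuse.decorators import observe
        return observe
    return _noop_observe         # fallback for unsupported or unset tools
```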
```tsx
const onCognify = useCallback(() => {
  const dataset = datasets.find((dataset) => dataset.id === selectedDataset);
  return onDatasetCognify({
    id: dataset!.id,
    name: dataset!.name,
  });
}, [datasets, onDatasetCognify, selectedDataset]);
```
🛠️ Refactor suggestion
Consider adding a null check before accessing dataset properties.
The function doesn't validate that a dataset was found before accessing its properties with non-null assertions (!). This could cause runtime errors if selectedDataset doesn't match any dataset in the array.
```diff
 const onCognify = useCallback(() => {
   const dataset = datasets.find((dataset) => dataset.id === selectedDataset);
+  if (!dataset) {
+    showNotification("No dataset selected. Please select a dataset first.", 5000);
+    return Promise.reject(new Error("No dataset selected"));
+  }
   return onDatasetCognify({
-    id: dataset!.id,
-    name: dataset!.name,
+    id: dataset.id,
+    name: dataset.name,
   });
 }, [datasets, onDatasetCognify, selectedDataset]);
```
🤖 Prompt for AI Agents
In cognee-frontend/src/app/page.tsx around lines 63 to 69, the onCognify
function uses non-null assertions on dataset properties without checking if
dataset is null or undefined. To fix this, add a null check after finding the
dataset to ensure it exists before accessing its id and name. If dataset is not
found, handle the case appropriately, such as returning early or showing an
error, to prevent runtime errors.
```python
query = """
UNWIND $nodes AS node
MERGE (n {id: node.node_id})
ON CREATE SET n:node.label, n += node.properties, n.updated_at = timestamp()
ON MATCH SET n:node.label, n += node.properties, n.updated_at = timestamp()
RETURN ID(n) AS internal_id, n.id AS nodeId
"""
```
Dynamic labels cannot be parameterised – Cypher syntax error
n:node.label attempts to use a parameter for the label, which Cypher/Memgraph doesn't allow. The query will raise a syntax error.
```diff
-MERGE (n {id: node.node_id})
-ON CREATE SET n:node.label, n += node.properties, n.updated_at = timestamp()
-ON MATCH SET n:node.label, n += node.properties, n.updated_at = timestamp()
+MERGE (n {id: node.node_id})
+WITH n, node
+CALL apoc.create.addLabels(n, [node.label]) YIELD node AS n2 // requires APOC; otherwise build separate queries per label
+SET n2 += node.properties,
+    n2.updated_at = timestamp()
```
Alternatively, group nodes by label and execute one query per label.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 82 to 88, the Cypher query incorrectly uses a parameter for a node label
with syntax like n:node.label, which is not allowed. To fix this, refactor the
code to group nodes by their label and run separate queries for each label,
embedding the label directly in the query string instead of using a parameter.
This avoids dynamic label parameterization errors by executing one query per
label with the label hardcoded in the Cypher MERGE statement.
```python
predecessors, successors = await asyncio.gather(
    self.query(predecessors_query, dict(node_id=str(node_id))),
    self.query(successors_query, dict(node_id=str(node_id))),
)

connections = []

for neighbour in predecessors:
    neighbour = neighbour["relation"]
    connections.append((neighbour[0], {"relationship_name": neighbour[1]}, neighbour[2]))

for neighbour in successors:
    neighbour = neighbour["relation"]
    connections.append((neighbour[0], {"relationship_name": neighbour[1]}, neighbour[2]))
```
get_connections builds invalid tuples – relation is not subscriptable
neighbour = neighbour["relation"] returns a neo4j.graph.Relationship object; indexing it (neighbour[0]) will raise TypeError. Consider extracting properties explicitly:
```python
for rec in predecessors:
    rel = rec["relation"]
    connections.append((rel.start_node["id"],
                        {"relationship_name": rel.type},
                        rel.end_node["id"]))
```
Also remove the shadowing of the neighbour loop variable.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 380 to 394, the code incorrectly treats the 'relation' object as
subscriptable, causing a TypeError. To fix this, avoid indexing the 'relation'
object directly; instead, explicitly extract the start node ID, relationship
type, and end node ID using the appropriate properties or methods of the
Relationship object. Also, rename the inner loop variable to prevent shadowing
the outer 'neighbour' variable.
```python
sanitized_id = node_id.replace(":", "_")

query = "MATCH (node: {{id: $node_id}}) DETACH DELETE node"
params = {"node_id": sanitized_id}
```
delete_node Cypher syntax error (: before {)
MATCH (node: {{id: $node_id}}) results in MATCH (node: {id: $node_id}), which is invalid. The colon introduces a label, not properties. Replace with:
```diff
-query = "MATCH (node: {{id: $node_id}}) DETACH DELETE node"
+query = "MATCH (node {id: $node_id}) DETACH DELETE node"
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
sanitized_id = node_id.replace(":", "_")

query = "MATCH (node {id: $node_id}) DETACH DELETE node"
params = {"node_id": sanitized_id}
```
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 120 to 124, the Cypher query syntax is incorrect because it uses a colon
before a property map, which is invalid. Replace the query string to remove the
colon and use parentheses with a property map directly, like MATCH (node {id:
$node_id}), to correctly match nodes by property.
```python
UNWIND $node_ids AS id
MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)
DELETE r;
"""
```
Incorrect f-string placeholders introduce labels named after IDs
MATCH (node:`{id}`)<-[r:{edge_label}]-(successor) produces labels like :123e4567-e89b..., which is almost certainly unintended. Use a property match instead:
```diff
-MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)
+MATCH (node {id: successor_id})<-[r:`{edge_label}`]-(successor)
```
(again, consider validating edge_label and using parameterised Cypher).
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 415 to 418, the Cypher query incorrectly uses f-string placeholders to
insert node IDs as labels, resulting in unintended labels like
:123e4567-e89b.... Replace the label match with a property match on the node,
such as matching nodes by an ID property instead of using the ID as a label.
Also, validate the edge_label input and use parameterized Cypher queries to
prevent injection risks.
```python
def serialize_properties(self, properties=dict()):
    serialized_properties = {}

    for property_key, property_value in properties.items():
        if isinstance(property_value, UUID):
            serialized_properties[property_key] = str(property_value)
            continue

        if isinstance(property_value, dict):
            serialized_properties[property_key] = json.dumps(property_value, cls=JSONEncoder)
            continue

        serialized_properties[property_key] = property_value

    return serialized_properties
```
🛠️ Refactor suggestion
Mutable default argument violates best practices
serialize_properties(self, properties=dict()) uses a mutable default; callers mutating the dict will affect subsequent calls.
```diff
-    def serialize_properties(self, properties=dict()):
+    def serialize_properties(self, properties: Optional[dict] = None):
+        if properties is None:
+            properties = {}
```
(This also requires from typing import Optional if it isn't already imported.)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def serialize_properties(self, properties: Optional[dict] = None):
    if properties is None:
        properties = {}

    serialized_properties = {}

    for property_key, property_value in properties.items():
        if isinstance(property_value, UUID):
            serialized_properties[property_key] = str(property_value)
            continue

        if isinstance(property_value, dict):
            serialized_properties[property_key] = json.dumps(property_value, cls=JSONEncoder)
            continue

        serialized_properties[property_key] = property_value

    return serialized_properties
```
🧰 Tools
🪛 Ruff (0.8.2)
430-430: Do not use mutable data structures for argument defaults. Replace with None; initialize within function (B006)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 430 to 444, the method serialize_properties uses a mutable default
argument (properties=dict()), which can lead to unexpected behavior if the
dictionary is modified. Change the default value to None and inside the method
initialize properties to an empty dictionary if it is None before proceeding.
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.