
Conversation

lxobr
Collaborator

@lxobr lxobr commented Apr 14, 2025

Description

  • Extracted the run_tasks_base function into a new file, run_tasks_base.py.
  • Extracted four executors that run the core logic based on the task type.
  • Extracted a task handler/wrapper that safely executes the core logic with logging and telemetry.
  • Fixed an inconsistency with batches of size 1.
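For illustration, the type-based executor dispatch described above can be sketched as follows. The helper names (get_task_type, execute) echo the PR description but are hypothetical, not the actual cognee API:

```python
import asyncio
import inspect

# Hypothetical sketch: classify a task's callable, then dispatch to the
# matching executor. Names are illustrative, not the real cognee internals.
def get_task_type(fn):
    if inspect.isasyncgenfunction(fn):
        return "async_generator"
    if inspect.isgeneratorfunction(fn):
        return "generator"
    if inspect.iscoroutinefunction(fn):
        return "coroutine"
    return "function"

async def execute(fn, *args):
    """Run fn with the executor appropriate to its callable type."""
    task_type = get_task_type(fn)
    if task_type == "async_generator":
        return [item async for item in fn(*args)]
    if task_type == "generator":
        return list(fn(*args))
    if task_type == "coroutine":
        return await fn(*args)
    return fn(*args)
```

A single dispatch point like this is what lets a wrapper add logging and telemetry uniformly, regardless of whether the task is a plain function, coroutine, generator, or async generator.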

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

@lxobr lxobr self-assigned this Apr 14, 2025

pull-checklist bot commented Apr 14, 2025

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

Contributor

coderabbitai bot commented Apr 14, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes introduce new environment variables for default user configuration and a new workflow migration process. Several configuration modules received updates including new methods and attributes for migration and LLM prompt handling. A new asynchronous task execution framework was added to refactor pipeline flows, while the search functionality now uses an updated type mapping. Additionally, multiple new evaluation pipelines for HotpotQA were implemented and obsolete files and notebooks were removed. Minor adjustments to logging, error handling, and test cases were also applied.

Changes

File(s) Change Summary
.env.template Added DEFAULT_USER_EMAIL and DEFAULT_USER_PASSWORD under a "Default User Configuration" section.
.github/README_WORKFLOW_MIGRATION.md,
.github/workflows/disable_independent_workflows.sh
Introduced a new document for migrating test workflows and a shell script to disable independent workflows by replacing triggers.
cognee-mcp/src/server.py Added new conditional branch in search to handle "GRAPH_COMPLETION" and "RAG_COMPLETION" search types.
cognee/api/v1/config/config.py,
cognee/base_config.py
Added a static method for migration DB configuration and new default user attributes sourced from environment variables.
cognee/eval_framework/corpus_builder/run_corpus_builder.py,
.../task_getters/get_default_tasks_by_indices.py,
cognee/eval_framework/evaluation/deep_eval_adapter.py
Updated signatures with additional parameters and optional types; modified conditional assignment for retrieval context.
cognee/infrastructure/databases/relational/config.py,
.../vector/embeddings/get_embedding_engine.py,
.../vector/PGVectorAdapter.py
Updated type annotations for migration DB fields, added an endpoint parameter for the embedding engine, and revamped data point creation to update existing entries.
cognee/infrastructure/llm/config.py,
cognee/infrastructure/llm/prompts/answer_simple_question_benchmark2.txt,
.../benchmark3.txt,
.../benchmark4.txt,
.../generate_graph_prompt_guided.txt,
.../generate_graph_prompt_oneshot.txt,
.../generate_graph_prompt_simple.txt,
.../generate_graph_prompt_strict.txt
Added a new graph_prompt_path attribute (with updates to to_dict) and introduced multiple new prompt files for QA and knowledge graph extraction.
cognee/modules/data/extraction/knowledge_graph/extract_content_graph.py,
cognee/modules/pipelines/operations/run_tasks.py,
cognee/modules/pipelines/operations/run_tasks_base.py,
cognee/modules/search/methods/search.py,
cognee/modules/search/types/SearchType.py,
cognee/modules/users/methods/create_default_user.py,
cognee/modules/users/methods/get_default_user.py
Modified search type mapping (from COMPLETION to RAG_COMPLETION), removed the old task runner in favor of a new asynchronous task execution framework, and updated default user creation to rely on configuration.
cognee/shared/logging_utils.py Updated the exception handler to check for the __name__ attribute before assignment.
cognee/tasks/graph/extract_graph_from_data.py,
cognee/tasks/ingestion/migrate_relational_database.py
Added Optional to import statements and introduced a private method to remove duplicate edges during migration.
cognee/tests/test_cognee_server_start.py,
cognee/tests/test_custom_model.py,
cognee/tests/test_relational_db_migration.py,
cognee/tests/test_telemetry.py,
cognee/tests/unit/entity_extraction/regex_entity_extraction_test.py
Added new test files for server start, relational DB migration, and telemetry; updated search type in model test; commented out regex entity extraction test logic.
evals/README.md,
evals/falkor_01042025/hotpot_qa_falkor_graphrag_sdk.py,
evals/graphiti_01042025/hotpot_qa_graphiti.py,
evals/mem0_01042025/hotpot_qa_mem0.py,
evals/plot_metrics.py,
evals/requirements.txt
(and deletions: evals/eval_swe_bench.py, evals/eval_utils.py)
Introduced comprehensive evaluation documentation and new pipelines for HotpotQA using different approaches (Falkor, Graphiti, Mem0) along with metrics plotting; removed obsolete evaluation scripts.
evals/test_datasets/initial_test/natural_language_processing.txt,
evals/test_datasets/initial_test/trump.txt
Deleted outdated test dataset files.
examples/node/fetch.js,
examples/node/handleServerErrors.js,
examples/node/main.js,
examples/python/pokemon_datapoints_example.py
Removed obsolete Node.js example files; in the Python example, modified task instantiation by removing the explicit batch size configuration.
notebooks/cognee_code_graph_demo.ipynb,
notebooks/cognee_hotpot_eval.ipynb,
notebooks/hr_demo.ipynb,
notebooks/pokemon_datapoints_notebook.ipynb,
notebooks/cognee_simple_demo.ipynb
Deleted multiple demonstration notebooks and updated the Cognee package version in one notebook from 0.1.26 to 0.1.36.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller
    participant RTB as run_tasks_base
    participant HT as handle_task
    participant GT as get_task_type
    participant GE as get_task_executor
    participant Exec as Executor (async/gen/coroutine/func)
    
    Caller->>RTB: Invoke run_tasks_base(tasks, data, user)
    RTB->>HT: For each task, call handle_task(task, args, leftover, batch_size, user)
    HT->>GT: Determine task type
    GT-->>HT: Return task type (e.g. async, generator, etc.)
    HT->>GE: Fetch corresponding executor function
    GE-->>HT: Return executor for task type
    HT->>Exec: Execute task using executor function
    Exec-->>HT: Yield results/errors
    HT-->>RTB: Return task result
    RTB-->>Caller: Return aggregated results
sequenceDiagram
    autonumber
    participant User
    participant SS as specific_search
    participant CR as CompletionRetriever
    
    User->>SS: Submit search query with type "RAG_COMPLETION"
    SS->>SS: Map RAG_COMPLETION to CompletionRetriever
    SS->>CR: Invoke retrieval method
    CR-->>SS: Return search results
    SS-->>User: Deliver results

Possibly related PRs

  • feat: Add default user to config #682: Aligns with the new environment variable additions in .env.template for default user configuration, ensuring consistency across configuration management.

Poem

I'm just a bunny with a curious twist,
Hopping through code changes that can’t be missed.
With new workflows and pipelines on track,
And prompts and tests keeping things intact.
My whiskers twitch with glee in delight –
CodeRabbit’s changes make my heart light!
🐇💻 Hop on into the future bright!




gitguardian bot commented Apr 14, 2025

✅ There are no secrets present in this pull request anymore.

If these secrets were true positives and are still valid, we highly recommend revoking them.
While these secrets were previously flagged, we no longer have a reference to the specific
commits where they were detected. Once a secret has been leaked into a git repository, you
should consider it compromised, even if it was deleted immediately.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@lxobr lxobr marked this pull request as draft April 14, 2025 14:13
@lxobr lxobr changed the base branch from main to dev April 14, 2025 14:15
@lxobr lxobr marked this pull request as ready for review April 14, 2025 14:15
Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently, it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 9

🧹 Nitpick comments (38)
cognee/infrastructure/llm/prompts/answer_simple_question_benchmark4.txt (1)

1-15: Review of Prompt Instructions Document

The new file clearly documents the standardized protocol for generating concise, context-based answers. The prompt instructions are well-structured, detailing:

  • Minimalism: (Line 3) advising brevity.
  • Question-Specific Responses: (Lines 4–8) with explicit instructions for handling yes/no, what/who/where, when, and how/why questions.
  • Formatting Rules: (Lines 9–11) enforcing no punctuation and that responses be in lowercase.
  • Context-Only Responses: (Line 12) ensuring that answers are derived solely from the provided context.
  • Overall Protocol Meaning: (Line 14) summarizing the design goal for direct communication.

Suggestions:

  • Consider adding a short header or file description at the very top (before line 1) to clarify the file’s purpose for future maintainers.
  • Optionally remove or repurpose extra blank lines (such as after line 2) to enhance compactness and consistency with other benchmark prompt files.

Overall, the document is consistent and clear in its intent.

cognee/infrastructure/llm/prompts/answer_simple_question_benchmark2.txt (1)

2-7: Bullet Instruction Clarity and Style Suggestion

The bullet points effectively specify the response guidelines for various question types. However, the repeated use of a similar structure (each bullet beginning with "- For") may trigger style warnings and could benefit from slight rewordings to enhance readability. Consider varying the sentence starters in some bullets to avoid stylistic repetition.

🧰 Tools
🪛 LanguageTool

[style] ~5-~5: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...y with a single word or brief phrase. - For when questions: return only the relevan...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~6-~6: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...: return only the relevant date/time. - For how/why questions: use the briefest phr...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

cognee/infrastructure/llm/prompts/generate_graph_prompt_guided.txt (2)

10-15: Node Label Consistency – Wording Improvement
The guideline on label consistency is clear; however, the phrase on line 14 ("in a case of multiple words separated by whitespaces") could be rephrased for clarity. For example: "Each entity type should be singular, and if it consists of multiple words, they should be separated by a single space."


72-74: Inferred Facts Guidelines – Clarity Enhancement
The guideline on extracting inferred facts is valuable for enhancing the knowledge graph. To avoid ambiguity, consider adding a brief example or additional context on what qualifies as an inferred fact.

cognee/infrastructure/llm/prompts/generate_graph_prompt_strict.txt (2)

31-49: Clear Guidance on Dates, Numbers, and Properties

The sections on date formatting and numerical properties are detailed and unambiguous. The emphasis on using the "YYYY-MM-DD" format for dates and enforcing snake_case for properties adds necessary rigor.
Suggestion: Including an explicit example of a key-value extraction for a numerical value might further aid implementers.


89-89: Annotation Consistency

Line 89 is missing the tilde (~) annotation that is consistently used on the other added lines. Please update this line's annotation to maintain consistency across the file.

cognee/tasks/graph/extract_graph_from_data.py (1)

2-2: Unused import detected.

The Optional type is imported but not used anywhere in this file. This creates unnecessary clutter.

-from typing import Type, List, Optional
+from typing import Type, List
🧰 Tools
🪛 Ruff (0.8.2)

2-2: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)

cognee/infrastructure/llm/prompts/generate_graph_prompt_oneshot.txt (3)

30-32: Minor grammatical improvement opportunity.

Consider revising the wording for clarity:

-  - Always use full, canonical names.
+  - Always use complete, canonical names.
🧰 Tools
🪛 LanguageTool

[grammar] ~30-~30: Did you mean the adjective “useful”?
Context: ...ived directly from the text. - Always use full, canonical names. - Do not use in...

(THANK_FULL)


83-89: Date format consistency.

The date example includes a missing comma after the year which is inconsistent with standard format:

-> **Input**: "Google was founded on September 4, 1998 and has a market cap of 800000000000."
+> **Input**: "Google was founded on September 4, 1998, and has a market cap of 800000000000."
🧰 Tools
🪛 LanguageTool

[style] ~83-~83: Some style guides suggest that commas should set off the year in a month-day-year date.
Context: ...**: "Google was founded on September 4, 1998 and has a market cap of 800000000000." ...

(MISSING_COMMA_AFTER_YEAR)


127-132: Wording simplification opportunity.

The phrase "absolutely essential" could be simplified for clarity:

-- **Rule**: Avoid vague or empty edges (e.g., "X is a concept") unless absolutely essential.
+- **Rule**: Avoid vague or empty edges (e.g., "X is a concept") unless essential.
🧰 Tools
🪛 LanguageTool

[style] ~127-~127: ‘absolutely essential’ might be wordy. Consider a shorter alternative.
Context: ...y edges (e.g., "X is a concept") unless absolutely essential. ### 4.3 Inferred Facts - Rule: On...

(EN_WORDINESS_PREMIUM_ABSOLUTELY_ESSENTIAL)

cognee/tasks/ingestion/migrate_relational_database.py (1)

148-168: Good implementation of duplicate edge removal.

The _remove_duplicate_edges function efficiently eliminates duplicate edges using a set to track unique edges. The implementation correctly handles the dictionary attributes by converting them to a hashable representation.

Consider adding logging to track how many duplicates were removed:

def _remove_duplicate_edges(edge_mapping):
    seen = set()
    unique_original_shape = []

    for tup in edge_mapping:
        # We go through all the tuples in the edge_mapping and we only add unique tuples to the list
        # To eliminate duplicate edges.
        source_id, target_id, rel_name, rel_dict = tup
        # We need to convert the dictionary to a frozenset to be able to compare values for it
        rel_dict_hashable = frozenset(sorted(rel_dict.items()))
        hashable_tup = (source_id, target_id, rel_name, rel_dict_hashable)

        # We use the seen set to keep track of unique edges
        if hashable_tup not in seen:
            # A list that has frozensets elements instead of dictionaries is needed to be able to compare values
            seen.add(hashable_tup)
            # append the original tuple shape (with the dictionary) if it's the first time we see it
            unique_original_shape.append(tup)

+    duplicates_removed = len(edge_mapping) - len(unique_original_shape)
+    if duplicates_removed > 0:
+        logger.info(f"Removed {duplicates_removed} duplicate edges")
    return unique_original_shape
cognee/base_config.py (1)

17-18: Ensure secure handling of default user credentials.

Storing user credentials in environment variables is generally acceptable, but make sure they aren’t logged or exposed inadvertently. For enhanced security, consider using a secrets manager or vault-based approach.

cognee/modules/data/extraction/knowledge_graph/extract_content_graph.py (1)

1-1: Remove unused import.

The Optional import from typing is not being used in this file.

-from typing import Type, Optional
+from typing import Type
🧰 Tools
🪛 Ruff (0.8.2)

1-1: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)

cognee/api/v1/config/config.py (1)

134-147: Fix documentation in docstring for migration config method.

The docstring incorrectly refers to "relational db config" instead of "migration db config".

-        Updates the relational db config with values from config_dict.
+        Updates the migration db config with values from config_dict.

Otherwise, the implementation correctly follows the pattern used in other configuration methods, which provides consistency in the codebase.

cognee/eval_framework/corpus_builder/run_corpus_builder.py (1)

3-3: Remove unused import

The Optional type is imported but not used in the code.

-from typing import List, Optional
+from typing import List
🧰 Tools
🪛 Ruff (0.8.2)

3-3: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)

cognee/infrastructure/databases/relational/config.py (2)

38-38: Consider using Optional[str] for readability.

Instead of Union[str, None], you may prefer Optional[str] from typing to make it more apparent that None is allowed and to align with typical Python conventions.

-    migration_db_name: Union[str, None] = None
+    from typing import Optional
+    migration_db_name: Optional[str] = None

43-43: Apply the same convention for migration_db_provider.

Similar to the suggestion above, opt for Optional[str].

-    migration_db_provider: Union[str, None] = None
+    migration_db_provider: Optional[str] = None
cognee/tests/test_cognee_server_start.py (2)

10-28: Enhance server startup reliability.

Relying on a fixed 20-second time.sleep may cause flakiness on slower machines or waste time on faster ones. Consider a loop to check readiness (e.g., repeatedly try connecting until the server responds or a timeout is reached).
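A readiness poll along these lines would replace the fixed sleep; the wait_for_server helper and its URL are illustrative, not part of the actual test suite:

```python
import time
import urllib.error
import urllib.request

def wait_for_server(url, timeout=30.0, interval=0.5):
    """Poll the given URL until the server responds or the timeout elapses.

    Any HTTP response (even 4xx/5xx) counts as "server is up"; only
    connection-level failures trigger another attempt.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            urllib.request.urlopen(url, timeout=2)
            return True
        except urllib.error.HTTPError:
            return True  # the server answered, just with an error status
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # not up yet; retry until the deadline
    return False
```

This returns as soon as the server is ready on fast machines while still tolerating slow startups, which addresses both failure modes of the fixed sleep.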


38-44: Graceful teardown approach is acceptable.

Using os.killpg with SIGTERM is typically safe. Just ensure the parent process group logic works properly on non-POSIX systems. In smaller test environments, a simple terminate() might suffice, but this approach is fine if you tested it cross-platform.

.github/README_WORKFLOW_MIGRATION.md (1)

30-92: Good manual migration guide.

The step-by-step approach to switch on: triggers to workflow_call: is straightforward. Consider adding a note on how to reference the new test-suites.yml file, if it’s in a non-obvious location.

cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)

128-151: Potential performance concern in per-data-point lookups.

The loop executes a SELECT for each data point to determine if it needs an update or insert. This may be inefficient for large batches. Consider a bulk approach (e.g., a single SELECT for all IDs, building a map, then updating or inserting accordingly) to reduce database roundtrips.
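A minimal sketch of the bulk pattern being suggested, assuming data points are dicts keyed by "id" (the helper name is hypothetical, not part of PGVectorAdapter):

```python
# Hypothetical sketch: after one SELECT fetches all existing ids for the
# batch, partition the data points into updates vs. inserts in memory,
# replacing the per-row SELECT round trips.
def partition_data_points(data_points, existing_ids):
    """Split data points into (to_update, to_insert) given the set of
    ids already present in the table."""
    to_update, to_insert = [], []
    for dp in data_points:
        (to_update if dp["id"] in existing_ids else to_insert).append(dp)
    return to_update, to_insert
```

The two lists can then feed a bulk UPDATE and a bulk INSERT respectively, so the batch costs three round trips regardless of its size.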

evals/README.md (3)

66-69: Remove unnecessary blank lines.

There are several empty lines before the "Human Evaluation" section that disrupt the document flow. Consider removing them to maintain consistent spacing throughout the document.

- 
- 
- 
#### Human Evaluation

85-85: Fix grammar in the limitations list.

The sentence has a grammar issue that affects readability.

- LLM as a judge metrics are not reliable measure and can indicate the overall accuracy
+ LLM as a judge metrics are not a reliable measure and can only approximate the overall accuracy

87-87: Hyphenate the compound adjective "labor-intensive".

According to English style conventions, compound adjectives should be hyphenated when they precede a noun.

- Human as a judge is labor intensive and does not scale
+ Human as a judge is labor-intensive and does not scale
🧰 Tools
🪛 LanguageTool

[misspelling] ~87-~87: This word is normally spelled with a hyphen.
Context: ...memory evaluation - Human as a judge is labor intensive and does not scale - Hotpot is not the ...

(EN_COMPOUNDS_LABOR_INTENSIVE)

cognee/tests/test_relational_db_migration.py (2)

111-111: Rename unused loop variable.

The loop control variable edge_data is not used within the loop body, which is flagged by static analysis.

- for src, tgt, key, edge_data in edges:
+ for src, tgt, key, _edge_data in edges:
🧰 Tools
🪛 Ruff (0.8.2)

111-111: Loop control variable edge_data not used within loop body

Rename unused edge_data to _edge_data

(B007)


165-169: Replace magic numbers with named constants.

The assertions use hardcoded numbers for expected node and edge counts, which makes the tests brittle and harder to maintain.

+ # Define expected counts at the top of the file or in a configuration
+ SQLITE_EXPECTED_NODE_COUNT = 227
+ SQLITE_EXPECTED_EDGE_COUNT = 580
+ POSTGRES_EXPECTED_NODE_COUNT = 115
+ POSTGRES_EXPECTED_EDGE_COUNT = 356

# Then use these constants in the assertions
- assert node_count == 227, f"Expected 227 nodes, got {node_count}"
- assert edge_count == 580, f"Expected 580 edges, got {edge_count}"
+ assert node_count == SQLITE_EXPECTED_NODE_COUNT, f"Expected {SQLITE_EXPECTED_NODE_COUNT} nodes, got {node_count}"
+ assert edge_count == SQLITE_EXPECTED_EDGE_COUNT, f"Expected {SQLITE_EXPECTED_EDGE_COUNT} edges, got {edge_count}"

# And later for PostgreSQL
- assert node_count == 115, f"Expected 115 nodes, got {node_count}"
- assert edge_count == 356, f"Expected 356 edges, got {edge_count}"
+ assert node_count == POSTGRES_EXPECTED_NODE_COUNT, f"Expected {POSTGRES_EXPECTED_NODE_COUNT} nodes, got {node_count}"
+ assert edge_count == POSTGRES_EXPECTED_EDGE_COUNT, f"Expected {POSTGRES_EXPECTED_EDGE_COUNT} edges, got {edge_count}"

Also applies to: 196-200

evals/mem0_01042025/hotpot_qa_mem0.py (2)

28-28: Rename unused loop variable.

The loop control variable i is not used within the loop body.

- for i, document in enumerate(tqdm(corpus, desc="Adding documents")):
+ for _, document in enumerate(tqdm(corpus, desc="Adding documents")):
🧰 Tools
🪛 Ruff (0.8.2)

28-28: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


71-72: Consider handling empty search results.

The code doesn't check if the memory search returns empty results before trying to format them.

- relevant_memories = memory.search(query=question, user_id=user_id, limit=5)
- memories_str = "\n".join(f"- {entry['memory']}" for entry in relevant_memories["results"])
+ relevant_memories = memory.search(query=question, user_id=user_id, limit=5)
+ if not relevant_memories.get("results"):
+     memories_str = "No relevant context found."
+ else:
+     memories_str = "\n".join(f"- {entry['memory']}" for entry in relevant_memories["results"])
evals/plot_metrics.py (2)

9-20: Robust confidence interval calculation.

This method correctly uses stats.t.interval for computing the confidence interval. Consider handling edge cases like NaN values or infinities in accuracies if your data source can produce malformed values.
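One possible guard, assuming the scipy-based stats.t.interval call mentioned above (safe_confidence_interval is a hypothetical wrapper, not the function in evals/plot_metrics.py):

```python
import math
from scipy import stats

def safe_confidence_interval(accuracies, confidence=0.95):
    """Filter out NaN/inf before computing a t-based confidence interval."""
    clean = [a for a in accuracies if math.isfinite(a)]
    if len(clean) < 2:
        # A t-interval is undefined for fewer than two points
        return (float("nan"), float("nan"))
    mean = sum(clean) / len(clean)
    sem = stats.sem(clean)
    if sem == 0:
        return (mean, mean)  # all values identical; zero-width interval
    return stats.t.interval(confidence, len(clean) - 1, loc=mean, scale=sem)
```

Dropping non-finite values keeps one malformed accuracy from turning the whole interval into NaN, and the degenerate cases are made explicit rather than raising inside scipy.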


62-175: Extensive logic for loading specialized metrics.

The conditional checks for system-specific metrics (Graphiti, Mem0, Cognee) are well-structured. However, the nested conditions are somewhat verbose. You might consider centralizing the logic in a mapping dict to reduce repetitive code.

Additionally, lines 210-210 iterate over all_systems_metrics.keys(). This could be streamlined:

-for system in all_systems_metrics.keys():
+for system in all_systems_metrics:
cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py (2)

33-46: Combining base tasks with graph-related tasks.

Returning [graph_task, add_data_points_task] in addition to the base tasks is straightforward and clear. If user is essential to these tasks, confirm its usage or future usage in subsequent references.


49-58: Core chunking tasks plus data points addition.

Similar to get_no_summary_tasks, user is introduced but not used within the function body. If it’s truly optional, consider documenting that in the docstring for clarity.

evals/falkor_01042025/hotpot_qa_falkor_graphrag_sdk.py (2)

1-9: Unused import detected.

URL from graphrag_sdk.source is never used (line 5). Consider removing it to keep imports clean:

- from graphrag_sdk.source import URL, STRING
+ from graphrag_sdk.source import STRING
🧰 Tools
🪛 Ruff (0.8.2)

5-5: graphrag_sdk.source.URL imported but unused

Remove unused import: graphrag_sdk.source.URL

(F401)


35-64: Ontology creation logic seems solid.

The function creates an ontology from source data and writes it to JSON. Good job providing print statements for user awareness. Consider robust error handling if the corpus file has unexpected structure.

evals/graphiti_01042025/hotpot_qa_graphiti.py (3)

9-9: Remove unused import

The from openai import OpenAI import is never referenced. Removing it will keep the code clean and avoid confusion.

- from openai import OpenAI
🧰 Tools
🪛 Ruff (0.8.2)

9-9: openai.OpenAI imported but unused

Remove unused import: openai.OpenAI

(F401)


23-24: Add robust file handling

Reading JSON files without error handling can lead to runtime failures if files are missing or malformed. Safely handle file I/O and JSON parsing errors to improve resilience.

 try:
     with open(corpus_file, "r", encoding="utf-8") as file:
         corpus = json.load(file)
 except (FileNotFoundError, json.JSONDecodeError) as e:
     print(f"Error loading {corpus_file}: {e}")
     return

Also applies to: 61-62, 118-119


94-95: Handle potential LLM call failures

For calls to llm.ainvoke, consider adding retries or exception handling to manage rate limits, network errors, or unexpected responses from the LLM.

cognee/modules/pipelines/operations/run_tasks_base.py (1)

18-42: Check for potential recursion depth with large tasks

The approach of yielding results via run_tasks_base within an async generator is elegant. However, if there are many leftover tasks, repeated calls could accumulate recursive iterations. Ensure that the sequence of tasks is finite and that you won't exceed recursion limits or have performance bottlenecks.
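For contrast, a purely iterative runner sidesteps recursion-depth concerns entirely; this sketch is illustrative and far simpler than run_tasks_base itself:

```python
import asyncio

# Hypothetical sketch: run a pipeline of tasks with a plain loop instead of
# recursive async-generator chaining, so deep task lists cannot accumulate
# recursive iterations.
async def run_pipeline(tasks, data):
    result = data
    for task in tasks:
        out = task(result)
        # Await coroutine tasks; pass through plain-function results.
        result = await out if asyncio.iscoroutine(out) else out
    return result
```

The trade-off is that a loop like this cannot stream intermediate results the way the generator-based design can, which is presumably why run_tasks_base yields instead.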

🛑 Comments failed to post (9)
cognee/shared/logging_utils.py (1)

216-217: ⚠️ Potential issue

Critical syntax error in hasattr check.

The __name__ reference is incorrect: it passes the enclosing module's __name__ value rather than the string literal "__name__", so hasattr checks for the wrong attribute and the guard never behaves as intended.

-            if hasattr(exc_type, __name__):
+            if hasattr(exc_type, "__name__"):
                event_dict["exception_type"] = exc_type.__name__
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

            if hasattr(exc_type, "__name__"):
                event_dict["exception_type"] = exc_type.__name__
.github/workflows/disable_independent_workflows.sh (1)

4-4: ⚠️ Potential issue

Add error handling to directory change.

The cd command should include error handling to prevent the script from continuing execution in the wrong directory if the command fails.

-cd "$(dirname "$0")"
+cd "$(dirname "$0")" || exit 1
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

cd "$(dirname "$0")" || exit 1
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 4-4: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

cognee/tests/test_telemetry.py (1)

44-44: ⚠️ Potential issue

Fix misleading comment about environment setting.

The comment suggests setting the environment to "dev" but the code sets it to "prod", which is contradictory.

- os.environ["ENV"] = "prod"  # Set to dev to ensure telemetry is sent
+ os.environ["ENV"] = "prod"  # Set to prod to ensure telemetry is sent
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        os.environ["ENV"] = "prod"  # Set to prod to ensure telemetry is sent
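The intent behind the flag can be sketched as a simple environment gate (hypothetical helper name; the real check lives in cognee's telemetry code):

```python
import os

def telemetry_enabled() -> bool:
    # Hypothetical gate: only send telemetry outside local development.
    return os.environ.get("ENV", "dev") == "prod"

os.environ["ENV"] = "prod"  # matching the test's setting
enabled = telemetry_enabled()
```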
cognee/tests/test_relational_db_migration.py (1)

221-221: 🛠️ Refactor suggestion

Add more detailed instructions for PostgreSQL test setup.

The comment mentions running a SQL script but lacks specifics on where to find it or how to run it.

- # To run test manually you first need to run the Chinook_PostgreSql.sql script in the test_data directory
+ # To run this test manually:
+ # 1. Ensure PostgreSQL is running and accessible with the credentials specified below
+ # 2. Run the following command from the project root:
+ #    psql -U cognee -d test_migration_db -h 127.0.0.1 -f cognee/tests/test_data/Chinook_PostgreSql.sql
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    # To run this test manually:
    # 1. Ensure PostgreSQL is running and accessible with the credentials specified below
    # 2. Run the following command from the project root:
    #    psql -U cognee -d test_migration_db -h 127.0.0.1 -f cognee/tests/test_data/Chinook_PostgreSql.sql
evals/mem0_01042025/hotpot_qa_mem0.py (4)

53-54: 🛠️ Refactor suggestion

Add error handling for QA pairs file.

Similar to the corpus loading, there's no error handling for loading the QA pairs file.

- with open(qa_pairs_file, "r") as file:
-     qa_pairs = json.load(file)
+ try:
+     with open(qa_pairs_file, "r") as file:
+         qa_pairs = json.load(file)
+ except FileNotFoundError:
+     print(f"Error: QA pairs file '{qa_pairs_file}' not found.")
+     return []
+ except json.JSONDecodeError:
+     print(f"Error: QA pairs file '{qa_pairs_file}' contains invalid JSON.")
+     return []
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    try:
        with open(qa_pairs_file, "r") as file:
            qa_pairs = json.load(file)
    except FileNotFoundError:
        print(f"Error: QA pairs file '{qa_pairs_file}' not found.")
        return []
    except json.JSONDecodeError:
        print(f"Error: QA pairs file '{qa_pairs_file}' contains invalid JSON.")
        return []
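Since the same load-and-validate pattern appears for both the corpus and the QA pairs, it could be factored into one small helper (a sketch; load_json_or_default is an illustrative name, not an existing function in the repo):

```python
import json

def load_json_or_default(path, default=None):
    """Load a JSON file, returning `default` if it is missing or malformed."""
    try:
        with open(path, "r", encoding="utf-8") as file:
            return json.load(file)
    except FileNotFoundError:
        print(f"Error: file '{path}' not found.")
        return default
    except json.JSONDecodeError:
        print(f"Error: file '{path}' contains invalid JSON.")
        return default

# Missing file falls back to the provided default instead of raising.
qa_pairs = load_json_or_default("no_such_file.json", default=[])
```

Both call sites would then reduce to a single line each, keeping the error messages consistent.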

111-136: 🛠️ Refactor suggestion

Add error handling in main function.

The main function should handle potential exceptions from the pipeline to prevent crashes.

- def main(config):
-     """Main function for HotpotQA memory pipeline."""
-     print("Starting HotpotQA memory pipeline...")
-     print(f"Configuration: {config}")
-
-     # Initialize clients
-     memory = Memory()
-     openai_client = OpenAI()
-
-     # Load corpus to memory
-     load_corpus_to_memory(
-         memory=memory,
-         corpus_file=config.corpus_file,
-         user_id=config.user_id,
-         limit=config.corpus_limit,
-     )
-
-     # Answer questions
-     print(f"Answering questions from {config.qa_pairs_file}...")
-     answer_questions(
-         memory=memory,
-         openai_client=openai_client,
-         model_name=config.model_name,
-         user_id=config.user_id,
-         qa_pairs_file=config.qa_pairs_file,
-         print_results=config.print_results,
-         output_file=config.results_file,
-         limit=config.qa_limit,
-     )
-
-     print(f"Results saved to {config.results_file}")
-     print("Pipeline completed successfully")
+ def main(config):
+     """Main function for HotpotQA memory pipeline."""
+     try:
+         print("Starting HotpotQA memory pipeline...")
+         print(f"Configuration: {config}")
+
+         # Initialize clients
+         memory = Memory()
+         openai_client = OpenAI()
+
+         # Load corpus to memory
+         load_corpus_to_memory(
+             memory=memory,
+             corpus_file=config.corpus_file,
+             user_id=config.user_id,
+             limit=config.corpus_limit,
+         )
+
+         # Answer questions
+         print(f"Answering questions from {config.qa_pairs_file}...")
+         answer_questions(
+             memory=memory,
+             openai_client=openai_client,
+             model_name=config.model_name,
+             user_id=config.user_id,
+             qa_pairs_file=config.qa_pairs_file,
+             print_results=config.print_results,
+             output_file=config.results_file,
+             limit=config.qa_limit,
+         )
+
+         print(f"Results saved to {config.results_file}")
+         print("Pipeline completed successfully")
+     except Exception as e:
+         print(f"Error in pipeline execution: {e}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

def main(config):
    """Main function for HotpotQA memory pipeline."""
    try:
        print("Starting HotpotQA memory pipeline...")
        print(f"Configuration: {config}")

        # Initialize clients
        memory = Memory()
        openai_client = OpenAI()

        # Load corpus to memory
        load_corpus_to_memory(
            memory=memory,
            corpus_file=config.corpus_file,
            user_id=config.user_id,
            limit=config.corpus_limit,
        )

        # Answer questions
        print(f"Answering questions from {config.qa_pairs_file}...")
        answer_questions(
            memory=memory,
            openai_client=openai_client,
            model_name=config.model_name,
            user_id=config.user_id,
            qa_pairs_file=config.qa_pairs_file,
            print_results=config.print_results,
            output_file=config.results_file,
            limit=config.qa_limit,
        )

        print(f"Results saved to {config.results_file}")
        print("Pipeline completed successfully")
    except Exception as e:
        print(f"Error in pipeline execution: {e}")

18-38: 🛠️ Refactor suggestion

Add error handling for file operations.

The function lacks error handling for file operations, which could fail if the corpus file doesn't exist or has invalid JSON.

- with open(corpus_file, "r") as file:
-     corpus = json.load(file)
+ try:
+     with open(corpus_file, "r") as file:
+         corpus = json.load(file)
+ except FileNotFoundError:
+     print(f"Error: Corpus file '{corpus_file}' not found.")
+     return memory
+ except json.JSONDecodeError:
+     print(f"Error: Corpus file '{corpus_file}' contains invalid JSON.")
+     return memory
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    print(f"Loading corpus from {corpus_file}...")
    try:
        with open(corpus_file, "r") as file:
            corpus = json.load(file)
    except FileNotFoundError:
        print(f"Error: Corpus file '{corpus_file}' not found.")
        return memory
    except json.JSONDecodeError:
        print(f"Error: Corpus file '{corpus_file}' contains invalid JSON.")
        return memory

    # Apply limit if specified
    if limit is not None:
        corpus = corpus[:limit]
        print(f"Limited to first {limit} documents")

    print(f"Adding {len(corpus)} documents to memory...")
    for i, document in enumerate(tqdm(corpus, desc="Adding documents")):
        # Create a conversation that includes the document content
        messages = [
            {"role": "system", "content": "This is a document to remember."},
            {"role": "user", "content": "Please remember this document."},
            {"role": "assistant", "content": document},
        ]
        memory.add(messages, user_id=user_id)

    print("All documents added to memory")
    return memory
🧰 Tools
🪛 Ruff (0.8.2)

28-28: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


98-100: 🛠️ Refactor suggestion

Add error handling for writing results.

The code should handle potential errors when writing to the output file.

- with open(output_file, "w", encoding="utf-8") as file:
-     json.dump(results, file, indent=2)
+ try:
+     with open(output_file, "w", encoding="utf-8") as file:
+         json.dump(results, file, indent=2)
+ except IOError as e:
+     print(f"Error: Failed to write results to '{output_file}': {e}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        print(f"Saving results to {output_file}...")
        try:
            with open(output_file, "w", encoding="utf-8") as file:
                json.dump(results, file, indent=2)
        except IOError as e:
            print(f"Error: Failed to write results to '{output_file}': {e}")
cognee/modules/pipelines/operations/run_tasks_base.py (1)

117-169: ⚠️ Potential issue

Guard against a potential None user

In handle_task, the code sends telemetry using user.id without checking whether user is None. If user is optional, referencing user.id will raise an AttributeError. Consider a safeguard or a fallback value.

-        user_id=user.id,
+        user_id=user.id if user else None,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

async def handle_task(
    running_task: Task,
    args: list,
    leftover_tasks: list[Task],
    next_task_batch_size: int,
    user: User,
):
    """Handle common task workflow with logging, telemetry, and error handling around the core execution logic."""
    # Get task information using the helper functions
    task_type = get_task_type(running_task)
    executor = get_task_executor(task_type)

    # Determine executor args based on task type
    execute_args = (args, leftover_tasks)
    if task_type in ["Async Generator", "Generator"]:
        execute_args += (next_task_batch_size,)

    logger.info(f"{task_type} task started: `{running_task.executable.__name__}`")
    send_telemetry(
        f"{task_type} Task Started",
        user_id=user.id if user else None,
        additional_properties={
            "task_name": running_task.executable.__name__,
        },
    )
    try:
        # Add user to the execute args
        complete_args = execute_args + (user,)
        async for result in executor(running_task, *complete_args):
            yield result

        logger.info(f"{task_type} task completed: `{running_task.executable.__name__}`")
        send_telemetry(
            f"{task_type} Task Completed",
            user_id=user.id if user else None,
            additional_properties={
                "task_name": running_task.executable.__name__,
            },
        )
    except Exception as error:
        logger.error(
            f"{task_type} task errored: `{running_task.executable.__name__}`\n{str(error)}\n",
            exc_info=True,
        )
        send_telemetry(
            f"{task_type} Task Errored",
            user_id=user.id if user else None,
            additional_properties={
                "task_name": running_task.executable.__name__,
            },
        )
        raise error
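The fallback can also be isolated into a tiny helper so each telemetry call stays uniform (a sketch with a stand-in User dataclass; the actual User model is defined elsewhere in cognee):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class User:
    id: str

def telemetry_user_id(user: Optional[User]) -> Optional[str]:
    # Avoid dereferencing a missing user; fall back to None.
    return user.id if user else None

with_user = telemetry_user_id(User(id="u-123"))
without_user = telemetry_user_id(None)
```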

@lxobr lxobr marked this pull request as draft April 14, 2025 14:38
@lxobr lxobr requested review from soobrosa and borisarzentar April 14, 2025 15:52
@lxobr lxobr marked this pull request as ready for review April 14, 2025 15:52
@lxobr lxobr requested a review from borisarzentar April 15, 2025 19:00
@lxobr
Copy link
Collaborator Author

lxobr commented Apr 15, 2025

I also fixed a little bug with the TaskGetters.

@lxobr lxobr merged commit d1eab97 into dev Apr 16, 2025
45 of 49 checks passed
@lxobr lxobr deleted the feature/cog-1879-tighten-run_tasks_base branch April 16, 2025 07:19
@soobrosa
Copy link
Contributor

soobrosa commented Apr 16, 2025

Great job @lxobr !
