-
Notifications
You must be signed in to change notification settings - Fork 605
fix: raise an error when different rules produce identical (temp) output #3667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: raise an error when different rules produce identical (temp) output #3667
Conversation
📝 WalkthroughWalkthroughA new validation step was moved from Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Snakemake
participant DAG
User->>Snakemake: Start workflow
Snakemake->>DAG: init(progress=False)
DAG->>DAG: (No job validation here)
Snakemake->>DAG: check_jobs()
DAG->>DAG: For each needrun job not checked:
DAG->>DAG: call check_protected_output()
DAG->>DAG: For each unchecked job:
DAG->>DAG: Validate job (job.is_valid())
DAG->>DAG: For each output:
alt Output already seen and ambiguity not ignored
DAG-->>Snakemake: Raise AmbiguousRuleException
else
DAG->>_seen_outputs: Record output with job
end
DAG-->>Snakemake: Continue if no conflicts
Snakemake-->>User: Workflow proceeds or error reported
sequenceDiagram
participant TestSuite
participant Snakemake
participant DAG
TestSuite->>Snakemake: Run test workflow with ambiguous outputs
Snakemake->>DAG: init(progress=False)
Snakemake->>DAG: check_jobs()
DAG->>DAG: Detect duplicate outputs across jobs
DAG-->>Snakemake: Raise AmbiguousRuleException
Snakemake-->>TestSuite: Exception propagated
TestSuite->>TestSuite: Test passes if exception is raised
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (56)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
tests/tests.py (1)
445-451
: Replaceassert False
withraise AssertionError()
to handle Python optimizationThe test logic correctly verifies that the ambiguous rule case raises an
AmbiguousRuleException
. However, usingassert False
can be problematic when Python runs with optimization (python -O
) as assert statements are removed.def test_ambiguousruleexception(): try: run(dpath("test_ambiguousruleexception")) except AmbiguousRuleException: return - assert False, "This is an ambiguous case! Should have raised an error..." + raise AssertionError("This is an ambiguous case! Should have raised an error...")The test structure correctly validates that the new DAG validation detects ambiguous output files and raises the appropriate exception.
🧹 Nitpick comments (1)
src/snakemake/dag.py (1)
277-277
: Remove unused loop variable.The loop variable
i
is not used within the loop body and can be replaced with_
to indicate it's intentionally unused.Apply this diff:
- for i, job in enumerate(self.jobs): + for _, job in enumerate(self.jobs):Or if the enumeration isn't needed at all:
- for i, job in enumerate(self.jobs): + for job in self.jobs:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/snakemake/dag.py
(1 hunks)tests/tests.py
(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for...
**/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theself
argument of methods.
Do not suggest type annotation of thecls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturn
statement.
⚙️ Source: CodeRabbit Configuration File
List of files the instruction was applied to:
tests/tests.py
src/snakemake/dag.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.
Learnt from: leoschwarz
PR: snakemake/snakemake#3176
File: tests/test_output_index.py:99-157
Timestamp: 2025-01-17T12:00:09.368Z
Learning: New test dependencies for Snakemake should be introduced in separate PRs rather than being added as part of feature or refactoring PRs.
Learnt from: johanneskoester
PR: snakemake/snakemake#3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: Avoid adding input validation or error handling that unnecessarily complicates the code in the `snakemake` codebase, especially when the cases handled don't make sense.
Learnt from: kdm9
PR: snakemake/snakemake#3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.382Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.
Learnt from: johanneskoester
PR: snakemake/snakemake#2960
File: CHANGELOG.md:6-6
Timestamp: 2024-08-13T11:05:23.821Z
Learning: Do not review pull requests generated by Release Please for the Snakemake project.
Learnt from: johanneskoester
PR: snakemake/snakemake#2960
File: CHANGELOG.md:6-6
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Do not review pull requests generated by Release Please for the Snakemake project.
Learnt from: johanneskoester
PR: snakemake/snakemake#3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In `snakemake/dag.py`, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.
tests/tests.py (6)
Learnt from: johanneskoester
PR: snakemake/snakemake#3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.
Learnt from: leoschwarz
PR: snakemake/snakemake#3176
File: tests/test_output_index.py:99-157
Timestamp: 2025-01-17T12:00:09.368Z
Learning: New test dependencies for Snakemake should be introduced in separate PRs rather than being added as part of feature or refactoring PRs.
Learnt from: johanneskoester
PR: snakemake/snakemake#2985
File: tests/tests.py:2051-2051
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In the Snakemake repository, avoid suggesting return type annotations for test functions.
Learnt from: johanneskoester
PR: snakemake/snakemake#2985
File: tests/tests.py:2051-2051
Timestamp: 2024-08-13T09:25:24.046Z
Learning: In the Snakemake repository, avoid suggesting return type annotations for test functions.
Learnt from: mbhall88
PR: snakemake/snakemake#3188
File: tests/test_script/scripts/test.sh:4-4
Timestamp: 2024-11-07T00:32:44.137Z
Learning: In test scripts within the Snakemake project, concise code is preferred over verbose error handling when simplicity suffices.
Learnt from: johanneskoester
PR: snakemake/snakemake#2927
File: tests/test_subpath/Snakefile:11-11
Timestamp: 2024-12-14T13:10:48.450Z
Learning: In Snakemake, when a rule defines a single output file (even if the output is specified as a dictionary), `{output}` in the shell command can be used directly to refer to that file.
src/snakemake/dag.py (6)
undefined
<retrieved_learning>
Learnt from: johanneskoester
PR: #3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In snakemake/dag.py
, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.
</retrieved_learning>
<retrieved_learning>
Learnt from: johanneskoester
PR: #3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In snakemake/dag.py
, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.
</retrieved_learning>
<retrieved_learning>
Learnt from: lczech
PR: #3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In snakemake/scheduler.py
, avoid suggesting the use of asyncio.gather
in the jobs_rewards
method due to overhead concerns and the need for immediate results.
</retrieved_learning>
<retrieved_learning>
Learnt from: johanneskoester
PR: #3184
File: snakemake/dag.py:1340-1342
Timestamp: 2024-11-05T09:41:56.469Z
Learning: The method self.workflow.persistence.software_stack_changed(job)
in snakemake/persistence.py
is not asynchronous.
</retrieved_learning>
<retrieved_learning>
Learnt from: johanneskoester
PR: #3096
File: snakemake/dag.py:1943-1943
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In the temp file deletion logic, temp files are only deleted after confirming that no other job (including non-checkpoint jobs) still needs them.
</retrieved_learning>
<retrieved_learning>
Learnt from: johanneskoester
PR: #3096
File: snakemake/dag.py:1943-1943
Timestamp: 2024-09-19T18:47:50.192Z
Learning: In the temp file deletion logic, temp files are only deleted after confirming that no other job (including non-checkpoint jobs) still needs them.
</retrieved_learning>
🪛 Ruff (0.11.9)
tests/tests.py
2450-2450: Do not assert False
(python -O
removes these calls), raise AssertionError()
Replace assert False
(B011)
src/snakemake/dag.py
276-276: Undefined name Dict
(F821)
277-277: Loop control variable i
not used within loop body
Rename unused i
to _i
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (38)
- GitHub Check: tests (2, ubuntu-latest, py311)
- GitHub Check: tests (7, ubuntu-latest, py311)
- GitHub Check: tests (10, windows-latest, py312)
- GitHub Check: tests (10, macos-latest, py312)
- GitHub Check: tests (10, ubuntu-latest, py311)
- GitHub Check: tests (9, ubuntu-latest, py312)
- GitHub Check: tests (10, ubuntu-latest, py312)
- GitHub Check: tests (8, windows-latest, py312)
- GitHub Check: tests (9, ubuntu-latest, py311)
- GitHub Check: tests (7, windows-latest, py312)
- GitHub Check: tests (9, windows-latest, py312)
- GitHub Check: tests (8, macos-latest, py312)
- GitHub Check: tests (7, ubuntu-latest, py312)
- GitHub Check: tests (6, ubuntu-latest, py312)
- GitHub Check: tests (4, macos-latest, py312)
- GitHub Check: tests (6, windows-latest, py312)
- GitHub Check: tests (3, ubuntu-latest, py312)
- GitHub Check: tests (6, macos-latest, py312)
- GitHub Check: tests (8, ubuntu-latest, py311)
- GitHub Check: tests (8, ubuntu-latest, py312)
- GitHub Check: tests (5, macos-latest, py312)
- GitHub Check: tests (6, ubuntu-latest, py311)
- GitHub Check: tests (5, windows-latest, py312)
- GitHub Check: tests (5, ubuntu-latest, py311)
- GitHub Check: tests (5, ubuntu-latest, py312)
- GitHub Check: tests (4, windows-latest, py312)
- GitHub Check: tests (4, ubuntu-latest, py312)
- GitHub Check: tests (3, windows-latest, py312)
- GitHub Check: tests (4, ubuntu-latest, py311)
- GitHub Check: tests (3, ubuntu-latest, py311)
- GitHub Check: tests (1, windows-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py312)
- GitHub Check: tests (2, windows-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py311)
- GitHub Check: apidocs
- GitHub Check: docs
- GitHub Check: formatting
🔇 Additional comments (2)
src/snakemake/dag.py (1)
278-289
: Excellent addition for detecting ambiguous output files.This validation logic effectively addresses the edge case mentioned in the PR description where multiple jobs could produce the same output file. The implementation correctly:
- Tracks seen outputs to detect duplicates
- Respects the
ignore_ambiguity
setting- Raises the appropriate
AmbiguousRuleException
with relevant job information- Integrates well with the existing ambiguity checking framework
The placement within the
init
method ensures this validation runs after the DAG is fully constructed, which is the appropriate time to detect such conflicts.tests/tests.py (1)
22-22
: LGTM: Import statement is correctThe import of
AmbiguousRuleException
is necessary for the new test function and properly imported from the appropriate module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/test_ambiguousruleexception/Snakefile
(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.
Learnt from: leoschwarz
PR: snakemake/snakemake#3176
File: tests/test_output_index.py:99-157
Timestamp: 2025-01-17T12:00:09.368Z
Learning: New test dependencies for Snakemake should be introduced in separate PRs rather than being added as part of feature or refactoring PRs.
Learnt from: johanneskoester
PR: snakemake/snakemake#3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: Avoid adding input validation or error handling that unnecessarily complicates the code in the `snakemake` codebase, especially when the cases handled don't make sense.
Learnt from: kdm9
PR: snakemake/snakemake#3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.382Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.
Learnt from: johanneskoester
PR: snakemake/snakemake#3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In `snakemake/dag.py`, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.
Learnt from: johanneskoester
PR: snakemake/snakemake#2960
File: CHANGELOG.md:6-6
Timestamp: 2024-08-13T11:05:23.821Z
Learning: Do not review pull requests generated by Release Please for the Snakemake project.
Learnt from: johanneskoester
PR: snakemake/snakemake#2960
File: CHANGELOG.md:6-6
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Do not review pull requests generated by Release Please for the Snakemake project.
Learnt from: johanneskoester
PR: snakemake/snakemake#2927
File: tests/test_subpath/Snakefile:11-11
Timestamp: 2024-12-14T13:10:48.450Z
Learning: In Snakemake, when a rule defines a single output file (even if the output is specified as a dictionary), `{output}` in the shell command can be used directly to refer to that file.
tests/test_ambiguousruleexception/Snakefile (7)
Learnt from: johanneskoester
PR: snakemake/snakemake#3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.
Learnt from: leoschwarz
PR: snakemake/snakemake#3176
File: tests/test_output_index.py:99-157
Timestamp: 2025-01-17T12:00:09.368Z
Learning: New test dependencies for Snakemake should be introduced in separate PRs rather than being added as part of feature or refactoring PRs.
Learnt from: johanneskoester
PR: snakemake/snakemake#2925
File: tests/test_nonstr_params/Snakefile:14-15
Timestamp: 2024-12-21T15:10:31.992Z
Learning: The file "test.jsonl" in tests/test_nonstr_params is automatically created by Snakemake, rather than manually generated in the Snakefile.
Learnt from: mbhall88
PR: snakemake/snakemake#3188
File: tests/test_script/scripts/test.sh:4-4
Timestamp: 2024-11-07T00:32:44.137Z
Learning: In test scripts within the Snakemake project, concise code is preferred over verbose error handling when simplicity suffices.
Learnt from: johanneskoester
PR: snakemake/snakemake#2985
File: tests/tests.py:2051-2051
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In the Snakemake repository, avoid suggesting return type annotations for test functions.
Learnt from: johanneskoester
PR: snakemake/snakemake#2985
File: tests/tests.py:2051-2051
Timestamp: 2024-08-13T09:25:24.046Z
Learning: In the Snakemake repository, avoid suggesting return type annotations for test functions.
Learnt from: johanneskoester
PR: snakemake/snakemake#2927
File: tests/test_subpath/Snakefile:11-11
Timestamp: 2024-12-14T13:10:48.450Z
Learning: In Snakemake, when a rule defines a single output file (even if the output is specified as a dictionary), `{output}` in the shell command can be used directly to refer to that file.
🔇 Additional comments (1)
tests/test_ambiguousruleexception/Snakefile (1)
1-10
: LGTM! Test setup correctly defines target files.The
all
rule properly specifies the target files that will trigger both rules A and B to execute, creating the scenario needed to test the ambiguous rule detection.
No need to update the changelog, that will happen automatically via release-please. |
🤖 I have created a release *beep* *boop* --- ## [9.10.0](v9.9.0...v9.10.0) (2025-08-19) ### Features * migrate to scheduler plugin interface and scheduler plugins ([#3676](#3676)) ([26fcd38](26fcd38)) ### Bug Fixes * don't rate limit jobs when touching ([#3699](#3699)) ([9c499e5](9c499e5)) * raise an error when different rules produce identical (temp) output ([#3667](#3667)) ([f627176](f627176)) * silence messages on module load ([#3688](#3688)) ([b13e9c8](b13e9c8)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
I noticed an edge case where unused temporary output of different rules would go to the same file. This can introduce nasty side effects, and also made it so that my rules needed to be re-run all the times (because two rules that produce the same output is always detected as a code change by one of the two rules). This is now nicely raises an exception giving an early warning to the user that sth is misconfigured.
I'm not sure how this affects performance or other stuff, let me know if sth needs changes or if this is actually wanted behaviour 😄
p.s. not sure if I am supposed to update the CHANGELOG?
QC
docs/
) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
Summary by CodeRabbit
Bug Fixes
Tests
New Features