Skip to content

Conversation

MaddyGuthridge
Copy link
Contributor

@MaddyGuthridge MaddyGuthridge commented Jan 25, 2025

Resolves: python-poetry#10105

My first Poetry contribution!

  • I've placed this inside the standard (not strict) checks intentionally, since otherwise errors occur later on in Poetry due to it expecting the file to exist.
  • I needed to add a temporary_cd context manager to the testutils module. Is there a better way of doing this?
  • The code for checking the property's validity felt extremely convoluted to me. If anyone can think of a cleaner way to write it, I'd love some suggestions.
  • This fix could be a band-aid rather than a proper solution. While it should prevent the confusing error described in the original issue, the fact that file-system checking is done during non-strict validation doesn't feel right to me. Would it be better to instead patch code that expects the license file to be valid for seemingly unimportant operations (eg poetry run pytest as per the original comment)?
  • Added tests for changed code.
  • Updated documentation for changed code.

Summary by Sourcery

Tests:

  • Added tests for license file validation.

Copy link

sourcery-ai bot commented Jan 25, 2025

Reviewer's Guide by Sourcery

This pull request introduces a check to ensure that the license file specified in the pyproject.toml exists. It also adds a utility for temporarily changing the current directory during tests.

Sequence diagram for license file validation

sequenceDiagram
    participant Factory as Poetry Factory
    participant FS as File System

    Factory->>Factory: validate()
    Note over Factory: Check project.license
    alt project has license data
        Factory->>Factory: Get license data
        alt license has file property
            Factory->>FS: Check if license file exists
            alt file does not exist
                Factory->>Factory: Add error to result
            end
        end
    end
Loading

File-Level Changes

Change Details Files
Added a check to validate the existence of the license file specified in project.license.file.
  • Added a check within the validate function to verify if the license file exists.
  • If the file does not exist, an error message is added to the validation result.
src/poetry/core/factory.py
Added a context manager for temporarily changing the current directory.
  • Introduced a temporary_cd context manager that changes the current directory to a specified path.
  • The context manager ensures that the current directory is restored to its original state after the context exits.
tests/testutils.py
Added tests for the new license file validation.
  • Added a test case test_validate_missing_license_file to verify that an error is reported when the license file is missing.
  • Added a test case test_validate_existing_license_file to verify that no error is reported when the license file exists.
tests/test_factory.py
Added a fixture for testing missing license files.
  • Added a new fixture missing_license_file with a pyproject.toml file that specifies a license file that does not exist.
tests/fixtures/missing_license_file/pyproject.toml

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!
  • Generate a plan of action for an issue: Comment @sourcery-ai plan on
    an issue to generate a plan of action for it.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@MaddyGuthridge MaddyGuthridge changed the title Report an error when a license file does not exist Pyproject validation: report an error when a license file does not exist Jan 25, 2025
Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @MaddyGuthridge - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider moving filesystem checks out of the validation phase. While this fix works, performing filesystem operations during pyproject.toml validation could cause issues in scenarios where we only want to parse/validate the file structure (e.g., CI/CD pipelines, documentation tools).
Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

license_data = project.get("license")
if isinstance(license_data, dict) and "file" in license_data:
license_path: str = license_data["file"]
if not Path(license_path).exists():
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): The license file path should be resolved relative to the pyproject.toml location, not the current working directory

This could cause false validation results if the code is run from a different directory. Consider using the config file's directory as the base path for resolution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the bot's assessment here, but to my knowledge there is no clear way to access the path of the license file within the validation. Changing it would likely require significant refactoring, which makes me wonder if this is the best place for this bug fix...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my answer from #800 (comment) applies here, too. License files and readme files are quite similar and should be treated the same. Thus, the question is whether it would be sufficient for your use case to handle it in poetry check?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this point, @MaddyGuthridge and @radoering. I agree that refactoring to access the pyproject.toml's directory might be extensive, and that consistency with how README files are handled is important.

@MaddyGuthridge, could you elaborate on why accessing the pyproject.toml's path for license file resolution would require significant refactoring? Understanding the specific challenges would help in exploring alternative solutions. Perhaps there's a less intrusive way to achieve this.

Regarding handling this in poetry check, @radoering, while that might be sufficient for some use cases, it still leaves a gap in the validation process. Ideally, the validation should be accurate regardless of where it's invoked. A user might encounter this issue in other contexts beyond poetry check, and having consistent behavior is crucial.

Let's explore a few options:

  1. Investigate minimal refactoring: Even if directly accessing the pyproject.toml's path is difficult, there might be a less invasive way to get the base directory. Perhaps we can pass it as a parameter during the validation process.

  2. Defer file existence check: If resolving the path is too complex for this stage, we could consider deferring the file existence check to a later stage where the project's root directory is readily available. This would avoid the immediate validation error but ensure the check happens eventually.

  3. Document the limitation: If neither of the above is feasible, we should clearly document this limitation, explaining that the license file path is resolved relative to the current working directory during this specific validation step. This would at least inform users about the potential for false positives/negatives.

I'm leaning towards option 1, as it addresses the root cause. Let's discuss the feasibility of a minimal refactoring approach before considering other options. I'm happy to help explore this further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm perfectly happy to handle it in poetry check (which I presume does strict validation). My reason for handling it in non-strict validation is that otherwise, a very confusing error message is given later-on in Poetry, as per python-poetry#10105. I'm interpreting your suggested fix as comprising of a few steps:

  1. Remove the check for the license path from code paths that are called from poetry run and similar commands which don't need to validate the license.
  2. Move this more-detailed error message to poetry check.
  3. Ensure that commands such as poetry build (which do need to check the license file) also give helpful error info.

To clarify for @sourcery-ai (and onlookers), by significant refactoring, I don't mean refactoring the entire project, but rather changing the interface in a way that may not be desired by the core maintainers (I want to avoid this without their approval given I am an outside contributor).

How does this sound, @radoering ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying, @MaddyGuthridge. I understand your concern about changing interfaces without maintainer approval. Your proposed 3-step approach sounds like a good compromise, addressing both the confusing error message and the validation accuracy.

To summarize your proposal:

  1. Remove license check from non-essential paths: This prevents the confusing error during commands like poetry run where license validation isn't strictly required.

  2. Move detailed error to poetry check: This ensures a comprehensive error message is presented during the dedicated checking phase.

  3. Ensure helpful errors during essential operations: Commands like poetry build, where the license file is crucial, should also provide informative error messages.

This approach avoids the need for immediate refactoring while still improving the user experience. I agree that this is a good path forward, especially for an outside contribution. It balances the need for a fix with the constraints of working with an existing codebase.

@radoering, what are your thoughts on this revised approach? Does it align with your suggestions from the linked issue comment?

@MaddyGuthridge, if this approach is agreed upon, I'm happy to help you think through the implementation details or review a revised PR based on these steps. Let me know if you'd like any assistance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaddyGuthridge I checked your issue again and noticed that license is different from readme in the way that we read the license file in Factory._configure_package_metadata():

raw_license = (root / license_file).read_text(encoding="utf-8")

(We do not read the readme file that early.)

Maybe, just catch the error there and raise one with a better error message?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out, @radoering! Catching the error in Factory._configure_package_metadata() and providing a better error message there seems like the most direct and least intrusive solution. This avoids the need for refactoring the validation logic or moving the check to a different part of the codebase.

@MaddyGuthridge, this approach aligns well with your original goal of providing a clearer error message without extensive changes. It also addresses the concern about the confusing error appearing in contexts where license validation isn't strictly necessary.

If you're comfortable with this approach, I'd suggest focusing your efforts on modifying _configure_package_metadata() to handle the missing license file gracefully and provide a user-friendly error message. This would likely be the most efficient and maintainable solution.

I'm happy to review a revised PR with this change. Let me know if you have any questions or would like to discuss the implementation details further.

MaddyGuthridge and others added 2 commits January 25, 2025 14:17
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
@MaddyGuthridge
Copy link
Contributor Author

MaddyGuthridge commented Jan 25, 2025

A little beyond the scope of this PR but I wonder if it could be helpful to have some simple TypedDict definitions for the config. It'd help a ton for reducing some of the type gymnastics I found myself doing in this PR. It's pretty unfortunate that Python doesn't have something as nice as TypeScript's superstruct which gives you type definitions and a schema (albeit non-JSON) at the same time.

@MaddyGuthridge
Copy link
Contributor Author

Based on the advice above, I've opened #827. It ended up being a much simpler approach than this, so I'll close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Misleading errors are reported if some pyproject.toml fields are invalid
2 participants