Skip to content

Conversation

willcl-ark
Copy link
Member

Potential followup to: #30025

This should prevent us reintroducing broken markdown links.

It does not test "online" (external) links, only those within this repo. Both relative and absolute links are parsed successfully if they resolve.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 3, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK maflcko, davidgumberg

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29965 (Lint: support running individual rust linters and improve subtree exclusion by davidgumberg)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot added the Tests label May 3, 2024
@willcl-ark
Copy link
Member Author

Without #30025 (or alternative fix) the new action being added here will fail.
It takes ~11s to run:

@willcl-ark willcl-ark force-pushed the link-check-action branch from 80d166b to b6be80c Compare May 7, 2024 19:22
@DrahtBot
Copy link
Contributor

DrahtBot commented May 7, 2024

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/24698006837

@willcl-ark willcl-ark force-pushed the link-check-action branch from 961a536 to 9d2a8a8 Compare May 8, 2024 10:04
Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

@willcl-ark willcl-ark marked this pull request as ready for review May 8, 2024 10:23
@maflcko
Copy link
Member

maflcko commented May 8, 2024

utACK 9d2a8a8

with or without the nits

@willcl-ark willcl-ark force-pushed the link-check-action branch from 9d2a8a8 to 82c5f95 Compare May 8, 2024 10:44
@willcl-ark
Copy link
Member Author

Thanks @maflcko , did a final force-push for that silly nit: git range-diff 9d2a8a8...82c5f95

Sorry to invalidate the ACK

@maflcko
Copy link
Member

maflcko commented May 8, 2024

utACK 82c5f95

Thanks!

@DrahtBot DrahtBot removed the CI failed label May 8, 2024
@davidgumberg
Copy link
Contributor

crACK 82c5f95

I tested by manually breaking relative and absolute offline links and verifying that the lint runner catches them.

Some notes:

  • At present, mlc does not check for anchor link validity, but likely will in the future. (Support anchor link targets becheran/mlc#31)

    For example:

    -Verification of [scripted diffs](/doc/developer-notes.md#scripted-diffs).
    +Verification of [scripted diffs](/doc/developer-notes.md#nonsense-link).

    mlc will not complain if you apply this diff

  • I have opened a PR (Move error summary output to stdout becheran/mlc#87) to remove the extraneous stderr output mlc produces when we suppress stdout:

    The following links could not be resolved:

  • I have some reservations about the growth of software that is encouraged or required to set up a development environment. mlc seems well maintained, and I don't think it would be better to shift the maintenance burden of markdown link checking to this repo, but it gives me some pause that mlc's entire dependency tree becomes a common attack surface for bitcoin developers.

    That being said, this holds for all of the existing linter dependencies, and this dependency is optional and I am not an expert in security.

    I am not sure if this PR is an appropriate or productive venue for this discussion, but I just wanted to give mention to this.

@maflcko
Copy link
Member

maflcko commented May 23, 2024

  • That being said, this holds for all of the existing linter dependencies, and this dependency is optional and I am not an expert in security.

Right, the pull here does not change that. I'd say, it is best to only run the linters locally in a sandbox, or at least skip the optional dependencies.

@@ -37,6 +37,7 @@ Then you can use:
| [`lint-python-dead-code.py`](/test/lint/lint-python-dead-code.py) | [vulture](https://github.com/jendrikseipp/vulture)
| [`lint-shell.py`](/test/lint/lint-shell.py) | [ShellCheck](https://github.com/koalaman/shellcheck)
| [`lint-spelling.py`](/test/lint/lint-spelling.py) | [codespell](https://github.com/codespell-project/codespell)
| markdown link check | [mlc](https://github.com/becheran/mlc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK on additional checks.

Could this have been an individual python lint script like the other ones in /test/lint/? I appreciated being able to run individual python scripts locally to check my work or when reviewing. It seems we have been losing that recently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure it could be, but as I see it would need us to write our own markdown parser and link checker.

I don't have the appetite for that myself, hence this approach.

Also see #29965 which may (I didn't fully check yet) permit you to run this lint individually in the future :)

@willcl-ark willcl-ark force-pushed the link-check-action branch from 82c5f95 to 838eace Compare May 24, 2024 09:20
Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK 838eace

@DrahtBot DrahtBot requested a review from davidgumberg May 24, 2024 09:54
This adds a markdown hyperlink check task to the lint test_runner. It
relies on having the [`mlc`](https://crates.io/crates/mlc) binary found
on $PATH, but will fail with `success` if the binary is not found.

`mlc` is also added to the ci/04_install.sh script run by the
containerfile.

Note that broken markdown hyperlinks will be detected in untracked
markdown files found in a dirty working directory (including e.g.
.venv).
@willcl-ark willcl-ark force-pushed the link-check-action branch from 838eace to 4b7d984 Compare May 24, 2024 11:12
@maflcko
Copy link
Member

maflcko commented May 24, 2024

re-utACK 4b7d984

@davidgumberg
Copy link
Contributor

reACK 4b7d984

Looks great!

Tested with the following diff:

diff --git a/test/lint/README.md b/test/lint/README.md
index 49ed8356c3..47ac472aa6 100644
--- a/test/lint/README.md
+++ b/test/lint/README.md
@@ -36,7 +36,7 @@ Then you can use:
 | [`lint-shell.py`](/test/lint/lint-shell.py) | [ShellCheck](https://github.com/koalaman/shellcheck)
-| [`lint-spelling.py`](/test/lint/lint-spelling.py) | [codespell](https://github.com/codespell-project/codespell)
+| [`lint-spelling.py`](/test/lint/lint-spling.py) | [codespell](https://github.com/codespell-project/codespell)
 | markdown link check | [mlc](https://github.com/becheran/mlc)
And the linter complained
$ cargo run
# [...]
One or more markdown links are broken.

Relative links are preferred (but not required) as jumping to file works natively within Emacs.

Markdown link errors found:
[Err ] ./test/lint/README.md (39, 3) => /test/lint/lint-spling.py - Target filename not found.

                
^---- ⚠️  Failure generated from markdown hyperlink check!

@fanquake fanquake merged commit 0a7c650 into bitcoin:master May 30, 2024
@hebasto
Copy link
Member

hebasto commented Jun 4, 2024

After conducting Guix builds, lint_markdown starts complaining about docs in guix-build-* subdirectories.

@willcl-ark
Copy link
Member Author

After conducting Guix builds, lint_markdown starts complaining about docs in guix-build-* subdirectories.

Oh I didn't consider this directory! I think it should be possible to add /guix-build-* to the ignored files.

@hebasto
Copy link
Member

hebasto commented Jun 4, 2024

After conducting Guix builds, lint_markdown starts complaining about docs in guix-build-* subdirectories.

Oh I didn't consider this directory! I think it should be possible to add /guix-build-* to the ignored files.

This was what I tried before writing my comment :)

It doesn't work, unfortunately.

@willcl-ark
Copy link
Member Author

Ah, that is unfortunate. I will try and find another option...

@willcl-ark
Copy link
Member Author

I think an upstream PR may be the best approach here, So I opened becheran/mlc#89

@maflcko
Copy link
Member

maflcko commented Jun 4, 2024

Not sure if globbing is the right fix. Conceptually, no folder or file should be scanned that is not tracked in git.

Though, I wonder if there is an easy way to ask git to print all files/folders that are "dirty", ignoring the gitignore. An alternative may be to teach mlc to read understand the gitignore files. However, that will also be incomplete, because devs may add random other folders themselves.

So for now the only solution would be to call mlc in a loop over git ls-files -- '*.md'. Maybe a better upsteam fix would be to teach mlc to iterate over all given files, instead of only over one?

@willcl-ark
Copy link
Member Author

Not sure if globbing is the right fix. Conceptually, no folder or file should be scanned that is not tracked in git.

Though, I wonder if there is an easy way to ask git to print all files/folders that are "dirty", ignoring the gitignore. An alternative may be to teach mlc to read understand the gitignore files. However, that will also be incomplete, because devs may add random other folders themselves.

So for now the only solution would be to call mlc in a loop over git ls-files -- '*.md'. Maybe a better upsteam fix would be to teach mlc to iterate over all given files, instead of only over one?

This was a. bit of a pain (and prob not worth it!) but seems to work: becheran/mlc@master...willcl-ark:mlc:git-aware

Passing the glob symbols to the subcommand in Rust was ... tricky, or a skill issue. So I went for a script which matches md (and html, which is the other thing this tool seems to do) against git check-ignore. Feels a bit hacky, but should result in the same as git ls-files -- **/* I think?

@maflcko do you think this hits the correct files? I think check-ignore will nicely include dirty files etc.

@maflcko
Copy link
Member

maflcko commented Jun 4, 2024

@maflcko do you think this hits the correct files? I think check-ignore will nicely include dirty files etc.

Ah sorry. I guess I was using the wrong words. With "dirty" I meant folders/files that should be ignored, but are not according to the gitignore. Maybe "untracked" is a better word, which is also used by git status. For example, a python venv or a generated untracked markdown file should not be included in the check.

@maflcko
Copy link
Member

maflcko commented Jun 7, 2024

git ls-files is also used in ./test/lint/lint-python.py, for reference. So using git ls-files -- '*.md' and passing the full array to mlc may be the best approach for us?

@willcl-ark
Copy link
Member Author

git ls-files is also used in ./test/lint/lint-python.py, for reference. So using git ls-files -- '*.md' and passing the full array to mlc may be the best approach for us?

Thanks, I changed the approach in https://github.com/willcl-ark/mlc/tree/git-aware to use git status --ignored *.md and append to the ignored paths, as adding to that felt like the path of least resistance.

I will investigate if I can have a --git flag only check the result of git ls-files -- '*.md', as I feel this makes even more sense than having a --gitignore flag exclude (all) git-ignored files... Most folks using the tool are going to want to check:

i) a single file or
ii) all files tracked by git

...I think

@bitcoin bitcoin locked and limited conversation to collaborators Jun 7, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants