
Support anchor link targets #31

@becheran


Anchor links

The part of a link after the `#`, called the anchor, is currently not checked. Examples of markdown links with an anchor target are `[anchor](#go-to-anchor-on-same-page)` and `[external](http://external.com#headline-1)`.

How do anchor links work?

HTML defines anchor targets via the anchor tag (e.g. `<a name="generator"></a>` or `<a id="generator"></a>`). An anchor target can also be any HTML tag with an `id` attribute (e.g. `<div id="fol-bar"></div>`).
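
As an illustration, here is a naive Rust sketch that collects such targets from an HTML string. `html_anchor_targets` is a hypothetical helper, not part of mlc. It only recognizes double-quoted `id` and `name` attributes preceded by whitespace; a real implementation would more likely use a proper HTML parser.

```rust
use std::collections::HashSet;

/// Hypothetical helper: naively collects the values of double-quoted
/// `id="..."` and `name="..."` attributes. Single-quoted or unquoted
/// attribute values are not handled.
fn html_anchor_targets(html: &str) -> HashSet<String> {
    let mut targets = HashSet::new();
    for attr in ["id=\"", "name=\""] {
        let mut rest = html;
        while let Some(start) = rest.find(attr) {
            // Require whitespace before the attribute so that e.g.
            // `data-id="..."` is not mistaken for an `id` attribute.
            let is_attribute = rest[..start].ends_with(char::is_whitespace);
            let after = &rest[start + attr.len()..];
            match after.find('"') {
                Some(end) => {
                    if is_attribute {
                        targets.insert(after[..end].to_string());
                    }
                    // Continue scanning after the closing quote.
                    rest = &after[end + 1..];
                }
                // Unterminated attribute value: stop scanning.
                None => break,
            }
        }
    }
    targets
}

fn main() {
    let html = r#"<a name="generator"></a> <div id="fol-bar"></div> <p data-id="ignored">"#;
    println!("{:?}", html_anchor_targets(html));
}
```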

The official markdown spec does not define anchor targets. However, most interpreters and renderers generate default anchor targets for markdown documents. For example, the GitHub markdown flavor auto-generates link targets for all headlines (h1 to h6) using the following rules:

  1. downcase the headline
  2. remove anything that is not a letter, number, space or hyphen
  3. change any space to a hyphen
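
The three rules can be sketched in Rust as follows. `headline_to_anchor` is a hypothetical name, "letter or number" is assumed to mean Rust's Unicode-aware `char::is_alphanumeric`, and GitHub's actual implementation may differ in details such as underscores or duplicate headlines.

```rust
/// Hypothetical helper: converts a headline into an anchor id using the
/// three rules listed above.
fn headline_to_anchor(headline: &str) -> String {
    headline
        // 1. downcase the headline
        .to_lowercase()
        .chars()
        // 2. keep only letters, numbers, spaces and hyphens
        .filter(|c| c.is_alphanumeric() || *c == ' ' || *c == '-')
        // 3. change every space to a hyphen
        .map(|c| if c == ' ' { '-' } else { c })
        .collect()
}

fn main() {
    for headline in ["Anchor links", "How do anchor links work", "Implementation hints"] {
        println!("{} -> #{}", headline, headline_to_anchor(headline));
    }
}
```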

Implementation hints

A good first step would be to add example markdown files with valid anchor links to the `benches` directory, which is used by the [end-to-end unit tests](./tests/end_to_end.rs).

The library's `run` method is the most important one: it uses all submodules and performs the actual execution.

In the link extractor module, the part after the `#` needs to be extracted and stored in the `MarkupLink` struct.
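
A minimal sketch of that split, assuming the anchor is everything after the first `#`. `LinkTarget` is a hypothetical, simplified stand-in for `MarkupLink` (the real struct has other fields and may name them differently), and treating an empty fragment such as `page.md#` as "no anchor" is an assumption.

```rust
/// Hypothetical, simplified stand-in for mlc's `MarkupLink`.
#[derive(Debug, PartialEq)]
struct LinkTarget {
    target: String,         // file path or URL without the anchor (empty = same page)
    anchor: Option<String>, // part after the first `#`, if any
}

/// Splits a link at its first `#` into target and anchor.
fn split_anchor(link: &str) -> LinkTarget {
    match link.split_once('#') {
        Some((target, anchor)) if !anchor.is_empty() => LinkTarget {
            target: target.to_string(),
            anchor: Some(anchor.to_string()),
        },
        // Empty fragment, e.g. `page.md#`: treated as "no anchor".
        Some((target, _)) => LinkTarget { target: target.to_string(), anchor: None },
        None => LinkTarget { target: link.to_string(), anchor: None },
    }
}

fn main() {
    for link in ["#go-to-anchor-on-same-page", "http://external.com#headline-1", "docs/readme.md"] {
        println!("{:?}", split_anchor(link));
    }
}
```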

The link validator module is responsible for the actual resolution, i.e. checking whether a resource exists (either on disk or as a URL). If an anchor was extracted, this code needs to be extended to not only check that the target exists, but also to parse the target file and extract all of its anchor targets.

The same must be done for web links. Currently a HEAD request is sent, and only if that fails is a GET request sent. If an anchor link needs to be followed, a GET request is required and the resulting page must be parsed for all anchors.
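
The request order described above could look roughly like this. `Method`, `request_sequence` and `check_web_link` are hypothetical names. The `send` closure stands in for a real HTTP client, and the anchor check is a naive substring search rather than real HTML parsing.

```rust
/// The two HTTP methods the validator uses.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Method {
    Head,
    Get,
}

/// Order in which requests are tried for a web link.
fn request_sequence(anchor: Option<&str>) -> Vec<Method> {
    match anchor {
        // The page body must be parsed, so a HEAD request would be wasted.
        Some(_) => vec![Method::Get],
        // Cheap HEAD first; GET only if HEAD fails.
        None => vec![Method::Head, Method::Get],
    }
}

/// Hypothetical driver. `send` returns the response body on success
/// (an empty body for HEAD) or `None` on failure.
fn check_web_link<F>(anchor: Option<&str>, mut send: F) -> bool
where
    F: FnMut(Method) -> Option<String>,
{
    for method in request_sequence(anchor) {
        if let Some(body) = send(method) {
            return match anchor {
                // Naive substring test; a real check would parse id/name attributes.
                Some(a) => {
                    body.contains(&format!("id=\"{}\"", a)) || body.contains(&format!("name=\"{}\"", a))
                }
                None => true,
            };
        }
    }
    false
}

fn main() {
    let page = r#"<h2 id="headline-1">Headline 1</h2>"#;
    let ok = check_web_link(Some("headline-1"), |_method| Some(page.to_string()));
    println!("anchor found: {}", ok);
}
```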

Besides the existing grouping of identical links, which are checked only once for performance, it would also make sense to parse a document that is the target of an anchor link only once and reuse the parse result for all other references to the same document. Also for performance reasons, only documents that actually have an anchor link pointing to them should be downloaded and parsed, not every document behind every link.
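
A minimal sketch of such a cache, under stated assumptions: `AnchorCache` and `anchor_exists` are hypothetical names, and the `load_anchors` closure stands in for whatever reads or downloads a document and extracts its anchor targets. Links without an anchor return early, so their documents are never loaded; each document with anchor links is loaded and parsed at most once.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical cache of parsed anchor targets, keyed by document.
struct AnchorCache<L> {
    load_anchors: L,
    parsed: HashMap<String, HashSet<String>>,
}

impl<L: FnMut(&str) -> HashSet<String>> AnchorCache<L> {
    fn new(load_anchors: L) -> Self {
        AnchorCache { load_anchors, parsed: HashMap::new() }
    }

    /// Checks `target#anchor`, loading and parsing `target` only on first use.
    fn anchor_exists(&mut self, target: &str, anchor: Option<&str>) -> bool {
        let anchor = match anchor {
            Some(a) => a,
            // No anchor: nothing to parse (existence is checked separately).
            None => return true,
        };
        if !self.parsed.contains_key(target) {
            let anchors = (self.load_anchors)(target);
            self.parsed.insert(target.to_string(), anchors);
        }
        self.parsed[target].contains(anchor)
    }
}

fn main() {
    let mut cache = AnchorCache::new(|target: &str| {
        println!("parsing {}", target);
        let mut anchors = HashSet::new();
        anchors.insert("installation".to_string());
        anchors
    });
    // Two references to the same document: it is parsed only once.
    println!("{}", cache.anchor_exists("README.md", Some("installation")));
    println!("{}", cache.anchor_exists("README.md", Some("usage")));
    // No anchor: the document is never loaded.
    println!("{}", cache.anchor_exists("CHANGELOG.md", None));
}
```

Keying by the target string means `./README.md` and `README.md` would be cached separately unless paths are normalized first.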
