Skip to content

add source locations tracking #168

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jan 10, 2022
Merged

add source locations tracking #168

merged 15 commits into from
Jan 10, 2022

Conversation

biojppm
Copy link
Owner

@biojppm biojppm commented Nov 4, 2021

Fixes #164.

New features

  • Parser:
    • add source() and filename() to get the latest buffer and filename to be parsed
    • add callbacks() to get the parser's callbacks
  • Add tracking of source code locations. This is useful for reporting semantic errors (ie where the YAML is syntatically valid but the contents are semantically invalid). The locations can be obtained lazily from the parser when the first location is queried:
    // To obtain locations, use of the parser is needed:
    ryml::Parser parser;
    ryml::Tree tree = parser.parse_in_arena("source.yml", R"({
    aa: contents,
    foo: [one, [two, three]]
    })");
    // After parsing, on the first call to obtain a location,
    // the parser will cache a lookup structure to accelerate
    // tracking the location of a node, with complexity
    // O(numchars(srcbuffer)). Then it will do the lookup, with
    // complexity O(log(numlines(srcbuffer))).
    ryml::Location loc = parser.location(tree.rootref());
    assert(parser.location_contents(loc).begins_with("{"));
    // note the location members are zero-based:
    assert(loc.offset == 0u);
    assert(loc.line == 0u);
    assert(loc.col == 0u);
    // On the next call to location(), the accelerator is reused
    // and only the lookup is done.
    loc = parser.location(tree["aa"]);
    assert(parser.location_contents(loc).begins_with("aa"));
    assert(loc.offset == 2u);
    assert(loc.line == 1u);
    assert(loc.col == 0u);
    // KEYSEQ in flow style: points at the key
    loc = parser.location(tree["foo"]);
    assert(parser.location_contents(loc).begins_with("foo"));
    assert(loc.offset == 16u);
    assert(loc.line == 2u);
    assert(loc.col == 0u);
    loc = parser.location(tree["foo"][0]);
    assert(parser.location_contents(loc).begins_with("one"));
    assert(loc.line == 2u);
    assert(loc.col == 6u);
    // SEQ in flow style: location points at the opening '[' (there's no key)
    loc = parser.location(tree["foo"][1]);
    assert(parser.location_contents(loc).begins_with("["));
    assert(loc.line == 2u);
    assert(loc.col == 11u);
    loc = parser.location(tree["foo"][1][0]);
    assert(parser.location_contents(loc).begins_with("two"));
    assert(loc.line == 2u);
    assert(loc.col == 12u);
    loc = parser.location(tree["foo"][1][1]);
    assert(parser.location_contents(loc).begins_with("three"));
    assert(loc.line == 2u);
    assert(loc.col == 17u);
    // NOTE: reusing the parser with a new YAML source buffer
    // will invalidate the accelerator.
    See more details in the quickstart sample.

Breaking changes

As part of the work above, opportunity was taken to address a number of pre-existing API issues. These changes consisted of:

  • Deprecate c4::yml::parse() and c4::yml::Parser::parse() overloads; all these functions will be shortly removed. Until removal, any call from client code will trigger a compiler warning.
  • Add parse() alternatives, either parse_in_place() or parse_in_arena():
    • parse_in_place() receives only substr buffers, ie mutable YAML source buffers. Trying to pass a csubstr buffer to parse_in_place() will cause a compile error:
      substr readwrite = /*...*/;
      Tree tree = parse_in_place(readwrite); // OK
      
      csubstr readonly = /*...*/;
      Tree tree = parse_in_place(readonly); // compile error
    • parse_in_arena() receives only csubstr buffers, ie immutable YAML source buffers. Prior to parsing, the buffer is copied to the tree's arena, then the copy is parsed in place. Because parse_in_arena() is meant for immutable buffers, overloads receiving a substr YAML buffer are now declared and marked deprecated, and intentionally left undefined, such that calling parse_in_arena() with a substr will cause a linker error.
      substr readwrite = /*...*/;
      Tree tree = parse_in_arena(readwrite); // compile warning+linker error
      This is to prevent an accidental extra copy of the mutable source buffer to the tree's arena: substr is implicitly convertible to csubstr. If you really intend to parse an originally mutable buffer in the tree's arena, convert it first to immutable by assigning the substr to a csubstr prior to calling parse_in_arena():
      substr readwrite = /*...*/;
      csubstr readonly = readwrite; // ok
      Tree tree = parse_in_arena(readonly); // ok
      This problem is not raised by parse_in_place() because csubstr is not implicitly convertible to substr.
  • In the python API, ryml.parse() was removed and not just deprecated; the parse_in_arena() and parse_in_place() now replace this.
  • Callbacks: changed behavior in Parser and Tree:
    • When a tree is copy-constructed or move-constructed to another, the receiving tree will start with the callbacks of the original.
    • When a tree is copy-assigned or move-assigned to another, the receiving tree will now change its callbacks to the original.
    • When a parser creates a new tree, the tree will now use a copy of the parser's callbacks object.
    • When an existing tree is given directly to the parser, both the tree and the parser now retain their own callback objects; any allocation or error during parsing will go through the respective callback object.

@biojppm biojppm changed the title [wip] adding locations [wip] optionally preserve source locations of nodes Nov 8, 2021
@biojppm biojppm marked this pull request as draft December 16, 2021 11:07
@biojppm biojppm force-pushed the preserve_location branch 2 times, most recently from 0c8c8f8 to 41bd851 Compare December 31, 2021 02:33
@codecov
Copy link

codecov bot commented Dec 31, 2021

Codecov Report

Merging #168 (070504d) into master (a02f273) will increase coverage by 0.27%.
The diff coverage is 97.89%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #168      +/-   ##
==========================================
+ Coverage   96.41%   96.69%   +0.27%     
==========================================
  Files          63       68       +5     
  Lines       12905    14393    +1488     
==========================================
+ Hits        12443    13917    +1474     
- Misses        462      476      +14     
Impacted Files Coverage Δ
test/test_basic.cpp 97.36% <ø> (-2.57%) ⬇️
test/test_tree.cpp 99.92% <ø> (ø)
test/test_case.hpp 46.01% <28.57%> (+9.90%) ⬆️
test/test_group.cpp 98.19% <84.00%> (+0.07%) ⬆️
src/c4/yml/tree.cpp 94.20% <92.21%> (-0.29%) ⬇️
src/c4/yml/parse.cpp 92.99% <95.69%> (+0.26%) ⬆️
test/test_location.cpp 98.62% <98.62%> (ø)
test/test_serialize.cpp 99.29% <99.29%> (ø)
test/test_parser.cpp 99.44% <99.44%> (ø)
samples/quickstart.cpp 99.53% <100.00%> (+0.02%) ⬆️
... and 24 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a02f273...070504d. Read the comment docs.

@biojppm biojppm force-pushed the preserve_location branch 4 times, most recently from 773a029 to 19d533f Compare January 3, 2022 03:44
@biojppm biojppm force-pushed the preserve_location branch 3 times, most recently from 23f4b8c to 9a1d0e6 Compare January 4, 2022 15:23
@biojppm biojppm force-pushed the preserve_location branch from 9a1d0e6 to b0b7591 Compare January 4, 2022 15:33
@biojppm biojppm marked this pull request as ready for review January 5, 2022 03:02
@biojppm biojppm force-pushed the preserve_location branch from 7034f8f to 33be969 Compare January 5, 2022 03:21
@biojppm biojppm changed the title [wip] optionally preserve source locations of nodes add source locations tracking Jan 5, 2022
@biojppm biojppm force-pushed the preserve_location branch 3 times, most recently from 2ea3d61 to dbdbec9 Compare January 5, 2022 17:08
@biojppm biojppm force-pushed the preserve_location branch from bfb0732 to 8161add Compare January 6, 2022 00:38
@cschreib
Copy link

cschreib commented Jan 6, 2022

Nice, I will test this whenever I get the chance. One thing I thought when reading the PR description: for calling parse_in_arena() with a mutable buffer, rather than deprecate + linker error, you can make it a compilation error by marking the overload as = delete.

@biojppm biojppm force-pushed the preserve_location branch 6 times, most recently from ea8f5e3 to e375143 Compare January 6, 2022 19:49
@biojppm biojppm force-pushed the preserve_location branch from e375143 to 070504d Compare January 6, 2022 19:53
@biojppm biojppm merged commit afb1980 into master Jan 10, 2022
@biojppm biojppm deleted the preserve_location branch January 10, 2022 00:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Accessing ryml::Location after parsing
2 participants