-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
The fix for #618 (7243989) introduced DUPN
to the compilation of reduce f as $x (init; update)
and foreach f as $x (init; update; extract)
to address extra references being held.
However, the value that DUPN
dupes (that is, .
as it is passed to f
) comes from before a forkpoint, which makes DUPN
unsafe here; if we backtrack to the FORK
(in particular, if init
backtracks), then f
will see .
mutated to null
. This is a problem, as to my understanding the mutation in DUPN
(and the other -N
instructions) should be a non-observable optimization.
To reproduce
$ jq --null-input '[] | reduce .[] as $x (0, 0; .)'
0
jq: error (at <unknown>): Cannot iterate over null (null)
$ jq --null-input '[] | foreach .[] as $x (0, 0; .)'
jq: error (at <unknown>): Cannot iterate over null (null)
The init contains a ,
and thus backtracks. .[]
then gets []
on the first pass, but null
when backtracking.
Expected behavior
reduce
and foreach
should not break backtracks that run back through the body. That is to say, we should see:
$ jq --null-input '[] | reduce .[] as $x (0, 0; .)'
0
0
$ jq --null-input '[] | foreach .[] as $x (0, 0; .)'
<empty>
Put another way, it should be the same as e.g. reduce [][] as $x (0, 0; .)
.
Environment
- OS: Linux (NixOS 25.05)
jq
version: tested against currentmaster
(588ff18) and 1.7.1
Additional context
I noticed this issue while inspecting the bytecode, which gives some insights towards addressing it, which I've noted below.
From digging in thread history, it seems that @leonid-s-usov noticed the same issues with the bytecode while looking for a way to remove the DUPN
instruction for another reason. They give a draft of potential changes in #2059, which from my testing would be sufficient to fix this bug.
foreach
This shouldn't be an issue with foreach
in the first place, and should be fixable without having to compromise. Inspecting the bytecode for the example above:
0000 TOP
0001 LOADK []
// foreach init
0003 DUP
0004 FORK 0010
0006 LOADK 0
0008 JUMP 0012
0010 LOADK 0
0012 STOREV $foreach:0
// foreach body
0015 FORK 0031
// generator
0017 DUPN
0018 EACH
0019 STOREV $x:1
// update
0022 LOADVN $foreach:0
0025 DUP
0026 STOREV $foreach:0
0029 JUMP 0032
0031 BACKTRACK
0032 RET
Notice that the FORK 0031
backtracks directly to a BACKTRACK
, which is unnecessary, as backtracks in the loop body will already backtrack properly, and we don't need to consume the stream like we do in reduce
. I suspect that this is only here because 5a863bf copied gen_reduce
and modified it a bit.
If the FORK ... BACKTRACK
(and JUMP
) structure is removed, then DUPN
isn't necessary anymore; if we don't need to backtrack, then .
isn't held across a forkpoint and stack_pop_will_free
, meaning no unnecessary references. If we do need to backtrack, then we needed to keep the reference anyway.
reduce
reduce
can't avoid the FORK ... BACKTRACK
. A STOREV $dot ... FORK ... LOADVN $dot
would serve a similar purpose to DUPN
for avoiding unnecessary references while not causing backtracking issues; that's the only potential fix I can think of while staying within the constraints of the current instruction set.
Footnotes
-
This is the output from local testing of a potential fix (removing
FORK
fromforeach
, usingSTOREV
/LOADVN
instead ofDUPN
inreduce
) ↩