Skip to content

ER: basic TCO #86

@pkoppstein

Description

@pkoppstein

Basic TCO support would substantially improve gojq's speed and memory efficiency for cetain jq programs.

This note first considers a simple and direct test of recursion limits (see [*Recursion] below), and then a more practical test involving an optimized form of walk and a well-known large JSON file
https://github.com/ryanwholey/jeopardy_bot/blob/master/JEOPARDY_QUESTIONS1.json

[*Recursion]

for jq in gojq ; do
    echo $jq ::
    /usr/bin/time -lp $jq --argjson max 100000000 -n '
    def zero_arity:
      if . == $max then "completed \($max)"
      else if (. % 100000 == 0) then . else empty end, ((.+1)| zero_arity)
      end;
    1 | zero_arity'
done

In abbreviated form, the output for jq is:

"completed 100000000"
user        90.48
sys          0.24
   1880064  maximum resident set size

For gojq, the program fails to complete in a reasonable amount of time. In fact, it takes many hours to reach 600,000.

Setting max to 100,000 gives these performance indicators:

user 78.83
sys 0.65
47173632 maximum resident set size

[*walk]

#!/bin/bash

for jq in jq gojq ; do
    echo $jq
  /usr/bin/time -lp $jq '
  def walk(f):
  def w:
    if type == "object"
    then . as $in
    | reduce keys[] as $key
        ( {}; . + { ($key):  ($in[$key] | w) } ) | f
    elif type == "array" then map( w ) | f
    else f
    end;
  w;

  walk(if type == "string" then 0 else 1 end)
  ' jeopardy.json > /dev/null

done

Results:

jq
user         6.44
sys          0.11
 226742272  maximum resident set size
gojq
user         9.99
sys          0.85
1861201920  maximum resident set size

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions