Skip to content

Conversation

robmyersrobmyers
Copy link
Contributor

I have a need to marshal the AST to JSON to input into Open Policy Agent, but when I do this for non-trivial schemas the Position information takes a massive amount of memory.
This PR aligns the JSON Marshal rules with the dump rules, which also do not marshal Position.

I have attached a reproducer to demonstrate the issue.

# On master this allocates 30+GB of memory and never finishes.

$ go build -o reproducer reproducer.go && time ./reproducer
Now passing 1262568 bytes to ParseSchema() to demonstrate the benefit of the json marshaling change
Alloc = 4160 MiB	TotalAlloc = 5630 MiB	Sys = 5578 MiB	NumGC = 18
Alloc = 8312 MiB	TotalAlloc = 11174 MiB	Sys = 11120 MiB	NumGC = 19
Alloc = 19392 MiB	TotalAlloc = 22254 MiB	Sys = 22203 MiB	NumGC = 19
Alloc = 16616 MiB	TotalAlloc = 22254 MiB	Sys = 22203 MiB	NumGC = 20
Alloc = 16616 MiB	TotalAlloc = 22254 MiB	Sys = 22203 MiB	NumGC = 20
Alloc = 16616 MiB	TotalAlloc = 22254 MiB	Sys = 22203 MiB	NumGC = 20
Alloc = 33223 MiB	TotalAlloc = 44405 MiB	Sys = 44371 MiB	NumGC = 21
Alloc = 33223 MiB	TotalAlloc = 44405 MiB	Sys = 44371 MiB	NumGC = 21
Alloc = 33223 MiB	TotalAlloc = 44405 MiB	Sys = 44371 MiB	NumGC = 21
Alloc = 33223 MiB	TotalAlloc = 44405 MiB	Sys = 44371 MiB	NumGC = 21
Alloc = 33223 MiB	TotalAlloc = 44405 MiB	Sys = 44371 MiB	NumGC = 21

# With the position JSON annotation change this finishes in well under a second.

$ go build -o reproducer reproducer.go && time ./reproducer
Now passing 1262568 bytes to ParseSchema() to demonstrate the benefit of the json marshaling change
Got 3705551 bytes of JSON

real	0m0.380s
user	0m0.075s
sys	0m0.020s

All tests pass, so I do not think this change breaks any supported use case.

$ go test  ./...
?   	github.com/vektah/gqlparser/v2	[no test files]
ok  	github.com/vektah/gqlparser/v2/ast	0.197s
ok  	github.com/vektah/gqlparser/v2/formatter	0.706s
ok  	github.com/vektah/gqlparser/v2/gqlerror	0.849s
ok  	github.com/vektah/gqlparser/v2/lexer	0.527s
?   	github.com/vektah/gqlparser/v2/main	[no test files]
ok  	github.com/vektah/gqlparser/v2/parser	0.366s
?   	github.com/vektah/gqlparser/v2/parser/testrunner	[no test files]
ok  	github.com/vektah/gqlparser/v2/validator	1.195s
ok  	github.com/vektah/gqlparser/v2/validator/rules	0.994s

@coveralls
Copy link

Coverage Status

coverage: 87.597%. remained the same
when pulling 7d05152 on robmyersrobmyers:no_marshal_position
into bcc3a70 on vektah:master.

robmyersrobmyers added a commit to robmyersrobmyers/opa that referenced this pull request Apr 11, 2025
… marshaled

     to JSON.  This makes the JSON roundtrip succeed without allocating a ton of
     memory for JSON fields that will be subsequently pruned.

     Upstream PR: vektah/gqlparser#364

     # Without this change:
     $ time ./reproducer-pre-fix
     Now passing 1262568 bytes to builtinGraphQLParseSchema() to reproduce the issue in ast.InterfaceToValue()
     Alloc = 4159 MiB	TotalAlloc = 5625 MiB	Sys = 5584 MiB	NumGC = 18
     Alloc = 8312 MiB	TotalAlloc = 11169 MiB	Sys = 11125 MiB	NumGC = 19
     Alloc = 8312 MiB	TotalAlloc = 11169 MiB	Sys = 11125 MiB	NumGC = 19
     Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
     Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
     Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
     Alloc = 38765 MiB	TotalAlloc = 44397 MiB	Sys = 44377 MiB	NumGC = 20
     Alloc = 33223 MiB	TotalAlloc = 44397 MiB	Sys = 44377 MiB	NumGC = 21

     With this change:
     $ time ./reproducer-post-fix
     Now passing 1262568 bytes to builtinGraphQLParseSchema() to reproduce the issue in ast.InterfaceToValue()

     real	0m0.545s
     user	0m0.201s
     sys	0m0.028s
robmyersrobmyers added a commit to robmyersrobmyers/opa that referenced this pull request Apr 11, 2025
… marshaled

     to JSON.  This makes the JSON roundtrip succeed without allocating a ton of
     memory for JSON fields that will be subsequently pruned.

     Upstream PR: vektah/gqlparser#364

     # Without this change:
     $ time ./reproducer-pre-fix
     Now passing 1262568 bytes to builtinGraphQLParseSchema() to reproduce the issue in ast.InterfaceToValue()
     Alloc = 4159 MiB	TotalAlloc = 5625 MiB	Sys = 5584 MiB	NumGC = 18
     Alloc = 8312 MiB	TotalAlloc = 11169 MiB	Sys = 11125 MiB	NumGC = 19
     Alloc = 8312 MiB	TotalAlloc = 11169 MiB	Sys = 11125 MiB	NumGC = 19
     Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
     Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
     Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
     Alloc = 38765 MiB	TotalAlloc = 44397 MiB	Sys = 44377 MiB	NumGC = 20
     Alloc = 33223 MiB	TotalAlloc = 44397 MiB	Sys = 44377 MiB	NumGC = 21

     With this change:
     $ time ./reproducer-post-fix
     Now passing 1262568 bytes to builtinGraphQLParseSchema() to reproduce the issue in ast.InterfaceToValue()

     real	0m0.545s
     user	0m0.201s
     sys	0m0.028s

Signed-off-by: Rob Myers <1243316+robmyersrobmyers@users.noreply.github.com>
robmyersrobmyers added a commit to robmyersrobmyers/opa that referenced this pull request Apr 11, 2025
… marshaled

     to JSON.  This makes the JSON roundtrip succeed without allocating a ton of
     memory for JSON fields that will be subsequently pruned.

     Upstream PR: vektah/gqlparser#364

     # Without this change:
     $ time ./reproducer-pre-fix
     Now passing 1262568 bytes to builtinGraphQLParseSchema() to reproduce the issue in ast.InterfaceToValue()
     Alloc = 4159 MiB	TotalAlloc = 5625 MiB	Sys = 5584 MiB	NumGC = 18
     Alloc = 8312 MiB	TotalAlloc = 11169 MiB	Sys = 11125 MiB	NumGC = 19
     Alloc = 8312 MiB	TotalAlloc = 11169 MiB	Sys = 11125 MiB	NumGC = 19
     Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
     Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
     Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
     Alloc = 38765 MiB	TotalAlloc = 44397 MiB	Sys = 44377 MiB	NumGC = 20
     Alloc = 33223 MiB	TotalAlloc = 44397 MiB	Sys = 44377 MiB	NumGC = 21

     # With this change:
     $ time ./reproducer-post-fix
     Now passing 1262568 bytes to builtinGraphQLParseSchema() to reproduce the issue in ast.InterfaceToValue()

     real	0m0.545s
     user	0m0.201s
     sys	0m0.028s

Signed-off-by: Rob Myers <1243316+robmyersrobmyers@users.noreply.github.com>
robmyersrobmyers added a commit to robmyersrobmyers/opa that referenced this pull request Apr 11, 2025
… marshaled

     to JSON.  This makes the JSON roundtrip succeed without allocating a ton of
     memory for JSON fields that will be subsequently pruned.

     Upstream PR: vektah/gqlparser#364

     # Without this change:
     $ time ./reproducer-pre-fix
     Now passing 1262568 bytes to builtinGraphQLParseSchema() to reproduce the issue in ast.InterfaceToValue()
     Alloc = 4159 MiB	TotalAlloc = 5625 MiB	Sys = 5584 MiB	NumGC = 18
     Alloc = 8312 MiB	TotalAlloc = 11169 MiB	Sys = 11125 MiB	NumGC = 19
     Alloc = 8312 MiB	TotalAlloc = 11169 MiB	Sys = 11125 MiB	NumGC = 19
     Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
     Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
     Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
     Alloc = 38765 MiB	TotalAlloc = 44397 MiB	Sys = 44377 MiB	NumGC = 20
     Alloc = 33223 MiB	TotalAlloc = 44397 MiB	Sys = 44377 MiB	NumGC = 21

     With this change:
     $ time ./reproducer-post-fix
     Now passing 1262568 bytes to builtinGraphQLParseSchema() to reproduce the issue in ast.InterfaceToValue()

     real	0m0.545s
     user	0m0.201s
     sys	0m0.028s
robmyersrobmyers added a commit to robmyersrobmyers/opa that referenced this pull request Apr 11, 2025
Annotate internal/gqlparser structs not to include Position when marshaled
to JSON.  This makes the JSON roundtrip succeed without allocating a ton of
memory for JSON fields that will be subsequently pruned.

Upstream PR: vektah/gqlparser#364

$ time ./reproducer-pre-fix
Now passing 1262568 bytes to builtinGraphQLParseSchema() to reproduce the issue in ast.InterfaceToValue()
Alloc = 4159 MiB	TotalAlloc = 5625 MiB	Sys = 5584 MiB	NumGC = 18
Alloc = 8312 MiB	TotalAlloc = 11169 MiB	Sys = 11125 MiB	NumGC = 19
Alloc = 8312 MiB	TotalAlloc = 11169 MiB	Sys = 11125 MiB	NumGC = 19
Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
Alloc = 38765 MiB	TotalAlloc = 44397 MiB	Sys = 44377 MiB	NumGC = 20
Alloc = 33223 MiB	TotalAlloc = 44397 MiB	Sys = 44377 MiB	NumGC = 21

$ time ./reproducer-post-fix
Now passing 1262568 bytes to builtinGraphQLParseSchema() to reproduce the issue in ast.InterfaceToValue()

real	0m0.545s
user	0m0.201s
sys	0m0.028s

Signed-off-by: Rob Myers <1243316+robmyersrobmyers@users.noreply.github.com>
robmyersrobmyers added a commit to robmyersrobmyers/opa that referenced this pull request Apr 16, 2025
Annotate internal/gqlparser structs not to include Position when marshaled
to JSON.  This makes the JSON roundtrip succeed without allocating a ton of
memory for JSON fields that will be subsequently pruned.

Upstream PR: vektah/gqlparser#364

$ time ./reproducer-pre-fix
Now passing 1262568 bytes to builtinGraphQLParseSchema() to reproduce the issue in ast.InterfaceToValue()
Alloc = 4159 MiB	TotalAlloc = 5625 MiB	Sys = 5584 MiB	NumGC = 18
Alloc = 8312 MiB	TotalAlloc = 11169 MiB	Sys = 11125 MiB	NumGC = 19
Alloc = 8312 MiB	TotalAlloc = 11169 MiB	Sys = 11125 MiB	NumGC = 19
Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
Alloc = 38765 MiB	TotalAlloc = 44397 MiB	Sys = 44377 MiB	NumGC = 20
Alloc = 33223 MiB	TotalAlloc = 44397 MiB	Sys = 44377 MiB	NumGC = 21

$ time ./reproducer-post-fix
Now passing 1262568 bytes to builtinGraphQLParseSchema() to reproduce the issue in ast.InterfaceToValue()

real	0m0.545s
user	0m0.201s
sys	0m0.028s

Signed-off-by: Rob Myers <1243316+robmyersrobmyers@users.noreply.github.com>
johanfylling pushed a commit to open-policy-agent/opa that referenced this pull request Apr 16, 2025
…fields (#7509)

Annotate internal/gqlparser structs not to include Position when marshaled
to JSON.  This makes the JSON roundtrip succeed without allocating a ton of
memory for JSON fields that will be subsequently pruned.

Upstream PR: vektah/gqlparser#364

$ time ./reproducer-pre-fix
Now passing 1262568 bytes to builtinGraphQLParseSchema() to reproduce the issue in ast.InterfaceToValue()
Alloc = 4159 MiB	TotalAlloc = 5625 MiB	Sys = 5584 MiB	NumGC = 18
Alloc = 8312 MiB	TotalAlloc = 11169 MiB	Sys = 11125 MiB	NumGC = 19
Alloc = 8312 MiB	TotalAlloc = 11169 MiB	Sys = 11125 MiB	NumGC = 19
Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
Alloc = 16615 MiB	TotalAlloc = 22247 MiB	Sys = 22209 MiB	NumGC = 20
Alloc = 38765 MiB	TotalAlloc = 44397 MiB	Sys = 44377 MiB	NumGC = 20
Alloc = 33223 MiB	TotalAlloc = 44397 MiB	Sys = 44377 MiB	NumGC = 21

$ time ./reproducer-post-fix
Now passing 1262568 bytes to builtinGraphQLParseSchema() to reproduce the issue in ast.InterfaceToValue()

real	0m0.545s
user	0m0.201s
sys	0m0.028s

Signed-off-by: Rob Myers <1243316+robmyersrobmyers@users.noreply.github.com>
@StevenACoffman
Copy link
Collaborator

StevenACoffman commented Apr 17, 2025

Sorry for sitting on this. I would like to merge this, but I maintain this as a volunteer after hours and verifying this particular PR would take more time than I've had. Can you help me?

I can see how this entirely solves the memory consumption issue, but my concern is that it may break existing behavior since it is not an opt-in or opt-out change.

In order to verify that it doesn't break gqlgen, I would need to make a fork (or branch), pin it to your version in the root go.mod and also in the _examples/go.mod, run go mod tidy in both, re-generate (go generate ./... in the root) and commit that and push that up as PR (even in draft) and mention me. That will give me the confidence that it doesn't break anything that a large segment of the ecosystem relies on.

Is that a fair ask?

Ok, nevermind. I'm just going to carve out some time and do that myself. I feel bad to have made you wait this long already.

@StevenACoffman
Copy link
Collaborator

StevenACoffman commented Apr 17, 2025

Ok, works in gqlgen locally.

@StevenACoffman StevenACoffman merged commit a4eb489 into vektah:master Apr 17, 2025
7 checks passed
@robmyersrobmyers robmyersrobmyers deleted the no_marshal_position branch May 6, 2025 08:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants