Skip to content

Make JSON dictionary output of Map[K, V] usable for other types of K #381

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 29, 2022

Conversation

lihaoyi
Copy link
Member

@lihaoyi lihaoyi commented Mar 1, 2022

Currently, only Map[String, V] is hardcoded to serialize as a JSON dictionary {"...": ..., "...": ...}, with other non-String Ts serializing as a JSON list-of-lists [["...", ...], ["...", ...]]. The list-of-lists representation, while usable, has downsides:

  1. It is much less idiomatic JSON than dictionary representation, and would be surprising for people familiar with JSON
  2. It is much harder to read or skim, with lots of open and closing square brackets that all look the same, even when pretty-printed or syntax highlighted using common tools
{"foo": "bar", "baz": "qux"}
[["foo", "bar"], ["baz", "qux"]]

{
  "foo": "bar",
  "baz": "qux"
}

[
  [
    "foo",
    "bar"
  ],
  [
    "baz",
    "qux"
  ]
]

Ideally, we should use the dictionary representation whenever possible. This PR makes it possible to use it for non-String types: primitives like Int, Long, Boolean, string-y types like UUID or Duration, and even user-defined types whose Reader/Writer inherits from MapKey or are wrapped in a stringKeyRW. For now, we leave the case where the Reader/Writer generates non-primitive JSON types like Arrays or Dictionaries as unsupported.

The change is accomplished by some tweaks to the MapWriter0 function to remove the hardcoded special-casing of StringWriter, and tweaks to the BaseElemRenderer protocol to allow it to accept usage of non-string keys. We instead allow anything to be written when a key is required, surrounded by quotes, with the built-in quoting of strings disabled to avoid double-quoting. Again, this does not work properly for JSON Arrays or Dictionaries.

This is a forward-breaking change in the uPickle JSON serialization format: newer versions of uPickle will be able to read the old format, but older versions of uPickle will not be able to read the new format. Nevertheless, I think the problem this fixes is sufficiently important to justify the breakage. We can bump the major version number if necessary.

Also did some tightening up of the unit test suite, to make sure that the rw helper checks that the input value is serialized to the first of the JSON strings given. We previously did not validate this at all, only that the JSON strings all deserialized to the input value, and that the input value round tripped, but not that the string that the input value round tripped through was in the list of strings given.

@lihaoyi lihaoyi requested review from htmldoug, lolgab and lefou March 6, 2022 01:39
@lihaoyi lihaoyi changed the title [WIP] Make JSON dictionary output of Map[K, V] usable for other types of K Make JSON dictionary output of Map[K, V] usable for other types of K Mar 6, 2022
@@ -103,7 +103,7 @@ trait Api
def writeBinaryTo[T: Writer](t: T, out: java.io.OutputStream): Unit = {
streamBinary[T](t).writeBytesTo(out)
}
def writeBinaryToByteArray[T: Writer](t: T, out: java.io.OutputStream): Unit = {
def writeBinaryToByteArray[T: Writer](t: T) = {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated bugfix

Copy link
Collaborator

@htmldoug htmldoug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marker trait is brilliant here. I wish I had considered that for rallyhealth/weePickle#28.

Picking a behavior was difficult since visitArray/visitObject are valid keys for msgpack, but not JSON. Seems like in a perfect world, the fix should be on the Visitor side rather than the Writer side. I ended up using the keyWriter 100% of the time (i.e. no array tuples). Weepickle's JsonGeneratorVisitor.keyVisitor coerces primitives to strings, but throws on visitArray/visitObject. I justified it with the argument that the latter are a corner case, should get caught during tests, and are fixable in user space.

OTOH, your approach favors safe JSON over idiomatic msgpack. Very reasonable compromise. Thanks for the tag on this.

@lihaoyi
Copy link
Member Author

lihaoyi commented Mar 6, 2022

@htmldoug your comment made me think of how we could support idiomatic MessagePack dictionaries, since MessagePack does not have a string-keys-only restriction. I pushed another update that implements this: in effect, we avoid making the "encode as dictionary or string" decision in upickle.implicits.Writers#MapWriter0, instead passing a jsonableKeys: Boolean flag into def visitObjectall the way to ujson.JsVisitor which then makes the decision.

That lets the JSON-related backends use the flag to swap between a json-dictionary encoding (now called def visitJsonableObject) in the case where jsonableKeys == true, otherwise using a custom ObjVisitor that proxies back to the visitArray/ArrVisitor implementation. The MessagePack backend can then choose to ignore the flag.

One subtlety is that jsonableKeys is set to true when parsing MessagePack binaries or when traversing upack.Msg data structures. This is not strictly correct, but is not any more incorrect than things were previously, since converting from MessagePack to JSON always had the possibility of generating invalid output.

@htmldoug
Copy link
Collaborator

htmldoug commented Mar 6, 2022

Great solution!

Unrelated, but one thing to be mindful of with the marker trait is that the MapKey type could be lost through ReadWriter.join. I've seen cases where users got burned asking for [T: ReadWriter] and lost TaggedReader, for example. Same caveat would apply here. Users could work around it with [T: Reader: Writer] (everywhere), but it's still a sharp edge. It might be better to add something like an isMapKey() method that composites could proxy.

@lihaoyi
Copy link
Member Author

lihaoyi commented Mar 6, 2022

@htmldoug good catch with the ReadWriter.join issue; I added a test case that surfaces the problem, and converted the market trait MapKeys into a def isJsonDictKey: Boolean flag. That makes propagation a lot more simple.

Perhaps later we can move the Tagged* marker traits into fields too, but that would be beyond the scope of this PR.

I also broadened our test suite to test the ReadWriter[T], in addition to the Reader[T] and Writer[T] logic it was previously already testing. This caught some issues with the implicit def macroSingletonR being applied when we don't want it to. I can't tell if it's a bug in our own usage or a bug in the implicit resolution logic of the compiler, but for now i just removed the implicit marker from those methods to force people to instantiate their macroRWs for case objects manually.

@lihaoyi lihaoyi mentioned this pull request Mar 7, 2022
4 tasks
@htmldoug
Copy link
Collaborator

htmldoug commented Mar 7, 2022

This caught some issues with the implicit def macroSingletonR being applied when we don't want it to.

Yep, that's the main cause of #314.

Copy link
Member

@lefou lefou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I have also skimmed over your changes, but due to the lack of any previous work on this code base I rather won't judge the changes. Starting a new major version is definitely necessary for this kind of breaking change.

@lihaoyi lihaoyi merged commit 0fa8495 into master Apr 29, 2022
@lolgab lolgab deleted the mapkeys branch April 29, 2022 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants