Skip to content

Add support for tagged values. #408

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Add support for tagged values. #408

wants to merge 1 commit into from

Conversation

pyfisch
Copy link
Contributor

@pyfisch pyfisch commented Jun 23, 2016

Some serialization formats allow the addition of "tags"
to values to add additional information. These formats
include CBOR, BSON and YAML.

This commit introduces serialization and deserialization
of tags. It introduces a trait Tagger that resolves
the tag for a specific format. It supports integer,
binary, and string tags and is extensible to cover
more yet unknown formats.

By default tags are discarded at deserialization
except when the user has implemented visit_tagged_value
function.

To serialize tags the user has to call
serialize_tagged_value.

Tagged values may be of any type.

Thanks to @dtolnay, @oli-obk and @erickt for their comments on tags.

Supersedes #301

@@ -41,7 +41,7 @@ pub trait Serialize {
///////////////////////////////////////////////////////////////////////////////

/// A trait that describes a type that can serialize a stream of values into the underlying format.
pub trait Serializer {
pub trait Serializer: Sized {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that necessary?

Copy link
Contributor Author

@pyfisch pyfisch Jun 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler says so (and is usually right 😬).

   Compiling serde v0.7.7 (file:///serde/serde)
/serde/serde/src/ser/mod.rs:359:15: 359:24 error: the trait bound `Self: std::marker::Sized` is not satisfied [E0277]
/serde/serde/src/ser/mod.rs:359         value.serialize(self)
                                                                         ^~~~~~~~~
/serde/serde/src/ser/mod.rs:359:15: 359:24 help: run `rustc --explain E0277` to see a detailed explanation
/serde/serde/src/ser/mod.rs:359:15: 359:24 help: consider adding a `where Self: std::marker::Sized` bound
error: aborting due to previous error
error: Could not compile `serde`.

Maybe it can be done in a different way but I don't know how.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well... if you look at all the other default method impls, none call methods on the value. I'm aware of the motivation, that formats not caring about tags don't need to implement a new method. But maybe we need something along the lines of MapVisitor and SeqVisitor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes adding a TaggedVisitor is possible. In this case the Sized bound can be removed.

@pyfisch
Copy link
Contributor Author

pyfisch commented Jun 23, 2016

I have added support for tagged values in a branch of my CBOR crate

#[inline]
fn serialize_tagged_value<T, V>(&mut self,
_tag: T,
value: V) -> Result<(), Self::Error>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can you update the indent here?

@erickt
Copy link
Member

erickt commented Jun 30, 2016

How do you handle tagging higher level types, like a datetime value? Would the idea be that the serializer calls Tagger::string_tag, which returns Some("DateTime"), and it's the responsibility of the serializer to interpret those tags, or error out if it doesn't know how to handle it?

Another thing to consider is that specialization is coming down the pipeline. How could that impact this design? It would be interesting if there was a design that eventually could enable serde to have a registry of generic tag names that then could be used by the different serialization formats without having to do a runtime check.

Some serialization formats allow the addition of "tags"
to values to add additional information. These formats
include CBOR, BSON and YAML.

This commit introduces serialization and deserialization
of tags. It introduces a trait Tagger that resolves
the tag for a specific format. It supports integer,
binary, and string tags and is extensible to cover
more yet unknown formats.

By default tags are discarded at deserialization
except when the user has implemented visit_tagged_value
function.

To serialize tags the user has to call
serialize_tagged_value.

Tagged values may be of any type.

Thanks to @dtolnay, @oli-obk and @erickt for their comments on tags.

Supersedes #301
@pyfisch
Copy link
Contributor Author

pyfisch commented Jun 30, 2016

I'll take the rust-url serialization as an example. Currently the URL is serialized as a string:

impl serde::Serialize for Url {
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error> where S: serde::Serializer {
        format!("{}", self).serialize(serializer)
    }
}

Since CBOR recommends an optional tagging of URIs with tag 32 this information could be added to the implementation of Serialize. As you propose to use a registry for common types there could be a "serde-registry" format mapping types to string tags. It would be up to the formats serializer to map the generic tag to a format specific tag or to ignore it. The introduction of a serde registry is not strictly necessary, but it adds a bit of portability. Now a format the rust-url authors are unaware of can still tag URL values correctly if it knows the mapping from "serde-url" to an internal tag.

impl serde::Serialize for Url {
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error> where S: serde::Serializer {
        struct Tagger;
        impl serde::Tagger for Tagger {
            fn u64_tag(&self, format: &'static str) -> Option<u64> {
                if format == "cbor" {
                    // URI; see RFC7049 Section 2.4.4.3
                    return Some(32);
                }
                None
            }

            fn string_tag(&self, format: &'static str) -> Option<&str> {
                if format == "serde-registry" {
                    return Some("serde-url");
                }
                None
            }
        }
        serializer.serialize_tagged_value(Tagger, self.as_str())
    }
}

@pyfisch
Copy link
Contributor Author

pyfisch commented Jul 16, 2016

@dtolnay Why is this "WIP"? What needs to be changed?

@oli-obk
Copy link
Member

oli-obk commented Jul 20, 2016

So I've been thinking some about this. And I still dislike the runtime decision about the tag type.

For serialization I played with impl specialization, and (I think) it works

Output for serializing with DefaultSerializer:

Tagged::serialize
S::serialize_tag (default)
DefaultSerializer::i32: 99

Output for serializing with MySerializer (that wants tags):

Tagged::serialize
MySerializer::serialize_tag
Tag: MySerializer::i32: 42
MySerializer::i32: 99

The impl specialization magic happens here:

struct Tagged(i32);

trait SerializeTag<S: Serialize>: Serializer {
    fn serialize_tag(&mut self, value: &S) -> Result<(), ()>;
}

impl SerializeTag<Tagged> for MySerializer {
    fn serialize_tag(&mut self, value: &Tagged) -> Result<(), ()> {
        println!("MySerializer::serialize_tag");
        self.serialize_tagged(42, value.0)
    }
}

impl<S: Serializer> SerializeTag<Tagged> for S {
    default fn serialize_tag(&mut self, value: &Tagged) -> Result<(), ()> {
        println!("S::serialize_tag (default)");
        value.0.serialize(self)
    }
}

impl Serialize for Tagged {
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), ()>
        where S: Serializer {
        println!("Tagged::serialize");
        serializer.serialize_tag(self)
    }
}

I haven't put any thought into deserialization yet, but I feel confident we can rig something.

Note: this design does not touch the Serialize trait at all, and adds zero implementation work to existing implementations (I considered both alternatives, but found this "workaround" to be suitable)

@oli-obk
Copy link
Member

oli-obk commented Jul 20, 2016

Ok, there's an issue with this. It works fine if the SerializeTag trait exists in the same crate as the type to be serialized and the Serializer that requires tags. I can't get it to work yet, if the Serializer comes from a different crate, and the local crate defines the SerializeTag trait and impls it. Might be a specialization-bug though

@oli-obk
Copy link
Member

oli-obk commented Jul 20, 2016

Ah, nevermind, it's just horrible error reporting. If the Serializer is generic, make sure to specify ALL bounds that are on the impl Serializer.

@oli-obk
Copy link
Member

oli-obk commented Jul 20, 2016

I implemented the serialization part in serde on a branch based on yours (+ I merged master into it) https://github.com/oli-obk/rust-serde/commits/tagged_values

@oli-obk
Copy link
Member

oli-obk commented Jul 20, 2016

I got deserialization to mostly work. it's now failing on something really weird. I think syntex is eating my code.

@dtolnay
Copy link
Member

dtolnay commented Jul 22, 2016

Closing in favor of #455 which seems more flexible and more promising. We can reopen later if it turns out otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants