Skip to content

Tagged values through specialization #455

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

oli-obk
Copy link
Member

@oli-obk oli-obk commented Jul 21, 2016

@pyfisch could you test whether this fits your needs?

have a look at the tests in serde_macros to see how it should be used from the Deserialize impl side and look at the Deserializer and Serializer impls in serde_test to see how the actual format should use this.

Note that this requires that a crate implementing Deserialize for a tagged type to have the Deserializer as a dependency.

supersedes #408

pyfisch and others added 8 commits June 30, 2016 22:21
Some serialization formats allow the addition of "tags"
to values to add additional information. These formats
include CBOR, BSON and YAML.

This commit introduces serialization and deserialization
of tags. It introduces a trait Tagger that resolves
the tag for a specific format. It supports integer,
binary, and string tags and is extensible to cover
more yet unknown formats.

By default tags are discarded at deserialization
except when the user has implemented visit_tagged_value
function.

To serialize tags the user has to call
serialize_tagged_value.

Tagged values may be of any type.

Thanks to @dtolnay, @oli-obk and @erickt for their comments on tags.

Supersedes serde-rs#301
@oli-obk
Copy link
Member Author

oli-obk commented Jul 21, 2016

we can add this to 0.8 later, because it doesn't need to be a breaking change. We can add a default impl for the new methods that simply error. This is actually the correct behaviour for all Serializers and Deserializers that don't support tagging. The Serialize and Deserialize implementors need to ensure that the serialize_tagged_value and deserialize_tagged_value aren't called in the default impl. I'll add a clippy lint for this once this is merged.

@dtolnay
Copy link
Member

dtolnay commented Jul 21, 2016

Agreed that this should go in backward-compatibly later. (I have not reviewed yet.)

@dtolnay dtolnay self-assigned this Jul 22, 2016
@pyfisch
Copy link
Contributor

pyfisch commented Jul 26, 2016

I am testing this but before I have to port my cbor crate to serde 0.8.0.

@pyfisch
Copy link
Contributor

pyfisch commented Jul 27, 2016

It turned out I am not smart enough to understand tagged values through specialization. 😞 I was not yet able to get it working.

I have three problems with the approach:

  1. Tags are encoded as variable length binary integers with major type 6 encoded in the first three bits of the first byte. Some tags may take one byte, others two, three, five or nine bytes. So in serialize_tagged_value I need to know what the tag is to serialize it in its most compact form. But tags are completly opaque to the serializer. I am supposed to call tag.serialize(self) to serialize the tag. If the user supplied a integer as it would be correct we will be serialized as an integer value and not as a tag value (the major type is wrong) so it is impossible to serialize tags correctly.
  2. Tagging should be implemented for std types like std::time::SystemTime. Serialization for std types can only be done in serde itself since either the trait Serialize or the type must me defined in the current crate (orphan rules). As you mention for deserialization but I think it also applies to serialization the crate implementing serialization with tagging must depend on the crate providing the Serializer for all supported formats. This means serde must depend on serde_cbor, but serde_cbor itself must depend on serde for various traits and since circular dependencies are not allowed by cargo it does not work. It is impossible to implement (de)serialization with tags for std types.
  3. Tags should be something simple. A consumer of serde_cbor should not need to understand the whole type system of Rust to have some tagging for his types.

The deserialization part have I not yet testet.

I have published a version of serde_cbor for serde 0.8-rc3. Feel free to try adding tagging yourself. All relevant information is in RFC 7049.

@oli-obk
Copy link
Member Author

oli-obk commented Jul 27, 2016

Tags should be something simple. A consumer of serde_cbor should not need to understand the whole type system of Rust to have some tagging for his types.

we can probably solve this with a macro once we get the design to work

Serialization for std types can only be done in serde itself since either the trait Serialize or the type must me defined in the current crate (orphan rules).

Maybe specialization allows us to do this (I'll check). if not, lets talk to the lang team, maybe it makes sense to extend specialization to do this.

If the user supplied a integer as it would be correct we will be serialized as an integer value and not as a tag value (the major type is wrong) so it is impossible to serialize tags correctly.

I obviously have no clue about the tagging, that's why we're having this conversation. We'll figure it out ;) Let me play a little.

@oli-obk
Copy link
Member Author

oli-obk commented Jul 27, 2016

I think adding a TagSerializer (that knows that any integers it gets should be serialized by magic) should do the trick.

@oli-obk
Copy link
Member Author

oli-obk commented Jul 27, 2016

alternatively, if we assume that a specific format only has one type of tag, we can add an associated type. Is that an option?

@oli-obk
Copy link
Member Author

oli-obk commented Jul 27, 2016

see https://github.com/oli-obk/cbor/commits/serde0.8 for the working serialization. Point 2 is still not addressed. I won't get to it before next week.

@@ -71,7 +71,7 @@ pub trait Serialize {
/// reference to that state. You do not need to do any additional checks for the correctness of the
/// state object, as it is expected that the user will not modify it. Due to the generic nature
/// of the `Serialize` impls, modifying the object is impossible on stable Rust.
pub trait Serializer {
pub trait Serializer: Sized {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still necessary for Serializer to be bound by Sized?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check, but unsized serializers are mostly useless anyway. I don't think they can be used to serialize anything, but I didn't check

@dtolnay
Copy link
Member

dtolnay commented Jul 27, 2016

@oli-obk I haven't worked through an example but I want to make sure we can support values that have different tags in different formats. So I should be able to call serialize_tagged_value(tag, v) where tag implements a trait provided by serde_yaml that serde_yaml recognizes, and a different trait provided by serde_cbor that serde_cbor recognizes.

@oli-obk
Copy link
Member Author

oli-obk commented Jul 27, 2016

The Serialize impl is specialized on the serializer to produce a different tag value and type depending on the serializer. The serializers themselves forward to a tagserializer to check the type and get the value. Look at the unit tests I wrote in serde_testing

@oli-obk
Copy link
Member Author

oli-obk commented Jul 28, 2016

Tagging should be implemented for std types like std::time::SystemTime. Serialization for std types can only be done in serde itself since either the trait Serialize or the type must me defined in the current crate (orphan rules). As you mention for deserialization but I think it also applies to serialization the crate implementing serialization with tagging must depend on the crate providing the Serializer for all supported formats. This means serde must depend on serde_cbor, but serde_cbor itself must depend on serde for various traits and since circular dependencies are not allowed by cargo it does not work. It is impossible to implement (de)serialization with tags for std types.

Ok, so this is a problem I don't know how to solve in my current design.

What we can do, is change the Serialize trait to the following:

pub trait Serialize<S: Serializer> {
    /// Serializes this value into this serializer.
    fn serialize(&self, serializer: &mut S) -> Result<(), S::Error>;
}

Then we can create a default impl in serde, and hopefully everyone else can use impl specialization to override the impl if they either own the type or the serializer.

This will mean that Serializer will change all its Serialize bounds to Serialize<Self> bounds, and every usage of Serialize will not be able to do <T: Serialize>, but needs to do <S: Serialize, T: Serialize<S>>.

@dtolnay
Copy link
Member

dtolnay commented Jul 30, 2016

Let's keep thinking, that doesn't sound great.

@dtolnay dtolnay mentioned this pull request Jul 31, 2016
@oli-obk
Copy link
Member Author

oli-obk commented Jul 31, 2016

I have an idea. We do make the change for Serialize, but we add a second trait Serializable, which replaces Serialize wherever it is used (not implemented, only where it's used as a bound). This way we can specialize Serialize (like I did in the unit tests). I'll fiddle together a minimal impl to prove that the type system can do all we need

@oli-obk oli-obk mentioned this pull request Aug 8, 2016
@dtolnay
Copy link
Member

dtolnay commented Aug 12, 2016

Here is a very different, promising, not fully fleshed out approach. This is what I was getting at in my previous comment "Why do we need the bound T: Serialize? If a Serializer does not recognize a tag, it should just ignore."

#![feature(specialization)]

//// serde crate ///////////////////////////////////////////

#[macro_use]
pub mod serde {
    pub trait Serialize {
        fn serialize<S: ?Sized>(&self, &mut S) -> Result<(), S::Error>
            where S: Serializer;
    }

    pub trait Serializer {
        type Error;

        fn serialize_str(&mut self, value: &str) -> Result<(), Self::Error>;

        // No trait bounds on T.
        fn serialize_tagged<T, V>(&mut self,
                                  _tag: T,
                                  value: V) -> Result<(), Self::Error>
            where V: Serialize
        {
            value.serialize(self)
        }
    }

    impl<'a> Serialize for &'a str {
        fn serialize<S: ?Sized>(&self,
                                serializer: &mut S) -> Result<(), S::Error>
            where S: Serializer
        {
            serializer.serialize_str(self)
        }
    }

    pub trait Tagged<T> {
        fn tag(self) -> T;
    }

    // Just for convenience - this is helpful if you only care about the
    // tags of a single format.
    impl<T> Tagged<T> for T {
        fn tag(self) -> T {
            self
        }
    }

    // Don't read this until the end. The stuff below will look like magic.
    #[macro_export]
    macro_rules! serialize_tagged {
        ((&mut $self_:ident, $tagname:ident: $tagtype:ty, $value:ident) -> $ret:ty $blk:block) => {
            fn serialize_tagged<T, V>(&mut self,
                                      tag: T,
                                      value: V) -> Result<(), Self::Error>
                where V: serde::Serialize
            {
                return tag.distinguish(self, value);

                trait Distinguish {
                    fn distinguish<V>(
                        self,
                        serializer: &mut Serializer,
                        value: V,
                    ) -> Result<(), <Serializer as serde::Serializer>::Error>
                        where V: serde::Serialize;
                }

                impl<T> Distinguish for T {
                    default fn distinguish<V>(
                        self,
                        serializer: &mut Serializer,
                        value: V,
                    ) -> Result<(), <Serializer as serde::Serializer>::Error>
                        where V: serde::Serialize,
                    {
                        value.serialize(serializer)
                    }
                }

                impl<T> Distinguish for T where T: serde::Tagged<$tagtype> {
                    fn distinguish<V>(
                        self,
                        serializer: &mut Serializer,
                        value: V,
                    ) -> Result<(), ()>
                        where V: serde::Serialize
                    {
                        serializer.finalize(value, self.tag())
                    }
                }

                // The point is to set `self` back to the Serializer as opposed
                // to the tag, so the user's code can use `self` in a way that
                // makes more sense.
                trait Finalize: serde::Serializer {
                    fn finalize<V>(&mut self, V, $tagtype) -> Result<(), Self::Error>
                        where V: serde::Serialize;
                }

                impl Finalize for Serializer {
                    fn finalize<V>(
                        &mut $self_,
                        $value: V,
                        $tagname: $tagtype,
                    ) -> Result<(), Self::Error>
                        where V: serde::Serialize
                    {
                        $blk
                    }
                }
            }
        };
    }
}

//// serde_stdout crate ////////////////////////////////////

pub mod serde_stdout {
    use serde;

    pub struct Serializer;

    pub enum Tag {
        Plus,
        Minus,
    }

    impl serde::Serializer for Serializer {
        type Error = ();

        fn serialize_str(&mut self, value: &str) -> Result<(), Self::Error> {
            println!("{}", value);
            Ok(())
        }

        // This macro hides all the specialization and just gives you tags
        // of the right type along with the value being serialized.
        serialize_tagged! {
            (&mut self, tag: Tag, value) -> Result<(), Self::Error> {
                match tag {
                    Tag::Plus => print!("+"),
                    Tag::Minus => print!("-"),
                };
                value.serialize(self)
            }
        }
    }
}

//// serde_stderr crate ////////////////////////////////////

pub mod serde_stderr {
    use serde;

    pub struct Serializer;

    // Does not implement serialize_tagged.
    impl serde::Serializer for Serializer {
        type Error = ();

        fn serialize_str(&mut self, value: &str) -> Result<(), Self::Error> {
            use std::io::{self, Write};
            writeln!(&mut io::stderr(), "{}", value).unwrap();
            Ok(())
        }
    }
}

//// my_crate //////////////////////////////////////////////

pub mod my_crate {
    use serde;
    use serde_stdout as stdout;

    pub struct MyTaggedStr<'a>(pub &'a str);

    impl<'a> serde::Serialize for MyTaggedStr<'a> {
        fn serialize<S: ?Sized>(&self,
                                serializer: &mut S) -> Result<(), S::Error>
            where S: serde::Serializer
        {
            // This passes a concrete tag that relies on the impl Tagged<T> for T.
            // Could instead pass some other type for which we impl
            // Tagged<format1::Tag>, Tagged<format2::Tag>, Tagged<format3::Tag>.
            // Those impls can be provided by the crate implementing Serialize
            // *and/or* the crates containing each Serializer.
            serializer.serialize_tagged(stdout::Tag::Plus, self.0)
        }
    }
}

fn main() {
    use serde::Serialize;
    let v = my_crate::MyTaggedStr("test");
    v.serialize(&mut serde_stdout::Serializer).unwrap();
    v.serialize(&mut serde_stderr::Serializer).unwrap();
}

@pyfisch
Copy link
Contributor

pyfisch commented Aug 12, 2016

@dtolnay Your proposal looks promising. This could work for CBOR and other formats.

@dtolnay
Copy link
Member

dtolnay commented Sep 7, 2016

I experimented a bit with the Serialize<S: Serializer> idea. The broken thing there is Serialize<Self> is often not enough of a bound, for example here in serde_json.

I haven't given up yet because fundamentally we need (something like) Serialize<S: Serializer> in order to give the Serializer any control, but there is a piece that still needs to be figured out.

@dtolnay
Copy link
Member

dtolnay commented Sep 7, 2016

I also ran into this confusing behavior where it looks like the blanket impl is not kicking in as I would expect. I will read through the impl specialization RFC again and see where this is specified.

#![feature(specialization)]

struct JsonSerializer;
struct CborSerializer;

fn main() {
    0i32.serialize(&mut CborSerializer);
}

trait Serialize<S: Serializer> {
    fn serialize(&self, &mut S);
}

trait Serializer {}
impl Serializer for JsonSerializer {}
impl Serializer for CborSerializer {}

impl<S: Serializer> Serialize<S> for i32 {
    default fn serialize(&self, _: &mut S) {
        println!("default");
    }
}

impl Serialize<JsonSerializer> for i32 {
    fn serialize(&self, _: &mut JsonSerializer) {
        println!("specialized");
    }
}
error[E0308]: mismatched types
 --> <anon>:7:20
  |
7 |     0i32.serialize(&mut CborSerializer);
  |                    ^^^^^^^^^^^^^^^^^^^ expected struct `JsonSerializer`, found struct `CborSerializer`
  |
  = note: expected type `&mut JsonSerializer`
  = note:    found type `&mut CborSerializer`

@dtolnay
Copy link
Member

dtolnay commented Dec 21, 2016

I filed it as rust-lang/rust#38516.

@dtolnay
Copy link
Member

dtolnay commented Feb 26, 2017

Let's reopen when somebody is ready to dedicate time to this and we have a clearer commitment to specialization.

@dtolnay dtolnay closed this Feb 26, 2017
@gavento
Copy link

gavento commented Dec 29, 2017

While rust specialization is still not fully working, there is certainly some commitment to it now, and I would be also able to dedicate some time to a CBOR crate.

I have written a forum post on CBOR in rust also dealing with tags (even though they are not the main point) and I am looking for some kind of consensus on where to go from here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants