Tagged values through specialization #455
Some serialization formats allow "tags" to be attached to values to carry additional information. These formats include CBOR, BSON and YAML. This commit introduces serialization and deserialization of tags. It introduces a trait Tagger that resolves the tag for a specific format. It supports integer, binary, and string tags and can be extended to cover formats that are not yet known. By default, tags are discarded at deserialization unless the user has implemented the visit_tagged_value function. To serialize tags, the user has to call serialize_tagged_value. Tagged values may be of any type. Thanks to @dtolnay, @oli-obk and @erickt for their comments on tags. Supersedes serde-rs#301
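To make the "tags are discarded by default" behaviour concrete, here is a minimal sketch of the deserialization side. The Tag enum, the cut-down Visitor trait and the deserialize_tagged helper are simplified, hypothetical stand-ins and not the API added by this commit; only the name visit_tagged_value is taken from the description above.

```rust
// Hypothetical, simplified stand-ins; these are NOT the real serde traits.

/// The three tag kinds the description mentions: integer, binary, string.
#[derive(Debug, Clone, PartialEq)]
pub enum Tag {
    Integer(u64),
    Bytes(Vec<u8>),
    Text(String),
}

/// A cut-down visitor that only knows about strings and tagged strings.
pub trait Visitor: Sized {
    type Value;

    fn visit_str(self, value: &str) -> Self::Value;

    /// Default: drop the tag and visit the inner value as if it were untagged.
    fn visit_tagged_value(self, _tag: Tag, value: &str) -> Self::Value {
        self.visit_str(value)
    }
}

/// Relies on the default method, so it never sees the tag.
pub struct PlainString;

impl Visitor for PlainString {
    type Value = String;

    fn visit_str(self, value: &str) -> String {
        value.to_owned()
    }
}

/// Overrides the hook in order to keep the tag.
pub struct TaggedString;

impl Visitor for TaggedString {
    type Value = (Option<Tag>, String);

    fn visit_str(self, value: &str) -> Self::Value {
        (None, value.to_owned())
    }

    fn visit_tagged_value(self, tag: Tag, value: &str) -> Self::Value {
        (Some(tag), value.to_owned())
    }
}

/// Stand-in for a deserializer that has just decoded a tagged string.
pub fn deserialize_tagged<V: Visitor>(visitor: V) -> V::Value {
    visitor.visit_tagged_value(Tag::Integer(0), "2016-07-18T00:00:00Z")
}

fn main() {
    println!("{:?}", deserialize_tagged(PlainString));
    println!("{:?}", deserialize_tagged(TaggedString));
}
```

A visitor that does not care about tags needs no changes, which is the point of the default method.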
we can add this to 0.8 later, because it doesn't need to be a breaking change. We can add a default impl for the new methods that simply error. This is actually the correct behaviour for all |
Agreed that this should go in backward-compatibly later. (I have not reviewed yet.) |
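A rough sketch of why this can be backward compatible: a new trait method that ships with a default body does not break existing implementors. The trait, the Error enum and both formats below are hypothetical and heavily simplified; the real Serializer trait has many more methods and an associated error type.

```rust
// Hypothetical, simplified trait; not the real serde::Serializer.

#[derive(Debug, PartialEq)]
pub enum Error {
    TagsUnsupported,
}

pub trait Serializer {
    fn serialize_u64(&mut self, value: u64) -> Result<(), Error>;

    /// Newly added method. Because it has a default body, serializers written
    /// before it existed still compile; they just report an error for tags.
    fn serialize_tagged_u64(&mut self, _tag: u64, _value: u64) -> Result<(), Error> {
        Err(Error::TagsUnsupported)
    }
}

/// A format written before tags existed: implements only the old method.
pub struct OldFormat {
    pub out: Vec<String>,
}

impl Serializer for OldFormat {
    fn serialize_u64(&mut self, value: u64) -> Result<(), Error> {
        self.out.push(value.to_string());
        Ok(())
    }
}

/// A format that opts in to tags by overriding the default.
pub struct TaggingFormat {
    pub out: Vec<String>,
}

impl Serializer for TaggingFormat {
    fn serialize_u64(&mut self, value: u64) -> Result<(), Error> {
        self.out.push(value.to_string());
        Ok(())
    }

    fn serialize_tagged_u64(&mut self, tag: u64, value: u64) -> Result<(), Error> {
        self.out.push(format!("{}({})", tag, value));
        Ok(())
    }
}

fn main() {
    let mut old = OldFormat { out: Vec::new() };
    let mut new = TaggingFormat { out: Vec::new() };
    println!("{:?}", old.serialize_tagged_u64(1, 42));
    println!("{:?}", new.serialize_tagged_u64(1, 42));
    println!("{:?}", new.out);
}
```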
I am testing this, but first I have to port my cbor crate to serde 0.8.0. |
It turned out I am not smart enough to understand tagged values through specialization. 😞 I was not yet able to get it working. I have three problems with the approach:
I have not yet tested the deserialization part. I have published a version of serde_cbor for serde 0.8-rc3. Feel free to try adding tagging yourself. All relevant information is in RFC 7049. |
we can probably solve this with a macro once we get the design to work
Maybe specialization allows us to do this (I'll check). If not, let's talk to the lang team; maybe it makes sense to extend specialization to do this.
I obviously have no clue about the tagging, that's why we're having this conversation. We'll figure it out ;) Let me play a little. |
I think adding a |
alternatively, if we assume that a specific format only has one type of tag, we can add an associated type. Is that an option? |
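The associated-type idea could look roughly like the sketch below. Everything in it is an assumption for illustration: the trimmed-down Serializer trait, the Tag associated type, the serialize_tagged_str method and the two toy formats are hypothetical names, not part of serde or of any real CBOR or YAML crate.

```rust
// Hypothetical sketch: each format declares exactly one tag type.

pub trait Serializer {
    type Error;
    /// The single tag type this format understands.
    type Tag;

    fn serialize_str(&mut self, value: &str) -> Result<(), Self::Error>;

    fn serialize_tagged_str(&mut self, tag: Self::Tag, value: &str)
        -> Result<(), Self::Error>;
}

/// Toy format with integer tags.
pub struct CborLike {
    pub out: String,
}

impl Serializer for CborLike {
    type Error = ();
    type Tag = u64;

    fn serialize_str(&mut self, value: &str) -> Result<(), ()> {
        self.out.push_str(&format!("{:?}", value));
        Ok(())
    }

    fn serialize_tagged_str(&mut self, tag: u64, value: &str) -> Result<(), ()> {
        self.out.push_str(&format!("{}({:?})", tag, value));
        Ok(())
    }
}

/// Toy format with string tags.
pub struct YamlLike {
    pub out: String,
}

impl Serializer for YamlLike {
    type Error = ();
    type Tag = &'static str;

    fn serialize_str(&mut self, value: &str) -> Result<(), ()> {
        self.out.push_str(value);
        Ok(())
    }

    fn serialize_tagged_str(&mut self, tag: &'static str, value: &str) -> Result<(), ()> {
        self.out.push_str(&format!("!{} {}", tag, value));
        Ok(())
    }
}

fn main() {
    let mut c = CborLike { out: String::new() };
    let mut y = YamlLike { out: String::new() };
    c.serialize_tagged_str(0, "x").unwrap();
    y.serialize_tagged_str("date", "x").unwrap();
    println!("{} / {}", c.out, y.out);
}
```

One consequence of this shape is that code generic over S cannot build an S::Tag without knowing the concrete format, so a format-independent Serialize impl still needs some way to obtain the right tag.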
see https://github.com/oli-obk/cbor/commits/serde0.8 for the working serialization. Point 2 is still not addressed. I won't get to it before next week. |
@@ -71,7 +71,7 @@ pub trait Serialize {
 /// reference to that state. You do not need to do any additional checks for the correctness of the
 /// state object, as it is expected that the user will not modify it. Due to the generic nature
 /// of the `Serialize` impls, modifying the object is impossible on stable Rust.
-pub trait Serializer {
+pub trait Serializer: Sized {
Is it still necessary for Serializer to be bound by Sized?
I'll check, but unsized serializers are mostly useless anyway. I don't think they can be used to serialize anything, but I didn't check
@oli-obk I haven't worked through an example but I want to make sure we can support values that have different tags in different formats. So I should be able to call |
The Serialize impl is specialized on the serializer to produce a different tag value and type depending on the serializer. The serializers themselves forward to a tagserializer to check the type and get the value. Look at the unit tests I wrote in serde_testing |
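For readers who want to see "a different tag value and type depending on the serializer" without nightly features, here is a sketch that uses an explicit trait bound instead of specialization. This is not the mechanism used in this PR. The TagFor trait, the Timestamp type, the two toy formats and the serialize_timestamp helper are all hypothetical names introduced for illustration.

```rust
// Hypothetical stable-Rust sketch; the PR itself relies on specialization.

pub trait Serializer {
    type Tag;
    fn serialize_tagged_str(&mut self, tag: Self::Tag, value: &str);
}

/// "Self knows which tag it carries in format S."
pub trait TagFor<S: Serializer> {
    fn tag_for(&self) -> S::Tag;
}

/// Toy format with integer tags.
pub struct CborLike {
    pub out: String,
}

impl Serializer for CborLike {
    type Tag = u64;
    fn serialize_tagged_str(&mut self, tag: u64, value: &str) {
        self.out = format!("{}({:?})", tag, value);
    }
}

/// Toy format with string tags.
pub struct YamlLike {
    pub out: String,
}

impl Serializer for YamlLike {
    type Tag = &'static str;
    fn serialize_tagged_str(&mut self, tag: &'static str, value: &str) {
        self.out = format!("!{} {}", tag, value);
    }
}

pub struct Timestamp(pub String);

// One impl per format: the tag differs in both type and value.
impl TagFor<CborLike> for Timestamp {
    fn tag_for(&self) -> u64 {
        0
    }
}

impl TagFor<YamlLike> for Timestamp {
    fn tag_for(&self) -> &'static str {
        "timestamp"
    }
}

/// Generic over the format, but only for formats Timestamp has a tag for.
pub fn serialize_timestamp<S>(ts: &Timestamp, serializer: &mut S)
where
    S: Serializer,
    Timestamp: TagFor<S>,
{
    let tag = <Timestamp as TagFor<S>>::tag_for(ts);
    serializer.serialize_tagged_str(tag, &ts.0);
}

fn main() {
    let ts = Timestamp("2016-07-18".to_string());
    let mut c = CborLike { out: String::new() };
    let mut y = YamlLike { out: String::new() };
    serialize_timestamp(&ts, &mut c);
    serialize_timestamp(&ts, &mut y);
    println!("{} / {}", c.out, y.out);
}
```

The extra `Timestamp: TagFor<S>` bound is the cost of this approach: the value cannot be serialized by a format it has no tag impl for, whereas the specialization-based design aims to fall back to an untagged value in that case.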
Ok, so this is a problem I don't know how to solve in my current design. What we can do, is change the
pub trait Serialize<S: Serializer> {
    /// Serializes this value into this serializer.
    fn serialize(&self, serializer: &mut S) -> Result<(), S::Error>;
}
Then we can create a default impl in This will mean that |
Let's keep thinking, that doesn't sound great. |
I have an idea. We do make the change for Serialize, but we add a second trait Serializable, which replaces Serialize wherever it is used (not implemented, only where it's used as a bound). This way we can specialize Serialize (like I did in the unit tests). I'll fiddle together a minimal impl to prove that the type system can do all we need |
Here is a very different, promising, not fully fleshed out approach. This is what I was getting at in my previous comment "Why do we need the bound
#![feature(specialization)]
//// serde crate ///////////////////////////////////////////
#[macro_use]
pub mod serde {
    pub trait Serialize {
        fn serialize<S: ?Sized>(&self, serializer: &mut S) -> Result<(), S::Error>
            where S: Serializer;
    }

    pub trait Serializer {
        type Error;

        fn serialize_str(&mut self, value: &str) -> Result<(), Self::Error>;

        // No trait bounds on T.
        fn serialize_tagged<T, V>(&mut self,
                                  _tag: T,
                                  value: V) -> Result<(), Self::Error>
            where V: Serialize
        {
            value.serialize(self)
        }
    }

    impl<'a> Serialize for &'a str {
        fn serialize<S: ?Sized>(&self,
                                serializer: &mut S) -> Result<(), S::Error>
            where S: Serializer
        {
            serializer.serialize_str(self)
        }
    }

    pub trait Tagged<T> {
        fn tag(self) -> T;
    }

    // Just for convenience - this is helpful if you only care about the
    // tags of a single format.
    impl<T> Tagged<T> for T {
        fn tag(self) -> T {
            self
        }
    }

    // Don't read this until the end. The stuff below will look like magic.
    #[macro_export]
    macro_rules! serialize_tagged {
        ((&mut $self_:ident, $tagname:ident: $tagtype:ty, $value:ident) -> $ret:ty $blk:block) => {
            fn serialize_tagged<T, V>(&mut self,
                                      tag: T,
                                      value: V) -> Result<(), Self::Error>
                where V: serde::Serialize
            {
                return tag.distinguish(self, value);

                trait Distinguish {
                    fn distinguish<V>(
                        self,
                        serializer: &mut Serializer,
                        value: V,
                    ) -> Result<(), <Serializer as serde::Serializer>::Error>
                        where V: serde::Serialize;
                }

                // Fallback for any tag type: ignore the tag.
                impl<T> Distinguish for T {
                    default fn distinguish<V>(
                        self,
                        serializer: &mut Serializer,
                        value: V,
                    ) -> Result<(), <Serializer as serde::Serializer>::Error>
                        where V: serde::Serialize,
                    {
                        value.serialize(serializer)
                    }
                }

                // Specialized for tags this format understands.
                impl<T> Distinguish for T where T: serde::Tagged<$tagtype> {
                    fn distinguish<V>(
                        self,
                        serializer: &mut Serializer,
                        value: V,
                    ) -> Result<(), <Serializer as serde::Serializer>::Error>
                        where V: serde::Serialize
                    {
                        serializer.finalize(value, self.tag())
                    }
                }

                // The point is to set `self` back to the Serializer as opposed
                // to the tag, so the user's code can use `self` in a way that
                // makes more sense.
                trait Finalize: serde::Serializer {
                    fn finalize<V>(&mut self, value: V, tag: $tagtype) -> Result<(), Self::Error>
                        where V: serde::Serialize;
                }

                impl Finalize for Serializer {
                    fn finalize<V>(
                        &mut $self_,
                        $value: V,
                        $tagname: $tagtype,
                    ) -> Result<(), Self::Error>
                        where V: serde::Serialize
                    {
                        $blk
                    }
                }
            }
        };
    }
}

//// serde_stdout crate ////////////////////////////////////
pub mod serde_stdout {
    use serde;

    pub struct Serializer;

    pub enum Tag {
        Plus,
        Minus,
    }

    impl serde::Serializer for Serializer {
        type Error = ();

        fn serialize_str(&mut self, value: &str) -> Result<(), Self::Error> {
            println!("{}", value);
            Ok(())
        }

        // This macro hides all the specialization and just gives you tags
        // of the right type along with the value being serialized.
        serialize_tagged! {
            (&mut self, tag: Tag, value) -> Result<(), Self::Error> {
                match tag {
                    Tag::Plus => print!("+"),
                    Tag::Minus => print!("-"),
                };
                value.serialize(self)
            }
        }
    }
}

//// serde_stderr crate ////////////////////////////////////
pub mod serde_stderr {
    use serde;

    pub struct Serializer;

    // Does not implement serialize_tagged.
    impl serde::Serializer for Serializer {
        type Error = ();

        fn serialize_str(&mut self, value: &str) -> Result<(), Self::Error> {
            use std::io::{self, Write};
            writeln!(&mut io::stderr(), "{}", value).unwrap();
            Ok(())
        }
    }
}

//// my_crate //////////////////////////////////////////////
pub mod my_crate {
    use serde;
    use serde_stdout as stdout;

    pub struct MyTaggedStr<'a>(pub &'a str);

    impl<'a> serde::Serialize for MyTaggedStr<'a> {
        fn serialize<S: ?Sized>(&self,
                                serializer: &mut S) -> Result<(), S::Error>
            where S: serde::Serializer
        {
            // This passes a concrete tag that relies on the impl Tagged<T> for T.
            // Could instead pass some other type for which we impl
            // Tagged<format1::Tag>, Tagged<format2::Tag>, Tagged<format3::Tag>.
            // Those impls can be provided by the crate implementing Serialize
            // *and/or* the crates containing each Serializer.
            serializer.serialize_tagged(stdout::Tag::Plus, self.0)
        }
    }
}

fn main() {
    use serde::Serialize;

    let v = my_crate::MyTaggedStr("test");
    v.serialize(&mut serde_stdout::Serializer).unwrap();
    v.serialize(&mut serde_stderr::Serializer).unwrap();
}
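The snippet above needs nightly Rust. As a rough stable-Rust illustration of the same observable behaviour (a format acts on tags of its own type and silently ignores everything else), the sketch below swaps specialization for a runtime downcast through std::any::Any. This is a different technique from the one proposed in the comment: it costs a runtime type check and requires tag types to be 'static. The names TaggingSink, PlainSink and serialize_my_str are hypothetical, and the value is fixed to &str to keep the sketch short.

```rust
use std::any::Any;

// Hypothetical, simplified trait; not the real serde::Serializer.
pub trait Serializer {
    type Error;

    fn serialize_str(&mut self, value: &str) -> Result<(), Self::Error>;

    /// Default: ignore the tag, whatever its type, and write the bare value.
    fn serialize_tagged<T: Any>(&mut self, _tag: T, value: &str) -> Result<(), Self::Error> {
        self.serialize_str(value)
    }
}

/// Tag type belonging to the tagging format below.
pub enum Tag {
    Plus,
    Minus,
}

/// Format that understands `Tag` and ignores every other tag type.
pub struct TaggingSink {
    pub out: String,
}

impl Serializer for TaggingSink {
    type Error = ();

    fn serialize_str(&mut self, value: &str) -> Result<(), ()> {
        self.out.push_str(value);
        Ok(())
    }

    fn serialize_tagged<T: Any>(&mut self, tag: T, value: &str) -> Result<(), ()> {
        // Runtime check standing in for the compile-time specialization.
        if let Some(tag) = (&tag as &dyn Any).downcast_ref::<Tag>() {
            self.out.push(match tag {
                Tag::Plus => '+',
                Tag::Minus => '-',
            });
        }
        self.serialize_str(value)
    }
}

/// Format without tag support: keeps the default `serialize_tagged`.
pub struct PlainSink {
    pub out: String,
}

impl Serializer for PlainSink {
    type Error = ();

    fn serialize_str(&mut self, value: &str) -> Result<(), ()> {
        self.out.push_str(value);
        Ok(())
    }
}

/// Plays the role of the `Serialize` impl for `MyTaggedStr`.
pub fn serialize_my_str<S: Serializer>(serializer: &mut S) -> Result<(), S::Error> {
    serializer.serialize_tagged(Tag::Plus, "test")
}

fn main() {
    let mut tagged = TaggingSink { out: String::new() };
    let mut plain = PlainSink { out: String::new() };
    serialize_my_str(&mut tagged).unwrap();
    serialize_my_str(&mut plain).unwrap();
    println!("{} / {}", tagged.out, plain.out);
}
```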
@dtolnay Your proposal looks promising. This could work for CBOR and other formats. |
I experimented a bit with the I haven't given up yet because fundamentally we need (something like) |
I also ran into this confusing behavior where it looks like the blanket impl is not kicking in as I would expect. I will read through the impl specialization RFC again and see where this is specified.
#![feature(specialization)]

struct JsonSerializer;
struct CborSerializer;

fn main() {
    0i32.serialize(&mut CborSerializer);
}

trait Serialize<S: Serializer> {
    fn serialize(&self, &mut S);
}

trait Serializer {}
impl Serializer for JsonSerializer {}
impl Serializer for CborSerializer {}

impl<S: Serializer> Serialize<S> for i32 {
    default fn serialize(&self, _: &mut S) {
        println!("default");
    }
}

impl Serialize<JsonSerializer> for i32 {
    fn serialize(&self, _: &mut JsonSerializer) {
        println!("specialized");
    }
}
|
I filed it as rust-lang/rust#38516. |
Let's reopen when somebody is ready to dedicate time to this and we have a clearer commitment to specialization. |
While Rust specialization is still not fully working, there is certainly some commitment to it now, and I would also be able to dedicate some time to a CBOR crate. I have written a forum post on CBOR in Rust that also deals with tags (even though they are not the main point), and I am looking for some kind of consensus on where to go from here. |
@pyfisch could you test whether this fits your needs?
Have a look at the tests in serde_macros to see how it should be used from the Deserialize impl side, and look at the Deserializer and Serializer impls in serde_test to see how the actual format should use this.
Note that this requires a crate implementing Deserialize for a tagged type to have the Deserializer as a dependency.
supersedes #408