-
Notifications
You must be signed in to change notification settings - Fork 11
fix: partial support flatten enum in struct #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put some comments on the code. But I have some overall notes.
First of all, I'm a little uneasy with the change. There are at least 3 comments about hacks in this change, which kinda makes me uncomfortable. But I'm a lenient person, so I'm okay giving all of them a pass. I'm just uncomfortable because I'm not fully mastering this topic, or even the bug it's trying to fix.
However, this doesn't break backward compatibility. So I'm strongly considering approving this change.
Even though I don't know yet, there are some things that need to be addressed if you want me be able to merge it.
- Please make only one commit. I don't mind multiple commits on changes. But I hate clean up commits. I know this repo has many from me, but as the owner of the repo, I have the right to break my own rules 😄 . I just don't like it when other don't follow my rules 😛 . For this change, I would like to ask you to make only one commit please, because I think it fits nicely into one.
- Please rebase onto the
master
branch, as I've said in one of the comments, the tests were failing when you forked it, and I fixed them ~1h ago. I want the CI to be green before merging this change. It is currently green onmaster
, if it becomes red again, no need to fix it, I'll fix it again. We're regularly running master against the latest clippy, so when clippy adds new checks, it breaks once in a while. - Please use
git commit -s
as mentionned in the README. This means that you signed off the Developer Certificate of Origin. See more information in my comment about it
Thank you for your patch, with a nice test 🙂 .
4037163
to
35bb238
Compare
I have revised the comments and squashed the commits into a single signed commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at it. I'm not gonna lie, it's not my favorite change, but I think it's fine.
I would like to ask you to fix the comment from Mingun (I commented on it). Other than that I think we're good to go. 👍
Thanks for your contribution 🙂
ec3a7c0
to
39cfdee
Compare
Hi. I have updated the adviced lines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I accidentally send two comments without starting review.
My main concern here that this changes is not described in the code, how they are worked and why in that way.
I understood that you've represent tagged YAML values as maps and use visit_map
when deserializing value is serde's Content
, but it is not quite clear why it is needed to call visit_map
for that? Why visit_enum
does not work or why we cannot use visit_map
in all cases?
Hi, I’ve got a bit of a breather from my busy work, and over the next few days, I’ll start addressing these reviews. |
Close acatton#14. I adapted the implementation from [ron-rs/ron#451](ron-rs/ron#451). This is a workaround for Serde's internal buffer type used when deserializing via `visit_enum`. Signed-off-by: Jonson Petard <41122242+greenhat616@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let me summarize what the problem this PR fixes.
We have a structure with enum field a
, and enum { A, B, C(String) }
. The values of such string are represented in YAML as (here I use an array of 4 items to show each variant):
- a: !A # Inner { a: Enum::A }
- a: !B # Inner { a: Enum::B }
- a: !C # Inner { a: Enum::C("") }
- a: !C inner string # Inner { a: Enum::C("inner string") }
The tags (!Thing
) is a thing where representation in JSON and YAML are differs. The JSON equivalent representation would be
- a: A # Inner { a: Enum::A }
- a: B # Inner { a: Enum::B }
- a: { C: null } # Inner { a: Enum::C("") }
- a: { C: "inner string" } # Inner { a: Enum::C("inner string") }
or using only canonical YAML syntax
- a: A # Inner { a: Enum::A }
- a: B # Inner { a: Enum::B }
- a: # Inner { a: Enum::C("") }
C:
- a: # Inner { a: Enum::C("inner string") }
C: inner string
Deserialization of Enum
type involves calling Deserializer::deserialize_enum
which is DeserializerFromEvents::deserialize_enum
when we deserialize from yaml deserializer.
Now we want to flatten our struct type into another type:
#[derive(Deserialize, PartialEq, Debug)]
struct Outer {
#[serde(flatten)]
inner: Inner,
}
but due to #[serde(flatten)]
deserialization code buffers all content into internal Content
type. That buffering disrupts many formats, because then types are deserialized from that buffer using generic ContentDeserializer
or ContentRefDeserializer
which are not aware of the special handling in some formats. That is what serde-rs/serde#1183 about.
Bufferization is performed by calling Deserializer::deserialize_any
which is DeserializerFromEvents::deserialize_any
. Content
is created by using ContentVisitor
which does not support Visitor::visit_enum
. YAML deserializer uses visit_enum
in deserialize_any
and therefore gets error (note, that using it in deserialize_enum
is fine because this method not called when deserialize into Content
).
The fix detects when we trying to deserialize into Content
(is_serde_content
variable) and replaces visit_enum
by visit_map
. The serde_json
never suffers from that, because it never calls visit_enum
in deserialize_any
.
Because now we sometimes call visit_map
instead of visit_enum
, we implement MapAccess
for EnumAccess
. The new flag has_visited
tracks if we already deserialized the key or not, in order to our map represent one-entry map.
I would rename and maybe remove some fields and change comments according to the description above (I left individual comments), but in other aspects I think it can be merged.
/// a flag to do a hack to run visitor.visit_map() instead of visitor.visit_enum() when `is_serde_content_newtype` is true. | ||
/// That is required because `visit_enum` is not available in serde internal newtype visitor. | ||
is_serde_content_newtype: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// a flag to do a hack to run visitor.visit_map() instead of visitor.visit_enum() when `is_serde_content_newtype` is true. | |
/// That is required because `visit_enum` is not available in serde internal newtype visitor. | |
is_serde_content_newtype: bool, | |
/// A flag to track, whether a [`serde::__private::de::Content`] type is deserialized | |
/// from this deserializer. This is hackish solution of the flattening problem and | |
/// some other transformations which involves bufferization into the value of the | |
/// `Content` type. This problem is tracked in https://github.com/serde-rs/serde/issues/1183. | |
/// | |
/// The bufferization performed by capturing everything via `deserialize_any` using | |
/// a special visitor, which creates `Content` value, but this visitor returns error | |
/// from `visit_enum` method. Because it is used in our implementation of `deserialize_any`, | |
/// flattening (and something other) does not work. | |
/// | |
/// When we detect, that target type for deserialization is `Content`, we call `visit_map` | |
/// instead of `visit_enum` and represent enum as one-entry map `{ variant_name: variant_value }`. | |
/// This should be properly fixed in serde, but serde mainainers gives almost none attention to PRs. | |
/// | |
/// When we start deserializing `Content`, we should pass the same flag to all nested | |
/// deserializers to avoid calling `visit_enum`. | |
bufferization_in_progress: bool, |
(because English is neither my native or favorite language, some fixes in wording may be required, please do that)
Actually, I'm not sure that this flag is ever required. If we would deserialize the nested Content
, it would be deserialized using DeserializerFromEvents::deserialize_any
and the check for the type name will handle that. I suggest to check if the fix would work if remove this field.
I also suggest to introduce a helper function and put the comment from above on it (adapted of course), if the field will be removed:
fn is_serde_content<T>() -> bool {
std::any::type_name::<V::Value>() == std::any::type_name::<serde::__private::de::Content>()
}
/// a flag to do a hack to run visitor.visit_map() instead of visitor.visit_enum() when `is_serde_content_newtype` is true. | ||
/// That is required because `visit_enum` is not available in serde internal newtype visitor. | ||
has_visited: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By itself, representing an enumeration as a one-entry map is not a hacker solution, so there is no need to mention it:
/// a flag to do a hack to run visitor.visit_map() instead of visitor.visit_enum() when `is_serde_content_newtype` is true. | |
/// That is required because `visit_enum` is not available in serde internal newtype visitor. | |
has_visited: bool, | |
/// When this struct is used as [`MapAccess`], tracks whether the key was already | |
/// requested or not. We want to represent enum value as a one-entry map. | |
/// When key was requested, this field is set to `true` | |
is_key_deserialized: bool, |
let yaml: &str = indoc! {" | ||
a: !C | ||
"}; | ||
// let expected = Outer { | ||
// inner: Inner { | ||
// a: Enum::C(String::new()), | ||
// }, | ||
// }; | ||
// test_de_no_value(yaml, &expected); | ||
// This should fail. Blocked by by https://github.com/serde-rs/serde/issues/1183 | ||
assert!(serde_yaml_ng::from_str::<Outer>(yaml).is_err()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error is expected here, because according to the description in the review, this should be deserialized to None
if it would be Option
:
let yaml: &str = indoc! {" | |
a: !C | |
"}; | |
// let expected = Outer { | |
// inner: Inner { | |
// a: Enum::C(String::new()), | |
// }, | |
// }; | |
// test_de_no_value(yaml, &expected); | |
// This should fail. Blocked by by https://github.com/serde-rs/serde/issues/1183 | |
assert!(serde_yaml_ng::from_str::<Outer>(yaml).is_err()); | |
let yaml: &str = indoc! {" | |
a: !C | |
"}; | |
assert!(serde_yaml_ng::from_str::<Inner>(yaml).is_err()); | |
assert!(serde_yaml_ng::from_str::<Outer>(yaml).is_err()); |
Of course, if deserialization of Inner
works (I didn't check that), keep that comment.
Close #14
The coding approach is according to ron-rs/ron#451.
The
serde_yaml_ng::from_str::<Outer>(yaml).is_err()
should always failed withinvalid type: unit value, expected a string
, which is blocked by serde-rs/serde#1183