Skip to content

Conversation

greenhat616
Copy link

Close #14

The coding approach is according to ron-rs/ron#451.

The serde_yaml_ng::from_str::<Outer>(yaml).is_err() should always failed with invalid type: unit value, expected a string, which is blocked by serde-rs/serde#1183

Copy link
Owner

@acatton acatton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put some comments on the code. But I have some overall notes.

First of all, I'm a little uneasy with the change. There are at least 3 comments about hacks in this change, which kinda makes me uncomfortable. But I'm a lenient person, so I'm okay giving all of them a pass. I'm just uncomfortable because I'm not fully mastering this topic, or even the bug it's trying to fix.

However, this doesn't break backward compatibility. So I'm strongly considering approving this change.

Even though I don't know yet, there are some things that need to be addressed if you want me be able to merge it.

  1. Please make only one commit. I don't mind multiple commits on changes. But I hate clean up commits. I know this repo has many from me, but as the owner of the repo, I have the right to break my own rules 😄 . I just don't like it when other don't follow my rules 😛 . For this change, I would like to ask you to make only one commit please, because I think it fits nicely into one.
  2. Please rebase onto the master branch, as I've said in one of the comments, the tests were failing when you forked it, and I fixed them ~1h ago. I want the CI to be green before merging this change. It is currently green on master, if it becomes red again, no need to fix it, I'll fix it again. We're regularly running master against the latest clippy, so when clippy adds new checks, it breaks once in a while.
  3. Please use git commit -s as mentionned in the README. This means that you signed off the Developer Certificate of Origin. See more information in my comment about it

Thank you for your patch, with a nice test 🙂 .

@greenhat616 greenhat616 force-pushed the master branch 2 times, most recently from 4037163 to 35bb238 Compare October 21, 2024 06:44
@greenhat616
Copy link
Author

I have revised the comments and squashed the commits into a single signed commit.

@greenhat616 greenhat616 requested a review from acatton October 21, 2024 06:47
Copy link
Owner

@acatton acatton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at it. I'm not gonna lie, it's not my favorite change, but I think it's fine.

I would like to ask you to fix the comment from Mingun (I commented on it). Other than that I think we're good to go. 👍

Thanks for your contribution 🙂

@greenhat616 greenhat616 force-pushed the master branch 2 times, most recently from ec3a7c0 to 39cfdee Compare November 4, 2024 08:51
@greenhat616
Copy link
Author

Hi. I have updated the adviced lines.

@greenhat616 greenhat616 requested review from acatton and Mingun November 5, 2024 12:08
Copy link
Contributor

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I accidentally send two comments without starting review.

My main concern here that this changes is not described in the code, how they are worked and why in that way.

I understood that you've represent tagged YAML values as maps and use visit_map when deserializing value is serde's Content, but it is not quite clear why it is needed to call visit_map for that? Why visit_enum does not work or why we cannot use visit_map in all cases?

@greenhat616
Copy link
Author

Hi, I’ve got a bit of a breather from my busy work, and over the next few days, I’ll start addressing these reviews.

Close acatton#14.

I adapted the implementation from [ron-rs/ron#451](ron-rs/ron#451).

This is a workaround for Serde's internal buffer type used when deserializing via `visit_enum`.

Signed-off-by: Jonson Petard <41122242+greenhat616@users.noreply.github.com>
@michalmoc
Copy link

@Mingun @acatton any chances on this MR? I need this solution too

Copy link
Contributor

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let me summarize what the problem this PR fixes.

We have a structure with enum field a, and enum { A, B, C(String) }. The values of such string are represented in YAML as (here I use an array of 4 items to show each variant):

- a: !A # Inner { a: Enum::A }
- a: !B # Inner { a: Enum::B }
- a: !C # Inner { a: Enum::C("") }
- a: !C inner string # Inner { a: Enum::C("inner string") }

The tags (!Thing) is a thing where representation in JSON and YAML are differs. The JSON equivalent representation would be

- a: A # Inner { a: Enum::A }
- a: B # Inner { a: Enum::B }
- a: { C: null } # Inner { a: Enum::C("") }
- a: { C: "inner string" } # Inner { a: Enum::C("inner string") }

or using only canonical YAML syntax

- a: A # Inner { a: Enum::A }
- a: B # Inner { a: Enum::B }
- a:   # Inner { a: Enum::C("") }
    C:
- a:   # Inner { a: Enum::C("inner string") }
    C: inner string

Deserialization of Enum type involves calling Deserializer::deserialize_enum which is DeserializerFromEvents::deserialize_enum when we deserialize from yaml deserializer.

Now we want to flatten our struct type into another type:

#[derive(Deserialize, PartialEq, Debug)]
struct Outer {
  #[serde(flatten)]
  inner: Inner,
}

but due to #[serde(flatten)] deserialization code buffers all content into internal Content type. That buffering disrupts many formats, because then types are deserialized from that buffer using generic ContentDeserializer or ContentRefDeserializer which are not aware of the special handling in some formats. That is what serde-rs/serde#1183 about.

Bufferization is performed by calling Deserializer::deserialize_any which is DeserializerFromEvents::deserialize_any. Content is created by using ContentVisitor which does not support Visitor::visit_enum. YAML deserializer uses visit_enum in deserialize_any and therefore gets error (note, that using it in deserialize_enum is fine because this method not called when deserialize into Content).

The fix detects when we trying to deserialize into Content (is_serde_content variable) and replaces visit_enum by visit_map. The serde_json never suffers from that, because it never calls visit_enum in deserialize_any.

Because now we sometimes call visit_map instead of visit_enum, we implement MapAccess for EnumAccess. The new flag has_visited tracks if we already deserialized the key or not, in order to our map represent one-entry map.

I would rename and maybe remove some fields and change comments according to the description above (I left individual comments), but in other aspects I think it can be merged.

Comment on lines +439 to +441
/// a flag to do a hack to run visitor.visit_map() instead of visitor.visit_enum() when `is_serde_content_newtype` is true.
/// That is required because `visit_enum` is not available in serde internal newtype visitor.
is_serde_content_newtype: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// a flag to do a hack to run visitor.visit_map() instead of visitor.visit_enum() when `is_serde_content_newtype` is true.
/// That is required because `visit_enum` is not available in serde internal newtype visitor.
is_serde_content_newtype: bool,
/// A flag to track, whether a [`serde::__private::de::Content`] type is deserialized
/// from this deserializer. This is hackish solution of the flattening problem and
/// some other transformations which involves bufferization into the value of the
/// `Content` type. This problem is tracked in https://github.com/serde-rs/serde/issues/1183.
///
/// The bufferization performed by capturing everything via `deserialize_any` using
/// a special visitor, which creates `Content` value, but this visitor returns error
/// from `visit_enum` method. Because it is used in our implementation of `deserialize_any`,
/// flattening (and something other) does not work.
///
/// When we detect, that target type for deserialization is `Content`, we call `visit_map`
/// instead of `visit_enum` and represent enum as one-entry map `{ variant_name: variant_value }`.
/// This should be properly fixed in serde, but serde mainainers gives almost none attention to PRs.
///
/// When we start deserializing `Content`, we should pass the same flag to all nested
/// deserializers to avoid calling `visit_enum`.
bufferization_in_progress: bool,

(because English is neither my native or favorite language, some fixes in wording may be required, please do that)

Actually, I'm not sure that this flag is ever required. If we would deserialize the nested Content, it would be deserialized using DeserializerFromEvents::deserialize_any and the check for the type name will handle that. I suggest to check if the fix would work if remove this field.

I also suggest to introduce a helper function and put the comment from above on it (adapted of course), if the field will be removed:

fn is_serde_content<T>() -> bool {
  std::any::type_name::<V::Value>() == std::any::type_name::<serde::__private::de::Content>()
}

Comment on lines +752 to +754
/// a flag to do a hack to run visitor.visit_map() instead of visitor.visit_enum() when `is_serde_content_newtype` is true.
/// That is required because `visit_enum` is not available in serde internal newtype visitor.
has_visited: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By itself, representing an enumeration as a one-entry map is not a hacker solution, so there is no need to mention it:

Suggested change
/// a flag to do a hack to run visitor.visit_map() instead of visitor.visit_enum() when `is_serde_content_newtype` is true.
/// That is required because `visit_enum` is not available in serde internal newtype visitor.
has_visited: bool,
/// When this struct is used as [`MapAccess`], tracks whether the key was already
/// requested or not. We want to represent enum value as a one-entry map.
/// When key was requested, this field is set to `true`
is_key_deserialized: bool,

Comment on lines +272 to +282
let yaml: &str = indoc! {"
a: !C
"};
// let expected = Outer {
// inner: Inner {
// a: Enum::C(String::new()),
// },
// };
// test_de_no_value(yaml, &expected);
// This should fail. Blocked by by https://github.com/serde-rs/serde/issues/1183
assert!(serde_yaml_ng::from_str::<Outer>(yaml).is_err());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error is expected here, because according to the description in the review, this should be deserialized to None if it would be Option:

Suggested change
let yaml: &str = indoc! {"
a: !C
"};
// let expected = Outer {
// inner: Inner {
// a: Enum::C(String::new()),
// },
// };
// test_de_no_value(yaml, &expected);
// This should fail. Blocked by by https://github.com/serde-rs/serde/issues/1183
assert!(serde_yaml_ng::from_str::<Outer>(yaml).is_err());
let yaml: &str = indoc! {"
a: !C
"};
assert!(serde_yaml_ng::from_str::<Inner>(yaml).is_err());
assert!(serde_yaml_ng::from_str::<Outer>(yaml).is_err());

Of course, if deserialization of Inner works (I didn't check that), keep that comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Newtype enum variant deserialization failure when used with #[serde(flatten)]
4 participants