illegal characters inside data_base64 #320

@ivan-penchev

Warning

I am reporting this as a bug, but I need help verifying whether it really is one or whether I am doing something incorrectly.

Context

There are two services communicating through Pulsar via CloudEvents.
Service A is a producer written in .NET.
Service B is a consumer written in Go.

Service B uses this very simple code from the Go SDK examples:

var eventWithBase64Data cloudevents.Event
if err := json.Unmarshal(byteContentFromMessage, &eventWithBase64Data); err != nil {
	slog.Error("Failed to unmarshal cloudevent JSON", err)
}

And we noticed it fails with the message:

ERROR Failed to unmarshal cloudevent JSON !BADKEY="illegal base64 data at input byte 394"

Upon inspecting the message content, I saw that in the middle of the data_base64 value there is a \ character, as in ...zhu77\u002B9...
According to RFC 4648, this character is not in the base64 alphabet, so how did it end up in the message?

Steps to reproduce

If you do not have the .NET SDK installed, you can run it in Docker (this works on Mac):

docker run --rm -it -u $(id -u):$(id -g) -e DOTNET_CLI_HOME=/tmp -e XDG_DATA_HOME=/tmp -v .:/code -w /code mcr.microsoft.com/dotnet/sdk:8.0

## Create project
dotnet new console -n CloudEventsEncoder
## Add package
dotnet add CloudEventsEncoder package CloudNative.CloudEvents.SystemTextJson

Afterwards, edit the ./CloudEventsEncoder/Program.cs file to create a simple CLI that takes a CloudEvent wrapper and its data and persists the result as a file.

Note

I am using files to simulate the real process in the service. In reality, cloudevent.json is generated at runtime and data.json is just an object fetched from an API.

using System.Text;
using CloudNative.CloudEvents;
using CloudNative.CloudEvents.SystemTextJson;

string[] cliArgs = Environment.GetCommandLineArgs();

var cloudEventFilePath = cliArgs.Length > 1 ? cliArgs[1] : "cloudevent.json";
var cloudEventDataFilePath = cliArgs.Length > 2 ? cliArgs[2] : "data.json";
var outputFilePath = cliArgs.Length > 3 ? cliArgs[3] : "output.json";

var formatter = new JsonEventFormatter();

var cloudEventJson = File.ReadAllText(cloudEventFilePath);
var inputCloudEvent = formatter.DecodeStructuredModeMessage(Encoding.UTF8.GetBytes(cloudEventJson), contentType: null, extensionAttributes: null);

// Read the data from the path given on the command line, not a hardcoded file name.
if (File.Exists(cloudEventDataFilePath))
    inputCloudEvent.Data = Encoding.UTF8.GetBytes(File.ReadAllText(cloudEventDataFilePath));

if (!inputCloudEvent.IsValid)
    Console.Error.WriteLine("Invalid CloudEvent");

var serializedCloudEventMemory = formatter.EncodeStructuredModeMessage(inputCloudEvent, out var contentType);
byte[] serializedCloudEvent = serializedCloudEventMemory.ToArray();

File.WriteAllBytes(outputFilePath, serializedCloudEvent);
Console.WriteLine($"CloudEvent saved to {outputFilePath}");

Create the ./cloudevent.json and ./data.json files.

My cloudevent.json
{
    "specversion": "1.0",
    "id": "00716a6e-6063-43d3-8717-66888f060055",
    "source": "urn:aws-global:7777",
    "type": "updated.v1",
    "time": "2027-06-14T15:50:52.7725755Z",
    "subject": "maika-ti",
    "traceparent": "00-f309c227009f1b0cad03e8b46ebc460b-951bf1a4f674aab2-01",
    "datacontenttype": "application/cloudevents+json"
}
My data.json (translations from various Asian languages)
{
  "texts": {
    "product": {
      "texts": [
        {
          "culture": "ko-KR",
          "description": "\uC55E\uC88C\uC11D\uC5D0 \uBBF8\uB2C8\uD53C\uACA8\uB97C \uC549\uD600 \uC870\uBA85 \uBE14\uB85D\uC744 \uCF20 \uB2E4\uC74C, \uB2EC\uB824\uB098\uAC00\uC138\uC694. \uC774\uC58F\uD638n�M�!"
        }
      ]
    }
  }
}
## To run and generate output.json file
dotnet run --project CloudEventsEncoder
Generated output.json with the illegal character
{
    "specversion": "1.0",
    "id": "00716a6e-6063-43d3-8717-66888f060055",
    "source": "urn:aws-global:7777",
    "type": "updated.v1",
    "time": "2027-06-14T15:50:52.7725755Z",
    "subject": "maika-ti",
    "traceparent": "00-f309c227009f1b0cad03e8b46ebc460b-951bf1a4f674aab2-01",
    "datacontenttype": "application/json",
    "data_base64": "ewogICJ0ZXh0cyI6IHsKICAgICJwcm9kdWN0IjogewogICAgICAidGV4dHMiOiBbCiAgICAgICAgewogICAgICAgICAgImN1bHR1cmUiOiAia28tS1IiLAogICAgICAgICAgImRlc2NyaXB0aW9uIjogIlx1QzU1RVx1Qzg4Q1x1QzExRFx1QzVEMCBcdUJCRjhcdUIyQzhcdUQ1M0NcdUFDQThcdUI5N0MgXHVDNTQ5XHVENjAwIFx1Qzg3MFx1QkE4NSBcdUJFMTRcdUI4NURcdUM3NDQgXHVDRjIwIFx1QjJFNFx1Qzc0QywgXHVCMkVDXHVCODI0XHVCMDk4XHVBQzAwXHVDMTM4XHVDNjk0LiBcdUM3NzRcdUM1OEZcdUQ2Mzhu77\u002B9Te\u002B/vSEiCiAgICAgICAgfQogICAgICBdCiAgICB9CiAgfQp9"
}
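
If the consumer's decoder turns out not to unescape `data_base64` before base64-decoding it (an assumption, not something confirmed in this report), one possible consumer-side workaround is to normalize that single member with the standard library before handing the bytes to the SDK. This is only a sketch: `normalizeDataBase64` is a hypothetical helper, and it relies on Go's `json.Marshal` writing `+` unescaped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// normalizeDataBase64 rewrites the data_base64 member of a structured-mode
// CloudEvent so that its value contains no JSON escape sequences. All other
// members are passed through as raw JSON.
func normalizeDataBase64(raw []byte) ([]byte, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(raw, &fields); err != nil {
		return nil, fmt.Errorf("parse event: %w", err)
	}
	encoded, ok := fields["data_base64"]
	if !ok {
		return raw, nil // nothing to normalize
	}
	var unescaped string
	if err := json.Unmarshal(encoded, &unescaped); err != nil {
		return nil, fmt.Errorf("parse data_base64: %w", err)
	}
	// Re-encode the string; base64 characters are written without escapes.
	clean, err := json.Marshal(unescaped)
	if err != nil {
		return nil, fmt.Errorf("encode data_base64: %w", err)
	}
	fields["data_base64"] = clean
	return json.Marshal(fields)
}

func main() {
	in := []byte(`{"specversion":"1.0","data_base64":"77\u002B9Te\u002B/vSEi"}`)
	out, err := normalizeDataBase64(in)
	if err != nil {
		fmt.Println("normalize failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

The output of `normalizeDataBase64` would then be passed to the existing `json.Unmarshal(..., &eventWithBase64Data)` call. Note that re-marshaling a map emits the members in sorted key order, which does not matter for JSON objects.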
