[#11] rfc: propose the metadata schema spec for Unified Catalog #14

jerryshao · 2023-05-11T08:36:42Z

What changes were proposed in this pull request?

This PR proposes the schema and type spec for Unified Catalog. The spec describes how metadata is organized in the system.

Why are the changes needed?

This PR defines the basic metadata schema model, which the system will use for its in-memory structures, its on-wire protocol, and its serialization protocol.

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

N/A

xunliu · 2023-05-12T03:12:07Z

rfc/rfc-1/rfc-1.md

+    2. We will further use Substrait to represent our logical plans (for example, views, functions, and others), so using Substrait’s type system will reduce conversion work later on.
+2. We choose JSON as our user-facing protocol, because it is easy to debug for both users and systems.
+3. We choose the Protobuf binary layout to store the schema; the main considerations are:
+    1. The binary layout is much more concise than HMS’s schema layout.


Do we need to perform search or other operations on the schema?
If so, using binary storage would make these operations difficult to support.
Maybe we can refer to SnowflakeDB's use of the AVRO format for metadata storage?

Can you please give an example of how we would search the metadata? Also, AVRO is a binary format too, IIUC.

You could check the details of Snowflake's metadata design. It also says that using AVRO makes the metadata harder to query than a SQL DB, so Snowflake built a series of CLI tools for users to maintain the metadata.
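The trade-off in this thread (JSON for the user-facing protocol, a Protobuf binary layout for storage, and the cost of searching binary data) can be illustrated with a small Python sketch. It hand-encodes a hypothetical `Column` message following the Protobuf wire format rules: each field starts with a tag `(field_number << 3) | wire_type`, integers are base-128 varints, and strings are length-prefixed. The message, its field numbers, and the `type_id` field are assumptions for illustration only; a real implementation would use `protoc`-generated classes rather than a hand-rolled encoder.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a Protobuf base-128 varint (7 bits per byte, low group first)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple:
    """Decode a varint starting at `pos`; return (value, next_position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_column(name: str, type_id: int, nullable: bool) -> bytes:
    """Encode the hypothetical message
    `Column { string name = 1; uint32 type_id = 2; bool nullable = 3; }`.
    Every field is written for simplicity; real proto3 encoders skip default values."""
    name_bytes = name.encode("utf-8")
    out = bytearray()
    out += encode_varint((1 << 3) | 2)  # field 1, wire type 2 (length-delimited)
    out += encode_varint(len(name_bytes)) + name_bytes
    out += encode_varint((2 << 3) | 0)  # field 2, wire type 0 (varint)
    out += encode_varint(type_id)
    out += encode_varint((3 << 3) | 0)  # field 3, wire type 0 (varint)
    out += encode_varint(int(nullable))
    return bytes(out)


def decode_column(buf: bytes) -> dict:
    """Decode bytes produced by encode_column back into a dict."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 2:
            length, pos = decode_varint(buf, pos)
            fields[field_number] = buf[pos:pos + length].decode("utf-8")
            pos += length
        elif wire_type == 0:
            fields[field_number], pos = decode_varint(buf, pos)
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return {"name": fields.get(1, ""), "type_id": fields.get(2, 0), "nullable": bool(fields.get(3, 0))}


column = {"name": "user_id", "type_id": 4, "nullable": False}
as_json = json.dumps(column).encode("utf-8")  # human-readable, easy to debug or grep
as_binary = encode_column(**column)           # compact, but opaque without decoding
print(len(as_json), len(as_binary))
print(decode_column(as_binary))  # any search on type_id must decode first
```

For this record the binary form is a fraction of the JSON size, but answering a question such as "which columns have `type_id` 4?" requires decoding every stored record first, which is the search concern raised in the review.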

xunliu · 2023-05-15T06:53:00Z

rfc/rfc-1/rfc-1.md

+
+| Field Name          | Field Type      | Description                                                  | Optional |
+| ------------------- | --------------- | ------------------------------------------------------------ | -------- |
+| connection_id (TBD) | uint32          | The unique ID of the connector used to access the physical table | Required |


I suggest adding a new term to differentiate two different types of data source connections:

collector: connects to a data source to get/put metadata.

connector: connects to a data source to get/put data.
What do you think?

This is just a placeholder; I will update the doc once you finish the connection-related design.
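The collector/connector split proposed above, together with the placeholder `connection_id` field, can be sketched as two minimal Python interfaces. Everything here is hypothetical: `Collector`, `Connector`, `PhysicalTableRef`, `InMemorySource`, and their method names are illustrative assumptions, not part of the RFC, and the range check only mirrors the `uint32`/Required entries in the spec table.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

UINT32_MAX = 2**32 - 1


@dataclass(frozen=True)
class PhysicalTableRef:
    """Hypothetical holder for the placeholder `connection_id` field (required uint32)."""

    connection_id: int

    def __post_init__(self) -> None:
        # Reject non-integer values (including bools) and anything outside the uint32 range.
        if isinstance(self.connection_id, bool) or not isinstance(self.connection_id, int):
            raise ValueError(f"connection_id must be an integer, got {self.connection_id!r}")
        if not 0 <= self.connection_id <= UINT32_MAX:
            raise ValueError(f"connection_id out of uint32 range: {self.connection_id}")


class Collector(ABC):
    """Hypothetical role: connects to a data source to get/put metadata."""

    @abstractmethod
    def get_metadata(self, table: str) -> dict: ...

    @abstractmethod
    def put_metadata(self, table: str, metadata: dict) -> None: ...


class Connector(ABC):
    """Hypothetical role: connects to a data source to get/put data."""

    @abstractmethod
    def get_rows(self, table: str) -> list: ...

    @abstractmethod
    def put_rows(self, table: str, rows: list) -> None: ...


class InMemorySource(Collector, Connector):
    """Toy source that plays both roles, for illustration only."""

    def __init__(self) -> None:
        self._metadata = {}
        self._rows = {}

    def get_metadata(self, table: str) -> dict:
        return self._metadata.get(table, {})

    def put_metadata(self, table: str, metadata: dict) -> None:
        self._metadata[table] = dict(metadata)

    def get_rows(self, table: str) -> list:
        return list(self._rows.get(table, []))

    def put_rows(self, table: str, rows: list) -> None:
        self._rows.setdefault(table, []).extend(rows)


source = InMemorySource()
source.put_metadata("orders", {"columns": ["id", "amount"]})
source.put_rows("orders", [(1, 9.5), (2, 3.0)])
ref = PhysicalTableRef(connection_id=7)
print(ref, source.get_metadata("orders"), source.get_rows("orders"))
```

Keeping the two roles as separate interfaces lets one source implement either or both; which of them `connection_id` should point at is still open in this thread.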

jerryshao · 2023-05-16T02:40:20Z

@xunliu, would you please review this again when you have time? Thanks.

xunliu

LGTM.

Propose the metadata schema spec for Unified Catalog

57f29c0

jerryshao requested a review from xunliu May 11, 2023 08:36

jerryshao self-assigned this May 11, 2023

jerryshao requested a review from JunpingDu May 11, 2023 08:38

jerryshao closed this May 11, 2023

jerryshao reopened this May 11, 2023

jerryshao closed this May 11, 2023

jerryshao reopened this May 11, 2023

polish table type spec to add virtual table definition

90abcb3

xunliu reviewed May 12, 2023

xunliu reviewed May 15, 2023

xunliu previously approved these changes May 15, 2023

jerryshao added 2 commits May 16, 2023 10:23

continue the definition updating

2ce00f4

continue the definition updating

582ba24

jerryshao dismissed xunliu’s stale review via 582ba24 May 16, 2023 02:37

xunliu approved these changes May 16, 2023

jerryshao merged commit a7607d6 into apache:main May 16, 2023

jerryshao mentioned this pull request May 16, 2023

[SUBTASK] Metadata and Type Spec design #11

Closed

LanceHsun mentioned this pull request Jun 21, 2024

[Bug report] Chrome failed to start during MetalakePageTest on WSL2 #3927

Closed
