[#11] rfc: propose the metadata schema spec for Unified Catalog #14
2. We will also use Substrait to represent our logical plans (for example, views, functions, and others), so adopting Substrait's type system will reduce conversion work later on.
2. We choose JSON as our user-facing protocol because it is easy for users and systems to debug.
3. We choose the Protobuf binary layout to store the schema. The main considerations are:
   1. The binary layout is much more concise than HMS's schema layout.
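To make the conciseness point concrete, here is a minimal Python sketch of the Protobuf wire format: each field is written as a varint key `(field_number << 3) | wire_type` followed by its value. It encodes one column description and compares the size with compact JSON. The column message, its field numbers, and the helper names are hypothetical, not the RFC's actual schema; a real implementation would use generated Protobuf classes. For simplicity the sketch writes every field, whereas a proto3 encoder omits fields holding default values, and it compares against JSON text rather than HMS's layout.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a Protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # set MSB: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    # A field key is (field_number << 3) | wire_type, itself varint-encoded.
    return encode_varint((field_number << 3) | wire_type)


def encode_column(name: str, type_id: int, nullable: bool) -> bytes:
    # Hypothetical column message:
    #   field 1: name     (string, wire type 2 = length-delimited)
    #   field 2: type_id  (uint32, wire type 0 = varint)
    #   field 3: nullable (bool,   wire type 0 = varint)
    name_bytes = name.encode("utf-8")
    return (
        encode_key(1, 2) + encode_varint(len(name_bytes)) + name_bytes
        + encode_key(2, 0) + encode_varint(type_id)
        + encode_key(3, 0) + encode_varint(int(nullable))
    )


column = {"name": "user_id", "type_id": 5, "nullable": False}
binary = encode_column(**column)
text = json.dumps(column, separators=(",", ":")).encode("utf-8")
print(len(binary), len(text))  # prints "13 47"
```

Most of the saving comes from replacing repeated field names with one-byte keys. The trade-off, as raised in the review below, is that the stored bytes are not human-readable or searchable without decoding them first.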
Do we need to perform search or other operations on the schema? If so, binary storage would make these operations difficult to support.
Maybe we can refer to SnowflakeDB's use of the AVRO format for metadata storage?
Can you please give an example of how we would search the metadata? Also, AVRO is a binary format too, IIUC.
You could check the details of Snowflake's metadata design: it also notes that using AVRO makes the metadata harder to query than a SQL database, so Snowflake built a series of CLI tools for users to maintain the metadata.
rfc/rfc-1/rfc-1.md
| Field Name | Field Type | Description | Optional |
| ------------------- | ---------- | ------------------------------------------------------------ | -------- |
| connection_id (TBD) | uint32 | The unique ID of the connector used to get the physical table | Required |
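Since the PR says this schema model also backs the in-memory structures, here is a minimal Python sketch of how the `connection_id` row might map to an in-memory record. The class name `TableConnectionRef` is hypothetical, and the field itself is still marked TBD. Python integers are unbounded, so the sketch checks the uint32 range explicitly, and because the field is Required it has no default value.

```python
from dataclasses import dataclass

UINT32_MAX = 2**32 - 1


@dataclass(frozen=True)
class TableConnectionRef:
    """Hypothetical in-memory form of the `connection_id` field."""

    # uint32 in the spec and Required, so there is no default value.
    connection_id: int

    def __post_init__(self) -> None:
        # Python ints are unbounded; enforce the uint32 range by hand.
        if not 0 <= self.connection_id <= UINT32_MAX:
            raise ValueError(
                f"connection_id {self.connection_id} is outside the uint32 range"
            )


ref = TableConnectionRef(connection_id=42)
```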
I suggest adding a new term to differentiate the two types of data source connections:

- collector: connects to a data source to get/put metadata.
- connector: connects to a data source to get/put data.

What do you think?
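A minimal Python sketch of the proposed split, assuming hypothetical method names: `Collector` only moves metadata, `Connector` only moves data, and a single source class may still implement both roles while keeping the two concerns apart.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Collector(Protocol):
    """Connects to a data source to get/put metadata (hypothetical interface)."""

    def get_metadata(self, table: str) -> dict[str, Any]: ...

    def put_metadata(self, table: str, metadata: dict[str, Any]) -> None: ...


@runtime_checkable
class Connector(Protocol):
    """Connects to a data source to get/put data (hypothetical interface)."""

    def read_rows(self, table: str) -> list[dict[str, Any]]: ...

    def write_rows(self, table: str, rows: list[dict[str, Any]]) -> None: ...


class InMemorySource:
    """Toy data source that plays both roles, keeping metadata and rows apart."""

    def __init__(self) -> None:
        self._metadata: dict[str, dict[str, Any]] = {}
        self._rows: dict[str, list[dict[str, Any]]] = {}

    # Collector role: metadata only.
    def get_metadata(self, table: str) -> dict[str, Any]:
        return self._metadata[table]

    def put_metadata(self, table: str, metadata: dict[str, Any]) -> None:
        self._metadata[table] = metadata

    # Connector role: data only.
    def read_rows(self, table: str) -> list[dict[str, Any]]:
        return list(self._rows.get(table, []))

    def write_rows(self, table: str, rows: list[dict[str, Any]]) -> None:
        self._rows.setdefault(table, []).extend(rows)


source = InMemorySource()
source.put_metadata("users", {"columns": ["id", "name"]})
source.write_rows("users", [{"id": 1, "name": "a"}])
```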
This is just a placeholder; I will update the doc when you finish the connection-related design.
@xunliu, would you please review this again when you have time? Thanks.
LGTM.
What changes were proposed in this pull request?
This PR proposes the schema and type spec for Unified Catalog. The spec describes how metadata is organized in the system.
Why are the changes needed?
This PR defines the basic metadata schema model, which the system will use for its in-memory structures, on-wire protocol, and serialization protocol.
Does this PR introduce any user-facing change?
N/A
How was this patch tested?
N/A