Skip to content

Conversation

elefeint
Copy link
Contributor

@elefeint elefeint commented Nov 9, 2023

This is a follow up from #9603 (comment).

DBConfig::UserAgent() does not have the git SHA, so I kept DuckDB::SourceID() as the last token in the HTTP User-Agent header sent.

CLI Before:

GET /v0.9.1/linux_amd64/httpfs.duckdb_extension.gz HTTP/1.1
User-Agent: DuckDB 0.9.1 19862840b8 linux_amd64

CLI After:

GET /v0.9.1/linux_amd64/httpfs.duckdb_extension.gz HTTP/1.1
User-Agent: duckdb/0.9.1(linux_amd64) 19862840b8

Example from a Go app

GET /v0.9.1/linux_amd64/httpfs.duckdb_extension.gz HTTP/1.1
User-Agent: duckdb/0.9.1(linux_amd64) capi 19862840b8

@carlopi
Copy link
Contributor

carlopi commented Nov 10, 2023

Given we are at this, and this is not (yet) set in stone: could it make sense to have UserAgent be made backward compatible so that when used here the first few elements matches the old syntax?

Before:
User-Agent: DuckDB 0.9.1 19862840b8 linux_amd64

With this PR:
User-Agent: duckdb/0.9.1(linux_amd64) 19862840b8 [capi | ...]

CLI with this PR + changing (slightly) UserAgent():
User-Agent: DuckDB 0.9.1 19862840b8 linux_amd64 [capi | ...]

This would require changing UserAgent string, but I think it might be for the better to have a simpler space delimited string.

I would also consider having the last part of the user-agent be non-optional, with a default of "default" or something, this would also simplify parsing (say we add stuff after, it's not super cool to parse A B C [D] [E], if both D and E can be optional).

@elefeint
Copy link
Contributor Author

Hmm. It might make sense to deviate even farther from the original format of DuckDB <version> <sha> <platform>, and to fold the SHA as a detail under the first duckdb token:

User-Agent: duckdb/0.9.1(linux_amd64;19862840b8) [capi | ...]

Here is why:

  1. The original user-agent value described only core DuckDB. When core DuckDB versioning diverges from API versioning (with DuckDB core becoming stable and slower changing), it will be helpful to have a version/sha combination for each layer.
  2. The SHA is more useful for troubleshooting than analysis. When analyzing the data, answering the question of "what are all unique surfaces from which duckdb is used" would become very messy if SHA is a standalone first-layer attribute. With the product/version (detail) syntax, it's trivial to extract just the space separated products and discard the other information.

One more consideration is that listing products/layers in descending importance order and placing product detail in parentheses is what the user-agent spec asks for.

Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM

@Mytherin Mytherin changed the base branch from main to feature November 13, 2023 08:39
@Mytherin Mytherin changed the base branch from feature to main November 20, 2023 12:42
@carlopi carlopi self-requested a review December 1, 2023 18:16
@Mytherin
Copy link
Collaborator

Mytherin commented Dec 7, 2023

@carlopi can this be merged?

Copy link
Contributor

@carlopi carlopi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me!

@Mytherin Mytherin merged commit 6af3519 into duckdb:main Dec 8, 2023
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Dec 14, 2023
Merge pull request duckdb/duckdb#9632 from motherduckdb/user_agent_in_http_header
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants