Skip to content

Conversation

carlopi
Copy link
Contributor

@carlopi carlopi commented Jul 1, 2025

Without this PR, there are situations where httpfs extension defaults to own HTTPFSUtil, instead of using the global registered one. I think while this currently work, it's somewhat of a code smell, and basically prevent other extensions (or clients, such as duckdb-wasm) to have a clean path to override the default http_util to be used.

This is possibly also a problem connected to introduction of curl in the httpfs extension, since it's not possible for httpfs to offer two interfaces, and register one or the other depending on settings.

Solution is adding a layer of indirection, where the http_util shared_ptr is moved from DBConfig to an inner class HTTPUtilContainer. DBConfig keeps then a shared_ptr to HTTPUtilContainer, that keeps a shared_ptr to HTTPUtil currently in use. HTTPUtil keeps a weak_ptr back to the HTTPUtilContainer, since this makes so that having a HTTPUtil object in hand allows to walk back to the default HTTPUtil (and possibly setting it to a different value).

HTTPUtilContainer is also make compulsory for constructing a FileOpener, so that with a FileOpener it's always possible to get the currently in-use HTTPUtil class.

There are some layers of pointers, but this allows to remove a current limitation that is that HTTPFSUtil can be the only one to override HTTPUtil (and this needs to be 1 way only).

Main question is whether this is feasible.
Secondary questions are:

  • are there other objects that should be moved from DBConfig to HTTPUtilContainer?
  • how HTTPUtilContainer should be actually named? GlobalFileSystemProperties?

Main relevant change is the patch to HTTPFS, that goes from:

 	if (opener) {
		auto db = opener->TryGetDatabase();
		if (db) {
			auto &config = DBConfig::GetConfig(*db);
			return config.http_util;
		}
 	}
	return make_shared_ptr<HTTPFSUtil>();

to

 	if (opener) {
		return opener->http_util_container.get()->http_util;
        }
	// TODO: Actually provide a backup plan
	throw InternalException("FileOpener not provided, can't get HTTPUtil");

throw is there for testing, but haven't managed to trigger. I would think the opener is always present, but to limit damages if this is not correct would be to add a logging line + add back return make_shared_ptr<HTTPFSUtil>();, so that worst case same behavior is kept.

…Util can be reached from FileOpener

Without this PR, there are situations where `httpfs` extension defaults to own HTTPFSUtil, instead of using the global registered one.
I think while this currently work, it's somewhat of a code smell, and basically prevent other extensions (or clients, such as duckdb-wasm) to have
a clean path to override the default http_util to be used.

This is possibly also a problem connected to introduction of curl in the `httpfs` extension, since it's not possible for `httpfs` to offer two interfaces, and register one or the other
depending on settings.

Solution is adding a layer of indirection, where the `http_util` shared_ptr is moved from DBConfig to an inner class HTTPUtilContainer.
DBConfig keeps then a shared_ptr to HTTPUtilContainer, that keeps a shared_ptr to HTTPUtil currently in use.
HTTPUtil keeps a weak_ptr back to the HTTPUtilContainer, since this makes so that having a HTTPUtil object in hand allows to walk back to the default HTTPUtil (and possibly setting it to a different value).

HTTPUtilContainer is also make compulsory for constructing a FileOpener, so that with a FileOpener it's always possible to get the currently in-use HTTPUtil class.

There are some layers of pointers, but this allows to remove a current limitation that is that HTTPFSUtil can be the only one to override HTTPUtil (and this needs to be 1 way only).

Main question is whether this is feasible.
Secondary questions are:
* are there other objects that should be moved from DBConfig to HTTPUtilContainer?
* how HTTPUtilContainer should be actually named? GlobalFileSystemProperties?
@carlopi
Copy link
Contributor Author

carlopi commented Jul 1, 2025

Discussed on a side with @Mytherin, another approach would be adding a less invasive:

virtual shared_ptr<HTTPUtil> &GetHTTPUtil();

to FileOpener class.

I will have a look at that and then update.

@carlopi carlopi marked this pull request as ready for review July 1, 2025 08:54
@carlopi carlopi marked this pull request as draft July 1, 2025 08:54
carlopi added a commit to carlopi/duckdb that referenced this pull request Jul 1, 2025
Without this PR, there are situations where `httpfs` extension defaults to own HTTPFSUtil, instead of using the global registered one. I think while this currently work, it's somewhat of a code smell, and basically prevent other extensions (or clients, such as duckdb-wasm) to have a clean path to override the default http_util to be used.

This is possibly also a problem connected to introduction of curl in the `httpfs` extension, since it's not possible for `httpfs` to offer two interfaces, and register one or the other depending on settings.

This commit forces FileOpener to have a way to return a reference to the original shared_ptr<HTTPUtil> (possibly caching on costructor / other ways).
Then the main improvement is the chagnge (currently via a patch) that allows in duckdb-httpfs.

This is an improved / slimmer version of duckdb#18103, thanks to feedback from @Mytherin.
carlopi added a commit to carlopi/duckdb that referenced this pull request Jul 1, 2025
Without this PR, there are situations where `httpfs` extension defaults to own HTTPFSUtil, instead of using the global registered one. I think while this currently work, it's somewhat of a code smell, and basically prevent other extensions (or clients, such as duckdb-wasm) to have a clean path to override the default http_util to be used.

This is possibly also a problem connected to introduction of curl in the `httpfs` extension, since it's not possible for `httpfs` to offer two interfaces, and register one or the other depending on settings.

This commit forces FileOpener to have a way to return a reference to the original shared_ptr<HTTPUtil> (possibly caching on costructor / other ways).
Then the main improvement is the chagnge (currently via a patch) that allows in duckdb-httpfs.

This is an improved / slimmer version of duckdb#18103, thanks to feedback from @Mytherin.
Mytherin added a commit that referenced this pull request Jul 1, 2025
Without this PR, there are situations where `httpfs` extension defaults
to own HTTPFSUtil, instead of using the global registered one. I think
while this currently work, it's somewhat of a code smell, and basically
prevent other extensions (or clients, such as duckdb-wasm) to have a
clean path to override the default http_util to be used.

This is possibly also a problem connected to introduction of curl in the
`httpfs` extension, since it's not possible for `httpfs` to offer two
interfaces, and register one or the other depending on settings.

This commit forces FileOpener to have a way to return a reference to the
original shared_ptr<HTTPUtil> (possibly caching on costructor / other
ways). Then the main improvement is the chagnge (currently via a patch)
that allows in duckdb-httpfs.

This is a less clunky / slimmer version of
#18103, thanks to feedback from
@Mytherin.
@carlopi carlopi closed this Jul 1, 2025
@carlopi
Copy link
Contributor Author

carlopi commented Jul 1, 2025

Superseeded by #18107

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant