-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Experimentation around support for HTTP/1.1, HTTP/2 and HTTP/3 outside of http.client #3030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Firstly, thank you for the work you've done here. It's exciting to see the next steps towards HTTP/2 and HTTP/3 support in urllib3 that we've made possible with the v2.0 work! 🚀 I have some high level thoughts on direction:
Maybe this is something you and @mila can collaborate on to reach the right home for whatever changes are necessary? Lastly, I think it'd be useful to break this work down into individual issues so everything can be tracked and worked on by others concurrently. For me, the most important part of this work is being able to reuse our high-level test suite to test HTTP/1.1, 2 and 3 with the same level of assuredness. Maybe this can be the first task and we can write up more? Perhaps this is the right feature to write up in #3000 :) |
@Ousret I've changed #3000 to be a high-level tracking task for HTTP/2 support specifically. I think we should focus on supporting HTTP/2 and then try to tackle HTTP/3 since most of the fundamental changes will have to happen for both protocols and HTTP/2 gets us a long ways already. Take a look at the list I've written there already and let me know what you think. |
Hello @sethmlarson Yes, seeing it actually work is pretty compelling. So, let's talk about your initial thoughts.
hface requires all of that and more, so it is much worse than the foremost. starting this PoC, I already knew the dependency footprint would matter (a lot). Everything depends upon what you expect to release. Our minds did not see the same things, Why unify? To ease the protocol upgrade for example. -- Not having too much code when alternatives arrive and/or newer protocol surfaces (which will undoubtedly happen). Also, that is why I am considering not enforcing aioquic as a dependency due to the footprint and potential issue with crypto and built whl at setup. what I drafted graph TD;
HttpLib-->BaseHttpConn;
HfaceExt-->BaseHttpConn;
BaseHttpConn-->BaseHttpsConn;
What I think I understood from your comment? graph TD;
HttpLib-->BaseHttpConn;
BaseHttpConn-->BaseHttpsConn;
h11-->BaseHttpConn;
BaseHttpConn-->BaseHttpsConn;
h2-->BaseHttpsConn;
h3-->BaseHttpsConn;
There is a need to sit down and carefully study impact on the complexity and sustainability.
I wanted to ease the migration for the end user. Having to tell them to patch their code brings friction. Of course, if you want this in contrib, I am okay with this.
Honestly don't know. I would not do threads in urllib3, given the risks.
Most likely won't happen, I removed as many things as possible to keep it lightweight. including dependencies, we don't need.
OK.
I agree.
Thanks, I will look into it at my early. |
Just to be clear of expectations, I don't think this PR is going to be reviewable as-is (so if it's an experiment that's okay!). We've tried giant PRs like this before and they always peter out, I think a better approach is to work incrementally starting with the most critical part first. Following that thinking, getting the test suite working with HTTP/2 first before starting on HTTP/2 client support at all (even using curl w/ HTTP/2 support to verify that our test suite is able to verify requests!) seems like the best option forward.
I think I also would like a unified HTTPSConnection class that supports both HTTP/2 and HTTP/1.1, but I think that the HTTP/1.1 support could use http.client instead of h11. Most servers these days support HTTP/2 so the investment into HTTP/1.1 should be minimal, let's focus on the value add of HTTP/2
I think having this logic all in contrib is the way to go. Everything should either be in contrib or in h2.
This is fine, let's start without threads. This is all very exciting, let me know if you need assistance. Also I think we should create a bunch of separate issues and bounties for this work in order to reward you and also encourage involvement from other contributors. |
I've created this issue for our test suite: #3038 |
It's great to see that hface motivated somebody to look at h2 and h3 support in urllib3. It would be great if I were able to made hface more stable, but priorities at Akamai changed, so I'm not working on that now. I agree with @sethmlarson that small steps one at a time would be better. If I thought that hface can be integrated to urllib3 in a week or two of full-time work, I would try to do that. But I am afraid that this i would be much more work. |
Yes, I never expected this to be merged, ever. This is a proof of concept that is public knowledge.
..and so on...
Just an opinion: I highly discourage this. It would produce a hazardous code base, Sans-IO mix with IO bound, that is most likely more difficult to implement and test. Using the abstraction behind hface makes a harmonized code base for all three protocols.
OK by me. Hello @mila Indeed, I found hface in a non-working state, which is why I forked the project. I am working on a highly slimmed version that solely focuses on clients. This (forked) module may or may not find itself in urllib3.contrib. Regards, |
update: so, now, I am near the end of having solid proof of concept. I will be able to publish a full report soon and potentially archive this so that we can move on to the next steps. on requests
Of the 4 failures, three are subject to discussion, one is caused by this incomplete PoC.
That test fail because
Headers are decoded using
The initial premise push an incomplete headers frame and the state machine behind does not produce an on urllib3 Those tests are no longer valid (identified so far)
The sub-dependencies cannot proceed without a valid HTTP version/svn.
The sub-dependencies cannot proceed without a valid |
🎉now, requests downstream tests pass with this experiment. The test |
Some details that must be studied for impacts:
so h2 and h3 no longer stream in plain, encapsulation, and complex format makes it impossible to expect sock.read(__amt) to output len(data/body) == __amt.
so, the order is preserved, but the case (original) won't be guaranteed in response.
Coverage |
Awesome progress! 🚀
Hmm, I would like to dig into the feasibility of this more. If we're only supporting HTTP/2 when it occurs from ALPN there's less mixing of the two protocol implementations, am I missing something?
Could you open a separate issue with this information (and separately any other behaviors you want to discuss whether they are bugs/not) so they can be triaged?
We created a memory stream buffer to achieve this property for decompression, I think we can use a similar mechanism for HTTP/2? I would like to preserve this feature of
I'm curious about this one, is this due to the above constraint or something else? We shouldn't need access to the socket directly, only calls to
Yeah order is the only thing that matters here, HTTP/2 forces lowercase so this makes sense.
No worries about tunneling over HTTP/2 or HTTP/3, I think we should keep the proxy code simple and focus on HTTP/2 for origins first so tunnel CONNECT can use HTTP/1.x as is done today. |
Yes, there is some point missed. Let me propose some heavy pros:
Already done. Will do it anytime I think this can lead to constructive exchanges.
Yes! Saw that, actually, yes this is a non-issue. Just a note.
This is also a non-issue, the original author did know in advance that this would die eventually. And in case direct access isn't available, fallback on
So we are on the same page, |
update This PR is now a valid proof of concept. TestsAll tests pass (target: h2n3)
I only had to disable 10!
CoverageThe coverage reached 100% again. While it does not guarantee stability, it is a firm good starting point. Fake serverWith #3038 uncertain outcoming, I decided to take another road, for this experiment. So, the server is completely written in Golang, it could permit us to detect anomalies that would not be found if we constrained ourselves in the same dependencies in and out as stated by @pquentin rightfully. I pushed a modest number of tests to verify basic functionalities and reach an acceptable coverage. nb. There is two Traefik instance, it is done on purpose. Only one of them (non-alt) supports HTTP/3 over QUIC. I could write a dedicated Github Action if that is desirable. Python 3.12Failures are unrelated. I despatched a specific issue to track progress on it. GHATwo things: concurrency:
group: ci-${{ github.ref_name }}
cancel-in-progress: true It's better to enable this, so we can pollute less. We let CIs run when there's no reason to. continue-on-error: ${{ matrix.experimental }} Does not work as intended. Tried to enable debug run on the fork repo without success. So what's next?The last thing to produce would be :
|
update PerformanceA brief glimpse at early performance from that experiment without any fine-tuning. 10 thousand requests to the Golang server Traefik + GoHttpbin. The above scenario ran multiple times of course. in average
Python 3.11, Docker Alpine Note that the Golang server always responds in less than 1 ms. No timeout, bad response, or request, and no crash is to be reported. Memory usage
nb. here hface ships with all three protocols' support, and does not import lazily. For more details, look at the tree posted in the first post of this thread. Python 3.11, Docker Alpine OpenSSL, aioquicSo, OpenSSL does not ship with what is required for QUIC. That's what it is. My rough estimation on when Python may ship with QUIC support TLS natively is one and a half years to two full years starting now.
To be fair, aioquic does a pretty good job at offering support for QUIC and H3. They can drop external SSL once its support is available, thus making the dependencies optional for the latest interpreter version. What can be done today at aioquic to better the experience:
FinallyI expected those numbers, it was easily foreseeable. I don't think that I will do the massive real-world test for a while. I purposely set a weird version in the fork, in case...you'd get unsolicited support. |
update statistics from requests, fetching a simple resource from Cloudflare static cdn with TLS 1.3. Using sufficient samples for stabilization without getting banned from the server. By a burst of 256 requests, repeated multiples times. here, Handshake Included always means the initial TCP/TLS connection. (for h3 the two firsts -- h2/h11,h3)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub insists that I review this given the changes in CI, let's see if a comment is enough to make it happy.
Now basically covered everything. wrote tests that prove it using dummy proxy (local thread) + traefik + httpbin containers. |
Final notes on my experimentI consider that my experiment in #3030 reached its final form. The main objectives of this experiment were to identify as broadly as possible concerns, issues, or design complications in an attempt
All of that while remaining in a synchronous context. 👋 Farewell http.client?Yes..!
1️⃣ Unify all protocolsIntroductionWe just cannot integrate directly the three underlying libraries as-is. Akamai, through @mila, prepended a hackable interface client and server that follow the Sans-IO guidelines. Unfortunately, the project is unmaintained, not working, and contains too many things that we do not need. Fork and DivergenceYou have properly guessed it alright from the introduction. I forked the project located at https://github.com/akamai/hface to The complete changeset is available at https://github.com/Ousret/urllib3-ext-hface/blob/main/CHANGELOG.rst
In a nutshellSans-IO Common Interface
from urllib3_ext_hface import HTTPProtocolFactory, HTTP1Protocol, HTTP2Protocol, HTTP3Protocol
protocol = HTTPProtocolFactory.new(HTTP2Protocol)
# ...
protocol.bytes_to_send() # Out to your network I/O implementation
# ...
protocol.bytes_received() # Whatever comes from your network I/O implementation
# ...
protocol.next_event() # Output the next event in order that the state-machine generated In my own implementation of that work, you may find three major concepts applied, find them at:
shows how to handle the IN/OUT in a synchronous context shows how to initialize protocol instance EventsEach underlying libraries have its own set of yield events, so the module has to propose another common set (e.g. translated internally)
In those, Port of the moduleMaintainers expressed an interest in having this module ported in Why keeping a separate package is a good idea for now?
*forcing that change onto them: even by enabling HTTP/2 through the optional So if the separate package is still not an option:
Downside to this is how limited this would impact people, as requests won't do anything as per the indefinite feature freeze.
Downside to this also limits its range, not many people are comfortable with it. 🏗️ ConceptionLet's see together the possible outcome of integrating said module. I do not expose the possibility of hybrid http.client with h2 and h3 as discussed in (why drop http.client?).
This is by far the implementation I most dislike because it causes havoc and complicates the code paths.
We should not care about it (the anti-pattern) as QUIC will in the foreseeable future be mainstream in Python.
That is going to take so much time and is going to be more uncertain than the other two. Then I created and propagated two arguments:
A set of
It's a Finally For security considerations I purposely disabled hostname swapping when interpreting the Alt-Svc. ✅ TestsToday urllib3 ships with more than 1.7k test cases, a non-negligible part concerning the HTTP 1.1 protocol at the socket level. So a discussion was started at #3038 My conclusion is as follows:
Why did you ask? It is impossible to guarantee backward compatibility with the previous hard-coded plain socket tests. Why you asked--again? I increasingly felt like I am writing whole over the test suite from
We should keep the existing test suite to ensure no extravagant breakage is inserted in flight,
I drafted an interesting test scenario (so I think) that set up a Traefik instance linked to an httpbin (all written in Golang) that permitted me to find many interesting cases. Which would not have been the case otherwise.
👀 ObservationsWe assess impacts based on the test suite. ⚡ Breaking changesThe most notable item in the menu is that underlying libraries (meaning h11, h2, and aioquic) are stricter than what the general mass is used to.
So as I stated in #3045 urllib3 must either document that you will have to catch more often protocol error or be stricter itself. As discussed with the maintainer, the header name case does not matter only the order does. 📝 NotableIn this section, we can invoke that:
⏩ PerformanceAs you can see on preliminary results in #3030 all three implementations perform nearly as fast as http.client. ⛓️ Dependency chainWhen this adventure started, we had the following chain: graph TD;
urllib3-->urllib3-ext-hface;
urllib3-ext-hface-->h11;
urllib3-ext-hface-->h2;
urllib3-ext-hface-->aioquic;
aioquic-->pylsqpack;
aioquic-->pyopenssl;
pyopenssl-->cryptography;
aioquic-->certifi;
h2-->hyperframe;
h2-->hpack;
In that chain,
Fortunately!
The original author seems to take a break, let's wait for its return (or not). I made a fork that complies with expectations. To finally render: graph TD;
urllib3-->urllib3-ext-hface;
urllib3-ext-hface-->h11;
urllib3-ext-hface-->h2;
urllib3-ext-hface-->qh3;
qh3-->cryptography;
h2-->hyperframe;
h2-->hpack;
🎉 More reasonable isn't it? There is more. I patched the edit: made the whl from 1 MiB+ to ~200KiB and dropped the need for statically linked OpenSSL developments res. 🗓️ aioquic futureSo, OpenSSL does not ship with what is required for QUIC. That's what it is. My rough estimation on when Python may ship with QUIC support TLS natively is one and a half years to two full years starting now.
To be fair, aioquic does a pretty good job at offering support for QUIC and H3. They can drop external SSL and QUIC once its support is available, thus making the dependencies optional (cryptography) for the latest interpreter version. For now, using my proposed changes, it is acceptable as-is. ✋ ConclusionSo this reach the end of my journey assessing, and experimenting toward said goals. I am willing to help and collaborate with the rest. Get in touch directly. The fork works correctly for the usage I do personally and at work. I can maintain the fork pending official support within urllib3. Good luck to anyone tackling the rest, partially or in full. ➕ Bonus noteI drafted a solution that takes more advantage of the multiplexed aspect of HTTP/2 and 3. But that requires that to pass first. With streams, we can send as many requests as we want (provided we do not run out of stream ids) and then wait for responses, After getting a HandshakeCompleted event, we set the socket to non-blocking (sendall to send). What I did, is make the HttpLibProxyResponse completely lazy. Meaning it does not load anything from the response until accessed (even headers, and status code). The function/method inside Immediately I get a robust performance increase. But.. tests (in a small portion started to fail consistently) It can be fixed, but unfortunately too tired to fix them. So this is more of an idea shared. Async does not have to be the answer. That's all, folks! |
Detach ourselves from the http.client package. This requires a separate package to be installed, namely `urllib3-ext-hface` for the Sans-IO interfaces.
Thank you for the thorough analysis @Ousret, we'll have to spend some time picking apart your comment and creating individual issues that we can assign bounties to and others can start working on. |
Closed due to jawah#46 (comment) |
Disclaimer:
Introduction
This PR experiment the possibility for urllib3 to allow detaching ourselves from the
http.client
package.The following add support to the final specification (RFCs) of HTTP/1.1, HTTP/2, and HTTP/3 over QUIC.
Are you curious about the results? Install that experimental patch by doing the following:
The how
I felt it is time to try moving away from the
httplib
while remaining the default.I know that
async
has a certain appeal and usefulness. But let's face it, sync libs are here to stay (numbers speak for themselves github depends, stats google dataset, etc..), so we might as wellimplementing the newest protocol in the sync enclosure.
I saw the attraction around the
hface
library posted on the Akamai GH organization and saw an opportunity.Immediately I decided to fork the project because:
So the project name is
urllib3-ext-hface
and is located at https://github.com/Ousret/urllib3-ext-hface (for now)Backward compatibility
Initially seeing the current implementation that relies on
http.client
was scary. I decided to make the alternative implementation follow as closely as possible the HttpConnection behavior so that the rest of the library would remain stable.Protocol upgrade
For the time being, this implementation act as follow:
h2
, otherwise HTTP/1.1And for HTTP/3?
Remaining tasks
raising NotImplementedErrorMigrate ciphers TLS to DTLS if supportedMaking aioquic optional?no need in that experimentGlimpses
from urllib3