Skip to content

Conversation

mrdaybird
Copy link
Contributor

@mrdaybird mrdaybird commented Sep 8, 2024

Fixes #3300, Fixes #3294 Fixes #3301

Firstly, this PR is not complete, still I have put it up to get the discussion started.

This PR tries to achieve the following:

  1. Allow multiple streams per HTTP2Connection. (solves Allow HTTP/2 connection and socket to be re-used for future requests #3301)
  2. Add support for read and stream. Add support for streaming HTTP/2 responses #3300
  3. Decoding the response data. Move response body handling into HTTP2Response #3294

I have merged these things together into because they are kind of interdependent and gives an idea about the whole picture, like how the API will look like.

Changes proposed by this PR ⇒

  1. Adds one new object called HTTP2Stream, which manages the state of a stream. (similar to idea begin by @pquentin in Move response body handling into HTTP2Response #3294)
  2. Since a HTTP2Connection is now associated with multiple streams, the methods putrequest, endheaders, send and getresponse only have meaning in context of a stream. So, I suggest they should not be part of public API. Thus,
    1. putrequest is changed to _putrequest + a new stream_id parameter
    2. endheaders is changed to _endheaders + a new stream_id parameter
    3. send is changed to _send_stream + a new stream_id parameter
    4. getresponse is removed, and moved to HTTP2Stream
    5. a new _read_stream that reads from socket and handles h2 events

To give you a little sneak peek, here is a working example of sending two request using single HTTP2Connection(HTTP/2 is awesome!) ⇒

from urllib3.http2 import inject_into_urllib3
from urllib3.http2.connection import HTTP2Connection

inject_into_urllib3()

conn = HTTP2Connection('www.nghttp2.org', 443)
conn.connect()

stream1 = conn.request("GET", "/httpbin/json")
stream2 = conn.request("GET", "/httpbin/gzip")

res1 = stream1.getresponse()
res2 = stream2.getresponse()

print(res1.data)
print(res2.data) # it also decompresses the gzip!

Important thing to note is that 1. request returns a HTTP2Stream and 2. getresponse needs to be called on the HTTP2Stream to get a HTTP2Response

TODO:

  • Support multiple streams
  • Addread method
  • Add stream method
  • Add tests for all of the above
  • Fix old tests

@pquentin @sethmlarson

Since there are several substantial changes, therefore before moving on with this PR, I wanted to know your opinion of it!
If you want my explanation for any design/code decisions, please ask!

@pquentin
Copy link
Member

pquentin commented Sep 9, 2024

Thank you for starting this! I haven't looked at the code yet, only at your original description.

I have merged these things together into because they are kind of interdependent and gives an idea about the whole picture, like how the API will look like.

We should try to work on as small units of work as possible. If the existing issues don't work well, we can create new ones with a different split. That said, yes, we need to agree on the API itself.

Adds one new object called HTTP2Stream, which manages the state of a stream. (similar to idea begin by @pquentin in #3294)

Before we move forward with this, I'd like to have feedback from @sethmlarson on this HTTP2Stream class.

Important thing to note is that 1. request returns a HTTP2Stream and 2. getresponse needs to be called on the HTTP2Stream to get a HTTP2Response

We need to keep the existing APIs. In other words, we need request() to return an HTTP2Response. When created, this response class can take ownership of an existing HTTP2Stream.

@mrdaybird
Copy link
Contributor Author

mrdaybird commented Sep 9, 2024

Thanks for the reply! @pquentin

We need to keep the existing APIs. In other words, we need request() to return an HTTP2Response. When created, this response class can take ownership of an existing HTTP2Stream.

The existing request method in HTTPSConnection does not return anything, but the getresponse method returns the HTTP2Response. This doesn't quiet work for HTTP/2 because HTTP/2 can fire up multiple requests but defer the receiving the responses for later. If request does not return anything, we have no way of associating a request with its response!

We should try to work on as small units of work as possible. If the existing issues don't work well, we can create new ones with a different split.

You are right, I should split up the work. I merged them because it was easier to test things out after everything was in-place. Also, I suggest the following changes to the existing issues:

  1. Allow HTTP/2 connection and socket to be re-used for future requests #3301 can be changed to 'Add support for HTTP/2 multiplexing using streams'. The original issue suggested that the connection be reused for multiple request, in HTTP/2 this can be achieved using streams.
  2. Add support for streaming HTTP/2 responses #3300 suggests adding support for read and stream methods, but I think when implementing multiplexing (point 1), we have to implement read, thus most of the work for this issue can be done in the previous issue. The only thing that is left is stream method, which is a wrapper over read method.
  3. Move response body handling into HTTP2Response #3294 suggests that the data processing, which I think means receiving from socket + decoding, be moved to HTTP2Response. But the events received over socket will contain event for multiple streams, thus the reading socket + h2 event loop should happen within the connection. Thus, the only part left is decoding the data.

This also means that issue in 1 is more work and of higher priority than issue in 2 and 3, mainly because a lot of stuff for 2 and 3 gets done in 1. Maybe the bounties should be readjusted as well!

@sethmlarson
Copy link
Member

Being brief: I was imagining a one-to-one mapping of HTTP2Connection to stream ID and then relying on a global registry of sockets/HTTPStream-like objects so that we can continue using the existing HTTPConnection.getresponse() -> HTTPResponse API. Let me know if you need further clarification about my design ideas up to this point.

@hold7door
Copy link

hold7door commented Sep 13, 2024

Being brief: I was imagining a one-to-one mapping of HTTP2Connection to stream ID and then relying on a global registry of sockets/HTTPStream-like objects so that we can continue using the existing HTTPConnection.getresponse() -> HTTPResponse API. Let me know if you need further clarification about my design ideas up to this point.

So, in this approach will there will be one request active at a time in a connection. Since, each stream handles one request and its associated response.

@sethmlarson
Copy link
Member

@hold7door Correct! That's already the case for HTTP/1.1 since we don't support pipe-lining.

@mrdaybird
Copy link
Contributor Author

@sethmlarson I was wondering about how the multiplexing for HTTP/2 will look like in your design? Is it something like, one TCP connection/Socket is associated with multiple HTTP2Connection object. Also, where will the socket data(h2 events) be handled?

@sethmlarson
Copy link
Member

sethmlarson commented Sep 13, 2024

@mrdaybird Multiplexing is handled by the HTTPSConnectionPool, similar to how you need to use this interface for multiplexing in HTTP/1.1.

Also, where will the socket data(h2 events) be handled?

Socket data and queues of events would be handled in the global HTTP2Stream/socket registry. HTTP2Connections only handle events and exceptions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants