Skip to content

Conversation

jeraki
Copy link
Contributor

@jeraki jeraki commented Nov 30, 2022

Fixes #730.

I took my best guess at how you'd want this change implemented and tested, but I'm happy to revise it based on feedback. Thanks!

@tarcieri
Copy link
Member

tarcieri commented Nov 30, 2022

A few thoughts on this:

  1. It'd be good to find the spec for these sorts of bracketed addresses in URIs, so we know we're implementing them correctly. I've never seen them before and I'm not sure what the behavior is supposed to be (e.g. are they IPv6 only or do they also work with IPv4? Do DNS names work?)
  2. The regex should match what the spec says about what's allowed inside the brackets, i.e. the inner capture group should be for e.g. an IPv6 address (or whatever else is allowed), and not just "anything"
  3. It'd be good to do this handling at the time the URI is processed (i.e. as early as possible) so we don't have bracketed host strings flying around that need to be stripped in various places. Perhaps inside of the constructor for lib/http/uri.rb?

@tarcieri
Copy link
Member

tarcieri commented Nov 30, 2022

Apparently the bracketed hosts are defined in RFC3896 § 3.2.2:

A host identified by an Internet Protocol literal address, version 6
[RFC3513] or later, is distinguished by enclosing the IP literal
within square brackets ("[" and "]"). This is the only place where
square bracket characters are allowed in the URI syntax.

So only IPv6 addresses should be allowed.

Here is a regex for IPv6 addresses:

https://gist.github.com/syzdek/6086792

(
([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|          # 1:2:3:4:5:6:7:8
([0-9a-fA-F]{1,4}:){1,7}:|                         # 1::                              1:2:3:4:5:6:7::
([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|         # 1::8             1:2:3:4:5:6::8  1:2:3:4:5:6::8
([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|  # 1::7:8           1:2:3:4:5::7:8  1:2:3:4:5::8
([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|  # 1::6:7:8         1:2:3:4::6:7:8  1:2:3:4::8
([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|  # 1::5:6:7:8       1:2:3::5:6:7:8  1:2:3::8
([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|  # 1::4:5:6:7:8     1:2::4:5:6:7:8  1:2::8
[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|       # 1::3:4:5:6:7:8   1::3:4:5:6:7:8  1::8  
:((:[0-9a-fA-F]{1,4}){1,7}|:)|                     # ::2:3:4:5:6:7:8  ::2:3:4:5:6:7:8 ::8       ::     
fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|     # fe80::7:8%eth0   fe80::7:8%1     (link-local IPv6 addresses with zone index)
::(ffff(:0{1,4}){0,1}:){0,1}
((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}
(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|          # ::255.255.255.255   ::ffff:255.255.255.255  ::ffff:0:255.255.255.255  (IPv4-mapped IPv6 addresses and IPv4-translated addresses)
([0-9a-fA-F]{1,4}:){1,4}:
((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}
(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])           # 2001:db8:3:4::192.0.2.33  64:ff9b::192.0.2.33 (IPv4-Embedded IPv6 Address)
)

Alternatively you could speculatively attempt to parse the captured inner address as an IPAddr and verify it's an IPv6 address.

@jeraki
Copy link
Contributor Author

jeraki commented Dec 1, 2022

Thank you for the feedback and the useful references. I like the idea of moving this into HTTP::URI rather than needing to remember to strip the brackets at the point of use.

An open question for me is whether the bracket-stripped form should be part of HTTP::URI itself or whether it should be part of the underlying Addressable::URI. Consider that Addressable::URI expects the brackets for an IPv6 address:

bracketed = Addressable::URI.parse("http://[2606:2800:220:1:248:1893:25c8:1946]/foo?bar=baz")
bracketed.host # => "[2606:2800:220:1:248:1893:25c8:1946]"
bracketed.port # => nil

unbracketed = Addressable::URI.parse("http://2606:2800:220:1:248:1893:25c8:1946/foo?bar=baz")
unbracketed.host # => "2606:2800:220:1:248:1893:25c8"
unbracketed.port # => 1946

Should we maintain the bracketed host in the underlying Addressable::URI and have the host method on HTTP::URI strip the brackets, or should we mutate the Addressable::URI after construction to remove the brackets? I'm not sure if mutating the Addressable::URI to something it didn't parse as would violate some intended invariant.

I notice that currently host-related methods are just delegated to the Addressable::URI with def_delegators :@uri, :host, :normalized_host, :host=, so presumably we'd define real methods for the two getters that do the stripping rather than delegating directly. Unclear if the setter should be changed in any way.

@tarcieri
Copy link
Member

tarcieri commented Dec 1, 2022

@jeraki I would suggest changing HTTP::URI to eagerly strip the brackets from IPv6 addresses and caching it as a @host variable which the accessor methods can be changed to use

@jeraki
Copy link
Contributor Author

jeraki commented Dec 1, 2022

Using IPAddr looks cleaner than using our own regex. A nice property of it is that if you call to_s on an IPv6 IPAddr it strips the brackets for you:

ip_bracketed = IPAddr.new("[2606:2800:220:1:248:1893:25c8:1946]")
ip_bracketed.to_s # => "2606:2800:220:1:248:1893:25c8:1946"

ip_unbracketed = IPAddr.new("2606:2800:220:1:248:1893:25c8:1946")
ip_unbracketed.to_s # => "2606:2800:220:1:248:1893:25c8:1946"

So the logic for host could be:

def host
  begin
    ip = IPAddr.new(uri.host)
    return ip.to_s if ip.ipv6?
  rescue IPAddr::Error
  end
  uri.host
end

Also,Addressable::URI#to_s is used in HTTP::URI's to_s, to_str, and inspect methods, which don't strip brackets. I'm not sure what these are used for in practice, but given they're public methods it'd be a breaking change to do bracket-stripping in these methods. Maybe they don't need to be modified to match the bracket-stripping behavior of a new host method because there isn't a use case for them to change (and it wouldn't be necessary to fix #730), but it's unsatisfying to have inconsistency in how the host is represented between these methods.

Come to think of it, host is also already a public method via delegation, so changing host to strip brackets is also a breaking change. Should the bracket-stripping version be exposed as a new method with a different name?

@tarcieri
Copy link
Member

tarcieri commented Dec 1, 2022

I think it's fine to change the behavior. The current behavior is broken per #730, so changing it is a bugfix.

As to how to approach the setter and whether it should mutate the underlying Addressable::URI is a tricky question. I guess the setter should add brackets to the host being set on the underlying Addressable::URI if host= receives a valid IPv6 address? That is to say, it needs to update both @host and the inner URI, and the inner URI needs to be serialized with brackets.

@jeraki
Copy link
Contributor Author

jeraki commented Dec 2, 2022

Okay, I pushed up a new commit that does the stripping via HTTP::URI as we discussed. Let me know what you think.

RuboCop was reporting a violation because the top-level describe block in the spec has too many lines, but that seems like a useless metric in this case, so I added it to the exclusions along with all the other specs. It's probably worth just turning that rule off for the whole spec directory.

@jeraki
Copy link
Contributor Author

jeraki commented Dec 2, 2022

The only thing I'm still unsure about is whether to add logic to HTTP::URI#host= to make sure IPv6 addresses have brackets in the underlying Addressable::URI. In either case, the brackets will be stripped when accessing calling HTTP::URI#host or HTTP::URI#normalized_host, I'm just a little wary about unexpected effects from changing the internal representation. Addressable::URI.parse will only recognize an IPv6 address correctly if it's bracketed, and calling Addressable::URI#host on a value constructed this way will show the brackets in the string representation. But if you mutate it after the fact with Addressable::URI#host=, it's perfectly happy to accept an IPv6 without brackets, and subsequent calls to Addressable::URI#host will show it without the brackets, but that form couldn't have been instantiated correctly by the parse constructor.

@tarcieri
Copy link
Member

tarcieri commented Dec 3, 2022

@jeraki I mentioned that particular issue earlier.

I suggested the setters need to use the bracketed form of IPv6 addresses with Addressable::URI so the URLs serialize correctly, while internally caching the host with the brackets stripped.

That's another thing I think would be easier with eagerly updating instance variables, rather than trying to do it lazily.

I think if you have one private helper method, called from both the constructor and the setter, which extracts the instance variables out of Addressable::URI, you can have fairly clean logic for this, and also DRY out the logic for checking if an address is an IPv6 address.

…ying Addressable::URI when calling URI#host=.
@jeraki
Copy link
Contributor Author

jeraki commented Dec 16, 2022

Sorry for the delay. I pushed a commit to process host and normalized_host eagerly and to ensure the underlying Addressable::URI has brackets when host= is used. Let me know if you are okay with my implementation or if there's something you'd like changed. Thank you!

@tarcieri
Copy link
Member

Looks good now, thanks!

@tarcieri tarcieri merged commit 324f780 into httprb:main Dec 16, 2022
@jeraki jeraki deleted the strip-ipv6-brackets branch December 16, 2022 21:16
@jeraki
Copy link
Contributor Author

jeraki commented Dec 16, 2022

Thanks for your guidance! When you get a chance, could you publish a new gem that includes this change?

@tarcieri
Copy link
Member

Sure

@tarcieri tarcieri mentioned this pull request Dec 17, 2022
@cben
Copy link

cben commented Dec 22, 2022

Conceptually, it's weird having brackets in host. They are not part of an IPv6 address by itself; the brackets are merely part of the URL syntax for IPv6 (needed to distinguish the colons inside the address from :port number).

However, the Addressable gem makes the distinction that #hostname is the address by itself vs. #host is the bracketed-if-IPv6 part of the #authority portion of the URL.
With that context, this is consistent 👍.
(But if you can switch to dealing with hostname (?) it might simplify things)

@tarcieri
Copy link
Member

tarcieri commented Dec 26, 2022

@cben that sounds like a great point. Could one of you open a PR to switch to revert this and switch to #hostname?

Can't believe there are 3 different methods to get the hostname

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Aug 11, 2025
Changelog:
## [5.3.1] - 2025-06-09

### Changed

- Revert switch to the native llhttp on MRI, as it's not compatible with
  standalone bundles
  ([#802](httprb/http#802))


## [5.3.0] - 2025-06-09

### Added

- (backported) Add .retriable feature to Http
- (backported) Add more specific ConnectionError classes
- (backported) New feature: RaiseError

### Changed

- (backported) Drop depenency on base64
- (backported) Cache header normalization to reduce object allocation
- (backported) Use native llhttp on MRI


## [5.2.0] - 2024-02-05

### Added

- Add `Connection#finished_request?`
  ([#743](httprb/http#743))
- Add `Instrumentation#on_error`
  ([#746](httprb/http#746))
- Add `base64` dependency (suppresses warnings on Ruby 3.0)
  ([#759](httprb/http#759))
- Add `PURGE` HTTP verb
  ([#757](httprb/http#757))
- Add Ruby-3.3 support

### Changed

- **BREAKING** Process features in reverse order
  ([#766](httprb/http#766))
- **BREAKING** Downcase Content-Type charset name
  ([#753](httprb/http#753))
- **BREAKING** Make URI normalization more conservative
  ([#758](httprb/http#758))

### Fixed

- Close sockets on initialize failure
  ([#762](httprb/http#762))
- Prevent CRLF injection due to broken URL normalizer
  ([#765](httprb/http#765))

[unreleased]: httprb/http@v5.3.0...5-x-stable
[5.3.0]: httprb/http@v5.2.0...v5.3.0
[5.2.0]: httprb/http@v5.1.1...v5.2.0

## 5.1.1 (2022-12-17)

* [#731](httprb/http#731)
  Strip brackets from IPv6 addresses in `HTTP::URI`.
  ([@jeraki])

* [#722](httprb/http#722)
  Add `on_redirect` callback.
  ([@benubois])

## 5.1.0 (2022-06-17)

* Drop ruby-2.5 support.

* [#715](httprb/http#715)
  Set default encoding to UTF-8 for `application/json`.
  ([@drwl])

* [#712](httprb/http#712)
  Recognize cookies set by redirect.
  ([@tkellogg])

* [#707](httprb/http#707)
  Distinguish connection timeouts.
  ([@YuLeven])

## 5.0.4 (2021-10-07)

* [#698](httprb/http#698)
  Fix `HTTP::Timeout::Global#connect_ssl`.
  ([@tarcieri])

## 5.0.3 (2021-10-06)

* [#695](httprb/http#695)
  Revert DNS resolving feature.
  ([@PhilCoggins])

* [#694](httprb/http#694)
  Fix cookies extraction.
  ([@flosacca])

## 5.0.2 (2021-09-10)

* [#686](httprb/http#686)
  Correctly reset the parser.
  ([@bryanp])

* [#684](httprb/http#684)
  Don't set Content-Length for GET, HEAD, DELETE, or CONNECT requests without a BODY.
  ([@jyn514])

* [#679](httprb/http#679)
  Use features on redirected requests.
  ([@nomis])

* [#678](httprb/http#678)
  Restore `HTTP::Response` `:uri` option for backwards compatibility.
  ([@schwern])

* [#676](httprb/http#676)
  Update addressable because of CVE-2021-32740.
  ([@matheussilvasantos])

* [#653](httprb/http#653)
  Avoid force encodings on frozen strings.
  ([@bvicenzo])

* [#638](httprb/http#638)
  DNS failover handling.
  ([@midnight-wonderer])


## 5.0.1 (2021-06-26)

* [#670](httprb/http#670)
  Revert `Response#parse` behavior introduced in [#540].
  ([@DannyBen])

* [#669](httprb/http#669)
  Prevent bodies from being resubmitted when following unsafe redirects.
  ([@odinhb])

* [#664](httprb/http#664)
  Bump llhttp-ffi to 0.3.0.
  ([@bryanp])


## 5.0.0 (2021-05-12)

* [#656](httprb/http#656)
  Handle connection timeouts in `Features`
  ([@semenyukdmitry])

* [#651](httprb/http#651)
  Replace `http-parser` with `llhttp`
  ([@bryanp])

* [#647](httprb/http#647)
  Add support for `MKCALENDAR` HTTP verb
  ([@meanphil])

* [#632](httprb/http#632)
  Respect the SSL context's `verify_hostname` value
  ([@colemannugent])

* [#625](httprb/http#625)
  Fix inflator with empty responses
  ([@LukaszMaslej])

* [#599](httprb/http#599)
  Allow passing `HTTP::FormData::{Multipart,UrlEncoded}` object directly.
  ([@ixti])

* [#593](httprb/http#593)
  [#592](httprb/http#592)
  Support informational (1XX) responses.
  ([@ixti])

* [#590](httprb/http#590)
  [#589](httprb/http#589)
  Fix response headers paring.
  ([@Bonias])

* [#587](httprb/http#587)
  [#585](httprb/http#585)
  Fix redirections when server responds with multiple Location headers.
  ([@ixti])

* [#581](httprb/http#581)
  [#582](httprb/http#582)
  Add Ruby 2.7.x support.
  ([@janko])

* [#577](httprb/http#577)
  Fix `Chainable#timeout` with frozen Hash.
  ([@antonvolkoff])

* [#576](httprb/http#576)
  [#524](httprb/http#524)
  **BREAKING CHANGE**
  Preserve header names casing.
  ([@joshuaflanagan])

* [#540](httprb/http#540)
  [#538](httprb/http#538)
  **BREAKING CHANGE**
  Require explicit MIME type for Response#parse
  ([@ixti])

* [#532](httprb/http#532)
  Fix pipes support in request bodies.
  ([@ixti])

* [#530](httprb/http#530)
  Improve header fields name/value validation.
  ([@Bonias])

* [#506](httprb/http#506)
  [#521](httprb/http#521)
  Skip auto-deflate when there is no body.
  ([@Bonias])

* [#489](httprb/http#489)
  Fix HTTP parser.
  ([@ixti], [@fxposter])

* [#546](httprb/http#546)
  **BREAKING CHANGE**
  Provide initiating `HTTP::Request` object on `HTTP::Response`.
  ([@joshuaflanagan])

* [#571](httprb/http#571)
  Drop Ruby 2.3.x support.
  ([@ixti])

* [3ed0c31](httprb/http@3ed0c31)
  Drop Ruby 2.4.x support.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Requests to an IPv6 address result in: HTTP::ConnectionError (failed to connect: getaddrinfo: Name or service not known)
3 participants