p2p: supplying and using asmap to improve IP bucketing in addrman #16702
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
FWIW, the asmap generated from https://dev.maxmind.com/geoip/geoip2/geolite2/ is 988387 bytes in size. |
Encoding the map as a bool[] array in the source code will add 4 bytes to the executable per bit in the map, which is a bit excessive. You should probably encode it as a uint8_t[] instead (with 8 bits per array element). |
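The suggestion above can be illustrated with a minimal sketch that packs a bit sequence into bytes, 8 bits per array element. The function names `PackBits` and `UnpackBits` and the least-significant-bit-first order are assumptions made for this example; they are not taken from the pull request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack a sequence of bits into bytes, 8 bits per element, least significant bit first.
// The bit order is an assumption of this sketch, not necessarily the order used by asmap files.
std::vector<uint8_t> PackBits(const std::vector<bool>& bits)
{
    std::vector<uint8_t> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i]) bytes[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
    return bytes;
}

// Recover the first bit_count bits from a packed byte array.
std::vector<bool> UnpackBits(const std::vector<uint8_t>& bytes, std::size_t bit_count)
{
    assert(bit_count <= bytes.size() * 8);
    std::vector<bool> bits(bit_count);
    for (std::size_t i = 0; i < bit_count; ++i) {
        bits[i] = (bytes[i / 8] >> (i % 8)) & 1;
    }
    return bits;
}
```

With this layout a map of N bits occupies (N + 7) / 8 array elements, and the original bit sequence can be recovered as long as the bit count is stored alongside the bytes.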
@sipa Would you mind providing the python script used to generate the asmap? I think Bitcoiners like @TheBlueMatt and myself who maintain their own BGP full table view of the Internet would like to generate their own asmap instead of trusting maxmind, or at least verify it. |
Concept ACK |
@wiz The script is here: https://gist.github.com/sipa/b90070570597b950f29a6297772a7636 though we need more tooling to convert from common dump formats, sanity check, ... I'll publish some when they're a bit cleaned up and usable. |
Thanks for the script, it helps a lot with my own implementation and I also wanted to point out a few potential attack vectors to circumvent the asmap protection:
There are several ASN ranges which can be used as originating ASNs without registration or verification and could be used to circumvent the asmap protection. For example I could announce one /24 from 65000, one /24 from 65001, and so on, all behind my actual ASN, in order to launch an Erebus attack. I propose that we map any IP blocks originating from the following invalid ASNs to the next-valid ASN upstream of it:
There are actually around 50 routing prefixes on the Internet that legitimately have multiple originating ASNs, so it's not always a strict 1:1 mapping. For example, a valid use case could be an ISP that aggregates multiple customer /29 or /30 prefixes into a single /24 announcement (since /24 is the smallest globally routable prefix size that would be generally accepted) and my router indicates all of the multiple originating ASNs in curly braces like this:
Of course this could be used to circumvent the asmap protection as well, so for this case I also propose we instead identify the upstream aggregating ASN as the originating ASN. For my implementation I'm simply going to use a regex to strip out the curly braces aggregation to ignore it. |
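Both proposals in this comment can be sketched together: drop curly-brace aggregation sets from an AS path, then walk back from the origin until an ASN that requires registration is found. This is only an illustration. The path format (space-separated ASNs, origin last, sets written as `{a,b}`), the function names, and the reserved ranges (taken from RFC 7607, RFC 6793, RFC 5398, RFC 6996 and RFC 7300) are assumptions of the example, not the list proposed in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// ASNs that can be used without registration: reserved, private-use and documentation
// ranges. This list is an assumption of the sketch, not the list proposed in the thread.
bool IsReservedAsn(uint32_t asn)
{
    return asn == 0 || asn == 23456 ||      // AS 0 and AS_TRANS
           (asn >= 64496 && asn <= 65551) || // documentation, private use, 65535
           asn >= 4200000000u;               // 32-bit private use and 4294967295
}

// Parse a space-separated AS path such as "100 200 {300,400}".
// Aggregation sets in curly braces are dropped, so the aggregating AS becomes the last hop.
std::vector<uint32_t> ParseAsPath(const std::string& path)
{
    static const std::regex as_set("\\{[^}]*\\}");
    std::istringstream in(std::regex_replace(path, as_set, " "));
    std::vector<uint32_t> hops;
    uint32_t asn;
    while (in >> asn) hops.push_back(asn);
    return hops;
}

// Walk from the origin towards the observer and return the first ASN that is not reserved.
// Returns 0 when the path has no usable ASN.
uint32_t EffectiveOrigin(const std::string& path)
{
    const std::vector<uint32_t> hops = ParseAsPath(path);
    for (auto it = hops.rbegin(); it != hops.rend(); ++it) {
        if (!IsReservedAsn(*it)) return *it;
    }
    return 0;
}
```

For example, `EffectiveOrigin("100 200 65000")` and `EffectiveOrigin("100 200 {300,400}")` both return 200, the upstream that actually announces the prefix to the rest of the Internet (the ASNs here are placeholders).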
@wiz Very good points!
Besides using private ASN ranges, another obvious attack vector would be to use non-reserved but unused or unallocated AS numbers as a faux downstream for a specific attacker-controlled prefix:
Assume that
An attacker
The global view would hence be:
Traffic to
Assuming that
How can we guard against this attack?
To guard against an attacker making use of non-IANA allocated ASNs as faux downstream for a specific attacker controlled prefix:
Instead of blacklisting known reserved AS numbers, we could be stricter and apply a whitelisting approach: allow only AS number ranges that have been allocated to the five Regional Internet Registries (AFRINIC, APNIC, ARIN, LACNIC and RIPE) by IANA. That data is available in machine readable form.
To guard against an attacker making use of IANA allocated but RIR unallocated ASNs as faux downstream for a specific attacker controlled prefix:
I believe all five RIRs provide machine readable lists of the individual ASNs they have assigned to organisations. (Note that this data comes with the country code for the organisation that has been assigned the AS number. That could be handy if we some time in the future would like to implement logic for avoiding country or region based netsplits. If incorporated in the
To guard against an attacker making use of IANA allocated and RIR allocated ASNs as faux downstreams for a specific attacker controlled prefix:
This is harder to guard against. Perhaps we could require that a prefix-to-ASN mapping has been stable over say
Another approach would be to analyse a full routing table when creating the |
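The whitelisting idea can be sketched as a lookup in a sorted list of allocated ASN ranges. The `AsnRange` type, the `IsAllocated` name and the ranges used in any example are hypothetical; real ranges would have to be loaded from the machine-readable IANA or RIR data mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct AsnRange {
    uint32_t first;
    uint32_t last; // inclusive
};

// Returns true if asn lies inside one of the allocated ranges.
// `allocated` must be sorted by `first` and must not contain overlapping ranges.
bool IsAllocated(const std::vector<AsnRange>& allocated, uint32_t asn)
{
    assert(std::is_sorted(allocated.begin(), allocated.end(),
                          [](const AsnRange& a, const AsnRange& b) { return a.first < b.first; }));
    // Find the first range that starts after asn; the candidate is the one before it.
    auto it = std::upper_bound(allocated.begin(), allocated.end(), asn,
                               [](uint32_t value, const AsnRange& r) { return value < r.first; });
    if (it == allocated.begin()) return false;
    --it;
    return asn <= it->last;
}
```

A prefix whose origin ASN fails this check could then be mapped to its upstream instead, in the same way as the reserved ranges discussed earlier in the thread.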
Yeah, the AS map mitigation may be flawed. How about requiring everyone to also have a few Tor peers, presumably bypassing the network partition attack? |
Indeed. ASN mappings are not a foolproof solution, but they're better than just using /16s (after all, there are lots of unused /16s you could announce if you wanted to). Ultimately some monitoring and building up filtering lists over time as we observe malicious behavior may improve things, but, indeed, ensuring redundant connectivity is the only ultimate solution. Once #15759 lands, I'd really like to propose a default of 2 additional blocksonly Tor connections if Tor support is enabled (see-also https://twitter.com/TheBlueMatt/status/1160620919775211520, in which someone suggested their ISP was censoring Bitcoin P2P traffic, and only after setting bitcoind to Tor-only did it manage to connect). |
One thing we can play with after we build an initial table is to look at the paths, instead of looking only at the last ASN in the path. eg if, from many vantage points on the internet, a given IP block always passes from AS 1 to AS 2, we could consider it as a part of AS 1 (given it appears to only have one provider - AS 1). In order to avoid Western bias we'd need to do it across geographic regions and from many vantage points (eg maybe contact a Tier 1 and get their full routing table view, not just the selected routes), but once we get the infrastructure in place, further filtering can be played with. |
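A rough sketch of the path-based idea: given the AS paths for one prefix as seen from several vantage points, treat the prefix as part of its upstream when every path enters the origin through the same AS. The names `AsPath`, `UpstreamOf` and `GroupingAsn`, and the convention that 0 means "no answer", are assumptions of this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using AsPath = std::vector<uint32_t>; // ordered from the vantage point to the origin

// Returns the hop directly before the origin, skipping prepended repeats of the origin.
// Returns 0 when the path contains no other AS.
uint32_t UpstreamOf(const AsPath& path)
{
    assert(!path.empty());
    const uint32_t origin = path.back();
    for (auto it = path.rbegin(); it != path.rend(); ++it) {
        if (*it != origin) return *it;
    }
    return 0;
}

// Picks the AS to bucket a prefix under, given its AS paths seen from several vantage points.
// If every path enters the origin through the same upstream, that upstream is returned;
// otherwise the origin itself. Returns 0 if the vantage points disagree on the origin.
uint32_t GroupingAsn(const std::vector<AsPath>& paths)
{
    assert(!paths.empty() && !paths.front().empty());
    const uint32_t origin = paths.front().back();
    const uint32_t upstream = UpstreamOf(paths.front());
    bool single_upstream = upstream != 0;
    for (const AsPath& path : paths) {
        assert(!path.empty());
        if (path.back() != origin) return 0;
        if (UpstreamOf(path) != upstream) single_upstream = false;
    }
    return single_upstream ? upstream : origin;
}
```

The result is only as good as the set of vantage points: with too few of them, a multi-homed AS can look single-homed, which is the bias the comment above warns about.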
92528c2 Support serialization of std::vector<bool> (Pieter Wuille)
Pull request description:
This adds support for serialization of `std::vector<bool>`, as a prerequisite of #16702.
ACKs for top commit:
MarcoFalke: ACK 92528c2 (only looked at the diff on GitHub)
practicalswift: ACK 92528c2 -- diff looks correct
jamesob: ACK 92528c2
Tree-SHA512: 068246a55a889137098f817bf72d99fc3218a560d62a97c59cccc0392b110f6778843cee4e89d56f271ac64e03a0f64dc5e2cc22daa833fbbbe9234c5f42d7b9
a944e38 to 0921912
Concept ACK. It makes sense that this is opt-in for now, and I like that it can rebucket if the map changes, so it doesn't have to be perfect.
Can you get rid of the merge (and oops) commit? A rebase is much easier to review commit-by-commit.
I'd like to try this out, but I need something to feed @sipa's script with (and then pass that into -asmap=<file>). Is there an easy incantation to convert the geolite IPv4/IPv6 ASN tables? Or to produce it myself from a VPS?
Do we need to reconsider the number of buckets and their size?
d688a65 to 712d3d7
a57b6e3 to 37e37c0
merge bitcoin#17812, bitcoin#16702, bitcoin#16730, bitcoin#18023: supplying and using asmap to improve IP bucketing in addrman
16791f2 CMakeLists tests: add raw files generation. (furszy) 672d9a2 init: move asmap code earlier in init process (Jon Atack) 65cd143 net: extract conditional to bool CNetAddr::IsHeNet (Jon Atack) 2fc1f37 logging: asmap logging and #include fixups (Jon Atack) 0c9efb8 test: add functional test for an empty, unparsable asmap (Jon Atack) 6545656 config: separate the asmap finding and parsing checks (Jon Atack) 618b8d1 config: enable passing -asmap an absolute file path (Jon Atack) 8c7bdbe config: use default value in -asmap config (Jon Atack) de39fab test: add feature_asmap functional tests (Jon Atack) 4290d3f Make asmap Interpret tolerant of malicious map data (Pieter Wuille) e527e04 Use ASNs for mapped IPv4 addresses correctly (Pieter Wuille) 9a28bc0 Mark asmap const in statistics code (Pieter Wuille) 868a6ed Avoid asmap copies in initialization (Pieter Wuille) cb698fb Add extra logging of asmap use and bucketing (Gleb Naumenko) 2fe5a05 Return mapped AS in RPC call getpeerinfo (Gleb Naumenko) ce7aa15 scripted-diff: Replace NET_TOR with NET_ONION (wodry) 4c3ae7d Integrate ASN bucketing in Addrman and add tests (Gleb Naumenko) 718f1df CAddrManTest: remove redundant MakeDeterministic call. (furszy) fd51941 Tests: address placement should be deterministic by default (René Nyffenegger) 8d01cbd Add asmap utility which queries a mapping (Gleb Naumenko) e986ed0 CAddrMan::Deserialize handle corrupt serializations better. (Patrick Strateman) d2a8baf addrman.h: CAddrInfo inline members default values, plus several typos corrected. (furszy) a7b9fd9 refactor: Use uint16_t instead of unsigned short (furszy)
Pull request description:
Decoupled from #2411, built on top of #2479. Probably the last decouple from the "road to Tor" work. Focused on porting the ASN nodes bucketing functionality. The heart of this work is bitcoin#16702.
When an asmap file containing the IP->ASN mapping is provided, nodes are bucketed by the AS they belong to, in order to make it impossible for a node to connect to several nodes hosted in a single AS. This is done in response to the Erebus attack, but also to generally diversify the connections every node creates, which is especially useful when a large fraction of nodes operate under a couple of cloud providers.
#### List of PRs:
* bitcoin#7932
* bitcoin#10765
* bitcoin#13532
* bitcoin#13575
* bitcoin#16702
* bitcoin#17812
* bitcoin#18023
* bitcoin#19314
PRs for a follow up PR:
* bitcoin#18029
* bitcoin#18512
ACKs for top commit:
random-zebra: re-utACK 16791f2
Fuzzbawls: ACK 16791f2
Tree-SHA512: 1452af87d693526d3359822845bbd6211578b5c7c69d740d19c8c3ee25c66fd6e130f4421066a8f5384d62f65a2754423c633f90d7e3d809f4f1cc00c3c956ba
ecde04a [Consensus] Bump Active Protocol version to 70923 for v5.3 (random-zebra) b63e4f5 Consensus: Add v5.3 enforcement height for testnet. (furszy) f44be94 Only relay IPv4, IPv6, Tor addresses (Pieter Wuille) 015298c fix: tor: Call event_base_loopbreak from the event's callback (furszy) 34ff7a8 Consensus: Add mnb ADDRv2 guard. (furszy) b4515dc GUI: Present v3 onion addresses properly in MNs list. (furszy) 337d43d tests: don't export in6addr_loopback (Vasil Dimov) 2cde8e0 GUI: Do not show the tor v3 onion address in the topbar. (furszy) 0b5f406 Doc: update tor.md with latest upstream information. (furszy) 89df7f2 addrman: ensure old versions don't parse peers.dat (Vasil Dimov) bb90c5c test: add getnetworkinfo network name regression tests (Jon Atack) d8e01b5 rpc: update GetNetworksInfo() to not return unsupported networks (Jon Atack) 57fc7b0 net: update GetNetworkName() with all enum Network cases (Jon Atack) 647d60b tests: Modify rpc_bind to conform to bitcoin#14532 behaviour. (Carl Dong) d4d6729 Allow running rpc_bind.py --nonloopback test without IPv6 (Kristaps Kaupe) 4a034d8 test: Add rpc_bind test to default-run tests (Wladimir J. van der Laan) 61a08af [tests] bind functional test nodes to 127.0.0.1 Prevents OSX firewall (Sjors Provoost) 6a4f1e0 test: Add basic addr relay test (furszy) 78aa61c net: Make addr relay mockable (furszy) ba954ca Send and require SENDADDRV2 before VERACK (Pieter Wuille) 61c2ed4 Bump net protocol version + don't send 'sendaddrv2' to pre-70923 software (furszy) ccd508a tor: make a TORv3 hidden service instead of TORv2 (Vasil Dimov) 6da9a14 net: advertise support for ADDRv2 via new message (furszy) e58d5d0 Migrate to test_large_inv() to Misbehaving logging. (furszy) d496b64 [QA] fix mininode CAddress ser/deser (Jonas Schnelli) cec9567 net: CAddress & CAddrMan: (un)serialize as ADDRv2 Change the serialization of `CAddrMan` to serialize its addresses in ADDRv2/BIP155 format by default. Introduce a new `CAddrMan` format version (3). 
(furszy) b8c1dda streams update: get rid of nType and nVersion. (furszy) 3eaa273 Support bypassing range check in ReadCompactSize (Pieter Wuille) a237ba4 net: recognize TORv3/I2P/CJDNS networks (Vasil Dimov) 8e50853 util: make EncodeBase32 consume Spans (Sebastian Falbesoner) 1f67e30 net: CNetAddr: add support to (un)serialize as ADDRv2 (Vasil Dimov) 2455420 test: move HasReason so it can be reused (furszy) d41adb4 util: move HasPrefix() so it can be reused (Vasil Dimov) f6f86af Unroll Keccak-f implementation (Pieter Wuille) 45222e6 Implement keccak-f[1600] and SHA3-256 (Pieter Wuille) 08ad06d net: change CNetAddr::ip to have flexible size (furszy) 3337219 net: improve encapsulation of CNetAddr. (furszy) 910d5c4 test: Do not instantiate CAddrDB for static call (Hennadii Stepanov) 6b607ef Drop IsLimited in favor of IsReachable (Ben Woosley) a40711b IsReachable is the inverse of IsLimited (DRY). Includes unit tests (marcaiaf) 8839828 net: don't accept non-left-contiguous netmasks (Vasil Dimov) 5d7f864 rpcbind: Warn about exposing RPC to untrusted networks (Luke Dashjr) 2a6abd8 CNetAddr: Add IsBindAny method to check for INADDR_ANY (Luke Dashjr) 4fdfa45 net: Always default rpcbind to localhost, never "all interfaces" (Luke Dashjr) 31064a8 net: Minor accumulated cleanups (furszy) 9f9c871 tests: Avoid using C-style NUL-terminated strings as arguments (practicalswift) f6c52a3 tests: Add tests to make sure lookup methods fail on std::string parameters with embedded NUL characters (practicalswift) a751b9b net: Avoid using C-style NUL-terminated strings as arguments in the netbase interface (furszy) f30869d test: add IsRFC2544 tests (Mark Tyneway) ed5abe1 Net: Proper CService deserialization + GetIn6Addr return false if addr isn't an IPv6 addr (furszy) 86d73fb net: save the network type explicitly in CNetAddr (Vasil Dimov) ad57dfc net: document `enum Network` (Vasil Dimov) cb160de netaddress: Update CNetAddr for ORCHIDv2 (Carl Dong) c3c04e4 net: Better misbehaving logging 
(furszy) 3660487 net: Use C++11 member initialization in protocol (Marco) 082baa3 refactor: Drop unused CBufferedFile::Seek() (Hennadii Stepanov) e2d776a util: CBufferedFile fixes (Larry Ruane) 6921f42 streams: backport OverrideStream class (furszy)
Pull request description:
Conjunction of a large number of back ports, updates and refactorings made with the final goal of implementing v3 Onion addresses support (BIP155 https://github.com/bitcoin/bips/blob/master/bip-0155.mediawiki) before the Tor v2 addresses EOL, scheduled by the Tor project for (1) July 15th: v2 addr support removal from the code base, and (2) October 15th: v2 addr network disable, where **every peer in our network running under Tor will lose the connection and drop off the network**.
As BIP155 describes, this introduces a new P2P message to gossip longer node addresses over the P2P network. This is required to support new-generation Onion addresses, I2P, and potentially other networks that have longer endpoint addresses than fit in the 128 bits of the current addr message.
In order to achieve the end goal, I had to:
1. Create the Span class and push it up to the latest Bitcoin implementation.
2. Update the whole serialization framework and every object using it up to the latest Bitcoin implementation (3-4 years ahead of where we currently are in master).
3. Update the address manager, implementing ASN-based bucketing of the network nodes.
4. Update and refactor the netAddress and address manager tests to the latest Bitcoin implementation (4 years ahead of where we currently are in master).
5. Several util string, vector, encodings, parsing, hashing backports and more..
Important note: This PR is not meant to be merged as a standalone PR; I will decouple smaller ones as I move on, adding to each sub-PR its own description isolated from this big monster.
Second note: This is still a **work-in-progress**, not ready for testing yet. I'm probably missing a few PRs that have already been adapted to our sources. Just making it public so I can decouple the changes, we can start merging them, and I can continue working a bit more comfortably (rebasing a separate branch of 170+ commits is not fun..).
### List of back ported and adapted PRs:
Span and Serialization:
----------------
* bitcoin#12886.
* bitcoin#12916.
* bitcoin#13558.
* bitcoin#13697. (Only Span's commit 29943a9)
* bitcoin#17850.
* bitcoin#17896.
* bitcoin#12752.
* bitcoin#16577.
* bitcoin#16670. (without faebf62)
* bitcoin#17957.
* bitcoin#18021.
* bitcoin#18087.
* bitcoin#18112 (only from 353f376 that we don't support).
* bitcoin#18167.
* bitcoin#18317.
* bitcoin#18591 (only Span's commit 0fbde48)
* bitcoin#18468.
* bitcoin#19020.
* bitcoin#19032.
* bitcoin#19367.
* bitcoin#19387.
Net, NetAddress and AddrMan:
----------------
* bitcoin#7932.
* bitcoin#10756.
* bitcoin#10765.
* bitcoin#12218.
* bitcoin#12855.
* bitcoin#13532.
* bitcoin#13575.
* bitcoin#13815.
* bitcoin#14532.
* bitcoin#15051.
* bitcoin#15138.
* bitcoin#15689.
* bitcoin#16702.
* bitcoin#17243.
* bitcoin#17345.
* bitcoin#17754.
* bitcoin#17758.
* bitcoin#17812.
* bitcoin#18023.
* bitcoin#18454.
* bitcoin#18512.
* bitcoin#19314.
* bitcoin#19687
Keys and Addresses encoding:
----------------
* bitcoin#11372.
* bitcoin#17511.
* bitcoin#17721.
Util:
----------------
* bitcoin#9140.
* bitcoin#16577.
* bitcoin#16889.
* bitcoin#19593.
Bench:
----------------
* bitcoin#16299.
BIP155:
----------------
* bitcoin#19351.
* bitcoin#19360.
* bitcoin#19534.
* bitcoin#19628.
* bitcoin#19841.
* bitcoin#19845.
* bitcoin#19954.
* bitcoin#19991 (pending).
* bitcoin#19845.
* bitcoin#20000 (pending).
* bitcoin#20120.
* bitcoin#20284.
* bitcoin#20564.
* bitcoin#21157 (pending).
* bitcoin#21564 (pending).
* Fully removed v2 onion addr support.
* Add hardcoded seeds.
* Add release-notes, changes to files.md and every needed documentation.
I'm currently working on the PRs marked as "pending", this isn't over, but I'm pretty pretty close :). What a long road..
ACKs for top commit: random-zebra: utACK ecde04a Fuzzbawls: ACK ecde04a Tree-SHA512: 82c95fbda76fce63f96d8a9af7fa9a89cb1e1b302b7891e27118a6103af0be23606bf202c7332fa61908205e6b6351764e2ec23d753f1e2484028f57c2e8b51a
This PR attempts to solve the problem explained in #16599.
A particular attack which encouraged us to work on this issue is explained here [Erebus Attack against Bitcoin Peer-to-Peer Network] (by @muoitranduc)
Instead of relying on /16 prefix to diversify the connections every node creates, we would instead rely on the (ip -> ASN) mapping, if this mapping is provided.
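To make the difference concrete, here is a small sketch of a grouping key that uses the ASN when a mapping covers the address and falls back to the /16 prefix otherwise. It is IPv4-only and uses a linear longest-prefix match over a toy table; `PrefixEntry`, `LookupAsn`, `GroupKey` and the string form of the key are all hypothetical and do not reflect the compact asmap encoding or the addrman code in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct PrefixEntry {
    uint32_t network; // IPv4 network address in host byte order
    int length;       // prefix length in bits, 0..32
    uint32_t asn;
};

// Longest-prefix match over a small table. Returns 0 when no prefix covers the address.
uint32_t LookupAsn(const std::vector<PrefixEntry>& table, uint32_t ip)
{
    uint32_t best_asn = 0;
    int best_length = -1;
    for (const PrefixEntry& e : table) {
        assert(e.length >= 0 && e.length <= 32);
        const uint32_t mask = e.length == 0 ? 0 : ~uint32_t{0} << (32 - e.length);
        if ((ip & mask) == (e.network & mask) && e.length > best_length) {
            best_asn = e.asn;
            best_length = e.length;
        }
    }
    return best_asn;
}

// Group key used for bucketing: the ASN when the map knows the address, else the /16 prefix.
std::string GroupKey(const std::vector<PrefixEntry>& table, uint32_t ip)
{
    const uint32_t asn = LookupAsn(table, ip);
    if (asn != 0) return "AS" + std::to_string(asn);
    return std::to_string(ip >> 24) + "." + std::to_string((ip >> 16) & 0xFF) + ".0.0/16";
}
```

With an empty table every address falls back to its /16, which corresponds to the legacy behaviour when no mapping is supplied.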
A .map file can be created by every user independently based on a router dump, or provided along with the Bitcoin release. Currently we use the python scripts written by @sipa to create a .map file, which is no larger than 2MB (awesome!).
Here I suggest adding a field to peers.dat which would represent a hash of asmap file used while serializing addrman (or 0 for /16 prefix legacy approach).
In this case, every time the file is updated (or grouping method changed), all buckets will be re-computed.
I believe that alternative selective re-bucketing for only updated ranges would require substantial changes.
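The re-bucketing rule described above could look roughly like this: compare the fingerprint stored with the address table against the fingerprint of the map currently in use, and recompute every bucket when they differ. `ToyAddrMan`, `RebucketIfNeeded` and the `bucket_for` callback are hypothetical stand-ins for illustration, not the addrman structures or the peers.dat format.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Toy address table: which bucket each address currently sits in.
struct ToyAddrMan {
    uint64_t asmap_checksum = 0;          // 0 stands for the legacy /16 grouping
    std::map<std::string, int> bucket_of; // address -> bucket index
};

// Recomputes all buckets if the stored checksum differs from the current one.
// Returns true if the buckets had to be recomputed.
bool RebucketIfNeeded(ToyAddrMan& addrman, uint64_t current_checksum,
                      const std::function<int(const std::string&)>& bucket_for)
{
    if (addrman.asmap_checksum == current_checksum) return false;
    for (auto& entry : addrman.bucket_of) {
        entry.second = bucket_for(entry.first);
    }
    addrman.asmap_checksum = current_checksum;
    return true;
}
```

Recomputing everything keeps the logic simple: no per-range bookkeeping is needed, at the cost of touching every entry whenever the map file changes.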
TODO:
* more unit tests
* find a way to test the code without including a >1 MB mapping file in the repo.
Interesting corner case: I'm using std::hash to compute a fingerprint of asmap, and std::hash returns size_t. I guess if a user updates the OS to 64-bit, then the hash of asmap will change? Does it even matter?
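One way to sidestep the size_t question is to use a hash with a fixed output width. The sketch below uses 64-bit FNV-1a purely as an illustration; it is a substitute chosen for brevity, not what this PR uses. Note also that the C++ standard only requires std::hash results to be consistent within a single execution of a program, so persisting them in a file is fragile regardless of pointer width.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a over the raw bytes of the map. The result has the same width and value
// on 32-bit and 64-bit builds, unlike std::hash, whose result type is size_t.
uint64_t Fnv1a64(const std::vector<uint8_t>& data)
{
    uint64_t hash = 0xcbf29ce484222325ULL; // FNV offset basis
    for (uint8_t byte : data) {
        hash ^= byte;
        hash *= 0x100000001b3ULL; // FNV prime
    }
    return hash;
}
```

FNV-1a is not collision resistant against an adversary. If the fingerprint has to withstand deliberately crafted map files, a cryptographic hash would be the safer choice.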