Replace Redis with local in-memory peer store #270
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #270      +/-   ##
==========================================
+ Coverage   66.16%   66.40%   +0.23%
==========================================
  Files         184      186       +2
  Lines        9098     9224     +126
==========================================
+ Hits         6020     6125     +105
- Misses       2319     2337      +18
- Partials      759      762       +3
Continue to review full report at Codecov.
tracker/peerstore/local.go
Outdated
// At this point, A is holding onto a peerGroup reference which has been
// deleted from the peerGroups map, and thus has no choice but to attempt to
// reload a new peerGroup. Since the cleanup interval is quite large, it is
// *extremeley* unlikely this for-loop will execute more than twice.
nit: spelling
tracker/peerstore/local.go
Outdated
}
g.mu.Lock()
for _, id := range expired {
    delete(g.peers, id)
So I think there's a race condition here where UpdatePeers could update the expiresAt field between the time we release the read lock and acquire the write lock. It seems like we should recheck the expiresAt value here before we actually perform the delete.
Indeed. Nice catch.
LGTM
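The fix discussed above is essentially a double-checked delete. Below is a minimal sketch of that pattern; the peerGroup and peerEntry types, field names, and expiration logic are assumptions for illustration, not the actual kraken code.

package peerstore

import (
	"sync"
	"time"
)

// peerGroup and peerEntry are illustrative stand-ins, not the real types.
type peerGroup struct {
	mu    sync.RWMutex
	peers map[string]*peerEntry
}

type peerEntry struct {
	expiresAt time.Time
}

// cleanupExpired collects expired candidates under the read lock, then
// re-checks each candidate under the write lock before deleting it, so a
// concurrent update that refreshed expiresAt in between is not clobbered.
func (g *peerGroup) cleanupExpired(now time.Time) {
	var candidates []string

	g.mu.RLock()
	for id, p := range g.peers {
		if p.expiresAt.Before(now) {
			candidates = append(candidates, id)
		}
	}
	g.mu.RUnlock()

	g.mu.Lock()
	for _, id := range candidates {
		// Re-check: the entry may have been refreshed between releasing the
		// read lock and acquiring the write lock.
		if p, ok := g.peers[id]; ok && p.expiresAt.Before(now) {
			delete(g.peers, id)
		}
	}
	g.mu.Unlock()
}

Collecting candidates under the read lock keeps the write lock held only briefly, and the re-check lets a concurrent refresh win over the cleanup.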
tracker/peerstore/local.go
Outdated
s.mu.RUnlock()

for _, h := range hashes {
    s.mu.RLock()
Would it be possible to combine this with the above for loop (to reduce the number of lock operations)?
Sure.
}

var expired []int
g.mu.RLock()
You could actually consider just not locking here at all. If any entries are stale, they'll be checked again (with the lock) before they're deleted.
The benefit is that you reduce the number of lock operations. This is nice because a lock operation always requires a memory access (it can't use the processor cache). While we don't really care about the extra cycles this introduces on the GC thread, if there are threads running on other processors (e.g. readers) that are also performing lock operations, there will be contention for the memory bus, which may slow down the readers.
Totally optional
The UpdatePeers function is appending to the slice, so it's not safe to access it without a lock.
@codygibb Is there any documentation on how to use this? Could you also explain what you mean by client-side hashing?
The local store is enabled by default. If you want to continue using Redis, you can set the corresponding config option. Check out https://github.com/uber/kraken/blob/master/tracker/announceclient/client.go#L111 for how client-side hashing works.
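For readers unfamiliar with the term, here is a rough sketch of the idea behind client-side hashing: the agent deterministically maps each info hash to one tracker address, so every announce for a given hash reaches the same tracker. The FNV-modulo scheme and the pickTracker name below are assumptions for illustration only; see the linked client.go for the real selection logic.

package announceclient

import "hash/fnv"

// pickTracker deterministically maps an info hash to one of the configured
// tracker addresses, so all announces for that hash go to the same tracker.
// The modulo-hash scheme here is illustrative only.
func pickTracker(infoHash string, trackers []string) string {
	h := fnv.New32a()
	h.Write([]byte(infoHash))
	return trackers[int(h.Sum32())%len(trackers)]
}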
@codygibb Thank you. One last question. We are using spot instances on AWS as kraken agents, which means AWS can kill these instances at any time. I am seeing some errors.
I suspect this is because of my configuration. I want to understand what an announce request means. Is it the time frame for which a tracker will keep the peer's IP and share it with other peers? Also, are there default egress and ingress limits for agents and origins? We are using peers with 10 Gigabit network performance and 25 Gigabit on the origins. I wasn't sure if such a limit is applied by default on all agents.
Would you mind moving this to an issue? The problem you're seeing seems unrelated to this PR.
The use of Redis in Kraken is purely a historical artifact from when announce requests were not sharded and all Trackers shared the same Redis instance. These days, we use client-side hashing in the Agent to ensure that all announces for a particular info hash are sent to the same Tracker, so the Trackers no longer need to share peer stores. As such, Tracker deployments have generally used local Redis instances.
This PR introduces a new peerstore.Store implementation called LocalStore, which is an in-memory two-level locking map. I made a couple of performance-conscious decisions that made the implementation more complex than reviewers may be accustomed to. The main optimization is ensuring that the top-level lock and infohash-level locks are never held at the same time except for runCleanup (which only runs every hour). This ensures that reads/writes to different infohashes only ever compete for the top-level lock. The alternative is to acquire the infohash-level lock while still holding the top-level lock, which is simpler, but causes more lock contention.
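A rough sketch of the two-level locking layout described above, with assumed names and simplified value types; it is not the actual LocalStore code. The top-level lock guards only the group map and is always released before the per-infohash lock is taken, so operations on different info hashes contend only on the short top-level critical section.

package peerstore

import "sync"

// localStore is an illustrative two-level locking map: a top-level lock over
// the group map, plus a per-infohash lock inside each group.
type localStore struct {
	mu     sync.RWMutex
	groups map[string]*group
}

type group struct {
	mu    sync.RWMutex
	peers map[string]string // peerID -> address, simplified
}

// getOrCreateGroup holds only the top-level lock; the caller locks the group
// itself afterwards, so the two locks are never held at the same time.
func (s *localStore) getOrCreateGroup(hash string) *group {
	s.mu.RLock()
	g, ok := s.groups[hash]
	s.mu.RUnlock()
	if ok {
		return g
	}

	s.mu.Lock()
	defer s.mu.Unlock()
	if g, ok := s.groups[hash]; ok {
		return g
	}
	g = &group{peers: make(map[string]string)}
	s.groups[hash] = g
	return g
}

// UpdatePeer writes to a single group while holding only that group's lock.
func (s *localStore) UpdatePeer(hash, peerID, addr string) {
	g := s.getOrCreateGroup(hash)
	g.mu.Lock()
	g.peers[peerID] = addr
	g.mu.Unlock()
}

Per the description above, runCleanup would be the one place where both lock levels are held together.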
Here are the benchmarks for 4 hosts x 400 peers x 100 hashes announcing every 2s for 2m:
Redis:
Local: