-
Notifications
You must be signed in to change notification settings - Fork 127
server/swap: reduce lock contention and fix memory leak #1563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
s.unlockOrderCoins(mt.Maker) | ||
s.unlockOrderCoins(mt.Taker) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved from revoke
to deleteMatch
so this happens in the successful completion path too (see processRedeem
for the taker)
748ed22
to
a9a19dd
Compare
FWIW, rebased with other swapper changes from #1541 and the trade_simnet_test.go integration tests are passing, with a -race enabled server as well. No deadlocks or races observed. |
a9a19dd
to
79d2b9d
Compare
79d2b9d
to
581fbb8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion
} | ||
status.mtx.RUnlock() // Contract.Confirmations can timeout (slow), causing lock contention | ||
|
||
confs, err := status.swap.Confirmations(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to convince myself that it's safe to access status.swap
here without protection and I'm having trouble. It looks like swap
is assigned in processInit
. What exactly prevents the maker from submitting more than one init
request? We only seem to check that the status is NewlyMatched || MakerSwapCast
, but that wouldn't prevent it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's interesting. We certainly don't want 3700 init requests for the same match all to start coin waiters.
Regarding the status.swap
field, we could also disallow setting it if it's not nil, since it doesn't make sense to change it once it's set. At least that was my thought process in doing this - The swapTime.IsZero
check is what needs to be under lock, with the assumption that status.swap
would never be modified after being initially set. Yeah, I think that needs to be explicitly prevented.
If the status.swap *asset.Contract
field can be overwritten, that's a fundamental issue.
I'll certainly look at this more closely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you identified a more fundamental and nasty issue. I've tried to address that with 4d82ea4
failMatch := func() { | ||
deleteMatch := func() { | ||
// Fail the match, and assign fault if lock times are not passed. | ||
s.failMatch(match, !match.expiredBy(now)) | ||
deletions = append(deletions, match) | ||
s.deleteMatch(match) | ||
failures = append(failures, fail{match, !match.expiredBy(now)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. So we delete it from s.matches
, then we have the only reference and we're free to pass it to failMatch
without mutex protection. Do I have that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, and failMatch
only looks at the Status field. Any updates to Status are intentionally done with the matches map locked in processInit and processRedeem so it can't be revoked while updating status. It wouldn't hurt for failMatch to use the matchTracker mutex too but it wasn't needed and still isn't I'm fairly certain.
processBlock
-> tryConfirmSwap
could still have a reference to the match, but it's not touching the matchTracker.Status
field that matchTracker.mtx
is assigned to. In retrospect, I think we could have put the mutex in order.Match
instead of swap.matchTracker
.
ef57c52
to
4d82ea4
Compare
// Swap known means status.swap is set, and that it will not be replaced | ||
// because we are gating processInit with the swapSearching semaphore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that an easy way to just avoid a data race would have been to just use a getter method like the other swapStatus methods, but as I'm pointing out it's is completely incorrect for the swap contract field to be changed once set so I wanted to fix that issue properly.
} | ||
|
||
actor.status.mtx.Lock() | ||
actor.status.swap = contract | ||
actor.status.swap = contract // swap should not be set already (handleInit should gate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were not confident that the searching semaphore is sufficient to prevent this as well as simultaneous coin waiters, we could add a final nil check here before assigning, but I don't think we need to code for every "should never happen" condition. Besides, I'm not sure what an appropriate way to handle that would be other than log an error or panic.
if res == wait.DontTryAgain { | ||
stepInfo.actor.status.endSwapSearch() // contract now recorded | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't you already doing this in processInit
? I prefer this pattern if there's a choice. Obviously a slight functional difference, but I don't know if using this pattern (waiting to endSwapSearch
until after acks are sent) introduces any bottlenecks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right. The code here is how I started, but then I felt weird about waiting to unblock until after sending the ack and counterparty audit requests. Lemme look again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is way cleaner though here. It's why I was tempted to add that EndFun
to Waiter
mentioned here: https://github.com/decred/dcrdex/pull/1563/files#diff-c456beb63edc783dd7172356d814f0e516f0f7ec8a3ba4f3b180ff24464c2afdR40
I started down that path but it got messier for some reason.
I think this PR needs to be run with loadbot a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, well that's why I moved it into processInit
... the darn tests use the ack as the indication to move on to the next request in the test (too soon if the semaphore is cleared after that). I'll try to figure out how to modify the tests instead of making the swapper code weird.
Testing internals is such a mess
4d82ea4
to
58bb3ad
Compare
Because transactions can advance swap status before procesBlock has a chance to tryConfirmSwap and then unlockOrderCoins, there is a memory leak when we see the "Prematurely received" warnings from processInit and processRedeem. This moves the unlockOrderCoins calls from revoke to deleteMatch, where it is called for all matches, including completed ones. (see the MatchComplete branch in processRedeem).
58bb3ad
to
3f32fa0
Compare
Not super polished, but OK to review.
There is a lot of lock contention in the swapper, with mutexes locked during expensive operations that can time out like confirmations checks. This starts fixing that.
There also appeared to be a memory leak with the coinlocker map with "premature" init requests from clients/