Skip to content

Conversation

chappjc
Copy link
Member

@chappjc chappjc commented Apr 5, 2022

Not super polished, but OK to review.

There is a lot of lock contention in the swapper, with mutexes locked during expensive operations that can time out like confirmations checks. This starts fixing that.

There also appeared to be a memory leak with the coinlocker map with "premature" init requests from clients/

Comment on lines +346 to +389
s.unlockOrderCoins(mt.Maker)
s.unlockOrderCoins(mt.Taker)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved from revoke to deleteMatch so this happens in the successful completion path too (see processRedeem for the taker)

@chappjc chappjc force-pushed the swapper-lock-contention branch 2 times, most recently from 748ed22 to a9a19dd Compare April 8, 2022 04:14
@chappjc
Copy link
Member Author

chappjc commented Apr 8, 2022

FWIW, rebased with other swapper changes from #1541 and the trade_simnet_test.go integration tests are passing, with a -race enabled server as well. No deadlocks or races observed.

Copy link
Contributor

@LasTshaMAN LasTshaMAN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion

}
status.mtx.RUnlock() // Contract.Confirmations can timeout (slow), causing lock contention

confs, err := status.swap.Confirmations(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to convince myself that it's safe to access status.swap here without protection and I'm having trouble. It looks like swap is assigned in processInit. What exactly prevents the maker from submitting more than one init request? We only seem to check that the status is NewlyMatched || MakerSwapCast, but that wouldn't prevent it.

Copy link
Member Author

@chappjc chappjc May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting. We certainly don't want 3700 init requests for the same match all to start coin waiters.

Regarding the status.swap field, we could also disallow setting it if it's not nil, since it doesn't make sense to change it once it's set. At least that was my thought process in doing this - The swapTime.IsZero check is what needs to be under lock, with the assumption that status.swap would never be modified after being initially set. Yeah, I think that needs to be explicitly prevented.

If the status.swap *asset.Contract field can be overwritten, that's a fundamental issue.

I'll certainly look at this more closely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you identified a more fundamental and nasty issue. I've tried to address that with 4d82ea4

Comment on lines -1125 to +1206
failMatch := func() {
deleteMatch := func() {
// Fail the match, and assign fault if lock times are not passed.
s.failMatch(match, !match.expiredBy(now))
deletions = append(deletions, match)
s.deleteMatch(match)
failures = append(failures, fail{match, !match.expiredBy(now)})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. So we delete it from s.matches, then we have the only reference and we're free to pass it to failMatch without mutex protection. Do I have that right?

Copy link
Member Author

@chappjc chappjc May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, and failMatch only looks at the Status field. Any updates to Status are intentionally done with the matches map locked in processInit and processRedeem so it can't be revoked while updating status. It wouldn't hurt for failMatch to use the matchTracker mutex too but it wasn't needed and still isn't I'm fairly certain.

processBlock -> tryConfirmSwap could still have a reference to the match, but it's not touching the matchTracker.Status field that matchTracker.mtx is assigned to. In retrospect, I think we could have put the mutex in order.Match instead of swap.matchTracker.

@chappjc chappjc force-pushed the swapper-lock-contention branch from ef57c52 to 4d82ea4 Compare May 31, 2022 22:03
@chappjc chappjc marked this pull request as ready for review May 31, 2022 22:10
Comment on lines +915 to +939
// Swap known means status.swap is set, and that it will not be replaced
// because we are gating processInit with the swapSearching semaphore.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that an easy way to just avoid a data race would have been to just use a getter method like the other swapStatus methods, but as I'm pointing out it's is completely incorrect for the swap contract field to be changed once set so I wanted to fix that issue properly.

}

actor.status.mtx.Lock()
actor.status.swap = contract
actor.status.swap = contract // swap should not be set already (handleInit should gate)
Copy link
Member Author

@chappjc chappjc May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were not confident that the searching semaphore is sufficient to prevent this as well as simultaneous coin waiters, we could add a final nil check here before assigning, but I don't think we need to code for every "should never happen" condition. Besides, I'm not sure what an appropriate way to handle that would be other than log an error or panic.

@chappjc chappjc added this to the 0.5 milestone Jun 9, 2022
Comment on lines +1943 to +1960
if res == wait.DontTryAgain {
stepInfo.actor.status.endSwapSearch() // contract now recorded
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't you already doing this in processInit? I prefer this pattern if there's a choice. Obviously a slight functional difference, but I don't know if using this pattern (waiting to endSwapSearch until after acks are sent) introduces any bottlenecks.

Copy link
Member Author

@chappjc chappjc Jun 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right. The code here is how I started, but then I felt weird about waiting to unblock until after sending the ack and counterparty audit requests. Lemme look again.

Copy link
Member Author

@chappjc chappjc Jun 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is way cleaner though here. It's why I was tempted to add that EndFun to Waiter mentioned here: https://github.com/decred/dcrdex/pull/1563/files#diff-c456beb63edc783dd7172356d814f0e516f0f7ec8a3ba4f3b180ff24464c2afdR40
I started down that path but it got messier for some reason.

I think this PR needs to be run with loadbot a bit.

Copy link
Member Author

@chappjc chappjc Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, well that's why I moved it into processInit... the darn tests use the ack as the indication to move on to the next request in the test (too soon if the semaphore is cleared after that). I'll try to figure out how to modify the tests instead of making the swapper code weird.
Testing internals is such a mess

@chappjc chappjc force-pushed the swapper-lock-contention branch from 4d82ea4 to 58bb3ad Compare June 16, 2022 14:10
chappjc added 3 commits June 16, 2022 11:46
Because transactions can advance swap status before procesBlock has a
chance to tryConfirmSwap and then unlockOrderCoins, there is a memory
leak when we see the "Prematurely received" warnings from processInit
and processRedeem.

This moves the unlockOrderCoins calls from revoke to deleteMatch, where
it is called for all matches, including completed ones. (see the
MatchComplete branch in processRedeem).
@chappjc chappjc force-pushed the swapper-lock-contention branch from 58bb3ad to 3f32fa0 Compare June 16, 2022 16:46
@chappjc chappjc merged commit 59a764d into decred:master Jun 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants