Skip to content

Conversation

paulhauner
Copy link
Member

Issue Addressed

NA

Proposed Changes

Fixes an issue introduced in #3574 where I erroneously assumed that a crossbeam_channel multiple receiver queue was a broadcast queue. This is incorrect, each message will be received by only one receiver. The effect of this mistake is these logs:

Sep 20 06:56:17.001 INFO Synced                                  slot: 4736079, block: 0xaa8a…180d, epoch: 148002, finalized_epoch: 148000, finalized_root: 0x2775…47f2, exec_hash: 0x2ca5…ffde (verified), peers: 6, service: slot_notifier
Sep 20 06:56:23.237 ERRO Unable to validate attestation          error: CommitteeCacheWait(RecvError), peer_id: 16Uiu2HAm2Jnnj8868tb7hCta1rmkXUf5YjqUH1YPj35DCwNyeEzs, type: "aggregated", slot: Slot(4736047), beacon_block_root: 0x88d318534b1010e0ebd79aed60b6b6da1d70357d72b271c01adf55c2b46206c1

Additional Info

NA

@paulhauner paulhauner added ready-for-review The code is ready for review v3.1.2 Release after v3.1.0 (formerly v3.1.1) labels Sep 21, 2022
@paulhauner
Copy link
Member Author

paulhauner commented Sep 21, 2022

For some context, I made my own implementation of a broadcast queue since I struggled to find a reliable non-async existing implementation. The closest I came was bus, however it comes with a risk of busy-waits.

Notably, what makes our requirements so simple is that we only need a one-shot queue. Because of this I figured it would be easier to just implement our own rather than deal with the complexities of a multi-message queue written by someone else.

Copy link
Member

@michaelsproul michaelsproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really elegant implementation of a useful primitive!

I think we're good to merge this pretty much as-is and get it running on testnets

}

/// The sending pair of the `oneshot` channel.
pub struct Sender<T>(Arc<MutexCondvar<T>>, Option<Arc<()>>);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Arc<()> could maybe be an AtomicBool, but that would require specifying memory orderings and stuff, so I think the Arc<()> is probably more readable (and should be just as efficient at runtime).

impl<T: Clone> Receiver<T> {
/// Check to see if there is a message to be read *without* blocking/waiting.
pub fn try_recv(&self) -> Result<Option<T>, Error> {
match &*self.0.mutex.lock() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should use try_lock and return early if the mutex can't be obtained? That would better satisfy the comment about not blocking.

Or maybe we just clarify the doc comment that this method can block on the mutex, but not to wait for a value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I've clarified the documentation since I think returning early with a busy mutex would be challenging to handle downstream with committee promises.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current logic would handle it gracefully by returning the promise from CommitteeCache::get, and then calling wait on it? But I'm also happy to keep things simple as is

@paulhauner
Copy link
Member Author

bors r+

bors bot pushed a commit that referenced this pull request Sep 21, 2022
## Issue Addressed

NA

## Proposed Changes

Fixes an issue introduced in #3574 where I erroneously assumed that a `crossbeam_channel` multiple receiver queue was a *broadcast* queue. This is incorrect, each message will be received by *only one* receiver. The effect of this mistake is these logs:

```
Sep 20 06:56:17.001 INFO Synced                                  slot: 4736079, block: 0xaa8a…180d, epoch: 148002, finalized_epoch: 148000, finalized_root: 0x2775…47f2, exec_hash: 0x2ca5…ffde (verified), peers: 6, service: slot_notifier
Sep 20 06:56:23.237 ERRO Unable to validate attestation          error: CommitteeCacheWait(RecvError), peer_id: 16Uiu2HAm2Jnnj8868tb7hCta1rmkXUf5YjqUH1YPj35DCwNyeEzs, type: "aggregated", slot: Slot(4736047), beacon_block_root: 0x88d318534b1010e0ebd79aed60b6b6da1d70357d72b271c01adf55c2b46206c1
```

## Additional Info

NA
@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Sep 21, 2022
@bors bors bot changed the title Impl oneshot_broadcast for committee promises [Merged by Bors] - Impl oneshot_broadcast for committee promises Sep 21, 2022
@bors bors bot closed this Sep 21, 2022
bors bot pushed a commit that referenced this pull request Sep 21, 2022
## Issue Addressed

NA

## Proposed Changes

Fixes an issue found during testing with #3595.

## Additional Info

NA
bors bot pushed a commit that referenced this pull request Sep 21, 2022
## Issue Addressed

NA

## Proposed Changes

Fixes an issue found during testing with #3595.

## Additional Info

NA
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
## Issue Addressed

NA

## Proposed Changes

Fixes an issue introduced in sigp#3574 where I erroneously assumed that a `crossbeam_channel` multiple receiver queue was a *broadcast* queue. This is incorrect, each message will be received by *only one* receiver. The effect of this mistake is these logs:

```
Sep 20 06:56:17.001 INFO Synced                                  slot: 4736079, block: 0xaa8a…180d, epoch: 148002, finalized_epoch: 148000, finalized_root: 0x2775…47f2, exec_hash: 0x2ca5…ffde (verified), peers: 6, service: slot_notifier
Sep 20 06:56:23.237 ERRO Unable to validate attestation          error: CommitteeCacheWait(RecvError), peer_id: 16Uiu2HAm2Jnnj8868tb7hCta1rmkXUf5YjqUH1YPj35DCwNyeEzs, type: "aggregated", slot: Slot(4736047), beacon_block_root: 0x88d318534b1010e0ebd79aed60b6b6da1d70357d72b271c01adf55c2b46206c1
```

## Additional Info

NA
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
## Issue Addressed

NA

## Proposed Changes

Fixes an issue found during testing with sigp#3595.

## Additional Info

NA
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge This PR is ready to merge. v3.1.2 Release after v3.1.0 (formerly v3.1.1)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants