Skip to content

Conversation

jxs
Copy link
Member

@jxs jxs commented Jun 28, 2025

Issue Addressed

Lighthouse is currently loggign a lot errors in the RPC behaviour whenever a response is received for a request_id that no longer exists in active_inbound_requests. This is likely due to a data race or timing issue (e.g., the peer disconnecting before the response is handled).
This PR addresses that by removing the error logging from the RPC layer. Instead, RPC::send_response now simply returns an Err, shifting the responsibility to the main service. The main service can then determine whether the peer is still connected and only log an error if the peer remains connected.
Thanks @ackintosh for helping debug!

@michaelsproul michaelsproul added ready-for-review The code is ready for review Networking v7.1.0 Post-Electra release labels Jun 29, 2025
@michaelsproul michaelsproul changed the title Error from RPC send_response when request doesn't exist on the active inbound requests. Error from RPC send_response when request doesn't exist on the active inbound requests Jun 29, 2025
michaelsproul added a commit that referenced this pull request Jun 29, 2025
… inbound requests (#7663)

Squashed commit of the following:

commit 0ce690c
Author: João Oliveira <hello@jxs.pt>
Date:   Fri Jun 27 22:50:11 2025 +0100

    Error from RPC `send_response`

    when request doesn't exist on the active inbound requests.
    And handle the error from the main service to check if it may be a data race or a critical bug

commit 522e00f
Author: Jimmy Chen <jimmy@sigmaprime.io>
Date:   Sat Jun 28 05:01:46 2025 +1000

    Fix incorrect `waker` update condition (#7656)

    This bug was first found and partially fixed by @VolodymyrBg in #7317 - this PR applies the same fix everywhere else.

    The old logic updated the waker when it already matched the context, and did nothing when it was stale:

    ```rust
    if waker.will_wake(cx.waker()) {
    self.waker = Some(cx.waker().clone());
    }
    ```

    This is the wrong way around. We only want to update the waker if it doesn't match the current context:

    ```rust
    if !waker.will_wake(cx.waker()) {
    self.waker = Some(cx.waker().clone());
    }
    ```

    I don't think we've ever noticed any issues, but it’s a subtle bug that could lead to missed wakeups.

commit 83cad25
Author: Jimmy Chen <jimmy@sigmaprime.io>
Date:   Sat Jun 28 04:21:17 2025 +1000

    Fix Rust 1.88 clippy errors & execution engine tests (#7657)

    Fix Rust 1.88 clippy errors.

commit 9b1f3ed
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Thu Jun 26 17:26:38 2025 -0700

    Add gossip check (#7652)

    N/A

      Add an additional gossip condition.

commit a0a6b93
Author: Daniel Knopik <107140945+dknopik@users.noreply.github.com>
Date:   Wed Jun 25 08:22:24 2025 +0200

    Do not compute sync selection proofs for the sync duty at the current slot (#7551)

commit 8e3c5d1
Author: chonghe <44791194+chong-he@users.noreply.github.com>
Date:   Wed Jun 25 13:33:17 2025 +0800

    Rust 1.89 compiler lint fix (#7644)

    Fix lints for Rust 1.89 beta compiler

commit 56b2d4b
Author: Eitan Seri-Levi <eserilev@ucsc.edu>
Date:   Tue Jun 24 09:29:10 2025 +0300

    Remove instrumenting log level  (#7636)

    #7155

      Theres some additional places we set instrumenting log levels that wasn't covered in #7620

commit fd643c3
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jun 23 23:11:46 2025 +1000

    Un-ignore EF test for v1.6.0-alpha.1 (#7632)

    Closes:

    - #7547

      Run the test that was previously ignored when we were between spec versions.

commit cef04ee
Author: chonghe <44791194+chong-he@users.noreply.github.com>
Date:   Mon Jun 23 16:37:49 2025 +0800

    Implement `validator_identities` Beacon API endpoint (#7462)

    * #7442
michaelsproul added a commit that referenced this pull request Jul 1, 2025
… inbound requests (#7663)

Squashed commit of the following:

commit 0ce690c
Author: João Oliveira <hello@jxs.pt>
Date:   Fri Jun 27 22:50:11 2025 +0100

    Error from RPC `send_response`

    when request doesn't exist on the active inbound requests.
    And handle the error from the main service to check if it may be a data race or a critical bug

commit 522e00f
Author: Jimmy Chen <jimmy@sigmaprime.io>
Date:   Sat Jun 28 05:01:46 2025 +1000

    Fix incorrect `waker` update condition (#7656)

    This bug was first found and partially fixed by @VolodymyrBg in #7317 - this PR applies the same fix everywhere else.

    The old logic updated the waker when it already matched the context, and did nothing when it was stale:

    ```rust
    if waker.will_wake(cx.waker()) {
    self.waker = Some(cx.waker().clone());
    }
    ```

    This is the wrong way around. We only want to update the waker if it doesn't match the current context:

    ```rust
    if !waker.will_wake(cx.waker()) {
    self.waker = Some(cx.waker().clone());
    }
    ```

    I don't think we've ever noticed any issues, but it’s a subtle bug that could lead to missed wakeups.

commit 83cad25
Author: Jimmy Chen <jimmy@sigmaprime.io>
Date:   Sat Jun 28 04:21:17 2025 +1000

    Fix Rust 1.88 clippy errors & execution engine tests (#7657)

    Fix Rust 1.88 clippy errors.

commit 9b1f3ed
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Thu Jun 26 17:26:38 2025 -0700

    Add gossip check (#7652)

    N/A

      Add an additional gossip condition.

commit a0a6b93
Author: Daniel Knopik <107140945+dknopik@users.noreply.github.com>
Date:   Wed Jun 25 08:22:24 2025 +0200

    Do not compute sync selection proofs for the sync duty at the current slot (#7551)

commit 8e3c5d1
Author: chonghe <44791194+chong-he@users.noreply.github.com>
Date:   Wed Jun 25 13:33:17 2025 +0800

    Rust 1.89 compiler lint fix (#7644)

    Fix lints for Rust 1.89 beta compiler

commit 56b2d4b
Author: Eitan Seri-Levi <eserilev@ucsc.edu>
Date:   Tue Jun 24 09:29:10 2025 +0300

    Remove instrumenting log level  (#7636)

    #7155

      Theres some additional places we set instrumenting log levels that wasn't covered in #7620

commit fd643c3
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jun 23 23:11:46 2025 +1000

    Un-ignore EF test for v1.6.0-alpha.1 (#7632)

    Closes:

    - #7547

      Run the test that was previously ignored when we were between spec versions.

commit cef04ee
Author: chonghe <44791194+chong-he@users.noreply.github.com>
Date:   Mon Jun 23 16:37:49 2025 +0800

    Implement `validator_identities` Beacon API endpoint (#7462)

    * #7442
@jxs jxs force-pushed the handle-rpc-send-response-errors branch from 0ce690c to c422ed2 Compare July 2, 2025 12:03
when request doesn't exist on the active inbound requests.
And handle the error from the main service to check if it may be a data race or a critical bug
@jxs jxs force-pushed the handle-rpc-send-response-errors branch from c422ed2 to 361ebf4 Compare July 2, 2025 12:03
michaelsproul added a commit that referenced this pull request Jul 3, 2025
… inbound requests (#7663)

Squashed commit of the following:

commit 361ebf4
Author: João Oliveira <hello@jxs.pt>
Date:   Fri Jun 27 22:50:11 2025 +0100

    Error from RPC `send_response`

    when request doesn't exist on the active inbound requests.
    And handle the error from the main service to check if it may be a data race or a critical bug
@michaelsproul michaelsproul changed the base branch from unstable to release-v7.1.0 July 6, 2025 22:59
Copy link

mergify bot commented Jul 8, 2025

Some required checks have failed. Could you please take a look @jxs? 🙏

@mergify mergify bot added waiting-on-author The reviewer has suggested changes and awaits thier implementation. ready-for-review The code is ready for review and removed ready-for-review The code is ready for review waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Jul 8, 2025
@michaelsproul michaelsproul mentioned this pull request Jul 9, 2025
michaelsproul added a commit that referenced this pull request Jul 9, 2025
… inbound requests (#7663)

Squashed commit of the following:

commit 093b659
Author: ackintosh <sora.akatsuki@gmail.com>
Date:   Wed Jul 9 07:30:59 2025 +0900

    Respect peer_disconnected when counting active requests

commit 29686e8
Author: João Oliveira <hello@jxs.pt>
Date:   Tue Jul 8 00:44:45 2025 +0100

    address Akihito review

commit 9482db0
Author: João Oliveira <hello@jxs.pt>
Date:   Mon Jul 7 16:15:21 2025 +0100

    do not remove on disconnect, rather mark the request

commit 361ebf4
Author: João Oliveira <hello@jxs.pt>
Date:   Fri Jun 27 22:50:11 2025 +0100

    Error from RPC `send_response`

    when request doesn't exist on the active inbound requests.
    And handle the error from the main service to check if it may be a data race or a critical bug
Copy link
Member

@michaelsproul michaelsproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging ahead of release ✅

@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Jul 9, 2025
mergify bot added a commit that referenced this pull request Jul 9, 2025
@mergify mergify bot merged commit 8b5ccac into sigp:release-v7.1.0 Jul 9, 2025
33 checks passed
mergify bot pushed a commit that referenced this pull request Jul 10, 2025
Post-Pectra release for tree-states hot 🎉

Already merged to `release-v7.1.0`:

- #7444
- #6750
- #7437
- #7133
- #7620
- #7663
ethDreamer pushed a commit to ethDreamer/lighthouse that referenced this pull request Jul 28, 2025
…ve inbound requests (sigp#7663)

Lighthouse is currently loggign a lot errors in the `RPC` behaviour whenever a response is received for a request_id that no longer exists in active_inbound_requests. This is likely due to a data race or timing issue (e.g., the peer disconnecting before the response is handled).
This PR addresses that by removing the error logging from the RPC layer. Instead, RPC::send_response now simply returns an Err, shifting the responsibility to the main service. The main service can then determine whether the peer is still connected and only log an error if the peer remains connected.
Thanks @ackintosh for helping debug!
ethDreamer pushed a commit to ethDreamer/lighthouse that referenced this pull request Jul 28, 2025
Post-Pectra release for tree-states hot 🎉

Already merged to `release-v7.1.0`:

- sigp#7444
- sigp#6750
- sigp#7437
- sigp#7133
- sigp#7620
- sigp#7663
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Networking ready-for-merge This PR is ready to merge. v7.1.0 Post-Electra release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants