Skip to content

Conversation

paulhauner
Copy link
Member

@paulhauner paulhauner commented Feb 5, 2025

Issue Addressed

NA

Proposed Changes

Bumps the ethereum_ssz version, along with other crates that share the dep.

Primarily, this give us bitfields which can store 128 bytes on the stack before allocating, rather than 32 bytes (sigp/ethereum_ssz#38). The validator count has increase massively since we set it at 32 bytes, so aggregation bitfields (et al) now require a heap allocation. This new value of 128 should get us to ~2m active validators.

Additional Info

  • Bitfields have been moved from ssz_types to ethereum_ssz, so there's some non-substantial changes to error handling.
  • The MetaData struct in lighthouse_network holds two bitfields and therefore now has a larger stack size. I had to Arc it to avoid some clippy lints about disparate enum sizes.
  • An Attestation is now 96 bytes larger than it was before (32-128). You'll see some tests updated to expect these new sizes.

@paulhauner paulhauner added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Feb 10, 2025
@paulhauner paulhauner marked this pull request as ready for review February 10, 2025 02:48
@paulhauner paulhauner requested a review from jxs as a code owner February 10, 2025 02:48
@michaelsproul
Copy link
Member

michaelsproul commented Feb 10, 2025

We have a node running this branch on infra, are you planning to post some charts with metrics once it's been running for a little while?

I guess we're looking for lower total allocations per second, and maybe (indirectly) lower memory usage due to fragmentation. I don't imagine we get a noticeable reduction in CPU usage?

@paulhauner
Copy link
Member Author

I'm still collecting metrics in an attempt to show this has some effect. I've added some new jemalloc stats metrics. Hopefully I'll have something in the next couple of days.

@paulhauner
Copy link
Member Author

paulhauner commented Feb 19, 2025

Ok, so I have some metrics. The good news is that we can see a drop in allocations associated with this PR. The less good news is that I can't find any other metric that is affected by this change.

Below are metrics of stats.arenas.<i>.large.ndalloc (top) and stats.arenas.<i>.large.nmalloc (bottom) (details on these metrics).

Screenshot 2025-02-19 at 4 25 20 pm

What we can observe here is a reduction in the count of allocations. In particular, these are temporary allocations, as observed by the peaks in allocations and deallocations (I summed these in Grafana and they net-out to a flat line). Notably, I did not see any changes in:

  • Total memory allocated over time.
  • Memory footprint.
  • CPU usage.
  • Attestation verification times.

Therefore, we have managed to reduce the allocation count without being able to observe any tangible second-order benefit. Should we still merge this PR, I think yes. Primarily because reducing the allocation count feels like good hygiene to me. However, I am certainly open to opposing arguments.

Copy link
Member

@michaelsproul michaelsproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change has been well tested, and the code looks solid.

I am going to prioritise merging this as I find I keep needing it to test Milhouse-related changes for Holesky.

@michaelsproul
Copy link
Member

Just waiting on the back-merge to unblock CI:

@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Mar 10, 2025
mergify bot added a commit that referenced this pull request Mar 10, 2025
Copy link

mergify bot commented Mar 10, 2025

This pull request has been removed from the queue for the following reason: checks failed.

The merge conditions cannot be satisfied due to failing checks:

You may have to fix your CI before adding the pull request to the queue again.

If you want to requeue this pull request, you can post a @mergifyio requeue comment.

@michaelsproul
Copy link
Member

Spurious failure.

@michaelsproul
Copy link
Member

@mergify requeue

Copy link

mergify bot commented Mar 10, 2025

requeue

✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically

mergify bot added a commit that referenced this pull request Mar 10, 2025
@mergify mergify bot merged commit 8d1abce into sigp:unstable Mar 10, 2025
31 checks passed
mergify bot pushed a commit that referenced this pull request Jul 9, 2025
Update `c-kzg` from `v1` to `v2`. My motivation here is that `alloy-consensus` now uses `c-kzg` in `v2` and this results in a conflict when using lighthouse in combination with latest alloy. I tried also to disable the `czkg` feature in alloy, but the conflict persisted.

See here for the alloy update to `c-kzg v2`:  alloy-rs/alloy#2240

Error:
```
error: failed to select a version for `c-kzg`.
...
versions that meet the requirements `^1` are: 1.0.3, 1.0.2, 1.0.0

the package `c-kzg` links to the native library `ckzg`, but it conflicts with a previous package which links to `ckzg` as well:
package `c-kzg v2.1.0`
... which satisfies dependency `c-kzg = "^2.1"` of package `alloy-consensus v0.13.0`
... which satisfies dependency `alloy-consensus = "^0.13.0"` of package ...
...
```


  - Upgrade `alloy-consensus` to `0.14.0` and disable all default features
- Upgrade `c-kzg` to `v2.1.0`
- Upgrade `alloy-primitives` to `1.0.0`
- Adapt the code to the new API `c-kzg`
- There is now `NO_PRECOMPUTE` as my understand from https://github.com/ethereum/c-kzg-4844/pull/545/files we should use `0` here as `new_from_trusted_setup_no_precomp` does not precomp. But maybe it is misleading. For all other places I used `RECOMMENDED_PRECOMP_WIDTH` because `8` is matching the recommendation.
- `BYTES_PER_G1_POINT` and `BYTES_PER_G2_POINT` are no longer public in `c-kzg`
- I adapted two tests that checking for the `Attestation`  bitfield size. But I could not pinpoint to what has changed and why now 8 bytes less. I would be happy about any hint, and if this is correct. I found related a PR here: #6915
- Use same fields names, in json, as well as `c-kzg` and `rust_eth_kzg` for `g1_monomial`, `g1_lagrange`, and `g2_monomial`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimization Something to make Lighthouse run more efficiently. ready-for-merge This PR is ready to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants