@taratorio taratorio commented Feb 26, 2025

integration tests caught several bugs with the initial implementation

gist of testing approach:

  • we have an integration test which starts up Erigon as EL only
  • we drive Erigon with a MockCl via the engine api json rpc
  • we interact with Erigon via the "eth_" namespace json rpc api (i.e. submit transactions, deploy shutter contracts, check transaction inclusion, etc.)
  • this way we can build blocks and verify logic
  • this is a black box test where we feed Erigon inputs via the public api surface and check the end result again via the public api surface

test scenarios:

  • deploy initial keyper set
  • build a block with 1 shutter txn and 1 non-shutter txn
  • build a block after new eon change
  • built block does not exceed shutter encryption txn gas limit
  • built block does not include blob txns (if they accidentally make their way into the sequencer contract) - pending
  • build block after an unwind has happened (to cover unwind logic in our encrypted txn pool) - pending (probably in a follow up PR)

bug fixes caught and fixed by this:

  1. block tracker waiting timeout - fixed in this PR
[WARN] [02-26|21:59:15.010] Failed to build a block                  err="[3/4 MiningExecution] issue while waiting for parent block 14520965: context deadline exceeded"
  2. local rpc subscription panic - fixed in this PR
[EROR] [02-27|19:45:55.076] catch panic                              err="can't Notify before subscription is created" stack="[log_panic.go:41 panic.go:785 subscription.go:137 eth_filters.go:289 asm_amd64.s:1700]"
  3. recent logs notifications not sent via the "flush extending fork" path - fix merged in a separate PR: execution: fix missing log notifications when flushing extending fork #14192
  4. log notifications are not sent upon unwind - PENDING (will provide a better solution in a follow-up PR as this got quite big)
  5. issue with deployment of new eons and tracking of future eons - PENDING (will provide a better solution in a follow-up PR as this got quite big)

@taratorio taratorio changed the title txnprovider/shutter: fix block tracker txnprovider/shutter: fix block tracker, rpc subscription local notifier Feb 27, 2025
taratorio added a commit that referenced this pull request Mar 18, 2025
used in #13983
taking a cohesive unit of logic out of the bigger PR for ease of
reviewing
adds a json rpc client for interacting with the engine api

the gist is:
- we have an integration test which starts up Erigon as EL only
- we drive Erigon with a MockCl
- in order to be able to drive Erigon we need to access its engine api
json rpc (for this we need an engine api client)
- analogous to that, we need a json rpc api client for the rpcdaemon
publicly exposed apis (e.g. the "eth_" namespace, "debug_", etc.)
- this is a black box test where we feed Erigon inputs via the public
api surface and check the end result again via the public api surface
taratorio added a commit that referenced this pull request Mar 18, 2025
…#14192)

found in #13983

we weren't getting any new log notifications (rpc log subscriptions api)
when the EL was going via the `e.forkValidator.FlushExtendingFork` code
path which does an in-memory flush of the previously cached execution of
`EthereumExecutionModule.ValidateChain`
@taratorio taratorio marked this pull request as ready for review March 19, 2025 09:45
```go
	}
	iterations++
	if iterations > 1024 {
		return 0, nil, fmt.Errorf("failed to find a free port after %d iterations", iterations)
	}
```
@somnathb1 (Contributor) commented Mar 19, 2025
Will this ever hit? If it does, doesn't it mean it's being used in the wrong way? I think the statefulness of this method might not be a good thing here

@taratorio (Member, Author) replied:
I've never hit it as of now; it assumes it won't take us more than 1024 tries to find an unconsumed port (so if, say, 1 test uses 10 ports, then 100 tests running at the same time could potentially cause us to hit this).

You are right though. Maybe I can switch to an approach where we have 1 global atomic port counter which we increment: we try the port, and if it is free we give it back to the port consumer; if not, we increment and try the next port, and so on. It can be circular, so if it ever reaches port 65535 it starts from 1024 again. It still needs a global atomic counter, but it is a bit less stateful and less likely to be exhausted, I guess. Let me try.

@somnathb1 (Contributor) commented Mar 20, 2025

My concern is that the "consumed" ports are not actually consumed, just reserved in the map. Maybe the ports were closed but the map wasn't updated, or the ports were never opened. But that's probably an edge case not applicable here. One way would be to just use a free port, just in time, and if it didn't get opened, try again (just in time).

The code to get a free port (stateless) and the management of the map (a global test counter used for convenience so tests don't overlap each other) could be kept segregated. Your randomly returned port might just be 8080, and it could be in use by another service.

@taratorio (Member, Author) replied:
ok, what about this: 98c2671

it is close to what you are suggesting as "request on demand" and still caters for the problem I mentioned here #13983 (comment) (we just need to make sure all big tests like this that open ports in our repo use this approach - right now there are only a few tests which open ports, so we should be fine - I may also move this func to a top-level package in a follow-up PR so it is visible)

Contributor:
Looks good. And yes, I think that will fix a few of those tests that fail because of insistence on specific ports. I made a mental note to tackle one on the next failure.

@taratorio taratorio merged commit 18af7e1 into main Mar 21, 2025
13 checks passed
@taratorio taratorio deleted the shutter-fixes-to-block-tracker branch March 21, 2025 08:26
Giulio2002 pushed a commit that referenced this pull request Apr 12, 2025
Giulio2002 added a commit that referenced this pull request Apr 14, 2025
…xtending fork… (#14578)

@taratorio you forgot to cherry-pick this

---------

Co-authored-by: milen <94537774+taratorio@users.noreply.github.com>