Skip to content

Conversation

giorio94
Copy link
Member

Currently, [Table.Initialized] is affected by a potential pitfall, as it can return false (i.e., not initialized) and a closed watch channel if any write transaction (e.g., to populate an initial state, or register a changes iterator) completed before the first initializer is registered. Similarly, the same can happen if a new initializer is registered after that the previous ones have already been marked as done. In turn, this can cause consumers to misbehave, e.g., entering a busy loop, or incorrectly thinking that the table is initialized, if relying on the channel only.

Let's get this fixed by slightly reworking the internal handling of initialized, so that we correctly support multiple transitions for the same table between initialized and not initialized. The initializations status (and wait channel) returned by the [Table.Initialized] method is always consistent considering the snapshot referred to by the given [ReadTxn], and the presence of any initializer not marked done at that point.

In other words, every table starts in initialized state (as no initializers are initially present), and [Table.Initialized] returns true (i.e., initialized) and a closed channel. Once one or more initializers are registered, subsequent calls to [Table.Initialized] invoked with a [ReadTxn] referring to that snapshot return false and a wait channel, that gets closed once all initializers (also considering ones potentially added afterwards) get marked as done. If a new initializer is registered again, [Table.Initialized] would return false and a new watch channel, closed again once all initializers are marked done, and so on.

Additionally, let's also panic if initializer is marked done while the target table is not included in the write transaction, to make the failure explicit, instead of silently skipping the operation and not marking the table as initialized.

giorio94 added 2 commits July 25, 2025 14:59
Currently, [Table.Initialized] is affected by a potential pitfall,
as it can return false (i.e., not initialized) and a closed watch
channel if any write transaction (e.g., to populate an initial
state, or register a changes iterator) completed before the first
initializer is registered. Similarly, the same can happen if a
new initializer is registered after that the previous ones have
already been marked as done. In turn, this can cause consumers to
misbehave, e.g., entering a busy loop, or incorrectly thinking
that the table is initialized, if relying on the channel only.

Let's get this fixed by slightly reworking the internal handling
of initialized, so that we correctly support multiple transitions
for the same table between initialized and not initialized. The
initializations status (and wait channel) returned by the
[Table.Initialized] method is always consistent considering the
snapshot referred to by the given [ReadTxn], and the presence
of any initializer not marked done at that point.

In other words, every table starts in initialized state (as no
initializers are initially present), and [Table.Initialized]
returns true (i.e., initialized) and a closed channel. Once one
or more initializers are registered, subsequent calls to
[Table.Initialized] invoked with a [ReadTxn] referring to that
snapshot return false and a wait channel, that gets closed
once all initializers (also considering ones potentially added
afterwards) get marked as done. If a new initializer is registered
again, [Table.Initialized] would return false and a new watch
channel, closed again once all initializers are marked done,
and so on.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Similarly to the registration operation, let's panic if the target table
is not included in the write transaction, to make the failure explicit,
instead of silently skipping the operation and not marking the table
as initialized.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 requested a review from joamaki July 25, 2025 13:04
@giorio94 giorio94 marked this pull request as ready for review July 25, 2025 13:04
@giorio94 giorio94 requested a review from a team as a code owner July 25, 2025 13:04
Copy link

$ make
go build ./...
go: downloading go1.24.0 (linux/amd64)
go: downloading go.yaml.in/yaml/v3 v3.0.3
go: downloading github.com/cilium/hive v0.0.0-20250522123230-2946c4940f41
go: downloading golang.org/x/time v0.5.0
go: downloading github.com/spf13/cobra v1.8.0
go: downloading github.com/spf13/pflag v1.0.5
go: downloading github.com/cilium/stream v0.0.0-20240209152734-a0792b51812d
go: downloading github.com/liggitt/tabwriter v0.0.0-20181228230101-89fcab3d43de
go: downloading github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc
go: downloading github.com/go-viper/mapstructure/v2 v2.2.1
go: downloading go.uber.org/dig v1.17.1
go: downloading golang.org/x/term v0.16.0
go: downloading github.com/spf13/viper v1.18.2
go: downloading golang.org/x/sys v0.17.0
go: downloading golang.org/x/tools v0.17.0
go: downloading github.com/spf13/cast v1.6.0
go: downloading github.com/fsnotify/fsnotify v1.7.0
go: downloading github.com/mitchellh/mapstructure v1.5.0
go: downloading github.com/sagikazarmark/slog-shim v0.1.0
go: downloading github.com/spf13/afero v1.11.0
go: downloading github.com/subosito/gotenv v1.6.0
go: downloading github.com/hashicorp/hcl v1.0.0
go: downloading gopkg.in/ini.v1 v1.67.0
go: downloading github.com/magiconair/properties v1.8.7
go: downloading github.com/pelletier/go-toml/v2 v2.1.0
go: downloading gopkg.in/yaml.v3 v3.0.1
go: downloading golang.org/x/text v0.14.0
go test ./... -cover -vet=all -test.count 1
go: downloading github.com/stretchr/testify v1.8.4
go: downloading go.uber.org/goleak v1.3.0
go: downloading golang.org/x/exp v0.0.0-20240119083558-1b970713d09a
go: downloading github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2
ok  	github.com/cilium/statedb	22.338s	coverage: 83.3% of statements
ok  	github.com/cilium/statedb/index	0.004s	coverage: 28.7% of statements
ok  	github.com/cilium/statedb/internal	0.011s	coverage: 46.7% of statements
ok  	github.com/cilium/statedb/part	4.954s	coverage: 87.6% of statements
ok  	github.com/cilium/statedb/reconciler	0.291s	coverage: 89.7% of statements
	github.com/cilium/statedb/reconciler/benchmark		coverage: 0.0% of statements
	github.com/cilium/statedb/reconciler/example		coverage: 0.0% of statements
go test -race ./... -test.count 1
ok  	github.com/cilium/statedb	93.225s
ok  	github.com/cilium/statedb/index	1.014s
ok  	github.com/cilium/statedb/internal	1.031s
ok  	github.com/cilium/statedb/part	35.258s
ok  	github.com/cilium/statedb/reconciler	1.335s
?   	github.com/cilium/statedb/reconciler/benchmark	[no test files]
?   	github.com/cilium/statedb/reconciler/example	[no test files]
go test ./... -bench . -benchmem -test.run xxx
goos: linux
goarch: amd64
pkg: github.com/cilium/statedb
cpu: AMD EPYC 7763 64-Core Processor                
BenchmarkDB_WriteTxn_1-4                    	  522616	      2206 ns/op	    453254 objects/sec	    1448 B/op	      26 allocs/op
BenchmarkDB_WriteTxn_10-4                   	 1422374	       848.3 ns/op	   1178814 objects/sec	     513 B/op	       9 allocs/op
BenchmarkDB_WriteTxn_100-4                  	 1890986	       638.3 ns/op	   1566745 objects/sec	     430 B/op	       7 allocs/op
BenchmarkDB_WriteTxn_1000-4                 	 1758542	       683.7 ns/op	   1462534 objects/sec	     391 B/op	       7 allocs/op
BenchmarkDB_WriteTxn_100_SecondaryIndex-4   	  558531	      2114 ns/op	    473065 objects/sec	    1258 B/op	      37 allocs/op
BenchmarkDB_NewWriteTxn-4                   	 1493544	       802.7 ns/op	     544 B/op	       8 allocs/op
BenchmarkDB_NewReadTxn-4                    	549586946	         2.181 ns/op	       0 B/op	       0 allocs/op
BenchmarkDB_Modify-4                        	    1620	    750834 ns/op	   1331852 objects/sec	  446397 B/op	    8136 allocs/op
BenchmarkDB_GetInsert-4                     	    1485	    820951 ns/op	   1218099 objects/sec	  430992 B/op	    8136 allocs/op
BenchmarkDB_RandomInsert-4                  	    1756	    692685 ns/op	   1443657 objects/sec	  422520 B/op	    7136 allocs/op
BenchmarkDB_RandomReplace-4                 	     372	   3224713 ns/op	    310105 objects/sec	 2023823 B/op	   49161 allocs/op
BenchmarkDB_SequentialInsert-4              	    1803	    685794 ns/op	   1458163 objects/sec	  391441 B/op	    7110 allocs/op
BenchmarkDB_SequentialInsert_Prefix-4       	     394	   2989029 ns/op	    334557 objects/sec	 3750166 B/op	   58568 allocs/op
BenchmarkDB_Changes_Baseline-4              	    1400	    852142 ns/op	   1173513 objects/sec	  465672 B/op	   10214 allocs/op
BenchmarkDB_Changes-4                       	     796	   1510290 ns/op	    662125 objects/sec	  820053 B/op	   14420 allocs/op
BenchmarkDB_RandomLookup-4                  	   20970	     57179 ns/op	  17488798 objects/sec	       0 B/op	       0 allocs/op
BenchmarkDB_SequentialLookup-4              	   24801	     48396 ns/op	  20662678 objects/sec	       0 B/op	       0 allocs/op
BenchmarkDB_Prefix_SecondaryIndex-4         	    7036	    163923 ns/op	   6100441 objects/sec	  125114 B/op	    1031 allocs/op
BenchmarkDB_FullIteration_All-4             	    1128	   1100624 ns/op	  90857519 objects/sec	     320 B/op	      11 allocs/op
BenchmarkDB_FullIteration_Get-4             	     201	   5884235 ns/op	  16994562 objects/sec	       0 B/op	       0 allocs/op
BenchmarkDB_FullIteration_ReadTxnGet-4      	     206	   5840121 ns/op	  17122932 objects/sec	       0 B/op	       0 allocs/op
BenchmarkDB_PropagationDelay-4              	  559059	      2138 ns/op	        17.00 50th_µs	        29.00 90th_µs	        52.00 99th_µs	    1133 B/op	      22 allocs/op
BenchmarkWatchSet_4-4                       	 1323957	       906.8 ns/op	     500 B/op	       8 allocs/op
BenchmarkWatchSet_16-4                      	  493624	      2408 ns/op	    1700 B/op	       8 allocs/op
BenchmarkWatchSet_128-4                     	   61789	     19343 ns/op	   13372 B/op	       8 allocs/op
BenchmarkWatchSet_1024-4                    	    5890	    190780 ns/op	  105228 B/op	       8 allocs/op
PASS
ok  	github.com/cilium/statedb	33.765s
PASS
ok  	github.com/cilium/statedb/index	0.003s
PASS
ok  	github.com/cilium/statedb/internal	0.003s
goos: linux
goarch: amd64
pkg: github.com/cilium/statedb/part
cpu: AMD EPYC 7763 64-Core Processor                
Benchmark_Uint64Map_Random-4        	    1262	    910909 ns/op	   1097804 items/sec	 2811503 B/op	   11030 allocs/op
Benchmark_Uint64Map_Sequential-4    	    1291	    870652 ns/op	   1148564 items/sec	 2591252 B/op	   11749 allocs/op
Benchmark_Insert_RootOnlyWatch-4    	   10000	    116302 ns/op	   8598281 objects/sec	   90619 B/op	    2044 allocs/op
Benchmark_Insert-4                  	    7045	    177046 ns/op	   5648237 objects/sec	  206591 B/op	    3076 allocs/op
Benchmark_Modify-4                  	   10000	    107303 ns/op	   9319435 objects/sec	  121686 B/op	    1050 allocs/op
Benchmark_GetInsert-4               	    7952	    147317 ns/op	   6788102 objects/sec	  122070 B/op	    1050 allocs/op
Benchmark_Replace-4                 	27443616	        43.63 ns/op	  22919647 objects/sec	       0 B/op	       0 allocs/op
Benchmark_Replace_RootOnlyWatch-4   	27203329	        43.64 ns/op	  22915588 objects/sec	       0 B/op	       0 allocs/op
Benchmark_txn_1-4                   	 3275276	       366.6 ns/op	   2727460 objects/sec	     296 B/op	       5 allocs/op
Benchmark_txn_10-4                  	 8620956	       137.6 ns/op	   7266622 objects/sec	     113 B/op	       2 allocs/op
Benchmark_txn_100-4                 	10382816	       113.9 ns/op	   8777195 objects/sec	     130 B/op	       2 allocs/op
Benchmark_txn_1000-4                	 9359132	       127.9 ns/op	   7817623 objects/sec	     126 B/op	       2 allocs/op
Benchmark_txn_delete_1-4            	 1529714	       777.5 ns/op	   1286250 objects/sec	    3272 B/op	       8 allocs/op
Benchmark_txn_delete_10-4           	 7223368	       164.6 ns/op	   6073703 objects/sec	     368 B/op	       2 allocs/op
Benchmark_txn_delete_100-4          	11446683	       104.4 ns/op	   9581052 objects/sec	      82 B/op	       1 allocs/op
Benchmark_txn_delete_1000-4         	12150626	        97.73 ns/op	  10232422 objects/sec	      28 B/op	       1 allocs/op
Benchmark_Get-4                     	   34164	     35165 ns/op	  28437145 objects/sec	       0 B/op	       0 allocs/op
Benchmark_Iterate-4                 	  174950	      6929 ns/op	 144322253 objects/sec	     104 B/op	       4 allocs/op
Benchmark_Hashmap_Insert-4          	   14931	     80266 ns/op	  12458631 objects/sec	   74265 B/op	      20 allocs/op
Benchmark_Hashmap_Get_Uint64-4      	  140240	      8537 ns/op	 117138914 objects/sec	       0 B/op	       0 allocs/op
Benchmark_Hashmap_Get_Bytes-4       	  110755	     10835 ns/op	  92294838 objects/sec	       0 B/op	       0 allocs/op
PASS
ok  	github.com/cilium/statedb/part	26.813s
PASS
ok  	github.com/cilium/statedb/reconciler	0.004s
?   	github.com/cilium/statedb/reconciler/benchmark	[no test files]
?   	github.com/cilium/statedb/reconciler/example	[no test files]
go run ./reconciler/benchmark -quiet
1000000 objects reconciled in 2.45 seconds (batch size 1000)
Throughput 408644.92 objects per second
1225MB total allocated, 6011105 in-use objects, 384MB bytes in use

@joamaki joamaki merged commit ac88a9b into main Jul 28, 2025
1 check passed
@joamaki joamaki deleted the pr/giorio94/initializers-v2 branch July 28, 2025 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants