Description
Context
I'm running a large routing node with LND using the new PostgreSQL backend.
After migrating my node from bbolt to Postgres with the latest version of lndinit, I'm seeing constantly high CPU usage (~400% on a 6-core machine) and extremely frequent read operations caused by queries on the decayedlogdb_kv table. This table is also very large; the size breakdown below shows 11 GB for its indexes alone. I don't know why it is that big; perhaps the bulk was added during the lndinit migration, but I no longer have the logs from that run.
schemaname | table_name | indexes_size
------------+-----------------------------+--------------
public | decayedlogdb_kv | 11 GB
public | channeldb_kv | 2255 MB
public | invoices | 57 MB
public | invoice_payment_hashes | 20 MB
public | invoice_features | 19 MB
public | invoice_htlcs | 15 MB
public | towerclientdb_kv | 8216 kB
public | walletdb_kv | 3072 kB
public | invoice_events | 208 kB
public | macaroondb_kv | 64 kB
public | invoice_htlc_custom_records | 40 kB
public | towerserverdb_kv | 32 kB
public | amp_sub_invoices | 24 kB
public | amp_sub_invoice_htlcs | 24 kB
public | invoice_event_types | 16 kB
public | migration_tracker | 16 kB
public | invoice_sequences | 16 kB
public | schema_migrations | 16 kB
The high CPU load appears to be caused by massive numbers of repetitive queries filtering on parent_id, which do not properly leverage the existing partial index on (parent_id, key).
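For reference, the exact index definitions the planner has to work with on this table can be listed with a standard catalog query (nothing LND-specific):
-- List every index on the hot table together with its full definition.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename = 'decayedlogdb_kv';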
System Info
- LND: latest version (v0.19.0-beta)
- PostgreSQL: 17
- native SQL invoices enabled and migrated
- RAM: 16 GB
- CPU: 6 cores, average usage 30–60% on the Proxmox scale (compared to an average of 8%, up to 15% under high load, with bbolt), or around 50% on the htop scale
- Shared buffers: 8 GB
Index Size
lndb=# SELECT
pg_size_pretty(pg_total_relation_size('public.decayedlogdb_kv')) AS total_size,
pg_size_pretty(pg_indexes_size('public.decayedlogdb_kv')) AS index_size,
pg_size_pretty(pg_relation_size('public.decayedlogdb_kv')) AS data_size;
total_size | index_size | data_size
------------+------------+-----------
21 GB | 11 GB | 9744 MB
(1 row)
Should the index be bigger than the actual data?
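To break the 11 GB down per individual index (the numbers above are relation-level aggregates), something along these lines should list each index on the table with its size and scan count:
-- Per-index size and usage counters for the hot table.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan,
       idx_tup_read
FROM pg_stat_user_indexes
WHERE relname = 'decayedlogdb_kv';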
Example problematic query (repeated thousands of times per second):
lndb=# SELECT key, value
FROM decayedlogdb_kv
WHERE parent_id = 149208614
ORDER BY key
LIMIT 1;
key | value
--------------------------------------------+------------
\x00000bfe0e6696137652a9a475dbb8c0538a045b | \x000db65b
(1 row)
lndb=#
Observations
Table decayedlogdb_kv has ~150 million rows, index size: 11 GB
Constant insert + read pattern
EXPLAIN (ANALYZE, BUFFERS)
SELECT key, value
FROM decayedlogdb_kv
WHERE parent_id = 149208614
ORDER BY key
LIMIT 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.70..1.70 rows=1 width=18) (actual time=45.457..45.457 rows=1 loops=1)
Buffers: shared hit=3766
-> Sort (cost=1.70..1.70 rows=1 width=18) (actual time=45.456..45.456 rows=1 loops=1)
Sort Key: key
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=3766
-> Index Scan using decayedlogdb_kv_p on decayedlogdb_kv (cost=0.57..1.69 rows=1 width=18) (actual time=0.054..29.719 rows=232662 loops=1)
Index Cond: (parent_id = 149208614)
Buffers: shared hit=3763
Planning:
Buffers: shared hit=58
Planning Time: 0.325 ms
Execution Time: 45.469 ms
(13 rows)
Could the high CPU usage in the PostgreSQL backend be caused by inefficient query planning on the decayedlogdb_kv table?
Specifically, is it possible that PostgreSQL often ignores the optimal index and falls back to a less efficient plan, reading and sorting hundreds of thousands of rows just to fetch a single result? This seems to happen constantly on large routing nodes and may explain the extreme load.
Has this been analyzed or optimized already?
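One way to test this directly (a sketch on my side, not something LND ships) would be to add a plain, non-partial b-tree on (parent_id, key) to a copy of the database and re-run the EXPLAIN. If the sort node disappears and the buffer count drops, the index shape is the problem:
-- Hypothetical test index, not part of the LND schema; the name is made up.
-- CONCURRENTLY avoids blocking writes but cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY decayedlogdb_kv_parent_key_test
    ON decayedlogdb_kv (parent_id, key);
-- With a (parent_id, key) b-tree the planner can walk the index in key order
-- and stop at the first row instead of sorting ~230k rows per call.
EXPLAIN (ANALYZE, BUFFERS)
SELECT key, value
FROM decayedlogdb_kv
WHERE parent_id = 149208614
ORDER BY key
LIMIT 1;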
More stats:
relname | idx_scan | idx_tup_fetch | total_reads
----------------+--------------+-------------------+-------------
decayedlogdb_kv | 456,924,012 | 462,423,861,984 | 6.6 billion
→ This accounts for 95%+ of total PostgreSQL I/O
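The counters above come from pg_stat_user_tables; a query along these lines reproduces them (treating total_reads as heap blocks hit plus read, which is my interpretation of that figure):
-- Index scan counters plus total heap block accesses for the hot table.
SELECT s.relname,
       s.idx_scan,
       s.idx_tup_fetch,
       io.heap_blks_hit + io.heap_blks_read AS total_reads
FROM pg_stat_user_tables s
JOIN pg_statio_user_tables io USING (relid)
WHERE s.relname = 'decayedlogdb_kv';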
Hit Ratio of the tables:
lndb=# SELECT
schemaname,
relname AS table_name,
heap_blks_hit,
heap_blks_read,
CASE
WHEN (heap_blks_hit + heap_blks_read) = 0 THEN NULL
ELSE ROUND(100.0 * heap_blks_hit / (heap_blks_hit + heap_blks_read), 2)
END AS hit_ratio
FROM
pg_statio_user_tables
ORDER BY
hit_ratio DESC NULLS LAST;
schemaname | table_name | heap_blks_hit | heap_blks_read | hit_ratio
------------+-----------------------------+---------------+----------------+-----------
public | towerclientdb_kv | 115920334 | 2465 | 100.00
public | schema_migrations | 12 | 0 | 100.00
public | macaroondb_kv | 599226 | 27 | 100.00
public | invoice_htlc_custom_records | 322927 | 48 | 99.99
public | walletdb_kv | 3117766 | 1257 | 99.96
public | invoice_events | 34014 | 38 | 99.89
public | invoice_event_types | 7836 | 9 | 99.89
public | channeldb_kv | 7406342580 | 8896397 | 99.88
public | decayedlogdb_kv | 6603038283 | 9030643 | 99.86
public | invoice_sequences | 1618 | 8 | 99.51
public | invoice_features | 1102001 | 6663 | 99.40
public | invoice_htlcs | 664875 | 5322 | 99.21
public | invoice_payment_hashes | 746256 | 6388 | 99.15
public | invoices | 1447343 | 32033 | 97.83
public | migration_tracker | 7 | 7 | 50.00
public | amp_sub_invoices | 0 | 0 |
public | towerserverdb_kv | 0 | 0 |
public | amp_sub_invoice_htlcs | 0 | 0 |
(18 rows)
Suggestions / Questions
- Could LND use a non-partial index on (parent_id, key) in addition to the partial one?
- Could LND avoid sorting by using index order directly or via a query rewrite?
- Could these lookups be cached or batched internally?
- Could this table be compacted or purged over time? See [bug]: garbage collect batch-replay bucket in sphinxreplay.db #7107 (comment). (A sketch for checking how rows are distributed across parent buckets follows this list.)
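As a rough check for the purge question above, the row distribution across parent buckets could be inspected like this (a one-off diagnostic; it is a full scan over ~150 million rows, so expect it to be slow):
-- One-off diagnostic: which parent buckets hold the bulk of the rows.
SELECT parent_id, count(*) AS row_count
FROM decayedlogdb_kv
GROUP BY parent_id
ORDER BY row_count DESC
LIMIT 10;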
Additional system info
VACUUM FULL did not help at all on decayedlogdb_kv.
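Since VACUUM FULL also rebuilds the table's indexes, plain index bloat probably does not explain the 11 GB. To double-check, the leaf density of the index can be inspected with the pgstattuple extension (assuming the contrib package is installed; the index name is taken from the EXPLAIN plan above):
-- Requires the pgstattuple contrib extension.
CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- avg_leaf_density close to 90% means the b-tree is tightly packed (default fillfactor);
-- a much lower value would point to bloat that a rebuild could reclaim.
SELECT index_size, avg_leaf_density, leaf_fragmentation
FROM pgstatindex('public.decayedlogdb_kv_p');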
My channel.db was 6.8 GB before the migration, my sphinxreplay.db around 3.8 GB.
channeldb_kv is also heavily used (as expected) but not causing the same performance issues. It's the repeated access pattern on decayedlogdb_kv that is clearly the bottleneck on larger nodes.
Let me know if I can help further. I can provide full EXPLAIN ANALYZE, table schemas, and stat snapshots if needed.
Thanks for all the great work – this PostgreSQL backend is promising, but currently not usable on large nodes in production due to this bottleneck. I used to see a CPU load of about 25% on the 6-core machine, versus roughly 450% now.
I run about 50 channels and rebalance heavily.
Before (bbolt) vs. after (Postgres):
I followed the PG optimizations suggested in this gist:
https://gist.github.com/djkazic/526fa3e032aea9578997f88b45b91fb9
Possibly relevant issue: