Skip to content

Store frequently accessed header metadata terms in a cache #5619

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nickva
Copy link
Contributor

@nickva nickva commented Aug 6, 2025

Previously, on every write and _all_docs and other call the props term was always loaded from disk, decompressed and run through binary_to_term. The term is small, and likely will be in the page cache, but still we'd have to go back to the disk to fetch it, in the worst case.

To fix it, load the terms once at startup and cache them into memory. This way we get the best of both worlds: a small header record, since we're using a pointer to a term, and fast interactive operations, since we don't have to do extra reads and deserializations.

Technically we could cache the security_ptr term as well, but it is currently cached in the #db{} record so would require more effort, especially since we access those across the cluster via fabric_util:get_db() call.

@nickva
Copy link
Contributor Author

nickva commented Aug 6, 2025

To notice the difference between main and PR can also try this trace pattern:

MAIN

http $DB/db/_all_docs'?limit=0'

=>

> recon_trace:calls({couch_file, pread_term, fun([_, _]) -> message(caller_line()) end}, 1000).
1

19:4:44.127475 <0.5659.0> couch_file:pread_term(<0.1778.0>, 0) {couch_bt_engine,get_header_term,3,{"src/couch_bt_engine.erl",1198}}

19:4:44.128057 <0.5659.0> couch_file:pread_term(<0.1778.0>, 0) {couch_bt_engine,get_header_term,3,{"src/couch_bt_engine.erl",1198}}

19:4:44.128553 <0.5659.0> couch_file:pread_term(<0.1778.0>, 94860) {couch_btree,get_node,2,{"src/couch_btree.erl",474}}

19:4:44.129022 <0.5659.0> couch_file:pread_term(<0.1778.0>, 49484) {couch_btree,get_node,2,{"src/couch_btree.erl",474}}

couch_btree:get_node/2 are expected but get_header_terms should be cached...

WITH PR

> recon_trace:calls({couch_file, pread_term, fun([_, _]) -> message(caller_line()) end}, 1000).
1

19:8:11.302801 <0.4886.0> couch_file:pread_term(<0.949.0>, 86369) {couch_btree,get_node,2,{"src/couch_btree.erl",474}}

19:8:11.303378 <0.4886.0> couch_file:pread_term(<0.949.0>, 82175) {couch_btree,get_node,2,{"src/couch_btree.erl",474}}

Just couch_btree:get_node/2 calls, as expected.

@nickva
Copy link
Contributor Author

nickva commented Aug 6, 2025

Random thought for a future PR, maybe cache the root nodes from the b-trees as well (with a limit on total size)...

@rnewson
Copy link
Member

rnewson commented Aug 7, 2025

the security object and props are stored in the #db header when we open a db and kept in memory (in couch_db_updater's gen_server state).

@nickva nickva force-pushed the cache-props-and-other-header-terms branch 2 times, most recently from 23b3053 to f4cbd25 Compare August 7, 2025 14:46
@nickva
Copy link
Contributor Author

nickva commented Aug 7, 2025

the security object and props are stored in the #db header when we open a db and kept in memory (in couch_db_updater's gen_server state).

I removed the security_ptr for now as it is cached in the couch_db_updater, but left a note in the comments, that will require a larger change, especially that we read that one from db records across nodes.

Comment on lines -18 to +17
fsync_options_deprecated,
cache = #{},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fsync_options_deprecated is a deprecated field and I haven't found any other
source code related to it, so I guess replacing it with cache should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, exactly, that one is from when we removed the fsync options (fsync before header, after header, or both, now we always sync both)

Previously, on every write and _all_docs and other call the props term was
always loaded from disk, decompressed and run through binary_to_term. The term
is small, and likely will be in the page cache, but still we'd have to go back
to the disk to fetch it, in the worst case.

To fix it, load the terms once at startup and cache them into memory. This way
we get the best of both worlds: a small header record, since we're using a
pointer to a term, and fast interactive operations, since we don't have to do
extra reads and deserializations.

Technically we could cache the security_ptr term as well, but it is currently
cached in the #db{} record so would require more effort, especially since we
access those across the cluster via fabric_util:get_db() call.
@nickva nickva force-pushed the cache-props-and-other-header-terms branch from f4cbd25 to cb46dc1 Compare August 7, 2025 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants