@geky geky commented May 29, 2025

                  ####                                 
               ######## #####    #####                 
             ################# ########  ####          
       #### ### ############################## ####    
     #############-#### ##_#########################   
    ########'#########_\'|######_####################  
     ##### ##########_.. \ .'#_._#### #/#_##########   
   ################_    \ v .'_____#_.'.'_ ########### 
  ################_ "'. |  .""  _________ '"-##########
  ########"#####..--.  /    .-"'|  | |   '"-####'######
    #############    \     /  .---.|_|_    ########### 
     ###########     |    /  -|        |-   ########   
                     )    |  -|littlefs|-              
                     |    |  -|   v3   |-              
                     |    |   '--------'               
                     |   ||     ' ' '                  
       -------------/-/---\\----------------------     

Note: v3-alpha discussion (#1114)

Unfortunately GitHub made a complete mess of the PR discussion. To try to salvage things, please use #1114 for new comments. Feedback/criticism are welcome and immensely important at this stage.

Table of contents ^

  1. Hello!
  2. Wait, a disk breaking change?
  3. What's new?
    1. Implemented
      1. Efficient metadata compaction
      2. Efficient random writes
      3. Better logging: No more sync-padding issues
      4. Efficient inline files, no more RAM constraints
      5. Independent file caches
      6. Easier logging APIs: lfs3_file_fruncate
      7. Sparse files
      8. Efficient file name lookup
      9. A simpler/more robust metadata tree
      10. A well-defined sync model
      11. Stickynotes, no more 0-sized files
      12. A new and improved compat flag system
      13. Error detection! - Global-checksums
      14. Better traversal APIs
      15. Incremental GC
      16. Better recovery from runtime errors
      17. Standard custom attributes
      18. More tests!
      19. Simple key-value APIs
    2. Planned
      1. Efficient block allocation, via optional on-disk block-map (bmap)
      2. Bad block tracking
      3. Pre-erased block tracking
      4. Error correction! - Metadata redundancy
      5. Error correction! - Data redundancy
      6. Transparent block deduplication
    3. Stretch goals
      1. lfs3_migrate for v2->v3 migration
      2. 16-bit and 64-bit variants
      3. Config API rework
      4. Block device API rework
      5. Custom attr API rework
      6. Alternative (cheaper) write-strategies
      7. Advanced file tree operations
      8. Advanced file copy-on-write operations
      9. Reserved blocks to prevent CoW lockups
      10. Metadata checks to prevent metadata lockups
      11. Integrated block-level ECC
      12. Disk-level RAID
    4. Out-of-scope (for now)
      1. Alternative checksums
      2. Feature-limited configurations for smaller code/stack sizes
      3. lfs3_file_openat for dir-relative APIs
      4. lfs3_file_openn for non-null-terminated-string APIs
      5. Transparent compression
      6. Filesystem shrinking
      7. High-level caches
      8. Symbolic links
      9. 100% line/branch coverage
  4. Code/stack size
    1. Runtime error recovery
    2. B-tree flexibility
    3. Traversal inversion
  5. Benchmarks
    1. Simulated benchmarks
      1. Linear writes
      2. Random writes
      3. Logging
  6. Funding
  7. Next steps

Hello! ^

Hello everyone! As some of you may have already picked up on, there's been a large body of work fermenting in the background for the past couple of years. Originally started as an experiment to try to solve littlefs's $O(n^2)$ metadata compaction, this branch eventually snowballed into more-or-less a full rewrite of the filesystem from the ground up.

There are still several chunks of planned work left, but now that this branch has reached on-disk feature parity with v2, there's nothing really stopping it from being merged eventually.

So I figured it's a good time to start calling this v3, and put together a public roadmap.

NOTE: THIS WORK IS INCOMPLETE AND UNSTABLE

Here's a quick TODO list of planned work before stabilization. More details below:

  • Test framework rework (merged, mostly)
  • Rbyds
  • B-trees
  • M-tree
  • B-shrubs
  • Fruncate
  • Sync model rework
  • Stickynotes
  • Traversal API rework
  • Incremental GC
  • Global-checksums
  • Key-value APIs
  • On-disk block-map (in progress)
  • Metadata redundancy
  • Data redundancy
  • Document, document, document
  • Stretch goals

This work may continue to break the on-disk format.

That being said, I highly encourage others to experiment with v3 where possible. Feedback is welcome, and immensely important at this stage. Once it's stabilized, it's stabilized.

To help with this, the current branch uses v0.0 as its on-disk version to indicate that it is experimental. When it is eventually released, v3 will reject this version and fail to mount.

Unfortunately, the API will be under heavy flux during this period.

A note on benchmarking: The on-disk block-map is key for scalable allocator performance, so benchmarks at this stage need to be taken with a grain of salt when many blocks are involved. Please refer to this version as "v3 (no bmap)" or something similar in any published benchmarks until this work is completed.

Wait, a disk breaking change? ^

Yes. v3 breaks disk compatibility from v2.

I think this is a necessary evil. Attempting to maintain backwards compatibility has a heavy cost:

  1. Development time - The littlefs team is ~1 guy, and v3 has already taken ~2.5 years. The extra work to make everything compatible would stretch this out much longer and likely be unsustainable.

  2. Code cost - The goal of littlefs is to be, well, little. This is unfortunately in conflict with backwards compatibility.

    Take the new B-tree data-structure, for example. It would be easy to support both B-tree and CTZ skip-list files, but now you need ~2x the code. This cost gets worse for the more enmeshed features, and potentially exceeds the cost of just including both v3 and v2 in the codebase.

So I think it's best for both littlefs as a project and long-term users to break things here.

Note v2 isn't going anywhere! I'm happy to continue maintaining the v2 branch, merge bug fixes when necessary, etc. But the economic reality is my focus will be shifting to v3.

What's new? ^

Ok, with that out of the way, what does breaking everything actually get us?

Implemented: ^

  • Efficient metadata compaction: $O(n^2) \rightarrow O(n \log n)$ ^

    v3 adopts a new metadata data-structure: Red-black-yellow Dhara trees (rbyds). Based on the data-structure invented by Daniel Beer for the Dhara FTL, rbyds extend log-encoded Dhara trees with self-balancing and self-counting (also called order-statistic) properties.

    This speeds up most metadata operations, including metadata lookup ( $O(n) \rightarrow O(\log n)$ ), and, critically, metadata compaction ( $O(n^2) \rightarrow O(n \log n)$ ).

    This improvement may sound minor on paper, but it's a difference measured in seconds, sometimes even minutes, on devices with extremely large blocks.

  • Efficient random writes: $O(n) \rightarrow O(\log_b^2 n)$ ^

    A much requested feature, v3 adopts B-trees, replacing the CTZ skip-list that previously backed files.

    This avoids needing to rewrite the entire file on random writes, bringing the performance back down into tractability.

    For extra cool points, littlefs's B-trees use rbyds for the inner nodes, which makes CoW updates much cheaper than traditional array-packed B-tree nodes when large blocks are involved ( $O(n) \rightarrow O(\log n)$ ).

  • Better logging: No more sync-padding issues ^

    v3's B-trees support inlining data directly in the B-tree nodes. This gives us a place to store data during sync, without needing to pad things for prog alignment.

    In v2 this padding would force the rewriting of blocks after sync, which had a tendency to wreck logging performance.

  • Efficient inline files, no more RAM constraints: $O(n^2) \rightarrow O(n \log n)$ ^

    In v3, B-trees can have their root inlined in the file's mdir, giving us what I've been calling a "B-shrub". This, combined with the above inlined leaves, gives us a much more efficient inlined file representation, with better code reuse to boot.

    Oh, and B-shrubs also make small B-trees more efficient by avoiding the extra block needed for the root.

  • Independent file caches ^

    littlefs's pcache, rcache, and file caches can be configured independently now. This should allow for better RAM utilization when tuning the filesystem.

  • Easier logging APIs: lfs3_file_fruncate ^

    Thanks to the new self-counting/order-statistic properties, littlefs can now truncate from both the end and front of files via the new lfs3_file_fruncate API.

    Before, the best option for logging was renaming log files when they filled up. Now, maintaining a log/FIFO is as easy as:

    lfs3_file_write(&lfs, &file, entry, entry_size) => entry_size;
    lfs3_file_fruncate(&lfs, &file, log_size) => 0;
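
    For a fuller picture, an "append to a bounded log" helper might look something like the sketch below. The types and error conventions are assumed to mirror the v2-style API and may not match the final v3 API exactly:

    // append an entry to a log, keeping only the most recent log_size bytes
    static int log_append(lfs3_t *lfs, lfs3_file_t *file,
            const void *entry, lfs3_size_t entry_size, lfs3_off_t log_size) {
        // append the new entry to the end of the log
        lfs3_ssize_t res = lfs3_file_write(lfs, file, entry, entry_size);
        if (res < 0) {
            return (int)res;
        }

        // drop the oldest entries from the front so the log stays bounded
        int err = lfs3_file_fruncate(lfs, file, log_size);
        if (err) {
            return err;
        }

        // make the new entries persistent
        return lfs3_file_sync(lfs, file);
    }
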
  • Sparse files ^

    Another advantage of adopting B-trees: littlefs can now cheaply represent file holes, where contiguous runs of zeros can be implied without actually taking up any disk space.

    Currently this is limited to a couple operations:

    • lfs3_file_truncate
    • lfs3_file_fruncate
    • lfs3_file_seek + lfs3_file_write past the end of the file

    But more advanced hole operations may be added in the future.
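
    As a quick example, seeking past the end of a file and writing creates an implicit hole (a sketch; the open/seek flag names are assumed to follow the v2 naming):

    // create a ~1 MiB file where only the last 16 bytes take real disk space
    lfs3_file_open(&lfs, &file, "sparse", LFS3_O_WRONLY | LFS3_O_CREAT);
    lfs3_file_seek(&lfs, &file, (1024*1024) - 16, LFS3_SEEK_SET);
    lfs3_file_write(&lfs, &file, "this is the end!", 16);
    // everything before the written data is an implied run of zeros (a hole)
    lfs3_file_close(&lfs, &file);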

  • Efficient file name lookup: $O(n) \rightarrow O(\log_b n)$ ^

    littlefs now uses a B-tree (yay code reuse) to organize files by file name. This allows for much faster file name lookup than the previous linked-list of metadata blocks.

  • A simpler/more robust metadata tree ^

    As a part of adopting B-trees for metadata, the previous threaded file tree has been completely ripped out and replaced with one big metadata tree: the M-tree.

    I'm not sure how much users are aware of it, but the previous threaded file tree was a real pain-in-the-ass with the amount of bugs it caused. Turns out having a fully-connected graph in a CoBW filesystem is a really bad idea.

    In addition to removing an entire category of possible bugs, adopting the M-tree allows for multiple directories in a single metadata block, removing the 1-dir = 1-block minimum requirement.

  • A well-defined sync model ^

    One interesting thing about littlefs, it doesn't have a strictly POSIX API. This puts us in a relatively unique position, where we can explore tweaks to the POSIX API that may make it easier to write powerloss-safe applications.

    To leverage this (and because the previous sync model had some real problems), v3 includes a new, well-defined sync model.

    I think this discussion captures most of the idea, but for a high-level overview:

    1. Open file handles are strictly snapshots of the on-disk state. Writes to a file are copy-on-write (CoW), with no immediate effect on the on-disk state or any other file handles.

    2. Syncing or closing an in-sync file atomically updates the on-disk state and any other in-sync file handles.

    3. Files can be desynced, either explicitly via lfs3_file_desync, or because of an error. Desynced files do not receive sync broadcasts, and closing a desynced file has no effect on the on-disk state.

    4. Calling lfs3_file_sync on a desynced file will atomically update the on-disk state, any other in-sync file handles, and mark the file as in-sync again.

    5. Calling lfs3_file_resync on a file will discard its current contents and mark the file as in-sync. This is equivalent to closing and reopening the file.
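
    In code, the intended semantics look roughly like this (a sketch of behavior, not exact signatures):

    // writes are CoW: file_b, open on the same file, sees no change yet
    lfs3_file_write(&lfs, &file_a, buffer, size);

    // sync atomically updates the disk and any other in-sync handles (file_b)
    lfs3_file_sync(&lfs, &file_a);

    // desynced handles opt out of sync broadcasts...
    lfs3_file_desync(&lfs, &file_b);

    // ...until resync discards their contents and catches them back up
    lfs3_file_resync(&lfs, &file_b);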

  • Stickynotes, no more 0-sized files ^

    As an extension of littlefs's new sync model, v3 introduces a new file type: LFS3_TYPE_STICKYNOTE.

    A stickynote represents a file that's in the awkward state of having been created, but not yet synced. If you lose power, stickynotes are hidden from the user and automatically cleaned up on the next mount.

    This avoids the 0-sized file issue, while still allowing most of the POSIX interactions users expect.

  • A new and improved compat flag system ^

    v2.1 was a bit of a mess, but it was a learning experience. v3 still includes a global version field, but also includes a set of compat flags that allow non-linear addition/removal of future features.

    These are probably familiar to users of Linux filesystems, though I've given them slightly different names:

    • rcompat flags - Must understand to read the filesystem (incompat_flags)
    • wcompat flags - Must understand to write to the filesystem (ro_compat_flags)
    • ocompat flags - No understanding necessary (compat_flags)

    This also provides an easy route for marking a filesystem as read-only, non-standard, etc, on-disk.
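
    The intended mount-time behavior, roughly speaking (the flag/error names here are hypothetical, purely to illustrate the three categories):

    // rcompat: any unknown flag means we can't safely read, refuse to mount
    if (rcompat & ~LFS3_RCOMPAT_SUPPORTED) {
        return LFS3_ERR_INVAL;
    }
    // wcompat: any unknown flag means we can read, but must not write
    if (wcompat & ~LFS3_WCOMPAT_SUPPORTED) {
        rdonly = true;
    }
    // ocompat: unknown flags are simply ignored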

  • Error detection! - Global-checksums ^

    v3 now supports filesystem-wide error-detection. This is actually quite tricky in a CoBW filesystem, and required the invention of global-checksums (gcksums) to prevent rollback issues caused by naive checksumming.

    With gcksums, and a traditional Merkle-tree-esque B-tree construction, v3 now provides a filesystem-wide self-validating checksum via lfs3_fs_cksum. This checksum can be stored external to the filesystem to provide protection against last-commit rollback issues, metastability, or just for that extra peace of mind.

    Funny thing about checksums. It's incredibly cheap to calculate checksums when writing, as we're already processing that data anyways. The hard part is, when do you check the checksums?

    This is a problem that mostly ends up on the user, but to help, v3 adds a large number of checksum-checking APIs (probably too many if I'm honest):

    • LFS3_M_CKMETA/CKDATA - Check checksums during mount
    • LFS3_O_CKMETA/CKDATA - Check checksums during file open
    • lfs3_fs_ckmeta/ckdata - Explicitly check all checksums in the filesystem
    • lfs3_file_ckmeta/ckdata - Explicitly check a file's checksums
    • LFS3_T_CKMETA/CKDATA - Check checksums incrementally during a traversal
    • LFS3_GC_CKMETA/CKDATA - Check checksums during GC operations
    • LFS3_M_CKPROGS - Closed checking of data during progs
    • LFS3_M_CKFETCHES - Optimistic (not closed) checking of data during fetches
    • LFS3_M_CKREADS (planned) - Closed checking of data during reads
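
    As an example of how these might fit together, a paranoid mount plus an occasional explicit check could look like the following sketch (it assumes v3's mount accepts the LFS3_M_* flags and that the lfs3_fs_ck* functions only take the filesystem handle):

    // verify metadata checksums while mounting
    int err = lfs3_mount(&lfs, LFS3_M_CKMETA, &cfg);
    if (err) {
        // corrupt or unmountable filesystem
    }

    // later, explicitly verify every checksum in the filesystem
    err = lfs3_fs_ckmeta(&lfs);
    err = lfs3_fs_ckdata(&lfs);
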
  • Better traversal APIs ^

    The traversal API has been completely reworked to be easier to use (both externally and internally).

    No more callback needed, blocks can now be iterated over via the dir-like lfs3_trv_read function.

    Traversals can also perform janitorial work and check checksums now, based on the flags provided to lfs3_trv_open.
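
    A sketch of what this might look like in use (the lfs3_trv_* signatures, return convention, and info struct here are assumptions based on the dir-like description):

    lfs3_trv_t trv;
    int err = lfs3_trv_open(&lfs, &trv, LFS3_T_CKMETA);
    if (err) {
        return err;
    }

    // iterate over in-use blocks, dir-read style
    struct lfs3_tinfo tinfo;  // hypothetical per-block info struct
    while ((err = lfs3_trv_read(&lfs, &trv, &tinfo)) > 0) {
        // tinfo describes one in-use block
    }

    lfs3_trv_close(&lfs, &trv);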

  • Incremental GC ^

    GC work can now be accomplished incrementally, instead of requiring one big go. This is managed by lfs3_fs_gc, cfg.gc_flags, and cfg.gc_steps.

    Internally, this just shoves one of the new traversal objects into lfs3_t. It's equivalent to managing a traversal object yourself, but hopefully makes it easier to write library code.

    However, this does add a significant chunk of RAM to lfs3_t, so GC is now an opt-in feature behind the LFS3_GC ifdef.
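
    A sketch of the intended usage (the exact config fields and flag values are assumptions):

    // opt in to incremental GC work in the config
    const struct lfs3_cfg cfg = {
        // ...
        .gc_flags = LFS3_GC_CKMETA,  // e.g. also verify metadata checksums
        .gc_steps = 16,              // rough amount of work per call
    };

    // then, whenever the system has some idle time
    int err = lfs3_fs_gc(&lfs);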

  • Better recovery from runtime errors ^

    Since we're already doing a full rewrite, I figured let's actually take the time to make sure things don't break on exceptional errors.

    Most in-RAM filesystem state should now revert to the last known-good state on error.

    The one exception involves file data (not metadata!). Reverting file data correctly turned out to roughly double the cost of files. And now that you can manually revert with lfs3_file_resync, I figured this cost just isn't worth it. So file data remains undefined after an error.

    In total, these changes add a significant amount of code and stack, but I'm of the opinion this is necessary for the maturing of littlefs as a filesystem.

  • Standard custom attributes ^

    Breaking disk gives us a chance to reserve attributes 0x80-0xbf for future standard custom attributes:

    • 0x00-0x7f - Free for user-attributes (uattr)
    • 0x80-0xbf - Reserved for standard-attributes (sattr)
    • 0xc0-0xff - Encouraged for system-attributes (yattr)

    In theory, it was technically possible to reserve these attributes without a disk-breaking change, but it's much safer to do so while we're already breaking the disk.

    v3 also includes the possibility of extending the custom attribute space from 8-bits to ~25-bits in the future, but I'd hesitate to use this, as it risks a significant increase in stack usage.
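
    In practice, applications just pick their uattr types from the user range, for example:

    // hypothetical application-defined attributes in the 0x00-0x7f uattr range
    #define MYAPP_ATTR_VERSION 0x01
    #define MYAPP_ATTR_MTIME   0x02
    // 0x80-0xbf is reserved for future standard attrs (sattrs),
    // 0xc0-0xff is encouraged for system/integration attrs (yattrs)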

  • More tests! ^

    v3 comes with a couple more tests than v2 (+~6812.2%):

         suites  cases  permutations ;     pls   runtime
    v2:      22    198         11641 ;   28741    54.16s
    v3:      23    784        804655 ; 2228513  1323.18s
    

    You may or may not have seen the test framework rework that went curiously under-utilized. That was actually in preparation for the v3 work.

    The goal is not 100% line/branch coverage, but just to have more confidence in littlefs's reliability.

  • Simple key-value APIs ^

    v3 includes a couple easy-to-use key-value APIs:

    • lfs3_get - Get the contents of a file
    • lfs3_size - Get the size of a file
    • lfs3_set - Set the contents of a file
    • lfs3_remove - Remove a file (this one already exists)

    This API is limited to files that fit in RAM, but if it fits your use case, you can disable the full file API with LFS3_KVONLY to save some code.

    If your filesystem fits in only 2 blocks, you can also define LFS3_2BONLY to save more code.

    These can be useful for creating small key-value stores on systems that already use littlefs for other storage.
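
    Usage is about as simple as it gets (a sketch; exact signatures are still subject to the API flux mentioned above):

    // set, get, and remove a small value by path
    lfs3_set(&lfs, "wifi/ssid", "mynetwork", strlen("mynetwork"));

    char buffer[64];
    lfs3_ssize_t size = lfs3_get(&lfs, "wifi/ssid", buffer, sizeof(buffer));

    lfs3_remove(&lfs, "wifi/ssid");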

Planned: ^

  • Efficient block allocation, via optional on-disk block-map (bmap) ^

    The one remaining bottleneck in v3 is block allocation. This is a tricky problem for littlefs (and any CoW/CoBW filesystem), because we don't actually know when a block becomes free.

    This is in-progress work, but the solution I'm currently looking at involves 1. adding an optional on-disk block map (bmap) stored in gstate, and 2. updating it via tree diffing on sync. In theory this will drop huge file writes: $O(n^2 \log n) \rightarrow O(n \log_b^2 n)$

    There is also the option of using the bmap as a simple cache, which doesn't avoid the filesystem-wide scan but at least eliminates the RAM constraint of the lookahead buffer.

    As a plus, we should be able to leverage the self-counting property of B-trees to make the on-disk bmap compressible.

  • Bad block tracking ^

    This is a much requested feature, and adding the optional on-disk bmap finally gives us a place to track bad blocks.

  • Pre-erased block tracking ^

    Just like bad-blocks, the optional on-disk bmap gives us a place to track pre-erased blocks. Well, at least in theory.

    In practice it's a bit more of a nightmare. To avoid multiple progs, we need to mark erased blocks as unerased before progging. This introduces an unbounded number of catch-22s when trying to update the bmap itself.

    Fortunately, if instead we store a simple counter in the bmap's gstate, we can resolve things at the mrootanchor worst case.

  • Error correction! - Metadata redundancy ^

    Note it's already possible to do error-correction at the block-device level outside of littlefs, see ramcrc32cbd and ramrsbd for examples. Because of this, integrating in-block error correction is low priority.

    But I think there's potential for cross-block error-correction in addition to the in-block error-correction.

    The plan for cross-block error-correction/block redundancy is a bit different for metadata vs data. In littlefs, all metadata is logs, which is a bit of a problem for parity schemes. I think the best we can do is store metadata redundancy as naive copies.

    But we already need two blocks for every mdir, and one usually just sits unused when not compacting. This, combined with metadata usually being much smaller than data, makes the naive scheme less costly than one might expect.

  • Error correction! - Data redundancy ^

    For raw data blocks, we can be a bit more clever. If we add an optional dedup tree for block -> parity group mapping, and an optional parity tree for parity blocks, we can implement a RAID-esque parity scheme for up to 3 blocks of data redundancy relatively cheaply.

  • Transparent block deduplication ^

    This one is a bit funny. Originally block deduplication was intentionally out-of-scope, but it turns out you need something that looks a lot like a dedup tree for error-correction to work in a system that allows multiple block references.

    If we already need a virtual -> physical block mapping for error correction, why not make the key the block checksum and get block deduplication for free?

    Though if this turns out to not be as free as I think it is, block deduplication will fall out-of-scope.

Stretch goals: ^

These may or may not be included in v3, depending on time and funding:

  • lfs3_migrate for v2->v3 migration ^

  • 16-bit and 64-bit variants ^

  • Config API rework ^

  • Block device API rework ^

  • Custom attr API rework ^

  • Alternative (cheaper) write-strategies (write-once, global-aligned, eager-crystallization) ^

  • Advanced file tree operations (lfs3_file_punchhole, lfs3_file_insertrange, lfs3_file_collapserange, LFS3_SEEK_DATA, LFS3_SEEK_HOLE) ^

  • Advanced file copy-on-write operations (shallow lfs3_cowcopy + opportunistic lfs3_copy) ^

  • Reserved blocks to prevent CoW lockups ^

  • Metadata checks to prevent metadata lockups ^

  • Integrated block-level ECC (ramcrc32cbd, ramrsbd) ^

  • Disk-level RAID (this is just data redund + a disk aware block allocator) ^

Out-of-scope (for now): ^

If we don't stop somewhere, v3 will never be released. But these may be added in the future:

  • Alternative checksums (crc16, crc64, sha256, etc) ^

  • Feature-limited configurations for smaller code/stack sizes (LFS3_NO_DIRS, LFS3_KV, LFS3_2BLOCK, etc) ^

  • lfs3_file_openat for dir-relative APIs ^

  • lfs3_file_openn for non-null-terminated-string APIs ^

  • Transparent compression ^

  • Filesystem shrinking ^

  • High-level caches (block cache, mdir cache, btree leaf cache, etc) ^

  • Symbolic links ^

  • 100% line/branch coverage ^

Code/stack size ^


littlefs v1, v2, and v3, 1 pixel ~= 1 byte of code, click for a larger interactive codemap (commit)


littlefs v2 and v3 rdonly, 1 pixel ~= 1 byte of code, click for a larger interactive codemap (commit)

Unfortunately, v3 is a little less little than v2:

            code            stack           ctx
v2:        17144             1440           580
v3:        37352 (+117.9%)   2280 (+58.3%)  636 (+9.7%)
            code            stack           ctx
v2-rdonly:  6270              448           580
v3-rdonly: 10616 (+69.3%)     808 (+80.4%)  508 (-12.4%)

On one hand, yes, more features generally means more code.

And it's true there's an opportunity here to carve out more feature-limited builds to save code/stack in the future.

But I think it's worth discussing some of the other reasons for the code/stack increase:

  1. Runtime error recovery ^

    Recovering from runtime errors isn't cheap. We need to track both the before and after state of things during fallible operations, and this adds both stack and code.

    But I think this is necessary for the maturing of littlefs as a filesystem.

    Maybe it will make sense to add a sort of LFS3_GLASS mode in the future, but this is out-of-scope for now.

  2. B-tree flexibility ^

    The bad news: The new B-tree files are extremely flexible. Unfortunately, this is a double-edged sword.

    B-trees, on their own, don't add that much code. They are a relatively poetic data-structure. But deciding how to write to a B-tree, efficiently, with an unknown write pattern, is surprisingly tricky.

    The current implementation, what I've taken to calling the "lazy-crystallization algorithm", leans on the more complicated side to see what is possible performance-wise.

    The good news: The new B-tree files are extremely flexible.

    There's no reason you need the full crystallization algorithm if you have a simple write pattern, or don't care as much about performance. This will either be a future or stretch goal, but it would be interesting to explore alternative write-strategies that could save code in these cases.

  3. Traversal inversion ^

    Inverting the traversal, i.e. moving from a callback to an incremental state machine, adds both code and stack because 1. all of the previous on-stack state needs to be tracked explicitly, and 2. we now need to worry about what happens if the filesystem is modified mid-traversal.

    In theory, this could be reverted if you don't need incremental traversals, but extricating incremental traversals from the current codebase would be an absolute nightmare, so this is out-of-scope for now.

Benchmarks ^

A note on benchmarking: The on-disk block-map is key for scalable allocator performance, so benchmarks at this stage need to be taken with a grain of salt when many blocks are involved. Please refer to this version as "v3 (no bmap)" or something similar in any published benchmarks until this work is completed.

First off, I would highly encourage others to do their own benchmarking with v3/v2. Filesystem performance is tricky to measure because it depends heavily on your application's write pattern and hardware nuances. If you do, please share in this thread! Others may find the results useful, and now is the critical time for finding potential disk-related performance issues.

Simulated benchmarks ^

To test the math behind v3, I've put together some preliminary simulated benchmarks.

Note these are simulated and optimistic. They do not take caching or hardware buffers into account, which can have a big impact on performance. Still, I think they provide at least a good first impression of v3 vs v2.

To find an estimate of runtime, I first measured the number of bytes read, progged, and erased, and then scaled based on values found in relevant datasheets. The options here were a bit limited, but Winbond fortunately provides runtime estimates in the datasheets on their website:

  • NOR flash - w25q64jv

  • NAND flash - w25n01gv

  • SD/eMMC - Also w25n01gv, assuming a perfect FTL

    I said optimistic, didn't I? I couldn't find useful estimates for SD/eMMC, so I'm just assuming a perfect FTL here.

These also assume an optimal bus configuration, which, as any embedded engineer knows, is often not the case.
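
In other words, the simulated runtime is estimated roughly as $t \approx b_{read} \cdot t_{read} + b_{prog} \cdot t_{prog} + b_{erase} \cdot t_{erase}$, where the $b$ terms are the measured bytes read/progged/erased and the $t$ terms are per-byte costs derived from the datasheets above. This is a simplification of what the benchmark scripts compute, but it captures the gist.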

Full benchmarks here: https://benchmarks.littlefs.org (repo, commit)

And here are the ones I think are the most interesting:

Note that SD/eMMC is heavily penalized by the lack of on-disk block-map! SD/eMMC breaks flash down into many small blocks, which tends to make block allocator performance dominate.

  1. Linear writes, where we write a 1 MiB file and don't call sync until closing the file. ^

    This one is the most frustrating to compare against v2. CTZ skip-lists are really fast at appending! The problem is they are only fast at appending:

    (commit, reads, progs, erases, sim, usage)

  2. Random writes, note we start with a 1MiB file. ^

    As expected, v2 is comically bad at random writes. v3 is indistinguishable from zero in the NOR case:

    (commit, reads, progs, erases, sim, usage)

  3. Logging, write 4 MiB, but limit the file to 1 MiB. ^

    In v2 this is accomplished by renaming the file, in v3 we can leverage lfs3_file_fruncate.

    v3 performs significantly better with large blocks thanks to avoiding the sync-padding problem:

    (commit, reads, progs, erases, sim, usage)

Funding ^

If you think this work is worthwhile, consider sponsoring littlefs. Current benefits include:

  1. Being able to complain about v3 not being released yet
  2. Being able to complain about the disk breakage v2 -> v3

I joke, but I truly appreciate those who have contributed to littlefs so far. littlefs, in its current form, is a mostly self-funded project, so every little bit helps.

If you would like to contribute in a different way, or have other requests, feel free to reach me at geky at geky.net.

As stabilization gets closer, I will also be open to contract work to help port/integrate/adopt v3. If this is interesting to anyone, let me know.

Thank you @micropython, @fusedFET for sponsoring littlefs, and thank you @Eclo, @kmetabg, and @nedap for your past sponsorships!

Next steps ^

For me, I think it's time to finally put together a website/wiki/discussions/blog. I'm not sure on the frequency quite yet, but I plan to write/publish the new DESIGN.md in chapters in tandem with the remaining work.

EDIT: Pinned codemap/plot links to specific commits via benchmarks.littlefs.org/tree.html
EDIT: Updated with rdonly code/stack sizes
EDIT: Added link to #1114
EDIT: Implemented simple key-value APIs
EDIT: Added lfs3_migrate stretch goal with link to #1120
EDIT: Adopted lfs3_traversal_t -> lfs3_trv_t rename
EDIT: Added link to #1125 to clarify "feature parity"

geky added 30 commits April 16, 2025 15:22
Should've probably been two commits, but:

1. Cleaned up tracebd.py's header generation to be consistent with
   dbgbmap.py and other scripts.

   Percentage fields are now consistently floats in all scripts,
   allowing user-specified precision when punescaping.

2. Moved header generation up to where we still have the disk open (in
   dbgbmap[d3].py), to avoid issues with lazy Lfs attrs trying to access
   the disk after it's been closed.

   Found while testing with --title='cksum %(cksum)08x'. Lfs tries to
   validate the gcksum last minute and things break.
This took a bit of a messy route, but these scripts should be good to go
now.
This matches dbgbmap.py and fits in with the other dbg scripts.

The choice of tracebd.py for the name was arbitrary anyways, we just
needed something that wouldn't conflict with other scripts.
No one is realistically ever going to use this.

Ascii art is just too low resolution, trying to pad anything just wastes
terminal space. So we might as well not support --padding and save on
the additional corner cases.

Worst case, in the future we can always find this commit and revert
things.
This isn't true, especially for dbgbmap.py, 100% is very possible in
filesystems with small files. But by limiting padding to 99.9%, we avoid
the annoying wasted space caused by the rare but occasional 100.0%.
This automatically minimizes the status strings without flickering, all
it took was a bit of ~*global state*~.

---

If I'm remembering correctly, this was actually how tracebd.py used to
work before dbgbmap.py was added. The idea was dropped with dbgbmap.py
since dbgbmap.py relied on watch.py for real-time rendering and couldn't
persist state.

But now dbgbmap.py has its own -k/--keep-open flag, so that's not a
problem.
After all, who doesn't love a good bit of flickering.

I think I was trying to be too clever, so reverting.

Printing these with no padding is the simplest solution, provides the
best information density, and worst case you can always add -s1 to limit
the update frequency if flickering is hurting readability.
So now the result scripts always require -d/--diff to diff:

- before: ./scripts/csv.py a.csv -pb.csv
- after:  ./scripts/csv.py a.csv -db.csv -p

For a couple reasons:

- Easier to toggle
- Simpler internally to only have one diff path flag
- The previous behavior was a bit unintuitive
This affects the table renderers as well as csv.py's ratio expr.

This is a bit more correct, handwaving 0/0 (mapping 0/0 -> 100% is
useful for cov.py, please don't kill me mathematicians):

  frac(1,0) => 1/0 (∞%)
  frac(0,0) => 0/0 (100.0%)
  frac(0,1) => 0/1 (0.0%)
This prefix was extremely arbitrary anyways.

The prefix Csv* has slightly more meaning than R*, since these scripts
interact with .csv files quite a bit, and it avoids confusion with
rbyd-related things such as Rattr, Ralt, etc.
- CsvInt.x -> CsvInt.a
- CsvFloat.x -> CsvFloat.a
- Rev.x -> Rev.a

This matches CsvFrac.a (paired with CsvFrac.b), and avoids confusion
with x/y variables such as Tile.x and Tile.y.

The other contender was .v, since these are cs*v* related types, but
sticking with .a gets the point across that the name really doesn't have
any meaning.

There's also some irony that we're forcing namedtuples to have
meaningless names, but it is useful to have a quick accessor for the
internal value.
This allows -w to provide a shortform flag for both --wear and
--block-cycles, depending on if you include a cycles argument:

- -w    => --wear
- -w100 => --block-cycles=100

I was originally hesitant to add this since it's inconsistent from
--read/--prog/--erase, which can't have shortforms due to flag
conflicts, but --wear is probably a special enough case.
Two new tricks:

1. Hide the cursor while redrawing the ring buffer.

2. Build up the entire redraw in RAM first, and render everything in a
   single write call.

These _mostly_ get rid of the cursor flickering issues in rapidly
updating scripts.
Dang, this touched like every single script.
- Added TreeArt __bool__ and __len__.

  This was causing a crash in _treeartfrommtreertree when rtree was
  empty.

  The code was not updated in the set -> TreeArt class transition, and
  went unnoticed because it's unlikely to be hit unless the filesystem
  is corrupt.

  Fortunately(?) realtime rendering creates a bunch of transiently
  corrupt filesystem images.

- Tweaked lookupleaf to not include mroots in their own paths.

  This matches the behavior of leaf mdirs, and is intentionally
  different from btree's lookupleaf which needs to lookup the leaf rattr
  to terminate.

- Tweaked leaves to not remove the last path entry if it is an mdir.

  This hid the previous lookupleaf inconsistency. We only remove the
  last rbyd from the path because it is redundant, and for mdirs/mroots
  it should never be redundant.

  I ended up just replacing the corrupt check with an explicit check
  that the rbyd is redundant. This should be more precise and avoid
  issues like this in the future.

  Also adopted explicit redundant checks in Btree.leaves and
  Lfs.File.leaves.
This matches the behavior of paths and helps figure out which string is
associated with which crc32c/parity when checksumming multiple strings:

  $ ./scripts/crc32c.py -s hi hello
  f59dd9c2  hi
  9a71bb4c  hello

It also might help clear up confusion if someone forgets to quote a
string with spaces inside it.
This just gives dbgtag.py a few more bells and whistles that may be
useful:

- Can now parse multiple tags from hex:

    $ ./scripts/dbgtag.py -x 71 01 01 01 12 02 02 02
    71 01 01 01    altrgt 0x101 w1 -1
    12 02 02 02    shrubdir w2 2

  Note this _does_ skip attached data, which risks some confusion but
  not skipping attached data will probably end up printing a bunch of
  garbage for most use cases:

    $ ./scripts/dbgtag.py -x 01 01 01 04 02 02 02 02 03 03 03 03
    01 01 01 04    gdelta 0x01 w1 4
    03 03 03 03    struct 0x03 w3 3

- Included hex in output. This is helpful for learning about the tag
  encoding and also helps identify tags when parsing multiple tags.

  I considered also included offsets, which might help with
  understanding attached data, but decided it would be too noisy. At
  some point you should probably jump to dbgrbyd.py anyways...

- Added -i/--input to read tags from a file. This is roughly the same as
  -x/--hex, but allows piping from other scripts:

    $ ./scripts/dbgcat.py disk -b4096 0 -n4,8 | ./scripts/dbgtag.py -i-
    80 03 00 08    magic 8

  Note this reads the entire file in before processing. We'd need to fit
  everything into RAM anyways to figure out padding.
These mimic dbgtag.py, but provide debugging for the lower-level integer
primitives in littlefs:

  $ ./scripts/dbgleb128.py -x 2a 80 80 a8 01
  2a             42
  80 80 a8 01    2752512

  $ ./scripts/dbgle32.py -x 2a 00 00 00 00 00 2a 00
  2a 00 00 00    42
  00 00 2a 00    2752512

dbgleb128.py is probably going to be more useful, but I figured we might
as well include both for completeness. Though dbgle32.py is begging to
be generalized.
Do you see the O(n^2) behavior in this loop?

  j = 0
  while j < len(data):
      word, d = fromleb(data[j:])
      j += d

The slice, data[j:], creates a O(n) copy every iteration of the loop.

A bit tricky. Or at least I found it tricky to notice. Maybe because
array indexing being cheap is baked into my brain...

Long story short, this repeated slicing resulted in O(n^2) behavior in
Rbyd.fetch and probably some other functions. Even though we don't care
_too_ much about performance in these scripts, having Rbyd.fetch run in
O(n^2) isn't great.

Tweaking all from* functions to take an optional index solves this, at
least on paper.

---

In practice I didn't actually find any measurable performance gain. I
guess array slicing in Python is optimized enough that the constant
factor takes over?

(Maybe it's being helped by us limiting Rbyd.fetch to block_size in most
scripts? I haven't tested NAND block sizes yet...)

Still, it's good to at least know this isn't a bottleneck.
This is limited to dbgle32.py, dbgleb128.py, and dbgtag.py for now.

This more closely matches how littlefs behaves, in that we read a
bounded number of bytes before leb128 decoding. This minimizes bugs
related to leb128 overflow and avoids reading inherently undecodable
data.

The previous unbounded behavior is still available with -w0.

Note this gives dbgle32.py much more flexibility in that it can now
decode other integer widths. Uh, ignore the name for now. At least it's
self documenting that the default is 32-bits...

---

Also fixed a bug in fromleb128 where size was reported incorrectly on
offset + truncated leb128.
Now that I know my way around the weirdness that is Python's class
scope, this just required another function indirection to capture the
class-level dicts correctly.

I was considering using the __subclasses__ trick, but it seems like that
would actually be more complicated here.
This replaces the previous fallback-to-what's-available behavior with
explicit flags:

- --tile-code - Tile based on code size (the default)
- --tile-stack - Tile based on stack limits
- --tile-frames - Tile based on stack frames
- --tile-ctx - Tile based on function context
- --tile-1 - Tile functions evenly

This has the benefit of 1. being easier to toggle, 2. being explicit,
and 3. allowing code/stack/ctx in punescapes (titles, labels, etc).

There is an interesting question if --no-stack should be implicit, since
showing two stack treemaps may be confusing, but I think that's trying
to be too clever. Instead I just added the -S/--no-stack shortform to
make it easier to toggle.

Also updated ctx.py's description string. Probably need to check what
else is out of date in other scripts as well.
Mainly adopting the added flexibility in csv.py, also adding make
codemap-svg and friends for code map generation:

- Split result commands into separate result, result-csv, and
  result-diff commands so csv generation is explicit.

  So make result no longer implicitly overwrites csv files:

    make code
    make code-csv  -.
    make code       |
    make code-diff <'

  This gives more control over result diffing.

  make code-csv _is_ more or less just a dependency on the lfs.code.csv
  rule, but it avoids BUILDDIR mess and is easier to remember.

- Added make codemap/stackmap/ctxmap for in-terminal code/stack/ctx
  ascii art.

  I was a bit on the fence on these, since the result is more pretty
  than useful, but eh, can always drop them in the future.

- Added make codemap-svg/codemap-tiny-svg for generating interactive
  codemap svgs.

  This raised an interesting question if the make commands should
  generate light or dark mode svgs. I settled on dark mode since that's
  what I personally find the most useful.

  I think the way this will breakdown is with dark mode generally used
  for development, and light mode generally used for published material.
  And it's not too hard to run the script outside of the Makefile for
  publishing. Or override CODEMAPFLAGS.

- Adopted implicit prefixing, -q, etc. This simplifies some of the more
  complicated csv.py invocations (make summary, make funcs, etc).

See make help for a full list of commands.
So now:
               (block_size)
  mbits = nlog2(----------) = nlog2(block_size) - 3
               (     8    )

Instead of:

               (     (block_size))
  mbits = nlog2(floor(----------)) = nlog2(block_size & ~0x7) - 3
               (     (     8    ))

This makes the post-log - 3 formula simpler, which we probably want to
prefer as it avoids a division. And ceiling is arguably more intuitive
corner case behavior.

This may seem like a minor detail, but because mbits is purely
block_size derived and not configurable, any quirks here will become
a permanent compatibility requirement.

And hey, it saves a couple bytes (I'm not really sure why, the division
should've been optimized to a shift):

           code          stack          ctx
  before: 35528           2440          636
  after:  35520 (-0.0%)   2440 (+0.0%)  636 (+0.0%)
- mdir_bits -> mbits
- lfsr_mid_bid -> lfsr_mbid
- lfsr_mid_rid -> lfsr_mrid

These now match the naming in the dbg scripts.

I feel like this is more terse in a way that is also more readable, but
maybe that's just me.
So mbid=0 now implies the mdir is not inlined.

Downsides:

- A bit more work to calculate
- May lose information due to masking everything when mtree.weight==0
- Risk of confusion when in-lfs.c state doesn't match (mbid=-1 is
  implied by mtree.weight==0)

Upsides:

- Includes more information about the topology of the mtree
- Avoids multiple dbgmbids for the same physical mdir

Also added lfsr_dbgmbid and lfsr_dbgmrid to help make logging
easier/more consistent.

And updated dbg scripts.
The exception being LFS_DEBUG. A bit inconsistent, but at least
consistent with LFS_ERR* vs LFS_ERROR, and may help reduce name
conflicts:

- LFS_DEBUGRBYDFETCHES -> LFS_DBGRBYDFETCHES
- LFS_DEBUGRBYDBALANCE -> LFS_DBGRBYDBALANCE
- LFS_DEBUGRBYDCOMMITS -> LFS_DBGRBYDCOMMITS
- LFS_DEBUGBTREEFETCHES -> LFS_DBGBTREEFETCHES
- LFS_DEBUGBTREECOMMITS -> LFS_DBGBTREECOMMITS
- LFS_DEBUGMDIRFETCHES -> LFS_DBGMDIRFETCHES
- LFS_DEBUGMDIRCOMMITS -> LFS_DBGMDIRCOMMITS
- LFS_DEBUGALLOCS -> LFS_DBGALLOCS
This was a surprising side-effect of the script rework: Realizing the
internal btree/rbyd lookup APIs were awkwardly inconsistent and could be
improved with a couple tweaks:

- Adopted lookupleaf name for functions that return leaf rbyds/mdirs.

  There's an argument this should be called lookupnextleaf, since it
  returns the next bid, unlike lookup, but I'm going to ignore that
  argument because:

  1. A non-next lookupleaf doesn't really make sense for trees where
     you don't have to fetch the leaf (the mtree)

  2. It would be a bit too verbose

- Adopted commitleaf name for functions that accept leaf rbyds.

  This makes the lfsr_bshrub_commit -> lfsr_btree_commit__ mess a bit
  more readable.

- Strictly limited lookup and lookupnext to return rattrs, even in
  complex trees like the mtree.

  Most use cases will probably stick to the lookupleaf variants, but at
  least the behavior will be consistent.

- Strictly limited lookup to expect a known bid/rid.

  This only really matters for lfsr_btree/bshrub_lookup, which as a
  quirk of their implementation _can_ lookup both bid + rattr at the
  same time. But I don't think we'll need this functionality, and
  limiting the behavior may allow for future optimizations.

  Note there is no lfsr_file_lookup. File btrees currently only ever
  have a single leaf rattr, so this API doesn't really make sense.

Internal API changes:

- lfsr_btree_lookupnext_ -> lfsr_btree_lookupleaf
- lfsr_btree_lookupnext  -> lfsr_btree_lookupnext
- lfsr_btree_lookup      -> lfsr_btree_lookup
- added                     lfsr_btree_namelookupleaf
- lfsr_btree_namelookup  -> lfsr_btree_namelookup
- lfsr_btree_commit__    -> lfsr_btree_commit_
- lfsr_btree_commit_     -> lfsr_btree_commitleaf
- lfsr_btree_commit      -> lfsr_btree_commit

- added                     lfsr_bshrub_lookupleaf
- lfsr_bshrub_lookupnext -> lfsr_bshrub_lookupnext
- lfsr_bshrub_lookup     -> lfsr_bshrub_lookup
- lfsr_bshrub_commit_    -> lfsr_bshrub_commitleaf
- lfsr_bshrub_commit     -> lfsr_bshrub_commit

- lfsr_mtree_lookup      -> lfsr_mtree_lookupleaf
- added                     lfsr_mtree_lookupnext
- added                     lfsr_mtree_lookup
- added                     lfsr_mtree_namelookupleaf
- lfsr_mtree_namelookup  -> lfsr_mtree_namelookup

- added                     lfsr_file_lookupleaf
- lfsr_file_lookupnext   -> lfsr_file_lookupnext
- added                     lfsr_file_commitleaf
- lfsr_file_commit       -> lfsr_file_commit

Also added lookupnext to Mdir/Mtree in the dbg scripts.

Unfortunately this did add both code and stack, but only because of the
optional mdir returns in the mtree lookups:

           code          stack          ctx
  before: 35520           2440          636
  after:  35548 (+0.1%)   2472 (+1.3%)  636 (+0.0%)
This lets us cram in one more mask for potential redund bits:

  name                 tag    mask
  LFSR_TAG_MASK0    0x0000  0x0fff  ---- 1111 1111 1111
  LFSR_TAG_MASK2    0x1000  0x0ffc  ---- 1111 1111 11--
  LFSR_TAG_MASK8    0x2000  0x0f00  ---- 1111 ---- ----
  LFSR_TAG_MASK12   0x3000  0x0000  ---- ---- ---- ----
                                    '.-' '.-' '---.---'
                          mode bits -'    |       |   ^
                            suptype ------'       |   |
                            subtype --------------'   |
                        redund bits ------------------'

I toyed around with a bitwise alternative to the lookup table, but
couldn't come up with anything simpler than these:

- 0xfff & ~((((1<<((i>>1)*8))-1) << ((i&1)*4)) | ((1<<(i*2))-1))
- 0xfff & ~((1 << (((i>>1)*8)+((i&1)<<(1+(i>>1)))))-1)
- 0xfff & ~((1<<(2*i*i))-1) (requires multiply and 32-bit shift)

---

This also replaces the mdir/rbyd/btree/mtree lookup/sublookup/suplookup
functions with a single flexible lookup function that accepts tag masks.

This ended up adding a bit of code/stack (the extra NULL args are
surprisingly pricey), but will hopefully make the redund bits
easier/cheaper to use:

           code          stack          ctx
  before: 35548           2472          636
  after:  35584 (+0.1%)   2480 (+0.3%)  636 (+0.0%)
So now crystal_thresh only controls when fragments are compacted into
blocks, while fragment_thresh controls when blocks are broken into
fragments. Setting fragment_thresh=-1 will follow crystal_thresh and
keeps the previous behavior.

These were already two separate pieces of logic, so it makes sense to
provide two separate knobs for tuning.

Setting fragment_thresh lower than crystal_thresh has some potential to
reduce hysteresis in cases where random writes push blocks close to
crystal_thresh. It will be interesting to explore this more when
benchmarking.

---

The additional config option adds a bit of code/ctx, but hopefully that
will go away in the future config rework:

           code          stack          ctx
  before: 35584           2480          636
  after:  35600 (+0.0%)   2480 (+0.0%)  640 (+0.6%)
geky added 5 commits July 18, 2025 16:42
- Moved block allocator definitions into their own dedicated block
  before the lfs3_bptr_t stuff:

  - lfs3_alloc_discard
  - lfs3_alloc_ckpoint
  - lfs3_alloc

  I'm looking into adding lfs3_alloc related flags, and these aren't
  really predeclarable like C's function prototypes.

  Predeclaring these before lfs3_bptr_t is relevant if we ever add
  lfs3_bptr_alloc.

  Also touched up relevant comments a bit.

- Also added inline to lfs3_alloc_ckpoint and lfs3_alloc_discard, which
  was strangely missing?

  The compiler figured it out anyways, so this has no impact on code
  cost.

- Moved ecksum definitions below lfs3_bptr_t stuff.

  I don't really know where to put these, but close to the lfs3_rbyd_t
  stuff makes sense.

- And updated lazy appendrattr_ ordering to match source code order.

No code changes.
Currently this just has one flag that replaces the previous `erase`
argument:

  LFS3_ALLOC_ERASE  0x00000001  Please erase the block

Benefits include:

- Slightly better readability at lfs3_alloc call sites.

- Possibility of more allocator flags in the future:

  - LFS3_ALLOC_EMERGENCY - Use reserved blocks

  - Uh, that's all I can think of right now

No code changes.
The isempty check has already been deduplicated. Deduplicating this any
further starts to feel like a code smell.
This just wraps up block allocation + struct initialization similarly to
lfs3_rbyd_alloc.

Also tweaked bptr updates in lfs3_file_crystallize__ to mutate the bptr
fields directly instead of going through lfs3_bptr_init.

Neither of these impacted code cost, everything ends up inlined in
lfs3_file_crystallize__ anyways. Hopefully it helps with readability at
least.

Also LFS3_KVONLY mode is completely broken, and fixing it is not a huge
priority. I don't think it makes sense to adopt lfs3_bptr_alloc in
lfs3_file_flushset_ anyways, it would just lead to us initializing the
lfs3_bptr_t struct twice for no real reason.

No code changes.
- lfs3_omdir_t -> lfs3_handle_t
- lfs3.omdirs -> lfs3.handles
- o -> h
- lfs3_omdir_* -> lfs3_handle_*
- lfs3_omdir_ismidopen -> lfs3_mid_isopen

From conversations with users, the term "handle" or "file handle" seems
to be the most common/easily understood term for the lfs3_file_t struct
itself. It makes sense to adopt this in our codebase.

I usually dislike inventing new names for things when prefixes can imply
a relationship (size -> ssize, cache -> rcache, shrub -> bshrub, etc),
but lfs3_omdirs_t was probably a bit much.
geky added 18 commits July 18, 2025 18:28
- test_traversal -> test_trvs
- lfs3_traversal_t -> lfs3_trv_t
- lfs3_btraversal_t -> lfs3_btrv_t
- t -> trv
- bt -> btrv
- lfs3_traversal_* -> lfs3_trv_*
- lfs3_btraversal_* -> lfs3_btrv_*

The traversal type is becoming one of the more fundamental types in
littlefs, and if DIR and REG both get shortened names, it makes sense
for TRV to have one as well.

This also removes the temptation to use t for traversals, which is
probably an even worse name.

---

Note that lfs3_btree_traverse, lfs3_mtree_traverse, etc, remain
unaffected. This may change in the future, but it's interesting to note
that verbs seem to need much less typing than nouns.
Not sure how this got overlooked. Now that graft traversals are
implemented directly in lfs3_alloc, there's no reason to store this
state globally.

Fortunately this was in a union, so it didn't actually show up in our
ctx measurements.

No code changes.
Note this includes both the lfs3_config -> lfs3_cfg structs as well as
the LFS3_CONFIG -> LFS3_CFG include define:

- LFS3_CONFIG -> LFS3_CFG
- struct lfs3_config -> struct lfs3_cfg
- struct lfs3_file_config -> struct lfs3_file_cfg
- struct lfs3_*bd_config -> struct lfs3_*bd_cfg
- cfg -> cfg

We were already using cfg as the variable name everywhere. The fact that
these names were different was an inconsistency that should be fixed
since we're committing to an API break.

LFS3_CFG is already out-of-date from upstream, and there's plans for a
config rework, but I figured I'd go ahead and change it as well to lower
the chances it gets overlooked.

---

Note this does _not_ affect LFS3_TAG_CONFIG. Having the on-disk vs
driver-level config take slightly different names is not a bad thing.
- enum lfs3_error -> enum lfs3_err
- err -> err

Really this just updates `enum lfs3_err` to match the prefixes used
everywhere else. And because enum types are kind of useless in C, this
has no effect on any other part of the codebase.
- enum lfs3_scmp -> enum lfs3_cmp
- cmp -> cmp

lfs3_scmp_t is still used as the type, as the s prefix indicates the
type is signed, usually for muxing with error codes.

I think that led to the enum also being named lfs3_scmp, but that's not
quite right.

But none of this really matters because enums are so useless and broken
in C.
The --no-internal flag avoids building any internal tests/benches
(tests/benches with in="lfs3.c"), which can be useful for quickly
testing high-level things while refactoring. Refactors tend to break all
the internal tests, and it can be a real pain to update everything.

Note that --no-internal can be injected into the build with TESTCFLAGS:

  TESTCFLAGS=--no-internal make test-runner -j \
      && ./scripts/test.py -j -b

For a curious data point, here's the current number of
internal/non-internal tests:

                suites          cases                  perms
  total:            24            808          633968/776298
  internal:         22 (91.7%)    532 (65.8%)  220316/310247 (34.8%)
  non-internal:      2 ( 8.3%)    276 (34.2%)  413652/466051 (65.2%)

It's interesting to note that while internal tests have more test cases,
the non-internal tests generate a larger number of test permutations.
This is probably because internal tests tend to target specific corner
cases/known failure points, and don't invite many variants.

---

While --no-internal may be useful for high-level testing during a
refactor, I'm not sure it's a good idea to rely on it for _debugging_ a
refactor.

The whole point of internal testing is to catch low-level bugs early,
with as little unnecessary state as possible. Skipping these to debug
integration tests is a bit counterproductive!
Note --list-suite-paths was already skipping case-less suites! I think
only -Y/--summary was an outlier.

This is consistent with test.py's matching of suite ids when no cases
are found (test_runner itself doesn't really care, it just reports no
matching cases). Though we do still compile case-less suites and include
them in the test_suites array, which may be confusing in the future.
This is an indulgence to simplify the upcoming auxiliary btree work.

Brings back the previously-reverted per-btree leaf caches, where each
lfs3_btree_t keeps track of two rbyds: The root and the most recently
accessed leaf.

At the surface level, this optimizes repeated access to the same btree
leaf. A common pattern for a number of littlefs's operations that has
proven tricky to manually optimize:

- Btree iteration
- Pokes for our crystalization heuristic
- Checksum collision resolution for dids and (FUTURE) ddkeys
- Related rattrs attached to a single bid

But the real motivation is to drop lfs3_btree_*lookupleaf and simplify
the internal APIs. If repeated lfs3_btree_lookup*s are already
efficient, there's no reason for extra leaf-level APIs, and in theory
any logic that interacts with btrees will be simpler.

---

This comes at a cost (humorously about the same amount as the
tag-returning refactor, if you ignore the extra 28 bytes of ctx).
Unsurprisingly, increasing the size of lfs3_btree_t has the biggest
impact on stack and ctx:

           code          stack          ctx
  before: 36084           2336          656
  after:  36784 (+1.9%)   2400 (+2.7%)  684 (+4.3%)

Also note from the previous commit messages: Btree leaf caching has
resulted in surprisingly little performance improvement for our current
benchmarks + implementation. It turns out if you're dominated by write
cost, optimizing btree lookups -- which already skip rbyd fetches -- has
a barely noticeable impact.

---

A note on reverting!

Eventually (after the auxiliary btree work) it will probably make sense
to revert this -- or at least provide a non-leaf-caching build for
code/RAM sensitive users.

I don't think this should be reverted as-is. Instead, I think we should
allow the option to just disable the leaf cache, while keeping the
simpler internal API. This would give us the best of all three worlds:

- A small code/RAM option
- Optimal btree iteration/nearby-lookup performance
- Simpler internal APIs

The only reason this isn't already implemented is because I want to
avoid fragmenting the codebase further while we're still in development
mode.
What a mouthful.

The unconditional bshrub leaf discarding in lfs3_mdir_commit was copied
from the previous btree leaf caching implementation, but discarding
_all_ bshrub leaves on _every_ mdir commit is a bit insane.

Really, the only bshrub leaves that ever need to be discarded here are
the shrub roots, which are already questionable leaf caching targets
because they're already cached as the root rbyd.

An alternative option would be to just never cache shrub roots, but
tinkering around with the idea showed it would be more costly than
conditionally discarding leaves in lfs3_mdir_commit. At least here we
can reuse some of the logic that discards file leaves.
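
Something like the following predicate captures the idea, though the
helper/field names here are guesses at the internals, not the actual
code:

  // a hedged sketch: only cached leaves that are really shrub roots
  // live in the mdir being committed, so only those need to be dropped
  static inline bool lfs3_bshrub_leafstale(
          const lfs3_bshrub_t *bshrub, const lfs3_mdir_t *mdir) {
      return lfs3_rbyd_isshrub(&bshrub->leaf.rbyd)
              && bshrub->leaf.rbyd.blocks[0] == mdir->r.blocks[0];
  }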

I'm also probably overthinking what is only a small code cost:

           code          stack          ctx
  before: 36784           2400          684
  after:  36792 (+0.0%)   2400 (+0.0%)  684 (+0.0%)

This doesn't take into account how much CPU time is spent creating rbyd
copies, but that is not something we are optimizing for.
I was confused, but this commit->bid update is used to limit the
commit->bid to the btree weight. Note we limit the bid after storing it
as the initial rid.
This comes from an observation that we never actually use the leaf cache
during traversals, and there is surprisingly little risk of a lookup
creating a conflict in the future.

Btree traversals fall into two categories:

1. Full traversals, where we traverse a full btree all at once. These
   are unlikely to have lookup conflicts because everything is
   usually self-contained in one chunk of logic.

2. Incremental traversals. These _are_ at risk, but in our current
   design limited to lfs3_trv_t, which already creates a fully
   bshrub/btree copy for tracking purposes.

   This copy unintentionally, but conveniently, protects against lookup
   conflicts.

So, why not reuse the btree leaf cache to hold the rbyd state during
traversals? In theory this makes lfs3_btree_traverse the same cost as
lfs3_btree_lookupnext, drops the need for lfs3_btrv_t, and simplifies
the internal API.

The only extra bit of state we need is the current target bid, which is
now expected as a caller-incremented argument similar to
lfs3_btree_lookupnext iteration.

There was a bit of futzing around with bid=-1 being necessary to
initialize traversal (to avoid conflicts with bid=-1 => 0 caused by
empty btrees). But the end result is a btree traversal that only needs
one extra word of state.
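
The resulting iteration pattern looks roughly like the sketch below;
the signature, error convention, and increment step are all made up
here, only the overall shape matches what's described above:

  // a hedged sketch of iterating over a btree's nodes with only one
  // extra word of caller-side state
  static int example_traverse(lfs3_t *lfs3, lfs3_btree_t *btree) {
      lfs3_sbid_t bid = -1;  // bid=-1 kicks off the traversal
      while (true) {
          lfs3_rbyd_t rbyd;
          int err = lfs3_btree_traverse(lfs3, btree, bid, &rbyd);
          if (err == LFS3_ERR_NOENT) {
              return 0;  // ran off the end of the btree
          } else if (err) {
              return err;
          }

          // ... do something with the visited rbyd ...

          bid += 1;  // caller-incremented, like lfs3_btree_lookupnext
      }
  }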

---

Unfortunately, in practice, the savings were not as great as expected:

           code          stack          ctx
  before: 36792           2400          684
  after:  36876 (+0.2%)   2384 (-0.7%)  684 (+0.0%)

This does claw back some stack, but less than a full rbyd due to the
union with the mtortoise in lfs3_trv_t. The mtortoise now dominates. It
might be possible to union the mtortoise and the bshrub/btree state
better (both are not needed at the same time), but strict aliasing rules
in C make this tricky.

The new lfs3_btree_traverse is also a bit more complicated in terms of
code cost. In theory this would be offset by the simpler traversal setup
logic, but we only actually call lfs3_btree_traverse twice:

1. In lfs3_mtree_traverse
2. In lfs3_file_ck

Still, some stack savings + a simpler internal API makes this worthwhile
for now. lfs3_trv_t is also due for a revisit, and hopefully it's
possible to better union things with btree leaf caches somehow.
This matches other internal rbyds: btree.r, mdir.r, etc.

The intention of the single-char names is to reduce clutter around these
severely nested structs. Both btrees and mdirs _are_ rbyds, so the name
doesn't really add anything besides C-level type info.

I was hesitant on btree.leaf.rbyd, but decided consistency probably wins
here.
Looks like these traversal states were missed in the omdir -> handle
rename. I think HANDLES and HBTREE states make sense:

- LFS3_TSTATE_OMDIRS -> LFS3_TSTATE_HANDLES
- LFS3_TSTATE_OBTREE -> LFS3_TSTATE_HBTREE
Just moving away from the *trv when unnecessary. This matches the h
variable used for local iteration.
This forces our cycle detection tortoise (previously trv.u.mtortoise)
into the unused shrub leaf via pointer shenanigans.

This reclaims the remaining stack (and apparently code) we theoretically
gained from the btree traversal rework, up until the compiler got in the
way:

           code          stack          ctx
  before: 36876           2384          684
  after:  36852 (-0.1%)   2368 (-0.7%)  684 (+0.0%)

And it only required some _questionably_ defined behavior.

---

It's probably not well-defined behavior, but trying to understand what
the standard actually means on this is giving me a headache. I think I
have to agree C99+strict-aliasing lost the plot on this one. Note
mtortoise is only ever written/read through the same type.
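
For reference, the hazard in isolation looks something like this (not
littlefs code, just a minimal illustration of the pattern the
effective-type rules object to):

  #include <stdint.h>

  struct leaf     { uint32_t blocks[2]; uint32_t trunk; };
  struct tortoise { uint32_t blocks[2]; uint32_t dist;  };

  uint32_t example(void) {
      struct leaf storage = {0};                        // the unused leaf
      struct tortoise *t = (struct tortoise*)&storage;  // the shenanigans
      // formally questionable: storage's effective type is struct leaf,
      // so accessing it as struct tortoise violates strict aliasing,
      // even though the layouts are identical
      t->dist = 1;
      return t->dist;
  }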

What I want:

  lfs3_trv_t:          lfs3_bshrub_t:       lfs3_handle_t:
  .---+---+---+---. .. .---+---+---+---. .. .---+---+---+---.
  |     handle    |    |     handle    |    |     handle    |
  |               |    |               |    |               |
  +---+---+---+---+    +---+---+---+---+ .. '---+---+---+---'
  |   root rbyd   |    |   root rbyd   |
  |               |    |               |    lfs3_mtortoise_t:
  +---+---+---+---+    +---+---+---+---+ .. .---+---+---+---.
  |   leaf rbyd   |    |   leaf rbyd   |    |   mtortoise   |
  |               |    |               |    |               |
  +---+---+---+---+    +---+---+---+---+ .. '---+---+---+---'
  | staging rbyd  |    | staging rbyd  |
  |               |    |               |
  +---+---+---+---+ .. '---+---+---+---'
  |               |
  :               :

But I'm starting to think this is simply not possible in modern C.

At least this shows what is theoretically possible if we didn't have to
fight the compiler.
I think modern C simply doesn't let us do what we want to do here, so
I'm giving up, discarding the lfs3_mtortoise_t type, and just abusing
various unrelated shrub fields to implement the tortoise. This
sacrifices readability, but at least avoids undefined behavior without
a RAM penalty:

- shrub.blocks => tortoise blocks
- shrub.weight => cycle distance
- shrub.eoff => power-of-two bound

Note this keeps trunk=0, which is a nice safety net in case some code
ever tries to read from the shrub in the future.

Fortunately the mtortoise logic is fairly self-contained in
lfs3_mtree_traverse_, so with enough comments hopefully the code is not
too confusing.
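
For reference, the tortoise step then looks roughly like the Brent-style
sketch below. The type/helper/error names are assumptions, only the
field mapping above comes from the actual change:

  // a hedged sketch of the Brent-style tortoise step; the real logic
  // lives in lfs3_mtree_traverse_
  static int lfs3_mtortoise_step(lfs3_shrub_t *tortoise,
          const lfs3_block_t mblocks[2]) {
      // looped back to the tortoise? => mtree cycle => corrupt
      if (mblocks[0] == tortoise->blocks[0]
              && mblocks[1] == tortoise->blocks[1]) {
          return LFS3_ERR_CORRUPT;
      }
      // teleport the tortoise every power-of-two steps
      if (tortoise->weight == tortoise->eoff) {
          tortoise->blocks[0] = mblocks[0];
          tortoise->blocks[1] = mblocks[1];
          tortoise->weight = 0;
          tortoise->eoff = (tortoise->eoff == 0) ? 1 : 2*tortoise->eoff;
      }
      tortoise->weight += 1;
      return 0;
  }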

---

Apparently shaves off a couple more bytes of code. I'm guessing this is
just because of the slightly different struct offsets (we're reusing the
root's rbyd instead of the leaf's rbyd now):

           code          stack          ctx
  before: 36852           2368          684
  after:  36844 (-0.0%)   2368 (+0.0%)  684 (+0.0%)
This fixes a strict aliasing violation in lfs3_file_sync_, where we cast
the file cache -> lfs3_data_t to avoid an extra stack allocation, by
modifying the file's cache struct to use an lfs3_data_t directly.

- file.cache.pos -> file.cache.pos
- file.cache.buffer -> file.cache.d.u.buffer_
- file.cache.size -> file.cache.d.size
- (const lfs3_data_t*)&file->cache -> &file->cache.d

Note the underscore_ in file.cache.d.u.buffer_. This did not fit
together as well as I had hoped, due to different const expectations
between the file cache and lfs3_data_t.

Up until this point lfs3_data_t has only been used to refer to const
data (ignoring side-band pointer casting in lfs3_mtree_traverse*), while
the file cache very much contains mutable data. To work around this I
added data.u.buffer_ as a mutable variant, which works, but risks an
accidental const violation in the future.
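
In other words, lfs3_data_t grew something like the following shape.
The layout here is my best guess at what's described above, with other
union members omitted:

  // roughly the shape of the data-backed cache experiment
  typedef struct lfs3_data {
      lfs3_size_t size;
      union {
          const uint8_t *buffer;  // the normal, const view of the data
          uint8_t *buffer_;       // mutable escape hatch for the file cache
      } u;
  } lfs3_data_t;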

---

Unfortunately this does come with a minor RAM cost, since we no longer
hide file.cache.pos in lfs3_data_t's buffer padding:

           code          stack          ctx
  before: 36844           2368          684
  after:  36832 (-0.0%)   2376 (+0.3%)  684 (+0.0%)

  lfs3_file_t before: 164
  lfs3_file_t after:  168 (+2.4%)

I think it's pretty fair to call C's strict aliasing rules a real wet
blanket. It would be interesting to create a -fno-strict-aliasing
variant of littlefs in the future, to see how much code/RAM could be
saved if we were given free rein to abuse the available memory.

Probably not enough to justify the extra work, but it would be an
interesting experiment.
This abandons the data-backed cache idea due to concerns around
readability and maintainability. Mixing const/mutable buffers in
lfs3_data_t was not great.

Instead, we now just allocate an indirect lfs3_data_t on the stack in
lfs3_file_sync_ to avoid the previous undefined behavior.
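
Something like the sketch below, where the initializer details are
assumptions based on the earlier field mapping:

  // hedged sketch: build a temporary lfs3_data_t on the stack instead
  // of reinterpreting the cache struct
  //
  // before (strict aliasing violation):
  //   const lfs3_data_t *data = (const lfs3_data_t*)&file->cache;
  //
  // after:
  lfs3_data_t data = {
      .size = file->cache.size,
      .u.buffer = file->cache.buffer,
  };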

This actually results in less total stack usage, due to lfs3_file_t
allocations in lfs3_set/read, and avoids the more long-term memory cost
in lfs3_file_t:

              code          stack          ctx
  before:    36832           2376          684
  after:     36840 (+0.0%)   2368 (-0.3%)  684 (+0.0%)

Oh. And lfs3_file_sync_ isn't even on the stack hot-path, so this is a
net benefit over the previous cache -> data cast:

              code          stack          ctx
  before sa: 36844           2368          684
  after sa:  36840 (-0.0%)   2368 (+0.0%)  684 (+0.0%)

Still less cool though.