
liljenzin
Contributor

Bumping the discussion in #377 with a proof of concept.

IMHO the file_stat_matches sloppiness option is not useful for several reasons.

Time stamps are updated when switching between branches; they don't match when files are checked out into different worktrees; and they cannot be trusted alone, since manifests contain relative file paths. On top of that, all included files still need to be hashed for each compilation unit that uses them.

Hard facts from building a large program, taking the Chromium browser as an example: with the current approach, over 18 million files will be hashed to build about 34000 compilation units. With the suggested patch this goes down to hashing about 74000 files in the initial build and 26500 files in repeated builds from a hot ccache. This saves 7-8 minutes of serial build time for me, which is significant, although the real-time gain for parallel builds is smaller and depends on the number of cores for natural reasons.

Note that this is a draft pull request with a patch I wrote just for the sake of discussion. I am not sure this is the best way to do it, but it demonstrates that a cache can be efficient with close to zero overhead and as an embedded feature, thus not introducing new dependencies on external infrastructure such as caching daemons.

========================================================

The inode cache is a process shared cache that maps from device, inode,
mtime to saved hash results. The cache is stored persistently in a
single file that is mapped into shared memory by running processes,
allowing computed hash values to be reused both within and between
builds.

The chosen technical solution works for Linux and might work on other
POSIX platforms, but is not meant to be supported on non-POSIX
platforms such as Windows.

Use 'ccache -o inode_cache=true/false' to activate/deactivate the cache.
Use 'ccache -o inode_cache_file=/path/to/cache/file' to set a custom
cache-file location. Defaults to '{cache-dir}/inode-cache'.
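To make the idea concrete, here is a minimal sketch of the lookup scheme. This is not the actual patch: the names (g_inode_cache, lookup, insert) are hypothetical, and a plain std::map stands in for the real memory-mapped shared file.

```cpp
#include <sys/stat.h>

#include <cstdint>
#include <map>
#include <optional>
#include <tuple>

// Sketch only: file identity is (device, inode), and mtime changes
// whenever the content does, so the triple keys the saved digest.
using Key = std::tuple<dev_t, ino_t, std::int64_t, std::int64_t>;  // dev, ino, mtime (s, ns)
using Digest = std::uint64_t;

std::map<Key, Digest> g_inode_cache;  // in the patch this lives in shared memory

static Key make_key(const struct stat& st) {
    return {st.st_dev, st.st_ino, st.st_mtim.tv_sec, st.st_mtim.tv_nsec};
}

// Returns the saved digest if the same (device, inode, mtime) was seen before.
std::optional<Digest> lookup(const struct stat& st) {
    auto it = g_inode_cache.find(make_key(st));
    if (it == g_inode_cache.end()) {
        return std::nullopt;
    }
    return it->second;
}

void insert(const struct stat& st, Digest digest) {
    g_inode_cache[make_key(st)] = digest;
}
```

A hit skips rehashing the file entirely; a change to the content bumps mtime and falls through to the slow path.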

@liljenzin liljenzin marked this pull request as ready for review April 13, 2020 10:04
@jrosdahl
Member

Interesting! I unfortunately don't have much time to think about this right now, though.

@jrosdahl jrosdahl added discussion feature New or improved feature labels Apr 13, 2020
@afbjorklund
Contributor

afbjorklund commented Apr 14, 2020

I think I missed something here: why would the inode change when a file is updated?

We have investigated reusing hash contents before, that time using memcached.
(i.e. using some existing key-value store, rather than writing our own for ccache)

Here is some old code, with "stat": 3.7-maint...afbjorklund:memcached-hashed
Also tried mmap (with lmdb), but there are just so many ccache processes created...

It has a great win, but same downfall as the other sloppy options for a silver bullet.

@liljenzin
Contributor Author

I think I missed something here: why would the inode change when a file is updated?

I will try to explain this way.

When calling hash_source_code_file() to hash "includes/foo.h", the result depends on the content of the specified file. It also depends on the provided seed and configuration options.

Caching the result based on (filename, mtime) would be a terrible idea since "includes/foo.h" is relative and could resolve to different files depending on current directory. It could also resolve to different files if files have been moved around or the directory structure has changed.

As (device, inode) makes up the identity of a file, and mtime is always updated when the content changes, (device, inode, mtime) must resolve to the same content over time; otherwise the system violates assumptions made by other build tools like "make".

The saved hash value is made independent of the provided seed by always hashing files using a fixed seed, saving the resulting hash digest, and then hashing the saved digest into the provided seed instead. This uses the hash-combine trick to make saved hash values context independent. Configuration settings that change the hash result (currently sloppy_time_macros) must be added to the key, though, which means (device, inode, mtime, sloppy_time_macros) is used as the combined key that identifies a saved entry in the cache.

We have investigated reusing hash contents before, that time using memcached.
(i.e. using some existing key-value store, rather than writing our own for ccache)

Here is some old code, with "stat": 3.7-maint...afbjorklund:memcached-hashed

It is always a tradeoff between generic solutions that solve generic problems and homebrewed solutions that do exactly what you need and nothing more.

I guess memcached is not zero-overhead because it runs in a separate process and we have to communicate over a socket, causing context switching? I also guess integrating with memcached requires about the same amount of code as doing it all ourselves? And that memcached also requires additional administrative work for end users?

Please correct me if I'm wrong.

Also tried mmap (with lmdb), but there's just so many ccache processes created...

In this case a single file is mapped into shared memory by concurrent processes. This is basically how libc.so and other shared libraries are loaded every time a ccache process is spawned. Nothing would be fast if the kernel couldn't handle this very efficiently.
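A minimal sketch of the mechanism (POSIX only; path and size below are illustrative, and the real patch adds layout and synchronization on top): each process maps the same file with MAP_SHARED, so the pages exist once in the page cache and writes by one process are immediately visible to the others.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>

// Map one cache file into shared memory; errors reduced to returning nullptr.
void* map_shared_cache(const char* path, std::size_t size) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        return nullptr;
    }
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
        close(fd);
        return nullptr;
    }
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    return mem == MAP_FAILED ? nullptr : mem;
}
```

Since the kernel backs every mapping of the file with the same pages, a short-lived ccache process pays essentially only the cost of the mmap call itself.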

The easiest way to find out is to build and test the patch yourself. Either it is fast or it isn't.

It has a great win, but same downfall as the other sloppy options for a silver bullet.

Exactly what downfalls do you see? And exactly how is it sloppy?

@afbjorklund
Contributor

Ah, OK. I used the absolute path as the key for the same thing (instead of device, inode), then.
You are probably right about the administrative overhead; I just spawned it from make...

Using memcached back then was natural since we already used it for distributed cache.
As it ended up not being integrated anyway, it would make sense to start over and try.

It has a great win, but same downfall as the other sloppy options for a silver bullet.

Exactly what downfalls do you see? And exactly how is it sloppy?

It has the usual mtime caveats, if you change content and reset timestamp you lose...

The "checksum always" strategy was the safe-and-slow option. But as opt-in, sure!

@liljenzin
Contributor Author

liljenzin commented Apr 14, 2020

It has the usual mtime caveats, if you change content and reset timestamp you lose...

In the actual implementation I added ctime and size to the key, thus tightening such loopholes, since you can't modify mtime without also updating ctime. I didn't mention it earlier because I didn't want to complicate the explanation of the principle with all the details.
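A sketch of the tightened key (names are hypothetical, not the patch's): even if mtime is reset after an edit, ctime will have advanced, so the combined key still changes.

```cpp
#include <sys/stat.h>

#include <cstdint>
#include <tuple>

// Hypothetical tightened cache key: ctime and size are added because mtime
// can be set back from user space, but ctime cannot.
using TightKey = std::tuple<dev_t, ino_t,
                            std::int64_t, std::int64_t,   // mtime (s, ns)
                            std::int64_t, std::int64_t,   // ctime (s, ns)
                            off_t>;                       // size

TightKey tight_key(const struct stat& st) {
    return {st.st_dev, st.st_ino,
            st.st_mtim.tv_sec, st.st_mtim.tv_nsec,
            st.st_ctim.tv_sec, st.st_ctim.tv_nsec,
            st.st_size};
}
```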

The "checksum always" strategy was the safe-and-slow option. But as opt-in, sure!

You can always violate contracts by editing the actual disk blocks as a last resort. The point is that other parts of a build system make the same assumptions, and if you violate them, ccache will not be the only build tool that breaks.

The feature is not sloppy in the way that it will break by accident when used on a healthy system. If you have to be creative to break it, then it is not a flaw that will hit innocent users.

@afbjorklund
Contributor

The feature is not sloppy in the way that it will break by accident when used on a healthy system. If you have to be creative to break it, then it is not a flaw that will hit innocent users.

It was not meant as sloppy as in dirty, and there is an inherent risk in all caching... Like you say, if the setup is healthy then it should be reliable. Hope that someone gets a chance to look at it!

@jrosdahl
Member

Hard facts from building a large program [...] This saves 7-8 minutes in serial build time for me [...]

Thanks for the numbers. Not that it matters much, but it would be interesting to know how much of the 7-8 minutes come from avoiding hashing and how much come from avoiding I/O.

By the way, do you use the file_clone or hard_link mode?

The inode cache is a process shared cache that maps from device, inode,
mtime to saved hash results. The cache is stored persistently in a
single file that is mapped into shared memory by running processes,
allowing computed hash values to be reused both within and between
builds.

This indeed sounds like a much better approach than the path-based one envisioned in #377. Thanks! I have nothing against this approach in principle. Well, maybe the only concern I have is if your use case is "non-edge" enough to motivate increasing the ccache implementation complexity.

@liljenzin
Contributor Author

Thanks for the numbers. Not that it matters much, but it would be interesting to know how much of the 7-8 minutes come from avoiding hashing and how much come from avoiding I/O.

It seems to be from reduced hashing. Before implementing the cache I used valgrind/callgrind for profiling and found that about 60-70% of CPU time was spent in hashing, and most of this time was eliminated by the cache. The mentioned build was performed on a machine with 128 GB RAM, which effectively eliminates all reads from physical disk since all source files and all output fit into buffer space. That doesn't mean I/O is free, but there is no latency from waiting for disk.

By the way, do you use the file_clone or hard_link mode?

None of them, mainly because I/O pressure looks very low in general when I build, provided sufficient RAM and fast NVMe drives. At least I have not seen any measurable gain when trying these options in the past.

The inode cache is a process shared cache that maps from device, inode,
mtime to saved hash results. The cache is stored persistently in a
single file that is mapped into shared memory by running processes,
allowing computed hash values to be reused both within and between
builds.

This indeed sounds like a much better approach than the path-based one envisioned in #377. Thanks! I have nothing against this approach in principle. Well, maybe the only concern I have is if your use case is "non-edge" enough to motivate increasing the ccache implementation complexity.

I think the important question is whether ccache performance matters when you get a hit, or if hits are already fast enough. If it matters, it might be hard to find other changes that achieve similar improvements without increasing complexity even more.

Note that even trivial compilation units, such as the "Hello World!" program, gain from the cache, because a single standard header often includes a huge number of other files.

$ cat t.cc
#include <iostream>

int main() {
	std::cout << "Hello World!" << std::endl;
}

$ ccache -o inode_cache=false
$ time for ((i=0;i<10000;++i)); do ccache g++ -c t.cc; done
real	0m37,334s
user	0m26,053s
sys	0m11,462s

$ ccache -o inode_cache=true
$ time for ((i=0;i<10000;++i)); do ccache g++ -c t.cc; done

real	0m19,883s
user	0m9,167s
sys	0m10,981s

Which is 87% faster even for a trivial program. :-)

Thanks for the review comments btw. Will look at them when I get some spare time the coming week.

@afbjorklund
Contributor

Seems to be from reduced hashing. Before implementing the cache I used valgrind/callgrind for profiling and found about 60-70% of cpu time was spent in hashing, while most of this time got eliminated by the cache

You can also use the built-in ccache tracing, to get something you can load into chrome: #280

./configure --enable-tracing

export CCACHE_INTERNAL_TRACE=1

liljenzin and others added 16 commits May 3, 2020 11:02
Co-authored-by: Joel Rosdahl <joel@rosdahl.net>
liljenzin added 2 commits May 25, 2020 20:26
Apparently posix_fallocate() needs read permission on the open file,
otherwise it will not work on NFS since the emulation is client side.
@jrosdahl jrosdahl merged commit 213d988 into ccache:master May 31, 2020
@jrosdahl
Member

Thanks!

@jrosdahl jrosdahl added this to the 4.0 milestone May 31, 2020
@jrosdahl jrosdahl changed the title Inode cache for file hashes (proof of concept) Inode cache for file hashes May 31, 2020
jrosdahl added a commit to jrosdahl/ccache that referenced this pull request Jun 18, 2020
Unintended or not, ccache#577 (213d988) changed the behavior of “ccache
--hash-file” to use hash_binary_file, which essentially performs
hash(hash(path)) if the i-node cache is enabled, otherwise hash(path).
This means that “ccache --hash-file” behaves differently depending on if
i-node cache is enabled and also that it’s no longer usable for
benchmarking purposes.

Fix this by simply using “hash_file” again.
jrosdahl added a commit that referenced this pull request Jun 18, 2020