-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Locked memory manager #8753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Locked memory manager #8753
Conversation
Concept ACK! I'm glad that you're working on this. I think it's the right approach. |
There is no
|
Indeed. I've also been thinking about heartbleed-like attacks. Currently key data is scattered all around the heap and stack, with this approach it is consolidated in a few places which are separate from the normal heap where e.g. network buffers are allocated. It would help even more if the secret data is separated with a 'moat' of unmapped pages from the normal heap, so that a large read can't get there. I've done nothing special to accomplish this at the moment, though, apart from trying to use the OS page allocation directly. Which reminds me that on POSIX I should probably be using
Gah, that needs a silly cast to uint64_t (I guess this error comes up on 32-bit platforms?). |
// Implementation: LockedPool | ||
|
||
LockedPool::LockedPool(std::unique_ptr<LockedPageAllocator> allocator, LockingFailed_Callback lf_cb_in): | ||
allocator(std::move(allocator)), lf_cb(lf_cb_in), cumulative_bytes_locked(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_allocator
here please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using _in
as convention in this file, but yes it shouldn't shadow here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New file, new convention? Welcome to Bitcoin Core...
Edit: There is no need to markup irony ;-)
void *synth_base = reinterpret_cast<void*>(0x08000000); | ||
const size_t synth_size = 1024*1024; | ||
Arena b(synth_base, synth_size, 16); | ||
void *x = b.alloc(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you use x
down in for cycles, please change this.
@paveljanik Ok, I've made your variable naming changes. But let's please discuss higher-level concerns first before bombarding nits in code that may be thrown away anyway. |
The higher level is already wrote by @gmaxwell, no need to repeat it. With
After
No memory locked at all? Or when we jump out of the limit, you do not lock anything? Hmm, arena is 256k min. Will try with lower arena size. Changed arenasize to 128k and:
Good! |
The default |
It allocates and locks memory per arena. If locking the first arena (of 256Kib) fails, nothing will be locked. You could set the
Yes on Ubuntu it's also unlimited by default. OpenBSD has 5 MiB. 64k seems utterly useless. |
Done, it should always get one arena of locked memory as long as the limit is larger then 0. If not it will act as a NonLockedPoolManager, nothing else to do. |
893222f
to
3105dbf
Compare
After
|
That's exactly what should be expected. It uses all the locked memory allowed to it by the limit. Thanks for testing. |
You might want to make the allocation increment just one page and on start have an initial allocation that is equal to whatever typical usage is in the current configuration. This would both reduce over-allocation and increase the chances that we get all that ulimit would allow. Not a strong opinion, just a design tweak. Guard pages sound like a good idea. They should be at least as large as any object that exists in the system. Causing the locked page pool to be at a random address would be helpful too. |
The practical problem here is that having tons of one-page (or two-page for that matter) arenas reduces performance, at least with the current implementation. I don't think allocating 256kB (or whatever the ulimit is, if it is less) on the first time this is used is such a bad compromise given Bitcoin Core's frequent use of this memory. As said my relatively large (not huge) wallet already requires 512kiB (which is really ~300KiB rounded up, but we're not programming for the C64 here). Also there would be some residual memory wasted if the pool consists of 4k/8k blocks spread around. And mind the syscall overhead when requesting from OS in such small quantities. Note that I'm all for changing the 256kB parameter to say, 128kB, if that is marginally better. As I've documented in the comments it's a compromise.
Hm, I guess, by specifying a random argument to mmap (modulus the virtual address size, rounded to a page) and then 'probing' the address space where it can be put. I think this is a great idea for later, but I'm not aiming to do so in this pull (maybe we can borrow some code from ASLR?). Also here you don't want the individual arenas too small or it'd spray the address space to unusability, as well as burden the MMU tables. |
Review comment on the first two commits: if you'd change the variable name of the field/variable whose type changes, it's obvious from the diff that you've adapted all places in the code that affect it (makes it easier to guarantee that there are no sizeofs left). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
#endif | ||
|
||
LockedPoolManager* LockedPoolManager::_instance = NULL; | ||
boost::once_flag LockedPoolManager::init_flag = BOOST_ONCE_INIT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::once_flag ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, that breaks the only dependency on boost
Chunk(size_t size_in, uint32_t flags_in): | ||
size(size_in), flags(flags_in) {} | ||
size_t size; | ||
uint32_t flags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/uint32_t/ChunkFlags/ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does bitwise logic with enums work with C++11? I didn't dare try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enums automatically decay to the integer type they are derived from for supported operations. Also, any reason for not just using a boolean? Do we expect more flags to be added in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enums automatically decay to the integer type they are derived from for supported operations.
Right - it always broke down when trying to assign the result of a bitwise operation back to the enum type. Good to hear that this is not a problem anymore.
Do we expect more flags to be added in the future?
No, I don't expect more flags to be added.
I'm thinking of using the typical C heap solution: use the LSB of size as used-flag, then set the minimum for the required alignment to 2. It'd be silly to set the alignment to 1 anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to cast to assign back to the enum type, as integers aren't automatically converted to enums, only the other way around.
What I meant was that you'd just have an enum with two values, and you wouldn't use any bitwise logic. But that would not make it much more clear than just using a boolean.
I'm thinking of using the typical C heap solution: use the LSB of size as used-flag, then set the minimum for the required alignment to 2. It'd be silly to set the alignment to 1 anyway.
Or use the MSB and disallow locking over 2 GiB :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or use the MSB and disallow locking over 2 GiB :)
Good idea, going with that, it's less invasive.
void* Arena::alloc(size_t size) | ||
{ | ||
/* Round to next multiple of alignment */ | ||
size = (size + alignment - 1) & ~(alignment-1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use align_up?
return nullptr; | ||
|
||
for (auto& chunk: chunks) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code style nit: braces on the same line, except for namespaces, classes, functions, methods.
if (locked) { | ||
cumulative_bytes_locked += size; | ||
} else if (lf_cb) { // Call the locking-failed callback if locking failed | ||
if (!lf_cb()) { // If the callback returns false, free the memory and fail, other consider the user warned and proceed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: otherwise
Good idea, I did this for the class field at least (chKey to vchKey and chIV to vchIV), makes sense as a general suggestion. |
1c769d7
to
f15e8b8
Compare
rebased:
|
Concept ACK, impressive work, will try to test this soon. |
Ref: #3949 |
f15e8b8
to
db325a9
Compare
Added a benchmark to bench/ |
Change CCrypter to use vectors with secure allocator instead of buffers on in the object itself which will end up on the stack. This avoids having to call LockedPageManager to lock stack memory pages to prevent the memory from being swapped to disk. This is wasteful.
Replace these with vectors allocated from the secure allocator. This avoids mlock syscall churn on stack pages, as well as makes it possible to get rid of these functions. Please review this commit and the previous one carefully that no `sizeof(vectortype)` remains in the memcpys and memcmps usage (ick!), and `.data()` or `&vec[x]` is used as appropriate instead of &vec.
f42c60a
to
444c673
Compare
Squashed the following commits
|
444c673 bench: Add benchmark for lockedpool allocation/deallocation (Wladimir J. van der Laan) 6567999 rpc: Add `getmemoryinfo` call (Wladimir J. van der Laan) 4536148 support: Add LockedPool (Wladimir J. van der Laan) f4d1fc2 wallet: Get rid of LockObject and UnlockObject calls in key.h (Wladimir J. van der Laan) 999e4c9 wallet: Change CCrypter to use vectors with secure allocator (Wladimir J. van der Laan)
OS X, clang (one
So maybe
somewhere before its usage? |
46c9043 Remove some unused functions and methods (Pieter Wuille) Pull request description: In the case of CKey's destructor, it seems to have been an oversight in #8753 not to delete it. At this point, it results in the move constructors/assignment operators for CKey being deleted, which may have a performance impact (requiring a pool allocation/copy/free, rather than just handing over the pointer from one CKey to another) Tree-SHA512: 89715bafe3e0bea2c46fc92bc6a1010360a3fee2719f97b81ca379581003409b0876b50f992208a3c13c7f5b77f1866db09954e7d102f6a452fe5d7aed2044a1
444c673 bench: Add benchmark for lockedpool allocation/deallocation (Wladimir J. van der Laan) 6567999 rpc: Add `getmemoryinfo` call (Wladimir J. van der Laan) 4536148 support: Add LockedPool (Wladimir J. van der Laan) f4d1fc2 wallet: Get rid of LockObject and UnlockObject calls in key.h (Wladimir J. van der Laan) 999e4c9 wallet: Change CCrypter to use vectors with secure allocator (Wladimir J. van der Laan)
444c673 bench: Add benchmark for lockedpool allocation/deallocation (Wladimir J. van der Laan) 6567999 rpc: Add `getmemoryinfo` call (Wladimir J. van der Laan) 4536148 support: Add LockedPool (Wladimir J. van der Laan) f4d1fc2 wallet: Get rid of LockObject and UnlockObject calls in key.h (Wladimir J. van der Laan) 999e4c9 wallet: Change CCrypter to use vectors with secure allocator (Wladimir J. van der Laan)
46c9043 Remove some unused functions and methods (Pieter Wuille) Pull request description: In the case of CKey's destructor, it seems to have been an oversight in bitcoin#8753 not to delete it. At this point, it results in the move constructors/assignment operators for CKey being deleted, which may have a performance impact (requiring a pool allocation/copy/free, rather than just handing over the pointer from one CKey to another) Tree-SHA512: 89715bafe3e0bea2c46fc92bc6a1010360a3fee2719f97b81ca379581003409b0876b50f992208a3c13c7f5b77f1866db09954e7d102f6a452fe5d7aed2044a1
46c9043 Remove some unused functions and methods (Pieter Wuille) Pull request description: In the case of CKey's destructor, it seems to have been an oversight in bitcoin#8753 not to delete it. At this point, it results in the move constructors/assignment operators for CKey being deleted, which may have a performance impact (requiring a pool allocation/copy/free, rather than just handing over the pointer from one CKey to another) Tree-SHA512: 89715bafe3e0bea2c46fc92bc6a1010360a3fee2719f97b81ca379581003409b0876b50f992208a3c13c7f5b77f1866db09954e7d102f6a452fe5d7aed2044a1
46c9043 Remove some unused functions and methods (Pieter Wuille) Pull request description: In the case of CKey's destructor, it seems to have been an oversight in bitcoin#8753 not to delete it. At this point, it results in the move constructors/assignment operators for CKey being deleted, which may have a performance impact (requiring a pool allocation/copy/free, rather than just handing over the pointer from one CKey to another) Tree-SHA512: 89715bafe3e0bea2c46fc92bc6a1010360a3fee2719f97b81ca379581003409b0876b50f992208a3c13c7f5b77f1866db09954e7d102f6a452fe5d7aed2044a1
46c9043 Remove some unused functions and methods (Pieter Wuille) Pull request description: In the case of CKey's destructor, it seems to have been an oversight in bitcoin#8753 not to delete it. At this point, it results in the move constructors/assignment operators for CKey being deleted, which may have a performance impact (requiring a pool allocation/copy/free, rather than just handing over the pointer from one CKey to another) Tree-SHA512: 89715bafe3e0bea2c46fc92bc6a1010360a3fee2719f97b81ca379581003409b0876b50f992208a3c13c7f5b77f1866db09954e7d102f6a452fe5d7aed2044a1
46c9043 Remove some unused functions and methods (Pieter Wuille) Pull request description: In the case of CKey's destructor, it seems to have been an oversight in bitcoin#8753 not to delete it. At this point, it results in the move constructors/assignment operators for CKey being deleted, which may have a performance impact (requiring a pool allocation/copy/free, rather than just handing over the pointer from one CKey to another) Tree-SHA512: 89715bafe3e0bea2c46fc92bc6a1010360a3fee2719f97b81ca379581003409b0876b50f992208a3c13c7f5b77f1866db09954e7d102f6a452fe5d7aed2044a1
46c9043 Remove some unused functions and methods (Pieter Wuille) Pull request description: In the case of CKey's destructor, it seems to have been an oversight in bitcoin#8753 not to delete it. At this point, it results in the move constructors/assignment operators for CKey being deleted, which may have a performance impact (requiring a pool allocation/copy/free, rather than just handing over the pointer from one CKey to another) Tree-SHA512: 89715bafe3e0bea2c46fc92bc6a1010360a3fee2719f97b81ca379581003409b0876b50f992208a3c13c7f5b77f1866db09954e7d102f6a452fe5d7aed2044a1
46c9043 Remove some unused functions and methods (Pieter Wuille) Pull request description: In the case of CKey's destructor, it seems to have been an oversight in bitcoin#8753 not to delete it. At this point, it results in the move constructors/assignment operators for CKey being deleted, which may have a performance impact (requiring a pool allocation/copy/free, rather than just handing over the pointer from one CKey to another) Tree-SHA512: 89715bafe3e0bea2c46fc92bc6a1010360a3fee2719f97b81ca379581003409b0876b50f992208a3c13c7f5b77f1866db09954e7d102f6a452fe5d7aed2044a1
46c9043 Remove some unused functions and methods (Pieter Wuille) Pull request description: In the case of CKey's destructor, it seems to have been an oversight in bitcoin#8753 not to delete it. At this point, it results in the move constructors/assignment operators for CKey being deleted, which may have a performance impact (requiring a pool allocation/copy/free, rather than just handing over the pointer from one CKey to another) Tree-SHA512: 89715bafe3e0bea2c46fc92bc6a1010360a3fee2719f97b81ca379581003409b0876b50f992208a3c13c7f5b77f1866db09954e7d102f6a452fe5d7aed2044a1
f8fd095 Add missing lock in crypter GetKeys. (furszy) 366bc8b Get rid of LockObject and UnlockObject unused methods. (furszy) 121e5c0 [Wallet] Change CCrypter to use vectors with secure allocator (furszy) Pull request description: Built on top of #1486. Commits starting in `d058064` . This PR back port, partially, [upstream@8753](bitcoin#8753) . -- The last three commits left to back port in a future PR -- Containing the following changes: 1) Changing CCrypter to use vectors with secure allocator instead of have to lock stack memory pages to prevent the memory from being swapped to disk. 2) Removing the unused LockObject and UnlockObject methods. 3) Adding a missing lock in the `CCryptoKeyStore::GetKeys` method. ACKs for top commit: Fuzzbawls: utACK f8fd095 random-zebra: utACK f8fd095 and merging... Tree-SHA512: c1b760d37da93623ade0fb2565e8f060a4b7f5a109633cbe885732a16ed32614ba1f07d351fadcd57a19edf1343bdee1a81ee27898b4f124b6ddb15cc226d0d2
Locked memory manager Add a pool for locked memory chunks, replacing `LockedPageManager`. Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#8321 - bitcoin/bitcoin#8753 - bitcoin/bitcoin#9063 - bitcoin/bitcoin#9070 - bitcoin/bitcoin#11385 - bitcoin/bitcoin#12048 - Excludes change to benchmark. - bitcoin/bitcoin#15117 - bitcoin/bitcoin#16161 - Excludes Travis CI changes. - Includes change from bitcoin/bitcoin#13163 - bitcoin/bitcoin#15600 - bitcoin/bitcoin#18443 - Assorted small changes from: - bitcoin/bitcoin#9233 - bitcoin/bitcoin#10483 - bitcoin/bitcoin#10645 - bitcoin/bitcoin#10969 - bitcoin/bitcoin#11351 - bitcoin/bitcoin#19111 - Excludes change to `src/rpc/server.cpp` - bitcoin/bitcoin#9804 - Only the commit for `src/key.cpp` - bitcoin/bitcoin#9598
1c04a35 [Trivial] Use C++11 nulltpr instead of NULL macro in allocators files (random-zebra) 68a3f8f bench: Add benchmark for lockedpool allocation/deallocation (Wladimir J. van der Laan) cd34088 rpc: Add `getmemoryinfo` call (Wladimir J. van der Laan) 2d2e331 support: Add LockedPool (Wladimir J. van der Laan) 6ca6470 allocators: split allocators and pagelocker (random-zebra) Pull request description: First commit adapts btc d7d187e. Then backport the last 3 commits of bitcoin#8753, which were left out in #1488. ACKs for top commit: Fuzzbawls: ACK 1c04a35 furszy: ACK 1c04a35 . Tree-SHA512: 9ee1100df3648a73e1fcd9b98f7fb793d3eb342ad239d3d4f9b2dd32dc0bacfdb29825e84e4e2ba4e621aedb0a62ae66dfba4d76a435b77dd5e2af1a01541890
Add a pool for locked memory chunks, replacing
LockedPageManager
.This is something I've been wanting to do for a long time. The current approach of preventing swapping of sensitive information by locking objects where they happen to be on the stack or heap in-place causes a lot of mlock/munlock system call churn, slowing down any handling of keys.
Not only that, but locked memory is a limited resource and using a lot of it bogs down the system by increasing overall swappiness, so the previous approach of locking every page that may contain any key information (but also other information) is wasteful.
Thus replace it with a consolidated pool of locked memory, so that chunks of "secure" memory can be allocated and freed without any system calls, and there is little memory overhead as possible (for example, administrative structures are not themselves in locked memory). The pool consists of one of more arenas, which divide a contiguous memory range into chunks. Arenas are allocated per 256 kB (configurable). If all current arenas are full, allocate a new one. Arenas are directly allocated from the OS with the appropriate memory page allocation API. No arenas are ever freed unless the program exits.
getmemoryinfo
that returns statistics from LockedPoolManager. This can also be used to check whether locking actually succeeded (to prevent sudden crashes with low ulimits it is not fatal if locking fails, but it is useful to be able to see if all key data is still in unswappable memory).Review notes
sizeof(vectortype)
remains in the memcpys and memcmps usage (ick!), and.data()
or&vec[x]
is used as appropriate instead of&vec
which would overwrite the vector structure.Measurements
Immediately after startup, loading a fairly large wallet.
Amount of memory locked
cat /proc/$PID/status | grep VmLck
, current master:With this patch:
Syscall count
strace -cf
over whole run (start+shutdown immediately) current master:With this patch: