-
Notifications
You must be signed in to change notification settings - Fork 37.8k
Process logs in a separate thread #13200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c147c17
to
f3f9602
Compare
src/logging.cpp
Outdated
static void ConsumeLogs() | ||
{ | ||
std::unique_ptr<LogArgs> next_log_line; | ||
while (next_log_line = log_buffer.PollForOne()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add batch support?
Concept ACK. My thoughts on your open questions:
|
Concept ACK! Agree with @jnewbery regarding the open questions. Benchmarks would be interesting. |
Wouldn't blocking be the status-quo and thus preferable for now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Good idea :)
Just a few things I've spotted.
Regarding your questions:
Do we want this behavior by default? I think so, given that it seems safer and more performant than doing log processing synchronously with Important Stuff.
Yes
Do we want to allow an opt-out configuration option that makes log processing synchronous again?
I'm a bit against adding config options for these kind of things. It either should work in all cases or not be used. I think for logging we should stick to one version and not have it user configurable.
The ring buffer implementation allows either overwriting existing data or blocking when at full capacity. In the case of logging, I've set it to block and wait for capacity instead of dropping messages. Does this seem right?
I'm very against dropping logging data. It shouldn't happen. The threads pushing logs should wait if it does block but the buffer should be big enough that this is rare anyway.
src/ringbuffer.h
Outdated
* | ||
* Allows optional blocking behavior instead of overwriting when at capacity. | ||
*/ | ||
template<typename ET, int S, typename ST=int> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need the size type template parameter? Especially if you are passing an int through as a value. size_t
makes more sense to me since std::array is size_t and signed sizes (int) also make no sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, agree. Fixed.
src/ringbuffer.h
Outdated
std::lock_guard<std::mutex> lock(m_lock); | ||
return m_size; | ||
} | ||
inline size_type MaxSize() const { return S; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constexpr ? I guess inline helps some here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/ringbuffer.h
Outdated
size_type m_size; | ||
|
||
/** Index at which the next insertion will happen. */ | ||
std::atomic<ST> m_insert_at_idx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't make sense why having these (m_insert_at_idx
and m_read_at_idx
) as std::atomic is necessary if they are protected by the mutex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, fixed.
src/ringbuffer.h
Outdated
assert(m_size > 0); | ||
|
||
value_type data = std::move(m_buffer[m_read_at_idx]); | ||
m_buffer[m_read_at_idx] = value_type(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the object is movable it will be moved and the value will be empty already but if it's not movable and it's copied then I guess there's no big deal letting the data present? Seems unnecessary to construct an empty object in a free slot again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, fixed.
src/ringbuffer.h
Outdated
* | ||
* @arg no_overwrite If true, block and wait to write instead of overwriting existing data. | ||
*/ | ||
size_type PushBack(value_type v, bool no_overwrite = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having a move version of PushBack with const v& ref push back would make more sense to avoid unnecessary copying.
PushBack(value_type&& v)
and PushBack(const value_type& v)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Shouldn't const value_type&
bind to both rvalues and lvalues, though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/logging.cpp
Outdated
|
||
void FlushAll() | ||
{ | ||
for (LogArgs s : log_buffer.PopAll()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LogArgs& s
to avoid copies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed.
src/ringbuffer.h
Outdated
} | ||
|
||
private: | ||
std::array<value_type, S> m_buffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should use dynamic/heap memory for this for the following reasons:
- This is only allocated once on start up and is of fixed size.
- Passing a large value to this array can potentially have 2 problems:
- In the stack it can overflow (if used later by someone else)
- In global/static data it will grow the executable size.
Anyway just what I feel. Maybe std::array is okay in this case though. I'm just imagining uses of this class further down the line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to leave this as-is for now since I don't think it's a real problem, but if you have thoughts to the contrary I'm curious to hear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me thanks! I went a bit overboard with the comments anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I thought your feedback was very useful - thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine - if you want it on the heap you can alloc the entire RingBuffer
on the heap
84a3c73
to
5dbf3bf
Compare
I've pushed changes that address @donaloconnor's good feedback (thanks, Donal). Still trying to decide what to do re: blocking vs. dropping messages; I'm inclined towards the approach that @jnewbery suggests since it seems to be the safest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some comments - two general ones though:
- you really wrote a thread safe ringbuffer - if you think this will be useful elsewhere, consider having just a pure implementation of the ringbuffer with a derivation (or wrapper) that adds the synchronization logic
- Some of your api naming/consistency is a bit confusing. You have:
PushBack()
- might block if no_overwrite is set
PopFront()
- never blocks - user must know data is available
PollForOne()
- blocks, might return null
On the block vs drop debate - I think I would vote in the blocking camp since that is essentially current behavior, followed by dropping older messages
src/ringbuffer.h
Outdated
* | ||
* Allows optional blocking behavior instead of overwriting when at capacity. | ||
*/ | ||
template<typename ET, int S> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider making these template parameter names less abbreviated
src/ringbuffer.h
Outdated
} | ||
|
||
private: | ||
std::array<value_type, S> m_buffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine - if you want it on the heap you can alloc the entire RingBuffer
on the heap
src/ringbuffer.h
Outdated
*/ | ||
std::unique_ptr<value_type> PollForOne(int poll_interval_ms = 200) | ||
{ | ||
while (!m_stop_waiting.load()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this approach seems strange - why not just use another condition variable the same way you do when you push things to the buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you are doing this just to handle interruption on shutdown - you can do the boost::thread::interrupt() boost::condition_var and catch a boost::thread_interrupted exception instead of polling in a loop - its another option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we're trying to move away from boost
's thread stuff, but I think your general suggestion holds - we can probably just flip a condition variable + boolean in Shutdown()
to get the same sort of behavior. The polling was a pretty gross hack.
src/ringbuffer.h
Outdated
~RingBuffer() { } | ||
|
||
/** @return the number of elements in the buffer. */ | ||
inline size_t Size() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline seems overkill here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also - if you wanna do the size_t stuff tradition / precedence would suggest doing using size_type = size_t
src/ringbuffer.h
Outdated
std::unique_lock<std::mutex> lock(m_lock); | ||
std::vector<value_type> values; | ||
|
||
while (m_size > 0 && m_read_at_idx != m_insert_at_idx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sufficient here to just do the size check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also - you could just do this as two range copies one from read_at to min(read_at + m_size, S) and another from 0 to m_size - the size of the stuff you copied previously
src/logging.cpp
Outdated
log_buffer.PushBack(std::move(str)); | ||
} | ||
|
||
void Shutdown() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if for whatever reason there are more logs generated after you've flushed in this method - they wont ever get to the file.
I think you mentioned somewhere having an option to enable synchronous logging. You should probably flip that flag that here so if something happens to get logged after the async logger is shut down it makes its way into the file, added benefit that you can be a bit less careful about when in the shutdown sequence the logger is terminated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good point.
src/bitcoind.cpp
Outdated
return (AppInit(argc, argv) ? EXIT_SUCCESS : EXIT_FAILURE); | ||
int status = AppInit(argc, argv); | ||
|
||
if (!status) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think if you do shutdown here it will only work for coind but not qt but in init.cpp it will for both? not entirely sure though
src/init.cpp
Outdated
@@ -291,6 +291,7 @@ void Shutdown() | |||
globalVerifyHandle.reset(); | |||
ECC_Stop(); | |||
LogPrintf("%s: done\n", __func__); | |||
async_logging::Shutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh - why do in both? if you are just concerned about capturing all logs, see my comment on the Shutdown() method below
src/ringbuffer.h
Outdated
if (no_overwrite && m_size == S) { | ||
m_read_condvar.wait(lock, [this]() { return this->m_size < S; }); | ||
} | ||
m_buffer[m_insert_at_idx] = v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing as in a prior comment - use a forward with an rvalue reference
src/ringbuffer.h
Outdated
{ | ||
std::unique_lock<std::mutex> lock(m_lock); | ||
if (m_size > 0) { | ||
return std::unique_ptr<value_type>(new value_type(AssumeLockedPopFront())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you are doing an unnecessary copy here
Thanks for the thorough review and good feedback here, @skeees. I'll address most of your comments in an upcoming push. Sadly, the benchmarks aren't looking good on this one. Either my implementation of the ringbuffer is dog slow or this Concept is eminently NACKable. Doing an IBD to height 250,000 from one (chain-synced) virtual machine to another (unsynced) over my local network, this PR performed roughly 77% slower than master I'm going to try applying @TheBlueMatt's previous work (bitcoinfibre@6b6a3ae) to master and see if that performs any better. If it doesn't, I'm going to just abandon async logging and continue work on a periodic benchmarking system. |
I'd do some profiling / tweaking before you abandon this entirely. I'd think this should be at a minimum equivalent - and potentially substantially faster than the current way things are logged. |
@TheBlueMatt's patch (jamesob@7ab904a) is better than mine, but still a minute (i.e. 10%) slower than
|
I was curious about the unexpectedly slow performance and tried something slightly different - skeees@1bf2307 its a bit messy, but good enough for benchmarking which seems to suggest it performs slightly better than the current logging, especially if you give it short workloads (e.g. do a reindex w a stopatheight of 150k) - though I imagine this effect gets washed away if you did e.g. a full IBD. |
Nice, @skeees! That approach performs the best so far in terms of async logging implementations, though it's still roughly on par with master in my benchmark:
(the core dump is just an unclean shutdown of the logging thread) @skeees suggested some ideas for further improvements on his draft which he or I may fool around with. |
I cleaned this up a bit more and updated my branch: https://github.com/skeees/bitcoin/tree/async-log-2 For benchmarks like the below (a reindex of 1st 100k blocks), I see a substantial improvement - that slowly fades away and longer duration things (e.g. a large IBD) is effectively equivalent - which suggests this may have some performance benefit for short bursty stuff that generates lots of logs
|
3212047
to
34000b1
Compare
I think I'm giving up the ghost on this one. I've taken @skeees' latest work, fixed shutdown, and simplified the code a bit, but I still can't get benchmarks to show improvement. Below is a round of three reindexes to height 100,000: master (196c5a9)vagrant@bench:/tmp/2018-05-09-master-6186a/bitcoin$ for _ in `seq 3`; do time ./src/bitcoind -datadir=`pwd`/data -conf=`pwd`/bitcoin.conf -debug=all
-printtoconsole=0 -connect=0 -addnode=10.0.0.10 -listen=1 -stopatheight=100000 -reindex -logasync=0; done
real 1m21.080s
user 0m33.459s
sys 0m37.487s
real 1m4.716s
user 0m34.033s
sys 0m33.567s
real 1m23.858s
user 0m34.043s
sys 0m36.156s this branch (34000b1)vagrant@bench:/tmp/2018-05-11-2018-05-asynclog-13145/bitcoin$ for _ in `seq 3`; do time ./src/bitcoind -datadir=`pwd`/data -conf=`pwd`/bitcoin.conf -debug=all -printtoconsole=0 -connect=0 -addnode=10.0.0.10 -listen=1 -stopatheight=100000 -reindex; done
real 1m4.269s
user 0m31.258s
sys 0m33.798s
real 1m30.629s
user 0m31.691s
sys 0m40.961s
real 1m14.528s
user 0m32.676s
sys 0m38.040s skeees version (skeees@57dc3fc)vagrant@bench:/tmp/2018-05-11-2018-05-asynclog-13145/bitcoin$ for _ in `seq 3`; do time ./src/bitcoind -datadir=`pwd`/data -conf=`pwd`/bitcoin.conf -debug=all -printtoconsole=0 -connect=0 -addnode=10.0.0.10 -listen=1 -stopatheight=100000 -reindex; done
terminate called without an active exception
Aborted (core dumped)
real 1m18.180s
user 0m33.764s
sys 0m39.323s
terminate called without an active exception
Aborted (core dumped)
real 3m17.639s
user 0m45.444s
sys 2m21.804s
terminate called without an active exception
Aborted (core dumped)
real 4m59.937s
user 0m48.212s
sys 3m35.134s Can't tell what's going on with those last two runs. In any case, I don't think the numbers here justify the complexity of this changeset. It's surprising to me that we aren't seeing more improvement, but I've got other stuff to do. I'm going to abandon this PR for now, but if anyone wants to pick it up they're welcome to. |
I would like to thank you for doing this experimentation @jamesob. Negative results give insights too. Thanks! |
Reopening because @skeees has some interesting bench results... |
wrote a quick benchmarking script here: https://gist.github.com/skeees/93d3ab3ce8ceabfa1b68df51d3b36b60 Some sample results below (we've now had a couple of independent reproductions across a couple of different machines) - and the latest commit on this pr is consistently faster than existing implementations for microbenchmarks (100k blocks) and equivalent if you change height in the script to a larger number (e.g. if you set I think, given the below, there's decent evidence that this will improve performance for bursty things, and is worth pursuing. A couple things still need to be cleaned up (shutdown mostly) (total is wall clock time)
|
Instead of flushing logs to disk synchronously within the originating thread, push the log message into a ring buffer to be consumed and written to disk by a separate, logging-specific thread. Co-authored-by: James O'Beirne <james@chaincode.com>
@skeees Is Do you know roughly how many log messages that were printed during the reindex and ibd rounds respectively? |
yeah - exactly master is a recent master (6916024), and then i took matt's and my+jamesob's changes and rebased on top of that master - those are the 3 cases. you should be able to run that script if you wanna reproduce good question - looks like the logs were ~150MB each run, 1.25mm lines |
The last travis run for this pull request was 60 days ago and is thus outdated. To trigger a fresh travis build, this pull request should be closed and re-opened. |
Benchmarking with bitcoinperf shows a speedup of 4.6% on IBD up to height 200,000 with this change (details here). |
Note to reviewers: This pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
How about (4) Store a sequence number, always overwrite. When reading out, notice when there were missed messages via gaps in the sequence number and log "X missed log entries". Aside, benchmarking on the first 100k blocks is something I'd strongly advise against. It's an almost completely different load than reality. The first couple hundred thousand blocks are almost completely empty, so tests against them aren't much different than a purely synthetic bench_bitcoin thing. If you're doing a lot of this sort of testing, its useful to bring a node up to height 450k, stop it, back it up and benchmark reindexing up to 500k. |
Needs rebase |
That sounds much more reasonable. I'll rerun benches on that part of the chain in the next few weeks. |
There hasn't been much activity lately and the patch still needs rebase, so I am closing this for now. Please let me know when you want to continue working on this, so the pull request can be re-opened. |
Just a thought I had, |
I don't plan to put any more work into this, but this might be a good project to pick up for a newish contributor. |
This changeset renders and flushes log messages in a dedicated thread, preventing the originating thread from blocking (or failing) on fwrite, fflush, or the various string manipulations done on log messages. Originating threads push log messages into a buffer that is continually flushed.
Benchmarks live in the comments below.
Open questions:
Future work: