Add fee_est tool for debugging fee estimation code #10443

ryanofsky · 2017-05-22T22:40:55Z

This PR adds an -estlog option for saving live fee estimation data from a bitcoin node, and a fee_est command line tool for processing the data and testing fee estimation code.

The idea is to make it easier to test improvements to fee estimation like #10199 in a more systematic and reproducible way.

Some documentation is in src/test/fee_est/README.md

Sample log file: est.log.xz (65M)

Sample fee_est.cpp graph output:

paveljanik · 2017-06-13T20:46:19Z

This is interesting but still no comment.

Concept ACK

I'll play around with this a bit.

Wshadow statistics: 
   1 policy/fees_input.cpp:127:23: warning: declaration shadows a local variable [-Wshadow]
   1 policy/fees_input.cpp:197:38: warning: declaration shadows a local variable [-Wshadow]
   1 policy/fees_input.cpp:229:22: warning: declaration shadows a local variable [-Wshadow]
   1 policy/fees_input.cpp:233:27: warning: declaration shadows a local variable [-Wshadow]
   1 policy/fees_input.cpp:27:61: warning: declaration shadows a field of 'CBlockPolicyInput' [-Wshadow]
   1 test/fee_est/fee_est.cpp:301:34: warning: declaration shadows a local variable [-Wshadow]
   1 test/fee_est/fee_est.cpp:79:20: warning: declaration shadows a field of 'TxData' [-Wshadow]
   1 test/fee_est/fee_est.cpp:79:29: warning: declaration shadows a field of 'TxData' [-Wshadow]
   1 test/fee_est/fee_est.cpp:79:39: warning: declaration shadows a field of 'TxData' [-Wshadow]

paveljanik · 2017-06-14T07:21:25Z

When estimate log doesn't have enough data, fee_est is generating empty data in the HTML file. Emit some warning in such cases?

paveljanik · 2017-06-14T07:24:27Z

For perfect output, there should be 3rd dimension - time or current block height when the tx was first seen :-)

jnewbery · 2017-06-28T12:05:12Z

Concept ACK. I think this could be a useful tool.

Could this be split into two PRs to aid reviewers? The first PR would cover the -estlog option and writing the fee estimation data to disk. The second PR would be for a tool to read and graph the logs.

Parsing and graphing json files seems like a problem that has probably been solved many times before. If I was approaching this, I'd look at implementing this as a script in /contrib or a separate repository. Was there a particular reason you chose to implement this as a new C++ program?

laanwj · 2018-04-10T18:20:01Z

Could this be split into two PRs to aid reviewers? The first PR would cover the -estlog option and writing the fee estimation data to disk. The second PR would be for a tool to read and graph the logs.

I think this is useful.

However I'm not sure the analysis tool belongs in this repository. As it's specific to developers debugging the fee estimation code it's not something we want to ship with the distribution, or install by default. One place it could be is e.g. https://github.com/bitcoin-core/bitcoin-maintainer-tools . You could still document or refer to it from here.

ryanofsky · 2018-04-10T18:53:12Z

Agree that fee_est tool shouldn't be installed, since it's a tool specifically for made for modifying and debugging fee estimation code.

But it would be awkward to use and maintain from a separate repository because it links and calls into the fee estimation code. (The tool works by piping historical data into the fee estimator so it's possible to make experimental changes to fee estimation and see how those changes affect its output and internal state.)

As far as build / distribution is concerned I think it makes sense to think of it more like a unit test or benchmark than like a maintainer tool.

laanwj · 2018-05-14T14:18:30Z

But it would be awkward to use and maintain from a separate repository because it links and calls into the fee estimation code.

Yes, I agree, that's unfortunately true. That part needs to stay in the repository with the rest of the code.

Parsing and graphing json files seems like a problem that has probably been solved many times before. If I was approaching this, I'd look at implementing this as a script in /contrib or a separate repository.

I agree - especially with the references to cloudflare CDN and such. It feels a bit ugly to have that in the C++ code. Better to spit out e.g. JSON or CSV or whatever is convenient, then use a separate script for pretty formatting. This would also keep open the option of using different (non-web) visualization tools.

DrahtBot · 2018-07-29T15:18:00Z

The last travis run for this pull request was 54 days ago and is thus outdated. To trigger a fresh travis build, this pull request should be closed and re-opened.

josibake · 2023-05-01T13:03:18Z

Haven't dug into the code, but I'm leaning toward Concept ACK, Approach NACK

As someone who is planning to do fee estimation research in the near future, I don't think I'd use something like this the way it's currently written. I would much prefer something that logs Bitcoin Core's live fee rate estimations, or some option that would allow me to tell Bitcoin Core it's at a certain chain height, feed it some mempool data, and then ask it for fee estimations.

I also agree that the tool for analyzing the data shouldn't be in this repo. Much better if we can figure out what data is missing for people who want to do fee estimation analysis and provide an option for logging that data.

Perhaps a good next step would be to talk to people who are currently doing fee estimation work and see what data would help with their projects/PRs? @ClaraShk @darosior @xekyo ?

ryanofsky · 2023-05-01T15:23:02Z

@josibake thanks for looking at this. I don't think I understand what the idea behind an approach NACK would be based on what you are saying, though.

It sounds more like you're asking for an additional feature which would be easy to implement on top of the first commit, not pointing to a drawback of the approach. If you want to gather live fee estimates, the first commit outputs a structured jsonl log of fee estimation events, so it would be easy to add the actual fee estimates to this log, or add internal metrics used to produce the fee estimates, or add any other data points that might be interesting. It is also possible to take the existing log files this PR already generates, and feed them to the fee estimation algorithm by modifying the tool in the second commit to produce live fee estimates or a history of estimates over time.

If you just think I should drop the second commit, or trim it down, I think that would be reasonable. The point of the second commit is mainly to provide example code that demonstrates how to read the JSONL log file and how to call the fee estimator API. The idea is to avoid needing reinvent the wheel each time you want to monitor fee estimation behavior, or reproduce a bug, or determine the effects of a change.

I would much prefer [...] some option that would allow me to tell Bitcoin Core it's at a certain chain height, feed it some mempool data, and then ask it for fee estimations.

This is actually what the second commit provides. The .jsonl log contains chain and mempool data. You can feed it to the test program in the second commit, and it calls fee estimation code to produce fee estimates. It can produce a graph of the fee estimates, and calculate how accurate they were by looking at all the transactions that got confirmed and outputting the mean squared error of how many blocks it actually took the transactions to be confirmed vs how many blocks the fee estimation algorithm predicted it would take to get them confirmed based on their feerate.

This PR is a few years old, so I would probably implement the second commit a little differently today. Instead of adding a standalone c++ executable that calls fee estimator code and produces graphs and metrics, I might want to just add a python module that exposes the CBlockPolicyEstimator class, and generate the graphs and metrics from a python notebook.

josibake · 2023-05-03T09:53:12Z

Thanks for the thorough response @ryanofsky. I realize I didn't explain myself very well in my initial comment.

It sounds more like you're asking for an additional feature which would be easy to implement on top of the first commit, not pointing to a drawback of the approach

Since we already record mempool events in the debug log file (this is what @jamesob and I are using to log mempool events in bmon), it feels a bit strange to have special code for logging transactions again here for fee estimation. It's possible I'm not understanding something here, though: is there some special information in the log you are writing that we can't get from the debug log file?

This is actually what the second commit provides. The .jsonl log contains chain and mempool data. You can feed it to the test program in the second commit, and it calls fee estimation code to produce fee estimates

I think what I was envisioning was even simpler: have a flag to make Bitcoin Core log fee_rate estimates per block and then have some way to walk the node back to some point in the past, have it log fee_rate estimates (perhaps with a code change) and then leave it up to the individual on how to analyze the diff between those logs. Perhaps this is easier said than done since the fee estimation code in Bitcoin Core today seems to work on a per transaction basis?

ryanofsky · 2023-05-03T14:55:11Z

is there some special information in the log you are writing that we can't get from the debug log file?

I think so. The json log captures all inputs to fee estimator to used update its internal state. It includes not just mempool events but also information about which transactions are included in blocks, flush events, and contents of fee_estimates.dat blobs that are loaded. It makes fee estimation code reproducible, so if you make a change to the fee estimation code or parameters, you can evaluate the effects of the change with real world inputs offline. ¹

I think what I was envisioning was even simpler: have a flag to make Bitcoin Core log fee_rate estimates per block and then have some way to walk the node back to some point in the past, have it log fee_rate estimates (perhaps with a code change) and then leave it up to the individual on how to analyze the diff between those logs. Perhaps this is easier said than done since the fee estimation code in Bitcoin Core today seems to work on a per transaction basis?

I think I don't actually understand the proposal or don't understand the intended use-case. Use-cases for PR this are to:

Be able to make changes to the fee estimation algorithm and be able to see how the changes affect fee estimates in a quick feedback loop
Be able to evaluate quality of fee estimates. The example program compares how quickly fee estimator expects transactions get to get confirmed (based on feerate) vs how quickly they actually get confirmed. This could be one metric for comparing quality of different fee estimation algorithms, or seeing if one algorithm starts to perform better or worse over time. It should also be possible to implement other metrics besides this one and do similar things.

Is your proposal basically to save a fee_estimates.dat file each time a block is connected and then use those files for something? I'm guess I'm not sure how you would use those files to improve fee estimates. This PR does capture a superset of that information, too. For example, it would be possible to generate those files offline from the json log file produced in this PR by adding a few lines to the filter callback:

const UniValue& block = value["block"];
if (block.isObject()) {
    AutoFile file{fsbridge::fopen(strprintf("fee_estimates-%07d.dat", block["height"].getInt<int>());, "wb")};
    estimator.Write(file);
}

Another advantage of this change is that it decouples CBlockPolicyEstimator from the mempool, so it's possible to call it offline. The external API changes from:

bitcoin/src/policy/fees.h

Lines 189 to 200 in 7b45d17

    
           /** Process all the transactions that have been included in a block */ 
        
           void processBlock(unsigned int nBlockHeight, 
        
                             std::vector<const CTxMemPoolEntry*>& entries) 
        
               EXCLUSIVE_LOCKS_REQUIRED(!m_cs_fee_estimator); 
        
           /** Process a transaction accepted to the mempool*/ 
        
           void processTransaction(const CTxMemPoolEntry& entry, bool validFeeEstimate) 
        
               EXCLUSIVE_LOCKS_REQUIRED(!m_cs_fee_estimator); 
        
           /** Remove a transaction from the mempool tracking stats*/ 
        
           bool removeTx(uint256 hash, bool inBlock) 
        
               EXCLUSIVE_LOCKS_REQUIRED(!m_cs_fee_estimator);

to:

bitcoin/src/policy/fees.h

Lines 191 to 201 in 26240e1

    
           /** Process all the transactions that have been included in a block */ 
        
           void processBlock(unsigned int nBlockHeight, const AddTxsFn& add_txs) 
        
               EXCLUSIVE_LOCKS_REQUIRED(!m_cs_fee_estimator); 
        
           /** Process a transaction accepted to the mempool*/ 
        
           void processTx(const uint256& hash, unsigned int txHeight, CAmount fee, uint32_t size, bool validFeeEstimate) 
        
               EXCLUSIVE_LOCKS_REQUIRED(!m_cs_fee_estimator); 
        
           /** Remove a transaction from the mempool tracking stats*/ 
        
           bool removeTx(uint256 hash, bool inBlock) 
        
               EXCLUSIVE_LOCKS_REQUIRED(!m_cs_fee_estimator);

There are probably lots of changes that could be made to this PR to improve it, or trim it down if it is doing too much, but in general I think takes a good first step of pulling things out of CBlockPolicyEstimator that don't belong there (mempool access, filesystem access), and then builds some useful features on top of that.

I also think in practice parsing a json log file is cleaner and more straightforward than parsing a debug.log file. Compare
readLog vs logparse. But I recognize there could be other advantages to using debug.log files, especially since debug.log files are omnipresent and already exist. ↩

DrahtBot · 2023-06-20T17:24:07Z

🐙 This pull request conflicts with the target branch and needs rebase.

DrahtBot · 2023-09-18T00:13:29Z

There hasn't been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

ryanofsky · 2023-09-19T11:38:20Z

Going to close this because I think @josibake is working on a different approach which will try replace current fee estimator, instead of just instrumenting it like this PR.

There's also another PR #28368 which decouples the fee estimator from the mempool, and might make it easier to replay fee estimation events like the changes in this PR, even though it only provides an API, not a CLI tool.

fanquake added the TX fees and policy label May 23, 2017

ryanofsky force-pushed the pr/fee branch from ae719a1 to fbe0961 Compare May 23, 2017 13:36

ryanofsky force-pushed the pr/fee branch 3 times, most recently from 43fa494 to ec5f6ec Compare June 7, 2017 15:23

ryanofsky force-pushed the pr/fee branch 2 times, most recently from da0e959 to 88e26a1 Compare July 19, 2017 17:14

ryanofsky mentioned this pull request Aug 17, 2017

AI for fee estimation and other parts of the system #11064

Closed

ryanofsky force-pushed the pr/fee branch from 88e26a1 to 242fea6 Compare August 25, 2017 21:23

ryanofsky force-pushed the pr/fee branch 2 times, most recently from 87e104b to f49e21d Compare November 29, 2017 21:26

ryanofsky force-pushed the pr/fee branch from f49e21d to d9cc337 Compare December 12, 2017 21:12

ryanofsky force-pushed the pr/fee branch from d9cc337 to 1b18551 Compare February 9, 2018 21:40

ryanofsky force-pushed the pr/fee branch from 1b18551 to 12e3eed Compare March 2, 2018 16:14

ryanofsky force-pushed the pr/fee branch from 12e3eed to b8c2ce5 Compare March 30, 2018 15:30

ryanofsky force-pushed the pr/fee branch from b8c2ce5 to 925281a Compare April 10, 2018 12:54

ryanofsky force-pushed the pr/fee branch from 925281a to 51163ce Compare May 14, 2018 17:46

ryanofsky force-pushed the pr/fee branch from 51163ce to 8b9c3da Compare June 4, 2018 19:32

DrahtBot mentioned this pull request Jun 14, 2018

[bugfix] Fix encoding issue for Windows #13426

Closed

This was referenced Jul 25, 2018

[WIP] Full unicode support on Windows #13755

Closed

refactor: Avoid locking tx pool cs thrice #13786

Merged

Test for Windows encoding issue #13787

Merged

ryanofsky force-pushed the pr/fee branch from 1288ad0 to 6e28e72 Compare February 10, 2023 14:19

DrahtBot removed the Needs rebase label Feb 10, 2023

DrahtBot mentioned this pull request Apr 4, 2023

move-only: Extract common/args from util/system #27419

Merged

DrahtBot added the Needs rebase label Apr 21, 2023

achow101 requested review from glozow and josibake April 25, 2023 16:14

ryanofsky added 2 commits May 2, 2023 09:03

Add -estlog option for saving live fee estimation data

651e375

Add fee_est tool for debugging fee estimation code

26240e1

ryanofsky force-pushed the pr/fee branch from 6e28e72 to 4d728f4 Compare May 2, 2023 15:09

DrahtBot added CI failed and removed Needs rebase labels May 2, 2023

ryanofsky force-pushed the pr/fee branch from 4d728f4 to 26240e1 Compare May 2, 2023 17:12

DrahtBot removed the CI failed label May 2, 2023

This was referenced May 11, 2023

Fee estimation: avoid serving stale fee estimate #27622

Merged

kernel: Remove util/system from kernel library, interface_ui from validation. #27636

Merged

kernel: Remove args, settings, chainparams, chainparamsbase from kernel library #27576

Merged

DrahtBot added the CI failed label May 14, 2023

DrahtBot mentioned this pull request May 21, 2023

kernel: Remove shutdown globals from kernel library #27711

Closed

This was referenced Jun 12, 2023

Mempool: persist mempoolminfee accross restarts #27859

Closed

kernel: Rm ShutdownRequested and AbortNode from validation code. #27861

Merged

DrahtBot added the Needs rebase label Jun 20, 2023

ryanofsky closed this Sep 19, 2023

bitcoin locked and limited conversation to collaborators Sep 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add fee_est tool for debugging fee estimation code #10443

Add fee_est tool for debugging fee estimation code #10443

Uh oh!

ryanofsky commented May 22, 2017 •

edited by DrahtBot

Loading

Uh oh!

paveljanik commented Jun 13, 2017

Uh oh!

paveljanik commented Jun 14, 2017

Uh oh!

paveljanik commented Jun 14, 2017

Uh oh!

jnewbery commented Jun 28, 2017

Uh oh!

laanwj commented Apr 10, 2018

Uh oh!

ryanofsky commented Apr 10, 2018 •

edited

Loading

Uh oh!

laanwj commented May 14, 2018 •

edited

Loading

Uh oh!

DrahtBot commented Jul 29, 2018

Uh oh!

josibake commented May 1, 2023

Uh oh!

ryanofsky commented May 1, 2023 •

edited

Loading

Uh oh!

josibake commented May 3, 2023

Uh oh!

ryanofsky commented May 3, 2023

Uh oh!

DrahtBot commented Jun 20, 2023

Uh oh!

DrahtBot commented Sep 18, 2023

Uh oh!

ryanofsky commented Sep 19, 2023

Uh oh!

Uh oh!

Add fee_est tool for debugging fee estimation code #10443

Add fee_est tool for debugging fee estimation code #10443

Uh oh!

Conversation

ryanofsky commented May 22, 2017 • edited by DrahtBot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

paveljanik commented Jun 13, 2017

Uh oh!

paveljanik commented Jun 14, 2017

Uh oh!

paveljanik commented Jun 14, 2017

Uh oh!

jnewbery commented Jun 28, 2017

Uh oh!

laanwj commented Apr 10, 2018

Uh oh!

ryanofsky commented Apr 10, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

laanwj commented May 14, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrahtBot commented Jul 29, 2018

Uh oh!

josibake commented May 1, 2023

Uh oh!

ryanofsky commented May 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josibake commented May 3, 2023

Uh oh!

ryanofsky commented May 3, 2023

Footnotes

Uh oh!

DrahtBot commented Jun 20, 2023

Uh oh!

DrahtBot commented Sep 18, 2023

Uh oh!

ryanofsky commented Sep 19, 2023

Uh oh!

Uh oh!

ryanofsky commented May 22, 2017 •

edited by DrahtBot

Loading

ryanofsky commented Apr 10, 2018 •

edited

Loading

laanwj commented May 14, 2018 •

edited

Loading

ryanofsky commented May 1, 2023 •

edited

Loading