Skip to content

Conversation

ryanofsky
Copy link
Contributor

@ryanofsky ryanofsky commented May 22, 2017

This PR adds an -estlog option for saving live fee estimation data from a bitcoin node, and a fee_est command line tool for processing the data and testing fee estimation code.

The idea is to make it easier to test improvements to fee estimation like #10199 in a more systematic and reproducible way.

Some documentation is in src/test/fee_est/README.md

Sample log file: est.log.xz (65M)

Sample fee_est.cpp graph output:

graph

@paveljanik
Copy link
Contributor

This is interesting but still no comment.

Concept ACK

I'll play around with this a bit.

Wshadow statistics: 
   1 policy/fees_input.cpp:127:23: warning: declaration shadows a local variable [-Wshadow]
   1 policy/fees_input.cpp:197:38: warning: declaration shadows a local variable [-Wshadow]
   1 policy/fees_input.cpp:229:22: warning: declaration shadows a local variable [-Wshadow]
   1 policy/fees_input.cpp:233:27: warning: declaration shadows a local variable [-Wshadow]
   1 policy/fees_input.cpp:27:61: warning: declaration shadows a field of 'CBlockPolicyInput' [-Wshadow]
   1 test/fee_est/fee_est.cpp:301:34: warning: declaration shadows a local variable [-Wshadow]
   1 test/fee_est/fee_est.cpp:79:20: warning: declaration shadows a field of 'TxData' [-Wshadow]
   1 test/fee_est/fee_est.cpp:79:29: warning: declaration shadows a field of 'TxData' [-Wshadow]
   1 test/fee_est/fee_est.cpp:79:39: warning: declaration shadows a field of 'TxData' [-Wshadow]

@paveljanik
Copy link
Contributor

When estimate log doesn't have enough data, fee_est is generating empty data in the HTML file. Emit some warning in such cases?

@paveljanik
Copy link
Contributor

For perfect output, there should be 3rd dimension - time or current block height when the tx was first seen :-)

@jnewbery
Copy link
Contributor

Concept ACK. I think this could be a useful tool.

Could this be split into two PRs to aid reviewers? The first PR would cover the -estlog option and writing the fee estimation data to disk. The second PR would be for a tool to read and graph the logs.

Parsing and graphing json files seems like a problem that has probably been solved many times before. If I was approaching this, I'd look at implementing this as a script in /contrib or a separate repository. Was there a particular reason you chose to implement this as a new C++ program?

@laanwj
Copy link
Member

laanwj commented Apr 10, 2018

Could this be split into two PRs to aid reviewers? The first PR would cover the -estlog option and writing the fee estimation data to disk. The second PR would be for a tool to read and graph the logs.

I think this is useful.

However I'm not sure the analysis tool belongs in this repository. As it's specific to developers debugging the fee estimation code it's not something we want to ship with the distribution, or install by default. One place it could be is e.g. https://github.com/bitcoin-core/bitcoin-maintainer-tools . You could still document or refer to it from here.

@ryanofsky
Copy link
Contributor Author

ryanofsky commented Apr 10, 2018

Agree that fee_est tool shouldn't be installed, since it's a tool specifically for made for modifying and debugging fee estimation code.

But it would be awkward to use and maintain from a separate repository because it links and calls into the fee estimation code. (The tool works by piping historical data into the fee estimator so it's possible to make experimental changes to fee estimation and see how those changes affect its output and internal state.)

As far as build / distribution is concerned I think it makes sense to think of it more like a unit test or benchmark than like a maintainer tool.

@laanwj
Copy link
Member

laanwj commented May 14, 2018

But it would be awkward to use and maintain from a separate repository because it links and calls into the fee estimation code.

Yes, I agree, that's unfortunately true. That part needs to stay in the repository with the rest of the code.

Parsing and graphing json files seems like a problem that has probably been solved many times before. If I was approaching this, I'd look at implementing this as a script in /contrib or a separate repository.

I agree - especially with the references to cloudflare CDN and such. It feels a bit ugly to have that in the C++ code. Better to spit out e.g. JSON or CSV or whatever is convenient, then use a separate script for pretty formatting. This would also keep open the option of using different (non-web) visualization tools.

@DrahtBot
Copy link
Contributor

The last travis run for this pull request was 54 days ago and is thus outdated. To trigger a fresh travis build, this pull request should be closed and re-opened.

@josibake
Copy link
Member

josibake commented May 1, 2023

Haven't dug into the code, but I'm leaning toward Concept ACK, Approach NACK

As someone who is planning to do fee estimation research in the near future, I don't think I'd use something like this the way it's currently written. I would much prefer something that logs Bitcoin Core's live fee rate estimations, or some option that would allow me to tell Bitcoin Core it's at a certain chain height, feed it some mempool data, and then ask it for fee estimations.

I also agree that the tool for analyzing the data shouldn't be in this repo. Much better if we can figure out what data is missing for people who want to do fee estimation analysis and provide an option for logging that data.

Perhaps a good next step would be to talk to people who are currently doing fee estimation work and see what data would help with their projects/PRs? @ClaraShk @darosior @xekyo ?

@ryanofsky
Copy link
Contributor Author

ryanofsky commented May 1, 2023

@josibake thanks for looking at this. I don't think I understand what the idea behind an approach NACK would be based on what you are saying, though.

It sounds more like you're asking for an additional feature which would be easy to implement on top of the first commit, not pointing to a drawback of the approach. If you want to gather live fee estimates, the first commit outputs a structured jsonl log of fee estimation events, so it would be easy to add the actual fee estimates to this log, or add internal metrics used to produce the fee estimates, or add any other data points that might be interesting. It is also possible to take the existing log files this PR already generates, and feed them to the fee estimation algorithm by modifying the tool in the second commit to produce live fee estimates or a history of estimates over time.

If you just think I should drop the second commit, or trim it down, I think that would be reasonable. The point of the second commit is mainly to provide example code that demonstrates how to read the JSONL log file and how to call the fee estimator API. The idea is to avoid needing reinvent the wheel each time you want to monitor fee estimation behavior, or reproduce a bug, or determine the effects of a change.

I would much prefer [...] some option that would allow me to tell Bitcoin Core it's at a certain chain height, feed it some mempool data, and then ask it for fee estimations.

This is actually what the second commit provides. The .jsonl log contains chain and mempool data. You can feed it to the test program in the second commit, and it calls fee estimation code to produce fee estimates. It can produce a graph of the fee estimates, and calculate how accurate they were by looking at all the transactions that got confirmed and outputting the mean squared error of how many blocks it actually took the transactions to be confirmed vs how many blocks the fee estimation algorithm predicted it would take to get them confirmed based on their feerate.

This PR is a few years old, so I would probably implement the second commit a little differently today. Instead of adding a standalone c++ executable that calls fee estimator code and produces graphs and metrics, I might want to just add a python module that exposes the CBlockPolicyEstimator class, and generate the graphs and metrics from a python notebook.

@josibake
Copy link
Member

josibake commented May 3, 2023

Thanks for the thorough response @ryanofsky. I realize I didn't explain myself very well in my initial comment.

It sounds more like you're asking for an additional feature which would be easy to implement on top of the first commit, not pointing to a drawback of the approach

Since we already record mempool events in the debug log file (this is what @jamesob and I are using to log mempool events in bmon), it feels a bit strange to have special code for logging transactions again here for fee estimation. It's possible I'm not understanding something here, though: is there some special information in the log you are writing that we can't get from the debug log file?

This is actually what the second commit provides. The .jsonl log contains chain and mempool data. You can feed it to the test program in the second commit, and it calls fee estimation code to produce fee estimates

I think what I was envisioning was even simpler: have a flag to make Bitcoin Core log fee_rate estimates per block and then have some way to walk the node back to some point in the past, have it log fee_rate estimates (perhaps with a code change) and then leave it up to the individual on how to analyze the diff between those logs. Perhaps this is easier said than done since the fee estimation code in Bitcoin Core today seems to work on a per transaction basis?

@ryanofsky
Copy link
Contributor Author

is there some special information in the log you are writing that we can't get from the debug log file?

I think so. The json log captures all inputs to fee estimator to used update its internal state. It includes not just mempool events but also information about which transactions are included in blocks, flush events, and contents of fee_estimates.dat blobs that are loaded. It makes fee estimation code reproducible, so if you make a change to the fee estimation code or parameters, you can evaluate the effects of the change with real world inputs offline. 1

I think what I was envisioning was even simpler: have a flag to make Bitcoin Core log fee_rate estimates per block and then have some way to walk the node back to some point in the past, have it log fee_rate estimates (perhaps with a code change) and then leave it up to the individual on how to analyze the diff between those logs. Perhaps this is easier said than done since the fee estimation code in Bitcoin Core today seems to work on a per transaction basis?

I think I don't actually understand the proposal or don't understand the intended use-case. Use-cases for PR this are to:

  • Be able to make changes to the fee estimation algorithm and be able to see how the changes affect fee estimates in a quick feedback loop
  • Be able to evaluate quality of fee estimates. The example program compares how quickly fee estimator expects transactions get to get confirmed (based on feerate) vs how quickly they actually get confirmed. This could be one metric for comparing quality of different fee estimation algorithms, or seeing if one algorithm starts to perform better or worse over time. It should also be possible to implement other metrics besides this one and do similar things.

Is your proposal basically to save a fee_estimates.dat file each time a block is connected and then use those files for something? I'm guess I'm not sure how you would use those files to improve fee estimates. This PR does capture a superset of that information, too. For example, it would be possible to generate those files offline from the json log file produced in this PR by adding a few lines to the filter callback:

const UniValue& block = value["block"];
if (block.isObject()) {
    AutoFile file{fsbridge::fopen(strprintf("fee_estimates-%07d.dat", block["height"].getInt<int>());, "wb")};
    estimator.Write(file);
}

Another advantage of this change is that it decouples CBlockPolicyEstimator from the mempool, so it's possible to call it offline. The external API changes from:

bitcoin/src/policy/fees.h

Lines 189 to 200 in 7b45d17

/** Process all the transactions that have been included in a block */
void processBlock(unsigned int nBlockHeight,
std::vector<const CTxMemPoolEntry*>& entries)
EXCLUSIVE_LOCKS_REQUIRED(!m_cs_fee_estimator);
/** Process a transaction accepted to the mempool*/
void processTransaction(const CTxMemPoolEntry& entry, bool validFeeEstimate)
EXCLUSIVE_LOCKS_REQUIRED(!m_cs_fee_estimator);
/** Remove a transaction from the mempool tracking stats*/
bool removeTx(uint256 hash, bool inBlock)
EXCLUSIVE_LOCKS_REQUIRED(!m_cs_fee_estimator);

to:

bitcoin/src/policy/fees.h

Lines 191 to 201 in 26240e1

/** Process all the transactions that have been included in a block */
void processBlock(unsigned int nBlockHeight, const AddTxsFn& add_txs)
EXCLUSIVE_LOCKS_REQUIRED(!m_cs_fee_estimator);
/** Process a transaction accepted to the mempool*/
void processTx(const uint256& hash, unsigned int txHeight, CAmount fee, uint32_t size, bool validFeeEstimate)
EXCLUSIVE_LOCKS_REQUIRED(!m_cs_fee_estimator);
/** Remove a transaction from the mempool tracking stats*/
bool removeTx(uint256 hash, bool inBlock)
EXCLUSIVE_LOCKS_REQUIRED(!m_cs_fee_estimator);

There are probably lots of changes that could be made to this PR to improve it, or trim it down if it is doing too much, but in general I think takes a good first step of pulling things out of CBlockPolicyEstimator that don't belong there (mempool access, filesystem access), and then builds some useful features on top of that.

Footnotes

  1. I also think in practice parsing a json log file is cleaner and more straightforward than parsing a debug.log file. Compare
    readLog vs logparse. But I recognize there could be other advantages to using debug.log files, especially since debug.log files are omnipresent and already exist.

@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

@DrahtBot
Copy link
Contributor

There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@ryanofsky
Copy link
Contributor Author

Going to close this because I think @josibake is working on a different approach which will try replace current fee estimator, instead of just instrumenting it like this PR.

There's also another PR #28368 which decouples the fee estimator from the mempool, and might make it easier to replay fee estimation events like the changes in this PR, even though it only provides an API, not a CLI tool.

@ryanofsky ryanofsky closed this Sep 19, 2023
@bitcoin bitcoin locked and limited conversation to collaborators Sep 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.