Conversation

@brunoerg (Contributor) commented Jan 11, 2022

In the current scenario, anchors.dat is deleted right after being read; if my node then stops before trying to connect to the anchor peers, it will dump an empty anchors.dat on shutdown. This PR changes that: anchors.dat is deleted only after trying to connect to the peers read from it. So, if my node stops (in a clean way) before trying to connect to those peers, the anchors.dat file will be preserved and an empty dump won't happen.

@brunoerg brunoerg marked this pull request as draft January 11, 2022 13:39
@brunoerg brunoerg force-pushed the 2022-02-anchors-delete branch from 55cf0d2 to c9893a6 Compare January 11, 2022 13:50
@jonatack left a comment (Member)

A couple of comments after skimming the diff, feel free to ignore for now; will review.

@jonatack (Member)

Perhaps remove "refactor" from the PR title as there seems to be a change of behavior.

@brunoerg brunoerg marked this pull request as ready for review January 11, 2022 14:10
@brunoerg brunoerg changed the title p2p, refactor: delete anchors.dat after trying to connect to that peers p2p: delete anchors.dat after trying to connect to that peers Jan 11, 2022
@brunoerg (Contributor, Author)

Perhaps remove "refactor" from the PR title as there seems to be a change of behavior.

Done!

@brunoerg brunoerg force-pushed the 2022-02-anchors-delete branch 2 times, most recently from 69d4184 to c24194c Compare January 11, 2022 14:36
@DrahtBot DrahtBot added the P2P label Jan 11, 2022
@brunoerg brunoerg force-pushed the 2022-02-anchors-delete branch from c24194c to fb0c332 Compare January 11, 2022 16:38
@brunoerg (Contributor, Author)

Force-pushed, addressing @jonatack's comments.

@brunoerg brunoerg force-pushed the 2022-02-anchors-delete branch from fb0c332 to 831377b Compare January 12, 2022 10:40
@brunoerg (Contributor, Author)

cc: @hebasto

@hebasto (Member) commented Jan 24, 2022

So, if my node stops before trying to connect to that peers, the anchors.dat file will be preserved.

You mean the normal shutdown, i.e., the stop RPC or Quit in the GUI, right?

src/net.cpp Outdated
Comment on lines 2559 to 2308
} else if (m_anchors.size() == 0) {
    DeleteAnchorsFile(gArgs.GetDataDirNet() / ANCHORS_DATABASE_FILENAME);
Member

Why this change?

@brunoerg (Contributor, Author) Jan 24, 2022
Because if there are no anchors, we can delete the file at that point; there is no need to wait until we try to connect to them.

@brunoerg (Contributor, Author) Jan 25, 2022

We talked about this earlier, but I just remembered that I can't remove it, because in https://github.com/bitcoin/bitcoin/pull/24034/files#diff-00021eed586a482abdb09d6cdada1d90115abe988a91421851960e26658bed02R2029, if m_anchors is empty, anchors.dat won't be deleted.

@brunoerg (Contributor, Author) commented Feb 1, 2022

You mean the normal shutdown, i.e., the stop RPC or Quit in the GUI, right?

Yes.

@stickies-v left a comment (Contributor)

I'm not sure the current approach works for the goal you're trying to achieve.

In case of clean shutdown, CConnman::Close() is always (I think?) going to be called, which means that DumpAnchors() is called too. DumpAnchors() will just overwrite any existing anchors.dat file, regardless of whether that file exists. If no block-relay-only connections have been initialized yet, DumpAnchors() will simply create an empty file. So I think in case of clean shutdown, you need to control when DumpAnchors() is called instead of deleting anchors.dat.

In case of forced shutdown, I think this PR moves in the right direction by not immediately removing anchors.dat: DumpAnchors() may never be called, and it could be helpful to restart from the previously existing anchors.dat. However, I'm unsure (from a security or resilience perspective) whether that's desirable or not; I don't fully understand the big picture here yet. On the one hand, you don't want to make it too easy for an attacker to purge existing anchors. On the other, you should properly handle e.g. a corrupt anchors file too.

src/addrdb.cpp Outdated
return anchors;
}

void DeleteAnchorsFile(const fs::path& anchors_db_path) { fs::remove(anchors_db_path); }
Contributor


What do you think about making anchors_db_path optional? If you apply the same pattern to DumpAnchors, ReadAnchors and DeleteAnchorsFile, you can move the definition of ANCHORS_DATABASE_FILENAME to addrdb.cpp. Personally I think it's more logical to define the default path logic within these functions since they're called DeleteAnchorsFile, not DeleteFile.

Suggested change
void DeleteAnchorsFile(const fs::path& anchors_db_path) { fs::remove(anchors_db_path); }
void DeleteAnchorsFile(std::optional<const fs::path> anchors_db_path)
{
    const fs::path path{anchors_db_path.value_or(gArgs.GetDataDirNet() / ANCHORS_DATABASE_FILENAME)};
    fs::remove(path);
}

@brunoerg brunoerg force-pushed the 2022-02-anchors-delete branch 2 times, most recently from ab509a0 to fe350d6 Compare February 21, 2022 13:53
@brunoerg (Contributor, Author)

Thanks, @stickies-v.

Added a flag that tracks whether connections to all peers from anchors.dat have been tried. If they have not all been tried yet, we don't allow DumpAnchors.

E.g.:

  1. Start bitcoind
  2. Read anchors.dat
  3. Two block-relay-only anchors will be tried for connections
  4. Shut down the node before trying to connect to them
  5. anchors.dat is preserved and the call to DumpAnchors is not allowed

@theStack left a comment (Contributor)

Light Concept ACK

This seems to be a definite improvement, but there might be overlooked drawbacks or consequences that I'm not aware of yet. Looking forward to today's review club to shed some more light on this.

@LarryRuane left a comment (Contributor)

It seems arbitrary that we preserve our block-relay-only peers across restarts if the node shuts down cleanly but not if the node shuts down uncleanly. Would a better approach be to never delete anchors.dat but instead rewrite it (truncate and write) immediately whenever the set of block-relay-only peers changes (that is, make it a write-through cache), instead of only on a clean shutdown?

If this set of peers gets updated multiple times close in time, then for better performance, use CScheduler.scheduleFromNow() to implement a write-back cache (check every few seconds if anchors.dat is out of date, and if so, rewrite it).

@lsilva01 left a comment (Contributor)

Approach ACK, but I think the code can be simplified.

delete_anchors_file and tried_connect_anchors may not be needed.
And there may also be no need to change CConnman::Start.

I added some suggestions.

@mzumsande left a comment (Contributor)

It seems arbitrary that we preserve our block-relay-only peers across restarts if the node shuts down cleanly but not if the node shuts down uncleanly. Would a better approach be to never delete anchors.dat but instead rewrite it (truncate and write) immediately whenever the set of block-relay-only peers changes (that is, make it a write-through cache), instead of only on a clean shutdown?

There was a lot of discussion on this point in the original PR #17428 which introduced the anchors, see e.g. this summary of some of the aspects.

If I understand it correctly, this PR changes behavior in the special case where a clean shutdown is happening in the time window between startup and connecting to the anchors.
Is there a realistic scenario for how this could happen? If the shutdown was caused maliciously (by DoS, power shutdown, etc.), I think it would likely not be clean.

The only scenario I could think of is that the node operator themselves initiate a clean shutdown (maybe they forgot to specify a startup option, send a SIGINT and try again).

@brunoerg (Contributor, Author) commented Mar 15, 2022

@mzumsande

If I understand it correctly, this PR changes behavior in the special case where a clean shutdown is happening in the time window between startup and connecting to the anchors.

This PR covers both scenarios (clean and unclean). See the functional test this PR modifies; in the last force-push I added coverage for the unclean shutdown using the same logic as feature_init.

Is there a realistic scenario for how this could happen? If the shutdown was caused maliciously (by DoS, power shutdown, etc.), I think it would likely not be clean.

The only scenario I could think of is that the node operator themselves initiate a clean shutdown (maybe they forgot to specify a startup option, send a SIGINT and try again).

Yes, you're right; I don't think there is a scenario as realistic/explicit as the unclean case has. A clean shutdown could be caused by a node operator, maybe through social engineering, but there are many more scenarios for an unclean shutdown, as you mentioned (DoS, power shutdown, etc.).

@DrahtBot (Contributor) commented Sep 23, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type          Reviewers
Concept ACK   theStack
Approach ACK  lsilva01

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #27071 (Handle CJDNS from LookupSubNet() by vasild)
  • #25284 (net: Use serialization parameters for CAddress serialization by MarcoFalke)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@brunoerg brunoerg force-pushed the 2022-02-anchors-delete branch from ac11edf to e495983 Compare October 17, 2022 21:21
@brunoerg brunoerg marked this pull request as draft October 17, 2022 21:27
@brunoerg brunoerg force-pushed the 2022-02-anchors-delete branch from e495983 to 41ae2b4 Compare October 17, 2022 21:32
@DrahtBot DrahtBot mentioned this pull request Oct 18, 2022
@brunoerg brunoerg marked this pull request as ready for review October 18, 2022 13:08
@brunoerg brunoerg force-pushed the 2022-02-anchors-delete branch from 41ae2b4 to 658cfb1 Compare November 30, 2022 14:37
@brunoerg (Contributor, Author)

Just discovered another thing this PR may fix.

  1. Start bitcoind
  2. Wait for 2 block-relay-only connections
  3. Shut it down and start bitcoind with -stopafterblockimport

Since there is not enough time to establish the 2 connections from anchors.dat (the node is going to stop right after the block import), maybe there is time for just one, and the other peer then gets cleaned out of anchors.dat.

void DeleteAnchorsFile()
{
    const fs::path path{gArgs.GetDataDirNet() / ANCHORS_DATABASE_FILENAME};
    fs::remove(path);
}
Contributor


Right now if we fail to remove the file we crash fairly early on startup, so the user can investigate why they are unable to remove it. With this change we will get pretty far into normal operation before crashing. Should we catch and log any error if we fail to remove, or is it a fatal error if we don't remove?

@DrahtBot (Contributor) commented Sep 3, 2023

There hasn't been much activity lately. What is the status here?

Finding reviewers may take time. However, if the patch is no longer relevant, please close this pull request. If the author lost interest or time to work on this, please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@DrahtBot (Contributor) commented Sep 7, 2023

🐙 This pull request conflicts with the target branch and needs rebase.

@DrahtBot (Contributor) commented Dec 6, 2023

There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@DrahtBot (Contributor) commented Mar 5, 2024

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@DrahtBot (Contributor) commented Mar 5, 2024

🤔 There hasn't been much activity lately and the CI seems to be failing.

If no one reviewed the current pull request by commit hash, a rebase can be considered. While the CI failure may be a false positive, the CI hasn't been running for some time, so there may be a real issue hiding as well. A rebase triggers the latest CI and makes sure that no silent merge conflicts have snuck in.

@fanquake (Member) commented Mar 6, 2024

Closing for now: 4 pings and 0 follow-up. From the discussion here it's not even clear that there is buy-in from other contributors that this is a change we want or that it is ultimately beneficial.

@fanquake fanquake closed this Mar 6, 2024
@bitcoin bitcoin locked and limited conversation to collaborators Mar 6, 2025