Checklist
- This is a bug report, not a question. Ask questions on discuss.ipfs.io.
- I have searched on the issue tracker for my bug.
- I am running the latest kubo version or have an issue updating.
Installation method
ipfs-update or dist.ipfs.tech
Version
Kubo version: 0.16.0
Repo version: 12
System version: amd64/darwin
Golang version: go1.19.1
Config
{
"API": {
"HTTPHeaders": {}
},
"Addresses": {
"API": "/ip4/127.0.0.1/tcp/5001",
"Announce": [],
"AppendAnnounce": [],
"Gateway": "/ip4/127.0.0.1/tcp/8080",
"NoAnnounce": [],
"Swarm": [
"/ip4/0.0.0.0/tcp/54957",
"/ip6/::/tcp/54957",
"/ip4/0.0.0.0/udp/54957/quic",
"/ip6/::/udp/54957/quic"
]
},
"AutoNAT": {},
"Bootstrap": [
"/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
"/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
"/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
"/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
"/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
"/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ"
],
"DNS": {
"Resolvers": {}
},
"Datastore": {
"BloomFilterSize": 0,
"GCPeriod": "1h",
"HashOnRead": false,
"Spec": {
"mounts": [
{
"child": {
"path": "blocks",
"shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
"sync": true,
"type": "flatfs"
},
"mountpoint": "/blocks",
"prefix": "flatfs.datastore",
"type": "measure"
},
{
"child": {
"compression": "none",
"path": "datastore",
"type": "levelds"
},
"mountpoint": "/",
"prefix": "leveldb.datastore",
"type": "measure"
}
],
"type": "mount"
},
"StorageGCWatermark": 90,
"StorageMax": "10GB"
},
"Discovery": {
"MDNS": {
"Enabled": true
}
},
"Experimental": {
"AcceleratedDHTClient": false,
"FilestoreEnabled": true,
"GraphsyncEnabled": false,
"Libp2pStreamMounting": false,
"P2pHttpProxy": false,
"StrategicProviding": false,
"UrlstoreEnabled": false
},
"Gateway": {
"APICommands": [],
"HTTPHeaders": {
"Access-Control-Allow-Headers": [
"X-Requested-With",
"Range",
"User-Agent"
],
"Access-Control-Allow-Methods": [
"GET"
],
"Access-Control-Allow-Origin": [
"*"
]
},
"NoDNSLink": false,
"NoFetch": false,
"PathPrefixes": [],
"PublicGateways": null,
"RootRedirect": "",
"Writable": false
},
"Identity": {
"PeerID": "<redacted>"
},
"Internal": {},
"Ipns": {
"RecordLifetime": "",
"RepublishPeriod": "",
"ResolveCacheSize": 128
},
"Migration": {
"DownloadSources": [],
"Keep": ""
},
"Mounts": {
"FuseAllowOther": false,
"IPFS": "/ipfs",
"IPNS": "/ipns"
},
"Peering": {
"Peers": null
},
"Pinning": {
"RemoteServices": {}
},
"Plugins": {
"Plugins": null
},
"Provider": {
"Strategy": ""
},
"Pubsub": {
"DisableSigning": false,
"Router": ""
},
"Reprovider": {
"Interval": "12h",
"Strategy": "roots"
},
"Routing": {
"Methods": null,
"Routers": null,
"Type": "dhtclient"
},
"Swarm": {
"AddrFilters": null,
"ConnMgr": {
"GracePeriod": "20s",
"HighWater": 900,
"LowWater": 600,
"Type": "basic"
},
"DisableBandwidthMetrics": false,
"DisableNatPortMap": false,
"RelayClient": {},
"RelayService": {},
"ResourceMgr": {},
"Transports": {
"Multiplexers": {},
"Network": {},
"Security": {}
}
}
}
Description
I'm seeing unexpected behavior when using Reprovider.Strategy "roots".
Now, this might be one or more bugs in kubo, in the documentation, in the gateway, in the specs, or in some interaction between them all. This is kind of an umbrella bug, since I'm not too familiar with the different subsystems that might be involved; I'm writing this from a user's perspective, not from an IPFS developer's perspective.
Hopefully someone can help figure out whether this needs to be split into smaller tickets, or filed elsewhere. I'd be happy to work with you all to make that happen!
Background
When (re)providing very large directories with lots of data, it can take a long time to complete. My understanding from the config docs is that you can use Reprovider.Strategy "roots" to only provide the directory CID itself, provided that it is pinned. Fantastic, that should make (re)providing much faster!
This blog post then suggests that IPFS nodes should still be able to find the content that is linked to in this directory, e.g. when using <directory CID>/filename (emphasis mine):
If you are adding folders of files to IPFS, only the CID for the pinned folder will be advertised (all the blocks will still be retrievable with Bitswap once a connection to the node is established).
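For reference, my understanding of the "roots" strategy is that only pin roots (in my case, the wrapping directory) get announced, so the set of provided CIDs should roughly match what this lists (please correct me if I'm misreading the docs):
$ ipfs pin ls --type=recursive # should list the wrapping directory CID, i.e. the only thing announced under "roots" as I understand it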
Therefore, I would expect that I can do this:
- Add new files with a wrapping directory, pinning only the directory.
- Provide using Reprovider.Strategy "roots", so that only the "directory CID" is provided.
- On a gateway, go to /ipfs/<directory CID>/filename (NOT /ipfs/<file CID>, since that has never been provided and so will surely not work).
Steps taken
Some of these steps might be unnecessary, but I tried to stay as close to my production use case as possible while still keeping the example minimal. When debugging this, you might be able to make it even more minimal.
$ ipfs init # Create a new repo.
$ ipfs config --json Experimental.FilestoreEnabled true # Since we do that in my production setup.
$ ipfs config --json Reprovider.Strategy '"roots"' # Enable our roots strategy.
$ ipfs daemon --offline # First start in offline mode, so that "ipfs add" doesn't accidentally provide all our CIDs.
# In a different window:
$ head -c 1000 < /dev/urandom > 1.txt # Fill 1.txt with random data.
$ head -c 1000 < /dev/urandom > 2.txt # Fill 2.txt with random data.
$ ipfs add --wrap-with-directory --nocopy --recursive --hash=blake2b-256 --chunker=size-1048576 1.txt 2.txt # Using the same command line parameters as we use in production.
# Note that this gives us our directory CID, as well as the CIDs of 1.txt and 2.txt.
# In the first window again:
$ ipfs daemon # Start the daemon properly now (be sure to kill the previous one)
# In window 2 again:
$ ipfs bitswap reprovide # This often crashes the daemon (see below).
$ ipfs dht provide -v <directory CID> # Alternative if `bitswap reprovide` crashes. Note that we omit "-r".
$ ipfs stats provide # Verify that providing happened.
Ok, we have now provided just the directory CID onto the IPFS network. We can verify that it can be found by using this diagnostic tool. Indeed, I can consistently confirm that the directory can be found and accessed on my node. Similarly, we can consistently confirm that the CIDs for 1.txt and 2.txt cannot be found, which is in line with our expectations.
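For anyone reproducing this without the web tool, I believe an equivalent check from a second node would be something like:
$ ipfs dht findprovs <directory CID> # expected (and observed): my node shows up as a provider
$ ipfs dht findprovs <CID of 2.txt> # expected (and observed): no providers, since only the root was provided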
First bug: 2.txt not accessible
Now it becomes a bit more iffy: I can't always fully reproduce the next steps, but most of the time I can.
- I go to a gateway. I've tested this with ipfs.io, cloudflare-ipfs.com, gateway.pinata.cloud, and ipfs.fleek.co. It would be good to also test with a locally hosted gateway, but I haven't done that yet.
- On the gateway, I go to /ipfs/<directory CID>/1.txt. Usually, this works!
- Now I try to go to /ipfs/<directory CID>/2.txt. Usually, this times out. No matter how long I wait, I can't get it to work. Even when using a different gateway it often still times out! And sometimes I can't even access the directory itself, /ipfs/<directory CID> (see the curl equivalents after this list).
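In curl terms, the reproduction looks roughly like this (any of the gateways above; ipfs.io is just an example):
$ curl -sL -o /dev/null -w '%{http_code}\n' https://ipfs.io/ipfs/<directory CID>/1.txt # usually prints 200
$ curl -sL -o /dev/null -w '%{http_code}\n' https://ipfs.io/ipfs/<directory CID>/2.txt # usually hangs for a long time and then times out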
Interestingly, on a different IPFS node, ipfs resolve -r /ipfs/<directory CID>/2.txt usually does work, and a subsequent ipfs dag get on its output also usually does give me the data from 2.txt.
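Concretely, what works on that other node is roughly:
$ ipfs resolve -r /ipfs/<directory CID>/2.txt # usually succeeds and prints /ipfs/<CID of 2.txt>
$ ipfs dag get <CID of 2.txt> # also usually works and gives me 2.txt's data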
Second bug: crash during ipfs bitswap reprovide
Related discussions
In my investigations so far, I've found a couple of people who have mentioned potentially related issues:
- Provider Strategy Discussion #6221 (comment)
- Improved Reprovider.Strategy for entity DAGs (HAMT/UnixFS dirs, big files) #8676
My hypothesis is that once an IPFS node has cached the directory CID locally, it doesn't remember the IPFS node that it originally connected to when looking up 2.txt. And since the CID for 2.txt itself is not cached, or provided on IPFS at all, it simply can't find it.
If I'm right about this, I see two possibilities, depending on whether this is the intended behavior:
- Yes, this is the intended behavior. In that case, what is the purpose of Reprovider.Strategy "roots" in the first place? Is it only useful if we know that other nodes will always fully and recursively fetch all the "children" of a root node? If so, that should be documented more clearly. It would also be fairly brittle behavior: what if a node goes down partway through the recursive fetch (which could potentially be many terabytes)? It would be hard to resume in that case.
- No, this is not the intended behavior. In that case, you probably know better than me how to resolve this. I would imagine something like either remembering which node we originally got the "directory CID" from and re-establishing a connection with it when trying to fetch a child (a manual version of that idea is sketched after this list), or checking all the nodes that provide the "directory CID" to see if they have any child CIDs. Or something else, I dunno!
I hope this is helpful! Again, happy to provide more details and think through this with you all.