
Ls / listdir fails when using a public gateway because a non-CID path is requested #39

@mxmlnkn


All of the following work:

IPFS_GATEWAY='https://ipfs.io' python3 -c "
import fsspec
with fsspec.open('ipfs://QmZ4tDuvesekSs4qM5ZBKpXiZGun7S2CYtEZRB3DYXkjGx', 'r') as f:
    print(f.read())
"  # hello worlds

# Folder with a single file to emulate a named file.
folderCID=bafybeicn7i3soqdgr7dwnrwytgq4zxy7a5jpkizrvhm5mv6bgjd32wm3q4

ipfs daemon & sleep 3
ipfs get "$folderCID"
stat -c %s "$folderCID/welcome-to-IPFS.jpg" # 663082

python3 -c "
import fsspec
fs, _ = fsspec.url_to_fs('ipfs://$folderCID')
print(fs.ls('$folderCID'))
" 
# [{'name': 'bafybeicn7i3soqdgr7dwnrwytgq4zxy7a5jpkizrvhm5mv6bgjd32wm3q4/welcome-to-IPFS.jpg',
#  'CID': 'bafkreie7ohywtosou76tasm7j63yigtzxe7d5zqus4zu3j6oltvgtibeom', 'type': 'file', 'size': 663082}]

The last command fails when the IPFS_GATEWAY environment variable is set:

IPFS_GATEWAY='https://ipfs.io' python3 -c "
import fsspec
fs, _ = fsspec.url_to_fs('ipfs://$folderCID')
print(fs.ls('$folderCID'))
"

Error:

Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "~/.local/lib/python3.12/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "~/.local/lib/python3.12/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/ipfsspec/async_ipfs.py", line 302, in _ls
    return await self.gateway.ls(path, session, detail=detail)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/ipfsspec/async_ipfs.py", line 148, in ls
    return await asyncio.gather(*(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/ipfsspec/async_ipfs.py", line 73, in info
    self._raise_not_found_for_status(res, path)
  File "~/.local/lib/python3.12/site-packages/ipfsspec/async_ipfs.py", line 162, in _raise_not_found_for_status
    response.raise_for_status()
  File "~/.local/lib/python3.12/site-packages/aiohttp/client_reqrep.py", line 1157, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 406, message='Not Acceptable', url='https://trustless-gateway.link/ipfs/bafybeicn7i3soqdgr7dwnrwytgq4zxy7a5jpkizrvhm5mv6bgjd32wm3q4/welcome-to-IPFS.jpg?format=raw'

The 406 error can be reproduced with a plain wget on the URL from the error message.
The URL itself already looks wrong: instead of the path-based https://trustless-gateway.link/ipfs/<CID>/welcome-to-IPFS.jpg?format=raw, I would have expected a request for the file's own CID.
Requesting the file CID shown in the working ls output instead, i.e., wget 'https://trustless-gateway.link/ipfs/bafkreie7ohywtosou76tasm7j63yigtzxe7d5zqus4zu3j6oltvgtibeom?format=raw', downloads the desired file without an error.
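A minimal sketch of the two URL shapes, assuming the gateway behaves like a trustless gateway that serves `?format=raw` blocks by CID: the first helper rebuilds the URL that currently fails (folder CID plus file name), the second builds it from the `CID` field that `ls` already returns for each entry. Both helper names are hypothetical and not part of ipfsspec.

```python
from urllib.parse import quote

GATEWAY = "https://trustless-gateway.link"  # gateway taken from the error message

# One entry as returned by the working (local daemon) ls call.
entry = {
    "name": "bafybeicn7i3soqdgr7dwnrwytgq4zxy7a5jpkizrvhm5mv6bgjd32wm3q4/welcome-to-IPFS.jpg",
    "CID": "bafkreie7ohywtosou76tasm7j63yigtzxe7d5zqus4zu3j6oltvgtibeom",
    "type": "file",
    "size": 663082,
}


def path_block_url(gateway: str, entry: dict) -> str:
    """Hypothetical: the path-based URL that is currently requested (answered with 406)."""
    # quote() keeps '/' by default, so the folder/file structure is preserved.
    return f"{gateway}/ipfs/{quote(entry['name'])}?format=raw"


def cid_block_url(gateway: str, entry: dict) -> str:
    """Hypothetical: a URL addressing the file's own CID (works with wget)."""
    return f"{gateway}/ipfs/{entry['CID']}?format=raw"


print(path_block_url(GATEWAY, entry))
print(cid_block_url(GATEWAY, entry))
```

Building the raw-block request from the entry's CID avoids asking the gateway to resolve a path.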

Btw, does this mean that a simple ls downloads all files? Or does it only download the header to determine the file type? If it downloads everything, this might run into the same performance problem as the fsspec HTTP backend: fsspec/filesystem_spec#1707
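Judging from the traceback, `ls` awaits `info` for every directory entry via `asyncio.gather`. A toy model of that pattern, assuming `ls` also makes one request for the directory itself (the class and method names are hypothetical stand-ins, not ipfsspec's code), shows that listing N entries costs N + 1 gateway requests. How many bytes each request transfers depends on what `info` actually fetches.

```python
import asyncio


class CountingGateway:
    """Hypothetical stand-in for a gateway client that only records requests."""

    def __init__(self, listing: dict[str, list[str]]):
        self.listing = listing  # directory path -> entry names
        self.requests: list[str] = []

    async def _request(self, path: str) -> None:
        self.requests.append(path)  # a real client would do an HTTP round trip here
        await asyncio.sleep(0)

    async def info(self, path: str) -> dict:
        await self._request(path)
        return {"name": path, "type": "file"}

    async def ls(self, path: str) -> list[dict]:
        await self._request(path)  # assumed: one request for the directory itself
        names = self.listing[path]
        # Mirrors the traceback: one info() call per entry, run concurrently.
        return await asyncio.gather(*(self.info(f"{path}/{n}") for n in names))


gw = CountingGateway({"dir": ["a.jpg", "b.jpg", "c.jpg"]})
entries = asyncio.run(gw.ls("dir"))
print(len(entries), len(gw.requests))  # 3 4
```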
