
Result.serialize() path handling is broken for Windows paths and some other cases #2067

@aucampia

Description


I have been trying to figure out what is happening with these xfails:

if sys.platform == "win32":
    xfails[("csv", DestinationType.STR_PATH, "utf-8")] = pytest.mark.xfail(
        raises=FileNotFoundError,
        reason="string path handling does not work on windows",
    )
    xfails[("csv", DestinationType.STR_PATH, "utf-16")] = pytest.mark.xfail(
        raises=FileNotFoundError,
        reason="string path handling does not work on windows",
    )
    xfails[("json", DestinationType.STR_PATH, "utf-8")] = pytest.mark.xfail(
        raises=FileNotFoundError,
        reason="string path handling does not work on windows",
    )
    xfails[("json", DestinationType.STR_PATH, "utf-16")] = pytest.mark.xfail(
        raises=FileNotFoundError,
        reason="string path handling does not work on windows",
    )
    xfails[("xml", DestinationType.STR_PATH, "utf-8")] = pytest.mark.xfail(
        raises=FileNotFoundError,
        reason="string path handling does not work on windows",
    )
    xfails[("xml", DestinationType.STR_PATH, "utf-16")] = pytest.mark.xfail(
        raises=FileNotFoundError,
        reason="string path handling does not work on windows",
    )

The problem is with the approach to path handling:

rdflib/rdflib/query.py, lines 268 to 279 in 1d5f3e7:

location = cast(str, destination)
scheme, netloc, path, params, query, fragment = urlparse(location)
if netloc != "":
    print(
        "WARNING: not saving as location" + "is not a local file reference"
    )
    return None
fd, name = tempfile.mkstemp()
stream = os.fdopen(fd, "wb")
serializer.serialize(stream, encoding=encoding, **args)
stream.close()
shutil.move(name, path)

The problem with this approach is that file URIs and OS paths are quite different. For one, with Windows OS paths such as C:\Users\runneradmin\AppData\Local\Temp\pytest-of-unknown\pytest-0\test_select_result_serialize_p6\file-DestinationType.STR_PATH, the drive letter gets interpreted as the URL scheme, so the path component that is later passed to shutil.move no longer contains the drive:

$ python3 -c 'from urllib.parse import urlparse; print(urlparse(r"C:\Users\runneradmin\AppData\Local\Temp\pytest-of-unknown\pytest-0\test_select_result_serialize_p6\file-DestinationType.STR_PATH"))'
ParseResult(scheme='c', netloc='', path='\\Users\\runneradmin\\AppData\\Local\\Temp\\pytest-of-unknown\\pytest-0\\test_select_result_serialize_p6\\file-DestinationType.STR_PATH', params='', query='', fragment='')
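The same mismatch can be reproduced on any OS, since pathlib.PureWindowsPath parses Windows path syntax regardless of the platform it runs on. This is a minimal sketch: the sample path is a shortened, hypothetical one, and using PureWindowsPath to spot the drive is only an illustration, not something rdflib currently does.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse

# Shortened, hypothetical Windows destination (same shape as the pytest one).
win_path = r"C:\Users\runneradmin\AppData\Local\Temp\file.csv"

parsed = urlparse(win_path)
# The drive letter is lowercased and taken as the URL scheme.
print(parsed.scheme)  # -> c
# The path component, which query.py hands to shutil.move, has lost the drive.
print(parsed.path)  # -> \Users\runneradmin\AppData\Local\Temp\file.csv

# PureWindowsPath understands Windows syntax on every platform, so it sees
# the drive that urlparse mistook for a scheme.
print(PureWindowsPath(win_path).drive)  # -> C:
```

Without the drive, the destination becomes relative to whatever drive the process is running on, which matches the D:\a\rdflib checkout versus C:\Users temp directory split in the log below.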

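Furthermore, URIs support percent encoding (e.g. %20 for a space), while OS paths do not, so the same string has to be decoded differently depending on which of the two it is.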
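A small illustration of why one decoding rule cannot serve both: in a file URI, %20 must be decoded to get the real file name, but in an OS path the same characters are literal. The file names here are made up for the example.

```python
from urllib.parse import unquote, urlparse

# In a file URI, percent escapes must be decoded to get the real file name.
uri = "file:///tmp/my%20results.csv"
print(unquote(urlparse(uri).path))  # -> /tmp/my results.csv

# In an OS path, "%25" is just three literal characters. Decoding it as if
# it were a URI silently points at a different file.
os_path = "/tmp/100%25.csv"
print(unquote(os_path))  # -> /tmp/100%.csv
```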

Here is an example of things going wrong, from a Windows test run:

  ------------------------------ Captured log call ------------------------------
  2022-07-30T12:11:21.926 ERROR    root         test_result.py:317:test_select_result_serialize_parse destination = C:\Users\runneradmin\AppData\Local\Temp\pytest-of-unknown\pytest-0\test_select_result_serialize_p6\file-DestinationType.STR_PATH
  2022-07-30T12:11:21.926 ERROR    root         test_result.py:318:test_select_result_serialize_parse format = csv
  2022-07-30T12:11:21.926 ERROR    root         test_result.py:319:test_select_result_serialize_parse encoding = utf-16
  ___________ test_select_result_serialize_parse[csv-STR_PATH-utf-8] ____________
  Traceback (most recent call last):
    File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\shutil.py", line 566, in move
      os.rename(src, real_dst)
  FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpgk0vyq6q' -> '\\Users\\runneradmin\\AppData\\Local\\Temp\\pytest-of-unknown\\pytest-0\\test_select_result_serialize_p7\\file-DestinationType.STR_PATH'
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "D:\a\rdflib\rdflib\test\test_sparql\test_result.py", line 323, in test_select_result_serialize_parse
      encoding=encoding,
    File "D:\a\rdflib\rdflib\rdflib\query.py", line 283, in serialize
      shutil.move(name, path)
    File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\shutil.py", line 580, in move
      copy_function(src, real_dst)
    File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\shutil.py", line 266, in copy2
      copyfile(src, dst, follow_symlinks=follow_symlinks)
    File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\shutil.py", line 121, in copyfile
      with open(dst, 'wb') as fdst:
  FileNotFoundError: [Errno 2] No such file or directory: '\\Users\\runneradmin\\AppData\\Local\\Temp\\pytest-of-unknown\\pytest-0\\test_select_result_serialize_p7\\file-DestinationType.STR_PATH'

I think the best we can do to fix the path handling is to do the same as what Graph.serialize does:

rdflib/rdflib/graph.py, lines 1204 to 1218 in 1d5f3e7:

if isinstance(destination, pathlib.PurePath):
    location = str(destination)
else:
    location = cast(str, destination)
scheme, netloc, path, params, _query, fragment = urlparse(location)
if netloc != "":
    raise ValueError(
        f"destination {destination} is not a local file reference"
    )
fd, name = tempfile.mkstemp()
stream = os.fdopen(fd, "wb")
serializer.serialize(stream, base=base, encoding=encoding, **args)
stream.close()
dest = url2pathname(path) if scheme == "file" else location
shutil.move(name, dest)

This will also fix relative path handling in some cases; however, it will break relative URI handling.
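To make the trade-off concrete, here is the destination-resolution part of the Graph.serialize logic pulled out into a standalone function. resolve_destination is a hypothetical name used only for this sketch; it covers just the path computation (not the temporary file or the move), and the sample destinations are made up.

```python
from urllib.parse import urlparse
from urllib.request import url2pathname


def resolve_destination(location: str) -> str:
    """Hypothetical helper mirroring how Graph.serialize turns a destination
    string into the OS path that is handed to shutil.move."""
    scheme, netloc, path, _params, _query, _fragment = urlparse(location)
    if netloc != "":
        raise ValueError(f"destination {location} is not a local file reference")
    # Only explicit file: URIs are decoded; everything else is used verbatim.
    return url2pathname(path) if scheme == "file" else location


# Windows path: the "c" scheme is not "file", so the string is kept whole.
print(resolve_destination(r"C:\Temp\out.csv"))
# file: URI: percent escapes are decoded into an OS path
# (separators follow the current platform).
print(resolve_destination("file:///tmp/my%20out.csv"))
# Relative reference without a scheme: "%20" is NOT decoded.
print(resolve_destination("my%20out.csv"))
```

The last case shows that any relative reference lacking a file: scheme is treated as a plain OS path, so URI semantics such as percent decoding no longer apply to it.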


Labels: bug (Something isn't working), core (Relates to core functionality of RDFLib, i.e. `rdflib.{graph,store,term}`)
