Regression in 6.2.0 when URL source to be parsed contains a query string #2120

@nutjob4life

Description

In 6.2.0, Graph.parse returns a different statement count than 6.1.1 when it reads over HTTP from a source URL that contains a query string.

To reproduce:

$ date -u
Wed Sep 21 16:57:54 UTC 2022
$ cd /tmp
$ python3.10 --version
Python 3.10.5
$ python3.10 -m venv venv
$ cd venv
$ bin/pip install --quiet --upgrade setuptools pip wheel rdflib==6.2.0
$ bin/python
Python 3.10.5 (main, Jun 23 2022, 17:15:25) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import rdflib
>>> a, b = rdflib.Graph(), rdflib.Graph()
>>> a.parse('https://bmdb.jpl.nasa.gov/rdf/example')
>>> b.parse('https://bmdb.jpl.nasa.gov/rdf/example?all=right')
>>> len(a), len(b), len(a) != len(b)
(2, 2, False)

The result should be (2, 4, True): the URL with the query string returns four statements, not two.

Using 6.1.1 produces the correct results:

$ bin/pip install --quiet rdflib==6.1.1
$ bin/python
Python 3.10.5 (main, Jun 23 2022, 17:15:25) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import rdflib
>>> a, b = rdflib.Graph(), rdflib.Graph()
>>> a.parse('https://bmdb.jpl.nasa.gov/rdf/example')
>>> b.parse('https://bmdb.jpl.nasa.gov/rdf/example?all=right')
>>> len(a), len(b), len(a) != len(b)
(2, 4, True)

Apparently 6.2.0 strips the query string part of the URL (?all=right). The query string is necessary with some web service APIs, which use it to select different sets of RDF statements.
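
Until this is fixed, one possible workaround is to fetch the document directly and hand the bytes to Graph.parse, so that rdflib never touches the URL. This is only a sketch and has not been run against this server: build_request and load_graph are hypothetical helper names, and the Accept header and format="xml" are assumptions, since nothing above states which serialization the endpoint returns.

```python
import urllib.request


def build_request(url, accept="application/rdf+xml"):
    """Build a GET request that keeps the URL, query string included, as given.

    The default Accept value is an assumption about what the server offers.
    """
    return urllib.request.Request(url, headers={"Accept": accept})


def load_graph(url, rdf_format="xml", accept="application/rdf+xml"):
    """Fetch `url` with urllib, then parse the raw bytes with rdflib.

    `rdf_format` must match what the server actually returns; "xml"
    (RDF/XML) is an assumption here.
    """
    import rdflib  # imported lazily so build_request works without rdflib

    with urllib.request.urlopen(build_request(url, accept)) as response:
        data = response.read()

    graph = rdflib.Graph()
    graph.parse(data=data, format=rdf_format)
    return graph
```

Calling load_graph('https://bmdb.jpl.nasa.gov/rdf/example?all=right') sends the query string to the server unchanged, because the URL is passed to urllib directly instead of going through Graph.parse.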

Labels

6.3.x: Things that should be fixed in 6.3.x or later.
bug: Something isn't working
regression: Something stopped working
