Description
I ran into an issue when serializing the results of SPARQL queries that aggregate over variables from optional graph patterns, i.e., variables that may be unbound. I am using rdflib==6.2.0.
The query in question is:
SELECT DISTINCT ?x (COUNT(DISTINCT ?inst) AS ?cnt) WHERE {
    ?x a <http://swat.cse.lehigh.edu/onto/univ-bench.owl#GraduateStudent>
    OPTIONAL {
        VALUES ?inst { <http://www.University0.edu> <http://www.University1.edu> }.
        ?x <http://swat.cse.lehigh.edu/onto/univ-bench.owl#undergraduateDegreeFrom> ?inst .
    }
} GROUP BY ?x
For each graduate student, I want to know how many undergraduate degrees they hold from the universities listed in the VALUES clause. I am using OPTIONAL because I also want to get 0 for students who have no degree from any of the specified universities.
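To make the expected result concrete, here is a minimal plain-Python sketch of the behavior I am after. It does not use rdflib, and the student names and rows are made up for illustration; `None` stands in for an unbound `?inst`. The assumption is that unbound rows contribute nothing to the count, so a student without a matching degree still forms a group with a count of 0.

```python
from collections import defaultdict

# Hypothetical solution rows after the OPTIONAL: (student, inst), where
# inst is None when the OPTIONAL part did not match (?inst is unbound).
rows = [
    ("student0", "http://www.University0.edu"),
    ("student0", "http://www.University1.edu"),
    ("student0", "http://www.University0.edu"),  # duplicate, dropped by DISTINCT
    ("student1", None),                          # no matching degree
]


def count_distinct_per_group(rows):
    """Group by student and count the distinct bound ?inst values."""
    seen = defaultdict(set)
    for student, inst in rows:
        group = seen[student]  # creates the group even if nothing is bound
        if inst is not None:   # unbound values are skipped, not errors
            group.add(inst)
    return {student: len(insts) for student, insts in seen.items()}


result = count_distinct_per_group(rows)
print(result)  # {'student0': 2, 'student1': 0}
```

In other words, I would expect one result row per graduate student, with `?cnt` equal to 0 whenever the OPTIONAL block does not match.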
The query runs fine against my SPARQL endpoint, but when I use rdflib as an in-memory RDF graph, serializing the result raises the following exception:
Traceback (most recent call last):
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/evalutils.py", line 68, in _eval
    return ctx[expr]
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/sparql.py", line 175, in __getitem__
    return self.ctx.initBindings[key]  # type: ignore[index]
KeyError: rdflib.term.Variable('inst')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/my_path/graph_test.py", line 33, in run_query
    res_json = res.serialize(format='json')
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/query.py", line 252, in serialize
    serializer.serialize(stream2, encoding=encoding, **args)  # type: ignore
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/results/jsonresults.py", line 43, in serialize
    self._bindingToJSON(x) for x in self.result.bindings
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/query.py", line 184, in bindings
    self._bindings += list(self._genbindings)
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/evaluate.py", line 541, in evalDistinct
    for x in res:
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/evaluate.py", line 550, in <genexpr>
    return (row.project(project.PV) for row in res)
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/evaluate.py", line 100, in evalExtend
    for c in evalPart(ctx, extend.p):
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/evaluate.py", line 100, in evalExtend
    for c in evalPart(ctx, extend.p):
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/evaluate.py", line 453, in evalAggregateJoin
    aggregator.update(row)
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/aggregates.py", line 256, in update
    if acc.use_row(row):
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/aggregates.py", line 68, in use_row
    return self.eval_row(row) not in self.seen
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/aggregates.py", line 62, in eval_row
    return _eval(self.expr, row)
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/evalutils.py", line 71, in _eval
    raise NotBoundError("Variable %s is not bound" % expr)
rdflib.plugins.sparql.sparql.NotBoundError: Variable inst is not bound
I tried rewriting my original query as follows to bypass the issue, but without success:
SELECT DISTINCT ?x (IF(bound(?inst), COUNT(DISTINCT ?inst), 0) AS ?cnt) WHERE {
    ?x a <http://swat.cse.lehigh.edu/onto/univ-bench.owl#GraduateStudent>
    OPTIONAL {
        VALUES ?inst { <http://www.University0.edu> <http://www.University1.edu> }.
        ?x <http://swat.cse.lehigh.edu/onto/univ-bench.owl#undergraduateDegreeFrom> ?inst .
    }
}
As I understand it, the error occurs in the COUNT, which the IF should prevent from being evaluated when ?inst is unbound.
Many thanks in advance!