Serializing SPARQL Query Results with Aggregates over Variables from Optional Graph Pattern #2229

@prohde

I ran into an issue when serializing the results of SPARQL queries with aggregates over variables from optional graph patterns, i.e., variables that may be unbound. I am using rdflib==6.2.0.

The query in question is:

SELECT DISTINCT ?x (COUNT(DISTINCT ?inst) AS ?cnt) WHERE {
  ?x a <http://swat.cse.lehigh.edu/onto/univ-bench.owl#GraduateStudent> 
  OPTIONAL {
    VALUES ?inst { <http://www.University0.edu> <http://www.University1.edu> }. 
    ?x <http://swat.cse.lehigh.edu/onto/univ-bench.owl#undergraduateDegreeFrom> ?inst .
  }
} GROUP BY ?x

For each graduate student, I want to know how many undergraduate degrees they hold from the universities listed in the VALUES clause. I use OPTIONAL because I also want a count of 0 for students who have no degree from any of the specified universities.
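To make the expected result concrete, here is a minimal plain-Python sketch of the grouping and distinct counting I have in mind. It does not use rdflib; the rows, the student names, and the helper `count_distinct_by_group` are made up for illustration, and `None` stands in for an unbound `?inst`.

```python
from collections import defaultdict

# Hypothetical solution rows after the OPTIONAL join; None marks an unbound ?inst.
rows = [
    {"x": "student1", "inst": "http://www.University0.edu"},
    {"x": "student1", "inst": "http://www.University1.edu"},
    {"x": "student1", "inst": "http://www.University0.edu"},
    {"x": "student2", "inst": None},
]


def count_distinct_by_group(rows, group_var, counted_var):
    """Group rows by group_var and count the distinct bound values of counted_var."""
    seen = defaultdict(set)
    for row in rows:
        group = seen[row[group_var]]  # creates the group even if nothing is counted
        value = row.get(counted_var)
        if value is not None:  # unbound values are skipped instead of raising
            group.add(value)
    return {key: len(values) for key, values in seen.items()}


counts = count_distinct_by_group(rows, "x", "inst")
print(counts)  # {'student1': 2, 'student2': 0}
```

The point of the sketch is the last row: a student without a matching degree still forms a group and should come out with a count of 0.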

The query runs fine on my SPARQL endpoint, but when I use rdflib as an in-memory RDF graph, I get the following exception:

Traceback (most recent call last):
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/evalutils.py", line 68, in _eval
    return ctx[expr]
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/sparql.py", line 175, in __getitem__
    return self.ctx.initBindings[key]  # type: ignore[index]
KeyError: rdflib.term.Variable('inst')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/my_path/graph_test.py", line 33, in run_query
    res_json = res.serialize(format='json')
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/query.py", line 252, in serialize
    serializer.serialize(stream2, encoding=encoding, **args)  # type: ignore
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/results/jsonresults.py", line 43, in serialize
    self._bindingToJSON(x) for x in self.result.bindings
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/query.py", line 184, in bindings
    self._bindings += list(self._genbindings)
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/evaluate.py", line 541, in evalDistinct
    for x in res:
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/evaluate.py", line 550, in <genexpr>
    return (row.project(project.PV) for row in res)
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/evaluate.py", line 100, in evalExtend
    for c in evalPart(ctx, extend.p):
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/evaluate.py", line 100, in evalExtend
    for c in evalPart(ctx, extend.p):
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/evaluate.py", line 453, in evalAggregateJoin
    aggregator.update(row)
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/aggregates.py", line 256, in update
    if acc.use_row(row):
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/aggregates.py", line 68, in use_row
    return self.eval_row(row) not in self.seen
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/aggregates.py", line 62, in eval_row
    return _eval(self.expr, row)
  File "/home/my_path/venv/lib/python3.9/site-packages/rdflib/plugins/sparql/evalutils.py", line 71, in _eval
    raise NotBoundError("Variable %s is not bound" % expr)
rdflib.plugins.sparql.sparql.NotBoundError: Variable inst is not bound

I tried rewriting my original query as follows to bypass the issue, but without success.

SELECT DISTINCT ?x (IF(bound(?inst), COUNT(DISTINCT ?inst), 0) AS ?cnt) WHERE {
  ?x a <http://swat.cse.lehigh.edu/onto/univ-bench.owl#GraduateStudent> 
  OPTIONAL {
    VALUES ?inst { <http://www.University0.edu> <http://www.University1.edu> }. 
    ?x <http://swat.cse.lehigh.edu/onto/univ-bench.owl#undergraduateDegreeFrom> ?inst .
  }
}

As I understand it, the error originates in the COUNT, which should not be evaluated when ?inst is unbound because of the IF guard.
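One possible explanation, going by the traceback above: the exception is raised inside `evalAggregateJoin`, which `evalExtend` reaches through `evalPart(ctx, extend.p)` before it evaluates the projected expression. If the rewritten query fails the same way (an assumption, since I have no separate traceback for it), the IF guard would never get a chance to run. The following is a toy model of that ordering in plain Python, not rdflib's actual code; `NotBound`, `strict_count_distinct`, and `extend_with_if` are hypothetical names.

```python
class NotBound(Exception):
    """Stand-in for an unbound-variable error (hypothetical, not rdflib's class)."""


def strict_count_distinct(rows, var):
    # Aggregation stage: reads var on every row and raises if it is unbound.
    seen = set()
    for row in rows:
        if row.get(var) is None:
            raise NotBound(f"Variable {var} is not bound")
        seen.add(row[var])
    return len(seen)


def extend_with_if(rows, var):
    # The aggregate is computed first; only afterwards is the IF evaluated.
    cnt = strict_count_distinct(rows, var)  # raises before the guard is reached
    is_bound = any(row.get(var) is not None for row in rows)
    return cnt if is_bound else 0


group = [{"x": "student2", "inst": None}]
try:
    result = extend_with_if(group, "inst")
except NotBound as exc:
    result = f"error: {exc}"
print(result)  # error: Variable inst is not bound
```

In this model the guard is only consulted after the aggregate has already failed, so wrapping the COUNT in IF cannot suppress the error.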

Many thanks in advance!

Labels: bug (Something isn't working), confirmation needed (The issue raises a potential bug that needs to be confirmed.)
