-
Notifications
You must be signed in to change notification settings - Fork 577
DESCRIBE query implemented, resolves #479 #1913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pre-commit.ci autofix |
To link this PR to an issue please see: The way you are doing it makes more involved to find the issue you are referring to. |
for more information, see https://pre-commit.ci
Note to reviewers: a previous attempt at fixing this was here: https://github.com/gjhiggins/rdflib/commit/5fb6f43e6b1d2264ac5015c260b32fc4e7dbf240#comments |
Having a brief look I think most of the comments there apply here also. |
@@ -1597,6 +1597,70 @@ def add_to_cbd(uri): | |||
|
|||
return subgraph | |||
|
|||
def describe_cbd(self, resource, subgraph): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unclear from the docstrings why this would exist in addition to Graph.cbd
as from a brief glance the docstrings look the same. I don't think we need two methods to get the CBD, we just need one that works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See:
Also you will have to include tests specifically for this function for this PR to be merged, but still I don't think it should exist, if Graph.cbd
does not do what it says in the docstring it is a bug and should be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey!
Thanks for your review.
We have included the tests for this describe_cbd function in the test file test_graph_cbd.py as the testCbdDescribeReified() function. The test example which we have included will be able to cover all the cases for the definition of DESCRIBE query. The test example can be visualized as:-
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function describe_cbd is a variant of the original CBD implementation. I have modified the function to recursively consider all triples containing the query resource as an object and subject. The original function returns the CBD containing triples where the resource is a subject only. However, for implementing the DESCRIBE
query, we need all triples containing the resource – both as an object and subject. We also ensure that the blank nodes are recursively resolved – this is in line with the definition of DESCRIBE
query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have included the tests for this describe_cbd function in the test file test_graph_cbd.py as the testCbdDescribeReified() function.
My appologies, in that case you will need tests for SPARQL DESCRIBE
as part of the PR.
The function describe_cbd is a variant of the original CBD implementation. I have modified the function to recursively consider all triples containing the query resource as an object and subject. The original function returns the CBD containing triples where the resource is a subject only.
If it does something different then it should at least have a different docstring which it does not as far as I can tell, and then the next question is, why does it do something different? There is one definition of a Concise Bounded Description, and it is this: https://www.w3.org/Submission/CBD/#definition - if it does something different it should be further qualified, and as SPARQL DESCRIBE
does not mandate anything in regards to what should be returned I don't think qualifying it with describe
is helpful.
However, for implementing the
DESCRIBE
query, we need all triples containing the resource – both as an object and subject. We also ensure that the blank nodes are recursively resolved – this is in line with the definition ofDESCRIBE
query.
Can you clarify why? As per the SPARQL 1.1 spec:
https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#describe
The DESCRIBE form returns a single result RDF graph containing RDF data about resources. This data is not prescribed by a SPARQL query, where the query client would need to know the structure of the RDF in the data source, but, instead, is determined by the SPARQL query processor.
And goes on to say:
Other possible mechanisms for deciding what information to return include Concise Bounded Descriptions [CBD].
It does not state that DESCRIBE must return the CBD, but I do expect that if we decide to return the Concise Bounded Description that we return it as defined, in which case Graph.cbd
should do it, and if it does not should be fixed to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not state that DESCRIBE must return the CBD
DESCRIBE
should not return just the CBD, because as we mentioned below
However, for implementing the DESCRIBE query, we need all triples containing the resource – both as an object and subject.
This is in line with the definition of the DESCRIBE
query.
If it does something different then it should at least have a different docstring which it does not as far as I can tell
Apologies for not providing a separate docstring, I can add that to make it well documented. A few things here and there can be improved wrt documentation, I can check that out.
Further,
in that case, you will need tests for SPARQL DESCRIBE as part of the PR.
I have also provided the link to a testing file that has such an example.
To show the working of DESCRIBE query, we have provided a tester file.
If that matches what you expect, we can add that file to the PR. For now, we have just attached the link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DESCRIBE
should not return just the CBD, because as we mentioned belowHowever, for implementing the DESCRIBE query, we need all triples containing the resource – both as an object and subject.
But I don't understand why, where does this requirement come from, could you cite some specification that requires this behavior or indicates why you say it needs "all triples containing the resource – both as an object and subject."? If this is just what you as an individual would prefer I don't think it is sufficient justification.
I have also provided the link to a testing file that has such an example.
It has to be in our test suite, if it is not then I don't know if it passes (as I won't run it on my computer) and we also can't know if there is a regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for not providing a separate docstring, I can add that to make it well documented. A few things here and there can be improved wrt documentation, I can check that out.
To begin with, it will be best to just cite what specification dictates this algorithm/behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, the result of a "DESCRIBE" query is determined by the service and the definition is sufficiently vague as to prompt the W3 to set description
in double quotes:
“The DESCRIBE form takes each of the resources identified in a solution, together with any resources directly named by IRI, and assembles a single RDF graph by taking a "description" which can come from any information available including the target RDF Dataset. The description is determined by the query service.”
The EBNF for SPARQL DESCRIBE
suggests that an implementation that conforms to the spec is going to demand rather more than just defaulting to returning a cbd:
DescribeQuery | ::= | "DESCRIBE" ( VarOrIri+ \| "*") DatasetClause* WhereClause? SolutionModifier
Full diff for previous fix attempt is master...gjhiggins:fix-for-issue813-sparql-describe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see previous comments
Please rebase this branch onto master or merge master into this branch. |
If there is nothing further on this PR I will close it. |
Closing this PR due to inactivity, feel free to open it again if you continue working on it again. |
This adds an implementation for SPARQL DESCRIBE queries, using the built-in `cbd` method. I see there are several issues and PRs for DESCRIBE implementation. I believe this should close #479 and should resolve #1913, or at least pick up where it left off. It should also resolve #1096. This implementation should support the full SPARQL specification for DESCRIBE; either explicit IRIs can be provided (with no WHERE clause), or variables projected from a WHERE clause can be provided, or variables projected from a WHERE clause AND explicit IRIs can be provided. If a WHERE clause is provided, it should be evaluated the same way as it would for a SELECT DISTINCT query (including dataset specifications). The expected results for the test cases provided match the behaviour seen when running the same queries against the same data using ARQ. A possible future extension would be to add a global option (similar to `rdflib.plugins.sparql.SPARQL_LOAD_GRAPHS`) to change the method used to describe resources instead of always using CBD.
Summary of changes
Fixes #479
Have implemented the DESCRIBE query, which returns a SPARQLResult containing all triples with the query as the subject or object, along with the resolution of triples containing blank nodes. The implementation is based on Concise Bounded Descriptions.
The Describe can take the form :
DESCRIBE <ex:Resource1>
Before the changes, DESCRIBE queries were not implemented as shown:
And raised
Exception: DESCRIBE not implemented
.Have modified the original implementation of CBD to yield the output of the DESCRIBE query. Originally the CBD implementation returns the triples containing the query resource as subject as mentioned here. The DESCRIBE query should output triples that contain the query resource as either subject or object. To this end, we implement the below function:
The above-mentioned function is a variant of the original CBD implementation. I have modified the function to recursively consider all triples containing the query resource as an object. We also ensure that the blank nodes are recursively resolved.
To ensure that DESCRIBE query works for multiple resources as input, we make the following change to the function call.
The final output consists of a SPARQLResult object comprising of triples corresponding to all query resources as subjects or objects.
We assume that the input format in such a case would be:
DESCRIBE <ex:Resource1> <ex:Resource2>
To show the working of DESCRIBE query, we have provided a tester file.