Skip to content

Conversation

ms1901
Copy link

@ms1901 ms1901 commented May 12, 2022

Summary of changes

Fixes #479

Have implemented the DESCRIBE query, which returns a SPARQLResult containing all triples with the query as the subject or object, along with the resolution of triples containing blank nodes. The implementation is based on Concise Bounded Descriptions.

The Describe can take the form : DESCRIBE <ex:Resource1>
Before the changes, DESCRIBE queries were not implemented as shown:

elif part.name == "DescribeQuery":
        raise Exception("DESCRIBE not implemented")

And raised Exception: DESCRIBE not implemented.

Have modified the original implementation of CBD to yield the output of the DESCRIBE query. Originally the CBD implementation returns the triples containing the query resource as subject as mentioned here. The DESCRIBE query should output triples that contain the query resource as either subject or object. To this end, we implement the below function:

def describe_cbd(self, resource, subgraph):
        def add_to_cbd_original(uri):
            for s, p, o in self.triples((uri, None, None)):
                subgraph.add((s, p, o))
                if type(o) == BNode and not (o, None, None) in subgraph:
                    add_to_cbd_original(o)

            for s, p, o in self.triples((None, RDF.subject, uri)):
                for s2, p2, o2 in self.triples((s, None, None)):
                    subgraph.add((s2, p2, o2))

        def add_to_cbd_reverse(uri):
            for s, p, o in self.triples((None, None, uri)):
                
                subgraph.add((s, p, o))
                if type(s) == BNode and not (s, None, None) in subgraph:
                    add_to_cbd_reverse(s)

    
            for s, p, o in self.triples((None, RDF.object, uri)):
                for s2, p2, o2 in self.triples((s, None, None)):
                    
                    subgraph.add((s2, p2, o2))

        add_to_cbd_original(resource)
        add_to_cbd_reverse(resource)

        return subgraph

The above-mentioned function is a variant of the original CBD implementation. I have modified the function to recursively consider all triples containing the query resource as an object. We also ensure that the blank nodes are recursively resolved.

To ensure that DESCRIBE query works for multiple resources as input, we make the following change to the function call.

The final output consists of a SPARQLResult object comprising of triples corresponding to all query resources as subjects or objects.

We assume that the input format in such a case would be:

DESCRIBE <ex:Resource1> <ex:Resource2>

elif part.name == "DescribeQuery":
        subgraph = Graph()     
        for var in part['PV']:
            subgraph = ctx.graph.describe_cbd(var, subgraph)
        res = {"type_": "DESCRIBE", "graph": subgraph}  
        return res 

To show the working of DESCRIBE query, we have provided a tester file.

@ms1901
Copy link
Author

ms1901 commented May 12, 2022

pre-commit.ci autofix

@aucampia
Copy link
Member

To link this PR to an issue please see:

The way you are doing it makes more involved to find the issue you are referring to.

@ms1901 ms1901 changed the title DESCRIBE query implemented. Solved #479 DESCRIBE query implemented. Resolves #479 May 12, 2022
@ms1901 ms1901 changed the title DESCRIBE query implemented. Resolves #479 DESCRIBE query implemented, resolves #479 May 12, 2022
@aucampia
Copy link
Member

Note to reviewers: a previous attempt at fixing this was here: https://github.com/gjhiggins/rdflib/commit/5fb6f43e6b1d2264ac5015c260b32fc4e7dbf240#comments

@coveralls
Copy link

coveralls commented May 12, 2022

Coverage Status

Coverage decreased (-0.02%) to 88.224% when pulling 2856f65 on ms1901:master into 5e5dcea on RDFLib:master.

@aucampia
Copy link
Member

Note to reviewers: a previous attempt at fixing this was here: gjhiggins@5fb6f43#comments

Having a brief look I think most of the comments there apply here also.

@@ -1597,6 +1597,70 @@ def add_to_cbd(uri):

return subgraph

def describe_cbd(self, resource, subgraph):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear from the docstrings why this would exist in addition to Graph.cbd as from a brief glance the docstrings look the same. I don't think we need two methods to get the CBD, we just need one that works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See:

Also you will have to include tests specifically for this function for this PR to be merged, but still I don't think it should exist, if Graph.cbd does not do what it says in the docstring it is a bug and should be fixed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey!

Thanks for your review.

We have included the tests for this describe_cbd function in the test file test_graph_cbd.py as the testCbdDescribeReified() function. The test example which we have included will be able to cover all the cases for the definition of DESCRIBE query. The test example can be visualized as:-
Screen Shot 2022-05-12 at 10 14 53 PM

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function describe_cbd is a variant of the original CBD implementation. I have modified the function to recursively consider all triples containing the query resource as an object and subject. The original function returns the CBD containing triples where the resource is a subject only. However, for implementing the DESCRIBE query, we need all triples containing the resource – both as an object and subject. We also ensure that the blank nodes are recursively resolved – this is in line with the definition of DESCRIBE query.

Copy link
Member

@aucampia aucampia May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have included the tests for this describe_cbd function in the test file test_graph_cbd.py as the testCbdDescribeReified() function.

My appologies, in that case you will need tests for SPARQL DESCRIBE as part of the PR.

The function describe_cbd is a variant of the original CBD implementation. I have modified the function to recursively consider all triples containing the query resource as an object and subject. The original function returns the CBD containing triples where the resource is a subject only.

If it does something different then it should at least have a different docstring which it does not as far as I can tell, and then the next question is, why does it do something different? There is one definition of a Concise Bounded Description, and it is this: https://www.w3.org/Submission/CBD/#definition - if it does something different it should be further qualified, and as SPARQL DESCRIBE does not mandate anything in regards to what should be returned I don't think qualifying it with describe is helpful.

However, for implementing the DESCRIBE query, we need all triples containing the resource – both as an object and subject. We also ensure that the blank nodes are recursively resolved – this is in line with the definition of DESCRIBE query.

Can you clarify why? As per the SPARQL 1.1 spec:

https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#describe

The DESCRIBE form returns a single result RDF graph containing RDF data about resources. This data is not prescribed by a SPARQL query, where the query client would need to know the structure of the RDF in the data source, but, instead, is determined by the SPARQL query processor.

And goes on to say:

Other possible mechanisms for deciding what information to return include Concise Bounded Descriptions [CBD].

It does not state that DESCRIBE must return the CBD, but I do expect that if we decide to return the Concise Bounded Description that we return it as defined, in which case Graph.cbd should do it, and if it does not should be fixed to do it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not state that DESCRIBE must return the CBD

DESCRIBE should not return just the CBD, because as we mentioned below

However, for implementing the DESCRIBE query, we need all triples containing the resource – both as an object and subject.

This is in line with the definition of the DESCRIBE query.

If it does something different then it should at least have a different docstring which it does not as far as I can tell

Apologies for not providing a separate docstring, I can add that to make it well documented. A few things here and there can be improved wrt documentation, I can check that out.

Further,

in that case, you will need tests for SPARQL DESCRIBE as part of the PR.

I have also provided the link to a testing file that has such an example.

To show the working of DESCRIBE query, we have provided a tester file.

If that matches what you expect, we can add that file to the PR. For now, we have just attached the link.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DESCRIBE should not return just the CBD, because as we mentioned below

However, for implementing the DESCRIBE query, we need all triples containing the resource – both as an object and subject.

But I don't understand why, where does this requirement come from, could you cite some specification that requires this behavior or indicates why you say it needs "all triples containing the resource – both as an object and subject."? If this is just what you as an individual would prefer I don't think it is sufficient justification.

I have also provided the link to a testing file that has such an example.

It has to be in our test suite, if it is not then I don't know if it passes (as I won't run it on my computer) and we also can't know if there is a regression.

Copy link
Member

@aucampia aucampia May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for not providing a separate docstring, I can add that to make it well documented. A few things here and there can be improved wrt documentation, I can check that out.

To begin with, it will be best to just cite what specification dictates this algorithm/behaviour.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, the result of a "DESCRIBE" query is determined by the service and the definition is sufficiently vague as to prompt the W3 to set description in double quotes:

“The DESCRIBE form takes each of the resources identified in a solution, together with any resources directly named by IRI, and assembles a single RDF graph by taking a "description" which can come from any information available including the target RDF Dataset. The description is determined by the query service.”

The EBNF for SPARQL DESCRIBE suggests that an implementation that conforms to the spec is going to demand rather more than just defaulting to returning a cbd:

DescribeQuery | ::= | "DESCRIBE" ( VarOrIri+  \| "*")  DatasetClause*  WhereClause?  SolutionModifier

@aucampia
Copy link
Member

Full diff for previous fix attempt is master...gjhiggins:fix-for-issue813-sparql-describe

@aucampia aucampia added the needs more work The PR needs more work before being merged or further reviewed. label May 15, 2022
Copy link
Member

@aucampia aucampia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see previous comments

@aucampia
Copy link
Member

Please rebase this branch onto master or merge master into this branch.

@aucampia
Copy link
Member

If there is nothing further on this PR I will close it.

@aucampia
Copy link
Member

Closing this PR due to inactivity, feel free to open it again if you continue working on it again.

@aucampia aucampia closed this Jul 29, 2022
aucampia pushed a commit that referenced this pull request Feb 7, 2023
This adds an implementation for SPARQL DESCRIBE queries, using the built-in `cbd` method. I see there are several issues and PRs for DESCRIBE implementation. I believe this should close #479 and should resolve #1913, or at least pick up where it left off. It should also resolve #1096.

This implementation should support the full SPARQL specification for DESCRIBE; either explicit IRIs can be provided (with no WHERE clause), or variables projected from a WHERE clause can be provided, or variables projected from a WHERE clause AND explicit IRIs can be provided. If a WHERE clause is provided, it should be evaluated the same way as it would for a SELECT DISTINCT query (including dataset specifications).

The expected results for the test cases provided match the behaviour seen when running the same queries against the same data using ARQ.

A possible future extension would be to add a global option (similar to `rdflib.plugins.sparql.SPARQL_LOAD_GRAPHS`) to change the method used to describe resources instead of always using CBD.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs more work The PR needs more work before being merged or further reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DESCRIBE query not working
3 participants