Move Store API to work with identifiers, not graphs #1646

ghost · 2022-01-06T02:36:59Z

This is a rebase of PR #958, PR #409 rebased against master to fix #167

Proposed Changes

As per original PR, all instances of Graph objects as contexts changed to Graph identifiers.

graph.py
auditable.py
notation3.py
sparqlstore.py
store.py
update.py
corresponding changes to tests and three annotations of failure.

This is a rebase of PR #958, PR #409 rebased against master to fix #167

update to pytest, add failure annotations, change `urn:` to `urn:x-rdflib:`

ghost · 2022-01-06T03:46:20Z

The default assignment of a BNode identifier to Graph is causing some blocking issues:

>>> from rdflib import Graph, URIRef
>>> tarek = URIRef("urn:x-rdflib:tarek")
>>> likes = URIRef("urn:x-rdflib:likes")
>>> pizza = URIRef("urn:x-rdflib:pizza")
>>> g = Graph(store="SPARQLUpdateStore")
>>> g.open(
...     configuration=(
...         "http://localhost:3030/db/query",
...         "http://localhost:3030/db/update",
...     )
... )
>>> g.add((tarek, likes, pizza))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../rdflib/graph.py", line 431, in add
    self.__store.add((s, p, o), self.identifier, quoted=False)
  File ".../rdflib/plugins/stores/sparqlstore.py", line 628, in add
    q = "INSERT DATA { GRAPH %s { %s } }" % (nts(context), triple)
  File ".../rdflib/plugins/stores/sparqlstore.py", line 33, in _node_to_sparql
    raise Exception(
Exception: SPARQLStore does not support BNodes! See http://www.w3.org/TR/sparql11-query/#BGPsparqlBNodes

Although it wasn't included in the original PR, this change:

diff --git a/rdflib/graph.py b/rdflib/graph.py
index cab825b1..69e31c32 100644
--- a/rdflib/graph.py
+++ b/rdflib/graph.py
@@ -1940,7 +1940,7 @@ class Dataset(ConjunctiveGraph):
     """
 
     def __init__(self, store="default", default_union=False, default_graph_base=None):
-        super(Dataset, self).__init__(store=store, identifier=None)
+        super(Dataset, self).__init__(store=store, identifier=DATASET_DEFAULT_GRAPH_ID)
 
         if not self.store.graph_aware:
             raise Exception("DataSet must be backed by a graph-aware store!")

allows the the otherwise failing tests in test_dataset_generators and json-ld test_named_graphs to succeed.

More generally, niklasl 's observations in #436 remain true “When ConjunctiveGraph.parse is called, it wraps its underlying store in a regular Graph instance. This causes problems for parsers of datasets, e.g. NQuads, TriG and JSON-LD. Specifically, the triples in the default graph of a dataset haphazardly end up in bnode-named contexts.” so while programmatic operations on Datasets are coherent, after this PR is applied, parsing is not.

test/test_graph_generators.py

…RAPH_ID instead of None. Revert test failure annotations in `test_dataset_generators` and `test_named_graphs`

ghost · 2022-01-06T11:30:32Z

Re: ea1c2eb, I'd be happier if I had a better model of why this works but being able to resolve the jsonld test_named_graphs failure seems worth it.

There are other reasons, too.

I followed aucampia's suggestion of creating a test_dataset_operators correlate to test_dataset_generators, which I meant to introduce before commit in order to illustrate what doesn't yet work with just the rebased PR as-is. Now that I've mucked it up, I'll include the diff as illiustration (the ValueErrors are solved in a subsequent change):

+++ test_dataset_graph_operators.py   2022-01-06 12:07:32.119924381 +0000
@@ -37,8 +37,7 @@
     with pytest.raises(ValueError):  # too many values to unpack (expected 3)
         assert len(ds - g) == 1  # removes pizza
 
-    with pytest.raises(AssertionError):  # 0 != 1
-        assert len(ds * g) == 1  # only pizza
+    assert len(ds * g) == 1  # only pizza
 
     with pytest.raises(ValueError):  # too many values to unpack (expected 3)
         assert len(ds ^ g) == 2  # removes pizza, adds cheese
@@ -64,10 +63,7 @@
     ):  # Trying to add to a context that isn't an identifier: None
         assert len(ds - cg) == 1  # removes pizza
 
-    with pytest.raises(
-        Exception
-    ):  # Trying to add to a context that isn't an identifier: None
-        assert len(ds * cg) == 1  # only pizza
+    assert len(ds * cg) == 1  # only pizza
 
 
 def test_operators_with_dataset_and_namedgraph():
@@ -90,8 +86,7 @@
     ):  # Trying to add to a context that isn't an identifier: None
         assert len(ds - ng) == 1  # removes pizza
 
-    with pytest.raises(AssertionError):  # 0 != 1
-        assert len(ds * ng) == 1  # only pizza
+    assert len(ds * ng) == 1  # only pizza
 
 
 def test_reversed_operators_with_dataset_and_graph():
@@ -107,8 +102,7 @@
     with pytest.raises(ValueError):  # too many values to unpack (expected 3)
         assert len(g + ds) == 3  # adds cheese as liking
 
-    with pytest.raises(AssertionError):  # 2 != 1
-        assert len(g - ds) == 1  # removes pizza
+    assert len(g - ds) == 1  # removes pizza
 
     with pytest.raises(ValueError):  # too many values to unpack (expected 3)
         assert len(g * ds) == 1  # only pizza

Assigning DATASET_DEFAULT_GRAPH_ID as the identifier of the Dataset default graph has an impact on Dataset.__len__() which then becomes the length of the default graph. Although this can accord with users' expectations (c.f. Dataset.graph should not allow adding random graphs to the store #698 (comment)) ...

“To clarify, if I were to run

import rdflib
from rdflib.namespace import FOAF
foaf = rdflib.Graph(identifier = FOAF)
foaf.parse("http://xmlns.com/foaf/spec/index.rdf", format="xml")
ds = rdflib.Dataset()
ds.add_graph(foaf)

am I correct in assuming that expected behavior is that len(foaf) should equal len(ds) (when in fact len(ds) is 0)?”

... the reality is that a Dataset is a collection of Graphs, one of which is a default --- in this light, perhaps len(ds) is perhaps more properly 0 and should be documented as such. fwiw, I favour len(ds) to be the length of the default graph, I doubt that users with deeper models of RDF1.1 would be thrown by this.

nicholascar · 2022-01-06T13:18:18Z

I just want to check we don't break any current multi-graph functionality.

I sometimes want to get a single graph out of a multi-graph and a function like this will currently do it:

def get_graph(
    self,
    identifier: Union[URIRef, BNode]
) -> Union[Graph, None]:
    """Returns the graph identified by given identifier"""
    return [x for x in self.contexts() if x.identifier == identifier][0]

How would a this need to be implemented after this change, because the whole graph object couldn't be returned, could it? Perhaps I would have to copy content out of the multi-graph, triple-by-triple like this:

def get_graph(
    self,
    identifier: Union[URIRef, BNode]
) -> Union[Graph, None]:
    """Returns the graph identified by given identifier"""
    g2 = Graph()
    for s, p, o in self.triples((None, None, None, identifier)):
        g2.add((s, p, o))
    return g2

ghost · 2022-01-06T14:00:19Z

I just want to check we don't break any current multi-graph functionality.

I sometimes want to get a single graph out of a multi-graph and a function like this will currently do it:
...
How would a this need to be implemented after this change, because the whole graph object couldn't be returned, could it? Perhaps I would have to copy content out of the multi-graph, triple-by-triple like this:

No, you're good, it returns the Graph (this, from the extended tests I'm about to commit):

def test_dataset_add_graph():
    data = """
    <http://data.yyx.me/jack> <http://onto.yyx.me/work_for> <http://data.yyx.me/company_a> .
    <http://data.yyx.me/david> <http://onto.yyx.me/work_for> <http://data.yyx.me/company_b> .
    """
    ds = Dataset()
    subgraph_identifier = URIRef("urn:x-rdflib:subgraph1")
    g = ds.graph(subgraph_identifier)  # intended use pattern [1] 
    g.parse(data=data, format="n3")
    assert len(g) == 2
    subgraph = ds.get_context(subgraph_identifier)  # Your use case
    assert type(subgraph) is Graph
    assert len(subgraph) == 2

[1] see Dataset doc

ghost · 2022-01-06T14:13:38Z

Just a note, en passant. Because all RDFLib Graphs are named graphs, the Dataset implementation is slightly adrift from the RDF1.1 spec ...
“An RDF dataset is a collection of RDF graphs, and comprises:

Exactly one default graph, being an RDF graph. The default graph does not have a name and MAY be empty.
Zero or more named graphs. Each named graph is a pair consisting of an IRI or a blank node (the graph name), and an RDF graph. Graph names are unique within an RDF dataset.”

nicholascar · 2022-01-06T14:28:40Z

the Dataset implementation is slightly adrift from the RDF1.1 spec

Yeah, after this PR, my and Ashley's plan is to dedup ConjunctiveGraph & Dataset with a more spec implementation for an rdflib 7.0.0. I can't think of any benefits for having both classes that are similar but slightly different. It's confusing. I suppose someone added Dataset long after ConjunctiveGraph and had some specific reasons but I've not found work reasons to be able to differentiate them.

ghost · 2022-01-06T14:53:56Z

the Dataset implementation is slightly adrift from the RDF1.1 spec

Yeah, after this PR, my and Ashley's plan is to dedup ConjunctiveGraph & Dataset with a more spec implementation for an rdflib 7.0.0. I can't think of any benefits for having both classes that are similar but slightly different. It's confusing. I suppose someone added Dataset long after ConjunctiveGraph and had some specific reasons but I've not found work reasons to be able to differentiate them.

Gory details of the discussion are collated in this gist, enjoy 😁

ghost · 2022-01-06T14:58:39Z

Documentation note. Oracle how-to: “RDF Datasets, Named and Default Graphs Concepts and How to Query Them.
Emma Thomas. May 5, 2020” - purely SPARQL-oriented, a Pythonic API will need to have a different perspective.

A bit brutish maybe but effective in that the issues are fixed without causing any additional tests to fail.

nicholascar · 2022-01-07T23:38:32Z

Perhaps this PR should be moved to Draft if there are still a series of expected commits @gjhiggins? Just so that I and others don't approve too early or something!

ghost · 2022-01-07T23:43:58Z

Perhaps this PR should be moved to Draft if there are still a series of expected commits @gjhiggins? Just so that I and others don't approve too early or something!

I meant to set it to Draft status but forgot and there doesn't seem to be any way to change it.

The only remaining task (apart from documentation) is to implement the missing Dataset operators/in-place operators and corresponding tests.

Fixes quite a few issues 😄 https://github.com/gjhiggins/rdflib/tree/identifier-as-context/test/test_identifier_as_context/test_issues_fixed

nicholascar · 2022-01-07T23:52:28Z

OK, I can live!

I'm watching this closely so won't be a problem really. Ready to test myself and update documentation and examples like examples/datasets.py on completion.

ghost · 2022-01-08T01:14:54Z

OK, I can live!

I'm watching this closely so won't be a problem really. Ready to test myself and update documentation and examples like examples/datasets.py on completion.

Most of the work is now complete so examples/datasets.py executes without issue. You might be interested in the contents of test_issue436 which (along with the parsed input trig) is the proof-of-concept response to your comment.

mwatts15

Mostly looked at store.py and graph.py.

(Glad to see this finally being ironed out :))

mwatts15 · 2022-01-08T00:59:33Z

rdflib/graph.py

                yield s, p, o, None
-            else:
+            elif len(quad) == 3:
+                yield s, p, o, c.identifier


I think quads should return a Graph for the context in order to be consistent with other implementations.

Sorry, I don't understand your point. All three implementations of quads are iterators.

I'm saying that ConjunctiveGraph.quads returns a Graph for the fourth member of each quad, but Dataset.quads returns an Identifier. Admittedly, that isn't really the main point of this PR, so that may be a thing to consider later.

Ah, you did explicitly write "return a Graph for the context", sorry for being obtuse. You're correct that the current Dataset quads implementation returns an identifier but that behaviour is documented which is sort of the de facto definition of the public API.

I guess if this is to be changed, it needs to have a period of deprecation.

mypy has just reminded me that if quads are going to have a Graph as a context, then the type signatures of add and remove need to reflect that - hence my original typing of Optional[Union["Graph", "Identifier"]]

mwatts15 · 2022-01-08T01:02:22Z

rdflib/plugins/stores/auditable.py

-                else None
-            )
-            ctxId = context.identifier if context is not None else None
+            ctxId = context if context is not None else None


This is equivalent to ctxId = context

Happy to make this change to the original PR if it's generally agreed, otherwise - seeing as it was a rebase of a rebase, I left the original untouched.

mwatts15 · 2022-01-08T01:27:19Z

rdflib/graph.py

@@ -968,6 +967,12 @@ def qname(self, uri):
    def compute_qname(self, uri, generate=True):
        return self.namespace_manager.compute_qname(uri, generate)

+    def qname_strict(self, uri):


Why add more wrappers to namespace_manager? I don't see any additional usages in this PR.

In my experience, Graph is a poor place access a namespace manager from for anything except serializers: Any given namespace will span multiple graphs and stores, so I've found it's better to manage the namespace manager mostly separately vs trying to add and remove from this graph or that to get the prefixes I want..

Surely the move here is just to reach consistency of API? If you, @mwatts15, want a change in usage patterns, perhaps after this PR you would have to address the existing qname() & ``compute_qname()as well as qname_strict`?

So are you saying you use a stand-alone Namespace Manager object do do all your namespaceing, perhaps like:

from rdflib.namespace import NamespaceManager nm = NamespaceManager() g = Graph() g.namespace_manager(nm) other_gr = Graph() other_g.namespace_manager(nm) # etc.

Why add more wrappers to namespace_manager? I don't see any additional usages in this PR.

The other part of this commit details the issue resolved, when serialized as xml an Exception is raised by the XML serialiser: # 'Dataset' object has no attribute 'compute_qname_strict'. I'm not minded to prohibit serialisation of a Dataset as triples in XML (albeit, only the triples in the default graph get output) and, given that nicholascar and ashleysommer are considering a ConjunctiveGraph/Dataset dedupe, I thought this approach was the least impactful.

@gjhiggins, from where is 'Dataset' object has no attribute 'compute_qname_strict' reported? I would think the solution is just to pass dataset.namespace_manager instead of dataset in such a case. (I vaguely recall seeing a Graph passed off as a NamespaceManager in some RDFLib code, but I think that's really just a misuse of the API)

I don't mean to prohibit serialization of anything?? I'm just saying there's already a public API through the Graph's namespace_manager attribute: why duplicate it in Graph?

Surely the move here is just to reach consistency of API? If you, @mwatts15, want a change in usage patterns, perhaps after this PR you would have to address the existing qname() & compute_qname()as well as qname_strict?

My argument is for less redundancy in the API, so I'd say these methods should be deprecated. OTOH, I can see the argument for leaving them as part of the "legacy" API. Either way, I wouldn't encourage and deepen reliance on this redundancy by adding more methods that just pass through to namespace_manager.

As far as I can tell, the "Graph as namespace manager" thing traces back to a decision @eikeon made back in 2005 (6fc8ce9), probably to maintain backwards compatibility with earlier versions of Graph. I'd say remaining compatible with something before version 2.3.1 hasn't been possible for a while.

@gjhiggins, from where is 'Dataset' object has no attribute 'compute_qname_strict' reported? I would think the solution is just to pass dataset.namespace_manager instead of dataset in such a case.

In this test of serializations of the Dataset default graph.

dataset_default_graph = ds.get_context(DATASET_DEFAULT_GRAPH_ID) dataset_default_graph.serialize(format="xml") # <-- results in the Exception

Apologies for mistakenly stating earlier that it was a Dataset issue, not a Dataset default graph issue.

mwatts15 · 2022-01-08T01:42:23Z

rdflib/graph.py

@@ -1608,13 +1613,12 @@ def _spoc(
        either triples or quads
        """
        if triple_or_quad is None:
-            return (None, None, None, self.default_context if default else None)
+            return (None, None, None, self.identifier if default else None)


This is wrong, isn't it? You want _spoc to return a Graph for the 4th member.

This is wrong, isn't it? You want _spoc to return a Graph for the 4th member.

This specific change is inherited verbatim from the rebased PR see LoC#1313. Changing it back temporarily doesn't cause any test failures which suggests that if if is wrong, we need a test for it. I added a warning and found the only code that triggers the warning is test_conjunctive_graph.py::test_quad_contexts

The fourth member of the return value of _spoc is a Graph per the annotation, and it's treated as a Graph within this module except in a few cases where I've noted we check whether it is but could more intentionally compare against None (and maybe throw a TypeError for the case that we don't receive a Graph since that is expected)

Thanks for the explanation. The collective model of quads seems to have changed somewhat from the original, as documented in the “Working with multi-graphs” section of the "Introduction to parsing", where the type of the fourth member is explicitly described as an identifier, not a Graph object,

“To read and query multi-graphs, that is RDF data that is context-aware, you need to use rdflib's :class:rdflib.ConjunctiveGraph or :class:rdflib.Dataset class. These are extensions to :class:rdflib.Graph that know all about quads (triples + graph IDs).” (my emphasis)

The example is equally explicit:

You could parse the file and query it like this:

from rdflib import Dataset from rdflib.namespace import RDF g = Dataset() g.parse("demo.trig") for s, p, o, g in g.quads((None, RDF.type, None, None)): print(s, g)

This will print out:

http://example.com/person/drewp http://example.com/person/graph-1 http://example.com/person/nick http://example.com/person/graph-2

mwatts15 · 2022-01-08T01:44:56Z

rdflib/graph.py

@@ -1686,14 +1690,14 @@ def triples(self, triple_or_quad, context=None):
        """

        s, p, o, c = self._spoc(triple_or_quad)
-        context = self._graph(context or c)
+        context = context or c


This should explicitly check for context is None since an empty Graph is treated as a False value since __len__ is defined, but __bool__ isn't defined.

Yes, an explicit check would be required, except that issue appears to be addressed in the code that immediately follows the changed LoC:

if self.default_union: if context == self.identifier: context = None else: if context is None: context = self.identifier

mwatts15 · 2022-01-08T02:03:34Z

rdflib/graph.py

+        if len(triple_or_quad) == 4:
+            super(Dataset, self).add(triple_or_quad)
+        else:
+            super(Dataset, self).add(triple_or_quad + (DATASET_DEFAULT_GRAPH_ID,))


shouldn't this rather pass in the self.default_context?

mwatts15 · 2022-01-08T02:03:53Z

rdflib/graph.py


        if not self.store.graph_aware:
            raise Exception("DataSet must be backed by a graph-aware store!")
-        self.default_context = Graph(


why delete this??

It's in the original PR from gromgull - LoC #1605

mwatts15 · 2022-01-08T02:46:22Z

rdflib/graph.py

+        it returns the root context.
+        """
+
+        quadformats = ["trig", "trix", "nquads", "hext", "json-ld"]


I think this is meant to solve the problem of dataset serialization formats that send the default graph to the graph matching the "publicID" (whatever a "public ID" is), but keeping a list of formats RDFLib happens to know have a concept of "quads" or "graphs" seems a very poor way to handle that. More importantly, I think handling that problem can be done separately from just changing the store API.

Yeah, what does publicID actually mean? Is it just alternate (archaic?) terminology for the graph identifier, i which case we should probably just purge it, or does it have some other meaning?

Yeah, what does publicID actually mean? Is it just alternate (archaic?) terminology for the graph identifier, i which case we should probably just purge it, or does it have some other meaning?

From the docstring: “The publicID argument is for specifying the logical URI for the case that it's different from the physical source URI”.

It's necessary for correct serialization, e.g. reading the file examples/foaf.rdf and parsing it as data without a publicID results in URLs being rewritten as the file source, e.g.

rdflib.term.URIRef('file:///home/gjh/dev/rdflib/images/timbl-image-by-Coz-cropped.jpg')

whereas using the publicID of http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf results in the correct serialisation:

rdflib.term.URIRef('http://dig.csail.mit.edu/2008/webdav/timbl/images/timbl-image-by-Coz-cropped.jpg')

Just using the CSAIL URL as identifier doesn't result in correct serialization so it does appear that publicID has a valid and continued role.

OK, ok, I think I knew that or should have worked that out! So publicID stays as is.

mwatts15 · 2022-01-08T02:48:09Z

rdflib/graph.py

@@ -2169,7 +2257,7 @@ class ReadOnlyGraphAggregate(ConjunctiveGraph):
    """Utility class for treating a set of graphs as a single graph

    Only read operations are supported (hence the name). Essentially a
-    ConjunctiveGraph over an explicit subset of the entire store.
+    ConjunctiveGraph over an explicit subset of an entire single store.


isn't "single" redundant here?

I think so...

mwatts15 · 2022-01-08T02:49:30Z

rdflib/store.py

+    def remove(
+        self,
+        triple: Tuple["Node", "Node", "Node"],
+        context: Optional[Union["Graph", "Identifier"]],


Why is Graph in this annotation? Isn't the point of this PR to make it always an Identifier?

Looks like a good point...

ghost · 2022-01-10T18:19:09Z

Yeah, after this PR, my and Ashley's plan is to dedup ConjunctiveGraph & Dataset with a more spec implementation for an rdflib 7.0.0. I can't think of any benefits for having both classes that are similar but slightly different. It's confusing. I suppose someone added Dataset long after ConjunctiveGraph and had some specific reasons but I've not found work reasons to be able to differentiate them.

From: FWIW Dept.

Digging back into the commit history reveals that ConjunctiveGraph was introduced back in Oct 2005 in 750e8b8. Whilst this is contemporaneous with discussions of reification on the www-rdf-interest mailing list (see below), this significantly predates the official WC3 description of Dataset - which wasn't in the RDF1.1 draft of Aug 2011, only appearing in rough form in June 2012.

ConjunctiveGraph was eikeon's implementation of context-as-reification, a concept later refined and formalized by W3C as RDF Datasets. Hence the apparent duplication arises from the common requirements of representing reified triples. Ivan Herman, the author of the RDFLib Dataset implementation was an active member of W3C at the time (and who, tbf, isn't exactly complimentary about his implementation)

The first actual release of RDFLib - 2.0.6 was a couple of months prior to the publication of the Bizer, Carroll, Hayes and Stickler paper "Named Graphs, Provenance and Trust" at WWW May 2005, Chiba, Japan.

I suspect the fact that all RDFLib Graphs are named graphs is an early implementation of the principles defined therein. In Dec 2005, Chimezie committed a change to the Graph docstring referencing a 2004 paper on Named Graphs and TriX. The notion had been discussed for a couple of years previously, e.g. in this Feb 2004 paper after its airing in Dec 2003 by Chris Bizer on the www-rdf-interest mailing list - not that it was the first time reification had been discussed, as Jeremy Carroll observes:

... the interpretation of a reified statement is clearly
defined by RDF Semantics - but this differs from the applications that
reification is often used for. That was the best the WG could do and it is a
known limitation with the current round of RDF. That's why things like
contexts ended up in the postponed pile.

Given that Datasets are at least described in the W3C specs (albeit they declined to present a formal specification of the semantics) then following a clean, robust implementation of Datasets, the use of ConjunctiveGraphs could be deprecated but I imagine that would have profound implications for legacy apps/code.

ghost · 2022-01-10T18:54:29Z

I meant to set it to Draft status but forgot and there doesn't seem to be any way to change it.

Ah, found the "convert PR to draft" link.

nicholascar · 2022-01-18T11:52:29Z

...following a clean, robust implementation of Datasets, the use of ConjunctiveGraphs could be deprecated but I imagine that would have profound implications for legacy apps/code.

That's what a 7.0.0 would allow for!

The ConjunctiveGraph is dead, all hail the Dataset!

ghost · 2022-02-05T17:29:45Z

...following a clean, robust implementation of Datasets, the use of ConjunctiveGraphs could be deprecated but I imagine that would have profound implications for legacy apps/code.

That's what a 7.0.0 would allow for!

The ConjunctiveGraph is dead, all hail the Dataset!

Update. I went the whole hog and implemented the de-dupe. Dataset now inherits from Graph and the implementation is much cleaner. There are no longer any calls to ConjunctiveGraph in the RDFLib code, its presence in graph.py is now merely for legacy support. It's not quite ready for a PR just yet, even as a Draft, but soon enough (I hope) to be in 6.2 and enable a generous deprecation period for ConjunctiveGraph.

nicholascar · 2022-02-05T21:32:22Z

Great tidying @gjhiggins! Standing by to review/help/release.

ghost · 2022-03-22T11:31:55Z

This work unavoidably breaks the API (in particular, Dataset.get_context() which now returns a URIRef, not a Graph) and so can only be included as part of a version 0.7 release, so I have closed this PR and removed the branch from my fork.

nicholascar · 2022-03-22T12:33:22Z

@gjhiggins are you saying you are expecting to re-propose this PR for a 7.0.0 release?

ghost · 2022-03-22T14:26:05Z

Yes. Sorry if I wasn't clear. I've been busy working up a more complete implementation which includes changes to the stores, parsers, serializers and the SPARQL implementation in order to accommodate the changes in the public API.

All bar a couple of the extant suite of tests pass and apart from a small number of (some admittedly non-trivial) issues (e.g. Dataset operators/isomorphism [1] and the round-tripping of QuotedGraphs [2]) it's looking pretty good overall.

Now that a large chunk of the test files reorg has been done, I'm in a position to create a branch to carry the work which I am in the process of effecting right now and expect to commit later today.

[1] “RESOLVED: close issue-17 -- there is no general purpose way to merge datasets; it can only be done with external knowledge.”

[2] In the context of test_graph_formula.py

def test_n3_roundtrip():
    ds = Dataset()
    ds.parse(data=testN3, format="n3")
    assert len(list(ds.contexts())) == 2
    assert sorted(list(ds.contexts())) == [
        rdflib.term.BNode('_:Formula2'),
        rdflib.term.BNode('_:Formula3'),
    ]
    assert sorted([c.n3() for c in list(ds.contexts())]) == ['_:_:Formula2', '_:_:Formula3']

Graham Higgins added 5 commits January 6, 2022 01:24

Move Store API to work with identifiers, not graphs

3b6be94

This is a rebase of PR #958, PR #409 rebased against master to fix #167

fix doctest failures, add annotation

b07b815

add required type

14069b6

add nquads to recognised file extensions

2914a10

modernise tests

200cb59

update to pytest, add failure annotations, change `urn:` to `urn:x-rdflib:`

aucampia reviewed Jan 6, 2022

View reviewed changes

test/test_graph_generators.py Outdated Show resolved Hide resolved

change Dataset initialisation to have identifier as DATASET_DEFAULT_G…

ea1c2eb

…RAPH_ID instead of None. Revert test failure annotations in `test_dataset_generators` and `test_named_graphs`

Graham Higgins and others added 10 commits January 7, 2022 00:03

Merge remote-tracking branch 'origin/master' into identifier-as-context

5f5435b

change urn:x-rdflib: to urn:example in tests

349e1a7

add folders of tests for issues

43fdcea

extend Dataset.parse() to handle default graph.

a9644eb

A bit brutish maybe but effective in that the issues are fixed without causing any additional tests to fail.

Merge branch 'RDFLib:master' into identifier-as-context

6d7bba3

reset sparql endpoint port to 3030

a537e0f

add check for HOST availability

9760823

Fix xml serialisation test failure

1ed5a13

add typing, following pattern for add

5daf25b

Change Dataset.__len__() to return length of default graph

ac5814e

mwatts15 suggested changes Jan 8, 2022

View reviewed changes

Graham Higgins added 5 commits January 9, 2022 16:12

extend test, remove example log output

7608413

disable failing assertions of trix serialization pro tem

cbf1967

annotate and elide SPARQLUpdateStore failure to clear default graph

b9ef605

explicitly raise NotImplementedError for unimplemented operators.

be3dc4c

delete w-i-p tests from PR

c1c2edf

ghost marked this pull request as draft January 10, 2022 18:52

Graham Higgins and others added 6 commits January 10, 2022 22:21

neaten dataset tests

e17d7e0

Fix type issue

a83705c

Merge branch 'RDFLib:master' into identifier-as-context

8006d84

Merge branch 'RDFLib:master' into identifier-as-context

958c385

Merge branch 'RDFLib:master' into identifier-as-context

c7558b2

Merge branch 'RDFLib:master' into identifier-as-context

bc1f47e

gjhiggins added 3 commits January 21, 2022 13:08

Merge branch 'RDFLib:master' into identifier-as-context

1515681

Merge branch 'RDFLib:master' into identifier-as-context

0ce60ee

Merge branch 'RDFLib:master' into identifier-as-context

835092a

gjhiggins added 4 commits February 6, 2022 16:32

Merge branch 'RDFLib:master' into identifier-as-context

10098b5

Merge branch 'RDFLib:master' into identifier-as-context

7883cf3

Merge branch 'RDFLib:master' into identifier-as-context

77ac1ba

Merge branch 'RDFLib:master' into identifier-as-context

2a9c925

ghost closed this Mar 22, 2022

ghost deleted the identifier-as-context branch March 22, 2022 10:58

ghost mentioned this pull request Jul 27, 2022

Issues with rdflib.Dataset #2021

Open

This pull request was closed.

Move Store API to work with identifiers, not graphs #1646

Move Store API to work with identifiers, not graphs #1646

Uh oh!

Conversation

ghost commented Jan 6, 2022

Proposed Changes

Uh oh!

ghost commented Jan 6, 2022

Uh oh!

Uh oh!

ghost commented Jan 6, 2022 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nicholascar commented Jan 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ghost commented Jan 6, 2022 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ghost commented Jan 6, 2022

Uh oh!

nicholascar commented Jan 6, 2022

Uh oh!

ghost commented Jan 6, 2022

Uh oh!

ghost commented Jan 6, 2022 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nicholascar commented Jan 7, 2022

Uh oh!

ghost commented Jan 7, 2022 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nicholascar commented Jan 7, 2022

Uh oh!

ghost commented Jan 8, 2022

Uh oh!

mwatts15 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ghost Jan 9, 2022 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ghost commented Jan 6, 2022 •

edited by ghost

Loading

nicholascar commented Jan 6, 2022 •

edited

Loading

ghost commented Jan 6, 2022 •

edited by ghost

Loading

ghost commented Jan 6, 2022 •

edited by ghost

Loading

ghost commented Jan 7, 2022 •

edited by ghost

Loading

ghost Jan 9, 2022 •

edited by ghost

Loading