Skip to content

Apply IdentifiedNode to Graph iterators #1697

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

ajnelson-nist
Copy link
Contributor

Fixes #1696

Proposed Changes

Note the Graph signature updates should also be propagated to Graph subclasses before this is merged.

This patch is known to be incomplete with respect to other needed
revisions to type signatures, both in Graph subclasses and in graph
consumers elsewhere under `/rdflib`.  However, running this change
through a type checker raised some questions on `Node` specificity and
circular type dependencies, and so is posted for discussion.

References:
* RDFLib#1696

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
This resolves type signature issues raised in the prior patch, but might
require a follow-on patch depending on `Path` discussion.

References:
* RDFLib#1696

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist ajnelson-nist marked this pull request as draft January 25, 2022 18:54
This resolves type signature issues raised in the prior patch, but might
require a follow-on patch depending on `Path` discussion.

References:
* RDFLib#1696

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
This patch makes no semantic changes, instead applying formatting from
`black`.

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist
Copy link
Contributor Author

This is ready for look-overs. Let's keep discussion to #1696.

Thanks to Iwan for noting this variable's behavior.

References:
* RDFLib#1696

Reported-by: Iwan Aucamp <aucampia@gmail.com>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
References:
* RDFLib#1696

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
`mypy` was confused by some of the variable name re-use and/or
re-definition.

This patch also highlights that `Path`s can be yielded in the predicate
position, which has further downstream consequences in `graph.py` and in
the `Store` class.

References:
* RDFLib#1696

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@aucampia
Copy link
Member

I will have a look later this week, but this is quite critical to do, so thanks. I will also see if I can make CI pipeline pass.

Comment on lines +456 to +460
self,
triple: Tuple[
Optional[IdentifiedNode], Union[None, Path, IdentifiedNode], Optional[Node]
],
) -> Iterable[Tuple[IdentifiedNode, Union[IdentifiedNode, Path], Node]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is more correct than it was, but technically this should be:

Suggested change
self,
triple: Tuple[
Optional[IdentifiedNode], Union[None, Path, IdentifiedNode], Optional[Node]
],
) -> Iterable[Tuple[IdentifiedNode, Union[IdentifiedNode, Path], Node]]:
self,
triple: Tuple[
Optional[IdentifiedNode], Union[None, Path, URIRef], Optional[Node]
],
) -> Iterable[Tuple[IdentifiedNode, Union[URIRef, Path], Node]]:

Fine with it as is though, we can narrow it more later, this narrow already makes things better.

Same applies other in places also, won't label them all, just a note.

Type checking passes now.

I also included some changes which reduces
the footprint of this change so that it becomes almost entirely just
change of type annotations.

This may be a bit on the pendatic side, and I would be fine if you keep
the changes as they were, but I would rather keep refactoring seperate from
changes to add typing.
@aucampia
Copy link
Member

aucampia commented Feb 22, 2022

Change looks very good, thanks, and it is quite essential, I made some minor fixes in ajnelson-nist#14 to get the type checking to pass, and I also roll back some changes which is not entirely essential for this PR's aim, but it's up to you what to do with those.

@aucampia
Copy link
Member

For some reason mypy does not complain about unused ignores on my computer, will have to figure out why, but I removed the ignores that are not needed now and type validation seems to pass.

@ajnelson-nist
Copy link
Contributor Author

Thanks! I just hit that too, and I'm dealing with a weirdness in rdflib/plugins/serializers/rdfxml.py, line 190:

rdflib/plugins/serializers/rdfxml.py:190: error: Incompatible types in assignment (expression has type "Set[IdentifiedNode]", variable has type "Set[Node]")
rdflib/plugins/serializers/rdfxml.py:190: error: Argument 1 to "union" of "set" has incompatible type "Iterable[Node]"; expected "Iterable[IdentifiedNode]"

Partially reverting your PR to split that up again is...almost helping. I'm close to pushing a patch once I figure out the spirit of the possibles variable.

@aucampia
Copy link
Member

This validates on github actions now, I think maybe I have an older version of mypy than is being used on github actions

@ajnelson-nist
Copy link
Contributor Author

Hm, seems I had an old-mypy issue too. I was on 0.920, upgraded to 0.931, and now that curious issue with possibles has gone away. (Though, squinting at that line, I think it'll be worth an independent PR.)

I'm going to take this PR out of Draft mode now. It seems ready for any last review and merge to me.

@ajnelson-nist ajnelson-nist marked this pull request as ready for review February 23, 2022 00:26
@aucampia aucampia added the review wanted This indicates that the PR is ready for review label Feb 23, 2022
@ajnelson-nist
Copy link
Contributor Author

...I spoke a little soon, I missed black wanted to reformat a bit. NOW I think it's ready.

@aucampia
Copy link
Member

pre-commit.ci autofix

Copy link
Member

@aucampia aucampia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks again @ajnelson-nist

This only changes type annotations with one small exception, which is a change in variable names in well tested code path, this change in variable names is needed for type checking to pass without error/ignores as mypy does not like that the same variable has different types in the same scope:

rdflib/rdflib/graph.py

Lines 495 to 497 in 06feef0

else:
for (_s, _p, _o), cg in self.__store.triples((s, p, o), context=self):
yield _s, _p, _o

This PR also narrows some type annotations, but this narrowing brings things more in line with RDF 1.1 spec so I think it is good, if someone wants to use rdflib with non compliant types and type checking they will have to use # type: ignore or cast - but on the other hand, if we try and cater for non compliant RDF it also makes our lives harder.

The narrower types should however help users of this library detect errors in their code where they are potentially creating invalid RDF graphs.

I will add a page in docs regarding type annotations which covers some of the questions in #1696 and #1447

@aucampia aucampia merged commit 80a4633 into RDFLib:master Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
review wanted This indicates that the PR is ready for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IdentifiedNode should be used in Graph.triples() and related iterators
2 participants