Closed
Labels: bug (Something isn't working)
Description
It seems that some special characters in RDF literals are not preserved after parsing; instead, they are translated into something faulty. So far, I have found the following ones:
n3_test.nt:
<http:s> <http:o1> "\n" .
<http:s> <http:o2> "\f" .
<http:s> <http:o3> "\b" .
<http:s> <http:o4> "\\r" .
<http:s> <http:o5> "\\\r" .
from rdflib import Graph
from rdflib import term
g = Graph()
g.parse("n3_test.nt")
for s, p, o in g:
    assert (type(o) == term.Literal)
    print("{s} {p} {o}".format(s=s.n3(), p=p.n3(), o=o.n3()))
Sorted Output:
<http:s> <http:o1> """
"""
<http:s> <http:o2> ""
<http:s> <http:o3> "
<http:s> <http:o4> "\\\r"
<http:s> <http:o5> "\\\r"
We see, for example, that "\\r" and "\\\r" result in the same literal, and I am not sure whether this is the expected behavior. Unfortunately, some DBpedia logs contain such characters, and currently I cannot parse them correctly.
Is there a trick or a proper way to get around this?
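For comparison, here is a minimal sketch of how I would expect these escapes to decode under the N-Triples grammar (the ECHAR and UCHAR rules). The helper `decode_ntriples_string` is a hypothetical name of my own and is not part of rdflib; it only handles the text between the quotes and is not a full parser. Under this reading, "\\r" should decode to a backslash followed by the letter r, while "\\\r" should decode to a backslash followed by a carriage return, so the two literals should differ.

```python
import re

# Single-character escapes (ECHAR) allowed by the N-Triples grammar.
_ECHAR = {
    "t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
    '"': '"', "'": "'", "\\": "\\",
}

# One escape sequence: \uXXXX, \UXXXXXXXX, or a backslash plus one character.
_ESCAPE = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))", re.DOTALL)


def decode_ntriples_string(body: str) -> str:
    """Decode the text between the quotes of an N-Triples literal.

    Hypothetical helper for illustration only. A lone trailing
    backslash is left untouched in this sketch.
    """

    def replace(match: re.Match) -> str:
        hex4, hex8, char = match.groups()
        if hex4 or hex8:
            # Numeric escape: convert the hex code point to a character.
            return chr(int(hex4 or hex8, 16))
        if char not in _ECHAR:
            raise ValueError(f"invalid escape: \\{char}")
        return _ECHAR[char]

    # Escapes are consumed left to right, so "\\" is resolved before
    # the character that follows it is looked at.
    return _ESCAPE.sub(replace, body)


# The five literal bodies from n3_test.nt, written as raw strings so that
# Python does not interpret the backslashes itself.
for body in [r"\n", r"\f", r"\b", r"\\r", r"\\\r"]:
    print(body, "->", repr(decode_ntriples_string(body)))
```

Printing with `repr` keeps control characters such as form feed and backspace visible, which the plain output above does not.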
aucampia