
Some special characters might be parsed wrongly (?) #1655

@GreenfishK


It seems that some special characters in RDF literals are not preserved after parsing but are instead translated into something faulty. So far, I have found the following ones:

n3_test.nt:

<http:s> <http:o1> "\n" .
<http:s> <http:o2> "\f" .
<http:s> <http:o3> "\b" .
<http:s> <http:o4> "\\r" .
<http:s> <http:o5> "\\\r" .
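
Python script:

from rdflib import Graph, Literal

g = Graph()
# The .nt extension selects the N-Triples parser; format="nt" makes that explicit.
g.parse("n3_test.nt", format="nt")

for s, p, o in g:
    # Every object in the test file is a literal.
    assert isinstance(o, Literal)
    print(f"{s.n3()} {p.n3()} {o.n3()}")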

Sorted Output:

<http:s> <http:o1> """
"""
<http:s> <http:o2> ""
<http:s> <http:o3> "
<http:s> <http:o4> "\\\r"
<http:s> <http:o5> "\\\r"
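
Part of what looks faulty in the output for o1 to o3 may only be how a terminal renders control characters: in N-Triples, "\n", "\f" and "\b" are escapes for line feed, form feed and backspace, and a printed backspace can visually swallow the closing quote. A minimal sketch for inspecting such values unambiguously follows; `describe` is a hypothetical helper, not part of rdflib, and the three values are what the N-Triples grammar says these escapes should decode to, not output captured from the parser.

```python
# Values the first three literals should hold after N-Triples unescaping
# (per the ECHAR rule of the N-Triples grammar; assumed, not read from rdflib).
expected = {"o1": "\n", "o2": "\f", "o3": "\b"}


def describe(value: str) -> str:
    """Render a string unambiguously: its repr plus the code point of each character."""
    points = " ".join(f"U+{ord(ch):04X}" for ch in value)
    return f"{value!r} ({points})"


for name, value in expected.items():
    # repr() shows control characters as escapes instead of sending them to the terminal.
    print(name, describe(value))
```

Applying the same idea to the parsed graph (for example printing `repr(str(o))` instead of `o.n3()`) should show whether o1 to o3 really differ from these values or only look odd on screen.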

For example, "\\r" and "\\\r" end up as the same literal, and I am not sure whether this is the expected behavior. Unfortunately, some DBpedia logs contain such characters, and currently I cannot parse them correctly.
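
For reference, the N-Triples grammar reads "\\r" as a backslash followed by the letter r, and "\\\r" as a backslash followed by a carriage return, so the two should stay distinct. The sketch below compares a single-pass decoder with a chained `str.replace` decoder. Both functions are hypothetical illustrations, and the idea that chained replacement is what causes the collapse seen above is an assumption, not something confirmed against the rdflib source here.

```python
import re

# ECHAR escapes from the N-Triples grammar, mapped to the characters they denote.
ECHAR = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
         '"': '"', "'": "'", "\\": "\\"}

# One pattern for every escape form, so each backslash is consumed exactly once.
ESCAPE = re.compile(r"""\\(?:([tbnrf"'\\])|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))""")


def unescape_single_pass(raw: str) -> str:
    """Decode N-Triples escapes left to right in a single regex pass."""
    def repl(match: re.Match) -> str:
        if match.group(1) is not None:
            return ECHAR[match.group(1)]
        # \uXXXX or \UXXXXXXXX: convert the hex digits to a character.
        return chr(int(match.group(2) or match.group(3), 16))
    return ESCAPE.sub(repl, raw)


def unescape_chained(raw: str) -> str:
    """Decode by successive replacements (hypothetical buggy variant, for contrast)."""
    for esc, char in (("\\t", "\t"), ("\\n", "\n"), ("\\r", "\r"), ("\\b", "\b"),
                      ("\\f", "\f"), ('\\"', '"'), ("\\'", "'"), ("\\\\", "\\")):
        raw = raw.replace(esc, char)
    return raw


# Raw lexical forms exactly as they appear between the quotes in n3_test.nt.
for lexical in (r"\\r", r"\\\r"):
    print(repr(lexical),
          "single pass:", repr(unescape_single_pass(lexical)),
          "chained:", repr(unescape_chained(lexical)))
```

In this sketch the single-pass decoder returns two different strings for the two inputs, while the chained variant maps both to a backslash followed by a carriage return, which matches the identical output shown for o4 and o5.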

Is there a trick or a proper way to get around this?

Labels: bug (Something isn't working)
