JSON-LD schema.org parsing fails with JSONDecodeError("Expecting value", s, err.value) from None #1781

@danbri

I have been trying different variations to parse a JSON-LD file into a Graph, but they're all failing.

The file seems OK (I tried several) and it parses ok with the JSON-LD playground. I tried a few variations for invoking the parser.

This was after entirely nuking and reinstalling Python/Anaconda, in a fresh Conda environment (python=3.8) with only "pip3 install rdflib", i.e. with no ageing copy of the old plugin version of the parser hanging around.

parsejsonld_A.py

#!/usr/bin/env python3
from rdflib import Graph
if __name__ == '__main__':
    fn = "example1.jsonld"
    g = Graph()
    g.parse(fn, format="json-ld")

parsejsonld_B.py

#!/usr/bin/env python3

from rdflib import Graph
g = Graph().parse("example1.jsonld", format="json-ld")
g.serialize("test-jsonld.nt", format="nt")

parsejsonld_C.py

#!/usr/bin/env python3
from rdflib import Graph
g = Graph()
g.parse(location = "file:feedkgx/example1.jsonld")
print(len(g))

The example file is just taken from Google documentation, see this Gist.

In each case I get this response:

./parsejsonld_A.py

Traceback (most recent call last):
  File "./parsejsonld_A.py", line 8, in <module>
    g.parse(fn, format="json-ld")
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/graph.py", line 1258, in parse
    parser.parse(source, self, **args)  # type: ignore[call-arg]
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/parsers/jsonld.py", line 125, in parse
    to_rdf(data, conj_sink, base, context_data, version, generalized_rdf)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/parsers/jsonld.py", line 144, in to_rdf
    return parser.parse(data, context, dataset)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/parsers/jsonld.py", line 164, in parse
    context.load(local_context, context.base)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/context.py", line 357, in load
    self._prep_sources(base, source, sources, referenced_contexts)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/context.py", line 381, in _prep_sources
    new_ctx = self._fetch_context(
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/context.py", line 413, in _fetch_context
    source = source_to_json(source_url)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/util.py", line 43, in source_to_json
    return json.load(use_stream)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

As far as I can tell, this has something to do with the Schema.org @context URL and our migration from http://schema.org/ plus content negotiation to https://schema.org/ with a JSON-LD 1.1-style HTTP Link header as the discovery mechanism for the context. The error message, however, is pretty uninformative.
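To illustrate what a more informative message could look like, here is a minimal sketch that decodes a fetched context body and names the URL on failure. The names `ContextLoadError` and `parse_context_body` are hypothetical and not part of rdflib. The reported position (line 2 column 1, char 1) would be consistent with a body that starts with a newline followed by something that is not JSON, such as an HTML page, which is the case the example input mimics.

```python
import json


class ContextLoadError(ValueError):
    """Hypothetical error type for this sketch; not part of rdflib."""


def parse_context_body(body, url):
    """Decode a fetched @context document, naming the URL if decoding fails."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as err:
        # Show where the body came from and how it starts, so that an
        # HTML page returned instead of JSON is easy to spot.
        preview = body.strip()[:60]
        raise ContextLoadError(
            f"Could not decode JSON-LD context from {url}: {err.msg} "
            f"at line {err.lineno} column {err.colno}; "
            f"body starts with {preview!r}"
        ) from err


if __name__ == "__main__":
    try:
        parse_context_body("\n<!DOCTYPE html><html>", "https://schema.org/")
    except ContextLoadError as exc:
        print(exc)
```

Chaining with `from err` keeps the original `JSONDecodeError` available for anyone who needs the full decoder state.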

If I change the schema.org context in the files to avoid a remote context, it parses.
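As a rough sketch of that workaround, the remote context can be swapped for an inline one before the data reaches the parser. The helper name `inline_schema_org_context` and the set of URL spellings it matches are assumptions for illustration. An inline `@vocab` is also not equivalent to the full schema.org context, which declares additional per-term settings, so the resulting triples may differ.

```python
import json

# URL spellings treated as "the schema.org context" (an assumption of this sketch).
SCHEMA_ORG_CONTEXTS = {
    "http://schema.org",
    "http://schema.org/",
    "https://schema.org",
    "https://schema.org/",
}


def inline_schema_org_context(doc, vocab="https://schema.org/"):
    """Return a copy of a top-level JSON-LD object whose remote schema.org
    @context is replaced by an inline {"@vocab": vocab} context."""
    if not isinstance(doc, dict):
        return doc
    patched = dict(doc)
    context = patched.get("@context")
    # Only plain string contexts are rewritten; objects and lists are left alone.
    if isinstance(context, str) and context in SCHEMA_ORG_CONTEXTS:
        patched["@context"] = {"@vocab": vocab}
    return patched


if __name__ == "__main__":
    doc = {"@context": "https://schema.org", "@type": "Person", "name": "Jane"}
    print(json.dumps(inline_schema_org_context(doc), indent=2))
```

The patched document can then be handed to rdflib as a string, for example `Graph().parse(data=json.dumps(patched), format="json-ld")`, which avoids any network fetch for the context.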

The context lives here:

curl -s --head https://schema.org/ | grep 'link:'
link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"

Would a PR be welcomed on this?

e.g.

  • clearer error message
  • treat context as if it were written like this:
    "@context": { "@vocab": "https://schema.org/" },
  • or actually fetch the context doc via the link: header mechanism, as JSON-LD playground seems to do.
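For the last option, the discovery step could look roughly like the sketch below: given the value of a Link header and the URL that was requested, find the target with rel="alternate" and type="application/ld+json" and resolve it against the request URL. The function name `find_jsonld_alternate` is hypothetical, the header parsing is deliberately simplified (it does not handle commas or semicolons inside quoted parameter values), and no HTTP request is made here.

```python
from urllib.parse import urljoin
import re


def find_jsonld_alternate(link_header, base_url):
    """Return the absolute URL of the JSON-LD alternate named in a Link
    header value, or None if there is no such link."""
    # Each link looks like: <target>; param="value"; param="value"
    for match in re.finditer(r"<([^>]*)>([^<]*)", link_header):
        target, raw_params = match.groups()
        params = {}
        for part in raw_params.split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                # Drop surrounding whitespace, a trailing link separator, and quotes.
                cleaned = value.strip().strip(",").strip().strip('"')
                params[key.strip().lower()] = cleaned
        rels = params.get("rel", "").split()
        if "alternate" in rels and params.get("type") == "application/ld+json":
            return urljoin(base_url, target)
    return None


if __name__ == "__main__":
    header = '</docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"'
    print(find_jsonld_alternate(header, "https://schema.org/"))
```

With the header shown in the curl output above, this resolves to https://schema.org/docs/jsonldcontext.jsonld, which a context loader could then fetch in place of the original URL.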

Related discussion: schemaorg/schemaorg#2578
