
Replace long list of namespaces with list of prefixes used when using serialize. #1679

@wmelder


I am wondering why the latest version of rdflib gives me a long list of namespaces when serializing a Graph to JSON-LD. It didn't use to be like that.

            return self.graph.serialize(
                format=return_format,
                context=dict(self.graph.namespaces()),
                auto_compact=True
            )

The return_format is 'json-ld'. The context in the result is:

  "@context": {
    "brick": "https://brickschema.org/schema/Brick#",
    "csvw": "http://www.w3.org/ns/csvw#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcam": "http://purl.org/dc/dcam/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "dcterms": "http://purl.org/dc/terms/",
    "doap": "http://usefulinc.com/ns/doap#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "gtaa": "http://data.beeldengeluid.nl/gtaa/",
    "non-gtaa": "http://data.beeldengeluid.nl/nongtaa/",
    "odrl": "http://www.w3.org/ns/odrl/2/",
    "org": "http://www.w3.org/ns/org#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "prof": "http://www.w3.org/ns/dx/prof/",
    "prov": "http://www.w3.org/ns/prov#",
    "qb": "http://purl.org/linked-data/cube#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "schema": "https://schema.org/",
    "sdo": "https://schema.org/",
    "sh": "http://www.w3.org/ns/shacl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "sosa": "http://www.w3.org/ns/sosa/",
    "ssn": "http://www.w3.org/ns/ssn/",
    "time": "http://www.w3.org/2006/time#",
    "vann": "http://purl.org/vocab/vann/",
    "void": "http://rdfs.org/ns/void#",
    "xml": "http://www.w3.org/XML/1998/namespace",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },

This is where the graph and namespace bindings (including some custom ones) were created:

        self.graph = Graph()
        self.graph.namespace_manager.bind("skos", SKOS)
        self.graph.namespace_manager.bind("gtaa", Namespace(self._model.GTAA_NAMESPACE))
        self.graph.namespace_manager.bind("non-gtaa", Namespace(self._model.NON_GTAA_NAMESPACE))

In addition, the custom class binds another namespace and adds triples. Here's a fragment:

        self.graph.namespace_manager.bind('sdo', Namespace(self._model.SCHEMA_DOT_ORG_NAMESPACE))
        # create a node for the record
        self.itemNode = URIRef(self.get_uri(concept_type, metadata["id"]))

        # get the RDF class URI for this type
        self.classUri = self._model.CLASS_URIS_FOR_DAAN_LEVELS[concept_type]

        # add the type
        self.graph.add((self.itemNode, RDF.type, URIRef(self.classUri)))

The custom class reads JSON from a backend system, interprets it and generates RDF for the item. You could see it as a wrapper pattern.

As I wrote in the comments on this issue, I had to write a custom function to remove unused prefixes from the context, but that code is not very dynamic, because the list of used prefixes is hard-coded:

    def remove_unused_prefixes(self):
        """ Clean up the long list of namespaces.
        """
        context = dict(self.graph.namespaces())
        used_prefixes = ['gtaa', 'non-gtaa', 'rdf', 'rdfs', 'sdo', 'skos', 'xml', 'xsd']
        return {p: context[p] for p in used_prefixes}

and this is used here:

            context_used = self.remove_unused_prefixes()
            return self.graph.serialize(
                format=return_format,
                context=context_used,
                auto_compact=True
            )
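The hard-coded `used_prefixes` list could in principle be derived from the data instead. Below is a minimal, library-independent sketch in plain Python (it makes no rdflib calls); `prune_context` is a hypothetical helper name, not part of rdflib. It assumes you pass it the bound `(prefix, namespace)` pairs (for example from `self.graph.namespaces()`) and the IRIs that occur in the graph, and it keeps only the prefixes whose namespace covers at least one of those IRIs.

```python
from typing import Dict, Iterable, Set, Tuple


def prune_context(
    bindings: Iterable[Tuple[str, str]], iris: Iterable[str]
) -> Dict[str, str]:
    """Hypothetical helper: keep only the bindings whose namespace is used by some IRI."""
    # Normalise namespaces to plain strings and drop the empty (default) prefix.
    context = {prefix: str(ns) for prefix, ns in bindings if prefix}

    used: Set[str] = set()
    for iri in iris:
        # Assign each IRI to the most specific (longest) namespace it starts with.
        matches = [ns for ns in context.values() if iri.startswith(ns)]
        if matches:
            used.add(max(matches, key=len))

    # Every prefix bound to a used namespace is kept, so aliases survive together.
    return {prefix: ns for prefix, ns in context.items() if ns in used}


# Example with a few of the bindings from the context above; the item IRI is made up.
bindings = [
    ("brick", "https://brickschema.org/schema/Brick#"),
    ("gtaa", "http://data.beeldengeluid.nl/gtaa/"),
    ("sdo", "https://schema.org/"),
    ("skos", "http://www.w3.org/2004/02/skos/core#"),
]
iris = [
    "http://data.beeldengeluid.nl/gtaa/12345",
    "https://schema.org/datePublished",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
]
# Prints the gtaa, sdo and skos bindings; brick is dropped.
print(prune_context(bindings, iris))
```

Gathering `iris` from an rdflib graph would mean collecting the `URIRef` subjects, predicates and objects (and literal datatypes) from its triples; that step is left out here so the sketch runs on its own. The longest-match rule keeps a broad namespace out of the context when a narrower bound namespace already covers the term.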

Now I just discovered that the context argument can be left out, probably thanks to the recent integration of JSON-LD support into rdflib. Well done! When I omit the context argument, auto_compact=True finally gives me the short representation I wanted, for example "sdo:datePublished": "2006-02-19". This is not the case when I pass context=dict(self.graph.namespaces()). But I still get the long list of namespaces, most of which aren't used.

Another discovery: when I also leave out auto_compact=True, I still get JSON-LD, but with no context at all in the result:

            return self.graph.serialize(
                format=return_format
            )

To summarize: I would like the JSON-LD serialization to include a context, but with a minimal list of prefixes/namespaces (only the ones actually used) for this call:

            return self.graph.serialize(
                format='json-ld',
                auto_compact=True
            )

I hope the examples provided help.

Labels: bug (Something isn't working), serialization (Related to serialization)
