Problem: It's possible to use graph database as the main data storage or to efficiently solve workflows with graph databases #7489

@thegostev

Problem hypothesis

  1. Is lossless harvesting from graph databases possible in CKAN?
  2. The SQL database that CKAN currently uses limits where CKAN can be applied. For example, the triplestore approach is widely used in Switzerland, so a CKAN instance backed by a SQL database would need heavy customization (if that is possible at all) to fit in.
  3. The SQL database doesn't fully support the required standards. Which standards are required, and which parts of them are not supported?

Problem discovery
Gathering evidence here. Who mentioned the problem? How do they solve the problem now? Are they ready to commit or provide feedback after delivery?

  • We know that the data sources of Swiss and EU customers are sometimes graph databases. Can we get examples?
  • A similar issue came up in past work with a German client; it concerned DCAT-AP.
  • Harvesting external catalogs in DCAT format: during this process they lose the connections (relations) between the catalog entities.
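
To make the last point concrete, here is a minimal sketch of how flattening a harvested graph into per-dataset records drops relations. Everything in it is an assumption for illustration: the `ex:` URIs, the sample triples, and the one-entry `FIELD_MAP` are made up and are not the mapping used by any real CKAN harvester.

```python
# Hypothetical DCAT-style triples (subject, predicate, object); all URIs are made up.
TRIPLES = [
    ("ex:catalog", "rdf:type", "dcat:Catalog"),
    ("ex:catalog", "dcat:dataset", "ex:ds1"),
    ("ex:catalog", "dcat:dataset", "ex:ds2"),
    ("ex:ds1", "rdf:type", "dcat:Dataset"),
    ("ex:ds1", "dct:title", "Air quality"),
    ("ex:ds1", "dct:relation", "ex:ds2"),
    ("ex:ds2", "rdf:type", "dcat:Dataset"),
    ("ex:ds2", "dct:title", "Weather stations"),
]

# Illustrative mapping only: predicates that have a flat field to land in.
FIELD_MAP = {"dct:title": "title"}


def flatten_dataset(uri, triples):
    """Build a flat, package-like dict for one dataset from its mapped predicates."""
    record = {"name": uri.split(":", 1)[1]}
    for s, p, o in triples:
        if s == uri and p in FIELD_MAP:
            record[FIELD_MAP[p]] = o
    return record


def lost_triples(triples, datasets):
    """Return triples that are not copied into any flattened record."""
    kept = {(s, p, o) for s, p, o in triples if s in datasets and p in FIELD_MAP}
    return [t for t in triples if t not in kept]


datasets = [s for s, p, o in TRIPLES if p == "rdf:type" and o == "dcat:Dataset"]
records = [flatten_dataset(uri, TRIPLES) for uri in datasets]
dropped = lost_triples(TRIPLES, datasets)

for triple in dropped:
    print("not stored:", triple)
```

In this toy example the catalog node, its `dcat:dataset` links, and the dataset-to-dataset `dct:relation` never reach the flat records, which is the kind of loss described above.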

Problem statement
Formulate the problem found during the discovery stage


Solution hypothesis
Formulate the problem found during discovery stage

  1. We can keep 2 copies of the metadata: the original (graph) and a frictionless one.
  2. Can we have a single DB?
  3. What if I could choose the format in which to store the data? Would we have a database for each format? It sounds like that would make it easier to get out of sync.
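
One way to reason about the out-of-sync worry in point 3 is to store a digest of the original graph next to each derived copy and compare them later. The sketch below is only an assumption about how that could look: `graph_digest`, `in_sync`, and the `source_digest` field are hypothetical names, and real RDF with blank nodes would need proper canonicalization first.

```python
import hashlib


def graph_digest(triples):
    """Order-independent SHA-256 digest of a collection of (s, p, o) string triples."""
    # The unit separator keeps ("a", "bc", "d") distinct from ("ab", "c", "d").
    lines = sorted("\x1f".join(t) for t in set(triples))
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def in_sync(derived_record, triples):
    """True if the derived record was built from exactly this set of triples."""
    return derived_record.get("source_digest") == graph_digest(triples)


# Hypothetical usage: the derived (flat) copy remembers which graph it came from.
original = [
    ("ex:ds1", "dct:title", "Air quality"),
    ("ex:ds1", "dct:relation", "ex:ds2"),
]
derived = {"name": "ds1", "title": "Air quality", "source_digest": graph_digest(original)}

changed = original + [("ex:ds1", "dct:publisher", "ex:agency")]
print(in_sync(derived, original))  # graph unchanged since the copy was made
print(in_sync(derived, changed))   # graph has moved on, the copy is stale
```

This only detects drift; it does not answer whether one database or several is the better design.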

Solution discovery
Log here everything you've found during the discovery

Two solutions have been discussed and are being researched:

  • Solution: Lossless harvesting storage with 2 databases (graph and postgres) #7514

  • Solution: Generic solution to store triples #7515

  • The top-level entity in DCAT is the catalog; CKAN has no such structure. A 1:1 mapping from DCAT to CKAN is not possible, and some metadata fields can't be mapped.

  • Currently, 1 package is 1 row in the table. We could store triples through a plugin, so that we can import the original data and keep it as a copy.

  • What if we have a generic solution?

  • There was a project in Taiwan involving a triplestore, but it was highly customized -> ⚠️ We need a solution that would work for more people.
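
The "1 package is 1 row, plus triples kept as a copy" idea above could be sketched roughly as follows. This is not the CKAN schema: the `package` and `package_triple` tables, their columns, and the helper functions are assumptions for illustration, and an in-memory SQLite database stands in for PostgreSQL so the sketch runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
conn.executescript(
    """
    CREATE TABLE package (id TEXT PRIMARY KEY, title TEXT);
    -- Hypothetical side table: the harvested triples, kept verbatim per package.
    CREATE TABLE package_triple (
        package_id TEXT NOT NULL REFERENCES package(id),
        subject    TEXT NOT NULL,
        predicate  TEXT NOT NULL,
        object     TEXT NOT NULL
    );
    """
)


def store_package(conn, package_id, title, triples):
    """Insert the flat package row and the original triples in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO package (id, title) VALUES (?, ?)", (package_id, title)
        )
        conn.executemany(
            "INSERT INTO package_triple VALUES (?, ?, ?, ?)",
            [(package_id, s, p, o) for s, p, o in triples],
        )


def original_triples(conn, package_id):
    """Return the stored triples for a package in insertion order."""
    rows = conn.execute(
        "SELECT subject, predicate, object FROM package_triple "
        "WHERE package_id = ? ORDER BY rowid",
        (package_id,),
    )
    return [tuple(row) for row in rows]


# Made-up source data for one harvested dataset.
SOURCE = [
    ("ex:ds1", "dct:title", "Air quality"),
    ("ex:ds1", "dct:relation", "ex:ds2"),
]
store_package(conn, "ds1", "Air quality", SOURCE)
print(original_triples(conn, "ds1"))
```

The flat row stays usable for the existing code paths, while the side table lets the original graph be re-exported without loss. Whether this lives in a plugin or in core is still the open question here.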

Validation
Why is the solution trustworthy? What makes it strong?

Questions to consider:

Is this change going to break current installations?

Can we provide backwards compatibility?

How easy will it be for current implementations to migrate to this new release?

Do current versions of CKAN have the adequate resources/support to migrate to this new version?

Are we going to change the database schema?

Are we going to change the API?

Are we going to deprecate Interfaces?

Status: Research Done
