Description
Let's build plumbing to load data into WordPress.
I think any data source can be represented as a stream of structured entities.
- WP_WXR_Reader sources them from a WXR file
- A markdown importer could do the same for markdown files
- WordPress -> WordPress could be the same story
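To make "a stream of structured entities" concrete, here is a minimal Python sketch. The `Entity` shape and both source functions are hypothetical names invented for illustration; they are not the WP_WXR_Reader API or any existing importer. The only point is that very different sources can yield the same kind of object.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Entity:
    """Hypothetical common shape for anything an importer can emit."""
    kind: str                      # e.g. "post", "comment", "term"
    id: str                        # identifier within its source
    data: dict = field(default_factory=dict)


def markdown_entities(files: dict) -> Iterator[Entity]:
    """Assumed markdown source: one "post" entity per file.

    `files` maps a path to the file's text; the first line is treated as the title.
    """
    for path, text in files.items():
        first_line, _, rest = text.partition("\n")
        title = first_line.lstrip("# ").strip()
        yield Entity("post", path, {"title": title, "content": rest.strip()})


def row_entities(rows: Iterable[dict]) -> Iterator[Entity]:
    """Assumed database-like source: rows already parsed into dicts."""
    for row in rows:
        data = {k: v for k, v in row.items() if k not in ("kind", "id")}
        yield Entity(row["kind"], str(row["id"]), data)
```

A consumer only ever sees `Entity` objects, so it doesn't need to know whether they came from markdown, a database dump, or a WXR file.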
See this relevant visual from WordPress for Docs:
Importing data
WXR importers must answer these questions:
- What if a post with a given ID does or doesn't exist?
- What if there's a partial difference between the two posts? Do we ignore it? Reconcile? Ask the user? Which post wins?
- What if the author does or doesn't exist in the database?
- Ditto for tags, categories, post meta etc.
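One way to frame these questions is as an explicit conflict policy that the importer is given up front. The sketch below is an assumption about how that could look, not a description of how any existing WXR importer behaves; `OnConflict`, `NeedsDecision`, and `resolve` are made-up names.

```python
from enum import Enum
from typing import Optional


class OnConflict(Enum):
    KEEP_EXISTING = "keep_existing"   # ignore the incoming difference
    OVERWRITE = "overwrite"           # incoming record wins entirely
    MERGE = "merge"                   # incoming fields win, other existing fields are kept
    ASK = "ask"                       # defer the decision to the user


class NeedsDecision(Exception):
    """Raised when the policy is ASK and the two records differ."""


def resolve(existing: Optional[dict], incoming: dict, policy: OnConflict) -> dict:
    """Return the record to store for one object (post, author, term, ...)."""
    if existing is None:              # nothing in the database yet: just insert
        return incoming
    if existing == incoming:          # identical: nothing to reconcile
        return existing
    if policy is OnConflict.KEEP_EXISTING:
        return existing
    if policy is OnConflict.OVERWRITE:
        return incoming
    if policy is OnConflict.MERGE:
        return {**existing, **incoming}
    raise NeedsDecision(f"conflict between {existing!r} and {incoming!r}")
```

The same function applies to posts, authors, tags, categories, and meta, which is what makes a single policy per import plausible.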
Let's view a WXR file as a flat list of entity objects such as posts, comments, meta, etc. We can now represent a lot of scenarios as list concatenation:
- Importing WXR into a WordPress site: WordPress Entities ++ WXR Entities
- Importing two WXR files: WXR Entities ++ WXR Entities
- Pausing and resuming WXR import: Entities before pause ++ Entities after pause
- Importing WordPress -> WordPress: WordPress 1 Entities ++ WordPress 2 Entities
- Syncing WP -> WP: WordPress 1 Entities ++ WordPress 2 Entities ++ WordPress 1 deletions ++ WordPress 2 deletions
From there, we'd need to reduce those lists to contain zero or one entries representing each object.
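As a rough sketch of that reduction, the Python below concatenates any number of entity lists and keeps at most one record per object. It assumes records are plain dicts keyed by `(kind, id)`, that a deletion is a record with `"deleted": True`, and that the last record wins; all three are illustrative assumptions, and a real importer would likely plug in a richer conflict policy.

```python
from itertools import chain


def reduce_entities(*streams):
    """Concatenate entity streams (the ++ above) and keep zero or one record per object."""
    latest = {}
    for record in chain(*streams):
        key = (record["kind"], record["id"])
        if record.get("deleted"):
            latest.pop(key, None)      # a deletion leaves zero entries for this object
        else:
            latest[key] = record       # assumption: the later record wins
    return list(latest.values())


site = [
    {"kind": "post", "id": 1, "title": "Hello"},
    {"kind": "post", "id": 2, "title": "Draft"},
]
wxr = [
    {"kind": "post", "id": 1, "title": "Hello (imported)"},
    {"kind": "comment", "id": 7, "post": 1},
]
deletions = [{"kind": "post", "id": 2, "deleted": True}]

result = reduce_entities(site, wxr, deletions)
```

Here `result` holds the imported version of post 1 and comment 7, while post 2 is gone. Pausing and resuming falls out of the same function: reducing the entities seen before the pause together with those seen after it gives the same answer as reducing them in one pass.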
This is already similar to journaling MEMFS to OPFS in the Playground webapp. It also resembles map/reduce problems where parts of the processing can be parallelized while other parts must be processed sequentially.
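The map/reduce analogy can be sketched as follows: records for different objects are independent and can be folded in parallel, while the records for one object must be applied in order. This is only an illustration under the same assumptions as before (dict records, `(kind, id)` keys, last record wins, `"deleted": True` marks a deletion), and it ignores cross-object dependencies such as a comment needing its post to exist first.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def fold(records):
    """Sequential part: apply one object's records in arrival order."""
    state = None
    for record in records:
        state = None if record.get("deleted") else record
    return state


def reduce_in_parallel(records, workers=4):
    """Parallel part: group by object, then fold each group independently."""
    groups = defaultdict(list)
    for record in records:             # arrival order is preserved within each group
        groups[(record["kind"], record["id"])].append(record)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        folded = list(pool.map(fold, groups.values()))
    return [record for record in folded if record is not None]
```

Threads are used here only to show where the parallel boundary sits; whether parallelism pays off in practice depends on where the real cost is (parsing, network, database writes).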
I bet we can find a unified way of reasoning about all these scenarios and build a single data ingestion pipeline for any data source.
Let's see how far we can get with symbols and reasoning before writing code. I'm sure there are existing white papers and open source projects working through this exact problem.
Resources
- Existing WXR importers
- Importers from other data formats
- Site sync plugins