@ato ato commented Feb 25, 2021

This adds support for Pywb's CDXJ format. We follow the Pywb convention of emitting numeric values as JSON strings, but we also accept JSON numbers on input.
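As a rough illustration of the string-versus-number handling described above, here is a minimal Python sketch. It is not the code from this pull request: the helper names `parse_cdxj_line` and `format_cdxj_line`, and the choice of `status`, `length` and `offset` as the numeric fields, are assumptions made for the example. It assumes a CDXJ line has the shape `urlkey timestamp {json}`.

```python
import json

# Assumed set of fields that carry numeric values (hypothetical, for illustration).
NUMERIC_FIELDS = ("status", "length", "offset")


def parse_cdxj_line(line):
    """Split a CDXJ line into (urlkey, timestamp, fields).

    Numeric fields are accepted either as JSON strings ("200") or as
    JSON numbers (200) and are normalised to int.
    """
    urlkey, timestamp, json_part = line.rstrip("\n").split(" ", 2)
    fields = json.loads(json_part)
    for name in NUMERIC_FIELDS:
        if name in fields:
            fields[name] = int(fields[name])
    return urlkey, timestamp, fields


def format_cdxj_line(urlkey, timestamp, fields):
    """Serialise a record, emitting numeric fields as JSON strings."""
    out = dict(fields)
    for name in NUMERIC_FIELDS:
        if name in out:
            out[name] = str(out[name])
    return f"{urlkey} {timestamp} {json.dumps(out)}"
```

With this sketch, a line containing `"status": 200` and one containing `"status": "200"` parse to the same record, and both are written back out with `"status": "200"`.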

Support for arbitrarily named extension fields is not included yet and will be added separately, as it requires a new version of the index storage format. Similarly, our current index version doesn't really support the notion of missing fields, so we map missing fields to "-" or -1 as appropriate for storage. This is a bit hacky but should generally work for now.
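The missing-field workaround can be sketched roughly as follows. This is an illustrative Python sketch, not the actual storage code: the field lists, the helper names `to_storage` and `from_storage`, and the assumption that string fields get "-" while numeric fields get -1 are all hypothetical. Whether sentinels are dropped again on output is also an assumption here.

```python
# Sentinels standing in for "field not present" in the stored record.
MISSING_STRING = "-"
MISSING_NUMBER = -1

# Assumed field lists (hypothetical, for illustration only).
STRING_FIELDS = ("mime", "digest", "filename")
NUMERIC_FIELDS = ("status", "length", "offset")


def to_storage(fields):
    """Build a fixed-shape record, filling absent fields with sentinels."""
    record = {}
    for name in STRING_FIELDS:
        record[name] = fields.get(name, MISSING_STRING)
    for name in NUMERIC_FIELDS:
        record[name] = int(fields[name]) if name in fields else MISSING_NUMBER
    return record


def from_storage(record):
    """Rebuild the field dict, treating sentinel values as missing."""
    fields = {}
    for name in STRING_FIELDS:
        if record[name] != MISSING_STRING:
            fields[name] = record[name]
    for name in NUMERIC_FIELDS:
        if record[name] != MISSING_NUMBER:
            fields[name] = record[name]
    return fields
```

One consequence of this kind of mapping is that an input field whose real value happens to be "-" or -1 cannot be told apart from a missing one after a round trip.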

Other proposed CDXJ variants (such as "OpenWayback CDXJ") are not supported.

CC @ikreymer @anjackson
Closes #48

@ato ato merged commit c2e7eea into master Mar 4, 2021
@ato ato deleted the cdxj branch March 4, 2021 01:26
ato added a commit that referenced this pull request Mar 19, 2021
- Added CDXJ input and output formats #100
- Annotate closest result in xmlquery for compatibility with OpenWayback BubbleCalendar #101