
Releases: nla/outbackcdx

1.0.0

15 Feb 14:11
@ato ato

New features

  • Added a CBOR-based index version 5 which supports storing arbitrary CDXJ fields
  • Added query params method and requestBody to support replay of POST requests (currently requires pywb patch)
  • Added omitSelfRedirects query param and --omit-self-redirects CLI option, which omit from results any records that redirect to themselves after URL canonicalization
  • API for compacting an index
  • API for upgrading an index (see #117 for instructions for now)
  • API for exporting data for statistics (/cube)

Changes

  • Updated to RocksDB 8.1.1.1

Release 0.11.0

28 May 03:33
@ato ato
  • Upgrade to RocksDB 6.20.3 (Jamie Hoyle) #102
  • Strip prefixes like "sha1:" from the digest field when parsing CDX input (Jamie Hoyle) #102
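
The digest change above can be illustrated with a small sketch. This is not OutbackCDX's actual parser (which is written in Java); the function name is hypothetical, and it assumes the algorithm label is everything up to the first colon.

```python
def strip_digest_prefix(digest: str) -> str:
    """Drop a leading algorithm label such as "sha1:" from a digest value."""
    # Assumption: the label is whatever precedes the first colon.
    # The real implementation may only recognise specific labels.
    _label, sep, rest = digest.partition(":")
    # If there is no colon, the value is returned unchanged.
    return rest if sep else digest
```

For example, `strip_digest_prefix("sha1:ABC")` gives `"ABC"`, and a value with no label passes through untouched.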

Release 0.10.0

19 Mar 05:47
@ato ato
  • Added CDXJ input and output formats #100
  • Annotate closest result in xmlquery for compatibility with OpenWayback BubbleCalendar #101

0.9.1

12 Feb 11:22
@ato ato

Fixes storage of CDX lines such as those in CDX 9 or 10 format which lack a compressed size field. #99 (Kristinn Sigurdsson)

Previously OutbackCDX would return "0" instead of "-" for these records, causing clients to treat them as zero-length records. If you were affected by this, you will need to reinsert the affected records into the index.
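
The distinction between a missing size and a size of zero can be sketched as follows. This is only an illustration of the idea, not OutbackCDX's storage code; both function names are hypothetical, and `None` is used here as an assumed stand-in for "field absent".

```python
from typing import Optional

def parse_size(field: Optional[str]) -> Optional[int]:
    """Parse a compressed size field; None means the field is missing."""
    # CDX 9 and 10 lines have no compressed size column at all, and "-"
    # is the conventional placeholder for an unknown value.
    if field is None or field == "-":
        return None
    return int(field)

def format_size(size: Optional[int]) -> str:
    """Render a size for output, keeping "unknown" distinct from zero."""
    return "-" if size is None else str(size)
```

With this shape, a record that never had a size round-trips as `"-"`, while a genuinely empty record still reports `"0"`.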

0.9.0

09 Feb 14:23
@ato ato
  • Added support for Pywb's __wb_post_data query parameter. #91 (Kai Jauslin)
  • CDX parsing: Timestamps shorter than 14 digits are now padded with trailing zeroes so they sort correctly. #95 (Kristinn Sigurdsson)
  • XmlQuery: OutbackCDX now supports OpenWayback's count and start_page parameters for pagination.
  • XmlQuery: the numreturned and numresults values are now returned in the XML output for better compatibility with OpenWayback's default templates and pagination. #98
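
The timestamp padding mentioned above can be sketched in one line. The function name is hypothetical, and the ordering example assumes timestamps are compared as numbers, which is an assumption about the index rather than something these notes state.

```python
def pad_timestamp(ts: str) -> str:
    """Right-pad a CDX timestamp with zeroes to the full 14 digits."""
    # Timestamps that are already 14 digits long are returned unchanged.
    return ts.ljust(14, "0")

# Assumption: timestamps are compared numerically. Unpadded, the
# year-only value "2020" would sort before a full 2019 timestamp.
unpadded_wrong_order = int("2020") < int("20191231235959")
padded_right_order = int(pad_timestamp("2020")) > int("20191231235959")
```

Here `pad_timestamp("2020")` yields `"20200000000000"`, which sorts after any 2019 capture.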

To maintain the ability to stream results without holding them in memory, the <request> element has been moved to the end of the XML output, after the <results> element. This was tested against OpenWayback, but any other clients of the XmlQuery protocol may be affected. Please let us know if you encounter any compatibility problems.

When more records match the query than the count parameter asks to be returned, OutbackCDX now needs to scan potentially many more records in order to calculate numresults. This will slow down queries that match a lot of records but only return a fraction of them. A new --max-num-results command-line option was added to constrain the number of extra records that will be scanned. This defaults to 10,000. You may wish to decrease it if you don't care about numresults, or increase it if you want to paginate deeper than this in OpenWayback. This only affects queries using the XML protocol.
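
The trade-off described above can be sketched roughly as below. This is a simplified illustration, not the actual implementation: the function name is hypothetical, pages are assumed to be 1-based, and the cap is applied to the total number of records counted, whereas the real option may count only the extra records.

```python
from typing import Iterable, List, Tuple

def page_with_count(records: Iterable[str], start_page: int, count: int,
                    max_num_results: int = 10_000) -> Tuple[List[str], int]:
    """Return one page of records plus a capped count of matching records."""
    offset = (start_page - 1) * count  # assumes 1-based page numbers
    page: List[str] = []
    num_results = 0
    for record in records:
        # Stop scanning once the cap is reached, even if more records match.
        if num_results >= max_num_results:
            break
        # Only records inside the requested window are kept in memory.
        if offset <= num_results < offset + count:
            page.append(record)
        num_results += 1
    return page, num_results
```

Records outside the requested page are counted but never stored, so the scan stays streaming-friendly while the cap bounds how long it runs.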

0.8.0

09 Dec 00:58
@ato ato
  • HMAC protected WARC record URLs (see README for details)
  • OpenWayback's count and start_page parameters in the XML CDX server protocol are now supported
  • Very basic support for POST data (in conjunction with Pywb and Pywb's cdx-indexer)
  • CDX records with timestamps shorter than 14 digits are now padded with zeroes.
  • Timeouts and logging for replication requests
  • --context-path option to deploy at a path other than /
  • Exceptions during queries are now logged and reported

0.7.0 - replication, fuzzy canon, filter plugins and more

14 Jan 04:50
@ato ato

Big new features contributed by James Kafader and Noah Levitt from the Archive-It team at the Internet Archive:

  • Replication
  • Fuzzy canonicalization using pywb-style rules.yaml
  • Request logging
  • RocksDB upgraded to 6.0.1
  • Command line options for tweaking performance
  • Filter plugins
  • New query params
    • urlkey - alternative to url, bypasses outbackcdx canonicalization
    • from - minimum timestamp
    • to - maximum timestamp
    • collapse aka collapseToFirst
    • collapseToLast
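
As a rough illustration of how the from and to parameters listed above might be combined in a query, the sketch below builds a request URL. The host, port and index name are hypothetical placeholders, and the helper function is not part of OutbackCDX.

```python
from urllib.parse import urlencode

def build_query_url(index_url: str, params: dict) -> str:
    """Append URL-encoded query parameters to an index URL."""
    # A dict is used because "from" is a reserved word in Python
    # and cannot be passed as a keyword argument.
    return f"{index_url}?{urlencode(params)}"

# Hypothetical server address and index name, for illustration only.
query_url = build_query_url(
    "http://localhost:8080/myindex",
    {"url": "example.org", "from": "20190101000000", "to": "20191231235959"},
)
```

Replacing the `url` entry with `urlkey` would, per the list above, bypass OutbackCDX's own canonicalization; the key then has to be supplied already canonicalized.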

0.6.0

02 Sep 08:57
@ato ato
  • CDX meta/robotflags field support
  • CDX10 support
  • CDX11+3 support (Neil Munro)
  • ?badLines=skip option (Noah Levitt and Madison Scott-Clary)
  • Filter plugin API (Madison Scott-Clary)
  • Added option to limit max open SST files (with default value heuristic suggested by Noah Levitt)
  • Regex filters in query API
  • Fix for hang on large requests
  • Experimental option to use Undertow web server instead of nanohttpd
  • Upgraded various dependencies (notably urlcanon)