internetarchive/keystone

Keystone

About

Keystone is a web client for the ARCH (Archives Research Compute Hub) job server.

Run Keystone & ARCH using Docker

The following features are available only in the hosted version at https://arch.archive-it.org:

  • Google Colab integration
  • Dataset publication to archive.org

Prerequisites

Build and Run the Docker Image

1. Build the images
make build-images
2. Run the services
docker compose up
3. Surf on over to http://localhost:12342
4. Log in

Log in as one of the three user types that dev/entrypoint.py created for you:

  • Superuser: username: system, password: password
  • Admin: username: admin, password: password
  • Normal: username: test, password: password

The "arch-shared" Directory

The build-images Make target creates a local arch-shared subdirectory, which is mounted within both the running Keystone and ARCH containers. It serves as the storage destination for ARCH outputs and as a place to add your own custom collections of WARCs for analysis.

The arch-shared directory has the structure:

arch-shared/
├── in
│   └── collections
├── log
└── out
    ├── custom-collections
    └── datasets

These subdirectories are used as follows:

  • log
    • ARCH job logs
  • out/custom-collections
    • ARCH Custom Collection output files
  • out/datasets
    • ARCH Dataset output files
  • in/collections
    • A place to make your own WARCs available to ARCH as inputs - see "Analyze Your WARCs" below
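To confirm that the layout above exists after running the build-images target, a small check along these lines may help. It is only a sketch: the function name check_arch_shared is hypothetical and not part of Keystone or ARCH, and it assumes arch-shared sits in the current directory unless another path is given.

```shell
# Hypothetical helper (not part of Keystone): report any expected
# arch-shared subdirectory that is missing.
# The root defaults to ./arch-shared; pass another path as the first argument.
check_arch_shared() {
    root="${1:-arch-shared}"
    missing=0
    # Subdirectory names are taken from the structure listed above.
    for sub in in/collections log out/custom-collections out/datasets; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing: $root/$sub" >&2
            missing=1
        fi
    done
    # Returns 0 when every directory exists, 1 otherwise.
    return "$missing"
}
```

Calling check_arch_shared from the repository root prints one line per missing directory and returns a non-zero status if any are absent.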

Analyze Your WARCs

For each group of WARCs that you'd like to analyze as a collection:

  1. Create a new subdirectory within arch-shared/in/collections with a descriptive kebab-case name, such as my-test-collection, and copy your *.warc.gz files into it, e.g.
arch-shared/
└── in
    └── collections
        └── my-test-collection
            └── ARCHIVEIT-22994-CRAWL_SELECTED_SEEDS-JOB1965703-SEED3267421-h3.warc.gz
  2. Restart both the Keystone and ARCH containers
docker compose restart keystone arch
  3. Your new collection will now be visible in Keystone (e.g. as My Test Collection)
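
The copy step above could be wrapped in a small shell function, sketched here under a few assumptions. The name add_collection and the ARCH_SHARED override variable are hypothetical and not part of Keystone. The kebab-case check is a conservative guess at what counts as a valid name (lowercase letters, digits and hyphens), not a rule that ARCH is known to enforce.

```shell
# Hypothetical helper (not part of Keystone): stage WARC files as a new
# ARCH input collection.
# Usage: add_collection <collection-name> <file.warc.gz>...
add_collection() {
    if [ "$#" -lt 2 ]; then
        echo "usage: add_collection <collection-name> <file.warc.gz>..." >&2
        return 1
    fi
    name="$1"
    shift
    # Assumption: accept only lowercase letters, digits and hyphens.
    case "$name" in
        *[!a-z0-9-]*)
            echo "collection name should be kebab-case: $name" >&2
            return 1
            ;;
    esac
    # ARCH_SHARED is a hypothetical override; the default is ./arch-shared.
    dest="${ARCH_SHARED:-arch-shared}/in/collections/$name"
    mkdir -p "$dest" || return 1
    cp -- "$@" "$dest"/
}

# Example (the WARC file name is illustrative):
#   add_collection my-test-collection ./my-crawl.warc.gz
#   docker compose restart keystone arch
```

The restart is left as a separate command so that several collections can be staged before the containers are restarted once.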
