JRFC 33 - Repositories

This document is an attempt at specifying a generalized spec for repositories 
(the git and ipfs kind) in the hope of arriving at a generalized set of good 
practices. I am new to many intricacies and edge cases, so please suggest
important additions.

---

Many tools and systems create data repositories with configuration files. The 
classic example is `git` and other VCS tools, but many other systems do the 
same. Application changes will necessarily bring about changes to the format of 
the repository (e.g. changing how data is stored, or changing the data itself). 
These changes must **NEVER** cause users to lose data, and great care must be 
taken to ensure that all format changes are accompanied by migration tools.

As applications grow, different types of storage media or execution strategies 
may optimize different use cases, e.g. "flat files inside `.git` for git cli" 
vs "git repo inside database for fast web server access". No matter the use
case, application implementations should be able to operate with different 
concrete implementations of the repository, provided suitable adapters exist. 
This separation reduces the cost of writing both new storage implementations 
and new application implementations.

Terms:
- `repo` - a repository, a structured collection of objects, with a 
  configuration. e.g. a git repo, an ipfs repo
- `config` - a repository configuration which holds repository options
- `database` - a database which holds the repository data. this may be
  a key value store (leveldb), a collection of flat files (`.git/objects`), a
  relational db (SQLite), etc.
- `address` - an identifier of the location of the repository, e.g. 
  `/Users/jbenet/foo/bar/.git`, `https://github.com/jbenet/go-ipfs`.
- `format` - the way in which the data is organized 
- `repo version` - a number identifying the repo's format. It is easiest if 
  these are monotonically increasing integers.
- `concrete repo` - the actual repo as stored in storage media. (e.g. posix 
  files inside `.git/`, files and a leveldb, s3, ...)
- `virtual repo` - a virtual object which can be manipulated. The distinction
  between `concrete` and `virtual` is here so that tools may be written mostly
  to operate on the `virtual` repo, and remain compatible with a variety of 
  repo implementations, through adapters.
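
To make the `virtual` vs `concrete` distinction more tangible, here is a minimal sketch in Python. All names in it (`VirtualRepo`, `MemoryRepo`, `FlatFileRepo`, `copy_repo`, and the `objects/` and `version` layout) are assumptions made for illustration, not part of any existing tool. The idea is only that a tool such as `copy_repo` is written once against the virtual interface, and each concrete repo is reached through an adapter.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List


class VirtualRepo(ABC):
    """Hypothetical interface that tools program against."""

    @abstractmethod
    def version(self) -> int: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def keys(self) -> List[str]: ...


class MemoryRepo(VirtualRepo):
    """Adapter over a dict; a stand-in for a database-backed concrete repo."""

    def __init__(self, version: int = 1) -> None:
        self._version = version
        self._objects: Dict[str, bytes] = {}

    def version(self) -> int:
        return self._version

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def put(self, key: str, value: bytes) -> None:
        self._objects[key] = value

    def keys(self) -> List[str]:
        return sorted(self._objects)


class FlatFileRepo(VirtualRepo):
    """Adapter over a directory: one file per object, plus a `version` file.

    Assumes keys are valid file names (no path separators).
    """

    def __init__(self, root: str, version: int = 1) -> None:
        self._root = Path(root)
        self._objects = self._root / "objects"
        self._objects.mkdir(parents=True, exist_ok=True)
        version_file = self._root / "version"
        if not version_file.exists():
            version_file.write_text(str(version))

    def version(self) -> int:
        return int((self._root / "version").read_text().strip())

    def get(self, key: str) -> bytes:
        return (self._objects / key).read_bytes()

    def put(self, key: str, value: bytes) -> None:
        (self._objects / key).write_bytes(value)

    def keys(self) -> List[str]:
        return sorted(p.name for p in self._objects.iterdir())


def copy_repo(src: VirtualRepo, dst: VirtualRepo) -> int:
    """A tool written once against the virtual repo; returns objects copied."""
    copied = 0
    for key in src.keys():
        dst.put(key, src.get(key))
        copied += 1
    return copied
```

With this shape, `copy_repo(MemoryRepo(), FlatFileRepo(path))` and the reverse direction use the same tool code; only the adapters differ.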
## Notes
- `repo version` **MUST** be included, and must remain readable by _all_ tools 
  attempting to modify the `repo` (e.g. migration tools from any version must
  be able to determine the current version of the repo). Example:
  `.go-ipfs/version`
- `config` and `database` may both be implemented by the same storage system, 
  but it is recommended they are separate, as one might define the other.
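
One simple way to keep the version readable by every tool is a tiny plain-text file holding a single integer. The sketch below assumes exactly that: a file named `version` in the repo directory containing a decimal number. Both the file name and its contents are assumptions for illustration; the helper names `read_repo_version` and `write_repo_version` are hypothetical.

```python
import os
from pathlib import Path

VERSION_FILE = "version"  # hypothetical file name inside the repo directory


def read_repo_version(repo_path: str) -> int:
    """Return the repo version; raise ValueError if missing or malformed."""
    path = Path(repo_path) / VERSION_FILE
    try:
        text = path.read_text(encoding="ascii").strip()
    except FileNotFoundError:
        raise ValueError(f"no version file at {path}") from None
    if not text.isdigit():
        raise ValueError(f"malformed version file at {path}: {text!r}")
    return int(text)


def write_repo_version(repo_path: str, version: int) -> None:
    """Write the version via a temp file so readers never see a partial write."""
    path = Path(repo_path) / VERSION_FILE
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(f"{version}\n", encoding="ascii")
    # os.replace overwrites the destination; the rename is atomic on POSIX.
    os.replace(tmp, path)
```

Because the file does not depend on the `config` or `database` formats, a migration tool written for any version can read it before touching anything else.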
## Synchronization

Operations on a `repo` may require synchronization (some repos may support 
concurrent modifications, and others require complete mutual exclusion). Repos 
which require mutual exclusion must support mechanisms to achieve it (e.g. 
`.git/index.lock`). These may be granular or coarse, but repo formats must define
synchronization, so various implementations can ensure safe, concurrent access.
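
As an illustration of coarse mutual exclusion, here is a minimal lock-file sketch in the spirit of `.git/index.lock`. It is not git's implementation: the lock file name `repo.lock`, the `repo_lock` helper, and the `RepoLockedError` exception are all assumptions. It relies on `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists, and it does not handle stale locks left behind by a crashed process.

```python
import os
from contextlib import contextmanager
from typing import Iterator


class RepoLockedError(Exception):
    """Raised when another process (or caller) already holds the lock."""


@contextmanager
def repo_lock(repo_path: str, name: str = "repo.lock") -> Iterator[str]:
    """Hold an exclusive, whole-repo lock for the duration of the block."""
    lock_path = os.path.join(repo_path, name)
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RepoLockedError(f"repo is locked: {lock_path}") from None
    try:
        # Record the owner's pid to help humans debug a stuck lock.
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        yield lock_path
    finally:
        os.remove(lock_path)
```

A tool would wrap every modification in `with repo_lock(path): ...`. A format that wants granular locking could apply the same idea per object or per subdirectory.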
## Migrations

Through the lifetime of an application, `repo` formats may require changes. 
These changes must be accompanied by a "migration tool", which converts the 
_data_ from the previous format version to the new one. Ideally the migration 
can be applied in both directions (`old <-> new`). For example, one may end up 
with a set of "repo version migration" tools like the following:

```
> ls ipfs/bin/repo-migrations
1-to-2
2-to-3
3-to-4
4-to-5
5-to-6
6-to-7

> ipfs/bin/repo-migrations/1-to-2
repository version: 3
already up to date.

> ipfs/bin/repo-migrations/3-to-4
repository version: 3
applying patch: 3-to-4
repository version: 4

> ipfs/bin/repo-migrations/3-to-4 --revert
repository version: 4
applying patch: 4-to-3
repository version: 3

> ipfs/bin/repo-migrations/run 1-to-7
repository version: 3
applying patch: 3-to-4
applying patch: 4-to-5
applying patch: 5-to-6
applying patch: 6-to-7
repository version: 7
```

It is advised that repo migration tools are `virtual repo` tools (that is, implemented
to work with the logical repo, instead of the concrete data). This makes it possible
to reuse migration tools across repo implementations (with proper adapters). 
This may not always be possible; repo-format-specific migration tools might
be necessary.
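
The chaining behaviour of a `run`-style tool can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual ipfs tooling: `Repo`, `Migration`, `plan`, `migrate`, and the example `title`/`name` rename are hypothetical. Each migration converts version `n` to `n + 1` and knows how to revert; the runner plans every step first and refuses to start if any tool is missing, so a repo is never left halfway for that reason.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Repo:
    """Hypothetical in-memory stand-in for a virtual repo."""
    version: int
    data: Dict[str, str] = field(default_factory=dict)


@dataclass
class Migration:
    apply: Callable[[Repo], None]   # converts version n to n + 1
    revert: Callable[[Repo], None]  # converts version n + 1 back to n


def plan(current: int, target: int) -> List[Tuple[int, int]]:
    """List the single-version steps needed, in either direction."""
    if current <= target:
        return [(v, v + 1) for v in range(current, target)]
    return [(v, v - 1) for v in range(current, target, -1)]


def migrate(repo: Repo, migrations: Dict[int, Migration], target: int) -> List[str]:
    """Move `repo` to `target`; `migrations[n]` is the n-to-(n+1) tool."""
    steps = plan(repo.version, target)
    missing = [step for step in steps if min(step) not in migrations]
    if missing:
        # Check before changing anything, so nothing is half-migrated.
        raise LookupError(f"no migration tool for steps: {missing}")
    applied = []
    for src, dst in steps:
        tool = migrations[min(src, dst)]
        (tool.apply if dst > src else tool.revert)(repo)
        repo.version = dst
        applied.append(f"{src}-to-{dst}")
    return applied


# Example migration (hypothetical): version 4 renames the key "name" to "title".
def rename_name_to_title(repo: Repo) -> None:
    repo.data["title"] = repo.data.pop("name")


def rename_title_to_name(repo: Repo) -> None:
    repo.data["name"] = repo.data.pop("title")
```

For instance, with `migrations = {3: Migration(rename_name_to_title, rename_title_to_name)}`, calling `migrate(repo, migrations, 4)` on a version 3 repo applies `3-to-4`, and calling `migrate(repo, migrations, 3)` afterwards reverts it.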
### human inspection

Repo implementations must include tools to transform the data to a human
readable/inspectable structure. This makes it possible for users and application 
implementors to debug problems. These tools may be easiest to implement with
a human readable repository format, and conversion tools to convert to/from it.
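
One possible shape for such a conversion tool is a round trip between the repo's objects and a JSON document. The sketch below is only an assumption about what a human readable format could look like: the `dump_repo` and `load_repo` names and the JSON layout are hypothetical. Values that are valid UTF-8 are shown as text; anything else falls back to base64, so the conversion stays lossless.

```python
import base64
import json
from typing import Dict, Tuple


def dump_repo(version: int, objects: Dict[str, bytes]) -> str:
    """Render a repo as indented JSON that a person can read and diff."""
    entries = {}
    for key, value in objects.items():
        try:
            entries[key] = {"encoding": "utf-8", "value": value.decode("utf-8")}
        except UnicodeDecodeError:
            # Binary data: keep it lossless rather than readable.
            encoded = base64.b64encode(value).decode("ascii")
            entries[key] = {"encoding": "base64", "value": encoded}
    doc = {"version": version, "objects": entries}
    return json.dumps(doc, indent=2, sort_keys=True)


def load_repo(text: str) -> Tuple[int, Dict[str, bytes]]:
    """Inverse of dump_repo: rebuild the version and the raw objects."""
    doc = json.loads(text)
    objects = {}
    for key, entry in doc["objects"].items():
        if entry["encoding"] == "utf-8":
            objects[key] = entry["value"].encode("utf-8")
        elif entry["encoding"] == "base64":
            objects[key] = base64.b64decode(entry["value"])
        else:
            raise ValueError(f"unknown encoding for {key!r}: {entry['encoding']!r}")
    return doc["version"], objects
```

Sorting keys and using a fixed indent keeps the output stable, which makes it practical to compare two dumps when debugging.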
### corruption
- `corrupted` - an unexpected, invalid data state
- `recovery` - the process of "uncorrupting" a repository. may not be possible.

...

