Purpose
Store each repository whole in a single place instead of splitting it across several siva files. The reasons are explained in #380.
Changes
- Add rooted repo column for the whole repository to the database schema
- Skip the init commit search if the repository already has a rooted repository selected in the DB
- Select a rooted repository for the repository if it does not have one yet
Database
In core-retrieval, add a new column to Repository:
Init SHA1
We also want to keep the Init in Reference, since these values will be used to delete the references from the extra rooted repos when updating.
Init selection
If the repository already has the Init column set, use it instead of searching for one. Otherwise, pick it following these rules:
- Error when there are no references
- If there's a default branch and it is valid, calculate the rooted repo from it
- If there's no default branch, calculate the rooted repos from all branches and pick the most used one, that is, the rooted repo with the most references
- If there is a tie, pick the first one lexicographically
Note: There could be more rules, such as preferring the longest commit history or checking which rooted repos already exist in the database, but they would make the code more complex and this case shouldn't happen too often.
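The rules above could be sketched roughly as follows. This is only an illustration under some assumptions: Ref, ErrNoReferences and selectInit are hypothetical names, hashes are plain strings, and a default branch counts as valid simply when it appears among the references.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Ref is a hypothetical, simplified reference: its name and the init
// (root) commit computed for it. It is not the core-retrieval model.
type Ref struct {
	Name string
	Init string
}

// ErrNoReferences is a hypothetical error for repositories without references.
var ErrNoReferences = errors.New("repository has no references")

// selectInit picks the init for a repository. stored is the value of the
// Init column ("" when unset) and defaultBranch is "" when there is none.
func selectInit(stored, defaultBranch string, refs []Ref) (string, error) {
	// An init already selected in the database wins over any search.
	if stored != "" {
		return stored, nil
	}
	if len(refs) == 0 {
		return "", ErrNoReferences
	}

	counts := make(map[string]int)
	for _, r := range refs {
		// Assumption: the default branch is valid if it is one of the refs.
		if defaultBranch != "" && r.Name == defaultBranch {
			return r.Init, nil
		}
		counts[r.Init]++
	}

	// Order candidates by reference count (descending), then
	// lexicographically to break ties.
	candidates := make([]string, 0, len(counts))
	for h := range counts {
		candidates = append(candidates, h)
	}
	sort.Slice(candidates, func(i, j int) bool {
		a, b := candidates[i], candidates[j]
		if counts[a] != counts[b] {
			return counts[a] > counts[b]
		}
		return a < b
	})
	return candidates[0], nil
}

func main() {
	refs := []Ref{
		{Name: "refs/heads/master", Init: "bbb"},
		{Name: "refs/heads/dev", Init: "aaa"},
		{Name: "refs/heads/feature", Init: "aaa"},
	}
	chosen, err := selectInit("", "", refs)
	fmt.Println(chosen, err)
}
```

In this sketch a default branch that is not found among the references falls through to the "most used" rule; whether that matches the intended meaning of "valid" is an open question.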
Changes in the code
gitReferencer (https://github.com/src-d/borges/blob/master/git.go#L56) should have a new constructor that accepts the init commit in case it exists in the database:
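// NewGitReferencerWithInit returns a Referencer that reuses the init commit
// already stored in the database.
func NewGitReferencerWithInit(r *git.Repository, i plumbing.Hash) Referencer {
	return gitReferencer{
		Repository: r,
		init:       i,
	}
}

type gitReferencer struct {
	*git.Repository
	init plumbing.Hash // init commit from the database, if any
}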
If init is set, skip the search and set the Init of all references to that value.
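A minimal sketch of that behaviour, using standard-library stand-ins so it runs without go-git. Hash mimics plumbing.Hash, and the referencer type, its names field and its search callback are hypothetical placeholders for the real history walk:

```go
package main

import "fmt"

// Hash stands in for plumbing.Hash (a 20-byte SHA-1).
type Hash [20]byte

// IsZero reports whether the hash is unset.
func (h Hash) IsZero() bool { return h == Hash{} }

// Reference is a simplified stand-in for the stored reference model.
type Reference struct {
	Name string
	Init Hash
}

// referencer is a hypothetical stand-in for gitReferencer.
type referencer struct {
	names  []string
	init   Hash
	search func(name string) (Hash, error) // placeholder for the init commit search
}

// References returns every reference with its Init. When r.init is set,
// the search is skipped and all references share that value.
func (r referencer) References() ([]Reference, error) {
	refs := make([]Reference, 0, len(r.names))
	for _, name := range r.names {
		h := r.init
		if h.IsZero() {
			var err error
			if h, err = r.search(name); err != nil {
				return nil, err
			}
		}
		refs = append(refs, Reference{Name: name, Init: h})
	}
	return refs, nil
}

func main() {
	searches := 0
	search := func(name string) (Hash, error) {
		searches++
		return Hash{1}, nil
	}
	r := referencer{
		names:  []string{"refs/heads/master", "refs/heads/dev"},
		init:   Hash{2},
		search: search,
	}
	refs, err := r.References()
	fmt.Println(len(refs), searches, err)
}
```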
Optimizations
These may not make it into the first implementation, but they could speed up downloads considerably.
Fast path for first download
This is already done; it is listed here for completeness. If the siva file is new (no commits), rename the references and copy the repository as is into the siva file.
Fast path for updates
This only works if we already know the init where the repository will be located.
A second optimization can use a layer on top of the repository to do the translation of reference names when fetching, and fetch directly over the siva file. This way the packfile that is downloaded is smaller and we don't need to do a push; it is written as is. This layer should be go-borges.
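To illustrate what the name translation in such a layer might look like, here is a small sketch. It assumes, purely for the example, that a reference is stored in the rooted repository as its original name plus a "/<repository ID>" suffix; the functions toRooted and fromRooted are hypothetical and are not part of go-borges.

```go
package main

import (
	"fmt"
	"strings"
)

// toRooted maps a reference name from the remote repository to the name
// it would have inside the rooted repository (assumed suffix scheme).
func toRooted(name, repoID string) string {
	return name + "/" + repoID
}

// fromRooted does the reverse mapping. It returns false when the
// reference does not belong to the given repository.
func fromRooted(name, repoID string) (string, bool) {
	suffix := "/" + repoID
	if !strings.HasSuffix(name, suffix) {
		return "", false
	}
	return strings.TrimSuffix(name, suffix), true
}

func main() {
	const repoID = "example-id" // placeholder repository ID

	rooted := toRooted("refs/heads/master", repoID)
	fmt.Println(rooted)

	original, ok := fromRooted(rooted, repoID)
	fmt.Println(original, ok)
}
```

With the translation applied while fetching, references would land in the siva file under their final names, so no separate rename and push step would be needed.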