@jimklimov jimklimov (Contributor) commented Dec 7, 2020

JENKINS-64383 - combined refrepo became our bottleneck

As detailed in the JIRA issue, our heavy use of a single combined reference repository turned it from the speedup and reliability improvement it once was into a bottleneck and a cause of job timeouts. This PR explores a way to keep a single point of configuration for the reference repository directory, while allowing it to be suffixed with a "magic variable" that is replaced by the path to a subdirectory holding a smaller-scope reference repository for the particular source Git URL. On file systems with symlinks, several such names can point to the same directory, for closely related repositories or for different URLs of the same repository.
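
A minimal Java sketch of the symlink idea, assuming a cache root such as /home/abuild/jenkins-gitcache on a POSIX file system where the URL can be used as a relative path. The class and method names (RefRepoAlias, addAlias) are hypothetical and not part of the Git Client plugin:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

/**
 * Hypothetical helper (not part of the plugin): make an extra URL spelling
 * resolve to an existing per-URL reference directory via a symlink.
 * Assumes a POSIX-like file system where the URL is a valid relative path.
 */
public class RefRepoAlias {
    static Path addAlias(Path cacheRoot, String existingUrl, String aliasUrl) throws IOException {
        // On POSIX, Path parsing collapses the "//" in "https://" into one separator.
        Path target = cacheRoot.resolve(existingUrl);
        Path link = cacheRoot.resolve(aliasUrl);
        // Create intermediate directories such as "git@github.com:zeromq".
        Files.createDirectories(link.getParent());
        // Keep an existing entry (symlink or real directory) untouched.
        if (!Files.exists(link, LinkOption.NOFOLLOW_LINKS)) {
            Files.createSymbolicLink(link, target);
        }
        return link;
    }
}
```

For example, addAlias(root, "https://github.com/zeromq/czmq.git", "git@github.com:zeromq/czmq.git") would let jobs that use the SSH URL share the cache maintained under the HTTPS name.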

This PoC adds trivial support for reference repository paths ending with /${GIT_URL}: the suffix is replaced by the remote URL, which yields a "funny" directory subtree in the filesystem. Its current limitation is that the URL is pasted in verbatim. This works on Linux and other Unix-like systems, which only forbid the NUL byte (0x00) and the slash in file names, and the slash conveniently serves as the directory subtree separator. This code likely won't run on Windows as is, because of the colon in https: and probably other characters (Microsoft documents an extensive list of invalid characters).
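
The verbatim substitution described above can be sketched in Java roughly as follows. This is an illustration under the assumption that only the exact trailing "/${GIT_URL}" suffix is recognized; the names RefRepoPathSketch, isParameterized and expand are hypothetical, not the plugin's actual methods:

```java
/**
 * Hypothetical sketch of the verbatim "/${GIT_URL}" expansion;
 * names are illustrative, not the plugin's actual API.
 */
public class RefRepoPathSketch {
    // Magic suffix recognized at the end of the configured reference path.
    static final String GIT_URL_SUFFIX = "/${GIT_URL}";

    static boolean isParameterized(String reference) {
        return reference != null && reference.endsWith(GIT_URL_SUFFIX);
    }

    /** Replaces the suffix with the remote URL pasted in verbatim. */
    static String expand(String reference, String url) {
        if (!isParameterized(reference)) {
            return reference; // not parameterized: use the path as configured
        }
        String base = reference.substring(0, reference.length() - GIT_URL_SUFFIX.length());
        // No escaping: fine on Unix-like systems, but ':' breaks this on Windows.
        return base + "/" + url;
    }

    public static void main(String[] args) {
        // Prints /home/abuild/jenkins-gitcache/https://github.com/zeromq/czmq.git
        System.out.println(expand("/home/abuild/jenkins-gitcache/${GIT_URL}",
                "https://github.com/zeromq/czmq.git"));
    }
}
```

The result matches the path shown in the build log of the test run, where the "//" after https: simply acts as a single directory separator.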

The next ideas, described in code comments but not yet implemented in the PoC, are to either escape such characters (non-ASCII ones and those invalid on at least one popular filesystem), or to convert URLs into base64 strings or sha/md5/... hashes. Using submodules, with some way to map several URLs to a certain submodule, might also be a good idea if they keep their indexes separately. All of this can be built on top of the PoC code by introducing further suffixes and handling for them.
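
The three naming options above (escaping, base64, hashing) could look roughly like this in Java. This is a sketch only: the class and method names are hypothetical, and percent-escaping via URLEncoder is one possible escaping scheme, not necessarily the one the PR would adopt:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical sketch: turn a Git URL into a single, portable directory name. */
public class UrlDirNames {
    /** Percent-escapes the URL; '*' is escaped too, since Windows forbids it. */
    static String escaped(String url) {
        return URLEncoder.encode(url, StandardCharsets.UTF_8).replace("*", "%2A");
    }

    /**
     * URL-safe base64 (uses '-' and '_', no padding). Reversible, but
     * case-sensitive, so case-insensitive file systems could merge names.
     */
    static String base64Url(String url) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(url.getBytes(StandardCharsets.UTF_8));
    }

    /** Lower-case hex SHA-256: fixed length, safe everywhere, not reversible. */
    static String sha256Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String url = "https://github.com/zeromq/czmq.git";
        System.out.println(escaped(url));
        System.out.println(base64Url(url));
        System.out.println(sha256Hex(url));
    }
}
```

Escaping keeps names human-readable, while a hash gives a fixed-length name; with a hash, a maintenance script would need the same function to know which directory belongs to which URL.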

It was tested on a MultiBranch pipeline job, where the original reference repository definition was suffixed with the new magic string, yielding /home/abuild/jenkins-gitcache/${GIT_URL} (entered verbatim in "Advanced clone behaviours"). With this plugin variant installed, the checkout into a wiped workspace logged:

```
Cloning the remote Git repository
Cloning repository https://github.com/zeromq/czmq.git
 > git init /dev/shm/jenkins-swarm-client/workspace/CZMQ-upstream_master # timeout=10
[WARNING] Parameterized reference path replaced with: /home/abuild/jenkins-gitcache/https://github.com/zeromq/czmq.git
Using reference repository: /home/abuild/jenkins-gitcache/https://github.com/zeromq/czmq.git
Fetching upstream changes from https://github.com/zeromq/czmq.git
 > git --version # timeout=10
 > git --version # 'git version 2.1.4'
 > git fetch --tags --progress https://github.com/zeromq/czmq.git +refs/heads/*:refs/remotes/origin/* # timeout=40
Avoid second fetch
Checking out Revision fbe313cd2010bace7833fe52d419f82282343bd9 (master)
 > git config remote.origin.url https://github.com/zeromq/czmq.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config core.sparsecheckout # timeout=10
 > git checkout -f fbe313cd2010bace7833fe52d419f82282343bd9 # timeout=10
Commit message: "Merge pull request #2139 from bluca/ci_failures"
 > git rev-list --no-walk fbe313cd2010bace7833fe52d419f82282343bd9 # timeout=10
```

This completed quickly, much faster than the usual checkout with the huge combined refrepo in the original /home/abuild/jenkins-gitcache/, and it automatically found the "funny" /home/abuild/jenkins-gitcache/https://github.com/zeromq/czmq.git directory prepared with the single repository's reference cache:

```
# ls -la /home/abuild/jenkins-gitcache/https://github.com/zeromq/czmq.git
total 38
drwxr-xr-x 7 4294967294 4294967294   12 Dec  7 19:31 .
drwxr-xr-x 3 4294967294 4294967294    3 Dec  7 19:29 ..
-rw-r--r-- 1 4294967294 4294967294 2353 Dec  7 19:31 FETCH_HEAD
-rw-r--r-- 1 4294967294 4294967294   23 Dec  7 19:30 HEAD
drwxr-xr-x 2 4294967294 4294967294    2 Dec  7 19:30 branches
-rwxr--r-- 1 4294967294 4294967294  204 Dec  7 19:30 config
-rw-r--r-- 1 4294967294 4294967294   73 Dec  7 19:30 description
drwxr-xr-x 2 4294967294 4294967294   11 Dec  7 19:30 hooks
drwxr-xr-x 2 4294967294 4294967294    3 Dec  7 19:30 info
drwxr-xr-x 4 4294967294 4294967294    4 Dec  7 19:30 objects
drwxr-xr-x 5 4294967294 4294967294    5 Dec  7 19:31 refs
lrwxrwxrwx 1 4294967294 4294967294   43 Dec  7 19:30 register-git-cache.sh -> /mnt/jenkins-gitcache/register-git-cache.sh
```
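
The listing shows a bare repository, with objects/ at the top level. A Java sketch of how a resolved reference path might be checked before use, assuming it can be either a bare repository or a clone with a working tree; RefRepoObjects and objectsDir are hypothetical names for illustration:

```java
import java.io.File;

/**
 * Hypothetical sketch: find the "objects" directory that Git alternates
 * would point at, for a bare repository or a non-bare clone.
 */
public class RefRepoObjects {
    /** Returns the objects directory, or null if the path is not a usable repository. */
    static File objectsDir(File reference) {
        if (reference == null || !reference.isDirectory()) {
            return null;
        }
        File bare = new File(reference, "objects");
        if (bare.isDirectory()) {
            return bare; // bare repository: objects/ sits at the top level
        }
        // Non-bare clone: objects/ lives under .git/. This sketch does not
        // follow a ".git" *file* (gitdir pointer) used by worktrees/submodules.
        File workspace = new File(new File(reference, ".git"), "objects");
        if (workspace.isDirectory()) {
            return workspace;
        }
        return null;
    }
}
```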

DOCS NOTE: With Git 2.36.x and newer, if your reference repository maintenance script runs as a different user account than the Jenkins server (or Jenkins agent), the safe.directory ownership checks (see https://github.blog/2022-04-18-highlights-from-git-2-36/) can be disabled by running the following under each such user account:

```
:; git config --global --add safe.directory '*'
```

UPDATE: My repository at https://github.com/jimklimov/git-refrepo-scripts provides the shell scripts and Jenkinsfile jobs I use to maintain servers running this modification of the Git Client plugin. One of the jobs there automatically discovers and registers the Git repositories used by known builds on the server it runs on (it might run daily or so), and another can run more often to update the known refrepos.

@jimklimov jimklimov force-pushed the refrepo-args branch 2 times, most recently from 191ca38 to 694ee90 December 8, 2020 12:34
@jimklimov jimklimov marked this pull request as draft December 10, 2020 01:43
@MarkEWaite MarkEWaite added the enhancement Improvement or new feature label Dec 13, 2020
…RL} to replace by url => funny dir subtree in filesystem
…atibleGitAPIImpl.java so its logic (expected to grow in complexity) can be shared by both JGitAPIImpl.java and CliGitAPIImpl.java
…d ref-repos in submodule checkouts (only CliGitAPIImpl.java has it)
…(): do not bother normalizing the URL if the string is not with supported suffix
…intsToLocal*Mirror() with custom paths and bare vs workspace repos
…refactor getObjectPath(referencePath) to check on git dirs elsewhere later
…erenceRepository() and isParameterizedReferenceRepository() taking a File reference (not only a String) object
… keep original reference intact, and as indicator to recreate referencePath object once for many cases
…256_FALLBACK suffixes for using unsuffixed directory if expanded path points nowhere useful
Promulgates inconsistent coding style and breaks "logical" markup
of messages defined on multiple lines (like "item " + javavar) which
it fails to keep together, etc.

Unavoidable evil I guess, but hopefully someone can configure it.
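
One of the commit titles above mentions "…256_FALLBACK suffixes for using unsuffixed directory if expanded path points nowhere useful". A minimal Java sketch of that fallback decision, assuming "useful" means the directory looks like a Git repository; the class, method and parameter names are hypothetical and the real suffix handling in the PR may differ:

```java
import java.io.File;

/**
 * Hypothetical sketch of the fallback idea: if the expanded per-URL path
 * is not a usable repository, use the unsuffixed (combined) directory.
 */
public class RefRepoFallback {
    /** Minimal usability check: a bare repo has objects/, a clone has .git/objects/. */
    static boolean looksLikeRepo(File dir) {
        return new File(dir, "objects").isDirectory()
                || new File(new File(dir, ".git"), "objects").isDirectory();
    }

    /**
     * @param base     the configured directory with the magic suffix stripped
     * @param expanded the per-URL subdirectory computed from that suffix
     * @param fallback whether the configured suffix asked for a fallback
     * @return the directory to use as the reference, or null for none
     */
    static File choose(File base, File expanded, boolean fallback) {
        if (looksLikeRepo(expanded)) {
            return expanded;
        }
        if (fallback && looksLikeRepo(base)) {
            return base; // e.g. the big combined refrepo used before
        }
        return null; // in this sketch: clone without a reference repository
    }
}
```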
@MarkEWaite MarkEWaite requested a review from a team as a code owner September 13, 2023 12:55
@github-actions github-actions bot added the tests Automated test addition or improvement label Jan 17, 2024
MarkEWaite and others added 11 commits March 8, 2024 20:59
Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
…refrepo-args

Signed-off-by: Evgeny Klimov <klimov@provys.com>
…refrepo-args

Signed-off-by: Evgeny Klimov <klimov@provys.com>
Signed-off-by: Evgeny Klimov <klimov@provys.com>
Signed-off-by: Evgeny Klimov <klimov@provys.com>
Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
…refrepo-args

Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
… mvn:spotless complaint fixes

Signed-off-by: Evgeny Klimov <klimov@provys.com>
…comment from upstream changes

Signed-off-by: Evgeny Klimov <klimov@provys.com>
…duplicate comment from upstream changes

Signed-off-by: Evgeny Klimov <klimov@provys.com>
[maven-release-plugin] copy for tag git-client-6.2.0

Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
…epo-args

Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
…epo-args

Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
Labels
enhancement Improvement or new feature ShortTerm Short term improvements tests Automated test addition or improvement

2 participants