
Conversation

@alexmt alexmt commented Oct 20, 2020

Implements first part of #4583

This PR changes the repository server so that it executes multiple operations that require the same repository and the same commit SHA concurrently, as long as the operations do not change files in the repository. In theory, this should significantly improve manifest generation for a mono repo with many Argo CD apps in it.
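For illustration only, here is a minimal sketch of the locking idea (this is not the actual repo-server code; the type and method names are assumptions): operations that only read an already checked-out revision share a read lock and can run in parallel, while operations that need to change the working tree (for example checking out a different revision) take an exclusive lock.

package repository

import "sync"

// repoLock guards a single local repository checkout.
type repoLock struct {
    mu sync.RWMutex
}

// withReadAccess runs op under a shared lock, so any number of manifest
// generations against the same commit SHA can proceed concurrently.
func (l *repoLock) withReadAccess(op func() error) error {
    l.mu.RLock()
    defer l.mu.RUnlock()
    return op()
}

// withWriteAccess runs op exclusively, e.g. when a different revision must be
// checked out and the files on disk will change.
func (l *repoLock) withWriteAccess(op func() error) error {
    l.mu.Lock()
    defer l.mu.Unlock()
    return op()
}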

@alexmt alexmt force-pushed the repo-performance branch 4 times, most recently from c7936e7 to f772a9b Compare October 21, 2020 00:26

codecov-io commented Oct 21, 2020

Codecov Report

Attention: Patch coverage is 85.10638% with 14 lines in your changes missing coverage. Please review.

Project coverage is 41.52%. Comparing base (c7dbe48) to head (3bbbf62).
Report is 5331 commits behind head on master.

Files with missing lines Patch % Lines
reposerver/repository/repository.go 71.42% 9 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4625      +/-   ##
==========================================
+ Coverage   41.24%   41.52%   +0.28%     
==========================================
  Files         124      125       +1     
  Lines       16976    17063      +87     
==========================================
+ Hits         7001     7086      +85     
+ Misses       8969     8963       -6     
- Partials     1006     1014       +8     


@alexmt alexmt force-pushed the repo-performance branch 2 times, most recently from 1d185c9 to d173a2c Compare October 21, 2020 00:55
@alexmt alexmt marked this pull request as draft October 21, 2020 01:12

alexmt commented Oct 22, 2020

Tested on a real-life mono repository. It seems the change actually does improve repo-server performance significantly (~10x faster).

Here is how I performed the testing:

Run a script that refreshes all apps using a soft refresh (cached manifests), then run the same script with a hard refresh (forced manifest regeneration). The difference between the two runs is how much time manifest generation took.

Environment:
40 Kustomize-based apps in the repo (kustomize build takes ~0.3 seconds)
2 argocd-repo-server replicas

Script:

#!/bin/bash
# Refresh every app listed in /tmp/apps in parallel. TYPE selects the refresh
# mode and defaults to --hard-refresh (e.g. run with TYPE="--refresh" for a
# cached refresh).
REFRESH=${TYPE:=--hard-refresh}

echo "Refresh type: $REFRESH"

for app in $(cat /tmp/apps); do
   argocd app get $REFRESH "$app" > /dev/null &
done

# Wait for all background refreshes and report how long they took.
time wait

Results:


Before PR: manifest generation took ~15 seconds (27.1s total with hard refresh vs 12.2s with cached refresh)

TYPE="--refresh" /tmp/check.sh
Refresh type: --refresh
real	0m12.150s
 /tmp/check.sh                                                                                      
Refresh type: --hard-refresh
real	0m27.091s

After PR: manifest generation took ~1.2 seconds (13.6s total with hard refresh vs 12.3s with cached refresh)

TYPE="--refresh" /tmp/check.sh
Refresh type: --refresh

real	0m12.344s
/tmp/check.sh
Refresh type: --hard-refresh

real	0m13.555s

@alexmt alexmt force-pushed the repo-performance branch 16 times, most recently from d3c0978 to 46dc14d Compare October 25, 2020 14:11
@alexmt alexmt marked this pull request as ready for review October 25, 2020 14:11
@alexmt alexmt requested review from jannfis and jessesuen October 25, 2020 14:11
}

// Helm based apps might download dependencies and cannot be processed concurrently
return v1alpha1.ApplicationSourceType(appType) != v1alpha1.ApplicationSourceTypeHelm
alexmt (Collaborator Author):

Unfortunately, helm dependency build fails if run concurrently, so Helm based apps from the same revision cannot be processed in parallel. But we can process such apps in different directories; I will work on that in a separate PR.

alexmt (Collaborator Author):

Figured that we can simply add a lock that prevents running concurrent "helm dependency build" in the same directory.
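For context, a minimal sketch of that idea (the package and function names are assumptions, not the code that landed in this PR): keep one mutex per checkout directory and serialize "helm dependency build" invocations that target the same directory, while builds in different directories still proceed in parallel.

package helmutil

import (
    "os/exec"
    "sync"
)

var (
    depBuildLocks   = map[string]*sync.Mutex{}
    depBuildLocksMu sync.Mutex
)

// lockFor returns the mutex guarding the given checkout directory, creating it
// on first use.
func lockFor(dir string) *sync.Mutex {
    depBuildLocksMu.Lock()
    defer depBuildLocksMu.Unlock()
    if _, ok := depBuildLocks[dir]; !ok {
        depBuildLocks[dir] = &sync.Mutex{}
    }
    return depBuildLocks[dir]
}

// dependencyBuild runs `helm dependency build` in dir, making sure only one
// such invocation runs per directory at a time.
func dependencyBuild(dir string) error {
    l := lockFor(dir)
    l.Lock()
    defer l.Unlock()
    cmd := exec.Command("helm", "dependency", "build")
    cmd.Dir = dir
    return cmd.Run()
}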

@jannfis jannfis left a comment (Member):

I've not had the chance to take a thorough look at the code, nor did I perform any tests yet. But from glancing over the changes, I have one concern so far (please see the inline comment).

Also, looking forward, I was wondering whether we could just create shallow clones of the repositories at the requested revision for each application that refers to the repository, adding the name of the application to the path of the local checkout.

For example, when we have two applications app-a and app-b which both refer to https://github.com/argoproj/argocd-example-apps, repo-server would check out two repositories at /tmp/app-a_https:__github.com_argoproj_argocd-example-apps and /tmp/app-b_https:__github.com_argoproj_argocd-example-apps instead of just /tmp/https:__github.com_argoproj_argocd-example-apps.

This would not require any locking, but at the expense of more temporary disk space. Performing a shallow fetch could keep that expense low, though. For really large mono repos, we could introduce something like a weighted semaphore to limit concurrent checkouts per repo-server instance to a configurable value if temporary disk space on the node is sparse (a rough sketch of that idea follows after this comment). But it would also have the nice effect of allowing Kustomize and (J|K)Sonnet sources to be processed concurrently.

However, I'm not sure about the rationale behind the checkout and general repo handling design, so it may well be that I've overlooked something that was already discussed in the past and led to the implementation we currently have.
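To make the weighted-semaphore idea above concrete, here is a rough sketch (purely illustrative, not part of this PR; the limit value and the function name are assumptions) using golang.org/x/sync/semaphore to cap concurrent per-application checkouts on a repo-server instance:

package repository

import (
    "context"

    "golang.org/x/sync/semaphore"
)

// Allow at most 5 simultaneous checkouts per repo-server instance; the value
// would be configurable in practice.
var checkoutSem = semaphore.NewWeighted(5)

// withCheckoutSlot blocks until a checkout slot is available (or ctx is
// cancelled), then runs the checkout and releases the slot.
func withCheckoutSlot(ctx context.Context, checkout func() error) error {
    if err := checkoutSem.Acquire(ctx, 1); err != nil {
        return err
    }
    defer checkoutSem.Release(1)
    return checkout()
}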

}

// allowConcurrent returns true if the given application source can be processed concurrently
func allowConcurrent(source *v1alpha1.ApplicationSource) bool {
jannfis (Member):

I think it might be somewhat dangerous to test here, and in this way, whether a source allows concurrent repository processing.

The reason being, we might want to add further source options later on, and this is a hard-to-catch dependency. Maybe we should move this to pkg/apis/application/v1alpha1/types.go and make it a receiver function on the various source types, so it becomes more obvious and easier for people writing and reviewing code. For example, in some pseudo-ish code:

func (k *ApplicationSourceKustomize) AllowsConcurrentProcessing() bool {
  return len(k.Images) == 0 &&
    len(k.CommonLabels) == 0 &&
    k.NamePrefix == "" &&
    k.NameSuffix == ""
}

func (k *ApplicationSourceKsonnet) AllowsConcurrentProcessing() bool {
   return ...
}

func (a *ApplicationSource) AllowsConcurrentProcessing() bool {
   switch {
     case a.Kustomize != nil:
        return a.Kustomize.AllowsConcurrentProcessing()
      ...
   }
}

alexmt (Collaborator Author):

Good point. Implemented the change


alexmt commented Oct 26, 2020

Thank you for the review, @jannfis. I implemented your suggestion about moving AllowsConcurrentProcessing to types.go, closer to the structure definitions.

Cloning the same repository multiple times, with some optimization like shallow clones, was the first idea. I decided against it after evaluating our own use case. We have a repository with ~200+ applications (1 addon per cluster). A single commit to the repository invalidates the manifest cache of all applications, and after receiving the webhook notification the controller fires 200 manifest generation requests. Cloning the same repository 200 times definitely won't work. As you've mentioned, we might come up with some limit on clones per repo and run several experiments to find the right number of clones. I'm afraid it won't be that easy, and we would need to deal with things like GitHub/GitLab rate limiting, insufficient network bandwidth, etc.

On the other hand, the optimization in this PR already provides the best possible combination: 1 repo clone handles all 200 requests. The optimization doesn't require any significant redesign or code changes, so it is really easy to try. There is a chance that processing the same commit in parallel is enough; otherwise, we can try multiple repository clones.

What do you think?


jannfis commented Oct 26, 2020

I agree that your implementation is the most reasonable and efficient one for plain YAML manifests, and I really do like the approach of re-using the same repository for the same commit SHA to generate the manifests for reconciliation.

However, I was thinking about solving the use case of more dynamically generated resources, i.e. when using any of the generators such as Kustomize that modify the source to some extent, or when there are simultaneous commits to multiple tracked branches. But IMO your PR solves >80% of use cases, so we can think about solutions for the remaining 20% and implement them later, extending your work in this PR if possible. Maybe instead of cloning the repository multiple times, we could just recursively copy the initial clone from the Git provider on disk for applications that need to modify the source (a rough sketch of that idea follows after this comment). This would save network bandwidth and reduce pressure on the Git provider, though it still does not address the disk space requirements.

But anyway, let's go with the parallel use of the same checkout for the time being :)
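As a thought experiment, here is a rough sketch of the on-disk copy idea mentioned above (illustrative only, not part of this PR; it ignores symlinks and file permissions for brevity): recursively copy an already-fetched checkout so that an application which needs to modify the source gets its own working copy without another network clone.

package repository

import (
    "io"
    "io/fs"
    "os"
    "path/filepath"
)

// copyCheckout recursively copies src (an already-fetched repository checkout)
// into dst, so the copy can be modified without affecting the shared checkout.
func copyCheckout(src, dst string) error {
    return filepath.WalkDir(src, func(path string, d fs.DirEntry, err error) error {
        if err != nil {
            return err
        }
        rel, err := filepath.Rel(src, path)
        if err != nil {
            return err
        }
        target := filepath.Join(dst, rel)
        if d.IsDir() {
            return os.MkdirAll(target, 0o755)
        }
        in, err := os.Open(path)
        if err != nil {
            return err
        }
        defer in.Close()
        out, err := os.Create(target)
        if err != nil {
            return err
        }
        defer out.Close()
        _, err = io.Copy(out, in)
        return err
    })
}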

@jannfis jannfis left a comment (Member):

LGTM

@alexmt alexmt merged commit 5fdbe20 into argoproj:master Oct 26, 2020
@alexmt alexmt deleted the repo-performance branch October 26, 2020 20:36
@dudicoco

@alexmt I want to use the concurrent processing feature along with a helmfile custom plugin within a monorepo. Each app has the following config:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
spec:
  destination:
    namespace: my-ns
    server: https://kubernetes.default.svc.cluster.local
  project: default
  source:
    path: .
    repoURL: git@github.com:my-repo.git
    targetRevision: HEAD
    plugin:
      name: helmfile
  syncPolicy: {}

Since the plugin is triggered from the root of the repo (path: .), should I add the .argocd-allow-concurrency file to the root dir?

Thanks
