Rollback via UI is slow where it doesn't have to be?

# Summary

**Background:**
In our current setup, some of our kustomize-based apps take a long time to generate the manifests.
E.g. our heaviest app `kustomize build` takes 30s locally and up to 1m when executed in reposerver.

We're okay with being slow to deploy _new_ changes (because we use heavy KRM functions with kustomize).

**Problem:**
However, _rollbacks_ via the UI also take a long time (at least 2x longer), which is a problem for us.

This is strange because no new information is involved in rolling back to a previously experienced revision, so I'd presume reposerver should cache hit everything from redis.

Upon investigating, it turns out the main bit of rollback itself does indeed hit cache.
But two other steps involved in rollback via UI each lead to cache miss and trigger two full re-generation of manifests in reposerver, making the whole rollback process slow.

**Investigation:**
Process of a rollback via UI can be summarized as:
1. User clicks and opens "HISTORY AND ROLLBACK".
2. User browses and chooses the revision from the history list to rollback to.
   Two requests are made for each browsed revision:
   - `GET revision metadata`: gives info about revision commit author and commit message
   - `POST appdetails`: returns info including list of images
3. User clicks "Rollback" and accepts to disable auto-sync in order to proceed
   In this process, couple of requests are made which include:
   - `PUT application`: to disable auto-sync
   - `POST rollback`: to initiate the rollback itself

Cache miss (and thus regeneration of manifests) occurs in `POST appdetails` and `PUT application`.

- `PUT application`: Cache miss occurs at [verifyGenerateManifests()](https://github.com/argoproj/argo-cd/blob/a12f517ebd00627efba5561192adeadcfa773e74/util/argo/argo.go#L714), because the code always uses the latest commit of the repo as cache key, and commits stack up very frequently in a typical gitops monorepo setup: https://github.com/argoproj/argo-cd/blob/a12f517ebd00627efba5561192adeadcfa773e74/util/argo/argo.go#L814-L816

- `POST appdetails`: Cache miss occurs because appdetails info incorporates data from generated manifests, but doesn't reuse the cache already set before by GenerateManifest().

# Motivation

I think code can be improved so that the two cache misses above be avoided.
This would allow us to respond much more quickly to a faulty deployment by being able to rollback more quickly.

# Proposal

First, cache miss in `PUT application` occurs because the cache key changes too frequently.

But I think it can be avoided by just [specifying `?validate=false` as query params to the PUT request](https://github.com/dlemfh/argo-cd/commit/02188377f035c2cb5af496c7c68df6487f4ee6c2) at the frontend, thus skipping the call to verifyGenerateManifests() entirely.

There's no need to verify/validate when all we're doing is disabling auto-sync. This behavior would be consistent with the fact that the frontend already adds `?validate=false` to PUT requests triggered via the "ENABLE AUTO-SYNC" and "DISABLE AUTO-SYNC" buttons in the application details page.

Next, for the case of `POST appdetails`, I think code in [populateKustomizeAppDetails()](https://github.com/argoproj/argo-cd/blob/a12f517ebd00627efba5561192adeadcfa773e74/reposerver/repository/repository.go#L2191) can be improved to get manifest data from redis cache instead of calling Build().

But I do have to say I'm not too confident on my understanding of this part of the code, so I'm hoping someone could cross-check this or suggest a better option maybe.

Thanks!! 🤗 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Rollback via UI is slow where it doesn't have to be? #22898

Summary

Motivation

Proposal

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

	// Only check whether we can access the application's path,
	// and not whether it actually contains any manifests.
	_, err = repoClient.GenerateManifest(ctx, &req)

Rollback via UI is slow where it doesn't have to be? #22898

Description

Summary

Motivation

Proposal

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions