@ShahabT ShahabT commented Feb 4, 2025

What changed?

Versioning 3.1 APIs:

  • ListWorkerDeployments
  • DescribeWorkerDeployment
  • DescribeWorkerDeploymentVersion
  • SetWorkerDeploymentCurrentVersion
  • SetWorkerDeploymentRampingVersion
  • UpdateWorkerVersionMetadata
  • DeleteWorkerDeployment
  • DeleteWorkerDeploymentVersion

Why?

Adds new functionality and improves some existing APIs based on user feedback.

How did you test it?

Added unit tests and functional tests. Test coverage for the new APIs is not complete yet; more tests will follow.

Potential risks

None identified.

Documentation

None yet.

Is hotfix candidate?

No.

@ShahabT ShahabT requested a review from a team as a code owner February 4, 2025 04:52
@ShahabT ShahabT force-pushed the versioning-3.1-merge branch 4 times, most recently from 001bfbe to 49a0c78 Compare February 4, 2025 05:52
@ShahabT ShahabT force-pushed the versioning-3.1-merge branch 2 times, most recently from 83b651c to 214cd5e Compare February 4, 2025 06:39
@ShahabT ShahabT force-pushed the versioning-3.1-merge branch 6 times, most recently from 122d338 to 9140a68 Compare February 6, 2025 17:17
@@ -36,6 +36,8 @@ frontend.workerVersioningDataAPIs:
- value: true
frontend.workerVersioningWorkflowAPIs:
- value: true
system.enableDeployments:

do we want system.enableDeploymentVersions also true here?

Suggested change
- system.enableDeployments:
+ system.enableDeploymentVersions:
+ - value: true
+ system.enableDeployments:

go.mod Outdated
@@ -57,7 +57,7 @@ require (
go.opentelemetry.io/otel/sdk v1.34.0
go.opentelemetry.io/otel/sdk/metric v1.34.0
go.opentelemetry.io/otel/trace v1.34.0
- go.temporal.io/api v1.43.3-0.20250204234658-ebd7fd7e73dd
+ go.temporal.io/api v1.43.2-0.20250205220223-9fa2feca17ff

update when API merges?

@ShahabT ShahabT force-pushed the versioning-3.1-merge branch 2 times, most recently from 23fa945 to 7614673 Compare February 6, 2025 22:53
ShahabT added a commit to temporalio/api that referenced this pull request Feb 6, 2025
_**READ BEFORE MERGING:** All PRs require approval by both Server AND
SDK teams before merging! This is why the number of required approvals
is "2" and not "1"--two reviewers from the same team is NOT sufficient.
If your PR is not approved by someone in BOTH teams, it may be summarily
reverted._

<!-- Describe what has changed in this PR -->
**What changed?**
- Renamed `DeploymentSeries` -> `WorkerDeployment` and `Deployment` ->
`WorkerDeploymentVersion`.
- Added the following experimental APIs:
  - ListWorkerDeployments
  - DescribeWorkerDeployment
  - DescribeWorkerDeploymentVersion
  - SetWorkerDeploymentCurrentVersion
  - SetWorkerDeploymentRampingVersion
  - UpdateWorkerVersionMetadata
  - DeleteWorkerDeployment
  - DeleteWorkerDeploymentVersion
- Deprecated the following experimental APIs:
  - DescribeDeployment
  - ListDeployments
  - GetDeploymentReachability
  - GetCurrentDeployment
  - SetCurrentDeployment

<!-- Tell your future self why have you made these changes -->
**Why?**
These are changes and improvements we made based on user feedback from
the previous iteration.

<!-- Are there any breaking changes on binary or code level? -->
**Breaking changes**
No breaking changes, the old APIs still work until cleaned up at a later
point with all other old Versioning APIs.

<!-- If this breaks the Server, please provide the Server PR to merge
right after this PR was merged. -->
**Server PR**
temporalio/temporal#7233

---------

Co-authored-by: Shivam <57200924+Shivs11@users.noreply.github.com>
Co-authored-by: Shahab Tajik <shahab@temporal.io>
Co-authored-by: ShahabT <shahab.tajik@temporal.io>
temporal-cicd bot pushed a commit to temporalio/api-go that referenced this pull request Feb 6, 2025
ShahabT and others added 27 commits February 6, 2025 15:41
## What changed?
<!-- Describe what has changed in this PR -->
- Deployment build IDs are no longer unique across a namespace. The
combination of <DeploymentName, BuildID> is unique instead.
- Added constraints that disallow "/" in deploymentName and "__" in
buildID.
- We want the APIs to leave open the possibility of using a custom
version ID instead of deployment_name/build_id to set current or
ramping. We also want to accept the "__unversioned__" string without
having to accept an "unversioned version". To support this, refactored
the APIs to accept version strings. Refactored internal code to use the
build ID string for the build ID only, and the version string for the
fully-qualified version string only.
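
The version-string convention described above can be sketched as follows. This is a minimal illustration, not the server's code: the names `FormatVersion`, `ParseVersion`, and `Version` are hypothetical, and rejecting empty parts is an assumption. Only the `/` delimiter, the `__` constraint, and the `__unversioned__` string come from the description.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const (
	versionDelimiter   = "/"               // separates deployment name from build ID
	unversionedVersion = "__unversioned__" // reserved string named in the description
)

// Version is a hypothetical parsed form of a fully-qualified version string.
type Version struct {
	DeploymentName string
	BuildID        string
}

// FormatVersion validates both parts and joins them into a version string.
func FormatVersion(deploymentName, buildID string) (string, error) {
	if deploymentName == "" || buildID == "" {
		// Assumption: empty parts are rejected.
		return "", errors.New("deployment name and build ID must be non-empty")
	}
	if strings.Contains(deploymentName, versionDelimiter) {
		return "", fmt.Errorf("deployment name %q must not contain %q", deploymentName, versionDelimiter)
	}
	if strings.Contains(buildID, "__") {
		return "", fmt.Errorf("build ID %q must not contain \"__\"", buildID)
	}
	return deploymentName + versionDelimiter + buildID, nil
}

// ParseVersion splits a version string at the first delimiter. The second
// result is true when the string is the reserved unversioned marker.
func ParseVersion(s string) (Version, bool, error) {
	if s == unversionedVersion {
		return Version{}, true, nil
	}
	name, buildID, found := strings.Cut(s, versionDelimiter)
	if !found || name == "" || buildID == "" {
		return Version{}, false, fmt.Errorf("invalid version string %q", s)
	}
	return Version{DeploymentName: name, BuildID: buildID}, false, nil
}

func main() {
	s, err := FormatVersion("payments", "build-42")
	fmt.Println(s, err)
	fmt.Println(ParseVersion(s))
	fmt.Println(ParseVersion(unversionedVersion))
}
```

Because the deployment name cannot contain the delimiter, splitting at the first `/` is unambiguous, and because build IDs cannot contain `__`, no real version can collide with the reserved string.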

## Why?
<!-- Tell your future self why have you made these changes -->
- Versioning-3.1

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
- Existing suite of tests 

## Potential risks
<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->

## Documentation
<!-- Have you made sure this change doesn't falsify anything currently
stated in `docs/`? If significant
new behavior is added, have you described that in `docs/`? -->

## Is hotfix candidate?
<!-- Is this PR a hotfix candidate or does it require a notification to
be sent to the broader community? (Yes/No) -->

---------

Co-authored-by: Carly de Frondeville <carly.defrondeville@temporal.io>

---------

Co-authored-by: Shivam <57200924+Shivs11@users.noreply.github.com>
## What changed?
<!-- Describe what has changed in this PR -->
Updated code to use latest API changes in which Deployment Versions are
represented by string fields rather than structs.
`WorkerDeploymentVersion` proto message still exists but only for the
internal APIs.

## Why?
<!-- Tell your future self why have you made these changes -->
See temporalio/api#547.

---------

Co-authored-by: Carly de Frondeville <carly.defrondeville@temporal.io>

---------

Co-authored-by: Shivam <57200924+Shivs11@users.noreply.github.com>
Co-authored-by: Shahab Tajik <shahab@temporal.io>
Co-authored-by: Shivam Saraf <shivam.saraf@temporal.io>
Co-authored-by: ShahabT <shahab.tajik@temporal.io>
## What changed?
<!-- Describe what has changed in this PR -->
- added poller presence (called MissingTaskQueue in code) checks when a
version starts to ramp or wants to be set as current
- happy path tests for delete version have been added
- happy path tests for poller presence checks have been added
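
A rough sketch of what such a poller presence check might look like. Everything here is hypothetical: the function names, the inputs, and the assumption that the "expected" task queues are those of the previous current version. The description only states that the check runs when a version starts to ramp or is about to become current.

```go
package main

import "fmt"

// missingTaskQueues returns the task queues from prevQueues that have no
// pollers in the new version, preserving the order of prevQueues.
func missingTaskQueues(prevQueues []string, newVersionPollers map[string]int) []string {
	var missing []string
	for _, tq := range prevQueues {
		if newVersionPollers[tq] == 0 {
			missing = append(missing, tq)
		}
	}
	return missing
}

// checkPollerPresence rejects the change when any expected task queue is
// not being polled by the new version.
func checkPollerPresence(prevQueues []string, newVersionPollers map[string]int) error {
	if missing := missingTaskQueues(prevQueues, newVersionPollers); len(missing) > 0 {
		return fmt.Errorf("task queues without pollers in the new version: %v", missing)
	}
	return nil
}

func main() {
	prev := []string{"orders", "billing"}
	pollers := map[string]int{"orders": 3} // "billing" has no pollers yet
	fmt.Println(checkPollerPresence(prev, pollers))
}
```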

## Why?
<!-- Tell your future self why have you made these changes -->
- versioning-3.1

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
- NO TESTS have been added - just an initial prototype for review.

---------

Co-authored-by: Carly de Frondeville <carly.defrondeville@temporal.io>
… wf (#7240)

…kQueueFamilyData (#7234)

## What changed?
1. Pass Task Queue Types to `CheckIfTaskQueuesHavePollers` activity so
that we check pollers for only the task queue types that are registered
in that version.
2. Remove `DeploymentVersionData` from the per-type map in
`TaskQueueFamilyData`. `DeploymentVersionData` stores the same info as
`VersionLocalState`, so repeating it num_task_queue times inside the
VersionLocalState was a waste of space, and kind of confusing. I spoke
with Shahab yesterday about how `TaskQueueFamilyData` should contain
information that is specific to each task queue type within that
version. So far, the only data that is specific to the version + task
queue tuple is `first_poller_time`, so I put that in a new
`TaskQueueVersionData` struct and put it in `TaskQueueFamilyData`.
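
The resulting layout could be pictured roughly as below. The real definitions are proto messages; these Go structs are a hypothetical mirror, and the field names other than `first_poller_time` are assumptions. The helper shows how the per-type map yields the list of task queue types to pass to a poller check.

```go
package main

import (
	"fmt"
	"time"
)

// TaskQueueType is a hypothetical enum of task queue types.
type TaskQueueType int

const (
	TaskQueueTypeWorkflow TaskQueueType = iota + 1
	TaskQueueTypeActivity
)

// TaskQueueVersionData holds data specific to one (version, task queue type) pair.
type TaskQueueVersionData struct {
	FirstPollerTime time.Time
}

// TaskQueueFamilyData groups the per-type data for one task queue name.
type TaskQueueFamilyData struct {
	Types map[TaskQueueType]TaskQueueVersionData
}

// VersionLocalState keeps version-wide fields once instead of repeating
// them for every task queue.
type VersionLocalState struct {
	Version           string
	TaskQueueFamilies map[string]TaskQueueFamilyData
}

// registeredTypes lists the types registered for a task queue name in a
// fixed order, so a poller check can be limited to those types.
func (s VersionLocalState) registeredTypes(name string) []TaskQueueType {
	var types []TaskQueueType
	for _, t := range []TaskQueueType{TaskQueueTypeWorkflow, TaskQueueTypeActivity} {
		if _, ok := s.TaskQueueFamilies[name].Types[t]; ok {
			types = append(types, t)
		}
	}
	return types
}

func main() {
	state := VersionLocalState{
		Version: "payments/build-42",
		TaskQueueFamilies: map[string]TaskQueueFamilyData{
			"orders": {Types: map[TaskQueueType]TaskQueueVersionData{
				TaskQueueTypeWorkflow: {FirstPollerTime: time.Now()},
			}},
		},
	}
	fmt.Println(state.registeredTypes("orders"))
}
```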

## Why?
See explanation above.

## How did you test it?
Tested that the existing Versioning 3.1 functional tests still work, including DeleteVersion with poller presence.

## What changed?
- The initial current version of a worker deployment is now
`"__unversioned__"` instead of `""`
- When the ramping version of a worker deployment becomes unversioned,
the deployment tells all the task queues in the current version that
they now have an unversioned ramp.
- If the unversioned ramp is promoted to current version, or if the
unversioned ramping version is unset, the deployment tells all the task
queues of the previous current version that they no longer have an
unversioned ramp.
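
The transitions listed above can be modeled with a small in-memory sketch. The `deployment` type and its methods are hypothetical stand-ins, not the server's workflow code, and only the cases named in this description are handled. The error text follows the message quoted in the tests.

```go
package main

import "fmt"

const unversioned = "__unversioned__"

// deployment is a hypothetical in-memory stand-in for worker deployment state.
type deployment struct {
	current string
	ramping string // "" means no ramping version is set
	// unversionedRamp records which versions' task queues were told that
	// they have an unversioned ramp.
	unversionedRamp map[string]bool
}

func newDeployment() *deployment {
	// The initial current version is the unversioned marker, not "".
	return &deployment{current: unversioned, unversionedRamp: map[string]bool{}}
}

// setRamping sets or unsets (v == "") the ramping version.
func (d *deployment) setRamping(v string) error {
	if v == d.current {
		return fmt.Errorf("Ramping version %s is already current", v)
	}
	if d.ramping == unversioned && v != unversioned {
		// The unversioned ramp is being unset: notify current's task queues.
		delete(d.unversionedRamp, d.current)
	}
	if v == unversioned {
		// Ramp becomes unversioned: notify current's task queues.
		d.unversionedRamp[d.current] = true
	}
	d.ramping = v
	return nil
}

// setCurrent promotes v to current.
func (d *deployment) setCurrent(v string) {
	if v == unversioned && d.ramping == unversioned {
		// Unversioned ramp promoted: previous current's queues lose the ramp.
		delete(d.unversionedRamp, d.current)
	}
	if v == d.ramping {
		d.ramping = "" // assumption: promoting the ramping version ends the ramp
	}
	d.current = v
}

func main() {
	d := newDeployment()
	fmt.Println(d.setRamping(unversioned)) // rejected: already current
	d.setCurrent("app/v1")
	fmt.Println(d.setRamping(unversioned), d.unversionedRamp)
	d.setCurrent(unversioned)
	fmt.Println(d.current, d.ramping == "", d.unversionedRamp)
}
```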

## Why?
<!-- Tell your future self why have you made these changes -->

## How did you test it?
- Made sure all existing WorkerDeploymentSuite and
DeploymentVersionSuite tests pass
- Wrote new tests in WorkerDeploymentSuite for the specific edge cases
that this code handles:
- `TestSetCurrentVersion_Unversioned_NoRamp`: Test that when current
version changes from versioned -> unversioned, the task queues become
unversioned
- `TestSetCurrentVersion_Unversioned_PromoteUnversionedRamp`: Test that
when the current version changes from versioned -> unversioned and
unversioned was previously ramping, the task queues become unversioned
with no more unversioned ramp
- `TestSetRampingVersion_Unversioned_UnversionedCurrent`: Test that this
fails with "Ramping version __unversioned__ is already current" error
- `TestSetRampingVersion_Unversioned_VersionedCurrent`: Test that the
ramping version of the current version's task queues becomes unversioned

---------

Co-authored-by: Shivam <57200924+shivs11@users.noreply.github.com>
Co-authored-by: Shahab Tajik <shahab@temporal.io>
Co-authored-by: Shivam Saraf <shivam.saraf@temporal.io>
Co-authored-by: ShahabT <shahab.tajik@temporal.io>
## What changed?
<!-- Describe what has changed in this PR -->
- Changed the places in the deployment client that used
update-with-start so that they use a plain update instead

## Why?
<!-- Tell your future self why have you made these changes -->
- We don't want users to create workflows through operations like
setCurrent/setRamping. We advise users to create them first by
registering their workers
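
The difference in behavior can be illustrated with an in-memory stand-in. The `registry` type and its methods are hypothetical and do not use the Temporal SDK; they only show that a plain update fails when nothing was created beforehand, whereas worker registration is what creates the entry.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("worker deployment not found")

// registry is a hypothetical stand-in for the set of deployment workflows.
type registry struct {
	current map[string]string // deployment name -> current version
}

func newRegistry() *registry {
	return &registry{current: map[string]string{}}
}

// registerWorker creates the deployment entry if it does not exist yet.
func (r *registry) registerWorker(deployment string) {
	if _, ok := r.current[deployment]; !ok {
		r.current[deployment] = "__unversioned__"
	}
}

// setCurrent behaves like a plain update: it never creates the entry.
func (r *registry) setCurrent(deployment, version string) error {
	if _, ok := r.current[deployment]; !ok {
		return errNotFound
	}
	r.current[deployment] = version
	return nil
}

func main() {
	r := newRegistry()
	fmt.Println(r.setCurrent("payments", "payments/v1")) // fails: not registered
	r.registerWorker("payments")
	fmt.Println(r.setCurrent("payments", "payments/v1"))
}
```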

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
- Modified a number of tests; the suites now pass

## What changed?
<!-- Describe what has changed in this PR -->
Child workflows start in parent's pinned Worker Deployment Version.

## Why?
<!-- Tell your future self why have you made these changes -->
So users do not have to worry about interface compatibility between
pinned parents and children.
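
As a minimal sketch of the rule, with hypothetical type and function names: a pinned parent passes its version on to the child. What happens for a non-pinned parent is not stated here, so the fallback to the current version is an assumption.

```go
package main

import "fmt"

type versioningBehavior int

const (
	autoUpgrade versioningBehavior = iota
	pinned
)

// workflowInfo is a hypothetical summary of a workflow's versioning state.
type workflowInfo struct {
	behavior versioningBehavior
	version  string // version the workflow currently runs on
}

// childStartVersion picks the version a new child workflow starts on.
func childStartVersion(parent workflowInfo, currentVersion string) string {
	if parent.behavior == pinned {
		return parent.version // child stays with the pinned parent
	}
	return currentVersion // assumption: non-pinned parents follow normal routing
}

func main() {
	parent := workflowInfo{behavior: pinned, version: "app/v1"}
	fmt.Println(childStartVersion(parent, "app/v2"))
}
```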

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
Added new tests.

## What changed?
<!-- Describe what has changed in this PR -->
Syncing TQ User Data for all types of a single TQ name at once to reduce
the SetCurrent and SetRamping latency and flakiness of tests.

---------

Co-authored-by: Carly de Frondeville <carly.defrondeville@temporal.io>
## What changed?
<!-- Describe what has changed in this PR -->
- make deletes idempotent
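
A minimal sketch of what an idempotent delete means in practice, using a hypothetical in-memory `versionStore`: deleting something that is already gone succeeds, so a retried request gets the same result as the first one. Rejecting an empty name is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// versionStore is a hypothetical in-memory set of deployment versions.
type versionStore struct {
	versions map[string]struct{}
}

// deleteVersion removes v if present and succeeds if v is already absent.
func (s *versionStore) deleteVersion(v string) error {
	if v == "" {
		return errors.New("version must be non-empty")
	}
	delete(s.versions, v) // deleting a missing key is a no-op in Go
	return nil
}

func main() {
	s := &versionStore{versions: map[string]struct{}{"app/v1": {}}}
	fmt.Println(s.deleteVersion("app/v1")) // first attempt
	fmt.Println(s.deleteVersion("app/v1")) // retry gives the same result
}
```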

## Why?
<!-- Tell your future self why have you made these changes -->

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
- added functional tests 
- also changed the functional tests to use a stricter consistency
check

…oyments excludes closed (#7267)

## What changed?
Make describe on a closed workflow return not found, and add tests

---------

Co-authored-by: ShahabT <shahab.tajik@temporal.io>
Co-authored-by: Shivam Saraf <shivam.saraf@temporal.io>
## What changed?
Make updates and signals use args
Try to delete oldest to newest version if we hit limit
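
One way to picture the oldest-to-newest deletion is the sketch below. The `version` type, the `Deletable` flag, and `makeRoom` are hypothetical; what makes a version deletable is not specified here, so it is reduced to a boolean.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// version is a hypothetical record of a deployment version.
type version struct {
	ID        string
	CreatedAt int64 // creation time as Unix seconds
	Deletable bool  // assumption: e.g. not current and not ramping
}

// makeRoom returns the versions unchanged while below the limit. At the
// limit it walks from oldest to newest and removes the first deletable one.
func makeRoom(versions []version, limit int) ([]version, error) {
	if len(versions) < limit {
		return versions, nil
	}
	sorted := append([]version(nil), versions...) // copy before sorting
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].CreatedAt < sorted[j].CreatedAt
	})
	for i, v := range sorted {
		if v.Deletable {
			return append(sorted[:i], sorted[i+1:]...), nil
		}
	}
	return nil, errors.New("version limit reached and no version can be deleted")
}

func main() {
	versions := []version{
		{ID: "a", CreatedAt: 30, Deletable: true},
		{ID: "b", CreatedAt: 10, Deletable: false},
		{ID: "c", CreatedAt: 20, Deletable: true},
	}
	fmt.Println(makeRoom(versions, 3))
}
```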

## Why?
<!-- Tell your future self why have you made these changes -->

## How did you test it?
New functional test

---------

Co-authored-by: Shivam <57200924+shivs11@users.noreply.github.com>
Co-authored-by: Shahab Tajik <shahab@temporal.io>
Co-authored-by: Shivam Saraf <shivam.saraf@temporal.io>
Co-authored-by: ShahabT <shahab.tajik@temporal.io>
## What changed?
<!-- Describe what has changed in this PR -->
- Skip drainage

## Why?
<!-- Tell your future self why have you made these changes -->
- Versioning-3.1

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
- Added a new test which passes when poller_history config is changed

---------

Co-authored-by: ShahabT <shahab.tajik@temporal.io>
Co-authored-by: Carly de Frondeville <carly.defrondeville@temporal.io>
## What changed?
<!-- Describe what has changed in this PR -->
- reset drainageInfo to nil after workflow starts accepting new
workflows again

## Why?
<!-- Tell your future self why have you made these changes -->

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
- existing suite

---------

Co-authored-by: Carly de Frondeville <carly.defrondeville@temporal.io>
…7268)

## What changed?
Use MutableSideEffect to access dynamic config from inside workflows.

## Why?
To prevent non-determinism in the workflow code.
To allow the workflows to use the latest dynamic config values at all
times.
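
The record-and-replay idea behind this can be sketched without the SDK. The `recorder` type below is a hypothetical simulation, not the SDK's `MutableSideEffect`: it records every value it reads, while the real function only writes a new marker when the value changes. The point it illustrates is the same: on replay the workflow sees the values from history, not whatever the config holds now.

```go
package main

import "fmt"

// recorder is a hypothetical stand-in for a workflow's event history.
type recorder struct {
	history []int // values recorded by earlier executions, in call order
	next    int   // position of the next call within history
}

// get returns the recorded value while replaying; once the history is
// exhausted it reads a fresh value and records it.
func (r *recorder) get(read func() int) int {
	if r.next < len(r.history) {
		v := r.history[r.next]
		r.next++
		return v
	}
	v := read()
	r.history = append(r.history, v)
	r.next++
	return v
}

func main() {
	limit := 5 // stands in for a dynamic config value
	first := &recorder{}
	fmt.Println(first.get(func() int { return limit }))

	limit = 7 // the config changes between calls
	fmt.Println(first.get(func() int { return limit }))

	// A replay of the same history sees the recorded values again, even
	// though the live config now differs.
	limit = 99
	replay := &recorder{history: first.history}
	fmt.Println(replay.get(func() int { return limit }))
	fmt.Println(replay.get(func() int { return limit }))
}
```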

## How did you test it?
Tests are failing for other reasons, so I'm unable to test until I pull
in those changes.

---------

Co-authored-by: ShahabT <shahab.tajik@temporal.io>
@ShahabT ShahabT force-pushed the versioning-3.1-merge branch from 3073aa4 to ea3e879 Compare February 6, 2025 23:44
@ShahabT ShahabT merged commit ea3e879 into main Feb 7, 2025
50 checks passed
@ShahabT ShahabT deleted the versioning-3.1-merge branch February 7, 2025 00:09
3 participants