Skip to content

Conversation

Harsh9485
Copy link
Contributor

👋 Hey team,

Submitting my solution for the Temporal worker connectivity failure detection bounty. This implementation provides a minimal and reproducible test setup, along with a CRE-format rule to detect a worker connection failure when the Temporal server is unreachable.


📋 Details

/claim #39

Reproducible test setup (Maintainers invited): Temporl
Live CRE link: CRE Playground
Log file: worker.log


🎥 Video Demonstration

A complete test demo showing worker failing to connect while server is down and CRE rule triggering:
TemporalWorker.RefusedConnection.mp4

Screen.Recording.2025-06-05.154406.mp4

🚨 What This Detects

This rule captures a critical failure mode in self-hosted Temporal environments:

  • When a worker starts up but the server is unreachable (crashed, stopped, or not ready), the gRPC connection is refused.
  • This results in the worker silently failing to perform workflows, blocking business operations.

🧪 Reproduction Process

Run the following steps to simulate the issue:

# Start Temporal via Docker Compose
docker compose up -d

# Stop the Temporal server
docker compose stop temporal

# Run or restart the worker
go run worker/main.go  # Or your binary

# (Optional) Start the server again after a few seconds
docker compose start temporal

# View the log
cat worker_test.log

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants