
Dual ATC setup cannot register worker #2085

@troykinsella

Description


Configuration Assistance Request

Note: below I refer to "ATC" and "TSA" interchangeably, as I'm running the binaries (where the web command runs both).

When I configure two ATCs with the --peer-url option of the web command, I get the following errors whenever a worker attempts to register or forward.

From atc1 (and also forwarded to worker logs):

{"timestamp":"...","source":"tsa","message":"tsa.connection.channel.forward-worker.register.failed-to-fetch-containers","log_level":2,"data":{"error":"Get http://api/containers: dial tcp <ip-of-atc2>:38775: getsockopt: connection refused","remote":"<ip-of-proxy>:52066","session":"1.1.1.5"}}
{"timestamp":"...","source":"tsa","message":"tsa.connection.channel.forward-worker.register.failed-to-list-volumes","log_level":2,"data":{"error":"Get http://concourse-atc2:33366/volumes: dial tcp <ip-of-atc2>:33366: getsockopt: connection refused","remote":"<ip-of-proxy>:52066","session":"1.1.1.5"}}

Some Facts

  • Using haproxy to round-robin TSAs
  • The two ATCs, the proxy, and workers live in the same private network, each on their own AWS EC2 instance
  • Every *-bind-ip option available to the web and worker commands is set to 0.0.0.0. Not sure if this is necessary, but I'm just trying to open everything up to get things working.
  • The TSAs are listening on the default port: 2222
  • I am running several separate Concourse installations through one proxy instance, differentiated by TSA port number. For example, the proxy listens on port 2224 and forwards to the default 2222 on the TSA host.
  • Host names are mapped correctly according to the above error messages (which show both the host name along with the correct IP)
  • When both ATCs are brought up, peered, and no worker is attempting to register/forward, no errors occur, which I think tells me that the --peer-url values are correct, and the ATCs have access to each other over 8080
  • I can prove that atc1 has connectivity to atc2, and vice versa, using telnet to connect and nc to listen on an example ephemeral port from one of the above error messages
  • If relevant, I can prove the same connectivity between the proxy, ATCs/TSAs, and workers in all directions
  • When I remove the --peer-url option and run only one ATC/TSA, the worker is able to register/forward successfully and show up in fly workers, which tells me that proxy config and ssh keys are correct
  • When I remove the proxy from the equation entirely (again running two peered ATCs/TSAs), set --external-url to point at atc1 (not the proxy), and point the worker directly at either of the TSAs, the same errors occur (only the "remote" value differs). This suggests I've done something fundamentally incorrect: multiple ATCs/TSAs should provide redundancy so that service continues when one is unavailable, but that's not happening.
  • When I run without any proxy and only one ATC/TSA, the worker shows up in fly workers
  • In between changing any concourse command options to perform any of the above tests, I am truncating the workers table
  • The AWS security group associated with every involved EC2 instance grants connectivity over all ports and all protocols, and the subnet CIDRs to which any EC2 instance is member is listed in the source
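The telnet/nc connectivity checks above can be sketched as a small self-contained script. This is a minimal sketch using Python's standard socket module; a real run would substitute concourse-atc1/concourse-atc2 and the ephemeral port from the error message for the localhost listener used here:

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (like telnet)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Listen on an ephemeral port (as `nc -l` would) and probe it,
# mirroring the check run between atc1 and atc2.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # 0 = kernel picks an ephemeral port
listener.listen(1)
port = listener.getsockname()[1]

print(can_connect("127.0.0.1", port))  # True while the listener is up
listener.close()
```

Pointing `can_connect` at the host/port pairs from the error logs (e.g. `<ip-of-atc2>:38775`) reproduces the same "connection refused" distinction the logs show.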

Relevant Configuration

# Concourse web #1:
--peer-url http://concourse-atc2:8080
--external-url https://<public-host-name-of-proxy>
--tsa-bind-ip 0.0.0.0

# Concourse web #2:
--peer-url http://concourse-atc1:8080
--external-url https://<public-host-name-of-proxy>
--tsa-bind-ip 0.0.0.0

# Worker:
--tsa-host <internal-host-name-of-proxy>
--tsa-port 2224 
--bind-ip 0.0.0.0
--garden-bind-ip 0.0.0.0
--baggageclaim-bind-ip 0.0.0.0

# Proxy config for TSAs:
frontend concourse-tsa *:2224
	mode tcp
	option tcplog
	default_backend concourse-tsa-pool
backend concourse-tsa-pool
	mode tcp
	option tcplog
	balance	roundrobin
	server concourse-atc1 concourse-atc1:2222 check
	server concourse-atc2 concourse-atc2:2222 check
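For reference, the `balance roundrobin` line above means successive TSA connections alternate between the two backends, so a reconnecting worker may land on either TSA. A minimal sketch of that selection behavior (backend names taken from the config above; this models only the rotation, not haproxy's health checks):

```python
from itertools import cycle

# The two TSA backends from the haproxy pool above.
backends = ["concourse-atc1:2222", "concourse-atc2:2222"]
rr = cycle(backends)

# Four successive worker connections alternate between the TSAs.
assignments = [next(rr) for _ in range(4)]
print(assignments)
```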

Stuff

  • Concourse version: Tried 3.8.0, 3.9.1
  • Deployment type (BOSH/Docker/binary): binary
  • Infrastructure/IaaS: AWS
  • Browser (if applicable): Chromium on Linux
  • Did this used to work? Not for me

Summary

So, given all of this, I'm led to believe that I've provided adequate connectivity between these Concourse parts, such that the ATCs/TSAs should be able to talk to each other and allow successful worker registration/forwarding. My wild-guess theory is that I've somehow misconfigured the ATCs/TSAs such that one ATC/TSA is not listening on the ephemeral port for querying/manipulating containers and volumes when the other tries to connect. Or I've simply made a number of id10t errors.

I'd appreciate some guidance. What might I have done wrong given these clues?
