Description:
A colleague and I found that a subtle mistake in a single BackendTrafficPolicy
can make Envoy proxy instances return 404s for ALL routes.
```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: hellogo
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: hello
  retry:
    numRetries: 1
    perRetry:
      backOff:
        baseInterval: 0s # 0s breaks everything, 1s is ok
```
Repro steps:
Create a BackendTrafficPolicy as shown above. Nothing stops a developer from setting `baseInterval: 0s`.
At first, nothing is wrong. Then, if you restart the Envoy proxies, you'll find that ALL HTTPRoutes return 404s immediately. The logs show `route_not_found`
for all requests but give no hint of why, or of which resource causes it. When we inspect the raw Envoy config via the admin interface, the `dynamic_route_configs`
section is never generated (normally it's populated).
To find the offending resource, we had to delete resources until we discovered that the culprit was this one BackendTrafficPolicy,
and this one value within it. Pretty scary to us.
Questions:
- What values should be allowed in `baseInterval`?
- What validation can be done to stop misconfigurations like this?
- Supposing there are other (perhaps future) resource misconfigurations or validation issues, how can they be scoped so they don't break all routes?
- How can a user identify the problematic resources, either in envoy-gateway or in the Envoy proxy? Here we had to guess and test.
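As a rough illustration of how the second and third questions might be approached, here is a Go sketch that parses `baseInterval` with the standard library's `time.ParseDuration`, rejects values that are not strictly positive, and drops only the offending policy while reporting its namespace/name, so the remaining routes could still receive valid config. The `Policy` type and the `validateBackOff` and `partition` functions are hypothetical; they are not Envoy Gateway's real API types or code. A real controller would presumably also surface the error on the policy's status, which is not shown here.

```go
package main

import (
	"fmt"
	"time"
)

// Policy is a hypothetical, trimmed-down view of a BackendTrafficPolicy,
// not the real Envoy Gateway API type.
type Policy struct {
	Namespace    string
	Name         string
	BaseInterval string // e.g. "1s"; empty means no back-off configured
}

// validateBackOff rejects a baseInterval that fails to parse or is not > 0.
func validateBackOff(p Policy) error {
	if p.BaseInterval == "" {
		return nil
	}
	d, err := time.ParseDuration(p.BaseInterval)
	if err != nil {
		return fmt.Errorf("%s/%s: invalid baseInterval %q: %w", p.Namespace, p.Name, p.BaseInterval, err)
	}
	if d <= 0 {
		return fmt.Errorf("%s/%s: baseInterval must be greater than 0, got %q", p.Namespace, p.Name, p.BaseInterval)
	}
	return nil
}

// partition keeps valid policies and collects one named error per invalid
// policy, so one bad policy is skipped and reported instead of breaking all routes.
func partition(policies []Policy) (valid []Policy, errs []error) {
	for _, p := range policies {
		if err := validateBackOff(p); err != nil {
			errs = append(errs, err)
			continue
		}
		valid = append(valid, p)
	}
	return valid, errs
}

func main() {
	valid, errs := partition([]Policy{
		{Namespace: "default", Name: "hellogo", BaseInterval: "0s"},
		{Namespace: "default", Name: "other", BaseInterval: "1s"},
	})
	fmt.Println(len(valid))
	for _, e := range errs {
		fmt.Println(e)
	}
}
```

Running it prints `1` followed by an error naming `default/hellogo`, which is the kind of pointer to the offending resource we were missing. Rejecting `0s` earlier, at admission time, would be the complementary option.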
Environment:
envoy-gateway: v1.2.5
Logs: