Description
The problem occurs in any setup where linkerd acts as a gRPC proxy and some requests get canceled. It does not matter how the mesh is wired, whether it uses the daemonset transformer, namerd for resolution, or static dtabs. The essence is that when we have the setup [strest-grpc client] -----> [linkerd] -----> [strest-grpc server] and the client is configured to intentionally fail some streams to exercise this code path, we end up with a memory leak in linkerd that eventually crashes it.

The problem is actually not in linkerd but in strest-grpc, namely in the underlying grpc-go library, which has a pretty annoying issue that was fixed a year ago. If you simply bump the strest-grpc image from 0.0.5 to latest, the leak disappears.

What is happening is that the bug in grpc-go caused streams to hang in the HALF_CLOSED state, because no RST_STREAM frame was sent to the server when an END_STREAM was received from the client. The net effect is that rcvMsg was never failed and was left hanging, causing an accumulation of data frames and a bunch of other stream bookkeeping state. You can read more about the situation in grpc/grpc-go#2354.
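
To make the failure mode concrete, here is a minimal Go sketch (not part of strest-grpc; the address, method path, and stream descriptor are placeholders) of the client-side pattern that exercises this code path. On a fixed grpc-go, cancelling the context puts RST_STREAM(CANCEL) on the wire, so linkerd and the server can drop their per-stream state; on the buggy 0.0.5-era version the stream could instead linger in HALF_CLOSED and the peer's receive path never failed:

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
)

func main() {
	// Dial the proxy the way the strest-grpc client does (host:port assumed).
	conn, err := grpc.Dial("localhost:5757", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Open a bidirectional stream; the method path below is hypothetical.
	ctx, cancel := context.WithCancel(context.Background())
	desc := &grpc.StreamDesc{ClientStreams: true, ServerStreams: true}
	if _, err := conn.NewStream(ctx, desc, "/strest.Responder/StreamingGet"); err != nil {
		log.Fatal(err)
	}

	// Intentionally fail the stream mid-flight, as --errorRate 0.5 does.
	time.Sleep(100 * time.Millisecond)
	cancel() // on a fixed grpc-go this emits RST_STREAM(CANCEL)
}
```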
YAML file reproducing the issue:
```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: l5d-config7
data:
  config.yaml: |-
    admin:
      ip: 0.0.0.0
      port: 9997
    routers:
    - protocol: h2
      experimental: true
      label: outgoing
      dtab: /svc/* => /$/inet/strest-server7.test.svc.cluster.local/7777;
      identifier:
        kind: io.l5d.header.path
        segments: 2
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.daemonset
          namespace: test
          port: incoming
          service: l5d7
      servers:
      - port: 5757
        ip: 0.0.0.0
    - protocol: h2
      experimental: true
      label: incoming
      dtab: /svc/* => /$/inet/strest-server7.test.svc.cluster.local/7777;
      identifier:
        kind: io.l5d.header.path
        segments: 2
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.localnode
      servers:
      - port: 6767
        ip: 0.0.0.0
    telemetry:
    - kind: io.l5d.prometheus
    usage:
      orgId: integration-test-7
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: l5d7
  name: l5d7
spec:
  template:
    metadata:
      labels:
        app: l5d7
        testrun: test7
    spec:
      volumes:
      - name: l5d-config7
        configMap:
          name: "l5d-config7"
      containers:
      - name: l5d
        image: buoyantio/linkerd:1.6.4-rc1
        env:
        - name: JVM_HEAP_MIN
          value: 384M
        - name: JVM_HEAP_MAX
          value: 384M
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        args:
        - /io.buoyant/linkerd/config/config.yaml
        ports:
        - name: outgoing
          containerPort: 5757
          hostPort: 5757
        - name: incoming
          containerPort: 6767
        - name: admin
          containerPort: 9997
        - name: debug
          containerPort: 8849
          hostPort: 8849
        volumeMounts:
        - name: "l5d-config7"
          mountPath: "/io.buoyant/linkerd/config"
          readOnly: true
      - name: kubectl
        image: buoyantio/kubectl:v1.8.5
        args: ["proxy", "-p", "8001"]
---
apiVersion: v1
kind: Service
metadata:
  name: l5d7
spec:
  selector:
    app: l5d7
  type: LoadBalancer
  ports:
  - name: outgoing
    port: 5757
  - name: incoming
    port: 6767
  - name: admin
    port: 9997
  - name: debug
    port: 8849
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: strest-server7
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: strest-server7
        testrun: test7
    spec:
      dnsPolicy: ClusterFirst
      containers:
      - name: service
        image: buoyantio/strest-grpc:0.0.5
        command:
        - "/go/bin/strest-grpc"
        - "server"
        - "--address=0.0.0.0:7777"
        - "--metricAddr=0.0.0.0:9997"
        ports:
        - name: grpc
          containerPort: 7777
        - name: strest-server
          containerPort: 9997
---
apiVersion: v1
kind: Service
metadata:
  name: strest-server7
spec:
  selector:
    app: strest-server7
  clusterIP: None
  ports:
  - name: grpc
    port: 7777
---
apiVersion: batch/v1
kind: Job
metadata:
  name: strest-client7
spec:
  template:
    metadata:
      name: strest-client7
      labels:
        testrun: test7
    spec:
      containers:
      - name: strest-client
        image: buoyantio/strest-grpc:0.0.5
        env:
        - name: HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        command:
        - "/bin/sh"
        args:
        - "-c"
        - |
          sleep 30 # wait for pods to start
          /go/bin/strest-grpc client \
            --address $HOST_IP:5757 \
            --connections 25 \
            --streams 1 \
            --interval 30s \
            --streaming \
            --errorRate 0.5 \
            --metricAddr 0.0.0.0:9997 \
            --latencyPercentiles 0=0,100=500
        ports:
        - name: strest-client
          containerPort: 9997
      restartPolicy: OnFailure
```
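
While the failing setup runs, the leak is easiest to see on linkerd's admin port (9997 in the config above). Below is a small watcher sketch, assuming the standard Finagle /admin/metrics.json endpoint and its jvm/mem/current/used gauge, and a locally reachable admin address:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

func main() {
	for {
		// Point this at the l5d7 admin port; localhost is assumed here.
		resp, err := http.Get("http://localhost:9997/admin/metrics.json")
		if err != nil {
			fmt.Println("poll failed:", err)
		} else {
			var metrics map[string]float64
			if err := json.NewDecoder(resp.Body).Decode(&metrics); err == nil {
				// With the 0.0.5 client this climbs steadily until the JVM dies.
				fmt.Printf("heap used: %.0f bytes\n", metrics["jvm/mem/current/used"])
			}
			resp.Body.Close()
		}
		time.Sleep(10 * time.Second)
	}
}
```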
YAML file that does not exhibit the issue:
```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: l5d-config7
data:
  config.yaml: |-
    admin:
      ip: 0.0.0.0
      port: 9997
    routers:
    - protocol: h2
      experimental: true
      label: outgoing
      dtab: /svc/* => /$/inet/strest-server7.test.svc.cluster.local/7777;
      identifier:
        kind: io.l5d.header.path
        segments: 2
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.daemonset
          namespace: test
          port: incoming
          service: l5d7
      servers:
      - port: 5757
        ip: 0.0.0.0
    - protocol: h2
      experimental: true
      label: incoming
      dtab: /svc/* => /$/inet/strest-server7.test.svc.cluster.local/7777;
      identifier:
        kind: io.l5d.header.path
        segments: 2
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.localnode
      servers:
      - port: 6767
        ip: 0.0.0.0
    telemetry:
    - kind: io.l5d.prometheus
    usage:
      orgId: integration-test-7
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: l5d7
  name: l5d7
spec:
  template:
    metadata:
      labels:
        app: l5d7
        testrun: test7
    spec:
      volumes:
      - name: l5d-config7
        configMap:
          name: "l5d-config7"
      containers:
      - name: l5d
        image: buoyantio/linkerd:1.6.4-rc1
        env:
        - name: JVM_HEAP_MIN
          value: 384M
        - name: JVM_HEAP_MAX
          value: 384M
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        args:
        - /io.buoyant/linkerd/config/config.yaml
        ports:
        - name: outgoing
          containerPort: 5757
          hostPort: 5757
        - name: incoming
          containerPort: 6767
        - name: admin
          containerPort: 9997
        - name: debug
          containerPort: 8849
          hostPort: 8849
        volumeMounts:
        - name: "l5d-config7"
          mountPath: "/io.buoyant/linkerd/config"
          readOnly: true
      - name: kubectl
        image: buoyantio/kubectl:v1.8.5
        args: ["proxy", "-p", "8001"]
---
apiVersion: v1
kind: Service
metadata:
  name: l5d7
spec:
  selector:
    app: l5d7
  type: LoadBalancer
  ports:
  - name: outgoing
    port: 5757
  - name: incoming
    port: 6767
  - name: admin
    port: 9997
  - name: debug
    port: 8849
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: strest-server7
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: strest-server7
        testrun: test7
    spec:
      dnsPolicy: ClusterFirst
      containers:
      - name: service
        image: buoyantio/strest-grpc:latest
        args:
        - "server"
        - "--address=0.0.0.0:7777"
        - "--metricAddr=0.0.0.0:9997"
        ports:
        - name: grpc
          containerPort: 7777
        - name: strest-server
          containerPort: 9997
---
apiVersion: v1
kind: Service
metadata:
  name: strest-server7
spec:
  selector:
    app: strest-server7
  clusterIP: None
  ports:
  - name: grpc
    port: 7777
---
apiVersion: batch/v1
kind: Job
metadata:
  name: strest-client7
spec:
  template:
    metadata:
      name: strest-client7
      labels:
        testrun: test7
    spec:
      containers:
      - name: strest-client
        image: buoyantio/strest-grpc:latest
        env:
        - name: HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        command:
        - "/bin/sh"
        args:
        - "-c"
        - |
          sleep 30 # wait for pods to start
          /strest-grpc/strest-grpc client \
            --address=$HOST_IP:5757 \
            --connections=25 \
            --streams=1 \
            --interval=30s \
            --streaming \
            --errorRate=0.5 \
            --metricAddr=0.0.0.0:9997
        ports:
        - name: strest-client
          containerPort: 9997
      restartPolicy: OnFailure
```
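
For comparison: the linkerd configuration in the two manifests is identical. The only material differences are the strest-grpc image tag (0.0.5 vs latest), the binary invocation that changed with the newer image, and the dropped --latencyPercentiles flag, which is consistent with the leak living in the old grpc-go rather than in linkerd.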