
Conversation

hzxuzhonghu
Member

@hzxuzhonghu hzxuzhonghu commented Oct 31, 2020

This is based on #28363, with additional fixes: adjust the listener filter order, route local net traffic, and add an integration test.

Fix: #5679
For more details: #28363 (comment)

RFC: https://docs.google.com/document/d/1fd9XWe755XtwNJCxpQe19oU1K6VR_wECB8uerWyu1xQ/edit?usp=sharing
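For context, the TPROXY interception this change builds on marks proxied packets and routes them back over loopback. A minimal sketch of that plumbing, drawn from the istio-init output shown later in this thread (the mark and table number are Istio defaults, not part of this PR's diff):

# Send packets carrying mark 1337 to routing table 133, which delivers everything locally on lo.
ip -f inet rule add fwmark 1337 lookup 133
ip -f inet route add local default dev lo table 133
# route_localnet must be enabled for traffic routed to loopback addresses to be accepted;
# this is the sysctl checked further down in this thread.
sysctl -w net.ipv4.conf.all.route_localnet=1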

@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Oct 31, 2020
@google-cla google-cla bot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Oct 31, 2020
@istio-testing istio-testing added needs-rebase Indicates a PR needs to be rebased before being merged size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 31, 2020
@istio-testing
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Oct 31, 2020
@hzxuzhonghu
Member Author

/test all

@hzxuzhonghu hzxuzhonghu changed the title Original src ip Preserve original src ip Nov 2, 2020
@hzxuzhonghu hzxuzhonghu marked this pull request as ready for review November 2, 2020 03:35
@hzxuzhonghu hzxuzhonghu requested review from a team as code owners November 2, 2020 03:35
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Nov 2, 2020
@hzxuzhonghu hzxuzhonghu requested a review from a team as a code owner November 2, 2020 03:55
@hzxuzhonghu
Member Author

cc @gmemcc @rlenglet @howardjohn

@hzxuzhonghu
Member Author

/retest

@hzxuzhonghu
Member Author

/retest

@google-cla google-cla bot added cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. and removed cla: no Set by the Google CLA bot to indicate the author of a PR has not signed the Google CLA. labels Dec 11, 2020
@hzxuzhonghu
Member Author

@howardjohn can you take a look

Member

@howardjohn howardjohn left a comment

LGTM other than iptables changes

@@ -432,9 +413,10 @@ func (iptConfigurator *IptablesConfigurator) run() {
// Avoid infinite loops. Don't redirect Envoy traffic directly back to
// Envoy for non-loopback traffic.
iptConfigurator.iptables.AppendRuleV4(constants.ISTIOOUTPUT, constants.NAT, "-m", "owner", "--uid-owner", uid, "-j", constants.RETURN)
}
} else {
// TPROXY uses GID
Member

I am not sure this is safe; we may use GID in iptables REDIRECT mode as well. Also, why don't we need the split anymore? I think it can be a list.
cc @JimmyCYJ

Contributor

I had added GID matching in the first place because it was required for TPROXY. It's not used for REDIRECT AFAIK.

Member

I think https://www.hyrumslaw.com/ may be in play; I am fairly sure @JimmyCYJ is depending on this. It's fine to say "don't do that", and I agree, but we should be explicit about doing it, with a release note, etc.

@@ -104,9 +104,6 @@ func TestHandleInboundIpv6RulesWithEmptyInboundPorts(t *testing.T) {
"ip6tables -t nat -A ISTIO_OUTPUT -o lo ! -d ::1/128 -m owner --uid-owner 1337 -j ISTIO_IN_REDIRECT",
"ip6tables -t nat -A ISTIO_OUTPUT -o lo -m owner ! --uid-owner 1337 -j RETURN",
"ip6tables -t nat -A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN",
"ip6tables -t nat -A ISTIO_OUTPUT -o lo ! -d ::1/128 -m owner --gid-owner 1337 -j ISTIO_IN_REDIRECT",
Member

I would prefer not to mess with the iptables rules; just change the TPROXY stuff here.

"iptables -t nat -A ISTIO_OUTPUT -m owner --gid-owner 2 -j RETURN",
"iptables -t nat -A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --uid-owner 3,4 -j ISTIO_IN_REDIRECT",
"iptables -t nat -A ISTIO_OUTPUT -o lo -m owner ! --uid-owner 3,4 -j RETURN",
"iptables -t nat -A ISTIO_OUTPUT -m owner --uid-owner 3,4 -j RETURN",
Member

Is --uid-owner 3,4 valid syntax?

Contributor

No, it's not.
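The owner match takes a single UID (or a contiguous range) per rule rather than a comma-separated list, so multiple proxy UIDs have to be expanded into separate rules. A minimal sketch, using hypothetical UIDs 3 and 4:

iptables -t nat -A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --uid-owner 3 -j ISTIO_IN_REDIRECT
iptables -t nat -A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --uid-owner 4 -j ISTIO_IN_REDIRECT
iptables -t nat -A ISTIO_OUTPUT -m owner --uid-owner 3 -j RETURN
iptables -t nat -A ISTIO_OUTPUT -m owner --uid-owner 4 -j RETURN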

ISTIOREDIRECT = "ISTIO_REDIRECT"
ISTIOINREDIRECT = "ISTIO_IN_REDIRECT"
ISTIOOUTPUT = "ISTIO_OUTPUT"
ISTIOPROXYOUTPUT = "ISTIO_PROXY_OUTPUT"
Member

If this is for TPROXY only, should it be ISTIO_TPROXY_OUTPUT?

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Dec 11, 2020
@hzxuzhonghu
Member Author

Agreed that messing everything up is not very friendly to review. I will revert the iptables refactor and some other unrelated changes, and only keep the changes related to preserving the original source IP.

@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Dec 13, 2020
@hzxuzhonghu
Member Author

/retest

Member

@howardjohn howardjohn left a comment

LGTM! Thanks for pushing this through and all the improvements you made

@hi-usui

hi-usui commented Dec 14, 2020

/retest

@istio-testing istio-testing merged commit 5ea614d into istio:master Dec 14, 2020
@hzxuzhonghu
Member Author

Thanks everyone, this is a big improvement.

@hzxuzhonghu hzxuzhonghu deleted the original-src-ip branch December 14, 2020 03:21
@hi-usui

hi-usui commented Dec 14, 2020

@hzxuzhonghu
I am unable to get any response from an application pod. How do you manually test this / what is your config? Previously, cURL would work and show a source IP of 127.0.0.1, but now cURL results in curl: (56) Recv failure: Connection reset by peer. I think my setup is fine, but I am not sure.

No Istio VirtualService, no gateway, no ingress, no egress.

Minimal TCP app

apiVersion: v1
kind: Service
metadata:
  name: tcp
spec:
  type: ClusterIP
  selector:
    component: tcp
  ports:
    - name: tcp
      port: 10001
      targetPort: 3333

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tcp
  namespace: default
spec:
  selector:
    matchLabels:
      component: tcp
  replicas: 1
  template:
    metadata:
      annotations:
        sidecar.istio.io/interceptionMode: TPROXY
        sidecar.istio.io/inject: "true"
        sidecar.istio.io/logLevel: trace
      labels:
        component: tcp
    spec:
      containers:
        - name: tcp
          image: n0r1skcom/echo
          ports:
            - containerPort: 3333
              protocol: TCP

Istio setup

git clone --single-branch --branch master https://github.com/istio/istio.git
cd istio; git checkout 5ea614d;
kubectl create ns istio
helm -n istio install istio-base manifests/charts/base --set global.hub="gcr.io/istio-testing",global.tag="1.9-alpha.5ea614d48ee3887884922789985f6cef88346ecf",global.istioNamespace=istio
helm -n istio install istiod manifests/charts/istio-control/istio-discovery --set global.hub="gcr.io/istio-testing",global.tag="1.9-alpha.5ea614d48ee3887884922789985f6cef88346ecf",global.istioNamespace=istio,meshConfig.rootNamespace=istio
kubectl label namespace default istio-injection=enabled --overwrite

IP info

root@hozer-55:~# k get po -o wide
NAME                                                    READY   STATUS      RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
tcp-595b7579c7-svfmm                                    2/2     Running     0          59m     10.0.0.161   hozer-55   <none>           <none>


root@hozer-55:~# k get svc -o wide
NAME                                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                             AGE     SELECTOR
tcp                                                 ClusterIP   10.104.253.221   <none>        10001/TCP                           2d9h    component=tcp

root@hozer-55:~# k describe node
Name:               hozer-55
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=hozer-55
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
Annotations:        csi.volume.kubernetes.io/nodeid: {"rook-ceph.cephfs.csi.ceph.com":"hozer-55","rook-ceph.rbd.csi.ceph.com":"hozer-55"}
                    io.cilium.network.ipv4-cilium-host: 10.0.0.93
...

enter pod namespace

export ID=ec1c2b2b29b7; nsenter --target $(docker inspect -f '{{ .State.Pid }}' $ID) --uts --ipc --net --pid

curl SVC IP of deployment

curl 10.104.253.221:10001

(pod namespace)
tcpdump -i any portrange 10001 or port 3333

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
19:44:56.837597 IP 10.0.0.93.44038 > 10.0.0.161.3333: Flags [S], seq 2655953232, win 64240, options [mss 1460,sackOK,TS val 4219136281 ecr 0,nop,wscale 7], length 0
19:44:56.837665 IP 10.0.0.161.3333 > 10.0.0.93.44038: Flags [S.], seq 1882611025, ack 2655953233, win 65160, options [mss 1460,sackOK,TS val 4017991257 ecr 4219136281,nop,wscale 7], length 0
19:44:56.838348 IP 10.0.0.93.44038 > 10.0.0.161.3333: Flags [.], ack 1, win 502, options [nop,nop,TS val 4219136281 ecr 4017991257], length 0
19:44:56.838546 IP 10.0.0.93.44038 > 10.0.0.161.3333: Flags [P.], seq 1:85, ack 1, win 502, options [nop,nop,TS val 4219136282 ecr 4017991257], length 84
19:44:56.838575 IP 10.0.0.161.3333 > 10.0.0.93.44038: Flags [.], ack 85, win 509, options [nop,nop,TS val 4017991258 ecr 4219136282], length 0
19:44:56.839339 IP 10.0.0.93.52873 > localhost.3333: Flags [S], seq 2795220205, win 65495, options [mss 65495,sackOK,TS val 1061837750 ecr 0,nop,wscale 7], length 0
19:44:57.846297 IP 10.0.0.93.52873 > localhost.3333: Flags [S], seq 2795220205, win 65495, options [mss 65495,sackOK,TS val 1061838757 ecr 0,nop,wscale 7], length 0
19:44:59.862213 IP 10.0.0.93.52873 > localhost.3333: Flags [S], seq 2795220205, win 65495, options [mss 65495,sackOK,TS val 1061840773 ecr 0,nop,wscale 7], length 0
19:45:03.898208 IP 10.0.0.93.52873 > localhost.3333: Flags [S], seq 2795220205, win 65495, options [mss 65495,sackOK,TS val 1061844809 ecr 0,nop,wscale 7], length 0
19:45:06.843733 IP 10.0.0.161.3333 > 10.0.0.93.44038: Flags [R.], seq 1, ack 85, win 509, options [nop,nop,TS val 4018001263 ecr 4219136282], length 0

(k8s node namespace)

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
19:46:46.758844 IP 10.0.0.93.46114 > 10.0.0.161.3333: Flags [S], seq 4197856034, win 64240, options [mss 1460,sackOK,TS val 4219246202 ecr 0,nop,wscale 7], length 0
19:46:46.758912 IP 10.0.0.161.3333 > 10.0.0.93.46114: Flags [S.], seq 712021803, ack 4197856035, win 65160, options [mss 1460,sackOK,TS val 4018101178 ecr 4219246202,nop,wscale 7], length 0
19:46:46.759224 IP 10.0.0.93.46114 > 10.0.0.161.3333: Flags [.], ack 1, win 502, options [nop,nop,TS val 4219246202 ecr 4018101178], length 0
19:46:46.759336 IP 10.0.0.93.46114 > 10.0.0.161.3333: Flags [P.], seq 1:85, ack 1, win 502, options [nop,nop,TS val 4219246202 ecr 4018101178], length 84
19:46:46.759351 IP 10.0.0.161.3333 > 10.0.0.93.46114: Flags [.], ack 85, win 509, options [nop,nop,TS val 4018101178 ecr 4219246202], length 0
19:46:56.767114 IP 10.0.0.161.3333 > 10.0.0.93.46114: Flags [R.], seq 1, ack 85, win 509, options [nop,nop,TS val 4018111186 ecr 4219246202], length 0

envoy-init logs

kubectl logs deploy/tcp -c istio-init
Environment:
------------
ENVOY_PORT=
INBOUND_CAPTURE_PORT=
ISTIO_INBOUND_INTERCEPTION_MODE=
ISTIO_INBOUND_TPROXY_MARK=
ISTIO_INBOUND_TPROXY_ROUTE_TABLE=
ISTIO_INBOUND_PORTS=
ISTIO_OUTBOUND_PORTS=
ISTIO_LOCAL_EXCLUDE_PORTS=
ISTIO_SERVICE_CIDR=
ISTIO_SERVICE_EXCLUDE_CIDR=

Variables:
----------
PROXY_PORT=15001
PROXY_INBOUND_CAPTURE_PORT=15006
PROXY_TUNNEL_PORT=15008
PROXY_UID=1337
PROXY_GID=1337
INBOUND_INTERCEPTION_MODE=TPROXY
INBOUND_TPROXY_MARK=1337
INBOUND_TPROXY_ROUTE_TABLE=133
INBOUND_PORTS_INCLUDE=*
INBOUND_PORTS_EXCLUDE=15090,15021,15020
OUTBOUND_IP_RANGES_INCLUDE=*
OUTBOUND_IP_RANGES_EXCLUDE=
OUTBOUND_PORTS_INCLUDE=
OUTBOUND_PORTS_EXCLUDE=
KUBEVIRT_INTERFACES=
ENABLE_INBOUND_IPV6=false

ip -f inet rule add fwmark 1337 lookup 133
ip -f inet route add local default dev lo table 133
Writing following contents to rules file:  /tmp/iptables-rules-1607914838208321568.txt463547299
* nat
-N ISTIO_INBOUND
-N ISTIO_REDIRECT
-N ISTIO_IN_REDIRECT
-N ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp --dport 15008 -j RETURN
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_OUTPUT -o lo -s 127.0.0.6/32 -j RETURN
-A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --uid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --gid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
COMMIT
* mangle
-N ISTIO_DIVERT
-N ISTIO_TPROXY
-N ISTIO_INBOUND
-A ISTIO_DIVERT -j MARK --set-mark 1337
-A ISTIO_DIVERT -j ACCEPT
-A ISTIO_TPROXY ! -d 127.0.0.1/32 -p tcp -j TPROXY --tproxy-mark 1337/0xffffffff --on-port 15006
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A ISTIO_INBOUND -p tcp --dport 22 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15090 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15021 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15020 -j RETURN
-A ISTIO_INBOUND -p tcp -m conntrack --ctstate RELATED,ESTABLISHED -j ISTIO_DIVERT
-A ISTIO_INBOUND -p tcp -j ISTIO_TPROXY
-A PREROUTING -p tcp -m mark --mark 1337 -j CONNMARK --save-mark
-A OUTPUT -p tcp -m connmark --mark 1337 -j CONNMARK --restore-mark
-I ISTIO_INBOUND 1 -p tcp -m mark --mark 1337 -j RETURN
COMMIT

iptables-restore --noflush /tmp/iptables-rules-1607914838208321568.txt463547299
Writing following contents to rules file:  /tmp/ip6tables-rules-1607914838226375591.txt502013094

ip6tables-restore --noflush /tmp/ip6tables-rules-1607914838226375591.txt502013094
iptables-save
# Generated by iptables-save v1.6.1 on Mon Dec 14 03:00:38 2020
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:ISTIO_DIVERT - [0:0]
:ISTIO_INBOUND - [0:0]
:ISTIO_TPROXY - [0:0]
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A PREROUTING -p tcp -m mark --mark 0x539 -j CONNMARK --save-mark --nfmask 0xffffffff --ctmask 0xffffffff
-A OUTPUT -p tcp -m connmark --mark 0x539 -j CONNMARK --restore-mark --nfmask 0xffffffff --ctmask 0xffffffff
-A ISTIO_DIVERT -j MARK --set-xmark 0x539/0xffffffff
-A ISTIO_DIVERT -j ACCEPT
-A ISTIO_INBOUND -p tcp -m mark --mark 0x539 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 22 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15090 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15021 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15020 -j RETURN
-A ISTIO_INBOUND -p tcp -m conntrack --ctstate RELATED,ESTABLISHED -j ISTIO_DIVERT
-A ISTIO_INBOUND -p tcp -j ISTIO_TPROXY
-A ISTIO_TPROXY ! -d 127.0.0.1/32 -p tcp -j TPROXY --on-port 15006 --on-ip 0.0.0.0 --tproxy-mark 0x539/0xffffffff
COMMIT
# Completed on Mon Dec 14 03:00:38 2020
# Generated by iptables-save v1.6.1 on Mon Dec 14 03:00:38 2020
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:ISTIO_INBOUND - [0:0]
:ISTIO_IN_REDIRECT - [0:0]
:ISTIO_OUTPUT - [0:0]
:ISTIO_REDIRECT - [0:0]
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp -m tcp --dport 15008 -j RETURN
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
-A ISTIO_OUTPUT -s 127.0.0.6/32 -o lo -j RETURN
-A ISTIO_OUTPUT ! -d 127.0.0.1/32 -o lo -m owner --uid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT ! -d 127.0.0.1/32 -o lo -m owner --gid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
COMMIT
# Completed on Mon Dec 14 03:00:38 2020

iptables -t nat -S (pod namespace)

-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N ISTIO_INBOUND
-N ISTIO_IN_REDIRECT
-N ISTIO_OUTPUT
-N ISTIO_REDIRECT
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp -m tcp --dport 15008 -j RETURN
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
-A ISTIO_OUTPUT -s 127.0.0.6/32 -o lo -j RETURN
-A ISTIO_OUTPUT ! -d 127.0.0.1/32 -o lo -m owner --uid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT ! -d 127.0.0.1/32 -o lo -m owner --gid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001

iptables-save (k8s node namespace, Cilium BPF, not kube-proxy)

# Generated by iptables-save v1.8.2 on Sun Dec 13 20:23:11 2020
*raw
:PREROUTING ACCEPT [123205312:365002678667]
:OUTPUT ACCEPT [120750615:460507955533]
:CILIUM_OUTPUT_raw - [0:0]
:CILIUM_PRE_raw - [0:0]
-A PREROUTING -m comment --comment "cilium-feeder: CILIUM_PRE_raw" -j CILIUM_PRE_raw
-A OUTPUT -m comment --comment "cilium-feeder: CILIUM_OUTPUT_raw" -j CILIUM_OUTPUT_raw
-A CILIUM_OUTPUT_raw -o lxc+ -m mark --mark 0xa00/0xfffffeff -m comment --comment "cilium: NOTRACK for proxy return traffic" -j NOTRACK
-A CILIUM_OUTPUT_raw -o cilium_host -m mark --mark 0xa00/0xfffffeff -m comment --comment "cilium: NOTRACK for proxy return traffic" -j NOTRACK
-A CILIUM_PRE_raw -m mark --mark 0x200/0xf00 -m comment --comment "cilium: NOTRACK for proxy traffic" -j NOTRACK
COMMIT
# Completed on Sun Dec 13 20:23:11 2020
# Generated by iptables-save v1.8.2 on Sun Dec 13 20:23:11 2020
*mangle
:PREROUTING ACCEPT [123205309:365002678366]
:INPUT ACCEPT [116626645:362877827521]
:FORWARD ACCEPT [6039254:2102851687]
:OUTPUT ACCEPT [120750611:460507951014]
:POSTROUTING ACCEPT [126789865:462610802701]
:CILIUM_POST_mangle - [0:0]
:CILIUM_PRE_mangle - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
-A PREROUTING -m comment --comment "cilium-feeder: CILIUM_PRE_mangle" -j CILIUM_PRE_mangle
-A POSTROUTING -m comment --comment "cilium-feeder: CILIUM_POST_mangle" -j CILIUM_POST_mangle
-A CILIUM_PRE_mangle -m socket --transparent -m comment --comment "cilium: any->pod redirect proxied traffic to host proxy" -j MARK --set-xmark 0x200/0xffffffff
-A CILIUM_PRE_mangle -p tcp -m mark --mark 0x5f900200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 36959 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
-A CILIUM_PRE_mangle -p udp -m mark --mark 0x5f900200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 36959 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
COMMIT
# Completed on Sun Dec 13 20:23:11 2020
# Generated by iptables-save v1.8.2 on Sun Dec 13 20:23:11 2020
*nat
:PREROUTING ACCEPT [806999:41410035]
:INPUT ACCEPT [83507:8188642]
:OUTPUT ACCEPT [821046:50238493]
:POSTROUTING ACCEPT [1005149:61461988]
:CILIUM_OUTPUT_nat - [0:0]
:CILIUM_POST_nat - [0:0]
:CILIUM_PRE_nat - [0:0]
:DOCKER - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-POSTROUTING - [0:0]
-A PREROUTING -m comment --comment "cilium-feeder: CILIUM_PRE_nat" -j CILIUM_PRE_nat
-A OUTPUT -m comment --comment "cilium-feeder: CILIUM_OUTPUT_nat" -j CILIUM_OUTPUT_nat
-A POSTROUTING -m comment --comment "cilium-feeder: CILIUM_POST_nat" -j CILIUM_POST_nat
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
-A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully
COMMIT
# Completed on Sun Dec 13 20:23:11 2020
# Generated by iptables-save v1.8.2 on Sun Dec 13 20:23:11 2020
*filter
:INPUT ACCEPT [116626665:362877830306]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [120750631:460507953799]
:CILIUM_FORWARD - [0:0]
:CILIUM_INPUT - [0:0]
:CILIUM_OUTPUT - [0:0]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:f2b-manual - [0:0]
-A INPUT -m comment --comment "cilium-feeder: CILIUM_INPUT" -j CILIUM_INPUT
-A INPUT -j KUBE-FIREWALL
-A FORWARD -m comment --comment "cilium-feeder: CILIUM_FORWARD" -j CILIUM_FORWARD
-A OUTPUT -m comment --comment "cilium-feeder: CILIUM_OUTPUT" -j CILIUM_OUTPUT
-A OUTPUT -j KUBE-FIREWALL
-A CILIUM_FORWARD -o cilium_host -m comment --comment "cilium: any->cluster on cilium_host forward accept" -j ACCEPT
-A CILIUM_FORWARD -i cilium_host -m comment --comment "cilium: cluster->any on cilium_host forward accept (nodeport)" -j ACCEPT
-A CILIUM_FORWARD -i lxc+ -m comment --comment "cilium: cluster->any on lxc+ forward accept" -j ACCEPT
-A CILIUM_FORWARD -i cilium_net -m comment --comment "cilium: cluster->any on cilium_net forward accept (nodeport)" -j ACCEPT
-A CILIUM_INPUT -m mark --mark 0x200/0xf00 -m comment --comment "cilium: ACCEPT for proxy traffic" -j ACCEPT
-A CILIUM_OUTPUT -m mark --mark 0xa00/0xfffffeff -m comment --comment "cilium: ACCEPT for proxy return traffic" -j ACCEPT
-A CILIUM_OUTPUT -m mark ! --mark 0xe00/0xf00 -m mark ! --mark 0xd00/0xf00 -m mark ! --mark 0xa00/0xe00 -m comment --comment "cilium: host->any mark as from host" -j MARK --set-xmark 0xc00/0xf00
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
COMMIT
# Completed on Sun Dec 13 20:23:11 2020

istio-proxy logs

curl 10.104.253.221:10001
kubectl logs deploy/tcp -c istio-proxy
2020-12-14T03:53:34.744551Z     debug   envoy filter    original_dst: New connection accepted
2020-12-14T03:53:34.744618Z     debug   envoy filter    Got a new connection in the original_src filter for address 10.0.0.93:53732. Marking with 1337
2020-12-14T03:53:34.744646Z     debug   envoy filter    tls inspector: new connection accepted
2020-12-14T03:53:34.744665Z     trace   envoy filter    tls inspector: recv: 0
2020-12-14T03:53:34.744724Z     trace   envoy filter    tls inspector: recv: 84
2020-12-14T03:53:34.744758Z     trace   envoy filter    tls inspector: done: true
2020-12-14T03:53:34.744845Z     debug   envoy filter    [C1645] new tcp proxy session
2020-12-14T03:53:34.744865Z     trace   envoy connection        [C1645] readDisable: disable=true disable_count=0 state=0 buffer_length=0
2020-12-14T03:53:34.744906Z     debug   envoy filter    [C1645] Creating connection to cluster inbound|3333||
2020-12-14T03:53:34.744951Z     debug   envoy pool      creating a new connection
2020-12-14T03:53:34.745036Z     debug   envoy pool      [C1646] connecting
2020-12-14T03:53:34.745055Z     debug   envoy connection        [C1646] connecting to 127.0.0.1:3333
2020-12-14T03:53:34.745125Z     debug   envoy connection        [C1646] connection in progress
2020-12-14T03:53:34.745154Z     debug   envoy pool      queueing request due to no available connections
2020-12-14T03:53:34.745164Z     debug   envoy conn_handler      [C1645] new connection
2020-12-14T03:53:34.745173Z     trace   envoy main      item added to deferred deletion list (size=1)
2020-12-14T03:53:34.745182Z     trace   envoy main      clearing deferred deletion list (size=1)
2020-12-14T03:53:34.745196Z     trace   envoy connection        [C1645] socket event: 2
2020-12-14T03:53:34.745203Z     trace   envoy connection        [C1645] write ready
2020-12-14T03:53:35.809903Z     trace   envoy misc      enableTimer called on 0x55d172aa6100 for 3600000ms, min is 3600000ms
2020-12-14T03:53:35.809949Z     debug   envoy conn_handler      [C1647] new connection
2020-12-14T03:53:35.809987Z     trace   envoy connection        [C1647] socket event: 3
2020-12-14T03:53:35.809995Z     trace   envoy connection        [C1647] write ready
2020-12-14T03:53:35.810003Z     trace   envoy connection        [C1647] read ready. dispatch_buffered_data=false
2020-12-14T03:53:35.810023Z     trace   envoy connection        [C1647] read returns: 126
2020-12-14T03:53:35.810045Z     trace   envoy connection        [C1647] read error: Resource temporarily unavailable
2020-12-14T03:53:35.810073Z     trace   envoy http      [C1647] parsing 126 bytes
2020-12-14T03:53:35.810083Z     trace   envoy http      [C1647] message begin
2020-12-14T03:53:35.810096Z     debug   envoy http      [C1647] new stream
2020-12-14T03:53:35.810127Z     trace   envoy http      [C1647] completed header: key=Host value=10.0.0.161:15021
2020-12-14T03:53:35.810142Z     trace   envoy http      [C1647] completed header: key=User-Agent value=kube-probe/1.19
2020-12-14T03:53:35.810347Z     trace   envoy http      [C1647] completed header: key=Accept-Encoding value=gzip
2020-12-14T03:53:35.810363Z     trace   envoy http      [C1647] onHeadersCompleteBase
2020-12-14T03:53:35.810369Z     trace   envoy http      [C1647] completed header: key=Connection value=close
2020-12-14T03:53:35.810381Z     trace   envoy http      [C1647] Server: onHeadersComplete size=4
2020-12-14T03:53:35.810399Z     trace   envoy http      [C1647] message complete
2020-12-14T03:53:35.810407Z     trace   envoy connection        [C1647] readDisable: disable=true disable_count=0 state=0 buffer_length=126
2020-12-14T03:53:35.810449Z     debug   envoy http      [C1647][S14244208112977825498] request headers complete (end_stream=true):
':authority', '10.0.0.161:15021'
':path', '/healthz/ready'
':method', 'GET'
'user-agent', 'kube-probe/1.19'
'accept-encoding', 'gzip'
'connection', 'close'

2020-12-14T03:53:35.810458Z     debug   envoy http      [C1647][S14244208112977825498] request end stream
2020-12-14T03:53:35.810511Z     debug   envoy router    [C1647][S14244208112977825498] cluster 'agent' match for URL '/healthz/ready'
2020-12-14T03:53:35.810564Z     debug   envoy router    [C1647][S14244208112977825498] router decoding headers:
':authority', '10.0.0.161:15021'
':path', '/healthz/ready'
':method', 'GET'
':scheme', 'http'
'user-agent', 'kube-probe/1.19'
'accept-encoding', 'gzip'
'x-forwarded-proto', 'http'
'x-request-id', '6bfd60be-1ab9-4597-bd1c-c9e25cb350f0'
'x-envoy-expected-rq-timeout-ms', '15000'

2020-12-14T03:53:35.810582Z     debug   envoy pool      [C3] using existing connection
2020-12-14T03:53:35.810591Z     debug   envoy pool      [C3] creating stream
2020-12-14T03:53:35.810605Z     debug   envoy router    [C1647][S14244208112977825498] pool ready
2020-12-14T03:53:35.810628Z     trace   envoy connection        [C3] writing 242 bytes, end_stream false
2020-12-14T03:53:35.810650Z     trace   envoy http      [C1647][S14244208112977825498] decode headers called: filter=0x55d1727d64e0 status=1
2020-12-14T03:53:35.810661Z     trace   envoy http      [C1647] parsed 126 bytes
2020-12-14T03:53:35.810678Z     trace   envoy connection        [C1647] socket event: 2
2020-12-14T03:53:35.810684Z     trace   envoy connection        [C1647] write ready
2020-12-14T03:53:35.810692Z     trace   envoy connection        [C3] socket event: 2
2020-12-14T03:53:35.810697Z     trace   envoy connection        [C3] write ready
2020-12-14T03:53:35.810756Z     trace   envoy connection        [C3] write returns: 242
2020-12-14T03:53:35.811211Z     trace   envoy connection        [C3] socket event: 3
2020-12-14T03:53:35.811224Z     trace   envoy connection        [C3] write ready
2020-12-14T03:53:35.811231Z     trace   envoy connection        [C3] read ready. dispatch_buffered_data=false
2020-12-14T03:53:35.811247Z     trace   envoy connection        [C3] read returns: 75
2020-12-14T03:53:35.811261Z     trace   envoy connection        [C3] read error: Resource temporarily unavailable
2020-12-14T03:53:35.811271Z     trace   envoy http      [C3] parsing 75 bytes
2020-12-14T03:53:35.811278Z     trace   envoy http      [C3] message begin
2020-12-14T03:53:35.811291Z     trace   envoy http      [C3] completed header: key=Date value=Mon, 14 Dec 2020 03:53:35 GMT
2020-12-14T03:53:35.811303Z     trace   envoy http      [C3] onHeadersCompleteBase
2020-12-14T03:53:35.811308Z     trace   envoy http      [C3] completed header: key=Content-Length value=0
2020-12-14T03:53:35.811326Z     trace   envoy http      [C3] status_code 200
2020-12-14T03:53:35.811332Z     trace   envoy http      [C3] Client: onHeadersComplete size=2
2020-12-14T03:53:35.811339Z     trace   envoy http      [C3] message complete
2020-12-14T03:53:35.811344Z     trace   envoy http      [C3] message complete
2020-12-14T03:53:35.811351Z     debug   envoy client    [C3] response complete
2020-12-14T03:53:35.811357Z     trace   envoy main      item added to deferred deletion list (size=1)
2020-12-14T03:53:35.811373Z     debug   envoy router    [C1647][S14244208112977825498] upstream headers complete: end_stream=true
2020-12-14T03:53:35.811412Z     trace   envoy main      item added to deferred deletion list (size=2)
2020-12-14T03:53:35.811451Z     debug   envoy http      [C1647][S14244208112977825498] closing connection due to connection close header
2020-12-14T03:53:35.811470Z     debug   envoy http      [C1647][S14244208112977825498] encoding headers via codec (end_stream=true):
':status', '200'
'date', 'Mon, 14 Dec 2020 03:53:35 GMT'
'content-length', '0'
'x-envoy-upstream-service-time', '0'
'server', 'envoy'
'connection', 'close'

2020-12-14T03:53:35.811485Z     trace   envoy connection        [C1647] writing 143 bytes, end_stream false
2020-12-14T03:53:35.811505Z     trace   envoy connection        [C1647] readDisable: disable=false disable_count=1 state=0 buffer_length=0
2020-12-14T03:53:35.811542Z     trace   envoy main      item added to deferred deletion list (size=3)
2020-12-14T03:53:35.811555Z     trace   envoy misc      enableTimer called on 0x55d172aa6100 for 3600000ms, min is 3600000ms
2020-12-14T03:53:35.811567Z     debug   envoy connection        [C1647] closing data_to_write=143 type=2
2020-12-14T03:53:35.811577Z     debug   envoy connection        [C1647] setting delayed close timer with timeout 1000 ms
2020-12-14T03:53:35.811595Z     debug   envoy pool      [C3] response complete
2020-12-14T03:53:35.811604Z     debug   envoy pool      [C3] destroying stream: 0 remaining
2020-12-14T03:53:35.811614Z     trace   envoy http      [C3] parsed 75 bytes
2020-12-14T03:53:35.811623Z     trace   envoy main      clearing deferred deletion list (size=3)
2020-12-14T03:53:35.811645Z     trace   envoy connection        [C1647] socket event: 2
2020-12-14T03:53:35.811651Z     trace   envoy connection        [C1647] write ready
2020-12-14T03:53:35.811737Z     trace   envoy connection        [C1647] write returns: 143
2020-12-14T03:53:35.811748Z     debug   envoy connection        [C1647] write flush complete
2020-12-14T03:53:35.811949Z     trace   envoy connection        [C1647] socket event: 6
2020-12-14T03:53:35.811960Z     debug   envoy connection        [C1647] remote early close
2020-12-14T03:53:35.811967Z     debug   envoy connection        [C1647] closing socket: 0
2020-12-14T03:53:35.812059Z     trace   envoy connection        [C1647] raising connection event 0
2020-12-14T03:53:35.812071Z     debug   envoy conn_handler      [C1647] adding to cleanup list
2020-12-14T03:53:35.812077Z     trace   envoy main      item added to deferred deletion list (size=1)
2020-12-14T03:53:35.812084Z     trace   envoy main      item added to deferred deletion list (size=2)
2020-12-14T03:53:35.812091Z     trace   envoy main      clearing deferred deletion list (size=2)

without sidecar.istio.io/interceptionMode: TPROXY
tcpdump -i any portrange 10001 or port 3333

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
20:16:00.631830 IP 10.0.0.93.60678 > 10.0.0.61.3333: Flags [S], seq 2759031831, win 64240, options [mss 1460,sackOK,TS val 1971496097 ecr 0,nop,wscale 7], length 0
20:16:00.631896 IP 10.0.0.61.3333 > 10.0.0.93.60678: Flags [S.], seq 1630080100, ack 2759031832, win 65160, options [mss 1460,sackOK,TS val 2070818590 ecr 1971496097,nop,wscale 7], length 0
20:16:00.631940 IP 10.0.0.93.60678 > 10.0.0.61.3333: Flags [.], ack 1, win 502, options [nop,nop,TS val 1971496097 ecr 2070818590], length 0
20:16:00.632057 IP 10.0.0.93.60678 > 10.0.0.61.3333: Flags [P.], seq 1:85, ack 1, win 502, options [nop,nop,TS val 1971496097 ecr 2070818590], length 84
20:16:00.632069 IP 10.0.0.61.3333 > 10.0.0.93.60678: Flags [.], ack 85, win 509, options [nop,nop,TS val 2070818590 ecr 1971496097], length 0
20:16:00.637105 IP 10.0.0.61.3333 > 10.0.0.93.60678: Flags [P.], seq 1:272, ack 85, win 509, options [nop,nop,TS val 2070818595 ecr 1971496097], length 271
20:16:00.637144 IP 10.0.0.93.60678 > 10.0.0.61.3333: Flags [.], ack 272, win 501, options [nop,nop,TS val 1971496102 ecr 2070818595], length 0
20:16:00.637175 IP 10.0.0.61.3333 > 10.0.0.93.60678: Flags [F.], seq 272, ack 85, win 509, options [nop,nop,TS val 2070818595 ecr 1971496102], length 0

TCP Pod logs

kubectl logs deploy/tcp -c tcp
----- START 2020-12-14 04:16:00 -----
Container information:
Hostname:       tcp-585445675b-g222b

Interface       NetMask         IP
lo              255.0.0.0       127.0.0.1
eth0            255.255.255.255 10.0.0.61

TCP Remote information:
IP:     127.0.0.1

Data received:
GET / HTTP/1.1
Host: 10.104.253.221:10001
User-Agent: curl/7.64.0
Accept: */*

----- END -----

Curl output

root@hozer-55:~# curl 10.104.253.221:10001
Container information:
Hostname:       tcp-585445675b-g222b

Interface       NetMask         IP
lo              255.0.0.0       127.0.0.1
eth0            255.255.255.255 10.0.0.61

TCP Remote information:
IP:     127.0.0.1

Data received:
GET / HTTP/1.1
Host: 10.104.253.221:10001
User-Agent: curl/7.64.0
Accept: */*

@hzxuzhonghu
Member Author

hzxuzhonghu commented Dec 14, 2020

@hi-usui Can you check the route_localnet config?

sysctl net/ipv4/conf/all/route_localnet

@hi-usui

hi-usui commented Dec 14, 2020

sysctl net/ipv4/conf/all/route_localnet shows 1
I tried again in a new GCP instance and did kubeadm reset on the previous instance. Connections to the pod go through now for both. Unsure what caused it.

My final step is to use proxy_protocol and original_src, but the listener filters are not applying. The source IP address observed in the application pod is the ingress-nginx LoadBalancer internal IP, not the IP address in the PROXY TCP4 header. Without istioctl, is there an easy way to check which listeners are attached to a sidecar, or to see the reason for an attachment failure?
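One way to inspect the sidecar's live listener config without istioctl is through the Envoy admin API on localhost:15000 inside the istio-proxy container. A minimal sketch, assuming the deploy/tcp workload from above:

kubectl exec deploy/tcp -c istio-proxy -- pilot-agent request GET listeners
kubectl exec deploy/tcp -c istio-proxy -- pilot-agent request GET config_dump
# rejected or unapplied EnvoyFilter patches may also show up in the istiod and sidecar logs
kubectl logs deploy/tcp -c istio-proxy | grep -i envoyfilter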

I cannot get the proxy_protocol and original_src listener filters working via an EnvoyFilter on a sidecar. Is something else required?

EnvoyFilter

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: proxy-protocol
  namespace: default
spec:
  configPatches:
    - applyTo: LISTENER
      match:
        context: SIDECAR_INBOUND
        listener:
          portNumber: 10001
      patch:
        operation: MERGE
        value:
          listener_filters:
            - name: envoy.filters.listener.proxy_protocol
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.listener.proxy_protocol.v3.ProxyProtocol
            - name: envoy.filters.listener.original_src
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.listener.original_src.v3.OriginalSrc
                mark: 1337
root@hozer-55:~# curl 10.100.217.38:10001 -v
* Expire in 0 ms for 6 (transfer 0x55a0fc7b3ec0)
*   Trying 10.100.217.38...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x55a0fc7b3ec0)
* Connected to 10.100.217.38 (10.100.217.38) port 10001 (#0)
> GET / HTTP/1.1
> Host: 10.100.217.38:10001
> User-Agent: curl/7.64.0
> Accept: */*
>
Container information:
Hostname:       tcp-76d87cfd9f-ft8nm

Interface       NetMask         IP
lo              255.0.0.0       127.0.0.1
eth0            255.255.255.255 10.0.0.79

TCP Remote information:
IP:     10.0.0.99

Data received:
PROXY TCP4 10.0.0.93 10.0.0.99 54576 10001
GET / HTTP/1.1
Host: 10.100.217.38:10001
User-Agent: curl/7.64.0
Accept: */*
* Closing connection 0

Miscellaneous
Cilium

export DEVICE_IP=$(ip -4 addr show $(ip -4 route ls | grep default | grep -Po '(?<=dev )(\S+)') | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
kubeadm init --skip-phases addon/kube-proxy
helm install cilium cilium/cilium --version 1.9.1 --namespace kube-system -f - << EOF
k8sServiceHost: $DEVICE_IP
k8sServicePort: 6443
kubeProxyReplacement: strict
operator:
  replicas: 1
EOF

Linux

root@hozer-55:~# uname -a
Linux hozer-55 5.9.0-0.bpo.2-rt-amd64 #1 SMP PREEMPT_RT Debian 5.9.6-1~bpo10+1 (2020-11-19) x86_64 GNU/Linux

root@hozer-55:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

@hzxuzhonghu
Member Author

I cannot get the proxy_protocol and original_src listener filters working via an EnvoyFilter on a sidecar. Is something else required?

For inbound traffic, it goes through the virtualInbound listener on port 15006; maybe this is the reason.

@hi-usui

hi-usui commented Dec 15, 2020

I cannot get the proxy_protocol and original_src listener filters working via an EnvoyFilter on a sidecar. Is something else required?

For inbound traffic, it goes through the virtualInbound listener on port 15006; maybe this is the reason.

You are correct: proxy_protocol: New connection accepted and Got a new connection in the original_src filter now appear. I thought the SIDECAR_INBOUND port match was for the targeted service's ports only; the inbound listener actually matches on port 15006. Unfortunately I have not solved the original problem, which is that TCP does not receive a reply, for both internal and external traffic (tcp-pod <--> istio-proxy <--x ingress-nginx <--x client).

Perhaps it is not possible with an ingress-nginx LoadBalancer? ingress-nginx tries to send to the TCP pod (10.0.0.99.34566 > 10.0.0.157.3333) but never receives the reply (10.0.0.157.3333 > 10.0.0.99.34566); envoy-proxy always ends the stream with read error: Resource temporarily unavailable. Maybe this is not really related to preserving the source IP with TPROXY.

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: proxy-protocol
  namespace: default
spec:
  configPatches:
    - applyTo: LISTENER
      match:
        context: SIDECAR_INBOUND
        listener:
          portNumber: 15006
      patch:
        operation: MERGE
        value:
          listener_filters:
            - name: envoy.filters.listener.proxy_protocol
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.listener.proxy_protocol.v3.ProxyProtocol
            - name: envoy.filters.listener.original_src
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.listener.original_src.v3.OriginalSrc
                mark: 1337
root@hozer-55:~# clear; k logs -l component=tcp -c istio-proxy -f
2020-12-15T02:42:17.008562Z     trace   envoy main      item added to deferred deletion list (size=1)
2020-12-15T02:42:17.008569Z     trace   envoy main      item added to deferred deletion list (size=2)
2020-12-15T02:42:17.008576Z     trace   envoy main      clearing deferred deletion list (size=2)
2020-12-15T02:42:17.786889Z     trace   envoy upstream  starting async DNS resolution for zipkin.istio
2020-12-15T02:42:17.787237Z     trace   envoy upstream  Setting DNS resolution timer for 5000 milliseconds
2020-12-15T02:42:17.799052Z     trace   envoy upstream  Setting DNS resolution timer for 5000 milliseconds
2020-12-15T02:42:17.799603Z     trace   envoy upstream  Setting DNS resolution timer for 5000 milliseconds
2020-12-15T02:42:17.800043Z     trace   envoy upstream  Setting DNS resolution timer for 5000 milliseconds
2020-12-15T02:42:17.800362Z     trace   envoy upstream  async DNS resolution complete for zipkin.istio
2020-12-15T02:42:17.800378Z     debug   envoy upstream  DNS refresh rate reset for zipkin.istio, (failure) refresh rate 5000 ms
2020-12-15T02:42:18.109761Z     debug   envoy filter    original_dst: New connection accepted
2020-12-15T02:42:18.109967Z     debug   envoy filter    Got a new connection in the original_src filter for address 10.0.0.99:42868. Marking with 1337
2020-12-15T02:42:18.110047Z     debug   envoy filter    tls inspector: new connection accepted
2020-12-15T02:42:18.110102Z     trace   envoy filter    tls inspector: recv: 47
2020-12-15T02:42:18.110244Z     debug   envoy filter    proxy_protocol: New connection accepted
2020-12-15T02:42:18.110401Z     debug   envoy filter    Got a new connection in the original_src filter for address 73.71.38.214:53185. Marking with 1337
2020-12-15T02:42:18.111326Z     debug   envoy filter    [C1104] new tcp proxy session
2020-12-15T02:42:18.111399Z     trace   envoy connection        [C1104] readDisable: disable=true disable_count=0 state=0 buffer_length=0
2020-12-15T02:42:18.111494Z     debug   envoy filter    [C1104] Creating connection to cluster InboundPassthroughClusterIpv4
2020-12-15T02:42:18.111788Z     debug   envoy upstream  Using existing host 10.0.0.99:10001.
2020-12-15T02:42:18.111916Z     debug   envoy pool      creating a new connection
2020-12-15T02:42:18.112123Z     debug   envoy pool      [C1105] connecting
2020-12-15T02:42:18.112242Z     debug   envoy connection        [C1105] connecting to 10.0.0.99:10001
2020-12-15T02:42:18.112548Z     debug   envoy connection        [C1105] connection in progress
2020-12-15T02:42:18.112632Z     debug   envoy pool      queueing request due to no available connections
2020-12-15T02:42:18.112765Z     debug   envoy conn_handler      [C1104] new connection
2020-12-15T02:42:18.112832Z     trace   envoy main      item added to deferred deletion list (size=1)
2020-12-15T02:42:18.112963Z     trace   envoy main      clearing deferred deletion list (size=1)
2020-12-15T02:42:18.113041Z     trace   envoy connection        [C1104] socket event: 2
2020-12-15T02:42:18.113204Z     trace   envoy connection        [C1104] write ready
2020-12-15T02:42:18.642691Z     debug   envoy pool      [C1096] connect timeout
2020-12-15T02:42:18.642841Z     debug   envoy connection        [C1096] closing data_to_write=0 type=1
2020-12-15T02:42:18.642892Z     debug   envoy connection        [C1096] closing socket: 1
2020-12-15T02:42:18.643016Z     trace   envoy connection        [C1096] raising connection event 1
2020-12-15T02:42:18.643151Z     debug   envoy pool      [C1096] client disconnected
2020-12-15T02:42:18.643252Z     debug   envoy filter    [C1104] connect timeout
2020-12-15T02:42:18.643405Z     debug   envoy filter    [C1104] Creating connection to cluster InboundPassthroughClusterIpv4
2020-12-15T02:42:18.643469Z     debug   envoy connection        [C1104] closing data_to_write=0 type=1
2020-12-15T02:42:18.643547Z     debug   envoy connection        [C1104] closing socket: 1
2020-12-15T02:42:18.643705Z     trace   envoy connection        [C1104] raising connection event 1
2020-12-15T02:42:18.643816Z     debug   envoy wasm      wasm log: [extensions/stats/plugin.cc:615]::report() metricKey cache hit , stat=12
2020-12-15T02:42:18.643872Z     debug   envoy wasm      wasm log: [extensions/stats/plugin.cc:615]::report() metricKey cache hit , stat=16
2020-12-15T02:42:18.643913Z     debug   envoy wasm      wasm log: [extensions/stats/plugin.cc:615]::report() metricKey cache hit , stat=20
2020-12-15T02:42:18.643952Z     debug   envoy wasm      wasm log: [extensions/stats/plugin.cc:615]::report() metricKey cache hit , stat=24
2020-12-15T02:42:18.643995Z     debug   envoy conn_handler      [C1104] adding to cleanup list
2020-12-15T02:42:18.644043Z     trace   envoy main      item added to deferred deletion list (size=1)
2020-12-15T02:42:18.644082Z     trace   envoy main      item added to deferred deletion list (size=2)
2020-12-15T02:42:18.644122Z     trace   envoy main      item added to deferred deletion list (size=3)
2020-12-15T02:42:18.644162Z     trace   envoy main      clearing deferred deletion list (size=3)
2020-12-15T02:42:18.644804Z     debug   envoy pool      [C1096] connection destroyed
2020-12-15T02:42:19.005921Z     debug   envoy conn_handler      [C1106] new connection
2020-12-15T02:42:19.006141Z     trace   envoy connection        [C1106] socket event: 3
2020-12-15T02:42:19.006377Z     trace   envoy connection        [C1106] write ready
2020-12-15T02:42:19.006466Z     trace   envoy connection        [C1106] read ready. dispatch_buffered_data=false
2020-12-15T02:42:19.006621Z     trace   envoy connection        [C1106] read returns: 126
2020-12-15T02:42:19.006814Z     trace   envoy connection        [C1106] read error: Resource temporarily unavailable
2020-12-15T02:42:19.006965Z     trace   envoy http      [C1106] parsing 126 bytes
2020-12-15T02:42:19.007092Z     trace   envoy http      [C1106] message begin
2020-12-15T02:42:19.008157Z     debug   envoy http      [C1106] new stream
2020-12-15T02:42:19.008437Z     trace   envoy http      [C1106] completed header: key=Host value=10.0.0.157:15021
2020-12-15T02:42:19.008544Z     trace   envoy http      [C1106] completed header: key=User-Agent value=kube-probe/1.19
2020-12-15T02:42:19.008753Z     trace   envoy http      [C1106] completed header: key=Accept-Encoding value=gzip
2020-12-15T02:42:19.008849Z     trace   envoy http      [C1106] onHeadersCompleteBase
2020-12-15T02:42:19.008996Z     trace   envoy http      [C1106] completed header: key=Connection value=close
2020-12-15T02:42:19.009090Z     trace   envoy http      [C1106] Server: onHeadersComplete size=4
2020-12-15T02:42:19.009177Z     trace   envoy http      [C1106] message complete
2020-12-15T02:42:19.009255Z     trace   envoy connection        [C1106] readDisable: disable=true disable_count=0 state=0 buffer_length=126
root@hozer-55:~# clear; tcpdump -i any portrange 10001 or port 3333
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
19:24:16.901762 IP c-73-71-38-214.hsd1.ca.comcast.net.53328 > hozer-55.ocf.berkeley.edu.10001: Flags [S], seq 2142299170, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 371388306 ecr 0,sackOK,eol], length 0
19:24:16.901926 IP c-73-71-38-214.hsd1.ca.comcast.net.53328 > 10.0.0.99.10001: Flags [S], seq 2142299170, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 371388306 ecr 0,sackOK,eol], length 0
19:24:16.901982 IP 10.0.0.99.10001 > c-73-71-38-214.hsd1.ca.comcast.net.53328: Flags [S.], seq 1742813106, ack 2142299171, win 64308, options [mss 1410,sackOK,TS val 287109189 ecr 371388306,nop,wscale 7], length 0
19:24:16.902021 IP hozer-55.ocf.berkeley.edu.10001 > c-73-71-38-214.hsd1.ca.comcast.net.53328: Flags [S.], seq 1742813106, ack 2142299171, win 64308, options [mss 1410,sackOK,TS val 287109189 ecr 371388306,nop,wscale 7], length 0
19:24:16.929848 IP c-73-71-38-214.hsd1.ca.comcast.net.53328 > hozer-55.ocf.berkeley.edu.10001: Flags [.], ack 1, win 2053, options [nop,nop,TS val 371388345 ecr 287109189], length 0
19:24:16.929952 IP c-73-71-38-214.hsd1.ca.comcast.net.53328 > 10.0.0.99.10001: Flags [.], ack 1, win 2053, options [nop,nop,TS val 371388345 ecr 287109189], length 0
19:24:16.930321 IP 10.0.0.99.34566 > 10.0.0.157.3333: Flags [S], seq 3382214082, win 64860, options [mss 1410,sackOK,TS val 1763326182 ecr 0,nop,wscale 7], length 0
19:24:16.930359 IP 10.0.0.99.34566 > 10.0.0.157.3333: Flags [S], seq 3382214082, win 64860, options [mss 1410,sackOK,TS val 1763326182 ecr 0,nop,wscale 7], length 0
19:24:16.930417 IP 10.0.0.157.3333 > 10.0.0.99.34566: Flags [S.], seq 3377057969, ack 3382214083, win 64308, options [mss 1410,sackOK,TS val 1379497095 ecr 1763326182,nop,wscale 7], length 0

*** cURL hangs for 10 seconds ***

19:24:26.930677 IP 10.0.0.157.3333 > 10.0.0.99.34566: Flags [R.], seq 1, ack 134, win 503, options [nop,nop,TS val 1379507096 ecr 1763326183], length 0
19:24:26.930731 IP 10.0.0.157.3333 > 10.0.0.99.34566: Flags [R.], seq 1, ack 134, win 503, options [nop,nop,TS val 1379507096 ecr 1763326183], length 0
19:24:26.930947 IP 10.0.0.99.10001 > c-73-71-38-214.hsd1.ca.comcast.net.53328: Flags [F.], seq 1, ack 87, win 502, options [nop,nop,TS val 287119218 ecr 371388345], length 0
19:24:26.930979 IP hozer-55.ocf.berkeley.edu.10001 > c-73-71-38-214.hsd1.ca.comcast.net.53328: Flags [F.], seq 1, ack 87, win 502, options [nop,nop,TS val 287119218 ecr 371388345], length 0
19:24:26.963312 IP c-73-71-38-214.hsd1.ca.comcast.net.53328 > hozer-55.ocf.berkeley.edu.10001: Flags [.], ack 2, win 2053, options [nop,nop,TS val 371398369 ecr 287119218], length 0
19:24:26.963437 IP c-73-71-38-214.hsd1.ca.comcast.net.53328 > 10.0.0.99.10001: Flags [.], ack 2, win 2053, options [nop,nop,TS val 371398369 ecr 287119218], length 0
19:24:26.963505 IP c-73-71-38-214.hsd1.ca.comcast.net.53328 > hozer-55.ocf.berkeley.edu.10001: Flags [F.], seq 87, ack 2, win 2053, options [nop,nop,TS val 371398369 ecr 287119218], length 0
19:24:26.963532 IP c-73-71-38-214.hsd1.ca.comcast.net.53328 > 10.0.0.99.10001: Flags [F.], seq 87, ack 2, win 2053, options [nop,nop,TS val 371398369 ecr 287119218], length 0
19:24:26.963552 IP 10.0.0.99.10001 > c-73-71-38-214.hsd1.ca.comcast.net.53328: Flags [.], ack 88, win 502, options [nop,nop,TS val 287119251 ecr 371398369], length 0
19:24:26.963571 IP hozer-55.ocf.berkeley.edu.10001 > c-73-71-38-214.hsd1.ca.comcast.net.53328: Flags [.], ack 88, win 502, options [nop,nop,TS val 287119251 ecr 371398369], length 0
^C
19 packets captured
253 packets received by filter
205 packets dropped by kernel
root@hozer-55:~#

@hzxuzhonghu
Member Author

Does ingress-nginx send with the PROXY protocol?

Note: if the filter is enabled, the PROXY protocol must be present on the connection (either version 1 or version 2); the standard does not allow parsing to determine whether it is present or not.
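One way to check the filter end to end without ingress-nginx is to hand-write a PROXY v1 preamble in front of the request; a rough sketch, with placeholder client and destination addresses in the preamble and the service VIP from the curl tests above as the target:

printf 'PROXY TCP4 203.0.113.7 10.0.0.157 54321 10001\r\nGET / HTTP/1.1\r\nHost: test\r\n\r\n' | nc 10.100.217.38 10001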

@hi-usui

hi-usui commented Dec 15, 2020

Does ingress-nginx send with the PROXY protocol?

Note: if the filter is enabled, the PROXY protocol must be present on the connection (either version 1 or version 2); the standard does not allow parsing to determine whether it is present or not.

Yes; ingress-nginx accepts non-PROXY traffic, adds the PROXY header, and sends it on to istio-proxy:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx --version 3.15.2 --namespace ingress-nginx --create-namespace -f - << EOF
controller:
  service:
    type: LoadBalancer
    externalTrafficPolicy: Local
tcp:
  10001: default/tcp:10001::PROXY
EOF
export DEVICE_IP=$(ip -4 addr show $(ip -4 route ls | grep default | grep -Po '(?<=dev )(\S+)') | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
helm install metallb2 bitnami/metallb --version 1.0.1 --namespace kube-system -f - << EOF
configInline:
  address-pools:
    - name: default
      protocol: layer2
      addresses:
        - $DEVICE_IP-$DEVICE_IP
EOF

ingress-nginx logs

[73.71.38.214] [15/Dec/2020:08:17:02 +0000] TCP 200 0 86 9.999
2020/12/15 08:17:02 [error] 1031#1031: *797350 recv() failed (104: Connection reset by peer) while proxying and reading from upstream, client: 73.71.38.214, server: 0.0.0.0:10001, upstream: "10.0.0.157:3333", bytes from/to client:86/0, bytes from/to upstream:0/133

But with no EnvoyFilter, the TCP pod responds like this:

root@hozer-55:~# clear; k logs -l component=tcp -c tcp -f
----- START 2020-12-15 09:17:12 -----
Container information:
Hostname:       tcp-7848c94d55-6spff

Interface       NetMask         IP
lo              255.0.0.0       127.0.0.1
eth0            255.255.255.255 10.0.0.33

TCP Remote information:
IP:     10.0.0.99

Data received:
PROXY TCP4 73.71.38.214 10.0.0.99 53853 10001
GET / HTTP/1.1
Host: berkeleytime.com:10001
User-Agent: curl/7.71.1
Accept: */*

----- END -----

@rlenglet
Contributor

On a side note, @jrajahalme can you confirm up to which version this PR needs to be backported? I think you saw issues with TPROXY in 1.6, right?

@hi-usui

hi-usui commented Dec 16, 2020

@hzxuzhonghu Figured it out: add envoy.filters.listener.original_dst to listener_filters:

listener_filters:
  - name: envoy.filters.listener.proxy_protocol
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.listener.proxy_protocol.v3.ProxyProtocol
  - name: envoy.filters.listener.original_dst
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.listener.original_dst.v3.OriginalDst
  - name: envoy.filters.listener.original_src
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.listener.original_src.v3.OriginalSrc
root@bt-gcp-2:~# k logs -f -l component=tcp
TCP Remote information:
IP:	73.71.38.214

Data received:
GET / HTTP/1.1
Host: 34.94.29.220:10001
User-Agent: curl/7.71.1
Accept: */*

----- END -----

Pretty cool! Hopefully applications can now drop support for the PROXY protocol. Is there an easy way to have sidecars exclude cluster traffic from the listener match? I would like to apply this to a Postgres DB but make it so cluster traffic does not have to send the PROXY header.

EDIT: I guess just use global.proxy.excludeIPRanges: 10.0.0.0/8
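A sketch of setting that on the istio-discovery release installed earlier in this thread, assuming the chart exposes global.proxy.excludeIPRanges (it becomes the sidecar's default excludeOutboundIPRanges annotation, so it only affects outbound capture):

helm -n istio upgrade istiod manifests/charts/istio-control/istio-discovery \
  --reuse-values \
  --set global.proxy.excludeIPRanges="10.0.0.0/8"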
Application to Postgres
Before:

[2020-12-16 02:15:19] [NOTICE] starting monitoring of node "bt-psql-staging-postgresql-ha-postgresql-0" (ID: 1000)
[2020-12-16 02:15:19] [NOTICE] monitoring cluster primary "bt-psql-staging-postgresql-ha-postgresql-0" (ID: 1000)
10-0-0-93.ingress-nginx-controller.ingress-nginx.svc.cluster.local 2020-12-16 02:15:43.562 GMT [183] [unknown]@[unknown] LOG:  invalid length of startup packet
10-0-0-93.ingress-nginx-controller.ingress-nginx.svc.cluster.local 2020-12-16 02:15:50.540 GMT [208] [unknown]@[unknown] LOG:  invalid length of startup packet

After:

c-73-71-38-214.hsd1.ca.comcast.net 2020-12-16 02:16:47.621 GMT [332] bt@bt FATAL:  password authentication failed for user "bt"
c-73-71-38-214.hsd1.ca.comcast.net 2020-12-16 02:16:47.621 GMT [332] bt@bt DETAIL:  Password does not match for user "bt".
        Connection matched pg_hba.conf line 6: "host     all              all       0.0.0.0/0    md5"
c-73-71-38-214.hsd1.ca.comcast.net 2020-12-16 02:16:50.359 GMT [340] bt@bt FATAL:  password authentication failed for user "bt"
c-73-71-38-214.hsd1.ca.comcast.net 2020-12-16 02:16:50.359 GMT [340] bt@bt DETAIL:  Password does not match for user "bt".
        Connection matched pg_hba.conf line 6: "host     all              all       0.0.0.0/0    md5"

@hzxuzhonghu
Member Author

All traffic goes to the virtualInbound listener with the above filters enabled, so I do not think it is possible.
