Kubernetes service IP / container DNAT broken in 1262.0.0 #1743

# Issue Report #

Under Kubernetes, pod-to-pod communication via a service IP within a
single node is broken in the latest CoreOS alpha (1262.0.0).  Downgrading
to the previous alpha resolves the issue.  The issue is not specific to
Kubernetes, however.

## Bug ##

### CoreOS Version ###

```
$ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=1262.0.0
VERSION_ID=1262.0.0
BUILD_ID=2016-12-14-2334
PRETTY_NAME="CoreOS 1262.0.0 (Ladybug)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
```

### Environment ###

Observed on both bare metal and Vagrant + VirtualBox locally.

### Expected Behavior ###

In Kubernetes, I can define a service which targets a set of pods and
makes them reachable via a virtual IP.  With kube-proxy running in
iptables mode, Kubernetes will configure NAT rules to redirect traffic
destined for that virtual IP to the individual pods.  That should work
for traffic originating from any node in the cluster.
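
As a rough illustration of that redirect, the sketch below prints (rather than applies) a minimal set of DNAT rules for one service. The chain name `DEMO-SERVICES`, the pod address `172.17.0.2`, and the helper `print_dnat_rules` are hypothetical; the chains kube-proxy actually creates are more elaborate. The virtual IP and port match the ones used in the reproduction steps.

```sh
#!/bin/sh
# Print iptables commands that DNAT a service virtual IP to one backend pod.
# Nothing is applied here; pipe the output to `sh` as root to install the rules.
# DEMO-SERVICES and the pod address are made-up values for illustration.
print_dnat_rules() {
    vip=$1
    port=$2
    pod=$3
    echo "iptables -t nat -N DEMO-SERVICES"
    # Traffic arriving from containers traverses PREROUTING ...
    echo "iptables -t nat -A PREROUTING -j DEMO-SERVICES"
    # ... while traffic generated on the host itself traverses OUTPUT.
    echo "iptables -t nat -A OUTPUT -j DEMO-SERVICES"
    # Rewrite the destination from the virtual IP to the pod.
    echo "iptables -t nat -A DEMO-SERVICES -d $vip/32 -p tcp --dport $port -j DNAT --to-destination $pod:$port"
}

print_dnat_rules 10.3.0.100 8080 172.17.0.2
```

Printing the commands keeps the sketch safe to run on any machine and makes it easy to compare against the output of `iptables-save -t nat` on a real node.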

### Actual Behavior ###

With a cluster running on the latest alpha (1262.0.0), pods cannot be
reached via their service IP when the traffic originates from another
pod on the same node as the destination.

The following *does* work on the same node:

  - Host to service IP
  - Pods running in the host net namespace to service IP

### Reproduction Steps ###

This isn't actually specific to Kubernetes.  I have an [alpha-nat](https://github.com/bison/coreos-vagrant/tree/alpha-nat)
branch of coreos-vagrant that will start a VM with a Docker container
and iptables rules similar to what Kubernetes uses, along with trace
rules for debugging.

Check out that branch and run either `./start-broken.sh` or
`./start-working.sh`, then `vagrant ssh` into the VM and run the
following:


```sh-session
# Works from host
core@core-01 ~ $ curl http://10.3.0.100:8080
CLIENT VALUES:
client_address=10.0.2.15
...


# Fails from another container on broken version
core@core-01 ~ $ docker run --rm busybox wget -O- -T5 http://10.3.0.100:8080
Connecting to 10.3.0.100:8080 (10.3.0.100:8080)
wget: download timed out

```

You could also use the cloud-config in the `user-data` file on that
branch to reproduce this on any other platform.
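
For anyone setting this up by hand, trace rules of the kind mentioned above could look roughly like the following. This is a sketch, not a copy of the rules on the alpha-nat branch: the helper name `print_trace_rules` is hypothetical, and the commands are only printed so they can be reviewed before being run as root.

```sh
#!/bin/sh
# Print raw-table TRACE rules for packets to and from a service endpoint.
# The TRACE target is only valid in the raw table.
print_trace_rules() {
    vip=$1
    port=$2
    # Requests towards the virtual IP, seen before NAT is applied.
    echo "iptables -t raw -A PREROUTING -p tcp -d $vip --dport $port -j TRACE"
    # Requests generated on the host itself.
    echo "iptables -t raw -A OUTPUT -p tcp -d $vip --dport $port -j TRACE"
    # Replies coming back from the backend port.
    echo "iptables -t raw -A PREROUTING -p tcp --sport $port -j TRACE"
}

print_trace_rules 10.3.0.100 8080
```

Once applied, trace output usually shows up in the kernel log, so comparing `journalctl -k` between the working and broken images should show which chains each packet traverses.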

Another option is to launch a Kubernetes cluster with a single
schedulable node and start a pod with an accompanying service.  Other
pods on the same node will not be able to reach it through the
service IP.  I've been using [this gist](https://gist.github.com/bison/6fb9c58ed6590449a2657b49ba5d90c6) for testing.

### Other Information ###

Starting the same `echoheaders` container under rkt with the default
ptp networking and configuring similar NAT rules seems to work as
expected from other containers, so this might only be happening when
attaching containers to a bridge.
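
If the bridge is the relevant difference, one setting that may be worth comparing between the working and broken images is whether bridged packets are handed to iptables at all. This is only a guess at where to look, not a confirmed cause. The helper name `bridge_nf_status` is hypothetical, and its optional path argument exists only so the check can be pointed at a test file.

```sh
#!/bin/sh
# Report whether bridged traffic is passed through iptables.
# Prints one of: enabled, disabled, missing.
bridge_nf_status() {
    path=${1:-/proc/sys/net/bridge/bridge-nf-call-iptables}
    if [ ! -r "$path" ]; then
        # The sysctl only exists while the br_netfilter module is loaded.
        echo "missing"
    elif [ "$(cat "$path")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

bridge_nf_status
```

A result of `missing` or `disabled` on the broken image and `enabled` on the working one would point at bridge netfilter handling rather than at the NAT rules themselves.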

coreos/coreos-overlay#2300 landed in 1262.0.0.  I tried overrides in
`/etc/systemd/network/` so that the interfaces are no longer marked
unmanaged, but it didn't seem to help.

