Network gateway hostnames are not correctly resolved when DNS upstreams don't support ANY queries #38689

@sangwa

Bug Description

I've been building a proof-of-concept multi-cluster mesh using the multi-primary model on different networks. In my case, one cluster is AWS EKS and the other is DigitalOcean managed Kubernetes. The east-west gateway in the EKS cluster is exposed through an AWS NLB.

After linking the clusters as per the documentation, I discovered that the domain name of the AWS NLB in the EKS cluster is not correctly resolved in the DigitalOcean cluster. It turned out that the upstream DNS configured on the DigitalOcean Kubernetes nodes returns REFUSED to ANY queries, which the Pilot code currently uses:

// TODO figure out how to query only A + AAAA
res := n.client.Query(new(dns.Msg).SetQuestion(dns.Fqdn(name), dns.TypeANY))

$ dig -t ANY k8s-istiomul-istioeas-4d501f177f-9a9de7682aacbcd6.elb.us-west-2.amazonaws.com

; <<>> DiG 9.16.1-Ubuntu <<>> -t ANY k8s-istiomul-istioeas-4d501f177f-9a9de7682aacbcd6.elb.us-west-2.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 44741
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: c367afe288a65985 (echoed)
;; QUESTION SECTION:
;k8s-istiomul-istioeas-4d501f177f-9a9de7682aacbcd6.elb.us-west-2.amazonaws.com. IN ANY

;; Query time: 7 msec
;; SERVER: 10.245.0.10#53(10.245.0.10)
;; WHEN: Mon May 02 03:20:07 UTC 2022
;; MSG SIZE  rcvd: 118

ANY queries are not guaranteed to be implemented consistently across DNS servers. For example, Cloudflare considers them deprecated, and its name servers return NOTIMP to ANY queries.

I'd suggest replacing ANY with A and AAAA queries, as mentioned by the comment in the code. Though it is technically possible to craft a multi-type query with the library currently in use, such queries also seem not guaranteed to be implemented consistently, so we'd likely have to make two separate queries and merge the results. I have a patch tested in my environment and can follow up with a PR.

Version

$ istioctl version
client version: 1.13.3
control plane version: 1.13.3
data plane version: 1.13.3 (1 proxies)

$ kubectl version --short
Client Version: v1.23.6
Server Version: v1.22.8

Additional Information

No response

Affected product area

  • Docs
  • Installation
  • Networking
  • Performance and Scalability
  • Extensions and Telemetry
  • Security
  • Test and Release
  • User Experience
  • Developer Infrastructure
  • Upgrade
  • Multi Cluster
  • Virtual Machine
  • Control Plane Revisions

Is this the right place to submit this?

  • This is not a security vulnerability
  • This is not a question about how to use Istio
