kops-controller can't determine AWS region #9856

@w3irdrobot

Description

1. What kops version are you running? The command kops version will display
this information.

➜ kops version
Version 1.18.0 (git-698bf974d8)

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

➜ kubectl version --short
Client Version: v1.19.0
Server Version: v1.18.8

3. What cloud provider are you using?

AWS GovCloud

4. What commands did you run? What is the simplest way to reproduce this issue?

This was done as part of an upgrade from Kubernetes v1.17.10 created using kops v1.17.1 to Kubernetes v1.18.8 using kops v1.18.0. We upgraded the cluster and removed the pinned Docker version (originally added to fix Docker issues on Amazon Linux). So it went something like the following:

kops upgrade cluster --yes
kops edit cluster # removed the docker stuff
kops update cluster --yes
kops rolling-update cluster --yes

5. What happened after the commands executed?

Everything came back up smoothly, except that the role labels weren't being added to the worker nodes. Looking into the kops-controller logs, we saw errors like this:

I0901 17:32:15.966765       1 s3context.go:325] unable to read /sys/devices/virtual/dmi/id/product_uuid, assuming not running on EC2: open /sys/devices/virtual/dmi/id/product_uuid: permission denied
I0901 17:32:15.967033       1 s3context.go:170] defaulting region to "us-east-1"
I0901 17:32:16.311839       1 s3context.go:191] unable to get bucket location from region "us-east-1"; scanning all regions: InvalidToken: The provided token is malformed or otherwise invalid.
     status code: 400, request id: 4Z4P1MCS0S6K8P0R, host id: HOST-ID
E0901 17:32:16.647080       1 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load cluster object for node ip-172-20-40-221.us-gov-west-1.compute.internal: error loading Cluster \"s3://ciphermorph-com-k8s-local-kops/ciphermorph.com.k8s.local/cluster.spec\": Unable to list AWS regions: AuthFailure: AWS was not able to validate the provided access credentials\n\tstatus code: 401, request id: 870266f7-a2ac-417e-9fb3-e7496e5ba629"  "controller"="node" "request"={"Namespace":"","Name":"ip-172-20-40-221.us-gov-west-1.compute.internal"}

6. What did you expect to happen?

I expected the kops-controller to be able to determine the region I'm running in by reading that file, rather than erroring out. The kops-controller would then be able to proceed and add the labels to the node.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2020-04-11T19:12:40Z"
  generation: 7
  name: our.cluster.k8s.local
spec:
  additionalPolicies:
    node: |
      [
        {"Effect":"Allow","Action":["autoscaling:DescribeAutoScalingGroups","autoscaling:DescribeAutoScalingInstances","autoscaling:DescribeLaunchConfigurations","autoscaling:DescribeTags","autoscaling:SetDesiredCapacity","autoscaling:TerminateInstanceInAutoScalingGroup"],"Resource":"*"}
      ]
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://our-bucket/our.cluster.k8s.local
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-us-gov-west-1a
      name: a
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-us-gov-west-1a
      name: a
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.18.8
  masterInternalName: api.internal.our.cluster.k8s.local
  masterPublicName: api.our.cluster.k8s.local
  networkCIDR: 172.172.0.0/16
  networking:
    kubenet: {}
  nonMasqueradeCIDR: 127.0.0.1/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.172.172.0/19
    name: us-gov-west-1a
    type: Public
    zone: us-gov-west-1a
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?

This is running in AWS GovCloud region us-gov-west-1.

Digging into the code, I found the place where this check is happening. By setting the AWS_REGION environment variable on the DaemonSet, I was able to get things working. However, this isn't a long-term solution, since I assume it will get overwritten the next time we upgrade.
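
For reference, the temporary workaround was roughly the following (a sketch: it assumes the default kops-controller DaemonSet name in the kube-system namespace, and kops update cluster will presumably revert it):

kubectl -n kube-system set env daemonset/kops-controller AWS_REGION=us-gov-west-1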

I looked at /sys/devices/virtual/dmi/id/product_uuid directly on the node, since it is the file the controller can't open, and saw that it is owned by root. Initially I thought that maybe Amazon Linux makes root own it while other distros don't, but that turned out to be incorrect: I spun up an Ubuntu instance, and it too had that file owned by root.
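
The ownership check looked something like this on both distros (the timestamp is illustrative; the root-only 0400 mode is the relevant part):

ls -l /sys/devices/virtual/dmi/id/product_uuid
-r-------- 1 root root 4096 Sep  1 17:30 /sys/devices/virtual/dmi/id/product_uuid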

I also tried updating to the most recent AMI of Amazon Linux, but that didn't fix anything either.
