Inter-Kubernetes Networking with Forwarded DNS
How to configure clusters that can be used securely between different regions or clouds.
Tutorials are accurate at the time of writing but rely heavily on third party software. Tutorials are provided to demonstrate how a particular problem may be solved. Use of third party software is not supported by Couchbase. For further help in the event of a problem, contact the relevant software maintainer.
It is often desirable to have clients connect to a Couchbase cluster from outside of the hosting Kubernetes cluster. The specific scenario this tutorial will address is where a client resides within another Kubernetes instance. Such a topology may be used in the following situations:
- Geographically dispersed clients can connect to a centralized database instance.
- Cross data center replication (XDCR) can be used to replicate or back up data to a different geographical region, or even a different cloud provider.
- General partitioning of workloads into security zones.
Because your data is valuable, it should always be encrypted in transit — even across a secure VPN — so this tutorial will deal exclusively with TLS.
TLS with the Operator is driven exclusively by DNS wildcard certificates. A wildcard certificate is valid for all nodes that a Couchbase cluster contains, and remains valid as the topology changes during the cluster lifetime. The Operator does not support IP-based certificates, as that would require the Operator to act as a certificate signing authority, which is a security concern. Additionally, IP addresses cannot be known before a pod is created and the TLS secrets mounted, adding complexity.
As such TLS support outside of the Kubernetes cluster is driven by DNS. Consider the following architecture diagram:
The client uses DNS addresses served by its local Kubernetes DNS server for normal in-cluster operation. It also uses DNS addresses served by a remote Kubernetes DNS server in order to resolve the IP addresses of the remote Couchbase cluster. The remote cluster must therefore use a different subnet than the local one, so that network addresses are unique. The remote cluster must also use flat, routed networking; a pod in the local cluster must be able to connect to a pod in the remote cluster without DNAT.
For the purposes of this tutorial we will use CoreDNS as a proxy DNS server for Couchbase Server pods. This will then selectively forward requests on to either the local or remote DNS server instances. CoreDNS is readily available, and already powers most of the managed Kubernetes offerings. Other DNS servers may be used (e.g. BIND), provided they can forward specific zones to a remote DNS server.
While this tutorial focuses on Kubernetes to Kubernetes connectivity, the client need not reside in Kubernetes. The same rules, however, apply: no DNAT between the client and the Couchbase Server instance, and clients must use a local DNS server to separate traffic and forward it to the correct recipient.
Configuring the Platform
Platform configuration is left up to the end user as there are numerous different ways of deploying and interconnecting your Kubernetes clusters.
The following diagram is based on eksctl for Amazon AWS; however, the concepts will be generally the same for any platform:
The eksctl tool creates clusters similar to those depicted.
It uses the amazon-vpc-cni-k8s CNI plugin to provision pod network interfaces.
Pod interfaces are provisioned using elastic network interfaces, so appear as part of the cluster VPC.
No overlay networks are involved and the pod IP addresses can be reached from outside of the Kubernetes cluster without NAT.
This fulfills the requirement for flat, routed networking.
Red Hat OpenShift users should be aware that the default is to use an SDN-based overlay network. This prohibits you from performing inter-Kubernetes networking, and from connecting external Couchbase clients to a cluster securely. Please consult your platform vendor for advice.
A VPC is created to contain the EKS instance. Each cluster needs to have a unique, non-overlapping network prefix that can be specified at creation time.
In this example, Cluster 1 has 10.0.0.0/16 and Cluster 2 has 10.1.0.0/16.
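For example, with eksctl the VPC prefix can be set at creation time with the --vpc-cidr flag. The cluster names and region below are illustrative only:
$ eksctl create cluster --name cluster-1 --region us-west-2 --vpc-cidr 10.0.0.0/16
$ eksctl create cluster --name cluster-2 --region us-west-2 --vpc-cidr 10.1.0.0/16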
The two VPCs that are created are then peered together. This is quite simply a network pathway that allows packets from one cluster to reach the other. It can be as simple as a physical cable between routers or as complex as a VPN; the choice is entirely up to the reader.
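On AWS, for example, the peering connection could be established with the AWS CLI. The VPC and peering connection IDs below are placeholders:
$ aws ec2 create-vpc-peering-connection --vpc-id vpc-11111111 --peer-vpc-id vpc-22222222
$ aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id pcx-33333333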
While not depicted, each availability zone within the VPC has its own subnet. Routing decisions are made at the subnet level, and each subnet has its own routing table. You will need to add in routing table entries to forward packets destined for the remote cluster over the peering connection. This, like all steps, should be done for both clusters to allow bi-directional communication.
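As a sketch using the AWS CLI, with hypothetical route table IDs and the peering connection created above, Cluster 1's subnets gain a route to Cluster 2's prefix, and vice versa:
$ aws ec2 create-route --route-table-id rtb-11111111 --destination-cidr-block 10.1.0.0/16 --vpc-peering-connection-id pcx-33333333
$ aws ec2 create-route --route-table-id rtb-22222222 --destination-cidr-block 10.0.0.0/16 --vpc-peering-connection-id pcx-33333333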
The final part of platform configuration concerns security groups.
Kubernetes nodes are firewalled off from everything except each other and their EKS control plane within an eksctl deployment.
You need to allow ingress traffic from the remote cluster to all Kubernetes nodes in the local cluster.
In Cluster 1, for example, you would need a security group that allows all protocols from 10.1.0.0/16, applied to all Kubernetes EC2 instances.
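As a sketch with the AWS CLI, assuming a hypothetical security group attached to the Cluster 1 worker nodes:
$ aws ec2 authorize-security-group-ingress --group-id sg-44444444 --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=10.1.0.0/16}]'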
The end goal is to have pods in the local cluster able to ping those in the remote cluster. For example, you can list the pods in Cluster 2:
$ kubectl -n kube-system get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
aws-node-c8mwf 1/1 Running 0 33m 10.1.3.157 ip-10-1-3-157.us-west-2.compute.internal <none> <none>
aws-node-tgwcm 1/1 Running 0 33m 10.1.80.190 ip-10-1-80-190.us-west-2.compute.internal <none> <none>
aws-node-wmmdc 1/1 Running 0 33m 10.1.38.67 ip-10-1-38-67.us-west-2.compute.internal <none> <none>
coredns-84549585c-h9qzk 1/1 Running 0 39m 10.1.51.54 ip-10-1-38-67.us-west-2.compute.internal <none> <none>
coredns-84549585c-sh9km 1/1 Running 0 39m 10.1.61.240 ip-10-1-38-67.us-west-2.compute.internal <none> <none>
kube-proxy-26jkh 1/1 Running 0 33m 10.1.80.190 ip-10-1-80-190.us-west-2.compute.internal <none> <none>
kube-proxy-zdd6m 1/1 Running 0 33m 10.1.38.67 ip-10-1-38-67.us-west-2.compute.internal <none> <none>
kube-proxy-zr57z 1/1 Running 0 33m 10.1.3.157 ip-10-1-3-157.us-west-2.compute.internal <none> <none>
Then ping them from Cluster 1. First create a pod to run the network test from, and shell into it:
$ kubectl run ubuntu --image ubuntu:bionic --command -- sleep 999999
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
ubuntu-7fbcdb597-nwnbt 1/1 Running 0 6s
$ kubectl exec -ti ubuntu-7fbcdb597-nwnbt bash
Then from the local pod, ping a remote pod:
$ apt-get update && apt-get -y install iputils-ping iproute2
Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
...
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether ae:9a:49:1e:6c:81 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.0.87.216/32 brd 10.0.87.216 scope global eth0 (1)
valid_lft forever preferred_lft forever
$ ping -c 3 10.1.51.54 (2)
PING 10.1.51.54 (10.1.51.54) 56(84) bytes of data.
64 bytes from 10.1.51.54: icmp_seq=1 ttl=253 time=47.7 ms
64 bytes from 10.1.51.54: icmp_seq=2 ttl=253 time=47.7 ms
64 bytes from 10.1.51.54: icmp_seq=3 ttl=253 time=47.7 ms
--- 10.1.51.54 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 47.739/47.772/47.799/0.180 ms
1 | Source pod exists in 10.0.0.0/16 (Cluster 1) |
2 | Target pod exists in 10.1.0.0/16 (Cluster 2) |
If you are not able to establish a connection, refer to your platform provider’s documentation.
Configuring CoreDNS
Consider the following architecture diagram:
A local pod will connect to the local DNS server, not the cluster default.
That pod can be a Couchbase SDK client or another Couchbase cluster as an XDCR source.
The local DNS server will capture DNS requests that are destined for the remote.svc.cluster.local. domain and forward them to a remote Kubernetes DNS server.
The Couchbase cluster we wish to connect to resides in this remote Kubernetes cluster.
Any DNS records in the remote cluster for the remote namespace will be visible to a local pod in the local cluster.
This includes individual A records for pods and SRV records generated for stable service discovery.
Any other DNS records that do not fall into the remote domain will be forwarded to the local Kubernetes cluster DNS server.
The configuration will shadow the remote namespace, if one exists, in the local cluster.
As a result, it is recommended that Couchbase server instances are provisioned in globally unique namespaces to prevent shadowing.
The following configuration is for demonstration purposes only and does not fully provide high-availability. Enterprise CoreDNS Deployment discusses improvements that can be made for production deployments.
Provisioning CoreDNS
CoreDNS is provisioned in the namespace into which your client is to be deployed. CoreDNS is deployed as follows:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
data:
  Corefile: |-
    remote.svc.cluster.local:5353 { (1)
        forward . 10.32.3.2 (2)
    }
    .:5353 { (3)
        forward . 10.39.240.10 (4)
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: coredns
  template:
    metadata:
      labels:
        app: coredns
    spec:
      containers:
      - name: coredns
        image: coredns/coredns:1.6.6
        args:
        - '-conf'
        - '/etc/coredns/Corefile'
        ports:
        - name: dns
          containerPort: 5353
        volumeMounts:
        - name: config
          mountPath: /etc/coredns
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: coredns
---
apiVersion: v1
kind: Service
metadata:
  name: coredns
spec:
  selector:
    app: coredns
  ports:
  - name: dns
    protocol: UDP
    port: 53
    targetPort: 5353
  - name: dns-tcp
    protocol: TCP
    port: 53
    targetPort: 5353
1 | This server block defines our remote namespace domain.
The remote component is the namespace we wish to forward, and can be modified as required.
Port 5353 is arbitrary, but must be greater than 1024 so that it can be opened by an unprivileged process. |
2 | This forwarding rule specifies that all DNS queries within the server block should be forwarded to the remote IP address.
This IP address can be determined with the command kubectl -n kube-system get endpoints on the remote cluster.
Select one or more IP addresses from the DNS service endpoints, typically named coredns or kube-dns. |
3 | This server block defines our default DNS server.
Port 5353 is arbitrary, but must be greater than 1024 so that it can be opened by an unprivileged process.
The port should also be the same as that used in any remote server blocks. |
4 | This forwarding rule specifies that all DNS queries within the server block should be forwarded to the local IP address.
This IP address should be the cluster IP address of the cluster DNS service.
It can be determined with the command kubectl -n kube-system get services on the local cluster.
The service will be named coredns, kube-dns or similar, depending on your platform.
This IP address is stable. |
To get the remote DNS server, for example:
$ kubectl -n kube-system get endpoints
NAME ENDPOINTS AGE
kube-controller-manager <none> 39m
kube-dns 10.1.28.194:53,10.1.90.175:53,10.1.28.194:53 + 1 more... 39m
kube-scheduler <none> 39m
You could then use 10.1.28.194.
To get the local DNS server, for example:
$ kubectl -n kube-system get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 172.20.0.10 <none> 53/UDP,53/TCP 43m
You could then use 172.20.0.10.
Forwarding rules can be customized as desired per the CoreDNS documentation.
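Client pods must also be configured to use this proxy rather than the default cluster DNS. A minimal sketch of one way to do this is shown below, using dnsPolicy: None with an explicit dnsConfig. The nameserver value is a placeholder for the cluster IP of the coredns Service created above (kubectl get service coredns), and the pod name and image are arbitrary:
apiVersion: v1
kind: Pod
metadata:
  name: dns-client
spec:
  dnsPolicy: None                 # ignore the default cluster DNS settings
  dnsConfig:
    nameservers:
    - 172.20.92.77                # placeholder: cluster IP of the coredns proxy Service
    searches:                     # search domains mirroring the usual in-cluster defaults
    - remote.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    options:
    - name: ndots
      value: "5"
  containers:
  - name: client
    image: ubuntu:bionic
    command: ["sleep", "999999"]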
Testing CoreDNS
For the following demonstration we’ve installed the Operator and a Couchbase cluster into the remote namespace on the remote Kubernetes cluster. Much like the ping test we performed earlier, we can now test DNS and ensure we can see the remote resources:
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
3: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether d2:f6:af:21:ac:30 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.0.53.182/32 brd 10.0.53.182 scope global eth0 (1)
valid_lft forever preferred_lft forever
$ dig _couchbase._tcp.cb-example-srv.remote.svc.cluster.local. SRV
; <<>> DiG 9.10.3-P4-Ubuntu <<>> _couchbase._tcp.cb-example-srv.remote.svc.cluster.local. SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35383
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 4
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;_couchbase._tcp.cb-example-srv.remote.svc.cluster.local. IN SRV
;; ANSWER SECTION:
_couchbase._tcp.cb-example-srv.remote.svc.cluster.local. 1 IN SRV 0 33 11210 10-1-17-231.cb-example-srv.remote.svc.cluster.local. (2)
_couchbase._tcp.cb-example-srv.remote.svc.cluster.local. 1 IN SRV 0 33 11210 10-1-46-20.cb-example-srv.remote.svc.cluster.local.
_couchbase._tcp.cb-example-srv.remote.svc.cluster.local. 1 IN SRV 0 33 11210 10-1-68-212.cb-example-srv.remote.svc.cluster.local.
;; ADDITIONAL SECTION:
10-1-17-231.cb-example-srv.remote.svc.cluster.local. 1 IN A 10.1.17.231
10-1-46-20.cb-example-srv.remote.svc.cluster.local. 1 IN A 10.1.46.20
10-1-68-212.cb-example-srv.remote.svc.cluster.local. 1 IN A 10.1.68.212
;; Query time: 48 msec
;; SERVER: 172.20.92.77#53(172.20.92.77)
;; WHEN: Tue Jan 14 12:44:27 UTC 2020
;; MSG SIZE rcvd: 661
1 | The local host is in 10.0.0.0/16, therefore in Cluster 1. |
2 | An equivalent SRV lookup for the Couchbase connection string couchbase://cb-example-srv.remote.svc results in a list of A records.
These A records point to hosts in the remote cluster, 10.1.0.0/16. |
Enterprise CoreDNS Deployment
The DNS deployment architecture used in this tutorial has a number of downsides that should be addressed for a production deployment. Consider the following diagram:
The first improvement that can be made is to run multiple local proxy DNS server replicas. This provides a rolling upgrade path and avoids downtime if an upgrade is misconfigured and a pod does not come up. Additionally, pod anti-affinity should be used so that multiple DNS pods are not affected if a single Kubernetes node goes down.
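As a sketch, the Deployment shown earlier could be extended along these lines (only the changed fields are shown):
spec:
  replicas: 3
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: coredns
            topologyKey: kubernetes.io/hostname   # no two coredns pods on the same node
If the cluster has fewer nodes than replicas, preferredDuringSchedulingIgnoredDuringExecution may be a better fit than a hard requirement.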
The next improvement is the addition of an internal load balancer. Some managed cloud providers allow a load balancer service to be created with a private IP address allocated from the host VPC. This can be targeted at the remote cluster DNS deployment. Exposing the remote DNS as a service in this way provides a stable IP address for the local forwarding DNS service to use.
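On AWS, for example, such a service might look like the following sketch. The annotations and the kube-dns selector label are assumptions that depend on your Kubernetes version and cloud controller; only the UDP port is exposed here, since TCP fallback would need either mixed-protocol LoadBalancer support or a second Service:
apiVersion: v1
kind: Service
metadata:
  name: kube-dns-internal
  namespace: kube-system
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"        # assumption: NLB for UDP support
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"   # assumption: private IP from the host VPC
spec:
  type: LoadBalancer
  selector:
    k8s-app: kube-dns               # assumption: label used by the EKS CoreDNS pods
  ports:
  - name: dns
    protocol: UDP
    port: 53
    targetPort: 53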
While not depicted, it is also possible to create another DNS service in the remote cluster that only forwards to the remote cluster DNS service. This raises the possibility of having that DNS service operate with the DNS over TLS (DoT) protocol. Using DoT provides a secure transport layer between the two clusters. Mutual TLS (mTLS) may also be employed in order to provide additional security.
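A sketch of how the local forwarding server block might look with DoT enabled is shown below. The upstream address and TLS server name are assumptions for illustration, and mTLS would additionally require client certificates to be configured on the forward plugin:
remote.svc.cluster.local:5353 {
    forward . tls://10.32.3.2 {
        tls_servername remote-dns.example.com
    }
}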