Installing on AKS

Install the Couchbase Autonomous Operator on Microsoft Azure Kubernetes Service (AKS).

This guide walks through the recommended procedure for installing the Couchbase Autonomous Operator on Microsoft Azure Kubernetes Service (AKS).

Prerequisites

Install the Azure CLI (azure-cli) and log in to your Azure account.

az login

Get a list of your subscriptions.

az account list --output table

Set the subscription ID or name you want to use.

az account set --subscription "My Subscription"

AKS Setup

Create a Resource Group

Resource groups allow administrators to co-locate and coordinate operations across a group of resources such as VMs, disks, and networks. All resources created by AKS will belong to this group.

az group create --name myResourceGroup --location eastus

Create a Network

For XDCR to work between two clusters, a layer 3 tunnel between the two cluster networks is required. This lets nodes on one network talk to nodes on the other, which are in turn port-forwarded onto your Couchbase nodes. The address ranges of the two networks must therefore not overlap. With the default settings, the first cluster would get the prefix 10.0.0.0/8 and so would the second, so set the address prefix explicitly.

az network vnet create -g myResourceGroup -n myAKSVnet --address-prefix 10.0.0.0/12 --subnet-name myAKSSubnet --subnet-prefix 10.8.0.0/16
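The non-overlap requirement can be checked before creating the second cluster's network. The following is a minimal sketch in POSIX shell, not part of the Operator or the Azure CLI. The function names (ip_to_int, cidr_range, cidrs_overlap) and the second prefix 10.16.0.0/12 are hypothetical and chosen only for illustration.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print "first last" (as integers) for a CIDR block such as 10.0.0.0/12.
cidr_range() {
  base=$(ip_to_int "${1%/*}")
  size=$(( 1 << (32 - ${1#*/}) ))      # number of addresses in the block
  first=$(( base / size * size ))      # round down to the block boundary
  echo "$first $(( first + size - 1 ))"
}

# Exit status 0 if the two CIDR blocks share at least one address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# 10.0.0.0/12 is the prefix used above; 10.16.0.0/12 is an assumed
# candidate prefix for a second cluster's network.
if cidrs_overlap 10.0.0.0/12 10.16.0.0/12; then
  echo "overlap"
else
  echo "no overlap"
fi
```

Two ranges overlap exactly when each one starts at or before the other ends, which is the comparison cidrs_overlap performs. The sketch handles IPv4 only and does not validate its input.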

Create the AKS Cluster

Create an AKS cluster within the allocated resource group on the subnet created in the previous step. Note that SSH keys are auto-generated by --generate-ssh-keys; monitoring is enabled only if you also pass --enable-addons monitoring, which the command below omits. Also, the default virtual machine type is 'Standard_DS2_v2', which supports a maximum of 4 disks per VM. (See Best Practices for the recommended VM size based on the requirements of your deployment.)

subnet_id=$(az network vnet show -g myResourceGroup -n myAKSVnet --query "subnets[].id" -o tsv)
az aks create -g myResourceGroup -n myAKSCluster --node-count 3 --vnet-subnet-id "$subnet_id" --service-cidr 10.0.0.0/16 --dns-service-ip 10.0.0.10 --generate-ssh-keys --location eastus --network-plugin azure --kubernetes-version 1.12.7

To check that the version passed to --kubernetes-version is available in your AKS environment, run the following command:

az aks get-versions --location eastus --query "orchestrators[].orchestratorVersion" -o tsv
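The version list can also be checked in a script rather than by eye. The following is a small sketch, assuming the command above prints one version per line. The helper name version_available is hypothetical.

```shell
# Succeed if the version given as $1 appears as a complete line on stdin.
# -x matches whole lines only, so 1.12.7 does not match 1.12.70;
# -F treats the dots literally instead of as regex wildcards.
version_available() {
  grep -qxF "$1"
}

# Intended use (requires the Azure CLI and a logged-in session):
#   az aks get-versions --location eastus \
#     --query "orchestrators[].orchestratorVersion" -o tsv \
#     | version_available 1.12.7 && echo "1.12.7 is available"
```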

Installing the Operator and Couchbase

Once you’ve properly deployed the Kubernetes cluster with Microsoft AKS, you can install the admission controller and the Operator, and use it to deploy a Couchbase cluster as normal.

If kubectl is not already installed on your machine, run the following command:

az aks install-cli

Get the credentials for your cluster:

az aks get-credentials --resource-group myResourceGroup --name myAKSCluster

Upgrading the AKS Cluster

An AKS cluster can be upgraded when a newer Kubernetes version is available for it.

The following steps outline how to perform an upgrade using the Azure CLI, but the same steps can be performed through the Azure Portal.

To check if a new version is available for your cluster, use the get-upgrades command:

az aks get-upgrades -g myResourceGroup -n myAKSCluster --query "agentPoolProfiles[].upgrades" -o tsv

To proceed with an upgrade, simply run the upgrade command with the desired version:

az aks upgrade -g myResourceGroup -n myAKSCluster --kubernetes-version 1.13.5

Best Practices

Disk limits

AKS nodes have a maximum disk limit. This will restrict the number of persistent volumes that can be created within a cluster. The limits apply on a per-node basis. For example, if the VM type in your AKS cluster supports a maximum of eight data disks, and you have four nodes in your cluster, then your cluster can support 32 volumes.

See the Azure documentation for more information about max data disks.
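The arithmetic in the example above can be sketched as a pair of shell helpers. The function names are hypothetical, and the per-node disk limit must be taken from the Azure documentation for your VM size.

```shell
# Upper bound on persistent volumes: node count times max data disks per node.
max_cluster_volumes() {
  nodes=$1
  disks_per_node=$2
  echo $(( nodes * disks_per_node ))
}

# Succeed if the requested number of volumes ($1) fits within the bound
# for $2 nodes with $3 data disks each.
volumes_fit() {
  [ "$1" -le "$(max_cluster_volumes "$2" "$3")" ]
}

# Example from the text: four nodes, eight data disks each.
max_cluster_volumes 4 8   # prints 32
```

Because the limit applies per node, this total is only an upper bound: it is reached only if volumes are spread evenly across the nodes.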

Single disk per pod

Because Azure disks are slow to mount, it’s best to limit each pod to one disk (excluding the default disk). For example, separate server classes should be created for Data, Index, and Analytics pods.

Consider the limitations of Azure Disks

Azure Disks are attached to VMs when pods are created. There are several known issues related to this behavior, most notably that manual intervention is required on node failure, as well as the occurrence of failures related to slow detach/attach times when pods are moved.

Third-party storage providers like Portworx decouple volume-to-node attachment by instead creating a replicating pool of storage. Storage nodes may also be run separately from compute nodes. Most issues with persistent volumes on AKS are the result of volumes being attached to nodes and moved between them.

Known Issues

  • AKS doesn’t support Azure Availability Zones

    At the time of this writing, AKS doesn’t support Azure Availability Zones. Rather, AKS supports Azure Availability Sets to achieve high availability.

    Availability Sets are labeled numerically (e.g. 0 and 1). This means that server groups also have to be named “0” and “1”.

  • Failed nodes require manual volume failover

    When an Azure node is down and has volumes attached, a forced detach is required. This has to be done using the Azure CLI. If a recovery or upgrade is in progress at the time of node failover, it will fail. On delta recovery, create a new PVC.

  • Slow volume mounts

    AKS volume mounts may cause pod creation to time out while waiting for the underlying disks to become available. The main cause is the gap between the time it takes to attach a disk to the VM and the time the attach controller expects the attachment to take. When the former exceeds the latter, failed mount events are generated and the attach controller enters a state of high backoff wait time: