Public Cloud Prerequisites
Vendor specific tasks to perform before installing the Operator.
Kubernetes is designed to be portable, so that a workload can be moved seamlessly from one cloud to another. However, Kubernetes leaves scope for implementations to differ in vendor-specific ways. This page details the tasks that need to be performed before deploying the Operator on public cloud infrastructure.
- Storage classes
The EBS volume type io1 is recommended over gp2 for any storage classes because of its performance characteristics. The official AWS user guide itself recommends io1 for “Large database workloads”.
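A storage class along these lines could be printed and applied with kubectl. This is a sketch under assumptions: it targets the in-tree kubernetes.io/aws-ebs provisioner (clusters using the EBS CSI driver would use ebs.csi.aws.com instead), and the class name couchbase-io1 and the default of 10 IOPS per GiB are hypothetical placeholders, not tuned values.

```shell
# Sketch: print a StorageClass manifest for io1 EBS volumes.
# Assumes the in-tree kubernetes.io/aws-ebs provisioner. The class name
# and IOPS-per-GiB default are hypothetical placeholders.
io1_storage_class() {
  local name="${1:-couchbase-io1}"   # hypothetical class name
  local iops_per_gb="${2:-10}"       # provisioned IOPS per GiB (io1 only)
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${name}
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "${iops_per_gb}"
  fsType: ext4
EOF
}
```

For example, io1_storage_class | kubectl apply -f - would create the class in the cluster that kubectl currently points at.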
Google GKE uses proprietary authentication to set up your Kubernetes configuration file.
The Google Cloud SDK lets you authenticate with the gcloud auth login command; gcloud container clusters get-credentials then writes the cluster's credentials to your Kubernetes configuration file.
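The two steps can be wrapped in a small shell function. The cluster name, zone, and project are placeholders you supply, and the run helper with its DRY_RUN switch is a hypothetical convenience for printing the commands instead of executing them.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch: log in to Google Cloud, then write kubeconfig credentials for a GKE cluster.
gke_configure_kubectl() {
  local cluster="$1" zone="$2" project="$3"
  # Interactive, browser-based login for your Google account.
  run gcloud auth login
  # Fetch the cluster endpoint and credentials into your kubeconfig.
  run gcloud container clusters get-credentials "$cluster" \
    --zone "$zone" --project "$project"
}
```

After gke_configure_kubectl my-cluster us-central1-a my-project, a command such as kubectl get nodes should reach the cluster.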
By default, Google GKE does not grant the privileges required to deploy the Autonomous Operator. These privileges can be granted to a specific user with the following command:
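$ kubectl create clusterrolebinding \
    john-doe-admin-binding \
    --clusterrole cluster-admin \
    --user email@example.com
Here, john-doe-admin-binding is the name of the new binding, cluster-admin is the cluster role being granted, and email@example.com is the Google account of the user receiving it.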
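To grant the role to whichever account gcloud is currently logged in as, and then confirm that kubectl has full access, a sketch like the following could be used. The default binding name cluster-admin-binding is a hypothetical choice, and the run helper with its DRY_RUN switch is a hypothetical convenience for printing commands instead of executing them.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch: bind cluster-admin to a user, defaulting to the active gcloud account.
grant_cluster_admin() {
  local account="${1:-$(gcloud config get-value account)}"
  local binding="${2:-cluster-admin-binding}"   # hypothetical binding name
  run kubectl create clusterrolebinding "$binding" \
    --clusterrole cluster-admin \
    --user "$account"
  # Ask the API server whether the current user may do anything, anywhere.
  run kubectl auth can-i '*' '*' --all-namespaces
}
```

Calling grant_cluster_admin email@example.com john-doe-admin-binding issues the same binding as the command above; the final check should then answer yes.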
By default, the GKE control plane is firewalled off from the Kubernetes nodes. This prevents the Operator's dynamic admission controller (DAC) from functioning. You must configure firewall rules that allow the GKE control plane to connect to port 8443 on the DAC pod.
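A firewall rule of this kind could be created with gcloud. This sketch assumes a private GKE cluster, whose control-plane address range is reported in the privateClusterConfig.masterIpv4CidrBlock field. The network name, node network tag, and rule name are placeholders you must adapt (the tags on existing nodes can be read from their Compute Engine instances), and the run helper with its DRY_RUN switch is a hypothetical convenience.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch: allow the GKE control plane to reach the DAC on TCP 8443.
# All names are placeholders; the control-plane CIDR is looked up if not given.
gke_allow_dac() {
  local cluster="$1" zone="$2" network="$3" node_tag="$4" master_cidr="${5:-}"
  if [ -z "$master_cidr" ]; then
    # Only populated for private clusters.
    master_cidr="$(gcloud container clusters describe "$cluster" --zone "$zone" \
      --format='value(privateClusterConfig.masterIpv4CidrBlock)')"
  fi
  run gcloud compute firewall-rules create "${cluster}-allow-dac-8443" \
    --network "$network" \
    --direction INGRESS \
    --source-ranges "$master_cidr" \
    --target-tags "$node_tag" \
    --allow tcp:8443
}
```

Restricting the source range to the control-plane CIDR and the target to the node tag keeps the rule narrower than opening port 8443 to the whole network.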
Microsoft AKS uses proprietary authentication to set up your Kubernetes configuration file.
The Azure CLI lets you authenticate with the az login command; az aks get-credentials then writes the cluster's credentials to your Kubernetes configuration file.
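The equivalent AKS steps can be sketched as a shell function. The resource group and cluster name are placeholders, and the run helper with its DRY_RUN switch is a hypothetical convenience for printing commands instead of executing them.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch: log in to Azure, then merge AKS credentials into your kubeconfig.
aks_configure_kubectl() {
  local resource_group="$1" cluster="$2"
  # Interactive login for your Azure account.
  run az login
  # Fetch the cluster credentials into your kubeconfig.
  run az aks get-credentials --resource-group "$resource_group" --name "$cluster"
}
```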
- Disk limits
Each AKS node can attach only a limited number of data disks, which restricts the number of persistent volumes that can be created within a cluster. The limit applies per node and depends on the VM size. For example, if the VM size in your AKS cluster supports a maximum of eight data disks and you have four nodes, your cluster can support 32 volumes.
See the Azure documentation for more information about the maximum number of data disks per VM size.
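The capacity arithmetic from the example can be written out as below, assuming every node in the cluster uses the same VM size. The lookup function reads the maxDataDiskCount field reported by az vm list-sizes; the region and VM size are inputs you supply, and both function names are hypothetical.

```shell
# Sketch: cluster-wide volume capacity = nodes x max data disks per node.
# Assumes all nodes share one VM size.
aks_max_volumes() {
  local nodes="$1" max_disks_per_node="$2"
  echo $(( nodes * max_disks_per_node ))
}

# Look up the data disk limit for a VM size in a region (requires az login).
aks_max_data_disks() {
  local location="$1" vm_size="$2"
  az vm list-sizes --location "$location" \
    --query "[?name=='${vm_size}'].maxDataDiskCount | [0]" --output tsv
}
```

aks_max_volumes 4 8 prints 32, matching the example above; combining the two gives something like aks_max_volumes 4 "$(aks_max_data_disks eastus Standard_DS2_v2)".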
- Single disk per pod
Because volume mounts on AKS are slow, it’s best to limit each pod to one disk (excluding the default disk). For example, create separate server classes for Data, Index, and Analytics pods.
- Consider the limitations of Azure Disks
Azure Disks are attached to VMs when pods are created. There are several known issues related to this behavior, most notably that manual intervention is required when a node fails, and that slow detach and attach times cause failures when pods are moved.
Third-party storage providers such as Portworx decouple volumes from nodes by creating a replicated pool of storage instead. Storage nodes may also be run separately from compute nodes. Most issues with persistent volumes on AKS result from disks being attached to, and moved between, nodes.
- AKS doesn’t support Azure Availability Zones
At the time of this writing, AKS doesn’t support Azure Availability Zones. Instead, AKS supports Azure Availability Sets to achieve high availability.
Availability Sets are labeled numerically (for example, 0 and 1). This means that server groups also have to be named “0” and “1”.
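Before naming server groups, it can help to check which zone values your nodes actually carry. This sketch assumes the standard Kubernetes zone labels: older clusters populate failure-domain.beta.kubernetes.io/zone and newer ones topology.kubernetes.io/zone, and on availability-set clusters the value is expected to be the numeric fault domain. The run helper with its DRY_RUN switch is a hypothetical convenience.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch: list nodes with both zone labels shown as extra columns.
show_node_zones() {
  run kubectl get nodes \
    -L failure-domain.beta.kubernetes.io/zone \
    -L topology.kubernetes.io/zone
}
```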
- Failed nodes require manual volume failover
When an Azure node goes down with volumes attached, the volumes must be force-detached using the Azure CLI. If a recovery or upgrade is in progress when the node fails, that operation will fail. On delta recovery, create a new PVC.
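A detach along these lines could be run with the Azure CLI. This is a sketch under assumptions: AKS places node VMs and their dynamically provisioned disks in a separate node resource group (shown by az aks show --query nodeResourceGroup), which is the group to pass here; the VM and disk names are placeholders; and it uses a plain detach, so whether your CLI version offers a forced variant is worth checking with az vm disk detach --help. The run helper with its DRY_RUN switch is a hypothetical convenience.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch: detach a data disk from a failed AKS node VM.
# node_rg is the AKS node resource group; all names are placeholders.
aks_detach_disk() {
  local node_rg="$1" vm_name="$2" disk_name="$3"
  # Show which VM currently holds the disk.
  run az disk show --resource-group "$node_rg" --name "$disk_name" \
    --query managedBy --output tsv
  # Detach the disk so it can be attached to another node.
  run az vm disk detach --resource-group "$node_rg" --vm-name "$vm_name" \
    --name "$disk_name"
}
```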
- Slow volume mounts
AKS volume mounts may cause pod creation to time out while waiting for the underlying disks to become available. This happens because attaching a disk to the VM can take longer than the attach controller expects. When the attach time exceeds the expected time, failed mount events are generated and the attach controller enters a state of long back-off wait times: