Rebalance/Failover

      Rebalance

      The search service maintains a cluster-wide set of index definitions and metadata, which allows the redistribution of indexes and index replicas during a rebalance.
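
      The cluster-wide index definitions can be inspected from any search node. A minimal sketch, assuming the default search service port 8094 and the same credential placeholders used elsewhere in this section:

      # Lists all index definitions known to the cluster
      curl -XGET -u <username>:<password> http://<nodeIP>:8094/api/index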

      Handling partitions during a rebalance

      During a rebalance operation, the search service redistributes the index partitions across the available search nodes to achieve a balanced partition-node assignment. The newly assigned index partitions are built on the new nodes using the DCP feed. Once the new partitions have caught up to the latest sequence numbers, they are promoted to handle live traffic, and the older partitions are deleted from the system.
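
      The overall progress of the partition movement can be tracked through the cluster manager while the rebalance is running. A minimal sketch, assuming the cluster manager listens on the default port 8091:

      # Reports per-node progress while a rebalance is in flight
      curl -XGET -u <username>:<password> http://<nodeIP>:8091/pools/default/rebalanceProgress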

      During a rebalance, the search service does not wait for all replicas of an index to be built. This is expected behavior, because replica partitions are not used to serve FTS query traffic.
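
      Replica partitions are configured per index through the numReplicas plan parameter in the index definition. A minimal sketch, assuming a hypothetical index named travel-index backed by the travel-sample bucket:

      curl -XPUT -u <username>:<password> http://<nodeIP>:8094/api/index/travel-index \
        -H 'Content-Type: application/json' \
        -d '{"type": "fulltext-index",
             "sourceType": "couchbase",
             "sourceName": "travel-sample",
             "planParams": {"numReplicas": 1}}'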

      Impacts on live traffic

      Live traffic is never functionally affected. However, the performance impact of a background rebalance operation on live cluster traffic cannot be ruled out entirely. If mutations are still arriving on the data service nodes, the new data continues to be ingested into the FTS indexes despite the rebalance/failover operation.
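
      To confirm that ingestion keeps up during a rebalance or failover, the indexed document count can be polled and compared against the source bucket. A minimal sketch, assuming the same hypothetical index name as above:

      # Returns the current number of documents indexed by travel-index
      curl -XGET -u <username>:<password> http://<nodeIP>:8094/api/index/travel-index/count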

      Tips for faster rebalance operations

      As rebalance is a resource-intensive cluster management operation, it is always recommended to perform it during off-peak hours.

      The search service moves or builds index partitions one at a time per node during a rebalance, which can significantly increase the overall time taken for the operation.

      The configurable option maxConcurrentPartitionMovesPerNode adds concurrency to how partitions are moved or built during a rebalance operation.

      By overriding maxConcurrentPartitionMovesPerNode through the runtime cluster options, you can control the concurrency of partition moves and achieve faster rebalance completion times.

      The right value for this parameter depends on the particular use case and the available network bandwidth. Adequate testing must be done in a pre-production environment before deploying the setting to production clusters.

      Configuring the maximum concurrent partition moves per node

      Use the update endpoint for manager options.

      curl -XPUT -u <username>:<password> http://<nodeIP>:8094/api/managerOptions -d '
      {"maxConcurrentPartitionMovesPerNode":"5"}'

      Checking the current value for maxConcurrentPartitionMovesPerNode in a cluster

      curl -XGET -u <username>:<password> http://<nodeIP>:8094/api/manager
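
      If jq is installed, the current value can be pulled out of the response. This assumes the manager options are returned under a mgr.options map, which may differ between server versions:

      curl -s -XGET -u <username>:<password> http://<nodeIP>:8094/api/manager | \
        jq '.mgr.options.maxConcurrentPartitionMovesPerNode'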

      Note: Building multiple partitions in parallel needs more RAM and mandates a higher RAM quota for the search service. As rebalance operations consume resources, it is always advisable to plan them during off-peak hours. Enabling configurable parallel partition movement, as described above, is one way to speed up the rebalance operation.
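
      The search service RAM quota is set cluster-wide through the cluster manager. A minimal sketch, assuming the ftsMemoryQuota setting (in MB) and a value chosen purely for illustration:

      # Raise the search service memory quota to 2048 MB
      curl -XPOST -u <username>:<password> http://<nodeIP>:8091/pools/default \
        -d 'ftsMemoryQuota=2048'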

      Dealing with rebalance failures

      Failovers

      During failover of FTS nodes, there is no partition movement, so the failover-rebalance is instantaneous. The search service promotes replica index partitions to primary so that they serve the live cluster traffic immediately. Users can use failover and recovery rebalance while applying patches or upgrading software or hardware.
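
      Failover can be triggered through the cluster manager REST interface. A minimal sketch, assuming a hard failover and that the node's otpNode name (for example, ns_1@<nodeHostname>) has been looked up from /pools/default beforehand:

      # Hard failover of a single node; replica index partitions take over immediately
      curl -XPOST -u <username>:<password> http://<nodeIP>:8091/controller/failOver \
        -d 'otpNode=ns_1@<nodeToFailOver>'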

      Things to keep track of during a rebalance

      A failed-over node can be recovered, that is, added back into the cluster from which it was failed over, through a rebalance operation. The index partitions residing on the recovered node are reused during the recovery rebalance, with no extra node additions or removals, which speeds up the recovery rebalance operation.
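
      The recovery rebalance itself is driven through the cluster manager. A minimal sketch, assuming the failed-over node is first marked for recovery and the rebalance is then started with the full list of known nodes:

      # Mark the failed-over node for recovery (delta or full)
      curl -XPOST -u <username>:<password> http://<nodeIP>:8091/controller/setRecoveryType \
        -d 'otpNode=ns_1@<recoveredNode>' -d 'recoveryType=delta'

      # Start the recovery rebalance; no nodes are being ejected
      curl -XPOST -u <username>:<password> http://<nodeIP>:8091/controller/rebalance \
        -d 'knownNodes=ns_1@<nodeA>,ns_1@<nodeB>,ns_1@<recoveredNode>' -d 'ejectedNodes='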

      Failover-recovery steps

      1. Failover the node that needs a quick software update or hardware maintenance.

      2. Failover triggers replica partitions to become active. With replica partitions, live traffic can be served seamlessly.

      3. Perform the software update or hardware maintenance operations.

      4. Perform a recovery rebalance operation (a scripted sketch follows these steps).

      5. The cluster is back to its normal, pre-failover state.
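
      The same sequence can be scripted with couchbase-cli; the exact flags may vary between Couchbase Server versions, so treat this as a sketch rather than a definitive procedure:

      # 1-2. Fail over the node; replica partitions start serving traffic
      couchbase-cli failover -c <clusterHost>:8091 -u <username> -p <password> \
        --server-failover <nodeToMaintain>:8091 --hard

      # 3. Apply the software update or perform the hardware maintenance

      # 4. Mark the node for recovery, then run the recovery rebalance
      couchbase-cli recovery -c <clusterHost>:8091 -u <username> -p <password> \
        --server-recovery <nodeToMaintain>:8091 --recovery-type delta
      couchbase-cli rebalance -c <clusterHost>:8091 -u <username> -p <password>

      # 5. The cluster is back to its pre-failover state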