
Data Service nodes

Data Service nodes handle data service operations, such as create/read/update/delete (CRUD) operations.

It is important to keep use cases and application workloads in mind, since different workloads have different resource requirements. For example, if your working set needs to be held entirely in memory, you might need a large amount of RAM. On the other hand, if your application requires only 5% of the data to be in memory, you will need enough disk space to store all the data and disks fast enough for your read/write operations.

You can start sizing the Data Service nodes by asking the following questions:

  1. Is your application primarily (or even exclusively) using individual document access?

  2. Do you plan to use Views?

  3. Do you plan to use XDCR? Refer to the Couchbase Server 4.0 documentation for the different deployment topologies.

  4. What’s your working set size and what are your data operation throughput or latency requirements?

Answers to the above questions can help you better understand the capacity requirements of your cluster and provide a better estimate for sizing.

We looked at performance data and customer use cases to provide sizing calculations for each of these areas: CPU, Memory, Disk, and Network.

The following RAM-sizing use case is used as an example, with this input data:

Table 1. Input variables for sizing RAM
Input Variable             Value
documents_num              1,000,000
ID_size                    100
value_size                 10,000
number_of_replicas         1
working_set_percentage     20%

Table 2. Constants for sizing RAM
Constant                   Value
Type of Storage            SSD
overhead_percentage        25%
metadata_per_document      56 for 2.1 and higher, 64 for 2.0.x
high_water_mark            85%

Based on the provided data, a rough sizing guideline formula would be:

Table 3. Guideline formula for sizing a cluster
Variable                     Calculation
no_of_copies                 1 + number_of_replicas
total_metadata               (documents_num) * (metadata_per_document + ID_size) * (no_of_copies)
total_dataset                (documents_num) * (value_size) * (no_of_copies)
working_set                  total_dataset * (working_set_percentage)
Cluster RAM quota required   (total_metadata + working_set) * (1 + overhead_percentage) / (high_water_mark)
number of nodes              Cluster RAM quota required / per_node_ram_quota
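
The guideline formula in Table 3 can also be expressed as a small calculation. The Python sketch below is purely illustrative and not part of Couchbase Server; the function names are assumptions chosen to mirror the table, and per_node_ram_quota is simply the RAM quota you plan to configure on each node.

    import math

    def cluster_ram_quota_bytes(documents_num, id_size, value_size,
                                number_of_replicas, working_set_percentage,
                                metadata_per_document, overhead_percentage,
                                high_water_mark):
        """Estimate the cluster RAM quota in bytes using the Table 3 formula."""
        no_of_copies = 1 + number_of_replicas
        total_metadata = documents_num * (metadata_per_document + id_size) * no_of_copies
        total_dataset = documents_num * value_size * no_of_copies
        working_set = total_dataset * working_set_percentage
        return (total_metadata + working_set) * (1 + overhead_percentage) / high_water_mark

    def number_of_nodes(cluster_ram_quota, per_node_ram_quota):
        """A fractional node still requires a whole machine, so round up."""
        return math.ceil(cluster_ram_quota / per_node_ram_quota)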

Based on the above formula, these are the suggested sizing guidelines:

Table 4. Suggested sizing guideline
Variable                     Calculation
no_of_copies                 = 1 for original + 1 for replica = 2
total_metadata               = 1,000,000 * (100 + 56) * (2) = 312,000,000
total_dataset                = 1,000,000 * (10,000) * (2) = 20,000,000,000
working_set                  = 20,000,000,000 * (0.2) = 4,000,000,000
Cluster RAM quota required   = (312,000,000 + 4,000,000,000) * (1 + 0.25) / (0.85) = 6,341,176,470

This tells you that the RAM requirement for the whole cluster is roughly 7GB. Note that this amount is in addition to the RAM requirements for the operating system and any other software that runs on the cluster nodes.
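
As a sanity check of the arithmetic above, the same inputs from Tables 1 and 2 can be fed into the earlier sketch. The 16GB per-node RAM quota used here is a hypothetical value chosen only for illustration; it is not part of the worked example.

    quota = cluster_ram_quota_bytes(
        documents_num=1_000_000,
        id_size=100,
        value_size=10_000,
        number_of_replicas=1,
        working_set_percentage=0.20,
        metadata_per_document=56,   # Couchbase Server 2.1 and higher
        overhead_percentage=0.25,
        high_water_mark=0.85,
    )
    print(round(quota))             # 6341176471 bytes, roughly 7GB

    # per_node_ram_quota of 16GB is hypothetical, for illustration only.
    print(number_of_nodes(quota, per_node_ram_quota=16 * 1024**3))   # 1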