Aggregate Functions

  • Capella Operational
  • reference

Aggregate functions take multiple values from documents, perform calculations, and return a single value as the result. The function names are case insensitive.

You can only use aggregate functions in SELECT, LETTING, HAVING, and ORDER BY clauses. When using an aggregate function in a query, the query operates as an aggregate query.

In Couchbase Capella, you can use aggregate functions as window functions by specifying a window definition using the OVER Clause.

In Couchbase Server 7.0 and later, window functions (and aggregate functions used as window functions) may specify their own inline window definitions, or they may refer to a named window defined by the WINDOW clause elsewhere in the query. By defining a named window with the WINDOW clause, you can reuse the window definition across several functions in the query, potentially making the query easier to write and maintain.

Syntax

This section describes the generic syntax of aggregate functions. For details of individual aggregate functions, see the sections that follow.

aggregate-function ::= aggregate-function-name '(' ( aggregate-quantifier? expr |
                       ( path '.' )? '*' ) ')' filter-clause? over-clause?
Syntax diagram
aggregate-quantifier

Aggregate Quantifier

filter-clause

FILTER Clause

over-clause

OVER Clause

Arguments

Aggregate functions take a single expression as an argument, which is used to compute the aggregate function. The COUNT function can instead take a wildcard (*) or a path with a wildcard (path.*) as its argument.

Aggregate Quantifier

aggregate-quantifier ::= 'ALL' | 'DISTINCT'
Syntax diagram

An aggregate quantifier determines whether the function aggregates all values in a group or only distinct values. You can use the quantifiers only with aggregate functions.

ALL

Includes all values in the computation.

DISTINCT

Includes only distinct values in the computation.

These quantifiers are optional and the default value is ALL.

FILTER Clause

filter-clause ::= 'FILTER' '(' 'WHERE' cond ')'
Syntax diagram

The FILTER clause enables you to specify which values are included in the aggregate. This clause is available for aggregate functions, and aggregate functions used as window functions. (It is not permitted for dedicated window functions.)

The FILTER clause is useful when a query contains several aggregate functions, each of which requires a different condition.

cond

[Required] Conditional expression. Values for which the condition resolves to TRUE are included in the aggregation.

The conditional expression is subject to the same rules as the conditional expression in the query WHERE clause, and the same rules as aggregation operands. It may not contain a subquery, a window function, or an outer reference.

If the query block contains an aggregate function which uses the FILTER clause, the aggregation is not pushed down to the indexer. Refer to indexes:groupby-aggregate-performance.adoc#filter-clause for more details.

OVER Clause

over-clause ::= 'OVER' ( '(' window-definition ')' | window-ref )
Syntax diagram

The OVER clause introduces the window specification for the function. There are two ways of specifying the window.

  • An inline window definition specifies the window directly within the function call. It is delimited by parentheses () and has exactly the same syntax as the window definition in a WINDOW clause. For further details, refer to Window Definition.

  • A window reference is an identifier which refers to a named window. The named window must be defined by a WINDOW clause in the same query block as the function call. For further details, refer to WINDOW Clause.

Default Values

If there is no input row for the group, COUNT functions return 0. All other aggregate functions return NULL.

ARRAY_AGG( [ ALL | DISTINCT ] expression)

Description

Returns an array of non-MISSING values from an expression.

You can use the ALL or DISTINCT quantifier to specify which values to include in the calculation. For more information, see Aggregate Quantifier.

The function ignores MISSING values, but includes NULL values.

Arguments

See Syntax.

Return Value

The function returns 1 of the following:

  • An array of non-MISSING values.

  • NULL if all values are MISSING.

Examples

List all values into an array
Query
SELECT ARRAY_AGG(input) AS agg_all
FROM [1, 2, 2, 3, "abc", MISSING, NULL]
AS input;
Results
[
  {
    "agg_all": [
      null,
      1,
      2,
      2,
      3,
      "abc"
    ]
  }
]
List distinct values into an array
Query
SELECT ARRAY_AGG(DISTINCT input) AS agg_distinct
FROM [1, 2, 2, 3, "abc", MISSING, NULL]
AS input;
Results
[
  {
    "agg_distinct": [
      null,
      1,
      2,
      3,
      "abc"
    ]
  }
]

AVG( [ ALL | DISTINCT ] expression)

This function has a synonym MEAN().

Description

Returns the arithmetic mean (average) of all numeric values in an expression.

You can use the ALL or DISTINCT quantifier to specify which values to include in the calculation. For more information, see Aggregate Quantifier.

Arguments

See Syntax.

Return Value

The function returns 1 of the following:

  • A number that represents the mean.

  • NULL if all values are non-numeric, MISSING, or NULL.

Examples

Find the average of all numbers
Query
SELECT AVG(input) AS avg_all
FROM [1, 1, 2, 2, 3, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "avg_all": 1.8
  }
]
Find the average of distinct numbers
Query
SELECT AVG(DISTINCT input) AS avg_distinct
FROM [1, 1, 2, 2, 3, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "avg_distinct": 2
  }
]

In this example, the average is (1 + 2 + 3) / 3 = 2.

COUNT(*)

Description

Returns the total count of all rows in an aggregated group. [1]

The * wildcard indicates that the function should count all rows, including those with NULL and MISSING values.

To get the count of only non-NULL and non-MISSING values in a group, use COUNT(expression).

Arguments

This function does not take any arguments.

Return Value

The function returns 1 of the following:

  • A number that represents the count of all rows.

  • 0 if the group is empty.

Example

Find the count of all rows
Query
SELECT COUNT(*) AS count_all_rows
FROM [1, 1, 2, 2, 3, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "count_all_rows": 8
  }
]

COUNT( [ ALL | DISTINCT ] expression)

Description

Returns the count of all non-NULL and non-MISSING values in an expression. [1]

You can use the ALL or DISTINCT quantifier to specify which values to include in the calculation. For more information, see Aggregate Quantifier.

To get the count of all rows, including those with NULL and MISSING values, use COUNT(*).

Return Value

The function returns 1 of the following:

  • A number that represents the count.

  • 0 if all values are MISSING or NULL.

Examples

Find the count of all values
Query
SELECT COUNT(input) AS count_all
FROM [1, 1, 2, 2, 3, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "count_all": 6
  }
]
Find the count of distinct values
Query
SELECT COUNT(DISTINCT input) AS count_distinct
FROM [1, 1, 2, 2, 3, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "count_distinct": 4
  }
]

COUNTN( [ ALL | DISTINCT ] expression )

Description

Returns the count of numeric values in an expression. [1]

You can use the ALL or DISTINCT quantifier to specify which values to include in the calculation. For more information, see Aggregate Quantifier.

Arguments

See Syntax.

Return Value

The function returns 1 of the following:

  • A number that represents the count.

  • 0 if there are no numeric values.

Examples

Find the count of all numeric values
Query
SELECT COUNTN(input) AS count_all
FROM [ 1, 1, 2, 2, 3, "abc", MISSING, NULL]
AS input;
Results
[
  {
    "count_all": 5
  }
]
Find the count of distinct numeric values
Query
SELECT COUNTN(DISTINCT input) AS count_distinct
FROM [ 1, 1, 2, 2, 3, "abc", MISSING, NULL]
AS input;
Results
[
  {
    "count_distinct": 3
  }
]

MAX( [ ALL | DISTINCT ] expression)

Description

Returns the maximum value in an expression. The function ignores MISSING and NULL values.

When comparing values of different data types, the function uses SQL++ collation rules to determine precedence.

The ALL and DISTINCT quantifiers do not affect the result of this function. The maximum value remains the same whether or not duplicates are included.

Arguments

See Syntax.

Return Value

The function returns 1 of the following:

  • The maximum value in the group.

  • NULL if all values are either MISSING or NULL.

Examples

Find the max value from a group of numbers
Query
SELECT MAX(input) AS max_value_num
FROM [1, 3, 2, 3, MISSING, NULL]
AS input;
Results
[
  {
    "max_value_num": 3
  }
]
Find the max value from a group with mixed types
Query
SELECT MAX(input) AS max_value_all
FROM [1, 2, 3, "airline", "2025-12-01T00:00:00Z", NULL]
AS input;
Results
[
  {
    "max_value_all": "airline"
  }
]

The function returns airline because strings have a higher precedence than numbers and dates.

Find the max value from a group of strings
Query
SELECT MAX(input) AS max_value_string
FROM ["United", "Delta", "American", "Southwest"]
AS input;
Results
[
  {
    "max_value_string": "United"
  }
]

When comparing string values, the function uses alphabetical order to determine the maximum value.

MEAN( [ ALL | DISTINCT ] expression)

Synonym of AVG().

MEDIAN( [ ALL | DISTINCT ] expression)

Description

Returns the median of all numeric values in an expression.

You can use the ALL or DISTINCT quantifier to specify which values to include in the calculation. For more information, see Aggregate Quantifier.

Arguments

See Syntax.

Return Value

The function returns 1 of the following:

  • A number that represents the median value.

  • A number that represents the mean of the 2 median values if the number of numeric values is even.

  • NULL if there are no numeric values in the group.

Examples

Find the median when the number of values is odd
Query
SELECT MEDIAN(input) AS median_value
FROM [1, 2, 3, 3, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "median_value": 3
 }
]

In this example, the median is the middle value 3.

Find the median when the number of values is even
Query
SELECT MEDIAN(input) AS median_value
FROM [1, 2, 3, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "median_value": 3
 }
]

In this example, the median is the mean of the 2 middle values (3 and 3), which is 3.

Find the median of distinct values
Query
SELECT MEDIAN(DISTINCT input) AS median_value
FROM [1, 2, 3, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "median_value": 2.5
  }
]

In this example, the number of distinct numeric values is even. Therefore, the median is the mean of the 2 middle values (2 and 3), which is 2.5.

MIN( [ ALL | DISTINCT ] expression)

Description

Returns the minimum value in an expression. The function ignores MISSING and NULL values.

When comparing values of different data types, the function uses SQL++ collation rules to determine precedence.

The ALL and DISTINCT quantifiers do not affect the result of this function. The minimum value remains the same whether or not duplicates are included.

Arguments

See Syntax.

Return Value

The function returns 1 of the following:

  • The minimum value in the group.

  • NULL if all values are either MISSING or NULL.

Examples

Find the minimum value from a group of numbers
Query
SELECT MIN(input) AS min_value_num
FROM [3, 1, 2, 1, MISSING, NULL]
AS input;
Results
[
  {
    "min_value_num": 1
  }
]
Find the minimum value from a group with mixed types
Query
SELECT MIN(input) AS min_value_all
FROM [1, 2, 3, "airline", "2025-12-01T00:00:00Z", NULL]
AS input;
Results
[
  {
    "min_value_all": 1
  }
]

The function returns 1 because numbers have a lower precedence than strings and dates.

Find the minimum value from a group of strings
Query
SELECT MIN(input) AS min_value_string
FROM ["United", "Delta", "American", "Southwest"]
AS input;
Results
[
  {
    "min_value_string": "American"
  }
]

When comparing string values, the function uses alphabetical order to determine the minimum value.

STDDEV( [ ALL | DISTINCT ] expression)

Description

Returns the corrected sample standard deviation of all numeric values in an expression.

You can use the ALL or DISTINCT quantifier to specify which values to include in the calculation. For more information, see Aggregate Quantifier.

This function is similar to STDDEV_SAMP(). However, it returns 0 if there is only 1 matching value, while STDDEV_SAMP() returns NULL in such cases.

Arguments

See Syntax.

Return Value

The function returns 1 of the following:

  • A number that represents the corrected sample standard deviation.

  • 0 if the group contains only 1 numeric value.

  • NULL if there are no numeric values in the group.

Examples

Find the standard deviation of all values
Query
SELECT STDDEV(input) AS std_deviation_all
FROM [1, 2, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "std_deviation_all": 1.3038404810405297
  }
]
Find the standard deviation of distinct values
Query
SELECT STDDEV(DISTINCT input) AS std_deviation_distinct
FROM [1, 2, 2, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "std_deviation_distinct": 1.2909944487358056
  }
]
Find the standard deviation of a single numeric value
Query
SELECT STDDEV(input) AS std_deviation_single
FROM [3, NULL, "abc"]
AS input;
Results
[
  {
    "std_deviation_single": 0
  }
]

STDDEV_POP( [ ALL | DISTINCT ] expression)

Description

Returns the population standard deviation of all numeric values in an expression.

You can use the ALL or DISTINCT quantifier to specify which values to include in the calculation. For more information, see Aggregate Quantifier.

Arguments

See Syntax.

Return Value

The function returns 1 of the following:

  • A number that represents the population standard deviation.

  • 0 if the group contains only 1 numeric value.

  • NULL if there are no numeric values in the group.

Examples

Find the population standard deviation of all values
Query
SELECT STDDEV_POP(input) AS pop_deviation_all
FROM [1, 2, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "pop_deviation_all": 1.16619037896906
  }
]
Find the population standard deviation of distinct values
Query
SELECT STDDEV_POP(DISTINCT input) AS pop_deviation_distinct
FROM [1, 2, 2, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "pop_deviation_distinct": 1.118033988749895
  }
]

STDDEV_SAMP( [ ALL | DISTINCT ] expression)

Description

Returns the corrected sample standard deviation of all numeric values in an expression.

You can use the ALL or DISTINCT quantifier to specify which values to include in the calculation. For more information, see Aggregate Quantifier.

This function is similar to STDDEV(). However, it returns NULL if there is only 1 matching value, while STDDEV() returns 0 in such cases.

Arguments

See Syntax.

Return Value

The function returns 1 of the following:

  • A number that represents the sample standard deviation.

  • NULL if there are fewer than 2 numeric values in the group.

Example

Find the sample standard deviation of all values
Query
SELECT STDDEV_SAMP(input) AS std_deviation_all
FROM [1, 2, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "std_deviation_all": 1.3038404810405297
  }
]
Find the sample standard deviation of distinct values
Query
SELECT STDDEV_SAMP(DISTINCT input) AS std_deviation_distinct
FROM [1, 2, 2, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "std_deviation_distinct": 1.2909944487358056
  }
]
Find the sample standard deviation of a single numeric value
Query
SELECT STDDEV_SAMP(input) AS std_dev_sample
FROM [3, NULL, "abc"]
AS input;
Results
[
  {
    "std_dev_sample": null
  }
]

SUM( [ ALL | DISTINCT ] expression)

Description

Returns the arithmetic sum of all number values in an expression.

You can use the ALL or DISTINCT quantifier to specify which values to include in the calculation. For more information, see Aggregate Quantifier.

Arguments

See Syntax.

Return Value

The function returns 1 of the following:

  • A number that represents the sum.

  • NULL if there are no numeric values in the group.

Examples

Find the sum of all numbers
Query
SELECT SUM(input) AS sum_all
FROM [1, 2, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "sum_all": 14
  }
]
Find the sum of distinct numbers
Query
SELECT SUM(DISTINCT input) AS sum_distinct
FROM [1, 2, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "sum_distinct": 10
  }
]

VARIANCE( [ ALL | DISTINCT ] expression)

Description

Returns the unbiased sample variance (the square of the corrected sample standard deviation) of all numeric values in an expression.

You can use the ALL or DISTINCT quantifier to specify which values to include in the calculation. For more information, see Aggregate Quantifier.

This function is similar to VARIANCE_SAMP(). However, it returns 0 if there is only 1 matching value, while VARIANCE_SAMP() returns NULL in such cases.

Arguments

See Syntax.

Return Value

The function returns 1 of the following:

  • A number that represents the unbiased sample variance.

  • 0 if the group contains only 1 numeric value.

  • NULL if there are no numeric values in the group.

Examples

Find the sample variance of all values
Query
SELECT VARIANCE(input) AS variance_all
FROM [1, 2, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "variance_all": 1.7
  }
]
Find the sample variance of distinct values
Query
SELECT VARIANCE(DISTINCT input) AS variance_distinct
FROM [1, 2, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "variance_distinct": 1.6666666666666667
  }
]
Find the sample variance of a single numeric value
Query
SELECT VARIANCE(input) AS variance_single
FROM [3, NULL, "abc"]
AS input;
Results
[
  {
    "variance_single": 0
  }
]

VARIANCE_POP( [ ALL | DISTINCT ] expression)

This function has a synonym VAR_POP().

Description

Returns the population variance (the square of the population standard deviation) of all numeric values in an expression.

You can use the ALL or DISTINCT quantifier to specify which values to include in the calculation. For more information, see Aggregate Quantifier.

Arguments

See Syntax.

Return Value

The function returns 1 of the following:

  • A number that represents the population variance.

  • NULL if there are no numeric values in the group.

Examples

Find the population variance of all values
Query
SELECT VARIANCE_POP(input) AS pop_variance_all
FROM [1, 2, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "pop_variance_all": 1.3599999999999999
  }
]
Find the population variance of distinct values
Query
SELECT VARIANCE_POP(DISTINCT input) AS pop_variance_distinct
FROM [1, 2, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "pop_variance_distinct": 1.25
  }
]
Find the population variance of a single numeric value
Query
SELECT VARIANCE_POP(input) AS pop_variance_single
FROM [3, NULL, "abc"]
AS input;
Results
[
  {
    "pop_variance_single": null
  }
]

VARIANCE_SAMP( [ ALL | DISTINCT ] expression)

This function has a synonym VAR_SAMP().

Description

Returns the unbiased sample variance (the square of the corrected sample standard deviation) of all numeric values in an expression.

You can use the ALL or DISTINCT quantifier to specify which values to include in the calculation. For more information, see Aggregate Quantifier.

This function is similar to VARIANCE(). However, it returns NULL if there is only 1 matching value, while VARIANCE() returns 0 in such cases.

Arguments

See Syntax.

Return Value

The function returns 1 of the following:

  • A number that represents the sample variance.

  • NULL if there is fewer than 2 numeric values in the group.

Examples

Find the sample standard variance of all values
Query
SELECT VARIANCE_SAMP(input) AS variance_all
FROM [1, 2, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "variance_all": 1.7
  }
]
Find the sample variance of distinct values
Query
SELECT VARIANCE_SAMP(DISTINCT input) AS variance_distinct
FROM [1, 2, 3, 4, 4, MISSING, NULL, "abc"]
AS input;
Results
[
  {
    "variance_distinct": 1.6666666666666667
  }
]
Find the sample variance of a single numeric value
Query
SELECT VARIANCE_SAMP(input) AS variance_single
FROM [3, NULL, "abc"]
AS input;
Results
[
  {
    "variance_single": null
  }
]

VAR_POP( [ ALL | DISTINCT ] expression)

Synonym of VARIANCE_POP().

VAR_SAMP( [ ALL | DISTINCT ] expression)

Synonym of VARIANCE_SAMP().

Formulas

Corrected Sample Standard Deviation

Formula for calculating the corrected sample standard deviation:

\$s = sqrt(1/(n-1) sum_(i=1)^n (x_i - barx)^2)\$
Population Standard Deviation

Formula for calculating the population standard deviation:

\$sigma = sqrt((sum(x_i - mu)^2)/N)"\$

1. When counting all the documents within a collection, this function usually relies on the collection statistics, which include any transaction records that may be stored in that collection. However, if the query performs an index scan using the primary index on that collection, counting all documents does not include any transaction records.