You are viewing the documentation for a prerelease version.

View Latest

Aggregate Functions

Aggregate functions take multiple values from documents, perform calculations, and return a single value as the result. The function names are case insensitive.

You can only use aggregate functions in SELECT, LETTING, HAVING, and ORDER BY clauses. When using an aggregate function in a query, the query operates as an aggregate query.

In Couchbase Server 6.5 and later, aggregate functions can also be used as window functions, when they are used with a window definition, which is introduced by the OVER keyword.

This page describes the syntax of aggregate functions. See the OVER clause and Window Functions page for the generic syntax of window functions, window options, and the window definition.

Syntax

Aggregate Functions

aggregate-function ::= function-name '(' [ aggregate-quantifier ] expr ')'
aggregate function

Aggregate Functions as Window Functions

Enterprise Edition | Introduced in Couchbase Server 6.5

aggregate-function-window ::= function-name '(' [ aggregate-quantifier ] expr ')' OVER '(' window-definition ')'
function-name '(' aggregate-quantifier? expr ')' 'OVER' '(' window-definition ')'

Arguments

Aggregate functions take one argument, which is used to compute the aggregate function. The COUNT function can also take a wildcard (*) or a path with a wildcard (path.*) as its argument.

Aggregate Quantifier

aggregate-quantifier ::= ALL | DISTINCT
aggregate quantifier

The aggregate quantifier determines whether the function aggregates all values in the group, or distinct values only.

ALL

All objects are included in the computation.

DISTINCT

Only distinct objects are included in the computation.

This quantifier can only be used with aggregate functions.

This quantifier is optional. If omitted, the default value is ALL.

Default Values

If there is no input row for the group, COUNT functions return 0. All other aggregate functions return NULL.

ARRAY_AGG( [ ALL | DISTINCT ] expression)

Return Value

With the ALL quantifier, or no quantifier, returns an array of the non-MISSING values in the group, including NULL values.

With the DISTINCT quantifier, returns an array of the distinct non-MISSING values in the group, including NULL values.

Examples

List all values of the Cleanliness reviews given.

SELECT ARRAY_AGG(reviews[0].ratings.Cleanliness) AS Reviews FROM `travel-sample`;
Results
[
  {
    "Reviews": [
      -1,
      -1,
      -1,
      -1,
      -1,
      ...
    ]
  }
]

List all unique values of the Cleanliness reviews given.

SELECT ARRAY_AGG(DISTINCT reviews[0].ratings.Cleanliness) AS Reviews FROM `travel-sample`;
Results
[
  {
    "UniqueReviews": [
      -1,
      1,
      2,
      3,
      4,
      5
    ]
  }
]

AVG( [ ALL | DISTINCT ] expression)

This function has an alias MEAN().

Return Value

With the ALL quantifier, or no quantifier, returns the arithmetic mean (average) of all the number values in the group.

With the DISTINCT quantifier, returns the arithmetic mean (average) of all the distinct number values in the group.

Returns NULL if there are no number values in the group.

Examples

The average altitude of airports in the travel-sample keyspace.

SELECT AVG(geo.alt) AS AverageAltitude FROM `travel-sample` WHERE type="airport";
Results
[
  {
    "AverageAltitude": 870.1651422764228
  }
]

The average number of airline route stops vs. the DISTINCT average of airline route stops.

SELECT AVG(stops) FROM `travel-sample`; (1)

SELECT AVG(DISTINCT stops) FROM `travel-sample`; (2)
1 Results in 0.0002 since nearly all docs have 0 stops.
2 Results in 0.5 since the docs contain only 1 or 0 stops.

COUNT(*)

Return Value

Returns count of all the input rows for the group, regardless of value.

Example

The number of documents in travel-sample.

SELECT COUNT(*) AS CountAll FROM `travel-sample`;
Results
[
  {
    "CountAll": 31591
  }
]

COUNT( [ ALL | DISTINCT ] expression)

Return Value

With the ALL quantifier, or no quantifier, returns count of all the non-NULL and non-MISSING values in the group.

With the DISTINCT quantifier, returns count of all the distinct non-NULL and non-MISSING values in the group.

Examples

The number of documents with an airline route stop in travel-sample regardless of its value.

SELECT COUNT(stops) AS CountOfStops FROM `travel-sample`;
Results
[
  {
    "CountOfStops": 24024
  }
]

The number of unique values of airline route stops in travel-sample.

SELECT COUNT(DISTINCT stops) AS CountOfDistinctStops FROM `travel-sample`;
Results
[
  {
    "CountOfSDistinctStops": 2 (1)
  }
]
1 Results in 2 because there are only 0 or 1 stops.

COUNTN( [ ALL | DISTINCT ] expression )

Return Value

With the ALL quantifier, or no quantifier, returns a count of all the numeric values in the group.

With the DISTINCT quantifier, returns a count of all the distinct numeric values in the group.

Examples

The count of numeric values in a mixed group.

SELECT COUNTN(list.val) AS CountOfNumbers
FROM [
  {"val":1},
  {"val":1},
  {"val":2},
  {"val":"abc"}
] AS list;
Results
[
  {
    "CountOfNumbers": 3
  }
]

The count of unique numeric values in a mixed group.

SELECT COUNTN(DISTINCT list.val) AS CountOfNumbers
FROM [
  {"val":1},
  {"val":1},
  {"val":2},
  {"val":"abc"}
] AS list;
Results
[
  {
    "CountOfNumbers": 2
  }
]

MAX( [ ALL | DISTINCT ] expression)

Return Value

Returns the maximum non-NULL, non-MISSING value in the group in N1QL collation order.

This function returns the same result with the ALL quantifier, the DISTINCT quantifier, or no quantifier.

Examples

Max of an integer field.

Find the northernmost latitude of any hotel in travel-sample.

SELECT MAX(geo.lat) AS MaxLatitude FROM `travel-sample` WHERE type="hotel";
Results
[
  {
    "MaxLatitude": 60.15356
  }
]

Max of a string field.

Find the hotel whose name is last alphabetically in travel-sample.

SELECT MAX(name) AS MaxName FROM `travel-sample` WHERE type="hotel";
Results
[
  {
    "MaxName": "pentahotel Birmingham"
  }
]

That result might have been surprising since lowercase letters come after uppercase letters and are therefore "higher" than uppercase letters. To avoid this uppercase/lowercase confusion, you should first make all values uppercase or lowercase, as in the following example.

Max of a string field, regardless of case.

Find the hotel whose name is last alphabetically in travel-sample.

SELECT MAX(UPPER(name)) AS MaxName FROM `travel-sample` WHERE type="hotel";
Results
[
  {
    "MaxName": "YOSEMITE LODGE AT THE FALLS"
  }
]

MEAN( [ ALL | DISTINCT ] expression)

Alias for AVG().

MEDIAN( [ ALL | DISTINCT ] expression)

Return Value

With the ALL quantifier, or no quantifier, returns the median of all the number values in the group. If there is an even number of number values, returns the mean of the median two values.

With the DISTINCT quantifier, returns the median of all the distinct number values in the group. If there is an even number of distinct number values, returns the mean of the median two values.

Returns NULL if there are no number values in the group.

Examples

The median altitude of airports in the travel-sample keyspace.

SELECT MEDIAN(geo.alt) AS MedianAltitude FROM `travel-sample` WHERE type="airport";
Results
[
  {
    "MedianAltitude": 361.5
  }
]

The median of distinct altitudes of airports in the travel-sample keyspace.

SELECT MEDIAN(DISTINCT geo.alt) AS MedianDistinctAltitude FROM `travel-sample` WHERE type="airport";
Results
[
  {
    "MedianDistinctAltitude": 758
  }
]

MIN( [ ALL | DISTINCT ] expression)

Return Value

Returns the minimum non-NULL, non-MISSING value in the group in N1QL collation order.

This function returns the same result with the ALL quantifier, the DISTINCT quantifier, or no quantifier.

Examples

Min of an integer field.

Find the southernmost latitude of any hotel in travel-sample.

SELECT MIN(geo.lat) AS MinLatitude FROM `travel-sample` WHERE type="hotel";
Results
[
  {
    "MinLatitude": 32.68092
  }
]

Min of a string field.

Find the hotel whose name is first alphabetically in travel-sample.

SELECT MIN(name) AS MinName FROM `travel-sample` WHERE type="hotel";
Results
[
  {
    "MinName": "'La Mirande Hotel"
  }
]

That result might have been surprising since some symbols come before letters and are therefore "lower" than letters. To avoid this symbol confusion, you can specify letters only, as in the following example.

Min of a string field, regardless of preceding non-letters.

Find the first hotel alphabetically in travel-sample.

SELECT MIN(name) AS MinName FROM `travel-sample` WHERE type="hotel" AND SUBSTR(name,0)>="A";
Results
[
  {
    "MinName": "AIRE NATURELLE LE GROZEAU Aire naturelle"
  }
]

STDDEV( [ ALL | DISTINCT ] expression)

Return Value

With the ALL quantifier, or no quantifier, returns the corrected sample standard deviation of all the number values in the group.

With the DISTINCT quantifier, returns the corrected sample standard deviation of all the distinct number values in the group.

Returns NULL if there are no number values in the group.

Examples

Sample standard deviation of all values.

SELECT STDDEV(reviews[0].ratings.Cleanliness) AS StdDev FROM `travel-sample` WHERE city="London" AND `type`="hotel";
Results
[
  {
    "StdDev": 2.0554275433769753
  }
]

Sample standard deviation of a single value.

SELECT STDDEV(reviews[0].ratings.Cleanliness) AS StdDevSingle FROM `travel-sample` WHERE name="Sachas Hotel";
Results
[
  {
    "StdDevSingle": 0 (1)
  }
]
1 There is only one matching result in the input, so the function returns 0.

Sample standard deviation of distinct values.

SELECT STDDEV(DISTINCT reviews[0].ratings.Cleanliness) AS StdDevDistinct FROM `travel-sample` WHERE city="London" AND `type`="hotel";
Results
[
  {
    "StdDevDistinct": 2.1602468994692865
  }
]

STDDEV_POP( [ ALL | DISTINCT ] expression)

Return Value

With the ALL quantifier, or no quantifier, returns the population standard deviation of all the number values in the group.

With the DISTINCT quantifier, returns the population standard deviation of all the distinct number values in the group.

Returns NULL if there are no number values in the group.

Examples

Population standard deviation of all values.

SELECT STDDEV_POP(reviews[0].ratings.Cleanliness) AS PopStdDev FROM `travel-sample` WHERE city="London" AND `type`="hotel";
Results
[
  {
    "PopStdDev": 2.0390493736539432
  }
]

Population standard deviation of distinct values.

SELECT STDDEV_POP(DISTINCT reviews[0].ratings.Cleanliness) AS PopStdDevDistinct FROM `travel-sample` WHERE city="London" AND `type`="hotel";
Results
[
  {
      "PopStdDevDistinct": 1.9720265943665387
  }
]

STDDEV_SAMP( [ ALL | DISTINCT ] expression)

A near-synonym for STDDEV(). The only difference is that STDDEV_SAMP() returns NULL if there is only one matching element.

Example

Sample standard deviation of a single value.

SELECT STDDEV_SAMP(reviews[0].ratings.Cleanliness) AS StdDevSamp FROM `travel-sample` WHERE name="Sachas Hotel";
Results
[
  {
    "StdDevSamp": null (1)
  }
]
1 There is only one matching result in the input, so the function returns NULL.

SUM( [ ALL | DISTINCT ] expression)

Return Value

With the ALL quantifier, or no quantifier, returns the sum of all the number values in the group.

With the DISTINCT quantifier, returns the arithmetic sum of all the distinct number values in the group.

Returns NULL if there are no number values in the group.

Examples

The sum total of all airline route stops in travel-sample.

In the travel-sample bucket, nearly all flights are non-stop (0 stops) and only six flights have 1 stop, so we expect 6 flights of 1 stop each, a total of 6.
SELECT SUM(stops) AS SumOfStops FROM `travel-sample`;
Results
[
  {
    "SumOfStops": 6 (1)
  }
]
1 There are 6 routes with 1 stop each.

The sum total of all unique numbers of airline route stops in travel-sample.

SELECT SUM(DISTINCT stops) AS SumOfDistinctStops FROM `travel-sample`;
Results
[
  {
    "SumOfDistinctStops": 1 (1)
  }
]
1 There are only 0 and 1 stops per route; and 0 + 1 = 1.

VARIANCE( [ ALL | DISTINCT ] expression)

Return Value

With the ALL quantifier, or no quantifier, returns the unbiased sample variance (the square of the corrected sample standard deviation) of all the number values in the group.

With the DISTINCT quantifier, returns the unbiased sample variance (the square of the corrected sample standard deviation) of all the distinct number values in the group.

Returns NULL if there are no number values in the group.

This function has a near-synonym VARIANCE_SAMP(). The only difference is that VARIANCE() returns NULL if there is only one matching element.

Examples

Sample variance of all values.

SELECT VARIANCE(reviews[0].ratings.Cleanliness) AS Variance FROM `travel-sample` WHERE city="London" AND `type`="hotel";
Results
[
  {
    "Variance": 4.224782386072708
  }
]

Sample variance of a single value.

SELECT VARIANCE(reviews[0].ratings.Cleanliness) AS VarianceSingle FROM `travel-sample` WHERE name="Sachas Hotel";
Results
[
  {
    "VarianceSingle": 0 (1)
  }
]
1 There is only one matching result in the input, so the function returns 0.

Sampling variance of distinct values.

SELECT VARIANCE(DISTINCT reviews[0].ratings.Cleanliness) AS VarianceDistinct FROM `travel-sample` WHERE city="London" AND `type`="hotel";
Results
[
  {
    "VarianceDistinct": 4.666666666666667
  }
]

VARIANCE_POP( [ ALL | DISTINCT ] expression)

This function has an alias VAR_POP().

Return Value

With the ALL quantifier, or no quantifier, returns the population variance (the square of the population standard deviation) of all the number values in the group.

With the DISTINCT quantifier, returns the population variance (the square of the population standard deviation) of all the distinct number values in the group.

Returns NULL if there are no number values in the group.

Examples

Population variance of all values.

SELECT VARIANCE_POP(reviews[0].ratings.Cleanliness) AS PopVariance FROM `travel-sample` WHERE city="London" AND `type`="hotel";
Results
[
  {
    "PopVariance": 4.157722348198537
  }
]

Population variance of distinct values.

SELECT VARIANCE_POP(DISTINCT reviews[0].ratings.Cleanliness) AS PopVarianceDistinct FROM `travel-sample` WHERE city="London" AND `type`="hotel";
Results
[
  {
      "PopVarianceDistinct": 3.8888888888888893
  }
]

VARIANCE_SAMP( [ ALL | DISTINCT ] expression)

A near-synonym for VARIANCE(). The only difference is that VARIANCE_SAMP() returns NULL if there is only one matching element.

This function has an alias VAR_SAMP().

Example

Sample standard deviation of a single value.

SELECT VARIANCE_SAMP(reviews[0].ratings.Cleanliness) AS VarianceSamp FROM `travel-sample` WHERE name="Sachas Hotel";
Results
[
  {
    "VarianceSamp": null (1)
  }
]
1 There is only one matching result in the input, so the function returns NULL.

VAR_POP( [ ALL | DISTINCT ] expression)

Alias for VARIANCE_POP().

VAR_SAMP( [ ALL | DISTINCT ] expression)

Alias for VARIANCE_SAMP().

Formulas

The corrected sample standard deviation is calculated according to the following formula.

s = sqrt(1/(n-1) sum_(i=1)^n (x_i - barx)^2)
Figure 1. Corrected Sample Standard Deviation

The population standard deviation is calculated according to the following formula.

sigma = sqrt((sum(x_i - mu)^2)/N)
Figure 2. Population Standard Deviation

GROUP BY Clause for GROUP BY, LETTING, and HAVING clauses.