You are viewing the documentation for a prerelease version.

View Latest

UPDATE STATISTICS

  • Developer Preview
The UPDATE STATISTICS statement collects statistics on expressions over a named keyspace. These statistics are used by the cost-based optimizer when choosing the optimal plan to execute a query.

This is a Developer Preview feature, intended for development purposes only. Do not use this feature in production. No Enterprise Support is provided for Developer Preview features.

Refer to Developer Preview Mode for more information.

Purpose

The cost-based optimizer uses statistics to generate a query plan with the lowest processing cost, for SELECT, UPDATE, DELETE, MERGE, and INSERT INTO with SELECT queries.

After creating the indexes, and before running a query, you must use the UPDATE STATISTICS statement to gather distribution statistics for the expressions used in the query. The cost-based optimizer uses these distribution statistics, together with keyspace and index statistics, to select the optimal indexes and generate the query plan.

Syntax

update-statistics ::= UPDATE STATISTICS [ FOR ] named-keyspace-ref '(' index-expr [ ',' index-expr ]* ')' [ WITH options ]
'UPDATE' 'STATISTICS' 'FOR'? named-keyspace-ref '(' index-expr ( ',' index-expr )* ')' 'WITH' options

Named Keyspace Reference

named-keyspace-ref ::= [ namespace ':' ] keyspace
( namespace ':' )? keyspace

The name of the keyspace for which you want to gather statistics.

Index Expression

The expression for which you want to gather statistics. This may be any expression that is supported as an index key, including, but not limited to:

Options

An object with the following properties:

sample_size

[Optional] An integer specifying the sample size to use for distribution statistics. A minimum sample size is also calculated. If the specified sample size is smaller than the minimum sample size, the minimum sample size is used instead.

resolution

[Optional] A float representing the percentage of documents to store in each distribution bin. If omitted, the default value is 1.0, meaning each distribution bin contains 1% of the documents, and therefore 100 bins are required. The minimum resolution is 0.02 (5000 distribution bins) and the maximum is 5.0 (20 distribution bins).

update_statistics_timeout

[Optional] A number representing a duration in seconds. The command times out when this timeout period is reached. If omitted, a default timeout value is calculated based on the number of samples used.

Refer to Distribution Statistics for more information on sample size and resolution.

Usage

When you use an index with a query, you typically create the index on the fields which the query uses to filter. To use the cost-based optimizer with that query, you must collect statistics on the same fields that you used to create the index.

A query may have predicates on non-indexed fields, and you can collect statistics on those fields also to help the optimizer.

For a query which filters on an array or array of objects, you must collect the statistics using exactly the same expression that you used to create the index.

Result

The statement returns an empty array.

Examples

Example 1. Single predicate
Create indexes
CREATE INDEX idx_country_city ON `travel-sample`(country, city);
CREATE INDEX idx_city_country ON `travel-sample`(city, country);
Update statistics
UPDATE STATISTICS for `travel-sample`(type, city, country);
Query
EXPLAIN SELECT COUNT(*) FROM `travel-sample` WHERE country = 'France';
Result
[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan3",
          "cardinality": 147.4090909090909, (1)
          "cost": 82.99131818181819, (2)
          "covers": [
            "cover ((`travel-sample`.`country`))",
            "cover ((`travel-sample`.`city`))",
            "cover ((meta(`travel-sample`).`id`))",
            "cover (count(*))"
          ],
          "index": "idx_country_city",
          "index_group_aggs": {
            "aggregates": [
              {
                "aggregate": "COUNT",
                "expr": "1",
                "id": 3,
                "keypos": -1
              }
            ]
          },
          "index_id": "ebcdefa561f1e3f6",
          "index_projection": {
            "entry_keys": [
              3
            ]
          },
          "keyspace": "travel-sample",
          "namespace": "default",
          "spans": [
            {
              "exact": true,
              "range": [
                {
                  "high": "\"France\"",
                  "inclusion": 3,
                  "low": "\"France\""
                }
              ]
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "cover (count(*))"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT count(*) FROM `travel-sample` WHERE country = 'France';"
  }
]
1 The cardinality estimate for the index scan operator.
2 The cost estimate for the index scan operator.
Example 2. Multiple predicates

This example uses the same indexes as Example 1.

Update statistics
UPDATE STATISTICS for `travel-sample`(type, city, country, free_breakfast);

There is no index on the free_breakfast field. However, the query below refers to this field as a predicate, so we collect statistics on this field also.

Query
EXPLAIN SELECT COUNT(*) FROM `travel-sample` WHERE country = 'United States' AND type = 'hotel' AND free_breakfast = true;
Result
[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan3",
          "cardinality": 703.8225187176041, (1)
          "cost": 396.2520780380112,
          "index": "idx_country_city",
          "index_id": "ebcdefa561f1e3f6",
          "index_projection": {
            "primary_key": true
          },
          "keyspace": "travel-sample",
          "namespace": "default",
          "spans": [
            {
              "exact": true,
              "range": [
                {
                  "high": "\"United States\"",
                  "inclusion": 3,
                  "low": "\"United States\""
                }
              ]
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Fetch",
          "cardinality": 703.8225187176041, (2)
          "cost": 3296.0008551545407,
          "keyspace": "travel-sample",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "cardinality": 0.2514468840707154, (3)
                "condition": "((((`travel-sample`.`country`) = \"United States\") and ((`travel-sample`.`type`) = \"hotel\")) and ((`travel-sample`.`free_breakfast`) = true))",
                "cost": 3298.1123227106937
              },
              {
                "#operator": "InitialGroup",
                "aggregates": [
                  "count(*)"
                ],
                "group_keys": []
              }
            ]
          }
        },
        {
          "#operator": "IntermediateGroup",
          "aggregates": [
            "count(*)"
          ],
          "group_keys": []
        },
        {
          "#operator": "FinalGroup",
          "aggregates": [
            "count(*)"
          ],
          "group_keys": []
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "count(*)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT COUNT(*) FROM `travel-sample` WHERE country = 'United States' AND type = 'hotel' AND free_breakfast = true;"
  }
]
1 Cardinality and cost estimates for the index scan operator.
2 Cardinality and cost estimates for the fetch operator.
3 Cardinality and cost estimates for the filter operator.
Example 3. Filter on an array
Create index
CREATE INDEX idx_public_likes ON `travel-sample`((DISTINCT (`public_likes`)));
Update statistics
UPDATE STATISTICS FOR `travel-sample`((DISTINCT (`public_likes`)));
Query
EXPLAIN SELECT COUNT(1)
FROM `travel-sample`
WHERE ANY p IN public_likes SATISFIES p LIKE 'A%' END;
Results
[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "DistinctScan",
          "cardinality": 7.088088423748743,
          "cost": 3.0904065527544518,
          "scan": {
            "#operator": "IndexScan3",
            "cardinality": 7.088088423748743,
            "cost": 3.0691422874832055,
            "covers": [
              "cover ((distinct ((`travel-sample`.`public_likes`))))",
              "cover ((meta(`travel-sample`).`id`))"
            ],
            "filter_covers": {
              "cover (any `p` in (`travel-sample`.`public_likes`) satisfies ((\"A\" \u003c= `p`) and (`p` \u003c \"B\")) end)": true,
              "cover (any `p` in (`travel-sample`.`public_likes`) satisfies (`p` like \"A%\") end)": true
            },
            "index": "idx_public_likes",
            ...
          }
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "cardinality": 0.012547701674046188,
                "condition": "cover (any `p` in (`travel-sample`.`public_likes`) satisfies (`p` like \"A%\") end)",
                "cost": 3.111670818025698
              },
              {
                "#operator": "InitialGroup",
                "aggregates": [
                  "count(1)"
                ],
                "group_keys": []
              }
            ]
          }
        },
        ...
      ]
    },
    "text": "SELECT COUNT(1)\nFROM `travel-sample`\nWHERE ANY p IN public_likes SATISFIES p LIKE 'A%' END"
  }
]
Example 4. Filter on an array of objects
Create index
CREATE INDEX idx_reviews_ratings_overall ON `travel-sample`(DISTINCT ARRAY r.ratings.Overall FOR r IN reviews END);
Update statistics
UPDATE STATISTICS FOR `travel-sample`(DISTINCT ARRAY r.ratings.Overall FOR r IN reviews END);
Query
EXPLAIN SELECT COUNT(1)
FROM `travel-sample`
WHERE ANY r IN reviews SATISFIES r.ratings.Overall = 4 END
Results
[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan3",
          "cardinality": 57.22814556197126,
          "cost": 19.056972472136433,
          "covers": [
            "cover ((distinct (array ((`r`.`ratings`).`Overall`) for `r` in (`travel-sample`.`reviews`) end)))",
            "cover ((meta(`travel-sample`).`id`))",
            "cover (count(1))"
          ],
          "filter_covers": {
            "cover (any `r` in (`travel-sample`.`reviews`) satisfies (((`r`.`ratings`).`Overall`) = 4) end)": true
          },
          "index": "idx_reviews_ratings_overall",
          ...
        },
        ...
      ]
    },
    "text": "SELECT COUNT(1)\nFROM `travel-sample` WHERE ANY r IN reviews SATISFIES r.ratings.Overall = 4 END"
  }
]