Data Definition Language (DDL)

This section lists all supported DDL statements in Couchbase Analytics.

A user must have one of the following RBAC roles to be able to execute DDL statements (Admin, Cluster Admin, or Analytics Manager).
No CREATE/DROP or CONNECT/DISCONNECT statement can be executed while the cluster topology changes (e.g. during rebalance). The evaluation of such DDL statements will fail and it can be reattempted after the topology change has completed.

Entities

These are the entities that are required to create shadow datasets for Data Service buckets.

Dataverses

A dataverse is a namespace for the other metadata entities (links, buckets, datasets, indexes). There is one Default dataverse that is available without creating it and that cannot be deleted. Because the Default dataverse is always available, creating a dataverse is optional.

Datasets

Datasets contain a shadow of the data of a bucket. A dataset is updated as the bucket that it shadows is updated. A dataset can contain the full content of the bucket or a filtered subset. Multiple datasets can shadow the same bucket. A dataset is created ON a bucket with an optional filter expression. The name of the dataset is optional and defaults to the name of the bucket. Creating a dataset fails if there is an existing dataset with the same name in the selected dataverse and IF NOT EXISTS is not specified.

A link represents a connection to a Couchbase cluster, specifically to all buckets available to a specific user on a Couchbase cluster. Every dataverse implicitly contains a Local link for all buckets in the local cluster that are available to the current user.

When a bucket has one or more shadow datasets and the Local link is connected, you cannot create a new dataset to shadow that bucket, or drop a dataset which shadows that bucket. However, you can always create a new dataset on any bucket which does not currently have a shadow dataset, even if the Local link is connected. In this case, data ingestion to the new shadow dataset does not start at once. To start ingestion to the new shadow dataset, you must disconnect the Local link and connect it again.

Indexes

An index is a materialized access path for data in a dataset. To create an index on a dataset, the link to the dataset must be in the disconnected state. You can create more than one index on the same dataset. Each index name must be unique within a dataset. Creating an index fails if there is an existing index with the same name in the target dataset and IF NOT EXISTS is not specified.

For each JSON document ingested into a dataset, the system computes the indexed key for each index. The index key of a secondary index is computed as follows:

  • The target field values are extracted from the JSON document according to the specified path of fields;

  • For non-numeric typed index fields (i.e., the specified type is STRING), if the specified type of the field is the same as the actual field value’s type, the value becomes part of the indexed key, otherwise the indexed key cannot be built;

  • For numeric typed index fields (i.e., the specified type is DOUBLE or BIGINT), the actual field value can be cast to the specified type and it becomes part of the indexed key. If the cast is impossible, the indexed key cannot be built;

After the indexed key has been built, it is inserted into the secondary index. In case the index key cannot be built, there is no entry made in the index for this object.

Secondary indexes are automatically maintained by the system during data ingestion (i.e., when their corresponding buckets are connected and data are flowing into the system). In addition, they are automatically rebalanced when their shadow datasets are rebalanced.

Statements

In addition to queries, N1QL for Analytics (the Couchbase Analytics implementation of SQL++) supports statements for data definition and to connect Couchbase Analytics to Data Service buckets.

Statement       ::= ( ( SingleStatement )? ( ";" )+ )* <EOF>
SingleStatement ::= CreateStatement
                  | UseStatement
                  | DropStatement
                  | ConnectStatement
                  | DisconnectStatement
                  | Query

Create Statements

The CREATE statement is used to create dataverses, datasets, and indexes.

CreateStatement     ::= "CREATE" ( DataverseSpecification
                                 | DatasetSpecification
                                 | IndexSpecification)

DataverseSpecification ::= "DATAVERSE" Identifier ( IfNotExists )?             ➊

IfNotExists         ::= "IF" "NOT" "EXISTS"

➀ For dataverses, the Identifier is the dataverse name.

DatasetSpecification ::= "DATASET" ( IfNotExists )? ( QualifiedName )?         ➊
                         "ON" Identifier ( "WHERE" Expression )?               ➋

QualifiedName       ::= Identifier ( "." Identifier )?

➀ For datasets, the QualifiedName is the full name of the dataset. It consists of an optional dataverse name and a name for the dataset. If no dataverse name is given, the dataset is created in the Default dataverse, or in the dataverse specified by the immediately preceding USE statement. If the full name of the dataset is not specified, the bucket name is used as the dataset name.

➁ The Identifier in the ON clause is the bucket which this dataset will shadow.

IndexSpecification ::= "INDEX" ( IfNotExists )? Identifier                     ➊
                       "ON" QualifiedName                                      ➋
                       "(" FieldSpecification ( ","  FieldSpecification )* ")" ➌

FieldSpecification ::= NestedField ":" "STRING"|"BIGINT"|"DOUBLE"

NestedField        ::= Identifier ( "." Identifier )*

➀ For indexes, the first Identifier is the index name.

➁ The QualifiedName identifies the dataset on which the index is built. It consists of an optional dataverse name and the name of the dataset. If no dataverse name is given, the dataset is assumed to be in the Default dataverse, or in the dataverse specified by the immediately preceding USE statement.

➂ The FieldSpecification consists of a NestedField that specifies a field path into the indexed JSON document, and a type identifier.

Examples
CREATE DATAVERSE `beer-data`;

Creates a dataverse with the name beer-data to be used to manage other metadata entities.

CREATE DATASET beers ON `beer-sample` WHERE `type` = "beer";
CREATE DATASET breweries ON `beer-sample` WHERE `type` = "brewery";

This example creates 2 datasets beers and breweries on the beer-sample bucket and filters the content for each dataset by the value of the type field of the record. Note that back-ticks are only necessary for reserved keywords, such as type, or certain operators, such as -.

CREATE INDEX beers_name_idx on beers (name: string);
CREATE INDEX breweries_name_idx on breweries (name: string);
CREATE INDEX breweries_loc_idx on breweries (geo.lon: double, geo.lat: double);

Creates indexes on the identified fields for the specified types.

Use Statements

The USE statement sets the Identifier to be the default dataverse for the following statement. If no USE statement is issued, the Default dataverse is used.

UseStatement ::= "USE" Identifier

The USE statement only works in a conjunction with another statement in a single request.

Example
USE `beer-data`;

This example sets beer-data to be the default dataverse for the statement immediately following.

Drop Statements

The DROP statement is the inverse of the CREATE statement. It can be used to drop dataverses, datasets, and indexes.

DropStatement ::= "DROP" ( "DATAVERSE" Identifier ( IfExists )?                ➊
                         | "DATASET" QualifiedName ( IfExists )?               ➋
                         | "INDEX" DoubleQualifiedName ( IfExists )? )         ➌

DoubleQualifiedName ::= Identifier "." Identifier ( "." Identifier )?

IfExists      ::= "IF" "EXISTS"

➀ For dataverses, the Identifier is the dataverse name.

➁ For datasets, the QualifiedName is the full name of the dataset. It consists of an optional dataverse name and the name of the dataset. If no dataverse name is given, the dataset is assumed to be in the Default dataverse, or in the dataverse specified by the immediately preceding USE statement.

➂ For indexes, the DoubleQualifiedName is the full name of the index. It consists of an optional dataverse name, the name of the dataset, and the name of the index. If no dataverse name is given, the dataset is assumed to be in the Default dataverse, or in the dataverse specified by the immediately preceding USE statement.

Examples
DROP DATAVERSE `beer-data` IF EXISTS;

This example drops the beer-data dataverse, if it already exists. Dropping a dataverse disconnects any links and drops any datasets contained in it.

DROP DATASET beers;
DROP DATASET breweries;

This example removes the datasets and all contained data.

DROP INDEX beers.beers_name_idx;
DROP INDEX breweries.breweries_name_idx;
DROP INDEX breweries.breweries_loc_idx;

Drops the indexes.

Connect Statements

The CONNECT statement connects all datasets in the given dataverse(s) to the buckets on which the datasets were created.

ConnectStatement  ::= "CONNECT" "LINK" LinkSpecification                       ➊
                      ( "WITH" ObjectValue )                                   ➋

LinkSpecification ::= ( Identifier "." )? "Local"
                      ( "," ( Identifier "." )? "Local" )*

➀ For links, the LinkSpecification is the full name of the link. It consists of (optionally) the name of the dataverse to connect to, and the name of the link, which is Local. If no dataverse name is given, the link is created in the Default dataverse, or in the dataverse specified by the immediately preceding USE statement.

➁ Parameters for the connection can be provided though the ObjectValue object after WITH. The ObjectValue contains key-value pairs, one for each parameter.

ObjectValue       ::= "{" ( Pair ( "," Pair )* )? "}"
Pair              ::= StringLiteral ":" ParameterValue
ArrayValue        ::= "[" ( ParameterValue ( "," ParameterValue )* )? "]"
ParameterValue    ::= ObjectValue | ArrayValue | Literal

Only one parameter is currently supported in the WITH clause: the force parameter, which takes a Boolean value (true / false) — default false. This parameter determines the behavior of CONNECT LINK if there has been a change in the bucket’s UUID, i.e. the bucket has been deleted and recreated with the same name.

If force is false, then CONNECT LINK fails — this is the default behavior. If force is true, CONNECT LINK proceeds: Analytics deletes all existing data in the dataset and ingests all data from the bucket again.

Example
CONNECT LINK Local;

This example connects all datasets to their Data Service buckets and starts shadowing.

Disconnect Statements

The DISCONNECT statement is the inverse of the CONNECT statement. It disconnects all datasets in the given dataverse(s).

DisconnectStatement ::= "DISCONNECT" "LINK" LinkSpecification
Example
DISCONNECT LINK Local;

This example stops all datasets shadowing and disconnects their Data Service buckets.

Metadata Introspection

Metadata entities can be introspected by querying the datasets in the Metadata dataverse. The Metadata dataverse contains the datasets Dataverse, Dataset, Index and Link. Each dataset contains the queryable metadata for each entity type.

SELECT VALUE d.DataverseName || '.' || d.DatasetName
FROM Metadata.`Dataset` d
WHERE d.DataverseName <> "Metadata"

Returns the qualified names of all datasets that are not in the Metadata dataverse. While dataverses, datasets, and indexes are created and removed by the corresponding CREATE and DROP statements, the lifecycle of links is managed by the system — they are created and removed as needed.