A newer version of this documentation is available.

View Latest

Data Definition Language (DDL)

This section lists all supported DDL statements in Couchbase Analytics.

A user must have one of the following RBAC roles to be able to execute DDL statements (Admin, Cluster Admin, or Analytics Manager).
No CREATE/DROP or CONNECT/DISCONNECT statement can be executed while the cluster topology changes (e.g. during rebalance). The evaluation of such DDL statements will fail and it can be reattempted after the topology change has completed.

Statements

Statement       ::= ( ( SingleStatement )? ( ";" )+ )* <EOF>
SingleStatement ::= CreateStatement
                  | DropStatement
                  | ConnectStatement
                  | DisconnectStatement
                  | Query

In addition to queries, the Couchbase Analytics implementation of SQL++ supports statements for data definition and to connect Couchbase Analytics to Data Service buckets.

Creating and Dropping

CreateStatement     ::= "CREATE" ( BucketSpecification
                                 | DatasetSpecification
                                 | IndexSpecification)

QualifiedName       ::= Identifier ( "." Identifier )?
DoubleQualifiedName ::= Identifier "." Identifier ( "." Identifier )?

The CREATE statement is used to create (Analytics) buckets, datasets, and indexes.

Buckets

BucketSpecification ::= "BUCKET" ( IfNotExists )? QualifiedName ( "WITH" ObjectValue )?

ObjectValue    ::= "{" ( Pair ( "," Pair )* )? "}"
Pair           ::= StringLiteral ":" ParameterValue
ArrayValue     ::= "[" ( ParameterValue ( "," ParameterValue )* )? "]"
ParameterValue ::= ObjectValue | ArrayValue | Literal

IfNotExists ::= "IF" "NOT" "EXISTS"
For Bucket DDL statments, in addition to having one of the roles required to execute DDL statements, a user must also have bucket data read permission on the Data Service bucket.

A (Analytics) bucket is a proxy for a bucket on a cluster of Data Service nodes. It represents this bucket in all other DDL operations. The CREATE BUCKET statement creates such a (Analytics) bucket. Parameters for the bucket can be provided though a parameter object. Currently 2 parameters are supported:

  • name a string for the name of the bucket used on the Data Service nodes and

  • nodes an array or IP addresses or node names for the Data Service nodes. However, you should not pass this parameter to the statement if the bucket resides in the same Couchbase instance.

Both parameters are optional - the default name is the name of the bucket and the default value for nodes are the Data Service nodes of the same Couchbase instance.

Examples
CREATE BUCKET `beer-sample`;

CREATE BUCKET beerbucket WITH {
  "name": "beer-sample",
  "nodes": "192.168.1.1"
};

The first example creates a new (Analytics) bucket that represents the bucket beer-sample on the Data Service of the same Couchbase instance. The second example creates a new bucket beerbucket that represents the bucket beer-sample on the Data Service of a Couchbase instance that can be found on 192.168.1.1.

Datasets

DatasetSpecification ::= "DATASET" ( IfNotExists )? ( QualifiedName )?
                         "ON" Identifier ( "WHERE" Expression )?

Datasets contain a shadow of the data of a bucket. They are connected to a bucket and are updated as the bucket gets updated. A dataset can contain the full content of the bucket or a filtered subset. Multiple datasets can shadow the same bucket. A dataset is created ON a previously created BUCKET with an optional filter expression. The name of the dataset is optional and defaults to the name of the bucket.

Examples
CREATE DATASET beers ON `beer-sample` WHERE type = "beer";
CREATE DATASET breweries ON `beer-sample` WHERE type = "brewery";

This example creates 2 datasets beers and breweries on the previously created beer-sample bucket and filters the content for each dataset by the value of the type field of the record.

Indexes

IndexSpecification ::= "INDEX" ( IfNotExists )? Identifier
                       "ON" QualifiedName "(" FieldSpecification (","  FieldSpecification)* ")"
FieldSpecification ::= QualifiedName : "STRING"|"BIGINT"|"DOUBLE"

The first Identifier is the index name. The QualifiedName in the IndexSpecification identifies the dataset on which the index is built. The FieldSpecifications consist of a QualifiedName that specifies a field path into the indexed JSON document and a type identifier. To create an index on a dataset, the bucket corresponding to the dataset must be in the disconnected state. There may be more than one secondary index created on the same dataset. Each index name must be unique within a dataset. Creating index fails if there is an existing index with the same name in the target dataset and ‘IF NOT EXISTS’ is not specified.

For each JSON document ingested into a dataset, the system computes the indexed key for each index. The index key of a secondary index is computed as follows:

  • The target field values are extracted from the JSON document according to the specified path of fields;

  • For non-numeric typed index fields (i.e., the specified type is STRING), if the specified type of the field is the same as the actual field value’s type, the value becomes part of the indexed key, otherwise the indexed key cannot be built;

  • For numeric typed index fields (i.e., the specified type is DOUBLE or BIGINT), the actual field value can be cast to the specified type and it becomes part of the indexed key. If the cast is impossible, the indexed key cannot be built;

After the indexed key has been built, it is inserted into the secondary index. In case the index key cannot be built, there is no entry made in the index for this object.

Secondary indexes are automatically maintained by the system during data ingestion (i.e., when their corresponding buckets are connected and data are flowing into the system). In addition, they are automatically rebalanced when their shadow datasets are rebalanced.

Examples
CREATE INDEX beers_name_idx on beers (name: string);
CREATE INDEX breweries_name_idx on breweries (name: string);
CREATE INDEX breweries_loc_idx on breweries (geo.lon: double, geo.lat: double);

For bucket/dataset/index creations, if ‘IF NOT EXISTS’ is specified, the DDL will not fail and leave the existing bucket/dataset/index unchanged.

Drop Statements

DropStatement ::= "DROP" ( "BUCKET" QualifiedName IfExists
                         | "DATASET" QualifiedName IfExists
                         | "INDEX" QualifiedName IfExists)
IfExists      ::= ( "IF" "EXISTS" )?

The DROP statement in SQL++ is the inverse of the CREATE statement. It can be used to drop (Analytics) buckets, datasets, and indexes.

The following examples illustrate some uses of the DROP statement.

Examples
DROP INDEX beers.beers_name_idx;
DROP INDEX breweries.breweries_name_idx;
DROP INDEX breweries.breweries_loc_idx;

drops the indexes.

Examples
DROP DATASET beers;
DROP DATASET breweries;

removes the dataset and all contained data.

Example
DROP BUCKET `beer-sample`;

removes the bucket.

Connecting and Disconnecting

ConnectStatement ::= "CONNECT" "BUCKET" QualifiedName ( "WITH" ObjectValue )? ( "IF" "NOT" "CONNECTED" )?

The CONNECT statement connects a Data Service bucket to Analytics and starts shadowing all datasets that are created on the bucket. Parameters for the connection can be provided though a parameter object after WITH.

The parameter object contains key-value pairs, one for each parameter. Different parameters are used for different bucket settings. We list two cases with regard to where the Data Service bucket is:

  • Case 1: Internal - same Couchbase instance;

  • Case 2: External - separately installed Couchbase 5.x instance;

For both cases, the timeout parameter (optional) is supported:

  • timeout: a connection timeout in ms when connecting to the Data Service nodes.

The default timeout is the connect timeout used by the Couchbase Java Client.

For case 1, no other parameters can be passed to the statement.

For case 2, there are two required parameters:

  • username (mandatory): a username for the bucket that the Analytics service connects to;

  • password (mandatory): the corresponding password for the username.

Example
CONNECT BUCKET `beer-sample` WITH { "timeout": 2500 };

This example connects all datasets that were previously created on bucket beer-sample to the Data Service nodes using a timeout of 2500 ms.

DisconnectStatement ::= "DISCONNECT" "BUCKET" QualifiedName  ( "IF" "CONNECTED" )?

The DISCONNECT statement is the inverse of the CONNECT statement. It stops shadowing all datasets that were created on a BUCKET and disconnects the bucket.

Example
DISCONNECT BUCKET `beer-sample` IF CONNECTED;

This example stops shadowing all datasets that were created on the bucket beer-sample and disconnects the bucket if the bucket is connected.