Managing Capella Analytics Collections

What’s a Collection?

A collection is a data container within a scope that allows for the logical grouping of documents.

The Capella Analytics Workbench enables you to create, edit, and drop collections, and to connect or disconnect local and remote links.

Creating Collections

To stream data from a remote source such as a Capella operational cluster, Couchbase Server, or a Kafka pipeline, create a remote link and an associated collection. To see your collections, go to Capella Analytics Workbench. When you create a collection, you must associate it with a specific database and scope.

To create a remote link and an associated collection, see Stream Data from Remote Sources.
To query data hosted on external object storage, create an external link and an associated collection. See Set Up an External Data Source.
To create a standalone collection, see Set Up a Standalone Collection.

Deleting a Collection

To delete a collection:

Go to Capella Analytics Workbench and find the collection you want to delete.
Click More Options (︙) and select Delete Collection. The Warning dialog appears.
Confirm that you want to delete this collection and click Delete.

You can also delete a collection using the DELETE statement. For more information about deleting a collection, see Delete Statements.

View Metadata for a Collection

Each time you add a collection, Capella Analytics records its metadata in the System.Metadata.Dataset collection. To view metadata for a collection, you need to query this system collection. For more information, see Querying Metadata.

Capella Analytics supports both column and row storage formats.

The column format is the default, and it’s recommended that a collection not exceed 4,000 unique columns across all of its documents.

When Capella Analytics ingests JSON documents, it treats each unique path to a leaf value as a distinct column.

For example, consider the following document:

{
    "a": {
        "b": [1, 2],
        "c": "value",
        "d": [
            { "x": 1, "y": 2 },
            { "x": 3, "y": 4 }
        ]
    }
}
This document contributes 4 columns:
- a.b: [1, 2] → 1 column
- a.c: "value" → 1 column
- a.d: [{ "x": 1, "y": 2 }, { "x": 3, "y": 4 }] → 2 columns (a.d.x and a.d.y)

Additional documents or array elements with the same structure reuse these columns, so they do not count again toward the 4,000-column limit.
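To make the counting rule concrete, the following Python sketch computes the leaf column paths of JSON-like documents and counts the distinct paths across a collection. It is a simplified model under stated assumptions, not Capella Analytics' actual implementation: it assumes array elements share their parent's path, and it ignores any further distinction the storage engine might make by value type. The helper names `leaf_paths` and `count_columns` are hypothetical.

```python
from typing import Any, Iterable, Set


def leaf_paths(value: Any, prefix: str = "") -> Set[str]:
    """Return the dotted paths of all leaf (scalar) values in a JSON-like value.

    Simplified model (an assumption): array elements add no path segment,
    so every element of an array maps onto the same column path.
    """
    paths: Set[str] = set()
    if isinstance(value, dict):
        for key, child in value.items():
            # Each object key extends the path by one segment.
            child_prefix = f"{prefix}.{key}" if prefix else key
            paths |= leaf_paths(child, child_prefix)
    elif isinstance(value, list):
        for element in value:
            # Array elements share their parent's path.
            paths |= leaf_paths(element, prefix)
    else:
        # Scalar (number, string, boolean, or null): one leaf column at this path.
        paths.add(prefix)
    return paths


def count_columns(documents: Iterable[Any]) -> int:
    """Count the distinct leaf column paths across all documents in a collection."""
    columns: Set[str] = set()
    for doc in documents:
        columns |= leaf_paths(doc)
    return len(columns)


# The example document from this section.
doc = {
    "a": {
        "b": [1, 2],
        "c": "value",
        "d": [{"x": 1, "y": 2}, {"x": 3, "y": 4}],
    }
}
# A second, hypothetical document with the same structure but different values.
doc2 = {"a": {"b": [7], "c": "other", "d": [{"x": 5, "y": 6}]}}

print(sorted(leaf_paths(doc)))     # ['a.b', 'a.c', 'a.d.x', 'a.d.y']
print(count_columns([doc, doc2]))  # 4
```

Because `doc2` has the same structure as `doc`, adding it leaves the count at 4, which matches the rule that repeated structures do not add columns.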

Exceeding the recommended column limit can degrade performance and increase resource usage. To stay within the limit, design your schema and data model to minimize deeply nested or highly dynamic structures. Avoid naming fields in a way that causes each object to introduce new fields. For example, store a timestamp as a value (such as "ts": 1700000000) instead of using the timestamp itself as a field name.
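The following sketch uses the same simplified counting model (array elements share a path, value types are ignored) to compare two hypothetical designs for 5,000 sensor readings: one uses each timestamp as a field name, the other stores the timestamp as a value under a fixed `ts` field. The data, the field names `sensor`, `ts`, and `temp`, and the helper functions are illustrative assumptions, not part of Capella Analytics.

```python
from typing import Any, Iterable, Set

RECOMMENDED_MAX_COLUMNS = 4_000  # Recommended per-collection limit from this section


def leaf_paths(value: Any, prefix: str = "") -> Set[str]:
    """Dotted paths of all leaf values; array elements share their parent's path."""
    paths: Set[str] = set()
    if isinstance(value, dict):
        for key, child in value.items():
            paths |= leaf_paths(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for element in value:
            paths |= leaf_paths(element, prefix)
    else:
        paths.add(prefix)
    return paths


def count_columns(documents: Iterable[Any]) -> int:
    """Count the distinct leaf column paths across all documents."""
    columns: Set[str] = set()
    for doc in documents:
        columns |= leaf_paths(doc)
    return len(columns)


# Hypothetical readings: (epoch-seconds timestamp, temperature), one per minute.
readings = [(1_700_000_000 + i * 60, 20.0 + (i % 10) * 0.1) for i in range(5_000)]

# Anti-pattern: the timestamp is the field name, so every document adds a new column.
keyed_by_time = [{"sensor": "s1", str(ts): temp} for ts, temp in readings]

# Recommended: the timestamp is a value under a fixed field name.
time_as_value = [{"sensor": "s1", "ts": ts, "temp": temp} for ts, temp in readings]

for name, docs in [("keyed_by_time", keyed_by_time), ("time_as_value", time_as_value)]:
    n = count_columns(docs)
    status = "exceeds" if n > RECOMMENDED_MAX_COLUMNS else "is within"
    print(f"{name}: {n} columns ({status} the recommended limit)")
```

With timestamps as field names, every reading adds a new column (5,001 in total, including `sensor`), which exceeds the recommended limit. Storing the timestamp as a value keeps the count at 3 no matter how many readings are ingested.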