Managing Capella Analytics Collections

    What’s a Collection?

    A collection is a data container within a scope that allows for the logical grouping of documents.

    The Capella Analytics Workbench enables you to create, edit, and drop collections, and to connect or disconnect local and remote links.

    Creating Collections

    To stream data from a remote source, such as a Capella operational cluster, Couchbase Server, or a Kafka pipeline, create a remote link and an associated collection. To see your collections, go to Capella Analytics > Workbench. When you create a collection, you need to associate it with a specific database and scope.

    Deleting a Collection

    To delete a collection:

    1. Go to Capella Analytics > Workbench and find the collection you want to delete.

    2. Go to More Options (︙) > Delete Collection. The Warning dialog appears.

    3. Confirm that you want to delete this collection and click Delete.

    You can also delete a collection using the DELETE statement. For more information about deleting a collection, see Delete Statements.

    View Metadata for a Collection

    Each time you add a collection, Capella Analytics records its metadata in the System.Metadata.Dataset collection. To view metadata for a collection, you need to query this system collection. For more information, see Querying Metadata.

    Capella Analytics supports both column and row storage formats.

    The system uses the column format by default. With column format, keep the number of unique columns across all documents in a collection at or below 4,000.

    When JSON documents are ingested, each unique leaf node is interpreted as a distinct column.

    See the following example:

    {
        "a": {
            "b": [1, 2],
            "c": "value",
            "d": [
                { "x": 1, "y": 2 },
                { "x": 3, "y": 4 }
            ]
        }
    }
    This document contributes 4 columns:
    - a.b: [1, 2] → 1 column
    - a.c: "value" → 1 column
    - a.d: [{ "x": 1, "y": 2 }, { "x": 3, "y": 4 }] → 2 columns (a.d.x and a.d.y)

    Additional documents or array elements with the same column structure do not count towards the 4,000-column limit.
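The counting rule described above can be approximated offline to estimate how many columns a set of sample documents would contribute. The following is a minimal sketch that assumes the rule exactly as stated here: one column per unique dotted leaf path, with array elements sharing their parent's path. The helper names `leaf_columns` and `collection_columns` are hypothetical, and the engine's own accounting may differ in cases this section does not cover.

```python
import json


def leaf_columns(value, prefix=""):
    """Return the set of dotted leaf paths found in a parsed JSON value.

    Arrays are treated as transparent: every element shares the path of
    the array itself, so repeated elements add no new paths.
    """
    if isinstance(value, dict):
        paths = set()
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            paths |= leaf_columns(child, child_prefix)
        return paths
    if isinstance(value, list):
        paths = set()
        for item in value:
            paths |= leaf_columns(item, prefix)
        return paths
    # Scalar (string, number, boolean, null): one column at this path.
    return {prefix}


def collection_columns(documents):
    """Union of leaf paths across all documents in a sample."""
    columns = set()
    for doc in documents:
        columns |= leaf_columns(doc)
    return columns


doc = json.loads("""
{
    "a": {
        "b": [1, 2],
        "c": "value",
        "d": [
            { "x": 1, "y": 2 },
            { "x": 3, "y": 4 }
        ]
    }
}
""")

# A second document with the same structure adds no new columns.
columns = collection_columns([doc, doc])
print(sorted(columns))  # ['a.b', 'a.c', 'a.d.x', 'a.d.y']
print(len(columns))     # 4
```

Running a check like this over a representative sample of documents before ingestion gives a rough indication of how close a collection is to the 4,000-column recommendation.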

    Exceeding the recommended column limit may lead to degraded performance and high resource usage. To stay under the limit, design your schema and data model to minimize deeply nested or highly dynamic structures. In particular, avoid naming fields in a way that causes each object to introduce new fields. For example, do not use a timestamp as a field name; store the timestamp as a value instead.
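To illustrate the point about field names that change from object to object, the sketch below reshapes a document whose readings are keyed by timestamp into one where the timestamp is stored as a value. The field names `sensor`, `readings`, `ts`, and `value`, and the helper `keyed_by_timestamp_to_rows`, are hypothetical and chosen only for illustration.

```python
def keyed_by_timestamp_to_rows(readings):
    """Turn {"<timestamp>": value, ...} into [{"ts": ..., "value": ...}, ...]."""
    return [{"ts": ts, "value": value} for ts, value in sorted(readings.items())]


# Dynamic shape: every new timestamp becomes a new field name, so the
# leaf paths grow without bound (readings.<timestamp 1>, readings.<timestamp 2>, ...).
dynamic_doc = {
    "sensor": "s1",
    "readings": {
        "2024-01-01T00:00:00Z": 20.5,
        "2024-01-01T00:01:00Z": 20.7,
    },
}

# Stable shape: the timestamp is a value, so the leaf paths stay fixed at
# sensor, readings.ts, and readings.value however many readings arrive.
stable_doc = {
    "sensor": dynamic_doc["sensor"],
    "readings": keyed_by_timestamp_to_rows(dynamic_doc["readings"]),
}

print(stable_doc)
```

Under the counting rule described earlier in this section, the dynamic shape contributes one additional column per distinct timestamp, while the stable shape contributes three columns in total.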