Amazon S3

    To provide query access to OLAP data in an AWS S3 bucket, you create an external link and associate it with an external collection.

    Amazon S3 external sources allow you to connect to and query data stored in S3 buckets directly from your database. Before setting up an S3 external source, make sure you have the necessary AWS permissions and have configured your credentials.

    You also need the following information about the S3 bucket containing the data you want to query:

    Credentials

    To create an external link for private data in an Amazon S3 bucket, you must supply an access key ID and secret access key. These credentials must have permission to list and read data from the bucket. For more information, see Managing access keys for IAM users in the AWS documentation.

    You can specify a session token to indicate that the credentials are temporary. For more information, see Temporary security credentials in IAM in the AWS documentation.

    You do not need credentials for publicly available data in S3.

    When you create an external link, be sure to follow best practices for security. Couchbase recommends that you grant the minimum possible permissions to perform the required operations, and allow access only to the required data and resources. You should never use root account credentials.
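
    As an illustration of the least-privilege advice above, the following Python sketch builds an IAM policy document that grants only list and read access to a single bucket. The helper name build_read_only_policy, the bucket name, and the prefix are hypothetical. The two actions shown (s3:ListBucket and s3:GetObject) correspond to the "list and read" permissions mentioned above; whether your deployment needs any further permissions is not stated here, so treat this as a starting point and not a definitive policy.

```python
import json


def build_read_only_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document that allows listing one bucket and
    reading its objects, optionally limited to objects under a prefix.

    Hypothetical helper for illustration; not part of any Couchbase or AWS SDK.
    """
    prefix = prefix.strip("/")
    # Object-level ARN: the whole bucket, or only objects under the prefix.
    if prefix:
        object_arn = f"arn:aws:s3:::{bucket}/{prefix}/*"
    else:
        object_arn = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading is an object-level action, so it targets object ARNs.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [object_arn],
            },
        ],
    }


if __name__ == "__main__":
    # Example values only; replace with your own bucket and prefix.
    policy = build_read_only_policy("example-analytics-bucket", "sales/2024")
    print(json.dumps(policy, indent=2))
```

    You can attach a policy like this to a dedicated IAM user and use that user's access key ID and secret access key for the external link, instead of broader credentials.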

    The Location Path

    When you create an external collection based on an S3 bucket, you can supply a path to the files Capella Analytics queries. A path consists of one or more prefixes that define a hierarchical organization, using a format such as topLevel/nextLevel/lowestLevel. The path does not include filenames.

    If you use the Amazon S3 console, prefixes are also referred to as folders.
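
    To make the relationship between a path, its prefixes, and the files beneath it concrete, here is a minimal Python sketch. The function names and the sample object keys are hypothetical, and the code only models how a prefix-style path narrows a set of S3 object keys; it does not call Capella Analytics or AWS.

```python
def split_path(path: str) -> list:
    """Split a location path such as 'topLevel/nextLevel/lowestLevel'
    into its individual prefixes, ignoring leading and trailing slashes."""
    return [part for part in path.strip("/").split("/") if part]


def keys_under_path(keys, path: str) -> list:
    """Return the object keys that sit beneath the given path.

    The path holds prefixes only (no filename), so a trailing slash is
    appended before matching to avoid partial-name matches.
    """
    parts = split_path(path)
    if not parts:
        # An empty path places no restriction on the keys.
        return list(keys)
    prefix = "/".join(parts) + "/"
    return [key for key in keys if key.startswith(prefix)]


if __name__ == "__main__":
    # Sample keys for illustration only.
    keys = [
        "sales/2024/q1/orders.json",
        "sales/2023/q4/orders.json",
        "hr/staff.csv",
    ]
    print(split_path("topLevel/nextLevel/lowestLevel"))
    print(keys_under_path(keys, "sales/2024"))
```

    In this sketch, the path sales/2024 matches only the first sample key, while the shorter path sales matches the first two.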

    To make querying the external data source as efficient as possible, supply a path that’s as specific as possible. You can use static prefixes, dynamic prefixes, or a mixture of both to define a path. For information about static and dynamic prefixes, see Design a Location Path.

    Because you cannot index the data located in an external store, Couchbase encourages thoughtful design of the paths used in external collections.

    For information about using prefixes for data on S3, see Organizing objects using prefixes in the AWS documentation.

    You can select a subset of the files in a location by using fields that include and exclude filenames.
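
    The effect of include and exclude filters can be sketched in Python as follows. This is a model of the idea only: the pattern syntax that Capella Analytics accepts, and whether include and exclude filters can be combined in one collection, are not specified here. The sketch assumes shell-style wildcards (as implemented by Python's fnmatch module) and a hypothetical select_files helper.

```python
from fnmatch import fnmatchcase
from posixpath import basename
from typing import Iterable, Optional


def select_files(
    keys: Iterable[str],
    include: Optional[list] = None,
    exclude: Optional[list] = None,
) -> list:
    """Filter object keys by filename.

    A key is kept when its filename matches at least one include pattern
    (if any are given) and matches no exclude pattern (if any are given).
    Patterns are shell-style wildcards; this syntax is an assumption.
    """
    selected = []
    for key in keys:
        name = basename(key)  # filename without its prefixes
        if include and not any(fnmatchcase(name, p) for p in include):
            continue
        if exclude and any(fnmatchcase(name, p) for p in exclude):
            continue
        selected.append(key)
    return selected


if __name__ == "__main__":
    # Sample keys for illustration only.
    keys = [
        "logs/2024/a.json",
        "logs/2024/b.csv",
        "logs/2024/a.json.bak",
    ]
    print(select_files(keys, include=["*.json"]))
    print(select_files(keys, exclude=["*.bak"]))
```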

    For detailed instructions on setting up and configuring Amazon S3 external sources, see the following: