Create a Kafka Pipeline Collection
To receive a data event stream from a remote data source that uses an Apache Kafka pipeline, you create a remote collection.
You can create collections to associate with a Kafka pipeline link when you create the link or at any time afterwards.
You can also use a SQL++ statement to create a remote Kafka collection. See CREATE a Remote Collection.
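For orientation, the SQL++ route follows the general shape sketched below. This is a hedged sketch only; the authoritative grammar is in CREATE a Remote Collection, and the database, scope, collection, topic, and link names here are placeholders.

    -- General shape only, hedged against the CREATE a Remote Collection reference;
    -- sampleDB, sampleScope, sampleCollection, sampleTopic, and kafkaLink are placeholders.
    CREATE COLLECTION sampleDB.sampleScope.sampleCollection
      PRIMARY KEY (id: string)      -- primary key given as KEY_NAME:DATA_TYPE
      ON sampleTopic AT kafkaLink;  -- Kafka topic, streamed over the named link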
Requirements
- Primary Key
  When you set up a remote collection to receive data from a Kafka pipeline, you supply the primary key and its data type in KEY_NAME:DATA_TYPE format. For example, id:string. (The sketch after this list shows how these requirements fit together.)
  - To use a key name that includes a space or any character other than an underscore (_), escape the name with backtick (`) characters.
  - For source data that uses an object ID, add . and then `$oid` after the KEY_NAME, in the following format: KEY_NAME.`$oid`:DATA_TYPE. For example: _id.`$oid`:string.
  - For a composite key, enter a comma-separated list of the key names and their data types.
- Topic
  The Kafka topic or set of topics that contains the data you want to stream into the collection. You can stream data from one or more topics to multiple collections using the same link. However, the collections that stream the same topics must have the same data serialization and change data capture settings. Otherwise, you receive an inconsistent details config error. Similarly, when streaming data from multiple topics into a collection, the data serialization and change data capture settings must apply to all of the topics that you provide.
- Data Serialization
  The type of data serialization used for keys and values:
  - Protocol Buffers (Protobuf)
- Dead Letter Queue
  You can have Capella Columnar report any messages it fails to load to a Kafka topic called the dead letter queue. The credentials you supply for the link to connect to Kafka must have permission to produce messages on this topic.
- Change Data Capture
  Whether Change Data Capture (CDC) applies, and if so, the source.
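As a worked illustration of these requirements, here is a minimal sketch of the SQL++ form with every setting filled in. All names (sampleDB, sampleScope, sampleCollection, sampleTopic, kafkaLink, sample-dlq) are placeholders, and the WITH option keys are assumptions added for illustration rather than confirmed syntax; see CREATE a Remote Collection for the authoritative statement.

    -- Minimal sketch only; placeholder names throughout, and the WITH option
    -- keys below are assumptions for illustration, not confirmed syntax.
    CREATE COLLECTION sampleDB.sampleScope.sampleCollection
      -- Object-ID primary key, in KEY_NAME.`$oid`:DATA_TYPE form:
      PRIMARY KEY (_id.`$oid`: string)
      ON sampleTopic AT kafkaLink
      WITH {
        "keySerialization": "PROTOBUF",    -- serialization used for keys
        "valueSerialization": "PROTOBUF",  -- serialization used for values
        "deadLetterQueue": "sample-dlq",   -- topic that receives failed messages
        "cdcEnabled": true,                -- Change Data Capture applies
        "cdcSource": "MONGODB"             -- CDC source; the connector is DEBEZIUM
      };

A composite key would replace the PRIMARY KEY clause with a comma-separated list, for example PRIMARY KEY (customerId: string, orderId: string).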
Create a Collection for a Kafka Data Link
If you have just saved a new Kafka data link and are ready to add one or more collections to it, begin with step 5.
To create a remote collection that is associated with a Kafka data link:
1. In the Capella UI, select the Columnar tab.
2. Click a cluster name. The workbench opens.
3. Use the explorer to locate the link.
4. Move your cursor over the name of the link and then choose Create Collection. The Create Collection dialog opens.
5. Use the lists to select the Database and Scope for the collection.
6. In the Collection Name field, enter a name for the collection.
   The name must start with a letter (A-Z, a-z) and contain only upper- and lowercase letters, numbers (0-9), and underscore (_) or dash (-) characters.
7. In the Primary Key field, enter the name of the primary key and its data type in the format KEY_NAME:DATA_TYPE. See the requirements for examples.
8. Supply one or more Kafka Topics in a comma-separated list. If you supply multiple topics, the choices you make in the remaining fields must apply to all of them. Also enter the name of the dead letter queue topic (if any).
9. Select the data serialization type used for keys.
10. Select the data serialization type used for values.
11. Specify whether the topics use Change Data Capture (CDC). If you select CDC Enabled, Capella Columnar supplies a CDC Connector of DEBEZIUM. You specify the CDC Source: MONGODB, MYSQLDB, or POSTGRESQL.
12. Choose Create Collection. Your collection appears under the specified database and scope in the explorer.
    If the link is connected, the data stream from the specified topic or topics into this Capella Columnar collection begins immediately. If the link is not connected, see Connect or Disconnect a Remote Link.
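Once the link is connected, one way to confirm that events are arriving is to run a simple count from the workbench. The collection path below matches the placeholder names used in the earlier sketches.

    -- Count the documents streamed into the collection so far; rerun to watch it grow.
    SELECT COUNT(*) AS documentsIngested
    FROM sampleDB.sampleScope.sampleCollection;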