Get Started with Agent Memory
Deploy the Couchbase Agent Memory server and store and retrieve memory using its Python SDK.
Use this quickstart to get Agent Memory running, and to write a simple Python script that demonstrates how to store and retrieve memories.
Note: This quickstart is for development and testing purposes only. For production deployment instructions, see Deploy Agent Memory for Production.
Prerequisites
-
A Couchbase Server Enterprise or Couchbase Capella cluster with the following configuration:
-
Couchbase Server 8.0.2 or later.
-
The following services enabled:
-
A bucket created for Agent Memory.
-
Cluster access credentials with read and write permissions on that bucket. For more information, see Create Cluster Access Credentials or Manage Users and Roles.
-
Your cluster’s connection string. For more information, see Connect To Your Cluster.
-
You have an API key for an OpenAI-compatible embedding model and large language model (LLM). These can be external or from the Capella Model Service.
-
You have Docker installed.
-
You have Python 3.12 or later installed.
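The two local prerequisites (Python version and Docker) can be checked with a short script before you continue. This is a minimal sketch using only the standard library; the function name check_prerequisites is illustrative and is not part of Agent Memory. It only confirms that a docker executable is on your PATH, not that the Docker daemon is running.

```python
import shutil
import sys
from typing import Dict

MIN_PYTHON = (3, 12)  # minimum version listed in the prerequisites


def check_prerequisites() -> Dict[str, bool]:
    """Report which local prerequisites appear to be satisfied."""
    return {
        # True when the running interpreter meets the minimum version
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        # True when a docker executable is on PATH (the daemon may still be stopped)
        "docker_found": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```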
Step 1: Deploy the Agent Memory Server
To deploy the Agent Memory server on Linux, Intel/AMD, Apple Silicon, or AWS Graviton:
-
Download the artifacts using the time-limited download link provided to you.
The download link expires in 24 hours.
-
Load the Docker image:
-
Intel/AMD (amd64). Use this command for most Linux servers and Intel/AMD machines:
    docker load -i agentmemory-server-amd64-v1.0.0.tar
-
ARM (arm64). Use this command for Apple Silicon and AWS Graviton instances:
    docker load -i agentmemory-server-arm64-v1.0.0.tar
-
Create a .env file with the following configuration. Replace each placeholder with your actual values.
    # (1)
    AGENTMEMORY_CONN_STRING=couchbases://your_cluster_host
    AGENTMEMORY_USERNAME=your_username
    AGENTMEMORY_PASSWORD=your_password
    AGENTMEMORY_BUCKET=your_bucket_name
    # (2)
    # AGENTMEMORY_CONN_ROOT_CERTIFICATE=/app/certs/ca.pem
    OPENAI_API_KEY=your_api_key
    AGENTMEMORY_EMBEDDING_MODEL=text-embedding-3-small
    AGENTMEMORY_LLM_MODEL=gpt-4o-mini
    AGENTMEMORY_SERVER_HOST=0.0.0.0
    AGENTMEMORY_SERVER_PORT=8080
    OIDC_AUTH_ENABLED=false
    LOG_LEVEL=INFO
(1) Use couchbase:// for clusters without TLS, or couchbases:// for clusters with TLS.
(2) When connecting to a Capella cluster, you need to use the cluster’s root certificate. To use your Capella cluster’s certificate, first uncomment this line. Next, download the root certificate from the Capella UI, rename it to ca.pem, and place it in the same directory as your .env file. The path /app/certs/ca.pem is the location inside the container.
This .env file shows only the variables you need for a minimal deployment. For the full list of available variables, see Agent Memory Environment Variable Reference.
-
Start the server:
If Couchbase Server and Agent Memory are running on the same host, see connecting to Couchbase Server on the same host.
-
Intel/AMD (amd64). Use this command for most Linux servers and Intel/AMD machines:
    docker run -d \
      --name agentmemory-server \
      --platform linux/amd64 \
      --env-file .env \
      -p 8080:8080 \
      -p 9090:9090 \
      -v agentmemory-logs:/app/logs \
      --restart unless-stopped \
      agentmemory-server:amd64
-
ARM (arm64). Use this command for Apple Silicon and AWS Graviton instances:
    docker run -d \
      --name agentmemory-server \
      --env-file .env \
      -p 8080:8080 \
      -p 9090:9090 \
      -v agentmemory-logs:/app/logs \
      --restart unless-stopped \
      agentmemory-server:arm64
The following notes apply to both commands:
-
If connecting to a Capella cluster, add -v $(pwd)/ca.pem:/app/certs/ca.pem:ro before the image name in the docker run command.
-
Port 8080 is the Agent Memory API. For more information, see API Endpoints.
-
Port 9090 is the embedded Prometheus metrics endpoint. If you do not need to scrape Prometheus from outside the host, omit the -p 9090:9090 flag.
-
Verify the server is healthy:
    curl http://localhost:8080/health
The server is ready when the response includes "status": "healthy".
If the status is unhealthy, check the logs for more information:
    docker logs agentmemory-server
If you encounter issues, see Troubleshooting.
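The server can take a moment to become healthy after docker run returns, so a script that depends on it may want to wait instead of checking once. The following is a minimal sketch of that wait loop using only the Python standard library. It assumes the /health response body is JSON containing a "status" field, as described above; the function names, timeout, and polling interval are illustrative choices, not part of Agent Memory.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True when a /health response body reports status 'healthy'."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_for_healthy(url: str = "http://localhost:8080/health",
                     timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll the health endpoint until it reports healthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_healthy(response.read().decode("utf-8", errors="replace")):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet; try again
        time.sleep(interval)
    return False
```

Calling wait_for_healthy() right after starting the container returns True once the server is ready, or False if the timeout passes first, at which point the container logs are the next place to look.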
Step 2: Install the Python SDK
To install the Python SDK on the machine where you want to develop with Agent Memory:
-
Create and activate a virtual environment in your project directory:
    python -m venv .venv
    source .venv/bin/activate
You can also use Conda or another virtual environment manager.
-
Install the Agent Memory Python SDK:
    python -m pip install --upgrade pip
    python -m pip install couchbase-agent-memory
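To confirm that the install landed in the active virtual environment, you can query the package metadata from Python. This is a small sketch using the standard library. The distribution name couchbase-agent-memory comes from the pip command above and the module name agentmemory from the import used in this quickstart; the helper function names are illustrative.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(distribution: str = "couchbase-agent-memory") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def module_available(module: str = "agentmemory") -> bool:
    """Return True when the top-level module can be located for import."""
    return util.find_spec(module) is not None


if __name__ == "__main__":
    print("SDK version:", installed_version() or "not installed")
    print("agentmemory importable:", module_available())
```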
Step 3: Store and Retrieve Memory
You can use the following quickstart example to learn how to store and retrieve memories from Agent Memory:
-
Create a file called quickstart.py and add the following code. This code demonstrates creating a user and session, storing memory blocks, and retrieving them by semantic search:
    from agentmemory import AgentMemoryClient, ChatMessage

    with AgentMemoryClient(base_url="http://localhost:8080") as client:
        # Create a persistent user identity
        user = client.create_user(user_id="support-agent-1", name="Support Bot")
        print(f"Created user: {user.user_id}")

        # Open a session: the container for this conversation's memory blocks
        session = user.create_session(session_id="ticket-4821")
        print(f"Created session: {session.session_id}")

        # Store a conversation exchange.
        # async_processing=False blocks until the embedding is generated;
        # all returned blocks are in ready status on return.
        r1 = session.add_memory(
            messages=[
                ChatMessage(
                    user_content="My payments keep failing at checkout.",
                    assistant_content="I can help with that. Are you seeing a specific error code?",
                )
            ],
            async_processing=False,
        )
        print(f"Stored message block(s): {r1.block_ids}")

        # Store a discrete fact extracted from the conversation
        r2 = session.add_memory(
            facts=["Customer is on the free tier and has not added a payment method."],
            async_processing=False,
        )
        print(f"Stored fact block(s): {r2.block_ids}")

        # Retrieve memory blocks relevant to a query via semantic search
        print("\nRetrieving memories relevant to 'billing issues'...")
        results = session.search_memory(query="billing issues")
        for block in results.memory_blocks:
            if block.message:
                print(f" User: {block.message.user_content}")
                print(f" Assistant: {block.message.assistant_content}")
            if block.fact:
                print(f" Fact: {block.fact}")

        session.end()
        print("Session ended.")
async_processing=False makes add_memory() wait for embedding generation to complete before returning. This is appropriate for a quickstart, but it adds latency per call. In production, use the default async_processing=True, which returns immediately and generates embeddings in the background. Blocks become searchable only after their status reaches ready, which happens within seconds under normal load.
If your server has OIDC authentication enabled, pass your JWT bearer token to the client constructor:
    AgentMemoryClient(base_url="http://localhost:8080", token="your-jwt-token")
-
Run the file:
    python quickstart.py
-
Review your expected output:
    Created user: support-agent-1
    Created session: ticket-4821
    Stored message block(s): ['<block-id-1>']
    Stored fact block(s): ['<block-id-2>']

    Retrieving memories relevant to 'billing issues'...
     User: My payments keep failing at checkout.
     Assistant: I can help with that. Are you seeing a specific error code?
     Fact: Customer is on the free tier and has not added a payment method.
    Session ended.
Agent Memory stored both blocks, generated a vector embedding for each, and returned them based on semantic similarity to the search query.
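With the default async_processing=True, a block is not searchable until its status reaches ready, so code that stores and then immediately searches may need to wait. This quickstart does not show which SDK call reports a block's status, so the sketch below stays generic: it polls a caller-supplied function until that function returns "ready". The helper name wait_until_ready, the get_status callable, and the "processing" status used in the demonstration are all hypothetical stand-ins; consult the SDK reference for the real status lookup.

```python
import time
from typing import Callable


def wait_until_ready(get_status: Callable[[], str],
                     timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Call get_status() repeatedly until it returns 'ready' or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == "ready":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in status source for demonstration only; replace it with a function
    # that looks up the real status of a stored block.
    fake_statuses = iter(["processing", "processing", "ready"])
    print(wait_until_ready(lambda: next(fake_statuses), interval=0))
```

Keeping the status lookup behind a plain callable means the loop, timeout, and interval can be tested without a running server.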
Step 4: Explore the API Interactively
Agent Memory serves two API explorers, with no setup required beyond running the server.
If you’re running the server on a different host or port, replace localhost:8080 with your host and port in the URLs below:
-
Swagger UI (http://localhost:8080/docs). Try any endpoint directly in the browser, with request and response schemas auto-populated from the OpenAPI spec.
-
ReDoc (http://localhost:8080/redoc). A read-only reference view of the full API spec.
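If the server runs on another host or port, the explorer URLs can be derived rather than edited by hand. The sketch below builds both URLs from a host and port and optionally checks that each page responds. The /docs and /redoc paths come from this page; the function names and the http default scheme are assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import Dict


def explorer_urls(host: str = "localhost", port: int = 8080,
                  scheme: str = "http") -> Dict[str, str]:
    """Build the Swagger UI and ReDoc URLs for a given server address."""
    base = f"{scheme}://{host}:{port}"
    return {"swagger_ui": f"{base}/docs", "redoc": f"{base}/redoc"}


def reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True when the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    for name, url in explorer_urls().items():
        print(f"{name}: {url} reachable={reachable(url)}")
```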
Troubleshooting
-
If /health returns an error: Check docker logs agentmemory-server for startup errors. Both the Agent Memory server and Prometheus log to the same stream; lines are prefixed by the process name.
-
If the Couchbase connection fails: Verify AGENTMEMORY_CONN_STRING, AGENTMEMORY_USERNAME, AGENTMEMORY_PASSWORD, and AGENTMEMORY_BUCKET. Check that the Search Service is enabled on the cluster and that the network allows the connection. If using TLS, verify that AGENTMEMORY_CONN_ROOT_CERTIFICATE is correct.
-
If model calls fail: Verify OPENAI_API_KEY, AGENTMEMORY_EMBEDDING_MODEL, and AGENTMEMORY_LLM_MODEL.
-
If port 9090 returns nothing: Confirm you published the port with -p 9090:9090 and that no other process on the host is already bound to port 9090.
-
If Docker reports a platform incompatibility error: Use agentmemory-server-amd64-v1.0.0.tar for Intel/AMD Linux servers and agentmemory-server-arm64-v1.0.0.tar for ARM machines.
-
If Agent Memory cannot connect to Couchbase Server when both are running on the same host: When running Couchbase Server and Agent Memory containers on the same host, such as an EC2 instance, the Agent Memory container cannot reach Couchbase Server using couchbase://localhost, because inside a container on the default bridge network, localhost refers to the container itself. To share the host’s network namespace, run Agent Memory with --network host and keep the connection string as couchbase://localhost.
Example: docker run with --network host (Intel/AMD)
    docker run -d \
      --name agentmemory-server \
      --network host \
      --platform linux/amd64 \
      --env-file .env \
      -v agentmemory-logs:/app/logs \
      -v agentmemory-prometheus:/prometheus-data \
      --restart unless-stopped \
      agentmemory-server:amd64
If you’re using Docker Desktop on Mac/Windows, you can instead leave the Agent Memory container on its default bridge network and set the connection string to couchbase://host.docker.internal. That name is not defined by default on Linux servers, so do not rely on it when using EC2.
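Several of the troubleshooting entries above come down to a missing or malformed value in the .env file, and a quick local check can catch those before you restart the container. The following is a minimal sketch, not an official tool: it parses simple KEY=VALUE lines, reports any of the variables named in the troubleshooting entries that are missing or empty, and checks that the connection string starts with couchbase:// or couchbases://. The function names and the simplified parsing rules are assumptions; Docker's own --env-file parsing may differ in edge cases.

```python
from typing import Dict, List

# Variables that the troubleshooting entries ask you to verify.
CHECKED_KEYS = [
    "AGENTMEMORY_CONN_STRING",
    "AGENTMEMORY_USERNAME",
    "AGENTMEMORY_PASSWORD",
    "AGENTMEMORY_BUCKET",
    "OPENAI_API_KEY",
    "AGENTMEMORY_EMBEDDING_MODEL",
    "AGENTMEMORY_LLM_MODEL",
]


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def find_problems(env: Dict[str, str]) -> List[str]:
    """Return human-readable descriptions of likely configuration problems."""
    problems = [f"{key} is missing or empty" for key in CHECKED_KEYS if not env.get(key)]
    conn = env.get("AGENTMEMORY_CONN_STRING", "")
    if conn and not conn.startswith(("couchbase://", "couchbases://")):
        problems.append("AGENTMEMORY_CONN_STRING must start with couchbase:// or couchbases://")
    return problems


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        for problem in find_problems(parse_env(handle.read())):
            print("Problem:", problem)
```

An empty result does not prove the values are correct, only that they are present and well-formed; wrong credentials or an unreachable host still surface in the container logs.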