Agent Memory Environment Variable Reference
Reference for environment variables that configure the Agent Memory server, organized by functional area.
| For authentication variables, see Configure Agent Memory Authentication. |
You configure Agent Memory through environment variables, either in a .env file passed to the container with --env-file, or injected directly into the container environment.
Variables set directly in the container environment take precedence over values in the .env file.
The distribution includes a .env-sample file as a starting template.
A minimal deployment requires only the database connection variables and a model provider API key, while everything else has a default or is optional.
For a minimal deployment configuration, see Get Started with Agent Memory.
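The precedence rule described above can be sketched in a few lines of Python. This is only an illustration, not Agent Memory's loader: `parse_env_file` and `effective_config` are hypothetical helper names, and the parser assumes plain `KEY=VALUE` lines with `#` comments and no quoting rules.

```python
from typing import Mapping


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def effective_config(env_file_text: str, container_env: Mapping[str, str]) -> dict[str, str]:
    """Merge both sources; values set in the container environment win."""
    merged = parse_env_file(env_file_text)
    merged.update(container_env)
    return merged
```

For example, if the .env file sets AGENTMEMORY_USERNAME to one value and the container environment sets it to another, `effective_config` returns the container value and keeps every other entry from the file.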
Database Connection
These variables tell Agent Memory which Couchbase cluster and bucket to use.
| For help finding the connection string for your Capella operational database, see Connect To Your Cluster. |
| Variable | Required | Default | Description |
|---|---|---|---|
| | Yes | — | Couchbase connection string. Use |
| | Yes¹ | — | Username of the Couchbase cluster access credentials. |
| | Yes¹ | — | Password of the Couchbase cluster access credentials. |
| | Yes | — | Name of the bucket Agent Memory reads from and writes to. The bucket holds the |

¹ Username and password are ignored when you authenticate with a client certificate. For more information, see TLS and mTLS.
TLS and mTLS
These variables secure the connection between Agent Memory and Couchbase.
| Variable | Required | Default | Description |
|---|---|---|---|
| | No | — | Path inside the container to the cluster's root CA certificate. Required when connecting to Couchbase Capella. |
| | No | — | Path to the client certificate for mutual TLS (mTLS). |
| | No | — | Path to the client private key for mutual TLS (mTLS). |
When you provide a client certificate, AGENTMEMORY_USERNAME and AGENTMEMORY_PASSWORD are ignored.
| Never use plaintext couchbase:// connections in production. Use couchbases:// instead, with the appropriate TLS settings. |
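A deployment script could enforce this rule before the container starts. The following is a minimal sketch under stated assumptions: `check_connection_string` is a hypothetical helper, not part of Agent Memory, and it inspects only the URI scheme.

```python
def check_connection_string(conn: str, production: bool = True) -> str:
    """Return the connection string unchanged, or raise if it is unsuitable."""
    lowered = conn.strip().lower()
    if lowered.startswith("couchbases://"):
        return conn  # TLS-secured scheme
    if lowered.startswith("couchbase://"):
        if production:
            raise ValueError("plaintext couchbase:// is not allowed in production")
        return conn  # tolerated outside production, for example in local testing
    raise ValueError("expected a couchbase:// or couchbases:// connection string")
```

Calling `check_connection_string("couchbase://localhost", production=False)` passes, while the same string with `production=True` raises `ValueError`.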
Write Durability
By default, Agent Memory uses standard writes for memory blocks. Enable durable writes to require the cluster to acknowledge persistence to the active node’s disk before confirming each write. For more information, see Durability.
| Variable | Required | Default | Description |
|---|---|---|---|
| | No | | Set to |
Model Providers
Agent Memory uses an embedding model and an LLM to power semantic extraction.
API Keys
| Variable | Required | Default | Description |
|---|---|---|---|
| | Yes² | — | Default API key for both the embedding and LLM providers. When set, it takes precedence over |

² Not required when both AGENTMEMORY_EMBEDDING_API_KEY and AGENTMEMORY_LLM_API_KEY are set.
Embedding Model
The embedding model converts each memory block into a vector for semantic search.
| Variable | Required | Default | Description |
|---|---|---|---|
| | Yes | — | Name of the embedding model, for example |
| | No³ | — | API key for the embedding provider. Used only when |
| | No | — | Base URL of an OpenAI-compatible embedding endpoint. Set this to use a self-hosted model or an alternative provider. |

³ If OPENAI_API_KEY is set, this variable is ignored.
| Agent Memory creates a Search Vector Index automatically with a dimensionality matching the embedding model you configure. If you change the embedding model to a model that produces vectors of a different dimensionality, you must recreate the index. |
Large Language Model
Agent Memory uses an LLM to generate a concise, human-readable summary for each memory block. Summaries let agents quickly interpret search results without reading full block content.
| Variable | Required | Default | Description |
|---|---|---|---|
| | Yes | — | Name of the LLM, for example |
| | No⁴ | — | API key for the LLM provider. Used only when |
| | No | — | Base URL of an OpenAI-compatible LLM endpoint. Set this to use a self-hosted model or an alternative provider. |
| | No | | Maximum number of tokens in a generated summary. Reduce this to keep summaries shorter and decrease LLM token usage per block. |

⁴ If OPENAI_API_KEY is set, this variable is ignored.
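The API key rules in the footnotes of this section can be summarized as a small resolution function. This is a sketch of the documented precedence only, not the server's code: `resolve_api_key` and `has_required_keys` are hypothetical names, and treating an empty string as unset is an assumption.

```python
from typing import Mapping, Optional

PROVIDER_KEY_VARS = ("AGENTMEMORY_EMBEDDING_API_KEY", "AGENTMEMORY_LLM_API_KEY")


def resolve_api_key(env: Mapping[str, str], provider_var: str) -> Optional[str]:
    """Pick the key for one provider; OPENAI_API_KEY wins when it is set."""
    shared = env.get("OPENAI_API_KEY")
    if shared:
        return shared  # the provider-specific variable is ignored
    return env.get(provider_var) or None


def has_required_keys(env: Mapping[str, str]) -> bool:
    """True when both the embedding and LLM providers end up with a key."""
    return all(resolve_api_key(env, var) for var in PROVIDER_KEY_VARS)
```

With only OPENAI_API_KEY set, both providers resolve to that key. With OPENAI_API_KEY unset, both provider-specific variables must be present for `has_required_keys` to return `True`.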
Semantic Extraction
Semantic extraction is the process by which Agent Memory generates an embedding and, when using an LLM, a summary for each ingested memory block. Semantic extraction is enabled by default and is what makes blocks searchable. Without embeddings, semantic search has nothing to query against.
If you turn off semantic extraction, Agent Memory skips all model API calls at ingestion time, making ingestion faster and eliminating model provider costs. However, with no semantic extraction, blocks are stored but you cannot search them. In this situation, you can only retrieve blocks using direct lookup with a known block ID.
Only turn off semantic extraction when:
- You're running a test environment and do not need search.
- Your deployment retrieves blocks by ID rather than through semantic search.
| Variable | Required | Default | Description |
|---|---|---|---|
| | No | | Set to |
| | No | | Number of memory blocks processed per batch during synchronous embedding and fact extraction. Only applies when |
Server Settings
These variables control the network interface, port, and thread pool the Agent Memory server uses at runtime.
| Variable | Required | Default | Description |
|---|---|---|---|
| | No | | Address the server binds to. |
| | No | | Port the API listens on. This must match the port you publish with |
| | No | | Number of worker threads for synchronous route handlers and the Couchbase SDK. Run a single Uvicorn worker and increase this value instead of adding workers, because the async extraction processor must remain a singleton. |
| | No | | Cache TTL, in seconds, for health check results. Set to |
| | No | | Interval, in seconds, at which the background health cache is refreshed. Must be less than |
HTTPS
These variables configure TLS for the Agent Memory HTTP listener.
| These variables are intended for development and internal use with self-signed certificates. For production deployments, terminate TLS at a reverse proxy or load balancer instead of configuring it directly on the Agent Memory server. For more information, see Deploy Agent Memory for Production. |
| Variable | Required | Default | Description |
|---|---|---|---|
| | No | — | Path to the TLS certificate file for the HTTP listener. |
| | No | — | Path to the TLS private key file for the HTTP listener. |
| | No | — | Password for an encrypted TLS private key file. |
| | No | — | Path to the CA bundle used by Uvicorn for TLS client certificate verification. |
Memory Retention and Quotas
These variables control automatic expiry and capacity limits.
| TTL is the primary mechanism for privacy compliance and data minimization. Any memory block that contains personally identifiable information should carry a TTL. Use a tiered retention approach: a block-level TTL overrides the session default, which overrides the global default set by |
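The tiered override order can be illustrated as a simple fallback chain. This is a sketch rather than the server's implementation: `effective_ttl` is a hypothetical function, and it assumes that `None` means "not set at this tier". It does not interpret any special values such as 0.

```python
from typing import Optional


def effective_ttl(
    block_ttl: Optional[int],
    session_ttl: Optional[int],
    global_ttl: Optional[int],
) -> Optional[int]:
    """Return the most specific TTL (in seconds) that is set, or None."""
    for ttl in (block_ttl, session_ttl, global_ttl):
        if ttl is not None:
            return ttl  # block beats session, session beats global
    return None
```

A block ingested with its own TTL keeps that value regardless of the session or global defaults. A block without one falls back to the session default, and then to the global default.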
| Variable | Required | Default | Description |
|---|---|---|---|
| | No | | Global default time to live, in seconds, applied to memory blocks. |
| | No | | Estimated memory budget, in megabytes, used by the memory monitor. |
| | No | | Percentage of the quota at which the memory monitor begins rejecting ingestion requests. |
| | No | | Enables the memory monitor. When enabled, Agent Memory tracks estimated usage and rejects ingestion once usage exceeds the threshold percentage of the quota. |
| | No | | Interval, in seconds, at which the memory monitor samples process memory usage. Only applies when |
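The relationship between the quota and the threshold percentage comes down to one comparison. The sketch below is an assumption-laden illustration, not the monitor's actual logic: `should_reject_ingestion` is a hypothetical name, and the strict "greater than" comparison follows the wording "exceeds" in the table.

```python
def should_reject_ingestion(usage_mb: float, quota_mb: float, threshold_pct: float) -> bool:
    """True when estimated usage exceeds threshold_pct percent of the quota."""
    if quota_mb <= 0:
        raise ValueError("quota_mb must be positive")
    limit_mb = quota_mb * threshold_pct / 100.0
    return usage_mb > limit_mb
```

With a 1000 MB quota and an 85 percent threshold, for example, ingestion would be rejected once estimated usage rises above 850 MB.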
Rate Limiting and Throughput
The model provider is almost always the throughput ceiling. Agent Memory and Couchbase can ingest far faster than any hosted model provider allows, so the extraction pipeline is almost always the bottleneck. Increasing server or database resources without raising your model provider tier has no effect on extraction throughput.
When setting limits and monitoring throughput, keep the following in mind:
- Match the limits to your provider tier
  Agent Memory enforces a requests-per-minute limit and a tokens-per-minute limit against the model provider. Set both to the actual limits shown in your provider's account dashboard:
  - Set too high, and the provider returns 429 errors. This reduces throughput because Agent Memory responds by waiting and retrying.
  - Set too low, and provider capacity goes unused.
- Use extraction queue depth to find the bottleneck
  The extraction queue depth reported by the /health/async-batch-processor-stats endpoint is the primary signal for identifying a bottleneck in your system:
  - A steadily growing queue means extraction is not keeping up with ingestion, so the model provider is the constraint. To improve throughput, upgrade the provider tier or throttle ingestion at the application layer.
  - A stable queue near zero means the system is not saturated.
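The queue-depth guidance above can be turned into a rough check over periodic samples. This is a sketch under stated assumptions: how you collect the samples from the stats endpoint and which response field holds the depth are not shown here, `classify_queue_trend` is a hypothetical helper, and the `near_zero` cutoff and the "at least half of the steps increase" rule are arbitrary illustrative choices.

```python
from typing import Sequence


def classify_queue_trend(depths: Sequence[int], near_zero: int = 5) -> str:
    """Classify queue-depth samples, ordered oldest first."""
    if len(depths) < 2:
        raise ValueError("need at least two samples")
    if max(depths) <= near_zero:
        return "not_saturated"  # stable queue near zero
    # Count how many consecutive samples grew.
    increases = sum(1 for prev, cur in zip(depths, depths[1:]) if cur > prev)
    if depths[-1] > depths[0] and increases >= (len(depths) - 1) / 2:
        return "provider_bound"  # steadily growing: extraction is falling behind
    return "inconclusive"
```

A result of `provider_bound` suggests upgrading the provider tier or throttling ingestion, as described above. An `inconclusive` result means more samples are needed before drawing a conclusion.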
| Variable | Required | Default | Description |
|---|---|---|---|
| | No | | Maximum model API requests per minute. Set this to the actual limit shown in your provider's account dashboard. |
| | No | | Maximum model API tokens per minute. Set this to the actual limit shown in your provider's account dashboard. |
| | No | | Maximum number of concurrent async extraction tasks. A useful starting point is |
| | No | | Maximum token count for a single memory block. Blocks that exceed this limit are rejected from the extraction queue. |
| | No | | Maximum number of blocks that can be queued for async extraction. When the queue is full, Agent Memory returns HTTP |
| | No | | Number of extraction failures after which a block is marked as permanently failed. |
| | No | | Maximum time, in seconds, that a single async extraction task can run before it is cancelled. Increase this for slow or heavily rate-limited model providers. |
| | No | | Sleep duration, in seconds, between async extraction dispatch cycles. Decrease to reduce extraction latency at the cost of higher CPU usage when the queue is idle. |
Logging
For information on how to access log files in a running deployment, see Viewing Logs.
| Variable | Required | Default | Description |
|---|---|---|---|
| | No | | Log verbosity level. Accepted values: |
| | No | | Set to |
| | No | | Maximum log file size in bytes before rotation. The default is 10 MB. |
| | No | | Number of rotated log files to retain. |