Agent Memory Environment Variable Reference

Reference for environment variables that configure the Agent Memory server, organized by functional area.

For authentication variables, see Configure Agent Memory Authentication.

You configure Agent Memory through environment variables, either in a .env file passed to the container with --env-file, or injected directly into the container environment. Variables set directly in the container environment take precedence over values in the .env file.
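The precedence rule above can be illustrated with a minimal sketch. The helper names `parse_env_file` and `effective_config` are hypothetical, and the parser handles only simple KEY=VALUE lines. It shows the documented outcome, not how the container runtime or Agent Memory implements it.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def effective_config(env_file_text: str, container_env: dict[str, str]) -> dict[str, str]:
    """Merge both sources; directly injected variables override .env values."""
    merged = parse_env_file(env_file_text)
    merged.update(container_env)
    return merged


env_file = "AGENTMEMORY_SERVER_PORT=8080\nLOG_LEVEL=INFO\n"
injected = {"LOG_LEVEL": "DEBUG"}
config = effective_config(env_file, injected)
print(config["LOG_LEVEL"])  # DEBUG
```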

The distribution includes a .env-sample file as a starting template. A minimal deployment requires only the database connection variables and a model provider API key, while everything else has a default or is optional. For a minimal deployment configuration, see Get Started with Agent Memory.

Database Connection

These variables tell Agent Memory which Couchbase cluster and bucket to use.

For help finding the connection string for your Capella operational database, see Connect To Your Cluster.

Variable	Required	Default	Description
`AGENTMEMORY_CONN_STRING`	Yes	—	Couchbase connection string. Use `couchbase://` for a cluster without TLS, or `couchbases://` for a cluster with TLS.
`AGENTMEMORY_USERNAME`	Yes¹	—	Username of the Couchbase cluster access credentials.
`AGENTMEMORY_PASSWORD`	Yes¹	—	Password of the Couchbase cluster access credentials.
`AGENTMEMORY_BUCKET`	Yes	—	Name of the bucket Agent Memory reads from and writes to. The bucket holds the `users`, `sessions`, and `memory` collections in the `agentmemory` scope.

¹ Username and password are ignored when you authenticate with a client certificate. For more information, see TLS and mTLS.
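A pre-flight check for these variables might look like the following sketch. The function `check_database_settings` is hypothetical, and it assumes that the presence of `AGENTMEMORY_CONN_CLIENT_CERT` is what signals client certificate authentication. Agent Memory's own validation may differ.

```python
REQUIRED_ALWAYS = ("AGENTMEMORY_CONN_STRING", "AGENTMEMORY_BUCKET")
CREDENTIALS = ("AGENTMEMORY_USERNAME", "AGENTMEMORY_PASSWORD")
VALID_SCHEMES = ("couchbase", "couchbases")


def check_database_settings(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name in REQUIRED_ALWAYS:
        if not env.get(name):
            problems.append(f"{name} is required")

    # Assumption: a configured client certificate means mTLS authentication,
    # in which case the username and password are ignored.
    uses_client_cert = bool(env.get("AGENTMEMORY_CONN_CLIENT_CERT"))
    if not uses_client_cert:
        for name in CREDENTIALS:
            if not env.get(name):
                problems.append(f"{name} is required without a client certificate")

    conn = env.get("AGENTMEMORY_CONN_STRING", "")
    scheme = conn.partition("://")[0]
    if conn and scheme not in VALID_SCHEMES:
        problems.append("connection string must start with couchbase:// or couchbases://")
    return problems


print(check_database_settings({
    "AGENTMEMORY_CONN_STRING": "couchbases://cb.example.com",
    "AGENTMEMORY_USERNAME": "agent",
    "AGENTMEMORY_PASSWORD": "secret",
    "AGENTMEMORY_BUCKET": "agents",
}))  # []
```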

TLS and mTLS

These variables secure the connection between Agent Memory and Couchbase.

Variable	Required	Default	Description
`AGENTMEMORY_CONN_ROOT_CERTIFICATE`	No	—	Path inside the container to the cluster’s root CA certificate. Required when connecting to Couchbase Capella.
`AGENTMEMORY_CONN_CLIENT_CERT`	No	—	Path to the client certificate for mutual TLS (mTLS).
`AGENTMEMORY_CONN_CLIENT_KEY`	No	—	Path to the client private key for mutual TLS (mTLS).

When you provide a client certificate, AGENTMEMORY_USERNAME and AGENTMEMORY_PASSWORD are ignored.

Never use plain text couchbase:// connections in production. Use couchbases:// instead, with the appropriate TLS settings.

Write Durability

By default, Agent Memory uses standard writes for memory blocks. Enable durable writes to require the cluster to acknowledge persistence to the active node’s disk before confirming each write. For more information, see Durability.

Variable	Required	Default	Description
`AGENTMEMORY_DURABLE_WRITES`	No	`false`	Set to `true` to enable durable writes for memory blocks. Requires at least one replica on the bucket. Do not enable this on a single-node cluster or a bucket with no replicas.

Model Providers

Agent Memory uses an embedding model and an LLM to power semantic extraction.

API Keys

Variable	Required	Default	Description
`OPENAI_API_KEY`	Yes²	—	Default API key for both the embedding and LLM providers. When set, it takes precedence over `AGENTMEMORY_EMBEDDING_API_KEY` and `AGENTMEMORY_LLM_API_KEY`. Required unless you set both `AGENTMEMORY_EMBEDDING_API_KEY` and `AGENTMEMORY_LLM_API_KEY`.

² Not required when both AGENTMEMORY_EMBEDDING_API_KEY and AGENTMEMORY_LLM_API_KEY are set.
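The key precedence described in this section can be sketched as follows. The function `resolve_api_keys` is hypothetical. It mirrors the documented rules (the shared key wins when set, otherwise both provider-specific keys are needed) and is not Agent Memory's actual code.

```python
def resolve_api_keys(env: dict[str, str]) -> dict[str, str]:
    """Return the API key used for each provider, following the documented precedence."""
    shared = env.get("OPENAI_API_KEY")
    if shared:
        # OPENAI_API_KEY takes precedence over both provider-specific keys.
        return {"embedding": shared, "llm": shared}

    embedding = env.get("AGENTMEMORY_EMBEDDING_API_KEY")
    llm = env.get("AGENTMEMORY_LLM_API_KEY")
    if not (embedding and llm):
        raise ValueError(
            "Set OPENAI_API_KEY, or set both AGENTMEMORY_EMBEDDING_API_KEY "
            "and AGENTMEMORY_LLM_API_KEY"
        )
    return {"embedding": embedding, "llm": llm}


keys = resolve_api_keys({
    "OPENAI_API_KEY": "shared-key",
    "AGENTMEMORY_LLM_API_KEY": "llm-key",
})
print(keys["llm"])  # shared-key
```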

Embedding Model

The embedding model converts each memory block into a vector for semantic search.

Variable	Required	Default	Description
`AGENTMEMORY_EMBEDDING_MODEL`	Yes	—	Name of the embedding model, for example `text-embedding-3-small`.
`AGENTMEMORY_EMBEDDING_API_KEY`	No³	—	API key for the embedding provider. Used only when `OPENAI_API_KEY` is not set.
`AGENTMEMORY_EMBEDDING_URL`	No	—	Base URL of an OpenAI-compatible embedding endpoint. Set this to use a self-hosted model or an alternative provider.

³ If OPENAI_API_KEY is set, this variable is ignored.

Agent Memory creates a Search Vector Index automatically with a dimensionality matching the embedding model you configure. If you change the embedding model to a model that produces vectors of a different dimensionality, you must recreate the index.

Large Language Model

Agent Memory uses an LLM to generate a concise, human-readable summary for each memory block. Summaries let agents quickly interpret search results without reading full block content.

Variable	Required	Default	Description
`AGENTMEMORY_LLM_MODEL`	Yes	—	Name of the LLM, for example `gpt-4o-mini`.
`AGENTMEMORY_LLM_API_KEY`	No⁴	—	API key for the LLM provider. Used only when `OPENAI_API_KEY` is not set.
`AGENTMEMORY_LLM_URL`	No	—	Base URL of an OpenAI-compatible LLM endpoint. Set this to use a self-hosted model or an alternative provider.
`AGENTMEMORY_SUMMARY_TOKEN_LIMIT`	No	`700`	Maximum number of tokens in a generated summary. Reduce this to keep summaries shorter and decrease LLM token usage per block.

⁴ If OPENAI_API_KEY is set, this variable is ignored.

Semantic Extraction

Semantic extraction is the process by which Agent Memory generates an embedding and, when using an LLM, a summary for each ingested memory block. Semantic extraction is enabled by default and is what makes blocks searchable. Without embeddings, semantic search has nothing to query against.

If you turn off semantic extraction, Agent Memory skips all model API calls at ingestion time, which makes ingestion faster and eliminates model provider costs. Agent Memory still stores blocks, but you cannot search them. You can retrieve them only by direct lookup with a known block ID.

Only turn off semantic extraction when:

You’re running a test environment and do not need search.
Your deployment retrieves blocks by ID rather than through semantic search.

Variable	Required	Default	Description
`AGENTMEMORY_EXTRACT_SEMANTIC`	No	`true`	Set to `false` to turn off semantic extraction. When `false`, Agent Memory stores blocks but does not generate embeddings or summaries, and semantic search returns no results.
`AGENTMEMORY_PROCESSING_BATCH_SIZE`	No	`5`	Number of memory blocks processed per batch during synchronous embedding and fact extraction. Only applies when `async_processing=false`. Increase this to raise throughput when your model provider’s rate limits allow it.

Server Settings

These variables control the network interface, port, and thread pool the Agent Memory server uses at runtime.

Variable	Required	Default	Description
`AGENTMEMORY_SERVER_HOST`	No	`0.0.0.0`	Address the server binds to.
`AGENTMEMORY_SERVER_PORT`	No	`8080`	Port the API listens on. This must match the port you publish with `-p` and any upstream reverse proxy or load balancer.
`AGENTMEMORY_THREAD_POOL_SIZE`	No	`40`	Number of worker threads for synchronous route handlers and the Couchbase SDK. Run a single Uvicorn worker and increase this value instead of adding workers, because the async extraction processor must remain a singleton.
`AGENTMEMORY_HEALTH_CACHE_TTL_SECONDS`	No	`60`	Cache TTL, in seconds, for health check results. Set to `0` to disable health caching.
`AGENTMEMORY_HEALTH_REFRESH_INTERVAL_SECONDS`	No	`30`	Interval, in seconds, at which the background health cache is refreshed. Must be less than `AGENTMEMORY_HEALTH_CACHE_TTL_SECONDS`.
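The constraint between the two health cache variables can be checked before deployment with a sketch like this. The function `check_health_settings` is hypothetical. Skipping the comparison when the TTL is `0` is an assumption, based on `0` disabling health caching altogether.

```python
def check_health_settings(env: dict[str, str]) -> None:
    """Raise ValueError if the refresh interval is not shorter than the cache TTL."""
    ttl = float(env.get("AGENTMEMORY_HEALTH_CACHE_TTL_SECONDS", "60"))
    refresh = float(env.get("AGENTMEMORY_HEALTH_REFRESH_INTERVAL_SECONDS", "30"))

    if ttl == 0:
        # Assumption: with caching disabled, the refresh interval is irrelevant.
        return
    if refresh >= ttl:
        raise ValueError(
            f"refresh interval ({refresh}s) must be less than cache TTL ({ttl}s)"
        )


check_health_settings({})  # the defaults (30 < 60) pass silently
```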

HTTPS

These variables configure TLS for the Agent Memory HTTP listener.

These variables are intended for development and internal use with self-signed certificates. For production deployments, terminate TLS at a reverse proxy or load balancer instead of configuring it directly on the Agent Memory server. For more information, see Deploy Agent Memory for Production.

Variable	Required	Default	Description
`AGENTMEMORY_SSL_CERTFILE`	No	—	Path to the TLS certificate file for the HTTP listener.
`AGENTMEMORY_SSL_KEYFILE`	No	—	Path to the TLS private key file for the HTTP listener.
`AGENTMEMORY_SSL_KEYFILE_PASSWORD`	No	—	Password for an encrypted TLS private key file.
`AGENTMEMORY_SSL_CA_CERTS`	No	—	Path to the CA bundle used by Uvicorn for TLS client certificate verification.

Memory Retention and Quotas

These variables control automatic expiry and capacity limits.

TTL is the primary mechanism for privacy compliance and data minimization. Any memory block that contains personally identifiable information should carry a TTL. Use a tiered retention approach:

Hours to days: Transient conversational details.
Weeks to months: Durable user preferences.
Indefinite (0): Anonymized or aggregate facts only.

A block-level TTL overrides the session default, which overrides the global default set by AGENTMEMORY_MEMORY_BLOCK_TTL.
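The override order can be expressed as a small resolution function. This is a sketch: `effective_ttl` is a hypothetical helper, and treating `None` as "not set at this level" is an assumption about how unset values are represented.

```python
from typing import Optional


def effective_ttl(
    block_ttl: Optional[int],
    session_ttl: Optional[int],
    global_ttl: int = 0,
) -> int:
    """Return the TTL, in seconds, that applies to a block. 0 means no expiry."""
    if block_ttl is not None:
        return block_ttl      # block-level TTL wins
    if session_ttl is not None:
        return session_ttl    # then the session default
    return global_ttl         # then AGENTMEMORY_MEMORY_BLOCK_TTL


print(effective_ttl(None, 3600, 86400))  # 3600
```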

Variable	Required	Default	Description
`AGENTMEMORY_MEMORY_BLOCK_TTL`	No	`0`	Global default time to live, in seconds, applied to memory blocks. `0` means no expiry. A session-level or block-level TTL overrides this value.
`AGENTMEMORY_MEMORY_QUOTA_MB`	No	`512`	Estimated memory budget, in megabytes, used by the memory monitor.
`AGENTMEMORY_MEMORY_THRESHOLD_PERCENT`	No	`70`	Percentage of the quota at which the memory monitor begins rejecting ingestion requests.
`AGENTMEMORY_MEMORY_MONITOR`	No	`false`	Enables the memory monitor. When enabled, Agent Memory tracks estimated usage and rejects ingestion once usage exceeds the threshold percentage of the quota.
`AGENTMEMORY_MEMORY_CHECK_INTERVAL`	No	`5.0`	Interval, in seconds, at which the memory monitor samples process memory usage. Only applies when `AGENTMEMORY_MEMORY_MONITOR` is `true`.
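As a worked example of how the quota and threshold combine, the sketch below computes the usage level above which ingestion is rejected. The helper names are hypothetical, and the arithmetic is an illustration of the documented behavior rather than the monitor's actual code.

```python
def rejection_threshold_mb(quota_mb: int = 512, threshold_percent: int = 70) -> float:
    """Usage level, in MB, above which the memory monitor rejects ingestion."""
    return quota_mb * threshold_percent / 100


def should_reject(usage_mb: float, quota_mb: int = 512, threshold_percent: int = 70) -> bool:
    """True when estimated usage exceeds the threshold percentage of the quota."""
    return usage_mb > rejection_threshold_mb(quota_mb, threshold_percent)


print(rejection_threshold_mb())  # 358.4
print(should_reject(400.0))      # True
```

With the defaults, ingestion is rejected once estimated usage passes 358.4 MB, provided `AGENTMEMORY_MEMORY_MONITOR` is `true`.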

Rate Limiting and Throughput

The model provider is almost always the throughput ceiling. Agent Memory and Couchbase can ingest far faster than a hosted model provider allows, so the extraction pipeline, which depends on the provider, becomes the bottleneck. Increasing server or database resources without raising your model provider tier has no effect on extraction throughput.

When setting limits and monitoring throughput, keep the following in mind:

Match the limits to your provider tier

Agent Memory enforces a requests-per-minute limit and a tokens-per-minute limit against the model provider. Set both to the actual limits shown in your provider’s account dashboard:

Set too high, and the provider returns 429 errors. Agent Memory waits and retries, which reduces throughput.
Set too low, and provider capacity goes unused.

Use extraction queue depth to find the bottleneck

The extraction queue depth reported by the /health/async-batch-processor-stats endpoint is the primary signal to identify a bottleneck in your system:

A steadily growing queue means extraction is not keeping up with ingestion, so the model provider is the constraint. To improve throughput, upgrade the provider tier or throttle ingestion at the application layer.
A stable queue near zero means the system is not saturated.

Variable	Required	Default	Description
`AGENTMEMORY_MAX_REQUESTS_PER_MINUTE`	No	`1500`	Maximum model API requests per minute. Set this to the actual limit shown in your provider’s account dashboard.
`AGENTMEMORY_MAX_TOKENS_PER_MINUTE`	No	`125000`	Maximum model API tokens per minute. Set this to the actual limit shown in your provider’s account dashboard.
`AGENTMEMORY_MAX_CONCURRENT_TASKS`	No	`100`	Maximum number of concurrent async extraction tasks. A useful starting point is `(requests_per_minute / 60) × average_LLM_latency_seconds`.
`AGENTMEMORY_PER_REQUEST_TOKEN_LIMIT`	No	`4096`	Maximum token count for a single memory block. Blocks that exceed this limit are rejected from the extraction queue.
`AGENTMEMORY_QUEUE_MAX_SIZE`	No	`10000`	Maximum number of blocks that can be queued for async extraction. When the queue is full, Agent Memory returns HTTP `429` with a `Retry-After` header. If this happens frequently, increase this value or explore other scaling options.
`AGENTMEMORY_MAX_FAIL_COUNT`	No	`3`	Number of extraction failures after which a block is marked as permanently failed.
`AGENTMEMORY_TASK_TIMEOUT_SECONDS`	No	`300`	Maximum time, in seconds, that a single async extraction task can run before it is cancelled. Increase this for slow or heavily rate-limited model providers.
`AGENTMEMORY_IDLE_SLEEP_SECONDS`	No	`0.5`	Sleep duration, in seconds, between async extraction dispatch cycles. Decrease to reduce extraction latency at the cost of higher CPU usage when the queue is idle.
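The starting-point formula for `AGENTMEMORY_MAX_CONCURRENT_TASKS` is an application of Little's law: concurrency equals request rate multiplied by the time each request takes. The sketch below works it through. The function name is hypothetical, the latencies are example inputs rather than measurements, and rounding up is an assumption.

```python
import math


def suggested_concurrent_tasks(requests_per_minute: float, avg_latency_seconds: float) -> int:
    """(requests_per_minute / 60) x average LLM latency, rounded up to a whole task."""
    requests_per_second = requests_per_minute / 60
    return math.ceil(requests_per_second * avg_latency_seconds)


# 1500 requests per minute is 25 per second; at 2 s per call, about 50 calls are in flight.
print(suggested_concurrent_tasks(1500, 2.0))  # 50
print(suggested_concurrent_tasks(1500, 4.0))  # 100
```

Measure your provider's average latency under load before settling on a value. A number well above the result only adds tasks that wait on the rate limiter.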

Logging

For information on how to access log files in a running deployment, see Viewing Logs.

Variable	Required	Default	Description
`LOG_LEVEL`	No	`INFO`	Log verbosity level. Accepted values: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`.
`TIMESTAMPED_LOGS`	No	`true`	Set to `false` to use fixed log filenames instead of timestamp-based names.
`LOG_MAX_BYTES`	No	`10485760`	Maximum log file size in bytes before rotation. The default is 10 MB.
`LOG_BACKUP_COUNT`	No	`10`	Number of rotated log files to retain.
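To estimate the disk space to reserve for logs, the rotation settings can be multiplied out. This sketch assumes rotation keeps one active file plus `LOG_BACKUP_COUNT` rotated files per log, each up to `LOG_MAX_BYTES`. Confirm this against your deployment, particularly when `TIMESTAMPED_LOGS` is `true`. The function name is hypothetical.

```python
def max_log_footprint_bytes(max_bytes: int = 10_485_760, backup_count: int = 10) -> int:
    """Upper bound for one log: the active file plus all retained backups."""
    return max_bytes * (backup_count + 1)


footprint = max_log_footprint_bytes()
print(footprint)               # 115343360
print(footprint / 1024 ** 2)   # 110.0
```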
