Agent Memory Environment Variable Reference

Reference for environment variables that configure the Agent Memory server, organized by functional area.
For authentication variables, see Configure Agent Memory Authentication.

You configure Agent Memory through environment variables, either in a .env file passed to the container with --env-file, or injected directly into the container environment. Variables set directly in the container environment take precedence over values in the .env file.
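The precedence rule above can be illustrated with a short Python sketch. It is not how the container runtime parses the file. The helpers `parse_env_file` and `resolve_settings` are hypothetical names, and the parser handles only simple KEY=VALUE lines.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_settings(env_file_text: str, container_env: dict[str, str]) -> dict[str, str]:
    """Overlay variables set directly in the container on top of .env values."""
    settings = parse_env_file(env_file_text)
    settings.update(container_env)  # the container environment wins
    return settings


# Sample values only; they are not recommended settings.
env_file = """
# illustrative .env content
AGENTMEMORY_BUCKET=memory
LOG_LEVEL=INFO
"""
resolved = resolve_settings(env_file, {"LOG_LEVEL": "DEBUG"})
print(resolved["LOG_LEVEL"])  # DEBUG
```

Here LOG_LEVEL appears in both places, so the value set directly in the container environment is the one that takes effect.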

The distribution includes a .env-sample file as a starting template. A minimal deployment requires only the database connection variables and a model provider API key, while everything else has a default or is optional. For a minimal deployment configuration, see Get Started with Agent Memory.

Database Connection

These variables tell Agent Memory which Couchbase cluster and bucket to use.

For help finding the connection string for your Capella operational database, see Connect To Your Cluster.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| AGENTMEMORY_CONN_STRING | Yes | — | Couchbase connection string. Use couchbase:// for a cluster without TLS, or couchbases:// for a cluster with TLS. |
| AGENTMEMORY_USERNAME | Yes [1] | — | Username of the Couchbase cluster access credentials. |
| AGENTMEMORY_PASSWORD | Yes [1] | — | Password of the Couchbase cluster access credentials. |
| AGENTMEMORY_BUCKET | Yes | — | Name of the bucket Agent Memory reads from and writes to. The bucket holds the users, sessions, and memory collections in the agentmemory scope. |

[1] Username and password are ignored when you authenticate with a client certificate. For more information, see TLS and mTLS.

TLS and mTLS

These variables secure the connection between Agent Memory and Couchbase.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| AGENTMEMORY_CONN_ROOT_CERTIFICATE | No | — | Path inside the container to the cluster’s root CA certificate. Required when connecting to Couchbase Capella. |
| AGENTMEMORY_CONN_CLIENT_CERT | No | — | Path to the client certificate for mutual TLS (mTLS). |
| AGENTMEMORY_CONN_CLIENT_KEY | No | — | Path to the client private key for mutual TLS (mTLS). |

When you provide a client certificate, AGENTMEMORY_USERNAME and AGENTMEMORY_PASSWORD are ignored.

Never use plain text couchbase:// connections in production. Use couchbases:// instead, with the appropriate TLS settings.
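As a rough illustration of the connection rules in this section and in Database Connection, the following Python sketch checks a set of variables before deployment. The function `check_connection` and its `production` flag are hypothetical and are not part of Agent Memory. The host name and credentials are placeholders.

```python
def check_connection(env: dict[str, str], production: bool = True) -> list[str]:
    """Return human-readable findings. An empty list means no issues were found."""
    findings: list[str] = []

    conn = env.get("AGENTMEMORY_CONN_STRING", "")
    if not conn.startswith(("couchbase://", "couchbases://")):
        findings.append("AGENTMEMORY_CONN_STRING must start with couchbase:// or couchbases://")
    elif production and conn.startswith("couchbase://"):
        findings.append("use couchbases:// (TLS) in production")

    has_client_cert = bool(env.get("AGENTMEMORY_CONN_CLIENT_CERT"))
    has_username = bool(env.get("AGENTMEMORY_USERNAME"))
    has_password = bool(env.get("AGENTMEMORY_PASSWORD"))

    if has_client_cert and (has_username or has_password):
        # Documented behavior: credentials are ignored with a client certificate.
        findings.append("username and password are ignored when a client certificate is set")
    if not has_client_cert and not (has_username and has_password):
        findings.append("set AGENTMEMORY_USERNAME and AGENTMEMORY_PASSWORD, or provide a client certificate")
    return findings


# Placeholder values for illustration only.
sample = {
    "AGENTMEMORY_CONN_STRING": "couchbase://localhost",
    "AGENTMEMORY_USERNAME": "agent",
    "AGENTMEMORY_PASSWORD": "secret",
}
for finding in check_connection(sample):
    print(finding)
```

With the sample values, the only finding is the suggestion to switch to couchbases://, because the connection string uses the scheme without TLS.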

Write Durability

By default, Agent Memory uses standard writes for memory blocks. Enable durable writes to require the cluster to acknowledge persistence to the active node’s disk before confirming each write. For more information, see Durability.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| AGENTMEMORY_DURABLE_WRITES | No | false | Set to true to enable durable writes for memory blocks. Requires at least one replica on the bucket. Do not enable this on a single-node cluster or a bucket with no replicas. |

Model Providers

Agent Memory uses an embedding model and an LLM to power semantic extraction.

API Keys

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| OPENAI_API_KEY | Yes [2] | — | Default API key for both the embedding and LLM providers. When set, it takes precedence over AGENTMEMORY_EMBEDDING_API_KEY and AGENTMEMORY_LLM_API_KEY. Required unless you set both AGENTMEMORY_EMBEDDING_API_KEY and AGENTMEMORY_LLM_API_KEY. |

[2] Not required when both AGENTMEMORY_EMBEDDING_API_KEY and AGENTMEMORY_LLM_API_KEY are set.
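The key precedence described here can be sketched in Python as follows. The function `resolve_api_keys` is a hypothetical helper that mirrors the documented rule. It is not the server's actual implementation, and the key values are placeholders.

```python
def resolve_api_keys(env: dict[str, str]) -> tuple[str, str]:
    """Return (embedding_key, llm_key) following the documented precedence."""
    shared = env.get("OPENAI_API_KEY")
    if shared:
        # OPENAI_API_KEY takes precedence over both provider-specific keys.
        return shared, shared

    embedding_key = env.get("AGENTMEMORY_EMBEDDING_API_KEY")
    llm_key = env.get("AGENTMEMORY_LLM_API_KEY")
    if embedding_key and llm_key:
        return embedding_key, llm_key

    raise ValueError(
        "set OPENAI_API_KEY, or set both AGENTMEMORY_EMBEDDING_API_KEY "
        "and AGENTMEMORY_LLM_API_KEY"
    )


# Placeholder keys for illustration only.
keys = resolve_api_keys({
    "OPENAI_API_KEY": "shared-key",
    "AGENTMEMORY_LLM_API_KEY": "llm-key",
})
print(keys)  # ('shared-key', 'shared-key')
```

To use different keys for the embedding and LLM providers under this rule, leave OPENAI_API_KEY unset and set both provider-specific keys.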

Embedding Model

The embedding model converts each memory block into a vector for semantic search.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| AGENTMEMORY_EMBEDDING_MODEL | Yes | — | Name of the embedding model, for example text-embedding-3-small. |
| AGENTMEMORY_EMBEDDING_API_KEY | No [3] | — | API key for the embedding provider. Used only when OPENAI_API_KEY is not set. |
| AGENTMEMORY_EMBEDDING_URL | No | — | Base URL of an OpenAI-compatible embedding endpoint. Set this to use a self-hosted model or an alternative provider. |

[3] If OPENAI_API_KEY is set, this variable is ignored.

Agent Memory creates a Search Vector Index automatically with a dimensionality matching the embedding model you configure. If you change the embedding model to a model that produces vectors of a different dimensionality, you must recreate the index.

Large Language Model

Agent Memory uses an LLM to generate a concise, human-readable summary for each memory block. Summaries let agents quickly interpret search results without reading full block content.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| AGENTMEMORY_LLM_MODEL | Yes | — | Name of the LLM, for example gpt-4o-mini. |
| AGENTMEMORY_LLM_API_KEY | No [4] | — | API key for the LLM provider. Used only when OPENAI_API_KEY is not set. |
| AGENTMEMORY_LLM_URL | No | — | Base URL of an OpenAI-compatible LLM endpoint. Set this to use a self-hosted model or an alternative provider. |
| AGENTMEMORY_SUMMARY_TOKEN_LIMIT | No | 700 | Maximum number of tokens in a generated summary. Reduce this to keep summaries shorter and decrease LLM token usage per block. |

[4] If OPENAI_API_KEY is set, this variable is ignored.

Semantic Extraction

Semantic extraction is the process by which Agent Memory generates an embedding and, when using an LLM, a summary for each ingested memory block. Semantic extraction is enabled by default and is what makes blocks searchable. Without embeddings, semantic search has nothing to query against.

If you turn off semantic extraction, Agent Memory skips all model API calls at ingestion time, which speeds up ingestion and eliminates model provider costs. However, Agent Memory then stores blocks without making them searchable, so you can retrieve a block only by direct lookup with a known block ID.

Only turn off semantic extraction when:

  • You’re running a test environment and do not need search.

  • Your deployment retrieves blocks by ID rather than through semantic search.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| AGENTMEMORY_EXTRACT_SEMANTIC | No | true | Set to false to turn off semantic extraction. When false, Agent Memory stores blocks but does not generate embeddings or summaries, and semantic search returns no results. |
| AGENTMEMORY_PROCESSING_BATCH_SIZE | No | 5 | Number of memory blocks processed per batch during synchronous embedding and fact extraction. Only applies when async_processing=false. Increase this to raise throughput when your model provider’s rate limits allow it. |

Server Settings

These variables control the network interface, port, and thread pool the Agent Memory server uses at runtime.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| AGENTMEMORY_SERVER_HOST | No | 0.0.0.0 | Address the server binds to. |
| AGENTMEMORY_SERVER_PORT | No | 8080 | Port the API listens on. This must match the port you publish with -p and any upstream reverse proxy or load balancer. |
| AGENTMEMORY_THREAD_POOL_SIZE | No | 40 | Number of worker threads for synchronous route handlers and the Couchbase SDK. Run a single Uvicorn worker and increase this value instead of adding workers, because the async extraction processor must remain a singleton. |
| AGENTMEMORY_HEALTH_CACHE_TTL_SECONDS | No | 60 | Cache TTL, in seconds, for health check results. Set to 0 to disable health caching. |
| AGENTMEMORY_HEALTH_REFRESH_INTERVAL_SECONDS | No | 30 | Interval, in seconds, at which the background health cache is refreshed. Must be less than AGENTMEMORY_HEALTH_CACHE_TTL_SECONDS. |
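The relationship between the two health cache settings can be checked with a small Python sketch. The function `check_health_settings` is hypothetical. Skipping the comparison when caching is disabled is an assumption, because this reference does not state how the refresh interval behaves in that case.

```python
def check_health_settings(
    cache_ttl_seconds: float = 60.0,
    refresh_interval_seconds: float = 30.0,
) -> list[str]:
    """Return a list of problems with the health cache settings."""
    problems: list[str] = []
    if cache_ttl_seconds == 0:
        # Caching is disabled. Assumption: the refresh interval no longer matters.
        return problems
    if refresh_interval_seconds >= cache_ttl_seconds:
        problems.append(
            "AGENTMEMORY_HEALTH_REFRESH_INTERVAL_SECONDS must be less than "
            "AGENTMEMORY_HEALTH_CACHE_TTL_SECONDS"
        )
    return problems


print(check_health_settings())        # defaults: no problems
print(check_health_settings(60, 90))  # refresh interval too long
```

The defaults of 60 and 30 seconds satisfy the constraint, so the first call returns an empty list.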

HTTPS

These variables configure TLS for the Agent Memory HTTP listener.

These variables are intended for development and internal use with self-signed certificates. For production deployments, terminate TLS at a reverse proxy or load balancer instead of configuring it directly on the Agent Memory server. For more information, see Deploy Agent Memory for Production.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| AGENTMEMORY_SSL_CERTFILE | No | — | Path to the TLS certificate file for the HTTP listener. |
| AGENTMEMORY_SSL_KEYFILE | No | — | Path to the TLS private key file for the HTTP listener. |
| AGENTMEMORY_SSL_KEYFILE_PASSWORD | No | — | Password for an encrypted TLS private key file. |
| AGENTMEMORY_SSL_CA_CERTS | No | — | Path to the CA bundle used by Uvicorn for TLS client certificate verification. |

Memory Retention and Quotas

These variables control automatic expiry and capacity limits.

TTL is the primary mechanism for privacy compliance and data minimization. Any memory block that contains personally identifiable information should carry a TTL. Use a tiered retention approach:

  • Hours to days: transient conversational details.

  • Weeks to months: durable user preferences.

  • Indefinite (0): anonymized or aggregate facts only.

A block-level TTL overrides the session default, which overrides the global default set by AGENTMEMORY_MEMORY_BLOCK_TTL.
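The override order can be expressed as a short Python sketch. The function `effective_ttl` is a hypothetical helper. Using None to mean "not set at this level" is an assumption made for the example, as is treating an explicit 0 at any level as "no expiry".

```python
from typing import Optional


def effective_ttl(
    block_ttl: Optional[int],
    session_ttl: Optional[int],
    global_ttl: int = 0,
) -> int:
    """Return the TTL in seconds that applies to a block (0 means no expiry)."""
    if block_ttl is not None:
        return block_ttl      # block-level TTL wins
    if session_ttl is not None:
        return session_ttl    # then the session default
    return global_ttl         # then AGENTMEMORY_MEMORY_BLOCK_TTL


ONE_HOUR = 3600
ONE_DAY = 86400

print(effective_ttl(ONE_HOUR, ONE_DAY))  # 3600
print(effective_ttl(None, ONE_DAY))      # 86400
print(effective_ttl(None, None))         # 0
```

In the first call the block carries its own TTL, so the session default of one day does not apply.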

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| AGENTMEMORY_MEMORY_BLOCK_TTL | No | 0 | Global default time to live, in seconds, applied to memory blocks. 0 means no expiry. A session-level or block-level TTL overrides this value. |
| AGENTMEMORY_MEMORY_QUOTA_MB | No | 512 | Estimated memory budget, in megabytes, used by the memory monitor. |
| AGENTMEMORY_MEMORY_THRESHOLD_PERCENT | No | 70 | Percentage of the quota at which the memory monitor begins rejecting ingestion requests. |
| AGENTMEMORY_MEMORY_MONITOR | No | false | Enables the memory monitor. When enabled, Agent Memory tracks estimated usage and rejects ingestion once usage exceeds the threshold percentage of the quota. |
| AGENTMEMORY_MEMORY_CHECK_INTERVAL | No | 5.0 | Interval, in seconds, at which the memory monitor samples process memory usage. Only applies when AGENTMEMORY_MEMORY_MONITOR is true. |
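To make the quota and threshold arithmetic concrete, here is a minimal Python sketch. The function `rejects_ingestion` is hypothetical and only reproduces the documented rule. How Agent Memory estimates usage internally is not covered here.

```python
def rejects_ingestion(
    used_mb: float,
    quota_mb: float = 512.0,
    threshold_percent: float = 70.0,
    monitor_enabled: bool = False,
) -> bool:
    """Return True when the memory monitor would reject ingestion."""
    if not monitor_enabled:
        return False  # AGENTMEMORY_MEMORY_MONITOR defaults to false
    limit_mb = quota_mb * threshold_percent / 100.0
    return used_mb > limit_mb


# With the default quota and threshold, the limit is 512 * 70 / 100 = 358.4 MB.
print(rejects_ingestion(400.0, monitor_enabled=True))  # True
print(rejects_ingestion(300.0, monitor_enabled=True))  # False
```

With the monitor left at its default of false, the function never rejects, which matches the table: the quota and threshold only matter once the monitor is enabled.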

Rate Limiting and Throughput

The model provider is almost always the throughput ceiling. Agent Memory and Couchbase can ingest blocks far faster than a hosted model provider allows, which makes the extraction pipeline the bottleneck. Increasing server or database resources without raising your model provider tier has no effect on extraction throughput.

When setting limits and monitoring throughput, keep the following in mind:

Match the limits to your provider tier

Agent Memory enforces a requests-per-minute limit and a tokens-per-minute limit against the model provider. Set both to the actual limits shown in your provider’s account dashboard:

  • Set too high, and the provider returns 429 errors. Agent Memory handles these errors by waiting and retrying, which reduces throughput.

  • Set too low, and provider capacity goes unused.

Use extraction queue depth to find the bottleneck

The extraction queue depth reported by the /health/async-batch-processor-stats endpoint is the primary signal to identify a bottleneck in your system:

  • A steadily growing queue means extraction is not keeping up with ingestion, so the model provider is the constraint. To improve throughput, upgrade the provider tier or throttle ingestion at the application layer.

  • A stable queue near zero means the system is not saturated.
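The two signals above can be turned into a simple classification over a series of queue depth samples. The Python sketch below assumes you have already collected the depths yourself. The function `classify_queue`, its labels, and the `near_zero` cutoff are hypothetical, and the response format of the stats endpoint is not assumed.

```python
def classify_queue(depths: list[int], near_zero: int = 5) -> str:
    """Classify queue depth samples, listed oldest first."""
    if len(depths) < 2:
        return "inconclusive"  # one sample cannot show a trend
    if max(depths) <= near_zero:
        return "not saturated"  # stable queue near zero
    never_shrinks = all(
        later >= earlier for earlier, later in zip(depths, depths[1:])
    )
    if never_shrinks and depths[-1] > depths[0]:
        return "provider-bound"  # steadily growing queue
    return "inconclusive"


print(classify_queue([10, 40, 90, 160]))  # provider-bound
print(classify_queue([0, 2, 1, 0]))       # not saturated
print(classify_queue([50, 20, 60, 30]))   # inconclusive
```

A result of "inconclusive" suggests sampling over a longer window before changing the provider tier or throttling ingestion.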

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| AGENTMEMORY_MAX_REQUESTS_PER_MINUTE | No | 1500 | Maximum model API requests per minute. Set this to the actual limit shown in your provider’s account dashboard. |
| AGENTMEMORY_MAX_TOKENS_PER_MINUTE | No | 125000 | Maximum model API tokens per minute. Set this to the actual limit shown in your provider’s account dashboard. |
| AGENTMEMORY_MAX_CONCURRENT_TASKS | No | 100 | Maximum number of concurrent async extraction tasks. A useful starting point is (requests_per_minute / 60) × average_LLM_latency_seconds. |
| AGENTMEMORY_PER_REQUEST_TOKEN_LIMIT | No | 4096 | Maximum token count for a single memory block. Blocks that exceed this limit are rejected from the extraction queue. |
| AGENTMEMORY_QUEUE_MAX_SIZE | No | 10000 | Maximum number of blocks that can be queued for async extraction. When the queue is full, Agent Memory returns HTTP 429 with a Retry-After header. If this happens frequently, increase this value or explore other scaling options. |
| AGENTMEMORY_MAX_FAIL_COUNT | No | 3 | Number of extraction failures after which a block is marked as permanently failed. |
| AGENTMEMORY_TASK_TIMEOUT_SECONDS | No | 300 | Maximum time, in seconds, that a single async extraction task can run before it is cancelled. Increase this for slow or heavily rate-limited model providers. |
| AGENTMEMORY_IDLE_SLEEP_SECONDS | No | 0.5 | Sleep duration, in seconds, between async extraction dispatch cycles. Decrease to reduce extraction latency at the cost of higher CPU usage when the queue is idle. |
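The starting-point formula for AGENTMEMORY_MAX_CONCURRENT_TASKS can be worked through in Python. The function `suggested_concurrency` is a hypothetical helper. Rounding up and enforcing a minimum of 1 are choices made for this sketch and are not part of the documented formula. The latency values are illustrative, so measure your own provider's average latency.

```python
import math


def suggested_concurrency(
    requests_per_minute: float,
    average_llm_latency_seconds: float,
) -> int:
    """Apply (requests_per_minute / 60) x average_LLM_latency_seconds."""
    in_flight = requests_per_minute / 60 * average_llm_latency_seconds
    return max(1, math.ceil(in_flight))  # round up, never below 1


# 1500 requests per minute is 25 per second. At 4 s per request,
# about 100 requests are in flight at any moment.
print(suggested_concurrency(1500, 4.0))  # 100

# A smaller tier: 500 requests per minute at 2.5 s per request.
print(suggested_concurrency(500, 2.5))   # 21
```

The formula estimates how many requests are in flight when the provider's request limit is fully used, so a value well above the result adds little and a value well below it leaves capacity unused.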

Logging

For information on how to access log files in a running deployment, see Viewing Logs.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| LOG_LEVEL | No | INFO | Log verbosity level. Accepted values: DEBUG, INFO, WARNING, ERROR, CRITICAL. |
| TIMESTAMPED_LOGS | No | true | Set to false to use fixed log filenames instead of timestamp-based names. |
| LOG_MAX_BYTES | No | 10485760 | Maximum log file size in bytes before rotation. The default is 10 MB. |
| LOG_BACKUP_COUNT | No | 10 | Number of rotated log files to retain. |