Use this checklist before deploying agents built with Agent SDK for Go to production. It complements the Temporal runtime and Worker separation guides.

Run limits and agent loops

  • Bound run duration — Set WithTimeout and/or a context deadline on Run, Stream, and RunAsync. Context deadline always wins over agent timeout.
  • Approval timeouts — When tools require approval, set WithApprovalTimeout less than the run timeout. Default is agent timeout − 30s.
  • Max iterations — Set WithMaxIterations to cap LLM rounds (default 5). Runs finish with finish_reason: max_iterations when hit.
  • Sub-agent depth — Set WithMaxSubAgentDepth when using delegation (default 2).
  • Agent mode — Use AgentModeInteractive (5 min default) for user-facing apps; AgentModeAutonomous (60 min default) for background pipelines. See Timeouts & Modes.
Activity retry counts inside Temporal workflows are fixed in the SDK — not user-tunable.

LLM provider fallback strategy

  • No built-in failover — The SDK calls a single LLMClient per agent. Failover is your responsibility at the client layer.
  • Implement a wrapper — Wrap multiple provider clients with your own retry and fallback logic, implementing interfaces.LLMClient. On 429 or 5xx, switch to the secondary provider.
  • Circuit breaker — Track consecutive LLM errors and open a circuit to avoid hammering a degraded provider. Re-probe after a backoff window.
  • Sticky provider per session — For conversation continuity, route the same conversation ID to the same provider. Provider-switching mid-conversation can cause context drift.
  • Test fallback paths — Add an integration test that injects a failing LLM client and confirms the agent returns a clean error rather than hanging.
See LLM Providers for the LLMClient interface and built-in clients.

Error handling patterns

  • Always check errors — Run, RunAsync, and Stream all return errors. A nil result with a non-nil error means the run did not complete — do not access result.Content.
  • Typed errors — context.DeadlineExceeded means a timeout fired. ErrMaxIterationsReached means the agent hit WithMaxIterations. Log both with the run context (agent name, conversation ID).
  • RunAsync errors — Errors arrive on the channel as AgentRunAsyncResult.Error. Always select or range on the channel — an unread channel leaks the goroutine.
  • Stream errors — An AgentEventTypeRunError event carries the error message. Drain the channel to completion even after an error event; the channel should only be closed from the producing side, never by the consumer.
  • Surface errors to users clearly — Distinguish timeout errors (“request took too long”), provider errors (“AI service unavailable”), and logic errors (“agent reached its step limit”) in your UI layer.
  • Retries — Retrying Run with the same ConversationOptions.ID will continue from the saved conversation history (if conversation is enabled). Beware of idempotency if tools have side effects.
See Timeouts & Modes for the full failure behavior table.

Tool and delegation risk

  • Approval policy — Choose WithToolApprovalPolicy per agent — main and each specialist. Default is require-all; use AutoToolApprovalPolicy() only when you fully trust the agent surface.
  • Human review — Require approval for dangerous tools, MCP-exposed capabilities, and sub-agent delegation where policy demands it.
  • Tool authorization — Implement ToolAuthorizer for programmatic gates (scopes, tenancy, feature flags) before approval or execution.
  • Parallel vs sequential tools — Use WithAgentToolExecutionMode consistently across NewAgent, NewAgentWorker, and sub-agents when order or shared state matters.

MCP and external tools

  • Attack surface — Remote MCP servers widen what the LLM can invoke. Audit connected servers and use ToolFilter to allowlist tools.
  • TLS — Prefer TLS for streamable HTTP MCP in production. Avoid SkipTLSVerify outside local development.
  • Secrets — Protect bearer tokens, OAuth credentials, and custom headers. Never commit them to source control.
  • Runtime registration — Dynamic MCP changes on the client do not propagate to remote workers automatically — see Dynamic Capabilities.
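
As a rough illustration of allowlisting, the sketch below splits the tool names a server advertises into allowed and rejected sets, denying by default. How such a predicate is wired into ToolFilter is not shown in this checklist, so `allowlist`, `newAllowlist`, `split`, and the tool names are all hypothetical.

```go
package main

import "fmt"

// allowlist is a hypothetical deny-by-default set of permitted tool names.
type allowlist map[string]struct{}

func newAllowlist(names ...string) allowlist {
	a := make(allowlist, len(names))
	for _, n := range names {
		a[n] = struct{}{}
	}
	return a
}

// split partitions the tool names a server advertises into those on the
// allowlist and those that must not be exposed to the LLM.
func (a allowlist) split(advertised []string) (allowed, rejected []string) {
	for _, name := range advertised {
		if _, ok := a[name]; ok {
			allowed = append(allowed, name)
		} else {
			rejected = append(rejected, name)
		}
	}
	return allowed, rejected
}

func main() {
	allow := newAllowlist("search_docs", "get_ticket")
	allowed, rejected := allow.split([]string{"search_docs", "delete_repo", "get_ticket"})
	fmt.Println("allowed:", allowed)
	fmt.Println("rejected (log and audit):", rejected)
}
```

Logging the rejected names makes it visible when a remote server starts advertising tools that were never audited.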

Split processes and Temporal

  • Worker separation — When using DisableLocalWorker, run NewAgentWorker with matching configuration on a separate process.
  • Remote workers — Pass EnableRemoteWorkers() when using streaming or approvals across processes.
  • Distributed conversation — Use Redis (not in-memory) for conversation when client and worker are split. Same config on both processes.
  • Config fingerprint — Keep task queue, tools, MCP/A2A setup, memory, hooks, and observability config aligned. Avoid WithDisableFingerprintCheck in production.
  • Integration tests — Exercise approval and streaming paths with split processes before deploy.
  • Workflow replay — After upgrading the SDK module, confirm existing workflows still replay in your Temporal environment.

Rate limiting awareness

  • LLM provider rate limits — Most providers enforce requests-per-minute (RPM) and tokens-per-minute (TPM) limits. A 429 from the provider surfaces as an LLM activity error in Temporal (which retries it) or a run error in-process (which does not retry by default).
  • Implement exponential backoff — In your LLMClient wrapper, catch 429 responses and back off before retrying. Do not let unbounded retries exhaust your Temporal workflow timeout.
  • Track token usage — Monitor agent.llm.tokens.input and agent.llm.tokens.output OTLP metrics. Set alerts when per-minute token totals approach your provider limit. See Metrics.
  • Cap concurrent runs — A burst of concurrent RunAsync calls multiplies LLM load. Use a semaphore or rate limiter in your API layer before calling the agent.
  • Autonomous vs interactive mode — AgentModeAutonomous (60 min timeout) can queue many long-running workflows. Size your Temporal workers and provider quota to match expected concurrency.

Cost controls

  • Token budget per run — Set WithLLMSampling(&LLMSampling{MaxTokens: N}) to cap completion tokens per LLM call. Prevents runaway costs from unexpectedly long responses.
  • Cap iteration count — WithMaxIterations (default 5) directly caps how many LLM calls a single run makes. Reduce it for cost-sensitive workloads.
  • Monitor token usage — Read result.LLMUsage.TotalTokens on every run and aggregate by user/tenant. See Token Usage.
  • OTLP token metrics — agent.llm.tokens.input and agent.llm.tokens.output histograms let you alert on token spend in dashboards. See Metrics.
  • Sub-agent token multiplication — Each sub-agent delegation triggers its own LLM calls. Monitor sub-agent trees carefully — a 3-level deep tree with 3 tools per level can multiply token spend significantly.
  • Estimate before production — Run the Benchmarks harness with real provider clients and mock_tokens tuned to your expected prompt size to estimate cost at target load.

Secrets and data handling

  • Credentials — Keep LLM API keys and Temporal credentials in environment variables or a secrets manager — not in source control.
  • Untrusted I/O — Treat tool arguments and model output as untrusted at your application boundary.
  • Prompt safety — Validate and sanitize prompts, tool args, and model output in your integration layer. Consider Hooks for guardrails and PII scrubbing.
  • Conversation and memory — You own Clear on conversation and memory stores. Scope memory with tenant and user context — see Memory.

Observability

  • OTLP wiring — Use WithObservabilityConfig on both NewAgent and NewAgentWorker so traces, metrics, and logs from worker activities reach your collector.
  • Collector reachability — Confirm your OTLP endpoint is reachable before deploying. Use Insecure: true only in development.
  • Run telemetry — Log AgentTelemetry from results for operational insight — LLM call count, tool breakdown, finish reason.
  • Structured logging — Set WithLogLevel appropriately; use WithLogger to integrate with your log pipeline.
  • Temporal UI — Use workflow history and the Temporal Web UI to debug individual runs.

Temporal worker scaling

  • Workers are stateless — Temporal holds all workflow state. Scale workers horizontally by adding replicas; each polls the same task queue and picks up available work.
  • One task queue per agent type — The root agent, each sub-agent, and each NewAgentWorker must use a unique task queue. Sharing a queue between different agent types causes workflow routing errors.
  • Scale sub-agent workers independently — Sub-agents are child workflows on separate task queues. A high-delegation workload may require more sub-agent workers than root agent workers.
  • Size worker concurrency — Each worker runs multiple workflow goroutines concurrently. Tune via Temporal worker options (max concurrent workflow tasks, activity tasks). Match to your LLM provider’s concurrency quota to avoid throttling.
  • Worker health checks — Probe that workers are polling before accepting traffic. AgentModeInteractive with DisableLocalWorker performs a pre-check automatically — ensure at least one worker is registered on the task queue.
  • Rolling deploys — Temporal workflows replay on new workers. After an SDK upgrade, verify workflow replay compatibility in staging before rolling out to production workers.
See Worker Separation and Multiple Agents.

Operations

  • Graceful shutdown — Call Agent.Close() to flush OTLP exporters. Call AgentWorker.Stop() on worker processes.
  • Streaming UX — Live stream events are not backfilled after disconnect. Design interactive UIs to wait for the current turn — see Temporal streaming guarantees.
  • Health checks — Verify Temporal connectivity, worker availability, Redis/Postgres backends, and OTLP collector health in your deployment probes.

Pre-deploy checklist

| Area | Verify |
| --- | --- |
| LLM fallback | Failover strategy or wrapper in place; fallback path tested |
| Rate limits | Backoff on 429; token/request quota sized for target concurrency |
| Cost controls | MaxTokens, MaxIterations set; token metrics monitored |
| Error handling | All Run/Stream/RunAsync errors checked and surfaced cleanly |
| Timeouts | WithTimeout, context deadlines, WithApprovalTimeout set for your SLA |
| Iterations | WithMaxIterations and WithMaxSubAgentDepth bounded |
| Approvals | Policy matches risk for each agent and specialist |
| Temporal | Cluster reachable; client/worker config aligned; replay tested |
| Worker scaling | Task queues unique; worker count sized to LLM quota |
| Conversation | Redis or distributed store if split processes |
| Observability | OTLP on agent and worker; collector tested; alerts configured |
| Secrets | No keys in repo; env/secrets manager in place |
| SDK upgrade | Workflow replay tested after module bump |

Validate before deploy

Run these against your target environment to confirm configuration:
| Check | Resource |
| --- | --- |
| Split client/worker and streaming | Agent Worker, Durable Agent |
| OTLP export from agent and worker | Observability example |
| Behavioral regression | Eval harness |
| End-to-end HTTP + Temporal + Postgres | Agent Chat |
| Load and orchestration limits | Benchmarks |

  • Worker Separation — Split client and worker processes
  • Approvals — Tool and delegation approval flows
  • Observability — Traces, metrics, and logs
  • Agent Chat — Reference app architecture patterns