Run limits and agent loops
- Bound run duration — Set WithTimeout and/or a context deadline on Run, Stream, and RunAsync. A context deadline always wins over the agent timeout.
- Approval timeouts — When tools require approval, set WithApprovalTimeout to less than the run timeout. Default is agent timeout − 30s.
- Max iterations — Set WithMaxIterations to cap LLM rounds (default 5). Runs finish with finish_reason: max_iterations when the cap is hit.
- Sub-agent depth — Set WithMaxSubAgentDepth when using delegation (default 2).
- Agent mode — Use AgentModeInteractive (5 min default) for user-facing apps; AgentModeAutonomous (60 min default) for background pipelines. See Timeouts & Modes.
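
As a rough illustration of how these limits interact, the sketch below bounds a simulated run with a context deadline and derives an approval timeout from the documented default (agent timeout minus 30s). It uses only the Go standard library: `runAgent`, `runTimedOut`, and `defaultApprovalTimeout` are hypothetical helpers written for this example, not SDK functions, and the clamp at zero is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// interactiveTimeout is the 5-minute AgentModeInteractive default from the text.
const interactiveTimeout = 5 * time.Minute

// defaultApprovalTimeout applies the documented rule: agent timeout minus 30s.
// Clamping at zero is an assumption of this sketch, not documented behavior.
func defaultApprovalTimeout(agentTimeout time.Duration) time.Duration {
	if d := agentTimeout - 30*time.Second; d > 0 {
		return d
	}
	return 0
}

// runAgent is a hypothetical stand-in for a blocking agent run. It simulates
// `work` of processing and returns early with ctx.Err() if ctx ends first.
func runAgent(ctx context.Context, work time.Duration) error {
	timer := time.NewTimer(work)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// runTimedOut reports whether a simulated run of length `work` hits `limit`.
func runTimedOut(limit, work time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	return errors.Is(runAgent(ctx, work), context.DeadlineExceeded)
}

func main() {
	fmt.Println("approval timeout:", defaultApprovalTimeout(interactiveTimeout))
	fmt.Println("slow run timed out:", runTimedOut(10*time.Millisecond, time.Second))
	fmt.Println("fast run timed out:", runTimedOut(time.Second, 10*time.Millisecond))
}
```

In a real integration the deadline-carrying context would be passed to Run, Stream, or RunAsync in place of `runAgent`.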
LLM provider fallback strategy
- No built-in failover — The SDK calls a single LLMClient per agent. Failover is your responsibility at the client layer.
- Implement a wrapper — Wrap multiple provider clients with your own retry and fallback logic, implementing interfaces.LLMClient. On 429 or 5xx, switch to the secondary provider.
- Circuit breaker — Track consecutive LLM errors and open a circuit to avoid hammering a degraded provider. Re-probe after a backoff window.
- Sticky provider per session — For conversation continuity, route the same conversation ID to the same provider. Provider-switching mid-conversation can cause context drift.
- Test fallback paths — Add an integration test that injects a failing LLM client and confirms the agent returns a clean error rather than hanging.
See LLMClient interface and built-in clients.
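
The wrapper and circuit-breaker bullets above could be sketched roughly as follows. This is a minimal illustration under stated assumptions: `Completer` is a simplified, hypothetical stand-in for interfaces.LLMClient (whose real method set is not shown here), `StatusError` is an assumed error shape carrying the HTTP status, and `stub` and `ask` exist only for the demo.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// Completer is a hypothetical, simplified stand-in for the SDK's LLM client
// interface. The real interfaces.LLMClient method set will differ.
type Completer interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// StatusError is an assumed error shape that carries the provider HTTP status.
type StatusError struct{ Code int }

func (e *StatusError) Error() string { return fmt.Sprintf("provider returned HTTP %d", e.Code) }

// retryable reports whether err is a 429 or 5xx from the provider.
func retryable(err error) bool {
	var se *StatusError
	return errors.As(err, &se) && (se.Code == 429 || se.Code >= 500)
}

// Fallback tries the primary client and switches to the secondary on 429/5xx.
// After `threshold` consecutive primary failures the circuit opens: the primary
// is skipped until `cooldown` has passed, then it is probed again.
type Fallback struct {
	primary, secondary Completer
	threshold          int
	cooldown           time.Duration

	mu        sync.Mutex
	failures  int
	openUntil time.Time
}

func NewFallback(primary, secondary Completer, threshold int, cooldown time.Duration) *Fallback {
	return &Fallback{primary: primary, secondary: secondary, threshold: threshold, cooldown: cooldown}
}

// circuitOpen reports whether the primary should currently be skipped.
func (f *Fallback) circuitOpen() bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	return time.Now().Before(f.openUntil)
}

// record updates the failure streak and opens the circuit at the threshold.
func (f *Fallback) record(failed bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if !failed {
		f.failures = 0
		return
	}
	f.failures++
	if f.failures >= f.threshold {
		f.failures = 0
		f.openUntil = time.Now().Add(f.cooldown)
	}
}

func (f *Fallback) Complete(ctx context.Context, prompt string) (string, error) {
	if !f.circuitOpen() {
		out, err := f.primary.Complete(ctx, prompt)
		if err == nil || !retryable(err) {
			f.record(false) // the provider answered, so the streak resets
			return out, err
		}
		f.record(true)
	}
	return f.secondary.Complete(ctx, prompt)
}

// stub is a fake client for the demo: it returns a fixed reply or error.
type stub struct {
	reply string
	err   error
	calls int
}

func (s *stub) Complete(context.Context, string) (string, error) {
	s.calls++
	return s.reply, s.err
}

// ask calls a client with a background context.
func ask(c Completer, prompt string) (string, error) {
	return c.Complete(context.Background(), prompt)
}

func main() {
	primary := &stub{err: &StatusError{Code: 429}}
	secondary := &stub{reply: "answer from secondary"}
	llm := NewFallback(primary, secondary, 2, time.Minute)
	for i := 0; i < 3; i++ {
		out, err := ask(llm, "hello")
		fmt.Println(out, err)
	}
	// The third call skips the primary because the circuit is open.
	fmt.Println("primary calls:", primary.calls, "secondary calls:", secondary.calls)
}
```

The same fake-client approach is one way to cover the "test fallback paths" bullet: inject a client that always fails and assert that the call returns promptly with either a fallback answer or a clean error.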
Error handling patterns
- Always check errors — Run, RunAsync, and Stream all return errors. A nil result with a non-nil error means the run did not complete; do not access result.Content.
- Typed errors — context.DeadlineExceeded means a timeout fired. ErrMaxIterationsReached means the agent hit WithMaxIterations. Log both with the run context (agent name, conversation ID).
- RunAsync errors — Errors arrive on the channel as AgentRunAsyncResult.Error. Always select or range on the channel; an unread channel leaks the goroutine.
- Stream errors — An AgentEventTypeRunError event carries the error message. Drain the channel to completion even after an error event, and leave closing it to the producing side.
- Surface errors to users clearly — Distinguish timeout errors (“request took too long”), provider errors (“AI service unavailable”), and logic errors (“agent reached its step limit”) in your UI layer.
- Retries — Retrying Run with the same ConversationOptions.ID continues from the saved conversation history (if conversation is enabled). If tools have side effects, make sure they are idempotent, because a retry may invoke them again.
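
The error-handling guidance above can be illustrated with a small standard-library sketch. `ErrMaxIterationsReached` is redeclared locally as a stand-in for the SDK sentinel so the example compiles on its own, `Event` is a hypothetical reduced event type rather than the SDK's, and mapping every other error to a provider failure is a simplification.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrMaxIterationsReached is a local stand-in for the SDK sentinel of the same
// name, declared here only so the sketch compiles on its own.
var ErrMaxIterationsReached = errors.New("max iterations reached")

// userMessage maps a run error to text that is safe to show in a UI.
// Treating every other error as a provider failure is a simplification.
func userMessage(err error) string {
	switch {
	case err == nil:
		return ""
	case errors.Is(err, context.DeadlineExceeded):
		return "The request took too long."
	case errors.Is(err, ErrMaxIterationsReached):
		return "The agent reached its step limit."
	default:
		return "The AI service is unavailable."
	}
}

// Event is a hypothetical, reduced stream event: a text chunk or an error.
type Event struct {
	Text string
	Err  error
}

// drain reads every event until the producer closes the channel and returns
// the collected text plus the first error seen. It keeps reading after an
// error so the producing goroutine is never left blocked on a send.
func drain(events <-chan Event) (string, error) {
	var text string
	var first error
	for ev := range events {
		if ev.Err != nil {
			if first == nil {
				first = ev.Err
			}
			continue
		}
		text += ev.Text
	}
	return text, first
}

func main() {
	events := make(chan Event)
	go func() {
		defer close(events) // the producer owns the close
		events <- Event{Text: "partial "}
		events <- Event{Err: fmt.Errorf("run failed: %w", context.DeadlineExceeded)}
		events <- Event{Text: "trailing"}
	}()
	text, err := drain(events)
	fmt.Printf("%q %q\n", text, userMessage(err))
}
```

Using `errors.Is` rather than `==` keeps the classification working when the error has been wrapped with extra context on its way up.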
Tool and delegation risk
- Approval policy — Choose WithToolApprovalPolicy per agent: main and each specialist. Default is require-all; use AutoToolApprovalPolicy() only when you fully trust the agent surface.
- Human review — Require approval for dangerous tools, MCP-exposed capabilities, and sub-agent delegation where policy demands it.
- Tool authorization — Implement ToolAuthorizer for programmatic gates (scopes, tenancy, feature flags) before approval or execution.
- Parallel vs sequential tools — Use WithAgentToolExecutionMode consistently across NewAgent, NewAgentWorker, and sub-agents when order or shared state matters.
MCP and external tools
- Attack surface — Remote MCP servers widen what the LLM can invoke. Audit connected servers and use ToolFilter to allowlist tools.
- TLS — Prefer TLS for streamable HTTP MCP in production. Avoid SkipTLSVerify outside local development.
- Secrets — Protect bearer tokens, OAuth credentials, and custom headers. Never commit them to source control.
- Runtime registration — Dynamic MCP changes on the client do not propagate to remote workers automatically. See Dynamic Capabilities.
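
To make the allowlisting idea concrete, here is a minimal sketch of the filtering logic on its own. The shape of the SDK's ToolFilter is not shown in this document, so `allowTools` and the tool names below are hypothetical; the point is only that anything not explicitly listed is dropped.

```go
package main

import (
	"fmt"
	"sort"
)

// allowTools keeps only the advertised tools that appear in allow. Tools that
// are not listed are dropped, so a server that adds a tool later does not
// expose it to the model by default.
func allowTools(advertised []string, allow map[string]bool) []string {
	var kept []string
	for _, name := range advertised {
		if allow[name] {
			kept = append(kept, name)
		}
	}
	sort.Strings(kept)
	return kept
}

func main() {
	// Hypothetical tool names advertised by a remote MCP server.
	advertised := []string{"search_docs", "delete_repo", "read_file"}
	allow := map[string]bool{"search_docs": true, "read_file": true}
	fmt.Println(allowTools(advertised, allow))
}
```

An allowlist fails closed, which is usually the safer default for remote servers you do not control.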
Split processes and Temporal
- Worker separation — When using DisableLocalWorker, run NewAgentWorker with matching configuration in a separate process.
- Remote workers — Pass EnableRemoteWorkers() when using streaming or approvals across processes.
- Distributed conversation — Use Redis (not in-memory) for conversation when client and worker are split. Use the same configuration on both processes.
- Config fingerprint — Keep task queue, tools, MCP/A2A setup, memory, hooks, and observability config aligned. Avoid WithDisableFingerprintCheck in production.
- Integration tests — Exercise approval and streaming paths with split processes before deploy.
- Workflow replay — After upgrading the SDK module, confirm existing workflows still replay in your Temporal environment.
Rate limiting awareness
- LLM provider rate limits — Most providers enforce requests-per-minute (RPM) and tokens-per-minute (TPM) limits. A 429 from the provider surfaces as an LLM activity error in Temporal (which retries it) or a run error in-process (which does not retry by default).
- Implement exponential backoff — In your LLMClient wrapper, catch 429 responses and back off before retrying. Do not let unbounded retries exhaust your Temporal workflow timeout.
- Track token usage — Monitor agent.llm.tokens.input and agent.llm.tokens.output OTLP metrics. Set alerts when per-minute token totals approach your provider limit. See Metrics.
- Cap concurrent runs — A burst of concurrent RunAsync calls multiplies LLM load. Use a semaphore or rate limiter in your API layer before calling the agent.
- Autonomous vs interactive mode — AgentModeAutonomous (60 min timeout) can queue many long-running workflows. Size your Temporal workers and provider quota to match expected concurrency.
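
The backoff and concurrency-cap bullets above might look roughly like this in plain Go. This is a sketch, not SDK code: `errRateLimited` is an assumed sentinel standing in for however your client reports a 429, and `retryOn429`, `backoffDelay`, and `runLimiter` are hypothetical helpers. A production version would usually add random jitter to the delay.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errRateLimited is an assumed sentinel for a provider 429 response.
var errRateLimited = errors.New("provider returned 429")

// backoffDelay returns base doubled `attempt` times, capped at limit.
func backoffDelay(attempt int, base, limit time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < limit; i++ {
		d *= 2
	}
	if d > limit {
		return limit
	}
	return d
}

// retryOn429 calls fn at most maxAttempts times, sleeping with exponential
// backoff between rate-limited attempts. It stops early when ctx ends, so
// retries stay inside the caller's overall deadline.
func retryOn429(ctx context.Context, maxAttempts int, base, limit time.Duration, fn func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = fn(); !errors.Is(err, errRateLimited) {
			return err // success, or an error that backoff will not fix
		}
		if attempt == maxAttempts-1 {
			break // out of attempts: do not sleep again
		}
		timer := time.NewTimer(backoffDelay(attempt, base, limit))
		select {
		case <-timer.C:
		case <-ctx.Done():
			timer.Stop()
			return ctx.Err()
		}
	}
	return err
}

// runLimiter caps concurrent runs using a buffered channel as a semaphore.
type runLimiter chan struct{}

func newRunLimiter(n int) runLimiter { return make(runLimiter, n) }

// tryAcquire takes a slot if one is free and reports whether it succeeded.
func (l runLimiter) tryAcquire() bool {
	select {
	case l <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot taken by a successful tryAcquire.
func (l runLimiter) release() { <-l }

func main() {
	calls := 0
	err := retryOn429(context.Background(), 4, 10*time.Millisecond, 80*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errRateLimited
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)

	limiter := newRunLimiter(2)
	fmt.Println(limiter.tryAcquire(), limiter.tryAcquire(), limiter.tryAcquire())
}
```

Rejecting a request when `tryAcquire` returns false (for example with HTTP 503) keeps bursts from turning into a queue of long-running workflows.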
Cost controls
- Token budget per run — Set WithLLMSampling(&LLMSampling{MaxTokens: N}) to cap completion tokens per LLM call. This prevents runaway costs from unexpectedly long responses.
- Cap iteration count — WithMaxIterations (default 5) directly caps how many LLM calls a single run makes. Reduce it for cost-sensitive workloads.
- Monitor token usage — Read result.LLMUsage.TotalTokens on every run and aggregate by user/tenant. See Token Usage.
- OTLP token metrics — agent.llm.tokens.input and agent.llm.tokens.output histograms let you alert on token spend in dashboards. See Metrics.
- Sub-agent token multiplication — Each sub-agent delegation triggers its own LLM calls. Monitor sub-agent trees carefully: a 3-level deep tree with 3 tools per level can multiply token spend significantly.
- Estimate before production — Run the Benchmarks harness with real provider clients and mock_tokens tuned to your expected prompt size to estimate cost at target load.
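
A back-of-the-envelope estimate shows how these caps combine. The sketch below is an illustration under explicit assumptions: it reads "3-level tree with 3 per level" as each agent delegating to 3 sub-agents, assumes every agent in the tree runs exactly once, and counts completion tokens only (prompt tokens come on top). `agentsInTree` and `completionTokenEstimate` are hypothetical helper names.

```go
package main

import "fmt"

// agentsInTree counts the agents in a full delegation tree with the given
// number of levels (the root is level 1) where each agent delegates to
// `fanout` sub-agents: 1 + fanout + fanout^2 + ...
func agentsInTree(levels, fanout int) int {
	total, atLevel := 0, 1
	for l := 0; l < levels; l++ {
		total += atLevel
		atLevel *= fanout
	}
	return total
}

// completionTokenEstimate multiplies agents by the per-run iteration cap and
// the per-call completion cap. It assumes each agent runs once per request.
func completionTokenEstimate(levels, fanout, maxIterations, maxTokens int) int {
	return agentsInTree(levels, fanout) * maxIterations * maxTokens
}

func main() {
	fmt.Println(agentsInTree(3, 3))                     // 13 agents
	fmt.Println(completionTokenEstimate(3, 3, 5, 1024)) // 13 * 5 * 1024 = 66560
}
```

With the default of 5 iterations and a 1024-token completion cap, a single root request in this shape could spend up to roughly 66k completion tokens, compared with 5,120 for an agent with no delegation.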
Secrets and data handling
- Credentials — Keep LLM API keys and Temporal credentials in environment variables or a secrets manager — not in source control.
- Untrusted I/O — Treat tool arguments and model output as untrusted at your application boundary.
- Prompt safety — Validate and sanitize prompts, tool args, and model output in your integration layer. Consider Hooks for guardrails and PII scrubbing.
- Conversation and memory — You are responsible for calling Clear on conversation and memory stores. Scope memory with tenant and user context. See Memory.
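
For the credentials bullet, one common pattern is to read keys from the environment at startup, fail fast when one is missing, and never log the full value. The sketch below assumes a hypothetical variable name, `LLM_API_KEY`; `loadSecret` and `redact` are illustrative helpers, not SDK functions.

```go
package main

import (
	"fmt"
	"os"
)

// loadSecret reads a required secret through lookup (os.LookupEnv in
// production) and fails when it is unset or empty.
func loadSecret(lookup func(string) (string, bool), name string) (string, error) {
	v, ok := lookup(name)
	if !ok || v == "" {
		return "", fmt.Errorf("required secret %s is not set", name)
	}
	return v, nil
}

// redact hides all but the last four characters so logs never hold the key.
func redact(secret string) string {
	if len(secret) <= 4 {
		return "****"
	}
	return "****" + secret[len(secret)-4:]
}

func main() {
	// LLM_API_KEY is a hypothetical variable name used for illustration.
	key, err := loadSecret(os.LookupEnv, "LLM_API_KEY")
	if err != nil {
		fmt.Println("startup check failed:", err)
		return
	}
	fmt.Println("loaded key", redact(key))
}
```

Passing the lookup function in makes the startup check easy to test without touching the real process environment.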
Observability
- OTLP wiring — Use WithObservabilityConfig on both NewAgent and NewAgentWorker so traces, metrics, and logs from worker activities reach your collector.
- Collector reachability — Confirm your OTLP endpoint is reachable before deploying. Use Insecure: true only in development.
- Run telemetry — Log AgentTelemetry from results for operational insight: LLM call count, tool breakdown, finish reason.
- Structured logging — Set WithLogLevel appropriately; use WithLogger to integrate with your log pipeline.
- Temporal UI — Use workflow history and the Temporal Web UI to debug individual runs.
Temporal worker scaling
- Workers are stateless — Temporal holds all workflow state. Scale workers horizontally by adding replicas; each polls the same task queue and picks up available work.
- One task queue per agent type — The root agent, each sub-agent, and each NewAgentWorker must use a unique task queue. Sharing a queue between different agent types causes workflow routing errors.
- Scale sub-agent workers independently — Sub-agents are child workflows on separate task queues. A high-delegation workload may require more sub-agent workers than root agent workers.
- Size worker concurrency — Each worker runs multiple workflow goroutines concurrently. Tune via Temporal worker options (max concurrent workflow tasks, activity tasks). Match to your LLM provider’s concurrency quota to avoid throttling.
- Worker health checks — Probe that workers are polling before accepting traffic. AgentModeInteractive with DisableLocalWorker performs a pre-check automatically; ensure at least one worker is registered on the task queue.
- Rolling deploys — Temporal workflows replay on new workers. After an SDK upgrade, verify workflow replay compatibility in staging before rolling out to production workers.
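
The "size worker concurrency" bullet comes down to simple arithmetic: the per-worker activity limit times the number of replicas should not exceed the provider's concurrency quota. The sketch below is a hypothetical helper for that calculation. In the Temporal Go SDK a value like this typically maps to `worker.Options.MaxConcurrentActivityExecutionSize`; how this agent SDK exposes worker options is not covered here, so treat that mapping as an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// perWorkerActivities splits a provider-wide concurrency quota evenly across
// worker replicas, rounding down so the total never exceeds the quota.
func perWorkerActivities(providerQuota, replicas int) (int, error) {
	if providerQuota <= 0 || replicas <= 0 {
		return 0, errors.New("quota and replicas must be positive")
	}
	if replicas > providerQuota {
		return 0, fmt.Errorf("%d replicas cannot share a quota of %d without oversubscribing", replicas, providerQuota)
	}
	return providerQuota / replicas, nil
}

func main() {
	// Example numbers only: 60 concurrent LLM requests shared by 4 replicas.
	n, err := perWorkerActivities(60, 4)
	fmt.Println(n, err)
}
```

Rounding down leaves a little headroom, which helps when other services share the same provider account.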
Operations
- Graceful shutdown — Call Agent.Close() to flush OTLP exporters. Call AgentWorker.Stop() on worker processes.
- Streaming UX — Live stream events are not backfilled after disconnect. Design interactive UIs to wait for the current turn. See Temporal streaming guarantees.
- Health checks — Verify Temporal connectivity, worker availability, Redis/Postgres backends, and OTLP collector health in your deployment probes.
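
A graceful shutdown path could be wired along these lines. This is a sketch with assumptions: `shutdown` is a hypothetical helper, the closures stand in for AgentWorker.Stop() and Agent.Close() (whose return types are not shown in this document), stopping the worker before closing the agent is this example's choice of order, and the short timer in `main` exists only so the demo exits without a signal.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// errFlushFailed is a demo error used to show that later steps still run.
var errFlushFailed = errors.New("telemetry flush failed")

// shutdown calls every stop function in order and joins their errors, so one
// failing step does not prevent the remaining steps from running.
func shutdown(stops ...func() error) error {
	var errs []error
	for _, stop := range stops {
		if err := stop(); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

func main() {
	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer cancel()

	select {
	case <-ctx.Done(): // SIGINT or SIGTERM received
	case <-time.After(50 * time.Millisecond): // demo only: do not wait forever
	}

	err := shutdown(
		// Wrap the worker stop call here (AgentWorker.Stop in the text).
		func() error { fmt.Println("stopping worker"); return nil },
		// Wrap the agent close call here (Agent.Close in the text).
		func() error { fmt.Println("closing agent"); return errFlushFailed },
	)
	fmt.Println("shutdown result:", err)
}
```

In a real service the select would wait on the signal alone, and the process would exit with a non-zero status when `shutdown` returns an error.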
Pre-deploy checklist
| Area | Verify |
|---|---|
| LLM fallback | Failover strategy or wrapper in place; fallback path tested |
| Rate limits | Backoff on 429; token/request quota sized for target concurrency |
| Cost controls | MaxTokens, MaxIterations set; token metrics monitored |
| Error handling | All Run/Stream/RunAsync errors checked and surfaced cleanly |
| Timeouts | WithTimeout, context deadlines, WithApprovalTimeout set for your SLA |
| Iterations | WithMaxIterations and WithMaxSubAgentDepth bounded |
| Approvals | Policy matches risk for each agent and specialist |
| Temporal | Cluster reachable; client/worker config aligned; replay tested |
| Worker scaling | Task queues unique; worker count sized to LLM quota |
| Conversation | Redis or distributed store if split processes |
| Observability | OTLP on agent and worker; collector tested; alerts configured |
| Secrets | No keys in repo; env/secrets manager in place |
| SDK upgrade | Workflow replay tested after module bump |
Validate before deploy
Run these against your target environment to confirm configuration:

| Check | Resource |
|---|---|
| Split client/worker and streaming | Agent Worker, Durable Agent |
| OTLP export from agent and worker | Observability example |
| Behavioral regression | Eval harness |
| End-to-end HTTP + Temporal + Postgres | Agent Chat |
| Load and orchestration limits | Benchmarks |
Related
- Worker Separation — Split client and worker processes
- Approvals — Tool and delegation approval flows
- Observability — Traces, metrics, and logs
- Agent Chat — Reference app architecture patterns