Durable Agent - Agent SDK for Go

The durable_agent example is an interactive lab, not a minimal hello-world. It runs the agent client and Temporal worker in separate processes, streams events over the remote worker path, and includes a REPL so you can deliberately break things — kill workers, crash processes, miss approvals — and observe how Temporal and the SDK behave. Use it after Agent Worker when you need to feel durability guarantees, not just read about them. Source: examples/durable_agent/

Architecture

Process	Entry	SDK pattern
Worker	`durable_agent/worker`	`NewAgentWorker` — polls task queue `"agent"`, executes workflows and activities
Agent (client)	`durable_agent/agent`	`NewAgent` with `DisableLocalWorker` + `EnableRemoteWorkers` — starts runs, consumes stream

Both processes share the same agent options from durable_agent/opts/opts.go so the SDK fingerprint matches — name, LLM client, tools, approval policy, stream settings, etc. The agent uses Stream (WithStream(true)). The REPL prints event types as they arrive:

TEXT_MESSAGE_CONTENT — token deltas
Tool lifecycle events
CUSTOM — tool and delegation approvals (respond with OnApproval in the sample)
RUN_FINISHED / errors

Events arrive via Temporal’s remote event workflow path (UpdateWorkflow), not only in-process channels.
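As a rough illustration of what the REPL does with these event types, the sketch below dispatches on the type string and assembles token deltas into text. The Event struct and its fields are hypothetical stand-ins, not the SDK's real event type; only the type names (TEXT_MESSAGE_CONTENT, CUSTOM, RUN_FINISHED) come from this page.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a hypothetical, simplified stand-in for a streamed agent event.
// The real SDK type will differ; only the Type strings come from the docs.
type Event struct {
	Type  string // e.g. "TEXT_MESSAGE_CONTENT", "CUSTOM", "RUN_FINISHED"
	Delta string // token delta carried by TEXT_MESSAGE_CONTENT events
	Name  string // label for CUSTOM events (assumed field)
}

// render consumes events until RUN_FINISHED or channel close. It returns the
// assembled text and a log of the non-text events it saw, in order.
func render(events <-chan Event) (text string, seen []string) {
	var b strings.Builder
	for ev := range events {
		switch ev.Type {
		case "TEXT_MESSAGE_CONTENT":
			b.WriteString(ev.Delta)
		case "CUSTOM":
			// In the sample, this is where an approval would be answered.
			seen = append(seen, "CUSTOM:"+ev.Name)
		case "RUN_FINISHED":
			seen = append(seen, ev.Type)
			return b.String(), seen
		default:
			// Tool lifecycle events and errors are only logged here.
			seen = append(seen, ev.Type)
		}
	}
	return b.String(), seen
}

func main() {
	ch := make(chan Event, 4)
	ch <- Event{Type: "TEXT_MESSAGE_CONTENT", Delta: "Hello, "}
	ch <- Event{Type: "CUSTOM", Name: "tool_approval"}
	ch <- Event{Type: "TEXT_MESSAGE_CONTENT", Delta: "world"}
	ch <- Event{Type: "RUN_FINISHED"}
	close(ch)

	text, seen := render(ch)
	fmt.Println(text)
	fmt.Println(seen)
}
```

The loop returns on RUN_FINISHED rather than waiting for the channel to close, which matches the REPL becoming ready for the next turn once a run ends.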

Before you start

Run all commands from the examples/ directory.

task infra:temporal:up && task infra:temporal:wait
# LLM_APIKEY in examples/.env (see Configuration)

Rule	Why
Start the worker before typing a prompt	Interactive mode checks for available workers at prompt submission, not at agent startup
Worker wait default: 5 minutes	`AgentModeInteractive` — fails with a clear error if no worker polls the queue in time
Autonomous mode skips worker pre-check	`AgentModeAutonomous` queues in Temporal until a worker appears — different failure mode on misconfig

Terminal 1 — worker:

AGENT_RUNTIME=temporal go run ./durable_agent/worker

Terminal 2 — agent REPL:

AGENT_RUNTIME=temporal go run ./durable_agent/agent

One-shot (no REPL): go run ./durable_agent/agent "Hello from remote agent!"

Type exit, quit, or bye to leave the REPL.

Durability scenarios

Each scenario below lists what to try and what you should learn.

1. Baseline — worker first

Start the worker, then the agent, and send Hello from remote agent!.

Learn: The end-to-end remote path works: the stream completes, the usage footer prints, and the REPL is ready for the next turn.

2. Agent without worker (intentional timeout)

Start the agent with no worker running, send a prompt, and wait for the timeout error: no worker available: timed out waiting for workers on task queue "agent". Then start the worker and resend the same prompt; the run succeeds. This scenario intentionally waits out the ~5 minute check window.

Learn: Interactive agents return a clear error when no worker is polling instead of hanging forever. Once a worker comes up, a new submission works.

3. Kill worker between runs

3a (graceful stop): Complete a run, press Ctrl+C in the worker terminal, restart the worker, and send another prompt.

3b (crash): Complete a run, kill -9 the worker PID, restart the worker, and send another prompt.

Learn: A planned or crash shutdown of the worker between runs does not lose completed work. Temporal history has already recorded the finished activities, so the restarted worker does not re-execute them.

4. Kill worker during an LLM call (mid-stream)

Send a long prompt (e.g. a 7-day Japan travel plan). While tokens stream, stop the worker.

4a (worker stays down): The stream pauses silently until the agent timeout (~5 min), then fails with deadline exceeded: no worker resumed the workflow within the timeout. Restart the worker to resume normal operation.

4b (worker restarts before timeout): Stop the worker mid-stream and restart it within 5 minutes. Temporal reschedules the in-flight LLM activity, and the stream resumes on the same agent process without resending the prompt.

Learn: This is the core durability guarantee. The workflow state lives in Temporal, so a run survives a worker outage while the client is still alive.

Caveat: An activity retry may repeat token deltas in the stream, because the LLM activity reruns from the start. The final conversation content is still one complete result, not duplicated chunks. See Temporal streaming guarantees.
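If a UI renders deltas as they arrive, one way to cope with a rerun LLM activity is to discard the partial text from the earlier attempt. The sketch below assumes each delta carries an attempt number; that field is hypothetical and the SDK may not expose retries this way, so treat it as an illustration of the idea rather than the SDK's behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// Delta is a hypothetical token-delta event. Attempt is an assumed field
// holding the activity attempt number (1 for the first try).
type Delta struct {
	Attempt int
	Text    string
}

// Transcript keeps only the text produced by the latest attempt, so a rerun
// replaces the partial output instead of appending to it.
type Transcript struct {
	attempt int
	buf     strings.Builder
}

// Add applies one delta. A higher attempt number resets the buffer;
// deltas from an older attempt are ignored.
func (tr *Transcript) Add(d Delta) {
	switch {
	case d.Attempt > tr.attempt:
		tr.attempt = d.Attempt
		tr.buf.Reset()
	case d.Attempt < tr.attempt:
		return
	}
	tr.buf.WriteString(d.Text)
}

// String returns the text accumulated for the current attempt.
func (tr *Transcript) String() string { return tr.buf.String() }

func main() {
	var tr Transcript
	deltas := []Delta{
		{Attempt: 1, Text: "Day 1: To"},      // worker stopped mid-stream
		{Attempt: 2, Text: "Day 1: Tokyo. "}, // activity retried from the start
		{Attempt: 2, Text: "Day 2: Kyoto."},
	}
	for _, d := range deltas {
		tr.Add(d)
	}
	fmt.Println(tr.String())
}
```

Without an attempt marker on the events, a client cannot tell a repeated delta from a new one, so the final run result is the value to trust.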

5. Agent restart or crash

5a (graceful agent exit): Finish a run, type bye, and restart the agent. A new prompt works against the same worker.

5b (agent crash between runs): kill -9 the agent after a completed run. The worker keeps polling, and a new agent process connects and runs immediately.

5c (agent crash mid-LLM call): Kill the agent while tokens stream. The worker completes the run in Temporal even though the user saw nothing. After you restart the agent, a follow-up prompt has no conversation memory in this example (no WithConversation wired).

Learn: The worker survives agent death, and work can finish server-side. This example does not persist chat history, so a follow-up like “What was the first destination?” gets no context. For production UIs where users expect continuity after reconnect, see Agent Chat (Postgres + SSE + durable workflows).

6. Two workers, one queue

Run two durable_agent/worker processes on the same task queue. Send prompts, press Ctrl+C in one worker mid-session, then send another prompt.

Learn: Temporal load-balances across workers. Losing one worker mid-session does not drop an in-flight run if another worker polls the same queue.

7. Task queue mismatch

Temporarily change the worker’s task queue in worker/main.go (e.g. "wrong-queue") while the agent still uses "agent". Afterwards, revert the queue name and restart both processes.

Learn: Misconfiguration surfaces as no worker available after the timeout, not as silent corruption.

Tip: Under AgentModeAutonomous, the worker pre-check is skipped, so a mismatched run may sit queued until it times out instead of failing with a clear error. Always align the task queue (and shared opts) between agent and worker before deploy.
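One way to catch this class of mistake before deploy is to compare a digest of the settings both processes were built with. The sketch below is illustrative only: the Options struct, its fields, and the Fingerprint function are hypothetical, and the hash is not the SDK's own fingerprint algorithm, which this page does not describe.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Options is a hypothetical subset of the settings shared by agent and worker.
type Options struct {
	Name      string
	TaskQueue string
	Tools     []string
	Stream    bool
}

// Fingerprint hashes the options in a stable order so that two processes
// built from the same settings produce the same short digest.
// Illustration only; this is not how the SDK computes its fingerprint.
func Fingerprint(o Options) string {
	tools := append([]string(nil), o.Tools...) // copy so the caller's slice is untouched
	sort.Strings(tools)
	s := fmt.Sprintf("%s|%s|%s|%t", o.Name, o.TaskQueue, strings.Join(tools, ","), o.Stream)
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:8])
}

func main() {
	agent := Options{Name: "durable", TaskQueue: "agent", Tools: []string{"search", "calc"}, Stream: true}
	worker := agent
	worker.TaskQueue = "wrong-queue" // the mismatch from this scenario

	if Fingerprint(agent) != Fingerprint(worker) {
		fmt.Println("config mismatch between agent and worker")
	}
}
```

Keeping the options in a single shared package, as this example does with opts/opts.go, removes most of the need for such a check; the digest is useful mainly when the two binaries are built or configured separately.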

Approvals in this example

The sample registers tools that require approval. Stream emits CUSTOM events — parse with ParseCustomEventApproval / ParseCustomEventDelegation and call OnApproval with the token (same pattern as Approvals). If you ignore approvals, tools are skipped with a clear message rather than hanging indefinitely.

What this example does not cover

Topic	Where to go
Minimal split client/worker	Agent Worker
Conversation across restarts	Agent Chat or Conversation with Redis
Production checklist	Readiness

Learn more

Temporal Runtime: Architecture, streaming guarantees, agent modes

Worker Separation: Production split-process pattern

Approvals: Stream CUSTOM events and OnApproval

Agent Chat: Durable chat with persisted history
