The durable_agent example is an interactive lab, not a minimal hello-world. It runs the agent client and Temporal worker in separate processes, streams events over the remote worker path, and includes a REPL so you can deliberately break things (kill workers, crash processes, miss approvals) and observe how Temporal and the SDK behave.
Use it after Agent Worker when you need to feel durability guarantees, not just read about them.
Source: examples/durable_agent/
Architecture
| Process | Entry | SDK pattern |
|---|---|---|
| Worker | durable_agent/worker | NewAgentWorker — polls task queue "agent", executes workflows and activities |
| Agent (client) | durable_agent/agent | NewAgent with DisableLocalWorker + EnableRemoteWorkers — starts runs, consumes stream |
Both processes share durable_agent/opts/opts.go so the SDK fingerprint matches: name, LLM client, tools, approval policy, stream settings, etc.
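To make the idea of a matching fingerprint concrete, here is a minimal sketch of hashing a shared option set in a canonical order. It is an illustration only: the `SharedOpts` struct, its fields, and the `fingerprint` function are hypothetical, and the SDK's actual fingerprint algorithm and inputs are not described in this page.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// SharedOpts is a hypothetical subset of the options that the agent and
// worker must agree on. It is not the SDK's real options type.
type SharedOpts struct {
	Name           string
	Tools          []string
	ApprovalPolicy string
	Stream         bool
}

// fingerprint hashes the options in a canonical order, so two processes
// built from the same values produce the same digest regardless of the
// order in which tools were registered.
func fingerprint(o SharedOpts) string {
	tools := append([]string(nil), o.Tools...) // copy so the caller's slice is not reordered
	sort.Strings(tools)
	canonical := fmt.Sprintf("%s|%s|%s|%t",
		o.Name, strings.Join(tools, ","), o.ApprovalPolicy, o.Stream)
	sum := sha256.Sum256([]byte(canonical))
	return hex.EncodeToString(sum[:])
}

func main() {
	agent := SharedOpts{Name: "demo", Tools: []string{"search", "calc"}, ApprovalPolicy: "ask", Stream: true}
	worker := agent
	worker.Stream = false // one side drifts from the shared options

	fmt.Println(fingerprint(agent) == fingerprint(worker)) // prints: false
}
```

Keeping every shared value in one package, as the example does with opts.go, avoids this kind of drift by construction.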
The agent uses Stream (WithStream(true)). The REPL prints event types as they arrive:
- TEXT_MESSAGE_CONTENT: token deltas
- Tool lifecycle events
- CUSTOM: tool and delegation approvals (respond with OnApproval in the sample)
- RUN_FINISHED / errors
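The event types listed above suggest a simple dispatch loop on the consumer side. The sketch below is an assumption about how such a loop might look: `Event` and `summarize` are hypothetical stand-ins, not the SDK's stream types, and only the event type names come from this page.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a hypothetical stand-in for a stream event; the SDK's real
// event type may carry different fields.
type Event struct {
	Type  string
	Delta string
}

// summarize walks events in arrival order: it concatenates token deltas,
// counts approval requests, and stops at RUN_FINISHED.
func summarize(events []Event) (text string, approvals int, finished bool) {
	var b strings.Builder
	for _, ev := range events {
		switch ev.Type {
		case "TEXT_MESSAGE_CONTENT":
			b.WriteString(ev.Delta)
		case "CUSTOM":
			approvals++ // a real loop would parse the payload and respond
		case "RUN_FINISHED":
			return b.String(), approvals, true
		default:
			// Tool lifecycle and other events are ignored in this sketch.
		}
	}
	return b.String(), approvals, false
}

func main() {
	text, approvals, finished := summarize([]Event{
		{Type: "TEXT_MESSAGE_CONTENT", Delta: "Hel"},
		{Type: "TEXT_MESSAGE_CONTENT", Delta: "lo"},
		{Type: "CUSTOM"},
		{Type: "RUN_FINISHED"},
	})
	fmt.Println(text, approvals, finished) // prints: Hello 1 true
}
```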
Before you start
Run all commands from the examples/ directory.
| Rule | Why |
|---|---|
| Start the worker before typing a prompt | Interactive mode checks for available workers at prompt submission, not at agent startup |
| Worker wait default: 5 minutes | AgentModeInteractive — fails with a clear error if no worker polls the queue in time |
| Autonomous mode skips worker pre-check | AgentModeAutonomous queues in Temporal until a worker appears — different failure mode on misconfig |
Start the agent with an initial prompt:
go run ./durable_agent/agent "Hello from remote agent!"
Type exit, quit, or bye to leave the REPL.
Durability scenarios
Each scenario below lists what to try and what you should learn.
1. Baseline (worker first)
Start worker, then agent, send Hello from remote agent!.
Learn: End-to-end remote path works — stream completes, usage footer prints, REPL ready for the next turn.
2. Agent without worker (intentional timeout)
Start agent with no worker, send a prompt, wait for timeout error: no worker available: timed out waiting for workers on task queue "agent".
Then start worker and resend the same prompt — run succeeds.
Learn: Interactive agents fail fast when no worker is polling — better than hanging forever. After worker comes up, a new submission works. (This scenario intentionally waits out the ~5 minute check window.)
3. Kill worker between runs
3a (graceful stop): Complete a run, Ctrl+C the worker, restart the worker, send another prompt.
3b (crash): Complete a run, kill -9 the worker PID, restart the worker, send another prompt.
Learn: Neither a planned shutdown nor a crash of the worker between runs loses completed work. Temporal history already recorded the finished activities; the restarted worker does not re-execute them.
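The reason completed work survives a worker restart can be pictured with a toy model of history replay. This is a deliberately simplified sketch, not Temporal's implementation: `History` and `runActivity` are hypothetical, and a real event history lives on the Temporal server rather than in a map.

```go
package main

import "fmt"

// History is a toy stand-in for a workflow event history: it maps an
// activity ID to the result recorded when that activity completed.
type History map[string]string

// runActivity returns the recorded result when one exists, and only
// executes fn (then records its result) when the activity is new.
func runActivity(h History, id string, fn func() string) string {
	if res, ok := h[id]; ok {
		return res // replay: no re-execution
	}
	res := fn()
	h[id] = res
	return res
}

func main() {
	h := History{}
	calls := 0
	llm := func() string { calls++; return "plan" }

	runActivity(h, "llm-1", llm) // first worker executes the activity
	// The worker is killed and restarted here; the history survives.
	runActivity(h, "llm-1", llm) // restarted worker reads the recorded result

	fmt.Println(calls) // prints: 1
}
```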
4. Kill worker during an LLM call (mid-stream)
Send a long prompt (e.g. 7-day Japan travel plan). While tokens stream, stop the worker.
4a (worker stays down): Stream pauses silently until agent timeout (~5 min), then deadline exceeded: no worker resumed the workflow within the timeout. Restart worker to resume normal operation.
4b (worker restarts before timeout): Stop worker mid-stream, restart within 5 minutes. Temporal reschedules the in-flight LLM activity; stream resumes on the same agent process without resending the prompt.
Learn: Core durability — workflow continues in Temporal when the client still lives. Caveat: activity retry may repeat token deltas in the stream (LLM activity reruns from the start); final conversation content is still one complete result, not duplicated chunks. See Temporal streaming guarantees.
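If a UI should not show the repeated deltas described above, one option is to discard partial text whenever a newer attempt starts. The sketch below assumes each delta can be tagged with the activity attempt that produced it; that tagging, the `Delta` type, and `assemble` are hypothetical and not something this page says the SDK exposes.

```go
package main

import (
	"fmt"
	"strings"
)

// Delta is a hypothetical token delta tagged with the attempt number of
// the LLM activity that emitted it (attempts start at 1).
type Delta struct {
	Attempt int
	Text    string
}

// assemble keeps only the text from the highest attempt seen so far:
// a newer attempt clears the buffer, and stale deltas are dropped.
func assemble(deltas []Delta) string {
	var b strings.Builder
	current := 0
	for _, d := range deltas {
		if d.Attempt > current {
			b.Reset() // the activity reran from the start
			current = d.Attempt
		}
		if d.Attempt == current {
			b.WriteString(d.Text)
		}
	}
	return b.String()
}

func main() {
	deltas := []Delta{
		{Attempt: 1, Text: "Day 1"},
		{Attempt: 1, Text: ": To"}, // worker killed after this delta
		{Attempt: 2, Text: "Day 1"}, // retry starts over
		{Attempt: 2, Text: ": Tokyo"},
	}
	fmt.Println(assemble(deltas)) // prints: Day 1: Tokyo
}
```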
5. Agent restart or crash
5a (graceful agent exit): Finish a run, type bye, restart agent, new prompt works against same worker.
5b (agent crash between runs): kill -9 agent after a completed run; worker keeps polling; new agent process connects and runs immediately.
5c (agent crash mid-LLM call): Kill agent while tokens stream; worker completes the run in Temporal even though the user saw nothing. Restart agent; the follow-up prompt has no conversation memory in this example (no WithConversation wired).
Learn: Worker survives agent death; work can finish server-side. This example does not persist chat history — a follow-up like “What was the first destination?” gets no context. For production UIs where users expect continuity after reconnect, see Agent Chat (Postgres + SSE + durable workflows).
6. Two workers, one queue
Run two durable_agent/worker processes on the same task queue. Send prompts; Ctrl+C one worker mid-session; send another prompt.
Learn: Temporal load-balances across workers. Losing one worker mid-session does not drop an in-flight run if another worker polls the same queue.
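The competing-consumers idea behind this scenario can be modeled with two goroutines reading one channel. This is a toy model under stated assumptions: `worker` and `runPool` are hypothetical, a Go channel stands in for the task queue, and the rescheduling of a task that was in flight on the stopped worker is not modeled.

```go
package main

import (
	"fmt"
	"sync"
)

// worker pulls tasks from the shared queue until the queue is closed or
// its stop channel is closed. A nil stop channel never fires.
func worker(name string, queue <-chan string, stop <-chan struct{}, record func(task, by string)) {
	for {
		select {
		case <-stop:
			return
		case task, ok := <-queue:
			if !ok {
				return
			}
			record(task, name)
		}
	}
}

// runPool starts workers "a" and "b" on one queue, stops "a" halfway
// through, and returns which worker handled each task.
func runPool(tasks []string) map[string]string {
	queue := make(chan string)
	stopA := make(chan struct{})

	var mu sync.Mutex
	handled := make(map[string]string)
	record := func(task, by string) {
		mu.Lock()
		defer mu.Unlock()
		handled[task] = by
	}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); worker("a", queue, stopA, record) }()
	go func() { defer wg.Done(); worker("b", queue, nil, record) }()

	for i, t := range tasks {
		if i == len(tasks)/2 {
			close(stopA) // simulate Ctrl+C on worker "a" mid-session
		}
		queue <- t // blocks until some live worker takes the task
	}
	close(queue)
	wg.Wait()
	return handled
}

func main() {
	handled := runPool([]string{"t1", "t2", "t3", "t4", "t5", "t6"})
	fmt.Println(len(handled)) // prints: 6
}
```

Which worker handles which task varies between runs; the point is that every task is handled as long as one worker keeps polling.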
7. Task queue mismatch
Temporarily change the worker’s task queue in worker/main.go (e.g. "wrong-queue") while the agent still uses "agent".
Learn: Misconfiguration surfaces as no worker available after timeout — not silent corruption. Revert queue names and restart both processes.
Tip: Under AgentModeAutonomous, the immediate worker check is skipped — mismatch may queue until timeout instead of failing fast. Always align task queue (and shared opts) between agent and worker before deploy.
Approvals in this example
The sample registers tools that require approval. Stream emits CUSTOM events; parse them with ParseCustomEventApproval / ParseCustomEventDelegation and call OnApproval with the token (same pattern as Approvals).
If you ignore approvals, tools are skipped with a clear message rather than hanging indefinitely.
What this example does not cover
| Topic | Where to go |
|---|---|
| Minimal split client/worker | Agent Worker |
| Conversation across restarts | Agent Chat or Conversation with Redis |
| Production checklist | Readiness |
Learn more
Temporal Runtime
Architecture, streaming guarantees, agent modes
Worker Separation
Production split-process pattern
Approvals
Stream CUSTOM events and OnApproval
Agent Chat
Durable chat with persisted history