# Durable Agent

> Test Temporal durability with worker crashes, process restarts, and mid-run recovery scenarios

The **`durable_agent`** example is an interactive **lab**, not a minimal hello-world. It runs the agent client and Temporal worker in **separate processes**, streams events over the remote worker path, and includes a REPL so you can deliberately break things — kill workers, crash processes, miss approvals — and observe how Temporal and the SDK behave.

Use it after [Agent Worker](/examples/agent-worker) when you need to **feel** durability guarantees, not just read about them.

Source: [`examples/durable_agent/`](https://github.com/agenticenv/agent-sdk-go/tree/main/examples/durable_agent)

## Architecture

| Process            | Entry                  | SDK pattern                                                                                 |
| ------------------ | ---------------------- | ------------------------------------------------------------------------------------------- |
| **Worker**         | `durable_agent/worker` | `NewAgentWorker` — polls task queue `"agent"`, executes workflows and activities            |
| **Agent (client)** | `durable_agent/agent`  | `NewAgent` with `DisableLocalWorker` + `EnableRemoteWorkers` — starts runs, consumes stream |

Both processes share the same agent options from `durable_agent/opts/opts.go` so the SDK **fingerprint** matches — name, LLM client, tools, approval policy, stream settings, etc.
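
To make the fingerprint idea concrete, here is a minimal sketch of how a hash over shared options might detect drift between the two processes. The `AgentOptions` struct, its fields, and the hashing scheme are all hypothetical; this page does not describe how the SDK actually computes its fingerprint.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// AgentOptions is a hypothetical stand-in for the settings both processes share.
// The real SDK options type is not shown on this page.
type AgentOptions struct {
	Name           string   `json:"name"`
	Model          string   `json:"model"`
	Tools          []string `json:"tools"`
	ApprovalPolicy string   `json:"approval_policy"`
	Stream         bool     `json:"stream"`
}

// Fingerprint hashes a canonical JSON encoding of the options.
// Tool names are sorted first so registration order does not matter.
func Fingerprint(o AgentOptions) string {
	tools := append([]string(nil), o.Tools...) // copy so the caller's slice is not reordered
	sort.Strings(tools)
	o.Tools = tools
	b, err := json.Marshal(o)
	if err != nil {
		panic(err) // not expected for this plain struct
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	worker := AgentOptions{
		Name:           "durable-agent",
		Model:          "example-model",
		Tools:          []string{"search", "calc"},
		ApprovalPolicy: "require",
		Stream:         true,
	}
	client := worker
	client.Tools = []string{"calc", "search"} // same tools, different order
	fmt.Println("match:", Fingerprint(worker) == Fingerprint(client))

	client.Stream = false // one process drifted
	fmt.Println("match after drift:", Fingerprint(worker) == Fingerprint(client))
}
```

Building both processes from one shared options file, as this example does, avoids this kind of drift by construction.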

The agent uses **`Stream`** (`WithStream(true)`). The REPL prints event types as they arrive:

* `TEXT_MESSAGE_CONTENT` — token deltas
* Tool lifecycle events
* `CUSTOM` — tool and delegation **approvals** (respond with `OnApproval` in the sample)
* `RUN_FINISHED` / errors

Events arrive via Temporal's remote event workflow path (UpdateWorkflow), not only in-process channels.
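
The event handling described above can be pictured as a dispatch loop over a stream. The sketch below uses a hypothetical, simplified `Event` struct and a plain Go channel in place of the SDK stream; only the event type names come from this page, and the real event type, payloads, and approval call may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a hypothetical, simplified stream event.
type Event struct {
	Type string // e.g. "TEXT_MESSAGE_CONTENT", "CUSTOM", "RUN_FINISHED"
	Data string // token delta or approval token, depending on Type
}

// Consume drains events until RUN_FINISHED arrives or the channel closes.
// It returns the assembled text and the approval tokens it answered.
func Consume(events <-chan Event, approve func(token string)) (string, []string) {
	var text strings.Builder
	var approved []string
	for ev := range events {
		switch ev.Type {
		case "TEXT_MESSAGE_CONTENT":
			text.WriteString(ev.Data) // append the token delta
		case "CUSTOM":
			approve(ev.Data) // stand-in for responding to an approval request
			approved = append(approved, ev.Data)
		case "RUN_FINISHED":
			return text.String(), approved
		default:
			fmt.Println("event:", ev.Type) // tool lifecycle and other events
		}
	}
	return text.String(), approved
}

func main() {
	events := make(chan Event, 4)
	events <- Event{Type: "TEXT_MESSAGE_CONTENT", Data: "Hello, "}
	events <- Event{Type: "CUSTOM", Data: "approval-token-1"}
	events <- Event{Type: "TEXT_MESSAGE_CONTENT", Data: "world"}
	events <- Event{Type: "RUN_FINISHED"}
	close(events)

	text, approved := Consume(events, func(token string) {
		fmt.Println("approving", token)
	})
	fmt.Printf("text=%q approvals=%v\n", text, approved)
}
```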

## Before you start

Run all commands from the **`examples/`** directory.

```bash
task infra:temporal:up && task infra:temporal:wait
# LLM_APIKEY in examples/.env (see Configuration)
```

| Rule                                        | Why                                                                                                   |
| ------------------------------------------- | ----------------------------------------------------------------------------------------------------- |
| **Start the worker before typing a prompt** | Interactive mode checks for available workers at **prompt submission**, not at agent startup          |
| **Worker wait default: 5 minutes**          | `AgentModeInteractive` — fails with a clear error if no worker polls the queue in time                |
| **Autonomous mode skips worker pre-check**  | `AgentModeAutonomous` queues in Temporal until a worker appears — different failure mode on misconfig |
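
The interactive pre-check in the table above amounts to a bounded wait: probe for a worker, retry on an interval, and give up with an error once the window closes. The following is a rough sketch of that shape using only the standard library. `WaitForWorker`, `ErrNoWorker`, and the `hasPoller` probe are hypothetical names; how the SDK actually detects pollers is not covered on this page.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrNoWorker is a hypothetical sentinel error for this sketch.
var ErrNoWorker = errors.New("no worker available")

// WaitForWorker calls hasPoller every interval until it returns true
// or the timeout elapses. The interval must be positive.
func WaitForWorker(timeout, interval time.Duration, hasPoller func() bool) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if hasPoller() {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("%w: timed out after %s", ErrNoWorker, timeout)
		case <-ticker.C:
			// probe again on the next loop iteration
		}
	}
}

func main() {
	// A short window for demonstration; the example's interactive default is 5 minutes.
	err := WaitForWorker(50*time.Millisecond, 10*time.Millisecond, func() bool {
		return false // pretend no worker ever polls the queue
	})
	fmt.Println(err)
}
```

An autonomous-style submission would skip this wait and enqueue the run directly, which is why a misconfigured queue behaves differently in that mode.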

**Terminal 1 — worker:**

```bash
AGENT_RUNTIME=temporal go run ./durable_agent/worker
```

**Terminal 2 — agent REPL:**

```bash
AGENT_RUNTIME=temporal go run ./durable_agent/agent
```

One-shot (no REPL): `go run ./durable_agent/agent "Hello from remote agent!"`

Type `exit`, `quit`, or `bye` to leave the REPL.

## Durability scenarios

Each scenario below lists **what to try** and **what you should learn**.

### 1. Baseline — worker first

Start worker, then agent, send `Hello from remote agent!`.

**Learn:** End-to-end remote path works — stream completes, usage footer prints, REPL ready for the next turn.

### 2. Agent without worker (intentional timeout)

Start agent **with no worker**, send a prompt, wait for timeout error: `no worker available: timed out waiting for workers on task queue "agent"`.

Then start worker and **resend** the same prompt — run succeeds.

**Learn:** Interactive agents **fail with a clear error** once the worker wait expires, rather than hanging forever. After the worker comes up, a new submission works. (This scenario intentionally waits out the \~5 minute check window.)

### 3. Kill worker between runs

**3a — Graceful stop:** Complete a run, Ctrl+C worker, restart worker, send another prompt.

**3b — Crash:** Complete a run, `kill -9` the worker PID, restart worker, send another prompt.

**Learn:** A planned or crash shutdown of the worker **between runs** does not lose completed work. Temporal history has already recorded the finished activities, so the restarted worker does not re-execute them.
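
The reason a restarted worker skips finished work can be shown with a toy model of recorded history. In this sketch, `History`, `Step`, and `Workflow` are hypothetical names for illustration only; Temporal's real event history and replay are considerably more involved.

```go
package main

import "fmt"

// History is a toy stand-in for a durable record of completed steps.
// It lives outside the worker, so it survives worker restarts.
type History struct {
	results map[string]string
}

func NewHistory() *History { return &History{results: map[string]string{}} }

// Step returns the recorded result if the step already completed;
// otherwise it runs fn and records the result.
func (h *History) Step(id string, fn func() string) string {
	if r, ok := h.results[id]; ok {
		return r // replayed from history; fn is not called
	}
	r := fn()
	h.results[id] = r
	return r
}

// Workflow runs two steps and counts how many actually executed.
func Workflow(h *History, executed *int) string {
	a := h.Step("llm-call", func() string { *executed++; return "plan" })
	b := h.Step("tool-call", func() string { *executed++; return "booked" })
	return a + "+" + b
}

func main() {
	h := NewHistory()

	var first int
	r1 := Workflow(h, &first) // original worker executes both steps
	fmt.Println(r1, "executed:", first)

	var second int
	r2 := Workflow(h, &second) // restarted worker replays from history
	fmt.Println(r2, "executed:", second)
}
```

The second pass produces the same result while executing zero steps, which mirrors what scenarios 3a and 3b demonstrate.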

### 4. Kill worker during an LLM call (mid-stream)

Send a long prompt (e.g. 7-day Japan travel plan). While tokens stream, stop the worker.

**4a — Worker stays down:** Stream pauses silently until agent timeout (\~5 min), then `deadline exceeded: no worker resumed the workflow within the timeout`. Restart worker to resume normal operation.

**4b — Worker restarts before timeout:** Stop worker mid-stream, restart within 5 minutes. Temporal reschedules the in-flight LLM **activity**; stream **resumes on the same agent process** without resending the prompt.

**Learn:** This is the core durability guarantee: the workflow continues in Temporal as long as the client stays alive. **Caveat:** an activity retry may **repeat token deltas** in the stream, because the LLM activity reruns from the start; the final conversation content is still one complete result, not duplicated chunks. See [Temporal streaming guarantees](/runtimes/temporal#streaming-and-approvals).
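
If repeated deltas matter for your display, one client-side option is to discard partial text when a retry starts over. The sketch below assumes each delta can be tagged with the activity attempt that produced it. That tagging is purely an assumption for illustration; this page does not say whether the stream exposes an attempt number, and the `Delta` type is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Delta is a hypothetical token delta tagged with the attempt that produced it.
type Delta struct {
	Attempt int // assumed to start at 1 and increase with each retry
	Text    string
}

// Assemble keeps only the text from the most recent attempt.
func Assemble(deltas []Delta) string {
	var b strings.Builder
	current := 0
	for _, d := range deltas {
		if d.Attempt > current {
			b.Reset() // a retry started over: drop the earlier partial text
			current = d.Attempt
		}
		if d.Attempt == current {
			b.WriteString(d.Text) // deltas from older attempts are ignored
		}
	}
	return b.String()
}

func main() {
	stream := []Delta{
		{Attempt: 1, Text: "Day 1: "},
		{Attempt: 1, Text: "Tok"}, // worker stopped here
		{Attempt: 2, Text: "Day 1: "},
		{Attempt: 2, Text: "Tokyo"},
	}
	fmt.Println(Assemble(stream))
}
```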

### 5. Agent restart or crash

**5a — Graceful agent exit:** Finish a run, type `bye`, restart agent, new prompt works against same worker.

**5b — Agent crash between runs:** `kill -9` agent after a completed run; worker keeps polling; new agent process connects and runs immediately.

**5c — Agent crash mid-LLM call:** Kill agent while tokens stream; **worker completes the run in Temporal** even though the user saw nothing. Restart agent — follow-up prompt has **no conversation memory** in this example (no `WithConversation` wired).

**Learn:** Worker survives agent death; work can finish server-side. **This example does not persist chat history** — a follow-up like "What was the first destination?" gets no context. For production UIs where users expect continuity after reconnect, see [Agent Chat](/reference-apps/agent-chat) (Postgres + SSE + durable workflows).

### 6. Two workers, one queue

Run two `durable_agent/worker` processes on the same task queue. Send prompts; Ctrl+C one worker mid-session; send another prompt.

**Learn:** Temporal load-balances across workers. Losing one worker mid-session does not drop an in-flight run if another worker polls the same queue.
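
The two-worker behavior resembles the competing-consumers pattern: several consumers pull from one queue, and the remaining consumers absorb the load when one stops. Here is a toy model with goroutines and a channel. `RunWorkers` is a hypothetical helper, and this is not how Temporal dispatches tasks internally.

```go
package main

import (
	"fmt"
	"sync"
)

// RunWorkers starts one goroutine per entry in limits, all reading one queue.
// Worker i stops after limits[i] tasks, simulating Ctrl+C; a limit of 0 or less
// means the worker never stops on its own. At least one worker must be
// unlimited, otherwise the remaining tasks would have no consumer.
// It returns how many tasks each worker completed.
func RunWorkers(tasks []string, limits []int) []int {
	queue := make(chan string)
	done := make([]int, len(limits))
	var wg sync.WaitGroup
	for i, limit := range limits {
		wg.Add(1)
		go func(i, limit int) {
			defer wg.Done()
			for range queue {
				done[i]++ // each worker only touches its own counter
				if limit > 0 && done[i] >= limit {
					return // this worker goes away mid-session
				}
			}
		}(i, limit)
	}
	for _, t := range tasks {
		queue <- t // whichever worker is free picks the task up
	}
	close(queue)
	wg.Wait()
	return done
}

func main() {
	tasks := []string{"p1", "p2", "p3", "p4", "p5"}
	done := RunWorkers(tasks, []int{1, 0}) // worker 0 stops after one task
	fmt.Println("per-worker counts:", done, "total:", done[0]+done[1])
}
```

The per-worker split varies from run to run, but the total always equals the number of submitted tasks.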

### 7. Task queue mismatch

Temporarily change the worker's task queue in `worker/main.go` (e.g. `"wrong-queue"`) while the agent still uses `"agent"`.

**Learn:** Misconfiguration surfaces as **no worker available** after timeout — not silent corruption. Revert queue names and restart both processes.

**Tip:** Under `AgentModeAutonomous`, the immediate worker check is skipped — mismatch may queue until timeout instead of failing fast. Always align task queue (and shared `opts`) between agent and worker before deploy.

## Approvals in this example

The sample registers tools that require approval. The stream emits **`CUSTOM`** events; parse them with `ParseCustomEventApproval` / `ParseCustomEventDelegation` and call `OnApproval` with the token (same pattern as [Approvals](/features/approvals)).

If you ignore approvals, tools are skipped with a clear message rather than hanging indefinitely.
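
One way to get "skip instead of hang" behavior is a bounded wait on the approval decision. The sketch below assumes a deadline-based mechanism, which this page does not confirm; `AwaitApproval` and its return strings are hypothetical and only illustrate the shape of such a wait.

```go
package main

import (
	"fmt"
	"time"
)

// AwaitApproval waits for a decision on the channel. If none arrives
// before the timeout, the tool call is reported as skipped.
func AwaitApproval(decision <-chan bool, timeout time.Duration) string {
	select {
	case ok := <-decision:
		if ok {
			return "approved"
		}
		return "rejected"
	case <-time.After(timeout):
		return "skipped: no approval received"
	}
}

func main() {
	answered := make(chan bool, 1)
	answered <- true
	fmt.Println(AwaitApproval(answered, time.Second))

	ignored := make(chan bool) // nobody ever responds
	fmt.Println(AwaitApproval(ignored, 20*time.Millisecond))
}
```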

## What this example does not cover

| Topic                        | Where to go                                                                                   |
| ---------------------------- | --------------------------------------------------------------------------------------------- |
| Minimal split client/worker  | [Agent Worker](/examples/agent-worker)                                                        |
| Conversation across restarts | [Agent Chat](/reference-apps/agent-chat) or [Conversation](/examples/conversation) with Redis |
| Production checklist         | [Readiness](/production/readiness)                                                            |

## Learn more

<CardGroup cols={2}>
  <Card title="Temporal Runtime" icon="server" href="/runtimes/temporal">
    Architecture, streaming guarantees, agent modes
  </Card>

  <Card title="Worker Separation" icon="gears" href="/advanced/worker-separation">
    Production split-process pattern
  </Card>

  <Card title="Approvals" icon="shield-check" href="/features/approvals">
    Stream CUSTOM events and OnApproval
  </Card>

  <Card title="Agent Chat" icon="comments" href="/reference-apps/agent-chat">
    Durable chat with persisted history
  </Card>
</CardGroup>
