Each LLM completion can report token counts via interfaces.LLMUsage on LLMResponse.Usage. OpenAI, Anthropic, and Gemini clients populate the fields when the provider returns them.

Usage fields

| Field | Description |
| --- | --- |
| PromptTokens | Input tokens |
| CompletionTokens | Output tokens |
| TotalTokens | Sum, when reported by the provider |
| CachedPromptTokens | Prompt cache hits (when supported) |
| ReasoningTokens | Extended thinking tokens (when supported) |
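Because TotalTokens is only filled in when the provider reports it, callers may want a fallback. The sketch below is one way to do that. It uses a local stand-in struct rather than the library's interfaces.LLMUsage, and it assumes that a zero TotalTokens means "not reported"; both the Usage type and the EffectiveTotal helper are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// Usage is a local stand-in with the same field names as the table above.
// It is not the library's type; the int field types are an assumption.
type Usage struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
}

// EffectiveTotal returns the provider-reported total when present and
// otherwise falls back to prompt + completion tokens.
// Assumption: TotalTokens == 0 means the provider did not report a total.
func EffectiveTotal(u Usage) int {
	if u.TotalTokens > 0 {
		return u.TotalTokens
	}
	return u.PromptTokens + u.CompletionTokens
}

func main() {
	reported := Usage{PromptTokens: 100, CompletionTokens: 20, TotalTokens: 120}
	missing := Usage{PromptTokens: 100, CompletionTokens: 20}
	fmt.Println(EffectiveTotal(reported), EffectiveTotal(missing))
}
```

Whether reasoning or cached tokens are already included in a provider's total varies, so a fallback like this is best treated as an approximation.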

Run

AgentRunResult.LLMUsage is the sum across all LLM calls in that run, including tool rounds. Use it for cost estimates, quotas, and logging:
result, err := a.Run(ctx, prompt, nil)
if err != nil {
    return err
}

if result.LLMUsage != nil {
    fmt.Printf("prompt=%d completion=%d total=%d\n",
        result.LLMUsage.PromptTokens,
        result.LLMUsage.CompletionTokens,
        result.LLMUsage.TotalTokens,
    )
}
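For the cost-estimate use case, a minimal sketch might look like the following. It works on a local stand-in struct so that it runs on its own; the Usage and Pricing types, the EstimateCostUSD helper, and the prices are all hypothetical and not part of the library. In real code the token counts would be copied from result.LLMUsage.

```go
package main

import "fmt"

// Usage is a local stand-in mirroring the aggregate fields used here.
// It is not the library's type; the int field types are an assumption.
type Usage struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
}

// Pricing holds hypothetical prices in USD per one million tokens.
type Pricing struct {
	PromptPerMillion     float64
	CompletionPerMillion float64
}

// EstimateCostUSD returns a rough cost for one run's aggregate usage.
// A nil usage (nothing reported) yields 0.
func EstimateCostUSD(u *Usage, p Pricing) float64 {
	if u == nil {
		return 0
	}
	return float64(u.PromptTokens)/1e6*p.PromptPerMillion +
		float64(u.CompletionTokens)/1e6*p.CompletionPerMillion
}

func main() {
	// Made-up numbers, chosen only to keep the arithmetic easy to follow.
	u := &Usage{PromptTokens: 1_000_000, CompletionTokens: 500_000, TotalTokens: 1_500_000}
	p := Pricing{PromptPerMillion: 2.0, CompletionPerMillion: 8.0}
	fmt.Printf("estimated cost: $%.2f\n", EstimateCostUSD(u, p))
}
```

Pricing prompt and completion tokens separately reflects that providers commonly charge different rates for input and output; cached or reasoning tokens may be billed differently again and are left out of this sketch.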

Stream

The same aggregate arrives with the AgentEventTypeRunFinished event. The event's Result is an *AgentRunResult, so the totals are in Result.LLMUsage:
for ev := range eventCh {
    if ev.Type() != agent.AgentEventTypeRunFinished {
        continue
    }
    finished, ok := ev.(*agent.AgentRunFinishedEvent)
    if !ok || finished.Result == nil || finished.Result.LLMUsage == nil {
        continue
    }
    u := finished.Result.LLMUsage
    fmt.Printf("total tokens: %d\n", u.TotalTokens)
}
OpenAI streaming with include_usage surfaces totals on RUN_FINISHED. Example: Stream.

RunAsync

Token usage is available on the final AgentRunAsyncResult once the run completes. It is the same aggregate that Run returns.

Per-request usage

For per-LLM-call breakdowns, use AfterLLMHook and inspect in.Response.Usage on each iteration.
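The hook's exact signature is not shown here, so the sketch below covers only the bookkeeping a hook body might delegate to: recording each call's usage and keeping a running total. CallUsage and UsageLog are hypothetical local types, not library types; inside a real hook, the values would come from in.Response.Usage.

```go
package main

import "fmt"

// CallUsage is a local stand-in for one LLM call's usage (not the library's type).
type CallUsage struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
}

// UsageLog keeps the per-call breakdown alongside a running total.
// It is not safe for concurrent use; guard it with a mutex if hooks
// can run from multiple goroutines.
type UsageLog struct {
	Calls []CallUsage
	Total CallUsage
}

// Record stores one call's usage. A nil value (the provider reported
// nothing for that call) is skipped.
func (l *UsageLog) Record(u *CallUsage) {
	if u == nil {
		return
	}
	l.Calls = append(l.Calls, *u)
	l.Total.PromptTokens += u.PromptTokens
	l.Total.CompletionTokens += u.CompletionTokens
	l.Total.TotalTokens += u.TotalTokens
}

func main() {
	usageLog := &UsageLog{}
	// Simulated iterations: a first call, a call with no usage, a tool round.
	usageLog.Record(&CallUsage{PromptTokens: 120, CompletionTokens: 30, TotalTokens: 150})
	usageLog.Record(nil)
	usageLog.Record(&CallUsage{PromptTokens: 200, CompletionTokens: 50, TotalTokens: 250})

	for i, c := range usageLog.Calls {
		fmt.Printf("call %d: prompt=%d completion=%d total=%d\n",
			i+1, c.PromptTokens, c.CompletionTokens, c.TotalTokens)
	}
	fmt.Printf("run total: %d\n", usageLog.Total.TotalTokens)
}
```

Summing the recorded calls should match the run-level aggregate, which makes a log like this a convenient cross-check as well as a breakdown.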

Example

Simple Agent: SHOW_LLM_USAGE footer
Reasoning: ReasoningTokens from extended thinking
Streaming: Usage on RUN_FINISHED events
LLM Providers: Provider-specific usage reporting
Observability: Export usage as metrics