Usage Accounting
What we count per scope, how those counts are exposed, and how to read the request history for detail.
Two views of usage
You can query your own usage in two complementary ways:
- Live totals: aggregated counters per period, optimised for quick dashboards and budget checks. Exposed via /v1/account/usage.
- Request history: one entry per request, with full detail of what was sent, how long it took, and what was counted. Exposed via /v1/account/history.
Metrics per scope
Each scope counts what matters for that service — token counts for completions, characters for text-to-speech, and so on.
completions (scope): tokens, prompt_tokens, completion_tokens
language (scope): requests, corrections
pii (scope): requests, entities_found, entities_redacted, replacements_made
prompt_shield (scope): requests, detections_count
stt (scope): requests, audio_duration_seconds
tts (scope): requests, characters_synthesised, output_audio_seconds
Live totals: /v1/account/usage
/v1/account/usage
Aggregated counters for the authenticated user, grouped per scope and per model.
Request
period (string): minute, day (default), week or month. Defines the time window.
scope (string): Optional filter, e.g. completions.
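As a minimal sketch of how these two parameters combine into a query, the helper below validates period against the documented values and leaves scope out when no filter is wanted. The base URL is a placeholder, and authentication is omitted because this page does not describe it.

```python
from urllib.parse import urlencode

# Period values documented for /v1/account/usage.
VALID_PERIODS = ("minute", "day", "week", "month")


def build_usage_url(base_url, period="day", scope=None):
    """Return the URL for a live-totals query.

    base_url is a placeholder for your deployment's API origin (an assumption).
    """
    if period not in VALID_PERIODS:
        raise ValueError(f"period must be one of {VALID_PERIODS}, got {period!r}")
    params = {"period": period}
    if scope is not None:
        params["scope"] = scope  # optional filter, e.g. "completions"
    return f"{base_url.rstrip('/')}/v1/account/usage?{urlencode(params)}"


print(build_usage_url("https://api.example.com", period="week", scope="completions"))
# https://api.example.com/v1/account/usage?period=week&scope=completions
```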
Response · 200
{
  "period": "day",
  "scopes": {
    "completions": {
      "tokens": 5200,
      "prompt_tokens": 3100,
      "completion_tokens": 2100
    }
  },
  "models": {
    "qwen3.5-35b": {
      "tokens": 5200,
      "prompt_tokens": 3100,
      "completion_tokens": 2100
    }
  }
}
Reading the response
- scopes is keyed by scope name. Each value is a metric-count map matching the metrics listed under Metrics per scope.
- models is keyed by model ID. Only completions traffic contributes, since only the completions scope tracks per-model counts.
- Empty {} objects mean no usage was recorded in that period. Older periods drop off after a while; for hard audit records, query the request history instead.
- Note: for the completions scope, a requests count is not part of these totals; use the request history for that. For language, pii, stt and tts it is counted here.
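The reading rules above can be sketched as a small budget check. This is an illustration only: the helper names and the budget value are hypothetical, and the sample payload is copied from the response documented for this endpoint. Missing scopes or empty {} objects are treated as zero usage.

```python
def scope_total(usage_response, scope, metric):
    """Return one counter from a /v1/account/usage response, or 0 if absent."""
    return usage_response.get("scopes", {}).get(scope, {}).get(metric, 0)


def over_budget(usage_response, scope, metric, budget):
    """True when the counter for the period has reached a caller-chosen budget."""
    return scope_total(usage_response, scope, metric) >= budget


# Sample payload copied from the documented response.
sample = {
    "period": "day",
    "scopes": {
        "completions": {
            "tokens": 5200,
            "prompt_tokens": 3100,
            "completion_tokens": 2100,
        }
    },
    "models": {
        "qwen3.5-35b": {
            "tokens": 5200,
            "prompt_tokens": 3100,
            "completion_tokens": 2100,
        }
    },
}

print(scope_total(sample, "completions", "tokens"))           # 5200
print(scope_total(sample, "tts", "characters_synthesised"))   # 0 (no usage recorded)
print(over_budget(sample, "completions", "tokens", budget=5000))  # True
```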
Request history: /v1/account/history
/v1/account/history
Paginated per-request audit log for the authenticated user.
Request
page (number): Page number (1-based). Default 1.
per_page (number): Page size. Default 50, max 100.
scope (string): Filter by scope.
model (string): Filter by model ID.
status (string): success or error.
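A minimal sketch of walking the paginated log, assuming the response carries data and meta.last_page as documented for this endpoint. The fetch_page callable is hypothetical: it stands in for whatever HTTP client and authentication you use, and here it is replaced by an in-memory fake so the loop can run on its own.

```python
def iter_history(fetch_page, **filters):
    """Yield every history entry, following meta.last_page.

    fetch_page(page, **filters) is a caller-supplied function (hypothetical)
    that performs the request and returns the decoded JSON body.
    """
    page = 1  # pages are 1-based
    while True:
        body = fetch_page(page, **filters)
        yield from body["data"]
        if page >= body["meta"]["last_page"]:
            break
        page += 1


def fake_fetch(page, per_page=2, **_filters):
    """In-memory stand-in for the endpoint, serving five made-up entries."""
    entries = [{"id": f"entry-{i}"} for i in range(5)]
    start = (page - 1) * per_page
    last_page = -(-len(entries) // per_page)  # ceiling division
    return {
        "data": entries[start:start + per_page],
        "meta": {
            "total": len(entries),
            "per_page": per_page,
            "current_page": page,
            "last_page": last_page,
        },
    }


ids = [entry["id"] for entry in iter_history(fake_fetch, per_page=2)]
print(ids)  # ['entry-0', 'entry-1', 'entry-2', 'entry-3', 'entry-4']
```

Passing the fetcher in keeps the paging logic independent of any particular HTTP library.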
Response · 200
{
  "data": [
    {
      "id": "uuid",
      "user_id": "uuid",
      "scope": "completions",
      "model_id": "qwen3.5-35b",
      "endpoint": "/v1/chat/completions",
      "usage": {
        "prompt_tokens": 200,
        "completion_tokens": 150,
        "processors": {
          "anonymise": { "entities_redacted": 2 }
        }
      },
      "latency_ms": 870,
      "status": "success",
      "stream": false,
      "timestamp": "2026-04-15T14:30:00.000Z"
    }
  ],
  "meta": {
    "total": 200,
    "per_page": 50,
    "current_page": 1,
    "last_page": 4
  }
}
Shape of the usage object per scope
The usage field on a request history entry reflects what was counted for that specific call. The shape varies by scope:
completions
{
  "prompt_tokens": 200,
  "completion_tokens": 150,
  "processors": {
    "anonymise": { "entities_redacted": 2 }
  }
}
processors is only present when at least one processor ran. Each processor adds its own keys under that object; see the Processors guide for the per-processor shape.
Some processors also persist forensic fields at the top level of usage (outside processors) on rejected requests — for example prompt_shield_detections. These are server-side only and never echoed to the client.
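A minimal sketch of reading a completions usage object from a history entry. It assumes, as in the example above, that each processor's value is a flat map of counters; the Processors guide remains the reference for the real per-processor shape. The helper names are hypothetical.

```python
def completion_tokens_total(usage):
    """Prompt plus completion tokens for one completions history entry."""
    return usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)


def processor_counts(usage):
    """Flatten the optional processors object into {"processor.metric": value}."""
    flat = {}
    # processors is absent when no processor ran, so default to an empty map.
    for name, metrics in usage.get("processors", {}).items():
        for metric, value in metrics.items():
            flat[f"{name}.{metric}"] = value
    return flat


# Usage object copied from the completions example.
entry_usage = {
    "prompt_tokens": 200,
    "completion_tokens": 150,
    "processors": {"anonymise": {"entities_redacted": 2}},
}

print(completion_tokens_total(entry_usage))  # 350
print(processor_counts(entry_usage))         # {'anonymise.entities_redacted': 2}
print(processor_counts({"prompt_tokens": 10, "completion_tokens": 5}))  # {}
```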
language
{ "requests": 1, "corrections": 2 }
pii (analyse)
{ "requests": 1, "entities_found": 3 }
pii (redact / restore)
{ "requests": 1, "entities_redacted": 2, "replacements_made": 2 }
stt
{ "requests": 1, "audio_duration_seconds": 4.2 }
tts
{ "requests": 1, "characters_synthesised": 45, "output_audio_seconds": 3.2 }
See also
- Rate Limits — how these counts are used to enforce limits.
- Processors — each processor adds to its own scope's usage.