Billing

Per-token pricing in EUR, monthly allowances, and spending caps you control.

What gets billed

Only requests through Servada-managed upstreams are billed. If your org brings its own provider keys, you pay the provider directly and those requests do not appear on your Servada invoice.

Billing is in EUR. Only successful requests contribute to cost — errored and aborted requests are never charged.

Everything below is visible and controllable from the Billing page in the portal: current-period spend, invoices, allowances, and your organisation's overage cap. No Servada support ticket is required for day-to-day changes.

Pricing

Each model has three prices, quoted in EUR per million tokens:

input (EUR / Mtok)
Regular prompt tokens.
cached_input (EUR / Mtok)
Prompt tokens served from the provider cache. Typically much cheaper.
output (EUR / Mtok)
Completion tokens (including reasoning / thinking tokens).

The price of a request is frozen at the time of the call, so later price changes don't alter your historical invoices.
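As a rough illustration of how the three per-million-token prices combine into the cost of one request, here is a minimal sketch. The class and function names are hypothetical, the prices are made-up numbers rather than real catalogue values, and it assumes the regular input count excludes cached tokens.

```python
from dataclasses import dataclass
from decimal import Decimal

MTOK = Decimal(1_000_000)  # prices are quoted per million tokens


@dataclass(frozen=True)
class PriceSnapshot:
    """Hypothetical per-model prices in EUR / Mtok, captured at call time."""
    input: Decimal
    cached_input: Decimal
    output: Decimal


def request_cost_eur(prices: PriceSnapshot, input_tokens: int,
                     cached_input_tokens: int, output_tokens: int) -> Decimal:
    # Assumption: input_tokens counts only non-cached prompt tokens,
    # so cached tokens are not charged twice.
    total = (input_tokens * prices.input
             + cached_input_tokens * prices.cached_input
             + output_tokens * prices.output)
    return total / MTOK


# Illustrative prices only.
prices = PriceSnapshot(Decimal("2.50"), Decimal("1.25"), Decimal("10.00"))
cost = request_cost_eur(prices, input_tokens=12_000,
                        cached_input_tokens=8_000, output_tokens=1_500)
print(cost)
```

Because the snapshot is immutable and stored with the request, recomputing the cost later gives the same figure even if the catalogue price has changed since.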

How pricing inherits

Prices live on a single base row per upstream model. Profiles attached to a base (think gpt-4o:chat vs gpt-4o:code) share the base's prices — there is one canonical “what does GPT-4o cost on OpenAI” record, not one per profile.

The resolver walks three layers in order:

service_model override (optional)
Per-org price override on the service model row. Used when a tenant has negotiated bespoke pricing.
base (usual)
The base template that owns identity + pricing for the upstream model. Inherited automatically by every profile attached to it.
zero (fallback)
A base with all-NULL prices is a valid free-tier configuration. Such requests resolve to source=zero and are recorded as a deliberate zero-cost call (see request_logs.price_snapshot).
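The three-layer walk can be sketched roughly as follows. This is not the actual resolver: the type and function names are hypothetical, the source labels "service_model" and "base" are assumed (only source=zero is documented above), and the sketch treats a price row as a whole rather than falling back field by field.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Tuple


@dataclass(frozen=True)
class Prices:
    """Hypothetical price row; None stands for a NULL column."""
    input: Optional[Decimal] = None
    cached_input: Optional[Decimal] = None
    output: Optional[Decimal] = None

    def is_set(self) -> bool:
        # A row counts as priced if at least one column is non-NULL.
        return any(p is not None
                   for p in (self.input, self.cached_input, self.output))


ZERO = Prices(Decimal(0), Decimal(0), Decimal(0))


def resolve_prices(override: Optional[Prices], base: Prices) -> Tuple[Prices, str]:
    """Return (prices, source), checking override, then base, then zero."""
    if override is not None and override.is_set():
        return override, "service_model"  # assumed label
    if base.is_set():
        return base, "base"  # assumed label
    return ZERO, "zero"  # all-NULL base: deliberate free-tier call


base = Prices(Decimal("2.50"), Decimal("1.25"), Decimal("10.00"))
negotiated = Prices(Decimal("2.00"), Decimal("1.00"), Decimal("8.00"))

print(resolve_prices(None, base)[1])
print(resolve_prices(negotiated, base)[1])
print(resolve_prices(None, Prices())[1])
```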

Pricing is super-admin-only on Inferada. Org admins see the resolved prices on their invoices and usage views but cannot edit them — they are managed centrally so the catalogue stays consistent across tenants.

Your monthly plan

A plan is a flat monthly fee plus an allowance. Usage within the allowance is covered by the flat fee; anything beyond it is “overage” and billed on top.

flat fee (EUR / month)
Base monthly charge.
allowance (tokens or EUR)
How much usage the flat fee includes.
```text
total_eur = flat_fee + overage_cost_eur
```
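A small worked example of that formula, assuming an allowance expressed in EUR and overage defined as the usage beyond it. The function name and all amounts are illustrative, not taken from a real plan.

```python
from decimal import Decimal


def monthly_total_eur(flat_fee: Decimal, allowance_eur: Decimal,
                      usage_eur: Decimal) -> Decimal:
    # Usage inside the allowance is covered by the flat fee;
    # only the part above the allowance is billed as overage.
    overage_cost_eur = max(Decimal(0), usage_eur - allowance_eur)
    return flat_fee + overage_cost_eur


flat_fee = Decimal("20.00")
allowance = Decimal("10.00")

within = monthly_total_eur(flat_fee, allowance, Decimal("7.00"))
beyond = monthly_total_eur(flat_fee, allowance, Decimal("12.50"))
print(within, beyond)
```

In the first month the invoice total equals the flat fee; in the second, 2.50 EUR of overage is added on top.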

Spending caps

You choose what happens when the allowance runs out:

Stop at allowance (mode)
Requests are rejected with HTTP 402 once the allowance is used.
Allow overage (mode)
Requests continue up to the overage cap you set. Billed separately.

Organisation administrators can switch mode and set the overage cap from Billing in the portal. The cap can be raised at any time; requests resume immediately.

One request may briefly cross the cap

The cap is checked before each request based on total spend so far. A single in-flight request can push you slightly over — we don't pre-reject based on guesses about its output size. The next request is then rejected.
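The behaviour described above can be modelled with a short simulation. This is a sketch under stated assumptions, not the gateway's code: the function names are hypothetical, the effective cap is assumed to be the allowance plus the overage cap (zero in stop-at-allowance mode), and a request is assumed to be admitted only while spend so far is strictly below the cap.

```python
from decimal import Decimal
from typing import List, Tuple


def effective_cap_eur(allowance_eur: Decimal, overage_cap_eur: Decimal,
                      allow_overage: bool) -> Decimal:
    # Assumption: in stop-at-allowance mode the cap equals the allowance.
    return allowance_eur + (overage_cap_eur if allow_overage else Decimal(0))


def admit(current_eur: Decimal, cap: Decimal) -> bool:
    # Checked before the request runs, using spend so far only;
    # the cost of the request itself is not estimated.
    return current_eur < cap


def run_requests(start_spend: Decimal, cap: Decimal,
                 costs: List[Decimal]) -> Tuple[List[int], Decimal]:
    spend, statuses = start_spend, []
    for cost in costs:
        if admit(spend, cap):
            spend += cost  # the actual cost is known only after completion
            statuses.append(200)
        else:
            statuses.append(402)
    return statuses, spend


cap = effective_cap_eur(Decimal("10.00"), Decimal("0"), allow_overage=False)
statuses, final_spend = run_requests(
    Decimal("9.99"), cap, [Decimal("0.0225"), Decimal("0.01")])
print(statuses, final_spend)
```

The first request is admitted at 9.99 EUR and finishes at 10.0125 EUR, slightly over the 10.00 EUR cap; the second is rejected.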

HTTP 402 — cap reached

When the cap is reached, billable requests return:

```json
{
  "type": "billing_cap_exceeded",
  "code": 402,
  "error": "Monthly spending cap reached.",
  "request_id": "uuid",
  "current_eur": 10.0125,
  "cap_eur": 10.00,
  "allowance_eur": 10.00,
  "overage_cap_eur": 0
}
```

Raise the overage cap from the portal (or contact your account manager if the ceiling is too low) and the next request goes through.
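On the client side, one way to handle this response is to parse the body and surface the numbers to whoever manages the cap. The sketch below uses only the Python standard library and the field names from the sample body above; the helper name and the shape of its return value are hypothetical.

```python
import json
from decimal import Decimal

SAMPLE_BODY = """
{
  "type": "billing_cap_exceeded",
  "code": 402,
  "error": "Monthly spending cap reached.",
  "request_id": "uuid",
  "current_eur": 10.0125,
  "cap_eur": 10.00,
  "allowance_eur": 10.00,
  "overage_cap_eur": 0
}
"""


def summarise_cap_error(body: str) -> dict:
    # parse_float=Decimal keeps EUR amounts exact instead of binary floats.
    err = json.loads(body, parse_float=Decimal)
    if err.get("type") != "billing_cap_exceeded":
        raise ValueError("not a billing cap error")
    return {
        "request_id": err["request_id"],
        "overshoot_eur": err["current_eur"] - err["cap_eur"],
        "overage_enabled": err["overage_cap_eur"] > 0,
    }


summary = summarise_cap_error(SAMPLE_BODY)
print(summary)
```

Retrying without changing the cap will keep returning 402, so a client would typically stop sending billable requests and alert an organisation administrator instead.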

Invoices

An invoice is produced for each calendar month. It lists one row per model with token counts and cost, split into within-allowance and overage portions. You can view and download invoices from Billing in the portal.

The billing dashboard also shows live current-month spend, a progress bar against your cap, and alerts as you approach it. The daily-spend chart stacks within-allowance spend on the bottom and overage on top — the overage segment is rendered in the destructive accent and with a hatched fill so it's readable in any theme and for colour-blind viewers.

See also

  • Usage Accounting — how tokens are counted per request.
  • Errors — every HTTP status code the gateway can return.