API Reference

Embeddings

Turn text into vector embeddings for search, clustering, and retrieval. Request and response shapes are OpenAI-compatible: point an existing OpenAI embeddings client at the gateway and it works.

Overview

The gateway proxies embedding requests to whichever upstream backs your organisation’s embeddings service: OpenAI, any OpenAI-compatible provider, Google, or Mistral. Model routing works exactly like chat completions: send the gateway model id (or an alias) in the model field, and the gateway resolves it to the upstream model. Use GET /v1/models?scope=embeddings to discover which embedding models you can call.
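A minimal Python sketch of the discovery call, using only the standard library. The host is taken from the curl example on this page. The response shape of /v1/models is not documented here, so model_ids assumes an OpenAI-style listing ({"data": [{"id": ...}]}); both function names are illustrative, not part of the gateway.

```python
import urllib.request

BASE_URL = "https://api.inferada.com"  # host as used in the curl example on this page


def models_request(token: str) -> urllib.request.Request:
    """Build (but do not send) the request that lists embedding models."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/models?scope=embeddings",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def model_ids(payload: dict) -> list:
    """Extract model ids from a listing.

    ASSUMPTION: the listing follows the OpenAI shape {"data": [{"id": ...}, ...]}.
    """
    return [entry["id"] for entry in payload.get("data", [])]
```

Pass the request to urllib.request.urlopen and parse the body with json.load before handing it to model_ids.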

Embeddings require a token with the embeddings scope and are billed on input tokens (there is no output-token cost).

Create embeddings

POST /v1/embeddings
Scope: embeddings

Generate one embedding vector per input. Pass a single string or an array of strings.

Request

model · required · string
Embedding model id or alias. Discover options with GET /v1/models?scope=embeddings.
input · required · string | string[]
Text to embed. A single string returns one vector; an array returns one vector per element, in order.
encoding_format · string
"float" (default) returns each vector as a JSON array of numbers; "base64" returns it as a base64-encoded little-endian float32 string.
dimensions · integer
Truncate vectors to this many dimensions (supported by OpenAI-compatible upstreams; ignored by providers that don’t support it).
user · string
Optional end-user identifier forwarded to OpenAI-compatible upstreams.
service_id · string
Optional. Route to a specific embeddings service (see GET /v1/services). Falls back to the org’s default embeddings service when omitted.
```bash
curl https://api.inferada.com/v1/embeddings \
  -H "Authorization: Bearer inf_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox"
  }'
```
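
The same request can be sketched in Python with the standard library. The helper name embeddings_request is hypothetical; the URL, headers, and body fields mirror the curl call and the parameter list above. Optional fields such as dimensions or encoding_format are passed through as keyword arguments.

```python
import json
import urllib.request


def embeddings_request(token, model, text, **options):
    """Build (but do not send) a POST /v1/embeddings request.

    `text` may be a single string or a list of strings.
    `options` carries optional fields, e.g. dimensions=256.
    """
    body = {"model": model, "input": text, **options}
    return urllib.request.Request(
        "https://api.inferada.com/v1/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect and log before any network call is made.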

Batch multiple strings in one call by passing an array — the response keeps input order via the index field:

```json
{
  "model": "text-embedding-3-small",
  "input": ["first document", "second document"]
}
```
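
One way to line batched results back up with their inputs is to sort by index rather than rely on array position. This is a sketch with hypothetical helper names; it uses only the data[].index and data[].embedding fields of the response.

```python
def vectors_in_input_order(response: dict) -> list:
    """Return the embeddings sorted by each item's index field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def pair_inputs_with_vectors(inputs: list, response: dict) -> list:
    """Zip each input string with its vector, checking the counts match."""
    vectors = vectors_in_input_order(response)
    if len(vectors) != len(inputs):
        raise ValueError("expected one vector per input")
    return list(zip(inputs, vectors))
```

Sorting explicitly costs little and guards against code that later filters or reorders the data array.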

Response · 200

object · string
Always "list".
data[].embedding · number[] | string
The vector: a number array, or a base64 string when encoding_format is "base64".
data[].index · number
Position of this vector’s input in the request.
usage.prompt_tokens · number
Input tokens consumed (equals total_tokens for embeddings).
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0145, "..."]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": { "prompt_tokens": 8, "total_tokens": 8 }
}
```
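
When encoding_format is "base64", data[].embedding arrives as a string instead of a number array. A minimal decoding sketch, assuming the payload is exactly what the request parameter describes (little-endian float32 values, base64-encoded); decode_embedding is an illustrative name.

```python
import base64
import struct


def decode_embedding(b64: str) -> list:
    """Decode a base64 string of little-endian float32 values into floats."""
    raw = base64.b64decode(b64)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    # "<" selects little-endian byte order, "f" is a 4-byte float32.
    return list(struct.unpack(f"<{count}f", raw))
```

The decoded values are Python floats widened from float32, so they can differ in the last digits from the same vector requested with "float".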

Errors

A 400 is returned when input is empty or is not a string or an array of strings, or when the resolved upstream has no embeddings API (for example, an Anthropic upstream, since Anthropic has no first-party embeddings endpoint). A 404 means the model is unknown or not accessible to your token.
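
The input rules above can be checked client-side before a request is sent. This sketch rests on an assumption: "empty" is read as covering both an empty string and an empty array, which this page does not spell out. The function names and message strings are illustrative; the error body returned by the gateway is not documented here.

```python
def check_input(value) -> None:
    """Raise ValueError for inputs the gateway is documented to reject with a 400.

    ASSUMPTION: "empty" means an empty string or an empty list.
    """
    if isinstance(value, str):
        if not value:
            raise ValueError("input must not be an empty string")
        return
    if isinstance(value, list) and value and all(isinstance(v, str) for v in value):
        return
    raise ValueError("input must be a string or a non-empty list of strings")


def explain_status(status: int) -> str:
    """Map the documented error statuses to a short description."""
    if status == 400:
        return "input was rejected, or the resolved upstream has no embeddings API"
    if status == 404:
        return "model is unknown or not accessible to this token"
    return f"unexpected status {status}"
```

With urllib, the status is available as the code attribute of the urllib.error.HTTPError raised by urlopen.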