API Reference

Embeddings

Turn text into vector embeddings for search, clustering, and retrieval. The request and response shapes are OpenAI-compatible, so an existing OpenAI embeddings client pointed at the gateway works unchanged.

Overview

The gateway proxies embedding requests to whichever upstream backs your organisation’s embeddings service: OpenAI, Google, Mistral, or any OpenAI-compatible provider. Model routing works exactly like chat completions: send the gateway model id (or an alias) in the model field, and the gateway resolves it to the upstream model. Use GET /v1/models?scope=embeddings to discover which embedding models you can call.

Embeddings require a token with the embeddings scope and are billed on input tokens (there is no output-token cost).

Create embeddings

POST /v1/embeddings

Scope: embeddings

Generate one embedding vector per input. Pass a single string or an array of strings.

Request

model (string, required): Embedding model id or alias. Discover options with GET /v1/models?scope=embeddings.
input (string | string[], required): Text to embed. A single string returns one vector; an array returns one vector per element, in order.
encoding_format (string, optional): "float" (default) returns each vector as a JSON array of numbers; "base64" returns it as a base64-encoded little-endian float32 string.
dimensions (integer, optional): Truncate vectors to this many dimensions. Supported by OpenAI-compatible upstreams; ignored by providers that don’t support it.
user (string, optional): End-user identifier forwarded to OpenAI-compatible upstreams.
service_id (string, optional): Route to a specific embeddings service (see GET /v1/services). Falls back to the org’s default embeddings service when omitted.

bash

curl https://api.inferada.com/v1/embeddings \
  -H "Authorization: Bearer inf_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox"
  }'
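For clients that do not shell out to curl, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names build_embeddings_request and create_embeddings are hypothetical, and the URL, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

# Endpoint taken from the curl example; everything else here is illustrative.
GATEWAY_URL = "https://api.inferada.com/v1/embeddings"


def build_embeddings_request(token, model, text_input):
    """Build (but do not send) a POST request for /v1/embeddings.

    text_input may be a single string or a list of strings.
    """
    body = json.dumps({"model": model, "input": text_input}).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def create_embeddings(token, model, text_input):
    """Send the request and return the parsed JSON response body."""
    request = build_embeddings_request(token, model, text_input)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the payload easy to inspect or log before any network call is made.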

Batch multiple strings in one call by passing an array — the response keeps input order via the index field:

json

{
  "model": "text-embedding-3-small",
  "input": ["first document", "second document"]
}
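A batch response can be matched back to its inputs through the index field. The sketch below assumes the response has already been parsed into a dict with the documented data[].index and data[].embedding fields; the helper name vectors_in_input_order is hypothetical.

```python
def vectors_in_input_order(inputs, response):
    """Pair each input string with its vector, using data[].index.

    Raises ValueError if the response does not contain exactly one
    vector for every input position.
    """
    by_index = {item["index"]: item["embedding"] for item in response["data"]}
    if sorted(by_index) != list(range(len(inputs))):
        raise ValueError("response indices do not match the inputs")
    return [(text, by_index[position]) for position, text in enumerate(inputs)]
```

Looking vectors up by index rather than by list position means the pairing stays correct even if the data array is ever iterated in a different order.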

Response · 200

object (string): Always "list".
data[].embedding (number[] | string): The vector, as a number array, or as a base64 string when encoding_format is "base64".
data[].index (number): Position of this vector’s input in the request.
usage.prompt_tokens (number): Input tokens consumed (equals total_tokens for embeddings).

json

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0145, "..."]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": { "prompt_tokens": 8, "total_tokens": 8 }
}
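When encoding_format is "base64", data[].embedding arrives as a string rather than a number array. A minimal sketch of decoding it, assuming the documented layout of base64-encoded little-endian float32 values (the helper name decode_embedding is hypothetical):

```python
import base64
import struct


def decode_embedding(value):
    """Return the embedding as a list of floats for either encoding_format."""
    if isinstance(value, list):
        # "float" format: already a JSON array of numbers.
        return value
    raw = base64.b64decode(value)
    if len(raw) % 4 != 0:
        raise ValueError("payload is not a whole number of float32 values")
    count = len(raw) // 4
    # "<" selects little-endian byte order; "f" is a 4-byte float32.
    return list(struct.unpack(f"<{count}f", raw))
```

Accepting both shapes lets calling code stay the same whichever encoding_format the request used.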

Errors

A 400 is returned when input is empty or not a string/array of strings, or when the resolved upstream has no embeddings API (for example an Anthropic upstream, since Anthropic has no first-party embeddings endpoint). A 404 means the model is unknown or not accessible to your token.
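A client might translate these status codes into readable messages. The sketch below relies only on the HTTP status, because the shape of the error body is not documented here; the names describe_embeddings_error and explain_failure are hypothetical, and send stands for any callable that performs the request with urllib.

```python
import urllib.error


def describe_embeddings_error(status):
    """Map an HTTP status from POST /v1/embeddings to a short explanation."""
    if status == 400:
        return ("bad request: input is empty or not a string/array of strings, "
                "or the resolved upstream has no embeddings API")
    if status == 404:
        return "not found: the model is unknown or not accessible to this token"
    return f"unexpected status {status}"


def explain_failure(send):
    """Call send(); on an HTTP error, re-raise with the explanation attached."""
    try:
        return send()
    except urllib.error.HTTPError as err:
        raise RuntimeError(describe_embeddings_error(err.code)) from err
```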