Embeddings
Turn text into vector embeddings for search, clustering, and retrieval. OpenAI-compatible request and response shapes — point an existing OpenAI embeddings client at the gateway and it works.
Overview
The gateway proxies embedding requests to whichever upstream backs your organisation’s embeddings service: OpenAI itself, any OpenAI-compatible provider, Google, or Mistral. Model routing works exactly like chat completions: send the gateway model id (or an alias) in model, and the gateway resolves it to the upstream model. Use /v1/models?scope=embeddings to discover which embedding models you can call.
Embeddings require a token with the embeddings scope and are billed on input tokens (there is no output-token cost).
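As a rough illustration of the discovery step, the sketch below builds the GET /v1/models?scope=embeddings request and pulls model ids out of the reply. The host comes from the curl example on this page. The response shape (an OpenAI-style object with a data array of entries carrying an id) is an assumption, since this page does not document it, and the function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.inferada.com"  # host taken from the curl example on this page


def build_models_request(token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) GET /v1/models?scope=embeddings."""
    return urllib.request.Request(
        f"{base_url}/v1/models?scope=embeddings",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def model_ids(models_response: dict) -> list[str]:
    """Extract ids, ASSUMING an OpenAI-style {"data": [{"id": ...}]} body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def list_embedding_models(token: str) -> list[str]:
    """Send the request and return the model ids visible to this token."""
    with urllib.request.urlopen(build_models_request(token)) as resp:
        return model_ids(json.load(resp))
```

Any id returned this way should be usable as the model field of an embeddings request, provided the token carries the embeddings scope.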
Create embeddings
POST /v1/embeddings
Generate one embedding vector per input. Pass a single string or an array of strings.
Request
model (required, string) - Embedding model id or alias. Discover options with GET /v1/models?scope=embeddings.
input (required, string | string[]) - Text to embed. A single string returns one vector; an array returns one vector per element, in order.
encoding_format (string) - "float" (default) returns each vector as a JSON array of numbers; "base64" returns it as a base64-encoded little-endian float32 string.
dimensions (integer) - Truncate vectors to this many dimensions (supported by OpenAI-compatible upstreams; ignored by providers that don’t support it).
user (string) - Optional end-user identifier forwarded to OpenAI-compatible upstreams.
service_id (string) - Optional. Route to a specific embeddings service (see GET /v1/services). Falls back to the org’s default embeddings service when omitted.
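The request fields listed here could be assembled in Python roughly as follows. This is a minimal sketch using only the standard library. The helper names are hypothetical, and the client-side checks are one interpretation of the input rules on this page, not the gateway's own validation logic.

```python
import json
import urllib.request
from typing import Optional, Sequence, Union

EMBEDDINGS_URL = "https://api.inferada.com/v1/embeddings"


def build_embeddings_payload(
    model: str,
    input: Union[str, Sequence[str]],
    encoding_format: Optional[str] = None,
    dimensions: Optional[int] = None,
    user: Optional[str] = None,
    service_id: Optional[str] = None,
) -> dict:
    """Assemble the JSON body, leaving out optional fields that were not set."""
    if isinstance(input, str):
        if not input:
            raise ValueError("input must not be empty")
    else:
        input = list(input)
        if not input or not all(isinstance(item, str) for item in input):
            raise ValueError("input must be a non-empty list of strings")
    if encoding_format not in (None, "float", "base64"):
        raise ValueError('encoding_format must be "float" or "base64"')
    payload = {"model": model, "input": input}
    optional = {
        "encoding_format": encoding_format,
        "dimensions": dimensions,
        "user": user,
        "service_id": service_id,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def create_embeddings(token: str, **fields) -> dict:
    """POST the payload to the gateway and return the decoded JSON response."""
    request = urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(build_embeddings_payload(**fields)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Omitting unset optional fields keeps the body identical to what an existing OpenAI embeddings client would send, so the gateway's defaults (float encoding, the org's default service) apply.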
curl https://api.inferada.com/v1/embeddings \
  -H "Authorization: Bearer inf_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox"
  }'
Batch multiple strings in one call by passing an array. The response keeps input order via the index field:
{
  "model": "text-embedding-3-small",
  "input": ["first document", "second document"]
}
Response · 200
object (string) - Always "list".
data[].embedding (number[] | string) - The vector: a number array, or a base64 string when encoding_format is "base64".
data[].index (number) - Position of this vector’s input in the request.
usage.prompt_tokens (number) - Input tokens consumed (equals total_tokens for embeddings).
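To make the two embedding encodings and the index field concrete, here is a small sketch that turns a response into plain Python lists. The function names are hypothetical. The decoding relies only on what this page states: base64 vectors are little-endian float32 values, and index gives each vector's position in the request.

```python
import base64
import struct


def decode_embedding(embedding):
    """Return a list of floats from either response encoding."""
    if isinstance(embedding, str):
        raw = base64.b64decode(embedding)
        # "<" = little-endian, "f" = 4-byte float32, one per dimension.
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))
    return list(embedding)


def vectors_in_input_order(response: dict) -> list:
    """Sort data[] by index so that vectors[i] belongs to input[i]."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [decode_embedding(item["embedding"]) for item in items]
```

Sorting by index is a defensive choice: it costs little and avoids relying on the order in which data entries happen to arrive.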
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0145, "..."]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": { "prompt_tokens": 8, "total_tokens": 8 }
}
Errors
A 400 is returned when input is empty or not a string/array of strings, or when the resolved upstream has no embeddings API (for example an Anthropic upstream — Anthropic has no first-party embeddings endpoint). A 404 means the model is unknown or not accessible to your token.
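A client might surface these statuses along the following lines. This is only a sketch: the helper names and message wording are hypothetical, and the format of the gateway's error body is not documented on this page, so the code looks at the status code alone.

```python
import urllib.error
import urllib.request


def describe_error(status: int) -> str:
    """Map the status codes documented above to a short explanation."""
    if status == 400:
        return "bad request: empty or invalid input, or the upstream has no embeddings API"
    if status == 404:
        return "model is unknown or not accessible to this token"
    return f"unexpected HTTP status {status}"


def send(request: urllib.request.Request) -> bytes:
    """Send a prepared request, turning HTTP errors into a readable exception."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        raise RuntimeError(describe_error(err.code)) from err
```

Neither documented status is likely to succeed on a plain retry: a 400 calls for fixing the request or choosing a different model or service, and a 404 calls for checking the model id against GET /v1/models?scope=embeddings.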