API Reference

Audio

Convert speech to text and text to speech. Request shapes are OpenAI-compatible, so existing OpenAI clients work out of the box.

Transcribe audio

POST /v1/audio/transcriptions
Scope: stt

Speech-to-text. Send the audio file as a multipart upload and get back the transcribed text.

Request

Content-Type: multipart/form-data

file (required, binary)
Audio file. Common formats such as WAV, MP3, FLAC, OGG and M4A are accepted.
language (string)
ISO language code (e.g. "en", "nl") used as a hint to improve accuracy.
response_format (string)
Output format. Defaults to JSON.
service_id (string)
Optional. Routes the request to a specific STT service (see GET /v1/services). When omitted, the organisation's default STT service is used.
bash
curl https://api.inferada.com/v1/audio/transcriptions \
  -H "Authorization: Bearer inf_YOUR_TOKEN" \
  -F file=@speech.mp3 \
  -F language=en
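Clients without a multipart helper can build the same upload by hand. The sketch below uses only the Python standard library (urllib.request, mimetypes, uuid) to assemble a multipart/form-data body from the fields above. It builds the request without sending it. The function and helper names are hypothetical and are not part of an official SDK.

```python
import mimetypes
import urllib.request
import uuid
from typing import Optional

API_BASE = "https://api.inferada.com"


def _field(boundary: str, name: str, value: str) -> bytes:
    # One plain-text form field of the multipart body.
    return (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        f"{value}\r\n"
    ).encode("utf-8")


def build_transcription_request(
    token: str,
    filename: str,
    audio: bytes,
    language: Optional[str] = None,
    service_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST /v1/audio/transcriptions request."""
    boundary = uuid.uuid4().hex
    body = b""
    # Optional fields are only sent when set; without service_id the
    # organisation's default STT service is used.
    if language is not None:
        body += _field(boundary, "language", language)
    if service_id is not None:
        body += _field(boundary, "service_id", service_id)
    # Guess the part's media type from the file extension (fallback: octet-stream).
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body += (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8") + audio + b"\r\n"
    body += f"--{boundary}--\r\n".encode("utf-8")  # closing boundary
    return urllib.request.Request(
        f"{API_BASE}/v1/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Send it with urllib.request.urlopen(req) and decode the JSON response. If you already use the third-party requests library, its files= argument produces an equivalent multipart body.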

Response · 200

text (string)
Transcribed text.
duration (number)
Audio duration in seconds, when available.
json
{
  "text": "Hello, this is a transcription.",
  "duration": 4.2
}
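The duration field is only present when available, so a client should treat it as optional. The following is a minimal parsing sketch that assumes the default JSON response_format. The Transcription type and the parse_transcription name are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcription:
    text: str
    duration: Optional[float]  # seconds; None when the service does not report it


def parse_transcription(body: bytes) -> Transcription:
    """Parse a 200 response from POST /v1/audio/transcriptions (JSON format)."""
    payload = json.loads(body)
    duration = payload.get("duration")
    return Transcription(
        text=payload["text"],
        duration=float(duration) if duration is not None else None,
    )
```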

Synthesise speech

POST /v1/audio/speech
Scope: tts

Generate speech audio from text. Returns a binary audio stream in the requested format.

Request

Content-Type: application/json

voice (required, string)
Voice ID. List available voices with GET /v1/audio/voices.
input (required, string)
Text to synthesise.
response_format (string)
Audio format: mp3 (default), opus, flac, aac, wav or pcm.
speed (number)
Playback speed multiplier. Defaults to 1.0.
model (string)
Optional model identifier; usually inferred from the voice.
json
{
  "voice": "nl_NL-rdh-medium",
  "input": "Hallo, welkom.",
  "response_format": "mp3",
  "speed": 1.0
}
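The request fields above map directly onto a JSON body. This sketch uses the standard library to build the request without sending it. It rejects response_format values outside the documented list before making a round trip. Because model is usually inferred from the voice, the sketch omits it unless a value is given. The names build_speech_request and SPEECH_FORMATS are hypothetical.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.inferada.com"
# Values documented for response_format; mp3 is the default.
SPEECH_FORMATS = {"mp3", "opus", "flac", "aac", "wav", "pcm"}


def build_speech_request(
    token: str,
    voice: str,
    text: str,
    response_format: str = "mp3",
    speed: float = 1.0,
    model: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST /v1/audio/speech request."""
    if response_format not in SPEECH_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    body = {
        "voice": voice,
        "input": text,
        "response_format": response_format,
        "speed": speed,
    }
    if model is not None:  # usually inferred from the voice, so only sent when given
        body["model"] = model
    return urllib.request.Request(
        f"{API_BASE}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```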

Response · 200

Binary audio stream. The Content-Type header matches the requested format: audio/mpeg (mp3), audio/opus (opus), audio/flac (flac), audio/aac (aac), audio/wav (wav), or application/octet-stream (pcm).

bash
curl https://api.inferada.com/v1/audio/speech \
  -H "Authorization: Bearer inf_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "voice": "nl_NL-rdh-medium", "input": "Hallo." }' \
  --output out.mp3
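When saving the stream, it helps to check that the Content-Type matches the response_format you requested, so that an unexpected body is never written to out.mp3. The sketch below assumes the server sends exactly the media types listed above. It accepts any prepared urllib.request.Request for POST /v1/audio/speech and streams the body to disk in chunks. The names check_content_type and download_speech are hypothetical helpers.

```python
import shutil
import urllib.request

# Content-Type expected for each response_format (assumed exact matches).
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "wav": "audio/wav",
    "opus": "audio/opus",
    "flac": "audio/flac",
    "aac": "audio/aac",
    "pcm": "application/octet-stream",
}


def check_content_type(response_format: str, header_value: str) -> None:
    """Raise ValueError if the Content-Type header does not match the format."""
    expected = CONTENT_TYPES[response_format]
    # Compare only the media type, ignoring any parameters after ";".
    actual = header_value.split(";", 1)[0].strip().lower()
    if actual != expected:
        raise ValueError(f"expected {expected} for {response_format}, got {actual!r}")


def download_speech(req: urllib.request.Request, response_format: str, path: str) -> None:
    """Send a speech request and stream the audio body to `path`."""
    with urllib.request.urlopen(req) as resp:
        check_content_type(response_format, resp.headers.get("Content-Type", ""))
        with open(path, "wb") as out:
            # Copy in chunks instead of reading the whole stream into memory.
            shutil.copyfileobj(resp, out)
```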

Errors

If the voice ID is unknown, the API returns 400 Bad Request and inlines the list of available voices in the body, so your client can recover:

json
{
  "error": "Voice 'xyz' not found. Available voices: [...]",
  "available_voices": [
    { "id": "nl_NL-rdh-medium", "name": "RDH", "language": "nl" }
  ],
  "request_id": "..."
}
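Recovery can use the inlined available_voices list directly. The sketch below picks the first listed voice whose language matches and retries the request once. It assumes the error body has the shape shown above. Both function names are hypothetical, as is the build callable, which you supply to create a speech request for a given voice ID. Any other 400, such as one without available_voices, is re-raised.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional


def fallback_voice(error_body: bytes, language: str) -> Optional[str]:
    """Return the first voice in available_voices matching `language`, else None."""
    try:
        payload = json.loads(error_body)
    except ValueError:  # body was not JSON
        return None
    for voice in payload.get("available_voices", []):
        if voice.get("language") == language:
            return voice["id"]
    return None


def synthesise_with_fallback(
    build: Callable[[str], urllib.request.Request], voice: str, language: str
) -> bytes:
    """Try `voice`; on a 400 that lists a matching voice, retry once with it."""
    try:
        with urllib.request.urlopen(build(voice)) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code != 400:
            raise
        replacement = fallback_voice(err.read(), language)
        if replacement is None or replacement == voice:
            raise
    # Only reached when a usable replacement was found.
    with urllib.request.urlopen(build(replacement)) as resp:
        return resp.read()
```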

Choosing a voice

Voices come from two backends with different strengths. Pick by language:

  • Piper — lightweight and fast. Best coverage for Dutch, German, French and Spanish. Voices come in low/medium/high quality tiers.
  • Kokoro — neural, more natural-sounding. Best for English (American and British), and also covers Japanese and Italian.

The backend is encoded in each voice ID (e.g. piper:nl_NL-rdh-medium, kokoro:af_bella), so there is no separate backend parameter: pass the ID exactly as returned by GET /v1/audio/voices.
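The following sketch implements the language-based choice described above. It splits a prefixed voice ID into backend and name, then orders candidate IDs so that the recommended backend comes first. The PREFERRED_BACKEND table restates the two bullets (Piper for Dutch, German, French and Spanish; Kokoro for English, Japanese and Italian). The table and the function names are assumptions, not part of the API. An ID without a prefix is returned unchanged, with no backend.

```python
from typing import Iterable, List, Optional, Tuple

# Backend to prefer per ISO language code, restating the guidance above.
# Assumption: languages not listed here have no preference.
PREFERRED_BACKEND = {
    "nl": "piper", "de": "piper", "fr": "piper", "es": "piper",
    "en": "kokoro", "ja": "kokoro", "it": "kokoro",
}


def split_voice_id(voice_id: str) -> Tuple[Optional[str], str]:
    """Split 'piper:nl_NL-rdh-medium' into ('piper', 'nl_NL-rdh-medium').

    An ID without a backend prefix is returned as (None, voice_id).
    """
    backend, sep, name = voice_id.partition(":")
    return (backend, name) if sep else (None, voice_id)


def order_by_backend(voice_ids: Iterable[str], language: str) -> List[str]:
    """Put voices from the preferred backend for `language` first."""
    ids = list(voice_ids)
    preferred = PREFERRED_BACKEND.get(language)
    if preferred is None:
        return ids
    # False sorts before True, and sorted() is stable, so each group keeps its order.
    return sorted(ids, key=lambda vid: split_voice_id(vid)[0] != preferred)
```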

List available voices

GET /v1/audio/voices
Scope: tts

List voices configured for your organisation.

Response · 200

voices[].id (string)
Unique voice identifier. Pass it as voice to POST /v1/audio/speech.
voices[].name (string)
Human-readable display name.
voices[].language (string)
ISO language code (e.g. "nl", "en").
voices[].quality (string | null)
Quality tier (e.g. low, medium, high) when available; otherwise null.
json
{
  "voices": [
    {
      "id": "nl_NL-rdh-medium",
      "name": "RDH",
      "language": "nl",
      "quality": "medium"
    }
  ]
}
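To pick a voice programmatically from this response, filter by language and prefer the best available quality tier. The sketch below assumes the low/medium/high tiers mentioned for Piper voices and exact matches on the language code. The names pick_voice and QUALITY_RANK are hypothetical.

```python
import json
from typing import Optional

# Quality tiers mentioned for Piper voices; a higher rank is better.
QUALITY_RANK = {"low": 0, "medium": 1, "high": 2}


def pick_voice(voices_body: bytes, language: str) -> Optional[str]:
    """Return the ID of the highest-quality voice for `language`, or None.

    Voices whose quality is null (or an unknown tier) rank below every known tier;
    ties keep the first voice in the listed order.
    """
    voices = json.loads(voices_body)["voices"]
    candidates = [v for v in voices if v.get("language") == language]
    if not candidates:
        return None
    best = max(candidates, key=lambda v: QUALITY_RANK.get(v.get("quality"), -1))
    return best["id"]
```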