Audio
Convert speech to text and text to speech. Request shapes are OpenAI-compatible, so existing clients work out of the box.
Transcribe audio
/v1/audio/transcriptions
Speech-to-text. Send the audio file as a multipart upload and get back the transcribed text.
Request
Content-Type: multipart/form-data
file (required, binary): Audio file. Common formats such as WAV, MP3, FLAC, OGG and M4A are accepted.
language (string): ISO language hint to improve accuracy (e.g. "en", "nl").
response_format (string): Output format. JSON by default.
service_id (string): Optional. Route to a specific STT service (see GET /v1/services). Falls back to the org’s default STT service when omitted.
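For clients that do not shell out to curl, the same multipart request can be assembled by hand. The following is a minimal Python sketch using only the standard library, not an official client. The helper names (`build_transcription_request`, `transcribe`) are hypothetical, the POST method is inferred from how curl treats `-F`, and sending the file part as `application/octet-stream` is an assumption about what the server accepts.

```python
import json
import os
import urllib.request
import uuid

API_BASE = "https://api.inferada.com"


def build_transcription_request(token, audio_bytes, filename,
                                language=None, response_format=None,
                                service_id=None):
    """Assemble (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    optional = {
        "language": language,
        "response_format": response_format,
        "service_id": service_id,
    }
    parts = []
    for name, value in optional.items():
        if value is None:
            continue  # leave the field out so the server default applies
        field = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(field.encode("utf-8"))

    # Assumption: a generic binary content type is acceptable for the file part.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + audio_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        f"{API_BASE}/v1/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(token, path, **options):
    """Upload the file at `path` and return the decoded JSON response."""
    with open(path, "rb") as audio:
        request = build_transcription_request(
            token, audio.read(), os.path.basename(path), **options
        )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `transcribe("inf_YOUR_TOKEN", "speech.mp3", language="en")` would send the same fields as the curl command. Building the request separately from sending it keeps the encoding step easy to inspect without network access.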
curl https://api.inferada.com/v1/audio/transcriptions \
-H "Authorization: Bearer inf_YOUR_TOKEN" \
-F file=@speech.mp3 \
-F language=en
Response · 200
text (string): Transcribed text.
duration (number): Audio duration in seconds (when available).
{
"text": "Hello, this is a transcription.",
"duration": 4.2
}
Synthesise speech
/v1/audio/speech
Generate speech audio from text. Returns a binary audio stream in the requested format.
Request
voice (required, string): Voice ID. List available voices with /v1/audio/voices.
input (required, string): Text to synthesise.
response_format (string): mp3 (default), opus, flac, aac, wav or pcm.
speed (number): Playback speed multiplier. Default 1.0.
model (string): Optional model identifier, usually inferred from the voice.
{
"voice": "nl_NL-rdh-medium",
"input": "Hallo, welkom.",
"response_format": "mp3",
"speed": 1.0
}
Response · 200
Binary audio stream. The Content-Type matches the requested format: audio/mpeg for mp3, audio/wav for wav, audio/opus for opus, audio/flac for flac, audio/aac for aac, and application/octet-stream for pcm.
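The request body and the format-to-Content-Type mapping above can be sketched in Python as follows. This is an illustrative sketch using only the standard library, not an official client: the helper names (`build_speech_request`, `synthesise_to_file`) are hypothetical, the POST method is inferred from how curl treats `-d`, and treating a Content-Type mismatch as an error is a design choice of this sketch rather than documented behaviour.

```python
import json
import shutil
import urllib.request

API_BASE = "https://api.inferada.com"

# Expected response Content-Type per response_format, as listed above.
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "wav": "audio/wav",
    "opus": "audio/opus",
    "flac": "audio/flac",
    "aac": "audio/aac",
    "pcm": "application/octet-stream",
}


def build_speech_request(token, voice, text, response_format="mp3", speed=1.0):
    """Assemble (but do not send) a JSON speech synthesis request."""
    if response_format not in CONTENT_TYPES:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    payload = {
        "voice": voice,
        "input": text,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{API_BASE}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def synthesise_to_file(token, voice, text, out_path, response_format="mp3"):
    """Send the request and stream the binary audio to out_path."""
    request = build_speech_request(token, voice, text, response_format)
    with urllib.request.urlopen(request) as response:
        actual = response.headers.get_content_type()
        expected = CONTENT_TYPES[response_format]
        if actual != expected:
            # Strict check chosen for this sketch; a client could also just warn.
            raise RuntimeError(f"expected {expected}, got {actual}")
        with open(out_path, "wb") as out:
            shutil.copyfileobj(response, out)
```

Streaming with `shutil.copyfileobj` avoids holding the whole audio file in memory, which matters for long inputs.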
curl https://api.inferada.com/v1/audio/speech \
-H "Authorization: Bearer inf_YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "voice": "nl_NL-rdh-medium", "input": "Hallo." }' \
--output out.mp3
Errors
If voice is unknown, the response is a 400 with the list of available voices inlined so your client can recover:
{
"error": "Voice 'xyz' not found. Available voices: [...]",
"available_voices": [
{ "id": "nl_NL-rdh-medium", "name": "RDH", "language": "nl" }
],
"request_id": "..."
}
Choosing a voice
Voices come from two backends with different strengths. Pick by language:
- Piper — lightweight and fast. Best coverage for Dutch, German, French and Spanish. Voices come in low/medium/high quality tiers.
- Kokoro — neural, more natural-sounding. Best for English (American and British), and also covers Japanese and Italian.
The backend is encoded in each voice id (e.g. piper:nl_NL-rdh-medium, kokoro:af_bella), so pass the id exactly as returned by /v1/audio/voices.
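The language-based guidance above could be automated along these lines. This is a hedged sketch, not part of the API: `pick_voice`, `backend_of` and the `PREFERRED_BACKEND` table are hypothetical names, the table simply restates the recommendations above as ISO codes, and ranking by the low/medium/high quality tier is an assumption about what a client would want. Ids without a `backend:` prefix are treated as having no known backend.

```python
# Backend recommendations from the list above, keyed by ISO language code.
PREFERRED_BACKEND = {
    "nl": "piper", "de": "piper", "fr": "piper", "es": "piper",
    "en": "kokoro", "ja": "kokoro", "it": "kokoro",
}

# Higher is better; voices with no quality tier rank lowest.
QUALITY_RANK = {"low": 0, "medium": 1, "high": 2}


def backend_of(voice_id):
    """Return the 'backend' part of a 'backend:name' id, or None if absent."""
    prefix, separator, _ = voice_id.partition(":")
    return prefix if separator else None


def pick_voice(voices, language):
    """Pick a voice id for `language` from a list of voice entries.

    Prefers the recommended backend for the language, then the highest
    quality tier. Returns None when no voice matches the language.
    """
    candidates = [v for v in voices if v.get("language") == language]
    if not candidates:
        return None
    preferred = PREFERRED_BACKEND.get(language)

    def score(voice):
        on_preferred = preferred is not None and backend_of(voice["id"]) == preferred
        return (on_preferred, QUALITY_RANK.get(voice.get("quality"), -1))

    return max(candidates, key=score)["id"]
```

The `voices` argument is expected to be the `voices` array returned by /v1/audio/voices.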
List available voices
/v1/audio/voices
List voices configured for your organisation.
Response · 200
voices[].id (string): Unique voice identifier. Use this in /v1/audio/speech.
voices[].name (string): Human-readable display name.
voices[].language (string): ISO language code (e.g. "nl", "en").
voices[].quality (string | null): Quality tier when available.
{
"voices": [
{
"id": "nl_NL-rdh-medium",
"name": "RDH",
"language": "nl",
"quality": "medium"
}
]
}
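Entries in `available_voices` in the 400 error body (see Errors) carry the same `id`, `name` and `language` fields as the voice list, so a client can recover from an unknown voice without a second lookup. The following Python sketch shows one way to do that. The function names are hypothetical, `send` stands for whatever caller-supplied function performs the /v1/audio/speech request, and it assumes that function surfaces HTTP failures as `urllib.error.HTTPError`, as `urllib.request.urlopen` does.

```python
import json
import urllib.error


def fallback_voice(error_body, language):
    """Return the first available voice id for `language`, or None.

    `error_body` is the JSON text of a 400 response from /v1/audio/speech.
    """
    available = json.loads(error_body).get("available_voices", [])
    for voice in available:
        if voice.get("language") == language:
            return voice["id"]
    return None


def synthesise_with_fallback(send, voice, text, language):
    """Call send(voice, text); on a 400, retry once with a fallback voice."""
    try:
        return send(voice, text)
    except urllib.error.HTTPError as error:
        if error.code != 400:
            raise
        alternative = fallback_voice(error.read().decode("utf-8"), language)
        if alternative is None:
            raise  # nothing suitable in the inlined list
        return send(alternative, text)
```

Retrying only once keeps the behaviour predictable: if the fallback voice is rejected as well, the second error propagates to the caller.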