Tools
Curated function-calling tools (real-time data lookups, small utilities, API wrappers) that the gateway invokes for the model mid-completion, handing the answer back within the same request.
What a tool does
A tool is a small, named capability the LLM can ask for while it's generating a response: "what time is it in Tokyo?", "what's the weather in Amsterdam?", "fetch that URL". You pick which tools your organisation wants from the catalogue and attach them to a model; the model then sees them as options on every completion made against it and calls them when it needs real-world information.
Your caller doesn't have to do anything — the gateway runs the tool, gives the result back to the model, and returns the finished answer. No client-side plumbing, no round-trips through your app.
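As a rough illustration of how little the caller sends, the sketch below builds (without sending) a plain chat completion request. The base URL, the API key, and the Bearer-token header are assumptions made for the example, not documented values; the point is only that the body carries no tools array when the gateway runs catalogue tools for you.

```python
import json
import urllib.request

# Hypothetical values: substitute your gateway's base URL and your own key.
BASE_URL = "https://gateway.example.com"
API_KEY = "sk-example"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a plain chat completion request.

    There is deliberately no "tools" key: tools enabled on the model are
    attached and executed by the gateway, not by the caller.
    """
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Bearer auth is an assumption; use whatever scheme your gateway expects.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


req = build_chat_request("qwen3.5-35b", "What time is it in Tokyo?")
# urllib.request.urlopen(req) would send it; the reply is an ordinary completion.
```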
How the catalogue is organised
Curated, reviewed, first-party. Use them by toggling them on for your org on the Tools page. Examples: current_time (no config), fetch_url (no config; safe-browses a public URL), web_search (per-org provider key — see below).
Added by Servada on request — usually wrapping a third-party HTTP API (weather, search, finance) or an API your team maintains. You supply the API key once in the portal; it's encrypted at rest and only used from inside the gateway. Examples: get_weather.
Enabling a tool
From the Tools page inside your organisation, click Enable (or Configure + enable if the tool needs an API key). The server saves your config encrypted and the tool is immediately available — no deploy, no reboot.
To actually have a model use the tool, open the model's settings under Models and add it to the model's enabled tools list. From that point, every chat completion against that model can invoke it.
What you see in the response
The gateway surfaces tool activity on the response so you can see what happened:
- timings.tools_ms: total time spent running tools for this request.
- inferada.tools: one entry per call with its name, latency, and success / error status.
- usage.tools: the same entries, persisted in request logs for the history page.
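A minimal sketch of reading those fields from a parsed response body. It assumes the dotted names map to nested JSON objects, and the per-entry keys (name, latency_ms, status) are hypothetical stand-ins, since the text above names the information each entry carries but not its exact keys.

```python
def summarise_tool_activity(response: dict) -> list[str]:
    """Return one human-readable line per tool call recorded on a response."""
    total_ms = response.get("timings", {}).get("tools_ms", 0)
    calls = response.get("inferada", {}).get("tools", [])
    lines = [f"{len(calls)} tool call(s), {total_ms} ms total"]
    for call in calls:
        # "name", "latency_ms" and "status" are assumed key names.
        lines.append(
            f'{call.get("name")}: {call.get("latency_ms")} ms, {call.get("status")}'
        )
    return lines


# Hand-written sample for illustration, not captured gateway output.
sample = {
    "timings": {"tools_ms": 412},
    "inferada": {
        "tools": [
            {"name": "current_time", "latency_ms": 12, "status": "success"},
            {"name": "get_weather", "latency_ms": 400, "status": "error"},
        ]
    },
}
summary = summarise_tool_activity(sample)
```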
Safety + limits
- Each tool call has a per-call timeout (default 10 s, hard-capped at 30 s).
- Tool-call chains are capped — the model can't spin infinitely. Default 5 steps; configurable per service model.
- API keys you configure are encrypted at rest and never returned to the client or the model.
- Tools that fetch a customer-supplied URL refuse private / link-local / cloud metadata addresses — scanning your internal network from inside a prompt doesn't work.
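The address check in the last point can be pictured roughly as follows. This is a sketch using Python's standard ipaddress module, not the gateway's actual implementation; the function name is hypothetical, and the exact set of refused ranges is an assumption.

```python
import ipaddress


def is_blocked_address(ip_text: str) -> bool:
    """Return True if a resolved IP address should be refused."""
    ip = ipaddress.ip_address(ip_text)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot dodge the checks.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return (
        ip.is_private          # RFC 1918 ranges, fc00::/7, and similar
        or ip.is_loopback      # 127.0.0.0/8, ::1
        or ip.is_link_local    # 169.254.0.0/16, home of 169.254.169.254
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )
```

A check like this only helps when it is applied to the address the hostname actually resolves to, at connection time and again after each redirect; validating the URL string alone leaves room for DNS tricks.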
Web search
The web_search tool gives any agent recent world knowledge — news, docs, reference material — without the model needing to know which search provider is on the other side. The agent always sees one tool with one input shape; the provider is picked per-org when you configure the tool.
Configuring it
On the Tools page, enable web_search and supply:
- Provider: tavily or brave. Tavily returns extracted page content and a synthesised answer; Brave returns links + snippets only.
- API key: your account's key with the chosen provider. Stored encrypted at rest.
- Default max results (optional): used when the model omits max_results; defaults to 5.
- Search depth (Tavily only, optional): basic or advanced. Advanced is slower and more expensive but returns richer content. Brave ignores this knob.
The customer is billed by the search provider directly — Servada doesn't mark up or rebill these calls. Add the same tool to a model's enabled tools list under Models to make it available on completions against that model.
What the model sees
The LLM-facing input shape is always:
{
"query": "EU AI act enforcement timeline",
"max_results": 5, // optional, 1-20
"freshness": "past_month", // optional: anytime | past_day | past_week | past_month | past_year
"topic": "news" // optional: general | news (Tavily honours, Brave ignores)
}
And the result shape is always:
{
"query": "EU AI act enforcement timeline",
"answer": "The EU AI Act entered into force on …", // Tavily only; Brave omits
"results": [
{
"title": "EU AI Act — entry into force",
"url": "https://example.eu/ai-act",
"snippet": "Short summary, ~200-300 chars.",
"content": "Full extracted page text.", // Tavily only; Brave omits
"published_date": "2025-04-01T00:00:00.000Z", // when the provider supplies it
"score": 0.91 // Tavily relevance score
}
]
}
content and answer are present when available: Tavily fills them, Brave doesn't. Agents that depend on the full extracted text should configure a Tavily-backed org; agents that only need ranked links work fine on either provider.
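The two shapes above can be exercised with a small sketch like the one below, for instance in a test harness that replays logged tool calls. Both function names are hypothetical, and treating omitted optional fields as valid is an assumption; the ranges and allowed values come from the input shape shown earlier.

```python
FRESHNESS = ("anytime", "past_day", "past_week", "past_month", "past_year")
TOPICS = ("general", "news")


def validate_search_args(args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments fit."""
    problems = []
    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        problems.append("query must be a non-empty string")
    max_results = args.get("max_results")
    if max_results is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(max_results, int) and not isinstance(max_results, bool)
        if not is_int or not 1 <= max_results <= 20:
            problems.append("max_results must be an integer from 1 to 20")
    if "freshness" in args and args["freshness"] not in FRESHNESS:
        problems.append("freshness must be one of: " + ", ".join(FRESHNESS))
    if "topic" in args and args["topic"] not in TOPICS:
        problems.append("topic must be one of: " + ", ".join(TOPICS))
    return problems


def result_text(result: dict) -> str:
    """Prefer full extracted content (Tavily), else fall back to the snippet."""
    return result.get("content") or result.get("snippet", "")
```

Falling back from content to snippet keeps the same consumer code working whichever provider the org has configured.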
Adding more providers
The provider list is closed for now (Tavily, Brave). If you need Exa, Serper, Linkup, or another search backend, ask Servada to wire it in — the architecture supports adding providers without changing the agent-facing tool name or shape.
Bringing your own tool definitions (passthrough mode)
When your client SDK ships its own tools array on the request — opencode, Claude Code, the official OpenAI SDK with custom function definitions, anything built on @ai-sdk/openai-compatible — the gateway uses exactly those and ignores the catalogue entirely. The model sees the caller's definitions, emits tool calls back on the wire, and your client executes them locally.
Caller-supplied tools replace the gateway's enabled_tool_ids for that request. No merging, no name-collision policy, no schema reshape. The schema you send is forwarded byte-for-byte to the upstream, so if your agent's tool contract depends on a specific JSON-Schema shape, that shape is what the model sees.
Same rule on /v1/chat/completions and /v1/responses; pick the shape your endpoint expects (Chat Completions nests under function, Responses puts name at the top level).
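To make the difference between the two shapes concrete, here is a small sketch that converts a function tool definition from one to the other. The helper names are hypothetical; because the gateway forwards definitions unchanged, a conversion like this would live in your client if you target both endpoints.

```python
def to_responses_tool(chat_tool: dict) -> dict:
    """Flatten a Chat Completions tool definition into the Responses shape."""
    if chat_tool.get("type") != "function" or "function" not in chat_tool:
        raise ValueError("expected a Chat Completions function tool")
    # Everything nested under "function" moves to the top level, next to "type".
    return {"type": "function", **chat_tool["function"]}


def to_chat_tool(responses_tool: dict) -> dict:
    """Nest a Responses function tool under "function" for Chat Completions."""
    if responses_tool.get("type") != "function" or "name" not in responses_tool:
        raise ValueError("expected a Responses function tool")
    fields = {k: v for k, v in responses_tool.items() if k != "type"}
    return {"type": "function", "function": fields}


bash_chat = {
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Execute a shell command and return stdout/stderr.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}
bash_responses = to_responses_tool(bash_chat)
```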
Streaming a tool call (Chat Completions)
{
"model": "qwen3.5-35b",
"stream": true,
"messages": [
{ "role": "user", "content": "Run git diff and tell me what changed." }
],
"tools": [
{
"type": "function",
"function": {
"name": "bash",
"description": "Execute a shell command and return stdout/stderr.",
"parameters": {
"type": "object",
"properties": { "command": { "type": "string" } },
"required": ["command"]
}
}
}
],
"tool_choice": "auto"
}
The gateway streams back OpenAI-compatible tool_calls deltas, with id and name on the announce chunk and JSON-string fragments on follow-ups, terminating with finish_reason: "tool_calls":
data: {"choices":[{"index":0,"delta":{"role":"assistant","tool_calls":[{"index":0,"id":"call_a","type":"function","function":{"name":"bash","arguments":""}}]}}]}
data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"command\":"}}]}}]}
data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\"git diff\"}"}}]}}]}
data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}
data: [DONE]
Sending the tool result back
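Before anything can be sent back, the streamed fragments have to be stitched into a complete call. The sketch below shows one way to do that, assuming the stream arrives as text lines like the ones shown above; the function name is hypothetical, and most OpenAI-compatible SDKs already do this accumulation for you.

```python
import json


def collect_tool_calls(sse_lines):
    """Merge streamed tool_calls deltas into complete calls, ordered by index."""
    calls = {}
    finish_reason = None
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            finish_reason = choice.get("finish_reason") or finish_reason
            for delta in (choice.get("delta") or {}).get("tool_calls") or []:
                call = calls.setdefault(
                    delta["index"],
                    {"id": None, "type": "function",
                     "function": {"name": "", "arguments": ""}},
                )
                if delta.get("id"):
                    call["id"] = delta["id"]
                fn = delta.get("function") or {}
                if fn.get("name"):
                    call["function"]["name"] = fn["name"]
                # Argument fragments are pieces of one JSON string: concatenate.
                call["function"]["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)], finish_reason


# The example stream from this page, as raw lines.
STREAM = [
    r'data: {"choices":[{"index":0,"delta":{"role":"assistant","tool_calls":[{"index":0,"id":"call_a","type":"function","function":{"name":"bash","arguments":""}}]}}]}',
    r'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"command\":"}}]}}]}',
    r'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\"git diff\"}"}}]}}]}',
    r'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
    "data: [DONE]",
]
tool_calls, reason = collect_tool_calls(STREAM)
command = json.loads(tool_calls[0]["function"]["arguments"])["command"]
```

The arguments string is only valid JSON once every fragment has arrived, so parse it after the stream ends rather than per chunk. The returned list has the same layout as the tool_calls array of an assistant message, so it can be reused as-is on the next turn.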
Run the call locally, then post the result as a role: "tool" message alongside the assistant turn the gateway just emitted:
{
"model": "qwen3.5-35b",
"messages": [
{ "role": "user", "content": "Run git diff and tell me what changed." },
{
"role": "assistant",
"content": null,
"tool_calls": [
{ "id": "call_a", "type": "function",
"function": { "name": "bash", "arguments": "{\"command\":\"git diff\"}" } }
]
},
{ "role": "tool", "tool_call_id": "call_a", "content": "diff --git a/foo …" }
],
"tools": [ /* same array as the first turn */ ]
}
What the gateway skips on this path
- Server-side execution. The gateway never invokes a caller-supplied tool — that's your client's job.
- Output post-processing. Anonymise restoration, spellcheck, and other response-text post-processors are skipped — the assistant turn is agent payload (text preamble plus tool-call JSON), not human prose, and our prose-cleaning layer would corrupt tool arguments.
- Forced buffering. A processor that would normally force a buffered response when streaming is requested still lets streaming through here — agent callers ask for streaming on purpose.
- Pre-processors keep running. PII guard, prompt shield, and other input-direction processors still validate and transform the inbound request — the override rule covers tool definitions and upstream output, not your protections on inbound traffic.