Tools

Curated function-calling tools (real-time data lookups, small utilities, API wrappers) that the gateway invokes for the model mid-completion, handing the answer back within the same request.

What a tool does

A tool is a small, named capability the LLM can request while it's generating a response: "what time is it in Tokyo?", "what's the weather in Amsterdam?", "fetch that URL". You pick which tools your organisation wants from the catalogue; the model sees them as options on every completion made against a model that has them attached, and it calls them when it needs real-world information.

Your caller doesn't have to do anything — the gateway runs the tool, gives the result back to the model, and returns the finished answer. No client-side plumbing, no round-trips through your app.

How the catalogue is organised

Built-in tools
ships with the gateway

Curated, reviewed, first-party. Use them by toggling them on for your org on the Tools page. Examples: current_time (no config), fetch_url (no config; safe-browses a public URL), web_search (per-org provider key — see below).

API-wrapping tools
added by Servada

Added by Servada on request, usually wrapping a third-party HTTP API (weather, search, finance) or an API your team maintains. You supply the API key once in the portal; it's encrypted at rest and only used from inside the gateway. Example: get_weather.

Enabling a tool

From the Tools page inside your organisation, click Enable (or Configure + enable if the tool needs an API key). The server saves your config encrypted and the tool is immediately available — no deploy, no reboot.

To actually have a model use the tool, open the model's settings under Models and add it to the model's enabled tools list. From that point, every chat completion against that model can invoke it.

What you see in the response

The gateway surfaces tool activity on the response so you can see what happened:

  • timings.tools_ms — total time spent running tools for this request.
  • inferada.tools — one entry per call with its name, latency, and success / error status.
  • usage.tools — the same entries, persisted in request logs for the history page.
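
As a rough illustration of reading these fields on the client, the sketch below summarises tool activity from a parsed response body. The per-entry key names (`name`, `latency_ms`, `status`) and the sample payload are assumptions made for the example; the text above only says each entry carries a name, a latency, and a success / error status, so check a real response for the exact keys.

```python
from typing import Any, Dict, List


def summarize_tool_activity(response: Dict[str, Any]) -> List[str]:
    """Return one readable line per tool call recorded on a response."""
    total_ms = response.get("timings", {}).get("tools_ms")
    lines = [f"tools total: {total_ms} ms"]
    # Assumed entry keys: "name", "latency_ms", "status" (hypothetical names).
    for call in response.get("inferada", {}).get("tools", []):
        lines.append(
            f"{call.get('name')}: {call.get('latency_ms')} ms, {call.get('status')}"
        )
    return lines


# Made-up payload, shaped only to exercise the function.
example_response = {
    "timings": {"tools_ms": 182},
    "inferada": {
        "tools": [{"name": "current_time", "latency_ms": 182, "status": "success"}]
    },
}

for line in summarize_tool_activity(example_response):
    print(line)
```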

Safety + limits

  • Each tool call has a per-call timeout (default 10 s, hard-capped at 30 s).
  • Tool-call chains are capped — the model can't spin infinitely. Default 5 steps; configurable per service model.
  • API keys you configure are encrypted at rest and never returned to the client or the model.
  • Tools that fetch a customer-supplied URL refuse private / link-local / cloud metadata addresses — scanning your internal network from inside a prompt doesn't work.
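
To make two of these limits concrete, here is a minimal sketch of the kind of checks involved. It is not the gateway's implementation. The function names are hypothetical, the address check only handles literal IP addresses (a real guard would also have to resolve hostnames and check every resolved address), and the timeout values are the defaults quoted above.

```python
import ipaddress
from typing import Optional

# Well-known cloud metadata endpoint; it is link-local, listed here for clarity.
METADATA_ADDRESSES = {ipaddress.ip_address("169.254.169.254")}


def is_refused_address(ip_text: str) -> bool:
    """Return True for private, loopback, link-local, or metadata addresses."""
    addr = ipaddress.ip_address(ip_text)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr in METADATA_ADDRESSES
    )


def effective_timeout(
    requested_s: Optional[float], default_s: float = 10.0, cap_s: float = 30.0
) -> float:
    """Apply the default when nothing is requested, then clamp to the hard cap."""
    if requested_s is None:
        return default_s
    return min(requested_s, cap_s)
```

Under these assumptions, `is_refused_address("10.0.0.5")` is refused while a public address such as `8.8.8.8` passes, and a requested timeout of 45 s is clamped to 30 s.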

Bringing your own tool definitions (passthrough mode)

When your client SDK ships its own tools array on the request — opencode, Claude Code, the official OpenAI SDK with custom function definitions, anything built on @ai-sdk/openai-compatible — the gateway uses exactly those and ignores the catalogue entirely. The model sees the caller's definitions, emits tool calls back on the wire, and your client executes them locally.

The override rule
never merged with the catalogue

Caller-supplied tools replace the gateway's enabled_tool_ids for that request. No merging, no name-collision policy, no schema reshape. The schema you send is forwarded byte-for-byte to the upstream: if your agent's tool contract depends on a specific JSON-Schema shape, that shape is what the model sees.

Same rule on /v1/chat/completions and /v1/responses; pick the shape your endpoint expects (Chat Completions nests under function, Responses puts name at the top level).
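
The difference between the two shapes can be sketched as a small conversion. This is a client-side convenience for illustration only: the gateway itself does not reshape schemas, and the helper name `to_responses_tool` is hypothetical.

```python
from typing import Any, Dict


def to_responses_tool(chat_tool: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a Chat Completions tool definition into the Responses shape."""
    fn = chat_tool["function"]
    flat: Dict[str, Any] = {"type": "function", "name": fn["name"]}
    # Optional fields are copied only when present.
    for key in ("description", "parameters"):
        if key in fn:
            flat[key] = fn[key]
    return flat


# Chat Completions shape: everything nests under "function".
bash_chat_tool = {
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Execute a shell command and return stdout/stderr.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}

# Responses shape: "name" sits at the top level next to "type".
bash_responses_tool = to_responses_tool(bash_chat_tool)
```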

Streaming a tool call (Chat Completions)

json
{
  "model": "qwen3.5-35b",
  "stream": true,
  "messages": [
    { "role": "user", "content": "Run git diff and tell me what changed." }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "bash",
        "description": "Execute a shell command and return stdout/stderr.",
        "parameters": {
          "type": "object",
          "properties": { "command": { "type": "string" } },
          "required": ["command"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

The gateway streams back OpenAI-compatible tool_calls deltas — id and name on the announce chunk, JSON-string fragments on follow-ups, terminating with finish_reason: "tool_calls":

text
data: {"choices":[{"index":0,"delta":{"role":"assistant","tool_calls":[{"index":0,"id":"call_a","type":"function","function":{"name":"bash","arguments":""}}]}}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"command\":"}}]}}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\"git diff\"}"}}]}}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}

data: [DONE]
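
Most SDKs reassemble these deltas for you. If you consume the stream by hand, the accumulation can be sketched as below. The function name and the returned dict layout are choices made for this example, and the sample lines are copied from the stream above.

```python
import json
from typing import Dict, Iterable, List, Optional, Tuple


def collect_tool_calls(sse_lines: Iterable[str]) -> Tuple[List[dict], Optional[str]]:
    """Merge streamed tool_calls deltas into complete calls, keyed by index."""
    calls: Dict[int, dict] = {}
    finish_reason = None
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and anything that is not a data line
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
            for delta in choice.get("delta", {}).get("tool_calls", []):
                slot = calls.setdefault(
                    delta["index"], {"id": None, "name": None, "arguments": ""}
                )
                if delta.get("id"):
                    slot["id"] = delta["id"]
                fn = delta.get("function", {})
                if fn.get("name"):
                    slot["name"] = fn["name"]
                # Argument fragments are plain string pieces; concatenate in order.
                slot["arguments"] += fn.get("arguments", "")
    return [calls[i] for i in sorted(calls)], finish_reason


sample_stream = [
    r'data: {"choices":[{"index":0,"delta":{"role":"assistant","tool_calls":[{"index":0,"id":"call_a","type":"function","function":{"name":"bash","arguments":""}}]}}]}',
    "",
    r'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"command\":"}}]}}]}',
    "",
    r'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\"git diff\"}"}}]}}]}',
    "",
    r'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
    "",
    "data: [DONE]",
]

tool_calls, reason = collect_tool_calls(sample_stream)
```

Parse the `arguments` string with `json.loads` only once the stream has finished; individual fragments are not valid JSON on their own.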

Sending the tool result back

Run the call locally, then post the result as a role: "tool" message alongside the assistant turn the gateway just emitted:

json
{
  "model": "qwen3.5-35b",
  "messages": [
    { "role": "user", "content": "Run git diff and tell me what changed." },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        { "id": "call_a", "type": "function",
          "function": { "name": "bash", "arguments": "{\"command\":\"git diff\"}" } }
      ]
    },
    { "role": "tool", "tool_call_id": "call_a", "content": "diff --git a/foo …" }
  ],
  "tools": [ /* same array as the first turn */ ]
}
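
A minimal sketch of assembling that follow-up body from the first request follows. The helper name, the `{"id", "name", "arguments"}` layout for a collected call, and the `results` mapping from call id to output are all assumptions made for the example, not part of the gateway's API.

```python
from typing import Any, Dict, List


def build_followup(
    first_request: Dict[str, Any],
    tool_calls: List[Dict[str, str]],
    results: Dict[str, str],
) -> Dict[str, Any]:
    """Append the assistant tool-call turn and one tool message per result."""
    assistant_turn = {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": call["id"],
                "type": "function",
                "function": {"name": call["name"], "arguments": call["arguments"]},
            }
            for call in tool_calls
        ],
    }
    tool_turns = [
        {"role": "tool", "tool_call_id": call["id"], "content": results[call["id"]]}
        for call in tool_calls
    ]
    return {
        "model": first_request["model"],
        "messages": [*first_request["messages"], assistant_turn, *tool_turns],
        # Resend the same tools array as the first turn.
        "tools": first_request["tools"],
    }


first = {
    "model": "qwen3.5-35b",
    "messages": [
        {"role": "user", "content": "Run git diff and tell me what changed."}
    ],
    "tools": [],  # placeholder; use the same array you sent on the first turn
}
calls = [{"id": "call_a", "name": "bash", "arguments": '{"command":"git diff"}'}]
followup = build_followup(first, calls, {"call_a": "diff --git a/foo …"})
```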

What the gateway skips on this path

  • Server-side execution. The gateway never invokes a caller-supplied tool; that's your client's job.
  • Output post-processing. Anonymise restoration, spellcheck, and other response-text post-processors are skipped — the assistant turn is agent payload (text preamble plus tool-call JSON), not human prose, and our prose-cleaning layer would corrupt tool arguments.
  • Forced buffering. A processor that would normally force a buffered response when streaming is requested still lets streaming through here — agent callers ask for streaming on purpose.
  • Pre-processors keep running. PII guard, prompt shield, and other input-direction processors still validate and transform the inbound request — the override rule covers tool definitions and upstream output, not your protections on inbound traffic.