Introduction

About Inferada

A managed gateway that sits between your app and the model. Bring AI into your product without rebuilding governance, privacy, and routing yourself.

What is Inferada?

Inferada is a managed LLM gateway from Servada. You point your application at one base URL with one bearer token, and Inferada handles the parts of "running AI in production" that aren't the model itself — provider routing, redaction, prompt-injection blocking, tool calls, spend caps, history, and per-organisation isolation.

The point is to bootstrap AI features into your product quickly without each team reinventing the same controls. Use any model you like — including local ones for data you'd rather not hand to a third party — and let the gateway take care of the rest.

What the gateway does between you and the model

Keeps sensitive data local.Pin restricted workloads to a local model running inside Servada's infrastructure; let the rest go to a commercial provider. The split lives in the gateway — clients don't have to know which model is which, and a misrouted prompt can't leak.
Redacts PII on the way out, restores it on the way back. Names, emails, phone numbers, IDs are swapped for placeholders before the request leaves; the real values are stitched back into the response. The provider — and the stored history — never see the originals.
Blocks PII from going out at all (when you'd rather reject than redact). pii_guardrejects requests that contain personal data instead of transforming them. Useful when "don't even try" is the safer policy than "we redacted, probably."
Stops prompt-injection / jailbreak attempts before the model sees them. prompt_shield runs a heuristic pass plus an optional classifier to spot instruction-override, role-hijack, and obfuscated payloads in user input — and either blocks the request or returns a synthesised refusal. Saves you writing a content guard from scratch.
Wires up tool calls without you holding the keys. Function-calling tools (search, CRM, weather, your own internal API) get their credentials provisioned inside the gateway. Your app sends a chat completion; the gateway resolves the tool calls and returns the answer. No third-party API key ever sits in your client.
Runs popular AI tools as Managed Apps. One-click provisioning for Flowise, n8n, OpenWebUI and similar — already wired to your models, isolated per organisation, no infrastructure to operate.
Tracks spend and gives you the controls.Per-token pricing, monthly allowances, overage caps, full request history, per-user and per-token attribution. All visible and adjustable from the portal — you don't need a support ticket to change a limit.
Isolates per organisation. Provider keys, tool configs, managed apps, history, and limits are scoped to your org, encrypted at rest, never shared with other tenants on the platform.

Request flow

When it helps

You want to ship an AI feature in your product but don't want every prompt your app sends to be readable by a third-party provider. Pin those calls to a local model; leave the rest on the best commercial one.
Your team is exploring chat / RAG / agent flows and you'd rather not spend the first sprint building rate limits, content guards, and an audit trail before you can even prototype.
You're running an internal assistant that handles user data and you want a hard line — anything containing PII either gets redacted before leaving (anonymise) or gets rejected outright (pii_guard).
You're worried about prompt injection in user-supplied content (chat, web forms, documents) and want a content guard to filter it without writing the rules yourself.
You're running multiple teams or customer organisations under one platform, each with their own provider keys, models, tools, and usage caps. Isolation is the default, not a configuration option you have to remember to flip.
You want your team to use a chat UI or a workflow builder without spinning up the infra yourself. Provision the Managed App once and it shows up wired to your models.