Haystack

Attach a deep JSON tree of caller-side context to a chat completion. The gateway renders it into a system message the model can read, and (when redaction is on) walks every leaf to replace PII with placeholders that are restored after the response.

Redaction is best-effort

The redactor uses your schema flags first and Presidio second; some values will still slip through. For traffic where any leak matters, route sensitive workloads to a Servada-hosted local model so nothing leaves the perimeter in the first place. The haystack feature is the right fit when you do use an external provider but want a structured way to reduce what reaches it.

When to use it

The haystack is for caller-side context that travels with a single request. Customer profiles, recent orders, account state, support tickets — anything you would otherwise hand-format into prose inside a system prompt. It is not a vector store, not an embedding cache, and not persistent memory across requests; you supply it on each call.

Two things you get for free over hand-formatting:

Field-level PII redaction guided by an explicit schema, not free-text scanning.
Dot-path templating — your system prompts can reference {{ haystack.customer.name }} directly. See Prompt Templating.

Quick start

Enable the processor and attach data. Everything else has sensible defaults.

json

{
  "model": "claude-sonnet-4-6:chat",
  "messages": [
    { "role": "system", "content": "Answer the user using the attached customer context." },
    { "role": "user", "content": "Why was my latest order delayed?" }
  ],
  "inferada": {
    "processors": ["haystack"],
    "language": "en",
    "haystack": {
      "data": {
        "customer": { "name": "Jane Doe", "email": "jane@example.com", "tier": "pro" },
        "orders": [
          { "id": "ord_1", "total": 49.99, "status": "delayed" },
          { "id": "ord_2", "total": 19.99, "status": "shipped" }
        ]
      }
    }
  }
}
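The request body above can also be assembled programmatically. The following is a minimal sketch, assuming Python on the caller side; the helper name build_haystack_request is hypothetical, and sending the body is left to whatever HTTP client and endpoint you already use for chat completions.

```python
import json
from typing import Any, Dict, List, Optional


def build_haystack_request(
    model: str,
    messages: List[Dict[str, str]],
    data: Any,
    language: str = "en",
    schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a chat-completion body with the haystack processor enabled."""
    haystack: Dict[str, Any] = {"data": data}
    if schema is not None:
        # The schema is optional, so only attach it when the caller has one.
        haystack["schema"] = schema
    return {
        "model": model,
        "messages": messages,
        "inferada": {
            "processors": ["haystack"],
            "language": language,
            "haystack": haystack,
        },
    }


body = build_haystack_request(
    model="claude-sonnet-4-6:chat",
    messages=[
        {"role": "system", "content": "Answer the user using the attached customer context."},
        {"role": "user", "content": "Why was my latest order delayed?"},
    ],
    data={
        "customer": {"name": "Jane Doe", "email": "jane@example.com", "tier": "pro"},
        "orders": [
            {"id": "ord_1", "total": 49.99, "status": "delayed"},
            {"id": "ord_2", "total": 19.99, "status": "shipped"},
        ],
    },
)

# Serialise once; this string is what goes on the wire as the request body.
payload = json.dumps(body)
```

Keeping the haystack as a plain dictionary until the last moment makes it easy to attach the same data to several requests, since it has to be supplied on each call.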

The model sees a <haystack> block prepended to its system messages, in YAML by default. With redaction on (the default when your org has the PII service configured), names, emails, phone numbers, and other detected PII are replaced with <INF_HAY_TYPE_N> placeholders before the upstream call and restored in the buffered response.

Schema

Supplying a schema is optional but recommended. It gives the model real context (one-line descriptions of what each field means) and gives the redactor explicit signals instead of having to guess from value shape.

json

{
  "customer": {
    "type": "object",
    "description": "The end-user we're answering questions about",
    "properties": {
      "name":  { "type": "string",   "description": "Full legal name", "pii": "PERSON" },
      "email": { "type": "string",   "description": "Primary contact email", "pii": "EMAIL_ADDRESS" },
      "tier":  { "type": "enum",     "values": ["free", "pro", "enterprise"], "description": "Subscription tier" }
    }
  },
  "orders": {
    "type": "array",
    "description": "Customer's recent orders, newest first",
    "items": {
      "type": "object",
      "properties": {
        "id":     { "type": "id",       "description": "Internal order ID", "pii": false },
        "total":  { "type": "currency", "currency": "EUR", "description": "Order total" },
        "status": { "type": "enum",     "values": ["shipped", "pending", "cancelled", "delayed"] }
      }
    }
  }
}

Field-def keys:

type (required, string): string | number | boolean | null | object | array | enum | date | datetime | currency | id | markdown | code
description (string): One sentence describing what the field is. Rendered as an inline comment in YAML/Markdown, so the model reads it on every request.
pii (string | boolean): false to never redact, true / "auto" to scan with Presidio + key heuristics, a Presidio entity type ("PERSON", "EMAIL_ADDRESS", ...) to assign a placeholder verbatim with no scan.
values (array): Allowed values for type: "enum". Strings or numbers.
currency (string): ISO 4217 code (EUR, USD, ...). Only meaningful for type: "currency".
properties (object): Children for type: "object", in the same shape as the top-level schema.
items (object): Element shape for type: "array".
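When starting from sample data, a first draft of the schema can be derived mechanically and then refined by hand. The sketch below is an illustration only, not the behaviour of the inferada-haystack-author skill; the function names are hypothetical, it emits only the basic types (string, number, boolean, null, object, array), and it takes the first array element as representative. Descriptions, pii flags, and richer types such as enum or currency still need to be added manually.

```python
from typing import Any, Dict


def draft_field(value: Any) -> Dict[str, Any]:
    """Return a draft field definition for one sample value."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # must come before the number check: bool is an int subclass
        return {"type": "boolean"}
    if isinstance(value, (int, float)):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        field: Dict[str, Any] = {"type": "array"}
        if value:
            # Assumption: the first element is representative of the whole array.
            field["items"] = draft_field(value[0])
        return field
    if isinstance(value, dict):
        return {"type": "object", "properties": draft_schema(value)}
    raise TypeError(f"unsupported leaf type: {type(value).__name__}")


def draft_schema(data: Dict[str, Any]) -> Dict[str, Any]:
    """Map each key to a draft field definition (the top-level schema shape)."""
    return {key: draft_field(value) for key, value in data.items()}


sample = {
    "customer": {"name": "Jane Doe", "tier": "pro"},
    "orders": [{"id": "ord_1", "total": 49.99}],
}
schema = draft_schema(sample)
```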

The full machine-readable contract lives at https://api.inferada.com/schemas/haystack/v1.json — wire that $schema URL into your IDE for autocomplete and validation. The inferada-haystack-author skill will generate a schema from a TypeScript type or sample JSON for you. Download it from https://api.inferada.com/skills/inferada-haystack-author.zip.

PII redaction

With redact: true (the default when your org has a PII service configured), the processor walks every string leaf and decides how to treat it in this order of precedence:

Schema pii: false → never scanned.
Schema pii: "PERSON" (or any explicit type) → placeholder assigned verbatim, no Presidio call.
Schema pii: true / "auto" or absent → key heuristics + Presidio. Keys like email, phone, name, address, iban bias detection toward the matching entity type.
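The precedence above can be read as a small decision function over a field definition. This is a sketch of the rule as documented, not the gateway's code; the function name and the returned labels ("skip", "explicit", "scan") are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def redaction_mode(field_def: Optional[Dict[str, Any]]) -> Tuple[str, Optional[str]]:
    """Classify one string leaf by its schema pii flag.

    Returns ("skip", None), ("explicit", entity_type), or ("scan", None).
    """
    pii = None if field_def is None else field_def.get("pii")
    if pii is False:
        return ("skip", None)  # never scanned
    if isinstance(pii, str) and pii != "auto":
        return ("explicit", pii)  # placeholder of this type, no Presidio call
    # True, "auto", or no flag at all: key heuristics plus Presidio.
    return ("scan", None)
```

Note that a missing schema entry falls into the scan branch, so leaving a field undeclared does not exempt it from detection.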

Placeholders use the form <INF_HAY_TYPE_N> — disjoint from the <INF_TYPE_N> namespace used by the anonymise processor for free-text user messages, so the two compose without collision. Restoration in the response is buffered-only — streaming responses include the redacted placeholders verbatim.
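Because streaming responses carry the placeholders through unrestored, it helps to picture the mapping involved. The sketch below illustrates the idea with a hypothetical PlaceholderMap class; the numbering scheme (one counter per entity type, the same value reusing its placeholder) is an assumption, and the gateway's actual bookkeeping may differ.

```python
import re
from collections import defaultdict
from typing import Dict, Tuple

# Matches only the haystack namespace, so <INF_TYPE_N> from anonymise is left alone.
_PLACEHOLDER = re.compile(r"<INF_HAY_[A-Z_]+_\d+>")


class PlaceholderMap:
    """Assigns <INF_HAY_TYPE_N> placeholders and restores them in text."""

    def __init__(self) -> None:
        self._counters: Dict[str, int] = defaultdict(int)
        self._by_value: Dict[Tuple[str, str], str] = {}
        self._by_placeholder: Dict[str, str] = {}

    def redact(self, entity_type: str, value: str) -> str:
        key = (entity_type, value)
        if key not in self._by_value:
            self._counters[entity_type] += 1
            placeholder = f"<INF_HAY_{entity_type}_{self._counters[entity_type]}>"
            self._by_value[key] = placeholder
            self._by_placeholder[placeholder] = value
        return self._by_value[key]

    def restore(self, text: str) -> str:
        # Unknown placeholders are left untouched rather than dropped.
        return _PLACEHOLDER.sub(
            lambda m: self._by_placeholder.get(m.group(0), m.group(0)), text
        )


mapping = PlaceholderMap()
name = mapping.redact("PERSON", "Jane Doe")
email = mapping.redact("EMAIL_ADDRESS", "jane@example.com")
restored = mapping.restore(f"Hello {name}, we wrote to {email}.")
```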

For free-text user/assistant messages, use anonymise. For caller-side structured data, use haystack with redact: true. Both can be enabled on the same request. See PII for the underlying detection endpoints.

Output format

The rendered block goes into a system message wrapped in a <haystack> tag. The format option picks how the body inside that wrapper is encoded.

yaml (default): Most token-efficient on nested data across modern frontier models. Schema descriptions render as inline # comments above each key; currency and date types format with their unit.
json (override): Pretty-printed JSON. Round-trip-safe; no comments. Pick this when the model will emit the haystack back as structured output.
markdown (override): Headings + bullet lists + blockquoted descriptions. Reads naturally for prose-heavy haystacks.
xml (override): Element-per-key with sanitised tag names. Costs the most tokens; use it when you have measured a Claude-only win on your workload.
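To make the default yaml behaviour concrete, here is a rough sketch of how schema descriptions could end up as # comments above each key. It is an approximation for illustration, not the gateway's renderer: the function name is hypothetical, scalars and arrays are written as inline JSON (which YAML accepts), and unit formatting for currency and date types is left out.

```python
import json
from typing import Any, Dict, List, Optional


def render_yaml(
    data: Dict[str, Any],
    schema: Optional[Dict[str, Any]] = None,
    indent: int = 0,
) -> List[str]:
    """Render a mapping as YAML lines, with descriptions as comments."""
    pad = "  " * indent
    lines: List[str] = []
    for key, value in data.items():
        field = (schema or {}).get(key, {})
        if field.get("description"):
            lines.append(f"{pad}# {field['description']}")
        if isinstance(value, dict) and value:
            lines.append(f"{pad}{key}:")
            lines.extend(render_yaml(value, field.get("properties"), indent + 1))
        else:
            # Scalars, arrays, and empty objects: inline JSON is valid YAML.
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return lines


example_data = {"customer": {"name": "Jane Doe", "tier": "pro"}}
example_schema = {
    "customer": {
        "type": "object",
        "description": "The end-user we're answering questions about",
        "properties": {
            "name": {"type": "string", "description": "Full legal name"},
        },
    }
}
rendered = "\n".join(render_yaml(example_data, example_schema))
```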

Limits

Validation runs at request time and fails the call with HTTP 400 on the first violation. The data must satisfy all of the following:

Serialised JSON ≤ 64 KB.
Depth ≤ 8 nested levels.
Array length ≤ 500 entries per node.
Total leaves ≤ 2,000.
Values match the declared schema type and enum values (e.g. sending a string where number is declared is a mismatch).

Error responses include a reason tag (haystack_too_large, haystack_too_deep, haystack_schema_mismatch, …) and the failing path. See Errors for the response shape.
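A client-side pre-flight check can catch the size limits before a request is sent. The sketch below rests on several assumptions: 64 KB is taken as 65,536 bytes of compact UTF-8 JSON, depth counts nested containers with the root at level 1, and the returned check names are local labels, not the gateway's reason tags. The gateway remains the authority on what is accepted.

```python
import json
from typing import Any, List, Optional, Tuple

MAX_BYTES = 64 * 1024  # assumption: 64 KB means 65,536 bytes
MAX_DEPTH = 8
MAX_ARRAY = 500
MAX_LEAVES = 2000


def preflight(data: Any) -> Optional[Tuple[str, str]]:
    """Return (check_name, path) for the first violation found, else None."""
    encoded = json.dumps(data, separators=(",", ":")).encode("utf-8")
    if len(encoded) > MAX_BYTES:
        return ("size", "")
    leaves = 0
    stack: List[Tuple[Any, str, int]] = [(data, "", 1)]
    while stack:
        node, path, depth = stack.pop()
        if isinstance(node, dict):
            if depth > MAX_DEPTH:
                return ("depth", path)
            for key, child in node.items():
                child_path = f"{path}.{key}" if path else str(key)
                stack.append((child, child_path, depth + 1))
        elif isinstance(node, list):
            if depth > MAX_DEPTH:
                return ("depth", path)
            if len(node) > MAX_ARRAY:
                return ("array_length", path)
            for index, child in enumerate(node):
                stack.append((child, f"{path}[{index}]", depth + 1))
        else:
            leaves += 1
            if leaves > MAX_LEAVES:
                return ("leaf_count", path)
    return None
```

Running this before the call gives a failing path without spending a round trip, which is useful when haystacks are assembled from variable-length records such as order histories.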

Templating access

When the haystack processor is enabled, the data tree is also exposed to the templating engine. Caller system prompts can interpolate any field via {{ haystack.<path> }} — dot for object access, brackets for array indices:

text

Greet {{ haystack.customer.name }}.
The most recent order total was {{ haystack.orders[0].total }}.
All orders: {{ haystack.orders }}        ← arrays/objects render as JSON automatically

With redact: true the substituted values are the redacted ones (<INF_HAY_PERSON_1> instead of the real name) — the response-side restore covers them too. See Prompt Templating for the resolution rules and the safety guarantees around prototype access.
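The path syntax (dots for object access, brackets for array indices) can be sketched as a small resolver. This is an illustration of the syntax only, not the templating engine itself; the function names are hypothetical, and rendering non-string scalars through JSON is an assumption. It reads only dictionary keys and list indices, never attributes.

```python
import json
import re
from typing import Any

# One path step: an identifier (optionally preceded by a dot) or a [index].
_STEP = re.compile(r"\.?([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")
_TEMPLATE = re.compile(r"\{\{\s*haystack\.([^}\s]+)\s*\}\}")


def resolve(data: Any, path: str) -> Any:
    """Walk a path such as 'orders[0].total' through nested dicts and lists."""
    node = data
    pos = 0
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            raise KeyError(path)
        key, index = match.group(1), match.group(2)
        node = node[key] if key is not None else node[int(index)]
        pos = match.end()
    return node


def render(template: str, data: Any) -> str:
    """Substitute every {{ haystack.<path> }} reference in the template."""

    def substitute(match: "re.Match[str]") -> str:
        value = resolve(data, match.group(1))
        # Strings go in as-is; everything else is rendered as JSON.
        return value if isinstance(value, str) else json.dumps(value)

    return _TEMPLATE.sub(substitute, template)


tree = {
    "customer": {"name": "Jane Doe", "tier": "pro"},
    "orders": [{"id": "ord_1", "total": 49.99}],
}
greeting = render("Greet {{ haystack.customer.name }}.", tree)
```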

Reference

inferada.processors (required, array): Must include "haystack" to enable the feature.
inferada.language (required, string): BCP-47 / ISO-639-1 language code. Required when redaction is on, which is the default when the org has a PII service configured.
inferada.haystack.data (required, object | array): Deep JSON tree. Leaves must be string, number, boolean, or null.
inferada.haystack.schema (object): Optional. Field-by-field metadata; see the Schema section above.
inferada.haystack.format (string): yaml | json | markdown | xml. Default: yaml.
inferada.haystack.redact (boolean): Default: true when the org has a PII service configured. Set false to skip redaction even when configured.