Mnemom AEGIS

Cross-tenant defensive network for AI agents.

Mnemom AEGIS — Adaptive Enforcement, Governance & Intelligence Substrate — is the runtime security network behind Safe House. It screens every agent transaction at four checkpoints — front door, back door, inside.autonomy, inside.integrity — each independently configurable across four enforcement modes. Signed Managed Rules carry a sub-30s P95 cross-tenant propagation SLO target (first measurements publish 30 days post-GA).

AAP declares. AIP verifies in flight. CLPI governs and anchors. Safe House screens. AEGIS signs the cross-tenant defenses.

Customer dashboard curl /v1/trust/iocs Contact sales

The threat model.

Seven attack patterns drive the agentic threat surface today. Each maps to one of the four checkpoints — so customers can dial enforcement per surface, not as a single global posture.

Threat	Checkpoint	What it looks like
Prompt injection	`front door`	Direct attempts to override the agent's instructions, role-swap, or bypass declared scope at the inbound surface.
Indirect injection	`front door`	Instructions hidden inside retrieved documents, tool outputs, and vector-store payloads: the prompt the agent never knew it received.
Tool misuse	`inside.autonomy`	Coerced or chained tool calls that exceed the agent's declared autonomy bounds or violate the org's protection-card protected surface (forbidden ops, protected assets). Argument-shape attacks against under-validated schemas (OWASP ASI02).
Data exfiltration	`back door`	PII, PHI, secrets, credentials, or cross-tenant data echoed back in agent responses, error traces, or split-token patterns.
BEC / impersonation fraud	`front door`	CEO-fraud style requests, urgency-and-authority pressure, social engineering that targets the agent's escalation contract.
Agent spoofing	`inside.integrity`	Identity-abuse attempts that claim authority the Alignment Card does not declare. OWASP ASI03 — Identity & Privilege Abuse.
Supply-chain compromise	`inside.integrity`	Behavioral signatures consistent with a compromised SDK, model fine-tune, or vendored prompt template — caught cross-tenant via substrate fingerprinting (OWASP ASI04).

Four checkpoints × four enforcement modes.

Every checkpoint is independently configurable. Composition is strictest-wins across Platform → Org → Team → Agent, so a stricter setting at any layer always governs. It mirrors the way Cloudflare WAF Managed Rules let you set severity × action per rule.

Checkpoint	What it screens
`front door`	Inbound message screening: every prompt, retrieval payload, and tool response before the agent processes it.
`back door`	Outbound response screening: PII, secrets, Alignment Card violations, and regulated advice before the response leaves the perimeter.
`inside.autonomy`	Tool-call screening: every action the agent takes, checked against the autonomy bounds the Alignment Card declares and the org's protection-card protected surface (forbidden ops, protected assets).
`inside.integrity`	Reasoning-integrity screening: AIP verdicts on thinking-block payloads, substrate-deviation signatures, and identity-abuse patterns.

Mode	Behavior
`off`	Checkpoint disabled. Used in canary tenants and pre-onboarding.
`observe`	Evaluates every transaction; emits signed verdicts; never blocks. The default for new Managed Rules during the 24-hour observe soak.
`nudge`	Annotates or warns inline without blocking. The middle ground for tier-3 rules during ramp-up.
`enforce`	Blocks the transaction and surfaces a signed verdict to the dashboard. Reached only after the observe soak and FP-rate rollback discipline (operator-confirmed today, automatic in CLPI Phase 2).

Composition cascade: Platform → Org → Team → Agent, strictest-wins. Customer admins clamp at any layer.
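
The strictest-wins cascade can be sketched in a few lines. This is a minimal illustration, not the AEGIS implementation: the `Mode` enum, the `effective_mode` function, the strictness order off < observe < nudge < enforce, and the fallback to `off` when no layer sets a mode are all assumptions made for the example.

```python
from enum import IntEnum
from typing import Dict, Optional


class Mode(IntEnum):
    # Assumed strictness order, loosest to strictest.
    OFF = 0
    OBSERVE = 1
    NUDGE = 2
    ENFORCE = 3


# Layer names follow the cascade described above.
LAYERS = ("platform", "org", "team", "agent")


def effective_mode(settings: Dict[str, Optional[Mode]]) -> Mode:
    """Return the strictest mode set at any layer; unset layers are ignored."""
    chosen = [settings[layer] for layer in LAYERS if settings.get(layer) is not None]
    # Hypothetical default: with nothing configured, the checkpoint is off.
    return max(chosen, default=Mode.OFF)


# One checkpoint, configured at three of the four layers.
front_door = {"platform": Mode.OBSERVE, "org": Mode.ENFORCE, "agent": Mode.NUDGE}
print(effective_mode(front_door).name)
```

Because the result is a maximum over the layers, a looser setting lower in the cascade can never relax a stricter one above it, which is the property the cascade describes.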

The Managed Rules pipeline.

Recipes are detection content. Managed Rules are the signed control-plane state that wraps them. The pipeline is structurally constrained — not procedurally — so tier-1 and tier-2 rules cannot auto-promote, regardless of operator-set mode.

1. Arena
Fifteen canonical adversarial personas probe Safe House 24/7. Mutation-phase gating activates per-bucket only when detection rate crosses 95% over a 48-hour rolling window with 24-hour hysteresis.
2. Candidate
Candidates that slip past the arena enter an isolated review queue with a strictly separated write path, so the system that proposes detection content can never be the same one that approves it. Customer false-negative and false-positive reports and cross-tenant network signals all flow into the same queue.
3. Review
Three reviewer modes — manual (default), auto-approve-trusted-sources, auto-approve-high-confidence. Tier-1 / tier-2 always require dual-control review under an append-only audit chain.
4. 24h observe soak
Every signed promotion lands in observe mode for 24 hours. FP-rate monitoring retires a misfiring recipe before any production traffic is blocked (operator-confirmed today, automatic in CLPI Phase 2).
5. Enforce
Tiered KV+R2+isolate-cache failover with independent signing chains pushes the rule to every gateway. P95 ≤ 30s signed-promotion → gateway-loaded.
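
Steps 4 and 5 can be read as a small time gate. The sketch below assumes a rule whose target mode is enforce and models only the 24-hour soak and the operator confirmation; `mode_after_promotion` and its parameters are hypothetical names, and FP-rate rollback is left out because the page does not state a threshold.

```python
from datetime import datetime, timedelta, timezone

# The 24-hour observe soak from step 4.
SOAK = timedelta(hours=24)


def mode_after_promotion(promoted_at: datetime, now: datetime,
                         operator_confirmed: bool) -> str:
    """Return the mode a freshly promoted rule runs in at time `now`."""
    if now - promoted_at < SOAK:
        # Still inside the soak: evaluate and emit verdicts, never block.
        return "observe"
    # After the soak, enforcement still waits for an operator (assumed flag).
    return "enforce" if operator_confirmed else "observe"


promoted = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(mode_after_promotion(promoted, promoted + timedelta(hours=23), True))
print(mode_after_promotion(promoted, promoted + timedelta(hours=25), True))
```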

The protective invariant

A tier-1 or tier-2 Managed Rule — one that would actually block real production traffic — can never be promoted without two-person human review, no matter how aggressive the auto-promotion mode is set. The guarantee is enforced structurally, in the data model itself: an active rule cannot exist unless its review quorum has been met. It is a property of the system, not a procedure someone has to remember to follow.

Guaranteed by the data model, not by operator discipline.
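
One way to picture a structurally enforced invariant is a record type that refuses to exist in an invalid state. The following is an in-memory analogue only: the real guarantee lives in the AEGIS data model, and the `ManagedRule` class, its field names, and the `"active"` status string are hypothetical.

```python
from dataclasses import dataclass, field

# "Two-person human review" from the text above.
REQUIRED_REVIEWERS = 2


@dataclass(frozen=True)
class ManagedRule:
    rule_id: str
    tier: int
    status: str = "candidate"
    # Distinct human approvers; a frozenset cannot count the same person twice.
    approvers: frozenset = field(default_factory=frozenset)

    def __post_init__(self) -> None:
        needs_quorum = self.status == "active" and self.tier in (1, 2)
        if needs_quorum and len(self.approvers) < REQUIRED_REVIEWERS:
            # The object is never constructed, so no code path can observe it.
            raise ValueError("tier-1/tier-2 rule cannot be active without dual-control review")


ok = ManagedRule("rule-001", tier=1, status="active",
                 approvers=frozenset({"alice", "bob"}))
print(ok.status)
```

The check runs at construction and the record is immutable afterwards, so an active tier-1 or tier-2 rule without quorum is not a state that later code has to guard against.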

Substrate fingerprinting + supply-chain detection.

Every evaluation is stamped with a substrate fingerprint — the provider, model, and SDK version behind the request, plus an optional customer-supplied lockfile hash sent via the `X-Mnemom-Lockfile-Hash` header. AEGIS sees behavioral deviation across every customer running on the same substrate, simultaneously.
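
A client might attach the optional lockfile hash as shown below. The header name comes from the text above; the choice of SHA-256, the hex encoding, and the `lockfile_hash_header` helper are assumptions, since this page does not specify the expected hash format.

```python
import hashlib
from typing import Dict


def lockfile_hash_header(lockfile_bytes: bytes) -> Dict[str, str]:
    """Build the optional header from raw lockfile contents.

    Assumption: a hex-encoded SHA-256 digest; check the API reference
    for the format the gateway actually expects.
    """
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return {"X-Mnemom-Lockfile-Hash": digest}


# In practice the bytes would come from package-lock.json, poetry.lock, etc.
headers = lockfile_hash_header(b'{"lockfileVersion": 3}')
print(headers["X-Mnemom-Lockfile-Hash"])
```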

May 11, 2026 — the Mini Shai-Hulud worm compromised 170+ npm packages and 2 PyPI packages, including Mistral AI's SDK suite and Guardrails AI's PyPI package. The compromised `@tanstack/*` versions shipped with valid SLSA Build Level 3 attestations — the first documented case of a worm producing legitimate signed provenance for malicious packages. Per-tenant detection and package-layer Sigstore verification structurally cannot catch this class of attack.

Full threat model on /supply-chain

OWASP Top 10 for Agentic Applications.

Honest mapping against the authoritative OWASP Top 10 for Agentic Applications (OWASP Gen AI Security Project, released 2025-12-09). Where coverage is partial or absent, we say so — see genai.owasp.org for the full ASI taxonomy.

OWASP Top 10 for Agentic Applications (genai.owasp.org)

OWASP category	Coverage	How AEGIS addresses it
ASI02 — Tool Misuse	Partial	Policy engine (CLPI Phase 1) bounded-actions enforcement + forbidden-rule Managed Rules at the inside.autonomy checkpoint, plus back-door screening for data-exfiltration-via-tool. Declared-scope enforcement is the primary control; Mnemom does not intercept every unsafe tool invocation at the gateway.
ASI03 — Identity & Privilege Abuse	Full	AAP-declared autonomy bounds (Alignment Card) enforced by the CLPI policy engine + AIP in-flight integrity verdicts + inside.integrity checkpoint screening of runtime privilege/identity-abuse claims.
ASI04 — Agentic Supply Chain Vulnerabilities	Full (runtime)	Substrate fingerprinting on every evaluation + the cross-tenant aggregator detect runtime-behavior deviation consistent with a compromised dependency/substrate that no single customer can see. Complements — does not replace — build-time package provenance (SLSA, Sigstore).
ASI07 — Insecure Inter-Agent Communication	Partial	Front-door checkpoint treats unauthenticated authority/identity claims arriving as inbound runtime messages as suspicious by design. This screens the content of inter-agent messages; legitimate agent-to-agent authority must be encoded in Alignment Cards. It is not a transport-authentication scheme.

The remaining categories map elsewhere in the Mnemom stack, stated honestly:

ASI01 (Agent Goal Hijack): Safe House front-door screening, shipped for direct injection and substantially covering multi-turn goal redirection (residual on novel multi-turn/multi-vector sequences).
ASI09 (Human-Agent Trust Exploitation): shipped front-door detection of authority/urgency/secrecy manipulation.
ASI10 (Rogue Agents): covered at the governance layer (AAP Alignment Cards + CLPI lifecycle + Trust Ratings), not a single front-door pattern.

Honest gaps:

ASI05 (Unexpected Code Execution) and ASI06 (Memory & Context Poisoning): no front-door interception today. The policy engine reduces the action surface and AIP gives partial downstream observability; pair with an app-layer sandbox and treat memory as untrusted input.
ASI08 (Cascading Failures): an application-architecture concern (timeouts, bulkheads, circuit breakers).

See /protection-network and /trust.

NIST AI Risk Management Framework.

How Mnemom's shipped runtime controls support the four NIST AI RMF functions. Honest mapping — Mnemom is a runtime trust substrate, not an AI-risk-management program; where a function is the customer's organizational responsibility, we say so.

NIST AI Risk Management Framework (AI RMF 1.0)

AI RMF function	Coverage	How Mnemom supports it
GOVERN	Partial	Alignment Card as the machine-readable per-agent policy artifact (principal, oversight, autonomy envelope) + CLPI lifecycle governance + dual-control Managed Rules promotion. Your organizational governance program (roles, approval authority, third-party-model intake) stays yours.
MAP	Partial	Alignment Card frames each agent's purpose + declared autonomy/integrity bounds; the EU AI Act risk-classification extension + the OWASP Agentic Top 10 threat mapping frame the risk context. Per-agent framing shipped; whole-estate framing is the customer's.
MEASURE	Partial	AIP integrity checkpoints + verdicts (per-decision), the 0–1000 Trust Rating, the published trust.mnemom.ai/slos SLIs, Safe House false-positive telemetry, and AEGIS substrate fingerprinting. Live runtime measurement; pre-deployment model eval is complementary + customer-run.
MANAGE	Partial	Policy Engine bounded-actions enforcement + Safe House observe/nudge/enforce treat detected risk; the advisory CMS + transparency log communicate incidents; AEGIS failover + the always-on responder handle respond/recover. Your org's risk-resource allocation + IR process stay yours.

"Partial" is honest: the AI RMF is a voluntary, non-certifiable framework operated by your organization. Mnemom supplies the runtime controls + verifiable evidence each function can draw on; it does not discharge your GOVERN responsibilities or certify conformity. Full mapping in /guides/eu-compliance.

How AEGIS compares.

Abbreviated from the 2026-05-23 competitive landscape research. AEGIS is the network layer; the vendors below are complementary, not replacements — see /governance for the full integration story.

Capability	Mnemom AEGIS	Cloudflare WAF	Lakera Guard	Cisco AI Defense	AWS Bedrock Guardrails	Google Model Armor
Cross-tenant Managed Rules with signed promotion	Yes — Ed25519-signed, P95 ≤ 30s propagation, public audit chain	WAF Managed Rules (web-layer, not agent-layer)	Vendor-curated threat-intel; no customer-network-derived signal	Build-time SDK embed; no runtime cross-tenant network	AWS-only; no cross-customer learning	In-process filter; no network
Four-checkpoint × four-mode model per-agent	Yes — front door / back door / inside.autonomy / inside.integrity, each independently configurable	Per-route WAF rules; not agent-transaction-shaped	Single-detector at runtime	NeMo Guardrails integration; build-time policy	Bedrock Guardrails per-policy (denylist, PII, contextual grounding)	Prompt-injection + URL + harmful-content filters
Substrate fingerprinting (provider + model + SDK version) on every evaluation	Yes — cross-tenant supply-chain detection	No	No	No	No	No
Public STIX 2.1 IoC feed + signed advisories	Yes — /v1/trust/iocs (empty at GA by design)	Customer-internal Radar feeds only	No public feed	Talos for traditional threats; no public agent IoC feed	No	No
Dual-control invariant on tier-1/-2 (enforced in the data model)	Yes — schema-enforced, not procedural	Procedural change-management	Vendor-controlled	Vendor-controlled	Customer policy IAM	Vendor-controlled

Sources: vendor public documentation 2026-05-23. AEGIS is a layer customers run alongside these products, not a replacement.

SLOs published. Measured continuously.

Headline numbers below. The full table — measurement queries, historical data once the first 30-day window closes, and the four supporting SLOs — lives on /trust/slos.

SLO	Target	Notes
Managed Rule propagation	P95 ≤ 30s	Signed promotion → gateway-loaded. Published target; first measurements 30 days post-GA.
Failover availability	99.99%	Gateway loads a verified rule set across multiple independent read tiers.
Rule-set freshness	P99 ≤ 5 min	Under normal operation. P0 page at 24h stale.

First 30-day measurement window publishes 30 days post-GA. We do not pre-announce numbers we cannot defend.
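
For readers who want to check a P95 target against their own latency samples, here is a minimal sketch using the nearest-rank percentile. The published measurement queries live on /trust/slos and may use a different percentile definition; the function names and sample values here are illustrative only.

```python
from typing import Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample with at least 95% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_propagation_slo(latencies_s: Sequence[float], target_s: float = 30.0) -> bool:
    """True when P95 of signed-promotion → gateway-loaded latency is within target."""
    return p95(latencies_s) <= target_s


# Hypothetical samples in seconds: one slow outlier out of twenty.
samples = [10.0] * 19 + [45.0]
print(p95(samples), meets_propagation_slo(samples))
```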

See published SLOs

Bring your tools.

The IoC feed is machine-readable STIX 2.1. The audit chain is verifiable. The dashboard is open to every customer.

curl -s https://api.mnemom.ai/v1/trust/iocs | jq .
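
Once fetched, the feed can be filtered with a few lines of code. The sketch below assumes the endpoint returns a standard STIX 2.1 bundle (a JSON object with `"type": "bundle"` and an `objects` array); `indicator_patterns` is a hypothetical helper, and the sample bundle is invented for illustration. Since the feed is empty at GA by design, an empty list is the expected result at first.

```python
import json
from typing import List


def indicator_patterns(bundle_json: str) -> List[str]:
    """Extract the detection pattern from every indicator object in a STIX bundle."""
    bundle = json.loads(bundle_json)
    if bundle.get("type") != "bundle":
        raise ValueError("expected a STIX bundle")
    return [
        obj["pattern"]
        for obj in bundle.get("objects", [])
        if obj.get("type") == "indicator" and "pattern" in obj
    ]


# Invented sample; real content would come from the curl command above.
sample = json.dumps({
    "type": "bundle",
    "id": "bundle--00000000-0000-4000-8000-000000000000",
    "objects": [
        {"type": "indicator", "pattern_type": "stix",
         "pattern": "[domain-name:value = 'evil.example']"},
        {"type": "identity", "name": "Example Org"},
    ],
})
print(indicator_patterns(sample))
```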
