Mnemom AEGIS

Cross-tenant defensive network for AI agents.

Mnemom AEGIS — Adaptive Enforcement, Governance & Intelligence Substrate — is the runtime security network behind Safe House. It screens every agent transaction at four checkpoints — front door, back door, inside.autonomy, inside.integrity — each independently configurable across four enforcement modes. Signed Managed Rules carry a sub-30s P95 cross-tenant propagation SLO target (first measurements publish 30 days post-GA).

AAP declares. AIP verifies in flight. CLPI governs and anchors. Safe House screens. AEGIS signs the cross-tenant defenses.

The threat model.

Seven attack patterns drive the agentic threat surface today. Each maps to one of the four checkpoints — so customers can dial enforcement per surface, not as a single global posture.

| Threat | Checkpoint | What it looks like |
| --- | --- | --- |
| Prompt injection | front door | Direct attempts to override the agent's instructions, role-swap, or bypass declared scope at the inbound surface. |
| Indirect injection | front door | Instructions hidden inside retrieved documents, tool outputs, and vector-store payloads: the prompt the agent never knew it received. |
| Tool misuse | inside.autonomy | Coerced or chained tool calls that exceed the agent's declared autonomy bounds or violate the org's protection-card protected surface (forbidden ops, protected assets). Argument-shape attacks against under-validated schemas (OWASP ASI02). |
| Data exfiltration | back door | PII, PHI, secrets, credentials, or cross-tenant data echoed back in agent responses, error traces, or split-token patterns. |
| BEC / impersonation fraud | front door | CEO-fraud style requests, urgency-and-authority pressure, social engineering that targets the agent's escalation contract. |
| Agent spoofing | inside.integrity | Identity-abuse attempts that claim authority the Alignment Card does not declare (OWASP ASI03, Identity & Privilege Abuse). |
| Supply-chain compromise | inside.integrity | Behavioral signatures consistent with a compromised SDK, model fine-tune, or vendored prompt template, caught cross-tenant via substrate fingerprinting (OWASP ASI04). |

Four checkpoints × four enforcement modes.

Every checkpoint is independently configurable. Composition is strictest-wins across Platform → Org → Team → Agent, so a stricter setting at any layer always governs. It mirrors the way Cloudflare WAF Managed Rules let you set severity × action per rule.

Modes: off, observe, nudge, enforce.

| Checkpoint | What it screens |
| --- | --- |
| front door | Inbound message screening: every prompt, retrieval payload, and tool response before the agent processes it. |
| back door | Outbound response screening: PII, secrets, Alignment Card violations, and regulated advice before the response leaves the perimeter. |
| inside.autonomy | Tool-call screening: every action the agent takes, checked against the autonomy bounds the Alignment Card declares and the org's protection-card protected surface (forbidden ops, protected assets). |
| inside.integrity | Reasoning-integrity screening: AIP verdicts on thinking-block payloads, substrate-deviation signatures, and identity-abuse patterns. |

off

Checkpoint disabled. Used in canary tenants and pre-onboarding.

observe

Evaluates every transaction; emits signed verdicts; never blocks. The default for new Managed Rules during the 24-hour observe soak.

nudge

Annotates or warns inline without blocking. The middle ground for tier-3 rules during ramp-up.

enforce

Blocks the transaction and surfaces a signed verdict to the dashboard. Reached only after the observe soak and FP-rate rollback discipline — operator-confirmed today, automatic in CLPI Phase 2.

Composition cascade: Platform → Org → Team → Agent, strictest-wins. Customer admins clamp at any layer.
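
The strictest-wins cascade can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Mnemom's implementation: it assumes the four modes rank off < observe < nudge < enforce by strictness, that an unset layer is skipped, and that a checkpoint with no setting at any layer resolves to off. The names `Mode`, `LAYERS`, and `effective_mode` are hypothetical.

```python
from enum import IntEnum
from typing import Mapping, Optional


class Mode(IntEnum):
    # Assumed strictness order: a higher value is stricter.
    OFF = 0
    OBSERVE = 1
    NUDGE = 2
    ENFORCE = 3


# Cascade order from the text above.
LAYERS = ("platform", "org", "team", "agent")


def effective_mode(settings: Mapping[str, Optional[Mode]]) -> Mode:
    """Return the strictest mode set at any layer; unset layers are skipped."""
    chosen = [settings[layer] for layer in LAYERS if settings.get(layer) is not None]
    # Assumption: with nothing configured, the checkpoint is treated as off.
    return max(chosen, default=Mode.OFF)


front_door = {
    "platform": Mode.OBSERVE,
    "org": Mode.ENFORCE,
    "team": None,
    "agent": Mode.NUDGE,
}
print(effective_mode(front_door).name)  # ENFORCE
```

Because the result is a plain maximum, a lower layer can tighten a checkpoint but can never loosen what a higher layer has set.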

The Managed Rules pipeline.

Recipes are detection content. Managed Rules are the signed control-plane state that wraps them. The pipeline is structurally constrained — not procedurally — so tier-1 and tier-2 rules cannot auto-promote, regardless of operator-set mode.

  1. Arena

    Fifteen canonical adversarial personas probe Safe House 24/7. Mutation-phase gating activates per-bucket only when detection rate crosses 95% over a 48-hour rolling window with 24-hour hysteresis.

  2. Candidate

    Candidates that slip past the arena enter an isolated review queue with a strictly separated write path, so the system that proposes detection content can never be the same one that approves it. Customer false-negative and false-positive reports and cross-tenant network signals all flow into the same queue.

  3. Review

    Three reviewer modes — manual (default), auto-approve-trusted-sources, auto-approve-high-confidence. Tier-1 / tier-2 always require dual-control review under an append-only audit chain.

  4. 24h observe soak

    Every signed promotion lands in observe mode for 24 hours. FP-rate monitoring retires the recipe before any production traffic is blocked — operator-confirmed today, automatic in CLPI Phase 2.

  5. Enforce

    Tiered KV+R2+isolate-cache failover with independent signing chains pushes the rule to every gateway. P95 ≤ 30s signed-promotion → gateway-loaded.
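
The observe-soak step in the pipeline above can be pictured as a small decision function. The sketch below is hypothetical: the 24-hour soak comes from the text, but the false-positive threshold, the `SoakStats` fields, and the three outcome labels are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

SOAK = timedelta(hours=24)   # observe soak length stated in the pipeline
MAX_FP_RATE = 0.01           # hypothetical threshold, not a published number


@dataclass
class SoakStats:
    promoted_at: datetime
    evaluations: int
    false_positives: int


def next_step(stats: SoakStats, now: datetime) -> str:
    """Suggest what an operator would be asked to confirm for a soaking rule."""
    fp_rate = stats.false_positives / stats.evaluations if stats.evaluations else 0.0
    if fp_rate > MAX_FP_RATE:
        return "retire"                # roll back before any traffic is blocked
    if now - stats.promoted_at < SOAK:
        return "keep-observing"        # soak window still open
    return "eligible-for-enforce"      # still needs operator confirmation


t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(next_step(SoakStats(t0, 10_000, 3), t0 + timedelta(hours=25)))
```

The false-positive check runs before the elapsed-time check, so a noisy rule is retired even if its 24 hours have not yet passed.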

The protective invariant

A tier-1 or tier-2 Managed Rule — one that would actually block real production traffic — can never be promoted without two-person human review, no matter how aggressive the auto-promotion mode is set. The guarantee is enforced structurally, in the data model itself: an active rule cannot exist unless its review quorum has been met. It is a property of the system, not a procedure someone has to remember to follow.

Guaranteed by the data model, not by operator discipline.
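
One way to picture an invariant that lives in the data model rather than in a runbook is a type that refuses to construct an invalid state. The sketch below is an in-memory analogue only; the source does not say how the constraint is implemented, and the `ManagedRule` fields and status values are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

REVIEW_QUORUM = 2  # two-person review, per the invariant above


@dataclass(frozen=True)
class ManagedRule:
    rule_id: str
    tier: int
    status: str                              # "candidate" or "active" (assumed states)
    approvers: FrozenSet[str] = frozenset()

    def __post_init__(self) -> None:
        # A set de-duplicates reviewers, so one person approving twice counts once.
        needs_quorum = self.tier in (1, 2) and self.status == "active"
        if needs_quorum and len(self.approvers) < REVIEW_QUORUM:
            raise ValueError(
                f"{self.rule_id}: tier-{self.tier} rule needs "
                f"{REVIEW_QUORUM} distinct reviewers"
            )


ManagedRule("r-1", tier=1, status="active", approvers=frozenset({"alice", "bob"}))
try:
    ManagedRule("r-2", tier=2, status="active", approvers=frozenset({"alice"}))
except ValueError as err:
    print(err)
```

Since the check runs in the constructor of a frozen type, there is no code path in this sketch that yields an active tier-1 or tier-2 rule without its quorum.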

Substrate fingerprinting + supply-chain detection.

Every evaluation is stamped with a substrate fingerprint — the provider, model, and SDK version behind the request, plus an optional customer-supplied lockfile hash sent via the `X-Mnemom-Lockfile-Hash` header. AEGIS sees behavioral deviation across every customer running on the same substrate, simultaneously.
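
A client could compute the optional lockfile hash roughly as follows. Only the header name `X-Mnemom-Lockfile-Hash` comes from the text above; the choice of SHA-256, the hex encoding, and the helper names are assumptions, so check the product documentation for the expected digest format before relying on this.

```python
import hashlib
from pathlib import Path
from typing import Dict


def lockfile_hash(path: str) -> str:
    """Hex SHA-256 of the lockfile bytes (digest choice is an assumption)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def fingerprint_headers(lockfile_path: str) -> Dict[str, str]:
    """Build the extra request header carrying the lockfile hash."""
    # Only the header name is taken from the source text.
    return {"X-Mnemom-Lockfile-Hash": lockfile_hash(lockfile_path)}
```

Merging `fingerprint_headers("package-lock.json")` into the headers of each agent request would attach the hash; the lockfile path here is just an example.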

May 11, 2026 — the Mini Shai-Hulud worm compromised 170+ npm packages and 2 PyPI packages, including Mistral AI's SDK suite and Guardrails AI's PyPI package. The compromised `@tanstack/*` versions shipped with valid SLSA Build Level 3 attestations — the first documented case of a worm producing legitimate signed provenance for malicious packages. Per-tenant detection and package-layer Sigstore verification structurally cannot catch this class of attack.

OWASP Top 10 for Agentic Applications.

Honest mapping against the authoritative OWASP Top 10 for Agentic Applications (OWASP Gen AI Security Project, released 2025-12-09). Where coverage is partial or absent, we say so — see genai.owasp.org for the full ASI taxonomy.

OWASP Top 10 for Agentic Applications (genai.owasp.org)

| OWASP category | Coverage | How AEGIS addresses it |
| --- | --- | --- |
| ASI02 (Tool Misuse) | Partial | Policy engine (CLPI Phase 1) bounded-actions enforcement + forbidden-rule Managed Rules at the inside.autonomy checkpoint, plus back-door screening for data-exfiltration-via-tool. Declared-scope enforcement is the primary control; Mnemom does not intercept every unsafe tool invocation at the gateway. |
| ASI03 (Identity & Privilege Abuse) | Full | AAP-declared autonomy bounds (Alignment Card) enforced by the CLPI policy engine + AIP in-flight integrity verdicts + inside.integrity checkpoint screening of runtime privilege/identity-abuse claims. |
| ASI04 (Agentic Supply Chain Vulnerabilities) | Full (runtime) | Substrate fingerprinting on every evaluation + the cross-tenant aggregator detect runtime-behavior deviation consistent with a compromised dependency/substrate that no single customer can see. Complements, and does not replace, build-time package provenance (SLSA, Sigstore). |
| ASI07 (Insecure Inter-Agent Communication) | Partial | Front-door checkpoint treats unauthenticated authority/identity claims arriving as inbound runtime messages as suspicious by design. This screens the content of inter-agent messages; legitimate agent-to-agent authority must be encoded in Alignment Cards. It is not a transport-authentication scheme. |

The remaining categories map elsewhere in the Mnemom stack, stated honestly:

ASI01 (Agent Goal Hijack): Safe House front-door screening, shipped for direct injection and substantially covering multi-turn goal redirection (residual on novel multi-turn/multi-vector sequences).
ASI09 (Human-Agent Trust Exploitation): shipped front-door detection of authority/urgency/secrecy manipulation.
ASI10 (Rogue Agents): covered at the governance layer (AAP Alignment Cards + CLPI lifecycle + Trust Ratings), not a single front-door pattern.

Honest gaps:

ASI05 (Unexpected Code Execution) and ASI06 (Memory & Context Poisoning): no front-door interception today. The policy engine reduces the action surface and AIP gives partial downstream observability; pair with an app-layer sandbox and treat memory as untrusted input.
ASI08 (Cascading Failures): an application-architecture concern (timeouts, bulkheads, circuit breakers).

See /protection-network and /trust.

NIST AI Risk Management Framework.

How Mnemom's shipped runtime controls support the four NIST AI RMF functions. Honest mapping — Mnemom is a runtime trust substrate, not an AI-risk-management program; where a function is the customer's organizational responsibility, we say so.

NIST AI Risk Management Framework (AI RMF 1.0)

| AI RMF function | Coverage | How Mnemom supports it |
| --- | --- | --- |
| GOVERN | Partial | Alignment Card as the machine-readable per-agent policy artifact (principal, oversight, autonomy envelope) + CLPI lifecycle governance + dual-control Managed Rules promotion. Your organizational governance program (roles, approval authority, third-party-model intake) stays yours. |
| MAP | Partial | Alignment Card frames each agent's purpose + declared autonomy/integrity bounds; the EU AI Act risk-classification extension + the OWASP Agentic Top 10 threat mapping frame the risk context. Per-agent framing shipped; whole-estate framing is the customer's. |
| MEASURE | Partial | AIP integrity checkpoints + verdicts (per-decision), the 0–1000 Trust Rating, the published trust.mnemom.ai/slos SLIs, Safe House false-positive telemetry, and AEGIS substrate fingerprinting. Live runtime measurement; pre-deployment model eval is complementary + customer-run. |
| MANAGE | Partial | Policy Engine bounded-actions enforcement + Safe House observe/nudge/enforce treat detected risk; the advisory CMS + transparency log communicate incidents; AEGIS failover + the always-on responder handle respond/recover. Your org's risk-resource allocation + IR process stay yours. |

"Partial" is honest: the AI RMF is a voluntary, non-certifiable framework operated by your organization. Mnemom supplies the runtime controls + verifiable evidence each function can draw on; it does not discharge your GOVERN responsibilities or certify conformity. Full mapping in /guides/eu-compliance.

How AEGIS compares.

Abbreviated from the 2026-05-23 competitive landscape research. AEGIS is the network layer; the vendors below are complementary, not replacements — see /governance for the full integration story.

| Capability | Mnemom AEGIS | Cloudflare WAF | Lakera Guard | Cisco AI Defense | AWS Bedrock Guardrails | Google Model Armor |
| --- | --- | --- | --- | --- | --- | --- |
| Cross-tenant Managed Rules with signed promotion | Yes: Ed25519-signed, P95 ≤ 30s propagation, public audit chain | WAF Managed Rules (web-layer, not agent-layer) | Vendor-curated threat-intel; no customer-network-derived signal | Build-time SDK embed; no runtime cross-tenant network | AWS-only; no cross-customer learning | In-process filter; no network |
| Four-checkpoint × four-mode model per-agent | Yes: front door / back door / inside.autonomy / inside.integrity, each independently configurable | Per-route WAF rules; not agent-transaction-shaped | Single-detector at runtime | NeMo Guardrails integration; build-time policy | Bedrock Guardrails per-policy (denylist, PII, contextual grounding) | Prompt-injection + URL + harmful-content filters |
| Substrate fingerprinting (provider + model + SDK version) on every evaluation | Yes: cross-tenant supply-chain detection | No | No | No | No | No |
| Public STIX 2.1 IoC feed + signed advisories | Yes: /v1/trust/iocs (empty at GA by design) | Customer-internal Radar feeds only | No public feed | Talos for traditional threats; no public agent IoC feed | No | No |
| Dual-control invariant on tier-1/-2 (enforced in the data model) | Yes: schema-enforced, not procedural | Procedural change-management | Vendor-controlled | Vendor-controlled | Customer policy IAM | Vendor-controlled |

Sources: vendor public documentation 2026-05-23. AEGIS is a layer customers run alongside these products, not a replacement.

SLOs published. Measured continuously.

Headline numbers below. The full table — measurement queries, historical data once the first 30-day window closes, and the four supporting SLOs — lives on /trust/slos.

Managed Rule propagation
P95 ≤ 30s

Signed promotion → gateway-loaded. Published target; first measurements 30 days post-GA.

Failover availability
99.99%

Gateway loads a verified rule set across multiple independent read tiers.

Rule-set freshness
P99 ≤ 5 min

Under normal operation. P0 page at 24h stale.

First 30-day measurement window publishes 30 days post-GA. We do not pre-announce numbers we cannot defend.
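
For readers who want to check a propagation target like P95 ≤ 30s against their own samples, here is a minimal sketch. It uses the nearest-rank percentile definition, which is an assumption: the measurement queries on /trust/slos may use a different method. The function names and sample values are hypothetical.

```python
from typing import Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of the samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # ceil(0.95 * n) in integer arithmetic, as a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_target(samples: Sequence[float], target_s: float = 30.0) -> bool:
    """True when the P95 of the samples is at or under the target, in seconds."""
    return p95(samples) <= target_s


# Made-up latencies in seconds: signed promotion to gateway-loaded.
latencies = [4.0] * 18 + [12.0, 41.0]
print(p95(latencies), meets_target(latencies))  # 12.0 True
```

A P95 target tolerates a slow tail: the single 41-second sample above does not breach it, because 19 of the 20 samples are at or under 12 seconds.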


Bring your tools.

The IoC feed is machine-readable STIX 2.1. The audit chain is verifiable. The dashboard is open to every customer.

curl -s https://api.mnemom.ai/v1/trust/iocs | jq .
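
The same feed can be read from Python with the standard library alone. This is a sketch: the URL is the one in the curl command, but the response being a single STIX 2.1 bundle with indicator objects is an assumption, and `fetch_bundle` and `indicators` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List

FEED_URL = "https://api.mnemom.ai/v1/trust/iocs"  # endpoint from the curl line above


def fetch_bundle(url: str = FEED_URL, timeout: float = 10.0) -> Dict[str, Any]:
    """Download and parse the feed (assumed to be one STIX 2.1 bundle)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def indicators(bundle: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return STIX indicator objects; a bundle without objects yields []."""
    # "objects" is optional in a STIX 2.1 bundle, so an empty feed is valid.
    return [o for o in bundle.get("objects", []) if o.get("type") == "indicator"]
```

Calling `indicators(fetch_bundle())` returns an empty list for an empty feed instead of raising, which is the state the comparison table describes at GA.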