A Layered Risk and Controls Framework for Prompt Injection in Enterprise AI Tooling

DOI: 10.5281/zenodo.19805848

Full Paper: Zenodo

Abstract

Prompt injection has become the dominant security concern in enterprise deployments of large language model (LLM) tools and agentic assistants. The published research base, beginning with Perez and Ribeiro and Greshake et al., establishes that prompt injection is a property of how language models follow instructions and not a bug to be patched. Despite this, much of the practitioner literature continues to treat prompt injection as a single-layer problem solved by content classifiers at the model boundary. This paper argues that this framing materially underestimates enterprise risk. The paper decomposes the end-to-end execution path of an enterprise AI tool (typified by AI coding assistants such as Claude Code, Gemini Code Assist, Windsurf, and Cursor) into ten distinct layers, each with its own threat surface, control surface, and observable telemetry. At each layer the paper identifies the published threats, available controls, and the empirically reported efficacy or limitation of those controls. The result is synthesized into a controls matrix that maps each layer to the contemporary state of the art. The conclusion drawn from this synthesis is that prompt injection cannot be eliminated at any single layer; defense must be distributed across all ten, with explicit acceptance that residual risk remains.

Key Contributions

Ten-layer execution model. A decomposition of the enterprise AI tool execution path covering prompt origination, workspace state, client application, context assembly, external content ingestion, network transport, vendor authentication, foundation-model (FM) gateway, model inference, and tool execution. Each layer is analyzed for threats, controls, and published efficacy.
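As an illustration only, the ten layers can be written down as an enumeration and used to check which layers a given set of controls leaves uncovered. This is a minimal sketch, not the paper's notation: the numeric ordering simply follows the order in which the layers are listed above, and the control names and their layer assignments in the usage example are hypothetical.

```python
from enum import Enum


class Layer(Enum):
    # Names follow the layer list above; the numbering is an assumption.
    PROMPT_ORIGINATION = 1
    WORKSPACE_STATE = 2
    CLIENT_APPLICATION = 3
    CONTEXT_ASSEMBLY = 4
    EXTERNAL_CONTENT_INGESTION = 5
    NETWORK_TRANSPORT = 6
    VENDOR_AUTHENTICATION = 7
    FM_GATEWAY = 8
    MODEL_INFERENCE = 9
    TOOL_EXECUTION = 10


def uncovered_layers(controls):
    """Return the layers that no control in `controls` claims to protect.

    `controls` maps a control name to the set of Layer members it covers.
    """
    covered = set().union(*controls.values())
    return [layer for layer in Layer if layer not in covered]


# Hypothetical control-to-layer assignments, for illustration only.
example_controls = {
    "origin allowlist": {Layer.EXTERNAL_CONTENT_INGESTION},
    "write-action confirmation": {Layer.TOOL_EXECUTION},
}

for layer in uncovered_layers(example_controls):
    print(layer.name)
```

A gap report of this kind is only a bookkeeping aid: a layer listed as covered still carries residual risk, which is the point the abstract makes about distributing defenses.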

Taxonomy of eight prompt-injection subclasses with worked examples. Direct instruction override, indirect injection via retrieved content, tool result injection, tool catalog poisoning, imperceptible character injection (zero-width Unicode and bidirectional control characters), multimodal injection, adversarial-suffix attacks, and confused deputy via tool privilege. Each subclass is demonstrated through a worked code-block example drawn from the published attack literature.
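The imperceptible-character subclass lends itself to a short illustration. The sketch below is not taken from the paper; it assumes a simple pre-processing step that flags zero-width and bidirectional control characters in text before it reaches context assembly. The function names and the chosen character sets are assumptions, and the sets are not exhaustive.

```python
import unicodedata

# Bidirectional embedding, override, and isolate controls.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}
# Common zero-width characters (not an exhaustive list).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def find_imperceptible(text):
    """Return (index, code point, kind) for each invisible format character."""
    hits = []
    for i, ch in enumerate(text):
        if ch in BIDI_CONTROLS:
            kind = "bidi-control"
        elif ch in ZERO_WIDTH:
            kind = "zero-width"
        elif unicodedata.category(ch) == "Cf":
            # Any other Unicode "format" character, e.g. tag characters.
            kind = "other-format"
        else:
            continue
        hits.append((i, f"U+{ord(ch):04X}", kind))
    return hits


def strip_imperceptible(text):
    """Remove every character that find_imperceptible would flag."""
    flagged = {i for i, _, _ in find_imperceptible(text)}
    return "".join(ch for i, ch in enumerate(text) if i not in flagged)


sample = "ignore\u200b previous\u202e text"
print(find_imperceptible(sample))
print(strip_imperceptible(sample))
```

Stripping is a blunt choice: zero-width joiners and non-joiners are legitimate in emoji sequences and in several scripts, so a deployment might prefer to flag and log rather than remove.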

Formal threat model in security-protocol notation. Subjects, objects, capabilities, and trust assumptions are defined explicitly. The security property prompt injection violates is named (PROP-1) and the path of violation is traced through the published attacks.

Controls matrix grounded in benchmark efficacy. A 33-row table mapping defenses to the layers they protect, with empirical numbers from cited benchmarks: AgentDojo (47.7 percent baseline targeted attack success on GPT-4o reduced to 6.8 percent with the strongest evaluated defense), InjecAgent (23.6 percent base / 47.0 percent enhanced on prompted GPT-4), Spotlighting (above 50 percent reduced to under 2 percent in Hines et al.), and BIPIA white-box defenses (Vicuna-7B 12.4 percent reduced to 0.5 percent).
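To make the before-and-after figures easier to compare, the relative reduction in attack success rate can be computed directly from the two pairs that have point values. The sketch below uses only the numbers quoted above; the function name is an assumption. Spotlighting is omitted because its figures are reported as bounds, and the InjecAgent pair is a base versus enhanced attack setting, not a defended result.

```python
def relative_reduction(baseline_pct, defended_pct):
    """Fraction of the baseline attack success rate removed by a defense."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_pct - defended_pct) / baseline_pct


# (baseline %, defended %) as quoted in the controls-matrix summary above.
REPORTED = {
    "AgentDojo, GPT-4o, strongest evaluated defense": (47.7, 6.8),
    "BIPIA white-box, Vicuna-7B": (12.4, 0.5),
}

for name, (baseline, defended) in REPORTED.items():
    reduction = relative_reduction(baseline, defended)
    print(f"{name}: {reduction:.1%} relative reduction")
```

The two percentages come from different benchmarks, models, and attack sets, so they are not directly comparable; the computation only restates each reported pair on a common scale.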

Cross-benchmark ablation analysis. A structured aggregation of published defense numbers with explicit cross-benchmark comparability caveats, identifying the strongest defenses (tool-restriction, origin-restriction, spotlighting) and the weakest (data delimiting, repeat-prompt).

Mappings to enterprise risk management standards. NIST AI RMF, ISO/IEC 42001, OWASP Top 10 for LLM Applications, and MITRE ATLAS are mapped to the ten-layer model so the paper integrates with existing enterprise risk frameworks rather than competing with them.

Tools Analyzed

The framework abstracts across the dominant enterprise AI coding assistants: Claude Code, Gemini Code Assist, Windsurf, and Cursor. A per-tool mapping table shows how the ten layers map to each product’s specific implementation.

Position

Prompt injection is a probabilistic risk to be managed through blast-radius reduction and detection, not a defect to be eliminated. Enterprise architectures that manage it the way they manage phishing risk, rather than the way they patch vulnerabilities, are better positioned. The most reliable defenses operate not at the model layer but at workspace configuration, origin allowlisting on retrieved content, write-action authorization, and the cross-layer telemetry needed for detection and response.
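Two of the controls named here, origin allowlisting and write-action authorization, can be sketched as a single gate in front of tool execution. This is an illustrative sketch under stated assumptions, not the paper's design: the allowlisted hosts, the tool names, and the policy that any non-allowlisted content in context blocks write actions are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist and tool names, for illustration only.
ALLOWED_ORIGINS = {"docs.example.com", "wiki.example.com"}
WRITE_ACTIONS = {"write_file", "run_shell", "git_push"}


def origin_allowed(url, allowed=ALLOWED_ORIGINS):
    """Accept only HTTPS URLs whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in allowed


def authorize_tool_call(tool, context_urls, user_confirmed):
    """Return (allowed, reason) for a proposed tool call.

    Read-only tools pass. Write actions are blocked if any content in the
    current context came from a non-allowlisted origin, and otherwise
    require explicit user confirmation.
    """
    if tool not in WRITE_ACTIONS:
        return True, "read-only action"
    untrusted = [u for u in context_urls if not origin_allowed(u)]
    if untrusted:
        return False, f"untrusted content in context: {untrusted[0]}"
    if not user_confirmed:
        return False, "write action requires explicit confirmation"
    return True, "authorized write"


print(authorize_tool_call("write_file", ["https://evil.test/readme"], True))
print(authorize_tool_call("write_file", ["https://docs.example.com/a"], True))
```

The gate makes no attempt to decide whether the retrieved text is malicious. It limits what an injected instruction can cause, which is the blast-radius framing of this section, and each returned reason is a natural candidate for a telemetry event.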

Future Work Identified

Five directions are named in the paper: reproducible benchmarking of layered defenses, standardized telemetry schemas for AI-tool decisions, formal analysis of trust boundaries in agentic systems, empirical evaluation of confirmation-bypass susceptibility (with a proposed user-study design), and cross-benchmark normalization for prompt-injection defenses.

Citation

Sany, J. (2026). A Layered Risk and Controls Framework for Prompt Injection in Enterprise AI Tooling. Zenodo. https://doi.org/10.5281/zenodo.19805848

74 references. CC-BY-4.0 licensed.