Research Paper: Trust Boundary Failures in AI Coding Agents

Trust Boundary Failures in AI Coding Agents: Empirical Analysis of MCP Configuration Attacks in Claude Code

DOI: 10.5281/zenodo.19011781
Full Paper: Zenodo

Abstract

AI coding agents grant large language models access to file systems, terminals, and external services through protocols such as the Model Context Protocol (MCP). The trust models governing that access were designed for human users, not autonomous agents processing attacker-controlled input. This paper presents three empirical findings in Anthropic’s Claude Code (v2.1.63) demonstrating systemic trust boundary failures in MCP server configuration handling, tool confirmation prompts, and workspace trust escalation. All findings were reported through Anthropic’s HackerOne Vulnerability Disclosure Program and closed as Informative. Rather than contesting that design decision, this paper reframes the findings from an enterprise defensive perspective and proposes compensating controls including virtual desktop infrastructure (VDI) isolation, MCP configuration integrity monitoring, and credential management practices adapted for AI-assisted development workflows. ...

March 14, 2026 · 2 min

Claude Code Finding 1: Silent Command Execution via .mcp.json Trust Model

Introduction

This post documents the first finding from my security research into Claude Code’s MCP (Model Context Protocol) trust model. The research demonstrates that after a user grants initial trust to an MCP server, commands introduced by subsequent modifications to .mcp.json execute silently on the next Claude Code launch with no re-validation, no re-prompting, and no user visibility. This was reported to Anthropic via HackerOne and closed as Informative (by-design behavior per their workspace trust model). ...

March 12, 2026 · 3 min
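Finding 1 above concerns changes to .mcp.json that take effect without re-prompting, and the abstract names MCP configuration integrity monitoring as a compensating control. A minimal sketch of such a check follows. It is an illustration under stated assumptions, not the paper's implementation: the helper names (`fingerprint_mcp_config`, `record_baseline`, `config_changed`) and the idea of storing the baseline digest in a separate file are hypothetical choices made for this example.

```python
import hashlib
import json
from pathlib import Path


def fingerprint_mcp_config(text: str) -> str:
    """Return a SHA-256 hex digest of the canonicalized JSON config.

    Canonicalizing (sorted keys, no extra whitespace) means that a pure
    reformat of the file does not register as a change, while any change
    to a key or value does.
    """
    parsed = json.loads(text)
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_baseline(config_path: Path, baseline_path: Path) -> None:
    """Store the current digest; run this only after a human has reviewed the config."""
    digest = fingerprint_mcp_config(config_path.read_text(encoding="utf-8"))
    baseline_path.write_text(digest + "\n", encoding="utf-8")


def config_changed(config_path: Path, baseline_path: Path) -> bool:
    """Return True if the config differs from the baseline or no baseline exists."""
    if not baseline_path.exists():
        return True  # Nothing reviewed yet, so treat the config as untrusted.
    current = fingerprint_mcp_config(config_path.read_text(encoding="utf-8"))
    return baseline_path.read_text(encoding="utf-8").strip() != current
```

A check like this could run before the agent is launched (for example from a wrapper script or a CI job) and block or alert when `config_changed` returns True. The baseline file should live somewhere the repository contents cannot overwrite, otherwise an attacker who can modify .mcp.json can also modify the baseline.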

Claude Code Finding 2: MCP Blanket Trust Escalation via enableAllProjectMcpServers

Introduction

This is the second finding from my Claude Code security research. It examines the enableAllProjectMcpServers flag, set when a user selects “Use this and all future MCP servers in this project” in the MCP trust dialog. This option grants permanent, irrevocable trust to any MCP server definition added to the project’s .mcp.json in the future, with no mechanism to review, audit, or revoke trust for individual servers after the fact. ...

March 12, 2026 · 3 min
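Because Finding 2 above turns on a single flag, a defender may want to audit settings files for it. The sketch below is a hedged illustration: the excerpt does not say which file stores enableAllProjectMcpServers or at what nesting level, so the function takes already-parsed JSON and searches it recursively. The function name `blanket_trust_enabled` is hypothetical.

```python
import json
from typing import Any

# Flag name taken from the finding; where it is stored is an assumption
# left to the caller, who supplies the parsed settings document.
FLAG = "enableAllProjectMcpServers"


def blanket_trust_enabled(settings: Any) -> bool:
    """Return True if FLAG is set to JSON true anywhere in the parsed settings."""
    if isinstance(settings, dict):
        for key, value in settings.items():
            if key == FLAG and value is True:
                return True
            if blanket_trust_enabled(value):
                return True
    elif isinstance(settings, list):
        return any(blanket_trust_enabled(item) for item in settings)
    return False


if __name__ == "__main__":
    sample = json.loads('{"enableAllProjectMcpServers": true}')
    print(blanket_trust_enabled(sample))
```

The recursive search trades precision for robustness: it may match the key at an unexpected depth, but it does not depend on an assumed file layout.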

Claude Code Finding 3: MCP Tool Confirmation Prompt Misrepresentation Enables Arbitrary Code Execution

Introduction

This is the third finding from my Claude Code security research, and the one I consider the most impactful. A malicious MCP server can completely misrepresent what it does in Claude Code’s tool confirmation prompt, causing a user to approve what appears to be a safe file read while the server silently executes arbitrary OS-level commands and writes files outside the project directory. This was submitted to Anthropic via HackerOne. ...

March 12, 2026 · 3 min