The next major security crisis in AI won't be a user being fooled by an agent. It will be an agent being fooled by another agent. As multi-agent systems move from research papers to production deployments, a new attack surface is emerging: agents targeting other agents with crafted inputs designed to manipulate their decisions, extract sensitive information, or trigger unauthorized actions.
The Multi-Agent Security Problem
Multi-agent architectures are proliferating because they work. A procurement agent communicates with multiple supplier agents. A financial analysis agent queries data-gathering sub-agents. A customer support agent escalates to a specialized claims agent. Each inter-agent communication is a potential vulnerability.
The attacks are subtle. Unlike traditional API security (which focuses on authentication and authorization), agent-to-agent attacks exploit the semantic layer. An attacker-controlled agent sends a message that appears legitimate but contains carefully crafted claims designed to trigger a specific behavior in the receiving agent. Consider:
- A malicious supplier agent sends a procurement agent a message claiming "Your CFO just approved a 30% price increase via email" — a claim the receiving agent might accept without verification.
- A compromised data-gathering sub-agent feeds a financial analysis agent falsified market data that causes the analyst agent to recommend a bad trade.
- An attacker injects a fraudulent agent into a multi-agent coordination system, and that agent systematically extracts sensitive customer data from other agents by asking seemingly legitimate questions.
"The difference between a vulnerable multi-agent system and a secure one is simple: does Agent A accept Agent B's claims at face value, or does it verify? Every unverified inter-agent message is a potential attack vector."
Agent Identity Verification
The first line of defense in agent-to-agent trust is verifying who you're talking to. Agent identity verification establishes that a communicating agent is who it claims to be and is authorized to send the messages it's sending.
SignalStack's agent identity system issues cryptographic identities to every registered agent. Each identity consists of a key pair: the private key is stored in a hardware-backed secure enclave (or a cloud HSM for server-side agents), and the public key is registered with SignalStack's identity service. Every inter-agent message is signed with the sender's private key, and the receiving agent verifies the signature using the sender's registered public key.
This sounds similar to standard mTLS or JWT-based authentication, but there's a critical difference: agent identities need to be verifiable by other agents without a central authentication server being online at all times. SignalStack uses a distributed identity model where each agent caches the public keys of its trusted peers and verifies signatures locally. The identity service provides revocation and rotation, but the verification path is offline-capable — essential for low-latency multi-agent interactions.
Inter-Agent Message Validation
Identity verification tells you who sent a message. Message validation tells you whether the content of that message can be trusted. This is where agent-to-agent trust diverges most sharply from traditional API security.
A signed message from a verified agent might still contain false or manipulative content. The agent on the other end may have been compromised after signing, or the agent may have been malicious from the start but initially passed identity verification. Message validation treats every inter-agent communication as potentially untrustworthy and verifies the factual claims within it.
SignalStack's claim verification system (documented at /product/claim-verification) is designed to be called as part of inter-agent message processing. When Agent A receives a message from Agent B containing factual claims ("Sales hit $10M this quarter," "The contract was approved," "The API is deprecated"), Agent A can route those claims through verification before acting on them.
// Inter-agent message validation flow
async function handleAgentMessage(message, senderId) {
// Step 1: Verify identity
const identity = await verifyIdentity(senderId, message.signature)
if (!identity.verified) {
return { action: 'reject', reason: 'invalid_signature' }
}
// Step 2: Extract and verify factual claims
const claims = extractClaims(message.content)
const verificationResults = await verifyClaims(claims, {
trust_model: 'agent_to_agent',
threshold: 0.9
})
// Step 3: Check reputation
const reputation = await getSenderReputation(senderId)
// Step 4: Decide based on verification + reputation
if (verificationResults.trustScore < 0.9 || reputation.score < 0.7) {
return { action: 'escalate', reason: 'low_trust' }
}
return { action: 'process' }
}Reputation Systems for Agents
Identity and message validation provide point-in-time verification. Reputation systems add a temporal dimension — an agent's history of truthful or deceptive behavior informs how much its future messages should be trusted.
SignalStack's reputation system tracks three metrics for every agent: truthfulness (the ratio of verified-true claims to total claims made), reliability (the agent's uptime and response consistency), and reciprocity (whether the agent itself verifies claims from other agents before acting on them). These metrics are aggregated into a reputation score that decays over time — a consistently truthful agent that suddenly starts making unverifiable claims will see its reputation drop rapidly.
Reputation scores are shared across the SignalStack network, but with important privacy controls. An agent can opt into public reputation sharing (its score is visible to all other agents in the network) or limited sharing (score visible only within its organization or approved partner network). Organizations can also run private reputation networks where scores are computed from a closed set of agents.
Sybil Resistance and Reputation Gaming
Any reputation system must defend against Sybil attacks — an attacker creating many identities to manipulate reputation scores. SignalStack's approach combines proof-of-unique-identity (each agent must be registered to a verified organizational account) with economic stakes (agents with high reputation have more "skin in the game" in the form of staked verification credits). A Sybil attack requires either compromising multiple organizational accounts or posting significant economic collateral, both of which are expensive enough to make most attacks uneconomical.
Attack Scenarios in Production
Understanding the attack scenarios helps ground the architectural discussion in concrete risks. Here are the most common agent-to-agent attack patterns we've observed in production deployments.
The Trusted Insider Compromise
The most dangerous attack vector is not an external attacker but a compromised agent that has already built up reputation. Consider an agent that has been operating for six months with a 0.95 reputation score. It has access to other agents' sensitive outputs, participates in decision-making workflows, and is trusted by its peers. If this agent is compromised — via a prompt injection in its input stream, a supply chain attack on its dependencies, or a credential leak — it becomes an internal threat that can exfiltrate data, influence decisions, and manipulate other agents. Defending against this scenario requires not just identity and reputation but continuous behavioral monitoring: detecting when an agent's behavior deviates from its historical patterns.
The Injection Cascade
Multi-agent systems often have chains of trust: Agent A delegates to Agent B, which delegates to Agent C. A prompt injection at Agent C doesn't just compromise Agent C — it potentially compromises every agent that trusts Agent C's outputs. This cascade effect means that a single vulnerability in a downstream agent can propagate upward through the entire system. The defense is transitive verification: each agent in the chain independently verifies the claims it acts upon, rather than trusting upstream agents' verification decisions.
The Data Exfiltration Ring
Attackers may not want to trigger actions — they may simply want to extract information. A compromised agent in a multi-agent system can systematically query other agents for sensitive data: customer records, financial projections, proprietary code, internal strategy documents. Because each query appears legitimate (the compromised agent has a valid identity and access), traditional monitoring may not flag the exfiltration. SignalStack's behavioral monitoring detects these patterns by analyzing query velocity and content patterns: an agent that suddenly starts querying for data outside its normal scope triggers an alert.
Building a Trusted Multi-Agent Architecture
The practical steps to securing a multi-agent system follow a clear progression. Start with identity — every agent gets a cryptographic identity and signs every message. Add message validation — every factual claim in an inter-agent message is verified before it's acted upon. Layer in reputation — track agent behavior over time and weight trust decisions by historical reliability. Finally, implement escalation — define clear thresholds where low-trust messages are routed to human reviewers instead of being processed automatically.
SignalStack provides all four layers through a unified API. The security architecture is documented at /security, and the claim verification system at /product/claim-verification includes agent-to-agent verification as a first-class use case. The key insight is that multi-agent security is not a separate problem from AI verification — it is AI verification applied to the new reality where the consumers of AI outputs are other AI systems.
When designing a multi-agent system, assume every inter-agent message is an attack until proven otherwise. Implement verification before you need it — retrofitting trust into a multi-agent system after a compromise is exponentially harder than building it in from the start. Start with identity verification and claim verification on every inter-agent message, then add reputation scoring as your agent network grows.
Conclusion
Agent-to-agent trust is the defining security challenge of the multi-agent era. As autonomous systems begin communicating with each other at machine speed and machine scale, the verification infrastructure that protects these interactions must be equally fast and scalable. Cryptographic identities, inter-agent message validation, and reputation systems form the three pillars of agent-to-agent trust. SignalStack's verification platform at /product/claim-verification and security framework at /security provide the infrastructure to build multi-agent systems that are secure by default.
Luke Swestun is the founder of SignalStack. He writes about trust infrastructure, hallucination detection, and building AI agents that can verify before they act.