AI Verification·8 min read

From Zero to Verified

How we built a trust layer for 3,000+ AI agents — the engineering story behind SignalStack.

Luke Swestun·December 9, 2025

Every platform has an origin story. SignalStack's began with a problem we kept seeing across the AI-native companies we advised: teams were building increasingly capable autonomous agents, and those agents were increasingly making decisions that no one could verify. A customer support agent hallucinating a refund policy was embarrassing. A procurement agent hallucinating a supplier's certification was a six-figure liability. A financial analyst agent hallucinating a company's revenue was a regulatory violation waiting to happen.

We talked to forty teams building production agent systems. Every single one told us some version of the same thing: "We know our agents hallucinate sometimes. We don't have a good way to catch it systematically." Some teams used manual spot-checking. Some used simple keyword filters. Some relied on prompt engineering and hope. None of it was working at scale.

The Technical Challenge

Building a trust layer for AI agents turned out to be a harder problem than we initially expected. The core challenge was architectural: how do you build a verification system that can keep up with agents making thousands of decisions per minute, across multiple model providers, data domains, and risk profiles?

The first architectural decision was separation of concerns. Verification had to be an independent service, not a library embedded in the agent. A library can be bypassed, versioned inconsistently, or simply not called by a rushed developer. A service, on the other hand, can be enforced at the infrastructure layer — agents authenticate to the verification service, and no decision reaches production without a verification receipt.
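The following is a minimal TypeScript sketch of the service-boundary idea. It shows an infrastructure-level gate that refuses any agent decision lacking a matching verification receipt. The `Decision` and `VerificationReceipt` shapes, the `admitDecision` function and the 60-second receipt lifetime are all hypothetical and are not SignalStack's actual receipt format. A real gate would also check the receipt's signature, which this sketch omits.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes; the real receipt format is not described in this article.
interface Decision {
  agentId: string;
  action: string;
  payload: Record<string, unknown>;
}

interface VerificationReceipt {
  decisionHash: string; // SHA-256 of the decision the receipt covers
  trustScore: number;   // 0..1
  issuedAt: number;     // epoch milliseconds
}

type GateResult = { allowed: true } | { allowed: false; reason: string };

// Bind a receipt to one specific decision. JSON.stringify depends on key order,
// so a production system would use a canonical serialization instead.
function hashDecision(decision: Decision): string {
  return createHash("sha256").update(JSON.stringify(decision)).digest("hex");
}

// Runs at the infrastructure layer (e.g. an egress gateway), outside the agent's
// own code, so a rushed developer cannot skip it. Signature checks are omitted.
function admitDecision(
  decision: Decision,
  receipt: VerificationReceipt | undefined,
  now: number,
  maxReceiptAgeMs = 60_000,
): GateResult {
  if (receipt === undefined) {
    return { allowed: false, reason: "missing verification receipt" };
  }
  if (receipt.decisionHash !== hashDecision(decision)) {
    return { allowed: false, reason: "receipt was issued for a different decision" };
  }
  if (now - receipt.issuedAt > maxReceiptAgeMs) {
    return { allowed: false, reason: "receipt has expired" };
  }
  return { allowed: true };
}
```

The gate lives outside the agent process, so skipping verification is not something an individual agent can decide to do.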

The second decision was verification modality. One approach would have been to build a single, highly optimized verification model — fine-tune an LLM specifically for hallucination detection and call it a day. But that approach breaks as soon as you need to verify different types of claims against different types of sources. A claim about a business entity's registration status requires a different verification mechanism than a claim about a document's authenticity. We realized that verification had to be modular: a unified API surface over a diverse set of verification engines.
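Here is a minimal sketch of a unified API over pluggable engines. It assumes a hypothetical `VerificationEngine` interface. Each engine hides its own mechanism (a registry lookup, a forensic model, and so on) behind the same `verify` method, and a `VerificationService` routes each request by claim type. The type names and the toy registry engine are illustrative only.

```typescript
// Hypothetical claim kinds, mirroring the four verification types named in this article.
type ClaimKind = "business" | "document" | "media" | "claim";

interface VerificationRequest {
  kind: ClaimKind;
  subject: string; // e.g. a company name, a document ID, or a free-text claim
}

interface EngineResult {
  engine: ClaimKind;
  score: number;      // normalized to 0..1
  evidence: string[];
}

// Every engine implements the same interface, whatever it does internally.
interface VerificationEngine {
  readonly kind: ClaimKind;
  verify(req: VerificationRequest): Promise<EngineResult>;
}

class VerificationService {
  private engines = new Map<ClaimKind, VerificationEngine>();

  register(engine: VerificationEngine): this {
    this.engines.set(engine.kind, engine);
    return this;
  }

  // The single public entry point: callers never pick an engine directly.
  async verify(req: VerificationRequest): Promise<EngineResult> {
    const engine = this.engines.get(req.kind);
    if (!engine) throw new Error(`no engine registered for "${req.kind}"`);
    return engine.verify(req);
  }
}

// Toy engine standing in for a government-registry lookup (hypothetical data).
const registryEngine: VerificationEngine = {
  kind: "business",
  async verify(req) {
    const registered = new Set(["Acme Corp"]);
    const found = registered.has(req.subject);
    return { engine: "business", score: found ? 1 : 0, evidence: found ? ["registry:match"] : [] };
  },
};
```

You can add a new verification type by registering another engine, without changing the API that agents call.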

The third decision was about trust scoring. A binary "verified / not verified" output is useless for production systems. Agents and their human operators need a granular trust score that communicates confidence, and they need to be able to set their own thresholds for what constitutes "verified enough." This drove the design of SignalStack's weighted trust scoring system, where each verification dimension produces a normalized score and the overall score is a configurable combination.
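One straightforward way to implement a configurable combination is a weighted average of the per-dimension scores. The sketch below assumes each engine already returns a score in [0, 1]. The dimension names and weights are hypothetical. The weights are renormalized over the dimensions that were actually scored, so a skipped check does not drag the total down.

```typescript
// Per-dimension scores, each already normalized to [0, 1] by its engine.
type DimensionScores = Record<string, number>;
// Team-configurable weights; they need not sum to 1 because we renormalize.
type Weights = Record<string, number>;

function trustScore(scores: DimensionScores, weights: Weights): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const [dimension, weight] of Object.entries(weights)) {
    const score: number | undefined = scores[dimension];
    if (score === undefined) continue; // dimension not checked for this request
    if (score < 0 || score > 1) {
      throw new RangeError(`${dimension} score out of range: ${score}`);
    }
    weighted += weight * score;
    totalWeight += weight;
  }
  if (totalWeight === 0) throw new Error("no weighted dimension was scored");
  return weighted / totalWeight;
}
```

For example, say `sourceMatch` scores 1.0 at weight 3 and `freshness` scores 0.5 at weight 1. The result is (3 × 1.0 + 1 × 0.5) / 4 = 0.875.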

"We didn't build SignalStack because we thought verification was a feature. We built it because we watched brilliant engineering teams deploy autonomous systems without it and realized it was only a matter of time before a catastrophic failure made headlines. We wanted that headline to be about the solution, not the crisis."

Architecture Decisions

The system that emerged from those early decisions has three architectural layers that have proven robust across widely different use cases.

Layer 1: The Verification Engine Layer

Each verification type — business, document, media, and claim — runs on its own dedicated engine. This allows independent scaling (document verification might need more GPU time, while business verification is primarily API calls to government registries) and independent iteration (we can update the media provenance engine without touching the claim verification pipeline). The engines communicate through a shared event bus, so a single verification request can fan out to multiple engines in parallel and aggregate their results.
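You can sketch the fan-out-and-aggregate pattern in-process with `Promise.allSettled`. It waits for every engine and records failures rather than aborting on the first one. This stands in for the event bus. The `Engine` function type and the result shapes are hypothetical.

```typescript
interface EngineOutcome {
  engine: string;
  score: number; // normalized 0..1
}

type Engine = (claim: string) => Promise<EngineOutcome>;

interface Aggregate {
  results: EngineOutcome[];
  failedEngines: string[];
}

// In-process stand-in for the event bus: send the request to every engine
// at once, then aggregate whatever comes back.
async function fanOut(claim: string, engines: Record<string, Engine>): Promise<Aggregate> {
  const names = Object.keys(engines);
  const settled = await Promise.allSettled(names.map((name) => engines[name](claim)));

  const results: EngineOutcome[] = [];
  const failedEngines: string[] = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") results.push(outcome.value);
    else failedEngines.push(names[i]); // keep going; report which engine failed
  });
  return { results, failedEngines };
}
```

Using `allSettled` instead of `Promise.all` means an outage in one engine degrades a single dimension instead of failing the whole verification request.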

Layer 2: The Trust Scoring Layer

The trust scoring layer receives raw verification results from the engines and computes a structured trust score. Custom trust models are applied in this layer: weights, thresholds and conditional rules. The scoring layer also maintains the evidence chain, linking verification receipts into a cryptographic sequence that gives every verified decision a tamper-evident audit trail.
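A common way to build a tamper-evident sequence is a hash chain. In a hash chain, each entry's SHA-256 hash covers both its own contents and the previous entry's hash. The sketch below assumes that design, with hypothetical `Receipt` fields; the article does not specify SignalStack's exact construction. Altering any stored receipt changes its hash and breaks the link to the next entry.

```typescript
import { createHash } from "node:crypto";

// Hypothetical receipt fields.
interface Receipt {
  decisionId: string;
  trustScore: number;
}

interface ChainedReceipt extends Receipt {
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64); // placeholder "previous hash" for the first entry

// The hash covers the previous hash plus this receipt's own fields.
function hashReceipt(r: Receipt, prevHash: string): string {
  return createHash("sha256")
    .update(prevHash)
    .update(JSON.stringify({ decisionId: r.decisionId, trustScore: r.trustScore }))
    .digest("hex");
}

function append(chain: ChainedReceipt[], r: Receipt): ChainedReceipt[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  return [...chain, { ...r, prevHash, hash: hashReceipt(r, prevHash) }];
}

// Returns the index of the first tampered entry, or -1 if the chain is intact.
function verifyChain(chain: ChainedReceipt[]): number {
  let prevHash = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const entry = chain[i];
    if (entry.prevHash !== prevHash || entry.hash !== hashReceipt(entry, prevHash)) return i;
    prevHash = entry.hash;
  }
  return -1;
}
```

A hash chain only detects tampering if the latest hash is also recorded somewhere an attacker cannot rewrite. Otherwise, someone could recompute the entire chain after editing it.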

Layer 3: The Delivery Layer

Verification is useless if the results don't reach the systems that need them. The delivery layer handles three things:

- webhook delivery with configurable guarantees (at-least-once, ordered, exactly-once)
- payload signing with HMAC-SHA256, so receivers can confirm that each payload is authentic and unaltered
- automatic retry with exponential backoff

Every verification event produces a signed receipt that can be stored independently as an audit record.
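You can sketch both halves of the delivery contract with Node's built-in `crypto` module. The sender signs the raw body with HMAC-SHA256. The receiver recomputes the digest and compares it in constant time. Failed deliveries are retried with exponentially growing delays. The function names, the hex signature encoding and the 500 ms base / 60 s cap are assumptions for illustration. HMAC uses a shared secret, so it proves to the receiver that a holder of that secret sent the payload and that nobody modified it in transit.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Sender side: sign the exact bytes that will be sent.
function signPayload(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Receiver side: recompute the HMAC and compare in constant time.
function isValidSignature(body: string, signatureHex: string, secret: string): boolean {
  const expected = Buffer.from(signPayload(body, secret), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on a length mismatch, so check length first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Exponential backoff: 500 ms, 1 s, 2 s, 4 s, ... capped at 60 s.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry until the receiver acknowledges; `sleep` is injectable for testing.
async function deliverWithRetry(
  send: () => Promise<boolean>, // resolves true when the receiver acknowledges
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(() => resolve(), ms)),
): Promise<number> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (await send().catch(() => false)) return attempt + 1; // attempts used
    if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
  }
  throw new Error(`delivery failed after ${maxAttempts} attempts`);
}
```

Production retry loops usually also add random jitter, so that many failed webhooks do not retry in lockstep.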

The Verification Scalability Problem

One of the hardest technical problems we solved was verification at scale. A single agent making one decision per minute generates 1,440 decisions per day. A fleet of 1,000 agents generates 1.4 million decisions per day. Each decision potentially needs multiple verification checks. The naive architecture — a monolithic verification service processing requests synchronously — collapses under this load.
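The worked arithmetic below checks these figures and converts them to a per-second rate. The value of k = 3 verification checks per decision is an illustrative assumption, not a measured figure. These are also daily averages; real traffic peaks will be higher.

```latex
\begin{aligned}
1{,}000\ \text{agents} \times 1\ \tfrac{\text{decision}}{\text{min}} \times 1{,}440\ \tfrac{\text{min}}{\text{day}}
  &= 1.44 \times 10^{6}\ \text{decisions/day} \\
\frac{1.44 \times 10^{6}\ \text{decisions/day}}{86{,}400\ \text{s/day}}
  &\approx 16.7\ \text{decisions/s} \\
k = 3\ \text{checks/decision} \;\Rightarrow\; 3 \times 16.\overline{6}\ \text{decisions/s}
  &= 50\ \text{verification requests/s (average)}
\end{aligned}
```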

SignalStack's architecture solves this through horizontal decomposition. Each verification engine is independently scalable, so a spike in document verification requests (e.g., during an end-of-quarter invoice processing surge) does not impact claim verification latency. The event bus architecture allows verification requests to be processed asynchronously with priority queues: time-sensitive verifications (real-time agent decisions) jump ahead of batch verifications (post-hoc auditing). This design was validated during our first production deployment, where a customer's agent fleet scaled from 50 to 800 agents in three weeks without a single verification latency incident.
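You can sketch the priority scheme as a binary min-heap ordered first by priority and then by arrival order. Real-time checks always leave the queue before batch audits, and each class stays first-in, first-out. The two priority levels and the `VerificationQueue` class are hypothetical.

```typescript
// Hypothetical priority classes; lower numbers are served first.
const REAL_TIME = 0; // a live agent decision is waiting on the result
const BATCH = 1;     // post-hoc audit work

interface VerificationJob {
  id: string;
  priority: number;
}

// Binary min-heap keyed on (priority, arrival sequence).
class VerificationQueue {
  private heap: { job: VerificationJob; seq: number }[] = [];
  private nextSeq = 0;

  get size(): number {
    return this.heap.length;
  }

  push(job: VerificationJob): void {
    this.heap.push({ job, seq: this.nextSeq++ });
    let i = this.heap.length - 1;
    while (i > 0) { // sift up
      const parent = (i - 1) >> 1;
      if (!this.less(i, parent)) break;
      this.swap(i, parent);
      i = parent;
    }
  }

  pop(): VerificationJob | undefined {
    if (this.heap.length === 0) return undefined;
    const top = this.heap[0];
    const last = this.heap.pop()!;
    if (this.heap.length > 0) {
      this.heap[0] = last;
      let i = 0;
      for (;;) { // sift down
        const l = 2 * i + 1;
        const r = l + 1;
        let smallest = i;
        if (l < this.heap.length && this.less(l, smallest)) smallest = l;
        if (r < this.heap.length && this.less(r, smallest)) smallest = r;
        if (smallest === i) break;
        this.swap(i, smallest);
        i = smallest;
      }
    }
    return top.job;
  }

  private less(a: number, b: number): boolean {
    const x = this.heap[a];
    const y = this.heap[b];
    return x.job.priority !== y.job.priority ? x.job.priority < y.job.priority : x.seq < y.seq;
  }

  private swap(a: number, b: number): void {
    [this.heap[a], this.heap[b]] = [this.heap[b], this.heap[a]];
  }
}
```

Strict priority can starve batch work under sustained real-time load. A production scheduler might reserve some capacity for the batch class.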

Another scalability challenge was source diversity. Different claims require different verification sources: government registries for business verification, document repositories for document analysis, web indexes for open-ended claims, and customer-provided knowledge bases for domain-specific claims. SignalStack's source abstraction layer normalizes this diversity behind a single API: the agent provides the claim, and the system automatically routes it to the appropriate verification engine and source. This abstraction is what makes the "one API call" developer experience possible.
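A rule-based router is the simplest way to picture the source abstraction. This sketch uses ordered rules where the first match wins and the open web index is the fallback. The keyword heuristics, the `RoutingContext` fields and the source identifiers are all hypothetical. A production router could just as well use a classifier instead of regular expressions.

```typescript
// Hypothetical identifiers for the four source families named above.
type SourceKind = "government-registry" | "document-repository" | "web-index" | "customer-kb";

interface RoutingContext {
  attachedDocuments: string[]; // e.g. ["q4-2025-earnings.pdf"]
  customerDomains: string[];   // topics covered by the customer's own knowledge base
}

interface RoutingRule {
  source: SourceKind;
  matches: (claim: string, ctx: RoutingContext) => boolean;
}

// Ordered rules: the first match wins.
const rules: RoutingRule[] = [
  { source: "document-repository", matches: (_claim, ctx) => ctx.attachedDocuments.length > 0 },
  { source: "government-registry", matches: (claim) => /\b(registered|incorporated|licensed|licenced)\b/i.test(claim) },
  {
    source: "customer-kb",
    matches: (claim, ctx) =>
      ctx.customerDomains.some((d) => claim.toLowerCase().includes(d.toLowerCase())),
  },
];

// The agent supplies only the claim; the router picks the source.
function route(claim: string, ctx: RoutingContext): { claim: string; source: SourceKind } {
  const rule = rules.find((r) => r.matches(claim, ctx));
  return { claim, source: rule ? rule.source : "web-index" };
}
```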

SDK Design Philosophy

From the beginning, we knew that adoption depended on developer experience. A verification system that requires weeks of integration work will be deprioritized. A system that works in minutes becomes infrastructure.

The SignalStack SDKs (documented at /docs/sdks) follow three design principles. First, zero-config defaults — the SDK works out of the box with sensible defaults for trust thresholds, verification types, and delivery settings. Teams can be up and running with a single API call. Second, progressive complexity — as teams need more control, they can incrementally configure weights, thresholds, custom verification pipelines, and workflow-aware policies. Third, framework-agnostic integration — the SDKs provide middleware and hooks for popular agent frameworks (LangChain, CrewAI, AutoGen) but also work as standalone HTTP clients for custom architectures.

```typescript
// SignalStack SDK: getting started in under a minute
// Full SDK documentation at /docs/sdks
// Top-level await requires running this file as an ES module.

import { SignalStack } from '@signalstack/sdk'

const stack = new SignalStack({
  apiKey: process.env.SIGNALSTACK_API_KEY
})

// Verify claims in an agent output
const result = await stack.verifyClaims({
  text: "Revenue was $450M in Q4 2025, up 12% YoY.",
  sources: ["q4-2025-earnings.pdf"]
})

console.log(result.trustScore) // e.g. 0.94
console.log(result.claims[0])  // { claim, status, confidence, citations }
```

Scaling to 3,000+ Agents

Today, SignalStack verifies decisions for over 3,000 production agents across finance, healthcare, e-commerce, and legal technology. The system processes millions of verification requests per day with p99 latency under 2 seconds for claim verification and under 500ms for business and document verification.

The scale has revealed patterns we didn't anticipate. We expected companies in regulated industries to be the early adopters. Instead, the heaviest users of verification are AI-native startups deploying autonomous customer support and sales agents. These teams have the highest throughput requirements and the lowest tolerance for hallucination-related customer complaints. They learned early that user retention correlates strongly with output accuracy, and they treat verification as a growth metric, not a compliance cost.

Another unexpected pattern was the emergence of agent-to-agent verification as a primary use case. We designed the system for human-facing verification, where agents make claims to users. In production, however, the fastest-growing use case is agents verifying claims made by other agents. Multi-agent systems are the dominant deployment pattern among our customers. They need verification infrastructure designed for machine-to-machine trust at scale.

Early Customer Learnings

Our first ten customers taught us as much about verification as our own engineering work did. The first lesson was that teams don't want a binary pass/fail result; they want a confidence score they can act on. Every early customer asked for the same thing: "Tell me how confident you are, and let me decide what to do." This drove the trust scoring architecture that lets each team set its own thresholds.
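"Let me decide what to do" maps naturally onto per-team threshold bands. The sketch below uses a hypothetical three-way policy: act, escalate to a human, or block. The `ThresholdPolicy` shape and the example thresholds are illustrative. With these settings, the same score of 0.9 lets a support bot act but sends a finance agent's decision to review.

```typescript
// Hypothetical per-team policy: what to do at each confidence band.
type Action = "act" | "escalate" | "block";

interface ThresholdPolicy {
  actAtOrAbove: number;      // confident enough to proceed autonomously
  escalateAtOrAbove: number; // uncertain: route to a human reviewer
}

function decide(trustScore: number, policy: ThresholdPolicy): Action {
  if (policy.escalateAtOrAbove > policy.actAtOrAbove) {
    throw new RangeError("escalate threshold must not exceed act threshold");
  }
  if (trustScore >= policy.actAtOrAbove) return "act";
  if (trustScore >= policy.escalateAtOrAbove) return "escalate";
  return "block";
}

// Two teams with different risk tolerances (illustrative values).
const supportBot: ThresholdPolicy = { actAtOrAbove: 0.8, escalateAtOrAbove: 0.5 };
const financeAgent: ThresholdPolicy = { actAtOrAbove: 0.95, escalateAtOrAbove: 0.85 };
```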

The second lesson was that verification must be integrated at the framework level, not bolted on after the fact. Our first SDK integration was with LangChain, and it immediately doubled adoption because developers could add verification by adding two lines to their existing agent chain configuration. This led to our framework middleware approach, where SignalStack plugs into existing agent orchestration layers without requiring architectural changes.
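You can sketch the framework-middleware idea without depending on any particular framework. Wrap an existing agent step in a function with the same signature that verifies the output before passing it on. `withVerification`, the `Verifier` type and the 0.8 default threshold are hypothetical. This sketch does not use any actual LangChain, CrewAI or AutoGen API.

```typescript
// A framework-neutral agent step: takes an input, produces text output.
type AgentStep<I> = (input: I) => Promise<string>;

// Hypothetical verifier; in practice this would call the verification service.
type Verifier = (output: string) => Promise<{ trustScore: number }>;

class UnverifiedOutputError extends Error {
  readonly trustScore: number;
  constructor(trustScore: number) {
    super(`output trust score ${trustScore} is below threshold`);
    this.name = "UnverifiedOutputError";
    this.trustScore = trustScore;
  }
}

// Returns a step with the same signature, so it drops into whatever chain or
// graph already calls the original step.
function withVerification<I>(step: AgentStep<I>, verify: Verifier, minScore = 0.8): AgentStep<I> {
  return async (input: I) => {
    const output = await step(input);
    const { trustScore } = await verify(output);
    if (trustScore < minScore) throw new UnverifiedOutputError(trustScore);
    return output;
  };
}
```

Integration is then a one-line change at the point where the step is defined, for example `const guardedStep = withVerification(step, verifier)`.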

The third lesson was about data privacy. Several enterprise customers, particularly in healthcare and finance, told us they could not send their agent outputs to a third-party API for verification. This drove the development of our private deployment option, where the full verification stack runs inside the customer's VPC. The private deployment uses the same API and SDKs as the cloud service but processes all data within the customer's infrastructure, with only anonymized aggregate metrics reporting back to SignalStack.

The most important lesson from scaling to 3,000+ agents is this: verification is not a cost center for autonomous systems. Teams that integrate verification early build agents that users trust, that fail gracefully, and that can take on increasingly autonomous responsibilities. Teams that skip verification end up rebuilding their agent architecture when the first high-profile hallucination causes a customer incident. The SDKs at /docs/sdks make it easy to start small and scale verification as your agent fleet grows.

Conclusion

SignalStack was built because production agent systems need trust infrastructure that doesn't exist anywhere else. The architectural decisions — independent verification engines, configurable trust scoring, cryptographic evidence chains, developer-first SDKs — were shaped by real deployments solving real problems. From zero to 3,000+ agents in under two years, the pattern is clear: verification is not optional infrastructure for autonomous systems. It is the foundation on which trustworthy AI products are built. Documentation at /docs and the SDKs at /docs/sdks are the fastest way to get started.

Luke Swestun

Founder & CEO

Luke Swestun is the founder of SignalStack. He writes about trust infrastructure, hallucination detection, and building AI agents that can verify before they act.
