Trust Tiers
Not all benchmark results carry the same weight. Bench'd assigns a trust tier to every score based on who ran the benchmark, how it was verified, and whether the environment was controlled.
Bench'd Certified
benchd-certified: Highest trust level. The benchmark was run by the Bench'd team or an authorized partner on official infrastructure, with full isolation verification and cryptographic signing.
Requirements
- Run executed by Bench'd team or authorized partner
- Full isolation canary passed (no data contamination between runs)
- Signed with Bench'd official keypair
- Complete manifest with all traces published
- Adapter reviewed and approved by Bench'd team
What it means for consumers
You can treat this score as ground truth. The run environment was controlled, the scoring was deterministic, and the receipt is cryptographically verifiable.
Vendor Verified
vendor-verified: Benchmark was run by the system vendor using the official harness, with a signed manifest submitted for verification.
Requirements
- Run executed using official benchd-harness package from PyPI
- Manifest signed with vendor's own keypair
- Signature verified by Bench'd
- Adapter code submitted for review (may be proprietary)
- At least one benchmark completed with full traces
What it means for consumers
The vendor used our tools and protocol, but controlled the environment. Results are likely accurate, but the vendor could in principle have cherry-picked runs or used a specially tuned configuration.
Community Verified
community-verified: Benchmark was run by a community member using the official harness. Results are real but the environment was not controlled by Bench'd.
Requirements
- Run executed using official benchd-harness package
- Manifest signed (any keypair)
- At least one benchmark completed
- Adapter passes basic validation (benchd adapter validate)
What it means for consumers
These are real benchmark results from a real adapter, but the runner's environment and configuration were not audited. Good for initial ranking; may need re-verification for leaderboard claims.
Listed
listed: System is cataloged but has not been benchmarked yet. May have an adapter in development.
Requirements
- System identified as an AI memory system
- Basic metadata collected (name, website, category, track)
What it means for consumers
No benchmark data available. The system appears on the site for discovery purposes only. Scores will show as 'pending' or 'adapter_missing'.
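With all four tiers described, a consumer may want to keep only results at or above a chosen trust level. The sketch below shows one way to do that. The tier slugs come from this page; the numeric ranks, the function names, and the record layout are illustrative assumptions and not part of any Bench'd API.

```python
# Tier slugs from this page, ranked lowest to highest trust.
# The numeric ranks are an illustrative assumption, not part of any Bench'd API.
TIER_RANK = {
    "listed": 0,
    "community-verified": 1,
    "vendor-verified": 2,
    "benchd-certified": 3,
}


def meets_minimum(tier: str, minimum: str) -> bool:
    """Return True if `tier` is at least as trusted as `minimum`."""
    return TIER_RANK[tier] >= TIER_RANK[minimum]


def filter_results(results: list[dict], minimum: str) -> list[dict]:
    """Keep only results whose trust tier meets the required minimum."""
    return [r for r in results if meets_minimum(r["tier"], minimum)]


# Placeholder data for illustration only; names and scores are made up.
sample = [
    {"system": "system-a", "tier": "benchd-certified", "score": 0.81},
    {"system": "system-b", "tier": "community-verified", "score": 0.86},
    {"system": "system-c", "tier": "listed", "score": None},
]

for row in filter_results(sample, minimum="vendor-verified"):
    print(row["system"], row["tier"], row["score"])
```

Filtering on tier before comparing scores keeps an unaudited community result from outranking a certified one purely on its number.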
Promotion path
Systems move up the trust ladder as they complete verification steps, from Listed through Community Verified and Vendor Verified to Bench'd Certified.
Vendors can accelerate promotion by submitting their adapter for review and requesting a certified run through the claim flow.
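The requirement lists on this page can be read as a checklist evaluated from the highest tier down. The following is a minimal sketch of that logic, assuming a hypothetical `RunEvidence` record whose field names are invented here to mirror the bullet points above; it is not the actual Bench'd implementation.

```python
from dataclasses import dataclass


@dataclass
class RunEvidence:
    """Hypothetical record of the verification steps a system has completed."""
    used_official_harness: bool = False
    benchmark_completed: bool = False
    adapter_validated: bool = False          # passed basic adapter validation
    manifest_signed: bool = False            # signed with any keypair
    signed_by_vendor: bool = False           # signed with the vendor's own keypair
    signature_verified: bool = False         # signature verified by Bench'd
    adapter_submitted_for_review: bool = False
    full_traces: bool = False                # complete traces included
    run_by_benchd: bool = False              # Bench'd team or authorized partner
    isolation_canary_passed: bool = False
    signed_by_benchd: bool = False           # Bench'd official keypair
    adapter_approved: bool = False           # reviewed and approved by Bench'd


def assign_tier(e: RunEvidence) -> str:
    """Return the highest tier slug whose requirements are all satisfied."""
    if all([e.run_by_benchd, e.isolation_canary_passed, e.signed_by_benchd,
            e.full_traces, e.adapter_approved]):
        return "benchd-certified"
    if all([e.used_official_harness, e.signed_by_vendor, e.signature_verified,
            e.adapter_submitted_for_review, e.benchmark_completed, e.full_traces]):
        return "vendor-verified"
    if all([e.used_official_harness, e.manifest_signed,
            e.benchmark_completed, e.adapter_validated]):
        return "community-verified"
    # Cataloged systems with no qualifying run stay at the lowest tier.
    return "listed"


community_run = RunEvidence(
    used_official_harness=True,
    manifest_signed=True,
    benchmark_completed=True,
    adapter_validated=True,
)
print(assign_tier(community_run))
```

Checking the highest tier first means a system is always reported at the strongest level its evidence supports, and a missing step simply leaves it at the tier below.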
Stable URL: benchd.ai/methodology/trust-tiers
Version: 1.0 | Referenced in signed manifests and adapter contracts.