Methodology
Trust Model v1.0

Trust Tiers

Not all benchmark results carry the same weight. Bench'd assigns a trust tier to every score based on who ran the benchmark, how it was verified, and whether the environment was controlled.

Bench'd Certified

benchd-certified

Highest trust level. The benchmark was run by the Bench'd team or an authorized partner on official infrastructure, with full isolation verification and cryptographic signing.

Requirements

  • Run executed by Bench'd team or authorized partner
  • Full isolation canary passed (no data contamination between runs)
  • Signed with Bench'd official keypair
  • Complete manifest with all traces published
  • Adapter reviewed and approved by Bench'd team

What it means for consumers

You can treat this score as ground truth. The run environment was controlled, the scoring was deterministic, and the receipt is cryptographically verifiable.

Vendor Verified

vendor-verified

The benchmark was run by the system vendor using the official harness, and the vendor submitted a signed manifest for verification.

Requirements

  • Run executed using official benchd-harness package from PyPI
  • Manifest signed with vendor's own keypair
  • Signature verified by Bench'd
  • Adapter code submitted for review (may be proprietary)
  • At least one benchmark completed with full traces

What it means for consumers

The vendor used our tools and protocol, but controlled the environment. Results are likely accurate but could theoretically be cherry-picked or run on optimized configurations.
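A signature can only be verified if the signer and the verifier hash exactly the same bytes. The sketch below illustrates that one step: serializing a manifest deterministically and computing a SHA-256 digest over it. The manifest fields, the canonical-JSON convention, and the function name `manifest_digest` are assumptions made for illustration; the actual manifest schema and signature scheme used by benchd-harness are not described here.

```python
import hashlib
import json


def manifest_digest(manifest: dict) -> str:
    """Return a hex SHA-256 digest over a canonical JSON form of the manifest.

    Hypothetical helper: sorting keys and fixing separators makes the
    serialized bytes independent of dict insertion order.
    """
    canonical = json.dumps(
        manifest,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative manifest only; every field name and value here is made up.
manifest = {
    "system": "example-system",
    "benchmark": "example-benchmark",
    "score": 0.0,
    "trust_tier": "vendor-verified",
}

digest = manifest_digest(manifest)
```

Under this scheme, the vendor would sign `digest` (or the canonical bytes) with its private key, and the verifier would recompute the digest from the submitted manifest before checking the signature. Any change to a field, such as the score, produces a different digest.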

Community Verified

community-verified

The benchmark was run by a community member using the official harness. Results are real, but the environment was not controlled by Bench'd.

Requirements

  • Run executed using official benchd-harness package
  • Manifest signed (any keypair)
  • At least one benchmark completed
  • Adapter passes basic validation (benchd adapter validate)

What it means for consumers

These are real benchmark results from a real adapter, but the runner's environment and configuration were not audited. Good for initial ranking; may need re-verification for leaderboard claims.

Listed

listed

The system is cataloged but has not been benchmarked yet. An adapter may be in development.

Requirements

  • System identified as an AI memory system
  • Basic metadata collected (name, website, category, track)

What it means for consumers

No benchmark data available. The system appears on the site for discovery purposes only. Scores will show as 'pending' or 'adapter_missing'.
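The four requirement lists above can be read as a decision procedure: check the tiers from highest to lowest and return the first one whose requirements all hold. The following is a minimal sketch of that reading, not Bench'd's actual implementation. The `RunEvidence` class, its field names, and the `assign_tier` function are hypothetical; only the tier slugs come from this page. Letting a run that misses the vendor requirements fall through to the community check is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class RunEvidence:
    """Facts known about one benchmark run. All field names are hypothetical."""
    runner: str = "none"  # "benchd" (team or authorized partner), "vendor", "community", "none"
    used_official_harness: bool = False
    manifest_signed: bool = False
    signed_with_benchd_key: bool = False
    signature_verified_by_benchd: bool = False
    isolation_canary_passed: bool = False
    full_traces: bool = False  # complete traces available (published, for certified runs)
    adapter_approved: bool = False
    adapter_submitted_for_review: bool = False
    adapter_passes_validation: bool = False
    benchmarks_completed: int = 0


def assign_tier(e: RunEvidence) -> str:
    """Return the slug of the highest tier whose listed requirements all hold."""
    if (e.runner == "benchd"
            and e.isolation_canary_passed
            and e.signed_with_benchd_key
            and e.full_traces
            and e.adapter_approved):
        return "benchd-certified"
    if (e.runner == "vendor"
            and e.used_official_harness
            and e.manifest_signed
            and e.signature_verified_by_benchd
            and e.adapter_submitted_for_review
            and e.benchmarks_completed >= 1
            and e.full_traces):
        return "vendor-verified"
    # Assumption: the community tier does not depend on who ran the benchmark.
    if (e.used_official_harness
            and e.manifest_signed
            and e.benchmarks_completed >= 1
            and e.adapter_passes_validation):
        return "community-verified"
    # No qualifying run: the system is cataloged only.
    return "listed"
```

For example, `assign_tier(RunEvidence())` returns `"listed"`, because a system with no run evidence meets none of the benchmark requirements.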

Promotion path

Systems move up the trust ladder as they complete verification steps:

Listed → Community → Vendor → Certified
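Because the tiers form a strict ladder (Listed, Community, Vendor, Certified), they can be modeled as an ordered enumeration so that tiers compare directly. This is a sketch under assumptions: the `TrustTier` class, the integer ranks, and the helper names are illustrative and not part of any published Bench'd API. Only the slugs are taken from this page.

```python
from enum import IntEnum
from typing import Optional


class TrustTier(IntEnum):
    """Trust tiers ordered from lowest to highest. Rank values are illustrative."""
    LISTED = 0
    COMMUNITY_VERIFIED = 1
    VENDOR_VERIFIED = 2
    BENCHD_CERTIFIED = 3


# Slugs as they appear on this page.
SLUGS = {
    TrustTier.LISTED: "listed",
    TrustTier.COMMUNITY_VERIFIED: "community-verified",
    TrustTier.VENDOR_VERIFIED: "vendor-verified",
    TrustTier.BENCHD_CERTIFIED: "benchd-certified",
}


def next_tier(tier: TrustTier) -> Optional[TrustTier]:
    """Return the next rung on the ladder, or None at the top."""
    if tier == max(TrustTier):
        return None
    return TrustTier(tier + 1)


def is_promotion(current: TrustTier, target: TrustTier) -> bool:
    """True if moving from current to target goes up the ladder."""
    return target > current
```

With this ordering, a leaderboard could filter on a minimum tier with a plain comparison such as `tier >= TrustTier.VENDOR_VERIFIED`.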

Vendors can accelerate promotion by submitting their adapter for review and requesting a certified run through the claim flow.

Stable URL: benchd.ai/methodology/trust-tiers
Version: 1.0 | Referenced in signed manifests and adapter contracts.
