llamaindex-memory 0.0 on LoCoMo | llm-baseline 0.0 on LoCoMo | mem0-local 0.0 on LongMemEval | mem0-local 0.0 on LongMemEval | llamaindex-memory 0.0 on LongMemEval | llm-baseline 0.0 on LongMemEval | langchain-memory 0.0 on LongMemEval | cognee 0.0 on LongMemEval | 13 systems independently scored | 64 systems indexed
Independent benchmark authority

The scoreboard for AI memory.

Bench'd runs memory systems through reproducible benchmark protocols — measuring recall, temporal correctness, failure traces, and how efficiently past experience improves future performance.

Every run is cryptographically signed and publicly verifiable. Spend attestation by ProofMeter (Patent Pending).
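The page does not describe the signature scheme or the receipt format, so the following is only a minimal sketch of the step a signature would typically cover: serializing a run receipt canonically and hashing it, so that anyone holding the same receipt can recompute the same digest. The field names and the functions `receipt_digest` and `matches_published` are hypothetical, and a plain SHA-256 digest stands in for the actual signed payload.

```python
import hashlib
import hmac
import json


def receipt_digest(receipt: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON form of the receipt.

    Assumption: receipts are JSON-serializable dicts. Sorting keys and fixing
    separators makes the serialization independent of key order and spacing.
    """
    canonical = json.dumps(receipt, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_published(receipt: dict, published_digest: str) -> bool:
    """Check a locally recomputed digest against a published one."""
    return hmac.compare_digest(receipt_digest(receipt), published_digest)


# Hypothetical receipt; none of these fields or values come from Bench'd.
example = {"system": "example-system", "benchmark": "example-benchmark", "score": 12.5}
published = receipt_digest(example)

# Any change to the receipt, such as an edited score, changes the digest.
tampered = dict(example, score=99.0)
print(matches_published(example, published), matches_published(tampered, published))
```

A digest alone only detects changes. Public verifiability additionally requires an asymmetric signature over that digest, which is outside what this sketch shows.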

Independent
Reproducible
Verifiable
Open
Track Leaders (Verified)

Independently run, cryptographically signed receipts.

64 Systems Indexed
13 Independently Scored
5 Claims Flagged
46 Awaiting Adapters

Benchmark Index

Systems indexed across open-source projects, managed memory layers, and agent frameworks.

Conversational Memory
#   System                      Verified
1   LangMem                     60.0
2   LlamaIndex Memory           59.0
3   LLM Baseline (GPT-4o-mini)  57.6
4   LangChain Memory            34.0
5   Mem0 OSS                    32.4

Knowledge Brain
#   System    Verified
1   gbrain    100.0
2   Graphiti  65.0
3   Quivr     7.3
4   Cognee    0.0

Agent Memory
#   System          Verified
1   Letta           80.0
2   AutoGPT Memory  47.4
3   CrewAI Memory   46.0

Self-Reported Claims
#   System       Verified  Note
--  Hindsight    89.0      Self-reported. Not independently run by Bench'd.
--  Mastra       84.2      Self-reported. Not independently run by Bench'd.
--  Supermemory  81.6      Self-reported. Not independently run by Bench'd.
--  Zep          71.2      Self-reported. Not independently run by Bench'd.
--  MemPalace    Pending

Benchmark Coverage

Official Bench'd runs by benchmark. Listed and self-reported systems excluded.

Scoring Model

Verified Score (deterministic)

Exact match, regex, ID retrieval. Pure math. Does not change.
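As an illustration of what deterministic scoring of this kind could look like, here is a minimal sketch covering the three check types named above. It is not Bench'd's harness: the `Check` type, the normalization rule, the partial credit for ID retrieval, and the averaging into a 0 to 100 score are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Check:
    """One deterministic check (hypothetical structure)."""
    kind: str         # "exact", "regex", or "ids"
    expected: object  # str for "exact"/"regex", a set of IDs for "ids"


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before exact comparison (assumed rule)."""
    return " ".join(text.lower().split())


def score_item(check: Check, answer) -> float:
    """Score a single answer in [0, 1] with no model in the loop."""
    if check.kind == "exact":
        return 1.0 if normalize(answer) == normalize(check.expected) else 0.0
    if check.kind == "regex":
        return 1.0 if re.search(check.expected, answer) else 0.0
    if check.kind == "ids":
        expected = set(check.expected)
        if not expected:
            return 0.0
        # Fraction of expected memory IDs that were retrieved (assumed metric).
        return len(expected & set(answer)) / len(expected)
    raise ValueError(f"unknown check kind: {check.kind}")


def verified_score(results) -> float:
    """Average per-item scores and scale to 0-100, rounded to one decimal."""
    scores = [score_item(check, answer) for check, answer in results]
    return round(100.0 * sum(scores) / len(scores), 1) if scores else 0.0


# Made-up items for illustration only.
results = [
    (Check("exact", "Paris"), "  paris "),
    (Check("regex", r"\b2021\b"), "It was in 2021."),
    (Check("ids", {"m1", "m2"}), ["m1", "m3"]),
]
print(verified_score(results))
```

Because every step is a string comparison, a pattern match, or set arithmetic, rerunning the same answers yields the same score, which is the property the Verified Score description relies on.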

Nuance Score (LLM-judged)

Synthesis and open-ended recall. Contextual. May shift with judge updates.

Read scoring model

Claim your system

Connect your official endpoint and verify your results against the public harness.

Claim profile


All scores are independently run when marked Community-Verified, Vendor-Verified, or Partner-Audited. Listed and Self-Reported systems are clearly labeled. Spend attestation by ProofMeter (Patent Pending).
