langchain-memory 0.0 on LongMemEval | cognee 0.0 on LongMemEval | mem0-local 0.0 on LongMemEval | llamaindex-memory 0.0 on LongMemEval | llm-baseline 0.0 on LongMemEval | verifiedstate 0.0 on LongMemEval | Letta 88.4 on LongMemEval | 3 systems independently scored | 36 systems indexed
Independent benchmark authority

The scoreboard for AI memory.

Bench'd runs memory systems through reproducible benchmark protocols — measuring recall, temporal correctness, failure traces, and how efficiently past experience improves future performance.

Every run is cryptographically signed and publicly verifiable.

Independent
Reproducible
Verifiable
Open
Latest Verified Run (VERIFIED)

Run ID: run_a9de0b9efb60
System: langchain-memory
Benchmark: LongMemEval v1.0
Signed: -3d ago
Verified Score: 0.0/100
Judged Score: 33.4/100

Failure Trace Preview

Query: What changed in March?
Expected: React migration
Returned: Old framework
Result: Stale memory failure
View full traces
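
The preview above shows a system returning a value that was once true but has since been superseded. As a rough illustration of how such a trace could be labeled, here is a minimal sketch. The `Fact` type, the `classify` function, the label strings, and the dates are all hypothetical and are not taken from Bench'd's harness; only the query values come from the preview.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """One version of a remembered fact (hypothetical structure)."""
    value: str
    valid_from: date


def classify(history: list[Fact], returned: str, as_of: date) -> str:
    """Label a returned answer against the fact history for a single key."""
    # Only facts that were already in effect at query time count.
    known = sorted(
        (f for f in history if f.valid_from <= as_of),
        key=lambda f: f.valid_from,
    )
    if not known:
        return "no_ground_truth"
    current = known[-1].value
    if returned == current:
        return "correct"
    # The answer matches an older, superseded version of the fact.
    if any(f.value == returned for f in known[:-1]):
        return "stale_memory"
    return "wrong_answer"


# Dates below are invented for illustration only.
history = [
    Fact("Old framework", date(2024, 1, 10)),
    Fact("React migration", date(2024, 3, 5)),
]
label = classify(history, "Old framework", date(2024, 4, 1))
print(label)  # stale_memory
```

Separating "stale" from plain "wrong" keeps temporal errors visible as their own category instead of folding them into a general miss rate.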
How We Score
Verified Score

Deterministic exact-match and retrieval quality. Pure math.

Judged Score

LLM-judged synthesis and open-ended recall.

Read scoring model
36 Systems Indexed
3 Independently Scored
6 Claims Flagged
27 Awaiting Adapters

Benchmark Index

Systems indexed across open-source projects, managed memory layers, and agent frameworks.

#   System                      Verified

Bench'd Verified
1   LlamaIndex Memory           56.8
2   LLM Baseline (GPT-4o-mini)  54.7
3   LangChain Memory            34.0

Self-Reported Claims
--  Mem0                        93.4     Self-reported. Not independently run by Bench'd.
--  Hindsight                   89.0     Self-reported. Not independently run by Bench'd.
--  Mastra                      84.2     Self-reported. Not independently run by Bench'd.
--  Supermemory                 81.6     Self-reported. Not independently run by Bench'd.
--  Zep                         71.2     Self-reported. Not independently run by Bench'd.
--  MemPalace                   Pending

Listed / Awaiting Run
--  AutoGPT Memory              Pending
--  ByteRover                   Pending
--  Claude Code Memory          Pending
--  Cognee                      Pending
--  CrewAI Memory               Pending
--  Cursor Memory               Pending
--  ENGRAM                      Pending
--  Graphiti                    Pending
--  gstack                      Pending
--  Honcho                      Pending
--  Julep                       Pending
--  Khoj                        Pending
--  LangMem                     Pending
--  Letta                       Pending
--  Memary                      Pending
--  MemMachine                  Pending
--  Memobase                    Pending
--  Memobase                    Pending
--  Memori                      Pending
--  Memori                      Pending
--  MemoryOS                    Pending
--  MemOS                       Pending
--  Microsoft GraphRAG          Pending
--  Obsidian Smart Connections  Pending
--  OMEGA                       Pending
--  OpenMemory                  Pending
--  ReMe                        Pending
Full leaderboard with detailed view

Benchmark Coverage

Official Bench'd runs by benchmark. Listed and self-reported systems excluded.

Scoring Model

Verified Score (deterministic)

Exact match, regex, ID retrieval. Pure math. Does not change.

Nuance Score (LLM-judged)

Synthesis and open-ended recall. Contextual. May shift with judge updates.
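
To make the deterministic side concrete, here is a minimal sketch of how exact-match, regex, and ID-retrieval checks could be combined into a 0 to 100 score. It is an assumption-laden illustration, not Bench'd's actual scoring code: the `Check` type, the normalization rule, the equal weighting of checks, and the sample answers other than the trace values are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One deterministic check (hypothetical structure)."""
    kind: str          # "exact", "regex", or "ids"
    expected: object   # str for exact/regex, tuple of IDs for "ids"


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise is ignored."""
    return " ".join(text.lower().split())


def score_check(check: Check, answer: str, retrieved_ids=()) -> float:
    """Return a value in [0, 1] for a single check."""
    if check.kind == "exact":
        return 1.0 if normalize(answer) == normalize(check.expected) else 0.0
    if check.kind == "regex":
        return 1.0 if re.search(check.expected, answer, re.IGNORECASE) else 0.0
    if check.kind == "ids":
        expected = set(check.expected)
        if not expected:
            return 0.0
        # Fraction of expected memory IDs that were actually retrieved.
        return len(expected & set(retrieved_ids)) / len(expected)
    raise ValueError(f"unknown check kind: {check.kind}")


def verified_score(results) -> float:
    """Average all checks with equal weight and scale to 0-100."""
    if not results:
        return 0.0
    total = sum(score_check(c, a, ids) for c, a, ids in results)
    return round(100.0 * total / len(results), 1)


results = [
    (Check("exact", "React migration"), "Old framework", ()),
    (Check("regex", r"\breact\b"), "We moved to React in March", ()),
    (Check("ids", ("m1", "m2")), "", ("m2", "m9")),
]
score = verified_score(results)
print(score)  # 50.0
```

Because no model call is involved, rerunning the same inputs yields the same number, which is the property the "Does not change" claim depends on.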

Read scoring model

Run Queue

27 awaiting adapters
4 missing wrappers
6 flagged claims
View methodology

Claim your system

Connect your official endpoint and verify your results against the public harness.

Claim profile

Scores marked Community-Verified, Vendor-Verified, or Partner-Audited are independently run. Listed and Self-Reported systems are clearly labeled.
