The scoreboard for AI memory.
Bench'd runs memory systems through reproducible benchmark protocols that measure recall, temporal correctness, failure traces, and how efficiently past experience improves future performance.
Every run is cryptographically signed and publicly verifiable.
run_a9de0b9efb60 · langchain-memory · LongMemEval v1.0 · -3d ago
Deterministic: 0.0/100. Exact-match and retrieval quality. Pure math.
LLM-judged: 33.4/100. Synthesis and open-ended recall.
Benchmark Index
Systems indexed across open-source projects, managed memory layers, and agent frameworks.

Bench'd Verified

| # | System | Score |
|---|---|---|
| 1 | LlamaIndex Memory | 56.8 |
| 2 | LLM Baseline (GPT-4o-mini) | 54.7 |
| 3 | LangChain Memory | 34.0 |

Self-Reported Claims

| # | System | Score | Note |
|---|---|---|---|
| -- | Mem0 | 93.4 | Self-reported. Not independently run by Bench'd. |
| -- | Hindsight | 89.0 | Self-reported. Not independently run by Bench'd. |
| -- | Mastra | 84.2 | Self-reported. Not independently run by Bench'd. |
| -- | Supermemory | 81.6 | Self-reported. Not independently run by Bench'd. |
| -- | Zep | 71.2 | Self-reported. Not independently run by Bench'd. |
| -- | MemPalace | Pending | |

Listed / Awaiting Run

| # | System | Score |
|---|---|---|
| -- | AutoGPT Memory | Pending |
| -- | ByteRover | Pending |
| -- | Claude Code Memory | Pending |
| -- | Cognee | Pending |
| -- | CrewAI Memory | Pending |
| -- | Cursor Memory | Pending |
| -- | ENGRAM | Pending |
| -- | Graphiti | Pending |
| -- | gstack | Pending |
| -- | Honcho | Pending |
| -- | Julep | Pending |
| -- | Khoj | Pending |
| -- | LangMem | Pending |
| -- | Letta | Pending |
| -- | Memary | Pending |
| -- | MemMachine | Pending |
| -- | Memobase | Pending |
| -- | Memobase | Pending |
| -- | Memori | Pending |
| -- | Memori | Pending |
| -- | MemoryOS | Pending |
| -- | MemOS | Pending |
| -- | Microsoft GraphRAG | Pending |
| -- | Obsidian Smart Connections | Pending |
| -- | OMEGA | Pending |
| -- | OpenMemory | Pending |
| -- | ReMe | Pending |

Benchmark Coverage
Official Bench'd runs by benchmark. Listed and self-reported systems excluded.
| System | LongMemEval | LoCoMo | PersonaMem |
|---|---|---|---|
| LlamaIndex Memory | |||
| LLM Baseline (GPT-4o-mini) | |||
| LangChain Memory | |||
Scoring Model
Deterministic: exact match, regex, ID retrieval. Pure math. Does not change.
LLM-judged: synthesis and open-ended recall. Contextual. May shift with judge updates.
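
As an illustration of why the exact-match, regex, and ID-retrieval checks are reproducible, here is a minimal Python sketch of that kind of scorer. It is not Bench'd's harness: the function names, the whitespace and case normalization, the use of recall for ID retrieval, and the plain mean scaled to 0-100 are all assumptions made for the example.

```python
import re
from typing import Iterable, Sequence


def exact_match(prediction: str, expected: str) -> float:
    """Return 1.0 if the strings match after normalization, else 0.0.

    Normalization (lowercase, collapsed whitespace) is an assumption.
    """
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())

    return 1.0 if normalize(prediction) == normalize(expected) else 0.0


def regex_match(prediction: str, pattern: str) -> float:
    """Return 1.0 if the pattern is found anywhere in the prediction."""
    return 1.0 if re.search(pattern, prediction) else 0.0


def id_recall(retrieved_ids: Iterable[str], relevant_ids: Iterable[str]) -> float:
    """Fraction of relevant memory IDs that were retrieved (hypothetical metric)."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved_ids)) / len(relevant)


def deterministic_score(item_scores: Sequence[float]) -> float:
    """Mean of per-item scores, scaled to 0-100 (assumed aggregation)."""
    if not item_scores:
        return 0.0
    return 100.0 * sum(item_scores) / len(item_scores)


if __name__ == "__main__":
    scores = [
        exact_match("  Paris ", "paris"),
        regex_match("Meeting on 2024-03-05", r"\d{4}-\d{2}-\d{2}"),
        id_recall(["m1", "m3"], ["m1", "m2"]),
    ]
    print(deterministic_score(scores))
```

Each function depends only on its inputs, so the same predictions produce the same score on every run. An LLM-judged score has no such guarantee, because it can move when the judge model is updated.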
Latest Signed Receipts
View receipts
Claim your system
Connect your official endpoint and verify your results against the public harness.
Claim profile
All scores are independently run when marked Community-Verified, Vendor-Verified, or Partner-Audited. Listed and Self-Reported systems are clearly labeled.