Independent benchmark authority

The scoreboard
for AI memory.

Bench'd runs memory systems through reproducible benchmark protocols that measure recall, temporal correctness, failure traces, and how efficiently past experience improves future performance.

Every run is cryptographically signed and publicly verifiable. Spend attestation by ProofMeter (Patent Pending).

Independent

Reproducible

Verifiable

Open

Track Leaders (Verified)

Conversational Memory

LlamaIndex Memory

gbrain

Letta

Independently run, cryptographically signed receipts.

64 Systems Indexed

13 Independently Scored

5 Claims Flagged

46 Awaiting Adapters

Latest Receipt: 2h ago

Benchmark Index

Systems indexed across open-source projects, managed memory layers, and agent frameworks.

Conversational Memory

#	System	Source	Tier	Verified	Nuance	Tested	Trend
1	LangMem	OSS	Community-Verified	60.0	60.0	-4d ago
2	LlamaIndex Memory	Framework	Community-Verified	59.0	59.0	-2d ago	--
3	LLM Baseline (GPT-4o-mini)	Research	Community-Verified	57.6	57.6	-2d ago	--
4	LangChain Memory	Framework	Community-Verified	34.0	34.0	-2d ago	--
5	Mem0 OSS	OSS	Community-Verified	32.4	32.4	-3d ago

Knowledge Brain

#	System	Source	Tier	Verified	Nuance	Tested
1	gbrain	OSS	Community-Verified	100.0	100.0	-5d ago
2	Graphiti	OSS	Community-Verified	65.0	65.0	-7d ago
3	Quivr	OSS	Community-Verified	7.3	7.3	-7d ago
4	Cognee	OSS	Community-Verified	0.0	0.0	-7d ago

Agent Memory

#	System	Source	Tier	Verified	Nuance	Tested
1	Letta	OSS	Community-Verified	80.0	80.0	-3d ago
2	AutoGPT Memory	Framework	Community-Verified	47.4	47.4	-3d ago
3	CrewAI Memory	Framework	Community-Verified	46.0	46.0	-3d ago

Self-Reported Claims

#	System	Source	Tier	Verified	Nuance	Tested	Trend
--	Hindsight	Source Available	Self-Reported	89.0	83.0	flagged	unverified
--	Mastra	Framework	Self-Reported	84.2	78.2	flagged	unverified
--	Supermemory	OSS	Self-Reported	81.6	75.6	flagged	unverified
--	Zep	Open Core	Self-Reported	71.2	65.2	flagged	unverified
--	MemPalace	OSS	Self-Reported	Pending	--	flagged	unverified

Scores in this table are self-reported and were not independently run by Bench'd.

Listed / Awaiting Run

46 systems awaiting adapters.

Benchmark Coverage

Official Bench'd runs by benchmark. Listed and self-reported systems excluded.

System	LongMemEval	LoCoMo	PersonaMem
gbrain
Letta
Graphiti
LangMem
LlamaIndex Memory
LLM Baseline (GPT-4o-mini)
AutoGPT Memory
CrewAI Memory

Scoring Model

Verified Score (deterministic)

Exact match, regex, and ID retrieval. Pure math: the same outputs always yield the same score.

Nuance Score (LLM-judged)

Synthesis and open-ended recall. Contextual. May shift with judge updates.
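
As an illustration of what a deterministic Verified Score could look like, here is a minimal Python sketch. It is not Bench'd's harness: the `Check` type, the `passes` and `verified_score` functions, the case-insensitive normalization, and the 0 to 100 scale are all assumptions made for this example. It only shows how exact-match, regex, and ID-retrieval checks can be combined without any model in the loop.

```python
import re
from dataclasses import dataclass


@dataclass
class Check:
    """Hypothetical check record; not Bench'd's actual schema."""
    kind: str         # "exact", "regex", or "ids"
    expected: object  # string, regex pattern, or list of memory IDs


def passes(check: Check, answer: str, retrieved_ids=()) -> bool:
    """Return True if the system output satisfies one deterministic check."""
    if check.kind == "exact":
        # Assumption: whitespace and case are normalized before comparing.
        return answer.strip().lower() == check.expected.strip().lower()
    if check.kind == "regex":
        return re.search(check.expected, answer) is not None
    if check.kind == "ids":
        # Every expected memory ID must appear among the retrieved IDs.
        return set(check.expected) <= set(retrieved_ids)
    raise ValueError(f"unknown check kind: {check.kind}")


def verified_score(results) -> float:
    """Percentage of checks passed, rounded to one decimal place."""
    if not results:
        return 0.0
    passed = sum(passes(c, answer, ids) for c, answer, ids in results)
    return round(100.0 * passed / len(results), 1)


# Made-up sample data: (check, system answer, retrieved memory IDs).
sample = [
    (Check("exact", "Paris"), " paris ", ()),
    (Check("regex", r"\b2019\b"), "It was in 2019.", ()),
    (Check("ids", ["m1", "m2"]), "", ["m2", "m3"]),
    (Check("exact", "blue"), "green", ()),
    (Check("ids", ["m7"]), "", ["m7", "m9"]),
]
print(verified_score(sample))  # 3 of 5 checks pass → 60.0
```

Because nothing here depends on a judge model, rerunning the function on the same outputs gives the same number, which is the property the Verified Score description relies on.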

Latest Signed Receipts

llamaindex-memory	LoCoMo	0.0	-3d ago
llm-baseline	LoCoMo	0.0	-3d ago
mem0-local	LongMemEval	0.0	-3d ago
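
The page does not describe the receipt format or the signature scheme, so the following Python sketch only illustrates the general idea behind a verifiable receipt: serialize the result canonically and hash it, so that any change to the recorded fields changes the digest a signature would cover. The field names (`system`, `benchmark`, `score`) and the `receipt_digest` function are hypothetical, and a real signed receipt would additionally need a public-key signature over this digest.

```python
import hashlib
import json


def receipt_digest(receipt: dict) -> str:
    """SHA-256 hex digest of a canonical JSON encoding of the receipt.

    Sorting keys and fixing separators makes the encoding independent
    of key order, so equal receipts always hash to the same value.
    """
    canonical = json.dumps(receipt, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical receipt fields; values mirror one listed receipt.
receipt = {"system": "llamaindex-memory", "benchmark": "LoCoMo", "score": 0.0}
published = receipt_digest(receipt)

# A verifier recomputes the digest from the published fields and compares.
tampered = dict(receipt, score=99.0)
print(receipt_digest(receipt) == published)   # unchanged receipt matches
print(receipt_digest(tampered) == published)  # edited score does not match
```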

Open Methodology

8 benchmark specs published
23 failure codes documented
4 trust tiers defined
ProofMeter spend attestation (Patent Pending)

Claim your system

Connect your official endpoint and verify your results against the public harness.

Scores marked Community-Verified, Vendor-Verified, or Partner-Audited come from independent runs. Listed and Self-Reported systems are clearly labeled. Spend attestation by ProofMeter (Patent Pending).
