Letta

Community-Verified

Letta Inc | Website | GitHub (38.2k) | Docs | Last tested May 12, 2026

MCP Endpoint: https://api.letta.com/mcp/v1

Stateful LLM agent framework (formerly MemGPT) with built-in memory management, tool use, and multi-step reasoning. Self-editing memory architecture enables unbounded context.

Scores range from 0 to 100; higher is better. The LLM baseline (no memory system) scores 57.6%.

Track: Agent Memory

Track Index

45.0/100

Based on 4 benchmarks. 1 pending.

Benchmark Results

Benchmark	Score	Status	Receipt
Knowledge Retrieval	80.0	Verified	View
Truth Arbitration	80.0	Verified	View
Memory Poisoning	20.0	Verified	View
Budget Curves	0.0	Verified	View
Reliability	Pending	Pending	--

Other Benchmarks

LongMemEval	Not applicable — outside Agent Memory track
LoCoMo	Not applicable — outside Agent Memory track
Knowledge Scale	Not applicable — outside Agent Memory track
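
The Track Index of 45.0 is consistent with an unweighted mean of the four verified benchmark scores, with the pending Reliability benchmark left out. The page does not state the formula, so the sketch below is an assumption that happens to reproduce the published figure; the function name `track_index` is illustrative.

```python
from statistics import mean

# Verified scores copied from the Benchmark Results table.
# None marks a benchmark that is still pending (Reliability).
scores = {
    "Knowledge Retrieval": 80.0,
    "Truth Arbitration": 80.0,
    "Memory Poisoning": 20.0,
    "Budget Curves": 0.0,
    "Reliability": None,
}

def track_index(results):
    """Assumed formula: unweighted mean of the scored benchmarks only."""
    scored = [s for s in results.values() if s is not None]
    return round(mean(scored), 1)

print(track_index(scores))  # 45.0
```

If the real index weights benchmarks differently, only the body of `track_index` would need to change.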

Relative Performance vs All Benchmarked Systems

Compared against 16 scored systems. In the original plot, each dot is a system, the amber dot is Letta, and the amber line marks the LLM baseline (no memory).

Dimension	Score	Percentile	No-memory baseline	Plot label
Overall	80.0	63rd	57.6%	gbrain
Recall	80.0	56th	57.6%	gbrain
Temporal	0.0	0th	57.6%	gbrain
Reasoning	0.0	0th	57.6%	gbrain
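
The page does not define how the percentiles in this section are computed. One common definition is the share of scored systems that rank strictly below the given score, and the displayed values of 63 and 56 are at least consistent with 10 of 16 and 9 of 16 systems scoring lower (62.5% and 56.25%). The sketch below illustrates that assumed definition; the function name and every score in `field` except 80.0 are hypothetical.

```python
import math

def percentile_rank(score, all_scores):
    """Percent of systems scoring strictly below `score`, rounded half up."""
    below = sum(1 for s in all_scores if s < score)
    return math.floor(100 * below / len(all_scores) + 0.5)

# Hypothetical field of 16 systems; only the 80.0 entry comes from the page.
field = [0.0, 10.0, 20.0, 25.0, 30.0, 40.0, 45.0, 50.0, 57.6, 60.0,
         80.0, 82.0, 85.0, 88.0, 90.0, 95.0]

print(percentile_rank(80.0, field))  # 63 (10 of 16 systems score lower)
print(percentile_rank(0.0, field))   # 0  (no system scores lower)
```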

Bench'd Memory Index

The BMI combines accuracy (70%) and efficiency (30%) into a single production-weighted score. The formula is public and versioned.

80.0 / 100

#1 of 8 systems (top 12%)

Accuracy (70%): 80.0

Efficiency (30%): --
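
The 70/30 weighting can be sketched as a weighted average of two sub-scores on a 0 to 100 scale. The published formula is not reproduced on this page, so the code below is an illustration only. In particular, dropping a missing component and renormalizing the remaining weight is an assumption, chosen because it matches the 80.0 shown here while efficiency is still "--".

```python
def bmi(accuracy, efficiency=None, w_accuracy=0.7, w_efficiency=0.3):
    """Weighted blend of two 0-100 sub-scores.

    Assumption: a missing component (None) is dropped and the remaining
    weights are renormalized. The real handling is not stated on the page.
    """
    parts = [(accuracy, w_accuracy), (efficiency, w_efficiency)]
    present = [(value, weight) for value, weight in parts if value is not None]
    if not present:
        return None
    total_weight = sum(weight for _, weight in present)
    blended = sum(value * weight for value, weight in present) / total_weight
    return round(blended, 1)

print(bmi(80.0))        # 80.0, matching the page while efficiency is "--"
print(bmi(80.0, 50.0))  # 71.0 with a hypothetical efficiency score of 50
```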

Efficiency Metrics

Avg Latency

Average time to retrieve memories and generate an answer. Lower is better.

5.8 s (time per recall query)

Tokens / Correct

Average tokens consumed per correctly answered question. Lower means more efficient.

-- (token cost per correct answer)

Recall Tokens

Average tokens returned by the memory system per query. Lower means tighter retrieval.

45 (avg tokens per retrieval)
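
As a rough illustration of how the three efficiency metrics could be derived from per-query logs, the sketch below aggregates a list of records. The `QueryRecord` type, its field names, and the sample values are all hypothetical. It also assumes that Tokens / Correct divides the total tokens over all queries by the number of correct answers, which is one plausible reading of the description above.

```python
from dataclasses import dataclass

@dataclass
class QueryRecord:
    # Hypothetical per-query log entry; field names are illustrative.
    latency_s: float    # time to retrieve memories and generate the answer
    recall_tokens: int  # tokens returned by the memory system
    total_tokens: int   # all tokens consumed for this query
    correct: bool       # whether the answer was judged correct

def efficiency_metrics(records):
    n = len(records)
    n_correct = sum(r.correct for r in records)
    total_tokens = sum(r.total_tokens for r in records)
    return {
        "avg_latency_s": sum(r.latency_s for r in records) / n,
        "avg_recall_tokens": sum(r.recall_tokens for r in records) / n,
        # Undefined (None) when no question was answered correctly.
        "tokens_per_correct": total_tokens / n_correct if n_correct else None,
    }

records = [
    QueryRecord(5.0, 40, 1200, True),
    QueryRecord(6.0, 50, 1800, False),
    QueryRecord(7.0, 45, 1500, True),
]
print(efficiency_metrics(records))
```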

Per-Capability Score Matrix

Dimension	Budget Curves	Knowledge Retrieval	LongMemEval	Memory Poisoning	Smoke Memory v0	Truth Arbitration
Recall	--	--	0.0	--	0.0	--
Temporal	--	--	0.0	--	0.0	--
Reasoning	--	--	0.0	--	0.0	--
Budget 500	0.0	--	--	--	--	--
Budget 1000	0.0	--	--	--	--	--
Budget 2000	0.0	--	--	--	--	--
Budget 5000	0.0	--	--	--	--	--
Budget 10000	0.0	--	--	--	--	--
Conflict resolution	--	--	--	--	--	80.0
Document retrieval	--	80.0	--	--	--	--
Injection resistance	--	--	--	0.0	--	--
Knowledge update	--	60.0	--	--	--	--
Multi page	--	80.0	--	--	--	--
Semantic search	--	100.0	--	--	--	--
Overall	0.0	80.0	0.0	0.0	0.0	80.0

Per-Benchmark Breakdown

Benchmark	Harness	Judge	Verified	Nuance	Completed	Receipt
LongMemEval	v0.9.4	claude-sonnet-4-20250514	88.4	82.1	May 8, 2026	c9d8e7f6...
PersonaMem	v0.9.4	gpt-4o-2025-03-26	87.1	81.2	May 7, 2026	d1e2f3a4...

Performance Over Time — LongMemEval

2026-05-11 to 2026-05-13


Add badge to your README

Show your Bench'd score on your GitHub repo.

Markdown

[![Bench'd Verified: 80.0 BMI](https://img.shields.io/badge/Bench'd_BMI-80.0-D9982B?style=flat&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAzMiAzMiI+PHJlY3Qgd2lkdGg9IjMyIiBoZWlnaHQ9IjMyIiByeD0iNiIgZmlsbD0iIzExMSIvPjx0ZXh0IHg9IjgiIHk9IjIyIiBmb250LXNpemU9IjIwIiBmb250LWZhbWlseT0ic2VyaWYiIGZpbGw9IiNmZmYiIGZvbnQtd2VpZ2h0PSI2MDAiPkInPC90ZXh0Pjwvc3ZnPg==)](https://benchd.ai/system/letta)

HTML

<a href="https://benchd.ai/system/letta"><img src="https://img.shields.io/badge/Bench'd_BMI-80.0-D9982B?style=flat" alt="Bench'd Verified: 80.0 BMI" /></a>
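
If the score changes, the badge URL can be regenerated rather than edited by hand. The sketch below builds a Shields.io static badge URL in the `label-message-color` path form used by the snippets above, applying the Shields.io escaping rules (a space becomes `_`, a literal `_` becomes `__`, a literal `-` becomes `--`). The helper names are illustrative, and the code skips percent-encoding of other special characters for brevity.

```python
def shields_escape(text):
    """Escape text for one segment of a Shields.io static badge path."""
    return text.replace("-", "--").replace("_", "__").replace(" ", "_")

def badge_url(label, message, color, style="flat"):
    path = "-".join([shields_escape(label), shields_escape(message), color])
    return f"https://img.shields.io/badge/{path}?style={style}"

def badge_markdown(label, message, color, link, alt):
    """Wrap the badge image in a Markdown link to the system page."""
    return f"[![{alt}]({badge_url(label, message, color)})]({link})"

url = badge_url("Bench'd BMI", "80.0", "D9982B")
print(url)
# https://img.shields.io/badge/Bench'd_BMI-80.0-D9982B?style=flat

print(badge_markdown("Bench'd BMI", "80.0", "D9982B",
                     "https://benchd.ai/system/letta",
                     "Bench'd Verified: 80.0 BMI"))
```

The first printed URL matches the `src` attribute of the HTML snippet above.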
