
Mastra

Self-Reported
Mastra AI · Website · GitHub (9.8k) · Docs · Last tested May 4, 2026
MCP Endpoint: https://memory.mastra.ai/mcp

TypeScript-native AI agent framework with integrated memory, RAG, and workflow orchestration. Offers pluggable memory backends and first-class MCP support for building production agents.

These scores are self-reported by the vendor and have not been independently verified by Bench'd.

Scores range from 0 to 100; higher is better. The LLM Baseline (no memory system) scores 57.6%.

Track: Conversational Memory
Track Index: No results yet

Benchmark Results

Benchmark | Score | Status | Receipt
LongMemEval | Pending | Pending | --
LoCoMo | Pending | Pending | --
Reliability | Pending | Pending | --
Truth Arbitration | Pending | Pending | --
Memory Poisoning | Pending | Pending | --
Budget Curves | Pending | Pending | --

Other Benchmarks

Knowledge Retrieval: Not applicable (outside the Conversational Memory track)
Knowledge Scale: Not applicable (outside the Conversational Memory track)

Relative Performance vs All Benchmarked Systems

vs 16 scored systems

Each dot is a system. The amber dot is Mastra; the amber line is the LLM Baseline (no memory).

Category | No memory | gbrain | Percentile
Overall | 57.6% | 84.2 | 75th
Recall | 57.6% | 84.7 | 69th
Temporal | 57.6% | 85.8 | 75th
Reasoning | 57.6% | 78.9 | 75th

Bench'd Memory Index
The BMI combines accuracy (70%) and efficiency (30%) into a single production-weighted score. Formula is public and versioned.
84.2 / 100
#1 of 8 systems · Top 12%
Accuracy (70%): 84.2
Efficiency (30%): --
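
The 70/30 weighting described above can be illustrated with a minimal sketch. This is not the published BMI formula, which the page only summarizes. The function name `bmi`, the 0 to 100 input range, the rounding to one decimal, and the fallback to accuracy alone when no efficiency score exists are all assumptions. The fallback was chosen only because it matches the figures shown here, where the index is 84.2 and the only populated component is an accuracy of 84.2.

```python
from typing import Optional

# Weights as stated on the page: accuracy 70%, efficiency 30%.
ACCURACY_WEIGHT = 0.70
EFFICIENCY_WEIGHT = 0.30


def bmi(accuracy: float, efficiency: Optional[float] = None) -> float:
    """Combine accuracy and efficiency (both assumed to be on a 0-100 scale).

    Hypothetical helper: the real formula is versioned by the site and may
    normalize or handle missing inputs differently.
    """
    for name, value in (("accuracy", accuracy), ("efficiency", efficiency)):
        if value is not None and not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")

    if efficiency is None:
        # Assumption: with no efficiency measurement, report accuracy alone.
        return round(accuracy, 1)

    return round(ACCURACY_WEIGHT * accuracy + EFFICIENCY_WEIGHT * efficiency, 1)


if __name__ == "__main__":
    print(bmi(84.2))        # efficiency missing, as in the panel above
    print(bmi(80.0, 60.0))  # both components present (made-up inputs)
```

Under these assumptions, a system with an accuracy of 80 and an efficiency of 60 would receive 0.7 × 80 + 0.3 × 60 = 74.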

Per-Benchmark Breakdown

Benchmark | Verified | Nuance
LoCoMo | 82.4 | 75.9
PersonaMem | 81.5 | 74.9

Performance Over Time — LongMemEval

2026-05-11 to 2026-05-13
[Chart: score (0 to 100) by date, 05-11 through 05-13, with the baseline marked]
