llamaindex-memory 0.0 on LoCoMo · llm-baseline 0.0 on LoCoMo · mem0-local 0.0 on LongMemEval · llamaindex-memory 0.0 on LongMemEval · llm-baseline 0.0 on LongMemEval · langchain-memory 0.0 on LongMemEval · cognee 0.0 on LongMemEval
13 systems independently scored · 64 systems indexed

Mem0 OSS

Community-Verified
Mem0 Inc · Website · GitHub (24.8k) · Docs · Last tested May 12, 2026

Open-source version of the Mem0 memory platform. Community-verified benchmark results on LongMemEval using the OSS package with default configuration.

Scores from 0–100. Higher is better. LLM Baseline (no memory system) scores 57.6%.

Track: Conversational Memory
Track Index: 37.4/100

Based on 6 benchmarks.

Benchmark Results

Benchmark            Score    Status
LongMemEval          32.4     Verified
LoCoMo               0.0      Verified
Reliability          52.0     Verified
Truth Arbitration    40.0     Verified
Memory Poisoning     0.0      Verified
Budget Curves        100.0    Verified

Other Benchmarks

Knowledge Retrieval: Not applicable (outside Conversational Memory track)
Knowledge Scale: Not applicable (outside Conversational Memory track)
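
The page does not say how the Track Index is derived from the individual benchmarks. As a rough check, the sketch below assumes an unweighted mean of the six scored benchmarks; that assumption happens to reproduce the published 37.4, but it is an observation about these numbers, not the site's documented formula. The function name `track_index` is hypothetical.

```python
from statistics import mean

# Scores copied from the Benchmark Results table (Conversational Memory track).
scores = {
    "LongMemEval": 32.4,
    "LoCoMo": 0.0,
    "Reliability": 52.0,
    "Truth Arbitration": 40.0,
    "Memory Poisoning": 0.0,
    "Budget Curves": 100.0,
}

def track_index(benchmark_scores):
    """Unweighted mean of the benchmark scores, rounded to one decimal.

    Assumption: equal weighting. The site may use a different rule.
    """
    return round(mean(benchmark_scores.values()), 1)

print(track_index(scores))
```

Under this assumption, the two 0.0 results (LoCoMo and Memory Poisoning) pull the index down as much as the 100.0 on Budget Curves pulls it up.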

Relative Performance vs All Benchmarked Systems

vs 16 scored systems

Each dot is a system. Amber dot is Mem0 OSS. Amber line = LLM Baseline (no memory).

Overall: 32.4 (13th percentile)
Recall: 47.4 (19th percentile)
Temporal: 32.2 (25th percentile)
Reasoning: 15.0 (25th percentile)
No-memory baseline: 57.6% in each category.
Bench'd Memory Index
The BMI combines accuracy (70%) and efficiency (30%) into a single production-weighted score. Formula is public and versioned.
BMI: 28.4 / 100 (#6 of 8 systems, Top 75%)
Accuracy (70%): 32.4
Efficiency (30%): 94.6
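
The page gives the two weights and the two component scores but not the formula itself. The sketch below only computes each component's weighted contribution under the stated 70/30 split; it is not the site's versioned formula, and the helper name `weighted_contributions` is hypothetical. A plain weighted sum of these components comes to 51.06 rather than the published 28.4, so the public formula presumably applies some further adjustment that this page does not show.

```python
# Weights and component scores as stated on this page.
WEIGHTS = {"accuracy": 0.70, "efficiency": 0.30}
COMPONENTS = {"accuracy": 32.4, "efficiency": 94.6}

def weighted_contributions(components, weights):
    """Return each component's score multiplied by its weight (2 decimals)."""
    return {name: round(components[name] * weights[name], 2) for name in components}

contrib = weighted_contributions(COMPONENTS, WEIGHTS)
linear_sum = round(sum(contrib.values()), 2)

print(contrib)      # per-component weighted contributions
print(linear_sum)   # naive weighted sum, for comparison with the published BMI
```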

Efficiency Metrics

Avg Latency: 7.6 s (time per recall query)
Average time to retrieve memories and generate an answer. Lower is better.

Tokens / Correct: 544 (token cost per correct answer)
Average tokens consumed per correctly answered question. Lower means more efficient.

Recall Tokens: 167 (avg tokens per retrieval)
Average tokens returned by the memory system per query. Lower means tighter retrieval.


Performance Over Time — LongMemEval

Chart: LongMemEval score (0 to 100) from 2026-05-11 to 2026-05-13, with the LLM baseline marked.

Add badge to your README

Show your Bench'd score on your GitHub repo.

Bench'd Verified: 28.4 BMI
Markdown
[![Bench'd Verified: 28.4 BMI](https://img.shields.io/badge/Bench'd_BMI-28.4-D9982B?style=flat&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAzMiAzMiI+PHJlY3Qgd2lkdGg9IjMyIiBoZWlnaHQ9IjMyIiByeD0iNiIgZmlsbD0iIzExMSIvPjx0ZXh0IHg9IjgiIHk9IjIyIiBmb250LXNpemU9IjIwIiBmb250LWZhbWlseT0ic2VyaWYiIGZpbGw9IiNmZmYiIGZvbnQtd2VpZ2h0PSI2MDAiPkInPC90ZXh0Pjwvc3ZnPg==)](https://benchd.ai/system/mem0-oss)
HTML
<a href="https://benchd.ai/system/mem0-oss"><img src="https://img.shields.io/badge/Bench'd_BMI-28.4-D9982B?style=flat" alt="Bench'd Verified: 28.4 BMI" /></a>
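
If the score changes, the badge snippet can be regenerated rather than edited by hand. The following is a minimal sketch, assuming the Shields.io static badge path format (`/badge/<label>-<message>-<color>`) and the `https://benchd.ai/system/<slug>` link pattern seen in the snippets above. The helper names `shields_escape` and `badge_markdown` are hypothetical, and the inline logo parameter is omitted for brevity.

```python
def shields_escape(text):
    """Escape text for a Shields.io static badge path segment.

    Shields.io treats "-" as a field separator and "_" as a space,
    so literal dashes and underscores are doubled before spaces
    are converted to underscores.
    """
    return text.replace("-", "--").replace("_", "__").replace(" ", "_")

def badge_markdown(bmi, system_slug, color="D9982B"):
    """Build the Markdown badge for a given BMI score and system slug."""
    label = shields_escape("Bench'd BMI")
    message = shields_escape(f"{bmi:.1f}")
    image_url = f"https://img.shields.io/badge/{label}-{message}-{color}?style=flat"
    page_url = f"https://benchd.ai/system/{system_slug}"
    alt_text = f"Bench'd Verified: {bmi:.1f} BMI"
    return f"[![{alt_text}]({image_url})]({page_url})"

print(badge_markdown(28.4, "mem0-oss"))
```

The image URL this produces matches the one in the HTML snippet above.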
