Zep
Self-Reported
MCP Endpoint: https://cloud.getzep.com/mcp
Long-term memory store for AI assistants and agents. Provides fact extraction, entity graphs, temporal awareness, and hybrid search across conversation history.
These scores are self-reported by the vendor and have not been independently verified by Bench'd.
Scores range from 0 to 100; higher is better. The LLM Baseline (no memory system) scores 57.6%.
Track: Hybrid
Track Index: No results yet
Benchmark Results
| Benchmark | Score | Status | Receipt |
|---|---|---|---|
| LongMemEval | Pending | Pending | -- |
| LoCoMo | Pending | Pending | -- |
| Reliability | Pending | Pending | -- |
| Truth Arbitration | Pending | Pending | -- |
| Memory Poisoning | Pending | Pending | -- |
| Budget Curves | Pending | Pending | -- |
| Knowledge Retrieval | Pending | Pending | -- |
| Knowledge Scale | Pending | Pending | -- |
Relative Performance vs All Benchmarked Systems
Compared against 16 scored systems.
| Dimension | Score | Percentile | LLM Baseline (no memory) |
|---|---|---|---|
| Overall | 71.2 | 56th | 57.6% |
| Recall | 75.3 | 38th | 57.6% |
| Temporal | 68.4 | 63rd | 57.6% |
| Reasoning | 65.8 | 56th | 57.6% |
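This page does not say how the percentiles are computed. Below is a minimal sketch, assuming the common "share of scored systems that score strictly below this one" definition. The function name `percentile_rank` and the score list are hypothetical; the 16 individual system scores are not shown on this page.

```python
def percentile_rank(score: float, all_scores: list[float]) -> float:
    """Percentage of systems scoring strictly below `score`.

    Assumption: Bench'd's actual definition is not published here; other
    conventions (e.g. counting ties as half) give slightly different values.
    """
    if not all_scores:
        raise ValueError("need at least one score to rank against")
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


# Hypothetical scores for four systems, not real Bench'd data.
print(percentile_rank(70.0, [50.0, 60.0, 70.0, 80.0]))  # 50.0
```

Under this definition a system's percentile depends only on how many peers it beats, which is why the same score can land at different percentiles across dimensions with different score spreads.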
Bench'd Memory Index
The BMI combines accuracy (70%) and efficiency (30%) into a single production-weighted score. Formula is public and versioned.
BMI: 71.2 / 100
Rank: #1 of 8 systems (top 12%)
Accuracy (70%): 71.2
Efficiency (30%): --
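The 70/30 weighting can be sketched as a weighted average of 0 to 100 component scores. The page shows a BMI of 71.2 with no efficiency score, equal to the accuracy score alone. This suggests that missing components may be dropped and the remaining weights renormalized, but that is an assumption here, not the published formula. The names `WEIGHTS` and `bmi` are hypothetical.

```python
from typing import Optional

# Weights stated on the page: accuracy 70%, efficiency 30%.
WEIGHTS = {"accuracy": 0.70, "efficiency": 0.30}


def bmi(components: dict[str, Optional[float]]) -> float:
    """Weighted combination of 0-100 component scores.

    Assumption (not confirmed by Bench'd): a missing component (None) is
    dropped and the remaining weights are renormalized to sum to 1.
    """
    present = {k: v for k, v in components.items() if v is not None}
    if not present:
        raise ValueError("no component scores available")
    total_weight = sum(WEIGHTS[k] for k in present)
    return sum(WEIGHTS[k] * v for k, v in present.items()) / total_weight


print(round(bmi({"accuracy": 71.2, "efficiency": None}), 1))  # 71.2
print(round(bmi({"accuracy": 80.0, "efficiency": 60.0}), 1))  # 74.0 (hypothetical inputs)
```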
Per-Benchmark Breakdown
| Benchmark | Verified | Nuance |
|---|---|---|
| LongMemEval | 83.6 | 77.1 |
| LoCoMo | 81.9 | 75.4 |