llamaindex-memory 0.0 on LoCoMo | llm-baseline 0.0 on LoCoMo | mem0-local 0.0 on LongMemEval | mem0-local 0.0 on LongMemEval | llamaindex-memory 0.0 on LongMemEval | llm-baseline 0.0 on LongMemEval | langchain-memory 0.0 on LongMemEval | cognee 0.0 on LongMemEval
13 systems independently scored | 64 systems indexed
Head-to-Head

Compare AI Memory Systems

Side-by-side comparison based on independent benchmark results. All verified scores are from Bench'd runs using our open-source harness under identical conditions.

System            Type                   LongMemEval  LoCoMo  Status
LlamaIndex        Framework (OSS)        59%          54.8%
LLM Baseline      No memory system       57.6%        50.4%
LangChain Memory  Framework (OSS)        59%          51.9%
AutoGPT Memory    Framework (OSS)        47.4%
CrewAI Memory     Framework (OSS)        46%
Graphiti          Knowledge Graph (OSS)  0%
Letta             Agent Framework (OSS)  0%
Mem0 OSS          Open Source            32.4%        0%
Mem0 Managed      Managed Platform       93.4%*       68.5%*

* Self-reported scores are not independently verified by Bench'd.
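
To make the "below baseline" labels used further down concrete, here is a minimal sketch that computes each system's percentage-point gap to the LLM Baseline on LongMemEval. The numbers are copied from the table above; the names `SCORES`, `longmemeval_delta`, and `below_baseline` are illustrative only and are not part of the Bench'd harness.

```python
BASELINE = "LLM Baseline"

# (LongMemEval %, LoCoMo %, independently verified?)
# None means the table lists no LoCoMo score for that system.
# Mem0 Managed is marked unverified because its scores are self-reported.
SCORES = {
    "LlamaIndex": (59.0, 54.8, True),
    "LLM Baseline": (57.6, 50.4, True),
    "LangChain Memory": (59.0, 51.9, True),
    "AutoGPT Memory": (47.4, None, True),
    "CrewAI Memory": (46.0, None, True),
    "Graphiti": (0.0, None, True),
    "Letta": (0.0, None, True),
    "Mem0 OSS": (32.4, 0.0, True),
    "Mem0 Managed": (93.4, 68.5, False),
}


def longmemeval_delta(system):
    """Percentage-point gap between a system and the baseline on LongMemEval."""
    return round(SCORES[system][0] - SCORES[BASELINE][0], 1)


def below_baseline(verified_only=True):
    """Systems whose LongMemEval score is lower than the baseline's."""
    baseline_score = SCORES[BASELINE][0]
    return [
        name
        for name, (lme, _locomo, verified) in SCORES.items()
        if name != BASELINE
        and (verified or not verified_only)
        and lme < baseline_score
    ]
```

Keeping the verified flag next to each score lets a reader exclude self-reported numbers before ranking, which mirrors the footnote's distinction.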

Detailed Breakdown

LlamaIndex: Framework (OSS)
LongMemEval: 59%

Document-based memory with vector retrieval and reranking

Strengths

  • Highest verified score (tied with LangChain Memory at 59%)
  • Strong recall
  • Active development

Weaknesses

  • Weak temporal reasoning
  • Framework complexity

Best For

Teams already using LlamaIndex for RAG who need conversation memory

LLM Baseline: No memory system
LongMemEval: 57.6%

Raw GPT-4o-mini context window, no memory layer

Strengths

  • No setup required
  • Full context preservation
  • Zero latency overhead

Weaknesses

  • No temporal indexing
  • Context window limits
  • Cost scales with history

Best For

Short-to-medium conversations whose full history fits in the context window
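
The "Cost scales with history" weakness can be illustrated with simple arithmetic: if every turn resends the whole conversation so far, cumulative input tokens grow roughly quadratically with the number of turns. The sketch below assumes a fixed, hypothetical token count per turn and a caller-supplied window size; neither figure comes from the benchmark.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent when turn k resends all k turns so far.

    Assumes every turn adds the same number of tokens (a simplification).
    """
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


def fits_in_window(turns: int, tokens_per_turn: int, window: int) -> bool:
    """True if the full history after `turns` turns still fits in `window` tokens."""
    return turns * tokens_per_turn <= window
```

Under these assumptions, going from 10 to 100 turns multiplies the per-request size by 10 but the cumulative input by roughly 92, which is why the baseline suits conversations that stay short.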

LangChain Memory: Framework (OSS)
LongMemEval: 59%

In-memory message history with LLM-powered recall and smart truncation

Strengths

  • Tied for the highest verified score (with LlamaIndex at 59%)
  • Large ecosystem
  • Easy integration

Weaknesses

  • Weak temporal reasoning
  • Context truncation on long history
  • No persistent storage

Best For

Teams already using LangChain who need conversation memory
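
The "Context truncation on long history" weakness is easier to see with a toy example. The following is a minimal sketch of recency-based truncation under a token budget, not LangChain's actual API or algorithm; `truncate_history` is a hypothetical helper, and tokens are approximated by whitespace-separated words.

```python
def truncate_history(messages, max_tokens):
    """Keep the most recent messages whose combined size fits the budget.

    Token counts are approximated by word counts (an assumption for
    illustration; real systems use a model-specific tokenizer).
    """
    kept = []
    used = 0
    # Walk backwards from the newest message and stop once the budget is hit.
    for msg in reversed(messages):
        cost = len(msg.split())
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

With a small budget the oldest messages are dropped first, so a fact stated early in a long conversation may no longer be available when a later question depends on it.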

AutoGPT Memory: Framework (OSS)
LongMemEval: 47.4%

File-backed and vector-store memory for persistent task context across agent execution cycles

Strengths

  • Massive community
  • Autonomous agent integration
  • MIT licensed

Weaknesses

  • Below baseline
  • Weak temporal reasoning
  • Agent-centric design

Best For

Teams building autonomous agents with AutoGPT who need persistent memory

CrewAI Memory: Framework (OSS)
LongMemEval: 46%

Short-term, long-term, and entity memory for multi-agent crews — below baseline on LongMemEval

Strengths

  • Multi-agent memory sharing
  • Entity memory
  • MIT licensed

Weaknesses

  • Below baseline
  • Weak temporal reasoning
  • Agent-centric design

Best For

Teams using CrewAI for multi-agent orchestration who need shared crew memory

Graphiti: Knowledge Graph (OSS)
LongMemEval: 0%

Temporal knowledge graph with entity and relationship extraction — graph recall returns empty on LongMemEval

Strengths

  • Temporal knowledge graph
  • Entity extraction
  • Apache-2.0 licensed

Weaknesses

  • 0% on LongMemEval
  • Graph recall returns empty
  • Not suited for conversational memory

Best For

Structured knowledge graph use cases — not conversational memory retrieval

Letta: Agent Framework (OSS)
LongMemEval: 0%

Self-editing memory architecture (formerly MemGPT); interprets rather than recalls, scoring 0/380 on a partial run

Strengths

  • Self-editing memory
  • Unbounded context
  • Active community

Weaknesses

  • 0% on LongMemEval
  • Interprets rather than recalls
  • Agent architecture mismatch

Best For

Autonomous agent tasks where interpretation matters more than verbatim recall

Mem0 OSS: Open Source
LongMemEval: 32.4%

Automatic memory extraction with vector storage

Strengths

  • Simple API
  • Automatic extraction
  • Active community

Weaknesses

  • Below baseline
  • Missing managed platform features
  • Weak temporal reasoning

Best For

Quick memory integration where managed Mem0 isn't available

Mem0 Managed: Managed Platform
LongMemEval: 93.4% (self-reported)

Proprietary extraction, ranking, and retrieval pipeline

Strengths

  • Highest claimed score
  • Managed infrastructure
  • MCP compatible

Weaknesses

  • Scores not independently verified
  • Closed source
  • Paid service

Best For

Production deployments if independent verification confirms claims
