Hindsight
Self-Reported
Source-available memory system focused on retrospective analysis and memory consolidation. It periodically re-evaluates stored memories to surface stale or contradictory information.
These scores are self-reported by the vendor and have not been independently verified by Bench'd.
Scores range from 0 to 100; higher is better. The LLM Baseline (no memory system) scores 57.6%.
Track: Conversational Memory
Track Index: No results yet
Benchmark Results
| Benchmark | Score | Status | Receipt |
|---|---|---|---|
| LongMemEval | Pending | Pending | -- |
| LoCoMo | Pending | Pending | -- |
| Reliability | Pending | Pending | -- |
| Truth Arbitration | Pending | Pending | -- |
| Memory Poisoning | Pending | Pending | -- |
| Budget Curves | Pending | Pending | -- |
| Other Benchmarks | | | |
| Knowledge Retrieval | Not applicable (outside the Conversational Memory track) | | |
| Knowledge Scale | Not applicable (outside the Conversational Memory track) | | |
Relative Performance vs All Benchmarked Systems
Compared against 16 scored systems. In the chart, each dot is a system, the amber dot is Hindsight, and the amber line marks the LLM Baseline (no memory).

| Dimension | Score | Percentile | LLM Baseline (no memory) |
|---|---|---|---|
| Overall | 89.0 | 81st | 57.6% |
| Recall | 92.1 | 81st | 57.6% |
| Temporal | 88.4 | 81st | 57.6% |
| Reasoning | 83.7 | 81st | 57.6% |
Bench'd Memory Index
The BMI combines accuracy (70%) and efficiency (30%) into a single production-weighted score. The formula is public and versioned.
89.0 / 100
#1 of 8 systems (top 12%)
Accuracy (70%): 89.0
Efficiency (30%): --
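The 70/30 weighting described above can be sketched as follows. This is a minimal illustration, not Bench'd's published formula. The function names are hypothetical, and two behaviours are assumptions: when the efficiency score is missing, the weights are renormalised over the components that are present (which reproduces the displayed 89.0), and "Top N%" is computed by truncating rank / total.

```python
from typing import Optional

# Weights as stated on this page: accuracy 70%, efficiency 30%.
ACCURACY_WEIGHT = 0.70
EFFICIENCY_WEIGHT = 0.30


def bmi(accuracy: float, efficiency: Optional[float] = None) -> float:
    """Combine 0-100 accuracy and efficiency sub-scores into one score.

    Assumption (not confirmed by the source): a missing efficiency score is
    handled by renormalising the weights over the available components.
    """
    parts = [(ACCURACY_WEIGHT, accuracy)]
    if efficiency is not None:
        parts.append((EFFICIENCY_WEIGHT, efficiency))
    total_weight = sum(weight for weight, _ in parts)
    weighted_sum = sum(weight * score for weight, score in parts)
    return round(weighted_sum / total_weight, 1)


def top_percent(rank: int, total: int) -> int:
    """Express a 1-based rank as 'Top N%' (hypothetical truncation rule)."""
    return int(100 * rank / total)


if __name__ == "__main__":
    print(bmi(89.0))           # efficiency unavailable, as on this page
    print(bmi(89.0, 50.0))     # hypothetical efficiency score
    print(top_percent(1, 8))   # rank #1 of 8 systems
```

Under these assumptions, `bmi(89.0)` returns 89.0, matching the displayed score, while a hypothetical efficiency of 50.0 would pull the BMI down to 77.3. `top_percent(1, 8)` gives 12, consistent with the "top 12%" label.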
Per-Benchmark Breakdown
| Benchmark | Verified | Nuance |
|---|---|---|