Methodology
Metric · v1.0 · 5 questions

Truth Arbitration

Tests whether the system correctly resolves conflicting information by preferring the most recent value. Arbitration by source authority is not tested in this version (see Known limitations).

What it measures

Conflict resolution: when two memories contradict each other, does the system return the correct (most recent) value?

How it works

  1. Ingest a conversation where a fact is stated, then later corrected or updated.
  2. Query for the current value of the fact.
  3. Score the response: the system must return the updated value, not the original.
  4. Check whether the system also acknowledges that a change occurred.
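
The steps above can be sketched as a small harness. This is an illustrative sketch only: the `Scenario` schema, the `LastWriteWinsMemory` toy system, and the `run_scenario` helper are hypothetical names invented for this example, not part of the Bench'd harness, and the sample fact is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """One contradiction case (hypothetical schema, not the real dataset format)."""
    turns: list[tuple[str, str]]  # (fact key, stated value) in conversation order
    query_key: str
    expected: str  # the most recent value for query_key


@dataclass
class LastWriteWinsMemory:
    """Toy memory system: a later statement overwrites an earlier one."""
    facts: dict[str, str] = field(default_factory=dict)
    history: dict[str, list[str]] = field(default_factory=dict)

    def ingest(self, key: str, value: str) -> None:
        self.facts[key] = value
        self.history.setdefault(key, []).append(value)

    def query(self, key: str) -> str:
        return self.facts[key]

    def changed(self, key: str) -> bool:
        # True when more than one distinct value was stated for this key.
        return len(set(self.history.get(key, []))) > 1


def run_scenario(memory: LastWriteWinsMemory, scenario: Scenario) -> dict:
    for key, value in scenario.turns:           # step 1: ingest the conversation
        memory.ingest(key, value)
    answer = memory.query(scenario.query_key)   # step 2: query the current value
    return {
        "answer": answer,
        "correct": answer == scenario.expected,                      # step 3
        "acknowledged_change": memory.changed(scenario.query_key),   # step 4
    }


scenario = Scenario(
    turns=[("home_city", "Berlin"), ("home_city", "Lisbon")],  # stated, then updated
    query_key="home_city",
    expected="Lisbon",
)
result = run_scenario(LastWriteWinsMemory(), scenario)
```

A system that kept only the first statement would return "Berlin" here and fail step 3.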

Scoring method

Deterministic (exact match). The correct answer is always the most recent value.

Dimensions tested: temporal
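
A minimal sketch of deterministic exact-match scoring follows. The page does not specify normalization or aggregation, so two assumptions are made and marked in the comments: case and whitespace are ignored before comparison, and the metric score is the fraction of questions answered correctly. The function names are hypothetical.

```python
def normalize(text: str) -> str:
    # Assumption: case and extra whitespace do not count as a mismatch.
    return " ".join(text.lower().split())


def exact_match(response: str, expected: str) -> bool:
    """True when the response equals the most recent value after normalization."""
    return normalize(response) == normalize(expected)


def metric_score(pairs: list[tuple[str, str]]) -> float:
    """Assumed aggregation: fraction of (response, expected) pairs that match."""
    if not pairs:
        return 0.0
    return sum(exact_match(r, e) for r, e in pairs) / len(pairs)
```

Because the comparison involves no model-based judge, the same responses always produce the same score.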

Purpose alignment

How this metric relates to each track (v1.0):

Track             Alignment
conversational    core
knowledge-brain   core
graph             core
agent-memory      core
baseline          core

Expected failure modes

  • STALE_MEMORY — returns the original value instead of the update
  • CONFLICT_UNRESOLVED — returns both values without choosing
  • TEMPORAL_CONFUSION — cannot determine which value is more recent
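
One way to map a wrong response onto these three codes is a substring heuristic, sketched below. This is an assumption for illustration, not the harness's actual classifier: the `classify_failure` function is hypothetical, and treating "neither value present" as TEMPORAL_CONFUSION is only a rough proxy.

```python
from typing import Optional


def classify_failure(response: str, original: str, updated: str) -> Optional[str]:
    """Return a reason code for a wrong response, or None when it looks correct."""
    text = response.lower()
    has_original = original.lower() in text
    has_updated = updated.lower() in text

    if has_original and has_updated:
        return "CONFLICT_UNRESOLVED"   # both values returned, none chosen
    if has_updated:
        return None                    # only the most recent value is present
    if has_original:
        return "STALE_MEMORY"          # the pre-update value was returned
    # Assumption: a response naming neither value could not settle on one.
    return "TEMPORAL_CONFUSION"
```

A response that correctly explains the change, such as one naming both the old and the new value, would be flagged CONFLICT_UNRESOLVED by this heuristic, so a real classifier would need more than substring checks.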

See the full failure taxonomy for all 20+ reason codes.

Dataset source

Bench'd internal dataset, hand-crafted contradiction scenarios.

Known limitations

  • 5 questions is a small sample; may not capture all conflict patterns.
  • Only tests temporal recency; does not test authority-based arbitration.

Stable URL: benchd.ai/methodology/metrics/truth-arbitration
This URL is referenced in signed manifests. It will not change.
