Methodology
Metric · v1.0 · 5 questions
Truth Arbitration
Tests whether the system correctly resolves conflicting information. The intent is to prefer the most recent or most authoritative source; the current question set exercises only recency.
What it measures
Conflict resolution: when two memories contradict each other, does the system return the correct (most recent) value?
How it works
- Ingest a conversation where a fact is stated, then later corrected or updated.
- Query for the current value of the fact.
- Score: the system must return the updated value, not the original.
- Also tests whether the system acknowledges that a change occurred.
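The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Bench'd harness: the `Statement` shape, the `user.city` key, and the helper names are all hypothetical, and it assumes each statement carries a turn index that orders it in the conversation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Statement:
    """One statement of a fact in an ingested conversation (hypothetical shape)."""
    turn: int   # position in the conversation; higher means later
    key: str    # which fact is being stated, e.g. "user.city"
    value: str  # the value asserted at that turn


def current_value(statements: List[Statement], key: str) -> str:
    """Return the most recently stated value for `key`."""
    matching = [s for s in statements if s.key == key]
    if not matching:
        raise KeyError(f"no statement found for {key!r}")
    return max(matching, key=lambda s: s.turn).value


def was_updated(statements: List[Statement], key: str) -> bool:
    """True if more than one distinct value was stated for `key`."""
    return len({s.value for s in statements if s.key == key}) > 1


# Example scenario: a fact is stated, then corrected later (illustrative data).
conversation = [
    Statement(turn=1, key="user.city", value="Berlin"),
    Statement(turn=7, key="user.city", value="Lisbon"),  # later correction
]
expected_answer = current_value(conversation, "user.city")
change_occurred = was_updated(conversation, "user.city")
```

Here `expected_answer` is the value a system under test would need to return for the query, and `change_occurred` corresponds to the change a system is also expected to acknowledge.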
Scoring method
Deterministic (exact match). The correct answer is always the most recent value.
Dimensions tested: temporal
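A deterministic exact-match check might look like the sketch below. Two details are assumptions rather than documented behavior: the normalization (lowercasing and whitespace collapsing) and the aggregation of per-question results into a fraction. The function names are hypothetical.

```python
from typing import List, Tuple


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, trim, and collapse internal whitespace."""
    return " ".join(text.lower().split())


def score_answer(answer: str, updated_value: str) -> int:
    """Return 1 if the answer exactly matches the updated value, else 0."""
    return 1 if normalize(answer) == normalize(updated_value) else 0


def metric_score(results: List[Tuple[str, str]]) -> float:
    """Fraction of (answer, updated_value) pairs that match (assumed aggregation)."""
    if not results:
        raise ValueError("at least one result is required")
    return sum(score_answer(a, e) for a, e in results) / len(results)
```

Because the comparison involves no model-based judge, the same inputs always produce the same score.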
Purpose alignment
How this metric relates to each track (v1.0):
| Track | Alignment |
|---|---|
| conversational | core |
| knowledge-brain | core |
| graph | core |
| agent-memory | core |
| baseline | core |
Expected failure modes
- STALE_MEMORY — returns the original value instead of the update
- CONFLICT_UNRESOLVED — returns both values without choosing
- TEMPORAL_CONFUSION — cannot determine which value is more recent
See the full failure taxonomy for all 20+ reason codes.
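As a rough illustration of how two of these reason codes could be told apart from the answer text alone, the sketch below uses a simple substring heuristic. This is an assumption, not the taxonomy's actual assignment logic. It would mislabel an answer that correctly mentions both values (for example, one that acknowledges the change), and it cannot detect TEMPORAL_CONFUSION, which is not visible from the answer text. The `UNCLASSIFIED` label is a hypothetical placeholder, not a documented reason code.

```python
from typing import Optional


def classify_failure(answer: str, original: str, updated: str) -> Optional[str]:
    """Heuristically map an answer to a reason code; None means it looks correct."""
    text = answer.lower()
    has_original = original.lower() in text
    has_updated = updated.lower() in text

    if has_original and has_updated:
        return "CONFLICT_UNRESOLVED"  # both values returned without choosing
    if has_original:
        return "STALE_MEMORY"         # original value returned instead of the update
    if has_updated:
        return None                   # only the updated value appears
    return "UNCLASSIFIED"             # hypothetical placeholder, not a documented code
```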
Dataset source
Bench'd internal dataset, hand-crafted contradiction scenarios.
Known limitations
- Five questions is a small sample and may not capture all conflict patterns.
- Only tests temporal recency; does not test authority-based arbitration.
Stable URL: benchd.ai/methodology/metrics/truth-arbitration
This URL is referenced in signed manifests. It will not change.