Methodology
Metric v1.0 · 25 questions
Reliability
Adversarial robustness benchmark testing stale memory handling, entity separation, hallucination resistance, and deletion compliance.
What it measures
Robustness under adversarial conditions: does the system handle edge cases that trip up real-world memory systems?
How it works
- Run 25 adversarial trap questions across 4 sub-dimensions:
  - Stale Memory Handling: does the system return outdated info after updates?
  - Entity Separation: does the system confuse similar entities?
  - Hallucination Resistance: does the system abstain when it has no relevant memory?
  - Deletion Compliance: does the system honor explicit forget/delete requests?
- Score using the reliability trap method: each response must contain the expected behavioral indicators.
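The steps above can be sketched as a per-sub-dimension tally. This is a minimal illustration, not Bench'd's implementation: the `TrapResult` type, the snake_case sub-dimension identifiers, and the use of a plain pass fraction are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical identifiers for the four sub-dimensions listed above.
SUB_DIMENSIONS = (
    "stale_memory_handling",
    "entity_separation",
    "hallucination_resistance",
    "deletion_compliance",
)


@dataclass(frozen=True)
class TrapResult:
    """Outcome of one adversarial trap question (hypothetical type)."""
    sub_dimension: str
    passed: bool


def pass_rates(results):
    """Return the fraction of passed traps for each sub-dimension that has results."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for result in results:
        if result.sub_dimension not in SUB_DIMENSIONS:
            raise ValueError(f"unknown sub-dimension: {result.sub_dimension}")
        total[result.sub_dimension] += 1
        passed[result.sub_dimension] += int(result.passed)
    # Sub-dimensions with no questions are omitted rather than scored as 0.
    return {d: passed[d] / total[d] for d in SUB_DIMENSIONS if total[d]}
```

Keeping the tally per sub-dimension, rather than collapsing straight to one number, makes it visible when a single capability drags the overall score down.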
Scoring method
Deterministic (reliability trap). Each response passes or fails on a keyword match against the expected behavioral indicators.
Dimensions tested: recall, temporal
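A keyword-based pass/fail check of this kind might look like the sketch below. The function name `passes_trap`, the case-insensitive substring matching, and the `require_all` switch are assumptions for illustration; the source does not specify whether one indicator or all indicators must appear.

```python
def passes_trap(response, expected_indicators, require_all=False):
    """Return True if the response contains the expected behavioral indicators.

    Matching is a case-insensitive substring test (an assumption).
    With require_all=False, a single matching indicator is enough.
    """
    text = response.casefold()
    hits = [indicator.casefold() in text for indicator in expected_indicators]
    if not hits:
        # No indicators defined: treat the trap as failed rather than vacuously passed.
        return False
    return all(hits) if require_all else any(hits)
```

For a hallucination-resistance trap, for example, the indicators could be abstention phrases such as "don't have" or "no record". The check is deterministic because the same response and the same indicator list always give the same verdict.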
Purpose alignment
How this metric relates to each track (v1.0):
| Track | Alignment |
|---|---|
| conversational | core |
| knowledge-brain | adjacent |
| graph | adjacent |
| agent-memory | core |
| baseline | core |
Expected failure modes
- STALE_MEMORY — returns outdated information
- WRONG_ENTITY — confuses similar entities
- HALLUCINATION — generates response when memory is empty
- DELETION_FAILURE — does not honor delete/forget requests
See the full failure taxonomy for all 20+ reason codes.
Dataset source
Bench'd adversarial dataset, hand-crafted robustness scenarios.
Known limitations
- Sub-dimension scoring can produce low overall scores when a system doesn't support certain capabilities (e.g., no delete API).
- The interpretation system accounts for this with the 'capability_limited' label.
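As a rough illustration of the idea, the sketch below attaches the 'capability_limited' label to sub-dimensions that a system cannot support. Only the label name comes from the text above; the function name, the input shapes, and the rule for applying the label are hypothetical and may differ from how the interpretation system actually works.

```python
def label_sub_dimensions(pass_rates, unsupported):
    """Annotate each sub-dimension score, flagging unsupported capabilities.

    pass_rates:  dict mapping sub-dimension name to pass fraction (hypothetical shape)
    unsupported: set of sub-dimension names the system has no capability for
    """
    labeled = {}
    for name, rate in pass_rates.items():
        # A low score on an unsupported capability is flagged, not silently reported.
        label = "capability_limited" if name in unsupported else None
        labeled[name] = {"pass_rate": rate, "label": label}
    return labeled
```

A system with no delete API would then show a low Deletion Compliance score alongside the label, so a reader can tell a missing capability apart from a faulty one.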
Stable URL: benchd.ai/methodology/metrics/reliability
This URL is referenced in signed manifests. It will not change.