Metric · v1.0 · 20 questions

Knowledge Retrieval

Measures whether a memory system can store and accurately retrieve factual information from conversational history.

What it measures

Core retrieval accuracy: given a conversation history containing specific facts, can the system find and return the correct answer when queried?

Ingest a conversation history containing 5-10 turns with embedded facts (names, dates, preferences, events).
Query the system with questions that require retrieving specific facts from the ingested history.
Score each response using exact match with containment fallback (normalized, case-insensitive).
Report the percentage of questions answered correctly.

Deterministic (exact match + containment). No LLM judge required.
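The scoring rule described above can be sketched as follows. This is a minimal illustration, not Bench'd's actual scorer: the function names (`normalize`, `is_correct`, `accuracy`) are hypothetical, and the normalization steps (lowercasing, stripping ASCII punctuation, collapsing whitespace) and the containment direction (the expected answer must appear inside the response) are assumptions, since the source only says "normalized, case-insensitive".

```python
import re
import string

# Assumed normalization: the source does not list the exact steps.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    text = text.lower().translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", text).strip()


def is_correct(response: str, expected: str) -> bool:
    """Exact match first, then fall back to substring containment."""
    resp, gold = normalize(response), normalize(expected)
    if not gold:
        # An empty expected answer would be contained in everything.
        return False
    if resp == gold:
        return True
    # Containment fallback (assumed direction: expected inside response).
    return gold in resp


def accuracy(pairs: list[tuple[str, str]]) -> float:
    """Percentage of (response, expected) pairs scored as correct."""
    if not pairs:
        return 0.0
    correct = sum(is_correct(resp, gold) for resp, gold in pairs)
    return 100.0 * correct / len(pairs)


if __name__ == "__main__":
    sample = [
        ("Paris", "paris"),                        # exact match after normalization
        ("She moved to Paris in 2019.", "Paris"),  # containment fallback
        ("Lyon", "Paris"),                         # wrong answer
        ("3 March", "March 3rd"),                  # reworded, scored as a miss
    ]
    print(f"{accuracy(sample):.1f}% correct")
```

The last sample pair shows why a deterministic scorer of this kind can undercount: a response that is right but phrased differently from the reference matches neither the exact nor the containment check. Plain substring containment can also overcount when the expected answer appears inside a longer, unrelated word, so a stricter variant might match on token boundaries instead.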

Dimensions tested: recall

How this metric relates to each track (v1.0):

See the full failure taxonomy for all 20+ reason codes.

Bench'd internal dataset, hand-crafted conversational scenarios.

Tests single-conversation retrieval only; does not test cross-conversation recall.
A set of only 20 questions may not capture long-tail failure modes.
Exact match scoring may miss semantically correct but differently worded answers.

Stable URL: benchd.ai/methodology/metrics/knowledge-retrieval
This URL is referenced in signed manifests. It will not change.