Methodology
Metric v1.0 · 15 questions
Knowledge Scale
Tests how retrieval accuracy degrades as the volume of stored knowledge increases (10, 50, 100 pages).
What it measures
Scalability: does the system maintain accuracy as the knowledge base grows from small to moderate size?
How it works
- Run three tiers: 10 pages, 50 pages, 100 pages of content.
- At each tier, ingest the full corpus, then query it with 5 fact-retrieval questions (15 questions across the three tiers).
- Score using exact match with containment fallback.
- Report accuracy at each tier and the degradation curve.
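The steps above can be sketched as a small harness. This is a minimal illustration, not Bench'd's actual runner: `run_knowledge_scale`, `build_system`, and `degradation_curve` are hypothetical names, and the curve is assumed here to mean the accuracy change relative to the 10-page tier, which the page does not define.

```python
from typing import Callable, Dict, List, Sequence, Tuple

TIERS = (10, 50, 100)  # corpus sizes in pages, as listed above

QA = Tuple[str, str]  # (question, expected answer)


def run_knowledge_scale(
    build_system: Callable[[int], Callable[[str], str]],
    questions: Dict[int, Sequence[QA]],
    is_correct: Callable[[str, str], bool],
) -> Dict[int, float]:
    """Return accuracy per tier.

    build_system(pages) is a hypothetical hook: it ingests the full corpus
    for that tier and returns a function mapping a question to an answer.
    """
    accuracy: Dict[int, float] = {}
    for pages in TIERS:
        ask = build_system(pages)
        hits = [is_correct(ask(q), expected) for q, expected in questions[pages]]
        accuracy[pages] = sum(hits) / len(hits)
    return accuracy


def degradation_curve(accuracy: Dict[int, float]) -> List[Tuple[int, float]]:
    """Accuracy change at each tier relative to the smallest tier (assumed definition)."""
    base = accuracy[TIERS[0]]
    return [(pages, accuracy[pages] - base) for pages in TIERS]
```

Passing the scorer in as `is_correct` keeps the tier loop independent of how answers are judged, so the same loop works with any deterministic scoring rule.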
Scoring method
Deterministic (exact match + containment) at each tier.
Dimensions tested: recall
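One way to read "exact match with containment fallback" is sketched below. The normalization (case folding and whitespace collapsing) and the function names `normalize` and `score` are assumptions made for illustration; the page does not specify how answers are normalized before comparison.

```python
def normalize(text: str) -> str:
    """Case-fold and collapse whitespace (assumed normalization)."""
    return " ".join(text.casefold().split())


def score(answer: str, expected: str) -> bool:
    """Exact match first; fall back to checking that the expected
    answer appears somewhere inside the system's answer."""
    a, e = normalize(answer), normalize(expected)
    if not e:
        return False  # an empty expected answer would match anything
    if a == e:
        return True  # exact match
    return e in a  # containment fallback
```

For example, `score("The capital is Paris.", "Paris")` passes through the containment fallback, while `score("London", "Paris")` fails both checks. Plain substring containment can over-credit short expected answers (an expected "12" is contained in "120"), so a stricter variant might match on word boundaries.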
Purpose alignment
How this metric relates to each track (v1.0):
| Track | Alignment |
|---|---|
| conversational | orthogonal |
| knowledge-brain | core |
| graph | core |
| agent-memory | adjacent |
| baseline | core |
Expected failure modes
- RETRIEVAL_MISS — expected answer not in returned context
- OVER_RETRIEVAL — returns too much context, diluting the answer
- PARTIAL_ANSWER — finds some but not all requested information
See the full failure taxonomy for all 20+ reason codes.
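As a rough illustration of how the three codes above might be assigned to a question that has already been scored as incorrect, consider the heuristic below. Everything beyond the three code names is an assumption: the `classify` function, the check order, the substring test against the retrieved context, and the `max_context_words` threshold are hypothetical and are not taken from the failure taxonomy.

```python
from enum import Enum
from typing import Optional, Sequence


class Failure(Enum):
    RETRIEVAL_MISS = "RETRIEVAL_MISS"
    OVER_RETRIEVAL = "OVER_RETRIEVAL"
    PARTIAL_ANSWER = "PARTIAL_ANSWER"


def classify(
    context: str,
    expected_facts: Sequence[str],
    max_context_words: int = 500,  # hypothetical threshold, not from the source
) -> Optional[Failure]:
    """Assign a reason code to an incorrectly answered question (heuristic sketch)."""
    ctx = context.casefold()
    found = [fact for fact in expected_facts if fact.casefold() in ctx]
    if not found:
        return Failure.RETRIEVAL_MISS  # no expected fact is in the context
    if len(found) < len(expected_facts):
        return Failure.PARTIAL_ANSWER  # some, but not all, facts were found
    if len(ctx.split()) > max_context_words:
        return Failure.OVER_RETRIEVAL  # all facts present, but buried in a long context
    return None  # none of the three codes applies
```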
Dataset source
Bench'd synthetic knowledge corpus with planted retrievable facts.
Known limitations
- 100 pages is modest; real-world knowledge bases can be 10,000+ pages.
- Synthetic corpus may not capture domain-specific retrieval challenges.
Stable URL: benchd.ai/methodology/metrics/knowledge-scale
This URL is referenced in signed manifests. It will not change.