ProofMeter Specification v1.0

Receipt Specification

ProofMeter provides cryptographic spend attestation for AI agent actions. This document defines the receipt format, budget capability model, signing scheme, and verification rules. Patent Pending.

Core loop

Authorize→Meter→Sign→Settle→Verify

Before an agent, benchmark run, or workflow can incur cost, it receives a signed budget capability. Each billable action produces a signed, hash-chained spend receipt. When the task completes, receipts are aggregated into a Merkle-rooted settlement. Any party can independently verify any receipt or settlement without trusting the runner or the platform.

Budget Capability

A budget capability is a signed permission granting an agent a maximum spend within defined scope and time bounds.

json

{
  "schema": "proofmeter.capability.v1",
  "capability_id": "cap_01HX...",
  "namespace_id": "ns_benchd",
  "authorized_agent_id": "benchd-runner",
  "max_budget_cents": 500,
  "currency": "USD",
  "scope": {
    "allowed_providers": ["openai", "anthropic"],
    "allowed_endpoint_classes": ["chat", "embedding"]
  },
  "expires_at": "2026-05-18T00:00:00Z",
  "metadata": {
    "task_id": "run_abc123",
    "source": "benchd-harness"
  },
  "signature": {
    "algorithm": "Ed25519",
    "key_id": "key_01HX...",
    "value": "base64..."
  }
}

max_budget_cents — Hard spending limit in cents. Enforced per-receipt.

scope — Restricts which providers and endpoint classes the agent may use.

expires_at — Capability becomes invalid after this timestamp.
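The three fields above can be checked mechanically before a spend is recorded. The following is a minimal sketch, not the reference implementation: `check_spend`, its parameters, and the running `spent_cents` counter are hypothetical names chosen for illustration, and it assumes a capability stays valid up to and including the instant of `expires_at`.

```python
from datetime import datetime, timezone

# Example capability, trimmed to the fields this check reads.
capability = {
    "max_budget_cents": 500,
    "scope": {
        "allowed_providers": ["openai", "anthropic"],
        "allowed_endpoint_classes": ["chat", "embedding"],
    },
    "expires_at": "2026-05-18T00:00:00Z",
}


def check_spend(capability, provider, endpoint_class, cost_cents, spent_cents, now=None):
    """Return (allowed, reason) for a proposed spend (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    # Normalise the trailing "Z" so datetime.fromisoformat accepts it everywhere.
    expires_at = datetime.fromisoformat(capability["expires_at"].replace("Z", "+00:00"))
    if now > expires_at:
        return False, "capability expired"
    scope = capability["scope"]
    if provider not in scope["allowed_providers"]:
        return False, "provider out of scope"
    if endpoint_class not in scope["allowed_endpoint_classes"]:
        return False, "endpoint class out of scope"
    # Assumption: the hard limit applies to cumulative spend, checked on every receipt.
    if spent_cents + cost_cents > capability["max_budget_cents"]:
        return False, "budget exceeded"
    return True, "ok"
```

Returning a reason alongside the decision lets a meter log why a spend was refused without raising.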

Design principle: Usage is fact. Cost is derived.

Receipts attest to provable facts: tokens consumed, provider called, model used, and timestamps. Cost is a derived computation that depends on who is computing it (list price, enterprise discount, internal chargeback). The receipt never claims to know what the customer actually paid. Different parties can compute different cost views from the same provable token counts.
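To make the principle concrete, the sketch below derives two cost views from one set of token counts. The function name `cost_view`, the rate-table keys, and the per-million-token rates are illustrative assumptions, not values defined by this specification.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def cost_view(usage, rates):
    """Derive a USD cost from proven token counts and a per-million-token rate table."""
    input_cost = Decimal(usage["input_tokens"]) * rates["input_per_million_usd"] / PER_MILLION
    output_cost = Decimal(usage["output_tokens"]) * rates["output_per_million_usd"] / PER_MILLION
    return input_cost + output_cost


# The proven facts: identical for every party.
usage = {"input_tokens": 1200, "output_tokens": 300}

# Hypothetical rate tables: a public list price and a 50% negotiated discount.
list_rates = {"input_per_million_usd": Decimal("2.50"), "output_per_million_usd": Decimal("10.00")}
discounted_rates = {"input_per_million_usd": Decimal("1.25"), "output_per_million_usd": Decimal("5.00")}

list_cost = cost_view(usage, list_rates)
discounted_cost = cost_view(usage, discounted_rates)
```

`Decimal` is used instead of `float` so that sub-cent amounts add up exactly when many receipts are summed.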

Usage Receipt

Every API/LLM call produces a receipt with two distinct sections: proven usage (signed, verifiable forever) and an optional cost estimate (derived from a declared pricing table, re-computable by anyone).

json

{
  "schema": "proofmeter.receipt.v1.1",
  "receipt_id": "rcpt_01HX...",
  "namespace_id": "ns_benchd",
  "actor_id": "benchd-runner",
  "capability_id": "cap_01HX...",
  "task_id": "run_abc123",

  "proven_usage": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "endpoint_class": "chat",
    "input_tokens": 1200,
    "output_tokens": 300,
    "total_tokens": 1500,
    "latency_ms": 842,
    "occurred_at": "2026-05-17T14:30:00Z"
  },

  "cost_estimate": {
    "estimated_cost_usd": 0.00033,
    "cost_confidence": "estimated",
    "pricing_basis": "list_price",
    "pricing_version": "2026-05",
    "note": "Derived from public list prices. Actual billed amount may differ."
  },

  "metadata": {
    "question_id": "q_014",
    "benchmark": "reliability",
    "adapter": "verifiedstate"
  },

  "chain": {
    "previous_hash": "sha256:abc123...",
    "event_hash": "sha256:def456..."
  },
  "signature": {
    "algorithm": "Ed25519",
    "key_id": "key_01HX...",
    "value": "base64..."
  }
}

proven_usage — Signed, verifiable facts from the API response. This never changes.

cost_estimate — Derived from a declared pricing table. Can be recomputed by anyone with a different pricing table. Explicitly labeled as an estimate.

cost_confidence — One of: estimated (list prices), customer_supplied (their rates), invoice_reconciled (matched to billing), usage_only (no cost calculated).

Hash Chaining

Receipts are hash-chained: each receipt's event_hash is computed over the canonical JSON of the receipt payload plus the previous receipt's hash. The first receipt has no predecessor, so the literal string "null" takes the place of the previous hash. This creates a tamper-evident chain: modifying or removing any receipt breaks the chain for all subsequent receipts.

text

Receipt 1:  event_hash = SHA-256(canonical(payload) + "null")
Receipt 2:  event_hash = SHA-256(canonical(payload) + receipt_1.event_hash)
Receipt 3:  event_hash = SHA-256(canonical(payload) + receipt_2.event_hash)
...
Settlement: merkle_root = Merkle(all event_hashes)
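The chain construction above can be sketched in a few lines. This is an illustration under stated assumptions: `json.dumps` with sorted keys and compact separators only approximates JCS (RFC 8785), "+" is read as byte concatenation of UTF-8 strings, and the previous hash is appended in its prefixed `sha256:...` form. None of these details are fixed by the text above, and the function names are hypothetical.

```python
import hashlib
import json


def canonical(payload):
    # Approximation of JCS: sorted keys, no insignificant whitespace, UTF-8 bytes.
    # Full RFC 8785 also constrains number and string serialization.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def event_hash(payload, previous_hash):
    # The first receipt has no predecessor, so the literal string "null" is appended.
    prev = "null" if previous_hash is None else previous_hash
    digest = hashlib.sha256(canonical(payload) + prev.encode("utf-8")).hexdigest()
    return "sha256:" + digest


def build_chain(payloads):
    """Return the event_hash of every payload, each one linked to its predecessor."""
    hashes = []
    previous = None
    for payload in payloads:
        previous = event_hash(payload, previous)
        hashes.append(previous)
    return hashes
```

Because each hash feeds into the next, editing one payload changes its own hash and every hash after it, while earlier hashes stay the same.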

Settlement

When a task completes, all receipts are settled into a Merkle-rooted batch. The settlement is the final audit record for the task.

json

{
  "schema": "proofmeter.settlement.v1.1",
  "settlement_id": "stl_01HX...",
  "namespace_id": "ns_benchd",
  "task_id": "run_abc123",
  "capability_id": "cap_01HX...",
  "receipt_count": 294,

  "proven_totals": {
    "total_input_tokens": 352800,
    "total_output_tokens": 88200,
    "total_tokens": 441000
  },

  "cost_estimate": {
    "estimated_total_usd": "0.2646",
    "cost_confidence": "estimated",
    "pricing_basis": "list_price",
    "pricing_version": "2026-05"
  },

  "merkle_root": "sha256:...",
  "signature": {
    "algorithm": "Ed25519",
    "key_id": "key_01HX...",
    "value": "base64..."
  }
}
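The specification does not spell out how the Merkle tree is built, so the following is only one plausible construction, offered as a sketch. It assumes each leaf is the SHA-256 of a receipt's `event_hash` string, that parents hash the concatenation of their two children, and that an odd node is paired with itself. `merkle_root` is a hypothetical helper name.

```python
import hashlib


def merkle_root(event_hashes):
    """Fold a list of event_hash strings into a single root (assumed construction)."""
    if not event_hashes:
        raise ValueError("a settlement needs at least one receipt")
    # Leaves: hash each event_hash string so every node is a fixed 32 bytes.
    level = [hashlib.sha256(h.encode("utf-8")).digest() for h in event_hashes]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: an unpaired node is duplicated to complete the pair.
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return "sha256:" + level[0].hex()
```

A verifier holding all receipts can recompute the root and compare it with the settlement's `merkle_root` field; a production design would also need to pin down leaf ordering and domain separation between leaves and interior nodes.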

Signing scheme

Algorithm: Ed25519

Canonicalization: JCS (RFC 8785) — deterministic JSON serialization

Hash: SHA-256 over canonical JSON bytes

Signature: Ed25519 signature over the SHA-256 hash

Key format: Hex-encoded 32-byte public keys

This is the same signing scheme used by the Bench'd harness for benchmark manifests. A single manifest may carry both a harness signature (proving the score) and ProofMeter receipts (proving the cost).
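As an illustration of the pipeline up to the point of signing, the sketch below computes the digest that would be handed to an Ed25519 signer and validates the key format. It assumes the `signature` field is excluded from the signed bytes and uses `json.dumps` as an approximation of JCS; both are assumptions, and `signing_digest` and `parse_public_key` are hypothetical names. The Ed25519 operation itself is left to a cryptographic library and is not shown.

```python
import hashlib
import json


def signing_digest(document):
    """SHA-256 over the canonical JSON of a document, minus its signature field."""
    # Assumption: a document cannot sign over its own signature, so it is dropped.
    unsigned = {k: v for k, v in document.items() if k != "signature"}
    canonical = json.dumps(
        unsigned, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).digest()


def parse_public_key(hex_key):
    """Decode a hex-encoded Ed25519 public key and check that it is 32 bytes."""
    raw = bytes.fromhex(hex_key)
    if len(raw) != 32:
        raise ValueError("Ed25519 public keys are 32 bytes (64 hex characters)")
    return raw
```

Canonicalizing before hashing is what makes the digest independent of key order and whitespace, so two parties serializing the same document arrive at the same bytes to sign or verify.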

Verification

Any party can verify a receipt or settlement without trusting the runner:

1. Re-canonicalize the receipt payload using JCS.
2. Recompute the SHA-256 hash of the canonical bytes + previous_hash.
3. Verify the Ed25519 signature against the computed hash.
4. Check that event_hash matches the recomputed hash (chain integrity).
5. For settlements: verify the Merkle root against all receipt hashes.
6. Check budget: total spend across receipts ≤ capability max_budget_cents.

MCP integration

ProofMeter is accessible as MCP tools. Any MCP-compatible agent can authorize budgets, record spend, and verify receipts:

Tool	Action
meter_authorize	Create budget capability
meter_spend	Record spend event, get signed receipt
meter_budget	Check remaining budget
meter_settle	Settle receipts into Merkle batch
meter_verify	Verify receipt signature and chain
meter_receipts	List and filter receipts
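MCP tools are invoked with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `meter_authorize`. The argument names (`agent_id`, `max_budget_cents`) are assumptions borrowed from the reference client's parameters; the actual tool input schema is not given in this document, and `tool_call_request` is a hypothetical helper.

```python
import json


def tool_call_request(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumed argument names; consult the server's tool listing for the real schema.
request = tool_call_request(
    1, "meter_authorize", {"agent_id": "my-agent", "max_budget_cents": 500}
)
wire_message = json.dumps(request)
```

An MCP client library would normally build and send this message; the raw form is shown only to make clear what crosses the wire.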

Receipt examples by pricing mode

The same usage event looks different depending on the pricing mode. The proven_usage section is identical in all cases — only the cost section changes.

1. Usage-only (no cost)

json

{
  "proven_usage": { "provider": "openai", "model": "gpt-4o", "input_tokens": 1200, "output_tokens": 300 },
  "cost_estimate": { "estimated_cost_usd": null, "cost_confidence": "usage_only" }
}

2. Public estimated (default)

json

{
  "proven_usage": { "provider": "openai", "model": "gpt-4o", "input_tokens": 1200, "output_tokens": 300 },
  "cost_estimate": {
    "estimated_cost_usd": 0.006,
    "cost_confidence": "estimated",
    "pricing_basis": "public_estimate",
    "price_book_id": "proofmeter_public_2026_05",
    "price_book_hash": "sha256:8cfb89..."
  }
}

3. Customer price book (private rates)

json

{
  "proven_usage": { "provider": "openai", "model": "gpt-4o", "input_tokens": 1200, "output_tokens": 300 },
  "cost_estimate": {
    "estimated_cost_usd": 0.003,
    "cost_confidence": "customer_supplied",
    "pricing_basis": "customer_price_book",
    "price_book_id": "acme_openai_q2_2026",
    "price_book_hash": "sha256:a6f9bf..."
  }
}

Note: Raw rates are excluded from shared receipts when rates_private: true.
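One way to read the note above: the receipt commits to a price book by hash, so a party holding the book can recompute the cost while others see only the commitment. The sketch below assumes `price_book_hash` is the SHA-256 of the book's canonical JSON and that disclosed rates would travel in a `rates` field. Both are illustrative assumptions, as are the function names and the sample rates.

```python
import hashlib
import json


def price_book_hash(price_book):
    # Assumption: the hash covers the canonical JSON of the whole price book.
    canonical = json.dumps(
        price_book, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


def shareable_cost_estimate(cost_usd, price_book, rates_private):
    """Build a cost_estimate section, withholding raw rates when they are private."""
    estimate = {
        "estimated_cost_usd": cost_usd,
        "cost_confidence": "customer_supplied",
        "pricing_basis": "customer_price_book",
        "price_book_id": price_book["price_book_id"],
        "price_book_hash": price_book_hash(price_book),
    }
    if not rates_private:
        # Hypothetical field: only included when the customer allows disclosure.
        estimate["rates"] = price_book["rates"]
    return estimate


# Hypothetical private price book.
book = {
    "price_book_id": "acme_openai_q2_2026",
    "rates": {"gpt-4o": {"input_per_million_usd": "1.25", "output_per_million_usd": "5.00"}},
}
private_view = shareable_cost_estimate(0.003, book, rates_private=True)
```

Anyone later given the price book can hash it and confirm it matches the value embedded in the receipt.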

4. Invoice-reconciled (future)

json

{
  "proven_usage": { "provider": "openai", "model": "gpt-4o", "input_tokens": 1200, "output_tokens": 300 },
  "cost_estimate": {
    "estimated_cost_usd": 0.003,
    "cost_confidence": "invoice_reconciled",
    "pricing_basis": "customer_price_book",
    "reconciliation_status": "reconciled",
    "invoice_reference": "inv_2026_05_openai"
  }
}

Usage in benchd-harness

bash

# Run a benchmark with $5 budget and spend tracking
benchd run -a verifiedstate -b reliability --budget 5.00

# The signed manifest will include a proofmeter section:
# - total_spend_usd
# - receipt_count
# - per-model cost breakdown
# - settlement_merkle_root
# - settlement_signature

Reference implementation

The reference implementation ships with benchd-harness on PyPI as the benchd_harness.proofmeter submodule. It can be imported independently:

python

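from benchd_harness.proofmeter import ProofMeterClient

client = ProofMeterClient(api_key="vs_live_...", namespace_id="ns_...")
client.connect()

# Authorize: obtain a signed budget capability capped at 500 cents.
budget = client.authorize_budget(
    agent_id="my-agent",
    max_budget_cents=500,
)

# Meter and sign: record one billable action against the capability
# and receive a signed receipt.
receipt = client.record_spend(
    actor_id="my-agent",
    provider_id="openai",
    usage_unit="tokens",
    usage_quantity=1500,
    cost_cents=2,
    capability_id=budget.capability_id,
)

# Settle: aggregate the capability's receipts into a Merkle-rooted settlement.
settlement = client.settle(capability_id=budget.capability_id)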

Non-goals

ProofMeter does not prevent fraud or validate request reasonableness.
ProofMeter does not enforce budgets at the provider layer — only at the meter.
ProofMeter does not validate vendor honesty about token counts (providers do not sign responses).
ProofMeter does not match invoices unless the customer provides reconciliation data.
ProofMeter does not claim to know actual billed cost — only estimated cost under a declared pricing basis.
ProofMeter does not provide regulatory compliance (SOC 2, GDPR, HIPAA) on its own.
ProofMeter does not attest to timestamp accuracy beyond signer assertion.

See Trust Boundaries for the full analysis of what ProofMeter proves, estimates, and does not prove.

Stable URL: benchd.ai/methodology/receipt-spec
Version: 1.1 | Protocol by VerifiedState. Patent Pending.
See also: Trust Boundaries (what ProofMeter proves and does not prove)
