Invoice extraction benchmark kit

Invoice extraction benchmarking, packaged.

Compare invoice extraction approaches with a packaged benchmarking kit that includes ground-truth labels, an evaluator, reports, and a clear protocol. Everything runs locally.

What it is

  • a packaged invoice extraction benchmark for US English invoices
  • built for local evaluation, regression checks, and internal demos
  • covers both field-level and line-item extraction

Inside the benchmark kit

Representative screenshots from the benchmark kit.

  • Minimal / clean: a clean invoice with a minimal layout.
  • Dense / clean: a denser, table-oriented layout.
  • Light noisy: simulated scan noise applied to the same layout.
  • Report preview: a sample report with core metrics and a coverage summary from the evaluator output.
  • Benchmark card: a compact summary of scope, coverage, and limitations.

How it works

Three steps, local workflow.

1. Run your extractor

Use the included invoice PDFs with your model, OCR pipeline, or parser.

2. Score predictions

Run the included evaluator (a standalone Python script with no third-party dependencies) to score your predictions against the ground-truth labels, then inspect the report.

3. Compare changes

Rerun the same benchmark whenever you change prompts, models, or parsing logic, so results stay comparable across versions.
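The scoring and comparison steps above can be sketched roughly as follows. This is a minimal illustration, not the kit's evaluator: it assumes a hypothetical format where labels and predictions are dicts mapping a document ID to a dict of field name and string value, and it uses exact match after whitespace and case normalization. The kit's actual label schema, field names, and metrics may differ.

```python
"""Minimal sketch of field-level scoring and run-to-run comparison.

Assumption: labels and predictions map a document ID to a dict of
field name -> string value. This format is hypothetical and is not
the kit's actual label schema.
"""


def normalize(value: str) -> str:
    # Collapse runs of whitespace and ignore case before comparing.
    return " ".join(str(value).split()).lower()


def field_accuracy(labels: dict, predictions: dict) -> dict:
    """Return per-field exact-match accuracy over all labeled documents."""
    correct: dict[str, int] = {}
    total: dict[str, int] = {}
    for doc_id, gold_fields in labels.items():
        # A document missing from the predictions counts as all fields wrong.
        pred_fields = predictions.get(doc_id, {})
        for field, gold in gold_fields.items():
            total[field] = total.get(field, 0) + 1
            pred = pred_fields.get(field)
            if pred is not None and normalize(pred) == normalize(gold):
                correct[field] = correct.get(field, 0) + 1
    return {f: correct.get(f, 0) / total[f] for f in total}


def compare_runs(baseline: dict, candidate: dict) -> dict:
    """Per-field accuracy change from a baseline run to a candidate run."""
    return {f: candidate.get(f, 0.0) - baseline[f] for f in baseline}


if __name__ == "__main__":
    # Toy data for illustration only (hypothetical IDs and fields).
    labels = {
        "inv-001": {"invoice_number": "INV-1001", "total": "1,250.00"},
        "inv-002": {"invoice_number": "INV-1002", "total": "89.99"},
    }
    old_preds = {
        "inv-001": {"invoice_number": "INV-1001", "total": "1250.00"},
        "inv-002": {"invoice_number": "inv-1002", "total": "89.99"},
    }
    new_preds = {
        "inv-001": {"invoice_number": "INV-1001", "total": "1,250.00"},
        "inv-002": {"invoice_number": "INV-1002", "total": "89.99"},
    }
    before = field_accuracy(labels, old_preds)
    after = field_accuracy(labels, new_preds)
    print(before)                        # {'invoice_number': 1.0, 'total': 0.5}
    print(compare_runs(before, after))   # {'invoice_number': 0.0, 'total': 0.5}
```

Exact match is a deliberately strict choice: "1250.00" and "1,250.00" count as different here, so a real evaluator would likely normalize amounts and dates per field type. Line items would also need a step that pairs predicted rows with gold rows before scoring, which this sketch omits.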

You can build an invoice benchmark yourself, but document generation, label alignment, and evaluator edge cases can take days before you run a single comparison. This kit ships ready to score.

Pricing

Three tiers. One workflow.

Free Sample

$0

Inspect the kit contents and evaluator workflow. Available as a direct download.

Full

$990

Broader coverage for repeated evaluations and higher-confidence comparisons.

Limitations

What this is not for.

Next step

Try the free sample first.

When you need the main paid benchmark for real evaluation work, move to Starter.

Need a custom benchmark or dataset? Get in touch. We build benchmarks and datasets for any document type or domain.