
Performance

Proving time depends almost entirely on circuit shape. This page covers how to measure your circuit honestly, which dimensions matter, and where the benchmark suite lives.

Dimension | Effect
Witness count | The dominant scaling factor for proving time. The witness vector is what gets committed via Merkle tree, and the prover’s main work (witness solve, commitment, sumcheck) is over the witness space. R1CS constraint count grows alongside witness count for most circuits but doesn’t drive proving time directly.
Merkle commitment hash (--hash) | Sets the hash used for WHIR’s Merkle commitments. sha256 and blake3 are the fastest in proving thanks to hardware acceleration. skyscraper is the default because it’s BN254-friendly: slower to prove natively, but dramatically cheaper inside the Groth16 recursive wrap. keccak and poseidon2 are also available for specific interop scenarios.
Witness layer count | Witness builders execute in layers (see Proving flow). Deep layer graphs add coordination overhead.
CPU architecture | aarch64 benefits from SIMD-accelerated BN254 arithmetic in skyscraper/core. x86_64 falls back to portable arithmetic.
Parallelism | Proving uses Rayon. More cores help up to the parallelism inherent in the circuit. WASM threading depends on SharedArrayBuffer.
Host memory | Mobile FFI hosts can swap to disk via pk_configure_memory(...). File-backed mmap allocation is slower than RAM but unlocks larger circuits.

The CLI prints span timings and memory statistics through its tracing layer. The simplest measurement:

cargo run --release --bin provekit-cli -- prove --prover circuit.pkp

Inspect the structured timing output it prints. For finer-grained profiling, build with the Tracy feature:

cargo run --release --features tracy --bin provekit-cli -- --tracy prove

For repeatable timing comparisons, the provekit-bench crate in tooling/provekit-bench/ ships a divan bench harness over the poseidon-rounds example, with separate benches for prover-key read, prove, prove-with-IO, and verify. Use it as a template for benchmarking your own circuits by pointing the benches at additional .pkp / .pkv / proof artifacts.
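For a quick sanity check before setting up a bench harness, a wall-clock loop around the CLI is often enough. The sketch below is not part of ProveKit: time_command, summarize, and PROVE_CMD are hypothetical names, and PROVE_CMD simply repeats the prove invocation shown earlier on this page. Wall-clock numbers include cargo's startup and everything the CLI does around the proof itself, so treat them as an upper bound on the proving span the CLI reports.

```python
import statistics
import subprocess
import time

# Same invocation as the measurement command above; adjust the path to your key.
PROVE_CMD = [
    "cargo", "run", "--release", "--bin", "provekit-cli", "--",
    "prove", "--prover", "circuit.pkp",
]


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return the wall-clock seconds of each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded here; run the command by hand to read span timings.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce raw samples to min, median, and max seconds."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Run the command once by hand before timing so compilation is excluded, then call summarize(time_command(PROVE_CMD)). The median is usually the number to compare across changes, since a single run disturbed by background load shifts the mean more than the median.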

Two CLI commands tell you what you’re about to prove:

# R1CS structure and ACIR statistics.
cargo run --release --bin provekit-cli -- circuit-stats target/<circuit>.json
# Postcard-encoded byte sizes per prover-key component, plus an R1CS sub-breakdown
# (Interner, Matrix A/B/C) and the bytes saved by column-delta encoding.
cargo run --release --bin provekit-cli -- analyze-pkp <circuit>.pkp

Use circuit-stats to confirm witness and constraint counts match your expectations before committing to a host. A circuit that fits comfortably on a server may exceed practical proving time on mobile.
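A back-of-the-envelope memory floor can help with that decision. The sketch assumes each witness value is one BN254 field element stored in 32 bytes (an assumption about the in-memory representation, not something this page states), and it counts only the raw witness vector; commitments, Merkle trees, and sumcheck buffers come on top. witness_floor_bytes and to_mib are hypothetical helpers, not ProveKit APIs.

```python
# Assumption: one BN254 field element occupies 32 bytes in memory.
FIELD_ELEMENT_BYTES = 32


def witness_floor_bytes(witness_count: int) -> int:
    """Lower bound on memory for the raw witness vector alone."""
    if witness_count < 0:
        raise ValueError("witness_count must be non-negative")
    return witness_count * FIELD_ELEMENT_BYTES


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)
```

Under that assumption, a circuit with 2^20 witnesses needs at least to_mib(witness_floor_bytes(1 << 20)) = 32 MiB for the witness vector. Expect the prover's actual peak to be higher; use the figure only to rule out hosts that clearly cannot fit the circuit.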

noir-examples/csp-benchmarks/ contains the Ethproofs CSP benchmarks, a standardized suite of client-side proving targets used to compare proof systems on common workloads.

Target | Circuit sizes | Implementation note
SHA-256 | 128, 256, 512, 1024, 2048 bytes | Uses noir-lang/sha256::sha256_var, lowering compression through Noir’s SHA-256 blackbox.
Keccak-256 | 128, 256, 512, 1024, 2048 bytes | Native Noir Keccak circuit with a witness-focused u32 lane representation.
Poseidon | 2, 4, 8, 12, 16 field elements | noir-lang/poseidon BN254 native Noir helpers.
Poseidon2 | 2, 4, 8, 12, 16 field elements | TaceoLabs/noir-poseidon for states 2, 8, 12, 16; state 4 intentionally exercises Noir’s Poseidon2 blackbox.
ECDSA | secp256r1 over a 32-byte digest | zkpassport/noir-ecdsa native P-256 verification (P-256 blackbox is not yet lowered by ProveKit).

To run any benchmark target:

cd noir-examples/csp-benchmarks/sha256_512
cargo run --release --bin provekit-cli -- prepare
cargo run --release --bin provekit-cli -- prove
cargo run --release --bin provekit-cli -- verify

Combine that with the CLI’s timing output (or the provekit-bench harness) to capture proving time, verification time, and memory for each target on your machine.
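One way to collect those numbers is a small driver that runs the three steps in a target directory and records wall-clock time per step. This is a sketch under assumptions: run_target, CLI_PREFIX, and STEPS are hypothetical names, the commands mirror the ones listed above, and memory is left to the CLI's own statistics rather than measured here.

```python
import subprocess
import time

# Mirrors the invocation above; the step name is appended for each run.
CLI_PREFIX = ["cargo", "run", "--release", "--bin", "provekit-cli", "--"]
STEPS = ("prepare", "prove", "verify")


def run_target(target_dir, cli_prefix=CLI_PREFIX):
    """Run prepare, prove, and verify inside target_dir.

    Returns a dict mapping each step name to its wall-clock seconds.
    """
    timings = {}
    for step in STEPS:
        start = time.perf_counter()
        # check=True stops at the first failing step, so no timing is
        # recorded for a step that ran against a missing artifact.
        subprocess.run([*cli_prefix, step], cwd=target_dir, check=True)
        timings[step] = time.perf_counter() - start
    return timings
```

Calling run_target("noir-examples/csp-benchmarks/sha256_512") returns a dict keyed by step. Build the CLI once beforehand so the first step does not absorb compile time.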

The fundamentals don’t change between hosts, but resource constraints do:

  • Native Rust on a workstation. The reference platform. Smallest measured proving time, largest available memory.
  • WASM in a browser. Slower than native: the proof system runs single-threaded unless SharedArrayBuffer is available, and JavaScript marshalling adds overhead at the boundaries.
  • WASM in Node.js. Closer to native than browser WASM, but still single-process unless you orchestrate workers externally.
  • iOS / Android via FFI. Bounded by device RAM unless you configure pk_configure_memory for file-backed mmap. Modern phones can prove non-trivial credential circuits on-device; budget memory carefully.
  • Verifier server. Verification dominates. Concurrency is configurable through VERIFIER_SEMAPHORE_LIMIT; the default of one keeps memory usage predictable.
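The effect of a concurrency limit such as VERIFIER_SEMAPHORE_LIMIT can be illustrated with a plain semaphore: at most `limit` verifications hold memory at once, so peak usage is roughly `limit` times the cost of one verification. The sketch below illustrates the pattern only; it is not the verifier server's code, and run_limited and limit_from_env are hypothetical helpers. Only the environment variable name and its default of one come from this page.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def limit_from_env(default=1):
    """Read the concurrency limit, falling back to the documented default."""
    return int(os.environ.get("VERIFIER_SEMAPHORE_LIMIT", default))


def run_limited(jobs, limit):
    """Run zero-argument callables with at most `limit` executing at once.

    Returns (results in submission order, peak number of concurrent jobs).
    """
    gate = threading.Semaphore(limit)
    lock = threading.Lock()
    active = 0
    peak = 0

    def guarded(job):
        nonlocal active, peak
        with gate:  # blocks while `limit` jobs are already running
            with lock:
                active += 1
                peak = max(peak, active)
            try:
                return job()
            finally:
                with lock:
                    active -= 1

    # One worker per job, so the semaphore is the only thing limiting overlap.
    with ThreadPoolExecutor(max_workers=max(len(jobs), 1)) as pool:
        results = list(pool.map(guarded, jobs))
    return results, peak
```

With a limit of one, requests are served strictly one at a time and the reported peak is always 1; raising the limit trades that predictability for throughput.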

The Go/gnark recursive verifier wraps a WHIR proof inside a Groth16 proof for on-chain settlement. The wrapper has two costs:

  • One-time setup: trusted-setup ceremony for the outer Groth16 circuit, producing the recursive proving and verifying keys. Run once per recursive-verifier R1CS shape.
  • Per-proof wrap: a Groth16 proving run over the WHIR verifier R1CS. Typically the largest single step in an on-chain workflow, dwarfing the base proving time for small circuits.

Measure both costs separately when benchmarking on-chain end-to-end latency.

If proving is slower than you need:

  1. Run circuit-stats. Confirm witness and constraint counts match expectations. An unexpected blowup in witness count is the strongest signal of accidentally quadratic constraint generation.
  2. Pick --hash for your settlement path. skyscraper (default) is optimal when you’re wrapping the proof with Groth16 for on-chain verification. If you only verify off-chain, sha256 or blake3 will prove faster thanks to hardware acceleration.
  3. Audit black-box vs native lowerings. Some Noir black boxes (SHA-256, Keccak) are heavier in ProveKit than their native R1CS implementations. The CSP benchmarks call this out explicitly.
  4. Profile with Tracy. Run cargo run --release --features tracy --bin provekit-cli -- --tracy prove and inspect span timings. Look for layers that dominate the witness solve.