Performance
Proving time depends almost entirely on circuit shape. This page tells you how to measure your circuit honestly, what dimensions matter, and where the benchmark suite lives.
What drives performance
| Dimension | Effect |
|---|---|
| Witness count | The dominant scaling factor for proving time. The witness vector is what gets committed via Merkle tree, and the prover’s main work (witness solve, commitment, sumcheck) is over the witness space. R1CS constraint count grows alongside witness count for most circuits but doesn’t drive proving time directly. |
| Merkle commitment hash (--hash) | Sets the hash used for WHIR’s Merkle commitments. sha256 and blake3 are the fastest in proving thanks to hardware acceleration. skyscraper is the default because it’s BN254-friendly: slower to prove natively, but dramatically cheaper inside the Groth16 recursive wrap. keccak and poseidon2 are also available for specific interop scenarios. |
| Witness layer count | Witness builders execute in layers (see Proving flow). Deep layer graphs add coordination overhead. |
| CPU architecture | aarch64 benefits from SIMD-accelerated BN254 arithmetic in skyscraper/core. x86_64 falls back to portable arithmetic. |
| Parallelism | Proving uses Rayon. More cores help up to the parallelism inherent in the circuit. WASM threading depends on SharedArrayBuffer. |
| Host memory | Mobile FFI hosts can swap to disk via pk_configure_memory(...). File-backed mmap allocation is slower than RAM but unlocks larger circuits. |
Measuring your circuit
The CLI prints span timings and memory statistics through its tracing layer. The simplest measurement:

    cargo run --release --bin provekit-cli -- prove --prover circuit.pkp

Inspect the structured timing output it prints. For finer-grained profiling, build with the Tracy feature:

    cargo run --release --features tracy --bin provekit-cli -- --tracy prove

For repeatable timing comparisons, the provekit-bench crate in tooling/provekit-bench/ ships a divan bench harness over the poseidon-rounds example, with separate benches for prover-key read, prove, prove-with-IO, and verify. Use it as a template for benchmarking your own circuits by pointing the benches at additional .pkp / .pkv / proof artifacts.
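For a quick whole-process number before reaching for the bench harness, the prove command can be wrapped in a small timing loop. The following is a minimal bash sketch, not part of ProveKit: `time_runs` and `median` are hypothetical helper names, and the measurement is plain wall-clock time for the entire process, so it includes cargo's own startup and freshness check as well as key loading.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of ProveKit).

# Run a command N times and print the wall-clock seconds of each run,
# one value per line. The command's own output is discarded.
time_runs() (
  runs=$1; shift
  TIMEFORMAT=%R   # bash `time` keyword: print only real (wall-clock) seconds
  LC_ALL=C        # keep '.' as the decimal separator so `sort -n` works
  for ((i = 0; i < runs; i++)); do
    { time "$@" >/dev/null 2>&1; } 2>&1 \
      || { echo "command failed: $*" >&2; exit 1; }
  done
)

# Median of newline-separated numbers on stdin
# (the lower median when the count is even).
median() {
  sort -n | awk '{ v[NR] = $1 } END { if (NR) print v[int((NR + 1) / 2)] }'
}

# Usage (build first so compilation is not timed):
#   cargo build --release --bin provekit-cli
#   time_runs 5 cargo run --release --bin provekit-cli -- prove --prover circuit.pkp | median
```

Taking the median rather than the mean keeps one slow outlier run (for example, a cold file cache) from skewing the result.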
Inspecting the circuit before proving
Two CLI commands tell you what you’re about to prove:

    # R1CS structure and ACIR statistics.
    cargo run --release --bin provekit-cli -- circuit-stats target/<circuit>.json

    # Postcard-encoded byte sizes per prover-key component, plus an R1CS sub-breakdown
    # (Interner, Matrix A/B/C) and the bytes saved by column-delta encoding.
    cargo run --release --bin provekit-cli -- analyze-pkp <circuit>.pkp

Use circuit-stats to confirm witness and constraint counts match your expectations before committing to a host. A circuit that fits comfortably on a server may exceed practical proving time on mobile.
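One way to notice an unexpected change in these numbers is to save the command output and compare it against a baseline after each circuit edit. The sketch below is a hypothetical bash helper, not part of ProveKit. It treats the output as opaque text and assumes the statistics go to stdout and are stable between runs; if they are emitted through the tracing layer on stderr, or include timings, the capture or comparison would need adjusting.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ProveKit): capture a command's stdout and
# compare it with a saved baseline file, treating the output as opaque text.
check_against_baseline() {
  local baseline=$1 current status
  shift
  current=$(mktemp) || return 1
  "$@" > "$current" || { rm -f "$current"; return 1; }

  if [ ! -f "$baseline" ]; then
    # First run: record the output as the baseline.
    mv "$current" "$baseline"
    echo "baseline recorded: $baseline"
    return 0
  fi

  # diff exits 0 when the outputs are identical and 1 when they differ.
  diff -u "$baseline" "$current"
  status=$?
  rm -f "$current"
  return "$status"
}

# Usage (the baseline filename is illustrative):
#   check_against_baseline stats.baseline \
#     cargo run --release --bin provekit-cli -- circuit-stats target/<circuit>.json
```

A non-zero exit status makes the helper usable as a CI gate: any change in the reported statistics fails the step and prints the diff.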
The ProveKit benchmark suite
noir-examples/csp-benchmarks/ contains the Ethproofs CSP benchmarks, a standardized suite of client-side proving targets used to compare proof systems on common workloads.
| Target | Circuit sizes | Implementation note |
|---|---|---|
| SHA-256 | 128, 256, 512, 1024, 2048 bytes | Uses noir-lang/sha256::sha256_var, lowering compression through Noir’s SHA-256 blackbox. |
| Keccak-256 | 128, 256, 512, 1024, 2048 bytes | Native Noir Keccak circuit with a witness-focused u32 lane representation. |
| Poseidon | 2, 4, 8, 12, 16 field elements | noir-lang/poseidon BN254 native Noir helpers. |
| Poseidon2 | 2, 4, 8, 12, 16 field elements | TaceoLabs/noir-poseidon for states 2, 8, 12, 16; state 4 intentionally exercises Noir’s Poseidon2 blackbox. |
| ECDSA | secp256r1 over a 32-byte digest | zkpassport/noir-ecdsa native P-256 verification (P-256 blackbox is not yet lowered by ProveKit). |
To run any benchmark target:
    cd noir-examples/csp-benchmarks/sha256_512
    cargo run --release --bin provekit-cli -- prepare
    cargo run --release --bin provekit-cli -- prove
    cargo run --release --bin provekit-cli -- verify

Combine that with the CLI’s timing output (or the provekit-bench harness) to capture proving time, verification time, and memory for each target on your machine.
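To repeat the same three steps across several targets, a small wrapper can loop over the directories. This is a minimal bash sketch, not part of ProveKit: `run_targets` and the `PROVEKIT` override variable are hypothetical names introduced for illustration, and the target directories are passed as arguments because only `sha256_512` is named on this page.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of ProveKit): run prepare, prove, and verify
# in each benchmark directory given as an argument, stopping at the first failure.
# PROVEKIT is an illustrative variable naming the command to invoke; it defaults
# to the cargo invocation used on this page.
run_targets() {
  local -a cli
  local dir step
  # Split the command string into words so it can be executed as an array.
  read -r -a cli <<< "${PROVEKIT:-cargo run --release --bin provekit-cli --}"

  for dir in "$@"; do
    (
      cd "$dir" || exit 1
      for step in prepare prove verify; do
        echo "== $dir: $step" >&2
        "${cli[@]}" "$step" || exit 1
      done
    ) || { echo "FAILED: $dir" >&2; return 1; }
  done
}

# Usage:
#   run_targets noir-examples/csp-benchmarks/sha256_512
```

Each target runs in a subshell, so the `cd` does not leak into the caller and relative paths in later arguments still resolve from the original directory.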
What to expect across hosts
The fundamentals don’t change between hosts, but resource constraints do:
- Native Rust on a workstation. The reference platform. Smallest measured proving time, largest available memory.
- WASM in a browser. Slower than native: the proof system runs single-threaded unless `SharedArrayBuffer` is available, and JavaScript marshalling adds overhead at the boundaries.
- WASM in Node.js. Closer to native than browser WASM, but still single-process unless you orchestrate workers externally.
- iOS / Android via FFI. Bounded by device RAM unless you configure `pk_configure_memory` for file-backed mmap. Modern phones can prove non-trivial credential circuits on-device; budget memory carefully.
- Verifier server. Verification dominates. Concurrency is configurable through `VERIFIER_SEMAPHORE_LIMIT`; the default of one keeps memory usage predictable.
Recursive verification cost
The Go/gnark recursive verifier wraps a WHIR proof inside a Groth16 proof for on-chain settlement. The wrapper has two costs:
- One-time setup: trusted-setup ceremony for the outer Groth16 circuit, producing the recursive proving and verifying keys. Run once per recursive-verifier R1CS shape.
- Per-proof wrap: a Groth16 proving run over the WHIR verifier R1CS. Typically the largest single step in an on-chain workflow, dwarfing the base proving time for small circuits.
Measure both costs separately when benchmarking on-chain end-to-end latency.
Optimization checklist
If proving is slower than you need:
- Run `circuit-stats`. Confirm witness and constraint counts match expectations. Unexpected blowups in witness count are the strongest signal of accidentally-quadratic constraint generation.
- Pick `--hash` for your settlement path. `skyscraper` (default) is optimal when you’re wrapping the proof with Groth16 for on-chain verification. If you only verify off-chain, `sha256` or `blake3` will prove faster thanks to hardware acceleration.
- Audit black-box vs native lowerings. Some Noir black boxes (SHA-256, Keccak) are heavier in ProveKit than their native R1CS implementations. The CSP benchmarks call this out explicitly.
- Profile with Tracy. Run `cargo run --release --features tracy --bin provekit-cli -- --tracy prove` and inspect span timings. Look for layers that dominate the witness solve.
Related pages
- CLI reference, flags for `circuit-stats`, `analyze-pkp`, and Tracy profiling.
- Examples catalog, circuits you can benchmark against.
- Proving flow, the conceptual pipeline being measured.