this blog is me talking to myself. i started building sonar because i wanted to understand zk coprocessors from the inside out. and i'm writing this down because if i don't, i'll forget all the silly mistakes i made. if you're someone who actually wants to build this stuff, you might find it useful. and if you're just here for a buzzword to paste into a deck, you'll be bored in two paragraphs. your call.

TL;DR

built a Solana-native zk coprocessor around an Anchor program (the on-chain app), a Rust off-chain pipeline, and SP1 (the proving engine that creates the proof receipts)
on-chain request/callback/refund loop exists, plus a real CPI SDK, CLI, indexer, prover, and coordinator
historical_avg works end-to-end as the strongest vertical slice
the hardest part was not "how do i verify a proof on-chain?". it was everything around that: orchestration, data enrichment, proving ergonomics, and hardware reality
current state is hardened devnet-grade, not production-grade. the present pause is about the proving operating model, not about the architecture itself collapsing

prologue
what you are reading
a quick translation layer for the non-blockchain folks
part i: architecture and theory
part ii: the journey and lessons
part iii: how to build your own
epilogue

prologue: the question that got under my skin

there are certain ideas in systems engineering that sound almost annoyingly obvious once you hear them.

this was one of them.

the question was simple:

what if a Solana program, meaning an app that lives on the blockchain itself, could ask a question that is too expensive, too state-heavy, or too time-consuming to answer on-chain, and still get back an answer it can trust?

not "trust me bro, i computed it off-chain." not "here is an API response, pinky promise it is honest." i mean a result that comes back with a proof (basically a tamper-resistant receipt saying the work was really done correctly), gets verified, and then becomes usable by another on-chain program.

that is the whole emotional center of Sonar.

because smart contracts, which are really just public rulebooks that run on a blockchain, are incredible. but they are also tiny. intentionally tiny.

they are constrained by compute budgets, transaction limits, account access rules, serialization overhead, and the very reasonable fact that blockchains are not supposed to be general-purpose supercomputers.

and yet, the applications people want to build keep asking bigger questions:

what was the historical average balance of this account across a slot range?
can i prove that a complex off-chain computation happened correctly?
can i verify a result without rerunning the whole thing on-chain?
can another program use that answer without trusting an operator?

that is where coprocessors enter the story.

so i built Sonar.

simply because this felt like one of those problems where if i did not build it myself, i was going to keep thinking about it.

and now, here we are.

what you are reading

this is a deep dive into how Sonar works, what zk coprocessors actually do, how most Solana zk coprocessors are structured under the hood, what we got right, what we got wrong, what hurt, what was standard boring plumbing, what was genuinely hard, and why the project is currently paused where it is.

it is written to be beginner-friendly, but not shallow.

if you have never heard of a zk coprocessor before, you should understand the core model by the end.

if you already know the space, the parts that should be worth your time are:

the trust-boundary design
the request -> queue -> prove -> callback pipeline
the on-chain/off-chain split
the proving-path failures we ran into on real hardware
the difference between a real prototype and a production-ready system

in short, Sonar is:

a Solana-native zk coprocessor prototype
built with an Anchor program plus a Rust off-chain pipeline
backed by SP1, the proving engine, for proof generation and groth16-solana, the compact proof checker, for on-chain verification
equipped with a real CPI SDK, CLI, indexer, coordinator, prover, deploy automation, benchmarks, CI, and baseline observability
strongest today as one real vertical slice: historical_avg

disclaimer: this is not a copy-paste tutorial.

the goal is not to hand you a premade architecture diagram and call it a day.
the goal is to help you actually understand why these systems exist, why they are structured the way they are, and what tradeoffs show up the moment you try to build one for real.

so yeah. buckle up, because this is gonna be a long one (maybe even longer than any of my write-ups before this).

skip ahead: if you just want the architecture, jump to chapter 7. if you want the scars, tradeoffs, and lessons, jump to chapter 15 or chapter 20. if you want the build-your-own playbook, jump to part iii.

a quick translation layer for the non-blockchain folks

if some of the words above already felt like random syllables, this section is gonna be your cheat sheet.

skip ahead: if you already speak blockchain and just want the meat, jump to chapter 1 or straight to chapter 7.

for the next few sections, imagine Sonar as a very fast public records office with a front desk, a back office, a filing room, and a specialist proof desk. you do not need to memorize any of this. just keep the picture in your head and the rest of the read gets much easier.

blockchain: the shared master ledger that many computers keep in sync. in the records-office picture, this is the public book that every clerk agrees on.
Solana: the specific blockchain Sonar is built on. in the picture, this is the particular records office Sonar works inside.
program / smart contract: code that lives on the blockchain and runs by fixed rules. in the picture, this is the office rulebook that tells the clerks exactly what they are allowed to do.
on-chain: written into the blockchain's official record. in the picture, this means the paperwork has been stamped into the public book.
off-chain: happening outside that official record, on ordinary servers or laptops. in the picture, this is work done in the back office before the final stamped update goes into the public book.
transaction: a signed request asking the blockchain to do something. in the picture, this is the filled-out form you hand to the front desk.
account: a piece of stored data on Solana. in the picture, this is a labeled folder in the filing room.
PDA (program-derived address): a predictable account address a program can control without a human private key. in the picture, this is a folder slot whose label is generated from a recipe in the rulebook, so everyone knows where it belongs.
cryptography: the math that makes forgery hard and tampering obvious. in the picture, this is the tamper-evident seal on the paperwork.
proof: a compact cryptographic receipt showing some work was done correctly. in the picture, this is a sealed certificate from the specialist desk saying the back-office work really checked out.
zero-knowledge proof: a special kind of proof that lets the chain check the result without replaying the whole computation, and sometimes without revealing every private detail. in the picture, the clerk trusts the sealed certificate instead of rerunning the whole investigation.
verifier: the checker that tests whether a proof is valid. in the picture, this is the clerk at the counter who inspects the seal and decides whether to accept the certificate.
callback: a follow-up action after an async job finishes. in the picture, this is the office calling you back once your stamped folder is ready.
indexer: a service that reorganizes blockchain data so it is easy to search. in the picture, this is the records clerk who builds a usable catalog from a huge pile of folders.
CPI (cross-program invocation): one blockchain program calling another blockchain program. in the picture, this is one desk forwarding your stamped form to another desk in the same building.
Anchor: a framework that makes Solana development easier. in the picture, this is the office template kit that keeps every form and workflow from being handwritten chaos.
zkVM: a virtual machine whose execution can later be proved. in the picture, this is the sealed workshop in the back office where a job is run and a certificate comes out.
SP1: the particular proving engine Sonar uses. in the picture, this is the specific specialist workshop Sonar sends jobs to when it needs a certificate.
Groth16: the especially compact proof format Sonar brings back to chain because it is cheap to verify. in the picture, this is the short official certificate format the front desk can inspect quickly.
RPC: the request interface apps use to talk to a blockchain node. in the picture, this is the public front desk window.
WebSocket: a live connection that streams updates as they happen. in the picture, this is leaving the phone line open so the office can speak to you continuously instead of making you call back every few seconds.
finality: the point where a transaction is settled enough to trust. in the picture, this is when the office has stamped and filed the form so it is no longer just pending on someone's desk.

part i: architecture and theory

chapter 1: what a zk coprocessor actually is

let's strip away the buzzwords first.

a coprocessor is just a system that does work outside the main processor because that work is expensive, specialized, or inconvenient to do in the main execution environment.

your laptop already does this all the time.

the CPU does one class of work. the GPU does another. the network card does another. the TPM does another.

a zk coprocessor is the blockchain version of that idea.

the chain does not run the expensive computation itself.

instead:

the chain records a request
an off-chain worker performs the computation
that worker generates a proof that the computation was done correctly
the chain verifies the proof succinctly
the chain accepts the result and lets downstream programs use it

the key word here is succinctly.

if you are brand new to crypto, here is the simple version. cryptography is just math used to make forgery hard and tampering obvious. a zero-knowledge proof is a special kind of mathematical certificate. and succinctly just means the certificate is small and cheap to check.

the whole point of zero-knowledge proof systems in this context is not only privacy. in many coprocessor systems, privacy is not even the main feature.

the main feature is that the blockchain can verify a result much more cheaply than it could recompute it.

so the mental model i like is this:

a zk coprocessor is simply an off-chain execution engine with an on-chain trust boundary.

that is it.

just simple outsourced execution + cryptographic verification, meaning the chain trusts the math-backed certificate instead of trusting the operator's word.

chapter 2: why Solana needs this at all

Solana is fast. Solana is parallel. Solana is cheap compared to many other chains. if you are brand new to this stuff, read "Solana" as "the particular blockchain network this whole system lives on."

but even then, there is a lot it is not supposed to do.

that is not a flaw. it is the design.

Solana programs run under hard constraints:

bounded compute units, which are basically a per-transaction work budget, like a shot clock
strict account access patterns, meaning the transaction has to say up front which data folders it plans to touch
no arbitrary historical-state queries from inside a program
no appetite for giant, slow, stateful computations in the middle of a transaction

so if you try to do something like:

scan a huge historical window of account balance data
run a large zkVM guest, meaning a program inside a provable virtual machine
aggregate data across many blocks
execute a heavyweight reduction over off-chain indexed state

you hit a wall VERY QUICKLY.

that does NOT mean the application idea is bad.

it simply means the chain is telling you, correctly, that the computation belongs somewhere else.

this is the exact niche a coprocessor fills.

the chain remains the place where state transitions are authorized.

the coprocessor becomes the place where heavyweight computation happens.

then a proof bridges the two.

chapter 3: how other Solana zk coprocessors usually work

this is important, because Sonar is not some alien architecture that came from a UFO. it belongs to a family.

most Solana zk coprocessors, regardless of branding, end up looking like variants of the same pipeline:

an on-chain program records a request
that request is observed by some off-chain service
the service gathers inputs or enrichment data
a prover executes a computation in a zkVM or proving system
the service submits a proof and result back to chain
an on-chain verifier accepts or rejects that proof
some callback or state update makes the result useful to another program

where these systems differ is usually on three axes.

axis 1: where proving happens

some systems assume:

a local prover
a prover marketplace
a centralized operator cluster
a remote proving service hidden behind an API

axis 2: where verification happens

some systems verify:

directly on Solana
on another settlement layer and then bridge a commitment back
through a hybrid model where some trust assumption remains off-chain

axis 3: how generic the request model is

some systems are:

fixed-function, one computation, one proof shape
semi-generic, where multiple computation IDs are supported
marketplace-style, where verifiers and programs are registered dynamically

Sonar sits in the middle of that spectrum.

it is not a one-off special-purpose proof gadget.

but it is also not trying to be a universal arbitrary-computation marketplace yet.

the on-chain verifier registry is dynamic.

the off-chain computation registry is still explicitly implemented in code.

that distinction matters a lot.

chapter 4: the problem Sonar is actually solving

if i had to phrase Sonar's main job in one sentence, it would be this:

make off-chain computation feel like a verifiable async primitive that another Solana program can safely depend on.

that breaks into four sub-problems:

how does a program ask for work?
how does off-chain infra know what to compute?
how does the chain know the result is valid?
how does another on-chain program use that result without trusting an operator?

the simplest possible (bad) answer is:

send an HTTP request off-chain
let a server compute a result
have a privileged signer write the result back on-chain

that can be useful, 100%.

but it is also not the trust model i wanted.

Sonar's answer is stricter:

requests are explicit PDAs
results are explicit PDAs
verifier material is explicit on-chain state
callbacks only happen after proof verification
refunds exist when liveness fails

that means the system splits nicely into two domains:

correctness, which is protected by the on-chain verifier and account constraints
liveness, which is handled by off-chain services that may fail without silently corrupting state

this is one of the biggest design wins in the whole repository.

chapter 5: why the whole thing has to be asynchronous

there is no world where a Solana transaction says:

"yo, go spin up a prover, run a zkVM guest, maybe fetch account history from an indexer (but only if you feel like it), wrap the result into Groth16, and get back to me before this instruction returns. your time starts now."

that is just not how these systems work.

so the request lifecycle has to be asynchronous.

in Sonar, the shape is:

user or CPI caller submits request
program creates request/result state and emits structured logs
off-chain workers observe and process the job
prover generates proof + result
callback worker submits callback
program verifies proof, writes result, invokes consumer callback, pays prover
if the deadline passes first, payer can call refund

the key insight is this:
refund is not a side feature. it is part of the contract with reality.

because off-chain systems fail.

Redis fails. provers fail. indexers fall behind. RPC endpoints, meaning the front desks apps use to talk to the chain, misbehave. GPU machines crash. proofs take too long.

if your architecture does not include a clean timeout and refund story, then i'm sorry to say this but you, my friend, do not have a robust coprocessor. what you have instead is a very, VERY optimistic science fair demo.
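to make the timeout-and-refund rule concrete, here is a minimal sketch of the lifecycle as a tiny state machine in plain Rust. the type names, the slot-based deadline, and the "deadline is inclusive" choice are my assumptions for illustration, not Sonar's actual Anchor code:

```rust
/// Hypothetical request states; names are assumptions, not Sonar's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Completed,
    Refunded,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LifecycleError {
    NotPending,
    DeadlinePassed,
    DeadlineNotReached,
}

#[derive(Debug)]
pub struct Request {
    pub status: Status,
    /// Last slot at which a callback is still accepted (assumed inclusive).
    pub deadline_slot: u64,
}

impl Request {
    /// A callback only lands while the request is pending and the deadline has not passed.
    pub fn complete(&mut self, current_slot: u64) -> Result<(), LifecycleError> {
        if self.status != Status::Pending {
            return Err(LifecycleError::NotPending);
        }
        if current_slot > self.deadline_slot {
            return Err(LifecycleError::DeadlinePassed);
        }
        self.status = Status::Completed;
        Ok(())
    }

    /// A refund only lands after the deadline, and only if nothing was delivered.
    pub fn refund(&mut self, current_slot: u64) -> Result<(), LifecycleError> {
        if self.status != Status::Pending {
            return Err(LifecycleError::NotPending);
        }
        if current_slot <= self.deadline_slot {
            return Err(LifecycleError::DeadlineNotReached);
        }
        self.status = Status::Refunded;
        Ok(())
    }
}
```

the point of the sketch is that callback and refund are mutually exclusive: at any given slot exactly one of them can succeed, and once either one has succeeded the other is locked out for good.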

chapter 6: the trust model you actually want

whenever i talk about coprocessors, i think it is worth separating two kinds of trust.

trusted for correctness

in Sonar, these are the things that should determine whether a result is valid:

the on-chain program logic
the verifier registry state
the proof verification path
Solana finality

trusted for liveness, not correctness

these are the things that determine whether the result shows up on time at all:

coordinator availability
Redis availability
prover availability
indexer freshness
operator competence

that split is the difference between a system that fails loudly and a system that quietly lies (and we all hate liars, don't we?).

so, for that very reason, Sonar is explicitly trying to fail in the first way, not the second.

skip ahead: if the theory already clicked and you just want Sonar's actual moving parts, start here with chapter 7. if you want the lived experience instead, jump to part ii.

chapter 7: the bird's-eye view

Sonar today is easiest to understand as seven pieces:

1. Anchor program            -> the trust boundary
2. SDK + CLI                 -> the developer surface
3. Coordinator listener      -> sees requests
4. Indexer + API             -> provides enrichment data
5. Redis queues              -> decouples orchestration
6. Prover                    -> computes + proves
7. Callback worker           -> returns verified results on-chain

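the lifecycle looks like this:

request -> coordinator listener -> indexer enrichment (when needed) -> Redis queue -> prover -> Redis queue -> callback worker -> on-chain callback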

there is nothing crazy exotic about this pipeline.

and THAT is a feature.

one of the easiest ways to ruin infra is to decide that every part of it must also be novel.

Sonar is novel where it needs to be novel.

everywhere else, it simply tries to be legible.

chapter 8: the on-chain program, the part that actually matters

the Anchor program is the heart of the correctness model. Anchor, for the non-Solana readers, is just the developer framework around the actual blockchain program. think power tools, not raw scrap metal.

it supports four core instructions:

register_verifier
request
callback
refund

that is the whole protocol surface.

`register_verifier`

this creates a verifier registry PDA keyed by computation_id and stores Groth16 verifying-key material on-chain.

a PDA is just a predictable program-owned account.
in the records-office picture, it is a folder slot whose label is generated from the office rulebook instead of chosen by hand. a computation_id is just the label for a particular kind of job. a verifier is the checker for proofs, and Groth16 is the compact certificate format Sonar uses here because it is efficient to verify on-chain.

this matters because the proof system is not just a blob attached to a result. the verifier is explicit protocol state.

`request`

this is where a caller asks Sonar to do work.

it creates two PDAs:

RequestMetadata
ResultAccount

in the records-office picture, one folder remembers what was asked and the other is where the finished answer gets placed.

it also escrows the fee and emits structured logs.

those logs are not a debugging detail. they are part of the system design.

the coordinator learns about work by watching program logs, then decoding:

request ID
raw inputs
callback account metadata

the logs are, in practice, the event bus between the chain and the off-chain workers.

`callback`

a callback is just the second trip back after the off-chain work is done. think of dropping clothes at a tailor and later coming back for the finished suit.

this is the critical instruction.

it:

checks the request is still pending
checks the deadline has not passed
checks the verifier registry matches the request's computation ID
verifies the Groth16 proof
validates result size
marks request completed
invokes the consumer callback program
pays the prover out of escrow

the trust boundary lives here.
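the ordering of those checks is the interesting part, so here is a rough sketch of it in plain Rust. this is not the real Anchor handler: the struct fields, the `MAX_RESULT_LEN` cap, and the `verify_proof` closure (a stand-in for the actual Groth16 check) are all assumptions made for illustration:

```rust
/// Why a callback was rejected. Variant names are illustrative.
#[derive(Debug, PartialEq, Eq)]
pub enum CallbackError {
    NotPending,
    DeadlinePassed,
    VerifierMismatch,
    InvalidProof,
    ResultTooLarge,
}

pub struct PendingRequest {
    pub pending: bool,
    pub deadline_slot: u64,
    pub computation_id: [u8; 32],
    pub escrowed_fee: u64,
}

pub struct Verifier {
    pub computation_id: [u8; 32],
}

/// Hypothetical cap on result bytes; the real limit is not stated in this post.
pub const MAX_RESULT_LEN: usize = 64;

/// Runs the checks in the order listed above and returns the fee owed to the prover.
/// `verify_proof` is a placeholder for the real Groth16 verification.
pub fn process_callback(
    req: &mut PendingRequest,
    verifier: &Verifier,
    current_slot: u64,
    proof: &[u8],
    result: &[u8],
    verify_proof: impl Fn(&Verifier, &[u8], &[u8]) -> bool,
) -> Result<u64, CallbackError> {
    if !req.pending {
        return Err(CallbackError::NotPending);
    }
    if current_slot > req.deadline_slot {
        return Err(CallbackError::DeadlinePassed);
    }
    if verifier.computation_id != req.computation_id {
        return Err(CallbackError::VerifierMismatch);
    }
    if !verify_proof(verifier, proof, result) {
        return Err(CallbackError::InvalidProof);
    }
    if result.len() > MAX_RESULT_LEN {
        return Err(CallbackError::ResultTooLarge);
    }
    // Only now does state change: mark completed, then release the escrow.
    // The consumer callback CPI would sit between these two steps.
    req.pending = false;
    let payout = req.escrowed_fee;
    req.escrowed_fee = 0;
    Ok(payout)
}
```

notice that every rejection happens before any state is touched. a bad proof leaves the request pending, which means an honest prover can still deliver later or the payer can still refund after the deadline.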

`refund`

if the deadline passes and the request is still pending, the payer gets the fee back.

again: this is not a "nice-to-have". it's a core piece of the system.

the data model

there are three important account types.

RequestMetadata tracks:

request ID
payer
callback program
computation ID
deadline
fee
status
bump

ResultAccount tracks:

request ID
result bytes
whether result has been written
write slot

VerifierRegistry tracks:

computation ID
authority
Groth16 verifying-key material
bump

this is simple on purpose.

the program is not trying to be a general database of every off-chain event imaginable.

it is trying to own the minimal state needed for verifiable async computation.
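if it helps to see the three account types as data, here is a plain-Rust sketch of them. the field lists come straight from above, but the concrete types (32-byte arrays for keys and IDs, `u64` for deadline, fee, and slot), the field names, and the write-once rule on `ResultAccount` are my assumptions, and the real accounts would carry Anchor attributes:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestStatus {
    Pending,
    Completed,
    Refunded,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RequestMetadata {
    pub request_id: [u8; 32],
    pub payer: [u8; 32],
    pub callback_program: [u8; 32],
    pub computation_id: [u8; 32],
    pub deadline: u64,
    pub fee: u64,
    pub status: RequestStatus,
    pub bump: u8,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResultAccount {
    pub request_id: [u8; 32],
    pub result: Vec<u8>,
    pub is_set: bool,
    pub written_slot: u64,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VerifierRegistry {
    pub computation_id: [u8; 32],
    pub authority: [u8; 32],
    pub verifying_key: Vec<u8>,
    pub bump: u8,
}

impl ResultAccount {
    /// The empty account created at request time.
    pub fn empty(request_id: [u8; 32]) -> Self {
        Self { request_id, result: Vec::new(), is_set: false, written_slot: 0 }
    }

    /// Assumed write-once semantics: a second write is rejected.
    pub fn write(&mut self, bytes: &[u8], slot: u64) -> Result<(), &'static str> {
        if self.is_set {
            return Err("result already written");
        }
        self.result = bytes.to_vec();
        self.is_set = true;
        self.written_slot = slot;
        Ok(())
    }
}
```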

chapter 9: the developer surface, because protocol is not enough

one of the easiest mistakes in infra projects is to think that the protocol itself is the product.

it is not.

the developer surface is the product.

Sonar already ships two real developer-facing pieces:

the CPI SDK

CPI stands for cross-program invocation, which is just one blockchain program calling another. imagine one office in a government building forwarding your paperwork to the next office down the hall.

crates/sdk gives downstream Anchor programs an ergonomic request helper.

instead of every consumer re-deriving Sonar PDAs by hand and manually constructing CPI accounts, the SDK bundles that into a safer API that:

carries the caller-chosen request_id
validates the request and result PDA addresses
forwards signer seeds and remaining accounts
calls the Sonar request CPI cleanly

this is exactly the kind of thing that sounds small until you do not have it and every integration becomes a bug farm.

the CLI

crates/cli gives you sonar-cli register.

its job is to:

hash an ELF to derive computation_id
resolve verifier artifacts, meaning the packaged files the on-chain checker needs
perform integrity checking
construct and submit register_verifier

this is good product hygiene.

because "dynamic verifier registration exists in theory" is not actually useful until someone can operate it without copy-pasting brittle one-off scripts.

chapter 10: the coordinator, the thing that hears the chain talk

the coordinator is really two jobs wearing one hat.

job 1: log listener

the listener subscribes to Sonar program logs over Solana WebSocket. a WebSocket is just a live feed, like keeping a phone line open so updates arrive immediately instead of repeatedly asking "anything new yet?"

when it sees a request log, it decodes the request metadata and raw inputs.

for simple computations, that may be enough to form a prover job immediately.

for richer computations, it may need to enrich those inputs with off-chain data first.

job 2: callback worker

later, after the prover returns a ProverResponse, a callback worker consumes that response and submits the on-chain callback transaction.

this split is important.

request discovery and callback submission are related, but they are NOT the same problem.

the listener is about finding work.

the callback worker is about finalizing work safely.

the queueing model

the coordinator/prover boundary uses Redis.

Redis here is basically a very fast shared inbox. the coordinator drops jobs in, the prover picks them up, and neither side has to be welded directly to the other.

it is another one of those boring decisions i tend to trust early in a project.

you want:

simple job dispatch
decoupling between services
a queue you can inspect
easy local and devnet operation

Redis gives you that cheaply.

production-grade durability and replay semantics are still future work. but for a prototype, this is a perfectly sane baseline.
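to show the shape of that decoupling without dragging in a Redis client, here is a small sketch that swaps Redis for in-process `std::sync::mpsc` channels. that swap is deliberate and it is not how Sonar does it. `ProverJob` and `ProverResponse` are names from the repo, but the fields and the "sum the inputs" stand-in for proving are made up for the example:

```rust
use std::sync::mpsc;
use std::thread;

#[derive(Debug)]
pub struct ProverJob {
    pub request_id: u64,
    pub inputs: Vec<u64>,
}

#[derive(Debug, PartialEq)]
pub struct ProverResponse {
    pub request_id: u64,
    pub result: u64,
}

/// Coordinator side enqueues jobs, prover side drains them, and responses
/// travel back on a second queue. Neither side calls the other directly.
pub fn run_pipeline(jobs: Vec<ProverJob>) -> Vec<ProverResponse> {
    let (job_tx, job_rx) = mpsc::channel::<ProverJob>();
    let (resp_tx, resp_rx) = mpsc::channel::<ProverResponse>();

    // "prover": runs until the job queue is closed.
    let prover = thread::spawn(move || {
        for job in job_rx {
            // Placeholder for the expensive proving step.
            let result: u64 = job.inputs.iter().sum();
            let response = ProverResponse { request_id: job.request_id, result };
            if resp_tx.send(response).is_err() {
                break; // nobody is listening for responses anymore
            }
        }
    });

    // "coordinator": drop jobs into the inbox, then close it.
    for job in jobs {
        job_tx.send(job).expect("prover thread is alive");
    }
    drop(job_tx);

    // "callback worker": collect responses until the prover hangs up.
    let responses: Vec<ProverResponse> = resp_rx.iter().collect();
    prover.join().expect("prover thread panicked");
    responses
}
```

the thing a real queue adds on top of this is inspectability across processes and machines. the thing neither gives you for free is durability and replay, which is exactly the future work mentioned above.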

chapter 11: the indexer, because chains do not remember history the same way you do

this is where Sonar becomes more interesting than a toy proof verifier.

the strongest real vertical slice in the repo today is historical_avg.

that computation is intentionally not something the program can just do from on-chain state in one instruction.

it needs account history across a slot range.

Solana programs do not get to ask the chain:

"hey, what did this account look like over the last N slots?"

so Sonar has an indexer stack. an indexer is just the system that reorganizes chain data into something searchable instead of leaving it as a giant messy timeline:

a Geyser plugin, basically Solana's raw account-update firehose, writes account updates into Postgres, which is just a standard database
an Axum HTTP API, meaning a normal web server layer, exposes account-history lookups
the coordinator uses that API to enrich historical_avg jobs before they hit the prover

the special path looks like this:

client submits request whose raw inputs encode (pubkey, from_slot, to_slot)
coordinator parses those inputs from logs
coordinator fetches balance history from the indexer API
returned balances become prover inputs
prover computes the average and produces a proof/result bundle
callback writes the final value on-chain

this is one of the best demonstrations of what a real coprocessor is for.

it is not just "do math off-chain." it is:

combine chain-triggered requests
with indexed off-chain data
and a proof-backed return path

that is real infrastructure.
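here is a minimal sketch of the two pure pieces of that path: decoding the raw inputs and averaging the balances. the byte layout (32-byte pubkey followed by two little-endian `u64` slots), the floor-division average, and the function names are assumptions for illustration. the real guest may encode and weight things differently:

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct HistoricalAvgInputs {
    pub pubkey: [u8; 32],
    pub from_slot: u64,
    pub to_slot: u64,
}

/// Reads a little-endian u64 from exactly 8 bytes.
fn read_u64_le(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    u64::from_le_bytes(buf)
}

/// Assumed layout: pubkey (32 bytes) | from_slot (u64 LE) | to_slot (u64 LE).
pub fn decode_inputs(raw: &[u8]) -> Result<HistoricalAvgInputs, String> {
    if raw.len() != 48 {
        return Err(format!("expected 48 input bytes, got {}", raw.len()));
    }
    let mut pubkey = [0u8; 32];
    pubkey.copy_from_slice(&raw[..32]);
    let from_slot = read_u64_le(&raw[32..40]);
    let to_slot = read_u64_le(&raw[40..48]);
    if from_slot > to_slot {
        return Err(format!("empty slot range: {}..{}", from_slot, to_slot));
    }
    Ok(HistoricalAvgInputs { pubkey, from_slot, to_slot })
}

/// Unweighted mean of the balance samples, rounded down.
/// Sums in u128 so a long run of large balances cannot overflow.
pub fn average_balance(balances: &[u64]) -> Option<u64> {
    if balances.is_empty() {
        return None;
    }
    let sum: u128 = balances.iter().map(|&b| b as u128).sum();
    Some((sum / balances.len() as u128) as u64)
}
```

returning `None` for an empty history is a choice worth making explicit: "the indexer had no samples for this range" is a liveness problem, and it should not be silently reported as an average of zero.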

chapter 12: the prover, where the heavy stuff actually lives

the prover resolves computations through an internal registry, builds and runs SP1 guests, wraps proofs into Groth16 when needed, and can export verifier artifacts.

in plain english, the prover is the machine that does the heavy homework and then prints the mathematical receipt proving it did not cheat. SP1 is the particular proving engine Sonar uses, and a guest is just the program running inside that proving engine.

the important current nuance is this:

on-chain verification is generic over registered verifier material
off-chain proving is only available for computations that Sonar's prover registry actually implements

in plain english, the chain can be taught to check different kinds of certificate formats, but the prover still needs explicit support for each kind of job it knows how to run.

that means Sonar today is dynamically verifiable but not yet arbitrarily computable.

that is a subtle but important difference.
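a tiny sketch makes the "verifiable but not arbitrarily computable" gap easier to see. the registry below is a stand-in i wrote for this post, not the prover's real registry type: guests are plain function pointers instead of SP1 programs, and `sum_bytes` is a made-up computation:

```rust
use std::collections::HashMap;

pub type ComputationId = [u8; 32];
/// Stand-in for a guest program: bytes in, result bytes out.
pub type Guest = fn(&[u8]) -> Vec<u8>;

pub struct ProverRegistry {
    guests: HashMap<ComputationId, Guest>,
}

impl ProverRegistry {
    pub fn new() -> Self {
        Self { guests: HashMap::new() }
    }

    /// Support for a computation has to be added in code, one entry at a time.
    pub fn register(&mut self, id: ComputationId, guest: Guest) {
        self.guests.insert(id, guest);
    }

    /// Unknown IDs are rejected, even if a verifier for them exists on-chain.
    pub fn run(&self, id: &ComputationId, inputs: &[u8]) -> Result<Vec<u8>, String> {
        match self.guests.get(id) {
            Some(guest) => Ok(guest(inputs)),
            None => Err("no prover implementation registered for this computation_id".to_string()),
        }
    }
}

/// Made-up example guest: sums the input bytes into a little-endian u64.
pub fn sum_bytes(inputs: &[u8]) -> Vec<u8> {
    let total: u64 = inputs.iter().map(|&b| b as u64).sum();
    total.to_le_bytes().to_vec()
}
```

registering a verifier on-chain for some new `computation_id` changes nothing here. until someone adds a matching entry on the prover side, `run` keeps returning the error branch and the request eventually falls through to refund.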

how the proving flow works

at a high level:

prover consumes ProverJob from Redis
resolves computation by computation_id
runs the matching SP1 guest
generates proof + public inputs + result bundle
publishes ProverResponse back to Redis

public inputs are just the small set of values the chain is allowed to look at while checking the proof receipt.

the service is deliberately structured so the expensive proof generation runs in a blocking task rather than choking the async reactor. in plain english, the heavy work gets its own lane so it does not freeze the rest of the service.

it also logs periodic heartbeats while a long proof is still running. that sounds trivial, but it is not. once proofs get slow, silence is operational poison.
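the "own lane plus heartbeats" pattern can be sketched with nothing but the standard library. this version uses an OS thread and a channel instead of an async runtime's blocking-task facility, so treat it as an illustration of the idea and not the prover's actual code. `run_with_heartbeat` is a name i made up:

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Runs `work` on its own thread and calls `on_heartbeat` every `interval`
/// for as long as the result has not arrived.
pub fn run_with_heartbeat<T, F, H>(
    work: F,
    interval: Duration,
    mut on_heartbeat: H,
) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
    H: FnMut(),
{
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // A send error only means the caller stopped waiting, so ignore it.
        let _ = tx.send(work());
    });

    loop {
        match rx.recv_timeout(interval) {
            Ok(value) => {
                let _ = handle.join();
                return Ok(value);
            }
            // Still proving: say so, then keep waiting.
            Err(RecvTimeoutError::Timeout) => on_heartbeat(),
            // The sender vanished without a value, which means the task died.
            Err(RecvTimeoutError::Disconnected) => {
                return Err("proof task ended without a result (it likely panicked)".to_string());
            }
        }
    }
}
```

in a real service the heartbeat closure would log something like the request ID and elapsed time. the `Disconnected` branch matters just as much: a dead proof task should surface as an error, not as an endless stream of "still running" lines.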

what computations exist today

there are really two categories in the repo right now:

historical_avg, the strongest end-to-end slice
a simpler fibonacci/demo proving path used for proof generation and verifier/artifact flows

that is enough to prove the architecture.

it is not yet a broad coprocessor catalog.

chapter 13: the callback worker, the part that makes the result usable

once the prover has done its job, the callback worker turns a proof artifact into an on-chain effect.

that means:

consume ProverResponse
construct callback instruction data
pass callback accounts back in the expected shape
submit transaction

then the program takes over:

verify proof
write result
invoke the consumer callback program
pay the prover

this is the piece that makes Sonar feel like a coprocessor rather than just a proof factory.

because the goal is not only to prove something.

the goal is to make that proof-backed result become actionable state for another program.

chapter 14: the operational surface, because repos do not become systems just like that

Sonar already ships more operational structure than many "alpha infra" repos do.

it has:

docker-compose.prod.yml for Postgres, Redis, coordinator, prover, Prometheus, and Grafana
scripts/deploy-devnet.sh for repeatable devnet deployment
scripts/devnet-smoke-bench.sh for reproducible remote-devnet smoke + benchmark runs
scripts/local-ci.sh plus .actrc for local GitHub Actions rehearsal
CI, cargo audit, cargo deny, gitleaks, and Criterion benches

while all this does sound super cool, that does not mean it is production-ready.

what it means is that the repo already crossed the line from "interesting architecture sketch" into "someone is trying to run this like a system."

the current observability story is still baseline-only:

metrics scraping exists
Grafana service exists
dashboards, alerts, tracing, SLOs, and runbooks do not exist yet

that is an important distinction.
baseline observability is not operational maturity. it is the seed of operational maturity.

part ii: the journey and lessons

skip ahead: if you only want the hardest pain points, jump to chapter 17. if you want the honest timeline, jump to chapter 24. if you want the builder playbook, jump to part iii.

chapter 15: what we genuinely did well

there are a few decisions in Sonar that i still think were exactly right.

1. we built a real vertical slice instead of an abstract platform first

this matters a lot.

it is very easy to spend months building a generic coprocessor framework that does nothing end to end.

Sonar chose to make one vertical slice real.

historical_avg is not a toy, because it exercises the full pipeline:

request submission
log ingestion
enrichment via an indexer API
off-chain computation
proof generation
callback verification
final on-chain result handling

that is exactly the right order to discover reality.

2. we separated correctness from liveness

this is what i believe to be one of the core architecture wins.

if the off-chain stack dies, Sonar should get stuck or refund. it should not silently produce incorrect final state.

that is the right failure model.

3. we made the developer surface real early

the SDK and CLI are not stubs.

that is huge.

because protocols with no ergonomic integration path are just chores disguised as infra.

4. we hardened failure paths instead of pretending the happy path was enough

recent work added:

failure-path tests
prover startup preflights
benchmark honesty improvements
wallet balance preflights for devnet runs
explicit dead-letter handling for failed callback responses

this is exactly the kind of work that makes a prototype less glamorous and much more real.

chapter 16: what was standard, boring, and absolutely fine

not every part of a good system needs to be clever.

in fact, i would argue most parts should not be.

Sonar uses a lot of very standard machinery:

Redis queues
Axum HTTP APIs
Postgres for indexed data
Prometheus/Grafana baseline
Anchor for the on-chain program
simple service decomposition across coordinator/prover/indexer

and that is good.

the cryptography is already hard enough. there is no prize for making your queue architecture mysterious too.

if anything, one lesson from this repo is that standard boring plumbing is a superpower when the difficult parts of your project are already difficult.

chapter 17: where it got painful

this is the part where the project stopped being a nice architecture diagram and became a real system.

the deepest pain point was the proving operating model.

not the abstract proving model.

the actual, physical, annoying, machine-level one.

the laptop problem

full local CPU SP1 Groth16 proving is heavy.

really heavy.

that means on normal laptops you run into:

memory pressure
swap dependence
long opaque waits
proof startup costs that dominate short benchmark runs

so we hardened the CPU path with things like:

lower-memory defaults
worker-count limits
proving-key caching
fail-fast memory headroom checks

that helped. but it did not make physics go away.

the cache problem

then there was the beautiful little nightmare where the local SP1 Groth16 artifact cache looked like it existed, but was actually incomplete.

the directory contained only a truncated artifacts.tar.gz.

so the benchmark would wait and wait and wait, because from the outside it looked like "the prover is still running," when in reality the callback was never going to arrive.

that ended in a startup preflight too:

if the cache directory exists but the extracted Groth16 artifacts are missing, fail immediately

again: truth before optimism.
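
a rough sketch of what such a preflight can look like, using only the standard library. this is my illustration of the rule above, not the repo's implementation: the function name `check_artifact_cache` is hypothetical, and the caller passes in the list of file names expected after extraction because the real SP1 artifact names are not something i want to guess at here.

```rust
use std::path::Path;

/// Fail fast when the cache directory exists but extracted files are missing.
/// `required` is supplied by the caller; the real artifact names are an assumption
/// left outside this sketch.
fn check_artifact_cache(dir: &Path, required: &[&str]) -> Result<(), String> {
    if !dir.exists() {
        // No cache at all is a different state: a fresh download can proceed.
        return Ok(());
    }
    let missing: Vec<&str> = required
        .iter()
        .copied()
        .filter(|name| !dir.join(*name).is_file())
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "artifact cache at {} is incomplete, missing: {}",
            dir.display(),
            missing.join(", ")
        ))
    }
}
```

a directory holding only a truncated artifacts.tar.gz fails this check at startup, which is exactly the case that used to look like "the prover is still running."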

the CUDA detour

then came the obvious thought.

fine. if CPU proving is miserable on a laptop, take the CPU out of the picture and use the GPU.

this is where infrastructure teaches humility.

first blocker:

the cached sp1-gpu-server binary was linked against libcudart.so.12
the host exposed CUDA 13 userspace
so the server would not even start

we fixed that by rebuilding sp1-gpu-server@6.0.2 locally so it linked against the installed libcudart.so.13.

for a minute, it looked like we had finally won.

then the real blocker appeared.

SP1's full CUDA prover on this path requires at least 24 GiB VRAM.

my laptop GPU here reports about 8 GiB.

so the final truth was:

CUDA 12 mismatch: solvable
laptop GPU size: not solvable by cleverness

that was a genuinely useful result.

because it turned a fuzzy performance problem into a crisp architectural fact:

full local CUDA proving is not the path on this machine.

and once you know that, you can stop lying to yourself about what the next move should be.
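
the VRAM floor is easy to encode as a decision instead of a surprise. the sketch below assumes the detected VRAM figure comes from whatever GPU query the host offers (that part is left out), and the `ProverBackend` enum and `select_backend` function are hypothetical names, not Sonar's API. the 24 GiB constant is the floor mentioned above.

```rust
/// Minimum VRAM the full CUDA proving path needs, per the floor described above.
const MIN_CUDA_VRAM_GIB: u64 = 24;

#[derive(Debug, PartialEq)]
enum ProverBackend {
    LocalCpu,
    LocalCuda,
}

/// Pick a backend from the detected GPU memory and explain the choice.
/// `detected_vram_gib` is `None` when no GPU was found.
fn select_backend(detected_vram_gib: Option<u64>) -> (ProverBackend, String) {
    match detected_vram_gib {
        Some(gib) if gib >= MIN_CUDA_VRAM_GIB => (
            ProverBackend::LocalCuda,
            format!("GPU has {} GiB VRAM, meets the {} GiB floor", gib, MIN_CUDA_VRAM_GIB),
        ),
        Some(gib) => (
            ProverBackend::LocalCpu,
            format!("GPU has {} GiB VRAM, below the {} GiB floor", gib, MIN_CUDA_VRAM_GIB),
        ),
        None => (ProverBackend::LocalCpu, "no GPU detected".to_string()),
    }
}
```

with an 8 GiB card this returns the CPU backend plus a reason string, so the operator learns the answer in the first second rather than after a long wait.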

chapter 18: what we could have done better

there are a few things i would absolutely do earlier if starting over.

1. decide the proving operating model earlier

we had the protocol story before we had a fully honest answer to:

"where do the expensive proofs actually run in a way that is repeatable for operators?"

that question should have been elevated earlier.

because it shapes benchmarking, devnet workflows, local DX, and eventually production topology.

2. formalize operations earlier

the repo has baseline observability and automation, which is good.

but the missing pieces are the boring adult ones:

dashboards
alerts
runbooks
recovery procedures
staging promotion model

those are the things that make you trust a system when you're crunched for time, not just at demo time.

3. package the indexer operating model more explicitly

right now the prod-oriented Compose topology assumes an external indexer service.

that is a perfectly legitimate interim decision.

it is still an incomplete operator story.

4. broaden computations later, not sooner

this one is less regret and more reaffirmation.

the temptation to add many computations early is strong.

but honestly, if we had done that, i think we would have had a broader but less honest system.

the right call was depth first.

still, the cost of that decision is obvious: the product surface remains narrow today.

chapter 19: why the project is paused right here

the important thing is this:

Sonar is not paused because the core architecture failed.

it is paused because the next unanswered question is a bigger one than another patch or another unit test.

that question is:

what is the real proving deployment model?

the current facts are:

the request/callback architecture works
the vertical slice works
the devnet-grade prototype is real
full local laptop CPU proving is still painful
full local CUDA proving is not viable on this laptop due to the VRAM floor

so the likely next paths are:

keep the hardened CPU lane for local development only
evaluate narrower acceleration paths like groth16-cuda
move full proving to a remote machine with enough RAM and at least 24 GiB VRAM

that is not a bad place to pause.

it is actually a very honest one.

because from here onward, the work is less about inventing the core system and more about choosing the right operator model for it.

skip ahead: if you mostly want the distilled lessons and the honest accounting, start with chapter 20, chapter 24, or chapter 25.

chapter 20: the hardest part was not the zero-knowledge part

this is probably the most important lesson in the entire repo.

the hardest part was not the algebra.

it was not "how do i conceptually verify a proof on-chain?"

the hardest part was the actual engineering around it:

how requests get modeled
how off-chain work gets discovered
where enrichment data comes from
how to fail safely
how to benchmark honestly
how to make local development not completely miserable
how to separate product truth from operator wishful thinking

the proof system is the sharpest-looking part.

but to be very honest, it is not the whole iceberg.

chapter 21: one real slice beats ten speculative ones

Sonar is not broad yet.

that is true.

but what it has is one real slice and the infrastructure around it.

that is far more valuable than ten hypothetical integrations with no actual proving or callback pipeline behind them.

if you are building infra, depth is not the enemy of platform vision. depth is what stops platform vision from being a fantasy.

chapter 22: standard engineering discipline matters more than aesthetic novelty

CI matters.

benchmarks matter.

deploy scripts matter.

dead-letter handling matters.

wallet balance preflights matter.

artifact cache validation matters.

the part of infra work that looks least glamorous from outside is usually the part that distinguishes a repo that survives contact with reality from one that only survives screenshots.

chapter 23: the current repo is substantial, but not done

this is the honest inventory.

Sonar today is:

a serious prototype
a hardened devnet-grade system
a real vertical slice
a coherent architecture

it is not yet:

a production-ready proving platform
a mature governance system
a polished operator product
a broad computation marketplace

the remaining work is mostly not about new primitives.

it is about productionization.

that means:

proving strategy
operational maturity
verifier governance
economics and backpressure
recovery/runbooks/alerts
more computations later

which, honestly, is exactly where a good prototype should be.

chapter 24: how long this actually took

one thing that blogs like this often hide is the clock.

so here is the honest version.

if i go by the git history, the first commit in this repo lands on 27-03-2026.

from there, the core system came together very quickly:

27-03-2026 to 31-03-2026: repo scaffold, docs, CI, Anchor setup, prover service foundations, coordinator listener, indexer skeleton, and the first historical average path
05-04-2026: a huge amount of the serious protocol surface snaps into place, including on-chain verifier registry work, registry-backed proof verification, the CPI helper, verifier artifact tooling, and the registration CLI
11-04-2026 to 19-04-2026: production-ish compose setup, devnet deployment automation, callback-flow hardening, and the reproducible devnet smoke benchmark flow
22-04-2026 to 23-04-2026: the angry realism phase, where proving ergonomics, long waits, cache issues, benchmark honesty, low-memory CPU behavior, and unsupported CUDA hosts all get fail-fast handling

so if i am being fair, the answer is roughly this:

about 3 weeks to get from empty repo to a real end-to-end system with a legitimate vertical slice
just under 4 weeks to get to the hardened devnet-grade shape this post is describing

that is the build timeline.

the debugging timeline is funnier.

the concentrated "why is this prover being so cursed" spiral was basically 2 days of focused work, centered around the 22-04-2026 to 23-04-2026 commit cluster.

and the annoying punchline is that this was not purely a software bug hunt.

some of it really was software:

broken runtime expectations
incomplete Groth16 cache state
benchmarks waiting too long before admitting a run was dead

but after fixing the software-side traps, the final wall was hardware reality:

local CPU proving on a normal laptop is rough on memory
the rebuilt CUDA path could launch, but the laptop GPU still only had about 8 GiB VRAM
the full SP1 CUDA proving path wanted 24 GiB VRAM

so yes, we spent about two days debugging something that ended with a very physical answer: the machine was simply below the comfortable proving floor for the path we were trying to run.

honestly, i am glad that happened.

because it forced the repo to become more truthful.

chapter 25: where i think the next real progress comes from

if i had to prioritize the next moves, i would do them in this order.

1. settle the proving execution model

be explicit.

is full proving:

local CPU for development only?
remote GPU in the real path?
hybridized somehow?

make that answer architectural, not emotional.

2. productionize operations

dashboards

alerts

runbooks

recovery

staging path

this is where systems become credible.

3. formalize governance and economics

authority rotation

verifier lifecycle

artifact provenance

fee policy

admission control

that is what turns "working" into "safe to rely on."

4. broaden computations after the operator path is stable

not before.

part iii: how to build your own

skip ahead: if you only want the trust-boundary design, jump to chapter 27. if you only want proving and ops gotchas, jump to chapter 31. if you only want benchmarking advice, jump to chapter 33.

chapter 26: start with one computation, not a platform manifesto

if you want to build a zk coprocessor, do not start with:

"i am building a generalized decentralized proving marketplace for arbitrary off-chain compute."

that sentence is usually a sign that you are about to write an architecture doc instead of a system.

start with one computation that forces the architecture to be real.

ideally it should require at least one of these:

more compute than is comfortable on-chain
historical or indexed data access
proof-backed correctness
callback semantics that another program can use

for Sonar, that was historical_avg.

that single slice forced us to build:

request state
result state
verifier registry
coordinator ingestion
enrichment API
prover
callback worker
SDK integration

that is exactly what you want.

pick one computation that drags the whole architecture into the real world.

chapter 27: define the trust boundary before writing the queue

this is probably the single most important design step.

before you choose Redis or Postgres or Kafka or NATS or a homegrown worker framework, answer these questions:

what makes a result correct?
what only affects liveness?
what can fail without corrupting chain state?
what must be represented explicitly on-chain?

if you cannot answer those, you are not designing a zk coprocessor yet. you are just drawing boxes.

for Sonar, the answer became:

correctness is on-chain verifier + proof path + account constraints
liveness is everything else

that single choice simplified almost every later decision.

chapter 28: make `refund` a first-class citizen

seriously.

do this before you feel like doing it.

because if your system is asynchronous and involves off-chain services, failure is not hypothetical.

it is certain.

the only question is whether the user experience when it happens is graceful or embarrassing.

the basic pattern is:

escrow fee at request time
set deadline
if callback never arrives, let payer reclaim funds

that turns off-chain failure from a protocol failure into a liveness failure with a recovery path.

huge difference.
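
here is that pattern as a plain-Rust state machine. it is a sketch under assumptions, not the Anchor program: `payer` is a `u64` standing in for a public key, slots are bare integers, and the rule that a refund needs the slot to be strictly past the deadline is my choice for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum RequestStatus {
    Pending,
    Completed,
    Refunded,
}

#[derive(Debug)]
struct Request {
    payer: u64,         // stand-in for a public key
    escrowed_fee: u64,  // lamports held at request time
    deadline_slot: u64, // after this slot the payer may reclaim
    status: RequestStatus,
}

impl Request {
    /// Mark the request as fulfilled by a verified callback and release the fee.
    fn complete(&mut self) -> Result<u64, &'static str> {
        if self.status != RequestStatus::Pending {
            return Err("request already settled");
        }
        self.status = RequestStatus::Completed;
        Ok(std::mem::take(&mut self.escrowed_fee))
    }

    /// Let the original payer reclaim the escrow once the deadline has passed.
    fn refund(&mut self, caller: u64, current_slot: u64) -> Result<u64, &'static str> {
        if self.status != RequestStatus::Pending {
            return Err("request already settled");
        }
        if caller != self.payer {
            return Err("only the payer can reclaim");
        }
        if current_slot <= self.deadline_slot {
            return Err("deadline has not passed");
        }
        self.status = RequestStatus::Refunded;
        Ok(std::mem::take(&mut self.escrowed_fee))
    }
}
```

the useful property is that the escrow can leave the request exactly once, either through a callback or through a refund, and a dead off-chain stack can only ever push a request toward the refund branch.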

chapter 29: logs are not an implementation detail, they are your bridge

in systems like this, event emission is basically the membrane between on-chain and off-chain worlds.

make those logs structured, deterministic, and most importantly, easy to parse.

Sonar emits log lines for:

request ID
inputs
callback account metadata

that gives the coordinator everything it needs to reconstruct work without scraping the entire transaction in a brittle way.

if you are building your own system, do not bury the off-chain trigger inside vague human-readable logs meant only for debugging.

make them part of the protocol.
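
as an illustration of "logs as protocol", here is a small parser for a structured request line. the `sonar:request ` prefix, the `key=value` layout, and the field names `id`, `inputs`, and `callback_program` are all made up for this sketch; Sonar's real log format may look nothing like it. what matters is that the parser either returns a fully typed event or returns nothing.

```rust
use std::collections::HashMap;

/// Hypothetical prefix that marks a protocol-level request log line.
const LOG_PREFIX: &str = "sonar:request ";

#[derive(Debug, PartialEq)]
struct RequestEvent {
    request_id: u64,
    inputs: String,           // kept opaque here, e.g. hex-encoded bytes
    callback_program: String, // stand-in for callback account metadata
}

/// Parse one log line into a typed event; any missing or malformed field yields `None`.
fn parse_request_log(line: &str) -> Option<RequestEvent> {
    let body = line.strip_prefix(LOG_PREFIX)?;
    let fields: HashMap<&str, &str> = body
        .split_whitespace()
        .filter_map(|pair| pair.split_once('='))
        .collect();
    Some(RequestEvent {
        request_id: fields.get("id")?.parse().ok()?,
        inputs: fields.get("inputs")?.to_string(),
        callback_program: fields.get("callback_program")?.to_string(),
    })
}
```

a coordinator built this way can ignore every line it does not recognize and never has to guess at half-parsed work.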

chapter 30: treat enrichment as a deliberate subsystem

many useful computations need data the chain does not expose conveniently inside a program.

that means you will probably need some form of indexer.

once that is true, acknowledge it.

build the narrowest API that solves the problem.

in Sonar, the indexer surface is intentionally tiny:

account-history lookups only

that is good.

do not build a giant generalized data warehouse API just because you can.

the more surface area you add, the more trust, ops, caching, and freshness guarantees you now own.
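
to show how small "narrow" can be, here is a sketch of a one-method enrichment interface. everything in it is an assumption for illustration: the `AccountHistory` trait, the in-memory implementation, and the `mean_balance` helper, which is only my guess at the kind of computation a slice like historical_avg needs, not its actual definition.

```rust
use std::collections::HashMap;

/// The entire indexer surface in this sketch: one read-only lookup.
trait AccountHistory {
    /// Balances observed for `account` within the inclusive slot range.
    fn balances(&self, account: &str, from_slot: u64, to_slot: u64) -> Vec<u64>;
}

/// In-memory stand-in for a real indexer: account -> (slot, balance) rows.
struct InMemoryHistory {
    samples: HashMap<String, Vec<(u64, u64)>>,
}

impl AccountHistory for InMemoryHistory {
    fn balances(&self, account: &str, from_slot: u64, to_slot: u64) -> Vec<u64> {
        self.samples
            .get(account)
            .map(|rows| {
                rows.iter()
                    .filter(|(slot, _)| (from_slot..=to_slot).contains(slot))
                    .map(|&(_, balance)| balance)
                    .collect()
            })
            .unwrap_or_default()
    }
}

/// Integer mean of the balances in range, or `None` when there are no samples.
fn mean_balance(
    source: &dyn AccountHistory,
    account: &str,
    from_slot: u64,
    to_slot: u64,
) -> Option<u64> {
    let samples = source.balances(account, from_slot, to_slot);
    if samples.is_empty() {
        return None;
    }
    // Sum in u128 so a long history of large balances cannot overflow.
    let sum: u128 = samples.iter().map(|&b| b as u128).sum();
    Some((sum / samples.len() as u128) as u64)
}
```

because the computation only sees the trait, the freshness and caching questions stay inside one implementation instead of leaking into every caller.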

chapter 31: keep the prover path brutally honest

this is where many systems lie.

not deliberately. just through wishful thinking.

examples of lies:

"the proof is still running" when startup already failed
"the cache exists" when it is actually corrupted
"CUDA is supported" when the server binary cannot even launch
"the GPU path is available" when the card does not meet the prover's minimum VRAM

the only sane response is aggressive startup preflight.

Sonar ended up with exactly that mindset.

if the environment is not viable, fail immediately and explain why.

that is better than waiting 5,000 seconds to discover the run was dead on arrival (this happened to me btw).
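
one way to structure "fail immediately and explain why" is a list of named checks that all run before any proving starts. this is a sketch of the pattern rather than Sonar's code; the `Preflight` struct and `run_preflights` function are hypothetical names, and the individual checks are whatever your environment needs (memory headroom, cache completeness, GPU server launch, VRAM floor).

```rust
/// One named startup check; `run` returns `Err` with a human-readable reason.
struct Preflight<'a> {
    name: &'a str,
    run: Box<dyn Fn() -> Result<(), String> + 'a>,
}

/// Run every check and collect all failures, so the operator sees the whole
/// picture in one pass instead of fixing problems one restart at a time.
fn run_preflights(checks: &[Preflight<'_>]) -> Result<(), Vec<String>> {
    let failures: Vec<String> = checks
        .iter()
        .filter_map(|check| {
            (check.run)()
                .err()
                .map(|why| format!("{}: {}", check.name, why))
        })
        .collect();
    if failures.is_empty() {
        Ok(())
    } else {
        Err(failures)
    }
}
```

the caller builds the list once at startup and refuses to accept any proving job unless `run_preflights` returns `Ok`.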

chapter 32: build layered validation, not one giant heroic test

the right test stack for systems like this is layered.

you want:

fast unit tests
deterministic fixture-backed integration paths
one strong end-to-end slice
explicit heavyweight proof smoke tests, not default-every-time torture

Sonar eventually settled into something like:

cheap unit coverage
Anchor integration coverage
deterministic fixture-backed e2e for local stack behavior
opt-in expensive SP1 smoke
devnet benchmark wrappers for real-world runs

that is the right idea.

because the fastest way to destroy iteration speed is to make every small correctness check depend on the slowest and flakiest part of your proving pipeline.

chapter 33: do not confuse benchmark activity with benchmark truth

one of the nastier infra traps is a benchmark that appears busy and therefore feels meaningful.

in Sonar, the real lesson was:

request submitted is not the same as callback completed.

the only benchmark result that matters for a coprocessor is one tied to actual completion of the lifecycle you care about.

that means:

proof generation succeeded
callback landed
result account changed as expected
cleanup happened if appropriate

anything less is just animated logging.
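
that checklist can be turned into a gate that a benchmark has to pass before it is allowed to report a number. the sketch below is an assumption-laden illustration: the `LifecycleObservation` struct and `first_missing_stage` function are hypothetical, and how each flag gets observed (RPC polling, account diffing, and so on) is left out.

```rust
/// What a benchmark run actually observed, stage by stage.
#[derive(Debug, Default, Clone, Copy)]
struct LifecycleObservation {
    proof_generated: bool,
    callback_landed: bool,
    result_account_changed: bool,
    /// `None` when the run has nothing to clean up.
    cleanup_done: Option<bool>,
}

/// Return the first lifecycle stage that did not happen, or `None` when the
/// run completed and its timing is worth reporting.
fn first_missing_stage(obs: &LifecycleObservation) -> Option<&'static str> {
    if !obs.proof_generated {
        return Some("proof generation");
    }
    if !obs.callback_landed {
        return Some("callback");
    }
    if !obs.result_account_changed {
        return Some("result account update");
    }
    if obs.cleanup_done == Some(false) {
        return Some("cleanup");
    }
    None
}
```

a run where only the request was submitted starts from `LifecycleObservation::default()` and is rejected at "proof generation", which is the honest answer.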

epilogue: if you want to understand zk coprocessors, build one

not necessarily this one.

not necessarily with SP1.

not necessarily on Solana even.

but build one.

because once you do, a bunch of abstract phrases stop being abstract.

"off-chain execution with on-chain verification" stops sounding profound and starts sounding like a very practical split of responsibilities.

"proof generation pipeline" stops sounding like a single component and starts sounding like cache validation, hardware constraints, process orchestration, and startup honesty.

"developer platform" stops sounding like a lofty goal and starts sounding like a CPI helper, a CLI, a deploy script, an indexer API, and a callback that actually lands.

that is what Sonar gave me.

not just a repo. not just a prototype. not just a set of passing tests.

it gave me a much sharper model of where blockchains stop, where coprocessors begin, and how much of the real work lives in the glue between them.

and i think that is the right note to end on.

because the interesting thing about zk coprocessors is not that they make blockchains magical.

it is that they let blockchains stay small, strict, and honest while still answering bigger questions.

that, to me, is the whole point.

Sonar is paused, not dead. if you're an infra engineer with a GPU-rich environment and want to help, the repo is open. otherwise, i'll be back when the proving model is settled.

if you made it this far, thank you for reading.

repo: github.com/bit2swaz/sonar

one line to end on: the proof only matters if the system around it is honest too.
