sonar: building a zk coprocessor on solana
this blog is me talking to myself. i started building sonar because i wanted to understand zk coprocessors from the inside out. and i'm writing this down because if i don't, i'll forget all the silly mistakes i made. if you're someone who actually wants to build this stuff, you might find it useful. and if you're just here for a buzzword to paste into a deck, you'll be bored in two paragraphs. your call.
TL;DR
- built a Solana-native zk coprocessor around an Anchor program, the on-chain app, a Rust off-chain pipeline, and SP1, the proving engine that creates the proof receipts
- on-chain request/callback/refund loop exists, plus a real CPI SDK, CLI, indexer, prover, and coordinator
- historical_avg works end-to-end as the strongest vertical slice
- the hardest part was not "how do i verify a proof on-chain?". it was everything around that: orchestration, data enrichment, proving ergonomics, and hardware reality
- current state is hardened devnet-grade, not production-grade, and the present pause is about the proving operating model, not because the architecture itself collapsed
table of contents
- prologue
- what you are reading
- a quick translation layer for the non-blockchain folks
- part i: architecture and theory
- chapter 1: what a zk coprocessor actually is
- chapter 2: why Solana needs this at all
- chapter 3: how other Solana zk coprocessors usually work
- chapter 4: the problem Sonar is actually solving
- chapter 5: why the whole thing has to be asynchronous
- chapter 6: the trust model you actually want
- chapter 7: the bird's-eye view
- chapter 8: the on-chain program, the part that actually matters
- chapter 9: the developer surface, because protocol is not enough
- chapter 10: the coordinator, the thing that hears the chain talk
- chapter 11: the indexer, because chains do not remember history the same way you do
- chapter 12: the prover, where the heavy stuff actually lives
- chapter 13: the callback worker, the part that makes the result usable
- chapter 14: the operational surface, because repos do not become systems just like that
- part ii: the journey and lessons
- chapter 15: what we genuinely did well
- chapter 16: what was standard, boring, and absolutely fine
- chapter 17: where it got painful
- chapter 18: what we could have done better
- chapter 19: why the project is paused right here
- chapter 20: the hardest part was not the zero-knowledge part
- chapter 21: one real slice beats ten speculative ones
- chapter 22: standard engineering discipline matters more than aesthetic novelty
- chapter 23: the current repo is substantial, but not done
- chapter 24: how long this actually took
- chapter 25: where i think the next real progress comes from
- part iii: how to build your own
- chapter 26: start with one computation, not a platform manifesto
- chapter 27: define the trust boundary before writing the queue
- chapter 28: make refund a first-class citizen
- chapter 29: logs are not an implementation detail, they are your bridge
- chapter 30: treat enrichment as a deliberate subsystem
- chapter 31: keep the prover path brutally honest
- chapter 32: build layered validation, not one giant heroic test
- chapter 33: do not confuse benchmark activity with benchmark truth
- epilogue
prologue: the question that got under my skin
there are certain ideas in systems engineering that sound almost annoyingly obvious once you hear them.
this was one of them.
the question was simple:
what if a Solana program, meaning an app that lives on the blockchain itself, could ask a question that is too expensive, too state-heavy, or too time-consuming to answer on-chain, and still get back an answer it can trust?
not "trust me bro, i computed it off-chain." or "here is an API response, pinky promise it is honest." i mean a result that comes back with a proof, basically a tamper-resistant receipt saying the work was really done correctly, gets verified, and then becomes usable by another on-chain program.
that is the whole emotional center of Sonar.
because smart contracts, which are really just public rulebooks that run on a blockchain, are incredible. but they are also tiny. intentionally tiny.
they are constrained by compute budgets, transaction limits, account access rules, serialization overhead, and the very reasonable fact that blockchains are not supposed to be general-purpose supercomputers.
and yet, the applications people want to build keep asking bigger questions:
- what was the historical average balance of this account across a slot range?
- can i prove that a complex off-chain computation happened correctly?
- can i verify a result without rerunning the whole thing on-chain?
- can another program use that answer without trusting an operator?
that is where coprocessors enter the story.
so i built Sonar.
simply because this felt like one of those problems where if i did not build it myself, i was going to keep thinking about it.
and now, here we are.
what you are reading
this is a deep dive into how Sonar works, what zk coprocessors actually do, how most Solana zk coprocessors are structured under the hood, what we got right, what we got wrong, what hurt, what was standard boring plumbing, what was genuinely hard, and why the project is currently paused where it is.
it is written to be beginner-friendly, but not shallow.
if you have never heard of a zk coprocessor before, you should understand the core model by the end.
if you already know the space, the parts that should be worth your time are:
- the trust-boundary design
- the request -> queue -> prove -> callback pipeline
- the on-chain/off-chain split
- the proving-path failures we ran into on real hardware
- the difference between a real prototype and a production-ready system
in short, Sonar is:
- a Solana-native zk coprocessor prototype
- built with an Anchor program plus a Rust off-chain pipeline
- backed by SP1, the proving engine, for proof generation and groth16-solana, the compact proof checker, for on-chain verification
- equipped with a real CPI SDK, CLI, indexer, coordinator, prover, deploy automation, benchmarks, CI, and baseline observability
- strongest today as one real vertical slice: historical_avg
disclaimer: this is not a copy-paste tutorial.
the goal is not to hand you a premade architecture diagram and call it a day.
the goal is to help you actually understand why these systems exist, why they are structured the way they are, and what tradeoffs show up the moment you try to build one for real.
so yeah. buckle up, because this is gonna be a long one (maybe even longer than any of my write-ups before this).
skip ahead: if you just want the architecture, jump to chapter 7. if you want the scars, tradeoffs, and lessons, jump to chapter 15 or chapter 20. if you want the build-your-own playbook, jump to part iii.
a quick translation layer for the non-blockchain folks
if some of the words above already felt like random syllables, this section is gonna be your cheat sheet.
skip ahead: if you already speak blockchain and just want the meat, jump to chapter 1 or straight to chapter 7.
for the next few sections, imagine Sonar as a very fast public records office with a front desk, a back office, a filing room, and a specialist proof desk. you do not need to memorize any of this. just keep the picture in your head and the rest of the read gets much easier.
- blockchain: the shared master ledger that many computers keep in sync. in the records-office picture, this is the public book that every clerk agrees on.
- Solana: the specific blockchain Sonar is built on. in the picture, this is the particular records office Sonar works inside.
- program / smart contract: code that lives on the blockchain and runs by fixed rules. in the picture, this is the office rulebook that tells the clerks exactly what they are allowed to do.
- on-chain: written into the blockchain's official record. in the picture, this means the paperwork has been stamped into the public book.
- off-chain: happening outside that official record, on ordinary servers or laptops. in the picture, this is work done in the back office before the final stamped update goes into the public book.
- transaction: a signed request asking the blockchain to do something. in the picture, this is the filled-out form you hand to the front desk.
- account: a piece of stored data on Solana. in the picture, this is a labeled folder in the filing room.
- PDA (program-derived address): a predictable account address a program can control without a human private key. in the picture, this is a folder slot whose label is generated from a recipe in the rulebook, so everyone knows where it belongs.
- cryptography: the math that makes forgery hard and tampering obvious. in the picture, this is the tamper-evident seal on the paperwork.
- proof: a compact cryptographic receipt showing some work was done correctly. in the picture, this is a sealed certificate from the specialist desk saying the back-office work really checked out.
- zero-knowledge proof: a special kind of proof that lets the chain check the result without replaying the whole computation, and sometimes without revealing every private detail. in the picture, the clerk trusts the sealed certificate instead of rerunning the whole investigation.
- verifier: the checker that tests whether a proof is valid. in the picture, this is the clerk at the counter who inspects the seal and decides whether to accept the certificate.
- callback: a follow-up action after an async job finishes. in the picture, this is the office calling you back once your stamped folder is ready.
- indexer: a service that reorganizes blockchain data so it is easy to search. in the picture, this is the records clerk who builds a usable catalog from a huge pile of folders.
- CPI (cross-program invocation): one blockchain program calling another blockchain program. in the picture, this is one desk forwarding your stamped form to another desk in the same building.
- Anchor: a framework that makes Solana development easier. in the picture, this is the office template kit that keeps every form and workflow from being handwritten chaos.
- zkVM: a virtual machine whose execution can later be proved. in the picture, this is the sealed workshop in the back office where a job is run and a certificate comes out.
- SP1: the particular proving engine Sonar uses. in the picture, this is the specific specialist workshop Sonar sends jobs to when it needs a certificate.
- Groth16: the especially compact proof format Sonar brings back to chain because it is cheap to verify. in the picture, this is the short official certificate format the front desk can inspect quickly.
- RPC: the request interface apps use to talk to a blockchain node. in the picture, this is the public front desk window.
- WebSocket: a live connection that streams updates as they happen. in the picture, this is leaving the phone line open so the office can speak to you continuously instead of making you call back every few seconds.
- finality: the point where a transaction is settled enough to trust. in the picture, this is when the office has stamped and filed the form so it is no longer just pending on someone's desk.
part i: architecture and theory
chapter 1: what a zk coprocessor actually is
let's strip away the buzzwords first.
a coprocessor is just a system that does work outside the main processor because that work is expensive, specialized, or inconvenient to do in the main execution environment.
your laptop already does this all the time.
the CPU does one class of work. the GPU does another. the network card does another. the TPM does another.
a zk coprocessor is the blockchain version of that idea.
the chain does not run the expensive computation itself.
instead:
- the chain records a request
- an off-chain worker performs the computation
- that worker generates a proof that the computation was done correctly
- the chain verifies the proof succinctly
- the chain accepts the result and lets downstream programs use it
the key word here is succinctly.
if you are brand new to crypto, here is the simple version. cryptography is just math used to make forgery hard and tampering obvious. a zero-knowledge proof is a special kind of mathematical certificate. and succinctly just means the certificate is small and cheap to check.
the whole point of zero-knowledge proof systems in this context is not only privacy. in many coprocessor systems, privacy is not even the main feature.
the main feature is that the blockchain can verify a result much more cheaply than it could recompute it.
so the mental model i like is this:
a zk coprocessor is simply an off-chain execution engine with an on-chain trust boundary.
that is it.
just simple outsourced execution + cryptographic verification, meaning the chain trusts the math-backed certificate instead of trusting the operator's word.
chapter 2: why Solana needs this at all
Solana is fast. Solana is parallel. Solana is cheap compared to many other chains. if you are brand new to this stuff, read "Solana" as "the particular blockchain network this whole system lives on."
but even then, there is a lot it is not supposed to do.
that is not a flaw. it is the design.
Solana programs run under hard constraints:
- bounded compute units, which are basically a per-transaction work budget, like a shot clock
- strict account access patterns, meaning the transaction has to say up front which data folders it plans to touch
- no arbitrary historical-state queries from inside a program
- no appetite for giant, slow, stateful computations in the middle of a transaction
so if you try to do something like:
- scan a huge historical window of account balance data
- run a large zkVM guest, meaning a program inside a provable virtual machine
- aggregate data across many blocks
- execute a heavyweight reduction over off-chain indexed state
you hit a wall VERY QUICKLY.
that does NOT mean the application idea is bad.
it simply means the chain is telling you, correctly, that the computation belongs somewhere else.
this is the exact niche a coprocessor fills.
the chain remains the place where state transitions are authorized.
the coprocessor becomes the place where heavyweight computation happens.
then a proof bridges the two.
chapter 3: how other Solana zk coprocessors usually work
this is important, because Sonar is not some alien architecture that came from a UFO. it belongs to a family.
most Solana zk coprocessors, regardless of branding, end up looking like variants of the same pipeline:
- an on-chain program records a request
- that request is observed by some off-chain service
- the service gathers inputs or enrichment data
- a prover executes a computation in a zkVM or proving system
- the service submits a proof and result back to chain
- an on-chain verifier accepts or rejects that proof
- some callback or state update makes the result useful to another program
where these systems differ is usually on three axes.
axis 1: where proving happens
some systems assume:
- a local prover
- a prover marketplace
- a centralized operator cluster
- a remote proving service hidden behind an API
axis 2: where verification happens
some systems verify:
- directly on Solana
- on another settlement layer and then bridge a commitment back
- through a hybrid model where some trust assumption remains off-chain
axis 3: how generic the request model is
some systems are:
- fixed-function, one computation, one proof shape
- semi-generic, where multiple computation IDs are supported
- marketplace-style, where verifiers and programs are registered dynamically
Sonar sits in the middle of that spectrum.
it is not a one-off special-purpose proof gadget.
but it is also not trying to be a universal arbitrary-computation marketplace yet.
the on-chain verifier registry is dynamic.
the off-chain computation registry is still explicitly implemented in code.
that distinction matters a lot.
chapter 4: the problem Sonar is actually solving
if i had to phrase Sonar's main job in one sentence, it would be this:
make off-chain computation feel like a verifiable async primitive that another Solana program can safely depend on.
that breaks into four sub-problems:
- how does a program ask for work?
- how does off-chain infra know what to compute?
- how does the chain know the result is valid?
- how does another on-chain program use that result without trusting an operator?
the simplest possible (bad) answer is:
- send an HTTP request off-chain
- let a server compute a result
- have a privileged signer write the result back on-chain
that can be useful, 100%.
but it is also not the trust model i wanted.
Sonar's answer is stricter:
- requests are explicit PDAs
- results are explicit PDAs
- verifier material is explicit on-chain state
- callbacks only happen after proof verification
- refunds exist when liveness fails
that means the system splits nicely into two domains:
- correctness, which is protected by the on-chain verifier and account constraints
- liveness, which is handled by off-chain services that may fail without silently corrupting state
this is one of the biggest design wins in the whole repository.
chapter 5: why the whole thing has to be asynchronous
there is no world where a Solana transaction says:
"yo, go spin up a prover, run a zkVM guest, maybe fetch account history from an indexer (but only if you feel like it), wrap the result into Groth16, and get back to me before this instruction returns. your time starts now."
that is just not how these systems work.
so the request lifecycle has to be asynchronous.
in Sonar, the shape is:
- user or CPI caller submits request
- program creates request/result state and emits structured logs
- off-chain workers observe and process the job
- prover generates proof + result
- callback worker submits callback
- program verifies proof, writes result, invokes consumer callback, pays prover
- if the deadline passes first, payer can call refund
the key insight is this:
refund is not a side feature. it is part of the contract with reality.
because off-chain systems fail.
Redis fails. provers fail. indexers fall behind. RPC endpoints, meaning the front desks apps use to talk to the chain, misbehave. GPU machines crash. proofs take too long.
if your architecture does not include a clean timeout and refund story, then i'm sorry to say this but you, my friend, do not have a robust coprocessor. what you have instead is a very, VERY optimistic science fair demo.
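to make the lifecycle above concrete, here is a minimal sketch of it as a tiny state machine. this is not Sonar's actual program code. the type names, the slot-based deadline, and the exact boundary rule (a callback is still allowed at the deadline slot, a refund only after it) are all assumptions i picked for illustration.

```rust
/// Where a request is in its life. Names are illustrative, not Sonar's.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestStatus {
    Pending,
    Completed,
    Refunded,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LifecycleError {
    NotPending,
    DeadlinePassed,
    DeadlineNotReached,
}

/// Hypothetical in-memory stand-in for the on-chain request state.
#[derive(Debug)]
pub struct Request {
    pub status: RequestStatus,
    /// Assumption: the deadline is expressed as a slot number.
    pub deadline_slot: u64,
}

impl Request {
    pub fn new(deadline_slot: u64) -> Self {
        Self { status: RequestStatus::Pending, deadline_slot }
    }

    /// The callback path: only a pending request that is still inside
    /// its deadline can move to Completed.
    pub fn complete(&mut self, current_slot: u64) -> Result<(), LifecycleError> {
        if self.status != RequestStatus::Pending {
            return Err(LifecycleError::NotPending);
        }
        if current_slot > self.deadline_slot {
            return Err(LifecycleError::DeadlinePassed);
        }
        self.status = RequestStatus::Completed;
        Ok(())
    }

    /// The refund path: only a pending request whose deadline has
    /// already passed can move to Refunded.
    pub fn refund(&mut self, current_slot: u64) -> Result<(), LifecycleError> {
        if self.status != RequestStatus::Pending {
            return Err(LifecycleError::NotPending);
        }
        if current_slot <= self.deadline_slot {
            return Err(LifecycleError::DeadlineNotReached);
        }
        self.status = RequestStatus::Refunded;
        Ok(())
    }
}
```

the point of the sketch is that the two exits are mutually exclusive: at any slot, a pending request can either be completed or refunded, never both.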
chapter 6: the trust model you actually want
whenever i talk about coprocessors, i think it is worth separating two kinds of trust.
trusted for correctness
in Sonar, these are the things that should determine whether a result is valid:
- the on-chain program logic
- the verifier registry state
- the proof verification path
- Solana finality
trusted for liveness, not correctness
these are the things that determine whether the result shows up on time at all:
- coordinator availability
- Redis availability
- prover availability
- indexer freshness
- operator competence
that split is the difference between a system that fails loudly and a system that quietly lies (and we all hate liars, don't we?).
so, for that very reason, Sonar is explicitly trying to fail in the first way, not the second.
skip ahead: if the theory already clicked and you just want Sonar's actual moving parts, start here with chapter 7. if you want the lived experience instead, jump to part ii.
chapter 7: the bird's-eye view
Sonar today is easiest to understand as seven pieces:
1. Anchor program -> the trust boundary
2. SDK + CLI -> the developer surface
3. Coordinator listener -> sees requests
4. Indexer + API -> provides enrichment data
5. Redis queues -> decouples orchestration
6. Prover -> computes + proves
7. Callback worker -> returns verified results on-chain
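the lifecycle looks like this: request -> log listener -> enrichment (when needed) -> Redis queue -> prover -> Redis queue -> callback worker -> on-chain callback.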
there is nothing crazy exotic about this pipeline.
and THAT is a feature.
one of the easiest ways to ruin infra is to decide that every part of it must also be novel.
Sonar is novel where it needs to be novel.
everywhere else, it simply tries to be legible.
chapter 8: the on-chain program, the part that actually matters
the Anchor program is the heart of the correctness model. Anchor, for the non-Solana readers, is just the developer framework around the actual blockchain program. think power tools, not raw scrap metal.
it supports four core instructions:
- register_verifier
- request
- callback
- refund
that is the whole protocol surface.
register_verifier
this creates a verifier registry PDA keyed by computation_id and stores Groth16 verifying-key material on-chain.
a PDA is just a predictable program-owned account.
in the records-office picture, it is a folder slot whose label is generated from the office rulebook instead of chosen by hand. a computation_id is just the label for a particular kind of job. a verifier is the checker for proofs, and Groth16 is the compact certificate format Sonar uses here because it is efficient to verify on-chain.
this matters because the proof system is not just a blob attached to a result. the verifier is explicit protocol state.
request
this is where a caller asks Sonar to do work.
it creates two PDAs:
- RequestMetadata
- ResultAccount
if the locker analogy helps, one locker remembers what was asked and the other is where the finished answer gets placed.
it also escrows the fee and emits structured logs.
those logs are not a debugging detail. they are part of the system design.
the coordinator learns about work by watching program logs, then decoding:
- request ID
- raw inputs
- callback account metadata
the logs are, in practice, the event bus between the chain and the off-chain workers.
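here is a rough sketch of what "decoding a request log" can look like on the coordinator side. big caveat: the log format below (a sonar:request: prefix followed by hex fields) is something i made up for this example. it is not the format Sonar actually emits, and it leaves out the callback account metadata entirely.

```rust
/// What the listener wants to recover from one log line.
#[derive(Debug, PartialEq, Eq)]
pub struct RequestEvent {
    pub request_id: [u8; 32],
    pub raw_inputs: Vec<u8>,
}

/// Hypothetical log prefix, chosen only for this sketch.
const LOG_PREFIX: &str = "sonar:request:";

/// Decodes a hex string into bytes. Returns None on odd length or
/// on any character that is not an ASCII hex digit.
fn decode_hex(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}

/// Parses "sonar:request:<request_id_hex>:<raw_inputs_hex>".
/// Lines that do not match are ignored by returning None, because a
/// program's log stream contains plenty of unrelated lines.
pub fn parse_request_log(line: &str) -> Option<RequestEvent> {
    let rest = line.strip_prefix(LOG_PREFIX)?;
    let (id_hex, inputs_hex) = rest.split_once(':')?;
    // The request ID must be exactly 32 bytes in this sketch.
    let request_id: [u8; 32] = decode_hex(id_hex)?.try_into().ok()?;
    let raw_inputs = decode_hex(inputs_hex)?;
    Some(RequestEvent { request_id, raw_inputs })
}
```

the design choice worth copying is not the format, it is the strictness: anything that does not parse cleanly is dropped rather than guessed at.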
callback
a callback is just the second trip back after the off-chain work is done. think of dropping clothes at a tailor and later coming back for the finished suit.
this is the critical instruction.
it:
- checks the request is still pending
- checks the deadline has not passed
- checks the verifier registry matches the request's computation ID
- verifies the Groth16 proof
- validates result size
- marks request completed
- invokes the consumer callback program
- pays the prover out of escrow
the trust boundary lives here.
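the order of those checks matters, so here is a plain-Rust sketch of it. this is not the Anchor instruction. the struct names, the MAX_RESULT_BYTES limit, and the idea of passing the proof check in as a function are assumptions made so the example runs without any Solana crates. invoking the consumer program is left as a comment.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Completed,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CallbackError {
    NotPending,
    DeadlinePassed,
    VerifierMismatch,
    InvalidProof,
    ResultTooLarge,
}

/// Hypothetical stand-in for the request state the program reads.
pub struct PendingRequest {
    pub status: Status,
    pub computation_id: [u8; 32],
    pub deadline_slot: u64,
    pub fee: u64,
}

/// Hypothetical stand-in for a verifier registry entry.
pub struct Verifier {
    pub computation_id: [u8; 32],
}

/// Assumed limit, picked for the example only.
pub const MAX_RESULT_BYTES: usize = 256;

/// Runs the checks in the order listed above and returns the fee
/// that should be paid to the prover on success.
pub fn handle_callback<F>(
    req: &mut PendingRequest,
    verifier: &Verifier,
    current_slot: u64,
    proof: &[u8],
    result: &[u8],
    verify: F,
) -> Result<u64, CallbackError>
where
    F: Fn(&Verifier, &[u8], &[u8]) -> bool,
{
    if req.status != Status::Pending {
        return Err(CallbackError::NotPending);
    }
    if current_slot > req.deadline_slot {
        return Err(CallbackError::DeadlinePassed);
    }
    if verifier.computation_id != req.computation_id {
        return Err(CallbackError::VerifierMismatch);
    }
    if !verify(verifier, proof, result) {
        return Err(CallbackError::InvalidProof);
    }
    if result.len() > MAX_RESULT_BYTES {
        return Err(CallbackError::ResultTooLarge);
    }
    // State only changes after every check has passed.
    req.status = Status::Completed;
    // (the real program would invoke the consumer callback here)
    Ok(req.fee)
}
```

notice that nothing is mutated until the last check passes. a rejected proof leaves the request pending, which is exactly what keeps the refund path available.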
refund
if the deadline passes and the request is still pending, the payer gets the fee back.
again: this is not a "nice-to-have". it's a core piece of the system.
the data model
there are three important account types.
RequestMetadata tracks:
- request ID
- payer
- callback program
- computation ID
- deadline
- fee
- status
- bump
ResultAccount tracks:
- request ID
- result bytes
- whether result has been written
- write slot
VerifierRegistry tracks:
- computation ID
- authority
- Groth16 verifying-key material
- bump
this is simple on purpose.
the program is not trying to be a general database of every off-chain event imaginable.
it is trying to own the minimal state needed for verifiable async computation.
chapter 9: the developer surface, because protocol is not enough
one of the easiest mistakes in infra projects is to think that the protocol itself is the product.
it is not.
the developer surface is the product.
Sonar already ships two real developer-facing pieces:
the CPI SDK
CPI stands for cross-program invocation, which is just one blockchain program calling another. imagine one office in a government building forwarding your paperwork to the next office down the hall.
crates/sdk gives downstream Anchor programs an ergonomic request helper.
instead of every consumer re-deriving Sonar PDAs by hand and manually constructing CPI accounts, the SDK bundles that into a safer API that:
- carries the caller-chosen request_id
- validates the request and result PDA addresses
- forwards signer seeds and remaining accounts
- calls the Sonar request CPI cleanly
this is exactly the kind of thing that sounds small until you do not have it and every integration becomes a bug farm.
the CLI
crates/cli gives you sonar-cli register.
its job is to:
- hash an ELF to derive computation_id
- resolve verifier artifacts, meaning the packaged files the on-chain checker needs
- perform integrity checking
- construct and submit register_verifier
this is good product hygiene.
because "dynamic verifier registration exists in theory" is not actually useful until someone can operate it without copy-pasting brittle one-off scripts.
chapter 10: the coordinator, the thing that hears the chain talk
the coordinator is really two jobs wearing one hat.
job 1: log listener
the listener subscribes to Sonar program logs over Solana WebSocket. a WebSocket is just a live feed, like keeping a phone line open so updates arrive immediately instead of repeatedly asking "anything new yet?"
when it sees a request log, it decodes the request metadata and raw inputs.
for simple computations, that may be enough to form a prover job immediately.
for richer computations, it may need to enrich those inputs with off-chain data first.
job 2: callback worker
later, after the prover returns a ProverResponse, a callback worker consumes that response and submits the on-chain callback transaction.
this split is important.
request discovery and callback submission are related, but they are NOT the same problem.
the listener is about finding work.
the callback worker is about finalizing work safely.
the queueing model
the coordinator/prover boundary uses Redis.
Redis here is basically a very fast shared inbox. the coordinator drops jobs in, the prover picks them up, and neither side has to be welded directly to the other.
it is another one of those boring decisions i tend to trust early in a project.
you want:
- simple job dispatch
- decoupling between services
- a queue you can inspect
- easy local and devnet operation
Redis gives you that cheaply.
production-grade durability and replay semantics are still future work. but for a prototype, this is a perfectly sane baseline.
chapter 11: the indexer, because chains do not remember history the same way you do
this is where Sonar becomes more interesting than a toy proof verifier.
the strongest real vertical slice in the repo today is historical_avg.
that computation is intentionally not something the program can just do from on-chain state in one instruction.
it needs account history across a slot range.
Solana programs do not get to ask the chain:
"hey, what did this account look like over the last N slots?"
so Sonar has an indexer stack. an indexer is just the system that reorganizes chain data into something searchable instead of leaving it as a giant messy timeline:
- a Geyser plugin, basically Solana's raw account-update firehose, writes account updates into Postgres, which is just a standard database
- an Axum HTTP API, meaning a normal web server layer, exposes account-history lookups
- the coordinator uses that API to enrich historical_avg jobs before they hit the prover
the special path looks like this:
- client submits request whose raw inputs encode (pubkey, from_slot, to_slot)
- coordinator parses those inputs from logs
- coordinator fetches balance history from the indexer API
- returned balances become prover inputs
- prover computes the average and produces a proof/result bundle
- callback writes the final value on-chain
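the two computational ends of that path, decoding the raw inputs and averaging the balances, can be sketched like this. the byte layout (32-byte pubkey, then two little-endian u64 slots) and the floor-division average are assumptions for the example. Sonar's real encoding and rounding may differ.

```rust
/// Decoded form of the (pubkey, from_slot, to_slot) request inputs.
#[derive(Debug, PartialEq, Eq)]
pub struct HistoricalAvgInputs {
    pub pubkey: [u8; 32],
    pub from_slot: u64,
    pub to_slot: u64,
}

/// Assumed layout: 32 bytes of pubkey, 8 bytes from_slot (LE),
/// 8 bytes to_slot (LE). Anything else is rejected.
pub fn decode_inputs(raw: &[u8]) -> Option<HistoricalAvgInputs> {
    if raw.len() != 48 {
        return None;
    }
    let pubkey: [u8; 32] = raw[..32].try_into().ok()?;
    let from_slot = u64::from_le_bytes(raw[32..40].try_into().ok()?);
    let to_slot = u64::from_le_bytes(raw[40..48].try_into().ok()?);
    // An inverted range is a malformed request, not an empty one.
    if from_slot > to_slot {
        return None;
    }
    Some(HistoricalAvgInputs { pubkey, from_slot, to_slot })
}

/// Integer average of the balances returned by the indexer.
/// Sums in u128 so that many large u64 balances cannot overflow,
/// and returns None when there is no history to average.
pub fn average_balance(balances: &[u64]) -> Option<u64> {
    if balances.is_empty() {
        return None;
    }
    let sum: u128 = balances.iter().map(|&b| b as u128).sum();
    Some((sum / balances.len() as u128) as u64)
}
```

in the real pipeline the averaging runs inside the zkVM guest so it can be proved. here it is just a normal function so the logic is easy to read.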
this is one of the best demonstrations of what a real coprocessor is for.
it is not just "do math off-chain." it is:
- combine chain-triggered requests
- with indexed off-chain data
- and a proof-backed return path
that is real infrastructure.
chapter 12: the prover, where the heavy stuff actually lives
the prover resolves computations through an internal registry, builds and runs SP1 guests, wraps proofs into Groth16 when needed, and can export verifier artifacts.
in plain english, the prover is the machine that does the heavy homework and then prints the mathematical receipt proving it did not cheat. SP1 is the particular proving engine Sonar uses, and a guest is just the program running inside that proving engine.
the important current nuance is this:
- on-chain verification is generic over registered verifier material
- off-chain proving is only available for computations that Sonar's prover registry actually implements
in plain english, the chain can be taught to check different kinds of certificate formats, but the prover still needs explicit support for each kind of job it knows how to run.
that means Sonar today is dynamically verifiable but not yet arbitrarily computable.
that is a subtle but important difference.
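one way to picture "explicitly implemented in code" is a lookup table from computation_id to a function. this is a hedged sketch of that idea, not Sonar's prover registry: ProverRegistry, the function-pointer type, and the sum_bytes toy computation are all hypothetical.

```rust
use std::collections::HashMap;

pub type ComputationId = [u8; 32];
/// A computation takes raw input bytes and returns result bytes.
pub type Computation = fn(&[u8]) -> Vec<u8>;

#[derive(Debug, PartialEq, Eq)]
pub enum ResolveError {
    UnsupportedComputation,
}

/// Hypothetical off-chain registry: every supported job has to be
/// registered in code before the prover can run it.
pub struct ProverRegistry {
    computations: HashMap<ComputationId, Computation>,
}

impl ProverRegistry {
    pub fn new() -> Self {
        Self { computations: HashMap::new() }
    }

    pub fn register(&mut self, id: ComputationId, computation: Computation) {
        self.computations.insert(id, computation);
    }

    /// Resolves the computation by ID and runs it. An ID the prover
    /// does not know is an explicit error, even if a verifier for it
    /// happens to be registered on-chain.
    pub fn run(&self, id: &ComputationId, inputs: &[u8]) -> Result<Vec<u8>, ResolveError> {
        let computation = self
            .computations
            .get(id)
            .ok_or(ResolveError::UnsupportedComputation)?;
        Ok(computation(inputs))
    }
}

/// Toy computation for the example: sums the input bytes.
pub fn sum_bytes(inputs: &[u8]) -> Vec<u8> {
    let total: u64 = inputs.iter().map(|&b| b as u64).sum();
    total.to_le_bytes().to_vec()
}
```

the UnsupportedComputation branch is the whole "not yet arbitrarily computable" caveat in one line.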
how the proving flow works
at a high level:
- prover consumes ProverJob from Redis
- resolves computation by computation_id
- runs the matching SP1 guest
- generates proof + public inputs + result bundle
- publishes ProverResponse back to Redis
public inputs are just the small set of values the chain is allowed to look at while checking the proof receipt.
the service is deliberately structured so the expensive proof generation runs in a blocking task rather than choking the async reactor, basically meaning the heavy work is pushed onto its own lane so it does not freeze the rest of the service.
it also logs periodic heartbeats while a long proof is still running. that sounds trivial, but it is not. once proofs get slow, silence is operational poison.
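the "own lane plus heartbeats" pattern is easy to sketch with nothing but the standard library. to be clear about the assumptions: this uses a plain OS thread and a channel as a stand-in for whatever async runtime and blocking-task API the real service uses, and run_with_heartbeat is a name i invented for the example.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Runs `work` on its own thread and logs a heartbeat every
/// `heartbeat_every` until it finishes. Returns the result together
/// with the number of heartbeats that were logged.
pub fn run_with_heartbeat<T, F>(work: F, heartbeat_every: Duration) -> (T, u32)
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    // The heavy work gets its own lane so the caller stays responsive.
    let worker = thread::spawn(move || {
        // Ignore the send error: it only happens if the receiver is gone.
        let _ = tx.send(work());
    });

    let mut heartbeats = 0u32;
    let result = loop {
        match rx.recv_timeout(heartbeat_every) {
            Ok(value) => break value,
            Err(RecvTimeoutError::Timeout) => {
                heartbeats += 1;
                eprintln!("proof still running (heartbeat {})", heartbeats);
            }
            // The sender was dropped without sending, so the work panicked.
            Err(RecvTimeoutError::Disconnected) => {
                panic!("worker exited without producing a result")
            }
        }
    };
    worker.join().expect("worker thread panicked");
    (result, heartbeats)
}
```

a call like run_with_heartbeat(|| expensive_proof(), Duration::from_secs(30)) (where expensive_proof is whatever your proving call is) gives you a log line every 30 seconds instead of silence.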
what computations exist today
there are really two categories in the repo right now:
- historical_avg, the strongest end-to-end slice
- a simpler fibonacci/demo proving path used for proof generation and verifier/artifact flows
that is enough to prove the architecture.
it is not yet a broad coprocessor catalog.
chapter 13: the callback worker, the part that makes the result usable
once the prover has done its job, the callback worker turns a proof artifact into an on-chain effect.
that means:
- consume ProverResponse
- construct callback instruction data
- pass callback accounts back in the expected shape
- submit transaction
then the program takes over:
- verify proof
- write result
- invoke the consumer callback program
- pay the prover
this is the piece that makes Sonar feel like a coprocessor rather than just a proof factory.
because the goal is not only to prove something.
the goal is to make that proof-backed result become actionable state for another program.
chapter 14: the operational surface, because repos do not become systems just like that
Sonar already ships more operational structure than many "alpha infra" repos do.
it has:
- docker-compose.prod.yml for Postgres, Redis, coordinator, prover, Prometheus, and Grafana
- scripts/deploy-devnet.sh for repeatable devnet deployment
- scripts/devnet-smoke-bench.sh for reproducible remote-devnet smoke + benchmark runs
- scripts/local-ci.sh plus .actrc for local GitHub Actions rehearsal
- CI, cargo audit, cargo deny, gitleaks, and Criterion benches
while all this does sound super cool, that does not mean it is production-ready.
what it means is that the repo already crossed the line from "interesting architecture sketch" into "someone is trying to run this like a system."
the current observability story is still baseline-only:
- metrics scraping exists
- Grafana service exists
- dashboards, alerts, tracing, SLOs, and runbooks do not exist yet
that is an important distinction.
baseline observability is not operational maturity. it is the seed of operational maturity.
part ii: the journey and lessons
skip ahead: if you only want the hardest pain points, jump to chapter 17. if you want the honest timeline, jump to chapter 24. if you want the builder playbook, jump to part iii.
chapter 15: what we genuinely did well
there are a few decisions in Sonar that i still think were exactly right.
1. we built a real vertical slice instead of an abstract platform first
this matters a lot.
it is very easy to spend months building a generic coprocessor framework that does nothing end to end.
Sonar chose to make one vertical slice real.
historical_avg is not a toy in the sense that it exercises:
- request submission
- log ingestion
- enrichment via an indexer API
- off-chain computation
- proof generation
- callback verification
- final on-chain result handling
that is exactly the right order to discover reality.
2. we separated correctness from liveness
this is what i believe to be one of the core architecture wins.
if the off-chain stack dies, Sonar should get stuck or refund. it should not silently produce incorrect final state.
that is the right failure model.
3. we made the developer surface real early
the SDK and CLI are not stubs.
that is huge.
because protocols with no ergonomic integration path are just chores disguised as infra.
4. we hardened failure paths instead of pretending the happy path was enough
recent work added:
- failure-path tests
- prover startup preflights
- benchmark honesty improvements
- wallet balance preflights for devnet runs
- explicit dead-letter handling for failed callback responses
this is exactly the kind of work that makes a prototype less glamorous and much more real.
chapter 16: what was standard, boring, and absolutely fine
not every part of a good system needs to be clever.
in fact, i would argue most parts should not be.
Sonar uses a lot of very standard machinery:
- Redis queues
- Axum HTTP APIs
- Postgres for indexed data
- Prometheus/Grafana baseline
- Anchor for the on-chain program
- simple service decomposition across coordinator/prover/indexer
and that is good.
the cryptography is already hard enough. there is no prize for making your queue architecture mysterious too.
if anything, one lesson from this repo is that standard boring plumbing is a superpower when the difficult parts of your project are already difficult.
chapter 17: where it got painful
this is the part where the project stopped being a nice architecture diagram and became a real system.
the deepest pain point was the proving operating model.
not the abstract proving model.
the actual, physical, annoying, machine-level one.
the laptop problem
full local CPU SP1 Groth16 proving is heavy.
really heavy.
that means on normal laptops you run into:
- memory pressure
- swap dependence
- long opaque waits
- proof startup costs that dominate short benchmark runs
so we hardened the CPU path with things like:
- lower-memory defaults
- worker-count limits
- proving-key caching
- fail-fast memory headroom checks
that helped. but it did not make physics go away.
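to make the "fail-fast memory headroom check" idea concrete, here is a minimal sketch. it is not Sonar's actual code: the function names and the threshold parameter are made up for illustration, and it assumes a Linux host where /proc/meminfo reports MemAvailable in kB.

```rust
/// Pull `MemAvailable` (reported in kB) out of the text of Linux's /proc/meminfo.
fn mem_available_kib(meminfo: &str) -> Option<u64> {
    meminfo
        .lines()
        .find_map(|line| line.strip_prefix("MemAvailable:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|n| n.parse().ok())
}

/// Refuse to start proving when the host is below the required headroom.
/// `required_kib` is an operator-chosen number (assumption), not an SP1 constant.
fn check_headroom(meminfo: &str, required_kib: u64) -> Result<u64, String> {
    let available = mem_available_kib(meminfo)
        .ok_or_else(|| "MemAvailable not found in meminfo".to_string())?;
    if available < required_kib {
        return Err(format!(
            "only {} KiB available, need at least {} KiB; refusing to start the prover",
            available, required_kib
        ));
    }
    Ok(available)
}
```

in a real service you would feed this the contents of /proc/meminfo at startup and exit with the error message instead of letting the machine swap itself to death twenty minutes later.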
the cache problem
then there was the beautiful little nightmare where the local SP1 Groth16 artifact cache looked like it existed, but was actually incomplete.
the directory contained only a truncated artifacts.tar.gz.
so the benchmark would wait and wait and wait, because from the outside it looked like "the prover is still running," when in reality the callback was never going to arrive.
that ended in a startup preflight too:
- if the cache directory exists but the extracted Groth16 artifacts are missing, fail immediately
again: truth before optimism.
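a rough sketch of that preflight, using only the standard library. the function name is hypothetical, and the caller passes in the expected file names because the real extracted artifact names depend on the SP1 release and i am not going to guess them here.

```rust
use std::fs;
use std::path::Path;

/// Fail fast when a cache directory exists but the extracted files do not.
/// `expected` holds the file names the prover needs (supplied by the caller,
/// since the real names are an assumption of this sketch).
fn check_artifact_cache(dir: &Path, expected: &[&str]) -> Result<(), String> {
    if !dir.exists() {
        // No cache at all is treated as fine here: a first run can fetch it.
        return Ok(());
    }
    let missing: Vec<&str> = expected
        .iter()
        .copied()
        .filter(|name| match fs::metadata(dir.join(*name)) {
            // "Present" only counts if it is a non-empty regular file.
            Ok(meta) => !(meta.is_file() && meta.len() > 0),
            Err(_) => true,
        })
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "cache at {} is incomplete, missing: {}",
            dir.display(),
            missing.join(", ")
        ))
    }
}
```

the important property is that a directory holding only a leftover artifacts.tar.gz fails this check, because the tarball is not one of the files the prover actually reads.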
the CUDA detour
then came the obvious thought.
fine. if CPU proving is miserable on a laptop, take CPU out and use the GPU.
this is where infrastructure teaches humility.
first blocker:
- the cached sp1-gpu-server binary was linked against libcudart.so.12
- the host exposed CUDA 13 userspace
- so the server would not even start
we fixed that by rebuilding sp1-gpu-server@6.0.2 locally so it linked against the installed libcudart.so.13.
for a minute, it looked like we had finally won.
then the real blocker appeared.
SP1's full CUDA prover on this path requires at least 24 GiB VRAM.
my laptop GPU here reports about 8 GiB.
so the final truth was:
- CUDA 12 mismatch: solvable
- laptop GPU size: not solvable by cleverness
that was a genuinely useful result.
because it turned a fuzzy performance problem into a crisp architectural fact:
full local CUDA proving is not the path on this machine.
and once you know that, you can stop lying to yourself about what the next move should be.
chapter 18: what we could have done better
there are a few things i would absolutely do earlier if starting over.
1. decide the proving operating model earlier
we had the protocol story before we had a fully honest answer to:
"where do the expensive proofs actually run in a way that is repeatable for operators?"
that question should have been elevated earlier.
because it shapes benchmarking, devnet workflows, local DX, and eventually production topology.
2. formalize operations earlier
the repo has baseline observability and automation, which is good.
but the missing pieces are the boring adult ones:
- dashboards
- alerts
- runbooks
- recovery procedures
- staging promotion model
those are the things that let you trust a system when you are crunched for time, not just at demo time.
3. package the indexer operating model more explicitly
right now the prod-oriented Compose topology assumes an external indexer service.
that is a perfectly legitimate interim decision.
it is still an incomplete operator story.
4. broaden computations later, not sooner
this one is less regret and more reaffirmation.
the temptation to add many computations early is strong.
but honestly, if we had done that, i think we would have had a broader but less honest system.
the right call was depth first.
still, the cost of that decision is obvious: the product surface remains narrow today.
chapter 19: why the project is paused right here
the important thing is this:
Sonar is not paused because the core architecture failed.
it is paused because the next unanswered question is a bigger one than another patch or another unit test.
that question is:
what is the real proving deployment model?
the current facts are:
- the request/callback architecture works
- the vertical slice works
- the devnet-grade prototype is real
- full local laptop CPU proving is still painful
- full local CUDA proving is not viable on this laptop due to the VRAM floor
so the likely next paths are:
- keep the hardened CPU lane for local development only
- evaluate narrower acceleration paths like groth16-cuda
- move full proving to a remote machine with enough RAM and at least 24 GiB VRAM
that is not a bad place to pause.
it is actually a very honest one.
because from here onward, the work is less about inventing the core system and more about choosing the right operator model for it.
skip ahead: if you mostly want the distilled lessons and the honest accounting, start with chapter 20, chapter 24, or chapter 25.
chapter 20: the hardest part was not the zero-knowledge part
this is probably the most important lesson in the entire repo.
the hardest part was not the algebra.
it was not "how do i conceptually verify a proof on-chain?"
the hardest part was the actual engineering around it:
- how requests get modeled
- how off-chain work gets discovered
- where enrichment data comes from
- how to fail safely
- how to benchmark honestly
- how to make local development not completely miserable
- how to separate product truth from operator wishful thinking
the proof system is the sharpest-looking part.
but to be very honest, it is not the whole iceberg.
chapter 21: one real slice beats ten speculative ones
Sonar is not broad yet.
that is true.
but what it has is one real slice and the infrastructure around it.
that is far more valuable than ten hypothetical integrations with no actual proving or callback pipeline behind them.
if you are building infra, depth is not the enemy of platform vision. depth is what stops platform vision from being a fantasy.
chapter 22: standard engineering discipline matters more than aesthetic novelty
CI matters.
benchmarks matter.
deploy scripts matter.
dead-letter handling matters.
wallet balance preflights matter.
artifact cache validation matters.
the part of infra work that looks least glamorous from outside is usually the part that distinguishes a repo that survives contact with reality from one that only survives screenshots.
chapter 23: the current repo is substantial, but not done
this is the honest inventory.
Sonar today is:
- a serious prototype
- a hardened devnet-grade system
- a real vertical slice
- a coherent architecture
it is not yet:
- a production-ready proving platform
- a mature governance system
- a polished operator product
- a broad computation marketplace
the remaining work is mostly not about new primitives.
it is about productionization.
that means:
- proving strategy
- operational maturity
- verifier governance
- economics and backpressure
- recovery/runbooks/alerts
- more computations later
which, honestly, is exactly where a good prototype should be.
chapter 24: how long this actually took
one thing that blogs like this often hide is the clock.
so here is the honest version.
if i go by the git history, the first commit in this repo lands on 27-03-2026.
from there, the core system came together very quickly:
- 27-03-2026 to 31-03-2026: repo scaffold, docs, CI, Anchor setup, prover service foundations, coordinator listener, indexer skeleton, and the first historical average path
- 05-04-2026: a huge amount of the serious protocol surface snaps into place, including on-chain verifier registry work, registry-backed proof verification, the CPI helper, verifier artifact tooling, and the registration CLI
- 11-04-2026 to 19-04-2026: production-ish compose setup, devnet deployment automation, callback-flow hardening, and the reproducible devnet smoke benchmark flow
- 22-04-2026 to 23-04-2026: the angry realism phase, where proving ergonomics, long waits, cache issues, benchmark honesty, low-memory CPU behavior, and unsupported CUDA hosts all get fail-fast handling
so if i am being fair, the answer is roughly this:
- about 3 weeks to get from empty repo to a real end-to-end system with a legitimate vertical slice
- just under 4 weeks to get to the hardened devnet-grade shape this post is describing
that is the build timeline.
the debugging timeline is funnier.
the concentrated "why is this prover being so cursed" spiral was basically 2 days of focused work, centered around the 22-04-2026 to 23-04-2026 commit cluster.
and the annoying punchline is that this was not purely a software bug hunt.
some of it really was software:
- broken runtime expectations
- incomplete Groth16 cache state
- benchmarks waiting too long before admitting a run was dead
but after fixing the software-side traps, the final wall was hardware reality:
- local CPU proving on a normal laptop is rough on memory
- the rebuilt CUDA path could launch, but the laptop GPU still only had about 8 GiB VRAM
- the full SP1 CUDA proving path wanted 24 GiB VRAM
so yes, we spent about two days debugging something that ended with a very physical answer: the machine was simply below the comfortable proving floor for the path we were trying to run.
honestly, i am glad that happened.
because it forced the repo to become more truthful.
chapter 25: where i think the next real progress comes from
if i had to prioritize the next moves, i would do them in this order.
1. settle the proving execution model
be explicit.
is full proving:
- local CPU for development only?
- remote GPU in the real path?
- hybridized somehow?
make that answer architectural, not emotional.
2. productionize operations
- dashboards
- alerts
- runbooks
- recovery
- staging path
this is where systems become credible.
3. formalize governance and economics
- authority rotation
- verifier lifecycle
- artifact provenance
- fee policy
- admission control
that is what turns "working" into "safe to rely on."
4. broaden computations after the operator path is stable
not before.
part iii: how to build your own
skip ahead: if you only want the trust-boundary design, jump to chapter 27. if you only want proving and ops gotchas, jump to chapter 31. if you only want benchmarking advice, jump to chapter 33.
chapter 26: start with one computation, not a platform manifesto
if you want to build a zk coprocessor, do not start with:
"i am building a generalized decentralized proving marketplace for arbitrary off-chain compute."
that sentence is usually a sign that you are about to write an architecture doc instead of a system.
start with one computation that forces the architecture to be real.
ideally it should require at least one of these:
- more compute than is comfortable on-chain
- historical or indexed data access
- proof-backed correctness
- callback semantics that another program can use
for Sonar, that was historical_avg.
that single slice forced us to build:
- request state
- result state
- verifier registry
- coordinator ingestion
- enrichment API
- prover
- callback worker
- SDK integration
that is exactly what you want.
pick one computation that drags the whole architecture into the real world.
chapter 27: define the trust boundary before writing the queue
this is probably the single most important design step.
before you choose Redis or Postgres or Kafka or NATS or a homegrown worker framework, answer these questions:
- what makes a result correct?
- what only affects liveness?
- what can fail without corrupting chain state?
- what must be represented explicitly on-chain?
if you cannot answer those, you are not designing a zk coprocessor yet. you are just drawing boxes.
for Sonar, the answer became:
- correctness is on-chain verifier + proof path + account constraints
- liveness is everything else
that single choice simplified almost every later decision.
chapter 28: make refund a first-class citizen
seriously.
do this before you feel like doing it.
because if your system is asynchronous and involves off-chain services, failure is not hypothetical.
it is certain.
the only question is whether the user experience when it happens is graceful or embarrassing.
the basic pattern is:
- escrow fee at request time
- set deadline
- if callback never arrives, let payer reclaim funds
that turns off-chain failure from a protocol failure into a liveness failure with a recovery path.
huge difference.
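the escrow / deadline / reclaim pattern above can be sketched as a tiny state machine. this is plain Rust, not the Anchor program: the struct, its fields, and the rule that a callback is rejected after the deadline are all assumptions i picked to keep the example small.

```rust
#[derive(Debug, PartialEq)]
enum Status {
    Pending,
    Completed,
    Refunded,
}

/// Toy stand-in for an on-chain request account (field names are hypothetical).
struct Request {
    payer: u64,           // stand-in for the payer's public key
    escrow_lamports: u64, // fee locked at request time
    deadline_slot: u64,   // last slot at which a callback is accepted
    status: Status,
}

impl Request {
    fn new(payer: u64, fee: u64, current_slot: u64, ttl_slots: u64) -> Self {
        Request {
            payer,
            escrow_lamports: fee,
            deadline_slot: current_slot + ttl_slots,
            status: Status::Pending,
        }
    }

    /// Callback path: releases the escrow to the prover, only before the deadline.
    fn complete(&mut self, current_slot: u64) -> Result<u64, &'static str> {
        if self.status != Status::Pending {
            return Err("request is not pending");
        }
        if current_slot > self.deadline_slot {
            return Err("deadline passed");
        }
        self.status = Status::Completed;
        Ok(std::mem::take(&mut self.escrow_lamports))
    }

    /// Refund path: only the payer, only after the deadline, only once.
    fn refund(&mut self, caller: u64, current_slot: u64) -> Result<u64, &'static str> {
        if caller != self.payer {
            return Err("only the payer can reclaim");
        }
        if self.status != Status::Pending {
            return Err("request is not pending");
        }
        if current_slot <= self.deadline_slot {
            return Err("deadline not reached");
        }
        self.status = Status::Refunded;
        Ok(std::mem::take(&mut self.escrow_lamports))
    }
}
```

the two paths are mutually exclusive by construction: whichever one moves the request out of Pending first empties the escrow, so the fee can never be paid to the prover and refunded to the payer at the same time.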
chapter 29: logs are not an implementation detail, they are your bridge
in systems like this, event emission is basically the membrane between on-chain and off-chain worlds.
make those logs structured, deterministic, and most importantly, easy to parse.
Sonar emits log lines for:
- request ID
- inputs
- callback account metadata
that gives the coordinator everything it needs to reconstruct work without scraping the entire transaction in a brittle way.
if you are building your own system, do not bury the off-chain trigger inside vague human-readable logs meant only for debugging.
make them part of the protocol.
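here is what "logs as protocol" can look like on the parsing side. the line format below (a sonar:request tag followed by key=value pairs) is invented for this sketch and is not Sonar's real wire format; the only thing taken from Solana itself is the "Program log: " prefix the runtime puts in front of program log output.

```rust
#[derive(Debug, PartialEq)]
struct RequestEvent {
    request_id: String,
    inputs_hex: String,
    callback_program: String,
}

/// Hypothetical protocol tag; the real Sonar log format may differ.
const TAG: &str = "sonar:request ";

fn parse_request_log(line: &str) -> Option<RequestEvent> {
    // Accept the line with or without the runtime's "Program log: " prefix.
    let line = line.strip_prefix("Program log: ").unwrap_or(line);
    // Anything without the tag is ordinary debug output, not a protocol event.
    let body = line.strip_prefix(TAG)?;

    let (mut id, mut inputs, mut callback) = (None, None, None);
    for pair in body.split_whitespace() {
        let (key, value) = pair.split_once('=')?;
        match key {
            "id" => id = Some(value.to_string()),
            "inputs" => inputs = Some(value.to_string()),
            "callback" => callback = Some(value.to_string()),
            _ => {} // unknown keys are ignored so the format can grow
        }
    }
    // Every required field must be present, otherwise the line is rejected.
    Some(RequestEvent {
        request_id: id?,
        inputs_hex: inputs?,
        callback_program: callback?,
    })
}
```

the parser either returns a complete event or nothing. there is no "best effort" branch, which is the whole point of treating the log line as a contract instead of a debugging aid.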
chapter 30: treat enrichment as a deliberate subsystem
many useful computations need data the chain does not expose conveniently inside a program.
that means you will probably need some form of indexer.
once that is true, acknowledge it.
build the narrowest API that solves the problem.
in Sonar, the indexer surface is intentionally tiny:
- account-history lookups only
that is good.
do not build a giant generalized data warehouse API just because you can.
the more surface area you add, the more trust, ops, caching, and freshness guarantees you now own.
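as an illustration of how small "account-history lookups only" can be, here is a sketch with a single read-only trait method and an in-memory implementation. none of these names come from the Sonar repo, and the value field is a placeholder for whatever the computation actually consumes.

```rust
use std::collections::{BTreeMap, HashMap};

/// One indexed observation of an account (`value` is a placeholder field).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Observation {
    slot: u64,
    value: u64,
}

/// The whole indexer surface in this sketch: one read-only query.
trait AccountHistory {
    fn history(&self, account: &str, from_slot: u64, to_slot: u64) -> Vec<Observation>;
}

#[derive(Default)]
struct InMemoryIndex {
    by_account: HashMap<String, BTreeMap<u64, u64>>,
}

impl InMemoryIndex {
    /// Ingestion side, kept separate from the query trait on purpose.
    fn record(&mut self, account: &str, slot: u64, value: u64) {
        self.by_account
            .entry(account.to_string())
            .or_default()
            .insert(slot, value);
    }
}

impl AccountHistory for InMemoryIndex {
    fn history(&self, account: &str, from_slot: u64, to_slot: u64) -> Vec<Observation> {
        if from_slot > to_slot {
            return Vec::new(); // empty window instead of a panic on a bad range
        }
        match self.by_account.get(account) {
            Some(slots) => slots
                .range(from_slot..=to_slot)
                .map(|(&slot, &value)| Observation { slot, value })
                .collect(),
            None => Vec::new(),
        }
    }
}
```

a prover that only depends on the trait can be tested against this in-memory version and pointed at a Postgres-backed one later, without the API growing a second method.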
chapter 31: keep the prover path brutally honest
this is where many systems lie.
not deliberately. just through wishful thinking.
examples of lies:
- "the proof is still running" when startup already failed
- "the cache exists" when it is actually corrupted
- "CUDA is supported" when the server binary cannot even launch
- "the GPU path is available" when the card does not meet the prover's minimum VRAM
the only sane response is aggressive startup preflight.
Sonar ended up with exactly that mindset.
if the environment is not viable, fail immediately and explain why.
that is better than waiting 5,000 seconds to discover the run was dead on arrival (this happened to me btw).
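one way to structure that is to run every check up front and report all failures together. the sketch below is an assumption about shape, not a copy of Sonar's preflight: check_vram and run_preflight are hypothetical names, and the 8 and 24 GiB figures in the usage note are just the numbers from my laptop story.

```rust
/// Compare the VRAM a GPU reports against the prover's stated minimum.
fn check_vram(reported_gib: u64, required_gib: u64) -> Result<(), String> {
    if reported_gib < required_gib {
        Err(format!(
            "GPU reports {} GiB VRAM, prover needs {} GiB",
            reported_gib, required_gib
        ))
    } else {
        Ok(())
    }
}

/// Take every named check result and report all failures at once, so an
/// operator fixes the environment in one pass instead of one crash at a time.
fn run_preflight(checks: Vec<(&str, Result<(), String>)>) -> Result<(), Vec<String>> {
    let failures: Vec<String> = checks
        .into_iter()
        .filter_map(|(name, outcome)| {
            outcome.err().map(|why| format!("{}: {}", name, why))
        })
        .collect();
    if failures.is_empty() {
        Ok(())
    } else {
        Err(failures)
    }
}
```

calling run_preflight(vec![("gpu", check_vram(8, 24))]) comes back with a single failure line that says exactly why the run is dead, which is a lot cheaper than learning it from a timeout.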
chapter 32: build layered validation, not one giant heroic test
the right test stack for systems like this is layered.
you want:
- fast unit tests
- deterministic fixture-backed integration paths
- one strong end-to-end slice
- explicit heavyweight proof smoke tests, not default-every-time torture
Sonar eventually settled into something like:
- cheap unit coverage
- Anchor integration coverage
- deterministic fixture-backed e2e for local stack behavior
- opt-in expensive SP1 smoke
- devnet benchmark wrappers for real-world runs
that is the right idea.
because the fastest way to destroy iteration speed is to make every small correctness check depend on the slowest and flakiest part of your proving pipeline.
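in Rust, the cheapest way to get the "opt-in expensive" layer is the standard #[ignore] attribute. the sketch below is illustrative only: average is a made-up stand-in for the cheap logic, and the smoke test body is a placeholder where a real prover call would go.

```rust
/// Cheap, pure logic that the fast layer covers on every run.
fn average(values: &[u64]) -> Option<u64> {
    if values.is_empty() {
        return None;
    }
    // Sum in u128 so a long history of large values cannot overflow.
    let sum: u128 = values.iter().map(|&v| v as u128).sum();
    Some((sum / values.len() as u128) as u64)
}

#[cfg(test)]
mod tests {
    use super::*;

    // Fast layer: runs on every `cargo test`.
    #[test]
    fn average_of_fixture() {
        assert_eq!(average(&[10, 20, 30]), Some(20));
        assert_eq!(average(&[]), None);
    }

    // Heavy layer: skipped by default, run with `cargo test -- --ignored`.
    #[test]
    #[ignore = "generates a full proof; opt in explicitly"]
    fn full_proof_smoke() {
        // Placeholder: a real smoke test would invoke the prover here.
    }
}
```

plain cargo test stays fast, and the heavy test only runs when someone asks for it with cargo test -- --ignored, typically on a machine that can actually afford it.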
chapter 33: do not confuse benchmark activity with benchmark truth
one of the nastier infra traps is a benchmark that appears busy and therefore feels meaningful.
in Sonar, the real lesson was:
request submitted is not the same as callback completed.
the only benchmark result that matters for a coprocessor is one tied to actual completion of the lifecycle you care about.
that means:
- proof generation succeeded
- callback landed
- result account changed as expected
- cleanup happened if appropriate
anything less is just animated logging.
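a small sketch of what "tied to actual completion" can mean in a benchmark harness. the struct and its fields are hypothetical, and the cleanup condition is left out to keep it short; the idea is only that a run contributes a latency number if every stage of the lifecycle was observed.

```rust
/// What the harness observed for one request (fields are illustrative).
#[derive(Debug, Clone, Copy)]
struct RunObservation {
    proof_generated: bool,
    callback_landed: bool,
    result_matches: bool,
    elapsed_ms: u64,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Completed { latency_ms: u64 },
    Incomplete(&'static str),
}

/// A run only counts if the whole lifecycle finished, in order.
fn classify(run: &RunObservation) -> Outcome {
    if !run.proof_generated {
        return Outcome::Incomplete("proof was never generated");
    }
    if !run.callback_landed {
        return Outcome::Incomplete("callback never landed");
    }
    if !run.result_matches {
        return Outcome::Incomplete("result account did not change as expected");
    }
    Outcome::Completed { latency_ms: run.elapsed_ms }
}

/// Only completed lifecycles contribute to the reported latency numbers.
fn completed_latencies(runs: &[RunObservation]) -> Vec<u64> {
    runs.iter()
        .filter_map(|run| match classify(run) {
            Outcome::Completed { latency_ms } => Some(latency_ms),
            Outcome::Incomplete(_) => None,
        })
        .collect()
}
```

a report built this way should show both numbers side by side: how many requests were submitted and how many actually completed. the gap between them is usually the most interesting line in the benchmark.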
epilogue: if you want to understand zk coprocessors, build one
not necessarily this one.
not necessarily with SP1.
not necessarily on Solana even.
but build one.
because once you do, a bunch of abstract phrases stop being abstract.
"off-chain execution with on-chain verification" stops sounding profound and starts sounding like a very practical split of responsibilities.
"proof generation pipeline" stops sounding like a single component and starts sounding like cache validation, hardware constraints, process orchestration, and startup honesty.
"developer platform" stops sounding like a lofty goal and starts sounding like a CPI helper, a CLI, a deploy script, an indexer API, and a callback that actually lands.
that is what Sonar gave me.
not just a repo. not just a prototype. not just a set of passing tests.
it gave me a much sharper model of where blockchains stop, where coprocessors begin, and how much of the real work lives in the glue between them.
and i think that is the right note to end on.
because the interesting thing about zk coprocessors is not that they make blockchains magical.
it is that they let blockchains stay small, strict, and honest while still answering bigger questions.
that, to me, is the whole point.
Sonar is paused, not dead. if you're an infra engineer with a GPU-rich environment and want to help, the repo is open. otherwise, i'll be back when the proving model is settled.
if you made it this far, thank you for reading.
repo: github.com/bit2swaz/sonar
one line to end on: the proof only matters if the system around it is honest too.