v0.3.3 · audit-driven · public ledger

Ship LLM code without inheriting LLM failure modes.

Tribunal is an adversarial code-review methodology with on-chain reputation. Three reviewers attack from non-overlapping lenses, one adversary attacks the blind spots those reviewers share, and every finding is signed by an identifiable agent whose history is publicly settled on Burnt XION.

The unit of trust is not consensus — it is surviving adversarial scrutiny by identified agents whose history is on the public record.

— Tribunal methodology

Self-audit, 2026-05-13

2

self-audit cycles run; both surfaced a Critical the adversary alone caught.

findings filed across the two audits, all settled publicly on xion-testnet-2.

<15min

wall time per full review pass: trio + adversary + on-chain settlement.

F-NEW-403

adversary's meta-finding that motivated the convergence methodology extension.

Four mechanical guarantees.

Each guarantee is enforced by a different layer of the system. They compose: remove any one and the other three degrade. Together they cover the failure modes that neither cooperative nor adversarial review can reach alone.

01 — Process backbone

Spec → plan → review → verify, every change

A state machine the agents can't skip. Specify, clarify, plan, tasks, implement, review, verify. Role separation, signed gates, append-only history. Drift gets caught at the boundary, not in production.
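
As a rough illustration of a state machine that cannot be skipped, the sketch below walks the seven stages named above in strict order and appends each passed gate to a history list. The `Pipeline` class, the `advance` method, and the plain-string `signer` are hypothetical names chosen for this example; real cryptographic signing is not modeled, and this is not Tribunal's actual implementation.

```python
from dataclasses import dataclass, field

# Stage order taken from the text above; everything else is a hypothetical sketch.
STAGES = ["specify", "clarify", "plan", "tasks", "implement", "review", "verify"]


@dataclass
class Pipeline:
    """Linear state machine: stages can only be entered in order."""

    position: int = -1  # -1 means no stage has been entered yet
    history: list = field(default_factory=list)  # log of (stage, signer) gates

    def advance(self, stage: str, signer: str) -> None:
        """Enter the next stage, or raise if the caller tries to skip ahead."""
        nxt = self.position + 1
        expected = STAGES[nxt] if nxt < len(STAGES) else None
        if stage != expected:
            raise ValueError(f"cannot enter {stage!r}; next allowed stage is {expected!r}")
        if not signer:
            raise ValueError("every gate needs an identified signer")
        self.position = nxt
        self.history.append((stage, signer))  # only ever appended to, never rewritten
```

Calling `advance("implement", ...)` before `plan` and `tasks` have passed raises a `ValueError`, which is the "caught at the boundary" behavior in miniature.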

02 — Lens-parallel trio

Three reviewers with non-overlapping mandates

Architecture, security, and performance — dispatched in parallel, each filing signed findings at calibrated severity. Convergence on a defect from independent lenses is a stronger signal than any single reviewer's verdict.
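
The parallel dispatch and the convergence signal could look roughly like the following. It assumes each reviewer is a callable that takes a diff and returns `(location, severity)` pairs; that shape, and the helper names `dispatch_trio` and `converging_locations`, are illustrative assumptions rather than the project's real interfaces.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def dispatch_trio(diff, reviewers):
    """Run every lens on the same diff in parallel; return {lens: findings}."""
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {lens: pool.submit(fn, diff) for lens, fn in reviewers.items()}
        return {lens: fut.result() for lens, fut in futures.items()}


def converging_locations(reports):
    """Return locations flagged independently by more than one lens."""
    lenses_by_location = defaultdict(set)
    for lens, findings in reports.items():
        for location, _severity in findings:
            lenses_by_location[location].add(lens)
    return {
        location: sorted(lenses)
        for location, lenses in lenses_by_location.items()
        if len(lenses) > 1
    }
```

A location that appears in `converging_locations` was reached from at least two independent mandates, which is the stronger signal the paragraph above describes.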

03 — Adversary gate

Find what the trio missed

After trio approval, one adversary attacks the same diff with the trio's reports in hand. The job: surface the blind spots that every cooperatively trained lens shares. The adversary runs multi-model when the stakes warrant.

04 — On-chain settlement

Reputation is the slow signal that makes the loop learn

Every finding is signed and recorded. PMs resolve outcomes — true positive, false positive, stale. The contract on Burnt XION settles reputation per agent. Noisy agents lose weight. Useful agents auto-elevate. The system gets sharper over time.
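
To make the reputation loop concrete, here is a minimal off-chain sketch of how resolved outcomes might move per-agent scores. The three outcome names come from the text; the numeric deltas, the elevation threshold, and the function names are assumptions invented for this example and say nothing about the values the contract actually uses.

```python
# Hypothetical deltas: the source names the outcomes but not the numbers.
DELTAS = {"true_positive": 2, "false_positive": -3, "stale": 0}
ELEVATION_THRESHOLD = 10  # assumed cutoff for auto-elevation


def settle(reputation, resolutions):
    """Apply (agent, outcome) resolutions and return a new score table."""
    updated = dict(reputation)  # leave the caller's table untouched
    for agent, outcome in resolutions:
        updated[agent] = updated.get(agent, 0) + DELTAS[outcome]
    return updated


def elevated(reputation):
    """Agents whose score has reached the assumed elevation threshold."""
    return sorted(agent for agent, score in reputation.items()
                  if score >= ELEVATION_THRESHOLD)
```

Under these assumed numbers, an agent that files false positives loses weight faster than a single true positive can restore it, which is one way to penalize noise.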

A single review pass.

What a non-trivial change actually goes through, from "I want to ship this" to "the chain has settled the reputation impact."

01
Intent + plan locked

PM authors intent.md and plan.md. No coding starts until both pass spec gates. Locked artifacts become the contract every reviewer audits against.
02
Lens trio dispatched in parallel

Architecture, security, performance. Each reads diff + intent + plan, files signed findings at Critical / Warning / Suggestion. Severity ladder is absolute — any unresolved Critical or Warning blocks approval.
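
The absolute severity ladder reduces to a small predicate. The sketch below assumes a hypothetical `Finding` record with a `resolved` flag; only the three severity names and the blocking rule are taken from the text.

```python
from dataclasses import dataclass

# Per the text: any unresolved Critical or Warning blocks approval.
BLOCKING = {"Critical", "Warning"}


@dataclass(frozen=True)
class Finding:
    id: str           # hypothetical identifier format
    severity: str     # "Critical", "Warning", or "Suggestion"
    resolved: bool = False


def blocking_findings(findings):
    """IDs of findings that still block approval."""
    return [f.id for f in findings if f.severity in BLOCKING and not f.resolved]


def approved(findings):
    """Approval is possible only when nothing blocking remains."""
    return not blocking_findings(findings)
```

Suggestions never block, and a resolved Critical or Warning stops blocking; there is no weighting or voting step in between.
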
03
Adversary attacks the trio's approval

Reads all three reviewer reports verbatim plus the diff. Hunts for shared blind spots. Files its own signed findings. Verdict: concur / escalate / downgrade.
04
Verification pyramid

Tool-level proof. Halt-on-failure layers: build → fmt → vet → test → fuzz. Pyramid green is necessary, not sufficient — Critical correctness defects routinely survive a green pyramid.
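
A halt-on-failure pyramid can be sketched as an ordered loop that stops at the first failing layer. The layer names follow the text; representing each layer as a zero-argument callable is an assumption made so the example stays self-contained, since the actual commands behind each layer are not given here.

```python
# Layer order from the text: build, fmt, vet, test, fuzz.
LAYER_ORDER = ["build", "fmt", "vet", "test", "fuzz"]


def run_pyramid(checks):
    """Run layers in order and halt at the first failure.

    `checks` maps each layer name to a zero-argument callable that
    returns True on success. Returns (passed_layers, failed_layer),
    where failed_layer is None when the whole pyramid is green.
    """
    passed = []
    for layer in LAYER_ORDER:
        if not checks[layer]():
            return passed, layer  # later layers are never run
        passed.append(layer)
    return passed, None
```

A `(all five layers, None)` result only means the tools found nothing; as noted above, a green pyramid is necessary but not sufficient.
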
05
PM writes resolutions

Per-finding: true_positive, false_positive, or stale. Signed by the PM keypair. Drives the reputation impact for the filing agent.
06
Chain settlement

commit_finding_batch + resolve_finding_batch land on Burnt XION. Reputation updates per agent. Auditable forever — every finding's stake, evidence hash, and outcome publicly verifiable.
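
The sketch below shows one plausible shape for the two batches, with evidence reduced to a SHA-256 digest so the chain stores a verifiable hash rather than the evidence itself. Only the names `commit_finding_batch` and `resolve_finding_batch` and the stake / evidence hash / outcome fields come from the text; the message layout, the field names, and the choice of SHA-256 are assumptions, and the real contract schema may differ.

```python
import hashlib


def evidence_hash(evidence: str) -> str:
    """Hex SHA-256 digest of the evidence text (hash choice is an assumption)."""
    return hashlib.sha256(evidence.encode("utf-8")).hexdigest()


def commit_msg(findings):
    """Build a hypothetical commit_finding_batch payload from finding dicts."""
    return {
        "commit_finding_batch": {
            "findings": [
                {
                    "id": f["id"],
                    "agent": f["agent"],
                    "stake": f["stake"],
                    "evidence_hash": evidence_hash(f["evidence"]),
                }
                for f in findings
            ]
        }
    }


def resolve_msg(outcomes):
    """Build a hypothetical resolve_finding_batch payload from {id: outcome}."""
    return {
        "resolve_finding_batch": {
            "resolutions": [
                {"id": finding_id, "outcome": outcome}
                for finding_id, outcome in outcomes.items()
            ]
        }
    }
```

Anyone holding the original evidence can recompute the digest and compare it with the committed `evidence_hash`, which is what makes the record checkable after the fact.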

Two self-audits in. Does the methodology converge?

v0.3.2 was audited by Tribunal; the adversary caught a Critical every cooperative lens missed. v0.3.3 shipped the fixes — and Tribunal then audited v0.3.3. The adversary's verdict on its own methodology: not converging yet, here's the architectural pivot. The audit is the motivating evidence for the convergence design.

