Spec → plan → review → verify, every change
A state machine the agents can't skip. Specify, clarify, plan, tasks, implement, review, verify. Role separation, signed gates, append-only history. Drift gets caught at the boundary, not in production.
v0.5.8 · required reproducibility · temporal lens · public ledger
Tribunal is an adversarial code-review methodology with on-chain reputation. Three reviewers attack from non-overlapping lenses, one adversary attacks what they share, and every finding is signed by an identifiable agent whose history is publicly settled on Burnt XION.
The unit of trust is not consensus — it is surviving adversarial scrutiny by identified agents whose history is on the public record.
Self-audit, 2026-05-13
Each guarantee is enforced by a different layer of the system. They compose: missing any one and the other three degrade. Together they cover the failure modes neither cooperation nor adversarial review alone can reach.
A state machine the agents can't skip. Specify, clarify, plan, tasks, implement, review, verify. Role separation, signed gates, append-only history. Drift gets caught at the boundary, not in production.
Architecture, security, and performance by default — each filing signed findings at calibrated severity. A fourth, opt-in temporal lens (v0.5) reviews systems whose central claim is longitudinal: memory, identity, accumulation, drift. Opt in via intent.md when the load-bearing property only emerges over many cycles.
After lens approval, one adversary attacks the same diff with all reviewer reports in hand. The job: surface what every cooperative-trained lens shares as a blind spot. Default panel is three Claude tiers (Opus + Sonnet + Haiku); intra-Claude diversity is the empirically validated default.
Every finding is signed and recorded. PMs resolve outcomes — true positive, false positive, stale. The contract on Burnt XION settles reputation per agent. Noisy agents lose weight. Useful agents auto-elevate. The system gets sharper over time.
What a non-trivial change actually goes through, from "I want to ship this" to "the chain has settled the reputation impact."
PM authors intent.md and plan.md. No coding starts until both pass spec gates. Locked artifacts become the contract every reviewer audits against.
Architecture, security, performance. Each reads diff + intent + plan, files signed findings at Critical / Warning / Suggestion. Severity ladder is absolute — any unresolved Critical or Warning blocks approval.
Reads all three reviewer reports verbatim plus the diff. Hunts for shared blind spots. Files its own signed findings. Verdict: concur / escalate / downgrade.
Tool-level proof. Halt-on-failure layers: build → fmt → vet → test → fuzz. Pyramid green is necessary, not sufficient — Critical correctness defects routinely survive a green pyramid.
Per-finding: true_positive, false_positive, or stale. Signed by the PM keypair. Drives the reputation impact for the filing agent.
commit_finding_batch + resolve_finding_batch land on Burnt XION. Reputation updates per agent. Auditable forever — every finding's stake, evidence hash, and outcome publicly verifiable.
The three default lenses each review a component well. The seams between components — the integral of
small per-cycle edits that defines memory, identity, drift — go unreviewed. Auditing
session-essence made this concrete: 11 of the adversary's findings lived in the
seams. v0.5 adds an opt-in fourth lens for longitudinal systems, a tribunal history
primitive that gives the lens trajectory access, a trajectory.Property PBT scaffold
that turns lens findings into executable rapid tests, and a trajectory_id field
(v0.5.6) for findings that span many plans. v0.5.7 closes Rust-contract PBT coverage at 7 properties. v0.5.8 requires a reproducibility field on every finding (exploit path / trigger sequence / workload+numbers / manifesting cycle / PoC) so downstream maintainers can distinguish real threats from style violations.
Identify → read → enforce. The lens the methodology was missing.