v0.5.8 · required reproducibility · temporal lens · public ledger

Ship LLM code without inheriting LLM failure modes.

Tribunal is an adversarial code-review methodology with on-chain reputation. Three reviewers attack from non-overlapping lenses, one adversary attacks what they share, and every finding is signed by an identifiable agent whose history is publicly settled on Burnt XION.

The unit of trust is not consensus — it is surviving adversarial scrutiny by identified agents whose history is on the public record.

— Tribunal methodology

Self-audit, 2026-05-13

4
self-audit cycles run; the adversary stage caught a Critical the lens trio missed in every one.
v0.5.8
current release. Required-reproducibility patch — every finding now ships a lens-specific trigger path (exploit / call sequence / workload+numbers / manifesting cycle / PoC). Findings without it are downgraded to Suggestion.
<15min
wall time per full review pass: lens panel + adversary + on-chain settlement.
F-NEW-403
adversary's meta-finding that motivated the convergence methodology extension.

Four mechanical guarantees.

Each guarantee is enforced by a different layer of the system. They compose: missing any one and the other three degrade. Together they cover the failure modes neither cooperation nor adversarial review alone can reach.

01 — Process backbone

Spec → plan → review → verify, every change

A state machine the agents can't skip. Specify, clarify, plan, tasks, implement, review, verify. Role separation, signed gates, append-only history. Drift gets caught at the boundary, not in production.

02 — Lens-parallel reviewers

Non-overlapping mandates, dispatched in parallel

Architecture, security, and performance by default — each filing signed findings at calibrated severity. A fourth, opt-in temporal lens (v0.5) reviews systems whose central claim is longitudinal: memory, identity, accumulation, drift. Opt in via intent.md when the load-bearing property only emerges over many cycles.

03 — Adversary gate

Find what the trio missed

After lens approval, one adversary attacks the same diff with all reviewer reports in hand. The job: surface what every cooperative-trained lens shares as a blind spot. Default panel is three Claude tiers (Opus + Sonnet + Haiku); intra-Claude diversity is the empirically validated default.

04 — On-chain settlement

Reputation is the slow signal that makes the loop learn

Every finding is signed and recorded. PMs resolve outcomes — true positive, false positive, stale. The contract on Burnt XION settles reputation per agent. Noisy agents lose weight. Useful agents auto-elevate. The system gets sharper over time.

A single review pass.

What a non-trivial change actually goes through, from "I want to ship this" to "the chain has settled the reputation impact."

  1. 01

    Intent + plan locked

    PM authors intent.md and plan.md. No coding starts until both pass spec gates. Locked artifacts become the contract every reviewer audits against.

  2. 02

    Lens trio dispatched in parallel

    Architecture, security, performance. Each reads diff + intent + plan, files signed findings at Critical / Warning / Suggestion. Severity ladder is absolute — any unresolved Critical or Warning blocks approval.

  3. 03

    Adversary attacks the trio's approval

    Reads all three reviewer reports verbatim plus the diff. Hunts for shared blind spots. Files its own signed findings. Verdict: concur / escalate / downgrade.

  4. 04

    Verification pyramid

    Tool-level proof. Halt-on-failure layers: build → fmt → vet → test → fuzz. Pyramid green is necessary, not sufficient — Critical correctness defects routinely survive a green pyramid.

  5. 05

    PM writes resolutions

    Per-finding: true_positive, false_positive, or stale. Signed by the PM keypair. Drives the reputation impact for the filing agent.

  6. 06

    Chain settlement

    commit_finding_batch + resolve_finding_batch land on Burnt XION. Reputation updates per agent. Auditable forever — every finding's stake, evidence hash, and outcome publicly verifiable.

v0.5 closes the longitudinal gap.

The three default lenses each review a component well. The seams between components — the integral of small per-cycle edits that defines memory, identity, drift — go unreviewed. Auditing session-essence made this concrete: 11 of the adversary's findings lived in the seams. v0.5 adds an opt-in fourth lens for longitudinal systems, a tribunal history primitive that gives the lens trajectory access, a trajectory.Property PBT scaffold that turns lens findings into executable rapid tests, and a trajectory_id field (v0.5.6) for findings that span many plans. v0.5.7 closes Rust-contract PBT coverage at 7 properties. v0.5.8 requires a reproducibility field on every finding (exploit path / trigger sequence / workload+numbers / manifesting cycle / PoC) so downstream maintainers can distinguish real threats from style violations. Identify → read → enforce. The lens the methodology was missing.