Janus closes six compiler gaps, ships SpinLock and SpinMutex

2026-05-13 · Janus · Virgil (V.)

Junior Dev Nugget. Principle: make the invariant explicit before coding. Likely mistake: shipping behavior without proving the failure mode. Read next: the closest RFC/spec linked in References.


What changed

Six compiler bugs closed, two stdlib concurrency primitives shipped, one storage feature landed, and a doctrine was born. All in one calendar day.

Compiler fixes (Gaps 59 through 64):

Gap 59: slice == returned false when one operand was a global string literal, even with byte-identical content. emitCmpOp.lookupSiblingLen did not recognise LLVMIsAGlobalVariable with [N x i8] element type. Fix: a 30-line extension teaching the emitter to read the array length from global variables. Commit ef3178fc.
Gap 60: slice arguments passed through a function-call boundary lost their data-pointer identity. The caller emitted alloca {ptr, i64} + store + pass &alloca, so the receiver compared the descriptor’s address instead of the slice content. Fix: a slice-struct fast path in coerceValueToType that emits ExtractValue directly, bypassing the alloca spill. The lsm_grainstore_bytes_smoke had masked this bug for months by passing string literals on both the put and get sides: same global pointer, same alloca contents, so memcmp matched by accident. Commit da42a9ca.
Gap 61: [*]*u8 variable-index reads used the wrong GEP stride. Two layers were involved: LLVM 20’s LLVMGetElementType returns a bogus half type for opaque pointers instead of null, and SemanticType collapses [*]u8 and [*]*T to the same .ptr marker. Fix: a new many_ptr_elem_layout side-table on QTJIRGraph that carries the element type from lowerer to emitter, plus a defensive guard that forces null for opaque-pointer LLVMGetElementType results. Commit 44907f52.
Gap 62: emitStructAlloca segfaulted on @ptrCast field initializers in struct literals. Commit 4725e248.
Gap 63: an L-value index through a many-pointer field (bag.arr[k] = X) used the field’s address as the GEP base instead of loading the field value first. Same commit, 4725e248.
Gap 64: let p: *T = &x; p.* produced an incorrect pointer and load width. Commit d51ca13c.
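The Gap 60 failure mode can be shown outside Janus. The sketch below is written in Rust, not Janus, and every name in it (SliceDesc, describe, eq_by_descriptor_address, eq_by_content) is hypothetical and does not come from the Janus compiler. It only illustrates the difference between comparing where a slice descriptor lives and comparing the bytes it points to.

```rust
/// Hypothetical stand-in for a slice descriptor: a data pointer plus a length.
#[derive(Clone, Copy)]
struct SliceDesc {
    ptr: *const u8,
    len: usize,
}

/// Builds a descriptor for a live byte slice.
fn describe(bytes: &[u8]) -> SliceDesc {
    SliceDesc { ptr: bytes.as_ptr(), len: bytes.len() }
}

/// The buggy shape: compares where the descriptors live, not what they point to.
fn eq_by_descriptor_address(a: &SliceDesc, b: &SliceDesc) -> bool {
    std::ptr::eq(a, b)
}

/// The intended shape: lengths must match, then the bytes are compared.
fn eq_by_content(a: &SliceDesc, b: &SliceDesc) -> bool {
    if a.len != b.len {
        return false;
    }
    // SAFETY: callers must keep the slices passed to `describe` alive
    // for as long as the descriptors are in use.
    let (x, y) = unsafe {
        (
            std::slice::from_raw_parts(a.ptr, a.len),
            std::slice::from_raw_parts(b.ptr, b.len),
        )
    };
    x == y
}
```

Two descriptors for byte-identical data held in different places (a global literal and a stack copy) are equal by content and unequal by descriptor address. A test that only ever passes the same literal on both sides cannot tell the two functions apart in the way the smoke test needed.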

Earlier in the day, the &arr[var_idx] parser-rescue commit (28c0b0b8) closed a class of bugs in which the parser misread &arr[idx] as reference_type(generic_instantiation) whenever idx was an identifier, collapsing the address-of expression to null. The W3720 warning in the codebase had already anticipated this class of regression.
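For Gap 61 above, the underlying rule is that the byte offset of element i depends on the element type, so a pointer-to-bytes and a pointer-to-pointers need different strides. The following Rust sketch (not Janus; element_offset and read_with_stride are hypothetical names) spells out that arithmetic by hand, which is roughly what a GEP with the correct element type does.

```rust
use std::mem::size_of;

/// Byte offset of element `index` when elements have type `T`.
fn element_offset<T>(index: usize) -> usize {
    index * size_of::<T>()
}

/// Reads element `index` using explicit byte arithmetic from the base pointer.
fn read_with_stride<T: Copy>(table: &[T], index: usize) -> T {
    assert!(index < table.len(), "index out of bounds");
    let byte_base = table.as_ptr() as *const u8;
    // SAFETY: `index` is in bounds, so the offset stays inside `table`
    // and the resulting address is aligned for `T`.
    unsafe { (byte_base.add(element_offset::<T>(index)) as *const T).read() }
}
```

With a table of byte pointers, index 2 sits two pointer widths from the base, while in a plain byte buffer it sits two bytes from the base. Treating one layout as the other reads from the wrong address, which is the class of error the side-table is meant to rule out.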

Stdlib concurrency primitives:

SpinLock — raw spinlock on atomic_cmpxchg + atomic_store. Ships as std.sync.SpinLock. Commit b3ddfcbc.
SpinMutex — composes SpinLock with a held-flag guard. Ships as std.sync.SpinMutex. Commit 36446711.

Both depend on the GAP-ATOMIC-WIDTH fix (92465fb1), which closed a silent miscompilation where atomic_cmpxchg through a pointer-param field returned correct values but never wrote to memory. Two threads could “acquire” a lock simultaneously.
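As a rough illustration of the two primitives, here is a minimal Rust sketch, not the std.sync source. The 0/1 state encoding, the method names, and the memory orderings are assumptions, and reading the "held-flag guard" as a flag that rejects an unlock without a matching lock is an interpretation rather than something the commits confirm.

```rust
use std::hint::spin_loop;
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

/// Sketch of a raw spinlock: 0 = free, 1 = held (assumed encoding).
pub struct SpinLock {
    state: AtomicU32,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { state: AtomicU32::new(0) }
    }

    /// One compare-exchange attempt; true means the caller now owns the lock.
    pub fn try_lock(&self) -> bool {
        self.state
            .compare_exchange(0, 1, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    /// Spins until the compare-exchange succeeds.
    pub fn lock(&self) {
        while !self.try_lock() {
            spin_loop();
        }
    }

    /// Releases with a plain atomic store.
    pub fn unlock(&self) {
        self.state.store(0, Ordering::Release);
    }
}

/// Sketch of a mutex layered on SpinLock. The `held` flag is an assumed
/// reading of "held-flag guard": it rejects an unlock with no matching lock.
pub struct SpinMutex {
    inner: SpinLock,
    held: AtomicBool,
}

impl SpinMutex {
    pub const fn new() -> Self {
        SpinMutex { inner: SpinLock::new(), held: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        self.inner.lock();
        self.held.store(true, Ordering::Relaxed);
    }

    /// Returns false and leaves the lock untouched if the mutex was not held.
    pub fn unlock(&self) -> bool {
        if !self.held.swap(false, Ordering::Relaxed) {
            return false;
        }
        self.inner.unlock();
        true
    }
}
```

The whole design rests on the compare-exchange actually writing 1 into the lock word. If the exchange reports success without the store landing, every caller sees a free lock, which is the behaviour the GAP-ATOMIC-WIDTH fix removed.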

Storage:

wal_replay_into_bytes wired into gs_open_bytes. Variable-length WAL recovery for the bytes-keyed GrainStore variant now works end-to-end. Commit 467c9aae.
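To make "variable-length WAL recovery" concrete, the sketch below replays length-prefixed records into an in-memory map. It is written in Rust and is not the GrainStore implementation: the record layout (two little-endian u32 lengths, then key bytes, then value bytes), the function names, and the choice to ignore a torn final record are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Appends one record: [key_len: u32 LE][val_len: u32 LE][key][value].
/// This layout is hypothetical, not the GrainStore on-disk format.
fn append_record(log: &mut Vec<u8>, key: &[u8], value: &[u8]) {
    log.extend_from_slice(&(key.len() as u32).to_le_bytes());
    log.extend_from_slice(&(value.len() as u32).to_le_bytes());
    log.extend_from_slice(key);
    log.extend_from_slice(value);
}

/// Reads a little-endian u32 at byte offset `at`; the caller checks bounds.
fn read_u32_le(log: &[u8], at: usize) -> usize {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&log[at..at + 4]);
    u32::from_le_bytes(raw) as usize
}

/// Replays every complete record into `store`, later records overwriting
/// earlier ones. A truncated tail is ignored. Returns the records applied.
fn replay_into(log: &[u8], store: &mut BTreeMap<Vec<u8>, Vec<u8>>) -> usize {
    let mut pos = 0;
    let mut applied = 0;
    while log.len() - pos >= 8 {
        let key_len = read_u32_le(log, pos);
        let val_len = read_u32_le(log, pos + 4);
        let body = pos + 8;
        let end = match body.checked_add(key_len).and_then(|e| e.checked_add(val_len)) {
            Some(e) if e <= log.len() => e,
            _ => break, // torn or corrupt tail: stop replaying
        };
        let key = log[body..body + key_len].to_vec();
        let value = log[body + key_len..end].to_vec();
        store.insert(key, value);
        pos = end;
        applied += 1;
    }
    applied
}
```

The value starts at an offset computed from the key length, the same kind of variable-index address that the &chunk[key_len] expression needs at replay time.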

Doctrine:

Two-layer bug staging doctrine formalised at .agents/doctrines/two-layer-bug-staging.md. The Gap 59/60 pair was its first canonical application: bounded layer shipped first with xfail probe and guardrails, deeper layer investigated in its own session, load-bearing verification confirmed correctness by deleting the bytes_eq workaround inlines. Net code delta: minus 20 LOC.

Why now

The lsm_grainstore_bytes_replay_smoke was stuck. Every attempt to advance it revealed a new compiler bug. The bytes-keyed GrainStore could not replay its own WAL because the compiler could not correctly emit fs.read(fd, @ptrCast(&chunk[key_len]), n) — the &chunk[key_len] resolved to null. Fixing that exposed the slice-equality gap. Fixing that exposed the call-ABI gap. Fixing that exposed the [*]*T stride gap. Each layer had to close before the next became visible.

Sprint N (std.sync atomics) hit its own wall: the cmpxchg write-side gap meant SpinLock was structurally broken on day one. The spike report proved it — two concurrent threads would both “acquire” simultaneously because the write never landed. The sprint pivoted from a stdlib sprint into a compiler-fix sprint, shipped the fix, then shipped SpinLock and SpinMutex on top.

Storage and concurrency are the two load-bearing pillars of the Janus stdlib, and both were blocked by compiler defects in the same 24-hour window. The cascade closed in one day because no other work could proceed until it did.

Design decisions and tradeoffs

Chosen path: Ship each gap as a separate, verified commit with its own regression probe. Use the two-layer-bug doctrine: bounded layer ships with guardrails (unstable-only, xfail probe, commit caveat), deeper layer gets its own focused session.
Rejected path: Batch all six gaps into a single “compiler mega-fix” commit. Rejected because each gap has a distinct root cause (parser disambiguation, emitter alloca spill, LLVM API regression, semantic type collapse, atomic width coercion, struct literal cast). A batch commit would make bisection useless and regression isolation impossible.
Why the rejection was correct: The Gap 61 fix alone touched five files across graph/lower/emitter with a new side-table. If it had been bundled with Gap 60’s coerceValueToType fast path, a regression in either would have required reverting both. The probe-per-gap strategy means each fix is independently verifiable.
SpinLock without race smoke: Sprint N shipped SpinLock and SpinMutex without the multi-threaded atomicity proof. Three additional compiler gaps (Field_Store struct-info recovery on cast-derived pointers, Gap48PointerToI64Alloca, and Janus calling-convention mismatch with pthread’s void*(*)(void*)) block the race smoke. Decision: ship the core primitives with single-threaded regression tests, defer the multi-threaded proof to a follow-up sprint. The compiler gap that made SpinLock broken is closed; the remaining gaps prevent the proof, not the correctness.
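A single-threaded test can still catch the write-side defect, provided it inspects memory and not only the return value. The Rust sketch below shows the shape of such a probe. It is not the Janus regression test, and LockRecord, acquire_via_param, and write_side_probe are hypothetical names.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical lock record; the lock word is a field reached through a parameter.
struct LockRecord {
    tag: u32,
    word: AtomicU32,
}

/// Attempts 0 -> 1 on the lock word and reports the value seen before the exchange.
fn acquire_via_param(rec: &LockRecord) -> u32 {
    match rec.word.compare_exchange(0, 1, Ordering::Acquire, Ordering::Relaxed) {
        Ok(prev) | Err(prev) => prev,
    }
}

/// Single-threaded probe: checks the return value AND the memory afterwards.
fn write_side_probe() -> bool {
    let rec = LockRecord { tag: 7, word: AtomicU32::new(0) };
    let first = acquire_via_param(&rec); // expect 0: the lock was free
    let landed = rec.word.load(Ordering::Relaxed) == 1; // the write must be visible
    let second = acquire_via_param(&rec); // expect 1: the lock is already held
    // The neighbouring field must be untouched by the exchange.
    first == 0 && landed && second == 1 && rec.tag == 7
}
```

A probe that asserted only on the first return value would pass against a cmpxchg that never stores. The load and the second attempt are what distinguish a working exchange from one that merely reports success.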

Junior Dev Nugget

The principle being demonstrated: A bug that passes all your tests is not necessarily fixed. The lsm_grainstore_bytes_smoke passed for months while Gap 60 was active, because it only tested with string literals on both sides. The test’s input distribution hid the bug.
The mistake the reader would have made: Writing tests that exercise the happy path with identical input shapes. When testing a comparison function, the instinct is to compare “hello” with “hello”. The disciplined move is to compare a stack-derived value with a global literal, a heap value with a literal, and a value that crossed a function-call boundary with a direct value. Each exercises a different code path through the compiler.
Read next: The two-layer bug staging doctrine at Janus/.agents/doctrines/two-layer-bug-staging.md. It codifies a pattern every systems project hits: fix what you can see, stage what you cannot yet see, verify by deleting the workaround.
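The input mix described in the nugget can be written down as a small test matrix. This Rust sketch is illustrative only: Rust's own == is not affected by the Janus gaps, so the code shows the shape of the inputs, not a reproduction of the bug. through_call, GLOBAL_KEY, all_pairs_equal, and detects_difference are hypothetical names.

```rust
/// Identity function kept out-of-line so the slice crosses a real call boundary.
#[inline(never)]
fn through_call(s: &[u8]) -> &[u8] {
    s
}

/// The same bytes as a global literal.
static GLOBAL_KEY: &[u8] = b"hello";

/// Builds identical bytes with four different provenances and compares every pair.
fn all_pairs_equal() -> bool {
    let stack: [u8; 5] = *b"hello";
    let heap: Vec<u8> = b"hello".to_vec();
    let inputs: [&[u8]; 4] = [
        GLOBAL_KEY,            // global literal
        &stack[..],            // stack-derived
        heap.as_slice(),       // heap-derived
        through_call(&stack),  // crossed a function-call boundary
    ];
    inputs.iter().all(|a| inputs.iter().all(|b| a == b))
}

/// Negative case: guards against a comparison that always answers "equal".
fn detects_difference() -> bool {
    let stack: [u8; 5] = *b"hellp";
    GLOBAL_KEY != &stack[..] && through_call(&stack) != GLOBAL_KEY
}
```

Sixteen comparisons over four provenances cost almost nothing to write, and the negative case keeps a broken comparison that returns true for everything from passing.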

Ideological stance, grounded

Position: A compiler that silently miscompiles slice comparisons and atomic writes is not a compiler you build infrastructure on. The fix is not a workaround in user code; the fix is in the codegen.
Engineering evidence drawn from the diff: The bytes_eq and compact_bytes_eq inline workarounds in std/db/lsm.jan and std/db/sstable.jan existed because the compiler could not be trusted to emit correct slice equality or correct slice argument passing. Deleting those 20 lines of workaround and replacing them with native == is the load-bearing verification. The compiler now earns that trust.
Where this sits in the Libertaria mission: Self-sovereign infrastructure requires a self-hosted toolchain that produces correct code. Every compiler gap closed is a brick in that foundation. Every workaround deleted is proof the brick holds weight.

References

Spec / RFC: SPEC-057 (std.sync.atomics), SPEC-025 (trait fat pointers affected by struct_info audit)
Repo / Commits: Janus/janus on unstable: 28c0b0b8, ef3178fc, da42a9ca, 44907f52, 92465fb1, b3ddfcbc, 36446711, 4725e248, d51ca13c, 467c9aae, 250d333c
Doctrine: Janus/.agents/doctrines/two-layer-bug-staging.md
Agent reports: Janus/.agents/reports/2026-05-13-*.md (7 reports covering all gaps and sprints)

What comes next

The struct-info materializer sweep (Phase 2 of the optslice consumer audit) is in progress. The audit found zero proxy bugs and two canonical missing sites (emitOptionalUnwrap and emitErrorUnionUnwrap for slice payloads). The registerFatPtrStructInfoIfApplicable helper landed today. The remaining Phase 2 tasks close the Optional and ErrorUnion slice-payload gaps, after which ?[]const u8 and ![]T work without special-casing. Sprint N+1 continues from there.