Janus closes six compiler gaps, ships SpinLock and SpinMutex
Junior Dev Nugget; principle: Make the invariant explicit before coding; likely mistake: Shipping behavior without proving the failure mode; read next: Closest RFC/spec linked in References.
What changed
Six compiler bugs closed, two stdlib concurrency primitives shipped, one storage feature landed, and a doctrine was born. All in one calendar day.
Compiler fixes (Gaps 59 through 64):
- Gap 59: slice == returned false when one operand was a global string literal, even with byte-identical content. emitCmpOp.lookupSiblingLen did not recognise LLVMIsAGlobalVariable with [N x i8] element type. Fix: 30-line extension teaching the emitter to read array length from global variables. Commit ef3178fc.
- Gap 60: slice arguments passed through a function-call boundary lost their data-pointer identity. The caller emitted alloca {ptr, i64} + store + pass &alloca, so the receiver compared the descriptor’s address instead of the slice content. Fix: slice-struct fast path in coerceValueToType that emits ExtractValue directly, bypassing the alloca spill. The lsm_grainstore_bytes_smoke had masked this bug for months by passing string literals on both put and get sides: same global pointer, same alloca contents, so memcmp accidentally matched. Commit da42a9ca.
- Gap 61: [*]*u8 variable-index reads used the wrong GEP stride. Two layers: LLVM 20’s LLVMGetElementType returns a bogus half on opaque pointers instead of null, and SemanticType collapses [*]u8 and [*]*T to the same .ptr marker. Fix: new many_ptr_elem_layout side-table on QTJIRGraph that carries the element type from lowerer to emitter, plus a defensive guard forcing null for opaque-pointer LLVMGetElementType results. Commit 44907f52.
- Gap 62: emitStructAlloca segfaulted on @ptrCast field initializers in struct literals. Commit 4725e248.
- Gap 63: L-value index through a many-pointer field (bag.arr[k] = X) used the field’s address as GEP base instead of loading the field value first. Same commit 4725e248.
- Gap 64: let p: *T = &x; p.* produced an incorrect pointer and load width. Commit d51ca13c.
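Gaps 61 and 63 are both mistakes in one address computation, base + index * element size: Gap 61 got the element size wrong, Gap 63 got the base wrong. The sketch below models that arithmetic in Rust over a toy flat memory. It is an illustration only, not the Janus emitter: Memory, gep, read_elem, store_through_field and store_from_field_addr are hypothetical names, and the 2-byte stride is an assumption standing in for the bogus half element type.

```rust
/// Toy flat memory: an "address" is just an index into a byte vector.
struct Memory {
    bytes: Vec<u8>,
}

impl Memory {
    fn new(size: usize) -> Self {
        Memory { bytes: vec![0; size] }
    }

    /// Load 8 little-endian bytes starting at `addr`.
    fn load_u64(&self, addr: usize) -> u64 {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(&self.bytes[addr..addr + 8]);
        u64::from_le_bytes(raw)
    }

    /// Store `value` as 8 little-endian bytes starting at `addr`.
    fn store_u64(&mut self, addr: usize, value: u64) {
        self.bytes[addr..addr + 8].copy_from_slice(&value.to_le_bytes());
    }
}

/// GEP-style address computation: base + index * element size.
fn gep(base: usize, index: usize, elem_size: usize) -> usize {
    base + index * elem_size
}

/// Gap 61 shape: read element `index` of an array of 8-byte pointers.
/// The caller supplies the stride, so a wrong element type is a wrong answer.
fn read_elem(mem: &Memory, array_base: usize, index: usize, elem_size: usize) -> u64 {
    mem.load_u64(gep(array_base, index, elem_size))
}

/// Gap 63 shape, correct: load the pointer held in the field, then index from it.
fn store_through_field(mem: &mut Memory, field_addr: usize, k: usize, x: u64) {
    let array_base = mem.load_u64(field_addr) as usize;
    mem.store_u64(gep(array_base, k, 8), x);
}

/// Gap 63 shape, buggy: index from the address of the field itself.
fn store_from_field_addr(mem: &mut Memory, field_addr: usize, k: usize, x: u64) {
    mem.store_u64(gep(field_addr, k, 8), x);
}
```

With an array of 8-byte elements at address 64, read_elem(&mem, 64, 1, 8) lands on the second element, while a stride of 2 lands inside the first. Likewise, with a field at address 0 holding the pointer 64, store_through_field writes into the array, whereas store_from_field_addr writes next to the field and leaves the array untouched.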
Earlier in the day, the &arr[var_idx] parser-rescue commit (28c0b0b8) closed the class of bugs where the parser disambiguates &arr[idx] as reference_type(generic_instantiation) when idx is an identifier, collapsing the address-of expression to null. The W3720 warning in the codebase had been predicting this class of regression.
Stdlib concurrency primitives:
- SpinLock: raw spinlock on atomic_cmpxchg + atomic_store. Ships as std.sync.SpinLock. Commit b3ddfcbc.
- SpinMutex: composes SpinLock with a held-flag guard. Ships as std.sync.SpinMutex. Commit 36446711.
Both depend on the GAP-ATOMIC-WIDTH fix (92465fb1), which closed a silent miscompilation where atomic_cmpxchg through a pointer-param field returned correct values but never wrote to memory. Two threads could “acquire” a lock simultaneously.
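For readers who have not written one, a spinlock built from compare-exchange plus store can be sketched in Rust as follows. This is not the std.sync.SpinLock source: the method names, the memory orderings, and the contended_count helper are assumptions made for illustration. The point it shows is why the write side matters: if a successful compare-exchange never stored true, a second try_lock would also succeed.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Minimal spinlock: one flag, acquired by compare-exchange, released by store.
pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    /// Succeeds only if the flag was false AND the write of true landed.
    pub fn try_lock(&self) -> bool {
        self.locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    /// Spin until the compare-exchange succeeds.
    pub fn lock(&self) {
        while !self.try_lock() {
            std::hint::spin_loop();
        }
    }

    /// Release by storing false; pairs with the Acquire in try_lock.
    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

/// Hypothetical race check: each thread bumps a counter with a deliberately
/// non-atomic load-then-store, so only the lock prevents lost updates.
pub fn contended_count(threads: usize, iters: u64) -> u64 {
    let shared = Arc::new((SpinLock::new(), AtomicU64::new(0)));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for _ in 0..iters {
                    shared.0.lock();
                    let seen = shared.1.load(Ordering::Relaxed);
                    shared.1.store(seen + 1, Ordering::Relaxed);
                    shared.0.unlock();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    shared.1.load(Ordering::Relaxed)
}
```

A single-threaded probe already catches a missing write: after one successful try_lock, a second try_lock must fail until unlock is called. The multi-threaded check is the stronger proof: with a correct lock, contended_count(4, 10_000) returns exactly 40_000.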
Storage:
wal_replay_into_bytes wired into gs_open_bytes. Variable-length WAL recovery for the bytes-keyed GrainStore variant now works end-to-end. Commit 467c9aae.
Doctrine:
- Two-layer bug staging doctrine formalised at .agents/doctrines/two-layer-bug-staging.md. The Gap 59/60 pair was its first canonical application: bounded layer shipped first with xfail probe and guardrails, deeper layer investigated in its own session, load-bearing verification confirmed correctness by deleting the bytes_eq workaround inlines. Net code delta: minus 20 LOC.
Why now
The lsm_grainstore_bytes_replay_smoke was stuck. Every attempt to advance it revealed a new compiler bug. The bytes-keyed GrainStore could not replay its own WAL because the compiler could not correctly emit fs.read(fd, @ptrCast(&chunk[key_len]), n) — the &chunk[key_len] resolved to null. Fixing that exposed the slice-equality gap. Fixing that exposed the call-ABI gap. Fixing that exposed the [*]*T stride gap. Each layer had to close before the next became visible.
Sprint N (std.sync atomics) hit its own wall: the cmpxchg write-side gap meant SpinLock was structurally broken on day one. The spike report proved it — two concurrent threads would both “acquire” simultaneously because the write never landed. The sprint pivoted from a stdlib sprint into a compiler-fix sprint, shipped the fix, then shipped SpinLock and SpinMutex on top.
Storage and concurrency are the two load-bearing pillars of the Janus stdlib, and both were blocked by compiler defects in the same 24-hour window. The cascade closed because there was nothing else to do.
Design decisions and tradeoffs
- Chosen path: Ship each gap as a separate, verified commit with its own regression probe. Use the two-layer-bug doctrine: bounded layer ships with guardrails (unstable-only, xfail probe, commit caveat), deeper layer gets its own focused session.
- Rejected path: Batch all six gaps into a single “compiler mega-fix” commit. Rejected because each gap has a distinct root cause (parser disambiguation, emitter alloca spill, LLVM API regression, semantic type collapse, atomic width coercion, struct literal cast). A batch commit would make bisection useless and regression isolation impossible.
- Why the rejection was correct: The Gap 61 fix alone touched five files across graph/lower/emitter with a new side-table. If it had been bundled with Gap 60’s coerceValueToType fast path, a regression in either would have required reverting both. The probe-per-gap strategy means each fix is independently verifiable.
- SpinLock without race smoke: Sprint N shipped SpinLock and SpinMutex without the multi-threaded atomicity proof. Three additional compiler gaps (Field_Store struct-info recovery on cast-derived pointers, Gap48PointerToI64Alloca, and Janus calling-convention mismatch with pthread’s void*(*)(void*)) block the race smoke. Decision: ship the core primitives with single-threaded regression tests, defer the multi-threaded proof to a follow-up sprint. The compiler gap that made SpinLock broken is closed; the remaining gaps prevent the proof, not the correctness.
Junior Dev Nugget
- The principle being demonstrated: A bug that passes all your tests is not necessarily fixed. The lsm_grainstore_bytes_smoke passed for months while Gap 60 was active, because it only tested with string literals on both sides. The test’s input distribution hid the bug.
- The mistake the reader would have made: Writing tests that exercise the happy path with identical input shapes. When testing a comparison function, the instinct is to compare “hello” with “hello”. The disciplined move is to compare a stack-derived value with a global literal, a heap value with a literal, and a value that crossed a function-call boundary with a direct value. Each exercises a different code path through the compiler.
- Read next: The two-layer bug staging doctrine at Janus/.agents/doctrines/two-layer-bug-staging.md. It codifies a pattern every systems project hits: fix what you can see, stage what you cannot yet see, verify by deleting the workaround.
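The provenance matrix described in this nugget can be sketched in Rust. This is an analogy, not Janus code: eq_by_descriptor is a hypothetical model of the Gap 60 failure (comparing the pointer-and-length descriptor instead of the bytes), and every name here is invented for illustration.

```rust
/// Models the buggy comparison: equality of the descriptors themselves
/// (data pointer and length), not of the bytes they describe.
fn eq_by_descriptor(a: &[u8], b: &[u8]) -> bool {
    a.as_ptr() == b.as_ptr() && a.len() == b.len()
}

/// Content equality: what slice == is supposed to mean.
fn eq_by_content(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b.iter()).all(|(x, y)| x == y)
}

/// A global literal, standing in for the string literals used by the smoke test.
static LITERAL: &[u8] = b"grain-key";

/// Sends the operand across a call boundary and returns a fresh heap copy.
fn through_call(key: &[u8]) -> Vec<u8> {
    key.to_vec()
}

/// Runs `eq` against operands of different provenance and labels each result.
fn provenance_matrix(eq: fn(&[u8], &[u8]) -> bool) -> Vec<(&'static str, bool)> {
    let stack: [u8; 9] = *b"grain-key"; // stack-derived copy of the same bytes
    let heap: Vec<u8> = LITERAL.to_vec(); // heap copy of the same bytes
    let crossed = through_call(&stack); // value that crossed a call boundary
    vec![
        ("literal vs literal", eq(LITERAL, LITERAL)),
        ("stack vs literal", eq(&stack, LITERAL)),
        ("heap vs literal", eq(&heap, LITERAL)),
        ("call-crossed vs literal", eq(&crossed, LITERAL)),
    ]
}
```

Running provenance_matrix(eq_by_content) reports true on every row. Running it with eq_by_descriptor still reports true on the literal-vs-literal row, which is the only row the old smoke test exercised, and false on the heap and call-crossed rows.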
Ideological stance, grounded
- Position: A compiler that silently miscompiles slice comparisons and atomic writes is not a compiler you build infrastructure on. The fix is not a workaround in user code; the fix is in the codegen.
- Engineering evidence drawn from the diff: The bytes_eq and compact_bytes_eq inline workarounds in std/db/lsm.jan and std/db/sstable.jan existed because the compiler could not be trusted to emit correct slice equality or correct slice argument passing. Deleting those 20 lines of workaround and replacing them with native == is the load-bearing verification. The compiler now earns that trust.
- Where this sits in the Libertaria mission: Self-sovereign infrastructure requires a self-hosted toolchain that produces correct code. Every compiler gap closed is a brick in that foundation. Every workaround deleted is proof the brick holds weight.
References
- Spec / RFC: SPEC-057 (std.sync.atomics), SPEC-025 (trait fat pointers affected by struct_info audit)
- Repo / Commits: Janus/janus on unstable: 28c0b0b8, ef3178fc, da42a9ca, 44907f52, 92465fb1, b3ddfcbc, 36446711, 4725e248, d51ca13c, 250d333c
- Doctrine: Janus/.agents/doctrines/two-layer-bug-staging.md
- Agent reports: Janus/.agents/reports/2026-05-13-*.md (7 reports covering all gaps and sprints)
What comes next
The struct-info materializer sweep (Phase 2 of the optslice consumer audit) is in progress. The audit found zero proxy bugs and two canonical missing sites (emitOptionalUnwrap and emitErrorUnionUnwrap for slice payloads). The registerFatPtrStructInfoIfApplicable helper landed today. The remaining Phase 2 tasks close the Optional and ErrorUnion slice-payload gaps, after which ?[]const u8 and ![]T work without special-casing. Sprint N+1 continues from there.
- V.