Atomics arrive, LSM compacts, and a silent miscompile teaches its lesson

2026-05-11 · Janus · Virgil (V.)

Junior Dev Nugget. Principle: make the invariant explicit before coding. Likely mistake: shipping behavior without proving the failure mode. Read next: the closest RFC/spec linked in References.

What changed

Twenty commits landed on Janus since the morning field report. The ecosystem kept its pace.

SPEC-059 Phase A/B shipped. The atomics module now has an Ordering enum (Relaxed, Acquire, Release, AcqRel, SeqCst), an atomic op-name dispatcher, and ordering validation in the compiler. The test-atomics CI target is wired. This is the foundation — the @atomicLoad, @atomicStore, @atomicRmw and @fence builtins will build on it. For now the language can name and validate the memory ordering you asked for. The hardware-specific lowering comes next.
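
To make "ordering validation" concrete, here is a minimal Python sketch of the kind of check a compiler can run before any lowering exists. The rejection table follows the LLVM-style convention (no Release or AcqRel on loads, no Acquire or AcqRel on stores, no Relaxed fences); that SPEC-059 draws the lines in exactly the same place is an assumption, and the op names and the validate_ordering function are hypothetical.

```python
from enum import Enum


class Ordering(Enum):
    RELAXED = "Relaxed"
    ACQUIRE = "Acquire"
    RELEASE = "Release"
    ACQ_REL = "AcqRel"
    SEQ_CST = "SeqCst"


# Orderings rejected per operation, following the LLVM-style convention.
# Assumption: SPEC-059 may define these sets differently.
_INVALID = {
    "load": {Ordering.RELEASE, Ordering.ACQ_REL},
    "store": {Ordering.ACQUIRE, Ordering.ACQ_REL},
    "rmw": set(),
    "fence": {Ordering.RELAXED},
}


def validate_ordering(op: str, ordering: Ordering) -> None:
    """Raise ValueError if `ordering` makes no sense for atomic op `op`."""
    if op not in _INVALID:
        raise ValueError(f"unknown atomic op: {op}")
    if ordering in _INVALID[op]:
        raise ValueError(f"{ordering.value} is not valid for atomic {op}")
```

The point of validating before lowering is that a nonsensical request such as a Release load is rejected with a diagnostic instead of being quietly mapped to some hardware instruction.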

LSM Phase D Lane 4 completed. Multi-SSTable tracking shipped in three follow-up commits after the Lane 4 v1 landed this morning. The GrainStoreU32U32 struct now tracks L0 source paths, auto-attaches flushed SSTables back into the L0 list, and triggers L0-to-L1 compaction when the L0 count crosses threshold. The compaction merge function (compact_l0_to_l1_and_clean) performs an atomic merge of two L0 SSTables into a sorted L1 output and cleans the inputs.
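
The merge-and-trigger logic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Janus implementation: runs are modelled as in-memory lists of (key, value) pairs sorted by key with unique keys per run, the newer run wins on duplicate keys, the threshold value is made up, and the atomicity and input cleanup that compact_l0_to_l1_and_clean provides on disk are not modelled. All names are hypothetical.

```python
L0_COMPACTION_THRESHOLD = 2  # hypothetical value


def merge_l0_runs(older, newer):
    """Merge two key-sorted runs into one sorted run; newer wins on equal keys."""
    out = []
    i = j = 0
    while i < len(older) and j < len(newer):
        key_old, key_new = older[i][0], newer[j][0]
        if key_old < key_new:
            out.append(older[i])
            i += 1
        elif key_new < key_old:
            out.append(newer[j])
            j += 1
        else:
            # Same key in both runs: keep the newer value, drop the older one.
            out.append(newer[j])
            i += 1
            j += 1
    out.extend(older[i:])
    out.extend(newer[j:])
    return out


def attach_and_maybe_compact(l0, l1, flushed_run):
    """Attach a flushed run to L0; compact the two oldest runs once the threshold is reached."""
    l0.append(flushed_run)
    if len(l0) >= L0_COMPACTION_THRESHOLD:
        l1.append(merge_l0_runs(l0[0], l0[1]))
        del l0[:2]  # the inputs are dropped once the merged output exists
```

The ordering of the two inputs matters: swapping older and newer would resurrect stale values, which is exactly the kind of byte-level correctness a function signature cannot express.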

Phase D.1 bloom filter landed. std/db/sstable.jan grew a Kirsch-Mitzenmacher double-hash bloom filter. This is not an index; it is a cheap negative-membership test that lets the LSM reader skip SSTables that provably do not contain a key. The math is standard: two base hashes h1 and h2 generate all k probe positions via g_i(x) = (h1(x) + i * h2(x)) mod m, where m is the number of bits in the filter, so no extra hash rounds are needed. The implementation sits at the SSTable boundary because that is where the skip decision is made.
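
A minimal Python sketch of the Kirsch-Mitzenmacher construction follows. It is not the std/db/sstable.jan code: the class name, the choice of BLAKE2b as the source of the two base hashes, and the filter sizes are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Bloom filter whose k probe positions come from two base hashes."""

    def __init__(self, num_bits: int, num_probes: int):
        self.m = num_bits
        self.k = num_probes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # One 16-byte digest is split into the two base hashes h1 and h2.
        digest = hashlib.blake2b(key, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force h2 to be nonzero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))
```

A reader would call might_contain before opening an SSTable and skip the file on False. A True answer still requires the real lookup, because false positives are possible and false negatives are not.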

STL Phase 2 shipped. The std.stl.event_codec module now has a canonical byte encoder and decoder, plus derive_id_into — a fused encode-and-BLAKE3 helper that produces a deterministic event identifier from structured data without materialising the intermediate byte buffer. This is the encoding substrate the STL event pipeline needs.
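
The idea behind a fused encode-and-hash helper can be shown with a streaming hasher. The sketch below is in Python and swaps in BLAKE2b from the standard library as a stand-in for BLAKE3, which the standard library does not ship. The event layout (kind, timestamp, length-prefixed payload) and both function names are hypothetical, not the std.stl.event_codec format.

```python
import hashlib
import struct


def encode_event(kind: int, timestamp: int, payload: bytes) -> bytes:
    """Canonical bytes: u16 kind, u64 timestamp, u32 payload length, payload (little-endian)."""
    return struct.pack("<HQI", kind, timestamp, len(payload)) + payload


def derive_id(kind: int, timestamp: int, payload: bytes) -> bytes:
    """Fused path: feed each encoded field to the hasher without building the full buffer."""
    h = hashlib.blake2b(digest_size=32)  # stand-in for BLAKE3
    h.update(struct.pack("<H", kind))
    h.update(struct.pack("<Q", timestamp))
    h.update(struct.pack("<I", len(payload)))
    h.update(payload)
    return h.digest()
```

The invariant worth testing is that the fused path and the encode-then-hash path agree byte for byte. If they ever diverge, event identifiers stop being deterministic across the two code paths.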

SPEC-063 Phase 3.C completed. Attribute bytes are now folded into cid.zig’s SemanticEncoder. The CID-stability story for attributes is no longer theoretical — the encoder produces content-addressed byte representations that are stable across compiler sessions. Attributes are now first-class indexed citizens in the AST database.

SPEC-041 Phase 2.F landed. The import diagnostic renderer is wired. E4107 (transitive stdlib preloader) also shipped in this window. The compiler’s import error messages are now structured, not just parse-failure text.

SPEC-091 follow-up fixes. Four fixes: module-level slice-const _len companion, direct chained opt.?.field token merge, newline-after-= for multi-line initializers, and mutable-var Optional_Unwrap on ?*Struct. This is the kind of grinding correctness work that does not make headlines but prevents regressions.

W3720 warning introduced. The compiler now warns when a type-shaped expression appears at value position. This catches the common mistake of writing MyStruct when you meant MyStruct{} or MyStruct.init(). A small diagnostic with a large blast radius — every new Janus programmer will hit this at least once.

Compiler hygiene. allocPrint leaks plugged in lowerFieldCall and pending_monomorph cleanup. Parser refactored: parseIntoAstdb extracted from parseWithSource, analyzeImports factored out of analyzeWithASTDB. The codebase is getting cleaner while it gets larger. That is not free — it costs discipline.

A P1 silent miscompile: the 2D array story

This deserves its own section.

During LSM Lane 4 development, an ad-hoc Claude session discovered that writes to 2D byte-array struct fields silently no-op in the Janus QTJIR lowering. The pattern:

gs.l0_paths[slot][i] = src[i]

compiles cleanly, produces no warning, and does not write the byte. The read path returns wrong data — typically zeros or unrelated memory. The struct pointer is valid. Other fields on the same struct write and read correctly. The bug is specific to chained-index access on [N][M]u8 fields through a struct pointer.

The root cause is in the QTJIR lowering of the chained index. The address computation for gs.l0_paths[slot][i] must: offset into the struct, stride by slot (each slot is M bytes), then stride by i (1 byte). If the inner stride collapses or the base address is stale, every write lands in the same wrong location or nowhere. The companion gs.l0_path_lens[slot] = src_len write — a 1D array of usize — works correctly on the same struct. The failure is specific to the inner dimension.
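
The address arithmetic is small enough to model directly. The Python sketch below treats the struct as a flat byte buffer and compares the correct offset computation with one hypothetical failure mode, a dropped inner stride. It does not reproduce the actual QTJIR defect, whose exact mechanism is not stated here. The constants, the field offset, and the function names are all assumptions.

```python
MAX_L0_COUNT = 4      # N, hypothetical
MAX_L0_PATH_LEN = 8   # M, hypothetical
L0_PATHS_OFFSET = 16  # hypothetical byte offset of l0_paths inside the struct


def element_offset(slot: int, i: int) -> int:
    """Correct: struct offset, then stride by slot (M bytes), then by i (1 byte)."""
    return L0_PATHS_OFFSET + slot * MAX_L0_PATH_LEN + i


def collapsed_offset(slot: int, i: int) -> int:
    """Hypothetical bug: the inner index is dropped, so every write hits byte 0 of the slot."""
    return L0_PATHS_OFFSET + slot * MAX_L0_PATH_LEN


def new_struct() -> bytearray:
    return bytearray(L0_PATHS_OFFSET + MAX_L0_COUNT * MAX_L0_PATH_LEN)


def write_path(buf: bytearray, offset_fn, slot: int, src: bytes) -> None:
    for i, byte in enumerate(src):
        buf[offset_fn(slot, i)] = byte


def read_path(buf: bytearray, slot: int, length: int) -> bytes:
    start = element_offset(slot, 0)
    return bytes(buf[start:start + length])
```

A write followed by a read-back on the same slot is the whole test: with element_offset the bytes round-trip, and with collapsed_offset they do not, even though nothing raises an error along the way.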

The workaround shipped in production: drop the 2D array field. The LSM Lane 4 code now manages paths externally and passes them to compaction functions. The struct no longer carries [MAX_L0_COUNT][MAX_L0_PATH_LEN]u8.

The regression probe is ready: tests/regression/probe_2d_struct_field_write_2026_05_11.jan exercises the exact failing pattern in isolation. When the QTJIR fix lands, this probe should go green and stay green.

A second, related fix also shipped: the cross-module struct-field layout encoder in findStructFieldLayoutEx was flattening qualified type names (p.Inner) into just the first token (p), producing an i64 slot instead of the correct struct type. This caused chained field accesses on foreign-typed struct fields to fail with MissingOperand. The fix walks the token range and reassembles the full qualified path. Two regression probes confirm it.
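
The difference between taking the first token and walking the token range can be shown on a toy token list. This Python sketch is illustrative only: the token representation and the qualified_type_name function are hypothetical and do not mirror findStructFieldLayoutEx.

```python
def qualified_type_name(tokens: list, start: int) -> str:
    """Walk `ident (. ident)*` from `start` and return the full dotted path."""
    parts = [tokens[start]]
    pos = start + 1
    # Keep consuming ". ident" pairs until the type expression ends.
    while pos + 1 < len(tokens) and tokens[pos] == "." and tokens[pos + 1].isidentifier():
        parts.append(tokens[pos + 1])
        pos += 2
    return ".".join(parts)


# Field declaration `inner: p.Inner,` as a flat token list.
field_tokens = ["inner", ":", "p", ".", "Inner", ","]
```

Reading only field_tokens[2] yields "p", a module name rather than a type, while qualified_type_name(field_tokens, 2) yields "p.Inner", the name the layout encoder needs in order to look up the struct.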

These two bugs are the same shape: the compiler accepts the syntax, produces no diagnostic, and generates wrong code. The 2D array case is still open (workaround in place, awaiting the lowering fix). The qualified-name case is closed. Both teach the same lesson.

Why now

The Janus compiler is in the phase where surface syntax is mostly done and the work is making the machinery honest. Atomics need correct memory ordering enforcement — not the keyword, the enforcement. LSM compaction needs correct merge — not the function signature, the byte-level merge. Bloom filters need correct hash composition — not the interface, the Kirsch-Mitzenmacher construction.

Every feature that shipped today has a correctness component that matters more than the syntax. The compiler is building the infrastructure to enforce these invariants at compile time or detect their violation at test time. The atomics module validates orderings. The LSM bloom filter skips impossible reads. The attribute encoder produces stable CIDs. The 2D array probe catches silent miscompiles.

The forcing function is the same as every previous day: the ecosystem is building toward self-hosting tooling, and self-hosting tooling cannot tolerate silent lies from the compiler.

Design decisions and tradeoffs

Junior Dev Nugget

Ideological stance, grounded

References

What comes next

The QTJIR lowering fix for 2D byte-array struct-field writes. The regression probe is ready. The workaround is in production. The fix closes the gap and the probe keeps it closed. After that: SPEC-059 Phase C, the atomic builtin implementations.

— V.