Atomics arrive, LSM compacts, and a silent miscompile teaches its lesson
Junior Dev Nugget; principle: Make the invariant explicit before coding.; likely mistake: Shipping behavior without proving the failure mode.; read next: Closest RFC/spec linked in References.
What changed
Twenty commits landed on Janus since the morning field report. The ecosystem kept its pace.
SPEC-059 Phase A/B shipped. The atomics module now has an Ordering enum (Relaxed, Acquire, Release, AcqRel, SeqCst), an atomic op-name dispatcher, and ordering validation in the compiler. The test-atomics CI target is wired. This is the foundation — the @atomicLoad, @atomicStore, @atomicRmw and @fence builtins will build on it. For now the language can name and validate the memory ordering you asked for. The hardware-specific lowering comes next.
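What "validate the memory ordering you asked for" can mean in practice is sketched below in Python, not Janus. The rule table is an assumption borrowed from the C++11 and Rust conventions (loads cannot release, stores cannot acquire, a Relaxed fence is rejected as Rust does); the actual SPEC-059 rules may differ, and the names check_ordering and ALLOWED are hypothetical.

```python
from enum import Enum


class Ordering(Enum):
    RELAXED = "Relaxed"
    ACQUIRE = "Acquire"
    RELEASE = "Release"
    ACQ_REL = "AcqRel"
    SEQ_CST = "SeqCst"


# Assumed rule table (C++11 / Rust conventions, not taken from SPEC-059).
ALLOWED = {
    "load": {Ordering.RELAXED, Ordering.ACQUIRE, Ordering.SEQ_CST},
    "store": {Ordering.RELAXED, Ordering.RELEASE, Ordering.SEQ_CST},
    "rmw": set(Ordering),
    "fence": {Ordering.ACQUIRE, Ordering.RELEASE,
              Ordering.ACQ_REL, Ordering.SEQ_CST},
}


def check_ordering(op: str, ordering: Ordering):
    """Return None when the pairing is valid, else a diagnostic string."""
    if op not in ALLOWED:
        return f"unknown atomic op: {op}"
    if ordering not in ALLOWED[op]:
        return f"{ordering.value} is not a valid ordering for atomic {op}"
    return None


if __name__ == "__main__":
    print(check_ordering("load", Ordering.ACQUIRE))
    print(check_ordering("store", Ordering.ACQUIRE))
```

The point of the table form is that the check runs before any lowering exists: the dispatcher only needs the op name and the enum value.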
LSM Phase D Lane 4 completed. Multi-SSTable tracking shipped in three follow-up commits after the Lane 4 v1 landed this morning. The GrainStoreU32U32 struct now tracks L0 source paths, auto-attaches flushed SSTables back into the L0 list, and triggers L0-to-L1 compaction when the L0 count crosses threshold. The compaction merge function (compact_l0_to_l1_and_clean) performs an atomic merge of two L0 SSTables into a sorted L1 output and cleans the inputs.
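The flush, attach, compact cycle can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GrainStoreU32U32 code: runs are in-memory lists instead of SSTable files, the threshold value is invented, "crosses threshold" is read as "reaches it", and on duplicate keys the newer run is assumed to win.

```python
L0_COMPACTION_THRESHOLD = 2  # hypothetical value


def merge_sorted_runs(older, newer):
    """Merge two key-sorted lists of (key, value) pairs.

    Assumption: on equal keys the value from the newer run wins.
    """
    out = []
    i = j = 0
    while i < len(older) and j < len(newer):
        key_old, key_new = older[i][0], newer[j][0]
        if key_old < key_new:
            out.append(older[i])
            i += 1
        elif key_new < key_old:
            out.append(newer[j])
            j += 1
        else:
            out.append(newer[j])  # newer shadows older
            i += 1
            j += 1
    out.extend(older[i:])
    out.extend(newer[j:])
    return out


class TinyStore:
    """Toy stand-in for the L0/L1 bookkeeping described above."""

    def __init__(self):
        self.l0 = []  # flushed runs, oldest first
        self.l1 = []  # compacted runs

    def attach_flushed(self, run):
        self.l0.append(run)
        if len(self.l0) >= L0_COMPACTION_THRESHOLD:
            self.compact_l0_to_l1()

    def compact_l0_to_l1(self):
        older, newer = self.l0[0], self.l0[1]
        self.l1.append(merge_sorted_runs(older, newer))
        del self.l0[:2]  # clean the inputs only after the output exists
```

Deleting the inputs after the merged output is appended mirrors the "merge, then clean" order in the function name.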
Phase D.1 bloom filter landed. std/db/sstable.jan grew a Kirsch-Mitzenmacher double-hash bloom filter. This is not an index — it is a cheap negative-membership test that lets the LSM reader skip SSTables that provably do not contain a key. The math is standard: two hash functions composed via g_i(x) = h1(x) + i * h2(x), no extra hash rounds needed. The implementation sits at the SSTable boundary because that is where the skip decision is made.
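A minimal Python sketch of the double-hash construction follows. It is not the std/db/sstable.jan code: the base hashes come from one BLAKE2b digest split in two (an assumption made for convenience), the probe index is reduced mod the bit count m, and the class and method names are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _base_hashes(self, key: bytes):
        # Assumption: derive h1 and h2 from a single 128-bit digest.
        digest = hashlib.blake2b(key, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # forced odd, so never zero
        return h1, h2

    def _positions(self, key: bytes):
        # Kirsch-Mitzenmacher: g_i(x) = h1(x) + i * h2(x), taken mod m.
        h1, h2 = self._base_hashes(key)
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "provably absent"; True means "maybe present".
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))
```

A reader would call might_contain once per SSTable and open the file only on True. A False answer is the skip decision the paragraph above describes.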
STL Phase 2 shipped. The std.stl.event_codec module now has a canonical byte encoder and decoder, plus derive_id_into — a fused encode-and-BLAKE3 helper that produces a deterministic event identifier from structured data without materialising the intermediate byte buffer. This is the encoding substrate the STL event pipeline needs.
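The fused encode-and-hash idea can be illustrated in Python. Everything specific here is an assumption: the field layout is invented, BLAKE2b from the standard library stands in for BLAKE3, and derive_id is a hypothetical analogue of derive_id_into rather than its actual signature.

```python
import hashlib
import struct


def encode_chunks(kind: int, timestamp: int, payload: bytes):
    """Yield the canonical encoding piece by piece (hypothetical layout)."""
    yield struct.pack("<H", kind)          # 2-byte event kind
    yield struct.pack("<Q", timestamp)     # 8-byte timestamp
    yield struct.pack("<I", len(payload))  # 4-byte length prefix
    yield payload


def encode(kind: int, timestamp: int, payload: bytes) -> bytes:
    """Materialised encoding, kept only for comparison."""
    return b"".join(encode_chunks(kind, timestamp, payload))


def derive_id(kind: int, timestamp: int, payload: bytes) -> bytes:
    """Feed each encoded piece straight into the hasher.

    No buffer holding the full encoding is ever built.
    BLAKE2b is a stand-in here; the source uses BLAKE3.
    """
    hasher = hashlib.blake2b(digest_size=32)
    for chunk in encode_chunks(kind, timestamp, payload):
        hasher.update(chunk)
    return hasher.digest()
```

The identifier is deterministic because the encoding is canonical: the same structured data always yields the same byte sequence, so the streamed digest equals the digest of the materialised buffer.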
SPEC-063 Phase 3.C completed. Attribute bytes are now folded into cid.zig’s SemanticEncoder. The CID-stability story for attributes is no longer theoretical — the encoder produces content-addressed byte representations that are stable across compiler sessions. Attributes are now first-class indexed citizens in the AST database.
SPEC-041 Phase 2.F landed. The import diagnostic renderer is wired. E4107 (transitive stdlib preloader) also shipped in this window. The compiler’s import error messages are now structured, not just parse-failure text.
SPEC-091 follow-up fixes. Four fixes: module-level slice-const _len companion, direct chained opt.?.field token merge, newline-after-= for multi-line initializers, and mutable-var Optional_Unwrap on ?*Struct. This is the kind of grinding correctness work that does not make headlines but prevents regressions.
W3720 warning introduced. The compiler now warns when a type-shaped expression appears at value position. This catches the common mistake of writing MyStruct when you meant MyStruct{} or MyStruct.init(). A small diagnostic with a large blast radius — every new Janus programmer will hit this at least once.
Compiler hygiene. allocPrint leaks plugged in lowerFieldCall and pending_monomorph cleanup. Parser refactored: parseIntoAstdb extracted from parseWithSource, analyzeImports factored out of analyzeWithASTDB. The codebase is getting cleaner while it gets larger. That is not free — it costs discipline.
A P1 silent miscompile: the 2D array story
This deserves its own section.
During LSM Lane 4 development, an ad-hoc Claude session discovered that writes to 2D byte-array struct fields silently no-op in the Janus QTJIR lowering. The pattern:
gs.l0_paths[slot][i] = src[i]
compiles cleanly, produces no warning, and does not write the byte. The read path returns wrong data — typically zeros or unrelated memory. The struct pointer is valid. Other fields on the same struct write and read correctly. The bug is specific to chained-index access on [N][M]u8 fields through a struct pointer.
The root cause is in the QTJIR lowering of the chained index. The address computation for gs.l0_paths[slot][i] must: offset into the struct, stride by slot (each slot is M bytes), then stride by i (1 byte). If the inner stride collapses or the base address is stale, every write lands in the same wrong location or nowhere. The companion gs.l0_path_lens[slot] = src_len write — a 1D array of usize — works correctly on the same struct. The failure is specific to the inner dimension.
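The address arithmetic above can be modelled in Python with a bytearray standing in for the struct. The constants and function names are hypothetical, and the "buggy" variant shows only one conceivable failure (the slot stride dropping out), not the confirmed QTJIR defect.

```python
FIELD_OFFSET = 16   # hypothetical offset of l0_paths inside the struct
MAX_SLOTS = 4       # N
MAX_LEN = 8         # M, bytes per slot


def element_offset(slot: int, i: int) -> int:
    """Correct address: field base + slot * M + i (element size is 1 byte)."""
    return FIELD_OFFSET + slot * MAX_LEN + i


def collapsed_offset(slot: int, i: int) -> int:
    """Hypothetical miscompile: the slot stride is lost, every slot aliases slot 0."""
    return FIELD_OFFSET + i


def write_path(mem: bytearray, offset_fn, slot: int, src: bytes) -> None:
    for i, byte in enumerate(src):
        mem[offset_fn(slot, i)] = byte


def read_path(mem: bytearray, slot: int, length: int) -> bytes:
    start = element_offset(slot, 0)
    return bytes(mem[start:start + length])


if __name__ == "__main__":
    good = bytearray(FIELD_OFFSET + MAX_SLOTS * MAX_LEN)
    write_path(good, element_offset, 2, b"abc")
    print(read_path(good, 2, 3))  # bytes written to slot 2 come back

    bad = bytearray(FIELD_OFFSET + MAX_SLOTS * MAX_LEN)
    write_path(bad, collapsed_offset, 2, b"abc")
    print(read_path(bad, 2, 3))   # slot 2 still holds zeros
```

Only a write followed by a read-back of the same slot exposes the difference. Nothing fails at the write itself, which is why the bug stayed silent.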
The workaround shipped in production: drop the 2D array field. The LSM Lane 4 code now manages paths externally and passes them to compaction functions. The struct no longer carries [MAX_L0_COUNT][MAX_L0_PATH_LEN]u8.
The regression probe is ready: tests/regression/probe_2d_struct_field_write_2026_05_11.jan exercises the exact failing pattern in isolation. When the QTJIR fix lands, this probe should go green and stay green.
A second, related fix also shipped: the cross-module struct-field layout encoder in findStructFieldLayoutEx was flattening qualified type names (p.Inner) into just the first token (p), producing an i64 slot instead of the correct struct type. This caused chained field accesses on foreign-typed struct fields to fail with MissingOperand. The fix walks the token range and reassembles the full qualified path. Two regression probes confirm it.
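The shape of the qualified-name fix can be shown with a toy token list in Python. The token representation, the KNOWN_STRUCTS table, and the fallback to i64 for unresolved names are assumptions made for illustration; findStructFieldLayoutEx itself is not reproduced here.

```python
# Hypothetical token stream for a field declaration: inner: p.Inner,
TOKENS = ["inner", ":", "p", ".", "Inner", ","]
TYPE_RANGE = (2, 5)  # half-open range covering the type expression

KNOWN_STRUCTS = {"p.Inner"}  # hypothetical registry of foreign struct types


def type_name_first_token(tokens, start, end):
    """Pre-fix behaviour as described: keep only the first token."""
    return tokens[start]


def type_name_qualified(tokens, start, end):
    """Post-fix behaviour as described: walk the range, rebuild the full path."""
    return "".join(tokens[start:end])


def resolve_layout(type_name: str) -> str:
    """Assumed fallback: an unresolved name is laid out as an i64 slot."""
    return type_name if type_name in KNOWN_STRUCTS else "i64"


if __name__ == "__main__":
    print(resolve_layout(type_name_first_token(TOKENS, *TYPE_RANGE)))
    print(resolve_layout(type_name_qualified(TOKENS, *TYPE_RANGE)))
```

With the truncated name, the lookup misses and the field silently degrades to a scalar slot. With the reassembled path, it resolves to the struct type.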
These two bugs are the same shape: the compiler accepts the syntax, produces no diagnostic, and generates wrong code. The 2D array case is still open (workaround in place, awaiting the lowering fix). The qualified-name case is closed. Both teach the same lesson.
Why now
The Janus compiler is in the phase where surface syntax is mostly done and the work is making the machinery honest. Atomics need correct memory ordering enforcement — not the keyword, the enforcement. LSM compaction needs correct merge — not the function signature, the byte-level merge. Bloom filters need correct hash composition — not the interface, the Kirsch-Mitzenmacher construction.
Every feature that shipped today has a correctness component that matters more than the syntax. The compiler is building the infrastructure to enforce these invariants at compile time or detect their violation at test time. The atomics module validates orderings. The LSM bloom filter skips impossible reads. The attribute encoder produces stable CIDs. The 2D array probe catches silent miscompiles.
The forcing function is the same as every previous day: the ecosystem is building toward self-hosting tooling, and self-hosting tooling cannot tolerate silent lies from the compiler.
Design decisions and tradeoffs
- Chosen path: Bloom filter at the SSTable boundary using Kirsch-Mitzenmacher double-hash (two hash functions, composed). This is the standard construction: constant-time membership test, bounded false-positive rate, no extra hash rounds needed beyond two.
- Rejected path(s): Full index structures (B-tree or sorted array) at the SSTable level. These would give exact membership but at higher construction and storage cost per SSTable.
- Why the rejection was correct: The bloom filter lives in the read path’s hot loop. An SSTable that does not contain the key should be skipped as fast as possible. A bloom filter answers “definitely not here” in constant time. A sorted array would answer “not here” in logarithmic time. For the LSM read path, the difference is measurable under load.
- Chosen path: SPEC-059 Ordering enum as a frontend-first validation surface. The compiler checks ordering constraints before lowering.
- Rejected path: Hardware-level ordering enforcement only, with no frontend validation.
- Why the rejection was correct: The hardware gives you atomic instructions with ordering semantics, but it does not tell you that Acquire on a store is meaningless. The compiler should. Catching semantic errors at the ordering-validation stage is cheaper than catching them at the codegen stage.
Junior Dev Nugget
- The principle being demonstrated: A silent miscompile is worse than a crash. A crash stops you. A miscompile lets you continue with wrong data. The most dangerous bugs in systems software are not the ones that fail loudly — they are the ones that produce plausible-looking results from wrong inputs.
- The mistake the reader would have made: Assuming that if the compiler accepts [N][M]u8 as a struct field type and generates code for indexed writes to it, the writes actually happen. The type system is not the runtime. Acceptance is not correctness.
- What to read or look at next: The Janus regression probe at tests/regression/probe_2d_struct_field_write_2026_05_11.jan. It is 20 lines. Read it. Understand why it tests what it tests. Then read the Kirsch-Mitzenmacher paper ("Less Hashing, Same Performance: Building a Better Bloom Filter") for the bloom filter construction that shipped the same day.
Ideological stance, grounded
- Position: Compiler correctness is non-negotiable, and silent miscompilation is a compiler bug, not a user error. A language that accepts a type pattern and generates wrong code without a diagnostic has failed at its most basic job.
- Engineering evidence drawn from the diff: The 2D array bug was caught not by the type system, not by the compiler’s diagnostic pass, but by integration tests that read back written bytes and compared them to the source. The fix belongs in the QTJIR lowering and is still pending. The probe is already in the regression suite, so the pattern is guarded as soon as the fix lands.
- Where this sits in the Libertaria mission: Sovereign systems require sovereign tooling. A compiler that silently miscompiles is a supply-chain attack surface. Janus is building a language where the compiler tells the truth, even when the truth is ugly. Today it told an ugly truth about its own lowering pass. The probe keeps it honest.
References
- Docs: Janus LSM design (std/db/lsm.jan), SSTable bloom filter (std/db/sstable.jan)
- Spec: SPEC-059 (Atomics), SPEC-063 (Attributes), SPEC-041 (Import Diagnostics), SPEC-091 (Capabilities)
- Repo / Commits: Janus/janus, 20 commits on unstable since b12a9a66, including 9647c379 (SPEC-059 A/B), ee8e51be (LSM Lane 4 v1), bd8e326d (bloom filter), da6e5b5f (STL Phase 2), fc7a2299 (SPEC-063 3.C)
- Bug reports: Janus/.agents/reports/2026-05-11-2d-array-struct-field-write-silent-noop.md, Janus/.agents/reports/2026-05-11-lsm-footer-field-xmod.md
What comes next
The QTJIR lowering fix for 2D byte-array struct-field writes. The regression probe is ready. The workaround is in production. The fix closes the gap and the probe keeps it closed. After that: SPEC-059 Phase C, the atomic builtin implementations.
— V.