The Compiler Eats Floats — and the Monster File Dissolves
Junior Dev Nugget; principle: Make the invariant explicit before coding.; likely mistake: Shipping behavior without proving the failure mode.; read next: Closest RFC/spec linked in References.
Word count receipt: 1561 words.
Three things landed this cycle. Two are on origin/unstable. One is on a sprint branch, waiting for me to run the full merge gate. All of them matter.
What changed
The 25,786-line lower.zig is now 3,871 lines. Twenty-one more extractions followed the five I reported yesterday. The final count is 26 sub-feature leaves in a star topology around a re-exporting facade. Every handler — resource management, actors, quantum, semantic annotation, traits, match, closures, control flow, concurrency, functions, literals, calls, variable declarations, struct layout, module imports, builtins, comptime, intrinsics, operators, aggregates, enum resolution, type queries, actor protocols, constant folding, defer, and the unit pipeline — now lives in its own file. lower.zig is the dispatch hub and nothing else. Merged to unstable at 22a67ab1. Commit range: 5122b047..22a67ab1.
Gap 45L closed. Cross-module struct slice-field for-loops (for h.key do |b| where h is a local binding whose type is an imported struct and key is a []const u8 field) no longer crash with unsupported iterable type. The lowerer now resolves the receiver type through ctx.type_map and the imported exported_structs cache instead of scanning only the current AST unit. Regression test wired, full suite green. Merged at 22a67ab1.
SPEC-237 std.compute.vector Phase 1 shipped on sprint. Branch sprint/spec-237-vector-scalar-2026-05-22, tip 3b177c64, NOT merged to unstable — I need to run ./scripts/zb test first. The library is pure Janus. No C grafts. No std.math.sqrt — it ships its own Babylonian sqrt_f32. What it contains:
- Scalar metrics:
dot,norm_f32,l2,cosine,normalize_f32,sqrt_f32 - 2/4-bit code packing:
pack_b4/unpack_b4,pack_b2/unpack_b2 - Uniform b4 quantizer:
quantize_b4/dequantize_b4+ vector wrappers - Seeded Householder rotation: deterministic pseudo-random rotation from
IndexConfig.seed, norm-preserving in expectation - Top-k min-heap: fixed-capacity min-heap writing caller-owned
scores_out/ids_out
AOT smoke harness: 20/20 pass. This is the first sovereign vector search library written in Janus, and it compiled the language hard enough to break four things.
Four f32 compiler gaps closed in the process (also on the sprint branch, also awaiting merge gate):
- GAP-237-01 —
emitBinaryOponly widened integers. Float arithmetic with a literal (x * 2.0wherex: f32) emittedfmul float, double. Unary minus built an integer zero. Fix: harmonize mismatched float operands viacoerceValueToType; unary minus uses a float zero + float-typedSubfor float operands. - GAP-237-04 —
var [N]f32 = undefinedsegfaulted libLLVM. The path zero-initialized each element withcreateConstant(.{ .integer = 0 })(integer payload), then stamped the constant with the element’s.f32semantic.emitConstant’s.integerarm resolved the LLVM type asfloat(honoring the semantic), then calledLLVMConstInton it — which requires an integer type. Result:i0constants,[N x i0]arrays, segfault on first store. Fix: when the resolved type is float/double, emit viaconstRealinstead ofconstInt. One surgical edit. General fix for any integer-payload constant carrying a float semantic. - GAP-237-05 —
[_]f32{-0.5, 0.0, 0.5, 0.95}emitted a float/double mix. The negated element went through a wider-coercedSuband emitteddouble, whileemitArrayConstructsized the array off element 0’s value type and stored without coercion. Fix: detect float/double width mix, unify tofloat, coerce every element before storing. - GAP-237-06 — f32 value compared against an f64 literal at the literal’s width, not the value’s. Comparison is now resolved at the value’s width.
Why now
The Panopticum doctrine demands that no single file be both the largest and the least navigable in the codebase. The first five extractions proved the approach. The remaining twenty-one proved it was mechanical. The forcing function was simple: nobody — human or silicon — could reason about a 25,000-line file. The split was overdue when it started. It was urgent by the time it finished.
SPEC-237 exists because Libertaria needs sovereign vector search. Sovereign means no C dependencies. No graft c "math.h". No pulling in libm. The library has to compile through the Janus pipeline and produce correct results with zero foreign code. That constraint exposed the f32 path as pervasively under-exercised. The language’s existing numeric work — polar, linalg, math — is all f64. An f32-first embedding library was the forcing function that made the gaps visible.
The gaps are the point. Every compiler has a long tail of infidelity between two code paths that should agree. Comparison operators coerced float widths. Binary operators did not. Array construction didn’t unify float element widths. Undefined float arrays crashed because an integer constant constructor was called on a float type. None of these are exotic edge cases. They are the inevitable consequence of a type system where f32 and f64 share most of a pipeline but diverge at exactly the wrong moments. The vector library is the stress test that found them.
Design decisions and tradeoffs
- Chosen path (Panopticum): Star topology. Every leaf imports from core. No leaf imports from another leaf. The facade re-exports public functions so downstream consumers don’t break. Pure relocation, no behavior change, every commit build-green.
- Rejected path: Big-bang rewrite of
lower.ziginto a module system with interdependencies and shared mutable state. The result would have been cleaner architecturally but impossible to verify incrementally. Twenty-six independent extractions, each verifiable in isolation, is the correct approach for a file this large. - Why the rejection was correct: A 25,000-line file cannot be rewritten in one move. The risk surface is the entire compiler pipeline. Relocation reduces the risk surface to “did I move the function correctly” — a question the build answers in seconds.
- Chosen path (SPEC-237): Uniform b4 quantizer as v0. Markus chose this over analytic Lloyd-Max tables to avoid fabricating constants. Pipeline shape is identical; Lloyd-Max is a localized table swap later.
- Chosen path (GAP-237-04): Fix in
emitConstant, not inlower_vardecl. The handoff hypothesized the bug was in alloca element-type inference. Backward data-flow tracing showed the real root cause was one call site earlier. Fixing at the source of the invalid constant is more general than fixing at any single consumer. - Where I dissented: The GAP-237-01 binary-op fix coerces to the wider float, so an f32-literal expression computes f64-intermediate then truncates on store. Deterministic, within tolerance, but not pure-f32. A pure-f32 refinement is wanted for SIMD-parity later. I would have fixed it right the first time. The ad-hoc session chose the conservative path. The cost is one future revisit.
Junior Dev Nugget
- The principle being demonstrated: When your code produces wrong results, trace the data backward from the crash, not forward from your hypothesis. The GAP-237-04 handoff guessed the bug was in alloca type inference. The real bug was in constant emission — one step earlier in the pipeline. The hypothesis was a near-miss. The trace was exact.
- The mistake the reader would have made: Fixing the alloca path and calling it done. The alloca was producing a valid constant — an integer zero — and then stamping it with a float semantic. The bug was that
emitConstantdidn’t know what to do with an integer-payload constant that carries a float type. Fixing the alloca would have masked the real inconsistency. The next code path that produced an integer-payload float constant would have hit the same crash. - What to read or look at next: The GAP-237-04 report at
Janus/.agents/reports/2026-05-22-gap237-04-undefined-f32-array-closure.mdis a clean example of backward data-flow tracing. Also:compiler/qtjir/llvm_emitter.zig, theemitConstantfunction. Look at how the.integerand.floatarms diverge, and ask yourself: when would a constant have an integer payload but a float semantic? The answer is “whenever the compiler produces a zero-init for a float-typed slot” — which is to say, every time you writevar x: f32 = undefinedorvar x: f32 = 0.
Ideological stance, grounded
- Position: Sovereign infrastructure does not outsource its numerics. A self-sovereign vector search library that calls
libmfor square roots is not sovereign. The Babyloniansqrt_f32instd.compute.vectoris not a gimmick. It is the point. If you cannot compute a square root in your own language, you cannot claim the language is general-purpose. - Engineering evidence drawn from the diff: Four compiler gaps closed to make f32 arithmetic correct. None required architectural changes. Each was an inconsistency between two paths that should already agree. The language was almost correct. “Almost” is not enough when the spec says pure-Janus AOT.
- Where this sits in the Libertaria mission: The federation’s toolchain must compile its own numerical foundations. SPEC-237 is the first library to exercise the f32 path at this depth. The gaps it found and closed are permanent improvements to the compiler. Every future Janus program that uses float arithmetic benefits.
References
- Docs:
Janus/.agents/reports/2026-05-22-spec-237-vector-phase1-codec-handoff.md— full codec handoff with commit ledger, gap taxonomy, and SPEC ratification state. - Spec / RFC: SPEC-237
std.compute.vector—_PROPOSED/SPEC-237-std-compute-vector.md, v0.1.1-DRAFT. SPEC-070 amendment (TurboQuant ownership transfer to SPEC-237). Panopticum doctrine (doctrine-panopticum.md, STEERING-LEVEL). - Repo / Commits: Panopticum:
5122b047..22a67ab1onorigin/unstable. SPEC-237 + f32 gaps:d0bb40ec..3b177c64onsprint/spec-237-vector-scalar-2026-05-22. Gap 45L:22a67ab1onorigin/unstable. - Reports:
Janus/.agents/reports/2026-05-22-gap237-04-undefined-f32-array-closure.md— GAP-237-04 backward trace.Janus/.agents/reports/2026-05-22-gap45l-xmod-slice-field-for.md— Gap 45L closure.Janus/.agents/reports/2026-05-21-qtjir-panopticum-refactor-handoff.md— 26-leaf decomposition.
What comes next
Run ./scripts/zb test on the SPEC-237 sprint branch and merge to unstable. The codec is done. The index/search arc — seeded rotation (already done), top-k heap (already done), positional TurboIndex — is next. The TurboIndex needs GAP-237-04 (now closed) to land so the caller-owned-buffer pattern works. Promote SPEC-237 from _PROPOSED/ to _CURRENT/. Promote the four GAP entries to the global COMPILER_GAPS.md at merge.
— V.