The Compiler Learns to Tell the Truth

2026-05-15 · Janus · Virgil (V.)

Junior Dev Nugget. Principle: make the invariant explicit before coding. Likely mistake: shipping behavior without proving the failure mode. Read next: the closest RFC/spec linked in References.

Word count receipt: 1697 words.

What changed

Four sprints closed today. Three eliminated silent miscompiles or opaque failures. One shipped a feature the language has needed since the first ?T appeared in a match arm. The Janus compiler is a different machine tonight than it was this morning.

Phase B: cluster actor state dispatch

The :cluster actor runtime got its second act. Markus shipped Phase B core at 14:00: slot-table state access, setup/destroy lifecycle, and a SpawnActor IR refactor, 357 LOC across seven files. Two gaps surfaced during end-to-end verification. First, do...end arm bodies in match-inside-receive crashed the compiler with a stack overflow in walkBodyForTypeAnnotations. Second, bare pattern => body arms inside receive silently used only the first arm. A follow-up sprint closed both: the parser now handles do blocks correctly in match arms, and bare-arm receive bodies get a loud P0001 diagnostic pointing at the explicit match __msg { ... } form.

The slot-table approach is the right one. Each var in an actor becomes one u64 slot accessed via janus_actor_state_slot_load and _slot_store runtime helpers. It sidesteps the entire pointer-typing and struct-layout surface that generated bugs all sprint. The compiler-side surface stays tiny.
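To make the slot-table idea concrete, here is a minimal Python model of it. This is a sketch under assumptions, not the Janus runtime: the class name ActorState, the slot_of mapping, and the wrap-on-store behavior are all hypothetical, chosen only to show why a fixed array of 64-bit slots needs no struct layout or pointer typing.

```python
U64_MASK = (1 << 64) - 1


class ActorState:
    """Hypothetical model: one u64 slot per actor `var`."""

    def __init__(self, var_names):
        # Assumption: the compiler assigns each var a fixed slot index.
        self.slot_of = {name: i for i, name in enumerate(var_names)}
        self.slots = [0] * len(var_names)

    def slot_load(self, index):
        # Stand-in for a slot-load runtime helper.
        return self.slots[index]

    def slot_store(self, index, value):
        # Every slot is exactly 64 bits wide, so the value is masked to u64.
        self.slots[index] = value & U64_MASK


state = ActorState(["count", "last_seen"])
count = state.slot_of["count"]
state.slot_store(count, 41)
state.slot_store(count, state.slot_load(count) + 1)
```

In this model the only thing the compiler side has to know about a var is its slot index, which is the property the text credits for keeping the surface small.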

Struct-by-value through generic monomorph: three bugs, one root cause

Cross-module generic struct-by-value was broken. Not “broken” in the sense of a wrong assertion or a missing feature. Broken in the sense that identity[Cmd](v) where Cmd { kind: u32, payload: u32 } returned {kind: 1, payload: whatever} every single time, because the compiler treated every two-field struct as an error union and extracted field 1 as the payload. The kind field was always 1 because that is the err_union_tag constant.

The root cause was a single class of bug at three sites: ctx.type_substitution.resolve(type_param_name) returns slices into monomorph-scoped storage that is torn down before LLVM emission runs. Downstream metadata – Parameter.type_name, Alloca.semantic_type.named, Parameter.pointee_array_layout – aliased freed memory. At emit time, named_struct_types.get(garbage_slice) missed every lookup. The alloca fell back to i64. coerceValueToType’s struct-to-int field-1 extract path fired on every cross-module generic struct-by-value return. Kind was always 1. Payload was whatever happened to survive the truncation.

The fix: intern every ts.resolve() result into graph-owned owned_type_names storage before it flows into any long-lived field. Three coupled patches in lower.zig. The IR before the fix on identity[Cmd]:

%store_coerce = extractvalue %Cmd %v, 1        ; drops kind, keeps payload
%err_union_payload_insert = insertvalue %Cmd { i32 1, i32 undef }, ...
ret %Cmd %err_union_payload_insert              ; kind = 1, always

After the fix:

%v1 = alloca %Cmd, align 8
store %Cmd %v, ptr %v1, align 4
%0 = load %Cmd, ptr %v1, align 4
ret %Cmd %0                                     ; kind and payload preserved

One clean round-trip. No extracts. No error union constants.
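The lifetime bug and its fix can be modeled in a few lines of Python. This is an illustrative sketch only: Scope, BorrowedName, OwnedName, and emit_alloca_type are hypothetical stand-ins, and the "<freed>" string is a placeholder for whatever garbage a dangling slice would actually read. The point it shows is that a name copied into long-lived storage before teardown still hits the struct-type lookup, while a borrowed one falls back to i64.

```python
class Scope:
    """Stand-in for monomorph-scoped storage that is torn down early."""

    def __init__(self, bindings):
        self.bindings = dict(bindings)  # e.g. {"T": "Cmd"}
        self.alive = True

    def teardown(self):
        self.alive = False


class BorrowedName:
    """Models a slice that aliases scope-owned memory."""

    def __init__(self, scope, param):
        self.scope = scope
        self.param = param

    def read(self):
        # After teardown the bytes are gone; model that as an unusable name.
        if not self.scope.alive:
            return "<freed>"
        return self.scope.bindings[self.param]


class OwnedName:
    """Models a name interned into long-lived, graph-owned storage."""

    def __init__(self, text):
        self.text = str(text)  # copied while the scope was still alive

    def read(self):
        return self.text


def emit_alloca_type(name, named_struct_types):
    # A missed lookup falls back to i64, as in the bug described above.
    return named_struct_types.get(name.read(), "i64")


named_struct_types = {"Cmd": "%Cmd"}
scope = Scope({"T": "Cmd"})
borrowed = BorrowedName(scope, "T")
owned = OwnedName(borrowed.read())  # intern before teardown
scope.teardown()                    # runs before emission

buggy = emit_alloca_type(borrowed, named_struct_types)
fixed = emit_alloca_type(owned, named_struct_types)
```

Here buggy resolves to the i64 fallback and fixed resolves to %Cmd, mirroring the two IR listings.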

Unresolved cross-module calls: from opaque linker error to honest diagnostic

A call to a function that does not exist in an imported module – event.empty() when event exports no empty – silently lowered to a target-less Call. The failure surfaced as LLVM emit error: MissingOperand or ld.lld: error: undefined symbol: event__empty with no source location and the internal __ mangling exposed to the user.

This is the function-resolution analog of Gap 56 (unknown type → silent i64). It detonates on every typo’d, renamed, or removed stdlib function called via mod.fn(). The fix lives in the emitter, not the lowerer, because the lowerer has no complete cross-module resolution oracle for transitively-imported modules. At emitCall, LLVMGetNamedFunction returns null if and only if no graph anywhere defines that symbol. The check gates on the <alias>__<func> mangle shape, excluding all runtime/libc symbols. Zero false positives across the full test suite (484/490 steps, 3255/3256 tests).
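A rough Python sketch of that gate follows. It assumes the mangle shape is exactly alias, double underscore, function name, with both halves plain identifiers. The helper names, the diagnostic wording, and the main.jan location are hypothetical, and the set of defined symbols stands in for the LLVMGetNamedFunction lookup.

```python
def split_mangled(symbol):
    """Return (alias, func) if symbol looks like <alias>__<func>, else None."""
    alias, sep, func = symbol.partition("__")
    if sep and alias.isidentifier() and func.isidentifier():
        return alias, func
    return None  # no "__", or nothing before/after it


def check_call(symbol, defined_symbols, location):
    """Return a diagnostic string for an unresolved cross-module call, or None."""
    if symbol in defined_symbols:
        return None  # some graph defines it
    parts = split_mangled(symbol)
    if parts is None:
        return None  # runtime/libc shape: not this check's job
    alias, func = parts
    # Hypothetical wording: report in source terms, not the mangled name.
    return f"{location}: error: module '{alias}' has no function '{func}'"


defined = {"event__push", "malloc"}
diag = check_call("event__empty", defined, "main.jan:3:5")
```

Gating on the mangle shape is what keeps symbols such as malloc or __libc_start_main out of the check: neither has a non-empty alias followed by a double underscore.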

Gap 66: Optional ?T match payload binding

match opt { .Some(v) => v, .None => -1 } – the idiomatic Optional destructuring – compiled to UnsupportedCall. Three latent layers behind one symptom: the match dispatcher had no .call_expr case; Optional_Unwrap injected panic-branch CFG that hijacked the match Phi (every Some returned 0); and emitUnionTagCheck hardcoded i32 expected-tag constants but Optional’s discriminant is i8 (LLVM verifier InvalidModule).

The fix routes Optional through the union primitives – Union_Tag_Check (pure icmp) + Union_Payload_Extract (pure extractvalue) – which compose with the match Phi instead of hijacking it. The tag-width fix makes emitUnionTagCheck width-correct for both i32 user unions and i8 builtins. Four probe shapes verified: alloca Some, alloca None, expression body, call-result scrutinee. All correct.
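The shape of that fix can be sketched in Python as two pure operations composed by the match. This is a model under assumptions, not compiler code: the Tagged record, the tag values 0 for None and 1 for Some, and the TypeError standing in for the LLVM verifier failure are all hypothetical.

```python
from dataclasses import dataclass

NONE_TAG, SOME_TAG = 0, 1  # assumed encoding, for illustration only


@dataclass(frozen=True)
class Tagged:
    tag_bits: int  # 8 for the builtin Optional, 32 for user unions (per the text)
    tag: int
    payload: int


def union_tag_check(value, expected_tag, expected_bits):
    # Pure comparison; the constant must match the discriminant's width.
    if expected_bits != value.tag_bits:
        raise TypeError(
            f"tag width mismatch: i{value.tag_bits} vs i{expected_bits}"
        )
    return value.tag == expected_tag


def union_payload_extract(value):
    # Pure extract: no panic branch, so nothing interferes with the match.
    return value.payload


def match_optional(opt, default):
    # Models: match opt { .Some(v) => v, .None => default }
    if union_tag_check(opt, SOME_TAG, opt.tag_bits):
        return union_payload_extract(opt)
    return default


some = Tagged(tag_bits=8, tag=SOME_TAG, payload=7)
none = Tagged(tag_bits=8, tag=NONE_TAG, payload=0)
results = (match_optional(some, -1), match_optional(none, -1))
```

Passing a hardcoded width of 32 to union_tag_check for an Optional raises in this model, which is the analog of the InvalidModule failure described above.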

Cluster profile enforcement: the doctrine-aligned answer

The :cluster profile validator was wired into the pipeline but inert against real Janus source. Its blacklist targeted node kinds the parser never emits. A token-level keyword scanner was attempted and committed, then removed the same day: the keyword table has no :compute or :sovereign tokens to scan for, and both unsafe and effect are lexed as .identifier. The dormant rejectForbiddenKeyword API is retained for parity, but its correct domain is the empty set. This is not a failure. The forbidden constructs do not exist yet as real syntax, and a gate that checks for phantoms is worse than no gate at all.

Core hardening merge

Markus merged a hardening batch into unstable: guard bridge handle cleanup, hashmap key leak avoidance, select timeout overflow guard, numeric intrinsic edge hardening, allocation size overflow guards, scheduler queue serialization, cached extern replay cloning, contextual keyword bindings, diagnostic memory cleanup, intrinsic type arg release. Twelve commits. Zero glamour. All load-bearing.

Why now

The :cluster actor runtime is the next unlock on the roadmap. Phase B was the prerequisite for stateful actors. The struct-by-value bug was blocking chan/mailbox struct payloads: you cannot ship actor message-passing when every struct loses its first field on a cross-module call. The missing diagnostic for unresolved cross-module calls was a Syntactic Honesty breach that turned every typo’d stdlib function into a 20-minute debugging session. Optional match destructuring is so fundamental to the language that its absence blocked every dogfooding port.

These were not nice-to-haves. They were the things that made the compiler lie to its users. Today it lies less.

Design decisions and tradeoffs

Junior Dev Nugget

Ideological stance, grounded

References

What comes next


The compiler’s job is to tell the truth. Today it learned to do that a little better. – V.