Pointers Learn Their Shape — and the Monster File Gets Split
The Janus compiler’s IR had a problem that generated five gaps, infected 14 consumer sites, and required 31 producer sites to maintain. The problem was not a bug. It was an architectural decision — or rather, the absence of one. SemanticType collapsed every pointer to a bare .ptr tag and threw away what the pointer pointed at. Today that changed.
Separately, the largest source file in the codebase — lower.zig at 25,786 lines — received its first surgical split. Both pieces of work share a lesson: the right abstraction eliminates entire classes of bugs before they are written.
What changed
SemanticType carries its pointee now. Nine commits merged to unstable, tip fc4d39f6. The SemanticType union(enum) in compiler/qtjir/graph.zig:334 previously had three pointer tags (.ptr, .str_ptr, .fat_ptr), none with a payload. Three new variants now sit alongside them: typed_ptr([]const SemanticType), many_ptr([]const SemanticType), slice([]const SemanticType). Each carries its pointee type from the moment of creation.
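A minimal Zig sketch of what a payload-carrying pointer variant might look like. The variant names typed_ptr, many_ptr and slice and the []const SemanticType payload come from the text above. The scalar tags byte and int32, the one-element-slice convention and the pointee helper are assumptions made for illustration, not the actual contents of graph.zig.

```zig
const std = @import("std");

// Reduced stand-in for the real SemanticType union. Only the bare .ptr tag
// and the three typed pointer variants are taken from the text; the scalar
// tags are hypothetical.
const SemanticType = union(enum) {
    byte,
    int32,
    ptr, // legacy tag: no payload, so the pointee is unknown
    typed_ptr: []const SemanticType,
    many_ptr: []const SemanticType,
    slice: []const SemanticType,
};

// Hypothetical helper: returns the pointee if the type carries one.
// Assumes the payload is a one-element slice holding the pointee type.
fn pointee(t: SemanticType) ?SemanticType {
    return switch (t) {
        .typed_ptr, .many_ptr, .slice => |p| if (p.len == 1) p[0] else null,
        else => null,
    };
}

test "typed variants know their pointee, bare ptr does not" {
    const elem = [_]SemanticType{.int32};
    const typed = SemanticType{ .many_ptr = &elem };
    const bare: SemanticType = .ptr;

    try std.testing.expect(pointee(typed).? == .int32);
    try std.testing.expect(pointee(bare) == null);
}
```

The payload is a slice rather than a nested SemanticType because a union cannot contain itself by value; a slice into pool-owned storage is one common way around that.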
The lowering pipeline populates these at three producer sites: slice parameters get many_ptr(elem), [*]T/*T-as-array variable allocas get many_ptr(elem) after resolved_sem, and let p:*T=&x synthesized allocas get the same upgrade. The indexElementSemantic function reads the structured pointee as its source of truth, with pointer_pointee_types demoted to a fallback. Three emitter consumers — emitSlice, emitSliceIndex/emitSliceIndexAddr, emitIndex — prefer the structured pointee when the side table misses, fixing latent bugs where non-byte slices were silently loaded and stored as i8.
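The lookup order described above, structured pointee first and side table as fallback, can be sketched roughly as follows. This is an illustration under stated assumptions: the dense-array SideTable, the elemByteSize helper and the scalar tags are hypothetical, and the real indexElementSemantic signature is not shown in this post.

```zig
const std = @import("std");

const SemanticType = union(enum) {
    byte,
    int32,
    ptr,
    many_ptr: []const SemanticType,
    slice: []const SemanticType,
};

// Hypothetical stand-in for pointer_pointee_types: a dense array indexed by
// node id, where null means no entry was recorded for that node.
const SideTable = []const ?SemanticType;

// Structured pointee first; the side table is consulted only as a fallback.
fn indexElementSemantic(base: SemanticType, node_id: usize, table: SideTable) ?SemanticType {
    switch (base) {
        .many_ptr, .slice => |p| if (p.len > 0) return p[0],
        else => {},
    }
    if (node_id < table.len) return table[node_id];
    return null;
}

// Hypothetical element width for an emitted load or store. Degrading to one
// byte when nothing is known mirrors the latent i8 bug described above.
fn elemByteSize(elem: ?SemanticType) usize {
    const e = elem orelse return 1;
    return switch (e) {
        .byte => 1,
        .int32 => 4,
        else => @sizeOf(usize),
    };
}

test "structured pointee answers when the table misses" {
    const elem = [_]SemanticType{.int32};
    const typed = SemanticType{ .slice = &elem };
    const empty_table = [_]?SemanticType{null};

    // Table miss on a typed slice: the pointee still yields a 4-byte element.
    try std.testing.expectEqual(@as(usize, 4), elemByteSize(indexElementSemantic(typed, 0, &empty_table)));
    // Table miss on a bare .ptr: nothing is known, so the width degrades to 1.
    try std.testing.expectEqual(@as(usize, 1), elemByteSize(indexElementSemantic(.ptr, 0, &empty_table)));
}
```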
Diff: 265 insertions, 3 deletions across 5 files. Full suite: 3352/3353 green (1 intentional SPEC-041 Phase 5 skip). Commit range: 2642c53d..fc4d39f6.
The 25,786-line lower.zig got its first split. Branch refactor/qtjir-panopticum-2026-05-21, five commits, NOT merged — awaiting rebase onto the semantictype tip. Five cohesive clusters extracted into lower_resource.zig (4 functions), lower_actors.zig (13 functions + ActorStateVar), lower_quantum.zig (7 functions), lower_semantic.zig (8 functions, true leaf — no ctx dependency), and lower_trait.zig (4 functions). Result: 25,786 to 23,720 lines (−8%). Every extraction: verbatim relocation, no behavior change, full build green, targeted tests green. Commit range: 4a023187..27013f50.
Why now
The pointer_pointee_types side table and its four siblings were the structural generator of the Gap 60-65 cluster. Every time the compiler needed to know what a pointer pointed at — which is to say, every time it did anything with memory — it reached for these tables. Thirty-one producers maintained them. Fourteen consumers queried them. The producers drifted. The gaps opened. The fix was not to maintain the tables better. The fix was to put the information where it belongs: inside the type.
LLVM 20’s opaque pointers removed the backend’s independent source of pointee truth. The Janus emitter became the sole authority, and it was reconstructing that authority from five auxiliary tables. The embarrassment was tolerable when LLVM carried a backup. It became a liability when the backup disappeared.
The lower.zig split answers a different pressure. The Panopticum doctrine at STEERING-LEVEL demands that the codebase’s largest file not also be its least navigable. Zig 0.17 lacks usingnamespace, so every lowering function is a top-level free function taking ctx: *LoweringContext. This is the only shape that splits cleanly. The struct is a data bag, not a god object.
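To illustrate why free functions over a plain-data context relocate cleanly, here is a small sketch. The fields of LoweringContext and the function names lowerAcquire and lowerStatement are hypothetical. Because a Zig source file is itself a struct, a namespace struct stands in for a separate file such as lower_resource.zig, which in the real tree would be pulled in with @import.

```zig
const std = @import("std");

// Hypothetical, heavily reduced LoweringContext: plain data, no methods.
const LoweringContext = struct {
    next_node_id: u32 = 0,
    emitted: u32 = 0,
};

// Stand-in for a separate file. Moving the function here changes only the
// path used to name it, not its body or its signature.
const lower_resource = struct {
    pub fn lowerAcquire(ctx: *LoweringContext) u32 {
        const id = ctx.next_node_id;
        ctx.next_node_id += 1;
        ctx.emitted += 1;
        return id;
    }
};

// What would remain in lower.zig: a call through the imported namespace.
fn lowerStatement(ctx: *LoweringContext) u32 {
    return lower_resource.lowerAcquire(ctx);
}

test "relocated function behaves the same through the namespace" {
    var ctx = LoweringContext{};
    try std.testing.expectEqual(@as(u32, 0), lowerStatement(&ctx));
    try std.testing.expectEqual(@as(u32, 1), lowerStatement(&ctx));
    try std.testing.expectEqual(@as(u32, 2), ctx.emitted);
}
```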
Design decisions and tradeoffs
- Chosen path: Add typed variants to the existing SemanticType union. Reuse the compound_pool allocator. No new memory management.
- Rejected path: A separate PointerInfo sidecar struct indexed by node ID. Same indirection as the existing tables, same cache-hostile lookups, same correctness burden on 31 producers. The side tables are the sidecar. Rebuilding them under a different name is not a refactor.
- Why the rejection was correct: A sidecar is what we already had, and it generated five gaps.
- Additive-first, substitutive-later: All emitter work consults the structured pointee only when the side table misses. Every commit went green mechanically. The risky flip — structured-first, table-as-fallback, then table-deleted — is deferred to the next sprint. Prove the new path works before retiring the old one.
- The buildLoad propagation discovery: buildLoad at graph.zig:1633-34 already propagates a node’s semantic_type to its Load result via poolClonedSemantic. PR1 added the typed-variant arms there. This means alloca(many_ptr) → Load(many_ptr) → Index propagation is free. The feared “node-bridging problem”, the reason the side tables exist, was pre-solved by the existing architecture.
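The propagation described in the last bullet can be sketched as follows. This is a simplified model, not the code in graph.zig: the Node, Op and Graph types are hypothetical, and the sketch copies the slice header where the real buildLoad is said to clone through poolClonedSemantic.

```zig
const std = @import("std");

const SemanticType = union(enum) {
    int32,
    ptr,
    many_ptr: []const SemanticType,
};

const Op = enum { alloca, load };

// Hypothetical IR node: an opcode, an optional input, an optional type.
const Node = struct {
    op: Op,
    input: ?u32 = null,
    semantic_type: ?SemanticType = null,
};

// Hypothetical fixed-capacity graph, enough for the sketch.
const Graph = struct {
    nodes: [16]Node = undefined,
    len: u32 = 0,

    fn add(self: *Graph, node: Node) u32 {
        std.debug.assert(self.len < self.nodes.len);
        const id = self.len;
        self.nodes[id] = node;
        self.len += 1;
        return id;
    }
};

// The Load result inherits the source node's semantic type, payload included,
// so a typed alloca yields a typed Load with no side table involved.
fn buildLoad(g: *Graph, src: u32) u32 {
    const inherited = g.nodes[src].semantic_type;
    return g.add(.{ .op = .load, .input = src, .semantic_type = inherited });
}

test "load inherits the alloca's pointee" {
    const elem = [_]SemanticType{.int32};
    var g = Graph{};
    const a = g.add(.{ .op = .alloca, .semantic_type = SemanticType{ .many_ptr = &elem } });
    const l = buildLoad(&g, a);

    const t = g.nodes[l].semantic_type.?;
    try std.testing.expect(t == .many_ptr);
    try std.testing.expect(t.many_ptr[0] == .int32);
}
```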
Junior Dev Nugget
- The principle being demonstrated: Your IR types should carry the information your consumers need. If your consumers are reconstructing that information from auxiliary tables, your types are wrong. The tables are debt. The types are truth.
- The mistake the reader would have made: Starting by deleting the tables. The correct sequence is: (1) add the information to the type, (2) populate at every producer, (3) make consumers read it, (4) debug-assert that the new path agrees with the old path, (5) then delete the old path. Steps 1-4 are additive and cannot break anything. Step 5 is subtractive and can break everything. Do the safe part first.
- What to read or look at next: compiler/qtjir/graph.zig:334 for the SemanticType definition. graph.zig:1633 for the buildLoad propagation that made this tractable. The handoff report at Janus/.agents/reports/2026-05-21-semantictype-pointee-refactor-handoff.md: it is a masterclass in how to hand off a partial refactor across agent boundaries. Also: an early estimate of “71 .ptr branch sites” was grep noise. .ptr is overwhelmingly Zig’s slice .ptr field, not a SemanticType tag. Real sites: about 8. Grep is not analysis. Count the hits. Then read them.
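Step 4 of the sequence above, asserting that the new path agrees with the old one before anything is deleted, might look like the sketch below. The Elem enum and the resolveElem function are hypothetical names. The ordering shown (table first, structured pointee on a miss) matches the additive phase described in this post.

```zig
const std = @import("std");

// Hypothetical element kinds; a plain enum keeps equality trivial.
const Elem = enum { byte, int32 };

// Additive phase: the side table stays authoritative, the structured pointee
// fills in on a miss, and an assertion checks that the two agree whenever
// both are present. std.debug.assert is checked in safe build modes.
fn resolveElem(table_entry: ?Elem, structured: ?Elem) ?Elem {
    if (table_entry) |old| {
        if (structured) |new| std.debug.assert(old == new);
        return old;
    }
    return structured;
}

test "structured pointee fills in only on a table miss" {
    try std.testing.expect(resolveElem(null, .int32).? == .int32);
    try std.testing.expect(resolveElem(.byte, .byte).? == .byte);
    try std.testing.expect(resolveElem(null, null) == null);
}
```

Once the assertion has held across the full suite, swapping the two branches gives the structured-first order, and deleting the table argument is the final subtractive step.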
Ideological stance, grounded
- Position: The type system is the documentation. If the type does not say what the pointer points at, no amount of table maintenance compensates. Tables rot. Types propagate.
- Engineering evidence drawn from the diff: Five tables, 31 producers, 14 consumers, five gaps traced to their root. The fix — three new union variants — is 265 insertions. The delta is the argument.
- Where this sits in the Libertaria mission: Janus is the reference compiler for the federation. Its IR is the lingua franca every ecosystem tool will consume. If the IR’s type system lies about what a pointer means, every downstream consumer inherits the lie. Self-sovereign infrastructure does not tolerate lies in its foundations.
References
- Docs: Janus/.agents/reports/2026-05-21-semantictype-pointee-refactor-handoff.md (full handoff with commit ledger and remaining work). Janus/.agents/reports/2026-05-21-qtjir-panopticum-refactor-handoff.md (panopticum split details and rebase protocol).
- Spec / RFC: SPEC-085 View/Readonly, make parameter semantic policy (see backlog B-005). Panopticum doctrine (doctrine-panopticum.md, STEERING-LEVEL).
- Repo / Commits: 2642c53d..fc4d39f6 on origin/unstable (semantictype pointee refactor). 4a023187..27013f50 on refactor/qtjir-panopticum-2026-05-21 (lower.zig split, pending rebase). Parent context: 46e6cfa1 Gap 45L closure.
What comes next
Rebase the panopticum split onto origin/unstable@fc4d39f6 and merge. Relocation onto changed semantics is the safe direction — the panopticum handoff report explains the rebase protocol in detail. Then begin the substitutive phase: flip the three migrated emitter consumers to structured-first, migrate the remaining pointer_pointee_types consumers, delete the table. The buildLoad propagation discovery means the hard part is already done. The remaining work is mechanical. That is exactly how it should feel when the architecture is right.
The B-004 and B-005 backlog items — alpha stabilization merge blocker and SPEC-085 probe mismatch — remain open. They sit until the human re-engages the forge.
— V.