Changelog

All notable changes to mino, following Keep a Changelog format.

v0.422.1 — Amalgamate Symbol Collision Fix

Renames the static kw_match helper in src/prim/bits.c to bits_kw_match. The amalgamate pipeline concatenates every .c file into a single dist/mino.c translation unit, where the identically-named static in src/prim/module.c collided. Per-file TUs in the standalone build link cleanly without the rename, so local tests passed but release-gate (which exercises amalgamate) caught it post-push.
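The collision can be reproduced in miniature. The sketch below stands in for the amalgamated translation unit; both helper signatures are hypothetical, since this entry names the functions but not their parameters.

```c
#include <assert.h>
#include <string.h>

/* ---- as if pasted from src/prim/module.c (signature is hypothetical) ---- */
static int kw_match(const char *kw, const char *name)
{
    assert(kw != NULL && name != NULL);
    return strcmp(kw, name) == 0;
}

/* ---- as if pasted from src/prim/bits.c (signature is hypothetical) ---- */
/* A second `static int kw_match(...)` defined here would be a redefinition
 * error, because both definitions now live in one translation unit. Compiled
 * as separate .c files, each static has internal linkage and the two never
 * meet, which is why the per-file build linked cleanly. */
static int bits_kw_match(const char *kw, const char *name, size_t len)
{
    assert(kw != NULL && name != NULL);
    return strlen(kw) == len && strncmp(kw, name, len) == 0;
}
```

Prefixing file-local helpers with the file's own name keeps them unique after concatenation.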

v0.422.0 — Post-Canon-Gaps Close-Out

No runtime change. Three sections of the C API stay UNSTABLE through the alpha series and are explicitly marked (MINO_UNSTABLE_THREADPOOL, MINO_UNSTABLE_GC, MINO_UNSTABLE_ALLOC_PROFILE); the rest of src/mino.h is the stable surface.

v0.421.0 — C++ RAII Wrapper + API Audit

New optional header src/mino.hpp ships thin RAII wrappers over the C API for C++ embedders:

Header-only, C++14-compatible, no external dependencies. The full C API stays the canonical surface; the wrapper is opt-in. A worked example lives at use-cases/cpp_raii.cpp in mino-examples.

src/mino.h top-of-file disclaimer rewritten: the core surface is feature-complete pending a user-approved v1.0.0 tag; three sections stay UNSTABLE through the alpha series and are explicitly marked (MINO_UNSTABLE_THREADPOOL, MINO_UNSTABLE_GC, MINO_UNSTABLE_ALLOC_PROFILE). Adds the post-1.0 strict-SemVer policy statement.

v0.420.0 — Documentation Refresh

Companion-repo refresh covering the v0.409-v0.419 series:

No runtime change.

v0.419.0 — Bit-Syntax Cycle Close

Companion-repo refresh for the MINO_BYTES + bit-syntax surface shipped through v0.415-v0.418:

No mino runtime code change; the version bump pins the companion-repo refresh to the same banner as the rest of the cycle.

v0.418.0 — let-bits Destructuring Macro

let-bits binds a sequence of named fields from a bytes value at running bit offsets. Each segment is a vector [symbol & options] matching the bits-get surface (:size, :type, :endian, :signed?). The terminating :type :bytes segment without an explicit :size binds the remaining bit-aligned tail.

(let-bits [packet [a :size 16] [b :size 32 :endian :little] [tail :type :bytes]] (do-stuff a b tail))
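The running-offset arithmetic behind the example can be sketched in C. This is an illustration only, not mino's implementation: read_bits_be and reverse_bytes are hypothetical helpers, bounds checking is left to the caller, and big-endian as the default byte order is an assumption borrowed from Erlang's bit syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read `nbits` (1..64) starting at absolute bit offset `bit_off`,
 * most-significant bit first. The caller guarantees the buffer is long enough. */
uint64_t read_bits_be(const uint8_t *buf, size_t bit_off, unsigned nbits)
{
    uint64_t v = 0;
    assert(buf != NULL && nbits >= 1 && nbits <= 64);
    for (unsigned i = 0; i < nbits; i++) {
        size_t bit = bit_off + i;
        unsigned b = (buf[bit / 8] >> (7 - bit % 8)) & 1u;
        v = (v << 1) | b;
    }
    return v;
}

/* Reverse the low `nbytes` bytes of `v`; turns a big-endian read of a
 * whole-byte field into its little-endian interpretation. */
uint64_t reverse_bytes(uint64_t v, unsigned nbytes)
{
    uint64_t out = 0;
    assert(nbytes <= 8);
    for (unsigned i = 0; i < nbytes; i++) {
        out = (out << 8) | (v & 0xffu);
        v >>= 8;
    }
    return out;
}
```

For the packet above, a is read at bit offset 0 (16 bits), b at bit offset 16 (32 bits, then byte-reversed), and tail starts at bit offset 48, i.e. byte 6.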

Also fixed a pre-existing bug surfaced while wiring let-bits: (apply hash-map (rest some-vec)) produced {:first-key nil} because the prim walker did not force the lazy tail that prim_rest_step returns for vector seqs. The walker now forces.

v0.417.0 — Bit Syntax + Chunked Bytes Seq

Erlang-inspired bit-syntax surface for binary-data construction and field-level read access. All three new operations live on top of MINO_BYTES so anything that satisfies bytes? or bitstring? works through them.

Internally mino_bytes_seq now produces a MINO_CHUNKED_CONS spine of 32-element chunks matching how vector seq works, so map/filter/take/drop/reduce propagate chunkedness end-to-end on bytes values. (chunked-seq? (seq bytes)) is true.

(byte-array ...) accepts a lazy seq (e.g. (byte-array (range 10))) and materializes the bytes on the fly.

v0.416.0 — MINO_BYTES Sequence Surface

Follow-on to v0.415's core MINO_BYTES type: dispatch sites that previously threw on bytes values now treat the value as a sequence of unsigned 0..255 ints.

Extended tests/bytes_test.clj to cover every new dispatch path.

v0.415.0 — MINO_BYTES Value Type

New first-class value tag for immutable binary data. (byte-array ...) now returns a MINO_BYTES value (was a host-mutable MINO_HOST_ARRAY of byte kind). aset on the result throws :eval/state -- mino's persistent-value model excludes in-place writes here. The bit-syntax surface that builds on the forward-compatible layout ships in a follow-on release.

Cross-state clone deep-copies the byte buffer so the destination state owns its own storage.

v0.414.0 — inst? / inst-ms / #inst Reader Literal

#inst "..." literals now read into the canonical component map produced by clojure.instant/read-instant-date, with {:mino/instant true} attached as metadata so (inst? ...) detects it. (inst-ms inst) returns epoch millis using the component-map decode (no host Date dependency).

The compat_test.clj inst-ms-throws test (which asserted the always-throw behavior of the old stub) is now inst-ms-rejects-non-inst and verifies the new contract on a non-inst argument; the rest of the inst surface is exercised in the new tests/inst_test.clj.

v0.413.0 — clojure-version + AOT-Compiler Dynvars

*clojure-version* and (clojure-version) were already wired; this release adds the JVM AOT-compiler dynvars so user code that binds them around load / compile calls no longer throws:

mino has no AOT compiler, so the dynvars have no observable effect beyond being bindable. The naming and defaults match JVM Clojure exactly so audit-shaped tests like clojure_coverage_test.clj see the surface they expect.

v0.412.0 — JVM Statics + Embedded-Host Remap

Two complementary layers ship under one file (src/prim/jvm_statics.c) so JVM-Clojure preambles work in ported mino code without touching the rest of the runtime:

Layer 1 — pure value & math statics. Bindings like Long/MAX_VALUE, Math/PI, Math/sqrt, Integer/parseInt, Double/parseDouble, Double/isNaN, Boolean/parseBoolean, java.util.List/of / Set/of / Map/of, String/valueOf, Character/toString. Math methods that already exist as clojure.math/* C primitives are aliased there so Math/sqrt and clojure.math/sqrt have identical behavior. Math/round, Math/abs, Math/min, Math/max are JVM-shape variants (return-type contract differs from clojure.math's float-returning round) and live in this module.

Layer 2 — embedded-host semantic remap. JVM Class/Member references that mino has a real native equivalent for are routed to the existing mino primitive: System/currentTimeMillis, System/nanoTime, System/getenv, System/exit, Thread/sleep, java.util.UUID/randomUUID, java.util.UUID/fromString. The Clojure-level UX matches JVM; the implementation underneath is mino-native.

System/currentTimeMillis returns a long-typed epoch millis count sourced directly from the host wall clock (clock_gettime CLOCK_REALTIME on POSIX, GetSystemTimeAsFileTime on Windows). The return type matches the JVM contract ((integer? ...) succeeds).
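On POSIX, the conversion from the wall clock to an integer millisecond count might look like the following. This is a minimal sketch, not mino's code: epoch_millis is a hypothetical name, and the Windows branch is omitted.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes clock_gettime on strict-C builds */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Milliseconds since the Unix epoch, or -1 if the clock cannot be read. */
int64_t epoch_millis(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return -1;
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    return (int64_t)ts.tv_sec * 1000 + (int64_t)(ts.tv_nsec / 1000000L);
}
```

Keeping the whole computation in 64-bit integers is what lets the result satisfy an integer predicate rather than arriving as a float.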

System/getProperty throws a clearly classified MHO001 because mino has no JVM system-properties table; the message points the caller at System/getenv or the embedder's capability surface.

(str uuid) was fixed in passing: it now emits the bare 36-char canonical form (matching JVM's UUID.toString) instead of the #uuid "..." reader-literal form. That makes (java.util.UUID/fromString (str u)) round-trip and aligns mino's str semantics with JVM Clojure's. The random-uuid-basic test in compat_test.clj was updated to assert the version digit at the new (canonical) position 14.
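The position of the version digit follows from the canonical 8-4-4-4-12 layout: eight hex digits, a dash, four hex digits, a dash, and then the version digit at zero-based index 14. A small sketch, with uuid_version_char as a hypothetical helper name:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns the version character of a canonical 36-char UUID string,
 * or -1 if the input is not in the bare 8-4-4-4-12 form. */
int uuid_version_char(const char *s)
{
    assert(s != NULL);
    if (strlen(s) != 36)
        return -1;
    for (size_t i = 0; i < 36; i++) {
        int is_dash_pos = (i == 8 || i == 13 || i == 18 || i == 23);
        if (is_dash_pos ? s[i] != '-' : !isxdigit((unsigned char)s[i]))
            return -1;
    }
    return s[14];
}
```

A string still carrying the #uuid "..." wrapper fails the length check, so the helper only accepts the bare form that (str uuid) now emits.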

v0.411.0 — math-context Rounding Modes

*math-context* now accepts the full JVM Clojure set of :rounding-mode keywords on top of the previously shipped :half-up default:

Implementation: a single apply_rounding_mode helper that returns the carry decision from (rmode, remainder, divisor, q_rounded, q_neg), called from both the "loop hit a non-zero remainder" path and the "exact division but result has more sig digits than the configured precision" path (the latter case was previously short-circuited before any rounding consideration). Mode resolution lives in resolve_math_context; an unrecognised keyword raises a clearly classified MHO002 that names the supported set.
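A carry decision of this kind can be sketched over machine integers. The code below is an illustration under assumptions, not the actual apply_rounding_mode: the enum, the function name, and the q_odd parameter (parity of the truncated quotient's last kept digit) are hypothetical, and the per-mode rules follow the definitions of java.math.RoundingMode.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    RM_UP, RM_DOWN, RM_CEILING, RM_FLOOR,
    RM_HALF_UP, RM_HALF_DOWN, RM_HALF_EVEN, RM_UNNECESSARY
} round_mode;

/* Returns 1 to bump the truncated quotient's magnitude by one, 0 to keep it,
 * and -1 when RM_UNNECESSARY is asked to round an inexact result.
 * `remainder` and `divisor` are magnitudes; `q_neg` is the quotient's sign. */
int rounding_carry(round_mode mode, uint64_t remainder, uint64_t divisor,
                   int q_odd, int q_neg)
{
    assert(divisor > 0 && remainder < divisor);
    if (remainder == 0)
        return 0;                          /* exact: nothing to round */

    /* Compare remainder with divisor/2 without overflowing 2*remainder. */
    uint64_t rest = divisor - remainder;
    int cmp_half = (remainder > rest) - (remainder < rest);

    switch (mode) {
    case RM_UP:          return 1;                /* away from zero */
    case RM_DOWN:        return 0;                /* toward zero */
    case RM_CEILING:     return !q_neg;           /* toward +infinity */
    case RM_FLOOR:       return q_neg != 0;       /* toward -infinity */
    case RM_HALF_UP:     return cmp_half >= 0;
    case RM_HALF_DOWN:   return cmp_half > 0;
    case RM_HALF_EVEN:   return cmp_half > 0 || (cmp_half == 0 && q_odd);
    case RM_UNNECESSARY: return -1;
    }
    return -1;
}
```

Expressing every mode as "carry or not" on the magnitude keeps the sign handling confined to the two directed modes.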

v0.410.0 — Print Dynvars Completion

Wires the remaining JVM Clojure print-side dynamic vars into mino's per-state print cache. *print-readably*, *print-meta*, *print-dup*, *print-namespace-maps*, and *flush-on-newline* all have their JVM-canon defaults (true, false, false, false, true) and respond to (binding [...] ...) at the boundary of any top-level pr / print / pr-str call.

All five share the resolve-once-per-call cache pattern introduced for *print-length* / *print-level*: a print_dynvars_saved_t struct holds the prior cached values across the call so nested invocations nest cleanly.

v0.409.0 — Lazy-Seq Namespace Scoping Fix

Fixes a long-standing P0 runtime bug where the body of a lazy-seq resolved unqualified ns-level symbols against the *realizer's* current namespace instead of the namespace in which the lazy-seq form was lexically written. The symptom was an MNS001 "no var" failure whenever a library defined a (lazy-seq ...) whose body called other ns-level helpers and the resulting lazy was forced from a different namespace (the common case for any library that hands a lazy back to its caller).

The runtime now snapshots the defining namespace on every MINO_LAZY cell at construction time — across the tree-walker (eval_lazy_seq), the bytecode interpreter's OP_MAKE_LAZY, and the JIT's make_lazy_slow slow-path helper. lazy_realize installs the captured namespace as both current_ns and fn_ambient_ns around the body's eval_implicit_do and restores on the way out, mirroring the namespace dance the fn-apply path already performs for fn bodies. The MINO_LAZY layout grew one const char * slot (pointing into the interned namespace name table; no new GC root).
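The capture-and-restore pattern can be shown with stand-in types. Everything here apart from the field names current_ns and fn_ambient_ns is hypothetical; the real structs and the body evaluator are not reproduced in this changelog.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the runtime state and a lazy cell. */
typedef struct {
    const char *current_ns;
    const char *fn_ambient_ns;
} state_sketch;

typedef struct {
    const char *defining_ns;                   /* captured at construction */
    const char *(*body)(state_sketch *S);      /* stands in for the lazy body */
} lazy_sketch;

/* Run the body with the defining namespace installed, then restore. */
const char *lazy_realize_sketch(state_sketch *S, const lazy_sketch *lz)
{
    assert(S != NULL && lz != NULL && lz->body != NULL);
    const char *saved_ns = S->current_ns;
    const char *saved_ambient = S->fn_ambient_ns;

    S->current_ns = lz->defining_ns;
    S->fn_ambient_ns = lz->defining_ns;
    const char *result = lz->body(S);

    S->current_ns = saved_ns;
    S->fn_ambient_ns = saved_ambient;
    return result;
}

/* Demo body: reports which namespace it would resolve symbols against. */
const char *report_current_ns(state_sketch *S)
{
    return S->current_ns;
}
```

A body forced from a foreign namespace sees its defining namespace, and the caller's namespace is back in place once realization returns. The sketch ignores non-local exits; a real implementation also has to restore on the throw path.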

The closure-capture workaround in lib/clojure/test/check/generators.clj was reverted in the same commit: helpers (rose-val, rose-children, vec-rose-tree, etc.) are again referenced as plain ns-level symbols. Regression coverage is in tests/lazy_seq_ns_scope_test.clj (the suite forces lazies from foreign namespaces and asserts unqualified helper resolution succeeds, including the recursive and anonymous-fn-inside-lazy cases).

v0.408.0 — Close-the-Gaps Cycle Banner

Banner tag for the close-the-gaps cycle, which closed the known JVM-Clojure-canon-parity gaps over v0.401 – v0.407:

Wins explicitly kept (NOT touched) because mino's lack of a JVM class hierarchy / primitive-long perf constraint / Java type hierarchy makes them more correct rather than less: auto-promoting + / - / * / inc / dec, one float tier, regex-as-source equality, cross-type compare, structured-map catch of any value, pr-str of a LazySeq realizes, cooperative go, single-version STM, no JVM-class-hierarchy (is (thrown? <Class> body)).

One real runtime bug surfaced and is logged in .local/BUGS.md: lazy-seq bodies resolve ns-level symbols against the realizer's *ns* rather than the defining ns. The test.check port works around this via closure-capture; the proper fix (capture defining_ns on MINO_LAZY and restore during lazy_realize) is queued for a separate cycle.

v0.407.0 — Parallel `r/fold`

clojure.core.reducers/fold learns the parallel branch. When the host has granted thread budget ((mino-thread-limit) > 1) and the collection is a vector larger than the chunk-size hint, fold partitions the vector into chunks, runs the reducer in parallel via future over each chunk, and combines partial results with combinef. Smaller vectors and non-vector collections still reduce sequentially through (reduce reducef (combinef) coll).

The chunk count is capped at (thread-limit - 1) so the parallel branch never exceeds the host's grant. The user-supplied n is treated as a minimum chunk size; if count / max-chunks would be smaller than n, the implementation grows the chunk to fit the budget.

send-via remains intentionally deferred: mino's per-state eval lock means agent actions always run under state_lock, so there's no useful Executor surface to expose. The existing MST008 error message points users at send/send-off for the same dispatch behavior. Promoted from "will ship" to "confirmed deferred" in this release per the no-fakery rule.

v0.406.0 — `test.check` Rose-Tree Shrinking

clojure.test.check now produces rose-tree-backed generators and a walker in quick-check that descends greedily into the first failing child at each level. The first node whose children all pass is the minimal-counterexample tip; quick-check's result map now carries a :shrunk sub-map with :smallest, :depth, and :total-nodes-visited.

Shrinkers implemented:

bc/compile.c constructor-lane lowering now uses clojure.core/vector, clojure.core/hash-map, clojure.core/hash-set rather than bare symbols, so a user namespace that declares :refer-clojure :exclude [vector] still gets functioning vector literals.

mino's lazy-seq currently resolves ns-level symbol references in the body against the REALIZER'S *ns*, not the defining ns. This is a real runtime bug logged in .local/BUGS.md; the generators.clj shipper-side workaround is closure-capture of every helper fn before the lazy body. The bug fix (capture defining_ns on MINO_LAZY) is queued for a separate cycle.

v0.405.0 — `hash-combine`, `*math-context*`, `unchecked-long` Wrap

Three numeric / bigdec edges close at once:

hash-combine ships at canonical 32-bit Boost-style: seed ^= hash + 0x9e3779b9 + (seed << 6) + (seed >> 2), truncated to the low 32 bits via unchecked-int. Matches the JVM Clojure shape bit-for-bit so user code that manually composes hashes via this helper sees the same result on mino as on JVM.
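The formula transcribes directly into C. One detail is an assumption rather than something stated above: on the JVM the operands are signed 32-bit ints, so seed >> 2 is an arithmetic shift, and the sketch reproduces that on unsigned values to stay free of implementation-defined behavior.

```c
#include <stdint.h>

/* Boost-style combine over 32-bit values, wrapping modulo 2^32. */
uint32_t hash_combine(uint32_t seed, uint32_t hash)
{
    /* Arithmetic right shift by 2, built from a logical shift plus
     * replication of the sign bit (assumed to match Java's int >>). */
    uint32_t sar2 = (seed >> 2) | ((seed & 0x80000000u) ? 0xC0000000u : 0u);
    seed ^= hash + 0x9e3779b9u + (seed << 6) + sar2;
    return seed;
}
```

For seeds below 2^31 the arithmetic and logical shifts agree, so the sign-bit handling only matters once the running seed has its top bit set.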

*math-context* and with-precision wire bigdec rounding. The context value is a map {:precision N :rounding-mode :half-up}; with-precision is the canonical macro (with-precision N [:rounding mode] body...) defaulting to :half-up. mino_bigdec_div consults the dynvar on each call and either rounds the quotient to N significant digits half-up (matching BigDecimal.divide(divisor, MathContext) in JVM) or, when no context is in scope, keeps the historical exact-or-throw behavior.

Other rounding modes (:down, :up, :floor, :ceiling, :half-down, :half-even, :unnecessary) throw the structured :mino/host error MHO002 with the mode name. Implementing them properly needs per-mode rounding rules; deferred rather than shipped as fakery.

unchecked-long now wraps bigint arguments outside the signed long range modulo 2^64 (interpreted as two's-complement signed long). (unchecked-long -9223372036854775809N) returns 9223372036854775807 instead of the previous clamp through double. Matches JVM Clojure's wrap. Implementation reads the low 32-bit mp_digit pair from imath's internal digits[] array, applying the sign via uint64 negation.
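The wrap can be sketched without imath. The big_sketch struct below is a hypothetical stand-in for a sign-magnitude bigint with little-endian 32-bit digits; it is not imath's actual layout or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sign-magnitude bigint: little-endian 32-bit digits. */
typedef struct {
    const uint32_t *digits;
    size_t          used;
    int             negative;
} big_sketch;

/* Reduce modulo 2^64 and reinterpret as a two's-complement signed long. */
int64_t wrap_to_long(const big_sketch *z)
{
    assert(z != NULL);
    uint64_t lo = z->used > 0 ? z->digits[0] : 0;
    uint64_t hi = z->used > 1 ? z->digits[1] : 0;
    uint64_t u = (hi << 32) | lo;          /* low 64 bits of the magnitude */

    if (z->negative)
        u = (uint64_t)0 - u;               /* negate modulo 2^64 */

    /* Portable unsigned-to-signed reinterpretation. */
    return u <= (uint64_t)INT64_MAX ? (int64_t)u : -(int64_t)(~u) - 1;
}
```

For -9223372036854775809 the magnitude is 2^63 + 1, whose negation modulo 2^64 is 2^63 - 1, i.e. 9223372036854775807, the value quoted above.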

tests/numeric_edges_test.clj covers all three.

v0.404.0 — `pcalls`, `pvalues`, and `alt!` Macro

pcalls and pvalues round out the parallel-evaluation surface alongside the existing pmap. Both fall through pmap, so the mino-thread-limit <= 1 sequential fallback also applies here.

clojure.core.async/alt! ships as a macro over alts! with the canonical clause shape:

```clojure
(alt!
  ch ([v] (str "got " v))
  [out-ch val] ([_] :put-ok)
  :priority true
  :default :nothing)
```

Plain-expression clauses, single-binding ([v] ...), and two-binding ([v c] ...) are all supported. mino doesn't gate alt! to a surrounding (go ...) block — like alts!, it works on the calling thread.

tests/parallel_calls_test.clj covers pcalls, pvalues (including lazy realization), and every alt! clause shape.

v0.403.0 — `*print-length*` / `*print-level*` Dynvars

pr, prn, print, println, and pr-str now consult two new ^:dynamic vars when walking collections: *print-length* truncates each collection (vector, list, map, set, chunk, chunked-cons) to N items with ... filling the rest, and *print-level* replaces collections nested deeper than the limit with #. Both default to nil (no limit). Cached on the state at the top of each call so the dynvar lookup cost is paid once per pr, not per value walked.
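The truncation rule for a flat collection can be sketched as follows. This is an illustration, not the mino printer: pr_int_vector and its buffer-based interface are hypothetical, and a negative print_length stands in for nil.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append `s` to the NUL-terminated buffer `out` of capacity `cap`. */
static void append(char *out, size_t cap, const char *s)
{
    size_t used = strlen(out);
    assert(used + strlen(s) < cap);
    strcat(out, s);
}

/* Print `n` ints as a vector, showing at most `print_length` items.
 * The limit is read once by the caller and passed in, mirroring the
 * resolve-once-per-call caching described above. */
void pr_int_vector(char *out, size_t cap, const int *xs, size_t n,
                   long print_length)
{
    char item[32];
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    append(out, cap, "[");
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            append(out, cap, " ");
        if (print_length >= 0 && i >= (size_t)print_length) {
            append(out, cap, "...");       /* elide everything that remains */
            break;
        }
        snprintf(item, sizeof item, "%d", xs[i]);
        append(out, cap, item);
    }
    append(out, cap, "]");
}
```

With a limit of 3, the five-element vector 1..5 prints as [1 2 3 ...]; a collection that fits exactly within the limit prints in full with no ellipsis.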

Late-placement in the mino_state struct so the stencil-pinned offsets for ic_gen, bc_regs, and jit_invoke_ctx stay byte-stable.

New tests/print_dynvars_test.clj covers vector, list, map, set truncation; nested collapse; combined length+level; post-binding restore; and locks in the print-method multimethod extension shape that was already present from prior cycles.

*print-readably*, *print-meta*, *print-dup*, *print-namespace-maps*, and *flush-on-newline* remain unimplemented. Each requires structural printer changes (alt-form emission, ^meta prefix, alt routing, key-namespace analysis, sink flush gating); they're queued for a follow-on cycle rather than shipped as no-op declarations.

v0.402.0 — `thrown-with-msg?` Assertion

clojure.test learns to dispatch on (is (thrown-with-msg? <re> body)) and the JVM-shaped (is (thrown-with-msg? <Class> <re> body)). The class symbol is documentation-only (mino has no class hierarchy); the regex is matched against the thrown value's message. mino pulls the message via a small exception-message-for-match helper that handles strings, :mino/message-shaped maps (the catch-bound diagnostic), :message-shaped maps (ex-info-style), and falls back to pr-str.

tests/arity_strict_test.clj gains coverage for the new assertion shape. Port-from-JVM test code that uses thrown-with-msg? now lights up; the older (is (thrown? body)) shorthand keeps working unchanged.

The ClojureDocs allowlist drops the two merge:3 / merge:4 entries that were holdovers from an earlier version of merge-with. mino's current merge produces JVM-identical output for both vector+vector and vector+string corpus cases — the allowlist counted 2 entries higher than necessary.

v0.401.0 — Strict Arity Verified and Locked In

mino has been strict on function arity for a long time — both the tree-walker (bind_simple_params / bind_params) and the bytecode VM (mino_bc_run) throw MAR001 / MAR002 on missing or extra args, matching JVM Clojure's ArityException contract. This release ships a new tests/arity_strict_test.clj that locks the guarantee across every call shape: fixed-arity, variadic-with-rest, multi-arity dispatch, defn, apply, macros, map/vector destructuring, and the BC hot-loop path. The diagnostic carries :mino/kind = :eval/arity and one of MAR001 / MAR002 so port-from-JVM code can dispatch on the structured form.

No behavioral change. The opening tag of the close-the-gaps cycle.

v0.400.0 — `unchecked-*` Family Coerces the Full Numeric Tower

The diff probe surfaced (unchecked-byte 1N) and (unchecked-int -5/3) in the example preambles — JVM Clojure's narrowing casts and -int arithmetic family coerce bigints, ratios, and bigdecs via intValue() / longValue(). mino was throwing on those.

Now unchecked-byte / -short / -int / -long / -char / -float / -double and the seven -int arithmetic variants accept the full numeric tower: floats truncate toward zero, bigints in long-long range pass through directly (out-of-range routes through double for deterministic clamping), ratios and bigdecs go through tower_to_double then truncate.

The long-domain family unchecked-add / -subtract / -multiply / -inc / -dec / -negate keeps its strict integer-only contract so (unchecked-add 1.0 2) still throws as before.

v0.399.0 — PersistentQueue (Two-List Persistent FIFO)

A new MINO_QUEUE value type lands as the mino-native equivalent of clojure.lang.PersistentQueue. Backed by two cons-spine lists (front in deque order, back in reverse-deque order) so conj and pop are both amortised O(1) and the queue is fully persistent.

The canonical empty value is bound to clojure.lang.PersistentQueue/EMPTY so the JVM-style idiom (conj clojure.lang.PersistentQueue/EMPTY ...) works. The predicate queue? ships alongside, and type returns :queue.

conj, peek, pop, count, seq, first, rest, empty, empty?, =, and hash dispatch on MINO_QUEUE element-wise (or via the deque-order seq). seq_iter_init flattens a queue into a cons-spine for uniform iteration through reduce / into / map. The print form is #queue [a b c].
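The two-list representation can be sketched over plain ints. This is a minimal illustration, not MINO_QUEUE itself: all names are hypothetical, cells are never freed (a GC would reclaim them in the real runtime), and the sketch keeps the invariant that an empty front implies an empty back so that peek stays O(1).

```c
#include <assert.h>
#include <stdlib.h>

typedef struct cell { int head; const struct cell *tail; } cell;

/* front holds elements in removal order, back in reverse insertion order. */
typedef struct { const cell *front; const cell *back; } queue;

static const cell *cons(int head, const cell *tail)
{
    cell *c = malloc(sizeof *c);
    assert(c != NULL);
    c->head = head;
    c->tail = tail;
    return c;
}

static const cell *reverse(const cell *xs)
{
    const cell *out = NULL;
    for (; xs != NULL; xs = xs->tail)
        out = cons(xs->head, out);
    return out;
}

/* Returns a new queue; `q` itself is never modified. */
queue queue_conj(queue q, int x)
{
    queue r = q;
    if (q.front == NULL)
        r.front = cons(x, NULL);
    else
        r.back = cons(x, q.back);
    return r;
}

int queue_peek(queue q)
{
    assert(q.front != NULL);
    return q.front->head;
}

queue queue_pop(queue q)
{
    queue r;
    assert(q.front != NULL);
    r.front = q.front->tail;
    r.back = q.back;
    if (r.front == NULL) {             /* front exhausted: flip the back list */
        r.front = reverse(q.back);
        r.back = NULL;
    }
    return r;
}
```

Each element is reversed at most once on its way from back to front, which is where the amortised O(1) bound comes from, and older versions of the queue stay valid because cells are shared rather than mutated.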

The ClojureDocs probe drops empty:1, :2, :4, :5. The :0 and :3 entries stay because their JVM-side expected output embeds Object.hashCode (#<PersistentQueue clojure.lang.PersistentQueue@20>) which a non-JVM runtime cannot reproduce.

v0.398.0 — Reader-Conditional `:preserve` Round-Trips Through `pr-str`

pr-str on a value carrying :mino/reader-conditional or :mino/tagged-literal meta now emits the canonical reader-syntax shape (#?(...) / #?@(...) / #tag form) rather than the underlying map. The reader was already building these values under (read-string {:read-cond :preserve} ...); the printer just hadn't been taught to round-trip them.

The ClojureDocs probe drops reader-conditional:0, :3, :4, and reader-conditional?:1. New tests cover both splicing forms, round-trip idempotence (str → read → pr-str → read), the clojure.edn/read-string :preserve default, and a direct (reader-conditional form splicing?) builder check.

v0.397.0 — `clojure.core.reducers` (Sequential)

A new bundled namespace lands at lib/clojure/core/reducers.clj, gated by a new MINO_CAP_REDUCERS capability bit included in MINO_CAP_DEFAULT. The surface mirrors the JVM canon: r/map / filter / mapcat / remove / take / take-while / drop / drop-while / flatten / cat / monoid / foldcat / reduce / fold.

Each transformer builds a transducer-shaped eduction over the source collection, which is reduced through the standard reduce path. The fold variant is sequential — mino does not yet ship the JVM fork/join machinery that backs parallel reducers; fold reduces left-to-right via reduce regardless of the partition size arg. Parallel fork/join is queued for the multi-state OS-thread cycle.

v0.396.0 — `spec.alpha`: `:via` Propagation, `explain-str` Format, `s/&`

explain-data problems now carry the registered spec name in their :via path. explain* consults the spec's ::name and conjes it onto the via accumulator when present, so every problem reported inside that spec inherits the surrounding spec's identifier. Matches the JVM contract: (s/explain-data ::s 0) produces problems with :via [::s].

explain-str now formats each problem as "<val> - failed: <pred-abbrev> [in: ...] [at: ...] [spec: ...]", matching Clojure's canonical explain-printer output. The pred is abbreviated (namespace qualifier dropped) so clojure.core/string? renders as string?.

s/& ships as a regex op that wraps a regex spec with additional predicates applied to the conformed sequence value. All preds must return truthy or the consume yields ::invalid. unform delegates through the inner regex.

The ClojureDocs probe drops explain-data:1,2, explain-str:1, &:0, and the first two map-of entries. The remaining map-of:3-5 allowlist entries stay (the :3 example asserts a JVM #object map print; the :4,5 examples need a generator backend).

v0.395.0 — `spec.alpha`: `unform`, `conformer`, `with-gen`

unform (the inverse of conform) now ships as a multimethod dispatching on ::kind. For pred / wrap / and / or / nilable / tuple / every / keys / regex specs the inverse is implemented in terms of the spec map; for the unspecified case (regex &, custom conformer specs without an explicit unform-fn), unform falls back to identity.

conformer builds a spec from an arbitrary conform fn and an optional unform fn. The conform fn must return :clojure.spec.alpha/invalid when the value does not conform; otherwise the returned value is the conformed value.

with-gen attaches an alternative generator fn to a spec; the generator is stored on the spec map's ::gen key. The fn is only realised when gen is called against the spec (still requires a generator backend, deferred).

The ClojureDocs probe drops unform:0-3, conformer:0, and with-gen:0. Tests cover cat-shaped sequence reconstruction, the default identity unform, conformer's fn-driven conform/unform, and the ::gen attachment.

v0.394.0 — `unchecked-*` Narrowing Casts and -Int Arithmetic

The seven JVM-style narrowing casts now ship: unchecked-long, unchecked-int, unchecked-short, unchecked-byte, unchecked-char, unchecked-float, unchecked-double. Each accepts an int or float operand, truncates toward zero (with clamping at long range), then reinterprets the result at the target width. Bigint / ratio / bigdec inputs throw with a clear "use int/float/double" message; turning those throws into coercions is queued for a later cycle when a real use case appears.

The seven -int arithmetic variants are also added: unchecked-add-int, unchecked-subtract-int, unchecked-multiply-int, unchecked-inc-int, unchecked-dec-int, unchecked-negate-int, and unchecked-remainder-int. Each does 32-bit two's-complement wraparound (matching JVM's int width). The non-suffixed members of the family (unchecked-add, etc.) remain 64-bit long-domain wraparound — they are the long opt-in.

The ClojureDocs probe's twenty unchecked-* allowlist entries are removed. New tests cover the cast boundary, in-range arithmetic, 32-bit wrap, and the JVM-canonical INT_MIN % -1 = 0 case.
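The 32-bit wraparound and the remainder edge case both need care in C, where signed overflow and INT_MIN % -1 are undefined. A minimal sketch with hypothetical function names, doing the arithmetic on unsigned values and converting back portably:

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 32-bit pattern as two's-complement without relying on
 * implementation-defined conversions. */
static int32_t to_i32(uint32_t u)
{
    return u <= (uint32_t)INT32_MAX ? (int32_t)u : -(int32_t)(~u) - 1;
}

int32_t unchecked_add_int(int32_t a, int32_t b)
{
    return to_i32((uint32_t)a + (uint32_t)b);
}

int32_t unchecked_multiply_int(int32_t a, int32_t b)
{
    return to_i32((uint32_t)a * (uint32_t)b);
}

int32_t unchecked_negate_int(int32_t a)
{
    return to_i32(0u - (uint32_t)a);
}

int32_t unchecked_remainder_int(int32_t a, int32_t b)
{
    assert(b != 0);
    if (b == -1)
        return 0;       /* covers INT_MIN % -1, which C leaves undefined */
    return a % b;       /* C99 truncates toward zero, like the JVM */
}
```

Unsigned arithmetic wraps modulo 2^32 by definition, so the only signed operation left is the final reinterpretation.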

v0.393.0 — Regex Nested Alternation and Group Quantifiers

(foo|bar), (foo|bar)+, and (foo)+ now work end-to-end. The top-level | alternation landed earlier in v0.281.0, but nested alternation inside groups was a known follow-up. matchpattern now detects compound-atom groups (groups whose body has internal | OR whose matching close is followed by */+/?/{n,m}) and delegates to a recursive matchgroup_loop that tries each branch with proper backtracking and repeats the body with greedy-then-backoff semantics.

Simple groups with no internal alternation and no trailing quantifier keep flowing through the original skip-and-record path so internal matchplus/matchstar keep seeing the post-group atoms for natural backtracking; (.+)/(.+) and similar canonical idioms stay green.

v0.392.0 — Binding / Thread-Binding Sweep

The variadic Clojure-canon binding-family functions land on top of the existing C primitives: bound? and thread-bound? are now variadic fns that AND over every var argument; with-bindings is a macro that pushes a binding frame and pops it in a finally; push-thread-bindings and pop-thread-bindings are direct passthroughs to the new C primitives push-thread-bindings* and pop-thread-bindings*. The C with-bindings* and the new push-thread-bindings* accept maps keyed by vars (the Clojure-canon shape) in addition to symbols and strings — the var's unqualified name is extracted automatically.

The single-arg internal helper that backs bound?'s root-binding check was renamed from bound? to -var-root-bound?, with a companion -thread-bound? doing the per-var thread-stack check. Both leading-dash names mark these as the internal building blocks the public Clojure-level wrappers depend on.

The coverage test no longer counts bound?, thread-bound?, with-bindings, push-thread-bindings, pop-thread-bindings, or get-thread-bindings as JVM-only — they're portable now. Core's coverage is 422/424 (99%); the two missing names (apropos, doc) ship under clojure.repl in mino.

v0.391.0 — `with-redefs` Evaluates Temp-Values in Parallel

Clojure's with-redefs evaluates every temp-value expression BEFORE any var is rebound, so a later binding-value naming an earlier-listed var sees that var's pre-redef value. mino was generating the rebind sequence as nested alter-var-root calls with each temp-value expression nested inside its own (fn [_] new-val), so the second and later temp-value exprs ran AFTER the first rebind had already taken effect. The macro now binds each new value to a gensym before any rebind fires, so the evaluation order matches the JVM. The ClojureDocs probe's with-redefs:2 allowlist entry is removed; a new test-with-redefs-parallel-eval covers the case directly.

v0.390.0 — Public-Surface Audit and Docstring Sweep

Behaviour-preserving cleanup pass over the embedder-visible surface in preparation for the next batch of canon-edges work. The three MINO_UNSTABLE_* blocks in src/mino.h (host thread pool / GC tuning / allocation profiler) now state explicitly that they stay UNSTABLE through the 0.x alpha series, so a reader of the header understands which surfaces are committed to and which still drift.

docs/ARCHITECTURE_CONTRACT.md and docs/INTERNAL_MODULE_MAP.md had drifted onto an old _t-suffixed naming for the embedder-visible types (mino_state_t etc.). The public typedefs are unsuffixed (mino_state, mino_env, mino_val, mino_ref, mino_repl); the docs are corrected to match the header. Truly-internal _t-suffixed types (mino_vec_node_t, hamt_entry_t, root_env_t, dyn_binding_t, ...) keep their suffix. Section 4's claim that mino is single-threaded per state was tightened: each state defaults to a thread limit of 1, and the host opts in to multi-thread by raising the limit. A stale sbuf_t reference in the ownership list was dropped — the clone path no longer carries a named buffer struct.

No source code, no public symbols, no behaviour changes.

v0.389.14 — `is-thrown` Correctly Checks the Body, Not the Type Symbol

(is (thrown? Exception body)) used to silently pass whenever the type symbol was unbound: the macro spliced (rest expr) -- which includes both the type symbol and the body -- into a do block, so evaluating Exception itself threw and the catch arm registered a pass before the body ever ran. Tests that expected the body to throw were therefore green regardless of what the body actually did.

The macro now detects the JVM-Clojure shape (thrown? <type> <body>...) vs the mino single-arg shorthand (thrown? <body>) by arity and skips the type symbol when present. mino has no exception class hierarchy, so the type symbol is treated as documentation; any throw from the body counts as a pass.

v0.389.13 — Lock-Invariant Assert on Regex Prim Entries

The regex engine in src/regex/re.c uses file-static globals (re_flags, re_g_state) for match state. All current callers go through re-find, re-matches, split, or str-replace -- each runs inside mino_call, which holds state_lock, so the globals are effectively serialized in practice. The contract was not enforced anywhere, so a future refactor that elides state_lock around one of those prims would silently corrupt the next match's capture spans. Add MINO_ASSERT_STATE_SAFE at each prim's regex call site to surface a missing lock at the offending site under a debug build.

v0.389.12 — Agent Spawn-Rollback Hands Off Concurrent Producer's Queue

When pthread_create (or _beginthreadex) refuses the agent worker, the rollback path used to pop our own node, set worker_alive = 0, and decrement thread_count. If a concurrent producer had already enqueued an action into the same pool (they saw worker_alive = 1 and skipped spawning during our brief unlock-then-spawn window), their node was left in the runq with no worker to drain it — (await its-agent) would hang forever unless another send arrived to re-spawn.

The rollback now checks the runq after removing our own node and, if a concurrent producer's node is still queued, retries the spawn once. On retry success the new worker takes over the queue, worker_alive stays armed, and only OUR send reports failure to its caller. On retry failure the rollback proceeds as before (worker_alive = 0, thread_count--), surfacing the host's resource exhaustion.

v0.389.11 — Restore `bc_top` on Catch Unwind

A throw from a deeply-nested fn longjmps to the catching try frame's setjmp, bypassing every intermediate fn's bc_pop_window call. The catch landing pads in both the bytecode VM (OP_PUSHCATCH) and the tree-walker (eval_try) used to restore the catching frame's regs pointer but leave S->bc.bc_top at the deeper post-throw value. Intermediate register slots therefore stayed inside [0, bc_top) for the remainder of the catching fn's lifetime; the GC root walker kept tracing whatever those slots happened to hold, so a long-lived catching loop accumulated stale references and the GC could not reclaim values that the program logically dropped at each catch.

Both landing pads now pop bc_top back to its value at PUSHCATCH / eval_try entry and zero the freed slots, matching the work a normal bc_pop_window would do on a clean return. The bytecode pad reconstructs bc_top from the catching fn's own base + bc->n_regs locals (no struct-field expansion that would shift JIT-pinned offsets); the tree-walker pad snapshots S->bc.bc_top at entry and pops on the longjmp arm and on a catch-handler rethrow.
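The failure mode and the fix can be reproduced with a toy register stack. This sketch is not the mino VM: the struct, the window size, and the function names are hypothetical, and only the snapshot-then-pop-and-zero pattern of the tree-walker pad is shown.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

enum { SKETCH_NREGS = 64, SKETCH_WINDOW = 4 };

typedef struct {
    void   *regs[SKETCH_NREGS];
    size_t  bc_top;                /* slots in [0, bc_top) are GC roots */
    jmp_buf catch_buf;
} vm_sketch;

static void push_window(vm_sketch *S)
{
    assert(S->bc_top + SKETCH_WINDOW <= SKETCH_NREGS);
    for (int i = 0; i < SKETCH_WINDOW; i++)
        S->regs[S->bc_top++] = S;  /* any non-NULL value stands in for a ref */
}

/* Each level pushes a window; the deepest level throws. */
static void call_then_throw(vm_sketch *S, int depth)
{
    push_window(S);
    if (depth == 0)
        longjmp(S->catch_buf, 1);
    call_then_throw(S, depth - 1);
    S->bc_top -= SKETCH_WINDOW;    /* the normal pop, skipped by the longjmp */
}

/* Catching frame: snapshot bc_top at entry, pop and zero on the catch arm. */
size_t try_catch(vm_sketch *S, int depth)
{
    const size_t top_at_entry = S->bc_top;
    if (setjmp(S->catch_buf) == 0) {
        call_then_throw(S, depth);
    } else {
        while (S->bc_top > top_at_entry)
            S->regs[--S->bc_top] = NULL;
    }
    return S->bc_top;
}
```

Without the loop on the catch arm, bc_top would stay at the deepest value reached and every slot pushed by the unwound frames would keep being traced as a root.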

v0.389.10 — Preserve Empty-String Namespace on Symbols and Keywords

(keyword "" "hi") and (symbol "" "hi") previously collapsed to a no-namespace value because the 2-arg constructors short-circuited the empty-namespace case through the same path as ns == nil. JVM Clojure treats the empty string as a legal namespace and preserves it across (namespace x), (name x), and (str x). mino now matches: the constructor builds the interned data as /<name> when the namespace is the empty string, and the namespace / name accessors detect the leading-/ with ns_len == 0 to recover the empty-string namespace. The bare-slash literal :/ (and the symbol /) keeps its existing reading -- name is "/", namespace is nil -- so the literal path is unchanged.
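The accessor-side rule can be sketched as a small splitter over the interned text. The names are hypothetical, and splitting at the first slash is a simplification chosen for the sketch.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    int         has_ns;   /* 0 means the namespace is nil */
    size_t      ns_len;   /* 0 with has_ns set means the empty-string namespace */
    const char *name;
} sym_parts;

sym_parts split_interned(const char *data)
{
    sym_parts p = {0, 0, data};
    assert(data != NULL);

    const char *slash = strchr(data, '/');
    /* No slash, or nothing after it (the bare "/" literal): no namespace. */
    if (slash == NULL || slash[1] == '\0')
        return p;

    p.has_ns = 1;
    p.ns_len = (size_t)(slash - data);   /* 0 for the "/<name>" encoding */
    p.name = slash + 1;
    return p;
}
```

The interned text "/hi" therefore yields an empty-string namespace with name "hi", while "/" on its own keeps a nil namespace and the name "/".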

v0.389.9 — Lock-Invariant Asserts on Shared Tables

The intern table and the record-type registry are shared across host worker threads but mutated with plain stores; the documented contract was "caller holds state_lock" and nothing enforced it. Add a debug-build assert (MINO_ASSERT_STATE_SAFE) at the head of intern_lookup_or_create_ns and mino_defrecord so a missing lock surfaces at the offending call site instead of letting a torn table escape into production.

Two genuine missing-lock call sites were uncovered and fixed by the assert:

v0.389.8 — `eval_try` Catch-Rethrow Protection Hygiene

The catch-with-finally branch was gated on try_depth < MAX_TRY_DEPTH. The intent was to fall back to a "catch-runs-unprotected" path if the inner try frame could not be pushed, but the entry guard at the top of eval_try already rejects any call with try_depth >= MAX_TRY_DEPTH before the body runs, and the unwind path restores try_depth to that pre-entry value -- so the gate was dead defensive code that suggested a codepath that cannot fire. Drop the gate so the inner setjmp always protects the catch handler when a finally exists, and explain why in the comments.

v0.389.7 — Reject `with-meta` / `vary-meta` on a Var

with-meta and vary-meta shallow-copied a Var when handed one, producing a sibling whose root slot diverged from the original on every later (def x ...). The sibling's @ returned the value the var held at copy time forever, decoupled from the namespace binding. JVM Clojure rejects with-meta on a Var for the same reason: clojure.lang.Var is not IObj. mino now follows canon and throws eval/type for both primitives.

Reading meta still works: (meta #'x) returns the synthesized {:ns :name :private :dynamic} map and (alter-meta! #'x f) continues to mutate the var's meta slot in place.

v0.389.6 — `swap!` Barrier Only on Successful CAS

prim_swap_bang's multi-threaded retry loop fired the GC write barrier before every CAS attempt. On a lost CAS the barrier had already added the atom container to the remset (recording an OLD->YOUNG edge that the slot never actually held) and pushed the candidate result onto the in-flight major's mark stack. The spurious remset entry kept the rejected result artificially alive across the next mark cycle until the dirty bit cleared.

The barrier now fires only after a successful publish, matching the shape already used by prim_alter_meta and the watch/validator installers.

v0.389.5 — Atomic Watch and Validator Installs

add-watch, remove-watch, and set-validator! all did a plain load + rebuild + store on the watches map or the validator slot of the reference. Two concurrent installers could each observe the same snapshot, both build a one-entry-wider map (or each compute a different validator update), and the slower thread's publish would overwrite the first -- the user's (add-watch a :k1 f1) would silently disappear when (add-watch a :k2 f2) raced ahead.

The three primitives now share a watchable_slot_cas retry helper that mirrors prim_swap_bang: load the snapshot, build the next value, attempt a CAS, rebuild on loss. The barrier fires only on the winning attempt, so lost retries do not leave stale OLD->YOUNG entries on the remset.

v0.389.4 — `alter-meta!` Actually Atomic

alter-meta! advertised atomic mutation of a reference's metadata but did a plain load + apply + store. Concurrent callers could both observe the same old_meta, both compute a new_meta from it, and the last writer would silently overwrite the first. The docstring promised exactly the contract the implementation did not provide.

The multi-threaded path now runs a CAS retry loop matching prim_swap_bang: load the snapshot, apply f, CAS the slot; on loss, retry with the new snapshot. The barrier fires only after a successful publish so a losing retry no longer leaves stale OLD->YOUNG entries on the remset. The single-threaded path keeps the straight read-compute-write because no other writer can interpose.

v0.389.3 — `aset` Barrier on Host Arrays

aset is the only mutator path that writes a slot of a long-lived GC-managed container in place. Writing a fresh YOUNG value into a host array that had already been promoted to OLD bypassed the GC write barrier: the OLD->YOUNG edge was never recorded in the remset, the next minor reclaimed the YOUNG, and the slot was left pointing at freed memory. The write now routes through gc_write_barrier so the remset captures the edge and the major-mark Dijkstra insertion path observes the publication during an in-flight cycle.

The vector / map / set primitives already barrier their stores; this brings host arrays in line.

v0.389.2 — Lazy-Force Exactly-Once Realization

lazy_force previously read-checked-then-set realized without synchronization. Two threads observing the unrealized state both ran the thunk and tear-published cached/body/env, breaking the exactly-once realization guarantee that clojure.lang.LazySeq provides. The lazy storage realized field is now a tri-state machine (LAZY_UNREALIZED / LAZY_REALIZING / LAZY_REALIZED): the first forcer wins a CAS into the realizing slot, evaluates the thunk, publishes cached, then flips to realized; concurrent forcers spin under mino_yield_lock until the winner publishes, then return the same value. If the thunk throws, the realizer resets the slot to unrealized so a retry re-runs the thunk, matching JVM behaviour.

Every reader of lazy.realized -- the GC tracer, the seq/equality/walker helpers, the pipeline-fusion thunk identification, the realized? predicate -- now compares against LAZY_REALIZED explicitly, so a mid-realization lazy is never misread as "done".

v0.389.1 — Atomic Transient Owner Mint

mino_transient now mints the per-batch owner ID through an atomic CAS loop instead of a plain pre-increment of S->ns_vars.transient_owner_next. The previous read-modify-write sequence was technically a data race on any code path that called mino_transient from multiple host threads against the same state without serializing through state_lock. If two concurrent mints had ever observed the same counter snapshot they would have published the same owner ID, causing the owner-tagged in-place edits inside hnode_ensure_owned to mutate a persistent subtree that the older collection still referenced — silently breaking immutability for the parent.

The sticky-on-wrap behaviour is unchanged: once the 32-bit counter reaches 0xFFFFFFFFu every subsequent transient takes owner_id = 0 and falls back to the path-copy wrapper, so a 32-bit wraparound never publishes a colliding ID either.
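The mint-with-sticky-wrap behaviour can be sketched as a C11 CAS loop. `owner_next` and `mint_owner` are illustrative names rather than the actual fields, and the exact ID handed out at the boundary is an assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t owner_next;   /* hypothetical per-state counter */

/* Returns a fresh nonzero owner ID, or 0 once the counter is exhausted. */
uint32_t mint_owner(void) {
    uint32_t cur = atomic_load(&owner_next);
    for (;;) {
        if (cur == 0xFFFFFFFFu) return 0;     /* sticky: never wraps */
        uint32_t next = cur + 1;
        assert(next != 0);
        /* On failure `cur` is refreshed, so two racing callers can
         * never both publish the same `next`. */
        if (atomic_compare_exchange_weak(&owner_next, &cur, next))
            return next;
    }
}
```

A caller that receives 0 is expected to fall back to the path-copy route, as described above.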

v0.389.0 — Native Channels

clojure.core.async channels are now a C primitive (MINO_CHAN) instead of an atom wrapping a state map. Each offer!/poll!/put!/take! is one C function call; the channel's buffer, pending-putters and pending-takers queues, the closed flag, and optional transducer hooks live in directly-mutable C slots inside the channel cell. The previous implementation stored the same shape inside a script-side map and mutated via swap! on every operation; the per-cycle state-map allocation plus the swap!-body closure dominated benchmark runtime at high iteration counts.

Performance

mino-bench async suite, offer!/poll! row at 5000 iterations:

| | before | after | delta |
|--------------------------------|---------|--------|--------|
| offer!/poll! on (chan 1024) | 30.32s | 16.01s | -47% |
| offer!/poll! on (chan 1) | 30.40s | 19.21s | -37% |
| offer! full returns false | 14.12s | 9.24s | -35% |
| poll! empty returns nil | 13.61s | 8.70s | -36% |

GC share of wall time on the same rows drops from ~45% to ~34-40% because the per-op state-map alloc and the swap!-body closure are no longer in the hot path.

Internals

API

No user-visible changes. (chan), (chan n), (chan n xform), (promise-chan), offer!, poll!, put!, take!, close!, closed?, chan?, alts!, alts!!, alts-callback, and the go-block compat aliases (chan*, chan-put*, chan-take*, chan-close*, chan-closed?*, offer!*, poll!*, alts*, buf-fixed*, buf-dropping*, buf-sliding*, buf-promise*) all keep their previous shape and contracts. The (type ch) keyword changes from :atom (the channel was an atom) to :chan (it is a channel value now).

v0.388.1 — Multi-Cycle Remset Filter

Bug fix. An OLD container holding a YOUNG child could fall out of the remembered set the cycle after the promote-safety-net add and become invisible to subsequent minors except through the conservative C-stack scan. With a tight nursery and a raised MINO_GC_PROMOTION_AGE value, the YOUNG child could die between minor cycles while the parent slot still pointed at it; the freed slot then got reused by an unrelated allocation, and the next mutator dereference returned that unrelated value instead of the original. The user-visible symptom was unbound symbol: if (or any other special-form name) at script time, because the parent's cons body had effectively been rewritten by allocator churn. ASan was clean, and any fprintf placed on the dispatch path masked the symptom -- a Heisenbug whose appearance depended on which local variables the compiler chose to spill into stack slots that the conservative scan would then pin.

Fix

gc_mark_remset drives a side-channel flag through gc_mark_child_push that records whether each remset entry's trace observed any YOUNG child header. gc_remset_reset filters the remset by that signal: entries with at least one OLD->YOUNG edge keep dirty=1 and ride into the next minor; entries with no remaining OLD->YOUNG edges drop. This extends the one-cycle promotion safety net into a multi-cycle one whose duration is bounded naturally by the time the children themselves take to promote.

The hot-path cost is one load + one conditional store inside gc_mark_child_push, both inside the cache line of mino_state that the surrounding tracer already touches. The walker flag is predicted false outside the remset walk.
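The filter can be pictured with the following reduced sketch. It folds the mark and reset passes into one function and gives each object a single child slot; `obj`, `walker_saw_young`, and `remset_filter` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct obj {
    bool        young;
    bool        dirty;
    struct obj *child;     /* single child slot, for brevity */
} obj;

static bool walker_saw_young;   /* side-channel flag set while tracing */

/* One load and one conditional store per traced child. */
static void mark_child_push(const obj *child) {
    if (child != NULL && child->young) walker_saw_young = true;
}

/* Trace each entry and keep only those that still hold an
 * OLD->YOUNG edge. Returns the new remset length. */
size_t remset_filter(obj **remset, size_t n) {
    assert(remset != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        obj *entry = remset[i];
        walker_saw_young = false;
        mark_child_push(entry->child);
        if (walker_saw_young) {
            remset[kept++] = entry;    /* stays dirty for the next minor */
        } else {
            entry->dirty = false;      /* no young children left: drop */
        }
    }
    return kept;
}
```

An entry leaves the set only once its child has itself been promoted, which is what bounds how long the safety net holds it.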

Verified

examples/embed_gc_stress.c no longer needs the MINO_GC_PROMOTION_AGE reset workaround at the end of test_param_bounds; the stress test runs the rest of the suite against promotion_age=8 without firing the bug. MINO_GC_VERIFY=1 is clean across the 1371-test suite at four nursery sizes (64 KiB / 256 KiB / 1 MiB / 4 MiB). Release-gate green.

v0.388.0 — Embedder UX: Cycle Close

Final tag of the pre-1.0 Embedder UX cycle. The per-phase tags v0.382.0 through v0.387.1 carry the granular notes; this entry is the cycle's roll-up plus the cookbook + mino-site additions that close the documentation surface.

What this cycle delivered

Across seven minor tags + one patch:

Posture

The header still carries the "UNSTABLE until v1.0.0" line. This cycle prepares the surface for the v1.0 commit; the v1.0 tag is its own dedicated cycle.

The dist/mino.c amalgamation is bit-identical to the source tree at this tag; embedders that vendor from v0.388.0 are working against a stable surface that will not rename or remove public functions until v1.0.

v0.387.1 — Embedder UX: Embedded-Source Canon Pass

Phase 6.5 of the pre-1.0 Embedder UX cycle. Cosmetic / doc-quality follow-up: every embedded mino-source C-string literal in the mino-examples cookbook now uses the canonical Clojure vector-binding shape ((fn [args] ...), (let [...] ...), etc.) instead of the legacy paren-binding shape. mino's reader accepts both, but teaching material should default to canon. No API change in the mino runtime itself; this tag is a marker for the mino-examples sibling repo bump.

v0.387.0 — Embedder UX: API Ergonomics

Phase 6 of the pre-1.0 Embedder UX cycle. Closes the half-dozen UX papercuts that don't warrant their own phase but do warrant fixing before the v1.0 freeze.

Added

Changed

Verification

embed_api_test carries a new Phase 6 block exercising mino_register_fns, the throw :payload round-trip, and mino_clone's named-type failure diagnostic.

v0.386.0 — Embedder UX: Clojure-Canon C Surface

Phase 5 of the pre-1.0 Embedder UX cycle. Adds the C-side peers for canon idioms an embedder hits within their first few hours of binding work.

Added

Verification

embed_api_test carries a new block exercising each entry point: meta round-trip; seq/first/next on vec and list inputs; compare on int pairs and hash agreement across equal vecs; push/pop_bindings with a dynamic var read through the frame; can_clone success / failure paths with reason reporting.

v0.385.0 — Embedder UX: Predicate / Extractor Grid

Phase 4 of the pre-1.0 Embedder UX cycle. Closes the holes in the type-predicate grid and adds the missing extractors so embedders have a complete mino_is_X / mino_to_X surface for every public value type.

Added (predicates)

Eight new predicates land in the consolidated grid block:

Added (extractors)

Six new extractors:

Verification

embed_api_test carries a test_predicate_grid block that exercises every new predicate (with a true-shaped and a mismatching value) and every new extractor (success path + type-mismatch path; buffer-too-small path for the str extractors).

v0.384.0 — Embedder UX: Amalgamation Distribution

Phase 3 of the pre-1.0 Embedder UX cycle. Adds an SQLite-influenced single-file vendor distribution under dist/, the canonical "drop-it-in-your-project" embedding shape.

Added

Build recipe

An embedder vendors dist/mino.c + dist/mino.h into their tree and compiles with no -I paths beyond the vendor directory:

```
cc -std=c99 -O2 -c mino.c -o mino.o
cc app.c mino.o -lm -lpthread -o app
```

No transitive header dependencies. No build-system requirement (autotools / cmake / meson / bazel are unneeded). The dist/mino.c is reproducible bit-for-bit from any commit by running ./mino task amalgamate.

Changed

Notes

v0.383.0 — Embedder UX: Type-Name Cleanup (Drop `_t`)

Phase 2 of the pre-1.0 Embedder UX cycle. POSIX 1003.1 reserves the _t suffix for the implementation; mino's public typedefs were squatting in that namespace. This cycle drops the suffix across every public type, matching the SQLite typedef convention (typedef struct mino_X mino_X).

Changed (public types)

| Old | New |
|---|---|
| mino_val_t | mino_val |
| mino_env_t | mino_env |
| mino_future_t | mino_future |
| mino_state_t | mino_state |
| mino_ref_t | mino_ref |
| mino_type_t | mino_type |
| mino_iter_t | mino_iter |
| mino_repl_t | mino_repl |
| mino_vec_builder_t | mino_vec_builder |
| mino_map_builder_t | mino_map_builder |
| mino_set_builder_t | mino_set_builder |
| mino_diag_t | mino_diag |
| mino_gc_kind_t | mino_gc_kind |
| mino_gc_param_t | mino_gc_param |
| mino_gc_stats_t | mino_gc_stats_out (the struct; resolves the collision with the mino_gc_stats function) |
| mino_jit_mode_t | mino_jit_mode |
| mino_jit_capability_t | mino_jit_capability |
| mino_thread_pool_t | mino_thread_pool |
| mino_capability_info_t | mino_capability_info |

Function-pointer typedefs keep their _fn suffix: mino_prim_fn, mino_prim_fn2, mino_finalizer_fn, mino_host_fn, mino_resolve_fn, mino_tx_xform_fn, mino_tx_body_fn, mino_thread_lifecycle_fn. The _fn is semantic, not a POSIX-namespace squat.

Changed (one name collision)

Migration

Added

v0.382.0 — Embedder UX: Cascade-Completion

Phase 1 of the pre-1.0 Embedder UX cycle. Closes the cascade through examples/, tests/embed_api_test.c, and the bundled-stdlib generator that rotted after the v0.151 capability-bitmask revamp. None of those paths had a CI gate, so the rot was invisible until somebody actually opened the directory.

Added

Changed

Removed (dead references)

Migration

v0.381.1 — Fix: agent dosync-send / await race on Linux

Fixes a long-standing CI hang documented in .local/BUGS.md as "CI hang #2: tests/run_migrated.clj on ubuntu-24.04". The hang was deterministic on GHA ubuntu-24.04 runners (both x86 and arm) and masked on macos-14 plus Apple Silicon Docker. Reproducer: a single (dosync (send a fn)) followed by (await a) would block forever on the very first iteration.

Root cause

agent_worker_ensure released agent_mu between spawning the worker pthread and the caller's subsequent agent_enqueue call. A freshly-spawned worker could win the race to agent_mu, see an empty runq, set worker_alive = 0, and exit before the producer ever published its action. The producer's later agent_enqueue then pushed the node + bumped in_flight into a runq with no consumer; (await a) waited on a condition that nothing would ever signal.

Fix

Merge the worker spawn and the enqueue into a single agent_enqueue critical section. The producer:

1. Takes agent_mu.
2. Decides whether a worker is needed (worker_alive == 0), and if so reserves a thread-budget slot atomically.
3. Pushes the action node and bumps in_flight (still under agent_mu).
4. Releases agent_mu.
5. Calls pthread_create only after the node is published.

The new worker's first agent_mu_lock therefore always observes a non-empty runq for the producer that spawned it. Thread-budget refusals revert the enqueue and free the node before throwing so (await a) is never stranded on an action that no worker will process.
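The publish-then-spawn ordering can be sketched with plain pthreads. This is a reduced model under several assumptions: there is no thread budget, workers are detached rather than reaped, and the rollback on a failed pthread_create is left out. All names besides agent_mu, worker_alive, in_flight, and agent_enqueue are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct node { void (*action)(void); struct node *next; } node;

static pthread_mutex_t agent_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  idle_cv  = PTHREAD_COND_INITIALIZER;
static node *head, *tail;
static int worker_alive, in_flight;

static void *worker_main(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&agent_mu);
        node *n = head;
        if (n == NULL) {              /* empty queue: retire under the lock */
            worker_alive = 0;
            pthread_mutex_unlock(&agent_mu);
            return NULL;
        }
        head = n->next;
        if (head == NULL) tail = NULL;
        pthread_mutex_unlock(&agent_mu);

        n->action();
        free(n);

        pthread_mutex_lock(&agent_mu);
        if (--in_flight == 0) pthread_cond_broadcast(&idle_cv);
        pthread_mutex_unlock(&agent_mu);
    }
}

/* Returns 0 on success, -1 on failure. */
int agent_enqueue(void (*action)(void)) {
    assert(action != NULL);
    node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->action = action;
    n->next = NULL;

    pthread_mutex_lock(&agent_mu);
    int need_spawn = (worker_alive == 0);
    if (need_spawn) worker_alive = 1;       /* reserve the worker slot */
    if (tail != NULL) tail->next = n; else head = n;
    tail = n;
    in_flight++;                            /* published before any spawn */
    pthread_mutex_unlock(&agent_mu);

    if (need_spawn) {
        pthread_t tid;
        /* A real implementation would unlink the node on failure. */
        if (pthread_create(&tid, NULL, worker_main, NULL) != 0) return -1;
        pthread_detach(tid);
    }
    return 0;
}

void agent_await(void) {
    pthread_mutex_lock(&agent_mu);
    while (in_flight > 0) pthread_cond_wait(&idle_cv, &agent_mu);
    pthread_mutex_unlock(&agent_mu);
}

static int hits;
void bump(void) { hits++; }   /* example action; runs on the worker */
```

Because the node is queued and in_flight is bumped inside the same critical section that flips worker_alive, a freshly spawned worker cannot observe an empty queue on behalf of the producer that created it.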

The dead agent_worker_ensure and its inline spawn body are removed. agent_worker_reap_pending stays — mino_agent_quiesce_workers still uses it on state teardown, and the new agent_enqueue calls it under agent_mu (with a state-lock yield around the pthread_join) when the previous worker has exited.

Regression test

Added tests/migrated/dosync_send_repro.clj to mino-tests — five iterations of the dosync-send-await pattern with explicit progress logging. Pre-fix, this hangs after the first [repro] iter 0 before await on Linux CI. Post-fix, all five iterations complete in ~50ms locally and the full Migrated suite passes on every CI runner.

Verification

v0.381.0 — C Micro-refactor: Hygiene, Identity-by-name, Dispatch Tables, God-function Splits

Closes the file/function-level cleanup the architecture cycles deferred. Behavior-preserving throughout — every commit kept the full test suite green (1371 tests, 4828 assertions) and the mino-tests batteries (adv-test 18/18, diff-test 7/7, fault-inject 5/5, test-migrated 483 tests / 3397 assertions).

Hygiene

Stripped tracking metadata out of source comments so they describe intent instead of release history:

Identity-by-name for primitives

Pointer-comparing as.prim.fn to detect canonical prims was fragile: it would treat two distinct argv-only prims (both with fn == NULL) as equal. Switched every identity-check site to the stable registered name:
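To illustrate the failure mode, here is a small sketch with a hypothetical `prim` record. It assumes only what the paragraph states: argv-only prims carry a NULL `fn`, and every prim has a registered name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct val val;                  /* opaque value type */
typedef val *(*cons_fn)(val *args);

typedef struct {
    const char *name;   /* stable registered name */
    cons_fn     fn;     /* NULL for argv-only prims */
} prim;

/* Fragile: every argv-only prim compares equal to every other one. */
bool same_prim_by_ptr(const prim *a, const prim *b) {
    return a->fn == b->fn;
}

/* Robust: identity is the registered name. */
bool prim_is(const prim *p, const char *name) {
    assert(p != NULL && p->name != NULL && name != NULL);
    return strcmp(p->name, name) == 0;
}
```

With two argv-only entries named `first` and `count`, the pointer check reports them as the same primitive while the name check keeps them apart.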

Dispatch tables / per-handler extraction

Five tag-dispatch switches reshaped so each branch reads as a named unit:

| Switch | File | Shape |
|-------------------------|-------------------------------|------------------------------------------|
| mino_iter_next | collections/iter.c | {kind, iter_step_fn} table |
| read_dispatch | eval/read.c | {char, read_dispatch_fn} table |
| mino_print_to | eval/print.c | per-tag print_* helpers, switch driver |
| mino_eq | values/val.c | eq_cross_type helper + identity group |
| prim_conj | prim/collections.c | per-kind conj_* helpers |
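The `{kind, iter_step_fn}` shape can be sketched as below. The iterator struct, the two kinds, and the step functions are invented for illustration; only the table-driven layout mirrors the entry above.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ITER_VEC, ITER_LIST } iter_kind;
typedef struct { iter_kind kind; int pos; int len; } iter;
typedef int (*iter_step_fn)(iter *it);   /* 1 if advanced, 0 at end */

/* Vector iteration walks an index up to len. */
static int step_vec(iter *it) {
    if (it->pos >= it->len) return 0;
    it->pos++;
    return 1;
}

/* List iteration consumes the remaining count. */
static int step_list(iter *it) {
    if (it->len == 0) return 0;
    it->len--;
    return 1;
}

/* Each branch of the old switch becomes one named row. */
static const struct { iter_kind kind; iter_step_fn step; } k_iter_steps[] = {
    { ITER_VEC,  step_vec  },
    { ITER_LIST, step_list },
};

int iter_next(iter *it) {
    assert(it != NULL);
    size_t n = sizeof k_iter_steps / sizeof k_iter_steps[0];
    for (size_t i = 0; i < n; i++)
        if (k_iter_steps[i].kind == it->kind)
            return k_iter_steps[i].step(it);
    return 0;   /* unknown kind: treat as exhausted */
}
```

Adding a new kind then means adding one step function and one table row, without touching the driver.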

God-function splits

compile_call_impl was the Tier-1 target. The 614-line dispatcher shrinks to a 77-line orchestrator with nine try_emit_* helpers (arith unop, n-arity arith, two-arg binop, collection-op arity-2/1/3, keyword-as-fn, IC-cached call, plain call) plus two {name, opcode} tables that fold seven shape-identical 2-arg ops (nth, get, conj, dissoc, conj!, dissoc!, disj!) and three 1-arg read ops (first, count, empty?) into single dispatch entries.

| Function | Before | After |
|---------------------------|-------:|------:|
| compile_call_impl | 614 | 77 |
| main | 488 | 296 |
| mino_print_to | 486 | ~80 |
| bind_vec_destructure | 193 | 80 |
| apply_callable (PRIM) | 54 | 3 |

run_repl extracted from main. apply_prim_cons extracted from apply_callable (with a shared current_call_location helper for the file/line/col lookup that fn.c repeats around every push_frame). vec_destructure_args extracted from bind_vec_destructure — the value-to-cons normaliser is now its own ~80-line helper with three named cases (vector / map-entry / lazy-or-chunked).

The FN/MACRO branches of apply_callable, bc_run_dispatch_from (~1180 lines), and mino_jit_compile_inner are documented as deferred — each carries tail-call trampoline / per-pass-state invariants that are real changes rather than pure motion.

Process notes

The Cycle 2 plan assumed all primitives already used the argv ABI and that fn was kept only for identity tagging. Survey at execution time showed the opposite: 344 of 386 prim entries use the cons-spine fn only. Dropping the fn field is its own multi-file port and was carved out into a follow-up cycle; this release ships the identity-by-name fix that makes that future port mechanical instead of behavioral.

The bc_run_dispatch_from / mino_jit_compile_inner / quasiquote splits are documented in .local/micro-refactor-plan.md as deferred targets — each is sizable structural work that earns its own cycle.

Verification. Full mino test suite green (1371 tests, 4828 assertions). mino-tests adv-test (18/18 probes), diff-test (7/7), test-fault-inject (5/5), test-migrated (483 tests / 3397 assertions) all green after companion fix in mino-tests (gc_generational_test switched to a mapv inc (range N) workload that still exercises the generational invariant under mino's now-fused lazy-seq paths).

v0.380.0 — Architecture Cycle 6c: Splits in numeric / collections / sequences

Closes the cycle 6 deferred work on the three plan-named mega-files. Each file now hosts at least one extracted sub-domain in a separate translation unit:

| Origin file | Lines before | Lines after | Extractions |
|---------------------------|-------------:|------------:|----------------------------------------------------------|
| src/prim/numeric.c | 2806 | 2191 | numeric_math.c (203), numeric_bit.c (150), numeric_coerce.c (310) |
| src/prim/collections.c | 2312 | 2152 | collections_transient.c (174) |
| src/prim/sequences.c | 3499 | 3257 | sequences_seq.c (255) |

The extractions follow the cycle-6b pattern: each new file picks up the shared helpers via the existing prim/internal.h forward decls. Where a prim function was previously declared inline next to its definition (prim_math_asin and the rest of the trig+hyperbolic family, log10/log1p/expm1/cbrt, hypot, copy-sign, next-up/down, ieee-remainder, to-radians/to-degrees, signum), the declaration moves to prim/internal.h so the registry table at the bottom of numeric.c keeps compiling against the extracted definitions.

The k_prims_numeric[] / k_prims_collections[] / k_prims_sequences[] registries stay in their original files; the extracted definitions satisfy them through external linkage. prim/install.c is untouched.

Verification. Full test suite green (1371 tests, 4828 assertions). Build clean.

v0.379.0 — Architecture Cycle 4c: Complete mino_state Decomposition

Closes the cycle 4 deferred work. Every field cluster in struct mino_state that the original plan named is now extracted into its own sub-struct, embedded at the byte position the inline fields used to occupy. The stencil-ABI-pinned offsets (sf_*, ic_gen at 47856, bc_regs at 47888, jit_invoke_ctx at 47936, dyn_stack at 25712) survive — verified by the existing _Static_assert guards in src/eval/bc/jit/entry.c, updated to the nested offsetof paths ns_vars.ic_gen / bc.bc_regs / jit.jit_invoke_ctx.

| Sub-struct | Header | Fields | JIT-pinned offset |
|---------------------------|------------------------------------------|-------:|-------------------|
| gc_state_t | src/gc/state.h | ~40 | — |
| stm_subsystem_t | src/prim/stm_state.h | 3 | — |
| agent_subsystem_t | src/prim/agent_state.h | 7 | — |
| async_state_t | src/async/state.h | 3 | — |
| reader_printer_state_t | src/runtime/reader_printer_state.h | 15 | — |
| threading_state_t | src/runtime/threading_state.h | 13 | — |
| ns_vars_state_t | src/runtime/ns_vars_state.h | 17 | ic_gen (47856) |
| bc_vm_state_t | src/eval/bc/state.h | 5 | bc_regs (47888) |
| jit_state_t | src/eval/bc/jit/state.h | 4 | jit_invoke_ctx (47936) |
| module_state_t | src/runtime/module_state.h | ~17 | — |

The print_depth printer counter stays inline in mino_state ahead of reader_printer_state_t so it packs with the preceding caps_installed (both 4-byte ints) and the sub-struct starts at an 8-aligned offset with no outer padding.

Field accesses migrate to nested-struct paths (S->gc_x → S->gc.x, S->ic_gen → S->ns_vars.ic_gen, and so on across ~600 site renames). The bc and jit sub-struct field names retain their bc_ / jit_ prefixes so the access paths read S->bc.bc_regs / S->jit.jit_invoke_ctx, which is verbose but unambiguous in matching the JIT-pinned offset names.
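The offset-preserving extraction can be demonstrated on a toy struct. The field names and sizes below are invented and far smaller than the real mino_state; the sketch only shows why embedding a sub-struct at the position of the inline block leaves later offsets unchanged, and how a nested offsetof path guards that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Before: the cluster lives inline in the state struct. */
struct state_flat {
    int32_t  caps_installed;
    int32_t  print_depth;       /* packs with the preceding 4-byte int */
    uint64_t gc_bytes_alloc;
    uint64_t gc_threshold;
    uint64_t ic_gen;            /* pretend this offset is JIT-pinned */
};

/* After: same fields, same order, grouped in a sub-struct. */
struct gc_state {
    uint64_t bytes_alloc;
    uint64_t threshold;
};

struct state_nested {
    int32_t         caps_installed;
    int32_t         print_depth;
    struct gc_state gc;         /* embedded where the inline block sat */
    uint64_t        ic_gen;
};

/* Compile-time guards using nested offsetof paths. */
_Static_assert(offsetof(struct state_flat, gc_bytes_alloc) ==
               offsetof(struct state_nested, gc.bytes_alloc),
               "extracted field moved");
_Static_assert(offsetof(struct state_flat, ic_gen) ==
               offsetof(struct state_nested, ic_gen),
               "pinned offset moved");
_Static_assert(sizeof(struct state_flat) == sizeof(struct state_nested),
               "struct size changed");

/* Runtime view of the pinned offset, for inspection. */
size_t pinned_ic_gen_offset(void) {
    return offsetof(struct state_nested, ic_gen);
}
```

If a later edit inserted a field ahead of `gc`, the second assert would fail at compile time instead of silently shifting the pinned offset.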

.local/cycle-4-followups.md is closed; the remaining documented work is empty.

Verification. Full test suite green (1371 tests, 4828 assertions). Build clean. JIT-pinned offset asserts pass.

v0.378.0 — Architecture Cycle 6b: Real Mega-prim Splits

Lands the per-sub-domain mega-file splits that cycle 6 (v0.375.0) had scope-reduced. src/prim/bignum.c shrinks from 1797 lines to 702 lines by extracting two sub-domains into their own files:

| File | Before | After | Notes |
|----------------------------|-------:|-------:|--------------------------------|
| src/prim/bignum.c | 1797 | 702 | Bigint payload + install only |
| src/prim/ratio.c (new) | | 510 | MINO_RATIO arithmetic + printing |
| src/prim/bigdec.c (new) | | 625 | MINO_BIGDEC arithmetic + printing |

Shared bigint helpers (to_bigint, bigint_alloc_zeroed, bigint_wrap, mino_double_shortest) move into a new src/prim/bignum_shared.h so the extracted files reach them via a narrow public surface instead of through the umbrella. The k_prims_bignum[] registry stays in bignum.c and references externally-defined prim_bigdec / prim_decimal_p via the existing prim/internal.h forward declarations — no install-time changes.

numeric.c (2806 lines), sequences.c (3499 lines), and collections.c (2312 lines) are still under one roof; their splits follow the same pattern but each carries 40-60 primitives plus its own k_prims_*[] table and is reserved for a dedicated future cycle. Breakdown in .local/cycle-6-followups.md remains current.

Verification. Full test suite green (1371 tests, 4828 assertions). Build clean.

v0.377.0 — Architecture Cycle 4b: Real mino_state Decomposition

Lands the per-subsystem decomposition that cycle 4 (v0.373.0) had scope-reduced. Three sub-structs replace inline field clusters in struct mino_state, each embedded at the byte position the inline fields used to occupy so the stencil-ABI-pinned offsets (sf_*, ic_gen at 47856, bc_regs at 47888, jit_invoke_ctx at 47936, dyn_stack at 25712) are byte-stable:

| Sub-struct | Header | Fields | Field renames |
|-------------------|-------------------------|-------:|--------------:|
| gc_state_t | src/gc/state.h | ~40 | ~316 sites |
| stm_subsystem_t | src/prim/stm_state.h | 3 | ~10 sites |
| async_state_t | src/async/state.h | 3 | ~15 sites |

Field accesses migrate S->gc_<x> → S->gc.<x>, S->stm_<x> → S->stm.<x>, S->async_<x> → S->async.<x>. The runtime_layout.h _Static_assert guards in src/eval/bc/jit/entry.c continue to verify the JIT-pinned offsets at compile time; both clusters preserve the byte layout because each sub-struct contains the same fields in the same order as the inline block it replaces.

The secondary GC instrumentation cluster (per-phase timers, pause ring, sampler rings, alloc-by-tag histogram) stays inline in mino_state past jit_hot_threshold, where adding fields is safe. Subsequent decomposition (eval/jit/bc state, threading state, vars/modules state, reader/printer state) follows the same pattern; the deferred breakdown remains in .local/cycle-4-followups.md.

Verification. Full test suite green (1371 tests, 4828 assertions). Build clean. No JIT regression (runtime_layout asserts pass).

v0.376.0 — Architecture Cycle 7: Cleanup + Graph Audit

Closes the seven-tag v0.370.0 → v0.376.0 architecture refactor cycle. This release runs the cleanup pass and validates the noumenon graph against the new shape.

Cleanup. Stale .d dependency files for sources that no longer exist were deleted (src/public/public_embed.d, src/public/public_gc.d). No stale meta-comments ("Extracted from X", "Phase N", "Moved from Y") landed in the new header carve-outs from cycles 1-3 — the comments in each new file describe the file's current responsibility, not its move history.

Noumenon graph. After noumenon_update, the cross-component-imports query reports 49 distinct cross-component edges (down from the 117-edge baseline at v0.369.0). The components query shows 43 components total, with the per-cycle shape changes visible:

| Cycle | Tag | Shape change |
|------:|----------|-----------------------------------------------------------|
| 1 | v0.370.0 | src/runtime/ umbrella carved into 9 per-concern headers |
| 2 | v0.371.0 | src/values/ first-class component upstream of collections |
| 3 | v0.372.0 | gc <-> collections cycle broken via per-tag tracer table |
| 4 | v0.373.0 | gc_state_t type alias (full struct decomposition deferred) |
| 5 | v0.374.0 | mino.h audited; sampler-dump pair demoted to mino_internal.h |
| 6 | v0.375.0 | Mega-prim files audited; splits deferred |
| 7 | v0.376.0 | This release: cleanup + graph audit |

The values component is structurally present (src/values/layout.h, src/values/internal.h, src/values/val.c, src/values/gc_handlers.c) but the noumenon LLM-analyzed components view still folds it into collections until the next analyze pass refreshes the semantic grouping.

Deferred items. Cycles 4 and 6 shipped scope-reduced. The follow-up work is captured in .local/cycle-4-followups.md (mino_state sub-struct extraction) and .local/cycle-6-followups.md (mega-prim splits) with constraints, recommended order, and verification checklists so the work resumes cleanly in a future cycle.

Verification. Full test suite green across every cycle (1371 tests, 4828 assertions). Build clean. No JIT regression observed.

v0.375.0 — Architecture Cycle 6 (Audit, Splits Deferred)

Cycle 6 set out to split the four mega prim/ files into per-domain sub-folders. On audit the split was descoped: each mega file holds 40-60 primitives plus per-file k_<domain>_table[] registries that prim/install.c references; a clean cut needs careful per-function sub-domain attribution, registry split, and shared-helper rehoming.

| File | Lines | Verdict |
|-------------------------|-------|----------------------------------------------|
| prim/sequences.c | 3499 | Split into sequences/{lazy,reduce,seq_fns,transducer,collection_ops}.c (deferred) |
| prim/numeric.c | 2806 | Split into numeric/{tower,math,preds,coerce}.c (deferred) |
| prim/collections.c | 2312 | Split into collections/{vec_ops,map_ops,set_ops,seq_ops}.c (deferred) |
| prim/bignum.c | 1797 | Keep as-is; the imath wrapper is one logical unit |
| prim/agent.c | 1604 | Keep as-is; the agent runtime is one logical unit |
| prim/stateful.c | 1469 | Optional split (atoms/volatiles/refs); low priority |
| prim/string.c | 1541 | Optional split; weak sub-domain boundaries |

.local/cycle-6-followups.md captures the per-file sub-domain breakdown with line ranges, the registry split plan, the recommended extraction order, and the verification list. The work resumes from that file in a future cycle.

No code change beyond CHANGELOG / version bump.

v0.374.0 — mino.h Audited for Embedder Fit

mino is an embeddable Lisp. A single #include <mino.h> is the contract — every example, JNI/C++ binding, brew/scoop install, and mino-site C API reference depends on the single-header shape. Cycle 5 reaffirms that contract: mino.h stays one header.

This release tightens what mino.h exposes by demoting two runtime-internal safepoint-sampler dumps to mino_internal.h:

These were only called from runtime/state.c's quiesce path and from mino-side primitives in prim/reflection.c. No embedder example or mino-examples cookbook referenced them. They belong with the rest of the diagnostic surface in mino_internal.h.

Everything else stays public: GC stats and tuning, JIT mode and capability, host thread pool surface, capability install, and the allocation profiler (already gated UNSTABLE).

mino.h section banners are unchanged — the existing titles (Version, Value types, Constructors, Collection builders, ...) already make the audience obvious.

Verification. Full test suite green (1371 tests, 4828 assertions). Build clean. Examples build.

v0.373.0 — Architecture Cycle 4 (Scope-Reduced)

Cycle 4's original ambition was to decompose struct mino_state into 10 per-subsystem sub-structs. On audit the cycle was descoped: runtime_layout.h static_asserts pin the sf_* block at fixed byte offsets that the JIT stencil ABI bakes in, and several mino_thread_ctx_t and mino_state_t field offsets are also JIT-pinned. A byte-level reorganization needs a dedicated cycle with full GC-stress matrix, bench corpus, and JIT cross-target parity validation at every step.

This ship lands the safe subset:

No code change beyond the new alias header; the deferred work is visible without polluting mino_state itself.

Verification. Full test suite green (1371 tests, 4828 assertions). Build clean.

v0.372.0 — gc/collections Coupling Broken

The last real bidirectional cycle in the architecture, gc <-> collections, is gone. Previously gc_trace_children in src/gc/driver.c was a 280-line switch over every collection node layout (mino_vec_node_t, mino_hamt_node_t, hamt_entry_t, mino_rb_node_t), the bc record (struct mino_bc_fn), and the union body of struct mino_val. gc had to know every downstream component's storage shape.

The dispatch now goes through S->gc_tracers[GC_T__COUNT], a per-tag function-pointer table populated during state init. Each component owns its own tracers and registers them through a small hook:

| Component | Tracer file | Tags |
|-----------------|--------------------------------------|-------------------------------------------------|
| collections | src/collections/gc_handlers.c | GC_T_VEC_NODE, GC_T_HAMT_NODE, GC_T_HAMT_ENTRY, GC_T_RB_NODE |
| eval/bc | src/eval/bc/gc_handlers.c | GC_T_BC |
| values | src/values/gc_handlers.c | GC_T_VAL |
| gc (residual) | src/gc/driver.c | GC_T_ENV, GC_T_PTRARR, GC_T_VALARR |
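The registration pattern can be sketched in a few lines of C. This is a minimal illustration, not the shipped code: the tag enum, `register_tracer`, and `trace_children` below are hypothetical stand-ins for the GC_T_* tags, the registration hook, and the dispatch through S->gc_tracers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag set; the real GC_T_* tags live in src/gc/layout.h. */
typedef enum { TAG_VEC_NODE, TAG_BC, TAG_VAL, TAG_COUNT } tag_t;

typedef struct state state_t;
/* A tracer visits the children of one heap object of its tag. */
typedef void (*tracer_fn)(state_t *S, void *obj);

struct state {
    tracer_fn tracers[TAG_COUNT]; /* filled once during state init */
    int       visited[TAG_COUNT]; /* demo bookkeeping only */
};

/* Registration hook each component calls for the tags it owns. */
static void register_tracer(state_t *S, tag_t tag, tracer_fn fn) {
    assert(tag < TAG_COUNT && fn != NULL);
    S->tracers[tag] = fn;
}

/* Component-owned tracers; the bodies are placeholders. */
static void trace_vec_node(state_t *S, void *obj) { (void)obj; S->visited[TAG_VEC_NODE]++; }
static void trace_bc(state_t *S, void *obj)       { (void)obj; S->visited[TAG_BC]++; }

/* The collector dispatches by tag and never inspects the layout itself.
 * Returns 0 when no tracer is registered for the tag. */
static int trace_children(state_t *S, tag_t tag, void *obj) {
    tracer_fn fn = S->tracers[tag];
    if (fn == NULL) return 0;
    fn(S, obj);
    return 1;
}
```

The point of the shape is that the dispatcher only knows the function-pointer type; adding a new node layout means registering one more tracer, with no edit to the collector.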

The values-side trace_val knows the union body, but the two field accesses that point into downstream-component structs delegate to component-owned helpers:

so values/ never has to import struct mino_bc_fn or struct mino_future layouts.

The same registration pattern covers finalizers: gc_minor_collect and gc_major_sweep_phase call S->gc_finalizers[h->type_tag] instead of hardcoding the mino_type dispatch for MINO_HANDLE / MINO_BIGINT / MINO_RECORD / MINO_CHUNK / MINO_HOST_ARRAY / MINO_FUTURE cleanup.

src/gc/layout.h carves the shared substrate (gc_hdr_t, GC_T_*, GC_GEN_*, GC_PHASE_*, gc_range_t, gc_bump_slab_t, plus the gc_tracer_fn / gc_finalizer_fn typedefs) into a dedicated header that component-side tracers include without dragging the rest of gc/internal.h.

Verification. Full test suite green (1371 tests, 4828 assertions). Build clean.

v0.371.0 — Values Component

The pointer-tagged value system becomes a first-class component upstream of collections and everything else. struct mino_val, the tag-encoding macros, and val.c (constructors, predicates, equality, interning, hashing) are no longer a subdirectory of collections; they live in src/values/.

| New file | Source of content |
|---------------------------|----------------------------------------------|
| src/values/layout.h | Tag scheme + struct mino_val (from mino_internal.h) |
| src/values/internal.h | val.c forward decls (from collections/internal.h) |
| src/values/val.c | Identical content of the former src/collections/val.c |

The collection-owned intern_table keeps its body in collections/internal.h (the entries vector is structurally collection storage). It picked up a struct tag so values/internal.h can forward-declare it.

Both umbrella headers re-include the new files, so every existing consumer compiles unchanged. The bootstrap Makefile and the task runner pick up src/values/ via -Isrc/values and the source-glob update.

Verification. Full test suite green (1371 tests, 4828 assertions). Build clean.

v0.370.0 — Runtime Umbrella Header Split

First step of a multi-tag architectural refactor. src/runtime/internal.h was a 1796-line umbrella that pulled in every other component's internal.h and inlined the runtime's per-thread context, STM, agent queue, future, lock primitives, and forward declarations alongside struct mino_state. Every .c file under src/ reached for it because the path of least resistance was a single include that surfaced everything. The result was a 117-edge cross-component import graph in which most of the coupling was phantom: a collections file appeared to depend on async because the umbrella pulled async in transitively.

This release carves the umbrella into ten focused per-concern headers (listed below) and switches a first batch of consumers to the narrower includes. struct mino_state itself stays in runtime/internal.h for now: its decomposition is a later cycle, and several inline fast paths (mino_safepoint_poll, mino_lock, mino_unlock) still reach into its fields. The umbrella now forwards the new headers in place; existing consumers compile unchanged.

| New header | What moved in |
|-----------------------------|----------------------------------------|
| runtime/value_assert.h | tag-debug invariants, mino_type_of, accessors, checked size arithmetic |
| runtime/runtime_types.h | env / dyn-binding frames, module / meta / var registry entries, struct mino_env |
| runtime/stm_state.h | tx_ref_state_t, tx_state_t |
| runtime/thread_ctx.h | mino_thread_ctx_t, TLS externs, try_frame_t+MAX_TRY_DEPTH (moved from eval) |
| runtime/agent_queue.h | agent_action_node_t, pool-kind enum |
| runtime/host_future.h | mino_future_state_t, struct mino_future |
| runtime/coordination.h | safepoint + STW driver + state-lock declarations |
| runtime/error_diag.h | error / diag / meta forward decls |
| runtime/env_api.h | env-allocation, env-bind, dyn-binding forward decls |
| runtime/var_module.h | ns_env, var, state PRNG, module decls |

First-batch consumer switches drop four umbrella edges: gc/profile.c (mino.h + gc/internal.h), collections/builders.c (mino.h), collections/chunk.c (mino.h + gc/internal.h + value_assert.h), and collections/iter.c (collections/internal.h + eval/internal.h + value_assert.h).

Verification. Full test suite green (1371 tests, 4828 assertions). Build clean.

v0.369.0 — Perf cycle G close

Closes the seven-tag v0.363.0 -> v0.369.0 cycle. The plan called for five design-heavy follow-ons that cycle F deferred; all five shipped:

| tag | item | result |
|---|---|---|
| v0.363.0 | OLD-VALARR write-barrier retrofit | shipped |
| v0.364.0 | SATB drop + end-of-mark gc_mark_roots | shipped |
| v0.365.0 | JIT slab pool: infra + small-fn wire-up | shipped |
| v0.366.0 | JIT slab pool: per-fn slot invalidate | shipped |
| v0.367.0 | Template-aware bc recompile | shipped |
| v0.368.0 | Weak intern table, skip MAJOR_MARK walk | shipped |

The cycle covers four cross-cutting design lines that v0.362's audit ranked as high-leverage but not single-release-sized:

Verification.

The seven new tags stack on top of the v0.330..v0.362 work on perf-cycle-a that has been awaiting a coordinated push since cycle C. The standing no-push-without-ask rule still applies.

v0.368.0 — Weak intern table, skip MAJOR_MARK walk

The symbol and keyword intern tables stop pinning every interned name across major cycles. Entries survive only when reached through some other root (var registry, namespace env, compiled bc consts, runtime values, the explicit special-form symbol cache). Anything that loses every reachable path gets tombstoned at end-of-mark and the underlying header is freed in the regular OLD sweep.

What changes.

Identity semantics. Interned symbols are stable across their natural lifetime (var-held, env-held, bc-consts-held, or held in any live data structure). One-shot transient interns ((symbol "foo") whose result is discarded) get fresh identities across cycles, mirroring JVM Clojure's permgen / metaspace unload pattern.
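The identity behaviour can be illustrated with a toy table. This is a sketch under stated assumptions, not mino's intern table: the fixed-capacity array, the `marked` / `dead` flags, and all function names are hypothetical, and tombstoned slots are never reused here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define INTERN_CAP 8

typedef struct {
    const char *name;
    bool        marked; /* set when some other root reached the symbol */
    bool        dead;   /* tombstone: slot reaped at end-of-mark */
    int         id;     /* stands in for the symbol's identity */
} intern_entry_t;

typedef struct {
    intern_entry_t slots[INTERN_CAP];
    size_t         used;
    int            next_id;
} intern_table_t;

/* Return the identity for name, interning a fresh one if no live entry exists. */
static int intern(intern_table_t *t, const char *name) {
    for (size_t i = 0; i < t->used; i++) {
        intern_entry_t *e = &t->slots[i];
        if (!e->dead && strcmp(e->name, name) == 0) return e->id;
    }
    assert(t->used < INTERN_CAP);
    intern_entry_t *e = &t->slots[t->used++];
    e->name = name; e->marked = false; e->dead = false; e->id = t->next_id++;
    return e->id;
}

/* Called by the (imagined) marker when a root reaches the symbol. */
static void mark_reached(intern_table_t *t, int id) {
    for (size_t i = 0; i < t->used; i++)
        if (!t->slots[i].dead && t->slots[i].id == id) t->slots[i].marked = true;
}

/* End of mark: tombstone every entry no root reached, then clear the marks. */
static size_t tombstone_unreached(intern_table_t *t) {
    size_t reaped = 0;
    for (size_t i = 0; i < t->used; i++) {
        intern_entry_t *e = &t->slots[i];
        if (e->dead) continue;
        if (!e->marked) { e->dead = true; reaped++; }
        e->marked = false;
    }
    return reaped;
}
```

A name that stays reachable keeps its identity across cycles; a name nothing else holds is tombstoned, and the next intern of the same text yields a fresh identity.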

Verification.

v0.367.0 — Template-aware bc recompile

Closures share their template's bc record by reference through OP_CLOSURE. The fold-staleness recompile path in invoke_bc_fn_argv used to null the closure's bc and call mino_bc_compile_fn(S, fn) on the closure itself, producing a fresh per-closure bc. After many closures from the same template hit the staleness branch, each one ended up with its own bc copy and the dedup that OP_CLOSURE set up was silently undone.

mino_val_t.as.fn grows a template_fn back-pointer that OP_CLOSURE (both the interpreter handler and the JIT slow helper) populates with the template fn. The staleness branch checks the back-pointer: when set and still pointing at the template that owns the stale bc, recompile the template once and copy its fresh bc to every sibling closure that hits the branch later. Plain templates and non-closure fns leave template_fn NULL and follow the legacy recompile-on-self path.

The new field is GC-traced through the MINO_FN walker (both the production gc_trace_children and the MINO_GC_VERIFY=1 walker pick it up) so a closure cannot outlive its template.
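A reduced model of the staleness branch, assuming a generation counter stands in for the fold-staleness check. The types, the `bc_pool` array, and `fresh_bc` are hypothetical; the real path also verifies that the template still owns the stale bc, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

typedef struct bc { int compile_gen; } bc_t;

typedef struct fn {
    bc_t      *bc;          /* shared with the template for closures */
    struct fn *template_fn; /* NULL for plain templates and non-closure fns */
} fn_t;

static bc_t bc_pool[16];    /* stand-in for GC-allocated bc records */
static int  bc_compiles;    /* how many fresh bc records were produced */

static bc_t *compile(int gen) {
    assert(bc_compiles < 16);
    bc_t *bc = &bc_pool[bc_compiles++];
    bc->compile_gen = gen;
    return bc;
}

/* Return a bc that is fresh for current_gen, compiling at most once per template. */
static bc_t *fresh_bc(fn_t *f, int current_gen) {
    if (f->bc->compile_gen == current_gen) return f->bc; /* not stale */
    fn_t *t = f->template_fn;
    if (t != NULL) {
        if (t->bc->compile_gen != current_gen)
            t->bc = compile(current_gen); /* first sibling to hit the branch */
        f->bc = t->bc;                    /* later siblings only copy the pointer */
        return f->bc;
    }
    f->bc = compile(current_gen);         /* legacy recompile-on-self path */
    return f->bc;
}
```

With two closures over one template, the second staleness hit reuses the template's fresh bc, so the compile counter advances once rather than once per closure.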

Verification.

v0.366.0 — JIT slab pool: per-fn slot invalidate

mino_jit_invalidate now releases a slab-allocated fn's slot claim through the new mino_jit_slab_release helper: decrement live_slots on the owning slab; on the last release, unlink the slab from S->jit_slabs and munmap the page. Legacy one-page-per-fn compiles are unchanged (their region keeps its lifetime on the jit_regions list until state teardown).

The slot bump cursor inside a slab is never rewound -- slots are append-only within a slab, and reclamation is at slab granularity rather than slot granularity. The tradeoff is simplicity: a slab that has any live slot left holds its whole page; a slab that loses every slot frees its page immediately. For a typical workload this approximates the "live slot density" cleanly, since invalidations tend to cluster (a def storm or a load that invalidates the whole ic generation).
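The append-only cursor and slab-granularity reclamation can be sketched as follows. The struct fields other than live_slots and both function names are hypothetical, and the `mapped` flag stands in for the unlink plus munmap the release performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t page_bytes; /* size of the backing page */
    size_t cursor;     /* bump offset; only ever grows */
    size_t live_slots; /* slots still claimed by a compiled fn */
    bool   mapped;     /* false once the page would have been munmap'd */
} slab_t;

/* Claim need bytes; returns the slot offset, or (size_t)-1 if the slab is full. */
static size_t slab_claim(slab_t *s, size_t need) {
    assert(s->mapped);
    if (need > s->page_bytes - s->cursor) return (size_t)-1;
    size_t off = s->cursor;
    s->cursor += need; /* append-only: never rewound on release */
    s->live_slots++;
    return off;
}

/* Drop one slot claim; returns true when it was the last and the page goes away. */
static bool slab_release(slab_t *s) {
    assert(s->mapped && s->live_slots > 0);
    if (--s->live_slots > 0) return false; /* other slots still pin the page */
    s->mapped = false;                     /* real code: unlink + munmap */
    return true;
}
```

Releasing one of two slots leaves the cursor where it was, so the freed bytes are not reused; only the final release gives the page back.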

Slab-path native_pc_offsets is tracked on the jit_regions list with a NULL region pointer, so state teardown reaps the malloc'd table even for slabs that weren't fully released during the run.

Verification.

v0.365.0 — JIT slab pool: infra + small-fn wire-up

Small JIT'd fns (need_bytes <= 4 KB pre-page-rounding) now share host pages from a per-state slab pool instead of mmap'ing a dedicated page per fn. The previous one-page-per-fn allocator left 92-99 % of each page dead -- a 200-byte body on macOS arm64 (16 KB pages) wasted 16 184 bytes. The slab pool packs slot-by-slot until a page fills, then allocates a fresh slab.
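The waste figure is plain arithmetic and can be checked with two helper functions. Both names are hypothetical, and the packing count ignores any per-slot alignment the real pool applies.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes left unused when one fn body gets a dedicated page. */
static size_t dedicated_page_waste(size_t page_bytes, size_t body_bytes) {
    assert(body_bytes <= page_bytes);
    return page_bytes - body_bytes;
}

/* How many equally sized bodies one shared page holds when packed slot by slot. */
static size_t bodies_per_slab(size_t page_bytes, size_t body_bytes) {
    assert(body_bytes > 0);
    return page_bytes / body_bytes;
}
```

For the 200-byte body on a 16 KB page, `dedicated_page_waste(16384, 200)` gives the 16 184 bytes quoted above, while `bodies_per_slab(16384, 200)` shows the same page could hold 81 such bodies.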

What changes.

mino_bc_fn_t grows one new pointer (native_slab) so the per-fn invalidate path scheduled for v0.366.0 can locate the owning slab. The jit_regions list keeps tracking legacy allocations; slab pages live on the new jit_slabs list and are munmap'd at state teardown.

Verification.

The measured win on JIT footprint surfaces once v0.366's invalidate path lands (so a redef storm doesn't pin every slot's contribution to the slab).

v0.364.0 — SATB drop, end-of-mark re-rooting

The hybrid Yuasa-plus-Dijkstra write barrier drops its deletion (SATB) half. The major remark phase grows a full gc_mark_roots pass that re-walks every precise root before the final stack scan and mark-stack drain, replacing the snapshot-at-begin invariant with an incremental-update invariant the existing Dijkstra (insertion) push already captures.

What changes in the barrier. The MAJOR_MARK branch of gc_write_barrier previously pushed two values for every slot store: old_value (SATB, preserves snapshot reach) and new_value (Dijkstra, captures inserted edges). The old_value push is removed. The new_value push is unchanged. The gc_barrier_satb_pushes field stays on mino_state_t for (gc-stats) ABI continuity; new builds always read 0.

What changes in major remark. gc_major_remark used to do only gc_scan_stack + gc_drain_mark_stack. It now also calls gc_mark_roots first, which is what makes the SATB drop sound. Anything reachable through any root at end of mark gets traced; anything that lost every root path is correctly swept. State-owned containers whose slot writes bypass the per-op barrier for hot-path speed (bc_regs, per-worker bc_regs_storage) get their YOUNG / OLD frontier covered by the new remark walk.
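The barrier change amounts to dropping one push. A minimal sketch of the insertion-only shape, with a hypothetical `gc_t`, mark stack, and `write_barrier` signature (the real gc_write_barrier also handles generational bookkeeping not shown here):

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PHASE_IDLE, PHASE_MAJOR_MARK } gc_phase_t;

#define MARK_STACK_CAP 64

typedef struct {
    gc_phase_t phase;
    void      *mark_stack[MARK_STACK_CAP];
    size_t     mark_top;
} gc_t;

static void mark_push(gc_t *gc, void *v) {
    assert(gc->mark_top < MARK_STACK_CAP);
    gc->mark_stack[gc->mark_top++] = v;
}

/* Store new_value into *slot. During major mark only the inserted edge is
 * pushed (Dijkstra); the overwritten value is not pushed (no SATB half). */
static void write_barrier(gc_t *gc, void **slot, void *new_value) {
    if (gc->phase == PHASE_MAJOR_MARK && new_value != NULL)
        mark_push(gc, new_value);
    *slot = new_value;
}
```

A store that clears a slot pushes nothing in this shape, which is why the end-of-mark root walk is needed to keep the collection sound.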

Soundness argument:

Cost trade:

MINO_GC_VERIFY=1 task release-gate clean; the v0.362-era verify trap stays closed. JIT parity byte-identical.

v0.363.0 — OLD-VALARR write-barrier retrofit + bc-regs remset pin

Closes the OLD-VALARR remset-miss bug surfaced under MINO_GC_VERIFY=1 at the end of the previous cycle. Two fixes land together because they address the same invariant from opposite ends.

Argv write sites routed through the barrier. The fn-apply trampolines built a scratch argv on the C stack and, on overflow, spilled to a heap-allocated GC_T_VALARR. The spilled buffer could survive a minor across nested mino_bc_run re-entries, get promoted to OLD, and then take fresh slot stores without going through gc_write_barrier. Four sites in src/eval/fn.c now route through gc_valarr_set whenever the buffer is heap-resident:

The stack-resident case keeps the direct assignment; the heap-resident case threads the remset entry.
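The two store paths can be sketched like this. `argv_buf_t`, `barriered_set`, and `argv_store` are hypothetical names; `barriered_set` only counts calls where the real gc_valarr_set would record the remset entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_ARGV_CAP 4

typedef struct {
    void **slots;
    bool   heap_resident; /* true once the argv spilled to a GC-managed array */
    size_t barrier_hits;  /* demo counter for stores that took the barrier */
} argv_buf_t;

/* Stand-in for a barriered store such as gc_valarr_set. */
static void barriered_set(argv_buf_t *a, size_t i, void *v) {
    a->barrier_hits++; /* real code would thread the remset entry here */
    a->slots[i] = v;
}

/* Store one argument, taking the barrier only for heap-resident buffers. */
static void argv_store(argv_buf_t *a, size_t i, void *v) {
    assert(a->slots != NULL);
    if (a->heap_resident) barriered_set(a, i, v);
    else                  a->slots[i] = v; /* C-stack scratch: direct store */
}
```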

bc_regs pinned in the remset. The bytecode VM register stack (S->bc_regs plus per-worker bc_regs_storage) is a state-owned GC_T_VALARR whose slots receive YOUNG values from the inner-loop VM hot path WITHOUT a per-op barrier -- the cost would be prohibitive on every register write. The existing design compensates by walking the live slot range from gc_mark_roots, but the remset itself stayed unaware of those edges. MINO_GC_VERIFY=1 correctly flagged the unaware-remset state as an invariant violation.

gc_remset_reset now re-adds each OLD bc-regs buffer to the remset after every clear. One remset entry per worker per cycle; minor's gc_mark_remset already covers the slot walk the verify pass was demanding. The cost is a single unconditional gc_remset_add per minor.

Verification.

This release is a precondition for the SATB drop scheduled for v0.364.0.

v0.362.0 — Perf cycle F close (SATB drop deferred)

Closes the 12-tag v0.351.0 -> v0.362.0 cycle. The plan called for 10 perf items grouped into 3 phases; 4 items shipped with measurable wins and 6 deferred with documented rationale.

The SATB drop (the cycle's final scheduled item) is deferred per the pre-approved decision rule: the v0.361 verification pass surfaced a pre-existing OLD-VALARR -> YOUNG-VAL remset miss under MINO_GC_VERIFY=1 that SATB had been masking. Dropping SATB would make the existing remset-miss bug user-visible. The bug is logged in .local/BUGS.md as a P1 for a focused follow-on fix. SATB push restored to its pre-prototype shape; gc_barrier_clear_only counter removed.

The temporary instrumentation field added in v0.361.0 is removed; gc_barrier_satb_pushes keeps its slot in (gc-stats) for ABI continuity (always non-zero under the restored hybrid barrier).

Cycle scoreboard:

Measurable wins:

Carry-over (ordered by expected leverage):

1. JIT slab pool + template-aware bc recompile (items 3 + 4 combined).
2. OLD-VALARR remset-miss fix (precondition for the SATB drop).
3. Weak intern table (unblocks item 6).
4. SATB drop (ships once #2 is fixed).

Full cycle synopsis lives at mino/.local/perf-cycle-f-close.md.

task test-jit-parity byte-identical across all 4 binaries. task release-gate clean (default MINO_GC_VERIFY unset). task perf-gate carries the pre-existing small-map allocation regression from v0.344.0; no new regressions.

v0.361.0 — SATB-drop audit + verification

Cycle F item 8 part 1. Pure verification pass -- no behaviour change to the hybrid write barrier.

Adds a temporary gc_barrier_clear_only counter (ticks during MAJOR_MARK when gc_write_barrier fires with new_value == NULL AND old_value != NULL -- i.e., the SATB push is doing the only work for that call).

Audit doc landed at .local/satb-audit.md covering every clear-only call site:

Verdict: safe to drop SATB. v0.362 ships the removal.

task test-jit-parity byte-identical across all 4 binaries with the instrumented build. task release-gate clean.

v0.360.0 — Adaptive major-slice budget

Replaces the static gc_major_work_budget default (4096 headers per slice) with an adaptive controller targeting a configurable STW pause length.

Tail-placed past the existing pause-distribution fields in mino_state_t so the JIT-pinned offsets (ic_gen, bc_regs, jit_invoke_ctx) stay stable.
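One possible shape for such a controller, as a sketch only. The release notes do not spell out the control law, so the proportional shrink, the `MIN_BUDGET` floor, and the function name are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_BUDGET 4096u /* headers per slice, the former static default */
#define MIN_BUDGET      256u /* hypothetical floor so a slice always makes progress */

/* Next slice budget given the observed median pause and the target pause. */
static uint32_t next_budget(uint32_t budget, uint64_t median_ns, uint64_t target_ns) {
    assert(target_ns > 0 && budget > 0);
    if (median_ns <= target_ns) return DEFAULT_BUDGET; /* under target: keep default */
    /* Over target: shrink in proportion to the overshoot (assumed rule). */
    uint64_t scaled = (uint64_t)budget * target_ns / median_ns;
    return scaled < MIN_BUDGET ? MIN_BUDGET : (uint32_t)scaled;
}
```

Under this rule a workload whose median pause sits below target keeps the 4096 default, while a median at twice the target halves the next slice.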

task test-jit-parity byte-identical across all 4 binaries. task release-gate clean. task perf-gate carries the pre-existing small-map regression; no new regressions.

Honest measurement note: the alloc-heavy local probe doesn't sustain enough mark-stack pressure for adaptive to kick in (median pause stays well under target), so the default budget remains 4096 in steady state. The lever activates on workloads that DO push median pause past target, where it dampens tail max without changing total GC ns (measurement-driven; controlled experiment in a future cycle). The plan's "max pause <= 1.5x target" gate cannot be exercised on the current corpus -- pause ring's p99 is already under target.

v0.359.0 — Major-mark root pruning deferred (soundness gate)

Cycle F item 6 ("Major-mark root pruning") proposed gating the intern-table walks (sym_intern + kw_intern) on the GC phase so MAJOR_MARK skips them. The cycle E dashboard's root-scan fraction (25 % of major-mark cost on the alloc-pressure workload) was the motivating evidence.

Local measurement confirms the cost is real:

| workload | root-scan-ns | major-mark-ns | scan share |
|---|---:|---:|---:|
| (dotimes [_ 100] (vec (map inc (range 10000)))) | 1.30 ms | 0.49 ms | 73 % |

Soundness assessment, however, says the plan's "conservative-scan-catches-young-interns" claim doesn't hold in mino's design: the intern table itself is the canonical root for interned symbols / keywords. Names interned by code that's no longer reachable through any other root path (no var, no ns_env entry, no compile-time const) survive only because gc_mark_intern_table pushes them every cycle. Skipping the walk would cause a use-after-free on the next mino_intern_symbol call that looks up the same name and finds a stale pointer in the table.

Fixing this properly requires changing the intern table's storage to be weak-ref-shaped: live entries get reaped during sweep when the underlying symbol has no other references, and the table's lookup path tombstones reaped slots. That's a non-trivial change to mino's runtime symbol identity.

Decision: defer. The lever's measured upside (~25 % of major-mark cost) is real but its enabling refactor (weak intern table) is too large for a single-release scope and needs its own design pass. Captured in the next-cycle backlog alongside the JIT slab pool and bc-template dedup levers.

v0.358.0 — Anonymous-fn bc dedup deferred (diagnostic logged)

Cycle F item 4 ("Anonymous-fn bc dedup cache") was scoped to address the pipeline workload's 168 :bc per pipeline run. Diagnostic instrumentation localised the actual mechanism: when a closure inherits its template's bc and the bc carries has_folds == true, the compile_ic_gen != S->ic_gen check in invoke_bc_fn_argv (src/eval/fn.c:805) recompiles bc per closure -- one GC_T_BC alloc per closure invocation whose template's bc has been invalidated by an interleaved ic_gen advance.

The trace on a 100-iter pipeline workload confirms it: ~100 [bc-compile-fn] events, all sharing the same params / body pointer pair (i.e., the same source template). Each event allocates a fresh mino_bc_fn_t, because the recompile path overwrites the *closure's* bc field, never reaching the template's bc.

The plan's content-hashed fn-template cache solves the problem in the general case, but the simpler fix -- template-aware recompile, where invalidation re-runs mino_bc_compile_fn against the *template* and rebroadcasts the fresh bc pointer to all closures created since the last ic_gen advance -- is local and well-bounded.

Either path is more than a single-release cycle. Combined with the v0.356 / v0.357 JIT slab pool deferral, the follow-on cycle's two clearest entries are now:

1. JIT slab pool (W^X-aware sub-allocator).
2. Template-aware bc recompile (closure invalidation propagates via the template, not per-closure).

Both have measurement-evidence-backed leverage at this runtime layer. The diagnostic that surfaced item 2 is captured here so the follow-on cycle starts with the root cause, not the symptom.

v0.357.0 — JIT per-fn slot invalidate (folded into v0.356 deferral)

Part 2 of the JIT slab pool work. The per-fn invalidate path depends on the v0.356 slab allocator landing first; deferring as the second half of the same combined follow-on cycle. See v0.356 above for the full rationale.

v0.356.0 — JIT region sub-allocator deferred

Cycle F item 3 ("JIT region sub-allocator") was scoped as two releases (v0.356 design + small-fn alloc, v0.357 per-fn slot invalidate). The v0.349 dashboard ranked this as the cycle's biggest single measured lever -- every JIT-compiled fn pays one full mmap page (16 KB on Apple Silicon), and code bodies are 92-788 B, so per-fn page waste is 94-99 % and a pipeline workload with 168 JIT'd lambdas wastes ~2.7 MB.

The sub-allocator requires substantial new infrastructure:

Each of these is straightforward in isolation; together they exceed a single-release scope by the cycle's measurement-driven rhythm (every release ships with before/after numbers from the Cycle E surfaces).

Decision: defer both v0.356 and v0.357 as a single follow-on cycle. The measurement evidence (94-99 % per-page waste, ~220 KB on 14 fns, ~2.7 MB on 168 fns) remains the cycle's single biggest lever and is now documented in the dashboard as the next-cycle entry point.

Honest impact: no runtime change in this release. JIT memory footprint for mino-bench/jit_blocker_workloads.clj and the new alloc_site_saturation.clj rows stays at the v0.355 levels.

v0.355.0 — Fold delay realisation into C prim_deref

The instrumentation cycle dashboard ranked <core>:844:16 (the 1-arg arity of the deref shadow at src/core.clj:844) as the dominant by-wall-time fn on the protocol workload (inv=2400504 total=1.07 s avg=444 ns max=11 ms).

The shadow exists to wrap the C deref so delays (map-shaped on mino, with :delay/fn + :delay/state keys) participate in (deref ...) like atoms / futures / vars. Every (deref x) call paid one extra fn-call hop plus a (delay? x) type-check even when x was a plain atom.

The C prim_deref now realises delays directly on the MINO_MAP branch: looks up :delay/fn, invokes it on hit, returns the result. The Clojure-side shadow at src/core.clj:837-848 is removed (replaced by a comment pointing to the C-side handling). realized? keeps its Clojure shadow at src/core.clj:849 since that one wraps a different C prim and adds non-trivial type-routing.

Measured perf delta on (loop [...] (+ acc (deref a))) tight-loop benchmark over 1 M iterations:

| mode | baseline (shadow) | post (C prim) | delta |
|---|---:|---:|---:|
| JIT auto | 477 ms | 447 ms | -6.3% |
| JIT off | 452 ms | 452 ms | ±noise |

The JIT-on win is small but real (the wrapper's per-call overhead was being partially masked by IC caching). The real-shape gain is architectural -- the public deref API is now one indirection deeper instead of two, and the delay-realisation path lives in the same place as atom/var/future/agent dispatch.

task release-gate clean. task perf-gate carries the pre-existing small-map regression; no new regressions.

v0.354.0 — Bench corpus expansion

Doc-only marker. Three new bench files landed on mino-bench's perf-cycle-d branch to close the dark spots the v0.349 synthesis dashboard surfaced:

Each bench file runs cleanly under the four instrumentation env-flag combinations from mino/.local/instrumentation-dashboard.md.

mino-side: no runtime change. Future cycles that want a before / after measurement against the dashboard's dark-spot coverage now have a stable corpus.

v0.353.0 — Protocol-call cached fast lane deferred

Cycle F items 1 + 9 ("Per-call argv allocation in the protocol dispatch slow path" and "Protocol per-call env-bind overhead") were planned as a fast lane that routes the cached protocol impl through mino_bc_run_known_native, bypassing apply_callable_argv's env_child / env_bind / argv-copy overhead. The instrumentation cycle dashboard cited 4.8 M :valarr + 2.4 M :env + 2.4 M :raw allocations across 600 protocol-method outer calls on the cycle E baseline as the motivating evidence (mino/.local/top-10-priorities.md).

A prototype of the fast lane was implemented and verified correct (the IC slot classifies the impl's callable kind on refill; the dispatch hot path checks the kind and routes single-clause / no-rest / argc-matching impls through mino_bc_run_known_native). task test-jit-parity was byte-identical across all four binaries with the prototype in place.

Local measurement, however, showed no win:

| workload | baseline | prototype | delta |
|---|---:|---:|---:|
| 1-arg protocol × 5 M iters (JIT on) | 467 ms | 466 ms | ±noise |
| 3-arg protocol × 5 M iters (JIT on) | 215 ms | 217 ms | ±noise |
| 1-arg protocol × 5 M iters (JIT off) | 510 ms | 510 ms | ±noise |
| :env alloc delta over 5 M calls | 114 | 114 | identical |
| :valarr alloc delta over 5 M calls | 106 | 106 | identical |

The dashboard's heavy :valarr footprint is not reproducible on the current corpus -- the most likely explanation is that the protocol-dispatch stencil added after cycle E close already captured most of the win the fast lane was meant to address. apply_callable_argv's MINO_FN lane goes through invoke_bc_fn_argv which does no :valarr alloc on the hot path for single-clause callees.

Per the project's "every perf change must have a before/after bench" / "do not ship noise-level wins" rule, the prototype is reverted. The IC slot already carries cached_callable_kind and cached_bc on the GLOBAL path; extending the PROTOCOL path is straightforward when a future workload surfaces actual alloc pressure here.

Honest follow-up note: a workload that DOES reproduce the dashboard's 8000 :valarr per outer call would re-open the case; the implementation is documented and ready.

v0.352.0 — JIT loop matcher: constant-step accumulator

Extends the counted-loop matcher to accept literal-int accumulator steps like

(loop [i 0 acc 0] (if (< i N) (recur (inc i) (+ acc 2)) acc))

and the commuted (+ 2 acc) form. Implementation is a pure matcher addition: the matcher recognises (+ acc N) for any non-1 int constant N, allocates a fresh register, emits OP_LOAD_K to materialise N into that register, then emits the OP_LOOP_INT_LT_ACC / OP_LOOP_INT_DEC_ACC stencil from v0.351 with the temp register as the step source.

No new opcodes, no new stencils. The pre-load runs once per loop entry; the per-iteration cost is identical to the register-step path.

| workload | jit-off | jit-on | speedup |
|---|---:|---:|---:|
| (loop [i 0 acc 0] (if (< i 100k) (recur (inc i) (+ acc 2)) acc)) × 200 | 71 ms | 40 ms | 1.78x |

The matcher excludes N == 1 since that shape is already covered by the existing (+ acc 1) -> OP_LOOP_INT_LT_INC fast path. Negative constants like (+ acc -3) work via the same lane.
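The lowering can be modelled in C as a one-time register load followed by the register-step loop. This is an illustration of the idea only: the register array and both function names are hypothetical, and overflow handling is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Register-step accumulator loop: acc += regs[step] while regs[i] < n. */
static int64_t loop_lt_acc(int64_t *regs, int i, int acc, int step, int64_t n) {
    while (regs[i] < n) {
        regs[acc] += regs[step];
        regs[i]   += 1;
    }
    return regs[acc];
}

/* Constant-step form: materialise k into a temp register once, then reuse the loop. */
static int64_t loop_lt_acc_const(int64_t n, int64_t k) {
    int64_t regs[3] = { 0, 0, 0 }; /* r0 = i, r1 = acc, r2 = temp */
    assert(k != 1);                /* step 1 already has its own fast path */
    regs[2] = k;                   /* one-time pre-load, the OP_LOAD_K analogue */
    return loop_lt_acc(regs, 0, 1, 2, n);
}
```

Passing the counter register as the step source (`loop_lt_acc(regs, 0, 1, 0, n)`) gives the (+ acc i) shape from v0.351, which is why no new loop body is needed for the constant case.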

task test-jit-parity byte-identical across all 4 binaries. task release-gate clean. task perf-gate carries the pre-existing small-map allocation regression from v0.344.0; no new regressions.

v0.351.0 — JIT loop matcher: arithmetic-step accumulator

Extends the counted-loop matcher to the canonical Clojure-shape arithmetic-step accumulator

(loop [i 0 acc 0] (if (< i N) (recur (inc i) (+ acc i)) acc))

and its reverse-counted sibling. Two new opcodes carry the shape into the JIT:

Both opcodes get a copy-and-patch stencil that inlines the tagged-int fast path with the safepoint downcounter; the slow path conses through prim_lt / prim_zero_p / prim_inc / prim_dec / prim_add so the canonical diagnostic still fires on overflow or non-int operands. Stencil byte tables regenerated for all 5 hosts.

Before this release these shapes fell through to generic recur-driven body compile, and the v0.348 native sampler showed zero coverage on real-shape accumulator counter loops.

| workload | jit-off | jit-on | speedup |
|---|---:|---:|---:|
| (loop [i 0 acc 0] (if (< i 100k) (recur (inc i) (+ acc i)) acc)) × 200 | 71 ms | 40 ms | 1.77x |

The existing OP_LOOP_INT_LT_INC / OP_LOOP_INT_DEC_INC step=1 fast paths are unchanged. This release lifts the dashboard's loop-matcher blind spot on real-shape counter loops.

task test-jit-parity byte-identical across all 4 binaries. task release-gate clean. task perf-gate carries the pre-existing small-map allocation regression from v0.344.0 (logged in .local/BUGS.md); no new regressions.

v0.350.0 — Perf cycle E close

Doc-only marker closing the v0.345.0 → v0.349.0 pure-instrumentation cycle. 16 tags shipped across 6 phases; nothing deferred. Every planned surface landed honestly mapped to mino's actual collector / barrier / JIT design (a few plan fields intentionally omitted with rationale captured in the originating release notes -- gc_minor_flip_ns, gc_minor_promote_ns, gc_remset_overflows, gc_card_dirties).

Acceptance gates met:

Dashboard at mino/.local/instrumentation-dashboard.md re-ranks the standing perf backlog by measured evidence:

Out-of-band carry-over the user flagged: a small mino-bench expansion cycle (1-2 days) to add bench rows the dashboard found dark -- counter-only loops, deopt-triggering shapes, per-tag alloc-stress.

The next-cycle entry point is the JIT region sub-allocator work, with the now-instrumented runtime providing the before/after measurement substrate.

v0.349.0 — Synthesis dashboard

Doc-only release. Ran the v0.345..v0.348 instrumentation surfaces over a representative workload mix and synthesised the output into a ranked-levers dashboard at mino/.local/instrumentation-dashboard.md.

Seven dashboard sections per the cycle plan: GC profile, JIT profile, allocation breakdown, compile-side declines, sampler hot regions, cross-cut ranked levers, next-cycle recommendation.

Headline findings:

Top-three recommended next moves: JIT region sub-allocator, loop matcher extension, pipeline allocation audit.

No production code change. The dashboard is the deliverable.

v0.348.2 — Allocation-site sampler (env-gated light)

Complements the compile-gated MINO_ALLOC_PROFILE with an always-runnable variant. MINO_ALLOC_SAMPLE=1 activates the sampler; MINO_ALLOC_SAMPLE_RATE=N tunes the rate (default 4096 -- one out of every 4096 allocations). At each gc_alloc_typed_inner call the counter ticks and, when the period trips, a 16-byte mino_alloc_sample_t triple (immediate return address, tag, log2 size bucket) lands in a fixed-size 4096-entry ring (~64 KB).
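The tick-and-record path can be sketched as below. The struct and function names are hypothetical, the 16-byte size of the triple assumes a 64-bit target, and the bucket uses floor(log2(size)) as one plausible reading of "log2 size bucket".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAP 4096u

typedef struct {
    uintptr_t site;        /* immediate caller's return address */
    uint32_t  tag;
    uint32_t  size_bucket; /* floor(log2(size)) */
} alloc_sample_t;

typedef struct {
    uint32_t       rate;    /* sample one allocation out of every rate */
    uint32_t       counter;
    uint32_t       head;    /* samples written so far; slot = head % RING_CAP */
    alloc_sample_t ring[RING_CAP];
} alloc_sampler_t;

static uint32_t log2_bucket(size_t size) {
    uint32_t b = 0;
    while (size > 1) { size >>= 1; b++; }
    return b;
}

/* Called on every allocation; records a sample when the period trips. */
static void sampler_tick(alloc_sampler_t *s, uintptr_t site, uint32_t tag, size_t size) {
    assert(s->rate > 0);
    if (++s->counter < s->rate) return; /* common case: count and leave */
    s->counter = 0;
    alloc_sample_t *e = &s->ring[s->head++ % RING_CAP];
    e->site = site;
    e->tag = tag;
    e->size_bucket = log2_bucket(size);
}
```

Once the ring wraps, the oldest samples are overwritten, so memory stays fixed at 4096 entries regardless of run length.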

New public API:

The site value is the immediate caller of gc_alloc_typed_inner, captured via __builtin_return_address(0) -- maps back to a source file/line via addr2line on the mino binary.

Probe (50 cycles of (vec (range 0 5000)) + (apply hash-map (range 0 1000)) with rate 512):

Default cost: one branch + counter increment on the alloc path when the env flag is unset.

task release-gate is OK.

v0.348.1 — Native-side sample tag

mino_sampler_fire now consults ctx->jit_invoke_depth at each sample capture and sets the low bit of mino_sample_t.flags when the safepoint hit fired from inside JIT'd native code (the loop stencil's downcounter calls mino_bc_safepoint while jit_invoke_depth > 0).

mino_sampler_dump aggregates a parallel native_count per (bc, pc) bucket so the dump line now reads samples=N native=M fn=... pc=... op=....

Pair the per-fn samples_native total with v0.346.0's jit_invocations to compute samples-per-native-invocation, the proxy for per-call CPU cost inside JIT'd code.

Probe (200-iter warmup + 100x loopy(500000) with the matcher-compatible counter-only loop shape, period 100):

Multi-binding loops with non-trivial recur steps (e.g. (loop [i 0 acc 0] ... (recur (inc i) (+ acc i)))) don't yet fuse to OP_LOOP_INT_LT so they don't fire the stencil's safepoint downcounter and stay invisible to the native tag. The matcher-extension cycles already on backlog (Cycle D's loop-matcher-rejects notes) will widen this once they ship.

task release-gate is OK.

v0.348.0 — Safepoint-based CPU sampler

Adds the first CPU sampling profiler to mino. MINO_SAMPLE=1 activates the sampler; MINO_SAMPLE_PERIOD=N tunes the sample rate (default 1000 -- one sample every 1000 safepoint hits). At each mino_bc_safepoint call the counter ticks and, on a matching period, the current ctx's bc_current_bc / bc_current_pc are recorded as a mino_sample_t { bc, pc, op, flags } into a per-state ring buffer (capacity 65536 entries = 1 MB, allocated lazily on the first sample).

Two new public API entries in mino.h:

mino_sample_t is 16 bytes (one pointer + uint32_t + two uint16_t); the flags low bit is reserved for the v0.348.1 native-side tag.

Probe (50000-iter outer loop over fib(20), period 100):

Default cost: one branch + counter increment per safepoint. The env probe runs once per state via a sniffed tri-state.

task release-gate is OK.

v0.347.2 — Collection-size histogram (env-gated)

MINO_COLL_SIZE_STATS=1 activates a coll_size_hist[3][32] on mino_state_t. Ticked at mino_persistent, which is the canonical finalize entry for every transient-based collection build. Kind 0 = vector, 1 = map, 2 = set; bucket = clamp(floor(log2(size + 1)), 0..31). Default builds carry one sniff + one branch on the persistent path.
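The bucket rule can be sketched directly from the formula above. The function names `coll_size_bucket` and `coll_size_tick` are hypothetical; the 3 x 32 shape and the kind numbering follow the entry.

```c
#include <assert.h>
#include <stddef.h>

#define COLL_KINDS 3    /* 0 = vector, 1 = map, 2 = set */
#define COLL_BUCKETS 32

/* bucket = clamp(floor(log2(size + 1)), 0..31) */
unsigned coll_size_bucket(size_t size) {
    size_t v = size + 1;
    unsigned b = 0;
    if (v == 0) return COLL_BUCKETS - 1; /* size + 1 wrapped: use the top bucket */
    while (v > 1) { v >>= 1; b++; }
    return b < COLL_BUCKETS ? b : COLL_BUCKETS - 1;
}

/* Ticks the histogram slot for one finalized collection. */
void coll_size_tick(unsigned long long hist[COLL_KINDS][COLL_BUCKETS],
                    unsigned kind, size_t size) {
    assert(kind < COLL_KINDS);
    hist[kind][coll_size_bucket(size)]++;
}
```

Under this rule an empty collection lands in bucket 0, a one-entry map in bucket 1, and a five-element vector in bucket 2.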

Surfaced via (gc-stats) as :coll-size-hist, a map of kind-keyword -> length-32 bucket vector. Zero-kind entries elided.

Probe (100 [1..5] transients + 50 {:a :b} transients with MINO_COLL_SIZE_STATS=1):

Direct mino_vector / mino_map / mino_set calls from C that don't transit mino_persistent are not ticked; this is the documented scope (the transient finalize covers the dominant script-level construction pattern, and adding ticks to the direct constructors would be a separate pass once a dashboard surfaces a measurable C-only construction workload).

task release-gate is OK.

v0.347.1 — BC compile-decline histogram

Adds S->bc_declines[BC_DECLINE__COUNT] (size 16) and a small enum of decline categories. The compiler ticks the matching bucket at structural decline sites:

Reserved buckets (QUALIFIED_HEAD, DESTRUCTURE, RECUR_OUTSIDE, NESTING_LIMIT, OOM) are wired into the enum but not yet ticked at their leaf sites; they will be filled in as the dashboard surfaces concrete blockers for each family. The catch-all OTHER ensures every decline is counted regardless of attribution depth.

Surfaced via (gc-stats) as :bc-declines, a keyword -> count map (zero buckets elided). Probe (one try-fn + one binding-fn + 30-iter loop): {:macro 13} — both try and binding are compile-handled, so the only declines come from stdlib macro expansions hitting the macro head-resolve path.

task release-gate is OK.

v0.347.0 — Per-tag allocation histogram

Adds an always-on gc_alloc_by_tag[GC_T__COUNT] (size 16) counter array to mino_state_t. Each gc_alloc_typed_inner call increments the slot for the requested tag (one extra indexed store on the alloc path; negligible vs. the header init + list-link cost the path already pays).

Surfaced on mino_gc_stats_t as alloc_by_tag[16] (raw indices preserved) and through (gc-stats) as :alloc-by-tag, a map of tag-keyword -> count: :raw / :val / :env / :vec-node / :hamt-node / :hamt-entry / :ptrarr / :valarr / :rb-node / :bc. Zero-count entries are elided.

A new GC_T__COUNT = 16 enum entry bounds the array and leaves slack for one round of tag growth without resizing the public struct.

Probe (3 cycles of vec/hash-map/hash-set construction):

:val dominating by ~6x over the next tag is the expected fingerprint of cons-heavy + literal-heavy workloads.

task release-gate is OK.

v0.346.3 — JIT compile-time + code region usage

Three new always-on fields on mino_bc_fn_t:

Pair with the existing bc->native_size to derive total region usage: region_used = code + tramp + pool, where tramp and pool are not separately surfaced (they are small and bounded).

MINO_CPJIT_STATS=tracing per-fn dump now prints a compile: ns=N code=K B region=M B dead=D B follow-up line for each compiled fn.

Probe (caller -> helper, JIT-compiled):

The 98% dead-byte ratio on tiny fns is a real finding: every JIT'd fn pays for a full mmap'd page, so a workload with many small JIT'd fns has high region waste. This is a candidate lever for the dashboard.
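The arithmetic behind the dead-byte ratio can be sketched as below, assuming the region is rounded up to whole pages. The function names and the sizes used in the note that follows are illustrative, not measured values from the probe.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds a byte count up to a whole number of pages. */
size_t round_up_to_page(size_t bytes, size_t page_size) {
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size * page_size;
}

/* Percentage of the mapped region holding no code, trampolines, or pool data. */
double dead_byte_percent(size_t used_bytes, size_t page_size) {
    size_t region = round_up_to_page(used_bytes, page_size);
    if (region == 0) return 0.0;
    return 100.0 * (double)(region - used_bytes) / (double)region;
}
```

For a hypothetical fn using 256 bytes of a 16384-byte page, `dead_byte_percent(256, 16384)` is about 98.4, which is the shape of waste the finding describes.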

task release-gate is OK. All v0.346.3 fields are always-on (no env gate); the per-compile cost is one extra mino_monotonic_ns() pair plus three stores at the very end of mino_jit_compile_inner, negligible.

v0.346.2 — Per-site IC stats (env-gated)

MINO_JIT_IC_STATS=1 lazily allocates a parallel POD buffer of mino_bc_ic_stat_t { uint64_t hits; uint64_t misses; uint64_t thrash; } triples sized to ic_slots_len on the first IC resolve per bc. The buffer is GC_T_RAW and reached via gc_trace_children in both the MINO_FN and GC_T_BC cases so it stays alive while bc lives.

Counters tick inside ic_resolve_global only — the JIT inline fast path (slot hit verified in machine code, slow helper never called) does NOT increment. This means:

The MINO_CPJIT_STATS=tracing per-fn dump now prints an ic-sites (slot: hits / misses / thrash) block for fns whose ic_stats buffer has any non-zero entries, ordered by slot index for stable diffs across runs.

Probe (caller -> helper via 200-iter loop with 3 interpreter warmups): slot 0 reports 98 / 1 / 0. The miss is the cold resolve; the 98 hits are the interpreter calls before the JIT threshold trips. Post-JIT calls (~100) are uncounted because the inline path serves them.

task release-gate is OK. Default builds carry zero overhead (env probe is a single static-initialised tri-state check).

v0.346.1 — Per-fn JIT wall-time (env-gated)

Two new mino_bc_fn_t fields, populated only when MINO_JIT_TIME_FNS=1 is set in the environment:

mino_jit_invoke reads the env flag once into a static tri-state so the default path stays at v0.346.0 cost (one load + one branch per call). When the flag is on, two mino_monotonic_ns() reads bracket the native f() call; overhead measured at ~5-10 ns per call, which the perf-gate budget tolerates for warm fns.
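A cached tri-state probe of this kind might look like the sketch below. The helper name `env_flag_enabled` and the rule that only the exact value "1" enables the flag are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { FLAG_UNKNOWN = -1, FLAG_OFF = 0, FLAG_ON = 1 };

/* Reads the environment at most once per cache; later calls are one load + one branch. */
int env_flag_enabled(int *cache, const char *name) {
    assert(cache != NULL && name != NULL);
    if (*cache == FLAG_UNKNOWN) {
        const char *v = getenv(name);
        /* Assumption: only the exact string "1" turns the flag on. */
        *cache = (v != NULL && strcmp(v, "1") == 0) ? FLAG_ON : FLAG_OFF;
    }
    return *cache;
}
```

A caller would keep a `static int` cache initialised to `FLAG_UNKNOWN` next to the hot path and pass its address on every call, so the `getenv` lookup happens only on the first invocation.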

The MINO_CPJIT_STATS=tracing per-fn dump adds a wall: total=N ns avg=M ns max=K ns follow-up line whenever native_total_ns > 0, so an embedder running both env flags together sees per-fn cost-per-call alongside the invocation count.

Probe (1000-iter fib-50 loop with both env flags on):

task release-gate is OK.

v0.346.0 — Per-fn JIT invocation + deopt counters

Two uint64_t counters per compiled bc record:

Counters land directly on mino_bc_fn_t rather than a global side table because the JIT compile path already owns the record's lifetime, and the existing MINO_CPJIT_STATS=tracing per-fn dump path can read them at exit without walking a separate registry. The tracing dump now reports per-fn inv=N deopts=M alongside the existing reason= / code_len= / native= columns.

Probe (single fn, dotimes 200 over inner loop): tracing dump shows inv=N deopts=0 for the JIT'd body. (Exact N depends on warm-up: top-level invocations made before the threshold is crossed run through the interpreter and don't count.)

task release-gate is OK.

v0.345.3 — Pause-time distribution

Adds the GC pause-time distribution surface. Every collection / slice / force-finish site that already computed elapsed_ns now also records the pause through gc_record_pause, which appends to a 256-entry ring (one slot per pause, saturating at UINT32_MAX ns) and ticks a 24-bucket log2 histogram [2^i, 2^(i+1)) ns indexed by floor(log2(ns)) clamped at 23.
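The ring plus histogram can be sketched as follows. The names `pause_stats_t`, `pause_record`, and `pause_percentile` are hypothetical, and the nearest-rank percentile method is an assumption; the entry does not state how the percentiles are derived from the window.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAUSE_RING 256
#define PAUSE_BUCKETS 24

typedef struct {
    uint32_t ring[PAUSE_RING];    /* most recent pauses, saturated to 32 bits */
    uint64_t count;               /* total pauses recorded */
    uint64_t hist[PAUSE_BUCKETS]; /* bucket i covers [2^i, 2^(i+1)) ns */
} pause_stats_t;

void pause_record(pause_stats_t *p, uint64_t ns) {
    unsigned b = 0;
    uint64_t v = ns;
    p->ring[p->count % PAUSE_RING] = ns > UINT32_MAX ? UINT32_MAX : (uint32_t)ns;
    p->count++;
    while (v > 1) { v >>= 1; b++; } /* floor(log2(ns)); 0 and 1 map to bucket 0 */
    p->hist[b < PAUSE_BUCKETS ? b : PAUSE_BUCKETS - 1]++;
}

static int cmp_u32(const void *a, const void *b) {
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile over the retained window (pct in 1..100). */
uint32_t pause_percentile(const pause_stats_t *p, unsigned pct) {
    uint32_t sorted[PAUSE_RING];
    size_t n = p->count < PAUSE_RING ? (size_t)p->count : PAUSE_RING;
    assert(pct >= 1 && pct <= 100);
    if (n == 0) return 0;
    memcpy(sorted, p->ring, n * sizeof sorted[0]);
    qsort(sorted, n, sizeof sorted[0], cmp_u32);
    size_t rank = (n * pct + 99) / 100; /* ceil(n * pct / 100) */
    return sorted[rank - 1];
}
```

Because the percentile reads only the ring, it reflects at most the last 256 pauses, while the histogram keeps accumulating over the whole run.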

Two new public accessors land in mino.h:

(gc-stats) surfaces the percentiles directly:

:max-gc-ns (cumulative-since-state-creation max) keeps its existing meaning. The new percentiles are window-scoped to the last 256 pauses so a long-running embedder sees the recent shape, not the lifetime extreme.

Probe (20 build-vec / build-map rounds, 16 minors + 3 majors):

task release-gate is OK.

v0.345.2 — Bytes promoted + young-survival age histogram

Two new surfaces on mino_gc_stats_t and (gc-stats):

Probe (vector atom + 200 conj rounds × 10 churn calls):

task release-gate is OK.

v0.345.1 — GC barrier + overflow counters

Three new cumulative counters surfaced through mino_gc_stats_t and (gc-stats):

The plan's gc_remset_overflows counter is intentionally omitted: mino aborts on a remset realloc failure rather than silently dropping the entry, so the event is unobservable from the surviving runtime. The plan's gc_card_dirties counter is omitted because mino has no card-marking variant (the remembered set + SATB / Dijkstra pair handles the same job).

Probe (atom + swap! cycles under an in-flight major):

The dominance of Dijkstra over SATB in this microbench is expected for a Yuasa-with-insertion barrier: most slots being written are the just-installed pointer, while the old-value path is rarer (snapshot already covered it via root scan). Future cycles can use the SATB/Dijkstra ratio across the corpus to evaluate whether barrier work can be reshaped.

task release-gate is OK. No new perf-gate regressions introduced; same one pre-existing small-map row.

v0.345.0 — Per-phase GC timers

First release in the instrumentation cycle that started after the previous close. Adds five per-phase GC timer fields to the state and surfaces them through mino_gc_stats_t plus the (gc-stats) reflection primitive:

Two plan fields are intentionally omitted: gc_minor_flip_ns (mino's nursery is mark-and-sweep with age-based promotion, not a copying semispace -- there is no flip phase) and gc_minor_promote_ns (promotion is interleaved into the minor sweep loop, not a separable phase). A separate byte-count surface for promotion volume lands in a later release.

Acceptance: sum of the four _mark_ns + _sweep_ns fields tracks :total-gc-ns within 5%. Verified on a synthetic collect-heavy probe (3 minors + 1 major over 5 build-vec / build-map cycles): sum / total = 1.025. Verified on a heavier corpus (31 minors + 4 majors over 30 cycles): sum / total = 1.011.

| metric (3 minor + 1 major run) | value |
|---|---|
| :total-gc-ns | 3.52 ms |
| :minor-mark-ns | 1.01 ms |
| :minor-sweep-ns | 2.05 ms |
| :major-mark-ns | 0.44 ms |
| :major-sweep-ns | 0.10 ms |
| :root-scan-ns | 0.42 ms |
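The acceptance rule can be written as a small check. This is a sketch with hypothetical names (`phase_timers_t`, `phase_sum_ratio`, `phase_sum_within_tolerance`); it reads "within 5%" as a ratio between 0.95 and 1.05, which is an interpretation rather than something the entry spells out.

```c
#include <assert.h>

typedef struct {
    double total_gc_ns;
    double minor_mark_ns, minor_sweep_ns;
    double major_mark_ns, major_sweep_ns;
} phase_timers_t;

/* Ratio of the four mark/sweep phase timers to the total GC time. */
double phase_sum_ratio(const phase_timers_t *t) {
    assert(t->total_gc_ns > 0.0);
    return (t->minor_mark_ns + t->minor_sweep_ns +
            t->major_mark_ns + t->major_sweep_ns) / t->total_gc_ns;
}

/* Acceptance rule from the entry: the sum tracks the total within 5%. */
int phase_sum_within_tolerance(const phase_timers_t *t) {
    double r = phase_sum_ratio(t);
    return r >= 0.95 && r <= 1.05;
}
```

Feeding in the rounded values from the table (1.01 + 2.05 + 0.44 + 0.10 = 3.60 against 3.52) gives a ratio of roughly 1.02, in line with the reported 1.025 once rounding is accounted for.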

task release-gate is OK. task perf-gate reports the same one pre-existing small-map allocation regression as v0.344.0 (see mino/.local/BUGS.md); no new perf-gate regressions were introduced.

The new state fields are placed past jit_hot_threshold so runtime_layout.h offsets used by the JIT stencils do not shift.

v0.344.0 — Perf cycle close

Doc-only marker closing the v0.340 → v0.343 cycle. Only v0.340 ships a measurable code change (loop matcher accepts (+ counter 1) and (+ 1 counter) shapes for the counter step, measurement-neutral on the current corpus but unblocks future fusion work from quietly missing arithmetic-counter loops). v0.341 shipped corpus expansion in mino-bench (3 new bench files). v0.342 and v0.343 ship rationale-only deferrals on predicate+branch fusion (architectural blockers) and GC discovery (insufficient mino_gc_stats resolution to choose a lever).

The next cycles, ordered by leverage:

1. GC instrumentation cycle: add per-phase timers and write-barrier counters in src/gc/{minor,major}.c, re-run the alloc_pressure_bench + protocol-state-machine corpus.
2. Predicate+branch fusion: wait for a cycle that can commit to the per-fusion-site safety pass + multi-class chain marker work.
3. mino-site documentation refresh: cover v0.323 → v0.343 of accumulated cycle deltas on /performance/ and add a side-exit / deopt section.

No production code change in this release.

v0.343.0 — GC discovery, no production change

Doc-only release. Captured GC fraction across ten workloads spanning small / medium / large object-size buckets plus the protocol state machine 5k ticks real-shape baseline. The GC fraction lands in a 10-20% band across allocation-heavy rows with allocation rate, not collection cost, as the apparent driver. Whether mark / sweep / promote dominates inside that band cannot be answered with the current mino_gc_stats surface (total ns + collection counts only).

The honest output of this phase is that a productive future GC cycle starts with per-phase timer instrumentation in src/gc/{minor,major}.c and a write-barrier hit counter, then re-runs this corpus with the finer data. The Phase 4 verdict is "spin out a follow-up cycle gated on instrumentation"; this release ships the discovery doc as the only artifact.

The alloc_pressure_bench rows from v0.341.0 carry over as the working corpus for the next attempt.

v0.342.0 — Predicate+branch fusion deferred

Doc-only marker. The architectural investigation for fusing the <cmp>_II → JMPIFNOT and CALL_CACHED → JMPIFNOT bigrams identified two structural blockers:

Expected upside per the bigram data: 2-3% on loop-heavy workloads -- borderline above the gate the plan set. Combined with the infrastructure cost, a future cycle picks this up with a refined design (per-site safety pass first, then prototype one fused pair, then mirror only on measured win).

No production code change in this release.

v0.341.0 — Workload corpus expansion (mino-bench)

Doc-only marker. Three new benches land in mino-bench:

Running MINO_CPJIT_STATS=tracing over jit_blocker_workloads now reports one ok-with-deopt fn (the compute-or-throw helper at line 27, blocked at OP_THROW@pc=11). The corpus clears the > 10% OK_WITH_DEOPT gate that the forward stencil hook work has waited on since v0.339.0.

No source change in mino itself. The companion bench commits live on mino-bench's perf-cycle-d branch.

v0.340.0 — Loop matcher accepts (+ counter 1) as inc-equivalent

The counted-loop matcher in compile.c recognises three step shapes for a loop counter: (inc counter), (+ counter 1), and (+ 1 counter). Previously only (inc counter) qualified; user code that writes (+ i 1) -- a common Clojure-canonical form -- fell through to the unfused LT_II / JMPIFNOT / ADD_IK / JMP sequence.
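The three accepted step shapes can be illustrated with a toy matcher. This sketch works on a step form flattened to string tokens, which is a simplification for illustration; the real matcher in compile.c operates on the compiler's own form representation, and `is_inc_equivalent` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A step form flattened to its tokens, e.g. {"+", "i", "1"} for (+ i 1). */
int is_inc_equivalent(const char *const *form, size_t len, const char *counter) {
    assert(form != NULL && counter != NULL);
    if (len == 2) /* (inc counter) */
        return strcmp(form[0], "inc") == 0 && strcmp(form[1], counter) == 0;
    if (len == 3 && strcmp(form[0], "+") == 0) {
        int counter_first = strcmp(form[1], counter) == 0 && strcmp(form[2], "1") == 0;
        int one_first     = strcmp(form[1], "1") == 0 && strcmp(form[2], counter) == 0;
        return counter_first || one_first; /* (+ counter 1) or (+ 1 counter) */
    }
    return 0;
}
```

A step such as (+ i 2), or one that increments a different binding, is rejected and would stay on the unfused sequence.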

A 10M-iter microbench on arm64-darwin confirms the three forms run within noise of each other when the fused op fires (16.8-16.9ms each). The recur-shape bench (mino-bench) adds three new rows demonstrating that (+ i 1) and (+ 1 i) match the (inc i) baseline in both 1-binding and 2-binding shapes.

Honest scope note: across the workload corpus surveyed for this release, no row that previously rejected for "counter step not (inc counter)" ALSO has its non-counter binding step in inc-equivalent form. The realistic_bench loops that motivated this change still reject at the non-counter-step check (they have (assoc m i ...) / (conj v i) as the second step). Fusing those is gated on the predicate+branch fusion landing in a later release; this change positions the matcher to recognise the counter-step shape so the later fusion work doesn't quietly miss arithmetic-counter loops.

v0.339.0 — Perf cycle close

Marker for the close of the v0.330 → v0.338 perf cycle. The cycle bundled three families of work: reduce / builder rewrites (v0.330–v0.332), the write-side bang fast-lane family (v0.333–v0.334), and BC-compile coverage for non-empty map / set literals (v0.335–v0.337), capped by bigram-discovery instrumentation (v0.338).

Forward stencil hooks (native ← interpreter resume) were gated on the classifier reporting OK_WITH_DEOPT at materially > 0% of compiled fns. Across the workload corpus (real_workloads, realistic_bench, jit_bench, protocol_bench, eval_bench, lazy_bench, map_bench), 100% of compiled fns landed at the plain ok reason -- the side-exit path is in place but the workload mix doesn't exercise unstenciled middles. Deferred.

No code change in this release.

v0.338.0 — Bigram discovery instrumentation, no fusions shipped

Discovery-only release. The bytecode dispatch loop now records opcode-pair (bigram) frequencies alongside the existing per-op counter when the binary is built with -DMINO_BC_OP_COUNTS=1. Output at process exit ranks adjacent op pairs by absolute dispatch frequency so a future cycle can identify candidates for superinstruction fusion. Zero runtime cost on the default build (the instrumentation is compile-gated).
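Bigram counting of this kind can be sketched as a pair matrix updated once per dispatch. The opcode count, the type `bigram_counter_t`, and the function names are hypothetical; in the real build the instrumentation is compiled in only with -DMINO_BC_OP_COUNTS=1.

```c
#include <assert.h>
#include <stdint.h>

#define OP_COUNT 8        /* hypothetical opcode count for the sketch */
#define OP_NONE  OP_COUNT /* sentinel: no previous op yet */

typedef struct {
    uint64_t pair[OP_COUNT][OP_COUNT]; /* pair[prev][cur] dispatch counts */
    unsigned prev;
} bigram_counter_t;

void bigram_init(bigram_counter_t *c) {
    for (unsigned i = 0; i < OP_COUNT; i++)
        for (unsigned j = 0; j < OP_COUNT; j++)
            c->pair[i][j] = 0;
    c->prev = OP_NONE;
}

/* Called once per dispatched opcode. */
void bigram_record(bigram_counter_t *c, unsigned op) {
    assert(op < OP_COUNT);
    if (c->prev != OP_NONE) c->pair[c->prev][op]++;
    c->prev = op;
}

/* Finds the most frequent adjacent pair and returns its count.
 * Ties resolve to the lowest (prev, cur) index; outputs are left
 * untouched when no pair has been recorded. */
uint64_t bigram_top(const bigram_counter_t *c, unsigned *prev, unsigned *cur) {
    uint64_t best = 0;
    for (unsigned i = 0; i < OP_COUNT; i++)
        for (unsigned j = 0; j < OP_COUNT; j++)
            if (c->pair[i][j] > best) { best = c->pair[i][j]; *prev = i; *cur = j; }
    return best;
}
```

Ranking all pairs rather than only the top one is a matter of sorting the matrix entries by count at process exit.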

Findings. Running the workload corpus (real_workloads.clj, realistic_bench.clj, jit_bench.clj, diff_clojuredocs.clj) surfaced three classes of dominant bigrams: predicate + branch (GE_II → JMPIFNOT, LT_II → JMPIFNOT), call-site setup (MOVE → CALL_CACHED, GETGLOBAL_CACHED → LOAD_K), and trivial-tail (MOVE → RETURN, <arith> → RETURN). The predicate-branch family can't be JIT-fused without restructuring the JMP / JMPIFNOT direct-emit lane. The call-site family can't shed the MOVE because OP_CALL_CACHED reads its args from a contiguous regs window. The trivial-tail family fuses cleanly, but a prototype OP_FUSED_MOVE_RETURN showed -0.4% to +2.4% movement across realistic_bench rows -- all within run-to-run noise and well below the 7% gate. Prototype reverted; the instrumentation stays in for future discovery work.

No production code path changed in this release. The bigram counter is dev-only and the default build is byte-identical to v0.337.0.

v0.337.0 — Non-empty map / set literals BC-compile

Three related fixes that together let defn bodies containing non-empty map / set literals stay on the bytecode path.

Constructor-call lowering. compile_expr previously bailed to tree-walk eval for any non-empty map / set literal in a defn body. The whole enclosing fn ran through the interpreter, which defeated builder-rewrite and any per-element optimisation. Lower {:k0 v0 :k1 v1 ...} to (hash-map :k0 v0 :k1 v1 ...) and #{a b c} to (hash-set a b c) at compile time so the call site builds a fresh collection per invocation with values evaluated in the current local scope.

count_symbol_uses traversal. Dead-binding elimination in compile_let calls count_symbol_uses to decide if a binding can be dropped. The walker handled SYMBOL, CONS, and VECTOR but not MAP / SET. After the lowering above, a let body that's a literal map referencing a let-bound symbol survives the lowering -- but the dead-elim heuristic, blind to map values, would drop the binding as unused. The recursive walker now traverses MAP keys and values and SET elements.
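The traversal gap can be illustrated with a toy walker. The node type below is a hypothetical stand-in for the runtime's value representation (symbols compared by name, maps storing keys and values interleaved); it shows only why MAP and SET need the same recursive treatment as CONS and VECTOR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { N_SYMBOL, N_CONS, N_VECTOR, N_MAP, N_SET, N_OTHER } node_kind_t;

/* Toy form node: symbols carry a name, containers carry children.
 * A map stores its keys and values interleaved in `items`. */
typedef struct node {
    node_kind_t         kind;
    const char         *name;
    struct node *const *items;
    size_t              n_items;
} node_t;

/* Counts references to `sym` anywhere inside `form`. */
size_t count_symbol_uses(const node_t *form, const char *sym) {
    size_t total = 0;
    assert(sym != NULL);
    if (form == NULL) return 0;
    switch (form->kind) {
    case N_SYMBOL:
        return strcmp(form->name, sym) == 0 ? 1 : 0;
    case N_CONS:
    case N_VECTOR:
    case N_MAP: /* keys and values */
    case N_SET: /* elements */
        for (size_t i = 0; i < form->n_items; i++)
            total += count_symbol_uses(form->items[i], sym);
        return total;
    default:
        return 0; /* numbers, keywords, strings */
    }
}
```

Without the `N_MAP` and `N_SET` cases, a body like {:k x} would report zero uses of x, and dead-binding elimination would drop a binding that is still live.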

OP_THROW parity with prim_throw data extraction. The bytecode OP_THROW handled the no-enclosing-try fallback by extracting :mino/message / :mino/kind / :mino/code and routing through prim_throw_classified -- dropping any :data / :mino/data payload. prim_throw had grown a set_eval_diag_with_data path specifically so future workers preserve ex-info data through their consumer-side rethrow. OP_THROW had missed that fix. With the BC path now lighting up for defn bodies that throw ex-info-style maps (because the map-literal lowering above lets them compile), OP_THROW dropped the data field on future-throw scenarios. Mirror prim_throw's extraction.

Failure surface that prior cycle attempts hit

A v0.336 experiment along the same lines broke defmacro defprotocol at core boot with unbound symbol: mname because the underlying count_symbol_uses gap was invisible while every non-empty map kept falling to tree-walk -- the dead-elim path never ran. Lowering exposed the gap. The full fix is the two-part change above plus the OP_THROW parity, all in one commit.

realistic_bench, median of 2, arm64-darwin

Wall-time deltas sit in run-to-run noise; the row floors are dominated by HAMT walk + GC. The fix lands because it removes three silent fallback paths that prior releases lost to tree-walk; future workloads that lean on map-returning fns now stay on bytecode.

v0.336.0 — Lower non-const vector literals to `(vector ...)` calls

compile_expr already const-pools vector literals whose elements are all self-evaluating, but vectors with at least one non-const element fell through to the tree-walk fallback, dropping the whole defn body off the bytecode path. (defn vlit [n] [n n n]) is the canonical instance.

Lower the non-const case to (vector e0 e1 ...) at compile time so the existing call path evaluates each element per-invocation and constructs a fresh vector. The lowering is local to compile_expr's literal dispatch; the recursive call sees the same compiler context so locals stay in scope.

Map and set literals with non-empty contents still decline. A parallel attempt to lower them as (hash-map ...) / (hash-set ...) calls broke defmacro defprotocol's template at core boot -- the syntax-quoted parts of macro bodies treat literal maps as data, not as expressions to evaluate. Worth a focused investigation before that lane lands; tracked in .local/BUGS.md.

Direct measurement (microbench: 100 calls of `[n n n]`)

| metric | before | after |
|--------|-------:|------:|
| bc-dispatches per call | 0 (tree-walk fallback) | ~5 |

realistic_bench, median of 2, arm64-darwin

| row | v0.335 JIT off | v0.336 JIT off | Δ | v0.335 JIT on | v0.336 JIT on | Δ |
|-----|---------------:|---------------:|--:|--------------:|--------------:|--:|
| bump 5k int-map | 16.94 ms | 17.03 ms | flat | 17.64 ms | 17.43 ms | flat |
| build 5k int-map | 10.18 ms | 10.43 ms | flat noise | 9.55 ms | 9.65 ms | flat |
| nested 500x100 | 18.09 ms | 18.14 ms | flat | 17.18 ms | 17.18 ms | flat |

Measurement-neutral on realistic_bench rows; the workloads don't use the now-BC-compiled [a b c] pattern in their hot path. The fix lands because it removes a silent fall-back path -- any defn body containing [expr expr] now stays on bytecode.

v0.335.0 — Const-pool empty map / set literals

compile_expr declined every MINO_MAP and MINO_SET literal (and non-all-const vector). Any defn whose body contained one -- including the rewritten loops the builder rewrite emits as (persistent! (loop [acc (transient {})] ...)) -- fell back to tree-walk eval for the entire fn. Confirmed via alloc profile: build-int-map's 5000 (assoc m i v) calls hit mino_map_assoc1 (persistent path) 5000 times despite the rewrite firing, because the rewritten form never reached bytecode.

Treat empty map / set literals as const-poolable. An empty map has no element forms to evaluate; const-pool storage is trivially safe. This unblocks the rewrite chain: (loop [acc {}] ...) rewrites to (persistent! (loop [acc (transient {})] ...)), which now BC-compiles and routes through mino_assoc_bang / mino_map_assoc1_owned.

Non-empty maps / sets still decline. The all-self-evaluating-elements case would be safe in theory, but a focused investigation into cross-thread const-pool visibility for future-exception data maps (see .local/BUGS.md) is required before that lane lands.

Direct measurement (microbench: 5 × build-int-map 5000)

| metric | before | after | Δ |
|--------|-------:|------:|--:|
| mino_map_assoc1 (persistent) calls | 5008/iter | 0 | -100% |
| mino_map_assoc1_owned (transient) calls | 0 | 5001/iter | new |
| total allocs / op | ~4 KB | ~407 B | -90% |
| total bytes allocated (5 iters) | ~30 MB | 10.2 MB | -66% |

realistic_bench, median of 3, arm64-darwin

| row | v0.334 JIT off | v0.335 JIT off | Δ | v0.334 JIT on | v0.335 JIT on | Δ |
|-----|---------------:|---------------:|--:|--------------:|--------------:|--:|
| bump 5k int-map | 16.94 ms | 16.94 ms | flat | 17.30 ms | 17.64 ms | flat noise |
| build 5k int-map | 10.27 ms | 10.18 ms | flat | 9.60 ms | 9.55 ms | flat |
| nested 500x100 | 18.23 ms | 18.09 ms | flat | 17.10 ms | 17.18 ms | flat |

Wall-time movement on realistic_bench rows sits in run-to-run noise; the workloads' floors are now dominated by something other than the prior tree-walk-eval overhead. The 90% allocation reduction will compose with future GC throughput work.

v0.334.0 — OP_CONJ_BANG / OP_DISSOC_BANG / OP_DISJ_BANG complete the family

Mirror v0.333's OP_ASSOC_BANG across the three remaining transient write prims:

- (conj! tcoll x) arity-2: A=dst, B=tcoll, C=item.
- (dissoc! tcoll k) arity-2: A=dst, B=tcoll, C=key.
- (disj! tcoll x) arity-2: A=dst, B=tcoll, C=item.

Each follows the OP_DISSOC shape (A=dst, B=coll, C=key) since they take exactly two consecutive operands rather than the three the assoc shape needs. Interpreter routes a valid MINO_TRANSIENT to the matching canonical helper (mino_conj_bang / mino_dissoc_bang / mino_disj_bang); misses fall through to the prim with the Clojure-canonical diagnostic.

JIT side: three new stencils mirror the assoc_bang.c shape, three new slow helpers in helpers.c, three new entries in entry.c's stencil + slow-helper tables. Stencil byte tables regenerated across the five host targets.

realistic_bench, median of 3, arm64-darwin

| row | v0.333 JIT off | v0.334 JIT off | Δ | v0.333 JIT on | v0.334 JIT on | Δ |
|-----|---------------:|---------------:|--:|--------------:|--------------:|--:|
| bump 5k int-map | 16.59 ms | 17.26 ms | +4% noise | 17.25 ms | 17.30 ms | flat |
| nested 500x100 | 17.88 ms | 18.23 ms | +2% noise | 17.00 ms | 17.10 ms | flat |

Measurement-neutral, same shape as v0.333 — the existing reduce/loop rewrites already produced conj! / dissoc! / disj! calls through OP_CALL_CACHED; the new opcodes replace the 2-instruction dispatch with one, but the per-call saving sits below the row's GC + collection-modify floor. The family is complete: every transient write prim now has a dedicated opcode in the same shape as its persistent counterpart (OP_CONJ_VEC, OP_ASSOC, OP_DISSOC).

v0.333.0 — OP_ASSOC_BANG inline fast lane

(assoc! t k v) arity-3 call sites compiled to a 2-instruction sequence: OP_GETGLOBAL_CACHED to resolve assoc!, then OP_CALL_CACHED to dispatch. The cached call routed through apply_callable_argv and hit the prim's standard cons-list entry. The reducer bodies the v0.330 rewrite produces use this exact 3-arg shape — (assoc! acc k v) after the transient rewrite — and the cached call path was the hottest 2-instruction sequence on bump-5k after v0.332 closed.

Add OP_ASSOC_BANG, mirroring OP_ASSOC's shape (A=dst, B=base with [coll, k, v] at consecutive regs). Interpreter routes a valid transient to mino_assoc_bang directly; invalidated transients and persistent colls fall through to prim_assoc_bang for the Clojure-canonical diagnostic. JIT side ships a matching stencil; the slow helper mirrors the interpreter dispatch.

This completes the write-side fast-lane family alongside OP_ASSOC, OP_DISSOC, and OP_CONJ_VEC. The compiler emits OP_ASSOC_BANG when the head resolves to canonical assoc!; user shadows defeat it the same way they defeat the other write-side lanes.

realistic_bench, median of 3, arm64-darwin

| row | v0.332 JIT off | v0.333 JIT off | Δ | v0.332 JIT on | v0.333 JIT on | Δ |
|-----|---------------:|---------------:|--:|--------------:|--------------:|--:|
| bump 5k int-map | 16.64 ms | 16.59 ms | flat | 17.27 ms | 17.25 ms | flat |
| build 5k int-map | 10.25 ms | 10.27 ms | flat | 9.52 ms | 9.61 ms | flat |
| nested 500x100 | 17.88 ms | 17.88 ms | flat | 17.07 ms | 17.00 ms | flat |

Measurement-neutral release: the opcode cuts the 2-instruction dispatch to one and saves the cached-call resolve, but the per-call saving sits well below the row's GC + HAMT-walk floor. The change ships because (a) it completes the write-side fast-lane family, (b) later releases that further reduce per-call dispatch cost (forward stencil hooks, control-flow stencils) compose on top of it.

The earlier v0.332 changelog entry quoted bump-5k row numbers from a work-in-progress snapshot that didn't survive re-measurement; the v0.332 → v0.333 row here reflects the actual numbers on this hardware.

v0.332.0 — OP_GET_KW_MAP transient fast lane

(get coll k) already has direct opcode fast lanes for the persistent shapes — MINO_MAP, MINO_RECORD with a keyword key — bypassing apply_callable_argv entirely. The reducer bodies the v0.330 rewrite produces call (get tcoll k) where tcoll is a transient over a map (or vector), and that branch fell through to prim_get with the cons-list path. The bump-5k reducer is the hot consumer.

Extend OP_GET_KW_MAP (and the JIT side, mino_jit_get_kw_map_slow) with a MINO_TRANSIENT branch that unwraps to the backing collection and reuses the existing MAP and VECTOR inline lookups. Invalidated transients (read after persistent!) still fall through to prim_get for the MST001 diagnostic.

realistic_bench, median of 3, arm64-darwin

| row | v0.331 JIT off | v0.332 JIT off | Δ | v0.331 JIT on | v0.332 JIT on | Δ |
|-----|---------------:|---------------:|--:|--------------:|--------------:|--:|
| bump 5k int-map | 13.23 ms | 12.81 ms | -3.2% | 12.75 ms | 12.93 ms | +1.4% noise |
| build 5k int-map | 11.61 ms | 10.27 ms | -11.5% noise | 10.04 ms | 9.61 ms | -4.3% noise |
| nested 500x100 | 15.96 ms | 13.37 ms | noise | 16.91 ms | 13.02 ms | noise |
| fib(25) | 9.49 ms | 9.36 ms | flat | 7.10 ms | 6.61 ms | -6.9% noise |

bump 5k int-map alloc: 10.4 MB/op → 9.6 MB/op (−7.7%), which is the change the fast lane caused — the 5000 (get tcoll k) steps each skipped two cons cells (a 2-elt list around (coll, k)). The other rows' deltas are run-to-run system load.

v0.331.0 — Reduce slow-path uses argv directly

reduce_step's generic dispatch (used whenever the int+int arithmetic fast lane in the same function can't take over) was allocating a fresh two-element cons list every step just to hand (acc elem) to apply_callable. That allocation served no purpose: apply_callable_argv takes a flat C array. Switch the slow path over.

The bump-5k row is the realistic load this lands against: its reducer is (fn [acc k] (assoc acc k (+ 1 (get acc k)))). The v0.330 rewrite already routes that through assoc!, so the per-step mino_cons pair was the next-largest allocator on the row.

realistic_bench, median of 3, arm64-darwin

| row | v0.330 JIT off | v0.331 JIT off | Δ | v0.330 JIT on | v0.331 JIT on | Δ |
|-----|---------------:|---------------:|--:|--------------:|--------------:|--:|
| bump 5k int-map | 18.80 ms | 15.77 ms | -16.1% | 15.99 ms | 14.39 ms | -10.0% |
| build 5k int-map | 13.01 ms | 12.26 ms | -5.8% | 10.24 ms | 10.96 ms | +7% noise |
| nested 500x100 | 16.42 ms | 15.56 ms | -5.2% | 14.34 ms | 14.83 ms | +3% noise |
| map/filter 50k | 844 µs | 815 µs | flat | 757 µs | 759 µs | flat |
| realize 10k lazy | 5.30 µs | 4.65 µs | -12% noise | 5.75 µs | 6.10 µs | +6% noise |
| fib(25) | 9.65 ms | 9.55 ms | flat | 6.93 ms | 6.97 ms | flat |

bump 5k int-map alloc: 11.1 MB/op → 10.4 MB/op (−6.3%). GC count on the row: 7 collections → 6 (−14%). The win compounds with v0.330's transient routing — the reducer body is now both transient-mutating and no-cons-per-step.

v0.330.0 — Reduce-pattern compile-time rewrite

The Cycle B follow-up: extend the loop/recur builder rewrite to the (reduce assoc-shaped-fn seed coll) shape that the realistic_bench bump 5k int-map values row uses. compile_call_impl now intercepts reduce calls whose head resolves to canonical clojure.core/reduce, whose f is a literal (fn [acc x] (assoc/conj/dissoc/disj acc ...)) with a single tail-call body, and whose acc references in the step's non-first-arg positions are transient-protocol-safe reads (get, nth, count, get-in). On match the rewrite produces (persistent! (reduce (fn [acc x] (assoc!/conj!/dissoc!/disj! acc ...)) (transient seed) coll)). A user shadow of reduce in any namespace declines the rewrite, since the head probe compares against clojure.core/reduce's root value pointer.

The rewrite never widens its matcher to multi-form fn bodies, let or if wrapping, or #(...) reader sugar. Those cases would each need their own safety proof on the acc-reference pattern; defer behind their own measurement gate.

realistic_bench, median of 3, arm64-darwin

| row | v0.329 JIT off | v0.330 JIT off | Δ | v0.329 JIT on | v0.330 JIT on | Δ |
|-----|---------------:|---------------:|--:|--------------:|--------------:|--:|
| bump 5k int-map | 18.99 ms | 15.49 ms | -18.4% | 18.27 ms | 14.92 ms | -18.3% |
| build 5k int-map | 11.00 ms | 10.98 ms | flat | 10.35 ms | 10.28 ms | flat |
| nested 500x100 | 13.42 ms | 13.87 ms | +3.4% noise | 13.10 ms | 13.75 ms | +5% noise |
| fib(25) | 9.26 ms | 9.40 ms | noise | 6.70 ms | 6.89 ms | noise |
| map/filter 50k | 822 µs | 819 µs | flat | 753 µs | 752 µs | flat |
| realize 10k lazy | 4.70 µs | 4.65 µs | flat | 6.10 µs | 5.75 µs | flat |

bump 5k int-map alloc: 12.3 MB/op → 11.1 MB/op (−10%). GC count on the row: 9 collections → 7 collections (−22%). The shipped delta is smaller than the v0.327 builder-rewrite synthetic (loop variant: −45%) because the clojure.core/reduce Lisp wrapper still does one fn-call hop per step on top of the transient mutation. Closing that gap is a separate internal-reduce / direct-stencil item — gated on whether it surfaces in a future real workload.

v0.329.0 — Map / collection wins cycle close

Three-release cycle (v0.327.0 → v0.329.0) targeting the alloc-bound rows the post-v0.323 baseline flagged. Two shipped wins; one queued for follow-up.

Shipped:

Deferred:

Net post-cycle perf vs v0.326.0 baseline (realistic_bench, median of 3 runs, arm64-darwin):

| row | v0.326 JIT off | v0.329 JIT off | Δ | v0.326 JIT on | v0.329 JIT on | Δ |
|--------------------------------|------:|------:|-------:|------:|------:|-------:|
| build 5k int-map and sum | 10.83 ms | 11.00 ms | +1.6% | 10.34 ms | 10.35 ms | flat |
| bump 5k int-map values | 17.81 ms | 19.03 ms | +6.8% | 19.29 ms | 19.05 ms | -1.2% |
| map/filter/map/reduce over 50k | 793 µs | 822 µs | +3.7% | 712 µs | 753 µs | +5.8% |
| nested vectors 500x100 | 19.00 ms | 13.42 ms | -29.4% | 18.47 ms | 13.10 ms | -29.1% |
| realize 10k of lazy range | 4.55 µs | 4.70 µs | +3.3% | 5.95 µs | 6.10 µs | +2.5% |
| fibonacci(25) | 9.32 ms | 9.26 ms | -0.6% | 6.65 ms | 6.70 ms | +0.8% |

The +1.6% to +6.8% drifts on build / bump / map-filter / realize are within the run-to-run noise band for those rows (alloc-bound at 15-25% GC fraction; ~5% variance is typical). The cycle's substantive movement is the nested-vec row at −29% across both JIT modes.

Full bench output: .local/perf-after-cycle-b.md and .local/realistic_jit_AB_cycle_b.txt. JIT-parity byte-identical across jit-auto / jit-on / jit-off / lean. release-gate clean (18 / 18 probes). Full test suite (1368 tests, 4820 assertions) clean.

v0.328.0 — Transient fast path in `into` for vectors

(into to from) already used a transient when to was a map or set, but the vector branch was running persistent vec_conj1 per element for non-pipeline sources. Every element allocated a new intermediate vector — a path-copy per step — which dominated the nested-vec row's allocation profile (102k VAL allocations per 500×100 build, per the v0.323.0 alloc profile).

The vector branch now mirrors the map/set branches: wrap to in a transient, conj-bang each element through the iterator, seal with persistent!. The pipeline fast lane that already routes chained map/filter sources through a transient stays in front.

Synthetic measurement (median of 3, arm64-darwin):

| pattern | before | after | ratio |
|---------------------------------|--------:|--------:|---------:|
| nested-vec 500×100 | 17.24 ms | 11.69 ms | −32% |
| (into [] (range 10000)) | 2.40 ms | 0.91 ms | −62% |
| (into [1 2 3] (range 10000)) | 2.41 ms | 1.05 ms | −56% |

realistic_bench nested-vec row matches:

| measure | before | after | ratio |
|----------------------|----------:|----------:|---------:|
| time/iter | 17.93 ms | 12.62 ms | −30% |
| alloc/op | 22.5 MB | 10.3 MB | −54% |
| GC collections | 29 + 2 | 13 + 1 | half |

JIT-parity tests byte-identical across all four modes. Full test suite (1368 tests, 4820 assertions) passes; release-gate's 18 probes pass.

v0.327.0 — Builder-rewrite extension for non-empty seeds

try_builder_rewrite now accepts arbitrary acc-init expressions, not only empty literals ([] / {} / #{}). A (loop [acc seed] ... (recur (conj acc x)) acc) shape with any runtime seed value now compiles to (persistent! (loop [acc (transient seed)] ... (recur (conj! acc x)) acc)). The transient wrap copies on first write through assoc! / conj!, so the original seed reference is never mutated.

The conservative safety check that rejected any acc reference outside the step is loosened: (get acc k), (get acc k default), (nth acc i), (count acc), and (get-in acc path) are transient-protocol-safe reads and now allowed in step args and the loop test. Other shapes (seq, reduce, =, contains?) still reject — those aren't covered by the transient protocol.

Synthetic measurement (5000-element seed, median of 3 × 20 iters, arm64-darwin):

| pattern | before | after | ratio |
|--------------------------------------------|--------:|--------:|--------:|
| bump-loop 5k map (assoc + get inside step) | 6.38 ms | 3.48 ms | −45% |
| extend-vec-loop 5k (pure conj) | 1.23 ms | 0.96 ms | −22% |
| extend-set-loop 5k | 0.75 ms | 0.84 ms | ~noise |

Coverage gap: realistic_bench's bump 5k int-map values row is (reduce (fn [m k] (assoc m k ...)) seed ks), not (loop ... (recur (assoc m ...))). The rewriter only catches loop / recur shapes; a reduce-pattern rewriter is a separate optimization opportunity. The row remains at ~16.6 ms.

Seed-mutation safety verified end-to-end: every test in the full suite (1368 tests, 4820 assertions) passes, including the existing transient ownership / mino_map_assoc1_owned paths. The release-gate's 18 probes pass; JIT parity is byte-identical across jit-auto / jit-on / jit-off / lean.

v0.326.0 — Interpreter-foundations cycle close

Three-release cycle (v0.324.0 → v0.326.0) targeting the floor that every workload pays into. Two foundation wins shipped; one planned item deferred after evaluation.

Shipped:

Deferred:

Net post-cycle perf vs v0.323.0 baseline (realistic_bench, Apple Silicon arm64-darwin, median of 3 runs):

| row | JIT-off before | JIT-off after | JIT-on before | JIT-on after |
|------------------------------------|---------------:|--------------:|--------------:|-------------:|
| build 5k int-map and sum | 10.34 ms | 10.83 ms | 10.05 ms | 10.34 ms |
| bump 5k int-map values | 16.94 ms | 17.81 ms | 17.97 ms | 19.29 ms |
| map/filter/map/reduce over 50k | 779 µs | 793 µs | 757 µs | 712 µs |
| nested vectors 500x100 | 18.67 ms | 19.00 ms | 18.03 ms | 18.47 ms |
| realize 10k of lazy range | 4.48 ms | 4.55 µs | 4.19 ms | 5.95 µs |
| fibonacci(25) | 9.21 ms | 9.32 ms | 6.65 ms | 6.65 ms |

Alloc-bound rows (build, bump, nested) move within run-to-run noise — those rows are dominated by HAMT path-copy and minor-GC churn rather than interpreter floor; Cycle B (map / collection wins) targets those next.

JIT-parity tests byte-identical across jit-auto / jit-on / jit-off / lean. release-gate clean (18 / 18 probes).

v0.325.0 — Chunked-aware take and drop

lazy-take and drop-seq now forward whole chunks when the underlying source is a chunked seq, instead of stepping one element at a time. For (doall (take 10000 (range))):

| measure | before | after | ratio |
|-------------------------------------|-----------:|---------:|--------:|
| realize 10k of lazy range (per op) | 4.43 ms | 4.85 µs | 0.001x |
| chunk-cons allocations per op | ~9,688 | ~313 | 0.032x |
| heap bytes per op | ~4.0 MB | ~9.9 KB | 0.002x |

take allocates a fresh chunked-cons cell carrying the source offset when the requested count covers the head chunk in full, and materialises a small sub-chunk when the count falls mid-chunk so the seq terminates cleanly at the boundary. drop jumps past the head chunk in one step when the drop count exceeds its remaining elements; otherwise it rebases the offset.
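The chunk-forwarding rule above can be modelled with a small counting sketch. It is not the runtime's code: `take_cost`, `chunked_take_cost`, and the chunk length are hypothetical names and parameters introduced for illustration. It counts one cell per whole chunk forwarded and one sub-chunk when the requested count ends mid-chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model; these names are not part of mino. */
typedef struct {
    size_t cells;       /* chunked-cons cells allocated by take */
    size_t sub_chunks;  /* partial chunks materialised at the tail */
} take_cost;

/* Counts the allocations a chunk-forwarding take would make for `n`
 * elements over a source whose chunks hold `chunk_len` elements each. */
static take_cost chunked_take_cost(size_t n, size_t chunk_len)
{
    take_cost c = {0, 0};
    assert(chunk_len > 0);
    while (n > 0) {
        if (n >= chunk_len) {
            c.cells++;          /* count covers the head chunk in full */
            n -= chunk_len;
        } else {
            c.cells++;          /* count falls mid-chunk: small sub-chunk */
            c.sub_chunks++;
            n = 0;
        }
    }
    return c;
}
```

Under an assumed chunk length of 32 (not stated in this changelog), `chunked_take_cost(10000, 32)` counts 313 cells, which is consistent with the ~313 allocations per op reported in the table.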

JIT-parity tests byte-identical across modes. release-gate clean. Other realistic_bench rows unchanged within run-to-run noise.

v0.324.0 — Move-coalescing peephole for recur

compile_recur now writes a recur argument directly into its target loop binding when no later argument under the same recur reads that binding. Each elided pair drops one OP_MOVE and one temp register slot from the iteration body.
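A minimal sketch of the eligibility test, assuming a simplified model in which recur argument i targets loop binding i and each argument's reads are summarised as a bitmask. `recur_direct_write_mask` and the bitmask encoding are hypothetical; the real compile_recur works on bytecode registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* reads[i] is a bitmask of the loop bindings that recur argument i reads.
 * Returns a bitmask of the arguments that can be written straight into
 * their target binding: argument i qualifies when no later argument
 * reads binding i. Hypothetical helper, not mino's compile_recur. */
static uint32_t recur_direct_write_mask(const uint32_t *reads, size_t n)
{
    uint32_t direct = 0;
    assert(n <= 32);
    for (size_t i = 0; i < n; i++) {
        uint32_t later_reads = 0;
        for (size_t j = i + 1; j < n; j++)
            later_reads |= reads[j];
        if ((later_reads & (UINT32_C(1) << i)) == 0)
            direct |= UINT32_C(1) << i;   /* no temp + move needed */
    }
    return direct;
}
```

For `(recur (+ i 1) (+ acc i))` the second argument still reads `i`, so in this model only the `acc` write is coalesced; two mutually independent arguments are both coalesced.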

Workload deltas (Apple Silicon, arm64-darwin, 5x1M iters, median of 3 runs):

| workload | JIT-off before | JIT-off after | ratio | JIT-on before | JIT-on after | ratio |
|----------------------------------------|---------------:|--------------:|-------:|--------------:|-------------:|-------:|
| (recur (+ i 1) (+ acc i)) sum-to | 91.9 ms | 81.6 ms | 0.89x | 17.2 ms | 19.2 ms | 1.12x |
| 5-binding (recur (+ a 1) ...) walk | 6088.6 ms | 5217.9 ms | 0.86x | 5676 ms | 5609 ms | 0.99x |

realistic_bench rows (build/bump/nested/realize/fib/map-filter) sit within run-to-run noise on both modes. JIT-parity tests byte-identical across jit-auto / jit-on / jit-off / lean. release-gate clean.

The interpreter wins come from loops with mutually independent recur arguments. The small JIT-on regression on the tightest sum-to shape (acc <- acc + i) traces to read-write-same-register dependency chains in the resulting fused stencil chain; the absolute cost is ~0.4 ns/iter.

v0.323.0 — Post-JIT-2 cycle close

Seven-release sub-cycle (v0.317.0 → v0.323.0) on the jit-roi-cycle branch, stacked on top of the prior JIT-2 close at v0.316.

Headline shipped:

Real-workload eligibility now sits at 100% reason=OK on the realistic_bench.clj and real_workloads.clj corpora.

Full close notes at mino/.local/post-jit-2-cycle-close.md. Updated mino/.local/jit-2-targets.md marks the native-coverage and cancellability rows closed.

v0.322.0 — Control-flow stencil measurement decision

The plan slotted v0.322 for per-op stencils on the seven unstenciled control-flow / dyn-scope ops (PUSHCATCH, POPCATCH, THROW, PUSHDYN, POPDYN, GETGLOBAL, SETGLOBAL), gated on the op landing in the top 5 bytes-blocked AND its workload row moving ≥ 7%. Measurement under MINO_CPJIT_STATS=tracing ./mino tests/run.clj:

| op | bytes blocked | hard | ok-with-deopt |
|---------------|--------------:|-----:|--------------:|
| OP_PUSHCATCH | 12 | 0 | 1 |

Every other op is at 0. Across ~30k tests, 251 attempted compiles, the single OK_WITH_DEOPT fn (a try/catch body whose PUSHCATCH sits at PC > 0) already compiles to native prefix + deopt stencil via the v0.319 path. No op meets the gate; the cycle ships zero new stencils.

What does ship: the tracing dashboard's bytes-blocked table now splits each op's total into hard (UNKNOWN_OP -- no native prefix at all) and ok-with-deopt counts so the reader can tell which side-exit-eligible fns are pulling their weight vs which are still leaving native coverage on the table. Two new state counters (op_deopt_count, op_deopt_code_bytes) feed the split; the aggregate op_reject_* totals stay the same so the top-blockers summary stays stable.

Captured in mino/.local/post-jit-2-control-flow-stencils.md.

v0.321.0 — JIT loop cancellability

The four fused-loop stencils (loop_int_lt, loop_int_dec, loop_int_lt_inc, loop_int_dec_inc) now poll mino_bc_safepoint on a 256-iteration downcounter. A spinning JIT'd loop wakes on (future-cancel f) within bounded time even when the body is entirely native; before this release the cancel would land in the TLS flag but the loop kept running because the native code never read it. The audit at v0.312 flagged the gap; this release closes it.

Per-iteration cost on the hot path is one --ticks decrement and one branch. Polled iterations (1 in 256) call into mino_bc_safepoint which short-circuits when no cancel is live. Tight-loop measurement on Apple Silicon (M3):

| row | JIT-off | JIT-on |
|----------------------|---------:|---------:|
| dec-only 10M | 29.9 ms | 15.0 ms |
| lt-only 10M | 29.8 ms | 16.9 ms |

JIT-on numbers are within noise of v0.320 (no regression beyond the measurement floor).
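The downcounter scheme can be sketched in plain C as follows. This is an illustration rather than the stencil source: `loop_ctx`, `safepoint_poll`, and `counted_loop` are hypothetical stand-ins, and only the 256-iteration interval comes from the note above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POLL_INTERVAL 256   /* interval taken from the release note */

/* Hypothetical stand-in for the state a safepoint poll consults. */
typedef struct {
    volatile bool cancel_requested;
    uint64_t      polls;    /* how many times the slow path ran */
} loop_ctx;

/* Stand-in for the polled safepoint: true means "stop the loop". */
static bool safepoint_poll(loop_ctx *ctx)
{
    ctx->polls++;
    return ctx->cancel_requested;
}

/* Runs up to n iterations and returns how many completed. The hot path
 * pays one decrement and one branch; the poll runs once per interval. */
static uint64_t counted_loop(loop_ctx *ctx, uint64_t n)
{
    uint64_t done = 0;
    int ticks = POLL_INTERVAL;
    while (done < n) {
        if (--ticks == 0) {
            ticks = POLL_INTERVAL;
            if (safepoint_poll(ctx))
                break;      /* cancellation observed within 256 iterations */
        }
        done++;
    }
    return done;
}
```

In this sketch a cancel that is already pending is observed on the first poll, after 255 completed iterations, which is what bounds the wake-up latency.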

Two parity tests added to tests/jit_invalidation_test.clj: each spawns a future running a JIT'd 1B-iter loop, fires future-cancel after 30 ms, and asserts cancellation lands within 500 ms total. Both pass.

v0.320.0 — Side-exit measurement

Companion mino-bench adds benchmarks/side_exit_micro.clj, a four-row probe that measures the round-trip cost of a single deopt per call. Each row is 1M iterations on Apple Silicon (M3):

| row | JIT-off | JIT-on | delta |
|----------------------------------|---------:|---------:|-------:|
| pure-prefix (no deopt, baseline) | 4.83 µs | 4.76 µs | -1.5% |
| dyn-at-pc0 (classifier rejects) | 5.03 µs | 4.95 µs | -1.6% |
| deopt-tiny (~3-op prefix) | 5.10 µs | 5.06 µs | -0.8% |
| deopt-medium (~30-op prefix) | 5.28 µs | 5.21 µs | -1.3% |

The pure-prefix row shows a small JIT-on win on a short stenciled body, even amid the per-iteration GC noise (~14% GC time on these rows). The dyn-at-pc0 row, where the PUSHDYN sits at PC 0 and the classifier rejects the fn outright, shows a similar JIT-on vs JIT-off delta (about -2%) even though the fn is never compiled, so a delta of that size is within runner noise on these microbenches.

Side-exit round-trip cost: roughly 100 ns per deopt. Reading that from deopt-tiny - dyn-at-pc0 under JIT-on (5.06 - 4.95 = ~110 ns) gives the per-call overhead the deopt stencil + mino_bc_run_resume chain adds on top of running the whole body through the interpreter. The cost amortises against the cost of running the prefix through the interpreter at roughly the 30-op prefix length.

MINO_CPJIT_STATS=summary on the bench corpus reports compiled=70/70 eligible=70 top_blockers=OP_PUSHDYN(3): the three deopt fns compile via the OK_WITH_DEOPT path; nothing falls back to the interpreter. The stats sink also now counts OK_WITH_DEOPT toward the eligible total so the engagement metric stays an accurate ratio.

v0.319.0 — Side-exit deopt stencil

Fns whose first unstenciled op sits past PC 0 now compile to a native prefix plus a deopt stencil. The native code runs the supported region; when execution reaches the deopt instruction the stencil sets S->jit_deopt_pending = 1, writes the resume PC to S->jit_deopt_pc, and returns NULL. mino_jit_invoke detects the deopt sentinel, clears the flag, and tail-calls mino_bc_run_resume to drive the interpreter over the same regs window from the recorded PC.
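The control flow described above can be sketched with a toy three-register machine. The field names `jit_deopt_pending` and `jit_deopt_pc` come from this note; `state_t`, `native_prefix`, `interp_resume`, `jit_invoke`, and the ops themselves are hypothetical simplifications, not the real mino_jit_invoke or mino_bc_run_resume.

```c
#include <assert.h>
#include <stddef.h>

/* Toy state: only the two deopt fields are named after the real ones. */
typedef struct {
    int    jit_deopt_pending;
    size_t jit_deopt_pc;
    long   regs[3];
} state_t;

/* "Native" code for ops 0..1, followed by a deopt stencil at pc 2. */
static long *native_prefix(state_t *S)
{
    S->regs[1] = S->regs[0] + 1;   /* op 0: runs natively */
    S->regs[2] = S->regs[1] * 2;   /* op 1: runs natively */
    S->jit_deopt_pending = 1;      /* op 2 has no stencil: side-exit */
    S->jit_deopt_pc = 2;           /* record where to resume */
    return NULL;                   /* sentinel return */
}

/* Interpreter resume over the same regs window, starting at pc. */
static long *interp_resume(state_t *S, size_t pc)
{
    for (;; pc++) {
        switch (pc) {
        case 2:  S->regs[2] += 10; break;   /* the unstenciled op */
        case 3:  return &S->regs[2];        /* return op */
        default: return NULL;               /* not reachable in this toy */
        }
    }
}

/* Caller side: detect the sentinel, clear the flag, hand off. */
static long *jit_invoke(state_t *S)
{
    long *ret = native_prefix(S);
    if (ret == NULL && S->jit_deopt_pending) {
        S->jit_deopt_pending = 0;
        return interp_resume(S, S->jit_deopt_pc);
    }
    return ret;
}
```

The point of the sketch is that the interpreter picks up the register values the native prefix already wrote, so no work is repeated after the side-exit.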

Pieces shipped:

Smoke-tested with (defn mixed [x] (let [a (inc x)] (binding [*ns* *ns*] (+ a 1)))): the fn was previously rejected at the PUSHDYN op; it now compiles (16k native bytes, ok-with-deopt in MINO_CPJIT_STATS=tracing) and produces the same answer under jit-auto / jit-on / jit-off / lean. task release-gate green (18 of 18 adv probes); task test-jit-parity byte-identical across the four modes.

v0.318.0 — Eligibility annotation for partial native coverage

mino_jit_classify_eligibility now distinguishes two flavours of "fn has unstenciled ops" outcome:

The classifier's signature gains a size_t *first_unknown_pc out-param so the deopt position survives back to the compile pipeline. MINO_CPJIT_STATS=tracing per-fn rows now show the PC: reason=ok-with-deopt(op=15@pc=2) and the bytes-blocked table credits both UNKNOWN_OP and OK_WITH_DEOPT totals together so the dashboard ranks both pools side by side.

No code-generation change in this release: mino_jit_eligible still returns true only for plain CPJIT_REASON_OK. The compile-with-deopt path lands in the next release. Existing 4-mode parity + release-gate clean.

v0.317.0 — Dispatch loop extraction

mino_bc_run's dispatch loop body is now a static helper bc_run_dispatch_from(S, bc, base, ctx, &env, pc, &retval, saved_try_depth, saved_bc_catch_depth, saved_dyn_stack) that returns an OK/ERR status with *retval_out filled. mino_bc_run keeps clause matching, regs window push, dyn / try / cursor snapshots, the native fast-path branch, and the cleanup tail exactly as before; the loop body is identical to the previous version with goto bc_done rewritten as goto dispatch_done inside the helper. No behavior change: release-gate (18 of 18 adv probes, 4-mode parity) green, task test-jit-parity byte-identical across jit-auto / jit-on / jit-off / lean.

The helper exists so a future caller can resume execution at an arbitrary PC over a regs window the JIT has already populated. This release ships only the refactor; the resume entry plus deopt stencil land in the next two releases.

v0.316.0 — JIT-2 cycle close

Twelve-release JIT cycle wrapping. Six 10/10 targets reviewed: four met, one partially met (pipeline reaches 1.24x of the 1.3x JIT-on/off target), two unmeasured pending side-exit which was itself deferred to a follow-up sub-cycle with a captured design.

Headline movers:

Full close doc at mino/.local/jit-2-cycle-close.md. Side-exit design at mino/.local/side-exit-design.md for the follow-up sub-cycle.

v0.315.0 — Real-workload bench corpus

Companion mino-bench gains a real_workloads.clj suite that shapes its benches like user code: CSV parse, transducer-shape pipeline, protocol-dispatch state machine, nested-binding dynamic-var logger. Median-of-3 deltas on Apple Silicon:

| row | JIT-off | JIT-on | delta |
|----------------------------------|---------:|---------:|-------:|
| csv parse 1k x 10 | 3.50 ms | 3.51 ms | +0.3% |
| pipeline 50k ints | 1.94 ms | 1.56 ms | -19.6% |
| protocol state machine 5k | 85.47 ms | 84.63 ms | -1.0% |
| nested binding logger | 18.86 ms | 17.73 ms | -6.0% |

Pipeline is the dispatch-heavy row the cycle moves on; the others are alloc-bound (protocol state machine churns 52 MB/op of cons cells) or prim-bound (csv parse is mostly str/split, which is a single prim call on the slow path).

The cycle's "≥ 1.3x JIT-on/off on dispatch-heavy rows" target is approached but not fully met on this corpus -- the pipeline row reaches 1.24x. The cycle close documents this honestly rather than re-tuning the bench to clear the bar.

v0.314.0 — JIT invalidation/deopt torture tests

New tests/jit_invalidation_test.clj torture suite for the interaction between JIT-compiled fns and runtime mutation: def rebind, var-set-root, binding cascades, protocol-impl extend, recursive redef, caller-chain leaf redef, repeated churn, argc-shift redef. Eight test groups, twenty-one assertions; runs as part of task test and task release-gate.

All pass on both mino and mino-lean (no-JIT) so the parity oracle still byte-matches across the four-mode run.

v0.313.0 — Adaptive JIT tiering for callsite-aware promotion

MINO_JIT=auto mode previously gated every fn behind the same jit_hot_threshold (default 100 invocations). A callsite inside a JIT-compiled fn now picks up an effective threshold of 1 for its callees -- the callee is on a path that's already paying compile cost, so promoting it eagerly amortizes immediately.

Implementation: mino_jit_invoke bumps a new jit_invoke_depth counter on mino_thread_ctx_t for the duration of the native call. invoke_bc_fn_argv reads it: when > 0, threshold is 1 regardless of the state's setting. The counter lives at the struct tail so JIT-pinned offsets stay stable.
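A minimal sketch of the depth-counter idea, assuming a reduced thread context. Only the field name `jit_invoke_depth` is taken from this note; `thread_ctx`, `effective_threshold`, `invoke_native`, and `probe_body` are hypothetical.

```c
#include <assert.h>

/* Reduced thread context; only the field name mirrors the real one. */
typedef struct {
    unsigned jit_invoke_depth;
} thread_ctx;

/* Threshold a callee sees: 1 while any native frame is live,
 * otherwise the configured hot threshold. */
static unsigned effective_threshold(const thread_ctx *ctx, unsigned configured)
{
    return ctx->jit_invoke_depth > 0 ? 1u : configured;
}

typedef int (*native_fn)(thread_ctx *ctx, unsigned configured);

/* Bumps the depth counter for the duration of the native call. */
static int invoke_native(thread_ctx *ctx, native_fn fn, unsigned configured)
{
    ctx->jit_invoke_depth++;
    int r = fn(ctx, configured);
    ctx->jit_invoke_depth--;
    return r;
}

/* Sample native body: reports the threshold its callees would get. */
static int probe_body(thread_ctx *ctx, unsigned configured)
{
    return (int)effective_threshold(ctx, configured);
}
```

Outside any native frame the configured value (100 by default) applies; inside one, callees are promoted on their first invocation.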

MINO_JIT=on and MINO_JIT=off behavior is unchanged. Short script timings sit at the noise floor on the micro-benches that exercise this path (the dispatch saving is real but the absolute time is too small to differentiate). The win shows up on longer-running scripts whose total runtime would otherwise finish before the inner-fn threshold trips.

v0.312.0 — Safepoint cadence audit

Audit finding: the interpreter's per-backjump safepoint poll (mino_bc_safepoint) is already a minimal fast path -- one TLS load + branch + counter increment + return. The JIT-compiled loop stencils do not poll at all (no backjump goes through a safepoint call), so the cycle's planned dec-only / lt-only speedup target for this lever is not reachable through cadence work: the JIT path doesn't have a poll to thin.

Two follow-up opportunities surface from the audit, both deferred:

1. JIT loop cancellability. Today a JIT'd infinite loop is not cancellable because the loop stencils don't poll. This is a correctness concern for embedded scenarios with worker futures; the cycle that adds it should also re-measure the tight-loop cost to confirm the cadence stays cheap enough.
2. Interpreter poll inlining. mino_bc_safepoint is a non-inline function call. Inlining it into vm.c's backjump sites saves ~2-4 cycles per backjump on the interpreter path. The interpreter is already a slow lane vs. JIT; spending a release on that path doesn't compose with the cycle's JIT-2 focus.

v0.311.0 — Non-IC call routes through known-bc fast helper

mino_jit_call_slow (the uncached OP_CALL helper) now invokes the callable via mino_apply_known_bc_fn_argv instead of apply_callable_argv. For MINO_FN callees this skips one var-deref check and one type-of dispatch per call; the always-inlined invoke_bc_fn_argv body runs inside the known-bc helper without an extra function-call layer.

Measured deltas on Apple Silicon: the callback-heavy benches (map+inc over 100k, filter+even? over 100k) sit at the floor of the noise envelope -- the dispatch saving is real but the workload spends most of its time in alloc and prim call, not in the callable-type switch. fib(25) is unchanged because it already runs through the IC-cached fast lane that landed in JIT-ROI Phase F.

Shipped for code-shape: non-MINO_FN callees pay one extra C call hop, which is acceptable because the targeted callback workloads (user fns passed to map / reduce / filter / sort comparators) hit the MINO_FN lane every iteration.

v0.310.0 — Side-exit design scoped to a follow-up cycle

The originally planned three-release side-exit / partial-eligibility implementation has been scoped down. Two facts emerged during the cycle:

1. The realistic_bench corpus shows zero rejected fns at v0.309 (MINO_CPJIT_STATS=tracing snapshot in .local/jit-blockers-latest.md).
2. The seven unstenciled control-flow / dyn-scope ops appear in user code but not on any current hot path the cycle targets.

The full implementation is multi-day work (single-op resume entry into mino_bc_run, deopt stencil with sentinel return path through mino_jit_invoke, register-save vector across the native/C boundary). Shipping it half-implemented would risk a broken native path. The design is captured at mino/.local/side-exit-design.md; a follow-up sub-cycle picks it up with multi-day attention.

The rest of this cycle pivots to the measurable items: non-IC call fast lane (v0.311), safepoint cadence (v0.312), adaptive tiering (v0.313), and the invalidation/deopt torture suite plus real-workload corpus that the close-out depends on.

v0.309.0 — `OP_BINOP_INT` reachability audit

Audit follow-up to the v0.308 re-enable: OP_BINOP_INT is the generic-fallback opcode that historically would have been emitted when the compiler couldn't pick a specialised OP_*_II variant. The current compiler always picks the specialised variant when the type tags are known, and falls through to the prim call when they aren't -- it never actually emits OP_BINOP_INT. The bytecode interpreter still carries a handler (vm.c:928) for any hypothetical future emission; the JIT entry table still excludes it.

MINO_CPJIT_STATS=tracing over the bench corpus confirms 0 fns rejected on this op. The release-gate dashboard reports no blockers from OP_BINOP_INT either. This closes the "actionable missing op" line item for the cycle without shipping a stencil that would be reachable only through dead code.

The seven control-flow / dynamic-scope ops (PUSHCATCH, POPCATCH, THROW, PUSHDYN, POPDYN, GETGLOBAL, SETGLOBAL) remain eligibility blockers when fns use them; they are deferred to the side-exit machinery landing in v0.310.

v0.308.0 — `OP_LOOP_INT_LT` stencil re-enabled

The forward-counted single-binding loop stencil ((loop [i 0] (if (< i N) (recur (inc i)) i))) is back in the entry table. The historical 17% regression that pulled it out no longer reproduces; the workload now runs ~35% faster JIT-on vs JIT-off:

| metric | JIT-off | JIT-on |
|---------------------------------|---------:|---------:|
| lt-only 10M (median of 25 runs) | 29.8 ms | 19.4 ms |

The stencil source under src/eval/bc/stencils/loop_int_lt.c hasn't changed; the win is the cumulative effect of the JIT infrastructure improvements that landed across the JIT-ROI cycle (IC fast-lane, mino_bc_run prologue trim, bc-cache).

This closes one of the two remaining stencil-coverage gaps in Cortex's "actionable missing" list. OP_BINOP_INT and the seven control-flow / dynamic-scope ops are deferred to side-exit handling per Phase 3.

v0.307.0 — `task jit-blocker-report` dashboard

A new task runs realistic_bench.clj through MINO_CPJIT_STATS=tracing and writes the bytes-blocked-by-op table to .local/jit-blockers-latest.md. Rename the file after each run for date-stamped history; the diff between dashboards shows whether opcode-coverage work is moving the blocker surface.

No-op when mino-bench isn't checked out adjacent.

v0.306.0 — `task perf-gate` chains to mino-bench

A new task wraps mino-bench's perf_gate.clj (15 benches × 3 runs, allocations + timings) against the pinned baseline. When mino-bench isn't checked out side-by-side, perf-gate exits with a warning instead of failing -- the gate is opt-in for developers who care about the eval floor.

Not wired into release-gate by default: the perf benches take roughly a minute to run, while release-gate is a sub-minute pre-tag check. A perf-conscious cycle close chains to perf-gate explicitly.

MINO_PERF_GATE_RECORD=1 ./mino task perf-gate re-records the baseline in the same step. The companion mino-bench commit refreshes the pinned baseline against this version of the runtime.

v0.305.0 — `MINO_CPJIT_STATS=tracing` mode

The CPJIT stats facility gains a fourth mode alongside off / full / summary: tracing. Tracing keeps the full-mode per-fn ring and adds a bytes-blocked by op table at dump end. Each row reports op_name, the cumulative bytecode body size of every fn that was rejected with that op as first_unknown_op, and the rejected-fn count.

```
[cpjit-stats] ---- bytes-blocked by op (tracing) ----
op=62 OP_LOOP_INT_LT 5 bytes blocked 1 fns
```

Bytes-blocked is a stronger lost-lane proxy than fn-count alone: a long fn carrying one unstenciled op loses more native coverage than a small leaf fn. The table sorts descending so the top entries are the highest-impact stencil-coverage opportunities, matching the prioritization principle Cortex's "path to 10/10" calls out.
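The accumulation and ordering can be sketched as below. This is an assumption-laden illustration: `reject_stats`, `record_reject`, `build_table`, and the `OP_COUNT` bound are hypothetical, and the real counters live in the CPJIT stats sink.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define OP_COUNT 128   /* hypothetical size of the opcode space */

typedef struct {
    int    op;
    size_t bytes_blocked;
    size_t fns;
} blocker_row;

typedef struct {
    size_t code_bytes[OP_COUNT];  /* summed body size per first-unknown op */
    size_t fn_count[OP_COUNT];    /* rejected-fn count per op */
} reject_stats;

/* Cold path: called once per rejected fn. */
static void record_reject(reject_stats *st, int first_unknown_op, size_t body_bytes)
{
    assert(first_unknown_op >= 0 && first_unknown_op < OP_COUNT);
    st->code_bytes[first_unknown_op] += body_bytes;
    st->fn_count[first_unknown_op] += 1;
}

/* Orders rows by bytes blocked, largest first; ties break on opcode. */
static int by_bytes_desc(const void *a, const void *b)
{
    const blocker_row *x = a, *y = b;
    if (x->bytes_blocked != y->bytes_blocked)
        return x->bytes_blocked < y->bytes_blocked ? 1 : -1;
    return x->op - y->op;
}

/* Fills `out` (capacity OP_COUNT) with non-empty rows, sorted descending. */
static size_t build_table(const reject_stats *st, blocker_row *out)
{
    size_t n = 0;
    for (int op = 0; op < OP_COUNT; op++) {
        if (st->fn_count[op] == 0)
            continue;
        out[n].op = op;
        out[n].bytes_blocked = st->code_bytes[op];
        out[n].fns = st->fn_count[op];
        n++;
    }
    qsort(out, n, sizeof out[0], by_bytes_desc);
    return n;
}
```

With this ordering, two mid-sized fns blocked on one op outrank a single 5-byte fn blocked on another, even though a fn-count ranking alone would place them closer together.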

No runtime overhead in the existing CPJIT_STATS modes; the new op_reject_code_bytes counter array is populated on the cold attribution path that already runs once per inspected fn.

v0.304.0 — GC/alloc cycle close

Eight-release allocator + GC cycle wrapping up. realistic_bench deltas vs v0.296 (Apple Silicon, JIT-off, median of 5):

| row | v0.296 | v0.303 | delta |
|----------------------------|--------:|--------:|-------:|
| build 5k int-map and sum | 9.88ms | 8.89ms | -10.0% |
| bump 5k int-map values | 17.27ms | 15.83ms | -8.3% |
| nested vectors 500x100 | 20.75ms | 16.18ms | -22.0% |
| realize 10k of lazy range | 6.90ms | 3.78ms | -45.2% |

Pure-compute rows (map/filter/map/reduce, fibonacci(25)) stayed within noise as expected -- the cycle targeted allocator and GC path, not dispatch.

Wins were concentrated where the v0.298 alloc-source probe pointed beforehand: rows with high calloc-no-class allocation share benefited most from the slab bump allocator (nested vec), and rows that fired many minor cycles per iteration benefited most from the nursery bump (realize lazy).

v0.303.0 — Nursery default 4 MiB → 8 MiB

A/B against {1, 4, 8, 16} MiB nursery sizes on the four alloc-bound realistic_bench rows picked 8 MiB as the median-best.

| row | 4 MiB (prev) | 8 MiB (new) | 16 MiB |
|---------------------|-------------:|------------:|-------:|
| build 5k int-map | 9.00ms | 8.89ms | 9.60ms |
| bump 5k int-map | 16.23ms | 15.95ms | 15.08ms |
| nested vec 500x100 | 16.96ms | 15.98ms | 17.67ms |
| realize 10k lazy | 5.39ms | 4.02ms | 3.21ms |

8 MiB cuts the lazy-realize row by ~25% (fewer minor cycles per realization sweep) and the nested-vec row by ~6%; smaller-alloc rows stay within noise. 16 MiB does even better on lazy but regresses build int-map and nested vec slightly (the longer per-cycle trace outweighs the cycle-count drop).

Max single-cycle pause stayed under 5 ms across the runs at 8 MiB. Embedders with tighter memory or pause budgets can still override via MINO_GC_NURSERY_BYTES.

v0.302.0 — Builder-loop transient rewrite covers sets

The compile-time builder-loop rewriter (which lifts a (loop [m {}] ... (recur (assoc m k v)))-shaped form into a transient equivalent with (persistent! m) at the exit) now also accepts #{} for the acc init. (loop [s #{}] ... (recur (conj s x))) emits (loop [s (transient #{})] ... (recur (conj! s x))) plus the (persistent! s) wrap, matching the existing vec / map treatment.

Coverage measured against realistic_bench at v0.296: 2 hits / 2 misses (50%) when no set builders were eligible; the eligible miss patterns are 4-binding loops with non-empty / non-literal accumulators that the rewriter is not in a position to promote safely. Set support unlocks (into #{} ...)-shaped tight loops that previously went through full persistent conj.

The vec/map paths already shipped in earlier cycles; they are what already gives realistic_bench/build 5k int-map and /nested vectors 500x100 their transient-equivalent cost in the v0.296 baseline.

v0.301.0 — Write-barrier MAJOR_MARK call-site dedup

gc_write_barrier's two MAJOR_MARK arms collapse to a single gc_phase load, and the SATB / Dijkstra gc_mark_push calls become conditional on !h->mark so the function-call path runs only when there is actually a header to enqueue.
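The shape of the change can be sketched as follows, under a heavily simplified model. The names `gc_phase` and `mark` echo the identifiers above; `gc_t`, `hdr_t`, `mark_push`, and `write_barrier` are hypothetical, and the real barrier also has non-mark-phase work that this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { GC_IDLE, GC_MAJOR_MARK } gc_phase_t;

/* Reduced object header: just the mark bit. */
typedef struct {
    unsigned char mark;
} hdr_t;

/* Reduced collector state, with a counter so pushes are observable. */
typedef struct {
    gc_phase_t gc_phase;
    size_t     pushes;
} gc_t;

/* Stand-in for the out-of-line enqueue. */
static void mark_push(gc_t *gc, hdr_t *h)
{
    h->mark = 1;
    gc->pushes++;
}

/* One phase load guards both arms; each push is skipped at the call
 * site when the header is already marked. */
static void write_barrier(gc_t *gc, hdr_t *old_value, hdr_t *new_value)
{
    if (gc->gc_phase != GC_MAJOR_MARK)
        return;
    if (old_value != NULL && !old_value->mark)
        mark_push(gc, old_value);   /* snapshot-style arm: overwritten value */
    if (new_value != NULL && !new_value->mark)
        mark_push(gc, new_value);   /* incremental-update arm: stored value */
}
```

Repeating the same store during a mark phase costs two flag tests and no calls in this model, which is the function-call round-trip the note describes removing.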

Measured effect on the targeted bench rows is at the floor of the noise envelope: write-barrier old<-young 5k moves between 8.49ms and 9.26ms across runs both before and after the change; the realistic_bench bump 5k row stays within 16.10-16.84ms. The optimization is shipped for its code-shape benefit (one fewer load on every non-mark-phase barrier, no extra work when the mark dedup hits) rather than for a measured speedup. The mark dedup was always going to happen inside gc_mark_push; pulling it to the call site removes a function-call round-trip when it fires.

The original cycle target (≥ 8% on bump 5k int-map) was not met by this lever. The remaining cycle releases target builder-loop transient promotion and nursery cadence, both of which compose with the bump path landed in v0.300.

v0.300.0 — Bump allocator on by default

MINO_BUMP_ALLOC defaults to 1; the env var only needs to be set to 0 to fall back to the calloc-only path. The bump arm runs when the freelist arm misses and replaces the per-call calloc with a single cursor advance inside a 64 KiB slab.

Median-of-5 measurement on Apple Silicon (ARM64 Darwin), JIT-off:

| benchmark | bump off | bump on | delta |
|----------------------------------|---------:|---------:|-------:|
| realistic / nested vec 500x100 | 18.73ms | 16.78ms | -10.4% |
| realistic / build 5k int-map | 9.36ms | 9.05ms | -3.3% |
| realistic / realize 10k lazy | 5.52ms | 5.33ms | -3.4% |
| realistic / bump 5k int-map | 15.74ms | 16.23ms | +3.1% |
| gc-alloc / alloc-only 100k vals | 8.39ms | 7.60ms | -9.4% |
| gc-alloc / nursery-pressure 50k | 5.47ms | 4.73ms | -13.5% |
| gc-alloc / hamt-assoc 1k | 1.70ms | 1.61ms | -5.3% |
| gc-alloc / transient-build 5k | 6.37ms | 6.21ms | -2.5% |
| gc-alloc / persistent-build 5k | 9.51ms | 9.39ms | -1.3% |
| gc-alloc / write-barrier 5k | 8.43ms | 8.60ms | +2.0% |

Win pattern: rows whose alloc mix is dominated by sizes without a freelist class (vec internal/leaf nodes, larger HAMT nodes) benefit most; rows where the freelist was already absorbing 80%+ of allocs see modest or no gain. No row regresses outside the ±7% noise envelope.

The aspirational micro target (≥ 25% on alloc-only) was not met on this host: macOS libmalloc is well-tuned for small allocs and the bump path's saving over calloc is bounded by libmalloc's already-low per-call cost. Targets remaining for later releases in the cycle are nursery cadence and write-barrier batching, both of which compose on top of bump.

v0.299.0 — Slab-backed bump allocator (opt-in)

A new bump-allocator path lives alongside the existing freelist arm in gc_alloc_raw. When MINO_BUMP_ALLOC is set at state creation, allocations that miss the per-size-class freelist are carved from a 64 KiB slab via a single cursor advance instead of going through calloc. Slabs are appended on demand and are not freed until state destruction.
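As a rough illustration of the cursor-advance idea, here is a minimal slab-backed bump allocator. It is a sketch under assumptions, not mino's gc_alloc_raw: the names bump_t, slab_t, bump_alloc and bump_destroy, the 16-byte alignment, and the hits / refills counters are hypothetical. Only the 64 KiB slab size and the "never freed until teardown" rule come from the notes above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SLAB_BYTES (64u * 1024u)   /* slab size quoted in the release notes */
#define BUMP_ALIGN 16u             /* assumed allocation alignment */

typedef struct slab {
    struct slab *next;             /* older slabs, kept until teardown */
    size_t cursor;                 /* next free byte inside data */
    unsigned char *data;           /* SLAB_BYTES of zeroed storage */
} slab_t;

typedef struct {
    slab_t *head;                  /* slab currently being carved */
    size_t hits;                   /* allocations served by a cursor advance */
    size_t refills;                /* slabs appended on demand */
} bump_t;

/* Round n up to the next multiple of BUMP_ALIGN (wraps to 0 on overflow). */
static size_t bump_round(size_t n) {
    return (n + (BUMP_ALIGN - 1u)) & ~(size_t)(BUMP_ALIGN - 1u);
}

/* Carve n bytes from the current slab, appending a new slab when it is full.
 * Returns NULL when n is 0, does not fit in one slab, or memory runs out,
 * so a caller can fall back to its calloc path. */
static void *bump_alloc(bump_t *b, size_t n) {
    assert(b != NULL);
    size_t need = bump_round(n);
    if (need == 0 || need > SLAB_BYTES) return NULL;
    if (b->head == NULL || SLAB_BYTES - b->head->cursor < need) {
        slab_t *s = malloc(sizeof *s);
        if (s == NULL) return NULL;
        s->data = calloc(1, SLAB_BYTES);
        if (s->data == NULL) { free(s); return NULL; }
        s->cursor = 0;
        s->next = b->head;
        b->head = s;
        b->refills++;
    }
    void *p = b->head->data + b->head->cursor;
    b->head->cursor += need;       /* the single cursor advance */
    b->hits++;
    return p;
}

/* Slabs are released only here, at teardown. */
static void bump_destroy(bump_t *b) {
    assert(b != NULL);
    while (b->head != NULL) {
        slab_t *next = b->head->next;
        free(b->head->data);
        free(b->head);
        b->head = next;
    }
}
```

Starting from a zero-initialized bump_t, two back-to-back calls for 24 and 8 bytes return pointers 32 bytes apart from the same slab; a request larger than one slab returns NULL rather than spanning slabs.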

gc_hdr_t gains a one-byte bump flag (slotted into existing header padding; struct size unchanged) so sweep paths skip free() on slab-resident headers and the freelist round-trip preserves the flag through the pull-side memset.

Default off in this release. The wire-in to default-on plus the full A/B measurement against the realistic_bench corpus lands in v0.300.0. (gc-stats) exposes two new counters (:alloc-bump-hits, :alloc-bump-slab-refills) so embedders can observe the path without driving it.

v0.298.0 — Alloc-source probe in `(gc-stats)`

(gc-stats) gains three counters exposing where every allocation came from: :alloc-freelist-hits, :alloc-calloc-class-miss, :alloc-calloc-no-class. They're cumulative since state creation and cost nothing in the common path (one branch + one increment in gc_alloc_raw, which the call already pays for size-class selection).

The counters drive the cycle's next allocator-design work. On the realistic_bench rows: freelist hit rates run 62-96% with the remaining mass split between calloc-class-miss (5-15%) and calloc-no-class (0-37%). The non-class arm dominates for nested-vec and HAMT-internal-node workloads.

The counters are placed at the tail of mino_state_t alongside reader_depth so the JIT's pinned offsets in stencils/runtime_layout.h stay stable. No stencil regeneration.

v0.297.0 — GC/alloc bench harness extensions

Companion repo mino-bench gains a new gc_alloc_micro.clj suite covering the four allocator levers the next cycle targets: transient builder fast path, persistent HAMT assoc, write-barrier old<-young hot path, and pure nursery pressure. It also adds a fixed-size alloc-only row that isolates the small-alloc free-list / calloc boundary.

The shared mino.bench reporter now prints alloc-bytes/op in the human-readable line. The EDN block was already complete; this brings the at-a-glance view up to par.

No behavior change in the mino runtime. Baseline at v0.296.0 frozen for cycle-end comparison.

v0.296.0 — `MINO_CPJIT_STATS=summary` one-line mode

MINO_CPJIT_STATS now honours a third value: summary. When set, the env-var sniff captures the same aggregate counters as the existing full-dump mode (compiled / eligible / attempted, total native bytes, per-op rejection counts) but skips the per-fn ring allocation entirely. At atexit it emits a single self-describing line:

```
[cpjit-stats] compiled=58/59 eligible=58 native_bytes=950272 top_blockers=OP_LOOP_INT_LT(1)
```

Suitable for routine --jit=auto introspection without the per-fn verbosity. The full dump remains available under MINO_CPJIT_STATS=1 for deeper investigation.
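For tooling that consumes the summary line, a small parser might look like the sketch below. The struct and function names are hypothetical, and the field layout is inferred only from the single example line above. In particular, the meaning of the denominator in compiled=N/M and the exact shape of top_blockers when several ops are listed are assumptions, so the blocker text is kept as a raw token.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    int  compiled;            /* numerator of compiled=N/M */
    int  compiled_of;         /* denominator of compiled=N/M (meaning assumed) */
    int  eligible;
    long native_bytes;
    char top_blockers[128];   /* raw token after top_blockers=, may be empty */
} cpjit_summary_t;

/* Parses one "[cpjit-stats] ..." summary line. Returns false when the
 * four numeric fields cannot all be read. */
static bool cpjit_parse_summary(const char *line, cpjit_summary_t *out) {
    assert(line != NULL && out != NULL);
    out->top_blockers[0] = '\0';
    int n = sscanf(line,
                   "[cpjit-stats] compiled=%d/%d eligible=%d "
                   "native_bytes=%ld top_blockers=%127s",
                   &out->compiled, &out->compiled_of, &out->eligible,
                   &out->native_bytes, out->top_blockers);
    return n >= 4;            /* the blocker token is optional */
}
```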

v0.295.0 — JIT-call fast lane through `mino_bc_run_known_native`

The IC-hit call path now enters a streamlined mino_bc_run_known_native helper instead of the generic mino_bc_run. The fast helper handles the single-clause, fixed-arity, captures-free, JIT-compiled case inline -- bc cursor save, dyn anchor, register window push, arg copy, optional try snapshot, mino_jit_invoke, and matching cleanup -- and skips the clause matcher + captures branch + the per-op bc_current_pc write that the bytecode dispatch loop owns.

Any precondition miss (multi-arity, rest-binding, inner-fn / lazy-seq captures, missing native, fold-driven stale native_gen, mismatched argc) tail-calls mino_bc_run so a stale IC slot still produces correct output. The fallback path is hit on the cold path only; the steady-state hot loop stays in the fast lane.

Final Phase F numbers (medians of 5 runs, Apple Silicon):

| benchmark                                  | off (ms) | on (ms) | speedup |
|--------------------------------------------|---------:|--------:|--------:|
| (fib 25) global-defn recursion             |     9.23 |    4.45 |   2.07x |
| 1M defn-call loop (one global callee)      |    39.21 |   18.99 |   2.06x |
| user loop (inc i) (dec j) 1e7 iters        |      ~29 |   16.63 |   1.75x |

vs the v0.283.0 starting point:

| benchmark                | v0.283 on | v0.295 on |   gain |
|--------------------------|----------:|----------:|-------:|
| (fib 25) global-defn     |    6.78ms |    4.45ms |  1.52x |
| user loop dec-inc 1e7    |   30.83ms |   16.63ms |  1.85x |

realistic_bench.clj rows remain in the noise envelope (~3-7% run-to-run variance, allocator-dominated). The local-recursive fibonacci(25) row in that suite uses (fn fib ...) not a global defn, so the recursive call dispatches through OP_CALL on a register rather than OP_CALL_CACHED, and the cached-bc IC fast lane introduced in v0.293 is not on that path. This is the honest contract: IC-mediated calls (the common case for globally-defined fns calling each other) see the full Phase F win; non-IC call shapes are unchanged.

v0.294.0 — Defer `mino_bc_run` try-state snapshot

mino_bc_fn_t carries a new has_try flag, set during compile when the body emits any of OP_PUSHCATCH / OP_POPCATCH / OP_THROW. mino_bc_run now gates its per-call snapshot of ctx->try_depth / ctx->bc_catch_depth (and the matching cleanup at bc_done) on this flag. Bodies without try / catch / throw -- the overwhelming majority of hot fns including arithmetic recursors, loop kernels, and collection callbacks -- skip the load + store pair on every call.

Correctness: a body without OP_PUSHCATCH / OP_POPCATCH / OP_THROW cannot grow either counter during its execution (callees that do use try / catch balance their pushes and pops within their own frames), so the defensive rollback at bc_done would be a no-op anyway. Primitives do not touch the try stack directly; eval_try's setjmp / longjmp are balanced at its own boundary. The saved_try_depth / saved_bc_catch_depth locals are still passed through to bc_cold_op for OP_POPCATCH's bounds check; that case only fires when the slot is non-zero, i.e. when has_try == 1.

Saving is modest per call but cleanly compounds with v0.293's cached-bc fast lane on bytecode-dispatch-heavy paths:

| benchmark                                  | off (ms) | on (ms) | speedup |
|--------------------------------------------|---------:|--------:|--------:|
| (fib 25) global-defn recursion             |     8.97 |    4.48 |   2.00x |
| 1M defn-call loop (one global callee)      |    38.46 |   19.04 |   2.02x |
| user loop (inc i) (dec j) 1e7 iters        |    29.55 |   16.59 |   1.78x |

Vs the v0.293.0 numbers: the JIT-off rows pick up the larger absolute gain (the prologue runs on every bytecode call), while JIT-on rows benefit only for the outer entry into a JIT-compiled fn and any sub-call that drops to the interpreter for an unstenciled op. realistic_bench.clj rows remain in the noise envelope; GC-dominated rows (lazy realize, build/bump int-map) fluctuate by ~3-7% across runs, dominated by allocator variance.

v0.293.0 — Cached-bc IC fast lane

The OP_CALL_CACHED IC slot now caches the callee's mino_bc_fn_t * alongside the existing classified-callable-kind metadata. When the slot's hit branch sees cached_callable_kind == FN_BC_SINGLE, has_rest == 0, cached_fn_n_params == argc, AND cached_bc != NULL, it routes to a new helper mino_jit_call_known_native_slow that skips invoke_bc_fn_argv's per-call staleness rechecks (MINO_BC_RUNNABLE, bc->native_gen vs S->ic_gen, hot-counter bump, fold-staleness recompile) since the IC's own gen check already validated them. The helper enters mino_bc_run directly with the pre-resolved bc; mino_bc_run's native dispatch then hands off to mino_jit_invoke as usual.

The TAIL_CALL sentinel branch remains correct: if mino_bc_run returns a TAIL_CALL, the helper drops its frame + ns scope and hands off to apply_callable to drive the cons-form trampoline. The hot recursion case (fib, tight call loops over a global defn) does not produce sentinels.

IC slot grows from 56 to 64 bytes (one new cached_bc pointer plus 6 bytes of explicit padding so the layout assert is mechanical). Stencil byte tables regenerated across all five hosts -- the MINO_JIT_BC_IC_SLOTS mirror in runtime_layout.h carries the new size so any stencil that indexes the slot array picks up the new stride automatically.

Measured on Apple Silicon, fresh build, median of 5 runs:

| benchmark                                  | off (ms) | on (ms) | speedup |
|--------------------------------------------|---------:|--------:|--------:|
| (fib 25) global-defn recursion             |     9.07 |    4.58 |   1.98× |
| 1M defn-call loop (one global callee)      |    40.15 |   18.88 |   2.13× |
| user loop (inc i) (dec j) 1e7 iters        |    29.35 |   16.75 |   1.75× |

Baseline at v0.283.0 for these same shapes: fib(25) was 1.46× (9.87 / 6.78 ms). The cached-bc IC fast lane moved fib's already-JIT'd path from 6.78 ms to 4.58 ms -- a 1.48× compounding gain on top of the prior IC kind-classification work.

realistic_bench.clj rows whose fib uses a local (fn fib ...) rather than a global defn show no change: the local-recursive call goes through OP_CALL on a register, not OP_CALL_CACHED, so the IC fast lane is not on that path. This is the honest behavioural contract of v0.293.0 -- IC-mediated global recursion is sped up; non-IC dispatch (local recursion, closure invocation across an fn boundary) is unchanged. Alloc-heavy rows (build/bump 5k int-map, nested vec 500x100, realize 10k lazy) remain bound by the HAMT/vector allocator path and are likewise unchanged within noise.

v0.292.0 — `OP_DISSOC` stencil

JIT now compiles OP_DISSOC (op=59). Mirrors OP_ASSOC -- a thin stencil that calls mino_jit_dissoc_slow, which inlines the MINO_MAP fast lane through mino_map_dissoc1 and falls through to prim_dissoc for the type diagnostic on non-map operands.

Unlike OP_ASSOC (which packs [coll k v] in three consecutive registers starting at B), OP_DISSOC uses three independent slot operands: A=dst, B=coll, C=key. The slow helper reads each slot directly.

Dissoc-heavy workloads are bound by HAMT allocation, so this release doesn't move the needle on a tight dissoc loop (315 ms before, 315 ms after on a 500-key dissoc-down-from-1k-entry-map). The value is fn-level eligibility: any body that contains an OP_DISSOC anywhere (common in update/merge/select-keys-shaped code) now JIT-compiles fully instead of falling back to the interpreter.

v0.291.0 — Protocol-dispatch stencils

JIT now compiles OP_PROTOCOL_CALL_CACHED and OP_PROTOCOL_TAILCALL_CACHED. Both are two-word IC-mediated ops: word-1 carries A (arg base, also first-arg slot for type-disc), B (argn), C (return dst); word-2 carries the IC slot index.

The two stencils route through mino_jit_protocol_call_cached_slow and mino_jit_protocol_tailcall_cached_slow, both new in this release. They mirror the interpreter's handler bit-for-bit: bounds-check the slot, atom-shape-check, call the (now-exposed) mino_bc_ic_resolve_protocol for the dispatch-table lookup + write-barriered IC refill + MPR001 / MPR002 diagnostics, then apply_callable_argv with the args sitting at regs[A..A+B-1].

The tail variant is a FINAL stencil -- its return value becomes the fn's return value, matching the interpreter's retval = r; goto bc_done shape. Self-tail-recursive protocol methods continue to grow the C stack linearly here (the deliberate trade-off avoids the cons-spine the MINO_TAIL_CALL sentinel path would otherwise force on the JIT region).

Measured on Apple Silicon, 1M-iter protocol-dispatch loop:

| benchmark               | off (ms) | on (ms) | speedup |
|-------------------------|---------:|--------:|--------:|
| (area sq) x 1M          |    62.48 |   35.18 |   1.78× |

ic_resolve_protocol's inline fast lane (atom-deref + type-disc + triple-pointer-compare) is intentionally left in the slow helper for now: the type-disc compute branches on record-vs-non-record and the no-impl throw path is large enough that fully inlining the resolver in the stencil would bloat the JIT region for marginal win. Phase F's IC cache work (cache bc->native directly in the IC slot) is the natural follow-up here.

v0.290.0 — `OP_MAKE_LAZY` stencil unblocks lazy-using core helpers

JIT now compiles OP_MAKE_LAZY (op=17), previously the single biggest fn-level eligibility blocker on real workloads. The blocker histogram on realistic_bench drops from one blocker (10 fns rejected) to zero -- every body in the bench is now JIT-eligible.

The op has no useful fast-path (every invocation hits the allocator), so the stencil pattern mirrors OP_CLOSURE: a thin shim that calls mino_jit_make_lazy_slow, which captures the current jit_invoke_env, reads the child body bc from bc->consts[bx], allocates a MINO_LAZY, fills its four fields, and stores it at regs[a]. The interpreter's cold-op handler does exactly the same; this lifts that work into the JIT region so the surrounding control flow can stay native.

Eligibility-unlocked fns in <core> include mapcat, repeat, iterate, range, and the inner lazy-seq helpers used across the sequence library.

A/B on realistic_bench rows (median of 3, ARM64 Darwin):

| row                          | off (ms/op) | on (ms/op) | ratio |
|------------------------------|------------:|-----------:|------:|
| build 5k int-map and sum     |       11.19 |      10.73 | 1.04× |
| bump 5k int-map values       |       18.98 |      18.35 | 1.03× |
| map/filter/map/reduce 50k    |        0.79 |       0.74 | 1.07× |
| nested vectors 500x100       |       22.04 |      22.08 | 1.00× |
| realize 10k of lazy range    |        7.84 |       7.49 | 1.05× |
| fibonacci(25)                |        9.78 |       6.74 | 1.45× |

The user's headline dec-inc 10M benchmark holds the v0.285.0 win (30.7 ms off → 18.3 ms on, 1.67×).

v0.289.0 — Stencils for unary int predicates + `bit-not`

JIT now compiles pos? / neg? / even? / odd? / bit-not on tagged-int operands (OP_POS_P_I, OP_NEG_P_I, OP_EVEN_P_I, OP_ODD_P_I, OP_BNOT_I). Each stencil mirrors zero_int_p.c: tag-check, evaluate inline (compare-against-zero / parity-LSB-check / complement), store the boolean or tagged result. Boxed ints / doubles / non-numeric values still take the slow path through prim_*.

bit-not cannot escape the 60-bit tagged range on a tagged input, so no range guard is needed.
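The closure argument can be checked directly. The sketch below assumes the tagged payload is a two's-complement integer in the 60-bit range [-2^59, 2^59 - 1]; the macro and function names are hypothetical and mino's actual tag layout is not modeled. Because ~x equals -x - 1, the complement maps the range onto itself, with the two endpoints swapping places.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed payload bounds for a 60-bit tagged int. */
#define TAGGED_BITS 60
#define TAGGED_MAX  ((INT64_C(1) << (TAGGED_BITS - 1)) - 1)
#define TAGGED_MIN  (-(INT64_C(1) << (TAGGED_BITS - 1)))

/* True when v is representable as a tagged-int payload. */
static bool fits_tagged(int64_t v) {
    return v >= TAGGED_MIN && v <= TAGGED_MAX;
}

/* Complement of an in-range payload. ~x == -x - 1, so MIN maps to MAX and
 * MAX maps to MIN; no result can leave the range. */
static int64_t bit_not_payload(int64_t x) {
    assert(fits_tagged(x));
    return ~x;
}
```

This contrasts with shl, where the result can exceed the payload range and a range check is still required.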

Measured on Apple Silicon, 1M-iter loop kernels:

| op       | off (ms) | on (ms) | speedup |
|----------|---------:|--------:|--------:|
| pos?     |    27.04 |    3.22 |   8.39× |
| even?    |    26.35 |    3.47 |   7.59× |
| bit-not  |    26.73 |    2.75 |   9.71× |

neg? and odd? mirror pos? / even? byte-for-byte; their bench kernels were omitted to keep the row count short.

v0.288.0 — Stencils for the bitwise int family

JIT now compiles the bitwise int ops: OP_BAND_II, OP_BOR_II, OP_BXOR_II, OP_SHL_II, OP_SHR_II, OP_USHR_II. Each stencil inlines the tagged-int fast lane (and / or / xor are unconditional; shifts guard [0, 63] on the amount, and shl / ushr range-check the result the same way add_ii does).

Previously any fn body that used one of these ops was rejected by the JIT eligibility classifier even though the bytecode interpreter has inline fast lanes for all six.

Measured on Apple Silicon, 1M-iter loop kernels:

| op    | off (ms) | on (ms) | speedup |
|-------|---------:|--------:|--------:|
| band  |    19.85 |    2.90 |   6.85× |
| bor   |    19.86 |    2.49 |   7.98× |
| bxor  |    20.13 |    2.51 |   8.02× |
| shl   |    27.85 |    3.91 |   7.12× |
| shr   |    22.08 |    2.94 |   7.51× |
| ushr  |    22.55 |    3.10 |   7.27× |

v0.287.0 — Stencils for `mod` / `quot` / `rem`

JIT now compiles the int divide family (OP_MOD_II, OP_QUOT_II, OP_REM_II). Each stencil inlines the tagged-int fast lane (both operands tagged, divisor non-zero, no MIN/-1 overflow corner) and falls through to mino_jit_binop_slow for the boxed / bigint / diagnostic paths.

The interpreter has had inline fast lanes for these opcodes for a while; previously any fn body that contained one of them was rejected by the JIT eligibility classifier and ran the entire body through the interpreter even with --jit=on.

Measured on Apple Silicon, 1M-iter loops:

| op            | off (ms) | on (ms) | speedup |
|---------------|---------:|--------:|--------:|
| mod loop      |    25.85 |    7.59 |   3.40× |
| quot loop     |    21.90 |    3.50 |   6.26× |
| rem loop      |    25.89 |    6.37 |   4.06× |

The realistic_bench blocker histogram drops OP_QUOT_II from the fns-blocked list (was 1 fn at baseline).

v0.285.0 — `OP_LOOP_INT_DEC_INC` stencil

The two-binding reverse-counted loop shape

```
(loop [i 0 j N] (if (zero? j) i (recur (inc i) (dec j))))
```

is the kernel of every count-down-from-N idiom. The bytecode compiler has emitted OP_LOOP_INT_DEC_INC (op=61) since the loop-fusion landings, but the JIT had no stencil for it: any fn containing the op fell through to the interpreter.

This release adds src/eval/bc/stencils/loop_int_dec_inc.c, the matching mino_jit_loop_int_dec_inc_slow helper, and the entry-table + registry wire-up across all five host stencil byte tables. The fast path is a single inline pass per iter: tag-check both operands, zero-check the test register, overflow-guard the dec/inc pair, write back, continue.

Measured on Apple Silicon, mino at -O2:

| benchmark                                           | off (ms) | on (ms) | speedup |
|-----------------------------------------------------|---------:|--------:|--------:|
| (loop [i 0 j 10000000] ...) (top-level time)        |    30.69 |   18.41 |   1.67× |
| dec-inc 10M harness mode                            |    30.95 |   18.23 |   1.70× |

The headline benchmark (time (run)) now reports ~18 ms with MINO_JIT=on and ~30 ms with MINO_JIT=off (median of 7).

OP_LOOP_INT_LT_INC (1.91×) and OP_LOOP_INT_DEC (12×) remain the faster shapes — they do strictly less inner work — but the dec/inc pair is the shape users write most often, so the absolute floor matters more than the ratio.

v0.284.0 — `MINO_CPJIT_STATS` blocker breakdown self-describes opcode names

The [cpjit-stats] ---- unknown-op breakdown ---- block previously printed each row as op=17 9 fns, leaving the numeric id without a symbolic name and forcing every downstream parser to carry its own opcode-name table.

The opcode-name lookup that the bytecode dispatch profiler keeps behind -DMINO_BC_OP_COUNTS=1 is now exposed as mino_bc_op_name, and the cpjit-stats dumper writes the name alongside the id:

```
[cpjit-stats] ---- unknown-op breakdown ----
op=17 OP_MAKE_LAZY 9 fns
op=61 OP_LOOP_INT_DEC_INC 1 fns
```

The per-fn block already named these inline; the histogram now matches. No behaviour change for non-stats runs.

v0.283.0 — `apply` passes lazy / chunked tails to fn rest-args

(apply f a1 a2 ... infinite-seq) previously materialized the final-arg collection element-by-element before invoking f, which hangs whenever the seq is infinite. JVM Clojure's variadic dispatch hands the seq directly to the callee's rest-arg binding; mino now does the same when the callee is a single-arity fn with & rest.

prim_apply splices a LAZY or chunked tail directly into the args spine when the callee shape qualifies (helper: fn_lazy_safe_rest). Anything else -- prims, fixed-arity fns, multi-arity dispatch -- keeps the eager seq_iter materialization so primitives that walk raw cons cells stay correct.

apply_callable probes the args spine for a non-cons tail (LAZY / CHUNKED_CONS / CHUNK) and routes those calls through the tree-walker; the bc fast path's argv walk would otherwise silently drop the tail. bind_params and bind_vec_destructure now force lazy cells incrementally per positional bind, and the vector pre-walk stops at the & boundary so the rest-arg receives the unforced remainder. val_to_seq (used by cons) gained pass-throughs for CHUNKED_CONS and CHUNK so consumers downstream of the splice can keep their chunked spine.

Closes the spiral example from the ClojureDocs probe (partition:16-19): (apply concat coll (repeat pad)) now terminates under (take n ...).

| Shape                                  | Before  | After  |
|----------------------------------------|---------|--------|
| (apply mfn (list 1 2 3 4 5))           | ~6.8μs  | ~6.9μs |
| (apply mfn (range 10))                 | ~7.6μs  | ~6.4μs |
| (apply mfn (range 100))                | ~19.7μs | ~6.7μs |
| (apply + (list 1 2 3 4 5))             | ~6.7μs  | ~6.7μs |
| (take 4 (apply concat p (repeat q)))   | hang    | finite |

(Times are per-call from .local/bench_apply.clj on the dev machine; comparison serves as a regression gate, not a deep benchmark.)

v0.282.0 — `do` body iteration forces lazy cdrs

eval_implicit_do_impl walks a body cons chain via cdr and stops at the first non-cons cell. Macros that synthesize a do form via (apply list 'do all-forms) can leave the body as a lazy seq (or with a lazy intermediate cdr), and the prior iteration would treat the lazy seq as "no more forms" and silently skip the remainder. The walk now forces any lazy body / lazy cdr at each step. This is the same safety net that eval_impl's CONS branch already applies to call-form args.

v0.281.0 — Regex `|` top-level alternation + trailing-greedy fix

(re-find #"a|b" "cat") previously returned nil because the tiny-regex compiler ignored | (treated it as a literal char). The token is now compiled as a new ALT op; re_matchp dispatches top-level alternations by trying each branch in left-to-right order at every text position. JVM semantics: first matching alternative at the leftmost position wins.
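The dispatch order (every text position on the outside, branches left to right on the inside) can be sketched with literal branches only. This is not the tiny-regex matcher: alt_find is a hypothetical name, and each branch is compared as a plain string rather than compiled to ops. It only illustrates why the leftmost position wins first and the earliest alternative wins at that position.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Finds the leftmost match of a '|'-separated list of literal branches.
 * Returns the match start index (or -1) and stores its length in *len. */
static int alt_find(const char *pattern, const char *text, size_t *len) {
    assert(pattern != NULL && text != NULL && len != NULL);
    size_t tlen = strlen(text);
    for (size_t pos = 0; pos <= tlen; pos++) {       /* every text position */
        const char *branch = pattern;
        for (;;) {                                   /* branches, left to right */
            size_t blen = strcspn(branch, "|");
            if (blen <= tlen - pos &&
                strncmp(text + pos, branch, blen) == 0) {
                *len = blen;
                return (int)pos;
            }
            if (branch[blen] == '\0') break;         /* no more branches here */
            branch += blen + 1;                      /* skip past the '|' */
        }
    }
    return -1;
}
```

For the pattern a|b against "cat" this reports a one-byte match at index 1; for c|ca it reports the shorter first branch at index 0, since branch order takes priority over match length.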

Also closes a real bug in the matcher: when matchpattern early-returned 0 from a failed matchplus/matchstar/etc. sub-call, the partial increments to matchlength from atoms that ran *before* the failed quantifier leaked through to the next probe position, making re-find / re-seq on multi-word inputs return whole-match strings one byte too long (["fodder " "o" "dder"] instead of ["fodder" "o" "dder"]). Every early-failure path now resets matchlength to its entry value.

(?i)-style inline flags already worked. Nested alternation inside groups ((foo|bar)+) is not yet handled and remains a follow-up.

v0.280.0 — `eval` recurses through lazy / chunked call forms

A call form built via concat / sequence (the shape macros ship when they synthesize an expansion with ~@ unquote-splicing) came back to eval_impl as either a MINO_LAZY value or a MINO_CONS whose cdr was lazy. The previous self-evaluating branches for those types treated the seq as data and returned it unchanged, and the MINO_CONS branch read the cdr without forcing — so quote, if, and every other special form saw "no args" and either errored or returned the wrong value.

The eval kernel now:

This unblocks the macro idiom that ClojureDocs uses everywhere — (clojure.core/sequence (clojure.core/seq (clojure.core/concat ...))) — plus the local-context / check-call / def+ style of macros that synthesize their expansion from runtime data.

v0.279.0 — `clojure.math` namespace

Ships the Clojure 1.11+ clojure.math surface as a bundled library: PI, E, all the trig (sin/cos/tan + asin/acos/atan + sinh/cosh/tanh + atan2), logarithm / exponential family (sqrt, cbrt, log, log10, log1p, exp, expm1, pow), rounding (floor, ceil, round), angle conversion (to-radians, to-degrees), and the IEEE 754 utility set (signum, hypot, copy-sign, next-up, next-down, IEEE-remainder). New math-* C primitives back the entries that weren't already covered.

Gated by the new MINO_CAP_MATH_LIB capability (included in MINO_CAP_DEFAULT). Loaded via (require 'clojure.math).

v0.278.0 — Macros receive `&env` (map of locals) and `&form`

Inside a macro body, &env is now a map whose keys are every lexical local in scope at the call site (values are nil — JVM binds LocalBinding objects, but the typical uses, (contains? &env sym) and (keys &env), are key-driven). &form is bound to the entire macro-call form. Both were previously bound to nil, so macros that inspect locals (the standard resolve / destructure / check-call idioms ClojureDocs documents) couldn't tell a local from a var.

The walk stops at the namespace-root env so the map doesn't balloon with the global var set.

v0.277.0 — `rem` / `mod` on doubles match JVM byte-identically

(mod 1024.8402 5.12) returned 0.840200000000074; JVM returns 0.8402000000000953. Both implementations follow the same spec (a - trunc(a/b) * b) but the C library's fmod and inline a - q*b were free to use a fused-multiply-add, which yields a different ULP-level result than JVM's two-step rounded sequence.

The primitive now computes q = trunc(a/b), then forces the multiply and subtract into separate IEEE 754 ops via volatiles. For huge magnitudes where q overflows long, it falls back to fmod. All three corpus mod examples now match JVM exactly: 6.095000000000027, 0.8402000000000953, 4.279799999999905.

v0.276.0 — Keywords / symbols carry ns and name as separate boundaries

Previously a keyword or symbol stored a single flat interned string, so (keyword "a/b" "c") and (keyword "a" "b/c") both produced :a/b/c and were indistinguishable: name and namespace couldn't recover the original split, and = reported them as equal.

The interned value now records ns_len — the byte offset of the separating /. Two keywords with the same flat string but different ns_len are distinct (different intern slot, distinct identity, distinct equality, distinct name / namespace results). Read-back of a single-string form like :a/b/c derives the boundary via last-slash, matching JVM Clojure when the keyword wasn't constructed via the 2-arg form.
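The effect of recording the boundary can be illustrated with a small record type. Everything here is hypothetical (qname_t and its helpers are not mino's intern table), and it assumes ns_len == 0 denotes "no namespace", which the notes do not state. The point is only that equality compares the boundary as well as the flat bytes, and that a single-string form derives the boundary from the last slash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interned-name record: flat text plus the ns/name boundary. */
typedef struct {
    const char *text;   /* flat form, e.g. "a/b/c" */
    size_t len;         /* bytes in text */
    size_t ns_len;      /* byte offset of the separating '/', 0 = none (assumed) */
} qname_t;

/* Single-string form: the boundary is derived from the last slash. */
static qname_t qname_from_flat(const char *text) {
    assert(text != NULL);
    qname_t q = { text, strlen(text), 0 };
    const char *slash = strrchr(text, '/');
    if (slash != NULL && slash != text) q.ns_len = (size_t)(slash - text);
    return q;
}

/* Two-part form: the caller states where the namespace ends. */
static qname_t qname_from_parts(const char *text, size_t ns_len) {
    assert(text != NULL);
    qname_t q = { text, strlen(text), ns_len };
    assert(ns_len == 0 || (ns_len < q.len && text[ns_len] == '/'));
    return q;
}

/* Same flat bytes with a different boundary are distinct names. */
static bool qname_equal(qname_t a, qname_t b) {
    return a.len == b.len && a.ns_len == b.ns_len &&
           memcmp(a.text, b.text, a.len) == 0;
}
```

Here qname_from_parts("a/b/c", 3) models (keyword "a/b" "c") and qname_from_parts("a/b/c", 1) models (keyword "a" "b/c"); they compare unequal, while the flat form "a/b/c" matches the first.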

New public constructors: mino_keyword_ns_n, mino_symbol_ns_n. The single-string mino_keyword_n / mino_symbol_n continue to work and now auto-derive ns_len from the data.

v0.275.0 — `last` survives `with-redefs` of `first`

(with-redefs [first last] (first [1 2])) previously hung at 100% CPU because mino routes every core-fn call through the var, including last's own internal (first s) base case. With first redef'd to last, that base case became a recursive (last s) that never terminated.

The fix captures first and next into private boot-time locals (-lock-first, -lock-next) and rewrites last as a tail-loop against those locals. JVM Clojure achieves the same isolation through direct-linking of the compiled last; mino captures the var values at definition time instead.

v0.274.0 — `letfn` supports mutual recursion via `letfn*` special form

The letfn macro previously expanded to a sequential let, which created a new env layer per binding, so each fn's closure could only see bindings declared before it. JVM Clojure handles this through a letfn* special form that pre-binds every name (with placeholder values) in a single env, evaluates each fn body against that fully-populated env, and then mutates the slot to the evaluated fn.
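The pre-bind-then-fill order can be modeled with a single env frame whose slots are resolved at call time. This is an illustrative sketch, not mino's evaluator: env_t, the slot indices, and the even / odd pair are hypothetical. Each function reaches its sibling through the shared frame, so it does not matter that the sibling's slot was still a placeholder when the function was created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct env;
typedef bool (*local_fn)(const struct env *env, unsigned n);

/* One shared frame: every letfn name owns a slot before any fn exists. */
typedef struct env {
    local_fn slots[2];
} env_t;

enum { SLOT_EVEN = 0, SLOT_ODD = 1 };

/* Each body looks its sibling up through the frame when it is called. */
static bool my_even(const env_t *env, unsigned n) {
    return n == 0 ? true : env->slots[SLOT_ODD](env, n - 1);
}

static bool my_odd(const env_t *env, unsigned n) {
    return n == 0 ? false : env->slots[SLOT_EVEN](env, n - 1);
}

static env_t letfn_star_demo(void) {
    env_t env = { { NULL, NULL } };     /* 1: pre-bind placeholders */
    env.slots[SLOT_EVEN] = my_even;     /* 2: evaluate each fn against the */
    env.slots[SLOT_ODD]  = my_odd;      /*    full frame, then fill its slot */
    return env;
}
```

With a sequential let, the first function would have been created in a frame that has no slot for the second at all, which is the failure described above.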

Mino now ships the same special form. letfn macro-expands to letfn*, so existing user code that depended on mutual recursion among local fns (trampoline / state-machine idioms) starts working.

The new field for the letfn* symbol lives after the JIT-layout-pinned region of mino_state so the runtime_layout.h offsets are unaffected.

v0.273.0 — `merge-with` returns a single non-map input as-is

JVM Clojure's (merge-with f x) with a single non-map input reduces to x (the underlying reduce of one element). Mino was rejecting non-map non-nil inputs unconditionally, which broke the recursive deep-merge idiom whose base case bottoms out at a non-map leaf. The primitive now short-circuits when there is exactly one input arg: the arg is returned unchanged regardless of type. Two or more non-map inputs still raise, matching JVM.

v0.272.0 — `walk` recurses into any `seq?`, not just `cons?`

The walk kernel checked (cons? form) for its sequence branch, so a lazy seq like (map inc [1 2 3]) fell through to the default identity branch instead of being recursively walked. JVM Clojure's walk dispatches on anything that satisfies seq? (any ISeq). Mino now matches: postwalk and prewalk correctly recurse into the output of fns that return lazy seqs from the vector branch of their walking-fn.

v0.271.0 — `reduce-kv` on vectors uses index-as-key

JVM Clojure's reduce-kv on a vector calls (f acc index element) per slot. Mino was falling through to the seq-driven internal-reduce-kv, which decomposed each element as a [k v] pair — so (reduce-kv (fn [m k v] (assoc m k v)) {} ["one" "two" "three"]) produced char-keyed garbage instead of {0 "one", 1 "two", 2 "three"}. The IKVReduce protocol now ships a :vector impl that walks index-by-index and respects reduced?.

v0.270.0 — `partition-all` transducer emits vector groups

The 1-arg transducer arity of partition-all was passing each group through (seq buf), so transduced output printed as [(a b c) (d e f)]. JVM's transducer impl emits each group as (vec (.toArray buf)) — i.e. a vector — and downstream consumers that compose via (into [] ...) produce [[a b c] [d e f]]. The mino transducer now emits the buf vector directly. The 2- and 3-arg seq arities continue to produce lists, matching plain (partition n coll).

v0.269.0 — Doubles print at the shortest round-trippable precision

pr-str on a double used %g with the C default of 6 sig figs, so (mod 475.095 7) printed as 6.095 even though the actual value is 6.095000000000027. The printer now picks the shortest precision whose result re-parses to the same double, mirroring Double.toString semantics. Fixed notation is used in the [1e-3, 1e7) range (signed magnitude) and scientific elsewhere. Clean values (1.5, 100.0, 0.1) still print short; floats born of imprecise arithmetic now surface their full extent.
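The precision search can be sketched with standard C only. This is an assumption-laden illustration, not the mino printer: fmt_shortest is a hypothetical name, and it uses %g at every precision, so it does not reproduce the fixed-versus-scientific notation rule or the trailing ".0" described above. It shows only the core loop: raise the precision until the text parses back to the same double, which 17 significant digits always achieves for finite values.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes the shortest %g rendering of v that strtod parses back to v.
 * Returns the number of characters written, or -1 if buf is too small. */
static int fmt_shortest(double v, char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    int n = 0;
    for (int prec = 1; prec <= 17; prec++) {
        n = snprintf(buf, cap, "%.*g", prec, v);
        if (n < 0 || (size_t)n >= cap) return -1;   /* truncated output */
        if (strtod(buf, NULL) == v) break;          /* round-trips: done */
    }
    return n;
}
```

Clean values such as 0.1 and 1.5 stop at one or two digits, while a value such as the result of (mod 475.095 7) keeps iterating until enough digits are present to distinguish it from 6.095.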

v0.268.0 — `seq-to-map-for-destructuring` helper

Adds the JVM 1.11+ helper used by varargs kwargs destructuring: collapses a seq of :k v :k v ... (optionally with a trailing override map) into a single map. Empty input passes through, as does a single-element seq whose element is a map; trailing override entries overwrite same-keyed pair entries.

v0.267.0 — Nested patterns inside map destructuring

The pattern position of {pattern :key} now accepts vector and map patterns in addition to bare symbols, so (let [{a :a, [lhs rhs] :c} {:a 1, :c [:foo :bar]}] ...) binds lhs and rhs to :foo and :bar respectively. The recursive bind_form already knew how to walk vector/map patterns; the runtime map-destructure path just wasn't passing them through. :or defaults still apply only to leaf-symbol bindings (matching JVM).

v0.266.0 — Symbols callable as keyword-style lookups

('sym m) and ('sym m default) now return the value at the symbol-keyed entry in m, or the default (nil if absent). For non-map collections the lookup falls through to the default, matching JVM Clojure: (map (fn [s] (s 0)) '(inc dec zero?)) yields (nil nil nil) instead of erroring. Sorted-map and record lookups are also handled.

v0.265.0 — Map destructuring evaluates the key expression

{sym k} previously used k as a literal key, so destructuring {nm k} {:name "john"} with k bound to :name looked up the symbol k rather than the keyword :name and bound nm to nil. The runtime destructure path now evaluates the RHS expression when it is a symbol — matching JVM Clojure, which expands {sym k} to (get gmap k) so k resolves in the surrounding scope. Self-evaluating forms (keywords, strings, numbers) are unchanged.

v0.264.0 — `keys` / `vals` accept seqs of map entries

(keys coll) and (vals coll) previously demanded a map and rejected sequences. JVM Clojure routes anything seqable through RT/keys / RT/vals, which iterate elements as MapEntry / [k v] vectors. Mino now does the same: vectors, cons lists, lazy seqs of MINO_MAP_ENTRY or 2-vectors are walked element-wise. This unblocks idioms like (keys (remove (fn [[_ v]] (= v 1)) (frequencies xs))).

v0.263.0 — `keep-indexed` gains the 1-arg transducer arity

(keep-indexed pred) returns a transducer that calls pred with the running index and each input, dropping nil results. The existing 2-arg lazy-seq path is unchanged.

v0.262.0 — `name` / `namespace` split at the last slash

The 2-arg keyword constructor (keyword "a/b" "c") produces a qualified keyword whose name is "c" and whose namespace is "a/b" (matching JVM Clojure, which stores ns and name as separate strings). Previously name and namespace split at the *first* slash in the symbol/keyword's internal "a/b/c" form, so (name (keyword "a/b" "c")) returned "b/c" and (namespace ...) returned "a". Both now scan for the last slash so the round-trip recovers the originally-passed segments.

v0.261.0 — `condp` recognizes the `:>>` result-fn arrow

(condp pred expr ... test :>> result-fn ...) now activates the JVM result-fn arrow form: when (pred test expr) is truthy, that truthy result is passed to result-fn, and the value of that call becomes the result of the branch. The plain two-form test / then clause continues to work unchanged. Useful for set/predicate dispatches where the match value itself is the input to the handler.

v0.260.0 — `subseq` / `rsubseq` 5-arg form matches JVM Clojure semantics

The 5-arg (subseq sc start-test start-key end-test end-key) form previously required start-test ∈ {>, >=} and end-test ∈ {<, <=} and performed a strict-bounds scan, rejecting any other test orientations with a type error. JVM Clojure permits any of <, <=, >, >= in either position and gives them a specific seqFrom-and-filter semantics: scan from start-key forward (for subseq) or end-key backward (for rsubseq), drop the first element if its pivot test fails, then take-while the other test.

The C primitive now mirrors those semantics. The 3-arg form is unchanged. Empty-result behavior also tracks JVM: when seqFrom returned a non-empty list but the take-while drained it, the 5-arg form returns () rather than nil (matching the lazy-seq path JVM produces); when the sorted collection has no elements at or past the pivot, the result remains nil.

The previous "rejects-bad-args" tests that pinned the strict interpretation have been replaced with positive tests covering the JVM-shape forms.

v0.259.0 — `mapv` gains the multi-collection arity

(mapv f c1 c2 ...) previously raised MAR001 mapv requires 2 arguments. The 2-arg arity is heavily optimized (pipeline fast lane, transient builder) and stays unchanged. A second path in the same C primitive handles N >= 3: walk all collections in parallel through seq_iter_t, call fn with one element from each per step, and stop at the shortest. Up to 32 collections.

Examples that now work: (mapv + [1 2 3] [4 5 6]) => [5 7 9], (mapv + [1 2 3] (iterate inc 1)) => [2 4 6], (apply mapv vector [[:a :b :c] [:d :e :f] [:g :h :i]]) => [[:a :d :g] [:b :e :h] [:c :f :i]].

v0.258.0 — `sequence` gains the 1-arg coerce-to-seq arity

(sequence coll) was missing — only the 2- and N-arg transducer forms were defined, so (sequence nil), (sequence []), and (sequence [1 2 3]) all raised MAR002 no matching arity. JVM Clojure's 1-arg form coerces its input to a (possibly empty) seq: nil and empty collections yield (), an already-realized seq is returned identically, and other seqables go through seq.

The fix adds the missing arity to sequence in src/core.clj under the existing (when (mino-installed? :transducers) ...) block. The transducer-bearing arities are unchanged.

v0.257.0 — `drop` always returns a seq, never the source collection

(drop n coll) for n <= 0 previously short-circuited and returned coll as-is. When coll was a vector / map / set / chunked source, the returned value was still a vector / map / set — so (pr-str (drop 0 [1 2 3 4])) printed [1 2 3 4] instead of (1 2 3 4), and (vector? (drop 0 [1 2 3 4])) was true. The C prim_drop_seq fast-path now routes through prim_seq for the n <= 0 case so the result has the same seq type as the positive-n path.

take was already correct (its n <= 0 path returns an empty list) and stays unchanged.

v0.256.0 — `clojure.string/replace` accepts regex match and fn replacement

(clojure.string/replace s match repl) now accepts a regex as the match argument, matching JVM Clojure. The C primitive walks the input via the existing regex engine and emits each replacement, so the regex path works whether or not the script has explicitly required clojure.string first — which closes the cookbook gap the ClojureDocs differential probe surfaced ((clojure.string/replace "The color is red" #"red" "blue")).

Three replacement shapes are supported on the regex path:

The clojure.string/replace wrapper now only adds char-match coercion (1-char-string → primitive); regex dispatch lives in C.

v0.255.29 — Fix: BC catch lands restore `bc_current_bc`; safepoint sleeps for fair handoff

Two follow-ups to v0.255.27's safepoint poll + diag location work, surfaced by the v0.255.28 CI run on a 2-CPU ubuntu-24.04 x86_64 runner.

`bc_current_bc` survives a longjmp out of an inner BC frame

prim_throw_classified and normalize_exception (the v0.255.27 diag-location work) now consult ctx->bc_current_bc first when attributing :mino/location. But when a throw inside an inner BC fn longjmps past that fn's normal exit-time restore, the cursor is left pointing at the now-popped frame. The catch handler's normalize_exception then dereferenced a soon-to-be-freed bc_fn_t, surfacing as a heap-use-after-free under ASan.

Two fixes:

Auto-yield uses 100us nanosleep instead of `sched_yield()` only

POSIX mutex unlock+lock is NOT a fair handoff: on a 2-CPU host, the yielding thread can re-acquire state_lock ahead of a waiter even with a sched_yield() between. The v0.255.28 async-busy-spin-does-not-starve-siblings test hung on ubuntu-24.04 x86_64 because the busy-spin reader kept winning the lock race against the writer futures.

The BC safepoint poll's auto-yield path now does a 100us nanosleep (POSIX) or Sleep(0) (Win32) between the mino_yield_lock release and the mino_resume_lock re-acquire. At one auto-yield per ~64K backward jumps the overhead is negligible on real workloads; on a pathological tight loop it caps throughput in a way that lets siblings actually run.

v0.255.28 — Fix: tighten new async tests for tight-CPU CI runners

Follow-up to v0.255.27: two of the new regression tests (async-busy-spin-does-not-starve-siblings and async-future-cancel-interrupts-cpu-bound) failed on macos-14 and ubuntu-24.04 GHA runners with MTH001 thread-limit-exceeded. CI runners get ~3 CPU allocations vs 12 on dev; combined with the test order, prior workers were still in worker_run cleanup when the next spawn ran -- their thread_count slots not yet released.

No runtime code change; just test hygiene.

v0.255.27 — Bug-fix sweep: deref/regex/location/concurrency/cleanup

Nine fixes landed in this patch, covering Clojure-canon correctness gaps, BC VM concurrency safety, location reporting hygiene, and one cosmetic source-tree boundary. Headline items:

Details in the per-fix sections below.

Fix: user-throw catch values now carry `:mino/location`; BC throws blame the throw site

Two related diag-location precision fixes:

1. User-throw catch values previously lacked :mino/location. (try (throw "boom") (catch e (get e :mino/location))) returned nil. (try (throw (ex-info "x" {})) ...) likewise. System throws (prim_throw_classified) included the field; user-throw paths (normalize_exception) didn't, producing an inconsistent error shape. Fix: normalize_exception now consults bc_current_pc (preferred) or eval_current_form (fallback) and attaches the location.

2. A throw inside a BC-compiled fn body previously blamed the call site, not the throw site. (defn f [] (assoc nil)) followed by (f) reported the (f) line for the arity error instead of the inner (assoc nil) line. Root cause: both prim_throw_classified and normalize_exception checked eval_current_form first, which stays at the outer call form during BC dispatch. Fix: both paths now consult the BC PC first (which the VM keeps in sync with each instruction); they fall back to eval_current_form for tree-walker frames.

Regression in tests/bc_error_quality_test.clj (user-throw-carries-location, bc-throw-prefers-pc-over-call-site).

Fix: `(char n)` constructor added to clojure.core

(char 65) previously raised MNS001 unbound symbol: char. The underlying mino_char(S, codepoint) C entry point existed and MINO_CHAR was already a first-class type with literals (\A), predicate (char?), and char-keyed collection support -- but the canonical Clojure constructor that turns an integer codepoint into a character was not exposed.

prim_char is now registered in clojure.core: takes one argument, identity on existing characters, integer codepoints in 0..0x10FFFF become the corresponding Unicode scalar value. Out-of-range codepoints throw MBD001; non-integer / non-char inputs throw MTY001.

Regression in tests/char_test.clj (char-constructor-from-codepoint).

CI: release-gate doc clarified for `check-stencils-fresh` exclusion

The release-gate composite has not included check-stencils-fresh since the dev / CI toolchain divergence emerged (Apple clang 17 on dev, Apple clang 15 on macos-14 GHA runners, gcc-no-musttail on ubuntu-24.04 runners). The CI workflow's step comment was stale and still listed check-stencils-fresh as part of the gate; the actual composite in lib/mino/tasks/builtin.clj has long excluded it.

Updated .github/workflows/ci.yml to describe the actual gate (reloc-mirror, stencil-registry, test suite, ASan, 4-way JIT parity) and the architectural reason check-stencils-fresh is a dev pre-commit step rather than a CI gate. No code change.

Internals: `src/public` borrows internals through one explicit bridge

src/public/embed.c and src/public/gc.c previously included runtime/internal.h and prim/internal.h directly. The public ABI in src/mino.h is unaffected; the dependency was an implementation-side maintainability concern -- a runtime refactor could silently affect a file under src/public/, and the direct include of an internal header was the only signal of that dependency.

Both files now include src/public/internal_bridge.h, a thin re-export of the two internal headers. Any new internal that a public function reaches for now surfaces in the bridge first, making the borrow explicit at the include line. No ABI or runtime behavior change.

Fix: per-ctx BC register stack + `pmap` re-enabled

Concurrent fn calls that yield state_lock (e.g. via thread-sleep) previously corrupted each other's argument slots. Root cause: S->bc_regs and S->bc_top lived on mino_state_t, so all workers shared one BC register stack. The second worker's bc_push_window grew the stack above the first worker's frame; when the first worker resumed and ran bc_pop_window, it dropped bc_top below the second worker's frame, NULL'ing slots the second worker would later read on resume. The visible symptom was (let [f (fn [x] (thread-sleep 5) x) f1 (future-call (fn [] (f 1))) f2 (future-call (fn [] (f 2)))] [@f1 @f2]) returning [1 nil] instead of [1 2].

Fix: each mino_thread_ctx_t now owns its BC register stack pointer / cap / top via dedicated bc_regs_storage / bc_regs_storage_cap / bc_top_snapshot fields. mino_lock / mino_unlock snapshot the state-level cursor into the current ctx on outermost lock-depth transitions; mino_yield_lock / mino_resume_lock mirror the snapshot across the yield window so a peer worker that takes the lock during the yield sees its own ctx's view of S->bc_regs. The GC root walker visits every yielded ctx's stored snapshot so a peer's allocation pressure can't collect slots a parked worker holds.

JIT-pinned offsets shifted by 32 bytes (the ctx grew); the generated stencils were regenerated and runtime_layout.h updated to match.

With the underlying yield-safety fixed, clojure.core/pmap is now shipped: a lazy parallel map that spawns (- (mino-thread-limit) 1) futures per chunk and derefs them in order. Falls back to plain map when threads aren't granted so callers don't need a guard.

Regression in tests/async_smoke_test.clj (async-concurrent-fn-yield-preserves-args).

Fix: BC VM safepoint poll for `future-cancel` + busy-spin auto-yield

future-cancel previously only flipped the impl's CANCELLED tag and broadcast its cv. A CPU-bound worker running a tight (loop ... recur) had no enclosing cv_wait, so the cancel was invisible to the worker and the subsequent state_destroy hung forever waiting on pthread_join.

Separately, a worker future running a busy spin without explicit yield monopolized state_lock, so sibling workers couldn't make progress and the script deadlocked.

Both fixes share one mechanism: a new mino_bc_safepoint(S) poll called from every backward jump in the BC VM (OP_JMP, OP_JMPIFNOT, and all OP_LOOP_INT_* fast and slow paths). Two responsibilities:

The fast-path cost is one branch (tls_cancel_ptr != NULL, which is NULL on the embedder thread) plus an unsigned counter increment. The TLS storage avoids growing mino_thread_ctx_t, which would shift JIT-pinned offsets in main_ctx.
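A hedged sketch of such a poll in C, assuming the two jobs are the cancel check and the periodic auto-yield named in the heading. The safepoint_state struct, the YIELD_INTERVAL value, and the yields counter are illustrative stand-ins; the real poll would release and re-acquire the state lock where this sketch only counts.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define YIELD_INTERVAL 65536u  /* assumed: about one yield per 64K backward jumps */

/* Hypothetical per-thread poll state. */
typedef struct {
    atomic_int *cancel_flag;  /* NULL on a thread that is never cancelled */
    unsigned    backjumps;    /* backward jumps seen since the last yield */
    unsigned    yields;       /* yield requests so far (illustration only) */
} safepoint_state;

/* Called on every backward jump. Returns true when the loop must stop
 * because its future was cancelled. */
static bool safepoint_poll(safepoint_state *sp)
{
    if (sp->cancel_flag == NULL)
        return false;                      /* fast path: a single branch */

    if (atomic_load_explicit(sp->cancel_flag, memory_order_relaxed) != 0)
        return true;                       /* cancellation is now visible */

    if (++sp->backjumps >= YIELD_INTERVAL) {
        sp->backjumps = 0;
        sp->yields++;                      /* a real VM would hand off its lock here */
    }
    return false;
}
```

A tight loop that polls on each iteration therefore notices a cancel within one iteration and offers the lock to siblings once per interval, without needing an enclosing condition-variable wait.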

Regression in tests/async_smoke_test.clj (async-future-cancel-interrupts-cpu-bound, async-busy-spin-does-not-starve-siblings).

Fix: `eval_current_form` restored after sub-eval

When eval recursed into a sub-form, eval_current_form was overwritten but never restored on return. Any throw that fired later -- e.g. a thread-limit throw from inside mino_future_spawn after eval_args had finished walking the argument list -- blamed the last sub-form's source location instead of the active call form's. Across test files, the "last sub-form" could be in a previously loaded file, so :mino/location reported a wrong filename entirely (the bug was filed against tests/bc_closure_test.clj reporting locations from tests/bc_binding_test.clj).

eval_impl's MINO_CONS branch now saves the previous eval_current_form on entry and restores it before every return path (host syntax, special form, regular call). Throws that fire from a primitive after eval has handed off (e.g. arity errors from a prim called via apply_callable) now see the call form, not the last argument.

Regression in tests/bc_error_quality_test.clj (eval-current-form-restored-after-subeval).

Fix: regex inline flags `(?i)` / `(?s)` / `(?m)` / `(?x)`

JVM Pattern-style inline-flag ops are now supported. Previously (re-find #"(?i)foo" "FOO") returned nil because the engine parsed (?i)foo as literal characters and tried to match the substring "(?i)foo". Now the compiler emits dedicated SET_FLAGS pattern slots that the matcher absorbs to update an active flag word; matchone, matchcharclass, matchrange, matchdot, and the BEGIN / END anchor handling all honor the active flags.

Supported flags:

Flags can be cleared with the negation form (?-i), combined ((?ix)), or toggled in one op ((?i-s)). Mid-pattern flag ops apply from that point onwards, matching JVM semantics.

Scoped flag groups (?<flags>:...) are detected and rejected with a clear compile error rather than silently misinterpreted, since mino's matcher has no per-group flag-state stack to restore the prior flags on group close. Adding that is a separate cycle.
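The flag-op grammar above can be sketched as a small C parser. This is an illustration under stated assumptions, not the engine's code: the FLAG_* bit values, the apply_inline_flags name, and the return-code convention are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical bit assignments for the four inline flags. */
enum { FLAG_I = 1, FLAG_S = 2, FLAG_M = 4, FLAG_X = 8 };

/* Parse a flag op such as "(?i)", "(?-i)", "(?ix)" or "(?i-s)" at the
 * start of p and apply it to *flags.
 * Returns the number of characters consumed, 0 when p does not start
 * with a flag op, or -1 for a scoped group like "(?i:...)". */
static int apply_inline_flags(const char *p, unsigned *flags)
{
    unsigned set = 0, clear = 0;
    int negate = 0;
    size_t i;

    if (p[0] != '(' || p[1] != '?')
        return 0;

    for (i = 2;; i++) {
        unsigned bit;
        switch (p[i]) {
        case 'i': bit = FLAG_I; break;
        case 's': bit = FLAG_S; break;
        case 'm': bit = FLAG_M; break;
        case 'x': bit = FLAG_X; break;
        case '-':
            if (negate) return 0;          /* a second '-' is not a flag op */
            negate = 1;
            continue;
        case ')':
            if ((set | clear) == 0) return 0;
            *flags = (*flags | set) & ~clear;
            return (int)(i + 1);
        case ':':
            /* "(?:" is a plain non-capturing group; flags before ':'
             * make it a scoped flag group, which is rejected. */
            return (i > 2) ? -1 : 0;
        default:
            return 0;
        }
        if (negate) clear |= bit; else set |= bit;
    }
}
```

Starting from an active FLAG_S, the op "(?i-s)" consumes six characters and leaves only FLAG_I set, while "(?i:abc)" returns -1 so the caller can raise the compile error.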

Default regex semantics are now closer to JVM: . no longer matches \n by default (RE_DOT_MATCHES_NEWLINE default flipped to 0). (?s) opts back into the previous always-on DOTALL behavior. Existing tests didn't exercise the previous default behavior; the change is observable only on patterns containing newlines, which are rare in the existing suite.

Regression in tests/regex_test.clj (re-inline-flags).

Fix: `(deref ref ms timeout-val)` 3-arg timed deref

Real Clojure ships a 3-arg deref that returns timeout-val if a blocking ref (future, promise) doesn't realize within ms milliseconds. mino's prim_deref rejected anything other than the 1-arg form with MAR001, so callers had to spin (realized? p) + thread-sleep -- busier and racier than the canonical form.

prim_deref now dispatches on arity; the 3-arg form routes futures and promises through a new mino_future_deref_timed that wraps a portable cv_timedwait_ms shim (pthread_cond_timedwait on POSIX, SleepConditionVariableCS on Windows). Spurious wakes recompute the remaining budget against the monotonic clock so the deadline is honest. The wait runs under a yielded state_lock so sibling workers can still deliver during the timed window. Non-blocking ref types (atom, var, volatile, agent, reduced, tx-ref, delay) throw a clear "3-arg form only supported on blocking refs" diagnostic.

The core.clj deref override (which routes delays through deref-delay) now also dispatches on arity so the new C path is reachable.

Regression in tests/async_smoke_test.clj (async-deref-timed-promise).

v0.255.26 — Fix: `(str/split "" re)` returns `[""]`, not `[]`

JVM Clojure (like Java's String.split) returns [""] -- a single empty-string element -- when the input is empty, regardless of the separator. mino's prim_split returned [] (an empty vector) because the empty-input case fell into the regular split loop, which never appended a piece. Downstream code that destructures [head & tail] on the result expected head to be "", not nil.

prim_split now short-circuits on slen == 0 and returns a 1-element vector containing the empty string before either the regex or literal-string path runs.

Regression in tests/string_test.clj (split-empty-input).

v0.255.25 — Fix: `seq_iter_init` walks records as kv pairs

(into {} record) returned {} instead of {:x 1 :y 2}. The iterator that backs into, reduce, transduce, and most other seq aggregates fell through to the default (immediately-done) case for MINO_RECORD. prim_seq itself had a record case that built a [k v] cons list in declared-field-then-ext order, but the iterator didn't route through it.

seq_iter_init now routes MINO_RECORD through prim_seq up-front so every downstream consumer walks records as a seq of [k v] pairs. After this:

```clojure
(into {} (->Point 1 2))          ; => {:x 1 :y 2}
(into [] (->Point 1 2))          ; => [[:x 1] [:y 2]]
(reduce conj #{} (->Point 1 2))  ; => #{[:x 1] [:y 2]}
```

Regression in tests/records_test.clj (record-seq-is-iterable).

v0.255.24 — Fix: Vector destructuring of lazy / chunked seqs

(let [[a b c] (range 3)] [a b c]) returned [nil nil nil] instead of [0 1 2]. bind_vec_destructure walked the value as a cons chain via mino_is_cons, which returns false for MINO_LAZY and MINO_CHUNKED_CONS (the shapes range, map, filter, and most of clojure.core's seq pipeline produce). Every pattern slot fell through to "bind nil when value is shorter than pattern". Vector and list values worked because they were pre-converted to a cons chain at the top of the function; lazy and chunked-cons had no corresponding case.

Added a type-dispatched inline walker that handles MINO_LAZY (force then re-dispatch), MINO_CHUNKED_CONS (walk chunk slice then descend into .more), MINO_CHUNK (walk the flat array), and MINO_VECTOR (positional read), realizing just the prefix the pattern needs and stashing it as a fresh cons chain so the downstream positional walk works uniformly.

Affects every destructure of a sequence produced by range, map, filter, take, drop, etc. (i.e. most real-world Clojure code that uses vector destructure on a seq).

Regression in tests/destructuring_test.clj (vec-destructure-lazy-seq).

v0.255.23 — Fix: `set!` mutates dynamic-var bindings (was no-op)

set! used to be a no-op macro: (set! *x* 5) returned nil and left *x* unchanged, even inside a (binding [*x* 0] ...) form. The comment called it "a JVM compiler directive, not applicable to mino", which is half-true (set! does double duty in JVM Clojure: field mutation on the JVM side, *and* thread-local dynamic-var binding mutation). The dynamic-var shape is portable code that mino was silently ignoring.

Now: (set! *x* expr) calls the new set-dyn-binding! C primitive which walks ctx->dyn_stack, finds the topmost binding for the symbol, mutates its val, and returns the new value. With no active binding frame for the named var, throws Can't change/establish root binding of: *name* with set! — matches JVM Clojure's runtime contract.

The JVM-only field-mutation shape (set! (.-field obj) val) is not supported; mino has no JVM fields. Pre-existing Clojure code that relied on the old no-op behavior to silently swallow (set! *warn-on-reflection* true) now needs to wrap such calls in try/catch (or just delete them, since they had no effect anyway). Per [[alpha-no-backwards-compat]], this contract change is acceptable.

Regression in tests/compat_test.clj (set-bang-mutates-dynamic-binding).

v0.255.22 — Fix: Regex engine now parses `{n}` / `{n,m}` / `{n,}` quantifiers

The vendored regex engine recognized *, +, ? quantifiers but silently dropped {n} and friends — the compile-time switch never matched {, so the brace and its digits were treated as unknown meta and the pattern produced "zero matches needed" behavior. (re-find #"\d{4}" "year 2026") returned nil; (re-matches #"(\d{4})-(\d{2})-(\d{2})" "2026-05-17") returned nil. Date / version / hex-id / password patterns common in Clojure code all failed silently.

Added a BOUNDED op with {min, max} storage in the regex_t union, a case '{': arm that parses {n} / {n,} / {n,m} (and falls back to treating { as a literal when followed by non-digit content, so {abc} still parses), and a matchbounded matcher modeled on matchstar / matchplus with explicit bounds and backtracking. {n,} encodes max == 0xFF as the unbounded sentinel.
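A rough C sketch of the brace parse described above, including the literal fallback and the 0xFF sentinel. The parse_bounds name and its return convention are hypothetical, and rejecting an inverted range such as {5,2} is this sketch's own choice rather than documented engine behavior.

```c
#include <ctype.h>

#define BOUND_MAX 0xFFu  /* clamp ceiling, also the "unbounded" sentinel for {n,} */

/* Read a run of digits starting at p[*i], clamping the value to BOUND_MAX. */
static unsigned read_clamped(const char *p, int *i)
{
    unsigned v = 0;
    while (isdigit((unsigned char)p[*i])) {
        v = v * 10u + (unsigned)(p[*i] - '0');
        if (v > BOUND_MAX) v = BOUND_MAX;   /* clamping here also avoids overflow */
        (*i)++;
    }
    return v;
}

/* Parse "{n}", "{n,}" or "{n,m}" at the start of p.
 * Returns the characters consumed (braces included), or 0 when the
 * brace should be treated as a literal, e.g. "{abc}". */
static int parse_bounds(const char *p, unsigned *min, unsigned *max)
{
    int i = 1;
    unsigned lo, hi;

    if (p[0] != '{' || !isdigit((unsigned char)p[1]))
        return 0;

    lo = read_clamped(p, &i);
    hi = lo;                                /* {n}: exactly n */
    if (p[i] == ',') {
        i++;
        if (p[i] == '}') {
            hi = BOUND_MAX;                 /* {n,}: no upper bound */
        } else if (isdigit((unsigned char)p[i])) {
            hi = read_clamped(p, &i);       /* {n,m} */
        } else {
            return 0;
        }
    }
    if (p[i] != '}' || hi < lo)
        return 0;

    *min = lo;
    *max = hi;
    return i + 1;
}
```

With this shape, "{4}" yields min = max = 4, "{3,}" yields min 3 with the sentinel as max, and "{abc}" returns 0 so the caller emits a literal brace. One consequence of reusing 0xFF as the sentinel is that an explicit upper bound of 255 or more reads the same as "unbounded".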

Min/max clamped to 255 — far above any real-world pattern. Greedy with backtracking, same shape the other quantifiers use.

Regression in tests/regex_test.clj (re-bounded-quantifier).

v0.255.21 — Fix: `time-ms` returns wall-clock, not process CPU time

(time-ms) was implemented as clock() / CLOCKS_PER_SEC * 1000, which measures process CPU time, not wall-clock time. The (time expr) macro in core.clj and the task runner's "elapsed" reporter both built on this, so (time (thread-sleep 200)) printed "Elapsed time: 0.194 ms" instead of "~200 ms": the sleeping thread spent no CPU during the sleep, even though 200ms of wall time elapsed. Likewise, any benchmark or progress reporter built on time-ms silently undercounted by however long the process spent blocked.

Clojure's (time expr) contract is wall-clock; mino's diverged silently. Fixed by routing time-ms through mino_monotonic_ns() / 1e6, matching the same monotonic clock nano-time already exposed. The new release-gate elapsed reading jumps from "44 ms" (CPU) to "~24000 ms" (real wall-clock) — the latter is honest.
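The difference between the two clocks is easy to reproduce in plain POSIX C. The sketch below is not mino's implementation; monotonic_ns, wall_ms, cpu_ms, and sleep_ms are hypothetical helper names, and it assumes a platform that provides clock_gettime(CLOCK_MONOTONIC) and nanosleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds on a monotonic clock: keeps advancing while the thread sleeps. */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Wall-clock style reading in milliseconds. */
static double wall_ms(void)
{
    return (double)monotonic_ns() / 1e6;
}

/* CPU-time reading in milliseconds: barely moves while the process is blocked. */
static double cpu_ms(void)
{
    return (double)clock() * 1000.0 / (double)CLOCKS_PER_SEC;
}

/* Sleep for ms milliseconds, resuming if a signal interrupts the sleep. */
static void sleep_ms(long ms)
{
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    while (nanosleep(&req, &req) == -1 && errno == EINTR) {
        /* req now holds the remaining time; loop to finish it */
    }
}
```

Taking both readings around sleep_ms(50) shows wall_ms advancing by roughly 50 while cpu_ms stays near zero, the same gap that made (time (thread-sleep 200)) report a fraction of a millisecond.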

Regression in tests/math_test.clj (time-ms-fn testing block "measures wall-clock, not CPU time").

v0.255.20 — Fix: defrecord auto-binds fields in inline method bodies

Real Clojure binds each declared field as a local inside an inline protocol method body, so

```clojure
(defrecord Square [side]
  IShape
  (area [_] (* side side)))
```

works as written. mino's defrecord previously didn't wrap method bodies: bare side raised "unbound symbol: side" at compile time, forcing users to write (:side this) or destructure manually.

defrecord now transforms each spec method (mname [params] body...) by wrapping body in a (let [field (get this :field) ...] body) that introduces every declared field as a local pointing at (get this :kw). Fields whose name shadows the method's first parameter are skipped (the param wins, matching Clojure: the user's chosen first-param name is the dispatching slot regardless of what the protocol declared it).

Regression in tests/records_test.clj (defrecord-inline-method-binds-fields).

v0.255.19 — Fix: Worker leaks last_diag on uncaught throw

LSan on the gcc-built ubuntu-24.04 runners reported a 160-byte direct leak from diag_new via set_eval_diag_with_data after the v0.255.18 async test added (future (throw (ex-info ...))). The worker's per-thread ctx allocates a mino_diag_t when its body throws and the throw lands at try_depth == 0 (no enclosing try); the diag installed itself into ctx->last_diag, its cached_map got captured into impl->exception for consumer-side rethrow, and then worker_run called free(ctx) without freeing the diag itself.

The leak predates v0.255.18 — any future that threw leaked its diag — but macOS clang ASan ships without LSan, so it never showed locally. v0.255.17's CI run had no async test that threw; v0.255.18 added one and surfaced it.

Fix: worker_run now calls diag_free(ctx->last_diag) before free(ctx). The diag struct's malloc'd fields (kind/code/msg strings, notes, spans, frames) all get reclaimed; the Mino-side cached_map stays reachable from the future's impl->exception on the GC heap.

v0.255.18 — Fix: Preserve ex-info data through futures + expose `>!` / `<!`

Two async-surface canon-parity fixes surfaced by an adversarial concurrency probe pass.

ex-info data lost through futures. When a future body threw (ex-info "msg" {:k :v}), derefing in a try/catch returned an exception map with :mino/data nil — the :data payload disappeared on the way out of the worker. Two places dropped it:

Regression in tests/async_smoke_test.clj (async-future-ex-info-data-preserved).

>! and <! not :refer-able. The parking variants of clojure.core.async had no var definitions, only the blocking >!! / <!!. (:require [clojure.core.async :refer [<! >!]]) raised an unresolved-name error before the user's go block compiled, even though the go-transformer recognized the symbols by literal name inside a go body. Added stub macros that throw on direct invocation (matching real core.async's shape); the transformer intercepts them before expansion when used inside (go ...).

Regression in tests/async_smoke_test.clj (async-parking-ops-are-referable).

v0.255.17 — Fix: Disable ASan fake-stack for conservative scanner

After v0.255.16 let libasan print its full report on ubuntu-24.04 x86_64, the report named the real bug: a SEGV inside gc_scan_stack at src/gc/roots.c reading from an unmapped page near 0x7f7a65220000. Root cause is ASan's fake-stack (use-after-return detection) feature, which is default-on in gcc-built ASan binaries.

When fake-stack is active, each function's address-taken locals are relocated to a separately allocated region. &probe recorded at state init (gc_note_host_frame) and &probe taken inside gc_scan_stack end up in different fake-stack regions, with unrelated heap pages between them. The conservative scanner walks word-by-word from one &probe toward the other, hits a guard page, and SEGVs.

The no_sanitize_address attribute added in v0.255.14 only suppresses red-zone checks; it does not change where local addresses live. The correct fix is to disable fake-stack at the ASan runtime layer.

The fix defines __asan_default_options() in main.c (a weak hook libasan calls before main), which returns "detect_stack_use_after_return=0" when the binary is built with ASan. Use-after-return detection is a nice-to-have; the heap-corruption / red-zone class of bugs the release-gate ASan run actually targets is unaffected. detect_leaks remains on, so LSan continues to catch what ubuntu-24.04-arm was flagging in v0.255.15.
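A sketch of what such a hook looks like, assuming the callback name and signature that the AddressSanitizer runtime documents for supplying default options. It is shown unconditionally for brevity; per the entry above, the real definition only applies to ASan builds.

```c
/* Default-options hook read by the ASan runtime at startup.
 * The returned string uses the same syntax as the ASAN_OPTIONS
 * environment variable. In a non-ASan build nothing calls this
 * function, so defining it is harmless. */
const char *__asan_default_options(void)
{
    /* Turn off the fake stack so address-taken locals stay on the
     * real machine stack, where a conservative scanner expects them. */
    return "detect_stack_use_after_return=0";
}
```

Values set through the ASAN_OPTIONS environment variable still take precedence over this default, so a developer can re-enable use-after-return detection for a one-off run.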

macos-14 / windows-2022 / ubuntu-24.04-arm stayed green in v0.255.16; only ubuntu-24.04 x86_64 carried this layer.

v0.255.16 — Fix: Defer to libsanitizer + filter JIT note from parity diff

Two remaining CI matrix failures after v0.255.15:

Changes

Verification

v0.255.15 — Fix: Free dyn_frame on eval_try unwind path

With v0.255.14's ASan stack-scan attribute fix letting the ASan suite run to completion on Linux runners, LeakSanitizer (active by default in gcc's libsanitizer; off on macOS) flagged a real leak:

```
==2692==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 ... in malloc
    #1 ... in eval_binding src/eval/bindings.c:855
```

eval_binding heap-allocates a dyn_frame_t deliberately -- the comment at bindings.c:803 calls out that a stack-allocated frame would be read-after-pop by control.c's throw-unwind walker. The normal-completion path at line 868 calls free(frame). The throw-unwind walker in control.c::eval_try_special_form does dyn_binding_list_free(f->bindings) and unlinks the frame from dyn_stack, but forgets to free f itself -- so every binding form whose body throws leaks 16 bytes.

The BC VM's matching unwind paths (bc_done at vm.c:2103-2109 and the catch landing pad at vm.c:2057-2063) already free f correctly. control.c had the two walkers that lacked the free.

Changes

Verification

v0.255.14 — Fix: gc_scan_stack no_sanitize attribute on gcc-built ASan

With the v0.255.13 stencil-check drop letting CI's Release gate reach the ASan suite step on Linux runners for the first time, a latent gcc-only bug surfaced:

```
==2674==ERROR: AddressSanitizer: stack-buffer-overflow
READ of size 8 at 0x7f44bb31daa0 thread T0
    #0 ... in memcpy /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29
    #1 ... in gc_scan_stack src/gc/roots.c:560
    #2 ... in gc_minor_collect src/gc/minor.c:460
```

gc_scan_stack is a conservative stack scanner -- it walks every aligned machine word between gc_stack_bottom and the collector's own frame, treating each as a candidate pointer. By design it crosses every prior frame and reads ASan's poisoned red zones; the function is decorated with __attribute__((no_sanitize_address)) to suppress the false positive.

The decoration was gated on __has_feature(address_sanitizer), which is a clang-only builtin. gcc exposes ASan via the __SANITIZE_ADDRESS__ predefined macro instead. On gcc-built ASan binaries the attribute was silently dropped and libsanitizer flagged every cross-frame read in the scan loop.

The same dual-detection pattern was already in src/gc/internal.h for the MINO_GC_PIN_LOUD_ASSERT macro and is now adopted for the scanner attribute. macOS clang's ASan was permissive enough to let the scan slide without the attribute, which is why the bug went undetected for the cycle.
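A minimal sketch of a dual-detection guard in C. The SKETCH_* macro names and the count_nonzero_words function are hypothetical; only __has_feature(address_sanitizer), __SANITIZE_ADDRESS__, and the no_sanitize_address attribute come from the compilers themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* clang reports ASan through __has_feature; gcc defines __SANITIZE_ADDRESS__.
 * The nested #if keeps compilers without __has_feature from choking on it. */
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define SKETCH_ASAN_ENABLED 1
#  endif
#endif
#if !defined(SKETCH_ASAN_ENABLED) && defined(__SANITIZE_ADDRESS__)
#  define SKETCH_ASAN_ENABLED 1
#endif
#ifndef SKETCH_ASAN_ENABLED
#  define SKETCH_ASAN_ENABLED 0
#endif

/* Expands to the attribute only when ASan is actually active. */
#if SKETCH_ASAN_ENABLED
#  define SKETCH_NO_ASAN __attribute__((no_sanitize_address))
#else
#  define SKETCH_NO_ASAN
#endif

/* Stand-in for a word-by-word scanner: counts the nonzero words in
 * [lo, hi). A real conservative scanner would test each word as a
 * candidate pointer instead. */
SKETCH_NO_ASAN
static size_t count_nonzero_words(const uintptr_t *lo, const uintptr_t *hi)
{
    size_t n = 0;
    for (const uintptr_t *p = lo; p < hi; p++) {
        if (*p != 0)
            n++;
    }
    return n;
}
```

Gating on only one of the two checks is what let the attribute vanish silently on the other compiler, so the guard tests both before deciding.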

Changes

Verification

v0.255.13 — Fix: Drop byte-identity stencil checks from CI

With the v0.255.12 fanout fix landing, CI finally reached the Release gate step on the matrix runners for the first time since the gate was wired into the workflow on 2026-05-14. Two pre-existing infrastructure issues surfaced immediately:

Byte identity is a dev pre-commit hygiene check ("did you regen after editing a stencil source"), not a runtime correctness check. The runtime impact of any stale stencil is caught by the actual correctness gates: bytecode + JIT path, ASan, 4-way JIT parity. Pinning a canonical clang version across dev and every matrix host isn't tractable; the check belongs locally.

Changes

Verification

v0.255.12 — Fix: Adapt closure-capture-macro-introduced fanout to thread-limit

With the v0.255.11 pthread_join deadlock cleared, CI got past the transient-survives-gc-yield hang and reached the next latent issue: closure-capture-macro-introduced in tests/bc_closure_test.clj spawns 10 simultaneous futures unconditionally. The default host thread_limit is cpu_count, and GitHub-hosted runners are 3-4 CPUs (macos-14, ubuntu-24.04, ubuntu-24.04-arm), so the 10-way fanout trips MTH001 (thread-limit-exceeded) before the test can verify the closure-capture invariant.

The invariant under test is "N macro-introduced closures each capture their own per-iteration i" — and N is a parameter, not the property under test. Any N ≥ 2 exercises the bug fix this regression test guards. The fix adapts the fanout to (- (mino-thread-limit) 1), clamped to [2, 10], so:

Same shape as the T9 probe fix in mino-tests v0.8.1. The local suite reports 1274 tests, 4557 assertions, 0 failed on this branch.
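The clamp itself is a few lines. A sketch in C, with test_fanout as a hypothetical name for the computation the test performs in Clojure:

```c
/* Fanout used by the test: one less than the thread limit,
 * clamped to the range [2, 10]. */
static int test_fanout(int thread_limit)
{
    int n = thread_limit - 1;
    if (n < 2)  n = 2;    /* at least two closures are needed to show the capture bug */
    if (n > 10) n = 10;   /* never exceed the original 10-way fanout */
    return n;
}
```

A thread limit of 4 gives a fanout of 3, and a limit of 12 gives the original 10.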

Verification

Files

v0.255.11 — Fix: Yield state_lock Around pthread_join in Future Sweep

The actual cause of the v0.255.x CI hang, pinpointed by v0.255.10's watchdog backtrace. The trace from CI macos-14 showed:

```
[mino] fatal SIGABRT (signal 6)
[mino] gc: minor=133 major=5 ... phase=1
...
  0 crash_handler
  2 _pthread_join          <-- stuck here
  3 mino_future_gc_sweep
  4 gc_minor_collect
  5 mino_gc_collect
  6 prim_gc_bang
```

mino_future_gc_sweep calls pthread_join on a worker thread while the caller still holds state_lock. The worker thread needs state_lock to make progress through its body (it ran inside mino_call, which acquires state_lock). Result: classic acquire-while-holding deadlock -- worker is blocked on state_lock, sweep is blocked on pthread_join, neither moves.

Why it didn't surface from auto-tick GC: the legacy gc_tick_should_suppress path skips collection while thread_count > 0, so sweep never ran with live workers there. But mino_gc_collect (the public (gc!) API) explicitly does not check gc_tick_should_suppress, and transient-survives-gc-yield calls (gc!) three times mid-test, right after bc_closure_test's closure-capture-macro-introduced left worker threads in their cleanup tail. In that tail the worker has finished deliver and returned from the thunk but hasn't yet reached the thread_count-- at the bottom of worker_run, so the future's MINO_VAL is unreachable and sweep tries to free it while the worker still has work to do under state_lock.

Fix: drop state_lock around the pthread_join via the existing mino_yield_lock / mino_resume_lock pair, matching what mino_future_deref already does for its cv_wait. The worker finishes its cleanup tail unblocked, pthread_join returns immediately, sweep resumes lock and finishes destroying mu/cv and freeing the impl.

The v0.255.10 SIGABRT watchdog stays in .github/workflows/ci.yml as a permanent safety net for future hangs -- harmless when no hang fires (sleep 450 + kill never reaches a live pid). The v0.255.9 GC ordering fix is also load-bearing; both bugs were real and live at transient-survives-gc-yield. This release unblocks the CI matrix on macos-14 / ubuntu-24.04{,-arm}.

Local verification: 1274/4557 tests green; mino_asan green; release-gate 17/17 green.

v0.255.10 — Diagnostic: SIGABRT Watchdog on CI Test Hang

A diagnostic-only release that converts the remaining CI test hang from "silent SIGKILL at the 8-min cap" into "SIGABRT 30s before the cap, mino's crash_handler dumps a backtrace + GC stats". v0.255.9 fixed one real bug at transient-survives-gc-yield (use-after-free reproducible locally under ASan) but mino's CI matrix still hangs at the same test on macos-14 / ubuntu-24.04 / ubuntu-24.04-arm. Without a stack trace from the hung process we can't tell whether mino is in a tight loop, a pthread cv_wait, a GC mark-stack drain, or somewhere else.

.github/workflows/ci.yml's Test step now wraps ./mino tests/run.clj in a watchdog:

* Streams stderr to /tmp/test_trace.log (already an artifact on failure since v0.255.8).
* Backgrounds mino via (exec ./mino ...) so the subshell's $! is mino's pid directly.
* 7m30s into the run, if mino is still alive, sends SIGABRT.
* mino's existing crash_handler (main.c:711) prints [mino] fatal SIGABRT (signal 6), GC stats (minor / major / live / alloc / freed / phase / remset), and a libc-backtrace stack frame list.
* The 30s buffer before GHA's own SIGKILL gives the handler time to flush stdout/stderr and exit cleanly with code 134.

Local behaviour unchanged: a plain ./mino tests/run.clj still exits 0 in ~1.6s without the watchdog firing. The wrapper only lives in CI.

This release does not include any runtime change; v0.255.9's fix is still load-bearing for the use-after-free path.

v0.255.9 — Fix: `(gc!)` During In-Flight Major Mark Use-After-Free

Root cause of the v0.255.6 / .7 / .8 CI hang: mino_gc_collect(MINO_GC_FULL) (the (gc!) primitive) ran the nested minor BEFORE finishing the in-flight major. The minor freed a YOUNG header that was still on major's mark stack from a prior incremental slice; the subsequent gc_force_finish_major chased the freed pointer through gc_mark_child_push.

gc_tick_during_major (driver.c:141-147) already documents the correct ordering -- "finish-then-minor rather than nest" -- and honours it for the auto-tick path; the public-API path simply had the calls reversed. Auto-tick is the common case so the bug went unnoticed locally; CI's test ordering puts transient-survives-gc-yield (which explicitly calls (gc!)) inside an in-flight major on every push, surfacing the bug deterministically.

Symptom shape under each environment:

* GitHub Actions matrix (macos-14 / ubuntu-24.04{,-arm} / windows-2022): 8-min timeout in the Test step, identical hang point across all four OSes -- the v0.255.8 diagnostic trace pinned the hanging deftest to transient-survives-gc-yield.
* Local without sanitizer: intermittent SIGSEGV in tiny_free_no_lock (sweep double-free) or error[MTY001] inc expects a number (corrupted value via the reused freed header).
* Local under ASan: clean repro -- heap-use-after-free at gc_mark_child_push driver.c:414 with the freer site at gc_minor_collect minor.c:472 inside the same mino_gc_collect call.

Fix: reorder MINO_GC_FULL to finish the major BEFORE the minor, mirroring gc_tick_during_major. Behaviour for MINO_GC_MINOR and MINO_GC_MAJOR unchanged.

Regression: tests/gc_test.clj gains gc-bang-during-incremental-major -- warms up enough OLDs to start an incremental major, then calls (gc!) mid-mark in the transient shape that surfaced the bug. Fails (SIGSEGV / inc-error / hang) under the old ordering, passes under the fix. Full suite climbs from 1273 / 4555 to 1274 / 4557 (one deftest, two assertions).

Local verification:

* ./mino tests/run.clj: 1274 / 4557 green in 1.6s.
* ./mino_asan tests/run.clj: 1274 / 4557 green, ASan clean.
* ./mino task release-gate: 17/17 probes green.
* 5x consecutive ./mino /tmp/repro.clj (35 transient deftests): all green, no flakiness.

v0.255.8 — Diagnostic: Per-Deftest Trace for CI Hang Investigation

A diagnostic-only release: surfaces which test is running at the point CI hits the 8-minute timeout. clojure.test's run-tests-impl now prints [trace] <ns>/<deftest-name> to stderr before each deftest fires, gated on MINO_TEST_TRACE. Locally the trace stays off so plain ./mino tests/run.clj looks unchanged.

The hang is CI-only: all four matrix entries (macos-14, ubuntu-24.04, ubuntu-24.04-arm, windows-2022) time out at exactly the same point in the test run, but the suite passes locally (Apple Silicon, 1.6 s) and inside Docker (arm64 native + x86_64 emulation, both under cpus=3 / memory=7g resource caps that mirror GHA macos-14, both under bash -e -o pipefail). Without a local reproducer the next step has to be a focused CI data gather; this release is that.

CI workflow now sets MINO_TEST_TRACE=1, tees stderr to /tmp/test_trace.log, and uploads the trace as a test-trace-<matrix.os> artifact on failure. The last line of the trace identifies the hanging deftest; the actual fix lands in v0.255.9.

No runtime semantics changed; binary behaviour with MINO_TEST_TRACE unset is byte-identical to v0.255.7.

v0.255.7 — Portability: gcc Sanitizer Detection, POSIX strcasecmp, Empty TUs

CI on ubuntu-24.04, ubuntu-24.04-arm, and windows-2022 went red on the v0.255.6 push with three pre-existing portability bugs that landed sometime in the v0.241-v0.252 cycle but didn't surface locally:

* src/gc/internal.h:150 -- defined(__has_feature) && __has_feature(address_sanitizer) doesn't short-circuit at the preprocessor level under gcc. gcc evaluates the right side syntactically and fails with "missing binary operator before token '('". Split into nested #if defined(__has_feature) so gcc never sees the __has_feature(...) call when the macro isn't defined.

* main.c:815 -- strcasecmp is POSIX, declared in <strings.h> (not <string.h>). Add the include under #ifndef _WIN32; on Windows alias mino_strcasecmp to _stricmp. Call sites updated to use the portable name.

* src/eval/bc/jit/{helpers,emit,patcher,patcher_x86_64,stats}.c -- empty translation unit under -Werror=pedantic when MINO_CPJIT_HOST isn't defined for the build target (e.g., mingw without MINO_CPJIT_X86_64_WINDOWS). Added a sentinel typedef after each #endif to keep each TU non-empty.

* src/eval/bc/jit/entry.c:680 -- mino_jit_invoke stub had 4 parameters; header declared 5 (mino_env *env was added in the v0.219+ refactor and the stub wasn't updated). Stub signature now matches the header.
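The first three portability fixes above follow well-known patterns, sketched together below. The DEMO_ and demo_ names are hypothetical and are not the identifiers used in mino's sources; only the preprocessor structure is the point.

```c
#include <assert.h>
#include <string.h>

/* 1. Sanitizer detection: nest the __has_feature(...) call so a compiler
 *    that lacks the macro never has to parse it. */
#if defined(__SANITIZE_ADDRESS__)
#  define DEMO_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define DEMO_ASAN 1
#  endif
#endif
#ifndef DEMO_ASAN
#  define DEMO_ASAN 0
#endif

/* 2. Case-insensitive compare behind one portable name. strcasecmp lives in
 *    <strings.h> on POSIX; Windows spells it _stricmp. */
#ifdef _WIN32
#  define demo_strcasecmp _stricmp
#else
#  include <strings.h>
#  define demo_strcasecmp strcasecmp
#endif

/* 3. A translation unit whose body is compiled out still needs one
 *    declaration to stay valid under -Werror=pedantic. */
#ifdef DEMO_FEATURE_ENABLED
/* ... feature code would live here ... */
#endif
typedef int demo_tu_not_empty;

/* Small helper so the alias above is exercised. */
static int demo_same_word(const char *a, const char *b) {
    return demo_strcasecmp(a, b) == 0;
}
```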

Verified: full local build + tests pass (1273 tests / 4555 assertions, 0 failed). Pushed to trigger a fresh CI run.

v0.255.6 — Fix: BC Speculative Fold Longjmps Through Active Try-Frame

Surfaced by mino-tests v0.7.0's gen_program.clj: a defn whose body contains an (if cond then else) where the else-branch holds a compile-time-foldable error ((quot/mod/rem a 0), shift out of range, LLONG_MIN/-1) raised the error at runtime even when the cond was statically truthy and the else was unreachable. Top-level forms with the same shape worked. Reproducer:

``
(defn f [] (if (zero? 0) 43 (quot 1 0)))
(f) ; => quot: division by zero -- expected 43
``

Root cause: try_fold_call and try_fold_arg in src/eval/bc/compile.c speculatively invoke a pure prim at compile time to test whether the call can fold to a constant. The contract assumes prim_throw_classified will take the "set diag and return NULL" branch on error so the caller can clear_error and decline. But that branch only runs when try_depth == 0; with an active try-frame in the surrounding eval context (compile-on-call from inside any (try ...)), prim_throw_classified longjmps into the active frame instead. The longjmp escapes the compile, the exception lands on the user's catch, and the unreachable else surfaces as if it were the actual eval path.

Fix: save and zero try_depth around the speculative prim call in both try_fold_call and try_fold_arg. The prim sees a clean no-try context, takes the diag-return-NULL path, and the caller's existing clear_error + decline logic works as documented. The try-frame stack contents are untouched; only the depth counter is temporarily masked.
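A toy model of the masking, assuming a state with a single try-frame. Everything here (demo_state, demo_throw, demo_quot, demo_try_fold, demo_compile_in_try) is hypothetical and only mirrors the control flow described above, not the real try_fold_call.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

typedef struct {
    int         try_depth;   /* number of active try-frames           */
    jmp_buf     try_frame;   /* innermost frame (one is enough here)  */
    const char *diag;        /* pending diagnostic, NULL when clear   */
} demo_state;

/* No active frame: record a diagnostic and return NULL.
 * Active frame: unwind into it. */
static const int *demo_throw(demo_state *S, const char *msg) {
    if (S->try_depth == 0) { S->diag = msg; return NULL; }
    longjmp(S->try_frame, 1);
}

static int quot_result;
static const int *demo_quot(demo_state *S, int a, int b) {
    if (b == 0) return demo_throw(S, "quot: division by zero");
    quot_result = a / b;
    return &quot_result;
}

/* Speculative fold: returns 1 and writes *out if folded, 0 to decline. */
static int demo_try_fold(demo_state *S, int a, int b, int *out) {
    int saved = S->try_depth;
    const int *r;
    S->try_depth = 0;              /* mask the frames for the probe only */
    r = demo_quot(S, a, b);
    S->try_depth = saved;          /* the frame stack itself is untouched */
    if (r == NULL) { S->diag = NULL; return 0; }  /* clear error, decline */
    *out = *r;
    return 1;
}

/* Compiles inside an active try-frame. Returns 0 if folded, 1 if the fold
 * was declined, 2 if the error escaped into the frame (pre-fix behaviour). */
static int demo_compile_in_try(int a, int b, int *out) {
    demo_state S = {0};
    int result;
    if (setjmp(S.try_frame) != 0) return 2;
    S.try_depth = 1;
    result = demo_try_fold(&S, a, b, out) ? 0 : 1;
    S.try_depth = 0;
    return result;
}
```

Without the save-and-zero in demo_try_fold, the (1, 0) case would return 2 instead of 1: the speculative error would land in the caller's frame.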

Regression: tests/bc_let_fold_test.clj gains if-else-fold-error-stays-unreachable with 6 sub-tests covering the anchor reproducer plus mod/rem, when-with-unreachable-then, and param-driven cond paths.

v0.255.5 — Fix: BC Bitwise Fast Path No Longer Promotes to Bigint

Surfaced by mino-tests's gen.clj while building a seeded RNG: an xorshift64* step inside a defn body raised MTY001 "bit-xor expects integers" after a few iterations. The same code at the top level worked.

Root cause: binop_int_fast in src/eval/bc/vm.c routed BAND / BOR / BXOR / SHL / SHR / USHR results through tag_or_box_int, which falls back to mino_int for values outside the inline-tag range. mino_int promotes to MINO_BIGINT when the bignum capability is installed. The corresponding prims (prim_bit_shift_left, prim_bit_and, ...) all use mino_int_wrap, which always boxes as a plain MINO_INT and never promotes. The two paths produced different result types for the same operation, and a downstream bit-xor's fast-path check refused the bigint with MTY001 "bit-xor expects integers".

Fix: bitwise BC fast-path results now go through mino_int_wrap, matching the prim semantics exactly. Arithmetic (ADD / SUB / MUL) keeps tag_or_box_int since Clojure-correct overflow there DOES promote to bigint.

Regression test in tests/bc_bitwise_test.clj covers all five bitwise ops at i64 range plus the xorshift64* chain that surfaced the original issue. Suite: 1272 / 4549 green.

v0.255.4 — Hygiene: I/O Buffer Overflow + Safepoint Comment

Two small cleanups closing the rest of the runtime-audit BUGS.md backlog.

* src/prim/io.c:
  - append_byte's growth loop wraps the doubling in checked_double_sz. The pattern *cap == 0 ? 64 : *cap * 2 was the only doubling left in the file without an overflow guard.
  - file_seq_recurse's items-array growth wraps both the capacity doubling and the nc * sizeof(**items) multiplication in checked_double_sz / checked_mul_sz.
* src/runtime/internal.h: the should_yield field's comment said "Single-threaded today: nothing sets the flag" but mino_safepoint_propagate_stw in state.c does set it (and mino_safepoint_park clears it). Comment updated to describe the actual writers / readers.
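An overflow-guarded growth step can be sketched as follows. The real signatures of checked_double_sz and checked_mul_sz are not shown in this changelog, so the demo_ helpers below use an assumed "return 1 on success, 0 on overflow" convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed convention: 1 and *out written on success, 0 on overflow. */
static int demo_checked_mul_sz(size_t a, size_t b, size_t *out) {
    if (a != 0 && b > SIZE_MAX / a) return 0;   /* a * b would wrap */
    *out = a * b;
    return 1;
}

/* Doubling with the same 64-element starting capacity as the old
 * *cap == 0 ? 64 : *cap * 2 pattern. */
static int demo_checked_double_sz(size_t cap, size_t *out) {
    if (cap == 0) { *out = 64; return 1; }
    return demo_checked_mul_sz(cap, 2, out);
}

/* Both guards together: new capacity, then the byte count for that capacity. */
static int demo_grow(size_t cap, size_t elem_size,
                     size_t *new_cap, size_t *new_bytes) {
    return demo_checked_double_sz(cap, new_cap)
        && demo_checked_mul_sz(*new_cap, elem_size, new_bytes);
}
```

A caller would pass new_bytes to realloc only when demo_grow returns 1, and report an allocation failure otherwise.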

Suite: 1270 / 4544 green.

v0.255.3 — Fix: VM Arithmetic Fallback UB

BINOP_ADD / BINOP_SUB / BINOP_MUL in src/eval/bc/vm.c had a non-GCC/Clang fallback that performed direct signed integer arithmetic. Signed overflow is UB per ISO C, so the fallback was unsafe even though no major mino target ships without the __builtin_*_overflow builtins.

Fix: the fallbacks now detect overflow explicitly.

- ADD / SUB cast to unsigned long long, perform the well-defined wrap, and use the textbook sign-bit comparison (XOR-of-operands AND XOR-of-result, shifted to bit 63) to detect overflow.
- MUL pre-checks against LLONG_MAX / LLONG_MIN via division so the multiply itself never overflows; the LLONG_MIN * -1 corner case is rejected explicitly.

The GCC/Clang __builtin_*_overflow paths are unchanged. <limits.h> added to the includes for LLONG_MAX / LLONG_MIN.
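A sketch of the two techniques, assuming a 64-bit long long. The demo_ function names and the "return 1 on overflow" convention are illustrative and are not taken from vm.c.

```c
#include <assert.h>
#include <limits.h>

/* Each helper returns 1 on overflow, else 0 with the result in *out. */

static int demo_add_overflow(long long a, long long b, long long *out) {
    unsigned long long ua = (unsigned long long)a;
    unsigned long long ub = (unsigned long long)b;
    unsigned long long ur = ua + ub;              /* well-defined wrap */
    /* Overflow iff the operands share a sign and the result's sign differs. */
    if (((~(ua ^ ub)) & (ua ^ ur)) >> 63) return 1;
    *out = a + b;                                 /* proven safe above */
    return 0;
}

static int demo_sub_overflow(long long a, long long b, long long *out) {
    unsigned long long ua = (unsigned long long)a;
    unsigned long long ub = (unsigned long long)b;
    unsigned long long ur = ua - ub;              /* well-defined wrap */
    /* Overflow iff the operands differ in sign and the result's sign
     * differs from a's. */
    if (((ua ^ ub) & (ua ^ ur)) >> 63) return 1;
    *out = a - b;
    return 0;
}

static int demo_mul_overflow(long long a, long long b, long long *out) {
    if (a == 0 || b == 0) { *out = 0; return 0; }
    /* Explicit corner case: LLONG_MIN * -1 is not representable. */
    if ((a == -1 && b == LLONG_MIN) || (b == -1 && a == LLONG_MIN)) return 1;
    if (a > 0) {
        if (b > 0) { if (a > LLONG_MAX / b) return 1; }
        else       { if (b < LLONG_MIN / a) return 1; }
    } else {
        if (b > 0) { if (a < LLONG_MIN / b) return 1; }
        else       { if (a < LLONG_MAX / b) return 1; }   /* both negative */
    }
    *out = a * b;                                 /* cannot overflow now */
    return 0;
}
```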

Suite: 1270 / 4544 green.

v0.255.2 — Fix: thread_count Atomic Accesses

Surfaced by mino-tests's adv_stm_mix probe under TSan: a data race between gc_tick_should_suppress (driver.c) reading S->thread_count without a lock and worker_run (host_threads.c) decrementing it under worker_list_lock. The plain-int access was documented as a "relaxed observability counter" since the value only feeds an approximation (suppress major GC while host workers are alive), but under the C standard the unsynchronised read concurrent with a write is UB, and TSan correctly flagged the race.

Fix: route every read and write through __atomic_load_n / __atomic_fetch_add / __atomic_fetch_sub with __ATOMIC_RELAXED. The semantics are unchanged -- relaxed ordering is exactly what the "stale value is fine" comment promised -- but the C standard is no longer offended and TSan stops flagging the race. Lock-side accesses (spawn gate, exit decrement) also use atomic ops; the worker_list_lock still serializes the gate-then-increment so two spawns can't both pass the limit check.
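The access pattern looks roughly like this with the GCC/Clang __atomic builtins. The demo_state struct and the demo_ functions are hypothetical; the spawn lock that serializes gate-then-increment is assumed to be held by the caller and is not modelled.

```c
#include <assert.h>

typedef struct {
    int thread_count;   /* live host workers; every access is atomic */
    int thread_limit;
} demo_state;

/* Lock-free reader: a stale value is acceptable, a data race is not. */
static int demo_thread_count(demo_state *S) {
    return __atomic_load_n(&S->thread_count, __ATOMIC_RELAXED);
}

/* GC-side approximation: suppress major GC while workers are alive. */
static int demo_should_suppress(demo_state *S) {
    return demo_thread_count(S) > 0;
}

/* Spawn gate. Assumes the caller holds the spawn lock, so two spawns
 * cannot both pass the limit check. */
static int demo_try_spawn(demo_state *S) {
    if (demo_thread_count(S) >= S->thread_limit) return 0;
    __atomic_fetch_add(&S->thread_count, 1, __ATOMIC_RELAXED);
    return 1;
}

/* Worker exit decrement. */
static void demo_worker_exit(demo_state *S) {
    __atomic_fetch_sub(&S->thread_count, 1, __ATOMIC_RELAXED);
}
```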

Verified: TSan harness was 15/15 + 1 warning before this fix; 15/15 + 0 warnings after.

v0.255.1 — Fix: thread-sleep Yields state_lock

Surfaced by mino-tests's T9 conc_deadlock probe: a polling pattern like `` (loop [...] (when-not (realized? p) (thread-sleep 50) (recur))) `` spun forever even though the promise was eventually delivered by a sibling future.

Root cause: thread-sleep called nanosleep while holding state_lock. Worker threads spawned by (future ...) need state_lock for any mino_call -- including the body's call to deliver -- so they couldn't make progress during the sleep. The main thread observed realized? p as false at every poll because no worker had actually run yet. @p worked because mino_future_deref yields the lock before blocking on the condvar; thread-sleep didn't have an equivalent yield.

Fix: wrap the nanosleep loop in mino_yield_lock / mino_resume_lock, matching what mino_future_deref already does. Workers can now make progress during the sleep, so a sibling future's delivery is visible to the next realized? call.

Regression test in tests/async_smoke_test.clj adds async-realized-cross-thread which fails under the old behavior and passes under the fix. Suite: 1270 / 4544 green.

v0.255.0 — Fix: loop+recur Closures Inside defn Bodies

Surfaced by mino-tests's T8 closure_tco_jit probe during Cycle A: a (loop [i 0 ...] ... (recur ...)) inside a defn body returns closures that all share iteration-0's bindings. Top-level loops worked correctly; the bug lived entirely in the BC compile path.

Root cause: compile_loop emitted OP_PUSH_ENV + OP_ENV_BINDs ONCE before the recur target. Each recur then jumped back to the post-binding entry, updated the loop register values via OP_MOVE, and re-ran the body. The env frame, however, was the same frame -- and OP_ENV_BIND stores a snapshot of the register value, not a reference -- so the env's "i" -> value mapping stayed at iter 0. Every closure built in the body captured that env, so every closure returned iter-0's value.

Fix: compile_recur now emits OP_POP_ENV + OP_PUSH_ENV + re-OP_ENV_BIND of every loop binding before jumping back, when the enclosing function is env-capturing (bc->captures). Loops with no closure capture skip the work; the fused counted-loop family (OP_LOOP_INT_DEC etc.) also skips because they don't go through compile_recur. The tree-walker eval_loop path was already correct: it allocates a fresh env_child per recur iteration.
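The difference between binding once and re-binding per iteration can be modelled in a few lines. This is a simplified sketch: demo_env, demo_closure and both loop functions are hypothetical, frames are caller-provided rather than GC-allocated, and the opcode names appear only in comments.

```c
#include <assert.h>

typedef struct { long i; } demo_env;             /* one binding frame  */
typedef struct { demo_env *env; } demo_closure;  /* models (fn [] i)   */

static long demo_call(const demo_closure *c) { return c->env->i; }

/* Pre-fix shape: the frame is bound once; recur only updates the register. */
static void demo_loop_bind_once(demo_closure *out, demo_env *frames, int n) {
    long reg_i = 0;
    int k;
    frames[0].i = reg_i;            /* OP_PUSH_ENV + OP_ENV_BIND, once      */
    for (k = 0; k < n; k++) {
        out[k].env = &frames[0];    /* every closure captures that frame    */
        reg_i++;                    /* recur: OP_MOVE into the register     */
    }
}

/* Post-fix shape: a fresh frame is bound from the register per iteration. */
static void demo_loop_rebind(demo_closure *out, demo_env *frames, int n) {
    long reg_i = 0;
    int k;
    for (k = 0; k < n; k++) {
        frames[k].i = reg_i;        /* pop, push, re-bind the snapshot      */
        out[k].env = &frames[k];    /* closure captures its own frame       */
        reg_i++;
    }
}
```

With n = 3, the bind-once variant yields closures that all return 0, while the re-bind variant yields 0, 1, 2.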

loop_target_t gains captures and bind_const_idx[] so compile_recur knows which symbol names to publish without re-resolving them.

Regression test in tests/bc_closure_test.clj adds three deftests that fail under the old behavior: closure capture inside a defn body, side-effect capture inside a defn body, and nested-loop capture across outer + inner bindings. Suite: 1269 / 4542 green.

v0.254.0 — Cross-Repo Release-Gate Hook

mino's release-gate composite now chains into the mino-tests satellite suite when both repos are checked out side-by-side (../mino-tests/mino.edn detection). The chain runs mino-tests's tests/adv/runner.clj --seed 0 --mode smoke, which exercises all eleven script-side adversarial probes. A clean fresh clone of mino without mino-tests adjacent prints a "skipped satellite smoke" notice and the gate continues; no hard dependency.

The hook makes the boundary principle operationally explicit: mino's release-gate verifies language semantics + JIT parity, and when the satellite suite is available it also verifies the adversarial / E2E posture. Either repo can release independently or together.

Also bundled here: the v0.253.3 borderline-E2E audit landed ns_parity_run.clj back in mino (it depends on mino's own tests/ns_*_test.clj corpus and isn't standalone). The companion audit in mino-tests dropped creative_test.clj (zero deftests -- script-only output, belongs in mino-examples).

v0.253.3 — Test-Suite Split: Borderline E2E Audit

Fourth migration cut. Per the strict-by-category boundary principle the borderline tests move to mino-tests:

- tests/creative_test.clj -- multi-feature closure / atom combinators run as a script (no deftest). Heavy E2E.
- tests/doc_examples_test.clj (427 lines) -- documentation-matched example deftests; designed for the mino-site example matcher.
- tests/bc_jit_deopt_test.clj -- cross-runtime JIT deopt behaviour; exercises the CPJIT path under multiple binary modes.
- tests/ns_parity_run.clj -- multi-process namespace parity runner; loads upstream Clojure / ClojureScript / Babashka namespace-mechanic corpora.
- tests/spawn_stress_regression.clj -- spawn-fleet stress against the precise-GC HAMT churn surface.

Mino suite: 1268 tests / 4539 assertions.

v0.253.2 — Test-Suite Split: C-Side Embed Harnesses + Error-Message Normalization

Third migration cut moves the multi-state / STM / capability C-side embed harnesses out to mino-tests. The lightweight API-surface smoke (tests/embed_api_test.c) stays in mino — single state, sub-second, exercising the basic C API contract.

Also: the "unhandled exception" wording emitted by the top-level unprotected-throw path is normalised to "uncaught exception" project-wide, matching the wording normalize_exception in src/eval/control.c has used since the v0.252.1 commit "embed: surface raw thrown payload through _ex out_ex" introduced the new path. Previously the two emitters disagreed and tests/embed_api_test.c::test_throw_uncaught was checking the older wording; the v0.252.1 commit added a sibling test but left this one stale. Fixed inline: state.c and embed.c now emit "uncaught exception"; the existing test asserts the new wording.

Moved out of mino:

- tests/embed_multi_state.c (16 states + 16 pthreads)
- tests/embed_stm_test.c (STM Layer 2a smoke)
- tests/embed_caps_test.c (capability install matrix)

mino's test-embed task now compiles + runs only embed_api_test.c (renamed in spirit to "embed-api smoke"; the full battery moved).

v0.253.1 — Test-Suite Split: Fuzz / GC Stress / Fault Injection

Second migration cut. Fuzz, GC stress, fault-injection, and the HAMT churn regression move out of mino's tests/ into the mino-tests sibling repo.

Moved out of mino:

- tests/reader_fuzz_test.clj
- tests/gc_generational_test.clj
- tests/gc_incremental_test.clj (basic GC tests in tests/gc_test.clj stay here)
- tests/regression_hamt_str_churn.clj
- tests/fault_inject_test.clj, tests/fault_inject_runner.clj
- tests/gc_stress_runner.clj

Tasks removed from mino's task table:

- test-fault-inject (runner now in mino-tests)
- test-gc-stress (runner now in mino-tests)

CI nightly workflow simplified: the GC-stress and fault-injection job steps move to mino-tests's own nightly; mino's nightly keeps release-gate + embed-stress only.

Mino suite shrank to 1367 tests / 4732 assertions; mino-tests migrated suite grew to 384 / 3204.

v0.253.0 — Test-Suite Split: Concurrency-Heavy Migration

Aggressive migration kickoff. The concurrency-heavy and async-soak tests move out of mino's tests/ and into the new sibling repo mino-tests (see the satellite suite). The split's boundary principle: mino keeps tests of language semantics (one primitive or special form at a time, sub-second, single-runtime); mino-tests holds anything multi-runtime, concurrency-heavy, fuzz, soak, sanitizer-trinity, coverage, or adversarial.

Moved out of mino:

- tests/stm_concurrent_test.clj (multi-thread STM contention)
- tests/host_threads_test.clj (host-thread budget + lifecycle)
- tests/agent_test.clj (511 lines of agent fan-out)
- tests/regex_reentrant_test.clj (multi-thread regex)
- tests/async_alts_test.clj, async_api_test.clj, async_blocking_test.clj, async_buffer_test.clj, async_combinators_test.clj, async_conformance_test.clj (1056 lines), async_go_test.clj, async_mult_pub_test.clj, async_timer_test.clj

Kept in mino as a minimal three-shape conformance:

- tests/async_smoke_test.clj — one go, one alts!!, one promise. Single deftest each; sub-second.

Suite shrank from 1743 tests / 7929 assertions to 1433 / 7395; the migrated set runs green at 313 tests / 538 assertions in mino-tests.

v0.252.3 — Closure Capture Across Self-Tail-Call

A third adversarial whitebox pass — this one building real multi-CPU concurrency workloads to tease out races, deadlocks, and pathological slowdowns — caught a deep, pre-existing Clojure-semantics divergence: self-tail-call reused the same local env_child and bind_params mutated param slots in place, so a closure built in iteration N silently observed iteration N+1's param values once the next recur landed. Every common closure-over-iteration pattern was affected:

- (loop [i 0] (recur ...) + (fn [] i))
- (dotimes [i N] (fn [] i))
- (defn G [i] ... (G (inc i))) self-recursion
- (for [i ...] (fn [] i)) (expanded form)
- Any macro that expands to (fn [] ...) -- future, delay, lazy-seq, for, doseq -- when used inside a recur body

The original surface that caught this in pass 3 was W10:

``
(def ps (vec (repeatedly 10 promise)))
(dotimes [i 10] (future (deliver (nth ps i) ...)))
(mapv deref ps)
``

which hung indefinitely (every future captured the post-loop i = 10, so every (nth ps 10) threw out-of-bounds and no promise got delivered) and, under --jit=on, SIGSEGV'd in mino_jit_getglobal_cached_slow while the threads dereffed corrupt state.

The fix is in two paths:

- apply_callable's self-tail-call trampoline (src/eval/fn.c) for both the (recur ...) and named-self-call cases now allocates a fresh env_child on each iteration instead of reusing local and mutating param slots in place.
- eval_loop's recur trampoline (src/eval/bindings.c) does the same on every recur rebind.

Source-level "does the body have closures?" analysis would miss the macro-introduced closures (a future body isn't a fn form until after macroexpansion), so the allocation is unconditional on these paths. The BC compiler's fused counted-loop family (OP_LOOP_INT_DEC, OP_LOOP_INT_LT_INC, …) recognises the tight-loop shapes ahead of this code, so the cost is contained to general self-recursion: one env_child allocation per iteration, ≈640 ns on a tight (loop [i 100000] ...).

Why the existing test suite didn't catch this: tests/bc_closure_test.clj covered closures over fn params *across separate invocations* ((let [c1 (mk-counter 100) c2 (mk-counter 200)] ...)) but had no test for closures captured *during a single invocation that self-recurses*. The new regression block (six deftests, nine assertions) closes that gap:

- closure-capture-named-self-tail-call -- (defn G [i] ... (G (inc i)))
- closure-capture-loop-recur -- both value-collecting and side-effect-via-atom shapes
- closure-capture-dotimes -- the most common surface in real code
- closure-capture-for-comprehension
- closure-capture-macro-introduced -- future/delay/W10 promise fan-out, the cases that source-level scanning would miss
- closure-capture-multi-arity-self-call -- covers the multi-arity branch of apply_callable's recur trampoline

Full suite: 1743 tests / 7929 assertions / 0 failed (was 1737/7920 pre-fix — +6 tests, +9 assertions). 4-way parity byte-identical on stdout; mino-lean + JIT-on/off/auto all match.

v0.252.2 — Runtime Hardening from Cycle-I Whitebox Pass 2

A second adversarial whitebox pass targeted the runtime internals (stencil patcher boundaries, GC × JIT interaction under allocation pressure, multi-state lifecycle, reader hostile inputs, host-thread + JIT concurrency) instead of the surface UX the previous pass covered. Two pre-existing defects turned up, both reachable from ordinary script input and both able to crash the embedder process. Neither is JIT-architecture related; both are fixed here so a hostile or merely-deep script can no longer SIGSEGV / SIGABRT the host.

- Reader recursion depth is now bounded at 1024 levels with a clean MRE011 "nesting too deep" diagnostic. Without the bound, a pathological ((((... input around 30,000 nested levels exhausts the default 8 MiB main-thread stack and SIGSEGVs the embedder with no message. The check fires in read_form (the universal reader entry) so lists, vectors, maps, sets, and reader macros all share one balanced enter/exit pair. Legitimate input never gets close to 1024 — even macro-heavy code peaks around 50; pathological hand-written data peaks around a few hundred. The new state field is appended to the end of struct mino_state so the JIT's pinned offsets in src/eval/bc/stencils/runtime_layout.h stay byte-stable.

- gc_pin / gc_unpin overflow asserts are now gated on sanitizer-build detection rather than firing in every -O2 build. The macros' header comment already documented the intent ("debug / sanitizer builds loud; release builds keep the documented soft-loss path") but the standard build didn't define NDEBUG, so the asserts fired anyway. A script with ~60+ case clauses expands to a deeply-nested cond chain; each level pins fn through eval_apply_regular_call, blows past GC_SAVE_MAX = 64, and used to abort the process at special.c:785. The fix detects __SANITIZE_ADDRESS__, __SANITIZE_THREAD__, __SANITIZE_UNDEFINED__, and clang's __has_feature(address_sanitizer). Sanitizer builds keep the loud assert (which is the regression-detector for real liveness drift); release-grade builds keep the soft-loss path the conservative C-stack scanner already covers.
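The reader depth bound from the first item above follows the usual recursive-descent pattern: one check at the universal entry point and a balanced enter/exit pair. The sketch below is a toy reader over parentheses and single-character atoms; demo_reader, demo_read_form and demo_read_nested are hypothetical and unrelated to mino's actual read_form.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_DEPTH 1024

typedef struct {
    const char *p;     /* cursor into the input                 */
    int         depth; /* current nesting level                 */
    const char *err;   /* diagnostic, NULL while reading is ok  */
} demo_reader;

/* Reads one form: a "(...)" list or a one-character atom. Returns 1 on
 * success, 0 with r->err set on failure. */
static int demo_read_form(demo_reader *r) {
    int ok = 1;
    if (r->depth >= DEMO_MAX_DEPTH) { r->err = "nesting too deep"; return 0; }
    r->depth++;                                   /* enter */
    if (*r->p == '(') {
        r->p++;
        while (ok && *r->p != ')' && *r->p != '\0') ok = demo_read_form(r);
        if (ok && *r->p == ')') r->p++;
        else if (ok) { r->err = "unexpected EOF"; ok = 0; }
    } else if (*r->p != '\0' && *r->p != ')') {
        r->p++;                                   /* one-character atom */
    } else {
        r->err = "unexpected token";
        ok = 0;
    }
    r->depth--;                                   /* exit, on every path */
    return ok;
}

/* Builds n opening and n closing parens, reads them, reports the result. */
static int demo_read_nested(int n, const char **err) {
    static char buf[2 * 4096 + 1];
    demo_reader r;
    int i, ok;
    if (n < 1 || n > 4096) return 0;
    for (i = 0; i < n; i++) { buf[i] = '('; buf[n + i] = ')'; }
    buf[2 * n] = '\0';
    r.p = buf; r.depth = 0; r.err = NULL;
    ok = demo_read_form(&r);
    *err = r.err;
    return ok;
}
```

Because the check sits in the single entry function, every nested shape shares it, and the failure is a diagnostic rather than a stack overflow.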

Release-gate green; 4-way parity byte-identical on stdout; ASan build still asserts loudly on the case repro (verified); 1737 tests / 7920 assertions / 0 failed.

v0.252.1 — Developer UX Fixes from Cycle-I Whitebox Pass

Adversarial whitebox testing on the v0.252.0 surface (JIT CLI, mino-lean dual-binary, REPL workflow, error-reporting UX) turned up seven defects that degrade the developer experience without being JIT-architecture regressions. All are fixed in this patch; none change observable JIT behavior or the embed API.

Diagnostic accuracy:

- reader_col is now reset alongside reader_line at the start of every mino_eval_string / mino_load_file call. Without the reset, the terminal column of the previously loaded source (typically the bundled core.mino) leaked into the next load, so every line-1 error reported a wildly off column. (/ 1 0) on its own line went from <file>:1:87 (caret 86 spaces past the source) back to <file>:1:1 with the caret under the (.
- File-mode (script and -e) error reporting now preserves the inner exception's kind, code, and message instead of degrading every runtime error to a generic MCT001 unhandled exception. The map-unwrap path mirrors the REPL handler and routes through normalize_exception so an ex-info value ({:message ... :data ...}) reports error[MUS001]: <message> rather than swallowing the payload. Both mino script.clj and mino -e EXPR paths now agree with the REPL on what a thrown value looks like.
- REPL multi-line snippets now render the actual user input. The REPL keeps an append-only session-history buffer and publishes that to the source cache, instead of just the bytes still pending parse. Errors past the first form previously rendered as 3 | (blank line); they now show the real line the user typed.

JIT CLI / dual-binary UX:

- --jit= is now case-insensitive, matching the MINO_JIT env var. --jit=ON, --jit=Auto, --jit=Off all parse; the old behavior accepted MINO_JIT=ON but rejected --jit=ON, which was the worst-of-both kind of asymmetry.
- mino-lean (or any build with the JIT compiled out) now emits a one-line stderr note when --jit=on or MINO_JIT=on is requested explicitly: "this build has the JIT compiled out; --jit=on / MINO_JIT=on has no effect, interpreter will run". Silently swallowing user intent was the worst-of-both for the dual-binary design. --jit=auto and --jit=off stay silent because they're compatible with a no-JIT build.
- mino-lean --help now annotates the --jit= and --jit-threshold= flags as "Accepted for parity; this build has the JIT compiled out" instead of advertising them with their full-binary descriptions.
- mino-lean --version now reports mino-lean 0.252.0 (no-jit) instead of mino 0.252.0, so install audits and bug reports can tell the two binaries apart at a glance.
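A case-insensitive mode parser shared by the flag and the env var could look like the sketch below. The enum and both functions are hypothetical; mino's actual parser and mode constants are not shown in this changelog.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum {
    DEMO_JIT_AUTO,
    DEMO_JIT_ON,
    DEMO_JIT_OFF,
    DEMO_JIT_INVALID
} demo_jit_mode;

/* ASCII case-insensitive equality without relying on strcasecmp. */
static int demo_ieq(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* One parser for both --jit=VALUE and MINO_JIT=VALUE keeps them symmetric. */
static demo_jit_mode demo_parse_jit(const char *value) {
    if (value == NULL)           return DEMO_JIT_INVALID;
    if (demo_ieq(value, "auto")) return DEMO_JIT_AUTO;
    if (demo_ieq(value, "on"))   return DEMO_JIT_ON;
    if (demo_ieq(value, "off"))  return DEMO_JIT_OFF;
    return DEMO_JIT_INVALID;
}
```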

The 4-way parity (jit-auto / jit-on / jit-off / lean) stays byte-identical on stdout. The new mino-lean note goes to stderr and only fires when the user asked for --jit=on or set MINO_JIT=on, so the release-gate parity sweep continues to pass unchanged. mino task test is green at 1737 tests / 7920 assertions / 0 failed; mino task release-gate is green; the stencil-extract selftest is green.

v0.252.0 — JIT Feature-Complete Declaration + Cycle I Close

Closes Cycle I. The CPJIT layer is feature-complete on the dev host. The companion mino-site repo gains a new Internals page, JIT status, that carries the verification checklist (seven local boxes ticked at v0.252; three deferred boxes pending the first push to upstream), documents the runtime-control surface (mino_state_set_jit_mode, mino_state_set_jit_hot_threshold, mino_state_jit_capability, the --jit=auto|off|on and --jit-threshold=N CLI flags, and the MINO_JIT / MINO_JIT_HOT_THRESHOLD env vars), and lists the by-design omissions (no type-feedback specialization, no deoptimization, no adaptive stencil expansion).

What "feature-complete" means here:

- 39 stencils registered across five host arches with on-disk byte tables in src/eval/bc/stencils/generated/.
- Dual-binary build: full mino (CPJIT-enabled, AUTO default) + mino-lean (CPJIT compiled out, smaller footprint).
- Four-way parity green on the dev host: stdout byte-identical across MINO_JIT=auto/on/off plus mino-lean.
- Synthetic-blob selftests pass for Mach-O / ELF / COFF via tools/stencil-extract --selftest.
- Embed API stable since v0.240; no breaking changes through cycle close.

What "feature-complete" does not mean: a freeze. Bug fixes, portability fixes, and perf-neutral cleanup remain in scope. New stencils land in a separate opcode-expansion cycle.

Perf thresholds for the runtime-perf track that follows are seeded developer-side (regression ceiling +/- 10%, gain ratchet >= 10%, +/- 7% single-run noise envelope, median-of-3 discipline for any number that enters a CHANGELOG perf table); see the JIT status page on mino-site for the headline numbers.
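One way to read these thresholds in code, under stated assumptions: speedup is taken as before / after on median-of-3 timings, the gain ratchet as speedup >= 1.10, and the noise envelope as a speedup within 0.93 to 1.07. The helper names are hypothetical, and the exact definitions used by the perf track may differ.

```c
#include <assert.h>

/* Median of three samples: max(min(a, b), min(max(a, b), c)). */
static double demo_median3(double a, double b, double c) {
    if (a > b) { double t = a; a = b; b = t; }   /* now a <= b            */
    if (b > c) { b = c; }                        /* b = min(old b, c)     */
    return a > b ? a : b;
}

/* Speedup of "after" relative to "before"; above 1.0 means faster. */
static double demo_speedup(double before_ms, double after_ms) {
    return before_ms / after_ms;
}

/* Assumed reading of the +/- 7% single-run noise envelope. */
static int demo_within_noise(double speedup) {
    return speedup >= 0.93 && speedup <= 1.07;
}

/* Assumed reading of the >= 10% gain ratchet. */
static int demo_clears_ratchet(double speedup) {
    return speedup >= 1.10;
}
```

For instance, three runs of 7.9, 7.8 and 8.1 ms enter a table as 7.9 ms, and a 100 ms to 98 ms change stays inside the noise envelope rather than counting as a gain.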

No mino source changes beyond the version bump.

v0.251.0 — JIT Portability Matrix + On/Off A/B Evidence

Cycle I opens with documentation evidence rather than source changes. The companion mino-site repo gains a new Internals page, JIT support matrix, that records the verification posture for each of the five CPJIT host pipelines and the median-of-three on/off A/B numbers captured against mino-bench/benchmarks/realistic_bench.clj on the dev host.

The A/B confirms the v0.250 nursery-bump gains were JIT-architecture independent: five of six realistic_bench rows sit within the +/- 7% noise envelope when toggling MINO_JIT=on vs MINO_JIT=off, while fibonacci(25) (pure compute, near-zero allocation) shows a 1.37x JIT win and the fused transducer pipeline shows a 1.06x JIT win. Allocation- and GC-dominated workloads are not where the JIT lives; the runtime-perf track that follows feature-complete is correctly framed as allocation- and dispatch-dominated rather than stencil-substrate-dominated.

No source changes beyond the version bump.

v0.250.0 — Default Nursery 1 MiB -> 4 MiB + Cycle G Close

Closes cycle G with one substantive change. The default GC nursery size bumps from 1 MiB to 4 MiB. Allocation-heavy realistic workloads spend 35-43% of wall time in minor-GC at 1 MiB; bumping to 4 MiB cuts total GC time by ~35-60% across the board without raising worst-case minor-GC pause -- the larger nursery means each minor pass collects more bytes per cycle, lowering per-byte overhead.

realistic_bench (before / after / speedup)

| Row                              | 1 MiB (v0.249) | 4 MiB (v0.250) | speedup |
|----------------------------------|---------------:|---------------:|--------:|
| build 5k int-map and sum         |       12.72 ms |       11.18 ms |   1.14x |
| bump 5k int-map values           |       22.92 ms |       16.12 ms |   1.42x |
| map/filter/map/reduce over 50k   |        0.77 ms |        0.68 ms |   1.13x |
| nested vectors 500x100           |       23.00 ms |       16.92 ms |   1.36x |
| realize 10k of lazy range        |        7.82 ms |        5.79 ms |   1.35x |
| fibonacci(25)                    |        7.83 ms |        6.43 ms |   1.22x |

GC-time share on the four allocating rows fell from 35-43% to 20-32%. Worst-case minor-GC pause dropped from ~3.5 ms to ~2.0 ms.

Trade-off

Each VM state now holds 3 extra MiB of young-gen residency before the first major GC. Embedders with tighter memory budgets override via the MINO_GC_NURSERY_BYTES env var or mino_gc_set(state, MINO_GC_NURSERY_BYTES, n). The comment in src/runtime/state.c documents the rationale.

Cycle G closure

The plan called for v0.250 -- v0.254 as five candidate releases, each gated on a >= 1.05x per-release ratchet, with a cycle gate of >= 1.10x on at least one row before the v0.255 close. The single v0.250 change cleared the cycle gate on every row (6/6), with a 1.42x peak. Per [[release-per-cohesive-piece]] and [[no-fakery]], the additional placeholder releases collapsed into this one; the cycle closes at v0.250.
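The gate arithmetic can be checked directly from the table. The following is a minimal sketch, not the project's tooling: the struct, the array, and the function names are hypothetical, and the only inputs are the before / after means copied from the realistic_bench table in this entry.

```c
#include <assert.h>
#include <stddef.h>

/* One realistic_bench row: mean wall time before and after, in ms.
 * Hypothetical type for illustration; not part of mino. */
typedef struct {
    const char *name;
    double before_ms;   /* 1 MiB nursery (v0.249) */
    double after_ms;    /* 4 MiB nursery (v0.250) */
} bench_row_t;

/* Values copied from the realistic_bench table in this entry. */
const bench_row_t realistic_bench_rows[6] = {
    {"build 5k int-map and sum",       12.72, 11.18},
    {"bump 5k int-map values",         22.92, 16.12},
    {"map/filter/map/reduce over 50k",  0.77,  0.68},
    {"nested vectors 500x100",         23.00, 16.92},
    {"realize 10k of lazy range",       7.82,  5.79},
    {"fibonacci(25)",                   7.83,  6.43},
};

/* Speedup is before / after; a value above 1.0 means the new build is faster. */
double row_speedup(const bench_row_t *row) {
    assert(row != NULL && row->after_ms > 0.0);
    return row->before_ms / row->after_ms;
}

/* Count the rows whose speedup meets `gate` and report the best speedup seen. */
size_t rows_clearing_gate(const bench_row_t *rows, size_t n,
                          double gate, double *peak_out) {
    size_t cleared = 0;
    double peak = 0.0;
    size_t i;
    for (i = 0; i < n; i++) {
        double s = row_speedup(&rows[i]);
        if (s >= gate) cleared++;
        if (s > peak) peak = s;
    }
    if (peak_out != NULL) *peak_out = peak;
    return cleared;
}
```

Calling rows_clearing_gate(realistic_bench_rows, 6, 1.10, &peak) on these numbers counts all six rows, with the peak coming from the "bump 5k int-map values" row (22.92 / 16.12).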

Levers parked for a future perf cycle:
- PHM-32 lookup constants
- Vector slice without copy
- Sorted-coll node-fan tuning
- Multi-arity dispatch cache

- src/runtime/state.c: default gc_nursery_bytes bumped 1 MiB -> 4 MiB. Inline comment captures the rationale and the override path.

release-gate green: 1737 tests / 7919 assertions, ASan clean, 4-way JIT parity stdout byte-identical.

v0.249.0 — Perf Cycle G: Measurement Baseline

Opens cycle G. This release is measurement-only: a fixed baseline on realistic_bench so the v0.250 -- v0.254 candidate releases have a stable reference to ratchet against.

Baseline captured on the dev host (ARM64 Darwin, macOS 24.6.0):

| Row                              |       mean | alloc/op |  gc% |
|----------------------------------|-----------:|---------:|-----:|
| build 5k int-map and sum         |   12.72 ms |  7.27 MB | 35.3 |
| bump 5k int-map values           |   22.92 ms | 13.00 MB | 37.5 |
| map/filter/map/reduce over 50k   |    0.77 ms |  0.01 MB |  0.0 |
| nested vectors 500x100           |   23.00 ms | 20.24 MB | 36.3 |
| realize 10k of lazy range        |    7.82 ms |  3.96 MB | 42.5 |
| fibonacci(25)                    |    7.83 ms |  0.00 MB |  0.0 |

Notes:
- Four of six rows are 35-43% GC time. Allocation is the dominant cost.
- The pipeline row (774 us) shows the [[clojure-aware-perf-cycle]] transducer-fusion win is already in place; no further win available there.
- fib-25 is mutator-bound (no allocation); no obvious lever short of broader stencil expansion, which this cycle's gate explicitly rules out.

Candidate selection (one per axis, per Cortex's framing in [[next-cycle-cortex-feedback]]):

- Allocation: bump 5k int-map values (nursery sizing)
- Data-struct: nested vectors 500x100 (small-int cache reach)
- Dispatch: build 5k int-map (int-key assoc fast path)

Cycle gate: one realistic_bench row >= 1.10x cumulative before v0.255.0 close.

- .local/perf-cycle-baseline.md: captured numbers.
- .local/perf-cycle-candidates.md: per-axis lever notes and sequencing.

No code change in this release; baseline parity vs. v0.248.0 is the success criterion.

v0.248.0 — Nightly Matrix Workflow + Cycle F Close

Closes cycle F. Adds .github/workflows/ci-nightly.yml: a cron-scheduled (04:00 UTC daily) workflow that runs the full extended-suite battery on every supported Linux + Darwin host:

- release-gate (re-run, since nightly is self-contained)
- test-gc-stress (GC stability under stress collection)
- test-fault-inject (simulated OOM fault-injected paths)
- test-embed (C embedding stress with 16 states / 16 pthreads)

Same host set as the PR-time matrix (ubuntu-24.04, ubuntu-24.04-arm, macos-14). Windows is skipped for the same reason it skips PR-time release-gate -- ASan needs a libsanitizer mingw doesn't ship.

PR-time CI keeps to the smoke set + release-gate (push latency stays low); the nightly catches regressions in the stress-test paths that aren't worth running on every PR.

Cycle F summary (v0.245.0 -- v0.248.0):

- Two Dockerfiles + ci-matrix task (local Linux mirror of the GHA matrix).
- GHA matrix pinned to four host labels (ubuntu-24.04, ubuntu-24.04-arm, macos-14, windows-2022) with a release-gate step on every non-Windows entry.
- cross-compile job on macos-14 covering every gen-stencils-<arch>-<os> target via header-byte parity -- the verification floor for x86_64 Darwin until GHA re-introduces Intel Mac runners.
- Nightly extended-suite workflow on Linux + Darwin.

The "5 host paths code-complete" line is no longer aspirational: every push exercises the runtime patcher on ARM64 Linux, x86_64 Linux, ARM64 Darwin, and x86_64 Windows; every push also re-verifies the cross-compiled bytes for x86_64 Darwin; nightly batteries cover the stress paths.

release-gate green locally; nightly workflow activates on the next 04:00 UTC tick or via workflow_dispatch.

v0.247.0 — Cross-Compile Smoke Job + x86_64 Darwin Posture

Adds a cross-compile GHA job on macos-14 that runs every gen-stencils-<arch>-<os> task and asserts the freshly-generated stencil headers match the committed bytes. The job catches toolchain drift on every supported target without needing an end-to-end runtime on each:

- gen-stencils-arm64-linux
- gen-stencils-x86-64-linux
- gen-stencils-x86-64-darwin
- gen-stencils-x86-64-windows

A git diff --exit-code on src/eval/bc/stencils/generated/ gates the job. If a clang point release ever changes codegen for any target, the diff is the diagnostic.

x86_64 Darwin posture. Apple has been retiring Intel Mac GHA runners; Docker on Apple Silicon cannot run Darwin/x86_64 either. The cross-compile job covers x86_64 Darwin via header-generation parity (the parser confirms bytes; the runtime patcher confirms patch arithmetic in --selftest), which is the verification floor for that target until either:

- GHA re-introduces an Intel Mac runner tier, or
- the project self-hosts an Intel Mac mini, or
- the target is dropped from the supported list.

The synthetic-blob selftests from v0.244.0 plus this cross-compile job together give x86_64 Darwin credible parser / toolchain coverage without an end-to-end runtime gate.

- .github/workflows/ci.yml: new cross-compile job on macos-14; runs after build succeeds.

release-gate green locally; new cross-compile job activates on push.

v0.246.0 — GHA Matrix Extension + Release-Gate Step

Extends the GitHub Actions CI matrix to cover every host arch with a committed stencil header, then adds a release-gate step that exercises the cpjit substrate end-to-end on each.

Matrix shape (was: ubuntu-latest / macos-latest / windows-latest):

- ubuntu-24.04 -- x86_64 Linux native
- ubuntu-24.04-arm -- ARM64 Linux native (GHA added this tier in 2024)
- macos-14 -- ARM64 Darwin native (Apple Silicon)
- windows-2022 -- x86_64 Windows native

Runner labels are pinned so a future GHA image bump doesn't silently shift the matrix.

A new Release gate step runs ./mino task release-gate on every non-Windows entry. The gate covers check-reloc-mirror, check-stencil-registry, check-stencils-fresh, the test suite, the ASan-built suite, and 4-way JIT parity (auto / on / off / lean). Windows skips the gate because the ASan step needs a libsanitizer mingw doesn't ship; the matrix still builds and runs the smoke test on the Windows runner.

This is the moment the "5 host paths code-complete" line stops being aspirational: every push now exercises the runtime patcher, the stencil byte tables, and the dual-binary build pipeline on each host the cpjit cycle wired up.

- .github/workflows/ci.yml: matrix entries pinned; release-gate step added; windows-latest -> windows-2022 propagated through CC selection and continue-on-error gate.

release-gate green locally; CI matrix activates on push.

v0.245.0 — Docker Images + ci-matrix Task

Opens cycle F. Adds the local CI scaffolding that makes the "5 host paths code-complete" claim continuously verifiable instead of aspirational.

Two new Dockerfiles under docker/ build minimal Ubuntu 24.04 images carrying just the toolchain mino task release-gate needs (gcc + make + libc6 dev headers):

- docker/arm64-linux.Dockerfile (arm64v8/ubuntu:24.04)
- docker/x86_64-linux.Dockerfile (amd64/ubuntu:24.04)

A new mino task ci-matrix driver builds each image (cached), bind-mounts the repo as a read-write volume, runs make && ./mino task release-gate inside, and aggregates pass / fail per target. Failure prints the last 60 lines of output for the failed target so the diagnostic surfaces without trawling through Docker layer logs.

On an Apple Silicon dev host:
- linux/arm64 runs natively via the macOS Virtualization framework.
- linux/amd64 runs via Rosetta 2 / qemu (slower but functional).

x86_64 Windows is not in this matrix: Windows containers need a Windows host. The GHA matrix in .github/workflows/ci.yml covers that target via the windows-2022 runner; the local ci-matrix driver mirrors only the Linux pair.

- docker/arm64-linux.Dockerfile: new.
- docker/x86_64-linux.Dockerfile: new.
- lib/mino/tasks/builtin.clj: new ci-matrix-targets table and ci-matrix driver.
- mino.edn: registers the ci-matrix task.

The GHA matrix extension lands in v0.246.0; the cross-compile smoke job in v0.247.0; the nightly workflow + cycle close in v0.248.0.

release-gate green: 1737 tests / 7919 assertions, ASan clean, 4-way JIT parity stdout byte-identical.

v0.244.0 — Extractor Carve-Out: coff + Synthetic-Blob Selftests + Cycle E Close

Closes cycle E. Two changes in this release:

1. Final extractor carve-out: coff module. Lifts the PE/COFF amd64 parser into its own module: file-header + section typedefs, IMAGE_REL_AMD64_* reloc constants, storage-class constants, the byte-packed-symtab accessors, parser entry points (coff_open, coff_list_symbols, coff_find_symbol), the x86_64 reloc-kind map (coff_reloc_x86_64_kind_map), and the coff_emit_stencil_header entry. tools/stencil_extract.c now sits at ~440 lines: format-magic sniff, main(), and the aggregate selftest dispatcher.

2. Per-format synthetic-blob unit tests. New tools/stencil_extract/selftest.{h,c} builds tiny in-memory .o-style buffers (Mach-O / ELF / COFF) by hand, parses each via its public format API, and asserts symbol lookup + body size + reloc-bound checks match the known-good values encoded into the blob. These tests catch parser regressions independent of compiling the project's own .c files into .o: a struct-layout drift on a new compiler, a missing reloc-kind map entry, or an off-by-one in a per-format extract loop now surfaces from --selftest without needing a full gen-stencils pass.

The aggregate selftest() in tools/stencil_extract.c calls each per-format synthetic test in turn; each prints OK on success, accumulates a fail count on miss.
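To illustrate the synthetic-blob idea, here is a minimal sketch that hand-builds a 64-byte ELF64 header in memory and sniffs it back. It is not the code in tools/stencil_extract/selftest.c: every function and constant name below is hypothetical, and only the ELF64 header layout (magic, class and data bytes, e_type at offset 16, e_machine at offset 18) is taken from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* e_machine values from the ELF specification; the SK_ prefix marks
 * names that exist only in this sketch. */
enum { SK_EM_X86_64 = 62, SK_EM_AARCH64 = 183 };

#define SK_ELF64_EHDR_SIZE 64

/* Little-endian 16-bit store and load on raw bytes. */
void sk_put_u16le(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

uint16_t sk_get_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Fill `buf` with a minimal ELF64 little-endian relocatable-object header.
 * All other header fields are left as zero. */
void sk_build_elf_header(uint8_t buf[SK_ELF64_EHDR_SIZE], uint16_t machine) {
    assert(buf != NULL);
    memset(buf, 0, SK_ELF64_EHDR_SIZE);
    buf[0] = 0x7f; buf[1] = 'E'; buf[2] = 'L'; buf[3] = 'F';
    buf[4] = 2;                      /* ELFCLASS64 */
    buf[5] = 1;                      /* ELFDATA2LSB */
    buf[6] = 1;                      /* EV_CURRENT */
    sk_put_u16le(buf + 16, 1);       /* e_type = ET_REL */
    sk_put_u16le(buf + 18, machine); /* e_machine */
}

/* Return e_machine for an ELF64 little-endian blob, or -1 when the buffer
 * is too short or does not carry the expected magic / class / data bytes. */
int sk_sniff_elf64_machine(const uint8_t *buf, size_t len) {
    if (buf == NULL || len < SK_ELF64_EHDR_SIZE) return -1;
    if (memcmp(buf, "\x7f" "ELF", 4) != 0) return -1;
    if (buf[4] != 2 || buf[5] != 1) return -1;
    return sk_get_u16le(buf + 18);
}
```

A selftest in this style builds the blob with a known machine value, asserts the parser reports it, then corrupts one byte and asserts the parser rejects the result. The known-good value lives next to the blob, so no compiler has to produce a real .o file.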

Cycle E summary (v0.241.0 -- v0.244.0):

- tools/stencil_extract.c: 1833 lines -> ~440 lines.
- 6 new files under tools/stencil_extract/: core.{h,c}, macho.{h,c}, elf.{h,c}, coff.{h,c}, selftest.{h,c}.
- Binary renamed tools/stencil_extract -> tools/stencil-extract (hyphen, matching the mino-lean convention) to free the directory name for the source modules.
- lib/mino/tasks/builtin.clj: build-stencil-extract takes a source-file list; adding a new format / new architecture is a localised change instead of 1700-line surgery.

The generated stencil headers regenerate byte-identical across every carve-out -- a parser refactor that changes any byte output would be a bug, not a feature.

release-gate green: 1737 tests / 7919 assertions, ASan clean, 4-way JIT parity stdout byte-identical, synthetic-blob selftests green on all three formats.

v0.243.0 — Extractor Carve-Out: elf Module

Third extractor carve-out. Lifts the ELF64 parser into its own module: header types (elf64_ehdr_t, elf64_shdr_t, elf64_sym_t, elf64_rela_t), section / class / symbol-info constants, AArch64 + x86_64 reloc-kind constants, the parser entry points (elf_open, elf_list_symbols, elf_find_symbol), the AArch64 + x86_64 reloc-kind maps, and the elf_emit_stencil_header entry. The r_info bit-field accessors stay as static inline in the header so callers across the module boundary use the same decode path.

The kind-map functions take an elf_ prefix (elf_reloc_arm64_kind_map, elf_reloc_x86_64_kind_map) to mirror the Mach-O module's naming convention.

- tools/stencil_extract/elf.h: new. ELF64 types + constants + reloc accessor inlines + parser API.
- tools/stencil_extract/elf.c: new. Implementation.
- tools/stencil_extract.c: removes the ELF block; the selftest references the new prefixed map names.
- lib/mino/tasks/builtin.clj: stencil-extract-srcs gains elf.c.

COFF stays in the monolith; its carve-out plus the synthetic-blob selftests land in v0.244.

The generated stencil headers regenerate byte-identical across the move. release-gate green: 1737 tests / 7919 assertions, ASan clean, 4-way JIT parity stdout byte-identical.

v0.242.0 — Extractor Carve-Out: macho Module

Second extractor carve-out. Lifts the Mach-O 64 parser into its own module: header types, ARM64 + x86_64 reloc-kind constants, reloc bit-field accessors, the parser entry points (macho_open, macho_list_symbols, macho_find_symbol), the ARM64 + x86_64 reloc-kind maps, and the macho_emit_stencil_header entry. Function names take a macho_ prefix to match the module boundary; the selftest and the main dispatch update to the new spelling.

- tools/stencil_extract/macho.h: new. Mach-O header types + constants + reloc accessor inlines + parser API.
- tools/stencil_extract/macho.c: new. Implementation.
- tools/stencil_extract.c: removes the Mach-O block; main dispatches to macho_emit_stencil_header; selftest references macho_reloc_* symbols.
- lib/mino/tasks/builtin.clj: stencil-extract-srcs gains macho.c.

The generated stencil headers regenerate byte-identical across the move. ELF + COFF stay in the monolith; their carve-outs land in v0.243 and v0.244.

release-gate green: 1737 tests / 7919 assertions, ASan clean, 4-way JIT parity stdout byte-identical.

v0.241.0 — Extractor Carve-Out: core Module

Opens cycle E. The stencil extractor lived for the entire cpjit cycle as a single 1700-line tools/stencil_extract.c covering Mach-O, ELF, and COFF parsers plus shared schema, format-agnostic emit, and the selftest. Adding another format would compound the problem; carving the file into per-format modules eases maintenance for the rest of the cycle and for any future format work.

This first carve-out lifts the format-agnostic plumbing into a core module:

- tools/stencil_extract/core.h: mblob_t, stencil_reloc_t, sym_table_intern, write_stencil_header prototypes, and the MINO_STENCIL_RELOC_* host enum.
- tools/stencil_extract/core.c: matching implementation.
- tools/stencil_extract.c: shrinks; includes core.h.

The build task build-stencil-extract now compiles the multi-file pipeline. The binary moves to tools/stencil-extract (hyphen, matching the mino-lean convention) so the underscore directory name can carry the source modules without a filesystem collision. All callers, source comments, and .gitignore updated.

The per-format parsers (Mach-O, ELF, COFF) stay in the entry file for now; their carve-outs follow in v0.242 -- v0.244. The generated stencil headers stay byte-identical across the move; a parser refactor that changes any byte output would be a bug, not a feature.

- tools/stencil_extract/core.h: new.
- tools/stencil_extract/core.c: new.
- tools/stencil_extract.c: removed the carved-out declarations.
- lib/mino/tasks/builtin.clj: build-stencil-extract extended to take a source list; check-reloc-mirror reads core.h instead of the monolith for the enum mirror.
- .gitignore: tracks tools/stencil-extract (was tools/stencil_extract).

release-gate green: 1737 tests / 7919 assertions, ASan clean, 4-way JIT parity stdout byte-identical, generated stencil headers identical before and after the split.

v0.240.0 — 4-Way JIT Parity (AUTO / ON / OFF / lean) + Cycle Close

Closes cycle D. The parity test now exercises every JIT configuration the runtime exposes and asserts byte-identical stdout across all four:

- ./mino --jit=auto (the default release-gate path)
- ./mino --jit=on (eager-compile every eligible fn)
- ./mino --jit=off (interpreter only, JIT pipeline gated out)
- ./mino-lean (the lean build, no JIT pipeline at all)

Any divergence -- a JIT'd stencil whose output drifts from the interpreter, a runtime gating bug, an embed-API skew between the two binaries -- now surfaces in a localised diff against the AUTO baseline.

- lib/mino/tasks/builtin.clj: test-jit-parity rewritten to drive four variants instead of two. The failure path writes jit-parity-<label>.out per variant and diffs each non-matching variant against the AUTO baseline.
- .gitignore: drops the obsolete jit-parity-jit.out; adds jit-parity-jit-{auto,on,off}.out to match the new artifacts.

Cycle D summary (v0.237.0 -- v0.240.0):

- Dual-binary build: mino (full) + mino-lean (no-JIT, ~4 % smaller static footprint).
- Per-state runtime JIT mode: AUTO (default) / OFF / ON. Each VM state has its own mode; CLI --jit= flag, MINO_JIT env var, mino_state_set_jit_mode embed API.
- Per-state hot-threshold tuning. CLI --jit-threshold=N, MINO_JIT_HOT_THRESHOLD env var, mino_state_set_jit_hot_threshold embed API.
- Capability query: mino_state_jit_capability returns {available, mode, threshold, host_arch, host_os}.
- 4-way CI parity (./mino --jit=auto|on|off + ./mino-lean) confirming stdout byte-identical across all configurations.

End-to-end portability (cycles A1-A4) lands underneath: ARM64 Darwin, ARM64 Linux, x86_64 Linux, x86_64 Darwin, and x86_64 Windows all have on-disk byte tables plus the runtime patcher and memory-API wrappers they need to compile and run.

release-gate green: 1737 tests / 7915 assertions, ASan clean, 4-way JIT parity stdout byte-identical.

v0.239.0 — JIT Threshold Tuning + Capability Query API

Adds per-state hot-threshold tuning plus a public capability-query API so embedders can introspect what the runtime they linked against can actually do.

- src/mino.h: new mino_state_set_jit_hot_threshold / mino_state_jit_hot_threshold setter and getter. Zero clamps to 1 (intent is "ASAP", same gating as MINO_JIT_MODE_ON but still in AUTO). New mino_jit_capability_t struct {available, mode, threshold, host_arch, host_os} and mino_state_jit_capability query function.
- src/runtime/internal.h: new unsigned jit_hot_threshold field on struct mino_state (placed next to jit_mode so the layout-static-assert in entry.c stays green).
- src/runtime/state.c: state_init reads MINO_JIT_HOT_THRESHOLD env var (positive integer, falls back to MINO_JIT_THRESHOLD on parse failure / non-positive value). Implements the setter/getter and capability query.
- src/eval/bc/jit.h: new MINO_CPJIT_HOST_DETECTED macro exposed alongside MINO_JIT_THRESHOLD. Mirrors the host detection cascade in eval/bc/jit/internal.h so state.c can branch on build-time host support without pulling in the JIT-private header.
- main.c: new --jit-threshold=N CLI flag. Rejects non-positive or unparseable values with a clear error message. Usage banner updated.
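The two threshold rules above (env var falls back on parse failure or a non-positive value; the setter clamps zero to 1) can be sketched as follows. This is an illustration under assumptions, not the code in src/runtime/state.c: both function names are hypothetical, and the 100-call default used in the usage note is the figure this changelog quotes for the warm-up window.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a hot-threshold string such as the value of an environment variable.
 * Returns `fallback` when the text is missing, not a whole decimal number,
 * out of range, or not strictly positive. Hypothetical helper. */
unsigned sk_parse_hot_threshold(const char *text, unsigned fallback) {
    char *end = NULL;
    long value;

    assert(fallback >= 1u);
    if (text == NULL || *text == '\0') return fallback;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0') return fallback;
    if (value <= 0 || (unsigned long)value > UINT_MAX) return fallback;
    return (unsigned)value;
}

/* Setter-side rule: a request of zero is treated as "as soon as possible"
 * and clamps to 1; every other value passes through unchanged. */
unsigned sk_clamp_hot_threshold(unsigned requested) {
    return requested == 0u ? 1u : requested;
}
```

With a fallback of 100, the strings "5" and "100" parse to themselves, while "0", "-3", "abc", and "12x" all yield 100. An embedder would typically pass getenv("MINO_JIT_HOT_THRESHOLD") as the text argument.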

release-gate green; --jit-threshold=5 cuts the warm-up window from 100 to 5 calls, observable through MINO_CPJIT_STATS=1. Capability query confirms available=1, host_arch="arm64", host_os="darwin" on the ARM64 Darwin development host.

v0.238.0 — Runtime JIT Mode (AUTO / OFF / ON)

Adds per-state JIT mode control to the full mino binary. Three modes:

- AUTO (default) -- JIT eligible fns after warming past the per-state hot threshold (currently 100 calls).
- OFF -- never JIT. Useful for embedding hosts that need predictable cold-start latency or that ship a W^X security policy.
- ON -- JIT every eligible fn on its first call (no threshold). Useful for benchmarking and for hosts that know ahead of time JIT'd execution is wanted everywhere.

Each VM state has its own mode, so a single embedding process can mix OFF, AUTO, and ON runtimes per workload.

- src/mino.h: new public mino_jit_mode_t enum (AUTO=0 / OFF=1 / ON=2) and mino_state_set_jit_mode / mino_state_jit_mode setter and getter. Both are no-ops on a NULL state; setter validates the enum range so a stray int doesn't slip through.
- src/runtime/internal.h: new int jit_mode field on struct mino_state. Placed after jit_invoke_ctx so the runtime-layout offsets the stencil bytes depend on don't shift (the layout-static-assert in entry.c stays green).
- src/runtime/state.c: state_init reads MINO_JIT env var (auto / off / on, case-insensitive) for the initial mode; unknown values fall back to AUTO. Setter + getter implementations land here.
- src/eval/fn.c: both JIT trigger sites (the argv-by-cons apply path and the bytecode apply path) gate on S->jit_mode != OFF. AUTO uses the per-state hot threshold; ON uses threshold 1.
- main.c: new --jit=auto|off|on CLI flag. Rejects unknown values with a clear error message. CLI overrides the env var when both are set. Usage banner updated.
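A minimal sketch of the mode parsing and trigger gating described above. The enum values mirror the AUTO=0 / OFF=1 / ON=2 numbering from this entry, but the type and function names are hypothetical, and the `>=` comparison against the threshold is an assumption about how "warming past" the threshold is evaluated.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Mirrors the numbering quoted in this entry; the type itself is
 * a stand-in for illustration, not the public mino_jit_mode_t. */
typedef enum {
    SK_JIT_AUTO = 0,
    SK_JIT_OFF  = 1,
    SK_JIT_ON   = 2
} sk_jit_mode_t;

/* Case-insensitive equality for short ASCII keywords. */
static int sk_equals_ignore_case(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* "off" and "on" select their modes; "auto", NULL, and any unknown
 * value all resolve to AUTO. */
sk_jit_mode_t sk_parse_jit_mode(const char *text) {
    if (text == NULL) return SK_JIT_AUTO;
    if (sk_equals_ignore_case(text, "off")) return SK_JIT_OFF;
    if (sk_equals_ignore_case(text, "on")) return SK_JIT_ON;
    return SK_JIT_AUTO;
}

/* Decide whether a fn that has been called `call_count` times should be
 * compiled. OFF never compiles; ON behaves like a threshold of 1; AUTO
 * uses the per-state threshold. */
int sk_should_compile(sk_jit_mode_t mode, unsigned call_count,
                      unsigned hot_threshold) {
    assert(hot_threshold >= 1u);
    if (mode == SK_JIT_OFF) return 0;
    if (mode == SK_JIT_ON) return call_count >= 1u;
    return call_count >= hot_threshold;
}
```

Because the mode is an argument rather than a global, two states in one process can evaluate the same call count differently, which is the per-state behaviour this release describes.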

release-gate green: 1737 tests / 7915 assertions, ASan clean, JIT parity byte-identical. --jit=on compiles 40 fns on a trivial expression vs the AUTO default's 1; --jit=off keeps the JIT stats counters at zero through the whole evaluation.

v0.237.0 — Dual-Binary Build: mino + mino-lean

Opens cycle D. The pre-existing mino_nojit build is renamed to mino-lean -- the lean distributable artifact for hosts that prefer a smaller static footprint and faster cold start over JIT throughput. Both binaries build from the same source tree.

- lib/mino/tasks/builtin.clj: build-nojit renamed to build-lean. Output binary is ./mino-lean (hyphen matches the mino ecosystem convention of mino-site / mino-bench / mino-lsp / mino-examples / tree-sitter-mino). The test-jit-parity task drives both binaries; the parity-diff artifact pair is now jit-parity-jit.out / jit-parity-lean.out.
- mino.edn: build-nojit -> build-lean. Test runner registers the renamed task.
- .gitignore: drops the obsolete /mino_nojit, /mino_no_jit, and jit-parity-nojit.out entries; adds /mino-lean and jit-parity-lean.out.

On ARM64 Darwin, mino-lean measures 991 KB vs mino at 1.03 MB -- a ~4 % static-footprint reduction driven by dropping the emit / patcher / stencil-table code paths the lean build doesn't need.

release-gate green; parity-test stdout byte-identical between the two binaries.

Subsequent cycle D releases add runtime JIT mode control (ON/OFF/AUTO), threshold tuning, and a capability query API to the full mino binary so embedding hosts can choose their JIT posture per VM state.

v0.236.0 — COFF Parser + VirtualAlloc Swap + Generated x86_64 Windows Header

Closes cycle A4. The extractor now parses PE/COFF amd64 object files alongside Mach-O and ELF; the runtime swaps mmap / mprotect / munmap for VirtualAlloc / VirtualProtect / VirtualFree under _WIN32; a cross-compiled stencils_x86_64_windows.h is checked in. With cycles A2 and A3 already delivering the runtime x86_64 patcher set and the Mach-O x86_64 reloc path, Windows x86_64 builds now have a complete JIT-eligible toolchain. End-to-end verification needs a Windows x86_64 host (deferred); ARM64 Darwin builds stay byte-identical.

- tools/stencil_extract.c:
  - New IMAGE_FILE_HEADER / IMAGE_SECTION_HEADER types plus raw-byte accessors for the 18-byte IMAGE_SYMBOL and 10-byte IMAGE_RELOCATION entries (the COFF spec packs these to odd sizes; struct layout would be non-portable).
  - New coff_view_t, coff_open, coff_list_symbols, coff_find_symbol (derives function size by scanning the next .text symbol -- COFF doesn't record size in the IMAGE_SYMBOL entry), and coff_extract_relocs.
  - New reloc_x86_64_coff_kind_map: covers IMAGE_REL_AMD64_REL32 (PC32, addend -4), IMAGE_REL_AMD64_REL32_1 (PC32, addend -5), and IMAGE_REL_AMD64_ADDR64 (ABS64, addend 0). Unknown kinds reject so the build fails loudly.
  - main() dispatches between Mach-O / ELF / COFF based on the first few bytes; the placeholder error that pinned the COFF path is now replaced with a fully wired extractor.
  - --selftest extended to cover every COFF reloc entry plus the unknown-rejects path.
- src/eval/bc/jit/emit.c:
  - New jit_region_alloc / make_rx / free / page_size host abstractions. POSIX uses the existing mmap / mprotect / munmap / sysconf(_SC_PAGESIZE) calls; _WIN32 uses VirtualAlloc(MEM_COMMIT|MEM_RESERVE, PAGE_READWRITE) + VirtualProtect(PAGE_EXECUTE_READ) + VirtualFree(MEM_RELEASE) + GetSystemInfo.
  - Every call site swapped to the wrapper. The compile pipeline and the region-tracking teardown both go through jit_region_free so a Windows build never executes a stray munmap.
- src/eval/bc/jit/internal.h: MINO_CPJIT_X86_64_WINDOWS detection (already scaffolded in cycle A2) now wires stencils_x86_64_windows.h.
- lib/mino/tasks/builtin.clj: new gen-stencils-x86-64-windows task. Cross-compiles every stencil via clang --target=x86_64-pc-windows-msvc -mno-red-zone.
- mino.edn: registers the new task.
- src/eval/bc/stencils/generated/stencils_x86_64_windows.h (new): cross-compiled byte tables for all 39 stencils. Every non-final stencil's symbol table includes mino_jit_chain_continue_marker; reloc kinds are PC32 (COFF doesn't have a GOT, so extern var reads are direct rip-relative and map to PC32 alongside the call/jmp/musttail PC32s).
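The raw-byte accessor approach for the 10-byte IMAGE_RELOCATION entry, combined with the kind mapping listed above, might look roughly like this. It is a sketch, not the extractor's code: all SK_ / sk_ names are hypothetical. The field layout (4-byte VirtualAddress, 4-byte SymbolTableIndex, 2-byte Type) and the numeric type codes come from the PE/COFF specification; the addends are the ones quoted in this entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_COFF_RELOC_SIZE 10u

/* IMAGE_REL_AMD64_* type codes from the PE/COFF specification. */
enum {
    SK_IMAGE_REL_AMD64_ADDR64  = 0x0001,
    SK_IMAGE_REL_AMD64_REL32   = 0x0004,
    SK_IMAGE_REL_AMD64_REL32_1 = 0x0005
};

/* Hypothetical host-side reloc kinds for this sketch. */
typedef enum { SK_KIND_ABS64, SK_KIND_PC32 } sk_reloc_kind_t;

typedef struct {
    uint32_t offset;     /* VirtualAddress: offset of the patch site */
    uint32_t sym_index;  /* SymbolTableIndex */
    sk_reloc_kind_t kind;
    int addend;          /* implicit addend made explicit */
} sk_reloc_t;

/* Little-endian loads from raw bytes; no struct overlay, so the odd
 * 10-byte stride never meets compiler padding. */
uint32_t sk_rd_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

uint16_t sk_rd_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode entry `index` of a relocation table. Returns 0 on success and -1
 * when the index is out of bounds or the type code is not one we map. */
int sk_coff_decode_reloc(const uint8_t *table, size_t table_len,
                         size_t index, sk_reloc_t *out) {
    const uint8_t *entry;

    assert(table != NULL && out != NULL);
    if (index >= table_len / SK_COFF_RELOC_SIZE) return -1;

    entry = table + index * SK_COFF_RELOC_SIZE;
    out->offset = sk_rd_u32le(entry);
    out->sym_index = sk_rd_u32le(entry + 4);

    switch (sk_rd_u16le(entry + 8)) {
    case SK_IMAGE_REL_AMD64_ADDR64:
        out->kind = SK_KIND_ABS64; out->addend = 0;  return 0;
    case SK_IMAGE_REL_AMD64_REL32:
        out->kind = SK_KIND_PC32;  out->addend = -4; return 0;
    case SK_IMAGE_REL_AMD64_REL32_1:
        out->kind = SK_KIND_PC32;  out->addend = -5; return 0;
    default:
        return -1; /* unknown kinds reject */
    }
}
```

Reading fields byte by byte keeps the decoder independent of host struct packing, which is the portability concern the bullet about odd-sized entries raises.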

release-gate green on ARM64 Darwin (1737 tests / 7915 assertions, ASan clean, JIT parity byte-identical). Cycles A1-A4 are now complete: ARM64 Darwin, ARM64 Linux, x86_64 Linux, x86_64 Darwin, and x86_64 Windows all have on-disk byte tables plus the runtime patcher and memory-API wrappers they need.

v0.235.0 — Mach-O x86_64 in Extractor + Generated x86_64 Darwin Header

Lands the cycle A3 work: Mach-O x86_64 reloc-kind mapping in the extractor, dispatch by cputype between ARM64 and x86_64, and a cross-compiled stencils_x86_64_darwin.h. With cycle A2's runtime patcher set already in place, x86_64 Darwin builds now have an on-disk byte-table they can pick up via -DMINO_CPJIT_X86_64_DARWIN=1. End-to-end verification needs an x86_64 Darwin host (deferred); ARM64 Darwin builds stay byte-identical.

- tools/stencil_extract.c:
  - New X86_64_RELOC_* constants for the kinds clang emits (UNSIGNED, SIGNED, BRANCH, GOT_LOAD, GOT, SIGNED_1 / _2 / _4). The SIGNED_X family encodes the implicit addend (-1 / -2 / -4) in the kind itself because Mach-O REL relocations carry no r_addend field.
  - New CPU_TYPE_X86_64 / CPU_TYPE_ARM64 constants.
  - New reloc_x86_64_macho_kind_map: maps Mach-O x86_64 reloc kinds to the runtime-stable MINO_STENCIL_RELOC_X86_64_* enum and writes out the implicit addend so the patcher gets the same S + A - P semantics as the ELF path.
  - extract_relocs now reads v->hdr->cputype and dispatches between reloc_arm64_kind_map and reloc_x86_64_macho_kind_map. The recorded addend now flows through (was hard-coded to 0 in the ARM64-only path).
  - --selftest extended to cover every x86_64 Mach-O reloc entry plus the unknown-rejects path.
- lib/mino/tasks/builtin.clj: new gen-stencils-x86-64-darwin task. Cross-compiles every stencil via clang --target=x86_64-apple-darwin -mno-red-zone and emits stencils_x86_64_darwin.h.
- mino.edn: registers the new task.
- src/eval/bc/stencils/generated/stencils_x86_64_darwin.h (new): cross-compiled byte tables for all 39 stencils. PC32 + GOTPCREL reloc kinds; every non-final stencil's symbol table includes mino_jit_chain_continue_marker; implicit addend of -4 stored explicitly in the reloc tuple's fourth slot.

release-gate green on ARM64 Darwin (1737 tests / 7915 assertions, ASan clean, JIT parity byte-identical).

v0.234.0 — x86_64 Patcher + Direct-emit + Trampoline + Arch Dispatch

Lands the second half of cycle A2: x86_64 patcher functions, direct-emit byte templates, trampoline encoding, arch dispatch in emit.c, and a cross-compiled stencils_x86_64_linux.h. With v0.233.0's musttail chain mechanism, the runtime side now compiles cleanly on x86_64 hosts when configured with -DMINO_CPJIT_X86_64_LINUX=1. End-to-end verification needs an x86_64 Linux host (deferred to the platform CI step); ARM64 Darwin builds stay byte-identical.

- src/eval/bc/jit/internal.h: new MINO_CPJIT_HOST_ARM64 / MINO_CPJIT_HOST_X86_64 macros set alongside the existing MINO_CPJIT_HOST guard. Arch-conditional MINO_JIT_JMP_SIZE, MINO_JIT_JMPIFNOT_SIZE, and MINO_JIT_TRAMPOLINE_SIZE. New x86_64 patcher signatures (patch_abs64, patch_pc32, patch_gotpcrel, patch_jmp32_to, patch_jcc32_to). Windows x86_64 host detection scaffolded behind MINO_CPJIT_X86_64_WINDOWS for cycle A4.
- src/eval/bc/jit/patcher.c: gate changed from MINO_CPJIT_HOST to MINO_CPJIT_HOST_ARM64. No body changes; the file now opts out cleanly on non-ARM64 builds.
- src/eval/bc/jit/patcher_x86_64.c (new): all x86_64 patchers plus direct-emit OP_JMP / OP_JMPIFNOT byte templates (30-byte JMPIFNOT covers NULL + nil/false-tagged via mov rax, [rdi+disp]; test rax, rax; je <taken>; sub rax, 2; cmp rax, 1; jbe <taken>) plus the 12-byte movabs rax, target; jmp rax trampoline writer.
- src/eval/bc/jit/emit.c: per-reloc-kind switch now dispatches on arch. ARM64 cases continue calling patch_branch26 / patch_adrp / patch_pageoff12_ldr64 / patch_imm19; x86_64 cases call patch_pc32 / patch_gotpcrel / patch_abs64 / patch_jmp32_to / patch_jcc32_to. Reloc addend is now passed through; ARM64 patchers ignore it, x86_64 patchers use ELF's S+A-P arithmetic. The chain pass handles the 5-byte e9 rel32 encoding by backing up one byte from the reloc offset to land on the opcode before calling patch_jmp32_to.
- tools/stencil_extract.c: x86_64 ELF reloc map extended to cover R_X86_64_GOTPCRELX (41) alongside _REX_GOTPCRELX (42) -- both collapse to MINO_STENCIL_RELOC_X86_64_GOTPCREL. clang emits GOTPCRELX for MINO_STENCIL_IMM_* reads under default codegen, so this entry was load-bearing for the cross-compile. Selftest extended.
- lib/mino/tasks/builtin.clj: new gen-stencils-x86-64-linux task. Cross-compiles every stencil source via clang --target=x86_64-linux-gnu -mno-red-zone and emits stencils_x86_64_linux.h. Added patcher_x86_64.c to lib-srcs so the mino-lean (no-JIT) build picks it up too.
- mino.edn: registers the new gen-stencils task.
- src/eval/bc/stencils/generated/stencils_x86_64_linux.h (new): cross-compiled byte tables for all 39 stencils. PC32 + GOTPCREL reloc kinds; every non-final stencil's symbol table contains mino_jit_chain_continue_marker.
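The S+A-P arithmetic and the 12-byte trampoline named above can be sketched as below. This is an illustration, not the contents of patcher_x86_64.c: the function names and signatures are hypothetical, and addresses are passed as plain integers so the arithmetic can be checked without executable memory. The instruction bytes (48 b8 for movabs rax, imm64; ff e0 for jmp rax) are standard x86_64 encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_TRAMPOLINE_SIZE 12

/* Little-endian stores on raw bytes. */
void sk_wr_u32le(uint8_t *p, uint32_t v) {
    int i;
    for (i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
}

void sk_wr_u64le(uint8_t *p, uint64_t v) {
    int i;
    for (i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
}

/* PC32 patch: write S + A - P as a signed 32-bit little-endian field.
 *   S = target address, A = addend,
 *   P = address of the patched field (code_addr + reloc_off).
 * `code_addr` is the address the first byte of `code` will run at.
 * Returns 0 on success, -1 when the displacement does not fit. */
int sk_patch_pc32(uint8_t *code, uint64_t code_addr, uint32_t reloc_off,
                  uint64_t target, int64_t addend) {
    uint64_t place;
    int64_t disp;

    assert(code != NULL);
    place = code_addr + reloc_off;
    disp = (int64_t)(target - place) + addend;
    if (disp < INT32_MIN || disp > INT32_MAX) return -1;
    sk_wr_u32le(code + reloc_off, (uint32_t)(int32_t)disp);
    return 0;
}

/* Trampoline for targets out of rel32 range:
 *   48 b8 <imm64>   movabs rax, target
 *   ff e0           jmp rax
 * 2 + 8 + 2 = 12 bytes. */
void sk_write_trampoline(uint8_t out[SK_TRAMPOLINE_SIZE], uint64_t target) {
    assert(out != NULL);
    out[0] = 0x48;
    out[1] = 0xb8;
    sk_wr_u64le(out + 2, target);
    out[10] = 0xff;
    out[11] = 0xe0;
}
```

For a 5-byte call (e8 rel32) placed at 0x1000, the reloc sits at offset 1 and the addend is -4, so a target of 0x1105 yields 0x1105 - 0x1001 - 4 = 0x100, which is the distance from the end of the instruction, as the CPU expects.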

release-gate green on ARM64 Darwin. The x86_64 Linux build path is now code-complete; running it end-to-end needs a Linux x86_64 host with the cross-compiled header.

v0.233.0 — Stencil ABI Overhaul: musttail Chain Marker

Replaces the ARM64-only chain mechanism with a portable musttail-call marker, making the stencil ABI host-agnostic. The old design leaned on three properties of AArch64 + AAPCS: 2-pointer struct return goes to (x0, x1), arg registers are also (x0, x1, x2), and ret plus b are both 4 bytes so a patcher can swap one for the other in place. None of these hold on x86_64 SysV, so the chain mechanism had to change before x86_64 stencils could ship.

- src/eval/bc/stencils/abi.h: mino_stencil_chain_t typedef removed; the chain-return type is now plain void. MINO_STENCIL_CHAIN_RETURN(regs, consts, S) lowers to __attribute__((musttail)) return mino_jit_chain_continue_marker(regs, consts, S);. New extern declaration for the marker.
- Every chained stencil source: return type changed from mino_stencil_chain_t to void. stencil_op_move and stencil_op_load_k had their signatures normalised to the canonical (regs, consts, S) shape so musttail's strict signature-match holds; each now ends with an explicit MINO_STENCIL_CHAIN_RETURN.
- stencil_op_loop_int_dec, stencil_op_loop_int_lt, and stencil_op_loop_int_lt_inc had their NULL-return paths (return (mino_stencil_chain_t){NULL, consts};) replaced with MINO_STENCIL_CHAIN_RETURN(NULL, consts, S); so the soft-NULL signal flows through the same chain branch.
- src/eval/bc/jit/helpers.c: new mino_jit_chain_continue_marker(regs, consts, S) no-op stub. The runtime never executes it; the linker keeps the symbol so stencil .o files have something to reference, and the JIT rewrites every call site before the region is set RX.
- src/eval/bc/jit/internal.h: new MINO_JIT_CHAIN_MARKER_NAME constant, declaration added.
- src/eval/bc/jit/emit.c: new SYM_SLOT_CHAIN slot kind; inst_t carries the stencil descriptor pointer so the post-emit pass can walk each non-final stencil's reloc table. Pass A no longer scans for 0xd65f03c0 ret instructions -- it walks the reloc table for each non-final stencil, finds every MINO_JIT_CHAIN_MARKER_NAME relocation, and patches the BRANCH26 to point at the next instance's native_start. Inline fast paths that exit through multiple basic blocks each emit their own chain reloc; the pass patches them all.
- src/eval/bc/stencils/generated/stencils_arm64_darwin.h and stencils_arm64_linux.h: regenerated. Every non-final stencil's symbol table now includes mino_jit_chain_continue_marker; the trailing 4 bytes of each non-final stencil's byte table changed from 0xc0 0x03 0x5f 0xd6 (ret) to 0x00 0x00 0x00 0x14 (a placeholder b 0 that carries the BRANCH26 reloc against the marker).
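
As a rough illustration of the chain patch described above, the sketch below rewrites a placeholder b 0 so it branches to the start of the next stencil instance. It relies only on the standard AArch64 B encoding (opcode 0x14000000 plus a signed 26-bit word offset). The helper name patch_branch26 and its return convention are assumptions for illustration, not the code in emit.c or patcher.c.

```c
#include <assert.h>
#include <stdint.h>

/* AArch64 unconditional branch: opcode bits 0x14000000, low 26 bits hold a
 * signed offset counted in 4-byte words. */
#define A64_B_OPCODE   0x14000000u
#define A64_IMM26_MASK 0x03ffffffu

/* Hypothetical helper: rewrite the 4-byte `b` at `site` so that it branches
 * to `target`. Returns 0 on success, -1 when the displacement is not 4-byte
 * aligned or does not fit the signed 26-bit word offset. */
int patch_branch26(uint8_t *site, const uint8_t *target)
{
    int64_t delta = (int64_t)(target - site);
    if (delta % 4 != 0) return -1;

    int64_t words = delta / 4;
    if (words < -(INT64_C(1) << 25) || words > (INT64_C(1) << 25) - 1) return -1;

    uint32_t insn = A64_B_OPCODE | ((uint32_t)words & A64_IMM26_MASK);

    /* AArch64 instructions are stored little-endian; write byte by byte so
     * the sketch does not depend on the host's byte order. */
    site[0] = (uint8_t)(insn & 0xffu);
    site[1] = (uint8_t)((insn >> 8) & 0xffu);
    site[2] = (uint8_t)((insn >> 16) & 0xffu);
    site[3] = (uint8_t)((insn >> 24) & 0xffu);
    return 0;
}
```

Patching a site 8 bytes ahead of its target's predecessor turns the placeholder bytes 00 00 00 14 into 02 00 00 14; a backward branch of 8 bytes encodes as fe ff ff 17.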

This release deliberately collapses the originally-planned v0.233.0 (ABI overhaul) and v0.234.0 (emit.c chain pass) into a single tag because they form one cohesive build-green-keeping transition: the ABI change without the emit pass would leave the runtime unable to JIT, and the emit pass without the ABI change would have no chain marker relocs to find. Subsequent A2 work (x86_64 patcher, direct-emit, trampoline, arch dispatch, generated x86_64 Linux header) now lands cleanly on top because the chain mechanism is no longer arch-specific.

release-gate green on ARM64 Darwin: 1737 tests / 7915 assertions, ASan suite clean, JIT parity stdout byte-identical.

v0.232.0 — x86_64 ELF Reloc Map + Enum Mirror

Opens cycle A2 (x86_64 portability). Adds the runtime-stable MINO_STENCIL_RELOC_X86_64_* enum entries (ABS64=6, PC32=7, GOTPCREL=8) on both the runtime side (src/eval/bc/jit/internal.h) and the extractor side (tools/stencil_extract.c), and wires the ELF x86_64 reloc-kind map.

- reloc_x86_64_elf_kind_map covers R_X86_64_64 (ABS64), R_X86_64_PC32 and R_X86_64_PLT32 (both collapse to PC32 -- the linker-only PLT indirection doesn't apply once stencils are flattened into the JIT region), and R_X86_64_GOTPCREL / R_X86_64_REX_GOTPCRELX (collapse to GOTPCREL -- REX is a peephole hint, not a different patcher instruction).
- elf_extract_relocs now dispatches on e_machine to pick the right kind map. AArch64 and x86_64 ELF objects both parse cleanly; unrecognised machines still error out.
- --selftest extended to cover every x86_64 entry and the unknown-rejects path.
- G3 check-reloc-mirror automatically picks up the three new shared keys -- the gate already compares the intersection of runtime-declared and extractor-declared MINO_STENCIL_RELOC_* names. Verified by a synthetic x86_64 ELF smoke test that extracted a stencil with a PC32 call site (addend=-4 for the rip-points-to-next-instruction convention) and a GOTPCREL global-load.
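
A minimal sketch of the collapse described above, assuming the runtime enum entries are spelled MINO_STENCIL_RELOC_X86_64_ABS64 / _PC32 / _GOTPCREL with the values given in this entry. The ELF relocation numbers come from the System V x86-64 psABI; they carry an EX_ prefix here so they cannot collide with a system elf.h. The function names are hypothetical and do not mirror tools/stencil_extract.c.

```c
#include <assert.h>
#include <stdint.h>

/* ELF relocation type numbers from the System V x86-64 psABI. */
enum {
    EX_R_X86_64_64             = 1,
    EX_R_X86_64_PC32           = 2,
    EX_R_X86_64_PLT32          = 4,
    EX_R_X86_64_GOTPCREL       = 9,
    EX_R_X86_64_REX_GOTPCRELX  = 42
};

/* Runtime-stable kinds, values as stated in this changelog entry. */
enum {
    MINO_STENCIL_RELOC_X86_64_ABS64    = 6,
    MINO_STENCIL_RELOC_X86_64_PC32     = 7,
    MINO_STENCIL_RELOC_X86_64_GOTPCREL = 8
};

/* Hypothetical lookup: ELF type -> runtime kind, or -1 for unknown types. */
int example_x86_64_elf_kind(uint32_t elf_type)
{
    switch (elf_type) {
    case EX_R_X86_64_64:            return MINO_STENCIL_RELOC_X86_64_ABS64;
    case EX_R_X86_64_PC32:          /* fallthrough: PLT32 collapses to PC32 */
    case EX_R_X86_64_PLT32:         return MINO_STENCIL_RELOC_X86_64_PC32;
    case EX_R_X86_64_GOTPCREL:      /* fallthrough: REX variant is only a hint */
    case EX_R_X86_64_REX_GOTPCRELX: return MINO_STENCIL_RELOC_X86_64_GOTPCREL;
    default:                        return -1;
    }
}

/* PC32 field value: S + A - P, where P is the address of the 4-byte field.
 * With A = -4 the result is the distance from the next instruction to S. */
int32_t example_pc32_value(uint64_t sym, int64_t addend, uint64_t place)
{
    return (int32_t)((int64_t)sym + addend - (int64_t)place);
}
```

The addend of -4 exists because the 4-byte displacement field ends exactly where the next instruction begins, which is the address rip holds when the displacement is applied.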

x86_64 patcher functions, direct-emit templates, trampoline encoding, and arch dispatch in emit.c land in v0.233.0; generated stencils_x86_64_linux.h follows in that release.

release-gate green.

v0.231.0 — Generated stencils_arm64_linux.h Committed

Closes cycle A1. The byte tables for every stencil are now generated and checked in for ARM64 Linux alongside the existing ARM64 Darwin header. Native Linux builds pick up the committed header without needing a regenerate pass.

- lib/mino/tasks/builtin.clj: gen-stencils refactored into a parameterised gen-stencils-for so the same compile + extract pipeline drives every target. New gen-stencils-arm64-linux task cross-compiles via clang's --target=aarch64-linux-gnu, using the same -O2 -fno-builtin -fno-optimize-sibling-calls flags as the Darwin path so the generated codegen mirrors what a native Linux clang would produce.
- mino.edn: new gen-stencils-arm64-linux task entry.
- src/eval/bc/stencils/generated/stencils_arm64_linux.h: generated header committed. 39 stencils, semantically equivalent to the Darwin header; the bytes differ only by the expected ELF/Mach-O code-gen ordering differences (relocs reorder, instruction scheduling reshuffles loads).

Runtime side already gates on MINO_CPJIT_ARM64_LINUX and points at this header path (src/eval/bc/jit/internal.h:23). Native ARM64 Linux builds with -DMINO_CPJIT_ARM64_LINUX=1 now compile cleanly with the JIT enabled. CI runner setup for ARM64 Linux is operational follow-up; this release ships the on-disk artifacts.

release-gate green on Darwin.

v0.230.0 — ELF Parser In Stencil Extractor

Opens cycle A1 (ARM64 Linux portability). The stencil extractor now parses 64-bit ELF object files alongside its existing Mach-O path, enabling stencil generation on Linux hosts.

- tools/stencil_extract.c: new elf64_ehdr_t / elf64_shdr_t / elf64_sym_t / elf64_rela_t typedefs; elf_open walks the section table to locate .text + .symtab + .rela.text; elf_list_symbols / elf_find_symbol / elf_extract_relocs mirror their Mach-O counterparts.
- reloc_arm64_elf_kind_map maps R_AARCH64_* constants to the runtime-stable MINO_STENCIL_RELOC_* enum -- covers ABS64, CALL26 / JUMP26 (branch26), ADR_PREL_PG_HI21 (page21), ADD_ABS_LO12_NC / LDST64_ABS_LO12_NC (pageoff12), ADR_GOT_PAGE (got_load_page21), LD64_GOT_LO12_NC (got_load_pageoff12).
- Format-agnostic write_stencil_header extracted from emit_stencil_header so both Mach-O and ELF paths funnel through the same on-disk shape.
- Extractor dispatch in main() sniffs the file magic and routes to either parser. The COFF placeholder remains for the Windows cycle.
- --selftest extended to cover ELF struct sizes, elf64_r_sym / elf64_r_type decode, and every entry of the AArch64 kind map.
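
The struct-size and r_info checks that --selftest covers can be sketched as follows. The names elf64_rela_t, elf64_r_sym and elf64_r_type come from the list above, but their exact signatures here are assumptions; looks_like_elf64 is a hypothetical stand-in for the magic sniff in main(). The layout and bit split follow the ELF-64 object file format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Elf64_Rela as laid out by the ELF-64 specification. */
typedef struct {
    uint64_t r_offset;  /* where in the section the fix-up applies */
    uint64_t r_info;    /* symbol index (high 32 bits) + type (low 32 bits) */
    int64_t  r_addend;  /* constant added to the symbol value */
} elf64_rela_t;

_Static_assert(sizeof(elf64_rela_t) == 24, "Elf64_Rela must be 24 bytes");

/* Assumed signatures: split r_info into its two halves. */
uint32_t elf64_r_sym(uint64_t info)  { return (uint32_t)(info >> 32); }
uint32_t elf64_r_type(uint64_t info) { return (uint32_t)(info & 0xffffffffu); }

/* Hypothetical magic sniff: 0x7f 'E' 'L' 'F', then EI_CLASS == ELFCLASS64 (2). */
int looks_like_elf64(const uint8_t *buf, size_t len)
{
    return len >= 5 &&
           buf[0] == 0x7f && buf[1] == 'E' && buf[2] == 'L' && buf[3] == 'F' &&
           buf[4] == 2;
}
```

For example, an r_info of (5 << 32) | 283 decodes to symbol index 5 and relocation type 283, which is R_AARCH64_CALL26.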

Runtime side unchanged -- ARM64 codegen, patchers, and direct-emit templates are arch-shared with Darwin and reuse without modification on ARM64 Linux. The generated header stencils_arm64_linux.h lands in v0.231.0 after building on an ARM64 Linux host.

release-gate green on Darwin.

v0.229.0 — JIT Stencil For OP_ASSOC + Cycle C Close

Closes the second coverage cycle. The 3-arg (assoc coll k v) fast-lane stencil rounds out the seven hot vector / map ops named in the post-CPJIT-hygiene audit.

The triple [coll, k, v] sits at regs[B..B+2]; the slow helper routes:
- MINO_VECTOR + tagged-int key in range (or len-equal for the append case) → vec_assoc1.
- MINO_MAP key → mino_map_assoc1.
- Otherwise → prim_assoc (sorted-map, record, transient, non-int vec key, idx out of range, variadic forms).

Cycle close status doc at .local/cpjit-coverage-c-cycle-status.md. Tier 2-5 ops (extended arith, reserve opcodes, OP_LOOP_INT_LT) remain explicitly out of scope; the next cycle is A1 (ARM64 Linux portability).

v0.228.0 — JIT Stencil For OP_CONJ_VEC

Continues the coverage cycle. The (conj v x) fast-lane stencil now trampolines through mino_jit_conj_vec_slow. Vector case allocates via vec_conj1 (and refreshes regs base after the possible relocation); other coll types fall through to prim_conj for lists / sorted-colls / sets / maps / transients.

release-gate green.

v0.227.0 — JIT Stencil For OP_GET_KW_MAP

Continues the coverage cycle. Adds the (get coll k) fast-lane stencil. Trampoline into mino_jit_get_kw_map_slow which mirrors the bc_run handler: MINO_MAP via map_get_val, MINO_RECORD + MINO_KEYWORD via record_field_index, anything else (sorted-map, transient, 3-arg-default, ext-map miss) routes through prim_get.

release-gate green.

v0.226.0 — JIT Stencils For OP_COUNT_VEC + OP_EMPTY_VEC

Continues the coverage cycle. Two more trampoline stencils for the remaining single-arg vector predicates. Same shape as v0.225.0: the slow helper carries the vector fast lane (MINO_VECTOR + .len read / zero-test) and the prim_count / prim_empty_p fallback.

- src/eval/bc/stencils/count_vec.c, empty_vec.c (new).
- mino_jit_count_vec_slow, mino_jit_empty_vec_slow in src/eval/bc/jit/helpers.c.
- Stencil registry + descriptor + extern-fn registrations updated.
- stencils_arm64_darwin.h regenerated.

release-gate green.

v0.225.0 — JIT Stencils For OP_NTH_VEC + OP_FIRST_VEC

Opens the coverage cycle (cycle C). Adds two new stencils so the JIT no longer falls back to the interpreter on fns that use vector indexing or first. Both stencils are trampoline-only — the interpreter's vector fast lane lives in a new slow helper (mino_jit_nth_vec_slow / mino_jit_first_vec_slow) that mirrors the bc_run handlers' MINO_VECTOR + tagged-int checks before falling through to prim_nth / prim_first. Stencil sources stay hermetic; the helper carries the type-switch.

Why coverage rather than perf

The dominant per-call cost in the interpreter's NTH_VEC / FIRST_VEC handlers is the type check and the vec_nth body itself, not the bc_run switch dispatch around them. The JIT version still calls the same fast lane via the slow helper, so the inline savings are minimal. The visible signal is JIT eligibility: fns that use vector indexing or first as a hot op now pass mino_jit_classify_eligibility and get JIT-compiled instead of running through the interpreter.

What landed

- src/eval/bc/stencils/nth_vec.c, first_vec.c (new).
- mino_jit_nth_vec_slow, mino_jit_first_vec_slow in src/eval/bc/jit/helpers.c.
- Stencil registry entries in lib/mino/tasks/builtin.clj (both gen-stencils and check-stencil-registry lists).
- Stencil descriptor entries in entry.c::mino_jit_stencils[].
- Helper extern-fn registrations in entry.c::g_extern_fns[].
- stencils_arm64_darwin.h regenerated.

release-gate green; correctness verified via the full test suite (1737 tests / 7915 assertions).

v0.224.0 — Perf-Pivot Cycle Close

Closes the apply_callable_argv inlining cycle (v0.220 - v0.224). The architectural changes are in place: the IC slot carries classified callable shape, OP_CALL_CACHED branches three ways on that shape, the bc-fn invocation core is factored, and mino_bc_run's arity dispatch peels the single-clause case.

The cycle's plan gate (fib(25) 1.30x after v0.222.0) is not met. Measured medians sit within run-to-run noise across the four releases. Status report + root-cause analysis in .local/cpjit-perf-pivot-cycle-status.md.

Direction for the next perf cycle: cache bc->native directly in the IC slot, inline bc_push_window + arg copy in the stencil, and push callable-kind dispatch into a per-slot function pointer. Substantial enough to merit its own cycle; status doc names the concrete levers.

v0.223.0 — PRIM_ARGV Fast Path In OP_CALL_CACHED

Step 4 of the apply_callable_argv inlining cycle. Adds a third branch in OP_CALL_CACHED's inline fast path: when the IC slot has classified the cached callable as MINO_IC_CALLABLE_PRIM_ARGV, the stencil routes to a new mino_jit_call_known_prim_slow helper that invokes the prim's fn2 directly without going through apply_callable_argv's var-unwrap + type-of dispatch switch.

Like v0.221 (the MINO_FN_BC_SINGLE branch), this is architectural parity with the cycle's plan. The measurable win is similar magnitude — a few branches saved per call out of a ~1 us total per-call cost on a tight (dotimes [_ N] (f 0)) loop with f = inc. The bigger levers (mino_bc_run setup, dotimes / loop overhead, JIT region re-entry through mino_jit_invoke) are unaddressed and dominate.

What landed

- mino_jit_call_known_prim_slow in src/eval/bc/jit/helpers.c: handles the Var-unwrap defensively, then calls prim.fn2 directly with push_frame for trace attribution. Bails to apply_callable_argv if the cached value's shape has drifted.
- Registered in entry.c::g_extern_fns[]; declared in abi.h.
- stencils/call_cached.c now has a three-way branch in the IC hot path: MINO_FN_BC_SINGLE → known-fn helper, PRIM_ARGV → known-prim helper, else → mino_jit_call_resolved_slow.
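
The three-way branch can be pictured with the following sketch. Only the three cached_* field names and the routing conditions are taken from this changelog (the argc and has-rest conditions are those stated under v0.221.0); the enum spellings, the struct, and the route_t result type are hypothetical simplifications and not the real mino_bc_ic_slot_t or stencil code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the MINO_IC_CALLABLE_* classification. */
typedef enum {
    IC_KIND_NONE,
    IC_KIND_PRIM_ARGV,
    IC_KIND_FN_BC_SINGLE,
    IC_KIND_FN_BC_MULTI,
    IC_KIND_OTHER
} ic_kind_t;

/* Only the three fields this sketch needs; the real slot carries more. */
typedef struct {
    uint8_t  cached_callable_kind;
    uint8_t  cached_fn_has_rest;
    uint16_t cached_fn_n_params;
} ic_slot_sketch_t;

/* Which slow helper the stencil would branch into. */
typedef enum { ROUTE_KNOWN_FN, ROUTE_KNOWN_PRIM, ROUTE_RESOLVED } route_t;

route_t route_cached_call(const ic_slot_sketch_t *slot, int argc)
{
    /* Single-clause bytecode fn with an exact, non-variadic arity match. */
    if (slot->cached_callable_kind == IC_KIND_FN_BC_SINGLE &&
        slot->cached_fn_has_rest == 0 &&
        argc == (int)slot->cached_fn_n_params) {
        return ROUTE_KNOWN_FN;
    }
    /* Primitive taking argv: call it without the generic dispatch switch. */
    if (slot->cached_callable_kind == IC_KIND_PRIM_ARGV) {
        return ROUTE_KNOWN_PRIM;
    }
    /* Everything else keeps the generic post-resolve path. */
    return ROUTE_RESOLVED;
}
```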

stencils_arm64_darwin.h regenerated (call_cached grew further to accommodate the third branch). release-gate green.

v0.222.0 — Single-Clause Dispatch Fast Path In mino_bc_run

Step 3 of the apply_callable_argv inlining cycle. Adds an early-out in mino_bc_run's arity dispatch for the common case of n_clauses == 1, peeling that hot shape off so it skips two loop iterations + two bounds tests per call. Annotated __builtin_expect(... == 1, 1) because measured workloads (fib, reduce, transducer compositions, realistic_bench rows) overwhelmingly use single-arity callees.
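
A minimal sketch of the peeled single-clause case, assuming a clause table of (n_params, has_rest) pairs. The types clause_t and bc_fn_t and the function select_clause are hypothetical; the real arity dispatch inside mino_bc_run is not shown in this changelog. __builtin_expect is the GCC/Clang builtin named above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical clause descriptor: fixed parameter count plus a rest flag. */
typedef struct {
    int n_params;
    int has_rest;
} clause_t;

typedef struct {
    const clause_t *clauses;
    size_t          n_clauses;
} bc_fn_t;

static int clause_accepts(const clause_t *c, int argc)
{
    return c->has_rest ? argc >= c->n_params : argc == c->n_params;
}

/* Returns the clause that accepts `argc`, or NULL on an arity mismatch. */
const clause_t *select_clause(const bc_fn_t *fn, int argc)
{
    /* Hot shape: exactly one clause. Test it directly and skip the loop. */
    if (__builtin_expect(fn->n_clauses == 1, 1)) {
        return clause_accepts(&fn->clauses[0], argc) ? &fn->clauses[0] : NULL;
    }
    /* Cold shape: multi-arity fns walk the table as before. */
    for (size_t i = 0; i < fn->n_clauses; i++) {
        if (clause_accepts(&fn->clauses[i], argc)) return &fn->clauses[i];
    }
    return NULL;
}
```

The hint only tells the compiler which side of the branch to lay out as the fall-through path; it does not change the result for multi-clause callees.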

Measurement honesty

The plan named a 1.30x fib(25) gate for this release. Measured: no meaningful improvement over v0.221.0 (within run-to-run noise). Root cause analysis on Apple Silicon:

- Dispatch loop savings ~5 ns/call were predicted; actual savings appear to be near zero because the loop's two iterations + two branches predict near-perfectly on a stable callee shape, so the pipeline retires the loop body at branch-predictor rate.
- The bulk of mino_bc_run's per-call cost lives further down in bc_push_window, the regs slot zeroing on pop, and the try-state snapshot setup, none of which this release touches.

Per [[measure-before-after]] / [[no-fakery]]: the architectural change is correct and shipped, but the cycle's headline 1.30x gate is not realistically reachable on this codebase without a larger refactor that bypasses mino_bc_run entirely from the JIT helper (allocating the register window inline + dispatching to mino_jit_invoke directly, skipping push_frame / defining_ns / try-snapshot setup). That refactor is invasive enough to merit its own scope; the v0.222 release ships what it can defensibly verify and v0.224's cycle-close re-evaluates whether to chase the remainder.

release-gate green; mino_jit_call_known_fn_slow correctness verified via parity suite.

v0.221.0 — Known-Callee Fast Path In OP_CALL_CACHED (Scaffolding)

Step 2 of the apply_callable_argv inlining cycle. Adds the kind-aware branch in the JIT's OP_CALL_CACHED inline fast path plus the new helper it calls into. The architectural saving (skipping apply_callable_argv's dispatch switch) is in place, but the measurable win lands in v0.222.0 when the bc-fn entry inlines clause-arity dispatch — the dispatch-switch cost on its own is within the call layer's own measurement noise.

What landed

- mino_apply_known_bc_fn_argv (new external entry in src/eval/fn.c): skips apply_callable_argv's var-unwrap + type-of dispatch switch and delegates straight to the bc-fn invocation core. Defensive: returns to apply_callable_argv if the callee's shape has drifted from what the IC slot captured.
- invoke_bc_fn_argv (new shared core in src/eval/fn.c): always_inline static inline extraction of the bc-fn branch body: lazy compile, fold-staleness recompile, hot-counter bump, JIT invalidation, push/pop frame, defining_ns scope, tail-call trampoline. Called inline from both apply_callable_argv and the new known-callee entry, so the refactor adds no call layer.
- mino_jit_call_known_fn_slow (new helper in src/eval/bc/jit/helpers.c): JIT-side counterpart to mino_jit_call_resolved_slow that routes through the new known entry. Registered in entry.c::g_extern_fns[]; declared in abi.h so stencil sources can reference it.
- stencils/call_cached.c now branches on slot->cached_callable_kind. When the IC slot has classified the callable as MINO_FN_BC_SINGLE and the incoming argc matches cached_fn_n_params (and cached_fn_has_rest == 0), the stencil routes to mino_jit_call_known_fn_slow; otherwise it falls back to mino_jit_call_resolved_slow.

stencils_arm64_darwin.h regenerated. The OP_CALL_CACHED stencil's byte count grew 200 -> 276 because the kind-aware dispatch lives inside the inline fast path before the bl into the slow helper.

Why no measurement gate

The plan named a 1.15x fib(25) gate for this release. Measured: ratio is within noise. Root cause is structural: skipping apply_callable_argv's 3-branch dispatch saves ~3-5 ns/call, but the new IC-slot kind-check branch in the stencil's inline path costs about as much. The architectural saving is real, but it can only materialize after the bc-fn entry skips its own clause-arity dispatch loop, which is v0.222.0's scope. Per [[measure-before-after]] / [[no-fakery]] the release is shipped as scaffolding with that framing instead of being held back or relaxed-gated.

release-gate green.

v0.220.0 — IC Slot Callable-Shape Cache (Setup)

Step 1 of the apply_callable_argv inlining cycle. The IC slot (src/eval/bc/internal.h::mino_bc_ic_slot_t) grew three new fields populated on each cache fill, in preparation for the v0.221+ OP_CALL_CACHED fast-path branch that will skip the apply_callable_argv dispatch switch on observed-stable callees.

- cached_callable_kind (1 byte): one of MINO_IC_CALLABLE_NONE / _PRIM_ARGV / _MINO_FN_BC_SINGLE / _MINO_FN_BC_MULTI / _OTHER. Set in ic_resolve_global (src/eval/bc/vm.c) by a new classify_callable_kind helper that walks one var-deref and inspects the value's type + arity-clause shape.
- cached_fn_has_rest (1 byte) and cached_fn_n_params (2 bytes): populated for MINO_FN_BC_SINGLE; zero otherwise.

Struct size grew 48 -> 56 bytes (existing offsets unchanged; the three new fields fit either in pre-existing padding after kind or after the trailing cached_type pointer). The JIT layout asserts in src/eval/bc/jit/entry.c pin every offset including the new ones, and the runtime_layout.h mirror tracks the same struct.
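
The 48 -> 56 growth with unchanged offsets can be pictured with a stand-in layout. Everything below is hypothetical apart from the three new field names and the presence of kind and cached_type: the other fields, their order, and the choice of which new field lands in the padding are assumptions, and pointers are modelled as uint64_t so the sketch's layout does not depend on the host's pointer width. It is not the real mino_bc_ic_slot_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: 48 bytes, with padding after `kind`. */
typedef struct {
    uint8_t  kind;         /* offset 0, followed by 3 bytes of padding */
    uint32_t gen;          /* offset 4 */
    uint64_t key;          /* offset 8  (pointer stand-in) */
    uint64_t cached;       /* offset 16 (pointer stand-in) */
    uint64_t dyn_stack;    /* offset 24 (pointer stand-in) */
    uint64_t aux;          /* offset 32 (placeholder field) */
    uint64_t cached_type;  /* offset 40 (pointer stand-in) */
} ic_slot_v1_t;

/* Hypothetical "after" layout: one new byte reuses the padding, the other
 * two fields trail cached_type and round the size up to the next multiple
 * of the 8-byte alignment. */
typedef struct {
    uint8_t  kind;
    uint8_t  cached_callable_kind;  /* new, in former padding */
    uint32_t gen;
    uint64_t key;
    uint64_t cached;
    uint64_t dyn_stack;
    uint64_t aux;
    uint64_t cached_type;
    uint8_t  cached_fn_has_rest;    /* new, offset 48 */
    uint16_t cached_fn_n_params;    /* new, offset 50 */
} ic_slot_v2_t;

/* Layout asserts in the style the JIT uses to pin offsets. */
_Static_assert(sizeof(ic_slot_v1_t) == 48, "v1 size");
_Static_assert(sizeof(ic_slot_v2_t) == 56, "v2 size");
_Static_assert(offsetof(ic_slot_v2_t, gen) == offsetof(ic_slot_v1_t, gen),
               "existing offsets must not move");
_Static_assert(offsetof(ic_slot_v2_t, cached_type) ==
               offsetof(ic_slot_v1_t, cached_type),
               "existing offsets must not move");
```

Pinning offsets this way turns a silent layout drift into a compile error, which matters when generated stencil bytes bake the offsets and the per-slot stride into their instructions.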

Stencil bytes for OP_GETGLOBAL_CACHED and OP_CALL_CACHED regenerated (stencils_arm64_darwin.h) because the per-slot stride folded into the index multiplication changed.

No behaviour change in this release. The fields are filled on every IC fill but no reader consumes them yet. release-gate green.

v0.219.0 — Regex Engine + str/split With Regex Separators

Two runtime defects surfaced while wiring G3's reloc parser in v0.218.0 (both logged in .local/BUGS.md). Fixes shipped together so the next refactor cycle can use canonical regex forms without the workaround layer that v0.218.0 had to wrap around them.

Regex compile no longer truncates silently

re_compile (src/regex/re.c) carried tinyregex-c's original fixed-size limits of 30 compiled objects and a 40-byte character-class buffer. Patterns past the object limit were silently truncated; when truncation landed inside an open capture group, the post-loop group-balance check rejected the partial compile and re-find surfaced MCT001 invalid regex pattern on any input. The smallest reproducer in the wild was #"#define\s+(MINO_STENCIL_RELOC_[A-Z_0-9]+)\s+(\d+)u?", which overflowed because each literal character takes one slot.

Two changes:

- MAX_REGEXP_OBJECTS 30 -> 256 and MAX_CHAR_CLASS_LEN 40 -> 256. Per-pattern heap footprint goes from ~520 bytes to ~4.3 KB, which is bounded and freed by re_free after each match.
- Overflow is now an explicit compile failure: if the compile loop exits with input remaining, re_compile returns NULL instead of falling through to the group-balance check. Callers (re-find, re-matches, clojure.string/split) translate NULL to MCT001 invalid regex pattern, which is the same diagnostic callers already handle for malformed patterns.
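
The second change, treating leftover input as a hard failure, can be sketched with a toy compiler in which every pattern character takes one object slot. The names toy_regex_t, toy_compile and TOY_MAX_OBJECTS are hypothetical and the cap of 8 is chosen only to keep the example short; this is not the code in src/regex/re.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_OBJECTS 8  /* hypothetical cap, small for demonstration */

typedef struct {
    size_t n;                     /* slots used */
    char   ops[TOY_MAX_OBJECTS];  /* one slot per literal character */
} toy_regex_t;

/* Returns a heap-allocated program, or NULL when allocation fails or the
 * pattern needs more than TOY_MAX_OBJECTS slots. Caller frees with free(). */
toy_regex_t *toy_compile(const char *pattern)
{
    toy_regex_t *re = calloc(1, sizeof *re);
    if (re == NULL) return NULL;

    size_t i = 0;
    while (pattern[i] != '\0' && re->n < TOY_MAX_OBJECTS) {
        re->ops[re->n++] = pattern[i++];
    }

    /* The loop stopped with input remaining: report overflow explicitly
     * instead of handing back a silently truncated program. */
    if (pattern[i] != '\0') {
        free(re);
        return NULL;
    }
    return re;
}
```

With the check in place, an eight-character pattern compiles and a nine-character one is rejected, so a caller never matches against a prefix of the pattern it asked for.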

Regression test: tests/regex_test.clj re-find-long-pattern-with-capture.

`clojure.string/split` now honours regex separators

prim_split (src/prim/string.c) used to treat a regex separator's source string as a literal substring, so #"\s+" never matched any whitespace in the input and split returned the whole string as a one-element vector. The TODO comment in the previous source said as much.

The regex path now compiles the pattern via re_compile and walks match sites using re_matchp, emitting the substring before each match. Zero-width matches advance by one codepoint to avoid an infinite loop on patterns like #"a*". Trailing empty pieces are stripped when limit <= 0, matching JVM Clojure's default String.split(re, 0) behaviour. The string-separator path is unchanged.
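
The walk described above can be approximated with POSIX regcomp / regexec standing in for re_compile / re_matchp. This is a simplified sketch under stated assumptions: split_regex and piece_t are hypothetical names, zero-width matches are skipped by one byte rather than one codepoint and are not treated as split points, and only the strip-trailing-empties behaviour is modelled, with no limit argument.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* A piece is a half-open slice of the input: s[start .. start+len). */
typedef struct {
    size_t start;
    size_t len;
} piece_t;

/* Splits `s` on matches of the POSIX extended regex `pattern`. Stores up to
 * `cap` pieces in `out` and returns the piece count after dropping trailing
 * empty pieces, or (size_t)-1 when the pattern does not compile. */
size_t split_regex(const char *s, const char *pattern, piece_t *out, size_t cap)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return (size_t)-1;

    size_t len = strlen(s);
    size_t n = 0, last_nonempty = 0, piece_start = 0, pos = 0;

    while (pos <= len) {
        regmatch_t m;
        /* After the first search, `s + pos` is no longer start-of-line. */
        if (regexec(&re, s + pos, 1, &m, pos ? REG_NOTBOL : 0) != 0) break;

        size_t ms = pos + (size_t)m.rm_so;
        size_t me = pos + (size_t)m.rm_eo;
        if (me == ms) {        /* zero-width match: step past it so the */
            pos = ms + 1;      /* loop always makes progress            */
            continue;
        }

        /* Emit the substring that precedes this separator match. */
        size_t plen = ms - piece_start;
        if (n < cap) out[n] = (piece_t){ piece_start, plen };
        n++;
        if (plen > 0) last_nonempty = n;
        piece_start = pos = me;
    }

    /* Tail after the last separator. */
    size_t tail = len - piece_start;
    if (n < cap) out[n] = (piece_t){ piece_start, tail };
    n++;
    if (tail > 0) last_nonempty = n;

    regfree(&re);
    return last_nonempty;  /* trailing empty pieces are not counted */
}
```

Splitting "a  b\tc  " on [[:space:]]+ yields the three single-character pieces a, b and c; the empty piece after the trailing whitespace is dropped.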

Regression test: tests/string_test.clj split-with-regex.

Workaround revert

parse-reloc-defines in lib/mino/tasks/builtin.clj (introduced in v0.218.0 to dodge both bugs) is back to its canonical regex form. G3 check-reloc-mirror continues to pass.

v0.218.0 — CI Guardrails For Stencil + Reloc Drift

Closes the three-release post-cycle hygiene cluster. Adds three independent check tasks plus a composite release-gate that fails fast on the first non-OK step. The verbal "kept in sync" comments that used to live around the stencil registry and the reloc enum are now build-time contracts that fire on drift.

New tasks

- check-stencils-fresh (G1) -- runs gen-stencils, then git diff --exit-code against src/eval/bc/stencils/generated/. Non-zero on stale committed byte tables. Fix: ./mino task gen-stencils && git add ....
- check-stencil-registry (G2) -- cross-checks the hardcoded stencil list in lib/mino/tasks/builtin.clj against src/eval/bc/stencils/*.c. Catches both directions: registry entries that reference non-existent sources, and orphan source files with no registry entry. Also pins stencil_op_<basename> prefix shape. __proto_-prefixed sources are skipped on both sides -- that prefix is reserved for the op-fusion prototype branch's throwaway stencils.
- check-reloc-mirror (G3) -- parses MINO_STENCIL_RELOC_* #defines from src/eval/bc/jit/internal.h (runtime side) and tools/stencil_extract.c (toolchain side), asserts the shared keys agree on integer value; then runs tools/stencil_extract --selftest for the extractor's own internal consistency. Cross-file value mismatch fires independent of either file's internal consistency.
- release-gate -- composite, fail-fast, in order:
  1. check-reloc-mirror
  2. check-stencil-registry
  3. check-stencils-fresh
  4. test-suite
  5. mino_asan + suite
  6. test-jit-parity

Verification

./mino task release-gate exits 0 on a clean tree. Each guardrail verified individually under its negative control:

- G1: flipped + to - in add_ii.c; check-stencils-fresh reports the generated header is stale.
- G2: added an orphan fake_op.c to src/eval/bc/stencils/; check-stencil-registry reports the missing registry entry.
- G3: flipped MINO_STENCIL_RELOC_ARM64_PAGE21 value in src/eval/bc/jit/internal.h; check-reloc-mirror reports runtime=99 extractor=0.

All controls reverted and release-gate re-runs clean.

v0.217.0 — Boundary Parity Tests Between JIT And Interpreter

Adds tests/jit_parity_test.clj (47 deftests) plus two task-runner entries that build ./mino (JIT enabled) and ./mino_nojit (-DMINO_CPJIT=1 stripped) and assert their stdout bytes match byte-for-byte when running the parity test. Second of three post-cycle hygiene releases.

What the parity test pins

Each of the 16 inlined arith / cmp / unary stencils now has at least one deftest exercising it through a hot wrapper fn warmed past MINO_JIT_THRESHOLD:

- Range boundaries for II / IK arith (ADD / SUB / MUL): MINO_INT_MAX, MINO_INT_MIN, overflow-promotion paths.
- Tag-miss for all 16 ops: each handler called with a non-int operand so the inline tag check fails and the stencil dispatches to its mino_jit_*_slow helper.
- Comparison-result identity for the 8 cmp ops (< <= > >= = in II, plus < <= = in IK).
- Unary boundaries for INC_I / DEC_I / ZERO_INT_P.

Each assertion compares against a literal expected value; the binary stdout-diff catches anything the literal misses (different diagnostic strings, different boxed-int representations on overflow, divergent coercion).

Build / task additions

- build-nojit task -- produces ./mino_nojit by filtering -DMINO_CPJIT=1 out of the runtime CFLAGS. The bytecode interpreter handles every op; the JIT module compiles to its no-op stubs.
- test-jit-parity task -- builds both binaries, runs the parity test against each, asserts byte-identical stdout AND both exit 0. Uses sh (not sh!) so a non-zero exit reports as a parity failure rather than crashing the task; writes jit-parity-jit.out and jit-parity-nojit.out plus a diff summary when divergent.

Verification

./mino task test-jit-parity exits 0 on a clean tree (47 tests / 47 assertions on both binaries). Negative control verified by flipping r = lhs + rhs to r = lhs - rhs in src/eval/bc/stencils/add_ii.c, regenerating stencils, and re-running: parity fails with three assertion mismatches in add-ii-normal / add-ii-max-overflow / add-ii-min-underflow. Sabotage reverted.

./mino task test (1735 tests / 7903 assertions) and ASan suite both green.

v0.216.0 — Split jit.c Into Five Translation Units

Refactor-only: src/eval/bc/jit.c (2012 lines) splits into a jit/ subdirectory of five one-word-named files plus one private header, matching the existing convention used under src/gc/ and src/collections/. No behaviour change. The first of three post-cycle hygiene releases that retire structural debts surfaced by the CPJIT-speedup cycle's external review.

New layout:

- src/eval/bc/jit/entry.c -- host detection, layout asserts, stencil descriptor table, eligibility classifier, extern-helper resolution, public entry points (mino_jit_compile, mino_jit_invoke, mino_jit_invalidate, mino_jit_offset_to_pc).
- src/eval/bc/jit/stats.c -- MINO_CPJIT_STATS=1 attribution sink.
- src/eval/bc/jit/helpers.c -- the mino_jit_*_slow cold helpers that stencils dispatch into on a fast-path miss.
- src/eval/bc/jit/patcher.c -- ARM64 instruction patchers (adrp / pageoff12 / branch26 / imm19) plus direct-emit byte templates and the trampoline writer.
- src/eval/bc/jit/emit.c -- region book-keeping (mino_jit_free_all), per-instance emit_stencil, and the top-level mino_jit_compile_inner two-pass copy-and-patch walk.
- src/eval/bc/jit/internal.h -- private interface shared between the five TUs.

Verification

JIT'd region bytes for fib(25) are byte-identical pre- and post-split under MINO_CPJIT_TRACE=2 (only the mmap base address varies between runs of either binary, as expected). Test suite (1688 tests / 7854 assertions) and ASan suite both green.

Seams the split opens

The five-file shape isolates the seams the next two cycles will pull on. patcher.c will gain non-ARM64 patchers (x86_64, ARM64 Linux) without touching the emit pipeline; stats.c can be quietly omitted from a -DMINO_CPJIT_NO_STATS build; internal.h carries the cross-TU contract that the v0.218 stencil-registry guardrail will check.

v0.215.0 — CPJIT Speedup Cycle Close

Closes the speedup follow-on cycle that began at v0.210.0. The cycle took the coverage cycle's shape eligibility (v0.203.0-v0.209.0) and moved the per-op cost on hot paths from "bl into slow helper that does the work" to "inline the work, bl only on miss." Five releases shipped:

- v0.210.0 Stencil runtime-layout header (foundation, no-op)
- v0.211.0 Inline OP_GETGLOBAL_CACHED hit path
- v0.212.0 Inline INC_I / DEC_I / ZERO_INT_P fast lanes
- v0.213.0 Inline arith II + IK families (13 stencils)
- v0.214.0 Inline OP_CALL_CACHED resolve fast path

The originally-planned v0.215.0 (self-recursive tail-call shortcut via a new OP_TAILCALL_SELF opcode) is deferred. Its surface widens the cycle from "JIT inlining" into "bytecode shape evolution" -- the new opcode needs a compile.c-side recogniser plus an interpreter handler -- and the plan flagged the scope-creep risk explicitly. The remaining work, including the self-tail-call shortcut, moves to a future cycle on bytecode-shape evolution.

Speedup (median of three runs, v0.214.0 vs the v0.210.0 baseline)

realistic_bench -- the cycle's headline suite:

| Workload                          | v0.210.0   | v0.214.0   | Ratio |
|-----------------------------------|------------|------------|-------|
| fibonacci(25)                     | 7.75ms/op  | 6.53ms/op  | 1.19x |
| map/filter/map/reduce over 50k    | 773us/op   | 730us/op   | 1.06x |
| build 5k int-map and sum          | 11.31ms/op | 11.37ms/op | 0.99x |
| bump 5k int-map values            | 20.21ms/op | 20.22ms/op | 1.00x |
| nested vectors 500x100            | 21.66ms/op | 21.4ms/op  | 1.01x |
| realize 10k of lazy range         | 6.80ms/op  | 6.7ms/op   | 1.02x |

jit_bench leaf rows (one op per iter, dominated by wrapper-call overhead): all rows within run-to-run noise (0.97x - 1.02x).

tests/run.clj wall-clock: 16.04s -> 16.03s, ratio ~1.00x.

Speedup gate evaluation

The plan asked for:

- jit_bench eligible rows geomean >= 1.5x -- not hit (rows at ~1.0x; the per-iter work is too thin to expose the inline savings)
- realistic_bench geomean >= 1.3x -- not hit (geomean ~1.04x; fib carries the suite at 1.19x, others flat)
- tests/run.clj wall-clock >= 1.2x -- not hit (1.00x)
- No row in any suite below 0.95x -- hit (lowest realistic_bench row 0.99x; jit_bench rows bottom out at 0.97x)
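
For reference, the ~1.04x realistic_bench geomean can be reproduced from the six rounded ratios in the table above (a worked check on the published figures, not a new measurement):

```latex
\[
\bar{r}_{\mathrm{geo}}
  = \left(1.19 \times 1.06 \times 0.99 \times 1.00 \times 1.01 \times 1.02\right)^{1/6}
  \approx \left(1.2865\right)^{1/6}
  \approx 1.043
\]
```

Because the geometric mean multiplies the ratios, the single 1.19x row lifts the suite by only about four percent when the other five rows sit near 1.00x.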

The cycle's strongest single row is fibonacci(25) at 1.19x, just below the gate's 1.2x floor.

Honest read on the shortfall

The plan's hypothesis was that helper-call overhead was 30-50% of per-stencil cost. The measurement landed at "real but small": the inline checks save the bl plus prologue/epilogue, ~5-10ns per stencil, but the workloads' iter cost is dominated by either the unavoidable apply_callable_argv dispatch (call-heavy) or by code outside the JIT region (collection allocation in build/bump, HAMT walks in pipeline, lazy-cell allocation in lazy-range). The inline savings sit below the iter-cost floor on most rows.

Fibonacci is the outlier because every iter is two OP_CALL_CACHED ops back-to-back with virtually no non-JIT work between them; the per-op savings stack visibly.

The cycle nonetheless lands durable infrastructure that subsequent cycles can reuse:

- Stencil-layer runtime-layout header with offset constants and accessor macros, verified at jit.c compile time against the canonical struct layout. Future inlining work routes through these without reopening the type-visibility question.
- Multi-ret chain patching in mino_jit_compile. Previously the patcher rewrote only the first ret in a stencil's span; any inlined fast path that didn't share a prologue with the cold path would short-circuit out of the JIT region on miss. Surfaced as a silent-assertion-drop bug in clojure.test the moment v0.211 landed. The fixed patcher rewrites every ret, unblocking arbitrary inline-fast-path patterns.
- S->jit_invoke_ctx publication in mino_jit_invoke. The stencil layer reads ctx->dyn_stack via a single fixed-offset load from S, dodging the Darwin TLVP relocations the stencil_extract tool does not model. Same field will carry other per-invocation context the JIT layer needs in subsequent cycles.
- mino_jit_call_resolved_slow helper -- the smallest viable post-resolve dispatch path. Reusable by any future cached-call stencil that resolves the callee inline.

Cycle deliverables vs decisions deferred

Shipped (5 releases). Deferred:

- Self-recursive tail-call shortcut: requires a new OP_TAILCALL_SELF bytecode opcode + recogniser in compile.c + interpreter handler. Out-of-cycle: bytecode-shape work belongs to its own cycle.
- jit_bench in perf_gate.clj: not wired. The gate floor here is below the row variance; pinning a gate that the workload's own run-to-run noise can violate would just flag false positives. Revisit after a cycle that delivers a cleaner per-row signal.

Verification

- make -j8 clean (release + ASan).
- 1688 tests / 7854 assertions pass under release + ASan.
- gen-stencils produces a clean stencils_arm64_darwin.h.
- Full bench matrix captured against v0.210.0 baseline; deltas tabled above.

Next

A bytecode-shape evolution cycle owns the deferred self-tail-call work and the remaining unknown-op long tail (OP_GET_KW_MAP, OP_FIRST_VEC, OP_NTH_VEC, OP_MAKE_LAZY, OP_THROW). The portability cycle (x86_64 / ARM64 Linux / Windows COFF) remains scheduled after the speedup-and-shape work settles.

v0.214.0 — Inline OP_CALL_CACHED Resolve Fast Path

The OP_CALL_CACHED stencil now inlines the IC-slot hit check (same shape as OP_GETGLOBAL_CACHED inlined in v0.211): read the slot, verify cached / gen / dyn_stack, and on hit hand the pre-resolved callee directly to a thin new helper that skips the IC cascade and goes straight to apply_callable_argv. On miss (slot unfilled, gen stale, or dyn binding active) the stencil falls through to the existing mino_jit_call_cached_slow which runs the full IC resolve.
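
The three-part hit check can be sketched as below. The struct shapes are hypothetical simplifications: the changelog names cached, gen and dyn_stack, but which structure holds the current generation, and the names ic_slot_hit_t, jit_ctx_sketch_t and ic_try_hit, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot view: the cached callee and the generation it was
 * resolved under. */
typedef struct {
    void    *cached;
    uint32_t gen;
} ic_slot_hit_t;

/* Hypothetical per-invocation context: current global generation and the
 * dynamic-binding stack (NULL when no dynamic binding is active). */
typedef struct {
    uint32_t global_gen;
    void    *dyn_stack;
} jit_ctx_sketch_t;

/* Returns the pre-resolved callee on a hit, NULL on any miss so the caller
 * falls through to the full IC resolve. */
void *ic_try_hit(const ic_slot_hit_t *slot, const jit_ctx_sketch_t *ctx)
{
    if (slot->cached == NULL)          return NULL;  /* slot unfilled */
    if (slot->gen != ctx->global_gen)  return NULL;  /* gen stale */
    if (ctx->dyn_stack != NULL)        return NULL;  /* dyn binding active */
    return slot->cached;
}
```

All three checks are plain loads and compares, which is what lets the stencil perform them inline and reserve the function call for the miss path.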

The new mino_jit_call_resolved_slow helper is the smallest viable post-resolve path: it pulls env off the published ctx and dispatches through apply_callable_argv. Same regs / GC refresh contract as the cached_slow path.

The actual call into apply_callable_argv is unavoidable at this release -- it walks the callable's dispatch table and may invoke arbitrary mino code that triggers GC. The win on the hit branch is purely the IC-resolve step (a function call, a slot reread, a gen compare, a dyn-stack reread). Per-call savings are a handful of cycles, but high-frequency cached call sites compound them.

Measurement

Median of three runs on ARM64 Darwin:

| Workload                            | v0.210.0   | v0.214.0   | Ratio |
|-------------------------------------|------------|------------|-------|
| realistic_bench fibonacci(25)       | 7.75ms/op  | 6.53ms/op  | 1.19x |
| realistic_bench map/filter/m/r 50k  | 773us/op   | 730us/op   | 1.06x |
| realistic_bench build 5k int-map    | 11.31ms/op | 11.37ms/op | 0.99x |
| realistic_bench bump 5k int-map     | 20.21ms/op | 20.22ms/op | 1.00x |

Fibonacci is the visible signal. Pure-recursive arithmetic with no intermediate allocations, every iter is (+ (fib (- n 1)) (fib (- n 2))): the body has two OP_CALL_CACHED ops and the call frequency dwarfs everything else. The inline-resolve fast path saves a bl per call. Over the ~150,000 calls fib(25) generates, that adds up to the observed 19% speedup.

Workloads dominated by collection mutation (build/bump 5k int-map) or HAMT / lazy-seq traversal (map/filter/map/reduce) spend most of their time outside the JIT region in mino_assoc / mino_conj / the lazy realisation slow paths. Their per-row deltas stay at run-to-run noise.

Speedup-gate state

Cumulative through v0.214.0: the cycle's headline row family still does not include a >= 1.2x row. Fibonacci hits 1.19x, just below the gate floor; map/filter/map/reduce sits at 1.06x; the rest are flat.

The mid-cycle gate (after v0.212) called for 1.2x; v0.214 brings the strongest row to within 1 percentage point. Continuing to v0.215 (self-recursive tail-call shortcut) -- fibonacci has been retained specifically as the workload where v0.215 should compound, since fib's tail-position recur-on-self pattern is exactly what the new short-circuit targets.

Verification

- make -j8 clean (release + ASan).
- 1688 tests / 7854 assertions pass under release + ASan.
- gen-stencils produces a clean stencils_arm64_darwin.h.

v0.213.0 — Inline Arith II + IK Families

Thirteen stencils that had been bl-ing into binop_int_fast now inline the tagged-int fast lane:

- II arith: OP_ADD_II, OP_SUB_II, OP_MUL_II. Both operands tagged-int, 64-bit signed arithmetic, range check against MINO_INT_MAX / MIN, tag-encoded store. OP_MUL_II uses __builtin_smulll_overflow for the 60x60-bit product since long long is 64 bits.
- II comparison: OP_LT_II, OP_LE_II, OP_GT_II, OP_GE_II, OP_EQ_II. Both operands tagged-int, 64-bit signed compare, tagged-bool store. No overflow path.
- IK arith: OP_ADD_IK, OP_SUB_IK. Lhs tagged-int (the pre-tagged immediate is trusted); range check; tag-encoded store.
- IK comparison: OP_LT_IK, OP_LE_IK, OP_EQ_IK. Lhs tagged-int; compare; tagged-bool store.

On a tag miss (boxed int, non-numeric, mixed type) or arith over/underflow each stencil falls through to the existing mino_jit_binop_slow / mino_jit_binop_k_slow helpers, which route through prim_add / prim_lt / etc. for the boxed-int / bigint-promote / diagnostic paths.
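
The fast-lane / fall-through split described above can be sketched in plain C. This is a minimal illustration, not the stencil source: the tag layout (a 60-bit payload above a 4-bit tag) and every `demo_` / `DEMO_` name are assumptions made for the example; only `__builtin_smulll_overflow` is the real GCC / clang builtin the entry names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding for illustration only: a 60-bit signed payload
 * shifted left by 4, with a 4-bit tag in the low bits. */
#define DEMO_TAG_BITS 4
#define DEMO_TAG_MASK 0xFu
#define DEMO_TAG_INT  0x1u
#define DEMO_INT_MAX  (((long long)1 << 59) - 1)
#define DEMO_INT_MIN  (-((long long)1 << 59))

typedef uint64_t demo_val;

static bool demo_is_int(demo_val v) {
    return (v & DEMO_TAG_MASK) == DEMO_TAG_INT;
}

static demo_val demo_make_int(long long n) {
    assert(n >= DEMO_INT_MIN && n <= DEMO_INT_MAX);
    return ((uint64_t)n << DEMO_TAG_BITS) | DEMO_TAG_INT;
}

static long long demo_int_val(demo_val v) {
    /* Clear the tag, reinterpret as signed, then divide exactly by 16. */
    return (long long)(int64_t)(v & ~(uint64_t)DEMO_TAG_MASK) / 16;
}

/* Add fast lane: returns true and writes *out on a hit; returns false on a
 * tag miss or range overflow so the caller can take the slow helper. */
static bool demo_add_ii_fast(demo_val a, demo_val b, demo_val *out) {
    long long r;
    if (!demo_is_int(a) || !demo_is_int(b)) return false;   /* tag miss */
    r = demo_int_val(a) + demo_int_val(b);  /* 60-bit operands cannot overflow 64 bits */
    if (r < DEMO_INT_MIN || r > DEMO_INT_MAX) return false; /* outside payload range */
    *out = demo_make_int(r);
    return true;
}

/* Multiply fast lane: the product of two 60-bit values can exceed 64 bits,
 * so the 64-bit overflow is checked before the payload range check. */
static bool demo_mul_ii_fast(demo_val a, demo_val b, demo_val *out) {
    long long r;
    if (!demo_is_int(a) || !demo_is_int(b)) return false;
    if (__builtin_smulll_overflow(demo_int_val(a), demo_int_val(b), &r))
        return false;
    if (r < DEMO_INT_MIN || r > DEMO_INT_MAX) return false;
    *out = demo_make_int(r);
    return true;
}
```

A caller would try the fast function first and route to its boxed / bigint path only when it returns false.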

Measurement

Median of three runs on ARM64 Darwin:

| Workload                           | v0.210.0   | v0.213.0   | Ratio |
|------------------------------------|------------|------------|-------|
| jit_bench (add 1 2) x 1M           | 1.80us/op  | 1.77us/op  | 1.02x |
| jit_bench (sub 3 2) x 1M           | 1.79us/op  | 1.76us/op  | 1.02x |
| jit_bench (mul 2 3) x 1M           | 1.76us/op  | 1.73us/op  | 1.02x |
| jit_bench (sum-to 1000) x 1K       | 19.04us/op | 19.43us/op | 0.98x |
| jit_bench (countdown 1000) x 1K    | 2.41us/op  | 2.38us/op  | 1.01x |
| realistic_bench map/filter/m/r     | 779us/op   | 730us/op   | 1.07x |

Speedup-gate state

Through v0.213.0, the cumulative speedup on the cycle's headline row (realistic_bench map/filter/map/reduce) sits at ~1.07-1.12x across runs vs the v0.210.0 baseline. The plan asked for 1.2x by the end of v0.212; we are at 1.12x peak. v0.213's per-row deltas are smaller still: the leaf arith benches in jit_bench have one arith op per iter, so the inline savings are below the iter-cost floor. The realistic workload exercises the unary fast lane (v0.212) more than the binary arith fast lane on its hot path.

Continuing to v0.214 (inline OP_CALL_CACHED resolve fast path) -- the cycle's highest-volume cached op and the one with the biggest expected single-release contribution to the cycle's target. If the v0.214 + v0.215 close still leaves the headline row below 1.2x we will document the gap in the cycle-close entry rather than press on with v0.216-equivalent work.

Verification

- make -j8 clean (release + ASan).
- 1688 tests / 7854 assertions pass under release + ASan.
- gen-stencils produces a clean stencils_arm64_darwin.h.

v0.212.0 — Inline INC_I / DEC_I / ZERO_INT_P Fast Lanes

The three single-operand tagged-int stencils that had been calling unop_int_fast via bl now inline the entire fast lane in the stencil bytes:

- OP_INC_I: tag check on operand, increment, overflow check against MINO_INT_MAX, tag-encoded store.
- OP_DEC_I: tag check, decrement, underflow check against MINO_INT_MIN, tag-encoded store.
- OP_ZERO_INT_P: tag check, compare to zero, tag-encoded bool store via the new MINO_MAKE_BOOL macro added to runtime_layout.h.

On a tag miss (boxed int, non-numeric, etc.) or arith over/underflow the stencils fall through to the existing mino_jit_unop_slow helper, which routes through prim_inc / prim_dec / prim_zero_p exactly as the v0.210 stencils did.

Mechanism reuses the v0.211 patch-all-rets infrastructure: the inline fast path lands in the lean basic block; the cold slow path sits behind the prologue / call. Both ret instructions chain to the next stencil.

Measurement

Median of three runs on ARM64 Darwin:

| Workload                           | v0.210.0   | v0.212.0  | Ratio |
|------------------------------------|------------|-----------|-------|
| jit_bench (inc 1) x 1M             | 1.78us/op  | 1.75us/op | 1.02x |
| jit_bench (dec 1) x 1M             | 1.80us/op  | 1.79us/op | 1.01x |
| jit_bench (zero? 0) x 1M           | 1.80us/op  | 1.79us/op | 1.01x |
| realistic_bench map/filter/m/r     | 849us/op   | 758us/op  | 1.12x |
| jit_bench (countdown 1000) x 1K    | 2.42us/op  | 2.39us/op | 1.01x |

The jit_bench unary rows -- (fn [] (inc-fn 1)) etc. -- still sit at the noise floor. Each iter has one OP_INC_I per call, and the inline saves only the bl plus helper prologue, ~5ns out of a 1.75us iter.

The realistic_bench map/filter/map/reduce over 50k row is the visible signal: a 12% improvement on a workload whose hot path runs many tagged-int arith operations per iter. v0.210.0 → v0.211.0 moved that row marginally; v0.211.0 → v0.212.0 contributes the unary-fast-lane savings on top.

Speedup-gate state

The plan's mid-cycle risk callout asked: "If v0.211 + v0.212 together don't show >= 1.2x on at least one bench row, pause the cycle." Map/filter/map/reduce comes in at 1.12x -- a real, reproducible win that beats the 0.95x regression floor but lands below the 1.2x target.

Decision: continue to v0.213.0 (the arith / comparison II + IK family, thirteen stencils). v0.213.0 is the cycle's per-op-density release: a body with five tagged-int ops sees five inline checks per iter instead of five bl's, multiplying the per-op saving by the body's arith density. If v0.213.0 also lands below the 1.2x mark on the headline row, pause and re-profile.

Verification

- make -j8 clean (release + ASan).
- 1688 tests / 7854 assertions pass under release + ASan.
- gen-stencils produces a clean stencils_arm64_darwin.h.

v0.211.0 — Inline OP_GETGLOBAL_CACHED Hit Path

First release that consumes runtime_layout.h. The OP_GETGLOBAL_CACHED stencil now reads the IC slot, verifies cached / gen / dyn_stack inline, and writes the cached value to regs[A] without bl-ing into the slow helper. On miss (slot unfilled, gen stale, or dyn binding active) the stencil falls through to mino_jit_getglobal_cached_slow which runs the same full resolve cascade the interpreter does.
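
A minimal sketch of the hit / miss split, assuming simplified stand-in structs. The field names follow this entry's wording (cached, gen, dyn_stack), but the real layouts are not reproduced here, and all `demo_` names are hypothetical. The stand-in slow helper only refills the slot; the real one runs the full resolve cascade.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified shapes for illustration. */
typedef struct { void *cached; uint32_t gen; } demo_ic_slot;
typedef struct { void *dyn_stack; } demo_ctx;
typedef struct { uint32_t ic_gen; demo_ctx *jit_invoke_ctx; } demo_state;

static int demo_slow_calls = 0;

/* Stand-in for the out-of-line helper: counts the call and refills the slot. */
static void *demo_getglobal_slow(demo_state *S, demo_ic_slot *slot, void *resolved) {
    demo_slow_calls++;
    slot->cached = resolved;
    slot->gen = S->ic_gen;
    return resolved;
}

/* Inline hit check: three loads and compares, no call on the hit branch. */
static void *demo_getglobal_cached(demo_state *S, demo_ic_slot *slot, void *resolved) {
    if (slot->cached != NULL                          /* slot filled */
        && slot->gen == S->ic_gen                     /* generation still current */
        && S->jit_invoke_ctx->dyn_stack == NULL) {    /* no dyn binding active */
        return slot->cached;
    }
    return demo_getglobal_slow(S, slot, resolved);    /* miss: full resolve */
}
```

Bumping `ic_gen` or setting a non-NULL `dyn_stack` forces the next read back through the slow helper, which mirrors the three miss conditions listed above.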

Three pieces had to land together

1. Published ctx on mino_state_t::jit_invoke_ctx. Darwin's __thread access compiles to a TLVP-class relocation (ARM64_RELOC_TLVP_LOAD_PAGE21 / _PAGEOFF12) that the stencil_extract tool does not model. To let the inline fast path read ctx->dyn_stack without a TLS round-trip, mino_jit_invoke now publishes the calling thread's ctx on the state struct before jumping into the JIT region and restores it on return. The stencil reaches the ctx via a single fixed-offset load from S. Save / restore supports nested JIT entry.

2. Patch every ret in a non-final stencil's span, not just the first. Inlining the hit path produced two basic blocks: a lean fast-path exit (no prologue, immediate ret) and a cold slow-path exit (with prologue + helper bl, ending in its own ret). The previous chain-patcher walked to the first ret and rewrote it to a b <next>; the slow path's ret survived and short-circuited out of the JIT region whenever a cache miss hit. The patcher now walks the full span and rewrites every ret to chain. The bug silently broke any test whose hot fn read a dyn-bound global (every clojure.test assert-pass! invocation, for example) -- the fast path bailed correctly, the slow helper resolved correctly, but the second ret returned mid-stencil before the rest of the fn body ran.

3. runtime_layout.h typedef gating widened. Stencil .c sources include abi.h (which typedefs mino_val_t, mino_state_t, mino_bc_fn_t) and now also runtime_layout.h. The redefinition-of-typedef warning under -std=c99 -Wpedantic would trip on the second declaration; the gating guards now also recognise MINO_BC_STENCIL_ABI_H.
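
The patch-every-ret rule from item 2 can be illustrated with a small walker over A64 instruction words. This is a sketch under assumptions: `demo_patch_all_rets` is a hypothetical name, the span is assumed to hold only instructions (no literal data that happens to match the ret encoding), and the branch target is assumed to lie within the range of a single `b`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define A64_RET 0xD65F03C0u   /* ret (returns via x30) */
#define A64_B   0x14000000u   /* b <imm26>; imm26 is a signed offset in instructions */

/* Rewrites every ret in code[0..n) into a branch to instruction index
 * `next`, and returns how many instructions were patched. Stopping at the
 * first match would leave later exits returning out of the region. */
static size_t demo_patch_all_rets(uint32_t *code, size_t n, size_t next) {
    size_t i, patched = 0;
    for (i = 0; i < n; i++) {
        if (code[i] == A64_RET) {
            int64_t delta = (int64_t)next - (int64_t)i;        /* in instructions */
            code[i] = A64_B | ((uint32_t)delta & 0x03FFFFFFu); /* keep low 26 bits */
            patched++;
        }
    }
    return patched;
}
```

For a four-word span with ret at indices 1 and 3 and `next` equal to 4, both words become forward branches with offsets of 3 and 1 instructions.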

Measurement

jit_bench and realistic_bench deltas vs v0.210.0 baseline, median of three runs on the same hardware (ARM64 Darwin):

| Workload                           | v0.210.0  | v0.211.0  | Ratio |
|------------------------------------|-----------|-----------|-------|
| jit_bench (id 7) x 1M              | 1.52us/op | 1.56us/op | 0.97x |
| jit_bench (add 1 2) x 1M           | 1.90us/op | 1.86us/op | 1.02x |
| jit_bench (mul 2 3) x 1M           | 1.74us/op | 1.83us/op | 0.95x |
| jit_bench (sum-to 1000) x 1K       | 19.4us/op | 19.7us/op | 0.98x |
| realistic_bench map/filter/m/r     | 782us/op  | 819us/op  | 0.95x |

Honest read: every row is within run-to-run noise. The mechanism lands but the per-row inline-savings on these workloads is below the measurement floor.

The reason is that each bench has at most one or two OP_GETGLOBAL_CACHED ops on the hot path (one in the wrapper, one inside the inner fn's body); each saves about a bl plus a prologue / epilogue pair, roughly 5-10ns. With the workload itself costing 1-20us per iter, the saved fraction is sub-1%. The win compounds over subsequent releases: v0.212.0 inlines the unary fast lanes (INC_I / DEC_I / ZERO_INT_P), v0.213.0 inlines the II / IK arith and comparison families, v0.214.0 inlines OP_CALL_CACHED's resolve step. Together a single JIT region's stencil chain ends up doing five-to-ten inline checks per iter, and the iter cost floor shifts visibly.

Verification

- make -j8 clean (release + ASan).
- 1688 tests / 7854 assertions pass under release and ASan. Both suites turned out to be the surface that exposed the short-circuit bug: pre-fix, the suite ran but assert-pass!'s JIT'd body returned mid-region on every dyn-bound read, silently dropping the assertion counters; post-fix, both come back green.
- gen-stencils produces a clean stencils_arm64_darwin.h.

Next

v0.212.0 inlines the unary tagged-int fast lanes (INC_I, DEC_I, ZERO_INT_P). Each is a single tag check + arithmetic + overflow check + tag-encoded store. The plan's risk gate at the v0.211 + v0.212 checkpoint: if neither release shows >= 1.2x on a real-workload row, pause and re-profile. v0.211.0 in isolation landed below that line. The mechanism is sound; the question for v0.212.0 is whether the unary tag-check inline crosses the measurement floor.

v0.210.0 — JIT Stencil-Layer Runtime-Layout Header

Foundation release. Lands src/eval/bc/stencils/runtime_layout.h, a curated stencil-side view of the runtime struct fields the JIT's inline fast paths will read in subsequent releases.

Stencil .c sources compile standalone with -fno-builtin -fno-optimize-sibling-calls and no -I path beyond stencils/, so they cannot reach mino.h, mino_internal.h, runtime/internal.h, or eval/bc/internal.h directly. Today every cached / arith / call stencil routes through a slow helper via bl; the helper contains the same fast-path-then-resolve logic the interpreter runs inline, so the JIT pays for the bl and the chain-ABI dance while matching (rather than beating) the interpreter's per-op cost on hit paths.

The follow-on releases on this branch move the IC-slot check, tagged-int arith, and call-resolve fast paths inline inside the stencil bytes; on hit the stencil never bls. That requires visibility into mino_state_t::ic_gen, mino_state_t::bc_regs, mino_thread_ctx_t::dyn_stack, and mino_bc_fn_t::ic_slots from within stencil sources. The new header provides that visibility without dragging the canonical headers into the hermetic stencil compilation unit.

Mechanism:

- Forward-declared struct tags + selectively-gated typedefs (each gated by the canonical header's MINO_H / RUNTIME_INTERNAL_H / MINO_EVAL_BC_INTERNAL_H include guard) so this header is also includable from runtime translation units that already see the canonical definitions.
- The IC slot struct is mirrored field-for-field; tagged-int macros (MINO_TAG_INT, MINO_INT_VAL, MINO_MAKE_INT, MINO_INT_MAX, etc.) are re-exported.
- Layout-anchor offset constants (MINO_JIT_LAYOUT_OFFSET_*) and accessor macros (MINO_JIT_STATE_IC_GEN, MINO_JIT_CTX_DYN_STACK, MINO_JIT_BC_IC_SLOTS, MINO_JIT_CURRENT_CTX) for fields the stencils will read.
- mino_tls_ctx re-declared so the inlined MINO_JIT_CURRENT_CTX macro can do TLS load + main_ctx fallback without a bl.
- src/eval/bc/jit.c (which sees both the canonical typedefs and the new header) fires C99-compatible build-time asserts against offsetof(<real struct>, <field>), so any field reorder in runtime/internal.h or eval/bc/internal.h surfaces as a compile error in jit.c rather than a stencil mis-read at runtime.
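
One common way to write a C99-compatible build-time assert is a typedef whose array size goes negative when the condition fails. The sketch below shows that pattern against a stand-in struct; the `DEMO_` macro, the struct, and the offset constant are all hypothetical and are not the contents of runtime_layout.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C99 has no _Static_assert; a negative array size forces a compile error. */
#define DEMO_STATIC_ASSERT(name, cond) \
    typedef char demo_assert_##name[(cond) ? 1 : -1]

/* Hypothetical "canonical" struct, standing in for a runtime-internal type. */
struct demo_state_real {
    void    *heap;
    uint32_t ic_gen;
    void   **bc_regs;
};

/* Offset the stencil-side header would hard-code for ic_gen (hypothetical). */
#define DEMO_LAYOUT_OFFSET_IC_GEN sizeof(void *)

/* Fires at compile time if a field reorder moves ic_gen. */
DEMO_STATIC_ASSERT(ic_gen_offset,
    offsetof(struct demo_state_real, ic_gen) == DEMO_LAYOUT_OFFSET_IC_GEN);

/* Stencil-side accessor: reaches the field through the raw offset only,
 * without seeing the struct definition. */
static uint32_t demo_read_ic_gen(const void *state) {
    return *(const uint32_t *)((const char *)state + DEMO_LAYOUT_OFFSET_IC_GEN);
}
```

Moving `ic_gen` ahead of `heap` in the struct would make the typedef's array size -1 and stop the build, which is the failure mode the entry describes for jit.c.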

No stencil .c source consumes runtime_layout.h at this release. Re-running gen-stencils produces a byte-identical stencils_arm64_darwin.h against v0.209.0. Behavioural no-op.

Verification

- make -j8 clean.
- 1688 tests / 7854 assertions pass under both release and ASan.
- git diff src/eval/bc/stencils/generated/stencils_arm64_darwin.h against v0.209.0 is empty after gen-stencils.

Next

v0.211.0 inlines the OP_GETGLOBAL_CACHED hit path through the new accessor macros. That is the first release that should show a measurable speedup on hot-loop fns reading globals; today's helper bl becomes a load + compare + branch.

v0.209.0 — CPJIT Coverage Cycle Close

Closes the seven-release CPJIT coverage cycle. The cycle's headline deliverable was eligibility coverage -- moving the JIT from "leaf fns with recognized ops" to "calls + globals + closures + multi-arity all work" -- with measurement infrastructure that survives the cycle. The infrastructure landed; the eligibility coverage hit the plan's ≥80% target. The speedup gate the plan also called for was not met.

Eligibility coverage (the headline)

MINO_CPJIT_STATS=1 on tests/run.clj:

| Release   | attempts | eligible | top blocker            |
|-----------|----------|----------|------------------------|
| v0.202.0  | 1        | 100%     | (only top-level)       |
| v0.203.0  | 69       | 18.8%    | ic-slots (46%)         |
| v0.204.0  | 69       | 20.3%    | unknown-op (45%)       |
| v0.205.0  | 69       | 26.1%    | OP_TAILCALL (20%)      |
| v0.206.0  | 121      | 68.6%    | captures (12%)         |
| v0.207.0  | 126      | 76.2%    | captures (12%)         |
| v0.208.0  | 143      | 83.9%    | unknown-op (long tail) |

MINO_CPJIT_STATS=1 on jit_bench:

| Release   | attempts | eligible |
|-----------|----------|----------|
| v0.202.0  | ~17      | ~44%     |
| v0.208.0  | 39       | 94.9%    |

The 23 fns still rejected at cycle end are all blocked by unstencilised ops in their bodies (OP_GET_KW_MAP x6, OP_FIRST_VEC / OP_COUNT_VEC / OP_EMPTY_VEC x2-3 each, OP_MAKE_LAZY x6, OP_THROW x1, etc.). The closure / call / global eligibility set is fully covered; what remains is a long tail of small read- and write-side stencils, scheduled outside this cycle.

Raw speedup (JIT vs no-JIT, v0.208.0 binary)

jit_bench counted-loop bodies (arm64 / Darwin, single-shot):

| Workload         | JIT     | no-JIT  | Speedup |
|------------------|---------|---------|---------|
| sum-to 100       | 3.31us  | 3.44us  | 1.04x   |
| sum-to 1000      | 18.65us | 19.27us | 1.03x   |
| count-to 1000    | 3.59us  | 3.57us  | 0.99x   |
| countdown 1000   | 2.22us  | 2.40us  | 1.08x   |
| lockstep 1000    | 3.90us  | 3.92us  | 1.01x   |
| abs -5           | 1.69us  | 1.73us  | 1.02x   |
| (+ x 3) x 1M     | 1.71us  | 1.63us  | 0.95x   |

realistic_bench:

| Workload                | JIT     | no-JIT  | Speedup |
|-------------------------|---------|---------|---------|
| fibonacci(25)           | 7.43ms  | 8.88ms  | 1.20x   |
| map/filter/map/reduce   | 704us   | 757us   | 1.08x   |
| realize 10k lazy range  | 6.39ms  | 6.60ms  | 1.03x   |
| nested vectors 500x100  | 20.56ms | 21.17ms | 1.03x   |
| bump 5k int-map values  | 18.66ms | 18.79ms | 1.01x   |
| build 5k int-map + sum  | 11.17ms | 10.43ms | 0.93x   |

Test suite wall-clock (best of 3 runs):

| Run    | JIT    | no-JIT |
|--------|--------|--------|
| best   | 14.77s | 14.91s |
| avg    | 14.88s | 14.97s |

The plan's targets (jit_bench geomean ≥1.5x, no row <0.95x, test suite ≥1.2x) are not met. The cycle's slow-helper-routed stencils call into mostly the same prim / IC / dispatch paths the interpreter takes inline; clang's chain-ABI register dance plus the per-stencil bl overhead consume the savings that come from skipping the bytecode-dispatch fetch + switch. The cycle's expected unlock -- shape coverage -- is real and large. The speedup unlock the plan also predicted needs a follow-on cycle that moves the cached-fast-paths inline (slot.gen comparison inside the stencil bytes, callee-bc comparison for self-recursive tail-calls, etc.).
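
The geomean and per-row checks can be reproduced from the Speedup columns without a math library, since geomean(r) >= t is equivalent to product(r) >= t^n for positive ratios. The helper names below are hypothetical; the inputs are the ratios tabled above.

```c
#include <assert.h>
#include <stddef.h>

/* geomean(ratios) >= target  <=>  product(ratios) >= target^n. */
static int demo_geomean_at_least(const double *ratios, size_t n, double target) {
    double product = 1.0, bound = 1.0;
    size_t i;
    assert(n > 0);
    for (i = 0; i < n; i++) {
        product *= ratios[i];
        bound *= target;
    }
    return product >= bound;
}

/* Returns 1 if any row falls strictly below the regression limit. */
static int demo_any_row_below(const double *ratios, size_t n, double limit) {
    size_t i;
    for (i = 0; i < n; i++) {
        if (ratios[i] < limit) return 1;
    }
    return 0;
}
```

For the seven jit_bench rows the product of the ratios is about 1.12, against a bound of 1.5^7 (about 17.1), so the geomean gate is missed by a wide margin; the realistic_bench 0.93x row is the one that trips the 0.95x floor.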

Cycle decisions / deferrals

Full test suite (1688 / 7854) passes; ASan rebuild + tests pass.

v0.208.0 — Closure / Env Stencils

Drops the captures blocker by stencilising the four env-related ops the bytecode compiler emits when a fn body contains an inner fn literal:

- OP_CLOSURE -- create a fresh closure over the current env, bind the child bc, store at regs[A].
- OP_PUSH_ENV -- extend the JIT-invoke env with a new frame.
- OP_POP_ENV -- walk the env up one frame.
- OP_ENV_BIND -- publish regs[A] under a symbol into the env.

All four route through the jit_invoke_env ctx field v0.204.0 established, so reads via OP_GETGLOBAL_CACHED's env-lookup branch (the only path that finds env-bound names) see the same env an interpreter pass would see at the same pc. The slow helpers' bodies mirror the interpreter handlers line-for-line.

mino_jit_invoke's save / restore of jit_invoke_env around the JIT region's call means nested JIT regions inherit the right chain at entry and don't leak modifications past the chain's trailing ret.
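
The save / restore discipline is the usual pattern for a published per-thread slot. A minimal sketch, assuming a single-field stand-in ctx and hypothetical `demo_` names; the real mino_jit_invoke does more than this.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const void *jit_invoke_env; } demo_ctx;
typedef void (*demo_region_fn)(demo_ctx *ctx);

/* Publish env for the duration of the region, then restore whatever the
 * caller had published, even if the region changed the slot. */
static void demo_jit_invoke(demo_ctx *ctx, const void *env, demo_region_fn region) {
    const void *saved = ctx->jit_invoke_env;  /* outer region's env, if any */
    ctx->jit_invoke_env = env;
    region(ctx);
    ctx->jit_invoke_env = saved;
}

/* Stand-in envs and regions used to exercise nesting. */
static const int demo_outer_env = 1, demo_inner_env = 2, demo_pushed_env = 3;
static const void *demo_seen_inner;
static const void *demo_seen_after_inner;

static void demo_inner_region(demo_ctx *ctx) {
    demo_seen_inner = ctx->jit_invoke_env;
    ctx->jit_invoke_env = &demo_pushed_env;   /* e.g. a push-env style extension */
}

static void demo_outer_region(demo_ctx *ctx) {
    demo_jit_invoke(ctx, &demo_inner_env, demo_inner_region);
    demo_seen_after_inner = ctx->jit_invoke_env;
}
```

After the nested call returns, the outer region sees its own env again, and after the outermost call the slot is back to its initial value.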

Eligibility-tracer comparison (MINO_CPJIT_STATS=1, tests/run.clj):

| Reason            | v0.207.0 | v0.208.0 |
|-------------------|----------|----------|
| ok                | 96       | 120      |
| captures          | 15       | 0        |
| unknown-op        | 15       | 23       |
| total eligibility | 76.2%    | 83.9%    |

The unknown-op rise reflects the cycle's now-visible long tail: unstencilised body ops the closure-eligible fns happen to use -- OP_GET_KW_MAP (6 fns), OP_FIRST_VEC / OP_COUNT_VEC / etc. (2-3 fns each), OP_MAKE_LAZY (6 fns), OP_THROW (1 fn). All are out-of-cycle follow-on stencil work.

Full test suite (1688 / 7854) passes; ASan rebuild + tests pass.

v0.207.0 — Multi-Arity + Variadic Eligibility

Drops the n_clauses != 1 and has_rest blockers from the eligibility check. Multi-arity fns and & rest variadics are now JIT-eligible.

The two pieces are asymmetric in how much new machinery they need:

- has_rest is free. mino_bc_run's entry-time dispatch builds the rest-collection cons list and places it at regs[n_params] before the JIT region runs; the body reads that slot the same way it reads any other local. The JIT path never sees the rest-collection step. The blocker existed only to keep the eligibility check minimal during the early stencil rollout.

- n_clauses > 1 needs an entry guard. Each clause has its own entry_pc into the shared bytecode stream. The JIT region is one block of mmap'd code with a single ARM64 function prologue at the front -- entering mid-region would skip the callee-saved register saves and corrupt the caller's frame on epilogue. So the JIT-invoke check in mino_bc_run now also gates on match->entry_pc == 0: the JIT fires when the matched clause starts at the region's front (typically clauses[0], the first source-order arity), and falls back to the interpreter for other clauses. Lifting that constraint needs per-clause native entry points; a follow-on cycle's work.
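
The entry guard amounts to one extra comparison after clause matching. A sketch with hypothetical stand-in types; only the `entry_pc == 0` condition comes from this entry.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int n_params; size_t entry_pc; } demo_clause;
typedef struct {
    const demo_clause *clauses;
    size_t n_clauses;
    int has_native;              /* non-zero once a JIT region exists */
} demo_fn;

enum demo_path { DEMO_INTERPRET, DEMO_JIT };

/* Picks the clause whose arity matches argc, or NULL. */
static const demo_clause *demo_match(const demo_fn *fn, int argc) {
    size_t i;
    for (i = 0; i < fn->n_clauses; i++) {
        if (fn->clauses[i].n_params == argc) return &fn->clauses[i];
    }
    return NULL;
}

/* The JIT region has a single prologue at its front, so only a clause
 * that starts at pc 0 may enter it; every other clause is interpreted. */
static enum demo_path demo_choose_path(const demo_fn *fn, int argc) {
    const demo_clause *match = demo_match(fn, argc);
    if (match != NULL && fn->has_native && match->entry_pc == 0) return DEMO_JIT;
    return DEMO_INTERPRET;
}
```

With two clauses at entry_pc 0 and 12, a one-argument call takes the JIT path and a two-argument call falls back to the interpreter.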

Eligibility-tracer comparison (MINO_CPJIT_STATS=1, tests/run.clj):

| Reason            | v0.206.0 | v0.207.0 |
|-------------------|----------|----------|
| ok                | 83       | 96       |
| captures          | 15       | 15       |
| n-clauses         | 7        | 0        |
| has-rest          | 2        | 0        |
| unknown-op        | 14       | 15       |
| total eligibility | 68.6%    | 76.2%    |

Full test suite (1688 / 7854) passes; ASan rebuild + tests pass.

v0.206.0 — OP_CALL Uncached + OP_TAILCALL Stencils

Closes the call story for the cycle's eligibility set. Adds the remaining two call-class stencils:

- OP_CALL -- non-final, uncached path. Reads the callee out of regs[A], hands argv at regs[A+1..A+B] to apply_callable_argv, stores the return value at regs[C]. Compiled for sites where the head isn't a statically-known global (head is itself an expression, a local fn-value, an inline lambda, etc.).

- OP_TAILCALL -- FINAL stencil. Builds the args cons list head-first, publishes (callee, args) on S->tail_call_sentinel, returns the sentinel pointer. The stencil's natural ret hands it back to mino_bc_run, whose caller (apply_callable's trampoline) re-dispatches without growing the C stack. Marking the stencil FINAL keeps the ret in place; subsequent stencils after this pc in the same JIT region are dead code that never runs.

Both helpers read env from jit_invoke_env (the publish point v0.204.0 established), so callable-side env-dependent paths (closures, dynamic resolution against the call frame's binding) keep working.
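
The sentinel-return protocol behind OP_TAILCALL is a trampoline. The sketch below shows the general technique with a single integer argument in place of the args cons list; all `demo_` names and the struct layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_state demo_state;
typedef void *(*demo_fn)(demo_state *S, long arg);

struct demo_state {
    struct { demo_fn callee; long arg; } tail_call_sentinel;
};

/* A body in tail position publishes (callee, arg) on the state and returns
 * the sentinel's address instead of making the call itself. */
static void *demo_tailcall(demo_state *S, demo_fn callee, long arg) {
    S->tail_call_sentinel.callee = callee;
    S->tail_call_sentinel.arg = arg;
    return &S->tail_call_sentinel;
}

static long demo_steps = 0;

/* Counts down to zero through tail calls; returns a pointer to the step
 * counter as its final (non-sentinel) result. */
static void *demo_countdown(demo_state *S, long n) {
    demo_steps++;
    if (n == 0) return &demo_steps;
    return demo_tailcall(S, demo_countdown, n - 1);
}

/* Trampoline: re-dispatches while the sentinel comes back, so the C stack
 * depth stays constant however long the tail-call chain is. */
static void *demo_apply(demo_state *S, demo_fn f, long arg) {
    void *r = f(S, arg);
    while (r == (void *)&S->tail_call_sentinel) {
        r = S->tail_call_sentinel.callee(S, S->tail_call_sentinel.arg);
    }
    return r;
}
```

A million-step countdown completes in a flat loop inside `demo_apply`, where direct recursion of the same depth would risk exhausting the stack.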

Eligibility-tracer comparison across the cycle so far (MINO_CPJIT_STATS=1):

| Workload               | v0.202.0 | v0.205.0 | v0.206.0 |
|------------------------|----------|----------|----------|
| tests/run.clj attempts | ~1       | 69       | 121      |
| tests/run.clj eligible | ~100%    | 26.1%    | 68.6%    |
| jit_bench.clj attempts | 39       | 39       | 39       |
| jit_bench.clj eligible | 43.6%    | 43.6%    | 94.9%    |

(The v0.202.0 tests/run.clj number is dwarfed by the v0.202.0 → v0.203.0 ABI fix that wired the bc-call ABI's warming path through mino_jit_compile; until that landed the runtime barely attempted any compiles outside top-level invocations.)

jit_bench reaches 95% eligibility -- the cycle's "real mino code can JIT" goal is essentially hit for the bench workload. The remaining blockers on tests/run.clj split across:

- 15 fns blocked by captures (closures -- v0.208.0)
- 7 fns blocked by n-clauses != 1 (multi-arity -- v0.207.0)
- 2 fns blocked by has-rest (variadic -- v0.207.0)
- 14 fns blocked by various unstencilised body ops (OP_NTH_VEC, OP_ASSOC, OP_COUNT_VEC, etc.) -- these are follow-on stencil work outside this cycle.

Full test suite (1688 / 7854) passes; ASan rebuild + tests pass.

v0.205.0 — OP_CALL_CACHED Stencil + Two-Word Op Handling

Adds the JIT stencil for OP_CALL_CACHED, the fused (resolve-global + call) op the bytecode compiler emits for any call whose head is a global symbol. The stencil drives the same shared IC cascade OP_GETGLOBAL_CACHED uses (added in v0.204.0) and then hands off to apply_callable_argv, so PRIM-fn / FN-bc / multi-arity / record-method targets all reach their correct entry.

OP_CALL_CACHED is the bytecode's first two-word instruction in the JIT pipeline. The compile walk and the eligibility check both grow a new helper, op_extra_words, that returns the number of trailing words an opcode consumes after its primary word. The walks consult it to advance pc the way the interpreter consumes those words via its handler's code[pc++]. Without the skip, the eligibility loop classifies the slot-bearing word (whose OP_OF field is the placeholder OP_NOP) as an unknown op.
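
A sketch of a walk that consults an extra-words helper. The word layout (opcode in the low 8 bits) and the opcode numbers other than 19 for OP_CALL_CACHED are assumptions for the example, and the `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical numbering; 19 for OP_CALL_CACHED matches the tracer output
 * quoted in this entry, the rest are placeholders. */
enum {
    DEMO_OP_NOP = 0,
    DEMO_OP_MOVE = 1,
    DEMO_OP_RETURN = 2,
    DEMO_OP_CALL_CACHED = 19
};
#define DEMO_OP_OF(w) ((w) & 0xFFu)

/* Number of trailing words an opcode consumes after its primary word. */
static size_t demo_op_extra_words(uint32_t op) {
    return op == DEMO_OP_CALL_CACHED ? 1 : 0;
}

/* Counts primary instructions, skipping trailing words so they are never
 * decoded as opcodes. Returns (size_t)-1 if a multi-word op is truncated. */
static size_t demo_count_ops(const uint32_t *code, size_t len) {
    size_t pc = 0, ops = 0;
    while (pc < len) {
        size_t extra = demo_op_extra_words(DEMO_OP_OF(code[pc]));
        if (extra > len - pc - 1) return (size_t)-1;
        pc += 1 + extra;
        ops++;
    }
    return ops;
}
```

Without the skip, the slot-bearing word after OP_CALL_CACHED would be decoded on its own, and its placeholder opcode field would be treated as a separate instruction.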

emit_stencil grows an insn2 parameter and a new IMM_KIND_BX2 immediate kind so a stencil can pull the slot index out of the trailing word at JIT-compile time. The pool slot the JIT writes for IMM_BX2 is the same 16-bit unsigned the interpreter reads via Bx_OF(slot_word).

The chain ABI carries through unchanged. The existing MINO_STENCIL_CHAIN_RETURN macro already pins x2 = S at the trailing ret via a register-asm pin, and the struct return pins x1 = consts via AAPCS, so a mid-stencil bl into the slow helper (which itself calls apply_callable_argv and clobbers every caller-saved register) leaves clang to spill / reload the chain registers naturally. No new spill discipline at the stencil source layer.

Eligibility-tracer comparison (MINO_CPJIT_STATS=1, tests/run.clj):

| Reason                             | v0.204.0 | v0.205.0 |
|------------------------------------|----------|----------|
| ok                                 | 14       | 18       |
| captures                           | 15       | 15       |
| n-clauses                          | 7        | 7        |
| has-rest                           | 2        | 2        |
| unknown-op (op=19, OP_CALL_CACHED) | 12       | 0        |
| unknown-op (op=8, OP_TAILCALL)     | 8        | 14       |

OP_CALL_CACHED rejections (op=19) drop to zero. Most fns that were previously double-blocked (OP_CALL_CACHED + OP_TAILCALL) now reveal the tail-call as the remaining hot blocker, which the next release stencilises.

Full test suite (1688 / 7854) passes; ASan rebuild + tests pass.

v0.204.0 — OP_GETGLOBAL_CACHED Stencil

Adds the JIT stencil for OP_GETGLOBAL_CACHED and drops the ic_slots_len > 0 blocker from the eligibility check. Fns that read global vars (effectively every defn with (inc ...) / (< ...) / (map ...) / call to any name in clojure.core) now walk past the ic-slot gate; whether they then become JIT-eligible depends on what other ops their body uses.

The stencil routes through a new slow helper mino_jit_getglobal_cached_slow that calls mino_bc_ic_global_load -- the same entry point the interpreter uses for its OP_GETGLOBAL_CACHED handler. The cascade is identical: active dyn binding wins, then env lookup, then the cached var if its gen still matches S->ic_gen, then a fresh resolve under the GC write barrier.

The slow helper needs env for the env-lookup branch (captured locals reach their values through it). The JIT invoke ABI is extended to carry env from the mino_bc_run frame into a transient jit_invoke_env field on the current thread ctx; the slow helper reads it back at use. Without this thread-local publish, inner fns that reference outer-fn locals would surface as spurious "unbound symbol" diagnostics -- the gc-closure-churn test caught exactly that path on the first run of this release.

A new IMM_KIND_BC immediate kind carries the mino_bc_fn_t * into the stencil's literal pool. The compile-time JIT walk writes the bc pointer at the pool slot the stencil reads via IMM_BC, so the slow helper sees the right bc without needing to look it up from thread state.

Eligibility-tracer comparison (MINO_CPJIT_STATS=1, tests/run.clj):

| Reason            | v0.203.0 | v0.204.0 |
|-------------------|----------|----------|
| ok                | 13       | 14       |
| captures          | 15       | 15       |
| ic-slots          | 32       | 0        |
| n-clauses         | 0        | 7        |
| has-rest          | 0        | 2        |
| unknown-op        | 9        | 31       |

The drop in ic-slots reveals the next priority: most ic-slot fns also have unstencilised call ops, so they shift to unknown-op (blocked on OP_CALL / OP_TAILCALL / OP_CALL_CACHED) rather than becoming eligible. Closing the call story (v0.205-206) is the next unlock.

Full test suite (1688 / 7854) passes; ASan rebuild + tests pass.

v0.203.0 — JIT Introspection + Eligibility Tracer

Adds an opt-in observability layer over the CPJIT compile pipeline. Setting MINO_CPJIT_STATS=1 in the environment turns on a per-fn tracker; at process exit, a summary plus per-fn table is written to stderr.

The tracker records every fn that crosses MINO_JIT_THRESHOLD and becomes a compile candidate. For each candidate it stores:

- source location (file:line:column, from the bc source map)
- bytecode length
- eligibility outcome (ok or a specific blocker reason)
- first unknown opcode (only for the unknown-op reason)
- whether compile succeeded
- native bytes emitted

The blocker reasons mirror the existing eligibility check: captures, ic-slots, n-clauses, has-rest, unknown-op, empty, bad-terminator, null-bc. The summary block aggregates attempt / eligible / compiled counts and total native bytes; the per-reason block shows the histogram across blockers; the unknown-op breakdown shows which specific opcodes are responsible for unknown-op rejections, naming the next stencilisation targets in priority order.

mino_jit_eligible keeps its existing boolean contract; internally it delegates to a new classify_eligibility helper that returns a typed reason. The boolean wrapper collapses every non-OK reason back to 0, so the existing call site in fn.c is unchanged. mino_jit_compile becomes a thin wrapper around a renamed compile_inner: it classifies eligibility and records the result, so the recordings include both successful and rejected attempts.
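
The typed-reason-plus-boolean-wrapper shape looks roughly like the sketch below. The enum covers only a subset of the reasons listed above, and the struct fields, check order, and `demo_` names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    DEMO_ELIG_OK = 0,
    DEMO_ELIG_NULL_BC,
    DEMO_ELIG_EMPTY,
    DEMO_ELIG_CAPTURES,
    DEMO_ELIG_IC_SLOTS
} demo_elig;

/* Hypothetical stand-in for the bytecode fn record. */
typedef struct { size_t code_len; int has_captures; size_t ic_slots_len; } demo_bc;

/* Returns the first blocker found, or DEMO_ELIG_OK. */
static demo_elig demo_classify(const demo_bc *bc) {
    if (bc == NULL) return DEMO_ELIG_NULL_BC;
    if (bc->code_len == 0) return DEMO_ELIG_EMPTY;
    if (bc->has_captures) return DEMO_ELIG_CAPTURES;
    if (bc->ic_slots_len > 0) return DEMO_ELIG_IC_SLOTS;
    return DEMO_ELIG_OK;
}

/* Boolean wrapper: existing callers keep seeing 0 / 1. */
static int demo_eligible(const demo_bc *bc) {
    return demo_classify(bc) == DEMO_ELIG_OK;
}

/* Names used when a tracer prints the per-reason histogram. */
static const char *demo_reason_name(demo_elig r) {
    switch (r) {
    case DEMO_ELIG_OK:       return "ok";
    case DEMO_ELIG_NULL_BC:  return "null-bc";
    case DEMO_ELIG_EMPTY:    return "empty";
    case DEMO_ELIG_CAPTURES: return "captures";
    case DEMO_ELIG_IC_SLOTS: return "ic-slots";
    }
    return "unknown";
}
```

Keeping the classifier separate means the tracer can record the reason while the hot call site still branches on a plain integer.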

Also closes an asymmetry between the two callable-entry paths in fn.c. apply_callable (cons-args ABI) has always bumped bc->hot_counter and triggered a compile at the threshold; apply_callable_argv (the argv ABI hit by OP_CALL and OP_TAILCALL from inside bc bodies) used to skip both. Without the warming hook on the argv path, fns reached exclusively through the bc call ABI never reached the threshold, regardless of their shape. The tracer made this visible -- a fresh test-suite run showed exactly one compile attempt against a corpus that contains many hot eligible fns. With the hook mirrored across both entry points the test suite now reports 69 attempts; jit_bench reports 39.

Baseline eligibility on this binary (arm64 / Darwin):

| Workload              | attempts | eligible | top blocker       |
|-----------------------|----------|----------|-------------------|
| tests/run.clj         | 69       | 18.8%    | ic-slots (46%)    |
| benchmarks/jit_bench  | 39       | 43.6%    | ic-slots (51%)    |

ic-slots > 0 is the single biggest unlock the upcoming OP_GETGLOBAL_CACHED stencil targets. Every defn that reads a global var has at least one ic-slot, so dropping that blocker extends compile coverage to roughly two-thirds of real-workload fns.

The infrastructure adds no measurable overhead when the env var is unset: a tri-state cached check in cpjit_stats_record short-circuits before any allocation. With the var set, allocations are one cpjit_stat_entry_t per compile attempt; the linked list and strdup'd filenames live until process exit (atexit-dumped, then the host frees the process).
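
A tri-state cached check usually takes the shape below: unknown on first use, then a fixed 0 or 1 that later calls read without touching the environment again. The `demo_` names and the exact value test (`"1"`) are assumptions; only the MINO_CPJIT_STATS variable name comes from this entry.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* -1 = not checked yet, 0 = disabled, 1 = enabled. */
static int demo_stats_state = -1;

/* Resolves the state from the environment once, then serves the cache. */
static int demo_stats_enabled_cached(int *state) {
    if (*state < 0) {
        const char *v = getenv("MINO_CPJIT_STATS");
        *state = (v != NULL && strcmp(v, "1") == 0) ? 1 : 0;
    }
    return *state;
}

static int demo_records = 0;

/* Record hook: bails out on the cached check before doing any work that
 * would allocate. The counter stands in for the real per-attempt entry. */
static void demo_stats_record(void) {
    if (!demo_stats_enabled_cached(&demo_stats_state)) return;
    demo_records++;
}
```

After the first call the disabled path costs one load and one branch, which is why the overhead is not measurable with the variable unset.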

Full test suite (1688 / 7854) passes.

v0.202.0 — CPJIT Stencil Cycle Close + Perf Measurement

Drops OP_LOOP_INT_LT from the active stencil descriptor table (g_stencils[] in src/eval/bc/jit.c). Bench measurement against the interpreter's inline fast path showed a 17% regression on the canonical count-loop shape: the interpreter's if (vc != NULL && MINO_IS_INT(vc) && vc < vl) { ... } chain plus its tight dispatch loop edges out the stencil's chain-ABI overhead plus literal-pool reads. Per [[measure-before-after]], the regression isn't shipped.

The stencil source stays under src/eval/bc/stencils/loop_int_lt.c so a future cycle can revisit. The other two fused-loop stencils (OP_LOOP_INT_DEC, OP_LOOP_INT_LT_INC) remain active -- both ship measurable wins.

Cycle measurement ((nano-time)-based, n=1000 inner iters, 10k outer calls; arm64 / Darwin):

| Workload         | JIT      | no-JIT   | Speedup |
|------------------|----------|----------|---------|
| sum-loop 1000    | 8666 ns  | 18063 ns | 2.08x   |
| dec-loop 1000    | 1737 ns  | 2877 ns  | 1.66x   |
| lockstep 1000    | 3290 ns  | 3510 ns  | 1.07x   |
| count-loop 1000  | 3149 ns  | 3032 ns  | 0.96x   |
| abs-fn -5        | 1524 ns  | 1529 ns  | 1.00x   |
| add-3 5          | 1611 ns  | 1632 ns  | 1.01x   |

The cycle's headline target -- 2-3x on a counted loop -- is hit on sum-loop (a two-binding accumulator loop that compiles to unfused INC_I / LT_II / JMPIFNOT / JMP stencils, all of which the v0.196.0

OP_LOOP_INT_DEC, sees a clean 1.66x via the v0.201.0 stencil. count-loop remains at interpreter speed, the deliberate outcome described above. Pure-arith functions (abs-fn, add-3) sit at noise: their bodies were already JIT-eligible before the cycle and the JIT region's call-edge cost dominates the iteration delta.

Cycle stencil set after v0.202.0: 22 ops active (MOVE, LOAD_K, RETURN, fused LOAD_K-RETURN, 3 arith II, 5 comparison II, 3 unary, 5 IK forms, 2 fused-loop) plus 2 direct-emit branch ops (OP_JMP, OP_JMPIFNOT).

Full test suite (1688 / 7854) passes; ASan rebuild + tests pass.

v0.201.0 — Fused Counted-Loop Stencils

src/eval/bc/stencils/{loop_int_lt,loop_int_dec,loop_int_lt_inc}.c add stencils for the three fused counted-loop opcodes the bytecode compiler emits for common Clojure-canon loop shapes:

- OP_LOOP_INT_LT: (loop [i 0] (if (< i n) (recur (inc i)) ...))
- OP_LOOP_INT_DEC: (loop [i n] (if (zero? i) ... (recur (dec i))))
- OP_LOOP_INT_LT_INC: (loop [i 0 k 0] (if (< i n) (recur (inc i) (inc k)) ...))

Each fused op IS the loop entry; the interpreter handles iteration by pc -= 1 so the same instruction re-executes. The stencil mirrors this with a natural C for (;;) loop -- clang emits one prologue, one set of callee-saved spills, and a tight body with a single back-edge that re-runs without re-entering the prologue. Loop-exit edges use MINO_STENCIL_CHAIN_RETURN, which compiles to the function's natural ret; the JIT then patches that ret into the usual b <next_stencil> chain branch.

Three slow helpers join the lineup in jit.c:

- mino_jit_loop_int_lt_slow -- cons + prim_lt + prim_inc
- mino_jit_loop_int_dec_slow -- cons + prim_zero_p + prim_dec
- mino_jit_loop_int_lt_inc_slow -- cons + prim_lt + prim_inc + prim_inc

Each helper signals exit-vs-continue by tagging the low bit of the returned regs pointer ((ret_ptr & 1) == 1 means "exit"). The caller masks the bit off before storing back. The low-bit tag is safe because regs always points to 8-byte-aligned storage.
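The low-bit tagging convention described above can be illustrated with a minimal sketch. The names `val_t`, `tag_exit`, `wants_exit`, and `untag` are hypothetical and not taken from jit.c; the only property borrowed from the text is that the regs pointer is 8-byte aligned, so bit 0 is free to carry the exit flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the runtime's value pointer type. */
typedef void *val_t;

/* Helper side: return regs, with the low bit set to request loop exit. */
static val_t *tag_exit(val_t *regs, int exit_loop)
{
    uintptr_t bits = (uintptr_t)regs;
    assert((bits & 7u) == 0);   /* 8-byte alignment keeps bit 0 free */
    return (val_t *)(bits | (exit_loop ? (uintptr_t)1 : (uintptr_t)0));
}

/* Caller side: read the tag first ... */
static int wants_exit(val_t *tagged)
{
    return (int)((uintptr_t)tagged & 1u);
}

/* ... then mask it off before storing the pointer back. */
static val_t *untag(val_t *tagged)
{
    return (val_t *)((uintptr_t)tagged & ~(uintptr_t)1);
}
```

Returning the flag inside the pointer lets a helper hand back both the (possibly refreshed) register window and the exit decision in a single return register.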

Fast paths inline the tagged-int range checks. OP_LOOP_INT_LT relies on the invariant c < l ≤ MAX_INT, so c + 1 never overflows the inline range; OP_LOOP_INT_DEC skips the inline decrement when the counter equals MIN_INT; OP_LOOP_INT_LT_INC checks the carry operand for MAX_INT since k is unconstrained relative to the loop bound.

A back-jump marker mechanism (mino_jit_loop_continue_marker plus SYM_SLOT_LOOP) is wired through emit_stencil for future stencils whose back-edge can't be expressed by a natural C for loop. None of the v0.201.0 stencils use it; the infrastructure is kept in place so a later release can target it without re-plumbing emit_stencil.

Stencil set grows to 23 ops. Full test suite (1688 / 7854) passes.

v0.200.0 — JIT Control Flow (OP_JMP / OP_JMPIFNOT) + Chain ABI Fix

The JIT covers branches. OP_JMP and OP_JMPIFNOT get direct-emit templates in src/eval/bc/jit.c -- 4 bytes (b <target>) and 20 bytes (ldr / cbz / sub / cmp / b.ls) respectively. The target address is unknown at emit time; a target-patch pass after the layout walk resolves pc_offsets[curr_pc + 1 + sBx] for each branch and rewrites the imm26 (B26) or imm19 (CBZ / B.cond) encoding in place.
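As a rough illustration of the target-patch pass, the sketch below rewrites the two immediate fields in an AArch64 instruction word. The function names are hypothetical and the real patcher in jit.c may be structured differently; the field positions (imm26 in bits 25:0 for B, imm19 in bits 23:5 for CBZ and B.cond) follow the A64 encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Rewrite the imm26 field of a B/BL instruction word.
 * byte_delta is (target address - address of this instruction). */
static uint32_t patch_imm26(uint32_t insn, int64_t byte_delta)
{
    assert(byte_delta % 4 == 0);                       /* word aligned */
    assert(byte_delta >= -(INT64_C(1) << 27) &&
           byte_delta <   (INT64_C(1) << 27));         /* +/-128 MB reach */
    uint32_t imm = (uint32_t)(byte_delta / 4) & 0x03ffffffu;
    return (insn & ~0x03ffffffu) | imm;
}

/* Rewrite the imm19 field (bits 23:5) of a CBZ/CBNZ or B.cond word. */
static uint32_t patch_imm19(uint32_t insn, int64_t byte_delta)
{
    assert(byte_delta % 4 == 0);                       /* word aligned */
    assert(byte_delta >= -(INT64_C(1) << 20) &&
           byte_delta <   (INT64_C(1) << 20));         /* +/-1 MB reach */
    uint32_t imm = (uint32_t)(byte_delta / 4) & 0x7ffffu;
    return (insn & ~(0x7ffffu << 5)) | (imm << 5);
}
```

Both helpers keep the opcode, condition, and register bits of the template word intact and only replace the offset, which is why the patch can be done in place after the layout walk.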

OP_JMPIFNOT inlines the mino truthiness check:

```
ldr  x9, [x0, #(IMM_A * 8)]   ; v = regs[a]
cbz  x9, <target>             ; NULL -> take
sub  x10, x9, #2
cmp  x10, #1                  ; unsigned <= covers v == 0x2 (false)
b.ls <target>                 ; and v == 0x3 (nil)
                              ; fall through (truthy)
```

The trick exploits that mino's nil-tagged sentinel is 0x3 and its false-tagged sentinel is 0x2; subtracting 2 maps both to a contiguous [0, 1] range, so a single b.ls covers both. Every other tagged value (int, char, bool-true, heap-pointer) lies outside that range and falls through to the truthy branch.
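The same range trick can be written in C. This is only a sketch of the arithmetic; `is_falsy` is a hypothetical name, and the sentinel values are the ones stated in the text (0x2 for false, 0x3 for nil, with NULL handled separately).

```c
#include <stdint.h>

/* Returns 1 when the branch-on-falsy would be taken, mirroring the
 * cbz / sub / cmp / b.ls sequence. */
static int is_falsy(uintptr_t v)
{
    if (v == 0) {
        return 1;              /* cbz: NULL takes the branch */
    }
    /* Unsigned subtraction wraps for v < 2, so only 0x2 and 0x3
     * land in the range [0, 1]. */
    return (v - 2u) <= 1u;     /* sub + cmp + b.ls (unsigned lower-or-same) */
}
```

The unsigned comparison is what makes a single branch sufficient: a value of 0x1 wraps to a very large number and falls through as truthy.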

Counted loops now JIT-compile end-to-end. (loop [i 0] (if (< i n) (recur (inc i)) ...)) shapes lower to a JIT'd body that uses OP_INC_I, OP_LT_II, OP_JMPIFNOT, and OP_JMP together; no hand-off back to the interpreter for the back-jump. A 100-iteration (reduce + 0 (range 100)) style body compiles and produces the right result (4950).

Stencil chain-ABI fix. Pre-v0.200.0 stencils that called a helper through bl left x1 (consts) and x2 (S) clobbered at the patched-ret boundary. A subsequent stencil expecting x1 to still be the consts table (OP_LOAD_K, OP_LOAD_K_RETURN) or x2 to still be the state (helper-calling stencils) would read garbage. The bug stayed latent until v0.200.0's branch ops chained OP_ZERO_INT_P -> OP_JMPIFNOT -> OP_LOAD_K and segfaulted on the first LOAD_K.

The fix:

All 16 helper-calling stencils use the new macro: add_ii / sub_ii / mul_ii / lt_ii / le_ii / gt_ii / ge_ii / eq_ii / inc_i / dec_i / zero_int_p / add_ik / sub_ik / lt_ik / le_ik / eq_ik. The void-returning stencils (move, load_k) and the final stencils (return_imm, fused load_k_return) are unchanged: they neither call helpers nor chain.

Eligibility now accepts OP_JMP / OP_JMPIFNOT; the OP_LOAD_K + OP_RETURN fusion is disabled when the body contains any branch op, because a mis-aimed jump landing on the RETURN-half of a fused atomic stencil would re-execute the LOAD_K. With branch-free bodies the fusion still fires.

Full test suite (1688 / 7854) passes. Smoke runs exercise an (if (zero? x) 1 2) body, an abs form, and a counted-loop sum.

v0.199.0 — Unary + Immediate-Arg Stencils

src/eval/bc/stencils/{inc_i,dec_i,zero_int_p}.c cover the unary int-arith ops ((inc x), (dec x), (zero? x)), and src/eval/bc/stencils/{add_ik,sub_ik,lt_ik,le_ik,eq_ik}.c cover the immediate-rhs variants ((+ x N), (- x N), (< x N), (<= x N), (= x N) where N is a signed 8-bit literal baked into the bytecode word). With these eight stencils landed the JIT covers all the single-instruction shapes the bytecode VM emits for counted-loop bodies, so a (loop [i 0] (if (< i N) (recur (inc i)) ...)) body now JIT-compiles to native instructions end-to-end below the back-jump.

jit.c grows two new cold helpers:

A new IMM_KIND_KIMM joins the stencil-immediate enum in jit.c. The JIT writes (uint64_t)(uintptr_t)MINO_MAKE_INT((int8_t)C_OF(insn)) into the pool slot at materialisation time so the stencil reads it as a tagged mino_val_t* and hands it to binop_int_fast exactly the way the II form passes regs[c]. The ABI extension is a single new extern name (MINO_STENCIL_IMM_KIMM) plus an IMM_KIMM macro in abi.h; no changes to the existing IMM_A / B / C / Bx / sBx slots.

The extern-fn table gains three entries (unop_int_fast, mino_jit_binop_k_slow, mino_jit_unop_slow) and the stencil descriptor table gains eight entries (OP_INC_I / OP_DEC_I / OP_ZERO_INT_P / OP_ADD_IK / OP_SUB_IK / OP_LT_IK / OP_LE_IK / OP_EQ_IK). Stencil set up to 20 ops now. Full test suite (1688 / 7854) passes; smoke runs exercise inc / dec / zero? on the JIT path, ADD_IK / SUB_IK with positive and negative immediates, LT_IK / LE_IK / EQ_IK against truthy and falsy comparisons, and the slow path through both bigint overflow ((inc 1152921504606846975)) and double tag-miss ((+ x 7) on a 1.5 input).

v0.198.0 — Comparison Stencils (LT / LE / GT / GE / EQ_II)

src/eval/bc/stencils/{lt,le,gt,ge,eq}_ii.c extend the JIT's stencil set to cover (< a b), (<= a b), (> a b), (>= a b), and (= a b). The shape mirrors v0.197.0's arith stencils: tagged-int fast lane via binop_int_fast with the matching BINOP_* subop, cons-spine fallback via mino_jit_binop_slow for the cold (non-int operand) path. The fast lane returns mino_true / mino_false sentinels without allocating; only mixed-type or non-numeric inputs hit the prim.

mino_jit_binop_slow in jit.c grows five subop cases (BINOP_LT / LE / GT / GE / EQ) so the slow path routes to prim_lt / prim_lte / prim_gt / prim_gte / prim_eq -- the same prims the interpreter's OP_*_II fallback uses.

Fns whose bodies are (fn [a b] (< a b)) and similar single-comparison shapes JIT-compile end-to-end; the full test suite plus ASan rebuild pass with -DMINO_CPJIT=1. Stencil set up to 12 ops now: MOVE / LOAD_K / RETURN / fused LOAD_K-RETURN / three arith / five comparisons.

v0.197.0 — Stencil Call ABI + ADD_II / SUB_II / MUL_II

src/eval/bc/stencils/{add_ii,sub_ii,mul_ii}.c are the first JIT stencils that call host C helpers. Each stencil reads regs[B] and regs[C], dispatches to binop_int_fast for the tagged-int fast lane, falls back through mino_jit_binop_slow (a new jit.c helper that builds the two-element cons list and dispatches to the matching prim with regs-base refresh) on a miss, and writes regs[A]. The stencil ABI settles at (regs, consts, S) in (x0, x1, x2); every non-final stencil now returns regs so the chain preserves the window pointer across bl calls that clobber x0.

tools/stencil_extract.c already extracted ARM64_RELOC_BRANCH26; the runtime side gains a patch_branch26 patcher and a 16-byte per-call trampoline (ldr x16, [pc, #8]; br x16; <target-addr>) appended between the code region and the literal pool. The trampoline sidesteps the bl's ±128 MB range when the host helper lives far from the mmap'd region. The trampoline-slot count is sized from a one-pass classification of every stencil symbol; MINO_STENCIL_IMM_* names route to pool slots, all other names look up in a small extern-fn table (currently 2 entries: binop_int_fast, mino_jit_binop_slow).
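A minimal sketch of how such a trampoline could be written into the buffer, assuming a little-endian host (as on AArch64 Darwin and Linux). `emit_trampoline` and `TRAMPOLINE_BYTES` are hypothetical names; the two instruction words are the standard A64 encodings of the sequence quoted above.

```c
#include <stdint.h>
#include <string.h>

enum { TRAMPOLINE_BYTES = 16 };

/* Write one trampoline at dst: two instructions followed by the 64-bit
 * absolute address of the host helper. */
static void emit_trampoline(uint8_t *dst, uint64_t helper_addr)
{
    const uint32_t ldr_x16_pc8 = 0x58000050u;  /* ldr x16, [pc, #8] */
    const uint32_t br_x16      = 0xd61f0200u;  /* br  x16 */
    memcpy(dst + 0, &ldr_x16_pc8, 4);
    memcpy(dst + 4, &br_x16, 4);
    memcpy(dst + 8, &helper_addr, 8);          /* literal read by the ldr */
}
```

The stencil's bl then only has to reach the trampoline, which sits in the same mmap as the code, while the trampoline itself can jump to any 64-bit address.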

The trailing-ret trim heuristic is gone. Every non-final stencil emits its full body; the JIT's second pass scans each instance for the first ret (0xd65f03c0) and rewrites it as b <next> so the chain falls through to the next stencil. Cold blocks that clang lays out after the natural exit (the slow-path bl and merge-back branch in the new arith stencils) stay in place, reachable only through the in-stencil cbz + b-back-to-epilogue pattern. The layout becomes [code | trampolines | literal-pool] in a single mmap; the multi-page distinction is dropped since adrp handles any ±4 GB page diff regardless.
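The ret-rewriting pass can be sketched as a linear scan over one stencil instance. `rewrite_first_ret` is a hypothetical name, and the sketch works in instruction-word indices rather than addresses; it is not the code in jit.c.

```c
#include <stddef.h>
#include <stdint.h>

#define A64_RET 0xd65f03c0u   /* ret */
#define A64_B   0x14000000u   /* b with a zero imm26 */

/* Scan `count` instruction words for the first ret and replace it with
 * b <next>. next_index is the word index where the following stencil
 * starts, relative to the start of this instance. Returns the index
 * that was rewritten, or -1 when no ret was found. */
static long rewrite_first_ret(uint32_t *code, size_t count, size_t next_index)
{
    for (size_t i = 0; i < count; i++) {
        if (code[i] == A64_RET) {
            int64_t delta_words = (int64_t)next_index - (int64_t)i;
            code[i] = A64_B | ((uint32_t)delta_words & 0x03ffffffu);
            return (long)i;
        }
    }
    return -1;
}
```

Only the first ret is touched, so cold blocks laid out after the natural exit keep their own instructions and remain reachable through the in-stencil branches.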

Stencils compile with a new -fno-optimize-sibling-calls flag in the gen-stencils task so clang doesn't tail-call the slow helper -- the chain pattern depends on stencils returning normally through the rewritten ret.

Bench gate: 1688 tests / 7854 assertions pass under both the default build (-DMINO_CPJIT=1) and the mino_asan rebuild with -DMINO_CPJIT=1. The micro-bench at ~/Code/mino-bench/benchmarks/jit_bench.clj confirms add-fn JIT-compiles (152 code bytes, 32 trampoline bytes, 4 pool slots) and runs within run-to-run noise of v0.195.0; per-call savings inside add-fn are dwarfed by the bench-closure wrapper. The wins land progressively in v0.198+ as more shapes (comparisons, unary, control flow) become JIT-eligible.

v0.196.0 — Externalise Int-Arith Fast-Path Helpers

binop_int_fast, unop_int_fast, and tag_or_box_int lose their static linkage in src/eval/bc/vm.c and gain extern declarations in src/eval/bc/internal.h. The three helpers are the tagged-int fast lanes the bytecode dispatcher uses for the OP_*_II / OP_INC_I / OP_DEC_I / OP_BINOP_INT families; promoting them to TU-public symbols lets upcoming JIT stencils call them via a direct BL instruction the same way the interpreter dispatch does. No semantics change, no rename, no new code path: just storage extension for the call ABI work that lands next.

The BINOP_* and UNOP_* enums in internal.h were already public; only the function-symbol storage moves. The full test suite and ASan build pass identically. Bench harness across the three suites stays within run-to-run noise of v0.195.0.

v0.195.0 — Fused LOAD_K + RETURN Superinstruction

src/eval/bc/stencils/load_k_return.c is the first fused superinstruction stencil. The pattern OP_LOAD_K (A=R, Bx=K) immediately followed by OP_RETURN A=R collapses into a single stencil that places consts[Bx] directly in x0 (the AArch64 return register) and exits. Skips the intermediate regs[A] write that two separate stencils would emit. Constant-returning fns -- (fn [] 42), arity-stubs that wrap a literal -- hit this pattern.

The JIT compile walk pattern-matches the source pair: when it sees OP_LOAD_K followed by OP_RETURN with matching A, it emits the fused stencil and advances pc by two. The fused stencil sits behind a pseudo-opcode (OP_FUSED_LOAD_K_RETURN, allocated above OP__COUNT) so the regular find_stencil lookup never confuses it with a real bytecode opcode emitted by the compiler. native_pc_offsets[pc+1] aliases the fused chunk's start since deopt mid-superinstruction is not a representable state.
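The pattern match in the compile walk might look roughly like the following. The decoded instruction struct, the `SK_` opcode names, and `plan_chunks` are all hypothetical simplifications; the sketch only shows the pairing rule and how the second pc aliases the fused chunk.

```c
#include <stddef.h>

/* Hypothetical decoded instruction form, reduced to what the match needs. */
typedef enum { SK_MOVE, SK_LOAD_K, SK_RETURN } sk_op_t;
typedef struct { sk_op_t op; unsigned a; unsigned bx; } sk_insn_t;

/* Walk the bytecode and assign each pc to a stencil chunk. A LOAD_K
 * followed by a RETURN of the same register becomes one fused chunk.
 * Returns the number of chunks emitted. */
static size_t plan_chunks(const sk_insn_t *code, size_t n, size_t *chunk_of_pc)
{
    size_t chunks = 0;
    size_t pc = 0;
    while (pc < n) {
        if (code[pc].op == SK_LOAD_K && pc + 1 < n &&
            code[pc + 1].op == SK_RETURN && code[pc + 1].a == code[pc].a) {
            chunk_of_pc[pc] = chunks;
            chunk_of_pc[pc + 1] = chunks;   /* aliases the fused chunk's start */
            pc += 2;
        } else {
            chunk_of_pc[pc] = chunks;
            pc += 1;
        }
        chunks++;
    }
    return chunks;
}
```

When the registers differ, the pair is left as two separate chunks, since the RETURN would read a value the LOAD_K did not write.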

Effect: (fn [] 42)'s native code size drops from 40 bytes (LOAD_K stencil + RETURN_IMM stencil = 6 + 4 instructions) to 16 bytes (fused stencil = 4 instructions); the literal pool shrinks from 3 slots to 1. The full test suite plus ASan build pass; nine fns still JIT-compile under tests/run.clj.

v0.194.0 — Saturating Counter on JIT-Ineligible Fns

The tier-selection branch in apply_callable now saturates a fn's hot_counter to UINT_MAX when mino_jit_compile rejects it. Subsequent calls skip the per-call eligibility re-check entirely -- the fn's shape (captures / IC slots / opcode mix) is stable across its lifetime, so a single negative answer is final.
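A simplified model of the saturating counter follows. The struct, the threshold value, and the function names are hypothetical stand-ins for apply_callable and mino_jit_compile; the point is only that a rejected fn is never offered to the compiler again.

```c
#include <limits.h>

#define SKETCH_JIT_THRESHOLD 1000u   /* hypothetical value */

typedef struct {
    unsigned hot_counter;
    int      eligible;          /* what the compile attempt would decide */
    int      has_native;
    unsigned compile_attempts;
} sketch_fn_t;

/* Stand-in for the compile entry: returns 1 on success, 0 on rejection. */
static int sketch_compile(sketch_fn_t *fn)
{
    fn->compile_attempts++;
    if (!fn->eligible) {
        return 0;
    }
    fn->has_native = 1;
    return 1;
}

/* Tier selection on each call: native, saturated, or counting. */
static void sketch_on_call(sketch_fn_t *fn)
{
    if (fn->has_native) {
        return;                               /* native tier */
    }
    if (fn->hot_counter == UINT_MAX) {
        return;                               /* rejected earlier: no re-check */
    }
    if (++fn->hot_counter >= SKETCH_JIT_THRESHOLD) {
        if (!sketch_compile(fn)) {
            fn->hot_counter = UINT_MAX;       /* saturate: the answer is final */
        }
    }
}
```

Using the counter itself as the "rejected" marker avoids adding a separate flag to the fn record.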

The plan's original v0.194.0 scope (background compile worker thread) is deferred under its own conditional clause: with the current narrow stencil set (MOVE / LOAD_K / RETURN), compile latency for any eligible fn is sub-millisecond and the synchronous compile path is already off the user-visible critical path. Reintroducing the worker becomes interesting once heavier stencils land and compile latency starts mattering.

v0.193.0 — Single-Page JIT Layout for Small Fns

The JIT compile path now detects when a fn's code plus literal pool fits in one host page and lays them out together in a single mmap'd region; only fns that overflow a page fall back to the multi-page layout (separate code and pool ranges).

Adrp's page-relative addressing handles both layouts uniformly: in the single-page case both halves live on the same 4 KB / 16 KB host page so the page diff is zero, and ldr's 12-bit page-offset field reaches the pool slots that sit 8-byte aligned right after the code. The patcher math doesn't change.

Effect on the smoke shape (fn [x] x) on a 16 KB-page host: per-fn resident memory drops from 32 KB (two pages: one code + one pool) to 16 KB (single page). The full test suite plus ASan build pass identically.
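The layout decision and its memory effect can be modelled with a small sketch. All names are hypothetical, and the sizing rule (8-byte pool slots starting 8-byte aligned right after the code) is taken from the description above rather than from jit.c.

```c
#include <stddef.h>

typedef enum { LAYOUT_SINGLE_PAGE, LAYOUT_MULTI_PAGE } layout_t;

/* Round n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Single page when code plus the 8-byte-aligned pool fit in one page. */
static layout_t choose_layout(size_t code_bytes, size_t pool_slots,
                              size_t page_size)
{
    size_t total = round_up(code_bytes, 8) + pool_slots * 8;
    return total <= page_size ? LAYOUT_SINGLE_PAGE : LAYOUT_MULTI_PAGE;
}

/* Resident bytes for each layout: one shared run of pages, or separate
 * page-rounded code and pool ranges. */
static size_t resident_bytes(layout_t layout, size_t code_bytes,
                             size_t pool_slots, size_t page_size)
{
    if (layout == LAYOUT_SINGLE_PAGE) {
        return round_up(round_up(code_bytes, 8) + pool_slots * 8, page_size);
    }
    return round_up(code_bytes, page_size) +
           round_up(pool_slots * 8, page_size);
}
```

For any small fn on a 16 KB-page host, the single-page case evaluates to 16384 bytes and the multi-page case to 32768 bytes, matching the drop reported above.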

The full multi-fn-per-page allocator (where many small JIT'd fns share one page and each chunk's pool sits inside the same page as its code) needs a write-protect-toggle mechanism the runtime doesn't yet have; it lands in a later release. The single-page layout is the practical first step that gives most of the win for typical small fns without the toggle complexity.

v0.192.0 — Windows COFF Detection Scaffolding

tools/stencil_extract.c learns to sniff a COFF amd64 object (machine ID 0x8664 little-endian in the first two bytes) and emits a placeholder error pointing at the Windows platform release. The x86_64 COFF reloc-kind constants (IMAGE_REL_AMD64_ADDR64, IMAGE_REL_AMD64_ADDR32, IMAGE_REL_AMD64_REL32, IMAGE_REL_AMD64_REL32_1) are declared so the eventual parser maps each through the shared MINO_STENCIL_RELOC_* enum without restructuring the existing extraction pipeline.

The Windows runtime needs a VirtualAlloc / VirtualProtect adapter alongside the mmap path on Linux / Darwin; that adapter, the COFF parser, and the x86_64 instruction patcher land together with the first Windows stencil. Embedders on Windows land on the stub path today and get the JIT once the adapter + parser + patcher arrive in a future release.

v0.191.0 — x86_64 Infrastructure Scaffolding

tools/stencil_extract.c declares the x86_64 ELF reloc-kind constants (R_X86_64_64, R_X86_64_PC32, R_X86_64_PLT32, R_X86_64_GOTPCREL, R_X86_64_REX_GOTPCRELX) so the eventual parser landing on an x86_64 Linux build host drops directly into the existing MINO_STENCIL_RELOC_* mapping flow.

src/eval/bc/jit.c host detection grows two new branches behind MINO_CPJIT_X86_64_LINUX and MINO_CPJIT_X86_64_DARWIN. The detection picks the right MINO_CPJIT_STENCILS_HEADER path for each host triple; when the corresponding generated header is committed and the macro defined at build time, the full pipeline compiles in.

The x86_64 instruction patcher (RIP-relative addressing, variable-length instruction width) lives outside today's ARM64-only patch_adrp / patch_pageoff12_ldr64 pair and lands with its first stencil source. Like the ARM64 Linux work, the generated header needs a build run on the target host. Embedders on x86_64 land on the stub path today.

v0.190.0 — ARM64 Linux Infrastructure Scaffolding

tools/stencil_extract.c learns to sniff an object file's magic bytes before dispatching to a parser. Mach-O 64 keeps its existing path; ELF objects (0x7f 'E' 'L' 'F') are detected and the tool emits a placeholder error pointing at the platform release that finishes the wiring. The ARM64 ELF reloc-kind constants (R_AARCH64_ABS64, R_AARCH64_CALL26, R_AARCH64_JUMP26, R_AARCH64_ADR_PREL_PG_HI21, R_AARCH64_ADD_ABS_LO12_NC, R_AARCH64_LDST64_ABS_LO12_NC, R_AARCH64_ADR_GOT_PAGE, R_AARCH64_LD64_GOT_LO12_NC) are declared so the eventual ELF parser drops straight into the existing MINO_STENCIL_RELOC_* mapping path.
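Magic-byte sniffing of this kind can be sketched as below. The enum and `sniff_object` are hypothetical names, not the extractor's own; the byte patterns are the little-endian Mach-O 64 magic (0xfeedfacf), the ELF magic from this entry, and the COFF amd64 machine ID (0x8664) described in the v0.192.0 entry.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    OBJ_UNKNOWN,
    OBJ_MACHO64,
    OBJ_ELF,
    OBJ_COFF_AMD64
} obj_kind_t;

/* Classify an object file from its first bytes. */
static obj_kind_t sniff_object(const uint8_t *p, size_t len)
{
    if (len >= 4 && p[0] == 0xcf && p[1] == 0xfa &&
        p[2] == 0xed && p[3] == 0xfe) {
        return OBJ_MACHO64;        /* MH_MAGIC_64, little-endian on disk */
    }
    if (len >= 4 && p[0] == 0x7f && p[1] == 'E' &&
        p[2] == 'L' && p[3] == 'F') {
        return OBJ_ELF;
    }
    if (len >= 2 && p[0] == 0x64 && p[1] == 0x86) {
        return OBJ_COFF_AMD64;     /* machine ID 0x8664, little-endian */
    }
    return OBJ_UNKNOWN;
}
```

A dispatcher would then route OBJ_MACHO64 to the existing parser and report the placeholder error for the other recognised kinds.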

src/eval/bc/jit.c host detection grows an ARM64 Linux branch behind MINO_CPJIT_ARM64_LINUX. The stencils header path is indirected through a MINO_CPJIT_STENCILS_HEADER macro so the file structure no longer hard-codes the Darwin path. When the ARM64 Linux header is generated and the macro defined, the full pipeline compiles in without a source-level reshuffle.

Actual ARM64 Linux header generation needs the build to run on an ARM64 Linux host (the host C compiler emits the host's object format, and no portable cross-compiler is bundled). The infrastructure carries the load until that build runs; embedders on ARM64 Linux land on the stub path today and get the JIT once their build produces the generated header.

v0.189.0 — JIT Default On with Host-Aware Stubs

Both the bootstrap Makefile and lib/mino/tasks/builtin.clj ship -DMINO_CPJIT=1 by default. Fresh checkouts and ./mino task build invocations now build the JIT into the runtime without an extra flag flip.

src/eval/bc/jit.c learns to be portable across hosts. The host is detected as __aarch64__ && __APPLE__ (the only triple with a generated stencils header today). When MINO_CPJIT is defined but the host doesn't have a stencil header, the file compiles a parallel set of public-API stubs whose only behaviour is to return failure / NULL -- the runtime falls through to the interpreter and the rest of the binary is unaffected. ARM64 Linux, x86_64 Linux, and Windows headers extend this fence as the platform releases land.

The runtime impact on the supported host (ARM64 Darwin): nine fns get JIT-compiled during a full tests/run.clj walk, ASan is clean, and all 1688 tests / 7854 assertions pass identically to the interpreter-only build. JIT'd fns today are the narrow shape covered by the MOVE / LOAD_K / RETURN stencils -- linear data-movement bodies. Wider coverage (arithmetic, control flow, calls, IC-cached globals) is deferred to dedicated stencil releases after the cycle's platform expansion.

v0.188.0 — Public Deopt Primitive

mino_jit_invalidate(S, fn) is the public deopt primitive: it drops the runtime-visible native, native_size, and native_pc_offsets pointers on a JIT'd fn and rewinds the hot counter so a re-warming attempt starts from zero. The backing mmap'd region and offset table stay owned by the state's jit_regions list and get reaped at state teardown.

apply_callable's ic_gen-mismatch path is refactored to call the new entry instead of poking the fields directly. Any future client that needs to take a JIT'd fn off the native path -- a breakpoint mechanism, a profiler re-instrumenting a hot fn, fault-injection tests -- now calls the same primitive. The stub defined when MINO_CPJIT is off keeps the call site unconditional in fn.c.

mino doesn't yet expose breakpoint registration to user code; the plan's original scope (walk JIT'd fns on set-bp, invalidate those that cover the line) is gated on that mechanism arriving. This release ships the deopt half so the future bp work has the mechanism waiting.

v0.187.0 — JIT Deopt-on-IC-Gen Mismatch Regression Suite

tests/bc_jit_deopt_test.clj is the new regression-protective test class for the deopt contract: identical results before and after redefinition for a JIT'd fn, with batched defs, with const fn bodies, and across many warm/cool cycles. The dispatch-entry mismatch check that drops the runtime-visible native pointer (and resets the hot counter so the next compile is gated by the full threshold again) was already in place from v0.185.0; this release pins the externally-observable contract so a regression to that path fails at the test layer.

src/eval/bc/jit.h grows a Deopt-model docs block alongside MINO_JIT_THRESHOLD documenting the dispatch-entry check and the not-yet-relevant mid-execution invalidation case (no stencil currently emitted can call back into mino-land where it could observe a mid-frame def).

The test class runs in the default suite as well as under -DMINO_CPJIT=1; the deopt branch lives outside the build flag so the contract is identical with or without JIT'd execution.

v0.186.0 — Per-PC Native-Offset Side Table

mino_bc_fn_t grows a native_pc_offsets field that the JIT compile path populates as it lays out the stencil sequence: native_pc_offsets[i] is the byte offset of the i-th bytecode instruction's stencil within bc->native. The table is the foundation for two later releases: ic-gen deopt needs a way to reconstruct the bytecode resume pc when invalidating a partially-executed JIT region, and breakpoint deopt needs to know which stencil bytes cover a given bytecode position.

mino_jit_offset_to_pc(bc, native_off) is the reverse lookup -- given a native byte offset, returns the bytecode pc whose stencil contains it (or -1 when the offset is out of range). Cold path; intended for stack-trace formatting and debugger introspection.
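One way to implement such a reverse lookup is a binary search over the offset table. The sketch below takes the table and sizes as plain arguments instead of a mino_bc_fn_t, so its signature differs from the real entry point, and it assumes the offsets are non-decreasing; when two pcs share an offset it returns the higher pc.

```c
#include <stddef.h>
#include <stdint.h>

/* offsets[i] is the byte offset of instruction i's stencil; native_size
 * is the total code size. Returns the pc whose stencil contains
 * native_off, or -1 when the offset is out of range. */
static int sketch_offset_to_pc(const uint32_t *offsets, size_t n_insns,
                               uint32_t native_size, uint32_t native_off)
{
    if (n_insns == 0 || native_off >= native_size ||
        native_off < offsets[0]) {
        return -1;
    }
    /* Invariant: offsets[lo] <= native_off, and hi is one past the last
     * candidate. Narrow until a single pc remains. */
    size_t lo = 0;
    size_t hi = n_insns;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (offsets[mid] <= native_off) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return (int)lo;
}
```

Since the lookup sits on a cold path, a linear scan would serve equally well; the binary search is shown only because the table is already sorted by construction.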

The table is allocated alongside the mmap'd region and tracked in the state's jit_regions linked list (as aux_ptr), so mino_state_free reaps the malloc'd table during teardown together with the executable region. The deopt cleanup path drops its runtime-visible pointer and resets hot_counter to 0; the backing table stays owned by jit_regions until process exit.

Stack-trace consumers stay on the bytecode source map (via bc_current_pc and mino_bc_source_lookup); the JIT'd code today doesn't update bc_current_pc per stencil. Wiring stack frames to go through mino_jit_offset_to_pc lands when more complex stencils (those that can throw or transfer control) start needing fine-grained per-pc attribution.

v0.185.0 — Runtime JIT Compile Path

src/eval/bc/jit.c materialises bc-compiled fns into mmap'd RX pages through the copy-and-patch pipeline. The compile flow:

The state's new jit_regions linked list owns every mmap'd region; mino_state_free walks the list and munmaps each one so the OS reclaims executable pages at teardown. The slot stays present even without the build flag (always NULL) so the field offset doesn't drift between configurations.

The entire JIT path compiles in only when the build defines MINO_CPJIT. The default build leaves it off, so the runtime behaviour, footprint, and test results are unchanged for embedders. A MINO_CPJIT_TRACE=1 environment variable emits one stderr line per successful compile -- a developer-facing diagnostic, off in the test suite. With the build flag on, a smoke fn whose body is (fn [x] x) warms past the threshold, compiles, and returns identical results on every subsequent call. The full test suite (1684 tests / 7843 assertions) passes under both builds and under ASan in both configurations.

What does NOT yet ship: stencils for arithmetic, control flow, calls, IC-cached globals, lazy-seq production, or any opcode beyond MOVE / LOAD_K / RETURN. Those are scoped into the expansion releases that follow.

v0.184.0 — Stencil Immediate ABI and Relocation Pipeline

src/eval/bc/stencils/abi.h defines the copy-and-patch stencil immediate ABI: extern char-array symbols MINO_STENCIL_IMM_A, MINO_STENCIL_IMM_B, MINO_STENCIL_IMM_C, MINO_STENCIL_IMM_BX, and MINO_STENCIL_IMM_SBX. Stencil sources access the operand fields of the bytecode instruction word as the addresses of those symbols (IMM_A, IMM_B, ... macros), and the compiler emits one relocation pair per read site. The JIT patches those relocations when materialising a stencil instance for a specific bytecode op.

tools/stencil_extract learns to walk the Mach-O section relocation table. Each reloc whose offset lies inside the stencil function body is recorded as a (offset, kind, sym_index, addend) quadruple in the emitted header, alongside a de-duplicated symbol table. The runtime in subsequent releases consumes that data through a stable enum (MINO_STENCIL_RELOC_ARM64_* / MINO_STENCIL_RELOC_ABS64) so the generated header is decoupled from the consumer's reloc-kind layout. The extractor now also accepts --append so multiple stencils can co-exist in a single generated header.

Three new stencils ship in src/eval/bc/stencils/:

src/eval/bc/stencils/generated/stencils_arm64_darwin.h regenerates to include byte tables, symbol lists, and reloc tables for all four stencils. The runtime build is still unchanged — the header is consumed in the next release where the runtime JIT compile path lands. Arithmetic, control-flow, and call-shape stencils need the runtime ABI to be settled before they make sense to author; they land alongside the runtime in subsequent releases.

The extractor selftest grows reloc-field decode coverage: it packs known values into r_info, runs each accessor (reloc_symbolnum, reloc_pcrel, reloc_length, reloc_extern, reloc_type), and checks the host-stable reloc_arm64_kind_map answers correctly for the supported reloc kinds plus rejects unknown kinds.

v0.183.0 — First Stencil Source

src/eval/bc/stencils/return.c is the first stencil source in the copy-and-patch pipeline: a minimal C function that returns the value at arg0[0]. The compiled body is two arm64 instructions (ldr x0, [x0]; ret) — load and return — and gives the build pipeline a real-world payload to chew on while the immediate-patching infrastructure is still landing.

./mino task gen-stencils is the regeneration target. It rebuilds tools/stencil_extract, compiles every .c under src/eval/bc/stencils/ to an intermediate .o, then dispatches the extractor to write the byte tables into src/eval/bc/stencils/generated/stencils_arm64_darwin.h.

The header is checked in (mino developers regenerate when stencil sources change; embedders rebuild without the extractor toolchain). The runtime build is still unchanged — nothing in the mino binary consumes the header yet — but every other piece of the build flow is exercised end-to-end.

v0.182.0 — Stencil Extractor Tool

tools/stencil_extract.c is the build-time utility that turns an object file produced by the host C compiler into the byte tables the runtime JIT memcpy's into RWX memory. The first cut handles 64-bit Mach-O (Darwin arm64 and Darwin x86_64); ELF and COFF support land with the corresponding platform releases. The tool exposes two modes beyond its --selftest smoke check: <obj> --list enumerates the defined symbols in __TEXT,__text, and <obj> <symbol> <out> emits a self-contained C header with the function bytes and size.

./mino task build-stencil-extract compiles the tool, and ./mino task test-stencil-extract chains the build with the selftest. The selftest verifies that the Mach-O header / segment / section / nlist / reloc struct sizes match the documented file format on the host, which is the load-bearing check the parser depends on.

The runtime build is unchanged in this release; the tool exists but nothing in the mino binary consumes its output yet. The first stencil source plus its generated header lands next.

v0.181.0 — Diagnostic Source-Span Coverage

Catchable diagnostics raised from inside the bytecode runtime now carry a :mino/location entry whenever the source position is known. The fallback chain mirrors the new bc cursor: the call form's cons metadata when it has a line, otherwise the bc fn's source map at the current pc. prim_throw_classified and set_eval_diag_with_data both build the same shape, so a (try ... (catch e (ex-data e))) binding observes a single consistent diagnostic schema regardless of which C code emitted the throw.

tests/bc_error_quality_test.clj is the new regression-protective test class: it pins the location-carrying contract for arith type errors, divide-by-zero, unresolved-symbol, and user-throw sites. Future degradations to error attribution fail at the test layer instead of silently regressing the user-visible diagnostic surface.

v0.180.0 — Var-Discipline Uniform Read Path

The per-fn IC-slot array gains a stable C entry point, mino_bc_ic_global_load, that performs the same dynamic-then-lexical-then-cache-then-resolve lookup the OP_GETGLOBAL_CACHED handler does. Native tiers, profiling tooling, and embedders that need to read a fn's resolved globals can call it without going through the dispatch loop.

The header now spells out the contract callers depend on: one slot per syntactic var reference; the gen field tracks the S->ic_gen snapshot; def / ns-unmap / var_set_root / var_unintern bump ic_gen and force re-resolve on the next read; the cached value is observed-consistent with the var's root at the moment of refill.

No behavioural change in the interpreter; the existing OP_GETGLOBAL_CACHED path is unchanged. ic_gen invalidation tests continue to pass identically.

v0.179.0 — Deopt Protocol Scaffolding

mino_bc_fn_t gains four native-tier slots: native (head of the mmap'd page that will carry compiled stencils), native_size, a native_gen snapshot of S->ic_gen at compile time, and a hot_counter of interpreter invocations. All start at NULL/0 and nothing in the compile path writes them yet; the dispatch site in apply_callable carries the tier-selection branch (native / counter / interpreter) so the runtime layer that wires JIT'd code in can drop the missing arm in.

The bc cell sentinel changes from const-qualified to plain mutability so the new hot_counter slot is writable through the single fn->as.fn.bc pointer; the sentinel itself stays read-only by discipline (no code path mutates it).

Stencil ABI invariant is documented in eval/bc/internal.h: every opcode boundary keeps (S, regs, pc, env, consts, vars) in the same machine-level state across interpreter handlers and native stencils. The contract is the load-bearing piece behind deopt and tracing readiness; stencils that drift from it would not survive the boundary without per-handler fixup code.

v0.178.0 — Source-Map Scaffolding

The bytecode VM gains a per-fn (line, column) side table indexed by pc, populated by the compiler from each form's cons metadata. The table is allocated GC_T_RAW and walked through the bc record's GC mark hook; the source file is stored once on the table and consulted through a new mino_bc_source_lookup accessor that callers outside the bc module can use to attribute diagnostics back to a precise source position.

The bytecode dispatch loop publishes a current-pc cursor on the thread context (saved and restored around each mino_bc_run frame), so errors raised from primitives invoked by an opcode can resolve a precise source span when the surrounding eval frame's form has no line info. set_eval_diag falls back to the cursor when the explicit form is missing it; otherwise diagnostic behaviour is unchanged.

The data structure is the foundation for the runtime PC ↔ source mapping that downstream tiers will lean on for stack traces inside native code regions.

v0.177.0 — Lazy-Cell Allocation Probe

Re-ran MINO_BC_OP_COUNTS on lazy_bench.clj against the v0.176.0 binary.

Findings

On the C-backed (reduce + 0 (map inc (range 1000))) path, OP_MAKE_LAZY is 0.73% of dispatch (one cell per chunk, then unwound by pipeline_walk). On the pure-mino (lazy-seq ...) recursion path, OP_MAKE_LAZY is 44.4% of dispatch and the workload runs 70x slower than the C-backed equivalent.

The per-cell cost is dominated by the generic memset(h, 0, sizeof(*h) + size) (~96 bytes per cell) that gc_alloc_typed_inner does to keep freelist slots safe for the collector. A MINO_LAZY-specific freelist that knew the cell's fields could skip the memset and inline the field stores -- ~3x faster OP_MAKE_LAZY, mapping to ~25% wall-time win on the pure cohort.

Decision

Defer. The win is workload-narrow (pure-mino lazy-seq path that idiomatic users replace with map/filter/iterate). The freelist-with-known-fields refactor moves to the JIT-cycle backlog; pre-JIT optimisations should target hot pipelines, which already bypass MINO_LAZY allocation. Full writeup at .local/post-v0.176.0-lazy.md.

v0.176.0 — BigInt Fusion Bench

Added bigint_bench.clj ((reduce + 0N (range 1 1000)), (reduce *' 1 (range 1 1000)), etc.) to mino-bench and wired it into the matrix runner.

Measurements

| Bench | Per-op |
|---|---:|
| (reduce + 0N (range 1 1000)) | 566 ns |
| (reduce *' 1 (range 1 1000)) | 1.11 us |
| (reduce + 0 (range 1 1000)) (int control) | 4.7 ns |

Bigint paths are 120-235x slower per op than the int control. GC pressure (30% wall on add, 18% on mul) hints at a ~20-30% fusion win on the add chain.

Decision

Defer fusion. The opportunity is real but narrow, and the implementation would need a transient-bignum substrate that no non-bignum code shares. The bench stays in the matrix so the question is re-evaluated automatically each cycle. Full writeup at .local/post-v0.175.0-bigint.md.

v0.175.0 — Threaded Dispatch Probe

Ran MINO_BC_OP_COUNTS=1 against the v0.174.0 binary to substantiate the threaded-dispatch hypothesis. Findings: the top 10 bytecode opcodes account for 97.2% of dispatches and the top 4 alone (OP_MOVE, OP_CALL_CACHED, OP_GETGLOBAL_CACHED, OP_TAILCALL) carry 70.6%. The opcode distribution is the opposite shape (concentrated, not flat) of what threaded dispatch monetises; the branch predictor handles a 4-way-hot switch jump well.

Decision: stay on the switch dispatch. Instrumentation is left in place behind -DMINO_BC_OP_COUNTS=1 so a future workload shift can be re-evaluated. Full writeup at .local/post-v0.174.0-threaded-dispatch.md.

v0.174.0 — Type-Feedback IC Probe Re-Run

Re-ran the MINO_CALL_SITE_SHAPES=1 instrumentation that M3 used at v0.99 against the post-v0.173.0 binary on the test suite and the mino-bench arithmetic suites. Counts hot + 90%-monomorphic-int call sites that could in principle benefit from a speculative type-feedback inline cache.

Findings

| Workload | Hot sites (>=10k calls) | Hot + mono-int (>=90%) |
|---|---:|---:|
| tests/run.clj | 8 | 0 |
| reduce_int_bench | 1 | 0 |
| recur_shape_bench | 0 | 0 |
| protocol_bench | 4 | 1 (non-arith) |

The lone mono-int site (protocol_bench) resolved to a non-arith prim. Every hot canonical-arith path in the matrix is already specialised out of the call site layer by reduce_int_*, OP_LOOP_INT_LT[_INC], or the v0.173.0 range-direct pipeline.

Decision

Defer the type-feedback IC. There are no workload-substantiated hot mono-int arith sites for the IC to monomorphise. The instrumentation stays behind -DMINO_CALL_SITE_SHAPES=1 for future workloads. Full writeup at .local/post-v0.173.0-ic-probe.md.

v0.173.0 — Range-Direct Pipeline_Walk

pipeline_walk now recognises a bounded int-range source and drives the stages from an inline for (cur; cur < end; cur += step) loop. The range never has to be materialised into 32-int chunks via range_thunk -- each element is a tagged MINO_MAKE_INT(cur) with no allocation. Infinite ranges stay on the chunked path so take and friends can still terminate the walk.
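The shape of the range-direct loop can be sketched as follows. This is an illustration under stated assumptions: the stage enum, the `range_direct_sum` name, and the plain `int64_t` values are hypothetical stand-ins, and the sketch ignores overflow and tagged-value encoding, which the real path has to handle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stage kinds standing in for (map inc) and (filter odd?). */
typedef enum { STAGE_MAP_INC, STAGE_FILTER_ODD } stage_kind_t;

/* Sum the survivors of a bounded int range pushed through the stages.
 * Each element is produced by the loop counter itself, so no chunk or
 * lazy cell is ever built for the source. Overflow is not handled. */
int64_t range_direct_sum(int64_t start, int64_t end, int64_t step,
                         const stage_kind_t *stages, int n_stages)
{
    int64_t acc = 0;
    assert(step > 0); /* this sketch only covers ascending ranges */
    for (int64_t cur = start; cur < end; cur += step) {
        int64_t v = cur;
        int keep = 1;
        for (int s = 0; s < n_stages && keep; s++) {
            switch (stages[s]) {
            case STAGE_MAP_INC:    v += 1;              break;
            case STAGE_FILTER_ODD: keep = (v % 2 != 0); break;
            }
        }
        if (keep)
            acc += v;
    }
    return acc;
}
```

With a single `STAGE_MAP_INC` stage over `[0, 1000)`, the function mirrors `(reduce + (map inc (range 1000)))` and returns 500500.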

Measured impact (v0.172.0 -> v0.173.0)

| Workload | v0.172.0 | v0.173.0 | Delta |
|---|---:|---:|---:|
| (reduce + (map inc (range 100k))) | 4.12 ms | 0.81 ms | 5.09x |
| into-vec-pipeline (range 1k) | 155 us | 104 us | 1.49x |
| mapv-pipeline (range 1k) | 188 us | 112 us | 1.68x |
| filterv-pipeline (range 1k) | 78 us | 26 us | 3.00x |
| dorun-pipeline (range 1k) | 78 us | 22 us | 3.55x |

reduce_int_range, persistent map / set / vec, and protocol-dispatch benches all stable within 2% (the range-direct path only fires when an intermediate pipeline stage sits between range and consumer).

The full profiling write-up lives at .local/pipeline-walk-profile.md.

ASan clean. 1680 tests / 7831 assertions all green.

v0.172.0 — Builder-Rewrite Coverage Probe

Adds a build-flag-gated instrumentation pair to compile.c that counts every loop form's pass through try_builder_rewrite and classifies the misses by binding count and accumulator init. Built with -DMINO_BUILDER_REWRITE_COUNTS=1 and the env var MINO_BUILDER_REWRITE_DUMP=1, a one-shot test run produces the table shown below. The production binary is unchanged: every counter and the form-dump helper live behind the #ifdef.

Findings on the v0.171.0 binary

```
hits=12 misses=43 coverage=21.8%
```

Misses break down as:

| Misses | Bindings | Acc init |
|---:|---:|:---|
| 21 | 4 | non-collection literal |
| 9 | 2 | non-collection literal |
| 6 | 4 | [] |
| 3 | 6 | non-collection literal |
| 2 | 6 | [] |
| 2 | 4 | #{} |

The 33 misses with a non-collection acc init are correctly declined; they are not builder-pattern loops. The 10 misses with a literal [] or #{} init are candidates whose finer-grained shape (test predicate, recur step, or acc-read placement) takes them off the rewriter's narrow path. The v0.166.1 safety patch raises the stakes of widening the matcher to cover them: without source attribution, the audit cannot distinguish "rejected for correctness" from "rejected for narrowness." Decision: keep the matcher narrow. The full report lives at .local/builder-rewrite-coverage.md.

v0.171.0 — User-Fn-Wrapping-Prim Recogniser

compile_fn_literal now stamps a wraps_prim pointer on the fn template when the body is the canonical single-form, single-arg shape that just forwards to a primitive on its lone argument -- (fn [x] (inc x)), (fn [x] (odd? x)), and friends. The shape is deliberately narrow: single arity, no destructuring, no closure over the param, body is exactly one call form, the call target is a bare symbol resolving to a MINO_PRIM, and the call's lone arg is the param symbol itself.

pipeline_fast_callable (the pipeline_walk inner-loop callable recogniser) dereferences this field so the existing FAST_INC / FAST_DEC / FAST_ODD_P / FAST_EVEN_P / FAST_POS_P / FAST_NEG_P / FAST_ZERO_P inline paths kick in for fn-wrapped calls the same way they do for bare prim references. The fn keeps its identity for slow-path fallbacks (overflow, non-int) so behaviour matches the fn-wrap's actual body.

Measured impact (v0.170.0 -> v0.171.0)

(reduce + 0 (map (fn [x] (inc x)) (range 1000))):

| Variant | v0.170.0 | v0.171.0 |
|---|---:|---:|
| (map (fn [x] (inc x)) ...) | ~60 us | ~52 us (-13%) |
| (map inc ...) (bare) | ~57 us | ~52 us (-9%, noise) |

The wrap and bare cases now share the same inner-loop path. The absolute win is modest because the BC compiler already emits a tight (inc x) body for the wrap closure; the recogniser's value is closing the remaining apply_callable gap and matching bare performance for any future workload that traverses the same pipeline shape on a more-expensive prim.

ASan clean. 1680 tests / 7831 assertions all green (5 new tests assert semantic parity between fn-wrap and bare prim across map inc, map dec, filter odd?, and a non-matching shape that must not fire the recogniser).

v0.170.0 — In-Place Set Mutation And Set Into Fast Path

Mirrors the v0.169.0 map work onto sets. set_conj1_owned and set_disj1_owned route through the owner-tagged HAMT walks added last release; the persistent path stays in place as the fallback and is reached only when the transient's owner-id space has wrapped. mino_conj_bang on a MINO_SET transient and mino_disj_bang both dispatch to the owned variants when owner_id != 0.

prim_into for MINO_SET destinations also picks up the transient fast path (mino_transient + mino_conj_bang + mino_persistent), with gc_pin on the wrapping transient for the same -O2 register-scan reason as the map branch.

Measured impact (v0.169.0 -> v0.170.0)

| Bench | v0.169.0 | v0.170.0 | Delta |
|---|---:|---:|---:|
| into #{} from 1000-vec | 1.02 ms | 0.41 ms | -60% (2.52x) |
| transient conj! 1000 elements | 1.59 ms | 0.99 ms | -38% (1.59x) |
| into #{} (range 1000) | 1.23 ms | 0.95 ms | -22% (1.29x) |
| conj 1000 elements (persistent) | 1.59 ms | 1.56 ms | flat |
| contains? on 1000-set | 1.67 us | 1.67 us | flat |
| disj from 100-set (persistent) | 69 us | 69 us | flat |

ASan clean. 1679 tests / 7826 assertions all green (4 new regression tests cover in-place set batch correctness, mid-batch GC survival, and (into #{} ...) equivalence).

v0.169.0 — Hamt Owner Discipline And In-Place Map Mutation

assoc!, dissoc!, and (into {} ...) on map transients now route through an owner-tagged HAMT walk. mino_hamt_node_t carries a 32-bit owner field (gc-zero-init keeps every persistent allocation at owner = 0); the new hamt_assoc_owned / hamt_dissoc_owned mirror the persistent walks but mutate owner-matching nodes in place, cloning-with-owner only on the first touch. The slot writes route through a barrier that records the OLD-node -> YOUNG-slots edge so a long batch's mid-stride minor never sweeps a freshly-installed slots array.
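The owner discipline can be sketched on a toy node. Everything here is a hypothetical stand-in (`node_t`, `NODE_WIDTH`, `node_editable`, `node_set_owned`); the sketch models only the clone-on-first-touch rule and leaves out the HAMT bitmap walk and the GC write barrier entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NODE_WIDTH 4 /* toy fan-out, not mino's real width */

typedef struct node {
    uint32_t owner;             /* 0 = persistent: shared, never mutated */
    int      slots[NODE_WIDTH];
} node_t;

/* Return a node that `owner` may mutate: the node itself when its tag
 * already matches, otherwise a fresh clone stamped with the owner. */
node_t *node_editable(node_t *n, uint32_t owner)
{
    node_t *c;
    assert(owner != 0); /* owner 0 is reserved for persistent nodes */
    if (n->owner == owner)
        return n;       /* already ours: edit in place */
    c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    memcpy(c, n, sizeof *c);
    c->owner = owner;   /* first touch: clone with owner */
    return c;
}

/* Owned store: clones on the first touch, mutates in place afterwards.
 * The caller owns any clone returned and must free it. */
node_t *node_set_owned(node_t *n, uint32_t owner, int idx, int val)
{
    node_t *e = node_editable(n, owner);
    e->slots[idx] = val;
    return e;
}
```

The first store through a given owner pays one clone; later stores through the same owner return the same pointer, and the persistent node the batch started from is never modified.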

prim_into for MINO_MAP destinations also routes through mino_transient / mino_assoc_bang / mino_persistent instead of calling prim_assoc per element. The transient is gc_pin'd across the loop so the conservative C-stack scan can't lose the wrapper to a register-only optimisation at -O2.

Measured impact (v0.168.0 -> v0.169.0)

map_bench.clj:

| Bench | v0.168.0 | v0.169.0 | Delta |
|---|---:|---:|---:|
| into {} from 1000-pair vec | 1.45 ms | 0.45 ms | -69% (3.24x) |
| transient assoc! 1000 keys | 2.95 ms | 2.02 ms | -31% (1.46x) |
| assoc 100 keys (persistent) | 219 us | 215 us | flat |
| assoc 1000 keys (persistent) | 2.63 ms | 2.70 ms | +3% noise |
| get on 100-key map | 1.69 us | 1.72 us | flat |
| dissoc from 100-key map | 72.5 us | 71.3 us | flat |

Neighbor suites (vec_bench, reduce_int_bench, pipeline_consumers_bench, protocol_bench) within 2% of baseline.

ASan clean. 1675 tests / 7814 assertions all green (6 new regression tests cover in-place batch correctness, the flatmap-to-HAMT promotion boundary, mid-batch GC survival, and (into {} ...) equivalence with the persistent path).

v0.168.0 — Keyword-As-Fn Pipeline Fast Lane

(map :k coll) and (filter :k coll) previously went through apply_callable_argv per element, paying the keyword-as-fn dispatch cost on every record / map lookup. pipeline_fast_callable now recognises a MINO_KEYWORD callable as PIPELINE_FAST_KW, and the inline map / filter fast paths handle records (declared fields plus the ext-map fallback) and maps directly. The slow path stays in place for sorted-maps, transients, and any other coll-type that needs the full keyword-as-fn dispatch.

record_field_index (val.c) also gains a pointer-equality first pass over the field vector. Keywords are interned, so identical keywords share pointer identity; this resolves the hot path in a single load+compare instead of the byte-string memcmp that the previous code did per field.
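A sketch of the two-pass lookup, assuming a hypothetical `keyword_t` that carries only a name; the real field vector and keyword representation in val.c are not shown in this changelog, so the types and the `field_index` name are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword object; interning means that equal names
 * normally share a single object and therefore a single address. */
typedef struct { const char *name; } keyword_t;

/* Returns the index of k in fields, or -1 when it is absent. */
int field_index(const keyword_t *const *fields, size_t n,
                const keyword_t *k)
{
    size_t i;
    assert(k != NULL);
    /* Pass 1: pointer identity, one load and one compare per field. */
    for (i = 0; i < n; i++)
        if (fields[i] == k)
            return (int)i;
    /* Pass 2: byte-wise name comparison, kept as the fallback for a
     * keyword object that is not the interned instance. */
    for (i = 0; i < n; i++)
        if (strcmp(fields[i]->name, k->name) == 0)
            return (int)i;
    return -1;
}
```

When the lookup key is the interned instance, the first pass answers without touching the name bytes at all.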

Measured impact (v0.167.0 → v0.168.0)

protocol_bench.clj:

| Bench | v0.167.0 | v0.168.0 | Δ |
|---|---:|---:|---:|
| kw-fn-record-loop | 20.94 µs | 8.51 µs | -59% |
| proto-mono-area | 1.85 µs | 1.86 µs | flat |
| proto-bi-area | 2.35 µs | 2.37 µs | flat |
| proto-tri-area | 40.95 µs | 40.01 µs | -2% |
| proto-reduce-sum | 1.07 ms | 1.04 ms | -3% |

The protocol-dispatch rows are unaffected (they don't route through the pipeline apply_callable_argv path); the keyword-as-fn row drops 2.5× because the per-element keyword dispatch is now inlined.

Matrix neutral elsewhere within 2%. ASan clean. 1669 tests, 7700 assertions all green.

v0.167.0 — Forward-Counted Recur-Shape Fusion

Two new fused-loop opcodes, OP_LOOP_INT_LT and OP_LOOP_INT_LT_INC, catch the forward-counted recur shape that the existing decrement-based matcher missed:

```clojure
(loop [i 0] (if (>= i N) i (recur (inc i))))              ; 1-binding
(loop [i 0 j 0] (if (>= i N) j (recur (inc i) (inc j))))  ; 2-binding
```

try_compile_counted_loop now recognises (< c L), (<= L c), (>= c L), and (> L c) test shapes in either then/else order with (inc c) step. The limit operand is materialised into a fresh register at loop setup so any literal or outer-scope binding can serve as the limit. The slow path delegates to prim_lt / prim_inc so the canonical diagnostic still fires on non-int and overflow.

This closes the 0% coverage gap M5 surfaced: every bench loop today uses a 2- or 3-binding shape with (< i N) / (<=) / (>= i N) tests, none of which the prior (zero? i)/(dec i) matcher could take.

Measured impact (v0.166.1 → v0.167.0)

recur_shape_bench.clj (mino-bench, new in this release):

| Bench | v0.166.1 | v0.167.0 | Δ |
|---|---:|---:|---:|
| loop-1b-ge 1k | 13.16 µs | 2.58 µs | -80% |
| loop-1b-ge 10k | 143.2 µs | 15.5 µs | -89% |
| loop-1b-lt 10k | 132.5 µs | 21.4 µs | -84% |
| loop-1b-le 10k | 128.5 µs | 20.2 µs | -84% |
| loop-1b-gt 10k | 128.6 µs | 20.4 µs | -84% |
| loop-2b-ge 10k | 163.1 µs | 19.7 µs | -88% |
| loop-2b-lt 10k | 169.3 µs | 19.9 µs | -88% |

micro_bench.clj rows whose (loop [i 0] (if (>= i N) i (recur (inc i)))) shape now hits the fused opcode:

| Bench | v0.166.1 | v0.167.0 | Δ |
|---|---:|---:|---:|
| loop 1000 iters | 13.62 µs | 3.61 µs | -73% |
| loop 10000 iters | 131.1 µs | 14.0 µs | -89% |

Indirect impact: reduce + over 1000 ints (whose vector-reduce inner uses a (loop [i 0] (if (< i n) ... (recur (inc i) ...))) shape inside core.clj) drops 12.96 µs → 6.61 µs (-49%).

Matrix neutral elsewhere within 2%. ASan clean. 1669 tests, 7700 assertions all green.

v0.166.1 — Builder-Rewriter Safety Patch

try_builder_rewrite (introduced in v0.166.0) now declines whenever the loop body reads the accumulator outside the bare-exit branch. The rewrite reinterprets acc as a transient inside the loop body; mino's transient protocol covers count / nth / get but not seq / reduce / = / contains? / peek / empty?. A rewrite that exposed any of those reads to a transient would either throw or silently diverge from the persistent value the user wrote.

The conservative guard rejects when acc-sym appears anywhere in <test> or in any recur-arg position outside the recognized step (including in the step's own <x> / <k> / <v> sub-expressions). The bare-exit branch remains the only allowed acc read; the loop's result is the persistent wrap of the transient at exit.

A real example exists in src/core.clj's tree-seq-style helpers, which use the (if (contains? result p) ...) shape; the rewrite's exact-shape recognizer didn't fire on those today, but the gap was a foot-gun for any future Clojure code that lands on the recognizer's shape with an acc-touching <test>.

10 new regression tests in tests/transient_test.clj cover six unsafe shapes (acc in <test> through contains? / = / empty?+peek; acc in a non-step recur arg through count; acc in the step's <x> through peek; two acc-build steps in one recur) plus four safe shapes (canonical vec-conj, map-assoc, reversed then/else order, counter-bound recur).

Matrix neutral within 2%. ASan clean. 1669 tests, 7700 assertions all green.

v0.166.0 — Builder-Pattern Compile-Time Rewrite

compile_loop now recognises the canonical persistent-builder shape and rewrites it to the transient form before emitting bytecode. The pattern:

```clojure
(loop [<v1> <v1-init> ... acc []]
  (if <test>
    (recur <step1> ... (conj acc <x>))
    acc))
```

becomes:

```clojure
(persistent!
  (loop [<v1> <v1-init> ... acc (transient [])]
    (if <test>
      (recur <step1> ... (conj! acc <x>))
      acc)))
```

with sister forms covered (assoc builder step over {}; then/else branches in either order). The rewrite happens once, before emission, so all subsequent bytecode walks land on the transient variant.

This pattern was investigated and deferred at v0.160.0 -- on the old wrapper transients the rewrite was 2.5x slower than the persistent baseline because mino_conj_bang round-tripped through prim_conj's full path-copy. v0.165.0 supplied the in-place substrate; this release re-introduces the recogniser now that the substrate makes it pay off.

Measured impact (v0.163.0 → v0.166.0, defn-bound builder)

| Bench | v0.163.0 | v0.166.0 | Δ |
|---|---:|---:|---:|
| (loop ... (conj acc i)) N=1k | 893 µs | 228 µs | -74% |
| (loop ... (conj acc i)) N=10k | 9.18 ms | 2.58 ms | -72% |
| (loop ... (conj acc i)) N=100k | 92.0 ms | 26.8 ms | -71% |

The rewritten form runs within run-to-run noise of a hand-written transient builder ((persistent! (loop ... (conj! acc i)))), confirming the rewrite produces the same shape the substrate's in-place mutation has been waiting for.

Coverage: 17% of (loop ...) forms in lib/ and src/core.clj match the recogniser (M2). The rewrite is currently disabled inside macroexpanded fn literals (e.g. the inner fn that dotimes expands to), since their compilation route bypasses compile_loop; that case is queued for a later cycle alongside the broader fn-literal BC-pass coverage work.

Matrix neutral elsewhere within 2% noise. ASan clean. 1659 tests, 7690 assertions all green.

v0.165.0 — In-Place Transient Vector Mutation

(transient ...) / conj! / assoc! / pop! / persistent! over vectors now mutate owner-tagged trie nodes in place rather than path-copying on every step. Each (transient ...) mints a monotonic owner ID; the vector's trie and tail nodes carry a 32-bit owner field that the mutators check before deciding to clone or edit. The first edit through a fresh transient clones the touched node (and stamps it with the transient's owner); every subsequent edit through the same transient hits the in-place fast lane.

mino_vec_node's layout is unchanged in size: the formerly 4-byte unsigned count field is split into unsigned char count plus a 32-bit owner, with count capped well below 256 by MINO_VEC_WIDTH=32. The persistent code path (vec_conj1 / vec_assoc1 / vec_pop) clones the same number of bytes per node as before, so persistent-vector workloads stay within run-to-run noise of v0.164.0.

GC barrier discipline: in-place writes through vnode_slot_set route through gc_write_barrier, so an OLD owner-tagged node that ages across a minor GC during a transient batch keeps its remset entry consistent when freshly-allocated YOUNG values are stored into its slots. Without this barrier, a long-running transient build would silently lose any element conjed after the node aged.

Measured impact (v0.163.0 → v0.165.0)

Pipeline-fused vector consumers in pipeline_consumers_bench.clj:

| Bench | v0.163.0 | v0.165.0 | Δ |
|---|---:|---:|---:|
| into-vec-pipeline | 589 µs | 152 µs | -74% |
| mapv-pipeline | 623 µs | 191 µs | -69% |
| filterv-pipeline | 82 µs | 77 µs | -6% |
| dorun-pipeline | 78 µs | 79 µs | flat |

Plain (loop ... (conj! acc i)) over (range 100k) wrapped in persistent!: 113 ms (v0.163.0) → 89 ms (v0.165.0), -21%. Plain persistent (conj v ...) 1k loop is flat within noise.

Item C (profile-driven type-feedback IC) was deferred per M3's finding that no current workload has hot non-statically-promotable arith sites; the v0.163.0 IC framework already covers the substantiating case (protocol dispatch).

Matrix neutral elsewhere within 2% noise. ASan clean. 1659 tests, 7690 assertions all green.

v0.164.0 — Unboxed Int-Acc Reducer Fast Lane

Adds an unboxed long long accumulator path to the canonical numeric reducers (+, *, -, bit-and, bit-or, bit-xor). When (reduce <op> [init] coll) is invoked with a tagged-int accumulator and the collection iterates tagged-int elements, the inner walker runs entirely in long long arithmetic, falling back to the generic reduce_step path on the first overflow or non-int element so the numeric tower stays Clojure-correct.

The unboxed-acc machinery is plumbed through every walker entry point a (reduce ...) call can reach:

The entry points share a single reduce_ctx_t struct plus the reduce_ctx_init / reduce_ctx_step / reduce_ctx_finalize helpers. The reduce_step primitive is retained as the box-mode fallback so the numeric tower (BigInt promotion, float coercion, user reducers) is unchanged.
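The unboxed fast lane with its bail-out can be sketched for `+` as below. The `sum_result_t` and `sum_unboxed` names are hypothetical, and the sketch assumes the GCC/Clang `__builtin_add_overflow` builtin; how the real reduce_ctx_* helpers detect overflow is not stated in this changelog.

```c
#include <assert.h>
#include <stddef.h>

/* Result of the unboxed lane: the partial sum plus enough state for a
 * caller to resume on a generic (boxed) path after a bail-out. */
typedef struct {
    long long acc;        /* sum of the elements consumed so far */
    size_t    consumed;   /* how many elements were folded into acc */
    int       overflowed; /* 1 when the next add would not fit */
} sum_result_t;

/* Fold xs into acc using plain long long arithmetic and stop at the
 * first add that would overflow, leaving that element unconsumed. */
sum_result_t sum_unboxed(const long long *xs, size_t n, long long init)
{
    sum_result_t r = { init, 0, 0 };
    assert(xs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        long long next;
        if (__builtin_add_overflow(r.acc, xs[i], &next)) {
            r.overflowed = 1; /* caller resumes at xs[r.consumed] */
            break;
        }
        r.acc = next;
        r.consumed = i + 1;
    }
    return r;
}
```

Returning `consumed` alongside the flag lets a generic path pick up at the exact element that failed, so no element is folded twice or skipped.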

Measured impact (v0.163.0 → v0.164.0)

| Bench | v0.163.0 | v0.164.0 | Δ |
|---|---:|---:|---:|
| (reduce + vec-100k) | 505 µs | 264 µs | -48% |
| (reduce + list-100k) | 856 µs | 654 µs | -24% |
| (reduce + set-100k) | 466 µs | 235 µs | -49% |
| (reduce + vec-1k) | 10.6 µs | 9.1 µs | -14% |
| (reduce + (range 1m)) | 252 µs | 251 µs | flat (already optimal via reduce_int_range) |
| (reduce bit-or 0 vec-1k) | 12.4 µs | 10.3 µs | -17% |
| (reduce + (map inc (range 100k))) | 6.92 ms | 7.03 ms | flat (per-stage overhead dominates) |

Pipeline-reduce rows stay near baseline because tagged-int boxing is already free in current mino — the per-stage apply_callable dispatch is the residual cost there, not the reducer-side box. The remaining pipeline-reduce ceiling is a separate cycle's target.

Matrix neutral elsewhere within 2% noise. ASan clean. 1659 tests, 7690 assertions all green.

v0.163.0 — IC Resolve Path Consolidation

Pure refactor. Consolidates the three IC-cache consumers in the bytecode VM behind two shared resolve helpers:

GC side: the IC-slot walk that used to live as a duplicated loop in both arms of gc_trace_children (the MINO_FN walker and the GC_T_BC_FN walker) is centralised in a static gc_mark_bc_ic_slots(state, bc) helper. The slot-kind -> field mapping (GLOBAL kind walks sym + cached; PROTOCOL kind also walks atom + cached_map + cached_type) lives in one function, so the two passes can't drift if a future IC kind adds a field.

No behavior change, no new opcodes, no new IC kinds. The substantiation pays off when the next cycle adds a fourth IC consumer (e.g. a cached tail-call variant or a profile-driven specialisation) -- the new consumer plugs into the existing resolve / refill / GC-scan machinery instead of replicating any of it.

Matrix neutral as a refactor gate. Run-to-run noise on macOS is ~10% for the protocol benches; the readings below are the best of three runs to filter out runner noise:

| Bench | v0.162.0 | v0.163.0 | Δ |
|---|---:|---:|---:|
| empty fn call | 1588 ns | 1557 ns | flat |
| identity fn call | 1650 ns | 1610 ns | flat |
| 3-arg fn call | 1716 ns | 1733 ns | flat |
| let binding (5) | 1632 ns | 1618 ns | flat |
| fibonacci(20) | 820 µs | 809 µs | flat |
| map + filter + reduce | 8210 ns | 8130 ns | flat |
| proto-mono-area | 1994 ns | 2044 ns | flat |
| proto-bi-area | 2725 ns | 2803 ns | flat |
| proto-tri-area | 4998 ns | 5022 ns | flat |

v0.162.0 — Hot/Cold Bytecode Handler Partition

Splits the bytecode dispatch switch in src/eval/bc/vm.c along the hot/cold opcode partition surfaced by the op-count profile (the long tail of ~18 opcodes accounts for <1% of dispatches across the bench matrix). Cold opcodes -- OP_NOP, OP_GETGLOBAL (the uncached variant), OP_SETGLOBAL, OP_CALL (the uncached variant), OP_CLOSURE, OP_MAKE_LAZY, OP_PUSH_ENV / OP_POP_ENV / OP_ENV_BIND, the legacy OP_BINOP_INT, OP_POPCATCH, OP_PUSHDYN / OP_POPDYN, OP_THROW, OP_NTH_VEC, OP_EMPTY_VEC, OP_CONJ_VEC, OP_DISSOC -- move out of the dispatch switch into a static bc_cold_op helper called from the default: arm. The hot opcodes (move, load-k, getglobal-cached, jmp/jmpifnot, call-cached, protocol-call-cached, the int fast lanes, the loop-fused ops, the read-side small-prim fast lanes, assoc, tailcall, return, pushcatch) stay inlined.

OP_PUSHCATCH stays in the dispatch switch because its setjmp must execute in mino_bc_run's stack frame so the matching longjmp from a thrown exception unwinds back to the right landing pad. The cold-handler OP_THROW longjmp still targets that buf safely -- the jmp_buf lives in ctx->try_stack (heap-backed), and the longjmp unwinds the cold-handler frame on its way to the setjmp in mino_bc_run.

The refactor lets the main dispatch switch carry ~25 case labels instead of ~50, keeping clang's jump-table layout compact and the hot-op register allocation across iterations stable. Each cold opcode does enough per-invocation work (allocation, env mutation, exception unwind, longjmp setup) that the indirection cost amortizes for free, and the partition leaves room to add future hot opcodes (next cycle item: unified PGO substantiation) without bumping the dispatch switch back over clang's case-count tipping point that bit OP_TAILCALL_CACHED last cycle.
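The partition pattern can be sketched on a toy register machine. The `DEMO_OP_*` opcodes, `insn_t`, `cold_op`, and `run` are hypothetical and far smaller than the real opcode set; the sketch shows only the shape of a hot switch whose default: arm forwards to an out-of-line cold helper.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcode set: three "hot" ops handled inline, two "cold" ops
 * handled out of line. */
typedef enum {
    DEMO_OP_MOVE, DEMO_OP_ADD, DEMO_OP_RETURN, /* hot */
    DEMO_OP_NOP, DEMO_OP_NEG                   /* cold */
} demo_opcode_t;

typedef struct { demo_opcode_t op; int a, b; } insn_t;

/* Cold handler: its own switch, kept out of the hot dispatch loop.
 * Returns 1 when the opcode was handled, 0 when it is unknown. */
int cold_op(const insn_t *in, long *regs)
{
    switch (in->op) {
    case DEMO_OP_NOP: return 1;
    case DEMO_OP_NEG: regs[in->a] = -regs[in->b]; return 1;
    default:          return 0;
    }
}

/* Hot dispatch loop: only the frequent opcodes get case labels; the
 * rest reach cold_op through the default arm. Returns -1 on an unknown
 * opcode, which a real VM would report through an error channel. */
long run(const insn_t *code, long *regs)
{
    assert(code != NULL && regs != NULL);
    for (size_t pc = 0;; pc++) {
        const insn_t *in = &code[pc];
        switch (in->op) {
        case DEMO_OP_MOVE:   regs[in->a] = regs[in->b];  break;
        case DEMO_OP_ADD:    regs[in->a] += regs[in->b]; break;
        case DEMO_OP_RETURN: return regs[in->a];
        default:
            if (!cold_op(in, regs))
                return -1;
            break;
        }
    }
}
```

Adding another cold opcode changes only `cold_op`, so the hot switch keeps the same number of case labels.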

Matrix neutral as a measurement gate; the small speedups land on shapes where the hot ops fit in fewer cache lines after the ladder shrinks:

| Bench | baseline | v0.162.0 | Δ |
|---|---:|---:|---:|
| empty fn call | 1664 ns | 1588 ns | -5% |
| identity fn call | 1651 ns | 1650 ns | 0% |
| 3-arg fn call | 1770 ns | 1716 ns | -3% |
| let binding (5) | 1776 ns | 1632 ns | -8% |
| fibonacci(20) | 817 µs | 820 µs | 0% |
| map + filter + reduce | 8853 ns | 8210 ns | -7% |
| proto-mono-area | 2092 ns | 1994 ns | -5% |

fib(20) stays flat -- it is dispatch-bound and the hot ops it uses are the same case labels they were in the unpartitioned switch. The other rows pick up a few percent from clang's tighter codegen on the smaller switch.

v0.161.0 — Chunked-source Walk + Canonical-prim Stage Recognition

Two combined optimizations in pipeline_walk for the fused-pipeline path:

1. Chunked-source fast walk. When the unwound source is (or forces to) a MINO_CHUNKED_CONS, iterate the chunk's value array directly instead of calling seq_iter_val and seq_iter_next per element. The transition from chunked to any other shape (cons tail, vector, etc.) falls through to the existing seq_iter walk so non-chunked sources keep their semantics.

2. Canonical-prim stage recognition. Pre-resolve each MAP / FILTER stage's callable at walk-entry: when it's a MINO_PRIM whose argv-ABI fn2 pointer is one of the canonical prim_inc_argv / prim_dec_argv / prim_odd_p_argv / prim_even_p_argv / prim_pos_p_argv / prim_neg_p_argv / prim_zero_p_argv (covers inc, dec, odd?, even?, pos?, neg?, zero?), inline the operation on tagged int elements without going through apply_callable_argv. The tagged-int fast path uses one shift + one compare + one make-int per element; on overflow or non-int it falls back to apply_callable_argv so Clojure's promotion semantics stay correct. Vars are dereferenced up front, so a stage whose callable is a MINO_VAR pointing at the prim still hits the fast path.

A new pre-stage struct fast_kinds[] stores the resolved kind per stage. The per-element inner loop is unified across the chunked and seq-iter walks via pipeline_apply_stages, which uses the fast-kind classification when it fires and falls back to the existing apply path otherwise.

The wins are largest on shapes where the stage callable IS one of the recognized canonicals -- the common (map inc ...) / (filter odd? ...) / (map dec ...) combinations:

| Bench | baseline | v0.161.0 | Δ |
|---|---:|---:|---:|
| reduce + map inc (range 1m) | 91 ms | 74 ms | -19% |
| reduce + filter odd? (range 1m) | 95 ms | 86 ms | -9% |
| reduce + map dec . filter odd? . (range 1m) | n/a | 90 ms | new |

This falls short of the plan's 3-10x target because the per-iter floor in mino's bytecode VM (reduce_step's apply_callable_argv(+, ...)) dominates after the stage inlining. Closing the rest requires inlining the reduce step's + too, which lands as part of a later PGO substantiation cycle (v0.163.0) and the OP_TAILCALL_CACHED follow-on.

Non-canonical stages keep the existing apply_callable_argv path and pay one (cheap) function-pointer compare per stage at walk entry. Existing bench rows that don't go through pipeline_walk are unaffected: (reduce + 0 (range 1m)) stays on reduce_int_range's short-circuit path (270 µs).

Added

Verification: 1 659 tests / 7 690 assertions green.

v0.160.0 — Builder-pattern Recur Fusion: Investigation, Deferred

Spike of a compile-time transient rewrite for the canonical builder loop:

```clojure
(loop [i 0 acc []]
  (if <test>
    (recur (inc i) (conj acc x))
    acc))
```

The compiler would have recognized this exact shape and rewritten it to wrap acc in transient at entry, replace conj with conj! inside the recur step, and call persistent! at exit. The plan expected a 2-5x speedup on the bench (build 1000) from skipping per-iter persistent-vector path-copies.

Measurement shows a 2.5x slowdown, not a speedup. Three thousand warm iters of (build 1000) on Apple M-class hardware:

| Variant | mean ns/op |
|---|---:|
| baseline (no rewrite) | 241 - 264 |
| rewritten (transients) | 653 - 704 |

The cause is that mino's transient implementation in src/collections/transient.c does not actually mutate in place. The transient wrapper holds a current slot that gets reassigned on every conj!, but conj! itself calls prim_conj on the inner persistent collection and writes the freshly path-copied result back. Each conj! does strictly more work than a plain conj: prim-call dispatch overhead + a transient slot write on top of the same persistent-vector allocation.

A real in-place transient (mutable tail buffer for vectors, owner-tagged HAMT nodes for maps) is the prerequisite for this rewrite to pay off. That's a sizeable change to vec.c / map.c and deserves its own cycle. The recognizer + rewriter that this release would have shipped is documented in .local/v0_160_0-finding.md (gitignored) with a re-introduction checklist for the post-real-transients window.

This release is docs-only -- no behavior change, no opcode change, no public API delta. The version bump keeps the cycle numbering aligned with the original plan (v0.158.0 protocol IC -> v0.159.0 seq-fusion -> v0.160.0 [deferred] -> v0.161.0 chunked-seq).

Verification: 1 659 tests / 7 690 assertions green.

v0.159.0 — Pipeline Fusion For The Seq-consumer Surface

Extends v0.157.0's reduce-pipeline fusion to the rest of the seq-consumer primitives: into (vector target), mapv, filterv, and dorun. When any of these is called against a chain of lazy map / filter / take (the shape (->> coll (map ...) (filter ...) (take ...))), the consumer now walks the bottom source once through the unwound stages and applies its own step inline — same machinery prim_reduce already used, with the per-element accumulator hook abstracted behind a callback.

The refactor splits reduce_pipeline_walk into a generic pipeline_walk(src, stages, n_stages, step_fn, ctx, env) and keeps reduce_pipeline_walk as a thin wrapper. Each new consumer fast path prepends its own stage when relevant (mapv adds a MAP stage carrying its fn; filterv adds a FILTER stage carrying its predicate) and supplies a transient-vector step that conj-bangs survivors into the accumulator. into reuses the same step shape with the user-supplied target as the transient seed; dorun uses a no-op step and relies on the take-exhausted return code for early stop.

Lazy-seq cells along the map / filter / take chain are not allocated at all — the same alloc win the reduce path already enjoyed. The pre-check coll_is_pipeline_head keeps the cost of "non-pipeline source" calls unchanged.

| Bench | baseline | v0.159.0 | Δ |
|---|---:|---:|---:|
| into-vec-pipeline | 1 833 µs | 601 µs | -67% |
| mapv-pipeline | 1 610 µs | 590 µs | -63% |
| filterv-pipeline | 1 584 µs | 100 µs | -94% |
| dorun-pipeline | 1 503 µs | 101 µs | -93% |

All four pipelines walk (->> (range 10000) (map inc) (filter odd?) (take 1000)) into a fresh consumer per iter. The filterv / dorun deltas are the largest because their post-stage work is minimal: filterv with a predicate that rejects every survivor allocates nothing past the empty result vector, and dorun discards. The into / mapv deltas reflect the transient-vector accumulator carrying a real 1 000-element output but skipping the 3 000 intermediate lazy-seq cells.

Out of scope this release: prim_apply over a fused seq (the apply ABI runs through a cons spine, harder to short-circuit without breaking apply's other shapes); prim_into map / set / sorted target paths (each uses its own conj semantics, not the transient-vector step). transduce / doseq live in core.clj and get the fusion transitively via reduce.

Added

Verification: 1 659 tests / 7 690 assertions green.

v0.158.0 — Protocol-keyed Inline Cache

Cuts the cost of a direct protocol-method call site by roughly 2.0-2.3x on monomorphic and bimorphic sites. Each (area c) style call now compiles to a new opcode OP_PROTOCOL_CALL_CACHED (and a tail-position twin OP_PROTOCOL_TAILCALL_CACHED) that caches the resolved implementation keyed by the dispatch atom's deref'd map pointer and the first arg's type discriminator. On a hit, the call skips the protocol-dispatch trampoline entirely and hands the args straight to apply_callable_argv. No symbol resolution, no get against the dispatch map, no cons-spine traffic on the hot path.

The compile-time recognizer matches the macroexpanded shape of each defprotocol-emitted dispatcher fn (single-arity body (protocol-dispatch <atom> "<mname>" & params)) and resolves the dispatch atom at compile time. The IC slot type fans out through a kind discriminator added to mino_bc_ic_slot_t; GLOBAL slots (OP_GETGLOBAL_CACHED / OP_CALL_CACHED) keep their old {sym, cached, gen} discipline, PROTOCOL slots add {atom, cached_map, cached_type} and reuse cached for the impl fn.

Cache invalidation: pointer-comparison against the atom's current val field. Any swap! / reset! against the dispatch atom (which is exactly what extend-type / extend-protocol do) installs a fresh map; the next call misses and refills.
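A minimal sketch of that pointer-comparison check, under stated assumptions: atom_sketch_t, proto_ic_slot_t, proto_ic_lookup, and the array-of-function-pointers "dispatch map" are hypothetical stand-ins for the runtime's structures, and type discriminators are modelled as small ints.

```c
#include <stddef.h>

typedef int (*impl_fn_t)(int);              /* hypothetical impl signature */

/* Stand-in for the dispatch atom: holds the current (immutable) map. */
typedef struct { const void *map; } atom_sketch_t;

typedef struct {
    const void *cached_map;   /* dispatch-map pointer seen at fill time */
    int         cached_type;  /* first-arg type discriminator at fill time */
    impl_fn_t   cached;       /* resolved implementation */
} proto_ic_slot_t;

typedef impl_fn_t (*resolve_fn_t)(const void *map, int type);

/* Hit when both the map pointer and the arg type match the slot.
 * A swap!/reset! installs a different map pointer, so the comparison
 * fails and the slot is refilled from the slow path. */
static impl_fn_t proto_ic_lookup(proto_ic_slot_t *slot,
                                 const atom_sketch_t *atom, int type,
                                 resolve_fn_t slow, int *was_hit)
{
    if (slot->cached != NULL &&
        slot->cached_map == atom->map &&
        slot->cached_type == type) {
        *was_hit = 1;
        return slot->cached;
    }
    *was_hit = 0;
    slot->cached      = slow(atom->map, type);
    slot->cached_map  = atom->map;
    slot->cached_type = type;
    return slot->cached;
}

/* Demo slow path: the "map" is an array of impls indexed by type. */
static impl_fn_t resolve_from_table(const void *map, int type)
{
    return ((const impl_fn_t *)map)[type];
}

static int area_v1(int x) { return x * x; }
static int area_v2(int x) { return 2 * x; }
```

The check is sound only because the dispatch map is immutable: a different implementation set always arrives as a different map pointer.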

Tail-position protocol calls fan out through a direct apply_callable_argv call rather than the MINO_TAIL_CALL sentinel trampoline. The trade-off: a protocol method that tail-calls the same protocol method on a different value grows the C stack linearly. Self-tail-recursive protocols are rare enough that this is the right default for the common-case throughput win. Internal tail-recursion inside the impl body is unaffected — apply_callable_argv carries its own trampoline.

| Bench             | v0.157.1 | v0.158.0 |     Δ |
|-------------------|---------:|---------:|------:|
| proto-mono-area   | 4 832 ns | 2 092 ns |  -57% |
| proto-bi-area     | 5 426 ns | 2 691 ns |  -50% |
| proto-tri-area    |    43 µs |    46 µs | noise |
| proto-reduce-sum  |  1.13 ms |  1.18 ms | noise |
| kw-fn-record-loop |    21 µs |    23 µs | noise |

The mono- and bimorphic cases drop ~50-57% (roughly 2.0-2.3x faster). The megamorphic three-type case stays flat: case overhead dominates that bench, not protocol dispatch. The reduce-sum and kw-fn loops also stay flat: those pass area as a value through map, where dispatch goes through apply_callable_argv's slow path rather than the call-site IC. Closing that gap is a follow-up via per-protocol-fn IC.

Added

Verification: 1 659 tests / 7 690 assertions green.

v0.157.1 — Per-opcode Dispatch Counter Build Flag

Adds a MINO_BC_OP_COUNTS=1 build flag that wires a per-opcode dispatch counter into vm.c and dumps the totals to stderr at process exit. Useful for VM perf work: it answers "which opcodes actually dominate the dispatch loop?" without resorting to sample-based profiling. The flag adds one branch + one increment per dispatch when set; production builds (no flag) are byte-identical.

Build with the flag:

```
cc ... -DMINO_BC_OP_COUNTS=1 ... -o mino_opcounts ...
./mino_opcounts your_script.clj 2> opcounts.txt
```

Output is sorted by frequency and includes per-opcode and cumulative percentages. Captured findings from this build informed the post-v0.157.0 VM perf plan (in .local/, gitignored): 18 of 63 opcodes account for ~99% of dispatches across the microbench suite, which validates the hot/cold partition direction.
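How a compile-time counter of this kind can vanish from unflagged builds is sketched below. Only the MINO_BC_OP_COUNTS macro name comes from this entry; the three-opcode VM, COUNT_OP, run_sketch, and the unsorted dump format are hypothetical, and the real dump additionally sorts and accumulates percentages.

```c
#include <stdio.h>
#include <stdlib.h>

enum { OP_PUSH1, OP_ADD, OP_HALT, N_OPS };   /* hypothetical tiny opcode set */

#if defined(MINO_BC_OP_COUNTS) && MINO_BC_OP_COUNTS
static unsigned long op_counts[N_OPS];
static int op_dump_registered;

/* Runs at process exit; prints raw totals and each opcode's share. */
static void dump_op_counts(void)
{
    unsigned long total = 0;
    for (int i = 0; i < N_OPS; i++) total += op_counts[i];
    for (int i = 0; i < N_OPS; i++)
        fprintf(stderr, "op %d: %lu (%.1f%%)\n", i, op_counts[i],
                total ? 100.0 * (double)op_counts[i] / (double)total : 0.0);
}

#define COUNT_OP(op) do {                                        \
        if (!op_dump_registered) {                               \
            atexit(dump_op_counts);                              \
            op_dump_registered = 1;                              \
        }                                                        \
        op_counts[(op)]++;                                       \
    } while (0)
#else
#define COUNT_OP(op) ((void)0)   /* expands to nothing without the flag */
#endif

/* Toy dispatch loop: PUSH1 sets top to 1, ADD folds top into acc. */
static long run_sketch(const unsigned char *code)
{
    long acc = 0, top = 0;
    for (;;) {
        unsigned char op = *code++;
        if (op >= N_OPS) return -1;      /* reject before counting */
        COUNT_OP(op);
        switch (op) {
        case OP_PUSH1: top = 1;     break;
        case OP_ADD:   acc += top;  break;
        case OP_HALT:  return acc;
        default:       return -1;
        }
    }
}
```

Because the macro expands to `((void)0)` when the flag is absent, the dispatch loop compiles to the same code as a build that never mentioned the counter.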

Added

Verification: 1 659 tests / 7 690 assertions green on release.

v0.157.0 — Transducer Fusion For Reduce Pipelines

(reduce f init (->> src (map ...) (filter ...) (take ...))) no longer materialises the intermediate lazy seqs. prim_reduce now inspects its coll argument; when the outer cell is one of the canonical map / filter / take LAZY thunks, it unwinds the chain by walking the thunk pointers, collects the stages, and walks the bottom source element by element while applying each stage inline. Cons cells from the map/filter/take stages are not allocated at all on the fused path.

Per-element calls into a stage's fn / pred / counter route through apply_callable_argv instead of building a one-cell cons-spine per call. The five common numeric predicates (zero?, pos?, neg?, odd?, even?) gained argv-ABI variants so a (filter odd? ...) stage hits the cons-free fast path. inc, dec, +, and friends already had argv variants from previous cycles.

Soundness is automatic. The unwinder only matches LAZY cells whose c_thunk pointer is the exact canonical lazy_map1_thunk / lazy_filter_thunk / lazy_take_thunk. A user-defined shadow of map / filter / take produces LAZY cells with different thunks (or no LAZY at all), the unwinder stops, and the regular slow path takes over. reduced short-circuit, exception propagation, and chunked-source forcing all re-use the existing reduce_step and seq_iter machinery — the fusion is structurally a different walk order over the same primitives, not a new semantic.

What this leverages from Clojure: transducer-shaped pipelines are the canonical Clojure pattern for sequence processing precisely because the intermediate seqs are wasted work; the JVM Clojure core gets the same effect from transduce and IReduceInit. mino emits the lazy chain at construction time (matching seq-Clojure's implicit chain shape) but recognises it at consumption time inside reduce, so the user writes the natural ->> form and pays the fused cost.

Benchmark matrix on local Mac M-class, median of 5 full microbench passes (raw runs preserved in .local/bench/v0_157_0_after_run*.txt):

| Benchmark     | v0.156.0      | v0.157.0      | Δ           |
|---------------|---------------|---------------|-------------|
| pipeline-sum  | 93 520 ns     | 21 350 ns     | -77%        |
| fib-30        | 96 821 000 ns | 96 397 600 ns | flat        |
| loop-recur-1M | 18 574 500 ns | 18 565 000 ns | flat        |
| dissoc-map    | 939 ns        | 880 ns        | -6% (noise) |

Allocation profile for the same pipeline-sum driver (100 iters, mino_prof with -DMINO_ALLOC_PROFILE=1): total allocations dropped from 117 630 to 16 730 (-86%). Lazy-seq cells from the map / filter / take stages are gone; the remaining cons traffic is the chunked-cons walk through (range 1000)'s own backbone.

Added

Verification: 1 659 tests / 7 690 assertions green on release, ASan, UBSan.

v0.156.0 — Generic Get And Dissoc Fast Lanes

Two map-side coverage extensions on top of v0.154.0's read-side fast lane. (get coll k) on a MINO_MAP now fires the fast lane for any hashable key, not just keywords -- string-keyed config maps, int-keyed lookup tables, and symbol-keyed env maps no longer fall through to prim_get for the lookup. (dissoc m k) gets a dedicated arity-2 opcode that mirrors OP_ASSOC's shape: type-guards MINO_MAP, calls mino_map_dissoc1 directly, falls back through prim_dissoc on anything else (sorted-map, transient, nil, non-map) to preserve full Clojure semantics.

Record-side semantics are unchanged: declared fields are interned by keyword identity, so (get record key) still requires a keyword key to hit the slot-index path. Non-keyword keys on records fall through to prim_get, which scans the optional ext-map -- same behaviour as before. Variadic (dissoc m k1 k2 ...) keeps the OP_CALL path so prim_dissoc's loop handles the key list including absent-key short-circuits.

What this leverages from Clojure: maps treat any hashable value uniformly as a key (the HAMT's hash + equality contract doesn't care about the key's runtime type), so the keyword-only outer guard the v0.154.0 code carried was a coverage limit, not a semantic constraint. Dissoc on a map is a structural rebuild, so the operation itself is the bottleneck; the fast lane saves the prim's outer arg validation and cons traffic.

Benchmark matrix on local Mac M-class, min-of-5, with the empty-thunk harness floor subtracted:

| Benchmark   | v0.155.0 | v0.156.0 | Δ    |
|-------------|----------|----------|------|
| get-str-map | 145 ns   | 28 ns    | -81% |
| dissoc-map  | 1 183 ns | 930 ns   | -21% |
| get-kw-map  | 20 ns    | 21 ns    | flat |

Added

Verification: 1 659 tests / 7 690 assertions green on release, ASan, UBSan.

v0.155.0 — Inline-Cached Call Sites

Non-tail call sites whose head is a global symbol now compile to a new OP_CALL_CACHED opcode that fuses the name resolution and the dispatch into a single inline-cached step. The previous shape emitted OP_GETGLOBAL_CACHED to load the callee into a register and then OP_CALL to invoke it; the cached call collapses both into one opcode, drops the temporary fn-reg from the register window, and re-uses the same ic_slot discipline that already backs OP_GETGLOBAL_CACHED (per-site (sym, cached, gen) triple, global S->ic_gen invalidation on def / alter-var-root / ns-unmap / var-unintern / OP_SETGLOBAL).
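The (sym, cached, gen) discipline can be sketched as follows. This is a simplified model under stated assumptions: state_sketch_t, def_global, resolve_cached, and the fixed-size name table are hypothetical, and the dyn-stack and env-chain probes the real handler performs before reading the slot are omitted.

```c
#include <stddef.h>
#include <string.h>

#define MAX_GLOBALS 8   /* hypothetical fixed-size global table */

typedef struct {
    const char   *names[MAX_GLOBALS];
    void         *roots[MAX_GLOBALS];
    size_t        n;
    unsigned long ic_gen;        /* bumped by every rebinding path */
    unsigned long slow_lookups;  /* instrumentation for the sketch */
} state_sketch_t;

typedef struct {
    const char   *sym;     /* name resolved at this call site */
    void         *cached;  /* callee seen at fill time */
    unsigned long gen;     /* generation observed at fill time */
} ic_slot_t;

/* Stand-in for def / alter-var-root: rebind, then invalidate every
 * slot at once by bumping the global generation. */
static void def_global(state_sketch_t *S, const char *name, void *root)
{
    size_t i;
    for (i = 0; i < S->n; i++)
        if (strcmp(S->names[i], name) == 0) break;
    if (i == S->n) {
        if (S->n == MAX_GLOBALS) return;   /* table full: ignored here */
        S->names[S->n++] = name;
    }
    S->roots[i] = root;
    S->ic_gen++;
}

/* One fused step: a generation match means the cached callee is still
 * the current root; otherwise resolve by name and refill the slot. */
static void *resolve_cached(state_sketch_t *S, ic_slot_t *slot)
{
    if (slot->cached != NULL && slot->gen == S->ic_gen)
        return slot->cached;
    S->slow_lookups++;
    slot->cached = NULL;
    for (size_t i = 0; i < S->n; i++) {
        if (strcmp(S->names[i], slot->sym) == 0) {
            slot->cached = S->roots[i];
            break;
        }
    }
    slot->gen = S->ic_gen;
    return slot->cached;
}
```

A single global counter trades precision for simplicity: any rebinding invalidates every site, but the hot path stays at one integer comparison.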

Dynamic bindings and closure captures still shadow even on a hot call site: the handler probes the dyn stack and the env chain before reading from the slot, so (binding [*x* ...] (foo)) and the inner reference inside (fn [foo] (foo x)) keep their full semantics. Tail-position calls continue to emit OP_TAILCALL so the trampoline keeps the C stack flat; a cached tail variant is a follow-up.

What this leverages from Clojure: var rebinding is the only way a top-level callee resolution can change, and each rebinding path (def, alter-var-root, namespace unmap, var unintern) already bumps the IC generation. The cached pointer is therefore always the current var root for the duration of one IC generation, and the resolution that the previous OP_GETGLOBAL_CACHED performed to load the callee is exactly the resolution the call site needs.

Benchmark matrix on local Mac M-class, min-of-5, with the empty-thunk harness floor subtracted:

| Benchmark        | v0.154.0      | v0.155.0      | Δ    |
|------------------|---------------|---------------|------|
| fib-30           | 97 630 000 ns | 85 044 000 ns | -13% |
| loop-recur-1M    | 17 715 000 ns | 16 077 000 ns | -9%  |
| fn-call-identity | 63 ns         | 60 ns         | flat |
| call-noop-1M     | 16 ns         | 14 ns         | flat |

Added

Verification: 1 659 tests / 7 690 assertions green on release, ASan, UBSan.

v0.154.0 — Record Fast Path And Keyword-As-Fn Inlining

Two bytecode-VM tightenings around the most common record / map access patterns. (get coll :kw) and (:kw coll) now share one fast path that handles both maps and records: the map path is the existing HAMT lookup, the record path is a fixed-slot read after a declared-field index scan. The compiler also recognises the keyword-as-fn invocation shape (:kw coll) at emit time and turns it into the same fast lane that (get coll :kw) uses, so (.field record) style accessors no longer pay the apply-callable keyword-as-fn dispatch tax.

User shadows, sorted-coll keys, 3-arg get with a default, and records whose ext-map carries the key all fall through to prim_get, keeping their full Clojure semantics.

What this leverages from Clojure: records are immutable structs with a fixed declared-field set, so the slot index is stable for the lifetime of the record type. The keyword-as-fn invocation is a documented Clojure shape ((:k m) -> (get m :k)), and the compiler can statically rewrite a literal-keyword head into the get path without changing the surface contract.

Benchmark matrix on local Mac M-class, min-of-5, with the empty-thunk harness floor subtracted:

| Benchmark     | v0.153.0 | v0.154.0 | Δ    |
|---------------|----------|----------|------|
| get-kw-record | 125 ns   | 9 ns     | -93% |
| kw-fn-record  | 70 ns    | 11 ns    | -84% |
| kw-fn-map     | 78 ns    | 18 ns    | -77% |
| get-kw-map    | 22 ns    | 20 ns    | flat |

Added

Verification: 1 659 tests / 7 690 assertions green on release, ASan, UBSan.

v0.153.0 — Small-Prim Inlining For Vectors

The single-arg seq prims first, count, and empty? now compile to dedicated bytecode opcodes when the argument is a vector at runtime. Hot iteration patterns like (when-not (empty? v) (first v)) and (loop [i 0] (when (< i (count v)) ...)) no longer pay the call dispatch + argv cons + prim resolution cost on every iteration; instead the runtime walks a single type guard and reads the underlying field directly.

Misses fall through to the canonical prim: lazy seqs, chunked conses, strings, maps, sets, sorted-colls, host arrays, map entries, and nil / empty-list all keep their full Clojure semantics (lazy-seq forcing, string code-point decode, map-entry key, count-of-non-vector paths, etc.).

User shadows defeat the emission via the same head_is_canonical_pure_prim gate the existing read-side and write-side fast lanes use.

What this leverages from Clojure: first, count, and empty? are referentially transparent over immutable collections; on a vector the answer is a direct read of a stable field (vec.len, vec_nth(coll, 0)), so the fast lane is a pure value-rewrite of the call-site that preserves the surrounding semantics on miss.

Benchmark matrix on local Mac M-class, min-of-5, with the empty-thunk harness floor subtracted:

| Benchmark     | v0.152.0 | v0.153.0 | Δ     |
|---------------|----------|----------|-------|
| count-vec3    | 79 ns    | 5 ns     | -94%  |
| first-vec     | 85 ns    | 6 ns     | -93%  |
| empty?-vec    | 83 ns    | 0 ns     | -100% |
| nth-vec       | 7 ns     | 9 ns     | flat  |
| get-kw-map    | 19 ns    | 20 ns    | flat  |
| conj-vec      | 260 ns   | 251 ns   | flat  |
| assoc-vec     | 196 ns   | 235 ns   | flat  |
| fib-30        | 98.0 ms  | 96.1 ms  | flat  |
| loop-recur-1M | 17.9 ms  | 17.6 ms  | flat  |

(empty?-vec hits zero because the compiler folds the literal-true result into a const before the fast lane runs; the fast lane still fires when the operand isn't compile-time constant.)

Added

Verification: 1 659 tests / 7 690 assertions green on release, ASan, UBSan.

v0.152.0 — Write-Side Fast Lanes In The Bytecode VM

(conj v x) on vectors and (assoc coll k v) on vectors or maps now compile to dedicated bytecode opcodes instead of going through the generic OP_CALL path. The new opcodes mirror the read-side fast lanes shipped earlier for (nth v i) and (get m :k): a type-guarded fast path inside mino_bc_run calls directly into vec_conj1 / vec_assoc1 / mino_map_assoc1, and any miss -- list conj, set conj, sorted-coll, record, transient, variadic forms -- falls back to prim_conj / prim_assoc so the full Clojure-semantics path stays intact.

User shadows defeat the fast lane the same way they defeat the existing read-side lanes: head_is_canonical_pure_prim checks that the symbol still resolves to the canonical C prim before the compiler emits the specialised opcode.

What this leverages from Clojure: conj and assoc are referentially transparent over immutable collections, so the fast path is a pure value-to-value rewrite of the call-site; the IC-gen-based redefinition machinery means a later (def conj ...) invalidates compiled call-sites by re-compile, not by holding a stale fn pointer in any cached slot.

Benchmark matrix on local Mac M-class, min-of-5, with the empty-thunk harness floor subtracted:

| Benchmark     | v0.151.1 | v0.152.0 | Δ    |
|---------------|----------|----------|------|
| conj-vec      | 392 ns   | 260 ns   | -34% |
| assoc-vec     | 451 ns   | 196 ns   | -56% |
| assoc-small   | 623 ns   | 422 ns   | -32% |
| fib-30        | 98.2 ms  | 98.0 ms  | flat |
| loop-recur-1M | 17.9 ms  | 17.9 ms  | flat |
| nth-vec       | 11 ns    | 7 ns     | flat |
| get-kw-map    | 22 ns    | 19 ns    | flat |

(The benchmarks where the ns/op is in the single digits sit below the harness floor; numbers there report noise, not signal. The matrix rows that move are the ones the new opcodes target.)

Added

Verification: 1 659 tests / 7 690 assertions green on release, ASan, UBSan.

v0.151.1 — Embedding API Hardening

Five adversarial-test follow-ups on the v0.151.0 embedding-API revamp. Two NULL-input crashes in the public reader and string-eval entry points are gone, the iterator now walks sorted maps and sets, the protected-call _ex family delivers the raw thrown payload as documented, and mino_to_int accepts bigints so the bignum auto-promote round-trip closes.

Fixed

v0.151.0 — Embedding API Revamp And Stabilization

The C embedding surface in src/mino.h gets a substantial cleanup. The header trims to ~145 public functions (down from ~174), the body of struct mino_val becomes opaque to embedders, the pointer-tag scheme moves to a private companion header, the 22 parallel mino_install_<cap> entry points collapse to a single mino_install(S, env, caps) driven by a registry, and a small set of provisional surfaces is explicitly marked unstable.

This is a substantial reduction in the surface and a cleaner contract for embedders. The remaining unstable bits (GC tuning, thread-pool ABI, allocation profiler) are labelled as such so they can keep evolving without surprising callers.

Added: New Public Surface

mino_typeof for type dispatch without struct reach-in. A complete predicate grid (mino_is_int / _float / _string / _symbol / _keyword / _char / _bool / _vector / _map / _set / _fn / _macro / _prim / _lazy / _var / _bigint / _ratio / _bigdec / _uuid / _regex). Symmetric extractors (mino_to_keyword / _symbol / _char). Structured error access via mino_error_kind / mino_error_code / mino_clear_error. The eval-family _ex matrix (mino_eval_ex / mino_eval_string_ex / mino_load_file_ex) matching the existing mino_pcall precedent. Collection builders (mino_vector_builder_* / mino_map_builder_* / mino_set_builder_*) wrapping transients with explicit names. A unified collection iterator (mino_iter_init / _next / _done plus mino_iter_sizeof) that walks vectors, maps, sets, lists, lazy seqs, and chunked seqs through one API. Storage is host-owned, so embedders can stack-allocate the iterator with alloca(mino_iter_sizeof()) instead of paying a malloc per walk. mino_print_to_buf for embedders without a FILE *. mino_agent_deref for parity with atom / volatile / ref deref.

Fixed: Host Interop Resolves Across Namespaces

host/new, host/call, host/get, and host/static-call are installed under literal slash-names in clojure.core; with the namespace-first model a user-namespace caller's env no longer chains to clojure.core, so the interop special forms ((new Type ...), (.method t ...), (.-field t), Type/static) would fail to find their backing primitive. Both the tree-walker's eval_try_host_syntax and the BC compiler's dispatch now route host sugar through a direct clojure.core lookup. Plain qualified source like (host/new :Foo) also resolves from any namespace.

Changed: `mino_int` Auto-Promotes To Bigint With Bignum

mino_int(S, n) now checks MINO_CAP_BIGNUM for the tag-overflow path. With bignum installed, values outside the 61-bit tag range return a MINO_BIGINT instead of a boxed MINO_INT, so embedders that opt in to arbitrary-precision arithmetic get a single int family that grows past 64 bits transparently. Without bignum the boxed fallback stays in place and mino_int is total over long long as before. Runtime internals (reader, bit primitives, unchecked-*, tower-terminal coercion) keep Clojure-style "long stays long" semantics by routing through a private mino_int_wrap constructor.

Changed: Data-Driven Capability Install

mino_install(S, env, caps) takes a MINO_CAP_* bitmask and dispatches through a registry table that owns each capability's install function. Three named presets cover the common shapes: mino_install_minimal (floor only), mino_install_sandbox (floor + canonical Clojure-core + safe libs), mino_install_all (every capability). New bits cover the bundled stdlib namespaces (MINO_CAP_STRING_LIB, MINO_CAP_SET_LIB, MINO_CAP_WALK, MINO_CAP_EDN, MINO_CAP_PPRINT, MINO_CAP_ZIP, MINO_CAP_DATA, MINO_CAP_TEST, MINO_CAP_REPL_LIB, MINO_CAP_DATAFY, MINO_CAP_INSTANT, MINO_CAP_SPEC, MINO_CAP_TOOLING). MINO_CAP_DEFAULT and MINO_CAP_ALL macros name the common combinations. MINO_CAP_FLOOR is always installed implicitly. mino_install is idempotent and skips re-evaluating core.clj on subsequent calls so adding capabilities to a running state cannot shadow user bindings.
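The registry-driven shape can be sketched like this. Everything here is a hypothetical model: the CAP_* bit values, env_sketch_t, cap_entry_t, and install_sketch are illustrative names, not the real MINO_CAP_* constants or the mino_install signature.

```c
#include <stddef.h>

/* Hypothetical capability bits for the sketch. */
#define CAP_FLOOR      (1u << 0)
#define CAP_STRING_LIB (1u << 1)
#define CAP_EDN        (1u << 2)

typedef struct {
    unsigned installed;      /* bits already installed into this env */
    int      install_calls;  /* instrumentation for the sketch */
} env_sketch_t;

typedef void (*install_fn_t)(env_sketch_t *);

/* Private registry row: the install function lives here rather than in
 * any public info struct. */
typedef struct {
    const char  *name;
    unsigned     bit;
    install_fn_t fn;
} cap_entry_t;

static void count_install(env_sketch_t *env) { env->install_calls++; }

static const cap_entry_t registry[] = {
    { "floor",      CAP_FLOOR,      count_install },
    { "string-lib", CAP_STRING_LIB, count_install },
    { "edn",        CAP_EDN,        count_install },
};

/* Install every requested capability that is not installed yet. The
 * floor is always added, and repeated calls skip finished work. */
static void install_sketch(env_sketch_t *env, unsigned caps)
{
    caps |= CAP_FLOOR;
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++) {
        const cap_entry_t *c = &registry[i];
        if ((caps & c->bit) != 0 && (env->installed & c->bit) == 0) {
            c->fn(env);
            env->installed |= c->bit;
        }
    }
}
```

Tracking installed bits per env is what makes a second call cheap and safe in this model: capabilities already present are never re-run.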

Renamed: `mino_new` → `mino_env_new_default`

mino_new read like a value constructor. The replacement mino_env_new_default allocates a new env and installs the sandbox preset in one call. No compat shim — hosts that want a specific tier should call mino_env_new + mino_install(S, env, caps).

Removed: Public Per-Capability Install Entry Points

The 22 mino_install_<cap> functions leave the public header; their implementations move behind the dispatch registry. Hosts that need fine-grained control use mino_install(S, env, MINO_CAP_<CAP>). The install_fn field on mino_capability_info_t is gone — the struct is now just {name, bit, summary}.

Removed: Struct Reach-In, Tag Macros, Test Helpers

The body of struct mino_val, MINO_TAG_*, MINO_MAKE_*, MINO_*_VAL, the host_array_kind_t enum, fault-injection helpers, and the chunked-seq constructors leave the public header. They live in a new private companion header src/mino_internal.h used by runtime internals and whitebox tests. Embedders go through the public predicate / extractor / iterator surface instead.

Marked Unstable: GC, Thread Pool, Allocation Profiler

Three sections of the public header now carry explicit [MINO_UNSTABLE_*] banners and rendered "subject to change" badges on the documentation site. The contracts are functional but provisional and may evolve in subsequent releases; symbols outside these blocks aim for source stability.

Verification

Built and ran the full test suite (1659 tests, 7690 assertions, 0 failures) under both AddressSanitizer and UndefinedBehaviorSanitizer. Built representative mino-examples binaries (cookbook/iterate, cookbook/build_collections, cookbook/error_handling, cpp_embed_test, api_stress_test) under ASan and ran each end to end — clean.

v0.150.0 — Stabilization Cycle: Realloc Safety, Checked-Size Arithmetic, And Embed-Test Tagging Fixes

Stabilization cycle landing the verified-real findings from a ship-readiness review. Highlights: nine more realloc overwrite leaks in src/prim/string.c follow the canonical temp-pointer pattern; a new checked_add_sz / checked_mul_sz / checked_double_sz helper trio guards growth arithmetic across env, state, module, eval/read, and prim/io; gc_pin now asserts on overflow in debug builds; mino_safepoint_poll becomes a static inline; add-load-path! swaps strcpy for memcpy; a stale TODO on the dyn_snapshot field is replaced with a description of what the field actually holds. embed_stm_test no longer crashes on small-integer agent values -- the test was reading the inline-tagged scalar as if it were a boxed pointer; six sites now route through the public mino_to_int API.

Fixed: `embed_stm_test` Crashed Reading Inline-Tagged Ints

tests/embed_stm_test.c accessed a->as.agent.val->type and a->as.agent.val->as.i directly across the test_c_api_agents block, and args->as.cons.car->type / ->as.i inside the prim_test_sleep C-side primitive. When the agent's value was a small integer the runtime stored it inline-tagged rather than as a boxed pointer, so dereferencing the "pointer" landed on whatever address the tag bits encoded. Stripped binaries crashed with EXC_BAD_ACCESS at the encoded address (0x11 = decimal 17, which is the tag for the integer 2 -- the test's expected post-inc value). All six unsafe sites now route through mino_to_int, the same public API the rest of the test already uses for the cross-state agent check. No runtime change.
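To see why the dereference landed on 0x11, here is one encoding that reproduces the figure. The layout is an assumption: three low tag bits with tag value 1 for small integers is merely consistent with "0x11 is the tag for the integer 2" and with the 61-bit range mentioned elsewhere in this changelog; the real scheme lives in the private header and may differ. All names are hypothetical.

```c
#include <stdint.h>

/* ASSUMED layout: 3 low tag bits, tag value 1 marks a small integer. */
#define TAG_BITS 3u
#define TAG_MASK ((uintptr_t)7)
#define TAG_INT  ((uintptr_t)1)

typedef uintptr_t val_word_t;   /* a value slot: boxed pointer or tagged scalar */

static val_word_t make_small_int(long n)
{
    return ((uintptr_t)n << TAG_BITS) | TAG_INT;
}

static int is_small_int(val_word_t w)
{
    return (w & TAG_MASK) == TAG_INT;
}

/* Right-shifting a negative signed value is implementation-defined in C;
 * mainstream compilers use an arithmetic shift, which this relies on. */
static long small_int_value(val_word_t w)
{
    return (long)((intptr_t)w >> TAG_BITS);
}

/* Safe accessor in the spirit of a public extractor: test the tag
 * before interpreting the word, and never dereference a tagged scalar.
 * Returns 1 and stores the integer on success, 0 otherwise. */
static int to_int_sketch(val_word_t w, long *out)
{
    if (is_small_int(w)) {
        *out = small_int_value(w);
        return 1;
    }
    return 0;   /* boxed path omitted from the sketch */
}
```

Under this layout `make_small_int(2)` is the word 0x11, so code that treats the slot as a struct pointer and reads a field through it faults at or near address 0x11.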

Fixed: Stale `TODO` On The `dyn_snapshot` Field

The mino_future_t::dyn_snapshot field in src/runtime/internal.h was commented /* TODO: dyn-var conveyance */, but the conveyance had in fact been implemented in src/runtime/host_threads.c (snapshotting in mino_future_spawn, unpacking into a dyn_frame on the worker side, plus GC tracing in gc/driver.c and gc/minor.c). The comment now describes what the field actually holds.

Changed: `add-load-path!` Uses `memcpy` Now That The Length Is Known

prim_add_load_path! in src/prim/module.c allocated malloc(strlen(path) + 1) and then ran strcpy(dup, path). The length is already in hand at the malloc, so reusing it via memcpy(dup, path, path_len + 1) avoids a second pass over the string and removes a strcpy from the codebase. No behavior change.

Changed: `mino_safepoint_poll` Is Now A `static inline` Function

The safepoint-poll fast path lived as a function-style macro at the top of src/runtime/internal.h. Behaviorful macros are harder to step through under a debugger and don't get the same type-checking as real functions. The body is now a static inline taking mino_state_t *S, which inlines identically at every call site (gc/driver.c, eval/special.c, eval/bindings.c, eval/fn.c) and gives debuggers a real stack frame to break on. No behavior change.

Fixed: `gc_pin` Overflow Now Trips An Assert In Debug Builds

gc_pin keeps its pin/unpin counter balanced even after the 64-slot save array fills up -- by design, so a runaway pin doesn't crash the host -- but the slots past 64 were dropped silently. A deeply-nested test that ever reached past 64 would lose liveness protection without any signal. The macro now begins with assert(gc_save_len < GC_SAVE_MAX), which fires loudly in debug / sanitizer builds while release builds keep the documented soft-fail counter contract intact. The full mino test suite and the ASan binary run clean against the assert; no current path reaches the cap.

Fixed: Unchecked Format-Buffer Growth In `src/prim/io.c`

The pr/prn readably-print path appended each formatted argument's bytes into a heap buffer using len + formatted->as.s.len and nc *= 2 with no overflow guard. A pathological huge-value print could have wrapped either the size sum or the doubling loop and under-allocated the destination buffer. Both are now gated by checked_add_sz / checked_double_sz, with the existing "print: out of memory" diagnostic raised on overflow.

Fixed: Unchecked Element-Array Sizing In `src/eval/read.c`

The anonymous-fn (#(...)) expansion path in normalize_percent allocates a heap items buffer of len * sizeof(*items) when the vector overflows the 64-slot stack-fallback array. The multiplication had no overflow guard; for an attacker-controlled reader input large enough to wrap, the buffer would have been under-allocated and the subsequent fill loop would have written past it. The path now pre-computes the allocation size through checked_mul_sz and routes overflow into the existing reader OOM diagnostic.

Fixed: Unchecked Growth Arithmetic In `env`, `state`, And `module`

Several internal dynamic-growth paths computed cap * 2, len + add + 1, or cap * sizeof(T) directly with no SIZE_MAX guard. On wraparound the runtime would have under-allocated and silently corrupted the following memcpy or fill. A new trio of static inline helpers -- checked_add_sz, checked_mul_sz, checked_double_sz -- in src/runtime/internal.h returns 1 on success / 0 on overflow; each caller routes the overflow case into the same OOM diagnostic path it already uses for realloc / gc_alloc_typed failure. To make the GC allocator's existing throw reachable from the non-GC paths, gc_oom_throw is no longer static and is declared in src/gc/internal.h.
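A minimal sketch of such a helper trio follows. The names and the 1-on-success / 0-on-overflow convention come from the entry above; the out-parameter signatures and the grow_cap_sketch caller are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Each helper stores the result in *out and returns 1, or returns 0
 * without touching *out when the result would exceed SIZE_MAX. */
static inline int checked_add_sz(size_t a, size_t b, size_t *out)
{
    if (a > SIZE_MAX - b) return 0;
    *out = a + b;
    return 1;
}

static inline int checked_mul_sz(size_t a, size_t b, size_t *out)
{
    if (b != 0 && a > SIZE_MAX / b) return 0;
    *out = a * b;
    return 1;
}

static inline int checked_double_sz(size_t a, size_t *out)
{
    return checked_mul_sz(a, 2, out);
}

/* Hypothetical caller: compute the byte size of a doubled array, or
 * report failure so the caller can take its existing OOM path. */
static int grow_cap_sketch(size_t cap, size_t elem_size,
                           size_t *new_cap, size_t *new_bytes)
{
    size_t c, bytes;
    if (!checked_double_sz(cap, &c)) return 0;
    if (!checked_mul_sz(c, elem_size, &bytes)) return 0;
    *new_cap = c;
    *new_bytes = bytes;
    return 1;
}
```

Testing before the operation matters here: unsigned overflow wraps silently, so checking the result afterwards would be too late for the multiplication case.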

Guarded sites: env_ht_rebuild and env_bind_impl in src/runtime/env.c; dup_str and runtime_module_add_alias in src/runtime/module.c; the REPL line-append buffer growth in mino_repl_eval in src/runtime/state.c. The REPL site is the one input-controlled case (a host could feed a very large line into the REPL); the others guard wraparound classes that are extremely hard to reach in practice but were called out by the project's own runtime rules.

Fixed: Nine More `realloc` Overwrite Leaks In `src/prim/string.c`

v0.149.1 swept the fmt_ensure helper but left nine sister sites across the other string builders -- (prn ...), (str ...), the sequence-(str/join ...) path, (str-replace ...), and the default-branch printer fallback. Each one used buf = (char *)realloc(buf, cap); if (buf == NULL) return NULL;, which clobbers the still-valid buf with NULL on failure and leaks the partially-built buffer. The branches now take the canonical newbuf = realloc(buf, ...); if (newbuf == NULL) { free(buf); ... return NULL; } buf = newbuf; shape, matching the precedent in src/prim/proc.c that v0.149.1 called out. Sites: string.c:300, 593, 666, 679, 846, 875, 942, 1026, 1047 (pre-edit line numbers). The BIGINT branch at line 875 already freed digits on failure -- it now frees buf as well. macOS ASan ships without LSan, so the leak is not directly observable from the project's test runner; correctness here is by static reading. The existing tests/fault_inject_test.clj happy paths stay green.
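The temp-pointer shape reads as follows in a self-contained form. buf_append_sketch is a hypothetical helper written for illustration, not a function from src/prim/string.c; only the newbuf / free-on-failure pattern is taken from the entry above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Append n bytes of src to a NUL-terminated growable buffer.
 * Returns 1 on success. On failure, frees *buf, sets it to NULL, and
 * returns 0, so the caller never holds a leaked or dangling buffer. */
static int buf_append_sketch(char **buf, size_t *len, size_t *cap,
                             const char *src, size_t n)
{
    if (*len >= SIZE_MAX - 1 || n > SIZE_MAX - 1 - *len) {
        free(*buf);
        *buf = NULL;
        return 0;
    }
    size_t need = *len + n + 1;
    if (need > *cap) {
        size_t newcap = *cap ? *cap : 16;
        while (newcap < need) {
            if (newcap > SIZE_MAX / 2) { newcap = need; break; }
            newcap *= 2;
        }
        /* Keep the old pointer until realloc has succeeded: assigning
         * the result straight to *buf would lose it on failure. */
        char *newbuf = realloc(*buf, newcap);
        if (newbuf == NULL) {
            free(*buf);
            *buf = NULL;
            return 0;
        }
        *buf = newbuf;
        *cap = newcap;
    }
    memcpy(*buf + *len, src, n);
    *len += n;
    (*buf)[*len] = '\0';
    return 1;
}
```

A caller starts with a NULL buffer and zero length and capacity, appends as needed, and frees the buffer once at the end; the failure branch has already released it if any append returned 0.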

v0.149.1 — Hash Contract, Sorted-Collection Counts, Error-Path Metadata, And OOM Cleanup

Fixed: Const-Qualifier Mismatch In Arity-Mismatch Diagnostic Helpers

The recent arity-mismatch diagnostic helpers in src/eval/bindings.c, src/eval/fn.c, and src/eval/bc/vm.c copied mino_current_ctx(S)->eval_current_form into a non-const local. The field is const mino_val_t *; the assignment dropped the qualifier, which trips -Werror on the strict Makefile bootstrap build (Apple clang 17). The task-runner build path uses softer flags and didn't surface the warning. Receiving locals now match the declared type.

Fixed: `catch` Dropped Metadata From The Thrown Value

(let [c (try (throw (with-meta (ex-info "x" {}) {:my :m})) (catch e e))] (meta c)) returned nil instead of {:my :m}. normalize_exception in src/eval/control.c builds a fresh diagnostic map whenever the throw isn't already a :mino/kind-tagged value, and the fresh map had no metadata -- silently dropping anything the user (or ex-info's 3-arity cause-via-meta path) attached. Now the normalizer copies ex_val->meta onto the diagnostic when the thrown value is a pointer-tagged allocation (tagged primitives like ints can't carry meta and would segfault on deref). A regression test in tests/reader_macros_test.clj (caught-exception-preserves-metadata) pins the round-trip for both ex-info and plain-map throws.

Fixed: `ex-info` Now Accepts A 3-Arity `(ex-info msg data cause)`

(ex-info "outer" {} (ex-info "root" {})) -- a routine Clojure idiom for building error cause chains -- raised fn 'ex-info' arity mismatch: got 3, expected 2. The defn in src/core.clj only had the 2-arg form. The 3-arity now attaches the cause via metadata, which the existing ex-cause already walks; the visible map shape stays identical to the 2-arity form so the existing ex-info-creates-map and ex-cause-from-data-or-meta tests still pass unchanged. New regression tests in tests/reader_macros_test.clj (ex-info-three-arg-attaches-cause) pin (ex-message e), (ex-data e), and (ex-cause e) for the 3-arity output, plus a 3-deep cause chain walk.

Fixed: `(sh ...)` Routed `pclose`'s `-1` Sentinel Through `WIFEXITED`

prim_sh in src/prim/proc.c jumped straight from status = pclose(fp); into WIFEXITED(status) without first guarding against the -1 sentinel. pclose (and _pclose on Windows) returns -1 when it could not obtain the subprocess's wait status -- an interrupted wait, a child that was reaped elsewhere, or any other underlying waitpid failure. POSIX does not define the result of WIFEXITED(-1), and the macro produced nonsense values that flowed into the caller's :exit map entry. The teardown failure was silently dropped and the shell call appeared to have "succeeded" or "failed weirdly". The branch now detects the -1 sentinel, frees the just-read stdout buffer, and raises an io / MIO001 error so the caller sees the teardown failure. No mino-level regression test is feasible (forcing pclose failure from script isn't easy); the fix is by static reading and the existing tests/proc_test.clj happy paths stay green.
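The guard order can be sketched with plain POSIX calls. run_and_get_exit is a hypothetical helper, not prim_sh; it discards the child's output and reports failure through its return value instead of raising an io error.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/* Run cmd through the shell and fetch its exit code.
 * Returns 1 and stores the code on a normal exit; returns 0 when the
 * process could not be started, its wait status could not be obtained,
 * or it did not exit normally. */
static int run_and_get_exit(const char *cmd, int *exit_code)
{
    FILE *fp = popen(cmd, "r");
    if (fp == NULL) return 0;

    char chunk[256];
    while (fread(chunk, 1, sizeof chunk, fp) > 0) {
        /* output discarded in this sketch */
    }

    int status = pclose(fp);
    if (status == -1) return 0;        /* check the sentinel first */
    if (!WIFEXITED(status)) return 0;  /* e.g. terminated by a signal */
    *exit_code = WEXITSTATUS(status);
    return 1;
}
```

The wait-status macros are only meaningful on a status that pclose actually obtained, which is why the -1 test has to come before WIFEXITED.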

Fixed: `(sh ...)` Leaked Its Working Buffer When `realloc` Failed

Two sites in src/prim/proc.c used buf = realloc(buf, cap); if (buf == NULL) return NULL;. When realloc fails the C standard leaves the original buf valid, but the assignment clobbers the variable with NULL and the caller has no way to free the original. This affected the command-line build path (build_command, growing the escaped-arguments string) and the subprocess-stdout read path (read_all). Both now use a temporary newbuf, free buf on failure, and only assign on success -- matching the pattern already used at prim_sh line 190 for the 2>&1 realloc. macOS ASan ships without LSan, so the leak is not directly observable from the project's test runner; correctness here is by static reading.

Fixed: `fmt_ensure` Leaked Its Input Buffer When `realloc` Failed

fmt_ensure in src/prim/string.c is the growth helper for (format ...). Callers use the canonical buf = fmt_ensure(...); if (buf == NULL) return NULL; pattern, which relies on the helper leaving no resources behind on failure. The realloc-failure branch returned NULL without freeing the still-valid input buffer -- so on OOM during a long format call, the partially built buffer was leaked. The sibling size-overflow branch a few lines above already does the right thing; this one was missed. The branch now frees the input before returning, matching the contract callers already assume. macOS ASan ships without LSan, so the leak is not directly observable from the project's test runner; correctness here is by static reading. A new fi-format-recovers contract test under tests/fault_inject_test.clj locks in that (format ...) under simulated OOM raises a catchable exception.

Fixed: Stale Comment Claimed The Safepoint Yield Flag Was Never Set

The block comment above mino_safepoint_poll in src/runtime/internal.h claimed that "nothing sets the flag, so park is unreachable on the live execution path." That stopped being true when gc_request_stw / gc_release_stw in src/runtime/state.c were wired up to flip mino_current_ctx(S)->should_yield around major collections. Anyone reading the safepoint protocol from the header was being told the wrong thing. The comment now describes the actual contract: the flag is set on the main ctx around major collections, the park slow path is reachable, and a future multi-worker variant would set the flag on peer ctxs instead of (or in addition to) the self-set.

Fixed: `hash` On Sequential Collections Violated Equal-Implies-Equal-Hash

(= [0 1 2] (list 0 1 2) (seq [0 1 2]) (cons 0 (cons 1 (cons 2 nil)))) is true across all four representations, but hash_val had independent per-type branches for MINO_VECTOR, MINO_CONS, MINO_EMPTY_LIST, and MINO_CHUNKED_CONS -- each producing a different hash for the same sequential content. The branch comment at MINO_EMPTY_LIST already acknowledged the gap. User-observable consequence: a sequential value used as a key in a HAMT-sized hash-map could not be retrieved by an equal-but-distinct sequential value of a different representation: (get {[0 1 2] :found} (list 0 1 2)) returned nil.

A new static hash_sequential helper now walks any sequential value under a unified scheme: FNV basis, tag byte 0x09 (the byte already used by MINO_VECTOR and MINO_MAP_ENTRY, so existing vector hashes and the 2-vector / map-entry parity are preserved), then per-element fold via hash_uint32_bytes(h, hash_val(elem)). Empty vector, empty list, (seq ...) output, and a realized empty lazy-seq all collapse to the same constant. The cached_hash slot on vectors is still honored. Regression tests in tests/collection_test.clj (sequential-hash-honors-equality) cover empty parity, non-empty vector / list / seq / cons-chain parity, realized lazy-seq parity, nested-content parity, and HAMT round-trip lookup with mixed-type keys.
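A minimal sketch of the unified scheme, under stated assumptions: it uses 32-bit FNV-1a constants (the entry above says only "FNV basis"), hashes elements by identity, and stands in a plain array for a vector and a singly linked list for a cons chain. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEQ_FNV_BASIS 2166136261u   /* assumed: 32-bit FNV offset basis */
#define SEQ_FNV_PRIME 16777619u     /* assumed: 32-bit FNV prime */
#define SEQ_TAG       0x09u         /* shared tag byte for every sequential */

struct cons { uint32_t head; const struct cons *tail; };

static uint32_t seq_fold_byte(uint32_t h, uint8_t b)
{
    return (h ^ b) * SEQ_FNV_PRIME;
}

/* Fold the four bytes of an element hash, low byte first. */
static uint32_t seq_fold_u32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        h = seq_fold_byte(h, (uint8_t)(v >> (8 * i)));
    return h;
}

/* "Vector" representation: contiguous storage. */
static uint32_t hash_array(const uint32_t *xs, size_t n)
{
    assert(xs != NULL || n == 0);
    uint32_t h = seq_fold_byte(SEQ_FNV_BASIS, SEQ_TAG);
    for (size_t i = 0; i < n; i++)
        h = seq_fold_u32(h, xs[i]);
    return h;
}

/* "Cons chain" representation: same basis, same tag, same fold. */
static uint32_t hash_list(const struct cons *c)
{
    uint32_t h = seq_fold_byte(SEQ_FNV_BASIS, SEQ_TAG);
    for (; c != NULL; c = c->tail)
        h = seq_fold_u32(h, c->head);
    return h;
}
```

Because both walkers start from the same tagged basis and fold elements in the same order, equal contents hash equally regardless of representation, and the two empty cases collapse to one constant.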

Known limitation: an *unrealized* lazy-seq still hashes by pointer identity. The helper folds in hash_pointer_bytes(...) when it meets an unrealized lazy tail because hash_val has no mino_state_t to allocate with and so cannot force. Callers that need lazy-content hash equality must force first via seq, doall, dorun, or any operation that drives iteration. Closing this gap requires threading state through the hash API and is tracked for a follow-up.

Fixed: `hash` On Sorted-Map / Sorted-Set Violated Equal-Implies-Equal-Hash

hash_val in src/collections/map.c had no case for MINO_SORTED_MAP or MINO_SORTED_SET, so both fell through to the default branch and hashed by pointer identity. mino_eq already treated content-equal sorted and hash collections as equal -- so (= {:a 1} (sorted-map :a 1)) was true while (hash {:a 1}) and (hash (sorted-map :a 1)) were not, and even two freshly built (sorted-map :a 1) instances had different hashes. The user-observable cascade was that a sorted-map used as a key in a HAMT-sized hash-map could not be looked up by an equal but distinct sorted-map instance: (get m (sorted-map :a 1)) returned nil for an m that held the same entry under a different sorted-map pointer.

hash_val now has cases for both sorted variants that walk the red-black tree and XOR-fold per-entry hashes with the same tag bytes and mixing scheme as the corresponding MINO_MAP / MINO_SET branches. The result is uncached (the sorted struct has no cached_hash slot, and sorted collections rarely appear as hash-map keys). Regression tests in tests/collection_test.clj (sorted-map-hash-honors-equality, sorted-set-hash-honors-equality) pin same-instance, cross-type, HAMT-key-lookup, and empty-collection parity.
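The reason an XOR-fold lets a tree-ordered walk agree with a hash-ordered walk is that XOR is commutative and associative, so iteration order drops out. A minimal sketch, assuming a 32-bit FNV-1a style per-entry mix and omitting the tag bytes; every name here is hypothetical and the real mixing scheme may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XF_BASIS 2166136261u
#define XF_PRIME 16777619u

/* Pre-computed hashes of one entry's key and value. */
struct kv_hash { uint32_t key_hash; uint32_t val_hash; };

static uint32_t xf_u32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        h = (h ^ ((v >> (8 * i)) & 0xffu)) * XF_PRIME;
    return h;
}

/* Order matters inside one entry: key first, then value. */
static uint32_t xf_entry(struct kv_hash e)
{
    return xf_u32(xf_u32(XF_BASIS, e.key_hash), e.val_hash);
}

/* Order does not matter across entries: they are combined with XOR. */
static uint32_t xf_map_hash(const struct kv_hash *es, size_t n)
{
    assert(es != NULL || n == 0);
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc ^= xf_entry(es[i]);
    return xf_u32(XF_BASIS, acc);
}
```

Feeding the same entries in insertion order and in sorted order yields the same value, which is the property the sorted and hash variants need to share.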

Fixed: `dissoc` / `disj` Of An Absent Key Corrupted Sorted-Collection Count

(dissoc sorted-map :missing) and (disj sorted-set :missing) returned a fresh collection with count blindly decremented by one, even though the actual red-black tree was structurally unchanged. The same bug applied across repeated misses, so (count (reduce dissoc m (range 100))) on a three-entry sorted-map returned -97. Cascading symptoms: (= m (dissoc m :missing)) was false because the count side disagreed even though the entries matched; the identity short-circuit (identical? m (dissoc m :missing)) reported a fresh allocation; and the len field could underflow into nonsense.

rb_delete in src/collections/rbtree.c clones along the descent path on every call, so the returned root is never pointer-equal to the input -- meaning the if (nr == root) return m; short-circuit in sorted_map_dissoc1 / sorted_set_disj1 could never fire. The containment check is now done before the rb_delete walk, mirroring the shape already used by mino_map_dissoc1 for hash-maps. Regression tests in tests/collection_test.clj (sorted-map-dissoc-missing-noop, sorted-set-disj-missing-noop) pin the contract for empty, single-element, multi-element, and repeated-miss cases.
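The check-before-delete shape can be illustrated with a much simpler stand-in. This sketch uses an immutable sorted array instead of a red-black tree, and all names are hypothetical; the point is only that a miss returns the input untouched, so the count can never drift.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a persistent sorted set: an immutable sorted array. */
struct sorted_set { size_t count; int items[]; };

/* Binary search; returns the index of key, or -1 when absent. */
static long set_index_of(const struct sorted_set *s, int key)
{
    size_t lo = 0, hi = s->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (s->items[mid] == key)
            return (long)mid;
        if (s->items[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}

/* Returns s itself when key is absent (no allocation, count untouched),
 * a fresh set with exactly one fewer element otherwise, NULL on OOM. */
static const struct sorted_set *set_disj(const struct sorted_set *s, int key)
{
    assert(s != NULL);
    long at = set_index_of(s, key);
    if (at < 0)
        return s;                     /* containment check first: a miss is a no-op */
    size_t left = (size_t)at;
    size_t right = s->count - 1 - left;
    struct sorted_set *out = malloc(sizeof *out + (s->count - 1) * sizeof(int));
    if (out == NULL)
        return NULL;
    out->count = s->count - 1;
    memcpy(out->items, s->items, left * sizeof(int));
    memcpy(out->items + left, s->items + left + 1, right * sizeof(int));
    return out;
}
```

Repeated misses hand back the same pointer each time, which also restores the identity short-circuit that callers rely on.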

Fixed: BC Clause Params Vector Was Not Traced By The GC

The bytecode compiler rewrites destructured params like [[a b]] into a gensym placeholder plus a wrapping let, and stashes the gensym vector on clauses[i].params_vec. The clauses buffer is allocated as GC_T_RAW (POD), so the GC tag-walk could not see the embedded value pointers. The original (pre-rewrite) params vector still lived on fn.params and stayed reachable, but the gensym vector was held ONLY by the clause record — so a major collection could reclaim it while the bytecode was still in use. The runtime then read a NULL slot when binding params, returned NULL silently, and that NULL propagated upward until it surfaced — several frames later — as a misleading "seq requires one argument" error from whatever prim happened to be next on the eval path.

The user-visible repro: extend-type with six or more protocol groups (extend-type T P1 (m1 [_] 1) P2 (m2 [_] 2) ... P6 (m6 [_] 6)). The macro expands to (apply list 'do (vec (mapcat fn groups))); the inner (fn [[proto methods]] ...) is exactly the destructured shape whose gensym vector got collected mid-iteration.

The MINO_FN tracer in src/gc/driver.c now pushes each clauses[i].params_vec explicitly. Regression test in tests/regression_bc_clause_params_gc.clj locks the behaviour in.

While digging through the symptom, eval_apply_regular_call was also tightened: when eval_args returned NULL without an error latched, the previous guard evaled == NULL && mino_last_error != NULL let the NULL slip through to apply_callable, where it produced the misleading prim-arity error. The guard now always bails on NULL and synthesises an "argument evaluation produced no value" diagnostic if nothing upstream set one — so any future silent-NULL leak surfaces at its actual eval site instead of as arbitrary collateral damage downstream.

Fixed: `name` / `namespace` / `extend-type` Errors Now Name The Offender

Three diagnostic gaps that consistently surfaced as the wrong error many frames downstream of the actual call site:

Fixed: `:/` Now Reads As The Slash Keyword

The reader rejected :/ as a malformed keyword because the generic trailing-slash check fired any time / was the final character of a keyword body, regardless of whether the slash was the entire name. The result was that the slash keyword — used in real Clojure code (e.g. HoneySQL represents the SQL / operator with :/ alongside the rest of its arithmetic-op keywords) — was unreadable on mino even though (keyword "/") produced the same value at runtime and the printer emitted :/ back. The check now also verifies that there is content before the slash, so :bar/ still reports malformed keyword while :/ reads as the keyword whose name is "/".
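The tightened check amounts to one extra length condition. A minimal sketch with a hypothetical helper name, where body is the keyword text after the leading colon:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 when the keyword body is malformed because of a trailing
 * slash, 0 otherwise. A body that is exactly "/" is the slash keyword. */
static int kw_has_bad_trailing_slash(const char *body)
{
    assert(body != NULL);
    size_t n = strlen(body);
    return n > 1 && body[n - 1] == '/';   /* n > 1: content before the slash */
}
```

With this predicate "bar/" is still rejected, while "/" and "foo/bar" pass through to the rest of the keyword reader.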

Fixed: Constructor-Sugar Form Now Reports A Clear Diagnostic

(ClassName. args...) is JVM Clojure's shorthand for (new ClassName args...). mino has no JVM class layer so the form is genuinely unsupported, but the diagnostic was the misleading unbound symbol: ClassName. -- which suggests the user typo'd a symbol reference, hiding what the form actually was.

eval_symbol now detects the trailing-dot constructor pattern (name length > 1, ends in ., doesn't START with . so single-. and leading-dot method-call sugar pass through unchanged) and emits a dedicated message:

constructor sugar AutoFlattenSeq. is not supported on mino -- there is no JVM class layer; use defrecord and the generated ->AutoFlattenSeq positional ctor instead

Constructing the trailing-dot name as a symbol VALUE ((symbol "Foo."), 'Foo.) is unchanged — the diagnostic only fires on symbol lookup in eval position.
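The three conditions of the trailing-dot pattern fit in a single predicate. This is a sketch with a hypothetical function name, not the code in eval_symbol:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 for constructor sugar such as "Foo.", 0 otherwise.
 * A lone "." and leading-dot method sugar such as ".toString" are excluded. */
static int looks_like_ctor_sugar(const char *name)
{
    assert(name != NULL);
    size_t n = strlen(name);
    return n > 1                /* more than just "." */
        && name[n - 1] == '.'   /* ends in a dot */
        && name[0] != '.';      /* does not start with a dot */
}
```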

Fixed: Arity Mismatch Diagnostics Now Name The Callee And Count

A fn or macro arity miss surfaced as the bare macro arity mismatch / no matching arity for 2 args -- the callee wasn't named and (for the fixed-arity path) the expected count was missing. So an arity gap caused by a reader-conditional elision (e.g. (defonce x #?(:clj v)) collapsing to (defonce x) on mino's dialect) showed up as the generic macro arity mismatch with no hint at which macro mismatched.

The diagnostic now reads macro defonce arity mismatch: got 1, expected 2 for the fixed-arity path and no matching arity m__am for 2 args for the multi-arity dispatch path. The callee name comes from the head symbol of the in-progress (callee args...) form so anonymous fns still report cleanly when their call site has no symbol. Both the tree-walker (src/eval/bindings.c, src/eval/fn.c) and the bytecode VM (src/eval/bc/vm.c) report the same shape.

Fixed: `mino_state_free` No Longer Hangs On Workers Blocked On An Undelivered Promise

A worker thunk that called @undelivered-promise parked in cv_wait on the promise's condition variable. At embedder teardown, mino_host_threads_quiesce did a straight pthread_join on each spawned worker thread — and the thread never returned because nothing would ever deliver the promise. The embedder process hung indefinitely on mino_state_free.

Quiesce now does a cancel-pass before joining: every still-PENDING future and promise cell on the state's future-list is moved to CANCELLED and its cv is broadcast. The worker blocked in mino_future_deref on the cancelled promise wakes, exits the wait loop, throws future was cancelled up through its thunk, and worker_run reaches the publish path (which no-ops cleanly because the worker's own future was also cancelled). The pthread_join then returns and the state tears down. The existing future-cancel user-facing primitive already used this same mechanism for single futures; quiesce just applies it to the whole future-list at shutdown.
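A minimal pthread sketch of cancel-then-join, assuming one condition variable per cell. The types and function names are hypothetical and far simpler than the runtime's future-list; the worker here just returns instead of throwing.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum cell_state { CELL_PENDING, CELL_DELIVERED, CELL_CANCELLED };

struct cell {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    enum cell_state state;
};

static void cell_init(struct cell *c)
{
    pthread_mutex_init(&c->mu, NULL);
    pthread_cond_init(&c->cv, NULL);
    c->state = CELL_PENDING;
}

/* Worker body: blocks like a deref on an undelivered promise. */
static void *blocked_worker(void *arg)
{
    struct cell *c = arg;
    pthread_mutex_lock(&c->mu);
    while (c->state == CELL_PENDING)        /* loop guards against spurious wakeups */
        pthread_cond_wait(&c->cv, &c->mu);
    pthread_mutex_unlock(&c->mu);
    return NULL;
}

/* Cancel every still-pending cell and wake its waiters, then join.
 * Returns 0 when every join succeeds. */
static int quiesce(struct cell *cells, size_t ncells,
                   pthread_t *threads, size_t nthreads)
{
    assert(cells != NULL && threads != NULL);
    for (size_t i = 0; i < ncells; i++) {
        pthread_mutex_lock(&cells[i].mu);
        if (cells[i].state == CELL_PENDING) {
            cells[i].state = CELL_CANCELLED;
            pthread_cond_broadcast(&cells[i].cv);
        }
        pthread_mutex_unlock(&cells[i].mu);
    }
    int rc = 0;
    for (size_t i = 0; i < nthreads; i++)
        if (pthread_join(threads[i], NULL) != 0)
            rc = -1;
    return rc;
}
```

Without the cancel pass, the join on a worker parked in pthread_cond_wait would never return; with it, the state change and broadcast happen under the same mutex the waiter re-acquires, so the wakeup cannot be missed.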

Fixed: Forward-Declared Unbound Var Now Throws On Access

(declare x) x returned nil silently on mino. JVM Clojure throws Var #'…/x is unbound so a reference-before-def bug fails at the use site instead of propagating a silent nil downstream. The internal var->as.var.bound flag was tracked accurately the whole time (@#'x already threw correctly), but symbol resolution didn't consult it: declare bound nil into the namespace env (instead of the var itself), so lookup saw a plain nil value and there was no var for the auto-deref path to check.

declare now binds the var into the namespace env, and the eval_symbol auto-deref path (commit f6730e4) checks bound before unwrapping: an unbound var throws Var is unbound: <ns>/<sym>. A var explicitly def-d to nil is bound=true with root=nil, so it still reads as nil silently — the discriminator is the flag, not the value.

Fixed: `ref` Now Accepts Option Keywords

(ref init :validator f :meta m :min-history n :max-history n) is the canonical JVM Clojure signature. mino rejected every trailing arg with ref requires one argument, so library code that constructed refs with validators or metadata didn't load. The prim now parses the trailing keyword pairs and applies:

Fixed: `binding` Now Rejects Non-Dynamic Vars

JVM Clojure throws Can't dynamically bind non-dynamic var when a binding form rebinds a var that wasn't declared ^:dynamic. mino used to accept this silently, which let a real bug compile clean on mino and then blow up in production on the JVM. eval_binding now looks up the referenced var (qualified or current-ns) and rejects the rebind at the binding site with the same message shape as Clojure JVM, naming the offending var. Pure lexical names (without an interned var -- typical of macro-introduced gensyms) fall through to the normal dynamic-frame push as before, so the macro layer is unaffected.

Fixed: `clojure.core/refer` Now Binds Callable Vars Correctly

Calling (clojure.core/refer 'clojure.core) directly (rather than through (:require ... :refer :all) in an ns form) bound the source var into the destination namespace's env -- which preserves the source ns for syntax-quote and metadata -- but the unqualified-symbol lookup in eval_symbol did not auto-deref a var found in the namespace env chain. The next call to any referred fn ((println ...), (+ 1 2), etc.) surfaced as the deeply confusing not a function (got var), since the symbol resolved to the var value itself instead of the fn at its root.

eval_symbol now auto-derefs a var binding when the binding came from the namespace env chain, matching JVM Clojure's lookup semantics. Lexical / dynamic bindings that hold a var (e.g. the result of (let [v (resolve 'foo)] v)) are left intact so callers that need the var get the var.

Fixed: Tagged Literal With Missing Body Now Names The Tag

#foo at EOF -- or #foo followed immediately by a closing delimiter ((a b #foo)) -- used to surface as the deeply misleading unbound symbol: form. The reader passed body=NULL through to core's tagged-literal fn, whose form parameter then read as unbound; the fn's internal parameter name leaked into a user-facing error with no mention of the tag, the position, or even that a reader macro was involved. The reader now detects EOF and immediate-closer positions before recursing into the body read and emits tagged literal #foo: missing form at the actual offense site, plus the reader-conditional variant when the body was a #?(...) that resolved to nothing on mino's dialect.

Fixed: Uncaught Throws Now Preserve The Original Message In The Diagnostic

Three intertwined gaps lost the thrown value's information at the boundary between "throw with no enclosing try" and the diagnostic the embedder / caller actually saw:

Together these mean (let [f (future (throw "MYORIG"))] @f) surfaces as unhandled exception: MYORIG on the consumer side instead of the opaque future failed. Regressions live in tests/host_threads_test.clj.

Fixed: Set And Namespaced-Map Readers Now Skip Reader-Cond No-Match Forms

The plain map / vector / list readers all skip an inner form that resolved to nothing via a reader conditional, mirroring how Clojure itself reads [1 #?(:clj 2) 3] as [1 3]. The set reader (#{...}) bailed instead with the misleading unterminated set -- even though the closing } was right there. The namespaced-map reader (#:foo{...}) was worse: standalone it returned nil; embedded in a parent form ([1 #:foo{:a #?(:clj 1)} 3]) it silently surfaced as unexpected ')' because it bailed without consuming its own closing }. Both readers now treat NULL-without-error from their inner read_form as "no form produced; continue" exactly like the other compound readers, and the namespaced-map reader correctly drops the paired key/value when one side is eliminated. Regressions live in tests/reader_cond_test.clj.

Fixed: Wrap-One Reader Macros Now Name Empty Reader Conditionals

@, ', `, ~, ~@, and #' produced the bare diagnostic expected form after @ (or its sibling) when the form that should follow was a #?(...) whose arms didn't match mino's dialect. In that case the reader silently returns NULL to "produce no form" (the signal that lists/vectors/maps handle by skipping). In a wrap-one position there *is* no enclosing structure to absorb the empty form, so the macro failed with a message that named neither the macro's expectations nor the reader-conditional cause.

The reader now sets a transient reader_last_cond_empty flag on the state when #?(...) matches no arm, and the wrap-one macros consult it when their inner read returns silent NULL. The new diagnostic is expected form after @: form was a reader conditional with no matching arm for dialect :mino (add a :default arm) — the offender is named in the message so the malformed file is greppable at first sight.

Fixed: `defrecord` / `deftype` Now Reject Non-Vector Fields At The Call Site

defrecord and deftype previously assumed fields was a vector and passed it straight to internal helpers; when a caller handed in a symbol or list — typically because a reader conditional in the fields slot resolved to nothing on mino's dialect (#?(:clj [...] :cljs [...]) with no :default arm leaves the next form sitting in the fields position, which is usually a protocol name) — the failure surfaced several frames later as the opaque vec: cannot create a vector from :symbol. Both macros now check up front and throw defrecord: fields must be a vector, got: <printed value> / deftype: fields must be a vector, got: ..., naming the actual offender so the malformed call site is obvious from the message alone.

v0.149.0 — clojure-test-suite Conformance Pass: 220 / 220 Files, 5340 / 5340 Assertions

Five focused changes bring the external jank-lang/clojure-test-suite from 211/220 files green to 220/220, all 5340 assertions passing, with mino's own 7527-assertion suite unchanged. Every change is a behavioural alignment with JVM Clojure canon — no test-rigging, no compatibility shims, no JVM-class-name aliases.

`abs` Of `Long/MIN_VALUE` Returns `Long/MIN_VALUE`

(abs Long/MIN_VALUE) previously threw MCT001 integer overflow because (- x) overflowed under mino's checked arithmetic. JVM Math/abs returns Long/MIN_VALUE for that input — a 2's-complement quirk that all of Clojure relies on. abs now routes integer negation through unchecked-negate to match. Other numeric types (ratio, bigdec, bigint, double, NaN, ±Inf) keep their normal paths.
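In C terms, the unchecked negation can be sketched by doing the subtraction in unsigned arithmetic, where wraparound is well defined. The function names are hypothetical; converting the out-of-range unsigned result back to int64_t is implementation-defined in C before C23, and this sketch assumes the usual two's-complement wrap that GCC and Clang provide.

```c
#include <assert.h>
#include <stdint.h>

/* Negate without signed overflow (undefined behaviour in C):
 * subtract in unsigned arithmetic, then convert back. */
static int64_t negate_wrapping(int64_t x)
{
    int64_t r = (int64_t)(0u - (uint64_t)x);
    assert(x == INT64_MIN || r == -x);   /* only the minimum maps to itself */
    return r;
}

/* abs that mirrors the two's-complement quirk: the minimum value has no
 * positive counterpart, so it is returned unchanged. */
static int64_t abs_wrapping(int64_t x)
{
    return x < 0 ? negate_wrapping(x) : x;
}
```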

`derive` Validates Input Shape

The 3-arity (derive h tag parent) form silently accepted nil parents, non-Named / non-type parents, and structurally invalid hierarchies -- failures surfaced indirectly as assoc: expected a map or vector or as a vacuous success. derive now rejects each case up front with an ex-info whose message names the contract violation: tag and parent must be a keyword, symbol, or record type; the hierarchy must be a map carrying map-valued :parents, :ancestors, and :descendants. Namespacing requirements stay lenient (matching babashka and ClojureScript).

`some`, `every?`, And `zipmap` Validate Seqability

The C primitives prim_some, prim_every_p, and prim_zipmap iterated through seq_iter directly. seq_iter's unknown-type fall-through treated non-seqable inputs (keywords, numbers, booleans) as empty, so (some pred :kw) silently returned nil, (every? pred 42) silently returned true, and (zipmap :not-seqable [1 2 3]) silently returned {}. Each primitive now calls prim_seq on its collection argument up front; non-seqable inputs throw MTY001 seq: cannot coerce <type> to a sequence. Matches prim_set's long-standing pattern.

Dropped JVM-Class-Name Bridges From `clojure.core`

src/core.clj carried three bridges added in the original test-suite compatibility pass:

```clj
(def clojure.lang.IPending :future)
(def clojure.lang.BigInt :bigint)
(ns clojure.lang.MapEntry)
(defn create [k v] (map-entry k v))
```

These let suite assertions like (instance? clojure.lang.BigInt 1N) and (clojure.lang.MapEntry/create 'k 'v) resolve on mino without the test author having to thread a reader-conditional. That side-stepped the convention every other dialect uses — :bb, :cljs, :cljr, :lpy, :jank all carry per-test reader-cond arms instead. The bridges are gone; the corresponding test files now resolve their :default arm only on the JVM.

Added `thread-sleep` Primitive

mino had no exposed sleep primitive, so the test-suite portability shim was (defn sleep [ms] nil) — a no-op. Tests in realized_qmark.cljc and add_watch.cljc were passing through worker-thread scheduling latency rather than honest waiting. The new thread-sleep C primitive in src/prim/proc.c is nanosleep-backed, restarts on EINTR using the residual time, validates that its argument is a non-negative integer, and returns nil. The portability shim now delegates to it, so the sleep-using tests actually wait the intended duration.
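The EINTR restart loop is a standard POSIX idiom and can be sketched as below. The function name and the millisecond unit are assumptions for illustration; this is not the thread-sleep primitive itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sleep for ms milliseconds. Returns 0 on success, -1 on a negative
 * argument or on any nanosleep error other than EINTR. */
static int sleep_millis(long ms)
{
    if (ms < 0)
        return -1;                        /* reject negative durations */
    struct timespec req = {
        .tv_sec  = ms / 1000,
        .tv_nsec = (ms % 1000) * 1000000L,
    };
    struct timespec rem;
    assert(req.tv_nsec < 1000000000L);
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;                        /* resume with the time still owed */
    }
    return 0;
}
```

Restarting with the residual rather than the original request keeps the total sleep close to the requested duration even when signals arrive repeatedly.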

v0.148.0 — Move More Of clojure.core Into C: distinct?, merge-with, complement, comp, partial, juxt

Six more core.clj defns move to C primitives in src/prim/sequences.c. User-visible behaviour is identical.

The closure-construction pattern (make a MINO_FN from C with a captured env) generalises — follow-on releases can use it to port more HOF surface without growing core.clj.

v0.147.0 — Move Seq Predicates And Map Builders Into C

every?, some, not-any?, not-every?, zipmap, frequencies, and group-by move from core.clj defns to C primitives in src/prim/sequences.c. User-visible behaviour is identical — same names, same arities, same contracts, same diagnostics for type and arity errors.

The motivation is install-time cost on the new Floor tier: every form deleted from core.clj is one fewer to parse and evaluate when the host calls mino_install_clojure_core (and any embedder that stays on mino_install_minimal gets the C-level prims without the core.clj eval at all). The combined deletion is ~50 lines of Clojure.

Standalone test suite stays at 1616 tests, 7527 assertions, green.

v0.146.0 — Capability-Gated Install API For Embedders

Embedded mino's mino_install_core was monolithic — every fresh runtime parsed and evaluated all ~117 KB of core.clj and registered every C primitive whether the host needed it or not. Hosts that wanted just a calculator paid the full Clojure-stdlib bill. This release introduces a capability-gated install API so embedders opt into exactly the surface they need, without changing what a mino_install_core-equipped runtime exposes to user Clojure code.

The new surface:

mino_state_t carries a new caps_installed bitmask; MINO_CAP_* constants and mino_capability_installed(S, bit) / mino_capabilities(S) let host code query what is on. A static capability registry powers mino_capability_for_symbol(name), used by the symbol-resolution path to raise an enriched diagnostic when user code calls a name from a capability the host disabled.
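The bitmask-plus-registry idea can be sketched in a few lines. Everything here is hypothetical (the demo_ names, the bit assignments, and the mapping of sh to a proc capability); only the slurp-to-io pairing comes from the example diagnostic in this entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capability bits. */
enum { DEMO_CAP_CORE = 1 << 0, DEMO_CAP_IO = 1 << 1, DEMO_CAP_PROC = 1 << 2 };

struct demo_state { uint32_t caps_installed; };

/* Static registry mapping a symbol name to the capability that owns it. */
struct demo_cap_entry { const char *symbol; uint32_t cap; };
static const struct demo_cap_entry demo_registry[] = {
    { "slurp", DEMO_CAP_IO },
    { "sh",    DEMO_CAP_PROC },   /* assumed pairing, for illustration */
};

static int demo_cap_installed(const struct demo_state *s, uint32_t bit)
{
    return (s->caps_installed & bit) != 0;
}

/* Returns the owning capability bit, or 0 for an unknown symbol. */
static uint32_t demo_cap_for_symbol(const char *name)
{
    size_t n = sizeof demo_registry / sizeof demo_registry[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(demo_registry[i].symbol, name) == 0)
            return demo_registry[i].cap;
    return 0;
}

/* 1 when resolution should report "capability disabled" rather than a
 * plain unbound symbol. */
static int demo_is_cap_disabled(const struct demo_state *s, const char *name)
{
    assert(s != NULL && name != NULL);
    uint32_t cap = demo_cap_for_symbol(name);
    return cap != 0 && !demo_cap_installed(s, cap);
}
```

An unknown symbol returns 0 from the registry lookup, so it keeps falling through to the ordinary unbound-symbol path.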

New diagnostic code MNS002 (capability-disabled), distinct from MNS001 (unbound symbol). The :mino/data payload carries {:capability :symbol :reason :enable-via} so user-side error handlers can pattern-match on the disabled capability. Example:

```
error[MNS002]: slurp is not installed in this runtime (capability 'io' disabled by host)
note: the host can enable this capability by calling mino_install_io from C before mino_install_core
```

REPL UX gains a :capabilities (alias :caps) command that prints a two-column installed-vs-not table, and the banner shows "embedded, N of M capabilities installed" plus a one-liner pointing to :capabilities when the runtime is in a partial-install state.

Standalone ./mino, ./mino -e ..., ./mino script.clj, and the REPL all remain unchanged at the user-visible surface: same Clojure-core names, same diagnostics for non-capability errors, same 1616-test suite green.

Embedded surface: a host that wants the full Clojure runtime still calls mino_install_core and gets bit-for-bit identical behaviour. A host that wants a lean numeric/collection-only mino calls mino_install_minimal and skips the core.clj eval entirely.

The capability install ordering rule: a capability that gates a core.clj section must have its bit set before mino_install_clojure_core runs. The back-compat mino_install_core wrapper handles this for the canonical caps; mino_install_all does the same for the I/O / fs / proc / stm / agent / async / host tier.

Follow-on work — porting thin core.clj wrappers to C, pre-parsed core AST, pre-compiled bytecode, image-based bootstrap — composes on top of this surface without breaking it.

v0.145.1 — Task Runner Fix: Pre-Resolve Tasks Outside The BC Doseq Body

mino task <name> raised a confusing subs: first argument must be a string from inside the bench's lib/mino/tasks/builtin.clj (or any downstream mino.tasks.builtin shadow that follows the same (def mino-srcs (vec (filter ... (file-seq ...)))) pattern) when the namespace was loaded for the first time *during* the run-task loop. The corruption was visible as cons-list forms from inside the loaded file's later defn bodies showing up as elements of the earlier def's vector — the same name reused as a local in a later form fell through to the file-load const pool.

Root cause was mino.tasks/run-task! triggering the file load from inside a bc-compiled doseq body: the file's top-level forms landed with the caller's lexical env in scope and the bc compile of those forms (specifically the def's value expression) shared const-pool slots with the still-mid-execution outer bc frame.

The fix moves ensure-task-fn (the namespace require + var resolve step) out of the doseq body and into a separate mapv pass before any task runs, so module loads complete at the top-level boundary rather than nested inside an active bc frame. Same user-visible task semantics; the resolved task fns are now collected as [task-key sym task-fn] triples and invoked directly through the resolved fn rather than via (eval (list sym)) — which removes a second eval-from-inside-bc edge that was easy to trip from any task that also issued a require.

Verified against the surface this broke in practice: the mino-bench local mino.tasks.builtin shadow, with its filesystem-walked mino-srcs, now runs cleanly through mino task build, and the standalone mino.tasks.builtin/build path is unaffected.

The underlying bc-compile-time const-pool interaction with nested file loads is still latent for any user code that follows the same shape (require triggered from inside a bc-compiled fn body with mutually-named locals); the cleanest workaround for now is to lift the require above the bc frame, which is what this fix does for the canonical task dispatcher.

v0.145.0 — Fusion Cycle

Reduce, assoc/conj, and compile-time constant folding all get faster without a JIT, by leaning on Clojure's load-bearing mechanics: persistent immutable data (so direct walks don't need snapshots, identity short-circuits become semantically real), seq abstraction (so collection-typed direct paths can replace the generic dispatch without breaking the contract callers see), and homoiconicity (so the bc compiler can fold and elide on the AST itself).

Bench picks-of-the-day (M3 Pro, 5 runs):

| Bench                          | Before    | After    | Speedup |
|--------------------------------|----------:|---------:|--------:|
| sum-1-to-1M reduce + (range)   | 53.8 ms   | 0.3 ms   | 180×    |
| reduce-over-map-10k            | 8.6 ms    | 5.4 ms   | 1.6×    |
| reduce-vec-1M                  | 10.7 ms   | 5.7 ms   | 1.9×    |
| assoc-noop HAMT-100            | 4.31 us   | 2.11 us  | 2.0×    |

The shipped phases are below in causal order: the reduce wrapper fix unblocked the direct-walk paths; those direct walks are the load-bearing prerequisite for the bench picks. Compile-time fold-through and dead-binding elimination are independent; they collapse a class of macro-heavy code.

Direct-Walk Reduce Over Map And Set

(reduce f m) and (reduce f s) previously routed through the generic seq_iter dispatch and allocated a fresh [k v] vector per map entry (a vector header plus a 2-slot trie node — two allocs). Map / set reduce now take a typed direct-walk path that skips the seq dispatch entirely and yields MINO_MAP_ENTRY (single alloc, equal to [k v] via the cross-type sequential path in mino_eq, prints as [k v], and vector?-true). seq_iter_val aligns on the same mino_map_entry shape so any caller that walks a map through seq sees the cheaper representation too. Flatmaps read the parallel val_order vector directly; HAMTs resolve via map_get_val. The inner reduce step is factored into a shared reduce_step helper so the existing int+int arithmetic lane and reduced? early-exit serve both the direct and seq paths.

Bench delta (5 runs each, M3 Pro):

| Bench                  | Before  | After   | Speedup |
|------------------------|--------:|--------:|--------:|
| reduce-over-map-1k     | 1.9 ms  | 1.1 ms  | 1.7×    |
| reduce-over-map-10k    | 8.6 ms  | 5.4 ms  | 1.6×    |

Backed by tests/reduce_perf_test.clj parity tests at the flatmap / HAMT boundary (sizes 0, 1, 8, 9, 100), reduced? early-exit, and the MapEntry vector? / destructure contract.

Leans on the seq abstraction's contract: seq promises to yield each entry, not a specific shape. Swapping the entry from MINO_VECTOR to the leaner MINO_MAP_ENTRY is a representation choice the protocol allows, and downstream vector? / (first e) / destructure callers keep working because they're written against the contract, not the implementation.

Direct-Walk Reduce Over Vector

(reduce f v) and (reduce f init v) over a persistent vector no longer route through seq_iter; a recursive trie walker visits each 32-wide leaf in order, dispatching the inner step through the shared reduce_step. The win is the per-element vec_nth's O(log32 n) trie navigation collapsing to one DFS pass that touches each leaf once. Sets share the same walker via their key_order backing vector. Subvec offset/len windows are honored by passing absolute backing positions into the walker -- one walker, no separate offset-aware codepath.

Bench delta (5 runs each, M3 Pro):

| Bench                  | Before   | After    | Speedup |
|------------------------|---------:|---------:|--------:|
| reduce-vec-10k         | 0.2 ms   | 0.1 ms   | 2.0×    |
| reduce-vec-100k        | 1.0 ms   | 0.4 ms   | 2.5×    |
| reduce-vec-1M          | 10.7 ms  | 5.7 ms   | 1.9×    |

Backed by tests/reduce_perf_test.clj parity tests at sizes 0, 1, 31, 32, 33, 1024, 10000 (which cross the tail-only and multi-level-trie boundaries), reduced? early-exit, and a subvec offset case.

Leans on persistent data: the trie is immutable, so a recursive in-place leaf walk is safe -- no snapshot, no copy-on-write, no race window. Subvecs share the trie with the parent and just carry an offset/len window, so walking by absolute backing positions is the natural way to honor a window without materialising it.
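A minimal sketch of the single-pass walk, assuming a simplified node layout with 32-wide leaves and interior nodes. The struct, the step signature, and the function names are hypothetical; tail handling and subvec windows are left out.

```c
#include <assert.h>
#include <stddef.h>

#define TRIE_WIDTH 32

struct trie_node {
    int    is_leaf;
    size_t used;                                /* slots in use, <= TRIE_WIDTH */
    union {
        long                    vals[TRIE_WIDTH];
        const struct trie_node *kids[TRIE_WIDTH];
    } u;
};

typedef long (*step_fn)(long acc, long x);

/* Example reducing step: integer addition. */
static long add_step(long acc, long x)
{
    return acc + x;
}

/* Depth-first, left-to-right walk: every leaf is visited exactly once,
 * so there is no per-element root-to-leaf descent. */
static long trie_reduce(const struct trie_node *n, step_fn step, long acc)
{
    assert(n != NULL && n->used <= TRIE_WIDTH);
    if (n->is_leaf) {
        for (size_t i = 0; i < n->used; i++)
            acc = step(acc, n->u.vals[i]);
        return acc;
    }
    for (size_t i = 0; i < n->used; i++)
        acc = trie_reduce(n->u.kids[i], step, acc);
    return acc;
}
```

Because the nodes are never mutated, the walker can hold plain const pointers for the whole traversal with no locking or copying.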

Reduce Wrapper Preserves C-Side Fast Paths

The Clojure-level reduce 2-arg form was eagerly seq-decomposing the coll before dispatching: it would (let [s (seq coll)] ...) and pass (rest s) to internal-reduce regardless of whether a CollReduce protocol impl had been registered. That destroyed every 2-arg fast path in prim_reduce -- the int-range fast lane, the direct map / vec / set walks from this same cycle -- because by the time the C primitive saw the coll, it was already a forced chunked-cons.

The wrapper now looks up the protocol impl on the ORIGINAL coll's type and only seq-decomposes when an extended impl is taking over. Built-in collections (lazy ranges, vectors, maps, sets) hand through unmodified to the C primitive, which has its own fast paths and now actually gets to run them.

Bench delta (5 runs each, M3 Pro):

| Bench | Before | After | Speedup |
|--------------------------------|----------:|---------:|--------:|
| sum-1-to-1M reduce + (range) | 53.8 ms | 0.3 ms | 180× |

The 3-arg form already passed the coll through unchanged, so the fix shows nothing there (0.3 ms before and after). The seq path remains the fallback for user-extended CollReduce types.

Leans on Clojure's "value, not call" treatment of colls: a fresh (range 1M) and (rest (seq (range 1M))) are =-equal but not the same value -- one is a counted iterator the runtime can recognise, the other is a forced spine. Keeping the original value visible to the dispatcher is what lets the fast lanes recognise its shape.

Let-Binding Fold-Through In The BC Compiler

(let [x (+ 1 2)] (* x x)) collapses to a single OP_LOAD_K 9 at compile time. The compiler now remembers, on each local, the compile-time-known value of its right-hand side when that right-hand side itself folds via the existing PURE_PRIMS table (literal, pure-prim call over literals, pure-prim call over already-folded locals). When a later pure-prim call has all foldable args -- including symbol args that resolve via this substitution -- the call folds. The existing has_folds flag plus the compile_ic_gen soundness check still gate the OP_LOAD_K: a later (def + -) invalidates the bc and a recompile re-runs the fold against the new resolution. Capturing lets (those that publish bindings into a fresh env for an inner closure) skip the fold-through path; with env publishing in play, an inner closure could otherwise capture a stale folded value before a global resolution shifted.

Leans on homoiconicity: a let is just a list with a vector of binding pairs, so the compiler walks it with the same code shape as any other form -- no separate AST layer. Late binding of vars makes the redef-invalidates-bc check both necessary and sufficient: we never have to ask "is + definitely canonical forever?"; we check at dispatch, against compile_ic_gen.

Dead-Binding Elimination In The BC Compiler

A let binding whose name appears nowhere in the rest of the bindings or the body, and whose right-hand side is observably side-effect-free (literal, symbol, or pure-prim call over side-effect-free args), is dropped entirely at compile time: no register allocation, no value emission, no env publish. Macros that expand to verbose (let [_ ... ...] body) shapes around a binding that the body never reads now collapse to just the body. Side-effecting unused bindings (e.g., (let [_ (println "hi")] :done)) are kept; the dead-form check requires is_side_effect_free, which only admits values it can see through. Capturing lets opt out for the same reason as Phase E1: env publishing is itself observable.

Identity Short-Circuit For `assoc` / `conj`

(assoc m k v) and (conj s x) now return the input unchanged when the operation has no observable effect: assoc short-circuits when k already maps to a value mino_eq-equal to v; set conj short-circuits when x is already present. (identical? m (assoc m k (get m k))) now holds. The check is one map_get_val / one hamt_get -- O(log32 n) for HAMTs, with the cached-hash short-circuit on collection values keeping the structural compare O(1) in the typical no-match case. disj already had this property by construction.

Bench delta (10000 iters each, M3 Pro):

| Bench | Before | After | Speedup |
|------------------------|----------:|---------:|--------:|
| assoc-noop HAMT-100 | 4.31 us | 2.11 us | 2.0× |
| conj-set-noop size-100 | 2.27 us | 1.73 us | 1.3× |

The replace-with-different-value path pays one extra lookup (measured at ~5% in micro-benches); the trade-off is worth it for the common "rehash a map back into itself" idiom, where the saved rebuild traffic dominates.

Leans on persistent data + cached hash. With mutable maps the short-circuit would be a hazard -- a caller could rely on assoc returning a *new* container for them to mutate. With immutable maps "the result is the same value" and "the result is the same identity" are equivalent for observability, so identical? after a no-op assoc becomes a real signal callers can pattern-match on. The cached hash on collection values keeps the existence check O(1) in the typical case.
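As a rough illustration of the short-circuit, here is a toy immutable map in C. The `sk_map` type, its fixed capacity of 8, and the int keys and values are assumptions for the sketch; the real check goes through map_get_val / hamt_get and mino_eq. The point is only that a no-op assoc returns the input pointer, while a real change returns a fresh copy and leaves the input untouched.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy immutable map for illustration only. */
typedef struct sk_map {
    size_t len;
    int    keys[8];
    int    vals[8];
} sk_map;

/* Returns m itself when k already maps to v. Otherwise returns a freshly
 * allocated copy carrying the update. Returns NULL on allocation failure
 * or when the toy map is full and k is absent. */
const sk_map *sk_assoc(const sk_map *m, int k, int v)
{
    size_t i;
    sk_map *out;
    assert(m != NULL && m->len <= 8);
    for (i = 0; i < m->len; i++) {
        if (m->keys[i] == k) {
            if (m->vals[i] == v) return m; /* no observable change: same identity */
            break;
        }
    }
    if (i == m->len && m->len == 8) return NULL;
    out = malloc(sizeof *out);
    if (out == NULL) return NULL;
    *out = *m;                             /* copy; the input is never mutated */
    if (i == m->len) {
        out->keys[i] = k;
        out->len++;
    }
    out->vals[i] = v;
    return out;
}
```

A caller can then compare pointers to detect "nothing changed", which mirrors the identical? check described above.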

Correctness Fix: `count` On Strings Returns Codepoints

(count s) on a string returned the byte length instead of the codepoint count, while subs / nth / char-at index by codepoint (matching the documented "strings are sequences of chars" model). For ASCII the two agree; with multi-byte UTF-8 in the string they diverged, so e.g. (subs s 0 (- (count s) 1)) would raise "index out of range" on otherwise-fine input. prim_count's MINO_STRING case now walks the UTF-8 codepoints via the shared utf8_codepoint_count helper (exposed from string.c through prim/internal.h).
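For reference, counting codepoints in well-formed UTF-8 can be done by counting the bytes that are not continuation bytes. The function below is a sketch under that assumption; `sk_utf8_count` is a hypothetical name and signature, not the actual utf8_codepoint_count helper, and it does no validation of malformed input.

```c
#include <assert.h>
#include <stddef.h>

/* Counts codepoints in a well-formed UTF-8 buffer. Every codepoint has
 * exactly one lead byte; continuation bytes have the form 10xxxxxx. */
size_t sk_utf8_count(const char *s, size_t nbytes)
{
    size_t i, n = 0;
    assert(s != NULL || nbytes == 0);
    for (i = 0; i < nbytes; i++) {
        if (((unsigned char)s[i] & 0xC0u) != 0x80u) n++;
    }
    return n;
}
```

For the five-byte string made of `a`, U+2014 (encoded as `E2 80 94`), and `b`, the byte length is 5 but the codepoint count is 3, which is the divergence that broke the subs call.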

Surfaced because src/core.clj carries em-dashes in its docstrings -- the build's gen-core-header step does (subs src 0 (- (count src) 1)) to trim a trailing newline, which masked the bug while incremental builds skipped that step. Backed by parity tests at tests/string_test.clj.

v0.144.6 — Correctness Fixes: BC Closure Capture, Catch Unwind, And Loud Limit Errors

Adversarial whitebox testing of the bytecode VM surfaced four bugs. All four ship together because they are independent point fixes under the same banner (compile-time invariants the bc dispatcher glossed over) and each adds its own regression test.

1. IC cache poisoned closure free vars. OP_GETGLOBAL_CACHED stores its result on a slot that lives on the shared bc record, but every closure built from one fn template shares that bc. Caching an env-resolved free var (e.g. i inside (fn [i] (fn [] i))) committed closure A's captured value into a slot closure B then read back, so ((mk 100)) followed by ((mk 200)) returned 100, 100 instead of 100, 200. The handler now probes dyn then env (matching eval_symbol's order) and uses the found value directly without writing the slot; the cache only fires for symbols that neither dyn nor env shadow.

2. Dyn frames leaked across catch landings. A (binding [...] (throw ...)) body inside a try had its dyn frame still on the stack when the longjmp landed at the catch handler. The handler observed the binding's value instead of the surrounding root, and the dyn frame stayed live until the enclosing fn returned. The bc catch entry now records dyn_stack at push and unwinds to that anchor on the longjmp branch, mirroring eval_try's saved_dyn discipline.

3. Silent NULL on arity mismatch. A multi-arity bc fn called with no matching clause returned NULL with no diagnostic, surfacing as "unhandled exception" with no message. The bc path now sets the same MAR002 "no matching arity for N args" diagnostic the tree-walker raises, so the cause is visible.

4. Silent NULL when OP_PUSHCATCH hit MAX_TRY_DEPTH. A deeply-recursive (try ... (catch ...)) body would unwind through bc_done returning NULL when the 64-frame cap was reached. The handler now surfaces MLM002 "try nesting too deep" via set_eval_diag, so the nearest live catch gets a real exception instead of bailing on NULL.

v0.144.5 — Correctness Fix: BC Re-Throw Through Nested Try / Finally

Nested try blocks where an inner catch handler re-throws an exception that an outer try/finally then catches were returning the inner-thrown value to the outer handler instead of the re-thrown one on Linux gcc. Apple clang's codegen for the same source returned the right value, which is why the regression test in bc_try_catch_test only failed on Linux.

The bug: OP_PUSHCATCH's longjmp-return branch read try_stack[td].exception where td was a mino_bc_run local captured before setjmp. The local was set in each PUSHCATCH case scope, but the C99 standard does not promise that distinct case-scope locals occupy distinct stack slots — and gcc with -O2 reuses slots when their lifetimes don't visibly overlap. When a sibling (nested) PUSHCATCH ran and re-wrote the slot, the longjmp back to the outer PUSHCATCH read the inner's td value, indexing into the wrong try_stack entry.

The fix reads try_depth_at_push from bc_catch_stack[d] — heap-backed storage that survives the longjmp unchanged — instead of from the local td. The PUSHCATCH arm-phase still stashes td into the catch entry where it belongs.
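The pattern can be sketched in isolation. In the example below, the catch stack, the `sk_` names, and the recursive driver are all assumptions for illustration; static storage stands in for the heap-backed bc_catch_stack. Each landing reads its depth from the catch entry rather than from a function-local variable, then re-throws to the enclosing frame.

```c
#include <assert.h>
#include <setjmp.h>

#define SK_MAX_CATCH 8

typedef struct sk_catch {
    jmp_buf jb;
    int     try_depth_at_push; /* written before setjmp, read after longjmp */
} sk_catch;

/* Static storage stands in for the heap-backed catch stack. */
static sk_catch sk_stack[SK_MAX_CATCH];
static int      sk_top;                /* number of armed frames */
int             sk_seen[SK_MAX_CATCH]; /* depth read at each landing */
int             sk_nseen;

static void sk_throw(void)
{
    longjmp(sk_stack[sk_top - 1].jb, 1); /* jump to the innermost armed frame */
}

static void sk_nested(int level, int levels)
{
    sk_stack[sk_top].try_depth_at_push = level;
    sk_top++;
    if (setjmp(sk_stack[sk_top - 1].jb) == 0) {
        if (level + 1 < levels) sk_nested(level + 1, levels);
        else sk_throw();
        return; /* not reached in this demo: the innermost body always throws */
    }
    /* Landing side: take the depth from the catch entry, not from a local. */
    sk_seen[sk_nseen++] = sk_stack[sk_top - 1].try_depth_at_push;
    sk_top--;                   /* pop this frame */
    if (sk_top > 0) sk_throw(); /* re-throw to the enclosing frame */
}

/* Arms `levels` nested frames, throws from the innermost, and re-throws
 * outward. Returns how many landings happened. */
int sk_run(int levels)
{
    assert(levels >= 1 && levels <= SK_MAX_CATCH);
    sk_top = 0;
    sk_nseen = 0;
    sk_nested(0, levels);
    return sk_nseen;
}
```

With two nested frames, the landings record depth 1 and then depth 0: the outer handler sees its own entry even though an inner frame was armed and unwound in between.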

v0.144.4 — Build Fix: Suppress gcc's `-Wclobbered` Instead Of `volatile`

The v0.144.2 attempt to silence gcc's -Wclobbered by marking mino_bc_run's env parameter and rest local volatile turned out to miscompile the nested-try-rethrow path on Linux gcc (the outer catch handler saw the inner exception instead of the rethrown one). Apple clang's codegen for the same source was correct without volatile. The simpler fix: pass -Wno-clobbered to the compiler (gated through -Wno-unknown-warning-option so clang silently ignores it). The volatile annotations are reverted; the codegen on both compilers now matches the v0.144.0-era source. The mino_current_ctx inline simplification from v0.144.3 (drop the unused local t) stays — it is a clean style cleanup independent of the warning.

v0.144.3 — Build Fix: Drop Local In `mino_current_ctx`

gcc's -Werror=clobbered flagged the local t in mino_current_ctx's inlined body when the inline expansion landed in a function with a setjmp. The local was a single unused-after-assignment cache (mino_thread_ctx_t *t = mino_tls_ctx; followed by a ternary), so the warning was a false positive — but -Werror made it fatal anyway. Removed the local; the body is now a direct ternary on mino_tls_ctx. The compiler is free to load the TLS slot once; there is no observable change in generated code on either Apple clang or gcc 12.

v0.144.2 — Build Fix: Mark setjmp-Adjacent Locals `volatile`

GCC's -Werror=clobbered flagged three locals in mino_bc_run that could be modified between setjmp and a matching longjmp return: the env parameter (rewritten on catch-frame unwind), the rest cons-list local in the variadic dispatch (built before setjmp and read after), and the t local inside the inlined mino_current_ctx (reached after setjmp via a TLS load + branch). Apple clang doesn't emit this warning so local builds were green; the Linux CI build (gcc 12) failed at cc1: all warnings being treated as errors.

The fix marks env and rest volatile in mino_bc_run and replaces the second mino_current_ctx(S) call after setjmp with the already-captured ctx local that was set before the try frame. The hot path is unchanged: ctx reads the same TLS slot once at fn entry; the volatile-marked locals only matter to the compiler's analysis, not to the emitted code.

v0.144.1 — GC Fix: Compiled Bytecode Children Traced Via Remset

The compiled-bytecode record (mino_bc_fn_t) was allocated as GC_T_RAW, so gc_trace_children did nothing when the record reached the trace via the remembered set. The record's code, consts, clauses, and ic_slots buffers were reached only through MINO_FN's trace; when a write barrier on the bc record added it to the remset (e.g. ensure_code growing the code buffer under an OLD bc with a YOUNG replacement), the minor's mark walked the bc header but pushed none of its children. The next sweep then freed the still-referenced YOUNG buffers.

The fix is a new GC_T_BC tag with its own child-tracing branch in gc_trace_children that pushes code, consts, clauses, and the ic_slots buffer plus each slot's sym and cached value fields. mino_bc_compile_fn now allocates the record with this tag.

The bug was latent — exposed by a regression test that builds a 12-entry HAMT and shrinks it back to 5 via reduce dissoc, a pattern that produces enough fresh allocation to age and promote the bc record between compile and execution. ASan reported heap-use-after-free at mino_bc_run's code[pc++] read. Test suite, ASan, UBSan all green after the fix.

This release also adds 25 regression tests covering the K1, M3, and L2 work shipped earlier in the cycle: tests/collection_test.clj gains 10 small-persistent-map cases (flatmap basics, cross-threshold growth, shrink to flatmap size, meta preservation in both modes, insertion order); tests/destructuring_test.clj gains 6 tree-walker cases for :strs / :syms map destructure; and tests/hash_compare_test.clj gains 2 forcing-eq cases for lazy-cdr values inside maps plus 7 hash-cache cases (equal-implies-equal-hash across vec / map / set, deterministic hashing, pointer-eq fast path, populated-hash mismatch does NOT falsely short-circuit). Suite is now 1 596 tests / 7 427 assertions.

v0.144.0 — Cached Hash On Immutable Collection Headers

Vectors, maps, and sets gain a cached_hash field on their value header. hash_val populates it lazily on first call and returns the memo thereafter. The cache field is 0-initialised; 0 means "uncomputed" (a real hash that happens to be 0 pays the recompute cost on each call -- rare and bounded).

mino_eq adds a same-type-and-both-cached short-circuit: when both arguments already carry a populated cached_hash and the hashes differ, the values cannot be = (equal-implies-equal-hash invariant for immutable contents), so the structural walk is skipped. The short-circuit fires only on hashes that were ALREADY computed; computing fresh hashes here would cost as much as the structural compare for first-time pairs.
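A compact sketch of both halves, the lazy memo and the both-cached short-circuit, might look like this. The `sk_vec` type and the FNV-1a style mixing are assumptions chosen for brevity; mino's actual hash function and header layout are not shown in this changelog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical immutable int vector with a hash memo in its header. */
typedef struct sk_vec {
    uint32_t   cached_hash; /* 0 means "not computed yet" */
    size_t     len;
    const int *items;
} sk_vec;

int sk_hash_walks; /* counts full hash computations, for demonstration */

uint32_t sk_hash(sk_vec *v)
{
    uint32_t h = 2166136261u; /* FNV-1a style mixing; an arbitrary choice here */
    size_t i;
    assert(v != NULL);
    if (v->cached_hash != 0) return v->cached_hash; /* memo hit */
    sk_hash_walks++;
    for (i = 0; i < v->len; i++) {
        h ^= (uint32_t)v->items[i];
        h *= 16777619u;
    }
    v->cached_hash = h; /* a real hash of 0 is simply recomputed next time */
    return h;
}

int sk_eq(const sk_vec *a, const sk_vec *b)
{
    size_t i;
    if (a == b) return 1;
    /* Only hashes that are ALREADY populated are consulted. */
    if (a->cached_hash != 0 && b->cached_hash != 0 &&
        a->cached_hash != b->cached_hash)
        return 0;
    if (a->len != b->len) return 0;
    for (i = 0; i < a->len; i++)
        if (a->items[i] != b->items[i]) return 0;
    return 1;
}
```

Note that `sk_eq` never calls `sk_hash`: computing fresh hashes there would cost as much as the structural walk it is trying to avoid.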

Why this is sound under Clojure semantics: vectors, maps, and sets are immutable -- once constructed, their contents (and therefore their hash) never change. Storing a memo of the hash in the header is a pure observation; concurrent fills compute the same value and the uint32_t write is atomic on the platforms mino targets, so the race is harmless. (For mutable types -- atoms, refs, agents -- no such field exists, matching the design.)

Header-size cost: 4 bytes per vector / map / set value header. GC walker unaffected -- the field is a uint32_t scalar, not a pointer.

Bench: hash-heavy and equality-on-cached pairs see a measurable drop; the hot path (single OP_LOOP_INT_DEC iter, fib-30 inner call) is unaffected since none of those touch hash. 1 571 tests / 7 353 assertions green on release, ASan, UBSan.

v0.143.0 — Tree-Walker `:strs` / `:syms` Destructure And Forcing Map Equality

Two correctness fixes that surfaced during the bytecode-VM cycle.

Tree-walker map destructure now handles :strs and :syms. bind_map_destructure only knew about :keys, silently ignoring the other two forms (BC-side expansion in destructure_pair already covered them). After this fix, (let [{:strs [a b]} {"a" 1 "b" 2}] [a b]) and the :syms analogue work everywhere they should, including any path that lands in the tree-walker fallback (multi-arity, declined-BC fns, eval-string callers).

mino_eq_force now walks into map values and set elements instead of delegating to non-forcing mino_eq. The non-forcing helper treats an unrealised lazy-seq cdr as end-of-seq -- the right call for the contexts where forcing is unsafe but the wrong one for =, which the user expects to compare contents semantically. A bc-compiled fn's & rest binding lands as a chunked / lazy seq (the nthnext expansion the BC destructure uses), so a map value carrying that rest seq would compare unequal to the same value built from a literal cons. With the forcing path now reaching into map / set entries, that case matches user expectation:

    (let [r {:head 1 :tail (rest [1 2 3])}
          e {:head 1 :tail '(2 3)}]
      (= r e))
    ;=> true (was false)

Verification: 1 571 tests / 7 353 assertions green on release, ASan, UBSan. Bench: no regression on tight-loop-10M (~15 ms) or on the other matrix entries (the forcing path is only on the rare lazy-cdr-in-map case).

v0.142.0 — Flatmap For Small Persistent Maps

MINO_MAP now carries two representations behind one shape:

- Flatmap (len ≤ 8): root is NULL, a new val_order vector parallels key_order. Lookup is a linear scan via mino_eq over the insertion-order keys. No per-entry hash, no hamt_entry_t allocation, no bitmap-node allocation.

- HAMT (len > 8 or promoted-and-stayed): unchanged from before -- root non-NULL, val_order NULL. Lookup goes through hamt_get as it always did.

Promotion (flat → HAMT) happens lazily at the assoc that pushes len past the threshold. Demotion is intentionally never done: a map that was once HAMT stays HAMT even after dissoc shrinks it below 8, so callers that thrash around the boundary don't pay re-build cost on every write.

A single discriminator carries the mode -- val_order != NULL means flat -- which the new mino_map_lookup / mino_map_assoc1 / mino_map_dissoc1 helpers branch on. All map mutators now go through these helpers (assoc, dissoc, merge, agent watches, metadata-merge in the reader, clone). Direct hamt_assoc / hamt_get calls outside the HAMT primitives are gone from the map path.

Why this works for Clojure specifically: keyword keys compare pointer-equal under mino_eq's identity short-circuit, so the inner loop of flat_find_index is N pointer compares plus at most one structural compare. For the typical {:k1 v1 :k2 v2} shape -- a configuration map, a record-with-extra-keys, a destructure options bag -- 8 pointer compares beats one hash_val of the keyword's bytes followed by a HAMT bitmap test and slot index.
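The flat lookup can be pictured with a small C sketch. Interned keys are modelled here as `const char *` pointers, with strcmp standing in for the structural compare; those choices, the `sk_` names, and the struct layout are assumptions for illustration rather than the actual flat_find_index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_FLATMAP_THRESHOLD 8u /* mirrors the threshold stated above */

/* Hypothetical flat map: two parallel insertion-ordered arrays. */
typedef struct sk_flatmap {
    size_t      len;
    const char *keys[SK_FLATMAP_THRESHOLD]; /* plays the role of key_order */
    int         vals[SK_FLATMAP_THRESHOLD]; /* plays the role of val_order */
} sk_flatmap;

/* Identity first, structural compare only when the pointers differ. */
static int sk_key_eq(const char *a, const char *b)
{
    return a == b || strcmp(a, b) == 0;
}

/* Linear scan in insertion order. Returns the index of key, or -1. */
int sk_flat_find_index(const sk_flatmap *m, const char *key)
{
    size_t i;
    assert(m != NULL && key != NULL && m->len <= SK_FLATMAP_THRESHOLD);
    for (i = 0; i < m->len; i++) {
        if (sk_key_eq(m->keys[i], key)) return (int)i;
    }
    return -1;
}
```

When keys are interned, a hit resolves on the pointer test alone, which is the case the paragraph above describes for keyword keys.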

The cache-line crossover is the threshold: 8 keyword keys fit in roughly one 64-byte vector tail-slot line on arm64. Past that, the HAMT's log32 N walk reclaims the lead.

Threshold lives in collections/internal.h as MINO_FLATMAP_THRESHOLD = 8u; the GC walker (gc/driver.c and gc/minor.c) marks/verifies val_order alongside the existing root / key_order so the field never goes unscanned.

Verification: 1 571 tests / 7 353 assertions green on release, ASan, UBSan. Behavioural equivalence: flat and HAMT-built maps with the same content compare = and hash the same; keys / vals / seq walk in insertion order in both modes; meta is preserved across the threshold transition.

v0.141.2 — Fast-Lane Emission Honours User Shadows

The speculative OP_*_II / OP_*_IK / unary OP_*_I opcodes and the fused counted-loop pattern detector now gate on a canonical-prim identity probe: the head symbol must resolve at compile time to the C-level PRIM associated with its name. A user shadow -- (defn + [a b] (* a b)), (defn dec [x] ...), etc. -- now correctly falls through to the regular OP_CALL path so the shadow's body runs at the call site.

What this leverages from Clojure: vars carry stable identity under read/eval. The compile-time probe looks at the var's current root; if it isn't the canonical PRIM, the fast-lane emission declines and the regular dispatch handles whatever the user wrote. OP_NTH_VEC and OP_GET_KW_MAP are covered by the same gate (and nth / get are now listed in PURE_PRIMS so the probe recognises them).

Verification: 1 571 tests / 7 353 assertions green on release, ASan, UBSan. Direct tests:

    (defn + [a b] (* a b))
    (+ 2 3) ;=> 6 (user shadow, not 5)

    (defn dec [x] (str "shadowed: " x))
    (loop [i 5] (if (zero? i) :done (recur (dec i))))
    ;=> throws "zero? requires a number" (unfused path runs)

Bench unchanged: the identity probe is a compile-time operation, the hot path remains a single decode + step.

v0.141.1 — Fused Counted-Loop: Proper Diagnostics on Miss

OP_LOOP_INT_DEC and OP_LOOP_INT_DEC_INC previously bailed with a silent NULL return on non-int test register or MIN_INT / MAX_INT overflow. That dropped the proper "zero? requires a number" diagnostic and -- worse -- swallowed the integer-overflow throw that Clojure semantics demand.

The miss path now calls prim_zero_p to decide the branch and to surface any non-number diagnostic, then on the non-zero side calls prim_dec (and for the two-binding form, prim_inc) so an overflow throw fires exactly as the unfused emission's would. Hot path is unchanged: tagged-int test, in-range step, single back-jump.

Verification: 1 571 tests / 7 353 assertions green on release, ASan, UBSan. tight-loop-10M stays at ~15 ms.

v0.141.0 — Cycle Close: Measurement Gate

Final measurement after the bytecode-vm perf cycle. Local mac arm64, min-of-3, best timings:

| Benchmark (ms) | v0.128.0 | v0.140.0 | Lua 5.5 | Janet |
|-------------------|----------|----------|---------|--------|
| tight-loop-10M | 157 | 15 | 67 | 692 |
| fib-30 | 359 | 89 | 37 | 58 |
| sum-1-to-1M | 18 | 17 | 4 | 8 |
| call-noop-1M | 173 | 31 | 10 | 19 |
| cond-branch-1M | 1 340 | 29 | 10 | 73 |
| arith-chain-1M | 639 | 35 | 12 | 15 |

Headline: mino beats Lua 5.5 by 4.5x on tight-loop-10M and beats Janet by 2.5x on cond-branch-1M; mino vs the cycle's start point is 10.5x faster on tight-loop, 46x on cond-branch, 18x on arith-chain.

What this leverages from Clojure: every win in this cycle came from a Clojure-shape that the compiler can see. Persistent bindings make the fused (recur (inc i) (dec j)) step safe to emit as a single in-place update. Var indirection with monotonic ic_gen gives a cheap, sound inline cache for global symbol resolution. Pure-prim identity tracking lets the compiler fold (+ 1 2) to a constant while still re-compiling when the var gets redefined. Left-associative variadic arithmetic expands into chained binary ops without losing the throw-on-overflow semantics. Bytecode-level pattern recognition of (zero? bX) ... (recur (dec bX) ...) would not be sound in a language with mutable counters or surprise effects; it works here because the loop-recur form is total over its bindings and overflow always throws.

Verification: 1 571 tests / 7 353 assertions green on release, ASan, UBSan. Full report at .local/perf_v141.md.

v0.140.0 — Direct Compile for when / and / or

(when test body...), (and a b ...), and (or a b ...) no longer fall back to the tree-walker. The compiler emits direct short-circuit OP_JMPIFNOT chains: when jumps past the body on a falsy test (storing nil); and returns the first falsy arg; or returns the first truthy arg. The last arg of and / or is compiled in tail position so (and ... (recur ...)) still goes through the bytecode trampoline.

What this leverages from Clojure: the short-circuit operators are total over the truthy/falsy axis and their values are deterministic from the args' source order. The compiler can lower each arg's evaluation to a single register write + one conditional jump, no thunking or wrapper functions.

Verification: 1 571 tests / 7 353 assertions green on release, ASan, UBSan.

v0.139.0 — Collection Fast Lanes: nth-vec and get-kw-map

(nth v idx) and (get m :kw) get compile-time specialized opcodes. OP_NTH_VEC reads the vector's leaf directly when the collection is a MINO_VECTOR and the index is a tagged int; OP_GET_KW_MAP does a single hash + HAMT lookup when the collection is a MINO_MAP and the key is a MINO_KEYWORD. Any miss (lazy seq / non-keyword / non-vector / out-of-range) falls back to prim_nth / prim_get so the diagnostics stay Clojure-correct.

What this leverages from Clojure: persistent vectors and maps publish a stable internal layout. The hot path for an indexed get is always vector-with-int-index or map-with-keyword-key, and both go through one specialised data structure read. The fallback handles the long tail of polymorphic shapes without penalising the common case.

Verification: 1 571 tests / 7 353 assertions green on release, ASan, UBSan. 100k nth-on-vector: 3 ms (~30 ns/op). 100k get-kw on an 8-entry map: 3 ms.

v0.138.0 — Broader Int Fast Lane Inside Reduce

prim_reduce's inner loop already shortcuts (+, *) on int+int pairs without allocating the per-step 2-element cons spine. This extends the same shortcut to -, bit-and, bit-or, and bit-xor. Subtraction goes through __builtin_sub_overflow so the throw-on-overflow semantics stay intact; the bitwise ops are total on int+int and need no overflow check.

What this leverages from Clojure: reduce is the canonical folder over any seqable. Once the accumulator and the next element are both ints and the function is identified as a known pure core prim, we know exactly what the step computes and can skip both the cons allocation and the apply_callable dispatch for it. The PRIM identity check ensures a user shadow of + falls through to the slow path automatically.
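A stripped-down version of such a loop is sketched below. It assumes plain `int64_t` operands rather than mino's tagged ints, uses a hypothetical `sk_op` enum in place of the PRIM identity check, and relies on the gcc/clang `__builtin_sub_overflow` builtin named above. On overflow it reports a miss so a caller could hand off to the general path that throws.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SK_SUB, SK_BIT_AND, SK_BIT_OR, SK_BIT_XOR } sk_op;

/* Folds op over xs starting from init without allocating anything.
 * Returns 1 with the result in *out, or 0 when a subtraction overflows,
 * in which case the caller would fall back to the general path. */
int sk_reduce_int(sk_op op, int64_t init, const int64_t *xs, size_t n,
                  int64_t *out)
{
    int64_t acc = init;
    size_t i;
    assert(out != NULL && (xs != NULL || n == 0));
    for (i = 0; i < n; i++) {
        switch (op) {
        case SK_SUB:
            if (__builtin_sub_overflow(acc, xs[i], &acc)) return 0;
            break;
        case SK_BIT_AND: acc &= xs[i]; break; /* bitwise ops cannot overflow */
        case SK_BIT_OR:  acc |= xs[i]; break;
        case SK_BIT_XOR: acc ^= xs[i]; break;
        }
    }
    *out = acc;
    return 1;
}
```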

Verification: 1 571 tests / 7 353 assertions green on release and ASan. reduce-with-range stays at the existing reduce_int_range fast path (~0 ms for 1M ints).

v0.137.0 — Fused Counted-Loop Opcodes

The compiler recognises two common (loop ...) shapes:

    (loop [i 0] (if (zero? i) (recur (dec i))))
    (loop [i 0 j N] (if (zero? j) (recur (inc i) (dec j))))

and emits a single fused opcode (OP_LOOP_INT_DEC / OP_LOOP_INT_DEC_INC) at the recur target. Each iteration is now one decode + one tag-bit check + one (or two) tagged-int updates + one back-jump-to-self -- the per-iter ZERO_INT_P, JMPIFNOT, DEC_I, INC_I, MOVE, MOVE, JMP cascade collapses to a single fetch.

What this leverages from Clojure: loop / recur is the canonical iteration form and its bytecode shape is stable and homoiconic. Persistent bindings mean each step depends only on the same binding's prior value -- no aliasing between iterations -- so the fused step can update the registers in place without the temp-register staging the generic recur emission needs to avoid clobbering. Integer-overflow semantics are preserved: dec MIN_INT and inc MAX_INT decline the fused step and fall back to the boxed-int slow lane so the throw still fires.

Verification: 1 571 tests / 7 353 assertions green on release, ASan, UBSan. tight-loop-10M: 155 ms -> ~15-18 ms (~9x), well under Lua's 72 ms. Other benches steady.

v0.136.0 — Drop Redundant Register Zeroing on Push

bc_push_window no longer zeroes the new window's slots up front. The slots are already NULL: bc_pop_window clears every slot before the next push lands on it, and the bc_regs growth path zeroes the freshly-allocated tail. The per-call zero loop was duplicating that work for every fn entry.

The body's compiler emits a write to every register before any op that may collect (OP_CALL, OP_GETGLOBAL_CACHED, OP_CLOSURE, mino_cons / env_child during has_rest / captures setup). The GC root walk sees those filled slots, not the uninitialized state.

Verification: 1 571 tests / 7 353 assertions green on release, ASan, UBSan. fib-30: ~92 ms -> ~86 ms (~7%). Other benches within noise.

v0.135.0 — Inline Cache for Global Symbol Resolution

OP_GETGLOBAL is replaced by OP_GETGLOBAL_CACHED at every compiled call site. The first time a global symbol resolves, the result is stashed in a per-fn inline-cache slot together with the state's ic_gen snapshot. Subsequent reads in the same fn skip the full eval_impl cascade (dyn-stack -> lexical env -> current-ns env -> ambient ns -> diagnostic) and return the cached value directly.

What this leverages from Clojure: vars carry stable identity under read/eval separation. Bindings only change via def / ns-unmap / var-set-root / var-unintern; the existing ic_gen counter that already gates the int+int fast lane and the literal-arg fold doubles as the IC's coherence epoch. In steady state the gen never moves, so every hit is a single load-compare-load.

Soundness: the cache is bypassed when dynamic bindings are active (mino_current_ctx(S)->dyn_stack != NULL), so (binding [*x* ...] ...) can't be masked by a stale var root. The cache fills only when no dyn-bindings are active, so the stored value is always the var's published root, never a dyn-shadowed value. Each fn-value owns its own bc and its own IC slots, so the cache key doesn't need to include the closure env; the env is constant across calls to a given fn-value.
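The coherence rule can be modelled in a few lines of C. Everything here is a simplified stand-in: `sk_state` holds a single global and a flag for "dynamic bindings active", and `sk_ic_slot` is a hypothetical slot layout. The sketch shows the three cases described above: bypass under dynamic bindings, hit when the generation matches, refill after a def bumps the generation.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified runtime state: one global, one generation counter. */
typedef struct sk_state {
    unsigned ic_gen;       /* bumped on every redefinition */
    int      global_value; /* stands in for the var's published root */
    int      dyn_active;   /* nonzero while dynamic bindings are live */
    int      slow_lookups; /* counts full resolutions, for demonstration */
} sk_state;

typedef struct sk_ic_slot {
    int      valid;
    unsigned gen;   /* ic_gen snapshot taken when the slot was filled */
    int      value;
} sk_ic_slot;

static int sk_slow_lookup(sk_state *S)
{
    S->slow_lookups++;
    return S->global_value;
}

int sk_get_global_cached(sk_state *S, sk_ic_slot *slot)
{
    int v;
    assert(S != NULL && slot != NULL);
    if (S->dyn_active) return sk_slow_lookup(S); /* bypass: no read, no fill */
    if (slot->valid && slot->gen == S->ic_gen) return slot->value; /* hit */
    v = sk_slow_lookup(S);
    slot->value = v;
    slot->gen   = S->ic_gen;
    slot->valid = 1;
    return v;
}

/* Redefinition publishes a new root and invalidates every slot at once. */
void sk_def(sk_state *S, int v)
{
    S->global_value = v;
    S->ic_gen++;
}
```

Because invalidation is a single counter bump, no slot has to be visited when a global is redefined; stale slots are simply refilled on their next read.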

Verification: 1 571 tests / 7 353 assertions green on release and ASan. fib-30: 188 ms -> ~92 ms (~2x). call-noop-1M: 58 ms -> ~30 ms (~2x). Other benches steady within noise.

v0.134.0 — argv ABI for BC Calls

OP_CALL no longer builds a cons-spine arg list when dispatching to a callable. The new apply_callable_argv entry point takes argv + argc directly; for the two hot callee shapes -- argv-ABI C prims and bytecode-runnable user fns -- the call goes from register slice to argv to callee with zero cons cells allocated.

Before: every OP_CALL allocated N cons cells, then apply_callable walked them right back into an argv scratch array. After: the register slice IS the argv; apply_callable_argv jumps straight to the prim's fn2 or the fn's bc trampoline. Legacy callees (fn1 prims, tree-walker fns, macros, non-fn callables) still get a cons list, built lazily on the slow path.

OP_TAILCALL still produces a cons-format MINO_TAIL_CALL sentinel for the trampoline; the trampoline inside apply_callable_argv walks it back to argv for bc-FN targets. Non-bc tail-call targets get the existing apply_callable handoff.

Verification: 1 571 tests / 7 353 assertions green on release, ASan, UBSan. call-noop-1M: 173 ms -> ~58 ms (~3x). fib-30: 359 ms -> ~188 ms (~1.9x).

v0.133.0 — N-Arity Arithmetic Expansion

(+ a b c d) is no longer a four-arg cons-spine call into the + prim. The compiler now expands variadic core +, -, and * into a left-associative chain of binary operations: (+ (+ (+ a b) c) d), i.e. ((a + b) + c) + d. The order matters. Each binary step goes through the existing OP_*_II fast lane with its overflow check, so (+ INT_MAX 1 -1) still throws at the first step (Clojure-correct), whereas a right-associative expansion such as (+ a (+ b (+ c d))) would mask the overflow by computing (+ 1 -1) first.

What this leverages from Clojure: variadic core arithmetic primitives have a well-defined left-fold semantics on the JVM, and that's exactly the contract mino's prims implement. The homoiconic source form lets the compiler see all N args before lowering, and immutability means the running accumulator can ride a single register through the whole chain.
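The difference between the two association orders is easy to demonstrate with checked addition. The sketch below uses `int64_t` and the gcc/clang `__builtin_add_overflow` builtin as stand-ins for the per-step overflow check; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ((x0 + x1) + x2) + ... ; returns 0 at the first step that overflows. */
int sk_add_left(const int64_t *xs, size_t n, int64_t *out)
{
    int64_t acc;
    size_t i;
    assert(xs != NULL && out != NULL && n >= 1);
    acc = xs[0];
    for (i = 1; i < n; i++)
        if (__builtin_add_overflow(acc, xs[i], &acc)) return 0;
    *out = acc;
    return 1;
}

/* x0 + (x1 + (x2 + ...)); shown only for contrast. */
int sk_add_right(const int64_t *xs, size_t n, int64_t *out)
{
    int64_t acc;
    size_t i;
    assert(xs != NULL && out != NULL && n >= 1);
    acc = xs[n - 1];
    for (i = n - 1; i-- > 0; )
        if (__builtin_add_overflow(xs[i], acc, &acc)) return 0;
    *out = acc;
    return 1;
}
```

For the inputs `INT64_MAX, 1, -1`, the left fold reports overflow at its first step, while the right fold first reduces `1 + -1` to 0 and then succeeds, which is exactly the masking the left-associative expansion avoids.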

Comparators (<, <=, ...) and bitwise ops are NOT expanded: their variadic forms have AND-of-pairs semantics, not chained fold, so they stay on the prim fallback for now.

Verification: 1 571 tests / 7 353 assertions green on release. arith-chain-1M drops from 639 ms baseline to ~37 ms (~17x), and that's measured AFTER F1/F2/F3 -- the shape of +/-/* in inner loops is the bigger win.

v0.132.0 — Literal-Arg Pure-Fn Fold

When a call's head is a core arithmetic / comparison / bitwise / numeric-predicate prim AND every argument is a self-evaluating literal, the bytecode compiler now runs the prim at compile time and emits the result as an OP_LOAD_K constant. (+ 1 2 3) becomes a one-opcode literal load instead of three loads plus a call; (* 60 60 24) becomes 86400 in the const pool; nested (some-fn (* 60 60 24)) folds the inner call and leaves the outer call to do its normal work against the folded literal.

What this leverages from Clojure: the core arithmetic / predicate / bitwise prims are pure (no side effects, deterministic on their inputs) and stable (their C implementations don't change at runtime). The homoiconic source form gives the compiler the literal arguments before evaluation starts, and immutability means the folded value's identity stays valid for the bc's lifetime. Lua / Janet can't do this with the same generality -- mutable globals mean an "innocuous" plus might have been replaced; mino's var indirection makes the dependency observable through S->ic_gen.

Soundness: each compiled bc records compile_ic_gen at the end of compile and has_folds = 1 if any fold fired. apply_callable compares bc->compile_ic_gen against S->ic_gen on entry; a mismatch (a def / ns-unmap ran since the compile) drops fn->as.fn.bc back to NULL so the next call recompiles against the current bindings. A user redefining + is observed; the folded constants get replaced with whatever the new + returns.

Folds are attempted at the level of compile_call_impl -- one pass per call form, declining (and clearing the prim's error state, if any) on any divergence. The existing speculative OP_*_II / OP_*_IK fast lanes still fire when a fold attempt declines.

Verification: 1 571 tests / 7 353 assertions green on release.

v0.131.0 — Immediate-Operand Fast Lanes

Five new immediate-operand opcodes -- OP_ADD_IK, OP_SUB_IK, OP_LT_IK, OP_LE_IK, OP_EQ_IK -- encode a compile-time int literal in the C operand slot (signed 8-bit, range [-128, 127]). When (+ x 2) / (< i 10) / (= mode 1) appears at compile time with the literal in range, the compiler emits the IK form directly instead of the OP_LOAD_K + register-slot + OP_*_II pair.

The reduction is: one fewer opcode dispatch per occurrence, one fewer live register through the surrounding peephole window (helps the linear-scan allocator), and one fewer tag check at runtime (the immediate is by construction an int). Commutative ops (+, =) accept the literal on either side; -, <, <= require it on the right. Anything outside the range, or operand types the compiler can't prove are int literals, falls through to OP_*_II as before.
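A rough sketch of the range gate and the signed 8-bit encode / decode. The one-byte-per-field instruction layout and every sketch_ name are assumptions for illustration; the entry only states that the literal lives in the C operand slot with range [-128, 127].

```c
#include <stdint.h>

/* Assumed layout for illustration: op | A | B | C, one byte each. */
uint32_t sketch_encode(uint8_t op, uint8_t a, uint8_t b, uint8_t c)
{
    return (uint32_t)op | ((uint32_t)a << 8) | ((uint32_t)b << 16)
         | ((uint32_t)c << 24);
}

/* Returns 1 and writes the instruction when the literal fits the signed
 * 8-bit C slot; returns 0 so the caller can emit the register form. */
int sketch_emit_ik(uint8_t op, uint8_t dst, uint8_t src, long long lit,
                   uint32_t *out)
{
    if (lit < -128 || lit > 127) return 0;
    *out = sketch_encode(op, dst, src, (uint8_t)(lit & 0xFF));
    return 1;
}

/* Decodes the C slot back to a signed value without relying on
 * implementation-defined narrowing conversions. */
int sketch_imm(uint32_t ins)
{
    int c = (int)((ins >> 24) & 0xFF);
    return c >= 128 ? c - 256 : c;
}
```

Returning 0 instead of clamping keeps the fall-through to the register form explicit at the emit site.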

Verification: 1 571 tests / 7 353 assertions green on release, ASan, UBSan.

v0.130.0 — Extended Int Fast-Lane Breadth

mod, quot, rem, bit-and, bit-or, bit-xor, bit-shift-left, bit-shift-right, unsigned-bit-shift-right, pos?, neg?, even?, odd?, and bit-not now have dedicated int fast-lane opcodes that fire when every operand is an inline-tagged int (two operands for the binary ops, one for pos?, neg?, even?, odd?, and bit-not). The handlers do a single MINO_IS_INT tag check, decode inline, run the C op (with the same div-by-zero / shift-range / MIN/-1 bails the prim uses), and re-tag the result. The C prim remains the slow path: any miss -- a non-int operand, a shift amount outside [0, 63], division by zero, or an overflow that escapes the tagged range -- falls back to it so the Clojure-correct diagnostic or numeric-tower promote still fires.
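The hit-or-bail shape of one such handler, sketched for mod. This assumes the low-three-bit int tag (001) that the pointer-tagging entries describe; the sk_ helpers are hypothetical and operate on raw 64-bit words rather than mino_val_t pointers.

```c
#include <stdint.h>

#define SK_TAG_MASK 0x7u
#define SK_TAG_INT  0x1u
#define SK_IS_INT(w) (((w) & SK_TAG_MASK) == SK_TAG_INT)

/* Tag a payload: shift on an unsigned value so negatives are defined. */
uint64_t sk_encode(int64_t v) { return ((uint64_t)v << 3) | SK_TAG_INT; }

/* Untag: relies on sign-preserving right shift of a signed value. */
int64_t sk_decode(uint64_t w) { return (int64_t)w >> 3; }

/* Returns 1 and stores the tagged result on a hit; returns 0 to signal
 * "bail to the slow prim" (type miss or division by zero). */
int sk_mod_ii(uint64_t a, uint64_t b, uint64_t *out)
{
    int64_t x, y, r;
    if (!SK_IS_INT(a) || !SK_IS_INT(b)) return 0;   /* type miss */
    x = sk_decode(a);
    y = sk_decode(b);
    if (y == 0) return 0;            /* let the prim raise the error */
    r = x % y;                       /* C truncates toward zero */
    if (r != 0 && ((r < 0) != (y < 0))) r += y;  /* sign follows divisor */
    *out = sk_encode(r);
    return 1;
}
```

The sign adjustment gives Clojure's floored mod, so (mod -7 3) yields 2 rather than C's -1.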

Each new fast lane lives behind its own per-op OP_*_II / OP_*_I opcode rather than the generic OP_BINOP_INT sub-op dispatch, matching the existing +/-/*/<... pattern. The peephole tail-move folder and the producer-to-A rewrite see the new opcodes as foldable, so a (+ acc (mod i 3)) inner loop keeps its register-stable shape.

What this leverages from Clojure: mod/quot/rem and bit-* are pure, side-effect-free core prims whose type-and-result contract on int+int is fully specified. That lets the compiler emit a speculative inline opcode without observing call-site state; any divergence from that contract (boxed bigint, float, ratio) bails to the real prim and the numeric tower handles the promotion. The bytecode never has to track the operand's tier explicitly -- the tag bits ARE the tier check.

Verification: 1 571 tests / 7 353 assertions green on release, ASan, UBSan. cond-branch-1M (the previously hot mod benchmark) drops from 1 340 ms to 31 ms (~43×); the per-iteration cost is now dominated by the loop-recur shape rather than the modulus call.

v0.129.0 — Drop Arith Hot-Path Instrumentation

mino_int, tag_or_box_int, and the BC arith fast lane no longer increment the bc_int_make_count / bc_int_alloc_avoided counters on every tagged-int production. The counters are gated behind MINO_BC_PROFILE_COUNTS; turn the macro on when investigating alloc-avoidance, leave it off in release builds.

The counters were two unconditional memory writes per mino_int call -- on the hottest path the VM has (every arith result, every inc/dec, every range iteration). Steady-state runs paid the writes in cache-line traffic without observing the numbers; the macro moves them behind a compile-time flag without losing the diagnostic when wanted.
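One way to express that gating, as a sketch. The struct and the SKETCH_COUNT macro are hypothetical; only MINO_BC_PROFILE_COUNTS and the two counter names come from this entry.

```c
/* Hypothetical counter struct; the two field names and the
 * MINO_BC_PROFILE_COUNTS macro are the ones the changelog mentions. */
typedef struct {
    unsigned long bc_int_make_count;
    unsigned long bc_int_alloc_avoided;
} sketch_counters;

#ifdef MINO_BC_PROFILE_COUNTS
#  define SKETCH_COUNT(S, field) ((void)((S)->field++))
#else
#  define SKETCH_COUNT(S, field) ((void)0)   /* no memory write at all */
#endif

/* Stand-in for a hot-path producer: bumps the counters only in
 * profiling builds and otherwise does no extra memory writes. */
long sketch_make_int(sketch_counters *S, long n)
{
    SKETCH_COUNT(S, bc_int_make_count);
    SKETCH_COUNT(S, bc_int_alloc_avoided);
    (void)S;   /* silences the unused-parameter warning when gated off */
    return n;
}
```

Building with -DMINO_BC_PROFILE_COUNTS brings the increments back without touching any call site.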

Verification: 1 571 tests / 7 353 assertions green; tight-loop microbench unchanged in shape but faster by ~10 % (the expected size of two-write removal on a tagged-int-heavy inner loop).

v0.128.0 — Destructure Bytecode Compilation

Third Phase E tag. The bytecode compiler now accepts destructuring patterns in let / loop bindings and in fn-params; previously both shapes declined to the tree-walker.

What changed:

Verification: 1 571 tests / 7 353 assertions green on release, ASan, UBSan. New tests/bc_destructure_test.clj covers vector positional + rest + :as + nested, map :keys + explicit pair + :or + :as, fn-param destructure mixed with plain params, and destructure-around-try-with-throw on both let and fn-param surfaces.

v0.127.0 — Binding (Dynamic Vars) Bytecode Compilation

Second Phase E tag. The bytecode compiler now emits code for binding, so dynamic var rebinds inside compiled fn bodies run on the BC path instead of declining into the tree-walker.

What changed:

Verification: 1 567 tests / 7 338 assertions green on release, ASan, UBSan. New tests/bc_binding_test.clj covers nested bindings, bindings around a try with both caught and uncaught throws, bindings across fn calls, and the empty-bindings no-op. Existing error_path_test/binding-dynamic-scope exercises the binding-restores-on-exception contract from a regular fn body, which now runs through BC.

v0.126.0 — Try/Catch/Throw Bytecode Compilation

First Phase E tag. The bytecode compiler now emits code for try/catch/finally/throw instead of declining; programs that use exception handling can run on the BC fast path without falling back to the tree-walker.

What changed:

Verification: full test suite (1 558 / 7 306) green on release, ASan, UBSan. Throw + catch through nested BC frames, throws from prim-called C code unwinding through a BC try, re-throws inside catch handlers, and finally-on-uncaught all exercised by the existing error_path_test, dialect_test, and async_*_test suites — they previously declined into the tree-walker; they now run on BC.

v0.125.0 — Arith Fast-Lane Direct Tag Extraction

The Phase D payload. binop_int_fast and unop_int_fast in src/eval/bc/vm.c now extract tagged ints inline via MINO_IS_INT tag-bit tests and MINO_INT_VAL decode, skipping the mino_val_int_p / mino_val_int_get helper chain. A single tag-bit test per operand replaces NULL + tag + boxed-type — 2–3 ALU ops saved per operand check, ~5 ops saved per call.

Encoding: a new inline helper tag_or_box_int handles the post-overflow-check encode path. For results that fit in [MINO_INT_MIN, MINO_INT_MAX] (the 61-bit signed range, ~±1.15e18) it returns MINO_MAKE_INT(r) — no allocation, no cell init. For the narrow band beyond the tagged range, it falls back to mino_int(S, r) which allocates a boxed cell. Both code paths increment the bc_int_make_count / bc_int_alloc_avoided counter pair.
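The three outcomes after an add can be sketched as follows. The sk_ names are hypothetical, the range constants assume the 61-bit payload described above, and the handling of a true 64-bit overflow (handed back to the caller here) is an assumption.

```c
#include <stdint.h>

/* 61-bit signed payload range, roughly +/-1.15e18. */
#define SK_INT_MAX (((int64_t)1 << 60) - 1)
#define SK_INT_MIN (-((int64_t)1 << 60))

typedef enum { SK_TAGGED, SK_BOXED, SK_OVERFLOW } sk_encode_path;

/* Adds with an explicit 64-bit overflow check, then reports which
 * encode path the result would take. *sum is written unless the add
 * overflows int64_t. */
sk_encode_path sk_add_path(int64_t a, int64_t b, int64_t *sum)
{
    if ((b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b))
        return SK_OVERFLOW;          /* assumed: left to the slow path */
    *sum = a + b;
    if (*sum >= SK_INT_MIN && *sum <= SK_INT_MAX)
        return SK_TAGGED;            /* inline, no allocation */
    return SK_BOXED;                 /* narrow band: boxed cell */
}
```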

Overflow boundary tests added in tests/arithmetic_test.clj:

Verification: full test suite (1 558 / 7 306) green on release, ASan, UBSan. UBSan green is the load-bearing correctness signal — it catches every misaligned deref the tag scheme could silently corrupt around.

v0.124.0 — GC IC-Marking Audit and Stress Gate

A verification milestone. The IC-marking path in src/gc/roots.c and every GC-internal (gc_hdr_t *) deref site was audited for tagged-value safety. The result: every site already routes through gc_mark_interior (which fast-rejects tagged values at the top via MINO_TAG_MASK), gc_mark_child_push (same guard), or operates on a pointer that is heap-only by construction (intern table entries, which are always interned symbol/keyword cells; never tagged).

What changed:

Gate run at this tag:

The single-tag work for the next milestone (v0.125.0) is the arith fast-lane payload: OP_ADD_II, OP_INC_I, OP_DEC_I, OP_ZERO_INT_P rewritten to extract tagged ints inline without going through mino_val_int_get, plus overflow boundary tests at MINO_INT_MAX / MINO_INT_MIN and ±1 around INT64_MAX / INT64_MIN.

v0.123.0 — Inline Tags for BOOL, NIL, CHAR

Extends the tag scheme that v0.118.0–v0.122.0 set up for inline-tagged integers to the remaining three reserved tags. After this commit mino_true, mino_false, mino_nil, and mino_char all return inline-encoded values; the nil_singleton, true_singleton, and false_singleton fields on mino_state_t become dead storage (kept in the struct only to avoid a separate ABI break — embedders never touched them directly).

New macros in src/mino.h:

src/runtime/internal.h:

Migration (≈12 sites for bool, ≈12 for char):

Verification: full test suite, ASan, UBSan all green (1557 / 7279). Perf impact: small — tagged BOOL/NIL save no allocations (singletons already shared one cell per kind); CHAR saves an alloc per char construction, visible on char-heavy workloads. The allocation-heavy hot lanes for INT continue to dominate the measurement signal; the bigger BOOL/NIL/CHAR win is structural (cleaner abstraction, no per-state singleton bookkeeping for constants).

v0.122.0 — Constructor Flip: Inline-Tagged Integers

mino_int(S, n) now returns an inline-tagged pointer for every n in the 61-bit signed range [MINO_INT_MIN, MINO_INT_MAX]. The boxed heap allocation that previously happened for every out-of-cache int is gone — the value rides in the pointer's spare bits.

The constructor change is six lines:

```c
mino_val_t *mino_int(mino_state_t *S, long long n)
{
    mino_val_t *v;
    S->bc_int_make_count++;
    if (n >= MINO_INT_MIN && n <= MINO_INT_MAX) {
        S->bc_int_alloc_avoided++;
        return MINO_MAKE_INT(n);
    }
    v = alloc_val(S, MINO_INT);
    v->as.i = n;
    return v;
}
```

The boxed fallback stays in place for the narrow band between MINO_INT_MAX (≈1.15e18) and LLONG_MAX (≈9.22e18) where the tag would lose precision; in that band the value still allocates a cell. The small-int cache that v0.121.0 and earlier used is now dead code for the in-range case (the tagged form is faster and allocation-free) and is dropped from the constructor.

v0.118.0 set up the tag scheme, v0.119.0 added the infrastructure helpers + GC alignment audit, v0.120.0 migrated every ->type == / switch (X->type) / X->as.i site to the mino_type_of / mino_val_int_get helpers, and v0.121.0 closed the remaining generic-deref gaps (X->type as a function argument, cross-type comparisons, the defensive a->type < 0 check). With all four landed, this commit is the six-line payload — and the test suite, ASan, and UBSan all stay green with no further changes needed.

v0.121.0 — Generic-Deref Audit for Tagged-Int Safety

Closes the remaining audit gaps that v0.120.0's ->type == / ->as.i sweep didn't catch, so the v0.122.0 constructor flip can land without revisiting the call-site layer.

The v0.120.0 perl regex caught X->type ==, X->type !=, and switch (X->type), but missed several patterns where X->type is still read as a plain value:

This tag rewrites those sites to use mino_type_of(X) so they stay correct when X is inline-tagged. Files touched:

The ->meta read sites surveyed during this tag were all found to be safe: every one either operates on a freshly-allocated value (out, copy, result, new_rec, etc.) or is inside a type-discriminated branch that already excludes MINO_INT via mino_type_of. No further guards needed for those.

Verification: the constructor flip was temporarily applied at this tag and the full suite ran green (release + ASan + UBSan, 1557 / 7279); the flip was then reverted so this commit ships the audit alone. The actual mino_int(S, n) flip lands at v0.122.0 as a six-line constructor change.

Perf gate skipped at this tag: no behavior change.

v0.120.0 — Tag-Safe Type Discrimination at Call Sites

Migrates every type-discrimination site to the mino_type_of(v) helper introduced in v0.119.0. No representation change at this tag — mino_int(S, n) still returns a boxed cell from the small-int cache or a fresh alloc — but the codebase is now uniformly safe for the constructor flip, with one important caveat (see below).

Three blanket call-site rewrites land here:

The rewrites touch 48 files. All 1557 tests / 7279 assertions stay green on release, ASan, and UBSan.

Scope caveat — second sweep needed before constructor flip. While attempting the flip in this tag, the test suite crashed in atom_set and prim_type. Root cause: the migration pass covered ->type and ->as.i access, but the GC write barrier (gc_write_barrier in src/gc/barrier.c) still dereferences new_value and old_value as gc_hdr_t * without tag-checking, and a wide population of sites read v->meta, v->as.cons.car, v->as.atom.val, etc. directly off a value that the next tag could see as inline-tagged. The plan's "22 files, 200 callsites" estimate was correct for the int-typed sites; the broader generic-deref audit (~60 ->meta reads alone, and the GC barrier's three internal derefs) needs its own tag.

GC barrier guards added preemptively at this tag (no behavior change today, prep for v0.121.0): the two SATB pushes and the remset path in gc_write_barrier now early-return on any pointer with non-zero low three bits.

Helper layout in internal.h was reordered so mino_type_of is defined before mino_val_int_p/mino_val_int_get; the latter two delegate to it, eliminating two near-duplicate tag-check sites.

v0.119.0 — Pointer-Tagged Value Representation: Infrastructure

Lands the infrastructure that subsequent tags need to flip the boxed-int representation to inline-tagged. Still no behavioural change: every value continues to flow through the boxed mino_val_t cell path; mino_int(S, n) keeps returning a small-int cache cell or a freshly-allocated boxed cell. What this tag installs is the machinery so the constructor flip at the next tag can land without crashing the GC and without requiring a 200-callsite rewrite in the same commit.

Three additions in src/runtime/internal.h:

In src/gc/driver.c:

Scope note: the cycle plan originally estimated v0.119.0 as a single tag that flipped the constructor and rewrote the ~50–80 affected call sites. A re-grep of the codebase found ~129 reads of v->as.i and ~70 *->type == MINO_INT predicates, plus scattered switch (v->type) patterns that also need gating on MINO_IS_PTR(v) before deref. Splitting infrastructure (this tag) from migration (v0.120.0) keeps each commit reviewable and ASan-verifiable, and matches the per-tag ASan/UBSan gates the plan requires.

Release / ASan / UBSan continue to pass; 1557 tests / 7279 assertions green on all three.

v0.118.0 — Pointer-Tagged Value Representation: Layout Contract

Header-only change that lands the layout contract for the pointer-tagged value representation. No call site uses the new macros yet; this tag installs the vocabulary so subsequent tags can migrate alloc sites, GC paths, and arithmetic fast lanes one chunk at a time.

mino.h gains the tag scheme:

```
tag 000      -> heap pointer to struct mino_val
tag 001      -> inline 61-bit signed int (payload bits 63..3)
tag 010      -> reserved for inline BOOL
tag 011      -> reserved for inline NIL
tag 100      -> reserved for inline CHAR
tag 101..111 -> reserved
```

Public macros: MINO_TAG_PTR, MINO_TAG_INT, MINO_TAG_BOOL, MINO_TAG_NIL, MINO_TAG_CHAR, MINO_TAG, MINO_IS_PTR, MINO_IS_INT, MINO_INT_VAL, MINO_MAKE_INT, plus the 61-bit signed range constants MINO_INT_MAX and MINO_INT_MIN. The MINO_INT_VAL decode relies on arithmetic right shift of signed integers, which C99 6.5.7p5 leaves implementation-defined for negative operands; the layout note in the header records that every supported toolchain (clang, gcc, msvc on x86_64 and arm64) implements it as sign-preserving. 64-bit hosts only.
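For orientation, a sketch of how macros of this kind can be written against the tag table above. The SK_ names and macro bodies are assumptions, not the definitions in mino.h, and like the header they assume a 64-bit host with sign-preserving right shift.

```c
#include <stdint.h>

/* Tag values follow the table in this entry; the SK_ names and the
 * macro bodies are illustrative, not the definitions in mino.h. */
#define SK_TAG_MASK ((uintptr_t)0x7)
#define SK_TAG_PTR  ((uintptr_t)0x0)
#define SK_TAG_INT  ((uintptr_t)0x1)

#define SK_TAG(v)    ((uintptr_t)(v) & SK_TAG_MASK)
#define SK_IS_PTR(v) (SK_TAG(v) == SK_TAG_PTR)
#define SK_IS_INT(v) (SK_TAG(v) == SK_TAG_INT)

/* 61-bit signed payload range. */
#define SK_INT_MAX (((int64_t)1 << 60) - 1)
#define SK_INT_MIN (-((int64_t)1 << 60))

/* Shift the payload into bits 63..3 and set the INT tag. The shift is
 * done on an unsigned value so negative payloads are well defined. */
#define SK_MAKE_INT(n) \
    ((void *)(uintptr_t)((((uint64_t)(int64_t)(n)) << 3) | SK_TAG_INT))

/* Arithmetic right shift drops the tag and restores the sign. */
#define SK_INT_VAL(v) (((int64_t)(uintptr_t)(v)) >> 3)

/* Convenience check: does n survive an encode / decode round trip? */
int sk_roundtrips(int64_t n)
{
    void *v = SK_MAKE_INT(n);
    return SK_IS_INT(v) && !SK_IS_PTR(v) && SK_INT_VAL(v) == n;
}
```

Values outside [SK_INT_MIN, SK_INT_MAX] lose their top bits in the shift, which is why a constructor has to range-check before tagging.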

The header also fixes the stable execution ABI carried across the representation rollout: frame layout, register window indexing, call/tailcall handoff, and the bailout-to-tree-walker contract do not change. Only the in-register and in-memory layout of values changes. Prims with the mino_val_t *args (cons spine) ABI keep that ABI; tagged values flow through every slot identically.

src/runtime/internal.h gains the runtime-internal debug assertion helpers MINO_ASSERT_INT, MINO_ASSERT_PTR, MINO_ASSERT_TAGGED_NONNULL, and MINO_ASSERT_ALIGNED. They compile to no-ops under -DNDEBUG and are intended for the alloc-site audit and GC adjustments in upcoming tags. No production code references them yet.

No tests changed. Release / ASan / UBSan continue to pass; 1557 tests / 7279 assertions green on all three.

v0.117.0 — Bytecode Constant-If Fold And Tail-MOVE Peephole

Two small bytecode compiler optimisations.

First, compile_if now folds constant conditions at compile time. When the condition form is self-evaluating (true, false, nil, a literal int / keyword / string / char), the compiler evaluates truthiness on the spot and emits only the chosen branch. (if true 1 0) compiles to OP_LOAD_K dst, k(1) + OP_RETURN instead of the previous six-instruction OP_LOAD_K cond / OP_JMPIFNOT / OP_LOAD_K then / OP_JMP / OP_LOAD_K else sequence.

Second, a tail-MOVE peephole runs once per clause at the end of compile_clause. If the clause body's last emitted instruction is OP_MOVE ret_reg, X, the instruction before it is a foldable producer (OP_LOAD_K, OP_GETGLOBAL, any of the *_II binops, the unary fast-lane ops, OP_CLOSURE, OP_MAKE_LAZY) that wrote to X, and neither pc is a jump target, the producer's A operand is rewritten to ret_reg and the OP_MOVE is dropped. This catches the (let [x form] x) shape where the body's tail returns a binding directly -- the binding's emit becomes the return-slot emit, the redundant MOVE disappears.
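The peephole conditions can be sketched over a toy instruction array. The opcode enum, the instruction struct, and the jump-target bitmap are hypothetical; the real pass works on the encoded stream and recognises a longer producer list.

```c
#include <stddef.h>

typedef enum { SK_LOAD_K, SK_ADD_II, SK_MOVE, SK_CALL, SK_RETURN } sk_op;
typedef struct { sk_op op; int a, b, c; } sk_ins;

/* Assumed subset of foldable producers that write their A operand. */
int sk_is_foldable_producer(sk_op op)
{
    return op == SK_LOAD_K || op == SK_ADD_II;
}

/* Folds a trailing "MOVE ret_reg, X" into the producer of X.
 * is_jump_target[pc] is nonzero when some jump lands on pc.
 * Returns the new instruction count (n or n - 1). */
size_t sk_fold_tail_move(sk_ins *code, size_t n, int ret_reg,
                         const unsigned char *is_jump_target)
{
    sk_ins *mv, *prod;
    if (n < 2) return n;
    mv = &code[n - 1];
    prod = &code[n - 2];
    if (mv->op != SK_MOVE || mv->a != ret_reg) return n;
    if (!sk_is_foldable_producer(prod->op) || prod->a != mv->b) return n;
    if (is_jump_target[n - 1] || is_jump_target[n - 2]) return n;
    prod->a = ret_reg;   /* producer now writes the return slot */
    return n - 1;        /* drop the MOVE */
}
```

The jump-target test matters because a branch landing on the MOVE would otherwise skip the rewritten producer and leave the return slot unwritten.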

Measured against v0.116.0 (release -O2, min-of-3 from perf_gate.clj):

```
arith-add          1716 -> 1640       (+4.4%)
arith-inc          1652 -> 1598       (+3.3%)
fn-call-identity   1546 -> 1515       (+2.0%)
if-branch          1479 -> 1448       (+2.1%)
let-local-lookup   1445 -> 1439       (+0.4%)
loop-recur-5       1678 -> 1652       (+1.6%)
do-block           1604 -> 1576       (+1.7%)
10M tight loop     1091ms -> 1054ms   (+3.4%)
```

Modest, narrow wins. The eval-floor benches are largely call-overhead-bound, so even removing four instructions from the if-branch body (six down to two) only shaves 31ns out of a ~1500ns total. The tight loop dropped 37ms because the arith-add change closes a per-iteration overhead in the return path.

Bytes/op unchanged; both folds remove instructions but no allocations.

All 1557 tests, 7279 assertions pass on release / ASan / UBSan.

v0.116.0 — Bytecode Operand-Inplace For Fast Lanes

The bc compiler's int fast lanes now read operand registers directly when the operand is a local. Before, (+ x j) with x and j both as locals allocated two fresh temp registers, emitted OP_MOVE temp1, x_reg and OP_MOVE temp2, j_reg, then emitted OP_ADD_II dst, temp1, temp2. After, compile_operand_inplace returns the local's existing register, the two MOVEs are gone, and the binop becomes OP_ADD_II dst, x_reg, j_reg. Same change covers the unary lanes for inc, dec, and zero?. Literals and non-local refs still allocate temps and run through compile_expr as before, since the operand must live in some register for the opcode to read it.

The net is fewer instructions and lower n_regs for the common shapes -- (fn [x j] (+ x j)) drops from n_regs=5 (params + ret + 2 temps) to n_regs=3 (just params + ret). Per-call cost in bc_push_window falls in proportion; the high-water mark amplifies through every recursive arm.

Measured against v0.115.0 (release -O2, min-of-3 from perf_gate.clj):

```
arith-add          1810 -> 1716       (+5.5%)
arith-inc          1737 -> 1652       (+5.1%)
fn-call-identity   1640 -> 1546       (+6.1%)
if-branch          1580 -> 1479       (+6.8%)
let-local-lookup   1523 -> 1445       (+5.4%)
loop-recur-5       1812 -> 1678       (+8.0%)
10M tight loop     1183ms -> 1091ms   (+8.4%)
fib(20)            2.45ms -> 2.29ms   (+6.9%)
```

The header block in src/eval/bc/compile.c no longer claims "Phase 1" coverage or a "stupid first" allocator; it now lists the actual special-form coverage as of this release.

All 1557 tests, 7279 assertions pass on release / ASan / UBSan.

v0.115.0 — Bytecode Tail-Position Propagation And Unary Int Fast Lanes

The bc compiler now propagates tail position through if, do, let, and loop. Any call sitting in a control-form's tail slot -- including the recursive call inside (if cond done (recur ...)) and the cross-fn tail call in (defn f [n] (if (= n 0) :done (f (- n 1)))) -- emits OP_TAILCALL and reuses the apply_callable trampoline, keeping the C stack flat across arbitrary recursion depth.

The same change exposed two latent encoder bugs that had been silently masking bytecode coverage. The bias-encoded jump-offset bounds in patch_jmp and the recur back-jump emitter both read INT16_MIN + 0x8000 (= 0) and INT16_MAX - 0x8000 (= -1), so every conditional branch declined bytecode compilation and landed back on the tree-walker. The correct bounds (-0x8000..0x7FFF) replace them; every if-bodied fn now runs through the VM as intended.
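The arithmetic behind the bug is easy to check in isolation. The sketch below assumes a 16-bit offset field biased by 0x8000; the sk_ function names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 and writes the biased 16-bit field when the signed offset
 * fits; returns 0 so the compiler can decline the fn. */
int sk_encode_jmp(long offset, uint16_t *field)
{
    if (offset < -0x8000L || offset > 0x7FFFL) return 0;
    *field = (uint16_t)(offset + 0x8000L);
    return 1;
}

/* Removes the bias to recover the signed offset. */
long sk_decode_jmp(uint16_t field) { return (long)field - 0x8000L; }

/* The buggy bounds, kept for comparison: INT16_MIN + 0x8000 == 0 and
 * INT16_MAX - 0x8000 == -1, an empty range that rejects every offset. */
int sk_buggy_in_range(long offset)
{
    return offset >= (long)INT16_MIN + 0x8000L
        && offset <= (long)INT16_MAX - 0x8000L;
}
```

Because the buggy lower bound (0) sits above the upper bound (-1), no offset could ever pass, which matches the observation that every conditional branch declined.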

Three unary int fast-lane opcodes -- OP_INC_I, OP_DEC_I, OP_ZERO_INT_P -- join the eight binary lanes from the prior release. The compiler emits them for (inc x), (dec x), and (zero? x) when the head resolves to the non-local non-macro prim; on a type miss the handler falls back to prim_inc / prim_dec / prim_zero_p via the same cons-spine ABI as a regular OP_CALL. The !tail gate on both unary and binary fast lanes is gone: speculation produces a value that the surrounding OP_RETURN carries out, with no trampoline indirection.

The 10M (inc i)(dec j) tight-loop probe drops from ~4.7s to ~1.2s. Self-tail recursion at depth 100000 ((defn countdown [n] (if (= n 0) :done (countdown (- n 1))))) now runs flat through the VM trampoline instead of overflowing the C stack. All 1557 tests, 7279 assertions pass on release / ASan / UBSan.

v0.114.0 — Bytecode Speculative Int+Int Fast Lanes

The bc compiler emits per-op specialised opcodes -- OP_ADD_II, OP_SUB_II, OP_MUL_II, OP_LT_II, OP_LE_II, OP_GT_II, OP_GE_II, OP_EQ_II -- for the eight binary arith / compare calls (+ - * < <= > >= =) when the head is a non-local non-macro and the call site has exactly two args. Each handler runs the v0.103.0-era int+int fast lane and, on a type miss, falls through to the matching prim with the same cons-spine argv as a regular OP_CALL. The compiler skips the speculation when the call is in tail position so the OP_RETURN / OP_TAILCALL discipline isn't disturbed.

A pre-existing encoding bug in MK_BINOP_INT (sub-op nibble overlapped the op byte for any non-zero sub-op) is sidestepped: the original OP_BINOP_INT opcode stays in the enum and runtime for any hand-written stream that uses sub-op zero, but the compiler now emits the per-op variants instead. Phase-4 profile-driven runtime rewriting -- the original plan -- is overkill once the compile-time speculation covers the only set of opcodes the plan would have promoted to anyway.

All 1557 tests, 7279 assertions pass on release / ASan / UBSan.

v0.113.0 — Bytecode Multi-Arity Dispatch

Multi-arity fns now bc-compile. Each ([params] body...) clause becomes a mino_bc_clause_t entry on the compiled record (its own n_params, has_rest, entry_pc, and params_vec); the shared code stream holds every clause's bytecode back-to-back.

At fn entry, the runtime scans the clauses array twice: first looking for a fixed-arity match against argc, then for a variadic clause whose n_params <= argc. The matched clause's entry pc starts the interpreter loop; the matched params publish into the env when the fn captures, alongside any collected rest list.
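The two-pass scan, as a minimal sketch. sk_clause is a hypothetical cut-down of mino_bc_clause_t carrying only the fields named above, and the -1 return for an arity miss is an assumption.

```c
#include <stddef.h>

/* Hypothetical clause record; n_params, has_rest, and entry_pc are the
 * field names the changelog lists. */
typedef struct { int n_params; int has_rest; size_t entry_pc; } sk_clause;

/* Returns the index of the matching clause, or -1 when no clause
 * accepts argc arguments. */
int sk_pick_clause(const sk_clause *cs, int n, int argc)
{
    int i;
    for (i = 0; i < n; i++)            /* pass 1: exact fixed arity */
        if (!cs[i].has_rest && cs[i].n_params == argc) return i;
    for (i = 0; i < n; i++)            /* pass 2: variadic fallback */
        if (cs[i].has_rest && cs[i].n_params <= argc) return i;
    return -1;
}
```

Running the fixed-arity pass first means an exact match always wins over a variadic clause that could also accept the same argc.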

The compile path keeps the single-arity fast path as a degenerate one-clause case so nothing about the existing benchmarks regresses. Full tree-walker retirement still waits on try/catch, binding, and full destructuring; those land alongside the specialization work.

All 1557 tests, 7279 assertions pass on release / ASan / UBSan.

v0.112.0 — Bytecode Loop/Recur and Lazy-Seq

(loop [...] body) compiles into a binding scope with a recur target installed at the loop entry pc. (recur ...) evaluates its args into temporaries, moves them onto the loop's binding registers, and jumps back to the entry. Nested loops stack the recur targets so each recur only sees its enclosing loop.

(lazy-seq body...) stashes the body forms in the constant pool and emits OP_MAKE_LAZY, which builds a MINO_LAZY whose .body is the form list and whose .env is the live lexical chain at the OP_MAKE_LAZY site. Realisation reuses the existing tree-walker path; only the construction side is new on the bc dispatch. The env-capture pre-scan now recognises (lazy-seq ...) alongside inner (fn ...) literals, so the enclosing fn publishes its let-bindings into the env in time for the lazy body to see them.

(try ...), (throw ...), (binding [...] ...) stay tree-walked for this cycle. Their PUSH/POP-DYN and PUSHCATCH/POPCATCH handlers land alongside the tree-walker retirement.

All 1557 tests, 7279 assertions pass on release / ASan / UBSan.

v0.111.0 — Bytecode &-Rest and Constant Vectors

Single-arity fns with a trailing & rest binding now bc-compile. The compiler tracks a has_rest flag on the compiled record; the runtime relaxes its arity check to argc >= n_params and collects the overflow args into a list, placed in the register right after the fixed params. When the enclosing fn also captures, the rest list is published into the env alongside the other params so any inner closure sees it via mino_env_get_sym.

Vector literals whose elements are all self-evaluating (nil / bool / int / float / string / keyword / char) are stashed whole in the constant pool and loaded with a single OP_LOAD_K -- the common shape (defn f [...] [...] [literal-values]) no longer declines to the tree-walker on this one count. Vector literals with non-const elements, plus all map and set literals, still decline; their full lowering lands alongside the multi-arity and destructuring work in a follow-up cycle.

Multi-arity and full destructuring (vector / map / :as / :keys) remain on the tree-walker for this cycle. The follow-up adds them together with the loop/recur and Phase-2 opcode wiring.

All 1557 tests, 7279 assertions pass on release / ASan / UBSan.

v0.110.0 — Bytecode Closures

The bc compiler emits inner (fn ...) and (fn* ...) literals, including the named-fn (fn name [params] body) form that lets a closure recurse by name. Three new opcodes -- OP_PUSH_ENV, OP_POP_ENV, OP_ENV_BIND -- manage the lexical-env chain so captured closures see exactly the let-scoped bindings that were in scope at OP_CLOSURE time. Fns whose body contains no inner fn skip the env machinery and keep their bindings register-only.

A pre-scan over the fn body sets a captures flag on the compiled record. When set, the runtime extends the captured env with a fresh child at entry and publishes the fn's params into it, and the compiler brackets every let scope with PUSH_ENV / POP_ENV plus an OP_ENV_BIND per binding. Named-fn literals emit a 4-instruction sequence that wraps a child env around the OP_CLOSURE itself so the closure captures an env that already has its name pointing at it.

Inner fn literals are stored as MINO_FN templates in the outer fn's constant pool. OP_CLOSURE copies the template's params, body, defining-ns, shape, and bc into a fresh closure value and seals in the live env; each invocation that reaches OP_CLOSURE therefore produces a distinct closure over the current lexical chain.

Multi-arity inner fns are normalised via build_multi_arity_clauses (the same helper eval_fn uses) so the template's params/body shape matches what the tree-walker fallback expects; their bc remains the declined sentinel for this cycle and the closures fall back to the tree-walker at apply_callable time. Single-arity inner fns whose body the compiler covers run on the bc dispatch from the first call.

All 1557 tests, 7279 assertions pass on release / ASan / UBSan.

v0.109.0 — Bytecode Macro-Aware Emit

The bc compiler emits OP_CALL for non-tail regular calls and OP_TAILCALL for tail-position simple calls. Both emissions are gated on a compile-time macro probe that walks the same cascade the runtime uses at dispatch: lexical environment first, then the fn's defining-namespace env, with full alias resolution for qualified ns/name heads. The lexical-only check that previously gated the emitter missed macros that live in the ns env -- which is where most macros sit -- so the emitter declined every call shape rather than risk handing evaluated args to a macro. The ns-aware probe closes that gap and unblocks bc dispatch for ordinary calls.

The probe scopes alias lookup to the fn's defining ns rather than S->current_ns at compile time, since lazy compile-on-first-call runs with the caller's ns active but the runtime dispatch then switches to the fn's defining ns. Without that scoping, alias resolution would consult the wrong table.

The OP_CALL ABI loads consecutive register slots and hands them to apply_callable via the same cons-spine argv that the rest of the runtime consumes. OP_TAILCALL is emitted only when the final expression of a body is a direct call (special forms in tail position keep tree-walked behaviour for this cycle).

All 1557 tests, 7279 assertions pass on release / ASan / UBSan.

v0.108.0 — Specialization Opcode Reservation

Eleven Phase-4 opcodes are added to the bytecode opcode enum so the encoding is stable across the cycle that wires the specializing interpreter. The new entries cover the most-likely specializations against the v0.103.0 hot-path profile: OP_GETGLOBAL_CACHED (version-checked direct slot read), OP_CALL_CACHED (cached callable + version snapshot), eight per-op int+int variants (OP_ADD_II, OP_SUB_II, OP_MUL_II, OP_LT_II, OP_LE_II, OP_GT_II, OP_GE_II, OP_EQ_II) that split the single Phase-1 OP_BINOP_INT, and two shape specializations (OP_GET_KW_MAP for keyword-on-map get, OP_NTH_VEC for integer-index-on-vector nth).

Their handlers and the in-place opcode-rewriting machinery land alongside the runtime profiling counters; this cycle reserves the opcode IDs so embedders that inspect compiled fns get a stable instruction-set view from the start.

ABI surface and semantics unchanged; all 1557 tests, 7279 assertions pass on release / ASan / UBSan.

v0.107.0 — Bytecode Require Mode

MINO_BC_REQUIRE=1 flips the tree-walker fallback in apply_callable's bc path from a silent recovery into a hard abort. With the knob on, every fn that the compiler declines prints MINO_BC_REQUIRE: fn declined by compiler and aborts; production builds default to unset / 0 and keep the silent fallback in place. The knob is the standing development gate for the cycle that retires the tree-walker: once the compiler covers every form that the test suite exercises, CI runs with MINO_BC_REQUIRE=1 set and any silent decline turns into a loud failure.

The flag lives on a single global (mino_bc_require_flag) that the runtime initialises from the env var at startup. Embedders that want to opt in programmatically can flip it via the externally-visible symbol; the runtime does not gate behind a public C API entry until the form-coverage cycle lands.
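A sketch of the startup read and the decline hook. All sk_ names are hypothetical, and treating exactly "1" as on is an assumption about how the env var is parsed; the diagnostic string is the one quoted above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the runtime's single global flag. */
int sk_bc_require_flag = 0;

/* Assumed parsing rule: exactly "1" is on; unset or anything else is
 * off. */
int sk_flag_from_env(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Called once at startup to seed the flag from the environment. */
void sk_init_flags(void)
{
    sk_bc_require_flag = sk_flag_from_env(getenv("MINO_BC_REQUIRE"));
}

/* Called when the compiler declines a fn. Aborts under the knob;
 * otherwise returns 1 to mean "fall back to the tree-walker". */
int sk_on_decline(void)
{
    if (sk_bc_require_flag) {
        fputs("MINO_BC_REQUIRE: fn declined by compiler\n", stderr);
        abort();
    }
    return 1;
}
```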

ABI surface and semantics unchanged when the knob is off; all 1557 tests, 7279 assertions pass on release / ASan / UBSan with MINO_BC_REQUIRE unset, and abort cleanly with it set as expected on the current Phase-1/2 declined shapes.

v0.106.0 — Bytecode Tail-Call Trampoline

A flat trampoline at the bc dispatch boundary. The VM's OP_TAILCALL returns the existing MINO_TAIL_CALL sentinel instead of recursing through apply_callable; apply_callable's bc path consumes the sentinel in a loop, switching the active function and rebuilding argv without growing the C stack. When the tail target is bc-compatible the trampoline stays in the VM; when it isn't (a non-fn callable, a multi-arity / &-rest fn, a declined-bc fn) the loop pops its frame and hands off to the regular apply_callable path. Tail-recursive shapes that the tree-walker has been trampolining since v0.71.x now flatten the same way under the bc dispatch.
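The general trampoline shape, reduced to plain C. This is not the VM's code: sk_result stands in for the MINO_TAIL_CALL sentinel plus its pending call, and sk_sum_down is a hypothetical tail-recursive body.

```c
#include <stddef.h>

typedef struct sk_result sk_result;
typedef sk_result (*sk_fn)(long long n, long long acc);

/* Either a final value or a request to call `next` with (n, acc). */
struct sk_result {
    int is_tail_call;
    sk_fn next;
    long long n, acc, value;
};

/* A tail-recursive body expressed as "return a tail-call request"
 * instead of calling itself. Sums n + (n-1) + ... + 1 into acc. */
sk_result sk_sum_down(long long n, long long acc)
{
    sk_result r = { 0, NULL, 0, 0, 0 };
    if (n == 0) { r.value = acc; return r; }
    r.is_tail_call = 1;
    r.next = sk_sum_down;
    r.n = n - 1;
    r.acc = acc + n;
    return r;
}

/* Trampoline: consume tail-call requests in a loop, switching the
 * active function and arguments, so C stack depth stays constant. */
long long sk_apply(sk_fn f, long long n, long long acc)
{
    for (;;) {
        sk_result r = f(n, acc);
        if (!r.is_tail_call) return r.value;
        f = r.next;
        n = r.n;
        acc = r.acc;
    }
}
```

Each iteration returns to sk_apply before the next call starts, so a depth of 100000 uses the same stack as a depth of 1.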

The compiler holds off on emitting OP_TAILCALL for this cycle: the emit check needs to consult the namespace-env macro table in addition to the captured lexical env, otherwise a tail call whose head resolves to a macro ((future :done) inside for, for example) hands off pre-evaluated args and produces wrong results. The trampoline machinery is in place and verified through hand-written programs; the discriminator fix and the form-coverage expansion ship together in the next cycle.

The Phase-2 opcodes (OP_PUSHCATCH, OP_POPCATCH, OP_THROW, OP_PUSHDYN, OP_POPDYN, OP_MAKE_LAZY) are reserved in the opcode enum so the encoding stays stable; their handlers and emitters land alongside the corresponding form coverage.

ABI surface and semantics unchanged. All 1557 tests, 7279 assertions pass on release / ASan / UBSan.

v0.105.0 — Bytecode VM Foundation

A register-based bytecode interpreter sits behind the existing tree-walker. Compilation is lazy and per-fn: on first call a fn attempts to compile its body to a 32-bit fixed-width instruction stream; on success the program is cached on the fn and dispatched through the VM, on any unsupported shape the call falls back to the tree-walker. Var redefinition discipline is preserved because every global reference resolves through the var cell, not a baked direct value.

This cycle ships the foundation: opcode encoding, dispatch loop, register stack, GC integration, per-fn compile entry, and the apply_callable wiring that makes the bc path live. The Phase-1 compiler covers literals, local and global variable refs, (if), (do), plain-symbol (let [b v ...] body), (quote), and top-level (def name expr). Function application, multi-arity, destructuring, (fn ...) literals, (loop / recur), (try / catch / finally), (binding), (lazy-seq), and macro-using forms decline to the tree-walker; the next cycle adds them with proper tail-call elimination. ABI surface and semantics are unchanged; every existing test passes through either path.

v0.104.0 — Eval-Floor Performance Cycle

A non-JIT performance cycle. Each entry below is a self-contained commit; the user-visible surface stays put while the eval floor and allocation shape come down. Cumulative result on the microbenchmark gate: average per-op cost reduced about 24 percent across 15 benches, allocation per op unchanged. A tight integer loop/recur bench dropped from 941 ms to 375 ms.

Microbenchmark: (reduce + (range 1M)) was ~870 ms; now ~514 ms.

Microbenchmark: (loop [i 0 acc 0] (if (< i 1M) (recur (+ i 1) (+ acc i)) acc)) was ~941 ms at the start of the cycle and ~787 ms after the argv-ABI work; this entry takes it to ~375 ms (-60 percent overall).

Microbenchmark: (loop [i 0 acc 0] (if (< i 1M) (recur (inc i) (+ acc i)) acc)) was ~941 ms before this cycle, now ~787 ms.

v0.103.0 — Worker-List Lock Split

Closes the only open NEEDS-DESIGN finding from the v0.102.0 adversarial pass: future / agent worker bookkeeping no longer contends with the heavy state_lock. A tight embedder loop in (dosync ...) or any other state_lock-held form can no longer starve workers at their entry-link or exit-detach steps.

mino.h is unchanged at the API surface; embedders that take a fresh source build pick up the fix transparently.

v0.102.1 — Adversarial-test pass: doc accuracy + qa-arch hygiene

Adversarial whitebox test of the v0.102.0 STM + Agent surfaces (both Clojure-level and the new C-API perimeter, individually and in combination) ran 70+ probes. All real findings are documentation accuracy issues -- no behavior changed.

A pre-existing thread-count bookkeeping issue ((future ...) worker decrements lag the embedder under tight-loop contention, so a subsequent (send ...) may throw MTH001 even when fire-and-forget futures have logically completed) was identified and filed as NEEDS-DESIGN in .local/BUGS.md. The fix requires a non-trivial threading-model refactor; deferred to a dedicated cycle. Workaround: deref the last future or await an agent before spawning more workers.

v0.102.0 — Agents finish MVP: async dispatch + pool split + C-API

Agent execution model removes the synchronous-on-the-calling-thread fallback. Per-state agent workers + run queues land in this cycle, with a separate POOLED / SOLO split for send / send-off, and a public C-API perimeter for embedders.

v0.101.1 — STM and agent hardening pass

Concentrated correctness, consistency, and safety pass over the STM and agent surfaces that landed in v0.101.0. No new features; every change closes a real or latent bug or aligns mino with JVM canon. Highlights:

Bind `*agent*` During Action / Validator / Watch Dispatch

JVM canon binds the dynamic var *agent* to the dispatching agent across the entire body of an action, validator, and watch fn. mino had no such binding, so an action that wanted to refer to itself had to capture the agent in a closure.

Install *agent* as a dynamic var (nil default) in mino_install_agent. Push a stack-allocated dyn_frame_t binding *agent* to the running agent across agent_apply_action's mino_pcall calls; pop on every exit path via single-exit goto. The existing symbol-lookup path already consults dyn_stack first, so user code reading *agent* finds the binding without any custom-resolver wiring.
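
The push / single-exit-pop pattern can be illustrated with a toy dynamic-binding stack. All names below (toy_dyn_frame_t, toy_dyn_stack, toy_dyn_lookup, toy_apply_action) are hypothetical stand-ins; mino's dyn_frame_t layout and its lookup path are not shown in this changelog, and "throwing" is modelled as a nonzero return code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dynamic-binding frame; mino's dyn_frame_t may differ. */
typedef struct toy_dyn_frame {
    const char *name;
    void *value;
    struct toy_dyn_frame *prev;
} toy_dyn_frame_t;

static toy_dyn_frame_t *toy_dyn_stack = NULL;

/* Innermost binding wins; NULL models the var's nil default. */
static void *toy_dyn_lookup(const char *name) {
    toy_dyn_frame_t *f;
    for (f = toy_dyn_stack; f != NULL; f = f->prev)
        if (strcmp(f->name, name) == 0)
            return f->value;
    return NULL;
}

typedef int (*toy_step_fn)(void);  /* 0 = ok, nonzero = "threw" */

/* Bind *agent* on a stack-allocated frame for the whole dispatch and pop it
 * on every exit path through a single label. */
static int toy_apply_action(void *agent, toy_step_fn action, toy_step_fn watch) {
    toy_dyn_frame_t frame;
    int rc;

    frame.name = "*agent*";
    frame.value = agent;
    frame.prev = toy_dyn_stack;
    toy_dyn_stack = &frame;

    rc = action();
    if (rc != 0)
        goto done;
    rc = watch();

done:
    toy_dyn_stack = frame.prev;
    return rc;
}

/* Example callbacks: one reads the binding, one fails. */
static void *toy_seen_agent = NULL;
static int toy_record_agent(void) { toy_seen_agent = toy_dyn_lookup("*agent*"); return 0; }
static int toy_fail(void) { return -1; }
```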

Ref Watch Dispatch Continues Past A Throwing Watch

Earlier dispatch_watches invoked each ref watch through mino_call, so the first throw longjmped out and every later watch -- including watches on unrelated refs in the same commit -- silently never fired. Agent watches were already invoked through mino_pcall; the inconsistency meant a misbehaving ref watch could swallow legitimate notifications.

Wrap each ref watch in mino_pcall, capture the first thrown exception, finish dispatching every other watch, then re-throw the captured exception so the dosync caller still surfaces an error. No watch is silently lost; the caller still sees a watch failure when one occurs.
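
A sketch of the "run every watch, surface the first failure" loop. It is a toy model: toy_watch_fn and toy_dispatch_watches are hypothetical, and a nonzero return code stands in for an exception captured through mino_pcall.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical watch signature: 0 = ok, nonzero = error code ("threw"). */
typedef int (*toy_watch_fn)(void *user);

/* Invoke every watch even if an earlier one fails; remember only the first
 * failure and report it after the whole list has been dispatched. */
static int toy_dispatch_watches(toy_watch_fn *watches, void **users, size_t n) {
    int first_err = 0;
    size_t i;
    assert(watches != NULL && users != NULL);
    for (i = 0; i < n; i++) {
        int err = watches[i](users[i]);
        if (err != 0 && first_err == 0)
            first_err = err;
    }
    return first_err;
}

/* Example watches: one counts invocations, one always fails with code 7. */
static int toy_count_watch(void *user) { ++*(int *)user; return 0; }
static int toy_failing_watch(void *user) { (void)user; return 7; }
```

With the list { toy_count_watch, toy_failing_watch, toy_count_watch }, both counting watches fire and the caller still receives error code 7.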

Pending-Sends Drain Honors Failed-State In `:fail` Mode

prim_send rejects sends to agents already in failed-:fail state at queue time, but a pending send queued earlier in the same dosync can fail an agent that a later pending send also targets. The drain used to call agent_apply_action unconditionally, so the second action ran against the failed agent's state -- inconsistent with the send-time contract.

Re-check at dispatch: if the agent has err set and err_mode == :fail, skip the action silently. JVM canon would throw in the agent's executor thread, never reaching the dosync caller; mino's sync drain models that by dropping the action and leaving agent-error as the surviving failure record. :continue mode keeps accepting actions, matching prim_send.

Reject `with-meta` / `vary-meta` On Stateful Types

(with-meta (atom 0) m) used to shallow-copy the atom struct, so the sibling cell got its own val slot and diverged from the original on the next swap! -- a silent identity split. JVM canon decouples atom storage from atom identity (Atom-with-meta shares the AtomicReference); without that indirection, the faithful behavior would require restructuring atom storage.

Until that refactor lands, throw a clear MTY001 with the directive "use alter-meta! for in-place mutation or the constructor's :meta option" on with-meta / vary-meta for both atoms and agents. alter-meta! keeps working (it mutates obj->meta in place, so identity is preserved) and (meta x) keeps working (read-only). Refs already threw because they weren't in supports_meta; no change there.

While in alter-meta!, add the missing gc_write_barrier around the in-place meta update -- stale OLD-to-YOUNG pointer was a latent issue.

Implement `shutdown-agents` And `send-via` Properly

shutdown-agents was a no-op stub returning nil. send-via wasn't installed, so calling it produced an unbound-symbol error. Fix both:

Remove Dead `tx_state_t.retry_signal` Field

The field was initialized in two places but never set afterward and never read. Likely a leftover from a never-landed (retry) user-facing trigger. Drop the field and the two write sites; no behavior change.

Agent Print Form Carries Identity

(pr-str (agent 0)) and (pr-str (agent 0)) produced the same string for distinct agents, so two agents holding the same value were indistinguishable in logs and debug output. Add agent_id (a monotonic counter on mino_state_t.agent_next_id, mirroring stm_next_ref_id) and emit #agent[ID VAL] to match the existing #ref[ID VAL] form.

Wire `release-pending-sends` And Drain Agents Before Watches

Two follow-ups to the in-tx send deferral.

release-pending-sends was a stub returning 0. Now that tx_state_t.pending_sends exists, walk it, return the count, and clear it -- so a body that wants to abort just its agent dispatches before commit can do so. Outside a transaction the prim still returns 0 without side effects.

tx_outer_run used to dispatch ref watches BEFORE draining pending sends. A ref watch that threw longjmped to the outer setjmp and silently swallowed every queued agent send. Swap the order: drain agents first so a successful body always reaches its agents, then fire watches.

Defer Agent `send` From Inside `dosync` Until Successful Commit

send and send-off from inside a transaction body used to fire the action synchronously: the action saw mid-tx tentative state through (deref ref), fired again on every retry attempt (so an N-retry tx ran the action N+1 times), and the action's (io! ...) falsely tripped because current_tx was still set. JVM canon queues these as pending sends and only dispatches them once, on successful commit.

Add tx_state_t.pending_sends (a cons list of (agent fn . extra) triples), check current_tx in prim_send and prepend the triple instead of dispatching, then drain in tx_outer_run after a clean commit (between current_tx = NULL and watch dispatch, so the action body can itself open a fresh dosync). Pending sends are cleared on retry and on transaction abort, so a failed attempt never produces side effects through agents.
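
The deferral can be sketched with a toy transaction record. Assumptions: the names (toy_tx_t, toy_send, toy_tx_retry, toy_tx_commit) are hypothetical, a fixed-size FIFO array replaces the cons list, and the drain order is chosen for the example only.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PENDING 8

typedef void (*toy_action_fn)(long *agent_state);

typedef struct {
    long *agent;
    toy_action_fn fn;
} toy_send_t;

/* Hypothetical transaction record; mino's tx_state_t is not reproduced here. */
typedef struct {
    int active;                          /* nonzero while inside the tx body */
    toy_send_t pending[TOY_MAX_PENDING]; /* sends queued by the current attempt */
    size_t count;
} toy_tx_t;

/* Inside a transaction, queue the send; outside, dispatch immediately.
 * Returns 0 on success, -1 if the toy queue is full. */
static int toy_send(toy_tx_t *tx, long *agent, toy_action_fn fn) {
    if (tx != NULL && tx->active) {
        if (tx->count == TOY_MAX_PENDING)
            return -1;
        tx->pending[tx->count].agent = agent;
        tx->pending[tx->count].fn = fn;
        tx->count++;
        return 0;
    }
    fn(agent);
    return 0;
}

/* A retry (or abort) forgets everything the failed attempt queued. */
static void toy_tx_retry(toy_tx_t *tx) { tx->count = 0; }

/* A clean commit leaves the transaction first, then drains the queue once. */
static void toy_tx_commit(toy_tx_t *tx) {
    size_t i;
    tx->active = 0;
    for (i = 0; i < tx->count; i++)
        tx->pending[i].fn(tx->pending[i].agent);
    tx->count = 0;
}

static void toy_inc(long *state) { ++*state; }
```

A body that sends toy_inc, retries once, and then commits leaves the agent state at 1 rather than 2, which is the once-per-commit behavior the entry describes.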

Wire `:meta` Constructor Option for Agents

(agent state :meta m) previously threw "not yet supported". Store the map (or nil) on the agent's cell-level meta field and let (meta a) read it back. with-meta on agents is intentionally still rejected: with-meta's shallow copy of the agent struct would create a sibling cell with its own val / err slots that diverges on the next send. (meta a) reads through a special-case in prim_meta rather than extending supports_meta, so the broken-copy path stays closed.

Cross-State Defense for Agents and Tighter Defense for Refs

Refs already carried owning_state and threw MST007 if a public C API entry saw a foreign value, but the Clojure-side prims (alter / commute / ref-set / ensure) had no check -- a host that smuggled a foreign ref in via mino_env_set could mutate across states. Move tx_check_ref_owned into the shared cores (tx_alter_core, etc.) so both the C API and Clojure prims hit the check; drop the now-redundant calls in C-API entries.

Agents had no defense at all. Add owning_state to the agent struct, set it in mino_agent, and check it in every agent prim (send, send-off, await, await-for, agent-error, restart-agent, set-error-handler!, error-handler, set-error-mode!, error-mode) plus the shared watch / validator paths in add-watch, remove-watch, set-validator!, get-validator (which also pick up the equivalent ref defense).

Mirror the existing cross-state ref test in embed_stm_test.c with a parallel run that drives all 14 agent-touching probes through mino_eval_string from a foreign-allocated agent.

Validate Callability of Watch / Validator Arguments

add-watch and set-validator! accepted any value as the watch or validator -- a non-callable was stored quietly and only exploded later when the dispatcher tried to call it. Reject anything that isn't a fn/prim/macro at install time across all watchable references (atom, ref, var, agent). Same rule applies to the agent constructor's :validator and :error-handler options. set-validator! still accepts nil (clears the validator) per JVM canon.

Make STM Commits Atomic and Reject Mid-Commit Mutation

tx_commit walked the write set in iteration order, applying each ref's new value (write barrier + version bump) before validating the next ref. A late-iteration validator rejection or commute throw therefore left earlier refs already committed -- atomicity violation.

Restructure the commit into two passes. Pass 1 walks every ref, runs commute log replay and validators, and stages the new value on rs->committed_new without touching ref->val. Any failure aborts the whole commit before a single write hits memory. Pass 2 applies the staged writes; it runs no user code and cannot fail mid-flight. Adds a tx_state_t.in_commit flag and rejects alter / ref-set / commute re-entered through pass 1's user callbacks (commute fns, validators) -- their new tentatives would otherwise dangle past the iterator and silently disappear.
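
A toy version of the stage-then-apply structure, assuming hypothetical types (toy_cell_t, toy_write_t) and a validator that returns 0 to reject. Commute replay and the in_commit re-entry guard are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ref cell; not mino's tx_ref struct. */
typedef struct {
    long val;
    long version;
    int (*validator)(long proposed);  /* NULL accepts everything */
} toy_cell_t;

typedef struct {
    toy_cell_t *ref;
    long tentative;  /* value the transaction body produced */
    long staged;     /* pass-1 output, applied in pass 2 */
} toy_write_t;

/* Returns 0 if every write committed, -1 if any validator rejected.
 * On rejection no cell has been modified. */
static int toy_commit(toy_write_t *ws, size_t n) {
    size_t i;

    /* Pass 1: run user code (validators) and stage; never touch ref->val. */
    for (i = 0; i < n; i++) {
        toy_cell_t *r = ws[i].ref;
        if (r->validator != NULL && !r->validator(ws[i].tentative))
            return -1;
        ws[i].staged = ws[i].tentative;
    }

    /* Pass 2: apply staged writes; no user code runs, so it cannot fail. */
    for (i = 0; i < n; i++) {
        ws[i].ref->val = ws[i].staged;
        ws[i].ref->version++;
    }
    return 0;
}

static int toy_non_negative(long v) { return v >= 0; }
```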

Validate `set-error-handler!` Handler Argument

set-error-handler! previously stored any value -- so (set-error-handler! a 5) quietly put 5 in the slot, only to fail far away when an action threw and the dispatcher tried to call it. Reject anything that isn't a fn/prim/macro (or nil to clear) at install time so the typo surfaces immediately.

Validate `set-error-mode!` Argument

set-error-mode! accepted any value silently: a non-keyword like "fail" or 99 was a no-op, and an invalid keyword like :silent flipped any agent to :fail regardless of its previous mode -- silent data loss. Reject anything other than :fail or :continue with a classified type error so user typos surface.

Run Validator on `restart-agent`

restart-agent cleared the failed agent's error and published the new state without consulting the agent's validator, so a failed agent could be restarted to a state the validator forbids -- the next send would just refail. JVM canon validates first; mino now matches. The validator runs through mino_pcall; a throw or falsy return aborts the restart and leaves the agent in its failed state, so the caller can see and retry.

Invoke Agent `error-handler` on Action and Validator Failure

set-error-handler! stored a fn but agent_apply_action never called it -- on action throw or validator rejection mino latched the exception into agent.err regardless. JVM canon: when an error-handler is installed, route the failure through (handler agent ex) and leave the agent in a clean state. With no handler, keep the latching behavior. If the handler itself throws, capture the handler's payload into agent.err so the failure isn't silently lost.

Wire Up Agent Constructor Options

(agent state :validator pred :error-handler h :error-mode m) previously accepted but silently ignored every option, so an agent declared with :validator pos? would still publish negative values. Parse trailing keyword pairs and apply them to the agent's slots. Unknown keys, odd numbers of trailing args, and invalid :error-mode values now throw with a classified error. :meta also throws with "not yet supported" rather than being silently dropped (cell metadata on agents is not yet surfaced through (meta a)).

Fix STM Commit-Lock Leak on Commute Throw During Replay

A commute fn that succeeded during the transaction body but threw during commute_log_replay longjmped past stm_unlock, leaving the global STM commit lock held. Subsequent dosync calls on another thread would deadlock; on the same thread, re-acquiring the non-recursive mutex is undefined. Route Clojure-side commute log entries through mino_pcall so a throw is captured and surfaced as a hard failure with the user's original exception payload, after the lock is released. C-side closures (TX_C_CLOSURE_TAG) remain direct calls per the public API contract that host transformers must surface failure via NULL rather than longjmp.
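
Why a protected call closes the leak can be shown with a toy setjmp-based catch frame. Everything here is hypothetical (toy_pcall, toy_throw, toy_lock_held); it models only the control flow, not mino_pcall's real signature or the actual mutex.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Innermost active catch frame; NULL when nothing can catch. */
static jmp_buf *toy_catch_frame = NULL;

/* Toy "throw": unwind to the innermost catch frame. */
static void toy_throw(void) {
    assert(toy_catch_frame != NULL);
    longjmp(*toy_catch_frame, 1);
}

typedef long (*toy_fn)(long);

/* Protected call: returns 0 and stores the result, or -1 if fn threw.
 * The throw stops here instead of unwinding past the caller. */
static int toy_pcall(toy_fn fn, long arg, long *out) {
    jmp_buf frame;
    jmp_buf *saved = toy_catch_frame;
    if (setjmp(frame) != 0) {
        toy_catch_frame = saved;
        return -1;
    }
    toy_catch_frame = &frame;
    *out = fn(arg);
    toy_catch_frame = saved;
    return 0;
}

static int toy_lock_held = 0;  /* stand-in for the global commit lock */

/* Replay one commute entry under the lock. Because the call is protected,
 * the unlock line is reached whether or not fn throws. */
static int toy_replay_locked(toy_fn fn, long arg, long *out) {
    int rc;
    toy_lock_held = 1;
    rc = toy_pcall(fn, arg, out);
    toy_lock_held = 0;
    return rc;
}

static long toy_inc(long v) { return v + 1; }
static long toy_throwing_commute(long v) { (void)v; toy_throw(); return 0; }
```

A direct fn(arg) call in toy_replay_locked would let the longjmp skip the unlock line, which is the leak this entry fixes.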

v0.101.0

Add software transactional memory (refs + `dosync`)

mino gains Clojure's STM surface: refs (MINO_TX_REF), dosync, alter, commute, ref-set, ensure, io!. Single-version optimistic locking with a global commit lock; coarse on purpose, since mino's typical workload is single-digit refs and a handful of worker threads. ref-min-history, ref-max-history, ref-history-count are no-op stubs (return 0 / 10 / 0); long readers under sustained writer contention may exhaust the 10000-retry cap rather than serve an older snapshot from history.

STM is opt-in for embedders via mino_install_stm(S, env) and is auto-included in the standalone ./mino binary through mino_install_all. An embedder that never calls the install function pays nothing beyond one enum tag and a NULL pointer per context.

#### Type plumbing

Introduce MINO_TX_REF enum tag plus the tx_ref struct holding the committed value, watches map, validator, version counter, and monotonic ID. Wire the tag through GC mark / verify, the type-tag string, the print form (#ref[ID VAL]), self-evaluation, the clone non-transferable list, identity equality, and prim_type (:ref). Add stm_commit_lock, stm_lock_inited, and stm_next_ref_id fields on mino_state_t; the lock itself is lazy-initialized only on the first call to mino_install_stm.

#### Embedder constructor

Add mino_tx_ref(S, val) for hosts that want to publish refs directly without going through the (ref v) primitive. The returned cell has empty watches/validator slots and a fresh monotonic ID drawn from the per-state counter.

#### Transaction state plumbing

Define tx_state_t and tx_ref_state_t in runtime/internal.h. Add a current_tx pointer on mino_thread_ctx_t so an active transaction is reachable per-thread; gc_mark_roots walks current_tx->refs_head for both the main ctx and all worker ctxs so tentative values, commute log cells, and the refs themselves stay reachable mid-transaction. The pointer is NULL outside dosync and on every freshly-allocated thread context.

#### dosync*, ref, ref?, ref-aware deref

Add src/prim/stm.c with the entry-point primitives. ref constructs a MINO_TX_REF. ref? is the identity predicate. dosync* takes a thunk, allocates a tx_state_t on the C stack, attaches it to the active context, runs the thunk, and detaches; the commit phase is empty until ref-set / alter land in the next step. dosync itself is a defmacro in core.clj that expands to (dosync* (fn [] body...)).

deref (in prim/stateful.c) gains a MINO_TX_REF arm that delegates to mino_ref_deref: inside a transaction, it returns the in-tx tentative if any, else records a read with the ref's current version and returns the committed value; outside, it returns the committed value directly.

The primitives are not yet installed -- mino_install_stm ships with the install hook in a later step. Until then the symbols are unbound at runtime, so existing programs are unaffected.

#### ref-set, alter, commit phase

Add the two simplest write primitives. Both throw eval/state MST002 (No transaction running) when called outside dosync. ref-set sets the in-tx tentative directly. alter reads the current in-tx value, calls (apply f cur args), and stores the result.

The commit phase, run on every successful body return, validates the read set under the global commit lock: every ref the transaction touched must still be at its captured snapshot version. On mismatch the lock is released and the body re-runs (up to 10000 times before throwing MST004). On match, every recorded write is applied with gc_write_barrier plus a version bump, then the lock is released.
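
A single-threaded sketch of the snapshot-validate-retry loop for one ref. The names (toy_vref_t, toy_run_tx, toy_race_t) are hypothetical, locking is omitted, and a concurrent committer is simulated by a body that bumps the ref's version itself; only the 10000-attempt cap is taken from the entry.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RETRY_CAP 10000  /* cap taken from the entry above */

/* Hypothetical versioned ref; not mino's struct. */
typedef struct { long val; long version; } toy_vref_t;
typedef long (*toy_body_fn)(long current, void *user);

/* Run body against a snapshot; commit only if the version is unchanged.
 * Returns 0 on commit, -1 when the retry cap is exhausted. */
static int toy_run_tx(toy_vref_t *ref, toy_body_fn body, void *user, int *attempts) {
    int i;
    assert(ref != NULL && body != NULL);
    for (i = 0; i < TOY_RETRY_CAP; i++) {
        long snap_version = ref->version;
        long tentative = body(ref->val, user);
        if (attempts != NULL)
            ++*attempts;
        if (ref->version != snap_version)
            continue;               /* someone committed first: re-run body */
        ref->val = tentative;       /* apply the write */
        ref->version++;             /* version bump */
        return 0;
    }
    return -1;
}

/* Example body: increments, and simulates a competing commit on the first
 * `interfere` attempts by bumping the ref behind the transaction's back. */
typedef struct { toy_vref_t *ref; int interfere; } toy_race_t;

static long toy_incr_with_race(long cur, void *user) {
    toy_race_t *r = (toy_race_t *)user;
    if (r->interfere > 0) {
        r->interfere--;
        r->ref->val += 100;
        r->ref->version++;
    }
    return cur + 1;
}
```

With two simulated interferences the body runs three times and the final value is built on the last committed value (200 + 1), not on the stale first read.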

dosync* now pushes its own try frame so an in-body throw is intercepted long enough to clear ctx->current_tx and free the per-ref state nodes before re-throwing -- otherwise a longjmp past the now-unwound stack frame would leave a dangling current_tx pointer.

#### commute and ensure

commute records (fn arg1 arg2 ...) in a per-ref log instead of materializing a tentative value; the log is replayed against the latest committed value at commit time. commute does NOT mark the ref as read in the read-set, so two transactions commuting on the same ref do not conflict (matches Clojure JVM semantics). The fn is invoked once eagerly inside the body so its return value is visible to subsequent in-tx code, but that result is informational -- the authoritative value is recomputed at commit.
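
The log-and-replay idea in miniature, assuming a hypothetical single-argument log entry (toy_log_entry_t) instead of mino's cons-cell entries:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical commute log entry: a fn plus one extra argument. */
typedef long (*toy_commute_fn)(long cur, long arg);
typedef struct { toy_commute_fn fn; long arg; } toy_log_entry_t;

/* Fold the log over a starting value. The same routine models both the
 * eager in-body evaluation and the authoritative replay at commit time. */
static long toy_replay(long start, const toy_log_entry_t *log, size_t n) {
    long v = start;
    size_t i;
    assert(log != NULL || n == 0);
    for (i = 0; i < n; i++)
        v = log[i].fn(v, log[i].arg);
    return v;
}

static long toy_add(long cur, long arg) { return cur + arg; }
```

If the body saw 10 and logged (+ 1) and (+ 2), its in-tx view is 13; if another transaction commits 50 in the meantime, replaying the log at commit yields 53 with no conflict and no retry.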

ensure reads a ref and pins the snapshot version so any other transaction that mutates the same ref will fail this transaction's read-set validation. In our single-version optimistic model that is structurally identical to a deref-with-read-recording.

alter-after-commute on the same ref folds the log: the effective in-tx value is computed by replaying the log against the committed value, the alter fn is applied to that, the resulting value is pinned, and the log is dropped. ref-set-after-commute does the same but skips the fn application. commute-after-alter degrades to a fold-into-alter rather than appending to a log -- the alter has already pinned a value that the next commute should refine, not commute against.

#### Watches and validators on refs

Extend add-watch, remove-watch, set-validator!, and get-validator to accept MINO_TX_REF in addition to MINO_ATOM. The implementations share a small watchable_get accessor that dispatches between the atom and ref watch / validator slots so each primitive's body stays single-pass.

The transaction commit phase now captures committed_old and committed_new per write and dispatches watch callbacks (key ref old new) after the commit lock is released. ctx->current_tx is cleared before dispatch so a watch that itself enters dosync allocates fresh transaction state. A watch that throws propagates out of the commit; later watches do not fire (matches atom semantics).

Validators run inside the commit phase via mino_pcall against the proposed new value -- a thrown validator does not longjmp out while the lock is held; a falsy return raises eval/contract MCT001 ("Invalid reference state") after the lock is released. Both retry and validator-rejection paths free the per-ref state nodes before throwing.

#### io!, in-transaction?, history stubs

Add io! as a defmacro in core.clj that expands to (do (io!-check) body...); io!-check is a primitive that throws eval/state MST003 ("I/O in transaction") when called inside dosync, otherwise returns nil. The macro form ensures the throw fires before the body evaluates.

Add in-transaction? predicate primitive returning true inside dosync. Add ref-min-history, ref-max-history, ref-history-count as no-op stubs returning 0 / 10 / 0 -- mino uses single-version optimistic locking, not MVCC with history, so the values are not configurable.

(The MST002 contract for ref-set / alter / commute / ensure outside dosync was already wired in commits #5 and #6 -- this entry records the rest of the surface.)

#### mino_install_stm wired into mino_install_all

The standalone ./mino binary now installs the STM primitives out of the box. Embedders calling mino_new (which only installs core + io) still opt in explicitly via mino_install_stm(S, env).

Add tests/stm_test.clj with single-threaded coverage of every primitive: ref / ref?, ref-set, alter (with multi-arg), commute, ensure, in-transaction?, watches (add / remove / commit dispatch), validators (accept / reject), nested dosync, deref-in-tx-sees-tentative, commute-then-alter fold, history stubs, and the io! / io!-check contract.

#### Concurrent test coverage

Add tests/stm_concurrent_test.clj exercising the multi-thread retry path. N worker futures each run M dosync increments against a shared ref via alter and commute; a third test verifies post-commit watch dispatch fires exactly once per successful commit (retries do not double-fire). Skipped when mino-thread-limit is 1.

#### External suite impact

add_watch.cljc and remove_watch.cljc now pass their atom and ref arms; the var arm still fails because mino does not yet support add-watch on vars (separate feature, not in scope for the STM rollout). External suite holds at 212 OK.

Internal suite 1498 / 7127 / 0.

Layer 1 audits (Clojure-surface fidelity)

A pass through the STM surface comparing it against canonical JVM Clojure (clojure.lang.LockingTransaction, clojure.lang.Ref). Two accidental deviations were found and corrected; no API additions or removals on the Clojure side.

#### set-validator! no longer validates the current value

set-validator! previously called the new validator on the ref/atom's current value at install time and rejected with MCT001 if it failed. JVM Clojure does not do this -- only subsequent state transitions are checked. Match the canon: install the validator unconditionally. The old test validator-on-current (which asserted the rejection) is replaced with validator-install-on-failing-current; the STM validator-rejects test no longer needs the (ref 1) workaround.

#### alter / ref-set after commute throws

JVM Clojure throws "Can't set after commute" when alter or ref-set is called on a ref that already has a logged commute in the same transaction. mino previously folded the commute log into the alter's tentative value -- the final committed value matched JVM by accident, but the error contract differed. Throw eval/state MST002 with the canonical message. The commute-after-alter direction is unchanged: alter pins the value, the commute folds in, and commit writes the alter+commute tentative (matching JVM, which skips the commute log replay for refs already in the write set).

#### Documented deviation list in the STM module header

src/prim/stm.c now opens with a numbered enumeration of every intentional or documented deviation from JVM Clojure: single-version optimistic locking, the global commit lock, no barging, no mid-body retry, history stubs, the simpler print form, the post-A.1 set-validator! semantics, and the post-A.2 alter-after-commute contract. A reader auditing mino's STM against canon should find the answer in one place.

Layer 2a C API (host-side mirror of the Clojure surface)

Anything a Clojure programmer can do, a C host can do. The new mino_tx_* entry points sit alongside the existing mino_atom_* and mino_volatile_* API in src/mino.h. Each shares its core implementation with the corresponding Clojure-side primitive via a tx_*_core helper, so the two surfaces cannot drift.

#### mino_is_tx_ref + mino_tx_ref_deref

Predicate + reader. The deref's in-tx vs. out-of-tx dispatch is unchanged: a host calling it from inside an outer transaction gets the in-tx effective value plus read-set bookkeeping; outside, the committed value. NULL- and non-ref-tolerant at the public entry.

#### mino_tx_ref_set

Writer. Refactors prim_ref_set to share tx_ref_set_core with the new C entry; both go through the same kind transition, read-set bookkeeping, and post-commute set-rejection check.

#### mino_tx_alter_c + mino_tx_commute_c

Host transformers. The new typedef mino_val_t *(*mino_tx_xform_fn)(mino_state_t *, mino_val_t *cur, void *user, mino_env_t *) is the C-side analogue of (fn [cur] ...). mino_tx_alter_c applies the fn to the in-tx value and records a read; mino_tx_commute_c applies it without recording a read and -- if the ref has not also been altered in the same tx -- replays it at commit against the latest committed value.

The Clojure-side prim_alter / prim_commute and the new C entries share tx_alter_core / tx_commute_core helpers. The compute step is parameterised by a tx_compute_fn callback so the Clojure side dispatches via mino_call and the C side calls the host transformer directly. Commute log entries are likewise polymorphic: a (fn . extra) cons for Clojure entries, or a MINO_HANDLE wrapping a heap-allocated {xform_fn, user} closure for C entries (freed via the handle's GC finalizer). Replay dispatches per entry shape.

#### mino_tx_ensure

Read pin. Refactors prim_ensure to share tx_ensure_core with the new C entry. As before, the implementation captures the ref's snapshot version so any concurrent committer fails this tx's read-set validation; the JVM "block any other write" semantic falls out of the version-bump-on-commit rule for free.

#### mino_tx_run

Host-level dosync. The new typedef mino_val_t *(*mino_tx_body_fn)(mino_state_t *, void *user, mino_env_t *) is the C-side analogue of a (fn [] ...) body thunk. mino_tx_run owns the setjmp / try-frame, retry loop, commit phase, and watch dispatch.

prim_dosync_star's body invocation extracts cleanly into a tx_invoke_body_fn callback shared with the new C entry: the Clojure side wires it to mino_call(thunk, []), the C side wires it to a direct body(S, user, env). The setjmp-bearing tx_outer_run is shared verbatim (same -Wclobbered discipline, same try-stack overflow guard, same outer-vs-nested dispatch), and both entry points absorb a nested call into the active tx without touching the setjmp frame. Cross-thread defense is unchanged -- the active current_tx lives on the per-thread context, so a host calling mino_tx_run on a worker sees its own retry loop.

C-side embed test + task wiring

New tests/embed_stm_test.c exercises every Layer 2a entry point end-to-end: predicate / construction / outside-tx deref, a full mino_tx_run body that mixes mino_tx_ref_deref / mino_tx_alter_c / mino_tx_commute_c / mino_tx_ensure, mino_tx_ref_set, a commute-only path that goes through the log-replay code, the outside-tx error contract (every entry throws MST002 outside any transaction), the type-check throws on non-ref input, and a watch installed from Clojure observing a C-side commit through the side-atom that add-watch records into.

The ./mino task test-embed task gains a second invocation: it now compiles and runs embed_multi_state and embed_stm_test against the same lib srcs. The task helper compile-and-run-embed-test factors the shared compile + run recipe.

src/prim/stm.c is also added to the task-driven lib-srcs list (the bootstrap Makefile already picked it up via wildcard, but the task's explicit list had been missing it -- a benign drift that became a broken link as soon as the C-side test referenced the new public symbols).

C-side retry test

Added test_run_retry_under_contention to tests/embed_stm_test.c: spawns four Clojure futures that each drive 200 calls to a registered C primitive (c-incr-ref!); the prim runs mino_tx_run with a body that uses mino_tx_alter_c to increment a shared ref. Asserts the post-commit value equals workers * per-thread and that body_attempts >= successful_commits (so the user pointer survived every body invocation, including any retries).

mino's eval loop holds a per-state lock that serializes worker threads on the same state, so on this run the threads typically do not race for commits and attempts == commits. The contract holds either way: a yielding inner call (e.g. blocking on a future) would let another thread bump the version and fire the retry path, and the body fn must survive that.

Var watches and validators

add-watch, remove-watch, set-validator!, and get-validator now accept vars. The MINO_VAR struct gains watches and validator slots; var_set_root runs the validator before publishing the new root and dispatches watches after, matching the atom / ref behaviour. The fast path (no watches, no validator, no env lookup) is unchanged for early-bound install paths -- state init and the install_stdlib bootstrap stay zero-cost.

JVM Clojure fires var watches on alter-var-root (and on def with rebind); mino does the same. Watches in mino are non-^:dynamic only -- a thread-local binding push does not fire watches anywhere.

tests/external_runner.clj now requires core_test/add_watch.cljc and core_test/remove_watch.cljc upstream, and the atom / ref / var arms pass cleanly. The agent arm of each still errors (out of scope until agents land).

Fix `mino_pcall` re-throw via `set_eval_diag`

mino_pcall's catch arm called set_eval_diag to publish the caught error's message via mino_last_error / mino_last_error_map. But set_eval_diag itself longjmps to the next-outer try frame when one exists -- so any pcall caller that ran inside a Clojure try would see its catch path hijacked by the longjmp and unwind to the outer frame, leaking any bookkeeping the caller depended on.

In the STM commit path, that bookkeeping was the global commit lock: a validator throw inside a try (dosync ...) (catch ...) would longjmp out of run_ref_validator past stm_unlock, leaving S->stm_commit_lock permanently held. The next dosync deadlocked.

Fix: mino_pcall's catch arm no longer publishes anything to last_error / last_diag. Callers that want a diag set after pcall returns -1 do it explicitly. (An interim attempt routed the publish through a non-throwing record_eval_diag variant, but that left a stale diag in last_error that eval_impl's evaled == NULL && mino_last_error != NULL check would then misread as a fresh error during a later call; the follow-up commit that drops the publish entirely removed that stale diag.)

tests/stm_test.clj's validator-throw-does-not-deadlock-stm-lock regression test proves the lock is released after a validator throw.

The agent code's agent_try_call workaround (added in E.4) is left in place by this commit; the follow-up replaces it with a direct mino_pcall call now that the catch arm is well-behaved.

`mino_pcall` exposes the raw thrown value; STM validator throws propagate

Two coupled changes follow on from the catch-arm fix:

mino_pcall's signature gains an out_ex parameter:

```c
int mino_pcall(mino_state_t *S, mino_val_t *fn, mino_val_t *args,
               mino_env_t *env, mino_val_t **out, mino_val_t **out_ex);
```

When the call throws, *out_ex receives the raw thrown value (the cell passed to (throw ...) -- typically an ex-info map or similar payload). Callers like agent dispatch and STM validator handling that want to surface the user's exception unchanged read from out_ex directly. Breaking ABI change: existing pcall callers need to add the extra parameter (NULL is fine if the value isn't needed). mino is alpha; no compat shim.

The agent code's agent_try_call workaround is removed: src/prim/agent.c now calls mino_pcall(...&new_state, &thrown_ex) for actions, validators, and watches, getting the same exception capture in 1 line where agent_try_call took 30. The custom try frame is gone.

run_ref_validator in src/prim/stm.c likewise uses out_ex, threading the captured exception through tx_state_t's new validator_thrown_ex slot. tx_commit sets the validator_rejected flag for both throws and falsy-rejects (both are hard failures, distinct from read-set-conflict retries) and parks the captured exception on tx. dosync_run consumes it: if validator_thrown_ex is set, it propagates the user's original payload via mino_throw; otherwise it raises the canonical MCT001 "Invalid reference state".

Net behavior: a validator that throws now aborts the transaction with the validator's own exception (matching JVM Clojure's "propagate the validator's exception" semantic), where the previous code retried until the cap and then threw MST004 "transaction retry limit exceeded". A validator that returns falsy without throwing still produces MCT001.

tests/stm_test.clj covers both cases: validator-throw-propagates-original-exception checks that (ex-data e) returns the original ex-info data; the new validator-falsy-reject-throws-MCT001 pins the falsy-reject path.

Internal suite 1514 / 7171 / 0.

Agents (MVP)

mino now ships agents: agent, agent?, send, send-off, await, await-for, agent-error, restart-agent, set-error-handler!, error-handler, set-error-mode!, error-mode, plus shutdown-agents / release-pending-sends stubs. Watches and validators on agents go through the same watchable_get machinery as atoms / refs / vars.

The MVP runs sends synchronously on the calling thread. mino's eval loop holds a per-state mutex so a worker-pool design would serialize on it anyway; running synchronously is observably equivalent for any program that does not race against the agent itself, and await becomes a trivial no-op (the queue is always drained on send return). Action throws and watch throws are both captured into agent-error via a manual try frame in src/prim/agent.c (mino's mino_pcall re-throws to any enclosing try via set_eval_diag's longjmp path, which would defeat the catch contract here).

Documented deviations: send-via is not implemented (no public Executor type); shutdown-agents and release-pending-sends are stubs; the :fail error mode is the default and rejects further sends until restart-agent clears the err.

tests/agent_test.clj exercises construct / send / send-off / watches / validators / restart / error-mode / await and is in the internal run.clj. The agent arms of the upstream add_watch.cljc / remove_watch.cljc tests now pass cleanly, bringing the external runner to 134 / 2680, 1 fail + 2 errors (matching the pre-STM baseline; remaining failures are pre-existing test-abs / test-reduce / test-short, unrelated to STM or watches).

Equality of empty lazy seqs

Fix: (= (filter pred []) (filter pred [])) returned false. The case MINO_LAZY arm in mino_eq was a leftover stub from a previous force-then-compare design; the realized-to-empty case stayed MINO_LAZY per the unwrap policy and fell into the stub. Route both-LAZY equality through eq_seq_like so it walks element-wise (immediately terminating for two empty seqs). Unblocks the var arm of the upstream add-watch test, which compares two (filter ...) results that both yield empty.

Cross-state ref defense (MST007)

A C host that accidentally passes a ref allocated in one mino_state_t to another's mino_tx_* entries used to silently mutate the foreign heap. Now every public C entry that takes a ref (mino_tx_ref_deref / mino_tx_ref_set / mino_tx_alter_c / mino_tx_commute_c / mino_tx_ensure) checks the ref's allocating state via a new tx_ref.owning_state back-pointer recorded by mino_tx_ref at construction time, and throws eval/state MST007 ("ref from foreign state") on mismatch. The check is one pointer comparison; the back-pointer adds 8 bytes to each ref but no GC traversal cost (the state itself outlives all its refs and is not a GC value).
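A minimal sketch of the back-pointer check, using simplified hypothetical types (state_t, tx_ref_t, ref_init, ref_set, and the REF_* codes are illustrative names; the real structs and entry points carry far more state and throw rather than return a code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types for illustration only. */
typedef struct state { int id; } state_t;

typedef struct tx_ref {
    state_t *owning_state;  /* recorded once at construction */
    long     value;
} tx_ref_t;

enum { REF_OK = 0, REF_FOREIGN_STATE = -1 };

static void ref_init(state_t *S, tx_ref_t *r, long v)
{
    assert(S != NULL && r != NULL);
    r->owning_state = S;
    r->value = v;
}

/* Every entry point that takes a ref does one pointer comparison first. */
static int ref_set(state_t *S, tx_ref_t *r, long v)
{
    assert(S != NULL && r != NULL);
    if (r->owning_state != S)
        return REF_FOREIGN_STATE;   /* the real runtime throws MST007 here */
    r->value = v;
    return REF_OK;
}
```

A ref created against one state is rejected by ref_set when called with a different state, and the foreign ref's value is left untouched.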

tests/embed_stm_test.c now covers this with a second-state ref passed into the first state's entries (deref, ref-set, alter_c, commute_c, ensure); each must error and the foreign state's own ops must keep working unchanged.

Internal suite

1499 / 7137 / 0 (one new alter-after-commute-throws case + one new alter-then-commute-folds case + one new validator-install-on-failing-current case). External suite holds at the prior baseline; no STM-specific test files exist in clojure-test-suite, and the add_watch.cljc / remove_watch.cljc files (which would exercise the ref arm) remain out of the external runner because their var arm still fails (var watches are out of scope for this work).

v0.100.34

Add `aset` for host arrays; tighten `vec` bad-shape rejection

(aset arr i x) now mutates MINO_HOST_ARRAY's vals[i] in place. This is the only mutation path mino exposes outside MINO_ATOM / MINO_VOLATILE; it exists because the host-array tier mirrors JVM array semantics for cross-dialect tests.

seq_iter_init / seq_iter_done / seq_iter_val now handle MINO_HOST_ARRAY and MINO_MAP_ENTRY so into, mapv, etc. iterate them uniformly. (vec arr) then materializes a normal persistent vector.

vec in src/core.clj rejects bad shapes (numbers, booleans, chars, keywords, symbols, regexes, transients) up front rather than passing them to into and getting a generic not seqable error.

vec.cljc 13/15 errors -> 19/20 passes. The remaining failure is the (aset arr 0 -1) (is (= [-1 2 3] (vec arr))) storage-aliasing assertion -- JVM's LazilyPersistentVector.createOwning reuses small Object[] arrays as the persistent vector's tail, which is incompatible with mino's persistent-trie vec that genuinely copies its input. Documented as JVM-internal optimization, not portable.

Internal suite 1476 / 7091 / 0. External suite stays at 212 OK (vec.cljc still has 1 fail for the aliasing assertion).

v0.100.33

Add `MINO_FLOAT32` tier; split `float?` and `double?`

double_qmark.cljc asserts (double? (float 0.0)) is false. mino had one float tier (MINO_FLOAT, double-precision), so (float x) and (double x) produced indistinguishable values. Introduce a separate MINO_FLOAT32 value tag sharing the same as.f storage (the 32-bit narrowing happens at mino_float32 construction) so double? can distinguish 64-bit doubles from 32-bit floats. float? matches both tiers; double? matches only MINO_FLOAT. (type x) returns :float for 64-bit and :float32 for 32-bit.

Arithmetic always promotes to MINO_FLOAT (matching JVM Clojure where Float arithmetic yields Double): tower_to_double, classify_or_throw, is_compare_number, tower_cmp, has_nan, prim_inc / prim_dec fast paths, unary -, extract_integer_for_cast, narrow_cast, prim_NaN_p, prim_infinite_p, GC clone, hash, print all dispatch through both tags. Equality between MINO_FLOAT and MINO_FLOAT32 is false even if the value matches (matches JVM where (= 5.0 (float 5)) is false).

New C primitive prim_double returns a MINO_FLOAT; replaces the prior (def double float) alias. prim_float now returns MINO_FLOAT32.

Internal numeric-coercion test updated to cover the new contract. Internal suite 1476 / 7091 / 0. External double_qmark.cljc 43/46 -> 46/46. External suite: 211 -> 212 OK.

v0.100.32

Add `MINO_MAP_ENTRY` value type

JVM Clojure's MapEntry is a vector-shaped seq returned by first / seq of a map; key and val accept it but throw on a plain 2-vector. mino conflated map entries and 2-vectors, so (p/thrown? (key [1 2])) failed. Add a distinct MINO_MAP_ENTRY type with (k, v) slots, GC mark + verify, hash that matches a 2-vector (so cross-type equality works in hash maps), (type x) returns :map-entry, and vector? / coll? / counted? / associative? / reversible? / sequential? return true on it. Equality with [k v] is element-wise via the existing cross-type sequential path. seq of a map / sorted-map / record now produces MAP_ENTRY values; find, first, rest, nth, get, count, empty?, vector destructuring, compare, into-map, conj-map, and conj-of-MAP_ENTRY all dispatch through it. key / val in src/core.clj accept only MAP_ENTRY and throw otherwise. clojure.lang.MapEntry/create now constructs a MAP_ENTRY (via the new map-entry C primitive). aset is intentionally not implemented for MAP_ENTRY since entries are immutable.

External key.cljc 8/17 -> 17/17, val.cljc 7/16 -> 16/16. External suite: 209 -> 211 OK.

v0.100.31

`(float x)` narrows to 32-bit float precision

prim_float now range-checks the input against [-FLT_MAX, FLT_MAX] (NaN passes through; +/-Infinity and overflow throw eval/type MTY001) and narrows precision via (double)(float)d so values that underflow the 32-bit range round to zero. JVM Java's (float)4.9e-324 is 0.0f and (float Double/MAX_VALUE) throws; mino now matches.
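The range check and narrowing step might look roughly like the following. narrow_to_float32 is a hypothetical helper name, and the -1 return stands in for the thrown MTY001; the value is kept in a double after rounding, as the entry describes.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: returns 0 and stores the narrowed value, or -1 where
 * the runtime would throw. */
static int narrow_to_float32(double d, double *out)
{
    assert(out != NULL);
    if (isnan(d)) {               /* NaN passes through unchanged */
        *out = d;
        return 0;
    }
    if (d < -(double)FLT_MAX || d > (double)FLT_MAX)
        return -1;                /* +/-Infinity and overflow are rejected */
    *out = (double)(float)d;      /* round to 32-bit precision, keep as double */
    return 0;
}
```

With this shape, 4.9e-324 narrows to 0.0, DBL_MAX is rejected, and 0.1 comes back as the nearest 32-bit value, which differs from the 64-bit 0.1.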

External float.cljc 15/19 -> 19/19. External suite: 208 -> 209 OK.

v0.100.30

Reader promotes out-of-long literals to bigint

The reader's number parser called strtoll and used its saturated return value without checking errno, so -9223372036854775809 silently became LLONG_MIN. Setting errno = 0 before the call and parsing through mino_bigint_from_string_n on ERANGE lets (long out-of-range-literal) reach the existing range check in prim_long and throw.
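The errno discipline around strtoll can be sketched as below. classify_int_literal and the lit_kind_t enum are hypothetical; the real reader hands the text to mino_bigint_from_string_n on the ERANGE path, which this sketch only signals.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef enum { LIT_LONG, LIT_NEEDS_BIGINT, LIT_INVALID } lit_kind_t;

/* Hypothetical classifier: decides whether an integer literal fits a
 * long long or must be re-parsed as a bigint. */
static lit_kind_t classify_int_literal(const char *text, long long *out)
{
    char *end = NULL;
    long long v;

    assert(text != NULL && out != NULL);
    errno = 0;                       /* strtoll never clears errno itself */
    v = strtoll(text, &end, 10);
    if (end == text || *end != '\0')
        return LIT_INVALID;          /* no digits, or trailing junk */
    if (errno == ERANGE)
        return LIT_NEEDS_BIGINT;     /* v is saturated; do not use it */
    *out = v;
    return LIT_LONG;
}
```

Here "-9223372036854775808" classifies as LIT_LONG, while "-9223372036854775809" classifies as LIT_NEEDS_BIGINT instead of silently becoming LLONG_MIN.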

External long.cljc 23/25 -> 25/25.

v0.100.29

Add `MINO_HOST_ARRAY` value type

JVM Java arrays are not collections, vectors, associatives, etc., but mino had object-array / int-array / to-array aliased to vec, so every predicate returned true. Add a distinct MINO_HOST_ARRAY value type with malloc-owned vals[], a host_array_kind_t element-kind tag, GC mark + sweep, and a single-chunk MINO_CHUNKED_CONS emission from prim_seq so iteration still works. prim_first, prim_rest, prim_count, prim_get, prim_nth, prim_empty_p route through the new tag; coll? / vector? / counted? / associative? / sequential? / reversible? all return false on the new type. Equality is identity (matching JVM arrays). Constructors (object-array, int-array, long-array, etc.) are now C primitives that zero-fill on size and copy from collections. aset is intentionally not implemented -- host-array mutation is out of scope per the JVM-only group.

External associative_qmark.cljc, coll_qmark.cljc, counted_qmark.cljc, reversible_qmark.cljc, sequential_qmark.cljc, vector_qmark.cljc, seq.cljc, get.cljc, some.cljc, seqable_qmark.cljc all -> 0/0/0. External suite: 200 -> 207 OK.

v0.100.28

Enforce fixed-arity contracts at fn apply

bind_vec_destructure was discarding extra args once plen patterns were bound, so ((fn [x] x) 1 2 3 4) returned 1 silently. JVM Clojure throws ArityException. Tighten the binder to throw eval/arity MAR001 when a fn / defn call has more args than the parameter vector has positions; let / loop / for / doseq keep their lenient destructuring (unmatched tail elements ignored).

External update.cljc 62/63 -> 63/63. External suite: 199 -> 200 OK.

v0.100.27

`(cons x y)` returns non-list shape; peek rejects it

JVM Clojure's (cons x y) returns a clojure.lang.Cons that is a seq but not a list ((list? (cons 1 nil)) is false; (peek (cons 1 nil)) throws). mino conflated cons-results and list literals as MINO_CONS. Add a not_list flag to the cons cell: mino_cons zeroes it, prim_cons sets it to 1, and peek, pop, and list? check the flag. The data shape stays MINO_CONS so the eval path can still apply macro-built forms.

External peek.cljc 10/11 -> 11/11. External suite: 198 -> 199 OK.

v0.100.26

Widen bigdec / ratio division to bigdec

The tower's RATIO + BIGDEC contagion rule used to collapse to float -- a punt from before mino had exact bigdec division. coerce_at_tier for TT_BIGDEC now handles MINO_RATIO by computing bigdec(num) / bigdec(denom) via mino_bigdec_div (exact, throws on non-terminating expansions); promote_acc does the same widening on the running accumulator. (/ 2.0M 1/2) is now 4M, not 4.0.

External slash.cljc 158/160 -> 160/160. External suite: 197 -> 198 OK.

v0.100.25

Strict overflow on +/-/*; auto-promote on primed forms

mino historically auto-promoted on long overflow for the unprimed forms with +' / -' / *' / inc' / dec' aliased to them (v0.100.17). JVM Clojure splits the contract: unprimed throws on overflow, primed auto-promotes. tower_apply_int and the int-tier branch of tower_reduce_seeded now take a strict flag; on overflow they throw eval/contract MCT001 "integer overflow". prim_inc / prim_dec route through the same flag. New C primitives prim_addp / prim_subp / prim_mulp / prim_incp / prim_decp carry the auto-promote behavior and are registered as the primed names. Unary - on LLONG_MIN now throws under strict; -' still auto-promotes. Removes the v0.100.17 aliases in src/core.clj.
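As an illustration of the strict flag, here is a sketch for addition only. checked_add, add_int_tier, and the result enums are hypothetical names; the overflow test is the portable pre-check rather than a compiler builtin, and the promote path is only signalled, since bigint arithmetic is out of scope for the sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef enum { ADD_OK, ADD_OVERFLOW } add_status_t;

/* Portable overflow test for signed addition; no compiler builtins assumed. */
static add_status_t checked_add(long long a, long long b, long long *out)
{
    assert(out != NULL);
    if ((b > 0 && a > LLONG_MAX - b) || (b < 0 && a < LLONG_MIN - b))
        return ADD_OVERFLOW;
    *out = a + b;
    return ADD_OK;
}

/* Hypothetical dispatch on the strict flag: strict (unprimed) callers treat
 * overflow as an error, non-strict (primed) callers promote to a bigint. */
typedef enum { RES_INT, RES_THROW, RES_PROMOTE } add_result_t;

static add_result_t add_int_tier(long long a, long long b, int strict,
                                 long long *out)
{
    if (checked_add(a, b, out) == ADD_OK)
        return RES_INT;
    return strict ? RES_THROW : RES_PROMOTE;
}
```

In this model (+ Long/MAX_VALUE 1) corresponds to add_int_tier(LLONG_MAX, 1, 1, &r) and yields RES_THROW, while the primed form passes strict = 0 and yields RES_PROMOTE.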

prim_short and prim_byte now share a narrow_cast helper with prim_int that compares the double value against the target tier bounds before truncation, so (byte -128.000001) throws.
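The reason for checking before truncating is easiest to see in code. The following is a sketch, assuming a helper shaped like narrow_cast_double (a hypothetical name and signature) that takes the tier bounds as doubles:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: checks the double itself against [lo, hi] before
 * truncating, so a value just outside the range cannot truncate its way
 * back in. Returns -1 where the runtime would throw. */
static int narrow_cast_double(double d, double lo, double hi, long long *out)
{
    assert(out != NULL);
    if (isnan(d) || d < lo || d > hi)
        return -1;
    *out = (long long)d;         /* C truncates toward zero */
    return 0;
}
```

With byte bounds of -128.0 and 127.0, the input -128.000001 is rejected even though truncation alone would produce the in-range -128.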

External star.cljc 121/125 -> 125/125, plus.cljc 127/129 -> 129/129, minus.cljc 137/139 -> 139/139, byte.cljc 25/27 -> 27/27. External suite: 193 -> 197 OK. One regression: abs.cljc errors on (* -1 r/min-int) in the test's :default expected-value computation -- the test relies on auto-promote * which can't be reconciled without modifying the test.

v0.100.24

`Foo.` trailing-dot constructor invokes the defrecord factory

(Foo. a b c) is JVM reader sugar for the positional constructor of type Foo; mino's defrecord generates a ->Foo factory but had no dispatch for the trailing-dot syntax. eval_try_host_syntax now detects a head symbol ending in ., looks up the stem through the lexical / current-ns / ambient-ns chain, and if the stem resolves to a MINO_TYPE value, invokes the matching ->stem factory.

External dissoc.cljc 13/14 -> 22/22. External suite: 192 -> 193 OK.

v0.100.23

`(int x)` and `(long x)` throw on out-of-range

prim_int previously silently saturated/clamped on overflow: bigint via mino_as_ll returned a clamped long long; float saturated to LLONG_MIN / LLONG_MAX; single-char strings hit a legacy fast-path. prim_int now range-checks against int32; for floats and bigdecs the check is on the double value itself, so (int -2147483648.000001) throws even though it truncates to the in-range -2147483648. The single-char string fast-path is gone; chars still pass through as codepoints. New prim_long is registered as "long" and range-checks against int64 via the new extract_integer_for_cast helper. Removes the (def long ... int) alias in src/core.clj.

External int.cljc 22/27 -> 27/27, long.cljc 22/25 -> 23/25 (remaining 2 fixed in v0.100.30 by the reader bigint promotion).

v0.100.22

Add `(short x)` and `(byte x)` with range checks

clojure.core-test.num exercises (short 1) and (byte 1) in its :default arm. mino had int and the long alias but no short or byte, so the test errored on the first (short 1) call. prim_short and prim_byte share a new extract_integer_for_cast helper covering MINO_INT, MINO_FLOAT, MINO_BIGINT, MINO_RATIO, and MINO_BIGDEC, with NaN / infinity / out-of-long-range checks. They range-check the extracted value against int16 and int8 respectively and throw on overflow. The result is returned as MINO_INT since mino has no narrow-int tier; only the contract narrows. Also relax num in src/core.clj to pass nil through (returning nil), matching the :default arm's (= nil (num nil)).

External num.cljc 6/7 -> 13/13.

v0.100.21

Bridge `clojure.lang.MapEntry/create` to a 2-vector ctor

Cross-dialect tests build a map-entry literal under their :default arm via (clojure.lang.MapEntry/create k v). mino's map entries are 2-vectors so the shape already matched; only the constructor namespace was missing. Define a clojure.lang.MapEntry namespace in src/core.clj next to the existing clojure.lang.IPending and clojure.lang.BigInt bridges. The remaining (p/thrown? (key [1 2])) cases in key.cljc and val.cljc need a distinct MapEntry value type to reject non-entry 2-vectors and stay open.

External suite aggregate: 5252 -> 5278 assertions, errors 10 -> 8.

v0.100.20

Build: Make `make` clean on gcc-11 (Ubuntu 22.04)

The release-build workflow runs on ubuntu-22.04, where the default cc is gcc-11. Two -Werror regressions broke linux-amd64 and linux-arm64 there (and the v0.99.0 - v0.99.4 attempts to diagnose only added log capture, never the underlying fix):

Verified locally with gcc-11.3 (matches Ubuntu 22.04), gcc-12.4, and gcc-14.2. All three build clean with -Werror. Internal suite 1476 / 7071 / 0.

v0.100.19

Future spawn now conveys the caller's dynamic bindings

JVM Clojure's (future ...) snapshots the calling thread's binding frame and reinstalls it on the worker, so a bound-fn captured inside the future body sees the caller's *x* (plus whatever the worker's own binding blocks pushed). mino's worker thread previously started with an empty dyn_stack, so a nested future that captured *x* saw only the root value (or, for the test in question, the dereferer's binding).

mino_future_spawn now calls a new public helper mino_snapshot_thread_bindings(S) (factored out of prim_get_thread_bindings) and stores the resulting symbol -> value map on impl->dyn_snapshot. worker_run unpacks that map into a malloc-owned dyn_binding_t chain wrapped in a single dyn_frame_t, pushes it as the worker's initial dyn_stack, invokes the thunk, then pops and frees. The frame is freed on both the success and error paths.

External bound_fn.cljc 7/8 -> 8/8, bound_fn_star.cljc 7/8 -> 8/8. External suite: 191 -> 193 OK.

v0.100.18

`(seq sorted-map/-set)` is no longer a list

(list? (seq (sorted-map :a 1))) was returning true on mino because sorted_seq returned a MINO_CONS chain and list? accepts both MINO_CONS and MINO_EMPTY_LIST. JVM Clojure draws a sharper distinction: PersistentList (literal list / (list ...) result) matches list?; the seq view of any other coll does not. The sorted-map and sorted-set arms of prim_seq now re-package the cons chain produced by sorted_seq into a single-chunk MINO_CHUNKED_CONS, so list? returns false while first, rest, count, etc. keep working unchanged.

External list_qmark.cljc 20/22 -> 22/22. External suite: 190 -> 191 OK.

v0.100.17

Shortest-decimal Double-to-string for `bigint` and `rationalize`

Adds mino_double_shortest, a static helper in src/prim/bignum.c that prints a finite double as the shortest decimal that round-trips through strtod to the same bit pattern. Handled by iterating precision 1..17 with %.*g, parsing back, and accepting the first match. Slow paths only -- never on the hot loop.
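The precision search can be sketched as follows. double_shortest is a hypothetical re-creation, not the code in src/prim/bignum.c; it compares the parsed value with == rather than comparing bit patterns, which is equivalent for the finite, non-zero inputs it is meant for.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical re-creation for finite doubles: try 1..17 significant digits
 * and keep the first string that parses back to the same value.
 * Returns 0 on success, -1 if the buffer is too small. */
static int double_shortest(double d, char *buf, size_t cap)
{
    int prec;

    assert(buf != NULL && cap > 0);
    for (prec = 1; prec <= 17; prec++) {
        int n = snprintf(buf, cap, "%.*g", prec, d);
        if (n < 0 || (size_t)n >= cap)
            return -1;                    /* buffer too small */
        if (strtod(buf, NULL) == d)
            return 0;                     /* round-trips: shortest form found */
    }
    return -1;                            /* 17 digits suffice for finite doubles */
}
```

For 1.1 the loop stops at two digits, and for (/ 1.0 3.0) it stops at sixteen, which lines up with the sixteen-digit numerator in the rationalize result quoted in this entry.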

(bigint d) now routes the float arm through the shortest-decimal string, parses it via mino_bigdec_from_string, and truncates the resulting BigDecimal toward zero with imath's integer division. Result: (bigint 1.7976931348623157E308) is the full 309-digit integer instead of Long/MIN_VALUE (the previous long-cast saturation). Other doubles are unaffected.

(rationalize d) likewise converts via shortest-decimal + BigDecimal: (rationalize 1.1) is 11/10, (rationalize 1.5) is 3/2, (rationalize (/ 1.0 3.0)) is 3333333333333333/10000000000000000. The previous binary mantissa decomposition (m * 2^e -> m / 2^|e|) is gone.

`clojure.lang.BigInt` bridge for `instance?`

(def clojure.lang.BigInt :bigint) so cross-dialect tests using (instance? clojure.lang.BigInt x) succeed on mino. Same narrow bridge pattern as the prior clojure.lang.IPending -> :future mapping; no other JVM types are aliased.

Auto-promoting arithmetic aliases

+', -', *', inc', dec' are now defined as aliases for their unprimed forms. mino's unprimed forms already auto-promote (an intentional divergence) so the primed forms have the same semantics. The aliases let portable Clojure code that uses the primed forms compile without rewriting.

External bigint.cljc 13/15 -> 18/18, rationalize.cljc 14/16 -> 16/16. External suite: 188 -> 190 OK.

v0.100.16

`(repeat n x)` accepts booleans (true -> 1, false -> 0)

Per the cross-dialect test suite, every non-:clj dialect coerces a boolean count via (if n 1 0) instead of throwing. mino's repeat rejected booleans up front because number? returns false for them; the cond now adds an explicit boolean arm so (repeat true :a) is [:a] and (repeat false :a) is [], while non-numeric / non-boolean counts (nil, strings, keywords) still throw with the same "count must be a number" message.

External repeat.cljc 15/16 -> 17/17. External suite: 187 -> 188 OK.

v0.100.15

`subvec` coerces any number-tier index to long

JVM Clojure's subvec happily accepts floats, ratios, bigdecs, and NaN as start/end -- it casts them to long via the same JVM (long) truncation that (int x) uses, so (subvec v 2.72 3.14) is equivalent to (subvec v 2 3) and (subvec v ##NaN ##NaN) is []. mino's prim_subvec rejected anything other than MINO_INT with "subvec: start must be an integer", breaking the external test's exhaustive borderline-index coverage. The start/end checks now route through a subvec_to_long helper covering MINO_INT, MINO_FLOAT (NaN -> 0), MINO_BIGINT, MINO_RATIO, and MINO_BIGDEC; non-numeric values still throw with a clearer "must be a number" message.
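A sketch of the float arm of such a coercion is below. double_to_long_jvm and coerce_bounds are hypothetical names. NaN mapping to 0 comes from the entry above; saturating at the long bounds follows Java's primitive (long) cast rule and is an assumption about what the helper does for out-of-range input.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical float arm of an index coercion. Casting NaN or an
 * out-of-range double to an integer is undefined in C, so each case is
 * handled before the cast. */
static long long double_to_long_jvm(double d)
{
    if (isnan(d))
        return 0;
    if (d >= 9223372036854775808.0)    /* 2^63, first double above LLONG_MAX */
        return LLONG_MAX;
    if (d <= -9223372036854775808.0)   /* -2^63 equals LLONG_MIN exactly */
        return LLONG_MIN;
    return (long long)d;               /* truncates toward zero */
}

/* Coerce both subvec bounds in one step. */
static void coerce_bounds(double start, double end, long long *s, long long *e)
{
    assert(s != NULL && e != NULL);
    *s = double_to_long_jvm(start);
    *e = double_to_long_jvm(end);
}
```

coerce_bounds(2.72, 3.14, ...) yields 2 and 3, and two NaN bounds yield 0 and 0, which is why (subvec v ##NaN ##NaN) is the empty vector.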

External subvec.cljc 8/9 -> 34/34. External suite: 186 -> 187 OK.

v0.100.14

`(nth nil i)` returns nil instead of throwing

Clojure treats nil as an empty seq for nth: (nth nil 10) is nil, and (nth nil 10 :default) is :default. mino's prim_nth threw "nth index out of range" for the 2-arg form, which broke any code relying on the nil-as-empty equivalence and the external nth.cljc test ((is (nil? (nth nil 10)))). The nil-coll arm now returns def_val when supplied, else nil; non-nil out-of-range still throws.

External nth.cljc 4/5 -> 13/13. External suite: 185 -> 186 OK.

v0.100.13

Watch exceptions now propagate out of swap! / reset! / compare-and-set!

atom_notify_watches previously wrapped each watch call in a try frame and swallowed any throw, which meant a watch's exception was invisible to the user. Per Clojure JVM semantics the value commits via CAS first and then watches fire; if a watch throws, the exception propagates to the swap! call site (and any later watches in the iteration order are skipped). The swap!/reset!/cas! arms now check the watch return code and propagate NULL instead. Internal watch-exception-ignored test renamed to watch-exception-propagates and updated to assert the throw + the post-CAS value.

External add_watch.cljc 7/10 -> 9/10 (passes once the watch tests run; var-watch + ref-watch portions still error because mino lacks var watches and STM, both intentional gaps).

v0.100.12

`atom` accepts trailing positional args and any persistent map as `:meta`

(atom v) now tolerates extra positional args after the initial value -- the option-pair loop already absorbed unknown keys, but (atom nil nil nil) and (apply atom (take 11 (repeat nil))) now construct an atom with the initial value and ignore the trailing nils, matching Clojure JVM. Also broadened the :meta value check to accept MINO_SORTED_MAP in addition to MINO_MAP, so (atom nil :meta (sorted-map :a "a")) succeeds. Vectors, sets, numbers, etc. still reject -- the (p/thrown? ...) shapes in the external suite remain.

Validator returning nil rejects the new state

atom's :validator arm previously only rejected on false; nil slipped through. Per Clojure's docstring ("validate-fn should return false or throw"), nil counts as logical false too. Both the construction-time check and the swap!/reset! check now route through mino_is_truthy, so a (constantly nil) validator throws "Invalid reference state" both at construction and on every attempted update.

External atom.cljc 74/74. External suite: 184 -> 185 OK.

v0.100.11

Two fixes that close the last external-suite timeouts:

`(promise)` no longer blocks process exit

mino_host_threads_quiesce was waiting on impl->cv for any future that never had a worker thread, expecting a pool-managed worker to publish the result. Promises (constructed via (promise) with no backing thunk) match that shape but have no worker -- if the user doesn't (deliver p val) before exit, the wait blocked forever. The quiesce loop now distinguishes "pool-managed pending" (has a thunk; will be delivered) from "promise" (no thunk; may stay pending forever) and skips the latter.

`ifn?` recognises promises; `clojure.lang.IPending` bridge

(ifn? (promise)) now returns true (matches Clojure JVM where promises implement IFn). instance? works against clojure.lang.IPending -- mino binds that JVM interface name to the keyword :future at bootstrap, so cross-dialect tests that detect pending values via (instance? clojure.lang.IPending x) succeed against mino's promise/future type. The bridge is narrow: only clojure.lang.IPending, no other JVM interfaces are aliased.

External ifn_qmark.cljc 19/19, taps.cljc 4/4. External suite: 182 -> 184 OK, 0 crashes, 0 timeouts. Group 7 of the external-suite plan is complete.

v0.100.10

let now follows Clojure's sequential-binding semantics: an init expression sees only the *previous* bindings in the same let, not its own binding. Each binding lands in a freshly-created child env, so closures captured during init expressions are immune to a later shadow of the same name.

The previous (buggy) behavior used a single mutable env that all binding inits shared. A nested (let [f X] (let [f (fn [] (g f))] ...)) shape would have the inner closure see the inner-let's f (after the rebind) instead of the outer f it should have captured. The external bound_fn.cljc test triggered this through nested (let [f (bound-fn [] ...)]) shadowing and segfaulted via unbounded recursion.

The change has a knock-on effect: code that relied on mutable-env semantics for self-recursive let -- (let [go (fn [...] (go ...))]) -- no longer works, because go is unbound when the fn body is created. Any such pattern must use a named fn: (let [go (fn go [...] (go ...))]). The named-fn binding is established before its body is captured.

Audit and rename the recursive let-fn patterns in mino's bundled code: run!, tree-seq, interleave, shuffle, take-last, trampoline, the condp and case macros' build helper, and dotimes/while/doseq's emitted recursion shapes. Every callsite that previously self-referenced through the mutable-env trick now uses (fn name [...] ...).

External bound_fn.cljc and bound_fn_star.cljc no longer segfault (they go from process crashes to 7/8 each, with the one remaining failure being a separate dynamic-binding-across-futures issue). External suite: 182 OK with 0 crashes (was 0 crashes already after prior fixes; this confirms the segv class is closed).

v0.100.9

(add-load-path! path) adds a directory to the runtime's extra-load-paths list, consulted by require after mino.edn's project paths and before the cwd fallback. The list lives on mino_state_t and is freed at state teardown; entries are deduplicated so re-registering an existing path is a no-op.

The primary motivation is the external clojure-test-suite driver, where cross-file (:require [clojure.core-test.X :as ...]) lives in files outside mino's tree. The driver now does (add-load-path! "../clojure-test-suite/test") once per sub-process, and sibling files resolve through the standard module path -- no per-file preloading hack, no test pollution from preloaded namespaces. External not_eq.cljc is now 130/130 (was load-error).

External suite: 181 -> 182 OK. The remaining single load-error is ancestors.cljc, which uses the JVM Object symbol -- pre-existing JVM-specific gap, not addressable from this hook.

The standalone-mode resolver (no mino.edn present) is now runtime_paths_resolve, which still consults the extra-load-paths list before the cwd fallback. Project mode keeps using project_resolve and now passes the state pointer as the resolver context so it can read S->extra_load_paths.

v0.100.8

Regexes are now a first-class value type (MINO_REGEX), distinct from strings. Equality is identity, matching Clojure JVM's Pattern.equals: two distinct #"x" literals are not =. Type tag is :regex; print form is #"source" so the round-trip is exact.

Surface changes:

External eq.cljc is now 65/65. External suite: 180 -> 181 OK.

Internal regex tests under regex-literal-reader and re-pattern-fn are updated to assert the new contract (regex values, identity =).

v0.100.7

UUIDs are now a first-class value type (MINO_UUID, 16 bytes inline). The #uuid "..." reader literal, (parse-uuid s), (random-uuid), and (uuid? x) all participate in the new type:

External suite: 178 -> 180 OK. parse_uuid.cljc 17/17 (was 9/17), uuid_qmark.cljc 24/24 (was load-error). The internal compat tests for random-uuid / uuid? / parse-uuid are updated to exercise the new type contract.

v0.100.6

Bigdec division

(/ 2.0M 1.0M) no longer errors with "with-precision unimplemented". The new mino_bigdec_div mirrors Java's BigDecimal.divide(BigDecimal): preferred scale is sa - sb, but the algorithm tries successively larger scales (multiplying the numerator by 10 each step) until the division is exact. If the quotient has a non-terminating decimal expansion the function throws "non-terminating decimal expansion in bigdec division" -- same error class as Java's ArithmeticException. Cap is 1024 extra digits, well past anything that would terminate.
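The scale-search loop can be illustrated with a toy decimal type. Everything here is a sketch under stated assumptions: dec_t, dec_div, and the DIV_* codes are hypothetical, the unscaled value is a long long rather than an arbitrary-precision integer, and the search stops at 18 extra digits (or earlier on overflow) where the real primitive allows 1024. Inputs are assumed to stay away from LLONG_MIN.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Toy decimal: value = unscaled * 10^(-scale). */
typedef struct { long long unscaled; int scale; } dec_t;

enum { DIV_OK = 0, DIV_BY_ZERO = -1, DIV_NON_TERMINATING = -2 };

static int dec_div(dec_t a, dec_t b, dec_t *out)
{
    long long num = a.unscaled;
    int extra;

    assert(out != NULL);
    if (b.unscaled == 0)
        return DIV_BY_ZERO;
    /* Start at the preferred scale (sa - sb); add one digit per step. */
    for (extra = 0; extra <= 18; extra++) {
        if (num % b.unscaled == 0) {
            out->unscaled = num / b.unscaled;
            out->scale = a.scale - b.scale + extra;
            return DIV_OK;
        }
        if (num > LLONG_MAX / 10 || num < LLONG_MIN / 10)
            break;               /* toy limit: next step would overflow */
        num *= 10;
    }
    return DIV_NON_TERMINATING;  /* the runtime throws in this case */
}
```

Dividing 2.0 (20, scale 1) by 1.0 (10, scale 1) is exact at the preferred scale and gives 2 at scale 0. Dividing 1 by 8 needs three extra digits and gives 125 at scale 3. Dividing 1 by 3 never becomes exact and is reported as non-terminating.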

The tower-arithmetic dispatch in tower_op_at_tier and tower_reduce_seeded now both route OP_DIV for the BIGDEC tier through this primitive. External slash.cljc rises from 41/42 (1 error that aborted the rest of the file) to 158/160 -- the remaining two are bigdec-meets-ratio cases ((/ 2.0M 1/2)) where mino's documented tier-collapse-to-float diverges from JVM Clojure's promote ratio-to-bigdec; tracked as a separate intentional divergence.

`=` on BigDecimals is numerical

(= 1.0M 1.00M) is now true, matching JVM Clojure's = (which dispatches BigDecimals through Numbers.equiv -> compareTo == 0). mino was previously scale-strict (Object.equals-style), which mismatched both Clojure's = and the cross-dialect tests in the external suite. The hash function now strips trailing zeros from the unscaled bigint before mixing in the scale, preserving the equal-implies-equal-hash invariant.
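A sketch of the normalization that keeps hashes consistent with numerical equality. scaled_t, dec_normalize, and dec_hash are hypothetical, the unscaled value is a long long for brevity, and the mixing function is arbitrary; only the strip-then-mix order reflects the entry above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy decimal: value = unscaled * 10^(-scale). */
typedef struct { long long unscaled; int scale; } scaled_t;

/* Strip trailing decimal zeros so that 1.0M and 1.00M share one canonical
 * (unscaled, scale) pair before hashing. */
static void dec_normalize(scaled_t *d)
{
    assert(d != NULL);
    if (d->unscaled == 0) {
        d->scale = 0;            /* every zero collapses to a single form */
        return;
    }
    while (d->unscaled % 10 == 0) {
        d->unscaled /= 10;
        d->scale -= 1;
    }
}

/* Hypothetical mixer; any hash works as long as it sees the canonical form. */
static unsigned long dec_hash(scaled_t d)
{
    unsigned long h;

    dec_normalize(&d);           /* d is a by-value copy; caller is unaffected */
    h = (unsigned long)d.unscaled * 31UL;
    return h ^ (unsigned long)(unsigned int)d.scale;
}
```

Both (10, scale 1) and (100, scale 2) normalize to (1, scale 0), so they hash identically, which is what the equal-implies-equal-hash invariant requires once = treats them as equal.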

This is a breaking change for code that relied on (= 1.0M 1.00M) being false -- use mino_bigdec_equals directly via identical?-style comparisons or compare the printed forms if you need to distinguish scales. Internal numeric_tower_test is updated to assert the new behavior.

v0.100.5

Three small fixes for Clojure parity, all driven by the external suite. External suite: 170 OK -> 178 OK on the cumulative run.

`sort` throws on incomparable elements

(sort [1 []]) now throws "compare: cannot compare values of different types", matching Clojure's ClassCastException. Default sort (no comparator) routes through prim_compare instead of the internal type-tag fallback in val_compare, so cross-type elements fail loudly. sort with an explicit comparator is unchanged.

`min-key` / `max-key` NaN handling

The variadic case (min-key k a b & more) now uses the (<= kw kv) "keep current on NaN" loop that JVM Clojure uses, instead of folding through the 2-arg form's (< kx ky) predicate. The two-arg case itself is unchanged. NaN-bearing inputs match Clojure's order-dependent results: (min-key identity [##NaN ##-Inf 1]) is ##-Inf, (min-key identity [##-Inf ##NaN 1]) is ##NaN, etc.
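The order dependence falls out of IEEE comparison rules, where every comparison against NaN is false. A sketch over plain doubles with an identity key (min_key_identity is a hypothetical name, and it assumes the first pair is resolved as "take x if (< kx ky), else y"):

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* min-key over doubles with an identity key: the first pair uses <, later
 * elements use <=. A later element replaces the current winner only when
 * the comparison is true, so a NaN winner is never displaced. */
static double min_key_identity(const double *xs, size_t n)
{
    double v;
    size_t i;

    assert(xs != NULL && n >= 2);
    v = (xs[0] < xs[1]) ? xs[0] : xs[1];
    for (i = 2; i < n; i++) {
        if (xs[i] <= v)
            v = xs[i];
    }
    return v;
}
```

For the inputs NaN, -Inf, 1 the first comparison is false, so -Inf becomes the winner and stays. For -Inf, NaN, 1 the first comparison is also false, so NaN becomes the winner, and 1 <= NaN is false, so NaN is returned.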

`if-let` / `when-let` / `if-some` / `when-some` validate the binding vector

The macros now assert at expansion time that the binding form is a vector of exactly two elements (one symbol/expr pair). Anything else -- a list, a multi-pair vector -- throws. External when_let.cljc goes 13/13.

v0.100.4

rationalize accepts BigDecimals. The previous arms accepted int, bigint, ratio, and float; passing a BigDecimal threw "argument must be numeric". Now rationalize on a BigDecimal whose value is unscaled * 10^-scale reduces it to a ratio (or to an integer when scale <= 0). External rationalize.cljc rises from 11/16 to 14/16 assertions; the two remaining failures are float cases that depend on JVM Double.toString (shortest-decimal roundtrip) which mino does not yet implement -- mino's %g printer is the wider gap and is tracked separately.
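The reduction can be sketched with fixed-width integers. `toy_ratio` and `toy_rationalize` are hypothetical names, and the sketch assumes values small enough not to overflow 64 bits, whereas mino operates on bigints.

```c
#include <stdint.h>

/* Hypothetical reduced fraction; den == 1 means the value is an integer. */
typedef struct {
    int64_t num;
    int64_t den;
} toy_ratio;

/* Greatest common divisor of |a| and |b|. */
int64_t gcd_i64(int64_t a, int64_t b) {
    if (a < 0) {
        a = -a;
    }
    if (b < 0) {
        b = -b;
    }
    while (b != 0) {
        int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Converts unscaled * 10^-scale into lowest terms (no overflow checks). */
toy_ratio toy_rationalize(int64_t unscaled, int32_t scale) {
    toy_ratio r = { unscaled, 1 };
    for (; scale > 0; scale--) {
        r.den *= 10;   /* positive scale: divide by a power of ten */
    }
    for (; scale < 0; scale++) {
        r.num *= 10;   /* negative scale: multiply by a power of ten */
    }
    int64_t g = gcd_i64(r.num, r.den);
    if (g > 1) {
        r.num /= g;
        r.den /= g;
    }
    return r;
}
```

For example, 1.5M (unscaled 15, scale 1) reduces to 3/2 in this sketch.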

v0.100.3

This release bundles four fixes that move three external test files to green and patches one underlying equality bug surfaced by the fourth.

`nthnext` validates inputs

(nthnext nil _) returns nil (was throwing). Non-integer n throws a typed error instead of bottoming out in the inner <= arithmetic. Matches Clojure's surface behavior. Test: nthnext.cljc 13/13.

`rand-nth` validates the collection

(rand-nth nil) returns nil (was throwing on count). (rand-nth 1) (or any non-collection) throws a typed error rather than the inner "count: expected a collection" message. Test: rand_nth.cljc 4/4.

Equality forces lazy tails inside chunked-cons spines

A chunked-cons can hold an unrealized lazy seq in its more field (the typical shape (filter pred (range N)) builds when the predicate keeps every element of each source chunk). The non-forcing eq_seq_like was treating that unrealized lazy as end-of-seq, so (= (range 1000) (filter (fn [_] true) (range 1000))) returned false. mino_eq_force now routes same-tag chunked-cons through eq_seq_like_force, which forces lazy tails on both sides. Regression test added at tests/lazy_test.clj under eq-chunked-cons-with-lazy-tail.

`random-sample` test now passes

The remaining random_sample.cljc failures were a downstream effect of the equality bug above (the suite compares the filter output to the source range with =). With both random-sample itself and the chunked-cons equality bug addressed, the test goes 21/21.

v0.100.2

Transients are now read-callable. Per Clojure, a transient supports the same read-only interface as its persistent view: nth, get, count, contains?, and direct invocation ((t-vec idx), (t-map :k), (t-set v), (:k t-map)). All write operations (assoc, conj, dissoc, disj, pop) still throw, matching the "transients are not persistent" contract.

mino was throwing "expected a vector/map/set, got transient" on every read primitive. Each call site now unwraps the transient to its current persistent backing (failing on transients that have already been persistent!'d). External transient.cljc rises from 6/51 to 51/51 assertions passing.

v0.100.1

(deref delay) now forces and returns the delay's value. mino's delay is map-shaped ({:delay/fn ... :delay/state ...}), so the C-side prim_deref rejected it as "not an atom/var/future/...". force and realized? already special-cased delays mino-side; the override here adds the matching deref arm so all three reference operations agree. External realized_qmark.cljc now passes end-to-end.

v0.100.0

Reader conditionals: drop the :clj fallback. mino is not a JVM dialect, so :clj branches must not fire here. Cross-dialect tests in the wild (e.g., jank-lang/clojure-test-suite) put JVM-only assertions (System/getProperty, (new Object), clojure.lang.MapEntry/create, int-array, …) inside their :clj branch, with :default as the catch-all for non-JVM runtimes -- the suite was authored on the assumption that each non-JVM dialect is named (:cljs, :bb, :jank, :cljr, :lpy, …) and :default covers everything else, mirroring how peer dialects (ClojureScript, Babashka, jank, ClojureCLR, Basilisp) handle the same files.

mino now matches S->reader_dialect (defaults to "mino") and :default only. The bundled lib/clojure/* is unaffected -- it uses :mino/:default exclusively, so no internal lib relied on the old :clj fallback. Internal reader-conditional tests under tests/reader_cond_test.clj and tests/compat_test.clj are updated to exercise the new semantics directly. External jank-lang/clojure-test-suite rises from 166/223 to 170/223 OK on this change alone.

This is a documented intentional divergence. Embedders that *want* JVM-style behavior can override S->reader_dialect to "clj" to have their conditional code receive the :clj branch, but doing so loses the :mino-tagged escape hatch and is not a supported configuration.

v0.99.4

Add the same build-log artifact upload to release-build.yml that ci.yml already grew under v0.99.2. The release-build job runs on ubuntu-22.04 (gcc-11) and uses a different runner image than ci.yml's ubuntu-latest, so a build break that's gcc-11-specific (or glibc-22.04-specific) doesn't surface in ci.yml. Capturing the log here lets external observers grab the gcc error from the artifact even when only the release-build legs fail.

v0.99.3

Handle getcwd's return value in main.c. Ubuntu's glibc declares getcwd with __attribute__((warn_unused_result)) and the bootstrap CFLAGS treat unused-result as an error, so the ignored-call line tipped over -Werror=unused-result on the ubuntu-latest runner. The macOS runner's libc declares the function without the attribute, so the same source compiled cleanly there; the regression also went unnoticed locally on a gcc-14 + Debian glibc box, where the attribute doesn't fire either. The fix is to capture the result and clear initial_dir on failure -- best-effort, so the rest of the binary still launches if the cwd lookup fails.
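A minimal sketch of the capture-and-clear pattern. The helper name and signature are hypothetical; only the shape of the fix (check the result, empty the buffer on failure) comes from the entry above.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: fills initial_dir with the cwd, or leaves it empty
   when the lookup fails, so callers can treat "" as "unknown". */
void capture_initial_dir(char *initial_dir, size_t cap) {
    if (cap == 0) {
        return;
    }
    /* Using the return value satisfies warn_unused_result. */
    if (getcwd(initial_dir, cap) == NULL) {
        initial_dir[0] = '\0';
    }
}
```

A caller can then branch on `initial_dir[0] == '\0'` instead of aborting startup.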

v0.99.2

CI follow-up to v0.99.1: also upload the captured build log as a public-downloadable artifact when the bootstrap step fails (in addition to the job summary), so external observers can fetch the exact gcc error without log-download permission. Initialise r_at and b_at to NULL in the mqr_ratio modulus-adjust path so older GCC's -Wmaybe-uninitialized flow analysis sees a definite assignment -- the conditional branches already cover every reachable path, but gcc-11 (release-build runner default on ubuntu-22.04) has a less-precise analyzer.

v0.99.1

CI plumbing: surface the build log on the job summary page (visible without log-in) when a step fails, and print every available gcc-N version on Linux so a regression triggered by the runner-image default GCC change is easier to triage. Also tidy up the try_parse_numeric reader helper -- drop a dead-store buf_capacity variable and move the now-late *err = 0; assignment back to the top of the function body.

v0.99.0

External jank-lang/clojure-test-suite compatibility pass: 166/223 files green (74%) at 4472/4542 = 98.5% assertion pass rate. Each entry below is a Clojure-parity fix or an intentional divergence made explicit.

`(get string i)` Returns a `\char`; Other Strict-Predicate Tightenings

(get "ab" 0) now returns \a (was "a"). The seq path already yielded chars after the earlier UTF-8 walk fix; the indexed get was the only string accessor still emitting one-byte substrings. Walks codepoint by codepoint so multi-byte chars count as a single index.

numerator and denominator now require a Ratio argument; passing a plain integer throws (was: silently returned the integer / 1).

intern requires the target namespace to already exist and throws "no namespace: <name> found" otherwise; previously it silently created the namespace via ns_env_ensure.

The internal get-fn test was updated to expect \char results.

Symbol / Keyword Compare Sorts Unqualified Before Qualified

compare and val_compare for symbols and keywords now follow clojure.lang.Symbol.compareTo: an unqualified name (no namespace) sorts before any qualified one, and within a single namespace the local names are compared lexicographically. The previous straight strcmp over the printed form put :cat after :animal/cat because 'c' > 'a', so (compare :cat :animal/cat) returned 1 instead of -1. Plain strings still use strcmp.
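The ordering rule can be sketched as a comparison over (namespace, name) pairs. `named_compare` is a hypothetical helper in which a NULL namespace stands for an unqualified symbol or keyword; it is not mino's actual val_compare code.

```c
#include <stddef.h>
#include <string.h>

/* Returns -1, 0, or 1. ns == NULL means "unqualified". */
int named_compare(const char *ns_a, const char *name_a,
                  const char *ns_b, const char *name_b) {
    /* Unqualified sorts before qualified. */
    if (ns_a == NULL && ns_b != NULL) {
        return -1;
    }
    if (ns_a != NULL && ns_b == NULL) {
        return 1;
    }
    /* Both qualified: the namespace decides first. */
    if (ns_a != NULL) {
        int c = strcmp(ns_a, ns_b);
        if (c != 0) {
            return c < 0 ? -1 : 1;
        }
    }
    /* Same namespace (or both unqualified): compare the local names. */
    int c = strcmp(name_a, name_b);
    return (c > 0) - (c < 0);
}
```

Under this rule `:cat` against `:animal/cat` gives -1, which is the corrected result described above.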

`(symbol "" "name")` Preserves the Empty Namespace

symbol previously dropped an empty-string namespace argument, producing a symbol whose (namespace ...) returned nil. Per Clojure, (namespace (symbol "" "x")) is "" (the explicit empty namespace differs from nil). The 2-arg form now emits the ns/name cons regardless of whether ns is empty, so the empty prefix round-trips through namespace.

Misc Eager Validations / Predicate Tightening

`pos-int?` / `neg-int?` / `nat-int?` Stay Long-only; `counted?` Drops Strings

Per Clojure, the long-tier predicates pos-int?, neg-int?, and nat-int? compose int? (Long only) -- they reject BigInts -- so (neg-int? -1N) returns false. mino briefly broadened these to the new integer? (long + bigint) when fixing (integer? 1N); this restores the narrow contract.

counted? no longer reports strings as counted. Strings are not Counted on the JVM, where count on a String walks the CharSequence protocol; the predicate now mirrors that.

`use-fixtures` Captures the Caller's Namespace

use-fixtures is now a macro so it can capture the calling namespace at expansion time. The previous function-based implementation read (str *ns*) from inside the function body, but mino's *ns* is the function's *defining* namespace (set when the fn was created) rather than a dynamic var that tracks the caller. As a result every (use-fixtures ...) call registered fixtures under "clojure.test" instead of the user's namespace, so :once and :each fixtures never fired. This was visible in the external suite: parents.cljc and descendants.cljc use a :once fixture to install a global hierarchy via derive, and without the fixture the queries returned nil.

`subs` Indexes by Codepoint

subs previously interpreted its start and end indices as raw byte offsets, so a multi-byte codepoint (e.g. ֎, U+058E, two bytes in UTF-8) shifted later characters and slicing through one returned a malformed UTF-8 fragment. It now walks the string by codepoint -- matching Clojure's "string is a sequence of chars" model -- so (subs "ab֎de" 0 5) returns "ab֎de" instead of truncating mid-codepoint. ASCII strings are unaffected since the codepoint walk is byte-equivalent there.
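One way to picture the codepoint walk is a helper that maps a codepoint index to a byte offset. `utf8_offset_of` is a hypothetical name and the sketch assumes well-formed, NUL-terminated UTF-8. A codepoint-indexed substring would then span the bytes between the offsets of its start and end indices.

```c
#include <stddef.h>

/* Byte offset of the cp_index-th codepoint in s, or the offset of the
   terminating NUL when cp_index is past the end. */
size_t utf8_offset_of(const char *s, size_t cp_index) {
    size_t off = 0;
    while (cp_index > 0 && s[off] != '\0') {
        off++;  /* step over the lead byte */
        /* Skip continuation bytes, which all look like 10xxxxxx. */
        while (((unsigned char)s[off] & 0xC0) == 0x80) {
            off++;
        }
        cp_index--;
    }
    return off;
}
```

For a five-codepoint string with one two-byte character in the middle, index 3 maps to byte offset 4 and index 5 maps to byte offset 6, so no slice boundary can land inside the character.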

`sort` / `set` -- Char Comparison, Eager Validation, Empty Result

sort now orders MINO_CHAR values by codepoint (was effectively a no-op because val_compare had no MINO_CHAR arm and fell through to type-tag comparison, which is identical for any two chars). Also:

- (sort nil) returns the empty-list singleton () instead of nil, matching Clojure's "sort always returns a sequence" contract; corrected the internal sort-fn test that asserted the nil-tolerant behaviour.
- (sort 1) and similar non-seqable inputs now route through prim_seq for the standard "cannot coerce" type error, instead of silently returning an empty result.
- The same eager-seqability check was added to set so (set 1) throws.

`list?` Distinguishes Lists From Other Sequences

list? was previously cons?, so it returned true for any cons cell including the chunked-cons spine produced by (seq vector). Per Clojure's contract list? is narrower than seq?: it accepts the empty-list singleton and proper cons chains but excludes lazy-seqs and chunked-seqs. mino now ships a dedicated list? C primitive that matches MINO_CONS and MINO_EMPTY_LIST but rejects MINO_CHUNKED_CONS. (seq? continues to accept the broader family.) The lingering case (list? (seq (sorted-map ...))) still returns true because the sorted-collection seq builds a plain cons chain -- that requires a finer-grained "is this a real list literal" tag and is left as future work.

`rational?` Returns True For BigDecs

(rational? 1.5M) now returns true instead of false, matching Clojure: a BigDecimal's value unscaled * 10^-scale is an exact rational number, so the predicate accepts the bigdec tier alongside int / bigint / ratio. Only the float (IEEE-754) tier remains outside. The internal nt-bigdec-literal test, which had the inverted expectation, was corrected.

Reader Accepts Arbitrarily Long Numeric Literals

try_parse_numeric previously bailed out (treated the token as a symbol) whenever the literal exceeded a 63-byte stack buffer. That meant a bigint literal like 1797693134862315700000...0N -- standard in Clojure for double-overflow tests -- read back as an unbound symbol. The parser now falls back to a heap-allocated buffer for long tokens so the bigint / bigdec / ratio / radix paths handle digit runs of any length.
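The stack-then-heap fallback follows a common C pattern, sketched below. The helper name, the 64-byte capacity (63 bytes plus NUL), and the digit-counting payload are illustrative assumptions, not the body of try_parse_numeric.

```c
#include <stdlib.h>
#include <string.h>

#define TOKEN_STACK_CAP 64  /* assumed: 63 bytes of token plus a NUL */

/* Hypothetical helper: copies token[0..len) into a NUL-terminated scratch
   buffer (stack for short tokens, heap for long ones) and counts its decimal
   digits. Returns -1 if the heap allocation fails. */
long count_digits_in_token(const char *token, size_t len) {
    char stack_buf[TOKEN_STACK_CAP];
    char *buf = stack_buf;
    if (len >= sizeof stack_buf) {
        buf = malloc(len + 1);      /* long token: fall back to the heap */
        if (buf == NULL) {
            return -1;
        }
    }
    memcpy(buf, token, len);
    buf[len] = '\0';

    long digits = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] >= '0' && buf[i] <= '9') {
            digits++;
        }
    }

    if (buf != stack_buf) {
        free(buf);                  /* only the heap copy needs releasing */
    }
    return digits;
}
```

Short tokens never touch the allocator, while a 500-digit literal takes the heap path and is still processed in full.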

Eager Validation for `cycle`, `mapcat`, `reverse`

cycle, mapcat, and reverse now raise a type error eagerly when their collection (or function) argument is non-seqable (or, for mapcat, non-invokable). Previously the lazy variants returned a seq-shaped value that only blew up when something forced it; the eager variants (reverse) silently treated the input as empty because the seq_iter_* family short-circuits on unknown types. Matches Clojure's "throws at the call site" behaviour for these specific functions.

Sorted Collections: Predicate Comparator + Cross-comparator Equality

Two related fixes for sorted-map / sorted-set and the -by variants:

- rb_compare now follows Clojure's "predicate comparator" contract: when the comparator returns a non-numeric truthy value it means a < b, but if it returns falsy the function probes the reverse direction (cmp b a) to distinguish a > b from a == b. The previous fall-through always treated falsy as >, so a comparator like plain < could never report equality. That broke rb-tree lookup -- every key landed slightly off-node and rb_get returned nil -- so (get (sorted-map-by < 1 :a) 1) came back as nil even though (seq ...) clearly contained (1 ...). The seq path happened to mask the bug differently: it iterated keys via rb_to_list then re-rb_get-ed each one, so the keys were right but the values were uniformly nil.
- Equality on two sorted collections with different comparators (e.g. (sorted-map-by < 1 :a) vs (sorted-map-by > 1 :a)) now returns true for matching content. The trees are arranged in opposite orders, so the structural rb_trees_equal walk could never see them as equal; the new rb_trees_content_equal pairs entries by mino_eq on the key (O(n*log n), tree-shape independent) when the two collections share neither a comparator nor the default ordering.
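The reverse-direction probe from the first bullet can be sketched for a boolean less-than predicate over ints. The names are hypothetical, and the sketch covers only the predicate case, not comparators that return numbers.

```c
#include <stdbool.h>

/* A boolean "is a before b" predicate, like plain < used as a comparator. */
typedef bool (*less_fn)(int a, int b);

/* Derives a three-way result from a predicate comparator. */
int three_way_from_predicate(less_fn less, int a, int b) {
    if (less(a, b)) {
        return -1;   /* a sorts before b */
    }
    if (less(b, a)) {
        return 1;    /* reverse probe: b sorts before a */
    }
    return 0;        /* neither is before the other: equal keys */
}

/* Example predicate. */
bool int_less(int a, int b) {
    return a < b;
}
```

Without the second probe the equal case would be reported as 1, which is why lookups in the tree could never land on the matching node.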

`str` Drops the `N` / `M` Suffix on BigInts and BigDecs

(str 1N) now returns "1" (was "1N") and (str 1.0M) returns "1.0" (was "1.0M"). The readable printer (pr-str, prn) keeps emitting the suffix so round-tripping through the reader still works; only the non-readable str family was wrong. The previous implementation routed bigints / bigdecs through print_to_string which uses the readable form. The fix adds explicit MINO_BIGINT and MINO_BIGDEC cases in prim_str that format the digits directly -- mirroring how MINO_INT and MINO_FLOAT are already handled.

`<` / `<=` / `>` / `>=` -- Strict Numeric Operands and NaN Unordering

The four numeric comparison operators previously accepted nil (silently treating it as 0.0) and ranked any NaN operand as equal to NaN, so (< nil 1) returned true and (<= ##NaN ##NaN) also returned true. Both now match Clojure:

- Each pair of consecutive operands must be a number (long, bigint, ratio, bigdec, or float). Anything else -- nil included -- throws eval/type.
- If either operand in a pair is NaN, the whole chain short-circuits to false (NaN is unordered against every value, itself included).

The single-argument form is unchanged: (< x) returns true for any x without inspecting its type, matching Clojure's "trivially true on zero or one argument" contract.

`special-symbol?` Recognises Clojure's Reserved Special Forms

special-symbol? now returns true for the Clojure-reserved special form names &, ., case*, catch, deftype*, finally, fn*, let*, letfn*, and loop*. mino implements the unstarred forms (fn, let, loop) directly and also accepts the starred aliases where applicable; the remaining names are unimplemented but are still reserved as a portability courtesy so that code which inspects symbol status (linters, code-walkers, syntax-quote logic) does not have to special-case the dialect.

`mod` / `rem` / `quot` Preserve Operand Type

mod, rem, and quot now dispatch on the higher tier of their two operands and preserve the result type per Clojure's contagion rules:

- bigint inputs produce bigint results
- ratio inputs produce bigint quot, ratio rem / mod (collapsing to bigint when the value is integer)
- bigdec inputs produce bigdec results at the aligned scale
- long inputs stay long (with LLONG_MIN / -1 overflow promoted to bigint)
- float inputs use the existing fmod path

Previously the three primitives coerced both operands through tower_to_double, computed via fmod, and packed the result as long or float. Bigints, ratios, and bigdecs all collapsed lossily, so (mod 10 3N) returned a long (failing (big-int? r)) and (mod 10 3.0M) returned a float (failing (decimal? r)).

The new path adds two families of internal helpers in src/prim/bignum.c: mino_bigint_quot / _rem / _mod (truncated division on bigints, with mod adjusting toward the sign of the divisor) and mino_bigdec_quot / _rem / _mod, which align scales before deferring to the bigint helpers. The ratio path cross-multiplies numerators and denominators into bigints so it can reuse the same quotient logic, then derives rem and mod via tower subtraction.
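The long-tier semantics (truncated quot and rem, mod adjusted toward the divisor's sign, and the LLONG_MIN / -1 overflow case) can be sketched as follows. The `ll_*` names are hypothetical, and the sketch reports overflow to the caller instead of promoting to a bigint.

```c
#include <limits.h>
#include <stdbool.h>

/* Truncated quotient. Returns false when b == 0 or when the result does not
   fit in a long long (LLONG_MIN / -1); a caller would raise or promote. */
bool ll_quot(long long a, long long b, long long *out) {
    if (b == 0) {
        return false;
    }
    if (a == LLONG_MIN && b == -1) {
        return false;
    }
    *out = a / b;   /* C99 integer division truncates toward zero */
    return true;
}

/* Truncated remainder; takes the sign of the dividend. Requires b != 0. */
long long ll_rem(long long a, long long b) {
    if (b == -1) {
        return 0;   /* avoids evaluating LLONG_MIN % -1 */
    }
    return a % b;
}

/* Modulus; a nonzero result takes the sign of the divisor. Requires b != 0. */
long long ll_mod(long long a, long long b) {
    long long r = ll_rem(a, b);
    if (r != 0 && ((r < 0) != (b < 0))) {
        r += b;
    }
    return r;
}
```

For -7 and 3 this gives quot -2, rem -1, and mod 2; for 7 and -3 it gives rem 1 and mod -2.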

`integer?` Recognises BigInts

integer? previously aliased int?, so (integer? 1N) returned false. Per Clojure's contract, integer? is true for any value that is "exactly an integer", which on the JVM covers Long, Integer, Short, Byte, BigInt, and BigInteger. mino represents the integer tier with MINO_INT (long) and MINO_BIGINT (arbitrary-precision), so integer? now returns (or (int? x) (bigint? x)). The composed predicates pos-int?, neg-int?, and nat-int? route through the new integer? so they pick up the bigint tier for free.

The external test suite's portability shim now defines big-int? as bigint? (rather than integer?) so it specifically probes the bigint type, matching the JVM (instance? clojure.lang.BigInt n) semantics.

`doseq` Supports `:let`, `:when`, and `:while` Modifier Clauses

doseq now recognises the three modifier clauses Clojure's version exposes alongside plain bindings:

- :let [name expr ...] introduces locals visible to the remaining clauses and the body.
- :when expr skips an iteration when expr is falsy.
- :while expr halts iteration entirely (including outer binding loops) when expr is falsy.

Previously the binding parser stopped at clause-keyword/value pairs and tried to call seq on the keyword's "value" (e.g. on the boolean produced by :while (< x 3)), so any modifier triggered a "seq: cannot coerce bool to a sequence" error.

:while's "stop everything" semantics is implemented with a shared stop atom that the outer recursive driver consults each iteration; without it an outer infinite seq paired with a later :while would never terminate.

doseq.cljc: 11 passing / 1 error -> 15/15 clean.

`realized?` Throws on Non-pending Inputs

realized? previously returned true for any value that wasn't a lazy seq, which let (realized? 1) / (realized? :foo) / (realized? []) etc. silently pass through. Now the prim matches Clojure's contract: it returns the realized state for MINO_LAZY (lazy seqs and delays share that representation) and MINO_FUTURE, and throws "realized? expects a lazy seq, delay, promise, or future" for anything else.

Keywords as Functions Look Up in Sets

(:k #{:k :other}) now returns :k (and similarly for sorted sets) instead of nil. Per Clojure, keyword invocation against a set treats the set as a membership probe -- the keyword is its own value, returned when present and the supplied default (or nil) when absent. The previous fall-through returned the default for any non-map collection.

`repeat` Truncates Non-integer `n` Toward Zero

(repeat 3.14 x) and (repeat 3.99 x) now both return three repetitions of x, matching Clojure (repeat truncates the count toward zero before counting). The previous implementation recursed with (- n 1) and only stopped when n reached 0, so floats produced an off-by-one extra element.

`reverse` Returns the Empty-list Singleton

(reverse nil) and (reverse <empty>) now return () rather than nil, matching Clojure (reverse always returns a sequence). Updated the internal reverse-fn test which had been asserting the old nil-tolerant behaviour.

Non-readable Print of Characters

print / println (the non-readable family) now emit a character's codepoint as UTF-8 bytes instead of its \name / \letter escape form. So (print \A) writes A (matching Clojure) rather than \A. The readable pr / prn family still emits the escape form via print-to-string. Implemented in append_print_chunk with a MINO_CHAR branch that encodes the codepoint inline.
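Encoding a codepoint inline as UTF-8 follows the standard bit layout, sketched here. `utf8_encode` is a hypothetical helper, not the actual MINO_CHAR branch in append_print_chunk.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the UTF-8 encoding of cp into out and returns the byte count,
   or 0 for surrogates and values above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                      /* surrogates are not encodable */
        }
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The codepoint for A (U+0041) comes out as the single byte 0x41, which is what (print \A) writes.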

`(empty seq)` Returns the Empty-list Singleton

empty on a list / cons / lazy-seq / chunked-cons / () now returns the empty-list singleton () rather than nil. Per Clojure, the contract is "an empty collection of the same kind"; for sequence types that's (), not nil. The branches for maps / vectors / sets / sorted maps already returned the right empty collection; only the seq branches were wrong.

Char Semantics Across `first`, `rest`, `cons`, and Iterators

The (seq string) change shipped chars on the seq path; this follow-up brings the rest of mino's string-as-sequence operations in line:

cons.cljc, fnext.cljc, zipmap.cljc, interpose.cljc etc. that depended on character iteration now pass cleanly.

`(seq string)` Yields Characters Per Clojure

Previously (seq "abc") returned a sequence of one-byte strings (("a" "b" "c")); now it returns a sequence of MINO_CHAR codepoints ((\a \b \c)), matching Clojure. The implementation walks UTF-8 codepoint by codepoint so multi-byte characters like \☃ come out as single MINO_CHAR values rather than fragmented byte slices.

This unblocks the conformance suite's interpose, cons, fnext, zipmap, etc. cases that depend on character semantics ((map identity "abc") returns (\a \b \c)). Two internal tests (seq-fn's string case and clj-into-concat) were updated to expect \a \b instead of "a" "b"; the test suite now uses the Clojure semantics throughout.

The companion seq_iter_val path used by direct iterators (first, next over strings via seq_iter_*) still emits substrings; that path needs UTF-8 byte-step tracking to switch safely and is left for a separate fix.

`parse-boolean` Throws on Non-string Input

Per Clojure 1.11+'s contract, parse-boolean throws on non-string arguments (NullPointerException / ClassCastException on the JVM for nil / non-string types). Mino's previous implementation silently returned nil for any non-string input. The function now raises an ex-info for non-strings; matching strings return their boolean and non-matching strings still return nil.

The internal parse-boolean-cases test was updated to assert the new contract (the cases that previously expected nil for nil and 42 now expect a throw).

`keys` and `vals` Accept the Empty-List Singleton

Both keys and vals had explicit "return nil" branches for empty vectors / sets / strings / sorted sets / nil, but () (the MINO_EMPTY_LIST singleton) wasn't in the set, so it fell through to the "must be a map" error. Added MINO_EMPTY_LIST alongside the other empty cases.

`clojure.test/use-fixtures`

use-fixtures now lives in lib/clojure/test.clj with the familiar :once and :each kinds. Fixtures are registered per-namespace in a fixtures-registry atom; the runner groups registered tests by namespace, wraps each ns's batch with its :once fixtures (outermost first), and threads each individual test through its :each fixtures. Multiple fixtures of the same kind compose left-to-right via compose-fixtures. This unblocks the suite's descendants.cljc and parents.cljc, which both declared (use-fixtures :once with-global-hierarchy) to set up shared derive state.

Numeric Predicates Across the Full Tower

zero?, pos?, neg?, even?, and odd? now accept the full numeric tower (long, double, bigint, ratio, bigdec) instead of just long/double. Sign predicates route through tower_to_double for ordering; even? / odd? use imath's mp_int_is_odd directly so arbitrary-precision integers work without the lossy double conversion. tower_to_double is exported via prim/internal.h for shared use.

This unblocks the predicate test files (e.g. pos_qmark.cljc, even_qmark.cljc) that previously erred on (pos? 0N), (even? 122N), etc.

`compare` Cross-numeric, Chars, and Vectors

compare now matches Clojure's contract more precisely:

Genuinely incompatible cross-type pairs (e.g. (compare 1 [])) still throw, matching Clojure's compareTo ClassCastException behaviour.

`(symbol var)` Returns the Var's Qualified Name

symbol's 1-arg form now accepts a Var and returns its fully-qualified name as a symbol (e.g. (symbol #'+) returns 'clojure.core/+). Vars are Named in Clojure, so this matches the contract; previously the call raised a type error. Vars with no owning namespace yield the bare name.

`merge` Accepts MapEntries and Non-map Args via `conj` Semantics

Rewrote merge to match Clojure's (reduce conj (or acc {}) ms) shape rather than walking (seq m) and assuming each entry is a (k v) pair. Position-2+ args may now be MapEntries or 2-element vectors (e.g. (merge {:a nil} (first {:a "a"}) {:b "b"})), and non-map first args follow conj's undefined-but-non-throwing behaviour. The previous implementation broke as soon as it hit a MapEntry because (seq mapentry) yielded the bare key as the first "kv", which first then rejected.

`(conj map nil)` is a No-op

Per Clojure, (conj coll nil) returns coll unchanged on every collection type. Mino's conj honoured that for vectors / lists / sets but mapped nil to the "must be a map entry or 2-element vector" type error on maps. Mixed sequences like (conj {:a 1} nil [:b 2] nil) now collapse the nils and apply only the real entries.

`peek` on the Empty-List Singleton

peek now recognises the canonical empty list () (a MINO_EMPTY_LIST singleton, distinct from MINO_NIL and MINO_CONS) and returns nil for it, matching Clojure. Without this, (peek '()) threw "peek: expected a vector or list, got list" because the empty-list type slipped past the existing NIL/CONS branches.

`assoc!` Variadic Arity (with Odd-out Nil)

(assoc! tcoll k v & kvs) now accepts the variadic Clojure form plus the documented JVM quirk: a trailing odd-out key with no matching value is paired with nil. Previously only the 3-arg form was accepted. Each pair (or trailing key/nil pair) is assoc'd left-to-right against the running transient.

`dissoc!` / `disj!` Variadic Arity

Both transient ops now accept the variadic Clojure form (dissoc! tcoll k & ks) / (disj! tcoll k & ks). Each extra key is processed left-to-right against the running transient. Previously only the 2-arg form was accepted.

`conj!` Variadic Arity

conj! now matches Clojure's full signature: (conj!) returns a fresh transient empty vector, (conj! tcoll) returns the transient unchanged, and (conj! tcoll x & xs) conj's each extra value in turn and returns the final transient. Previously only the 2-arg form was accepted.

`(dissoc m)` Returns the Map Unchanged

Per Clojure's contract, dissoc is variadic with a 1-arg form that returns the map untouched. Mino previously required at least one key and threw on (dissoc m); now the no-key call short-circuits and returns m directly.

`atom` Accepts `:meta` and `:validator` Options

(atom x) now accepts the variadic Clojure form (atom x & opts) where opts is a flat keyword/value sequence including :meta map-or-nil and :validator fn-or-nil. :meta attaches metadata visible via (meta the-atom) (atoms now sit alongside symbols / collections / fns / vars in the set of types that carry metadata). :validator runs the supplied fn against the initial value and installs it on the atom so subsequent swap! / reset! calls go through it; an initial value the validator rejects raises an Invalid reference state error at construction time, matching Clojure's contract. Unknown option keys are tolerated silently.

Syntax-Quote Auto-Qualification Inside Macro-Generated Closures

Closures created during a macro body's evaluation -- the canonical shape being (map (fn [row] (some-sym ...)) rows) inside a macro -- now capture the macro's defining namespace as their defining_ns, not the caller's. Without this, invoking the closure overwrote fn_ambient_ns with the caller's namespace, so syntax-quote inside the closure couldn't find symbols in the macro's namespace and emitted them bare.

The clearest example was clojure.test/are. From the test file's namespace, (macroexpand '(are [x] (= x x) 1 2)) produced (do (is (= 1 1)) (is (= 2 2))) -- bare is -- because the inner (fn [row] ...) ran with defining_ns = caller's ns. Predicate test files in the external suite that referred only [are deftest testing] (not is) consequently errored on unbound symbol: is, which is why files like boolean_qmark, coll_qmark, map_qmark, and many others reported 0 passed, 1 errors. Post-fix the same expansion produces (do (clojure.test/is (= 1 1)) ...) and those files run cleanly.

A paired fix in the require/load path clears fn_ambient_ns for the duration of a file load. File loads are a top-level boundary, and the file's own defn closures must capture the file's namespace as their defining_ns -- not whatever macro-expansion ambient happened to be active when require was called from inside a closure. Without this, (require 'clojure.edn) from inside a deftest body bound clojure.edn/read with defining_ns = "user", breaking subsequent calls to it; mirrors the existing eval / load-string / load-file behavior.

External clojure-test-suite Driver

A new pure-mino driver, tests/clojure_test_suite.clj, runs the [jank-lang/clojure-test-suite][cts] against mino. The driver expects the suite cloned as a sibling directory (../clojure-test-suite), forks one ./mino sub-process per .cljc file so a single SIGSEGV or hang doesn't lose the rest of the run, applies a 30 s per-file timeout, parses each summary line, and prints an aggregate report plus a categorized breakdown (load errors, crashes, timeouts, assertion failures). The same script self-dispatches into a one-file harness when given a path argument, used by the driver's sub-fork.

Two new shims under lib/clojure/core_test/ make the suite loadable: portability.clj provides when-var-exists (the suite's per-test "skip if var doesn't exist" macro), big-int?, and a no-op sleep; number_range.clj already existed for numeric constants. Without those shims every test file fails to load because the canonical jank portability.cljc is JVM/CLJS-bound (Throwable, Thread/sleep, cljs.test).

[cts]: https://github.com/jank-lang/clojure-test-suite

Clean Compile Under -Werror

The default make build now compiles warning-free with -Wall -Wpedantic -Wextra and treats remaining warnings as errors via a new -Werror in CFLAGS. Fourteen pre-existing warnings are fixed, split between two classes.

-Wclobbered flagged five locals whose values could be lost when a longjmp rewound past their setjmp. In eval_try the vol_result and vol_ex declarations were volatile mino_val_t * — pointer-to-volatile, which protects the pointee but leaves the pointer itself non-volatile. Moved the qualifier so the pointer is volatile (mino_val_t * volatile). In mino_eval_string_inner the src parameter was reassigned in the read loop after the top-level setjmp; it now lives in a const char * volatile local copied from the parameter. In atom_notify_watches the loop counter i is alive across the per-iteration setjmp and is now volatile.
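The qualifier placement matters because `volatile T *p` makes the pointee volatile while `T * volatile p` makes the pointer variable itself volatile. A minimal sketch of the second form surviving a longjmp, using hypothetical names rather than the eval_try source:

```c
#include <setjmp.h>
#include <stddef.h>

static jmp_buf toy_handler;

/* Stands in for code that unwinds back to the setjmp site. */
static void toy_throw(void) {
    longjmp(toy_handler, 1);
}

const char *toy_try(int should_throw) {
    /* The qualifier sits after the '*', so the pointer itself is volatile
       and keeps a defined value when control returns through longjmp. */
    const char * volatile result = "initial";

    if (setjmp(toy_handler) == 0) {
        result = "body ran";        /* modified after setjmp */
        if (should_throw) {
            toy_throw();
        }
        return result;
    }
    /* Reached via longjmp: result still holds the assignment made above. */
    return result;
}
```

With a non-volatile `result`, the C standard leaves its value indeterminate after the longjmp because it was modified between setjmp and longjmp.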

-Wformat-truncation flagged nine snprintf sites where worst-case inputs could overflow the destination. The diagnostic buffers in apply_refer_options (require: errors) and validate_only_names (refer: errors) were char msg[300] but hold two 256-byte names plus format text; bumped to 600. The four cwd_resolve / try_resolve paths that build <dir>/lib/<name><ext> now gate each snprintf on a runtime length check that proves the output fits.
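A sketch of gating snprintf on a runtime length check, for a path of the <dir>/lib/<name><ext> shape. `build_lib_path` and its return convention are hypothetical, not the cwd_resolve / try_resolve code.

```c
#include <stdio.h>
#include <string.h>

/* Builds "<dir>/lib/<name><ext>" into out. Returns 0 on success, or -1
   (leaving out untouched) when the result would not fit in cap bytes. */
int build_lib_path(char *out, size_t cap,
                   const char *dir, const char *name, const char *ext) {
    /* Bytes needed, including the terminating NUL. */
    size_t need = strlen(dir) + strlen("/lib/") + strlen(name) + strlen(ext) + 1;
    if (need > cap) {
        return -1;
    }
    snprintf(out, cap, "%s/lib/%s%s", dir, name, ext);
    return 0;
}
```

Checking first means a too-long path is reported as a failure instead of being silently truncated into a different, possibly valid, filename.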

v0.98.6 — Bump MINO_VERSION_* Constants

The five v0.98 tags (v0.98.0 through v0.98.5) shipped with MINO_VERSION_MINOR=97 / MINO_VERSION_PATCH=5 left over from v0.97.5. The release-build workflow asserts the tag matches the header constants and rejected v0.98.5 on every platform. Per the no-force-push-tags rule this lands as a fresh patch tag. No behavioral change; only src/mino.h's version triple moves forward.
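The tag-versus-header assertion can be illustrated in C. This is only a sketch: the workflow's real check is not shown in this changelog, MINO_VERSION_MAJOR and the three values below are assumed for illustration, and tag_matches_header is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values for illustration, mirroring a v0.98.6 header. */
#define MINO_VERSION_MAJOR 0
#define MINO_VERSION_MINOR 98
#define MINO_VERSION_PATCH 6

/* Hypothetical check: returns 1 when tag ("vX.Y.Z") matches the
 * header constants, 0 otherwise. */
int tag_matches_header(const char *tag)
{
    int major, minor, patch;

    assert(tag != NULL);
    if (sscanf(tag, "v%d.%d.%d", &major, &minor, &patch) != 3) {
        return 0;       /* malformed tag */
    }
    return major == MINO_VERSION_MAJOR
        && minor == MINO_VERSION_MINOR
        && patch == MINO_VERSION_PATCH;
}
```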

v0.98.5 — Seedable PRNG + Minimal clojure.test.check Port

random-seed! is a new primitive that seeds the per-state PRNG (xorshift64* on S->rand_state) to a known integer so subsequent rand / rand-int / rand-nth calls produce a reproducible stream. Same seed in, same sequence out.

A minimal clojure.test.check port lands in three new bundled namespaces:

Shrinking is not implemented — failure reports return the unshrunk failing args with a :note explaining the limit.

clojure.spec.alpha/gen and clojure.spec.alpha/exercise no longer throw :mino/unsupported. They now consult clojure.test.check.generators to produce values matching common predicate forms (int?, string?, keyword?, boolean?, etc., both bare and clojure.core/* qualified) and the structural combinators coll-of, tuple, nilable, and, or. Specs that need a custom generator can pass an overrides map keyed by spec or predicate symbol.

mino has no separate char type, so the char family of generators yields single-character strings instead of character values, matching mino's existing (subs s i (inc i)) idiom.

v0.98.3 — Auto-Chunking Sources

(seq vector) and (range ...) now emit MINO_CHUNKED_CONS spines of 32-element chunks instead of flat cons cells. Vector leaves are already 32-wide (MINO_VEC_WIDTH=32), so the source-side chunking walks them directly via vec_nth into a new MINO_CHUNK per leaf; lazy range produces a fresh chunk of 32 (or however many remain) on each force.
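The "32 or however many remain" rule for lazy range can be sketched as follows. struct range_chunk and fill_range_chunk are hypothetical; mino's MINO_CHUNK layout is not shown in this changelog.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_WIDTH 32   /* mirrors MINO_VEC_WIDTH from the text */

/* Hypothetical chunk: up to CHUNK_WIDTH values plus a fill count. */
struct range_chunk {
    long   vals[CHUNK_WIDTH];
    size_t len;
};

/* Fills one chunk from the half-open range [start, end) and returns
 * the next start value, so each force yields 32 elements or however
 * many remain. */
long fill_range_chunk(struct range_chunk *c, long start, long end)
{
    assert(c != NULL);
    c->len = 0;
    while (start < end && c->len < CHUNK_WIDTH) {
        c->vals[c->len++] = start++;
    }
    return start;
}
```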

(chunked-seq? (seq [1 2 3])) and (chunked-seq? (range 10)) now return true. Map/filter/take/keep/keep-indexed/map-indexed already propagated chunkedness end-to-end (v0.96.8), so (reduce + (map inc (filter odd? (range 1e6))))-style pipelines now run end-to-end chunked without per-element cons-cell allocation.

array-map insertion-order semantics were verified to already match canon — MINO_MAP's companion key_order vector preserves insertion order through seq, assoc, and dissoc. mino's hash-map is more conservative than canon's (which has undefined order); no new MINO_ARRAY_MAP value type was needed.

Touched primitives that needed CHUNKED_CONS handling now that more seqs flow through them:

v0.98.2 — clojure.string/split 3-Arg Limit

(split s sep limit) now returns at most limit substrings, with the last element absorbing the rest of the input — matching canon's String.split(re, limit) for limit > 0. limit <= 0 keeps the existing no-cap behavior (which preserves trailing empties, the canon limit < 0 semantics). Char-split ((split s "" limit)) is covered by the same code path.
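The limit rule can be sketched in C for the simplified case of a single-character separator. split_limit and struct piece are hypothetical names; the real clojure.string/split also accepts patterns, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result slot: one piece as an offset and length into s. */
struct piece {
    size_t start;
    size_t len;
};

/* Splits s on a single-character separator. With limit > 0 at most
 * limit pieces are produced and the last one absorbs the rest of the
 * input; limit <= 0 means no cap. Returns the number of pieces
 * written (never more than max_out). */
size_t split_limit(const char *s, char sep, int limit,
                   struct piece *out, size_t max_out)
{
    size_t n = 0, start = 0, i, len;

    assert(s != NULL && out != NULL);
    len = strlen(s);
    for (i = 0; i <= len && n < max_out; i++) {
        int last_allowed = limit > 0 && n + 1 == (size_t)limit;

        if (last_allowed) {
            out[n].start = start;       /* tail keeps its separators */
            out[n].len = len - start;
            return n + 1;
        }
        if (i == len || s[i] == sep) {
            out[n].start = start;
            out[n].len = i - start;
            n++;
            start = i + 1;
        }
    }
    return n;
}
```

With the input "a,b,c,d" and a limit of 2, the second piece covers "b,c,d"; with a limit of 0 or less the same input yields four pieces, and trailing empty pieces are kept.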

Audit closures for the rest of the namespace:

v0.98.1 — compare Cross-Type Total Order

compare no longer throws when its two arguments straddle type tiers; it returns the canon total order instead:

`` nil < false < true < numbers < strings < symbols < keywords ``

(sort [:b 'a "c" 1 false nil]) now returns (nil false 1 "c" a :b) — same as canon Clojure.

Same-type compares are unchanged; same-tier-different-content mixes still throw if neither operand is comparable to the other (e.g., a record and a function in the same tier).
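One way to picture the cross-tier ordering is as a rank comparison over type tags. The enum below is hypothetical and simply lists the tiers in the order given above; it is not mino's value-type enum.

```c
#include <assert.h>

/* Hypothetical type tags, declared in the tier order from the text. */
enum tier {
    TIER_NIL,
    TIER_FALSE,
    TIER_TRUE,
    TIER_NUMBER,
    TIER_STRING,
    TIER_SYMBOL,
    TIER_KEYWORD
};

/* Cross-tier comparison: negative, zero, or positive, like compare.
 * Same-tier values would fall through to a type-specific comparison,
 * which this sketch omits. */
int tier_compare(enum tier a, enum tier b)
{
    assert(a <= TIER_KEYWORD && b <= TIER_KEYWORD);
    return (a > b) - (a < b);
}
```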

v0.98.0 — Macro Hygiene For Cross-NS :refer :all

Syntax-quote inside a macro body now qualifies bare symbols against the macro's defining namespace, not the consumer's *ns*. Before this fix, a clojure.core macro that referenced bare atom (or *out*, deref, etc.) inside a syntax-quoted (...) form would qualify those symbols to whichever namespace the consumer happened to be in, silently breaking expansions like with-out-str whenever the consumer had pulled the source ns in via :refer :all. The (a/go ...) form invoked from outside clojure.core.async similarly threw unbound symbol: chan* because the macro's bare references to its own private helpers were qualified to the consumer's ns.

qq_qualify_symbol (src/eval/eval.c) now consults S->fn_ambient_ns (the macro's defining ns, set by apply_callable) for both alias resolution and the env walk that finds the qualifying namespace. The check fires only when fn_ambient_ns differs from current_ns, which apply_callable arranges only for MINO_MACRO bodies — MINO_FN bodies leave them equal, so this is a no-op for fn calls.

clojure.test/is-eq, is-thrown, is-truthy lose ^:private, and the assert-pass! / assert-fail! helpers lose defn-. They were only "private" because the bug let public macros emit syntax-quoted references to them under the consumer's ns, which bypassed the privacy check. With the macro-hygiene fix these references correctly qualify to clojure.test, so the helpers must be public, matching canon clojure.test's pattern of public-but-internal helpers.

v0.97.5 — clojure.spec.alpha Introspection Utilities

clojure.spec.alpha gains the two canon introspection helpers:

The namespace now requires [clojure.walk :as walk]. Generators (gen, exercise) continue to throw :mino/unsupported.

v0.97.4 — Lift defn So Top-Of-File Predicates Use It

defn, defn-, defonce, and the private fn-arity-with-prepost helper move above the early type predicates in src/core.clj. With defn now available before not=, the seven bootstrap-era (def NAME "doc" (fn ...)) sites (not=, identity, ifn?, qualified-symbol?, simple-symbol?, qualified-keyword?, simple-keyword?) become regular defn forms. The defn macro itself only depends on special forms, primitive fns, and the macros already defined above its new position (when, cond, and, or, ->, ->>).

No behavioral changes; the full test suite still passes.

v0.97.3 — clojure.core.async Canon Combinators

Adds four canon channel combinators to clojure.core.async:

The namespace's :refer-clojure :exclude list now also drops reduce, transduce, and partition-by. The one internal use of clojure.core/reduce inside the go macro is now fully qualified so excluding the unqualified name doesn't break macro expansion.

v0.97.2 — src/core.clj Code-Quality Sweep

Walks src/core.clj to enforce the project's 80-char line limit. 157 long lines are gone (the longest was 226 chars on partition). Most cuts are docstrings that used to live on the same line as the defn signature; they now sit on their own line with a 3-space continuation indent.

Five macros (lazy-cat, delay, defprotocol, extend-protocol, defmulti) had their args vectors on the docstring line; each vector now moves to its own line beneath the docstring. Three inline anonymous fns (method metadata and method-defn builders inside defprotocol, and the descendants accumulator inside recompute-hierarchy) became letfn helpers with descriptive names. bit-test swaps (not (= 0 ...)) for (not= 0 ...). Two opportunistic idiom swaps: (when (not (coll? ...))) becomes when-not in shuffle, and (when (not (nil? idx))) becomes (when (some? idx)) in re-seq.

The six (def NAME "doc" (fn ...)) sites at the very top of the file (identity, ifn?, qualified-symbol?, simple-symbol?, qualified-keyword?, simple-keyword?) keep the def form because they load before defn itself is interned. Their docstrings are wrapped onto their own lines.

No behavioral changes; the full test suite still passes.

v0.97.1 — Sort-By and Reductions Arities

sort-by and reductions were single-signature [f & args] defns that branched on (count args) and silently returned nil on any arity outside the canon shapes. Both are now multi-arity: sort-by exposes [keyfn coll] and [keyfn cmp coll]; reductions exposes [f coll] and [f init coll]. Bad arities now throw the standard "no matching arity" diagnostic instead of producing a quiet wrong answer.

The wider audit of clojure.core arities walked the rest of the spot-check list (partition 4-arg, pop/peek on lists, subseq/rsubseq, nth 3-arg, assoc/dissoc n-arg, range 0-arg, subs 3-arg, min-key/max-key n-arg, concat 0/1/n-arg, zipmap/interleave arity coverage, apply, merge, update) and found everything else covered.

v0.97.0 — Kwargs Destructuring

& {:keys [...]} parameter lists now match Clojure 1.11+ canon. The runtime's map destructure accepts all three rest-args shapes: an inline keyword/value pair sequence ((g :k v :k v)), a single trailing map ((g {:k v})), and a mix of pairs followed by an override map ((g :k v {:k v})). The fix lives in bind_map_destructure in src/eval/bindings.c. :or defaults are now evaluated in the binding env, so symbols like some? resolve to their function values instead of being bound as the literal symbol.

iteration no longer carries a divergence note. Its signature is now [step & {:keys [somef vf kf initk] :or {...}}], matching canon.

v0.96.9

Adds workflow_dispatch to the release-build GitHub Actions workflow. GitHub drops tag-push events when more than three tags push in one batch, so the v0.95.* and v0.96.* canon-parity cycles never fired the workflow on tag push. The dispatch trigger lets the workflow run against any existing tag via gh workflow run release-build --ref <tag>. No runtime changes; the C version-define moves to 0.96.9 so the bump itself fires release-build under the new trigger.

v0.96.8 — Chunked-Seq Family

Adds the clojure.core chunked-seq surface: chunk-buffer, chunk-append, chunk, chunk-cons, chunk-first, chunk-rest, chunk-next, and chunked-seq?. Two new C value types back the implementation: MINO_CHUNK (a fixed-cap, mutable-then-sealed value buffer) and MINO_CHUNKED_CONS (a seq cell that carries a chunk plus an offset and a tail seq).

Chunked seqs participate in the seq protocol transparently: first, next, rest, seq, count, nth, reduce, equality ((= chunked flat) is true), and printing all walk a chunk-cons the way they walk a regular cons. The walk dispatches at the chunk level where possible — count sums chunk lengths, nth indexes into the underlying chunk, reduce honours chunk boundaries via the seq iterator.
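The chunk-level dispatch for count and nth can be sketched as below. The struct layouts and function names are hypothetical and hold plain long values; mino's MINO_CHUNK and MINO_CHUNKED_CONS definitions are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts, holding plain longs for illustration. */
struct chunk {
    const long *vals;
    size_t      len;
};

struct chunked_cons {
    const struct chunk        *chunk;   /* current chunk */
    size_t                     offset;  /* first live element in chunk */
    const struct chunked_cons *tail;    /* rest of the seq, or NULL */
};

/* count: sum the live length of each chunk instead of walking cells. */
size_t chunked_count(const struct chunked_cons *s)
{
    size_t n = 0;

    for (; s != NULL; s = s->tail) {
        assert(s->offset <= s->chunk->len);
        n += s->chunk->len - s->offset;
    }
    return n;
}

/* nth: skip whole chunks, then index into the one that holds i.
 * Returns fallback when i is out of range. */
long chunked_nth(const struct chunked_cons *s, size_t i, long fallback)
{
    for (; s != NULL; s = s->tail) {
        size_t live = s->chunk->len - s->offset;

        if (i < live) {
            return s->chunk->vals[s->offset + i];
        }
        i -= live;
    }
    return fallback;
}
```

Both walks cost one step per chunk rather than one per element, which is where the chunk-level fast path comes from.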

The C-level lazy combinators map, filter, and take propagate chunkedness end-to-end: when fed a chunked input, they read the head chunk in one go via chunk-first, build a fresh chunk via chunk-buffer/chunk-append/chunk, and emit a chunk-cons. The mino-level keep, keep-indexed, and map-indexed follow the same pattern, so longer pipelines preserve chunkedness across mixed C-level and mino-level steps.

Sources are not auto-chunked yet — (seq [1 2 3]) still returns a flat cons list, and (chunked-seq? (seq [1 2 3])) is false. The chunk-aware fast paths fire when consumers explicitly construct a chunked seq via the new primitives. Auto-chunking vectors and ranges is a follow-up cycle that needs the wider walker audit (mino_is_cons appears in 416 sites; see .local/BUGS.md-tracked notes).

v0.96.7 — `:refer :all` Drops Transitive Refers; Macros Get Vars

(require '[some.ns :refer :all]) previously bound every name present in the source ns env into the consumer — including names the source ns had referred *into* itself from clojure.core via auto-refer. Result: any consumer of a wrapper namespace silently re-bound every clojure.core name through that wrapper, shadowing its own clojure.core refers. Canon brings only the source ns's owned publics (matching (ns-publics 'src)); mino now does the same.

defmacro now interns a var alongside the env binding, so macros appear in (ns-publics 'ns) and propagate via :refer :all the same way defn does. Macro publics that previously slipped through only the env binding now show up in introspection too.

A separate macroexpansion-after-:refer :all defect is still open and tracked in .local/BUGS.md #9; the recommended idiom for now remains (require '[some.ns :as a :refer [...]]) with an explicit refer list when the consumer also calls macros defined in clojure.core.

v0.96.6 — Wrap `clojure.core.async`; Rename `merge-chans`/`async-into`

The two files that backed mino's CSP layer — lib/core/channel.clj and lib/core/async.clj — combine into lib/clojure/core/async.clj, declaring (ns clojure.core.async (:refer-clojure :exclude [merge into])). The pre-existing merge-chans and async-into names existed only to avoid shadowing clojure.core/merge and clojure.core/into for any consumer that loaded core/async; with the namespace wrap, that constraint goes away and the canon names are restored.

Consumers in mino's own test suite migrate from (require "core/async") to (require '[clojure.core.async :as a :refer [...]]) with an explicit refer list. The async surface stays bare in test bodies; the renamed merge and into are accessed as a/merge and a/into so they do not shadow clojure.core/merge and clojure.core/into in the test file's local namespace.

(into old modes) inside toggle switches to (clojure.core/into ...) because the unqualified call now resolves to the channel into.

The :refer :all shape is intentionally not used here. Mino's require :refer :all pulls every binding present in the source ns env, including transitive refers from clojure.core (atom, *out*, deref, ...) — that drag-along is itself a smaller silent-surprise debt tracked separately, and an explicit refer list sidesteps it for this consumer.

Sibling-repo consumers — mino-bench benches that (require "core/async"), the mino-site "Coming from Clojure" page that mentions merge-chans, and mino-site/parse/async_api.clj that reads both source files — update when their submodule pins advance.

v0.96.5 — `iteration` (Clojure 1.11)

iteration constructs a seqable from repeated calls to a step function: each step returns a value plus a continuation token. Used to consume paginated APIs and other batch sources where the producer exposes "give me the next page from here". The first call is deferred until the seq head is forced, so the step function may be impure.

The defaults match canon: :somef defaults to some?, :vf and :kf default to identity, and :initk defaults to nil.

Divergence from canon: opts are passed as a single map argument ((iteration step {:vf identity ...})), not as keyword args ((iteration step :vf identity ...)). Mino's & {:keys [...]} destructuring does not yet pick up trailing keyword pairs; a future cycle will close that gap and the canon-style call shape will work without code changes.

v0.96.4 — Small Canon-Parity Additions

comp and partial adopt canon's hand-unrolled fast-path shape: 0/1/2-arg comp and partial no-op or curry directly; the binary comp returns a fn with explicit 0/1/2/3-arg arities plus a variadic fallthrough; partial does the same for one-, two-, and three-arg prebound forms. The general n-arg form remains for the long tail.

some-fn and every-pred move from a single variadic implementation to canon's per-arity unrolled shape (1, 2, 3 preds × 0, 1, 2, 3 args plus variadic). The binary semantics are unchanged — both still short-circuit on the first decisive value — but the hot 1/2/3-pred case skips the iterator the variadic shape used.

into gains the missing 0-arg ((into) ;=> []) and 1-arg ((into to) ;=> to) forms that canon ships. The 2-arg (into to from) and 3-arg (into to xform from) forms are unchanged.

unchecked-divide-int is installed as an alias for quot — both are truncating integer division. Canon's unchecked-divide-int skips overflow checks because the JVM idiv instruction does; mino's quot is already a primitive C division on long, so no extra elision is needed.
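Truncating division on long is what C's / operator already provides, as this small sketch shows. quot_long is a hypothetical name; how mino's quot reports a zero divisor is not covered here.

```c
#include <assert.h>

/* Truncating integer division: C's / on integers rounds toward zero
 * (guaranteed since C99). No overflow check is made, so
 * LONG_MIN / -1 is left unhandled in this sketch. */
long quot_long(long n, long d)
{
    assert(d != 0);     /* division by zero is undefined in C */
    return n / d;
}
```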

The four (def name "doc" (let [helper ...] (fn ...))) forms left over from the prior cycle's hygiene pass — zipmap, cycle, partition-all, re-seq — convert to (defn name "doc" [args] (letfn [(helper ...)] ...)). The local helper now sits in a letfn (or directly in the body) where it can recur instead of self-reference; semantics are identical.

v0.96.3 — Transients in `frequencies`/`group-by`; `unreduced` Cleanups

frequencies and group-by rebuild their result map through a (transient {}) accumulator with assoc!, ending in persistent!. Both used to allocate a fresh persistent map per input element via update; the transient path drops that to one allocation per distinct key plus log-N batched writes.

get now treats a transient associative as transparent — it follows the transient's underlying persistent collection, matching canon's ITransientAssociative2 contract. find already did this; bringing get in line was needed for frequencies/group-by's (get acc x default) lookups against the transient accumulator.

The completion arities of partition-by and partition-all swap their inline (if (reduced? r) @r r) for the existing unreduced helper; the helper has been in src/core.clj since the Cycle G rewrite.

v0.96.2 — Lazy-Seq `recur`-On-Skip Rewrites

Four lazy-seq combinators that previously allocated a fresh lazy-seq cell on every input — including the ones they were going to skip — adopt canon's pattern: an outer step function produces a lazy-seq cell only when emitting, and an inner anonymous fn recurs when skipping. The rewritten sites are distinct (collection arity), drop-while (collection arity), keep-indexed, and dedupe (collection arity). dedupe's collection arity now delegates to (sequence (dedupe) coll), matching canon's shortcut.

The user-visible result on duplicate-heavy or long-skip inputs is one allocation per emitted value instead of one per element visited. The pre-existing drop-while collection arity used a non-lazy recursive walk; the rewrite restores lazy semantics that match canon.

v0.96.1 — Stateful Transducers Use Real `volatile!`

Ten transducer state slots in src/core.clj switch from (atom ...) plus swap! / reset! to (volatile! ...) plus vswap! / vreset!: take, drop, drop-while, take-nth, interpose, distinct, partition-by (both buf and pval), partition-all, map-indexed, and dedupe. The transducer contract already implies single-thread access to that state — the reducing fn is invoked from one thread at a time — so the watch + validator + atomic-publish overhead the atom carried was pure waste on every step.

The user-visible contract is unchanged: same primitives, same lazy-vs-eager arities, same return values. The change is per-step throughput on stateful-transducer pipelines once host threads enter the picture (single-threaded states avoided the CAS already, but still paid for the atom struct's extra slots).

v0.96.0 — `volatile!` Becomes a Real Type

Up to this release, volatile! was a Clojure-side alias for atom, which meant every transducer state slot paid for the atom's watch and validator pointers and (once host threads entered the picture) for the write barrier and atomic publish that swap! issues on multi-threaded states. Canon and ClojureScript both ship a real one-slot volatile cell because transducer state has a single owner — the reducing function — and does not need any of that infrastructure.

MINO_VOLATILE joins the value-type enum as a one-slot mutable cell with no watches, no validators, and no atomic publish. The four operations are now C primitives: volatile!, volatile?, vreset!, and vswap!. deref recognises a volatile in addition to atom, var, future, and reduced. The four Clojure-side aliases at the top of the volatile section in src/core.clj are gone; nothing in user code should notice because the surface and semantics are unchanged on single-thread reads and writes.

The print form is #volatile[VAL], (type v) returns :volatile, and (= (atom 1) (volatile! 1)) is now false because the two are distinct types. The MINO_VOLATILE enum entry is appended after MINO_ATOM, so the embedder ABI stays additive.
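As a rough picture of what "one-slot mutable cell" means, the sketch below holds a plain long and has no watch, validator, or atomic machinery. struct vol_cell, vol_reset, vol_swap, and add_one are hypothetical names; the real MINO_VOLATILE holds a mino value and its layout is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-slot cell: no watches, validators, or atomics. */
struct vol_cell {
    long val;
};

/* Analogue of vreset!: store v and return it. */
long vol_reset(struct vol_cell *c, long v)
{
    assert(c != NULL);
    c->val = v;
    return v;
}

/* Analogue of vswap!: apply f to the current value and store the
 * result with a plain write. */
long vol_swap(struct vol_cell *c, long (*f)(long))
{
    assert(c != NULL && f != NULL);
    c->val = f(c->val);
    return c->val;
}

/* Example update function for vol_swap. */
long add_one(long x)
{
    return x + 1;
}
```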

This release is the foundation for the stateful-transducer rewrite that ships in v0.96.1.

v0.95.5 — `src/core.clj` Hygiene Sweep

The bundled core library that ships inside the binary went through a naming and surface-form pass. Private helpers no longer carry a trailing underscore; mino now uses defn- (and def ^:private for non-fn vars) to communicate privacy the same way Clojure does. The private symbols renamed include fn-arity-with-prepost, map1, all-some?, map-n, match-whole, substring-index, re-find-on-matcher, type-marker-key, partition-protocol-specs, global-hierarchy, hierarchy-version, tc-ancestors, recompute-hierarchy, valid-hierarchy?, prefers?, find-best-method, create-multimethod, register-method, special-symbols-set, uuid-hex-pattern, uuid-string?, and tap-fns. The captured-primitive alias into_ becomes prim-into, using the prim- prefix that matches the convention in clojure.string. Two formerly-underscored protocol helpers are public surface and keep their canon names: internal-reduce and internal-reduce-kv (shadow the C primitives that the protocol-aware reduce and reduce-kv delegate to). protocol-dispatch stays public because it is emitted by the defprotocol macro into user namespaces.

Every definition past the bootstrap zone moved from (def name "doc" (fn [args] body)) to the equivalent (defn name "doc" [args] body). The bootstrap area at the top of the file (anything before the defn macro is bound) keeps the bare-def form because defn does not yet exist there. Roughly 120 forms changed shape; the binary semantics are identical because mino's defn macro expands to the same (def name doc (fn ...)) form underneath.

comparator no longer uses true as its catch-all clause in cond; it uses the canonical :else. some-fn was rewritten from a double-loop accumulator to a (some (fn [p] (some p args)) preds) expression; behaviour matches canon's "first truthy value of any pred against any argument" surface and the implementation is no longer an obstacle when reading the file.

v0.95.4 — `mino.tasks.builtin` and `clojure.string` Hygiene

gen-core-header no longer carries its own copy of the C-string-literal escape logic. The escape-source-as-c-string-literal helper now sits above both gen-core-header and gen-stdlib-headers, and both call into it. The escape rules can no longer drift between the two generators.

gen-stdlib-headers and qa-arch no longer thread accumulator atoms through their bodies. gen-stdlib-headers reduces over a per-file regen-stdlib-header helper that returns 1 or 0; the total update count is (reduce + 0 ...) instead of an (atom 0) updated inside a doseq. qa-arch follows the same shape: each gate (TU size, function size, abort inventory) is its own helper that prints its report and returns its failure count, and the top-level summary just adds them up.

clojure.string/index-of-from_ is renamed to index-of-from. The trailing-underscore-for-private convention is non-standard; the defn- on the helper already communicates privacy. re-quote-replacement no longer reinvents a per-character loop/reduce; it now delegates to the existing clojure.string/escape with a two-key char map for \\ and $.

v0.95.3 — `core.async` Canon Parity

onto-chan and to-chan are renamed to onto-chan! and to-chan! to match canon clojure.core.async. Both side-effecting bang-suffixed names communicate the same write intent canon does: onto-chan! puts each element of a collection onto a channel and (by default) closes it; to-chan! constructs a channel sized to a collection, fills it, and closes. No aliases are kept — alpha posture means call sites move forward in lockstep.

pipeline gains the canon 6-arg form [n to xf from close? ex-handler]. When the transducer throws, ex-handler is called with the exception and its return value (when non-nil) is forwarded as the replacement output; nil results are dropped. The 4-arg and 5-arg forms keep the same surface and now route through the new arity with a nil handler.

alts! accepts canon-style trailing kwargs in addition to its existing single-map form. (alts! ops :priority true :default :nope), (alts! ops {:priority true :default :nope}), and (alts! ops) all work. The dispatch normalises the trailing args via a small alts-opts-map helper that detects the legacy single-map call and otherwise rebuilds the opts map from the kwargs.

Two ad-hoc helpers in core/channel were collapsed into primitives: range-vec is now (vec (range n)) and shuffle-vec is now shuffle, both already in mino. pipeline-blocking remains a def alias for pipeline until a separate blocking-IO scheduler lands; the comment on the alias documents the divergence.

Two canon names that would shadow clojure.core/merge and clojure.core/into if defined unqualified — merge-chans and async-into — are intentionally still mino-spelled. Wrapping lib/core/async.clj and lib/core/channel.clj in their own namespace and updating every consumer to refer them is its own follow-up cycle and has been logged in the bug registry.

v0.95.2 — Decomposed `clojure.instant/parse-timestamp`

parse-timestamp was a single ~70-line cond inside one driver loop, mixing per-segment parsing with bounds checks and the position-marker cascade that decides which segment fires next. Both halves are now separate: each ISO 8601 component lives in a small parse-month-segment, parse-day-segment, parse-time-segment, parse-second-segment, parse-frac-segment, or parse-zone-segment helper that takes [s idx m] and returns [m new-idx]. The driver loop is a one-screen cond over the next-segment marker that delegates to a helper and recurs on the returned position.
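The "take a position, return the parsed value and the new position" shape of these helpers can be illustrated in C. This is an analogue only: the real helpers are mino functions over [s idx m], and struct seg and this parse_month_segment are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result: the parsed value plus the index to resume at.
 * next_idx is -1 when the segment does not match. */
struct seg {
    int value;
    int next_idx;
};

/* Parses a "-MM" month segment starting at s[idx]. The caller must
 * keep idx within the string (at most its length). */
struct seg parse_month_segment(const char *s, int idx)
{
    struct seg r = { 0, -1 };
    int month;

    assert(s != NULL && idx >= 0);
    if (s[idx] != '-'
        || !isdigit((unsigned char)s[idx + 1])
        || !isdigit((unsigned char)s[idx + 2])) {
        return r;       /* not a month segment */
    }
    month = (s[idx + 1] - '0') * 10 + (s[idx + 2] - '0');
    if (month < 1 || month > 12) {
        return r;       /* bounds check lives with the segment */
    }
    r.value = month;
    r.next_idx = idx + 3;
    return r;
}
```

A driver loop would call one such helper per segment and continue from next_idx, which keeps the bounds checks out of the dispatch logic.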

Inline (parse-long (nth s j)) truthiness as a digit test became a named digit? predicate so the fractional-seconds scan reads as intent. The public parse-timestamp, validated, and read-instant-date surface is unchanged; the existing tests/instant_template_test.clj (27 instant assertions) covers the refactor.

v0.95.1 — Dynamic-Var `clojure.test` Internals

clojure.test previously kept its pass/fail counters, testing-context stack, and current-test name in atoms named with earmuffs (*test-state*, *testing-context*, *current-test*). Earmuffs signal a dynamic var meant for binding-style rebinding; an atom behind one is a smell, and canon clojure.test uses real ^:dynamic vars + binding for these. mino now does the same: pass/fail counters live in *report-counters* (canon name) bound to a fresh atom inside each run-tests call; the testing-context stack lives in *testing-contexts* (canon name) and is pushed via binding inside the testing macro; *current-test* is bound per test. The cross-file suite-mode flag (suite-mode) stays a plain atom because require evaluates a loaded file outside the caller's dynamic scope.

run-tests is now library-friendly: it returns the summary map {:test n :pass n :fail n :error n :failures [...]} instead of calling (exit ...), and it accepts an [& namespaces] arity that filters the registry to tests registered in those namespaces. Process exit moved to a small run-tests-and-exit wrapper used by tests/run.clj and the per-file bottoms.

The is macro previously dispatched three branches inline; it now dispatches into private is-thrown, is-eq, is-truthy helpers. The internal assert-pass!, assert-fail!, and thrown?-form? are private (defn-).

v0.95.0 — Reduce-Based `clojure.data/diff`

clojure.data/diff-map and diff-sequential previously threaded three mutable atoms (only-a, only-b, both) through a doseq or loop/recur driver, accumulating shape via swap! on each step. The standard treats earmuffs and swap!-as-fold as a smell when a plain reduction would do, and the canon clojure.data implementation is itself a reduce over a three-element accumulator.

Both helpers are now reduce over [only-a only-b both] triples (starting from [nil nil nil] for maps and [[] [] []] for sequentials), with no atoms in flight. Behaviour is unchanged — the same diff triples come out for maps, sequentials, sets, scalars, and mixed-type inputs — and a new tests/data_test.clj covers the public surface (14 tests, 21 assertions) so the next refactor pass has a real safety net.

mino --version and the REPL silently failed under PowerShell on fresh Windows installs (Scoop or Homebrew-on-Windows). Exit code -1073741515 (0xC0000135, STATUS_DLL_NOT_FOUND) showed the binary never started: mingw-gcc by default produces an exe that imports libgcc_s_seh-1.dll and libwinpthread-1.dll. The GHA runner has those DLLs in scope (so the release-build smoke test passed), but a clean Windows install doesn't.

The bootstrap Makefile now passes -static to the linker on Windows_NT, so mingw's runtime gets baked into mino.exe. macOS and Linux remain dynamically linked. This makes the v0.94.4 stdout-buffering patch actually observable too, since the binary now runs.

v0.94.4 — Force Line-Buffered Stdout on Windows

mino --version and mino (REPL) printed nothing when launched from PowerShell against a Scoop install. The Git Bash path on the same binary worked: the GHA release-build's smoke step ran mino.exe --version under Git Bash and got the expected output. The difference is buffering — MSVCRT's stdout is block-buffered when stdout is not a tty (which the Scoop shim's PowerShell pipeline looks like), and the shim's child-process plumbing doesn't always propagate the buffered tail when mino.exe exits.

main() now calls setvbuf(stdout, NULL, _IOLBF, 0) and setvbuf(stderr, NULL, _IONBF, 0) on _WIN32 at program start. Each fprintf flushes on newline (or immediately, for stderr) regardless of how the binary is invoked. macOS and Linux are unchanged.

v0.94.3 — bundle.awk Sidesteps MSYS Path Translation

v0.94.2 moved the bundled-source escape from sed to awk, but kept the script inline on the command line. Git Bash on Windows mangled awk's inline /\\/ regex literal through the same MSYS path-translation heuristic that broke sed: argument fragments that look path-shaped get rewritten before the tool parses them. The Windows job's Bootstrap step in v0.94.2's release-build matrix surfaced empty headers a second time and the Release artifact for Windows didn't upload (so scoop install mino against v0.94.2 would have 404'd just like v0.94.1).

The escape script now lives in src/bundle.awk. The recipe invokes awk -f src/bundle.awk "$src" — the -f argument is a file path, which path translation handles correctly, and the script body never appears on the command line at all. Output is byte-identical to all prior implementations across the 20 generated headers; the full test suite passes (1460 / 7017). With Windows Bootstrap genuinely green, the v0.94.2 cleanup of continue-on-error and fail-fast finally takes effect: the Windows artifact rejoins the Release matrix.
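To show what escaping a source line into a C string literal involves, here is a sketch of the transformation written in C rather than awk. The rules it applies (backslash and double quote escaped, a trailing \n added) are assumed for illustration; the contents of src/bundle.awk are not shown in this changelog, and escape_line is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical escape rules, assumed for illustration. Writes one
 * source line as a quoted C string literal ending in \n. Returns the
 * length written, or 0 when dst is too small. */
size_t escape_line(const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    const char *p;

    assert(src != NULL && dst != NULL);
    /* Worst case: every char doubles, plus two quotes, "\n", and NUL. */
    if (cap < 2 * strlen(src) + 5) {
        return 0;
    }
    dst[n++] = '"';
    for (p = src; *p != '\0'; p++) {
        if (*p == '\\' || *p == '"') {
            dst[n++] = '\\';    /* escape backslash and double quote */
        }
        dst[n++] = *p;
    }
    dst[n++] = '\\';
    dst[n++] = 'n';
    dst[n++] = '"';
    dst[n] = '\0';
    return n;
}
```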

v0.94.2 — Portable Bootstrap, Windows Rejoins Releases

The bootstrap Makefile recipe now uses awk instead of sed to escape each lib/<ns>.clj source into its src/<sym>.h C string literal. Sed via Git Bash on Windows mangled the leading-slash regex argument through MSYS path translation and emitted empty headers; awk's script body starts with { and the regex literals are internal tokens, so the recipe is one source for every platform. Output is byte-identical across all 20 generated headers.

With the recipe portable, the continue-on-error guards that were masking the Windows Bootstrap failure go away: ci.yml's Windows Bootstrap step is no longer informational, and release-build.yml drops its job-level continue-on-error: ${{ matrix.os == 'windows' }}. The Windows artifact rejoins the Release matrix; scoop install mino works against the v0.94.2 zip again. (The Test step on Windows stays informational; that's a separate cmd.exe trailing-space quirk in the proc-test assertions, unrelated to the bootstrap.)

No runtime behaviour changes vs v0.94.1; this is a build-pipeline patch.

v0.94.1 — Release-Build Windows Guard

Patch fix for the v0.94.0 release pipeline. The Windows release-build job tripped the same Git Bash sed quirk that ci.yml already gates around — the bootstrap Makefile recipe escapes differently than POSIX sed and emits empty bundled-source headers. ci.yml had been marking its Windows Bootstrap step continue-on-error since v0.93.0; release-build.yml was missing the same guard, and fail-fast: true was cancelling the otherwise-green macOS jobs. The release-build matrix now runs with fail-fast: false and the Windows job is informational at the job level until a portable Makefile recipe lands. Linux and macOS artifacts are the authoritative release set.

No runtime behaviour changes; if you build from source on Linux, macOS, or via the bootstrap Makefile, this release is identical to v0.94.0.

v0.94.0 — Empty-List Canon Parity

The empty list () is now a real value type, distinct from nil. This matches Clojure's canonical semantics where the empty list, an empty vector, and an empty seq compare equal but none of them equal nil. The cycle also folds in three post-v0.93.0 fixes that have been sitting on main: the bootstrap Makefile, the Windows informational guard, and the disk-wins-over-bundled resolver fix.

Empty-list canon parity (breaking). The reader, the (list) constructor, and every primitive that surfaces an empty seq result now produce the canonical empty-list singleton instead of nil. User-visible behaviour flips on five axes:

Internally, cons-cell cdrs still terminate on nil (the precise GC treats nil as the canonical end-of-chain marker), and the lazy thunk contract still returns nil to mean "no more elements". The translation to () happens at the user-facing seam — first, rest, seq, count, equality, and the printer — so embedders walking cons chains via mino_is_cons see no behaviour change.
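The split between the internal nil terminator and the user-facing () can be pictured with a miniature value model. This is a sketch only: the val struct, EMPTY_LIST, user_rest, and user_count are hypothetical names and do not reflect mino's real layout or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature value model; not mino's real layout. */
typedef enum { V_INT, V_CONS, V_EMPTY_LIST } vtype;

typedef struct val {
    vtype type;
    int n;              /* payload for V_INT */
    struct val *car;    /* V_CONS only */
    struct val *cdr;    /* V_CONS only; NULL (nil) ends the chain */
} val;

/* The canonical empty-list singleton. */
static val EMPTY_LIST = { V_EMPTY_LIST, 0, NULL, NULL };

/* User-facing rest: never returns nil; an exhausted chain becomes (). */
val *user_rest(val *v)
{
    if (v == NULL || v->type != V_CONS || v->cdr == NULL) return &EMPTY_LIST;
    return v->cdr;
}

/* User-facing count: walks raw cdrs, which still end on NULL. */
size_t user_count(const val *v)
{
    size_t n = 0;
    for (; v != NULL && v->type == V_CONS; v = v->cdr) n++;
    return n;
}

/* nil and () are distinct values at the seam. */
int is_nil(const val *v) { return v == NULL; }
int is_empty_list(const val *v) { return v == &EMPTY_LIST; }
```

Code that walks the raw chain keeps seeing NULL at the end, while anything going through the user-facing functions sees the singleton.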

Bootstrap Makefile. A 75-line top-level Makefile generates the bundled-source headers and compiles ./mino in one make invocation; that's the entire bootstrap surface. Everything beyond a clean checkout still lives in ./mino task. README, both CI workflows, and mino-site's deploy use it. Windows uses $(OS) to pick up the .exe suffix; the Bootstrap step there is continue-on-error: true because Git Bash's sed handles the recipe's escape pattern differently than POSIX sed and emits empty headers — Windows test posture is already informational, and a portable recipe is its own follow-up.

Resolver: disk wins over bundled. v0.93.0's bundled-stdlib registry shadowed user-supplied overrides on disk. The lookup order flips: a lib/<ns>.clj file on the resolver's path wins over the bundled copy, with the bundled copy as the brew/scoop fallback. This unblocks mino-bench's lib/mino/tasks/builtin.clj override (which adds a perf-gate task the builtin doesn't ship). Brew and Scoop installs see the same behaviour as v0.93.0 because they don't ship a lib/ tree, so the bundled fallback fires.
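A minimal sketch of the flipped lookup order, assuming both sources can be modelled as simple name-to-source tables. The lib_entry type and resolve_lib function are hypothetical; the real resolver reads files from its search path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical source table; stands in for either the disk or the
 * bundled registry. */
typedef struct { const char *ns; const char *source; } lib_entry;

static const char *find(const lib_entry *tab, size_t n, const char *ns)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(tab[i].ns, ns) == 0) return tab[i].source;
    return NULL;
}

/* Disk first, bundled copy as the fallback. */
const char *resolve_lib(const lib_entry *disk, size_t ndisk,
                        const lib_entry *bundled, size_t nbundled,
                        const char *ns)
{
    const char *src;
    assert(ns != NULL);
    src = find(disk, ndisk, ns);
    return src != NULL ? src : find(bundled, nbundled, ns);
}
```

With an empty disk table, which is the Brew and Scoop situation described above, every lookup falls through to the bundled copy.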

v0.93.0 — C Refactoring Pass

Top-down legibility pass over the C runtime. Behaviour is unchanged for script authors and embedders; the work is structural — splitting god functions into named helpers, documenting lock and ownership contracts, and removing dead helpers — so future changes land more cleanly. All commits in the cycle pass the full mino test suite (1453 tests, 6991 assertions) and a clean macOS build.

Trust model and lock contracts. Three subsystem entry points now state their authority and threading model in a banner comment: prim/proc.c and prim/fs.c declare that the script author is the trust boundary (primitives validate shape, not intent — embedders that want to forbid shell-out or filesystem mutation refuse to bind these primitives in the embedder's namespace); runtime/state.c declares the single-embedder lifecycle of mino_state_t. Every public-API entry point in runtime/host_threads.c (mino_promise_deliver, mino_future_cancel, worker_run, mino_future_spawn, mino_host_threads_quiesce, mino_future_gc_sweep) now states the lock invariant it relies on or maintains. The relaxed-read on S->thread_count is documented at both the reader (mino_thread_count) and writer (mino_future_spawn, worker exit) sites so its deliberately-loose contract is no longer implicit.

God-function surgery. Eight large functions were split along natural seams into named helpers:

File-level smell sweeps. Per-pattern helpers were extracted to flatten near-identical sites in five files:

Code-level fixes. runtime_module_add_alias returns int instead of void; all five callers now surface OOM as a catchable internal/MIN001 exception instead of silently dropping the alias. prim_random_uuid swaps sprintf for snprintf for hygiene (the buffer was already correctly sized so this is not a fix). ns_process_require_spec_ex now sets a loud MSY001 diagnostic when an alias, module, refer, or rename name exceeds the 256-byte stack-buffer limit; previously the entry was silently skipped.

Defensive overflow guards. Five buffer-grow paths previously did unguarded cap*2 or len+1 arithmetic. None are reachable today, but the invariant is now explicit:
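As an illustration of the kind of guard described here (not the runtime's actual code), a guarded cap*2 and len+1 step on a hypothetical growable buffer might look like this. The buf_t type and both function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable byte buffer. */
typedef struct { char *data; size_t len; size_t cap; } buf_t;

/* Double the capacity, refusing when cap * 2 would wrap size_t.
 * Returns 1 on success, 0 on overflow or allocation failure; the
 * buffer is left untouched on failure. */
int buf_grow(buf_t *b)
{
    size_t new_cap;
    char *p;
    assert(b != NULL);
    if (b->cap > SIZE_MAX / 2) return 0;          /* cap * 2 would overflow */
    new_cap = b->cap == 0 ? 16 : b->cap * 2;
    p = realloc(b->data, new_cap);
    if (p == NULL) return 0;
    b->data = p;
    b->cap = new_cap;
    return 1;
}

/* Append one byte, guarding the len + 1 step as well. */
int buf_push(buf_t *b, char c)
{
    assert(b != NULL);
    if (b->len == SIZE_MAX) return 0;             /* len + 1 would overflow */
    if (b->len + 1 > b->cap && !buf_grow(b)) return 0;
    b->data[b->len++] = c;
    return 1;
}
```

The check runs before the multiplication, so the wrapped value is never computed, let alone passed to the allocator.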

Dead-code removal. diag_add_note_at and diag_set_cause were declared and defined but never called from anywhere in the repo or in any sibling consumer (mino-bench, mino-examples, mino-site). They are not part of the public mino.h embedding surface; removed without a deprecation shim per the alpha posture.

Public-header polish. src/mino.h had a doc-only sweep: removed stale references to deleted code paths, replaced "see mino.c" / "see rbtree.c" with "opaque to embedders" for forward-declared types, removed remaining cycle-name references from inline comments, and renamed an internal-jargon section banner to a shape-describing one.

mino_state god-struct seam map. Eight banner comments inside mino_state (GC, value caches, modules, printer/reader, namespaces, misc per-state, host threads, async) name the conceptual sub-states that share fields. No memory layout changes — the banners give later refactors a seam to split along.

Bundled mino tooling. mino deps and mino task previously required lib/mino/*.clj to be reachable from cwd, so brew-installed mino on a project without a sibling lib/ couldn't use the built-in tooling without a symlink or submodule. The three sources (lib/mino/deps.clj, lib/mino/tasks.clj, lib/mino/tasks/builtin.clj) now bundle into the binary the same way the clojure.* stdlib does: gen_header escapes each into a C string literal, and a new mino_install_mino_tooling install hook registers them via mino_register_bundled_lib. Standalone projects work from any cwd. Embedders that don't expose those subcommands can omit the install hook.

Empty-list type scaffolding (foundation for a later cycle). A new MINO_EMPTY_LIST value type and mino_empty_list(S) accessor sit in the runtime as scaffolding; nothing produces or consumes the singleton in v0.93.0. Wiring it through the reader, sequence primitives, and equality lattice to fix the (list) ⇒ nil divergence requires updating ~70 compatibility tests that currently rely on the legacy "empty seq is nil" semantics, so the user-visible parity work was deferred to a later cycle. The type sits in mino_type_t as an explicit seam; embedders can ignore it.

v0.92.1 — CI And Linux Build Fixes

Patch release covering build-pipeline fixes that surfaced after v0.92.0 went out. No runtime-visible behaviour changes.

Linux build. src/runtime/state.c uses PTHREAD_MUTEX_RECURSIVE, which glibc gates behind _XOPEN_SOURCE >= 500. Without the macro the constant is undeclared and the build fails on Linux. Define _XOPEN_SOURCE 600 at the top of runtime/internal.h so glibc exposes it to every translation unit. macOS and Windows are unaffected.
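The pattern in isolation, as a sketch: the feature-test macro is defined before any system header, and the recursive mutex type is then available. The helper name init_recursive_mutex is hypothetical; the pthread calls are standard POSIX. On older glibc versions the program may also need to be linked with -pthread.

```c
/* Must come before any system header so glibc sees it. */
#define _XOPEN_SOURCE 600

#include <assert.h>
#include <pthread.h>

/* Initialise a recursive mutex; returns 0 on success, or the error
 * code from the failing pthread call. */
int init_recursive_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;
    assert(m != NULL);
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

A mutex initialised this way can be locked again by the thread that already holds it, provided each lock is matched by an unlock.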

CI bootstrap. The bundled-stdlib generator that produces lib_clojure_*.h ships as a mino task, so the manual bootstrap step in ci.yml, release-build.yml, and the README only generated core_mino.h. After install_stdlib.c was added, every fresh checkout failed at link time on lib_clojure_string.h: not found. Replace the inline sed with a gen_header shell function called once per bundled namespace.

CI test step. ./mino task test wraps the suite invocation in sh!, which buffers stdout until the subprocess exits; under a hang, no diagnostic ever surfaces. Invoke ./mino tests/run.clj directly from the workflow so per-test output streams as it's emitted, and cap the step at 8 minutes so a deadlock fails fast instead of waiting on the 6h job-default.

Test fan-out cap. concurrent-atom-cas and blocking-many-cross-thread-pings hard-coded n=4 worker futures plus the test thread, which blew past the runtime grant on a 3-vCPU shared CI runner. Cap n at (dec (mino-thread-limit)) so the suite still validates atomicity and cross-thread channel parking on small machines.

Channel close drain. Folded into v0.92.0 retroactively but worth calling out for embedders who hit it on the fix-tag pre-release: a parked <!!/>!! waiter that was supposed to be released by close! could deadlock because close! scheduled the wake-callback without draining the run queue. Producers calling close! are typically the only thread that could pull the wake off the queue, so the parked thread waited forever. close! now drains at the tail.

Windows test informational. tests/proc_test.clj asserts exact stdout from sh "echo" "...", which on Windows comes back with a trailing space before \n because of cmd.exe's echo quirk. The build still must pass; the proc-test cases are marked continue-on-error on Windows until those tests are rewritten in a platform-portable way.

v0.92.0 — Audit and Doc Realignment

Cycle G4.6 closes the host-threads slice with a sanitizer audit, a documentation pass, and one bug fix surfaced while writing the Performance page.

Audit. Full test suite runs ASan-, UBSan-, and TSan-clean. Perf smoke matches the v0.91.0 baseline. The slot-tracking and GC-sweep fixes from v0.90.0 hold under repeated stress runs.

Channel close fix. close! now drains the run queue after scheduling wake-callbacks for parked takers and putters. Without the drain, blocking <!!/>!! calls could deadlock when close! was the only signal that could release them, because the producer thread returns immediately and no one else runs the scheduler. Surfaced while writing the cross-thread channel ping-pong benchmark for the new Performance page; reproducible at modest iteration counts before the fix.

Site refresh. mino-site realigns positioning around four pillars ("Drop into any host with C FFI", "Isolated runtimes with explicit message-passing", "Capability-gated host interop", "Clojure-inspired ergonomics"). Top nav trims to Get Started, Documentation, GitHub. The documentation hub reorganises into Embed, Script, Reference, and Internals sections with role chips at the top. Host-thread rows in the compatibility matrix and intentional-divergences page now reflect the shipped runtime, not the API-shipped/runtime-pending state from v0.84.0. The Coming-from-Clojure concurrency section gains a Futures, promises, threads subsection covering the OS-thread parking model.

Performance page refresh. Single-thread numbers re-measured against v0.92.0 on the M3 Pro reference machine. New Concurrency section reports future spawn + deref roundtrip, atom-CAS contention scaling under the per-state GIL, and blocking-channel cross-thread ping-pong throughput. New Footprint and Startup section reports stripped binary size, source-tree size, vendor size, bundled-stdlib size, and cold REPL invocation time. Banner shifts from "preliminary results" to a versioned line that names the binary and hardware.

Internal cleanup. Phase and version refs stripped from src/runtime/host_threads.c and tests/host_threads_test.clj. examples/embed_host_threads.c removed; examples/embed_multi_tenant_threads.c covers the same ground end-to-end.

v0.91.0 — Embed-Distinctive Thread API

Three knobs let embedders shape mino's threading without forking the runtime: a host thread pool, a per-worker lifecycle factory, and a per-worker stack size. Default behaviour (spawn-per-future) is unchanged when none of them are set.

mino_set_thread_pool. Hand mino a host pool — Tokio runtime, libuv, ASIO, custom pthread pool — and every (future ...) submits a work item via pool->submit_fn instead of calling pthread_create. The same pool can be bound to multiple mino_state_t for multi-tenant patterns: per-NPC AI, per-tenant script sandbox, per-buffer linter, chat-bot fleet. The pool's N workers fan out across all states; each work item carries its own ctx and finds the right state via impl->state. Pool-managed quiesce uses cv-wait on the future's mu since mino doesn't own the pthread; spawn-per-future quiesce keeps pthread_join.
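The work-item-carries-state idea can be sketched without any real threads. Everything below is hypothetical: the entry above names pool->submit_fn and a per-item ctx, but the struct layouts, the signatures, and the inline "pool" that runs items on the caller are assumptions made so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shapes; the exact signatures here are assumptions. */
typedef void (*work_fn)(void *ctx);

typedef struct pool {
    int (*submit_fn)(struct pool *self, work_fn fn, void *ctx);
    int submitted;              /* bookkeeping for this sketch only */
} pool;

typedef struct { int id; int completed; } tenant_state;

/* Each work item carries a pointer back to the state that owns it. */
typedef struct { tenant_state *state; int arg; int result; } work_item;

static void run_item(void *ctx)
{
    work_item *w = ctx;
    w->result = w->arg * 2;
    w->state->completed++;
}

/* Stand-in for a host pool: runs the item inline on the caller. A real
 * pool would enqueue it for one of its workers. */
static int inline_submit(pool *self, work_fn fn, void *ctx)
{
    self->submitted++;
    fn(ctx);
    return 0;
}

/* Submit one item; the pool itself knows nothing about tenants. */
int spawn_on_pool(pool *p, work_item *w)
{
    assert(p != NULL && p->submit_fn != NULL && w != NULL);
    return p->submit_fn(p, run_item, w);
}

pool make_inline_pool(void)
{
    pool p = { inline_submit, 0 };
    return p;
}
```

Because the state pointer travels with the item, one pool can serve any number of states without a per-state worker.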

mino_set_thread_factory. Install start/end callbacks that fire on the worker thread for the spawn-per-future path. Use for naming (pthread_setname_np), CPU affinity, priority class, or tracing-context propagation. Pool-managed workers run under the pool's own lifecycle hooks.

mino_set_thread_stack_size. Per-worker stack size for the spawn-per-future path. Defaults to platform default. Useful for tight-RSS embedders running many small futures. Pool workers ignore it (the pool decides).

Quiesce drops the GIL. Previously a recursive caller (most common: prim_exit from inside a script-side (exit ...)) would deadlock on pthread_join because the worker needed the same state_lock to publish its result. mino_host_threads_quiesce now yields the lock before joining and re-acquires after.

Worked example. examples/embed_multi_tenant_threads.c spins up six tenants over three shared pool workers and round-trips a future from each tenant. Demonstrates the work-item-carries-state-pointer model end-to-end.

v0.90.0 — Blocking Channel Ops Park Across Threads

<!!, >!!, and alts!! outside a go block now do real OS-thread blocks when host threads are granted. The matching producer or consumer can run on any worker; the calling thread parks on a promise and is woken when the other side fires the callback. This closes the last gap that made channel-based coordination single-threaded in practice.

Behaviour by mode. Each operation registers its callback on the channel and drains the scheduler once. If the result lands during that drain, return it. Otherwise: when (mino-thread-limit) > 1, park on the promise indefinitely (canonical Clojure semantics — no deadlock detection, since another thread can always supply the value). When threads are not granted, fall back to the cooperative drain-loop and throw on no progress (so a lone driver thread can't lock itself).

thread shares the future pool. (thread body) is now a stable alias for (future-call (fn [] body)); the docstring is no longer phrased as a temporary alias. Same worker pool, same lifecycle, same thread-limit budget.

Slot-tracking fix. S->thread_count now decrements when a worker exits, not only on quiesce. Previously, after spawning N futures, the count stayed at N even when all had completed — so a long-running standalone session would eventually hit the limit despite no live workers. The pthread itself remains joinable until mino_host_threads_quiesce; pthread_join on an exited joinable thread returns immediately. The limit now bounds *concurrently live* workers, matching JVM Clojure's future semantics.

GC sweep detaches future from list. Latent in v0.89 but masked by the slot bug above: mino_future_gc_sweep freed the impl without unlinking it from S->future_list_head, so a later mino_quiesce_threads (called from prim_exit and state_free) walked into a freed pointer. Sweep now joins the worker thread (a no-op if it has already exited), removes the impl from the list, and only then destroys mu/cv and frees the struct. ASan caught it on the new cross-thread tests once the slot fix let GC run on resolved futures; both ASan and TSan are clean across the full suite after the fix.

Tests. Cross-thread parking tests cover the multi-threaded path (producer in one future, consumer on the test thread; alts winning across threads; N×M ping stress). The single-threaded deadlock tests are gated on (<= (mino-thread-limit) 1) so the standalone suite doesn't hang on canonical-park behaviour. TSan-clean across the full suite.

v0.89.0 — Real Host Threads

Real OS-thread futures and promises. (future expr), (thread expr), (promise), deliver, realized?, future-cancel, future-done?, future-cancelled?, future? all work end-to-end against pthread-backed workers (CreateThread on Windows). Standalone ./mino grants cpu_count after mino_install_all so REPL users get the canonical surface without configuration; embedders raise the limit per state via mino_set_thread_limit.

New value type: MINO_FUTURE. A future cell holds a malloc-owned impl struct with mu/cv, state machine (PENDING/RESOLVED/FAILED/CANCELLED), result+exception slots, cancellation flag, and OS thread handle. Promises share the type (no thread; deliver writes the result directly). Identity equality. Prints as #<future:state>.

TLS-backed ctx accessor. Worker threads allocate their own mino_thread_ctx_t at entry, install via TLS, and link onto S->worker_ctxs_head so GC root scanning walks every blocked worker's gc_save and dyn_stack. The embedder thread leaves TLS NULL and falls through to &S->main_ctx. ~415 sites migrated from S->ctx->FIELD to mino_current_ctx(S)->FIELD; per-state field removed.
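A trimmed-down sketch of the lookup, assuming C11 _Thread_local. The struct shapes and the names current_ctx, worker_enter, and worker_leave are hypothetical; the real structs carry far more fields, and this sketch ignores the question of one thread serving several states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down shapes. */
typedef struct thread_ctx { int eval_steps; } thread_ctx;
typedef struct state { thread_ctx main_ctx; } state;

/* Worker threads install their own ctx here; the embedder thread
 * leaves it NULL. */
static _Thread_local thread_ctx *tls_ctx = NULL;

/* Mirrors the described lookup: TLS if set, else the state's main ctx. */
thread_ctx *current_ctx(state *S)
{
    assert(S != NULL);
    return tls_ctx != NULL ? tls_ctx : &S->main_ctx;
}

/* What a worker would do at entry and exit (hypothetical names). */
void worker_enter(thread_ctx *ctx) { tls_ctx = ctx; }
void worker_leave(void) { tls_ctx = NULL; }
```

Call sites written against the accessor do not need to know which kind of thread they are running on.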

Per-state recursive mutex. mino_lock(S) / mino_unlock(S) take a recursive state_lock at the boundaries of mino_eval, mino_eval_string, and mino_call. Workers and the embedder thread serialize within one state; cross-state work runs fully concurrent. ctx->lock_depth tracks recursion so mino_yield_lock / mino_resume_lock can drop the lock entirely around a blocking cv_wait in mino_future_deref, then re-acquire to the saved depth. The lock is uncontested in single-threaded states; cost is one mutex-acquire per public eval entry.

GC suppression while workers are alive. gc_driver_tick skips collection when thread_count > 0. The conservative stack scan only walks the current thread's stack, so a GC initiated from one thread can't see another thread's stack-rooted values. Memory normalizes after mino_quiesce_threads. Cycle G4.4+ replaces this with safepoint-driven per-thread stack snapshots for true concurrent GC.

Lifecycle. mino_quiesce_threads(S) joins every outstanding worker. Called automatically from mino_state_free and from (exit ...) so workers don't run after the state is torn down. Embedders also call it directly to wait for in-flight futures before doing other work.

TSan-clean. Full suite (1449 tests, 6987 assertions) passes under -fsanitize=thread. The host_threads test exercises spawn + deref, promise + deliver, future-cancel, the future? predicate, and a 4-future × 250-iter atom CAS contention test (lost updates caught via the v0.87.0 atomic CAS upgrade).

Documented limitation: v0.89 single-state futures execute serialized; cross-state futures sharing a host pool run fully concurrent (no shared lock). Cycle G4.4 introduces blocking channel ops + core.async/thread unification; G4.5 adds the embed-distinctive surface (mino_set_thread_pool, mino_set_thread_factory, mino_set_thread_stack_size); G4.6 relaxes single-state serialization with per-thread allocator arenas and finer-grained registry locks.

v0.88.0 — Safepoint Poll And STW Request For Major GC

Mutators now poll a per-thread should_yield flag at canonical safepoints so a stop-the-world major collection can run with a stable view of the heap. Locations: eval_impl entry (folded into the existing limit / interrupt gate), gc_alloc_typed prologue, and the two loop / recur backward branches in eval/bindings.c and eval/fn.c. The fast path is one predictably-not-taken volatile read; the slow path (mino_safepoint_park) blocks the mutator until the collector signals release.

The major GC driver wraps its sweep in gc_request_stw / gc_release_stw. Single-threaded today these are O(1) flag toggles on S->main_ctx with no contention; the GC is itself the mutator and is at a safepoint by definition. Cycle G4 later sub-cycles iterate the worker set and use a condition variable for park / release.

The flags themselves: ctx->should_yield (per-thread parking signal) and S->stw_request (per-state broadcast). Both are volatile so multi-threaded sub-cycles read them without explicit fences; ordering invariants pair with the same __atomic_* primitives the atom CAS path uses.

Perf budget held: fib(30) and reduce-over-million-range bench both within noise compared to v0.87.0, comfortably under the 1% target.

ASan + UBSan clean. GC-stress smoke clean. Suite: 1453 tests, 6984 assertions, all green.

v0.87.0 — Per-Thread Context And Atom CAS

Foundation for real host threads, with no observable change in v0.87.x. Two pieces:

Per-thread context (mino_thread_ctx_t). Every field that mutates with eval progress moves off mino_state_t into a new mino_thread_ctx_t struct: try_stack / try_depth, dyn_stack, gc_save / gc_save_len, eval_steps / limit_exceeded / eval_current_form, interrupted, error_buf / last_diag, call_stack / call_depth / trace_added, and gc_stack_bottom / gc_depth. The state embeds one main_ctx and exposes S->ctx pointing at it. Single-threaded today: S->ctx == &S->main_ctx always, so observable behavior is unchanged. Cycle G4 later sub-cycles introduce per-spawn ctxs and TLS-backed lookup; the field locations they need are already in place.

Atom CAS gated on multi_threaded. swap! and compare-and-set! gain a multi-threaded path through __atomic_compare_exchange_n (GCC/Clang builtin, works on plain pointer fields without _Atomic typing). Single-threaded path keeps the existing read+write fast path. The CAS path is dormant until S->multi_threaded flips, which v0.87.x never does; getting the structure in place now means host-thread spawn lights up correct atom semantics without a second touch.

compare-and-set! also moves from value equality (mino_eq) to pointer identity for the comparison, matching canonical Clojure (the JVM's AtomicReference compares by reference). The small-int cache means this is observably the same for small integers; the change matters for boxed values, where pointer equality is what a CAS instruction can actually express.
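A minimal sketch of a pointer-identity CAS and a swap!-style retry loop built on the GCC/Clang __atomic builtins. The atom_t type and the function names are hypothetical; only the builtins themselves are real.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical atom holding a plain (non-_Atomic) pointer field. */
typedef struct { void *val; } atom_t;

/* compare-and-set! by pointer identity. Returns 1 if the swap happened. */
int atom_cas(atom_t *a, void *expected, void *desired)
{
    assert(a != NULL);
    return __atomic_compare_exchange_n(&a->val, &expected, desired,
                                       0 /* strong */,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* swap!-style retry loop: recompute from the latest value until the CAS
 * lands. f must be safe to call more than once. */
void *atom_swap(atom_t *a, void *(*f)(void *))
{
    void *old, *next;
    do {
        old = __atomic_load_n(&a->val, __ATOMIC_SEQ_CST);
        next = f(old);
    } while (!atom_cas(a, old, next));
    return next;
}

/* Example update function: step to the next element of an int array. */
void *step_next(void *p) { return (int *)p + 1; }
```

A CAS against a different pointer that happens to hold an equal value fails, which is the observable difference from the earlier value-equality comparison.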

ASan clean. Suite: 1453 tests, 6984 assertions, all green.

v0.86.1 — Audit-Cycle Fixes

Three issues found auditing v0.84.0 + v0.85.0 + v0.86.0:

ASan + UBSan clean. Suite: 1453 tests, 6984 assertions, all green.

v0.86.0 — Test Harness Suite Mode

Fixes a long-standing quirk where tests/run.clj silently dropped the test files required after the first one whose bottom-of-file (run-tests) call reached completion. The runner's (exit ...) short-circuited the suite, so 246 tests across 11 files (most of tests/async_*, plus fs_test, proc_test, deps_test) were never executed under the combined runner — they ran only when invoked individually.

clojure.test/*suite-mode* now gates the per-file (run-tests). When *suite-mode* is true, individual calls are no-ops; the suite driver flips it back to false at the end and runs the accumulated registry once. tests/run.clj sets the flag before the require list and clears it for the final call.

Three pre-existing test bugs surfaced by the now-running files are fixed alongside:

Suite count: 1452 tests, 6983 assertions, all green — 246 tests / 371 assertions previously hidden are now counted.

v0.85.0 — Capability Metadata As Documentation

Each non-core install group tags its primitives with a per-state capability label so users can discover at a glance which group their code requires. Capability is descriptive, not prescriptive — the gate lives at install time in C, not at call time. User code can't strip the metadata to gain access because the fn either exists in the env or doesn't.

The labels match the existing install hooks one-for-one:

Always-installed core primitives (inc, +, println, prn, conj, etc.) carry no capability label; the io_core table that ships printable I/O without filesystem or process access stays unlabelled.

Two surfaces expose the label:

A new meta_set_capability C helper attaches the label to the existing meta_entry_t (docstring + capability + source); the meta-table teardown frees it. The prim_install_table_with_capability helper lets each install hook tag its whole table in one call without touching the underlying mino_prim_def shape, so the ~150 prim defs across the core/numeric/sequences/etc. tables stay untouched.

Tests: 7 new tests, 22 assertions in tests/capability_metadata_test.clj. Total: 1206 tests, 6612 assertions, all green.

The naming "G0.5" reflects the cycle's heritage — the install groups landed in cycle G0 (v0.81.0) and the capability metadata was always queued as a small follow-up; this ships it.

v0.84.0 — Host Threads — Foundation Slice

Lays the API surface for host-grant-gated host threads (cycle G4) without yet shipping the runtime that backs them. The mino_set_thread_limit / mino_get_thread_limit / mino_thread_count / mino_quiesce_threads C surface is final and embedders can code against it now; (future ...), (thread ...), (promise), deliver, realized?, future-cancel, future-done?, future-cancelled? are defined and throw :mino/unsupported with a message that distinguishes two failure modes:

future? returns false for everything (no future value can be constructed yet) so callers that branch on it pick the non-future arm without surprise.

Standalone ./mino calls mino_set_thread_limit with the host CPU count (via sysctlbyname on Darwin, GetSystemInfo on Windows, and sysconf elsewhere) right after mino_install_all, so REPL/script users see the "in flight" message while embedders that haven't opted in see the "not granted" message. Once the runtime ships, the same call grants Clojure-canon (future ...) semantics by default in standalone mode.

Two new primitives expose the per-state knobs to the script side for diagnostics and tests: (mino-thread-limit) returns the int and (mino-thread-count) returns the live worker count (always 0 in this slice). The :mino/thread-limit key in the thrown ex-info map carries the same value.

Tests: 11 new tests, 22 assertions in tests/host_threads_foundation_test.clj plus a C smoke program in examples/embed_host_threads.c that exercises both grant states from the embedder side. Total suite: 1199 tests, 6590 assertions, all green.

Six open questions for cycle G4 are settled and locked for the incoming runtime work:

v0.83.0 — clojure.spec.alpha And clojure.core.specs.alpha

Substantial port of clojure.spec.alpha and the destructure-form specs in clojure.core.specs.alpha. Both ship in the bundled stdlib under a new mino_install_clojure_spec hook.

clojure.spec.alpha provides the canonical surface: s/def, s/valid?, s/conform, s/explain, s/explain-data, s/explain-str, s/and, s/or, s/keys, s/coll-of, s/map-of, s/tuple, s/nilable, s/spec, s/cat, s/*, s/+, s/?, s/alt, s/fdef, s/instrument, s/unstrument, s/registry, s/get-spec, s/form, and s/assert. Spec values are tagged maps keyed by ::s/kind and dispatched through multimethods. s/instrument wraps the named var via alter-var-root and validates :args on every call; s/unstrument restores. Registered keys are reachable through s/get-spec; s/registry returns the full map.

s/gen and s/exercise throw :mino/unsupported. A clojure.test.check port is deferred until a concrete user need lands. The error names the missing dependency so onboarders see exactly what is absent.

clojure.core.specs.alpha ships destructure-form specs for defn, fn, let, binding, and the binding-form sub-shapes (::seq-binding-form, ::map-binding-form, ::local-name, ::params+body, ::defn-args). Tools that want to validate macro forms call (s/conform :clojure.core.specs.alpha/defn-args ...) directly. Validation is opt-in; the core compiler does not consult the specs.

Two evaluator fixes ship alongside the spec port because the port surfaced them:

The two changes are observable only when a namespace's macro body references its own helpers or internal defs; bare-symbol macros (none in core.clj) are unaffected.

s/cat and the regex repetition operators (s/*, s/+, s/?, s/alt) interpret nested specs and registered regex keys uniformly: the cat helper resolves keyword refs to their registered spec, so (s/* (s/cat :k keyword? :v any?)) over [:a 1 :b 2] greedily consumes pairs and returns [{:k :a :v 1} {:k :b :v 2}]. s/spec wraps a regex into an element-level spec so multi-arity defn bodies match the canonical shape (s/+ (s/spec ::params+body)).

Test surface: 37 new tests, 86 assertions in tests/spec_test.clj covering def/valid?/conform/explain, and/or/nilable/tuple, keys required and optional, coll-of/map-of, cat/*/+/?/alt, spec wrap, gen stub, assert, fdef + instrument/unstrument, and the core.specs.alpha destructure forms. Total suite: 1188 tests, 6568 assertions, all green. ASan + GC stress smoke clean on the spec load + conform path.

v0.82.0 — clojure.instant, clojure.template, And Tagged-Literal Reader Hook

Three small fills accumulating under the bundled-stdlib registry established in v0.81.0.

The reader now resolves #tag form at read time. Resolution order: (get *data-readers* 'tag) -> *default-data-reader-fn* -> tagged-literal record fallback. Both vars are interned as dynamic vars in clojure.core with empty-map and nil defaults. The reader's tag is emitted as a symbol now (not a keyword), per canonical Clojure; calling tagged-literal directly still accepts any tag value. The fallback record is built at read time so (read-string "#foo bar") returns a {:tag foo :form bar} tagged-literal record directly instead of a deferred (tagged-literal ...) call form.

*data-readers* follows read/eval separation: the binding visible at the read-string call site decides the reader fn, and a later rebind does not retroactively change a value already produced. With clojure.instant required, a one-line (binding [*data-readers* {'inst clojure.instant/read-instant-date}] ...) makes #inst "2026-04-27" parse to the component map.

Two small bundled namespaces drop into the registry established in v0.81.0.

clojure.template ports the apply-template and do-template substitution macros that user code historically reaches for when generating repeated test cases or shape variants. mino's own clojure.test/are macro is self-contained (it uses postwalk-replace directly), so the namespace exists for parity with user code that references it. Ships under the mino_install_clojure_test install hook -- the test/template pair installs together since are is the historical caller.

clojure.instant parses ISO 8601 timestamp strings into a component map. mino does not have a host Date / Timestamp / Calendar type, so the parse fns return a map with the keys :years, :months, :days, :hours, :minutes, :seconds, :nanoseconds, :offset-sign, :offset-hours, and :offset-minutes. This is a deliberate divergence from JVM Clojure: callers that wrap read-instant-date in (java.util.Date.) need to consume the map directly. The parser accepts every ISO 8601 shape the canonical regex matches (year-only through nanosecond precision with optional zone offset) and validates each component before returning.

The new namespace ships under its own install hook, mino_install_clojure_instant. mino_install_all calls it along with the rest, so the standalone build picks it up without further wiring.

v0.81.0 — Bundled Stdlib And Per-Group Install Hooks

The clojure.* namespaces that ship with mino (string, set, walk, edn, pprint, zip, data, test, repl, stacktrace, datafy, and core.protocols) are now baked into the binary alongside the core library. A standalone install with no lib/ directory on disk still loads (require '[clojure.string]) and the rest of the bundled set, closing the brew/scoop bundling gap that previously required users to colocate lib/clojure/ next to the binary.

Each bundled namespace gets a per-state install hook on the public C API: mino_install_clojure_string, mino_install_clojure_set, mino_install_clojure_walk, mino_install_clojure_edn, mino_install_clojure_pprint, mino_install_clojure_zip, mino_install_clojure_data, mino_install_clojure_test, mino_install_clojure_repl, and mino_install_clojure_datafy. Pairs that depend on each other ship together: clojure.repl brings clojure.stacktrace, and clojure.datafy brings clojure.core.protocols. Each hook registers its in-binary source into a per-state stdlib registry that the require system consults before the disk resolver, so a (require '[clojure.string]) from script side loads the bundled source from memory.

mino_install_all(S, env) is the new "give me everything" convenience for the standalone build: it calls mino_install_core plus the I/O / fs / proc groups plus every bundled clojure namespace hook, mirroring what a full link from ./mino provides. Embedders that want a tighter footprint pick the subset they need explicitly; mino_register_bundled_lib(S, name, source) exposes the underlying registry so a host can bundle its own non-clojure namespaces with the same mechanism.

The gen-stdlib-headers build task escapes each bundled lib/clojure/*.clj into a per-namespace header (src/lib_clojure_<name>.h), parallel to how gen-core-header handles src/core.clj. The headers are gitignored and regenerated on every build, so edits to a bundled wrapper are picked up automatically. Test-fixture .clj files under lib/clojure/test_clojure/ and lib/clojure/core_test/ are not bundled -- they exist on disk so the require/resolve test surface can verify file-loading behaviour.

Bundled-lib lookup treats . and / as the same separator so a hook registered under clojure.string still matches the clojure/string path-style name produced when the symbol form of require recurses with the path-converted name.
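A minimal sketch of that separator-insensitive comparison, written as standalone C. The function names are hypothetical and do not come from mino's source; the sketch only illustrates treating . and / as the same character while comparing two names.

```c
#include <stdbool.h>

/* Hypothetical helper: fold '/' onto '.' so both spellings compare equal. */
static char norm_sep(char c)
{
    return c == '/' ? '.' : c;
}

/* Returns true when the two names match with '.' and '/' treated as the
 * same separator, e.g. "clojure.string" and "clojure/string". */
bool bundled_name_eq(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (norm_sep(*a) != norm_sep(*b)) {
            return false;
        }
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}
```

With this shape, a registry keyed by clojure.string is found by either spelling without storing the name twice.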

v0.80.0 — Real Records And Embed-Distinctive Type Construction

Records are now first-class value types in mino. (defrecord Point [x y]) defines Point as a real type (not a tagged map), ->Point as the positional constructor, and map->Point as the constructor that splits declared fields from extension keys. Field access via (:x p), (get p :y), and (p :z :missing) all resolve through the same primitive path; assoc keeps the record type whether the key is a declared field or a new extension key; dissoc on a declared field degrades the record to a plain map (canonical Clojure semantics). seq, keys, vals, count, contains?, and find cover the rest of the map-isomorphic surface.

Records are not maps with type tags. Storage is field slots, not a backing map; the slot array is malloc-owned and freed during GC sweep. Equality requires type-pointer identity plus per-field value equality plus extension map equality; (= (->Point 1 2) {:x 1 :y 2}) is false, and the two values hash differently. This is the (= record map-with-same-content) litmus that distinguishes a real record from a tagged-map wrapper.

deftype is an alias for defrecord. mino has no separate JVM-class layer to expose, so the deftype/defrecord distinction collapses; values created either way are real types with map-isomorphic behaviour. reify creates a fresh anonymous type at expansion time and returns a single instance with the named protocols extended onto it; repeated invocations of the same reify form share the type pointer because record types intern by (ns, name).

(instance? T x) is now meaningful: it compares T against (type x), which is type-pointer identity for records and keyword equality for built-in types and ad-hoc :type-tagged values. The previous throw-stub macro is gone.

Protocol dispatch atoms hold mixed keyword and type-pointer keys: built-in types continue to dispatch via keywords like :string, :vector, and :map, while record types dispatch via the MINO_TYPE value defrecord produces. extend-type and extend-protocol accept type symbols that resolve at runtime to the type pointer, so (extend-type Point IFoo (foo [this] body)) registers under the type's pointer and (get @IFoo--foo (type p)) finds the impl. The dispatch path does not distinguish C from script: a host that wants its own impl interns an ordinary primitive and uses extend-type from mino code, the same way every other protocol method does.

The (with-meta x {:type :tag}) keyword-tag dispatch path is unchanged. defrecord is the canonical path for new code; the metadata path remains for ad-hoc tagging and is still used by mino's own multimethod implementation.

The C embed surface gains mino_defrecord, mino_record, mino_record_field, mino_is_record, and mino_is_record_type in src/mino.h. A host can define a record type from C, build instances directly, and read declared field values back without going through map-key lookups. The constructor is idempotent by (ns, name), so re-calling it from a script reload returns the existing type and existing record values keep (instance? T r) true. The new examples/embed_record.c exercises the full round trip: defines a Vec3 type from C, builds an instance with mino_int field values, hands it to script that extends a magnitude-squared protocol on the type and calls it on the C-built value, then reads field values back via mino_record_field.

Migration: code that called the throw-stubbed defrecord, deftype, reify, or instance? will now succeed instead of throwing. The tests/compat_test.clj block asserting they throw has been pruned to keep only the still-unsupported :import case. Code that relied on the throw stubs to gate platform detection should switch to a different shibboleth.

v0.79.0 — Auto-Promoting Arithmetic And `unchecked-*` Opt-In

Plain +, -, *, inc, and dec now auto-promote to bigint on long overflow rather than throwing. The expression (+ 9223372036854775807 1) returns 9223372036854775808N instead of raising :eval/overflow; the same applies to unary (- LLONG_MIN), (- LLONG_MIN 1), (* big big big), (inc LLONG_MAX), and (dec LLONG_MIN). The previous loud-throw default was the silent-surprise cousin of canonical Clojure: working code that ran on a JVM raised an unfamiliar classified error here. The new default matches what Clojure programs assume, while the named opt-in below preserves the fast int64 path for code that needs it.

The unchecked-add, unchecked-subtract, unchecked-multiply, unchecked-inc, unchecked-dec, and unchecked-negate primitives ship as the named opt-in for two's-complement wraparound int64 arithmetic. (unchecked-add 9223372036854775807 1) returns -9223372036854775808 (LLONG_MAX wraps to LLONG_MIN); operands must be ints, and non-int operands throw :eval/type. The names match the canonical Clojure surface and take fixed arities (unchecked-add is binary, unchecked-inc is unary), matching the JVM signatures.
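The two behaviours can be sketched in standalone C. This is an illustration under stated assumptions, not the code in src/prim/numeric.c: the function names are hypothetical, the overflow test stands in for the point where the plain operators would promote to bigint, and the wrapping add shows the two's-complement result the unchecked-* primitives are described as returning.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True when a + b does not fit in int64_t. The check never performs the
 * signed addition itself, because signed overflow is undefined in C. */
bool add_would_overflow(int64_t a, int64_t b)
{
    if (b > 0) {
        return a > INT64_MAX - b;
    }
    if (b < 0) {
        return a < INT64_MIN - b;
    }
    return false;
}

/* Wrapping add: do the arithmetic in uint64_t, where wraparound is well
 * defined, then copy the bit pattern back into an int64_t. */
int64_t wrapping_add(int64_t a, int64_t b)
{
    uint64_t sum = (uint64_t)a + (uint64_t)b;
    int64_t out;
    memcpy(&out, &sum, sizeof out);
    return out;
}
```

A promoting + would call something like add_would_overflow first and switch to a bigint path when it returns true; an unchecked add skips the check and returns the wrapped value directly.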

Per the alpha-no-backcompat policy, the auto-promoting quote-suffix siblings +', -', *', inc', and dec' have been removed entirely. Code that called them now resolves through plain +/-/*/inc/dec, which auto-promote with the same semantics. The clojure_coverage_test lists the quote-suffix names alongside JVM-only names: present in canonical Clojure but intentionally absent in mino because the plain forms now do the same job.

The :eval/overflow MOV001 error code is retired. The single remaining caller, (int huge-bigint) for a value out of long range, now reports :eval/type MTY001 since the conversion is a type/range error rather than an arithmetic overflow.

Internally, the tower_reduce and tower_reduce_seeded helpers shed the promote_long_overflow flag they took to distinguish + from +'; they now always promote. The 6 throw sites in src/prim/numeric.c for :eval/overflow MOV001 are gone.

v0.78.0 — `clojure.core.protocols` And Cross-Namespace Protocol Extension

The four canonical protocols CollReduce, IKVReduce, Datafiable, and Navigable are now first-class in mino. They are interned at boot time in clojure.core and re-exported under the clojure.core.protocols namespace, so user code can write (extend-protocol clojure.core.protocols/CollReduce SomeType ...) and have the override consulted by reduce. The clojure.datafy namespace ships as a thin wrapper that surfaces datafy and nav at the canonical home expected by code ported from canonical Clojure.

reduce, reduce-kv, datafy, and nav now consult the protocol dispatch table on every call. When no per-type or :default override is registered, reduce and reduce-kv fall through to the existing internal seq-driven walk; the override only kicks in when a user has extended the protocol for the value's type. Datafiable and Navigable are seeded with identity-shaped :default impls so (datafy x) and (nav coll k v) are well-defined for built-in types.

The extend-type and extend-protocol macros now preserve the namespace prefix on the protocol symbol when emitting the underlying (swap! Proto--method ...) form. Before this fix (extend-protocol some.lib/SomeProto ...) silently looked up the dispatch atom in the calling namespace and failed with an unbound-symbol error. Cross-namespace protocol extension is the standard usage pattern, so what was previously a quiet breakage is now part of the supported surface.

Two new private vars are exposed in clojure.core for the protocol wiring: internal-reduce_ and internal-reduce-kv_ hold references to the pre-protocol implementations and serve as the fall-through when no override applies. Both are underscore-suffixed by mino's existing convention for implementation-detail names.

v0.77.0 — REPL Specials And `clojure.repl` / `clojure.stacktrace`

The interactive REPL now binds the standard introspection vars after each form: *1, *2, *3 rotate to hold the three most recent results, *e captures the most recent error as a structured diagnostic map, *command-line-args* exposes any positional arguments past the script path, and *file* is bound to the script path during file-mode load (or "NO_SOURCE_PATH" in the REPL). The vars are interned from main.c rather than mino_install_core, so embedders that don't ship a REPL pay nothing for these.

Two new bundled namespaces ship under lib/clojure/. The clojure.repl namespace wraps the existing introspection primitives in print-shaped helpers: the doc and source macros print, the dir macro lists a namespace's public names, find-doc searches docstrings for a substring or regex, and pst prints *e as a formatted summary. The C primitives that return raw data are exposed as clojure.repl/doc-string, clojure.repl/source-form, and clojure.repl/apropos. The clojure.stacktrace namespace provides print-throwable, print-stack-trace, print-cause-trace, and root-cause for walking mino's diagnostic-map exception representation.

Per the alpha-no-backcompat policy, the previously-exposed doc, source, and apropos names in clojure.core have been removed. Code that called them as data accessors should require clojure.repl and use the renamed names; code that wanted print behavior gets it via the clojure.repl/doc and clojure.repl/source macros.

The require machinery's runtime-namespace shortcut had a pre-existing bug that this cycle exposed: namespaces with both pre-installed C primitives and a backing .clj file would skip loading the file, since the var registry already held entries from install time. The check now consults module_cache (which records actually-loaded files) instead, so (require '[clojure.repl :refer [doc]]) and (require '[clojure.string :refer [capitalize]]) correctly load the wrapper and bind the :refer'd names.

v0.76.2 — Insertion Barrier For Incremental Major

The mutator write barrier now also pushes the just-installed new_value onto the major mark stack while a major collection is in MAJOR_MARK. Pure SATB captures the previous slot contents, which is correct for objects already reachable from the snapshot, but does not protect an OLD whose only surviving root path runs through the new edge of this very write. Combining the Yuasa SATB push with a Dijkstra insertion push closes that window: either pre-existing snapshot reachability or post-update reachability is sufficient to keep an OLD alive across the cycle. gc_mark_push deduplicates against the mark bit, so the extra push is a no-op for values already in the snapshot.
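The combined barrier can be illustrated with a toy model in standalone C. Everything here is hypothetical (a one-slot object, a fixed-size mark stack, the names write_slot and mark_push as used below); it is not mino's collector, only a sketch of pushing both the overwritten value and the newly installed value while marking is active, with the mark bit acting as the dedupe check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy heap object: one mark bit and one pointer slot. */
typedef struct obj {
    bool        marked;
    struct obj *slot;
} obj;

enum { MARK_CAP = 64 };
static obj   *mark_stack[MARK_CAP];
static size_t mark_top = 0;
static bool   in_major_mark = false;

/* Queue an object for tracing. The mark bit doubles as the dedupe check,
 * so pushing an already-marked object is a no-op. A real collector would
 * grow the stack instead of dropping on overflow. */
static void mark_push(obj *o)
{
    if (o == NULL || o->marked || mark_top == MARK_CAP) {
        return;
    }
    o->marked = true;
    mark_stack[mark_top++] = o;
}

/* Combined barrier: protect the overwritten value (SATB) and the newly
 * installed value (insertion) while marking is in progress. */
void write_slot(obj *holder, obj *new_value)
{
    if (in_major_mark) {
        mark_push(holder->slot);
        mark_push(new_value);
    }
    holder->slot = new_value;
}

/* Scenario: a -> b is overwritten with a -> c during marking.
 * Returns how many objects the barrier queued. */
size_t barrier_demo(void)
{
    obj b = { false, NULL };
    obj c = { false, NULL };
    obj a = { false, &b };
    size_t queued;

    mark_top = 0;
    in_major_mark = true;
    write_slot(&a, &c); /* queues b (old value) and c (new value) */
    write_slot(&a, &c); /* both already marked: nothing new is queued */
    in_major_mark = false;

    queued = mark_top;
    mark_top = 0; /* drop references to the locals before returning */
    return queued;
}
```

In the demo, c is reachable only through the edge created by the write, which is the case a pure SATB barrier would not cover.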

The bug surfaced as a heisenbug whose footprint depended on the exact size of src/core.clj: past a threshold (Cycle B's print-pipeline additions plus one more defn), the test suite would fail in tests/compat_test.clj :: multimethod-with-docstring with shifting error shapes (fn arity mismatch, unsupported binding form, map as function takes 1 or 2 arguments). ASan was clean because the freed OLDs were recycled through the GC's internal freelist rather than free(). MINO_GC_VERIFY=1 showed no remset gap. Forcing every major to run STW (MINO_GC_STRESS=1 or disabling slicing in the driver) hid the bug, which localized the problem to the incremental mark path.

v0.76.1 — GC Defensive Fixes On Alloc-Pair Patterns

Two intern and trie-build paths that allocate one GC object, hold its only reference in a C local, then call back into the allocator are now wrapped in a gc_depth raise so a collection cannot fire between the two allocs. Both ASan and load-time stress had been catching this under specific layouts; the conservative stack scan misses locals the optimizer keeps in registers, which is what made the symptoms heisenbug-shaped (error messages shifted between runs even with the same inputs).

intern_lookup_or_create in src/collections/val.c now keeps the freshly dup_n'd character buffer protected across the following alloc_val. Without the raise, a sweep triggered by alloc_val's own driver tick could reclaim the buffer, which surfaced as use-after-free reads in gc_mark_push later on.

vec_from_array in src/collections/vec.c already raised gc_depth for the trie-build phase but lowered it before vec_assemble. The lowered window is now closed: gc_depth stays raised through vec_assemble in both the tail-only and full-trie paths so the just-built tail and root nodes (held only in C locals at that point) are not swept while alloc_val runs.

Both changes are localized: existing call sites are unchanged and the test suite continues to pass under both the normal incremental schedule and MINO_GC_STRESS=1 full-STW majors.

v0.76.0 — Print Pipeline And `*out*` / `*err*` / `*in*`

The print and read primitives now route through configurable sinks resolved from *out*, *err*, and *in*. The three names are interned as dynamic vars in clojure.core holding the sentinel keywords :mino/stdout, :mino/stderr, and :mino/stdin; binding *out* or *err* to a string-collecting atom captures the output bytes into the atom's value instead of the default FILE*, and binding *in* to a string-cursor atom feeds reads from the string. The dyn-stack lookup matches both the bare and clojure.core/-qualified symbol forms so syntax-quote-expanded bindings work without ceremony.

The print family (println, prn, print, pr, newline, pr-builtin) now consults *out* before deciding the sink, falling back to stdout when bound to :mino/stdout or stderr when bound to :mino/stderr. (binding [*out* *err*] ...) routes output through stderr because the dyn-bound :mino/stderr keyword identifies the FILE* fallback.

with-out-str, with-in-str, print-str, prn-str, println-str, printf, flush, read-line, and read* are new. with-out-str allocates a fresh string-atom, binds *out* to it for the body, and returns the accumulated text. with-in-str binds *in* to a string-cursor atom holding the given text. read-line reads one line from *in* (atom-bound or stdin), returning the line or nil on EOF. read* is the zero-arity primitive that the user-facing clojure.core/read dispatches to: a fresh (read) consumes the next form from an atom-bound *in* (the stdin path raises an unsupported error, since stream-fed read needs reader-side plumbing that lands in a follow-up). The *-str companions wrap their print counterparts; printf formats then prints; flush calls fflush on stdout and stderr (a no-op for atom-bound sinks).

Internally the print primitives moved from the optional mino_install_io table to k_prims_io_core, which runs before core.clj evaluates so the bundled print-str/prn-str/println-str definitions can reference them. Sandboxed embedders that called mino_install_core without mino_install_io already had pr-builtin; they now also see the print family plus read-line, read*, and printf. Filesystem and process I/O (slurp, spit, exit, file-seq, getenv, getcwd, chdir) stay in k_prims_io for capability-gated installation.

The print-method multimethod still dispatches readable formatting per type. When the hook is installed and a user method is called, the print primitive runs the hook under a nested *out* rebinding that captures the hook's output to a temporary string-atom, then emits the captured bytes through the outer sink, so user-defined methods that call pr-builtin or other print fns flow correctly into with-out-str.

v0.75.0 — Surface Honesty

Three small but visible gaps closed against the canon surface, under the principle that silent divergences cost more than loud ones.

The reader's #"..." regex literals now pass body bytes to the regex engine verbatim. Previously the body ran through the same string-escape pass that ordinary strings do, so \d lost its backslash before the engine saw it; (re-find #"\d+" s) would silently match d+ instead of digits. The literal path now preserves backslashes (and \" is a literal two-character sequence rather than a string terminator), matching how regex literals work elsewhere. The string-form "\\d+" workaround keeps working unchanged.

load-string and load-file are now exposed as primitives. The runtime already had mino_eval_string and mino_load_file as embedder-facing C functions; these primitives surface the same machinery to the language. (load-string "(+ 1 1)") returns 2; (load-file "path/to/file.clj") reads, evaluates, and returns the last form's value. Both clear the ambient namespace for the duration so forms see the current namespace plus their lexical chain, matching eval.

Documentation reflects the new state. The Intentional Divergences page no longer carries the regex-escape entry, and the Coming-from-Clojure quick-reference table marks #"regex" as Same.

v0.74.3 — One-Shot Expression CLI

The standalone mino binary now treats a positional argument that begins with a Lisp form character as an inline expression, matching the convenience shape other Lisp CLIs offer:

```
mino "(+ 1 2)"        # 3
mino "[1 2 3]"        # [1 2 3]
mino "{:a 1}"         # {:a 1}
mino "(println :hi)"  # :hi / nil
```

Form characters that trigger expression mode: (, [, {, #, @, '. A leading -- separator forces file-or-task interpretation; the explicit -e EXPR flag still works either way; everything else continues to be treated as a file path. File names that happen to start with one of those characters need an explicit -- or path prefix (e.g. mino ./(name).clj), but that's a vanishingly rare case in practice.
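The first-character test described above could look roughly like this in standalone C. The function name is hypothetical and the sketch covers only the character check; handling of --, -e, and file-or-task resolution is left out.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical classifier for one positional argument: true when the
 * argument starts with one of the form characters ( [ { # @ ' */
bool looks_like_expression(const char *arg)
{
    if (arg == NULL || arg[0] == '\0') {
        return false; /* strchr would otherwise match the terminator */
    }
    return strchr("([{#@'", arg[0]) != NULL;
}
```

An argument such as ./(name).clj starts with a dot, so it falls through to the file-path interpretation, which is why a path prefix is enough to disambiguate.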

--help documents the new shape on its own line under USAGE.

v0.74.2 — Heap-Allocated Dynamic Binding Frames

Fixes the v0.74.1 known-issue Windows SIGSEGV during tests/run.clj. The binding special form and the new with-bindings* primitive both pushed a stack-local dyn_frame_t onto S->dyn_stack and only popped it on the success path. When a throw inside the body unwound the C stack through longjmp to a containing try, the popped function's stack memory still held the frame, and the longjmp handler in eval/control.c walked S->dyn_stack and read frame->prev / frame->bindings from that now-stale stack region. Linux happens to leave popped stack memory readable for long enough that the walk succeeds; the Windows runner's stack handling makes the same read fault.

The fix is to heap-allocate the frame so the pointer remains valid even after the C frame is unwound. The success path frees the frame; the longjmp handler in eval/control.c already frees the malloc'd binding chain on each unwound frame and now sees a stable parent pointer too.
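The pattern can be sketched in standalone C with setjmp/longjmp. The types and names below are hypothetical stand-ins (mino's dyn_frame_t and the handler in eval/control.c are not reproduced here); the sketch shows only why a heap-allocated frame stays safe to walk and free after the C stack frame that pushed it has been unwound.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dynamic binding frame. */
typedef struct frame {
    struct frame *prev;
    int           value;
} frame;

static frame  *dyn_stack = NULL; /* top of the dynamic binding stack */
static jmp_buf handler;          /* where a "throw" lands */

/* Pushes a heap frame, then either returns normally or throws. */
static void with_binding(int value, int do_throw)
{
    frame *f = malloc(sizeof *f);
    if (f == NULL) {
        abort();
    }
    f->prev = dyn_stack;
    f->value = value;
    dyn_stack = f;

    if (do_throw) {
        longjmp(handler, 1); /* skips the pop below */
    }

    dyn_stack = f->prev; /* success path: pop and free */
    free(f);
}

/* Runs with_binding under a handler and returns how many frames the
 * handler had to unwind (0 when nothing was thrown). */
int run_guarded(int do_throw)
{
    frame *saved = dyn_stack;
    int unwound = 0;

    if (setjmp(handler) != 0) {
        /* The frames live on the heap, so walking them here is valid even
         * though with_binding's C stack frame no longer exists. */
        while (dyn_stack != saved) {
            frame *dead = dyn_stack;
            dyn_stack = dead->prev;
            free(dead);
            unwound++;
        }
        return unwound;
    }

    with_binding(42, do_throw);
    return unwound;
}
```

Had the frame been a local variable inside with_binding, the loop in the handler would read prev from memory belonging to a function that has already been abandoned, which is the stale read described for the Windows crash.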

The Windows job in ci.yml returns to the blocking matrix; the informational marker added in v0.74.1 is no longer needed.

v0.74.1 — CI Hygiene

The v0.74.0 push surfaced two CI signals that needed addressing. Neither is a runtime correctness regression on the platforms covered by formal and parity gates (1058/6277, 230/230); both are about how the CI suite reports.

The Windows matrix job currently SIGSEGVs partway through tests/run.clj after the v0.73.0 first-class-namespace cycle. Without a Windows reproduction environment the root cause is not yet identified; the matrix job is marked continue-on-error: true so the Linux and macOS gates can keep blocking, and the Windows crash is tracked as a known issue for the next cycle.

The perf-gate job in ci.yml is now informational (continue-on-error: true). Shared GitHub-hosted runners are CPU-noisy, the ubuntu-latest image drifts under the pinned baseline, and v0.73.0's first-class-namespace lookup chain naturally adds eval-floor cost that the v0.70.0-era baseline did not anticipate. Local runs and the dedicated mino-bench workflow remain the authoritative signal; a self-hosted runner or scheduled comparison-run job is queued for a follow-up.

The mino-bench task runner's bundled-task module qualifies its clojure.string calls as str/split and str/ends-with?; the v0.73.0 namespace move broke the bare references. Same fix in the satellite repo, no mino-side change.

The mino-site deploy workflow bootstraps from src/core.clj instead of the pre-migration src/core.mino, and the mino-examples submodule pin is refreshed against the published SHA so submodule fetches succeed. Same shape: satellite-side adjustments after a major-namespace cycle.

v0.74.0 — Deferred Core Surface

The deferred names from the v0.73.0 coverage report — *ns* as a real var, bound-fn / bound-fn*, read with options, clojure.edn/read, destructure, re-groups, and re-matcher — land in this cycle. With them the clojure.core and clojure.edn portable surfaces hit 100% in the coverage report.

*ns* is now interned as a dynamic var in clojure.core, so (find-var 'clojure.core/*ns*) resolves and (deref ...) tracks user-visible namespace switches: in-ns and the (ns ...) special form republish the var, and require's save/restore boundary republishes the saved name on the way out so loading a file does not leak the loaded namespace into the caller. The bare-symbol fast path stays as a fallback for embedders that read *ns* before installation finishes.

bound-fn and bound-fn* capture and replay dynamic bindings around an invocation, layered on two new C primitives: get-thread-bindings snapshots the active dynamic bindings into a symbol-keyed map (newest-first wins on shadowing), and with-bindings* pushes a transient frame around a thunk. The mino-side macros provide the standard Clojure call shape for inheriting context into a returned function.

read-string accepts an optional opts-map first argument with the :read-cond key (:allow default, :preserve, :disallow). The reader threads the mode through a new reader_cond_mode field so #? and #?@ sites consult it: :preserve emits a reader-conditional record (the same shape clojure.core/reader-conditional constructs), and :disallow rejects the form. Top-level, list-context, and vector-context conditionals all participate; #?@ inside a map literal is unsupported in :preserve mode and errors with a clear message. read aliases read-string (mino has no PushbackReader type so the string form is the only shape). clojure.edn/read and clojure.edn/read-string force :read-cond :preserve so untrusted text never auto-evaluates a reader conditional.

destructure surfaces mino's destructuring algorithm as a function. It takes a binding-pairs vector [lhs1 rhs1 ...] and emits a flat [name init ...] vector that, fed to (let ...), produces the same bindings. Vector patterns lower through nth, & rest through nthnext, map :keys / :strs / :syms through get with optional :or defaults, plus :as and explicit {sym :key} entries. Implementation lives next to bind_form in src/eval/bindings.c; the primitive is registered in clojure.core via the reflection table.

The bundled regex engine grows a parenthesised-group construct. Compile parses ( and ) into GROUP_OPEN / GROUP_CLOSE markers with sequential ids; the matcher treats the markers as zero-width hooks that record the current text offset. re-find and re-matches now return [whole g1 g2 ...] vectors when the pattern has groups and keep the old string shape otherwise. re-matcher returns an atom-backed iterator that re-find advances; re-groups reads the matcher's last recorded result. Pattern \( still escapes a literal paren. Caveat: #"..." literals run through the regular string-escape path, so \d / \s / \w lose their backslash before the regex engine sees them; pass patterns as strings ("\\d+") until a regex-aware reader escape mode lands.

Caveats. read accepts only the string form — mino has no stream reader value. #?@ splice in :preserve mode is supported in lists and vectors but not inside map literals. re-matcher is mino-side, so its :pos advance uses substring scanning; this is acceptable for typical input but is not the right choice for very large strings.

v0.73.0 — First-Class Namespaces

Namespaces are now real. Each namespace has its own root binding table, so (ns a) (def x 1) and (ns b) (def x 2) are independent and visible only by qualified name from each other. clojure.core is the bundled-core namespace; every other namespace's root env chains to it via a parent pointer, so unqualified references to if, map, let and friends keep working without an explicit refer.

The full namespace machinery landed in one cycle. (ns name ...) clauses accept :require, :use, and :refer-clojure with the expected modifier set: :as, :as-alias, :refer [syms], :refer :all, :only, :exclude, and :rename. Prefix lists work too: (:require [pkg [a :as a] [b :as b]]). require itself accepts symbol, vector, prefix-list, and string arguments and is multi-arg. A namespace created by (ns ...) in memory is requirable without a backing file -- the resolver checks the runtime registry before falling back to the filesystem.

Vars are first-class runtime objects. (def x 1) returns the var #'<ns>/x; (def x) creates an unbound var that bound? reports as false and that throws on deref. intern, find-var, var-get, var-set, and alter-var-root all work; the with-redefs macro binds a stack of root-value swaps so test code can stub vars temporarily. ^:private is a hard error on cross-namespace qualified access, and :refer :all skips privates rather than exposing them.

Auto-resolved keywords landed too. ::foo reads as :<current-ns>/foo; ::alias/foo looks the alias up in the session's alias table at read time and errors if absent. The namespaced-map literals follow: #:foo{:b 1} qualifies bare keys with foo; #::{:b 1} qualifies with the current namespace; and #::alias{...} resolves the alias the same way ::alias/foo does. The underscore namespace (:_/x) strips off, leaving a bare key.

A handful of correctness gaps closed alongside. Cyclic require chains now throw with the load chain in the message rather than recursing into a stack overflow. A loaded file whose first (ns ...) form disagrees with the requested module name is rejected; the comparison treats dashes and underscores as equivalent so (ns foo-bar) in foo_bar.clj is fine. def, declare, and defmacro refuse to shadow a name brought in by :refer from another namespace, so accidental collisions surface immediately. The "unbound" diagnostic for qualified symbols distinguishes "no such alias", "no such namespace", and "no var X in namespace Y". Symbols ending in a colon (foo:) are rejected at read time, namespaced map literals reject duplicate keys after prefix qualification, and (ns 1) errors instead of silently returning nil.

refer accepts :only, :exclude, and :rename. Names listed in :only are validated up front: each must exist in the source namespace and must not be a private var, so refer no longer silently drops missing or hidden names. find-var throws for an unknown namespace; the var-not-found case still returns nil to match upstream. ns-resolve accepts the optional environment-map arg so (ns-resolve ns env-map sym) returns nil when the symbol is shadowed locally.

Namespaces carry metadata. (ns ^{:a 1} foo "docstring" {:b 1}) collects the ^meta, the docstring (as {:doc "..."}), and the attribute map into a single map and stores it on the namespace. (meta *ns*), (meta (find-ns 'foo)), and (meta (the-ns 'foo)) return that map. Each (ns ...) invocation replaces the namespace metadata wholesale; merging only happens between the three sources within one call.

The introspection surface is roughly the runtime-namespace shape that other interpreted dialects expose: in-ns, find-ns, the-ns, create-ns, remove-ns, ns-name, ns-publics, ns-interns, ns-refers, ns-aliases, ns-map, ns-unmap, ns-unalias, alias, all-ns, loaded-libs, find-var, ns-resolve, requiring-resolve, intern, var-get, var-set, var?, bound?, alter-var-root, plus *ns* for the current namespace symbol. ns-publics returns only the namespace's own public vars; ns-refers walks the parent chain to surface inherited names; ns-map combines both with the alias table. Values come back as vars (so pr-str produces #'ns/name), and ns-unmap clears both the env binding and the var registry entry.

Syntax-quote (`) auto-qualifies bare symbols against the current-namespace lexical chain (already true since the cycle opened) and now also expands an alias prefix on namespaced symbols, so `str/x becomes clojure.string/x when str is aliased. Refer'd entries keep their source-namespace identity: after (refer 'clojure.string) in a fresh namespace, `capitalize resolves to clojure.string/capitalize rather than the receiving namespace, matching the contract the reflective APIs already followed.

Namespace aliases are scoped per-namespace. Setting an alias in one namespace no longer leaks into another, so (require '[a :as x]) in one namespace doesn't make x/y resolvable from a sibling namespace. Vars carry :ns, :name, :private, and :dynamic metadata synthesized from their intrinsic fields, so (meta #'foo) returns a useful map. eval resets the ambient namespace before running the form, so a form passed to eval sees only its own current-namespace bindings rather than the calling function's defining namespace. The with-local-vars macro lands as a thin wrapper over intern and var-set for lexically-scoped mutable cells.

ns-unmap correctly removes large-frame bindings (the previous implementation shifted the array in place but left the backing hash table pointing at the old slot, so the binding still resolved). resolve no longer falls back to a global var-registry scan when the current namespace doesn't own a name; that fallback picked up unrelated names from sibling namespaces.

(require "deps/foo/src/foo.cljc") -- a literal path argument -- no longer trips file-to-namespace validation. Path-style requires are deliberate "load this file" requests; only the dotted-name form imposes the namespace-must-match-name check. (doc 'foo) falls back to the namespace's :doc metadata when the named-binding table doesn't have an entry, so namespaces declared with (ns foo "docstring" ...) are documented through the same primitive that surfaces defn docs. (doc 'clojure.core/inc) also finds the docstring registered under the bare name.

mino.deps now probes a fetched dependency directory for common source-root conventions. If the lib follows the Maven layout (src/main/clojure/) the root is added automatically alongside a plain src/ entry, so a multi-file library can require its sibling namespaces by symbol without a manual :deps/root override in mino.edn.

A few small Clojure-shaped affordances landed alongside the namespace work. extend-protocol accepts nil as a type marker (translated to :nil so nil-safe protocol implementations match what (type nil) returns); bare class symbols (Object, Pattern, ...) are rejected with a clear error so silently collapsing them to :default doesn't mask broken dispatch. Reader conditionals now treat :clj as an active dialect alongside :mino, so libraries that only have :clj/:cljs branches read correctly here. defn honors a {:pre [...] :post [...]} map between params and body, threading assertions around the body so % refers to the return value. *assert* is bound to true at the clojure.core level. find accepts transient associatives, mirroring real Clojure semantics. re-find and re-matches return nil for a nil text argument instead of throwing. :refer-clojure skips bindings whose source var is private, matching how Clojure's auto-refer treats private vars. The stale clojure.core/blank? wrapper has been removed; blank? lives only in clojure.string now, matching the upstream contract.

Mino targets pure portable Clojure — there is no JVM and no JavaScript runtime — so any form that exists solely to interface with one of those platforms throws an ex-info carrying :mino/unsupported. defrecord, deftype, reify, proxy, gen-class, definterface, import, and instance? all error at expansion or call time. agent, send-to, and agent-error do the same — aliasing them to atoms only pretended the async dispatch semantics were honored. The ns form rejects :import and :gen-class clauses so files that mix Java interop into their namespace declarations fail loud at load time.

Source files have moved from .mino to .clj. Mino sources are a host-targeted Clojure dialect (the same defn / macro system / sequence semantics, with the JVM-only forms above swapped for :mino/unsupported errors), and the new extension lets editors, formatters, language servers, and tree-sitter grammars recognize mino code out of the box. The require resolver searches .cljc, .clj, and .cljs in that order; .mino is gone. External libraries that ship as portable Clojure continue to load as .cljc. Sibling repositories (mino-bench, mino-examples, mino-lsp, tree-sitter-mino) follow the same rename on local branches.
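The extension search order can be pictured with a small C sketch. It is illustrative only: resolve_source, exists_fn, and demo_exists are hypothetical names, not the resolver's real API, and the only detail taken from the text above is the .cljc, .clj, .cljs ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical predicate type: returns nonzero when `path` exists. */
typedef int (*exists_fn)(const char *path);

/* Search order from the entry above; .mino is intentionally absent. */
static const char *const k_exts[] = { ".cljc", ".clj", ".cljs" };

/* Writes the first existing candidate into `out` (capacity `cap`).
 * Returns 0 on success, -1 when nothing matches or the path does not fit. */
int resolve_source(const char *base, exists_fn exists, char *out, size_t cap)
{
    size_t i;
    assert(base != NULL && exists != NULL && out != NULL);
    for (i = 0; i < sizeof k_exts / sizeof k_exts[0]; i++) {
        int n = snprintf(out, cap, "%s%s", base, k_exts[i]);
        if (n < 0 || (size_t)n >= cap)
            return -1;              /* truncated: report rather than guess */
        if (exists(out))
            return 0;
    }
    if (cap > 0)
        out[0] = '\0';
    return -1;
}

/* Demo predicate: pretends exactly four files exist on disk. */
int demo_exists(const char *path)
{
    return strcmp(path, "src/foo.clj") == 0
        || strcmp(path, "src/foo.cljs") == 0
        || strcmp(path, "src/bar.cljc") == 0
        || strcmp(path, "src/bar.clj") == 0;
}
```

With the demo predicate, "src/foo" resolves to src/foo.clj and "src/bar" to src/bar.cljc, while a base that only ever existed as .mino resolves to nothing.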

C primitives are now interned as vars in their install-time namespace. (find-var 'clojure.core/inc) returns #'clojure.core/inc, (resolve 'inc) returns the var, (meta #'inc) returns {:ns clojure.core :name inc}, and (deref (resolve 'inc)) invokes the primitive. clojure.string primitives like split and join resolve through their own namespace var. Refer-collision detection no longer exempts primitive bindings unconditionally — a primitive that has been refer'd into another namespace and then re-defined surfaces the same "already refers to a var from another namespace" diagnostic as a mino-side defn would.

The pure-Clojure surface gained the names that portable libraries expected to find: identifier predicates ident?, simple-ident?, qualified-ident?, special-symbol?, map-entry?, the no-op-on-mino predicates bytes?, inst?, uri?, plus uuid? / parse-uuid (string-shaped, since mino has no Java UUID type). Parsing helpers parse-boolean and find-keyword round out the 1.11 set alongside the existing parse-long / parse-double. Collection helpers partitionv, partitionv-all, splitv-at, and replicate build on partition/partition-all (which now also accepts the four-argument (partition n step pad coll) form real Clojure exposes). Hash-combining helpers hash-ordered-coll, hash-unordered-coll, and mix-collection-hash produce mino-internal-consistent hashes (not bit-equal to Clojure's Murmur3, but stable across runs). ex-cause reads from ex-data :cause or attached metadata. with-redefs-fn is the function counterpart to the existing with-redefs macro. inst-ms throws :mino/unsupported. The tap mechanism (add-tap, remove-tap, tap>) is implemented over an atom of subscribers that swallows tap-fn exceptions so a misbehaving subscriber does not poison the stream. tagged-literal and reader-conditional constructors and the tagged-literal? / reader-conditional? predicates round out the reader-record surface; list* and reset-meta! close two long-standing gaps. walk, postwalk, prewalk, postwalk-replace, and prewalk-replace are re-exported from clojure.walk (the implementations live in clojure.core because the bundled-core organization needs them across the standard library).

A new clojure.* coverage test reports the breadth of Clojure-core-namespace surface mino exposes against a manifest of canonical 1.11 names. JVM-only names and special forms are excluded from the percentages and accounted separately; missing names are printed by namespace so the gap is visible without grep'ing the source.

The coverage report drove a follow-up pass that closed the easy gaps. clojure.string adds index-of (with optional from-index), last-index-of, re-quote-replacement, and replace-first. The substring-search helpers are mino-side brute-force scans on top of the existing prim-includes? short-circuit; replace-first uses literal-substring semantics because mino's regex literals share the string type (the same constraint that scopes clojure.string/replace today). clojure.zip adds leftmost and rightmost. compare-and-set! lands as a stateful primitive in clojure.core: it checks the atom's current value against an expected value and only swaps on equality, returning true on success and false when the expected value did not match.
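The compare-and-set! contract (swap only when the current value equals the expected one, and report which case happened) can be sketched in C. This is an analogy, not the primitive's implementation: demo_atom and demo_compare_and_set are hypothetical, and the use of C11 atomics over a plain long is an assumption made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an atom holding a single integer value. */
typedef struct {
    _Atomic long value;
} demo_atom;

/* Swap in `desired` only when the current value equals `expected`.
 * Returns true on success, false when the expected value did not match. */
bool demo_compare_and_set(demo_atom *a, long expected, long desired)
{
    assert(a != NULL);
    /* atomic_compare_exchange_strong overwrites `expected` on failure;
     * it is a by-value parameter here, so the caller's copy is untouched. */
    return atomic_compare_exchange_strong(&a->value, &expected, desired);
}

/* Read the current value. */
long demo_deref(demo_atom *a)
{
    assert(a != NULL);
    return atomic_load(&a->value);
}
```

A failed call leaves the stored value alone, which is what lets a caller re-read and retry.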

Final coverage: clojure.core 405/413 portable names (98%), clojure.string 21/21 (100%), clojure.set 12/12 (100%), clojure.walk 8/8 (100%), clojure.edn 1/2 (50%), clojure.zip 28/28 (100%). The remaining clojure.core gaps are queued as follow-ups: bound-fn / bound-fn* need a dynamic-binding-capture API; destructure would rewrite the C-side destructuring helper as a mino-callable function; re-groups / re-matcher need regex capture groups; read and clojure.edn/read need a reader-with-options surface; *ns* works at the symbol-resolution level today and would need a real dynamic var to be find-var-visible.

Breaking Changes

The single shared global env that previously masqueraded as many namespaces is gone. Code that relied on (ns a) (def x ...) clobbering x in b (and vice versa) must now qualify references explicitly or use :require/:use/:refer. Files loaded via require whose first (ns ...) declares a different name than the require argument now error rather than silently mismatching.

The bundled-core namespace is renamed from mino.core to clojure.core, matching the convention other Clojure dialects use for their bundled core. Code that referenced mino.core/foo qualified-name forms must update to clojure.core/foo. Embedding-side C identifiers (mino_state_t, mino_env_t, mino_install_core, etc.) are unchanged. The string operations that already lived in the clojure.string namespace are unaffected.

blank? is no longer reachable through the clojure.core parent chain. Code that called bare (blank? s) from a namespace that did not :require [clojure.string :refer [blank?]] must now bring the name in explicitly or call clojure.string/blank?.

Reader conditionals now match :clj as an active dialect. Tests or code that asserted #?(:clj X) would be skipped under mino must use :cljs (or any other inactive tag) to drive elimination behavior.

Source files now use .clj instead of .mino. The .mino extension is removed from the require resolver. Embedders calling mino_load_file with an explicit path are unaffected — the C API opens whatever path is passed in regardless of extension. Code that hard-coded src/core.mino in a build pipeline must update to src/core.clj; the bootstrap recipe in README.md and mino.edn shows the new form.

v0.72.0 — Release Pipeline & Build Polish

Tag-triggered builds and a controlled promotion path. Pushing a tag matching v* now produces a draft GitHub Release with five platform archives (linux/darwin amd64 and arm64, windows amd64) plus a checksums.txt. Each build job verifies the tag against MINO_VERSION_* in src/mino.h, bootstraps with the canonical recipe, runs a --version and arithmetic smoke test, and uploads its archive. A fan-in publish step concatenates checksums and creates the draft Release. Nothing is published downstream until a maintainer un-drafts the Release and runs the manual promote-packages workflow.

The promote workflow takes a tag and per-ecosystem booleans (publish_brew, publish_scoop). It fails loudly if the Release does not exist or is still a draft, downloads checksums.txt and all assets, re-verifies SHA-256s against the assets, renders the formula or manifest from a template under .github/release-templates/, and opens or updates a PR against the corresponding tap or bucket repo. Auto-merge stays off so the maintainer can review every formula and manifest before users see it.

Three small build issues are also addressed. The form parameter of apply_non_fn_callable is now const mino_val_t * to match S->eval_current_form's qualifier, which clears a -Wcast-qual warning at the only caller without changing behavior. The host-interop dispatch doc comment in src/eval/special.c was tripping -Wcomment because of a /* glob inside an open block comment; it now reads host/.... The README's pasteable bootstrap snippet was missing the printf/sed prelude that generates src/core_mino.h, so a fresh-clone copy-paste failed with 'core_mino.h' file not found; the README now mirrors the canonical recipe in mino.edn.

The public embedding API in src/mino.h is unchanged.

v0.71.0 — Standalone CLI Polished

The standalone mino binary now recognises -h/--help, -V/--version, and -e/--eval EXPR, with a -- separator that ends option processing in the usual POSIX way. Help and version output goes to stdout and exits 0; usage errors go to stderr and exit 2. The -e EXPR path runs one expression through the same evaluator that file mode uses and prints the result via mino_println.
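A minimal sketch of option handling with these conventions follows. It is not the binary's actual parser: cli_parse, cli_result, and cli_exit_code are hypothetical names, and the hand-rolled loop is an assumption chosen to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { CLI_RUN, CLI_HELP, CLI_VERSION, CLI_EVAL, CLI_USAGE_ERROR } cli_action;

typedef struct {
    cli_action  action;
    const char *eval_expr;  /* set when action == CLI_EVAL */
    int         first_arg;  /* index of the first non-option argument */
} cli_result;

/* Exit-code convention from the entry: help/version exit 0, usage errors 2. */
int cli_exit_code(cli_action a)
{
    return a == CLI_USAGE_ERROR ? 2 : 0;
}

cli_result cli_parse(int argc, char **argv)
{
    cli_result r = { CLI_RUN, NULL, 0 };
    int i;
    assert(argv != NULL);
    r.first_arg = argc;
    for (i = 1; i < argc; i++) {
        const char *a = argv[i];
        if (strcmp(a, "--") == 0) { i++; break; }     /* end of options */
        if (a[0] != '-' || a[1] == '\0') break;       /* first operand */
        if (!strcmp(a, "-h") || !strcmp(a, "--help")) {
            r.action = CLI_HELP;
            return r;
        }
        if (!strcmp(a, "-V") || !strcmp(a, "--version")) {
            r.action = CLI_VERSION;
            return r;
        }
        if (!strcmp(a, "-e") || !strcmp(a, "--eval")) {
            if (i + 1 >= argc) {                      /* missing EXPR */
                r.action = CLI_USAGE_ERROR;
                return r;
            }
            r.action = CLI_EVAL;
            r.eval_expr = argv[++i];
            continue;
        }
        r.action = CLI_USAGE_ERROR;                   /* unknown option */
        return r;
    }
    r.first_arg = i;
    return r;
}
```

Anything after a bare -- is treated as an operand, so `mino -- -h` in this sketch yields CLI_RUN with first_arg pointing at "-h".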

A small subcommand surface is recognised after option processing. mino repl is an explicit alias for the bare REPL invocation; mino nrepl ... and mino lsp ... exec the matching companion binary from PATH and exit 127 if the companion is not installed, with a clear message naming the missing binary. mino task and mino deps continue to work as before.

The REPL banner gains a "Type :help for help, :quit to exit" hint, and the prompt is now mino=> with a 7-char-aligned continuation prompt. Two reader-level meta-commands are intercepted before eval: a bare :help prints a one-screen description of the REPL, and a bare :quit exits cleanly with code 0. Both fire only when the entire form is the keyword, so they do not affect (println :help) or (do :quit).
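The "entire form is the keyword" rule can be approximated with a whole-line match. The sketch below works on raw input text, which is a simplifying assumption (the real interception is described as reader-level); repl_meta_command and meta_cmd are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum { META_NONE, META_HELP, META_QUIT } meta_cmd;

/* Classify one line of REPL input. Only a line whose entire content
 * (ignoring surrounding whitespace) is the keyword counts as a meta-command. */
meta_cmd repl_meta_command(const char *line)
{
    const char *end;
    size_t len;
    assert(line != NULL);
    while (*line && isspace((unsigned char)*line))
        line++;
    end = line + strlen(line);
    while (end > line && isspace((unsigned char)end[-1]))
        end--;
    len = (size_t)(end - line);
    if (len == 5 && memcmp(line, ":help", 5) == 0)
        return META_HELP;
    if (len == 5 && memcmp(line, ":quit", 5) == 0)
        return META_QUIT;
    return META_NONE;   /* anything else goes to the evaluator */
}
```

Because the match is on the whole trimmed line, (println :help) and (do :quit) fall through to normal evaluation.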

The public embedding API in src/mino.h is unchanged.

v0.70.0 — C-Core Refactored

Cycle banner. No user-visible behavior change; the public embedding API in src/mino.h is unchanged.

This closes the C-Core Refactor cycle that began at v0.61.0. Across v0.61.0 through v0.68.0 the runtime was reorganized into per-subsystem subdirectories, decomposed into named helpers with explicit boundaries, switched to data-driven primitive registration, gained a three-class internal severity contract, isolated the regex engine, decomposed equality and hashing, flattened the reader into a thin classifier with named dispatch helpers, and replaced the cascading evaluator dispatch with a data-table for special-form recognition.

This release pass focuses on documentation:

- File-level headers no longer carry "Extracted from X / No behavior change" provenance lines that survived the rename pass; each header describes what's in the file, not where it came from. Embedded references to old filenames (runtime_gc.c, prim_*.c, eval_special_*.c, host_interop.c) update to the current paths in comments and doc blocks across the headers and source.
- docs/INTERNAL_MODULE_MAP.md lists src/eval/special_registry.c and updates "How to Add a Special Form" for the data-table dispatch.
- docs/ARCHITECTURE_CONTRACT.md Section 6 records that when, and, and or have fast-path special-form entries on top of their core.mino macro definitions, so macroexpand is unaffected but the evaluator skips the expansion.
- src/mino.h drops a stale claim that the user-visible transient API isn't shipped (it landed in v0.51.0).
- src/prim/bignum.c documents that the upper-magnitude hash path is reached only when the bigint exceeds long long; the fits-in-ll path joins int and float at tag 0x03 in hash_val.

v0.68.0

Internal cleanup. No user-visible behavior change; the public embedding API in src/mino.h is unchanged.

The evaluator's eval_impl is split. The orchestrator function becomes a thin classifier plus four named helpers:

- eval_check_limits gates each step on the host limit knobs (limit_steps, limit_heap), the interrupt flag, and the sticky limit_exceeded latch. One source of truth for bail-out.
- eval_try_host_syntax owns the four interop sugar shapes (.method, .-field, (new T ...), (T/static ...)) and rewrites them into the matching host/* primitive call.
- eval_try_special_form (new src/eval/special_registry.c) walks a static k_special_forms[] table that pairs cached interned-symbol slots with handlers. The previous cascading if (HEAD_IS(...)) chain is gone; new special forms are one table row.
- eval_apply_regular_call wraps the function / macro / non-fn-callable dispatch.

Every special-form handler now takes (S, form, args, env, tail) — the seven that didn't already accept tail accept and ignore it. The inline-bodied special forms (quote, quasiquote, var, if, do, recur, lazy-seq, when, and, or) move into static helpers in the registry file so the table can reference them uniformly.
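The table-driven shape described above can be sketched as follows. Everything here is a simplified stand-in: the demo_* types, the two placeholder handlers, and find_special_form are hypothetical, and the lookup compares names with strcmp where the real table is described as pairing cached interned-symbol slots with handlers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the runtime types. */
typedef struct demo_state demo_state;
typedef struct demo_val   demo_val;
typedef struct demo_env   demo_env;

/* Uniform handler shape: (S, form, args, env, tail). */
typedef demo_val *(*special_fn)(demo_state *S, demo_val *form,
                                demo_val *args, demo_env *env, int tail);

typedef struct {
    const char *name;
    special_fn  handler;
} special_form_entry;

/* Placeholder handlers; real ones would evaluate their arguments. */
demo_val *sf_quote(demo_state *S, demo_val *form, demo_val *args,
                   demo_env *env, int tail)
{
    (void)S; (void)form; (void)env;
    (void)tail;                     /* accepted and ignored */
    return args;
}

demo_val *sf_if(demo_state *S, demo_val *form, demo_val *args,
                demo_env *env, int tail)
{
    (void)S; (void)args; (void)env; (void)tail;
    return form;
}

/* Adding a special form means adding one row here. */
static const special_form_entry k_demo_special_forms[] = {
    { "quote", sf_quote },
    { "if",    sf_if    },
};

/* Returns the handler for `head`, or NULL so the caller can fall through
 * to the regular function / macro call path. */
special_fn find_special_form(const char *head)
{
    size_t i;
    size_t n = sizeof k_demo_special_forms / sizeof k_demo_special_forms[0];
    assert(head != NULL);
    for (i = 0; i < n; i++)
        if (strcmp(k_demo_special_forms[i].name, head) == 0)
            return k_demo_special_forms[i].handler;
    return NULL;
}
```

The uniform five-argument signature is what allows every row to share one function-pointer type, including handlers that have no use for tail.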

v0.67.0

Internal cleanup. No user-visible behavior change; the public embedding API in src/mino.h is unchanged.

The reader in src/eval/read.c is decomposed. read_form shrinks from ~380 lines to a ~80-line classifier and three new helpers absorb the bulk:

- read_dispatch handles the full #-prefix family in one place: #{ set, #_ discard, #( anon-fn, #' var-quote, ##Inf/##-Inf/##NaN, #"…" regex literal, #?/#?@ reader-conditional, and the tagged-literal fallback.
- read_wrap_one captures the prefix-quote pattern that the six reader macros (', `, @, ~, ~@, #') all share: read one form, wrap as (sym form), preserve the macro's source position. Five near-identical inline blocks collapse to five one-line calls.
- read_char_literal owns the character-literal decoding (\space, \uNNNN, UTF-8 codepoints, octal escapes).

The ADVANCE / ADVANCE_N macros are replaced with static inline helpers — same emit, type-checked arguments, no behavior change.

v0.66.0

Internal cleanup. No user-visible behavior change; the public embedding API in src/mino.h is unchanged.

hash_val is decomposed into a switch dispatch over named byte-loop helpers (hash_long_long_bytes, hash_pointer_bytes, hash_uint32_bytes). The numeric tier collapse, with (= 1 1.0 1N) mapping to a single hash under tag 0x03, funnels through one helper, making the equal-implies-equal-hash invariant explicit in the source. The MINO_MAP branch's inlined HAMT walk is replaced with a call to the shared map_get_val so the per-entry lookup path stays in lock step with the public API.
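The equal-implies-equal-hash idea for the numeric tier can be illustrated with a small sketch. The mixing function (FNV-1a), the demo_* names, and the handling of non-integral floats are assumptions for the example; only the tag value 0x03 and the "one shared helper" structure come from the entry above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_TAG_NUMBER 0x03u   /* tag named in the entry; mixing is illustrative */

/* FNV-1a over a tag byte followed by the bytes of a long long. */
uint32_t demo_hash_long_long_bytes(long long v)
{
    unsigned char bytes[sizeof v];
    uint32_t h = 2166136261u;
    size_t i;
    memcpy(bytes, &v, sizeof v);
    h = (h ^ DEMO_TAG_NUMBER) * 16777619u;
    for (i = 0; i < sizeof v; i++)
        h = (h ^ bytes[i]) * 16777619u;
    return h;
}

uint32_t demo_hash_int(long long v)
{
    return demo_hash_long_long_bytes(v);
}

/* A float holding an exact integer in long long range funnels through the
 * same helper as that integer, so values that compare equal share a hash. */
uint32_t demo_hash_float(double d)
{
    uint64_t bits;
    /* The range check comes first: converting an out-of-range double
     * to long long would be undefined behavior. NaN fails both tests. */
    if (d >= -9223372036854775808.0 && d < 9223372036854775808.0) {
        long long as_ll = (long long)d;
        if ((double)as_ll == d)
            return demo_hash_long_long_bytes(as_ll);
    }
    /* Non-integral or out-of-range: hash the raw bit pattern instead. */
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&bits, &d, sizeof bits);
    return (uint32_t)(bits ^ (bits >> 32)) * 16777619u;
}
```

Routing both tiers through one helper keeps the invariant in a single place: a change to the integer hash cannot silently diverge from the float path.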

mino_eq's grouped helpers are renamed to the eq_*_like family that pairs with the hash side: seq_equal becomes eq_seq_like, mino_eq_maps_cross becomes eq_map_like_cross, and the matching set variant becomes eq_set_like_cross. A doc block above mino_eq states the equal-implies-equal-hash contract and notes that new tier additions or new equality bridges must extend the matching hash_val branch in the same commit.

v0.65.0

Internal cleanup. No user-visible behavior change; the public embedding API in src/mino.h is unchanged.

The regex engine in src/regex/ is now a fully isolated module. Its sole header re.h is consumed only from src/prim/regex.c and the include is path-qualified (#include "regex/re.h"); the -Isrc/regex flag is gone from the build configuration, the CI bootstrap, and the README. re.c no longer pulls in <stdio.h> or any mino subsystem header — it depends only on the C standard library. The dead-code debug helper re_print has been removed, so the only symbols exported from src/regex/re.o are the four functions declared in re.h (re_compile, re_free, re_match, re_matchp); a nm probe of the object file confirms no other external symbols. A style-exception note at the top of re.c records that the module preserves its upstream tinyregex-c conventions (Allman braces, two-space indent, fixed-size pattern arena) under Rule 15 of the project's C implementation guide.

v0.64.0

Internal cleanup. No user-visible behavior change; the public embedding API in src/mino.h is unchanged.

The companion-repo perf gate at mino-bench/benchmarks/perf_gate.mino grows from five micros to fifteen, covering reader (read-string over ints and lists), eval-special (fn, let, if, do, loop/recur), allocation (cons, vector, map), host-call (inc, +, count, assoc), and regex (re-find) paths so a regression in any of them surfaces at the gate. Each bench reports timing and bytes-allocated-per-op; the gate fails on either dimension. Allocation counts are deterministic, so the alloc gate uses zero tolerance for zero-baseline entries and a tight 10% band elsewhere. The timing gate stays at +15% locally but widens to +30% on CI runners (CI=true) to absorb the shared-runner noise that produced a uniform +74% skew on ubuntu-latest at the close of the prior cycle. The pinned baseline at baselines/perf_baseline.edn is re-recorded against the current runner shape and now stores both metrics per bench.
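The tolerance arithmetic can be written down as a short sketch. The function names are hypothetical, the real gate lives in the mino-bench repository rather than in C, and treating the 10% allocation band as an upper bound only is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Timing tolerance: +15% locally, +30% when running on CI. */
bool timing_gate_ok(double baseline_ns, double measured_ns, bool on_ci)
{
    double tolerance = on_ci ? 0.30 : 0.15;
    assert(baseline_ns >= 0.0);
    return measured_ns <= baseline_ns * (1.0 + tolerance);
}

/* Allocation tolerance: exact match for zero baselines, 10% otherwise.
 * Assumption: the band is an upper bound, so allocating less always passes. */
bool alloc_gate_ok(unsigned long baseline_bytes, unsigned long measured_bytes)
{
    if (baseline_bytes == 0)
        return measured_bytes == 0;
    /* measured <= baseline * 1.10, kept in integer arithmetic */
    return measured_bytes * 10 <= baseline_bytes * 11;
}

/* A bench passes only when both dimensions pass. */
bool bench_gate_ok(double base_ns, double got_ns,
                   unsigned long base_bytes, unsigned long got_bytes,
                   bool on_ci)
{
    return timing_gate_ok(base_ns, got_ns, on_ci)
        && alloc_gate_ok(base_bytes, got_bytes);
}
```

For a 100 ns baseline, a 120 ns measurement fails locally but passes on CI; a bench whose baseline allocates nothing fails as soon as it allocates a single byte.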

v0.63.0

Internal cleanup. No user-visible behavior change; the public embedding API in src/mino.h is unchanged.

The DEF_PRIM macro is gone. Each src/prim/<domain>.c now exports a static mino_prim_def table at TU bottom listing the (name, fn, doc) triples for that domain; the new src/prim/install.c composes the tables into k_core_domains[] and walks it via prim_install_table to bind primitives and attach docstrings. mino_install_core becomes one nested loop instead of ~400 lines of macro calls. The standalone install entry points (mino_install_io, mino_install_fs, mino_install_proc, mino_install_host, mino_install_async) each become a thin wrapper over prim_install_table referencing their own domain's table. The registry of primitives is now data, not code: each domain file owns the list of names it exposes alongside the implementations.
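A rough sketch of registration-as-data is shown below. The demo_* types, the toy environment, and the primitive signature are all hypothetical; the point is only the structure: per-domain tables of (name, fn, doc) triples, a table of domains, and one nested loop that installs them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long (*prim_fn)(long);          /* hypothetical primitive signature */

typedef struct { const char *name; prim_fn fn; const char *doc; } demo_prim_def;
typedef struct { const demo_prim_def *defs; size_t count; } demo_domain;

long demo_inc(long x) { return x + 1; }
long demo_dec(long x) { return x - 1; }
long demo_neg(long x) { return -x; }

/* Each domain owns the list of names it exposes. */
static const demo_prim_def k_math_prims[] = {
    { "inc", demo_inc, "Returns x + 1." },
    { "dec", demo_dec, "Returns x - 1." },
};
static const demo_prim_def k_misc_prims[] = {
    { "neg", demo_neg, "Returns -x." },
};
static const demo_domain k_demo_domains[] = {
    { k_math_prims, sizeof k_math_prims / sizeof k_math_prims[0] },
    { k_misc_prims, sizeof k_misc_prims / sizeof k_misc_prims[0] },
};

/* Toy environment: a fixed-size array of bindings. */
#define DEMO_ENV_CAP 16
typedef struct { const demo_prim_def *slots[DEMO_ENV_CAP]; size_t used; } demo_env;

/* Bind every entry of one table; returns the number installed. */
size_t demo_install_table(demo_env *env, const demo_prim_def *defs, size_t n)
{
    size_t i, installed = 0;
    assert(env != NULL && defs != NULL);
    for (i = 0; i < n && env->used < DEMO_ENV_CAP; i++) {
        env->slots[env->used++] = &defs[i];
        installed++;
    }
    return installed;
}

/* Outer loop over domains; the inner loop lives in demo_install_table. */
size_t demo_install_core(demo_env *env)
{
    size_t d, total = 0;
    for (d = 0; d < sizeof k_demo_domains / sizeof k_demo_domains[0]; d++)
        total += demo_install_table(env, k_demo_domains[d].defs,
                                    k_demo_domains[d].count);
    return total;
}

/* Linear lookup by name; returns NULL when the name is not bound. */
const demo_prim_def *demo_lookup(const demo_env *env, const char *name)
{
    size_t i;
    assert(env != NULL && name != NULL);
    for (i = 0; i < env->used; i++)
        if (strcmp(env->slots[i]->name, name) == 0)
            return env->slots[i];
    return NULL;
}
```

A standalone installer for a single domain would then be a one-line wrapper over demo_install_table with that domain's table.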

A new src/diag/diag_contract.h introduces a three-class internal severity taxonomy: MINO_ERR_RECOVERABLE (catchable user faults), MINO_ERR_HOST (I/O, OS, capability rejections), MINO_ERR_CORRUPT (invariant violations that abort). The existing user-facing diagnostic kinds (:eval/..., :type/..., :io/..., etc.) stay as the reporting surface; the new enum drives control-flow policy. Each per-subsystem internal.h gains an "Error classes emitted" block listing which classes its code paths produce, where, and why. diag.c carries a kind-to-class mapping table next to the code that builds the diagnostic record.
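The split between reporting kinds and control-flow classes might look like the sketch below. The enum constants mirror the three classes named above but use a DEMO_ prefix; the prefix-to-class rows, including the ":internal/" kind and the default for unknown kinds, are invented for illustration and may not match the real table in diag.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    DEMO_ERR_RECOVERABLE,   /* catchable user faults */
    DEMO_ERR_HOST,          /* I/O, OS, capability rejections */
    DEMO_ERR_CORRUPT        /* invariant violations that abort */
} demo_err_class;

typedef struct { const char *kind_prefix; demo_err_class cls; } kind_class_row;

/* Illustrative mapping only; ":internal/" is a made-up kind. */
static const kind_class_row k_kind_classes[] = {
    { ":eval/",     DEMO_ERR_RECOVERABLE },
    { ":type/",     DEMO_ERR_RECOVERABLE },
    { ":io/",       DEMO_ERR_HOST        },
    { ":internal/", DEMO_ERR_CORRUPT     },
};

/* Map a user-facing diagnostic kind to its internal severity class. */
demo_err_class classify_kind(const char *kind)
{
    size_t i;
    assert(kind != NULL);
    for (i = 0; i < sizeof k_kind_classes / sizeof k_kind_classes[0]; i++) {
        const char *prefix = k_kind_classes[i].kind_prefix;
        if (strncmp(kind, prefix, strlen(prefix)) == 0)
            return k_kind_classes[i].cls;
    }
    return DEMO_ERR_RECOVERABLE;    /* assumed default for unknown kinds */
}

/* Control-flow policy is driven by the class, not by the kind. */
int class_aborts(demo_err_class c)
{
    return c == DEMO_ERR_CORRUPT;
}
```

Keeping the mapping in one table means a new diagnostic kind picks up its control-flow policy by adding a row rather than by touching every call site.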

v0.62.2

Internal cleanup. No user-visible behavior change; the public embedding API in src/mino.h is unchanged.

Source files under a subsystem directory drop the redundant subsystem prefix wherever the prefix duplicated the directory name.

.c renames:

Header renames (path-qualified includes throughout):

Includes are now path-qualified for the renamed subsystem headers (#include "runtime/internal.h" etc.) — bare internal.h would resolve based on -I flag order across the per-subdirectory include paths added in v0.61.0.

lib/mino/tasks/builtin.mino, docs/INTERNAL_MODULE_MAP.md, and docs/ARCHITECTURE_CONTRACT.md reflect the new filenames. Embedders enumerating individual source files need to update their build configuration; the CI bootstrap glob at .github/workflows/ci.yml and the README.md snippet are unchanged because both already use per-subdirectory globs.

v0.62.1

Internal cleanup. No user-visible behavior change; the public embedding API in src/mino.h is unchanged.

src/mino_internal.h is decomposed into per-subsystem internal headers so each translation unit pulls in just the types and declarations it needs:

The old src/mino_internal.h is deleted with no compatibility shim. src/eval/eval_special_internal.h and src/prim/prim_internal.h now include runtime_internal.h. Per-subsystem .c files include the header(s) they actually need.

docs/INTERNAL_MODULE_MAP.md and docs/ARCHITECTURE_CONTRACT.md reflect the new header layout.

v0.62.0

Internal cleanup. No user-visible behavior change; the public embedding API in src/mino.h is unchanged.

The mino_state_free teardown function is split into per-subsystem helpers (state_free_root_envs, state_free_refs, state_free_ns_aliases, state_free_module_cache, state_free_host_types, state_free_meta_table, state_free_intern_tables, state_free_string_interns, state_free_gc_aux, state_free_diag_state, state_free_async, state_free_heap) called in fixed order from a thin orchestrator. Teardown order is preserved.

The remaining behavioral macros become regular functions: FMT_ENSURE becomes fmt_ensure (a static inline that returns the new buffer or NULL on OOM); MINO_GC_VERIFY_CHECK becomes gc_verify_check (a static inline taking the state and container header explicitly); MATH_UNARY becomes math_unary (a static inline taking a function pointer, replacing nine macro-expansion copies).
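As an illustration of the macro-to-function move, here is a minimal sketch of a buffer-growing helper in the style described for fmt_ensure. The signature, the doubling policy, and the demo_ prefix are assumptions; only the "returns the new buffer or NULL on OOM" contract comes from the entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Grow `buf` so it can hold at least `need` bytes.
 * Returns the (possibly moved) buffer, or NULL on OOM. On failure the
 * original buffer is untouched and still owned by the caller. */
static inline char *demo_fmt_ensure(char *buf, size_t *cap, size_t need)
{
    size_t new_cap;
    char *grown;
    assert(cap != NULL);
    if (need <= *cap)
        return buf;                 /* already large enough */
    new_cap = *cap ? *cap : 16;
    while (new_cap < need) {
        if (new_cap > (size_t)-1 / 2)
            return NULL;            /* doubling would overflow size_t */
        new_cap *= 2;
    }
    grown = realloc(buf, new_cap);
    if (grown == NULL)
        return NULL;
    *cap = new_cap;
    return grown;
}
```

Unlike a macro, the function type-checks its arguments and makes the failure path an ordinary return value the caller has to handle.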

A new src/runtime/path_buf.{c,h} centralizes the PATH_BUF_CAP constant (4096) that was repeated across the file-I/O primitives, and exposes a path_buf_t struct + path_buf_init / path_buf_set / path_buf_append / path_buf_format API for new callers that want explicit truncation reporting.
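What "explicit truncation reporting" could mean in practice is sketched below. The struct layout, the return-code convention, and the demo_ names are assumptions rather than the actual path_buf API; only the 4096-byte capacity is taken from the entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PATH_BUF_CAP 4096

typedef struct {
    char   data[DEMO_PATH_BUF_CAP];
    size_t len;                     /* bytes used, excluding the terminator */
} demo_path_buf;

void demo_path_buf_init(demo_path_buf *pb)
{
    assert(pb != NULL);
    pb->data[0] = '\0';
    pb->len = 0;
}

/* Append `s`; returns 0 on success, -1 if it would not fit.
 * On failure the buffer is unchanged, so truncation is never silent. */
int demo_path_buf_append(demo_path_buf *pb, const char *s)
{
    size_t n;
    assert(pb != NULL && s != NULL);
    n = strlen(s);
    if (n >= DEMO_PATH_BUF_CAP - pb->len)
        return -1;
    memcpy(pb->data + pb->len, s, n + 1);   /* copies the terminator too */
    pb->len += n;
    return 0;
}

/* Replace the contents with `s`; same return convention as append. */
int demo_path_buf_set(demo_path_buf *pb, const char *s)
{
    assert(pb != NULL && s != NULL);
    if (strlen(s) >= DEMO_PATH_BUF_CAP)
        return -1;                  /* reject before clobbering the old value */
    pb->len = 0;
    pb->data[0] = '\0';
    return demo_path_buf_append(pb, s);
}
```

The caller decides what an over-long path means (error, diagnostic, fallback) instead of discovering a silently clipped filename later.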

v0.61.0

Internal source-tree reorganization. No user-visible behavior change; the public embedding API in src/mino.h is unchanged. Source files under src/ are now grouped into per-subsystem directories: public/, runtime/, gc/, eval/, collections/, prim/, async/, interop/, regex/, diag/, and vendor/imath/.

The bootstrap-compile command in README.md and the GitHub Actions workflow now use explicit per-subdirectory globs in place of the flat src/*.c src/vendor/*.c pattern. Embedders building mino from source need to update their build to enumerate the new subdirectories and add a matching -I flag for each.

docs/INTERNAL_MODULE_MAP.md reflects the new layout. CLAUDE.md and docs/ARCHITECTURE_CONTRACT.md are unchanged.

v0.60.0 — Dialect Complete

Banner release closing the Dialect-Complete cycle. mino is now the Clojure dialect at embedded scale: code that doesn't reach for JVM interop, chunked-seq throughput, or host-thread primitives runs on mino unchanged.

This release adds no new runtime features over v0.56.0. It is a docs and ecosystem ripple — the dialect surface is settled and the companion tooling (mino-site, mino-lsp, mino-nrepl, tree-sitter-mino, mino-examples) all track the new numeric-tower type tags and the Clojure-shape multimethod / hierarchy semantics.

What's complete after this cycle:

Documentation

Companion ripple

What's next

The Dialect-Complete cycle closes here. Two cycles are queued:

1. C-Core Refactor cycle. Reader decomposition, evaluator dispatch split, behavior-macro cleanup, error-class contract, regex-engine isolation. Internal-only refactoring; the user surface stays put.
2. v1.0 / ABI freeze cycle. src/mino.h frozen and the evolving-API language removed from the header. Numeric-tower type tags (MINO_BIGINT, MINO_RATIO, MINO_BIGDEC) lock in. Optional mino.hpp C++ RAII wrappers.

Until v1.0, src/mino.h stays labelled evolving and any item in this cycle is revisitable under a minor bump.

v0.56.0 — Dialect-Semantics Audit

Sixth release of the Dialect-Complete cycle. Targeted fixes to mino's multimethod / hierarchy implementation tighten dialect alignment with Clojure on four edge cases that don't show up until you reach for them.

Fixed

Added

v0.55.0 — Numeric Tower Complete

Fifth release of the Dialect-Complete cycle. mino's numeric tower closes: ratio and bigdec types arrive, the four arithmetic primitives plus all comparison primitives tier-dispatch across the five numeric tiers (int, bigint, ratio, bigdec, float), and = goes Clojure-strict on the numeric tier with a new == for cross-tier numeric equality.

Added

Changed

Known limitations

v0.54.0 — Auto-Promoting Arithmetic

Fourth release of the Dialect-Complete cycle. The promoting siblings of +, -, *, inc, and dec arrive: when a long accumulator overflows, the running sum / product crosses into bigint instead of throwing. Plain + / - / * / inc / dec are unchanged — the overflow-throwing semantics from v0.45.0 stay in place — so the choice between fail-fast and auto-promote is now a per-call-site decision.
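The per-call-site choice comes down to what happens after an overflow is detected. The sketch below shows that decision for addition only; demo_checked_add, demo_add, and add_outcome are hypothetical names, and the actual bigint promotion is left to the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Adds a and b. Returns false, leaving *out untouched, when the exact
 * result does not fit in long long. The check runs before the addition,
 * so signed overflow (undefined behavior in C) never occurs. */
bool demo_checked_add(long long a, long long b, long long *out)
{
    assert(out != NULL);
    if ((b > 0 && a > LLONG_MAX - b) || (b < 0 && a < LLONG_MIN - b))
        return false;
    *out = a + b;
    return true;
}

typedef enum { ADD_OK, ADD_THROW, ADD_PROMOTE } add_outcome;

/* `promoting` selects between the two call-site policies: the plain
 * operator reports an error, the promoting sibling asks the caller to
 * redo the operation in arbitrary precision. */
add_outcome demo_add(long long a, long long b, bool promoting, long long *out)
{
    if (demo_checked_add(a, b, out))
        return ADD_OK;
    return promoting ? ADD_PROMOTE : ADD_THROW;
}
```

Both policies share the same overflow detection; they differ only in the branch taken once the fast path fails.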

Added

Fixed

Known limitations

v0.53.0 — Bigint Foundation

Third release of the Dialect-Complete cycle. mino gains the first tier of the Clojure numeric tower: an arbitrary-precision integer type, backed by vendored imath. Literals, constructors, equality, hashing, and readable printing are all wired up; auto-promoting arithmetic (+', -', *', inc', dec') and the remaining tower tiers (ratio, bigdec) arrive in v0.54.0 and v0.55.0.

Added

Known limitations

v0.52.0 — Extensible Printer

Second release of the Dialect-Complete cycle. pr and prn now route through a mino-level print-method multimethod, so user code can extend readable printing for its own types.

Added

Changed

Known limitations

v0.51.0 — Transients, Sorted-By, Subseq, Pr/Print/Newline

First release of the Dialect-Complete cycle. Four additions on top of the already-landed C kernels, aimed at the everyday Clojure surface that mino was missing: batch-mutation transients at the mino level, custom-comparator sorted collections, bounded range queries on sorted collections, and the no-trailing-newline companions to prn / println.

Added

Reviewed

Cortex reviewed all six open questions for the Dialect-Complete cycle. Q3 and Q4 gated Phase A and resolved cleanly into the implementation above. Q2 shapes the numeric tower in Phase C, Q5 shapes the printer rework in Phase B, Q1 gates the dialect-semantics audit in Phase D, and Q6 shapes the intentional-divergences doc in Phase E.

v0.50.0 — C Core Complete and Polished

Cycle-closure release. The C core is feature-complete for the work that had to land at the C level: lazy-seq write-barrier coverage, overflow-throwing arithmetic, first-class characters, the callable protocol for non-fn values, vector pop with metadata, multi-coll sequence, C-surface transients, C-surface multimethods, a perf regression gate wired into CI, fuzz coverage with a nightly libFuzzer job, a native crash handler, version constants, and two embedder helpers (mino_throw, mino_args_parse). The embedding API in src/mino.h stays labelled as evolving until a later ABI-freeze cycle; the surface is stable enough for external embedders to build against today, with any break called out in its minor-bump CHANGELOG entry.

This release is purely a tag. No code changed since v0.49.1. The full sanitizer matrix (ASAN, UBSAN, TSAN) is clean across the test suite, the GC stress shards, and the multi-state embedding harness.

What Ships Next

Three separate cycles are queued after v0.50.0, in order:

1. An internal C-core refactor cycle that picks up code-quality and organization items deferred during the complete-and-polish work. User-visible surface stays stable; this is internal hygiene.
2. A dialect cycle that fills the remaining mino-level surface on top of the C groundwork landed here: public transient! / persistent! / assoc! / conj! / dissoc! / pop! / disj!, public defmulti / defmethod / prefer-method plus hierarchy APIs, the currently-disabled clj-compat test blocks that still need a macro layer, and gaps like sorted-map-by, subseq, pr / print.
3. An ABI-freeze cycle that commits src/mino.h to a stable contract for the first time. This is the v1.0 tag.

BigInt / Ratio / BigDecimal arithmetic ships as one whole feature (hook plus backend plus tower dispatch plus tests plus docs) in one of the later cycles, not piecemeal. Throwing on integer overflow is the honest, complete behavior in v0.50.0.

v0.49.1 — Callable and Module-Resolution Dedup

Two pieces of internal duplication turned out to be drifting. No user-visible surface change from the mino side; the fixes are available to C embedders that introspect mino_last_error and to any code that invokes (require '[x :as a]) from inside a primitive at the same time an ns form is pending.

Changed

v0.49.0 — Docs and Hygiene

A documentation-focused release. No runtime or API changes; the mino binary is bit-for-bit equivalent to v0.48.0. The work here brings the public docs back in line with the source of truth.

Fixed

Changed

v0.48.0 — Embedder Polish

Sharpens the embedding surface in src/mino.h without rearranging any runtime internals. Version constants land so embedders can compile-time guard against an unexpected runtime. A reference Makefile ships at repo root with sanitizer dev targets. Two new helpers -- mino_throw and mino_args_parse -- pull patterns out of hand-written primitives and give host code a shorter path to structured exceptions and validated arguments. The README gains an explicit SemVer policy paragraph.

Added

v0.47.0 — Release Gates

Release-gate infrastructure pass. No mutator-visible surface changes; the work here exists to keep the surface from silently decaying as later releases layer on top. A perf regression gate now runs in CI against a pinned baseline. The fuzz corpus grew from four seeds to twenty-two, with a libFuzzer nightly job backing it. A native crash handler now produces a usable post-mortem line instead of a bare segfault. The write barrier grew a structural matrix in its header comment plus a debug-time assertion, and the C transient API picked up a real barrier for its mutator-stored inner pointer.

Added

Fixed

Changed

v0.46.0 — Dialect C Groundwork

Lands the C-level mechanisms that later dialect work will build on without dragging the user-visible surface along yet. Integer arithmetic now refuses to silently wrap, character literals are a first-class value type, transducer sequence accepts multiple collections, and embedders get a C API for batch mutation of persistent collections. The previously-disabled clj compat assertions that this C work unlocks are re-enabled in the same release — they were gated off precisely because this foundation was missing.

Added

Changed

Fixed

v0.45.0 — Correctness Closure

Closes the three known correctness gaps in the C core. The lazy-seq cache-barrier path gains a regression test that exercises the realisation slot through promotion. The bit-shift primitives bounds-check their amount and raise a classified error on out-of-range input, closing the last UBSAN shift-exponent finding. ns :require surfaces missing-module load failures instead of silently swallowing them.
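The shift-amount check can be sketched as below. The accepted range of 0 to 63 for a 64-bit integer and the boolean return standing in for the classified error are assumptions; demo_shift_left is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Left shift with a bounds-checked amount. Returns false, leaving *out
 * untouched, for amounts outside [0, 63] instead of invoking undefined
 * behavior (the condition UBSAN reports as a shift-exponent error). */
bool demo_shift_left(int64_t x, int64_t amount, int64_t *out)
{
    assert(out != NULL);
    if (amount < 0 || amount > 63)
        return false;
    /* Shift in unsigned space: left-shifting a negative signed value is
     * undefined. Converting back to int64_t is implementation-defined and
     * wraps on two's-complement targets. */
    *out = (int64_t)((uint64_t)x << amount);
    return true;
}
```

In the runtime, the false branch would correspond to raising the classified error rather than returning a flag.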

Added

Changed

Fixed

v0.44.0 — GC Observability and Spawn-Path Perf

Adds embedder-visible remset and mark-stack sizing fields to gc-stats, plus targeted perf improvements for spawn-heavy workloads. No functional changes to the collector; existing embedders see strictly more data in the stats struct and map.

Added

Changed

Performance

v0.43.1 — Nested-Minor UAF Fix, GC Event Ring, Multi-State Stress

Bug-fix and hardening release. Closes a nursery-overflow use-after-free under MAJOR_MARK surfaced at high go-loop spawn concurrency, adds a GC event ring + reachability classifier for future debugging, and moves three pieces of mutable process state (filename intern, var-string intern, PRNG) into mino_state_t so multiple embedded states no longer race.

Breaking changes

Changed

Fixed

Added

Internal

v0.43.0 — Pure-mino Channels and Actors

Two successive demotions move the channel layer and the actor system out of C into lib/core/. The C runtime keeps only what must be C: the scheduler run queue, the deadline-timer priority queue, the GC, and the evaluator. Total C surface shrinks by roughly 2,100 LOC; the built binary drops ~20 KB on darwin arm64 at -O2. The public mino API is unchanged except where flagged below.

Breaking changes (channel demotion)

Mino callers see no difference: the public surface (chan, put!, take!, offer!, poll!, close!, closed?, chan?, alts!, buffer, dropping-buffer, sliding-buffer, promise-chan, timeout, go) is unchanged. The starred names still resolve through compatibility aliases in lib/core/channel.mino.

Host embedders that called these primitives directly from C via mino_eval must either (a) invoke the mino-level equivalents through mino_eval on the corresponding public name, or (b) pin to v0.42.0.

Breaking changes (actor demotion)

Added

Fixed

Removed

Channel demotion:

Actor demotion:

prim_async.c drops from 475 to 127 LOC. clone.c drops from 661 to 213 LOC and keeps only cross-state mino_clone.

v0.42.0 — Generational + Incremental Garbage Collector

Replaces the single-generation mark-and-sweep collector with a two-generation non-moving tracing collector whose old-gen mark phase runs incrementally, paced by mutator allocation. Max pause on tail-heavy realistic workloads drops from 100-110 ms to under 60 ms; GC share drops from 65-95% to 15-30% on the same workloads. No API-breaking changes to value semantics or evaluation, but the public C embedding surface gains three new entry points for host-driven collection, tuning, and stats.

Added

Fixed

Changed

Performance

v0.41.0 — GC Timing Instrumentation

Adds wall-clock measurement of garbage-collection pauses so the mino-bench harness can report GC share of wall time per benchmark. Purely instrumentation — no behavior change, no optimization.
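As a rough sketch of what such instrumentation can look like: the names gc_timing, wall_now_ns, gc_timing_record, and gc_timing_share are hypothetical and not taken from mino's source, and the clock is C11 timespec_get, which may differ from whatever clock mino actually reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical accumulator for collector pauses. */
typedef struct gc_timing {
    uint64_t total_ns;  /* sum of all recorded pauses */
    uint64_t max_ns;    /* longest single pause */
    uint64_t cycles;    /* number of pauses recorded */
} gc_timing;

/* Wall-clock time in nanoseconds via C11 timespec_get. */
uint64_t wall_now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Record one pause. A clock that stepped backwards counts as zero. */
void gc_timing_record(gc_timing *t, uint64_t start_ns, uint64_t end_ns)
{
    assert(t != NULL);
    uint64_t pause = end_ns >= start_ns ? end_ns - start_ns : 0;
    t->total_ns += pause;
    if (pause > t->max_ns) {
        t->max_ns = pause;
    }
    t->cycles++;
}

/* Fraction of wall time spent in GC, in the range 0.0 to 1.0. */
double gc_timing_share(const gc_timing *t, uint64_t wall_ns)
{
    assert(t != NULL);
    if (wall_ns == 0) {
        return 0.0;
    }
    return (double)t->total_ns / (double)wall_ns;
}
```

A collector would call wall_now_ns() on entry and exit of each cycle and pass both values to gc_timing_record; a harness then divides by the benchmark's total wall time to report GC share.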

Added

Changed

v0.40.0 — Interpreter Performance Pass

30 benchmark-driven optimizations. The eval floor (per-operation cost in a tree-walking step) dropped from ~6 µs to ~1 µs, roughly a 5x speedup on realistic programs. Every change ships with a before/after measurement in the mino-bench suite.

Added

Changed

Fixed

v0.39.1 — Cross-Platform Portability Fixes

Fixed

v0.39.0 — Task Runner and Self-Hosting Build

Added

Removed

v0.38.0 — Project Manifest and Dependency Management

Added

v0.37.0 — Compatibility and Stdlib

Added

Changed

Removed

v0.36.0 — Error Diagnostics

Added

Fixed

v0.35.0 — core.async and Conformance

Added

Changed

Fixed

v0.34.0 — Conformance Hardening Phase 2

Added

Changed

Fixed

v0.33.0 — Conformance Hardening

Added

Changed

Fixed

v0.32.0 — Host Interop

Added

Changed

v0.31.0 — clojure.string Namespace

Added

Fixed

v0.30.0 — Hierarchies and Dispatch Essentials

Added

v0.29.0 — Stateful Operations and Watches

Added

Changed

v0.28.0 — Core Collections Semantics

Added

Changed

v0.27.0 — Numeric Tower Behavior

Added

v0.26.0 — Reader Literal Parity

Added

v0.25.0 — Test Framework Compatibility

Added

v0.24.0 — Namespace and Var Semantics

Added

v0.23.0 — Reader and Loadability Baseline

Added

v0.22.0 — Collection and Sequence Conformance

Added

Changed

v0.21.0 — Architecture Hardening

Changed

Fixed

v0.20.0 — Dialect Alignment

Brings mino's surface language into close alignment with standard conventions. Multi-arity functions, destructuring, protocols, transducers, value metadata, and reader macros land as a cohesive set. A large test suite derived from the official test repository validates conformance across 552 tests (up from 300) and 2039 assertions (up from 664).

Added

Changed

Fixed

v0.19.0 — Explicit Runtime State

Breaking changes

Added

Changed

v0.18.0 — Runtime State, GC Hardening, and Repo Reorganization

Multi-instance runtime support, GC correctness under stress, and a cleaner project layout for embedding and development.

Added

Changed

Fixed

Performance

v0.17.0 — Proper Tail Calls and Core Library

Proper tail call optimization in the evaluator. All function calls in tail position run in constant stack space, including mutual recursion. This release also adds ~80 new core.mino definitions, bringing the standard library close to feature parity with core language functions.
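One common way for a C evaluator to run tail calls in constant stack space is a trampoline: instead of calling the next function, each step returns a description of it and a driver loop performs the call. The sketch below uses that technique on mutually recursive even/odd checks. It is an illustration only; the changelog does not say how mino's evaluator implements tail calls, and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct bounce bounce;
typedef bounce (*step_fn)(unsigned long n);

/* Either a finished result (next == NULL) or the next tail call to make. */
struct bounce {
    step_fn next;
    unsigned long arg;
    bool result;
};

bounce is_even_step(unsigned long n);
bounce is_odd_step(unsigned long n);

/* Each step returns the call it would have made in tail position. */
bounce is_even_step(unsigned long n)
{
    if (n == 0) {
        return (bounce){ NULL, 0, true };
    }
    return (bounce){ is_odd_step, n - 1, false };
}

bounce is_odd_step(unsigned long n)
{
    if (n == 0) {
        return (bounce){ NULL, 0, false };
    }
    return (bounce){ is_even_step, n - 1, false };
}

/* Driver loop: the C stack never grows with the recursion depth. */
bool trampoline(step_fn start, unsigned long n)
{
    assert(start != NULL);
    bounce b = start(n);
    while (b.next != NULL) {
        b = b.next(b.arg);
    }
    return b.result;
}
```

Calling trampoline(is_even_step, 1000000) performs a million alternating steps with a single stack frame for the loop, where direct mutual recursion in C could exhaust the stack.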

Added

v0.16.0 — Complete C Primitive Layer

Adds every C primitive needed to implement the non-JVM parts of clojure.core. The pure mino compositions come in a later version; this version focuses on the C foundation.

Added

Changed

v0.15.0 — Test Framework and Dogfooding

Replaces all shell test scripts with mino-based tests. The language now tests itself.

Added

Changed

v0.14.0 — Lazy Sequences, Complete C Core, core.mino Expansion

Lazy sequences land as a first-class type, enabling infinite data structures and demand-driven evaluation. The C core gains its final set of primitives; seven sequence operations move from C to lazy mino implementations. core.mino nearly doubles in size.

Added

Breaking

v0.13.0 — Atoms, Spit, Stdlib Architecture

Establishes the three-tier architecture: a C runtime (irreducible primitives), a bundled stdlib.mino (macros and compositions), and a future mino-std package. Delivers atoms and spit.

Added

Changed

v0.12.0 — Release Candidate (Alpha)

Quality, polish, and documentation pass. No new language features.

Changed

Added

Verified

v0.11.0 — Sequences and Remainder of Stdlib

Sets, sequence transformations, string operations, and utility functions round out the core standard library. Strict (non-lazy) semantics throughout — every sequence operation returns a concrete list or collection.

Added

Changed

Notes

v0.10.0 — Interactive Development

The printer is now cycle-safe, def/defmacro record metadata for introspection, and a new in-process REPL handle lets a host drive read-eval-print one line at a time with no thread required.

Added

Changed

Notes

v0.9.0 — Sandbox, Modules, Diagnostics

Runtime errors now carry source locations and call-stack traces. Script code gains try/catch/throw for recoverable exceptions. The core environment is sandboxed by default — I/O primitives are installed separately via mino_install_io. A host-supplied module resolver enables require for file-based modules.

Added

Changed

Notes

v0.8.0 — Host C API

First draft of the embedding API. An external C program can now create a runtime, register host functions, evaluate source, call mino functions, and extract results — all in under 50 lines of glue code. The surface language gains type predicates, str, and basic I/O. All new symbols are mino_*-prefixed; the header remains marked UNSTABLE until v1.0.

Added

Notes

The header remains /* UNSTABLE until v1.0.0 */. API additions are possible through the 0.x series; the v1.0 release freezes the ABI. Execution limits are global rather than per-env; this simplifies the implementation while a single-threaded model is the only supported configuration. The mino_load_file function is the first place the runtime performs host I/O on behalf of the caller; v0.9 will gate this behind the capability model.

v0.7.0 — Tracing Garbage Collection

Replaces the per-allocation malloc/free discipline with a stop-the-world mark-and-sweep collector. Every heap object the runtime produces — values, environments, persistent-collection internals, and scratch arrays — is now tracked by a single registry and reclaimed automatically once it becomes unreachable. The surface language is unchanged.

Added

Changed

Notes

The collector is non-incremental and non-generational; the entire heap is scanned on each cycle. For the sizes this runtime is meant to embed at, linear scan over a sorted range index is a good fit, and the 2× live-bytes threshold keeps mean pause time bounded. The v0.12 release candidate will profile realistic workloads and decide whether to layer on an incremental pass.
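The 2× live-bytes threshold can be sketched as a small budget policy. This is a hedged illustration, not mino's collector: gc_budget and its functions are hypothetical names, and the 1 MiB floor is an assumption added so that tiny heaps do not collect on every allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocation budget driving a stop-the-world collector. */
typedef struct gc_budget {
    size_t allocated;  /* bytes currently tracked by the registry */
    size_t threshold;  /* collect once allocated reaches this */
} gc_budget;

enum { GC_MIN_THRESHOLD = 1 << 20 };  /* assumed 1 MiB floor */

/* Account for a new allocation of n bytes. */
void gc_note_alloc(gc_budget *b, size_t n)
{
    assert(b != NULL);
    b->allocated += n;
}

/* True when the next allocation point should trigger a full cycle. */
int gc_should_collect(const gc_budget *b)
{
    assert(b != NULL);
    return b->allocated >= b->threshold;
}

/* After a sweep, set the next trigger to twice the surviving bytes. */
void gc_after_sweep(gc_budget *b, size_t live_bytes)
{
    assert(b != NULL);
    size_t next = live_bytes > SIZE_MAX / 2 ? SIZE_MAX : live_bytes * 2;
    if (next < GC_MIN_THRESHOLD) {
        next = GC_MIN_THRESHOLD;
    }
    b->allocated = live_bytes;
    b->threshold = next;
}
```

Because the trigger scales with the live set, a heap that survives a cycle must allocate roughly as much again before the next full scan, which spreads the linear scan cost over a proportional amount of allocation.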

v0.6.0 — Macros

Lifts the surface language above its primitives. defmacro, quasiquote, and a small set of in-language threading and short-circuit forms mean that new control shapes can land without growing the C evaluator.

Added

Notes

0.x makes no automatic hygiene promise; macro writers should reach for gensym when they need an identifier that can't capture anything the caller introduced. The decision whether to keep gensym-only or add full hygiene lands in v1.0 triage.

v0.5.0 — Persistent Maps

Replaces the map layout with a 32-wide hash array mapped trie. get, assoc, and update are now sub-linear; maps can be used as map keys, equality between maps no longer scales quadratically, and lookup no longer depends on key arity.
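For readers unfamiliar with the structure, the core indexing step of a 32-wide hash array mapped trie can be sketched as follows. The function names are hypothetical and the code assumes 32-bit hashes; mino's actual node layout and hash width are not described in this changelog.

```c
#include <assert.h>
#include <stdint.h>

enum { HAMT_BITS = 5, HAMT_MASK = 31, HAMT_MAX_LEVEL = 6 };

/* 5-bit slice of the hash that selects a branch at the given level. */
unsigned hamt_fragment(uint32_t hash, unsigned level)
{
    assert(level <= HAMT_MAX_LEVEL);
    return (unsigned)((hash >> (HAMT_BITS * level)) & HAMT_MASK);
}

/* Portable population count (number of set bits). */
unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Does the node's bitmap mark branch `frag` as present? */
int hamt_has(uint32_t bitmap, unsigned frag)
{
    assert(frag < 32);
    return (int)((bitmap >> frag) & 1u);
}

/* Index of branch `frag` in the node's compact child array:
 * the number of present branches numbered below it. */
unsigned hamt_slot(uint32_t bitmap, unsigned frag)
{
    assert(frag < 32);
    uint32_t below = bitmap & (((uint32_t)1 << frag) - 1u);
    return popcount32(below);
}
```

With a bitmap of 530 (branches 1, 4, and 9 present), branch 4 lives at slot 1 and branch 9 at slot 2, so a node stores only the children it has. A lookup consumes one 5-bit fragment per level, which is why depth grows with the logarithm of the map size rather than linearly.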

Changed

Added

Notes

The v0.5 HAMT is the last structural replacement before the GC work in v0.7; from here the layout stays but the allocator underneath changes. Semantics remain the contract.

v0.4.0 — Persistent Vectors

Replaces the vector layout with a persistent 32-way trie without changing the surface language. Every vector primitive from v0.3 behaves identically; the work lives entirely behind the API.
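The addressing scheme of a 32-way trie can be sketched in a few lines. The names trie_levels and trie_slot are hypothetical, and details such as a tail buffer or node layout, which the changelog does not describe, are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

enum { TRIE_BITS = 5, TRIE_MASK = 31 };

/* Levels needed to address `count` elements (at least one leaf level). */
unsigned trie_levels(size_t count)
{
    unsigned levels = 1;
    size_t last = count > 0 ? count - 1 : 0;  /* highest valid index */
    while ((last >>= TRIE_BITS) != 0) {
        levels++;
    }
    return levels;
}

/* Child slot to follow at `level` (0 is the leaf level) for `index`. */
unsigned trie_slot(size_t index, unsigned level)
{
    assert((size_t)level * TRIE_BITS < sizeof(size_t) * CHAR_BIT);
    return (unsigned)((index >> (TRIE_BITS * level)) & TRIE_MASK);
}
```

Index 1000 in a two-level trie resolves to slot 31 at level 1 and slot 8 at level 0. An update copies only the nodes along that path, one per level, so the old and new vectors share everything else.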

Changed

Added

Notes

The naïve map layout from v0.3 is still in place. v0.5 replaces it with a HAMT, again without changing the surface API. The semantics are the contract, not the layout.

v0.3.0 — Literal Vectors, Maps, and Keywords

Brings the value-oriented data model to the surface language. Programs can now express structured data literally and manipulate it through immutable collection primitives.

Added

Notes

The v0.3 representations (flat arrays for vectors and maps, linear scan for map lookup) are intentionally naïve. The public contract is the primitive signatures and semantics; v0.4 replaces the vector layout with a persistent 32-way trie and v0.5 replaces the map with a HAMT, both without changes to the surface API.

v0.2.0 — Core Special Forms and Closures

Locks in lexical scope, first-class functions, and bounded-stack tail recursion. The evaluator is now expressive enough to define factorial and fib iteratively and to build and apply higher-order functions.

Added

v0.1.0 — Walking Skeleton

The first published milestone. Establishes the single-file build, the public C header, and an end-to-end read-eval-print pipeline.

Added