This vignette explains how Arl code is compiled and executed. It is intended for contributors and curious users who want to understand what happens between typing an expression and seeing its result. Note that everything in this document should be considered an internal implementation detail subject to change without notice.
Pipeline overview
Every Arl expression passes through five stages:
Source text -> Tokenizer -> Parser -> Macro Expander -> Compiler -> R eval()
- Tokenizer (R/tokenizer.R) – Lexical analysis using regex-based lexing. Produces a flat token stream (LPAREN, RPAREN, SYMBOL, NUMBER, STRING, KEYWORD, etc.).
- Parser (R/parser.R) – Converts the token stream into R call objects representing Arl S-expressions. Reader macros (', `, ,, ,@) are expanded into explicit quote, quasiquote, unquote, and unquote-splicing forms during parsing.
- Macro Expander (R/macro.R) – Walks the parsed AST and expands defmacro-defined macros. Supports macro expansion up to an arbitrary fixed depth (e.g., the single-level macroexpand-1, or 2-level, 3-level, etc.), or full recursive expansion until no macros remain. Quasiquote templates are processed by a shared walker (R/quasiquote.R).
- Compiler (R/compiler.R) – Translates the macro-expanded Arl AST into R language objects. Handles all special forms, applies optimizations (constant folding, dead code elimination, self-TCO), and optionally inserts coverage instrumentation.
- R eval() – The compiled R expression is evaluated with R’s native eval(). Because Arl compiles to R code, all R functions and data structures are directly accessible.
What “compilation” means
Arl does not produce bytecode or machine code. Instead, the compiler
translates Arl AST nodes into R language objects – the same
objects you get from quote() in R. For example:
arl> (define x (+ 1 2))
#> 3
compiles to something like:
assign("x", 1 + 2, envir = .__env)
The result is a single R expression that can be passed to
eval(). This approach piggybacks on R’s own evaluation
machinery, giving Arl access to R’s scoping, garbage collection, and
entire function library for free. (Note that in practice, constant
folding would precompute 3 rather than leaving an
unevaluated 1 + 2 in the generated R code.)
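To make "R language objects" concrete, here is a minimal sketch in plain R that builds an expression of the same shape by hand and evaluates it. It does not use Arl at all; `env` is a stand-in for `.__env`, and the way the call is assembled is an assumption for illustration, not the compiler's actual code path.

```r
# Build the call `1 + 2` as data instead of parsing it from text.
inner <- call("+", 1, 2)

# `env` is a hypothetical stand-in for the environment Arl refers to as .__env.
env <- new.env()

# Assemble assign("x", 1 + 2, envir = <env>) as an unevaluated call object.
compiled <- call("assign", "x", inner, envir = env)

is.call(compiled)        # TRUE: this is a language object, nothing has run yet
eval(compiled)           # evaluates 1 + 2 and binds the result to x inside env
get("x", envir = env)    # 3
```

Nothing is computed until `eval()` is called, which is why the compiler can inspect or rewrite the expression beforehand.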
Inspecting the compilation pipeline
Use engine$inspect_compilation(text) to see each
stage:
engine <- Engine$new()
info <- engine$inspect_compilation("(when #t (+ 1 2))")
info$parsed # Arl AST after parsing
info$expanded # After macro expansion
info$compiled # Compiled R expression
info$compiled_deparsed # R source as a character vector
The compiled_deparsed field is especially useful for
understanding what R code the compiler generates.
Special form dispatch
The compiler’s core is a dispatch table in compile_impl
(see compiler.R). When it encounters a call, it checks the
operator against the known
special forms. Anything not in that list falls through to
compile_application, which compiles a regular function
call.
A few implementation notes on how individual special forms are compiled:
- if – wraps the condition in a truthiness check (.__true_p())
- define/set! – detects (define name (lambda ...)) or (set! name (lambda ...)) patterns and enables self-TCO when applicable
- lambda – compiles closure creation with parameter patterns and optional TCO rewriting
- begin – compiled to R { } blocks
- defmacro – registers the macro with the macro expander at compile time
- while – compiled to its R equivalent
- and, or – multi-argument wrappers around R’s && and ||, which can only take two arguments
- quasiquote – expands templates with unquote/splicing
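The dispatch idea can be sketched in a few lines of plain R. This is a toy, not the code in compiler.R: `compile_toy`, `special_forms`, and the handler bodies are hypothetical, only `if` and `begin` are handled, and the `.__true_p` defined here is an assumed stand-in whose truthiness rules may differ from Arl's.

```r
# Assumed stand-in for Arl's truthiness predicate: only FALSE and NULL are false.
.__true_p <- function(x) !(is.null(x) || identical(x, FALSE))

# Fallback: compile every element of an ordinary call recursively.
compile_application <- function(expr) {
  as.call(lapply(as.list(expr), compile_toy))
}

# Hypothetical dispatch table keyed by operator name.
special_forms <- list(
  "if" = function(expr) {
    test <- call(".__true_p", compile_toy(expr[[2]]))
    call("if", test, compile_toy(expr[[3]]), compile_toy(expr[[4]]))
  },
  "begin" = function(expr) {
    body <- lapply(as.list(expr)[-1], compile_toy)
    as.call(c(list(as.name("{")), body))
  }
)

compile_toy <- function(expr) {
  if (!is.call(expr)) return(expr)          # atoms compile to themselves
  op <- expr[[1]]
  if (is.symbol(op)) {
    handler <- special_forms[[as.character(op)]]
    if (!is.null(handler)) return(handler(expr))
  }
  compile_application(expr)                 # not a special form
}

# (begin (if (> x 1) "big" "small")) written as nested R calls.
ast <- as.call(list(
  as.name("begin"),
  as.call(list(as.name("if"), quote(x > 1), "big", "small"))
))
compiled <- compile_toy(ast)
x <- 5
eval(compiled)   # "big"
```

Keeping the handlers in a named list means adding a special form is a matter of adding one entry, while unknown operators take the application path unchanged.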
Self-tail-call optimization
When the compiler sees (define name (lambda ...)) or
(set! name (lambda ...)) and the lambda body contains
self-calls in tail position, it rewrites the entire function as a
while(TRUE) loop. Tail calls become parameter reassignments
followed by next, eliminating stack growth. The
set! support means letrec-bound lambdas are
automatically optimized, since the letrec macro expands
into set!.
The key methods:
- has_self_tail_calls – checks if a lambda body has self-recursive tail calls
- expr_has_self_tail_call – recursively walks the AST looking for self-calls in tail positions (through if, begin, cond, let, let*, letrec)
- compile_self_tail_call – transforms the recursive call into parameter reassignments within the loop body
- compile_tail_if, compile_tail_begin – ensure both branches of if and the last expression of begin get tail-position treatment
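The tail-position walk can be illustrated with a simplified sketch. `has_self_tail_call` below is a hypothetical toy that only looks through `if` and `begin`; the real walker also descends into cond, let, let*, and letrec, and its actual signature is not shown in this document.

```r
# Does `expr` contain a call to `name` in tail position? (toy version)
has_self_tail_call <- function(name, expr) {
  if (!is.call(expr)) return(FALSE)
  op <- expr[[1]]
  if (!is.symbol(op)) return(FALSE)
  op <- as.character(op)
  if (op == name) return(TRUE)                 # the call itself is the tail
  if (op == "if") {
    branches <- as.list(expr)[-(1:2)]          # drop `if` and the test
    return(any(vapply(branches,
                      function(b) has_self_tail_call(name, b),
                      logical(1))))
  }
  if (op == "begin") {
    # Only the last form of a begin is in tail position.
    return(has_self_tail_call(name, expr[[length(expr)]]))
  }
  FALSE                                        # any other call hides the tail
}

# Tail call: the recursive call is the whole else branch.
has_self_tail_call("fact", quote(if (n < 2) acc else fact(n - 1, acc * n)))  # TRUE

# Not a tail call: the result is still multiplied by n afterwards.
has_self_tail_call("fact", quote(if (n < 2) 1 else n * fact(n - 1)))         # FALSE
```

The second case is why an accumulator parameter is needed before a recursive function qualifies for the loop rewrite.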
For example:
arl> (define factorial
arl> (lambda (n acc)
arl> (if (< n 2)
arl> acc
arl> (factorial (- n 1) (* acc n)))))
#> <function>
compiles to something like:
function(n, acc) {
  while (TRUE) {
    if (.__true_p(n < 2)) {
      return(acc)
    } else {
      .__tco_n <- n - 1
      .__tco_acc <- acc * n
      n <- .__tco_n
      acc <- .__tco_acc
      next
    }
  }
}
Self-TCO works with destructuring parameters, keyword arguments, and rest parameters. Since self-calls become loop iterations, recursive frames do not appear in stack traces – only the outermost call is visible.
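The temporaries in the generated loop matter because all new argument values must be computed from the old parameters before any parameter is overwritten. The sketch below contrasts the two orderings in plain R; both function names are hypothetical, and the truthiness wrapper is left out for brevity.

```r
# Loop form with temporaries, mirroring the shape of the compiled output above.
factorial_loop <- function(n, acc) {
  while (TRUE) {
    if (n < 2) {
      return(acc)
    } else {
      tmp_n <- n - 1        # both new values are computed from the OLD n and acc
      tmp_acc <- acc * n
      n <- tmp_n
      acc <- tmp_acc
      next
    }
  }
}

# Naive variant without temporaries: acc is computed from the already-updated n.
factorial_naive <- function(n, acc) {
  while (TRUE) {
    if (n < 2) {
      return(acc)
    } else {
      n <- n - 1
      acc <- acc * n        # bug: uses the new n, not the one the call received
      next
    }
  }
}

factorial_loop(5, 1)    # 120
factorial_naive(5, 1)   # 24
```

Because the loop reuses a single frame, the rewritten function also handles iteration counts that would exhaust the call stack in the recursive form.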
Constant folding and dead code elimination
The compiler evaluates pure function calls on compile-time literals
at compile time. This is controlled by try_constant_fold,
which maintains a list of safe functions (arithmetic, comparisons, math,
string operations).
When an if test is a compile-time constant, the compiler
eliminates the dead branch entirely (eval_constant_test in
compile_if). For example:
arl> (if #t "yes" "no")
#> "yes"
compiles to just "yes". Constant tests are less contrived
than this example suggests: they often arise as the result of macro
expansion rather than being written by hand.
Constant folding is automatically disabled when coverage tracking is active, because folding would bypass the instrumented function bodies and produce inaccurate coverage numbers.
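A rough sketch of both optimizations over plain R calls follows. `fold_toy`, `is_literal`, and the contents of `safe_functions` are assumptions made for illustration; the actual whitelist in try_constant_fold and the rules in eval_constant_test are not reproduced here, and R's TRUE stands in for Arl's #t.

```r
# Assumed whitelist of pure functions that are safe to run at compile time.
safe_functions <- c("+", "-", "*", "/", "<", ">", "==", "sqrt", "nchar")

# A compile-time literal: a single atomic value, not a symbol or a call.
is_literal <- function(x) !is.language(x) && is.atomic(x) && length(x) == 1

fold_toy <- function(expr) {
  if (!is.call(expr)) return(expr)
  op <- expr[[1]]
  args <- lapply(as.list(expr)[-1], fold_toy)      # fold inner calls first
  if (is.symbol(op)) {
    name <- as.character(op)
    # Dead code elimination: a constant test selects one branch.
    if (name == "if" && length(args) == 3 &&
        is_literal(args[[1]]) && is.logical(args[[1]]) && !is.na(args[[1]])) {
      return(if (args[[1]]) args[[2]] else args[[3]])
    }
    # Constant folding: a pure function applied only to literals.
    if (name %in% safe_functions && length(args) > 0 &&
        all(vapply(args, is_literal, logical(1)))) {
      return(do.call(name, args))
    }
  }
  as.call(c(list(op), args))                       # rebuild with folded args
}

fold_toy(quote(1 + 2 * 3))                         # 7
fold_toy(quote(if (1 < 2) "yes" else "no"))        # "yes"
fold_toy(quote(if (x < 2) "yes" else "no"))        # unchanged: x is not a literal
```

Folding the arguments before inspecting the operator is what lets a test such as `1 < 2` become a constant in time for the branch to be eliminated.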
Coverage instrumentation
When a CoverageTracker is attached to the engine, the
compiler inserts tracking calls at three points:
- Statement-level: Before each statement in a lambda body, a .__coverage_track(file, start_line, end_line) call is interleaved.
- Branch-level: Both branches of if expressions are wrapped with coverage calls, tracking which branches execute.
- If-test narrowing: For if forms, coverage is narrowed to just the test line (since branches are tracked separately).
The tracker maintains a set of coverable lines derived from AST analysis, and coverage reports compare executed lines against this set.
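Statement-level interleaving can be sketched as follows. Everything here is a simplified assumption: `track` is a hypothetical one-argument stand-in for `.__coverage_track(file, start_line, end_line)`, statements are identified by their index rather than by source lines, and `instrument_body` is not a function from the Arl sources.

```r
# Record of which statements have run, keyed by statement index.
executed <- new.env()

# Hypothetical stand-in for .__coverage_track; the real call takes a file and a line range.
track <- function(line) {
  assign(as.character(line), TRUE, envir = executed)
  invisible(NULL)
}

# Interleave a tracking call before each statement and wrap the result in { }.
instrument_body <- function(statements) {
  pieces <- list()
  for (i in seq_along(statements)) {
    pieces <- c(pieces, list(call("track", i), statements[[i]]))
  }
  as.call(c(list(as.name("{")), pieces))
}

statements <- list(quote(a <- 1), quote(b <- a + 1))
instrumented <- instrument_body(statements)
result <- eval(instrumented)     # runs track(1), a <- 1, track(2), b <- a + 1

# Compare executed statements against the coverable set.
coverable <- as.character(seq_along(statements))
covered <- intersect(coverable, ls(executed))
length(covered) / length(coverable)   # 1
```

Since the tracking calls sit inside the function body, any optimization that skips the body would also skip them, which is consistent with folding being disabled under coverage.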
Reserved internal names
If you inspect compiled output (via inspect_compilation)
or peek inside Arl environments, you will see names that start with
.__ (dot, underscore, underscore). These are
reserved for Arl’s internal machinery and should not be
used in user code.
The convention serves two purposes:
- The leading . hides names from ls(), keeping the environment tidy.
- The .__ prefix signals “internal – do not touch.”
Examples you might encounter:
| Name | Purpose |
|---|---|
| .__env | Reference to the enclosing Arl environment |
| .__true_p | Truthiness predicate for if tests |
| .__assign_pattern | Destructuring assignment helper |
| .__tmp__N | Compiler-generated temporaries |
| .__tco_<param> | Swap variables for tail-call optimization |
| .__module | Sentinel marking module environments |
Attempting to bind a .__-prefixed name with
define or set! is an error:
Error: define cannot bind reserved name '.__foo' (names starting with '.__' are internal)
This guard is enforced at both compile time and runtime. It cannot prevent all access (you can always reach into an R environment directly), but it makes the boundary between user code and internal machinery explicit.
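Both halves of the convention can be demonstrated in plain R. The `ls()` behaviour is standard R; `check_bindable` is a hypothetical helper that only imitates the guard described above, reusing the error text shown, and is not the function Arl uses.

```r
# Hypothetical guard imitating the check performed by define and set!.
check_bindable <- function(name, form = "define") {
  if (startsWith(name, ".__")) {
    stop(sprintf(
      "%s cannot bind reserved name '%s' (names starting with '.__' are internal)",
      form, name
    ), call. = FALSE)
  }
  invisible(name)
}

# Dot-prefixed names are hidden from ls() unless all.names = TRUE is given.
env <- new.env()
assign(".__env", env, envir = env)
assign("x", 1, envir = env)
ls(env)                      # "x"
ls(env, all.names = TRUE)    # includes ".__env" as well

# The guard rejects reserved names and lets ordinary ones through.
msg <- tryCatch(check_bindable(".__foo"), error = function(e) conditionMessage(e))
msg
check_bindable("foo")
```

The direct `assign()` call above is exactly the kind of access the guard cannot stop, which is why the prefix works as a convention rather than a hard barrier.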