Table of Contents
- Introduction: The Allure of Tagless Final
- The Goal: “Tagless Initial” Encoding
- A First Look: The Rust Expression and Its Assembly
- The Interpreter: A Simple
eval
Implementation - The Magic Behind the Curtain: GADT-like Enums
- Core Components:
Cursor
andAttic
- The Zero-Cost Proof: An Experiment with the
never
Type - Conclusion
Introduction: The Allure of Tagless Final
In the world of functional programming, the “Tagless Final” pattern is a wonderful abstraction for creating embedded domain-specific languages (DSLs). It allows you to define an interface for your language’s operations and then write multiple interpreters (e.g., one to evaluate, one to pretty-print, one to optimize) without changing the core program logic.
A key test for a systems language like Rust is its ability to adopt such high-level abstractions without sacrificing its core promise: zero-cost performance. This post explores how to implement the “tagless initial” variant of this pattern, which relies on Generalized Algebraic Data Types (GADTs), in Rust. We will demonstrate that with careful type-level programming, we can build these expressive structures and have the compiler completely erase them, resulting in optimal assembly code.
The Goal: “Tagless Initial” Encoding
The “tagless initial” encoding, as described in resources like Serokell’s Introduction to Tagless Final, uses a GADT to represent expressions. In Haskell, it looks like this:
data Expr a where
IntConst :: Int -> Expr Int
Lambda :: (Expr a -> Expr b) -> Expr (Expr a -> Expr b)
Apply :: Expr (Expr a -> Expr b) -> Expr a -> Expr b
Add :: Expr Int -> Expr Int -> Expr Int
eval :: Expr a -> a
eval (IntConst x) = x
eval (Lambda f) = f
eval (Apply f x) = eval (eval f x)
eval (Add l r) = (eval l) + (eval r)
Notice that the type of the expression Expr a
is tied to the type of value it will produce (a
). Our goal is to replicate this structure and its eval
function in Rust and verify that it compiles down to nothing but the computed result.
A First Look: The Rust Expression and Its Assembly
Let’s dive right in. Here is a Rust function that constructs a complex expression using our GADT-style encoding. It defines several integer constants, lambdas (including a higher-order one), and applications.
fn expr(u: isize, v: isize, w: isize) -> Gadt<cu::Int, impl Attic> {
let a = int_const(u);
let b = int_const(v);
let c = add::<(), _, _>(a, b);
let d = lambda::<(), _, _, _, _, _>(move |x| {
let a = int_const(u * 2 + v * 3 + w * 5);
add::<(), _, _>(a, x)
});
let e = apply::<(), _, _, _>(d, c);
let f = lambda::<(), _, _, _, _, _>(move |x| {
let a = int_const(u * 3 + v * 5 + w * 13);
add::<(), _, _>(add::<(), _, _>(a, x), c)
});
let j = lambda::<(), _, _, _, _, _>(move |x: Gadt<_, _>| -> Gadt<_, _> {
apply::<(), _, cu::Int, _>(x, e)
});
apply::<(), _, _, _>(j, f)
}
#[inline(never)]
pub extern "C" fn eval_expr(u: isize, v: isize, w: isize) -> isize {
expr(u, v, w).eval()
}
This looks like a heavy, complex structure. However, when we compile this in release mode and inspect the assembly for eval_expr
, we see something magical:
playground::eval_expr: # @playground::eval_expr
# %bb.0:
leaq (%rdx,%rdx,2), %rax
leaq (%rdx,%rax,4), %r8
addq %rsi, %rdx
leaq (%rdx,%rdx,4), %rdx
leaq (%rsi,%rdi,2), %rax
leaq (%rax,%rax,2), %rcx
leaq (%rdi,%rsi,2), %rax
addq %r8, %rax
addq %rdx, %rax
addq %rcx, %rax
retq
The entire expression tree, the lambdas, the apply
calls—it has all been boiled down to a series of arithmetic instructions (leaq
, addq
). There is no interpreter loop, no dynamic dispatch, no memory allocation. This is the “zero-cost” promise fulfilled.
The Interpreter: A Simple eval
Implementation
How is this possible? The evaluation logic is defined in an Eval
trait. Its implementation for our Gadt
type is a simple match
statement that recursively calls eval
on its components.
pub trait Eval<Cur: Cursor, Att: Attic> {
fn eval(self) -> SolOf<Cur, Att>;
}
impl<Cur: Cursor, Att: Attic> Eval<Cur, Att> for Gadt<Cur, Att> {
fn eval(self) -> SolOf<Cur, Att> {
match self.0 {
Enum::IntConst(v01, a) => v01.vu_cast::<_, _, _, _>(&a)(v01.get(a)),
Enum::Lambda(v02, f) => v02.vu_cast::<_, _, _, _>(&f)(v02.get(f)),
Enum::Apply(v03, t) => v03._vu_cast::<_, _, _, _, _>(&t)({
let (f, a) = v03.get(t);
let f = f.eval();
f(a).eval()
}),
Enum::Add(v04, t) => v04.vu_cast::<_, _, _, _>(&t)({
let (a, b) = v04.get(t);
let a = a.eval();
let b = b.eval();
a + b
}),
}
}
}
At first glance, this looks like a standard interpreter that would involve runtime overhead. The key to its optimization lies in the definition of the Gadt
and Enum
types.
The Magic Behind the Curtain: GADT-like Enums
The core of this technique is an enum
that uses Rust’s never
type (!
) to ensure that for any given type signature, only one variant is actually constructible. This effectively removes the “tag” from the enum, as the compiler knows at compile time which variant is in use.
pub struct Gadt<Cur: Cursor, Att: Attic>(
Enum<Cur::_V01<Att>, Cur::_V02<Att>, Cur::_V03<Att>, Cur::_V04<Att>, Cur, Att>,
);
pub enum Enum<V01: Ng, V02: Ng, V03: Ng, V04: Ng, Cur: Cursor, Att: Attic> {
__Ph__(!, Ph<(V01, V02, V03, V04, Cur, Att)>),
IntConst(V01, V01::NGuard<isize, Cur, cu::Int, Att>),
Lambda(
V02,
V02::NGuard<
Att::Fun<ReprOf<Att, Cur::_1, Att::_1>, ReprOf<Att, Cur::_2, Att::_2>>,
Cur,
cu::ReprFun<Cur::_1, Cur::_2>,
Att,
>,
),
Apply(
V03,
V03::NGuard<
(
ReprOf<Att, cu::ReprFun<Att::DomCur, V03::CGuard<Cur>>, Att::FunAtt>,
ReprOf<Att, Att::DomCur, _1Of<Att::FunAtt>>,
),
Cur,
V03::CGuard<Cur>,
_2Of<Att::FunAtt>,
>,
),
Add(
V04,
V04::NGuard<
(ReprOf<Att, cu::Int, Att::_1>, ReprOf<Att, cu::Int, Att::_2>),
Cur,
cu::Int,
Att,
>,
),
}
The types V01
through V04
are controlled by the Attic
trait. By setting these to !
, we make it impossible to construct the corresponding variant. Since a value of type !
can never be created, the compiler can prove that code path is unreachable. The NGuard
trait is a helper that ensures any variant “disabled” with !
has a size of zero, allowing the enum
to collapse to the size of its single active variant.
Core Components: Cursor
and Attic
The two orchestrating traits are Cursor
and Attic
. They work together as a type-level configuration system:
Attic
: This trait holds the actual information about whichEnum
constructors are disabled. In its default state, it sets all_V*
types to!
, effectively disabling all variants. To enable a constructor, a specificAttic
implementation will provide a non-!
type.Cursor
: This trait acts as a filter or view, selecting which configuration fromAttic
to apply at each point in the expression tree.
pub trait Attic {
// ... defaults to `!` ...
type _V01: NGuard = !;
type _V02: NGuard = !;
type _V03: NGuard = !;
type _V04: NGuard = !;
// ... other associated types ...
}
pub trait Cursor {
// ... uses the configuration from Attic ...
type _V01<Att: Attic>: NGuard = Att::_V01;
type _V02<Att: Attic>: NGuard = Att::_V02;
type _V03<Att: Attic>: NGuard = Att::_V03;
type _V04<Att: Attic>: NGuard = Att::_V04;
// ... other associated types ...
}
Through this mechanism, we can construct a Gadt
type where only one of the Enum
variants is valid, making pattern matching in eval
fully predictable at compile time.
The Zero-Cost Proof: An Experiment with the never
Type
What happens if we break this invariant? Let’s conduct an experiment. We’ll replace !
with a concrete, zero-sized type like ((),)
in our Attic
trait. This means that all variants are now theoretically constructible.
// In Attic, we change the default:
// from: type _V01: NGuard = !;
// to: type _V01: NGuard = ((),); // and for V02, V03, V04
Suddenly, the compiler can no longer guarantee which variant is active. It must now include a tag (discriminant) and perform runtime checks. The generated assembly explodes:
playground::eval_expr: # @playground::eval_expr
# %bb.0:
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $32, %rsp
...
callq <playground::Gadt<Cur,Att> as playground::Eval<Cur,Att>>::eval
...
callq <playground::Gadt<Cur,Att> as playground::Eval<Cur,Att>>::eval
...
popq %r15
retq
We now see explicit calls to eval
. The abstraction is no longer zero-cost; it’s being interpreted at runtime. This demonstrates that the never
type is the critical component for achieving taglessness and enabling compiler optimization.
One might think that simply adding #[inline(always)]
to eval
could fix this. Indeed, in this simple case, inlining can help the optimizer untangle the calls and produce much better assembly. However, this is not a robust solution. It relies on optimizer heroics and will likely fail in more complex, modular programs where DSLs are nested or defined across different crates. The never
type approach guarantees the optimization by construction.
Conclusion
By carefully using Rust’s type system, particularly the never
type (!
), we can successfully implement the “tagless initial” pattern. We created a Gadt
-style enum where only one variant is constructible for any given type, effectively removing the need for a runtime tag. This allows the compiler to see through the abstraction entirely, collapsing a complex expression tree into its raw computational equivalent.
This technique provides a powerful blueprint for building high-level, expressive DSLs in Rust without compromising on the performance expectations of a systems language.
You can experiment with the full code yourself:
Join this blog discussion: