The Little Book of Rust Macros
Note: This is a continuation of Daniel Keep's Book which has not been updated since the early summer of 2016, adapted to make use of mdBook.
View the rendered version here and the repository here.
This book is an attempt to distill the Rust community's collective knowledge of Rust macros,
Macros by Example
to be precise. As such, both additions (in the form of pull requests) and
requests (in the form of issues) are welcome.
The original Little Book of Rust Macros has helped me
immensely with understanding Macros by Example style macros while I was still learning the
language. Unfortunately, the author of the book has since left the project untouched, while the Rust
language as well as it's macro-system keeps evolving. Which is why I wanted to revive the project
and keep it up to date with current Rust helping other newcomers understand macros, a part of the
language a lot of people seem to have trouble with.
This book expects you to have basic knowledge of Rust, it will not explain language features or constructs that are irrelevant to macros. No prior knowledge of macros is assumed. Having read and understood the first seven chapters of the Rust Book is a must, though having read the majority of the book is recommended.
Thanks
Thanks to Daniel Keep for the original work and thanks to the following people for suggestions and corrections to the original: IcyFoxy, Rym, TheMicroWorm, Yurume, akavel, cmr, eddyb, ogham, and snake_case.
License
This work inherits the licenses of the original, hence it is licensed under both the Creative Commons Attribution-ShareAlike 4.0 International License and the MIT license.
Macros, A Methodical Introduction
This chapter will introduce Rust's Macro-By-Example system: macro_rules!
. Rather
than trying to cover it based on practical examples, it will instead attempt to give you a complete
and thorough explanation of how the system works. As such, this is intended for people who just
want the system as a whole explained, rather than be guided through it.
There is also the Macros chapter of the Rust Book which is a more approachable, high-level explanation, the reference chapter and the practical introduction chapter of this book, which is a guided implementation of a single macro.
Syntax Extensions
Before talking about macro_rules!
, it is worthwhile to discuss the general mechanism they are built on:
syntax extensions. To do that, we must discuss how Rust source is processed by the compiler, and the
general mechanisms on which user-defined macro_rules!
macros are built.
Source Analysis
Tokenization
The first stage of compilation for a Rust program is tokenization. This is where the source text is transformed into a sequence of tokens (i.e. indivisible lexical units; the programming language equivalent of "words"). Rust has various kinds of tokens, such as:
- Identifiers:
foo
,Bambous
,self
,we_can_dance
,LaCaravane
, … - Literals:
42
,72u32
,0_______0
,1.0e-40
,"ferris was here"
, … - Keywords:
_
,fn
,self
,match
,yield
,macro
, … - Symbols:
[
,:
,::
,?
,~
,@
1, …
…among others. There are some things to note about the above: first, self
is both an identifier
and a keyword. In almost all cases, self
is a keyword, but it is possible for it to be
treated as an identifier, which will come up later (along with much cursing). Secondly, the list
of keywords includes some suspicious entries such as yield
and macro
that aren't actually in
the language, but are parsed by the compiler—these are reserved for future use. Third, the list
of symbols also includes entries that aren't used by the language. In the case of <-
, it is
vestigial: it was removed from the grammar, but not from the lexer. As a final point, note that
::
is a distinct token; it is not simply two adjacent :
tokens. The same is true of all
multi-character symbol tokens in Rust, as of Rust 1.2.2
@
has a purpose, though most people seem to forget about it completely: it is used
in patterns to bind a non-terminal part of the pattern to a name.
Technically rust currently(1.46) has two lexers, rustc_lexer
which only emits
single character symbols as tokens and the lexer in rustc_parse
which sees multi-character
symbols as distinct tokens.
As a point of comparison, it is at this stage that some languages have their macro layer, though Rust does not. For example, C/C++ macros are effectively processed at this point. 3 This is why the following code works: 4
#define SUB void
#define BEGIN {
#define END }
SUB main() BEGIN
printf("Oh, the horror!\n");
END
In fact, the C preprocessor uses a different lexical structure to C itself, but the distinction is broadly irrelevant.
Whether it should work is an entirely different question.
Parsing
The next stage is parsing, where the stream of tokens is turned into an Abstract Syntax Tree (AST).
This involves building up the syntactic structure of the program in memory. For example, the token
sequence 1 + 2
is transformed into the equivalent of:
┌─────────┐ ┌─────────┐
│ BinOp │ ┌╴│ LitInt │
│ op: Add │ │ │ val: 1 │
│ lhs: ◌ │╶┘ └─────────┘
│ rhs: ◌ │╶┐ ┌─────────┐
└─────────┘ └╴│ LitInt │
│ val: 2 │
└─────────┘
The AST contains the structure of the entire program, though it is based on purely lexical
information. For example, although the compiler may know that a particular expression is referring
to a variable called a
, at this stage, it has no way of knowing what a
is, or even where it
comes from.
It is after the AST has been constructed that macros are processed. However, before we can discuss that, we have to talk about token trees.
Token trees
Token trees are somewhere between tokens and the AST. Firstly, almost all tokens are also token trees; more specifically, they are leaves. There is one other kind of thing that can be a token tree leaf, but we will come back to that later.
The only basic tokens that are not leaves are the "grouping" tokens: (...)
, [...]
, and {...}
.
These three are the interior nodes of token trees, and what give them their structure. To give a
concrete example, this sequence of tokens:
a + b + (c + d[0]) + e
would be parsed into the following token trees:
«a» «+» «b» «+» «( )» «+» «e»
╭────────┴──────────╮
«c» «+» «d» «[ ]»
╭─┴─╮
«0»
Note that this has no relationship to the AST the expression would produce; instead of a single root node, there are seven token trees at the root level. For reference, the AST would be:
┌─────────┐
│ BinOp │
│ op: Add │
┌╴│ lhs: ◌ │
┌─────────┐ │ │ rhs: ◌ │╶┐ ┌─────────┐
│ Var │╶┘ └─────────┘ └╴│ BinOp │
│ name: a │ │ op: Add │
└─────────┘ ┌╴│ lhs: ◌ │
┌─────────┐ │ │ rhs: ◌ │╶┐ ┌─────────┐
│ Var │╶┘ └─────────┘ └╴│ BinOp │
│ name: b │ │ op: Add │
└─────────┘ ┌╴│ lhs: ◌ │
┌─────────┐ │ │ rhs: ◌ │╶┐ ┌─────────┐
│ BinOp │╶┘ └─────────┘ └╴│ Var │
│ op: Add │ │ name: e │
┌╴│ lhs: ◌ │ └─────────┘
┌─────────┐ │ │ rhs: ◌ │╶┐ ┌─────────┐
│ Var │╶┘ └─────────┘ └╴│ Index │
│ name: c │ ┌╴│ arr: ◌ │
└─────────┘ ┌─────────┐ │ │ ind: ◌ │╶┐ ┌─────────┐
│ Var │╶┘ └─────────┘ └╴│ LitInt │
│ name: d │ │ val: 0 │
└─────────┘ └─────────┘
It is important to understand the distinction between the AST and token trees. When writing macros, you have to deal with both as distinct things.
One other aspect of this to note: it is impossible to have an unpaired paren, bracket or brace; nor is it possible to have incorrectly nested groups in a token tree.
Macros in the AST
As previously mentioned, macro processing in Rust happens after the construction of the AST. As such, the syntax used to invoke a macro must be a proper part of the language's syntax. In fact, there are several "syntax extension" forms which are part of Rust's syntax. Specifically, the following 4 forms (by way of examples):
# [ $arg ]
; e.g.#[derive(Clone)]
,#[no_mangle]
, …# ! [ $arg ]
; e.g.#![allow(dead_code)]
,#![crate_name="blang"]
, …$name ! $arg
; e.g.println!("Hi!")
,concat!("a", "b")
, …$name ! $arg0 $arg1
; e.g.macro_rules! dummy { () => {}; }
.
The first two are attributes which annotate items, expressions and statements. They can be classified into different kinds, built-in attributes, macro attributes and derive attributes. macro attributes and derive attributes can be implemented with the second macro system that Rust offers, procedural macros. built-in attributes on the other hand are attributes implemented by the compiler.
The third form, $name ! $arg
, is the one of interest to us: function-like macros. It is the form
available for use with macro_rules!
macros. Note that this form is not limited to macro_rules!
macros: it is a generic syntax extension form. For example, whilst format!
is a macro,
format_args!
(which is used to implement format!
) is not.
The fourth form is essentially a variation which is not available to macros. In fact, the only case
where this form is used at all is with macro_rules!
itself which, again we will come back to.
Disregarding all but the third form ($name ! $arg
), the question becomes: how does the Rust parser
know what $arg
looks like for every possible syntax extension? The answer is that it doesn't
have to. Instead, the argument of a syntax extension invocation is a single token tree. More
specifically, it is a single, non-leaf token tree; (...)
, [...]
, or {...}
. With that
knowledge, it should become apparent how the parser can understand all of the following invocation
forms:
bitflags! {
struct Color: u8 {
const RED = 0b0001,
const GREEN = 0b0010,
const BLUE = 0b0100,
const BRIGHT = 0b1000,
}
}
lazy_static! {
static ref FIB_100: u32 = {
fn fib(a: u32) -> u32 {
match a {
0 => 0,
1 => 1,
a => fib(a-1) + fib(a-2)
}
}
fib(100)
};
}
fn main() {
use Color::*;
let colors = vec![RED, GREEN, BLUE];
println!("Hello, World!");
}
Although the above invocations may look like they contain various kinds of Rust code, the parser simply sees a collection of meaningless token trees. To make this clearer, we can replace all these syntactic "black boxes" with ⬚, leaving us with:
bitflags! ⬚
lazy_static! ⬚
fn main() {
let colors = vec! ⬚;
println! ⬚;
}
Just to reiterate: the parser does not assume anything about ⬚; it remembers the tokens it contains, but doesn't try to understand them. This means ⬚ can be anything, even invalid Rust! As to why this is a good thing, we will come back to that at a later point.
The important takeaways are:
- There are multiple kinds of syntax extension in Rust. We will only be talking about macros
defined by the
macro_rules!
construct. - Just because you see something of the form
$name! $arg
, doesn't mean it's actually a macro; it might be another kind of syntax extension, a procedural macro for example. - The input to every macro is a single non-leaf token tree.
- Macros (really, syntax extensions in general) are parsed as part of the abstract syntax tree.
Aside: due to the first point, some of what will be said below (including the next paragraph) will apply to syntax extensions in general.1
This is rather convenient as "macro" is much quicker and easier to type than "syntax extension".
The last point is the most important, as it has significant implications. Because macros are parsed into the AST, they can only appear in positions where they are explicitly supported. Specifically macros can appear in place of the following:
- Patterns
- Statements
- Expressions
- Items(this includes
impl
Items) - Types
Some things not on this list:
- Identifiers
- Match arms
- Struct fields
There is absolutely, definitely no way to use macros in any position not on the first list.
Expansion
Expansion is a relatively simple affair. At some point after the construction of the AST, but before the compiler begins constructing its semantic understanding of the program, it will expand all macros.
This involves traversing the AST, locating macro invocations and replacing them with their expansion. In the case of non-macro syntax extensions, how this happens is up to the particular syntax extension. That said, syntax extensions go through exactly the same process that macros do once their expansion is complete.
Once the compiler has run a syntax extension, it expects the result to be parsable as one of a limited set of syntax elements, based on context. For example, if you invoke a macro at module scope, the compiler will parse the result into an AST node that represents an item. If you invoke a macro in expression position, the compiler will parse the result into an expression AST node.
In fact, it can turn a syntax extension result into any of the following:
- an expression,
- a pattern,
- a type,
- zero or more items, or
- zero or more statements.
In other words, where you can invoke a macro determines what its result will be interpreted as.
The compiler will take this AST node and completely replace the macro's invocation node with the output node. This is a structural operation, not a textural one!
For example, consider the following:
let eight = 2 * four!();
We can visualize this partial AST as follows:
┌─────────────┐
│ Let │
│ name: eight │ ┌─────────┐
│ init: ◌ │╶─╴│ BinOp │
└─────────────┘ │ op: Mul │
┌╴│ lhs: ◌ │
┌────────┐ │ │ rhs: ◌ │╶┐ ┌────────────┐
│ LitInt │╶┘ └─────────┘ └╴│ Macro │
│ val: 2 │ │ name: four │
└────────┘ │ body: () │
└────────────┘
From context, four!()
must expand to an expression (the initializer can only be an expression).
Thus, whatever the actual expansion is, it will be interpreted as a complete expression. In this
case, we will assume four!
is defined such that it expands to the expression 1 + 3
. As a result,
expanding this invocation will result in the AST changing to:
┌─────────────┐
│ Let │
│ name: eight │ ┌─────────┐
│ init: ◌ │╶─╴│ BinOp │
└─────────────┘ │ op: Mul │
┌╴│ lhs: ◌ │
┌────────┐ │ │ rhs: ◌ │╶┐ ┌─────────┐
│ LitInt │╶┘ └─────────┘ └╴│ BinOp │
│ val: 2 │ │ op: Add │
└────────┘ ┌╴│ lhs: ◌ │
┌────────┐ │ │ rhs: ◌ │╶┐ ┌────────┐
│ LitInt │╶┘ └─────────┘ └╴│ LitInt │
│ val: 1 │ │ val: 3 │
└────────┘ └────────┘
This can be written out like so:
let eight = 2 * (1 + 3);
Note that we added parentheses despite them not being in the expansion. Remember that the compiler always treats the expansion of a macro as a complete AST node, not as a mere sequence of tokens. To put it another way, even if you don't explicitly wrap a complex expression in parentheses, there is no way for the compiler to "misinterpret" the result, or change the order of evaluation.
It is important to understand that macro expansions are treated as AST nodes, as this design has two further implications:
- In addition to there being a limited number of invocation positions, macros can only expand to the kind of AST node the parser expects at that position.
- As a consequence of the above, macros absolutely cannot expand to incomplete or syntactically invalid constructs.
There is one further thing to note about expansion: what happens when a syntax extension expands to
something that contains another syntax extension invocation. For example, consider an alternative
definition of four!
; what happens if it expands to 1 + three!()
?
let x = four!();
Expands to:
let x = 1 + three!();
This is resolved by the compiler checking the result of expansions for additional macro invocations, and expanding them. Thus, a second expansion step turns the above into:
let x = 1 + 3;
The takeaway here is that expansion happens in "passes"; as many as is needed to completely expand all invocations.
Well, not quite. In fact, the compiler imposes an upper limit on the number of such recursive passes it is willing to run before giving up. This is known as the macro recursion limit and defaults to 32. If the 32nd expansion contains a macro invocation, the compiler will abort with an error indicating that the recursion limit was exceeded.
This limit can be raised using the #![recursion_limit="…"]
attribute, though it must be done
crate-wide. Generally, it is recommended to try and keep macros below this limit wherever possible
as it may impact compilation times.
macro_rules!
With all that in mind, we can introduce macro_rules!
itself. As noted previously, macro_rules!
is itself a syntax extension, meaning it is technically not part of the Rust syntax. It uses the
following forms:
macro_rules! $name {
$rule0 ;
$rule1 ;
// …
$ruleN ;
}
There must be at least one rule, and you can omit the semicolon after the last rule. You can use
brackets([]
), parentheses(()
) or braces({}
). Invocations with { .. }
and ( ... );
, notice
the trailing semicolon, will always be parsed as an item.
Each "rule" looks like the following:
($matcher) => {$expansion}
Like before, the types of parentheses used can be any kind, but parentheses around the matcher and braces around the expansion are somewhat conventional. The expansion part of a rule is also called its transcriber.
If you are wondering, the macro_rules!
invocation expands to... nothing. At least, nothing that
appears in the AST; rather, it manipulates compiler-internal structures to register the macro. As
such, you can technically use macro_rules!
in any position where an empty expansion is valid.
Matching
When a macro_rules!
macro is invoked, the macro_rules!
interpreter goes through the rules one by
one, in declaration order. For each rule, it tries to match the contents of the input token tree
against that rule's matcher
. A matcher must match the entirety of the input to be considered a
match.
If the input matches the matcher, the invocation is replaced by the expansion
; otherwise, the next
rule is tried. If all rules fail to match, macro expansion fails with an error.
The simplest example is of an empty matcher:
macro_rules! four {
() => { 1 + 3 };
}
This matches if and only if the input is also empty (i.e. four!()
, four![]
or four!{}
).
Note that the specific grouping tokens you use when you invoke the macro are not matched. That is,
you can invoke the above macro as four![]
and it will still match. Only the contents of the
input token tree are considered.
Matchers can also contain literal token trees, which must be matched exactly. This is done by simply
writing the token trees normally. For example, to match the sequence 4 fn ['spang "whammo"] @_@
,
you would write:
macro_rules! gibberish {
(4 fn ['spang "whammo"] @_@) => {...};
}
You can use any token tree that you can write.
Metavariables
Matchers can also contain captures. These allow input to be matched based on some general grammar category, with the result captured to a metavariable which can then be substituted into the output.
Captures are written as a dollar ($
) followed by an identifier, a colon (:
), and finally the
kind of capture which is also called the fragment-specifier, which must be one of the following:
block
: a block (i.e. a block of statements and/or an expression, surrounded by braces)expr
: an expressionident
: an identifier (this includes keywords)item
: an item, like a function, struct, module, impl, etc.lifetime
: a lifetime (e.g.'foo
,'static
, ...)literal
: a literal (e.g."Hello World!"
,3.14
,'🦀'
, ...)meta
: a meta item; the things that go inside the#[...]
and#![...]
attributespat
: a patternpath
: a path (e.g.foo
,::std::mem::replace
,transmute::<_, int>
, …)stmt
: a statementtt
: a single token treety
: a typevis
: a possible empty visibility qualifier (e.g.pub
,pub(in crate)
, ...)
For more in-depth description of the fragement specifiers, check out the Fragment Specifiers chapter.
For example, here is a macro_rules!
macro which captures its input as an expression under the
metavariable $e
:
macro_rules! one_expression {
($e:expr) => {...};
}
These metavariables leverage the Rust compiler's parser, ensuring that they are always "correct". An
expr
metavariables will always capture a complete, valid expression for the version of Rust being
compiled.
You can mix literal token trees and metavariables, within limits ([explained below]).
To refer to a metavariable you simply write $name
, as the type of the variable is already
specified in the matcher. For example:
macro_rules! times_five {
($e:expr) => { 5 * $e };
}
Much like macro expansion, metavariables are substituted as complete AST nodes. This means that no
matter what sequence of tokens is captured by $e
, it will be interpreted as a single, complete
expression.
You can also have multiple metavariables in a single matcher:
macro_rules! multiply_add {
($a:expr, $b:expr, $c:expr) => { $a * ($b + $c) };
}
And use them as often as you like in the expansion:
macro_rules! discard {
($e:expr) => {};
}
macro_rules! repeat {
($e:expr) => { $e; $e; $e; };
}
There is also a special metavariable called $crate
which can be used to refer to the current
crate.
Repetitions
Matchers can contain repetitions. These allow a sequence of tokens to be matched. These have the
general form $ ( ... ) sep rep
.
-
$
is a literal dollar token. -
( ... )
is the paren-grouped matcher being repeated. -
sep
is an optional separator token. It may not be a delimiter or one of the repetition operators. Common examples are,
and;
. -
rep
is the required repeat operator. Currently, this can be:?
: indicating at most one repetition*
: indicating zero or more repetitions+
: indicating one or more repetitions
Since
?
represents at most one occurrence, it cannot be used with a separator.
Repetitions can contain any other valid matcher, including literal token trees, metavariables, and other repetitions allowing arbitrary nesting.
Repetitions use the same syntax in the expansion and repeated metavariables can only be accessed inside of repetitions in the expansion.
For example, below is a macro which formats each element as a string. It matches zero or more comma-separated expressions and expands to an expression that constructs a vector.
macro_rules! vec_strs { ( // Start a repetition: $( // Each repeat must contain an expression... $element:expr ) // ...separated by commas... , // ...zero or more times. * ) => { // Enclose the expansion in a block so that we can use // multiple statements. { let mut v = Vec::new(); // Start a repetition: $( // Each repeat will contain the following statement, with // $element replaced with the corresponding expression. v.push(format!("{}", $element)); )* v } }; } fn main() { let s = vec_strs![1, "a", true, 3.14159f32]; assert_eq!(s, &["1", "a", "true", "3.14159"]); }
You can repeat multiple metavariables in a single repetition as long as all metavariables repeat equally often. So this invocation of the following macro works:
#![allow(unused)] fn main() { macro_rules! repeat_two { ($($i:ident)*, $($i2:ident)*) => { $( let $i: (); let $i2: (); )* } } repeat_two!( a b c d e f, u v w x y z ); }
But this does not:
#![allow(unused)] fn main() { macro_rules! repeat_two { ($($i:ident)*, $($i2:ident)*) => { $( let $i: (); let $i2: (); )* } } repeat_two!( a b c d e f, x y z ); }
failing with the following error
error: meta-variable `i` repeats 6 times, but `i2` repeats 3 times
--> src/main.rs:6:10
|
6 | $( let $i: (); let $i2: (); )*
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For the complete grammar definition you may want to consult the Macros By Example chapter of the Rust reference.
Minutiae
This section goes through some of the finer details of the macro_rules!
system. At a minimum, you should
try to be at least aware of these details and issues.
Fragment Specifiers
As shown in the [macro_rules
] chapter, Rust, as of 1.46, has 13 fragment specifiers. This section
will go a bit more into detail for some of them and tries to always show a few examples of what a
matcher can match with.
Note that capturing with anything but the
ident
,lifetime
andtt
fragments will render the captured AST opaque, making it impossible to further inspect it in future macro invocations.
item
The item
fragment simply matches any of Rust's
item definitions, not identifiers that refer to
items. Item examples:
macro_rules! items { ($($item:item)*) => (); } items! { struct Foo; enum Bar { Baz } impl Foo {} /*...*/ } fn main() {}
block
The block
fragment solely matches a block expression,
which consists of and opening {
brace, followed by any amount of statements and finally followed
by a closing }
brace.
macro_rules! blocks { ($($block:block)*) => (); } blocks! { {} { let zig; } { 2 } } fn main() {}
stmt
The statement
fragment solely matches a statement
without its trailing semicolon, unless its an item statement that requires one. What would be an
item statement that requires one? A Unit-Struct would be a simple one, as defining one requires a
trailing semicolon.
Let's use a simple example to show exactly what is meant with this. We use a macro that merely emits what it captures:
macro_rules! statements {
($($stmt:stmt)*) => ($($stmt)*);
}
fn main() {
statements! {
struct Foo;
fn foo() {}
let zig = 3
let zig = 3;
3
3;
if true {} else {}
{}
}
}
Expanding this, via the playground for example1, gives us roughly the following:
/* snip */
fn main() {
struct Foo;
fn foo() { }
let zig = 3;
let zig = 3;
;
3;
3;
;
if true { } else { }
{ }
}
From this we can tell a few things:
The first you should be able to see immediately is that while the stmt
fragment doesn't capture
trailing semicolons, it still emits them when required, even if the statement is already followed by
one. The simple reason for that is that semicolons on their own are already valid statements. So we
are actually invoking our macro here with not 8 statements, but 11!
Another thing you should be able to notice here is that the trailing semicolon of the struct Foo;
item statement is being matched, otherwise we would've seen an extra one like in the other cases.
This makes sense as we already said, that for item statements that require one, the trailing
semicolon will be matched with.
A last observation is that expressions get emitted back with a trailing semicolon, unless the expression solely consists of only a block expression or control flow expression.
The fine details of what was just mentioned here can be looked up in the reference.
See the debugging chapter for tips on how to do this.
pat
The pat
fragment matches any kind of pattern.
macro_rules! patterns { ($($pat:pat)*) => (); } patterns! { "literal" _ 0..5 ref mut PatternsAreNice } fn main() {}
expr
The expr
fragment matches any kind of expression
(Rust has a lot of them, given it is an expression orientated language).
macro_rules! expressions { ($($expr:expr)*) => (); } expressions! { "literal" funcall() future.await break 'foo bar } fn main() {}
ty
The ty
fragment matches any kind of type expression.
A type expression is the syntax with which one refers to a type in the language.
macro_rules! types { ($($type:ty)*) => (); } types! { foo::bar bool [u8] } fn main() {}
ident
The ident
fragment matches an identifier
or keyword.
macro_rules! idents { ($($ident:ident)*) => (); } idents! { // _ <- This is not an ident, it is a pattern foo async O_________O _____O_____ } fn main() {}
path
The path
fragment matches a so called TypePath
style path.
macro_rules! paths { ($($path:path)*) => (); } paths! { ASimplePath ::A::B::C::D G::<eneri>::C } fn main() {}
tt
The tt
fragment matches a TokenTree. If you need a refresher on what exactly a TokenTree was you
may want to revisit the TokenTree chapter of this
book. The tt
fragment is one of the most powerful fragments, as it can match nearly anything while
still allowing you to inspect the contents of it at a later state in the macro.
meta
The meta
fragment matches an attribute, to
be more precise, the contents of an attribute. You will usually see this fragment being used in a
matcher like #[$meta:meta]
or #![$meta:meta]
.
macro_rules! metas { ($($meta:meta)*) => (); } metas! { ASimplePath super::man path = "home" foo(bar) } fn main() {}
A neat thing about doc comments: They are actually attributes in the form of
#[doc="…"]
where the...
is the actual comment string, meaning you can act on doc comments in macros!
lifetime
The lifetime
fragment matches a lifetime or label.
It's quite similar to ident
but with a prepended '
.
macro_rules! lifetimes { ($($lifetime:lifetime)*) => (); } lifetimes! { 'static 'shiv '_ } fn main() {}
vis
The vis
fragment matches a possibly empty Visibility qualifier.
Emphasis lies on the possibly empty part. You can think of this fragment having an implicit ?
repetition to it, meaning you don't, and in fact cannot, wrap it in a direct repetition.
macro_rules! visibilities { // ∨~~Note this comma, since we cannot repeat a `vis` fragment on its own ($($vis:vis,)*) => (); } visibilities! { , pub, pub(crate), pub(in super), pub(in some_path), } fn main() {}
literal
The literal
fragment matches any literal expression.
macro_rules! literals { ($($literal:literal)*) => (); } literals! { -1 "hello world" 2.3 b'b' true } fn main() {}
Metavariables and Expansion Redux
Once the parser begins consuming tokens for a metavariable, it cannot stop or backtrack. This means that the second rule of the following macro cannot ever match, no matter what input is provided:
macro_rules! dead_rule {
($e:expr) => { ... };
($i:ident +) => { ... };
}
Consider what happens if this macro is invoked as dead_rule!(x+)
. The interpreter will start at
the first rule, and attempt to parse the input as an expression. The first token x
is valid as
an expression. The second token is also valid in an expression, forming a binary addition node.
At this point, given that there is no right-hand side of the addition, you might expect the parser to give up and try the next rule. Instead, the parser will panic and abort the entire compilation, citing a syntax error.
As such, it is important in general that you write macro rules from most-specific to least-specific.
To defend against future syntax changes altering the interpretation of macro input, macro_rules!
restricts what can follow various metavariables. The complete list, showing what may follow what
fragment specifier, as of Rust 1.46 is as follows:
stmt
andexpr
:=>
,,
, or;
pat
:=>
,,
,=
,|
,if
,in
path
andty
:=>
,,
,=
,|
,;
,:
,>
,>>
,[
,{
,as
,where
, or a macro variable of theblock
fragment specifier.vis
:,
, an identifier other than a non-rawpriv
, any token that can begin a type or a metavariable with anident
,ty
, orpath
fragment specifier.- All other fragment specifiers have no restrictions.
Repetitions also adhere to these restrictions, meaning if a repetition can repeat multiple times
(*
or +
), then the contents must be able to follow themselves. If a repetition can repeat zero
times (?
or *
) then what comes after the repetition must be able to follow what comes before.
The parser also does not perform any kind of lookahead. That means if the compiler cannot unambiguously determine how to parse the macro invocation one token at a time, it will abort with an ambiguity error. A simple example that triggers this:
#![allow(unused)] fn main() { macro_rules! ambiguity { ($($i:ident)* $i2:ident) => { }; } // error: // local ambiguity: multiple parsing options: built-in NTs ident ('i') or ident ('i2'). ambiguity!(an_identifier); }
The parser does not look ahead past the identifier to see if the following token is a )
, which
would allow it to parse properly.
One aspect of substitution that often surprises people is that substitution is not token-based, despite very much looking like it.
Consider the following:
macro_rules! capture_then_match_tokens { ($e:expr) => {match_tokens!($e)}; } macro_rules! match_tokens { ($a:tt + $b:tt) => {"got an addition"}; (($i:ident)) => {"got an identifier"}; ($($other:tt)*) => {"got something else"}; } fn main() { println!("{}\n{}\n{}\n", match_tokens!((caravan)), match_tokens!(3 + 6), match_tokens!(5)); println!("{}\n{}\n{}", capture_then_match_tokens!((caravan)), capture_then_match_tokens!(3 + 6), capture_then_match_tokens!(5)); }
The output is:
got an identifier
got an addition
got something else
got something else
got something else
got something else
By parsing the input into an AST node, the substituted result becomes un-destructible; i.e. you cannot examine the contents or match against it ever again.
Here is another example which can be particularly confusing:
macro_rules! capture_then_what_is { (#[$m:meta]) => {what_is!(#[$m])}; } macro_rules! what_is { (#[no_mangle]) => {"no_mangle attribute"}; (#[inline]) => {"inline attribute"}; ($($tts:tt)*) => {concat!("something else (", stringify!($($tts)*), ")")}; } fn main() { println!( "{}\n{}\n{}\n{}", what_is!(#[no_mangle]), what_is!(#[inline]), capture_then_what_is!(#[no_mangle]), capture_then_what_is!(#[inline]), ); }
The output is:
no_mangle attribute
inline attribute
something else (#[no_mangle])
something else (#[inline])
The only way to avoid this is to capture using the tt
, ident
or lifetime
kinds. Once you
capture with anything else, the only thing you can do with the result from then on is substitute it
directly into the output.
Hygiene
macro_rules!
macros in Rust are partially hygienic. Specifically, they are hygienic when it
comes to most identifiers, but not when it comes to generic type parameters or lifetimes.
Hygiene works by attaching an invisible "syntax context" value to all identifiers. When two identifiers are compared, both the identifiers' textual names and syntax contexts must be identical for the two to be considered equal.
To illustrate this, consider the following code:
macro_rules! using_a {
($e:expr) => {
{
let a = 42;
$e
}
}
}
let four = using_a!(a / 10);
We will use the background colour to denote the syntax context. Now, let's expand the macro invocation:
let four = { let a = 42; a / 10 };
First, recall that macro_rules!
invocations effectively disappear during expansion.
Second, if you attempt to compile this code, the compiler will respond with something along the following lines:
error[E0425]: cannot find value `a` in this scope
--> src/main.rs:13:21
|
13 | let four = using_a!(a / 10);
| ^ not found in this scope
Note that the background colour (i.e. syntax context) for the expanded macro changes as part of
expansion. Each macro_rules!
macro expansion is given a new, unique syntax context for its
contents. As a result, there are two different a
s in the expanded code: one in the first syntax
context, the second in the other. In other words, a
is not
the same identifier as a
, however similar they may appear.
That said, tokens that were substituted into the expanded output retain their original syntax context (by virtue of having been provided to the macro as opposed to being part of the macro itself). Thus, the solution is to modify the macro as follows:
macro_rules! using_a {
($a:ident, $e:expr) => {
{
let $a = 42;
$e
}
}
}
let four = using_a!(a, a / 10);
Which, upon expansion becomes:
let four = { let a = 42; a / 10 };
The compiler will accept this code because there is only one a
being used.
$crate
Hygiene is also the reason that we need the $crate
metavariable when our macro needs access to
other items in the defining crate. What this special metavariable does is that it expands to an
absolute path to the defining crate.
//// Definitions in the `helper_macro` crate.
#[macro_export]
macro_rules! helped {
// () => { helper!() } // This might lead to an error due to 'helper' not being in scope.
() => { $crate::helper!() }
}
#[macro_export]
macro_rules! helper {
() => { () }
}
//// Usage in another crate.
// Note that `helper_macro::helper` is not imported!
use helper_macro::helped;
fn unit() {
// but it still works due to `$crate` properly expanding to the crate path `helper_macro`
helped!();
}
Note that, because $crate
refers to the current crate, it must be used with a fully qualified
module path when referring to non-macro items:
#![allow(unused)] fn main() { pub mod inner { #[macro_export] macro_rules! call_foo { () => { $crate::inner::foo() }; } pub fn foo() {} } }
Non-Identifier Identifiers
There are two tokens which you are likely to run into eventually that look like identifiers, but aren't. Except when they are.
First is self
. This is very definitely a keyword. However, it also happens to fit the definition
of an identifier. In regular Rust code, there's no way for self
to be interpreted as an identifier,
but it can happen with macro_rules!
macros:
macro_rules! what_is { (self) => {"the keyword `self`"}; ($i:ident) => {concat!("the identifier `", stringify!($i), "`")}; } macro_rules! call_with_ident { ($c:ident($i:ident)) => {$c!($i)}; } fn main() { println!("{}", what_is!(self)); println!("{}", call_with_ident!(what_is(self))); }
The above outputs:
the keyword `self`
the keyword `self`
But that makes no sense; call_with_ident!
required an identifier, matched one, and substituted it!
So self
is both a keyword and not a keyword at the same time. You might wonder how this is in any
way important. Take this example:
macro_rules! make_mutable { ($i:ident) => {let mut $i = $i;}; } struct Dummy(i32); impl Dummy { fn double(self) -> Dummy { make_mutable!(self); self.0 *= 2; self } } fn main() { println!("{:?}", Dummy(4).double().0); }
This fails to compile with:
error: `mut` must be followed by a named binding
--> src/main.rs:2:24
|
2 | ($i:ident) => {let mut $i = $i;};
| ^^^^^^ help: remove the `mut` prefix: `self`
...
9 | make_mutable!(self);
| -------------------- in this macro invocation
|
= note: `mut` may be followed by `variable` and `variable @ pattern`
So the macro will happily match self
as an identifier, allowing you to use it in cases where you
can't actually use it. But, fine; it somehow remembers that self
is a keyword even when it's an
identifier, so you should be able to do this, right?
macro_rules! make_self_mutable { ($i:ident) => {let mut $i = self;}; } struct Dummy(i32); impl Dummy { fn double(self) -> Dummy { make_self_mutable!(mut_self); mut_self.0 *= 2; mut_self } } fn main() { println!("{:?}", Dummy(4).double().0); }
This fails with:
error[E0424]: expected value, found module `self`
--> src/main.rs:2:33
|
2 | ($i:ident) => {let mut $i = self;};
| ^^^^ `self` value is a keyword only available in methods with a `self` parameter
...
8 | / fn double(self) -> Dummy {
9 | | make_self_mutable!(mut_self);
| | ----------------------------- in this macro invocation
10 | | mut_self.0 *= 2;
11 | | mut_self
12 | | }
| |_____- this function has a `self` parameter, but a macro invocation can only access identifiers it receives from parameters
|
Now the compiler thinks we refer to our module with self
, but that doesn't make sense. We already
have a self
right there, in the function signature which is definitely not a module. It's almost
like it's complaining that the self
it's trying to use isn't the same self
... as though the
self
keyword has hygiene, like an... identifier.
macro_rules! double_method { ($body:expr) => { fn double(mut self) -> Dummy { $body } }; } struct Dummy(i32); impl Dummy { double_method! {{ self.0 *= 2; self }} } fn main() { println!("{:?}", Dummy(4).double().0); }
Same error. What about...
macro_rules! double_method { ($self_:ident, $body:expr) => { fn double(mut $self_) -> Dummy { $body } }; } struct Dummy(i32); impl Dummy { double_method! {self, { self.0 *= 2; self }} } fn main() { println!("{:?}", Dummy(4).double().0); }
At last, this works. So self
is both a keyword and an identifier when it feels like it.
Surely this works for other, similar constructs, right?
macro_rules! double_method { ($self_:ident, $body:expr) => { fn double($self_) -> Dummy { $body } }; } struct Dummy(i32); impl Dummy { double_method! {_, 0} } fn main() { println!("{:?}", Dummy(4).double().0); }
error: no rules expected the token `_`
--> src/main.rs:12:21
|
1 | macro_rules! double_method {
| -------------------------- when calling this macro
...
12 | double_method! {_, 0}
| ^ no rules expected this token in macro call
No, of course not. _
is a keyword that is valid in patterns and expressions, but somehow isn't
an identifier like the keyword self
is, despite matching the definition of an identifier just the
same.
You might think you can get around this by using $self_:pat
instead; that way, _
will match!
Except, no, because self
isn't a pattern. Joy.
The only work around for this (in cases where you want to accept some combination of these tokens)
is to use a tt
matcher instead.
Debugging
rustc
provides a number of tools to debug macro_rules!
macros. One of the most useful is
trace_macros!
, which is a directive to the compiler instructing it to dump every macro_rules!
macro invocation prior to expansion. For example, given the following:
// Note: make sure to use a nightly channel compiler.
#![feature(trace_macros)]
macro_rules! each_tt {
() => {};
($_tt:tt $($rest:tt)*) => {each_tt!($($rest)*);};
}
each_tt!(foo bar baz quux);
trace_macros!(true);
each_tt!(spim wak plee whum);
trace_macros!(false);
each_tt!(trom qlip winp xod);
fn main() {}
The output is:
note: trace_macro
--> src/main.rs:11:1
|
11 | each_tt!(spim wak plee whum);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: expanding `each_tt! { spim wak plee whum }`
= note: to `each_tt ! (wak plee whum) ;`
= note: expanding `each_tt! { wak plee whum }`
= note: to `each_tt ! (plee whum) ;`
= note: expanding `each_tt! { plee whum }`
= note: to `each_tt ! (whum) ;`
= note: expanding `each_tt! { whum }`
= note: to `each_tt ! () ;`
= note: expanding `each_tt! { }`
= note: to ``
This is particularly invaluable when debugging deeply recursive macro_rules!
macros. You can
also enable this from the command-line by adding -Z trace-macros
to the compiler command line.
Secondly, there is log_syntax!
which causes the compiler to output all tokens passed to it. For
example, this makes the compiler sing a song:
// Note: make sure to use a nightly channel compiler. #![feature(log_syntax)] macro_rules! sing { () => {}; ($tt:tt $($rest:tt)*) => {log_syntax!($tt); sing!($($rest)*);}; } sing! { ^ < @ < . @ * '\x08' '{' '"' _ # ' ' - @ '$' && / _ % ! ( '\t' @ | = > ; '\x08' '\'' + '$' ? '\x7f' , # '"' ~ | ) '\x07' } fn main() {}
This can be used to do slightly more targeted debugging than trace_macros!
.
Sometimes, it is what the macro expands to that proves problematic. For this, the --pretty
argument to the compiler can be used. Given the following code:
// Shorthand for initialising a `String`.
macro_rules! S {
($e:expr) => {String::from($e)};
}
fn main() {
let world = S!("World");
println!("Hello, {}!", world);
}
compiled with the following command:
rustc -Z unstable-options --pretty expanded hello.rs
produces the following output (modified for formatting):
#![feature(no_std, prelude_import)]
#![no_std]
#[prelude_import]
use std::prelude::v1::*;
#[macro_use]
extern crate std as std;
// Shorthand for initialising a `String`.
fn main() {
let world = String::from("World");
::std::io::_print(::std::fmt::Arguments::new_v1(
{
static __STATIC_FMTSTR: &'static [&'static str]
= &["Hello, ", "!\n"];
__STATIC_FMTSTR
},
&match (&world,) {
(__arg0,) => [
::std::fmt::ArgumentV1::new(__arg0, ::std::fmt::Display::fmt)
],
}
));
}
Other options to --pretty
can be listed using rustc -Z unstable-options --help -v
; a full list
is not provided since, as implied by the name, any such list would be subject to change at any time.
But not just rustc
exposes means to aid in debugging macros. For the aforementioned
--pretty=expanded
option, there exists a nice cargo
addon called
cargo-expand
made by dtolnay
which is basically just a wrapper around it.
You can also use the playground, clicking on its TOOLS
button in
the top right gives you the option to expand macros right there!
Another amazing tool is lukaslueg
's
macro_railroad
, a tool that allows you visualize
and generate syntax diagrams for Rust's macro_rules!
macros. It visualizes the accepted
macro's grammar as an automata.
Scoping
The way in which function-like macros are scoped can be somewhat unintuitive. They use two forms of scopes: textual scope, and path-based scope.
When such a macro is invoked by an unqualified identifier(an identifier that isn't part of a mulit-part path), it is first looked up in textual scoping and then in path-based scoping should the first lookup not yield any results. If it is invoked by a qualified identifier it will skip the textual scoping lookup and instead only do a look up in the path-based scoping.
Textual Scope
Firstly, unlike everything else in the language, function-like macros will remain visible in sub-modules.
macro_rules! X { () => {}; } mod a { X!(); // defined } mod b { X!(); // defined } mod c { X!(); // defined } fn main() {}
Note: In these examples, remember that all of them have the same behavior when the module contents are in separate files.
Secondly, also unlike everything else in the language, macro_rules!
macros are only accessible
after their definition. Also note that this example demonstrates how macro_rules!
macros do not
"leak" out of their defining scope:
mod a { // X!(); // undefined } mod b { // X!(); // undefined macro_rules! X { () => {}; } X!(); // defined } mod c { // X!(); // undefined } fn main() {}
To be clear, this lexical order dependency applies even if you move the macro to an outer scope:
mod a { // X!(); // undefined } macro_rules! X { () => {}; } mod b { X!(); // defined } mod c { X!(); // defined } fn main() {}
However, this dependency does not apply to macros themselves:
mod a { // X!(); // undefined } macro_rules! X { () => { Y!(); }; } mod b { // X!(); // defined, but Y! is undefined } macro_rules! Y { () => {}; } mod c { X!(); // defined, and so is Y! } fn main() {}
Defining macro_rules!
macros multiple times is allowed and the most recent declaration will simply
shadow previous ones unless it has gone out of scope.
#![allow(unused)] fn main() { macro_rules! X { (1) => {}; } X!(1); macro_rules! X { (2) => {}; } // X!(1); // Error: no rule matches `1` X!(2); mod a { macro_rules! X { (3) => {}; } // X!(2); // Error: no rule matches `2` X!(3); } // X!(3); // Error: no rule matches `3` X!(2); }
macro_rules!
macros can be exported from a module using the #[macro_use]
attribute. Using this
on a module is similar to saying that you do not want to have the module's macro's scope end with
the module.
mod a { // X!(); // undefined } #[macro_use] mod b { macro_rules! X { () => {}; } X!(); // defined } mod c { X!(); // defined } fn main() {}
Note that this can interact in somewhat bizarre ways due to the fact that identifiers in a
macro_rules!
macro (including other macros) are only resolved upon expansion:
mod a { // X!(); // undefined } #[macro_use] mod b { macro_rules! X { () => { Y!(); }; } // X!(); // defined, but Y! is undefined } macro_rules! Y { () => {}; } mod c { X!(); // defined, and so is Y! } fn main() {}
Another complication is that #[macro_use]
applied to an extern crate
does not behave this way:
such declarations are effectively hoisted to the top of the module. Thus, assuming X!
is defined
in an external crate called mac
, the following holds:
mod a {
// X!(); // defined, but Y! is undefined
}
macro_rules! Y { () => {}; }
mod b {
X!(); // defined, and so is Y!
}
#[macro_use] extern crate macs;
mod c {
X!(); // defined, and so is Y!
}
fn main() {}
Finally, note that these scoping behaviors apply to functions as well, with the exception of
#[macro_use]
(which isn't applicable):
macro_rules! X { () => { Y!() }; } fn a() { macro_rules! Y { () => {"Hi!"} } assert_eq!(X!(), "Hi!"); { assert_eq!(X!(), "Hi!"); macro_rules! Y { () => {"Bye!"} } assert_eq!(X!(), "Bye!"); } assert_eq!(X!(), "Hi!"); } fn b() { macro_rules! Y { () => {"One more"} } assert_eq!(X!(), "One more"); } fn main() { a(); b(); }
These scoping rules are why a common piece of advice is to place all macro_rules!
macros which
should be accessible "crate wide" at the very top of your root module, before any other modules.
This ensures they are available consistently. This also applies to mod
definitions for files, as
in:
#[macro_use]
mod some_mod_that_defines_macros;
mod some_mod_that_uses_those_macros;
The order here is important, swap the declaration order and it won't compile.
Path-Based Scope
By default, a macro_rules!
macro has no path-based scope. However, if it has the #[macro_export]
attribute, then it is declared in the crate root scope and can be referred to similar to how you
refer to any other item. The Import and Export chapter goes more in-depth into said attribute.
Import and Export
Importing macro_rules!
macros differs between the two Rust Editions, 2015 and 2018. It is
recommended to read both parts nevertheless, as the 2018 Edition can still use the constructs that
are explained in the 2015 Edition.
Edition 2015
In Edition 2015 you have to use the #[macro_use]
attribute that has already been introduced in the
scoping chapter. This can be applied to either modules or external crates. For
example:
#[macro_use] mod macros { macro_rules! X { () => { Y!(); } } macro_rules! Y { () => {} } } X!(); fn main() {}
macro_rules!
macros can be exported from the current crate using #[macro_export]
. Note that this
ignores all visibility.
Given the following definition for a library package macs
:
mod macros {
#[macro_export] macro_rules! X { () => { Y!(); } }
#[macro_export] macro_rules! Y { () => {} }
}
// X! and Y! are *not* defined here, but *are* exported,
// despite `macros` being private.
The following code will work as expected:
X!(); // X is defined
#[macro_use] extern crate macs;
X!();
fn main() {}
This works, as said in the scoping chapter, because #[macro_use]
works slightly different on
extern crates, as it basically hoists the exported macros out of the crate to the top of the
module.
Note: you can only
#[macro_use]
an external crate from the root module.
Finally, when importing macro_rules!
macros from an external crate, you can control which macros
you import. You can use this to limit namespace pollution, or to override specific macros, like so:
// Import *only* the `X!` macro.
#[macro_use(X)] extern crate macs;
// X!(); // X is defined, but Y! is undefined
macro_rules! Y { () => {} }
X!(); // X is defined, and so is Y!
fn main() {}
When exporting macro_rules!
macros, it is often useful to refer to non-macro symbols in the
defining crate. Because crates can be renamed, there is a special substitution variable available:
$crate
. This will always expand to an absolute path prefix to the containing crate
(e.g. :: macs
).
Note that unless your compiler version is >= 1.30, this does not work for macro_rules!
macros,
since macro_rules!
macros do not interact with regular name resolution in any way. Otherwise, you
cannot use something like $crate::Y!
to refer to a particular macro within your crate. The
implication, combined with selective imports via #[macro_use]
is that there is currently no way
to guarantee any given macro will be available when imported by another crate.
It is recommended that you always use absolute paths to non-macro names, to avoid conflicts, including names in the standard library.
Edition 2018
The 2018 Edition made our lives a lot easier when it comes to macro_rules!
macros. Why you ask?
Quite simply because it managed to make them feel more like proper items than some special thing in
the language. What this means is that we can properly import and use them in a namespaced fashion!
So instead of using #[macro_use]
to import every exported macro from a crate into the global
namespace we can now do the following:
use some_crate::some_macro;
fn main() {
some_macro!("hello");
// as well as
some_crate::some_other_macro!("macro world");
}
Unfortunately, this only applies for external crates, if you use macro_rules!
macros that you have
defined in your own crate you are still required to go with #[macro_use]
on the defining modules.
So scoping applies there the same way as before as well.
The
$crate
prefix works in this version for everything, macros and items alike since this Edition came out with Rust 1.31.
Macros, A Practical Introduction
This chapter will introduce the Rust macro-by-example system using a relatively simple, practical example. It does not attempt to explain all of the intricacies of the system; its goal is to get you comfortable with how and why macros are written.
There is also the Macros chapter of the Rust Book which is another high-level explanation, and the methodical introduction chapter of this book, which explains the macro system in detail.
A Little Context
Note: don't panic! What follows is the only math that will be talked about. You can quite safely skip this section if you just want to get to the meat of the article.
If you aren't familiar, a recurrence relation is a sequence where each value is defined in terms of one or more previous values, with one or more initial values to get the whole thing started. For example, the Fibonacci sequence can be defined by the relation:
\[ F_{n} = 0, 1, ..., F_{n-1} + F_{n-2}\]
Thus, the first two numbers in the sequence are 0 and 1, with the third being \( F_{0} + F_{1} = 0 + 1 = 1\), the fourth \( F_{1} + F_{2} = 1 + 1 = 2\), and so on forever.
Now, because such a sequence can go on forever, that makes defining a fibonacci
function a
little tricky, since you obviously don't want to try returning a complete vector. What you want is
to return something which will lazily compute elements of the sequence as needed.
In Rust, that means producing an Iterator
. This is not especially hard, but there is a fair
amount of boilerplate involved: you need to define a custom type, work out what state needs to be
stored in it, then implement the Iterator
trait for it.
However, recurrence relations are simple enough that almost all of these details can be abstracted out with a little macro-based code generation.
So, with all that having been said, let's get started.
Construction
Usually, when working on a new macro, the first thing I do is decide what the macro invocation should look like. In this specific case, my first attempt looked like this:
let fib = recurrence![a[n] = 0, 1, ..., a[n-1] + a[n-2]];
for e in fib.take(10) { println!("{}", e) }
From that, we can take a stab at how the macro should be defined, even if we aren't sure of the actual expansion. This is useful because if you can't figure out how to parse the input syntax, then maybe you need to change it.
macro_rules! recurrence {
( a[n] = $($inits:expr),+ , ... , $recur:expr ) => { /* ... */ };
}
fn main() {}
Assuming you aren't familiar with the syntax, allow me to elucidate. This is defining a macro, using
the macro_rules!
system, called recurrence!
. This macro has a single parsing rule. That rule
says the input to the macro must match:
- the literal token sequence
a
[
n
]
=
, - a repeating (the
$( ... )
) sequence, using,
as a separator, and one or more (+
) repeats of:- a valid expression captured into the metavariable
inits
($inits:expr
)
- a valid expression captured into the metavariable
- the literal token sequence
,
...
,
, - a valid expression captured into the metavariable
recur
($recur:expr
).
Finally, the rule says that if the input matches this rule, then the macro invocation should be
replaced by the token sequence /* ... */
.
It's worth noting that inits
, as implied by the name, actually contains all the expressions that
match in this position, not just the first or last. What's more, it captures them as a sequence as
opposed to, say, irreversibly pasting them all together. Also note that you can do "zero or more"
with a repetition by using *
instead of +
and even optional, "zero or one" with ?
. There is no
support for "zero or one" or more specific numbers of repetitions.
As an exercise, let's take the proposed input and feed it through the rule, to see how it is
processed. The "Position" column will show which part of the syntax pattern needs to be matched
against next, denoted by a "⌂". Note that in some cases, there might be more than one possible
"next" element to match against. "Input" will contain all of the tokens that have not been
consumed yet. inits
and recur
will contain the contents of those bindings.
Position | Input | inits |
recur |
---|---|---|---|
a[n] = $($inits:expr),+ , ... , $recur:expr
⌂ |
a[n] = 0, 1, ..., a[n-1] + a[n-2] |
||
a[n] = $($inits:expr),+ , ... , $recur:expr
⌂ |
[n] = 0, 1, ..., a[n-1] + a[n-2] |
||
a[n] = $($inits:expr),+ , ... , $recur:expr
⌂ |
n] = 0, 1, ..., a[n-1] + a[n-2] |
||
a[n] = $($inits:expr),+ , ... , $recur:expr
⌂ |
] = 0, 1, ..., a[n-1] + a[n-2] |
||
a[n] = $($inits:expr),+ , ... , $recur:expr
⌂ |
= 0, 1, ..., a[n-1] + a[n-2] |
||
a[n] = $($inits:expr),+ , ... , $recur:expr
⌂ |
0, 1, ..., a[n-1] + a[n-2] |
||
a[n] = $($inits:expr),+ , ... , $recur:expr
⌂ |
0, 1, ..., a[n-1] + a[n-2] |
||
a[n] = $($inits:expr),+ , ... , $recur:expr
⌂ ⌂ |
, 1, ..., a[n-1] + a[n-2] |
0 |
|
Note: there are two ⌂ here, because the next input token might match either the comma separator betweenem> elements in the repetition, or the comma after the repetition. The macro system will keep track of both possibilities, until it is able to decide which one to follow. | |||
a[n] = $($inits:expr),+ , ... , $recur:expr
⌂ ⌂ |
1, ..., a[n-1] + a[n-2] |
0 |
|
a[n] = $($inits:expr),+ , ... , $recur:expr
⌂ ⌂ |
, ..., a[n-1] + a[n-2] |
0 , 1 |
|
Note: the third, crossed-out marker indicates that the macro system has, as a consequence of the last token consumed, eliminated one of the previous possible branches. | |||
a[n] = $($inits:expr),+ , ... , $recur:expr
⌂ ⌂ |
..., a[n-1] + a[n-2] |
0 , 1 |
|
a[n] = $($inits:expr),+ , ... , $recur:expr
|
, a[n-1] + a[n-2] |
0 , 1 |
|
a[n] = $($inits:expr),+ , ... , $recur:expr
⌂ |
a[n-1] + a[n-2] |
0 , 1 |
|
a[n] = $($inits:expr),+ , ... , $recur:expr
⌂ |
0 , 1 |
a[n-1] + a[n-2] |
|
Note: this particular step should make it clear that a binding like $recur:expr will consume an entire expression, using the compiler's knowledge of what constitutes a valid expression. As will be noted later, you can do this for other language constructs, too. |
The key take-away from this is that the macro system will try to incrementally match the tokens provided as input to the macro against the provided rules. We'll come back to the "try" part.
Now, let's begin writing the final, fully expanded form. For this expansion, I was looking for something like:
let fib = {
struct Recurrence {
mem: [u64; 2],
pos: usize,
}
This will be the actual iterator type. mem
will be the memo buffer to hold the last few values so
the recurrence can be computed. pos
is to keep track of the value of n
.
Aside: I've chosen
u64
as a "sufficiently large" type for the elements of this sequence. Don't worry about how this will work out for other sequences; we'll come to it.
impl Iterator for Recurrence {
type Item = u64;
#[inline]
fn next(&mut self) -> Option<Self::Item> {
if self.pos < 2 {
let next_val = self.mem[self.pos];
self.pos += 1;
Some(next_val)
We need a branch to yield the initial values of the sequence; nothing tricky.
} else {
let a = /* something */;
let n = self.pos;
let next_val = a[n-1] + a[n-2];
self.mem.TODO_shuffle_down_and_append(next_val);
self.pos += 1;
Some(next_val)
}
}
}
This is a bit harder; we'll come back and look at how exactly to define a
. Also,
TODO_shuffle_down_and_append
is another placeholder; I want something that places next_val
on
the end of the array, shuffling the rest down by one space, dropping the 0th element.
Recurrence { mem: [0, 1], pos: 0 }
};
for e in fib.take(10) { println!("{}", e) }
Lastly, return an instance of our new structure, which can then be iterated over. To summarise, the complete expansion is:
let fib = {
struct Recurrence {
mem: [u64; 2],
pos: usize,
}
impl Iterator for Recurrence {
type Item = u64;
#[inline]
fn next(&mut self) -> Option<u64> {
if self.pos < 2 {
let next_val = self.mem[self.pos];
self.pos += 1;
Some(next_val)
} else {
let a = /* something */;
let n = self.pos;
let next_val = (a[n-1] + a[n-2]);
self.mem.TODO_shuffle_down_and_append(next_val.clone());
self.pos += 1;
Some(next_val)
}
}
}
Recurrence { mem: [0, 1], pos: 0 }
};
for e in fib.take(10) { println!("{}", e) }
Aside: Yes, this does mean we're defining a different
Recurrence
struct and its implementation for each macro invocation. Most of this will optimise away in the final binary, with some judicious use of#[inline]
attributes.
It's also useful to check your expansion as you're writing it. If you see anything in the expansion
that needs to vary with the invocation, but isn't in the actual macro syntax, you should work out
where to introduce it. In this case, we've added u64
, but that's not neccesarily what the user
wants, nor is it in the macro syntax. So let's fix that.
macro_rules! recurrence { ( a[n]: $sty:ty = $($inits:expr),+ , ... , $recur:expr ) => { /* ... */ }; } /* let fib = recurrence![a[n]: u64 = 0, 1, ..., a[n-1] + a[n-2]]; for e in fib.take(10) { println!("{}", e) } */ fn main() {}
Here, I've added a new metavariable: sty
which should be a type.
Aside: if you're wondering, the bit after the colon in a metavariable can be one of several kinds of syntax matchers. The most common ones are
item
,expr
, andty
. A complete explanation can be found in Macros, A Methodical Introduction;macro_rules!
(Matchers).There's one other thing to be aware of: in the interests of future-proofing the language, the compiler restricts what tokens you're allowed to put after a matcher, depending on what kind it is. Typically, this comes up when trying to match expressions or statements; those can only be followed by one of
=>
,,
, and;
.A complete list can be found in Macros, A Methodical Introduction; Minutiae; Metavariables and Expansion Redux.
Indexing and Shuffling
I will skim a bit over this part, since it's effectively tangential to the macro stuff. We want to
make it so that the user can access previous values in the sequence by indexing a
; we want it to
act as a sliding window keeping the last few (in this case, 2) elements of the sequence.
We can do this pretty easily with a wrapper type:
struct IndexOffset<'a> {
slice: &'a [u64; 2],
offset: usize,
}
impl<'a> Index<usize> for IndexOffset<'a> {
type Output = u64;
#[inline(always)]
fn index<'b>(&'b self, index: usize) -> &'b u64 {
use std::num::Wrapping;
let index = Wrapping(index);
let offset = Wrapping(self.offset);
let window = Wrapping(2);
let real_index = index - offset + window;
&self.slice[real_index.0]
}
}
Aside: since lifetimes come up a lot with people new to Rust, a quick explanation:
'a
and'b
are lifetime parameters that are used to track where a reference (i.e. a borrowed pointer to some data) is valid. In this case,IndexOffset
borrows a reference to our iterator's data, so it needs to keep track of how long it's allowed to hold that reference for, using'a
.
'b
is used because theIndex::index
function (which is how subscript syntax is actually implemented) is also parameterised on a lifetime, on account of returning a borrowed reference.'a
and'b
are not necessarily the same thing in all cases. The borrow checker will make sure that even though we don't explicitly relate'a
and'b
to one another, we don't accidentally violate memory safety.
This changes the definition of a
to:
let a = IndexOffset { slice: &self.mem, offset: n };
The only remaining question is what to do about TODO_shuffle_down_and_append
. I wasn't able to
find a method in the standard library with exactly the semantics I wanted, but it isn't hard to do
by hand.
{
use std::mem::swap;
let mut swap_tmp = next_val;
for i in (0..2).rev() {
swap(&mut swap_tmp, &mut self.mem[i]);
}
}
This swaps the new value into the end of the array, swapping the other elements down one space.
Aside: doing it this way means that this code will work for non-copyable types, as well.
The working code thus far now looks like this:
macro_rules! recurrence { ( a[n]: $sty:ty = $($inits:expr),+ , ... , $recur:expr ) => { /* ... */ }; } fn main() { /* let fib = recurrence![a[n]: u64 = 0, 1, ..., a[n-1] + a[n-2]]; for e in fib.take(10) { println!("{}", e) } */ let fib = { use std::ops::Index; struct Recurrence { mem: [u64; 2], pos: usize, } struct IndexOffset<'a> { slice: &'a [u64; 2], offset: usize, } impl<'a> Index<usize> for IndexOffset<'a> { type Output = u64; #[inline(always)] fn index<'b>(&'b self, index: usize) -> &'b u64 { use std::num::Wrapping; let index = Wrapping(index); let offset = Wrapping(self.offset); let window = Wrapping(2); let real_index = index - offset + window; &self.slice[real_index.0] } } impl Iterator for Recurrence { type Item = u64; #[inline] fn next(&mut self) -> Option<u64> { if self.pos < 2 { let next_val = self.mem[self.pos]; self.pos += 1; Some(next_val) } else { let next_val = { let n = self.pos; let a = IndexOffset { slice: &self.mem, offset: n }; (a[n-1] + a[n-2]) }; { use std::mem::swap; let mut swap_tmp = next_val; for i in (0..2).rev() { swap(&mut swap_tmp, &mut self.mem[i]); } } self.pos += 1; Some(next_val) } } } Recurrence { mem: [0, 1], pos: 0 } }; for e in fib.take(10) { println!("{}", e) } }
Note that I've changed the order of the declarations of n
and a
, as well as wrapped them
(along with the recurrence expression) in a block. The reason for the first should be obvious
(n
needs to be defined first so I can use it for a
). The reason for the second is that the
borrowed reference &self.mem
will prevent the swaps later on from happening (you cannot mutate
something that is aliased elsewhere). The block ensures that the &self.mem
borrow expires before
then.
Incidentally, the only reason the code that does the mem
swaps is in a block is to narrow the
scope in which std::mem::swap
is available, for the sake of being tidy.
If we take this code and run it, we get:
0
1
2
3
5
8
13
21
34
Success! Now, let's copy & paste this into the macro expansion, and replace the expanded code with an invocation. This gives us:
macro_rules! recurrence { ( a[n]: $sty:ty = $($inits:expr),+ , ... , $recur:expr ) => { { /* What follows here is *literally* the code from before, cut and pasted into a new position. No other changes have been made. */ use std::ops::Index; struct Recurrence { mem: [u64; 2], pos: usize, } struct IndexOffset<'a> { slice: &'a [u64; 2], offset: usize, } impl<'a> Index<usize> for IndexOffset<'a> { type Output = u64; #[inline(always)] fn index<'b>(&'b self, index: usize) -> &'b u64 { use std::num::Wrapping; let index = Wrapping(index); let offset = Wrapping(self.offset); let window = Wrapping(2); let real_index = index - offset + window; &self.slice[real_index.0] } } impl Iterator for Recurrence { type Item = u64; #[inline] fn next(&mut self) -> Option<u64> { if self.pos < 2 { let next_val = self.mem[self.pos]; self.pos += 1; Some(next_val) } else { let next_val = { let n = self.pos; let a = IndexOffset { slice: &self.mem, offset: n }; (a[n-1] + a[n-2]) }; { use std::mem::swap; let mut swap_tmp = next_val; for i in (0..2).rev() { swap(&mut swap_tmp, &mut self.mem[i]); } } self.pos += 1; Some(next_val) } } } Recurrence { mem: [0, 1], pos: 0 } } }; } fn main() { let fib = recurrence![a[n]: u64 = 0, 1, ..., a[n-1] + a[n-2]]; for e in fib.take(10) { println!("{}", e) } }
Obviously, we aren't using the metavariables yet, but we can change that fairly easily. However,
if we try to compile this, rustc
aborts, telling us:
error: local ambiguity: multiple parsing options: built-in NTs expr ('inits') or 1 other option.
--> src/main.rs:75:45
|
75 | let fib = recurrence![a[n]: u64 = 0, 1, ..., a[n-1] + a[n-2]];
|
Here, we've run into a limitation of macro_rules
. The problem is that second comma. When it sees
it during expansion, macro_rules
can't decide if it's supposed to parse another expression for
inits
, or ...
. Sadly, it isn't quite clever enough to realise that ...
isn't a valid
expression, so it gives up. Theoretically, this should work as desired, but currently doesn't.
Aside: I did fib a little about how our rule would be interpreted by the macro system. In general, it should work as described, but doesn't in this case. The
macro_rules
machinery, as it stands, has its foibles, and its worthwhile remembering that on occasion, you'll need to contort a little to get it to work.In this particular case, there are two issues. First, the macro system doesn't know what does and does not constitute the various grammar elements (e.g. an expression); that's the parser's job. As such, it doesn't know that
...
isn't an expression. Secondly, it has no way of trying to capture a compound grammar element (like an expression) without 100% committing to that capture.In other words, it can ask the parser to try and parse some input as an expression, but the parser will respond to any problems by aborting. The only way the macro system can currently deal with this is to just try to forbid situations where this could be a problem.
On the bright side, this is a state of affairs that exactly no one is enthusiastic about. The
macro
keyword has already been reserved for a more rigorously-defined future macro system. Until then, needs must.
Thankfully, the fix is relatively simple: we remove the comma from the syntax. To keep things
balanced, we'll remove both commas around ...
:
macro_rules! recurrence { ( a[n]: $sty:ty = $($inits:expr),+ ... $recur:expr ) => { // ^~~ changed /* ... */ // Cheat :D (vec![0u64, 1, 2, 3, 5, 8, 13, 21, 34]).into_iter() }; } fn main() { let fib = recurrence![a[n]: u64 = 0, 1 ... a[n-1] + a[n-2]]; // ^~~ changed for e in fib.take(10) { println!("{}", e) } }
Success! ... or so I thought. Turns out this is being rejected by the compiler nowadays, while it
was fine back when this was written. The reason for this is that the compiler now recognizes the
...
as a token, and as we know we may only use =>
, ,
or ;
after an expression fragment. So
unfortunately we are now out of luck as our dreamed up syntax will not work out this way, so let us
just choose one that looks the most befitting that we are allowed to use instead,
I'd say replacing ,
with ;
works.
macro_rules! recurrence { ( a[n]: $sty:ty = $($inits:expr),+ ; ... ; $recur:expr ) => { // ^~~~~~^ changed /* ... */ // Cheat :D (vec![0u64, 1, 2, 3, 5, 8, 13, 21, 34]).into_iter() }; } fn main() { let fib = recurrence![a[n]: u64 = 0, 1; ...; a[n-1] + a[n-2]]; // ^~~~~^ changed for e in fib.take(10) { println!("{}", e) } }
Success! But for real this time.
Substitution
Substituting something you've captured in a macro is quite simple; you can insert the contents of a
metavariable $sty:ty
by using $sty
. So, let's go through and fix the u64
s:
macro_rules! recurrence { ( a[n]: $sty:ty = $($inits:expr),+ ; ... ; $recur:expr ) => { { use std::ops::Index; struct Recurrence { mem: [$sty; 2], // ^~~~ changed pos: usize, } struct IndexOffset<'a> { slice: &'a [$sty; 2], // ^~~~ changed offset: usize, } impl<'a> Index<usize> for IndexOffset<'a> { type Output = $sty; // ^~~~ changed #[inline(always)] fn index<'b>(&'b self, index: usize) -> &'b $sty { // ^~~~ changed use std::num::Wrapping; let index = Wrapping(index); let offset = Wrapping(self.offset); let window = Wrapping(2); let real_index = index - offset + window; &self.slice[real_index.0] } } impl Iterator for Recurrence { type Item = $sty; // ^~~~ changed #[inline] fn next(&mut self) -> Option<$sty> { // ^~~~ changed /* ... */ if self.pos < 2 { let next_val = self.mem[self.pos]; self.pos += 1; Some(next_val) } else { let next_val = { let n = self.pos; let a = IndexOffset { slice: &self.mem, offset: n }; (a[n-1] + a[n-2]) }; { use std::mem::swap; let mut swap_tmp = next_val; for i in (0..2).rev() { swap(&mut swap_tmp, &mut self.mem[i]); } } self.pos += 1; Some(next_val) } } } Recurrence { mem: [0, 1], pos: 0 } } }; } fn main() { let fib = recurrence![a[n]: u64 = 0, 1; ...; a[n-1] + a[n-2]]; for e in fib.take(10) { println!("{}", e) } }
Let's tackle a harder one: how to turn inits
into both the array literal [0, 1]
and the array
type, [$sty; 2]
. The first one we can do like so:
Recurrence { mem: [$($inits),+], pos: 0 }
// ^~~~~~~~~~~ changed
This effectively does the opposite of the capture: repeat inits
one or more times, separating each
with a comma. This expands to the expected sequence of tokens: 0, 1
.
Somehow turning inits
into a literal 2
is a little trickier. It turns out that there's no direct
way to do this, but we can do it by using a second macro. Let's take this one step at a time.
macro_rules! count_exprs { /* ??? */ () => {} } fn main() {}
The obvious case is: given zero expressions, you would expect count_exprs
to expand to a literal
0
.
macro_rules! count_exprs { () => (0); // ^~~~~~~~~~ added } fn main() { const _0: usize = count_exprs!(); assert_eq!(_0, 0); }
Aside: You may have noticed I used parentheses here instead of curly braces for the expansion.
macro_rules
really doesn't care what you use, so long as it's one of the "matcher" pairs:( )
,{ }
or[ ]
. In fact, you can switch out the matchers on the macro itself (i.e. the matchers right after the macro name), the matchers around the syntax rule, and the matchers around the corresponding expansion.You can also switch out the matchers used when you invoke a macro, but in a more limited fashion: a macro invoked as
{ ... }
or( ... );
will always be parsed as an item (i.e. like astruct
orfn
declaration). This is important when using macros in a function body; it helps disambiguate between "parse like an expression" and "parse like a statement".
What if you have one expression? That should be a literal 1
.
macro_rules! count_exprs { () => (0); ($e:expr) => (1); // ^~~~~~~~~~~~~~~~~ added } fn main() { const _0: usize = count_exprs!(); const _1: usize = count_exprs!(x); assert_eq!(_0, 0); assert_eq!(_1, 1); }
Two?
macro_rules! count_exprs { () => (0); ($e:expr) => (1); ($e0:expr, $e1:expr) => (2); // ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ added } fn main() { const _0: usize = count_exprs!(); const _1: usize = count_exprs!(x); const _2: usize = count_exprs!(x, y); assert_eq!(_0, 0); assert_eq!(_1, 1); assert_eq!(_2, 2); }
We can "simplify" this a little by re-expressing the case of two expressions recursively.
macro_rules! count_exprs { () => (0); ($e:expr) => (1); ($e0:expr, $e1:expr) => (1 + count_exprs!($e1)); // ^~~~~~~~~~~~~~~~~~~~~ changed } fn main() { const _0: usize = count_exprs!(); const _1: usize = count_exprs!(x); const _2: usize = count_exprs!(x, y); assert_eq!(_0, 0); assert_eq!(_1, 1); assert_eq!(_2, 2); }
This is fine since Rust can fold 1 + 1
into a constant value. What if we have three expressions?
macro_rules! count_exprs { () => (0); ($e:expr) => (1); ($e0:expr, $e1:expr) => (1 + count_exprs!($e1)); ($e0:expr, $e1:expr, $e2:expr) => (1 + count_exprs!($e1, $e2)); // ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ added } fn main() { const _0: usize = count_exprs!(); const _1: usize = count_exprs!(x); const _2: usize = count_exprs!(x, y); const _3: usize = count_exprs!(x, y, z); assert_eq!(_0, 0); assert_eq!(_1, 1); assert_eq!(_2, 2); assert_eq!(_3, 3); }
Aside: You might be wondering if we could reverse the order of these rules. In this particular case, yes, but the macro system can sometimes be picky about what it is and is not willing to recover from. If you ever find yourself with a multi-rule macro that you swear should work, but gives you errors about unexpected tokens, try changing the order of the rules.
Hopefully, you can see the pattern here. We can always reduce the list of expressions by matching one expression, followed by zero or more expressions, expanding that into 1 + a count.
macro_rules! count_exprs { () => (0); ($head:expr) => (1); ($head:expr, $($tail:expr),*) => (1 + count_exprs!($($tail),*)); // ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ changed } fn main() { const _0: usize = count_exprs!(); const _1: usize = count_exprs!(x); const _2: usize = count_exprs!(x, y); const _3: usize = count_exprs!(x, y, z); assert_eq!(_0, 0); assert_eq!(_1, 1); assert_eq!(_2, 2); assert_eq!(_3, 3); }
JFTE: this is not the only, or even the best way of counting things. You may wish to peruse the Counting section later.
With this, we can now modify recurrence
to determine the necessary size of mem
.
// added: macro_rules! count_exprs { () => (0); ($head:expr) => (1); ($head:expr, $($tail:expr),*) => (1 + count_exprs!($($tail),*)); } macro_rules! recurrence { ( a[n]: $sty:ty = $($inits:expr),+ ; ... ; $recur:expr ) => { { use std::ops::Index; const MEM_SIZE: usize = count_exprs!($($inits),+); // ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ added struct Recurrence { mem: [$sty; MEM_SIZE], // ^~~~~~~~ changed pos: usize, } struct IndexOffset<'a> { slice: &'a [$sty; MEM_SIZE], // ^~~~~~~~ changed offset: usize, } impl<'a> Index<usize> for IndexOffset<'a> { type Output = $sty; #[inline(always)] fn index<'b>(&'b self, index: usize) -> &'b $sty { use std::num::Wrapping; let index = Wrapping(index); let offset = Wrapping(self.offset); let window = Wrapping(MEM_SIZE); // ^~~~~~~~ changed let real_index = index - offset + window; &self.slice[real_index.0] } } impl Iterator for Recurrence { type Item = $sty; #[inline] fn next(&mut self) -> Option<$sty> { if self.pos < MEM_SIZE { // ^~~~~~~~ changed let next_val = self.mem[self.pos]; self.pos += 1; Some(next_val) } else { let next_val = { let n = self.pos; let a = IndexOffset { slice: &self.mem, offset: n }; (a[n-1] + a[n-2]) }; { use std::mem::swap; let mut swap_tmp = next_val; for i in (0..MEM_SIZE).rev() { // ^~~~~~~~ changed swap(&mut swap_tmp, &mut self.mem[i]); } } self.pos += 1; Some(next_val) } } } Recurrence { mem: [$($inits),+], pos: 0 } } }; } /* ... */ fn main() { let fib = recurrence![a[n]: u64 = 0, 1; ...; a[n-1] + a[n-2]]; for e in fib.take(10) { println!("{}", e) } }
With that done, we can now substitute the last thing: the recur
expression.
macro_rules! count_exprs { () => (0); ($head:expr $(, $tail:expr)*) => (1 + count_exprs!($($tail),*)); } macro_rules! recurrence { ( a[n]: $sty:ty = $($inits:expr),+ ; ... ; $recur:expr ) => { { use std::ops::Index; const MEM_SIZE: usize = count_exprs!($($inits),+); struct Recurrence { mem: [$sty; MEM_SIZE], pos: usize, } struct IndexOffset<'a> { slice: &'a [$sty; MEM_SIZE], offset: usize, } impl<'a> Index<usize> for IndexOffset<'a> { type Output = $sty; #[inline(always)] fn index<'b>(&'b self, index: usize) -> &'b $sty { use std::num::Wrapping; let index = Wrapping(index); let offset = Wrapping(self.offset); let window = Wrapping(MEM_SIZE); let real_index = index - offset + window; &self.slice[real_index.0] } } impl Iterator for Recurrence { type Item = $sty; /* ... */ #[inline] fn next(&mut self) -> Option<u64> { if self.pos < MEM_SIZE { let next_val = self.mem[self.pos]; self.pos += 1; Some(next_val) } else { let next_val = { let n = self.pos; let a = IndexOffset { slice: &self.mem, offset: n }; $recur // ^~~~~~ changed }; { use std::mem::swap; let mut swap_tmp = next_val; for i in (0..MEM_SIZE).rev() { swap(&mut swap_tmp, &mut self.mem[i]); } } self.pos += 1; Some(next_val) } } /* ... */ } Recurrence { mem: [$($inits),+], pos: 0 } } }; } fn main() { let fib = recurrence![a[n]: u64 = 1, 1; ...; a[n-1] + a[n-2]]; for e in fib.take(10) { println!("{}", e) } }
And, when we compile our finished macro...
error[E0425]: cannot find value `a` in this scope
--> src/main.rs:68:50
|
68 | let fib = recurrence![a[n]: u64 = 1, 1; ...; a[n-1] + a[n-2]];
| ^ not found in this scope
error[E0425]: cannot find value `n` in this scope
--> src/main.rs:68:52
|
68 | let fib = recurrence![a[n]: u64 = 1, 1; ...; a[n-1] + a[n-2]];
| ^ not found in this scope
error[E0425]: cannot find value `a` in this scope
--> src/main.rs:68:59
|
68 | let fib = recurrence![a[n]: u64 = 1, 1; ...; a[n-1] + a[n-2]];
| ^ not found in this scope
error[E0425]: cannot find value `n` in this scope
--> src/main.rs:68:61
|
68 | let fib = recurrence![a[n]: u64 = 1, 1; ...; a[n-1] + a[n-2]];
| ^ not found in this scope
... wait, what? That can't be right... let's check what the macro is expanding to.
$ rustc -Z unstable-options --pretty expanded recurrence.rs
The --pretty expanded
argument tells rustc
to perform macro expansion, then turn the resulting
AST back into source code. Because this option isn't considered stable yet, we also need
-Z unstable-options
. The output (after cleaning up some formatting) is shown below; in particular,
note the place in the code where $recur
was substituted:
#![feature(no_std)]
#![no_std]
#[prelude_import]
use std::prelude::v1::*;
#[macro_use]
extern crate std as std;
fn main() {
let fib = {
use std::ops::Index;
const MEM_SIZE: usize = 1 + 1;
struct Recurrence {
mem: [u64; MEM_SIZE],
pos: usize,
}
struct IndexOffset<'a> {
slice: &'a [u64; MEM_SIZE],
offset: usize,
}
impl <'a> Index<usize> for IndexOffset<'a> {
type Output = u64;
#[inline(always)]
fn index<'b>(&'b self, index: usize) -> &'b u64 {
use std::num::Wrapping;
let index = Wrapping(index);
let offset = Wrapping(self.offset);
let window = Wrapping(MEM_SIZE);
let real_index = index - offset + window;
&self.slice[real_index.0]
}
}
impl Iterator for Recurrence {
type Item = u64;
#[inline]
fn next(&mut self) -> Option<u64> {
if self.pos < MEM_SIZE {
let next_val = self.mem[self.pos];
self.pos += 1;
Some(next_val)
} else {
let next_val = {
let n = self.pos;
let a = IndexOffset{slice: &self.mem, offset: n,};
a[n - 1] + a[n - 2]
};
{
use std::mem::swap;
let mut swap_tmp = next_val;
{
let result =
match ::std::iter::IntoIterator::into_iter((0..MEM_SIZE).rev()) {
mut iter => loop {
match ::std::iter::Iterator::next(&mut iter) {
::std::option::Option::Some(i) => {
swap(&mut swap_tmp, &mut self.mem[i]);
}
::std::option::Option::None => break,
}
},
};
result
}
}
self.pos += 1;
Some(next_val)
}
}
}
Recurrence{mem: [0, 1], pos: 0,}
};
{
let result =
match ::std::iter::IntoIterator::into_iter(fib.take(10)) {
mut iter => loop {
match ::std::iter::Iterator::next(&mut iter) {
::std::option::Option::Some(e) => {
::std::io::_print(::std::fmt::Arguments::new_v1(
{
static __STATIC_FMTSTR: &'static [&'static str] = &["", "\n"];
__STATIC_FMTSTR
},
&match (&e,) {
(__arg0,) => [::std::fmt::ArgumentV1::new(__arg0, ::std::fmt::Display::fmt)],
}
))
}
::std::option::Option::None => break,
}
},
};
result
}
}
But that looks fine! If we add a few missing #![feature(...)]
attributes and feed it to a nightly
build of rustc
, it even compiles! ... what?!
Aside: You can't compile the above with a non-nightly build of
rustc
. This is because the expansion of theprintln!
macro depends on internal compiler details which are not publicly stabilised.
Being Hygienic
The issue here is that identifiers in Rust macros are hygienic. That is, identifiers from two different contexts cannot collide. To show the difference, let's take a simpler example.
macro_rules! using_a {
($e:expr) => {
{
let a = 42i;
$e
}
}
}
let four = using_a!(a / 10);
fn main() {}
This macro simply takes an expression, then wraps it in a block with a variable a
defined. We then
use this as a round-about way of computing 4
. There are actually two syntax contexts involved in
this example, but they're invisible. So, to help with this, let's give each context a different
colour. Let's start with the unexpanded code, where there is only a single context:
macro_rules! using_a {
($e:expr) => {
{
let a = 42;
$e
}
}
}
let four = using_a!(a / 10);
Now, let's expand the invocation.
let four = { let a = 42; a / 10 };
As you can see, the a
that's defined by the macro is in a
different context to the a
we provided in our invocation.
As such, the compiler treats them as completely different identifiers,
even though they have the same lexical appearance.
This is something to be really careful of when working on macros: macros can produce ASTs which
will not compile, but which will compile if written out by hand, or dumped using
--pretty expanded
.
The solution to this is to capture the identifier with the appropriate syntax context. To do that, we need to again adjust our macro syntax. To continue with our simpler example:
macro_rules! using_a {
($a:ident, $e:expr) => {
{
let $a = 42;
$e
}
}
}
let four = using_a!(a, a / 10);
This now expands to:
let four = { let a = 42; a / 10 };
Now, the contexts match, and the code will compile. We can make this adjustment to our
recurrence!
macro by explicitly capturing a
and n
. After making the necessary changes, we
have:
macro_rules! count_exprs { () => (0); ($head:expr) => (1); ($head:expr, $($tail:expr),*) => (1 + count_exprs!($($tail),*)); } macro_rules! recurrence { ( $seq:ident [ $ind:ident ]: $sty:ty = $($inits:expr),+ ; ... ; $recur:expr ) => { // ^~~~~~~~~~ ^~~~~~~~~~ changed { use std::ops::Index; const MEM_SIZE: usize = count_exprs!($($inits),+); struct Recurrence { mem: [$sty; MEM_SIZE], pos: usize, } struct IndexOffset<'a> { slice: &'a [$sty; MEM_SIZE], offset: usize, } impl<'a> Index<usize> for IndexOffset<'a> { type Output = $sty; #[inline(always)] fn index<'b>(&'b self, index: usize) -> &'b $sty { use std::num::Wrapping; let index = Wrapping(index); let offset = Wrapping(self.offset); let window = Wrapping(MEM_SIZE); let real_index = index - offset + window; &self.slice[real_index.0] } } impl Iterator for Recurrence { type Item = $sty; #[inline] fn next(&mut self) -> Option<$sty> { if self.pos < MEM_SIZE { let next_val = self.mem[self.pos]; self.pos += 1; Some(next_val) } else { let next_val = { let $ind = self.pos; // ^~~~ changed let $seq = IndexOffset { slice: &self.mem, offset: $ind }; // ^~~~ changed $recur }; { use std::mem::swap; let mut swap_tmp = next_val; for i in (0..MEM_SIZE).rev() { swap(&mut swap_tmp, &mut self.mem[i]); } } self.pos += 1; Some(next_val) } } } Recurrence { mem: [$($inits),+], pos: 0 } } }; } fn main() { let fib = recurrence![a[n]: u64 = 0, 1; ...; a[n-1] + a[n-2]]; for e in fib.take(10) { println!("{}", e) } }
And it compiles! Now, let's try with a different sequence.
macro_rules! count_exprs { () => (0); ($head:expr) => (1); ($head:expr, $($tail:expr),*) => (1 + count_exprs!($($tail),*)); } macro_rules! recurrence { ( $seq:ident [ $ind:ident ]: $sty:ty = $($inits:expr),+ ; ... ; $recur:expr ) => { { use std::ops::Index; const MEM_SIZE: usize = count_exprs!($($inits),+); struct Recurrence { mem: [$sty; MEM_SIZE], pos: usize, } struct IndexOffset<'a> { slice: &'a [$sty; MEM_SIZE], offset: usize, } impl<'a> Index<usize> for IndexOffset<'a> { type Output = $sty; #[inline(always)] fn index<'b>(&'b self, index: usize) -> &'b $sty { use std::num::Wrapping; let index = Wrapping(index); let offset = Wrapping(self.offset); let window = Wrapping(MEM_SIZE); let real_index = index - offset + window; &self.slice[real_index.0] } } impl Iterator for Recurrence { type Item = $sty; #[inline] fn next(&mut self) -> Option<$sty> { if self.pos < MEM_SIZE { let next_val = self.mem[self.pos]; self.pos += 1; Some(next_val) } else { let next_val = { let $ind = self.pos; let $seq = IndexOffset { slice: &self.mem, offset: $ind }; $recur }; { use std::mem::swap; let mut swap_tmp = next_val; for i in (0..MEM_SIZE).rev() { swap(&mut swap_tmp, &mut self.mem[i]); } } self.pos += 1; Some(next_val) } } } Recurrence { mem: [$($inits),+], pos: 0 } } }; } fn main() { for e in recurrence!(f[i]: f64 = 1.0; ...; f[i-1] * i as f64).take(10) { println!("{}", e) } }
Which gives us:
1
1
2
6
24
120
720
5040
40320
362880
Success!
Patterns
Parsing and expansion patterns.
Callbacks
macro_rules! call_with_larch { ($callback:ident) => { $callback!(larch) }; } macro_rules! expand_to_larch { () => { larch }; } macro_rules! recognize_tree { (larch) => { println!("#1, the Larch.") }; (redwood) => { println!("#2, the Mighty Redwood.") }; (fir) => { println!("#3, the Fir.") }; (chestnut) => { println!("#4, the Horse Chestnut.") }; (pine) => { println!("#5, the Scots Pine.") }; ($($other:tt)*) => { println!("I don't know; some kind of birch maybe?") }; } fn main() { recognize_tree!(expand_to_larch!()); call_with_larch!(recognize_tree); }
Due to the order that macros are expanded in, it is (as of Rust 1.2) impossible to pass information to a macro from the expansion of another macro. This can make modularizing macros very difficult.
An alternative is to use recursion and pass a callback. Here is a trace of the above example to demonstrate how this takes place:
recognize_tree! { expand_to_larch ! ( ) }
println! { "I don't know; some kind of birch maybe?" }
// ...
call_with_larch! { recognize_tree }
recognize_tree! { larch }
println! { "#1, the Larch." }
// ...
Using a tt
repetition, one can also forward arbitrary arguments to a callback.
macro_rules! callback { ($callback:ident( $($args:tt)* )) => { $callback!( $($args)* ) }; } fn main() { callback!(callback(println("Yes, this *was* unnecessary."))); }
You can, of course, insert additional tokens in the arguments as needed.
Incremental TT Munchers
macro_rules! mixed_rules { () => {}; (trace $name:ident; $($tail:tt)*) => { { println!(concat!(stringify!($name), " = {:?}"), $name); mixed_rules!($($tail)*); } }; (trace $name:ident = $init:expr; $($tail:tt)*) => { { let $name = $init; println!(concat!(stringify!($name), " = {:?}"), $name); mixed_rules!($($tail)*); } }; } fn main() { let a = 42; let b = "Ho-dee-oh-di-oh-di-oh!"; let c = (false, 2, 'c'); mixed_rules!( trace a; trace b; trace c; trace b = "They took her where they put the crazies."; trace b; ); }
This pattern is perhaps the most powerful macro parsing technique available, allowing one to parse grammars of significant complexity.
A TT muncher
is a recursive macro that works by incrementally processing its input one step at a
time. At each step, it matches and removes (munches) some sequence of tokens from the start of its
input, generates some intermediate output, then recurses on the input tail.
The reason for "TT" in the name specifically is that the unprocessed part of the input is always
captured as $($tail:tt)*
. This is done as a tt
repetition is the only way to losslessly
capture part of a macro's input.
The only hard restrictions on TT munchers are those imposed on the macro system as a whole:
- You can only match against literals and grammar constructs which can be captured by
macro_rules!
. - You cannot match unbalanced groups.
It is important, however, to keep the macro recursion limit in mind. macro_rules!
does not have
any form of tail recursion elimination or optimization. It is recommended that, when writing a TT
muncher, you make reasonable efforts to keep recursion as limited as possible. This can be done by
adding additional rules to account for variation in the input (as opposed to recursion into an
intermediate layer), or by making compromises on the input syntax to make using standard repetitions
more tractable.
Internal Rules
#[macro_export] macro_rules! foo { (@as_expr $e:expr) => {$e}; ($($tts:tt)*) => { foo!(@as_expr $($tts)*) }; } fn main() { assert_eq!(foo!(42), 42); }
Internal rules can be used to unify multiple macros into one, or to make it easier to read and write TT Munchers by explicitly naming what rule you wish to call in a macro.
So why is it useful to unify multiple macros into one? The main reasoning for this is how macros are handled in the 2015 Edition of Rust due to macros not being namespaced in said edition. This gives one the troubles of having to re-export all the internal macros as well polluting the global macro namespace or even worse, macro name collisions with other crates. In short, it's quite a hassle. This fortunately isn't really a problem anymore nowadays with a rustc version >= 1.30, for more information consult the Import and Export chapter.
Nevertheless, let's talk about how we can unify multiple macros into one with this technique and what exactly this technique even is.
We have two macros, the common as_expr!
macro and a foo
macro that makes use of the first macro:
#[macro_export] macro_rules! as_expr { ($e:expr) => {$e} } #[macro_export] macro_rules! foo { ($($tts:tt)*) => { as_expr!($($tts)*) }; } fn main() { assert_eq!(foo!(42), 42); }
This is definitely not the nicest solution we could have for this macro, as it pollutes the global
macro namespace as mentioned earlier. In this specific case as_expr
is also a very simple macro
that we only used once, so let's "embed" this macro in our foo
macro with internal rules! To do so
we simply prepend a new matcher for our macro consists of the matcher used in the as_expr
macro,
but with a small addition. We prepend a tokentree that makes it match only when specifically asked
to. In this case we can for example use @as_expr
, so our matcher becomes
(@as_expr $e:expr) => {$e};
. With this we get the macro that was defined at the very top of this
page:
#[macro_export] macro_rules! foo { (@as_expr $e:expr) => {$e}; ($($tts:tt)*) => { foo!(@as_expr $($tts)*) }; } fn main() { assert_eq!(foo!(42), 42); }
You see how we embedded the as_expr
macro in the foo
one? All that changed is that instead of
invoking the as_expr
macro, we now invoke foo
recursively but with a special token tree
prepended to the arguments, foo!(@as_expr $($tts)*)
. If you look closely you might even see that
this pattern can be combined quite nicely with TT Munchers!
The reason for using @
was that, as of Rust 1.2, the @
token is not used in prefix position; as
such, it cannot conflict with anything. This reasoning became obsolete later on when in Rust 1.7
macro matchers got future proofed by emitting a warning to prevent certain tokens from being allowed
to follow certain fragments1, which in Rust 1.12 became a hard-error. There
other symbols or unique prefixes may be used as desired, but use of @
has started to become
widespread, so using it may aid readers in understanding your macro.
Note: in the early days of Rust the
@
token was previously used in prefix position to denote a garbage-collected pointer, back when the language used sigils to denote pointer types. Its only current purpose is for binding names to patterns. For this, however, it is used as an infix operator, and thus does not conflict with its use here.
Additionally, internal rules will often come before any "bare" rules, to avoid issues with
macro_rules!
incorrectly attempting to parse an internal invocation as something it cannot
possibly be, such as an expression.
Push-down Accumulation
#![allow(unused)] fn main() { macro_rules! init_array { (@accum (0, $_e:expr) -> ($($body:tt)*)) => {init_array!(@as_expr [$($body)*])}; (@accum (1, $e:expr) -> ($($body:tt)*)) => {init_array!(@accum (0, $e) -> ($($body)* $e,))}; (@accum (2, $e:expr) -> ($($body:tt)*)) => {init_array!(@accum (1, $e) -> ($($body)* $e,))}; (@accum (3, $e:expr) -> ($($body:tt)*)) => {init_array!(@accum (2, $e) -> ($($body)* $e,))}; (@as_expr $e:expr) => {$e}; [$e:expr; $n:tt] => { { let e = $e; init_array!(@accum ($n, e.clone()) -> ()) } }; } let strings: [String; 3] = init_array![String::from("hi!"); 3]; assert_eq!(format!("{:?}", strings), "[\"hi!\", \"hi!\", \"hi!\"]"); }
All macros in Rust must result in a complete, supported syntax element (such as an expression, item, etc.). This means that it is impossible to have a macro expand to a partial construct.
One might hope that the above example could be more directly expressed like so:
macro_rules! init_array {
(@accum 0, $_e:expr) => {/* empty */};
(@accum 1, $e:expr) => {$e};
(@accum 2, $e:expr) => {$e, init_array!(@accum 1, $e)};
(@accum 3, $e:expr) => {$e, init_array!(@accum 2, $e)};
[$e:expr; $n:tt] => {
{
let e = $e;
[init_array!(@accum $n, e)]
}
};
}
The expectation is that the expansion of the array literal would proceed as follows:
[init_array!(@accum 3, e)]
[e, init_array!(@accum 2, e)]
[e, e, init_array!(@accum 1, e)]
[e, e, e]
However, this would require each intermediate step to expand to an incomplete expression. Even though the intermediate results will never be used outside of a macro context, it is still forbidden.
Push-down, however, allows us to incrementally build up a sequence of tokens without needing to actually have a complete construct at any point prior to completion. In the example given at the top, the sequence of macro invocations proceeds as follows:
init_array! { String:: from ( "hi!" ) ; 3 }
init_array! { @ accum ( 3 , e . clone ( ) ) -> ( ) }
init_array! { @ accum ( 2 , e.clone() ) -> ( e.clone() , ) }
init_array! { @ accum ( 1 , e.clone() ) -> ( e.clone() , e.clone() , ) }
init_array! { @ accum ( 0 , e.clone() ) -> ( e.clone() , e.clone() , e.clone() , ) }
init_array! { @ as_expr [ e.clone() , e.clone() , e.clone() , ] }
As you can see, each layer adds to the accumulated output until the terminating rule finally emits it as a complete construct.
The only critical part of the above formulation is the use of $($body:tt)*
to preserve the output
without triggering parsing. The use of ($input) -> ($output)
is simply a convention adopted to
help clarify the behavior of such macros.
Push-down accumulation is frequently used as part of incremental TT munchers, as it allows arbitrarily complex intermediate results to be constructed. Internal Rules were of use here as well, as they simplify creating such macros.
Repetition Replacement
macro_rules! replace_expr {
($_t:tt $sub:expr) => {$sub};
}
This pattern is where a matched repetition sequence is simply discarded, with the variable being used to instead drive some repeated pattern that is related to the input only in terms of length.
For example, consider constructing a default instance of a tuple with more than 12 elements (the limit as of Rust 1.2).
#![allow(unused)] fn main() { macro_rules! tuple_default { ($($tup_tys:ty),*) => { ( $( replace_expr!( ($tup_tys) Default::default() ), )* ) }; } macro_rules! replace_expr { ($_t:tt $sub:expr) => {$sub}; } assert_eq!(tuple_default!(i32, bool, String), (i32::default(), bool::default(), String::default())); }
JFTE: we could have simply used
$tup_tys::default()
.
Here, we are not actually using the matched types. Instead, we throw them away and replace them with a single, repeated expression. To put it another way, we don't care what the types are, only how many there are.
TT Bundling
macro_rules! call_a_or_b_on_tail { ((a: $a:ident, b: $b:ident), call a: $($tail:tt)*) => { $a(stringify!($($tail)*)) }; ((a: $a:ident, b: $b:ident), call b: $($tail:tt)*) => { $b(stringify!($($tail)*)) }; ($ab:tt, $_skip:tt $($tail:tt)*) => { call_a_or_b_on_tail!($ab, $($tail)*) }; } fn compute_len(s: &str) -> Option<usize> { Some(s.len()) } fn show_tail(s: &str) -> Option<usize> { println!("tail: {:?}", s); None } fn main() { assert_eq!( call_a_or_b_on_tail!( (a: compute_len, b: show_tail), the recursive part that skips over all these tokens doesn't much care whether we will call a or call b: only the terminal rules care. ), None ); assert_eq!( call_a_or_b_on_tail!( (a: compute_len, b: show_tail), and now, to justify the existence of two paths we will also call a: its input should somehow be self-referential, so let's make it return some eighty-six! ), Some(92) ); }
In particularly complex recursive macros, a large number of arguments may be needed in order to carry identifiers and expressions to successive layers. However, depending on the implementation there may be many intermediate layers which need to forward these arguments, but do not need to use them.
As such, it can be very useful to bundle all such arguments together into a single TT by placing
them in a group. This allows layers which do not need to use the arguments to simply capture and
substitute a single tt
, rather than having to exactly capture and substitute the entire argument
group.
The example above bundles the $a
and $b
expressions into a group which can then be forwarded as
a single tt
by the recursive rule. This group is then destructured by the terminal rules to
access the expressions.
Building Blocks
Reusable snippets of macro code.
AST Coercion
The Rust parser is not very robust in the face of tt
substitutions. Problems can arise when the
parser is expecting a particular grammar construct and instead finds a lump of substituted tt
tokens. Rather than attempt to parse them, it will often just give up. In these cases, it is
necessary to employ an AST coercion.
#![allow(dead_code)] macro_rules! as_expr { ($e:expr) => {$e} } macro_rules! as_item { ($i:item) => {$i} } macro_rules! as_pat { ($p:pat) => {$p} } macro_rules! as_stmt { ($s:stmt) => {$s} } macro_rules! as_ty { ($t:ty) => {$t} } as_item!{struct Dummy;} fn main() { as_stmt!(let as_pat!(_): as_ty!(_) = as_expr!(42)); }
These coercions are often used with push-down accumulation macros in order to get the parser to
treat the final tt
sequence as a particular kind of grammar construct.
Note that this specific set of macros is determined by what macros are allowed to expand to, not what they are able to capture.
Counting
Repetition with replacement
Counting things in a macro is a surprisingly tricky task. The simplest way is to use replacement with a repetition match.
macro_rules! replace_expr { ($_t:tt $sub:expr) => {$sub}; } macro_rules! count_tts { ($($tts:tt)*) => {0usize $(+ replace_expr!($tts 1usize))*}; } fn main() { assert_eq!(count_tts!(0 1 2), 3); }
This is a fine approach for smallish numbers, but will likely crash the compiler with inputs of around 500 or so tokens. Consider that the output will look something like this:
0usize + 1usize + /* ~500 `+ 1usize`s */ + 1usize
The compiler must parse this into an AST, which will produce what is effectively a perfectly unbalanced binary tree 500+ levels deep.
Recursion
An older approach is to use recursion.
macro_rules! count_tts { () => {0usize}; ($_head:tt $($tail:tt)*) => {1usize + count_tts!($($tail)*)}; } fn main() { assert_eq!(count_tts!(0 1 2), 3); }
Note: As of
rustc
1.2, the compiler has grievous performance problems when large numbers of integer literals of unknown type must undergo inference. We are using explicitlyusize
-typed literals here to avoid that.If this is not suitable (such as when the type must be substitutable), you can help matters by using
as
(e.g.0 as $ty
,1 as $ty
, etc.).
This works, but will trivially exceed the recursion limit. Unlike the repetition approach, you can extend the input size by matching multiple tokens at once.
macro_rules! count_tts { ($_a:tt $_b:tt $_c:tt $_d:tt $_e:tt $_f:tt $_g:tt $_h:tt $_i:tt $_j:tt $_k:tt $_l:tt $_m:tt $_n:tt $_o:tt $_p:tt $_q:tt $_r:tt $_s:tt $_t:tt $($tail:tt)*) => {20usize + count_tts!($($tail)*)}; ($_a:tt $_b:tt $_c:tt $_d:tt $_e:tt $_f:tt $_g:tt $_h:tt $_i:tt $_j:tt $($tail:tt)*) => {10usize + count_tts!($($tail)*)}; ($_a:tt $_b:tt $_c:tt $_d:tt $_e:tt $($tail:tt)*) => {5usize + count_tts!($($tail)*)}; ($_a:tt $($tail:tt)*) => {1usize + count_tts!($($tail)*)}; () => {0usize}; } fn main() { assert_eq!(700, count_tts!( ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, // Repetition breaks somewhere after this ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, ,,,,,,,,,, )); }
This particular formulation will work up to ~1,200 tokens.
Slice length
A third approach is to help the compiler construct a shallow AST that won't lead to a stack overflow.
This can be done by constructing an array literal and calling the len
method.
macro_rules! replace_expr { ($_t:tt $sub:expr) => {$sub}; } macro_rules! count_tts { ($($tts:tt)*) => {<[()]>::len(&[$(replace_expr!($tts ())),*])}; } fn main() { assert_eq!(count_tts!(0 1 2), 3); }
This has been tested to work up to 10,000 tokens, and can probably go much higher.
Enum counting
This approach can be used where you need to count a set of mutually distinct identifiers.
macro_rules! count_idents { ($($idents:ident),* $(,)*) => { { #[allow(dead_code, non_camel_case_types)] enum Idents { $($idents,)* __CountIdentsLast } const COUNT: u32 = Idents::__CountIdentsLast as u32; COUNT } }; } fn main() { const COUNT: u32 = count_idents!(A, B, C); assert_eq!(COUNT, 3); }
This method does have two drawbacks. First, as implied above, it can only count valid identifiers (which are also not keywords), and it does not allow those identifiers to repeat.
Secondly, this approach is not hygienic, meaning that if whatever identifier you use in place of
__CountIdentsLast
is provided as input, the macro will fail due to the duplicate variants in the
enum
.
Bit twiddling
Another recursive approach using bit operations:
macro_rules! count_tts { () => { 0 }; ($odd:tt $($a:tt $b:tt)*) => { (count_tts!($($a)*) << 1) | 1 }; ($($a:tt $even:tt)*) => { count_tts!($($a)*) << 1 }; } fn main() { assert_eq!(count_tts!(0 1 2), 3); }
This approach is pretty smart as it effectively halves its input whenever its even and then
multiplying the counter by 2 (or in this case shifting 1 bit to the left which is equivalent). If
the input is uneven it simply takes one token tree from the input or
s the token tree to the
previous counter which is equivalent to adding 1 as the lowest bit has to be a 0 at this point due
to the previous shifting. Rinse and repeat until we hit the base rule () => 0
.
The benefit of this is that the constructed AST expression that makes up the counter value will grow
with a complexity of O(log(n))
instead of O(n)
like the other approaches. Be aware that you can
still hit the recursion limit with this if you try hard enough. Credits for this method go to Reddit
user YatoRust
.
Let's go through the procedure by hand once:
count_tts!(0 0 0 0 0 0 0 0 0 0);
This invocation will match the third rule due to the fact that we have an even number of token trees
(10). The matcher names the odd token trees in the sequence $a
and the even ones $even
but the
expansion only makes use of $a
, which means it effectively discards all the even elements cutting
the input in half. So the invocation now becomes:
count_tts!(0 0 0 0 0) << 1;
This invocation will now match the second rule as its input is an uneven amount of token trees. In this case the first token tree is discarded to make the input even again, then we also do the halving step in this invocation again since we know the input would be even now anyways. Therefor we can count 1 for the uneven discard and multiply by 2 again since we also halved.
((count_tts!(0 0) << 1) | 1) << 1;
((count_tts!(0) << 1 << 1) | 1) << 1;
(((count_tts!() | 1) << 1 << 1) | 1) << 1;
((((0 << 1) | 1) << 1 << 1) | 1) << 1;
Now to check if we expanded correctly manually we can use a one of the tools we introduced for
debugging
. When expanding the macro there we should get:
((((0 << 1) | 1) << 1 << 1) | 1) << 1;
That's the same so we didn't make any mistakes, great!
Abacus Counters
Provisional: needs a more compelling example. Matching nested groups that are not denoted by Rust groups is sufficiently unusual that it may not merit inclusion.
Note: this section assumes understanding of push-down accumulation and incremental TT munchers.
macro_rules! abacus { ((- $($moves:tt)*) -> (+ $($count:tt)*)) => { abacus!(($($moves)*) -> ($($count)*)) }; ((- $($moves:tt)*) -> ($($count:tt)*)) => { abacus!(($($moves)*) -> (- $($count)*)) }; ((+ $($moves:tt)*) -> (- $($count:tt)*)) => { abacus!(($($moves)*) -> ($($count)*)) }; ((+ $($moves:tt)*) -> ($($count:tt)*)) => { abacus!(($($moves)*) -> (+ $($count)*)) }; // Check if the final result is zero. (() -> ()) => { true }; (() -> ($($count:tt)+)) => { false }; } fn main() { let equals_zero = abacus!((++-+-+++--++---++----+) -> ()); assert_eq!(equals_zero, true); }
This technique can be used in cases where you need to keep track of a varying counter that starts at or near zero, and must support the following operations:
- Increment by one.
- Decrement by one.
- Compare to zero (or any other fixed, finite value).
A value of n is represented by n instances of a specific token stored in a group. Modifications
are done using recursion and push-down accumulation. Assuming the token
used is x
, the operations above are implemented as follows:
- Increment by one: match
($($count:tt)*)
, substitute(x $($count)*)
. - Decrement by one: match
(x $($count:tt)*)
, substitute($($count)*)
. - Compare to zero: match
()
. - Compare to one: match
(x)
. - Compare to two: match
(x x)
. - (and so on...)
In this way, operations on the counter are like flicking tokens back and forth like an abacus.1
This desperately thin reasoning conceals the real reason for this name: to avoid having
yet another thing with "token" in the name. Talk to your writer about avoiding
semantic satiation today!
In fairness, it could also have been called "unary counting".
In cases where you want to represent negative values, -n can be represented as n instances of a
different token. In the example given above, +n is stored as n +
tokens, and -m is stored
as m -
tokens.
In this case, the operations become slightly more complicated; increment and decrement effectively
reverse their usual meanings when the counter is negative. To which given +
and -
for the
positive and negative tokens respectively, the operations change to:
- Increment by one:
- match
()
, substitute(+)
. - match
(- $($count:tt)*)
, substitute($($count)*)
. - match
($($count:tt)+)
, substitute(+ $($count)+)
.
- match
- Decrement by one:
- match
()
, substitute(-)
. - match
(+ $($count:tt)*)
, substitute($($count)*)
. - match
($($count:tt)+)
, substitute(- $($count)+)
.
- match
- Compare to 0: match
()
. - Compare to +1: match
(+)
. - Compare to -1: match
(-)
. - Compare to +2: match
(++)
. - Compare to -2: match
(--)
. - (and so on...)
Note that the example at the top combines some of the rules together (for example, it combines
increment on ()
and ($($count:tt)+)
into an increment on ($($count:tt)*)
).
If you want to extract the actual value of the counter, this can be done using a regular counter macro. For the example above, the terminal rules can be replaced with the following:
macro_rules! abacus {
// ...
// This extracts the counter as an integer expression.
(() -> ()) => {0};
(() -> (- $($count:tt)*)) => {
- ( count_tts!($( $count_tts:tt )*) )
};
(() -> (+ $($count:tt)*)) => {
count_tts!($( $count_tts:tt )*)
};
}
// One of the many token tree counting macros in the counting chapter
macro_rules! count_tts {
// ...
}
JFTE: strictly speaking, the above formulation of
abacus!
is needlessly complex. It can be implemented much more efficiently using repetition, provided you do not need to match against the counter's value in a macro:macro_rules! abacus { (-) => {-1}; (+) => {1}; ($( $moves:tt )*) => { 0 $(+ abacus!($moves))* } }
Parsing Rust
Parsing some of Rust's items can be useful in certain situations. This section will show a few macros that can parse some of Rust's more complex items like structs and functions to a certain extent. The goal of these macros is not to be able to parse the entire grammar of the items but to parse parts that are in general quite useful without being too complex to parse. This means we ignore things like generics and such.
The main points of interest of these macros are their matchers
. The transcribers are only there
for example purposes and are usually not that impressive.
Function
macro_rules! function_item_matcher { ( $( #[$meta:meta] )* // ^~~~attributes~~~~^ $vis:vis fn $name:ident ( $( $arg_pat:ident : $arg_ty:ty ),* $(,)? ) // ^~~~~~~~~~~~~~~~argument list~~~~~~~~~~~~~~~^ $( -> $ret_ty:ty )? // ^~~~return type~~~^ { $($tt:tt)* } // ^~~~~body~~~~^ ) => { $( #[$meta] )* $vis fn $name ( $( $arg_name : $arg_ty ),* ) $( -> $ret_ty )? { $($tt)* } } } function_item_matcher!( #[inline] #[cold] pub fn foo(bar: i32, baz: i32, ) -> String { format!("{} {}", bar, baz) } ); fn main() { assert_eq!(foo(13, 37), "13 37"); }
A simple function matcher that ignores qualifiers like unsafe
, async
, ... as well a generics and
where clauses. If parsing those is required it is likely that you are better off using a proc-macro
instead.
This lets you for example, inspect the function signature, generate some extra things from it and
then re-emit the entire function again. Kind of like a Derive
proc-macro but weaker and for
functions.
Ideally we would like to use a pattern fragment specifier instead of an ident for the arguments but this is currently not allowed. Fortunately people don't use patterns in function signatures that often so this is okay.
Method
The macro for parsing basic functions is nice and all, but sometimes we would like to also parse
methods, functions that refer to their object via some form of self
usage. This makes things a bit
trickier:
WIP
Struct
macro_rules! struct_item_matcher { // Unit-Struct ( $( #[$meta:meta] )* // ^~~~attributes~~~~^ $vis:vis struct $name:ident; ) => { $( #[$meta] )* $vis struct $name; }; // Tuple-Struct ( $( #[$meta:meta] )* // ^~~~attributes~~~~^ $vis:vis struct $name:ident ( $( $( #[$field_meta:meta] )* // ^~~~field attributes~~~~^ $field_vis:vis $field_ty:ty // ^~~~~~a single field~~~~~~^ ),* $(,)? ); ) => { $( #[$meta] )* $vis struct $name ( $( $( #[$field_meta] )* $field_vis $field_ty ),* ); }; // Named-Struct ( $( #[$meta:meta] )* // ^~~~attributes~~~~^ $vis:vis struct $name:ident { $( $( #[$field_meta:meta] )* // ^~~~field attributes~~~!^ $field_vis:vis $field_name:ident : $field_ty:ty // ^~~~~~~~~~~~~~~~~a single field~~~~~~~~~~~~~~~^ ),* $(,)? } ) => { $( #[$meta] )* $vis struct $name { $( $( #[$field_meta] )* $field_vis $field_name : $field_ty ),* } } } struct_item_matcher!( #[derive(Copy, Clone)] pub(crate) struct Foo { pub bar: i32, baz: &'static str, qux: f32 } ); struct_item_matcher!( #[derive(Copy, Clone)] pub(crate) struct Bar; ); struct_item_matcher!( #[derive(Clone)] pub(crate) struct Baz (i32, pub f32, String); ); fn main() { let _: Foo = Foo { bar: 42, baz: "macros can be nice", qux: 3.14, }; let _: Bar = Bar; let _: Baz = Baz(2, 0.1234, String::new()); }
Enum
Parsing enums is a bit more complex than structs so we will finally make use of some of the patterns we have discussed, Incremental TT Muncher and Internal Rules. Instead of just building the parsed enum again we will merely visit all the tokens of the enum, as rebuilding the enum would require us to collect all the parsed tokens temporarily again via a Push Down Accumulator.
#![allow(unused)] fn main() { macro_rules! enum_item_matcher { // tuple variant (@variant $variant:ident ( $( $( #[$field_meta:meta] )* // ^~~~field attributes~~~~^ $field_vis:vis $field_ty:ty // ^~~~~~a single field~~~~~~^ ),* $(,)? //∨~~rest of input~~∨ ) $(, $($tt:tt)* )? ) => { // process rest of the enum $( enum_item_matcher!(@variant $( $tt )*) )? }; // named variant (@variant $variant:ident { $( $( #[$field_meta:meta] )* // ^~~~field attributes~~~!^ $field_vis:vis $field_name:ident : $field_ty:ty // ^~~~~~~~~~~~~~~~~a single field~~~~~~~~~~~~~~~^ ),* $(,)? //∨~~rest of input~~∨ } $(, $($tt:tt)* )? ) => { // process rest of the enum $( enum_item_matcher!(@variant $( $tt )*) )? }; // unit variant (@variant $variant:ident $(, $($tt:tt)* )? ) => { // process rest of the enum $( enum_item_matcher!(@variant $( $tt )*) )? }; // trailing comma (@variant ,) => {}; // base case (@variant) => {}; // entry point ( $( #[$meta:meta] )* $vis:vis enum $name:ident { $($tt:tt)* } ) => { enum_item_matcher!(@variant $($tt)*) }; } enum_item_matcher!( #[derive(Copy, Clone)] pub(crate) enum Foo { Bar, Baz, } ); enum_item_matcher!( #[derive(Copy, Clone)] pub(crate) enum Bar { Foo(i32, f32), Bar, Baz(), } ); enum_item_matcher!( #[derive(Clone)] pub(crate) enum Baz {} ); }