Weird Expressions and Where to Find Them
This is a written transcript of my RustConf 2022 talk, which uses rustc at commit 6491eb1e6c9fac20faad11e5da16db3185b2410d.
You can find the video of the talk here.
Introduction
My name is Michael Gattozzi and today I want to take you on a journey from "What?" to "No way that works" to "Wait, why does that work?!" and finally to true understanding and acceptance. Many an article or talk has been written about "Idiomatic Rust" or "Proper Code" or whatever else the kids say these days. However, I've spent a long time gazing into the "non-idiomatic" abyss of the weird-exprs.rs test file. It's a file full of programs that are nonsensical Rust, but technically correct Rust, the best kind of correct. It's hidden deep in the bowels of the Rust compiler test suite, designed to protect against breaking parsing rules and to test weird edge cases that rustc must accept, even if we don't want it to. It's known to the compiler devs as a necessary evil and sequestered from the public lest it make both old and new Rustaceans run away in abject horror. It's a test file that shows just what kinds of wretched programs mortals are capable of inflicting upon the world.
Come with me on a journey to truly understand this file by first understanding where and why it came into being.
History
In the beginning there was commit 664b0ad3fcead4fe4d22c05065a82a338770c429, made on August 19th, 2011. In it was a file named wierd-exprs.rs, a misspelling that went unnoticed for a few commits until the file was renamed to weird-exprs.rs on Sep 26, 2011. This file will be 11 years old in two weeks. It has stuck around this long because, at every point in the language's history, Rust's grammar has allowed valid programs that we wouldn't want to write but that we still need to test, to make sure we don't break any edge cases or invariants.
There are a lot of programs that rustc can accept and produce code for, but that we as programmers would find hard to read, unhelpful, or not doing anything we'd actually want a computer to do. English has an analogue in the sentence "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo". It follows all of the rules of English grammar; it's nonsensical, but it's technically allowed. weird-exprs.rs is the same thing. Here is one of the test cases from the file, and trust me when I say this is the least gnarly one to look at.
fn evil_lincoln() {
let _evil = println!("lincoln");
}
It just prints "lincoln" and assigns the unit value () returned by println! to _evil. Sure, you can assign the unit value to a variable, but we wouldn't actually want to do that in the code we write. This is what I mean by valid, but not helpful, code. Now you might be wondering, "Why is it called evil_lincoln?" Well, let's look at it in the original commit that added it.
fn evil_lincoln() {
let evil <- log "lincoln";
}
As you can see, println! used to be called log, and for anyone not up to date on US presidential history, Lincoln lived in a log cabin he had built himself in Illinois back in the day. The name of the test survived, but not the pre-1.0 syntax.
Rust and what it looks like have both changed a lot and not a lot in the 11 years since this test was added. We use println!, a macro, not log, a built-in keyword, and these days we assign with an equals sign, not a left arrow. weird-exprs.rs has existed through most of Rust's creation, and every step of the way since its inception it has guarded the language against breaking its grammar and parsing rules. It was here before many of us started using Rust, and it will be here longer than many of us will be.
Now that we understand just where this file came from, let's start looking into a few choice cases to learn a bit more about how Rust works and let's get weird.
Let's Get Weird
#![allow(non_camel_case_types)]
#![allow(dead_code)]
#![allow(unreachable_code)]
#![allow(unused_braces, unused_must_use, unused_parens)]
#![allow(uncommon_codepoints, confusable_idents)]
If you look at the file today, you can see these allow attributes at the top, which means we're going to have a good time, because we're allowing ourselves to write the good code. Do note that I can't cover every case in the time we have today, so I've chosen the ones I think we can learn the most from. If you want to see all of the tests, and I really think you should, since they're absolutely fascinating and will make you scratch your head for a bit as you figure them out, you can read them all at src/test/ui/weird-exprs.rs in the rustc repo at commit 6491eb1e6c9fac20faad11e5da16db3185b2410d.
if true {
println!("Hello I'm true");
}
Let's talk about if. if lets you test an expression for a boolean value and, if it's true, executes the block. In this case the condition is true, so we execute the println! macro.
if false {
println!("Hello I'm true");
} else {
println!("Hello I'm false");
}
We can also have an optional else block that lets us do something when the condition is false. In this case we'd print "Hello I'm false". Did you know, though, that if in Rust is an expression and not a statement? This means it's far more flexible and can be used anywhere an expression is expected, such as in a variable assignment.
let x = if true {
1
} else {
0
};
This means you can test a condition and assign a value depending on whether it's true or false, so long as both blocks have the same type. Did you know that the condition if accepts is also an expression? The only requirement is that it evaluates to a bool.
if {
println!("Evaluating the expression block");
true
} {
println!("Hello I'm true");
} else {
println!("Hello I'm false");
}
Here the condition is a block expression, which executes everything inside the block first. In this case we print the sentence "Evaluating the expression block" before the block evaluates to its final expression, the one without a semicolon, which here is true. Something to note about block expressions is that they always produce a value; if there is no final expression, that value is the unit value ().
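Here's a small sketch, not taken from weird-exprs.rs, that puts both ideas into functions you can actually call: an if/else if/else chain used as a value, and a block expression used as an if condition. The names label, checked_is_positive, and the log vector are hypothetical, made up for illustration.

```rust
/// `if` is an expression, so an `if` / `else if` / `else` chain yields a value.
/// Every branch has to produce the same type (`&'static str` here).
fn label(n: i32) -> &'static str {
    let kind = if n < 0 {
        "negative"
    } else if n == 0 {
        "zero"
    } else {
        "positive"
    };
    kind
}

/// The condition is itself a block expression: its statements run first,
/// then its final expression (no semicolon) is the `bool` that `if` tests.
fn checked_is_positive(n: i32, log: &mut Vec<String>) -> bool {
    if {
        log.push(format!("checking {}", n));
        n > 0
    } {
        true
    } else {
        false
    }
}

fn main() {
    println!("{}", label(-3));

    let mut log = Vec::new();
    let positive = checked_is_positive(5, &mut log);
    println!("{} {:?}", positive, log);
}
```

Writing if { ... } { ... } like this is legal but rarely a good idea in real code; pulling the condition into a named variable first reads much better.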
match 1u8 {
0 => println!("I am zero"),
1 => println!("I am one"),
_ => println!("I am neither one nor zero"),
}
Now let's talk about match for a second. It is also an expression. It takes an expression, in this case the value 1 of type u8, and pattern matches on it. It tests the patterns in the order they're written, stops at the first one that matches, and then executes the expression to the right of it. Rust requires a match to cover every possible value, so we use _ as the catch-all pattern for every other number a u8 can be. In this example 1 matches the pattern 1 and prints "I am one".
let x = 1u8;
match x {
_ if x % 2 == 0 => println!("I am an even number"),
_ if x % 2 == 1 => println!("I am an odd number"),
_ => unreachable!("Broke math itself, x is neither odd nor even"),
}
We can go even further with match, as each arm can have an optional if guard alongside its pattern. Here the first arm's pattern is the catch-all, so it matches, and its guard then asks whether x is even. It's not, so we move on to the next arm. That pattern matches too, and its guard checks whether x is odd, which it is since 1 is an odd number, so we print "I am an odd number".
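As a sketch of how patterns and guards combine (the function describe and its labels are hypothetical), note that a guard can use the plain catch-all _ or a named binding, and that the compiler does not consider guards when checking exhaustiveness, so a final unguarded arm is still needed:

```rust
/// Arms are tried top to bottom. An arm is chosen only if its pattern matches
/// *and* its optional `if` guard evaluates to `true`.
fn describe(n: u8) -> &'static str {
    match n {
        0 => "zero",
        // `_` always matches, so the guard alone decides this arm.
        _ if n % 2 == 0 => "even",
        // A binding pattern lets the guard refer to the matched value by name.
        big if big >= 200 => "big and odd",
        // Guards don't count toward exhaustiveness, so a catch-all is still required.
        _ => "odd",
    }
}

fn main() {
    for n in [0u8, 1, 4, 201] {
        println!("{} is {}", n, describe(n));
    }
}
```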
Let's recap:
- if is an expression
- if accepts expressions that can evaluate to a boolean
- match is an expression
- match arms can have an optional if guard
- expressions that evaluate to a value can be assigned to a variable
Now you might be seeing where I'm going with this. If you can put an expression inside an expression, and if is an expression, then you can get something like this test from weird-exprs.rs, which checks that you can arbitrarily nest if expressions and that you can nest them inside a match guard:
fn match_nested_if() {
let val = match () {
() if if if if true {true} else {false} {true} else {false} {true} else {false} => true,
_ => false,
};
assert!(val);
}
So let's walk through what this test is doing.
- We first match on the unit value () and go to the first arm
- The pattern () matches, so we now need to check the nested if expressions in the guard
- We evaluate the innermost expression, if true, and take the if block, not the else block, which evaluates to true
- We then use the true from the first if as the condition of the second if, which means we take the if block, not the else block, which evaluates to true
- We then evaluate the third if and, just like the previous two, take the if block, not the else block, which evaluates to true
- Then the guard, the if right after the () pattern, checks whether that value is true, which it is, so we choose that arm of the match expression
- We then assign the value true to the variable val
- We then assert that val is true, which it is, and so the test passes
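To see that walk-through in action, here's a sketch (function names hypothetical) that keeps the exact shape of match_nested_if but replaces the innermost true with a parameter, next to a version that spells out one if per line. The two should agree for both inputs:

```rust
/// Same shape as `match_nested_if`, with the innermost `true` swapped for a
/// parameter `b` so both outcomes can be observed.
fn nested(b: bool) -> bool {
    match () {
        () if if if if b {true} else {false} {true} else {false} {true} else {false} => true,
        _ => false,
    }
}

/// The same evaluation, one `if` per line, innermost first.
fn nested_step_by_step(b: bool) -> bool {
    let first = if b { true } else { false };
    let second = if first { true } else { false };
    let third = if second { true } else { false };
    // The guard (the `if` right after the `()` pattern) tests the last result.
    match () {
        () if third => true,
        _ => false,
    }
}

fn main() {
    for b in [true, false] {
        assert_eq!(nested(b), nested_step_by_step(b));
        println!("{} -> {}", b, nested(b));
    }
}
```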
Let's kick things up a notch and talk about functions.
fn example() {
println!("I am a function")
}
We can define a function example with the above syntax, and when it's called it will execute everything in the block; in this case it prints out "I am a function". Okay, that's easy. We can also define functions for types, like so:
struct Bar;
impl Bar {
fn example(self) {
println!("I am a function that takes Bar as input");
}
}
Here we have a struct Bar with an impl block that defines a method example, which takes Bar by value through the self argument and then prints out "I am a function that takes Bar as input".
Now what if we changed it so that, instead of printing something, it returned a Bar? It would look like this:
struct Bar;
impl Bar {
fn example(self) -> Self {
Self
}
}
The only difference is that we're returning Self, which is an alias for Bar here since we're inside Bar's impl block. This lets us write some interesting code, because now we can call example as many times as we want, like this:
let bar = Bar.example()
.example()
.example()
.example();
We can keep chaining example here because each call to example creates a new Bar. I promise this is going somewhere. I want to talk about function traits for a second. These let us pass functions into other functions generically. Here's what I mean.
fn example<F>(function: F)
where F: FnOnce() -> String
{
println!("function string: {}", function());
}
example(|| String::from("foo"));
example(|| String::from("bar"));
We define a function example that has a generic parameter F, which is the type of the argument function. We restrict what F can be with a where clause saying that it must implement FnOnce() -> String: it can be called (at least once) with no arguments and returns a String. We then define a body that prints "function string: " followed by the result of invoking the function we pass in, in this case foo and bar. Note that closures implement FnOnce, as do named functions. Let's recap:
- We can define functions that, when called, execute what's inside their block
- We can implement a method on a type that takes the value itself (self) as input
- We can have that method return a value of its own type (Self)
- We can therefore chain calls to that method over and over again
- We also have a trait FnOnce, which means whatever implements it can be called as a function that takes itself by value, accepts zero or more arguments, and returns a value
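The recap points can be sketched together in stable Rust. Counter, tick, describe, and named are hypothetical names for illustration; the closure at the end moves its captured String out when called, so it implements only FnOnce, which is exactly what describe asks for:

```rust
/// A method that takes `self` by value and returns `Self` can be chained,
/// because each call hands back a fresh value to call the next method on.
struct Counter {
    count: u32,
}

impl Counter {
    fn tick(self) -> Self {
        Counter { count: self.count + 1 }
    }
}

/// Accepts anything callable once with no arguments that returns a `String`.
fn describe<F>(make: F) -> String
where
    F: FnOnce() -> String,
{
    format!("function string: {}", make())
}

fn named() -> String {
    String::from("named fn")
}

fn main() {
    let c = Counter { count: 0 }.tick().tick().tick();
    println!("{}", c.count);

    // A named function works as an `FnOnce`...
    println!("{}", describe(named));
    // ...and so does a closure that moves its captured String out,
    // which means it can only ever be called once.
    let owned = String::from("moved in");
    println!("{}", describe(move || owned));
}
```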
Now FnOnce is a trait, right? Sure, closures and named functions implement it automatically, but shouldn't we be able to implement it for any type we want so that we can call it like a function? We can on nightly Rust, with some unstable feature flags turned on, and therefore in weird-exprs.rs we do.
fn function() {
struct foo;
impl FnOnce<()> for foo {
type Output = foo;
extern "rust-call" fn call_once(self, _args: ()) -> Self::Output {
foo
}
}
let foo = foo () ()() ()()() ()()()() ()()()()();
}
First we define a struct named foo inside of a function, which you can do! It keeps the type scoped to that function, a neat trick if you need an intermediate type that isn't exposed in your API. We then implement FnOnce for foo, saying that it takes no arguments by using the empty tuple () and that the return type of the call, Output, is another foo. We then define the function body, where we take the old foo by value with self, return a new foo, and drop the old one at the end of the function body. We then call foo and, much like before when we kept calling example, we just keep invoking each result as an FnOnce function. Finally the result goes into let foo = ...; since a unit struct named foo is in scope, that left-hand foo is actually a unit-struct pattern rather than a brand-new variable, one more bit of weirdness. This test makes sure that yes, we can chain function calls one right after the other arbitrarily, so long as each call returns something you can call, which you can arrange by having a function return a value whose type implements Fn, FnOnce, or FnMut.
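On stable Rust we can't write that FnOnce impl by hand, but we can get the same call-after-call syntax by returning closures that return closures. This is a sketch under that substitution (add3 and Adder are hypothetical names): unlike the nightly version, the nesting depth is fixed, because every level's type has to be written out.

```rust
/// The type of a one-argument adder: takes an `i32`, returns an `i32`.
type Adder = Box<dyn Fn(i32) -> i32>;

/// Each call returns another callable, so the calls can be chained as
/// `add3(1)(2)(3)`. Unlike a type that implements `FnOnce` with
/// `Output = Self`, this only goes three calls deep.
fn add3(a: i32) -> Box<dyn Fn(i32) -> Adder> {
    Box::new(move |b: i32| -> Adder { Box::new(move |c: i32| a + b + c) })
}

fn main() {
    let total = add3(1)(2)(3);
    println!("{}", total);
}
```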
Now let's talk about loops and the never type.
loop {
println!("stdout go brrrrrrrr");
}
With loops, we have a loop keyword that runs the code inside its block forever. We can end a loop by using the break keyword.
loop {
println!("stdout go brrrrrr");
break;
}
Now here's a neat thing: loop is an expression, so it can go wherever you'd want an expression. This means we can return a value from a loop, like so:
let x = loop {
break 5;
};
assert_eq!(x, 5);
Here we return the value five from the loop with break and assign it to x. We then assert that it's equal to five! Loops also have one other keyword, continue, which means "stop evaluating this iteration and start again from the top of the loop".
loop {
println!("stdout go brrrrrr");
continue;
unreachable!("can't touch this");
}
Here we print out "stdout go brrrrrr" and then restart the loop, never hitting the unreachable! statement. Okay, so we know a bit about loops, but what about that never type I mentioned? never is a built-in primitive type. Let's look at a quick example.
let x: ! = {
return 123
};
Here the exclamation point is how we write the never type (using it as the type of a let binding like this still requires the nightly never_type feature). Since we return from the function early with the value 123, nothing can ever be assigned to x. never is how we represent values that can't be constructed and code that will never finish executing. Some control-flow expressions have this never type. return is one, but so are continue and break, as they cause execution to stop where it is and jump somewhere else. Another interesting thing is that expressions of type never can coerce to any type, since their value will never actually be needed. For example:
let x: String = return;
let x: i32 = return;
This compiles just fine as we exit the function early and can't assign any value to x, no matter what type it is.
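The same coercion is what makes a common, perfectly reasonable pattern work: bailing out of one match arm while the other arm produces a value. A minimal sketch with hypothetical function names:

```rust
/// `return` has the never type `!`, so it can fill the `Err` arm while the
/// `Ok` arm produces a `u32`: `!` coerces to `u32` and the match type checks.
fn double_parsed(s: &str) -> Option<u32> {
    let n: u32 = match s.parse::<u32>() {
        Ok(n) => n,
        Err(_) => return None,
    };
    n.checked_mul(2)
}

/// `panic!` diverges too, so it can stand in for a `u32` in an `if` branch.
fn must_be_small(n: u32) -> u32 {
    if n < 100 { n } else { panic!("{} is too big", n) }
}

fn main() {
    println!("{:?} {:?}", double_parsed("21"), double_parsed("nope"));
    println!("{}", must_be_small(5));
}
```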
Let's recap then:
- We have a loop keyword which loops infinitely
- We can use break to exit a loop, optionally returning a value from it
- We can use continue to start again at the beginning of a loop
- Control-flow expressions like break and continue have what's known as the never type
- The never type coerces to any type
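Here's a sketch that uses loop as a value, break with and without a value, and continue together. The function first_multiple_of is hypothetical, and it assumes k is nonzero, since the % operator panics on division by zero:

```rust
/// Returns the first element of `items` that is a multiple of `k`.
/// Assumes `k != 0` (the `%` below panics on division by zero).
fn first_multiple_of(items: &[u32], k: u32) -> Option<u32> {
    let mut i = 0;
    // `loop` is an expression: `break value` makes the loop evaluate to `value`.
    let found = loop {
        if i == items.len() {
            break None; // ran out of items
        }
        let item = items[i];
        i += 1;
        if item % k != 0 {
            continue; // skip the rest of this iteration
        }
        break Some(item);
    };
    found
}

fn main() {
    println!("{:?}", first_multiple_of(&[3, 5, 10, 12], 4));
}
```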
With this we're ready to look at our next test case.
fn angrydome() {
loop { if break { } }
let mut i = 0;
loop { i += 1; if i == 1 { match (continue) { 1 => { }, _ => panic!("wat") } }
break; }
}
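For the sake of brevity I'm gonna just show you the control flow with this handy diagram. I'm kidding, but the important part is what this test checks: that break and continue can be used anywhere an expression is expected and still type check, while the loop still executes where it can. In if break { }, break is the if condition, which works because its never type coerces to bool; in match (continue) { 1 => { }, ... }, continue is the value being matched, and it coerces to an integer so that the pattern 1 type checks. A bit nonsensical for control flow, but absolutely necessary.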
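To convince yourself the loops in angrydome really terminate, here's a runnable copy of it (the name angrydome_count and the returned counter are additions for illustration). The first loop exits on its first evaluation of break; the second loop hits continue when i is 1, then breaks when i is 2, so the function returns 2:

```rust
/// A runnable copy of `angrydome` that reports how many times the second
/// loop ran. `break` and `continue` keep their control-flow meaning even
/// when they are used in a position that expects a value.
#[allow(unreachable_code, unused_parens)]
fn angrydome_count() -> u32 {
    // `break` has type `!`, which coerces to `bool`, so it can be an `if` condition.
    loop { if break { } }

    let mut i = 0;
    loop {
        i += 1;
        if i == 1 {
            // `continue` has type `!`, so it can be matched against the
            // integer pattern `1`. Neither arm is ever reached.
            match (continue) { 1 => { }, _ => panic!("wat") }
        }
        break;
    }
    i
}

fn main() {
    println!("{}", angrydome_count());
}
```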
Let's talk about keywords. Rust has 3 kinds of keywords: reserved, strict, and weak. Reserved keywords are words that we might use in the future but that have no purpose yet. They're reserved so that no one uses them as names in their code, which would break once a later version of the Rust compiler gives them a meaning. We also have strict keywords, which cannot be used as the names of items, variables, function parameters, fields, variants, type parameters, lifetime parameters, loop labels, macros, attributes, macro placeholders, or crates. Words like loop, return, and fn fall into this category. Finally, we have weak keywords. These are only special in certain contexts, so they can be used in places where strict keywords can't. union, macro_rules, and 'static are the weak keywords. dyn was also a weak keyword in the 2015 edition, but it was promoted to a strict one in 2018.
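A small sketch of what "weak" means for union: in item position it declares a union, and everywhere else it's an ordinary identifier (the standard library's own HashSet::union method relies on this). The names Bits, union, and float_bits here are hypothetical:

```rust
use std::collections::HashSet;

/// In item position, `union` declares a union. Defining one is safe;
/// reading a field is `unsafe`.
#[repr(C)]
union Bits {
    int: u32,
    float: f32,
}

/// Anywhere else, `union` is just an identifier, so it can name a function...
fn union(a: &[u8], b: &[u8]) -> Vec<u8> {
    let set: HashSet<u8> = a.iter().chain(b).copied().collect();
    // ...or a local variable.
    let mut union: Vec<u8> = set.into_iter().collect();
    union.sort();
    union
}

fn float_bits(f: f32) -> u32 {
    let bits = Bits { float: f };
    // SAFETY: both fields are 4 bytes and every bit pattern is a valid u32.
    unsafe { bits.int }
}

fn main() {
    println!("{:?}", union(&[3, 1, 2], &[2, 4]));
    println!("{:#x}", float_bits(1.0));
}
```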
Now, with all of this in mind, what about primitive types in Rust like u8? Is u8 not a strict keyword? My editor highlights u8 just like other keywords, so surely it's a special word. It's not. It's just a built-in type name, which means it can be used as a name everywhere a strict keyword can't, and so we need to test exactly that, like so:
fn u8(u8: u8) {
if u8 != 0u8 {
assert_eq!(8u8, {
macro_rules! u8 {
(u8) => {
mod u8 {
pub fn u8<'u8: 'u8 + 'u8>(u8: &'u8 u8) -> &'u8 u8 {
"u8";
u8
}
}
};
}
u8!(u8);
let &u8: &u8 = u8::u8(&8u8);
::u8(0u8);
u8
});
}
}
This function is a bit hard to parse, so let me break it down for you. It first checks that its argument, a u8 named u8, is not equal to zero, where we explicitly specify that the zero is a u8 (0u8). We then assert that the value eight of type u8 is equal to the value of a block expression. That block defines a macro u8 that takes the literal token u8 and expands to a module u8 containing a function named u8. That function has a lifetime parameter 'u8 that must outlive the lifetimes 'u8 and 'u8 (that is, itself, twice), takes an argument u8 of type &'u8 u8, and returns a &'u8 u8. In the function body it evaluates the &'static str literal "u8" as a statement, throws it away, and returns the argument u8. Back in the block, we call the macro u8 with the argument u8 to generate that module. We then call the function u8 from the module u8 with &8u8 and destructure the returned &u8 with the reference pattern &u8, which binds a new variable named u8 with the value eight. Then we call the original function u8 recursively with the value 0u8, using the path ::u8 so that it refers to the top-level function rather than the local variable u8; that call fails the u8 != 0u8 check and returns immediately. Finally, the block returns the variable u8, which is eight because that's what u8::u8(&8u8) handed back, so the assertion checks that eight is equal to eight! See, not hard to understand, just hard to parse.
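For a tamer taste of the same rule, here's a sketch that compiles on stable Rust and uses primitive type names as a field name, a variable name, a function name, and a parameter name. Reading and the function i32 are hypothetical names; the trick works because functions and variables live in the value namespace while u8 and i32 as types live in the type namespace:

```rust
/// Primitive type names like `u8` and `i32` are not keywords, so they can be
/// used as ordinary names.
struct Reading {
    u8: u8, // a field named `u8` of type `u8`
}

fn i32(i32: i32) -> i32 {
    // Inside this body the parameter `i32` shadows the function `i32`;
    // `i32` in type position still means the primitive type.
    i32 * 2
}

fn main() {
    let u8: u8 = 21; // a variable named `u8`
    let reading = Reading { u8 }; // field init shorthand for `u8: u8`
    let doubled = i32(reading.u8 as i32);
    println!("{}", doubled);
}
```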
Conclusion
It's been a short yet dense journey that I've taken you on today. We've seen a lot, maybe too much for mortal eyes. I sincerely apologize for showing you even a fraction of the weird-exprs.rs test file, but I hope you see the necessary evil that this file is in order to have the language we have today. I hope you go forth, read the rest of the file, and write some weird code of your own. It's fun even if it isn't useful, and you can learn a thing or two about Rust you didn't even know was possible, especially now that you know about Weird Expressions and Where to Find Them. Thank you for your time.