The scheme-to-llvm compiler is a compiler from a small subset of Scheme to
LLVM's assembly form intermediate representation. The
compiler was original developed for an invited workshop at
2020 edition of the European Lisp Symposium:
ELS2020, and you can
find slides for the talk and a link to the presentation video. The compiler is
based on a student compiler written to use the nanopass framework, but
enhanced to use some of the techniques from Chez Scheme, which is also the
host compiler.
Expr -> <x>
| <imm>
| (quote <d>)
| (if <Expr> <Expr>)
| (if <Expr> <Expr> <Expr>)
| (and <Expr>* ...)
| (or <Expr>* ...)
| (not <Expr>)
| (set! <x> <Expr>)
| (begin <Expr>* ... <Expr>)
| (lambda (<x>* ...) <Expr>* ... <Expr>)
| (let ([<x>* <Expr>*] ...) <Expr>* ... <Expr>)
| (letrec ([<x>* <Expr>*] ...) <Expr>* ... <Expr>)
| (<Expr> <Expr>* ...)
| (<pr> <Expr>* ...)
Where x is a variable, imm is an immediate, d is a Scheme datum in our
target language, and pr is a primitive.
- An immediate is a signed 60-bit integer (fixnum), a boolean (
#tor#f), or null ('()). - Datum is a pair of datum, a vector of datum, or an immediate
- Variables are represented as Scheme symbols.
- Primitives are also represented as Scheme symbols, though they are limited to the following table:
| primitive | arity | argument types | return type |
|---|---|---|---|
+ |
2 | fixnum, fixnum | fixnum |
- |
2 | fixnum, fixnum | fixnum |
* |
2 | fixnum, fixnum | fixnum |
cons |
2 | any, any | pair |
pair? |
1 | any | boolean |
car |
1 | pair | any |
set-car! |
2 | pair, any | void |
cdr |
1 | pair | any |
set-cdr! |
2 | pair, any | void |
make-vector |
1 | fixnum | vector |
vector? |
1 | any | boolean |
vector-length |
1 | vector | fixnum |
vector-ref |
2 | vector, fixnum | any |
vector-set! |
3 | vector, fixnum, any | void |
void |
0 | void | |
< |
2 | fixnum, fixnum | boolean |
<= |
2 | fixnum, fixnum | boolean |
= |
2 | fixnum, fixnum | boolean |
>= |
2 | fixnum, fixnum | boolean |
> |
2 | fixnum, fixnum | boolean |
eq? |
2 | any, any | boolean |
boolean? |
1 | any | boolean |
fixnum? |
1 | any | boolean |
null? |
1 | any | boolean |
procedure? |
1 | any | boolean |
In order to run the compiler you need the following dependencies:
- LLVM w/Clang installed
- If you have LLVM/Clang 10 installed you can use the set the parameter
use-llvm-10-tailccto#tto use thetailcccalling convention added in LLVM 10
- If you have LLVM/Clang 10 installed you can use the set the parameter
- Chez Scheme: https://github.com/cisco/ChezScheme
- The nanopas framework: https://github.com/nanopass/nanopass-framework-scheme
With those installed, you can load compiler into Chez Scheme as follows
(assuming you are starting from the scheme-to-llvm directory).
$ scheme --libdirs <path-to-nanopass-framework>:src/main/scheme
Chez Scheme Version 9.5.3
Copyright 1984-2019 Cisco Systems, Inc.
> (import (c)) ;; import the example compiler
> (tiny-compile
'(letrec ([factorial (lambda (n)
(if (= n 0)
1
(* n (factorial (- n 1)))))])
(factorial 10)))
3628800
Note that under the covers the tiny-compile produces a file called t.ll,
uses clang to compile and link this LLVM IR code into an executable in t.
This application writes the result of the program to a temporary output file,
and tiny-compile uses the Chez Scheme reader to pull read the result.
There is also a set of tests included in the compiler (originally from the
student compiler), which you can run with test-all or analyze-all.
test-all will run all of the tests until one fails, while analyze-all will
run all of the tests and print . for successful tests, F for tests that
produced the incorrect results and E for tests that raised an exception.
Optionally test-all can also be passed a boolean to indicate if it should be
noisy, which when #t will print each test as it begins compiling and
running it.
Finally, passes can be traced with the traced-passes parameter, any pass that
is traced will print the output produced by the pass. traced-passes attempts
to be flexible in specifying the passes:
(traced-passes 'pass-name)will add'pass-nameto the list of traced passes if it is not in the list, or remove it from the list if it already is;(traced-passes '(pass-name1 pass-name2 ... pass-nameN))is equivalent to callingtrace-passeson each'pass-nameJ;(traced-passes '())or(traced-passes #f)clears all tracing;(traced-passes #t)traces all passes; and(traced-passes)returns the current list of passes.
You can see the full list of passes by referencing the variable all-passes
The source layout is very simple in this example compiler. All of the source
code is in the src/main/scheme directory. The entire compiler and tests are
in the file c.sls. The parse-scheme pass makes use of the s-expression
pattern matcher defined in match.sls, and various parts of the compiler make
use of some of the type and primitive definitions in d.sls. If you look at
d.sls you'll notice there are a number of data types and primitives not
supported in the current compiler. This was a very simple start at building a
more complete Scheme implementation, but is, as of yet, unimplemented.