TinyLISP is a very small implementation of LISP which was originally based on
the "eval/apply" loop in the superb Structure and Interpretation of Computer
Programs (Abelson, Sussman, Sussman). Per the name, the codebase of TinyLISP
is quite minimal (e.g., the only built-in logic operator is NAND), currently
under 1.5kloc. (The previous ceiling of 1kloc was breached by coroutine
support.) It also has very few external dependencies, for portability; the
entire external interface (as defined in the tl_interp structure) consists
only of a function to read a character, a function to put a character back to
be read again, and an output function similar to libc's printf.
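To make the three-function interface concrete, here is a minimal sketch in C of what an embedder might supply. The struct io_hooks, its fields readf, putbackf, and writef, and the string-backed implementation are all hypothetical; the real signatures are whatever tl_interp declares.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical hook table: names and signatures are assumptions,
   not the real tl_interp fields. */
typedef struct io_hooks {
	int (*readf)(void *ctx);                        /* next character, or EOF */
	void (*putbackf)(void *ctx, int c);             /* make c the next character read */
	int (*writef)(void *ctx, const char *fmt, ...); /* printf-like output */
	void *ctx;
} io_hooks;

/* One possible backing: read from a string, capture output in a buffer. */
typedef struct str_io {
	const char *in;
	size_t pos;
	int has_back, back;   /* one-character pushback slot */
	char out[256];
	size_t outlen;        /* never exceeds sizeof out - 1 */
} str_io;

static int str_read(void *ctx) {
	str_io *io = ctx;
	if (io->has_back) { io->has_back = 0; return io->back; }
	if (io->in[io->pos] == '\0') return EOF;
	return (unsigned char)io->in[io->pos++];
}

static void str_putback(void *ctx, int c) {
	str_io *io = ctx;
	io->has_back = 1;
	io->back = c;
}

static int str_write(void *ctx, const char *fmt, ...) {
	str_io *io = ctx;
	size_t room = sizeof io->out - io->outlen;  /* always at least 1 */
	va_list ap;
	int n;
	va_start(ap, fmt);
	n = vsnprintf(io->out + io->outlen, room, fmt, ap);
	va_end(ap);
	/* Advance past what was written, truncating when the buffer is full. */
	if (n > 0) io->outlen += (size_t)n < room ? (size_t)n : room - 1;
	return n;
}
```

Because, in this sketch, the interpreter would see nothing but the hook table, retargeting it to a string, a serial port, or a browser event queue only means supplying three functions.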
The implementation is written in standards-compliant C (ideally ANSI C, although some LLVM- and GNU-specific code is confined to architecture-specific areas), and should be portable to other compilers that adhere to these standards. The basic executable should be usable wherever a POSIX-compliant libc (threading not required) is available. Special configurations may introduce additional compilation requirements; see especially Small Systems below.
make help in the repository root prints out a plethora of build information.
It's more up-to-date than this README; read it carefully, and type make when
you're ready.
To run the interpreter, type make run in the repository root. Internally, this rule runs:
cat std.tl - | $(DEBUGGER) ./tl
...which loads the std.tl "standard" library to provide access to more common
LISP functionality. It is worth noting that the majority of TinyLISP's language
is implemented in itself, using its powerful metaprogramming capabilities;
refer to the language specification in the documentation.
Many variations are possible; run make help to get a full list of influential
variables and their documentation.
More recently, on UNIX, the above can also be written as
$(DEBUGGER) ./tl std.tl
but the former syntax works as well.
Scripts can thus be run as
./tl script.tl
and multiple scripts can be run sequentially by providing them in order, as in
./tl std.tl lib.tl script.tl
but note that the interpreter will expect more data from stdin after all
scripts run. If this is undesirable, the standard interpreter provides a
built-in tl-exit which, when applied to an int, exits the interpreter with
that "exit status". Meanings differ between platforms, but POSIX generally
holds that only the least significant 8 bits are meaningfully provided to a
waiting parent process; what is done with this is, again, platform dependent,
but a POSIX shell usually interprets a value of 0 as "success" or "true" and
any other value as "failure" or "false", as is useful in the interpretation of
conditional expressions.
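The 8-bit truncation is plain POSIX behavior and can be observed without TinyLISP at all. This sketch (the helper name observed_exit_status is made up for illustration) forks a child that exits with a given code and reports what the parent sees through waitpid; it assumes that tl-exit hands its integer to the host's exit more or less directly.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs a child that exits with `code` and returns the status the parent
   observes via waitpid, or -1 on failure. */
static int observed_exit_status(int code) {
	int wstatus;
	pid_t pid = fork();
	if (pid < 0) return -1;
	if (pid == 0) _exit(code);              /* child: exit immediately */
	if (waitpid(pid, &wstatus, 0) != pid) return -1;
	return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Only code & 0xFF survives, so an exit value of 256 looks like success (0) to a shell, and 300 arrives as 44.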
On POSIX systems, when running the interpreter using a non-TTY (according to
isatty) as input, the interpreter automatically suppresses prompt characters
(> ) and statements about the value read or any result of evaluation which is
merely "true" (tl-#t) in the REPL. (Most built-in functions that don't return
a semantically-meaningful value, such as tl-define, return tl-#t.) This is
intended to make interoperability with pipelines easier. If this behavior is
undesirable, the default interpreter has a built-in tl-quiet which accepts
the following integer values:
- 0 (QUIET_OFF): The default when run from a TTY, this shows a prompt (> ) before reading each expression, announces the expression as read (Read:), and prefixes the result of evaluation (Value:). It also prints Done. when end-of-file is encountered.
- 1 (QUIET_NO_PROMPT): This does not show the prompts (> , Read:, Value:, and Done.), but still displays the value read and the result of evaluation.
- 2 (QUIET_NO_TRUE): The default when run from a non-TTY, this suppresses printing the value read and printing the true value tl-#t. Other values are still printed.
- 3 (QUIET_NO_VALUE): This suppresses all printing of values.
tl-display continues to print out formatted values in all cases.
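The four levels can be read as a simple policy over three kinds of output. The following C sketch encodes that reading; the helper functions are hypothetical and not taken from main.c, and only the enumerator names and values come from the list above.

```c
#include <assert.h>
#include <unistd.h>

/* The four levels, numbered as in the list above. */
enum quiet { QUIET_OFF, QUIET_NO_PROMPT, QUIET_NO_TRUE, QUIET_NO_VALUE };

/* Default level, chosen from whether input is a terminal. */
static enum quiet default_quiet(int fd) {
	return isatty(fd) ? QUIET_OFF : QUIET_NO_TRUE;
}

/* "> ", "Read:", "Value:" and "Done." appear only at level 0. */
static int shows_prompts(enum quiet q) { return q == QUIET_OFF; }

/* The expression as read is echoed at levels 0 and 1. */
static int shows_read(enum quiet q) { return q <= QUIET_NO_PROMPT; }

/* Results are shown at levels 0 and 1; at level 2 only non-true results. */
static int shows_result(enum quiet q, int result_is_true) {
	if (q <= QUIET_NO_PROMPT) return 1;
	return q == QUIET_NO_TRUE && !result_is_true;
}
```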
Refer to main.c for a typical way to pass control to a running interpreter.
In general, the major interface will be the pair tl_eval_and_then to
initialize the evaluation of an expression, followed by tl_run_until_done to
crank the interpreter until the final continuation is called.
In environments where indefinite pauses are not permissible (such as hard real
time environments), or where concurrent interpreters may be interleaved, one
may also use tl_apply_next directly, which will do one apply operation (a
fundamental interpreter step) and return a status code:
- TL_RESULT_DONE (0): The interpreter has finished, and no more work remains to be done. (An error that reaches the C top level also ends evaluation this way, leaving the interpreter's error field non-NULL; see the discussion of the rescue stack below.)
- TL_RESULT_AGAIN (1): The interpreter has finished one operation, but tl_apply_next must be called again to make further progress.
- TL_RESULT_GETCHAR (2): The interpreter needs a character. In a synchronous environment, one can use tl_getc; in asynchronous environments, one must wait for the next event. In either case, the character received should be pushed to the value stack as a tl_int before the next call to tl_apply_next. (See the documentation on the Continuation and Value Stacks below.)
Note that the latency of other methods, such as C functions called directly or indirectly by TL code, cannot be controlled by TL.
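A driver that calls tl_apply_next directly usually bounds the work done per call and answers character requests as they arrive. Since the real tl_interp cannot be reproduced here, this sketch runs a hypothetical toy_interp whose toy_apply_next returns the same three status codes; the step budget and the yielding policy are illustrative assumptions, not TinyLISP's own driver.

```c
#include <assert.h>
#include <stddef.h>

/* Result codes mirroring those described above. */
enum { RESULT_DONE = 0, RESULT_AGAIN = 1, RESULT_GETCHAR = 2 };

/* Hypothetical stand-in for an interpreter: it needs `work` steps, asks
   for a character after every third step, and sums what it receives. */
typedef struct toy_interp {
	int work;
	int steps;
	int pending_char; /* 1 while a requested character is outstanding */
	long sum;
} toy_interp;

/* One quantum of work, analogous in spirit to tl_apply_next. */
static int toy_apply_next(toy_interp *in) {
	if (in->work == 0) return RESULT_DONE;
	in->work--;
	in->steps++;
	if (in->steps % 3 == 0) { in->pending_char = 1; return RESULT_GETCHAR; }
	return RESULT_AGAIN;
}

/* Delivering a character, analogous to pushing a tl_int value. */
static void toy_push_char(toy_interp *in, int c) {
	in->sum += c;
	in->pending_char = 0;
}

/* Runs at most `budget` iterations, feeding characters from *input.
   Returns RESULT_GETCHAR when input runs dry, so the caller can wait
   for more (or run other interpreters) and call again later. */
static int drive(toy_interp *in, const char **input, int budget) {
	int status = RESULT_AGAIN;
	while (budget-- > 0) {
		if (in->pending_char) {
			if (**input == '\0') return RESULT_GETCHAR;
			toy_push_char(in, (unsigned char)*(*input)++);
		}
		status = toy_apply_next(in);
		if (status == RESULT_DONE) break;
	}
	return status;
}
```

The budget bounds the time spent per call, which is the property hard real-time hosts need; it cannot, however, bound the time spent inside any single step.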
On ELF systems, TL supports INITSCRIPTS, embedded binary data that contains
programs that are "read from input" initially. This may be useful to set up a
standard environment in embedded applications. See the Makefile for
further details.
TinyLISP can be augmented with loadable modules, provided a suitable dynamic
linker interface exists. The default on Unix-like systems is to assume dlopen
and dlsym are functional. See main.c for the responsibilities of
the loading function. Modules may also be statically linked, as is the default
on small systems (see below). Statically linked modules are not automatically
loaded by the interpreter interface; they have to be discovered in the link
process. See, again, main.c, which does this after initializing the
interpreter. In general, downstreams may call statically initialized modules in
any way, at any time.
Modules, in mod, are not counted toward TinyLISP's line count. They do,
however, show how one could implement additional functionality, such as more
general transput and memory-handling capabilities. More modules may be added
at any time, and these are not necessarily subject to the same
scrutiny as TinyLISP's own code. Cautious downstreams may wish to set MODULES
in the build process to a space-separated list of vetted modules, or an empty
string.
TinyLISP has an embedded "minilibc", which contains exactly enough libc to allow TL to run, based on a small set of interface functions defined in arch.h: file operations (read, write, flush), memory operations (get heap, release heap), and halt. This interface is intentionally simple to implement: in particular, heaps can be large and are suballocated, and no assumptions are made about file descriptors other than 0, 1, or 2 (standard in, out, and error).
To use it, pass USE_MINILIBC=1 to a make invocation, optionally setting
MINILIBC_ARCH to one of the choices below.
- linsys: Uses Linux x64 system calls directly. Aside from testing purposes, the resulting binary has no dynamic dependencies; it is "statically linked".
- wasm: Compiles to WASM. This feature usually requires CC=clang. The resulting module has a few extra symbols intended to help embedding. Additionally, no bit packing is done. All the interface functions are imported, but fgetc is generally advised against on JS platforms that don't support blocking (such as the Web APIs), so tl_apply_next should be used directly instead. An example implementation is in the wasm directory, which can be used if this repository is served from a standard HTTP server.
- rv32im: A RISC-V RV32IM image is built. This is designed to be hosted in the RISC-V SoC in Logisim-evolution. A basic memory map is:
  - 0x000_000 - 0x100_000: ROM, text sections and rodata loaded by the ELF;
  - 0x100_000 - 0x300_000: a "JTAG"-like interface;
  - 0x300_000 - (0x300_000 + MEM_SIZE (default 0x100_000)): RAM.

  These are the defaults, but can be overridden by adding to DEFINES. See rv32im.c for details.

Other implementations are welcome to be added!
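To give a sense of how small such a port can be, here is a sketch of the operations listed above for a hosted POSIX system. The arch_* names and signatures are hypothetical (arch.h defines the real ones); the heap is obtained with mmap so that an allocator like minilibc's malloc could suballocate it.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict C modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* File operations: only descriptors 0, 1 and 2 are assumed to exist. */
static long arch_read(int fd, void *buf, size_t n) { return (long)read(fd, buf, n); }
static long arch_write(int fd, const void *buf, size_t n) { return (long)write(fd, buf, n); }
static int arch_flush(int fd) { (void)fd; return 0; } /* write(2) is unbuffered */

/* Memory operations: hand out one large region for suballocation. */
static void *arch_get_heap(size_t size) {
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
	               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
static void arch_release_heap(void *p, size_t size) { munmap(p, size); }

/* Halt: stop the whole program with a status. */
static void arch_halt(int status) { _exit(status); }
```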
minilibc is not designed to be performant nor standards-compliant, and is not
counted toward TinyLISP's own line counts. However, it is implemented in much
the same brevity as TinyLISP, and thus can serve as a didactic example of libc
internals. In particular, it defines many printf variants and a functional
malloc based on the K&R malloc, but with adjacent region merging.
tl_read in read.c is the lexical analyzer. In short, a lexeme may be:
- " (double quote) bounds a symbol, which may contain any character other than ";
- () (parentheses) bound a proper list of lexemes;
- ... except that a . DATUM ) at the end of a list creates an improper list; in particular, (A . B) is a standard cons pair. Nowhere else is . special, and it is treated as a symbol in those places.
- all digits introduce decimal numbers;
- whitespace terminates lexemes, including symbols, and is skipped;
- ; and everything following it until newline is ignored (as comments);
- every other character represents itself within a symbol.
There is one caveat: at runtime, a TinyLISP program may introduce "prefixes"
using tl-prefix. A prefix character causes the expression that follows it to
be read as a two-item proper list, whose head is the specified symbol or
expression and whose second item is the entire prefixed expression. This
mechanism is used to implement ' as quote, for example. In the current
iteration, prefixes may only be one character.
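A hand-written lexer for the rules above fits in a few dozen lines. The sketch below is not read.c: the token type, the fixed 64-byte symbol buffer, and the assumption that parentheses, " and ; also end a bare symbol are choices made for illustration, and runtime prefixes are not handled. Note that . comes out as an ordinary symbol, leaving improper-list handling to the parser.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum tok_kind { TOK_END, TOK_LPAREN, TOK_RPAREN, TOK_NUMBER, TOK_SYMBOL };

typedef struct tok {
	enum tok_kind kind;
	long num;     /* value, for TOK_NUMBER */
	char sym[64]; /* text, for TOK_SYMBOL (truncated if longer) */
} tok;

/* Reads one lexeme from *p and advances *p past it. */
static void next_tok(const char **p, tok *t) {
	const char *s = *p;
	size_t n = 0;
	for (;;) { /* skip whitespace and ;-comments */
		while (isspace((unsigned char)*s)) s++;
		if (*s != ';') break;
		while (*s && *s != '\n') s++;
	}
	if (*s == '\0') {
		t->kind = TOK_END;
	} else if (*s == '(' || *s == ')') {
		t->kind = *s == '(' ? TOK_LPAREN : TOK_RPAREN;
		s++;
	} else if (isdigit((unsigned char)*s)) { /* decimal number */
		t->kind = TOK_NUMBER;
		t->num = 0;
		while (isdigit((unsigned char)*s)) t->num = t->num * 10 + (*s++ - '0');
	} else if (*s == '"') { /* quoted symbol: anything up to the next " */
		t->kind = TOK_SYMBOL;
		s++;
		while (*s && *s != '"') { if (n < sizeof t->sym - 1) t->sym[n++] = *s; s++; }
		if (*s == '"') s++;
		t->sym[n] = '\0';
	} else { /* bare symbol (assumed to end at whitespace, parens, " or ;) */
		t->kind = TOK_SYMBOL;
		while (*s && !isspace((unsigned char)*s) && !strchr("()\";", *s)) {
			if (n < sizeof t->sym - 1) t->sym[n++] = *s;
			s++;
		}
		t->sym[n] = '\0';
	}
	*p = s;
}
```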
Evaluation rules are as follows:
- Numbers, as well as all forms of cfunction or LISP function, evaluate to themselves (are "self-evaluating");
- Symbols evaluate to their "bound" value in the current environment (see below);
- Lists represent function application, with the head assumed to be callable, and the tail as arguments.
For example, the expression:
(+ 3 5)
...evaluates:
- The application, which pushes a continuation as it must evaluate:
  - +, a symbol, which evaluates to its binding, tl_cfbv_add, a by-value cfunction callable, which invokes immediate evaluation of the arguments:
    - 3, which self-evaluates to 3,
    - 5, which self-evaluates to 5,
- ...once the callable (and all arguments) have evaluated, the function (tl_cfbv_add) is applied with the argument list (3 5), producing the value 8.
As another carefully selected example, consider:
(define foo #t)
...which evaluates:
- The application, which pushes a continuation as it evaluates:
  - define, which evaluates to its binding, tl_cf_define, a non-by-value cfunction; therefore, the syntax of the arguments is captured,
- the function itself (tl_cf_define) is applied to the arguments (foo #t), within which the second argument is pushed for evaluation:
  - #t evaluates to tl-#t within the interpreter;
- ...and the continuation for tl_cf_define binds this value tl-#t to the symbol foo within the current environment, and evaluates to tl-#t.
These examples demonstrate that TL has two modes of evaluation: strict
(call-by-value, or direct) evaluation, as with + above, and non-strict
(call-by-name, or syntactic) evaluation, as with define. Additionally, TL
can mix them freely; the result of define can be evaluated eagerly within
another function, or be deferred within another syntactic invocation. This
dichotomy is available to language users, too, as the difference between
lambda and macro, respectively. It is the power of this dichotomy that
allows LISP to effortlessly intermingle data and code; for a complete example,
see the definition of quasiquote in std.tl.
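C has no syntax capture, but the strict/non-strict split can be approximated with thunks (a function pointer plus its environment) standing in for unevaluated syntax. This analogy, and every name in it, is an illustration rather than how TL represents syntax; it counts how many argument computations actually run under each discipline.

```c
#include <assert.h>

/* Stand-in for "syntax": a computation that has not run yet. */
typedef struct thunk { int (*run)(void *env); void *env; } thunk;

static int force(thunk t) { return t.run(t.env); }

/* Strict (like + or a lambda): arguments arrive already evaluated. */
static int add_strict(int a, int b) { return a + b; }
static int if_strict(int cond, int then_, int else_) { return cond ? then_ : else_; }

/* Non-strict (like define or a macro): arguments arrive unevaluated,
   so the callee decides what to evaluate; here, only one branch. */
static int if_lazy(thunk cond, thunk then_, thunk else_) {
	return force(cond) ? force(then_) : force(else_);
}

static int evaluations; /* how many thunks actually ran */
static int k_one(void *env) { (void)env; evaluations++; return 1; }
static int k_boom(void *env) { (void)env; evaluations++; return -1; }
```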
eval is the arrow between syntax and value. The internal implementation is
tl-eval-in&, or tl_eval_and_then, which captures a syntax and evaluates it
to a value (eventually--so long as the continuation is actually reached, for
example), which it then passes to its continuation. Note that TL must maintain
an in-code distinction between syntactic and direct values; for example:
(call/cc (lambda (cc) cc))
... successfully binds a continuation object to cc in the scope of the
lambda user function. However:
(call/cc (macro (cc) env cc))
... will attempt to call the user macro with a continuation as an argument;
as described below, continuations are values that do not have a
self-evaluating syntax, since they capture the delicate internal state of the
interpreter. Because macros expect a syntax to capture, the interpreter
throws an error ("invoke macro/cfunc with non-syntactic arg") and abandons
the evaluation.
The general rule for language users is as follows: if you expect a real value,
use lambda; if you expect syntax, use macro. For example, 2, (begin 2),
((lambda () 2)), and (+ 1 1) are several different syntaxes that all result
in the same direct value (2). In the mixed case where some arguments need to
be syntactic (using define as a prototypal example), use a macro to capture
syntax, and selectively use tl-eval-in& or variants to convert some syntax to
values.
TL macros are more general than some other implementations of "macros" that are only allowed to rewrite syntax. A common idiom for implementing those macros in TL is:
(define example (macro (arg) env (tl-eval-in env `( ... ))))
where ... is the syntactic translation, using the appropriate quasiquote
macros. But, of course, macros are not so restricted--they can implement
arbitrary functionality, including invoking other macros. This can be used for
advanced features, such as lazy evaluation or domain specific languages (DSLs).
quasiquote is, in fact, implemented as a macro that processes its argument as
a DSL, and so symbols such as @ and , have no special meaning outside of
it.
In short, the TL evaluator is a "Call by Push Value" (CBPV) interpreter, and can thus implement many different modalities of lambda calculus evaluation.
TinyLISP is not a total language; it is possible for computations to not only
run indefinitely, but also to be abandoned. While there are many ways to do
this (including invoking a continuation, as described below), likely the most
mundane is the generation of an error. Various built-in functions will raise
errors if they are inappropriately applied (to the wrong types of arguments, to
invalid numbers of arguments, or--as above--to direct values when they are
expecting syntax, and so forth). User code can invoke the built-in function
tl-error to raise an error directly.
TinyLISP has a so-called "rescue stack", which is initially empty. When it is
empty, an error will propagate directly to the C top-level, wherein
tl_apply_next returns TL_RESULT_DONE (which ends any running
tl_run_until_done call), and the interpreter's error field will be
non-NULL. (This means the empty list, which is incidentally represented as
NULL, is never a valid error value.) An interpreter in error state will not
resume until error is cleared; usually, the entire computation is abandoned
with tl_interp_reset before beginning a new evaluation. However, it is
possible to restart the same computation, on the understanding that the values
may not be sensible due to the error condition; most built-in functions will
return tl-#f (the standard "false" symbol), which may cause further errors.
User code can use tl-rescue, which expects a callable argument that takes no
parameters, to push a continuation onto the rescue stack. When this stack isn't
empty and an error is generated, the error object is passed to the topmost
continuation--in effect, the program behaves as if tl-rescue returned the
error object. If no error is generated, tl-rescue evaluates to the result of
evaluating the callable. Note that there is no guaranteed way to distinguish
successful results from error objects unless the program arranges it: it may
be necessary to structure the callable so that its successful results cannot
be mistaken for an error.
C programs may push continuations (or a TL_THEN C continuation) onto the
rescue stack as well, provided they are careful to balance each one with an
appropriate TL_APPLY_DROP_RESCUE special pushed onto the continuation stack
simultaneously.
This permits C code to react to errors without having them propagate to the
top-level tl_run_until_done loop if so desired. See the comments in eval.c
and relevant definitions in tinylisp.h for more details.
While this mechanism can be used for non-local stack-based return like C's
setjmp/longjmp, it is internally built on continuations, which are a far
more powerful non-local control flow mechanism; as such, most TL users are
expected to use them directly instead; see Continuations, below. Nevertheless,
errors are the correct choice to represent divergent computations, usually due
to incorrectness.
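For C programmers, setjmp/longjmp gives a rough analogy of the rescue stack's push/drop discipline, even though TL builds the real mechanism on continuations. In this sketch, rescue, raise_error, and the fixed-size handler stack are hypothetical; as in TL, a successful result and an error object have the same type here, which is exactly the ambiguity noted above.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

#define MAX_RESCUE 16
static jmp_buf rescue_stack[MAX_RESCUE];
static int rescue_top;          /* number of live handlers */
static const char *last_error;  /* the "error object" */

static void raise_error(const char *err) {
	last_error = err;
	if (rescue_top > 0)
		longjmp(rescue_stack[--rescue_top], 1); /* pop and resume the handler */
	/* Empty stack: TL would stop and leave the error in the interpreter's
	   error field; this sketch merely records it and returns. */
}

typedef const char *(*body_fn)(void);

/* Like tl-rescue: returns body's result, or the error object if it raised. */
static const char *rescue(body_fn body) {
	jmp_buf *slot;
	const char *result;
	assert(rescue_top < MAX_RESCUE);
	slot = &rescue_stack[rescue_top++];     /* push a handler */
	if (setjmp(*slot) != 0)
		return last_error;                  /* raise_error already popped it */
	result = body();
	rescue_top--;                           /* success: drop the handler */
	return result;
}

static const char *ok_body(void) { return "ok"; }
static const char *failing_body(void) { raise_error("oops"); return "unreachable"; }
static const char *nested_body(void) { return rescue(failing_body); }
```

The decrement on the success path plays the role that TL_APPLY_DROP_RESCUE plays on the continuation stack: without it, a later error would jump into a handler whose frame no longer exists.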
User functions (both lambda and macro) are defined with a parameter
specification as their first argument. An argument list, as from the
application of a user function, is applied to the parameter specification
recursively as follows:
- If the current parameter is a pair, the first argument value is bound to the first symbol (see Environments, below), and this process recurs with the remainder of the parameters and arguments;
- If the current parameter is a symbol, all remaining arguments are bound to that symbol.
It is an error if the first case is reached and either the current parameter or current argument is an empty list; such errors are "arity" violations. (C functions may emit arity violations, although such checks are built-in.) Thus, the length of the parameter list, or whether it is a list, determines the arity of the function.
For example: (lambda (a b c) ...) defines a function with arity exactly 3; it
will fail when applied to fewer or more than 3 argument values. (lambda (a b . c) ...) defines a function with arity at least 2, where symbol c evaluates
to a (possibly empty) list of arguments beyond b. (lambda a ...) defines a
function with arbitrary arity, where all arguments are assigned to the symbol
a. Without loss of generality, this applies to macros as well.
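The binding procedure is a short loop over the parameter specification. This C sketch uses a minimal, hypothetical cons-cell type in place of TinyLISP's objects and prepends (symbol . value) bindings to a frame; it returns -1 for the two arity violations described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Minimal cons cells (not tl_object); NULL is the empty list. */
typedef struct obj {
	enum { SYM, INT, PAIR } kind;
	const char *sym;
	long num;
	struct obj *car, *cdr;
} obj;

static obj *mk(int kind) { obj *o = calloc(1, sizeof *o); o->kind = kind; return o; }
static obj *sym(const char *s) { obj *o = mk(SYM); o->sym = s; return o; }
static obj *num(long n) { obj *o = mk(INT); o->num = n; return o; }
static obj *cons(obj *a, obj *d) { obj *o = mk(PAIR); o->car = a; o->cdr = d; return o; }

/* Binds args to params, prepending (symbol . value) pairs to *frame.
   Returns 0 on success, -1 on an arity violation. */
static int bind_params(obj *params, obj *args, obj **frame) {
	while (params && params->kind == PAIR) {
		if (!args) return -1;                 /* too few arguments */
		*frame = cons(cons(params->car, args->car), *frame);
		params = params->cdr;
		args = args->cdr;
	}
	if (params)                               /* a symbol: takes all the rest */
		*frame = cons(cons(params, args), *frame);
	else if (args)
		return -1;                            /* too many arguments */
	return 0;
}

/* Returns the (symbol . value) binding for name, or NULL. */
static obj *find(obj *frame, const char *name) {
	for (; frame; frame = frame->cdr)
		if (strcmp(frame->car->car->sym, name) == 0) return frame->car;
	return NULL;
}
```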
A TL environment is a proper list of frames; a frame is a proper list of bindings; and a binding is a pair of a symbol and an object. Symbols, when evaluated, are looked up in the current environment. Lookup proceeds in frame order, which means bindings in earlier frames shadow bindings in later frames.
The built-in tl-define creates or updates a binding in the first frame only.
The built-in tl-set! will update a binding in any frame, or create it in
the first frame if it does not exist. While tl-set! permits greater
flexibility, tl-define is more performant, and the use of tl-define
encourages further performance gains due to data locality.
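The difference between tl-define and tl-set! comes down to how far the search for an existing binding reaches. In the C sketch below, environments are plain linked structures with integer values (an assumption for brevity; TL uses lists of pairs), and define and set are hypothetical stand-ins for the built-ins.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct binding { const char *name; int value; struct binding *next; } binding;
typedef struct frame { binding *bindings; struct frame *next; } frame;

static binding *find_in(frame *f, const char *name) {
	binding *b;
	for (b = f->bindings; b; b = b->next)
		if (strcmp(b->name, name) == 0) return b;
	return NULL;
}

static void prepend(frame *f, const char *name, int value) {
	binding *b = malloc(sizeof *b);
	b->name = name;
	b->value = value;
	b->next = f->bindings;
	f->bindings = b;
}

/* Lookup proceeds in frame order, so earlier frames shadow later ones. */
static binding *lookup(frame *env, const char *name) {
	binding *b;
	for (; env; env = env->next)
		if ((b = find_in(env, name)) != NULL) return b;
	return NULL;
}

/* Like tl-define: only the first frame is searched or extended. */
static void define(frame *env, const char *name, int value) {
	binding *b = find_in(env, name);
	if (b) b->value = value; else prepend(env, name, value);
}

/* Like tl-set!: update the nearest binding in any frame, else add one
   to the first frame. */
static void set(frame *env, const char *name, int value) {
	binding *b = lookup(env, name);
	if (b) b->value = value; else prepend(env, name, value);
}
```

The extra walk through every frame is where tl-set! pays for its flexibility.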
The current environment can be accessed with the built-in tl-env function
without arguments. When the interpreter begins, this environment has only one
frame; that environment can be retrieved as the value of the tl-top-env
function.
User-defined functions (lambda or macro) store the environment at the time
of their definition. When applied, a new frame is prepended to the stored
environment. In the process of evaluation of a user function:
- This new frame is set up with the formal parameters bound to the argument
values (themselves evaluated to direct values if the function is a
lambda) according to the General Parameters process above; and
- the user code is pushed onto the continuation stack, tail-first, in the new environment.
This mechanism allows users to create new scopes dynamically. It will be seen in Continuations, below, how this can be used to create encapsulation.
A user function, as well as a continuation, can be passed as the sole argument
to tl-env; this returns the stored environment of the function or
continuation. tl-set-env!, given a function (or continuation) and an
environment, will set the object's stored environment. The stored environment
can be passed to the built-in tl-eval-in& to evaluate an expression in that
environment, effectively allowing one to operate within the environment without
invoking or resuming the function or continuation.
Macros, when defined, receive an additional symbol after their parameter
specification, and before their body, called the "environment name". This
symbol is bound to the environment in which the macro was applied. This value
is suitable for use with tl-eval-in& to evaluate code in the scope of the
macro's invocation.
TL has first-class support for continuations, which are values that represent
a suspended computation. The core function is
tl-call-with-current-continuation (which std.tl binds to the briefer
call/cc), which invokes its argument with a continuation representing the
state of the interpreter as of entering call/cc. Typical usage looks as
follows:
(call/cc (lambda (k) ...))
This expression evaluates, absent any syntax which calls k, to the result of
calling the lambda, which is usually the tail position of .... However, if
k is called, such as via (k 7), evaluation resumes as if this particular
call/cc had evaluated to 7 instead. Arguments to continuations, like those
to a lambda, are evaluated in the current environment before control is
passed.
Within the lambda above, this can be interpreted as implementing "early
return"; for example:
(call/cc (lambda (return)
...
(if not-worth-continuing
(return #f)
...
)
))
In this example, the second ... is evaluated if and only if the computation
hasn't been "abandoned" by a true value of not-worth-continuing causing the
invocation of return. This trick is used to convert the built-in
tl-eval-in& to the more straightforward tl-eval-in often used in macros:
(define tl-eval-in
(lambda (env ex)
(call/cc (lambda (ret) (tl-eval-in& env ex ret)))))
The ret continuation returns to the call/cc, which is in the tail position
of tl-eval-in.
It's worth noting that k, unlike C's setjmp/longjmp, can outlive the
called function. Indeed,
(define current-continuation
(lambda () (call/cc (lambda (cc) cc))))
returns the continuation entered by calling this user function. (The real
definition in std.tl is a macro to avoid issues with scoping, but has largely
equivalent behavior.) Used carefully, this continuation can be used as a way to
pass "messages" back into the continuation's scope, not unlike Smalltalk:
(define make-object (lambda (state)
(set! message (current-continuation))
(cond
((= (tl-type message) 'cont) message)
((= (cadr message) 'foo) ((car message) ...))
((= (cadr message) 'bar) ((car message) ...))
)
))
The frame that holds state and message as local values is fully held by the
returned continuation; this means that the ... can use it as mutable state,
where it can be used, for example, to implement private member variables. To be
sure that this continuation escapes, the first cond branch uses the internal
tl-type function to recognize whether the value is a continuation, and simply
returns it if it is. Otherwise, cond dispatches to "methods", recognized by
the second item of the message (the first item being the return
continuation); this idiom can be used as follows:
(define send-message (lambda (obj . args)
(call/cc (lambda (ret) (obj (cons ret args))))
))
(define object (make-object 3))
(send-message object 'foo)
(send-message object 'bar 1 2)
(send-message object 'baz object (+ 5 6))
In order to avoid escaping back to (define object ...), the methods pass a
"return continuation" as the first element of a list; this is a common pattern
allowing for cooperative coroutines, and moreover allows one to define a
suspendable computation graph independent of the call stack (permitting
scheduling and yielding, a la "async/await" in other languages). Note that the
number of times a continuation can be invoked is arbitrary.
Internally, a continuation is a triple of the continuation stack, the value
stack, and the environment. These three objects are, together, considered to be
"the state" of the interpreter at any given time.
tl-call-with-current-continuation captures its invocation state, and then
pushes this object as the sole (direct) value applied to its argument.
Resumption is implemented as a special case in tl_apply_next, the
interpreter's "next state" quantum, where it restores the state from the
continuation and pushes its sole expected argument. The value can be direct or
syntactic; if it is syntactic, it is evaluated in a substate (pushed on the
continuation stack).
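Because the three components are stacks built from lists, capturing them can be as cheap as copying three pointers, provided pushes never mutate existing cells. This C sketch assumes exactly that; the cell, state, capture, and resume names are hypothetical, and TL's own representation may differ in detail. It also shows the same continuation being resumed twice.

```c
#include <assert.h>
#include <stdlib.h>

/* Immutable stack cell: pushing allocates, existing cells never change. */
typedef struct cell { int v; const struct cell *next; } cell;

static const cell *push(const cell *s, int v) {
	cell *c = malloc(sizeof *c);
	c->v = v;
	c->next = s;
	return c;
}

/* "The state": continuation stack, value stack and environment. */
typedef struct state { const cell *conts, *values, *env; } state;

/* Capturing copies three pointers; the snapshot stays valid however the
   live state evolves, because shared cells are never mutated. */
typedef state continuation;
static continuation capture(const state *s) { return *s; }

/* Resuming restores the snapshot and pushes the single argument. */
static void resume(state *s, continuation k, int arg) {
	*s = k;
	s->values = push(s->values, arg);
}
```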
In fitting with the Call-by-Push-Value mode, TL interpreters maintain two
stacks, conts (continuations) and values. The value stack is, at all
times, a proper list (whose top is the car) of pairs of objects and a
"syntax" flag--this is tl-#t if the value is syntactic, and tl-#f if the
value is direct. (The interpreter has two fields, true_ and false_, that
cache these symbols, so comparison is fast.) The continuation stack is more
complicated, and represents, more or less, a call stack for the running
computation; it, too, is a proper list used as a stack, whose elements are
improper lists of the form (len expr . env). len may take some special
values, all of which are negative:
- TL_APPLY_PUSH_EVAL (-1): expr is a syntax, and env is the environment in which it shall be evaluated. Usually, this syntax is an application, which is handled by tl_push_eval by "indirecting" into a more primitive application on the continuation stack, as described below.
- TL_APPLY_INDIRECT (-2): what would ordinarily be expr is taken from the value stack; it must be a direct callable value. The expr position of the entry contains the len to be used in the application. This is pushed when a complex expression needs to evaluate its callable before it can apply it.
- TL_APPLY_DROP_EVAL (-3): as TL_APPLY_PUSH_EVAL, but the value is silently discarded. This is the typical case for non-tail-position user code in a user function.
- TL_APPLY_DROP (-4): ignores expr and env, and simply drops the top of the value stack. This is emitted in a pair with TL_APPLY_INDIRECT whenever a TL_APPLY_DROP_EVAL must be indirected.
- TL_APPLY_DROP_RESCUE (-5): ignores expr and env, and simply drops the top of the rescue stack. This is pushed simultaneously with the new rescue continuation, and serves to remove it if the computation succeeds.
- TL_APPLY_GETCHAR (-6): stops tl_apply_next, returning TL_RESULT_GETCHAR. This is a signal to the top level (usually tl_run_until_done, but also any other driver) that more input is needed. Allowing the call to tl_apply_next to end allows a driver to asynchronously restart the computation when more input is available.
- Otherwise, the entry is a "basic application", where expr is a callable object, env is the environment, and len is the number of values it expects; len values are popped from the value stack, pushing evaluations from right to left (such that evaluation order is left to right) if the callable expects direct arguments (a cfunc_byval or a user lambda), or raising an error if the callable expects syntax arguments (a cfunc, then, or macro) and any argument is a direct value. If collection succeeds, these arguments are then passed to the appropriate C function, or to user code as the "argument list" described in General Parameters above.
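Argument collection for a basic application can be sketched as follows. The value type pairs an opaque object with a syntax flag, as on TL's value stack; everything else, including the assumption that the top of the stack holds the last argument (which matches the right-to-left pushing described above) and the negative error codes, is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* An entry on the value stack: an object plus its syntax flag. */
typedef struct value { int obj; int syntactic; } value;

/* Lambda/cfunc_byval versus macro/cfunc/then. */
enum callable_kind { CALL_BYVAL, CALL_SYNTAX };

/* Pops len entries (top = stack[*top - 1]) into args[0..len-1] in
   argument order. Returns how many entries are still syntactic (for a
   by-value callable, these would need evaluating first), -1 if fewer
   than len values are available, or -2 if a syntax-expecting callable
   would receive a direct value (the stack is left untouched on error). */
static int collect_args(const value *stack, size_t *top, size_t len,
                        enum callable_kind kind, value *args) {
	size_t i;
	int syntactic = 0;
	if (*top < len) return -1;
	for (i = 0; i < len; i++) {
		value v = stack[*top - 1 - i];
		if (v.syntactic) syntactic++;
		else if (kind == CALL_SYNTAX) return -2;
		args[len - 1 - i] = v;
	}
	*top -= len;
	return syntactic;
}
```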
The usual way to evaluate a TL expression from C, then, is to synthesize a one-argument application:
/* At file scope: called with args = a one-element list. */
void _c_continuation_k(tl_interp *in, tl_object *args, tl_object *state) { ... }

/* Elsewhere, with an interpreter, state, and expression in hand: */
tl_interp *in = ...;
tl_object *state = ...;
tl_object *expr = ...;
tl_push_apply(in, 1, tl_new_then(in, _c_continuation_k, state, "_c_continuation_k"), in->env);
tl_push_eval(in, expr, in->env);
In this way, the continuation _c_continuation_k eventually receives a proper
list of length one, whose car is the direct value of the evaluated expr.
The interpreter environment in->env is assumed to be in a valid state; in a
REPL, this is usually in->top_env. The string passed to tl_new_then is a
debugging aid. state may be TL_EMPTY_LIST (or, equivalently, NULL) if the
communication of state is not needed.
The idea of pushing applications and evaluable expressions simultaneously
generalizes, of course, to many different applications, many different kinds of
callables, and many different expressions in many different environments, so
long as the stack discipline is observed. In particular, the sum of the len
of all applications should equal the number of value pushes done.
It is also possible to spend one redirection to pair an application and its values directly:
tl_push_apply(in, 1, tl_new_then(in, _c_continuation_k, state, "_c_continuation_k"), in->env);
tl_push_apply(in, TL_APPLY_PUSH_EVAL, expr, in->env);
...though more care must be taken to sequence these correctly, as now the
evaluations will be sequenced with respect to the continuation stack; in
particular, the last TL_APPLY_PUSH_EVAL will become the first argument. In
essence, this device dynamically arises in the evaluation of user functions,
which appear in place of the tl_new_then call above.
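The ordering rule is easiest to see in a toy model of the two stacks. In this C sketch (hypothetical names; single-character literals stand in for expressions), an application expecting two values is pushed first and two PUSH_EVAL entries after it; the one pushed last runs first and becomes the first argument, and the two value pushes exactly balance the application's len.

```c
#include <assert.h>
#include <stddef.h>

#define PUSH_EVAL (-1)

/* A continuation entry: PUSH_EVAL of a literal, or an application of len values. */
typedef struct entry { int len; char lit; } entry;

typedef struct machine {
	entry conts[16]; size_t nconts;   /* continuation stack; top is last */
	char values[16]; size_t nvalues;  /* value stack; top is last */
	char applied[16]; size_t nargs;   /* arguments seen by the last application */
} machine;

static void push(machine *m, int len, char lit) {
	m->conts[m->nconts].len = len;
	m->conts[m->nconts].lit = lit;
	m->nconts++;
}

/* Runs until the continuation stack is empty. */
static void run(machine *m) {
	while (m->nconts > 0) {
		entry e = m->conts[--m->nconts];
		if (e.len == PUSH_EVAL) {
			m->values[m->nvalues++] = e.lit;     /* a literal evaluates to itself */
		} else {
			size_t i, n = (size_t)e.len;
			assert(m->nvalues >= n);             /* stack discipline */
			m->nvalues -= n;
			for (i = 0; i < n; i++)              /* deepest value = first argument */
				m->applied[i] = m->values[m->nvalues + i];
			m->nargs = n;
		}
	}
}
```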
TinyLISP is distributed under the terms of the Apache 2.0 License. See
COPYING for more information.
Please feel free to leave issues on this repository, or submit pull requests. I will make every effort to respond to them in a timely fashion.
Contributions are welcome--the most wieldy being pull requests, but I can negotiate other methods of code delivery. I retain the right to curate the code in this repository, but you are within your licensed right to make forks for your own purposes. Nevertheless, contributing back upstream is welcome!
Here are some easy things to look at:
- Documentation improvements and clean-up;
- Code cleanup (especially anything that makes it brief without being inscrutable);
- Finding bugs (they are inevitable, after all);
- Improving workflows (tell me what's hard to accomplish).