-
Notifications
You must be signed in to change notification settings - Fork 14
The Corset Language
Corset offers a development environment for the writing of constraints compilable to an AIR backend. Therefore, the compilation target are polynomials on the underlying field; the constraint system is said to be satisfied when they equal zero at all of their evaluation points, as defined in the traces.
The Corset language provides a high-level mean to develop complex constraint systems targeting this scheme.
A constraint system, as defined by Corset, requires the following interacting pieces:
- one or more modules;
- each containing several columns, or commitments – similar to variables & inputs in traditional programming;
- and several constraints on these columns.
The numeric values of the columns will then be provided in traces, that may satisfies the constraint system iff there are no incoherences between their numeric values and the defined constraints.
Here is a toy constraint system enforcing that the table-of-3 module is actually computing the multiplicative table of 3, from 0 to 10:
(module table-of-3) ;; define a module
(defcolumns ;; specifies that our traces will contain two columns, argument and result
argument
result)
(defconst ;; we want to prove the multiplication table from 0 to 10
START 0
END 10)
(defconstraint check-multiplications () (eq! result (* 3 argument))) ;; each result is equal to 3×argument
(defconstraint first-argument (:domain {0}) (eq! argument START)) ;; argument[0] == 0
(defconstraint last-argument (:domain {-1}) (eq! argument END)) ;; argument[:last] == 10
(defconstaint argument-increase () (will-inc! argument 1)) ;; argument[i+1] = argument[i] + 1This module will only be validated by a trace whose argument column sweep through the [0; 10] range, and whose result column values are everywhere equal to thrice the matching value in the argument column.
Identifiers in Corset are case-sensitive and may be made of any succession of unicode symbols. The only restrictions are:
- they must not start with a number;
- they can not contain a colon (
:).
Usual Lisp family conventions advise to use kebab-case-identifiers, but Corset will not force it upon the user.
Immediate numbers may be expressed in decimal base (no prefix), in binary (using the 0b prefix), or in hexadecimal (using the 0x prefix).
Domains are used to denote a finite set of indices. They are used, for instance, for array definition and constraint definition domains. They may be either discrete, enumerating every single value, or comprehensive, defining whole sub-ranges at once.
Discrete domains are defined as a list of space-separated indices, wrapped in braces; e.g. {1 2 4 -2}. Negative values are relative to the last index of the module.
Domains may also be defined as range, in which case another notation is used. The base notation are the colon-separated bounds, wrapped in square brackets, e.g. [2:4] covers {2 3 4}. If the range is 1-based, the lower bound can be omitted; e.g. [4] == [1:4]. Finally, a step can be specified with a third argument, e.g. [1:6:2] == {1 3 5}.
Corset only features line comments, starting with a semicolon (;) – which are often doubled by convention. No block comments are provided, but users may feel free to get creative with ASCII/Unicode-art they may see as useful to convey the meaning of comments.
In order both to extract as much information as can be from the source for optimization purpose, and to help the user avoid pitfalls while developing constraints systems, Corset feature a (hopefully) comprehensive type system.
The type system is tridimensional, handling separately scale (dimensionality of a type), magma[fn:1](numeric values accepted by a type), and conditioning (whether an expression can be used as a condition in a conditional statement).
A type scale can either be:
- scalar
- it represents a single value, and is represented by its magma symbol;
- column
- it represents a column value, and is represented by its magma symbol surrounded by square braces;
- array
- it represents an array of columns, and is represented by its magma symbol surrounded by double square braces;
- list
- it represents a list of expressions, and is represented by its magma symbol surrounded by braces;
- any
- it does not care about its scale, and is represented by ∋ followed byt its magma symbol.
The magma represents the type of numeric values that can be stored in member of a given type. It can be:
- binary
- a type that can only bear one-bit values, represented as 𝟙;
- nibble
- a unsigned 4-bits value, represented as 𝟜;
- byte
- an unsigned 8-bits value, represented as 𝟠;
- native
- an element of the targeted Galois field, represented as 𝔽;
- integer
- an arbitrary-sized unsigned integer, represented as
i<SIZE>; - anything
- this type does not care for its numeric values, represented as ∀.
The conditioning of an expression determines whether and how it may be used as the condition in a conditional statement
- boolean
- represents a boolean value, either
trueorfalse, represented as 𝔹; - loobean
- a reversed boolean value, where truthiness is expressed trough nullity, represented as 𝕃;
- unconditioned
- the expression bearing this conditioning can not be used as a condition – this is the default state.
Combining the two aspects of type definition, here is some types that might be encountered in a Corset constraint system:
- a column of binary booleans
[𝟙->𝔹]- anything made of nibbles
∋𝟜- a column of anything
[∀]- a 312 bits integer
i312- anything of whatever
∋∀- an array of byte columns
[[𝟠]]
Although nothing prevents the user to only use the root context, Corset offers a module system to split constraints among several modules.
The module system is naturally useful to group semantically linked constraint and columns, and to allow reusing the same names in different contexts, it also has deeper implications regarding the generated proof system.
Indeed, it is required that all the columns within a module share the same size, and the only constraints allowed to cross module boundaries and interface them are inclusion constraints (detailed later on).
Therefore, it is of the utmost importance to clearly design a constraint system split into modules, while keeping performances and memory usage in mind.
A module is simply entered through a (module <NAME>) instruction. A module can be spread among several files, and several modules may be defined within a single file – although the authors would advise against it for the sake of readability.
Modules name starting with a # are reserved and can not be defined by the user.
(defcolumns A B) ;; those are top-level columns
(module mod1) ;; all subsequent declarations are made whihin mod1
(defcolumns X Y Z) ;; these columns are only visible in mod1
(module mod2) ;; we are now in mod2
(defcolumns T U) ;; these columns are only visible in mod2
(defconstraint invalid () (eq! X T)) ;; invalid: X is not visible here
(defplookup valid (mod1.X mod1.Z) (T U)) ;; only inclusion constraints may refer to mod1 columnsConstants are defined throug the defconst form, and are available within the module they are defined in. They can be defined to a numeric value, or more generally to any statically computable value.
One or more constants may be defined in a single form:
(defconst [<NAME> <VALUE>]+)(defconst π 3) ;; defining a single constant
(defconst ;; defining several constants at once
e 2
max-byte (- (^ 2 8) 1)) ;; OK: (- (^ 2 8) 1) can be statically computedColumns, or commitments, or even witnesses, are where numerical values intervening in the constraint validation process are stored.
Semantically, columns can be divided in three categories:
- input columns
- contain the input to the computation process that a module aims at proving;
- intermediary columns
- are used as working registers, or scratchpads, containing intermediate steps required to prove the computations process;
- output columns
- contains the result of the computations a module aims at proving.
A distinction should be made between atomic columns, declared through the defcolumns form, and other automatically generated columns. Atomic columns must be filled by the use and be present in the trace files submitted to the prover, whereas other columns (typically used as working registers to prove intermediate computations) are automatically generated and filled by Corset, and are transparent to the user.
The defcolumns form is used to declare atomic columns, using the following scheme:
(defcolumns [<NAME> | (<NAME> [:<TYPE>] [<DOMAIN>] [:padding <VALUE>] [:display <BASE>])]+)By default, columns are declared as columns of field elements ([𝔽], cf. The Type System). However, the type magma may be changed. In general, users should aim to always use the smaller possible magma, as it may allow Corset to trigger optimizations when compiling the constraints.
While parsing traces, Corset will ensure that the provided values for an atomic columns are compatible with its declared type, and will error otherwise.
By default, columns are unidimensional, meaning they contain a single value per row. However, it may be more meaningful to group together semantically linked columns, e.g. the bytes of an accumulator.
In this case, array-like columns may be declared by affecting them to a domain (cf. the Atoms/Domains section); in which case the individual elements can be accessed as scalar values with the [<NAME> <INDEX>] notation. Accessing invalid indices will result in a compilation error. The length of an array column can be obtained with the len function applied to the array.
The proving process requires that all columns have a length that is a power of two. As this is obviously not compatible, if per chance, with the arithmetization of the computation to be proven, Corset will pad all columns with dummy values to reach the next power of two of their natural length.
However, in specific case, users may wish to impose a non-zero padding value. This may be accomplished with the :padding <VALUE> annotation on a column.
By default, when debbuging or inspecting trace files, Corset will display column values in hexadecimal. However, this may not be the most appropriate way of doing so; thus, Corset offers the possibility of chosing another base with the :display annotation. It must be following by one of these keywords: :hex (hexadecimal; default behavior), :dec (decimal), :bin (binary), :bytes (grouped-by-two bytes in hexadecimal notation), :truthiness (true~/~false, applicable to boolean and loobean columns), :opcode (EVM opcode).
(defcolumns A) ;; the simplest way of defining a column of 𝔽
(defcolumns (B :bool)) ;; a column may be given an explicit type
(defcolumns (X :nibble :display :bin) ;; when debugging, display X values in binary
(Y :byte)
(Z [4])) ;; an array of 4 field element columns – first element may be accessed with [Z 1]; and its length with (len Z)Several constraint types can be defined, addressing different use cases. Although all other constraints can be built from only vanishing constraints and inclusion constraints, Corset offers syntactic sugar for semantically more specific constraint types.
Vanishing constraints are the bread and butter of AIR constraint systems, and most probably the one that will be encountered most often.
They ensure that a given polynomial expression on the target field vanishes at all evaluation points, each point corresponding to a line in a trace, i.e. the set of all involved column values at a given index.
(defconstraint <NAME> ([:domain <DOMAIN>] [:guard <EXPRESSION>] [:perspective <NAME>])
<EXPRESSION>)Vanishing constraints may be defined either on the integrality of the trace (i.e. they must hold at every line of the trace), or only on a discrete subset of the traces.
By default, vanishing constraints are globally defined. However, they can be restricted to a discrete domain with the :domain keyword followed by a domain definition.
The :domain keyword restricts the constraint validity requirement to a static domain. However, this may not be expressive enough. At the cost of increasing the constraint degree, the :guard keyword allow to define a dynamic domain: the constraint will only be checked for trace lines where the given <EXPRESSION> is not null.
Another way of restraining a constraint application domain is to enable it only within a given perspective, with the :perspective keyword. In this case, not only the constraint will only be checked for correctness within this perspective, but it will also be able to tap into the given perspective columns.
As constraints defined within a perspective may need to access columns defined in another perspective, a special syntax allows to refer to columns specific to another perspective: PERSPECTIVE/COLUMN.
The body of the constraint may be any expression. The only restriction is that the expression final type must be loobean, to ensure that the semantic matches the computation.
Here is how to define a simple global constraint, ensuring that column A is always equal to twice the column B :
(defconstraint A-2-B () (eq! A (* 2 B)))Were it wished to restrain this constraint to only the first and last line of the trace, the :domain keyword would be used:
(defconstraint A-2-B-start-end (:domain {0 -1}) (eq! A (* 2 B)))Alternatively, should this constraint only be triggered when C equals 32, the :guard keyword would be required:
(defconstraints A-2-B-when-C-32 (:guard (eq C 32)) (eq! A (* 2 B)))It should be noticed that the guard, being interpreted according to traditional boolean logic, uses the eq function, whereas the constraint body, being rquired to be a loobean, uses its loobean dual eq!.
Inclusion constraints proves (through a plookup argument) that all the line-wise value sets of a given set of columns (or expressions) are included (not necessarily strictly) in the line-wise value sets of another set of columns (or expressions). This constraint family is the only one to be able to cross module boundaries, and thus allow to ‶link″ modules together. Therefore, involved columns should be qualified with their module name.
These can semantically be compared to an SQL LEFT JOIN with a requirement of existence in the second table, where each set of columns plays the role of an SQL table.
(defplookup <NAME>
([<INCLUDING EXPRESSION>])
([<INCLUDED EXPRESSION>]))(defplookup mod1-contains-2×mod2
(mod1.A mod1.B mod1.C)
((* 2 mod2.X) (* 2 mod2.Y) (* 2 mod2.Z)))This proves that for all lines in the module mod2, the tuple of values (2×x, 2×y, 2×z) from the columns (X, Y, and Z) is found in some tuple of the column (A, B, C) in the module mod1.
Range constraints ensure that a given column has all its values between 0 and a given value.
Range constraints are defined the following way, ensuring that all elements from COLUMN are stricly smaller than n.
(definrange <COLUMN> <n>)(definrange A 1025)This ensures that for any a in A, 0 ≤ a < 1025.
Interleaving constraints declare a new column, which provably contains a regular interleaving of a given set of columns. Id est, if an interleaved column C is defined from A and B, then C will contain the values [A0 B0 A1 B1 A2 B2 ... An Bn].
Interleaving constraints are defined the following way:
(definterleaved <NEW COLUMN> ([ <SOURCE COLUMN> ]))The newly declared column will automatically be computed by Corset when required, and thus does not need to be explicitely filled in the trace.
(definterleaved C (A B))This declares a new column C, containing the interleaved values of A and B, in this order.
Permutation constraints ensure that, given two equally sized sets of columns, their line are exactly equals, albeit in a potentially different order. Moreover, an ordering can be defined on the second set columns, thus ensuring that the second set is a sorted permutation of the first set.
Permutations (and sorting) constraints are defined the following way:
(defpermutation ([ <NEW COLUMN> ])
([ <OLD COLUMN> | (<OLD COLUMN> <↑|↓>)]))Sorting criterions are applied in their order of definitions, therefore it is important to declare the permutation constraint in the desired order.
When required, Corset will automatically generate the new sorted columns. Therefore, it is not required to compute them in the trace generation process.
(defpermutation (A' B' C')
((↓ A) (↑ B) C))Here, A', B', and C' are defined as a sorted permutation of the columns A, B, and C, in increasing order of A values, then in decreasing order of B values, and independently of C values.
If the Unicode arrows are hard to type, you may replace ↓ with + and ↑ with -.
Corset provides a few builtin forms that may be used to make constraint expressions more concise and readable.
Conditionals are the fundamentals of many constraints. Thus, Corset provides the classic If statement, in the form of the (if COND THEN [ELSE]) expression. COND has to be bear a conditioning, be it boolean or loobean.
When it makes semantical sense to group several expressions in the same constraint, the begin form may be used. It will create a list of expressions, that will only be satisfied when all the expressions it contains are satisfied (i.e. evaluate to 0 on their domain).
;; not so clearly conveys the meaning
(defconstraint initialization-1 ()
(reset-stuff))
(defconstraint initialization-2 ()
(set-to-0 x y z))
(defconstraint initialization-3 ()
(check-whatever))
;; clearly conveys the meaning
(defconstraint initialization ()
(begin (reset-stuff)
(set-to-0 x y z)
(check-whatever)))Repetitive operations may be expressed in a shorter ways through loop. It should be understood that “loops” in Corset are more akin to an automated code generation than a standard loop, as Corset programs are not executed per se, but compiled down to circuits.
The for loop adheres to the following structure:
(for <VARIABLE> <DOMAIN> <BODY>)VARIABLE will be introduced as a local binding, the domain follows the standard domain syntax (cf. the Atoms/Domains section), and the body will be repeated for each value VARIABLE can take over DOMAIN.
The output type will be a list of expressions, of the same cardinality than DOMAIN.
;; a standard loop over 1-6
(for i [6] (foobar i))
;; loop over discrete values
(for i {1 3 5} (something i))
;; loops can be nested
(for i [4]
(for j {1 3 5}
(some-function i j)))In order to make expression more readable, it is often desired to store complex computations in local variables, that are then injected in the final expression. In Corset, this is accomplished with the let form, which obeys to the following scheme:
(let ([(<NAME> <DEFINITION>)]+)
<EXPRESSION>);; this:
(defpurefun (foobarize x y z)
(let ((foo (* 3 (+ x z)))
(bar (+ (^ 2 8) (* z y)))
(ize (* x y z x y z)))
(do-it foo bar ize)))
;; is much clearer than:
(defpurefun (foobarize x y z)
(do-it (* 3 (+ x z)) (+ (^ 2 8) (* z y)) (* x y z x y z)))The debug form will only be compiled when Corset is run with the --debug flag; otherwise, it will just be dropped from the generated constraint set.
It is intended to mark parts of the constraint system that may be costly to run in production, but may be useful as sanity check in the development phase.
(defcolumns (X :bool))
;; Better safe than sorry, but may be costly in production
(defconstraint X-is-really-boolean ()
(debug (any! (eq! X 0)
(eq! X 1))))Repeated snippets of code may be factorized away in functions, improving the readability of the code and clarifying the underlying semantic.
Functions in Corset are gradually typed, meaning that the user may impose type restrictions on the arguments, or specify the return type of the function.
Functions are defined with the `defun` and `defpurefun` forms, with the following syntax:
(def[puref]un (<NAME>|(<NAME> :type) [<ARG> | (ARG :type)]*)
(<BODY>))For instance, here is the function defining how to square a number:
(defun (square x) (* x x))By default, function arguments are assumed to be native integers, and Corset will do its best to infer the function result type. However, both the arguments and the function types may be specified, which may lead to better optimization down the line.
;; this declares a function only accepting types compatible with byte
(defpurefun (byte-square (x :byte)) (* x x))If Corset detects a discrepancy between the inferred output type and the declared one, it will emit a warning, unless the function has been annotated with the :nowarn keyword.
;; Corset is not smart enough to know that 1 - boolean is a boolean itself
;; So we force the output type of the function
(defpurefun ((not :bool :nowarn) (x :bool))
(- 1 x))The defpurefun form behaves similarly to the defun, with the only nuance that functions defined with is can not access any object outside of their arguments and constants. This purity of the scoping helps ensuring a predictable behavior, by avoiding e.g. unexpected shadowing.
As a rule of thumb, declaring function as pure should be the default, falling down on impure functions only when absoltely required.
(defcolumns A B)
(defun (A-plus-2)
(+ A 2)) ;; OK
(defpurefun (A-plus-2)
(+ A 2)) ;; KO: A is not an argument and can not be reached from a pure function
(defpurefun (plus-2 X)
(+ X 2)) ;; OK: X is an argument of plus-2; called as (plus-2 A) later onFunction polymorphism on the arguments is available, and is triggered when declaring multiple functions with the same name, but different types on the argument. When a polymorphic function is called, Corset will systematically call the one with the most restrictive typing.
;; Define a function normalizing a valut to {0, 1}
;; In the general case, multiply x by its inverse
(defpurefun (norm x) (* x (inv x)))
;; however, booleans are already normalized; no need to compute an inverse
(defpurefun (norm (x :bool)) x)
(defcolumns t (u :bool))
... (norm t) ... ;; t is not a boolean, the first version will be called
... (norm u) ... ;; u is a boolean, the second version will be called[fn:1] The term ‶magma″ here is a deliberate abuse, as mathematicians have already stolen all the other fitting terms.