# SIMD support for WebAssembly

-This proposal describes how 128-bit SIMD types and operations can be added to
-WebAssembly. It is based on [previous work on SIMD.js in the Ecma TC39
+This proposal describes how 128-bit packed SIMD types and operations can be
+added to WebAssembly. It is based on [previous work on SIMD.js in the Ecma TC39
ECMAScript committee](https://github.com/tc39/ecmascript_simd) and the
[portable SIMD specification](https://github.com/stoklund/portable-simd) that
resulted.

-There are three parts to the proposal:
-
-1. [A specification of portable SIMD operations](portable-simd.md) that came
-   out of the SIMD.js work.
-2. [A table of proposed WebAssembly operations](webassembly-opcodes.md) with
-   links to the portable specification.
-3. This document, which describes the mapping between WebAssembly and the
-   portable specification.
-
-# Mapping portable SIMD to WebAssembly
-
-The types and operations in the portable SIMD specification are relatively
-straightforward to map to WebAssembly. This section describes the details of
-the mapping.
-
-The following operations are *not* provided in WebAssembly:
-
-- `f*.maxNum` and `f*.minNum`. These NaN-suppressing operations have no scalar
-  WebAssembly counterparts either; only the NaN-propagating `f*.max` and
-  `f*.min` versions are provided.
-- `f*.reciprocalApproximation` and `f*.reciprocalSqrtApproximation` are omitted
-  from WebAssembly pending further discussion.
-
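The removed text distinguishes NaN-suppressing `maxNum` from the NaN-propagating `max` that WebAssembly keeps. The following minimal Python sketch is not part of the proposal. It models both behaviors lane by lane on plain lists. The helper names `max_propagating`, `max_num`, and `lanewise` are hypothetical, and the ordering of signed zeros is deliberately not modeled.

```python
import math


def max_propagating(a: float, b: float) -> float:
    """NaN-propagating max: if either lane is NaN, the result lane is NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return max(a, b)  # signed-zero ordering (-0.0 vs +0.0) is not modeled here


def max_num(a: float, b: float) -> float:
    """NaN-suppressing maxNum: a single NaN operand is ignored in favor of the number."""
    if math.isnan(a):
        return b  # still NaN if b is NaN too
    if math.isnan(b):
        return a
    return max(a, b)


def lanewise(op, xs, ys):
    """Apply a scalar operation to each pair of corresponding lanes."""
    return [op(x, y) for x, y in zip(xs, ys)]


a = [1.0, math.nan, 3.0, 4.0]
b = [2.0, 5.0, math.nan, -1.0]
print(lanewise(max_propagating, a, b))  # [2.0, nan, nan, 4.0]
print(lanewise(max_num, a, b))          # [2.0, 5.0, 3.0, 4.0]
```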
-## New value types
-
-The following value types are added to the WebAssembly type system to support
-128-bit SIMD operations. Each new WebAssembly value type corresponds to the
-[portable SIMD type](portable-simd.md#simd-types) of the same name.
-
-* `v128`: A 128-bit SIMD vector.
-* `b8x16`: A vector of 16 boolean lanes.
-* `b16x8`: A vector of 8 boolean lanes.
-* `b32x4`: A vector of 4 boolean lanes.
-* `b64x2`: A vector of 2 boolean lanes.
-
-The 128 bits in a `v128` value are interpreted differently by different
-operations. They can represent vectors of integer lanes or of IEEE 754
-floating-point lanes.
-
-The four boolean vector types do not have a prescribed representation in
-memory; they can't be loaded or stored. This allows implementations to choose
-the most efficient representation, whether as a predicate vector or some
-variant of bits in a vector register.
-
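The point that one `v128` value is simply 128 bits, whose lane interpretation depends on the operation, can be illustrated with Python's `struct` module. This is a sketch only: it models a `v128` as 16 little-endian bytes and reads the same bytes as `f32x4`, `i32x4`, and `i16x8`. The boolean vector types have no memory representation and are not modeled.

```python
import struct

# Model a v128 value as 16 raw bytes in little-endian order, as WebAssembly stores it.
raw = struct.pack("<4f", 1.0, -2.0, 0.5, 3.0)  # written as four f32 lanes

as_f32x4 = struct.unpack("<4f", raw)  # (1.0, -2.0, 0.5, 3.0)
as_i32x4 = struct.unpack("<4i", raw)  # same bits read as signed 32-bit lanes
as_i16x8 = struct.unpack("<8h", raw)  # same bits read as signed 16-bit lanes

print(as_i32x4)  # (1065353216, -1073741824, 1056964608, 1077936128)
print(as_i16x8)  # (0, 16256, 0, -16384, 0, 16128, 0, 16448)
```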
-## Scalar type mapping
-
-Some operations in the portable SIMD specification use scalar types that don't
-exist in WebAssembly. These types are mapped into WebAssembly as follows:
-
-* `i8` and `i16`: SIMD operations that take these types as an input are passed
-  a WebAssembly `i32` instead and use only the low 8 or 16 bits, ignoring the
-  high bits. The `extractLane` operation can return these types; it is provided
-  in variants that either sign-extend or zero-extend to an `i32`.
-
-* `boolean`: SIMD operations with a boolean argument will accept a WebAssembly
-  `i32` instead and treat zero as false and non-zero values as true. SIMD
-  operations that return a boolean will return an `i32` with the value 0 or 1.
-
-* `LaneIdx2` through `LaneIdx32`: All lane indexes are encoded as `varuint7`
-  immediate operands; no operation takes a dynamic lane index. An
-  out-of-range lane index is a validation error.
-
-* `RoundingMode`: Rounding modes are encoded as `varuint7` immediate operands.
-  An out-of-range rounding mode is a validation error.
-
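The scalar mapping rules above can be modeled in a few lines of Python. This sketch assumes a `v128` is 16 little-endian bytes. All function names (`i8x16_splat`, `i8x16_extract_lane_s`, `bool_arg`, and so on) are hypothetical illustrations, not the proposal's encoding. A real engine rejects an out-of-range lane index when the module is validated; here that check is modeled as an exception.

```python
def low_bits(x: int, bits: int) -> int:
    """An i8/i16 operand arrives as an i32; only its low `bits` bits are used."""
    return x & ((1 << bits) - 1)


def check_lane(idx: int, lanes: int) -> int:
    """Lane indexes are immediates, so an out-of-range index fails validation."""
    if not 0 <= idx < lanes:
        raise ValueError(f"validation error: lane {idx} out of range for {lanes} lanes")
    return idx


def i8x16_splat(x: int) -> bytes:
    """Hypothetical helper: copy the low byte of an i32 into all 16 lanes."""
    return bytes([low_bits(x, 8)]) * 16


def i8x16_extract_lane_s(v: bytes, idx: int) -> int:
    b = v[check_lane(idx, 16)]
    return b - 0x100 if b & 0x80 else b  # sign-extend the byte to an i32 value


def i8x16_extract_lane_u(v: bytes, idx: int) -> int:
    return v[check_lane(idx, 16)]  # zero-extend the byte to an i32 value


def bool_arg(x: int) -> bool:
    """A boolean argument is an i32: zero is false, any other value is true."""
    return x != 0


def bool_result(b: bool) -> int:
    """A boolean result is an i32 that is exactly 0 or 1."""
    return 1 if b else 0


v = i8x16_splat(0x1FF)             # the high bits of the i32 are ignored
print(i8x16_extract_lane_s(v, 3))  # -1
print(i8x16_extract_lane_u(v, 3))  # 255
print(bool_result(bool_arg(-7)))   # 1
```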
-## SIMD operations
-
-Most operations keep the names of their portable SIMD versions. Some are
-renamed to match existing conventions in WebAssembly. The integer operations
-that distinguish between signed and unsigned integers are given `_s` or `_u`
-suffixes. For example, `s32x4.greaterThan` becomes `i32x4.gt_s`; cf. the
-existing `i32.gt_s` WebAssembly operation.
-
-[The complete set of proposed opcodes](webassembly-opcodes.md) can be found in
-a separate table.
-
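The `_s`/`_u` split exists because the same lane bits compare differently depending on signedness. Here is a small Python sketch of a lane-wise greater-than over four 32-bit lanes. It returns one boolean per lane, in the spirit of a `b32x4` result. The function name `i32x4_gt` and its `signed` flag are hypothetical.

```python
import struct


def i32x4_gt(a: bytes, b: bytes, signed: bool) -> list:
    """Lane-wise a > b on four 32-bit lanes; one boolean per lane (a b32x4-like result)."""
    fmt = "<4i" if signed else "<4I"
    return [x > y for x, y in zip(struct.unpack(fmt, a), struct.unpack(fmt, b))]


a = struct.pack("<4i", -1, 5, 0, 7)
b = struct.pack("<4i", 1, 5, -3, 2)
print(i32x4_gt(a, b, signed=True))   # gt_s: [False, False, True, True]
print(i32x4_gt(a, b, signed=False))  # gt_u: [True, False, False, True]
```

In the unsigned comparison, the bit pattern of `-1` is read as `0xFFFFFFFF`, which is why lanes 0 and 2 flip.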
-### Floating-point conversions
-
-The `fromSignedInt` and `fromUnsignedInt` conversions to float never fail, so
-they are simply renamed:
-
-* `f32x4.convert_s/i32x4(a: v128, rmode: RoundingMode) -> v128`
-* `f64x2.convert_s/i64x2(a: v128, rmode: RoundingMode) -> v128`
-* `f32x4.convert_u/i32x4(a: v128, rmode: RoundingMode) -> v128`
-* `f64x2.convert_u/i64x2(a: v128, rmode: RoundingMode) -> v128`
-
-The float-to-integer conversions can fail, for example when a lane is NaN or
-out of range for the target integer type. A failure in any lane causes a trap,
-as with the scalar WebAssembly conversions:
-
-* `i32x4.trunc_s/f32x4(a: v128) -> v128`
-* `i64x2.trunc_s/f64x2(a: v128) -> v128`
-* `i32x4.trunc_u/f32x4(a: v128) -> v128`
-* `i64x2.trunc_u/f64x2(a: v128) -> v128`
-
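The trapping behavior of the float-to-integer conversions can be sketched in Python by checking every lane against the scalar rules. A NaN or infinite lane fails, and so does a truncated value outside the target range. This is a model, not an implementation. `Trap` is a hypothetical stand-in for a WebAssembly trap, and the single function covers both `i32x4.trunc_s/f32x4` and `i32x4.trunc_u/f32x4` via a `signed` flag.

```python
import math
import struct


class Trap(Exception):
    """Stands in for a WebAssembly trap."""


def i32x4_trunc_f32x4(v: bytes, signed: bool) -> bytes:
    """Model of i32x4.trunc_s/f32x4 (signed=True) and i32x4.trunc_u/f32x4 (signed=False)."""
    lo, hi = (-2**31, 2**31 - 1) if signed else (0, 2**32 - 1)
    out = []
    for x in struct.unpack("<4f", v):
        if math.isnan(x) or math.isinf(x):
            raise Trap("invalid conversion to integer")
        t = math.trunc(x)  # round toward zero
        if not lo <= t <= hi:
            raise Trap("integer overflow")
        out.append(t)
    return struct.pack("<4i" if signed else "<4I", *out)


ok = i32x4_trunc_f32x4(struct.pack("<4f", 1.9, -1.9, 0.0, 100.5), signed=True)
print(struct.unpack("<4i", ok))  # (1, -1, 0, 100)

try:
    i32x4_trunc_f32x4(struct.pack("<4f", 1.0, -1.0, 2.0, 3.0), signed=False)
except Trap as e:
    print("trap:", e)  # trap: integer overflow
```

One bad lane is enough to trap the whole instruction, even though the other three lanes would convert cleanly.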
-### Memory accesses
-
-The load and store operations use the same addressing and bounds checking as the
-scalar WebAssembly memory instructions: the effective address is formed in the
-same way, from a dynamic address operand plus an immediate offset.
-
-Since WebAssembly is always little-endian, the `load` and `store` instructions
-do not depend on the lane-wise interpretation of the vector being loaded or
-stored. This means that only two full-vector instructions are needed:
-
-* `v128.load(addr, offset) -> v128`
-* `v128.store(addr, offset, data: v128)`
-
-The natural alignment of these instructions is 16 bytes; unaligned accesses are
-supported in the same way as for WebAssembly's normal scalar load and store
-instructions, including the alignment hint.
-
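To make the lane-independence argument concrete, here is a Python sketch of `v128.load` and `v128.store` over a `bytearray` standing in for linear memory. It assumes the scalar rules: the effective address is the dynamic address plus the immediate offset, an access past the end of memory traps, and alignment is only a hint and is not checked. The helper names and the `Trap` class are hypothetical.

```python
import struct


class Trap(Exception):
    """Stands in for a WebAssembly out-of-bounds trap."""


def effective_address(memory: bytearray, addr: int, offset: int, size: int) -> int:
    """Dynamic address plus immediate offset, bounds-checked like a scalar access."""
    ea = addr + offset
    if ea + size > len(memory):
        raise Trap("out of bounds memory access")
    return ea


def v128_load(memory: bytearray, addr: int, offset: int = 0) -> bytes:
    ea = effective_address(memory, addr, offset, 16)
    return bytes(memory[ea:ea + 16])  # no alignment check: 16 is only the natural alignment


def v128_store(memory: bytearray, addr: int, offset: int, data: bytes) -> None:
    ea = effective_address(memory, addr, offset, 16)
    memory[ea:ea + 16] = data


mem = bytearray(64)
v128_store(mem, 1, 2, struct.pack("<4I", 1, 2, 3, 0xDEADBEEF))  # unaligned address 3
v = v128_load(mem, 3)
print(struct.unpack("<4I", v))  # (1, 2, 3, 3735928559)
print(struct.unpack("<8H", v))  # (1, 0, 2, 0, 3, 0, 48879, 57005)
```

The store and load move the same 16 bytes regardless of whether the caller thinks of them as `i32x4` or `i16x8` lanes. This is why a single pair of instructions suffices.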
-The partial vector load/store instructions are specific to the 4-lane
-interpretation:
-
-* `v32x4.load1(addr, offset) -> v128`
-* `v32x4.load2(addr, offset) -> v128`
-* `v32x4.load3(addr, offset) -> v128`
-* `v32x4.store1(addr, offset, data: v128)`
-* `v32x4.store2(addr, offset, data: v128)`
-* `v32x4.store3(addr, offset, data: v128)`
-
-The natural alignment of these instructions is *4 bytes*, not the size of the
-access.
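The partial accesses can be sketched the same way. This Python model makes assumptions not spelled out in the removed text. First, as in SIMD.js, `loadN` reads the first N 32-bit lanes and zero-fills the rest, and `storeN` writes only the first N lanes. Second, only the bytes actually accessed are bounds-checked. Alignment is treated as a hint and not checked. The helper names and `Trap` are hypothetical.

```python
import struct


class Trap(Exception):
    """Stands in for a WebAssembly out-of-bounds trap."""


def v32x4_load_n(memory: bytearray, addr: int, offset: int, n: int) -> bytes:
    """Model of v32x4.load1/2/3: read n 32-bit lanes, zero-fill the rest (assumed)."""
    ea, size = addr + offset, 4 * n
    if ea + size > len(memory):  # assumed: only the accessed bytes are checked
        raise Trap("out of bounds memory access")
    return bytes(memory[ea:ea + size]) + bytes(16 - size)


def v32x4_store_n(memory: bytearray, addr: int, offset: int, data: bytes, n: int) -> None:
    """Model of v32x4.store1/2/3: write only the low n 32-bit lanes of data."""
    ea, size = addr + offset, 4 * n
    if ea + size > len(memory):
        raise Trap("out of bounds memory access")
    memory[ea:ea + size] = data[:size]


mem = bytearray(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))  # exactly 16 bytes
print(struct.unpack("<4f", v32x4_load_n(mem, 4, 0, 3)))  # (2.0, 3.0, 4.0, 0.0)
v32x4_store_n(mem, 0, 0, struct.pack("<4f", 9.0, 8.0, 7.0, 6.0), 2)
print(struct.unpack("<4f", mem))  # (9.0, 8.0, 3.0, 4.0)
```

Under these assumptions, a `load3` at address 4 stays in bounds, whereas a full `v128.load` at the same address would run past the end of this 16-byte memory.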
+The [proposed specification](SIMD.md) has the details.