Wasm Introduction (Part 7): Text Format
Written by the CoinEx Chain lab, this article is the 7th one of the Wasm Introduction series and introduces Text Format. CoinEx Chain is the world’s first public chain exclusively designed for DEX, and will also include a Smart Chain supporting smart contracts and a Privacy Chain protecting users’ privacy.
In the previous articles we discussed the WebAssembly (Wasm for short) binary format and instruction set in detail. This article focuses on the Wasm Text Format (hereinafter referred to as WAT).
Overall Structure
(module
;; fields
(type ... )
(import ... )
(func ... )
(table ... )
(mem ... )
(global ... )
(export ... )
(start ... )
(elem ... )
(data ... )
)
The text format is another representation of the binary format that is friendlier to humans. The binary format is better suited to being generated by machines (such as compilers) and consumed by them (such as Wasm interpreters), while the text format is better suited to being written and read by people. Besides the obvious difference in representation, the two formats differ in structure as follows:
- In the binary format, data is organized in sections, while in the text format, content is organized in fields. The WAT compiler needs to collect fields of the same kind and merge them into binary sections.
- In the binary format, every section except the custom section can appear at most once, and sections must appear in ascending order of section ID. The text format has no such restriction: fields of the same kind can appear many times, in any order. However, import fields must appear before function, table, memory, and global fields. In addition, there is no custom field in the text format, so there is no way to express custom sections.
- Fields and sections correspond roughly one to one, but there is no separate code field: the function section and the code section of the binary format are merged into the function field.
- The text format provides multiple inline forms for easy writing. For example:
* Function, table, memory, and global fields can contain inline import and export fields.
* Table field can contain inline element fields.
* Memory field can contain inline data fields.
* Function and import fields can contain inline type fields.
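To make the ordering rules above concrete, here is a minimal sketch of a module whose fields are deliberately not in section-ID order. The names $add and $unused are hypothetical and chosen only for illustration. A conforming WAT compiler should still accept it, because it sorts the fields into sections itself:

```wat
(module
  ;; The export field comes first, although the export section
  ;; follows the function section in the binary format.
  (export "add" (func $add))

  ;; The function field carries both the signature and the body
  ;; (function section + code section in the binary format).
  (func $add (param i32 i32) (result i32)
    (i32.add (local.get 0) (local.get 1)))

  ;; A type field may appear after the fields that precede it here.
  (type $unused (func))

  ;; An import field placed at this point would be rejected,
  ;; because imports must precede function definitions.
)
```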
Next, the fields are introduced roughly in ascending order of their corresponding section IDs.
Type Field
The type field defines a function type. The following example defines a function type that takes two i32 parameters and returns an i32 value:
(module
(type (func (param i32) (param i32) (result i32)))
)
We can assign an identifier to the function type as its name, so that the function type can be referenced by name elsewhere instead of by index. module, type, func, param, and result are keywords of the WAT language. Identifiers must begin with a $ sign, followed by one or more letters, digits, or certain punctuation characters. For the complete lexical rules for identifiers, please refer to Section 6.3.5 of the Wasm Specification. In addition, consecutive parameters can be abbreviated into a single (param). The following example shows the use of identifiers and the short form of parameters:
(module
(type $ft1 (func (param i32 i32) (result i32)))
(type $ft2 (func (param f64)))
)
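As a small sketch of how such a type is then used, the following module references the same function type once by identifier and once by index. The names $binop, $add, and $sub are hypothetical:

```wat
(module
  (type $binop (func (param i32 i32) (result i32)))  ;; type index 0

  ;; Reference the type by its identifier.
  (func $add (type $binop)
    (i32.add (local.get 0) (local.get 1)))

  ;; Reference the same type by its numeric index.
  (func $sub (type 0)
    (i32.sub (local.get 0) (local.get 1)))
)
```

Both functions end up with the same type index in the binary; the identifier only exists in the text format.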
Import & Export Field
The Wasm module can import or export four types of elements: function, table, memory, and global variable. Correspondingly, the import and export fields are also written in four ways. The following example shows four ways of writing import fields:
(module
(type $ft1 (func (param i32 i32) (result i32)))
(import "env" "f1" (func $f1 (type $ft1)))
(import "env" "t1" (table $t 1 8 funcref))
(import "env" "m1" (memory $m 4 16))
(import "env" "g1" (global $g1 i32)) ;; immutable
(import "env" "g2" (global $g2 (mut i32))) (; mutable ;)
)
As the above example shows, an import field specifies the module name, the element name, and the type of the imported element. Names are strings and must be enclosed in double quotes. Like the type field, the imported element can be given an identifier so that it can be referenced by name later. WAT supports two kinds of comments: line comments, which begin with ;; and run to the end of the line, and block comments, which begin with (; and end with ;) and may span multiple lines.
A function import can also carry an inline type instead of referencing a type field:
(module
(import "env" "f1"
(func $f1
(param i32 i32) (result i32) ;; inline type
)
)
)
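The two comment styles can be sketched as follows, assuming the block-comment delimiters (; and ;) defined by the Wasm specification. Block comments may also be nested:

```wat
(module
  ;; A line comment runs to the end of the line.

  (; A block comment
     can span several lines
     (; and may contain nested block comments ;)
  ;)

  (import "env" "g1" (global $g1 i32)) (; a block comment can also sit inline ;)
)
```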
Compared with the import field, the export field is simpler: it only needs the export name and the index (or identifier) of the exported element. Note that export names must be unique within the entire module. The following example shows the four kinds of export fields:
(module
;; ...
(export "f1" (func $f1))
(export "f2" (func $f2))
(export "t1" (table $t ))
(export "m1" (memory $m ))
(export "g1" (global $g1))
(export "g2" (global $g2))
)
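The uniqueness rule applies to export names, not to the exported elements. As a sketch (the alias name "f1_alias" is hypothetical), the same function can be exported under two names, and an element can be referenced by index instead of by identifier:

```wat
(module
  (func $f1)
  (memory $m 1)

  ;; One function exported under two distinct names.
  (export "f1" (func $f1))
  (export "f1_alias" (func $f1))

  ;; The exported element is referenced by index here.
  (export "m1" (memory 0))
)
```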
Import and export fields can be put inline in function, table, memory, and global fields. The following example shows the inline writing of the import field:
(module
(type $ft1 (func (param i32 i32) (result i32)))
(func $f1 (import "env" "f1") (type $ft1))
(table $t1 (import "env" "t" ) 1 8 funcref)
(memory $m1 (import "env" "m" ) 4 16)
(global $g1 (import "env" "g1") i32)
(global $g2 (import "env" "g2") (mut i32))
)
The following example shows the inline writing of the export field (the function, table, memory, and global fields themselves are described below):
(module
(func $f (export "f1") ... )
(table $t (export "t" ) ... )
(memory $m (export "m" ) ... )
(global $g (export "g1") ... )
)
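For readers who want something they can assemble directly, here is one possible way to fill in the elided parts. The bodies, limits, and initial values are assumptions made for illustration, not part of the original example:

```wat
(module
  (func   $f (export "f1") (result i32) (i32.const 42))
  (table  $t (export "t")  1 funcref)
  (memory $m (export "m")  1)
  (global $g (export "g1") i32 (i32.const 7))
)
```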
Function Field
A function field declares local variables of a function and gives instructions for the function. The compiler will split the function field, place the type index in the function section, and local variable information and bytecode in the code section. The following example shows the writing of the function field (see below for more details of how to write instructions):
(module
(type $ft1 (func (param i32 i32) (result i32)))
(func $add (type $ft1)
(local i64 i64)
;; instructions
(local.get 3) (drop)
(i32.add (local.get 0) (local.get 1))
)
)
In fact, the parameters of the function are also ordinary local variables, and, together with the local variables declared in the function field, constitute local variable space, with the index incremented from 0.
The above is a simplified version of the function field: it references the function type directly, and all local variables are declared in a single (local). We can instead inline the function type into the function field and split (param) into several so that each parameter can be named. In the same way, (local) can be split into several to name local variables. With named parameters and local variables, variable instructions can refer to them by name instead of by index, which improves readability. Let's rewrite the above example with an inline type and identifiers for parameters and local variables:
(module
(func $f1 (param $a i32) (param $b i32) (result i32)
(local $c i64) (local $d i64)
(local.get $d) (drop)
(i32.add (local.get $a) (local.get $b))
)
)
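The shared index space of parameters and locals can be illustrated with a short sketch. The function $sum3 and its variable names are hypothetical. Names and indexes can be mixed freely, because a name is only an alias for an index:

```wat
(module
  (func $sum3 (param $a i32) (param $b i32) (param $c i32) (result i32)
    (local $tmp i32)  ;; index 3: parameters occupy indexes 0..2
    (local.set $tmp (i32.add (local.get $a) (local.get $b)))
    ;; (local.get 3) refers to the same variable as (local.get $tmp)
    (i32.add (local.get 3) (local.get $c))
  )
)
```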
Table & Element Field
Since the Wasm 1.0 Specification allows a module at most one table, the table field can appear at most once. The element field can appear multiple times; each one lists one or more function indexes together with the table offset at which the first of them is placed. The following example shows how to write the table and element fields:
(module
(func $f1) (func $f2) (func $f3)
(table 10 20 funcref)
(elem (offset (i32.const 5)) $f1 $f2 $f3)
)
An element field can also be written inline in a table field. In that case we cannot specify the table limits, which the compiler infers from the inline elements, nor the starting offset, which is always 0. The following example shows the inline writing of element fields:
(module
(func $f1) (func $f2) (func $f3)
(table funcref ;; min: 3, max: 3
(elem $f1 $f2 $f3) ;; inline elem, offset: 0
)
)
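Under the rules just described, the inline form should be equivalent to the following explicit form, in which the limits and the offset that the compiler infers are written out:

```wat
(module
  (func $f1) (func $f2) (func $f3)
  (table 3 3 funcref)                        ;; min and max equal the element count
  (elem (offset (i32.const 0)) $f1 $f2 $f3)  ;; the offset is always 0
)
```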
Memory & Data Field
Similarly, since the Wasm 1.0 Specification allows a module at most one memory, the memory field can also appear at most once. The data field can appear multiple times; each one uses a constant expression to specify the starting memory offset (address) and a string to specify the initial memory contents. The following example shows how to write the memory and data fields:
(module
(memory 4 16)
(data (offset (i32.const 100)) "Hello, ")
(data (offset (i32.const 107)) "World!\n") ;; "Hello, " occupies bytes 100..106
)
A data field can also be written inline in a memory field. In that case we cannot specify the page limits of the memory, which the compiler infers from the inline data, nor the starting offset, which is always 0. In addition, the initial data can be written as multiple strings. The following example shows the inline writing of the data field:
(module
(memory ;; min: 1, max: 1
(data "Hello, " "World!\n") ;; inline data, offset: 0
)
)
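Written out explicitly, the inline form above should correspond to the following module. The data is 14 bytes long, so one 64 KiB page is enough, and the strings are concatenated:

```wat
(module
  (memory 1 1)                                       ;; min: 1 page, max: 1 page
  (data (offset (i32.const 0)) "Hello, World!\n")    ;; the offset is always 0
)
```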
With escape characters, we can easily embed special characters such as new lines, hex-encoded bytes, and Unicode code points in strings. For details, please refer to Section 6.3.3 of the Wasm Specification.
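As a brief sketch of these escapes (the offsets and contents are arbitrary choices for illustration), the following data fields use character escapes, hex-encoded bytes, and Unicode code points:

```wat
(module
  (memory 1)
  ;; Character escapes: newline, tab, double quote, backslash.
  (data (offset (i32.const 0))   "line1\nline2\t\"quoted\"\\")
  ;; Hex-encoded bytes: each \hh pair is one raw byte.
  (data (offset (i32.const 64))  "\00\01\fe\ff")
  ;; Unicode code points, stored in memory as UTF-8.
  (data (offset (i32.const 128)) "\u{4e2d}\u{6587}")
)
```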
Global Field
In the global field, we can specify the identifier, type, mutability, and initial value of global variables. The following example shows how to write global fields:
(module
(global $g1 (mut i32) (i32.const 100)) ;; mutable
(global $g2 (mut i32) (i32.const 200)) ;; mutable
(global $g3 f32 (f32.const 3.14)) ;; immutable
(global $g4 f64 (f64.const 2.71)) ;; immutable
(func
(global.get $g1)
(global.set $g2)
)
)
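To show mutability in action, here is a minimal sketch with hypothetical names ($counter, $step, $bump). Only the mutable global may be the target of global.set; applying global.set to $step would fail validation:

```wat
(module
  (global $counter (mut i32) (i32.const 0))  ;; mutable
  (global $step    i32       (i32.const 1))  ;; immutable

  ;; Adds $step to $counter and returns the new value.
  (func $bump (result i32)
    (global.set $counter
      (i32.add (global.get $counter) (global.get $step)))
    (global.get $counter)
  )
)
```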
Start Field
The start field is the simplest: it specifies the index of the start function. The following example shows how to write the start field:
(module
(func $main ... )
(start $main)
)
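A complete variant that can be assembled might look like the sketch below. The global $ready is a hypothetical addition used to give $main an observable effect. Note that the start function must take no parameters and return no results:

```wat
(module
  (global $ready (mut i32) (i32.const 0))

  ;; Runs automatically once the module has been instantiated.
  (func $main
    (global.set $ready (i32.const 1)))

  (start $main)
)
```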
Having introduced the overall structure of WAT and the writing of the various fields, we now turn to how instructions are written.
Plain Instruction
The plain instruction is pretty straightforward. For most instructions, it is an opcode followed by immediate arguments (if any). The following example shows the general writing of most instructions other than control instructions:
(module
(memory 1 2)
(global $g1 (mut i32) (i32.const 0))
(func $f1)
(func $f2 (param $a i32)
i32.const 123
i32.load offset=100 align=4
i32.const 456
i32.store offset=200
global.get $g1
local.get $a
i32.add
call $f1
drop
)
)
As the example shows, the immediate arguments of most instructions cannot be omitted; they follow the opcode as numeric values or names. Memory load/store instructions are an exception: both the offset and align immediate arguments are optional, and when present they must be written explicitly in key=value form.
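The optional memory arguments can be sketched as follows (the function name $loads is hypothetical). When omitted, offset defaults to 0 and align defaults to the natural alignment of the access; when both are given, offset comes first:

```wat
(module
  (memory 1)
  (func $loads (result i32)
    i32.const 0
    i32.load                   ;; offset=0, align=4 (natural alignment of i32)
    drop
    i32.const 0
    i32.load offset=8          ;; only the offset is given
    drop
    i32.const 0
    i32.load offset=8 align=2  ;; both given, offset before align
  )
)
```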
The three structured control instructions, block, loop, and if, can specify optional result types and must end with the keyword end. The if instruction can also be split into two branches with the keyword else. The following example shows the general way of writing control instructions such as block, loop, if, br, and br_if:
(module
(func $foo
block $l1 (result i32)
i32.const 123
br $l1
loop $l2
i32.const 123
br_if $l2
end
end
drop
)
(func $max (param $a i32) (param $b i32) (result i32)
local.get $a
local.get $b
i32.gt_s
if (result i32)
local.get $a
else
local.get $b
end
)
)
The br_table instruction is written similarly to the br instruction: the labels are written one by one after the opcode, separated by spaces, with the default label last. Here is an example:
(module
(func
block
block
block
i32.const 3
br_table 0 1 2 0 ;; labels: 0,1,2, default: 0
end
end
end
)
)
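Labels in br_table can also be written as identifiers, which tends to make switch-like code easier to follow. The following sketch uses hypothetical names ($classify, $zero, $one, $default) and returns a different constant for each case:

```wat
(module
  (func $classify (param $i i32) (result i32)
    block $default
      block $one
        block $zero
          local.get $i
          br_table $zero $one $default  ;; 0 -> $zero, 1 -> $one, otherwise $default
        end
        ;; execution continues here after a branch to $zero
        i32.const 100
        return
      end
      ;; execution continues here after a branch to $one
      i32.const 200
      return
    end
    ;; execution continues here after a branch to $default
    i32.const 300
  )
)
```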
Folded Instruction
In addition to the plain format described above, instructions can also be written in a folded form. Turning the plain format into the folded one takes three steps. First, wrap the instruction in parentheses. Second, remove the keyword end if it is a block, loop, or if instruction (if instructions are a bit more complicated; see the example below for details). Third (optional), if an instruction (plain or folded) and the instructions preceding it logically form one group of operations, the preceding instructions can be folded into it. For example, the three instructions local.get $a, local.get $b, and i32.add logically form one addition, so they can be folded into (i32.add (local.get $a) (local.get $b)).
The folded instruction actually expresses an instruction tree, which the WAT compiler expands by post-order traversal. Rewriting the previous example containing the foo() and max() functions using the three steps described above gives the following code:
(module
(func $foo
(block $l1 (result i32)
(i32.const 123)
(br $l1)
(loop $l2
(br_if $l2 (i32.const 123))
)
)
(drop)
)
(func $max (param $a i32) (param $b i32) (result i32)
(if (result i32)
(i32.gt_s (local.get $a) (local.get $b))
(then (local.get $a))
(else (local.get $b))
)
)
)
As you can see, the code is considerably easier to read. To deepen our understanding of the folded instruction, let's unfold the if instruction of the max() function by one layer, pulling out the i32.gt_s instruction, and rewrite it into the following equivalent form:
(module
(func $max (param $a i32) (param $b i32) (result i32)
(i32.gt_s (local.get $a) (local.get $b))
(if $l (result i32)
(then (local.get $a))
(else (local.get $b))
)
)
)
We can continue by unfolding the i32.gt_s instruction, pulling out the local.get instructions, and rewrite it into the following equivalent form:
(module
(func $max (param $a i32) (param $b i32) (result i32)
(local.get $a) (local.get $b) (i32.gt_s)
(if $l (result i32)
(then (local.get $a))
(else (local.get $b))
)
)
)
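The post-order expansion can also be seen on a nested arithmetic expression. In the following sketch (the function names $folded and $plain are hypothetical), the two functions should compile to the same instruction sequence: the operands of each folded instruction are emitted first, from left to right, followed by the instruction itself:

```wat
(module
  ;; Folded form: computes (a + b) * c.
  (func $folded (param $a i32) (param $b i32) (param $c i32) (result i32)
    (i32.mul
      (i32.add (local.get $a) (local.get $b))
      (local.get $c)))

  ;; Plain form obtained by post-order traversal of the tree above.
  (func $plain (param $a i32) (param $b i32) (param $c i32) (result i32)
    local.get $a
    local.get $b
    i32.add
    local.get $c
    i32.mul)
)
```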
That’s all for the basic syntax of WAT.