Wasm Introduction (Part 7): Text Format

Written by the CoinEx Chain lab, this article is the seventh in the Wasm Introduction series and introduces the text format. CoinEx Chain is the world’s first public chain exclusively designed for DEX, and will also include a Smart Chain supporting smart contracts and a Privacy Chain protecting users’ privacy.

In the previous articles we discussed the WebAssembly (Wasm for short) binary format and instruction set in detail. This article focuses on the Wasm Text Format (hereinafter referred to as WAT).

Overall Structure

(module
  ;; fields
  (type ... )
  (import ... )
  (func ... )
  (table ... )
  (memory ... )
  (global ... )
  (export ... )
  (start ... )
  (elem ... )
  (data ... )
)

Text format is another representation of binary format, but it is more friendly to humans. Binary format is better suited to being generated by machines (such as compilers) and consumed by them (such as Wasm interpreters), while text format is better suited to being written and read by people. Besides the obvious difference in representation, the two formats differ in structure as follows:

  • In binary format data are organized in sections, while in text format contents are organized in fields. The WAT compiler needs to collect fields of the same type and merge them into binary sections.
  • In binary format, every section except the custom section can appear at most once, and sections must appear in ascending order of section ID. Text format places no such restriction on the number or order of fields, with one exception: import fields must appear before function, table, memory, and global fields. In addition, there is no custom field in the text format, so there is no way to express custom sections.
  • Fields and sections correspond roughly one to one, but there is no separate code field: the contents of the function section and the code section are written together in the function field.
  • Text format provides multiple inline forms for easy writing. For example:
      * Function, table, memory, and global fields can contain inline import and export fields.
      * Table fields can contain inline element fields.
      * Memory fields can contain inline data fields.
      * Function and import fields can contain inline type fields.
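The ordering rules above can be illustrated with a small sketch. The names $log, $twice, and $unary are made up for this example. The fields are deliberately written out of section order, with only the import kept first, and the compiler is expected to sort them into the right sections.

```wat
(module
  ;; Imports come first: they must precede func/table/memory/global fields.
  (import "env" "log" (func $log (param i32)))

  ;; A function field may appear before the type field it references.
  (func $twice (type $unary)
    (i32.mul (local.get 0) (i32.const 2))
  )

  ;; An export field may appear anywhere among the other fields.
  (export "twice" (func $twice))

  ;; The type field comes last in the text, yet it ends up in the
  ;; type section, which is the first section of the binary.
  (type $unary (func (param i32) (result i32)))
)
```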

Next, we introduce the fields in ascending order of section ID.

Type Field

(module
  (type (func (param i32) (param i32) (result i32)))
)

The type field above defines a function type that takes two i32 parameters and returns one i32 result. We can assign an identifier to the function type as its name, so that the function type can be referenced by name elsewhere instead of by index. module, type, func, param, and result are keywords of the WAT language. Identifiers must begin with a $ sign, followed by one or more letters, digits, or certain punctuation characters. For the complete lexical rules for identifiers, please refer to Section 6.3.5 of the Wasm Specification. In addition, several parameters can be written in a single (param) as a short form. The following example shows the usage of identifiers and the short form of parameters:

(module
  (type $ft1 (func (param i32 i32) (result i32)))
  (type $ft2 (func (param f64)))
)
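To make the difference between referencing a type by index and by name concrete, here is a small sketch. The function names $by_index and $by_name are hypothetical. Both functions use the same function type; only the way the type is referenced differs.

```wat
(module
  (type $ft1 (func (param i32 i32) (result i32)))  ;; type index 0
  (type $ft2 (func (param f64)))                   ;; type index 1

  ;; References the first type by its index.
  (func $by_index (type 0)
    (i32.add (local.get 0) (local.get 1))
  )

  ;; References the same type by its identifier.
  (func $by_name (type $ft1)
    (i32.sub (local.get 0) (local.get 1))
  )
)
```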

Import & Export Field

(module
  (type $ft1 (func (param i32 i32) (result i32)))
  (import "env" "f1" (func $f1 (type $ft1)))
  (import "env" "t1" (table $t 1 8 funcref))
  (import "env" "m1" (memory $m 4 16))
  (import "env" "g1" (global $g1 i32)) ;; immutable
  (import "env" "g2" (global $g2 (mut i32))) (; mutable ;)
)

As can be seen from the above example, an import field needs to specify the module name, the element name, and the specific type of the imported element. Names are represented by strings and need to be enclosed in double quotes. The imported element can also be given an identifier, just like a type field, so that it can be referenced by name later. WAT supports two types of comments: single-line comments that begin with ;; and run to the end of the line, and block comments that begin with (; and end with ;) and may span multiple lines. The type of an imported function can also be written inline instead of referencing a type field, as the following example shows:

(module
  (import "env" "f1"
    (func $f1
      (param i32 i32) (result i32) ;; inline type
    )
  )
)
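The two comment styles mentioned earlier can be sketched as follows. This is only an illustration; the import itself is taken from the example above.

```wat
(module
  ;; A line comment runs to the end of the line.

  (; A block comment starts with an opening parenthesis and a semicolon,
     can span several lines,
     and ends with a semicolon and a closing parenthesis. ;)

  ;; A block comment can also sit in the middle of a field.
  (import "env" "g1" (global $g1 (; immutable ;) i32))
)
```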

Compared to the import field, the export field is easier to write: it only needs to specify the export name and the index (or identifier) of the exported element. Please note that export names must be unique within the entire module. The following example shows export fields for the four kinds of elements:

(module
  ;; ...
  (export "f1" (func $f1))
  (export "f2" (func $f2))
  (export "t1" (table $t))
  (export "m1" (memory $m))
  (export "g1" (global $g1))
  (export "g2" (global $g2))
)
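The uniqueness rule applies to export names, not to the exported elements. As a minimal sketch (the export names "one" and "uno" are made up), the same function can be exported under two different names, while reusing a name is rejected:

```wat
(module
  (func $f1 (result i32)
    (i32.const 1)
  )

  ;; Allowed: one function exported under two different names.
  (export "one" (func $f1))
  (export "uno" (func $f1))

  ;; Not allowed: a second export named "one" would make the module invalid.
  ;; (export "one" (func $f1))
)
```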

Import and export fields can be put inline in function, table, memory, and global fields. The following example shows the inline writing of the import field:

(module
  (type $ft1 (func (param i32 i32) (result i32)))
  (func $f1 (import "env" "f1") (type $ft1))
  (table $t1 (import "env" "t") 1 8 funcref)
  (memory $m1 (import "env" "m") 4 16)
  (global $g1 (import "env" "g1") i32)
  (global $g2 (import "env" "g2") (mut i32))
)

The following example shows the inline writing of the export field (see below for more details on how to write function, table, memory, and global fields):

(module
  (func $f (export "f1") ... )
  (table $t (export "t") ... )
  (memory $m (export "m") ... )
  (global $g (export "g1") ... )
)
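The example above elides the bodies of the fields. One possible way to fill them in, purely as an illustration with arbitrarily chosen limits and values, is:

```wat
(module
  ;; Each field carries its own export name inline.
  (func $f (export "f1") (result i32)
    (i32.const 42)
  )
  (table $t (export "t") 1 funcref)
  (memory $m (export "m") 1)
  (global $g (export "g1") i32 (i32.const 7))
)
```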

Function Field

(module
  (type $ft1 (func (param i32 i32) (result i32)))
  (func $add (type $ft1)
    (local i64 i64)

    ;; instructions
    (local.get 3) (drop)
    (i32.add (local.get 0) (local.get 1))
  )
)

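In fact, the parameters of a function are also ordinary local variables. Together with the local variables declared in the function field, they make up the function's local variable space, whose indexes start from 0 with the parameters first. In the example above, the two i32 parameters have indexes 0 and 1, and the two i64 local variables have indexes 2 and 3.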

The above is a simplified version of the function field: it references the function type directly and declares all local variables in a single (local). We can also write the function type inline in the function field and split (param) into several, one per parameter, so that each parameter can be named. In the same way, (local) can be split into several to name the local variables. Once parameters and local variables have names, variable instructions can refer to them by name instead of by index, which improves the readability of the code. Let's rewrite the above example with an inline type and assign identifiers to the parameters and local variables:

(module
  (func $f1 (param $a i32) (param $b i32) (result i32)
    (local $c i64) (local $d i64)

    (local.get $c) (drop)
    (i32.add (local.get $a) (local.get $b))
  )
)
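Names and indexes are interchangeable, because a name is only an alias for a position in the local variable space. The following sketch reuses the function above, adds one local.set for illustration, and mixes both styles:

```wat
(module
  (func $f1 (param $a i32) (param $b i32) (result i32)
    (local $c i64) (local $d i64)
    ;; Local variable space: $a = 0, $b = 1, $c = 2, $d = 3.

    (local.set 2 (i64.const 10))            ;; same as (local.set $c ...)
    (local.get $c) (drop)                   ;; reads the value stored by index
    (i32.add (local.get 0) (local.get $b))  ;; index and name side by side
  )
)
```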

Table & Element Field

(module
  (func $f1) (func $f2) (func $f3)
  (table 10 20 funcref)
  (elem (offset (i32.const 5)) $f1 $f2 $f3)
)

We can also write an element field inline in a table field. In that case we cannot specify the table limits; the compiler infers them from the inline elements. Nor can we specify the starting offset of the elements, which is always 0. The following example shows the inline writing of element fields:

(module
  (func $f1) (func $f2) (func $f3)
  (table funcref ;; min: 3, max: 3
    (elem $f1 $f2 $f3) ;; inline elem, offset: 0
  )
)
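Assuming the compiler infers the limits and offset exactly as the comments in the inline example state, the inline form should be equivalent to writing the table and element fields separately, roughly as in this sketch:

```wat
(module
  (func $f1) (func $f2) (func $f3)
  (table 3 3 funcref)                        ;; min: 3, max: 3
  (elem (offset (i32.const 0)) $f1 $f2 $f3)  ;; offset: 0
)
```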

Memory & Data Field

(module
  (memory 4 16)
  (data (offset (i32.const 100)) "Hello, ")
  (data (offset (i32.const 108)) "World!\n")
)

A data field can also be written inline in a memory field. In that case we cannot specify the page limits of the memory; the compiler infers them from the inline data. Nor can we specify the starting offset of the data, which is always 0. In addition, the initial data can be written as multiple strings. The following example shows the inline writing of the data field:

(module
  (memory ;; min: 1, max: 1
    (data "Hello, " "World!\n") ;; inline data, offset: 0
  )
)

With escape characters, we can easily embed special characters such as new lines, hex-encoded bytes, and Unicode code points in strings. For details, please refer to Section 6.3.3 of the Wasm Specification.
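As a rough illustration of these escapes, the following sketch writes one data field as several strings. The contents are arbitrary and chosen only to show each kind of escape.

```wat
(module
  (memory 1)
  (data (offset (i32.const 0))
    "line\n"                   ;; newline escape
    "tab\t"                    ;; tab escape
    "\48\69"                   ;; two hex-encoded bytes: the ASCII text "Hi"
    "\u{1F600}"                ;; a Unicode code point, stored as UTF-8
    "quote: \" backslash: \\"  ;; escaped double quote and backslash
  )
)
```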

Global Field

(module
  (global $g1 (mut i32) (i32.const 100)) ;; mutable
  (global $g2 (mut i32) (i32.const 200)) ;; mutable
  (global $g3 f32 (f32.const 3.14)) ;; immutable
  (global $g4 f64 (f64.const 2.71)) ;; immutable
  (func
    (global.get $g1)
    (global.set $g2)
  )
)
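The difference between mutable and immutable globals shows up as soon as a function tries to write to them. Here is a minimal sketch of a counter; the names $counter, $step, and $inc are hypothetical.

```wat
(module
  (global $counter (mut i32) (i32.const 0))  ;; mutable: may be written
  (global $step i32 (i32.const 1))           ;; immutable: read-only

  ;; Adds $step to $counter and returns the new value.
  (func $inc (export "inc") (result i32)
    (global.set $counter
      (i32.add (global.get $counter) (global.get $step)))
    (global.get $counter)
  )

  ;; A global.set on $step would be rejected when the module is validated.
)
```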

Start Field

(module
  (func $main ... )
  (start $main)
)
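The start field names a function that runs automatically when the module is instantiated. That function must take no parameters and return no results. A complete sketch, with a made-up global $ready used only to give the function something to do, might look like this:

```wat
(module
  (global $ready (mut i32) (i32.const 0))

  ;; No parameters and no results, as required for a start function.
  (func $main
    (global.set $ready (i32.const 1))
  )

  (start $main)
)
```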

Having introduced the overall structure of WAT and how to write the various fields, we now turn to how instructions are written.

Plain Instruction

(module
  (memory 1 2)
  (global $g1 (mut i32) (i32.const 0))
  (func $f1)
  (func $f2 (param $a i32)
    i32.const 123
    i32.load offset=100 align=4
    i32.const 456
    i32.store offset=200
    global.get $g1
    local.get $a
    i32.add
    call $f1
    drop
  )
)

As the example shows, the immediate arguments of most instructions cannot be omitted; they follow the opcode as numeric values or names. Memory load/store instructions are an exception: both the offset and align immediate arguments are optional, and when present they must be written explicitly by name, with the value following the equal sign.
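The optional memory immediates can be combined in four ways. The following sketch (the function name $loads is made up) shows each of them for i32.load; when an immediate is omitted, offset defaults to 0 and align defaults to the natural alignment of the access, which is 4 bytes here.

```wat
(module
  (memory 1)
  (func $loads (param $addr i32)
    local.get $addr
    i32.load                   ;; defaults: offset=0, align=4
    drop

    local.get $addr
    i32.load offset=8          ;; only the offset is given
    drop

    local.get $addr
    i32.load align=1           ;; only the alignment is given
    drop

    local.get $addr
    i32.load offset=8 align=2  ;; both; offset is written before align
    drop
  )
)
```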

The three structured control instructions, block, loop, and if, can specify optional result types and must end with the keyword end. The if instruction can also be split into two branches with the keyword else. The following example shows the general way of writing control instructions such as block, loop, if, br, and br_if:

(module
  (func $foo
    block $l1 (result i32)
      i32.const 123
      br $l1
      loop $l2
        i32.const 123
        br_if $l2
      end
    end
    drop
  )
  (func $max (param $a i32) (param $b i32) (result i32)
    local.get $a
    local.get $b
    i32.gt_s
    if (result i32)
      local.get $a
    else
      local.get $b
    end
  )
)
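For a slightly more practical use of block, loop, br, and br_if, here is a sketch of a function that sums the integers from $n down to 1. The function and label names are hypothetical, and $n is assumed to be non-negative.

```wat
(module
  (func $sum (export "sum") (param $n i32) (result i32)
    (local $acc i32)
    block $done
      loop $again
        local.get $n
        i32.eqz
        br_if $done    ;; branching to a block label exits the block

        local.get $acc
        local.get $n
        i32.add
        local.set $acc ;; acc = acc + n

        local.get $n
        i32.const 1
        i32.sub
        local.set $n   ;; n = n - 1

        br $again      ;; branching to a loop label jumps back to its start
      end
    end
    local.get $acc
  )
)
```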

The br_table instruction is written similarly to the br instruction. The labels are written one by one after the opcode, separated by spaces, and the last one is the default label. Here is an example:

(module
  (func
    block
      block
        block
          i32.const 3
          br_table 0 1 2 0 ;; labels: 0,1,2, default: 0
        end
      end
    end
  )
)
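Labels in br_table can also be written by name, which makes it read like a switch statement. In the following sketch (the function name, label names, and return values are made up), an index of 0 returns 100, an index of 1 returns 200, and any other index returns 300.

```wat
(module
  (func $classify (param $i i32) (result i32)
    block $default
      block $one
        block $zero
          local.get $i
          br_table $zero $one $default  ;; last label is the default
        end
        ;; reached when $i == 0
        i32.const 100
        return
      end
      ;; reached when $i == 1
      i32.const 200
      return
    end
    ;; reached for every other value of $i
    i32.const 300
  )
)
```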

Folded Instruction

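A plain instruction can be turned into folded form as follows: wrap the instruction in parentheses; for a structured control instruction, drop the keyword end (the closing parenthesis takes its place) and wrap the branches of if in (then ...) and (else ...); finally, the instructions that produce an instruction's operands may be nested inside its parentheses. The folded instruction actually expresses an instruction tree, and the WAT compiler expands it by post-order traversal. Rewriting the previous example containing the foo() and max() functions in folded form gives the following code: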

(module
  (func $foo
    (block $l1 (result i32)
      (i32.const 123)
      (br $l1)
      (loop $l2
        (br_if $l2 (i32.const 123))
      )
    )
    (drop)
  )
  (func $max (param $a i32) (param $b i32) (result i32)
    (if (result i32)
      (i32.gt_s (local.get $a) (local.get $b))
      (then (local.get $a))
      (else (local.get $b))
    )
  )
)

As you can see, the code does look a lot better. To deepen our understanding of the folded instruction, let’s expand the if instruction of the max() function by one layer, extract the i32.gt_s instruction, and rewrite it into the following equivalent form:

(module
  (func $max (param $a i32) (param $b i32) (result i32)
    (i32.gt_s (local.get $a) (local.get $b))
    (if $l (result i32)
      (then (local.get $a))
      (else (local.get $b))
    )
  )
)

We can continue to expand the i32.gt_s instruction, extract the local.get instructions, and rewrite it into the following equivalent form:

(module
  (func $max (param $a i32) (param $b i32) (result i32)
    (local.get $a) (local.get $b) (i32.gt_s)
    (if $l (result i32)
      (then (local.get $a))
      (else (local.get $b))
    )
  )
)
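The post-order expansion is perhaps easiest to see on a purely arithmetic expression. The following sketch (the function names $folded and $plain are hypothetical) computes (a + b) * 2 twice: once as a folded instruction tree, and once as the plain instruction sequence that the tree should expand to.

```wat
(module
  ;; Folded form: the nesting mirrors the expression tree.
  (func $folded (param $a i32) (param $b i32) (result i32)
    (i32.mul
      (i32.add (local.get $a) (local.get $b))
      (i32.const 2))
  )

  ;; Plain form: the same tree visited in post-order,
  ;; operands first and the consuming instruction last.
  (func $plain (param $a i32) (param $b i32) (result i32)
    local.get $a
    local.get $b
    i32.add
    i32.const 2
    i32.mul
  )
)
```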

That’s all for the basic syntax of WAT.