This document contains a high-level summary of the native concurrency support, providing background for understanding the definitions in the WIT, AST explainer, binary format and Canonical ABI explainer documents that are gated by the 🔀 (async) and 🧵 (threading) emojis.
- Goals
- Summary
- Concepts
- Interaction with the start function
- Async ABI
- Examples
- Component Instance Lifetime
- TODO
With the release of WASI 0.3, the following concurrency-specific goals and use cases are added, refining the Component Model's high-level goals and use cases:
- Integrate with idiomatic source-language concurrency features including:
asyncfunctions in languages like C#, JS, Python, Rust and Swift- coroutines in languages like Kotlin, Perl, PHP and (recently) C++
- green threads scheduled by the language's own runtime in languages like Go and (initially and recently again) Java
- host threads that are scheduled outside the language's own runtime in languages like C, C++, C#, Python, Rust and many more that expose pthreads or other OS threads
- promises, futures, streams and channels
- callbacks, in languages with no other built-in concurrency mechanisms
- Provide fiber-like stack-switching capabilities via Core WebAssembly import calls in a way that composes with, and is specified in terms of, but doesn't actually depend on, the Core WebAssembly stack-switching proposal.
- Allow polyfilling in browsers via JavaScript Promise Integration (JSPI)
- Avoid partitioning interfaces and components into separate ecosystems based on degree of concurrency; don't give components a "color".
- Allow runtimes to maintain meaningful cross-language call stacks (for the benefit of debugging, logging, tracing and profiling).
- Consider backpressure and cancellation as part of the design.
- Allow non-reentrant synchronous and event-loop-driven core wasm code that assumes a single global linear memory stack to not have to worry about additional reentrancy.
To support the wide variety of language-level concurrency mechanisms listed
above, the Component Model defines a new low-level, language-agnostic async
calling convention (the "async ABI") for both calling into and calling out of
Core WebAssembly. Language compilers and runtimes can bind to this async ABI in
the same way that they already bind to various OS's concurrent I/O APIs (such
as select, epoll, io_uring, kqueue and Overlapped I/O) making the
Component Model "just another OS" from the language toolchain's perspective.
In addition to adding a new async ABI for use by the language's compiler and
runtime, the Component Model also adds a new async effect type that can be
added to function types (in both WIT and raw component function type
definitions) to indicate that the function may block before returning its value.
interface processor {
process: async func(in: inputs) -> outputs; /* may block */
ready: func() -> bool; /* may not block */
}When a function type does not contain async, the Component Model traps if
the callee blocks at runtime before returning a value (as described in more
detail below). Thus, the absence of async means that a host or
component caller never needs to handle the case where the callee blocks before
returning a value. For hosts like browsers with event-loop concurrency, this
invariant is necessary to allow non-async component exports to be called in
synchronous contexts (like event listeners, callbacks, getters, setters and
constructors).
The new async ABI can be used alongside or instead of the existing WASI 0.2
"sync ABI" to call or implement any async-typed functions. When calling an
imported function via the async ABI, if the async callee blocks,
control flow is returned immediately to the caller, and the callee continues
executing concurrently. When implementing an async function via the async
ABI, multiple concurrent export calls are allowed to be made by the caller.
Critically, both sync-ABI-calls-async-ABI and async-ABI-calls-sync-ABI pairings
have well-defined, composable behavior for both inter-component and
intra-component calls.
Because async function exports may be implemented with the sync ABI and
then call async function imports using the sync ABI, traditional sync code
can compile directly to components exporting async functions without having
to be rewritten to use source-language concurrency mechanisms (like callbacks,
async/wait, coroutines, etc). For example, traditional C programs with a
main() and calls to read(), write() and select() can run without change
in the WASI 0.3 wasi:cli/command world, which exports run: async func() -> result. Thus, async in WIT does not require the same kind of transitive
source-code changes as source-level async in languages like C#, Python, JS,
Rust and Dart.
Because async exports impose little to no requirements on the guest
language's style of concurrency, most worlds (including wasi:cli/command,
wasi:http/service and wasi:http/middleware) are expected to export async
functions so that the contained Core WebAssembly code is free to block.
Implementing a non-async function will primarily only arise when a component
is virtualizing the non-async imports of a world (e.g., the getters and
setters of wasi:http/types.headers). In this more exotic virtualization
scenario, a future extension could allow a parent component that
imports async functions to implement its child's non-async imports in the
same manner as JSPI in the browser.
Thus, overall, async in WIT and the Component Model does not behave like a
"color" in the sense described by the popular What Color Is Your Function?
essay.
Each time a component export is called, the wasm runtime logically spawns a new green thread (as opposed to a kernel thread) to execute the export call concurrently with other calls in the runtime. This means that thread-local storage is never reused between export calls and, in general, a caller's thread's identity is never observable to the callee. In some cases (such as when only sync ABI components are used) the runtime can statically, as an optimization, make a plain synchronous function call with the same wasm-observable behavior as-if it had created a new thread. But in general, when one component makes an async call that transitively blocks in another component, having the callee on its own native callstack is needed for the runtime to be able to switch back to the caller without having to unwind the stack.
In addition to the implicit threads logically created for export calls, Core
WebAssembly code can also explicitly create new green threads by calling the
thread.new-indirect built-in. Regardless of how they were created, all
threads can call a set of Component Model-defined thread.* built-in functions
(listed below) to suspend themselves and/or resume other
threads. These built-ins provide sufficient functionality to implement both the
internally-scheduled "green thread" and the externally-scheduled "host thread"
use cases mentioned in the goals.
Until the Core WebAssembly shared-everything-threads proposal allows Core
WebAssembly function types to be annotated with shared, thread.new-indirect
can only call non-shared functions (via (table funcref) index, just
like call_indirect) and thus currently all threads must execute
cooperatively in a sequentially-interleaved fashion, switching between
threads only at explicit program points just like (and implementable via) a
traditional OS fiber. While these cooperative threads do not allow a
single component instance to increase its internal parallelism, cooperative
threads are still quite useful for getting existing threaded code to Just Work
(as-if running on a single core) without the overhead of CPS Transform
techniques like Asyncify and without depending on shared-everything-threads.
Moreover, in various embeddings, all available parallelism is already saturated
by running independent component instances on separate kernel threads.
Because new threads are (semantically, if not physically) created at all
cross-component call boundaries, the degree of shared and non-shared thread
use is kept an encapsulated implementation detail of a component (similar to
the choice of linear vs. GC memory). This enables component authors to
compatibly change their implementation strategy over time, starting simple and
adding complexity for performance as needed over time.
To provide wasm runtimes with additional optimization opportunities for
languages with "stackless" concurrency (e.g. languages using async/await),
two async ABI sub-options are provided: a "stackless" async ABI selected by
providing a callback function and a "stackful" async ABI selected by not
providing a callback function. The stackless async ABI allows core wasm to
repeatedly return to an event loop to receive events (delivered to the
callback), thereby clearing the native stack for the benefit of the wasm
runtime while waiting in the event loop.
To propagate backpressure, it's necessary for a component to be able to say
"there are too many concurrent export calls already in progress, don't start
any more async calls until I let some of the current calls complete". Thus,
the Component Model provides a built-in way for a component instance to apply
and release backpressure that callers experience by having their import call
immediately block.
This backpressure mechanism provides the basis for how the sync and async ABIs interoperate:
- If a component calls an import using the async ABI, and the import is implemented by a component using the sync ABI, the callee first acquires an "exclusive" lock on the component instance and then starts executing. If the callee blocks, execution is immediately transferred back to the caller (as required by the async ABI).
- If another async call attempts to start in this same component instance, the callee immediately blocks when acquiring the "exclusive" lock, waiting for the previous call to return and release the lock.
Note that because functions without async in their type are not allowed to
block, non-async functions do not attempt to acquire the "exclusive" lock;
they just barge in. Components exporting a mix of async and non-async
functions (which again mostly only arises in the more advanced virtualization
scenarios) must therefore take care to handle the "barge-in" case gracefully.
Because this nested non-async call will complete synchronously without
blocking, this behavior does not break Component Invariant #3: a single
global shadow stack can still be (re)used in a LIFO manner, much like a
traditional signal handler.
Lastly, WIT is extended with two new type constructors—future<T> and
stream<T>—to allow new WIT interfaces to explicitly represent concurrency in
both the sync and async ABIs in way that can be bound to many language's
idiomatic futures, promises, streams and channels. Futures and streams are,
semantically, unidirectional unbuffered channels with a dynamically-enforced
session type describing the passing of exactly 1 or 0..N values, resp., with
the additional ability for the reader end to signal a loss of interest
to the writer end. Thus, futures and streams are more primitive concepts than,
e.g., Unix pipes (which have an associated intermediate memory buffer that
values are copied into and out of). Rather, streams could be used to define
higher-level concepts like pipes, HTTP response bodies or stream transformers.
E.g.:
resource pipe {
constructor(buffer-size: u32);
write: func(bytes: stream<u8>) -> result;
read: func() -> stream<u8>;
}
resource response {
constructor(body: stream<u8>);
consume-body: func() -> stream<u8>;
}
transform: func(in: stream<point>) -> stream<point>;A future or stream in a function signature always refers to the transfer of
unique ownership of the readable end of the future or stream. To get a
writable end, a component must first internally create a (readable, writable)
end pair (via the {stream,future}.new built-ins) and then pass the readable
end elsewhere (e.g., in the above WIT, as a parameter to an imported
pipe.write or as a result of an exported transform). Given the readable or
writable end of a future or stream (represented as an i32 index into the
component instance's handle table), Core WebAssembly can then call a
{stream,future}.{read,write} built-in to synchronously or asynchronously
copy into or out of a caller-provided buffer of Core WebAssembly linear (or,
soon, GC) memory.
The following concepts are defined as part of the Component Model's concurrency support.
As described in the summary, each call to a component export logically creates a new (green) thread which, in many cases, can be optimized away and replaced with a synchronous function call. Each call to a component export also creates a new task that contains this new thread. Whereas a thread contains a callstack and other execution state, a task contains ABI bookkeeping state that is used to enforce the Canonical ABI's rules for export calls. Tasks are themselves contained by the component instance whose export was called. Thus, the overall containment relationship is:
Component Store
↓ contains
Component Instance
↓ contains
Task
↓ contains
Thread
where a component store is the top-level "thing" and analogous to a Core WebAssembly store.
The reason for the thread/task split is that, when one thread creates a new thread, the new thread is contained by the task of the original thread which creates an N:1 relationship between threads and tasks that ties N threads to the original export call (= "task") that transitively spawned those N threads. This relationship serves several purposes described in the following sections.
In the Canonical ABI explainer, threads, tasks, component instances and
component stores are represented by the Thread, Task,
ComponentInstance and Store classes, resp.
As mentioned above, calling a component export creates a task to track the state used to enforce Canonical ABI rules that apply to the callee (an example being: the number of received borrowed handles that still need to be dropped before the call returns).
Symmetrically, calling a component import creates a subtask to track the state used to enforce Canonical ABI rules that apply to the caller (an example being: which handles have been lent that the caller can't drop until the call resolves).
When one component calls another, there is thus a new task+subtask pair created to ensure that both components uphold their end of the Canonical ABI rules. But when the host calls a component export, there is only a task and, symmetrically, when a component calls a host-defined import, there is only a subtask. Thus, the async call stack at the point when a component calls a host-defined import will have the general form:
[Host]
↓ host calls component export
[Component Task]
↓ component calls import implemented by another component's export 0..N times
[Component Subtask <> Component Task]*
↓ component calls import implemented by the host
[Component Subtask <> Host task]
Here, the ↓ arrow represents the subtask relationship (the dual of which
is the supertask relationship). Since a task+subtask pair have the same
supertask, they can be thought of as a single node in the async call stack.
A subtask/supertask relationship is immutably established when an import is called, setting the current task as the supertask of the new subtask created for the import call. Thus, one reason for associating every thread with a "containing task" is to ensure that there is always a well-defined async call stack.
The async call stack is not currently observable to running components, except
that it may nondeterministically appear as part of the callstack stored in
error-context 📝. (In the future, functionality could be added to
allow a donut wrapping parent to follow the async call stack from a child's
import call to a child's export call.) Instead, the async call stack is
currently used to provide backtraces when debugging, profiling, tracing and
logging. While particular languages can and do maintain their own async call
stacks in core wasm state, without the Component Model's async call stack,
linkage between different languages would be lost at component boundaries,
leading to a loss of overall context in multi-component applications.
There is an important gap between the Component Model's minimal form of Structured Concurrency and the Structured Concurrency support that appears in popular source language features/libraries. Often, "Structured Concurrency" refers to an invariant that all "child tasks" finish or are cancelled before a "parent task" completes. However, the Component Model doesn't force a subtask's threads to all return before the supertask's threads all return. The reason for not enforcing a stricter form of Structured Concurrency at the Component Model level is that there are important use cases where forcing a supertask's thread to stay resident just to wait for subtasks to finish would waste resources without tangible benefit. Instead, we can say that once a supertask's last thread finishes execution, the supertask semantically "tail calls" any still- executing subtasks, staying technically-alive and on the async call stack until they complete, but not consuming real resources.
For scenarios where one component wants to non-cooperatively put an upper bound on execution of a call into another component, a separate "blast zone" feature is necessary in any case (due to iloops and traps).
At any point in time while executing Core WebAssembly code or a canonical built-in called by Core WebAssembly code, there is a well-defined current thread whose containing task is the current task.
The "current thread" is specified in terms of stack-switching with a
current-thread effect for retrieving the current thread from the parent
resume handler's state. However, due to structural invariants, engines can
reliably optimize this current-thread effect by storing the current thread in
the VM's execution state (or a special Core WebAssembly global) so that it
could be cheaply loaded and/or kept in register state.
Threads store their containing task so that the "current task" is always
current_thread.task.
Because there is always a well-defined current task and tasks are always created for calls to typed functions, it is therefore also always well-defined to refer to the current task's function's type, e.g., when returning a value or determining whether blocking is allowed.
The Component Model provides a set of built-in Core WebAssembly functions for creating and running threads.
New threads are created with the thread.new-indirect built-in. As mentioned
above, a spawned thread inherits the task of the spawning
thread which is why threads and tasks are N:1. thread.new-indirect adds a new
thread to the component instance's threads table and returns the i32 index of
this table entry to the Core WebAssembly caller. Like pthread_create,
thread.new-indirect takes a Core WebAssembly function (via index into a
funcref table) and a "closure" parameter to pass to the function when called
on the new thread. However, unlike pthread_create, the new thread is
initially in a "suspended" state and must be explicitly "resumed" using one of
the following 3 thread built-ins. Once the thread is resumed, the thread can
learn its own index by calling the thread.index built-in.
A suspended thread (identified by thread-table index) can be resumed at some
nondeterministic point in future via the thread.resume-later built-in. In
contrast, the thread.yield-then-resume built-in switches execution to the
given thread immediately, leaving the calling thread to be resumed at some
nondeterministic point in the future. Lastly, the thread.suspend-then-resume
built-in switches execution to the given thread immediately, like
thread.yield-then-resume, but leaves the calling thread in the "suspended"
state. These three functions can be used to resume both newly-created threads as
well as threads that executed and then suspended.
Threads can also explicitly put themselves in the "suspended" state without
specifying the other thread to run by calling the thread.suspend built-in.
This is useful if a thread needs to wait on some condition that will be met by
some unknown thread in the future (which will resume the suspended thread).
Similarly, threads can explicitly put themselves in the "ready to run" state
without specifying the other thread to run by calling thread.yield. This is
useful if a thread has a long-running computation without I/O but still needs to
allow other cooperative threads to make progress concurrently.
Lastly, in addition to being able to switch to "suspended" threads, threads can
also switch to threads that are in a "ready to run" state by calling the
thread.suspend-then-promote and thread.yield-then-promote built-ins
which, like the thread.{suspend,yield}-then-resume built-ins, leave the
calling thread in a "suspended" or "ready to run" state, resp. The calling
thread may know that the target thread is ready to run (e.g., because the
target thread is known to have yielded or to be waiting on a future/stream
operation that the calling thread just completed). However, in general,
readiness may depend on nondeterministic external I/O and the calling thread may
just want to yield its timeslice to the target thread if it's ready as a
scheduling optimization. Thus, if the target thread is not "ready to run",
these built-ins are defined to gracefully fall back to the behavior of the
thread.{suspend,yield} built-ins.
Together, these thread built-ins support both the "green thread" use cases, where Core WebAssembly code running inside the component wants to fully control thread scheduling (via suspending and resuming built-ins), and the "host thread" use cases, where the Core WebAssembly code wants to let the containing runtime nondeterministically schedule threads (via yielding built-ins) with hints (via the promoting built-ins) — or a mixture of both.
Each thread contains a distinct mutable thread-local storage array. The
current thread's thread-local storage can be read and written from core wasm
code by calling the context.get and context.set built-ins.
The thread-local storage array's length is currently fixed to contain exactly
2 elements with the goal of allowing this array to be stored inline in whatever
existing runtime data structure is already efficiently reachable from ambient
compiled wasm code. Because module instantiation is declarative in the
Component Model, the imported context.{get,set} built-ins can be inlined by
the core wasm compiler as-if they were instructions, allowing the generated
machine code to be a single load or store. This makes thread-local storage a
natural place to store:
- a pointer to the linear-memory "shadow stack" pointer
- a pointer to a struct used by the runtime to implement the language's thread-local features
Both of context.{get,set} take an immediate argument of i32 or i64 to
indicate the return or argument type. As part of component-level validation, all
context.{get,set} definitions within a single component are required to
specify the same thread-local element type, so that there is no mixing of
types between loads and stores. This restriction would allow Core WebAssembly
reference types to be used as thread-local storage element types in the future.
When threads are created explicitly by thread.new-indirect, the lifetime of
the thread-local storage array ends when the function passed to
thread.new-indirect returns and thus any linear-memory allocations associated
with the thread-local storage array should be eagerly freed by guest code right
before returning. Similarly, since each call to an export logically creates a
fresh thread, thread-local allocations can be eagerly released when this
implicit thread exits by returning from the exported function or, if the
stackless async ABI is used, returning the "exit" code to the event loop. This
non-reuse of thread-local storage between distinct export calls avoids what
would otherwise be a likely source of TLS-related memory leaks.
Since the same mutable thread-local storage cells are shared by all core wasm
running under the same thread in the same component, the cells' contents must
be carefully coordinated in the same way as native code has to carefully
coordinate native ISA state (e.g., the FS or GS segment base address). In the
common case, thread-local storage is only context.set by the entry trampoline
invoked by canon_lift and then all transitively reachable core wasm code
(including from any callback) assumes context.get returns the same value.
Thus, if any non-entry-trampoline code calls context.set, it is the
responsibility of that code to restore this default assumption before
allowing control flow to escape into the wild.
For more information, see context.get in the AST explainer.
When a thread calls an import using the async ABI, the Component Model
guarantees that if the callee task blocks, control
flow is immediately returned back to the caller's thread. When the callee is
implemented by the host, what counts as "blocking" is up to the host; e.g.,
the host can arbitrarily determine whether file I/O "blocks" or not depending on
whether the host is implemented using traditional synchronous OS syscalls or an
asynchronous io_uring. However, when the callee is implemented by another
component, the Component Model defines what counts as "blocking".
There are several ways for a task to potentially "block":
- synchronously calling an
asyncfunction that transitively blocks or hits backpressure - suspending the current thread via the
thread.suspend{,-then-promote}built-ins - cooperatively yielding (e.g., during a long-running computation) via the
thread.yield{,-then-promote}built-ins or, when using the stacklesscallbackABI, returning with theYIELDcode - waiting for one of a set of concurrent operations to complete via the
waitable-set.waitbuilt-in or, when using the stacklesscallbackABI, returning with theWAITcode - synchronously waiting for a stream or future operation to complete via the
{stream,future}.{,cancel-}{read,write}built-ins - synchronously waiting for a subtask to cooperatively cancel itself via the
subtask.cancelbuilt-in
Since Component Model concurrency is specified in terms of the Core WebAssembly
stack-switching proposal, each of the above represents a point where the
current thread may suspend with the $block effect.
Each of these points also serves as a cooperative yield point where
Component Invariant #2 allows reentrance. However, just because the current
thread suspends doesn't mean that the task has officially "blocked": what
happens next depends on the state of the task and the declared function type:
If the task has already returned a value to the caller, then control flow returns to the caller and, from the caller's perspective, the call returns normally without blocking. The callee's threads can continue executing, but what happens with these threads no longer matters to the caller.
If instead the task has not yet returned a value and the callee's function
type declares the async effect, control flow returns directly to the
caller. If the caller used the async ABI, then control flow returns to Core
WebAssembly, indicating that the call "blocked" by returning the non-zero index
of a new subtask. If the caller used the sync ABI,
then the caller immediately suspends with the $block effect and this process
repeats recursively up the stack.
Lastly, if the task has not yet returned a value and the callee's function type
does not declare the async effect, then the task is not allowed to "block"
and will trap if it ends up blocking. However, just because the current thread
has suspended doesn't mean that the overall task is blocked: if there are any
other threads in the callee's component instance that are in the "ready to
run" state, progressing them may unblock returning a value
(e.g., by releasing a lock or computing a dependency) and so the Component Model
repeatedly resumes threads that are ready to run until either the task returns a
value or there are no more eligible ready threads (and the call traps).
See the end of canon_lift in the Canonical ABI Explainer for more details.
These rules achieve expressive parity with what would otherwise be possible
using a CPS transform like Asyncify to implement a synchronous function. For
example, they allow synchronous functions to be implemented by pthreads that
switch and make progress at cooperative yield points. And if there is only a
single pthread, since thread.yield always leaves the calling thread in a
"ready to run" state, thread.yield effectively becomes a no-op (until a value
is returned).
When an async-typed function is called with the async ABI and the call
blocks before returning a value, the return value to the Core
WebAssembly caller is the index of a newly-created subtask representing the
concurrent execution of the callee. Subtasks are a kind of waitable.
Multiple waitables can be added to a waitable set to wait for one of them
to make progress. Waitable sets work like a simplified version of epoll and
are designed to avoid the O(N) cost associated with traditional
select-style primitives.
Specifically, waitable sets are created and used via the following built-ins:
waitable-set.new: return a new empty waitable setwaitable.join: add, move, or remove a given waitable to/from a given waitable setwaitable-set.wait: wait until one of the waitables in the given set has a pending event and then return that eventwaitable-set.poll: if any of the waitables in the given set has a pending event, return that event; otherwise return a sentinel "none" value
In addition to subtasks, (the readable and writable ends of) streams and futures are also waitables, which means that a single waitable set can uniformly wait on all the kinds of heterogeneous I/O available in the Component Model.
Streams and Futures have two "ends": a readable end and writable end. When
consuming a stream or future value as a parameter (of an export call with
a stream or future somewhere in the parameter types) or result (of an
import call with a stream or future somewhere in the result type), the
receiver always gets unique ownership of the readable end of the stream
or future. When producing a stream or future value as a parameter (of
an import call) or result (of an export call), the producer can transfer
ownership of a readable end that it has either been given by the outside world
or freshly created via {stream,future}.new (which also return a fresh paired
writable end that is permanently owned by the calling component instance).
Based on this, stream<T> and future<T> values can be passed between
functions as if they were synchronous list<T> and T values, resp. For
example, given f and g with types:
f: func(x: whatever) -> stream<T>;
g: func(s: stream<T>) -> stuff;Given this, g(f(x)) works as you might hope, concurrently streaming the
results of f into g.
Given the readable or writable end of a stream, core wasm code can call the
imported stream.read or stream.write canonical built-ins, resp., passing the
pointer and length of a linear-memory buffer to write-into or read-from, resp.
These built-ins can either return immediately if >0 elements were able to be
written or read immediately (without blocking) or return a sentinel "blocked"
value indicating that the read or write will execute concurrently. The readable
and writable ends of streams and futures can then be
waited on to make progress. Notification of
progress signals completion of a read or write (i.e., the bytes have already
been copied into the buffer). Additionally, readiness (to perform a read or
write in the future) can be queried and signalled by performing a 0-length
read or write (see the Stream State section in the Canonical ABI explainer
for details).
As a temporary limitation, if a read and write for a single stream or
future occur from within the same component and the element type is a
non-empty, non-number type, there is a trap. In the future this limitation will
be removed.
The T element type of streams and futures is optional, such that future and
stream can be written in WIT without a trailing <T>. In this case, the
asynchronous "values(s)" being delivered are effectively meaningless unit
values. However, the timing of delivery is meaningful and thus future and
stream can used to convey timing-related information. Note that, since
functions are asynchronous by default, a plain f: func() conveys completion
without requiring an explicit future return type. Thus, a function like
f2: func() -> future would convey two events: first, the return of f2, at
which point the caller receives the readable end of a future that, when
successfully read, conveys the completion of a second event.
The Stream State and Future State sections describe the runtime state maintained for streams and futures by the Canonical ABI.
When passed a non-zero-length buffer, the stream.read and stream.write
built-ins are "completion-based" (in the style of, e.g., Overlapped I/O or
io_uring) in that they complete only once one or more values have been
copied to or from the memory buffer passed in at the start of the operation.
In a Component Model context, completion-based I/O avoids intermediate copies
and enables a greater degree of concurrency in a number of cases and thus
language producer toolchains should attempt to pass non-zero-length buffers
whenever possible.
Given completion-based stream.{read,write} built-ins, "readiness-based" APIs
(in the style of, e.g., select or epoll used in combination with
O_NONBLOCK) can be implemented by passing an intermediate non-zero-length
memory buffer to stream.{read,write} and signalling "readiness" once the
operation completes. However, this approach incurs extra copying overhead. To
avoid this overhead in a best-effort manner, stream.{read,write} allow the
buffer length to be zero in which case "completion" of the operation is allowed
(but not required) to wait to complete until the other end is "ready". As the
"but not required" caveat suggests, after a zero-length stream.{read,write}
completes, there is no guarantee that a subsequent non-zero-length
stream.{read,write} call will succeed without blocking. This lack of
guarantee is due to practical externalities and because readiness may simply
not be possible to implement given certain underlying host APIs.
As an example, to implement select() and non-blocking write() in
wasi-libc, the following implementation strategy could be used (a symmetric
scheme is also possible for read()):
- The libc-internal file descriptor table tracks whether there is currently a
pending write and whether
select()has indicated that this file descriptor is ready to write. - When
select()is called to wait for a stream-backed file descriptor to be writable:select()starts a zero-length write if there is not already a pending write in progress and then waits on the stream (along with the otherselect()arguments).- If the pending write completes,
select()updates the file descriptor and returns that the file descriptor is ready.
- When
write()is called for anO_NONBLOCKINGfile descriptor:- If there is already a pending
stream.writefor this file descriptor,write()immediately returnsEWOULDBLOCK. - Otherwise:
write()callsstream.write, forwarding the caller's buffer.- If
stream.writereturns that it successfully copied some bytes without blocking,write()returns success. - Otherwise, to avoid blocking:
write()callsstream.cancel-writeto regain ownership of the caller's buffer.- If
select()has not indicated that this file descriptor is ready,write()starts a zero-length write and returnsEWOULDBLOCK. - Otherwise, to avoid the potential infinite loop:
write()copies the contents of the caller's buffer into an internal buffer, starts a newstream.writeto complete in the background using the internal buffer, and then returns success.- The above logic implicitly waits for this background
stream.writeto complete before the file descriptor is considered ready again.
- If there is already a pending
The fallback path for when the zero-length write does not accurately signal
readiness resembles the buffering normally performed by the kernel for a
write syscall and reflects the fact that streams do not perform internal
buffering between the readable and writable ends.
Once a component exports functions using the async ABI, multiple concurrent
export calls can start piling up, each consuming some of the component's finite
private resources (like linear memory), requiring the component to be able to
exert backpressure to allow some tasks to finish (and release private
resources) before admitting new async export calls. To do this, a component may
call the backpressure.inc built-in to increment a component-instance-wide
"backpressure" counter until resources are freed and then call
backpressure.dec to decrement the counter. When the backpressure counter is
greater than zero, new export calls immediately return in the "starting" state
without calling the component's Core WebAssembly code. By using a counter
instead of a boolean flag, unrelated pieces of code can report backpressure for
distinct limited resources without prior coordination.
In addition to explicit backpressure set by wasm code, there is also an
implicit source of backpressure to ensure Component Invariant #3 and protect
non-reentrant core wasm code. In particular, when an async-typed export is
lifted with the sync ABI or the stackless async ABI, a component-instance-wide
lock is implicitly acquired every time core wasm is executed. By returning to
the event loop after every event (instead of once at the end of the task),
stackless async exports release the lock between every event, allowing a higher
degree of concurrency than synchronous exports. Stackful async exports ignore
the lock entirely and thus achieve the highest degree of (cooperative)
concurrency.
Since non-async functions are not allowed to block (including due to
backpressure) and also don't pile up like async functions, non-async
functions ignore backpressure (explicit and implicit) entirely. If a
component exports a mix of async and non-async functions, code generation
must therefore be prepared to handle non-async functions executing at
any cooperative yield point, even in the middle of a callback.
Once a task is allowed to start according to these backpressure rules, its arguments are lowered into the callee's linear memory and the task is in the "started" state.
The way an async export returns its value using the async ABI is by calling
task.return, passing the core values that are to be lifted as parameters.
When using the async ABI, any of the threads contained by a task can call
task.return; there is no "main thread" of a task. When the last thread of a
task returns, there is a trap if task.return has not been called. Thus, some
thread (either the thread created implicitly for the initial export call or some
thread transitively created by that thread) must call task.return.
Returning values by calling task.return allows a task to continue executing
even after it has passed its initial results to the caller. This is also
possible even with the sync ABI by using cooperative threads. Continuing
to execute after returning a value can be useful for various finalization tasks
(freeing memory or performing logging, billing or metrics operations) that don't
need to be on the critical path of returning a value to the caller, but the
major use of executing code after task.return is to continue to read and write
from streams and futures. For example, a stream transformer function of type
func(in: stream<T>) -> stream<U> will immediately task.return a stream
created via stream.new and then sit in a loop interleaving stream.reads (of
the readable end passed for in) and stream.writes (of the writable end it
stream.newed) before exiting the task.
Once task.return is called, the task is in the "returned" state. Calling
task.return when not in the "started" state traps.
Component Model async support is careful to ensure that borrowed handles work
as expected in an asynchronous setting, extending the dynamic enforcement used
for synchronous code:
When a caller initially lends an owned or borrowed handle to a callee, a
num_lends counter on the lent handle is incremented when the subtask starts
and decremented when the caller is notified that the subtask has
returned. If the caller tries to drop a handle while the handle's
num_lends is greater than zero, the caller traps. Symmetrically, each
borrow handle passed to a callee increments a num_borrows counter on the
callee task that is decremented when the borrow handle is dropped. If a
callee task attempts to return when its num_borrows is greater than zero, the
callee traps.
In an asynchronous setting, since there can be multiple overlapping async tasks
executing in a component instance, a borrowed handle must track which task's
num_borrows was incremented so that the correct counter can be decremented
and there is a trap upon task.return if num_borrows is nonzero.
Once an async call has started, blocked and been added to the caller's table,
the caller may decide that it no longer needs the results or effects of the
subtask. In this case, the caller may cancel the subtask by calling the
subtask.cancel built-in.
Once cancellation is requested, since the subtask may have already racily returned a value, the caller may still receive a return value. However, the caller may also be notified that the subtask is in one of two additional terminal states:
- the subtask was cancelled before it started, in which case the caller's arguments were not passed to the callee (in particular, owned handles were not transferred); or
- the subtask was cancelled before it returned, in which case the arguments were passed, but no values were returned. However, all borrowed handles lent during the call have been dropped.
Thus there are three terminal states for a subtask: returned, cancelled-before-started and cancelled-before-returned. A subtask in one of these terminal states is said to be resolved. A resolved subtask has always dropped all the borrowed handles that it was lent during the call.
Cancellation is cooperative, delivering the request for cancellation to one
of the subtask's threads and then allowing the subtask to continue executing
for an arbitrary amount of time (calling imports, performing I/O and everything
else) until the subtask decides to call task.cancel to confirm the
cancellation or, for whatever reason, call task.return as-if there had been
no cancellation. task.cancel enforces the same "all borrowed handles dropped"
rule as task.return, so that once a subtask is resolved, the caller knows its
lent handles have been returned. If the subtask was waiting to start due to
backpressure, the subtask is immediately aborted without running the callee at
all.
When subtask.cancel is called, it will attempt to immediately resume one of
the subtask's threads which is in a cancellable state, passing it a sentinel
"cancelled" value. A thread is in a "cancellable" state if it calls one of the
blocking built-ins with the cancellable immediate set (indicating
that the caller expects and propagates cancellation appropriately) or, if using
a callback, returns to the event loop (which always waits cancellably). If a
subtask has no cancellable threads, no thread is resumed and the request for
cancellation is remembered in the task state, to be delivered immediately at
the next cancellable wait. In the worst case, though, a component may never
wait cancellably and thus cancellation may be silently ignored.
subtask.cancel can be called synchronously or asynchronously. If called
synchronously, subtask.cancel blocks until the subtask reaches a resolved
state and returns which state was reached. If called asynchronously, then if a
cancellable subtask thread is resumed and the subtask reaches a resolved
state before blocking for whatever reason subtask.cancel will return
which state was reached. Otherwise, subtask.cancel will return a "blocked"
sentinel value and the caller must wait via
waitable set until the subtask reaches a resolved state.
The Component Model does not provide a mechanism to force prompt termination of threads as this can lead to leaks and corrupt state in a still-live component instance. In the future, prompt termination could be added as part of a "blast zone" feature that promptly destroys whole component instances, automatically dropping all handles held by the destroyed instance, thereby avoiding the leak and corruption hazards.
Component Model concurrency support necessarily introduces a degree of
nondeterminism. However, until Core WebAssembly adds
shared-everything-threads, Component Model concurrency is cooperative,
which means that nondeterministic behavior can only be observed at well-defined
points in the program. Once shared-everything-threads is added,
WebAssembly's full weak memory model will be observable, but only within
components that use the new shared attribute on functions.
One inherent source of potential nondeterminism that is independent of the
Component Model is the behavior of host-defined import and export calls.
Component Model concurrency extends this host-dependent nondeterminism to the
behavior of the read and write built-ins called on streams and futures
that have been passed to and from the host. However, just as with import and
export calls, it is possible for a host to define a deterministic ordering of
stream and future read and write behavior such that overall component
execution is deterministic.
In addition to the inherent host-dependent nondeterminism, the Component Model adds several internal sources of nondeterministic behavior that are described next. However, each of these sources of nondeterminism can be removed by a host implementing the WebAssembly Deterministic Profile, maintaining the ability for a host to provide spec-defined deterministic component execution for components.
The following sources of nondeterminism arise via internal built-in operations defined by the Component Model:
- If there are multiple waitables with a pending event in a waitable set that is being waited on or polled, there is a nondeterministic choice of which waitable's event is delivered first.
- If multiple threads wait on or poll the same waitable set at the same time, the distribution of events to threads is nondeterministic.
- Whenever a thread yields or waits on a waitable set with an already pending event, whether or not the thread blocks and transfers execution to an async caller or another thread is nondeterministic.
- If multiple threads that previously blocked can be resumed at the same time, the order in which they are resumed is nondeterministic.
- If multiple tasks are blocked by backpressure and the backpressure is disabled, the order in which these pending tasks start, along with how they interleave with new tasks, is nondeterministic.
- If a task containing multiple threads is cancelled, the choice of which thread receives the request for cancellation is nondeterministic.
Despite the above, the following scenarios do behave deterministically:
- If a component
aasynchronously calls the export of another componentb, control flow deterministically transfers toband then back toawhenbreturns or blocks. - If a component
aasynchronously cancels a subtask in another componentb, control flow deterministically transfers toband then back toawhenbresolves or blocks. - If a component
aasynchronously cancels a subtask in another componentbthat was blocked before starting due to backpressure, cancellation completes deterministically and immediately. - When both ends of a stream or future are owned by wasm components, the behavior of all read, write, cancel and drop operations is deterministic (modulo any nondeterministic execution that determines the ordering in which the operations are performed).
Even without concurrency support, it is possible to reenter a component instance
by recursively calling the component's export from a function called by the
component's import. For example, given a component importing imp and exporting
exp, using the JS API, JS code could write:
import source component from './component.wasm';
var instance;
function imp() {
instance.exports.exp();
}
instance = WebAssembly.instantiate(component, { imp });
instance.exports.exp(); // exp ~~> imp ~~> expTo relieve generic bindings generators and component authors from having to conservatively assume that every import call might reenter in this manner, the Component Model has Component Invariant #2. This is enforced by the Canonical ABI using strategically placed traps and boolean flags on component instances.
With native concurrency support, what we'd naturally expect is that if our
component imports imp and exports exp as async functions, then the
following JS code could run the two exp calls concurrently, as if they were JS
async functions:
import source component from './component.wasm';
async function imp() {
await ... some Web API I/O
}
instance = WebAssembly.instantiate(component, { imp });
await Promise.all([
instance.exports.exp(),
instance.exports.exp()
]);In particular, if exp transitively awaits imp, then when imp blocks (via
await), control flow returns to the top-level JS script with instance in a
reenterable state, so that exp can be concurrently invoked a second time.
However, this also means that if we slightly change our original recursive
example to use async and then await before attempting to reenter instance,
there is no trap. The first await in imp returns to top-level, leaving
instance in a reenterable state, so when imp is later resumed from the event
loop, it is allowed to reenter exp.
import source component from './component.wasm';
var instance;
async function imp() {
await Promise.resolve();
await instance.exports.exp();
}
instance = WebAssembly.instantiate(component, { imp });
await instance.exports.exp(); // exp ~~> imp ~~> expThe hazard with this example is that if the outer call to exp internally grabs
and holds a lock while awaiting the call to imp, and if the recursive call to
exp waits to acquire the same lock, there will be a deadlock. In the preceding
async example, since there is no circular dependency between the two calls to
exp, the second call can simply wait for the first to release any lock it
holds.
A concrete example of this hazard is the implicit per-component-instance lock
taken and released by backpressure. E.g., if component lifts
exp synchronously (which triggers implicit backpressure while a call to exp
is running), the recursive call to exp will immediately deadlock.
Unfortunately, it's not possible to reliably discriminate the two cases so that
the second example traps (as it did in the synchronous case) while the first
example succeeds. Given the Component Model's well-defined async call
stack, it might seem possible to tell the cases apart
by checking whether instance is already on the call stack when attempting to
enter exp. However, this doesn't work for two reasons:
First, to properly detect asynchronous recursion, the host embedding would have to maintain something analogous to the Component Model's async call stack, which some hosts (including, currently, browsers) simply do not have a well-defined way to do.
Second, the async call stack is neither necessary nor sufficient to catch these
kinds of asynchronous recursive deadlocks. The async call stack tracks the
causality leading up to a call, which is useful for debugging, tracing,
profiling, etc., but the async call stack doesn't imply that every call on the
stack is blocking on the result of the next call in the chain (unlike with a
synchronous call stack, which does imply this). Moreover, the async call stack
can arbitrarily reset through indirect forms of asynchronous calls (e.g., host
APIs with callbacks like, in a browser, setTimeout), so the absence of
recursion on the async call stack does not guarantee the absence of a circular
asynchronous dependency.
Thus, the Canonical ABI rules don't attempt to distinguish the different kinds of asynchronous reentrance. It is thus the responsibility of component clients to avoid async recursion. Fortunately, in component-to-component compositions, this kind of recursion is only possible when doing advanced higher-order linking (aka donut wrapping). And unlike Component Invariant #2, which directly impacts bindings generators, async recursion only arises when there's blocking and so it's already necessary to support (non-recursive) reentrance.
All start functions (both component-level and Core WebAssembly start functions
called via core instance definition) implicitly have the component-level
function type func(), i.e., they are synchronous and take and return no
arguments. Based on the above description of synchronous functions, this means
that start functions may not block before returning. However, if a
component-level start function is lifted using the async ABI, it may block
after calling task.return, and may thus serve as a long-running "background
task" to which work can be dispatched (e.g., via the setInterval() or
requestIdleCallback() JavaScript APIs). From the perspective of structured
concurrency, these background tasks are new task tree roots (siblings to the
roots created when component exports are called by the host).
As a post-0.3.0 follow-up TODO, component type definitions should be
extended to allow an async effect that declares that component instantiation
is allowed to block. This would be necessary to implement, e.g.,
JS top-level await or I/O in C++ constructors executing during start.
At an ABI level, native async in the Component Model defines for every
async-typed function a non-blocking core function signature that can be
used instead of or in addition to the existing (WASI 0.2) synchronous
core function signature. This non-blocking core function signature is intended
to be called or implemented by generated bindings which then map the low-level
core async protocol to the languages' higher-level native concurrency features.
Given these imported WIT functions (using the fixed-length-list feature 🔧):
world w {
import foo: async func(s: string) -> u32;
import bar: async func(s: string) -> string;
import baz: async func(t: list<u64; 5>) -> string;
import quux: async func(t: list<u32; 17>) -> string;
}the default/synchronous lowered import function signatures (assuming 32-bit memories) are:
;; sync
(func $foo (param $s-ptr i32) (param $s-len i32) (result i32))
(func $bar (param $s-ptr i32) (param $s-len i32) (param $out-ptr i32))
(func $baz (param i64 i64 i64 i64 i64) (param $out-ptr i32))
(func $quux (param $in-ptr i32) (param $out-ptr i32))Here: foo, bar and baz pass their parameters as "flattened" core value
types while quux passes its parameters via the $in-ptr linear memory
pointer (due to the Canonical ABI limitation of 16 maximum flattened
parameters). Similarly, foo returns its result as a single core value while
bar, baz and quux return their results via the $out-ptr linear memory
pointer (due to the current Canonical ABI limitation of 1 maximum flattened
result).
The corresponding asynchronous lowered import function signatures are:
;; async
(func $foo (param $s-ptr i32) (param $s-len i32) (param $out-ptr i32) (result i32))
(func $bar (param $s-ptr i32) (param $s-len i32) (param $out-ptr i32) (result i32))
(func $baz (param $in-ptr i32) (param $out-ptr i32) (result i32))
(func $quux (param $in-ptr i32) (param $out-ptr i32) (result i32))Comparing signatures, the differences are:
- Async-lowered functions have a maximum of 4 flat parameters (not 16).
- Async-lowered functions always return their value via linear memory pointer.
- Async-lowered functions always have a single
i32"status" code.
Additionally, when the parameter and result pointers are read/written depends on the status code:
- If the low 4 bits of the status are
0, the call didn't even start and so$in-ptrhasn't been read and$out-ptrhasn't been written and the high 28 bits are the index of a new async subtask to wait on. - If the low 4 bits of the status are
1, the call started,$in-ptrwas read, but$out-ptrhasn't been written and the high 28 bits are the index of a new async subtask to wait on. - If the low 4 bits of the status are
2, the call returned and so$in-ptrand$out-ptrhave been read/written and the high 28 bits are0because there is no async subtask to wait on.
When a parameter/result pointer hasn't yet been read/written, the async caller must take care to keep the region of memory allocated to the call until receiving an event indicating that the async subtask has started/returned.
Other example asynchronous lowered signatures:
| WIT function type | Async ABI |
|---|---|
async func() |
(func (result i32)) |
async func() -> string |
(func (param $out-ptr i32) (result i32)) |
async func(x: f32) -> f32 |
(func (param $x f32) (param $out-ptr i32) (result i32)) |
async func(s: string, t: string) |
(func (param $s-ptr i32) (param $s-len i32) (param $t-ptr i32) (param $t-len i32) (result i32)) |
future and stream can appear anywhere in the parameter or result types. For example:
async func(s1: stream<future<string>>, s2: list<stream<string>>) -> result<stream<string>, stream<error>>In both the sync and async ABIs, a future or stream in the WIT-level type
translates to a single i32 in the ABI. This i32 is an index into the
current component instance's handle table. For example, for the WIT function type:
async func(f: future<string>) -> future<u32>the synchronous ABI has signature:
(func (param $f i32) (result i32))and the asynchronous ABI has the signature:
(func (param $f i32) (param $out-ptr i32) (result i32))where $f is the index of a future (not a pointer to one) while while
$out-ptr is a pointer to a linear memory location that will receive an i32
index.
For the runtime semantics of this i32 index, see lift_stream,
lift_future, lower_stream and lower_future in the Canonical ABI
Explainer. For a complete description of how async imports work, see
canon_lower in the Canonical ABI Explainer.
Given an exported WIT function:
world w {
export foo: async func(s: string) -> string;
}The default sync export function signature for export foo is:
;; sync
(func (param $s-ptr i32) (param $s-len i32) (result $retp i32))where (working around the continued lack of multi-return support throughout
the core wasm toolchain) $retp must be a 4-byte-aligned pointer into linear
memory from which the 8-byte (pointer, length) of the string result can be
loaded.
The async export ABI provides two flavors: stackful and stackless.
The stackful ABI is currently gated by the 🚟 feature.
The async stackful export function signature for export foo (defined above
in world w) is:
;; async, no callback
(func (param $s-ptr i32) (param $s-len i32))The parameters work just like synchronous parameters.
There is no core function result because a callee returns their
value by calling the imported task.return function which has signature:
;; task.return
(func (param $ret-ptr i32) (param $ret-len i32))The parameters of task.return work the same as if the WIT return type was the
WIT parameter type of a synchronous function. For example, if more than 16
core parameters would be needed, a single i32 pointer into linear memory is
used.
The async stackless export function signature for export foo (defined above
in world w) is:
;; async, callback
(func (param $s-ptr i32) (param $s-len i32) (result i32))The parameters also work just like synchronous parameters. The callee returns
their value by calling task.return just like the stackful case.
The (result i32) lets the core function return what it wants the runtime to do next:
- If the low 4 bits are
0, the callee completed (and calledtask.return) without blocking. - If the low 4 bits are
1, the callee wants to yield, allowing other code to run, but resuming thereafter without waiting on anything else. - If the low 4 bits are
2, the callee wants to wait for an event to occur in the waitable set whose index is stored in the high 28 bits.
When an async stackless function is exported, a companion "callback" function must also be exported with signature:
(func (param i32 i32 i32) (result i32))The (result i32) has the same interpretation as the stackless export function
and the runtime will repeatedly call the callback until a value of 0 is
returned. The i32 parameters describe what happened that caused the callback
to be called again.
For a complete description of how async exports work, see canon_lift in the
Canonical ABI Explainer.
For a list of working examples expressed as executable WebAssembly Test (WAST) files, see this directory.
This rest of this section sketches the shape of a component that uses async
to lift and lower its imports and exports with both the stackful and stackless
ABI options.
Starting with the stackful ABI, the meat of this example component is replaced
with ... to focus on the overall flow of function calls:
(component
(import "fetch" (func $fetch async (param "url" string) (result (list u8))))
(core module $Libc
(memory (export "mem") 1)
(func (export "realloc") (param i32 i32 i32 i32) (result i32) ...)
...
)
(core module $Main
(import "" "mem" (memory 1))
(import "" "realloc" (func (param i32 i32 i32 i32) (result i32)))
(import "" "fetch" (func $fetch (param i32 i32 i32) (result i32)))
(import "" "waitable-set.new" (func $new_waitable_set (result i32)))
(import "" "waitable-set.wait" (func $wait (param i32 i32) (result i32)))
(import "" "waitable.join" (func $join (param i32 i32)))
(import "" "task.return" (func $task_return (param i32 i32)))
(global $wsi (mut i32))
(func $start
(global.set $wsi (call $new_waitable_set))
)
(start $start)
(func (export "summarize") (param i32 i32)
...
loop
...
call $fetch ;; pass a string pointer, string length and pointer-to-list-of-bytes outparam
... ;; ... and receive the index of a new async subtask
global.get $wsi
call $join ;; ... and add it to the waitable set
...
end
loop ;; loop as long as there are any subtasks
...
global.get $wsi
call $wait ;; wait for a subtask in the waitable set to make progress
...
end
...
call $task_return ;; return the string result (pointer,length)
...
)
)
(core instance $libc (instantiate $Libc))
(alias $libc "mem" (core memory $mem))
(alias $libc "realloc" (core func $realloc))
;; requires 🚟 for the stackful abi
(canon lower $fetch async (memory $mem) (realloc $realloc) (core func $fetch'))
(canon waitable-set.new (core func $new))
(canon waitable-set.wait (memory $mem) (core func $wait))
(canon waitable.join (core func $join))
(canon task.return (result string) (memory $mem) (core func $task_return))
(core instance $main (instantiate $Main (with "" (instance
(export "mem" (memory $mem))
(export "realloc" (func $realloc))
(export "fetch" (func $fetch'))
(export "waitable-set.new" (func $new))
(export "waitable-set.wait" (func $wait))
(export "waitable.join" (func $join))
(export "task.return" (func $task_return))
))))
(canon lift (core func $main "summarize")
async (memory $mem) (realloc $realloc)
(func $summarize async (param "urls" (list string)) (result string)))
(export "summarize" (func $summarize))
)Because the imported fetch function is canon lowered with async, its core
function type (shown in the first import of $Main) takes pointers to the
parameter and results (which are asynchronously read-from and written-to) and
returns the index of a new subtask. summarize calls waitable-set.wait
repeatedly until all fetch subtasks have finished, noting that
waitable-set.wait can return intermediate progress (as subtasks transition
from "starting" to "started" to "returned") which tell the surrounding core
wasm code that it can reclaim the memory passed arguments or use the results
that have now been written to the outparam memory.
Because the summarize function is canon lifted with async, its core
function type has no results; results are passed out via task.return. It also
means that multiple summarize calls can be active at once: once the first
call to waitable-set.wait blocks, the runtime will suspend its callstack
(fiber) and start a new stack for the new call to summarize. Thus,
summarize must be careful to allocate a separate linear-memory stack in its
entry point and store it in context-local storage (via context.set) instead
of simply using a global, as in a synchronous function.
Note that removing async from the type of summarize specified in the canon lift definition would cause the above component to trap when it attempted to
call waitable-set.wait.
The stackful example can be re-written to use the callback immediate (thereby
avoiding the need for fibers) as follows.
Note that the internal structure of this component is almost the same as the
previous one (other than that summarize is now lifted from two core wasm
functions instead of one) and the public signature of this component is
the exact same.
Thus, the difference is just about whether the stack is cleared by the core wasm code between events, not externally-visible behavior.
(component
(import "fetch" (func $fetch async (param "url" string) (result (list u8))))
(core module $Libc
(memory (export "mem") 1)
(func (export "realloc") (param i32 i32 i32 i32) (result i32) ...)
...
)
(core module $Main
(import "" "mem" (memory 1))
(import "" "realloc" (func (param i32 i32 i32 i32) (result i32)))
(import "" "fetch" (func $fetch (param i32 i32 i32) (result i32)))
(import "" "waitable-set.new" (func $new_waitable_set (result i32)))
(import "" "waitable.join" (func $join (param i32 i32)))
(import "" "task.return" (func $task_return (param i32 i32)))
(global $wsi (mut i32))
(func $start
(global.set $wsi (call $new_waitable_set))
)
(start $start)
(func (export "summarize") (param i32 i32) (result i32)
...
loop
...
call $fetch ;; pass a string pointer, string length and pointer-to-list-of-bytes outparam
... ;; ... and receive the index of a new async subtask
global.get $wsi
call $join ;; ... and add it to the waitable set
...
end
(i32.or ;; return (WAIT | ($wsi << 4))
(i32.const 2) ;; 2 -> WAIT
(i32.shl
(global.get $wsi)
(i32.const 4)))
)
(func (export "cb") (param $event i32) (param $p1 i32) (param $p2 i32)
...
if (result i32) ;; if subtasks remain:
(i32.or ;; return (WAIT | ($wsi << 4))
(i32.const 2)
(i32.shl
(global.get $wsi)
(i32.const 4)))
else ;; if no subtasks remain:
...
call $task_return ;; return the string result (pointer,length)
...
i32.const 0 ;; return EXIT
end
)
)
(core instance $libc (instantiate $Libc))
(alias $libc "mem" (core memory $mem))
(alias $libc "realloc" (core func $realloc))
(canon lower $fetch async (memory $mem) (realloc $realloc) (core func $fetch'))
(canon waitable-set.new (core func $new))
(canon waitable.join (core func $join))
(canon task.return (result string) (memory $mem) (core func $task_return))
(core instance $main (instantiate $Main (with "" (instance
(export "mem" (memory $mem))
(export "realloc" (func $realloc))
(export "fetch" (func $fetch'))
(export "waitable-set.new" (func $new))
(export "waitable.join" (func $join))
(export "task.return" (func $task_return))
))))
(canon lift (core func $main "summarize")
async (callback (core func $main "cb")) (memory $mem) (realloc $realloc)
(func $summarize async (param "urls" (list string)) (result string)))
(export "summarize" (func $summarize))
)For an explanation of the bitpacking of the i32 callback return value,
see unpack_callback_result in the Canonical ABI explainer.
While this example spawns all the subtasks in the initial call to summarize,
subtasks can also be spawned from cb (even after the call to task.return).
It's also possible for summarize to call task.return called eagerly in the
initial core summarize call.
The $event, $p1 and $p2 parameters passed to cb are the same as the
return values from waitable-set.wait in the previous example. The precise meaning of
these values is defined by the Canonical ABI.
In settings like service workers and serverless computing, a single
component instance may handle multiple independent host events by having its
exported functions called repeatedly (and, if they're async, concurrently). In
settings like these, the number and lifetime of component instances and the
degree of reuse is determined by host policies based on factors like
utilization, available parallelism and security. However, when component
instance lifetimes are flexible in this manner and don't have an obvious end
(as opposed to a traditional CLI setting, where a component instance's lifetime
conventionally ends right after main() returns), the host still needs to
understand the expectations of component authors to enable portability.
Before the addition of native concurrency support in WASI 0.3, a natural expectation is that, in the absence of atypical scenarios like timeouts or quota exhaustion, a component author can expect that their component instance will not be abruptly terminated during the execution of contained Core WebAssembly code. But other than that, component authors must conservatively assume that their component instance will be torn down at any time.
With the addition of native concurrency support, these expectations must be
nuanced to account for asynchronous Core WebAssembly execution. In particular,
when using the stackless async callback ABI, an active task may have no active
Core WebAssembly function invocation while it is WAITing on a Waitable Set.
Analogously, when using the stackful async ABI, a Core WebAssembly function
invocation will be suspended (as-if by the stack-switching proposal's
suspend instruction) when it calls waitable-set.wait so that there is also
no Core WebAssembly code actively executing (only a continuation stored in the
component instance's table of threads).
Now, before a task has returned its value, there is still a natural expectation that, even if there is currently no active Core WebAssembly execution, the task is still logically executing and thus the component instance will not be terminated (under normal circumstances; timeout or quota exhaustion could still abruptly terminate the instance). Furthermore, even if a component instance has no active tasks that haven't returned a value, if a component instance is holding the readable or writable end of a stream or future that the host holds the other end of, there is also a natural expectation that the host will keep the component instance alive until all the futures and streams have reached a closed state.
As an example, even after wasi:http/handler's handle function returns a
response resource, the handle function's component instance is expected to
be kept alive as long as it's holding the un-closed writable end of the
stream<u8> contained by the returned response. If, on the other hand, the
stream<u8> was forwarded from the return value of a host import, and so the
component instance is not holding any writable end, this expectation doesn't
apply and so all other conditions being met, the component instance can be
eagerly torn down.
The interesting question is what happens once all tasks have returned their values and all incoming and outgoing streams and futures have been closed if the component instance still contains live threads that are currently suspended (and so not actively executing Core WebAssembly code) but may potentially be resumed in the future to do important post-return work (like performing logging, billing or metrics operations that have been taken off the pre-return critical path for peformance reasons).
If the component instance is conservatively kept alive (until a hard timeout),
this may end up wasting resources for periodic background activities that run in
an infinite (waiting) loop. In particular, cooperative threads used to implement
pthreads are expected to sometimes be used in this manner. On the other hand,
immediately tearing down a component instance as soon as the last byte of an
outgoing stream is written and active Core WebAssembly execution returns or
blocks will break the abovementioned post-return use cases if they involve
waiting on async operations to complete.
To resolve this tension, threads are implicitly distinguished by a "keep-alive"
flag that determines whether the expectation is that the existence of the thread
is intended to keep the containing component instance alive. In WASI 0.3,
this "keep-alive" flag is default set for the implicit thread
created for a task and default cleared for the explicit threads created by
thread.new-indirect. In particular, this means that an
async callback-lifted function will keep its containing component instance
alive until it returns the EXIT code (0).
As an example, in JavaScript, the Service Worker API's waitUntil method
would delay returning the EXIT code. In 0.3.0, without cooperative threads
(🧵), setInterval would also unfortunately delay
returning the EXIT code and thus, without guest code intervention, would keep
component instances alive until timeout limits were hit. The release of
cooperative threads would offer a solution to this problem, but an awkward one.
Instead, the intention is to add new built-in functions that would
provide guest code more direct, dynamic control over its own keep-alive
flags, thereby allowing the JS event loop to clear its keep-alive flag once all
waitUntil promises resolved, thereby allowing setInterval callbacks to keep
running (while the host wants to keep the instance warm), but still indicating
to the host that destruction is welcome at any time.
Lastly, the above discussion refers to component instances, however the host cannot tear down independent component instances when they are linked together (as this would leave dangling function imports). In general, components must be instantiated and destroyed as trees, where the host can only choose when to instantiate or destroy the root component of the tree, and all other child instances are instantiated/destroyed along with the root. Thus, when the above rules set an expectation that any component instance in a tree be kept alive, the whole tree would be kept alive.
Native async support is being added incrementally. Beyond what's currently specified, the following features are being considered for addition to complete the concurrency story:
- remove the temporary trap mentioned above that occurs when a
readandwriteof a stream/future happen from within the same component instance - zero-copy forwarding/splicing
- allow the
stream<char>type to validate; make it usestring-encodingand not split code points - add built-ins providing guest code more control over its containing component instance's lifetime
- some way to say "no more elements are coming for a while"
- add an
asynceffect oncomponenttype definitions allowing a component type to block during instantiation - add an
asynceffect onresourcetype definitions allowing a resource type to block during its destructor - allow a parent component to perform JSPI-like suspension of the sync calls
of its child components, thereby allowing the parent to implement the child's
sync import calls in terms of the parent's
asyncimports. - allow a donut wrapping parent component to ask which of a child's export calls a particular child import call is associated with (e.g., for logging purposes).
- add a
strict-callbackoption that adds extra trapping conditions to provide the semantic guarantees needed for engines to statically avoid fiber creation at component-to-componentasynccall boundaries - allow function closures to be passed as first-class values, supporting the "callback" pattern in many pre-existing APIs, including Web APIs
- allow pipelining multiple
stream.read/writecalls - allow chaining multiple async calls together ("promise pipelining")
- integrate with
shared: define how to lift and lower functionsasyncandshared