|I'm proposing a "green threads" model for threading on the arc2c output.|
arc2c converts all functions into "case" blocks for a big switch statement. When a function needs to call another function, it loads a "pc" variable with the number for that function's case statement and jumps to the top. It's like a trampoline.
Since all "jumps" actually go to the top of the switch structure, it may be possible to change execution between threads during the trampoline. Basically, each thread contains its own stack and pc counters, as well as its state (running or dead).
We maintain a circular list of threads.
When the trampoline is entered we commit the sp, stack, pc, and state into the thread's structure. Then we get the 'cdr of the circular list of threads, check if that thread is dead, and while the cdr thread dead, get its cdr and replace the current cdr with that, until we reach the current thread (and if it's dead, exit), or we reach a live thread.
If the thread is live we copy its sp, stack, pc, and state, and proceed to the switch structure.
When building a new thread, we construct a stack with the provided function's closure and a %halt continuation (which are always the first parameters to all functions in the arc2c output), make a new sp that points just beyond the parameters, set the pc to the provided function, and set the state to alive, then insert this to the circular list of threads.
We could also make this part optional, so that non-threaded applications will not have the overhead of the thread-switching.
The advantage is that we can get threads even without OS support (say on an embedded system without a threaded RTOS...), and thus be very portable. Also, this makes atomic operations trivial: we can wrap atomic operations in a pair of (%start-atomic) and (%stop-atomic) primitives, which simply set a C-variable, atomic; when the trampoline detects that the C-variable atomic is set it simply skips over threading unless the state is dead (although there is a potential problem with this if an atomic operation, say, exits via a continuation). Finally, the Boehm GC currently being used doesn't seem to handle "real" threads very well, so our emulation of multiple threads would work.
The disadvantages are that I/O operations will block all threads, unless we complicate our threading model, wherein we have a "blocked" state, and use only asynchronous I/O (including the current async I/O with the blocked state, so that the trampoline can poll the I/O operation and unblock the thread if the I/O has completed). Also, we cannot expect the OS to distribute execution to other processors/cores (obviously because we're not asking for it).