Arc Forum | This strategy seems viable, but it seems slow, because there are a lot of checks...

Arc Forum

3 points by stefano 6307 days ago | link | parent

This strategy seems viable, but it seems slow, because there are a lot of checks at every thread switching. If we could manage to transform every Arc function into a C function (actually an array containing the C function address and the closed vars) we could then rely on the pthread library for multi-threading, or the equivalent threading library available on the host system.

3 points by almkglor 6307 days ago | link

The problem with transforming each Arc function to a C function is the stack space consumed at each call. Sure, gcc does tail call optimization, but not all compilers are gcc.

Chicken fixes this by keeping track of stack and GC'ing the stack when it's full; I'm not sure about Bigloo but I hear it's got a pretty good Scheme function == C function equivalency, so it might be useful to see how they fixed the tail call optimization problem there.

That said, the current execute() function accepts a pc argument, which specifies which Arc function it begins with. It may be possible to pass the pc to go to together with a new stack, but I don't know pthreads.

-----

2 points by stefano 6307 days ago | link

I think you could trust every decent C compiler to do tail recursion optimization, but if you want full compatibility then the mapping Arc fun -> C fun doesn't work. pthread requires you to pass a pointer to a C function i.e. an adress where to jump. We could wrap every thread within a C function and leave all the other functions as they are currently implemented. The C function would just call execute() with the right parameters. If execute() doesn't use global vars, there will not be race condition problems.

-----

2 points by almkglor 6307 days ago | link

Well, I've started to import bits of execute from globals to locals. However I do have access to a few bits of global variables, specifically the quoted-constants array (those created by 'foo and '(a b c d), etc.); this table is initialized at startup (before the first call to execute). I would suppose this read-only array would be okay to access?

As for wrapping them in C functions: the problem is that the most basic Arc threading function isn't 'thread, it's 'new-thread, which accepts a function as input. 'thread is defined as:

  (mac thread body
    `(new-thread (fn () ,@body)))

In theory, new-thread could be called with any function:

  (new-thread
   (if
    (something)
      (fn () (ha-ha-ha))
    (something-else)
      (fn () (he-he-he))
      (fn () (je-je-je))))

Sure, it won't happen most of the time, since most people will sensibly use the simpler 'thread, but exploratory, exploratory...

It would be possible to implement if pthreads or whatnot can pass even just a single pointer to the newly-threaded function, but if it can't (why not?) then our alternative is to create a bunch of C functions for each Arc function, which just calls the execute() function with the correct number.

---------

Edit: okay, I did a little eensy-weensy bit of research on pthreads, and it seems that pthreads can pass a pointer to the called C function.

So I suppose we can modify execute's signature:

  obj execute(long pc, obj stack[], obj *sp);
  int main(void){
    /*init...*/
    obj *stack = GC_MALLOC(sizeof(obj)*MAX_STACK);
    obj *sp = stack;
    execute(0, stack, sp);
  }

Then the NEWTHREAD() primitive would be:

  typedef struct{
    long pc;
    obj *stack;
    obj *sp;
  } newthreadpasser;
  #define T_THREAD 9999 //some random number
  typedef struct{
    long type; /*T_THREAD*/
    pthread_t val;
  } thread;
  #define NEWTHREAD() \
   {obj *newstack = GC_MALLOC(sizeof(obj) * MAX_STACK);\
    obj *newsp = &newstack[2]; \
    newstack[0] = POP(); \
    newstack[1] = THREAD_EXIT_CONTINUATION;\
    newthreadpasser * np = GC_MALLOC(sizeof(newthreadpasser));\
    np->pc = ((obj*)newstack[0])[0];\
    np->stack = newstack;\
    np->sp = newsp;\
    thread * threadobj = GC_MALLOC(sizeof(thread));\
    threadobj->type = T_THREAD;\
    pthread_create(&(threadobj->val), NULL, threadcall, (void*) np)\
    PUSH((obj)threadobj);}

Finally, threadcall:

  void *threadcall(void * args){
    newthreadpasser * np = (newthreadpasser *) args;
    execute(np->pc, np->stack, np->sp);
  }

Hmm.

-----

2 points by stefano 6307 days ago | link

This could work. As for global variables, if they are read-only, then there won't be any problem. What happens if you load in parallel two different modules (with their constants)? Maybe loading a file should be made atomic, or at least a part of it, such as constant values initialization.

-----

1 point by almkglor 6306 days ago | link

Well, a "load" at top level simply inserts the code for the loaded file directly into the source for now. Any other load statement is not supported.

-----

1 point by kranerup 6304 days ago | link

Using pthreads might have some benefits but speed is not a reason for choosing pthreads. On Linux systems that maps pthreads onto kernel threads I've seen a two order of magnitude slower execution compared to an user space thread library. In general I'd say that if I/O is important then using pthreads would make sense but if speed is important then green threads is preferable.

-----

2 points by stefano 6304 days ago | link

I thought pthreads were fast, but anyway on computers with more than one core/processor, which are very common, green threads don't distribute the threads between the cores, because this is the OS's duty, and it doesn't know anything about green threads.

-----

3 points by almkglor 6304 days ago | link

Hmm. After a bit of research, it seems that some VM's handle this by using pthreads a few times (presumably one per core), then having each pthread spawn several green threads.

-----

2 points by stefano 6304 days ago | link

This seems a good solution if we can arrange to spawn a new pthread also for blocking I/O, maybe by catching I/O functions and putting into a pthread the I/O operation caller. This hybrid would make the compiler code not very clean, because it should handle both green threads and pthreads.

-----