Arc Forumnew | comments | leaders | submitlogin
2 points by binx 6077 days ago | link | parent

Just leave the peephole stuff to gcc, it almost always does better than handcoded optimizer.

The CPS transformed code can be arbitrarily inlined, so a simple inliner without flow analysis can give you much efficiency for free.



2 points by almkglor 6077 days ago | link

And if the target isn't gcc?

For that matter my concern is with the expansion of PUSH and POP:

   PUSH(x); y = POP();

   =>

   *sp++ = x; y = *--sp;
Can gcc peephole the above, completely eliminating the assignment to the stack (which is unnecessary in our case after all)?

   y = x; //desired target
Without somehow informing gcc to the contrary, gcc will assume that writing to * sp is significant, even though in our "higher-level" view the assignment to * sp is secondary to transferring the data between two locations.

-----

4 points by sacado 6077 days ago | link

Actually, I tried the above (tuning generated code so as to change something like :

  BEGIN_JUMP(3); PUSH(LOCAL(5)); PUSH(LOCAL(6)); PUSH(LOCAL(7)); END_JUMP(3);
to its semantic but obviously much faster equivalent :

  memcpy (stack, stack + 5, sizeof(obj) * 3); sp = stack + 3; END_JUMP(3);
and

  PUSH(x); if(POP)
to

  if (x)
Well, with full optimizations on gcc (-O3), it doesn't change anything (at least in execution time, I didn't compare generated machine codes). Wow, gcc seems really clever. Now that I know how hard it is to implement a compiler, I can only applaud :)

-----

1 point by almkglor 6077 days ago | link

WOW. gcc must be real good then ^^.

-----