Arc Forumnew | comments | leaders | submitlogin
Deeper into the guts of macroexpansion
2 points by akkartik 3854 days ago | 12 comments
This is going to be a long, mind-bending story about macro-generating macros. Grab a chair.

I've been interested in generic functions for a long time, the ability to say:

  def (len x) :case (isa queue x)
    ..
(The syntax is irrelevant, though I like this form. The crux is to be able to insert behaviors to names in multiple places, allowing us to separate concerns. I can define len in one place, and then extend it for some new case far away in space and time. Inspired by https://en.wikipedia.org/wiki/Ruby_%28programming_language%29#Open_classes)

---

At some point I realized that I could also similarly extend macros (http://arclanguage.org/item?id=13790; http://arclanguage.org/item?id=17233):

  mac (with x ... rest) :case (x = 'infile)
    ..

  with infile "x"
    (read)
  => 123  # reads from file "x"
The way this works is that this:

  mac (name params) :case cond
    ..
expands to this:

  let $old name
    mac (name params)
      if cond
        ..
        `(,$old ,params)
Got all that? Good. No? Skip the rest and leave a question.

---

So things stood for a long long time (since http://arclanguage.org/item?id=13790). But then a few days ago I was trying to build a data structure that needed to override a mutator:

  mac (push x s) :case (isa stack s)
    `(push ,x (rep ,s))
And it just didn't work. Eventually I figured out the problem: I'd setup conditions to evaluate at macroexpansion time, when only the names of args are available, not values. Basically, the above expression was expanding to this:

  let $old push
    mac (push x s)
      if (isa stack s)
        `(push ,x (rep ,s))
        `(,$old ,x ,s)
But I needed it to expand to this:

  let $old push
    mac (push x s)
      `(if (isa stack ,s)
         (push ,x (rep ,s))
         (,$old ,x ,s))
In other words, I needed to implicitly generate a backquote. So I spent several hours trying to make this work, with weird syntax where backquotes didn't match unquotes:

  mac (push x s) :case (isa stack ,s)
    (push ,x (rep ,s))
..but something would always break. Eventually I realized why: I need room to operate on the names before starting on the backquote. Imagine something like this:

  mac (foo n)
    with (a ..
          b ..)
      `(bar ,a ,b)
You can't handle this if you're implicitly generating the backquote. You need to control what happens at macroexpansion time and what happens at eval time.

---

Ok, so what can we try next? Perhaps this means push shouldn't be a macro? Hey, wart has selective evaluation. Do mutation operators really need to be macros? What if I changed the original definition of push from this:

  mac (push x seq)
    `(<- ,seq (cons ,x ,seq))
to this:

  def (push x ('name | seq))
    (<- name (cons x seq))
(Wart has the '|' operator for haskell-style as-expressions, so I can here use both the name of the second arg and its value: http://arclanguage.org/item?id=16970)

But no, that doesn't work because the binding to name doesn't exist in this lexical scope. We do need a macro.

---

Ok, next idea: two backquoted expressions. I'm gonna perform extricate the backquote in:

  let $old push
    mac (push x s)
      `(if (isa stack ,s)
         (push ,x (rep ,s))
         (,$old ,x ,s))
Macros aren't primitives in wart. Inspired by Kernel, they're built out of eval and caller_scope. So this:

  mac (foo n)
    `(+ ,n 1)
is identical to this:

  foo <- (fn (n) (eval `(+ ,n 1) caller_scope))
I'm gonna pull this transform into the condition expression above, so that push now code-generates to:

  let $old push
    mac (push x s)
      if (eval `(isa stack ,s) caller_scope)
        `(push ,x (rep ,s))
        `(,$old ,x ,s))
Now the backquotes and quotes all line up, suggesting we might be on the right track:

  mac (push x s) :qcase `(isa stack ,s)
    `(push ,x (rep ,s))
:qcase is my ad hoc name for "eval-time case that needs to operate on the values of args".

---

But that isn't quite right; there's one last wrinkle. The caller_scope inside the macro body is different from the caller_scope after returning from the macro. The expansion for mac is:

  <- ,name (fn ',params
             (eval ((fn() ,@body))
                   caller_scope))
The eval and the fn seem to overlay their own values to caller_scope, which we can tunnel through by:

  ((caller_scope 'caller_scope) 'caller_scope)
(Scopes are just tables keyed by symbol names.)

So the final expansion is:

  let $old push
    mac (push x s)
      if (eval `(isa stack ,s) (macro_caller_scope))  # expression above
        `(push ,x (rep ,s))
        `(,$old ,x ,s))
---

So what do people think of all this? Hack piled on hack, or useful? I'm still not convinced I fully understand why eval and fn each add an indirection to caller_scope. Fortunately the definition of (macro_caller_scope) seems stable. If it needed to change everytime I redefined mac, that would suck. (Yes, I extend mac just like any other macro. Several times.) That it's stable increases confidence that there isn't some further use case I haven't considered that'll bring this whole edifice crumbling down.

But even if there isn't, I'm starting to question if lisp is indeed the hundred-year programming model. I'm starting to feel at the end of my tether, at the limits of what macroexpansion can cleanly support.

If you want to read more, here's how I extend mac: https://github.com/akkartik/wart/blob/5b093c852c/047generic.wart. As always, if you want to play with these code snippets:

  $ git clone http://github.com/akkartik/wart
  $ cd wart
  $ ./wart
Feel free to ask me more questions, either here or over email (see my profile).


2 points by rocketnia 3852 days ago | link

Where you said this...

  foo <- (fn (n) (eval `(+ ,n 1) caller_scope))
...I think you meant to say this, with an unevaluated argument list and another (fn ...) form:

  foo <- (fn '(n) (eval ((fn () `(+ ,n 1))) caller_scope))
That said, it's okay to drop the ((fn () ...)) in this case, since the code inside doesn't refer to caller_scope.

---

"I'm still not convinced I fully understand why eval and fn each add an indirection to caller_scope ."

(For the moment, I'm going to overlook eval here.)

That's just how you chose to build it, right? Isn't the whole point of caller_scope that it's specific to the current call?

I think the main issue you're encountering is that you always use the same name, "caller_scope", resulting in shadowing. Kernel uses an explicit parameter, as in ($vau () env ...), but Wart doesn't. This isn't necessarily a problem, because you can do something like this to work around the shadowing when necessary:

  (fn (x)
    (let my_scope caller_scope
      (fn (y)
        ...)))
If you actually want to call a variable "caller_scope" even though it's not the current caller scope, you might have trouble with that, but you do have a couple of options.

I think this code will evaluate the ... as though it's just inside the (fn (x) ...), so that it can see the correct binding of caller_scope:

  (fn (x)
    (let fnx_scope ((fn () caller_scope))
      (fn (y)
        (eval '... fnx_scope))))
Unfortunately, the ... can't see the binding of y.

If you tweak the language a bit, you could make the implicit caller_scope parameter get shadowed by the explicit parameters. That is, if a function is of the form (fn (caller_scope) ...), you could let it ignore its implicit caller_scope binding altogether. (Currently Wart treats this just like any other duplicate argument name, and it throws an error.) In that case, you could use this technique:

  (fn (x)
    (let my_scope caller_scope
      (fn (y)
        (let caller_scope my_scope
          ...))))
Unfortunately, the ... can see the binding of my_scope. You might be able to get around this with a gensym.

---

"I'm still not convinced I fully understand why eval and fn each add an indirection to caller_scope ."

(Now to tackle eval.)

Are you sure 'eval adds a layer of indirection? I think the only indirection you're encountering is caused by 'fn. I just tried it out in Wart to make sure:

  ((fn () ((fn () (caller_scope = caller_scope)))))
  => ''1
  ((fn () ((fn () (caller_scope = (eval 'caller_scope caller_scope))))))
  => nil
  ((fn () ((fn () (caller_scope = ((fn () (eval 'caller_scope caller_scope))))))))
  => ''1
The first ((fn () ...)) layer establishes a binding for caller_scope, but it's an empty environment. The second ((fn () ...)) layer establishes another binding for caller_scope, so now caller_scope is bound to a first-class environment with a single variable in it named "caller_scope".

Within this second layer, we get a true value for (caller_scope = caller_scope). This is just a sanity check.

We false value for (caller_scope = (eval 'caller_scope caller_scope)), presumably because we're comparing the two different environments we've established.

Finally, it seems we can successfully use ((fn () (eval '<foo> caller_scope)))) in place of <foo>, because although it introduces a new environment, it evals in the old one.

Here's another way we can verify that eval doesn't introduce its own binding for caller_scope:

  ((fn () (eval 'caller_scope caller_scope)))
  020lookup.cc:28 no binding for caller_scope
  => nil
  ((fn () (eval '2 caller_scope)))
  => 2
This all seems reasonable to me.

---

"But I needed it to expand to this:"

  let $old push
    mac (push x s)
      `(if (isa stack ,s)
         (push ,x (rep ,s))
         (,$old ,x ,s))
I think you almost got it there. What you really needed was this:

  mac (push x s) :case `(isa stack ,s)
    `(push ,x (rep ,s))
  
  -->
  
  let $old push
    mac (push x s)
      `(if ,`(isa stack ,s)
         ,`(push ,x (rep ,s))
         (,$old ,x ,s))
The difference is the addition of two groups of ,` heiroglyphics.

In between the , and the ` you have "room to operate on the names before starting on the backquote":

  mac (foo n) :case (let a .. `(isa foo-type ,a))
    with (a ..
          b ..)
      `(bar ,a ,b)
  
  -->
  
  let $old foo
    mac (foo n)
      `(if ,(let a .. `(isa foo-type ,n))
         ,(with (a ..
                 b ..)
            `(bar ,a ,b))
         (,$old ,n))
Since this approach would work for Arc-style (compile time) macros, I think it's what you were really looking for. Your "final expansion" branches on run time information before generating the code, so it's specific to fexprs.

---

"But even if there isn't, I'm starting to question if lisp is indeed the hundred-year programming model. I'm starting to feel at the end of my tether, at the limits of what macroexpansion can cleanly support."

False alarm! I'm pretty sure this was just a case of being rusty at one's own programming language. ^_-

-----

1 point by akkartik 3851 days ago | link

Thanks rocketnia! I'm still trying to get mac to generate the comma-backquote expressions, but I'll report back.

(Thanks for finding the typo in foo!)

-----

1 point by akkartik 3844 days ago | link

I've been playing with this for a few days, and I'm still stuck on how to modify this decl:

  (def foo (a) `(list ,a))
so that:

  arc> (foo 3)
  (list 3)
instead returns (list ,3). How can you generate a literal unquote? Perhaps this is a bug in wart's core? Perhaps it's impossible to do if commas don't expand to unquote?

-----

2 points by rocketnia 3842 days ago | link

Sorry, that toy example is too small for me. I don't understand why this macro redefinition approach would lead to that snag (although it does look like a plausible snag to end up in ^_^ ). Could you give some more context?

---

"Perhaps it's impossible to do if commas don't expand to unquote ?"

From what you're saying, it sounds like you're using some custom data structures, but that you just haven't given yourself all the primitives you need to manipulate them.

  > foo <- (make-unquote 12)
  ,12
  
  > (unquote? foo)
  t
  
  > (unquote-get foo)
  12
  
  > foo <- (make-quasiquote 12)
  `12
  
  > (quasiquote? foo)
  t
  
  > (quasiquote-get foo)
  12
Pardon the pseudo-Wart, pseudo-Arc, pseudo-Scheme. XD

-----

1 point by akkartik 3842 days ago | link

Ah, yes I can manipulate commas like that. I was wondering if there's a quasiquote/unquote idiom to handle them :)

-----

1 point by akkartik 3836 days ago | link

Ok, I think I've found a testcase proving that putting the if within backquotes won't work.

  # Simplified expansion of:
  #   mac (foo x)
  #     `(+ ,x 1)
  #
  #   mac (foo x) :case nil
  #     `(list ,car.x)

  mac (foo x)
    `(if nil
       ,`(list ,(car x))
       ,`(+ ,x 1))

  (foo 3)
The problem here is that both branches of the if will be macroexpanded, throwing an unnecessary error for the call to car.

So it seems we do need the if to be outside backquotes, and my subsequent hacks :/ Do you see any holes in this argument?

-----

2 points by rocketnia 3836 days ago | link

First of all, I'm noticing your expansion is suspicious in a couple of ways.

The case itself should be a piece of generated code:

  -#   mac (foo x) :case nil
  +#   mac (foo x) :case `nil
  
  -  `(if nil
  +  `(if ,`nil
I made this same change back in http://arclanguage.org/item?id=18050 where I said "What you really needed was this." It might have looked like I was only talking about the expansion result, but the usage syntax changes too.

Also, this seems to be unrelated to the issue you're having, but I just realized you need to be careful about variable scope. What if the original definition said "mac (foo x)" but the extension said "mac (foo y)"? I recommend putting in a form that binds the right variables:

  -  `(if ,`nil
  +  `(if ,(with (x x)
  +          `nil)
  -    ,`(list ,(car x))
  +    ,(with (x x)
  +       `(list ,(car x)))
       ,`(+ ,x 1))
(Careful though, because this (with ...) form will add a layer of indirection to caller_scope.)

---

To address the issue you're encountering, yes, every case will expand at every call site, even if there's no intuitive need for it to expand there. It can't branch on a decision that hasn't been made until run time, unless you're willing to use the full power of fexprs. (I assume you're trying to maintain a semblance of stage separation, or you wouldn't distinguish macros from functions.)

Keeping this stuff in mind, you can write your code so that it doesn't give you that error:

  # Simplified expansion of:
  #   mac (foo x)
  #     `(+ ,x 1)
  #
  #   mac (foo x) :case `nil
  #     if acons.x
  #       `(list ,car.x)
  #       `(err "I hope this code will never be reached.")

  mac (foo x)
    `(if ,(with (x x)
            `nil)
       ,(with (x x)
          (if (acons x)
           `(list ,(car x))
           `(err "I hope this code will never be reached.")))
       ,`(+ ,x 1))

-----

2 points by rocketnia 3836 days ago | link

"What if the original definition said "mac (foo x)" but the extension said "mac (foo y)"? I recommend putting in a form that binds the right variables"

Whoops, that (with ...) stuff isn't ideal because it leaves the outer variables in scope. That is, if the original definition has a parameter called "list", an extension will see that binding even if it uses a different name for the same parameter, so it might try to call the global 'list and be surprised.

I suppose I recommend doing this instead, where g123-args is a gensym:

  mac (foo . g123-args)
    `(if ,(apply (fn (x)
                   `nil)
                 g123-args)
       ,(apply (fn (x)
                 (if (acons x)
                   `(list ,(car x))
                   `(err "I hope this code will never be reached.")))
               g123-args)
       ,(apply (fn (x)
                 `(+ ,x 1))
               g123-args))
Er, and there's one more change I should probably make here so you can actually use this for what you're trying to do.

  let g1-old foo
    (mac (foo . g2-args)
      `(if ,(apply (fn (x)
                     `nil)
                   g2-args)
         ,(apply (fn (x)
                   (if (acons x)
                     `(list ,(car x))
                     `(err "I hope this code will never be reached.")))
                 g2-args)
         
         (,g1-old ,@g2-args)
         ; or in Arc, ",(apply rep.g1-old g2-args)"
         
         )

-----

2 points by rocketnia 3852 days ago | link

"Perhaps this means push shouldn't be a macro?"

In Arc, my immediate approach would be to define 'push as a macro, but also to give it a procedure version that's just a little more verbose. Then we can extend the procedure version and leave the macro version alone.

Here's some working Arc 3.1 code, which calls the macro "my-push", rather than overwriting Arc's own 'push:

  (mac place (p)
    (w/uniq new-val
     `(make-place (fn () ,p) (fn (,new-val) (= ,p ,new-val)))))
  
  (def make-place (getter setter)
    (annotate 'place (list getter setter)))
  
  (def place-get (place)
    (rep.place.0))
  
  (def place-set (place new-val)
    rep.place.1.new-val)
  
  (defset place-get (place)
    (w/uniq g-place
      `((,g-place ,place)
        (place-get ,g-place)
        [place-set ,g-place _])))
  
  
  (mac my-push (elem stack)
    `(fn-push ,elem (place ,stack)))
  
  (def fn-push (elem stack)
    (zap [cons elem _] place-get.stack))
Here's a Gist of this code: https://gist.github.com/rocketnia/6848219

-----

2 points by fallintothis 3853 days ago | link

This is interesting, but my first reaction is that push wouldn't necessarily need to be a macro at all with the right reference-passing/modifying semantics. For instance, Arc's scar & scdr manipulate the contents at an address:

  ; Won't work on nil, whatever.  Just a case in point.

  (def my-push (x addr)
    (let old-car (car addr)
      (scar addr x)
      (scdr addr (cons old-car (cdr addr)))
      addr))

  arc> (= xs (= ys '(1 2 3)))
  (1 2 3)
  arc> (my-push 4 ys)
  (4 1 2 3)
  arc> ys
  (4 1 2 3)
  arc> xs
  (4 1 2 3)

  ; versus

  arc> (= xs (= ys '(1 2 3)))
  (1 2 3)
  arc> (push 4 ys)
  (4 1 2 3)
  arc> ys
  (4 1 2 3)
  arc> xs
  (1 2 3)
Coming up with a simultaneously clean, terse, and simple pointer model is a different exercise, I suppose. And not one I'm particularly qualified to participate in (I'm no C guru).

Have you found any other use-cases for this qcase idea?

-----

2 points by akkartik 3853 days ago | link

Thanks for that idea! No I don't have a second use case that requires dispatching on the value of macro args. I'll keep an eye out and report back.

I'm a little leery of using pointer-based semantics. Probably because of my C background :p Your analysis was very thorough, and I like that push doesn't silently modify the values of other variables besides the one requested. I hadn't noticed that subtlety before. Yet another reason to avoid scar and co.

-----

2 points by fallintothis 3852 days ago | link

I'm a little leery of using pointer-based semantics. Probably because of my C background :p

Yeah, it's an interesting problem. One that I should probably understand better... I mean, I "get" it: the RAM machine model is simple. But I've not programmed enough C (or similar) to understand how far the semantics would permeate day-to-day programs.

There's the part of me that's like "oh, just have primitives to reference & dereference addresses; easy". I seem to recall Lisps that use something like (ref x) and (deref x), and the explicitness pleases me, coming from "modern" languages where you have to remember odd rules about whether/where things are passed as pointers. But then I read C code and see all these pointers-to-pointers all over the place---in spots of code where I normally wouldn't think of pointers at all. Then again, that might be endemic to C, where you're probably just being destructive by default.

I've always wondered how ML works out in this respect (http://homepages.inf.ed.ac.uk/stg/NOTES/node83.html). Its system seems elegant on first glance, but I have no experience in it. I seem to keep finding myself regretting this lately...I should pick up a project to scratch my ML-learning itch.

-----