Hygienic macros implemented in Anarki branch
7 points by rntz 5856 days ago | 2 comments
AN EXAMPLE:

I've written an extension to anarki which implements "semi-hygienic quasiquoting". It uses #` and #, and #,@ in place of the usual ` and , and ,@ syntax. I will explain the precise semantic differences below, but an example should illustrate them best. Consider the following code:

    (def square (x) (* x x))

    (mac dirty (x) `(square ,x))
    (mac also-dirty (x) `(square #,x))
    (mac broken (x) #`(square ,x))
    (mac clean (x) #`(square #,x))

    (mac dirty2 (x)
      (w/uniq xs
        `(let ,xs ,x
           (* ,xs ,xs))))

    (mac clean2 (x)
      (w/uniq xs
        #`(let ,xs #,x
            (* ,xs ,xs))))
Now suppose we load the above code, and try to break it via shadowing the names used in the macroexpansions:

    arc> (let square 2 (dirty square))
    Error: "Function call on inappropriate object 2 (2)"
    arc> (let square 2 (also-dirty square))
    Error: "Function call on inappropriate object 2 (2)"
    arc> (let x 2 (broken x))
    Error: "reference to undefined identifier: __x"
    arc> (let square 2 (clean square))
    4
    arc> (let * 2 (dirty2 *))
    Error: "Function call on inappropriate object 2 (2 2)"
    arc> (let * 2 (clean2 *))
    4
As you can see, 'clean and 'clean2 are properly hygienic under this system. Note that 'clean2 requires the use of 'w/uniq; this is why I call this "semi-hygienic" - it doesn't solve local variable shadowing, only global variable shadowing. I plan on implementing local variable uniqing eventually, but this would require much more significant changes to ac.scm.
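
To see why 'dirty2 still fails even though it uses 'w/uniq, it helps to write its expansion out by hand. In the sketch below, gs1 is a made-up stand-in for whatever symbol 'w/uniq would actually generate.

```arc
; Hand-written expansion of (let * 2 (dirty2 *)).
; gs1 is a placeholder for the symbol w/uniq would generate.
(let * 2
  (let gs1 *        ; gs1 is 2; the uniq'd name cannot clash with anything
    (* gs1 gs1)))   ; but * now names the local 2, so this call fails
```

The uniq'd name protects the temporary variable, but nothing protects the reference to * that the macro itself introduces; that reference is what the hygienic quasiquote fixes in 'clean2.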

IN-DEPTH SEMANTICS:

Firstly, the syntactic sugar I use (#` #, #,@) for hygienic quasiquote and friends is borrowed from mzscheme, which expands them to 'quasisyntax, 'unsyntax, and 'unsyntax-splicing respectively. I have no idea what these things mean in mzscheme; I just found it convenient to hijack them.

What hygienic quasiquoting (hqq'ing) does is change the lexical environment under which code is compiled (that is, translated to mzscheme code). Something which is hqq'ed is compiled using the lexical environment in which the hqq'ed expression was encountered - namely, the macro. However, expressions unquoted into it via "#," or "#,@" are compiled in the lexical environment of whatever called the macro. This is tracked by adding some magic (specifically, mzscheme parameters) in ac.scm.

This changing of lexical environment is accomplished by adding a new kind of code: syntactic closures, an idea borrowed from MIT scheme (but greatly simplified). A syntactic closure is a three-element vector of the form #3(closure <env> <expr>). When the arc compiler sees such an object, it compiles <expr> in <env> instead of the current environment. What "#`" does is put the quasiquotation of its contents into a closure with an empty environment, so it is compiled at the toplevel. What "#," does is put the unquotation of its contents into a closure with the environment from which the most recent macro was called (if it's not in a macro, then it uses an empty environment). "#,@" behaves analogously.
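
As a rough picture of the data involved, the expansion of (clean square) from the example, called where square is a local variable, might look like the following. This is written by hand and is schematic only: it assumes an environment is represented as a list of the locally bound names, which is an assumption and not something stated above.

```arc
; Schematic only, not actual output from the hygiene branch.
; Assumes <env> is a list of locally bound names.

; Outer closure comes from #`: empty env, so square means the global function.
; Inner closure comes from #,x: the caller's env, so square means the local 2.
#3(closure ()
   (square #3(closure (square) square)))
```

Read this way, the call is the global 'square applied to the local value 2, which agrees with the result 4 shown in the example.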

From these semantics, the reason why the sample code given above behaves as it does should be clear with some thought. Moreover, a few limitations become apparent. Semi-hygienic quasiquoting cannot be used to implement 'iflet or similar macros, for example, because the unquoted expressions will be compiled in the lexical environment the macro was called in, hence without the extra variable binding iflet introduces. This is a very significant limitation.
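
For instance, a semi-hygienic attempt at 'iflet might look like the sketch below. The name 'bad-iflet is hypothetical, and the behaviour described in the comments is inferred from the semantics above, not taken from a recorded session.

```arc
; Hypothetical attempt; needs the hygiene branch for #` #, and #,@.
(mac bad-iflet (var expr then . rest)
  (w/uniq temp
    #`(let ,temp #,expr
        (if ,temp
            ; var is bound by this let, inside the macro's own closure ...
            (let ,var ,temp
              ; ... but then is compiled in the caller's environment,
              ; where var has no local binding.
              #,then)
            #,@rest))))
```

Under these assumptions, a reference to var inside the then-branch would not see the binding the macro introduces.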

To fix this limitation would require tracking not just the environment a syntactic closure originated in, but any modifications which "logically belong" to that environment, so that these can be merged in when the closure is compiled. Both this and auto-uniqing of local variables require examining the code compiled inside a quasiquote in-depth, which I plan on implementing in future.

USING THE HYGIENE BRANCH:

The implementation of the above is available on anarki in a separate branch, named "hygiene". In order to use it, run:

    git checkout --track -b hygiene origin/hygiene
After doing this, you will be on the hygiene branch. You should only need to run the above once. If you want to go back to the master branch, run "git checkout master"; if you want to go again to the hygiene branch, run "git checkout hygiene". For more information on branching in git, read the docs (http://git.or.cz/#documentation).

I believe my changes are totally backwards-compatible, but I have not tested this thoroughly, so I am hesitant to commit them to the master branch. I also plan on eventually extending them to implement fully hygienic macros, which will require significant (but still ideally backwards-compatible) changes to ac.scm.

I plan on tracking the master branch - this is not a fork, it is an extension. However, as I only occasionally push/pull from anarki, don't expect the branch to be day-by-day up-to-date with the master branch. Feel free to merge changes into the hygiene branch yourself and push to anarki if you so desire, but please don't split the codebase.



2 points by rntz 5837 days ago | link

Fully hygienic macros have now been implemented and pushed to the hygiene branch - w/uniq is no longer necessary. Moreover, this hygiene can be avoided, which makes possible the writing of anaphoric and other macros which deliberately make use of variable capture. All that is necessary is to unsyntax (#,) the variable which is to be captured. 'iflet neatly demonstrates both automatic avoidance of variable capture and the ability to deliberately capture:

    (mac iflet (var expr then . rest)
      #`(let temp #,expr
          (if temp (let #,var temp #,then) #,@rest)))

    (mac aif (expr . body)
      #`(let #,'it #,expr
          ,(if (cddr body)
             `(if #,'it #,(car body) (aif #,@(cdr body)))
             `(if #,'it #,@body))))
Admittedly, in the case of 'aif at least, the "hygienic" way of doing things doesn't give you any more safety than doing it with normal quasiquotes would. However, nothing prevents you from using normal quasiquoting in this case; arc now has the best of both worlds, as it were.
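
For comparison, an anaphoric 'aif written with only ordinary quasiquotation can be sketched as below. It is called 'my-aif here (a made-up name) so that loading it does not clobber the existing definition.

```arc
; Plain-quasiquote sketch; my-aif is a hypothetical name.
; The symbol it is captured on purpose, so no hygiene is wanted here.
(mac my-aif (expr . body)
  `(let it ,expr
     (if it
         ,@(if (cddr body)
               ; more than two forms left: treat the rest as further clauses
               `(,(car body) (my-aif ,@(cdr body)))
               ; otherwise body is just a then-branch and optional else-branch
               body))))

; Usage: it refers to the value of the test expression.
(my-aif (+ 1 2) (* it 10))
```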

-----

2 points by shader 5855 days ago | link

Interesting idea on using branches to keep development separate from the main trunk. After all, that's what they're for ;)

Maybe more of our things should be done that way?

Besides, it's easy to see a list of remote branches with "git branch -r". If only each branch could have a description.

-----