It's possible to record the fact that some cons cells (or other structures) are identical. The Lisp reader's #n= notation can be used for this, although usually it's only used to print circular lists. Common Lisp's print-circle can be used to demonstrate this:
Now, if you have a running process, and you want it to seralize a closure (or, generally, an object) that points to a cell that happens to be identical to something else in the process, and you want to read the closure back in and have its pointed-to cell continue to be identical to that other thing... you'll need something more sophisticated. If each object is given some unique id, that would be one approach; that sounds like the mailboxes thing. Another approach would be... say, if the other cell is the 5th element of a list identified with the global name 'blah, then conceivably that could be a good syntax. This might be useful for modules, separate compilation, and/or saving arbitrary process state.
Then, well, you could conceivably change the semantics of "env" as it's passed around in ac.scm. Currently it's just a list of variables that are bound, and things test for whether a variable is present in that list. You could change it to a list of (variable-name macro-it's-bound-to-if-any), and have the special form (let-macro name arglist bodexpr . body) insert `(,name (fn ,arglist ,bodexpr) into env, while everything else puts in (variable nil), and change all the existing tests on "env" to search for "a list whose car is x" rather than "x", and lastly make ac-call call the macro-function on the expression if it finds one in the lexenv.
In theory, one could put arbitrarily complicated information, such as about deduced types of variables, into this "env" mapping, and implement some amount of compiler optimization that way.
First-class macros, of course, are the semantically nicest approach, but more difficult to compile.
It should be possible to get the continuation from the point of failure and the dynamic variables from the failing thread (basically: the stack), the same information from any other running threads, and the set of global variables (this at least can be gotten with (namespace-mapped-symbols)), and trace the graph of reachable objects from there, and serialize it all to a file. I don't know if Racket provides the ability to do all that, though; for one thing, I don't know if there's a way to access the variables saved in a closure (from outside the closure).[1] (Maybe using unsafe operations could do that.) Since tracing the graph of objects is exactly what a GC does, and a proper moving GC has to be able to learn the type of every object and where all the pointers are, it must have that functionality, whether or not it's exposed. (I think it should be exposed, of course.)
Barring that, it's possible that the gdbdump rocketnia points at is the easiest way to do it in Racket.
Also, I guess if you're using the FFI at all (which, say, any GUI program would do), then you do need the full core dump if you want to get the state of the C libraries you're using.
The article uses patterns as an example of a non-expression syntax. Another would be what might be called access expressions that setf can use. And CL does have defsetf (as well as define-modify-macro and such) to extend it (as does Arc with = and defset).
Note that I say "access expression", though—it's designed such that (setf (car x) ...) will modify what (car x) subsequently would return (assuming that "..." doesn't rebind x). But there's also a symmetry in the pattern-matching stuff: if (cons x y) created an object, then the pattern (cons x y) will destructure the object and bind x and y to what they originally were. Perhaps that's simply good practice in designing new syntaxes.
With interpreter semantics, in which a macro gets expanded anew every time a function is called, the late binding comes for free. ;-) Then, if you want the runtime performance that comes from compilation, you optimize for the case where people are not redefining functions, and invalidate old compilation results when they do. I think that rough plan should be doable, though I haven't gotten around to implementing enough of a system to say how well it works. But I think that's the only way to get anything close to good performance in Javascript VMs (not that they expand macros, but I expect they inline function calls and such, which requires similar assumptions about global definitions), and it seems to have been done.
For separate compilation, it does seem clear that what gets serialized will be references like "the object [probably a function] named 'foo in module bar", and structures (s-expressions or otherwise) containing such references. Given that compilation implies macroexpansion, you do have to assume (or verify) that the macros from other modules are what they used to be—and that non-macros (used in functional position at least) are still non-macros. If you have a full-blown Makefile kind of build system, then by default I suppose every file's output depends on the contents of every other file that it uses; or, as an optimization, depends merely on the exact set and definitions of macros exposed from those files. (In the C++ system I encounter at work, code is separated into .cpp and .h files, and editing a .h file causes the recompilation of every .cpp file that recursively depends on it, but editing a .cpp file only causes its own recompilation. If you wanted to imitate that, I guess you'd put macros into a distinctively named set of files, and forbid exportable macros anywhere else.)
---
Thanks! I've sold out and have been working for a medium-sized company doing mostly C++ and bash (the latter is unbelievably useful) for the past 3.5 years. I make intermittent progress on the side doing other things.
> To handle special forms, a solution is to come up with a globally unique random prefix long enough that no one could generate it accidentally (e.g. "xVrP8JItk2Ot"), and then rename the special forms `assign`, `fn`, etc. to `xVrP8JItk2Ot-assign`, `xVrP8JItk2Ot-fn` etc;
Would this prefix be present in the source files of the language implementation? Or generated at runtime, or something else? If the latter, it seems like gensyms are the right approach. If the former, what's the use case—someone writing programs that assume someone might have redefined "assign" and want to work with the builtin? (If it's just hacking arc3.1 to do what we want, I think you can create more or less arbitrary unique objects (vectors, cons cells, a new kind of struct), give them bindings in the base Arc environment, and replace e.g. '(eq? (xcar s) 'quote) with '(eq? (xcar s) the-quote-object).)
Speaking of gensyms, I had a plan recently. I like PG's approach of just sequentially named variables, but (a) my ideal language would probably expose its interned-symbol table, and it'd therefore be more orthogonal and simpler for it to directly expose the "create a symbol without interning it" function, so uninterned symbols should be available for free; (b) it is possible that people will name a variable "gs1", so that's another reason to use true gensyms; (c) many macros use gensyms, but I might like to bootstrap a system that likes macros the expansion of which incurs zero side effects, and incrementing a gensym counter is a side effect. For (c) there might be another approach, but I came up with this one: Have the printer assign sequential numbers to each uninterned symbol it encounters (with a weak hash table). This has the nice effect that, when you do print macroexpansions, the numbers you see will begin at 1, rather than in the hundreds or thousands and varying depending on how many libraries have been loaded.
> (Making `fn` a macro also has the advantage that features such as default arguments can be implemented in Arc).
Yes, that is nice. I actually did this in my Lisp in Summer Projects submission[1] many years ago.
> is there a reason to use different prefixes for macros and functions? We could have a read syntax which evaluated to the value of the symbol (whatever it is), and that would work for both functions and macros.
Eh, no strong reason. I'd thought about doing a reverse variable lookup for every value and printing the variable name if it existed, but it makes less sense if e.g. the value 10 gets printed as #V:<some global variable that happened to be bound to 10>. Also, I figured an IDE-type setup (such as DrRacket) might use different colors or fonts to distinguish macros/functions/special forms/other. Last, you do need a way to print lambdas that don't have a global name (and might not even have a local name), and I'm guessing you'd want that to appear as #f:<some attempt at indicating line number / REPL input>, looking analogous to but different from the global case.
[1] https://github.com/waterhouse/emiya : "[O]ptional arguments are implemented by redefining "fn" (Arc's equivalent of "lambda") as a macro that expands to a call to "underlying-fn", which is bound to the fn special object--again, by the user. Then everything is recompiled, so that this redefinition does not cost anything at runtime; some care is needed here, to avoid an infinite loop, if a function used to recompile "fn" itself uses "fn"."
Thanks for the pointer. He has an entire chapter on hygiene... And then there is this:
"There is no need to provide quotation here because, having failed to enforce the prohibition against embedding combiners in a macro expansion, we don’t need to embed their unevaluated names in the expansion."
It's nice that his primitive $vau grabs the current lexenv, as this enables another kind of macro-like behavior I've thought might be useful: taking subexpressions, fully macroexpanding them, and doing something with the result (e.g. determining whether a variable is ever used, and omitting a computation if not). I don't know how that would mesh with the interpreter's semantics, though...
I'll probably have to read and brood on this further.
The Y combinator itself is more cumbersome, having an extra currying step or two. I prefer the form hjek is using—which is a function that expects to take "itself" as an extra parameter, like this:
(fn (f i)
(aif i!parent
(+ 1 (f f (item it)))
0))
So the recursive call, "(<self> (item it))", is implemented as "(f f (item it))". And then usage is very simple: actually give it itself as an extra argument.
The Y combinator works with a different function signature:
That is, the function takes "something that's not quite itself" as an argument, and returns a function which does one step of computation and may do a "recursive" call using the thing that was passed into it. The implementation would therefore like to be:
(def fix (f) ;aka Y
(fn (i)
((f (fix f)) i)))
But, if we're doing the entire thing with anonymous recursion, we can (laboriously) implement fix like this:
Every recursion step involves creating multiple lambdas. Eek. (It's even worse if you use the general, n-argument Y combinator, in which case you must use "apply" and create lists.) Whereas with hjek's non-curried approach, only a constant number of lambdas have to be created at runtime. (Optimizing compilers might be able to cut it down to 0.)
If you want to create a macro like afn or rfn, and want the user to be able to act like the function is named F and accepts just the parameter i, you can put a wrapper into the macroexpansion, like this:
(rfn F (i)
(aif i!parent
(+ 1 (F (item it)))
0))
->
(fn (i)
((fn (f)
(f f i))
(fn (f i)
(let F (fn (i) (f f i))
(aif i!parent
(+ 1 (F (item it)))
0)))))
And in this case, while the code does call for creating an F-lambda on every recursive call, I think it's easier for the compiler to eliminate it—I don't remember whether I'd gotten Racket to do it. (I think it probably did eliminate it when working with Racket code, but Arc, which generates all the ar-funcall expressions, might not have allowed that.)
The actual code for rfn will create a variable and then modify it, creating a lexical environment with a cycle in it. That's certainly a more straightforward approach. I figure the above is useful only if you're working in a context where you really want to avoid mutation or true cycles. (For example, I am considering a system that detects macros whose expansion is completely side-effect-free. It might be easier to use the above approach to defining iteration than to teach the system that rfn is "close enough" to being side-effect-free.)
I had high hopes for Arc over a decade ago, then it languished, then pg stopped participating.... then I looked at Racket (then PLT-Scheme) since that's what Arc was built on and realized Racket was the language I was looking for :)