I can't speak to elisp, but the way macro systems work in Arc and Racket, the code inside a macro call could mean something completely different depending on the macro. Some macros could quote it, compile it in their own way, etc. So any code occurring in a macro call generally can't be transformed without changing the meaning of the program. Trying to detect and process other macro calls inside there is unreliable.
I have ideas in mind for how macro systems can express "Okay, this macro call is over; everything beyond this point in the s-expression is an expression." But that doesn't help with Arc or Racket, whose macro systems aren't designed for that.
So something like your situation, where you need to walk the code before knowing which macroexpander to subject each part of it to, can't reliably treat the code as code. It's better to treat the code as a meaningless soup of symbols and parentheses (or even as a string). You can walk through the data and find things like `(%language ...)` and treat those as escape sequences.
(What elisp is doing there looks like custom escape sequences, which I think is ultimately a more concise way of doing things if new macro definitions are rare. It gets into a middle ground between having s-expression soup and having a macro system that's designed for letting code be walked like this.)
Processing the scope of variables is a little difficult, so my escape sequences would be a bit more verbose than your example. It's not like we can't take a Racket expression and infer its free variables, but we can only do that if we're ready to call the Racket macroexpander, which isn't part of the approach I'm describing.
(I heard elisp is lexically scoped these days. Is that right?)
This is how I'd modify the escape sequences. This way it's clear what variables are passing between languages:
(%language arc ()
  (let in (instring " foo")
    (%language scm ((in in))
      (let-values (((a b c) (port-next-location in)))
        (%language el ((c c))
          (with-current-buffer (generate-new-buffer "bar")
            (insert (prin1-to-string c))))))))
Actually, instead of just (in in), I might also specify a named strategy for how to convert that value from an Arc value to a Racket value.
Anyhow, once we walk over this and process the expressions, we can wind up with generated code like this:
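(Reconstructing the sort of output I mean, with made-up names for the generated blocks and harness hooks: one block per language, each taking its imported variables as parameters.)

; Arc:
(fn ()
  (let in (instring " foo")
    (call-scm-block-1 in)))

; Scheme:
(lambda (in)
  (let-values (((a b c) (port-next-location in)))
    (call-el-block-1 c)))

; elisp:
(lambda (c)
  (with-current-buffer (generate-new-buffer "bar")
    (insert (prin1-to-string c))))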
We also collect enough metadata in the process that we can write harnesses to call these blocks at the right times with the right values.
This is a general-purpose technique that should help with any combination of languages. It doesn't matter if they run in the same address space or anything; that kind of detail only changes what options you have for value marshalling strategies.
I think there's a somewhat more convenient approach that might be possible between Arc and Racket, since their macroexpanders both run in the same process and can trade off with each other: We can have an Arc macro that expands its body as Racket code (essentially Anarki's `$`) and a Racket macro that expands its body as Arc code. But there are some difficulties designing the latter, particularly in terms of Racket's approach to hygiene and its local macros, two things the Arc macroexpander has zero concept of. When we switch from Racket to Arc and back to Racket, the hygiene information and local macro scopes will probably be obliterated.
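(For reference, Anarki's `$` looks like this at the REPL, evaluating a single Racket expression from Arc:)

arc> ($ (+ 1 2))
3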
In your arcmacs project, I guess you might also be able to have an Arc macro that expands its body as elisp code, an elisp macro that expands its body as Racket code, etc. :-p So maybe that's the approach you really want to take with `%language` and I'm off on a tangent with this "escape sequence" interpretation.
You're hitting on a problem I've been thinking about for years. There are a few reasons this is tricky, notably related to detecting whether something is a variable reference or a variable declaration.
(let in (instring " foo")
  (let-values (((a b c) (port-next-location in)))
    (with-current-buffer (generate-new-buffer "bar")
      (insert (prin1-to-string c)))))
To handle this example, you'll need to know whether each form is a function call, a variable definition, or a list of definitions (let-values), and which target language each form is being compiled for.
For example, an arc function call needs to expand into `(ar-apply foo ...)`
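(For reference, `ar-apply` in Arc's ac.scm looks roughly like this, paraphrased here in Racket terms; it dispatches on the callee because Arc lets lists, strings, and tables be called like functions:)

(define (ar-apply f args)
  (cond ((procedure? f) (apply f args))
        ((pair? f)      (list-ref f (car args)))
        ((string? f)    (string-ref f (car args)))
        ((hash? f)      (hash-ref f (car args) 'nil))
        (else (error "Function call on inappropriate object" f args))))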
And due to syntax, you can't just handle all the cases by writing some hypothetical very-smart `ar-apply` function. If your arc compiler targets elisp, it's tempting to try something like this:
(ar-apply let (ar-apply (ar-apply a (list 1))) ...
which can collapse nicely back down to
(let ((a 1)) ...)
in other words, it's tempting to try to defer the "syntax concern" until after you've walked the code and expanded all the macros. Then you'd collapse the resulting expressions back down to the language-specific syntax.
But it quickly becomes apparent that this is a bad idea.
Another alternative is to have a "standard language" which all the nested languages must transpile to:
(%let in (%call instring " foo")
  (%let (a b c) (%call port-next-location in)
    (|with-current-buffer| (%call generate-new-buffer "bar")
      (%call insert (prin1-to-string c)))))
Now, that seems much better! You can take those expressions and easily spit out code for scheme, elisp, arc, or any other target. And from there it's just a matter of adding shims on each runtime.
The tricky case to note in the above example is with-current-buffer. It's an elisp macro, meaning it has to end up in functional position like (with-current-buffer ...) rather than something like (funcall #'with-current-buffer ...)
There are two ways to deal with this case. One is to hook into elisp's macroexpand function and expand the macros as you go. Emacs calls this eager macroexpansion, and there are some cases related to autoloading (I think?) that make this not necessarily a good idea.
The other way is to punt, and have the user indicate "this is an elisp form; do not mess with it."
The idea is that if the symbol in functional position is surrounded by pipe chars, then the compiler should leave its position alone but compile the arguments. So `(|with-current-buffer| ...)` in the example above keeps `with-current-buffer` in functional position while its arguments are compiled for the elisp target.
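(A sketch of that rule in Racket terms; every name here is made up, and the pipe check stands in for however the reader marks those symbols:)

(define (pipe-escaped? sym)
  (let ((s (symbol->string sym)))
    (and (> (string-length s) 1)
         (char=? (string-ref s 0) #\|)
         (char=? (string-ref s (- (string-length s) 1)) #\|))))

(define (strip-pipes sym)
  (let ((s (symbol->string sym)))
    (string->symbol (substring s 1 (- (string-length s) 1)))))

(define (compile-form x)
  (cond ((not (pair? x)) x)
        ((pipe-escaped? (car x))
         ; leave the operator where it is; compile only the arguments
         (cons (strip-pipes (car x)) (map compile-form (cdr x))))
        (else
         (list* 'ar-apply (compile-form (car x)) (map compile-form (cdr x))))))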
(Edit: Oops, I replied out of order and didn't read shawn's comment with the elisp examples before writing this.)
I suspect what pg means by it is primarily that it's tricky to do in Racket (though I'm not sure if it'd be because there are too few options or too many).
Essentially, I think it's easy to display a closure by displaying its source code and all the captured values of its source code's free variables. (Note that this might be cyclic since functions are often recursive.)
But there is something tricky about it, which is: What language is the source code in? In my opinion, a language design is at its most customizable when even its built-in syntaxes are indistinguishable from user-defined ones. So the displayed format of a function would ideally involve some particular set of user-definable syntaxes. Since a language designer can't anticipate what user-defined syntaxes will exist, clearly this decision should ultimately be up to a user. But what mechanism does a user use to express this decision?
As a baseline, there's at least one straightforward choice the user can make: The one that expresses the code in terms of Arc special forms (`fn`, `assign`, `if`, `quote`, etc.) and Arc functions that aren't implemented in Arc (`car`, `apply`, etc.). In a language that isn't aiming for flawless customizability, this is enough.
Now suppose we try to generalize this so a user can choose any set of syntaxes to express things in terms of -- say, "I want the usual language, but with `=` treated as an additional built-in." If the code contains `(= x (rfn ...))`, then the macroexpander at some point needs to expand the `rfn` without expanding the `=`. That's not really viable in terms of Arc's macroexpander since we don't even know `(rfn ...)` is an expression in this context until we process `=`. So this isn't quite the right generalization; the right generalization is something trickier.
I suppose we can have every function printed in terms of its pre-macroexpansion source code along with printing all the macros and other macroexpansion-time state it happened to rely on as it macroexpanded the first time. That would narrow down the problem to how to print the built-in functions. And we could solve that problem in the language design by making it so nothing ever captures a built-in function as a first-class value, only as a late-bound reference to the global variable it's accessible from.
Or we could have the user specify their own macroexpander and make it so that whenever a function is printed, if the current macroexpander hasn't expanded that function yet, it does so now (just to determine how the function is printed, not how it behaves). This would let the user specify, for instance, that `assign` expands into `=` and `=` expands into itself, rather than the other way around.
These ideas are incomplete, and I think making them complete would be pretty tricky.
In Cene, I have a different take on this: If a function is printable (and not all are), then it's printable because it's a callable struct with a tag the printer knows about. It would be printed as a struct. The function implementation wouldn't be printed. (The user could look up the source code information based on the struct tag, but that's usually not printable.) There may be some exceptions at the REPL where information is printed that usually isn't available, because the REPL is essentially a debugging context and the debugger sees all. (Racket's struct inspectors express a similar debugger-sees-all principle, but I haven't seen the REPL take advantage of it.)
"[Racket] has every feature you could want. And yet it is very difficult to do even the simplest things. For example, when an error is raised in Anarki, you'll see a stack trace that points to ac.rkt rather than the actual location within the arc file that caused the error."
Is that really "the simplest things"? :-p It seems to me Arc goes out of its way to avoid interleaving source location information on s-expressions the way Racket does. Putting it back, without substantially changing the way Arc macros are written, seems to me like it would be pretty complicated, and that complication would exist whether Arc was implemented in Racket or not. (I think aw's approach is promising here.)
It's fun to see Lumen is designed for compiling to other languages even when it's self-hosted. That's always how I figured Arc or pretty much any of my languages would have been written once they were self-hosted.
I've been trying to write Racket libraries that (among other benefits) let me implement Cene in Racket the way I'd like to implement Cene in Cene, which should make it an easier process to port it into a self-hosting Cene implementation. But I certainly don't have it working yet, the way Lumen clearly is. :)
"There's one thing you can't do with functions that you can do with data types like symbols and strings: you can't print them out in a way that could be read back in. The reason is that the function could be a closure; displaying closures is a tricky problem."
I've sometimes wondered just what the connotation of 'tricky' is here. Is it hard in some theoretic sense, or just "Arc is a prototype so we don't have this yet"?
You may have noticed one of elisp's interesting features: closures are printed out.
> (define (adder n)
    (lambda (x)
      (+ x n)))
(closure (t) (n) (function (lambda (x) (+ x n))))
> (adder 42)
(closure ((n . 42) t) (x) (+ x n))
That means you can theoretically write and read closures to/from disk.
Anyone remember the "Unknown or expired link" errors in the old days of Hacker News? Arc generates closures on the fly, then stores them on the server keyed by a random ID. It sends this random ID down to the browser as links: https://www.laarc.io/x?fnid=ls1wNQuImEEYyvWtGKs1Sj. When the user visits the link, arc looks up the closure and calls it. This effectively allows Arc code to pause computation and then resume later. It's kind of like traditional node-style fs.readFile callbacks, but way more interesting because Arc uses macros to make it feel very natural to write server-side code in this style. You don't feel like you're writing nested callbacks. I've never caught myself wishing for async/await, for example.
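(A bare-bones sketch of the mechanism, with made-up names; the real srv.arc version also tracks each closure's age so the harvester can expire old ones:)

(define fns (make-hash))  ; fnid -> closure

(define (fnid! f)
  (let ((id (number->string (random 4294967087) 16)))
    (hash-set! fns id f)
    id))

(define (call-fnid id req)
  ((hash-ref fns id
             (lambda () (error "Unknown or expired link.")))
   req))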
Now, the drawback of this technique is that it starts to consume a lot of memory. The closures don't need to be stored forever, but they do need to be stored for a reasonable amount of time. Arc solves this by "harvesting" the closures, meaning it walks through the global closures table and casts out the old ones. Like a startup. (Hmm.)
The other drawback is that a server reboot will wipe all the closures. That's why you'd sometimes get "Unknown or expired link" in the middle of the day on HN. The memory usage got pretty extreme, and a periodic reboot was a quick fix (that no doubt made pg wince each time he ^C'd the server).
So, emacs lisp is very interesting here, because it illustrates that it ought to be possible to store Arc's dynamic closures on disk rather than in memory. That would solve the problem completely: you can generate as many functions as you want, and you don't need to worry about a thing till you blow through >10GB of disk space.
The best part is that the server should be able to reboot without losing the closures.
All the functions in news.arc operate either on objects in memory, like users, or on ephemeral objects, like requests. (The arc server creates a new `request` instance for each incoming connection, and it stores things like the user's cookies and the query arguments.) Both of these can already be serialized straight to disk. And the dynamic closures are almost always generated in the middle of building an HTML page -- i.e. you want to describe "build a form; here's what to do when the user submits it". That latter half is a function, which becomes a dynamic closure keyed by fnid. The form is sent straight down to the user's browser. When the form is submitted, the server picks up right where it left off. That means you can interweave your "view" code and your "model" code, in the MVC sense. And there's no need for a controller; the controller is the closure, which knows what to do thanks to lexical context.
HN eventually solved the fnid problem by getting rid of them. You'll rarely see any dynamic closures on the site. My hypothesis is (a) it's faster to write features using the fnid technique, and (b) the fns can be serialized to disk.
(a) seems true, and (b) is worth exploring. As https://www.laarc.io/ gains momentum, we're hoping to preserve this technique. If it works out, users shouldn't notice any "Unknown or expired link" messages – the closure will be on disk, and they'll last a long time, to put it mildly.
It would be important to ensure that circular structures don't cause an infinite loop, and I'd be nervous about straying too far from Racket's `write` facility. For better or worse, it's a limitation of racket that you can't `read` a table you've written. But it could be worth doing.
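(A quick Racket check of both concerns, as I understand them:)

; the writer already guards against cycles by falling back to datum labels:
(define h (make-hash))
(hash-set! h 'self h)
(write h)  ; prints #0=#hash((self . #0#)) rather than looping

; and the read limitation: this comes back as an immutable hash,
; not the mutable kind Arc tables are made of
(read (open-input-string "#hash((a . 1))"))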
That's really cool, thanks for making the case for Lumen. I now have it on my todo list to determine how much the standard library of Lumen matches the names Arc uses.
On a slight tangent, ycombinator.lol is not affiliated with Y Combinator, and https://github.com/lumen-language is not affiliated with Scott Bell the creator of Lumen. I clarify this because it took me a while to figure out. Am I characterizing it right, Shawn?
No. I feel slightly bad for being so blunt, but Racket's language facilities have absorbed more of my mental time and energy than any other aspect of Racket.
Let's put it this way. Racket is an excellent choice for implementing languages. It has every feature you could want. And yet it is very difficult to do even the simplest things. For example, when an error is raised in Anarki, you'll see a stack trace that points to ac.rkt rather than the actual location within the arc file that caused the error. This is because, frankly, ... Well, let's just say I've redacted some expletives here. But those expletives were aimed at the fact that it's really quite difficult to translate the vision in your mind into an implementation using Racket.
Now, that being said, if you sit down with Racket and take the time to learn it, you will find that it's one of the most robust, flexible, and performant language runtimes in existence. The thread-local variable support is a killer feature. The custodian support is rock solid, which makes sandboxing trivial and super reliable. And it's the premier implementation of Scheme, so it might continue to have a thriving community for decades to come. But it took me years to become very effective with Racket. (This says more about my own shortcomings than about Racket, though!)
It's a Lisp that runs in both JS and Lua. It's self-hosted, meaning all .js and .lua code is generated by Lumen itself. Contrast this with the fact that ac.scm is written in Scheme rather than Arc. Very mysterious! I remember how electrifying it felt that first day I stumbled across it and realized the significance of what I was looking at.
It's one of the most incredible projects I've ever seen. It's so small and clear, yet does so much – a perfect example of why simple systems run circles around competitors.
There is a branch that adds Python support at https://github.com/shawwn/lumen/tree/features/python which speaks to Lumen's flexibility and power. (I wonder if there is another Lisp that you can use natively from three different languages without any FFI. And if there is, I doubt it's <3000 lines.)
The first question I'd ask is, what kind of language do you want to create?
That makes it a lot easier to answer a question like "Is X a good choice for Y?" It depends on Y! :-)
Then, along with asking here (which is fine), you might also want to ask the Racket folks. Go to https://racket-lang.org/ and scroll down to "Community". Then you can ask, "I'd like to create a language like Y, would Racket be a good choice? If so, how would I go about it?"
Would anyone recommend Racket for a toe-headed newbie who wants to learn how to implement their own language after running Arc for a bit? Language development is something I've wanted to do for a while, but I have no background in type theory, compilers, or anything of that sort.
The INTERLISP read program treats square brackets as 'super-parentheses': a right square bracket automatically supplies enough right parentheses to match back to the last left square bracket (in the expression being read), or if none has appeared, to match the first left parenthesis, e.g.:

(A (B (C] = (A (B (C)))
(A [B (C (D] E) = (A (B (C (D))) E)
Here's a document which goes over a variety of different notations (although their claim that "there is no opening super-parenthesis in Lisp" seems to be inaccurate considering the above):
They favor this approach, which is also the one that best matches the way I intend for Parendown to work:
"Krauwer and des Tombe (1981) proposed _condensed labelled bracketing_ that can be defined as follows. Special brackets (here we use angle brackets) mark those initial and final branches that allow an omission of a bracket on one side in their realized markup. The omission is possible on the side where a normal bracket (square bracket) indicates, as a side-effect, the boundary of the phrase covered by the branch. For example, bracketing "[[A B] [C [D]]]" can be replaced with "[A B〉 〈C 〈D]" using this approach."
That approach includes what I would call a weak closing paren, 〉, but I've consciously left this out of Parendown. It isn't nearly as useful in a Lispy language (where lists usually begin with operator symbols, not lists), and the easiest way to add it in a left-to-right reader macro system like Racket's would be to replace the existing open paren syntax to anticipate and process these weak closing parens, rather than non-invasively extending Racket's syntax with one more macro.
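(For concreteness, here's what that looks like with Parendown's weak open paren in Racket:)

#lang parendown racket/base
; #/ opens a form that closes automatically at the enclosing close
; paren, so this...
(displayln #/number->string #/+ 1 2)
; ...reads the same as (displayln (number->string (+ 1 2)))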
Notice (pair (cdr spec)) is an alist. Now if you wanted to extend start-tag with conditional operations on the tag spec, you could bind (pair (cdr spec)) to, say, a var 'attrs' and use Arc's built-in functions to inspect and modify it as you see fit. As you can see, a table is probably not needed when there are only a half-dozen items at most, plus you would lose the order; and a plain list would prevent you from pairing things up to do anything meaningful like an inspection or conditional modification.
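(To make the shape concrete in Racket terms, with a hypothetical `pairs` helper standing in for Arc's `pair`:)

(define (pairs xs)
  (if (null? xs)
      '()
      (cons (list (car xs) (cadr xs)) (pairs (cddr xs)))))

(pairs '(href "http://x" class "link"))
; => '((href "http://x") (class "link")) -- an alist that keeps its order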
It's worth noting that a serious downfall of alists is that there's no real means to detect their data type, because an alist doesn't really have one (unlike a table). You could inspect the first item to try to detect the type that way, but the logic's soundness/efficiency breaks down pretty quickly in all but the simplest cases.
So, again, with a small number of pairs where you are confident in the shape of the data and have total control of the data usage (i.e. where and how it gets passed around), alists are good; otherwise you would need a pretty good or nuanced reason to use them.
Some of pg's other ideas are: "having several that share the same tail, and preserve old values".
I.e., you could have some function that accumulates pairs (where some or many have the same key but different values). This is for when you don't want the obvious behavior a table provides (where the last one added wins), and/or you need the previous entries. Note that you can save an alist to a file and reload it easily while still having access to the history of values for a given key, so you're able to reconstruct your operations based on historical data. You just can't do that with a table.
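(In Racket terms, since the idea is the same in any Lisp:)

(define base '((x . 1) (y . 2)))
(define v2 (cons '(x . 10) base))  ; "update" x; base is untouched and shares its tail
(assq 'x v2)    ; => '(x . 10) -- the newest binding wins
(assq 'x base)  ; => '(x . 1)  -- the old value is preserved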
Just to recap my opinion that I gave you over chat: the advantage alists have is that they're simple to support. Along pretty much any other axis, particularly if they grew too long, you'd be better off using some other data structure. Which one would depend on the situation.
> You've got me curious now about how this relates to Amacx :)
Why, everything! :-) E.g. I start with: what if top level variables were implemented by an Arc table, and top level variable references were implemented by an Arc macro? That is, what if top level variables were built out of lower level language axioms, instead of being built in?
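(A minimal Racket sketch of the idea, with made-up names:)

; a "container" is just a table, and a top level variable
; reference expands into a lookup in that table
(define container (make-hash))

(define-syntax-rule (global-ref name)
  (hash-ref container 'name))

(define-syntax-rule (global-set! name val)
  (hash-set! container 'name val))

(global-set! x 42)
(global-ref x)  ; => 42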
We end up with something that's kind of like modules, but doesn't do everything that we'd typically expect modules to do (though perhaps we could implement modules on top of them if we wanted to), and also does some things that modules don't do (for example we can load code into a different environment where the language primitives act differently).
To give a name to this thing that is kind of like modules but different, I called them "containers", because they're something you load code into.
Are containers useful? Well, I'm guessing it would depend on whether we'd ever want to load code into different environments in our program. If we only want to load code once, and all we want is a module system, I imagine it'd probably be more straightforward to just implement a module system directly.
On the other hand, suppose we have a runtime that gives us some nifty features, but is slower than plain Arc. Now it seems like containers could turn out to be a useful idea. Perhaps I have some code that I want to load in plain Arc where it'll run fast, and other code that I want to run in the enhanced runtime and I don't mind that it's slower.
> I find that having tests allows me to start out in a sort of engineering mindset, in your terms, where I just get individual cases working one by one. But at the same time they keep me from growing too attached to a single implementation and leave me loose to try to think up more axiomatic generalizations over time.
This is the classic test driven development refactoring cycle: add features with tests, then refactor e.g. to remove duplicate code, and/or otherwise refactor to make the code more axiomatic.
Since "Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp", one could, in theory, start with such a C or Fortran program and refactor towards an axiomatic approach until you had reinvented Lisp, with the program written in Lisp :-)
But in practice I think going the other way is sometimes necessary: that is, starting with some axioms, and seeing what can be implemented out of them.
I'm not sure why that is (why doesn't anyone keep refactoring a large C program and end up with Lisp?) but I suppose it might be because it's too cognitively difficult, or you end up at some kind of local maximum in your design, or something.
In any case, I find tests absolutely essential for working on Amacx... not just "nice to have" or "saves me a lot of time", but "impossible to do without them"!
You've got me curious now about how this relates to Amacx :)
I find that having tests allows me to start out in a sort of engineering mindset, in your terms, where I just get individual cases working one by one. But at the same time they keep me from growing too attached to a single implementation and leave me loose to try to think up more axiomatic generalizations over time. You can kinda see that in http://akkartik.name/post/list-comprehensions-in-anarki if you squint a little; I don't show my various working copies there, but the case analysis I describe does faithfully show how I started out thinking about the problem, before the clean general solution suddenly fell out in a flash of insight.
(Having tests doesn't preclude more systematic thinking about the space, and proving to myself that a program is correct. But even if I've proved a program correct to myself I still want to retain the tests. My proofs have been shown up too often in the past ^_^)
It's interesting to be taking an axiomatic approach. That is, in this case, to add to the language axioms that expressions can be labeled with their source file locations.
It might not work: it might turn out that the feature I want (to be able to track source locations through macro expansions) can't be expressed in terms of this particular set of axioms. Or, it might be that it can, but the result is a runtime too slow for me to want to use it.
But, if it does work, it has its own internal logic. What does (cdr x) mean when x has been labeled with source locations? Well, clearly, what it ought to mean is the tail of x, labeled with the source locations of the tail of x. Theorems such as (apply (fn args args) xs) ≡ xs should continue to work.
On the other end of the spectrum from an axiomatic approach is engineering. Have a list of features you want, and design a system that implements all of them. This too might fail sometimes (perhaps the features you want turn out to be incompatible, or you design yourself into a corner that's hard to get out of)... but most of the time it's more reliable, in the sense that usually we can come up with some design that implements all (or at least most!) of the features we want... even if maybe the result isn't very pretty.
The downside of engineering is design complexity. Complexity will probably at least scale linearly with the number of features, if not more likely by some power law. If we're lucky we may see some simplifications in the design along the way that we can refactor into, some axioms of the design that become apparent that we can incorporate... but most of the time, in practice, the design gets more and more complex as we add features.
Engineering is attractive because it gets things done. "I just want X, let's implement X". There are a lot of times when what I want is just to implement X, and I engineer a design, and it works out fine.
The axiomatic approach is more uncertain. Will it work? I don't know. It's also harder. Oops, ssyntax stopped working. Why? `some` stopped working. Why? `recstring` stopped working. Why? `+` stopped working. Why? Is it because my implementation of `apply` is broken, or because I broke the compiler and it's now outputting broken code, or because my runtime is broken? It could be any of these. Another day, another week of debugging.
It's also more fun. There are many macro systems. Many of them are practical. Some have features I don't care about, some are more complicated than I like, but I don't have much interest myself in engineering yet another macro system. Axioms are more interesting. Perhaps it will turn out that for this particular set of axioms, it doesn't work out for this particular feature. But then at least I know why :-)