Should clarify something (follow-up to previous reply, see below or above).
Most macros don't want to deal with whitespace and comments. We also might want to _not let_ them; otherwise people will start using comments as code, like in Ruby. Macros just want expressions. So, we would define a second level of the AST and perform a second pass over the first.
For the same base notation, there may be multiple languages defined in terms of it. If such a language has any form of prefix or infix notation, or uses the `outside_parens()` calling convention, the second pass would have to group nodes into expressions, in ways specific to that language. Furthermore, the result should be annotated with metadata about packages, types, and so on. The resulting AST is compiled, fed to macros, etc.
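To make the two levels concrete, here's a rough sketch of the simplest possible second pass: it only drops whitespace and comments, with no language-specific grouping or metadata. The `Node` shape, the kind names, and `to_expr` are all made up for illustration; they are not from my actual parser.

```python
from dataclasses import dataclass

# Hypothetical first-level node: it keeps everything, including trivia.
@dataclass
class Node:
    kind: str             # "space", "comment", "atom", or "list"
    text: str = ""        # source text for leaf nodes
    children: tuple = ()  # child nodes when kind == "list"

# Node kinds that macros should never see.
TRIVIA = {"space", "comment"}

def to_expr(node):
    """Second pass: turn a first-level node into a plain expression."""
    if node.kind == "list":
        return [to_expr(c) for c in node.children if c.kind not in TRIVIA]
    return node.text

# First-level AST for:  (add 1 ; the second operand
#                        2)
first_level = Node("list", children=(
    Node("atom", "add"),
    Node("space", " "),
    Node("atom", "1"),
    Node("space", " "),
    Node("comment", "; the second operand"),
    Node("space", "\n"),
    Node("atom", "2"),
))

exprs = to_expr(first_level)
print(exprs)  # ['add', '1', '2']
```

A real second pass for a language with infix or `outside_parens()` calls would do its grouping in `to_expr` as well, which is where it becomes language-specific.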
> Why do you care about emitting exactly the code that was parsed? What new apps does it enable?
Got a `gofmt` addiction, can't go back. Auto-formatting should ship with every language.
Briefly skimmed the Go implementation, and you seem to be right: it seems to lose whitespace and enforce its own formatting.
> Are you aware of any languages that perfectly reproduce input layout?
For now just my own. [1] The language isn't real yet, and might never be realized, but it has a base data notation (very Lisp-like), a parser, and I just started writing a formatter. Because the AST for the data notation preserves whitespace and comments, the formatter can print the code _exactly_ as is. This has interesting repercussions.
For a fully-implemented formatter for a fully-defined language, you wouldn't need whitespace; see Go. However, being able to print everything back means your formatter is usable from the start. It can support one or two simple rules, making only minor modifications, but you can use it on real code right away. Furthermore, this means we'll _always_ be able to choose which rules to enable or disable, which can be handy if the family of languages described in terms of this notation has different formatting preferences. I actually want the formatter shipped with the language, like `gofmt`, to be non-configurable, but this still seems like a useful quality.
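Here's a toy version of that idea, assuming a made-up token grammar for a Lisp-like notation (this is not my real parser or formatter). Every character of the input lands in exactly one token, so with no rules enabled the output is byte-for-byte the input, and each rule is an opt-in function.

```python
import re

# Hypothetical token grammar. The alternatives cover every possible
# character, so concatenating all tokens reproduces the source exactly.
TOKEN = re.compile(r"""
    (?P<space>\s+)
  | (?P<comment>;[^\n]*)
  | (?P<open>\()
  | (?P<close>\))
  | (?P<atom>[^\s();]+)
""", re.VERBOSE)

def tokenize(src):
    """Split source into (kind, text) pairs without losing any characters."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(src)]

def strip_trailing_space(kind, text):
    """Example rule: remove spaces and tabs that sit right before a newline."""
    if kind == "space":
        return re.sub(r"[ \t]+\n", "\n", text)
    return text

def fmt(src, rules=()):
    """Print the code back, applying only the rules that were enabled."""
    out = []
    for kind, text in tokenize(src):
        for rule in rules:
            text = rule(kind, text)
        out.append(text)
    return "".join(out)

src = "(add 1   \n  2) ; sum\n"
unchanged = fmt(src)                         # no rules: exact round trip
cleaned = fmt(src, [strip_trailing_space])   # "(add 1\n  2) ; sum\n"
```

With the rule list empty you already have a formatter you can run on real code; it just doesn't change anything yet.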
GCC only supports a handful of Algol-family languages, and not a single Lisp. Arc is currently an interpreted language running on top of MzScheme. These are two different realms entirely.
Since Arc compiles each Arc expression to MzScheme before executing it, and since MzScheme compiles each of those expressions, Arc inherits this on-demand compilation behavior.
I’m not much of a Lisper, so it would be interesting to hear what experienced people would say about the language. Am I stupid? Did I get something horribly wrong? The idea was to bring Lisp (Arc) paradigms but use native JavaScript data structures, where you don’t have symbols, nil, and so on. The resulting side effects are things like quoting stringifying atoms for macros, hash tables not being lists (there are no keyword elements in JS), and so on. I would actually love to hear I got something wrong; the language is still early enough to make breaking changes.
I wrote the original compiler in CoffeeScript, then gradually reimplemented its parts in jisp, concurrently fixing bugs and adding features to the remaining coffee parts. The hardest part to replace was jisp.coffee, which became jisp.jisp; at that point I had to stop, rearchitect some parts of the old compiler, and implement macros in it to write the new one in a relatively sane way. Using a higher-level JavaScript dialect helped in several ways: it let me write less code, challenged me to immediately implement those higher-level features, and helped with understanding and debugging JavaScript (which would have been harder if I wrote in a non-JS language).
Didn’t want to publish an unfinished compiler, so there’s no coffee in the repo history. I don’t actually even have a git history before 0.0.1.
Took three weeks from concept to publication. It might have been faster or better if I had borrowed technical concepts from other implementations, but I wanted the compiler to be completely original. For instance, it converts jisp code into native JavaScript data structures rather than token trees. Not sure how other similar dialects handle this.
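To show roughly what I mean by native data structures, here's an analogy in Python rather than JavaScript: code is read straight into nested lists of strings instead of a tree of token objects. The `read` function is a made-up illustration, not jisp's actual reader, and it assumes balanced parentheses.

```python
def read(src):
    """Read one parenthesized form into nested lists of strings."""
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(parse())
            pos += 1  # skip the closing paren
            return items
        return tok  # atoms stay plain strings; there is no symbol type

    return parse()

form = read("(def add (a b) (+ a b))")
print(form)  # ['def', 'add', ['a', 'b'], ['+', 'a', 'b']]
```

Because the result is ordinary lists and strings, a macro can take it apart with the host language's normal list operations.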
Oh that's too bad. I'm intensely curious about the experience and it's very poorly covered in most places. Guess I'll just have to try it for myself :)
Writing a self-hosted compiler feels a bit like climbing a skyscraper without a safety belt. I usually test each change by immediately having the compiler recompile itself a few times. Back when it was unstable, each time felt like a plunge from a mountaintop, with a jolt of adrenaline. I guess it still does. :D
Thanks for sharing, rocketnia! I’ve been looking into ways of getting started, and Arc.js looks like the most novice-friendly implementation so far. Node.js and browser is what I use for hobby coding anyway. :)