> Would it be much trouble if I started working toward some of these things for Anarki or Amacx?
My aspiration for Amacx is that it becomes a framework that allows you to create the language you want to create. By analogy: if you're writing a compiler and you'd find LLVM useful, you can use LLVM as part of your toolchain to write your compiler.
Thus, if you (or someone) wanted to create a particular reader and printer syntax for tables (whether ##ob and v or something else), then you certainly should be able to do that.
I have both an Arc reader and printer written in Arc, but not yet included in Amacx because currently it's too slow. Working on the reader and printer makes the most sense, I think, after finishing my current work on source location tracking (assuming that works out), both because with a profiler it will be easier to see how to speed up the implementation, and because the reader will need to support source location tracking itself.
There's a lot of "if"s here, but in the happy scenario that everything works out, then hopefully adding ##ob and v (or whatever someone wants) will be easy: just add a few lines of Arc code :-)
> Q: If special syntax support for alists were added, would a!k return a key-value pair (`assoc`) or just a value (`alref`)?
One option is to extend calling lists so that calling a list with a number would continue to do the same thing (return the item at that position), while calling a list with a non-number would treat the list as an association list and do a lookup on that key.
Then no special syntax is needed because the standard `alist!x` would work, as it expands into `(alist 'x)`.
Naturally, this means that you couldn't do a lookup in your alist with a number using the `!` syntax, but that might be OK for you if you aren't using numbers as keys in the alists that you want to use the `!` syntax with.
Note, though, that an index access with a number would still return the association pair at that position, not its value.
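Here's a rough model of that first option, written in Python rather than Arc only so it's easy to run. `Cons`, `arc_list`, and `call_list` are made-up names, and returning the value (like `alref`) rather than the pair on a key lookup is my assumption:

```python
class Cons:
    """A cons cell; None plays the role of nil."""
    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr

def arc_list(*items):
    """Build a list of cons cells from Python values."""
    result = None
    for item in reversed(items):
        result = Cons(item, result)
    return result

def call_list(lst, key):
    """Hypothetical semantics for calling a list with `key`."""
    if isinstance(key, int):
        # A number still means positional access, as in Arc 3.2 today.
        for _ in range(key):
            lst = lst.cdr
        return lst.car
    # Anything else: treat the list as an alist of (key value) entries.
    while lst is not None:
        entry = lst.car
        if entry.car == key:
            return entry.cdr.car
        lst = lst.cdr
    return None  # nil if the key isn't there
```

So `alist!x` would find the entry for `x`, while an alist keyed by numbers can't be searched this way: calling it with 0 gives you the first entry, not the one whose key is 0.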
Another option would be to create a new type (e.g. using `annotate`) and then have calling objects of that type do what you want (for example, in Anarki you could use `defcall`).
testible.clef means look up the value stored at what clef evaluates to in testible
testible!clef means look up the value stored at clef in testible
testible.0 means look up the value stored at what 0 evaluates to in testible
testible!0 means look up the value stored at 0 in testible
atist.0 means look up the value stored at what 0 evaluates to in atist
atist!0 means look up the value stored at 0 in atist
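To make the `.` / `!` distinction concrete, here's a simplified Python model of how this ssyntax expands (only `.` and `!`; `expand_ssyntax` and `eval_arg` are illustrative names, not Arc's actual implementation). `!` wraps its right side in `quote` while `.` leaves it to be evaluated, and since a number evaluates to itself, `atist.0` and `atist!0` end up passing the same argument:

```python
import re

def expand_ssyntax(sym):
    """Expand '.' and '!' ssyntax in a symbol name (simplified sketch).
    'a.b' -> ['a', 'b']               call a with the value of b
    'a!b' -> ['a', ['quote', 'b']]    call a with the symbol b itself
    In this sketch, chains associate to the left: 'a.b!c' -> [['a', 'b'], ['quote', 'c']]."""
    parts = re.split(r"([.!])", sym)

    def atom(tok):
        # Digit-only tokens become numbers; everything else stays a symbol.
        return int(tok) if tok.isdigit() else tok

    expr = atom(parts[0])
    for i in range(1, len(parts), 2):
        op, arg = parts[i], atom(parts[i + 1])
        expr = [expr, arg] if op == "." else [expr, ["quote", arg]]
    return expr

def eval_arg(arg, env):
    """Evaluate an argument: a quoted form yields what's inside it,
    a number yields itself, and a symbol is looked up in env."""
    if isinstance(arg, list) and arg and arg[0] == "quote":
        return arg[1]
    if isinstance(arg, int):
        return arg
    return env[arg]
```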
0, of course, being an atom, evaluates to 0
but what if you want it to evaluate to something like the key stored at 0 . . .
. . . if quote means something like "don't evaluate" and unquote means something like "do evaluate" then one could reason that
atist,0 means look up the value stored at what 0 evaluates to (do evaluate it) in atist
With alists and numbers, ! and . behave the same; that's inconvenient if you want to access the key, but it makes sense.
Is this reasonable?
Edit: this could work for alists and insertion-ordered tables, since it's not obvious how testible!0 and testible.0 should behave there. Numbers can and should be able to be keys, so one can imagine a situation where the behavior would be like so:
"Immediate reaction: this is a bad idea. Commas mean something specific in Lisp. And having `0` mean different things in different contexts is a recipe for disaster.
"I prefer aw's proposal above. Support list indexing, support alist lookup, don't support alist lookup by integer keys. Not the end of the world, people can just use `alref` in that situation."
The first question I'd ask is, what kind of language do you want to create?
That makes it a lot easier to answer a question like "Is X a good choice for Y?" It depends on Y! :-)
Then, along with asking here (which is fine), you might also want to ask the Racket folks. Go to https://racket-lang.org/ and scroll down to "Community". Then you can ask, "I'd like to create a language like Y, would Racket be a good choice? If so, how would I go about it?"
It's interesting to be taking an axiomatic approach: in this case, adding to the language the axiom that expressions can be labeled with their source file locations.
It might not work: it might turn out that the feature I want (to be able to track source locations through macro expansions) can't be expressed in terms of this particular set of axioms. Or, it might be that it can, but the result is a runtime too slow for me to want to use it.
But, if it does work, it has its own internal logic. What does (cdr x) mean when x has been labeled with source locations? Well, clearly, what it ought to mean is the tail of x, labeled with the source locations of the tail of x. Theorems such as (apply (fn args args) xs) ≡ xs should continue to hold.
On the other end of the spectrum from an axiomatic approach is engineering. Have a list of features you want, and design a system that implements all of them. This too might fail sometimes (perhaps the features you want turn out to be incompatible, or you design yourself into a corner that's hard to get out of)... but most of the time it's more reliable, in the sense that usually we can come up with some design that implements all (or at least most!) of the features we want... even if maybe the result isn't very pretty.
The downside of engineering is design complexity. Complexity will probably scale at least linearly with the number of features, and more likely by some power law. If we're lucky, we may see some simplifications along the way that we can refactor into, some axioms of the design that become apparent and that we can incorporate... but most of the time, in practice, the design gets more and more complex as we add features.
Engineering is attractive because it gets things done. "I just want X, let's implement X". There are a lot of times when what I want is just to implement X, and I engineer a design, and it works out fine.
The axiomatic approach is more uncertain. Will it work? I don't know. It's also harder. Oops, ssyntax stopped working. Why? `some` stopped working. Why? `recstring` stopped working. Why? `+` stopped working. Why? Is it because my implementation of `apply` is broken, or because I broke the compiler and it's now outputting broken code, or because my runtime is broken? It could be any of these. Another day, another week of debugging.
It's also more fun. There are many macro systems. Many of them are practical. Some have features I don't care about, some are more complicated than I like, but I don't have much interest myself in engineering yet another macro system. Axioms are more interesting. Perhaps it will turn out that, for this particular set of axioms, it doesn't work out for this particular feature. But then at least I know why :-)
You've got me curious now about how this relates to Amacx :)
I find that having tests allows me to start out in a sort of engineering mindset, in your terms, where I just get individual cases working one by one. But at the same time they keep me from growing too attached to a single implementation and leave me loose to try to think up more axiomatic generalizations over time. You can kinda see that in http://akkartik.name/post/list-comprehensions-in-anarki if you squint a little; I don't show my various working copies there, but the case analysis I describe does faithfully show how I started out thinking about the problem, before the clean general solution suddenly fell out in a flash of insight.
(Having tests doesn't preclude more systematic thinking about the space, and proving to myself that a program is correct. But even if I've proved a program correct to myself I still want to retain the tests. My proofs have been shown up too often in the past ^_^)
> You've got me curious now about how this relates to Amacx :)
Why, everything! :-) E.g. I start with: what if top level variables were implemented by an Arc table, and top level variable references were implemented by an Arc macro? That is, what if top level variables were built out of lower level language axioms, instead of being built in?
We end up with something that's kind of like modules, but doesn't do everything that we'd typically expect modules to do (though perhaps we could implement modules on top of them if we wanted to), and also does some things that modules don't do (for example, we can load code into a different environment where the language primitives act differently).
To give a name to this thing that is kind of like modules but different, I called them "containers", because they're something you load code into.
Are containers useful? Well, I'm guessing it would depend on whether we'd ever want to load code into different environments in our program. If we only want to load code once, and all we want is a module system, I imagine it'd probably be more straightforward to just implement a module system directly.
On the other hand, suppose we have a runtime that gives us some nifty features, but is slower than plain Arc. Now it seems like containers could turn out to be a useful idea. Perhaps I have some code that I want to load in plain Arc where it'll run fast, and other code that I want to run in the enhanced runtime and I don't mind that it's slower.
> I find that having tests allows me to start out in a sort of engineering mindset, in your terms, where I just get individual cases working one by one. But at the same time they keep me from growing too attached to a single implementation and leave me loose to try to think up more axiomatic generalizations over time.
Exactly!
This is the classic test-driven development refactoring cycle: add features with tests, then refactor, e.g. to remove duplicate code, and/or otherwise refactor to make the code more axiomatic.
Since "Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp", one could, in theory, start with such a C or Fortran program and refactor towards an axiomatic approach until you had reinvented Lisp, with the program written in Lisp :-)
But in practice I think going the other way is sometimes necessary: that is, starting with some axioms, and seeing what can be implemented out of them.
I'm not sure why that is (why doesn't anyone keep refactoring a large C program and end up with Lisp?) but I suppose it might be because it's too cognitively difficult, or you end up at some kind of local maximum in your design, or something.
In any case, I find tests absolutely essential for working on Amacx... not just "nice to have" or "saves me a lot of time", but "impossible to do without them"!
I was curious, has anyone implemented "internal definitions", that is, a way to introduce a variable like `let`, but without having to indent the code that uses it?
For example, I may have a function like:
(def foo ()
  (let a (...)
    (do-something a)
    (let b (... a ...)
      (do-something-else b)
      (let c (... b ...)
        etc...))))
and the indentation keeps getting deeper the more variables I define. `withs` can help, but only if I'm not doing something in between each variable definition.
In Racket, this could be written
(define (foo)
  (define a (...))
  (do-something a)
  (define b (... a ...))
  (do-something-else b)
  (define c (... b ...))
  etc...)
In Racket, this is called "using define in an internal definition context"; it's like using `let` to define the variable (the scope of the variable extends to the end of the enclosing form), except that you don't have to indent.
In Arc, I'd like to use `var`, like this:
(def foo ()
  (var a (...))
  (do-something a)
  (var b (... a ...))
  (do-something-else b)
  (var c (... b ...))
  etc...)
In Arc 3.2, `do` is a macro which expands into a `fn`, which is a primitive builtin form. We could swap this, so that `fn` is a macro and the function body expands into a `do` form.
Thus, for example, `(fn () a b c)` would expand into `($fn () (do a b c))`, where `$fn` is the primitive, builtin function form.
Many macros that have bodies, such as `def` and `let`, expand into `fn`s, and this change would mean that the bodies of these forms would also be wrapped in a `do`.
For example, `(def foo () a b c)` expands (roughly) into `(assign foo (fn () a b c))`, which in turn would expand into `(assign foo ($fn () (do a b c)))`.
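As a sketch of that rewrite on plain nested lists (in Python, with `expand_fn` and `expand_def` as hypothetical names; a real version would be an Arc macro):

```python
def expand_fn(form):
    """(fn params . body) -> ($fn params (do . body)).
    '$fn' stands for the primitive function form; the body is handed to `do`,
    which is then free to rewrite it (for example, to implement `var`)."""
    head, params, *body = form
    if head != "fn":
        raise ValueError("expected a fn form")
    return ["$fn", params, ["do", *body]]

def expand_def(form):
    """(def name params . body) -> (assign name (fn params . body)), roughly,
    with the fn then expanded as above."""
    head, name, params, *body = form
    if head != "def":
        raise ValueError("expected a def form")
    return ["assign", name, expand_fn(["fn", params, *body])]
```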
Thus, in most places where we have a body where we might like to use `var`, there'd be a `do` in place which could implement it.
Why go to this trouble? `var` can't be a macro since a macro can only expand itself -- it doesn't have the ability to manipulate code that appears after it. However `do`, as a macro, can do whatever it wants with the code it's expanding, including looking to see if one of the expressions passed to it starts with `var`.
As a macro, this could be an optional feature. Rather than being built into the language, where it would be there whether you wanted it or not, if you didn't like internal definitions you wouldn't have to load the version of `do` which implements `var`.
A further complication is what to do with lexical scope. If I write something like,
(let var (fn (x) (prn "varnish " x))
  (var 42))
I'm clearly intending to use `var` as a variable. Similar to having lexical variables override macros, I wouldn't want `var` to become a "reserved keyword" in the sense that I now couldn't use it as a variable if I wanted to.
The Arc compiler knows of course which lexical variables have been defined at the point where a macro is being expanded. (In ac.scm the list of lexical variables is passed around in `env`). We could provide this context to macros using a Racket parameter.
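Putting those pieces together, a `do` that understands `var` could look roughly like this. It's a Python sketch over nested lists; `expand_do` and its `lexenv` argument are hypothetical, with `lexenv` standing in for whatever the compiler would pass along (e.g. via a Racket parameter). Each `var` becomes a `let` whose body is the rest of the `do`, and if `var` is itself lexically bound, it's left alone as an ordinary call:

```python
def expand_do(body, lexenv=frozenset()):
    """Expand internal definitions in a `do` body (sketch).
    ['var', name, val] becomes ['let', name, val, <rest of body>], so the
    variable's scope runs to the end of the enclosing form. If 'var' is a
    lexical variable, forms starting with it are treated as plain calls."""
    for i, form in enumerate(body):
        if (isinstance(form, list) and form and form[0] == "var"
                and "var" not in lexenv):
            _, name, val = form
            rest = expand_do(body[i + 1:], lexenv | {name})
            return ["do", *body[:i], ["let", name, val, rest]]
    return ["do", *body]
```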
> What aw wants is a more semantic change where vars `a`, `b` and `c` have the same scope. At least that's how I interpreted OP.
Just to clarify, in my original design a `var` would expand into a `let` (with the body of the let extending down to the bottom of the enclosing form), and thus the definitions wouldn't have the same scope.
Which isn't to say we couldn't do something different of course :)
Huh, I thought I had fixed the formatting in my post, but apparently it didn't get saved. Too late to edit it now.
- For the profiler to be useful, functions need to be labeled in some way so that the profiler will show which functions are which.
- To identify functions, I need to propagate source location information from an Arc source code file through to Racket, which includes propagating the source location information through Arc macros.
One option would be to create a macro system designed to propagate source location information through macros. This of course is what Racket does.
In Arc, macros are defined with list primitives (car, cdr, cons), function calls (for example, `(mac let (var val . body) ...)`), and operations that can be built out of those (such as quasiquotation).
My hypothesis is that by allowing list primitives to operate on forms labeled with source location information, we can continue to use Arc macros.
This leads to an interesting issue however...
Consider
(apply (fn args args) xs)
this is an identity operation. For any list `xs`, this returns the same list.
In Arc 3.2, and in my implementation, an Arc function compiles into a Racket function. E.g., `(fn args args)` becomes a Racket `(lambda args args)`.
Of course we don't have to do that. If we were writing an interpreter, for example, an Arc function would compile into some function object that'd be interpreted by the host language... an Arc function wouldn't turn into something that could be called as a Racket function directly.
But, if `(fn args args)` is implemented as a Racket function `(lambda args args)`, then to call the function with some list `xs` we need to use Racket's apply. But, of course, Racket's apply takes a Racket list. So in Arc 3.2, Arc's apply calls Racket's apply after translating the Arc list into a Racket list.
Leaving out a couple of steps, what in essence we end up with in Racket is the equivalent of:
(apply (lambda args args) (ar-nil-terminate xs))
where `ar-nil-terminate` converts an Arc list to a Racket list.
Now, for Amacx, I've invented my own representation for Arc lists. In my version, lists (that is, cons cells) can be labeled with where in a source code file they originated from. For example, if I read "(a b c)" from a file, I can inspect that list for source location information:
> (prn x)
(a b c)
> (dump-srcloc x)
foo.arc:1.0 (span 7) (a b c)
foo.arc:1.1 (span 1) a
foo.arc:1.3 (span 1) b
foo.arc:1.5 (span 1) c
which shows me that the list in `x` came from a source file "foo.arc" at line 1, column 0 with a span of 7 characters; that the first element "a" was at column 1, "b" was at column 3, and so on.
This is entirely internally consistent. E.g. (cdr x) returns a value which contains both the tail of the list `(b c)` and the source location information for the sublist.
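Here's a toy Python model of that representation. It assumes, just for this sketch, that labels are attached to the whole list and to atoms but not to the tail cells, which matches the `dump-srcloc` output above; `Loc`, `Labeled`, `cdr`, and `dump_srcloc` are made-up names. Taking the `cdr` keeps the labels on `b` and `c`:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Loc:
    file: str
    line: int
    col: int
    span: int

@dataclass
class Labeled:
    """A value (atom or cons cell) tagged with its source location."""
    value: Any
    loc: Optional[Loc]

@dataclass
class Cons:
    car: Any
    cdr: Any  # Cons, Labeled, or None (nil)

def unwrap(x):
    return x.value if isinstance(x, Labeled) else x

def car(x):
    return unwrap(x).car

def cdr(x):
    # The tail is returned as-is, so the labels inside it survive.
    return unwrap(x).cdr

def show(x):
    """Print a form without its labels, e.g. '(a b c)'."""
    x = unwrap(x)
    if isinstance(x, Cons):
        items = []
        while x is not None:
            items.append(show(x.car))
            x = unwrap(x.cdr)
        return "(" + " ".join(items) + ")"
    return str(x)

def dump_srcloc(x, out=None):
    """Collect one line per labeled node, outermost first."""
    out = [] if out is None else out
    if isinstance(x, Labeled) and x.loc is not None:
        p = x.loc
        out.append(f"{p.file}:{p.line}.{p.col} (span {p.span}) {show(x)}")
    inner = unwrap(x)
    if isinstance(inner, Cons):
        dump_srcloc(inner.car, out)
        dump_srcloc(inner.cdr, out)
    return out

def at(col, span):
    return Loc("foo.arc", 1, col, span)

# What reading "(a b c)" from foo.arc might produce in this model.
x = Labeled(Cons(Labeled("a", at(1, 1)),
                 Cons(Labeled("b", at(3, 1)),
                      Cons(Labeled("c", at(5, 1)), None))),
            at(0, 7))
```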
But.
In a macro,
(mac let (var val . body)
  `(with (,var ,val) ,@body))
I'm not seeing the source location get through the rest args.
(mac let (var val . body)
  (dump-srcloc body)
  `(with (,var ,val) ,@body))
Zilch. Nothing. Nada. `body` is a plain list, no source location information.
Why?
Because I stripped it.
(apply (lambda args args) (ar-nil-terminate xs))
The argument to Racket's `apply` has to be a Racket list. Not my own made-up representation for lists.
Thus my version of `ar-nil-terminate` removes source location information and returns a plain Racket list. I did this early on, because loading Arc fails quite quickly when `apply` doesn't work. I didn't realize it would mean that macros wouldn't get source location information passed to them.
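Here's the same problem in miniature, as a Python sketch with hypothetical names throughout (Python's `*args` plays the part of a Racket rest argument). `arc_apply` has to hand the host a plain host list, so the conversion drops every label, and `(apply (fn args args) xs)` still gives back an equal list, just without its source locations:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Labeled:
    value: Any
    loc: Optional[str]  # e.g. "foo.arc:1.3"; a string keeps the sketch short

@dataclass
class Cons:
    car: Any
    cdr: Any  # Cons, Labeled, or None (nil)

def unwrap(x):
    return x.value if isinstance(x, Labeled) else x

def strip(x):
    """Remove every label, leaving plain values and cons cells."""
    x = unwrap(x)
    if isinstance(x, Cons):
        return Cons(strip(x.car), strip(x.cdr))
    return x

def ar_nil_terminate(xs):
    """Arc list -> host list. Like the version described above, it returns
    plain values, so all source location information is lost here."""
    out = []
    xs = unwrap(xs)
    while xs is not None:
        out.append(strip(xs.car))
        xs = unwrap(xs.cdr)
    return out

def from_host(items):
    """Host list -> Arc list (what a rest parameter gets turned back into)."""
    result = None
    for item in reversed(items):
        result = Cons(item, result)
    return result

def arc_apply(f, xs):
    # The host's apply only accepts a host list, hence the conversion.
    return f(*ar_nil_terminate(xs))

def fn_args_args(*args):
    """(fn args args), compiled to a host function with a rest parameter."""
    return from_host(list(args))

def has_labels(x):
    if isinstance(x, Labeled):
        return True
    return isinstance(x, Cons) and (has_labels(x.car) or has_labels(x.cdr))
```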
So, a macro like `let` turns into the equivalent of
(annotate 'mac
  (lambda (var val . body)
    ...))
The macro is invoked with `apply`... and there goes the source location information in `body`.
Of course, like I said, I don't have to implement an Arc rest argument with a Racket rest argument. An Arc function that took a rest argument could turn into some other kind of object where I'd pass in the rest argument myself.
But that would be slower. Probably.
I can get the profiler to work (I think), but then I'd be profiling the slower version of the code.
Though the runtime that implements the extended form of lists with source location information is slower anyway because all of Arc's builtins need to unwrap their arguments.
Where do you actually need source location information in order to get Arc function names to show up in the profiler?
Would it be okay to track it just on symbols, bypassing all this list conversion and almost all of the Arc built-ins' unwrapping steps (since not many operations have to look "inside" a symbol)?
If you do need it on cons cells, do you really need it directly on the tail cons cells of a macro body? I'd expect it to be most useful on the cons cells in functional position. If you don't need it on the tails, then it's no problem when the `apply` strips it.
Oh, you know what? How about this: In `load`, use `read-syntax`, extract the line number from that syntax value, and then use `syntax->datum` and expand like usual. While compiling that expression, turn `fn` into (let ([fn-150 (lambda ...)]) fn-150) or (procedure-rename (lambda ...) 'fn-150), replacing "150" here with whatever the source line number is. Then the `object-name` for the function will be "fn-150" and I bet it'll appear in the profiling data that way, which would at least give you the line number to work with.
If you want, and if that works, you can probably have `load` do a little bit of inspection to see if the expression is of the form (mac foo ...) or (def foo ...), which could let you create a more informative function name like `foo-150`.
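For comparison, the Python analogue of `procedure-rename` is replacing a function's code object with a renamed copy (`code.replace` needs Python 3.8+); Python's `cProfile` labels functions by the code object's `co_name`, so setting only `__name__` wouldn't be enough there. A sketch of the naming scheme suggested above, where `name_for` and `rename` are hypothetical helpers:

```python
def name_for(form, line):
    """'foo-150' for (def foo ...) or (mac foo ...) starting on line 150,
    otherwise the generic 'fn-150'."""
    if isinstance(form, list) and len(form) > 1 and form[0] in ("def", "mac"):
        return f"{form[1]}-{line}"
    return f"fn-{line}"

def rename(fn, name):
    """Rename a function so a name-based profiler report shows `name`."""
    fn.__code__ = fn.__code__.replace(co_name=name)
    fn.__name__ = name
    fn.__qualname__ = name
    return fn
```

In Racket itself the equivalent step would be the `(procedure-rename (lambda ...) 'fn-150)` form mentioned above.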
There's something related to this in `ac-set1`, which generates (let ([zz ...]) zz) so that at least certain things in Arc are treated as being named "zz". Next to it is the comment "name is to cause fns to have their arc names while debugging," so "zz" was probably the Arc variable name at some point.
mmm, not sure. It'd probably be easier to start with a working version (even if slow) and then remove source information from lists and see if anything breaks.
> In `load`, use `read-syntax`, extract the line number from that syntax value
erm, so all functions forms compiled during the eval of that expression would get named "fn-150"?
"erm, so all functions forms compiled during the eval of that expression would get named "fn-150"?"
That's what I mean, yeah. Maybe you could name them with their source code if you need to know which one it is, if it'll print names that wide. :-p This isn't any kind of long-term aspiration, just an idea to get you the information you need.
I’m currently working to see if I can get profiling going...
At the moment, Amacx loads arc.arc unpleasantly slowly. And I don’t even have all of arc.arc included yet.
I imagine it’s probably because the compiler is slow. (What else could it be?)
And I imagine there’s a good chance the compiler is slow because, as the compiler recursively works its way down into forms being compiled, I functionally extend the compilation context in a simplistic way:
(def functional-extend (g nk nv)
  (fn (k)
    (if (is k nk) nv (g k))))
But it would be nice to be able to measure it, instead of just guessing.
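One cheap way to get a number before the profiler works is to count how much work a lookup does. Below is a Python transliteration of `functional-extend`; the `stats` counter and the `build_context`/`steps_for` helpers are additions for this sketch. A lookup walks back through every enclosing extension, so after n extensions, looking up the outermost variable visits n closures:

```python
def functional_extend(g, nk, nv, stats):
    """Return a lookup function: nv for nk, otherwise ask g.
    stats["steps"] counts how many closures a lookup passes through."""
    def lookup(k):
        stats["steps"] += 1
        return nv if k == nk else g(k)
    return lookup

def empty(k):
    raise KeyError(k)

def build_context(n, stats):
    """Extend the empty context n times, binding v0 .. v(n-1)."""
    g = empty
    for i in range(n):
        g = functional_extend(g, f"v{i}", i, stats)
    return g

def steps_for(g, k, stats):
    """How many closures a single lookup of k visits."""
    stats["steps"] = 0
    g(k)
    return stats["steps"]
```

If that linear walk turns out to matter, something like Racket's immutable hash tables (`hash-set`) would give cheaper lookups, but whether it's actually the bottleneck is exactly what the profiler should confirm.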
I hope I can use the Racket profiler, if I can propagate function names and/or function source locations from Arc to Racket.
Right now, the profiler knows that functions are being called, but has no way to identify them, because all such identifying information has been lost by the time Arc code (e.g. the code implementing the compiler) has been compiled to Racket.
Today I ran into a weird issue. I was running the Racket profiler in DrRacket, and it worked at first, but then started to hang. So I closed DrRacket and restarted it, deleted all my `compiled` directories, reverted my source code back to an earlier version… and it still hung. So I don’t know why it worked at first or why it stopped working.
But it turns out the profiler works well from the command line, so I’m doing that instead.
A container is an object which stores top level variables. When you use a top level variable "foo", that's a reference to "foo" in the container that you're eval'ing your code inside.
A container can be a simple Arc table, or some object like a Racket namespace which can be made to act like an Arc table. So suppose I have a container `c`, and I eval some code in c:
(eval '(def foo () 123) c)
Now `c!foo` is the function foo:
> (c!foo)
123
A function can always call a function in another container if both containers use the same runtime. A compiled function does end up with a reference to the container it was compiled in, because the top level variables it uses are compiled to refer to that container, but other than that it's just a function.
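A small Python sketch of the idea: a container is just a dict, and a toy evaluator resolves free variables in the container the code was evaluated in. It uses an interpreter rather than a compiler for brevity, and `Container` and `eval_in` are made-up names. A function defined in one container keeps using that container even when it's called through another:

```python
class Container(dict):
    """A container: a table of top-level variables."""

def eval_in(expr, container, env=None):
    """Evaluate a tiny Lisp-like subset: numbers, symbols (str),
    ['fn', params, body], ['def', name, params, body], and calls."""
    env = env or {}
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        # Lexical variables shadow top-level ones; otherwise use the container.
        return env[expr] if expr in env else container[expr]
    head = expr[0]
    if head == "fn":
        _, params, body = expr
        # The closure captures `container`, so it keeps resolving top-level
        # names there no matter where it is called from.
        return lambda *args: eval_in(body, container, {**env, **dict(zip(params, args))})
    if head == "def":
        _, name, params, body = expr
        container[name] = eval_in(["fn", params, body], container, env)
        return container[name]
    f, *args = [eval_in(e, container, env) for e in expr]
    return f(*args)
```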
A function in one container can call a function in another container with a different runtime if the runtimes are compatible. Which they might not be. For example, a function compiled in the srcloc runtime can call a function in the mpair runtime because both runtimes compile Arc functions to Racket functions, but the mpair runtime wouldn't understand an Arc list passed to it from the srcloc runtime made up of Racket syntax objects. So you might need to do some translation between runtimes depending on how different they are.
A runtime is how we choose to implement a language once a program has been compiled and is now running.
For example, in Arc 3.2, `cons` creates an immutable Racket pair, `car` works with a Racket pair or a Racket symbol nil, and `scar` modifies an Arc cons with Racket's `unsafe-set-mcar!`.
These are all runtime decisions. We could create a different runtime. For example, Arc's `cons` could create a Racket mpair, and `scar` could use Racket's `set-mcar!`.
For Amacx I've written two runtimes so far. One I called "mpair" because it implements Arc lists using Racket's mpairs. The other I called "srcloc" because it allows source location information to be attached to Arc forms (lists and atoms).
Currently, in srcloc, source location information is attached to forms using Racket's syntax objects. Thus, in srcloc, Arc's `car` can be applied to a Racket syntax object which wraps a list or nil, and it will unwrap the syntax object and return the underlying value.
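To illustrate what "compatible runtimes" means, here's a Python sketch that assumes, just for illustration, that both runtimes use mutable pairs underneath and that the srcloc runtime wraps them in a syntax-like struct. `Syntax`, `MCons`, and the `mpair_*`/`srcloc_*` functions are hypothetical. The srcloc `car` unwraps before doing what the mpair `car` does, while the mpair `car` rejects a wrapped value, so a value crossing from srcloc to mpair needs translating first:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Syntax:
    """Stand-in for a Racket syntax object: a datum plus a source location."""
    datum: Any
    srcloc: str

class MCons:
    """A mutable pair, as in the "mpair" runtime."""
    def __init__(self, car, cdr):
        self.car, self.cdr = car, cdr

# "mpair" runtime: the builtins only understand MCons (and nil).
def mpair_car(x):
    if x is None:
        return None  # (car nil) is nil
    if not isinstance(x, MCons):
        raise TypeError(f"car: expected a pair, got {type(x).__name__}")
    return x.car

def mpair_scar(x, v):
    x.car = v
    return v

# "srcloc" runtime: builtins unwrap a Syntax first.
def srcloc_car(x):
    if isinstance(x, Syntax):
        x = x.datum
    return mpair_car(x)

def to_mpair(x):
    """Translate a srcloc-runtime value into a plain mpair-runtime value."""
    if isinstance(x, Syntax):
        x = x.datum
    if isinstance(x, MCons):
        return MCons(to_mpair(x.car), to_mpair(x.cdr))
    return x
```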
It's a coincidence that I chose to call my runtime "srcloc" and Racket happens to also store source location information in a struct they call "srcloc" :-)
Then, well, you could conceivably change the semantics of "env" as it's passed around in ac.scm. Currently it's just a list of variables that are bound, and things test for whether a variable is present in that list. You could change it to a list of (variable-name macro-it's-bound-to-if-any) entries. Then have the special form (let-macro name arglist bodexpr . body) insert `(,name (fn ,arglist ,bodexpr)) into env, while everything else puts in (variable nil); change all the existing tests on "env" to search for "a list whose car is x" rather than "x"; and lastly make ac-call call the macro-function on the expression if it finds one in the lexenv.
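Here's a Python sketch of that lexical-environment change, over nested lists. To keep it short, the macro in `let-macro` is given as a Python function instead of an arglist and body expression, and `lookup_lexical` and `compile_expr` are hypothetical names rather than anything in ac.scm. Each `env` entry is a (name, macro-or-None) pair, innermost first, so an ordinary variable binding shadows a lexical macro of the same name:

```python
def lookup_lexical(env, name):
    """Return the innermost (name, macro-or-None) entry for name, or None."""
    for entry in env:
        if entry[0] == name:
            return entry
    return None

def compile_expr(expr, env):
    """Expand lexical macros in expr; env lists (name, macro-or-None) pairs."""
    if not isinstance(expr, list) or not expr:
        return expr
    head = expr[0]
    if head == "let-macro":
        # (let-macro name macro-fn . body): bind name to a macro for body.
        _, name, macro_fn, *body = expr
        inner = [(name, macro_fn)] + env
        return ["do"] + [compile_expr(e, inner) for e in body]
    if head == "fn":
        # (fn params . body): params are plain variables, i.e. (name None).
        _, params, *body = expr
        inner = [(p, None) for p in params] + env
        return ["fn", params] + [compile_expr(e, inner) for e in body]
    entry = lookup_lexical(env, head) if isinstance(head, str) else None
    if entry is not None and entry[1] is not None:
        # Like ac-call: head names a lexical macro, so expand and recompile.
        return compile_expr(entry[1](*expr[1:]), env)
    return [compile_expr(e, env) for e in expr]
```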
In theory, one could put arbitrarily complicated information, such as about deduced types of variables, into this "env" mapping, and implement some amount of compiler optimization that way.
First-class macros, of course, are the semantically nicest approach, but more difficult to compile.