But now that I look closer at the ac.scm history (now ac.rkt in Anarki), I realize I was mistaken to believe Arc treated #f as a different value than nil. Turns out Arc has always equated #f and 'nil with `is`, counted them both as falsy, etc. So this library was already returning nil, from Arc's perspective.
There are some things that slip through the cracks. It looks like (type #f) has always given an "unknown type" error, as opposed to returning 'sym as it does for 'nil and '().
So with that in mind, I think it's a bug if an Arc JSON library returns 'nil or #f for JSON false, unless it returns something other than '() for JSON []. To avoid collision, we could represent JSON arrays using `annotate` values rather than plain Arc lists, but I think representing JSON false and null using Arc symbols like 'false and 'null is easier.
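To make the collision concrete, here is a minimal Haskell sketch. The ArcVal and Json types and the naive/withSymbols functions are hypothetical, not part of any Arc JSON library; ArcVal models only the fact that nil and '() are one value in Arc. Mapping JSON false to nil makes it indistinguishable from JSON [], while mapping it to a symbol keeps the two apart.

```haskell
-- Toy model of Arc values: in Arc, nil and '() are the same value,
-- so a single Nil constructor stands for both.
data ArcVal
  = Nil
  | Sym String
  | Num Double
  | Str String
  | Cons ArcVal ArcVal
  deriving (Eq, Show)

-- Minimal JSON AST (objects omitted to keep the sketch short).
data Json
  = JNull
  | JBool Bool
  | JNum Double
  | JStr String
  | JArray [Json]
  deriving (Eq, Show)

-- Build a proper Arc list; the empty list becomes Nil.
arcList :: [ArcVal] -> ArcVal
arcList = foldr Cons Nil

-- Naive mapping: false and null become nil, colliding with [].
naive :: Json -> ArcVal
naive JNull         = Nil
naive (JBool False) = Nil
naive (JBool True)  = Sym "t"
naive (JNum n)      = Num n
naive (JStr s)      = Str s
naive (JArray xs)   = arcList (map naive xs)

-- Symbol mapping: false and null become 'false and 'null, so only
-- the empty array maps to nil. ('t is Arc's usual true value; a
-- library could equally choose 'true.)
withSymbols :: Json -> ArcVal
withSymbols JNull         = Sym "null"
withSymbols (JBool False) = Sym "false"
withSymbols (JBool True)  = Sym "t"
withSymbols (JNum n)      = Num n
withSymbols (JStr s)      = Str s
withSymbols (JArray xs)   = arcList (map withSymbols xs)
```

With the naive mapping, a consumer receiving nil can't tell whether the JSON said false, null, or []; with the symbol mapping, nil always means [].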
That documentation may be wrong. On the other hand, it may be correct in the context of someone who is only using Arc, not Racket.
There are a lot of ways to conceive of what Arc "is" outside of the Racket implementations, but I think Arc implementations like Rainbow, Jarc, Arcueid, and so on tend to be inspired first by the rather small set of operations showcased in the tutorial and in arc.arc. (As the tutorial says, "The definitions in arc.arc are also an experiment in another way. They are the language spec.") Since #f isn't part of those, it's not something that an Arc implementation would necessarily focus on supporting, so there's a practical sense in which it's not a part of Arc that our Arc code can rely on.
(Not that any other part of Arc is stable either.)
> Really, #t and #f are not proper Arc booleans, so it makes sense that Arc can't tell what type they are.
Really, #t and #f are not proper Arc anything, but the language apparently handles them, so IMHO Arc should also be able to know what type they are. Otherwise, I fear, this will become a hodgepodge language that will lose appeal.
Personally I don't care if Arc supports booleans. I only care that it can translate booleans (when need be) to a meaningful Arc semantic. That said, if we're going to support booleans then let's not create partial support.
It's great to see a JSON API integrated in Arc. :)
I took a look and found fixes for the unit tests. Before I got into that debugging though, I noticed some problems with the JSON library that I'm not sure what to do with. It turns out those are unrelated to the test failures.
The JSON solution is a quick and dirty hack by a rank noob, and I'm sure something better will come along.
And in hindsight the problem with the (body) macro should probably have been obvious, considering HTML tables are built using (tab) and not (table). I'm starting to think everything other than (tag) should be done away with to avoid the issue in principle, but that would be a major undertaking and probably mostly just bikeshedding.
As part of tidying up my code and separating it into individually digestible libraries rather than a big ball of mud, I've started a GitHub organization called "Lathe." 
You might be familiar with Lathe as the name of my Arc utility libraries and their namespace system. The concept behind the name Lathe was always related to trying to "smooth out" the language I was working in. (And I think originally it was directly related to the language Blade I was trying to design and build; I was smoothing out Arc to get it closer to Blade, or something.)
I'm finally breaking Lathe apart into multiple libraries, all under the "Lathe" GitHub organization. I've got these so far:
- Lathe Comforts for Racket (little day-to-day utilities)
- Lathe Morphisms for Racket (algebraic or category-theoretic constructions)
- Lathe Ordinals for Racket (ordinal arithmetic)
Lathe Morphisms and Lathe Ordinals weren't ever part of the original Lathe repo; they're all-new. And there isn't really that much to Lathe Morphisms yet anyhow; its design is still unstable at the most basic levels as I learn more about category theory.
Anyhow, this blog post is a journal of the way I broke out Lathe Ordinals into its own library this week.
I made this blog post about a week ago. It meanders a lot because I'm making up for all the time I haven't been updating my blog.
The gist of it is that the extensible quasiquotation syntax design I've been working on for a while now, which I've thought had something to do with higher category theory, does indeed seem very related.
All the times I've thought to myself "Why is this so hard to implement? Surely someone out there has answers...", it turns out that the people working on opetopic higher categories are exactly the people with those answers. So now I can actually be confident about some of the complexity that's made me doubt my approach, and I've found some clear answers out there to things I never quite figured out on my own.
For instance, check out "Implementing the Opetopes," a PDF linked from http://ericfinster.github.io/. In there, Eric Finster describes a data structure called "SAddr," which is an address referencing a particular part of an opetopic structure, the same way you might use an integer to reference a particular element of a list.
Every so often I would think about what it would take to reference a particular element of what I've been calling a "hypertee," and I would come to the tentative conclusion that I'd need a list of lists of lists ... of lists of empty lists. That's exactly what Eric Finster's SAddr data structure is, so it looks like I don't need to worry that I've made a mistake somewhere; someone else has tested this idea already and had success. :)
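As a rough illustration of the "list of lists of ... of empty lists" shape, here is a hypothetical, untyped Haskell sketch. It is not Eric Finster's SAddr definition from the PDF; it only checks whether a nested list is a valid address at a given dimension, under the assumption that a dimension-0 address is the empty list and a dimension-n address is a list of dimension-(n-1) addresses.

```haskell
-- Hypothetical untyped address: a list of lower-dimensional addresses.
newtype Addr = Addr [Addr]
  deriving (Eq, Show)

-- Dimension 0: only the empty list is valid.
-- Dimension n > 0: every element must be a valid dimension-(n-1) address.
validAt :: Int -> Addr -> Bool
validAt 0 (Addr xs) = null xs
validAt n (Addr xs)
  | n > 0     = all (validAt (n - 1)) xs
  | otherwise = False

-- In dimension 1 an address is a list of empty lists, so all it
-- carries is its length: it behaves like an integer list index.
dim1Index :: Addr -> Maybe Int
dim1Index a@(Addr xs)
  | validAt 1 a = Just (length xs)
  | otherwise   = Nothing
```

For example, `dim1Index (Addr [Addr [], Addr [], Addr []])` is `Just 3`, which matches the "integer referencing a list element" analogy at the lowest interesting dimension.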
Over the past week I've been going ahead with an implementation of the kind of quasiquotation system I've been attempting for all this time. It's going well. :) I look forward to having more to report at some point.
It looks like I might've subtly broken ns.arc with my own changes to make Anarki installable as a Racket package. Here's an example that should be working, but currently isn't:
(= n 2)
(= my-definition (* n n))
(let my-ns (nsobj)
  ; Populate the namespace with the current namespace's bindings.
  (each k (ns-keys current-ns)
    ; Racket has a variable called _ that raises an error when
    ; used as an expression, and it looks like an Arc variable, so
    ; we skip it. This is a hack. Maybe it's time to change how
    ; the Arc namespace works. On the other hand, copying
    ; namespaces in this naive way is prone to this kind of
    ; problem, so perhaps it's this technique that should be
    (unless (is k '||)
      (= my-ns.k current-ns.k)))
  ; Load the file.
  (w/current-ns my-ns (load "my-file.arc"))
  ; Get the specific things you want out of the namespace.
  my-ns!some-definition)  ; some-definition is a hypothetical name
cannot reference an identifier before its definition
in module: "/home/nia/mine/drive/repo/mine/prog/repo/not-mine/anarki/ac.rkt"
The idea is, you create an empty Arc namespace with (nsobj), you use `w/current-ns` to load a file into it, and you use `a!b` or `a.b` syntax to manipulate individual entries.
An "Arc namespace" is just a convenience wrapper over a Racket namespace that automatically converts between Arc variables `foo` and their corresponding Racket variables `_foo`.
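The name translation can be sketched as a pair of functions. These are hypothetical Haskell helpers, not code from ns.arc or ac.rkt, and treating Racket names without a leading underscore as having no Arc counterpart is an assumption. One consequence is visible in the earlier example: Racket's own _ translates to the empty Arc name, the symbol '||, which is why that example skips '||.

```haskell
-- Hypothetical: Arc variable foo lives in Racket as _foo.
arcToRacket :: String -> String
arcToRacket name = '_' : name

-- Hypothetical inverse; names without the underscore prefix are
-- treated as having no Arc counterpart.
racketToArc :: String -> Maybe String
racketToArc ('_' : rest) = Just rest
racketToArc _            = Nothing
```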
For some overall background...
I wrote ns.arc when I didn't have much idea what Racket namespaces or modules could do, but I was at least sure that changing the compiled Arc code to more seamlessly interact with Racket's `current-namespace` would open up ways to load Arc libraries without them clobbering each other. It wouldn't be perfect because of things like unhygienic macros, but it seemed like a step in the right direction.
I went a little overboard with the idea that Racket namespaces and Racket modules could be manipulated like Arc tables. However, that was the only clear vision I had when I embarked on writing the ns.arc library, so I approximated it as well as I could anyway. In fact, I don't think the utilities for generating first-class modules (like `simple-mod` and `make-modecule`) are all that useful, because as I understand a little better now, Racket modules are as complicated as they are mainly to support separate compilation, so generating them at run time doesn't make much sense.
I'm still finding out new things about what these can do, though. Something I didn't piece together until just now was that Racket has a `current-module-name-resolver` parameter, which can let you run arbitrary code in response to a top-level (require ...) form. I presume this would let you keep track of all the modules required this way so you can `namespace-attach-module` them to another namespace later. Using this, the kind of hackish partial-namespace-copying technique I illustrate above can probably be made into something pretty robust after all, as long as Anarki sets `current-module-name-resolver` to something specific and no other code ever changes it. :-p
I tinkered with Anarki a whole bunch and finally got this working smoothly. There was a missing step, because it turns out we need to load certain Racket-side bindings into a namespace in order to be able to evaluate Arc code there. It seems more obvious in hindsight. :)
I approached this with the secondary goal of letting a Racket program (or a determined Arc program) instantiate multiple independent instances of Anarki. The ac.rkt module was the only place we were performing side effects when a Racket module was visited, and Racket's caching of modules makes it hard to repeat those side effects on demand, so I moved most of them into a procedure called `anarki-init`.
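The design choice of moving side effects into an init procedure can be modeled in a few lines. This Haskell sketch is hypothetical (initInstance, setGlobal, and getGlobal are made-up names, and the real `anarki-init` does much more); the point is only that state created by an explicit call, rather than when a module is first visited, can be created as many times as needed, giving independent instances.

```haskell
import Data.IORef

-- Hypothetical model: all per-instance state is created by an explicit
-- init action rather than as a side effect of loading a module.
newtype Instance = Instance { globals :: IORef [(String, Int)] }

-- Each call produces a fresh, independent instance.
initInstance :: IO Instance
initInstance = Instance <$> newIORef []

setGlobal :: Instance -> String -> Int -> IO ()
setGlobal inst name val = modifyIORef (globals inst) ((name, val) :)

-- Newest binding wins, since setGlobal conses onto the front.
getGlobal :: Instance -> String -> IO (Maybe Int)
getGlobal inst name = lookup name <$> readIORef (globals inst)
```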
By adding one line to the example I gave...
(let my-ns (nsobj)
  ; Load the Arc builtins into the namespace so we can evaluate
  ; Arc code there.
  (w/current-ns my-ns ($.anarki-init))
...it becomes possible to evaluate Arc code in that namespace, and the example works.
Before I started on that, I did a bunch of cleanup to get the Anarki unit tests and entrypoints running smoothly on all our CI platforms. To get started on this cleanup, I had a few questions hjek and akkartik were able to discuss with me on issue #94: https://github.com/arclanguage/anarki/issues/94
A lot of the problems I'm fixing here are ones I created, so it's a little embarrassing. :) It's nice to finally put in some of this missing work, though. I want to say thanks to shader and hjek for talking about modules and packages, provoking me to work on this stuff!
I've always been frustrated with Arc's lack of a standard practice for loading dependencies (although I suppose akkartik might consider that a feature ^_^ ).
If the way Arc's lib/ directory has been used is any indication, the way to do it is:
- Start in the Arc directory when you run Arc, and never cd out of it.
- (load "lib/foo.arc"), or (require "lib/foo.arc") if you want to avoid running the same file multiple times.
But I think for some Anarki users, the preferred technique has been somewhat different:
- Invoke Anarki from any directory.
- Ignore lib/ as much as possible. On occasion, load a library from there by using (load "path/to/arc/lib/foo.arc"), but a few libraries may make this difficult (e.g. if they need to load other libraries).
When I started writing Arc libraries, the first thing I wrote was a framework for keeping track of the location to load things relative to, so that my other libraries could load each other using relative paths regardless of which of the above techniques was in use. But the Lathe module system didn't catch on with anyone else. XD
More recently, eight years ago, rntz implemented the current-load-file* global variable, which may make it easier for Anarki-specific libraries to compute the paths of the libraries they want to load. Nothing in Anarki currently uses it, however.
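One way a library could use the path of the file currently being loaded is to resolve its dependencies relative to its own directory. The Haskell helper below is a hypothetical sketch of that computation, assuming POSIX-style '/' separators and no ".." handling; it is not how Lathe or current-load-file* actually works.

```haskell
-- Hypothetical: resolve rel against the directory of currentFile.
resolveRelative :: FilePath -> FilePath -> FilePath
resolveRelative currentFile rel
  | take 1 rel == "/" = rel                -- already absolute
  | otherwise         = dirOf currentFile ++ rel
  where
    -- Everything up to and including the last '/', or "" if none.
    dirOf = reverse . dropWhile (/= '/') . reverse
```

For example, `resolveRelative "/home/me/arc/lib/foo.arc" "bar.arc"` gives `"/home/me/arc/lib/bar.arc"` no matter which directory Arc was started from.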
It's reading the invalid sequence as � U+FFFD REPLACEMENT CHARACTER, which translates back to UTF-8 as EF BF BD (as we can see in the actual results above). The replacement character is what Unicode offers for use as a placeholder for corrupt sequences in encoded Unicode text, just like the way it's being used here.
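The EF BF BD bytes can be checked by hand. This small Haskell encoder (a sketch that omits surrogate and range validation) turns a code point into UTF-8 bytes; for U+FFFD, the three-byte form 1110xxxx 10xxxxxx 10xxxxxx yields exactly EF BF BD.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Encode one Unicode code point as UTF-8 (no validation of
-- surrogates or out-of-range values in this sketch).
encodeUtf8 :: Int -> [Word8]
encodeUtf8 c
  | c < 0x80    = [fromIntegral c]
  | c < 0x800   = [ fromIntegral (0xC0 .|. (c `shiftR` 6))
                  , cont c ]
  | c < 0x10000 = [ fromIntegral (0xE0 .|. (c `shiftR` 12))
                  , cont (c `shiftR` 6)
                  , cont c ]
  | otherwise   = [ fromIntegral (0xF0 .|. (c `shiftR` 18))
                  , cont (c `shiftR` 12)
                  , cont (c `shiftR` 6)
                  , cont c ]
  where
    -- Continuation byte 10xxxxxx carrying the low six bits.
    cont x = fromIntegral (0x80 .|. (x .&. 0x3F))
```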
Arc's assignment operations are set up to acquire a global lock as they operate, achieved by use of an (atomic ...) block. This is so other threads can't observe the in-between states of operations like (swap ...) and (rotate ...). The documentation for 'atomic is here: https://arclanguage.github.io/ref/threading.html
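The idea of a single global lock can be sketched with an MVar. This Haskell model is hypothetical and simplified: atomicWith and swapRefs are made-up names, this lock is not reentrant (nesting atomicWith inside itself would deadlock), and it only protects against other threads that also take the same lock.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, withMVar)
import Data.IORef

-- Run an action while holding the one global lock.
atomicWith :: MVar () -> IO a -> IO a
atomicWith lock action = withMVar lock (const action)

-- A swap that holds the lock throughout, so threads that also use
-- the lock never observe one reference updated and the other not.
swapRefs :: MVar () -> IORef a -> IORef a -> IO ()
swapRefs lock r1 r2 = atomicWith lock $ do
  x <- readIORef r1
  y <- readIORef r2
  writeIORef r1 y
  writeIORef r2 x
```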
Arc also has some support for continuations, which can serve as a kind of cooperative multithreading. Mainly, Arc just exposes 'ccc as a binding for Racket's 'call-with-current-continuation, and it uses Scheme's 'dynamic-wind to implement 'protect. These are documented here: https://arclanguage.github.io/ref/error.html
Those are features in support of concurrency, as in, the interleaving of multiple expression evaluations for the sake of avoiding verbose inversion-of-control coding styles. It looks like racket/place is particularly intended for parallelism, as in, the use of multiple processors at once for the sake of performance. I'd say Arc doesn't provide any particular support for parallelism yet, only concurrency.
To summarize this post, I'm trying to generalize macros and quasiquotation in several ways all at once, but most of the hard problems I'm encountering center on an idea I call "higher quasiquotation":
- A quasiquotation is data shaped like an s-expression with s-expression-shaped holes.
- A quasiquotation of higher degree is data shaped like a quasiquotation with quasiquotation-shaped holes.
I already want to define macros whose bodies are not s-expressions but quasiquotations, so I already want to move up one degree on this progression. To make sure this design is correct, I want it to work even if I move up an arbitrary number of degrees. If it works, I expect one macro system to work for quasiquotation macros, regular macros, and reader macros, all in the same generalized way.
No, just like you can nest parentheses without spontaneously having quasiquotation, you can nest quasiquotations like you're talking about without reaching a higher degree of quasiquotation.
Suppose we have notations like so:
( ) parentheses
` , quasiquotation
^ $ the next higher degree of quasiquotation
(I would go on, but I'll run out of punctuation.)
S-expressions are bounded on both sides by single characters, and they nest within each other like this (labeling the layers as "a" and "b" and anything outside their lexical extent as "--"):
(a (b) a) --
If we want to look at just the "b" s-expression, it's simple to write that down on its own:
(b)
Quasiquotations are bounded on both sides by s-expressions, and they nest like this:
`(a `(b ,(a ,(--) a) (b) b) a ,(--) a (a) a) --
The meaning of "--" in the inner expressions hasn't changed: Every "--" is outside the lexical boundary of the `(a ...) quasiquotation, just as every "a" is outside the lexical extent of the `(b ...) quasiquotation. Occurrences of "--" can appear when the lexical extent closes on an s-expression (")") or a quasiquotation (","), and both of these are shown in the example.
We can isolate the "b" part pretty easily again, but we need to allow for holes in our data structure:
`(b ,(--) (b) b)
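For this single-degree case, allowing for holes can be as simple as an s-expression type whose leaves may be holes, where filling a hole is substitution (the monadic bind of that type). The SExpr type and fill function are a hypothetical Haskell sketch; as the following example shows, this simple representation stops being enough once holes themselves contain orphaned sections.

```haskell
-- Hypothetical: an s-expression whose leaves may be holes labeled by h.
data SExpr h
  = Atom String
  | List [SExpr h]
  | Hole h
  deriving (Eq, Show)

-- Fill every hole by substitution.
fill :: (h -> SExpr h') -> SExpr h -> SExpr h'
fill _ (Atom a)  = Atom a
fill f (List xs) = List (map (fill f) xs)
fill f (Hole h)  = f h

-- The "b" part, (b ,(--) (b) b), with the ,(--) position as a hole.
bPart :: SExpr ()
bPart = List [Atom "b", Hole (), List [Atom "b"], Atom "b"]
```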
Quasiquotations of the next higher degree are bounded on both sides by quasiquotations, and they nest like this:
a ,(b (b) b) a ,(b) a `(a) a (a) a)
b ,(a) b `(b) b (b) b)
a ,(--) a `(a) a (a) a)
Occurrences of "--" can appear when the lexical extent closes on an s-expression (")"), a quasiquotation (","), or a quasiquotation of the next higher degree ("$"), and all of these are shown in the example.
This time, writing down just the "b" part gets tricky using s-expression-shaped syntax because one of the holes has orphaned sections inside (holes in the hole):
$`(-- ,(b (b) b) ,(b))
b ,(--) b `(b) b (b) b)
When we have more than one orphaned section in the same hole like we do here, we may need to use labels (or positions, or extra hole structure) to tell them apart so we can insert the "b" section into the "a" section deterministically. So far I haven't figured out how to represent this data in a way that works, let alone an elegant way.
For a concrete use case, look at it this way: When do we use ( and )? When our DSLs don't last all the way to the end of the file. When do we use ` and ,? When our DSLs have holes in them (although it might be unusual to hear this, because most Lisps couple these syntaxes together with a particular nested-list-generating DSL). When do we use ^ and $? When our DSLs have holes with holes in them. And so on, where higher degrees of quasiquotation have more and more intricate holes.
Let's say I'm writing a macro that implements a DSL where Common Lisp code and Arc code can be combined in the same function. (I'm going with Common Lisp so that we can't simply compile it to inline Racket code.) I may have Common Lisp s-expressions occurring under my Arc s-expressions and Arc s-expressions occurring under my Common Lisp s-expressions, but I want Arc variables to be visible in all the Arc parts and Common Lisp variables to be visible in all the Common Lisp parts. When we take a look at the lexical scope of any one local Common Lisp (or Arc) variable in that code, it's not simply shaped like an s-expression like traditional lexical scopes are; it has holes-with-holes wherever the Arc (respectively Common Lisp) expressions occur. So ^ and $ are a natural fit for the DSL:
Maybe I could actually use something like that ^ and $ syntax.
I'll need to generalize it to an infinite number of degrees:
Since Scheme's ,',',', technique doesn't generalize to other DSLs, I'll want to have at least one way to unquote from more than one level of nested quasiquotation at once:
Cene has more than one notation for doing that. (I can write a label on an outer layer, and then I can unquote all the way to a particular label.) I'd like to support various unquoting styles as macros, but maybe that macro system can expand to a notation like this, so getting this to work first seems best.
Finally, representing data structures with orphaned parts might be easy enough once I actually try using key-value tables to hold orphans like I've planned.
I think I like this approach even better than the one I reached in the blog post. It means that I actually can process an infinite number of degrees of quasiquotation "in the reader" rather than letting the top level of macroexpansion begin with an s-expression.
But I suppose this and the blog post tackle two different parts of the problem. The blog post's approach/challenges still apply to the process of parsing this syntax into an infinite-degree quasiquotation.
import Data.Map (Map)
data HDExpr s m
  = HDExprMedia (m (HDExprNonMedia s m))
data HDExprNonMedia s m
  = HDExprHole s [Map s (HDExpr s m)]
  | HDExprLayer (HDExpr s m) [Map s (HDExpr s m)]
It's not very self-documenting, so to start describing it, here's a Lispier pseudocode:
<expr> ::= <s-expression where some leaves are <paren>>
<paren> ::= (close <identifier> <list of <environment of <expr>>>)
<paren> ::= (open-and-close <expr> <list of <environment of <expr>>>)
I say "s-expression" there, but we could use any monad for our syntax. If we work in the s-expression monad, the usual quasiquote operations ` , are parens of degree 1. When the monad we're working in is Haskell's (Writer String), our syntax is effectively reader syntax, where the ( ) parens are degree 1 and ` , are degree 2.
(There are also parens of degree 0, but they're a bit weird: Once they open, they only close again by reaching the end of the syntax. So when we're in the s-expression monad, perhaps ' is a paren of degree 0. When we're in the (Writer String) monad, perhaps pressing enter at the end of a REPL command is a paren of degree 0.)
The degree of the paren is reflected in the length of the list of environments it contains. Degree-0 parens have no environments, degree-1 parens have exactly one, and so on. If we want to represent a degree-N expression, then the parens we use are limited in a specific way:
<expr> ::= <s-expression where some leaves are <paren>>
<paren> ::= (close <identifier>
  <for some (M < N), an M-element list of
    <environment of <expr>>>)
<paren> ::= (open-and-close <expr>
  <for some (M <= N), an M-element list of
    <environment of <expr>>>)
The (open-and-close ...) syntax begins a nested expression. The environments serve to fill the holes in the <expr>, resuming the previous level of nesting. Each element of the list fills holes of a different degree. Holes of degree higher than the number of environments in the list are not filled; they remain holes. For instance, in `(a b (c d ,e) f), the "(" before "c" has two holes in its expression (",e" and ") f"), but it only fills the hole for ") f". The hole for ",e" is a higher degree than the "(" paren, so it's not filled at such a local place. It's only filled by the "`" paren on the outside.
The (close ...) syntax begins a hole of degree equal to the number of elements of the list. The list of environments will be used when the hole is replaced with an expression. It'll fill in the (low-degree) holes of that expression.
Note that the list lengths correspond with the degrees of holes, but not with the degrees of expressions. In fact, a degree-N expression does not contain any expressions of degrees other than N. If we consider ourselves to be working with higher quasiquotations of an infinite degree N, but every (close ...) actually occurring in our data has finite degree, then we can represent our data using finite lists. We never need to select a finite value for N!
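An untyped sketch of the Lispier grammar shows why finite lists suffice: a paren's degree is simply the length of its environment list, so nothing needs to fix N ahead of time. This Haskell rendering is hypothetical and uses association lists as stand-ins for environments; closeDegrees just collects the degree of every (close ...) occurring anywhere in the data.

```haskell
-- Hypothetical untyped rendering; association lists stand in for
-- environments.
type Env = [(String, Expr)]

data Expr
  = Atom String
  | Node [Expr]
  | Paren Paren
  deriving Show

data Paren
  = Close String [Env]       -- a hole, named by an identifier
  | OpenAndClose Expr [Env]  -- a nested expression
  deriving Show

-- A paren's degree is the length of its environment list.
parenDegree :: Paren -> Int
parenDegree (Close _ envs)        = length envs
parenDegree (OpenAndClose _ envs) = length envs

-- Degrees of all (close ...) forms, including those inside nested
-- expressions and inside environments.
closeDegrees :: Expr -> [Int]
closeDegrees (Atom _)                      = []
closeDegrees (Node xs)                     = concatMap closeDegrees xs
closeDegrees (Paren (Close _ envs))        = length envs : inEnvs envs
closeDegrees (Paren (OpenAndClose e envs)) = closeDegrees e ++ inEnvs envs

inEnvs :: [Env] -> [Int]
inEnvs envs = concat [closeDegrees x | env <- envs, (_, x) <- env]
```

Since every list here is finite, the largest degree actually used is whatever `maximum . closeDegrees` reports for the data at hand.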
In Haskell, the strongly typed variations I wrote do require a finite value for N to be decided beforehand (and baked into the name of the type), because I'm pretty sure that vastly simplifies the type system features I would need to use to get it to work. :-p Roughly, the difficulty is that I need 0 to give me something of kind ( * ), 1 to give me something of kind ( * -> * ), 2 to give me something of kind (( * -> * ) -> ( * -> * )), and so on, so I would need kinds that depend on values. Agda or Idris might already be up for this task, but I doubt that's going to be the kind of effort I want to spend since I intended to build my macroexpander in (untyped) Racket to begin with.
There might just be a little more I want to tinker with in Haskell, because on my way to defining this data structure I started to develop some code for "higher monads," and it would be fun if this data structure turned out to be an instance of that concept. Still, at this point I'm probably ready to go back to Racket.
I'm even hopeful again that this macro system might play nicely with Racket's. Racket has some hygiene features that walk recursively over the inputs and outputs of macros, expecting them to be s-expression-shaped with no holes. I wasn't sure that Racket's technique could be reconciled with higher quasiquotation. With this data structure, the exotic nesting of higher quasiquotations is converted to the traditional kind of nesting that Racket expects.
So, this might be a usable self-contained Racket library in the end -- not even a framework but a seamless library -- which is what I want, because that would make it easy to clarify the meaning of higher quasiquotation in terms of an existing ecosystem before I use it in Cene. :)
Since this thread and my code comments contain some walls of text, I'll see if I can convert them to a blog post soon.
 In particular, Racket uses a hygiene technique where it attaches a fresh scope label to a piece of syntax before macroexpanding and then flips the presence of that scope after macroexpanding so that the scope winds up attached to only the parts of the macro result that don't come from the input (making that region more local than the rest). Racket also has a "syntax taints" feature which lets macros attach "dye packs" to their results so that the identifiers occurring in those results can't be used by a client to access private module bindings.
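The scope-flipping step can be modeled on a toy syntax tree. This Haskell sketch is hypothetical and leaves out everything else about Racket's expander (binding resolution, taints, use-site scopes); it only shows why flipping a fresh scope before and after expansion leaves the scope on exactly the identifiers the macro introduced. Because the scope is fresh, flipping it on the way in is the same as adding it.

```haskell
import Data.List (sort)

-- Hypothetical syntax: identifiers carry a sorted list of scope labels.
data Stx = Id String [Int] | Stx [Stx]
  deriving (Eq, Show)

-- Toggle one scope on every identifier.
flipScope :: Int -> Stx -> Stx
flipScope sc (Id name scopes)
  | sc `elem` scopes = Id name (filter (/= sc) scopes)
  | otherwise        = Id name (sort (sc : scopes))
flipScope sc (Stx xs) = Stx (map (flipScope sc) xs)

-- Flip a fresh scope on the input, run the macro, flip again:
-- identifiers copied from the input lose the scope, while
-- identifiers the macro introduced keep it.
expandWith :: Int -> (Stx -> Stx) -> Stx -> Stx
expandWith fresh macro input =
  flipScope fresh (macro (flipScope fresh input))
```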