Just after that, Racket introduced an easier way to fix this, `unsafe-set-immutable-car!`, specifically as a stable way to mutate the so-called immutable cons cells going forward. :) We should probably move to that.
I made an attempt at this once in Anarki, but I wasn't really satisfied with the result. The anarki package has a `#lang anarki` language, with a demo in the /lib/racket-lang-demo directory.
In particular, this is /lib/racket-lang-demo/lang-anarki-library.arc, which shows what it would look like to use Racket libraries and other Arc libraries from a `#lang anarki`-based library:
The `(:provide ...)` syntax is a special extension to the Arc language for the purposes of `#lang anarki`. Every `#lang anarki` file must start with `(:provide var1 var2 var3 ...)` fo r some number of variables (even zero). It describes what exports are visible to other Racket libraries.
I believe `#lang anarki` does something to make it so `(load "plain-anarki-deep-dependency.arc")` loads the file path relative to the directory the module is in.
There are a number of awkward things about using Arc to write libraries for Racket consumption:
- Racket languages typically have a notion of phase separation where some amount of the program (the compile-time part, i.e., phases 1 and up) happens during the process of compiling a module into a .zo file, and the rest of the behavior (the run-time part, i.e., phase 0) can be performed by loading the already-compiled .zo file. Arc programs are more like `#lang racket/load` in that they only begin macroexpanding the later expressions in the file after they've run the run-time parts of the previous expressions, so it's like the whole macroexpansion process has to wait until run time. The only things `#lang anarki` does at compile time are reading the file's s-expressions and processing the `(:provide ...)` directive.
- Racket and Arc both have macro systems. They both involve writing macro definitions in source-to-source styles that usually make it easy to write a macro that abstracts over another macro call. Sometimes, in more advanced cases, these styles require more attention to get the hygiene right. However, they use different "source" types (Racket's syntax objects vs Arc's plain s-expressions), and they have rather different approaches to hygiene (Racket's lexical information carrying module imports and sets of scopes, vs Arc's gensyms and global variable redefinition warnings). This means mixing Racket macros with Arc macros is one of the advanced cases; I expect that to get the interactions right, it may be necessary to do some explicit conversions between "source" types, as well as being proficient in both languages' approaches to hygiene.
- As far as a module system goes, Arc traditionally has no way to allow multiple pieces of code to see different bindings of the same top-level variable name. (Framewarc and I believe Arc/Nu have approaches to this, but they involve writing all macros differently than before.) Hence, two Arc libraries might fail to be usable together due to mutual clobbering of the same variable, which isn't a typical situation with Racket modules. If and when modular techniques catch on for Arc, those techniques may still be challenging to reconcile with Racket's techniques, so a `#lang anarki`-based library may still not present a very familiar interface to other Racket ecosystem programmers.
- Arc is a lisp-N with its various namespaces, like `setforms`, `defcall`, and `coerce`, stored as entries in global hash tables. Racket is a lisp-N with its various namespaces, like `set!` transformers and `match` expanders, stored on the same compile-time object by implementing multiple interfaces (structure type properties/generic interfaces). This is one particular place where reconciling the modularity approaches may be difficult.
- Racket's compiler tooling tries to determine statically what files a file depends on, both so that `raco make` and `raco setup` can recompile the file if its dependencies have changed and so that if a file is bundled up with `raco distribute` or `raco exe`, its dependencies are included in the bundle. This is something I didn't tackle at all with `#lang anarki`. Things like `load` and `dynamic-require` make it difficult to keep track of dependencies statically this way.
The design I chose for `#lang anarki` was basically to get something tolerable working with as little effort as possible. There's probably a lot of room for improvement, so I didn't consider it stable enough to show off in the `anarki` Racket package documentation.
I think a somewhat better approach would involve:
- Giving Anarki a read-time namespace system like Common Lisp's, to solve Arc's inadvertent name clobbering issues (while potentially still permitting intentional clobbering).
- Going ahead and macroexpanding a whole `#lang anarki` file at compile time, evaluating nothing but the `mac` definitions at first, even though that's not the usual Arc `load` semantics. User-defined macros can sometimes be slow, making it worthwhile to expand them before producing the .zo. Only so much Arc code actually cares about running certain expressions before others are expanded, and those cases can use a CL-style `eval-when` approach.
- Assembling static information about the module just by using `eval-when` compile-time mutable definitions. Then we could dispense with `(:provide ...)` and just have a global mutable table that collects all the exports of the current module.
I think Arc's load time is pretty typical of uncompiled Racket code. In Racket, people can speed up their development process by using `raco setup --pkgs my-package` to compile their libraries or `raco make filename.rkt`
to compile individual files. That might be harder to do for Arc, especially the Arc REPL, since Arc's macroexpansion is interleaved with run-time side effects.
Yes, it's the fork that's been maintained by the people here at Arc Forum. :) I recommend it.
Whether you want the community additions or not, you might also be interested in the "official" and "stable" branches of that repo. The "official" branch tracks the official .tar releases that you'd find on the front page here, and the "stable" branch tracks the official releases plus a few small bug fixes and quality-of-life improvements.
Penknife really was just a pile of code before now. I originally wrote it on my commute using an Android phone. Now I've finally gotten around to giving it a readme, a .gitignore, and all that. Whew. :)
There was another language I called Penknife a couple of years later because I thought of it as having the same set of design goals, so technically this one's more like Penknife Mk. I. That's what I think I'll call it.
Something about Penknife Mk. I that I often think back to is its approach to macro hygiene.
Its syntax is primarily string-based, but with the ability for other values to be embedded inside the strings. It has a kind of quasiquotaton that surrounds the quoted section with an object wrapper that the macroexpander recognizes. When the macroexpander expands that expression, it switches over to using the scope at the macro's definition site. Interpolations in the quasiquotation are surrounded with another wrapper that causes the macroexpander to switch back to the caller's scope.
When the macro is defined, its lexical scope is captured and carried on the macro's binding. That way, it's the namespace that holds other namespaces inside it. The syntax trees don't have to hold namespaces; they can just hold paths to traverse the namespace hierarchy.
I still think of this as a nice sweet spot; the code has a context-independent enough identity to be compiled, while the namespaces are mutable enough to allow REPL interaction.
Racket's sets-of-scopes approach to hygiene is mature and supports advanced features like local macros and local definition blocks, but I think of it as kind of sloppy. It spray paints one piece of information over the whole syntax tree just to mark that a variable is in scope. This approach could be handy in cases where variable scopes overlap with each other in non-hierarchical ways -- where sometimes one variable is in scope, sometimes the other, and sometimes both -- but I feel like Penknife's wrapper objects are a much tidier model of the typical hierarchical structure. There've been many moments in my Racket programming when I would have liked Racket to have Penknife Mk. I's approach to hygiene instead.
I'm excited to have something I've put together this week that seems highly relevant to discussions we've had here. :) Basically, Fexpress is an fexpr evaluator, but the fexprs are an extended kind that can also offer up compilation functionality in addition to interpretation.
There are only so many things that are "internal" in this project, because all the bits and pieces might be useful to user-defined fexprs, so you can get a pretty thorough tour of the techniques from the docs: https://docs.racket-lang.org/fexpress/index.html
I was merciless about cutting down the scope of this project so that I wouldn't overcomplicate it and make it incomprehensible to people. I specifically made certain questionable choices in the API just so that the code could be a little easier to read.
Until this week, the Fexpress repo consisted of only a non-starter attempt I made 10 years ago, so it's a bit nostalgic to finally build it out into a real project. I wrote up a blog post about this journey and where the ideas might go from here: https://rocketnia.wordpress.com/2021/08/18/fexpress-compilat...
"However, there’s a certain kind of seamlessness Fexpress won’t attempt: Racket’s and can’t be passed into Racket’s map, and sometimes this surprises people who expect macros to act like functions. In languages with fexprs as the default abstraction, it tends to be easy to implement and and map in such a way that this interaction succeeds."
Do you have an fexpr-aware version of map at the moment? Or and? Or do you have to switch both?
The point I'm making in that paragraph is that there are reasons for `map` and `and` not to work together even in an fexpr-capable language. So I'm not sure what you're asking for; I'd say Racket's `map` and `and` already are "fexpr-aware."
Are you asking for clarification on how "this interaction succeeds" in a language where fexprs are more pervasive?
Let's see... I found an online PicoLisp runner.
As an initial test, `prin` prints and returns its string argument, and anything non-NIL counts as truthy:
(PicoLisp doesn't evaluate rest arguments like `args` here, even though it does evaluate every other argument in the argument list, and that's by design. This is how we can write custom fexprs.)
It appears that what our fexpr observes is a list of what PicoLisp calls "anonymous symbols" (probably like gensyms), which apparently are bound to the already-evaluated argument values.
Incidentally, notice how this `mapcar` call fully evaluates the list arguments, including the part that prints "oops." That makes these two programs arguably inconsistent:
: (mapcar and (list T NIL) (list NIL (prin "oops^J")))
oops
-> (NIL NIL)
: (list (and T NIL) (and NIL (prin "oops^J")))
-> (NIL NIL)
When the transformation passed to `mapcar` is a pure function, refactoring a `mapcar` call this way makes no difference. When the transformation is a side-effecting procedure, it does make a slight difference because one transformation call might perform its effects before the next transformation's arguments have started to be evaluated. And here, when the transformation is `and`, this refactoring makes a difference because `and` usually changes how often its arguments' effects are performed, but `mapcar` doesn't let it have that control.
That more or less makes sense operationally, but it means that in PicoLisp, I could imagine `mapcar` being more "fexpr-aware" than it is now. A more fexpr-aware design for `mapcar` would either:
- Allow `and` to affect the evaluation of `mapcar`'s own arguments. Now we get to implement a way to evaluate a list-valued expression without evaluating the list elements, so that we can pass the elements along to `and`. And once we're done with that, what should happen when we map fexprs that aren't even procedure-like, such as (mapcar let ...) or (mapcar setq ...)? Is there even a point to doing any of this?
- With full awareness of fexprs, make the decision to explicitly disallow non-procedure fexprs from being used with `mapcar`. That way, we don't even open up that can of worms.
Racket's `map` essentially accomplishes the latter.
By the way, if you'd be interested in a version of `and` that can be used as both a Racket macro with short-circuiting behavior and a first-class procedure without it, you can do that in Racket (using essentially a symbol macro):
In the context of Fexpress, this punning can be taken to another level, letting the first-class version of `my-and` be both a procedure and an Fexpress fexpr at the same time, while the second-class calls to it have Racket macro behavior. I've put together a test to demonstrate it:
In the Fexpress proof of concept, there's no way to convert between Fexpress fexprs and Racket macros in either direction. That makes this punning pretty pointless, except perhaps as a technique for publishing libraries that can be used roughly the same way by both Racket and Fexpress programmers.
Sorry, I do know that fexpr lisps can combine `map` and `and`. It sounds like you're proposing a different approach in Fexpress, so I was wondering what sort of vocabulary you are imagining. Were you thinking you'd have something called `map/f` that can take non-functions as its first argument? And so there'd be fexpr-aware variants for all higher-order functions? (Thinking harder, an `and/f` doesn't really make sense here.)
"Were you thinking you'd have something called `map/f` that can take non-functions as its first argument?"
No, I was thinking `map` should only take a procedure.
I don't think there's a particular way to "map" an fexpr that stands out as a compelling abstraction. I propped up a certain ambitious "evaluate a list-valued expression without evaluating the list elements" vision for `map` above, but this is really more of a lazy function map than an fexpr map.
The `map` operation is basically about the list data structure. Since it's a data structure that holds its elements eagerly, we can transform it with an eager function. If it held its elements lazily, we could use a lazy function. If it held its arguments unevaluated-ly, and if the elements didn't even have to be expressions, then the abstractions we could map over these lists might be fexpr-like, but I don't know of a way to make that make sense.
If you're thinking "but then what do you do with an fexpr once you have one?" I don't know. I've only found so many compelling uses for fexprs, which have to do with places where the code of the call site is needed at run time, possibly because run time and compile time are tangled up:
- Defining custom syntaxes in mutually recursive ways that involve invoking some of the syntaxes before some of the others have finished being defined.
- Making macro systems where redefinitions of the macros automatically propagate to change the behavior of previously defined things that call them.
- Writing abstractions that take care of their own JIT behavior. In order to strategically recompile their use sites according to usage statistics collected at run time, they need access to both the source code and the ability to observe run-time conditions.
In each case, fexprs would be serving as a language front-end layer, not as internal building blocks for general-purpose computation. I think source code is second-class in an essential way; although we can write macro abstraction layers that fabricate source code as they go along, this makes it harder to report informative errors in terms of the user's original code. Macros and fexprs take source code as input, so they're tied to the language front-end in a way that procedures aren't.
And when the source code isn't something we're abstracting over much, then neither is the environment of macros or fexprs that are in scope there. (Like, the environment is more or less an annotation on the source code, and if we focus on having only one original source codebase to annotate this way, then we only have one set of annotations to produce.) So a lot of the best techniques for macros or fexprs are likely to be self-contained abstractions that keep their first-class manipulations of macros and fexprs behind the scenes, rather than whole programming styles which make pervasive use of macros or fexprs as first-class values.
Thank you, that is very helpful! And this also matches some of my growing ambivalence with macros.
When I was building Wart[1], I repeatedly ran into situations that seemed very difficult to debug. Looking back, I tend to debug by simulating computation in my head, and macros made it easy to create code that was more complicated than I could visualize[2]. When I built Mu[3] I hoped that backing away from first-class macros and pervasive support for traces[4] would help. That's still in early days, but the evidence is quite ambivalent so far.
I think macros can get pretty complicated fast. I think in some ways it's at least as hard to write well polished macros (with good error messages, editor support, debugging steppers, pretty printers, and so on) as it is to build a well polished language without macros. I think the reason it feels easy to write macros in many languages is because many languages don't set a high bar for a polished experience.
I don't think taking macros out is an answer I'm very interested in, though. I want people to be able to use my stuff without having to write code the same way I do, and vice versa. Some kind of syntactic extensibility is important to me.
And I don't think taking macros out even really eliminates the "tied to the front end" non-compositionality issues. Every abstraction that can be used has a user experience, and macros just take more control of that experience than functions do. Every time a programmer tries to report a friendly error message from a function, they're trying to take some control over that experience, but the capabilities of a function only give them so much information to work with.
A language without macros can supply a one-size-fits-all kind of polished experience, which may indeed be a much better experience than what most macros provide in practice. But a language like that also imposes a practical limit on how much more polished it can get, especially for domain-specific applications that don't have the attention of the language developers.
To be clear I still use macros. I'm skeptical of the benefit of fexprs, but it totally makes sense to support them as a sharp-edged, high-power capability intended to be used rarely.
There's not much use I have for fexprs. I just had certain ideas about how they fit in with other concepts and how they could be made compatible with a compilation-favoring workflow.
I consider it basically a mistake to use the same syntax for macro calls and function calls... but not because they're different things. When we want to invoke a function like a macro, we can just conceptualize it as a macro that expands into a function call.
I think it's the same for fexprs. Macro calls can more or less expand into fexpr calls by simply preserving all the code they received under a `quote` in their expansion result. This doesn't quite get us a lexical-scope-respecting version of fexprs unless we also capture the local lexical environment, and that's not necessarily possible depending on the macro system, but it's not much of a stretch:
- Racket's macroexpander keeps track of the set of local variables in scope, but it doesn't quite expose it to user-defined macros.
- Arc's macroexpander `ac` keeps track of the set `env` of Arc local variables in scope, but it doesn't expose it to user-defined macros.
- In Guile, the local environment can be captured using (procedure-environment (lambda () '())).
- Fexpress currently lets compilation-friendly fexprs do this using the `depends-on-env?` field of a `compilation-result?` data structure. If a subexpression depends on the lexical environment this way, (fexpress-clambda ...) forms surrounding that subexpression expand differently so that they create a run-time representation of the lexical scope. This way, no one step in the code has to traverse or build up the whole environment, so the cost is spread out. Since Fexpress's variables are immutable, variable accesses don't even have to go through this run-time environment; they can be Racket variable accesses.
Anyhow, I basically consider fexprs to be one of the things a macro-capable language is theoretically capable of, even if people don't commonly prefer to use that functionality and macro systems don't always quite offer it.
I've been interested in exploring the concept of typed macros in general for the purposes of designing module systems which have both macros and typed API signatures. And I've had typed fexprs on my mind as an optimization approach since the Eight thread way back when, and I expect these two trains of thought to be on their way to the same destination.
With Fexpress, I explored the typed fexpr side, and I kept a really narrow focus on types for optimization rather than API boundary delineation so that I wouldn't make it any more complicated than it had to be. I don't want to be like "oh, by the way, in this complex type system for macros, you can find fexprs if you look for them," because some people might recoil at that. :-p And if I said something like that, but I didn't actually have an fexprs-by-default language ready to show, that would be quite a tease.
The actual languages I build with these ideas will probably not have fexpr calls as the default behavior for calling a local variable. Personally, I prefer not to have any default; it's not too much work to write out `funcall` unless I have a whole DSL's worth of local variables I'm invoking. But if it's a typed language, a lot of the local variables will probably have Lisp-1-style behavior anyway, because a variable of function type could be known to have function-call-like macro behavior and so on. And if some of the types people use turn out to declare that their variables have fexpr-like call behavior, I'll consider that to be an interesting outcome. The types, if they do any kind of soundness enforcement (unlike Fexpress's unsound hints), can keep those fexprs from sneaking into other people's code unless they're ready to receive them.
I know API enforcement wouldn't necessarily be your favorite feature in those designs. :-p But I think that's basically the picture of where Fexpress-style fexprs slot into my long-term goals, basically as one part of the possibility space that macros have more room to explore with the help of a type system.
If I understand right, I think those are supposed to be errors still. The documentation for `expand-user-path` says "In addition, on Unix and Mac OS, a leading ~ is treated as user’s home directory and expanded[...]" but it doesn't say there's any corresponding behavior on Windows.
Ah, interesting. That's a good point that it's Unix/MacOS-specific. My hope was that it should make Arc interpret paths the same way the default shell on the system does. On Linux, ~/ is the home directory of the current user, so it makes sense that Anarki would work that way on Linux. But Windows doesn't work that way, so it makes sense Anarki wouldn't either.
I don't know of any similar path tricks on Windows, But, if this change hasn't broken anything on Windows, I would be satisfied with that. Can you verify it still works to open an existing file? Are there any other special ways to refer to files or directories on Windows?
"On Linux, ~/ is the home directory of the current user, so it makes sense that Anarki would work that way on Linux. But Windows doesn't work that way, so it makes sense Anarki wouldn't either."
Yeah, I don't know much about what a Windows user would expect ~ to do. I would say `expand-user-path` leaving ~ alone on Windows is probably as good a behavior as any. It coincides with Command Prompt, where ~ just refers to a file or folder named "~". In PowerShell, ~ seems to be expanded to C:\Users\[username]\ somehow, so there's potentially an alternative design there.
---
"I don't know of any similar path tricks on Windows, But, if this change hasn't broken anything on Windows, I would be satisfied with that. Can you verify it still works to open an existing file?"
On Windows 10 64-bit with Racket 8.1, I've at least run the tests.arc unit tests, the unit-test.arc/tests.arc unit tests, and build-web-help.arc, and they seem to work.
---
"Are there any other special ways to refer to files or directories on Windows?"
I hardly know where to begin learning about and testing those features, and the documentation makes it look like Racket has explored hat rabbit hole pretty thoroughly already, so I'm inclined to suggest we just piggyback on Racket's work here.
At one point, years ago, I relied on every lambda returning a newly allocated object. I think I brought it up as a bug in Rainbow that the [] expressions in my Arc code were generating the same lambda-lifted result instead of creating unique objects each time.
But I think this was never something I could rely on in the original, Racket-based Arc either. The Racket docs don't specify whether a lambda expression returns a new object or reuses an existing one, and in fact they explicitly allow for the possibility of reusing one:
"Similarly, evaluation of a lambda form typically generates a new procedure object, but it may re-use a procedure object previously generated by the same source lambda form."
That sentence has been in the reference since the earliest versions I can find online:
I'm not sure if the bug you're referring to had to do with making the same mistake I was making back then, or if you're talking about making the opposite assumption (expecting two procedures to be equal), but it's probably best not to expect stability either way.
I was doing a deep-`iso` comparison I called `same`, because in Arc, hash tables are not the same. In Arc3.1:
arc> (iso (obj) (obj))
nil
arc> (iso (obj 1 2) (obj 1 2))
nil
So I made a function that iterated over the key/value pairs and compared them. In some unit tests, I used that on a hash table that contained functions. Yesterday, I refactored the unit tests to instead make sure the keys were right.
Interestingly, it seems that Anarki can compare hash tables:
arc> (iso (obj) (obj))
't
arc> (iso (obj 1 2) (obj 1 2))
't
Anyway, this should fix the unit test tests in the Anarki repo.
There are a number of system calls like (system "rm ...") in news.arc and other places. These run shell commands, and most of them only work on a POSIX shell. I'm sorry to say, I'm not sure anyone has ever gone through all these calls and made them portable so that news.arc can run on Windows.
If you use the Anarki master branch, I think a few of those calls have been replaced with more portable code, but not all of them. I've at least been able to get a server started up and showing content before.
I used to use Arc on Windows quite a bit, and I tried the web server a couple of times, but I never really used the web server. Nowadays I still use Windows, but I do a lot of my programming projects inside a Linux Mint VM using VirtualBox. I think running news.arc inside a VM like that would be one of the easier ways to get it working. Of course, learning to set up a VM like that could be a lot to figure out, but at least that's something a lot of people out there have already written up tutorials for.
Another possibility is to look for one of the many Hacker News clones that people have whipped up in other languages. You probably won't get to use Arc that way, but some of those clones may be better architected and better maintained than news.arc.
I've brought up an idea a couple of times here: If (quasiquote ...) and (unquote ...) match up and nest like parentheses, then what if we generalize that analogy to higher dimensions?
After years of going back to the drawing board over and over, I finally have a working, reasonably elegant (if slow), and most importantly, documented implementation of hypersnippets and hyperbrackets in Punctaffy. I'm really excited to be able to give examples of how these can be used for things other than quasiquotation.
The "Introduction to Punctaffy" section introduces the idea of degree-2 hyperbrackets. There are precious few examples where degree-3 hyperbrackets would come in handy (much less higher-degree ones), but the infrastructure is working for those as well.
There are still a number of directions I hope Punctaffy will take from here, but it's finally past the "sorry, I haven't written up good documentation, so I don't know how to explain" stage that it's been stuck in for quite a while. :) If you're interested, I've written up some in-depth discussion of those future ideas in the "Potential Applications" sections. One of those (the interaction between 'unsyntax and ellipses in the Racket syntax template DSL) would even be a practical application of degree-3 hyperbrackets.
This documentation totals to something like 30,000 words. Most of those words are the reference documentation of Punctaffy's hypersnippet-shaped data structures (hypertees and hypernests). Don't worry about reading through all of it, but I'd be very happy to see if anyone could make it through the introduction. After 4 years of not having much to show people, I really hope that introduction will help. :)
There are a number of factors going into what makes it slow (and what might make it faster). I'm particularly hopeful about points 4 and 5 here, basically the potential to design data structures which pay substantially better attention to avoiding redundant memory use.
By the way, thanks for asking! I'm reluctant to talk about this in depth in the documentation because I don't want to get people's hopes up; I don't know if these optimizations will actually help, and even if they do, I don't know how soon I'm going to be able to work on building them. Still, it's something I've been thinking about.
1. Contracts on the whole
Punctaffy does a lot of redundant contract checking. I have a compile-time constant that turns off contract checking[1], and turning it off gives a time reduction in the unit tests of about 70%, reducing a time like 1h20m to something more like 20m. That's still pretty slow, but since this is a quick fix for most of the issue, it's very tempting to just publish a contractless variation of the library for people who want to live on the edge.
2. Whether the contracts trust the library
Currently, all the contracts are written as though they're for documentation, where they describe the inputs and the outputs. This constrains not just that the library is being used with the correct inputs but that it's producing the correct outputs. Unless the library is in the process of being debugged, these contracts can be turned off.
(Incidentally, I do have a compile-time constant that turns on some far more pervasive contract-checking within Punctaffy to help isolate bugs.[2] I'll probably want it to control these output-verifying contracts as well.)
3. The contract `hypernest/c`
One of the most fixable aspects here is that my `hypernest/c` contract checks a hypernest's entire structure every time, verifying that all the different branches zip together. If I verified this at the time a hypernest was constructed, the `hypernest/c` contract could just take it for granted that every hypernest was well-formed.
4. Avoidable higher-dimensional redundancy in the data structure
Of course, even without contracts, 20 minutes is still a pretty long time to wait for some simple tests to compile. I don't want to imagine the time it would take to compile a project that made extensive use of Punctaffy. So what's the remaining issue?
Well, one clue is a previous implementation of hypersnippets I wrote before I refactored it and polished it up. This old implementation represented hypertees not as trees that corresponded to the nesting structure, but as plain old lists of brackets. Every operation on these was implemented in terms of hyperstacks, and while this almost imperative style worked, it didn't give me confidence in the design. This old implementation isn't hooked up to all the tests, but it's hooked up to some tests that correspond to ones that take about 3 minutes to run on the polished-up implementation. On the old list-of-brackets implementation, they take about 7 seconds.
I think there's a reason for that. When I represent a hole in a hypertee, the shape of the hole itself is a hypertee, and the syntax beyond each of the holes of that hole is a hypertee that fits there. A hypertee fits in a hole if its low-degree holes match up. That means that in the tree representation, I have some redundancy: Certain holes are in multiple places that have to match up. Once we're dealing with, say, degree-3 hypertees, which can have degree-2 hypertees for holes, which can have degree-1 hypertees for holes, which have a degree-0 hypertee for a hole, the duplication compounds on itself. The data structure is using too much space, and traversing that space is taking too much time.
I think switching back to using lists of brackets and traversing them with hyperstacks every time will do a lot to help here.
But I have other ideas...
5. Avoidable copying of the data structure
Most snippets could be views, carrying an index into some other snippet's list of brackets rather than carrying a whole new list of brackets of their own. In particular, since Punctaffy's main use is for parsing hyperbracketed code, most snippets will probably be views over a programmer's hand-written code.
6. The contract `snippet-sys-unlabeled-shape/c`
There's also another opportunity that might pay off a little. Several of Punctaffy's operations expect values of the form "snippet with holes that contain only trivial values," using the `snippet-sys-unlabeled-shape/c` contract combinator to express this. It would probably be easy for each snippet to carry some precomputed information saying what its least degree of nontrivial-value-carrying hole is (if any). That would save a traversal every time this contract was checked.
This idea gets into territory that makes some more noticeable compromises to conceptual simplicity for the sake of performance. Now a snippet system would have a new dedicated method for computing this particular information. While that would help people implement efficient snippet systems, it might intimidate people who find snippet systems to be complicated enough already.
It's not that much more complicated, so I suspect it's worth it. But if it turns out this optimization doesn't pay off very well, or if the other techniques already bring Punctaffy's performance to a good enough level, it might not turn out to be a great tradeoff.
I want to think more about this and continue the conversation, but I'm worried the reply window will close first.
Since I presume your input space is relatively small (the AST of a program, which usually only has a few thousand nodes), it sounds like you have some sort of state-space explosion.
Your comment about recursive matching of hypertees sounds like the biggest problem.
Just a shot in the dark (having not studied what you're doing yet), but is there any chance you could use partial-order reduction, memoization, backtracking, etc. to reduce the state-space explosion?
I could be wrong, but most of the other optimizations sounded like they address constant factors, like contract checking.
But then I don't know much about how contracts work; I guess the verification logic could be rather involved itself.
If the window closes, maybe we could continue at #arc-language:matrix.org