Even if you merge `writefile` and `save-table` like this, but not `readfile1` and `read-table`, then people still need to know, at development time, what type of data is in the file in order to read it, so they might as well use a type-specific way to write it as well. Unfortunately, merging `readfile1` and `read-table` isn't really possible, since their serialized representations overlap; they can't reconstitute information that was never written to the file to begin with.
From a bigger-picture point of view, this seems like it would become a non-issue once Arc had its own reader. I assume the problem with reading tables using `read` is that Racket's reader constructs immutable hashes. An Arc-specific reader would naturally construct Arc's mutable tables instead.
Doesn't Racket's reader give us similar problems in that it reads immutable strings and cons cells, too? So these problems could all be approached as a single project.
In the short term, it's not a project that would need a whole new reader. It could just be an adaptation of Racket's existing reader... something like this:
(define (correcting-arc-read in)
(let loop ([result (read in)])
; TODO: See if this should construct a Racket mutable cons cell
; instead (`mcons`). Right now this just creates an immutable
; one, which should be fine since Arc uses an unsafe technique
; to mutate those.
[(cons a b) (cons (loop a) (loop b))]
[ (? hash?)
(match-lambda [(cons k v) (cons (loop k) (loop v))])
[ (? string?)
; We construct a new mutable string with the same content as
(substring result 0)]
; We handle tagged values, which are represented as mutable
; Racket vectors.
[(? vector?) (list->vector (map loop (vector->list result)))]
; We handle various atomic values. (TODO: Add more of these
; cases until we've accounted for every writable type Arc
; supports. Alternatively, just make this a catch-all
; `[_ result]`.)
[(? number?) result]
[(? symbol?) result])))
Writing Arc's `queue` type might be tricky, since that representation relies on sharing. It's possible queues (and other tagged values in general) should have a customized read and write behavior.
> I assume the problem with reading tables using `read` is that Racket's reader constructs immutable hashes.
Racket also has mutable hashes created using `make-hash` rather than `hash`. It could just be that the tables are not serialised as something Racket reads as mutable hashes when deserialising it back again?
"It could just be that the tables are not serialised as something Racket reads as mutable hashes when deserialising it back again?"
I'm pretty sure the Racket reader never reads a mutable hash, but that it's possible for a custom reader extension to do it.
Some of Racket's approach to module encapsulation depends on syntax objects being deeply immutable. In particular, a module can export a macro that expands to (set! my-private-module-binding 20) but which uses `syntax-protect` so that the client of that macro can't use the `my-private-module-binding` identifier for any other purpose. If the lists constituting a program's syntax were usually mutable, then it would be hard to stop the client from just mutating that expansion to make something like (list my-private-module-binding 20), giving it access to bindings that were meant to be private.
I think this is why Racket's `read-syntax` creates immutable data. As for why `read` does it too, I think it's just a case of `read` being a relatively unused feature in Racket. They don't have many use cases for `read` that they wouldn't rather use `read-syntax` for, so they don't usually have reasons for the behavior of `read` to diverge from the behavior of `read-syntax`.
All this being said, they could pretty easily add built-in syntaxes for mutable hashes, but I think it just hasn't come up. Who ever really wants to read a mutable value? In Racket, where immutable values are well-supported by the core libraries, you aren't gonna need it. In the rare case you want it, it's easy enough to do a deep traversal to build the mutable structure you're looking for (like my example `correcting-arc-read` does).
It only comes up as a particular problem in Arc. Arc's language design doesn't account for the existence of immutable values at all, so working around them when they appear can be a bit quirky.
The Racket reader reads a seven-element list there, not a mutable hash.
Looks like you're proposing to use a two-stage serialization format. One stage is `read` and `write`, and the other is `serialize` and `deserialize`. At the level of language design, what's the point of designing it this way? (Why does Racket design it this way, anyway?)
I can see not wanting to have cyclic syntax or syntax-with-sharing in the core language just because it's a pain in the neck to parse and another pain in the neck to interpret. Maybe that's reason enough to have a separate `racket/serialize` library.
But isn't the main issue here that Arc's `read` creates immutable tables when the rest of the language only deals with mutable ones? The mutable tables go through `write` just fine, but `read` doesn't read the same kind of value that was written out. If and when this situation is improved, I don't see where `racket/serialize` would come into play.
Hmm, all right. I didn't want to believe that was the point you were trying to make. In that case, I think I must not have conveyed anything very clearly in the `correcting-arc-read` comment.
I've been trying to respond to this, which was your response to that one:
"Racket also has mutable hashes created using `make-hash` rather than `hash`. It could just be that the tables are not serialised as something Racket reads as mutable hashes when deserialising it back again?"
In the `correcting-arc-read` comment, I used `make-hash` in the implementation of `correcting-arc-read`, so I assumed your first sentence was for the edification of others. I found something to respond to in the second, which kinda pattern-matched to a question I had on my mind already ("Can't the Racket reader just construct a mutable table since what was written was a mutable table?").
As for my response to your "No..?"...
One of the purposes of `correcting-arc-read` is that (when it's used as a drop-in replacement for Arc's `read`) it makes `readfile1` return mutable tables. So if anyone had to be convinced that a reader that returned mutable hashes could be implemented at all in Racket, I thought I had shown that already. When it looked like you might be trying to convince me of something I had already shown, I dismissed that idea and thought you were instead trying to clarify what your proposed alternative to `correcting-arc-read` was.
Seems like I've been making bad assumptions and that as a result I've been mostly talking to myself. Sorry about that. :)
Maybe I oughta clarify some more of the content of that `correcting-arc-read` comment, but I'm not sure what parts. And do you figure there are any points you were making that I could still respond to? I'd better not try to assume what those are again. :-p
I have an old fork (https://github.com/akkartik/arc) that has an extensible generic pair of functions called serialize and unserialize which emit not just the value but also tagged with its type. read and write are built atop them.
After the first few paragraphs, this picked up and was an interesting read. That's when it became clear this article wasn't going to try to claim Lisp is supernatural, but to explain why people think of Lisp as supernatural. :)
pg: "Arc embodies a similarly unPC attitude to HTML. The predefined libraries just do everything with tables. Why? Because Arc is tuned for exploratory programming, and the W3C-approved way of doing things represents the opposite spirit.
Tables are the lists of html. The W3C doesn't like you to use tables to do more than display tabular data because then it's unclear what a table cell means. But this sort of ambiguity is not always an error. It might be an accurate reflection of the programmer's state of mind. [...]
Good cleanness is a response to constraints imposed by the problem. Bad cleanness is a response to constraints imposed from outside-- by regulations, or the expectations of powerful organizations."
Personally, I think using semantic HTML isn't that big a deal to implement, and it seems to have practical benefits in terms of accessibility. It isn't just something the W3C is trying to impose on people arbitrarily.
And yet, the built-in features of HTML and CSS have a tendency of being very arbitrary. If you want a text box, you can have it; if you want a set of radio buttons, you can have it; if you want a set of radio buttons where one of them says "Other (please specify)" and has an associated text box, you suddenly have a significant amount of code to write. If you want to style the first letter of a paragraph, you can use a ::first-letter CSS selector... but if you want to style letters other than the first, you need to wrap them in explicit HTML elements, which unfortunately has other effects you might not want (like causing screen readers to treat each letter like it's a separate word).
Sometimes, there isn't a workaround. For instance, pages have titles, which appear in the top of the browser window. Have you ever seen a page with a title and a subtitle displayed just under it? I haven't, and short of finding a security hole that makes the browser execute arbitrary code, it seems pretty clear there isn't a way to do this.
Sometimes, there is a conceivable workaround, but it requires something like building your own text layout engine from scratch and then wrangling with a lot of obscure Unicode scripts, screen-reader troubles, text selection support, etc. There are some in-browser text editors which have to make this kind of effort just to achieve syntax highlighting.
And sometimes, there's a workaround that's a lot like building your own substantial subsystem of the browser, but it's actually fairly reasonable to do in a pinch. Like, there are a bunch of front-end frameworks for writing reactive UIs. They take in something that's pretty similar to DOM nodes (sometimes even obtained by parsing DOM nodes that aren't meant to be displayed to the user), they generate actual DOM nodes that are similar to those, and they modify those generated DOM nodes on the fly as the application state changes. In certain respects, these frameworks can save a lot of work by taking advantage of the underlying features of HTML... and in certain respects, there's extra work involved in inverting HTML's abstractions to get them to support this new indirect interaction style they weren't designed for.
pg: "But the advantage of a rewritable language is more than that it lets programmers fix your mistakes. I think the best programmers tend to work by rewriting whatever language they're using. So even the perfect language, if there is such a thing, would be very rewritable. In fact, if I had to guess, I think the perfect language might be whichever one was most rewritable."
How much code does it take for someone to implement their own "rewritten" variation of some HTML or CSS feature? Well, if they can't achieve their goal without writing their own text layout engine or their own virtual DOM framework, quite a lot.
This is important to Arc because it makes the program longer.
Arc is a language designed incrementally by starting with something Lispy and then making whatever changes will shorten Arc programs. News is a program that was written to put Arc to the test; the shorter its code is, the better Arc is doing.
If News used HTML or CSS features in very picky ways, it wouldn't be a good test: As soon as News's needs strayed slightly from the HTML and CSS features that browsers had built-in, then a heap of code would need to be written to make up the difference. When a slight discrepancy in the way a measuring instrument is consulted results in a large discrepancy in the measurement, it's not a very reliable measuring instrument.
So it seems to me that of all possible programs, News was pursued because it could get by on using HTML features in non-picky ways, pretty much in the ways they were already designed to be used. The use of HTML tables and transparent gif spacers was a well-known and once-popular, if dated, technique for achieving layout that was consistent across browsers, so pg built abstractions on that technique for News.
All that being said, I personally think it's a great improvement for News to use semantic HTML tags instead of tables, and I don't think this change really does that much to the size of the codebase (does it?). I just figure these pg writings are interesting in this context.
This has me thinking about html.arc....
The way html.arc is designed involves a lot of special-casing of specific HTML tags and attributes. It's almost like a full go-between layer abstracting HTML from Arc, which suggests that with some ambitious modifications to html.arc, it could turn into a DSL that compiles to HTML in a more indirect way (perhaps performing nonlocal transformations to implement things like footnotes or column breaks). This would potentially be a good place to hone the design of the HTML built-ins so that they're more abstraction-friendly and more "rewritable" as far as Arc code is concerned.
This has a lot in common with those front-end frameworks I mentioned. They abstract over and extend the features of HTML, and in doing so, they tend to make HTML's built-ins "rewritable" by exposing a new way for programmers to define their own extensions of the same kind.
(ノಠ益ಠ)ノ彡┻━┻ NO PG LISTS ARE ALREADY THE LISTS OF HTML
... sorry. Don't know what came over me there.
>and I don't think this change really does that much to the size of the codebase (does it?).
Most of it is the result of moving existing code around, so I think it comes out about even. I don't know how much of a performance issue macro expansion is but there is less of it in the new code, and the HTML itself should be simpler without tables.
>This has me thinking about html.arc....
Racket has its own xml/html library and there is an sml.arc which I haven't played with yet that seems like it might be capable. html.arc seems to do both too much and too little... the attributes blacklist makes it difficult to have modern features like data attributes, and the more macros there are, the more polluted the global namespace becomes.
I've sometimes thought it would be nice if Arc supported css and xml grammar natively, but I have no idea what it would take to actually support that. And I'm probably the only person here who wants to just write html and css directly, rather than use s-expressions or concatenating strings.
> Tables are the lists of html. The W3C doesn't like you to use tables to do more than display tabular data because then it's unclear what a table cell means. But this sort of ambiguity is not always an error. It might be an accurate reflection of the programmer's state of mind. [...]
Note that it's traditional html tables pg is referencing. I could be wrong, but my understanding is that the 'display' properties options: 'table', 'table-row', 'table-cell' etc, were not well supported (in IE particularly) or even existent at the time pg wrote the code. So he may not believe the same now.
> The predefined libraries just do everything with tables. Why? Because Arc is tuned for exploratory programming, and the W3C-approved way of doing things represents the opposite spirit.
I don't agree. My understanding is that traditional tables are ridged, which is why they suggest you only put data into it. They are highlighting that it's, generally, not suitable for other things. Divs allow for more flexible manipulation. For example, I can create a table and then decide to break out of the table somewhere in the middle of it's content to render some component. Or I can make a table and morph it into something different by only changing it's properties via css/js.
But pg was not writing web apps. He was writing web pages and then calling a new page for, pretty much, ANY change. So from that perspective (where you can macro away server side) you can see why pg would just put it all in a table and highlight how using arc macros will let you do more compose-able things.
This came up during ar development, although I don't remember who brought it up. I think aw was about to implement it but was concerned that it would be annoying at the REPL to have all the program's bindings be looked up eagerly (aka for them to not be "hackable"). It could be annoying for mutually recursive macros, too, although those are already tricky enough that it probably wasn't a big concern at the time.
I remember recommending an upgrade: Instead of generating:
`(,my-func ,a ,b)
Generate something that preserves the late binding of mutable global variables, like this:
`((',(fn () my-func)) ,a ,b)
I seem to remember aw wasn't convinced this cruft was going to be worth whatever hygiene it gained, even after I recommended a syntactic upgrade:
`(,late.my-func ,a ,b)
Since aw values concision a lot, perhaps the issue was that most programmers would surely just write ,my-func in the hope that it would be rare (or even incorrect) to ever want to rebind it, and then other programmers would suffer from that decision.
But Pauan followed a design path like this in Nulan and/or Arc/Nu. In Pauan's languages... actually, I think I remember a couple of approaches, and I don't remember which ones were real.
One approach I think I remember is that `(my-func a b (another-func c) d) inserted mutable boxes for my-func and another-func into the result, but it would just implicitly unquote a, b, c, and d, because variables at the beginning of a list are likely to be mutable globals and variables in other positions are likely to refer to gensyms.
There might have even been an auto-gensym system in that quasiquote operator at some point.
I liked this approach a lot at the time. That was the same time I was working on Penknife. I was trying to avoid name collision with first-class namespaces in Penknife, but I still wanted macros to work, and I was trying to take a very similar approach. (I couldn't take exactly the same approach because my macroexpansion results were strings; instead, I think I allowed regions of the macroexpansion result to be annotated with a first-class namespace to use for name lookup.)
When Penknife's compile times were abysmally long, I started to realize that even if I found an optimization, this was going to be a problem with macros in general. Anyone can write an inefficient macro, and anyone who can put up with it can build a lot of stuff on top of it that other users won't be able to appreciate. So I started to require separate compilation in my language designs.
With separate compilation in mind as a factor for the language design, it no longer made sense to put unserializable values into the compiled code. Instead, in Penknife's case, I devised a system of namespace paths to replace them. The namespaces were still first-class values, but one thing you could do with a Penknife macro was get hold of its first-class definition-site namespace, so the macroexpanded code could refer to variables in distant namespaces by specifying a chain of names of macros to look them up from. And this kept things hackable, too, since you could mutate a macro's definition-time namespace explicitly (not that many programs or REPL sessions would bother to do that).
Not long after, I set a particular goal to make a language (Era) where the built-in functionality was indistinguishable from libraries. Builtins aren't hackable from within the language, so the module system needs to make it possible (and ideally easy) for people to write non-hackable libraries.
(Technically they only need to be non-hackable to people who don't know the source code, because once you know the source code, you know you're not dealing with builtins. I intend to take advantage of this to make the language hackable after all, but it's going to take essentially a theorem prover in the module system before it's useful to tell the module system you have the source code of a module, as opposed to just using that source code to compile a new module of your own behind the module system's back.)
Anyhow, this means I haven't put hackability at the forefront for a long time.
I think the embedding-first-class-values approach will work, and I think late binding is workable (by using late.my-func) and there's a workable variation of that late binding approach to enable separate compilation too (by using namespace paths made out of chains of macro names). So I like it, but I just have this stuff to recommend for it to make it really tick. :)
By the way, it's good to see you! I wondered how you were doing.
For separate compilation, it does seem clear that what gets serialized will be references like "the object [probably a function] named 'foo in module bar", and structures (s-expressions or otherwise) containing such references. Given that compilation implies macroexpansion, you do have to assume (or verify) that the macros from other modules are what they used to be—and that non-macros (used in functional position at least) are still non-macros. If you have a full-blown Makefile kind of build system, then by default I suppose every file's output depends on the contents of every other file that it uses; or, as an optimization, depends merely on the exact set and definitions of macros exposed from those files. (In the C++ system I encounter at work, code is separated into .cpp and .h files, and editing a .h file causes the recompilation of every .cpp file that recursively depends on it, but editing a .cpp file only causes its own recompilation. If you wanted to imitate that, I guess you'd put macros into a distinctively named set of files, and forbid exportable macros anywhere else.)
Thanks! I've sold out and have been working for a medium-sized company doing mostly C++ and bash (the latter is unbelievably useful) for the past 3.5 years. I make intermittent progress on the side doing other things.
This is a blog post I wrote up about a bunch of programming I've been doing for my Cene language and some related Racket libraries.
I've finally been able to make an extensible `quasiquote` built on what I call hypersnippets. I've also referred to this as "higher-quasiquotation-shaped syntax" in the past, and lately I understand that my hypersnippets are the exact same thing as what people already refer to as "opetopes" in higher category theory.
Building this kind of extensible `quasiquote` operation is a goal I've been pursuing for two years now. Now I can work on polishing up the code, sorting things out, documenting them... and, ultimately, returning to the task I was doing two years ago before I had to go out of my way to figure out hypersnippets: Doing polish and documentation for the Cene language itself.
I'm sorry it took me so long to reply to this. Basically, I don't think I've successfully explained to anyone what a hypertee is, ever. There are some places in the comments in Punctaffy where I explain them, but I haven't put in the work to make it a very well-illustrated introduction.
Lately I realized I shouldn't be representing my higher quasiquotation/hypersnippet macro system's syntax with plain old hypertees anyway, but with something I'm calling "hypernests." So I implemented hypernests... in a flawed way that didn't actually serve the purpose I expected, so now I'm in the middle of some refactoring to fix them. It's hard for me to justify talking about the things I've built in Punctaffy when I know I can explain and motivate the topic a lot better once I have a working macro system to show for it.
It never seems like it should be that much work to just take out a piece of paper and draw up some diagrams for a blog post... but whenever I get started trying to explain like that, I usually realize I've been doing certain things wrong and need to refactor.
I hope to have something soon. I've written up a lot more unit tests, and the latest refactoring of the hypernest implementation is becoming as simple as I always hoped this kind of thing could be; the primary risk I anticipate is that it'll fall into infinite loops. I technically already have a working macro system for extensible `quasiquote` which could serve as a demo, but I'm pretty sure it breaks for operators of higher dimension than `quasiquote`, and that's what my refactoring is going to fix.
If it helps to write a very short explanation:
In quasiquotation syntax, the unquote operation is like a 1-dimensional closing bracket, just as the closing parenthesis is a 0-dimensional closing bracket. See, the unquoted part of the code starts at one (0-dimensional) location in the text and stops at another, so it's like a line segment. We can imagine 2-dimensional closing brackets which are shaped like quasiquotations, and so on.
The 1-dimensional closing bracket actually begins with a 0-dimensional bracket that opens a 1-dimensional region that must be closed by another 0-dimensional closing bracket.
The whole thing is the 1-dimensional closing bracket.
The parenthesis at the end is the 0-dimensional bracket that closes it.
If we write a single opening bracket, including all the closing brackets it needs, and all the closing brackets those closing brackets need, etc., I'm pretty sure we have an opetopic shape as used in higher category theory: The closing brackets are the various-dimensional source cells of the opetope.
If we label each of the closing brackets (of every dimension) of the single opening bracket with a data value -- or from another point of view, put an "unquoted expression" into every hole of our higher-quasiquotation-shaped syntax, then that's what I call a hypertee.
If we have a syntax with closing brackets and (nestable) opening brackets, and we put labels on all the closing brackets of the outermost opening bracket (labels which we can think of as "unquoted expressions") and labels on all the nested opening brackets (labels which we can think of as "operators" or "macro names" which apply to those opening brackets' contents), then that's what I call a hypernest.
Closing brackets have to be of a dimension strictly lower than the bracket they're closing. That's different from opening brackets; we can nest a high-dimensional opening bracket inside of a low-dimensional one.
Nesting opening brackets are pretty exotic if you consider them from the geometric standpoint of opetopes -- how does it make sense to have a low-dimensional shape with high-dimensional faces on it? -- but it's necessary for Punctaffy's syntax purposes. That's because we need to be able to write a quasiquotation operator of some specific dimension N that can quote any operator in the language, including those of dimension N or greater. This is why I've needed to move to hypernests for syntax lately, even though I spent a lot of time thinking I could get by with hypertees.
No, I was forgetting that ac is considered a compiler :) But it compiles the Arc codebase every single time Arc starts up. Maybe we should start memoizing its outputs to disk somehow, see if that makes it noticeably faster. My suspicion is egregious runtime processing like ar-denil-last and ar-apply-args will cause it to not make a difference.
Given the pervasiveness with which Arc has to make such transforms at runtime, I've gotten into the habit of thinking of it effectively as an interpreter.
Tweets can be missed. Perhaps we should email firstname.lastname@example.org.
I've been reluctant to do this, because the outcome may well be, "wait, is this old site really still up? Let's just take it down." :) Don't mind li'l ol' us out here, we're no trouble, no trouble at all..
edit: I'm ok if they take it down. I'll know where to go via the anarki wiki (if someone updates it). And it may actually be better if they do take it down IMHO as it will force everyone to find a place with more control over the setup.
I think if we fork the community site to run on anarki. which I think is more likely than being given control over the Arc Forum, we should consider ways to archive and bring forward all of the stuff on the existing arc forum. It shouldn't be too hard to crawl the forum, though I think there might be some DoS prevention that would slow it down.
There are several more things I should explain about this:
===== What information a help entry is based on =====
Sometimes, some help information is populated while other information isn't available. For instance, right now `list` has everything help.arc is designed to display except for a docstring:
arc> (help list)
[fn] (list . args)
list is not documented.
arc> (list 1 2 3)
(1 2 3)
arc> (list "a" '(1 2) 3)
(a (1 2) 3)
arc> (src list)
(def list args args)
It has a signature ('args), an implementation definer ('def), an implementation body ('args), a source file ("arc.arc"), and examples ('((list 1 2 3) (1 2 3) (list "a" '(1 2) 3) ("a" (1 2) 3))), and of course a value that has a type ('fn), but it has no docstring.
When the code in build-web-help.arc generates the HTML page, it determines whether to display a help entry purely by whether it has a docstring or not. I figure this is a good way to distinguish between things that are interesting to read about and things that are idiosyncratic helper functions, but one consequence is that there is no entry displayed for `list` right now.
===== Broken links =====
When the documentation refers to another entry by [[foo]], it's converted to an link to the relevant entry on the page. If there is no entry by that name on the page, it's instead converted to a span of style "broken-link", which shows up in red. If you'd like to fill in gaps in the documentation, you can view the source of the page to find all the occurrences of "broken-link".
===== Security of repository access privileges =====
The script pushes to Anarki's gh-pages repo using a personal access token for my GitHub machine user, rocketniabot. I just created rocketniabot for this purpose. The token is limited to pushing to rocketniabot's public repos, and right now the only public repo rocketniabot has access to is Anarki. If this changes in the future, I might want to ask someone else if they can use an access token. (And if no one wants to volunteer one, it's not the end of the world; we'll just stop having automatic pushes to the `gh-pages` branch until it's fixed again.)
The token is not committed to the repo; it's set up in the Travis CI settings as a so-called "encrypted environment variable," which is only exposed during a non-pull-request build or a pull request build that comes from another branch in the same repo. I believe this prevents non-contributors from accessing the token.
Although people could make rocketniabot look bad by having it push abusive content to the `gh-pages` branch, they would first have to either push a build script that exposes the token or push content like that to the `master` branch themselves.
If we notice either of those kinds of abuse occurring, I recommend we take two actions:
- Remove that contributor from the Anarki project on GitHub so they can't keep doing this.
- Please let me know so I can revoke the compromised rocketniabot access token.
- Until I do that, remove the rocketniabot contributor from the Anarki project.
- If you'd like to set up automatic `gh-pages` pushes again and I'm not responding to messages, then I recommend you choose another user account who's willing to be responsible for the automatic pushes, have that account set up a personal access token with public repo access, and put that token in the Travis CI configuration instead. (If you do this, please change the variable name so it's not "ROCKETNIABOT_GH_TOKEN". I think we need to keep track of whose it is so we can notify the right person when it's compromised.)
===== Security of the website's client-side data =====
If we ever have any page on the arclanguage.github.io domain store client-side data (like localStorage entries or cookies), someone could potentially access this information and take advantage of it before they're removed from the project. Because of this, I recommend we don't store any client-side data on that domain.