Arc Forum | Ask AF: Add curly brace table syntax to arc?

7 points by kinnard 2360 days ago | 18 comments

Could a curly brace syntax for tables be added to anarki?

Ex:

  arc> (= user {name "John Doe" age 23 id 73881})
  #hash((age . 23) (id . 73881) (name . "John Doe"))

2 points by rocketnia 2347 days ago | link

This is what I think would be a great way to enter and print tables at the REPL:

  arc> (ob (v name "John Doe") (v age 23) (v id 73881))
  (ob (v age 23) (v id 73881) (v name "John Doe"))

And if they must be compatible with `read` and `write`, I think this would be a great way to render them for that:

  (##ob (v name "John Doe") (v age 23) (v id 73881))

This way it's just about as easy to refactor between `(##ob (v k1 ,v1)) and (ob (v k1 v1)) as it is to refactor between `(,a ,b ,c) and (list a b c).

(The v here is for "value." An alternate syntax, (kv ...), could be used for entries where the key isn't quoted.)

(Note that (##ob ...) here is a reader macro call. I'm using a design for reader macros that puts the macro name on the inside of a parenthesis, rather than the approach taken by things like Racket's #hash(...). That way, reader macro names can be descriptive without pushing the indentation far to the right.)

This approach generalizes to just about any other data structure we want, such as graphs, queues, sorted sets, etc. We don't have to pick out new parentheses for each one, and we don't have to specify idiosyncratic indentation rules either, so pretty printing at the REPL can be very nice automatically:

  (ob
    (v name "John Doe")
    (v age 23)
    (v id 73881))
  
  (ob
    (v name
      "John Doe")
    (v age
      23)
    (v id
      73881))

In terms of Racket implementation, it should be pretty easy to get most of this working using `gen:custom-write`, `make-constructor-style-printer`, and `pretty-print`. Racket's pretty-printer will probably give us results I find slightly less satisfying, but it's a start:

  (ob
   (v name "John Doe")
   (v age 23)
   (v id 73881))
  
  (ob
   (v
    name
    "John Doe")
   (v
    age
    23)
   (v
    id
    73881))

There are only a few other tricky parts:

- We may have to represent Arc tables as their own data structure, rather than directly as Racket hashes, so that they print nicely even when they're nested inside other Racket data structures like lists and vectors. This is one distinct place where, for the best possible Racket interop, we may need to avoid representing Arc values the same way as Racket ones. Then again, I think `port-print-handler` might provide the ability to print parts of Racket values using the Arc style, so it could be possible to get very nice interop here.

- In order to get (##foo to be processed as a call to an Arc reader macro called "foo", we would need to replace the Racket ( readtable entry with an entry that behaved the same as it does in non-## cases. Racket's ( syntax isn't as simple as it might seem, as I found out when I wrote a custom open parenthesis for Parendown, and I would be glad to copy out some of my Parendown code to make this work.

- Of course, it would take some design work to decide on Arc-side interfaces for defining things like reader macros, custom write behaviors, and maybe even custom REPL pretty print behaviors and custom quasiquotation behaviors (to determine where unquotes can go). In Racket, customization of the `write` or `print` behavior is usually done in a per-value-type way using `gen:custom-write`, but I think it would be better to associate them with the "current writer" or "current printer" somehow, just as the reader and macroexpander use the "current readtable" and the "current namespace." That would allow us to swap out the writer at the same time as we swap out the reader, rather than letting the `read` and `write` behavior get out of sync. Essentially, I would store all these things in the Arc namespace.

---

Would it be much trouble if I started working toward some of these things for Anarki or Amacx? If I do work on this, which things would need my help the most or would make the best milestones? Honestly, my top priorities right now are Punctaffy and Cene, so even though I can express opinions about Arc, I might not allocate the time to follow through on them myself. (My desire not to burden people with something that I think of as being only in a half-finished state has always been one of the reasons I commit so rarely to Anarki.)

I know the reader syntax for tables bears very little resemblance to the curly brackets people have been talking about here, and I don't want to trample on that. Maybe tables can `write` with curly brackets while other things tend to use this more general-purpose style.

shawn, are you currently trying to write a full pretty-printer for Arc values from scratch just so Racket hashes can be written using curly brackets? Are you using `port-print-handler` or something? That's another thing I'd rather not trample on if you have an idea underway.

-----

2 points by aw 2347 days ago | link

> Would it be much trouble if I started working toward some of these things for Anarki or Amacx?

My aspiration for Amacx is that it becomes a framework that allows you to create the language you want to create. By analogy, if you're writing a compiler and you'd find LLVM useful, you can use LLVM as part of your toolchain.

Thus, if you (or someone) wanted to create a particular reader and printer syntax for tables (whether ##ob and v or something else), then you certainly should be able to do that.

I have both an Arc reader and an Arc printer written in Arc, but they're not yet included in Amacx because they're currently too slow. Working on the reader and printer makes the most sense, I think, after finishing my current work on source location tracking (assuming that works out), both because with a profiler it will be easier to see how to speed up the implementation, and because the reader will need to support source location tracking itself.

There are a lot of "if"s here, but in the happy scenario that everything works out, then hopefully adding ##ob and v (or whatever someone wants) will be easy: just add a few lines of Arc code :-)

-----

2 points by i4cu 2347 days ago | link

Personally, I don't think this is going to make the language more attractive. You've traded better printing for more verbose code.

  current-arc> (obj name "John Doe" age 23 id 73881)

  your-arc> (ob (v name "John Doe") (v age 23) (v id 73881))

maybe?:

  alt-arc> (ob name "John Doe" age 23 id 73881)

returns (Assuming you're attempting to have ordered tables?):

  (ob (v name "John Doe") (v age 23) (v id 73881))

-----

2 points by rocketnia 2347 days ago | link

(I hope you don't mind if I change my mind and use `object` instead of `ob`. I just remembered `ob` is a pretty good local variable name for object values.)

Code could still use `obj`, even in the reader. These two things could be parsed as the same value:

  (##obj name "John Doe" age 23 id 73881)
  (##object (v name "John Doe") (v age 23) (v id 73881))

The reason I suggest interspersing extra brackets and v's, when the concise `obj` already exists, is to avoid idiosyncrasies of pretty-printing `obj` for larger examples.

Here's an example of how a nested table prints in the latest Anarki:

  arc> coerce*
  '#hash((bytes . #hash((string . #<procedure:...ne/anarki/ac.rkt:1128:21>)))
         (char
          .
          #hash((int . #<procedure:integer->char>)
                (num . #<procedure:...ne/anarki/ac.rkt:1133:21>)))
         (cons
          .
          #hash((queue . #<procedure:...t/private/kw.rkt:592:14>)
                (string . #<procedure:...ne/anarki/ac.rkt:1127:21>)
                (sym . #<procedure:...t/private/kw.rkt:592:14>)
                (table . #<procedure:...t/private/kw.rkt:592:14>)))
         (fn
  ...

As a human who can easily apply idiosyncratic rules, here's how I'd probably lay that out if I could only use (##obj ...):

  arc> coerce*
  '(##obj
     
     bytes (##obj string #<procedure:...ne/anarki/ac.rkt:1128:21>)
     
     char
     (##obj
       int #<procedure:integer->char>
       num #<procedure:...ne/anarki/ac.rkt:1133:21>)
     
     cons
     (##obj
       queue #<procedure:...t/private/kw.rkt:592:14>
       string #<procedure:...ne/anarki/ac.rkt:1127:21>
       sym #<procedure:...t/private/kw.rkt:592:14>
       table #<procedure:...t/private/kw.rkt:592:14>)
     
     fn
  ...

There are several idiosyncrasies in action there: I'm choosing not to indent values by the length of their keys, I'm choosing not to indent them further than their keys at all (or vice versa), I am grouping them on the same line when I can, and I'm putting in padding lines between every entry just because some of the keys and values are on separate lines.

Oh, and I'm not indenting things by the length of the "##obj" operation itself, just by two spaces in every case, but that's a more general rule I go by.

As far as Lisp code in general is concerned, those seem like personal preferences. I don't expect anyone to indent this quite the same way. Maybe people could take a shot at it and see if a consensus emerges here. :)

Now suppose I could only use `##object`:

  arc> coerce*
  '(##object
     (v bytes (##object (v string #<procedure:...ne/anarki/ac.rkt:1128:21>)))
     (v char
       (##object
         (v int #<procedure:integer->char>)
         (v num #<procedure:...ne/anarki/ac.rkt:1133:21>)))
     (v cons
       (##object
         (v queue #<procedure:...t/private/kw.rkt:592:14>)
         (v string #<procedure:...ne/anarki/ac.rkt:1127:21>)
         (v sym #<procedure:...t/private/kw.rkt:592:14>)
         (v table #<procedure:...t/private/kw.rkt:592:14>)))
     (v fn
  ...

This saves some lines by not needing whitespace to group keys with their objects. In even larger examples it can cost some lines since it introduces twice as much indentation at every level, so that might be a wash. What really makes a difference here is that all those pairs of parentheses can be pretty-printed just like function calls, so things that process the "##object" syntax don't need to make special considerations for pretty-printing it.

---

"Assuming you're attempting to have ordered tables"

In this thread, the original post's example used unordered tables. It doesn't matter to this design. Ordered tables and unordered tables can coexist with different ## names.

-----

6 points by shawn 2360 days ago | link

Done: https://github.com/arclanguage/anarki/pull/141

[foo bar] is now read as (%brackets foo bar) and {a 1 b 2} is read as (%braces a 1 b 2)

The default implementation of %braces is the obj macro, so your example works exactly as written above.

I've merged this into anarki. If this causes unexpected breakage, please let me know.
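
The reader change described above can be sketched in a few lines. This is only an illustration of the idea, assuming that %braces is an ordinary macro that can be defined in Arc; the definition below is not copied from the pull request.

```arc
; Illustrative sketch, not the actual Anarki source.
; Assumption: the reader turns {a 1 b 2} into the form (%braces a 1 b 2).
; Expanding %braces into obj then builds a table with unevaluated keys.
(mac %braces args
  `(obj ,@args))

; With that in place, the original post's example builds a table:
(= user {name "John Doe" age 23 id 73881})

user!name   ; looks up the key name in the table
```

Because the braces become a normal macro call, the meaning of {...} could presumably be changed by redefining %braces, without touching the reader again.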

-----

4 points by kinnard 2360 days ago | link

Wow. Freaky fast. Thanks! I was thinking of going even further and finding out if arc could output tables and read in tables in that structure?

  {todo:({id 1 name "get eggs" done nil} {id 8 name "fix computer" done t})}

looks so much better than the #hash() equivalent and this gets extreme with nested tables. It's also much easier to think through a table structure writing it out.

-----

4 points by shawn 2360 days ago | link

I switched `(write ...)` to `(pretty-print ...)` for repl values. https://github.com/arclanguage/anarki/pull/142

Let me know if that seems sufficient for now.

You're right that Arc still can't read tables written via `write`. That is definitely worth supporting. Here is an example of how it could work: https://github.com/sctb/lumen/blob/55b14ca8aafeaf6b0ca1b636d...

It would be important to ensure that circular structures don't cause an infinite loop, and I'd be nervous about straying too far from Racket's `write` facility. For better or worse, it's a limitation of racket that you can't `read` a table you've written. But it could be worth doing.

-----

3 points by rocketnia 2359 days ago | link

"it's a limitation of racket that you can't `read` a table you've written"

Eh? You definitely can, in Racket. :) The only problem I know of is that when you write a mutable table, you read back an immutable one.

---

"You're right that Arc still can't read tables written via `write`."

Arc's supported this for as long as I can remember. The only problem is that they come back immutable.

Here's Arc 3.2 on Racket 7.0:

  arc> (= foo (fromstring (tostring:write:obj a 1 b 2) (read)))
  #hash((a . 1) (b . 2))
  arc> (= foo!a 3)
  Error: "hash-set!: contract violation\n  expected: (and/c hash? (not/c immutable?))\n  given: '#hash((a . 1) (b . 2))\n  argument position: 1st\n  other arguments...:\n   'a\n   3"

And here's the latest Anarki:

  arc> (= foo (fromstring (tostring:write:obj a 1 b 2) (read)))
  '#hash((a . 1) (b . 2))
  
  arc> (= foo!a 3)
  3
  
  arc> foo
  '#hash((a . 3) (b . 2))

Oh, I guess it works!

It looks like that's thanks to a March 8, 2012 commit by akkartik (https://github.com/arclanguage/anarki/commit/547d8966de76320...)... which, lol... Everything I was saying in a couple of recent threads about replacing the Arc reader to read mutable tables... I guess that's already in place. :)
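
For the Arc 3.2 case shown above, where the table comes back immutable, one possible workaround is to copy the entries into a fresh table. A minimal sketch, assuming `each` can iterate over the immutable table; `mutable-copy` is a hypothetical helper name, not something from Arc or Anarki.

```arc
; Hypothetical helper: copy every entry of tbl into a new table
; created with (table), which is mutable.
(def mutable-copy (tbl)
  (let new (table)
    (each (k v) tbl
      (= (new k) v))
    new))

; Round-trip a table through write and read, then copy it
; so that assignment works on the result.
(= foo (mutable-copy (fromstring (tostring:write:obj a 1 b 2) (read))))
(= foo!a 3)
```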

-----

4 points by kinnard 2358 days ago | link

It'd be cool if this worked with curly brace syntax. You could read in (and write) a file that looked like this:

  {'id 3 
   'c {
      'name "james c clarke" 
      'age 23 
      'addr "1724 Cox Ave. NY, NY 90210"
     }
  }

I think it'd make reading and writing a much better experience.[1]

[1] http://arclanguage.org/item?id=20803

EDIT: I guess this would be called "table literals"?

-----

2 points by kinnard 2356 days ago | link

Does the necessity of quasiquote + unquote feel natural?

  (= tpipe {todo 
              '({id 1 cont "get eggs" done '(nil)}
                {id 23 cont "fix toilet" done '(nil)})
            week '(nil)
            today '({id 83 cont "Build something that works in arc" done '(nil)})})

This won't work:

  arc> tpipe!todo.1!id

One must employ (list) or quasiquotation:

    (= npipe {todo 
                (list {id 1 cont "get eggs" done '(nil)}
                      {id 23 cont "fix toilet" done '(nil)})
              week '(nil)
              today (list {id 83 cont "Build something that works in arc" done '(nil)})
              done (list {id 44 cont "Research Ordered Associative Arrays" done '(2019 1 21)})})

    (= unqpipe {todo 
                `(,{id 1 cont "get eggs" done '(nil)}
                  ,{id 23 cont "fix toilet" done '(nil)})
                week '(nil)
                today `(,{id 83 cont "Build something that works in arc" done '(nil)})
                done `(,{id 44 cont "Research Ordered Associative Arrays" done '(2019 1 21)})})
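
Why the plain quoted version fails can be seen in a small sketch. It assumes, as described elsewhere in this thread, that the reader turns {id 1} into the form (%braces id 1); the variable names are only for illustration.

```arc
; Assumption: {id 1} is read as the list (%braces id 1).
(= quoted '({id 1}))      ; quote suppresses evaluation of the inner form
(= built  (list {id 1}))  ; list evaluates it, producing a table

(type (car quoted))   ; under that assumption, a cons: the unevaluated form
(type (car built))    ; a table, so (car built) can be indexed by key
```

Quasiquote with unquote works for the same reason `list` does: the unquoted braces are evaluated before being placed in the list.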

-----

3 points by rain1 2347 days ago | link

I think you should have to quote them. Like how you have to quote lists:

    '(foo bar)

is just a list, but

    (foo bar)

will either call the function foo or error if it doesn't exist.

So in a similar way

    {foo "bar"}

should give a syntax error, but maybe it can have some kind of semantic meaning later. I've been considering that square brackets could be used for assignment/local binding, to cut down on the need for LET (not necessarily in Arc, just in Lisp in general).

-----

2 points by i4cu 2347 days ago | link

> should give a syntax error...

I don't agree. Quoting a list is a way to protect the expression from evaluation, in this case because round brackets normally indicate an expression that needs to be called. A table literal {...} doesn't need protection from evaluation as a callable expression as it's just data and like any other data it should evaluate to itself. And, frankly, it would really suck having to protect that data everywhere in my code because someone wants a really nuanced use case to work.

Really what should happen is that [] should be implemented such that we don't need to protect lists of data.

-----

2 points by i4cu 2347 days ago | link

> Really what should happen is that [] should be implemented such that we don't need to protect lists of data.

I should point out that I don't think this can happen since square brackets are reserved for other uses in Arc.

-----

2 points by rain1 2347 days ago | link

I see what you mean, making it self evaluate seems like the best option.

-----

2 points by krapp 2347 days ago | link

>For better or worse, it's a limitation of racket that you can't `read` a table you've written. But it could be worth doing.

I've been trying to get the tables generated by the personal data link in news to export as JSON for a while now[0]. Part of the problem seems to be related to this - Racket doesn't seem to know what to do with #tagged tem or #hash (much less nils.)

[0]https://github.com/arclanguage/anarki/blob/master/apps/news/...

-----

2 points by krapp 2359 days ago | link

Are you running the tests before pushing?

I ask because I didn't for my first few commits and it got a bit ugly.

-----

3 points by rain1 2347 days ago | link

It would be nice if the way it prints out were the same as the way it is written

  arc> (= user '{name "John Doe" age 23 id 73881})
  {(age . 23) (id . 73881) (name . "John Doe")}

  arc> (= user '#{name "John Doe" age 23 id 73881})
  #{(age . 23) (id . 73881) (name . "John Doe")}

where

  arc> #{foo "bar"}

gives a syntax error. Extra brackets like [] and {} may be used in macros as special syntax.

-----

2 points by rain1 2347 days ago | link

update: having it self evaluate (or evaluate to a hashtable with quoted keys and non-quoted values maybe) instead of a syntax error seems like a better choice
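
The "quoted keys and non-quoted values" option matches how Arc's existing `obj` macro treats its arguments, which a short sketch can show. The variable names here are only for illustration.

```arc
; obj leaves its keys unevaluated and evaluates its values.
(= user (let n 22
          (obj name "John Doe" age (+ n 1))))

user!name   ; the key name was never evaluated as a variable
user!age    ; the value (+ n 1) was evaluated when the table was built
```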

-----