Hi.
I've read the paper. From what I understood, the core idea is that source code maps to the AST directly, which is what Lisp already does.
You say everything is a tree, but isn't this already so? JSON, HTML, Lisp, just to name a few languages. You call it 2D: an advancement in the y axis makes a "sibling" and in the x axis makes a "child". But children and siblings as a concept already exist, even though I haven't heard it called 2D programming yet.
Let's look at the example in the paper:
title Jack and Ada at BCHS
visitors
mozilla 802
This is basically a whitespace based syntax for lisp.
((title Jack and Ada at BCHS) (visitors (mozilla 802)))
So, I'm sorry, but I see no innovation in this new approach. Of course I could be wrong.
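To make the comparison concrete, here is a minimal Python sketch that reads the indented example into a nested tree and prints it in parenthesized form. It assumes one space of indentation per level (the example's indentation is not visible in this post), and `parse_tree` and `to_sexpr` are hypothetical names, not part of the Tree Notation library.

```python
def parse_tree(text):
    """Parse indentation-based text into (line, children) nodes.

    Assumes one leading space per nesting level (an assumption made here).
    """
    root = ("", [])
    stack = [(-1, root)]  # (indent level, node)
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        node = (raw.strip(), [])
        # Pop until the top of the stack is this line's parent.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1][1].append(node)
        stack.append((indent, node))
    return root[1]


def to_sexpr(nodes):
    """Render nodes as a parenthesized, Lisp-like string."""
    parts = []
    for line, children in nodes:
        if children:
            parts.append("(" + line + " " + to_sexpr(children) + ")")
        else:
            parts.append("(" + line + ")")
    return " ".join(parts)


source = "title Jack and Ada at BCHS\nvisitors\n mozilla 802"
print("(" + to_sexpr(parse_tree(source)) + ")")
```

Under those assumptions the output is the same parenthesized form as the line above, which is the point being argued: both notations describe one tree.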
I'll comment on your points in the paper:
1. ETN uses fewer nodes. Why? It maps to the same tree as the languages I already mentioned.
2. No parse errors. I have to say that I was writing in c++ recently, and the parsing wasn't an issue. And as you say, it doesn't make nonsense programs correct.
3. No semantic diffs. I didn't really understand, to be honest, but "just one way to encode a program" is wrong, imho. There are always many ways to do the same thing. For example "a + b" is the same as "b + a", and "a <= n" is the same as "a < n + 1" (when "a" and "n" are ints).
4. Easy composition. Python uses indentation, which collides with the syntax of TN. As long as you have reserved characters (and not having any is probably impossible) you'll have to do something with languages that use them.
By the way, the whole idea seems to be that you can have nicer syntax, while I think the biggest advantage of writing ast directly is that you can create syntax. Obviously I could be wrong, I could have missed the point, it's perfectly possible.
Though it's hard once again to understand. When are two nodes coincident? By definition you can only have one character in one place on the screen. Is he talking about indentation, that the same level of indentation can mean different things? That seems true of ETN as well, from what I can tell.
The pencil scratchings on the screenshots don't help either.
Sorry, the lines in the geometric mapping of the source code are coincident. In drawing B) in the visual proof, the edges which connect the child nodes to their parents intersect and/or are coincident. So when you put Lisp source code onto graph paper, and draw boxes around the nodes, and line segments for edges, it shows why Lisp source is not a geometric language (I define a geometric language as one where there are no intersecting or coincident line segments).
Now, figure A) shows the same Lisp code, formatted differently, in a way that is a geometric language. But as you can see, that code is standard TN/ETN. Or perhaps another way to put it is ETNs are just Lisps with a whitespace syntax and no parentheses. Another reader on HN pointed me to I-expressions (https://srfi.schemers.org/srfi-49/srfi-49.html), which I hadn't seen before and which are 90% of the way there to TN and ETNs. The creator of I-expressions and I have communicated briefly over email and are going to be talking soon.
Anyway, perhaps another term for Tree Notation/ETNs is "Geometric Lisp", or "2-Dimensional Lisp". I'm not wedded to the terms TN/ETNs, although I do think it's better to have new terms, because I think these will come to dominate the usage of Lisp.
Still working on more updates and evidence on why I think these will be so big.
Thanks for the comments! The thing that z, readable, and wisp all lack, is that with ETNs every node has a point (x,y,z). So in addition to your source code mapping directly to your AST, it also maps to physical, geometric space (The Z axis is your program/document, the y axis is your line number, and the x axis is the column #/indent level).
So what you now have with ETNs, is the power of Lisp, now in a regular 2D/3D "structure", that you can inspect, visualize, and manipulate in new ways.
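As a rough illustration of "every node has a point", the sketch below assigns each line an (x, y) pair in Python. It assumes one space per indent level and zero-based line numbers, and `node_coordinates` is a hypothetical helper, not the library's API; the z axis (which document) is left out.

```python
def node_coordinates(text):
    """Return (x, y, words) per non-blank line.

    x is the indent level (one space per level, an assumption),
    y is the zero-based line number.
    """
    coords = []
    for y, raw in enumerate(text.splitlines()):
        if not raw.strip():
            continue
        x = len(raw) - len(raw.lstrip(" "))
        coords.append((x, y, raw.split()))
    return coords


source = "title Jack and Ada at BCHS\nvisitors\n mozilla 802"
for x, y, words in node_coordinates(source):
    print((x, y), words)
```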
> 1. ETN uses fewer nodes. Why?
That's not compared to Lisps. Sorry, that was not clear. The paper was targeted toward a broader programming audience.
> 2. No parse errors. I have to say that I was writing in c++ recently, and the parsing wasn't an issue. And as you say, it doesn't make nonsense programs correct.
Ohayo briefly shows the power of no parse errors. Your program is parsed by little "micro-parsers", so there's no monolithic parse failure and programs can recover/autocorrect gracefully. More demonstrations here will do a better job of explaining. More to come.
> 3. No semantic diffs.
It's actually only "semantic diffs" (not "no semantic diffs"), as opposed to syntax diffs. So "(+ 1 2)" and "(+ 1  2)" (note the extra space) mean the same thing semantically in Clojure, for example, but git will give you a 1 line diff because of the whitespace syntax diff. In an ETN, "+ 1 2" and "+ 1  2" generally would be different, and could cause an ETN error unless the ETN allowed blank words.
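A small Python sketch of the distinction, assuming the second form differs only by one extra space. `lisp_tokens` and `etn_words` are hypothetical helpers written for this illustration; they are not Clojure's reader or the TN library.

```python
import re


def lisp_tokens(src):
    """Tokenize a Lisp-like string; runs of whitespace are insignificant."""
    return re.findall(r"[()]|[^\s()]+", src)


def etn_words(line):
    """Split on single spaces, so an extra space yields an empty word."""
    return line.split(" ")


a, b = "(+ 1 2)", "(+ 1  2)"
print(a == b)                            # the text differs (a syntax diff)
print(lisp_tokens(a) == lisp_tokens(b))  # the tokens agree (no semantic diff)
print(etn_words("+ 1  2"))               # the blank word is part of the node
```

In the Lisp case the difference disappears after tokenizing. In the single-space split it survives as an empty word, so a textual diff corresponds to a difference in the tree.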
4. Nothing collides. After years of going through every possible edge case, you cannot break TN. There is nothing that collides with the syntax. See for yourself. Download the library and create a TN with its line/children set to something you think will collide. The indentation and ability to do "getTailWithChildren" takes care of all possible edge cases. (Sorry, I realize I still need to explain this better, as this is a common concern and it is very surprising to people, myself included, that there are no edge cases that don't work.)
Looks like Scala has some experimental macro feature: http://docs.scala-lang.org/overviews/macros/overview.html. I imagine people will be about as likely to use it as they've been to use previous non-lisp macros (i.e. not very likely).
OP isn't about macros because the output of the function isn't automatically evaluated. It's a step in the direction of allowing any programmer to perform compiler transformations and optimizations on his/her own code. I'm very interested in that area.
Er, I upvoted your "sorry" just now because I did find it interesting that those archive links exist. :) I guess the Racket project doesn't have such tireless devotion to documentation that they maintain active versions of the docs for old software releases, but it's nice that a snapshot is up somewhere.
This approach (everything is a service) has pros and cons: with it you can easily overload operators, and it gives you the ability to extend infix notation (while in programming languages with C syntax only the built-in language operators have infix notation, and your functions have the function(arg1, arg2, ...) syntax). But hilvl is quite verbose. Usually arguments against static typing include the fact that typing the type name is verbose. In hilvl you have to type two words: @ and var. And to call a function (action) you have to always type the service. This leads to a lot of @, which isn't very good. If you look at the recursion example [1] you will see that every line starts with @. The other two examples, and the server code aren't much better. There should be a trick to have less @'s.
Apart from that, the absence of documentation isn't nice. I wanted to see the IO actions, and I had to find their definition in the js source.
Thank you for taking the time to look at the language. Yes, I agree that there is a problem with a lot of @s and vars littering the code. A consequence of the small syntax seems to be that working with variables and scope is very verbose. My goal with the language is to keep the syntax very very small, but at the same time be able to express the typical things we expect from modern programming languages. Maybe some syntactic sugar could help. But I am not sure what kind of sugar would work best to avoid all the @s. Any ideas?
And yes, the documentation of system provided services is indeed lacking. I have made an issue for it and plan to address it asap: https://github.com/holgerl/hilvl/issues/9
I have an idea, but I'm not sure it's ok: Since @ is used most of the time, it could be the default.
To do that you might make a rule that all services must start with a capital letter (or a number or " or be "true" or be "false"), and only services can.
Then you can make:
foo bar baz
change to:
@ foo bar baz
while:
Foo bar baz
Remains the same. If you don't like the idea of every service starting uppercase, you could just check if the word is a service.
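The rewriting rule can be sketched in a few lines of Python. This is only an illustration of the proposal, not hilvl code: `add_default_service` is a hypothetical name, and it works on a single line of text rather than on hilvl's real parse tree.

```python
def add_default_service(line):
    """Prefix '@' when the first word does not look like a service.

    Rule taken from the proposal: a service starts with an uppercase
    letter, a digit, or a double quote, or is 'true' / 'false'.
    A line that already starts with '@' is left alone.
    """
    words = line.split()
    if not words:
        return line
    first = words[0]
    is_service = (
        first == "@"
        or first in ("true", "false")
        or first[0].isupper()
        or first[0].isdigit()
        or first[0] == '"'
    )
    return line if is_service else "@ " + line


print(add_default_service("foo bar baz"))  # rewritten
print(add_default_service("Foo bar baz"))  # unchanged
```

The alternative mentioned above (checking whether the word is a known service) would replace the `is_service` test with a lookup in the current scope.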
Now an unrelated issue, accessing and modifying a variable:
@ set foo = 4
Even if we ignore the @ (because it's a different problem):
set foo = 4
It would be better to write just:
foo = 4
And it makes sense: variable service foo, action =, argument 4.
Now accessing a variable:
4 + @.foo
I should be able to write:
4 + foo
This is a bit more complicated.
To allow the foo = 4 part foo must be a variable service.
But in this case it should act as a number service.
I see two ways to solve this: either variable services gain new actions based on their type,
or when a service is called without an action, a standard action is called, which by default returns the service itself, but can be overwritten, for example in variables, where it returns the variable value.
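The second option (a standard action that runs when no action is given) might look like this in Python. The class and method names are hypothetical and say nothing about how hilvl implements services internally; `action_set` stands in for the "=" action.

```python
class Service:
    """A hypothetical service: calling it without an action runs default()."""

    def call(self, action=None, argument=None):
        if action is None:
            return self.default()
        return getattr(self, "action_" + action)(argument)

    def default(self):
        # By default a service evaluates to itself.
        return self


class Variable(Service):
    def __init__(self, value=None):
        self.value = value

    def action_set(self, argument):
        # Stands in for the "=" action: foo = 4
        self.value = argument
        return self

    def default(self):
        # Overridden: a bare variable evaluates to its value.
        return self.value


foo = Variable()
foo.call("set", 4)      # foo = 4
print(4 + foo.call())   # 4 + foo
```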
That is a good idea. And it will make the code much prettier. Here is what I think the fibonacci example would look like:
var fibonacci :
var scope :
var result = (.argument)
.argument > 1 then
set result =
fibonacci (.result - 1) + (fibonacci (.result - 2))
.result
scope (.argument)
fibonacci 7
//result: 13
But, unfortunately the language will be much more constrained if all services must consist of letters. I have an explicit goal that any service name goes except some very few reserved characters. The reason is that it should be possible to create DSL-like APIs with service names like "$", "&", "->", "+", or even "{" and ";".
As you have suggested, a solution for this could be to instead check if the service exists at all in the scope. Service names that do not exist can be assumed to be actions on the @ service. This is a more unconstrained solution. But it means that all actions on @ can never be used as service names. This makes a dilemma between unconstrained, small syntax and prettier code with quirky reserved words. Do you agree?
Now for your other suggestion. Yes, I agree! I have thought about it before, and I think it might have no drawbacks. The fact that you also noticed it makes it even more probable that it is a good solution. The challenge will then be, as you also noticed, how a variable service can be the variable and the value at the same time.
I think it is not good enough to simply check if there are no actions on foo, because then foo must always be the last value. I.e. this has to work:
foo + 4
Can you explain more on how your other solution to this would work? I did not quite understand it.
By the way, if variable services could also represent their value, it should be very simple to implement some sort of lazy evaluation in the language. That would be cool.
"But, unfortunately the language will be much more constrained if all services must consist of letters. I have an explicit goal that any service name goes except some very few reserved characters. The reason is that it should be possible to create DSL-like APIs with service names like "$", "&", "->", "+", or even "{" and ";"."
Yes, probably you are right here.
"As you have suggested, a solution for this could be to instead check if the service exist at all in the scope."
Yes, actually I thought of having all service names starting uppercase to check if a name is a service faster.
"Service names that do not exist can be assumed to be actions on the @ service. This is a more unconstrained solution. But it means that all actions on @ can never be used as service names."
I think you should be able to shadow @'s names, and then, if you need them, call them with the @.
"This makes a dilemma between unconstrained, small syntax and prettier code with quirky reserved words. Do you agree?"
I think that if you use the approach I just described (you can shadow the @'s names) you won't make your language more constrained. However, yes, you will add new syntax, which you consider a bad thing.
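A minimal Python sketch of that lookup order, assuming `var` and `set` are among the actions on @ (both appear in the examples in this thread). `resolve_first_word` and `AT_ACTIONS` are hypothetical names for illustration.

```python
AT_ACTIONS = {"var", "set"}  # assumed subset of the actions on the @ service


def resolve_first_word(word, scope):
    """Decide how the first word of a line is interpreted.

    A name defined in scope wins, so it shadows any @ action with the
    same name; otherwise a known @ action is treated as '@ <action>'.
    """
    if word in scope:
        return ("service", word)
    if word in AT_ACTIONS:
        return ("at-action", word)
    raise NameError("unknown service or action: " + word)


print(resolve_first_word("set", scope=set()))    # falls back to @ set
print(resolve_first_word("set", scope={"set"}))  # user service shadows it
```

A shadowed action stays reachable by writing the @ explicitly, which is why the rule does not reserve any names.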
"Now for your other suggestion. Yes, I agree! I have thought about it before, and I think it might have no drawbacks. The fact that you also noticed it makes it even more probable that it is a good solution. The challenge will then be, as you also noticed, how a variable service can be the variable and the value at the same time.
I think it is not good enough to simply check if there are no actions on foo, because then foo must always be the last value. I.e. this has to work:"
Yes, it would not be an improvement.
"Can you explain more on how your other solution to this would work? I did not quite understand it."
Imagine a service which supports both int/number's actions and variable actions. You could write:
foo = 4
and it would call the variable action =. And here:
foo + 4
it would call the number action +.
However in this case:
4 + foo
things are more complicated. I looked at your "+" implementation, and it expects that its argument is a js number. You can make a check, like if the argument isn't a number and it is a service, you try to call a getvalue action, and if it returns a number, you use it. Or maybe change something about accessing the value of a number (or string), but I don't know how services are implemented (yet), so I can't tell precisely.
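The check described here can be sketched as follows. This is Python rather than the js of the real implementation, and `plus`, `VariableService` and `getvalue` are hypothetical names used only to show the shape of the idea.

```python
from numbers import Number


class VariableService:
    """Hypothetical variable service exposing a getvalue action."""

    def __init__(self, value):
        self.value = value

    def getvalue(self):
        return self.value


def plus(left, argument):
    """Add two operands, unwrapping a service that exposes getvalue()."""
    if not isinstance(argument, Number) and hasattr(argument, "getvalue"):
        argument = argument.getvalue()
    if not isinstance(argument, Number):
        raise TypeError("+ expects a number")
    return left + argument


foo = VariableService(4)
print(plus(4, foo))  # 4 + foo
print(plus(4, 3))    # plain numbers still work
```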
> I think you should be able to shadow @'s names, and then, if you need them, call them with the @.
Ah, that is a good idea! That can work.
> Imagine a service which supports both int/number's actions and variable actions. [...]
I see. I really think that would be an awesome improvement of the language. I have made a git issue for it. And when I have the time I would love to branch the code and make an attempt at it.
Really appreciate your input on the language. Now I just need some spare time to do these things :)
Yes, I wondered what to do about that and who if anyone cared about all those features. Then I forgot :/ I'll create a more bare-bones but working script today.
Edit 15 minutes later: I've made the flag to disable rlwrap '-n' like in the master branch.
(I didn't pick the original flag, so I'm not attached to that name. I can change it if you want, I just want both branches to be consistent. I also renamed the script to 'arc' like in the master branch, just to make my life easier. I'll update the instructions at https://arclanguage.github.io next.)
Now it works both from the command line and from emacs.
It's ok for me that the flag is -n.
I think that the arc script was named "arc.sh" because the folder which is used by the news server is "arc", and it would conflict. We can either take back the "arc.sh" name or change the news server's directory to something else, perhaps "www" like in the master branch.
Edit: We should also change the flag of the default program name in inferior-arc.el (line 95):
The iter macro doesn't work: when it reaches the last iteration it calls next which increases i to (len list) and then tries to get that element, causing an error.
Edit:
The iter macro has another problem: it evaluates list too many times:
In principle I don't have a fully general solution yet :/ It's something I'm working on.
What I would like to happen is that my application only contains dependencies it really needs, and that each dependency includes no superfluous/dead interfaces or code. Under these circumstances I would like to live in a world where I can go in and modify the libraries to have different names, with the difference in names making sense in the context of the application. Then I would bundle the application with all its libraries included.
Of course this doesn't scale to large libraries, because managing a fork today involves an amount of work that ranges from non-trivial to intractable. But this would be my ideal.
Bear in mind that it's only a hard problem for collisions in the interface of the two libraries. Functions that are used only internally can be wrapped inside closures so they're only accessible to the library that cares about them.
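For example, in a language with closures the internal helper never becomes a global name at all. A minimal Python sketch, with hypothetical names:

```python
def make_library():
    """Build a library whose helper is visible only inside this closure."""

    def helper(x):
        # Internal: invisible outside, so it cannot collide with a
        # 'helper' defined by any other library.
        return x * 2

    def double_all(items):
        # Public interface: the only name that can collide.
        return [helper(x) for x in items]

    return {"double_all": double_all}


lib = make_library()
print(lib["double_all"]([1, 2, 3]))
```

Only the keys of the returned dictionary are candidates for collisions, so the renaming problem is limited to the public interface.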
I've been noticing continuities between social code distribution, modularity, and variable scope. A guiding example is code verification:
Unrecorded reasoning, existing mainly in our minds.
-->
Codebases dedicated to proofs or tests.
-->
Proofs or tests located in the codebase they apply to.
-->
A type/contract declaring a module interface.
-->
A type/contract annotation for a function definition.
-->
A type/contract annotation for an individual expression.
-->
A type/contract annotation for an individual built-in operator, but at
this point it becomes implicit in the operator itself, and we just
have structured programming, enjoying properties by construction.
Verification is a simplified version of a build process; it's just a build with a yes or no answer. So the design of a build system has similar continuity:
Unrecorded how-to knowledge, existing mainly in our minds.
-->
Codebases or how-to guides dedicated to curated builds (e.g. distros).
-->
Build scripts and docs located in the codebase they apply to.
-->
Macroexpansion-time glue code, importing compiler extensions by name.
-->
Load-time glue code, importing runtime extensions by name.
-->
Service-startup-time glue code, obtaining dependency-injected fields
by name.
-->
An expression, taking free variables from its lexical scope by name.
(This is a build at "evaluation of this particular expression" time.)
There might be some rough parts in here. I might be taking things for granted that I don't want to, like taking for granted that we want unambiguous named references from one module to another. My point with this continuity is to note that if I don't want named imports, then maybe I don't want named local variables either; maybe tweaks to one design should apply to the other.
And this means that even local syntactic concerns extrapolate to social decisions about how we expect to deal with our unrecorded knowledge. Every design decision has a lot to go by. :)
---
Another exciting part is that I think nested quasiquotation shows us a more general theory of lexical locality. If we're dealing with syntax as text, then locations in that text have an order, and we can isolate code snippets at intervals along that order (and mark them with parentheses). Intervals are partially ordered by containment, so we can isolate code snippets at meta-intervals between an outer interval and multiple nonoverlapping inner intervals (and mark them with parentheses with nonoverlapping parentheses-shaped holes: quasiquotations).
That "nonoverlapping" part seems awkward, but I think there's a simple concept somewhere in here.
With this concept of intervals, I'm considering higher degrees of lexical structure past quasiquotation, and I'm considering what kind of parentheses or quasiquotations would exist for non-textual syntaxes.
A module system deals with a non-textual syntax: The syntax of a bundle of modules. If the modules have no order to them, then we don't even have parentheses to work with, let alone quasiquotation. But they can have an order to them. We can impose one from outside:
Module A precedes module B.
And anything we can impose from outside, we might want to add as a module:
Module A says, "..."
Module B says, "..."
Module C says, "Module A precedes module B."
This is prone to contradictions and ambiguities. If we can say how to resolve these ambiguities from the outside, we should be able to do so as a module:
Module A says, "..."
Module B says, "..."
Module C says, "Module A precedes module B."
Module D says, "Module B precedes module A."
Module E says, "If module C and module D disagree, listen to module C."
Module F says, "If module C and module D disagree, listen to module D."
Module G says, "If module E and module F disagree, listen to module E."
This should lead to a very complete system of closed-system extensibility: For any given set of modules, if the set's self-proclaimed ordering between A and B is currently unambiguous, then we might as well listen to it! If we don't like it, we can add more contradictions and disambiguations until we do, right up to and including "Ignore all those other modules and do it like this." :)
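One possible reading of modules C through G, sketched in Python. The data layout and the name `listen_to` are assumptions made for this illustration; the sketch handles only pairwise disagreements and does not detect circular chains of preference modules.

```python
def listen_to(candidates, preferences):
    """Pick one winner among disagreeing modules.

    preferences maps a module name to (first, second, winner), meaning
    "if first and second disagree, listen to winner".
    """
    candidates = sorted(candidates)
    if len(candidates) == 1:
        return candidates[0]
    # Preference modules that speak about exactly this disagreement.
    judges = [name for name, (p, q, _) in preferences.items()
              if sorted((p, q)) == candidates]
    if not judges:
        raise ValueError("unresolved disagreement: %s" % candidates)
    winners = {preferences[j][2] for j in judges}
    if len(winners) == 1:
        return winners.pop()
    # The judges themselves disagree: resolve that one level up.
    return preferences[listen_to(judges, preferences)][2]


claims = {"C": ("A", "B"), "D": ("B", "A")}  # (earlier, later)
preferences = {
    "E": ("C", "D", "C"),
    "F": ("C", "D", "D"),
    "G": ("E", "F", "E"),
}
winner = listen_to(claims.keys(), preferences)
print(winner, claims[winner])
```

With the seven modules above, G settles the E/F disagreement in favor of E, E picks C, and so A precedes B. Adding a further module that prefers F over E would make the ordering ambiguous again until something disambiguates it.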
With this ability to disambiguate when things go wrong, we can model lexical scope:
Module A says, "Export foo = (import bar from system {B, C})."
Module B says, "Export foo = 2."
Module C says, "Export bar = foo + foo."
Result: foo = 4.
While both A and B have an export named "foo," this conflict is disambiguated by the fact that module A is treating {B, C} as a local scope. I intend this to mean that bar isn't at the top level either.
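A toy Python model of this example, under the simplifying assumption that a "system" is just a list of modules and a module is a dictionary of exports. `evaluate` and the `get` callback are hypothetical; the real design is still open.

```python
def evaluate(name, system):
    """Evaluate an export by name within a system of modules.

    Each module maps an export name to a function that receives a
    lookup callback bound to the same system.
    """
    for module in system:
        if name in module:
            return module[name](lambda n: evaluate(n, system))
    raise NameError(name)


module_b = {"foo": lambda get: 2}
module_c = {"bar": lambda get: get("foo") + get("foo")}
# Module A treats {B, C} as a local scope, so B's foo does not clash
# with A's own export of the same name.
module_a = {"foo": lambda get: evaluate("bar", [module_b, module_c])}

print(evaluate("foo", [module_a]))
```

Here bar is only reachable through the local system {B, C}, which matches the intent that bar is not at the top level.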
If we really want access to bar at the top level, we can refer to it again, and we can even be sloppy about it and make up for our sloppiness with disambiguations:
Module A says, "Export foo = (import bar from system {B, C})."
Module B says, "Export foo = 2."
Module C says, "Export bar = foo + foo."
Module D says, "Export all imports from system {B, C}."
Module E says, "If A and D export the same variable, listen to A."
Result: foo = 4; bar = 4.
If we want, we can have the top-level bar see the version of foo exported by A, even though the version of bar used by A still uses the foo from B:
Module A says, "Export foo = (import bar from system {B, C})."
Module B says, "Export foo = 2."
Module C says, "Export bar = foo + foo."
Module D says, "Export all imports from system {A, B, C}."
Module E says, "Export all imports from system {C, F}."
Module F says, "Export foo = (import foo from system {A, B, C})."
Module G says, "If D and E export the same variable, listen to D."
Result: foo = 4; bar = 8.
Not easy enough to extend? Define some structure. Write modules that assign folksonomic tags to other modules or themselves, and then refer to the system of all modules with a given tag. Write modules that act as parentheses, and write modules that determine enough of an order to decide which modules those parentheses contain. Here's an example of the latter:
Module A says, "Export foo = (import bar from range R1)."
Module B says, "Export interval R1, and begin it here."
Module C says, "Export foo = 2."
Module D says, "Export bar = foo + foo."
Module E says, "End interval."
Module F says, "These modules are in order: B, C, D, E."
The flexibility is obviously really open-ended here, and it's going to be a challenge to make this a well-defined idea. :-p
>What I would like to happen is that my application only contains dependencies it really needs, and that each dependency includes no superfluous/dead interfaces or code.
Do you want to avoid the situation, which happens in C, where when you need sqrt you have to include the whole math file? I totally agree.
>Under these circumstances I would like to live in a world where I can go in and modify the libraries to have different names, with the difference in names making sense in the context of the application
This looks right to me. Perhaps it could be something like python's
from library import function as good_name_for_your_project
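A concrete instance using Python's standard library, tying back to the sqrt case above. The alias `square_root` is just an illustrative choice of name.

```python
# Import only the one function that is needed, under a project-specific name.
from math import sqrt as square_root

print(square_root(9.0))
```

The name `sqrt` is never bound in the importing module; only `square_root` is.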
>Then I would bundle the application with all its libraries included.
I'm not sure making a (even not full) copy of a library is a good idea because it would lead the user to have many copies of the same libraries. On my Windows machine I ended up having 4 versions of Python! I think that common parts should be in common.
> from library import function as good_name_for_your_project
What's happening here is that you're a) adding a feature in Python to support 'from..as', b) including an external library and c) continuing to keep around an old name that you don't really care about. You're essentially preserving the old name just because other people who your application doesn't care about use it.
Imagine a world where maintaining forks was tractable. Would this still be a good idea? Why not just do a search and replace and maintain a private fork, eliminating all this complexity in your private stack? Just delete 'from..as' from your private Python! :o)
> I'm not sure making a copy of a library is a good idea because it would lead the user to have many copies of the same libraries.
Yes, this is a fundamental difference in outlook/ideology. I think that copying isn't always bad. We culturally tend to emphasize the issues with copying a lot more than the costs of avoiding duplication.
A degenerate example is to observe that there are tons of 'e's in the novel I'm reading and try to deduplicate them. That is of course obviously farcical, but it at least serves to illustrate that there's a trade-off, and that always DRY'ing your code isn't obviously a good idea. Another example is to observe that the internet has many copies of the same libraries running at any given time. You can argue that they're on different machines, but then imagine a 'machine' consisting of multiple cores and private caches and non-uniform memory access and RAID-partitioned disks. Changing latency costs can make it reasonable to maintain multiple copies of some immutable data in a single 'machine'. Now consider that development is yet another cost that is open to variation. If (automatically) creating copies of something eases development, it's at least worth considering. For example, optimizing compilers can sometimes specialize a function differently for different callsites. That's duplication often inside a single binary, and it makes sense in some contexts.
The npm eco-system promiscuously duplicates dependencies inside the node_modules/ directory, so that is at least some evidence that the approach I'm suggesting isn't too insane :)
Ok, maybe this could be the way to go. Adapting little libraries isn't a problem, and it probably makes your program better. This defeats collisions and useless code, and is ok for autoloading. But this approach will work only if our libraries are small enough. For now this is ok.
Duplicating libraries isn't a problem: disk space for ease of development is an exchange which is getting more and more convenient.
For autoloading: the interpreter/compiler could load all .arc files in the current directory (or current-directory/lib), or scan them for function definitions (without loading them) and set up an elisp-style autoload automatically for every function. I prefer the first option.
One possibility for this bundling is that Arc looks first where it would expect a library to be (in an equivalent of node_modules), then looks for it in the usual place (/usr/lib or wherever).
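The second option (scanning without loading) could be prototyped outside Arc. Below is a rough Python sketch that builds a name-to-file index from `(def ...)` and `(mac ...)` forms. `build_autoload_index` is a hypothetical helper, and a regex scan like this will also match definitions inside comments or strings, so it is only an approximation.

```python
import os
import re
import tempfile

# Matches "(def name" or "(mac name" and captures the name.
DEF_PATTERN = re.compile(r"\(\s*(?:def|mac)\s+([^\s()]+)")


def build_autoload_index(directory):
    """Map each defined name to the .arc file defining it, without loading it."""
    index = {}
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".arc"):
            continue
        path = os.path.join(directory, filename)
        with open(path, encoding="utf-8") as handle:
            for name in DEF_PATTERN.findall(handle.read()):
                # Keep the first file that defines a name.
                index.setdefault(name, path)
    return index


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "util.arc"), "w", encoding="utf-8") as f:
        f.write("(def square (x) (* x x))\n(mac twice (x) `(do ,x ,x))\n")
    print(sorted(build_autoload_index(tmp)))
```

The interpreter would then load a file the first time one of its indexed names is referenced.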
Or, if it all needs to be bundled, you could have symlinks for the libraries you don't change.
Thing is...this is about as verbose as you can get!
If a name's already good, you're not going to change it; if it's bad, you should push that change upstream! (If the name's bad, it's likely that the original author didn't put much time into choosing the name, so I think it would be fairly straightforward to get that merged.)
[As much as I love this idea of implicit importing, I'm sure the explicit side -- which'll let you change whatever names you like -- will need to be there as well. So we can all chill.]
Quality of a name is relative to a purpose. The more public we go, the more meanings compete for a single name, making us resort to jargon. If a language really only uses homogenous intensional equality, being able to call it = is a relief. If someone wants to build a side-by-side comparison of several versions of an extension, they might prefer for some of the names to be different in every version while others stay the same.
But it's not just names per se. In that side-by-side comparison, they might also want to merge and branch parts of the code whose assumed invariants have now changed; invariants can act as Schelling points, like invisible names. Modifying code is something we do sometimes, and I think akkartik wants to see how much simplicity we'll get if everyone who wants a simpler system has the tooling support to modify the code and make it simpler themselves.
Personally, I find it fascinating how to design a language for multiple people to edit the code at the same time, a use case that can singlehandedly justify information hiding, modules, and versioning. But I think existing module systems enforce information hiding even more than they have to, so that in the cases where people do need to invade that hidden information, they face unnecessary difficulties. I think a good module system will support akkartik's way of pursuing simplicity.
But... my module system ideas aren't finished. At a high level:
- You can invade implementation details you already know. You can prove this by having their entire code as a first-class value with the expected hash.
- You can invade implementation details if you can authorize yourself as their author.
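The first bullet can be illustrated with a hash comparison. This Python sketch assumes SHA-256 and a plain-string representation of the module's code, neither of which is specified above; `may_invade` is a hypothetical name.

```python
import hashlib


def may_invade(claimed_source, expected_hash):
    """Grant access if the caller presents code matching the module's hash."""
    digest = hashlib.sha256(claimed_source.encode("utf-8")).hexdigest()
    return digest == expected_hash


module_source = "(def helper (x) (* x 2))"
published_hash = hashlib.sha256(module_source.encode("utf-8")).hexdigest()

print(may_invade(module_source, published_hash))         # knows the code
print(may_invade("(def helper (x) x)", published_hash))  # does not
```

Presenting the full source proves the caller already knows the implementation details, so exposing them reveals nothing new.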
"If a name's already good, you're not going to change it; if it's bad, you should push that change upstream! (If the name's bad, it's likely that the original author didn't put much time into choosing the name, so I think it would be fairly straightforward to get that merged.)"
Not necessarily. 'Good' and 'bad' are not absolute, they are extremely contextual. A name that is good for a general-use library might be sub-optimal for your application, or vice versa. Subjective taste is also a thing. So while you should certainly send out a pull request for the change, our model of the world shouldn't rely on the change actually getting pushed.
In general it is amazing to me how often a blindingly obvious Pull Request gets rejected or just sits in the queue, untouched. There's lots of different kinds of people out there. Which is why I tend to think more like a barbarian[1] about collaboration: think of other people as islands with whom you might collaborate if the stars align. But don't rely on the collaboration. Be self-sufficient.
"As much as I love this idea of implicit importing, I'm sure the explicit side -- which'll let you change whatever names you like -- will need to be there as well."
I actually interpreted your original post that kicked off this thread as implicit loading since Arc has no notion of modules or import. So the question of changing names did not arise. That seemed like a tangent to the original question.
These seem like separate questions:
1. Should Arc know how to react with implicit symbols?
2. Should Arc provide namespaces?
On the one hand, you can have implicit loading without needing a module/namespace system. On the other hand, I don't see how you can have implicit loading in the presence of namespaces. Without the "from..as" construct how would your system know which library to load a symbol from, if there's a collision?
Summary: even if you have namespaces, you're still going to be doing your own collision-detection if you want implicit loading. What's the point of a module system then?
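To show what that collision detection amounts to, here is a small Python sketch. The library names and symbol sets are made up for illustration, and `find_collisions` is a hypothetical helper.

```python
def find_collisions(libraries):
    """Return the symbols defined by more than one library.

    libraries maps a library name to the set of symbols it defines.
    """
    owners = {}
    for library, symbols in libraries.items():
        for symbol in symbols:
            owners.setdefault(symbol, []).append(library)
    return {s: sorted(libs) for s, libs in owners.items() if len(libs) > 1}


libraries = {
    "strings": {"join", "split"},
    "paths": {"join", "basename"},
}
print(find_collisions(libraries))
```

An implicit loader can pick a file automatically for every symbol except the ones reported here, and those are exactly the cases where something like "from..as" is needed.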