Arc Forumnew | comments | leaders | submitlogin
2 points by rocketnia 4382 days ago | link | parent

"That works just fantastic in Racket, where you can splice arbitrary things into the AST, but that won't work in JS, where things generally have to be given a name."

Just use my Function( "x", ... )( x ) pattern. Keep in mind that you should only need to translate to JavaScript once per eval or top-level command, rather than once per macro call. The "AST" {bar qux corge} will still exist as an intermediate representation, since we still need to process 'bar if it's a macro and compile 'qux and 'corge if it isn't.

---

"If the symbol refers to an immutable box, the compiler will unbox it and splice it in directly. If it's a mutable box, it'll put the box into the AST, which will be unboxed at runtime."

I did this in Penknife too, but Penknife only had mutable boxes.

Well, I did it for global scope anyway. You say you're doing this for local scope too?! The boxes for a function's parameters don't even exist at compile time, right?

---

"JS doesn't have boxes"

Sure it does! I usually use { val: _ } when I want a box.

You should probably keep using separate JavaScript variables, but here are your examples using boxes:

  // (var foo 1)
  (function () {
      env.foo = { val: 1 };
  })();
  
  // (def bar -> foo + 5)
  (function () {
      var g1 = env.foo;
      env.bar = { val: function () {
          return nulan.plus_( g1.val, 5 );
      } };
  })();
  
  // (var foo 3)
  (function () {
      env.foo = { val: 3 };
  })();
  
  // (bar)
  (function () {
      return bar();
  })();
  
  // (mac foo -> {bar qux corge})
  (function () {
      var g1 = env.bar;
      var g2 = env.qux;
      var g3 = env.corge;
      env.foo = { val: nulan.macro_( { bar: g1 }, function () {
          return [ g1.val, g2.val, g3.val ];
      } ) };
  })();
  
  // (let bar 5 (foo))
  // ==>
  // (do (var bar 5)
  //     (bar qux corge))
  (function () {
      var g1 = env.foo.val.captures.bar;
      var g2 = env.qux;
      var g3 = env.corge;
      return (function () {
          var bar = 5;
          return (0, g1.val)( g2.val, g3.val );
      })();
  })();
(The calls to nulan.macro_() and nulan.plus_() are just placeholders for whatever you would use there.)

Since I'm not using separate JavaScript variables for separate definitions, the original bar is totally out of scope in (let bar 5 (foo)), and I end up having to smuggle it in as part of foo (env.foo.val.captures.bar). I did this in Penknife, and I called the foo->bar path a "lexid." If I had to do it again, I might consider generating unique variables outside every environment, something like what you're doing.

For an easier solution all around, we can call the 'foo macro at run time, rather than relying on the compiler to expand it beforehand.

  // (let bar 5 (foo))
  // ==>
  // (do (var bar 5)
  //     (bar qux corge))
  nulan.eval_( env, [ "let", "bar", 5, [ "foo" ] ] );
This way, 'foo will return its own view of [ bar, qux, corge ], and we don't have to worry about bar being in scope.

As a bonus, this approach allows side effects during macroexpansion to work properly, rather than being forgotten by the compilation process.

The reason I didn't do this in Penknife is because Penknife macros suppressed parsing, the most expensive step, so this kind of compilation wouldn't have done any good.

---

"Anyways, with that out of the way, I actually think I'm going to go for strategy #2, because I only lose a tiny bit of flexibility, but it should be much much faster."

Just wait until someone comes along and thinks your language is too static. ;)

That approach is also good for debugging. Compiling to so-called idiomatic JS is a sacrifice many JS-targeting languages make right now.

---

"I'd still like to hear rocketnia's advice on bytecode, though, for future reference."

I've only done a few things with bytecode, most of which involve reverse engineering GB/NES/SNES machine code and game-specific binary data. I've taken a look at bytecode generation for the JVM a few times, but I know better than to trouble myself with that. :-p

Bytecode is a term commonly used to describe binary formats for programs that run on virtual machines. It's designed to be computationally cheap to execute (e.g. JIT-compile) on multiple CPU machine code dialects.

JavaScript is the only "machine" code you're targeting, so there's not much need to build a compatibility layer. However, you might be interested in fine-tuning for particular JS engines' strengths and weaknesses. For instance, supposedly V8 compiles JavaScript a lot like C code, so it might encourage large blocks of imperative code, few variables, and heavier use of structured control flow than function calls. (Eh, I almost never write C code, so what do I know?)



1 point by Pauan 4382 days ago | link

"Just use my Function( "x", ... )( x ) pattern. Keep in mind that you should only need to translate to JavaScript once per eval or top-level command, rather than once per macro call. The "AST" {bar qux corge} will still exist as an intermediate representation, since we still need to process 'bar if it's a macro and compile 'qux and 'corge if it isn't."

Ewwww. Slow. And yes I know you only need to eval once per top-level form, but the problem is that means you need to actually keep the top level forms around and the compiler/runtime needs to always be present.

Keep in mind that I want to use Nulan to write Chrome Extensions. Now imagine a popup that occurs when the user presses a button. This popup is coded in Nulan. Now, every time the user presses the button, it has to re-parse, re-compile, and re-eval all of the Nulan code that is inside the popup.

I consider that unacceptable for my desires, where the code should be about as fast as handwritten JS. So, Nulan compiles things ahead of time to a JS file, to achieve those speeds.

---

"Well, I did it for global scope anyway. You say you're doing this for local scope too?! The boxes for a function's parameters don't even exist at compile time, right?"

Well, in Racket, I just use a "let" for local scope, so no, locals aren't boxes. But I could make locals use boxes too. It would just be a lot harder to implement. At that point I might as well just write an interpreter.

And in fact, the interpreted vau version of Nulan does use boxes for all variables, global and local.

---

"Sure it does! I usually use { val: _ } when I want a box."

Yes, but then it has to do the boxing/unboxing at runtime in addition to the variable lookup. Whereas with Nulan's scheme, you get almost all the same benefits, but with just the variable lookup. The only downside is that the boxes are no longer available at runtime (but you can access the fake "boxes" at compile-time).

---

"Just wait until someone comes along and thinks your language is too static. ;)

That approach is also good for debugging. Compiling to so-called idiomatic JS is a sacrifice many JS-targeting languages make right now."

Yes it is a tradeoff. Yes I know it's more static than Arc or the Racket version of Nulan. But with my scheme, I can actually make things feel fairly fluid. As an example, this works:

  (var + (fn ...))

  (+ 1 2)
That is, the runtime function + shadows the macro +, so it doesn't do the macro expansion. And macros can hygienically expand to runtime values, as I already explained, thanks to the variable renaming scheme.

So, the only real downside is that you can't do this:

  (def bar -> ...)

  (mac foo ->
    (bar ...))
That is, you can't actually call a runtime function from inside the macro. But you can still expand to it:

  (mac foo ->
    {bar ...})
And you can also evaluate the "bar" function at compile-time, which would make it available:

  (&eval (def bar -> ...))

  (mac foo ->
    (bar ...))
I consider this to be a perfectly reasonable downside in exchange for increased speed.

-----

1 point by Pauan 4382 days ago | link

Your alternate code looks pretty straightforward, except for these two bits:

  env.foo = { val: nulan.macro_( { bar: g1 }, function () {
      return [ g1.val, g2.val, g3.val ];
  } ) };

  return (function () {
      var bar = 5;
      return (0, g1.val)( g2.val, g3.val );
  })();
Why does it pass { bar: g1 } as the first argument to nulan.macro_?

Why do you do (0, g1.val)? That's the same as just using g1.val.

-----

1 point by rocketnia 4382 days ago | link

"Why does it pass { bar: g1 } as the first argument to nulan.macro_?"

That object becomes the "captures" property in env.foo.val.captures.bar later. In Penknife, that's how I let code have run time access to the lexical scopes of compile-time-expanded macros.

---

"Why do you do (0, g1.val)? That's the same as just using g1.val."

I like to go out of my way not to pass a "this" parameter unless I actually want to pass it. If I were hand-writing this code, I would have probably used a fresh local variable to hold g1.val, but this this (0, g1.val)( ... ) approach is sufficient for compiler output.

-----

1 point by Pauan 4381 days ago | link

You learn something new every day:

  var o = {
    foo: function () {
      return this
    }
  }

  o.foo()       -> Object
  (0, o.foo)()  -> Window
I would have expected JS to be smart enough to return the object in both cases.

-----

2 points by rocketnia 4380 days ago | link

Well, this behavior is the one that surprised me initially:

  o.foo()       -> Object
I expected the property access and the procedure call to be two completely separate steps. Instead, there's a secret avenue that somehow exposes my local variable "o" to the callee without me explicitly passing it in!

After this encounter, I reasoned that a.b( ... ) must be its own syntax, only superficially related to a.b and b( ... ). My choice to use this syntax is explicit. This intuition worked well for a while, but then this behavior surprised me:

  (o.foo)()     -> Object
  ((o.foo))()   -> Object
So it's its own syntax, but it also supports grouping parentheses? What for?

Turns out the spec actually specifies this behavior in terms of "References." An expression doesn't just return a value; it returns a Reference that can be either resolved to a value, assigned to, deleted, passed to typeof, or called as a method call. (Maybe there are some other options too.)

In the spec's own words:

"The Reference type is used to explain the behaviour of such operators as delete, typeof, and the assignment operators. For example, the left-hand operand of an assignment is expected to produce a reference. The behaviour of assignment could, instead, be explained entirely in terms of a case analysis on the syntactic form of the left-hand operand of an assignment operator, but for one difficulty: function calls are permitted to return references. This possibility is admitted purely for the sake of host objects. No built-in ECMAScript function defined by this specification returns a reference and there is no provision for a user-defined function to return a reference. (Another reason not to use a syntactic case analysis is that it would be lengthy and awkward, affecting many parts of the specification.)"

An ECMAScript Reference is similar to what I independently called a "fork" in Penknife. Forks were my approach to unifying Arc-style metafns, setforms, and macros into a single syntactic mechanism. A Penknife syntactic abstraction would return a fork value, and that value would have behaviors for parsing a macro call body, for setting, for getting, and for acting as a top-level command. The result of the parsing behavior would be another fork, so this branching process could handle multiple layers of brackets, metafn style. (Now I understand this as a monadic, coinductive approach. Yay big words.)

I've compared JavaScript's method invocation to metafns before: http://arclanguage.org/item?id=12094. I was somewhat supportive of the combination in that post, but I think Arc, JavaScript, and Penknife just use this technology for name punning, rather than resolving any essential conundrums of language design, so I'm not very enthusiastic about the technique in general.

P.S.: I've also casually explained this part of JavaScript's method invocation to you before: http://arclanguage.org/item?id=14665 But it was technically a different example, so no worries. ;)

-----

1 point by Pauan 4380 days ago | link

Yes, the whole "this" system in JS is pretty insane. Naturally Nulan solves this by requiring you to explicitly pass in the object. This also gives you the "call" and "apply" behavior of JS for free, in an orthogonal way.

P.S. The JS version of Nulan is coming along quite nicely. 25 working macros and 10 semi-working macros. It can handle a ton of stuff already, but it still has a long way to go!

-----