Arc Forum | raymyers's comments

I get a stack overflow when running welder in Java 6. I can send the stack trace if need be.

Works fine in Sun's Java 5 though. Great work!

-----

4 points by conanite 6021 days ago | link

> I get a stack overflow when running welder in Java 6

Oh dear. I haven't installed java 6 yet. Please do send the stack trace anyway, maybe I can figure out what's going on.

> Works fine in Sun's Java 5 though. Great work!

Thank you!

-----

2 points by raymyers 6069 days ago | link | parent | on: Reducing Parentheses

On a related note, try out Paredit mode in Emacs and see if the nesting level still bothers you.

-----

1 point by eds 6069 days ago | link

Just tried Paredit and I really like it. Thanks!

-----

3 points by raymyers 6086 days ago | link | parent | on: CL-Arc: Arc Compiler in Common Lisp

Besides access to all the CL libraries, I think it would be worth doing for speed alone. Not everyone has (yet) used Arc in a situation where the speed became an issue, but it does happen pretty easily (just ask almkglor).

In SBCL, I am consistently able to achieve results within 30% of C programs. Without type annotations the speed is still quite respectable.

If we ever want Arc to be usable in the wild for a diverse range of applications, we will need a compiler that can generate fast code. This is one way of getting there.

-----

4 points by almkglor 6086 days ago | link

Well, there are still ways to break the speed barrier. For instance, the main reason I made cached-table was to cache the results of parsing the wikitext in Arki (without keeping them for too long, since that would simply be a waste of memory). This is even arguably the "correct" thing to do, and I might not have done it yet if Arc hadn't been so slow ^^.
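
The idea, roughly, is a table whose entries expire after a while (a hypothetical sketch - cache-get, cache-put and *cache-life are made-up names, not the actual cached-table interface):

  ; entries older than *cache-life seconds get recomputed
  (= *cache-life 300)

  (def cache-put (cache key val)
    (= (cache key) (list val (seconds)))
    val)

  (def cache-get (cache key compute)
    (iflet (val stamp) (cache key)
      (if (< (- (seconds) stamp) *cache-life)
          val
          (cache-put cache key (compute)))
      (cache-put cache key (compute))))

  ; e.g. (cache-get parsed-pages title (fn () (parse-wikitext raw)))
  ; where parsed-pages, parse-wikitext and raw are likewise made up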

"When I was a noob, I talked like a noob, I thought like a noob, I reasoned like a noob. When I became a hacker, I put noobish things behind me. Now we see but a poor implementation as in an alpha; then we shall see code to code. Now I code in part; then I shall code in full, even as I am full of code. Now these three remain: hubris, impatience, laziness. But the greatest of these is laziness."

-----

1 point by raymyers 6086 days ago | link

Larry Wall, I think.

Yes, you can always optimize your Arc code to mitigate the problem. My point still stands though. If we want to be able to use Arc in production situations, we will eventually crave an industrial-strength Arc compiler.

This becomes even more obvious if you take seriously the notion of "A Hundred-Year Language".

-----

1 point by almkglor 6086 days ago | link

Actually a paraphrase of the Christian Bible's 1 Corinthians 13:11-13, although yes, the three virtues are the Larry Wallian ones ^^

-----

1 point by eds 6086 days ago | link

Wow, did you come up with that yourself? I'd like to quote it if you don't mind.

-----

1 point by almkglor 6086 days ago | link

Err, yeah. I tend to pattern match on lots of things - I've got some weird takes on Buddhist philosophy somewhere on the boards too.

-----

1 point by wfarr 6086 days ago | link

We ought to focus on what gap Arc is really meant to fill: quick prototyping, and easy deployment of applications. It is not designed to crunch numbers at the speed of light, or cure cancer. Web applications simply don't need the kind of speed that C et al. do. If they did, we wouldn't be using Python or Ruby either.

We should focus on improving what Arc does, rather than trying to morph it into the behemoth Swiss Army knife that is Common Lisp.

Not to discourage you from your own pursuits regarding this, but I think as a whole, the Arc community would be better off putting its focus elsewhere.

-----

5 points by raymyers 6086 days ago | link

I can't speak for the whole of "the Arc community" any more than you can, but...

I'm not trying to turn Arc into Common Lisp, I'm advocating making Arc fast. There's a difference.

CL was a behemoth because it was a union by committee of all the cruft in every popular Lisp dialect circa 1980. One of Arc's goals is to design from a clean slate. There is no pressure to conform to bad or inconsistent decisions of the past. None of that has anything to do with performance.

The question becomes: Do we want to make a real general purpose language, or do we want Arc to become "yet another scripting language"?

This is a long term question. We may not need a compiler tomorrow, but I'm not ready to say for the long haul: "Arc is for exploratory programming and rapid prototyping. It will never scale. You shouldn't write ray tracers in it, or image processing, or games. It's fine for web apps -- unless they get a lot of traffic."

-----

3 points by sacado 6086 days ago | link

The funny thing is that, even in some of my webapps, I always needed some more speed, at one moment or another. The last example I have in mind: I wanted to generate a chart showing the use of a big system. As the chart's look depends on a few criteria given by the user, it has to be generated on demand. Well, I had to dig into a DB containing millions of items, then mix these items together (that couldn't be done in SQL) and finally generate the chart.

Glad I had Psyco there. If I hadn't had it, or at least Pyrex, I would have probably dropped Python because writing C extensions for it is quite painful. And that's also the reason why I never really used Ruby, despite its cool features.

And I don't want to write prototypes and say: "Hmm... My code is working now, let's write it in a serious language for the production version".

Look at Scheme anyway. We really can't say it's a language focused on speed or designed to crunch numbers. Well, look at Ikarus. Oh, yes, for number crunching, you might prefer Stalin. Even C looks slow when compared to Stalin.

Of course, this shouldn't be the main focus of the community, and I don't even think the language should be designed with speed in mind (well, a little of course, or else we would have dead slow numbers implemented with lists of symbols), but it should be seriously taken into consideration.

-----

3 points by Jesin 6084 days ago | link

Yes, exactly. I think a major cause of all this railing against optimizations is all the newbies who have just learned to write programs running around shouting about efficiency. I was one of those just a couple of years ago. The problem with these newbies is that they're naive. They decide that something is fast or slow based on how efficient its most naive implementation sounds. It seems that they grasp the vague idea that optimized code tends to be longer than code that is not optimized, but rather than responding to that by not trying to optimize until they know what parts of the code are slow, they respond by assuming that approaches that take more lines of code are more efficient and therefore better.

A misplaced focus on speed is bad, and you should get it working before you make it fast, but that doesn't mean speed is a non-issue. If a program is slow enough to cause annoyance, that is a problem, and it should be fixed. Languages have to pay extra attention to speed issues. If programs written in a language are slow because of the language, and not because the programs themselves are badly written, there's something wrong with the language.

Another thing that newbies don't get is that well-built languages are usually optimized so that the more obvious and more commonly-used approaches are actually faster than that tangled mass of "optimized" code you just wrote. Profiling profiling profiling. Don't just guess.

So, the points are: performance should be a secondary concern, but secondary is still pretty high up on the list, and optimization should be based on information gathered with a profiler, not what sounds efficient or inefficient. Sorry for rambling, I hope this post contributes something to something. I just have a tendency to spew everything I have to say about a topic all in one place every now and then, even if only some of it is relevant. I guess you could boil this post down to a "me too", but only if you boiled it a lot.

-----

3 points by almkglor 6086 days ago | link

Yes, but if we want quick prototyping, we also want transformation of the prototype to something we'll use, which might need to be faster than our mockup. Take for example Arki: the wikitext parser was written easily using a quick prototype on raymyers' treeparse, but in actual use, the wikitext parser was too slow. Unfortunately, there's no easy way to transform the parser structure to faster techniques (such as state machines) without doing it by hand, which rather undermines the purpose of the quick prototype. By optimizing the underlying parser, however, the prototype was still usable, and by exposing a few more particular combinations (such as nil-returning sequences, for when we only care that a syntax exists or doesn't exist, not the individual characters of the syntax), we can refine the prototype into something faster.

-----

2 points by stefano 6086 days ago | link

Ikarus is a compiled scheme implementation. Porting Arc to it would lead to a real speed gain, without rewriting ac.scm in Common Lisp.

-----

2 points by sacado 6086 days ago | link

I tried that. Not very easy, as Ikarus' hash-tables don't work like mzscheme's (there are no "equal" hash-tables). You would have to rewrite the reader too. And I'm not sure we would get all the stuff with sockets and networking. It doesn't have an FFI yet, either.

Well, actually, I'm not sure if implementing a still-being-designed language on the beta release of a compiler is a long-term solution... :) Maybe in a few months / years this could be done? But as for now, I think porting it to CL would be easier.

-----

4 points by stefano 6086 days ago | link

If you look at the long term, the best (and most difficult) solution would be to write an Arc compiler that translates Arc code directly into machine code.

-----

1 point by almkglor 6086 days ago | link

I also suggest that. In fact, looking at Chicken's implementation - stack == heap - is rather inspiring, because it shows exactly how a garbage-collected memory manager should be done: just decrement a pointer, in this case the stack pointer. Brilliant IMO. Wish I'd thought of that.

-----

4 points by raymyers 6088 days ago | link | parent | on: IDEA: Syntax checker

Macros that throw syntax errors at macro-expansion -- not a bad idea. I had actually experimented a bit with this.

  arc> (load "lib/treeparse.arc")
  nil
  arc> (let (s clause forms) nil
         (= forms (filt [list:cons 'do _] (many anything)))
         (= clause (filt car (list (seq anything forms))))
         (= s (filt [cons 'if _] (many clause)))
         (mac cond clauses (parse-all s clauses)))
  #3(tagged mac #<procedure>)

  arc> (cond nil 1 t 2)
  Parse error: extra tokens '(nil 1 t 2)
  nil

  arc> (cond (nil 1) (t 2))
  2
Not the most meaningful error message in the world, but it's a start.

-----

1 point by stefano 6087 days ago | link

Interesting. This could eventually evolve.

-----

1 point by raymyers 6086 days ago | link

As I have remarked before, treeparse was influenced by Haskell's parsec. One of the features of parsec is the optional infusion of more meaningful error messages into grammars. One could easily imagine adding this functionality to treeparse.

-----

1 point by almkglor 6086 days ago | link

Interesting. I suppose this means that (at least for the cps version of treeparse) a fail must also "return" an error value (potentially just an "Expected lit %c, not found"). Of course 'alt parsers would ignore the error message, and generate their own when all alternatives fail ("No alternatives found"). Hmm. I think if the fail continuation returned an error message, we could create a filt-style parser (for the cps case):

  (def onerr (parser msg)
    (fn (remaining pass fail)
      (parser remaining pass
        (fn (_) ; ignored message
          (fail msg)))))
For the monadic version, our old nil-as-fail code would have to be changed, I suspect; basically, instead of the old type Return = Return parsed remaining actions | nil, we'll need type Return = Return parsed remaining actions | Failed message.
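
In Arc terms, the change might look something like this (purely a sketch, not committed code):

  ; tag results explicitly instead of using nil as failure
  (def return (parsed remaining actions)
    (list 'ok parsed remaining actions))

  (def failed (msg)
    (list 'failed msg))

  ; callers dispatch on the tag rather than testing for nil
  (def parse-ok (r)
    (is (car r) 'ok))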

-----

1 point by raymyers 6089 days ago | link | parent | on: Profiler on Anarki

Dang, I thought I fixed that bug :P

The last line of profiler-exit is supposed to prevent that.

  (def profiler-exit ()
    (let (name ms) (pop *profiler-stack)
      (let total (- (msec) ms)
        (++ (*profiler-times name)
            (- total *profiler-child-time))
        (= *profiler-child-time
           (if *profiler-stack total 0)))))
Did you do something weird, like Control-C out of a profiled call?

-----

1 point by almkglor 6089 days ago | link

No. But if you're talking about weird stuff, well yes: bolded-text and italicized-text also refer back to formatting via delay-parser:

        (*wiki-pp italicized-text
          (seq italics
               (filt in-italics
                     (many:seq (cant-see italics) (delay-parser formatting)))
               italics))
   ...

        (*wiki-pp formatting
          (alt
            (pred [let c (car _)
                    (or (alphadig c)
                        (whitec c)
                        (in c #\, #\.))]
                  anything)
            wiki-link
            bolded-text
            italicized-text
            ampersand-coded-text
            nowiki-text
            (filt [cons (escape:car _) nil] anything)))

-----

1 point by raymyers 6088 days ago | link

Crikey! Do you think delay-parser is the issue? Let me know if you can reproduce this.

-----

1 point by almkglor 6088 days ago | link

I don't think it's 'delay-parser, per se, just the interrecursion: 'formatting calls 'bolded-text calls 'formatting. I think this is what's making the profiler give weird numbers.

-----

1 point by raymyers 6086 days ago | link

I am having trouble replicating your problem with mutual recursion. I get the expected result with this mutually recursive test. Can you give me an example that reports incorrectly?

  (def foo (li)
    (if li (do (sleep 0.25) (bar (cdr li)) (sleep 0.25))
        (sleep 2)))

  (def bar (li) 
    (if li (do (sleep 0.25) (foo (cdr li)) (sleep 0.25))
        (sleep 1)))

  (profile foo bar)
  (foo '(a a a a a a a))
  (profiler-report)

-----

1 point by almkglor 6086 days ago | link

Hmm. I'll check again later. In the meantime, what I did in the great-grandparent post was just this, on a fresh start:

1. turn on profiling in wiki-arc/wiki-arc.arc, by setting the variable whose name escapes me at the moment to t

2. Load wiki-arc/wiki-arc.arc and (thread (asv)) the server.

3. Load a single test page, with various formatting, including '''a bolded text's attempt to ''confuse the parser''''', [[link]]s, [[sample link|links with different text]]s etc. etc.

4. (profiler-report)

I'll check later on it, and possibly also post the sample wikitext.

-----

1 point by almkglor 6086 days ago | link

Even weirder now:

  ((alt-r 2) (many-r 1) (formatting 0) (close-br 0) (italics 0) (arc-code 0) 
  (wiki-link 0) (open-br 0) (arc-tag-e 0) (bolded-text 0) (nowiki-e 0) 
  (many-format 0) (elided-white 0) (p-alphadig 0) (italicized-text 0) 
  (ampersand-coded-text 0) (ampersand-codes 0) (arc-tag 0) 
  (nowiki-text 0) (nowiki 0) (bold 0) (seq-r -3))
This from a page which Arki reports as having been generated in 1666msec. It's funny, because the total time is 0.

Can you try reproducing the above setup? Maybe it's the threading thing which is bollixing it?

-----

1 point by raymyers 6089 days ago | link | parent | on: Profiler on Anarki

Interesting. Wouldn't the continuation approach be slower though?

-----

1 point by almkglor 6089 days ago | link

Not sure; might actually be faster, since supposedly Scheme is much more continuation-friendly than lazy-evaluation-friendly. Or not, I'm not sure.

Anyway I haven't managed to wrap my head around how to implement 'seq in a continuation style; also, there's the problem of how 'cut on a parser would work in a nonlazy language - supposedly as soon as the parser given to 'cut consumes anything, 'cut will somehow dispose of the 'fail continuation. I'd assume 'cut will be nearer to cut-seq, which would approximately be a cut:seq -

  (def cut-seq (parser . rest)
    (let next (apply seq rest)
      (fn (remaining pass fail)
        (parser
          ; since the name "remaining" is reused below,
          ; a smart compiler should be able to
          ; dispose of the outer function's remaining
          remaining
          (fn (parsed remaining actions)
            (next remaining
                  (fn (parsed2 remaining actions2)
                      (pass (+ parsed parsed2)
                            remaining
                            (+ actions actions2)))
                  [err "parse error!"]))
          fail))))
Since fail is disposed upon continuing to next (which is the rest of the sequence) everything which fail holds could then be reclaimed. 'alt's pass is its own pass, but its 'fail contains a closure.

-----

1 point by almkglor 6089 days ago | link

The nice thing here too is that we completely lose the 'return structure. The current code keeps composing and decomposing it anyway. Hmm:

  (def filt (fun parser)
    (fn (remaining pass fail)
      (parser remaining
        (fn (parsed remaining actions)
          (pass (fun parsed) remaining actions))
        fail)))
If we assume that the base compiler isn't very optimizing (i.e. captures closures when it doesn't need to) we could use helper functions for new 'pass functions:

  (let new-pass
       (fn (fun pass)
         (fn (parsed remaining actions)
           (pass (fun parsed) remaining actions)))
    (def filt (fun parser)
      (fn (remaining pass fail)
        (parser remaining
                (new-pass fun pass)
                fail))))

-----

2 points by almkglor 6089 days ago | link

I've just built a continuation-approach parser combinator library, pushed on the git. It's currently faster on 'many and 'seq, but slower on 'alt. It's considerably faster on Arki's use case, however (about 33% less time).

-----

2 points by raymyers 6088 days ago | link

From the comments: "...continuation-passing style, which is hard on the eyes..."

You weren't kidding :) However, I congratulate you on your superior understanding of continuations.

-----

1 point by almkglor 6088 days ago | link

Thanks. And #2 wasn't kidding either.

That said I think the main issues slowing down the monadic treeparse are:

1. Use of 'return. I'm not sure if mzscheme can optimize this away (it's just being used as a multiple-value return, and is immediately destructured by the calling function). Heck, I'm not even sure Arc destructuring is in a form optimizable by Scheme. Using continuations, this tuple becomes the parameters of the call to the continuation, and that I'm somewhat convinced that Scheme (all Schemes) are capable of optimizing. Also, by avoiding explicit destructuring, we can be reasonably sure that Arc gives the parameter list plain to Scheme (not that I've actually studied ac.scm much).

2. Copying of 'parsed. Unfortunately, we have to copy 'parsed, because of 'actions! This is because 'sem attaches the 'parsed part to the 'actions field (via closure), and so we can't reuse this 'parsed part (alternatively we could just copy 'parsed inside 'sem, instead of copying on 'many and 'seq). IMO: drop 'actions ^^. Copying 'parsed, I suspect, may be the single most time-consuming part of the monadic treeparse.

3. Using a 'tconc cell, while neater, is less efficient (about 20% IIRC) than explicit and separate head+tail variables. My goal in optimizing treeparse was to maintain legibility while adding speed; my goal in cps_treeparse was to optimize, legibility be damned^H^H^H^H^H^H^H consigned to hell.

4. There are a few more bits and pieces that may be optimizable - for example, most of my 'filt usage is (filt [map something _] parser). If we are sure that 'parsed will not need to be reused (assured by dropping 'actions, or giving the responsibility of copying 'parsed to 'sem) then we can define a (filt-map something parser) which returns the same list as 'parser, but with its cars transformed by 'something:

  ;monadic
  (def filt-map (fun parser)
    (fn (remaining)
      ; no actions!
      (iflet (parsed remaining) (parser remaining)
        (do ((afn (p)
               (when p
                 (zap fun (car p))
                 (self (cdr p))))
             parsed)
            (return parsed remaining)))))

  ; cps
  (def filt-map (fun parser)
    (fn (remaining pass fail)
      (parser
        (fn (parsed parsedtl remaining)
          ((afn (p)
             (when p
               (zap fun (car p))
               (self (cdr p))))
           parsed)
          (pass parsed parsedtl remaining))
        fail)))
Using filt-map would reduce consing, and especially in the cps case would remove the scan through 'parsed to find the 'parsedtl.

Further optimizations abound: another common idiom I found in my wiki parser was (filt nilfn (seq-str "something")), which basically ignores the 'parsed return value of 'seq. It may be possible to create a variant of 'seq, 'seq-nil, which wouldn't bother catenating the 'parsed return values of subparsers, along the lines of the sketch below.
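
A sketch of 'seq-nil, staying in the actions-free monadic style of the filt-map above (again just illustrative):

  ; run each subparser in order but discard all parsed values:
  ; no catenation, no consing
  (def seq-nil parsers
    (fn (remaining)
      ((afn (ps rem)
         (if (no ps)
             (return nil rem)
             (iflet (parsed rem2) ((car ps) rem)
               (self (cdr ps) rem2))))
       parsers remaining)))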

So no, I don't think continuation-passing per se causes most of the speed difference, except possibly for eliminating the 'return structure.

Another possible issue is the use of a list-like structure. 'scanner-string also conses on each 'cdr, and the total memory usage actually exceeds (coerce "string" 'cons) - it's just that scanners are sexier ^^.

For this case it may be possible to create a variant of the CPS style which takes a tuple of (string, index, end) - as explicit and separate variables of course, so that underlying scheme has a shot at optimizing it away - and use indexing, which I suspect is faster on strings (and which scanner-string ends up doing anyway). This would remove the need for consing too.
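
For illustration, a literal-character parser in that style might look like this (hypothetical - nothing like it exists in treeparse yet):

  ; cps over (string, index, end): no scanner, no consing;
  ; pass receives only the next index
  (def lit-c (c)
    (fn (s i end pass fail)
      (if (and (< i end) (is (s i) c))
          (pass (+ i 1))
          (fail "literal not found"))))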

-----

1 point by almkglor 6088 days ago | link

LOL forgot to give the parser the list to parse:

  ; cps
  (def filt-map (fun parser)
    (fn (remaining pass fail)
      (parser remaining ;oops!
        (fn (parsed parsedtl remaining)
          ((afn (p)
             (when p
               (zap fun (car p))
                (self (cdr p))))
           parsed)
          (pass parsed parsedtl remaining))
        fail)))

-----


Great to see someone using treeparse. Using scanner-string to view strings as lists is an approach I had never considered. As nex3 said, scanners are a surprisingly useful abstraction, showing yet more uses for lazy evaluation.

-----

1 point by almkglor 6091 days ago | link

That said, the formatter in wiki-arc is plenty darned slow on & codes (I used a really long 'alt form). Lacking a profiler, I can't really tell whether it's treeparse or scanner-string which is slow. Any suggestions on optimization? I think one problem is that each parser returns a list of stuff, without any chance to reuse that list (especially for 'seq forms). Another problem might be that scanner-string is just too slow for easy usage.

-----

1 point by raymyers 6091 days ago | link

I haven't fully digested the wiki parser yet. As a first thought, the use of enclose-sem confuses me a bit -- seems like a reinvention of filt. I doubt that would be a performance issue of course.

Maybe the grammar should be factored, or maybe this is an opportunity to optimize treeparse.

Try this: convert the string to a normal list of characters in advance, and see if the performance improves:

  (map idfn (scanner-string "hello world"))
If that doesn't help, then scanner-string is not the problem.

-----

3 points by almkglor 6091 days ago | link

Hmm, sorry, I didn't fully digest 'filt either. I thought 'filt was for transforming the 'parsed field in the return value? 'enclose-sem is intended to act on the 'actions field in the return value (although I did end up passing the entire return value).

The main slowdown occurred on adding & codes. Here's my most degenerate case:

&Agrave; &Aacute; &Acirc; &Atilde; &Auml; &Aring; &AElig; &Ccedil; &Egrave; &Eacute; &Ecirc; &Euml; &Igrave; &Iacute; &Icirc; &Iuml; &Ntilde; &Ograve; &Oacute; &Ocirc; &Otilde; &Ouml; &Oslash; &Ugrave; &Uacute; &Ucirc; &Uuml; &szlig; &agrave; &aacute; &acirc; &atilde; &auml; &aring; &aelig; &ccedil; &egrave; &eacute; &ecirc; &euml; &igrave; &iacute; &icirc; &iuml; &ntilde; &ograve; &oacute; &ocirc; &oelig; &otilde; &ouml; &oslash; &ugrave; &uacute; &ucirc; &uuml; &yuml; &iquest; &iexcl; &sect; &para; &dagger; &Dagger; &bull; &ndash; &mdash; &lsaquo; &rsaquo; &laquo; &raquo; &lsquo; &rsquo; &ldquo; &rdquo; &trade; &copy; &reg; &cent; &euro; &yen; &pound; &curren; x&sup1; x&sup2; x&sup3; &alpha; &beta; &gamma; &delta; &epsilon; &zeta; &eta; &theta; &iota; &kappa; &lambda; &mu; &nu; &xi; &omicron; &pi; &rho; &sigma; &sigmaf; &tau; &upsilon; &phi; &chi; &psi; &omega; &Gamma; &Delta; &Theta; &Lambda; &Xi; &Pi; &Sigma; &Phi; &Psi; &Omega; &int; &sum; &prod; &radic; &minus; &plusmn; &infin; &asymp; &prop; &equiv; &ne; &le; &ge; &times; &middot; &divide; &part; &prime; &Prime; &nabla; &permil; &deg; &there4; &alefsym; &oslash; &isin; &notin; &cap; &cup; &sub; &sup; &sube; &supe; &not; &and; &or; &exist; &forall;

Takes about 5-6 seconds to render; also, if I do something on the repl (such as searching using (help "searchstring"), or loading some random library) the parsing takes up to 12 seconds. Anyway I've added a rendering time at the lower right of each rendered page. Note also that I've added caching; to disable caching you'll need to search through wiki-arc.arc for *wiki-def and change the (cached-table) call to (cached-table 'cachetime 0).

-----

3 points by raymyers 6091 days ago | link

I think I've found the problem. Calling a many parser on a list that long (over 1000 characters) takes waaay too long.

Even something simple like this takes around 6 seconds on my machine:

   ((many anything) (range 1 1000))
If treeparse is going to be feasible for parsing large strings, the basic combinators like maybe and alt will need to speed up.

-----

1 point by almkglor 6091 days ago | link

I tried the following modification on 'many:

  (def many (parser)
    "Parser is repeated zero or more times."
    (fn (remaining) (many-r parser remaining nil nil nil nil)))

  (let lastcdr (afn (p) (aif (cdr p) (self it) p))
    (def many-r (parser li acc act-acc acctl act-acctl)
      (iflet (parsed remaining actions) (parse parser li)
             (do
               (when parsed
                 ; edit: necessary, it seems that some of the other
                 ; parsers reuse the return value
                 (zap copy parsed)
                 ; end of edit
                 (if acc
                     (= (cdr acctl) parsed)
                     (= acc parsed))
                 (= acctl (lastcdr parsed)))
               (when actions
                 ; edit: necessary, it seems that some of the other
                 ; parsers reuse the return value
                 (zap copy actions)
                 ; end of edit
                 (if act-acc
                     (= (cdr act-acctl) actions)
                     (= act-acc actions))
                 (= act-acctl (lastcdr actions)))
               (many-r parser remaining
                       acc act-acc acctl act-acctl))
             (return acc li act-acc))))
Basically instead of using join, I used a head+tail form of concatenating lists. It seems to work, and the optimization above seems to drop the test:

   ((many anything) (range 1 1000))
down to 27 msec (edited: 58msec) on my machine (it was about 7350msec on the older version)

What are your thoughts? The code now looks unprintable. Also, I'm not 100% sure of its correctness.

UPDATE: yes, it's not correct, however the edited version above seems to work now. Rendering of my "difficult" page has dropped to 1100msec.

-----

2 points by almkglor 6091 days ago | link

I've since added a 'tconc facility to Anarki. Basically tconc encapsulates away the head+tail form of list catenation; a single cons cell is used with car==head and cdr==tail.

The head of the list is the start of the list, while the tail of the list is the last cons cell:

  cons
    O->cdr
    v
    car

  the list (1 2 3 4 5):
  O->O->O->O->O->nil
  v  v  v  v  v
  1  2  3  4  5

  the tconc cell for the above list:
  tconc cell
  O-----------+
  | head      | tail
  v           v
  O->O->O->O->O->nil
  v  v  v  v  v
  1  2  3  4  5
'tconc creates a new cell and modifies the tconc cell to repoint the tail to the new tail. You can extract the currently concatenated list by using 'car on the tconc cell.
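
In code, the facility amounts to something like this (a sketch; the actual lib/tconc.arc may differ in details):

  (def tconc-new ()
    "Make an empty tconc cell: car is the list head, cdr its last cons."
    (cons nil nil))

  (def tconc (cell val)
    "Append val to the list held in cell, in constant time."
    (let new (cons val nil)
      (if (car cell)
          (= (cdr (cdr cell)) new)  ; splice after the current tail
          (= (car cell) new))       ; first element: set the head
      (= (cdr cell) new)
      cell))

  (def nconc (cell li)
    "Catenate the whole list li onto cell, updating the tail."
    (when li
      (if (car cell)
          (= (cdr (cdr cell)) li)
          (= (car cell) li))
      (= (cdr cell) ((afn (p) (aif (cdr p) (self it) p)) li)))
    cell)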

The diff between my version of treeparse and yours is now:

  --- treeparse.arc     2008-03-21 11:59:13.000000000 +0800
  +++ m_treeparse.arc   2008-03-22 23:00:51.000000000 +0800
  @@ -23,4 +23,6 @@
   ; Examples in "lib/treeparse-examples.arc"
   
  +(require "lib/tconc.arc")
  +
   (mac delay-parser (p)
     "Delay evaluation of a parser, in case it is not yet defined."
  @@ -112,12 +114,12 @@
   (def many (parser)
     "Parser is repeated zero or more times."
  -  (fn (remaining) (many-r parser remaining nil nil)))
  +  (fn (remaining) (many-r parser remaining (tconc-new) (tconc-new))))
   
   (def many-r (parser li acc act-acc)
     (iflet (parsed remaining actions) (parse parser li)
            (many-r parser remaining
  -                 (join acc parsed) 
  -                 (join act-acc actions))
  -         (return acc li act-acc)))
  +                 (nconc acc (copy parsed))
  +                 (nconc act-acc (copy actions)))
  +         (return (car acc) li (car act-acc))))
   
   (def many1 (parser)
edit: note that use of 'tconc/'nconc is slightly slower than explicitly passing around the tails. For the test, it runs at 79 msec on my machine (explicit passing ran at 58msec); this is expected, since we must destructure the cons cell into the head and tail of the list under construction. Would it perhaps be better to use a macro to hide the dirty parts of explicitly passing hd and tl? Something like the sketch below, perhaps.
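
For what it's worth, such a macro might look like this (hypothetical sketch):

  ; hide the head+tail bookkeeping of catenating a list onto an
  ; accumulator; hd and tl are the caller's own variables
  (mac cat-onto (hd tl li)
    (w/uniq new
      `(let ,new ,li
         (when ,new
           (if ,hd
               (= (cdr ,tl) ,new)
               (= ,hd ,new))
           (= ,tl ((afn (p) (aif (cdr p) (self it) p)) ,new))))))

  ; many-r's body could then read:
  ;   (cat-onto acc acctl (copy parsed))
  ;   (cat-onto act-acc act-acctl (copy actions))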

-----

1 point by raymyers 6090 days ago | link

Nice optimization. I'm not so sure about the naming of nconc, though. Although it is used for a similar purpose as the traditional CL nconc, I would expect anything called nconc to behave like this:

  (def last-list (li)
    (if (or (no li) (no (cdr li))) li
        (last-list (cdr li))))

  (def nconc (li . others)
    "Same behavior as Common Lisp nconc."
    (if (no others) li
        (no li) (apply nconc others)
        (do (= (cdr (last-list li)) (apply nconc others))
            li)))

-----

1 point by almkglor 6090 days ago | link

Ah crick; let me change that to lconc, that was what I was thinking ^^

I picked up 'tconc and lconc from Cadence Skill; see:

http://www.ece.uci.edu/eceware/cadence/sklanguser/chap8.html...

Funny that CL doesn't actually have this facility ^^

Will rename this soon. In the meantime, do you think this optimization is worth putting in treeparse?

-----

4 points by raymyers 6090 days ago | link

>> do you think this optimization is worth putting in treeparse?

Certainly. At the moment you are probably 50% of the treeparse user base, so it needs to be fast enough for your use case :)

I admit that efficiency wasn't a big thought when I first wrote treeparse (besides avoiding infinite loops -- hopefully those are gone now...). I fondly remember my CL optimization days... we've gotta make ourselves one of those nifty profilers for Arc.

-----

3 points by almkglor 6090 days ago | link

>> we've gotta make ourselves one of those nifty profilers for Arc.

True, true. I was optimizing random functions in treeparse, but didn't get a boost in speed until you suggested optimizing 'many.

-----

1 point by almkglor 6090 days ago | link

It seems that 'many is the low-hanging fruit of optimization. I've since gotten an 8-paragraph lorem ipsum piece, totalling about 5k, which renders in 3-4 seconds (around 3800msec).

Hmm. Profiler.

I'm not 100% sure, but maybe the fact that nearly all the composing parsers decompose the return value of sub-parsers, then recompose the return value, might be slowing it down? Maybe having parsers accept an optional return-value argument, which 'return will fill in (instead of creating its own), might significantly reduce memory consumption (assuming it's GC which is slowing it down)?

mockup:

  (def parser-function (remaining (o retval (list nil nil nil)))
    (....)
    (return parsed li actions retval))

  (def many-r (parser remaining acc act-acc (o retval (list nil nil nil)))
      (while (parse parser remaining retval)
        ; parsed
        (lconc acc (copy (car retval)))
        ; actions
        (lconc act-acc (copy (car:cdr:cdr retval)))
      (= remaining (car:cdr retval)))
      (return (car acc) remaining (car act-acc) retval))
Removing 'actions might help too - we can now use just a plain 'cons cell, with car == parsed and cdr == remaining.

-----

1 point by raymyers 6090 days ago | link

I tried taking out actions for the heck of it. Removing them yields roughly a 30% speed increase on this benchmark:

  (time (do ((many anything) (range 1 5000)) nil))
Using the following method, we can keep actions as a feature but still get the 30% speedup when we don't use them.

  (def many (parser)
    "Parser is repeated zero or more times."
    (fn (remaining) (many-r parser remaining (tconc-new) nil)))

  (def many-r (parser li acc act-acc)
    (iflet (parsed remaining actions) (parse parser li)
           (many-r parser remaining
                   (lconc acc (copy parsed))
                   (if actions (join act-acc actions) act-acc))
           (return (car acc) li act-acc)))
Not bad, but still not as fast as we'd want for processing wiki formatting on the fly...

ed: Yes. act-acc, not (car act-acc).

-----

1 point by almkglor 6090 days ago | link

Hmm. If you remove 'actions, how about also trying to use just a single 'cons cell:

  (iflet (parsed . remaining) (parse parser remaining)
    ...)

  (def return (parsed remaining)
    (cons parsed remaining))
?

If the speed increase is that large on that testbench, it might very well be due to garbage collection.

This might be an interesting page for our problem here ^^

http://www.valuedlessons.com/2008/03/why-are-my-monads-so-sl...

-----

1 point by raymyers 6090 days ago | link

Tried changing the list to a single cons cell. I did not see any additional performance boost.

-----

1 point by almkglor 6090 days ago | link

  (def many-r (parser li acc act-acc)
    (iflet (parsed remaining actions) (parse parser li)
           (many-r parser remaining
                   (lconc acc (copy parsed))
                   (if actions (join act-acc actions) act-acc))
           (return (car acc) li (car act-acc))))
s/(car act-acc)/act-acc maybe?

Personally I don't mind losing 'actions, it does seem that 'filt would be better ^^.

-----

1 point by almkglor 6090 days ago | link

I tested this on my 8-paragraph 5000-char lorem ipsum page, and the run dropped down to about 3400msec (from 3800 msec).

Hmm. Not sure where the slow down is now ^^

I've tried my "retval" suggestion and it's actually slower, not faster. So much for not creating new objects T.T;

-----

1 point by almkglor 6090 days ago | link

Arrg, I've built a sort-of profiler for the wiki; it's on the git. To enable it, just look for the line:

  (= *wiki-profiling-on nil)
And change it to t, then reload Arki to turn it on. Then use (*wiki-profile-print) to print out the profile report.

Note that turning on profiling increases time by a factor of > 5. Don't use unless desperate.

Anyway a sample run - this page was rendered in about 800msec without profiling, with profiling it took about 5150msec:

  bold: 305
  bolded-text: 914
  nowiki-e: 31
  open-br: 305
  seq-r: 2681
  italicized-text: 793
  many-format: 4830
  plain-wiki-link: 841
  alt-r: 4555
  nowiki-text: 584
  nowiki: 184
  joined-wiki-link: 413
  ampersand-coded-text: 486
  ampersand-codes: 171
  italics: 128
  many-r: 4829
  formatting: 4616
  close-br: 39
Note that the timing will not be very accurate or particularly useful IMO, since it doesn't count recursion but does count calls to other functions. Sigh. We need a real profiler ^^

-----

2 points by raymyers 6089 days ago | link

>> Sigh. We need a real profiler ^^

Maybe this'll help. http://www.arclanguage.org/item?id=5318

-----

1 point by raymyers 6090 days ago | link

Knowing a bit about the call hierarchy, maybe we can squeeze a bit more knowledge out of that. Here's what seems to be going on:

    formatting               4616
    [-] alt-r                4555
     | bolded-text            914
     | plain-wiki-link        841
     | italicized-text        793
     | nowiki-text            584
     | ampersand-coded-text   486
     | joined-wiki-link       413

-----

1 point by almkglor 6089 days ago | link

Hmm, then the total time of alt-r's children is 4031, leaving 524 msec in alt-r itself.

My test page has quite a bit of bolded text (for testing), so I suppose it's the reason why bolded-text is the highest. Hmm.

Anyway I'm thinking of adding the following parser to the top of the big 'alt structure in formatting:

  (= plain-text
    (pred [or (alphadig _) (whitec _) (in _ #\. #\,)] anything))

  (= formatting
    (alt
      plain-text
      ...))
Hmm. It seems we can't squeeze much performance out of 'alt - I can't really see a way of optimizing 'alt itself - so possibly we should optimize the grammar that uses 'alt.

-----

1 point by almkglor 6089 days ago | link

Did something highly similar to this; it reduced my 8-paragraph lorem ipsum time from about 3200msec to 2100msec.

-----

1 point by raymyers 6091 days ago | link

True, filt doesn't touch the actions field, while sem does. However, I am usually able to replace the use of actions with filters that operate on the parsed field. I prefer this because the filter style is a cleaner, more functional style -- rather than relying on side-effects. Hopefully that made sense.

I don't yet know for certain whether filters could or should be used in this particular case. enclose-sem might be the right way to go after all.

-----

1 point by almkglor 6091 days ago | link

I'll have to defer to you on this one - I've only written a parser combinator type parser once before, and that was before I learned what it was. I did end up using something nearer to filters (i.e. acts on the returned value instead of having a separate 'actions field).

Edit: Perhaps part of the optimization of treeparse could be to eliminate the 'actions field. None of your samples used 'sem, and I might prefer 'filt for such enclosing stuff as '''bold''' and ''italics''. The [[link]]s might be more difficult (because of the necessity of adding alphadigs after the closing bracket in [[link]]s to the actual link text but not the link target), but I think it's doable.

-----

2 points by raymyers 6090 days ago | link

I've thought several times of removing sem in favor of filt. As features, they are very similar but filt is usually cleaner. If I don't see a compelling use case for actions soon I may indeed remove them. This would simplify the interface considerably.

Just between us, filt is actually an analogue to Haskell's monadic lift operator. They're even anagrams! (this happened by accident.)

-----

1 point by almkglor 6090 days ago | link

Okay, I've since ported the Arki wikiformat parser to use filt instead of sem. Removing 'actions would slightly reduce the memory consumption of treeparse.

-----

1 point by almkglor 6091 days ago | link

>> Try this: ...

Doesn't help. I inserted (map idfn ...) on the value given to enformat, which is really the currying of enformat-base (the very short (fn ) at the end):

  (fn (p)
    (carry-out (parse (many formatting) (map idfn p)))))
Hmm. I tried refactoring the (apply alt (map str-seq ...)) thing to an alt-str-seq: basically I created a trie and then did alt on each key, with each sub-trie another 'alt, stored sequences converted to 'seq, and end-of-strings (nil keys) stored as 'nothing - but it was buggy and didn't improve performance T.T.

I've made quite a few attempts at improving performance, but I often ended up with more bugs, waa!

-----

2 points by raymyers 6091 days ago | link

Hmm. I just tried replacing the scanner. Just as you also found, it didn't help performance.

  (def scanner-string2 (s (o start 0) (o end (len s)))
    (map idfn (scanner-string s start end)))
I'll see if I can find time soon to digest the wiki-arc grammar. Possibly it could be optimized, but I suspect treeparse could be handling it better. There are probably lessons to be learned from how Parsec gets good performance.

-----

1 point by almkglor 6090 days ago | link

Found the parsec paper:

http://legacy.cs.uu.nl/daan/download/papers/parsec-paper.pdf

http://legacy.cs.uu.nl/daan/pubs.html#parsec

It seems that part of Parsec's optimization is that it is actually limited to an LL(1) grammar (whatever that means), and its <|> ('alt) will fail immediately if the first parser consumed any input at all. Not sure how that translates to treeparse.
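
Maybe something like this, in the cps style (a sketch, and it assumes 'fail also gets handed the position where the failure happened, which our fail doesn't do yet):

  ; try p1; if it failed without consuming input, try p2;
  ; otherwise commit to p1's failure, Parsec-style
  (def alt2-committed (p1 p2)
    (fn (remaining pass fail)
      (p1 remaining pass
          (fn (msg failpoint)
            (if (is failpoint remaining)
                (p2 remaining pass fail)
                (fail msg failpoint))))))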

-----

1 point by raymyers 6090 days ago | link

LL(1) grammars only require one token of look-ahead to parse.

Parsec does not strictly require this, it can handle infinite look-ahead grammars. However, for good performance, it is best to use LL(1) grammars -- so there will be no backtracking required.

When using Parsec, I have often been surprised by the quick-failing behavior of <|> that you mentioned. Thus, I did not duplicate it in treeparse.

-----

1 point by almkglor 6089 days ago | link

Hmm. Apparently it's the fast way of doing it, although I'm not sure how to implement it in the first place.

-----

6 points by raymyers 6104 days ago | link | parent | on: Will Arc ever be as fast as CL?

Since Scheme and CL are both specifications, it would be more meaningful to ask "Is MzScheme slower than SBCL?", for instance.

-----

3 points by KirinDave 6104 days ago | link

And to expand on that, the answer is generally yes. MzScheme is a great version of Scheme for its completeness, not its speed. However, several Schemes are directly competitive with SBCL and CMUCL. Arc could be ported to one of these without too much effort.

-----

3 points by sacado 6104 days ago | link

As for speed, Stalin Scheme is amazing, for example, though not very complete or well documented. However, CL is full of "efficiency hacks". Scheme is full of purity, not always easy to implement efficiently. Optimizing CL is easier than Scheme, IMHO.

-----

2 points by sramsay 6103 days ago | link

Why do you think CL is easier to optimize than Scheme? This isn't a hostile question; I'm just curious. Intuitively, I feel like "purer" languages should be easier to optimize.

-----

2 points by sacado 6103 days ago | link

First, CL has in the standard many possibilities for making declarations regarding optimization: for the functions you want, you can compile code, declare types (e.g. this var only holds fixnums), declare that you want to optimize for speed and ignore type safety, etc. This way, you end up writing code the way you would write it in C. There is no such thing in the Scheme standard. Individual implementations could, of course, but as far as I know no one does.

Another example is call/cc. This is a very interesting beast, existing only in Scheme. But it is hard to implement efficiently.

The last example I can think of is 'nil. Using nil as both false and the empty list is very interesting in this regard: you can implement nil as the NULL pointer, which is also 0, the false boolean. Fewer manipulations to do on the bare metal. Distinguishing between #f and '(), on the contrary, implies making more tests at the lower levels.

There are other points I guess...

-----

3 points by almkglor 6103 days ago | link

re: call/cc - I think a bit of the Lambda the Ultimate series of papers eventually boils down to the realization that a machine-language jump-to-subroutine is equivalent to a call/cc, and the target of the call/cc just has to access the return address on the stack as a function address.
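
You can see the equivalence in Arc by making the return point explicit (illustrative only):

  ; direct style: "return" is a jump back to a saved address
  (def add1 (x) (+ x 1))

  ; continuation-passing style: the return point is just another
  ; argument, k
  (def add1-cps (x k) (k (+ x 1)))

  (add1 5)               ; => 6
  (add1-cps 5 [* _ 10])  ; => 60 - the caller supplied the continuation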

-----

3 points by kens1 6103 days ago | link

I don't get it. There's the whole stack copying for call/cc, so call/cc is much more expensive.

(I read the "Lambda the ultimate GOTO" paper you referenced earlier; it's about goto vs structured programming, not call/cc. As an aside, it's very interesting to reflect on just how controversial structured programming was.)

-----

4 points by soegaard 6103 days ago | link

Implementing call/cc efficiently has been well-researched in the Scheme community. For a very well-written account of a non-stack-copying implementation, see

R. Kent Dybvig. "Three Implementation Models for Scheme". PhD. Thesis. http://www.cs.indiana.edu/~dyb/papers/3imp.pdf

Then continue at ReadScheme at "Compiler Technology/Implementation Techniques and Optimization" to see further developments (Look especially for Clinger's papers).

http://library.readscheme.org/page8.html

-----

2 points by almkglor 6103 days ago | link

Who said anything about copying the stack? For that matter - who said local variables should be kept on the stack anyway?

-----

1 point by kens1 6103 days ago | link

In MIT Scheme, the stack gets copied; at least that's what I was told last week. Whether or not you use a stack, the state will need to be copied.

-----

1 point by almkglor 6103 days ago | link

In a function call, the state (the current computation being done) is saved anyway, and therefore "copied" if that is your preferred term. So ideally, call/cc should have the same overhead as an ordinary function call; the only difference is that in call/cc the continuation state is the value given to the function, while in a function call it's just one of the values given to the function.

Note however that many of the theoretical analyses of call/cc make a basic assumption of a "spaghetti stack", which would mean that partially unwound stacks would be saved implicitly as long as any continuation exists which refers to that stack, and all stacks themselves are subject to garbage collection. Most machines don't actually have a spaghetti stack and can't make a spaghetti stack anyway ^^. That said, a spaghetti stack could be implemented as a simple list, with push == cons and pop == cdr.

Alternatively store the local variables on a garbage-collected heap, and include a reference to the local variables with the continuation/return address (you'll probably need to save the pointer-to-local-variables anyway, since the target function is likely to use that pointer for its own locals). Again, no additional overhead over plain function calls, except perhaps to restructure the return address and the pointer-to-local-variables.

Don't know about MIT Scheme, but if I were to implement call/cc on stock hardware, compiling down to machine language, that's what I'd do ^^

-----

3 points by kens1 6102 days ago | link

I'm still totally not understanding your claim that call/cc should have the same overhead as an ordinary function call.

I read the Clinger "Implementation Strategies for Continuations" paper and they found call/cc about 10 times slower than function calls on the tak/ctak tests. I tried those tests on PLT Scheme and the overhead I saw is even worse: 0.7 seconds with function calls vs 51.8 seconds with continuations on (tak 24 16 8).

Clinger says about the stack strategy: "When a continuation is captured, however, a copy of the entire stack is made and stored in the heap. ... Variations on the stack strategy are used by most implementations of Scheme and Smalltalk-80."

-----

3 points by almkglor 6102 days ago | link

Components of function call: (1) put arguments somewhere (2) put return address somewhere (3) jump to function.

Components of call/cc: (1) put return address as argument somewhere (2) put return address somewhere (3) jump to function.

That said, continuations generally assume that "put X somewhere" means somewhere != stack. Yes, even return addresses are not intended to be on the stack when using continuations. The main problem is when compiling down to C, which always assumes that somewhere == stack. If you're compiling down to machine language where you have access to everything, then you can just ignore the stack, but not in C, which inherently expects the stack.

-----

3 points by sramsay 6103 days ago | link

Ah, I get it. I suppose typing is one of the big issues affecting speed. If the language standard insists on dynamic typing, there might be no way to get certain kinds of optimizations.

And yeah, call/cc is probably always going to be a bear. But man is it cool. :)

I suppose this goes against what a few of us (including me) were saying above -- that the language and the speed are really separate issues. Or maybe it's more coherent to say that language standards (as opposed to "languages" generally understood) can have a profound effect on speed. If they don't give implementors a lot of choice, they can box people into certain corners.

It's interesting that Scheme actually mandates optimization in at least one case (tail recursion). I don't know how many language standards make those kinds of demands, but I suspect there aren't many.

-----

2 points by sacado 6103 days ago | link

Tail recursion is interesting as it is not especially an optimization for speed but a way to make programmers rely primarily on functional programming: if you don't have it, functional programming rapidly becomes a dead end, as you can make the stack explode really fast. As a bonus, it is faster :)
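
A quick illustration in Arc:

  ; not a tail call: each element needs a stack frame, so a long
  ; enough list blows the stack without tail-call elimination
  (def len-r (li)
    (if (no li) 0 (+ 1 (len-r (cdr li)))))

  ; tail call: runs in constant stack space
  (def len-t (li (o acc 0))
    (if (no li) acc (len-t (cdr li) (+ acc 1))))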

-----

3 points by raymyers 6111 days ago | link | parent | on: Parser combinator library for trees

To show more of what is possible, "lib/treeparse-examples.arc" shows how to define the 20-line hcase2 macro, a pattern matching case statement that extends defpat with guard clauses and some infix sugar.

    (def union (< xs ys)
      "Merge two sorted lists, discarding duplicates."
      (hcase2 `(,xs ,ys)
        xs () = xs
        () ys = ys
        (x . xt) (y . yt)
        || x < y = x : (union < xt ys)
        || y < x = y : (union < xs yt)
        || t     = x : (union < xt yt)))

-----


I was skeptical, but I tried it out.

    (map [+ 1] li) ; does the same as the Haskell
     map (+ 1) li
But this has the added bonus of being able to sensibly curry variadic functions like + and map.

    (map [+ 1] '(1 2 3 4) '(1 2 3 4))  =>  (3 5 7 9)
    (apply [map +] '((1 2) (1 2) (1 2)))  =>  (3 6)
I'm sold :)

-----

2 points by almkglor 6111 days ago | link

([apply [map +]] '((1 2) (1 2) (1 2)))

-----
