Arc Forum
1 point by akkartik 5404 days ago | link | parent

So it seems you don't need full-blown html parsing for your scraper.


1 point by thaddeus 5404 days ago | link

Correct.

I just locate the subset of lines by finding start & end indicator points, then write a custom parser for that section. I might be wrong, but for my needs a full-blown html parser would be much slower, and I'm hitting the same file structure every time (for each stock).
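
For anyone following along, here is a rough sketch of that marker-based approach. It is in Python rather than Arc, and it is not the actual scraper: the marker strings, the sample page, and the "label: value" line format are all made up for illustration.

```python
def extract_section(lines, start_marker, end_marker):
    """Return the lines strictly between the first line containing
    start_marker and the next line containing end_marker.
    If the end marker never appears, everything after the start is returned."""
    section = []
    inside = False
    for line in lines:
        if not inside:
            # Skip everything until the start indicator shows up.
            if start_marker in line:
                inside = True
            continue
        if end_marker in line:
            break
        section.append(line)
    return section


def parse_rows(section):
    """Hypothetical custom parser: turn 'label: value' lines into a dict."""
    rows = {}
    for line in section:
        label, sep, value = line.partition(":")
        if sep:  # ignore lines that have no colon
            rows[label.strip()] = value.strip()
    return rows


# Made-up page contents; a real page would be read from the fetched file.
page = [
    "<html>",
    "<!-- quote-start -->",
    "price: 12.34",
    "volume: 1000",
    "<!-- quote-end -->",
    "</html>",
]

quote = parse_rows(extract_section(page, "quote-start", "quote-end"))
```

This only makes sense when the page layout is stable, as it is here; if the markers move or disappear, the extraction quietly returns the wrong lines or none at all.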

-----

2 points by akkartik 5404 days ago | link

Yes, that def seems reasonable.

Arc is missing an html parser; I may take care of that. It doesn't have to be written in arc, just callable from within arc.

-----

1 point by thaddeus 5404 days ago | link

I vaguely remember trying this out a long time ago... it may help.

http://github.com/nex3/arc/blob/arc2.master/lib/xml.arc

-----