Arc Forum
Project to learn Arc
4 points by jsgrahamus 4907 days ago | 9 comments
I figure I might better my learning of Arc by developing a project in it.

I want to be able to specify a URL, get that webpage, edit it, and display it in a browser.

So I suppose what I need my Arc program to do is: 1) Accept a URL 2) Download that webpage 3) Edit it 4) Call browser passing that edited page

Could you folks give me some hints on #1 and #4?

Thanks, Steve



6 points by thaddeus 4907 days ago | link

1 & 2. There are a few options:

There's an old Anarki library you can use to download a webpage. The library can be seen here: http://github.com/nex3/arc/tree/arc2.master/lib/http-get/. However, I once used that library for my personal market scanner (which downloads data for tens of thousands of stocks) and found it to be fragile. My suggestion is to use the 'system' command (http://files.arcfn.com/doc/os.html) to call 'curl' or 'wget'. See http://curl.haxx.se/ or http://www.gnu.org/software/wget/. These utilities download the webpage for you.

3. You will then need to 'readfile', see http://files.arcfn.com/doc/io.html

4. You'll need to edit the data, then re-serve the page, in which case you use 'defop' (http://files.arcfn.com/doc/srv.html) or you can write the file out to your static directory.

[edit] #4: If you choose to use the static directory, you may need to ensure Arc is set up to serve certain types of files (e.g. .js). See http://arclanguage.org/item?id=10620. Anarki has already taken care of this, but Arc proper has not.
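
The steps above could be sketched roughly as follows. This is only a sketch under assumptions: 'fetch-page', the file name "page.html", the op name 'edited', and the example URL are placeholders, and 'curl' is assumed to be on the PATH. It uses 'filechars' rather than 'readfile' because 'filechars' returns the raw text as one string, while 'readfile' reads s-expressions.

```arc
; Sketch only: the URL, file name, and op name below are placeholders.
(def fetch-page (url (o file "page.html"))
  ; -s hides curl's progress meter; -o names the output file.
  ; The URL reaches the shell unescaped, so pass only trusted values.
  (system (string "curl -s -o " file " " url))
  ; 'filechars returns the whole file as a single string.
  (filechars file))

; Re-serve the page at /edited; whatever the body prints becomes the response.
(defop edited req
  (pr (fetch-page "http://example.com/")))
```

Once the server is running (for example via (asv)), pointing a browser at /edited on the server's port should show the downloaded page; any editing would go between the fetch and the 'pr'.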

-----

1 point by parenthesis 4904 days ago | link

>> My suggestion is to use the 'system' command …

One just needs to be careful to avoid command injection vulnerabilities.
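
One possible guard, as a minimal sketch rather than a complete defense: 'shell-safe-url' is a hypothetical helper (not part of Arc) that accepts only http(s) URLs containing no single quote, then wraps the URL in single quotes so the shell treats it as one literal word.

```arc
; Hypothetical helper: returns the quoted URL, or nil if it looks unsafe.
(def shell-safe-url (url)
  (and (or (begins url "http://") (begins url "https://"))
       ; A single quote would end the quoting below, so reject it outright.
       (no (some #\' url))
       (string "'" url "'")))

; Usage: only call the shell when the check passes.
(def safe-fetch (url)
  (aif (shell-safe-url url)
       (system (string "curl -s -o page.html " it))
       (err "refusing suspicious URL")))
```

Requiring the http:// or https:// prefix also keeps a value that starts with "-" from being read as a curl option.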

-----

1 point by akkartik 4904 days ago | link

Eek, meant to upvote.

-----

1 point by akkartik 4904 days ago | link

Can you elaborate on how http-get is fragile?

-----

1 point by thaddeus 4904 days ago | link

My memory is a little foggy, but I remember a system fork error.

Hmmm... I also found the old related post and there was also another error: "PLT Scheme virtual machine has run out of memory; aborting Aborted"

http://arclanguage.org/item?id=11899

I know the nginx comments add a little confusion to the post (as I wasn't sure at the time what was going on), but I don't think Apache vs. nginx was the problem.

In retrospect it may not have been the library; it might have been Scheme. Nonetheless, when I switched to wget, the problem went away.

-----

2 points by zck 4907 days ago | link

I've been working on how to hit a webpage in Arc using the underlying Scheme tcp-connect function. This code -- Racket, not Arc -- will hit Google, get its source, and print it on standard out:

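  ; Needs (require racket/tcp) unless running under #lang racket.
  (let-values ([(from-server to-server) (tcp-connect "google.com" 80)])
    ; HTTP lines end in CRLF; "Connection: close" asks the server to close
    ; the connection after responding, so the read loop reaches EOF.
    (write-string "GET / HTTP/1.1\r\nHost: google.com\r\nConnection: close\r\n\r\n" to-server)
    (close-output-port to-server)
    ; Print each response line until the server closes the connection.
    (do ((response (read-line from-server) (read-line from-server)))
        ((eof-object? response))
      (write response)
      (newline)))

I've been having trouble getting it all to work in Arc by hacking ac.scm with some 'xdefs. I can get 'tcp-connect to read just fine, but 'eof-object doesn't seem to ever return true. I'm going to try this weekend, and I'll see what I can find. Here's the working 'xdef for tcp-connect: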

  (xdef tcp-connect (lambda (host port)
                      (let-values ([(from-server to-server)
                                    (tcp-connect host port)])
                        (list from-server to-server))))
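
On the Arc side, one thing that may explain the EOF trouble: Arc's own 'readc and 'readline return nil at end of input rather than a Scheme EOF object, so a loop can simply test for nil. A rough sketch, assuming the 'tcp-connect xdef above is loaded (the host and request are just the ones from the Racket example):

```arc
; Sketch: assumes the 'tcp-connect xdef above is loaded.
(let (from-server to-server) (tcp-connect "google.com" 80)
  ; HTTP/1.0 so the server closes the connection after responding.
  (disp "GET / HTTP/1.0\r\nHost: google.com\r\n\r\n" to-server)
  ; Closing the output port flushes the buffered request.
  (close to-server)
  ; 'readline returns nil at end of input, which ends the loop.
  (whilet line (readline from-server)
    (prn line))
  (close from-server))
```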

-----

2 points by rocketnia 4906 days ago | link

A while ago I tried to get www.google.com using Anarki's web.arc (https://github.com/nex3/arc/blob/master/lib/web.arc). It appended a spurious "?&" to the end of the URL, but once I hacked in a fix to keep that from happening, it worked just fine. (Well, I think I encountered one more issue occasionally: If there's absolutely no response, even headers, 'parse-server-headers blocks forever.)

To read the response body, web.arc just says (tostring (whilet line (readline in) (prn line))). Apparently there's EOF detection somewhere in this process....

Sorry I'm not providing working code, or even a reliable memory. XD I suppose I'll try things out again tonight and commit any fixes back to Anarki.

-----

1 point by rocketnia 4906 days ago | link

Oh, web.arc already worked with Google. Since the last time I tried this, I think Google must have updated to accept http://www.google.com/?&; as a legitimate home page URL.

Still, that spurious "?&" would probably break other pages, so I committed a fix: https://github.com/nex3/arc/commit/ea05adec8235fe3713b05dd9f...

Here's how to use web.arc:

  (load "lib/web.arc")
  (google "foo")              ; It comes with its own Google demo. :-p
  (get-url "www.google.com")

-----

1 point by rocketnia 4906 days ago | link

Hmm, you should not see a semicolon at the end of this line: http://&;

It seems to be a bug in the Arc Forum right now.

-----