Arc Forum | olavk's comments
3 points by olavk 6093 days ago | link | parent | on: Does Arc need the character type?

I believe that e.g. accented characters like é are implemented as a single glyph in fonts, but are composed of two unicode code points: the base character (e) and a modifier character (´).

This is complicated by the fact that Unicode also supports the combined character as a separate single code point, for backwards compatibility with legacy character sets. However, the decomposed (normalized) form is the recommended one.
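To make the two representations concrete, here is a small Python sketch (Python rather than Arc, only because its standard library ships the Unicode tables). It assumes the combining accent meant above is U+0301 COMBINING ACUTE ACCENT; the helper name `code_points` is made up for the illustration.

```python
import unicodedata

composed = "\u00e9"      # é as a single precomposed code point
decomposed = "e\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

def code_points(s):
    """List the code points of s in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in s]

print(code_points(composed))
print(code_points(decomposed))

# Normalization converts between the two representations.
print(unicodedata.normalize("NFD", composed) == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Both strings render as the same glyph, but they have different lengths and different code points until one of them is normalized.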

-----

1 point by almkglor 6093 days ago | link

True. A bit of research also suggests that it would be better for both forms to be considered "equal" when comparing individual characters.
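One way such an "equal" comparison could work is to normalize both sides before comparing. This is only a sketch in Python, not a proposal for how Arc should do it; `canonically_equal` is a hypothetical name, and choosing NFD over NFC here is arbitrary since either gives the same answer for this purpose.

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after bringing both to the same normal form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00e9"   # é as one code point
combining = "e\u0301"    # e plus combining acute accent

print(precomposed == combining)                    # plain comparison
print(canonically_equal(precomposed, combining))   # normalized comparison
```

The plain comparison sees two different code point sequences, while the normalized one treats them as the same text.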

-----

1 point by olavk 6093 days ago | link | parent | on: Does Arc need the character type?

You need some way to get to the numerical code-point value of a character, to be able to implement string library functions like casing. You don't need a separate char data type though; the function could operate on a string and just return the code point of a character (given by index) as an int.
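A rough sketch of that idea, in Python since it happens to work this way already (indexing a string gives another string, not a char). The names `code_point_at` and `ascii_upcase` are invented for the example, and the casing only handles a-z to keep it short.

```python
def code_point_at(s: str, index: int) -> int:
    """Return the code point of the character at index as a plain int."""
    return ord(s[index])

def ascii_upcase(s: str) -> str:
    """Upper-case a-z using only integer code points (ASCII-only sketch)."""
    out = []
    for i in range(len(s)):
        cp = code_point_at(s, i)
        if 0x61 <= cp <= 0x7A:   # 'a'..'z'
            cp -= 0x20           # shift to 'A'..'Z'
        out.append(chr(cp))
    return "".join(out)

print(code_point_at("é", 0))
print(ascii_upcase("arc é"))
```

A real casing function would need the Unicode case mapping tables, but the interface (string plus index in, int out) is all the library code has to rely on.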

-----

2 points by olavk 6130 days ago | link | parent | on: A solution for "unknown or expired link" ?

If links stop working after a while, I'm pretty sure people will stop linking. And if not, visitors following the links will get a bad experience. You may be right that it won't hurt page rank directly, though.

-----

3 points by olavk 6132 days ago | link | parent | on: A solution for "unknown or expired link" ?

I suppose you never have to use closures, since you can always rewrite to use defop. But a major selling point of Arc seems to be the conciseness of building flows using macros like w/link. If this approach turns out to be not recommended for "real world use", I think it defeats the purpose and it would be fair to say that Arc itself fails the Arc-challenge.

I'd much rather change the underlying implementation of w/link to be more robust, if possible.

(Btw, it is only in the context of links that I think long-lived closures are a problem. In the context of responses to form posts I don't think there is a problem, since these are not bookmarkable or indexed anyway.)

-----

2 points by pg 6131 days ago | link

You can also associate explicit lifetimes with closures on the server if you want. See the def of vars-form.
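For readers who want the general shape of the idea, here is a minimal sketch of a server-side closure table with per-closure lifetimes. It is written in Python, not Arc, and does not reflect how vars-form or srv.arc are actually implemented; `ClosureTable`, `register` and `call` are hypothetical names.

```python
import secrets
import time

class ClosureTable:
    """Maps opaque ids to closures, each with its own expiry time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the sketch can be tested
        self._entries = {}       # id -> (expires_at, fn)

    def register(self, fn, lifetime_seconds):
        """Store fn and return the id that would be embedded in a link."""
        fnid = secrets.token_urlsafe(8)
        self._entries[fnid] = (self._clock() + lifetime_seconds, fn)
        return fnid

    def call(self, fnid, *args):
        """Run the closure for fnid, or report that the link is dead."""
        entry = self._entries.get(fnid)
        if entry is None or entry[0] <= self._clock():
            self._entries.pop(fnid, None)   # drop expired entries lazily
            return "Unknown or expired link."
        return entry[1](*args)

# Usage with a fake clock so expiry can be shown without waiting.
now = [0.0]
table = ClosureTable(clock=lambda: now[0])
fnid = table.register(lambda: "hello again", lifetime_seconds=10)
print(table.call(fnid))
now[0] = 11.0
print(table.call(fnid))
```

A longer lifetime for closures behind links and a shorter one for form responses would be one way to apply this to the concern raised above.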

-----

3 points by olavk 6133 days ago | link | parent | on: First Priority: Core Language

Writing flash apps in arc - that would be a _really_ cool showcase.

-----

2 points by olavk 6134 days ago | link | parent | on: Fix: UFT-8 in app server

Cool. I don't think it should be an option though, since the server generates utf-8 anyway - it just doesn't label it correctly. I can't imagine when it would be useful _not_ to indicate the encoding.

-----

7 points by kens 6133 days ago | link

Not indicating the encoding leaves you vulnerable to an XSS attack. For instance, the following looks harmless, but if you don't set the encoding explicitly it can get executed if your browser is set to UTF-7, or auto-detects to UTF-7:

+ADw-script+AD4-alert('XSS')+ADw-/script+AD4-

Edit to add some explanation: if displayed as UTF-7, the above will pop up an "XSS" alert box. It's just an example; it doesn't actually do anything bad but it shows the potential for malicious XSS. A key point is that HTML-escaping your output or filtering out HTML tags isn't enough, since innocuous-looking characters can cause problems if the encoding is misinterpreted.
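The effect can be reproduced outside a browser. The sketch below uses Python's built-in `utf-7` codec to stand in for a browser that has decided the page is UTF-7; the escaping step with `html.escape(..., quote=False)` is just one plausible example of "HTML-escaping your output".

```python
import html

payload = "+ADw-script+AD4-alert('XSS')+ADw-/script+AD4-"

# Escaping <, > and & changes nothing: the payload contains none of them.
escaped = html.escape(payload, quote=False)
print(escaped == payload)

# What a consumer sees if it interprets the same bytes as UTF-7.
as_utf7 = escaped.encode("ascii").decode("utf-7")
print(as_utf7)
```

The escaped output is byte-for-byte the same as the input, yet the UTF-7 reading of it is a script tag.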

-----


Another option is to have all necessary information about the current operation in the URL. This is highly scalable, since you don't need to keep track of anything user-specific on the server(s), and the navigation supports branching and back/undo just like you describe.
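As a small illustration of keeping the state in the URL, here is a Python sketch for a made-up product catalog. The `/catalog` path, the parameter names and both helper functions are assumptions for the example, not part of any real app.

```python
from urllib.parse import parse_qs, urlencode, urlparse

def catalog_url(category, page, sort):
    """Build a link that carries everything needed to render the page."""
    return "/catalog?" + urlencode(
        {"category": category, "page": page, "sort": sort}
    )

def read_state(url):
    """Recover the navigation state from the URL alone, no server lookup."""
    query = parse_qs(urlparse(url).query)
    return {
        "category": query["category"][0],
        "page": int(query["page"][0]),
        "sort": query["sort"][0],
    }

url = catalog_url("books", 2, "price")
print(url)
print(read_state(url))
```

Any server can answer such a request, and the link keeps working when bookmarked, opened in a second window, or revisited with the back button.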

Strangely enough, PG specifically disallows this approach in his competition!

Most web apps need _both_ global session state and URL-based state. As others have pointed out, if you browse a product catalog, you would like to be able to branch into different browser windows or use the back-button. However, when you add an item to the shopping basket, you want it to be a global state change (you want to have the same shopping basket in all windows), and you don't want a buy to be undone by clicking back.

Continuations are only an option for handling URL-based state, not for handling global state. And for page state they have some limitations.

For example, if all navigation is handled by continuations, you basically have to store a continuation for every hit indefinitely, since you don't know if the user has bookmarked the URL. If you don't want to store the continuations forever, you should only use them on pages that are not bookmarkable anyway, i.e. pages that are the response to form posts. But then the stated advantages, like the ability to branch and use the back button, are moot, since you cannot do that anyway with form responses.

Continuations are really nifty for quick prototypes of web apps, but for production use, I believe they are a leaky abstraction.

-----

1 point by pc 6115 days ago | link

"Continuations are only an option for handling URL-based state"

This isn't true.

-----

3 points by olavk 6134 days ago | link | parent | on: where does Unicode break?

Unicode breaks in the hello-world webapp. E.g. if you write

   (defop hello req (pr "hello world \u1234"))

You get some strange-looking text in your browser. This seems to be because Arc is generating UTF-8 output (which I think is the MzScheme default) but not declaring the encoding, which makes most browsers default to interpreting it as ISO-8859-1. It seems to be fixed by changing srv.arc line 105 to

    Content-Type: text/html;charset=utf-8
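The strange-looking text is easy to reproduce without a browser. This Python sketch encodes the same string as UTF-8 and then decodes the bytes as ISO-8859-1, which is roughly what a browser does when no charset is declared and it falls back to that default.

```python
# The bytes the server sends for the hello-world page body.
body = "hello world \u1234".encode("utf-8")

# What a client shows if it assumes ISO-8859-1: one character per byte.
misread = body.decode("iso-8859-1")

# What it shows once the charset is declared as UTF-8.
correct = body.decode("utf-8")

print(body[-3:].hex())            # the three bytes that encode U+1234
print(len(misread), len(correct))
print(repr(misread[-3:]))
```

The single character U+1234 becomes three unrelated Latin-1 characters, which is why declaring the encoding in the header matters.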

-----


It is not really more work to make strings sequences of (at least) 24-bit values rather than sequences of 8-bit values. Actually it makes a lot of things simpler, since all strings can be in the Unicode character set, rather than a host of different and incompatible 8-bit character sets, which is the case in non-Unicode languages.

The difficulties languages like Python and Ruby have are because of backwards compatibility: a lot of existing code expects strings to be 8-bit byte arrays. Java and JavaScript got this more right by using 16-bit chars. It is still not enough for the full Unicode set, but at least they don't have the problem with strings in multiple incompatible character sets.
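To show why 16 bits are not quite enough, here is a Python 3 sketch using U+1D11E (MUSICAL SYMBOL G CLEF), a character outside the 16-bit range. Python 3 strings are sequences of code points, so the string has length 1; counting UTF-16 code units approximates what a 16-bit-char language would report.

```python
clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, above U+FFFF

# Length as a sequence of code points.
print(len(clef))

# Length in 16-bit units, as a UTF-16 based string type would count it.
utf16 = clef.encode("utf-16-be")
utf16_units = len(utf16) // 2
print(utf16_units)

# The two units are a surrogate pair, not two real characters.
print(utf16.hex())
```

With 16-bit chars, indexing into such a string can land in the middle of a surrogate pair, which is the remaining gap mentioned above.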

-----

1 point by olavk 6139 days ago | link | parent | on: where does Unicode break?

It works on account of Arc using the underlying MzScheme string implementation. Since this is incidental to the host and not part of the arc spec (arc.arc), it is not guaranteed to keep working.

A patch to support unicode (which PG has asked for) would have to include a "native" implementation of strings in Arc, which is a rather fundamental extension to the language, and I suspect the language designers would want to do this themselves? Or would you (the Arc language designers, if you read this) accept such a patch?

-----
