Arc Forumnew | comments | leaders | submitlogin
3 points by bogomipz 6102 days ago | link | parent

So ruby has a syntax for regular expressions, such as /\D+/. What I've always wondered is, does this have any advantage at all?

I mean, the actual regex operations are done by methods on the string class, which like nex3 mentioned is at the library level.

Is there any reason

  a_string.split(/\D+/)
is better than

  a_string.split("\D+")
Please do enlighten me.


5 points by nex3 6102 days ago | link

A distinction between regexen and strings is actually very handy. I've done a fair bit of coding in Ruby, where this distinction is present, and a fair bit in Emacs Lisp, where it's not.

There are really two places where it's really important. First, if regexen are strings, then you have to double-escape everything. /\.foo/ becomes "\\.foo". /"([^"]|\\"|\\\\)+"/ becomes "\"([^\"]|\\\\"|\\\\\\\\)+\"". Which is preferable?

Second, it's very often useful to treat strings as auto-escaped regexps. For instance,

  a_string.split("\D+")
is actually valid Ruby. It's equivalent to

  a_string.split("D+")
because D isn't an escape char, which will split the string on the literal string "D+". For example

  "BAD++".split("D+") #=> ["BA", "+"]
Now, I'm not convinced that regexen are necessary for nearly as many string operations as they're typically used for. But I think no matter how powerful a standard string library a language has, they'll still be useful sometimes, and then it's a great boon to have literal syntax for them.

-----

3 points by bogomipz 6102 days ago | link

Ok, so what it comes down to, is that you don't want escapes to be processed. Wouldn't providing a non-escapable string be far more general, then?

Since '\D+' clashes with quote, maybe /\D+/ is a good choice for the non-escapable string syntax. Only problem is that using it in other places might trigger some reactions as the slashes make everybody think of it as "regex syntax".

-----

3 points by nex3 6102 days ago | link

Escaping isn't the only thing. Duck typing is also a good reason to differentiate regular expressions and strings. foo.gsub("()", "nil") is distinct from foo.gsub(/()/, "nil"), and both are useful enough to make both usable. There are lots of similar issues - for instance, it would be very useful to make (/foo/ str) return some sort of match data, but that wouldn't be possible if regexps and strings were the same type.

-----

4 points by bogomipz 6101 days ago | link

Now we're getting somewhere :) For this argument to really convince me, though, Arc needs better support for user defined types. It should be possible to write special cases of existing functions without touching the core definition. Some core functions use case forms or similar to treat data types differently. Extending those is not really supported. PG has said a couple of times;

"We believe Lisp should let you define new types that are treated just like the built-in types-- just as it lets you define new functions that are treated just like the built-in functions."

Using annotate and rep doesn't feel "just like built-in types" quite yet.

-----

2 points by almkglor 6101 days ago | link

Try 'redef on nex3's arc-wiki.git. You might also be interested in my settable-fn.arc and nex3's take on it (settable-fn2.arc).

-----

3 points by earthboundkid 6102 days ago | link

You could always do it the Python way: r"\D+" => '\\D+'

There's also u"" for Unicode strings (in Python <3.0) and b"" for byte strings (in Python >2.6).

-----

2 points by map 6102 days ago | link

If the "x" modifier is used, whitespace and comments in the regex are ignored.

  re =
  %r{
      # year
      (\d {4})
      # separator is one or more non-digits
      \D+
      # month
      (\d\d)
      # separator is one or more non-digits
      \D+
      # day
      (\d\d)
  }x

  p "the 1st date, 1984-08-08, was ignored".match(re).captures

  --->["1984", "08", "08"]

-----