Arc Forum
Missing string stuff
3 points by lg 5467 days ago | 5 comments
I've been writing some stuff in Arc and keep wanting two things; maybe they should be added:

1) A function to find the index of the last occurrence of a substring in a string. In Ruby this is String#rindex.

2) cut should accept indices counted from the right. I've seen this suggested before. For example, if str is "how are you" and I want to extract "yo" from it, I know you can say

  (cut str (- (len str) 3) -1)

but I really want to say:

  (cut str -3 -2)

and not (cut str -2 -1), although I guess I could get used to that. The first form is easier to think about, because I want everything between 3 from the end and 2 from the end, not between 2 from the end and 1 from the end.
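
To make the index arithmetic concrete, here is a small sketch. It assumes the stock cut of the time, which (as the first expression above implies) accepts a negative end but not a negative start. "how are you" has 11 characters, and "yo" sits at indices 8 and 9:

  (= str "how are you")

  (len str)                     ; => 11
  (cut str 8 10)                ; => "yo"  (the end index is exclusive)
  (cut str (- (len str) 3) -1)  ; => "yo"  (start 8, end 11 - 1 = 10)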


2 points by akkartik 5466 days ago

I've felt the need for 2) myself; your post caused me to finally do something about it. Thanks!

  (def cut (seq start (o end))
    (with (end    (if (no end)    (len seq)
                     (< end 0)    (+ (len seq) end) 
                                  end)
           start  (if (< start 0) (+ (len seq) start) start))
      (if (isa seq 'string)
          (let s2 (newstring (- end start))
            (for i 0 (- end start 1)
              (= (s2 i) (seq (+ start i))))
            s2)
          (firstn (- end start) (nthcdr start seq)))))

  ;? (include "arctap.arc")
  ;? 
  ;? (test-iso "cut should work"
  ;?   '(2 3)
  ;?   (cut '(1 2 3 4) 1 3))
  ;? 
  ;? (test-iso "cut should default end to last elem"
  ;?   '(2 3 4)
  ;?   (cut '(1 2 3 4) 1))
  ;? 
  ;? (test-iso "cut should work for negative end"
  ;?   '(2 3)
  ;?   (cut '(1 2 3 4) 1 -1))
  ;? 
  ;? (test-iso "cut should work for negative start"
  ;?   '(2 3)
  ;?   (cut '(1 2 3 4) -3 -1))
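
As a quick check against the string from the original post, and assuming the cut definition above has been loaded, the slice can be written with two negative indices. The end index stays exclusive, so "yo" runs from -3 to -1:

  (cut "how are you" -3 -1)   ; start 11 - 3 = 8, end 11 - 1 = 10 => "yo"
  (cut "how are you" -3)      ; end defaults to (len seq) => "you"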

-----

2 points by rocketnia 5466 days ago

One thing that bugs me sometimes with negative indexing is that 0 doesn't always wrap around the way I expect it to. This comes up when I'm trying to write code like (cut seq start (- (some-calculation))). Here's a hack that gives the behavior I expect:

  (let old-cut cut
    (def cut (seq start (o end len.seq))
      (let length len.seq
        (if (< start 0)    (++ start length))
        (if (< end start)  (++ end length))
        (unless (<= 0 start end length)
          (err:+ "cut: The sign-adjusted indices were out of range or "
                 "backwards."))
        (old-cut seq start end))))
  
  ;? (include "arctap.arc")
  ;?
  ;? (test-iso "cut should work for negative start and 0 end"
  ;?   '(2 3 4)
  ;?   (cut '(1 2 3 4) -3 -0))
  ;?
  ;? (test-iso "cut should work for positive start and 0 end"
  ;?   '(2 3 4) 
  ;?   (cut '(1 2 3 4) 1 -0))
  ;?
  ;? (test-iso "cut should assume 0 start and 0 end are same position"
  ;?   '()
  ;?   (cut '(1 2 3 4) 0 -0))

(I haven't actually used arctap.arc yet, so I'm just parroting you in that section and hoping it helps.)

Unfortunately, if I have code that does something like (cut '(1 2 3 4) ref!start-trim (- ref!end-trim)), then this approach fails too (because of that double-zero assumption). In that case, I suppose I'd just resort to writing another function as a workaround.

  (def cut-sides (seq front back)
    (cut seq front (- len.seq back)))
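
A few calls illustrate how cut-sides sidesteps the ambiguity: both arguments count elements to trim, so 0 always means "trim nothing" on that side. This is only a usage sketch, assuming the definition above:

  (cut-sides '(1 2 3 4) 1 0)   ; => (2 3 4)
  (cut-sides '(1 2 3 4) 0 0)   ; => (1 2 3 4)
  (cut-sides '(1 2 3 4) 1 1)   ; => (2 3)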

-----

1 point by akkartik 5466 days ago

I like your approach; here's mine in similar style:

    (let old-cut cut
      (def cut (seq start (o end))
        (old-cut seq
                 (if (< start 0)
                   (+ (len seq) start)
                   start)
                 end)))

I'm having a hard time assessing the use case, though. Are you trying to avoid +1, etc.?

Also, sometimes the calculations are just as easy to do in positive indices.

-----

1 point by rocketnia 5465 days ago

I've searched through lots of the code I've written, and as it turns out, I haven't found a single case where having 0 as a "negative" index would have helped. :-p So while I still like it better my way, I don't have a real use case to show.

From another standpoint, maybe it's just that I don't want to rely on the behavior of something like (cut '(1 2 3 4) 3 1), where the indices are reversed. If I can exempt (cut '(1 2 3 4) -3 0) from that error/undefined zone (in a way that's smoothly consistent with cut's other behavior), then cut may be slightly more useful to me at some point. But yeah, I don't know exactly when it would pay off or whether some other behavior would be better.

-----

3 points by lg 5464 days ago

I like this cut, except I changed the (< end 0) case to (+ (len seq) end 1). This gives the behavior I expect:

  (cut "abc" 0)
  ;=> "abc"

  (cut "abc" 0 -1)
  ;=> "abc"

  (cut "abc" -3 -2)
  ;=> "ab"
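
For reference, a sketch of the combined definition, assuming "this cut" refers to akkartik's first version above. It is named cut-incl here (a hypothetical name) so that it doesn't replace the built-in cut, which other code may call expecting an exclusive negative end:

  (def cut-incl (seq start (o end))
    (with (end   (if (no end)    (len seq)
                     ; the changed case: -1 now means "through the last element"
                     (< end 0)   (+ (len seq) end 1)
                                 end)
           start (if (< start 0) (+ (len seq) start) start))
      (if (isa seq 'string)
          (let s2 (newstring (- end start))
            (for i 0 (- end start 1)
              (= (s2 i) (seq (+ start i))))
            s2)
          (firstn (- end start) (nthcdr start seq)))))

  (cut-incl "how are you" -3 -2)   ; start 8, end 11 - 2 + 1 = 10 => "yo"
  (cut-incl '(1 2 3 4) 1 -1)       ; => (2 3 4)

Note that a negative end becomes inclusive while a non-negative end stays exclusive, so the "negative end" and "negative start" tests earlier in the thread, which expect (2 3), would need updating under this convention.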

Also, if anyone's interested, here's lastmatch (i.e. #1):

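  ; Returns the index of the last match of pat in seq, or nil if there is none.
  ; pat is either a predicate function or a string to search for.
  (def lastmatch (pat seq)
    (catch (if (isa pat 'fn)
               ; predicate: test each element, scanning from the right
               (let leng (- (len seq) 1)
                 (for i 0 leng
                   (when (pat (seq (- leng i))) (throw (- leng i)))))
               ; string: try each possible starting position, rightmost first
               (let leng (- (len seq) (len pat))
                 (for i 0 leng
                   (when (headmatch pat seq (- leng i)) (throw (- leng i))))))
           nil))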
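
A few example calls, as a sketch assuming lastmatch as defined above and Arc's headmatch. The pattern can be a string or a predicate function, and the result is nil when nothing matches:

  (lastmatch "o" "how are you")          ; => 9
  (lastmatch "are" "how are you")        ; => 4
  (lastmatch "zz" "how are you")         ; => nil
  (lastmatch [is _ #\o] "how are you")   ; => 9
  (lastmatch odd '(1 2 3 4))             ; => 2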

-----