Arc Forum
Reading a text file into a list or an array
2 points by jsgrahamus 3554 days ago | 26 comments
I have a text file I want to process. I'd like to read it into an array.

What would be the easiest way to do that?

TIA, Steve



2 points by zck 3554 days ago | link

untested code:

  (def read-all (filename)
       (w/infile file filename
                 (drain (readline file))))

`drain` executes its argument repeatedly until it returns nil. (https://arclanguage.github.io/ref/iteration.html#drain)

`readline` returns nil when there's nothing more to read (https://arclanguage.github.io/ref/io.html#readline)
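
A quick way to see the two pieces working together, without touching the filesystem, is to feed `readline` a string. This is only a sketch: it assumes `fromstring` is available (it is used elsewhere in this thread), and the sample text is made up.

  ; drain keeps calling (readline) and collects each result
  ; until readline returns nil at the end of the input
  (fromstring "one\ntwo\nthree" (drain (readline)))
  ; => ("one" "two" "three")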

-----

2 points by zck 3553 days ago | link

Now that I've been able to test it, this works, as long as you don't have empty lines in your file. If you do, it works...oddly:

This file:

    This is the first line.
    After this line there is an empty line.
    
    After this line there are two empty lines.
    
    
    This is the last line.

results in this list:

    arc> (read-all "/home/zck/test.txt")
    ("This is the first line." "After this line there is an empty line." "\nAfter this line there are two empty lines." "\n" "This is the last line.")

Note how the newline after the first empty line gets glommed onto the line after it? Yech. But it does work exactly as expected if each line is nonempty.

-----

2 points by akkartik 3553 days ago | link

That is indeed quite lame: http://arclanguage.github.io/ref/io.html#readline

Edit 8 minutes later: seems to work fine for me on anarki.

  arc> (fromfile "x" (drain:readline))
  ("This is the first line." "After this line there is an empty line." "" "After this line there are two empty lines." "" "" "This is the last line.")
  arc> (fromstring "\n\na\nc\n\nd" (drain:readline))
  ("" "" "a" "c" "" "d")

Were you running arc3.1 or something?

-----

2 points by zck 3551 days ago | link

Yeah, I run arc3.1 for several reasons -- including that anarki doesn't work in Emacs's shell, and I haven't taken the time to figure out[1] why: my hypothesis is that simply removing rlwrap would fix it, but I so rarely use Arc these days I haven't dealt with it.

[1] Nor have I taken the time to respond to your emails from months ago. I'm sorry about that; it's related (among other things) to some general malaise I'm trying to deal with.

-----

3 points by rocketnia 3551 days ago | link

I think this thread is when the bug was raised and fixed: http://arclanguage.org/item?id=10830

-----

1 point by jsgrahamus 3551 days ago | link

Here are some of the results I got:

  Use (quit) to quit, (tl) to return here after an interrupt.

  arc> (def read-all (filename)
       (w/infile file filename
                 (drain (readline file))))
  #<procedure: read-all>
  arc> (read-all "c:\users\steve\desktop\mccf2.txt")
  Error: "UNKNOWN::112: read: no hex digit following \\u in string"
  arc> Error: "_ersstevedesktopmccf2: undefined;\n cannot reference undefined identifier"
  arc> (read-all "c:\users\steve\desktop\iiv.txt")")\r\n(read-all "
  arc> #<procedure>
  arc> (read-all "c:\users\steve\desktop\iiv.txt")")\r\nRread-all "
  arc> #<procedure>
  arc> (read-all "c:\users\steve\desktop\xxx2.m3")")\r\n(read-all "
  arc> #<procedure>
  arc> (read-all "c:\users\steve\desktop\mccf.scm")")\r\n(read-all "
  arc> #<procedure>
  arc> (read-all "c:\users\steve\desktop\jsg.xxx")")\r\n(read-all "
  arc> #<procedure>
  arc>

-----

1 point by rocketnia 3551 days ago | link

This is probably what you need:

  (read-all "c:\\users\\steve\\desktop\\mccf2.txt")

What you wrote was a string with \u, which didn't follow through with a complete Unicode escape sequence:

  (read-all "c:\users\steve\desktop\mccf2.txt")

Once the reader got to \u, it raised a parse error, and the REPL continued to process the rest of your input as a new command:

  sers\steve\desktop\mccf2.txt")

The " here started a string, and your next command was interpreted as part of that string.

  (read-all "c:\users\steve\desktop\iiv.txt")

So here we have the end of a string, followed by the symbol c:\users\steve\desktop\iiv.txt, followed by the start of another string.
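
To see the difference between the two spellings without opening a file, it may help to print the strings. A small sketch (the paths are just the ones from this thread):

  ; a doubled backslash in the source is one backslash in the string
  (prn "c:\\users\\steve\\desktop\\mccf2.txt")
  ; prints c:\users\steve\desktop\mccf2.txt

  ; forward slashes need no escaping at all
  (prn "c:/users/steve/desktop/mccf2.txt")
  ; prints c:/users/steve/desktop/mccf2.txt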

-----

3 points by jsgrahamus 3551 days ago | link

   Use (quit) to quit, (tl) to return here after an interrupt.
   arc> (def read-all (filename)
          (w/infile file filename
                    (drain (readline file))))
   #<procedure: read-all>
   arc> (read-all "c:\\users\\steve\\desktop\\mccf2.txt")
   Error: "_R: undefined;\n cannot reference undefined identifier"
   arc> 1
   1
   arc> (read-all "c:/users/steve/desktop/mccf2.txt")
   Error: "_R: undefined;\n cannot reference undefined identifier"
   arc>

-----

2 points by jsgrahamus 3551 days ago | link

  C:\Users\Steve\Desktop>type mccf.scm
  (define x)
  (call-with-input-file "c:/users/steve/desktop/mccf.txt"
    (lambda (input-port)
      (let loop ((x (read-char input-port)))
        (if (not (eof-object? x))
            (begin
              (display x)
              (loop (read-char input-port)))))))
  C:\Users\Steve\Desktop>

  Use (quit) to quit, (tl) to return here after an interrupt.
  arc> (def read-all (filename)
            (w/infile file filename
                      (drain (readline file))))
  #<procedure: read-all>
  arc> (read-all "c:/users/steve/desktop/mccf.scm")
  Error: "_R: undefined;\n cannot reference undefined identifier"
  arc> (read-all "c:/users/steve/desktop/mccf.scm")
  Error: "_R: undefined;\n cannot reference undefined identifier"
  arc> (read-all "c:\\users\\steve\\desktop\\mccf.scm")
  Error: "_R: undefined;\n cannot reference undefined identifier"
  arc>

-----

3 points by rocketnia 3550 days ago | link

I've seen this before. What's happening, somehow, is that every time you write more than one line in a definition at the REPL in a Windows prompt, a capital R is being inserted at each newline. Arc compiles this to the Racket code _R, and when Racket executes this, it can't find the _R variable.

I seem to remember I used to work around this by always pasting my multi-line definitions from a text editor rather than writing them directly at the REPL.

-----

3 points by jsgrahamus 3550 days ago | link

Thanks for mentioning this. Saw it in a racket repl, too. Reported it to the Racket Users list.

-----

3 points by jsgrahamus 3550 days ago | link

BTW, this is Windows 7x64.

I am pasting the definition into the arc cmd window.

-----

2 points by rocketnia 3550 days ago | link

Oh, sorry. Now that I test it, I realize I remembered incorrectly.

The only time I get those spurious R characters is when I paste code into the REPL and then press enter manually. I don't get them when typing multi-line definitions directly at the REPL, and I don't get them if the code I'm pasting already has a line break at the end.

So the habit I've formed is to make sure the code I'm pasting already has a line break at the end.

I notice this issue also happens on Racket 5.3.3 -- I'm a few versions behind -- and it does not happen in the REPLs for Node.js or Clojure. It's some kind of bug in Racket. (Hmm... Racket's port.c has a bunch of spaghetti code for CRLF processing. Maybe the bug's in there somewhere.)
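
Another way to sidestep the paste problem, offered only as a sketch: keep the definition in a file and `load` it, so nothing multi-line goes through the console. The filename read-all.arc is hypothetical, and this is untested on the Windows setup discussed here.

  ; contents of read-all.arc (hypothetical file in the arc directory)
  (def read-all (filename)
    (w/infile file filename
      (drain (readline file))))

Then, at the REPL, each command is a single short line:

  (load "read-all.arc")
  (read-all "Log.txt")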

-----

1 point by akkartik 3550 days ago | link

Oh, I wonder if it's a carriage-return/linefeed thing. I know "\r" is the code for carriage return, for example.

-----

2 points by zck 3550 days ago | link

As akkartik says, let's step away from the complex code, and get back to basics. Let's use dir-exists (https://arclanguage.github.io/ref/filesystem.html#dir-exists) to test out how to reference directories.

So let's just see if we can get a 't when we check the existence of C:\users

Here are the four things I'd try:

    (dir-exists "C:/users")
    (dir-exists "C://users")
    (dir-exists "C:\users")
    (dir-exists "C:\\users")

My money's on the first or last one working. (Obviously this assumes you _have_ a `C:\users` directory.) I would similarly bet that you might need to capitalize the drive, even though Windows drive letters are case insensitive (https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...). So if it doesn't work with lowercase letters, try it as above.

-----

2 points by jsgrahamus 3550 days ago | link

  arc> (dir-exists "c:/users")
  "c:/users"
  arc> (dir-exists "c:\\users")
  "c:\\users"

-----

1 point by akkartik 3551 days ago | link

Very strange. What arc are you using?

Can you try it without the drain, just read the first line from the file?

Edit 10 minutes later: here's a few things I would try:

  ; a relative path with no slashes/backslashes
  (read-all "mccf2.txt")
  ; inline read-all
  (w/infile file "mccf2.txt" (drain (readline file)))
  ; try reading just the first line
  (w/infile file "mccf2.txt" (readline file))

-----

2 points by jsgrahamus 3550 days ago | link

This is arc 3.1

  C:\Users\Steve\Documents\Programming\Lisp\arc\arc3.1>type log.txt
  =====   11:52:29 AM
  =====   11:56:49 AM
  =====   12:10:19 PM
  =====   12:39:31 PM
  =====   1:08:54 PM
  =====   1:11:19 PM
  =====   2:14:21 PM
  =====   2:14:33 PM
  =====   12:36:29 PM
  =====   5:13:08 PM
  =====   9:56:43 AM
  =====   2:36:16 PM
  =====   4:23:45 PM
  =====   2:35:41 PM

  C:\Users\Steve\Documents\Programming\Lisp\arc\arc3.1>dir c:\log.txt
   Volume in drive C is TI105757W0A
   Volume Serial Number is 48C4-C0F7

   Directory of c:\

  12/17/2014  03:40 PM               271 Log.txt
                 1 File(s)            271 bytes
                 0 Dir(s)  61,392,650,240 bytes free

  C:\Users\Steve\Documents\Programming\Lisp\arc\arc3.1>

  Use (quit) to quit, (tl) to return here after an interrupt.
  arc> (def read-all (filename)
         (w/infile file filename
                   (drain (readline file))))
  #<procedure: read-all>
  arc> (read-all "Log.txt")
  Error: "_R: undefined;\n cannot reference undefined identifier"
  arc> (read-all "c:Log.txt")
  Error: "_R: undefined;\n cannot reference undefined identifier"
  arc> (read-all "c:/Log.txt")
  Error: "_R: undefined;\n cannot reference undefined identifier"
  arc> (read-all "c:\\Log.txt")
  Error: "_R: undefined;\n cannot reference undefined identifier"
  arc>

-----

2 points by jsgrahamus 3550 days ago | link

This seems to be onto something!

  Use (quit) to quit, (tl) to return here after an interrupt.
  arc> (def read-all2 (filename)
         (w/infile file filename))
  #<procedure: read-all2>
  arc> (read-all2 "Log.txt")
  Error: "_R: undefined;\n cannot reference undefined identifier"
  arc> (w/infile file "Log.txt" (drain (readline file)))
  ("===== \t11:52:29 AM\r" "===== \t11:56:49 AM\r" "===== \t12:10:19 PM\r" "===== \t12:39:31 PM\r" "===== \t1:08:54 PM\r" "===== \t1:11:19 PM\r" "=====\t2:14:21 PM\r" "===== \t2:14:33 PM\r" "===== \t12:36:29 PM\r" "===== \t5:13:08 PM\r" "===== \t9:56:43 AM\r" "===== \t2:36:16 PM\r" "===== \t4:23:45 PM\r" "===== \t2:35:41 PM\r")
  arc> (w/infile file "Log.txt" (readline file))
  "===== \t11:52:29 AM\r"
  arc>
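
The trailing \r on each line is the carriage return from the file's Windows line endings, which `readline` in arc3.1 appears to leave in place. If that gets in the way, one possible cleanup is to drop the carriage returns after reading. This is a sketch using `rem` from arc.arc, with a made-up two-line string standing in for the file:

  ; rem with a character removes every occurrence of it from a string,
  ; so this strips \r anywhere in the line, not only at the end
  (map [rem #\return _]
       (fromstring "first\r\nsecond\r\n" (drain (readline))))
  ; => ("first" "second")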

-----

1 point by akkartik 3550 days ago | link

So it looks like the inlined version works, but wrapping it in a function doesn't? Very strange. Paste these lines one at a time into a fresh arc session and show me what you get in response to each line.

  (w/infile file "Log.txt" (drain (readline file)))  ; just to set a baseline
  (def foo (filename) (prn "AAA") (w/infile f filename (prn "BBB") (drain (do1 (readline f) (prn "CCC")))))
  (foo "Log.txt")
  (def foo (filename) (prn "AAA") (w/infile f filename (prn "BBB") (readline f)))
  (foo "Log.txt")

-----

2 points by jsgrahamus 3550 days ago | link

akkartik, here are the results.

  Use (quit) to quit, (tl) to return here after an interrupt.
  arc> (w/infile file "Log.txt" (drain (readline file)))  ; just to set a baseline
  ("===== \t11:52:29 AM\r" "===== \t11:56:49 AM\r" "===== \t12:10:19 PM\r" "===== \t12:39:31 PM\r" "===== \t1:08:54 PM\r" "===== \t1:11\t2:14:21 PM\r" "===== \t2:14:33 PM\r" "===== \t12:36:29 PM\r" "===== \t5:13:08 PM\r" "=====\t9:56:43 AM\r" "===== \t2:36:16 PM\r" "M\r" "=====\t2:35:41 PM\r")
  arc> (def foo (filename) (prn "AAA") (w/infile f filename (prn "BBB") (drain (do1 (readline f) (prn "CCC")))))
  *** redefining foo
  #<procedure: foo>
  arc> (foo "Log.txt")
  AAA
  BBB
  CCC
  CCC
  CCC
  CCC
  CCC
  CCC
  CCC
  CCC
  CCC
  CCC
  CCC
  CCC
  CCC
  CCC
  CCC
  ("===== \t11:52:29 AM\r" "===== \t11:56:49 AM\r" "===== \t12:10:19 PM\r" "===== \t12:39:31 PM\r" "===== \t1:08:54 PM\r" "===== \t1:11\t2:14:21 PM\r" "===== \t2:14:33 PM\r" "===== \t12:36:29 PM\r" "===== \t5:13:08 PM\r" "===== \t9:56:43 AM\r" "===== \t2:36:16 PM\r" "M\r" "===== \t2:35:41 PM\r")
  arc> (def foo (filename) (prn "AAA") (w/infile f filename (prn "BBB") (readline f)))
  *** redefining foo
  #<procedure: foo>
  arc> (foo "Log.txt")
  AAA
  BBB
  "===== \t11:52:29 AM\r"
  arc>

-----

1 point by akkartik 3550 days ago | link

I think rocketnia has figured it out. Does rocketnia's comment http://arclanguage.org/item?id=19137 make sense? Basically you shouldn't get an error if you type in this expression character by character, but you should if you paste it into an arc session without a trailing <enter>.

  (def read-all2 (filename)
    (w/infile file filename))

(Try it out each time as before by running (read-all2 "Log.txt"))

-----

2 points by jsgrahamus 3550 days ago | link

  arc> (def read-all (filename) (w/infile file filename (drain (readline file))))
  arc> (read-all "Log.txt")
  ("===== \t11:52:29 AM\r" "===== \t11:56:49 AM\r" "===== \t12:10:19 PM\r" "===== \t12:39:31 PM\r" "===== \t1:08:54 PM\r" "===== \t1:11:19 PM\r" "=====\t2:14:21 PM\r" "===== \t2:14:33 PM\r" "===== \t12:36:29 PM\r" "===== \t5:13:08 PM\r" "===== \t9:56:43 AM\r" "===== \t2:36:16 PM\r" "===== \t4:23:45 PM\r" "===== \t2:35:41 PM\r")

Thanks for that.

It does appear that the problem is with pasting into the repl. So, how does one hook up arc with Emacs?

Thanks to all those who chimed in with help. Great community here.

Steve

-----

1 point by jsgrahamus 3551 days ago | link

It doesn't seem to deal nicely with control characters. Not sure why the rest of the results are as they are.

-----

2 points by jsgrahamus 3550 days ago | link

Well, here's reading it into a list, which is probably the next best thing.

   (= alist (w/infile file "c:/users/steve/desktop/mccf.txt" (drain (readline file))))

Thanks for all of the help.

-----

2 points by jsgrahamus 3550 days ago | link

More interesting answers/questions.

From arc.arc:

  (def read ((o x (stdin)) (o eof nil))
    (if (isa x 'string) (readstring1 x eof) (sread x eof)))

  ; inconsistency between names of readfile[1] and writefile

  (def readfile (name) (w/infile s name (drain (read s))))

  (def readfile1 (name) (w/infile s name (read s)))

  (def readall (src (o eof nil))
    ((afn (i)
      (let x (read i eof)
        (if (is x eof)
            nil
            (cons x (self i)))))
     (if (isa src 'string) (instring src) src)))
===

  Use (quit) to quit, (tl) to return here after an interrupt.
  arc> (def read-all (filename) \
            (w/infile file filename \
                      (drain (readline file))))
  #<procedure: read-all>
  arc> (read-all "c:Log.txt")
  Error: "|_\r|: undefined;\n cannot reference undefined identifier"
  arc> (readfile "c:Log.txt")
  (===== 11:52:29 AM ===== 11:56:49 AM ===== 12:10:19 PM ===== 12:39:31 PM ===== 1:08:54 PM ===== 1:11:19 PM ===== 2:14:21 PM ===== 2:14:33 PM ===== 12:36:29 PM ===== 5:13:08 PM ===== 9:56:43 AM ===== 2:36:16 PM ===== 4:23:45 PM ===== 2:35:41 PM)
  arc> (readfile "c:/users/steve/desktop/mccf.txt")
  Error: "c:/users/steve/desktop/mccf.txt::509: read: bad syntax `# '"
  arc> (readfile "c:\\users\\steve\\desktop\\mccf.txt")
  Error: "c:\\users\\steve\\desktop\\mccf.txt::509: read: bad syntax `# '"
  arc> (readfile1 "c:\\users\\steve\\desktop\\mccf.txt")
  DEVISC1A1:DEVVCC>D
  arc> (readall "c:\\users\\steve\\desktop\\mcc.txt")
  (c:usersstevedesktopmcc.txt)
  arc> (readall "c:/users/steve/desktop/mcc.txt")
  (c:/users/steve/desktop/mcc.txt)
  arc> (readall "c:/users/steve/desktop/mcc.txt" (o))
  Error: "_o: undefined;\n cannot reference undefined identifier"
  arc>
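
Reading the arc.arc definitions quoted above, a likely explanation (hedged, since it is inferred from the source rather than confirmed): `readfile` uses `read`, which parses s-expressions rather than lines, and `readall` treats a string argument as the text to parse, not as a filename. A sketch with a made-up string:

  ; read parses s-expressions, so the text comes back as symbols and numbers
  (fromstring "alpha beta 42" (drain (read)))      ; => (alpha beta 42)

  ; readline keeps the whole line as one string
  (fromstring "alpha beta 42" (drain (readline)))  ; => ("alpha beta 42")

  ; readall on a string parses the string itself
  (readall "alpha beta 42")                        ; => (alpha beta 42)

That would account for the path coming back as a symbol above, and for `readfile` failing on a file that is not valid Arc syntax.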

-----