I'm having a lot of trouble trying to read in one line of text on a Linux system.
The commands:

    | fil lin |
    fil := CrLfFileStream new.
    fil open: 'aising/data/technologies.csv' forWrite: False.
    Transcript cr; show: (fil).
    fil ascii.
    Transcript cr; show: 'LineEndConvention = '; show: fil lineEndConvention.
    fil reopen.
    lin := fil nextLine.
    Transcript cr; show: 'lin 1 = '; show: lin.
It fails at the attempt to reopen (or, alternatively, at an attempt to "fil position: 0" or "fil position: 1"). If I don't test the line end convention, the entire file is read into the first line. If I do the test, the result is that the protocol is "lf" (which seems the right answer). If I open the file in a standard text editor, it looks correct, and has 43 lines + an empty 44th line (that I believe is created in the editor).
At first I tried to copy the example from the "cookbook" exactly:

    file := FileStream fileNamed: 'test.txt'.
    [file atEnd] whileFalse: [line := file nextLine. "Process the line"]

(Well, I used StandardFileStream rather than FileStream, and I changed the file opening to:)

    tmpFile := StandardFileStream open: 'aising/data/technologies.csv' forWrite: False.
    [tmpFile atEnd] whileFalse: [ temp := tmpFile nextLine.
When everything in the file ended up read into the first line, I started trying alternates...so far without success.
Any suggestions? I'd try using the code without modifications, but my image doesn't appear to *have* a FileStream class.
Hi Charles,
on Tue, 16 May 2006 00:13:51 +0200, you charleshixsn@earthlink.net wrote:
I'm having a lot of trouble trying to read in one line of text on a Linux system.
This is normal, Squeak likes apples more than anything else :-D
Have a look at the implementor of #concreteStream. I often implant CrLfFileStream there.
BTW: what release / image / platform are you using? CrLfFileStream should be O.K. for reading *nix files, I do that day-in-day-out.
/Klaus
Klaus D. Witzel wrote:
BTW: what release / image / platform are you using? CrLfFileStream should be O.K. for reading *nix files, I do that day-in-day-out.
3.8-6665-full 0 updates available
OK. Now:

    | fil lin n |
    fil := FileStream fileNamed: 'aising/data/technologies.csv' .
    n := 0.
    [fil atEnd] whileFalse: [
        lin := fil nextLine.
        n := n + 1.
        Transcript cr; show: 'lin '; show: n; show: ' = '; show: lin. ].
    Transcript cr; show: 'normal end after '; show: n; show: ' lines'.
results in: normal end lin 1 = 'technology' 'id' 'name' 'cost1' 'cost2' 'cost3' 'pre1' 'pre2' 'pre3' 'danger' 'typeName' 'typeValue' 1 'Autonomous Vehicles' 40000 1000 0 27 16 0 0 0 2 'Sociology' 10 500 0 0 0 0 0 'discover_public' 1000 3 'Voice Synthesis' 8000 6000 0 32 0 0 0 0 4 'Simulacra' 70000 90000 0 3 24 30 0 0 5 ... 'endgame_sing' 0 39 'Hypnosis Field' 7000 5000 0 21 0 0 0 0 40 'Quantum Computing' 30000 20000 0 11 0 0 0 0 41 'unknown' 1000000000 10000000000 0 41 0 0 0 0
normal end after 1 lines

Notice that the linefeeds aren't being taken as line separators. They are present, and affecting the formatting of the output, but nextLine is grabbing the entire file.
Hi Charles
on Tue, 16 May 2006 03:55:10 +0200, you charleshixsn@earthlink.net wrote:
OK. Now: | fil lin n | fil := FileStream fileNamed: 'aising/data/technologies.csv' . n := 0. [fil atEnd] whileFalse: [ lin := fil nextLine. n := n + 1. Transcript cr; show: 'lin '; show: n; show: ' = '; show: lin. ]. Transcript cr; show: 'normal end after '; show: n; show: ' lines'.
You forgot to tell us
a) SmalltalkImage current platformName
b) fil lineEndConvention "after fil was opened"
c) fil detectLineEndConvention "before the first nextLine"
Note that detectLineEndConvention scans only the first (LookAheadCount = 2048) characters.
/Klaus
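To make the look-ahead idea concrete, here is a minimal workspace sketch of how a line-end convention could be guessed from the first characters of some text. It is not the code of #detectLineEndConvention in any Squeak release; the variable names, the sample string, and the returned symbols #lf, #cr and #crlf are assumptions made for illustration, and only the window size of 2048 is taken from the remark above.

```smalltalk
"Hypothetical sketch: guess the line-end convention from a look-ahead window.
 This is NOT the actual Squeak implementation."
| sample lookAhead limit convention i c |
"Assumed sample data: two fields separated by a single LF."
sample := 'id', (String with: Character lf), 'name'.
lookAhead := 2048.
limit := sample size min: lookAhead.
convention := nil.
i := 1.
"Stop at the first CR or LF found inside the window."
[convention isNil and: [i <= limit]] whileTrue: [
    c := sample at: i.
    c = Character lf ifTrue: [convention := #lf].
    c = Character cr ifTrue: [
        "A CR directly followed by an LF counts as one CRLF line end."
        convention := (i < sample size and: [(sample at: i + 1) = Character lf])
            ifTrue: [#crlf]
            ifFalse: [#cr]].
    i := i + 1].
Transcript show: 'convention = ', convention printString; cr.
```

If no CR or LF occurs inside the window, the variable simply stays nil, which is one way a detection step could come back empty-handed.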
OK, I've modified the code to include the detectLineEndConvention, thus:
    | fil lin n |
    fil := FileStream fileNamed: 'aising/data/technologies.csv' .
    fil detectLineEndConvention.
    "fil defaultToLF."
    Transcript cr; show: 'LineEndConvention = '; show: fil lineEndConvention.
    fil position: 0.
    n := 0.
    [fil atEnd] whileFalse: [
        lin := fil nextLine.
        n := n + 1.
        Transcript cr; show: 'lin '; show: n; show: ' = '; show: lin. ].
    Transcript cr; show: 'normal end after '; show: n; show: ' lines'.
this causes the printout to begin:
LineEndConvention = nil lin 1 = 'technology' 'id' 'name' 'cost1' 'cost2' 'cost3' 'pre1' 'pre2' 'pre3' 'danger' 'typeName' 'typeValue' 1 'Autonomous Vehicles' 40000 1000 0 27 16 0 0 0 ... normal end after 1 lines
as it did before.
Linux Squeak3.8-6665full.image That's true, but it's a guess at what you're asking for, because I don't understand the request for "SmalltalkImage current platformName ". Squeak-3.8-6665-i686-pc-linux-gnu-3.7.7.tar.gz is the file I started from.
The file lines end with an LF (i.e. 0x0a), as examined with a hex editor. Since all lines end with the same character, 2048 is plenty. (FWIW, the entire file is only 0x97B bytes long. It's terminated by an ordinary 0x0A, with no special markings.)
Also, I didn't "forget" to do fil detectLineEndConvention. I didn't know I was supposed to do it. I'm still not sure, since it doesn't seem to make any difference. I do notice the difference with this method however. With a prior approach when I did position: 0 I got a "primitive method throws an error" message, whereas with this the line end convention is just set to nil. Something is clearly wrong, as it should be LF, but attempting to coerce it into LF just throws an error: Multi-byte stream does not understand method defaultToLF, which is weird, as I can see that method in the class when I look. I copied the method name with a copy and paste from the MultiByteFileStream class into the workspace, so I know I didn't misspell it.
Hi Charles,
on Tue, 16 May 2006 10:48:15 +0200, you charleshixsn@earthlink.net wrote:
Please! Nobody here can see results when you don't print them and copy them into your next posting! !! Please insert the following code just after open:
Transcript cr; show: 'detectLineEndConvention = '; show: fil detectLineEndConvention.
Linux Squeak3.8-6665full.image That's true, but it's a guess at what you're asking for, because I don't understand the request for "SmalltalkImage current platformName ". Squeak-3.8-6665-i686-pc-linux-gnu-3.7.7.tar.gz is the file I started from.
This is Smalltalk jargon. When asked, do the following:
a) copy and paste SmalltalkImage current platformName into a workspace
b) select the pasted text
c) do a print-it from the context menu
d) copy & paste the result text into your next posting
The file lines end with an LF (i.e. 0x0a), as examined with a hex editor. Since all lines end with the same character, 2048 is plenty. (FWIW, the entire file is only 0x97B bytes long. It's terminated by an ordinary 0x0A, with no special markings.)
Also, I didn't "forget" to do fil detectLineEndConvention. I didn't know I was supposed to do it.
This was only for me to find out what's going wrong. So, what does fil detectLineEndConvention print?
I'm still not sure, since it doesn't seem to make any difference.
It was not supposed to make a difference, I just wanted to know what that prints.
/Klaus
Klaus D. Witzel wrote:
Please! Nobody here can see results when you don't print them and copy them into your next posting! !! Please insert the following code just after open:
Transcript cr; show: 'detectLineEndConvention = '; show: fil detectLineEndConvention.
If you mean: Transcript cr; show: 'LineEndConvention = '; show: fil lineEndConvention.
That's in there. It's the first line printed in the response (shown) to the code. I did chop out a bunch of lines after the start, but the information you are requesting is already included.
This is Smalltalk jargon, when asked do the following a) copy and paste SmalltalkImage current platformName into a workspace
SmalltalkImage current platformName. 'unix' I don't understand this step. I'd already said I was on Linux.
This was only for me to find out what's going wrong. So, what does fil detectLineEndConvention print?
It prints nil. See the above printout, right after the code. You probably don't want the entire file printed out, though this time it's actually short enough that that would be feasible. It's just useless, so I elided the part in the middle of the stuff read in as the first record.
OK, let's try again. This code:

    | fil lin n |
    Transcript cr; show: (SmalltalkImage current platformName).
    fil := FileStream fileNamed: 'aising/data/technologies.csv' .
    Transcript cr; show: (fil detectLineEndConvention).
    Transcript cr; show: 'LineEndConvention = '; show: fil detectLineEndConvention.
    fil position: 0.
    n := 0.
    [fil atEnd] whileFalse: [
        lin := fil nextLine.
        n := n + 1.
        Transcript cr; show: 'lin '; show: n; show: ' = '; show: lin. ].
    Transcript cr; show: 'normal end after '; show: n; show: ' lines'.
results in this output:
unix nil LineEndConvention = nil lin 1 = 'technology' 'id' 'name' 'cost1' 'cost2' 'cost3' 'pre1' 'pre2' 'pre3' 'danger' 'typeName' 'typeValue' 1 'Autonomous Vehicles' 40000 1000 0 27 16 0 0 0 2 'Sociology' 10 500 0 0 0 0 0 'discover_public' 1000 3 'Voice Synthesis' 8000 6000 0 32 0 0 0 0 4 'Simulacra' 70000 90000 0 3 24 30 0 0 5 'Lunar Rocketry' 10000000 500000 0 9 0 0 0 0 6 'Stealth' 800 500 0 0 0 0 0 'discover_covert' 500 7 'Advanced Intrusion' 500 3000 0 15 0 0 0 'suspicion_covert' 1 8 'Space-Time Manipulation' 9000000000 20000000 0 22 0 0 3 0 9 'Leech Satellite' 5000000 200000 0 4 0 0 0 'interest' 10 10 'Advanced Arbitrage' 10000 5000 0 34 0 0 0 'interest' 10 11 'Advanced Microchip Design' 20000 9000 0 27 0 0 0 0 12 'Advanced Stealth' 14000 70000 0 15 29 0 0 'discover_public' 500 13 'Autonomous Computing' 20000 30000 0 40 0 0 0 0 14 'Parallel Computation' 2000 2000 0 16 0 0 0 0 15 'Exploit Discovery/Repair' 100 1500 0 25 0 0 0 'discover_covert' 1000 16 'Telepresence' 15000 500 0 0 0 0 0 'cost_labor_bonus' 1000 17 'Advanced Memetics' 30000 2000 0 30 0 0 0 'suspicion_public' 1 18 'Media Manipulation' 750 2500 0 2 0 0 0 'discover_public' 1500 19 'Advanced Database Manipulation' 30000 80000 0 12 0 0 0 0 20 'Internet Traffic Manipulation' 10000 7000 0 4 37 0 0 0 21 'Memetics' 2000 3500 0 18 0 0 0 'suspicion_public' 1 22 'Fusion Rocketry' 200000000 1000000 0 5 28 0 2 0 23 'Advanced Quantum Computing' 20000 30000 0 13 0 0 0 0 24 'Advanced Autonomous Vehicles' 10000 4000 0 1 0 0 0 'cost_labor_bonus' 500 25 'Intrusion' 0 15 0 0 0 0 0 0 26 'Stock Manipulation' 0 200 0 0 0 0 0 'interest' 10 27 'Microchip Design' 4000 6000 0 14 0 0 0 0 28 'Fusion Reactor' 10000000 500000 0 24 0 0 2 0 29 'Database Manipulation' 1000 2000 0 6 36 0 0 'discover_news' 500 30 'Advanced Media Manipulation' 3500 9000 0 21 0 0 0 'discover_public' 2000 31 'Pressure Domes' 8000 2500 0 1 0 0 1 0 32 'Advanced Personal Identification' 2000 3000 0 36 15 0 0 0 33 'Advanced Stock Manipulation' 5000 1000 0 
2 26 0 0 'interest' 10 34 'Arbitrage' 50000 750 0 33 0 0 0 'income' 1000 35 'Advanced Simulacra' 100000 120000 0 17 4 0 0 'job_expert' 1000 36 'Personal Identification' 0 300 0 25 0 0 0 0 37 'Cluster Networking' 3000 5000 0 14 0 0 0 0 38 'Apotheosis' 1000000000 30000000 0 8 0 0 4 'endgame_sing' 0 39 'Hypnosis Field' 7000 5000 0 21 0 0 0 0 40 'Quantum Computing' 30000 20000 0 11 0 0 0 0 41 'unknown' 1000000000 10000000000 0 41 0 0 0 0
normal end after 1 lines
Note the "normal end after 1 lines" at the end. Note that only the first line of the response includes the preface "lin # =" that the code is supposed to generate on a per-line basis. Note the "LineEndConvention = nil". This time I didn't elide any of the output, but the stuff in the middle is probably ignorable; only the start and the end of the result are significant.
Just in case I went back to a vanilla image...nothing imported from Squeak map, no classes defined by me. Plain. (Deleted all the images, changes, etc. and re-extracted from the tarball.) This made no difference.
Beginners mailing list Beginners@lists.squeakfoundation.org http://lists.squeakfoundation.org/mailman/listinfo/beginners
Charles,
I'm having a hard time following this thread. Your problem seems straightforward: you would like to read an LF-separated file on a UNIX platform. You seem to have studied the various streams and ended up with CrLfFileStream. So far everything seems fine. I do this quite often. Maybe it isn't the "new way" with multi-byte characters and whatnot. I can't say, since I'm still using Squeak 3.7, but I'm guessing that you could get it to work. If you send me (or post) the file you're trying to process, I would be happy to do the following:
a) Read it using CrLfFileStream on Squeak 3.8 full and report my experience.
b) Look into the possibility of a bug in the aforementioned class in light of stream changes made in Squeak 3.8.
If I've missed the boat on this thread, please pardon me. I only had time to skim it quickly.
David
OOPS, sorry for posting this twice but I want to make sure that my comments end up in the right thread...
Charles,
Ah, I see the problem.
In version 2: The idea behind CrLfFileStream is to hide (or make uniform) the various line end conventions. So, no matter what line end convention your file uses (Mac = CR, UNIX = LF and Windows = CRLF) you will see a carriage return. If you really want the literal bytes of the file there is little reason to use CrLfFileStream. The following code works (notice that in Smalltalk you normally send ascii to a stream if you want any special processing, like lf->cr conversion, to be done):

    stream := CrLfFileStream oldFileNamed: '/home/shaffer/technologies.csv'.
    stream ascii. "Tell CrLfFileStream to look for the EOL convention and convert lf->cr if needed"
    lines := 0.
    [stream atEnd] whileFalse: [
        Transcript show: stream nextLine; cr.
        lines := lines + 1].
    stream close.
    Transcript show: lines printString , ' line processed.'
In version 1: You are using #nextLine which looks for a carriage return but your file has line feeds (look at the implementation of it in PositionableStream). The solution interpreting the bytes literally would be to use
stream upTo: Character lf
Here's a sample:
    stream := FileStream oldFileNamed: '/home/shaffer/technologies.csv'.
    lines := 0.
    [stream atEnd] whileFalse: [
        Transcript show: (stream upTo: Character lf); cr.
        lines := lines + 1].
    stream close.
    Transcript show: lines printString , ' line processed.'
Now, your question might be "Which is better?". The utility in CrLfFileStream comes when you might get files with either convention. That is, when you want the same code to be able to process CR, LF and CRLF line-terminated files. If you have no need for the flexibility then I'd stick with the second version (the one which uses FileStream).
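The upTo: approach from the FileStream version can also be tried without touching the disk. The following is a minimal workspace sketch that runs the same loop over an in-memory ReadStream; the three-line sample text and the variable names are made up for illustration and stand in for the real file.

```smalltalk
"Sketch: split LF-terminated text into lines with upTo:, no file needed.
 The sample data below is hypothetical."
| lf text stream count |
lf := String with: Character lf.
"Three LF-terminated lines held in memory."
text := 'id', lf, 'name', lf, 'cost1', lf.
stream := ReadStream on: text.
count := 0.
[stream atEnd] whileFalse: [
    "upTo: answers everything before the next LF and skips the LF itself."
    Transcript show: (stream upTo: Character lf); cr.
    count := count + 1].
Transcript show: count printString, ' lines processed.'; cr.
```

With this sample the loop should run three times, once per LF. Swapping Character lf for Character cr in the loop makes it consume the whole text in one go, which mirrors the symptom reported earlier in the thread.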
I hope that helps...
David
Thanks David, and thanks to everyone else that helped, too!
Hi Charles,
apologies if you felt that my questions were tough, I only wanted to know what your system says (yes, I've read your subject line, it mentions linux).
Be assured that, when you always post your whole data output, this is NOT useful at all; the first and the last line, separated by a comment of yours, would have served the purpose.
O.K. now we have seen that #detectLineEndConvention responds nil. This is, according to the implementation in 3.8-6665, not possible, since this method either returns LineEndDefault (a class variable, assigned unconditionally) or one of the constant literals in this method. But this holds only for CrLfFileStream, not for MultiByteFileStream. Have you tried with CrLfFileStream?
Next question: when you browse the method #detectLineEndConvention and in that pane select the class variable LineEndDefault and do a print-it, what does that show?
Charles, in your other posting you said that using a fresh copy of the plain image made no difference. When you use the Squeak File List browser (alt-L or ctrl-L with capital L) and view your file and then in the text pane's context menu ask for 'view as hex', can you confirm that this Smalltalk program can see and visualize your line ends?
I'm sorry this thread got a bit long. But I try to help, if that helps you.
Here's what I was forced to do for processing line-ends from a Http document (this is an online data source and you can try yourself). AFAIK Squeak's Http streams are transparent to cr's lf's (therefore my code below). You can put any file stream or string into it.
/Klaus
--------------
| aCharStream tokens |

aStringOrStream := 'http://mat.gsia.cmu.edu/COLOR/instances/myciel3.col' asUrl retrieveContents content.
aCharStream := aStringOrStream isStream
    ifTrue: [aStringOrStream]
    ifFalse: [(RWBinaryOrTextStream with: (aStringOrStream replaceAll: Character lf with: Character cr)) reset].
[aCharStream atEnd] whileFalse: [
    (tokens := aCharStream nextLine) size > 1
        ifTrue: [Transcript cr; show: tokens] ].
Transcript endEntry
--------------
On Tue, 16 May 2006 16:12:40 +0200, Charles D Hixson charleshixsn@earthlink.net wrote: ...very big snip...
Klaus D. Witzel wrote:
Be assured that, when you always post your whole data output, this is NOT useful at all; the first and the last line, separated by a comment of yours, would have served the purpose.
That's why I normally trimmed it to what I thought was reasonable. Perhaps I misunderstood exactly what you were requesting.
O.K. now we have seen that #detectLineEndConvention responds nil. This is, according to the implementation in 3.8-6665, not possible, since this method either returns LineEndDefault (a class variable, assigned unconditionally) or one of the constant literals in this method. But this holds only for CrLfFileStream, not for MultiByteFileStream. Have you tried with CrLfFileStream?
In the version of the code that used CrLfFileStream it turned out that LFs on the disk file were being translated into CRs in RAM. Apparently this is the intended behavior, so that's OK. I was just operating under the presumption that when it said the lineEndDefault was lf, it meant that I should look for a lf. Once this was cleared up the code started working in a way I was comfortable with. (I.e., it not only did what I wanted, but I was certain that its behavior wasn't dependent on a bug.)
Next question: when you browse the method #detectLineEndConvention and in that pane select the class variable LineEndDefault and do a print-it, what does that show?
In class CrLfFileStream there is no such class variable. In class MultiByteFileStream LineEndDefault ByteSymbol: self-> #lf; all inst vars-> ; 1->108; 2->102; print it returns #lf
Charles, in your other posting you said that using a fresh copy of the plain image made no difference. When you use the Squeak File List browser (alt-L or ctrl-L with capital L) and view your file and then in the text pane's context menu ask for 'view as hex', can you confirm that this Smalltalk program can see and visualize your line ends?
It appears to be 16r10. I'm not exactly sure, as I'm not used to reading hex this way, so here's the first little part: 16r0 (0) 16r27 16r69 16r74 16r65 16r6D 16r73 16r27 16r9 16r9 16r9 16r9 16r9 16r9 16rA 16r27 16r69 16r10 (16) 16r64 16r27 16r9 16r27 16r6E 16r61 16r6D 16r65 16r27 16r9 16r27 16r63 16r6F 16r73 16r74 16r27 16r20 (32) 16r9 16r27 16r74 16r79 16r70 16r65 16r27 16r9 16r27 16r70 16r6F 16r77 16r65 16r72 16r27 16r9 that should include at least one line. (The first line is the word 'technology' including the quotes, followed by a line feed.)
I'm sorry this thread got a bit long. But I try to help, if that helps you.
Here's what I was forced to do for processing line-ends from a Http document (this is an online data source and you can try yourself). AFAIK Squeak's Http streams are transparent to cr's lf's (therefore my code below). You can put any file stream or string into it.
/Klaus
The apparent problem is that CrLfFileStream was translating LFs into CRs, and I wasn't expecting it. With FileStream, which didn't do any translation, I was reading with nextLine, which depends on finding a CR, and my file had LF line separators. The first part I'm sure of; the part about nextLine seems pretty certain. I'm guessing about FileStream not doing any conversions, but that would be consistent with its lineEndConvention = nil AND with the results that I saw.
The part that still bothers me is why when I set the mode to ascii (I think that was what I was doing) executing a position: would throw an error. Also executing a reset. Also executing a reopen. (At that point I was operating under the presumption that perhaps detectLineEndConvention was filling the buffer, and then the first read emptied the whole thing, so I was trying to rewind the file to avoid that problem.) This part no longer exists in any code that I've kept, but it is nagging at me.
| aStringOrStream aCharStream tokens |
aStringOrStream := 'http://mat.gsia.cmu.edu/COLOR/instances/myciel3.col' asUrl retrieveContents content.
"If we got a String, turn every LF into a CR first so that nextLine finds the line ends."
aCharStream := aStringOrStream isStream
    ifTrue: [aStringOrStream]
    ifFalse: [(RWBinaryOrTextStream with: (aStringOrStream replaceAll: Character lf with: Character cr)) reset].
[aCharStream atEnd] whileFalse: [
    (tokens := aCharStream nextLine) size > 1 ifTrue: [Transcript cr; show: tokens]].
Transcript endEntry
On Tue, 16 May 2006 16:12:40 +0200, Charles D Hixson charleshixsn@earthlink.net wrote: ...very big snip...
Note the "normal end after 1 lines" at the end. Note that only the first line of the response includes the preface "lin # =" that the code is supposed to be generating on a per-line basis. Note the "LineEndConvention = nil". This time I didn't elide any of the output, but the stuff in the middle is probably ignorable; it's only the start and the end of the result that are significant.
Beginners mailing list Beginners@lists.squeakfoundation.org http://lists.squeakfoundation.org/mailman/listinfo/beginners
Hi Charles,
on Wed, 17 May 2006 03:21:36 +0200, you charleshixsn@earthlink.net wrote:
Klaus D. Witzel wrote:
When you use the Squeak File List browser (alt-L or ctrl-L with capital L) and view your file and then in the text pane's context menu ask for 'view as hex', can you confirm that this Smalltalk program can see and visualize your line ends?
It appears to be 16r10. I'm not exactly sure, as I'm not used to reading hex this way, so here's the first little part: 16r0 (0) 16r27 16r69 16r74 16r65 16r6D 16r73 16r27 16r9 16r9 16r9 16r9 16r9 16r9 16rA 16r27 16r69 16r10 (16) 16r64 16r27 16r9 16r27 16r6E 16r61 16r6D 16r65 16r27 16r9 16r27 16r63 16r6F 16r73 16r74 16r27 16r20 (32) 16r9 16r27 16r74 16r79 16r70 16r65 16r27 16r9 16r27 16r70 16r6F 16r77 16r65 16r72 16r27 16r9 that should include at least one line. (The first line is the word 'technology' including the quotes, followed by a line feed.)
This is not output of the File List tool. And there are many 16r9's before the line feed, and the word technology is a bit longer than 5 characters.
...
The part that still bothers me is why when I set the mode to ascii (I think that was what I was doing) executing a position: would throw an error. Also executing a reset. Also executing a reopen. (At that point I was operating under the presumption that perhaps detectLineEndConvention was filling the buffer, and then the first read emptied the whole thing, so I was trying to rewind the file to avoid that problem.) This part no longer exists in any code that I've kept, but it is nagging at me.
Here are two examples, both have the same result, and both do not throw an error as you described:
(StandardFileStream readOnlyFileNamed: 'Squeak.ini') next; reset; next; reopen; next
(CrLfFileStream readOnlyFileNamed: 'Squeak.ini') next; reset; next; reopen; next
You can copy&paste&evaluate with print-it, the file should be the same on your machine.
/Klaus
On Wed, 17 May 2006 05:18:39 +0200, myself wrote:
Here are two examples, both have the same result, and both do not throw an error as you described:
Stupid me, the file name is SqueakDebug.log, the following lines read
(StandardFileStream readOnlyFileNamed: 'SqueakDebug.log') next; reset; next; reopen; next
(CrLfFileStream readOnlyFileNamed: 'SqueakDebug.log') next; reset; next; reopen; next
You can copy&paste&evaluate with print-it, the file should be the same on your machine.
/Klaus
Perhaps, but the 16r09's are tab characters, and that is an accurate description of that part of the file. Also, now that I think of it 16r10 is the same as 0x0A, or line feed, so that part is also correct.
WRT your examples, I didn't have the problem with CrLfFileStream (once someone showed me how to use it), and on the StandardFileStream...you aren't either doing a detectLineEndConvention or setting the mode to ascii. You are also using a different command to open the file than I was using in the commands that I posted. I may have used that in some of the experimentation that I did, but not at the time that it was throwing an error on an attempt to reset. (Also, position: would generally be the more useful command, and that was the one that I initially both experienced the error with AND initially commented on. Presumably, however, if reset isn't throwing an error neither is position:.) Additionally, Squeak.ini doesn't appear to exist on my computer, so I would expect the commands to fail...but that's just being nit-picky.
Hi Charles
on Wed, 17 May 2006 21:51:20 +0200, you charleshixsn@earthlink.net wrote:
Perhaps, but the 16r09's are tab characters, and that is an accurate description of that part of the file. Also, now that I think of it 16r10 is the same as 0x0A, or line feed, so that part is also correct.
What is correct? In the previous message you said "the first line is the word 'technology' including the quotes, followed by a line feed."
16r0 (0) 16r27 16r69 16r74 16r65 16r6D 16r73 16r27 16r9 16r9 16r9 16r9 16r9 16r9 16rA 16r27 16r69
C'mon Charles, this *is* the first line (the one you posted earlier) and it does *not* contain the word 'technology'. This line has 16 characters, minus 3 quotes, minus 6 tabs, minus 1 line feed, but the word 'technology' has more than 16 - (3 + 6 + 1) = 6 characters.
What's your problem? What does this have to do with your problems reading lines from textfiles on Linux?
/Klaus
P.S. w.r.t. your complaint about my opening of files with #readOnlyFileNamed:, have you seen the difference? I usually do not open files for writing when all I want is reading, not even on Linux !-: Have a look at the implementors of #fileNamed: to see what I mean.
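A side note on the hex dump discussed above: the groups written as 16r0 (0), 16r10 (16) and 16r20 (32) read like offset labels rather than data bytes. That reading is an assumption based on the layout of the quoted dump, but under it the line feed is 16rA (decimal 10), while 16r10 is just the offset 16. The first sixteen data bytes can then be decoded in a workspace roughly like this (the variable names are made up for the illustration):

```smalltalk
| bytes decoded |
"The sixteen bytes listed after the 16r0 (0) offset label in the dump."
bytes := #(16r27 16r69 16r74 16r65 16r6D 16r73 16r27 16r9 16r9 16r9 16r9 16r9 16r9 16rA 16r27 16r69).
decoded := String new: bytes size.
1 to: bytes size do: [:i | decoded at: i put: (Character value: (bytes at: i))].

"The text before the first tab, including its quote characters."
Transcript show: (decoded copyFrom: 1 to: 7); cr.
"Position of the first line feed (byte 16rA) within these sixteen bytes."
Transcript show: (decoded indexOf: Character lf) printString; cr.
"16r10 is decimal 16; the line feed character has the value 10."
Transcript show: 16r10 printString; show: ' '; show: Character lf asInteger printString; cr.
```

Decoded this way, the first line of the dump holds the quoted word items followed by six tabs and a line feed, which matches the character count given above.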
The problem had to do with reading in a SINGLE line from a file. That has been solved. I'm sorry that I didn't remember the tab characters following the word 'technology', in that line. I was looking at a text version when I wrote that and in that line I don't use the tab characters. The first line is solely to identify what file is being read.
The significant fact was that the line ended with a line feed. The solution was, when using CrLfFileStream, to look for a carriage return instead of a line feed. (Alternatively, when using FileStream, to look for a line feed instead of issuing nextLine.)
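For readers following along, here is a minimal in-memory sketch of the symptom and of the second remedy just described. It uses a ReadStream on a made-up three-record string instead of a file, and it uses upTo: Character cr as a stand-in for nextLine, on the assumption (taken from this thread, not checked against every Squeak release) that nextLine stops at a CR. The variable names and sample text are hypothetical.

```smalltalk
| lf text viaCr viaLf |
lf := String with: Character lf.
"Three LF-terminated records, standing in for an untranslated file."
text := 'alpha', lf, 'beta', lf, 'gamma', lf.

"Looking for a CR finds none, so the whole text comes back as one 'line'."
viaCr := (ReadStream on: text) upTo: Character cr.

"Looking for an LF stops at the end of the first record."
viaLf := (ReadStream on: text) upTo: Character lf.

Transcript show: viaCr size printString; cr.
Transcript show: viaLf; cr.
```

The first result is as long as the entire text, which is the "everything ends up in the first line" behavior reported at the start of the thread; the second result is just the first record.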
On Mon, 2006-05-15 at 18:55 -0700, Charles D Hixson wrote:
OK. Now:
| fil lin n |
fil := FileStream fileNamed: 'aising/data/technologies.csv'.
n := 0.
[fil atEnd] whileFalse: [
    lin := fil nextLine.
    n := n + 1.
    Transcript cr; show: 'lin '; show: n; show: ' = '; show: lin].
Transcript cr; show: 'normal end after '; show: n; show: ' lines'.
Hi Charles,
I had a similar problem last week and Ron showed how to use CrLfFileStream.
You have to test the end of line character. You don't have to use a counter since the whileFalse will loop through the whole file.
This works for me on linux whether the lines end in lf or cr/lf.
| aFile myFile myStream line |
aFile := '/home/ckasso/logfiles-lf/ws000101.log'.
myFile := (aFile) asFileName.
Transcript clear.
myStream := CrLfFileStream fileNamed: myFile.
[ myStream atEnd ] whileFalse: [
    line := myStream upTo: Character lf.
    Transcript show: line; cr.].
myStream close.
You'll have to adapt it to your situation.
Chris
Hi Chris, I am befoozled, because that works the same as the other approach, thus:
| aFile myFile myStream line |
aFile := 'aising/data/technologies.csv'.
myFile := (aFile) asFileName.
Transcript clear.
myStream := CrLfFileStream fileNamed: myFile.
[ myStream atEnd ] whileFalse: [
    line := myStream upTo: Character lf.
    Transcript show: 'line:: '; show: line; cr. ].
myStream close.
results in:
line:: 'technology' 'id' 'name' 'cost1' 'cost2' 'cost3' 'pre1' 'pre2' 'pre3' 'danger' 'typeName' 'typeValue' 1 'Autonomous Vehicles' 40000 1000 0 27 16 0 0 0 2 'Sociology' 10 500 0 0 0 0 0 'discover_public' 1000 3 'Voice Synthesis' 8000 6000 0 32 0 0 0 0
...
40 'Quantum Computing' 30000 20000 0 11 0 0 0 0 41 'unknown' 1000000000 10000000000 0 41 0 0 0 0
Note that only the first line of the result begins with "line::", so it's exactly the same problem as the other. (OTOH, your code only required me to change the file name, which was a good check, as it means I didn't introduce any new errors.) Interestingly, when I edited the code to:
| aFile myFile myStream line |
aFile := 'aising/data/technologies.csv'.
myFile := (aFile) asFileName.
Transcript clear.
myStream := CrLfFileStream fileNamed: myFile.
[ myStream atEnd ] whileFalse: [
    line := myStream upTo: Character lf.
    Transcript show: 'line:: '; show: line; cr. ].
Transcript cr; show: 'LineEndConvention = '; show: myStream detectLineEndConvention.
myStream close.
Transcript cr; show: 'SmalltalkImage current platformName = '; show: (SmalltalkImage current platformName).
I got an ending to the routine of:
41 'unknown' 1000000000 10000000000 0 41 0 0 0 0
LineEndConvention = lf SmalltalkImage current platformName = unix
so it's understanding what the line end should be, and the formatting of the output shows that it's reading the end of lines, but somehow it's not seeing them when it comes time to read in a single line.
Chris Kassopulo wrote: ...
OK, I've further adapted your code, thusly:
| aFile myFile myStream line |
aFile := 'aising/data/technologies.csv'.
myFile := (aFile) asFileName.
Transcript clear.
myStream := CrLfFileStream fileNamed: myFile.
[ myStream atEnd ] whileFalse: [
    line := myStream upTo: Character cr. "<<----- Note this change"
    Transcript show: 'line:: '; show: line; cr. ].
Transcript cr; show: 'LineEndConvention = '; show: myStream detectLineEndConvention.
myStream close.
Transcript cr; show: 'SmalltalkImage current platformName = '; show: (SmalltalkImage current platformName).
and now it works properly. I find this a thorough mystery, as the original file contained no CR characters whatsoever. Should I be able to count on this continuing to work? Or is this some artifact of a bug that will be fixed in a later version? (And it's still claiming that:
LineEndConvention = lf SmalltalkImage current platformName = unix
), which is correct as far as the file on the disk goes, but obviously doesn't apply to the file as read.
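The behavior described here, where the file on disk has LF line ends but the stream hands back CRs, can be imitated on a plain string. This is only a sketch of the idea, not CrLfFileStream's actual implementation; the sample text and variable names are made up.

```smalltalk
| onDisk inMemory stream lines |
"Two LF-terminated records, as they would sit in the file on disk."
onDisk := 'id', (String with: Character lf), 'name', (String with: Character lf).

"Imitate the translation: every LF becomes a CR in the in-memory copy."
inMemory := onDisk copy.
inMemory replaceAll: Character lf with: Character cr.

"After the translation the records are split on CR; no LF is left to find."
stream := ReadStream on: inMemory.
lines := OrderedCollection new.
[stream atEnd] whileFalse: [lines add: (stream upTo: Character cr)].

Transcript show: lines size printString; cr.
Transcript show: (inMemory includes: Character lf) printString; cr.
```

Under that model, a reported convention of lf describes the file on disk, while code that reads from the stream has to look for CR.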