[Newbies] Re: reading lines from textfiles on Linux

Charles D Hixson charleshixsn at earthlink.net
Wed May 17 01:21:36 UTC 2006


Klaus D. Witzel wrote:
> Hi Charles,
>
> apologies if you felt that my questions were tough, I only wanted to
> know what your system says (yes, I've read your subject line, it
> mentions linux).
>
> Be assured that always posting your whole data output is NOT useful
> at all; the first and the last line, separated by a comment of
> yours, would have served the purpose.
That's why I normally trimmed it to what I thought was reasonable. 
Perhaps I misunderstood exactly what you were requesting.
>
> O.K. now we have seen that #detectLineEndConvention responds nil. This
> is, according to the implementation in 3.8-6665, not possible, since
> this method either returns LineEndDefault (a class variable, assigned
> unconditionally) or one of the constant literals in this method. But
> this holds only for CrLfFileStream, not for MultiByteFileStream. Have
> you tried with CrLfFileStream?
In the version of the code that used CrLfFileStream it turned out that
LFs on the disk file were being translated into CRs in RAM.  Apparently
this is the intended behavior, so that's OK.  I was just operating under
the presumption that when it said the lineEndDefault was lf, it meant
that I should look for a lf.  Once this was cleared up the code started
working in a way I was comfortable with.  (I.e., it not only did what I
wanted, but I was certain that its behavior wasn't dependent on a bug.)
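
A minimal sketch of reading such a file once the translation is taken
into account. This is not the code from this thread: it assumes that
CrLfFileStream answers readOnlyFileNamed: as the other file stream
classes do, and 'items.txt' is a hypothetical file name.

```smalltalk
"Sketch only. 'items.txt' is a hypothetical LF-terminated text file."
| stream count |
stream := CrLfFileStream readOnlyFileNamed: 'items.txt'.
count := 0.
[[stream atEnd] whileFalse: [
    "LFs on disk arrive as CRs in memory, so split on CR."
    count := count + 1.
    Transcript cr; show: 'lin # = ', count printString, ' ',
        (stream upTo: Character cr)]]
            ensure: [stream close].
Transcript endEntry
```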
>
> Next question: when you browse the method #detectLineEndConvention and
> in that pane select the class variable LineEndDefault and do a
> print-it, what does that show?
In class CrLfFileStream there is no such class variable.
In class MultiByteFileStream, inspecting LineEndDefault shows a
ByteSymbol:  self-> #lf; all inst vars-> ; 1->108; 2->102; and print-it
returns #lf
>
> Charles, in your other posting you said that using a fresh copy of the
> plain image made no difference. When you use the Squeak File List
> browser (alt-L or ctrl-L with capital L) and view your file and then
> in the text pane's context menu ask for 'view as hex', can you
> confirm that this Smalltalk program can see and visualize your line ends?
It appears to be 16rA (decimal 10, a line feed).  I'm not exactly sure,
as I'm not used to reading hex this way, so here's the first little part:
16r0 (0)     16r27 16r69 16r74 16r65 16r6D 16r73 16r27 16r9 16r9 16r9
16r9 16r9 16r9 16rA 16r27 16r69
16r10 (16)     16r64 16r27 16r9 16r27 16r6E 16r61 16r6D 16r65 16r27 16r9
16r27 16r63 16r6F 16r73 16r74 16r27
16r20 (32)     16r9 16r27 16r74 16r79 16r70 16r65 16r27 16r9 16r27 16r70
16r6F 16r77 16r65 16r72 16r27 16r9
That should include at least one line.  (The first line in this dump is
the word 'items', including the quotes, followed by six tabs and a line
feed.)
>
> I'm sorry this thread got a bit long. But I try to help, if that helps
> you.
>
> Here's what I was forced to do for processing line-ends from a Http
> document (this is an online data source and you can try yourself).
> AFAIK Squeak's Http streams are transparent to cr's lf's (therefore my
> code below). You can put any file stream or string into it.
>
> /Klaus
The apparent problem had two parts.  CrLfFileStream was translating LFs
into CRs, and I wasn't expecting it.  FileStream, meanwhile, didn't do
any translation, but I was reading it with nextLine, which depends on
finding a CR, and my file had LF line separators.  The first part I'm
sure of, and the part about nextLine seems pretty certain.  I'm guessing
about FileStream not doing any conversions, but that would be consistent
with its lineEndConvention = nil AND with the results that I saw.
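
A minimal sketch of the alternative for an untranslated stream: ask for
the LF separator explicitly with upTo: instead of relying on nextLine.
The sample text is made up for illustration, and an in-memory ReadStream
stands in for the file stream.

```smalltalk
"Sketch only: split LF-separated text without relying on nextLine."
| text stream lines |
text := 'first' , (String with: Character lf) ,
        'second' , (String with: Character lf).
stream := ReadStream on: text.
lines := OrderedCollection new.
"upTo: answers everything before the next LF and skips past the LF."
[stream atEnd] whileFalse: [lines add: (stream upTo: Character lf)].
lines do: [:each | Transcript cr; show: each].
Transcript endEntry
```

With the sample text above, lines ends up holding 'first' and 'second'.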

The part that still bothers me is why, when I set the mode to ascii (I
think that was what I was doing), executing a position: would throw an
error, as would executing a reset or a reopen.  (At that point I was
operating under the presumption that perhaps detectLineEndConvention
was filling the buffer, and then the first read emptied the whole
thing, so I was trying to rewind the file to avoid that problem.)  This
part no longer exists in any code that I've kept, but it is nagging at
me.
>
> --------------
>  | aStringOrStream aCharStream tokens |
>
>     aStringOrStream :=
> 'http://mat.gsia.cmu.edu/COLOR/instances/myciel3.col' asUrl
> retrieveContents content.
>     aCharStream := aStringOrStream isStream
>                 ifTrue: [aStringOrStream]
>                 ifFalse: [(RWBinaryOrTextStream
>                         with: (aStringOrStream replaceAll: Character
> lf with: Character cr)) reset].
>     [aCharStream atEnd]
>         whileFalse: [(tokens := aCharStream nextLine) size > 1 ifTrue:
> [Transcript cr; show: tokens]
>             ].
>     Transcript endEntry
> --------------
>
> On Tue, 16 May 2006 16:12:40 +0200, Charles D Hixson
> <charleshixsn at earthlink.net> wrote:
> ...very big snip...
>> Note the: "normal end after 1 lines" at the end.  Note that only the
>> first line of the response includes the preface "lin # =" that the code
>> is supposed to be generating on a per line basis.  Note the
>> "LineEndConvention = nil".  This time I didn't elide any of the output,
>> but the stuff in the middle is probably ignorable, it's only the start
>> and the end of the result that are significant.
>
>
> _______________________________________________
> Beginners mailing list
> Beginners at lists.squeakfoundation.org
> http://lists.squeakfoundation.org/mailman/listinfo/beginners
>


