Extending FileList with CrLf

Andreas Raab andreas.raab at gmx.de
Tue Jul 22 23:48:29 UTC 2003


> CrLfFileStream isn't robust during reads, because it decides early on 
> what the line-ending convention is and sticks to it.

It only determines the line-end convention so that it doesn't change it when
writing a file. Reading is entirely unaffected.

> So if you have a file that starts off with CR line ends, and has a 
> chunk with CRLF line ends later, you'll have garbage LF characters.

Nope. Did you ever try it? Look at CrLfStream>>next as an example:

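next
    "Answer the next character, translating any line end into a Squeak cr."
    | char secondChar |
    char := super next.
    self isBinary ifTrue: [^char].
    char == Cr ifTrue:
        ["CR: swallow a following LF (CRLF); otherwise push the peeked character back"
        secondChar := super next.
        secondChar ifNotNil: [secondChar == Lf ifFalse: [self skip: -1]].
        ^Cr].
    "A lone LF also counts as one line end"
    char == Lf ifTrue: [^Cr].
    ^char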

So an LF is skipped if it comes right after a CR, and mapped to CR if it
comes on its own (which seems the most reasonable interpretation to me).
Whatever you do, if you get an LF out of the stream, something's broken.
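Here is a rough Python transcription of the next method above, for readers without a Squeak image. It is a sketch only, not Squeak code. The function name read_squeak_text is hypothetical, and the function works on an in-memory string rather than a file stream. It emits Squeak's internal line end (CR) for every CR, every CRLF pair and every lone LF. The example input starts with CR endings and switches to CRLF and LF later, which is the case raised in the quoted message.

```python
CR = "\r"
LF = "\n"


def read_squeak_text(data: str) -> str:
    """Hypothetical transcription of the Smalltalk next method, applied to a whole string.

    Every CR, CRLF pair, or lone LF becomes a single CR (Squeak's cr).
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        char = data[i]
        i += 1
        if char == CR:
            # Peek at the next character and consume it only if it is an LF.
            # (The Smalltalk code reads it and calls skip: -1 otherwise.)
            if i < n and data[i] == LF:
                i += 1
            out.append(CR)
        elif char == LF:
            # A lone LF is treated as a line end on its own.
            out.append(CR)
        else:
            out.append(char)
    return "".join(out)


# Mixed endings: CR first, then CRLF, then LF. No LF survives.
print(repr(read_squeak_text("one\rtwo\r\nthree\nfour")))
# 'one\rtwo\rthree\rfour'
```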

> A reasonable algorithm that will deal with most mixed-ending files 
> pretty well would be to take each string of consecutive CRs and/or 
> LFs and count how many CRs and how many LFs. Then take the minimum of 
> those two counts (as long as that minimum is > 0) and report that 
> many Squeak crs.
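The run-counting proposal quoted above could be sketched in Python as follows. This only illustrates the quoted idea and is not something CrLfFileStream does. The name run_count_crs is hypothetical. The quote does not say what happens when a run contains only CRs or only LFs (minimum of zero). This sketch assumes such a run gives one cr per character, i.e. the larger of the two counts, which matches the quoted "CR -> cr" and "LF -> cr" cases.

```python
import re

# A maximal run of consecutive CR and/or LF characters.
LINE_END_RUN = re.compile(r"[\r\n]+")


def run_count_crs(data: str) -> str:
    """Replace each CR/LF run with Squeak crs using the quoted min-count rule."""

    def replace(match: re.Match) -> str:
        run = match.group(0)
        crs = run.count("\r")
        lfs = run.count("\n")
        smaller = min(crs, lfs)
        # Assumption (not in the quote): a run of only CRs or only LFs
        # yields one cr per character.
        count = smaller if smaller > 0 else max(crs, lfs)
        return "\r" * count

    return LINE_END_RUN.sub(replace, data)


for ending in ["\r", "\r\n", "\n", "\r\n\r\n", "\r\n\n", "\r\r\n", "\n\r\n"]:
    print(repr(ending), "->", run_count_crs("a" + ending + "b").count("\r"), "cr")
```

For the seven endings listed in the quote, this gives 1, 1, 1, 2, 1, 1 and 1 crs respectively.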

I don't like this much, as it sounds overly complex. In almost all cases
line endings are consistent (if they aren't, the file hardly counts as
text), so CrLfFileStream does the best it can. For your examples this
means:

> CR -> cr
> CRLF -> cr
> LF -> cr
> CRLFCRLF -> cr cr

Those all work.

> CRLFLF -> cr
> CRCRLF -> cr
> LFCRLF -> cr

Those would all come out as cr cr, which makes sense to me.
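The table above can be checked against a compact restatement of the next rule: CRLF counts as one line end, and any other CR or LF counts as one on its own. In Python, that rule becomes a single regular-expression substitution in which the CRLF alternative is tried first. The function name crlf_stream_crs is hypothetical. The regex is an equivalent restatement of the rule, not Squeak's implementation.

```python
import re

# CRLF is tried first, so it collapses to one cr; a lone CR or LF is one cr.
LINE_END = re.compile(r"\r\n|\r|\n")


def crlf_stream_crs(ending: str) -> int:
    """Count the Squeak crs produced for a line-ending sequence."""
    return LINE_END.sub("\r", ending).count("\r")


cases = {
    "CR": "\r",
    "CRLF": "\r\n",
    "LF": "\n",
    "CRLFCRLF": "\r\n\r\n",
    "CRLFLF": "\r\n\n",
    "CRCRLF": "\r\r\n",
    "LFCRLF": "\n\r\n",
}
for name, ending in cases.items():
    print(f"{name:9} -> {crlf_stream_crs(ending)} cr")
```

The first four cases give one, one, one and two crs, as in the quoted list. The last three give two crs each, where the quoted proposal expected one. That difference is exactly what is being debated.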

Cheers,
  - Andreas



More information about the Squeak-dev mailing list