CrLfFileStream as default?

lex at cc.gatech.edu
Sun Oct 25 21:51:45 UTC 1998


Dwight Hughes <dwighth at ipa.net> wrote:
> I would also like to see it handle pathological files -- I seem to run
> into a number of source files that have (for example) LF in one part, CR
> in another, CRLF in another, and, just for fun, mutants like LFCRLF,
> CRLFLF, CRLFCR, and CRCRLF sprinkled here and there. Perhaps
> CrLfFileStream should simply chomp all of the above combinations into a
> single CR on a read (which seems to be the "right thing" in most cases
> I've seen), and only bother with line end conventions when filing out.
> 
> I guess I'm wondering if this would be useful to others or if I'm just
> lucky.
> 

I think it would be useful, though you're quite "lucky" if you are actually seeing files as bad as you describe!  It's easy to do.  I think you only need to change next and next:, and remove references to lineEndConvention:

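	CrLfFileStream.next
	| char nextC |
	char _ super next.
	self isBinary ifTrue: [^ char ].
	char = Cr
		ifTrue: [ 
			"funny code because of how peek is implemented"
			nextC _ super next.
			"back up only after a real character that is not the LF of a CRLF pair;
			 this assumes super next answers nil at end of file without advancing"
			(nextC notNil and: [nextC ~= Lf])
				ifTrue: [super position: super position - 1].
			^ Cr].
	char = Lf ifTrue: [^ Cr].
	^ char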



	CrLfFileStream.next: n
		| string |
		string _ super next: n.
		string size = 0 ifTrue: [ ^string ].
		self isBinary ifTrue: [ ^string ].
		string _ string withSqueakLineEndings.
		string size = n ifTrue: [ ^string ].

		"string shrunk due to embedded crlfs; make up the difference"
		^string, (self next: n - string size)
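
To see what this rule does to a mixed file, here is a rough workspace sketch that applies the same per-character logic to an in-memory string.  It is not part of CrLfFileStream; the variable names are made up for illustration, and it uses := rather than _ only so that it also parses in images that do not accept the underscore assignment.

```smalltalk
"Workspace sketch (illustrative names only): map CR, LF and CRLF to a single CR"
| in out char cr lf |
cr := Character cr.
lf := Character lf.
"sample input: a CRLF b LF c CR"
in := ReadStream on: 'a', (String with: cr with: lf), 'b', (String with: lf), 'c', (String with: cr).
out := WriteStream on: String new.
[in atEnd] whileFalse: [
	char := in next.
	char = cr
		ifTrue: [
			"swallow the LF of a CRLF pair; peek answers nil at the end, which is never = lf"
			in peek = lf ifTrue: [in next].
			out nextPut: cr]
		ifFalse: [
			char = lf
				ifTrue: [out nextPut: cr]
				ifFalse: [out nextPut: char]]].
out contents
```

The result should be a, b and c each followed by exactly one CR.  Note that under this rule a mutant such as CRLFLF still comes out as two CRs (one for the CRLF, one for the stray LF), so collapsing those into a single line end would need an extra rule on top of this.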


There is still a problem with using CrLfFileStream as the default, however: dealing with file positions.  Right now, the file position in a CrLfFileStream will sometimes jump by 2 even though you've only read 1 Squeak-visible character (this happens whenever a CRLF pair is read as a single CR).  Either this needs to be sanctioned like it is in C (yick!), or code needs to be added to keep track of the "virtual" position in the file.
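
As a rough illustration of the second option, the workspace sketch below counts Squeak-visible characters alongside the raw stream position.  The names (virtual in particular) are hypothetical, it runs on an in-memory string rather than a file, and it only shows the counting on sequential reads; it is not a worked-out design.

```smalltalk
"Workspace sketch (illustrative names only): raw position versus 'virtual' position"
| in char cr lf virtual |
cr := Character cr.
lf := Character lf.
"six raw characters: a b CR LF c d"
in := ReadStream on: 'ab', (String with: cr with: lf), 'cd'.
virtual := 0.
[in atEnd] whileFalse: [
	char := in next.
	"a CRLF pair consumes two raw characters but counts as one visible character"
	(char = cr and: [in peek = lf]) ifTrue: [in next].
	virtual := virtual + 1].
Array with: in position with: virtual
```

Here the raw position ends at 6 while the virtual count ends at 5.  Counting forward like this is the easy half; setting the position to a given virtual offset would presumably need either a rescan from a known point or some remembered table of where the CRLF pairs are.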

With the latter approach, it's interesting that the position in ASCII mode is different than the position in binary mode.  But a file with mixed modes is a strange beast indeed, I would think.  If there is any Squeak binary data in a file, then the whole file may as well be considered a binary file, yes?

Lex


> -- Dwight 
> 
> Lex Spoon wrote:
> > 
> > If a file has a CRLF pair in it, and CrLfFileStream converts that to a CR, then the idea of "position" gets messed up.  It looks like 1 character, but "file position" will act like 2 characters have gone by.  For CrLfFileStream to go the final mile, it should probably have some code to make positions truly transparent to the user.
> > 
> > Other than that, CrLfFileStream seems very nice.  I've been using it as the default for several months now and found it nothing but convenient.
> > 
> > Lex
> > 
> > "Pennell" <pennell at tiac.net> wrote:
> > > Dan - thanks for adding this in 2.2.  Is there any reason not to make this
> > > the default?
> > > If you don't change the default for 2.2, can you add it as a preference?
> > >
> > > - david




