CrLfFileStream as default?

William O. Dargel wDargel at shoshana.com
Fri Nov 6 01:34:55 UTC 1998


Lex Spoon wrote:

> If a file has a CRLF pair in it, and CrLfFileStream converts that to 
> a CR, then the idea of "position" gets messed up.  It looks like 
> 1 character, but "file position" will act like 2 characters have 
> gone by.  For CrLfFileStream to go the final mile, it should 
> probably have some code to make positions truly transparent to 
> the user.

I've been bitten by the "position mismatch" in a number of places. Us
poor windoze users probably get it more than most, having CrLf as the
default. Attached is a change set that fixes the problems I've seen.

Doing a 'save as' of the image messes up the source pointers to the
changes file when the Cr's get written as CrLf's. I changed the copy to
use 'binary' mode, which seems appropriate.

Both #peek and #upTo: in StandardFileStream assume the 1:1 correlation
between characters and positions. The re-writes for CrLfFileStream fix
that. I'm sure there are other parts to the protocol that also need
fixing -- I just haven't run across them yet.

Also included is a #correctLineEndings method that makes a copy of a
file while changing every type of line ending into the current default.
I've found it to be handy in dealing with certain "problem" files where
the line endings have been mixed and matched.

I also explored the slowness of some file parsing I was doing. By
changing CrLfFileStream>>next (and 3 other methods) I got better than a
factor of 5 speed up.

I agree with others here that the FileStream (and probably the whole
Stream) hierarchy needs some rework to both clean it up and make it more
flexible. However, I'm hoping that the attached changes can get
incorporated into the standard image, while we figure out what we really
want to do.

I've added below the more detailed description from the change set.

-------------------------------------------
Bill Dargel            wdargel at shoshana.com
Shoshana Technologies
100 West Joy Road, Ann Arbor, MI 48105  USA
------------------------------------------------------------------

Change Set:  CrLfFileStream updates
Date:   5 November 1998
Author:   William O. Dargel

Some bug fixes and enhancements for CrLfFileStream for Squeak 2.2.

FileDirectory>>copyFileNamed:toFileNamed: - Set both files to #binary
mode in order to make an exact copy. Fixes problem with 'saveAs'
possibly converting Cr to CrLf in the changes file, thereby messing up
the image offsets to the source code.

CrLfFileStream>>peek - Fixes the erroneous assumption made in
StandardFileStream that #next will always change the position by 1. Note
that this implementation in CrLfFileStream could be promoted, replacing
StandardFileStream's.

CrLfFileStream>>upTo: - Fixes the assumption made by StandardFileStream
that positions and character counts are directly correlated. Uses the
simple, one character at a time implementation similar to
PositionableStream. At one point I implemented a complicated buffer
based version, but found that after optimizing #next, there was no
appreciable speed difference between it and the simple version.

CrLfFileStream>>next - Optimized to be about 5.5 times faster when just
doing #next through a large file. This was accomplished by adding a
caching instance variable for 'lineEndFirstCharacter', used instead of
doing a dictionary lookup each time. This is updated whenever the
line-ending convention is set. The speedup was measured for
Windows (NT 4.0), and is predicated on a fairly efficient primitive for
getting a single character from the file system, YMMV.

CrLfFileStream>>lineEndConvention: - Setter method for
lineEndConvention. Updates the 'lineEndFirstCharacter' caching instance
variable. Also useful for explicit external control of the
lineEndConvention on individual file instances.

CrLfFileStream>>detectLineEndConvention - Uses the lineEndConvention:
setter. Also changed to use a 'nil' lineEndConvention while it is
operating in order to have next and peek operate in a basic manner.

CrLfFileStream>>binary - Changed to use the lineEndConvention: setter.

FileStream>>correctLineEndings - A utility that re-writes a file
changing all flavors of line endings, Cr, Lf and CrLf, into the current
CrLfFileStream default line ending. Makes use of a temporary file,
renaming it after copying and deleting the original. I hooked this
method up as an option on the FileList menu.
'From Squeak 2.2 of Sept 23, 1998 on 5 November 1998 at 8:23:18 pm'!
"Change Set:		CrLfFileStream updates
Date:			5 November 1998
Author:			William O. Dargel

Some bug fixes and enhancements for CrLfFileStream for Squeak 2.2.

FileDirectory>>copyFileNamed:toFileNamed: - Set both files to #binary mode in order to make an exact copy. Fixes problem with 'saveAs' possibly converting Cr to CrLf in the changes file, thereby messing up the image offsets to the source code.

CrLfFileStream>>peek - Fixes the erroneous assumption made in StandardFileStream that #next will always change the position by 1. Note that this implementation in CrLfFileStream could be promoted, replacing StandardFileStream's.

CrLfFileStream>>upTo: - Fixes the assumption made by StandardFileStream that positions and character counts are directly correlated. Uses the simple, one character at a time implementation similar to PositionableStream. At one point I implemented a complicated buffer based version, but found that after optimizing #next, there was no appreciable speed difference between it and the simple version.

CrLfFileStream>>next - Optimized to be about 5.5 times faster when just doing #next through a large file. This was accomplished by adding a caching instance variable for 'lineEndFirstCharacter', used instead of doing a dictionary lookup each time. This is updated whenever the line-ending convention is set. The speedup was measured for Windows (NT 4.0), and is predicated on a fairly efficient primitive for getting a single character from the file system, YMMV.

CrLfFileStream>>lineEndConvention: - Setter method for lineEndConvention. Updates the 'lineEndFirstCharacter' caching instance variable. Also useful for explicit external control of the lineEndConvention on individual file instances.

CrLfFileStream>>detectLineEndConvention - Uses the lineEndConvention: setter. Also changed to use a 'nil' lineEndConvention while it is operating in order to have next and peek operate in a basic manner.

CrLfFileStream>>binary - Changed to use the lineEndConvention: setter.

FileStream>>correctLineEndings - A utility that re-writes a file changing all flavors of line endings, Cr, Lf and CrLf, into the current CrLfFileStream default line ending. Makes use of a temporary file, renaming it after copying and deleting the original. I hooked this method up as an option on the FileList menu.
"!

StandardFileStream subclass: #CrLfFileStream
	instanceVariableNames: 'lineEndConvention lineEndFirstCharacter '
	classVariableNames: 'Cr CrLf Lf LineEndDefault LineEndStrings LookAheadCount '
	poolDictionaries: ''
	category: 'System-Files'!

!FileDirectory methodsFor: 'file operations' stamp: 'wod 11/5/1998 18:41'!
copyFileNamed: fileName1 toFileNamed: fileName2
	"Copy the contents of the existing file with the first name into a new file with the second name. Both files are assumed to be in this directory."
	"FileDirectory default copyFileNamed: 'todo.txt' toFileNamed: 'todocopy.txt'"

	| file1 file2 buffer |
	file1 _ (self readOnlyFileNamed: fileName1) binary.
	file2 _ (self newFileNamed: fileName2) binary.
	buffer _ String new: 50000.
	[file1 atEnd] whileFalse:
		[file2 nextPutAll: (file1 nextInto: buffer)].
	file1 close.
	file2 close.
! !


!FileStream methodsFor: 'as yet unclassified' stamp: 'wod 11/5/1998 17:46'!
correctLineEndings
		"Correct the line-endings in the receiver by writing a new copy of the file. All combinations of line-endings found, Cr, Lf and CrLf, are converted into the default used by CrLfFileStream."
    | tempName out char |
    tempName _ self directory fullNameFor: '~LinEnd~.tmp'.
    out _ CrLfFileStream newFileNamed: tempName.
    self reset.
    [self atEnd] whileFalse:
        [char _ self next.
        char == Character cr
            ifTrue:
                [out cr.
                self peek == Character lf ifTrue: [self next]]
            ifFalse:
                [char == Character lf
                    ifTrue: [out cr]
                    ifFalse: [out nextPut: char]]].
    self close.
    out close.
    self directory deleteFileNamed: self name.
    self directory
        rename: tempName
        toBe: self name.! !


!CrLfFileStream methodsFor: 'open/close' stamp: 'wod 11/5/1998 17:31'!
lineEndConvention: aSymbolOrNil
		"Set the line-ending convention of the receiver to be #cr, #lf, #crlf, or nil."
	lineEndFirstCharacter _ aSymbolOrNil ifNotNil: [(LineEndStrings at: aSymbolOrNil) first].
	^ lineEndConvention _ aSymbolOrNil.
! !

!CrLfFileStream methodsFor: 'access' stamp: 'wod 6/18/1998 14:00'!
binary
	super binary.
	self lineEndConvention: nil! !

!CrLfFileStream methodsFor: 'access' stamp: 'wod 11/5/1998 15:01'!
detectLineEndConvention
	"Detect the line end convention used in this stream. The result may be either #cr, #lf or #crlf."
	| char numRead pos |
	self isBinary ifTrue: [^ self error: 'Line end conventions are not used on binary streams'].
	self lineEndConvention: nil.
	numRead _ 0.
	pos _ self position.
	self position: 0.
	[self atEnd not and: [numRead < LookAheadCount]] whileTrue: 
		[char _ self next.
		char = Lf ifTrue: 
			[self position: pos.
			^ self lineEndConvention: #lf].
		char = Cr ifTrue: 
			[self peek = Lf
				ifTrue: [self lineEndConvention: #crlf]
				ifFalse: [self lineEndConvention: #cr].
			self position: pos.
			^ lineEndConvention].
		numRead _ numRead + 1].
	self position: pos.
	^ lineEndConvention ifNil: [self lineEndConvention: LineEndDefault]! !

!CrLfFileStream methodsFor: 'access' stamp: 'wod 11/5/1998 17:49'!
next
    | char secondChar |
    char _ super next.
    self isBinary ifTrue: [^char].
    char == Cr ifTrue:
        [secondChar _ super next.
        secondChar ifNotNil: [secondChar == Lf ifFalse: [self skip: -1]].
        ^Cr].
    char == Lf ifTrue: [^Cr].
    ^char! !

!CrLfFileStream methodsFor: 'access' stamp: 'wod 6/18/1998 13:52'!
peek
	"Answer what would be returned if the message next were sent to the receiver. If the receiver is at the end, answer nil.  "
	| next pos |
	self atEnd ifTrue: [^ nil].
	pos _ self position.
	next _ self next.
	self position: pos.
	^ next! !

!CrLfFileStream methodsFor: 'access' stamp: 'wod 11/5/1998 14:15'!
upTo: aCharacter
	| newStream char |
	newStream _ WriteStream on: (String new: 100).
	[(char _ self next) isNil or: [char == aCharacter]]
		whileFalse: [newStream nextPut: char].
	^ newStream contents
! !





More information about the Squeak-dev mailing list