[squeak-dev] The Trunk: Compression-fbs.38.mcz

commits at source.squeak.org
Thu Jul 18 18:10:24 UTC 2013


Frank Shearar uploaded a new version of Compression to project The Trunk:
http://source.squeak.org/trunk/Compression-fbs.38.mcz

==================== Summary ====================

Name: Compression-fbs.38
Author: fbs
Time: 18 July 2013, 5:30:21.449 pm
UUID: 290267cb-781a-854e-a734-4367d109ca0f
Ancestors: Compression-fbs.37

Move CompressedSourceStream to Compression, making Files (in conjunction with System-fbs.570) independent of Compression.

=============== Diff against Compression-fbs.37 ===============

Item was added:
+ ReadWriteStream subclass: #CompressedSourceStream
+ 	instanceVariableNames: 'segmentFile segmentSize nSegments segmentTable segmentIndex dirty endOfFile'
+ 	classVariableNames: ''
+ 	poolDictionaries: ''
+ 	category: 'Compression-Streams'!
+ 
+ !CompressedSourceStream commentStamp: 'nice 3/24/2010 07:36' prior: 0!
+ I implement a file format that compresses segment by segment to allow incremental writing and browsing.  Note that the file can only be written at the end.
+ 
+ Structure:
+ segmentFile		The actual compressed file.
+ segmentSize		This is the quantum of compression.  The virtual file is sliced up
+ 				into segments of this size.
+ nSegments		The maximum number of segments to which this file can be grown.
+ endOfFile		The user's endOfFile pointer.
+ segmentTable	When a file is open, this table holds the physical file positions
+ 				of the compressed segments.
+ segmentIndex	Index of the most recently accessed segment.
+ 
+ Inherited from ReadWriteStream...
+ collection		The segment buffer, uncompressed
+ position			This is the position *local* to the current segment buffer
+ readLimit		ReadLimit for the current buffer
+ writeLimit		WriteLimit for the current buffer
+ 
+ Great care must be exercised to distinguish between the position relative to the segment buffer and the full file position (and, of course, the segment file position ;-).
+ 
+ The implementation defaults to a buffer size of 20k, and a max file size of 34MB (conveniently chosen to be greater than the current 33MB limit of source code pointers).  The format of the file is as follows:
+ 	segmentSize		4 bytes
+ 	nSegments		4 bytes
+ 	endOfFile		4 bytes
+ 	segmentTable	4 bytes * (nSegments+1)
+ 	beginning of first compressed segment
+ 
+ It is possible to override the default allocation by sending the message #segmentSize:maxSize: immediately after opening a new file for writing, as follows:
+ 
+ 	bigFile := (CompressedSourceStream on: (FileStream newFileNamed: 'biggy.stc'))
+ 			segmentSize: 50000 maxSize: 200000000
+ 
+ The difference between segment table entries reveals the size of each compressed segment.  When a file is being written, it may lack the final segment, but any flush, position:, or close will force a dirty segment to be written.!
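A rough workspace sketch of the header arithmetic described above (an illustration, not part of this commit): with the default segmentSize of 20000 and maxSize of 34000000, the segment count and the location of the first compressed segment follow from the layout table.

```smalltalk
"Hypothetical workspace snippet: header-size arithmetic implied by the
 format above. Variable names are illustrative only."
| segmentSize nSegments firstSegmentLoc |
segmentSize := 20000.
nSegments := 34000000 // segmentSize + 2.        "1702 (see segmentSize:maxSize:)"
"Three 4-byte header words plus nSegments+1 table entries precede the data:"
firstSegmentLoc := (3 + nSegments + 1) * 4.      "6824"
```

So under the defaults, the first compressed segment starts 6824 bytes into the physical file, matching what #firstSegmentLoc computes in the diff below.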

Item was added:
+ ----- Method: CompressedSourceStream class>>on: (in category 'as yet unclassified') -----
+ on: aFile
+ 	^ self basicNew openOn: aFile!

Item was added:
+ ----- Method: CompressedSourceStream>>atEnd (in category 'access') -----
+ atEnd
+ 
+ 	position >= readLimit ifFalse: [^ false].  "more in segment"
+ 	^ self position >= endOfFile  "more in file"!

Item was added:
+ ----- Method: CompressedSourceStream>>binary (in category 'open/close') -----
+ binary
+ 	self error: 'Compressed source files are ascii to the user (though binary underneath)'!

Item was added:
+ ----- Method: CompressedSourceStream>>close (in category 'open/close') -----
+ close
+ 	self flush.
+ 	segmentFile close!

Item was added:
+ ----- Method: CompressedSourceStream>>contentsOfEntireFile (in category 'access') -----
+ contentsOfEntireFile
+ 	| contents |
+ 	self position: 0.
+ 	contents := self next: self size.
+ 	self close.
+ 	^ contents!

Item was added:
+ ----- Method: CompressedSourceStream>>fileID (in category 'private') -----
+ fileID  "Only needed for OSProcess stuff"
+ 	^ segmentFile fileID
+ !

Item was added:
+ ----- Method: CompressedSourceStream>>firstSegmentLoc (in category 'private') -----
+ firstSegmentLoc
+ 	"First segment follows 3 header words and segment table"
+ 	^ (3 + nSegments+1) * 4!

Item was added:
+ ----- Method: CompressedSourceStream>>flush (in category 'access') -----
+ flush
+ 	dirty ifTrue:
+ 		["Write buffer, compressed, to file, and also write the segment offset and eof"
+ 		self writeSegment].!

Item was added:
+ ----- Method: CompressedSourceStream>>next (in category 'access') -----
+ next
+ 	<primitive: 65>
+ 	position >= readLimit
+ 		ifTrue: [^ (self next: 1) at: 1]
+ 		ifFalse: [^ collection at: (position := position + 1)]!

Item was added:
+ ----- Method: CompressedSourceStream>>next: (in category 'access') -----
+ next: n
+ 	| str |
+ 	n <= (readLimit - position) ifTrue:
+ 		["All characters are available in buffer"
+ 		str := collection copyFrom: position + 1 to: position + n.
+ 		position := position + n.
+ 		^ str].
+ 
+ 	"Read limit could be segment boundary or real end of file"
+ 	(readLimit + self segmentOffset) = endOfFile ifTrue:
+ 		["Real end of file -- just return what's available"
+ 		^ self next: readLimit - position].
+ 
+ 	"Read rest of segment.  Then (after positioning) read what remains"
+ 	str := self next: readLimit - position.
+ 	self position: self position.
+ 	^ str , (self next: n - str size)
+ !

Item was added:
+ ----- Method: CompressedSourceStream>>nextChunk (in category 'access') -----
+ nextChunk
+ 	self flag: #workAround. 	"all accessors should decode utf8"
+ 	^super nextChunk utf8ToSqueak!

Item was added:
+ ----- Method: CompressedSourceStream>>nextPut: (in category 'access') -----
+ nextPut: char
+ 	"Slow, but we don't often write, and then not a lot"
+ 	self nextPutAll: char asString.
+ 	^ char!

Item was added:
+ ----- Method: CompressedSourceStream>>nextPutAll: (in category 'access') -----
+ nextPutAll: str
+ 	| n nInSeg |
+ 	n := str size.
+ 	n <= (writeLimit - position) ifTrue:
+ 		["All characters fit in buffer"
+ 		collection replaceFrom: position + 1 to: position + n with: str.
+ 		dirty := true.
+ 		position := position + n.
+ 		readLimit := readLimit max: position.
+ 		endOfFile := endOfFile max: self position.
+ 		^ str].
+ 
+ 	"Write what fits in segment.  Then (after positioning) write what remains"
+ 	nInSeg := writeLimit - position.
+ 	nInSeg = 0
+ 		ifTrue: [self position: self position.
+ 				self nextPutAll: str]
+ 		ifFalse: [self nextPutAll: (str first: nInSeg).
+ 				self position: self position.
+ 				self nextPutAll: (str allButFirst: nInSeg)].
+ 	^str
+ 	
+ !

Item was added:
+ ----- Method: CompressedSourceStream>>openOn: (in category 'open/close') -----
+ openOn: aFile
+ 	"Open the receiver."
+ 	segmentFile := aFile.
+ 	segmentFile binary.
+ 	segmentFile size > 0
+ 	ifTrue:
+ 		[self readHeaderInfo.  "If file exists, then read the parameters"]
+ 	ifFalse:
+ 		[self segmentSize: 20000 maxSize: 34000000.  "Otherwise write default values"]!

Item was added:
+ ----- Method: CompressedSourceStream>>openReadOnly (in category 'open/close') -----
+ openReadOnly
+ 
+ 	segmentFile openReadOnly!

Item was added:
+ ----- Method: CompressedSourceStream>>position (in category 'access') -----
+ position
+ 
+ 	^ position + self segmentOffset!

Item was added:
+ ----- Method: CompressedSourceStream>>position: (in category 'access') -----
+ position: newPosition
+ 	| compressedBuffer newSegmentIndex |
+ 	newPosition > endOfFile ifTrue:
+ 		[self error: 'Attempt to position beyond the end of file'].
+ 	newSegmentIndex := (newPosition // segmentSize) + 1.
+ 	newSegmentIndex ~= segmentIndex ifTrue:
+ 		[self flush.
+ 		segmentIndex := newSegmentIndex.
+ 		newSegmentIndex > nSegments ifTrue:
+ 			[self error: 'file size limit exceeded'].
+ 		segmentFile position: (segmentTable at: segmentIndex).
+ 		(segmentTable at: segmentIndex+1) = 0
+ 			ifTrue:
+ 			[newPosition ~= endOfFile ifTrue:
+ 				[self error: 'Internal logic error'].
+ 			collection size = segmentSize ifFalse:
+ 				[self error: 'Internal logic error'].
+ 			"just leave garbage beyond end of file"]
+ 			ifFalse:
+ 			[compressedBuffer := segmentFile next: ((segmentTable at: segmentIndex+1) - (segmentTable at: segmentIndex)).
+ 			collection :=  (GZipReadStream on: compressedBuffer) upToEnd asString].
+ 		readLimit := collection size min: endOfFile - self segmentOffset].
+ 	position := newPosition \\ segmentSize.
+ 	!

Item was added:
+ ----- Method: CompressedSourceStream>>readHeaderInfo (in category 'open/close') -----
+ readHeaderInfo
+ 	| valid a b |
+ 	segmentFile position: 0.
+ 	segmentSize := segmentFile nextNumber: 4.
+ 	nSegments := segmentFile nextNumber: 4.
+ 	endOfFile := segmentFile nextNumber: 4.
+ 	segmentFile size < (nSegments+1 + 3 * 4) ifTrue: "Check for reasonable segment info"
+ 		[self error: 'This file is not in valid compressed source format'].
+ 	segmentTable := (1 to: nSegments+1) collect: [:x | segmentFile nextNumber: 4].
+ 	segmentTable first ~= self firstSegmentLoc ifTrue:
+ 		[self error: 'This file is not in valid compressed source format'].
+ 	valid := true.
+ 	1 to: nSegments do:  "Check that segment offsets are ascending"
+ 		[:i | a := segmentTable at: i.  b := segmentTable at: i+1.
+ 		(a = 0 and: [b ~= 0]) ifTrue: [valid := false].
+ 		(a ~= 0 and: [b ~= 0]) ifTrue: [b <= a ifTrue: [valid := false]]].
+ 	valid ifFalse:
+ 		[self error: 'This file is not in valid compressed source format'].
+ 	dirty := false.
+ 	self position: 0.!

Item was added:
+ ----- Method: CompressedSourceStream>>readOnlyCopy (in category 'open/close') -----
+ readOnlyCopy
+ 
+ 	^ self class on: segmentFile readOnlyCopy!

Item was added:
+ ----- Method: CompressedSourceStream>>segmentOffset (in category 'private') -----
+ segmentOffset
+ 
+ 	^ segmentIndex - 1 * segmentSize!

Item was added:
+ ----- Method: CompressedSourceStream>>segmentSize:maxSize: (in category 'private') -----
+ segmentSize: segSize maxSize: maxSize
+ 	"Note that this method can be called after the initial open, provided that no
+ 	writing has yet taken place.  This is how to override the default segmentation."
+ 	self size = 0 ifFalse: [self error: 'Cannot set parameters after the first write'].
+ 	segmentFile position: 0.
+ 	segmentFile nextNumber: 4 put: (segmentSize := segSize).
+ 	segmentFile nextNumber: 4 put: (nSegments := maxSize // segSize + 2).
+ 	segmentFile nextNumber: 4 put: (endOfFile := 0).
+ 	segmentTable := Array new: nSegments+1 withAll: 0.
+ 	segmentTable at: 1 put: self firstSegmentLoc.  "Loc of first segment, always."
+ 	segmentTable do: [:i | segmentFile nextNumber: 4 put: i].
+ 	segmentIndex := 1.
+ 	collection := String new: segmentSize.
+ 	writeLimit := segmentSize.
+ 	readLimit := 0.
+ 	position := 0.
+ 	endOfFile := 0.
+ 	self writeSegment.
+ !

Item was added:
+ ----- Method: CompressedSourceStream>>size (in category 'access') -----
+ size
+ 	^ endOfFile ifNil: [0]!

Item was added:
+ ----- Method: CompressedSourceStream>>test (in category 'open/close') -----
+ test
+ 	"FileDirectory default deleteFileNamed: 'test.stc'.
+ 	(CompressedSourceStream on: (FileStream newFileNamed: 'test.stc')) fileOutChanges"
+ 
+ 	"FileDirectory default deleteFileNamed: 'test2.stc'.
+ 	((CompressedSourceStream on: (FileStream newFileNamed: 'test2.stc'))
+ 		segmentSize: 100 maxSize: 100000) fileOutChanges"
+ 
+ 	"FileDirectory default deleteFileNamed: 'test3.st'.
+ 	(FileStream newFileNamed: 'test3.st') fileOutChanges"
+ 
+ 	"(CompressedSourceStream on: (FileStream oldFileNamed: 'test.stc')) contentsOfEntireFile"
+ !

Item was added:
+ ----- Method: CompressedSourceStream>>writeSegment (in category 'private') -----
+ writeSegment
+ 	"The current segment must be the last in the file."
+ 	| compressedSegment |
+ 	segmentFile position: (segmentTable at: segmentIndex).
+ 	compressedSegment := ByteArray streamContents:
+ 		[:strm | (GZipWriteStream on: strm) nextPutAll: collection asByteArray; close].
+ 	segmentFile nextPutAll: compressedSegment.
+ 	segmentTable at: segmentIndex + 1 put: segmentFile position.
+ 
+ 	segmentFile position: 2 * 4.
+ 	segmentFile nextNumber: 4 put: endOfFile.
+ 	segmentFile position: (segmentIndex + 3) * 4.
+ 	segmentFile nextNumber: 4 put: (segmentTable at: segmentIndex + 1).
+ 	dirty := false!


