ImageSegments in Celeste

Lex Spoon lex at cc.gatech.edu
Sun Apr 23 23:49:39 UTC 2000


Celeste has a performance problem when you have a large mail database. 
The problem is simply that it takes a long time to save or load the
index file.

The below changes uses ImageSegments to make this faster.  On my
machine, Celeste loads and saves improve from ~14 seconds to ~4 seconds.
 WARNING, however: it's probably not the most robust system.  If you try
it, you can always revert to the regular format, by modifying
IndexFile>>writeOn: and changing the "true" to "false" at the
(hopefully) obvious location.

I must say, the image segment model is  little wierd to program around. 
For example, I had to change an instance of '' to ('' copy).    As
another example, opening an explorer on an index file can cause it to be
unable to save itself!

I'm not sure what the ultimate answer to these conundrums is.  But I
thought I'd share this much, just because the speed improvement changes
the entire feel of Celeste.  4 seconds!!


-Lex

====
'From Squeak2.8alpha of 9 March 2000 [latest update: #1974] on 23 April 2000 at 11:40:46 pm'!
"Change Set:		Celeste-fastIndex
Date:			21 April 2000
Author:			Lex Spoon

Uses ImageSegment's to store Celeste's index file.  Probably not very robust just yet...."
!

Magnitude subclass: #IndexFileEntry
	instanceVariableNames: 'messageFile msgID location textLength time from to cc subject tocLineCache '
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Network-Mail Reader'!

!IndexFile methodsFor: 'file operations' stamp: 'ls 4/20/2000 23:48'!
openOn: aFileName messageFile: messageFile
	"Initialize myself from the file with the given name."

	| fileStream |
	filename _ aFileName.
	fileStream _ FileStream fileNamed: aFileName.
	self readFrom: fileStream messageFile: messageFile.

		"close and release the file stream"
	fileStream _ nil.! !

!IndexFile methodsFor: 'read-write' stamp: 'ls 4/20/2000 23:44'!
readBinaryFrom: aStream messageFile: messageFile
	| numEncodedOutPointers encodedOutPointers outPointers segment restoredRoots |
	"Initialize myself from the given binary stream"

	"read in the out the encoded pointers"
	numEncodedOutPointers := Integer readFromString: (aStream nextLine).
	encodedOutPointers := (1 to: numEncodedOutPointers) collect: [ :ignored |
		aStream nextLine ].
	outPointers := encodedOutPointers collect: [ :ep |
		ep = '=MMM' ifTrue: [ messageFile ] ifFalse: [
			ep = '=nil' ifTrue: [ nil ] ifFalse: [
				Smalltalk at: ep asSymbol ] ] ].


	"read in the segment"
	aStream binary.
	segment := WordArray newFromStream: aStream.

	"extract the segment"
	restoredRoots := ImageSegment new loadSegmentFrom: segment outPointers: outPointers.
	msgDictionary := restoredRoots at: 1.
	timeSortedEntries := restoredRoots at: 2! !

!IndexFile methodsFor: 'read-write' stamp: 'ls 4/21/2000 00:13'!
readFrom: aStream messageFile: messageFile
	"Initialize myself from the given text stream. It is assumed that the .index file was written in order of ascending message timestamps, although this method is only less efficient, not incorrect, if this is not the case."

	| sorted lastTime msgID entry savedPosition |

	"check for binary format"
	savedPosition := aStream position.
	(aStream nextLine = self class binaryMagic) ifTrue: [
		^self readBinaryFrom: aStream messageFile: messageFile ].
	
	"nope.  do a normal read"
	aStream position: savedPosition.

	msgDictionary _ Dictionary new: 1000.
	timeSortedEntries _ (SortedCollection new: 1000) .
	sorted _ true.
	lastTime _ nil.
	[aStream atEnd] whileFalse:
		[msgID _ MailDB readIntegerLineFrom: aStream.
		 entry _ IndexFileEntry readFrom: aStream messageFile: messageFile msgID: msgID.
		 msgDictionary at: msgID put: entry.
		 timeSortedEntries addLast: (Association key: msgID value: entry).
		 ((sorted & lastTime notNil) and: [lastTime > entry time]) ifTrue:
			[sorted _ false].
		 lastTime _ entry time].
	sorted ifFalse: [timeSortedEntries reSort]. 	"re-sort if necessary"! !

!IndexFile methodsFor: 'read-write' stamp: 'ls 4/21/2000 20:28'!
writeBinaryOn: aStream
	| imageSegment encodedOutPointers |
	"Write my index entries to the given text stream in a fast binary format."

	aStream nextPutAll: self class binaryMagic ; cr.

	imageSegment := ImageSegment new copyFromRoots: {msgDictionary.  timeSortedEntries} sizeHint: (400 * msgDictionary size).

	encodedOutPointers := imageSegment outPointers collect: [ :p |
		p class == MessageFile ifTrue: [ '=MMM' ] ifFalse: [
			p ifNil: [ '=nil' ] ifNotNil: [
				p isBehavior ifFalse: [ self error: 'unexpected out pointer' ].
				p isMeta ifTrue: [ self error: 'unexpected metaclass' ].
				p name asString ] ]. ]. 

	"write the encoded out pointers out"
	aStream print: encodedOutPointers size; cr.
	encodedOutPointers do: [ :ep |
		aStream nextPutAll: ep.
		aStream cr ].

	"write the imagesegment data"
	aStream binary.
	imageSegment segment writeOn: aStream.
! !

!IndexFile methodsFor: 'read-write' stamp: 'ls 4/20/2000 23:44'!
writeOn: aStream
	"Write my index entries to the given text stream in human-readable form."
	"Note: For efficiency, this is done in order of increasing message timestamps, to save the cost of sorting when we read it back in. It is assumed that timeSortedEntries should contains exactly the same message ID's as msgDictionary."

	true ifTrue: [
		"use binary format"
		^self writeBinaryOn: aStream ].

	timeSortedEntries do:
		[: assoc |
		 (assoc key) printOn: aStream.		"message ID"
		 aStream cr.
		 (assoc value) writeOn: aStream].	"index entry"! !


!IndexFile class methodsFor: 'codes' stamp: 'ls 4/20/2000 23:26'!
binaryMagic
	"a string identifying a file as being in binary format"
	^'CELESTE-INDEX-BINARY'! !


!IndexFileEntry methodsFor: 'comparing' stamp: 'ls 4/20/2000 23:15'!
< anEntry
	^self value time <= anEntry value time! !


!MailDB methodsFor: 'housekeeping' stamp: 'ls 4/21/2000 01:06'!
appendMessages: msgBuffer messageFile: msgFile indexFile: idxFile
	"Append the given collection of messages to the message file. msgBuffer is a collection of (message ID, message text) pairs."

	| id msgText location entry |
	msgBuffer do:
		[: idAndText |
		 id _ idAndText at: 1.
		 msgText _ idAndText at: 2.
		 location _ msgFile basicAppend: msgText id: id.
		 entry _ IndexFileEntry
					message: (MailMessage from: msgText)
					location: location
					messageFile: msgFile
					msgID: id.
		 entry  location: location; textLength: msgText size.
		 idxFile at: id put: entry].! !


!MailMessage methodsFor: 'initialize-release' stamp: 'ls 4/21/2000 00:23'!
from: aString 
	"Parse aString to initialize myself."
	| parseStream isMime contentType bodyText contentTransferEncoding |
	time _ 0.
	from _ to _ cc _ subject _ '' copy.
	text _ aString withoutTrailingBlanks.
	parseStream _ ReadStream on: text.
	isMime _ false.
	"mdr: this does not appear to be used"
	contentType _ 'text/plain'.
	contentTransferEncoding _ nil.
	self fieldsFrom: parseStream do: 
		[:fName :fValue | 
		fName = 'date' ifTrue: [time _ (self timeFrom: fValue) ifNil: [ 0 ]].
		fName = 'from' ifTrue: [from _ fValue].
		fName = 'to'
			ifTrue: [to isEmpty
					ifTrue: [to _ fValue]
					ifFalse: [to _ to , ', ' , fValue]].
		fName = 'cc'
			ifTrue: [cc isEmpty
					ifTrue: [cc _ fValue]
					ifFalse: [cc _ cc , ', ' , fValue]].
		fName = 'subject' ifTrue: [subject _ fValue].
		fName = 'mime-version' ifTrue: [isMime _ true].
		fName = 'content-type' ifTrue: [contentType _ fValue copyUpTo: $;].
		fName = 'content-transfer-encoding' ifTrue: [contentTransferEncoding _ fValue asLowercase]].
	bodyText _ parseStream upToEnd.
	contentTransferEncoding = 'base64'
		ifTrue: 
			[bodyText _ Base64MimeConverter mimeDecodeToChars: (ReadStream on: bodyText).
			bodyText _ bodyText contents].
	contentTransferEncoding = 'quoted-printable' ifTrue: [bodyText _ bodyText decodeQuotedPrintable].
	body _ MIMEDocument contentType: contentType content: bodyText! !





More information about the Squeak-dev mailing list