[ENH][FIX] Fulltext: Misc enhancements and fixes

Scott A Crosby crosby at qwes.math.cmu.edu
Mon Feb 18 07:19:36 UTC 2002


MiscEnhancements includes about 8 independent fixes&enhancements:

  - Small documentation fixes,
  - Fixing a buglet in creating ReadWriteStreams with on:
  - Giving Writestream an understanding of lf/crlf messages.
  - Giving Socket a lowlevel try-to-send-data function
  - Buffer size changes for Socket>>getData
  - Giving SharedQueue the ability to safetly check its size with critical
           sections.
[1]- Altered GC parameters.
[2]- More flexible CharacterSet generation, and more predefined
charactersets.
[3]- A tenure operation.


[1] is *suggested* for full text indexing; its a GC-heavy task, so the
    higher parameters are definitely worth it.
[2] is *required* for full text indexing.
[3] is *used* by full text indexing. (My use of this is not necessary, but
    a good idea.)

I could split this changeset up on class-boundries, but then there'd be
another 7 changesets floating around. As I'm already dealing with over a
dozen, I'd rather not split this one up before posting. (Besides, all of
the changes are trivial.)

Note that part [2] is *required* to to be filed in before the DemoAdaptors
are filed in. So, filein this changeset before Fulltext-DemoAdaptors.

--

I also include an unrelated changeset where I try to reduce the number of
explicit invocations of Garbage Collection on the Smalltalk side. This
changeset is *unrelated* to fulltext.


Scott








-------------- next part --------------
'From Squeak3.2alpha of 11 October 2001 [latest update: #4646] on 28 January 2002 at 6:44:53 pm'!

!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 9/22/2001 02:25'!
allAlphabetic
	"return a set containing only alphabetic characters"
	^self allMatching: [ :char | char isLetter].

 ! !

!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 9/22/2001 02:23'!
allCharacters
	"return a set containing all characters"
	^self allMatching: [ :unused | true ]! !

!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 9/22/2001 02:25'!
allMatching: aBlock 
	"return a set containing all characters matching some predicate."
	| set character |
	set _ self empty.
	0
		to: 255
		do: [:ascii | 
			character _ Character value: ascii.
			(aBlock value: character)
				ifTrue: [set add: character]].
	^ set! !

!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 9/22/2001 02:30'!
allNumerical
	"return a set containing only alphabetic characters"
	^ self
		allMatching: [:char | char isDigit]! !

!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 10/1/2001 17:24'!
allSeperator
	"return a set containing only alphabetic characters"
	^ self
		allMatching: [:char | char isSeparator]! !


!SharedQueue methodsFor: 'testing' stamp: 'sac 8/9/2001 13:22'!
critical: aBlock
	"Evaluate aBlock with the accessProtect semaphore held. Do NOT mutate the queue during aBlock."
	"This is to allow variable operation based on the sharedqueue length, for example,
	 disabling writers when the queue gets too long

		queue critical: [queue size > 10. ifTrue: self writerDisable wait]

	And the writer runs
		self wait.
		self signal."

	accessProtect critical: aBlock.

! !


!Socket methodsFor: 'sending-receiving' stamp: 'sac 9/24/2001 01:41'!
getData
	"Get some data"
	| buf bytesRead |
	(self waitForDataUntil: Socket standardDeadline)
		ifFalse: [self error: 'getData timeout'].
	buf _ String new: 4096.
	bytesRead _ self
				primSocket: socketHandle
				receiveDataInto: buf
				startingAt: 1
				count: buf size.
	^ buf copyFrom: 1 to: bytesRead! !

!Socket methodsFor: 'sending-receiving' stamp: 'sac 9/24/2001 03:17'!
receiveDataInto: aStringOrByteArray at: starting
	"Receive data into the given buffer and return the number of bytes 
	received. Note the given buffer may be only partially filled by the 
	received data."
	^ self
		primSocket: socketHandle
		receiveDataInto: aStringOrByteArray
		startingAt: starting
		count: (aStringOrByteArray size - starting) + 1! !

!Socket methodsFor: 'sending-receiving' stamp: 'sac 9/29/2001 05:31'!
trySendSomeData: aStringOrByteArray startIndex: startIndex count: count 
	"Send up to count bytes of the given data starting at the given index.  
	Answer the number of bytes actually sent. Do not do any timeouts for data not sent."
	"Note: This operation may have to be repeated multiple times to send a  
	large amount of data."
	^ self
		primSocket: socketHandle
		sendData: aStringOrByteArray
		startIndex: startIndex
		count: count! !


!SystemDictionary methodsFor: 'memory space' stamp: 'sac 1/28/2002 18:12'!
tenure
        "Incrementally collect all objects and tenure. This is useful after allo
cating a large array or after making a large array point to a new object."
     |oldtenure|
     oldtenure _ self vmParameterAt: 6.
        self vmParameterAt: 6 put: 0.
        self garbageCollectMost.
        self vmParameterAt: 6 put: oldtenure.
! !

!SystemDictionary methodsFor: 'snapshot and quit' stamp: 'sac 1/28/2002 18:13'!
setGCParameters
        "Adjust the VM's default GC parameters to avoid premature tenuring."

        Smalltalk vmParameterAt: 5 put: 4000.  "do an incremental GC after this 
many allocations"
        Smalltalk vmParameterAt: 6 put: 2000.  "tenure when more than this many 
objects survive the GC"
        Smalltalk vmParameterAt: 5 put: 400000.
        Smalltalk vmParameterAt: 6 put: 12000.
! !


!UndefinedObject methodsFor: 'testing' stamp: 'sac 9/24/2001 07:12'!
isNil
	"Refer to the comment in ProtoObject|isNil."
	^ true! !


!WriteStream methodsFor: 'character writing' stamp: 'sac 8/10/2001 15:01'!
crlf
	"Append a return character to the receiver."

	self nextPut: Character cr.
	self nextPut: Character lf.! !

!WriteStream methodsFor: 'character writing' stamp: 'sac 8/10/2001 15:00'!
lf
	"Append a return character to the receiver."

	self nextPut: Character lf! !


!ReadWriteStream methodsFor: 'accessing' stamp: 'sac 9/24/2001 03:47'!
on: aCollection 
	super on: aCollection.
	readLimit _ aCollection size.
! !

-------------- next part --------------
'From Squeak3.2alpha of 3 October 2001 [latest update: #4411] on 13 December 2001 at 12:47:09 pm'!
"Change Set:		AntiGarbageCollect
Date:			13 December 2001
Author:			Scott Crosby

Try to reduce the amount of garbage collections within the system. This code fixups a variety of other code to use a new Utilities>>tryToFree: and Utilities tryHardToFree: functions.

I have no way of testing out many of these changes to confirm if this code works OK. By inspection, it should."
!


!AIFFFileReader methodsFor: 'private' stamp: 'sac 12/13/2001 12:16'!
readSamplesChunk: chunkSize 
	"Read a SSND chunk. All AIFF files with a non-zero frameCount contain 
	exactly one chunk of this type."
	| offset blockSize bytesOfSamples s |
	skipDataChunk
		ifTrue: [in skip: chunkSize.
			^ self].
	offset _ in nextNumber: 4.
	blockSize _ in nextNumber: 4.
	(offset ~= 0
			or: [blockSize ~= 0])
		ifTrue: [^ self error: 'this AIFF reader cannot handle blocked sample chunks'].
	bytesOfSamples _ chunkSize - 8.
	bytesOfSamples = (channelCount * frameCount * (bitsPerSample // 8))
		ifFalse: [self error: 'actual sample count does not match COMM chunk'].
	(mergeIfStereo
			and: [channelCount = 2])
		ifTrue: [channelData _ Array
						with: (SoundBuffer newMonoSampleCount: frameCount)]
		ifFalse: [channelData _ (1 to: channelCount)
						collect: [:i | SoundBuffer newMonoSampleCount: frameCount]].
	(Utilities tryToFree: bytesOfSamples +300000 )
		ifTrue: [s _ ReadStream
						on: (in next: bytesOfSamples)
			"bulk-read, then process"]
		ifFalse: [s _ in].
	"not enough space to buffer; read directly from file"
	"mono and stereo are special-cased for better performance"
	channelCount = 1
		ifTrue: [^ self readMonoChannelDataFrom: s].
	channelCount = 2
		ifTrue: [mergeIfStereo
				ifTrue: [channelCount _ 1.
					^ self readMergedStereoChannelDataFrom: s]
				ifFalse: [^ self readStereoChannelDataFrom: s]].
	self readMultiChannelDataFrom: s! !


!BookMorph methodsFor: 'menu' stamp: 'sac 12/13/2001 12:35'!
loadImagesIntoBook
	"PowerPoint stores GIF presentations as individual slides named Slide1,  
	Slide2, etc.  
	Load these into the book. mjg 9/99"
	| directory filenumber form newpage |
	directory _ ((StandardFileMenu oldFileFrom: FileDirectory default)
				ifNil: [^ nil]) directory.
	directory isNil
		ifTrue: [^ nil].
	"Start loading 'em up!!"
	filenumber _ 1.
	[directory fileExists: 'Slide' , filenumber asString]
		whileTrue: [Transcript show: 'Slide' , filenumber asString;
				 cr.
			(Utilities tryHardToFree: 1000000)
				ifFalse: ["Make some room"
					(self valueOfProperty: #url)
							== nil
						ifTrue: [self savePagesOnURL]
						ifFalse: [self saveAsNumberedURLs]].
			form _ Form
						fromFileNamed: (directory fullNameFor: 'Slide' , filenumber asString).
			newpage _ PasteUpMorph new extent: form extent.
			newpage
				addMorph: (SketchMorph withForm: form).
			self pages addLast: newpage.
			filenumber _ filenumber + 1].
	"After adding all, delete the first page."
	self goToPage: 1.
	self deletePageBasic.
	"Save the book"
	(self valueOfProperty: #url)
			== nil
		ifTrue: [self savePagesOnURL]
		ifFalse: [self saveAsNumberedURLs]! !


!DisplayScreen methodsFor: 'private' stamp: 'sac 12/13/2001 12:28'!
newDepthNoRestore: pixelSize 
	"Change depths. Check if there is enough space!! , di"
	| area need |
	pixelSize = depth
		ifTrue: [^ self"no change"].
	pixelSize abs < self depth
		ifFalse: ["Make sure there is enough space"
			area _ Display boundingBox area.
			"pixels"
			Smalltalk isMorphic
				ifFalse: [ScheduledControllers scheduledWindowControllers
						do: [:aController | "This should be refined..."
							aController view cacheBitsAsTwoTone
								ifFalse: [area _ area + aController view windowBox area]]].
			need _ area * (pixelSize abs - self depth) // 8 + Smalltalk lowSpaceThreshold.
			"new bytes needed"
			(Utilities tryHardToFree: need)
				ifFalse: [self error: 'Insufficient free space']].
	self setExtent: self extent depth: pixelSize.
	Smalltalk isMorphic
		ifFalse: [ScheduledControllers updateGray].
	DisplayScreen startUp! !


!MailInboxFile methodsFor: 'scanning' stamp: 'sac 12/13/2001 11:47'!
delimitersDo: aBlock 
	"Invoke the given block for each message in the mail inbox. The block  
	argument is the text of a new message."
	| fileStream stream |
	fileStream _ FileStream fileNamed: filename.
	"Can we fit it in without GC?"
	(Utilities tryToFree: fileStream size + 200000)
		ifTrue: ["if possible, buffer the entire file in memory for speed"
			stream _ ReadStream on: fileStream contentsOfEntireFile.
			fileStream _ nil]
		ifFalse: ["otherwise, use the actual file stream, reading from disk"
			stream _ fileStream].
	[self scanToNextMessageIn: stream]
		whileTrue: [aBlock
				value: (MailDB readStringLineFrom: stream)].
	fileStream = nil
		ifFalse: [fileStream close]! !

!MailInboxFile methodsFor: 'scanning' stamp: 'sac 12/13/2001 11:47'!
mailMessagesDo: aBlock 
	"Invoke the given block for each message in the mail inbox. The block  
	argument is the text of a new message."
	| fileStream stream msgStart msgSize msgText |
	fileStream _ CrLfFileStream readOnlyFileNamed: filename.
	"See if we have enough RAM without GC."

	(Utilities tryToFree: fileStream size + 200000)
		ifTrue: ["if possible, buffer the entire file in memory for speed"
			stream _ ReadStream on: fileStream contentsOfEntireFile.
			fileStream _ nil]
		ifFalse: ["otherwise, use the actual file stream, reading from disk"
			stream _ fileStream].
	self scanToNextMessageIn: stream.
	MailDB skipRestOfLine: stream.
	"skip message delimiter"
	msgStart _ stream position.
	[self scanToNextMessageIn: stream]
		whileTrue: [msgSize _ stream position - msgStart.
			stream position: msgStart.
			msgText _ stream next: msgSize.
			MailDB skipRestOfLine: stream.
			"skip message delimiter"
			msgStart _ stream position.
			aBlock value: msgText].
	"process final message"
	msgSize _ stream position - msgStart.
	msgSize > 0
		ifTrue: [stream position: msgStart.
			msgText _ stream next: msgSize.
			aBlock value: msgText].
	fileStream = nil
		ifFalse: [fileStream close]! !


!MHMailInboxFile methodsFor: 'scanning' stamp: 'sac 12/13/2001 12:02'!
mailMessagesDo: aBlock 
	"Invoke the given block for each message in the mail inbox. The block 
	argument is the text of a new message."
	| fileStream stream textStart textSize msgText |
	fileStream _ FileStream fileNamed: filename.
	Smalltalk garbageCollect.
	(Utilities tryToFree: fileStream size + 200000)
		ifTrue: ["if possible, buffer the entire file in memory for speed"
			stream _ ReadStream on: fileStream contentsOfEntireFile.
			fileStream _ nil]
		ifFalse: ["otherwise, use the actual file stream, reading from disk"
			stream _ fileStream].
	[self scanToNextMessageIn: stream]
		whileTrue: ["skip the three-line message delimiter"
			3
				timesRepeat: [MailDB skipRestOfLine: stream].
			textStart _ stream position.
			self scanToNextMessageIn: stream.
			textSize _ stream position - textStart.
			stream position: textStart.
			msgText _ stream next: textSize.
			aBlock value: msgText].
	fileStream = nil
		ifFalse: [fileStream close]! !


!Project methodsFor: 'displaying' stamp: 'sac 12/13/2001 12:25'!
showZoom
	"Decide if user wants a zoom transition, and if there is enough memory 
	to do it."
	^ Preferences projectZoom
		and: ["Only show zoom if there is room for both displays plus a 
			megabyte "
			(Utilities tryToFree: (Display boundingBox area * displayDepth // 8 + 1000000))]! !


!StandardSystemView methodsFor: 'displaying' stamp: 'sac 12/13/2001 12:20'!
deEmphasizeForDebugger
	"Carefully de-emphasis this window because a debugger is being opened. 
	Care must be taken to avoid invoking potentially buggy window display 
	code that could cause a recursive chain of errors eventually resulting 
	in a virtual machine crash. In particular, do not de-emphasize the 
	subviews. "
	self deEmphasizeView.
	"de-emphasize this top-level view"
	self uncacheBits.
	(Utilities tryToFree:  1000000)
		ifTrue: ["if there is enough space, cache current window screen bits"
			self cacheBitsAsIs]! !


!SystemDictionary methodsFor: 'memory space' stamp: 'sac 12/13/2001 11:35'!
bytesFree
	"Answer the number of bytes of that are available without a GC."
	| params youngSpaceEnd memoryEnd |
	params _ Smalltalk getVMParameters.
	youngSpaceEnd _ params at: 2.
	memoryEnd _ params at: 3.

	^ (memoryEnd - youngSpaceEnd) ! !

!SystemDictionary methodsFor: 'memory space' stamp: 'sac 12/13/2001 12:35'!
lowSpaceWatcher
	"Wait until the low space semaphore is signalled, then take appropriate  
	actions."
	| free |
	(Utilities tryHardToFree: self lowSpaceThreshold)
		ifFalse: ["free space must be above threshold before starting low  
			space watcher"
			^ self beep].
	LowSpaceSemaphore _ Semaphore new.
	self primLowSpaceSemaphore: LowSpaceSemaphore.
	self primSignalAtBytesLeft: self lowSpaceThreshold.
	"enable low space interrupts"
	LowSpaceSemaphore wait.
	"wait for a low space condition..."
	self primSignalAtBytesLeft: 0.
	"disable low space interrupts"
	self primLowSpaceSemaphore: nil.
	LowSpaceProcess _ nil.
	"Note: user now unprotected until the low space watcher is re-installed"
	self memoryHogs isEmpty
		ifFalse: [free _ self bytesLeft.
			self memoryHogs
				do: [:hog | hog freeSomeSpace].
			self bytesLeft > free
				ifTrue: [^ self installLowSpaceWatcher]].
	Smalltalk isMorphic
		ifTrue: [CurrentProjectRefactoring currentInterruptName: 'Space is low']
		ifFalse: [ScheduledControllers interruptName: 'Space is low']! !

!SystemDictionary methodsFor: 'memory space' stamp: 'sac 12/13/2001 12:35'!
okayToProceedEvenIfSpaceIsLow
	"Return true if either there is enough memory to do so safely or if the  
	user gives permission after being given fair warning."
	(Utilities tryHardToFree: self lowSpaceThreshold)
		ifTrue: [^ true].
	"work harder"
	^ self confirm: 'WARNING: There is not enough space to start the low space watcher.
If you proceed, you will not be warned again, and the system may
run out of memory and crash. If you do proceed, you can start the
low space notifier when more space becomes available simply by
opening and then closing a debugger (e.g., by hitting Cmd-period.)
Do you want to proceed?'! !


!Utilities class methodsFor: 'vm statistics' stamp: 'sac 12/13/2001 12:18'!
tryHardToFree: anInteger 
	"Try to free at least the requested number of bytes. Do an incremental 
	GC or full GC only if necesssary"
	Smalltalk bytesFree > anInteger
		ifTrue: [^ true].
	Smalltalk garbageCollectMost.
	Smalltalk bytesFree > anInteger
		ifTrue: [^ true].
	Smalltalk garbageCollect.
	Smalltalk bytesFree > anInteger
		ifTrue: [^ true].
	^ false! !

!Utilities class methodsFor: 'vm statistics' stamp: 'sac 12/13/2001 12:43'!
tryReallyHardToFree: anInteger 
	"Try to free at least the requested number of bytes. Do an incremental  
	GC or full GC only if necesssary. Also, may unload various items if 
	really really necessary. (Not yet implemented)"
	(self tryHardToFree: anInteger)
		ifTrue: [^ true].
	"Put in more code here? Maybe put in: 
	AbstractScound unloadSampledTimbres ?"
	^ false! !

!Utilities class methodsFor: 'vm statistics' stamp: 'sac 12/13/2001 12:18'!
tryToFree: anInteger 
	"Try to free at least the requested number of bytes. Do an incremental 
	GC only if necesssary. Do not do a fullGC."
	Smalltalk bytesFree > anInteger
		ifTrue: [^ true].
	Smalltalk garbageCollectMost.
	Smalltalk bytesFree > anInteger
		ifTrue: [^ true].
	^ false! !



More information about the Squeak-dev mailing list