[ENH][FIX] Fulltext: Misc enhancements and fixes
Scott A Crosby
crosby at qwes.math.cmu.edu
Mon Feb 18 07:19:36 UTC 2002
MiscEnhancements includes about 8 independent fixes&enhancements:
- Small documentation fixes,
- Fixing a buglet in creating ReadWriteStreams with on:
- Giving Writestream an understanding of lf/crlf messages.
- Giving Socket a lowlevel try-to-send-data function
- Buffer size changes for Socket>>getData
- Giving SharedQueue the ability to safetly check its size with critical
sections.
[1]- Altered GC parameters.
[2]- More flexible CharacterSet generation, and more predefined
charactersets.
[3]- A tenure operation.
[1] is *suggested* for full text indexing; its a GC-heavy task, so the
higher parameters are definitely worth it.
[2] is *required* for full text indexing.
[3] is *used* by full text indexing. (My use of this is not necessary, but
a good idea.)
I could split this changeset up on class-boundries, but then there'd be
another 7 changesets floating around. As I'm already dealing with over a
dozen, I'd rather not split this one up before posting. (Besides, all of
the changes are trivial.)
Note that part [2] is *required* to to be filed in before the DemoAdaptors
are filed in. So, filein this changeset before Fulltext-DemoAdaptors.
--
I also include an unrelated changeset where I try to reduce the number of
explicit invocations of Garbage Collection on the Smalltalk side. This
changeset is *unrelated* to fulltext.
Scott
-------------- next part --------------
'From Squeak3.2alpha of 11 October 2001 [latest update: #4646] on 28 January 2002 at 6:44:53 pm'!
!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 9/22/2001 02:25'!
allAlphabetic
"return a set containing only alphabetic characters"
^self allMatching: [ :char | char isLetter].
! !
!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 9/22/2001 02:23'!
allCharacters
"return a set containing all characters"
^self allMatching: [ :unused | true ]! !
!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 9/22/2001 02:25'!
allMatching: aBlock
"return a set containing all characters matching some predicate."
| set character |
set _ self empty.
0
to: 255
do: [:ascii |
character _ Character value: ascii.
(aBlock value: character)
ifTrue: [set add: character]].
^ set! !
!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 9/22/2001 02:30'!
allNumerical
"return a set containing only alphabetic characters"
^ self
allMatching: [:char | char isDigit]! !
!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 10/1/2001 17:24'!
allSeperator
"return a set containing only alphabetic characters"
^ self
allMatching: [:char | char isSeparator]! !
!SharedQueue methodsFor: 'testing' stamp: 'sac 8/9/2001 13:22'!
critical: aBlock
"Evaluate aBlock with the accessProtect semaphore held. Do NOT mutate the queue during aBlock."
"This is to allow variable operation based on the sharedqueue length, for example,
disabling writers when the queue gets too long
queue critical: [queue size > 10. ifTrue: self writerDisable wait]
And the writer runs
self wait.
self signal."
accessProtect critical: aBlock.
! !
!Socket methodsFor: 'sending-receiving' stamp: 'sac 9/24/2001 01:41'!
getData
"Get some data"
| buf bytesRead |
(self waitForDataUntil: Socket standardDeadline)
ifFalse: [self error: 'getData timeout'].
buf _ String new: 4096.
bytesRead _ self
primSocket: socketHandle
receiveDataInto: buf
startingAt: 1
count: buf size.
^ buf copyFrom: 1 to: bytesRead! !
!Socket methodsFor: 'sending-receiving' stamp: 'sac 9/24/2001 03:17'!
receiveDataInto: aStringOrByteArray at: starting
"Receive data into the given buffer and return the number of bytes
received. Note the given buffer may be only partially filled by the
received data."
^ self
primSocket: socketHandle
receiveDataInto: aStringOrByteArray
startingAt: starting
count: (aStringOrByteArray size - starting) + 1! !
!Socket methodsFor: 'sending-receiving' stamp: 'sac 9/29/2001 05:31'!
trySendSomeData: aStringOrByteArray startIndex: startIndex count: count
"Send up to count bytes of the given data starting at the given index.
Answer the number of bytes actually sent. Do not do any timeouts for data not sent."
"Note: This operation may have to be repeated multiple times to send a
large amount of data."
^ self
primSocket: socketHandle
sendData: aStringOrByteArray
startIndex: startIndex
count: count! !
!SystemDictionary methodsFor: 'memory space' stamp: 'sac 1/28/2002 18:12'!
tenure
"Incrementally collect all objects and tenure. This is useful after allo
cating a large array or after making a large array point to a new object."
|oldtenure|
oldtenure _ self vmParameterAt: 6.
self vmParameterAt: 6 put: 0.
self garbageCollectMost.
self vmParameterAt: 6 put: oldtenure.
! !
!SystemDictionary methodsFor: 'snapshot and quit' stamp: 'sac 1/28/2002 18:13'!
setGCParameters
"Adjust the VM's default GC parameters to avoid premature tenuring."
Smalltalk vmParameterAt: 5 put: 4000. "do an incremental GC after this
many allocations"
Smalltalk vmParameterAt: 6 put: 2000. "tenure when more than this many
objects survive the GC"
Smalltalk vmParameterAt: 5 put: 400000.
Smalltalk vmParameterAt: 6 put: 12000.
! !
!UndefinedObject methodsFor: 'testing' stamp: 'sac 9/24/2001 07:12'!
isNil
"Refer to the comment in ProtoObject|isNil."
^ true! !
!WriteStream methodsFor: 'character writing' stamp: 'sac 8/10/2001 15:01'!
crlf
"Append a return character to the receiver."
self nextPut: Character cr.
self nextPut: Character lf.! !
!WriteStream methodsFor: 'character writing' stamp: 'sac 8/10/2001 15:00'!
lf
"Append a return character to the receiver."
self nextPut: Character lf! !
!ReadWriteStream methodsFor: 'accessing' stamp: 'sac 9/24/2001 03:47'!
on: aCollection
super on: aCollection.
readLimit _ aCollection size.
! !
-------------- next part --------------
'From Squeak3.2alpha of 3 October 2001 [latest update: #4411] on 13 December 2001 at 12:47:09 pm'!
"Change Set: AntiGarbageCollect
Date: 13 December 2001
Author: Scott Crosby
Try to reduce the amount of garbage collections within the system. This code fixups a variety of other code to use a new Utilities>>tryToFree: and Utilities tryHardToFree: functions.
I have no way of testing out many of these changes to confirm if this code works OK. By inspection, it should."
!
!AIFFFileReader methodsFor: 'private' stamp: 'sac 12/13/2001 12:16'!
readSamplesChunk: chunkSize
"Read a SSND chunk. All AIFF files with a non-zero frameCount contain
exactly one chunk of this type."
| offset blockSize bytesOfSamples s |
skipDataChunk
ifTrue: [in skip: chunkSize.
^ self].
offset _ in nextNumber: 4.
blockSize _ in nextNumber: 4.
(offset ~= 0
or: [blockSize ~= 0])
ifTrue: [^ self error: 'this AIFF reader cannot handle blocked sample chunks'].
bytesOfSamples _ chunkSize - 8.
bytesOfSamples = (channelCount * frameCount * (bitsPerSample // 8))
ifFalse: [self error: 'actual sample count does not match COMM chunk'].
(mergeIfStereo
and: [channelCount = 2])
ifTrue: [channelData _ Array
with: (SoundBuffer newMonoSampleCount: frameCount)]
ifFalse: [channelData _ (1 to: channelCount)
collect: [:i | SoundBuffer newMonoSampleCount: frameCount]].
(Utilities tryToFree: bytesOfSamples +300000 )
ifTrue: [s _ ReadStream
on: (in next: bytesOfSamples)
"bulk-read, then process"]
ifFalse: [s _ in].
"not enough space to buffer; read directly from file"
"mono and stereo are special-cased for better performance"
channelCount = 1
ifTrue: [^ self readMonoChannelDataFrom: s].
channelCount = 2
ifTrue: [mergeIfStereo
ifTrue: [channelCount _ 1.
^ self readMergedStereoChannelDataFrom: s]
ifFalse: [^ self readStereoChannelDataFrom: s]].
self readMultiChannelDataFrom: s! !
!BookMorph methodsFor: 'menu' stamp: 'sac 12/13/2001 12:35'!
loadImagesIntoBook
"PowerPoint stores GIF presentations as individual slides named Slide1,
Slide2, etc.
Load these into the book. mjg 9/99"
| directory filenumber form newpage |
directory _ ((StandardFileMenu oldFileFrom: FileDirectory default)
ifNil: [^ nil]) directory.
directory isNil
ifTrue: [^ nil].
"Start loading 'em up!!"
filenumber _ 1.
[directory fileExists: 'Slide' , filenumber asString]
whileTrue: [Transcript show: 'Slide' , filenumber asString;
cr.
(Utilities tryHardToFree: 1000000)
ifFalse: ["Make some room"
(self valueOfProperty: #url)
== nil
ifTrue: [self savePagesOnURL]
ifFalse: [self saveAsNumberedURLs]].
form _ Form
fromFileNamed: (directory fullNameFor: 'Slide' , filenumber asString).
newpage _ PasteUpMorph new extent: form extent.
newpage
addMorph: (SketchMorph withForm: form).
self pages addLast: newpage.
filenumber _ filenumber + 1].
"After adding all, delete the first page."
self goToPage: 1.
self deletePageBasic.
"Save the book"
(self valueOfProperty: #url)
== nil
ifTrue: [self savePagesOnURL]
ifFalse: [self saveAsNumberedURLs]! !
!DisplayScreen methodsFor: 'private' stamp: 'sac 12/13/2001 12:28'!
newDepthNoRestore: pixelSize
"Change depths. Check if there is enough space!! , di"
| area need |
pixelSize = depth
ifTrue: [^ self"no change"].
pixelSize abs < self depth
ifFalse: ["Make sure there is enough space"
area _ Display boundingBox area.
"pixels"
Smalltalk isMorphic
ifFalse: [ScheduledControllers scheduledWindowControllers
do: [:aController | "This should be refined..."
aController view cacheBitsAsTwoTone
ifFalse: [area _ area + aController view windowBox area]]].
need _ area * (pixelSize abs - self depth) // 8 + Smalltalk lowSpaceThreshold.
"new bytes needed"
(Utilities tryHardToFree: need)
ifFalse: [self error: 'Insufficient free space']].
self setExtent: self extent depth: pixelSize.
Smalltalk isMorphic
ifFalse: [ScheduledControllers updateGray].
DisplayScreen startUp! !
!MailInboxFile methodsFor: 'scanning' stamp: 'sac 12/13/2001 11:47'!
delimitersDo: aBlock
"Invoke the given block for each message in the mail inbox. The block
argument is the text of a new message."
| fileStream stream |
fileStream _ FileStream fileNamed: filename.
"Can we fit it in without GC?"
(Utilities tryToFree: fileStream size + 200000)
ifTrue: ["if possible, buffer the entire file in memory for speed"
stream _ ReadStream on: fileStream contentsOfEntireFile.
fileStream _ nil]
ifFalse: ["otherwise, use the actual file stream, reading from disk"
stream _ fileStream].
[self scanToNextMessageIn: stream]
whileTrue: [aBlock
value: (MailDB readStringLineFrom: stream)].
fileStream = nil
ifFalse: [fileStream close]! !
!MailInboxFile methodsFor: 'scanning' stamp: 'sac 12/13/2001 11:47'!
mailMessagesDo: aBlock
"Invoke the given block for each message in the mail inbox. The block
argument is the text of a new message."
| fileStream stream msgStart msgSize msgText |
fileStream _ CrLfFileStream readOnlyFileNamed: filename.
"See if we have enough RAM without GC."
(Utilities tryToFree: fileStream size + 200000)
ifTrue: ["if possible, buffer the entire file in memory for speed"
stream _ ReadStream on: fileStream contentsOfEntireFile.
fileStream _ nil]
ifFalse: ["otherwise, use the actual file stream, reading from disk"
stream _ fileStream].
self scanToNextMessageIn: stream.
MailDB skipRestOfLine: stream.
"skip message delimiter"
msgStart _ stream position.
[self scanToNextMessageIn: stream]
whileTrue: [msgSize _ stream position - msgStart.
stream position: msgStart.
msgText _ stream next: msgSize.
MailDB skipRestOfLine: stream.
"skip message delimiter"
msgStart _ stream position.
aBlock value: msgText].
"process final message"
msgSize _ stream position - msgStart.
msgSize > 0
ifTrue: [stream position: msgStart.
msgText _ stream next: msgSize.
aBlock value: msgText].
fileStream = nil
ifFalse: [fileStream close]! !
!MHMailInboxFile methodsFor: 'scanning' stamp: 'sac 12/13/2001 12:02'!
mailMessagesDo: aBlock
"Invoke the given block for each message in the mail inbox. The block
argument is the text of a new message."
| fileStream stream textStart textSize msgText |
fileStream _ FileStream fileNamed: filename.
Smalltalk garbageCollect.
(Utilities tryToFree: fileStream size + 200000)
ifTrue: ["if possible, buffer the entire file in memory for speed"
stream _ ReadStream on: fileStream contentsOfEntireFile.
fileStream _ nil]
ifFalse: ["otherwise, use the actual file stream, reading from disk"
stream _ fileStream].
[self scanToNextMessageIn: stream]
whileTrue: ["skip the three-line message delimiter"
3
timesRepeat: [MailDB skipRestOfLine: stream].
textStart _ stream position.
self scanToNextMessageIn: stream.
textSize _ stream position - textStart.
stream position: textStart.
msgText _ stream next: textSize.
aBlock value: msgText].
fileStream = nil
ifFalse: [fileStream close]! !
!Project methodsFor: 'displaying' stamp: 'sac 12/13/2001 12:25'!
showZoom
"Decide if user wants a zoom transition, and if there is enough memory
to do it."
^ Preferences projectZoom
and: ["Only show zoom if there is room for both displays plus a
megabyte "
(Utilities tryToFree: (Display boundingBox area * displayDepth // 8 + 1000000))]! !
!StandardSystemView methodsFor: 'displaying' stamp: 'sac 12/13/2001 12:20'!
deEmphasizeForDebugger
"Carefully de-emphasis this window because a debugger is being opened.
Care must be taken to avoid invoking potentially buggy window display
code that could cause a recursive chain of errors eventually resulting
in a virtual machine crash. In particular, do not de-emphasize the
subviews. "
self deEmphasizeView.
"de-emphasize this top-level view"
self uncacheBits.
(Utilities tryToFree: 1000000)
ifTrue: ["if there is enough space, cache current window screen bits"
self cacheBitsAsIs]! !
!SystemDictionary methodsFor: 'memory space' stamp: 'sac 12/13/2001 11:35'!
bytesFree
"Answer the number of bytes of that are available without a GC."
| params youngSpaceEnd memoryEnd |
params _ Smalltalk getVMParameters.
youngSpaceEnd _ params at: 2.
memoryEnd _ params at: 3.
^ (memoryEnd - youngSpaceEnd) ! !
!SystemDictionary methodsFor: 'memory space' stamp: 'sac 12/13/2001 12:35'!
lowSpaceWatcher
"Wait until the low space semaphore is signalled, then take appropriate
actions."
| free |
(Utilities tryHardToFree: self lowSpaceThreshold)
ifFalse: ["free space must be above threshold before starting low
space watcher"
^ self beep].
LowSpaceSemaphore _ Semaphore new.
self primLowSpaceSemaphore: LowSpaceSemaphore.
self primSignalAtBytesLeft: self lowSpaceThreshold.
"enable low space interrupts"
LowSpaceSemaphore wait.
"wait for a low space condition..."
self primSignalAtBytesLeft: 0.
"disable low space interrupts"
self primLowSpaceSemaphore: nil.
LowSpaceProcess _ nil.
"Note: user now unprotected until the low space watcher is re-installed"
self memoryHogs isEmpty
ifFalse: [free _ self bytesLeft.
self memoryHogs
do: [:hog | hog freeSomeSpace].
self bytesLeft > free
ifTrue: [^ self installLowSpaceWatcher]].
Smalltalk isMorphic
ifTrue: [CurrentProjectRefactoring currentInterruptName: 'Space is low']
ifFalse: [ScheduledControllers interruptName: 'Space is low']! !
!SystemDictionary methodsFor: 'memory space' stamp: 'sac 12/13/2001 12:35'!
okayToProceedEvenIfSpaceIsLow
"Return true if either there is enough memory to do so safely or if the
user gives permission after being given fair warning."
(Utilities tryHardToFree: self lowSpaceThreshold)
ifTrue: [^ true].
"work harder"
^ self confirm: 'WARNING: There is not enough space to start the low space watcher.
If you proceed, you will not be warned again, and the system may
run out of memory and crash. If you do proceed, you can start the
low space notifier when more space becomes available simply by
opening and then closing a debugger (e.g., by hitting Cmd-period.)
Do you want to proceed?'! !
!Utilities class methodsFor: 'vm statistics' stamp: 'sac 12/13/2001 12:18'!
tryHardToFree: anInteger
"Try to free at least the requested number of bytes. Do an incremental
GC or full GC only if necesssary"
Smalltalk bytesFree > anInteger
ifTrue: [^ true].
Smalltalk garbageCollectMost.
Smalltalk bytesFree > anInteger
ifTrue: [^ true].
Smalltalk garbageCollect.
Smalltalk bytesFree > anInteger
ifTrue: [^ true].
^ false! !
!Utilities class methodsFor: 'vm statistics' stamp: 'sac 12/13/2001 12:43'!
tryReallyHardToFree: anInteger
"Try to free at least the requested number of bytes. Do an incremental
GC or full GC only if necesssary. Also, may unload various items if
really really necessary. (Not yet implemented)"
(self tryHardToFree: anInteger)
ifTrue: [^ true].
"Put in more code here? Maybe put in:
AbstractScound unloadSampledTimbres ?"
^ false! !
!Utilities class methodsFor: 'vm statistics' stamp: 'sac 12/13/2001 12:18'!
tryToFree: anInteger
"Try to free at least the requested number of bytes. Do an incremental
GC only if necesssary. Do not do a fullGC."
Smalltalk bytesFree > anInteger
ifTrue: [^ true].
Smalltalk garbageCollectMost.
Smalltalk bytesFree > anInteger
ifTrue: [^ true].
^ false! !
More information about the Squeak-dev
mailing list
|