Full-text search, performance results.

Scott A Crosby crosby at qwes.math.cmu.edu
Wed Jan 23 03:18:37 UTC 2002


On Tue, 22 Jan 2002, Bijan Parsia wrote:

> On Tue, 22 Jan 2002, Scott A Crosby wrote:
>
> [snip Scott's cool stuff from an afternoon's work]

Actually, the engine itself is trivial. About 30-40 lines of code. Ask a
different simple thing and ye shall recieve. :)  (Well, tomorrow.)

Most of my time was spent on fixing up String and Skiplists in many ways.

>
> Sheesh, and you complain about not getting your stuff into the main
> image? I've been whining about fast full text search for YEARS. *Cees*
> does it and gets next day/week turn around!!!!

Well, I have *several* goodies that are hard to maintain seperately, that
have been done for weeks/months, and yet to be incorporated.

Thats a little different than complaining about nobody writing some code.

--

But, to whet your appetite and to see if anyone wishes to critique the
interface, I include my current code. One issue with it is that I cleanup
a few methods that will be dumped with a new VM. I'm unfamiliar with this,
so don't file into an image you like.

I'm not including the skiplist updates here; prefix matching is untested.


di _ DocumentIndex withAdaptor: (SimpleIndexAdaptor new).

di add: 'hello, I am Scott Crosby'.
di add: 'Scott crosby is my name'.
di add: 'What is your name?'.
di add: '123456'.
di anyOf: #('Scott' 'name') asSet.

Or,

di _ DocumentIndex withAdaptor: (SimpleTextIndexAdaptor new).

MessageTally spyOn: [Morph selectorsDo: [ :method | di add: ((Morph
sourceCodeAt: method) asString)]].

MessageTally spyOn: [((di anyOf: #(color) asSet) intersection: (di anyOf:
#(size) asSet)) size]

-----

So, make your own indexAdaptors to index whatever documents ya want. :)

That part should have a stable interface. :)


Scott
-------------- next part --------------
'From Squeak3.2alpha of 11 October 2001 [latest update: #4646] on 22 January 2002 at 10:05:25 pm'!
Object subclass: #DocumentIndex
	instanceVariableNames: 'dbase indexAdaptor '
	classVariableNames: ''
	poolDictionaries: ''
	category: 'FullTextSearch'!
Smalltalk renameClassNamed: #DocumentPrefix as: #DocumentPrefixIndex!
Object subclass: #DocumentPrefixIndex
	instanceVariableNames: 'dbase indexAdaptor '
	classVariableNames: ''
	poolDictionaries: ''
	category: 'FullTextSearch'!
Object subclass: #IndexAdaptor
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'FullTextSearch'!
IndexAdaptor subclass: #SimpleIndexAdaptor
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'FullTextSearch'!
SimpleIndexAdaptor subclass: #SimpleTextIndexAdaptor
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'FullTextSearch'!

!DocumentIndex methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 20:19'!
add: aDocument
	"Add a document into the database."
	| words |
	words _ indexAdaptor terms: aDocument.
	words do: [ :word |
				|canonword set|
					canonword _ (indexAdaptor canonicalize: word).
					set _ dbase at: canonword ifAbsent: [IdentitySet new].
				     set add: aDocument.
					dbase at: canonword put: set]
! !

!DocumentIndex methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 20:20'!
anyOf: aSet
	"Return the set of documents that matches any of the words in the given set."
	| sum |
	sum _ IdentitySet new.
	aSet do: [ :item |  
                   |other|
                  other _ dbase at: (indexAdaptor canonicalize: item) ifAbsent: [IdentitySet new].
                  sum union: other
           ].
	^sum! !

!DocumentIndex methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 19:55'!
dbase
	^dbase.
! !

!DocumentIndex methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 20:14'!
initialize: anIndexAdaptor
	| |
	indexAdaptor _ anIndexAdaptor.
	dbase _ Dictionary new.
! !

!DocumentIndex methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 20:20'!
remove: aDocument
	"Remove a document into the database. Note that the document should be unchanged from when it was added in."
	| words |
	words _ indexAdaptor terms: aDocument.
	words do: [ :word |
                   (dbase at: (indexAdaptor canonicalize: word)  ifAbsent: [IdentitySet new]) remove: aDocument]
! !


!DocumentIndex class methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 19:03'!
withAdaptor: anIndexAdaptor
	"Build an index over documents. IndexAdaptor is something responsible for interfacing documents to and from the index code."
	^self new initialize: anIndexAdaptor.! !


!DocumentPrefixIndex methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 20:22'!
add: aDocument
	"Add a document into the database."
	| words |
	words _ indexAdaptor terms: aDocument.
	words do: [ :word |
				|canonword set|
					canonword _ (indexAdaptor canonicalize: word).
					set _ dbase at: canonword ifAbsent: [IdentitySet new].
				     set add: aDocument.
					dbase at: canonword put: set]
! !

!DocumentPrefixIndex methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 20:22'!
anyOf: aSet
	"Return the set of documents that matches any of the words in the given set."
	| sum |
	sum _ IdentitySet new.
	aSet do: [ :item |  
                   |other|
                  other _ dbase at: (indexAdaptor canonicalize: item) ifAbsent: [IdentitySet new].
                  sum union: other
           ].
	^sum! !

!DocumentPrefixIndex methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 20:12'!
dbase
	^dbase.
! !

!DocumentPrefixIndex methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 20:12'!
initialize: anIndexAdaptor
	| |
	indexAdaptor _ anIndexAdaptor.
	dbase _ SkipList new.
! !

!DocumentPrefixIndex methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 20:12'!
prefixOf: aString
	"Return the set of documents that match the prefix of the given string:"
	| sum node |
	sum _ Set new.
	node _ dbase search: aString.
	node ifNil: [^sum]. "No key exists."
	[node key beginsWith: aString]
		whileTrue:  [sum union: node value.
					node _ node next.
					node ifNil: [^sum]].
	^sum! !

!DocumentPrefixIndex methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 20:22'!
remove: aDocument
	"Remove a document into the database. Note that the document should be unchanged from when it was added in."
	| words |
	words _ indexAdaptor terms: aDocument.
	words do: [ :word |
                   (dbase at: (indexAdaptor canonicalize: word) ifAbsent: [IdentitySet new]) remove: aDocument]
! !


!DocumentPrefixIndex class methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 20:12'!
withAdaptor: anIndexAdaptor
	"Build an index over documents. IndexAdaptor is something responsible for interfacing documents to and from the index code."
	^self new initialize: anIndexAdaptor.! !


!IndexAdaptor methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 18:57'!
canonicalize: aString
	"Given a string or a term, canonicalize it for searching."
	self subclassResponsibility! !

!IndexAdaptor methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 18:56'!
terms: aDocument
	"Given a document, give a list of the search terms occuring in it."
	self subclassResponsibility! !


!SimpleIndexAdaptor methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 18:58'!
canonicalize: aString
	"Given a string or a term, canonicalize it for searching."
	^aString asLowercase
! !

!SimpleIndexAdaptor methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 19:55'!
terms: aString
	"Given a document, give a list of the search terms occuring in it."
	| words ignore|
	ignore _ #(iftrue iffalse self value do aBlock) asSet.

	words _ aString substrings: CharacterSet allAlphabetic.
	words _ words select: [ :word | (word size > 3) & (ignore includes: word) not].
	^words
! !


!SimpleTextIndexAdaptor methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 19:32'!
canonicalize: aString
	"Given a string or a term, canonicalize it for searching."
	^aString asLowercase
! !

!SimpleTextIndexAdaptor methodsFor: 'as yet unclassified' stamp: 'sac 1/22/2002 19:33'!
terms: aText
	"Given a document, give a list of the search terms occuring in it."
	| |
	^super terms: aText asString.! !

DocumentIndex class removeSelector: #withExtractor:!
DocumentIndex removeSelector: #prefixOf:!
-------------- next part --------------
'From Squeak3.2alpha of 11 October 2001 [latest update: #4646] on 22 January 2002 at 10:05:37 pm'!

!ChessMoveGenerator methodsFor: 'public' stamp: 'sac 1/22/2002 15:32'!
findAllPossibleMovesFor: player
	"Find all possible moves. This method does not check if the move is legal, e.g., if the king of the player is under attack after the move. If the opponent is check mate (e.g., the king could be taken in the next move) the method returns nil. If the game is stale mate (e.g., the receiver has no move left) this method returns an empty array."
	| piece actions square |
	myPlayer _ player.
	myPieces _ player pieces.
	itsPieces _ player opponent pieces.
	castlingStatus _ player castlingStatus.
	enpassantSquare _ player opponent enpassantSquare.
	firstMoveIndex = lastMoveIndex ifFalse:[self error:'I am confused'].
	kingAttack _ nil.
	myPlayer isWhitePlayer ifTrue:[
		actions _ #(	moveWhitePawnAt: moveKnightAt: moveBishopAt:
					moveRookAt: moveQueenAt: moveWhiteKingAt:).
	] ifFalse:[
		actions _ #(	moveBlackPawnAt: moveKnightAt: moveBishopAt:
					moveRookAt: moveQueenAt: moveBlackKingAt:).
	].
	square _ 0.
	[square < 64] whileTrue:[
		"Note: The following is only to skip empty fields efficiently.
		It could well be replaced by going through each field and test it
		for zero but this is *much* faster."
		square _ myPieces indexOfAnyOf: EmptyPieceMap startingAt: square+1.
		"square _ String findFirstInString: myPieces inSet: EmptyPieceMap startingAt: square+1."
		square = 0 ifTrue:[^self moveList].
		piece _ myPieces at: square.
		self perform: (actions at: piece) with: square.
		kingAttack ifNotNil:[^self moveList].
	].
	^self moveList! !


!InterpreterSupportCode class methodsFor: 'source file exporting' stamp: 'sac 1/22/2002 15:15'!
cCodeForMiscPrimitives
	"Return the contents of the miscellaneous primitives file, which is generated via automatic translation to C."

	^ CCodeGenerator new codeStringForPrimitives: #(
		(Bitmap compress:toByteArray:)
		(Bitmap decompress:fromByteArray:at:)
		(Bitmap encodeBytesOf:in:at:)
		(Bitmap encodeInt:in:at:)
		(String compare:with:collated:)
		(String translate:from:to:table:)	
		(String findFirstInString:inSet:startingAt:endingAt:)
		(String indexOfAscii:inString:startingAt:)
		(String findSubstring:in:startingAt:matchTable:)
		(SampledSound convert8bitSignedFrom:to16Bit:))
! !


!MethodFinder methodsFor: 'initialize' stamp: 'sac 1/22/2002 15:16'!
initialize2
	"The methods we are allowed to use.  (MethodFinder new initialize) "

"Set"
	#("in class" sizeFor:
"testing" "adding" "removing" "enumerating"
"private" array findElementOrNil: 
"accessing" someElement) do: [:sel | Approved add: sel].

"Dictionary, IdentityDictionary, IdentitySet"
	#("accessing" associationAt: associationAt:ifAbsent: at:ifPresent: keyAtIdentityValue: keyAtIdentityValue:ifAbsent: keyAtValue: keyAtValue:ifAbsent: keys
"testing" includesKey: ) do: [:sel | Approved add: sel].
	#(removeKey: removeKey:ifAbsent:
) do: [:sel | AddAndRemove add: sel].

"LinkedList, Interval, MappedCollection"
	#("in class"  from:to: from:to:by:
"accessing" contents) do: [:sel | Approved add: sel].
	#(
"adding" addFirst: addLast:) do: [:sel | AddAndRemove add: sel].

"OrderedCollection, SortedCollection"
	#("accessing" after: before:
"copying" copyEmpty
"adding"  growSize
"removing" "enumerating" "private" 
"accessing" sortBlock) do: [:sel | Approved add: sel].
	#("adding" add:after: add:afterIndex: add:before: addAllFirst: addAllLast: addFirst: addLast:
"removing" removeAt: removeFirst removeLast
"accessing" sortBlock:) do: [:sel | AddAndRemove add: sel].

"Character"
	#("in class, instance creation" allCharacters digitValue: new separators
	"accessing untypeable characters" backspace cr enter lf linefeed nbsp newPage space tab
	"constants" alphabet characterTable
"accessing" asciiValue digitValue
"comparing"
"testing" isAlphaNumeric isDigit isLetter isLowercase isSafeForHTTP isSeparator isSpecial isUppercase isVowel tokenish
"copying"
"converting" asIRCLowercase asLowercase asUppercase
	) do: [:sel | Approved add: sel].

"String"
	#("in class, instance creation" crlf fromPacked:
	"primitives" findFirstInString:inSet:startingAt:endingAt: indexOfAscii:inString:startingAt:endingAt:	"internet" valueOfHtmlEntity:
"accessing" byteAt: endsWithDigit findAnySubStr:startingAt: findBetweenSubStrs: findDelimiters:startingAt: findString:startingAt: findString:startingAt:caseSensitive: findTokens: findTokens:includes: findTokens:keep: includesSubString: includesSubstring:caseSensitive: indexOf:startingAt: indexOfAnyOf: indexOfAnyOf:ifAbsent: indexOfAnyOf:startingAt: indexOfAnyOf:startingAt:ifAbsent: lineCorrespondingToIndex: lineCount lineNumber: skipAnySubStr:startingAt: skipDelimiters:startingAt: startsWithDigit
"comparing" alike: beginsWith: caseSensitiveLessOrEqual: charactersExactlyMatching: compare: crc16 endsWith: endsWithAnyOf: sameAs: startingAt:match:startingAt:
"copying" copyReplaceTokens:with: padded:to:with:
"converting" asByteArray asDate asDisplayText asFileName asHtml asLegalSelector asPacked asParagraph asText asTime asUnHtml asUrl asUrlRelativeTo: capitalized compressWithTable: contractTo: correctAgainst: encodeForHTTP initialIntegerOrNil keywords quoted sansPeriodSuffix splitInteger stemAndNumericSuffix substrings surroundedBySingleQuotes truncateWithElipsisTo: withBlanksTrimmed withFirstCharacterDownshifted withNoLineLongerThan: withSeparatorsCompacted withoutLeadingDigits withoutTrailingBlanks
"displaying" "printing"
"system primitives" compare:with:collated: 
"Celeste" withCRs
"internet" decodeMimeHeader decodeQuotedPrintable replaceHtmlCharRefs unescapePercents withInternetLineEndings withSqueakLineEndings withoutQuoting
"testing" isAllSeparators lastSpacePosition
"paragraph support" indentationIfBlank:
"arithmetic" ) do: [:sel | Approved add: sel].
	#(byteAt:put: translateToLowercase match:) do: [:sel | AddAndRemove add: sel].

"Symbol"
	#("in class, private" hasInterned:ifTrue:
	"access" morePossibleSelectorsFor: possibleSelectorsFor: selectorsContaining: thatStarts:skipping:
"accessing" "comparing" "copying" "converting" "printing" 
"testing" isInfix isKeyword isPvtSelector isUnary) do: [:sel | Approved add: sel].

"Array"
	#("comparing" "converting" evalStrings 
"printing" "private" hasLiteralSuchThat:) do: [:sel | Approved add: sel].

"Array2D"
	#("access" at:at: atCol: atCol:put: atRow: extent extent:fromArray: height width width:height:type:) do: [:sel | Approved add: sel].
	#(at:at:add: at:at:put: atRow:put: ) do: [:sel | AddAndRemove add: sel].

"ByteArray"
	#("accessing" doubleWordAt: wordAt: 
"platform independent access" longAt:bigEndian: shortAt:bigEndian: unsignedLongAt:bigEndian: unsignedShortAt:bigEndian: 
"converting") do: [:sel | Approved add: sel].
	#(doubleWordAt:put: wordAt:put: longAt:put:bigEndian: shortAt:put:bigEndian: unsignedLongAt:put:bigEndian: unsignedShortAt:put:bigEndian:
	) do: [:sel | AddAndRemove add: sel].

"FloatArray"		"Dont know what happens when prims not here"
	false ifTrue: [#("accessing" "arithmetic" *= += -= /=
"comparing"
"primitives-plugin" primAddArray: primAddScalar: primDivArray: primDivScalar: primMulArray: primMulScalar: primSubArray: primSubScalar:
"primitives-translated" primAddArray:withArray:from:to: primMulArray:withArray:from:to: primSubArray:withArray:from:to:
"converting" "private" "user interface") do: [:sel | Approved add: sel].
	].

"IntegerArray, WordArray"
"RunArray"
	#("in class, instance creation" runs:values: scanFrom:
"accessing" runLengthAt: 
"adding" "copying"
"private" runs values) do: [:sel | Approved add: sel].
	#(coalesce addLast:times: repeatLast:ifEmpty: repeatLastIfEmpty:
		) do: [:sel | AddAndRemove add: sel].

"Stream  -- many operations change its state"
	#("testing" atEnd) do: [:sel | Approved add: sel].
	#("accessing" next: nextMatchAll: nextMatchFor: upToEnd
next:put: nextPut: nextPutAll: "printing" print: printHtml:
	) do: [:sel | AddAndRemove add: sel].

"PositionableStream"
	#("accessing" contentsOfEntireFile originalContents peek peekFor: "testing"
"positioning" position ) do: [:sel | Approved add: sel].
	#(nextDelimited: nextLine upTo: position: reset resetContents setToEnd skip: skipTo: upToAll: ) do: [:sel | AddAndRemove add: sel].
	"Because it is so difficult to test the result of an operation on a Stream (you have to supply another Stream in the same state), we don't support Streams beyond the basics.  We want to find the messages that convert Streams to other things."

"ReadWriteStream"
	#("file status" closed) do: [:sel | Approved add: sel].
	#("accessing" next: on: ) do: [:sel | AddAndRemove add: sel].

"WriteStream"
	#("in class, instance creation" on:from:to: with: with:from:to:
		) do: [:sel | Approved add: sel].
	#("positioning" resetToStart
"character writing" crtab crtab:) do: [:sel | AddAndRemove add: sel].

"LookupKey, Association, Link"
	#("accessing" key nextLink) do: [:sel | Approved add: sel].
	#(key: key:value: nextLink:) do: [:sel | AddAndRemove add: sel].

"Point"
	#("in class, instance creation" r:degrees: x:y:
"accessing" x y "comparing" "arithmetic" "truncation and round off"
"polar coordinates" degrees r theta
"point functions" bearingToPoint: crossProduct: dist: dotProduct: eightNeighbors flipBy:centerAt: fourNeighbors grid: nearestPointAlongLineFrom:to: nearestPointOnLineFrom:to: normal normalized octantOf: onLineFrom:to: onLineFrom:to:within: quadrantOf: rotateBy:centerAt: transposed unitVector
"converting" asFloatPoint asIntegerPoint corner: extent: rect:
"transforming" adhereTo: rotateBy:about: scaleBy: scaleFrom:to: translateBy: "copying"
"interpolating" interpolateTo:at:) do: [:sel | Approved add: sel].

"Rectangle"
	#("in class, instance creation" center:extent: encompassing: left:right:top:bottom: 
	merging: origin:corner: origin:extent: 
"accessing" area bottom bottomCenter bottomLeft bottomRight boundingBox center corner corners innerCorners left leftCenter origin right rightCenter top topCenter topLeft topRight
"comparing"
"rectangle functions" adjustTo:along: amountToTranslateWithin: areasOutside: bordersOn:along: encompass: expandBy: extendBy: forPoint:closestSideDistLen: insetBy: insetOriginBy:cornerBy: intersect: merge: pointNearestTo: quickMerge: rectanglesAt:height: sideNearestTo: translatedToBeWithin: withBottom: withHeight: withLeft: withRight: withSide:setTo: withTop: withWidth:
"testing" containsPoint: containsRect: hasPositiveExtent intersects: isTall isWide
"truncation and round off"
"transforming" align:with: centeredBeneath: newRectFrom: squishedWithin: "copying"
	) do: [:sel | Approved add: sel].

"Color"
	#("in class, instance creation" colorFrom: colorFromPixelValue:depth: fromRgbTriplet: gray: h:s:v: r:g:b: r:g:b:alpha: r:g:b:range:
	"named colors" black blue brown cyan darkGray gray green lightBlue lightBrown lightCyan lightGray lightGreen lightMagenta lightOrange lightRed lightYellow magenta orange red transparent veryDarkGray veryLightGray veryVeryDarkGray veryVeryLightGray white yellow
	"other" colorNames indexedColors pixelScreenForDepth: quickHighLight:
"access" alpha blue brightness green hue luminance red saturation
"equality"
"queries" isBitmapFill isBlack isGray isSolidFill isTranslucent isTranslucentColor
"transformations" alpha: dansDarker darker lighter mixed:with: muchLighter slightlyDarker slightlyLighter veryMuchLighter alphaMixed:with:
"groups of shades" darkShades: lightShades: mix:shades: wheel:
"printing" shortPrintString
"other" colorForInsets rgbTriplet
"conversions" asB3DColor asColor balancedPatternForDepth: bitPatternForDepth: closestPixelValue1 closestPixelValue2 closestPixelValue4 closestPixelValue8 dominantColor halfTonePattern1 halfTonePattern2 indexInMap: pixelValueForDepth: pixelWordFor:filledWith: pixelWordForDepth: scaledPixelValue32
"private" privateAlpha privateBlue privateGreen privateRGB privateRed "copying"
	) do: [:sel | Approved add: sel].

"	For each selector that requires a block argument, add (selector argNum) 
		to the set Blocks."
"ourClasses _ #(Object Boolean True False UndefinedObject Behavior ClassDescription Class Metaclass MethodContext BlockContext Message Magnitude Date Time Number Integer SmallInteger LargeNegativeInteger LargePositiveInteger Float Fraction Random Collection SequenceableCollection ArrayedCollection Bag Set Dictionary IdentityDictionary IdentitySet LinkedList Interval MappedCollection OrderedCollection SortedCollection Character String Symbol Array Array2D ByteArray FloatArray IntegerArray WordArray RunArray Stream PositionableStream ReadWriteStream WriteStream LookupKey Association Link Point Rectangle Color).
ourClasses do: [:clsName | cls _ Smalltalk at: clsName.
	(cls selectors) do: [:aSel |
		((Approved includes: aSel) or: [AddAndRemove includes: aSel]) ifTrue: [
			(cls formalParametersAt: aSel) withIndexDo: [:tName :ind |
				(tName endsWith: 'Block') ifTrue: [
					Blocks add: (Array with: aSel with: ind)]]]]].
"
#((timesRepeat: 1 ) (indexOf:ifAbsent: 2 ) (pairsCollect: 1 ) (mergeSortFrom:to:by: 3 ) (ifNotNil:ifNil: 1 ) (ifNotNil:ifNil: 2 ) (ifNil: 1 ) (at:ifAbsent: 2 ) (ifNil:ifNotNil: 1 ) (ifNil:ifNotNil: 2 ) (ifNotNil: 1 ) (at:modify: 2 ) (identityIndexOf:ifAbsent: 2 ) (sort: 1 ) (sortBlock: 1 ) (detectMax: 1 ) (repeatLastIfEmpty: 1 ) (allSubclassesWithLevelDo:startingLevel: 1 ) (keyAtValue:ifAbsent: 2 ) (in: 1 ) (ifTrue: 1 ) (or: 1 ) (select: 1 ) (inject:into: 2 ) (ifKindOf:thenDo: 2 ) (forPoint:closestSideDistLen: 2 ) (value:ifError: 2 ) (selectorsDo: 1 ) (removeAllSuchThat: 1 ) (keyAtIdentityValue:ifAbsent: 2 ) (detectMin: 1 ) (detect:ifNone: 1 ) (ifTrue:ifFalse: 1 ) (ifTrue:ifFalse: 2 ) (detect:ifNone: 2 ) (hasLiteralSuchThat: 1 ) (indexOfAnyOf:ifAbsent: 2 ) (reject: 1 ) (newRectFrom: 1 ) (removeKey:ifAbsent: 2 ) (at:ifPresent: 2 ) (associationAt:ifAbsent: 2 ) (withIndexCollect: 1 ) (repeatLast:ifEmpty: 2 ) (findLast: 1 ) (indexOf:startingAt:ifAbsent: 3 ) (remove:ifAbsent: 2 ) (ifFalse:ifTrue: 1 ) (ifFalse:ifTrue: 2 ) (caseOf:otherwise: 2 ) (count: 1 ) (collect: 1 ) (sortBy: 1 ) (and: 1 ) (asSortedCollection: 1 ) (with:collect: 2 ) (sourceCodeAt:ifAbsent: 2 ) (detect: 1 ) (scopeHas:ifTrue: 2 ) (collectWithIndex: 1 ) (compiledMethodAt:ifAbsent: 2 ) (bindWithTemp: 1 ) (detectSum: 1 ) (indexOfSubCollection:startingAt:ifAbsent: 3 ) (findFirst: 1 ) (sourceMethodAt:ifAbsent: 2 ) (collect:thenSelect: 1 ) (collect:thenSelect: 2 ) (select:thenCollect: 1 ) (select:thenCollect: 2 ) (ifFalse: 1 ) (indexOfAnyOf:startingAt:ifAbsent: 3 ) (indentationIfBlank: 1 ) ) do: [:anArray |
	Blocks add: anArray].

self initialize3.

"
MethodFinder new initialize.
MethodFinder new organizationFiltered: TranslucentColor class 
"
"Do not forget class messages for each of these classes"
! !


!MiscPrimitivePlugin class methodsFor: 'translation' stamp: 'sac 1/22/2002 15:16'!
translatedPrimitives
	"an assorted list of various primitives"
	^#(
		(Bitmap compress:toByteArray:)
		(Bitmap decompress:fromByteArray:at:)
		(Bitmap encodeBytesOf:in:at:)
		(Bitmap encodeInt:in:at:)
		(String compare:with:collated:)
		(String translate:from:to:table:)	
		(String findFirstInString:inSet:startingAt:endingAt:)
		(String indexOfAscii:inString:startingAt:endingAt:)
		(String findSubstring:in:startingAt:matchTable:)
		(String stringHash:initialHash:)
		(SampledSound convert8bitSignedFrom:to16Bit:)
	)
! !


!String methodsFor: 'accessing' stamp: 'sac 10/1/2001 17:03'!
indexOf: aCharacter 
	aCharacter class == Character
		ifFalse: [^ 0].
	^ String
		indexOfAscii: aCharacter asciiValue
		inString: self
		startingAt: 1
		endingAt: self size.! !

!String methodsFor: 'accessing' stamp: 'sac 10/3/2001 11:52'!
indexOf: aCharacter startingAt: start 
	^self
		indexOf: aCharacter
		startingAt: start
		endingAt: self size! !

!String methodsFor: 'accessing' stamp: 'sac 9/24/2001 05:31'!
indexOf: aCharacter startingAt: start endingAt: end
	aCharacter class == Character
		ifFalse: [^ 0].
	^ String
		indexOfAscii: aCharacter asciiValue
		inString: self
		startingAt: start
		endingAt: (end min: self size)! !

!String methodsFor: 'accessing' stamp: 'sac 10/1/2001 17:04'!
indexOf: aCharacter startingAt: start ifAbsent: aBlock 
	| ans |
	aCharacter class == Character
		ifFalse: [^ aBlock value].
	ans _ String
				indexOfAscii: aCharacter asciiValue
				inString: self
				startingAt: start
				endingAt: self size.
	ans = 0
		ifTrue: [^ aBlock value]
		ifFalse: [^ ans]! !

!String methodsFor: 'accessing' stamp: 'sac 1/22/2002 15:26'!
indexOfAnyOf: aCharacterSet  startingAt: start
	"returns the index of the first character in the given set, starting from start.  Returns 0 if none are found"
	^String findFirstInString: self  inSet: aCharacterSet byteArrayMap startingAt: start endingAt: self size.
! !

!String methodsFor: 'accessing' stamp: 'sac 1/22/2002 15:32'!
indexOfAnyOf: aCharacterSet  startingAt: start endingAt: end
	"returns the index of the first character in the given set, starting from start.  Returns 0 if none are found"
	end > self size ifTrue: [self error: 'Access beyond the bounds'].
	^String findFirstInString: self  inSet: aCharacterSet byteArrayMap startingAt: start endingAt: end.
! !

!String methodsFor: 'accessing' stamp: 'sac 1/22/2002 15:26'!
indexOfAnyOf: aCharacterSet  startingAt: start ifAbsent: aBlock
	"returns the index of the first character in the given set, starting from start"

	| ans |
	ans _ self indexOfAnyOf: aCharacterSet startingAt: start.
	ans = 0 
		ifTrue: [ ^aBlock value ]
		ifFalse: [ ^ans ]! !

!String methodsFor: 'converting' stamp: 'sac 1/22/2002 16:03'!
substrings
	"Answer an array of the substrings that compose the receiver."
	self substrings: CSNonSeparators.! !

!String methodsFor: 'converting' stamp: 'sac 1/22/2002 15:29'!
substrings: aCharacterSet
	"Answer an array of the substrings containing only charactes in aCharacterSet that compose the receiver."
	| result end beginning seperators |

	result _ WriteStream on: (Array new: 10).
	seperators _ aCharacterSet complement.


	end _ 0.
	"find one substring each time through this loop"
	[ 
		"find the beginning of the next substring"
		beginning _ self indexOfAnyOf: aCharacterSet startingAt: end+1 ifAbsent: [ nil ].
		beginning ~~ nil ] 
	whileTrue: [
		"find the end"
		end _ self indexOfAnyOf: seperators startingAt: beginning ifAbsent: [ self size + 1 ].
		end _ end - 1.

		result nextPut: (self copyFrom: beginning to: end).

	].


	^result contents! !


!String class methodsFor: 'primitives' stamp: 'sac 1/22/2002 15:14'!
findFirstInString: aString  inSet: inclusionMap  startingAt: start endingAt: end

	"Scan a string for from the start to the end as long as the characters  
	are in the first character set. Return the index of the last character  
	that is also in the second character set.  
	 
	For example, if you want ot identify the end of a whitespace  
	deliminated alpha string, make the first set alphabetic+whitespace, and  
	the second set be alphabetic."
	| |
	<primitive: 'primitiveMatchInSubString' module:'MiscPrimitivePlugin'> "primitiveExternalCall" 
	self var: #aString declareC: 'unsigned char *aString'.
	self var: #inclusionMap declareC: 'char *inclusionMap'.

	start
		to: end
		do: [:pos | (inclusionMap at: (aString at: pos) asciiValue + 1)
					= 1
				ifTrue: [^ pos]].
	^ 0! !

!String class methodsFor: 'primitives' stamp: 'sac 10/1/2001 17:05'!
indexOfAscii: anInteger inString: aString startingAt: start endingAt: end 
	| |
	self var: #aCharacter declareC: 'int anInteger'.
	self var: #aString declareC: 'unsigned char *aString'.
	
	start
		to: end
		do: [:pos | (aString at: pos) asciiValue = anInteger
				ifTrue: [^ pos]].
	^ 0! !

String class removeSelector: #findFirstInString:inSet:startingAt:!
String class removeSelector: #indexOfAscii:inString:startingAt:!
-------------- next part --------------
'From Squeak3.2alpha of 11 October 2001 [latest update: #4646] on 22 January 2002 at 10:05:33 pm'!

!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 9/22/2001 02:25'!
allAlphabetic
	"return a set containing only alphabetic characters"
	^self allMatching: [ :char | char isLetter].

 ! !

!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 9/22/2001 02:23'!
allCharacters
	"return a set containing all characters"
	^self allMatching: [ :unused | true ]! !

!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 9/22/2001 02:25'!
allMatching: aBlock 
	"return a set containing all characters matching some predicate."
	| set character |
	set _ self empty.
	0
		to: 255
		do: [:ascii | 
			character _ Character value: ascii.
			(aBlock value: character)
				ifTrue: [set add: character]].
	^ set! !

!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 9/22/2001 02:30'!
allNumerical
	"return a set containing only alphabetic characters"
	^ self
		allMatching: [:char | char isDigit]! !

!CharacterSet class methodsFor: 'instance creation' stamp: 'sac 10/1/2001 17:24'!
allSeperator
	"return a set containing only alphabetic characters"
	^ self
		allMatching: [:char | char isSeparator]! !


!SharedQueue methodsFor: 'testing' stamp: 'sac 8/9/2001 13:22'!
critical: aBlock
	"Evaluate aBlock with the accessProtect semaphore held. Do NOT mutate the queue during aBlock."
	"This is to allow variable operation based on the sharedqueue length, for example,
	 disabling writers when the queue gets too long

		queue critical: [queue size > 10. ifTrue: self writerDisable wait]

	And the writer runs
		self wait.
		self signal."

	accessProtect critical: aBlock.

! !


!Socket methodsFor: 'sending-receiving' stamp: 'sac 9/24/2001 01:41'!
getData
	"Get some data"
	| buf bytesRead |
	(self waitForDataUntil: Socket standardDeadline)
		ifFalse: [self error: 'getData timeout'].
	buf _ String new: 4096.
	bytesRead _ self
				primSocket: socketHandle
				receiveDataInto: buf
				startingAt: 1
				count: buf size.
	^ buf copyFrom: 1 to: bytesRead! !

!Socket methodsFor: 'sending-receiving' stamp: 'sac 9/24/2001 03:17'!
receiveDataInto: aStringOrByteArray at: starting
	"Receive data into the given buffer and return the number of bytes 
	received. Note the given buffer may be only partially filled by the 
	received data."
	^ self
		primSocket: socketHandle
		receiveDataInto: aStringOrByteArray
		startingAt: starting
		count: (aStringOrByteArray size - starting) + 1! !

!Socket methodsFor: 'sending-receiving' stamp: 'sac 9/29/2001 05:31'!
trySendSomeData: aStringOrByteArray startIndex: startIndex count: count 
	"Send up to count bytes of the given data starting at the given index.  
	Answer the number of bytes actually sent. Do not do any timeouts for data not sent."
	"Note: This operation may have to be repeated multiple times to send a  
	large amount of data."
	^ self
		primSocket: socketHandle
		sendData: aStringOrByteArray
		startIndex: startIndex
		count: count! !


!UndefinedObject methodsFor: 'testing' stamp: 'sac 9/24/2001 07:12'!
isNil
	"Refer to the comment in ProtoObject|isNil."
	^ true! !


!WriteStream methodsFor: 'character writing' stamp: 'sac 8/10/2001 15:01'!
crlf
	"Append a return character to the receiver."

	self nextPut: Character cr.
	self nextPut: Character lf.! !

!WriteStream methodsFor: 'character writing' stamp: 'sac 8/10/2001 15:00'!
lf
	"Append a return character to the receiver."

	self nextPut: Character lf! !


!ReadWriteStream methodsFor: 'accessing' stamp: 'sac 9/24/2001 03:47'!
on: aCollection 
	super on: aCollection.
	readLimit _ aCollection size.
! !



More information about the Squeak-dev mailing list