join

Oscar Nierstrasz oscar at iam.unibe.ch
Fri Sep 15 19:15:39 UTC 2006


Hi Folks,

Let me summarize my thoughts on the topic of split and join.

General:
- As far as I can tell, neither split: nor join: can be easily  
simulated by existing methods

On split:
- split: is fairly specific to strings, though it could be  
generalized to other kinds of Sequenceables
- split: should take a String, not a Character, as its argument (as  
in Ruby & friends)
- generally speaking, the argument to split: should be a Regex, so it  
makes sense to make String>>split: an extension to String from the  
regex package  (i.e., VB-Regex)
- string-matching is a well-known problem, so we should avoid ad hoc  
solutions.  See for example
	http://doi.acm.org/10.1145/360825.360855
For this reason I think that split: should depend on VB-Regex, unless  
someone wants to implement one of the modern algorithms.

I propose the following:

String>>split: regexString
	^ regexString asRegex split: self

where Regex>>split: is implemented as follows:

Regex>>split: aString
	| result stream lastPosition |
	result := OrderedCollection new.
	stream := aString readStream.
	lastPosition := stream position.
	[ self searchStream: stream ] whileTrue:
		[ result add: (aString copyFrom: lastPosition+1 to: (self subBeginning: 1)).
		self assert: lastPosition < stream position description: 'Regex cannot match null string'.
		lastPosition := stream position ].
	result add: (aString copyFrom: lastPosition+1 to: aString size).
	^ result

NB:
- The assertion is needed to avoid an infinite loop when the Regex matches the null (empty) string.
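
To make the intended behaviour concrete, here is an untested sketch of a test for split: on its own. It assumes the two methods proposed above are installed and that VB-Regex leaves the stream positioned just past each match. The selector testSplit and the sample strings are hypothetical, chosen only for illustration.

```smalltalk
SplitJoinTest>>testSplit
	"Hypothetical test; assumes String>>split: and Regex>>split: as proposed above"
	| parts |
	parts := 'a,b,c' split: ','.
	self assert: parts size = 3.
	self assert: parts first = 'a'.
	self assert: parts last = 'c'.
	"A separator at either end should yield an empty string there"
	parts := ',a,' split: ','.
	self assert: parts size = 3.
	self assert: parts first isEmpty.
	self assert: parts last isEmpty.
```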

On join:
- join: is the conceptual inverse of split: (see the tests in
http://squeaksource.com/RubyShards/)
- join: obviously works for Sequenceables as well as Strings

I propose adding the following method to either  
SequenceableCollection or OrderedCollection [the tradeoff is not  
clear to me].

join: anOrderedCollection
	"Implicit precondition: I am not empty, and my elements and the argument are sequenceable collections of the same kind (e.g. all Strings or all Arrays)"
	| result index |
	result := (self at: 1) writeStream.
	result nextPutAll: (self at: 1).
	index := 2.
	[index <= self size] whileTrue: [
		result nextPutAll: anOrderedCollection.
		result nextPutAll: (self at: index).
		index := index + 1.
		].
	^ result contents

This will clearly work not only for Strings but for other kinds of  
collections too.
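
A workspace sketch of the intended use, assuming the join: method above is installed on SequenceableCollection (if it lived only on OrderedCollection, the receivers below would need asOrderedCollection). The sample values are illustrative only and the expected results have not been run.

```smalltalk
"Workspace sketch; assumes join: as proposed above"
#('a' 'b' 'c') join: ', '.	"expected: 'a, b, c'"
{ {1. 2}. {3. 4} } join: {0}.	"expected: an Array with 1 2 0 3 4"
```

Because the method starts from (self at: 1), it assumes a non-empty receiver; an empty collection would signal an error at that point.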

SplitJoinTest>>setUp
	eg := 'Now is the time for all good men to come to the aid of the party'.

SplitJoinTest>>testJoin
	self assert: ((eg split: 'the') join: 'the') = eg.
	self assert: ({ {1. 2}. {4. 5} } asOrderedCollection join: {3}) = {1. 2. 3. 4. 5}.

Is anyone convinced?

Oscar

On Sep 15, 2006, at 17:40, stephane ducasse wrote:

> hi guys
>
> could you sit down and propose a cool set of names and  
> implementation and we include it in 3.10
>
> http://bugs.impara.de/view.php?id=4874
>



