join
Oscar Nierstrasz
oscar at iam.unibe.ch
Fri Sep 15 19:15:39 UTC 2006
Hi Folks,
Let me summarize my thoughts on the topic of split and join.
General:
- As far as I can tell, neither split: nor join: can be easily
simulated by existing methods
On split:
- split: is fairly specific to strings, though it could be
generalized to other kinds of Sequenceables
- split: should take a String, not a Character, as its argument (as
in Ruby & friends)
- generally speaking, the argument to split: should be a Regex, so it
makes sense to make String>>split: an extension to String from the
regex package (i.e., VB-Regex)
- string-matching is a well-known problem, so we should avoid ad hoc
solutions. See for example
http://doi.acm.org/10.1145/360825.360855
For this reason I think that split: should depend on VB-Regex, unless
someone wants to implement one of the modern algorithms.
I propose that String>>split: simply delegate to the regex:
String>>split: regexString
    ^ regexString asRegex split: self
with Regex>>split: implemented as follows:
Regex>>split: aString
    | result stream lastPosition |
    result := OrderedCollection new.
    stream := aString readStream.
    lastPosition := stream position.
    [ self searchStream: stream ] whileTrue:
        [ result add: (aString copyFrom: lastPosition+1 to: (self subBeginning: 1)).
        self assert: lastPosition < stream position description: 'Regex cannot match null string'.
        lastPosition := stream position ].
    result add: (aString copyFrom: lastPosition+1 to: aString size).
    ^ result
NB:
- The assertion is needed to avoid an infinite loop when the Regex
matches the null (empty) string.
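To make the intended behaviour concrete, here is a small workspace sketch. It assumes the two split: methods proposed above are loaded together with VB-Regex. The sample strings are only illustrations, and the results named in the comments come from tracing the code by hand, not from a running image.

```smalltalk
"Assumes the proposed String>>split: and Regex>>split: are installed."
| parts |

"A plain delimiter: the regex matches at each comma."
parts := 'a,b,c' split: ','.
"Hand trace: parts should hold 'a', 'b' and 'c'."

"Adjacent delimiters should leave an empty string between them."
parts := 'a,,c' split: ','.
"Hand trace: parts should hold 'a', '' and 'c'."

"The argument is a regex, so a whole run of digits acts as one delimiter."
parts := 'a1b22c' split: '[0-9]+'.
"Hand trace: parts should hold 'a', 'b' and 'c'."
```

Since the argument goes through asRegex, a delimiter that contains regex metacharacters (a literal dot, say) would have to be escaped by the caller.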
On join:
- join: is the conceptual inverse of split: (see the tests in
http://squeaksource.com/RubyShards/)
- join: obviously works for Sequenceables as well as Strings
I propose adding the following method to either
SequenceableCollection or OrderedCollection [the tradeoff is not
clear to me].
join: anOrderedCollection
    "Implicit precondition: my elements are all OrderedCollections"
    | result index |
    result := (self at: 1) writeStream.
    result nextPutAll: (self at: 1).
    index := 2.
    [index <= self size] whileTrue: [
        result nextPutAll: anOrderedCollection.
        result nextPutAll: (self at: index).
        index := index + 1.
    ].
    ^ result contents
This will clearly work not only for Strings but for other kinds of
collections too.
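For comparison, here is a variant sketch of the same idea. It is not the method proposed above: it swaps the explicit index loop for the existing do:separatedBy: iterator and writes into a fresh collection of the first element's species instead of streaming over the first element itself. The parameter name aSeparator and the result chosen for an empty receiver are assumptions made for illustration only.

```smalltalk
join: aSeparator
    "Variant sketch, not the proposed method. Assumes my elements and
     aSeparator are sequenceable collections of compatible kinds."
    | result |
    "Assumption: an empty receiver answers an empty collection of the
     separator's species."
    self isEmpty ifTrue: [^ aSeparator species new].
    "Write into a new, empty collection so that no element is touched."
    result := WriteStream on: (self first species new: 0).
    self
        do: [:each | result nextPutAll: each]
        separatedBy: [result nextPutAll: aSeparator].
    ^ result contents
```

The tradeoff is small: the variant handles an empty receiver and never writes into one of its own elements, at the cost of depending on do:separatedBy: being available for the receiver's class.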
SplitJoinTest>>setUp
    eg := 'Now is the time for all good men to come to the aid of the party'.
SplitJoinTest>>testJoin
    self assert: ((eg split: 'the') join: 'the') = eg.
    self assert: ({ {1. 2}. {4. 5} } asOrderedCollection join: {3}) = {1. 2. 3. 4. 5}.
Is anyone convinced?
Oscar
On Sep 15, 2006, at 17:40, stephane ducasse wrote:
> hi guys
>
> could you sit down and propose a cool set of names and
> implementation and we include it in 3.10
>
> http://bugs.impara.de/view.php?id=4874
>
More information about the Squeak-dev
mailing list