[squeak-dev] Generator example: Nested log file compression

Andreas Raab andreas.raab at gmx.de
Thu Feb 11 05:51:44 UTC 2010


Folks -

Now that we have both a compression algorithm and generators, let's put 
them together. Attached is a (slightly more compact) version of my 
compression script for Eliot that can be used like this:

Array
	compress: 'AAAABCBCBCABCABCA' readStream
	to:[:count :pattern| Transcript show: count printString,'x', (String 
withAll: pattern); space]
	lookahead: 20.

But more interestingly, we can now nest the algorithm using generators:

Installer installUrl:
'http://source.lukas-renggli.ch/continuations/Generator-ar.5.mcz'.

generator := Generator on:[:g|
	Array compress:'AABAAB' readStream
			to:[:repeat :pattern| g yield: repeat printString, 'x', (String 
withAll: pattern)]
			lookahead: 20.
].

Array
	compress: generator
	to:[:repeat :pattern| Transcript cr; show: repeat printString, 'x ', 
pattern, ' ']
	lookahead: 20.

The result of which is 2x #('2xA' '1xB'), i.e., AABAAB just as expected.

The advantage of using generators in this case is that I can whip up the 
nested version without having to think of eventual consequences in terms 
of space (yes, I could conceivably run two passes but would that work if 
Eliot's log file is GBs in size?) or how to restructure the algorithm 
(yes, you could rewrite the algorithm but why waste my valuable time?).

*That* is the value of generators to me. They solve a certain class of 
problems straightforwardly and intuitively that one needs to work around 
in awkward ways otherwise. It's simply a way of being more productive 
when you have a problem that fits that pattern (which is not all that 
uncommon).

Cheers,
   - Andreas
-------------- next part --------------
'From Squeak3.11alpha of 10 February 2010 [latest update: #9307] on 10 February 2010 at 9:50:18 pm'!

!Array class methodsFor: 'utilities' stamp: 'ar 2/10/2010 21:50'!
compress: inputStream to: outputBlock lookahead: lookahead
	"Compress inputStream to outputBlock by emitting sequences of repeated patterns.
	Example:
		Array 
			compress: 'AAAABCBCBCABCABCA' readStream
			to:[:count :pattern| Transcript show: count printString,'x', (String withAll: pattern); space]
			lookahead: 20.
	The matching algorithm itself looks ahead for a match of the first element and if it repeats, simply eats the input until the match ends.
	Could be improved in various ways, for example by detecting the longest match instead of taking the first match (this leads to AABAAB being output as 2xA 1xB 2xA 1xB instead of 2xAAB)."
	| input output buffer repeat match pattern stop |
	input := [:buf| inputStream atEnd ifFalse:[buf add: inputStream next]].
	output := outputBlock.
	buffer := OrderedCollection new.
	[lookahead - buffer size timesRepeat:[input value: buffer].
	buffer isEmpty] whileFalse:[   
		repeat := match := 1.
		"find first occurance of first element in lookahead buffer"
		[repeat = 1 and:[
			match := buffer indexOf: buffer first startingAt: match+1 ifAbsent:[0].
			match between: 2 and: lookahead // 2]] whileTrue:[
				"see if we have a repeat pattern"
				pattern := buffer copyFrom: 1 to: match-1.
				stop := match+match-2.
				[buffer size >= stop and:[pattern = (buffer copyFrom: match to: stop)]] whileTrue:[
				"Eat the pattern and repeat"
				repeat := repeat + 1.
				buffer removeFirst: match-1.
				match-1 timesRepeat:[input value: buffer]]].
		repeat = 1 ifTrue:[match := 2]. "match always wrong if repeat = 1"
		output value: repeat value: (buffer removeFirst: match-1)].
! !


More information about the Squeak-dev mailing list