[squeak-dev] Generator example: Nested log file compression
Andreas Raab
andreas.raab at gmx.de
Thu Feb 11 05:51:44 UTC 2010
Folks -
Now that we have both a compression algorithm and generators, let's put
them together. Attached is a (slightly more compact) version of my
compression script for Eliot that can be used like this:
Array
compress: 'AAAABCBCBCABCABCA' readStream
to:[:count :pattern| Transcript show: count printString,'x', (String
withAll: pattern); space]
lookahead: 20.
But more interestingly, we can now nest the algorithm using generators:
Installer installUrl:
'http://source.lukas-renggli.ch/continuations/Generator-ar.5.mcz'.
generator := Generator on:[:g|
Array compress:'AABAAB' readStream
to:[:repeat :pattern| g yield: repeat printString, 'x', (String
withAll: pattern)]
lookahead: 20.
].
Array
compress: generator
to:[:repeat :pattern| Transcript cr; show: repeat printString, 'x ',
pattern, ' ']
lookahead: 20.
The result of which is 2x #('2xA' '1xB'), i.e., AABAAB just as expected.
The advantage of using generators in this case is that I can whip up the
nested version without having to think of eventual consequences in terms
of space (yes, I could conceivably run two passes but would that work if
Eliot's log file is GBs in size?) or how to restructure the algorithm
(yes, you could rewrite the algorithm but why waste my valuable time?).
*That* is the value of generators to me. They solve a certain class of
problems straightforwardly and intuitively that one needs to work around
in awkward ways otherwise. It's simply a way of being more productive
when you have a problem that fits that pattern (which is not all that
uncommon).
Cheers,
- Andreas
-------------- next part --------------
'From Squeak3.11alpha of 10 February 2010 [latest update: #9307] on 10 February 2010 at 9:50:18 pm'!
!Array class methodsFor: 'utilities' stamp: 'ar 2/10/2010 21:50'!
compress: inputStream to: outputBlock lookahead: lookahead
"Compress inputStream to outputBlock by emitting sequences of repeated patterns.
Example:
Array
compress: 'AAAABCBCBCABCABCA' readStream
to:[:count :pattern| Transcript show: count printString,'x', (String withAll: pattern); space]
lookahead: 20.
The matching algorithm itself looks ahead for a match of the first element and if it repeats, simply eats the input until the match ends.
Could be improved in various ways, for example by detecting the longest match instead of taking the first match (this leads to AABAAB being output as 2xA 1xB 2xA 1xB instead of 2xAAB)."
| input output buffer repeat match pattern stop |
input := [:buf| inputStream atEnd ifFalse:[buf add: inputStream next]].
output := outputBlock.
buffer := OrderedCollection new.
[lookahead - buffer size timesRepeat:[input value: buffer].
buffer isEmpty] whileFalse:[
repeat := match := 1.
"find first occurance of first element in lookahead buffer"
[repeat = 1 and:[
match := buffer indexOf: buffer first startingAt: match+1 ifAbsent:[0].
match between: 2 and: lookahead // 2]] whileTrue:[
"see if we have a repeat pattern"
pattern := buffer copyFrom: 1 to: match-1.
stop := match+match-2.
[buffer size >= stop and:[pattern = (buffer copyFrom: match to: stop)]] whileTrue:[
"Eat the pattern and repeat"
repeat := repeat + 1.
buffer removeFirst: match-1.
match-1 timesRepeat:[input value: buffer]]].
repeat = 1 ifTrue:[match := 2]. "match always wrong if repeat = 1"
output value: repeat value: (buffer removeFirst: match-1)].
! !
More information about the Squeak-dev
mailing list
|