[squeak-dev] OT: compressing log files

Eliot Miranda eliot.miranda at gmail.com
Tue Feb 9 23:57:45 UTC 2010


On Tue, Feb 9, 2010 at 3:44 PM, David T. Lewis <lewis at mail.msen.com> wrote:

> A bit of a strain on the old garbage collector, but a Bag is good
> for that kind of analysis:
>
>  f := FileStream fileNamed: 'strace.txt'.
>  lines := Bag new.
>  [[f atEnd] whileFalse: [lines add: (f upTo: Character lf)]]
>      ensure: [f close].
>  lines sortedCounts inspect
>

That doesn't do what I want.  That gives the frequency of each line.  I want
a shortened file that I can browse more easily, in which each run of
consecutive repetitions of the same group of lines is collapsed to a single
copy of that group marked with a repeat count.  Instead of having to wade
through pages and pages of the same N lines there is just one occurrence of
those N lines prefixed with a repeat count.  So the condensed log preserves
the ordering of the events it logs but is much abbreviated.
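
To make the distinction concrete, here is a minimal workspace sketch of that
kind of order-preserving compression.  It is not the script quoted further
down, only an illustration: the sample lines and the maxPeriod bound are made
up, it works on an in-memory list rather than a file, and at each position it
simply takes the shortest group of up to maxPeriod lines that is immediately
repeated.

```smalltalk
| lines maxPeriod out i |
"Hypothetical sample data; a real log would be read from a file."
lines := #('open' 'ioctl A' 'ioctl B' 'ioctl A' 'ioctl B' 'ioctl A' 'ioctl B' 'close').
maxPeriod := 4.    "assumed upper bound on the length of a repeating group"
out := OrderedCollection new.
i := 1.
[i <= lines size] whileTrue:
    [| period count |
     period := 0.
     count := 1.
     "Try the shortest group first and keep the first one that repeats."
     1 to: maxPeriod do:
        [:p | | n |
         period = 0 ifTrue:
            [n := 1.
             "Count how many back-to-back copies of lines i .. i+p-1 follow."
             [i + ((n + 1) * p) - 1 <= lines size
                and: [(1 to: p) allSatisfy:
                        [:k | (lines at: i + k - 1) = (lines at: i + (n * p) + k - 1)]]]
                whileTrue: [n := n + 1].
             n > 1 ifTrue: [period := p. count := n]]].
     period = 0
        ifTrue:
            ["No repetition starts here: copy the line through unchanged."
             out add: (lines at: i).
             i := i + 1]
        ifFalse:
            ["Emit a header plus one copy of the group, then skip all copies."
             out add: 'NEXT ', period printString, ' LINES REPEAT ', count printString, ' TIMES'.
             i to: i + period - 1 do: [:j | out add: (lines at: j)].
             i := i + (period * count)]].
out
```

For the sample above, out ends up holding 'open', 'NEXT 2 LINES REPEAT 3 TIMES',
'ioctl A', 'ioctl B', 'close', in that order.  A Bag over the same lines would
report the counts of 'ioctl A' and 'ioctl B' but lose where in the log the
run occurred.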


> Dave
>
> On Tue, Feb 09, 2010 at 01:34:19PM -0800, Eliot Miranda wrote:
> > Hi All,
> >
> >     I've just needed to make sense of a very long log file generated by
> > strace.  The log file is full of entries like:
> >
> > --- SIGALRM (Alarm clock) @ 0 (0) ---
> > gettimeofday({1265744804, 491238}, NULL) = 0
> > sigreturn()                             = ? (mask now [])
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> >
> > and my workspace script reduces these to e.g.
> >
> > --- SIGALRM (Alarm clock) @ 0 (0) ---
> > gettimeofday({1265744797, 316183}, NULL) = 0
> > sigreturn()                             = ? (mask now [])
> > NEXT 2 LINES REPEAT 715 TIMES
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > --- SIGALRM (Alarm clock) @ 0 (0) ---
> > gettimeofday({1265744797, 317189}, NULL) = 0
> > sigreturn()                             = ? (mask now [])
> >
> >
> > My question is: has anyone looked at this issue in any depth and perhaps come
> > up with something not as crude as the below, and possibly even recursive?
> > I.e. the above would ideally be reduced to e.g.
> >
> > NEXT 7 LINES REPEAT 123456 TIMES
> > --- SIGALRM (Alarm clock) @ 0 (0) ---
> > gettimeofday({1265744797, 316183}, NULL) = 0
> > sigreturn()                             = ? (mask now [])
> > NEXT 2 LINES REPEAT BETWEEN 500 AND 800 TIMES
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > ioctl(8, 0x80045530, 0xbfd4fe70)        = 0
> > ioctl(8, 0xc1205531, 0xbfd4fb80)        = 0
> > --- SIGALRM (Alarm clock) @ 0 (0) ---
> > gettimeofday({1265744797, 317189}, NULL) = 0
> > sigreturn()                             = ? (mask now [])
> >
> >
> >
> > Here's my quick hack that I ran in vw7.7nc:
> >
> > | f o lines maxrun repeats range |
> > f := '../Cog/squeak.strace.log' asFilename readStream.
> > o := 'compressed.log' asFilename writeStream.
> > lines := OrderedCollection new.
> > maxrun := 50.
> > repeats := 0.
> > range := nil.
> > [[f atEnd] whileFalse:
> >     [lines size > maxrun ifTrue:
> >         [repeats > 0
> >             ifTrue:
> >                 [1 to: range first - 1 do:
> >                     [:i| o nextPutAll: (lines at: i); cr].
> >                  o nextPutAll: 'NEXT '; print: range size; nextPutAll: ' LINES REPEAT ';
> >                     print: repeats + 1; nextPutAll: ' TIMES'; cr.
> >                  range do:
> >                     [:i| o nextPutAll: (lines at: i); cr].
> >                  lines removeFirst: range last.
> >                  repeats := 0]
> >             ifFalse:
> >                 [o nextPutAll: lines removeFirst; cr; flush].
> >          range := nil].
> >      lines addLast: (f upTo: Character cr).
> >      [:exit|
> >       1 to: lines size do:
> >         [:i| | line repeat |
> >          line := lines at: i.
> >          repeat := lines nextIndexOf: line from: i + 1 to: lines size.
> >          (repeat ~~ nil
> >           and: [lines size >= (repeat - i * 2 + i)
> >           and: [(i to: repeat - 1) allSatisfy:
> >                   [:j| (lines at: j) = (lines at: j - i + repeat)]]]) ifTrue:
> >             [repeats := repeats + 1.
> >              range isNil
> >                 ifTrue: [range := i to: repeat - 1]
> >                 ifFalse:
> >                     [range = (i to: repeat - 1) ifTrue:
> >                         [range do: [:ignore| lines removeAtIndex: repeat].
> >                          exit value]]]]] valueWithExit]]
> >     ensure: [f close. o close].
> > repeats
> >
> > Forgive the cross post.  I expect deep expertise in each newsgroup posted
> > to.
> >
> > best
> > Eliot