<br><br><div class="gmail_quote">On Tue, Feb 9, 2010 at 3:44 PM, David T. Lewis <span dir="ltr"><<a href="mailto:lewis@mail.msen.com">lewis@mail.msen.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
A bit of a strain on the old garbage collector, but a Bag is good<br>
for that kind of analysis:<br>
<br>
f := FileStream fileNamed: 'strace.txt'.<br>
lines := Bag new.<br>
[[f atEnd] whileFalse: [lines add: (f upTo: Character lf)]]<br>
ensure: [f close].<br>
lines sortedCounts inspect<br></blockquote><div><br></div><div>That doesn't do what I want; it gives the frequency of each line. I want a shortened file that I can browse more easily, in which consecutive repetitions of the same group of lines are collapsed to a single occurrence of that group marked with a repeat count. Instead of having to wade through pages and pages of the same N lines, there is just one occurrence of those N lines prefixed with a repeat count. So the condensed log preserves the ordering of the events it logs but is much abbreviated.</div>
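<div><br></div><div>To make the difference from a Bag concrete, here is a minimal sketch of the kind of order-preserving compression I mean. It is not the workspace script quoted below: it assumes the lines are already in memory, and the sample data and the name maxPeriod (the longest repeating group it will look for) are made up for illustration. At each position it tries every group length up to maxPeriod, counts how often that group repeats back to back, and collapses whichever candidate covers the most lines.</div>

```smalltalk
"Sketch only: greedy, non-recursive compression of back-to-back repeated groups.
 The sample lines and maxPeriod are illustrative assumptions."
| lines maxPeriod out i remaining bestPeriod bestCount block count next |
lines := #('a' 'x' 'y' 'x' 'y' 'x' 'y' 'x' 'b').
maxPeriod := 50.
out := WriteStream on: String new.
i := 1.
[i <= lines size] whileTrue:
	[remaining := lines size - i + 1.
	bestPeriod := 0.
	bestCount := 1.
	"A group can only repeat if at least two copies fit in what is left."
	1 to: (maxPeriod min: remaining // 2) do:
		[:period |
		block := lines copyFrom: i to: i + period - 1.
		count := 1.
		next := i + period.
		"Count consecutive copies of the group that starts at i."
		[next + period - 1 <= lines size
			and: [(lines copyFrom: next to: next + period - 1) = block]]
			whileTrue:
				[count := count + 1.
				next := next + period].
		"Keep the candidate that covers the most lines; ties favour the shorter group."
		(count > 1 and: [period * count > (bestPeriod * bestCount)])
			ifTrue: [bestPeriod := period. bestCount := count]].
	bestPeriod > 0
		ifTrue:
			[out nextPutAll: 'NEXT '; print: bestPeriod;
				nextPutAll: ' LINES REPEAT '; print: bestCount;
				nextPutAll: ' TIMES'; cr.
			i to: i + bestPeriod - 1 do:
				[:j | out nextPutAll: (lines at: j); cr].
			i := i + (bestPeriod * bestCount)]
		ifFalse:
			[out nextPutAll: (lines at: i); cr.
			i := i + 1]].
out contents
```

<div>On the sample array this should give a, then NEXT 2 LINES REPEAT 3 TIMES, x, y, and finally the leftover x and b, so the order of events survives. It rescans at every position, so it is slow on a large log, and it does nothing about the nested, recursive case.</div>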
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
Dave<br>
<div><div></div><div class="h5"><br>
On Tue, Feb 09, 2010 at 01:34:19PM -0800, Eliot Miranda wrote:<br>
> Hi All,<br>
><br>
> I've just needed to make sense of a very long log file generated by<br>
> strace. The log file is full of entries like:<br>
><br>
> --- SIGALRM (Alarm clock) @ 0 (0) ---<br>
> gettimeofday({1265744804, 491238}, NULL) = 0<br>
> sigreturn() = ? (mask now [])<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
><br>
> and my workspace script reduces these to e.g.<br>
><br>
> --- SIGALRM (Alarm clock) @ 0 (0) ---<br>
> gettimeofday({1265744797, 316183}, NULL) = 0<br>
> sigreturn() = ? (mask now [])<br>
> NEXT 2 LINES REPEAT 715 TIMES<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> --- SIGALRM (Alarm clock) @ 0 (0) ---<br>
> gettimeofday({1265744797, 317189}, NULL) = 0<br>
> sigreturn() = ? (mask now [])<br>
><br>
><br>
> My question is: has anyone looked at this issue in any depth and perhaps come<br>
> up with something not as crude as the below, and possibly even recursive?<br>
> That is, the above would ideally be reduced to e.g.<br>
><br>
> NEXT 7 LINES REPEAT 123456 TIMES<br>
> --- SIGALRM (Alarm clock) @ 0 (0) ---<br>
> gettimeofday({1265744797, 316183}, NULL) = 0<br>
> sigreturn() = ? (mask now [])<br>
> NEXT 2 LINES REPEAT BETWEEN 500 AND 800 TIMES<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> ioctl(8, 0x80045530, 0xbfd4fe70) = 0<br>
> ioctl(8, 0xc1205531, 0xbfd4fb80) = 0<br>
> --- SIGALRM (Alarm clock) @ 0 (0) ---<br>
> gettimeofday({1265744797, 317189}, NULL) = 0<br>
> sigreturn() = ? (mask now [])<br>
><br>
><br>
><br>
> Here's my quick hack that I ran in vw7.7nc:<br>
><br>
> | f o lines maxrun repeats range |<br>
> f := '../Cog/squeak.strace.log' asFilename readStream.<br>
> o := 'compressed.log' asFilename writeStream.<br>
> lines := OrderedCollection new.<br>
> maxrun := 50.<br>
> repeats := 0.<br>
> range := nil.<br>
> [[f atEnd] whileFalse:<br>
> [lines size > maxrun ifTrue:<br>
> [repeats > 0<br>
> ifTrue:<br>
> [1 to: range first - 1 do:<br>
> [:i| o nextPutAll: (lines at: i); cr].<br>
> o nextPutAll: 'NEXT '; print: range size; nextPutAll: ' LINES REPEAT ';<br>
> print: repeats + 1; nextPutAll: ' TIMES'; cr.<br>
> range do:<br>
> [:i| o nextPutAll: (lines at: i); cr].<br>
> lines removeFirst: range last.<br>
> repeats := 0]<br>
> ifFalse:<br>
> [o nextPutAll: lines removeFirst; cr; flush].<br>
> range := nil].<br>
> lines addLast: (f upTo: Character cr).<br>
> [:exit|<br>
> 1 to: lines size do:<br>
> [:i| | line repeat |<br>
> line := lines at: i.<br>
> repeat := lines nextIndexOf: line from: i + 1 to: lines size.<br>
> (repeat ~~ nil<br>
> and: [lines size >= (repeat - i * 2 + i)<br>
> and: [(i to: repeat - 1) allSatisfy: [:j| (lines at: j) = (lines at: j - i<br>
> + repeat)]]]) ifTrue:<br>
> [repeats := repeats + 1.<br>
> range isNil<br>
> ifTrue: [range := i to: repeat - 1]<br>
> ifFalse:<br>
> [range = (i to: repeat - 1) ifTrue:<br>
> [range do: [:ignore| lines removeAtIndex: repeat].<br>
> exit value]]]]] valueWithExit]]<br>
> ensure: [f close. o close].<br>
> repeats<br>
><br>
> Forgive the cross post. I expect deep expertise in each newsgroup posted<br>
> to.<br>
><br>
> best<br>
> Eliot<br>
</div></div></blockquote></div><br>