[Newbies] Linux locks up when handling large data sets

Sat May 3 22:22:31 UTC 2008

> Quoting johnps11 at bigpond.com:
>
>> Hi Stan!
<snipped large post>
>
> Hi John, not confusing - an excellent response, thanks.
>
> With the memory option it also cruises on under Linux, until it freezes at
> 70 million objects.
>
> While it's still loading vmstat shows:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  2  0  32528  38552  22792 758560    0    0   183   520  295 2399 66  9 17  7
>  3  0  32528  38420  22800 758568    0    0     0     3  254 1779 92  8  0  0
>  4  0  32528  38420  22804 758568    0    0     0     6  234 2100 93  7  0  0
>  4  0  32528  38420  22804 758568    0    0     0     0  239 2125 93  7  0  0
>  2  0  32528  38420  22808 758568    0    0     0     0  233 1379 95  5  0  0
>  2  0  32528  38420  22816 758568    0    0     0     1  251 2255 92  8  0  0
>  2  0  32528  38296  22824 758568    0    0     0     3  244 1963 94  6  0  0
>  4  0  32528  38296  22824 758568    0    0     0     0  230 1863 93  7  0  0
>  5  0  32528  38296  22832 758568    0    0     0     4  234 2108 91  9  0  0
>  2  0  32528  38296  22840 758568    0    0     0    12  228 1616 94  6  0  0
>  4  0  32528  38296  22840 758568    0    0     0     0  234 2142 93  7  0  0
>  2  0  32528  36608  22680 751824    0    0     0     1  228 1639 94  6  0  0
>  1  0  32528  36608  22680 751824    0    0     0     7  234 2128 92  8  0  0
>  3  0  32528  36484  22680 751824    0    0     0     3  248 2170 92  9  0  0
>  2  0  32528  39080  22560 749472    0    0     2     4  231 1610 93  7  0  0
>  4  0  32528  35104  22560 749584    0    0     0     0  239 2137 89 11  0  0
>  3  0  32528  34636  22560 749584    0    0     0     0  236 2084 92  8  0  0
>  3  0  32528  34960  22568 749584    0    0     0     4  233 1898 93  7  0  0
>  4  0  32528  38956  22568 749584    0    0     0     1  238 2277 90 10  0  0
>  2  0  32528  36724  22568 749584    0    0     0     0  227 1719 94  6  0  0
>
> Without the -mmap option, once the image has frozen, vmstat shows:
>
>
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  1  0  32528  41884  23220 703504    0    0   130   370  272 2402 74  9 12  5
>  1  0  32528  41624  23228 703512    0    0     0     4  215 1094 97  3  0  0
>  1  0  32528  41628  23228 703512    0    0     0     0  186  703 98  2  0  0
>  1  0  32528  41628  23228 703512    0    0     0     0  179  667 97  3  0  0
>  1  0  32528  41628  23236 703512    0    0     0     6  180  668 97  3  0  0
>  1  0  32528  41628  23244 703512    0    0     0     4  180  667 97  3  0  0
>  1  0  32528  41628  23244 703512    0    0     0     0  179  667 98  2  0  0
>  1  0  32528  41628  23244 703512    0    0     0     0  204  765 96  4  0  0
>  1  0  32528  41628  23244 703512    0    0     0     2  205 1019 97  3  0  0
>  1  0  32528  41628  23260 703516    0    0     0     6  180  665 97  3  0  0
>  1  0  32528  41628  23268 703516    0    0     0     1  179  658 98  2  0  0
>  1  0  32528  41628  23276 703520    0    0     0     6  180  657 97  3  0  0
>  1  0  32528  41628  23276 703520    0    0     0     3  180  655 97  3  0  0
>  1  0  32528  41628  23276 703520    0    0     0     0  180  654 97  3  0  0
>  1  0  32528  41628  23280 703520    0    0     0     0  179  656 97  3  0  0
>  1  0  32528  41628  23280 703520    0    0     0     0  179  659 98  3  0  0
>  1  0  32528  41628  23288 703520    0    0     0     1  179  653 97  3  0  0
>  1  0  32528  41628  23288 703520    0    0     0     0  179  654 98  2  0  0
>  1  0  32528  41628  23288 703520    0    0     0     0  179  654 98  2  0  0
>  1  0  32528  41628  23296 703520    0    0     0     3  180  658 98  2  0  0
>
> Nothing obviously different to me.
>
> At least I can work around this as long as I keep sizes moderate.
>
> Thanks again,   Stan

Hi Stan!

What vmstat seems to suggest is that the issue isn't Linux going mad
swapping.  Perhaps there's a hard limit on the number of references the VM
can hold, under Linux or Windows.  I also notice that the same amount is
swapped out in both traces, and that there is no swap activity (si and so
are both zero; always ignore the first line of vmstat output, as it shows
averages since boot).
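
If you want to script that check rather than eyeball it, something along
these lines should do.  It is only a rough sketch in Python: it assumes
vmstat's default column layout as in your traces, the function name
swap_activity is one I made up, and the sample rows are copied from your
first trace.

```python
def swap_activity(vmstat_text):
    """Return (si, so) pairs for each sample, skipping the since-boot line."""
    samples = []
    columns = None
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # Header row with the column names: r b swpd free buff cache si so ...
            columns = fields
            continue
        if columns is None or not fields[0].isdigit():
            continue  # banner line ("procs ---memory--- ...") or other noise
        row = dict(zip(columns, (int(f) for f in fields)))
        samples.append((row["si"], row["so"]))
    return samples[1:]  # the first data row is the average since boot


# First three rows of the trace taken while the image was still loading.
trace = """\
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0  32528  38552  22792 758560    0    0   183   520  295 2399 66  9 17  7
 3  0  32528  38420  22800 758568    0    0     0     3  254 1779 92  8  0  0
 4  0  32528  38420  22804 758568    0    0     0     6  234 2100 93  7  0  0
"""

activity = swap_activity(trace)
print("swapping" if any(si or so for si, so in activity) else "no swap activity")
```

Any non-zero si or so after the first sample would mean pages are moving to
or from swap during the run.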

The other odd thing is that your swpd column (total swap in use) never
changes.  How much virtual memory do you have? The output of

    free

would tell you.
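
The same swap figures can also be read from /proc/meminfo, if that is
easier to script.  A small sketch in Python, assuming a Linux box that
exposes the usual SwapTotal and SwapFree fields; swap_totals is just a name
I picked for illustration.

```python
import os


def swap_totals(meminfo_text):
    """Return (total_kB, used_kB) of swap from /proc/meminfo-style text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    total = values.get("SwapTotal", 0)
    free = values.get("SwapFree", 0)
    return total, total - free


if os.path.exists("/proc/meminfo"):  # only present on Linux
    with open("/proc/meminfo") as f:
        total_kb, used_kb = swap_totals(f.read())
    print(f"swap total: {total_kb} kB, in use: {used_kb} kB")
```

The "in use" figure should match the swpd column that vmstat reports.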

The drop in the number of context switches when it's locked up suggests
that the CPU is completely busy in the Squeak VM.  Maybe the garbage
collector gets in a bit of a tiz when abused - is there some way to easily
trace what the Squeak GC is doing?  Perhaps some of the people who are
knowledgeable about the deep internals of the VM could shed light on how
to dynamically trace the time spent in the GC thread versus the user space
thread in the VM.  I know I found a process manager in Squeak once and saw
the garbage collector in there, but I can't recall how I found it or whether
it showed how much time was being spent in each "Squeak VM thread".
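
To put a number on the context-switch drop mentioned at the start of the
paragraph above, here is a quick Python sketch.  The cs values are copied
from your two traces with the first (since-boot) sample of each left out;
the variable names are mine.

```python
from statistics import mean

# cs column while the image was still loading (first sample dropped).
cs_loading = [1779, 2100, 2125, 1379, 2255, 1963, 1863, 2108, 1616, 2142,
              1639, 2128, 2170, 1610, 2137, 2084, 1898, 2277, 1719]

# cs column once the image had frozen (first sample dropped).
cs_frozen = [1094, 703, 667, 668, 667, 667, 765, 1019, 665, 658,
             657, 655, 654, 656, 659, 653, 654, 654, 658]

loading_mean = mean(cs_loading)
frozen_mean = mean(cs_frozen)
print(f"loading: {loading_mean:.0f} cs/s, frozen: {frozen_mean:.0f} cs/s")
print(f"ratio: {loading_mean / frozen_mean:.1f}x")
```

Fewer context switches with us still near 100% fits a single process
spinning in user space rather than waiting on the kernel or on I/O.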

Yours,

John