Q: incremental garbage collection overhead

alain rastoul alr.dev at free.fr
Sun Nov 25 16:37:27 UTC 2007


Hi Jason

I have been working with SQL Server every day for years; part of my job 
includes helping developers and consultants rewrite poorly performing queries 
(query plans etc.). For some customers we set up cubes with Analysis Services 
and do not query the RDBMS directly, since that would put too much load on it 
and be practically unusable.

As for your question: yes, I am sure I need to load a lot of rows. In fact 
I hope to load not hundreds of thousands but millions of rows (one hundred 
million would be fine :) ). I don't know whether that will be possible with 
Squeak without tackling some issues, but today I find it good for quick 
prototyping and exploration (in this case of hashing, cardinalities and 
computations). It is not at all about how to persist data.

Anyway, thank you for taking the time to answer.

Regards
Alain

"Jason Johnson" <jason.johnson.081 at gmail.com> wrote in message 
news: aa22f0200711221000x25c03bf8neeb1fa560efceccf at mail.gmail.com...
> This will probably sound like a cop-out, but are you sure you need to
> be loading hundreds of thousands of rows?  If you are using an RDBMS
> anyway, I would move as much processing as possible to the DB.
>
> I don't know that you are doing this, but in my professional
> experience I see a lot of people pulling lots of rows like this and
> then doing all kinds of post processing on them.  If one is going to
> do that then the overhead of having an RDBMS isn't worth it.  There
> are lots of ways to persist data.
>
> On Nov 21, 2007 9:23 PM, alain rastoul <alr.dev at free.fr> wrote:
>> Hi
>>
>> I'm trying to load data from a SQL Server database (hundreds of thousands
>> of rows) into Heaps with ODBC/FFI, and I noticed that most of the time is
>> spent in incremental garbage collection (about 80 to 90% of the running
>> time of the load process!).
>> I will look at the ODBCResultSet implementation to limit
>> IdentityDictionary/Row allocation by working with preallocated arrays,
>> but this will solve only one of my problems.
>>
>> I was wondering whether there is a way to limit incremental collections by
>> running them only after a certain amount of memory has been allocated. I
>> found setGCBiasToGrowGCLimit: in SystemDictionary (Smalltalk
>> setGCBiasToGrowGCLimit: 16*1024*1024), but it doesn't work and fails with
>> the error "a primitive has failed". Is it the right method?
>>
>> Another question about garbage collection concerns the overhead that the
>> loaded data (hundreds of MB of objects) imposes on the VM: is there a way
>> to know whether incremental collection is slowed down by that data, or to
>> know when it is moved to old space?
>>
>> Any pointers, ideas or links are welcome
>>
>> Thanks
>>
>> Regards,
>> Alain

More information about the Squeak-dev mailing list