Q: incremental garbage collection overhead


alain rastoul

21 Nov 2007

9:23 p.m.

I'm trying to load data from a SQL Server database (hundreds of thousands of rows) into Heaps with ODBC/FFI, and I noticed that most of the time is spent in incremental garbage collection (about 80 to 90% of the running time of the load process!). I will look at the ODBCResultSet implementation to limit IdentityDictionary/Row allocation by working with preallocated arrays, but this will solve only one of my problems.

I was wondering if there is a way to limit incremental collections by running them only when a certain amount of memory has been allocated. I found setGCBiasToGrowGCLimit in SystemDictionary (Smalltalk setGCBiasToGrowGCLimit: 16*1024*1024), but it doesn't work and pops up an "a primitive has failed" error. Is it the right method?

Another question about garbage collection concerns the overhead that the loaded data (hundreds of MB of objects) imposes on the VM: is there a way to know whether incremental collection is slowed down by this data, or to know when the objects are moved to old space?

Any pointers, ideas or links are welcome.

Thanks

Regards, Alain


John M McIntosh

21 Nov 2007

9:44 p.m.

You need to use a VM that supports setGCBiasToGrowGCLimit. Which VM are you using?

Also, to turn it on you need to do: Smalltalk setGCBiasToGrow: 1.

Other GC tuning values are below. The values given are not tuned for your application: they may make it better, or they may make it worse.

Smalltalk vmParameterAt: 5 put: 8000. "do an incremental GC after this many allocations"
Smalltalk vmParameterAt: 6 put: 4000. "tenure when more than this many objects survive the GC"

Smalltalk vmParameterAt: 25 put: 24*1024*1024. "grow headroom"
Smalltalk vmParameterAt: 24 put: 48*1024*1024. "shrink threshold"

--
John M. McIntosh johnmci@smalltalkconsulting.com
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
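
The tuning values in this reply can be applied only for the duration of a bulk load and restored afterwards, so the rest of the image keeps its defaults. What follows is a minimal sketch, not a recommendation: it assumes the interpreter VM of the Squeak 3.x era discussed in this thread, where parameters 5 and 6 can be both read and written; the numbers are the illustrative ones from the message; and loadBlock is a hypothetical stand-in for the real ODBC load.

```smalltalk
"Sketch: raise the GC thresholds while a bulk load runs, then restore them.
 loadBlock is a hypothetical placeholder, not part of any library."
| loadBlock oldAllocations oldTenure |
loadBlock := [(1 to: 100000) collect: [:i | Array new: 8]].

"Remember the current settings so they can be put back."
oldAllocations := Smalltalk vmParameterAt: 5.
oldTenure := Smalltalk vmParameterAt: 6.

[Smalltalk vmParameterAt: 5 put: 8000.	"allocations between incremental GCs"
 Smalltalk vmParameterAt: 6 put: 4000.	"survivor count that triggers tenuring"
 loadBlock value]
	ensure: [
		"Runs even if the load fails, so the image is not left mistuned."
		Smalltalk vmParameterAt: 5 put: oldAllocations.
		Smalltalk vmParameterAt: 6 put: oldTenure]
```

Wrapping the load in ensure: is a design choice for experimentation: different values can be tried one run at a time without carrying a failed experiment over into the saved image.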

alain rastoul

25 Nov 2007

5:29 p.m.

Hi John, thank you very much for your answer.

Sorry for not responding earlier, but my internet connection was down until now. With 32 MB as parameter 5, 16 MB as parameter 6 and 32 MB as parameter 25, the time spent in incremental GC was 50% and I was able to load 500k rows in 220 sec. The VM I'm using is a standard 3.9; I'll try a 3.10 soon.

Best regards, Alain
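
Figures such as the share of time spent in incremental GC, and an indication of when objects are tenured to old space, can be obtained by reading the VM's GC counters before and after the load. The sketch below is a minimal illustration under stated assumptions: it relies on the read-only parameters 9 (incremental GCs since startup), 10 (milliseconds spent in incremental GCs) and 11 (tenures since startup) as listed in the SystemDictionary>>vmParameterAt: comment of Squeak 3.x images, which may differ on other VMs, and loadBlock is a hypothetical stand-in for the real ODBC load.

```smalltalk
"Sketch: measure how much of a load is spent in incremental GC.
 loadBlock is a hypothetical placeholder, not part of any library."
| loadBlock gcCount gcMs tenures elapsed |
loadBlock := [(1 to: 100000) collect: [:i | Array new: 8]].

"Snapshot the counters before the load."
gcCount := Smalltalk vmParameterAt: 9.
gcMs := Smalltalk vmParameterAt: 10.
tenures := Smalltalk vmParameterAt: 11.

elapsed := Time millisecondsToRun: loadBlock.

"Keep only what happened during the load."
gcCount := (Smalltalk vmParameterAt: 9) - gcCount.
gcMs := (Smalltalk vmParameterAt: 10) - gcMs.
tenures := (Smalltalk vmParameterAt: 11) - tenures.

Transcript
	show: 'elapsed ms: ', elapsed printString; cr;
	show: 'incremental GCs: ', gcCount printString; cr;
	show: 'ms in incremental GC: ', gcMs printString; cr;
	show: 'tenures: ', tenures printString; cr
```

Comparing the milliseconds in incremental GC with the elapsed time gives the percentage quoted in the thread, and a tenure count that stays at zero during the load suggests the loaded objects are still being rescanned in young space.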


Jason Johnson

22 Nov 2007

7 p.m.

This will probably sound like a cop-out, but are you sure you need to be loading hundreds of thousands of rows? If you are using an RDBMS anyway, I would move as much processing as possible to the DB.

I don't know that you are doing this, but in my professional experience I see a lot of people pulling lots of rows like this and then doing all kinds of post processing on them. If one is going to do that then the overhead of having an RDBMS isn't worth it. There are lots of ways to persist data.


alain rastoul

25 Nov 2007

5:37 p.m.

Hi Jason,

I've been working with SQL Server every day for years; part of my job includes helping developers or consultants rewrite badly performing queries (query plans, etc.). For some customers we set up cubes with Analysis Services and do not use the RDBMS directly, as that would be too much load and practically unusable.

As for your question: yes, of course I'm sure I need to load a lot of rows. In fact I hope to load not hundreds of thousands but millions of rows ... (one hundred million would be fine :) ). I don't know if it will be possible with Squeak without tackling some issues, but today I find it good for quick prototyping and exploration (in this case about hashing, cardinalities and computations...); it's not at all about how to persist data.

Anyway, thank you for taking the time to answer.

Regards, Alain


Jason Johnson

6:29 p.m.

OK, it sounds like you know what you're doing then. In that case, yes, it would be good for prototyping and so on. I didn't mean my response to be unhelpful, but I also didn't want to be person (a) from this question:

http://weblogs.asp.net/alex_papadimoulis/archive/2005/05/25/408925.aspx



squeak-dev@lists.squeakfoundation.org
