SqueakDBX July 2011

squeakdbx@lists.squeakfoundation.org

1 participants
4 discussions

Re: [opendbx] slow insert with sqlite
by Mariano Martinez Peck 06 Jul '11

On Tue, Jul 5, 2011 at 11:46 PM, Alain Rastoul <alr.dev(a)free.fr> wrote:

> Hi Mariano,
> I don't want to do multithreading with sqlite because I know it doesn't
> work.
> I was curious about the squeakdbx (or opendbx) architecture because of the
> not-so-good performance and the time spent waiting. I do not understand
> the squeakdbx package vs. the opendbx package: the doc mentions a
> squeakdbx plugin dll, but I have no squeakdbx dll?

Sorry, that's outdated. Once (2 years ago) Esteban Lorenzano tried to write a plugin to avoid FFI. The idea was that such a plugin could avoid locking the VM. But I don't remember why we didn't succeed.

> You are saying that in that case the external call is counted in
> InputEventPollingFetcher>> wait and not in primitives (?).

Maybe :) but I don't know.

> I will investigate with FFI/SQLite and it should be the same (I've seen
> some messages about incorrect profiling reports in primitives),

Yes, primitives are not really well measured in profilers. Check the new profiler announced by Eliot Miranda; it fixes this problem.

> I expected much better performance with sqlite, and glorp is very good
> (5% of the time); I would have expected the contrary.

Sorry, I didn't understand.

> Thanks
>
> Cheers
> Alain
>
> _______________________________________________
> libopendbx-devel mailing list
> libopendbx-devel(a)lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/libopendbx-devel
> http://www.linuxnetworks.de/doc/index.php/OpenDBX

--
Mariano
http://marianopeck.wordpress.com

Fwd: Increasing the performances of a Seaside application
by Mariano Martinez Peck 05 Jul '11

---------- Forwarded message ----------
From: Henrik Sperre Johansen <henrik.s.johansen(a)veloxit.no>
Date: Fri, Jun 3, 2011 at 8:35 AM
Subject: Re: Increasing the performances of a Seaside application
To: Mariano Martinez Peck <marianopeck(a)gmail.com>

On 01.06.2011 20:54, Mariano Martinez Peck wrote:

> On Tue, May 31, 2011 at 11:30 PM, Henrik Sperre Johansen
> <henrik.s.johansen(a)veloxit.no> wrote:
>
>>>>> Thanks. I was not clear. What we actually do is:
>>>>>
>>>>>     (code = OpenDBX resultTimeout) ifTrue: [ (Delay forMilliseconds:
>>>>>         (aQuerySettings timeout asMiliseconds)) wait ].
>>>>>
>>>>> Is that better? Even if it only lets processes of the same priority
>>>>> run, this is good anyway, because what we want is at least to be
>>>>> able to process other queries. Probably those other queries are
>>>>> being run from other Processes.
>>>>
>>>> It's a bit better. There's no starvation if the timeout is greater
>>>> than zero, but it's still a form of busy waiting, and it limits the
>>>> number of queries per second per connection to at most 1000 (actually
>>>> 1000 / timeout). To compare this with our native implementation,
>>>> PostgresV3: I measured 6k+ queries per second per connection, and it's
>>>> still not optimized for Cog (#perform: is slow on Cog).
>>>
>>> Thanks Levente. Unfortunately I guess that's all we can do with a
>>> blocking FFI :(
>>
>> Not really :)
>
> Thanks Henrik. Before analyzing your suggestions, let me tell you
> something stupid we did in DBX that I have just realized. There are TWO
> different timeouts.

Yes :)

> 1) OpenDBX timeout: the one sent as a parameter to the OpenDBX function
> http://www.linuxnetworks.de/doc/index.php/OpenDBX/C_API/odbx_result
> which determines how long the OpenDBX C function waits for the result.
>
>     nextResultSet: aConnection querySettings: aQuerySettings onReturn: aBlock
>         "Returns the next resultSet from the last resultSet. When there is no more resultSets, the block is evaluated."
>         | handle err handleArray |
>         handleArray := WordArray with: 0.
>         err := OpenDBX current
>             apiQueryResult: aConnection handle
>             handle: handleArray
>             timeout: aQuerySettings timeoutAsDBXTimeSpec
>             chunk: aQuerySettings pageSize.
>         ......
>
> 2) SqueakDBX timeout: the time we wait on the IMAGE side once we get a
> timeout from OpenDBX. This is what I showed you:
>
>     (code = OpenDBX resultTimeout) ifTrue: [ (Delay forMilliseconds:
>         (aQuerySettings timeout asMiliseconds)) wait ].
>
> So, as you can see, we are using the same value for both things. This is
> not necessary and maybe stupid.

Yes :)

> The default timeout now is 10 milliseconds:
>
>     >> defaultTimeout
>         "10 miliseconds"
>         ^DBXQueryTimeout seconds: 0 microseconds: 10.
>
> So, if I follow your a), you smartly recommend using 100ms. And in this
> case you are talking about the OpenDBX timeout, and only for the first
> call. This way most queries will be caught on the first try, and even if
> they are not, we return fast. Then, for later calls for the same query
> (only if there was a timeout), we use a really short timeout, for example
> 1ms. The idea is to wait as much as possible on the image side (Delay)
> rather than in C.

This isn't milliseconds, this is microseconds (when you use it for the C call) :)

I'm not sure about the DBX implementation: whether it returns immediately when done, or waits the entire period. It should be easily testable by setting a really long timeout, like 1 million (1 second), and repeating a query you know only takes a couple of milliseconds to complete, say 10 times. If the test takes 10 seconds of wall time, you know it waits the entire period; if not, then it's safe to set the timeout to the maximum amount of time you feel it's acceptable to block the image (on the first call; subsequent calls should still block for the minimum amount in C and rather use the Delay).

> At the same time, with b) you recommend using an incremental SqueakDBX
> timeout (the Delay). So we can start with 1 ms and then grow: 2 4 8 16 32
> 64 128 256 512 up to 1024. And if we get to 1024, do we continue using
> that value? But isn't 1ms too small? This value will be used only if a
> timeout happened (the result took more than 100ms), so it would be quite
> weird for the result to be ready just 1ms later. No?
>
> So... did I understand correctly?

The timeout in the C call is in microseconds, thus 100 means 1/10th of a millisecond, not 100 milliseconds. Thus starting at a 1ms delay makes more sense. Other than that, you understood perfectly.

>> You could
>> a) Use a default timeout for the first call which means it actually
>> completes more queries on the first try yet still returns fast, say
>> 100ms rather than 1ms.
>> (For later calls, just to check whether it has possibly finished, you
>> probably want to block for as short a time as possible, though.)
>> b) Use an exponentially growing value for the Delay rather than a
>> constant one, starting at 1ms and capped at some other value:
>> 1 2 4 8 16 32 64 128 256 512 1024, for instance. Polling once per second
>> shouldn't hurt other processes at all, yet gives OK responsiveness for
>> queries > 1 second.
>>
>> This way, you (in the cases where the potential is 9k queries/sec) will
>> have a hard cap at 10k queries (due to the 100ms block time), and hurt
>> those above that as little as possible using Delays. What you don't
>> have, though, is a cap of around 1k due to calls never completing in
>> 1ms and having to wait (at least; I don't know the default value of
>> aQuerySettings timeout) 1ms for each, due to the minimum Delay wait
>> time resolution.
>>
>> Btw, unless the microseconds and seconds are switched, this could be
>> simpler (as well as misspelled :) ):
>>
>>     DBXQueryTimeout >> asMiliseconds
>>         ^ (self seconds * 1000) + (((self microseconds / 1000) asFloat) integerPart asInteger )
>>
>>         ^ (self seconds * 1000) + (self microseconds // 1000)
>
> Thanks :)

Of course, this rounds down. If you want it rounded UP to the closest millisecond, you can do:

    ^ (self seconds * 1000) + (999 + self microseconds // 1000)

Or if you want to round to nearest:

    ^ (self seconds * 1000) + (500 + self microseconds // 1000)

>> DBXTimeSpec also has a field called nseconds which contains
>> microseconds, rather confusing :)
>
> Yes, I know. The problem was that OpenDBX/C uses that structure, but from
> the image side it was nicer to use microseconds hehehehe

The C struct contains microseconds, and that's what it's being used as image-side as well :)
I.e. it should be named microSeconds or something instead (mseconds is too ambiguous).

TL;DR: You understood perfectly; using the same timeout for both blocking in C and waiting in the image between C calls is not the best idea. Using a longer initial C timeout ensures you get -better- than Delay-resolution response times in the cases where that is possible.

Cheers,
Henry

PS. Is there a lock somewhere? What happens if you do two queries: how do you handle waiting for both at the same time, and getting the correct result set to the correct sender?

--
Mariano
http://marianopeck.wordpress.com
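The three rounding variants discussed in this message are easy to check by hand. What follows is only a minimal sketch, transcribed to Python for illustration; it is not SqueakDBX code, the function names are made up, and it assumes non-negative values, for which Python's `//` floors the same way as Smalltalk's `//`.

```python
def to_ms_floor(seconds, microseconds):
    """Round down: mirrors (self seconds * 1000) + (self microseconds // 1000)."""
    return seconds * 1000 + microseconds // 1000


def to_ms_ceil(seconds, microseconds):
    """Round up: mirrors (self seconds * 1000) + (999 + self microseconds // 1000)."""
    return seconds * 1000 + (999 + microseconds) // 1000


def to_ms_nearest(seconds, microseconds):
    """Round to nearest: mirrors (self seconds * 1000) + (500 + self microseconds // 1000)."""
    return seconds * 1000 + (500 + microseconds) // 1000


if __name__ == "__main__":
    # Print floor / ceil / nearest for a few microsecond values.
    for us in (10, 499, 500, 1000, 1500):
        print(us, to_ms_floor(0, us), to_ms_ceil(0, us), to_ms_nearest(0, us))
```

If the default quoted above really is `seconds: 0 microseconds: 10`, the round-down variant turns it into 0 ms, so the image-side Delay would not wait at all, while the round-up variant gives 1 ms.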

Fwd: Increasing the performances of a Seaside application
by Mariano Martinez Peck 05 Jul '11

---------- Forwarded message ----------
From: Mariano Martinez Peck <marianopeck(a)gmail.com>
Date: Wed, Jun 1, 2011 at 8:54 PM
Subject: Re: Increasing the performances of a Seaside application
To: Henrik Sperre Johansen <henrik.s.johansen(a)veloxit.no>, proyecto_relacional(a)googlegroups.com

On Tue, May 31, 2011 at 11:30 PM, Henrik Sperre Johansen
<henrik.s.johansen(a)veloxit.no> wrote:

>>>> Thanks. I was not clear. What we actually do is:
>>>>
>>>>     (code = OpenDBX resultTimeout) ifTrue: [ (Delay forMilliseconds:
>>>>         (aQuerySettings timeout asMiliseconds)) wait ].
>>>>
>>>> Is that better? Even if it only lets processes of the same priority
>>>> run, this is good anyway, because what we want is at least to be able
>>>> to process other queries. Probably those other queries are being run
>>>> from other Processes.
>>>
>>> It's a bit better. There's no starvation if the timeout is greater
>>> than zero, but it's still a form of busy waiting, and it limits the
>>> number of queries per second per connection to at most 1000 (actually
>>> 1000 / timeout). To compare this with our native implementation,
>>> PostgresV3: I measured 6k+ queries per second per connection, and it's
>>> still not optimized for Cog (#perform: is slow on Cog).
>>
>> Thanks Levente. Unfortunately I guess that's all we can do with a
>> blocking FFI :(
>
> Not really :)

Thanks Henrik. Before analyzing your suggestions, let me tell you something stupid we did in DBX that I have just realized. There are TWO different timeouts.

1) OpenDBX timeout: the one sent as a parameter to the OpenDBX function
http://www.linuxnetworks.de/doc/index.php/OpenDBX/C_API/odbx_result
which determines how long the OpenDBX C function waits for the result.

    nextResultSet: aConnection querySettings: aQuerySettings onReturn: aBlock
        "Returns the next resultSet from the last resultSet. When there is no more resultSets, the block is evaluated."
        | handle err handleArray |
        handleArray := WordArray with: 0.
        err := OpenDBX current
            apiQueryResult: aConnection handle
            handle: handleArray
            timeout: aQuerySettings timeoutAsDBXTimeSpec
            chunk: aQuerySettings pageSize.
        ......

2) SqueakDBX timeout: the time we wait on the IMAGE side once we get a timeout from OpenDBX. This is what I showed you:

    (code = OpenDBX resultTimeout) ifTrue: [ (Delay forMilliseconds:
        (aQuerySettings timeout asMiliseconds)) wait ].

So, as you can see, we are using the same value for both things. This is not necessary and maybe stupid.

The default timeout now is 10 milliseconds:

    >> defaultTimeout
        "10 miliseconds"
        ^DBXQueryTimeout seconds: 0 microseconds: 10.

So, if I follow your a), you smartly recommend using 100ms. And in this case you are talking about the OpenDBX timeout, and only for the first call. This way most queries will be caught on the first try, and even if they are not, we return fast. Then, for later calls for the same query (only if there was a timeout), we use a really short timeout, for example 1ms. The idea is to wait as much as possible on the image side (Delay) rather than in C.

At the same time, with b) you recommend using an incremental SqueakDBX timeout (the Delay). So we can start with 1 ms and then grow: 2 4 8 16 32 64 128 256 512 up to 1024. And if we get to 1024, do we continue using that value? But isn't 1ms too small? This value will be used only if a timeout happened (the result took more than 100ms), so it would be quite weird for the result to be ready just 1ms later. No?

So... did I understand correctly?

> You could
> a) Use a default timeout for the first call which means it actually
> completes more queries on the first try yet still returns fast, say 100ms
> rather than 1ms.
> (For later calls, just to check whether it has possibly finished, you
> probably want to block for as short a time as possible, though.)
> b) Use an exponentially growing value for the Delay rather than a
> constant one, starting at 1ms and capped at some other value:
> 1 2 4 8 16 32 64 128 256 512 1024, for instance. Polling once per second
> shouldn't hurt other processes at all, yet gives OK responsiveness for
> queries > 1 second.
>
> This way, you (in the cases where the potential is 9k queries/sec) will
> have a hard cap at 10k queries (due to the 100ms block time), and hurt
> those above that as little as possible using Delays. What you don't have,
> though, is a cap of around 1k due to calls never completing in 1ms and
> having to wait (at least; I don't know the default value of
> aQuerySettings timeout) 1ms for each, due to the minimum Delay wait time
> resolution.
>
> Btw, unless the microseconds and seconds are switched, this could be
> simpler (as well as misspelled :) ):
>
>     DBXQueryTimeout >> asMiliseconds
>         ^ (self seconds * 1000) + (((self microseconds / 1000) asFloat) integerPart asInteger )
>
>         ^ (self seconds * 1000) + (self microseconds // 1000)

Thanks :)

> DBXTimeSpec also has a field called nseconds which contains
> microseconds, rather confusing :)

Yes, I know. The problem was that OpenDBX/C uses that structure, but from the image side it was nicer to use microseconds hehehehe

> Cheers,
> Henry

--
Mariano
http://marianopeck.wordpress.com
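The scheme discussed in this message (a longer C-side timeout on the first call, then very short C-side polls separated by an exponentially growing image-side Delay) can be sketched roughly as follows. This is an illustrative Python sketch, not SqueakDBX code: `try_fetch` is a hypothetical stand-in for the OpenDBX result call that returns `None` on timeout, the parameter names are invented, and the numbers (100 and 1 for the C-side timeouts, 1 to 1024 ms for the Delay) are simply the figures mentioned in the thread, with the C-side unit assumed to be microseconds.

```python
import time
from typing import Callable, Iterator, Optional


def backoff_delays_ms(start_ms: int = 1, cap_ms: int = 1024) -> Iterator[int]:
    """Yield start_ms, then keep doubling, and stay at cap_ms once reached."""
    delay = start_ms
    while True:
        yield min(delay, cap_ms)
        delay = min(delay * 2, cap_ms)


def poll_with_backoff(
    try_fetch: Callable[[int], Optional[object]],  # hypothetical: None means "timed out"
    first_timeout_us: int = 100,   # longer blocking wait on the first call only
    retry_timeout_us: int = 1,     # later calls block as briefly as possible
    sleep: Callable[[float], None] = time.sleep,  # stand-in for the image-side Delay
    max_retries: Optional[int] = None,
) -> object:
    """Block briefly in 'C' first, then wait mostly on the 'image' side."""
    result = try_fetch(first_timeout_us)
    if result is not None:
        return result
    for retry, delay_ms in enumerate(backoff_delays_ms()):
        if max_retries is not None and retry >= max_retries:
            raise TimeoutError("no result after %d retries" % max_retries)
        sleep(delay_ms / 1000.0)             # other work could run during this wait
        result = try_fetch(retry_timeout_us)  # short check: is the result ready yet?
        if result is not None:
            return result
```

Passing `sleep` as a parameter is only a convenience so the waiting policy can be exercised without real delays; in the image the equivalent would presumably be the `Delay ... wait` shown in the message.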

Re: [opendbx] slow insert with sqlite
by Mariano Martinez Peck 05 Jul '11

On Tue, Jul 5, 2011 at 10:50 PM, Alain Rastoul <alr.dev(a)free.fr> wrote:

> Hi,
> (sorry for sending this mail again; my PC was off for a long time and the
> message was dated 2007, so people who sort their messages would not see
> it)
>
> I've written a small program in Pharo 1.3 with glorp+opendbx that inserts
> 1000 rows into a customer table in a sqlite db.
> The 1000 inserts take 140 sec (very slow), but the Pharo profiler says
> that it spends 95% of the time waiting for input
> (in InputEventPollingFetcher>> waitForInput).
> I was wondering if the queries are executed in another thread than the VM
> thread?

Hi Alain. No. Squeak/Pharo's thread architecture is the so-called green thread model, that is, only ONE OS thread is used. Internally, the language reifies Process, Scheduler, #fork:, etc. But from the OS point of view there is only one thread for the VM. So the regular FFI blocks the VM. What does that mean? That while the C function called through FFI is being executed, the WHOLE VM is blocked. Nothing else can happen at the same time. Imagine the function that retrieves the results and needs to wait for them... TERRIBLE. So if the backend does not support async queries, then you are screwed and dbx may be slow in Pharo. Nothing to do.

However, some backends support async queries, and opendbx lets us configure this. This is explained in:

http://www.squeakdbx.org/Architecture%20and%20desing?_s=FlIhkPQOOFSlqf8C&_k…

where it says "External call implementation".

You can see the list of backends that support async queries here:

http://www.squeakdbx.org/documentation/Asynchronous%20queries?_s=FlIhkPQOOF…

Notice that there is some room for improvement, but we haven't had time so far. Henrik gave us some good ideas, but since we haven't needed more power so far, we couldn't find time to integrate them. I am now forwarding the emails to the mailing list. If you can take a look and provide code, it would be awesome. Basically, they improve how, and how much, we wait on each side: image and opendbx.

Finally, notice that Eliot is working on a multithreaded FFI for Cog, but it is not yet available as far as I know.

Cheers

Mariano

> (I thought I'd seen a document about the opendbx architecture but
> couldn't find it on the site).
>
> TIA
> Alain
>
> _______________________________________________
> libopendbx-devel mailing list
> libopendbx-devel(a)lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/libopendbx-devel
> http://www.linuxnetworks.de/doc/index.php/OpenDBX

--
Mariano
http://marianopeck.wordpress.com
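For readers who have not met green threads before, the effect described in this message (one OS thread, so a blocking external call stalls every other Process) can be reproduced by analogy in Python's asyncio, which is also a cooperative scheduler on a single thread. This is only an analogy under that assumption, not a model of the Squeak/Pharo VM or of OpenDBX; `blocking_query`, `cooperative_query`, `ticker`, and `run` are hypothetical names.

```python
import asyncio
import time


async def ticker(log, ticks):
    """Stands in for any other green thread that wants to run."""
    for _ in range(ticks):
        log.append("tick")
        await asyncio.sleep(0)  # yield to the scheduler


async def blocking_query(log, duration):
    """Like a synchronous FFI call: never yields, so nothing else can run."""
    log.append("query start")
    time.sleep(duration)
    log.append("query end")


async def cooperative_query(log, duration):
    """Like waiting on the image side: yields while the result is pending."""
    log.append("query start")
    await asyncio.sleep(duration)
    log.append("query end")


async def run(query, duration=0.05):
    """Run one query next to a ticker and return the order of events."""
    log = []
    await asyncio.gather(query(log, duration), ticker(log, 3))
    return log


if __name__ == "__main__":
    print(asyncio.run(run(blocking_query)))
    print(asyncio.run(run(cooperative_query)))
```

In the blocking run, no tick can be logged between "query start" and "query end"; in the cooperative run, the ticker gets to run while the query is still waiting.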
