I have two applications that both need a class Customer. However, one of the applications requires that class Customer have a set of attributes and methods very different from those in the second application, which uses the class Customer for a totally different industry.
Hi Daniel. For me, the key to your question is the phrase "totally different industry." If that's true, it seems there would be little chance that the two Customers would need to be in the same running image, nor would they need to be developed together.
Therefore I wouldn't worry too much about trying to factor the two Customer classes into a single, grand Customer superclass, if they won't ever be commingled in practice. In fact, doing so could potentially increase cost. For example, the two groups of developers may have their own ideas about what belongs in the generic superclass Customer. So then you have developers spending time reconciling code across two completely disparate industries.
Even if there's a remote chance of the two programs needing to co-exist in an image, you can use refactoring tools to meld the code at that time. But keep it simpler in the meantime. Remember, if you can solve the problem of ultimate malleability of software, you no longer need to worry about up-front planning of "ultimate" classes. We still need a balance of both, but I believe the future is in the former.
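For what it's worth, in Squeak the two classes can simply live in their own class categories, in their own images, and never know about each other. A hypothetical sketch (the instance variables here are invented purely for illustration):

```smalltalk
"In the CRM image:"
Object subclass: #Customer
	instanceVariableNames: 'name phone callHistory'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'CRM-Model'.

"In the medical billing image (a different image, so the
identical class name never collides with the one above):"
Object subclass: #Customer
	instanceVariableNames: 'name insuranceId openBills'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'MedBilling-Model'.
```

Since each image only ever loads one of the two packages, neither class constrains the other's design.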
That's my view, anyhow.
- Chris
Chris,
That's a very interesting perspective. The thought crossed my mind, but then I thought of another "issue" which is what prompted me to ask the question to the list.
Originally I thought I could simply run an image for each of these specialized applications. For example, one could be a Customer Relationship Management system, while the other is a Medical Billing system. Both systems have different views of what a Customer is, and it would make sense to run them in separate images.
However, given my limited hardware resources (mainly), I wasn't sure if I could efficiently run two Squeak images, each with Seaside listening on a different port, both front-ended by Apache on a single machine. I wasn't sure how two images would behave on the same machine in terms of performance, processes, memory, etc. Assuming these would run on a dual-Xeon machine with 2GB RAM, would that be feasible?
To some degree, I guess the answer will depend on the traffic load on the machine, but I would guesstimate that, for example, the CRM system would handle about 10,000 contacts a day (by contact I mean customer record inquiries, logged call attempts, etc.) and maybe about 500 medical bill records per day. Anyone with experience in these industries could guesstimate some sort of traffic load.
I don't know if I'm being too vague, or worrying too much about something that hasn't materialized yet. I just like to plan ahead.
Thanks, Daniel
On Jun 17, 2005, at 11:50 AM, Chris Muller wrote:
Ah well, it sounds like you *do* have a reason to commingle your Customers then. My answer assumed none, other than code-reuse benefits.
However, given my limited hardware resources (mainly), I wasn't sure if I could efficiently run two Squeak images, each with Seaside listening on a different port, both front-ended by Apache on a single machine. I wasn't sure how two images would behave on the same machine in terms of performance, processes, memory, etc. Assuming these would run on a dual-Xeon machine with 2GB RAM, would that be feasible?
That's a valid question. Given that your end goal is good performance for users of both applications, an equally valid question is: will one single-threaded Squeak VM efficiently run both applications?
My guess is a dual-Xeon machine with 2GB RAM will not run a single Squeak image any faster than a single-Xeon machine.
For analogy, at this very second, as I'm typing, I have five running images on my 768MB 1.3GHz IBM laptop. One of them is listening on two different ports, and two others are listening on one port each. A fourth is not listening but is sending requests to the first three; millions of requests are flying back and forth like mad. Windows Task Manager has shown 100% CPU usage for the last hour. Meanwhile, I'm typing this e-mail in Mozilla and don't notice a speck of delay between keystrokes or any other sign of stress. The same is true in the fifth Squeak image, which is not participating in the TCP/IP party. That's because each image runs on its own OS thread.
However, if I try to type in one of the other images that *is* sending/receiving requests, I notice little pauses between my keystrokes. This occurs because each VM runs in but a single thread, and when that thread is handling a request, my keystrokes and/or the UI updates are blocked.
The point is, requests processed by one app will most likely temporarily block requests for the other app when both are running in the same image. Something worth considering in your planning.
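A tiny workspace sketch of that behavior (assuming both handlers run at the same priority, as Seaside request handling within one image would):

```smalltalk
"Within one image, same-priority Squeak Processes are not preempted
by one another; the second block gets no cycles until the first
finishes (or explicitly yields/waits)."
[1 to: 50 do: [:i | 1000 factorial]] fork.      "a long 'request' for app 1"
[Transcript show: 'app 2 got a turn'; cr] fork. "app 2's request waits"
```

Run the two images as separate OS processes, and the operating system preempts them for you; run both apps in one image, and you're at the mercy of the busiest request.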
Of course, if you're restricted on port #'s, that could present its own challenge..
- Chris
Chris,
On Jun 17, 2005, at 4:03 PM, Chris Muller wrote:
Ah well, it sounds like you *do* have a reason to commingle your Customers then. My answer assumed none, other than code-reuse benefits.
However, given my limited hardware resources (mainly), I wasn't sure if I could efficiently run two Squeak images, each with Seaside listening on a different port, both front-ended by Apache on a single machine. I wasn't sure how two images would behave on the same machine in terms of performance, processes, memory, etc. Assuming these would run on a dual-Xeon machine with 2GB RAM, would that be feasible?
That's a valid question. Given that your end goal is good performance for users of both applications, an equally valid question is: will one single-threaded Squeak VM efficiently run both applications?
Eventually, I'll be able to properly architect the system across multiple machines, with Apache, storage (MySQL and/or GOODS), Squeak, etc. each on its own box. It's just that for now, all I have is this single machine.
My guess is a dual-Xeon machine with 2GB RAM will not run a single Squeak image any faster than a single-Xeon machine.
I don't know too much about the behind-the-scenes of Squeak, so I don't know how its multi-process/multi-threaded capabilities coexist with the support provided by the host OS.
For analogy, at this very second, as I'm typing, I have five running images on my 768MB 1.3GHz IBM laptop. One of them is listening on two different ports, and two others are listening on one port each. A fourth is not listening but is sending requests to the first three; millions of requests are flying back and forth like mad. Windows Task Manager has shown 100% CPU usage for the last hour. Meanwhile, I'm typing this e-mail in Mozilla and don't notice a speck of delay between keystrokes or any other sign of stress. The same is true in the fifth Squeak image, which is not participating in the TCP/IP party. That's because each image runs on its own OS thread.
That's pretty good. I think that if my apps perform similarly, I should be fine for some time.
However, if I try to type in one of the other images that *is* sending/receiving requests, I notice little pauses between my keystrokes. This occurs because each VM runs in but a single thread, and when that thread is handling a request, my keystrokes and/or the UI updates are blocked.
The point is, requests processed by one app will most likely temporarily block requests for the other app, when both are running in the same image. Something worth considering for your planning..
I've noticed that too, when I've taxed a Squeak image running bulk data transfers through any of the Squeak OODB interfaces (I haven't noticed the same when doing similar bulk transfers against MySQL or PostgreSQL).
Of course, if you're restricted on port #'s, that could present its own challenge..
I guess I could always start Seaside (WAKom) listening on different ports and have Apache redirect appropriately.
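Concretely, that would look something like this in each image (the port numbers are just examples):

```smalltalk
"In the CRM image:"
WAKom startOn: 8081.

"In the medical billing image:"
WAKom startOn: 8082.
```

On the Apache side, hypothetically, a pair of mod_proxy directives would map a URL path to each image, e.g. `ProxyPass /crm http://localhost:8081/seaside/crm` and `ProxyPass /billing http://localhost:8082/seaside/billing`.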
- Chris
Thanks, Daniel
Hey Daniel,
I haven't done a site yet with Seaside, but I've used Smalltalk for public corporate web tier stuff before (insurance). I would rather have several less powerful machines for each application than one big machine. Of course, I had a paying sponsor who agreed with me that hardware is cheap whereas failures and programmers are expensive.
I'd prefer at least two machines per application. A failure in one machine/image then wouldn't threaten the running application from the end-user perspective, nor would it threaten the other application. I'd put Apache on a separate machine too, and the same for the backing store, whether Magma (hi, Chris), GOODS, or an RDBMS. Making it so your app can run on several machines makes it easier to add machines to support load.
Of course, you need enough memory to handle everything. 2GB might be a lot, or it might not be enough, depending on what each of those 10,000 or 500 records is doing. Looking at just the 10,000: there is probably an uneven distribution (different customers will have different usage patterns). Maybe 60% of that traffic wants to come in during your peak hour, so 6,000 per hour at, say, 1:00 P.M. local time. If sessions time out at 20 minutes...
Anyway, I would expect that the CRM and Medical systems would have very different usage characteristics and load patterns. The real answer is to test and find out. Luckily, Squeak has an HTTP client whose guts you can script.
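For example, a crude load-test sketch using Squeak's built-in HTTPSocket (the URL and request count are placeholders):

```smalltalk
"Fire n GET requests at the app and report total elapsed time."
| n ms |
n := 100.
ms := Time millisecondsToRun:
	[n timesRepeat:
		[HTTPSocket httpGet: 'http://localhost:8081/seaside/crm']].
Transcript show: n printString, ' requests in ', ms printString, ' ms'; cr.
```

A real test would want concurrent forked processes and realistic request mixes, but even a serial loop like this exposes gross throughput problems quickly.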
--David
David,
On Jun 17, 2005, at 4:32 PM, David Mitchell wrote:
Hey Daniel,
I haven't done a site yet with Seaside, but I've used Smalltalk for public corporate web tier stuff before (insurance). I would rather have several less powerful machines for each application than one big machine. Of course, I had a paying sponsor who agreed with me that hardware is cheap whereas failures and programmers are expensive.
I'd prefer at least two machines per application. A failure in one machine/image then wouldn't threaten the running application from the end-user perspective, nor would it threaten the other application. I'd put Apache on a separate machine too, and the same for the backing store, whether Magma (hi, Chris), GOODS, or an RDBMS. Making it so your app can run on several machines makes it easier to add machines to support load.
That makes 100% sense, and that's how I would do it as well. However, given that I don't have a sponsor, I have to deliver first and then get a sponsor.
Of course, you need enough memory to handle everything. 2GB might be a lot, or it might not be enough, depending on what each of those 10,000 or 500 records is doing. Looking at just the 10,000: there is probably an uneven distribution (different customers will have different usage patterns). Maybe 60% of that traffic wants to come in during your peak hour, so 6,000 per hour at, say, 1:00 P.M. local time. If sessions time out at 20 minutes...
That's an interesting issue. I would need to carefully monitor memory usage. Are there any docs that can help me incorporate some memory usage analysis of Squeak on a per-application basis?
Anyway, I would expect that the CRM and Medical systems would have very different usage characteristics and load patterns. The real answer is to test and find out. Luckily, Squeak has an HTTP client whose guts you can script.
That's true. I didn't think about using an HTTP client, but I guess that would be a good way to design and implement some test scripting.
--David
Thanks, Daniel
Of course, you need enough memory to handle everything. 2GB might be a lot, or it might not be enough, depending on what each of those 10,000 or 500 records is doing. Looking at just the 10,000: there is probably an uneven distribution (different customers will have different usage patterns). Maybe 60% of that traffic wants to come in during your peak hour, so 6,000 per hour at, say, 1:00 P.M. local time. If sessions time out at 20 minutes...
That's an interesting issue. I would need to carefully monitor memory usage. Are there any docs that can help me incorporate some memory usage analysis of Squeak on a per-application basis?
I haven't done anything serious in Squeak, but I googled "squeak memory usage" and came across an interesting site that seems to catalog some threads from this list. Found this:
http://www.visoracle.com/squeakfaq/image-size.html
under the category Squeak Smalltalk : Tools Tricks Usage : Image Size Space Tally Print Analysis.
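The "Space Tally Print Analysis" it mentions refers to Squeak's SpaceTally class; from a workspace you can get a per-class breakdown of instance counts and space used, which is about as close to per-application memory accounting as a stock image gets:

```smalltalk
"Tallies every class in the image (instance counts and bytes) and
writes the results to a text file in the image directory; the exact
output file name varies by Squeak version. It can take a while."
SpaceTally new printSpaceAnalysis.
```

Grouping the resulting per-class figures by your application's class categories would give a rough per-application memory picture.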
--David
Look for the thread "Garbage Collection in Squeak" starting Oct 29th, 2004 (I'll attach the original note).
The changeset allows you, with a current VM, to monitor Squeak's memory usage and what the garbage collector is doing. Although it is not designed to work at the class level, it will show over time how the memory footprint of the VM changes.
The changeset and Macintosh VM can be found at http://homepage.mac.com/johnmci (look in the GC work directory). To collect information about how the GC is working:
a) Take a test image and file in JMMGCMonitor.4.cs.
b) Run the altered image with the Mac Carbon VM Squeak 3.8.4Beta1.app or higher.
c) In a workspace, do:
GCMonitor run.
"do your test here"
GCMonitor stop. "When you stop your testing"
This will produce a file, gcStats.txt, which contains statistical data collected from the GC logic every 100 milliseconds. Please email the file to johnmci@mac.com with a bit of explanation of what you tested, and please zip the file to reduce its size.
Now, to test with some active tuning, please re-run your test but do:
GCMonitor runActive.
"do your test here"
GCMonitor stop. "When you stop your testing"
I will note that you must save a copy of gcStats.txt before you do the runActive, because it will overwrite any previous results, such as the earlier results of doing "run". Please email that file too, noting that it was the result of runActive, along with any observations about pauses or odd behavior.
As some of you know, I from time to time do serious GC tuning in VisualWorks: http://www.smalltalkconsulting.com/papers/GCPaper/GCTalk%202001.htm
For years now (yes, years) I've been thinking about doing serious tuning in Squeak, but things just weren't interesting enough. However, with the release of Croquet, thoughts on TK4, and mumblings from folks, I turned my eye upon what the Squeak GC was doing. First, thanks to Ted Kaehler (among others) for writing this GC ten-odd years back; we've run it with little change over the years, until we added the ability to shrink/grow the memory space a few years back.
First, I'll need some help from Macintosh users who have interesting images and can run a pending 3.8.4 Mac VM with instrumentation, then send me a diagnostic log. People who are willing to help should email me directly; I'll get you a VM and changeset, and we'll see if we can fill up my gmail account with log data. I'm very interested in getting Croquet data; having a test case you can run before and after would be best.
So let's review how the Squeak GC works:
A simplistic view: your image (say 25MB) is loaded into Old Space, and 4MB (set in Smalltalk) is allocated for Young Space. After allocating 4000 new objects (limit set in Smalltalk), if they fit, we do a mark/sweep/compacting GC on young space, using the VM roots plus roots identified from Old Space to Young Space references. Normally this means looking at only a few thousand objects; occasionally we might do a full GC across all 300K objects, based on various conditions, but that's rare.
I'll note that the Old Space to Young Space remember table remembers the object containing the reference, not the reference itself, so if you have a 100,000-element collection we iterate over all 100,000 entries looking for the one or more old-to-young references, a cause for performance concern.
If, after completing the mark/sweep/compaction, over 2000 objects (set in Smalltalk) are survivors, we "tenure" them to old space by moving the old/young pointer boundary, then start allocating objects again; mind you, if young space then exceeds 8MB free (set in Smalltalk), we give memory back and reset things to the 4MB boundary.
Object allocation can also end early because we are allocating a big object; in that case we do a young GC, followed by a full GC (very expensive), followed by growing young space so the big object can fit. And of course, if we run out of root table remember space, we end the allocation process early and think about tenuring things.
Or it can end if we drop below a minimum amount of free memory (200K, set by Smalltalk), which makes the image grow; if that fails, it signals the low space semaphore (and brings up a low space dialog which no one really sees, and which people have bitterly complained about). Yes, I now know why that occurs too...
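For reference, the two collection flavors described above can be invoked by hand from any workspace (these selectors exist in stock Squeak images of this vintage):

```smalltalk
"Incremental collection of young space only (the frequent, cheap one)."
Smalltalk garbageCollectMost.

"Full mark/sweep/compact across the whole image (the rare, expensive
one); answers the number of bytes available afterward."
Smalltalk garbageCollect.
```

Comparing the bytes-free answer before and after a test run is a quick, if crude, way to watch the image's footprint move.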
The Problem:
Last weekend I built a new VM which has instrumentation to describe exactly what the GC is doing, to trigger a semaphore when a GC finishes, and to let you poke at more interesting things that control GC activity.
What I found was an issue we hadn't realized was there; well, I'm sure people have seen it, but didn't know why... What happens is that as we tenure objects, we decrease young space from 4MB toward zero.
As the table below shows, if conditions are right (a couple of cases in the macrobenchmarks), the number of objects we can allocate decreases toward zero, and we actually stop tenuring once the survivors fall below 2000. The rate of young space GC activity goes from, say, 8 per second toward 1000 per second; mind you, on fast machines the young space millisecond accumulation count doesn't move much, because the time taken for each collection is under 1 millisecond, or 0, skewing those statistics and hiding the GC time.
AllocationCount  Survivors
4000             5400
3209             3459
2269             2790
1760             1574
1592             2299
1105             1662
427              2355
392              2374
123              1472
89               1478
79               2
78               2
76               2
76               2
Note how we allocate 76 objects, do a young space GC, then have two survivors; finally we reach the 200K minimum GC threshold and do a full GC followed by growing young space. This process is very painful. It's also why the low space dialog doesn't appear in a timely manner: as we approach the 200K limit, we try really hard, by doing thousands of young space GCs, to avoid going over it. If conditions are right, we get close, but not close enough...
What will change in the future:
a) A (new) GC monitoring class will look at mark/sweep/root-table counts and decide when to do a tenure operation, if iterating over the root table objects takes too many iterations. A better solution would be to remember old objects and which slot has the young reference, but that is harder to do.
b) A VM change will check, after a tenure, whether young space is less than 4MB; if so, it will grow young space to greater than 4MB plus a calculated slack. Then, after we've tenured N MB, we will do a full GC, versus doing a full GC on every grow operation; this will trigger a shrink if required. For example, we'll tenure at 75% and be biased to grow to 16MB before doing a full GC.
c) To solve hitting the hard boundary when we cannot allocate more space, we need to rethink when the low space semaphore is signaled and the rate of young space GC activity; signaling the semaphore earlier would allow a user to take action before things grind to a halt. I'm not quite sure how to do that yet.
Some of this might be back-ported to earlier VMs; I think so, but I won't know until we gather more data and try a few things.
--
John M. McIntosh <johnmci@smalltalkconsulting.com> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
Daniel:
You might want to look into "subjectivity." Here's a link to get you started: http://www.laputan.org/reflection/subject94.html
--Alan
-----Original Message-----
From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On Behalf Of Daniel Salama
Sent: Friday, June 17, 2005 10:26 AM
To: chris@funkyobjects.org; The general-purpose Squeak developers list
Subject: Re: Grasping the concept of Classes & Categories