[Box-Admins] [Vm-dev] Could someone add a VMMaker Inbox project to source.squeak.org?
David T. Lewis
lewis at mail.msen.com
Wed Oct 3 01:00:49 UTC 2018
On Tue, Oct 02, 2018 at 04:22:36PM -0700, Bert Freudenberg wrote:
> [Let's move that part of the discussion here]
> Our virtual server might be significantly slower than your machine. But it
> still seems suspiciously slow, agreed.
I strongly suspect, but cannot prove, that the timeouts and slowdowns are
associated with the source.squeak.org image serializing its repository
to a file on disk (ss/data.obj) whenever something important has changed
in the repository.
This could be confirmed by capturing or logging the time spent on each
serialization. I do not feel comfortable doing this myself, but maybe
Chris can add something to the image to confirm.
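As a rough illustration of the kind of measurement meant here, the
following Python sketch times one full serialization of an in-memory model
to a file. It is an analogy only, not the SqueakSource code: the function
name timed_save, the use of pickle, and the stand-in model are all
assumptions made for the example; only the file name data.obj comes from
the discussion above.

```python
import os
import pickle
import tempfile
import time


def timed_save(model, path):
    """Serialize model to path and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # Write to a temporary file first, then rename it into place, so a
    # crash in the middle of the write does not leave a truncated file.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(model, f)
    os.replace(tmp_path, path)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for the repository model.
    model = {
        "package-%d" % i: ["version-%d" % j for j in range(50)]
        for i in range(2000)
    }
    with tempfile.TemporaryDirectory() as d:
        elapsed = timed_save(model, os.path.join(d, "data.obj"))
        print("serialization took %.3f s" % elapsed)
```

Logging that number on every save, together with a timestamp, would show
whether the slow requests line up with the saves.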
I do not believe that Magma as the backing store is the cause of the problem,
although it would be good to measure and confirm this also.
Assuming that serializing to ss/data.obj is the issue, I can see two ways
to deal with it in the near term:
1) Don't do that. The squeaksource.com image has not used that approach
for many years, since long before I started maintaining it on squeak.org.
Instead it just does an hourly image save, which is fast enough and does
a good job of persistence for purposes of recovery from failures.
2) If serializing to ss/data.obj is important, consider doing it in
a #forkSqueak image so it does not block the image. This would require
some development work and careful testing but can probably be done fairly
easily if the ss/data.obj save is considered important.
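To make option 2 more concrete, here is a minimal Python sketch of the
general idea: fork the process, let the child write out its snapshot of
the model, and let the parent carry on serving requests. This is only an
analogy to #forkSqueak, not its API. The names save_in_child and
wait_for_save are hypothetical, pickle stands in for the image's own
serializer, and os.fork is available on Unix only.

```python
import os
import pickle


def save_in_child(model, path):
    """Fork; the child serializes its snapshot of model to path and exits.

    Returns the child's pid so the caller can reap it later. Unix only.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: it sees the model as it was at the moment of the
        # fork, so later changes in the parent do not affect the save.
        status = 1
        try:
            tmp_path = path + ".tmp"
            with open(tmp_path, "wb") as f:
                pickle.dump(model, f)
            os.replace(tmp_path, path)
            status = 0
        finally:
            # Exit immediately without running the parent's cleanup handlers.
            os._exit(status)
    return pid


def wait_for_save(pid):
    """Block until the child finishes; return True if it exited with 0."""
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```

The parent has to reap the child at some point (here via wait_for_save),
and should avoid starting a second save while one is still running.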
I was not around when the early squeaksource.com supporters decided to
stop using object serialization for persistence. But the early repository
was probably very small, and my guess is that the decision may have had
something to do with performance as the size of the squeaksource repository
grew. Since that time, computers are faster and the VM is faster, but
serializing the repository is still slow. So .... don't do that.
> - Bert -
> On Tue, Oct 2, 2018 at 3:54 PM Levente Uzonyi <leves at caesar.elte.hu> wrote:
> > On Tue, 2 Oct 2018, Chris Muller wrote:
> > >
> > > There are a number of factors that combine to lead to occasional
> > > timeouts depending on the package.
> > >
> > > In this case, VMMaker is a huge package, and it takes a long time for
> > > the server to open up the .mcz and materialize its contents. That's
> > > why, for example, the diffing operation, which doesn't use Magma at
> > > all, times out.
> > I measured diff creation on my machine using various VMMaker versions:
> > - creating a diff of 2 versions takes less than 2 seconds
> > - of these 2 seconds, the actual diff creation takes about 150ms, which
> > can easily be halved with some basic optimizations
> > - more than a second is spent by DataStream reading and mangling strings
> > for no real reason. We should be able to significantly reduce that as
> > well.
> > - the remaining few hundred milliseconds is spent on parsing the ancestry,
> > which can only be solved by using a new non-recursive ancestry format
> > which doesn't require you to parse everything to read a single entry.
> > But 2 seconds is not enough to get a 504 response, so there must be some
> > other reason for those timeouts.
> > Levente
> > >
> > > When a new version of VMMaker is saved, once again, it must be opened
> > > up by the server and then, all of its MCDefinitions are enumerated and
> > > indexed into huge Dictionaries stored by Magma. This takes a long
> > > time and even though it occurs in a background local Smalltalk
> > > Process, it can affect local image performance.
> > >
> > > Another cause is the many thousands of files for which the server
> > > must constantly refresh its directory cache, again and again, as it
> > > grows ever larger.
> > >
> > > Yet another issue is the dual persistence -- currently since we use
> > > both File-based AND Magma, we have to save a "data.obj" file, and so
> > > we're forced to keep the entire model in memory instead of taking
> > > advantage of Magma's ability to work efficiently with subgraphs of
> > > models. But this is the only way we can have the MC history and
> > > origin functions.
> > >
> > > As I run the exact same code repository on my laptops to manage my own
> > > code as runs source.squeak.org, after Squeak 5.2, I plan to refresh
> > > and test my own codebase, bring it up to 64-bit, latest Magma, and
> > > make some optimizations. I'll test it by running my own development
> > > with it for a while and then apply the same upgrades to
> > > source.squeak.org.
> > >
> > > - Chris
> > > On Tue, Oct 2, 2018 at 12:08 PM Levente Uzonyi <leves at caesar.elte.hu> wrote:
> > >>
> > >>
> > >> My impression is that these slowdowns have been present since the
> > >> update to the Magma backend.
> > >> I haven't seen the code yet, but I presume the code accessing data from
> > >> Magma is responsible for this, and someone familiar with Magma can
> > >> easily fix it.
> > >>
> > >> Levente