SocketPlugin issues (was: Re: [Vm-dev] Win32 beta test)

David T. Lewis lewis at mail.msen.com
Thu Sep 20 01:16:01 UTC 2012


There seem to be some issues with the new networking code (in Squeak,
for IPv6 support) when running on a Windows VM that has the IPv6
primitives in the SocketPlugin.

I booted up Windows and tried Levente's deadlock test:

> > I could reproduce a deadlock-like state by evaluating:
> > 
> > NetNameResolver addressesForName: 'amazon.com'
> > 

I tried this with both Ian's beta interpreter VM and one of Eliot's
recent Cog VMs. Both VMs have the IPv6 primitives now, and both of
them show similar issues. I did not see actual deadlocks, but I did
see extremely long-running primitive calls that make the image feel
as if it were deadlocked. The #primitiveResolverGetNameInfo call is a
source of problems, and there may be others.

It looks to me like some of the new primitives invoke very slow
system functions on Windows. The updated Squeak network code calls
these primitives whenever the VM provides them, so the newer VMs are
the ones having problems. I have not seen these issues on Linux, so
this may reflect differences in networking support between operating
systems.

Levente, is this consistent with what you were seeing?

Dave

On Mon, Sep 17, 2012 at 03:37:45PM -0400, David T. Lewis wrote:
> Thanks Levente, this is very helpful.
> 
> It sounds like these are problems in the new network code on the image
> side (and I'm responsible for causing that). The updated VM is probably
> different only in that it provides the IPv6 primitives, which in turn
> expose the bugs on the image side. So I expect that the problems you
> describe should also happen with a unix interpreter VM (I'll check
> and find out as soon as I can).
> 
> I note for the record that Andreas is fully entitled to say "I told
> you so!" at this point ;) 
> 
> To the extent that these are Squeak image problems, evaluating
> "NetNameResolver useOldNetwork: true" should make the symptoms
> go away.
> 
> Other comments in line below.
> 
> On Mon, Sep 17, 2012 at 07:01:47PM +0200, Levente Uzonyi wrote:
> > 
> > On Mon, 17 Sep 2012, David T. Lewis wrote:
> > 
> > >Levente,
> > >
> > >Can you say anything more about what weakness you found in the
> > >network code?
> > 
> > All issues I found are related to name lookup. To do a name lookup the new 
> > code requires multiple primitive calls (see SocketAddressInformation >> 
> > #forHost:service:flags:addressFamily:socketType:protocol:). The plugin 
> > uses static variables to store the result of the name lookup 
> > (hostNameInfo, servNameInfo and nameInfoValid). This means that only one 
> > name can be looked up at a time.
> 
> I noticed that, and attempted to provide some protection for it with
> a semaphore (ResolverMutex) in class NetNameResolver. Apparently I did
> not do a very good job of it though.
> 
> > 
> > The image side code doesn't prevent simultaneous access to these static
> > variables, so they can get into an unexpected state (see SocketAddress >>
> > #hostName).
> > 
> > Another issue is that the plugin doesn't allocate objects (strings), so two
> > primitive calls are needed to fetch a string (see SocketAddress >>
> > #hostName again). One requests the size of the string; the other copies
> > the data into a string it receives as an argument.
> > 
> > I could reproduce a deadlock-like state by evaluating:
> > 
> > NetNameResolver addressesForName: 'amazon.com'
> > 
> > It's sometimes possible to interrupt the process to get a debugger, but 
> > since the primitives are called by the debugger too (see SocketAddress >> 
> > #printOn:), the image will hang if you try to use it.
> > 
> 
> That seems likely to be a problem related to the semaphore in NetNameResolver.
> 
> If this turns out to be a problem with the network support in Squeak
> trunk, we can take the discussion back to squeak-dev for resolution.
> 
> Dave
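
The name-lookup hazard Levente describes above can be modeled in a few
lines. This is only a sketch, written in Python rather than as plugin or
image code. Every name in it (ResolverPluginModel, get_name_info,
host_size, copy_host_into, host_name) is made up for illustration; only
hostNameInfo and nameInfoValid come from the thread. It also assumes that
the copy step simply truncates to the caller's buffer, which may not match
what the real primitive does.

```python
import threading


class ResolverPluginModel:
    """Toy model of a plugin that keeps one lookup result in shared state."""

    def __init__(self):
        self._host_name_info = b""      # plays the role of the static hostNameInfo
        self._name_info_valid = False   # plays the role of nameInfoValid

    def get_name_info(self, address):
        # Pretend lookup; a real primitive would ask the operating system.
        self._host_name_info = ("host-of-" + address).encode("ascii")
        self._name_info_valid = True

    def host_size(self):
        if not self._name_info_valid:
            raise RuntimeError("no lookup result available")
        return len(self._host_name_info)

    def copy_host_into(self, buffer):
        # Copies into a caller-supplied buffer; the model allocates nothing.
        # Assumption: silently truncate to the smaller of the two sizes.
        n = min(len(buffer), len(self._host_name_info))
        buffer[:n] = self._host_name_info[:n]


def host_name(plugin, address, lock):
    """Run lookup, size query and copy as one critical section."""
    with lock:
        plugin.get_name_info(address)
        buffer = bytearray(plugin.host_size())
        plugin.copy_host_into(buffer)
    return buffer.decode("ascii")


# Unguarded: a second lookup slips in between A's size query and A's copy.
plugin = ResolverPluginModel()
plugin.get_name_info("10.0.0.1")            # caller A: lookup
buffer_a = bytearray(plugin.host_size())    # caller A: size query
plugin.get_name_info("192.168.100.200")     # caller B overwrites the shared slot
plugin.copy_host_into(buffer_a)             # caller A: copy, now of B's data
interleaved_result = buffer_a.decode("ascii")

# Guarded: each caller holds the lock for the whole three-call sequence.
lock = threading.Lock()
guarded_result = host_name(plugin, "10.0.0.1", lock)
```

In the unguarded sequence, caller A ends up with a truncated piece of
caller B's answer. The guarded version returns the right name because
nothing else can touch the shared slot until all three calls are done,
which is the kind of protection ResolverMutex is meant to provide on the
image side.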

