[squeak-dev] Issues in the new network support (was: SocketPlugin issues (was: Re: [Vm-dev] Win32 beta test))

David T. Lewis lewis at mail.msen.com
Thu Sep 20 18:16:44 UTC 2012


Moving the discussion to squeak-dev. Follow-up related to the Squeak image
should be on squeak-dev; discussion of the primitives can stay on vm-dev.

Dave

On Thu, Sep 20, 2012 at 02:11:35PM -0400, David T. Lewis wrote:
>  
> We have an upcoming image release, so I think that we should consider
> de-activating the new network support for that release. I think that
> can be done by just setting UseOldNetwork to true. Right now it is
> automatically set at image startup, depending on whether the IPv6
> primitives are available in the image. We could turn it back on in
> trunk after the release, and deal with the issues then.
> 
> It would not be good to release with network issues that might affect
> the entire Windows user base.
> 
> Dave
> 
> 
> On Thu, Sep 20, 2012 at 07:50:42PM +0200, Levente Uzonyi wrote:
> > On Wed, 19 Sep 2012, David T. Lewis wrote:
> > 
> > >There seem to be some issues with the new networking code (in Squeak,
> > >for IPv6 support) when running on a Windows VM that has the IPv6
> > >primitives in the SocketPlugin.
> > >
> > >I booted up Windows and tried Levente's deadlock test:
> > >
> > >>>I could reproduce a deadlock-like state by evaluating:
> > >>>
> > >>>NetNameResolver addressesForName: 'amazon.com'
> > >>>
> > >
> > >I tried this both with Ian's beta interpreter VM, and one of Eliot's
> > >recent Cog VMs. Both VMs have the IPv6 primitives now, and both of
> > >them show similar issues. I did not see actual deadlocks, but what
> > >I did see was extremely long primitive calls that make the image feel
> > >like it is deadlocked. The #primitiveResolverGetNameInfo call is a
> > >source of problems, and there may be others.
> > >
> > >It looks to me like some of the new primitives are invoking some very
> > >slow system functions on Windows, and the Squeak network code updates
> > >cause these primitives to be called if they are available in the VM,
> > >so the newer VMs are having problems. I have not seen these issues
> > >on Linux, so it may reflect differences in the networking support
> > >for different operating systems.
> > >
> > >Levente, is this consistent with what you were seeing?
> > 
> > It is, though I didn't check which primitive takes too long. I checked the 
> > implementation of #primitiveResolverGetNameInfo now on win32 and it seems 
> > to be okay, pretty much the same as on unix/linux. I see no reason why it 
> > would take so long to respond.
> > 
> > In the meantime I found another thing I don't like about the new 
> > primitives. The timeout for the name lookup is ignored, so now we have to
> > wait until the primitive returns.
> > 
> > Actually I don't like the way the name lookup is implemented. I think
> > it might be worth moving the DNS lookup to Squeak. The only thing the VM
> > needs to provide then is a primitive which returns the IP addresses 
> > of the nameservers to be used (though the system can work without that 
> > too). Here are the pros and cons I came up with so far:
> > Pros:
> > - the code is in Smalltalk, so it's platform independent
> > - concurrent name lookups become possible
> > - no more long waits on the VM side
> > Cons:
> > - extra complexity, since a DNS client has to be implemented
> > - the OS's DNS cache can't be used
> > 
> > 
> > Levente
> > 
> > >
> > >Dave
> > >
> > >On Mon, Sep 17, 2012 at 03:37:45PM -0400, David T. Lewis wrote:
> > >>Thanks Levente, this is very helpful.
> > >>
> > >>It sounds like these are problems in the new network code on the image
> > >>side (and I'm responsible for causing that). The updated VM is probably
> > >>different only in that it provides the IPv6 primitives, which in turn
> > >>expose the bugs on the image side. So I expect that the problems you
> > >>describe should also happen with a unix interpreter VM (I'll check
> > >>and find out as soon as I can).
> > >>
> > >>I note for the record that Andreas is fully entitled to say "I told
> > >>you so!" at this point ;)
> > >>
> > >>To the extent that these are Squeak image problems, evaluating
> > >>"NetNameResolver useOldNetwork: true" should make the symptoms
> > >>go away.
> > >>
> > >>Other comments in line below.
> > >>
> > >>On Mon, Sep 17, 2012 at 07:01:47PM +0200, Levente Uzonyi wrote:
> > >>>
> > >>>On Mon, 17 Sep 2012, David T. Lewis wrote:
> > >>>
> > >>>>Levente,
> > >>>>
> > >>>>Can you say anything more about what weakness you found in the
> > >>>>network code?
> > >>>
> > >>>All issues I found are related to name lookup. To do a name lookup the
> > >>>new code requires multiple primitive calls (see SocketAddressInformation >>
> > >>>#forHost:service:flags:addressFamily:socketType:protocol:). The plugin
> > >>>uses static variables to store the result of the name lookup
> > >>>(hostNameInfo, servNameInfo and nameInfoValid). This means that only one
> > >>>name can be looked up at a time.
> > >>
> > >>I noticed that, and attempted to provide some protection for it with
> > >>a semaphore (ResolverMutex) in class NetNameResolver. Apparently I did
> > >>not do a very good job of it though.
> > >>
> > >>>
> > >>>The image side code doesn't prevent simultaneous access to these static
> > >>>variables, so they can get into an unexpected state (see SocketAddress >>
> > >>>#hostName).
> > >>>
> > >>>Another issue is that the plugin doesn't allocate objects (strings), so 2
> > >>>primitive calls have to be done to fetch a string (see SocketAddress >>
> > >>>#hostName again). One requests the size of the string, the other copies
> > >>>the data to a string it receives as argument.
> > >>>
> > >>>I could reproduce a deadlock-like state by evaluating:
> > >>>
> > >>>NetNameResolver addressesForName: 'amazon.com'
> > >>>
> > >>>It's sometimes possible to interrupt the process to get a debugger, but
> > >>>since the primitives are called by the debugger too (see SocketAddress >>
> > >>>#printOn:), the image will hang if you try to use it.
> > >>>
> > >>
> > >>That seems likely to be a problem related to the semaphore in 
> > >>NetNameResolver.
> > >>
> > >>If this turns out to be a problem with the network support in Squeak
> > >>trunk, we can take the discussion back to squeak-dev for resolution.
> > >>
> > >>Dave
> > >
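
For illustration only, here is a minimal Python model of the issue Levente
describes in the quoted text: the plugin keeps a single lookup result in
static variables, and fetching a string takes two calls (one for the size,
one to copy the data), so the whole sequence has to run under one lock.
This is a sketch under stated assumptions, not the SocketPlugin or
NetNameResolver code. FakeResolverPlugin, GuardedResolver and every method
name below are hypothetical, and no real DNS lookup is performed.

```python
import threading


class FakeResolverPlugin:
    """Hypothetical stand-in for a plugin that holds ONE result in shared state."""

    def __init__(self):
        # Plays the role of a static variable such as hostNameInfo.
        self._host_name = None

    def start_lookup(self, address):
        # Assumption: a real resolver would query the OS here.
        self._host_name = "host-for-" + address

    def host_name_size(self):
        # First call of the two-call protocol: report the string size.
        return len(self._host_name)

    def copy_host_name_into(self, buffer):
        # Second call: copy the data into a buffer supplied by the caller.
        data = self._host_name.encode("ascii")
        if len(buffer) != len(data):
            # The shared state changed between the two calls.
            raise ValueError("lookup result changed between calls")
        buffer[:] = data


class GuardedResolver:
    """Serializes the whole lookup sequence, not just the individual calls."""

    def __init__(self, plugin):
        self._plugin = plugin
        self._lock = threading.Lock()

    def host_name_for(self, address):
        with self._lock:
            self._plugin.start_lookup(address)
            size = self._plugin.host_name_size()
            buffer = bytearray(size)
            self._plugin.copy_host_name_into(buffer)
        return buffer.decode("ascii")
```

The point of the sketch is that the lock has to cover the lookup, the size
request and the copy together. Any caller that reaches the shared state
without taking the same lock (the debugger printing an address, for example)
can still see a result that belongs to someone else's lookup.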

