SocketPlugin issues (was: Re: [Vm-dev] Win32 beta test)

David T. Lewis lewis at mail.msen.com
Thu Sep 20 18:11:35 UTC 2012


We have an upcoming image release, so I think that we should consider
de-activating the new network support for that release. I think that
can be done by just setting UseOldNetwork to true. Right now it is
automatically set at image startup, depending on whether the IPv6
primitives are available in the image. We could turn it back on in
trunk after the release, and deal with the issues then.
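
A minimal workspace sketch of that switch, assuming the class-side setter
#useOldNetwork: quoted further down in this thread is what controls the
UseOldNetwork setting:

```smalltalk
"Route name lookups through the old network code for the release."
NetNameResolver useOldNetwork: true.

"Later, in trunk, turn the new IPv6-aware code back on."
NetNameResolver useOldNetwork: false.
```

Because the setting is currently recomputed at image startup, evaluating this
once would presumably not survive a restart. The release image would likely
also need the startup logic changed so that it does not re-enable the new code
when the IPv6 primitives are present.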

It would not be good to release with network issues that might affect
the entire Windows user base.

Dave


On Thu, Sep 20, 2012 at 07:50:42PM +0200, Levente Uzonyi wrote:
> On Wed, 19 Sep 2012, David T. Lewis wrote:
> 
> >There seem to be some issues with the new networking code (in Squeak,
> >for IPv6 support) when running on a Windows VM that has the IPv6
> >primitives in the SocketPlugin.
> >
> >I booted up Windows and tried Levente's deadlock test:
> >
> >>>I could reproduce a deadlock-like state by evaluating:
> >>>
> >>>NetNameResolver addressesForName: 'amazon.com'
> >>>
> >
> >I tried this both with Ian's beta interpreter VM, and one of Eliot's
> >recent Cog VMs. Both VMs have the IPv6 primitives now, and both of
> >them show similar issues. I did not see actual deadlocks, but what
> >I did see was extremely long primitive calls that make the image feel
> >like it is deadlocked. The #primitiveResolverGetNameInfo call is a
> >source of problems, and there may be others.
> >
> >It looks to me like some of the new primitives are invoking some very
> >slow system functions on Windows, and the Squeak network code updates
> >cause these primitives to be called if they are available in the VM,
> >so the newer VMs are having problems. I have not seen these issues
> >on Linux, so it may reflect differences in the networking support
> >for different operating systems.
> >
> >Levente, is this consistent with what you were seeing?
> 
> It is, though I didn't check which primitive takes too long. I checked the 
> implementation of #primitiveResolverGetNameInfo now on win32 and it seems 
> to be okay, pretty much the same as on unix/linux. I see no reason why it 
> would take so long to respond.
> 
> In the meantime I found another thing I don't like about the new 
> primitives. The timeout for the name lookup is ignored, so now we have to 
> wait until the primitive returns.
> 
> Actually, I don't like the way the name lookup is implemented. I think 
> it might be worth moving the DNS lookup to Squeak. The only thing the VM 
> needs to provide then is a primitive that returns the IP addresses 
> of the nameservers to be used (though the system can work without that 
> too). Here are the pros and cons I came up with so far:
> Pros:
> - the code is in Smalltalk, so it's platform independent
> - concurrent name lookups become possible
> - no more long waits on the VM side
> Cons:
> - extra complexity, since a DNS client has to be implemented
> - the OS's DNS cache can't be used
> 
> 
> Levente
> 
> >
> >Dave
> >
> >On Mon, Sep 17, 2012 at 03:37:45PM -0400, David T. Lewis wrote:
> >>Thanks Levente, this is very helpful.
> >>
> >>It sounds like these are problems in the new network code on the image
> >>side (and I'm responsible for causing that). The updated VM is probably
> >>different only in that it provides the IPv6 primitives, which in turn
> >>expose the bugs on the image side. So I expect that the problems you
> >>describe should also happen with a unix interpreter VM (I'll check
> >>and find out as soon as I can).
> >>
> >>I note for the record that Andreas is fully entitled to say "I told
> >>you so!" at this point ;)
> >>
> >>To the extent that these are Squeak image problems, evaluating
> >>"NetNameResolver useOldNetwork: true" should make the symptoms
> >>go away.
> >>
> >>Other comments in line below.
> >>
> >>On Mon, Sep 17, 2012 at 07:01:47PM +0200, Levente Uzonyi wrote:
> >>>
> >>>On Mon, 17 Sep 2012, David T. Lewis wrote:
> >>>
> >>>>Levente,
> >>>>
> >>>>Can you say anything more about what weakness you found in the
> >>>>network code?
> >>>
> >>>All issues I found are related to name lookup. To do a name lookup the
> >>>new code requires multiple primitive calls (see SocketAddressInformation >>
> >>>#forHost:service:flags:addressFamily:socketType:protocol:). The plugin
> >>>uses static variables to store the result of the name lookup
> >>>(hostNameInfo, servNameInfo and nameInfoValid). This means that only one
> >>>name can be looked up at a time.
> >>
> >>I noticed that, and attempted to provide some protection for it with
> >>a semaphore (ResolverMutex) in class NetNameResolver. Apparently I did
> >>not do a very good job of it though.
> >>
> >>>
> >>>The image side code doesn't prevent simultaneous access to these static
> >>>variables, so they can get into an unexpected state (see SocketAddress >>
> >>>#hostName).
> >>>
> >>>Another issue is that the plugin doesn't allocate objects (strings), so two
> >>>primitive calls are needed to fetch a string (see SocketAddress >>
> >>>#hostName again). One requests the size of the string, the other copies
> >>>the data into a string it receives as an argument.
> >>>
> >>>I could reproduce a deadlock-like state by evaluating:
> >>>
> >>>NetNameResolver addressesForName: 'amazon.com'
> >>>
> >>>It's sometimes possible to interrupt the process to get a debugger, but
> >>>since the primitives are called by the debugger too (see SocketAddress >>
> >>>#printOn:), the image will hang if you try to use it.
> >>>
> >>
> >>That seems likely to be a problem related to the semaphore in 
> >>NetNameResolver.
> >>
> >>If this turns out to be a problem with the network support in Squeak
> >>trunk, we can take the discussion back to squeak-dev for resolution.
> >>
> >>Dave
> >

