SocketPlugin issues (was: Re: [Vm-dev] Win32 beta test)

Levente Uzonyi leves at elte.hu
Thu Sep 20 17:50:42 UTC 2012


On Wed, 19 Sep 2012, David T. Lewis wrote:

> There seem to be some issues with the new networking code (in Squeak,
> for IPv6 support) when running on a Windows VM that has the IPv6
> primitives in the SocketPlugin.
>
> I booted up Windows and tried Levente's deadlock test:
>
>>> I could reproduce a deadlock-like state by evaluating:
>>>
>>> NetNameResolver addressesForName: 'amazon.com'
>>>
>
> I tried this both with Ian's beta interpreter VM, and one of Eliot's
> recent Cog VMs. Both VMs have the IPv6 primitives now, and both of
> them show similar issues. I did not see actual deadlocks, but what
> I did see was extremely long primitive calls that make the image feel
> like it is deadlocked. The #primitiveResolverGetNameInfo call is a
> source of problems, and there may be others.
>
> It looks to me like some of the new primitives are invoking some very
> slow system functions on Windows, and the Squeak network code updates
> cause these primitives to be called if they are available in the VM,
> so the newer VMs are having problems. I have not seen these issues
> on Linux, so it may reflect differences in the networking support
> for different operating systems.
>
> Levente, is this consistent with what you were seeing?

It is, though I didn't check which primitive takes too long. I have now
looked at the win32 implementation of #primitiveResolverGetNameInfo and it
seems to be okay, pretty much the same as on unix/linux. I see no reason
why it would take so long to respond.

In the meantime I found another thing I don't like about the new
primitives. The timeout for the name lookup is ignored, so now we have to
wait until the primitive returns.
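
To illustrate what honoring a timeout means for the caller, here is a minimal sketch in Python (not Squeak or SocketPlugin code). The names lookup_with_timeout and slow_resolver are hypothetical, and slow_resolver only sleeps to stand in for a blocking resolver call. It assumes the blocking call can run in a separate thread, which is not the case for a primitive that blocks the whole VM.

```python
import concurrent.futures
import time


def lookup_with_timeout(resolve, name, timeout_s):
    """Run resolve(name) in a worker thread.

    Returns the result, or None if no result arrived within timeout_s.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(resolve, name)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The worker keeps running in the background;
        # only the caller stops waiting for it.
        return None
    finally:
        pool.shutdown(wait=False)


def slow_resolver(name):
    # Hypothetical stand-in for a blocking lookup: sleeps instead of doing DNS.
    time.sleep(0.3)
    return ["192.0.2.1"]
```

Calling lookup_with_timeout(slow_resolver, 'amazon.com', 0.05) gives up after about 50 ms instead of waiting for the resolver to finish.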

Actually, I don't like how the name lookup is implemented. I think it
might be worth moving the DNS lookup to Squeak. The only thing the VM
would then need to provide is a primitive that returns the IP addresses
of the nameservers to be used (though the system can work without that
too). Here are the pros and cons I have come up with so far:
Pros:
- the code is in Smalltalk, so it's platform independent
- concurrent name lookups become possible
- no more long waits on the VM side
Cons:
- extra complexity, since a DNS client has to be implemented
- the OS's DNS cache can't be used


Levente

>
> Dave
>
> On Mon, Sep 17, 2012 at 03:37:45PM -0400, David T. Lewis wrote:
>> Thanks Levente, this is very helpful.
>>
>> It sounds like these are problems in the new network code on the image
>> side (and I'm responsible for causing that). The updated VM is probably
>> different only in that it provides the IPv6 primitives, which in turn
>> expose the bugs on the image side. So I expect that the problems you
>> describe should also happen with a unix interpreter VM (I'll check
>> and find out as soon as I can).
>>
>> I note for the record that Andreas is fully entitled to say "I told
>> you so!" at this point ;)
>>
>> To the extent that these are Squeak image problems, evaluating
>> "NetNameResolver useOldNetwork: true" should make the symptoms
>> go away.
>>
>> Other comments in line below.
>>
>> On Mon, Sep 17, 2012 at 07:01:47PM +0200, Levente Uzonyi wrote:
>>>
>>> On Mon, 17 Sep 2012, David T. Lewis wrote:
>>>
>>>> Levente,
>>>>
>>>> Can you say anything more about what weakness you found in the
>>>> network code?
>>>
>>> All issues I found are related to name lookup. To do a name lookup the new
>>> code requires multiple primitive calls (see SocketAddressInformation >>
>>> #forHost:service:flags:addressFamily:socketType:protocol:). The plugin
>>> uses static variables to store the result of the name lookup
>>> (hostNameInfo, servNameInfo and nameInfoValid). This means that only one
>>> name can be looked up at a time.
>>
>> I noticed that, and attempted to provide some protection for it with
>> a semaphore (ResolverMutex) in class NetNameResolver. Apparently I did
>> not do a very good job of it though.
>>
>>>
>>> The image side code doesn't prevent simultaneous access to these static
>>> variables, so they can get into an unexpected state (see SocketAddress >>
>>> #hostName).
>>>
>>> Another issue is that the plugin doesn't allocate objects (strings), so 2
>>> primitive calls have to be done to fetch a string (see SocketAddress >>
>>> #hostName again). One requests the size of the string, the other copies
>>> the data to a string it receives as argument.
>>>
>>> I could reproduce a deadlock-like state by evaluating:
>>>
>>> NetNameResolver addressesForName: 'amazon.com'
>>>
>>> It's sometimes possible to interrupt the process to get a debugger, but
>>> since the primitives are called by the debugger too (see SocketAddress >>
>>> #printOn:), the image will hang if you try to use it.
>>>
>>
>> That seems likely to be a problem related to the semaphore in NetNameResolver.
>>
>> If this turns out to be a problem with the network support in Squeak
>> trunk, we can take the discussion back to squeak-dev for resolution.
>>
>> Dave
>

