[squeak-dev] Re: SocketStream: Switching ascii/binary modes

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Tue Mar 16 08:57:38 UTC 2010


2010/3/16 Igor Stasenko <siguctua at gmail.com>:
> On 16 March 2010 05:51, Andreas Raab <andreas.raab at gmx.de> wrote:
>> On 3/15/2010 8:14 PM, Igor Stasenko wrote:
>>>
>>> There could be an alternative approach:
>>> - keep buffers in a single (binary) format and convert the output
>>> depending on mode.
>>>
>>> The choice is, when you should pay the conversion price:
>>> - each time you read something
>>> - each time you switch the mode
>>>
>>> If the input is a mix of ascii/binary content, converting the cache
>>> on each mode switch will be very inefficient.
>>> For example - HTTP 'transfer-encoding: chunked'.
>>> The content may be binary data, but if it is chunked, the input
>>> becomes a mix of
>>> binary data, hexadecimal ascii values, and crlf's.
>>>
>>> So, it requires deeper analysis than just saying 'convert it' :)
>>
>> I don't think it's all that complicated :-)
>>
>> First, you'd slow down all current use cases and introduce a lot of
>> potential bugs if you added conversion upon access. You would also break any
>> extension methods (the next:into: methods were originally extensions on
>> SocketStream before I added them to trunk). Given all of that, changing
>> SocketStream in that way seems highly questionable.
>>
>> The specific use case of chunked encoding is interesting too, since the
>> motivation of adding the next:into: family of methods came from reading
>> chunked encoding :-) As a consequence, the fastest way to read chunked
>> encoding in Squeak today is the following:
>>
>> buffer := ByteArray new. "or: ByteString new"
>> [firstLine := socketStream nextLine.
>> chunkSize := ('16r',firstLine asUppercase) asNumber. "icky but works"
>> chunkSize = 0] whileFalse:[
>>  buffer size < chunkSize
>>    ifTrue:[buffer := buffer class new: chunkSize]. "grow the buffer only when the chunk doesn't fit"
>>  buffer := socketStream next: chunkSize into: buffer startingAt: 1.
>>  outStream next: chunkSize putAll: buffer startingAt: 1.
>>  socketStream skip: 2. "CRLF"
>> ].
>> socketStream skip: 2. "CRLF"
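
The chunked framing that loop consumes -- a hexadecimal size line, the
payload, a trailing CRLF, terminated by a zero-size chunk -- can be
sketched as a minimal Python decoder for readers who want to check the
control flow (the helper name `read_chunked` is invented for this sketch;
the Smalltalk above is the version under discussion):

```python
import io

def read_chunked(stream):
    """Decode an HTTP 'Transfer-Encoding: chunked' body from a binary stream.

    Mirrors the loop above: read a hexadecimal size line, read that many
    payload bytes, skip the CRLF after the payload, and stop at the
    zero-size chunk.
    """
    out = bytearray()
    while True:
        size_line = stream.readline().strip()
        chunk_size = int(size_line, 16)   # the "icky but works" conversion
        if chunk_size == 0:
            stream.read(2)                # final CRLF
            return bytes(out)
        out += stream.read(chunk_size)    # the chunk payload
        stream.read(2)                    # CRLF terminating this chunk

body = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
print(read_chunked(body))                 # b'Wikipedia'
```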
>>
>> There is no conversion needed between ascii/binary since the next:into: code
>> accepts both strings and byte arrays. At the end of the day, switching
>> between ascii and binary is a bit of a convenience function, which means that
>> you probably shouldn't be writing high-performance code that depends on
>> constantly switching between the two (I think that's a fair tradeoff). The
>> next:into: family was specifically provided for high-performance situations
>> by providing a pre-allocated buffer and avoiding the allocation overhead.
>>
>
> Yes, #next:into: is convenient if you know the content size from the start,
> or if you want to read everything at once into memory.
> But the strategy you show above doesn't work well in
> all cases.
> For persistent streams, used for exchanging data between peers,
> there is no notion of 'read everything up to the end',
> but rather 'read what is currently available', because the peers
> exchange data in real time and you can't predict what will follow
> the last input.
>
> My current intent is to make a fast reader which uses a socket as a
> backend, and which:
>
> - reads/parses http headers
> - handles chunked content transfer encoding
> - handles utf8 content encoding
> - and only then hands off to a consumer, which is a JSON parser
> parsing the input character by character,
> and which, like many other parsers, obviously has no use for
> #next:into:, but uses #peek and #next all the way.
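
A minimal sketch of the utf8 layer in such a stack, using Python's
incremental decoder as a stand-in for the reader described above (an
illustration only, not Igor's code): characters become available to a
#peek/#next style consumer as each packet arrives, even when a
multi-byte sequence is split across packet boundaries:

```python
import codecs

# Incremental UTF-8 decoding: feed raw packets as they arrive and get
# back only the characters that are complete so far. A multi-byte
# sequence split across packets is held until its tail arrives.
decoder = codecs.getincrementaldecoder("utf-8")()

packets = [b'{"name": "caf', b'\xc3', b'\xa9"}']  # 'e-acute' split in two
chars = []
for packet in packets:
    for ch in decoder.decode(packet):  # yields zero or more characters
        chars.append(ch)               # hand each char to the parser's #next

print("".join(chars))                  # {"name": "café"}
```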
>
> The idea is to parse the data once it becomes available, instead of
> reading everything up to the end and only then starting to parse.
> You could ask, why is it more effective? - Because of network latency.
> A client, instead of simply waiting for the next data packet to arrive,
> can spend this time more productively by parsing the available input
> (besides, it will spend this time anyway, so why waste it?).
> This means that the results of parsing will be available earlier,
> compared to a scheme where you start parsing only when all the data
> has arrived.
> Also, the bigger the content size, the more efficient this will be,
> not only in speed but also in memory consumption.
>
> So, I tend to look for ways in which the socket stream design is
> focused on streaming the data and doesn't assume that its consumer
> prefers the buffered approach (#next:into:) over the non-buffered
> one (#next).
>

Then you should really consider looking at the VW-XTream transforming: stuff.
The idea is to have parallel processing (pipelines).
Of course, we cannot have true parallelism yet in Smalltalk, but at
least the first level can work with a non-blocking Squeak socket.
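
A rough sketch of that pipeline idea, using Python generators as a
stand-in for XTream's transforming streams (the stage names are invented
for this sketch, not the VW API): each stage lazily consumes the previous
one, so data flows through every transformation as it arrives rather than
after a full read:

```python
def source():
    # Stand-in for a socket: yields packets as they "arrive".
    yield b"hello "
    yield b"pipeline "
    yield b"world"

def upper(stage):
    # Transforming stage: decodes and uppercases each packet lazily.
    for packet in stage:
        yield packet.decode("ascii").upper()

def words(stage):
    # Another stage: re-tokenizes the character stream into words,
    # emitting each word as soon as its trailing space is seen.
    buf = ""
    for text in stage:
        buf += text
        while " " in buf:
            word, buf = buf.split(" ", 1)
            yield word
    if buf:
        yield buf

pipeline = words(upper(source()))
print(list(pipeline))   # ['HELLO', 'PIPELINE', 'WORLD']
```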

Nicolas

>> Cheers,
>>  - Andreas
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
>


