[squeak-dev] Re: SocketStream: Switching ascii/binary modes

Andreas Raab andreas.raab at gmx.de
Tue Mar 16 03:51:53 UTC 2010


On 3/15/2010 8:14 PM, Igor Stasenko wrote:
> There could be an alternative approach:
> - keep buffers in a single (binary) format and convert the output
> depending on mode.
>
> The choice is, when you should pay the conversion price:
> - each time you read something
> - each time you switch the mode
>
> If input is a mix of ascii/binary content, it will be very inefficient
> to convert the cache on every mode switch.
> For example - HTTP 'transfer-encoding: chunked'.
> Content may be a binary data, but it could be chunked, then input
> becomes a mix of
> binary data and hexadecimal ascii values, and crlf's.
>
> So, it requires deeper analysis than just saying 'convert it' :)

I don't think it's all that complicated :-)

First, you'd slow down all current use cases and introduce a lot of 
potential bugs if you added conversion upon access. You would also break 
any extension methods (the next:into: methods were originally extensions 
on SocketStream before I added them to trunk). Given all of that, 
changing SocketStream in that way seems highly questionable.

The specific use case of chunked encoding is interesting too, since the 
motivation of adding the next:into: family of methods came from reading 
chunked encoding :-) As a consequence, the fastest way to read chunked 
encoding in Squeak today is the following:

buffer := ByteArray new. "or: ByteString new"
[firstLine := socketStream nextLine. "chunk-size line, in hex"
chunkSize := ('16r',firstLine asUppercase) asNumber. "icky but works"
chunkSize = 0] whileFalse:[
   "grow the buffer only when the chunk doesn't fit"
   buffer size < chunkSize
     ifTrue:[buffer := buffer class new: chunkSize].
   buffer := socketStream next: chunkSize into: buffer startingAt: 1.
   outStream next: chunkSize putAll: buffer.
   socketStream skip: 2. "CRLF"
].
socketStream skip: 2. "CRLF"
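
If the '16r' string concatenation feels too icky, the chunk-size line can probably be parsed directly as base 16. The following is only a workspace sketch: the sample line is made up, the variable name sizeField is hypothetical, and it assumes your image provides Integer class>>readFrom:base: accepting a stream. It also drops an optional chunk extension (anything after a semicolon), which the loop above does not handle.

```smalltalk
| firstLine sizeField chunkSize |
"Hypothetical chunk-size line, as nextLine would answer it (line end already stripped)."
firstLine := '1a3f;name=value'.
"Drop any chunk extension after the semicolon and normalize the hex digits."
sizeField := (firstLine copyUpTo: $;) asUppercase.
"Parse as base 16 without building a '16r...' literal string."
chunkSize := Integer readFrom: (ReadStream on: sizeField) base: 16.
```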

No conversion between ascii and binary is needed, since the next:into: 
code accepts both strings and byte arrays. At the end of the day, 
switching between ascii and binary is a bit of a convenience function, 
which means that you probably shouldn't be writing high-performance code 
that depends on constantly switching between the two (I think that's a 
fair tradeoff). The next:into: family was provided specifically for 
high-performance situations: the caller supplies a pre-allocated buffer, 
which avoids the allocation overhead of each read.
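
To make the buffer point concrete, here is a small workspace sketch. It uses in-memory ReadStreams as stand-ins for a SocketStream so that it runs without a network connection. That is an assumption on my part: it relies on the image's Stream classes also implementing next:into:startingAt:, and the sample data is made up. The same message fills a pre-allocated string or a pre-allocated byte array.

```smalltalk
| textBuffer byteBuffer text bytes |
"Pre-allocate once; in a real read loop these would be reused for every read."
textBuffer := String new: 5.
byteBuffer := ByteArray new: 3.

"Stand-ins for socketStream (hypothetical in-memory data)."
text := (ReadStream on: 'hello world') next: 5 into: textBuffer startingAt: 1.
bytes := (ReadStream on: #[1 2 3 4]) next: 3 into: byteBuffer startingAt: 1.

"text and bytes now hold the elements that were read; no mode switch was involved."
```

If fewer elements are available than requested, expect a shorter collection back rather than the full buffer, so check the size of the result before reusing it.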

Cheers,
   - Andreas


