[squeak-dev] Re: SocketStream: Switching ascii/binary modes
Andreas Raab
andreas.raab at gmx.de
Tue Mar 16 03:51:53 UTC 2010
On 3/15/2010 8:14 PM, Igor Stasenko wrote:
> There could be an alternative approach:
> - keep buffers in a single (binary) format and convert the output
> depending on the mode.
>
> The choice is when you pay the conversion price:
> - each time you read something
> - each time you switch the mode
>
> If the input is a mix of ascii/binary content, converting the cache on
> every mode switch will be very inefficient.
> For example - HTTP 'transfer-encoding: chunked'.
> The content may be binary data, but it could be chunked; the input
> then becomes a mix of binary data, hexadecimal ascii values, and
> crlf's.
>
> So, it requires deeper analysis than just saying 'convert it' :)
I don't think it's all that complicated :-)

First, you'd slow down all current use cases and introduce a lot of
potential bugs if you added conversion upon access. You would also break
any extension methods (the next:into: methods were originally extensions
on SocketStream before I added them to trunk). Given all of that,
changing SocketStream in that way seems highly questionable.

The specific use case of chunked encoding is interesting too, since the
motivation for adding the next:into: family of methods came from reading
chunked encoding :-) As a consequence, the fastest way to read chunked
encoding in Squeak today is the following:

	buffer := ByteArray new. "or: ByteString new"
	[firstLine := socketStream nextLine.
	chunkSize := ('16r',firstLine asUppercase) asNumber. "icky but works"
	chunkSize = 0] whileFalse:[
		"grow the buffer only when the current one is too small"
		buffer size < chunkSize
			ifTrue:[buffer := buffer class new: chunkSize].
		buffer := socketStream next: chunkSize into: buffer startingAt: 1.
		outStream next: chunkSize putAll: buffer.
		socketStream skip: 2. "CRLF"
	].
	socketStream skip: 2. "CRLF"

There is no conversion needed between ascii/binary since the next:into:
code accepts both strings and byte arrays. At the end of the day,
switching between ascii and binary is a bit of a convenience function,
which means that you probably shouldn't be writing high-performance code
that depends on constantly switching between the two (I think that's a
fair tradeoff). The next:into: family was provided specifically for
high-performance situations: the caller passes in a pre-allocated
buffer, which avoids the allocation overhead of each read.
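
To make the buffer-reuse point concrete, here is a minimal workspace sketch. It is only an illustration under assumptions: an in-memory ReadStream stands in for the socket, the 4-byte block size is arbitrary, and it assumes the next:into:startingAt: and next:putAll:startingAt: methods are available on plain streams in your image (as they are in current trunk).

```smalltalk
| in out buffer |
"Stand-in for the socket: ten bytes held in memory (hypothetical data)."
in := ReadStream on: #[1 2 3 4 5 6 7 8 9 10].
out := WriteStream on: ByteArray new.
"Allocated once, then reused for every read."
buffer := ByteArray new: 4.
[in atEnd] whileFalse: [
	| n |
	"Never ask for more than what is left in the stream."
	n := 4 min: in size - in position.
	in next: n into: buffer startingAt: 1.
	"Copy only the n bytes that were just read."
	out next: n putAll: buffer startingAt: 1].
out contents
```

Evaluating the last line should give back the same ten bytes. The loop itself allocates no intermediate collections, which is the whole point of passing the buffer in.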
Cheers,
- Andreas