We recently changed ByteArray>>at:put: to remove the backup conversion of the value to integer, for what seemed like decent reasons.
It’s broken my WeatherStation code a little because there are places where I use {my byte stream} nextPutAll: (aString squeakToUtf8) or similar. #squeakToUtf8 returns a bytestring, and of course when the #nextPutAll: loop does its thing each character is pulled out as a Character (even though we know at this point it’s a byte value) - and we’ve just made it impossible to stick a character into a byte array.
Clearly I could fix it reasonably trivially with a few #asByteArray type messages scattered around but it feels a bit tacky somehow. I see some faintly similar code with plausibly similar issues in WebSocket classes too, which would need some care. Not that I can see a lot of usage of that code…
Performance isn’t a colossal issue for MQTT packets but it just rankles a bit to have a known byte valued string and then have to convert it to write it into a byte valued stream collection. KnowWhadIMean?
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Never write software that anthropomorphizes the machine. They hate that.
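For readers who want to try the situation described above in a workspace, here is a minimal sketch. The sample text and the variable names are illustrative assumptions; only #squeakToUtf8, #asByteArray and #nextPutAll: come from the post itself.

```smalltalk
| stream utf8 |
"A binary stream, standing in for {my byte stream} in the post."
stream := WriteStream on: ByteArray new.

"#squeakToUtf8 answers a String whose Characters carry the UTF-8 byte values."
utf8 := 'Température' squeakToUtf8.

"As reported above, this now fails because at:put: no longer converts a Character:
 stream nextPutAll: utf8."

"The workaround mentioned in the post: convert to a ByteArray before writing."
stream nextPutAll: utf8 asByteArray.
stream contents
```

The explicit #asByteArray is the conversion the post calls a bit tacky: the data is already byte valued, yet it has to be copied once more before the stream will accept it.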
On 29.07.2017, at 21:15, tim Rowledge tim@rowledge.org wrote:
Performance isn’t a colossal issue for MQTT packets but it just rankles a bit to have a known byte valued string and then have to convert it to write it into a byte valued stream collection. KnowWhadIMean?
Underlying questions:
- does a utf8-encoded string contain unicode-valued characters?
-> is a utf8-encoded string a string after all?
I'd suggest no out of purity, but I'd expect yes from others out of practicality.
Best regards -Tobias
2017-07-29 21:31 GMT+02:00 Tobias Pape Das.Linux@gmx.de:
Underlying questions:
- does a utf8-encoded string contain unicode-valued characters?
-> is a utf8-encoded string a string after all?
I'd suggest no out of purity, but I'd expect yes from others out of practicality.
Best regards -Tobias
Absolutely, to me a String is a sequence of characters. squeakToUtf8 is a hack that makes us consider a String as a sequence of codePoints whose encoding is in the eye of the beholder (or implicitly in the Context - the Smalltalk one). It's not very object oriented and quite fragile. We started to clean up Multilingual but never finished the job...
It's difficult to finish it, because we value backward compatibility. So maybe the ByteArray change was a bit radical in this respect.
Nicolas
On 29-07-2017, at 12:48 PM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
Absolutely, to me a String is a sequence of characters. squeakToUtf8 is a hack that makes us consider a String as a sequence of codePoints whose encoding is in the eye of the beholder (or implicitly in the Context - the Smalltalk one). It's not very object oriented and quite fragile. We started to clean up Multilingual but never finished the job…
Yes, that’s pretty much how I see it. Currently the utf8 ‘string’ is just kept as a byte string and the user is expected to understand that it is in a rather dangerous state.
It's difficult to finish it, because we value backward compatibility. So maybe the ByteArray change was a bit radical in this respect.
Backward compatibility can sometimes drive you to loud swearing!
Maybe a new message to return the byte array of the utf8 data could be added, leaving the old one alone. We should probably consider making an actual UTF8String class, though I did try to work out the best thing to do for that several years ago for NuScratch and got lost in the tangles. Editing the damn things is a pain, to say the least, so you get to thinking about having the canonical string as an instvar alongside a byte array, with edits working on the String, which gets converted at the end of the edit to update the byte array. Or the other way around… or… aaargh!
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- Immune from any serious head injury.
On Sat 29. Jul 2017 at 21:16, tim Rowledge tim@rowledge.org wrote:
It’s broken my WeatherStation code a little because there are places where I use {my byte stream} nextPutAll: (aString squeakToUtf8) or similar. #squeakToUtf8 returns a bytestring, and of course when the #nextPutAll: loop does its thing each character is pulled out as a Character (even though we know at this point it’s a byte value) - and we’ve just made it impossible to stick a character into a byte array.
It only worked accidentally - you can't put text in a binary stream. So either use a string stream or put a byte array.
It may make sense to create a #asUtf8Bytes method ... or maybe a #nextPutAllUtf8: which could avoid the extra copy.
- Bert -
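The two suggestions above could be sketched roughly as follows. Both selectors are the hypothetical ones named in the message, not existing methods, and the bodies are only one possible reading, assuming #squeakToUtf8 keeps its current behaviour. Each method would be entered in a browser on the class named in the comment above it.

```smalltalk
"Hypothetical method on String"
asUtf8Bytes
	"Answer the UTF-8 encoding of the receiver as a ByteArray."
	^ self squeakToUtf8 asByteArray

"Hypothetical method on WriteStream"
nextPutAllUtf8: aString
	"Write the UTF-8 bytes of aString without building an intermediate ByteArray.
	 Assumes the receiver streams over a ByteArray."
	aString squeakToUtf8 do: [:each | self nextPut: each asInteger]
```

With these in place, the calling code would read something like `stream nextPutAllUtf8: aString`. Note that this sketch still creates the encoded string via #squeakToUtf8, so it saves only the #asByteArray copy; avoiding every copy would mean encoding straight into the stream.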
squeak-dev@lists.squeakfoundation.org