[Seaside] How do we get a string with a specific encoding

stephane ducasse stephane.ducasse at free.fr
Fri Aug 3 08:34:46 UTC 2007


Thanks I will digest that.
Sounds like a mess in Squeak.

> You need to make a decision whether you want utf8 or widestrings/"new
> squeak encoding" in your image.
> utf8 strings -> WAKom
> WideString -> WAKomEncoded39
>
> Make sure you convert the strings you read from disk to the matching
> encoding. You can convert a string to widestring/"new squeak encoding"
> by sending #convertFromEncoding: and can convert a widestring/"new
> squeak encoding" to something else by sending #convertToEncoding:
>
> WideStrings have a history of slow performance and bugs (there are
> still known, open bugs). There is no way of telling which methods and
> primitives don't work. For utf-8 strings you can rely on pretty much
> everything but concatenation being broken (including #size). As for
> inspector support, neither of them shines. Your choice.
>
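The #size remark above can be made concrete outside Squeak. What follows is a minimal sketch in Python, used purely as an analogy and not as Squeak or Seaside code: Python `bytes` stands in for a utf-8 string held byte by byte, and Python `str` stands in for a WideString. It shows that the byte count differs from the character count, that concatenating complete sequences still works, and that cutting at a byte index can split a character.

```python
# Analogy only: `bytes` plays the role of a utf-8 string stored byte by byte,
# `str` plays the role of a WideString. Nothing here is Squeak API.
text = "Stéphane"            # 8 characters, one of them non-ASCII
raw = text.encode("utf-8")   # the same text as utf-8 bytes

char_count = len(text)       # counts characters
byte_count = len(raw)        # counts bytes; "é" occupies two bytes in utf-8

# Concatenating two complete utf-8 sequences gives valid utf-8 again.
joined = raw + " Ducasse".encode("utf-8")
round_trip = joined.decode("utf-8")

# Cutting at a byte index can land in the middle of a character.
try:
    raw[:3].decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False

print(char_count, byte_count, round_trip, truncated_ok)
```

Any operation that counts or indexes elements has the same problem on the byte form, which is why only concatenation is safe there.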
>> Now when the user types text in input fields I would like to have the
>> same encoding.
>>
>> Then I was also curious about the use of encoded strings in the image
>> (i.e. how to create them, convert them, ...)
>>
>>
>>>
>>> Just "I have an encoding problem" is not really helpful at all,
>>> especially when asking for answers before supplying any information.
>>
>> sorry
>>
>>>
>>>> My opinion
>>>> is that you should convert every string you get from outside
>>>> the image to the internal format Squeak uses. That is a controlled
>>>> setting, and you are able to use size and other methods on those
>>>> strings without worry.
>>>
>>> Unfortunately, even if you have WideStrings you cannot rely on all
>>> the String methods working, since some are broken for
>>> WideStrings.
>>
>> How can I see that?
>
> Stuff like #match: throwing an exception if the argument is a
> WideString.
>
>> It would be good to report it (if this is not already done).
>
> Done.
>
>>> Additionally you won't be able to inspect your non-latin
>>> Squeak WideStrings because they have no language tag and there is no
>>> way of adding one.
>>
>> ahhh.
>> Do you mean that we cannot inspect utf-8 WideStrings?
>
> There is no such thing as utf-8 wideStrings. There are either utf-8
> strings or widestrings. You can inspect both, but neither will display
> correctly for non-latin characters if created from Seaside.
>
>>>> To get your file to the web with the
>>>> correct encoding, you read in the file and convert it from utf-8 to
>>>> the internal Squeak encoding. Then you should use WAKomEncoded39,
>>>> which will do the conversion to utf-8 when the string is about
>>>> to leave the image.
>>>
>>> That is of course only true if you get your Strings into the image
>>> via Seaside and not via FileStream. And of course WAKomEncoded39 does
>>> not work in Squeak 3.9 if you have the special version of Kom
>>> distributed with the SeasideInstaller.
>>
>> arghhhh :)
>
> Well it would be much simpler if Kom was maintained. It's not that
> we're asking for much. Just integrating changes to three methods.
> There are some people working on an updated Kom release and we are
> looking forward to it.
>
>>>
>>>> From input fields it is nearly the same, except for
>>>> the fact that WAKomEncoded39 doesn't do conversion for fields if
>>>> the request is multipart (that mostly means you uploaded something
>>>> like an image along with the text fields).
>>>
>>> This is fixed in 2.8. The only place where WAKomEncoded39 does no
>>> conversion is for file uploads, because we have no way of telling what
>>> encoding the file has or if it's even a text file.
>>
>> ok
>> so I should wait and use 2.8.
>
> Only if you have multipart fields and use WAKomEncoded39.
>
> Cheers
> Philippe
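
The advice quoted above amounts to converting at the boundaries: decode what comes in from disk, work on the internal representation, and encode again when the string leaves. A minimal sketch of that flow, again in Python as an analogy only: `read_external` and `to_external` are hypothetical helper names, and `decode`/`encode` merely play the roles that #convertFromEncoding: and #convertToEncoding: play in the discussion, they are not the same API.

```python
import os
import tempfile


def read_external(path, encoding="utf-8"):
    """Hypothetical helper: read raw bytes and decode them to the internal form."""
    with open(path, "rb") as f:
        return f.read().decode(encoding)


def to_external(text, encoding="utf-8"):
    """Hypothetical helper: encode the internal form just before it leaves."""
    return text.encode(encoding)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "greeting.txt")
    # Simulate a utf-8 file on disk.
    with open(path, "wb") as f:
        f.write("Grüezi".encode("utf-8"))

    internal = read_external(path)            # decoded once, at the boundary
    outgoing = to_external(internal.upper())  # string methods work on the decoded form

print(len(internal), outgoing)
```

Keeping the two conversions at the edges means that code in between never has to know which encoding the data arrived in.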