[Seaside] How do we get a string with a specific encodings

Thu Aug 2 17:00:37 UTC 2007

2007/8/2, stephane ducasse <stephane.ducasse at free.fr>:
> >
> > Sorry, I didn't get the question. Could you please rephrase and be
> > very exact about:
> > - where your strings come from
>
> either from a file with an encoding (utf8 but could be latin1)
>
> > - what encoding these strings have
> utf-8
>
> > - what encoding these strings should have in the end
>
> I do not know.

You need to decide whether you want utf8 strings or WideStrings (the
"new squeak encoding") in your image. The choice determines the server
adaptor:
utf8 strings -> WAKom
WideString -> WAKomEncoded39

Make sure you convert the strings you read from disk to the matching
encoding. You can convert a string to widestring/"new squeak encoding"
by sending #convertFromEncoding: and convert a widestring/"new squeak
encoding" to something else by sending #convertToEncoding:.
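
To make the direction of the two conversions concrete, here is a rough
analogy in Python rather than Squeak. It assumes Python's bytes plays
the role of an encoded (utf-8 or latin1) string and Python's str plays
the role of a WideString. The helper names read_external and
write_external are made up for this sketch and exist neither in Squeak
nor in Seaside.

```python
# Analogy only: bytes ~ an encoded string as read from disk,
# str ~ the internal WideString representation.
# read_external / write_external are hypothetical helper names.


def read_external(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode external bytes into the internal representation
    (roughly the role of #convertFromEncoding:)."""
    return raw.decode(encoding)


def write_external(text: str, encoding: str = "utf-8") -> bytes:
    """Encode the internal representation for the outside world
    (roughly the role of #convertToEncoding:)."""
    return text.encode(encoding)


# Bytes as they would come out of a utf-8 encoded file.
raw_from_disk = "St\u00e9phane".encode("utf-8")

# Convert once on the way in ...
internal = read_external(raw_from_disk)

# ... and once on the way out, possibly to a different encoding.
latin1_out = write_external(internal, "latin-1")
```

The point is only that the conversion happens at the boundary, once in
each direction, and that the encoding name has to match what the bytes
really are.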

WideStrings have a history of slow performance and bugs (there are
still known, open bugs). There is no way of telling which methods and
primitives don't work. For utf-8 strings you can rely on pretty much
everything but concatenation being broken (including #size). As for
inspector support, neither of them shines. Your choice.
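
Why #size is unreliable on utf-8 strings is easiest to see with a small
example. Again this is a Python analogy, not Squeak code. It assumes
bytes stands in for a utf-8 string kept undecoded in the image and str
stands in for the decoded form.

```python
# Analogy only: a utf-8 string kept as raw bytes counts bytes, not characters.
utf8_bytes = "na\u00efve".encode("utf-8")   # the i with diaeresis takes two bytes
decoded = utf8_bytes.decode("utf-8")

byte_count = len(utf8_bytes)   # what a size query sees on the undecoded string
char_count = len(decoded)      # the number of characters the user typed

# Concatenating two valid utf-8 byte sequences yields valid utf-8 again,
# which is why concatenation is the one operation that stays safe.
joined = (utf8_bytes + " caf\u00e9".encode("utf-8")).decode("utf-8")
```

Anything that indexes, counts or slices works on bytes in the first
form and can cut a character in half. Concatenation never splits a
character, so it keeps working.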

> Now when the user types text in input fields I would like to have the
> same encoding.
>
> Then I was also curious about the use in the image of encoded strings
> (ie how to create them, convert....)
>
>
> >
> > Just "I have an encoding problem" is not really helpful at all
> > especially if asking for answers before supplying any information.
>
> sorry
>
> >
> >> My opinion
> >> is that you should convert every string you get from outside
> >> the image to the internal format squeak uses. That is a controlled
> >> setting and you are able to use size and other methods on those
> >> strings without worry.
> >
> > Unfortunately even if you have WideStrings you can not rely on that
> > all the methods for Strings work since some are broken for
> > WideStrings.
>
> How can I see that?

For example, #match: throws an exception if the argument is a WideString.

> It would be good to report (if this is not already done)

Done.

> > Additionally you won't be able to inspect your non-latin
> > Squeak WideStrings because they have no language tag and there is no
> > way of adding one.
>
> ahhh.
> Do you mean that we cannot inspect utf-8 wideStrings?

There is no such thing as utf-8 wideStrings. There are either utf-8
strings or widestrings. You can inspect both, but neither will display
correctly for non-latin characters if created from Seaside.

> >> To get your file to the web with the
> >> correct encoding you read in the file convert it from utf-8 to
> >> the internal squeak encoding. Then you should use WAKomEncoded39
> >> that will do the conversion to utf-8 when the string is about
> >> to leave the image.
> >
> > That is of course only true if you get your Strings into the image via
> > Seaside and not via FileStream. And of course WAKomEncoded39 does not
> > work in Squeak 3.9 if you have the special version of Kom distributed
> > with the SeasideInstaller.
>
> arghhhh :)

Well it would be much simpler if Kom was maintained. It's not that
we're asking for much. Just integrating changes to three methods.
There are some people working on an updated Kom release and we are
looking forward to it.

> >
> >> From input fields it is nearly the same except
> >> the fact that WAKomEncoded39 doesn't do conversion for fields if
> >> the request is multipart (that means mostly you uploaded something
> >> like image along with the text fields).
> >
> > This is fixed in 2.8. The only place where WAKomEncoded39 does no
> > conversion is for file uploads because we have no way of telling what
> > encoding the file has or if it's even a text file.
>
> ok
> so I should wait and use 2.8.

Only if you have multipart fields and use WAKomEncoded39.

Cheers
Philippe