[Seaside] How do we get a string with a specific encodings

Philippe Marschall philippe.marschall at gmail.com
Wed Sep 12 17:52:37 UTC 2007


2007/9/12, Damien Pollet <damien.pollet at gmail.com>:
> On 03/08/2007, stephane ducasse <stephane.ducasse at free.fr> wrote:
> > Thanks I will digest that.
> > Sounds like a mess in squeak.
>
> What a surprise ;-)
>
> I think I have a similar problem in Citezen. Reading/parsing a utf8
> file,

Do you use WAKom or WAKomEncoded*? Do you set the converter for the
file stream? Do you use the Kom from SqueakMap or the one from the
SeasideInstaller?

Cheers
Philippe

> writing stuff from it out to an utf8 page with seaside 2.8, and
> I still get funky characters instead of accents. I tried guessing what
> the actual output encoding is but no success. Normal characters seem
> to be ascii/latin1, accented characters use two bytes but are not
> correct utf8...
>
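To make the symptom concrete, here is a minimal sketch of one common way to end up with "funky characters": UTF-8 bytes get misread as Latin-1 and are then encoded to UTF-8 a second time. It is written in Python purely as an illustration and is not Squeak code. That this is what happens in Citezen is an assumption, and the repair helper is hypothetical.

```python
# Illustration only: Python stands in for the Squeak image here.
text = "é"                                  # one accented character
utf8_bytes = text.encode("utf-8")           # two bytes on disk

# Mistake: treat each UTF-8 byte as one Latin-1 character ...
misread = utf8_bytes.decode("latin-1")      # now two characters
# ... and encode the result as UTF-8 again on the way out.
double_encoded = misread.encode("utf-8")    # four bytes in the page


def repair(garbled: str) -> str:
    """Undo one round of the misreading above (hypothetical helper)."""
    return garbled.encode("latin-1").decode("utf-8")
```

If the output looks like this, the fix is usually to do exactly one decode when the data enters and exactly one encode when it leaves.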
> >
> > > You need to make a decision whether you want utf8 or widestrings/"new
> > > squeak encoding" in your image.
> > > utf8 strings -> WAKom
> > > WideString -> WAKomEncoded39
> > >
> > > Make sure you convert the strings you read from disk to the matching
> > > encoding. You can convert a string to widestring/"new squeak encoding"
> > > by sending #convertFromEncoding: and can convert a widestring/"new
> > > squeak encoding" to something else by sending #convertToEncoding:
> > >
> > > WideStrings have a history of slow performance and bugs (there are
> > > still known, open bugs). There is no way of telling which methods and
> > > primitives don't work. For utf-8 strings you can rely on pretty much
> > > everything but concatenation being broken (including #size). As for
> > > inspector support, neither of them shines. Your choice.
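A small sketch of why #size and friends misbehave on utf-8 strings while concatenation survives. This is Python, not Squeak: `bytes` plays the role of a utf-8 string kept in the image and `str` the role of a decoded WideString. The sample words are made up for the illustration.

```python
# Illustration in Python, not Squeak: bytes stands in for a "utf-8
# string" kept in the image, str for a decoded WideString.
word = "héllo"
as_utf8 = word.encode("utf-8")

char_count = len(word)        # counts characters
byte_count = len(as_utf8)     # counts bytes; "é" takes two of them

# Concatenating two UTF-8 byte sequences still yields valid UTF-8.
joined = as_utf8 + " wörld".encode("utf-8")
joined_text = joined.decode("utf-8")

# Index-based operations can cut a character in half.
first_two = as_utf8[:2]       # "h" plus only the first byte of "é"
try:
    first_two.decode("utf-8")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False
```

The size differs by one because of the single accented character, and the two-byte prefix is no longer valid UTF-8.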
> > >
> > >> Now when the user type text in input fields I would like to have the
> > >> same encoding.
> > >>
> > >> Then I was also curious about the use in the image of encoded strings
> > >> (ie how to create them, convert....)
> > >>
> > >>
> > >>>
> > >>> Just "I have an encoding problem" is not really helpful at all
> > >>> especially if asking for answers before supplying any information.
> > >>
> > >> sorry
> > >>
> > >>>
> > >>>> My opinion
> > >>>> is that you should convert every string you get from outside
> > >>>> the image to the internal format squeak uses. That is a controlled
> > >>>> setting and you are able to use size and other methods on those
> > >>>> strings without worry.
> > >>>
> > > >>> Unfortunately, even if you have WideStrings you cannot rely on all
> > > >>> the methods for Strings working, since some are broken for
> > > >>> WideStrings.
> > >>
> > >> How can I see that?
> > >
> > > Stuff like #match: throwing an exception if the argument is a
> > > WideString.
> > >
> > >> It would be good to report (if this is not already done)
> > >
> > > Done.
> > >
> > >>> Additionally you won't be able to inspect your non-latin
> > >>> Squeak WideStrings because they have no language tag and there is no
> > >>> way of adding one.
> > >>
> > >> ahhh.
> > > >> Do you mean that we cannot inspect utf-8 wideStrings?
> > >
> > > There is no such thing as utf-8 wideStrings. There are either utf-8
> > > strings or widestrings. You can inspect both, but neither will display
> > > correctly for non-latin characters if created from Seaside.
> > >
> > > >>>> To get your file to the web with the
> > >>>> correct encoding you read in the file convert it from utf-8 to
> > >>>> the internal squeak encoding. Then you should use WAKomEncoded39
> > >>>> that will do the conversion to utf-8 when the string is about
> > >>>> to leave the image.
> > >>>
> > > >>> That is of course only true if you get your Strings into the image
> > > >>> via Seaside and not via FileStream. And of course WAKomEncoded39
> > > >>> does not work in Squeak 3.9 if you have the special version of Kom
> > > >>> distributed with the SeasideInstaller.
> > >>
> > >> arghhhh :)
> > >
> > > Well it would be much simpler if Kom was maintained. It's not that
> > > we're asking for much. Just integrating changes to three methods.
> > > There are some people working on an updated Kom release and we are
> > > looking forward to it.
> > >
> > >>>
> > > >>>> From input fields it is nearly the same, except that
> > > >>>> WAKomEncoded39 doesn't do conversion for fields if the request
> > > >>>> is multipart (that mostly means you uploaded something like an
> > > >>>> image along with the text fields).
> > >>>
> > > >>> This is fixed in 2.8. The only place where WAKomEncoded39 does no
> > > >>> conversion is for file uploads because we have no way of telling what
> > >>> encoding the file has or if it's even a text file.
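To illustrate why an uploaded file cannot be converted blindly, here is a hedged sketch in Python (not Squeak, and not how WAKomEncoded39 is implemented). The helper name `try_decode_utf8` and the sample data are assumptions made up for the example.

```python
from typing import Optional


def try_decode_utf8(data: bytes) -> Optional[str]:
    """Return the text if data is valid UTF-8, otherwise None."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


png_header = b"\x89PNG\r\n\x1a\n"         # first bytes of a binary PNG file
latin1_text = "café".encode("latin-1")     # text, but not UTF-8
utf8_text = "café".encode("utf-8")         # text that really is UTF-8

# Strict UTF-8 decoding rejects both the binary data and the Latin-1
# text, so a failure does not say which of the two was uploaded.
# Latin-1 decoding accepts any bytes at all, so success proves nothing.
as_latin1 = png_header.decode("latin-1")
```

The server sees only bytes, so the application has to decide what encoding an upload is in.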
> > >>
> > >> ok
> > >> so I should wait and use 2.8.
> > >
> > > Only if you have multipart fields and use WAKomEncoded39
> > >
> > > Cheers
> > > Philippe
> > > _______________________________________________
> > > Seaside mailing list
> > > Seaside at lists.squeakfoundation.org
> > > http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside
> > >
> >
> >
>
>
> --
> Damien Pollet
> type less, do more [ | ] http://typo.cdlm.fasmz.org
>

