[Seaside] How do we get a string with a specific encoding

Damien Pollet damien.pollet at gmail.com
Wed Sep 12 14:24:29 UTC 2007


On 03/08/2007, stephane ducasse <stephane.ducasse at free.fr> wrote:
> Thanks I will digest that.
> Sounds like a mess in squeak.

What a surprise ;-)

I think I have a similar problem in Citezen. I read and parse a UTF-8
file, then write content from it out to a UTF-8 page with Seaside 2.8,
and I still get funky characters instead of accents. I tried guessing
what the actual output encoding is, but with no success. Unaccented
characters seem to be ASCII/Latin-1; accented characters use two bytes
but are not valid UTF-8...
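
One way to narrow down a symptom like this is to compare the raw bytes
of a known accented character against a few candidate encodings. The
sketch below is in Python rather than Squeak, purely as an illustration;
the helper name `describe` and the sample character are assumptions, and
nothing here is taken from Citezen or Seaside.

```python
# Illustrative only: how one accented character looks as raw bytes
# under a few encodings, including UTF-8 applied twice by mistake.

def describe(ch):
    """Return a dict mapping a label to the hex bytes of `ch`."""
    utf8 = ch.encode("utf-8")
    # Mistake being modeled: UTF-8 bytes are read as Latin-1 characters,
    # then encoded to UTF-8 a second time on output.
    double = utf8.decode("latin-1").encode("utf-8")
    return {
        "latin-1": ch.encode("latin-1").hex(),
        "utf-8": utf8.hex(),
        "utf-8 twice": double.hex(),
    }

if __name__ == "__main__":
    for label, hexbytes in describe("é").items():
        print(label, hexbytes)
```

Matching the bytes actually seen on the wire against a table like this
can suggest which conversion step is missing or duplicated, though it
only covers the cases listed.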

>
> > You need to make a decision whether you want utf8 or widestrings/"new
> > squeak encoding" in your image.
> > utf8 strings -> WAKom
> > WideString -> WAKomEncoded39
> >
> > Make sure you convert the strings you read from disk to the matching
> > encoding. You can convert a string to widestring/"new squeak encoding"
> > by sending #convertFromEncoding: and can convert a widestring/"new
> > squeak encoding" to something else by sending #convertToEncoding:
> >
> > WideStrings have a history of slow performance and bugs (there are
> > still known, open bugs). There is no way of telling which methods and
> > primitives don't work. For utf-8 strings you can rely on pretty much
> > everything but concatenation being broken (including #size). As for
> > inspector support both of them don't shine. Your choice.
> >
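
The remark about #size and concatenation can be illustrated outside
Squeak. The following Python sketch (not Squeak code; the sample strings
and variable names are assumptions for illustration) treats a "utf-8
string" as a plain byte sequence and shows that counting and slicing go
wrong for accented text while concatenation stays valid.

```python
# Illustrative only: why size and indexing misbehave when a string
# really holds UTF-8 bytes, while concatenation stays safe.

text = "café"
raw = text.encode("utf-8")          # what a "utf-8 string" holds: bytes

char_count = len(text)              # 4 characters
byte_count = len(raw)               # 5 bytes, because é takes two

# Concatenating two valid UTF-8 byte sequences gives valid UTF-8.
joined = (raw + " crème".encode("utf-8")).decode("utf-8")

# Slicing by byte index can cut a character in half.
try:
    raw[:4].decode("utf-8")
    cut_ok = True
except UnicodeDecodeError:
    cut_ok = False
```
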
> >> Now when the user types text in input fields I would like to have the
> >> same encoding.
> >>
> >> Then I was also curious about the use in the image of encoded strings
> >> (ie how to create them, convert....)
> >>
> >>
> >>>
> >>> Just "I have an encoding problem" is not really helpful at all
> >>> especially if asking for answers before supplying any information.
> >>
> >> sorry
> >>
> >>>
> >>>> My opinion
> >>>> is that you should convert every string you get from outside
> >>>> the image to the internal format squeak uses. That is a controlled
> >>>> setting and you are able to use size and other methods on those
> >>>> strings without worry.
> >>>
> >>> Unfortunately, even if you have WideStrings you cannot rely on
> >>> all the methods for Strings working, since some are broken for
> >>> WideStrings.
> >>
> >> How can I see that?
> >
> > Stuff like #match: throwing an exception if the argument is a
> > WideString.
> >
> >> It would be good to report (if this is not already done)
> >
> > Done.
> >
> >>> Additionally you won't be able to inspect your non-latin
> >>> Squeak WideStrings because they have no language tag and there is no
> >>> way of adding one.
> >>
> >> ahhh.
> >> Do you mean that we cannot inspect utf-8 wideStrings?
> >
> > There is no such thing as utf-8 wideStrings. There are either utf-8
> > strings or widestrings. You can inspect both, but neither will display
> > correctly for non-latin characters if created from Seaside.
> >
> >>>> To get your file to the web with the
> >>>> correct encoding, you read in the file and convert it from utf-8
> >>>> to the internal squeak encoding. Then you should use WAKomEncoded39,
> >>>> which will do the conversion to utf-8 when the string is about
> >>>> to leave the image.
> >>>
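
The "convert at the boundaries" approach described in the quoted advice
can be sketched as follows. This is Python, not Squeak, and the function
names `decode_input` and `encode_output` are hypothetical; in the thread
the corresponding steps are #convertFromEncoding: on the way in and
WAKomEncoded39 on the way out.

```python
# Illustrative only: decode once on input, work on text internally,
# encode once on output.

def decode_input(raw, encoding="utf-8"):
    """Boundary in: external bytes to an internal text string."""
    return raw.decode(encoding)

def encode_output(text, encoding="utf-8"):
    """Boundary out: internal text to bytes for the response."""
    return text.encode(encoding)

# Stands in for bytes read from a UTF-8 file on disk.
file_bytes = "Références".encode("utf-8")

internal = decode_input(file_bytes)   # text; size counts characters
title = internal.upper()              # string methods operate on characters
page_bytes = encode_output(title)     # UTF-8 bytes leaving the program
```
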
> >>> That is of course only true if you get your Strings into the
> >>> image via Seaside and not via FileStream. And of course
> >>> WAKomEncoded39 does not work in Squeak 3.9 if you have the special
> >>> version of Kom distributed with the SeasideInstaller.
> >>
> >> arghhhh :)
> >
> > Well it would be much simpler if Kom was maintained. It's not that
> > we're asking for much. Just integrating changes to three methods.
> > There are some people working on an updated Kom release and we are
> > looking forward to it.
> >
> >>>
> >>>> For input fields it is nearly the same, except for
> >>>> the fact that WAKomEncoded39 doesn't do conversion for fields if
> >>>> the request is multipart (that mostly means you uploaded something
> >>>> like an image along with the text fields).
> >>>
> >>> This is fixed in 2.8. The only place where WAKomEncoded39 does not
> >>> do conversion is for file uploads, because we have no way of telling
> >>> what encoding the file has or if it's even a text file.
> >>
> >> ok
> >> so I should wait and use 2.8.
> >
> > Only if you have multipart fields and use WAKomEncoded39
> >
> > Cheers
> > Philippe
> > _______________________________________________
> > Seaside mailing list
> > Seaside at lists.squeakfoundation.org
> > http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside
> >


-- 
Damien Pollet
type less, do more [ | ] http://typo.cdlm.fasmz.org

