[Box-Admins] Re: [squeak-dev] Re: [Seaside] SqueakSource/Seaside question - has anyone seen this problem before?

Philippe Marschall philippe.marschall at gmail.com
Mon Dec 23 11:51:52 UTC 2013


On Sun, Dec 22, 2013 at 5:50 PM, David T. Lewis <lewis at mail.msen.com> wrote:
> I think that we probably have a few issues related to wide strings in the
> SqueakSource code, and these issues certainly will affect source.squeak.org
> in the same way that we have seen on squeaksource.com. The only difference
> being that we do not happen to have any author names with multibyte characters
> registered on source.squeak.org at the moment.
>
> When I was originally loading the old SqueakSource onto our squeak.org servers,
> I found some problems with the image updating its repository from disk, and
> at the time I chose to work around them manually in order to get the system
> up and running. But there seem to be places where the identity of an author
> is saved in the repository (in the image, not on disk), and stored with
> possibly different encoding in the MCZ file names on disk, and may be stored
> with yet another possibly different encoding internally within the MCZ file.
>
> The good news is that the squeaksource.com files and image give us enough
> real life data that we should be able to locate the problem cases and think
> about how to handle them properly. For example, the ss.log file shows evidence
> of continuing problems related to six specific files:
>
> 2013-12-21T18:39:10.089+00:00 RECOVERING FelTimetable/FelTimetable-M·Sa.53.mcz
> 2013-12-21T18:39:11.353+00:00 RECOVERING FelTimetable/FelTimetable-M·Sa.52.mcz
> 2013-12-21T18:39:11.602+00:00 RECOVERING FelTimetable/FelTimetable-M·Sa.55.mcz
> 2013-12-21T18:39:12.884+00:00 RECOVERING FelTimetable/FelTimetable-M·Sa.66.mcz
> 2013-12-21T18:39:14.193+00:00 RECOVERING FelTimetable/FelTimetable-M·Sa.54.mcz
> 2013-12-21T18:39:15.45+00:00 RECOVERING FelTimetable/Seaside2.8a1-M·Sa.49.mcz
>
> So some follow up is needed. But maybe not today, for now I'm just happy
> to have the site running again :-)

Long story short, it's messy and your options are kinda limited. The
problems stem from the fact that the version of Seaside that
SqueakSource runs on is very old (probably a decade by now). It's
unmaintained and missing all the Unicode fixes that went in over the
past years. There are newer versions of SqueakSource available [1] [2]
[3] that work with newer versions of Seaside. The trouble, however, is
migrating (you'll likely have to migrate to a newer version of Squeak
as well). But I'm sure all the advocates of images and objects will be
eager to lend you a helping hand.

Now, regarding encoding, there are two things you need to decide: what
should be the internal encoding in the image, and what should be the
external encoding on the web page. If they are different, some
transcoding has to happen for both input and output. In Seaside 3.x
this is quite easy to do, in Seaside 2.6 not so much. The web page
currently seems to use iso-8859-1, as indicated in the XML preamble
(there is no corresponding HTTP header). I assume (without being sure)
that the internal encoding is Squeak/MacRoman. Which brings us to the
question of how St鰨ane Munioz ended up in the image. Can you confirm
that his name is a WideString and 鰨 is a single instance of Character?
The obvious choice at this point would be to go for utf-8 external and
Squeak internal. The easiest way to do this would be to use
WAKomEncoded, but I don't think this is even present in this version of
Seaside. Remember you'll have to encode all the output and decode all
the input. For example, when I search for "Munioz" under "Members",
still only half the page renders. There is one downside to this
approach though, and that is that you'll end up having WideStrings in
the image. WideString has a bad reputation of being slow and buggy.
Seaside 3.x helps a bit because the response would be encoded on the
fly, which avoids a huge WideString response buffer. To avoid
WideStrings altogether you could use utf-8 internally, but that breaks
all length-related methods and you'll have to pay attention when
interacting with external systems (e.g. the file system).
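
To make the input/output transcoding and the length problem concrete,
here is a minimal sketch. It is written in Python rather than Squeak
purely for illustration: Python's str stands in for the image's
internal string and bytes for what travels over HTTP. The function
names and the sample value "Zoë" are hypothetical and not taken from
SqueakSource or Seaside.

```python
# Sketch only: str plays the role of the internal (decoded) string,
# bytes play the role of the external (encoded) representation.

EXTERNAL_ENCODING = "utf-8"  # assumption: the encoding chosen for the web page


def decode_input(raw: bytes) -> str:
    """Turn request bytes (form fields, query parameters) into internal text."""
    return raw.decode(EXTERNAL_ENCODING)


def encode_output(text: str) -> bytes:
    """Turn internal text into response bytes."""
    return text.encode(EXTERNAL_ENCODING)


name = "Zoë"  # hypothetical sample value with one non-ASCII character

wire = encode_output(name)
assert decode_input(wire) == name  # the round trip is lossless

# If the UTF-8 bytes themselves were kept as the internal representation,
# "length" would count bytes instead of characters:
print(len(name))  # 3 characters
print(len(wire))  # 4 bytes, because ë takes two bytes in UTF-8

# Decoding with the wrong encoding does not fail, it silently garbles the text:
print(wire.decode("iso-8859-1"))  # ZoÃ«
```

The last line shows why a mismatch between the declared and the actual
encoding tends to go unnoticed until a non-ASCII name shows up.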

General items:
One of the optimizations we never had time to implement was installing
mod_xsendfile [4]. Serving all the MCZ files through the image is very
inefficient and puts unnecessary pressure on the image. We can't do it
directly with Apache because we have to do an authentication check
first. mod_xsendfile would allow the image to tell Apache which file
to serve.
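
The handoff described above could look roughly like the following
sketch. A small Python WSGI application stands in for the SqueakSource
image purely for illustration; MCZ_ROOT, is_authorized and the
REMOTE_USER check are hypothetical placeholders, not actual SqueakSource
code. The only part taken from mod_xsendfile itself is the X-Sendfile
response header.

```python
# Sketch only: the application does the authentication check and then
# names the file in an X-Sendfile header instead of streaming it itself.
import posixpath

MCZ_ROOT = "/srv/squeaksource/files"  # hypothetical storage directory


def is_authorized(environ) -> bool:
    """Hypothetical placeholder for the real authentication check."""
    return environ.get("REMOTE_USER") is not None


def app(environ, start_response):
    # Resolve the requested path and refuse anything that escapes MCZ_ROOT.
    relative = environ.get("PATH_INFO", "").lstrip("/")
    requested = posixpath.normpath(posixpath.join(MCZ_ROOT, relative))
    if not requested.startswith(MCZ_ROOT + "/"):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    if not is_authorized(environ):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden"]

    # Empty body: with mod_xsendfile enabled, Apache sends the named file.
    start_response("200 OK", [
        ("X-Sendfile", requested),
        ("Content-Type", "application/octet-stream"),
    ])
    return [b""]
```

On the Apache side this assumes mod_xsendfile is loaded and switched on
for the location (XSendFile On), with the storage directory allowed via
XSendFilePath; check the module documentation [4] for the exact
directives supported by the installed version.
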
From time to time the image would lock up completely. We applied
several patches that were supposed to make Semaphore thread-safe, but
the issue never fully went away. Some people said this was because
SqueakSource was never designed to handle this load. I don't
understand this argument: even if that is the case, it should just
make the image slow, not lock it up.

 [1] http://www.squeaksource.com/ss2.html
 [2] http://www.squeaksource.com/squeaksource3.html
 [3] http://ss3.gemstone.com/ss/ss3.html
 [4] https://tn123.org/mod_xsendfile/

Cheers
Philippe

