[squeak-dev] unrecognized unicode characters
leves at elte.hu
Thu Sep 1 22:39:58 UTC 2011
On Tue, 30 Aug 2011, Gonzalo Romano wrote:
> Hi Levente, thanks for your answer. The idea was just to process the
> files. I'm sure the files are in UTF-8; I've used squeakToUtf8 to
> convert the string and write the files, but no luck.
That converter should be fine if your files really have UTF-8 encoding.
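
One way to check whether a file "really" is UTF-8 is to try a strict decode of its raw bytes. The following is only a minimal sketch, written in Python rather than Squeak for illustration. The helper name guess_encoding and the cp1252 fallback are assumptions (MS Word text files are often saved as Windows-1252), not something confirmed in this thread.

```python
def guess_encoding(data: bytes) -> str:
    """Return 'utf-8' if data is valid strict UTF-8, else guess 'cp1252'.

    Hypothetical helper: the cp1252 fallback is an assumption about
    files produced by MS Word, not a general detection algorithm.
    """
    try:
        data.decode("utf-8")  # strict mode raises on invalid byte sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"


# U+2026 (horizontal ellipsis) and U+2014 (em dash), the characters
# mentioned in the original report.
ellipsis_and_dash = "\u2026\u2014"

# In UTF-8 each of these characters takes three bytes.
utf8_bytes = ellipsis_and_dash.encode("utf-8")

# In Windows-1252 each takes a single byte (0x85 and 0x97), and that
# byte pair is not a valid UTF-8 sequence.
cp1252_bytes = ellipsis_and_dash.encode("cp1252")

print(guess_encoding(utf8_bytes))
print(guess_encoding(cp1252_bytes))
```

If the bytes turn out not to be valid UTF-8, a UTF-8 converter would be the wrong choice for reading them, whatever the XML declaration says. Note that plain ASCII data also passes the strict UTF-8 check, so this only rules UTF-8 out, it does not prove it.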
> Am I using the right text converter?
> I'm doing some stuff with regexp and rewriting the file. These
> characters have no translation to ASCII or ISO; could that be the
Which regular expression library do you use? How are you opening the file
you're writing the output into?
> 2011/8/29 Levente Uzonyi <leves at elte.hu>:
>> On Mon, 29 Aug 2011, Gonzalo Romano wrote:
>>> Hi, I've been working on a script to fix some XML files for a web
>>> application, and I'm having some trouble with character encoding.
>>> It seems there are some characters that Squeak does not recognize, like
>>> "..." -> u2026, u2014, that MS Word uses in its text files...
>>> Could anyone confirm this, and maybe provide a workaround?
>>> thanks in advance!
>> Would you like to display those documents in Squeak or just process the
>> files with a program you wrote?
>> In the first case you have to install and use a font that contains the
>> missing characters (the default font doesn't contain them).
>> In the second case you have to make sure that you're using the right text
>> converter for your document.
>>> Gonzalo, Romano
> Gonzalo, Romano