[squeak-dev] unrecognized unicode characters

Levente Uzonyi leves at elte.hu
Thu Sep 1 22:39:58 UTC 2011


On Tue, 30 Aug 2011, Gonzalo Romano wrote:

> Hi Levente, thanks for your answer. The idea was just to process the
> files. I'm sure the files are in UTF-8; I've used squeakToUtf8 to
> convert the strings and write the files, but no luck.

That converter should be fine if your files really have UTF-8 encoding.
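
A quick way to tell whether a file "really" is UTF-8 is to know what the two characters look like on disk. The sketch below is in Python only because it is easy to run anywhere; it is not Squeak's API. It shows the UTF-8 bytes of U+2026 (ellipsis) and U+2014 (em dash), and the garbage those same bytes turn into when they are decoded with a Windows-1252 converter instead of a UTF-8 one.

```python
# Illustration only (Python, not Squeak): UTF-8 bytes of the two characters
# from the original question, and what a wrong converter makes of them.
ellipsis = "\u2026"  # HORIZONTAL ELLIPSIS
em_dash = "\u2014"   # EM DASH

utf8_ellipsis = ellipsis.encode("utf-8")
utf8_em_dash = em_dash.encode("utf-8")
print(utf8_ellipsis.hex())  # e280a6
print(utf8_em_dash.hex())   # e28094

# Decoding the same three bytes as Windows-1252 yields three unrelated
# characters (displayed as "a-circumflex, euro sign, broken bar").
misread_ellipsis = utf8_ellipsis.decode("cp1252")
assert misread_ellipsis == "\u00e2\u20ac\u00a6"
```

If your output shows that three-character sequence where the ellipsis should be, the bytes are UTF-8 but something along the way is reading them with a different converter.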

>
> Am I using the right text converter?
> I'm doing some processing with regular expressions and rewriting the
> file. These characters have no equivalent in ASCII or ISO-8859; could
> that be the problem?

Which regular expression library do you use? How are you opening the file 
you're writing the output into?
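
The output side matters as much as the input. Again as a Python sketch of the principle rather than Squeak code (the file name out.txt is hypothetical): ISO-8859-1 has no code points for U+2026 or U+2014, so those characters cannot be written in that encoding at all, while a file opened with an explicit UTF-8 encoding round-trips them unchanged.

```python
import os
import tempfile

text = "Wait\u2026 what \u2014 really?"


def first_unencodable(s, encoding):
    """Return the first code point that `encoding` cannot represent, or None."""
    try:
        s.encode(encoding)
    except UnicodeEncodeError as err:
        # err.start is the index of the first character that failed.
        return ord(err.object[err.start])
    return None


print(hex(first_unencodable(text, "latin-1")))  # 0x2026
print(first_unencodable(text, "utf-8"))         # None

# Writing and reading back with the same explicit UTF-8 encoding is lossless.
path = os.path.join(tempfile.mkdtemp(), "out.txt")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, encoding="utf-8") as f:
    roundtrip = f.read()
print(roundtrip == text)  # True
```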


Levente

>
> 2011/8/29 Levente Uzonyi <leves at elte.hu>:
>> On Mon, 29 Aug 2011, Gonzalo Romano wrote:
>>
>>> Hi, I've been working on a script to fix some XML files for a web
>>> application, and I'm having some trouble with character encoding.
>>> It seems there are some characters that Squeak does not recognize,
>>> like the ellipsis "..." (U+2026) and the em dash (U+2014), which
>>> MS Word uses in its text files.
>>> Could anyone confirm this and maybe provide a workaround?
>>> Thanks in advance!
>>
>> Would you like to display those documents in Squeak or just process the
>> files with a program you wrote?
>> In the first case you have to install and use a font that contains the
>> missing characters (the default font doesn't contain them).
>> In the second case you have to make sure that you're using the right text
>> converter for your document.
>>
>>
>> Levente
>>
>>> --
>>> Gonzalo, Romano
> -- 
> Gonzalo, Romano
>
>

