[squeak-dev] unrecognized unicode characters
Levente Uzonyi
leves at elte.hu
Thu Sep 1 22:39:58 UTC 2011
On Tue, 30 Aug 2011, Gonzalo Romano wrote:
> Hi Levente, thanks for your answer. The idea was just to process the
> files. I'm sure the files are in UTF-8; I've used squeakToUtf8 to
> convert the string and write the files, but no luck.
That converter should be fine if your files really have UTF-8 encoding.
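
One way to test the "really UTF-8" assumption outside Squeak is to try a
strict decode of the raw bytes. The sketch below uses Python only for
illustration (it is not from this thread), and the helper name
looks_like_utf8 is made up. It also shows why a file saved from Word in a
legacy code page such as Windows-1252 would fail the check: U+2026 is a
single byte there, but three bytes in UTF-8.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same character (U+2026, horizontal ellipsis) in two encodings.
ellipsis_utf8 = "\u2026".encode("utf-8")     # three bytes: E2 80 A6
ellipsis_cp1252 = "\u2026".encode("cp1252")  # one byte: 85

print(looks_like_utf8(ellipsis_utf8))    # valid UTF-8
print(looks_like_utf8(ellipsis_cp1252))  # a lone 0x85 is not valid UTF-8
```

To check a real file, read it in binary mode and pass the bytes to the
helper. A clean decode does not prove the file is UTF-8, but a failure
shows that it is not.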
>
> Am I using the right text converter?
> I'm doing some stuff with regexp and rewriting the file. These
> characters have no translation to ASCII or ISO; could that be the
> problem?
Which regular expression library do you use? How are you opening the file
you're writing the output into?
Levente
>
> 2011/8/29 Levente Uzonyi <leves at elte.hu>:
>> On Mon, 29 Aug 2011, Gonzalo Romano wrote:
>>
>>> Hi, I've been working on a script to fix some XML files for a web
>>> application, and I'm having some trouble with character encoding.
>>> It seems there are some characters that Squeak does not recognize, like
>>> "..." -> U+2026, and U+2014, which MS Word uses in its text files.
>>> Could anyone confirm this, and maybe provide a workaround?
>>> thanks in advance!
>>
>> Would you like to display those documents in Squeak or just process the
>> files with a program you wrote?
>> In the first case you have to install and use a font that contains the
>> missing characters (the default font doesn't contain these).
>> In the second case you have to make sure that you're using the right text
>> converter for your document.
>>
>>
>> Levente
>>
>>>
>>>
>>>
>>> --
>>> Gonzalo, Romano
>>>
>>>
>>
>>
>
>
>
> --
> Gonzalo, Romano
>
>