[squeak-dev] Re: a diacritics free version of a string

stephan at stack.nl stephan at stack.nl
Wed Jun 3 11:23:47 UTC 2009


Philippe wrote:
> The Unicode solution would be to do normalization with full
> decomposition and then a regex on \p{InCombiningDiacriticalMarks} and
> replace it with an empty string or something similar.

I don't think that is enough: removing diacritics is language dependent.
In German, o-umlaut is conventionally replaced by oe, but in Dutch the
equivalent is a plain o.
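
To make the difference concrete, here is a rough sketch in Python rather than
Squeak. It does the decomposition-and-strip step Philippe describes, but
applies a per-language replacement table first. The function name
strip_diacritics, the LANGUAGE_OVERRIDES table, and the "de"/"nl" keys are
made up for illustration, and the table is far from complete.

```python
import re
import unicodedata

# Combining Diacritical Marks block (U+0300..U+036F), the range that
# \p{InCombiningDiacriticalMarks} refers to.
COMBINING_MARKS = re.compile("[\u0300-\u036f]")

# Hypothetical, incomplete per-language overrides applied before the
# generic stripping step. German expands umlauts; Dutch has no overrides
# here, so it falls through to plain stripping.
LANGUAGE_OVERRIDES = {
    "de": {
        "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",  # a-, o-, u-umlaut
        "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",  # capital forms
    },
    "nl": {},
}


def strip_diacritics(text, language=None):
    """Return text without diacritics, honouring language-specific rules."""
    # Compose first so the precomposed keys in the override table match.
    text = unicodedata.normalize("NFC", text)
    for source, replacement in LANGUAGE_OVERRIDES.get(language, {}).items():
        text = text.replace(source, replacement)
    # Full canonical decomposition, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return COMBINING_MARKS.sub("", decomposed)
```

With this, strip_diacritics("sch\u00f6n", "de") gives "schoen", while
strip_diacritics("co\u00f6rdinatie", "nl") gives "coordinatie". Without a
language argument, every string gets the generic treatment.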

Stephan


