[squeak-dev] fastest #isUtf8

Torge Husfeldt torge.husfeldt at gmx.de
Fri Jan 24 22:30:14 UTC 2020


Priceless:

Höhrmann’s finite-state machine
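
To make the idea concrete, here is a rough sketch of what a state-machine check can look like. It is not Höhrmann's table-driven decoder itself, only a hand-written state machine in the same spirit, and it is in Python rather than Smalltalk. The name `is_utf8` is made up for illustration. The byte ranges follow the well-formed UTF-8 sequences of RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF).

```python
def is_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper, a sketch)."""
    remaining = 0          # continuation bytes still expected
    lo, hi = 0x80, 0xBF    # allowed range for the next continuation byte
    for b in data:
        if remaining == 0:
            if b <= 0x7F:
                continue               # ASCII, stay in the start state
            elif 0xC2 <= b <= 0xDF:
                remaining = 1          # 2-byte sequence (0xC0/0xC1 would be overlong)
            elif 0xE0 <= b <= 0xEF:
                remaining = 3 - 1      # 3-byte sequence
                if b == 0xE0:
                    lo = 0xA0          # reject overlong 3-byte forms
                elif b == 0xED:
                    hi = 0x9F          # reject UTF-16 surrogates
            elif 0xF0 <= b <= 0xF4:
                remaining = 3          # 4-byte sequence
                if b == 0xF0:
                    lo = 0x90          # reject overlong 4-byte forms
                elif b == 0xF4:
                    hi = 0x8F          # reject code points above U+10FFFF
            else:
                return False           # stray continuation or invalid lead byte
        else:
            if not lo <= b <= hi:
                return False
            lo, hi = 0x80, 0xBF        # only the first continuation is restricted
            remaining -= 1
    return remaining == 0              # input must not end mid-sequence
```

Called as `is_utf8("Höhrmann".encode("utf-8"))` it should answer true, and false for a truncated or overlong sequence. A table-driven version trades these branches for two array lookups per byte, which is presumably where the speed comes from.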

Sent from my iPhone

> On 24.01.2020 at 22:15, Jakob Reschke <forums.jakob at resfarm.de> wrote:
> 
> 
> Hi Chris,
> 
> I don't have an implementation ready.
> But here is a starting point for the research: https://lemire.me/blog/2018/05/09/how-quickly-can-you-check-that-a-string-is-valid-unicode-utf-8/
> Not Smalltalk, but a few approaches.
> 
> Is there any chance to do vectorized computation (using SIMD registers and instructions) from Squeak? Can the JIT compiler generate such code?
> 
> Kind regards,
> Jakob
> 
>> On Thu, 23 Jan 2020 at 01:46, Chris Muller <ma.chris.m at gmail.com> wrote:
>> For the GraphQL engine, I have to validate String inputs as being valid UTF-8.  Before researching it, I thought I'd check whether anyone has already done it and is willing to share their implementation.
>> 
>> I see we have conversions for this, but I need a boolean response saying whether it's valid UTF-8, as fast as possible.
>> 
>> Thanks!
>> 
>> 
> 