Re: speech synthesis

1 May 1998


      On Tue, 28 Apr 1998, Paul Fernhout wrote:
...
Rsynth is public domain I believe. Do you have any plans to put your
work with it under the Squeak license (or something similar)?  If so,
I'd be interested in helping out as a tester or with comments related to
getting it to work efficiently or better under Squeak.
It was just an experiment... I'd like to have the time to do it much 
better. Anyway, everything I do is always public domain, so if you or 
others want it I can post it. I will send it to you and to a Squeak ftp
site next Tuesday (I don't have it here, sorry), and in the meantime I'll 
try to improve it a little.
...
How slow is it? What system (processor/speed/memory) are you running it
on? Is it as understandable as the original rsynth itself? Are there
specific Squeak speed related problems (like clicking or drop outs or
stuttering) that could be worked around?
Roughly, it needed about 10 to 20 seconds to say 'how are you' at a sampling 
rate of 8 kHz on a 486 DX2/66 with 12 MB of RAM running Linux. It sounds like
rsynth, except for intonation (stress), which I have not yet
implemented. It can probably be sped up by adding some primitives.
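
To put those timings in perspective, here is a rough back-of-the-envelope sketch in Python. The utterance length is an assumption (the figure of about one second for 'how are you' is a guess, not a measurement), and the function names are made up for illustration.

```python
SAMPLE_RATE = 8000         # Hz, the sampling rate quoted above
ASSUMED_UTTERANCE_S = 1.0  # ASSUMPTION: 'how are you' lasts about one second

def realtime_factor(compute_s, audio_s=ASSUMED_UTTERANCE_S):
    """Seconds of computation needed per second of generated audio."""
    return compute_s / audio_s

def samples_per_second(compute_s, audio_s=ASSUMED_UTTERANCE_S):
    """Output samples generated per second of computation."""
    return SAMPLE_RATE * audio_s / compute_s

for compute_s in (10, 20):
    print(compute_s, realtime_factor(compute_s), samples_per_second(compute_s))
```

Under that assumption the synthesizer runs 10 to 20 times slower than real time, which is the gap that primitives would have to close.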
...
Is it possible to use the Smalltalk->C translator to take your work and
build it into the VM (like some current sound primitives)? It would have
to be written with the translator restrictions in mind, or the
translator would have to be expanded to support it.
I tried to do it, but the translator does not support assigning to instance 
variables that are not integers (floats, for instance) and does not support 
returns other than ^self.
...
Festival is only free for non-commercial use I believe. If you used that
code directly, your work could not be part of the general Squeak
distribution under the Squeak license (although I guess it could be
distributed as an add on).
I didn't mean using Festival code directly, but using a diphone 
concatenation approach, the way Festival and other synthesizers 
(most of them, I think) do. The disadvantage of that method is that
it needs a big database (between 4 and 12 MB) of sampled spoken
units, coded so that the duration, pitch and amplitude of the voice
are easy to change; these units are concatenated to form words and 
phrases. The advantage of this approach is that the resulting voice is very
natural sounding, so much so that it can even make the synthesizer sing, as
in the Lyricos project.
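
A minimal sketch of the concatenation idea, in Python for brevity. This is not Festival's code or anything from the Squeak experiment: the function names, the '_' silence marker, the phoneme labels and the sine-tone "units" are all stand-ins invented for illustration, and a real system would load recorded diphones and also adjust duration, pitch and amplitude before joining them.

```python
import math

SAMPLE_RATE = 8000  # Hz; an assumed rate for this sketch

def to_diphones(phonemes):
    """Pair each phoneme with its successor, padding with silence '_' at both ends."""
    padded = ["_"] + list(phonemes) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

def fake_unit(freq_hz, n_samples=400):
    """Stand-in for a recorded diphone: a short sine tone, NOT real speech."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

def concatenate(units, overlap=40):
    """Join units, linearly cross-fading `overlap` samples at each boundary."""
    out = list(units[0])
    for unit in units[1:]:
        k = min(overlap, len(out), len(unit))
        start = len(out) - k
        for i in range(k):
            w = (i + 1) / (k + 1)  # weight of the incoming unit
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[k:])
    return out

def synthesize(phonemes, database):
    """Look up each diphone in the database and concatenate the units."""
    return concatenate([database[name] for name in to_diphones(phonemes)])

# Hypothetical two-phoneme utterance with made-up phoneme labels.
phonemes = ["h", "aw"]
database = {name: fake_unit(200 + 100 * n)
            for n, name in enumerate(to_diphones(phonemes))}
samples = synthesize(phonemes, database)
print(to_diphones(phonemes), len(samples))
```

The cross-fade at each boundary is only the simplest way to avoid clicks at the joins; the size of the unit database is where the 4 to 12 MB goes.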
regards,
Luciano.-