speech synthesis

Luciano Esteban Notarfrancesco lnotarfr at dc.uba.ar
Thu Apr 30 22:30:14 UTC 1998


On Tue, 28 Apr 1998, Paul Fernhout wrote:
> Rsynth is public domain I believe. Do you have any plans to put your
> work with it under the Squeak license (or something similar)?  If so,
> I'd be interested in helping out as a tester or with comments related to
> getting it to work efficiently or better under Squeak.  

It was just an experiment... I'd like to have the time to do it much 
better. Anyway, everything I do is always public domain, so if you or 
others want it I can post it. I will send it to you and to a Squeak ftp
site next Tuesday (I don't have it here, sorry), and in the meantime
I'll try to improve it a little.

> How slow is it? What system (processor/speed/memory) are you running it
> on? Is it as understandable as the original rsynth itself? Are there
> specific Squeak speed related problems (like clicking or drop outs or
> stuttering) that could be worked around?

Roughly, it needed about 10 to 20 seconds to say 'how are you' at a
sampling rate of 8 kHz on a 486 DX2 66 with 12 MB of RAM running Linux.
It sounds like rsynth, except for intonation (stress), which I have not
yet implemented. It could probably be sped up by adding some primitives.

> Is it possible to use the Smalltalk->C translator to take your work and
> build it into the VM (like some current sound primitives)? It would have
> to be written with the translator restrictions in mind, or the
> translator would have to be expanded to support it.

I tried to do it, but the translator does not support changing instance 
variables that are not integers (floats, for instance), and it does not
support returns other than ^self.

> Festival is only free for non-commercial use I believe. If you used that
> code directly, your work could not be part of the general Squeak
> distribution under the Squeak license (although I guess it could be
> distributed as an add on).

I didn't mean using Festival code directly, but using a diphone 
concatenation approach, the way Festival and other synthesizers 
(most of them, I think) do. The disadvantage of that method is that
it needs a big database (between 4 and 12 MB) of sampled spoken
units, coded so that the duration, pitch and amplitude of the voice are
easy to change; these units are concatenated to form words and 
phrases. The advantage of this approach is that the resulting voice
sounds very natural, so much so that the synthesizer can even be made to
sing, as in the Lyricos project.
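
To make the concatenation idea concrete, here is a rough sketch in
Python. It is not Festival's method nor the Squeak code discussed in
this thread. The "units" are plain sine tones standing in for recorded
diphones, and the names fake_unit, stretch, scale and concatenate are
made up for this illustration. The duration change is a naive
resampling that also shifts pitch, which a real unit coding would avoid.

```python
import math

SAMPLE_RATE = 8000  # Hz, the sampling rate mentioned earlier in this message


def fake_unit(freq_hz, duration_s):
    """Stand-in for a recorded diphone: a sine tone (hypothetical data)."""
    n = int(round(SAMPLE_RATE * duration_s))
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def stretch(samples, factor):
    """Change duration by linear-interpolation resampling.

    Naive on purpose: this also shifts the pitch. A real diphone
    database is coded so duration and pitch can change independently.
    """
    n_out = max(2, int(round(len(samples) * factor)))
    out = []
    for i in range(n_out):
        pos = i * (len(samples) - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def scale(samples, gain):
    """Change amplitude by a constant gain."""
    return [s * gain for s in samples]


def concatenate(units, overlap):
    """Join units in order, crossfading `overlap` samples at each boundary."""
    result = list(units[0])
    for unit in units[1:]:
        k = min(overlap, len(result), len(unit))
        for i in range(k):
            w = (i + 1) / (k + 1)  # weight ramps toward the incoming unit
            idx = len(result) - k + i
            result[idx] = result[idx] * (1 - w) + unit[i] * w
        result.extend(unit[k:])
    return result


# Three made-up units; the middle one is lengthened and made quieter.
units = [fake_unit(220, 0.10), fake_unit(330, 0.10), fake_unit(440, 0.10)]
units[1] = scale(stretch(units[1], 1.5), 0.8)
speech = concatenate(units, overlap=80)
```

The crossfade at each boundary is only there to avoid clicks where two
units meet; the overlap of 80 samples (10 ms at 8 kHz) is an arbitrary
choice for the example.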

regards,
Luciano.-




