Discussion:
Formant shifting
b***@anthropics.com
2001-10-26 09:24:38 UTC
Permalink
Hi,

I'm interested in approaches to applying the characteristics of one voice to
another, perhaps through the use of analysis and resynthesis techniques.
What research has been done in this area? - any references to books/web
pages/papers/etc would be much appreciated. One could for example use
formant shifting techniques - could anyone tell me about resources on this?
Is there any source code available anywhere on the web?

Thanks very much,
Ben

dupswapdrop -- the music-dsp mailing list and website: subscription info,
FAQ, source code archive, list archive, book reviews, dsp links
http://shoko.calarts.edu/musicdsp/
Peter Maas
2001-10-26 10:03:21 UTC
Permalink
perhaps you should have a look at the following link:

http://fonsg3.let.uva.nl/praat/manual/Praat_program.html


gr,

P

Ian Lewis
2001-10-26 16:20:20 UTC
Permalink
A decent vocoder will go a long way toward accomplishing this. It depends on
which characteristics you want to transfer, what your source material is,
and how accurate you want the final sound to be.
Ian
Svante Stadler
2001-10-26 19:16:07 UTC
Permalink
Analysis and resynthesis is usually done with LPC, which finds the inverse
filter that whitens a short piece of the signal, i.e. it models that piece as
an AR process driven by white noise. This is closely related to the
source/filter model of the voice, which (usually) assumes that the voice
excitation is a pulse train and the human articulation system is an all-pole
transfer function. This works pretty well, at least for non-nasal sounds.
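
A minimal sketch of the LPC analysis/resynthesis loop described above, assuming the autocorrelation method with a Levinson-Durbin recursion (one common way to compute LPC, not necessarily what any particular product uses). The function names, the test signal and the order are illustrative assumptions, and a real analyser would window each short frame first.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin.
    Returns (a, err) with a[0] == 1, where A(z) = sum_k a[k] z^-k whitens the frame."""
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this stage of the recursion
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def inverse_filter(frame, a):
    """Analysis: pass the frame through A(z) to get the (ideally white) residual."""
    return np.convolve(frame, a)[:len(frame)]

def all_pole_filter(excitation, a):
    """Synthesis: pass an excitation through the all-pole filter 1/A(z)."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out[n] = acc
    return out

# Demo on a synthetic AR(2) process driven by white noise (assumed test signal).
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
true_a = np.array([1.0, -1.3, 0.6])
signal = all_pole_filter(noise, true_a)

est_a, _ = lpc(signal, order=2)
residual = inverse_filter(signal, est_a)    # source estimate
rebuilt = all_pole_filter(residual, est_a)  # resynthesis from source + filter
```

Feeding a different excitation (for example a pulse train at another pitch) into `all_pole_filter` with the same coefficients is the usual way to change the source while keeping the estimated filter.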

The standard formant preservation technique, used by Steinberg and others,
is a pitch-tracking algorithm that finds the individual FOFs (formant wave
packets) and changes the interval between them, thus changing the pulse
train but keeping the articulatory transfer function. Obviously it only
works with monophonic signals; otherwise your pitch-tracking algorithm will
be confused.
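
The idea of changing the interval between wave packets can be sketched as a pitch-synchronous overlap-add (a PSOLA-style scheme, named here as an assumption rather than as the method any vendor actually ships). The sketch assumes the pitch marks are already known; `respace_grains` and the synthetic test signal are hypothetical names and data.

```python
import numpy as np

def respace_grains(signal, marks, new_period):
    """Cut two-period windowed grains centred on the pitch marks and
    overlap-add them again with a spacing of new_period samples."""
    marks = np.asarray(marks)
    out = np.zeros(len(signal))
    t = int(marks[0])
    while t <= marks[-1]:
        # Nearest analysis mark to the current synthesis position
        i = int(np.argmin(np.abs(marks - t)))
        # Local period taken from the neighbouring mark
        if i + 1 < len(marks):
            period = int(marks[i + 1] - marks[i])
        else:
            period = int(marks[i] - marks[i - 1])
        lo, hi = int(marks[i]) - period, int(marks[i]) + period
        if lo >= 0 and hi <= len(signal) and t - period >= 0 and t + period <= len(out):
            # Periodic Hann window: copies shifted by one period sum to one
            k = np.arange(2 * period)
            window = 0.5 - 0.5 * np.cos(2 * np.pi * k / (2 * period))
            out[t - period:t + period] += signal[lo:hi] * window
        t += new_period
    return out

# Demo: a pulse train (period 100 samples) ringing a short damped resonance.
n = np.arange(60)
resonance = np.exp(-0.08 * n) * np.sin(2 * np.pi * n / 12.0)
pulses = np.zeros(2000)
pulses[::100] = 1.0
voiced = np.convolve(pulses, resonance)[:2000]
marks = np.arange(200, 1801, 100)

same = respace_grains(voiced, marks, 100)   # spacing unchanged
higher = respace_grains(voiced, marks, 80)  # packets closer together: higher pitch
```

With the spacing unchanged the interior of the signal is reproduced; with a spacing of 80 the output repeats every 80 samples while each packet keeps the shape of the original resonance.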

I have done some work on formant preservation for arbitrary signals, mainly
with LPC analysis, but with little success. Algorithmically separating
'tones' from 'resonances' is problematic. If someone solves this problem
(i.e. me) I think it could be the basis for some really cool
samplers/synthesizers.

/Svante.
Yaakov Stein
2001-10-28 13:54:53 UTC
Permalink
Post by b***@anthropics.com
I'm interested in approaches to applying the characteristics
of one voice to another, perhaps through the use of analysis and
resynthesis techniques.
What research has been done in this area? - any references to
books/web pages/papers/etc would be much appreciated. One could for
example use formant shifting techniques - could anyone tell me about
resources on this?
Is there any source code available anywhere on the web?
Try looking for "voice fonts".

The successful techniques I have heard are all based on STC (sinusoidal
transform coding, i.e. sinusoidal decomposition) rather than LPC-based
modeling. If I remember correctly this traces back to an article by
McCandless, but I don't have the reference here right now.
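
As a rough illustration of what a sinusoidal decomposition does for a single frame, the sketch below picks the strongest peaks of a windowed FFT and rebuilds the frame as a sum of sinusoids. It is a simplified assumption of the general idea, not the method of the article mentioned above: phases and frame-to-frame peak tracking are ignored, and the function names and the 8 kHz test tone are made up for the example.

```python
import numpy as np

def sinusoidal_peaks(frame, sample_rate, max_peaks=10):
    """Return (frequencies_hz, amplitudes) of the strongest spectral peaks."""
    n = len(frame)
    window = np.hanning(n)
    mag = np.abs(np.fft.rfft(frame * window))
    # Local maxima of the magnitude spectrum (edge bins excluded)
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
    bins = np.nonzero(is_peak)[0] + 1
    # Keep only the strongest peaks, then sort them by frequency
    bins = np.sort(bins[np.argsort(mag[bins])[::-1][:max_peaks]])
    freqs = bins * sample_rate / n
    # A windowed sinusoid of amplitude A peaks at about A * sum(window) / 2
    amps = mag[bins] / (np.sum(window) / 2.0)
    return freqs, amps

def resynthesize(freqs, amps, n, sample_rate):
    """Sum of zero-phase cosines; a real system would also keep the phases."""
    t = np.arange(n) / sample_rate
    out = np.zeros(n)
    for f, a in zip(freqs, amps):
        out += a * np.cos(2 * np.pi * f * t)
    return out

# Demo: two partials at telephone-style 8 kHz sampling.
sample_rate = 8000
t = np.arange(1024) / sample_rate
frame = np.cos(2 * np.pi * 500 * t) + 0.5 * np.cos(2 * np.pi * 1500 * t)
freqs, amps = sinusoidal_peaks(frame, sample_rate, max_peaks=2)
rebuilt = resynthesize(freqs, amps, len(frame), sample_rate)
```

Voice modification in such a model amounts to changing the frequencies and amplitudes before resynthesis, which is where the spectral envelope has to be estimated separately if formants are to be kept.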

If you have problems tracing it down email me and I will look it up for you.

Jonathan (Y) Stein ***@dspcsp.com

b***@anthropics.com
2001-10-29 12:27:48 UTC
Permalink
Thanks very much for everyone's responses so far.

Perhaps I could outline my requirements more precisely: the source will be
'telephone' quality speech, and the output should be of similar quality. It
is important that the output does not sound synthetic - it is preferable for
the audio not to be changed, rather than to sound like a robot (I realise
this is likely to be a trade-off). The characteristics I wish to transfer
from one voice to another are pitch, formants, breathiness, etc.

I would be very grateful for any references on STC (sinusoidal
decomposition) from Jonathan (or anyone else).

Thanks also for the pointer to AT&T's text-to-speech technology
(http://www.naturalvoices.att.com/products/faq.html). It's the most
impressive text-to-speech I've heard (I think they're using a similar
technique to an Edinburgh (UK) based group). However, I don't see this
complete analysis and re-synthesis technique as being suitable for me: the
result doesn't sound human enough. I may also wish to preserve some other
aspects of the original speaker's performance.

Are there any audio files on the net which demonstrate the quality which can
be achieved through the LPC analysis and resynthesis technique - or any
other technique?

Any further information would be greatly appreciated.

Thanks
Ben


piter pasma ..ritz
2001-10-29 14:18:18 UTC
Permalink
a while back, while i was reading up on wavelets and compression
technology, i stumbled upon a paper that talked about using wavelets
to emulate another voice using a sample, or something.. [i think they
tried to make someone's voice sound like captain kirk from star trek]

i have lost the url, but the above should provide enough information
to find the exact article with google, which was the search engine
i used to find the article, so it should be there somewhere, or
perhaps on a bookmarks-page found with google... happy +searching~ :)
=====
. piter pasma ....
.. ritz[nd-44-zh] ...
... ritz_rvl(at)yahoo(dot)com ..
.... http://www.ritz.nd-44-zh.net .

Julian Rohrhuber
2001-10-29 15:55:36 UTC
Permalink
Post by piter pasma ..ritz
a while back, while i was reading up on wavelets and compression
technology, i stumbled upon a paper that talked about using wavelets
to emulate another voice using a sample, or something.. [i think they
tried to make someone's voice sound like captain kirk from star trek]
i have lost the url, but the above should provide enough information
to find the exact article with google, which was the search engine
i used to find the article, so it should be there somewhere, or
perhaps on a bookmarks-page found with google... happy +searching~ :)
you could also have a look at the xavier serra homepage
(http://www.iua.upf.es/~xserra/)


the presentation at icmc2000 was very funny and convincing.

there is a pdf on that site:
(http://www.iua.upf.es/~xserra/articles/icmc-00/voice-morphing.pdf)
--
**

piter pasma ..ritz
2001-10-29 14:27:54 UTC
Permalink
Post by b***@anthropics.com
Thanks very much for everyone's responses so far.
ok, so i couldn't resist, and searched for the page i was telling
you about myself.. [at least i knew what to look for :) ]
http://www.amara.com/current/wavelet.html#soundfun
hope it helps ya.

- ritz

=====
. piter pasma ....
.. ritz[nd-44-zh] ...
... ritz_rvl(at)yahoo(dot)com ..
.... http://www.ritz.nd-44-zh.net .

Tóth László
2001-10-29 16:59:23 UTC
Permalink
Post by Julian Rohrhuber
the presentation at icmc2000 was very funny and convincing.
(http://www.iua.upf.es/~xserra/articles/icmc-00/voice-morphing.pdf)
I think this paper is about morphing two _given_ voices into each other
and not about modifying one voice while preserving the formants...

Formant preservation in sinusoidal modeling seems pretty awkward to me, as
the sinusoidal model does not model formants in any sense...

Laszlo Toth
Hungarian Academy of Sciences *
Research Group on Artificial Intelligence * "Failure only begins
e-mail: ***@inf.u-szeged.hu * when you stop trying"
http://www.inf.u-szeged.hu/~tothl *



Svante Stadler
2001-10-29 17:32:02 UTC
Permalink
I remember reading something quite similar to what you're looking for. I
can't seem to find the link, but I think it might have been done by a
graduate student at Cambridge. I've put the files "male.au" and "female.au"
in this folder:

http://home.swipnet.se/funken/speech/

One of the files was made by processing the other with a speech database. I
don't really know which one is the original, that's how 'unrobotic' it is.
If I'm not mistaken, it is done with LPC.

/Svante
Christopher Weare
2001-10-29 17:57:21 UTC
Permalink
My vote says the male is synthetic. Sounds like it could be lpc. Nice
job though.

-chris

b***@anthropics.com
2001-10-31 09:59:06 UTC
Permalink
Thanks for the responses. The example speech Svante put up does sound
impressive - my guess is also that male.au was formed from female.au.

If you know of any pointers or links to this research then I'd be very
interested to read it. I'd also be interested to hear any more information
anyone else has on the subject of voice transformation, particularly for
systems which do not require large amounts of training time.

Thanks very much,
Ben
Sciss
2001-11-01 19:56:51 UTC
Permalink
I don't know, maybe this could be interesting: Oppenheim/Schafer propose
some generalized system models called "homomorphic transforms" (or
similar); you can use them in some cases to do a kind of deconvolution
if the spectra of the spectra ;----) i.e. the "cepstra" do not overlap.
this may be the case for speech and the source/filter model where the
formant filter spectrum is very slowly varying and the pulse train
spectrum varies fast.

then taking the FFT of the signal, taking the complex logarithm (which
turns the spectral multiplication that corresponds to the convolution into
addition) and low- or highpass filtering this "cepstrum", followed by the
inverse process (take exp() and IFFT) may separate source and filter...? i
have just programmed this a couple of days ago but not yet tried it with
speech; but don't be too optimistic, i got a lot of artifacts and a lot of
noise. also you have to be aware of a couple of problems:

filtering the spectrum just by setting some coefficients of the cepstrum
to zero introduces great frequency aliasing; i use a windowed
sinc-lowpass but this is very slow of course. also taking the complex
logarithm (which is in fact a rectangular->polar transform followed by
taking the log of the amplitude) is tricky because log 0 = -infinity and
because you must properly "unwrap" the phases and do a linear correction
to make it differentiable. and of course you must work in high
resolution (i use 32-bit floating point, but 64 would most certainly be
better). last but not least changing the gain of the cepstral coeffs
introduces distortion (this could be the main point about my artifacts
and noise?)...

just an idea...

ciao,
hanns holger


Kirill 'Big K' Katsnelson
2001-11-02 00:08:06 UTC
Permalink
Some time ago, Sciss wrote...
Post by Sciss
last but not least changing the gain of the cepstra coeffs
introduces distortion (this could be the main point about my artifacts
and noise?)...
Since the cepstrum is logarithmic, multiplying by a constant is a non-linear
transform, while adding a constant is a linear one :) The constant can be
complex, of course.
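
A small numerical check of this point, shown on the log-magnitude spectrum (the cepstrum is just its inverse FFT, so adding a constant to the zeroth cepstral coefficient adds the same constant to every log-spectrum bin, and scaling all coefficients scales the whole log spectrum). The variable names and the random test signal are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(256)
X = np.fft.fft(x)
log_mag, phase = np.log(np.abs(X)), np.angle(X)

def rebuild(log_magnitude):
    """Back to the time domain, reusing the original phase."""
    return np.fft.ifft(np.exp(log_magnitude) * np.exp(1j * phase)).real

added = rebuild(log_mag + np.log(2.0))  # add a constant in the log domain
scaled = rebuild(2.0 * log_mag)         # multiply the log spectrum by a constant

gain_is_linear = np.allclose(added, 2.0 * x)       # a plain gain of 2
scaling_is_linear = np.allclose(scaled, 2.0 * x)   # |X| becomes |X|**2 instead
```

Adding log(2) simply doubles the signal, whereas doubling the log spectrum squares the magnitude spectrum, which reshapes the signal rather than scaling it.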

Anyway, let us know how you are advancing in your research. I was
experimenting with cepstral analysis for speech recognition years ago,
and it is a quite exciting field with often very unusual results :)

-kkm

