[music-dsp] WSOLA

Alex Dashevski

2018-05-25 21:06:50 UTC

Hi,

I want to implement WSOLA in real time.
The pitch is between 5 ms and 20 ms.
The sampling frequency of the system is 48 kHz.
The buffer size is 240 samples.
I want to implement it on Android.
My issue is that my buffer is smaller than the pitch, so I can't understand how I
can implement WSOLA.

Thanks,
Alex

robert bristow-johnson

2018-05-26 23:50:59 UTC

Post by Alex Dashevski
I want to implement WSOLA in real time.
The pitch is between 5 ms and 20 ms.

do you mean the *period* is between 5 ms and 20 ms? or that the
fundamental frequency is between 50 Hz and 200 Hz? this appears to be a
bass instrument

Post by Alex Dashevski
The sampling frequency of the system is 48 kHz.
The buffer size is 240 samples.

that's not long enough. you will never be able to even do the necessary
pitch detection with a buffer that small. (unless you mean the
input/output buffer of the android, then that is plenty long.)
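
To put numbers on the period-versus-buffer question, here is a small arithmetic sketch (the helper names are illustrative, not from the thread). At 48 kHz a 5 ms period is 240 samples and a 20 ms period is 960 samples, so a 240-sample I/O buffer cannot hold even one period of a 50 Hz tone. The `min_analysis_samples` estimate is an assumption made for this sketch: it supposes the similarity measure compares one period of signal against lags of up to one period, so the analysis buffer has to span two of the longest periods.

```python
FS = 48_000  # sampling rate in Hz, as stated in the thread


def ms_to_samples(ms, fs=FS):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(ms * fs / 1000)


def period_ms_to_hz(period_ms):
    """Fundamental frequency in Hz for a period given in milliseconds (f0 = 1/T)."""
    return 1000.0 / period_ms


def min_analysis_samples(max_period_ms, fs=FS):
    """Assumption for this sketch: one period of signal compared against lags
    of up to one period, so the analysis buffer must span two periods."""
    return 2 * ms_to_samples(max_period_ms, fs)


for period_ms in (5.0, 20.0):
    print(f"{period_ms:4.1f} ms period -> {period_ms_to_hz(period_ms):5.1f} Hz, "
          f"{ms_to_samples(period_ms)} samples")
print("analysis buffer for 20 ms periods:", min_analysis_samples(20.0), "samples")
```

This gives 240 and 960 samples for the two periods and a 1920-sample (40 ms) analysis buffer, which is trivial to allocate on a phone even though it is much larger than the 240-sample I/O buffer.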

Post by Alex Dashevski
I want to implement it on Android.

then you should have no problem securing a megabyte of memory.

Post by Alex Dashevski
My issue is that my buffer is smaller than the pitch,

it's the *period*. pitch is not measured in ms.

Post by Alex Dashevski
I can't understand how I can implement WSOLA.

you can't unless you can allocate more memory. that's a programming
issue with the android.

--
r b-j ***@audioimagination.com

"Imagination is more important than knowledge."

Alex Dashevski

2018-05-27 05:22:28 UTC

Hi,
I mean that the fundamental frequency is between 50Hz and 4"50Hz. Right?
Why isn't the period of the pitch equal to 1/(fundamental frequency)?

What about subsampling? That would mean the processing is done
at 8 kHz.

What about pitch shifting?

How can I prove to my instructor that I can't implement WSOLA?

I have already asked this question on the Android NDK group, but they referred me
to this forum.

Thanks,
Alex

_______________________________________________
dupswapdrop: music-dsp mailing list
https://lists.columbia.edu/mailman/listinfo/music-dsp

Phil Burk

2018-05-27 18:41:13 UTC

Hello Alex,

The period is 1 / frequency. I think what was confusing is that you said
the pitch was 5 milliseconds. Pitch is normally described either in Hertz
or in semitones.

Also, the buffer size that you are referring to is a single buffer used for
reading or writing the audio data. It does not limit the total amount of
sample data you can keep. The buffer size used for reading and writing should
not affect the signal processing algorithm, because you are basically processing
one sample at a time anyway. You can collect the samples into larger blocks of
data if you need to, but that is independent of the input/output buffer size.
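
A minimal sketch of what collecting samples into blocks can look like, using a hypothetical `BlockCollector` class (not an Android API). The audio callback hands over whatever it receives (240 samples here), and the processing side only ever sees fixed-size blocks of its own choosing. The 1024-sample block size is an arbitrary example.

```python
class BlockCollector:
    """Gather samples that arrive in arbitrary-sized I/O buffers (e.g. 240
    samples per callback) into fixed-size blocks for the DSP code."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.pending = []  # samples received but not yet handed out

    def push(self, chunk):
        """Append one I/O buffer; return every complete block now available."""
        self.pending.extend(chunk)
        blocks = []
        while len(self.pending) >= self.block_size:
            blocks.append(self.pending[:self.block_size])
            del self.pending[:self.block_size]
        return blocks


# Usage: ten simulated 240-sample callbacks feeding 1024-sample blocks.
collector = BlockCollector(block_size=1024)
n_blocks = 0
for _ in range(10):
    for block in collector.push([0.0] * 240):
        n_blocks += 1  # the analysis / WSOLA step would run on `block` here
```

After ten callbacks (2400 samples) two complete blocks have been handed out and 352 samples wait for the next callback, so the I/O buffer size only changes how often a block becomes ready, not the algorithm itself.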

Most smartphones, including Android phones, should be fast enough to implement
this algorithm. It should be possible. You might want to start with just reading
a WAV file in, processing the data, then writing a WAV file out. Separate
the reading and writing of the file from the processing algorithm. Then,
when you have it working, you can just port it to Android.
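
The offline test harness suggested above could be sketched with Python's standard `wave` module. `process_wav` and `process` are hypothetical names, the sketch only handles 16-bit mono files, and `process` is an identity placeholder where the pitch shifter would go. Keeping the file I/O in `process_wav` and the DSP in a plain function of a list of floats is what makes the later port straightforward.

```python
import array
import sys
import wave


def process(samples):
    """Placeholder for the DSP algorithm under test (identity here)."""
    return samples


def process_wav(in_path, out_path, process_fn=process):
    """Read a 16-bit mono WAV, run process_fn on float samples, write the result."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2 or params.nchannels != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        pcm = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian

    x = [s / 32768.0 for s in pcm]  # int16 -> float in [-1, 1)
    y = process_fn(x)

    # float -> int16 with clipping
    out = array.array("h", (max(-32768, min(32767, round(v * 32768.0))) for v in y))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(params.framerate)
        dst.writeframes(out.tobytes())
```

A call such as `process_wav("input.wav", "output.wav", my_pitch_shifter)` (file and function names hypothetical) then exercises the algorithm without any audio hardware involved.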

If you have Android-specific questions about the Android APIs, then please
use the Android mailing list. If you have mathematical questions about DSP,
then this is a better mailing list.

Phil Burk

Alex Dashevski

2018-05-27 18:56:00 UTC

Hi,

I don't understand your answer.
I already have an audio echo application on Android. The buffer size and the
sampling frequency influence the latency.
Could you explain to me how to implement WSOLA in real time? It is a bit more
difficult.

Thanks,
Alex

robert bristow-johnson

2018-05-29 01:19:20 UTC

---------------------------- Original Message ----------------------------

Subject: Re: [music-dsp] WSOLA

From: "Alex Dashevski" <***@gmail.com>

Date: Sun, May 27, 2018 2:56 pm

To: ***@mobileer.com

music-***@music.columbia.edu

--------------------------------------------------------------------------

yes WSOLA is a little difficult, but less difficult than a phase vocoder.

now, when you say "WSOLA" and "real-time" in the same breath, do you mean a pitch shifter? not a time-scaler, right? because pitch shifting can be done in real time, but time-scaling has to turn an input buffer (with some number of samples) into a longer (more samples) or shorter (fewer samples) buffer at the same sample rate. that can't be done in an operation that runs on indefinitely, even with a long throughput delay. eventually the input and output pointers will collide (or drift apart without bound).

but you can combine time-scaling and resampling (the latter is mathematically well defined) to get pitch shifting that can run on forever. one operation increases the number of samples and the other reduces the number of samples in exactly reciprocal proportion, so the number of samples coming out in every buffer of time is the same as the number going in.
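
A bookkeeping sketch of the pointer argument (no audio is processed; `backlog_after` is a hypothetical helper). Every block, the microphone delivers `block` samples and the speaker must receive `block` samples. A time-scaler on its own consumes input at 1/stretch of the output rate, so the unread input either grows without bound (stretch > 1) or goes negative, meaning the reader would need samples that have not arrived yet (stretch < 1). Following it with a resampler whose step equals the stretch factor makes the two rates cancel.

```python
def backlog_after(n_blocks, stretch, with_resampling, block=240):
    """Input samples waiting to be read (positive) or needed before they have
    arrived (negative) after n_blocks of real-time operation."""
    arrived = n_blocks * block
    # Time-scaled samples needed to emit `block` output samples per block.
    # A resampler reading with step `stretch` needs stretch * block of them.
    scaled_needed = n_blocks * block * (stretch if with_resampling else 1.0)
    # The time-scaler consumes 1/stretch input samples per sample it produces.
    consumed = scaled_needed / stretch
    return arrived - consumed
```

With 240-sample blocks, stretch = 1.5 and no resampling, the backlog grows by 80 samples per block; with stretch = 0.8 it falls by 60 samples per block; with resampling it stays at zero for any stretch factor, which is why the pitch-shifter form can run forever.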

now, the "S" in the acronym stands for "Similarity", so you have to position the windows in the input waveform so that they are similar to the waveform in the output. the waveform in the first half of the input window should match the waveform in the last half of the output window of the previous frame. normally the frame hop is exactly half of the window width, and the window shape should be complementary, like a Hann window.
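
The "complementary" property can be checked numerically. This sketch assumes the periodic form of the Hann window, w[n] = 0.5 - 0.5 cos(2πn/N) with N even; with a hop of exactly N/2, overlapping windows sum to exactly 1, so overlap-add does not modulate the level. (The symmetric form, with N-1 in the denominator, only comes close.) The function names are illustrative.

```python
import math


def periodic_hann(n_win):
    """Periodic Hann window w[n] = 0.5 - 0.5*cos(2*pi*n/N), n = 0..N-1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_win) for n in range(n_win)]


def overlap_add_gain(n_win, n_frames=8):
    """Overlap-add n_frames copies of the window at a hop of n_win/2 and
    return the summed gain where every sample is covered by two windows."""
    hop = n_win // 2
    w = periodic_hann(n_win)
    total = [0.0] * (hop * (n_frames - 1) + n_win)
    for f in range(n_frames):
        for n in range(n_win):
            total[f * hop + n] += w[n]
    return total[hop:hop * n_frames]  # drop the lone half-windows at both ends


gain = overlap_add_gain(1024)
print(min(gain), max(gain))  # both within rounding error of 1.0
```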

i believe that 240-sample buffer in the Android is an input/output sample buffer for the media I/O. you can't really do anything with that buffer except pull in input samples and push out output samples. you will have to (using whatever programming environment one uses to make Android apps) allocate memory and create your own buffers to hold about 100 ms of sound.

in that buffer, you will use a technique called AMDF, ASDF, or autocorrelation to measure waveform similarity. your input frame hop distance (which has both integer and fractional parts) is the output frame hop size times the reciprocal of the time-stretch factor. so, if you're time-stretching (instead of time-compressing), your input frame will advance more slowly than your output frame, and that increases the number of samples. but in that output buffer, you will resample (interpolate) with a step size equal to the time-stretch factor (do this only for output samples that have already been overlapped and added), thus reducing the final output number of samples back to the original number. you will allow some jitter on the input window position, informed by the result of the waveform similarity analysis.
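
A compact offline sketch of the WSOLA time-stretch step described above, under assumptions that are not from the thread: a periodic Hann window of even length `n_win`, an output hop of `n_win/2`, AMDF as the similarity measure, and a search range of ±`max_jitter` samples (480 by default, half of the longest 20 ms period at 48 kHz). The input hop is the output hop divided by the stretch factor and is kept as a float so its fractional part accumulates. For each frame, the candidate whose first half best matches the previous frame's second half (the part it will overlap in the output) is chosen, then windowed and overlap-added. The names `amdf`, `best_offset` and `wsola_stretch` are hypothetical; the final resampling step is not included, and the code favors clarity over speed.

```python
import math


def amdf(a, b):
    """Average magnitude difference function: small when a and b look alike."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def best_offset(x, template, nominal, max_jitter):
    """Return d in [-max_jitter, max_jitter] whose segment
    x[nominal+d : nominal+d+len(template)] has the lowest AMDF against
    `template`; out-of-range candidates are skipped."""
    n = len(template)
    best_d, best_score = 0, math.inf
    for d in range(-max_jitter, max_jitter + 1):
        start = nominal + d
        if start < 0 or start + n > len(x):
            continue
        score = amdf(x[start:start + n], template)
        if score < best_score:
            best_d, best_score = d, score
    return best_d


def wsola_stretch(x, stretch, n_win=2048, max_jitter=480):
    """Time-stretch list x by `stretch` (> 1 makes it longer) using WSOLA.
    n_win must be even; output hop = n_win/2, input hop = output hop / stretch."""
    hop_out = n_win // 2
    hop_in = hop_out / stretch  # has integer and fractional parts
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_win) for n in range(n_win)]

    # Count frames whose search range and window stay inside x.
    n_frames = 0
    while round(n_frames * hop_in) + 2 * max_jitter + n_win <= len(x):
        n_frames += 1
    if n_frames == 0:
        return []

    y = [0.0] * ((n_frames - 1) * hop_out + n_win)
    prev = None  # input start position chosen for the previous frame
    for k in range(n_frames):
        nominal = round(k * hop_in) + max_jitter
        if prev is None:
            start = nominal
        else:
            # The new frame's first half overlaps the previous frame's second
            # half in the output, so look for input that resembles it.
            template = x[prev + hop_out:prev + n_win]
            start = nominal + best_offset(x, template, nominal, max_jitter)
        for n in range(n_win):  # windowed overlap-add
            y[k * hop_out + n] += window[n] * x[start + n]
        prev = start
    return y
```

For example, `wsola_stretch(x, 1.5)` returns roughly 1.5 times as many samples as `x` (slightly fewer, because frames that would run past the end of the input are dropped).
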
that's how you do WSOLA, as best as i understand it.
--

r b-j ***@audioimagination.com

"Imagination is more important than knowledge."

Alex Dashevski

2018-05-29 05:22:12 UTC

Hi,

I mean WSOLA in real time. How can I prove to my instructor that it's not
possible?

Why do I need to do resampling? Android samples and plays back at the same
frequency (in my case, 48 kHz). Maybe you mean doing the processing at
8 kHz (subsampled)?

I also want to achieve high performance and minimum latency.

How can I prove to my instructor that the correct way to implement this is pitch
shifting and not WSOLA in *real time*?

Thanks,
Alex

robert bristow-johnson

2018-05-29 09:04:24 UTC

Do you mean as a time-scaler or as a pitch-shifter?
WSOLA can and does work in real time in a pitch shifter. But a time-scaler can't be real-time, whether it's WSOLA or a phase vocoder, because a real-time process has to keep turning input into output indefinitely without the input and output pointers colliding or drifting apart without bound.

--
r b-j ***@audioimagination.com
"Imagination is more important than knowledge."

Alex Dashevski

2018-05-29 09:22:17 UTC

Hi,

From what I understood, WSOLA is an algorithm that works in the time domain.
Pitch shifting is a technique that works in the frequency domain.
Thus, I don't understand your answer.
Could you explain in more detail what I need to do?

Thanks,
Alex

Eder Souza

2018-05-29 14:04:28 UTC

WSOLA is a time-domain algorithm. Pitch shifters can work in the time domain
too; there are several ways to do this...

WSOLA is just one time-scaler, but you can pitch shift by combining WSOLA and
resampling:

- Change the time scale using WSOLA (e.g. time-scale by 2.0)
- Use some interpolation to resample (e.g. resample by 1/2.0 = 0.5)

These steps time-scale your signal by a factor of two (2.0) and then
resample the time-scaled signal by 0.5, which gives you a signal pitch-shifted
up by one octave. That is one example of pitch shifting that does not work in
the frequency domain (note that it does not preserve the formants)...
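
The resampling half of this recipe can be sketched with plain linear interpolation, using a hypothetical `resample_linear` helper (linear interpolation is chosen only for brevity). Reading the time-stretched signal with a step equal to the stretch factor, 2.0 in the example above, keeps one output sample per two input samples, which is the "resample by 0.5" step. When the step is greater than 1 this is a decimation, so a real implementation would low-pass filter first (or use a better interpolator) to limit aliasing.

```python
def resample_linear(y, step):
    """Read y at positions 0, step, 2*step, ... using linear interpolation.
    step > 1 shortens the signal, raising its pitch at the same sample rate.
    No anti-aliasing low-pass filter is applied."""
    out = []
    pos = 0.0
    last = len(y) - 1
    while pos <= last:
        i = int(pos)
        frac = pos - i
        nxt = y[i + 1] if i < last else y[i]
        out.append(y[i] + frac * (nxt - y[i]))
        pos += step
    return out
```

Applied with step 2.0 to a 200 Hz sine sampled at 48 kHz, it returns the samples of a 400 Hz sine, one octave up. Applied after a 2.0 time-stretch, it brings the duration back to the original while the pitch stays doubled.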

Do you want to use WSOLA in real time to do time-scaling? Or to use WSOLA in
real time and then resample the time-scaled signal (pitch shifting)? What is
your input data (audio files or microphone)?

The first option is not possible in real time with microphone input.

If you want the first option with audio files as input, you can control
the time-scale factor in real time, changing the tempo of the output audio
while listening; you need to build a ring buffer to control the read and
write positions of your input/output.

The second option, with audio files or a microphone, can be done using a ring
buffer too. For microphone input you need to store some data to process,
which delays the output somewhat.
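
A sketch of such a ring buffer, with a hypothetical `RingBuffer` class: the microphone callback writes, the output callback reads, and pre-filling the buffer with silence sets a fixed input-to-output latency. That slack is where a real-time pitch shifter would keep the samples its analysis window needs. Because reader and writer advance by the same number of samples per callback, the fill level stays constant; the two exceptions flag the failure cases (overrun and underrun).

```python
class RingBuffer:
    """Circular buffer with independent write (input) and read (output) positions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = [0.0] * capacity
        self.write_pos = 0  # total samples written so far
        self.read_pos = 0   # total samples read so far

    def available(self):
        """Samples written but not yet read."""
        return self.write_pos - self.read_pos

    def write(self, samples):
        if self.available() + len(samples) > self.capacity:
            raise OverflowError("overrun: writer would overwrite unread samples")
        for s in samples:
            self.buf[self.write_pos % self.capacity] = s
            self.write_pos += 1

    def read(self, n):
        if n > self.available():
            raise ValueError("underrun: reader would pass the writer")
        out = [self.buf[(self.read_pos + i) % self.capacity] for i in range(n)]
        self.read_pos += n
        return out


# Usage: 960 samples (20 ms at 48 kHz) of latency with 240-sample callbacks.
rb = RingBuffer(capacity=4800)
rb.write([0.0] * 960)               # pre-fill with silence = output delay
outputs = []
for i in range(10):
    rb.write([float(i + 1)] * 240)  # microphone callback delivers chunk i + 1
    outputs.append(rb.read(240))    # speaker callback takes 240 samples
```

The first four output chunks are the pre-filled silence and the fifth is the first microphone chunk, so the output lags the input by exactly 960 samples.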

So which option are you trying?

Regards,

Eder

♪♫ ♫♪
Sent From The Moon and Written With My Thumbs !

alex dashevski

2018-05-29 16:26:24 UTC

Hi,

My input is the microphone, so I need the second option.

Could you give me a reference with an example and code so that I can understand how to implement it?

Thanks,

Alex

2018-05-29 17:04 GMT+03:00 Eder Souza <***@gmail.com <mailto:***@gmail.com> >:

WSOLA is an Time Domain algorithm, Pitch shifters can works in time domain too, there are some ways to do this...

WSOLA is just one time-scaler, but you can pitch shift combining WSOLA and Resample:

- Change the time using WSOLA (ex. time scale by 2.0)

- Use some interpolation(ex. resample scale by 1/2.0=0.5 )

The steps example time scale your signal by one factor of two(2.0) and then you resample the time scaled signal by 0.5, this give you one pitch shifted signal, that is one example of pitch shift that not works in frequency domain (important this not keep the formants)...

You want use WSOLA in real-time to do Time-Scale ? or use WSOLA in real-time to apply resample after apply Time-Scale(pitch shift) ? What is your input data (audio files or microphone) ?

The first option can not be possible in real-time using microphone input.

If you want the first option using audio files as input, you can control the time-scale factor at real time, changing the tempo of the output audio and listening, do you need build a ring buffer to control how read and write data position of your input/output .

The second option using audio files or microphones can be done using a ring buffer too, for microphone input do you need save some data to precess, this can give you some delay output.

So what option are you trying ?

Regards,

Eder

âªâ« â«âª

â â â â â â â â â â â â â â â â

Sent From The Moon and Written With My Thumbs !

On Tue, May 29, 2018 at 6:22 AM, Alex Dashevski <***@gmail.com <mailto:***@gmail.com> > wrote:

Hi,

From what I understood, WSOLA is an algorithm that works in the time domain, while pitch shifting is a technique that works in the frequency domain.

Thus, I don't understand your answer.

Could you explain in more detail what I need to do?

Thanks,

Alex

2018-05-29 12:04 GMT+03:00 robert bristow-johnson <***@audioimagination.com>:

Do you mean as a time-scaler or as a pitch-shifter?

WSOLA can and does work in real time in a pitch-shifter. But a time-scaler can't be real-time, whether it's WSOLA or a phase-vocoder, because a real-time process must keep running indefinitely without the input and output pointers either colliding or drifting away from each other without bound.
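That argument can be checked with a little arithmetic. The following minimal Python sketch (the helper `input_backlog` is hypothetical) counts how many microphone samples are waiting to be consumed after running for a while, assuming the output device always plays 48000 samples per second. A growing positive number means the pointers drift apart without bound (ever-growing memory and latency); a negative number means the output would need input that has not arrived yet, i.e. the pointers collide; zero means the process can run forever.

```python
def input_backlog(seconds, stretch, fs=48000, resample=False):
    """Samples captured but not yet consumed after `seconds` of running.

    A time-scaler with factor `stretch` turns each input sample into
    `stretch` time-scaled samples, so it consumes fs / stretch input samples
    per second of time-scaled audio. With resample=True (the pitch-shifter
    case) the time-scaled audio is read with step `stretch`, so each second
    of output uses fs * stretch time-scaled samples.
    """
    arrived = fs * seconds                      # written by the microphone
    scaled_per_second = fs * stretch if resample else fs
    consumed = scaled_per_second * seconds / stretch
    return arrived - consumed


print(input_backlog(10, 2.0))                 # 240000.0: 5 s behind after 10 s, still growing
print(input_backlog(10, 0.5))                 # -480000.0: would need input from the future
print(input_backlog(10, 2.0, resample=True))  # 0.0: stable, can run forever
```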

--

r b-j ***@audioimagination.com

"Imagination is more important than knowledge."

-------- Original message --------
From: Alex Dashevski <***@gmail.com>

Date: 5/28/2018 10:22 PM (GMT-08:00)
To: robert bristow-johnson <***@audioimagination.com>, music-***@music.columbia.edu
Subject: Re: [music-dsp] WSOLA

Hi,

I mean WSOLA in real time. How can I prove to my instructor that it's not possible?

Why do I need to do resampling? Android records and plays back at the same sample rate (in my case, 48 kHz). Maybe you mean doing the processing at 8 kHz (subsampling)?

I also want to achieve high performance and minimum latency.

How can I prove to my instructor that the correct way to implement this is pitch shifting and not WSOLA in real time?

Thanks,

Alex

2018-05-29 4:19 GMT+03:00 robert bristow-johnson <***@audioimagination.com>:

---------------------------- Original Message ----------------------------
Subject: Re: [music-dsp] WSOLA
From: "Alex Dashevski" <***@gmail.com>
Date: Sun, May 27, 2018 2:56 pm
To: ***@mobileer.com
music-***@music.columbia.edu
--------------------------------------------------------------------------

yes WSOLA is a little difficult, but less difficult than a phase-vocoder.

now, when you say "WSOLA" and "real-time" in the same breath, do you mean a pitch shifter? not a time-scaler, right? because pitch shifting can be done in real time, but time-scaling has to be done with an input buffer (with some number of samples) getting made into a longer (more samples) or shorter (fewer samples) buffer with the same sample rate. that can't be done in an operation that runs on indefinitely, even with a long throughput delay. eventually the input and output pointers will collide.

but you can combine time-scaling and resampling (the latter is mathematically well defined) to get pitch shifting that can run forever. one operation increases the number of samples and the other reduces the number of samples in exactly reciprocal proportion, so the number of samples coming out in every buffer period is the same as the number going in.

now the "S" in the acronym stands for "Similarity", so you have to position the windows in the input waveform so that they are similar to the waveform in the output. the waveform in the first half of the input window should match the waveform in the last half of the output window of the previous frame. normally the frame hop is exactly half of the window width, and the window shape should be complementary, like a Hann window.
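The "complementary" property of the Hann window can be checked numerically. In this minimal Python sketch (function names hypothetical) the periodic form of the Hann window is used, because with a hop of exactly half the window length its overlapping copies sum to exactly 1; the symmetric form (denominator n - 1) only sums to approximately 1.

```python
import math


def periodic_hann(n):
    """Periodic ("DFT-even") Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def overlap_sum(window):
    """Window plus its copy shifted by half its length (50% overlap),
    evaluated over the overlapping half."""
    half = len(window) // 2
    return [window[k] + window[k + half] for k in range(half)]
```

For `w = periodic_hann(1024)`, every value returned by `overlap_sum(w)` equals 1 to within floating-point rounding, since cos(theta) + cos(theta + pi) = 0.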

i believe that 240-sample buffer in the Android is an input/output sample buffer for the media I/O. you can't really do anything with that buffer except pull in input samples and push out output samples. you will have to (using whatever programming environment one uses to make Android apps) allocate memory and create your own buffers to hold about 100 ms of sound (4800 samples at 48 kHz). in that buffer, you will use a technique called AMDF, ASDF, or autocorrelation to measure waveform similarity. your input frame hop distance (which has both integer and fractional parts) is the output frame hop size times the reciprocal of the time-stretch factor. so, if you're time-stretching (instead of time-compressing), your input frame will advance more slowly than your output frame, which increases the number of samples. but in that output buffer, you will resample (interpolate) with a step size equal to the time-stretch factor (do this only for output samples that have already been overlapped and added), thus reducing the final output number of samples back to the original number. you will allow some jitter on the input window that is informed by the result of the waveform similarity analysis.
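Two pieces of that paragraph, the fractional input hop and the AMDF similarity search, might be sketched like this in Python. It is a minimal sketch under assumptions: the function names are hypothetical, the search simply tries every integer offset within ±`tolerance` samples of the nominal position, and it returns the position with the smallest average magnitude difference.

```python
def nominal_input_positions(num_frames, output_hop, stretch):
    """Nominal (pre-jitter) input frame positions. The input hop is the
    output hop divided by the stretch factor; it is kept as a float so the
    fractional part accumulates instead of being lost, then rounded."""
    in_hop = output_hop / stretch
    return [round(k * in_hop) for k in range(num_frames)]


def amdf(a, b):
    """Average magnitude difference of two equal-length segments
    (smaller means more similar)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def best_offset(x, nominal, template, tolerance):
    """Return the position in nominal-tolerance .. nominal+tolerance whose
    len(template)-sample segment of x best matches `template` by AMDF."""
    n = len(template)
    best_pos, best_score = nominal, float("inf")
    for pos in range(nominal - tolerance, nominal + tolerance + 1):
        if pos < 0 or pos + n > len(x):
            continue                    # candidate would run off the buffer
        score = amdf(x[pos:pos + n], template)
        if score < best_score:
            best_pos, best_score = pos, score
    return best_pos


print(nominal_input_positions(4, 100, 3.0))      # [0, 33, 67, 100]
x = [k % 100 for k in range(2000)]               # sawtooth, period 100
print(best_offset(x, 500, x[530:630], 60))       # 530
```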

that's how you do WSOLA, as best as i understand it.
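Putting those pieces together, an offline WSOLA time-scaler might look like the minimal Python sketch below. It is an illustration under assumptions, not rbj's code or a reference implementation: the name `wsola_stretch`, the 256-sample window, the ±64-sample tolerance and the use of ASDF (sum of squared differences) as the similarity measure are all choices made for this example. It processes a whole list at once; for real-time pitch shifting its output would be fed to a resampler reading with step `stretch`, and the input would come from a ring buffer.

```python
import math


def wsola_stretch(x, stretch, n=256, tolerance=64):
    """Time-scale list `x` by `stretch` (>1 longer, <1 shorter) without
    changing its pitch, using waveform-similarity overlap-add.
    n: window length (even); the output hop is n // 2 (50% overlap).
    tolerance: max jitter in samples around each nominal input position."""
    hs = n // 2                                   # output hop
    ha = hs / stretch                             # input hop, fractional
    w = [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]  # periodic Hann
    out_len = int(len(x) * stretch)
    y = [0.0] * (out_len + n)                     # slack so the last frame fits
    prev = 0                                      # input position of previous frame
    k = 0
    while k * hs < out_len:
        nominal = round(k * ha)
        if k == 0:
            pos = 0
        else:
            # input that naturally follows the previous frame: the waveform
            # under the last half of the previous window
            target = x[prev + hs:prev + n]
            pos, best_err = None, float("inf")
            for cand in range(max(0, nominal - tolerance), nominal + tolerance + 1):
                if cand + n > len(x):
                    break                         # candidate runs past the input
                # ASDF between the candidate's first half and the target
                err = sum((a - b) ** 2 for a, b in zip(x[cand:cand + hs], target))
                if err < best_err:
                    pos, best_err = cand, err
            if pos is None:
                break                             # ran out of input
        if pos + n > len(x):
            break
        out_pos = k * hs
        for j in range(n):                        # overlap-add the windowed frame
            y[out_pos + j] += w[j] * x[pos + j]
        prev = pos
        k += 1
    return y[:out_len]
```

For a periodic test signal such as `x = [float(k % 100) for k in range(2400)]`, `wsola_stretch(x, 1.5)` returns 3600 samples whose interior still repeats every 100 samples: the duration changes but the period (pitch) does not. The first half-window is not overlapped, so the output fades in, and it may end with zeros once the input runs out.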

--

r b-j ***@audioimagination.com

"Imagination is more important than knowledge."


robert bristow-johnson

2018-05-29 17:45:32 UTC


---------------------------- Original Message ----------------------------

Subject: Re: [music-dsp] WSOLA

From: "Alex Dashevski" <***@gmail.com>

Date: Tue, May 29, 2018 5:22 am

To: music-***@music.columbia.edu

--------------------------------------------------------------------------

Post by Alex Dashevski
Hi,

From what I understood, WSOLA is an algorithm that works in the time domain.

Post by Alex Dashevski
Pitch shifting is a technique that works in the frequency domain.

that's mistaken. most pitch shifting algorithms in hardware (like the Eventide, the Lexicon PCM-90, Digitech) are time-domain algorithms.

Post by Alex Dashevski
Thus, I don't understand your answer.
Could you explain in more detail what I need to do?

in my answer of May 27, i did explain WSOLA to the best of my ability.
to do something with your Android, first you need to be able to pass samples from the input buffer to the output buffer. then you need to figure out the standard way of creating an app where you can allocate memory in your app *and* get access to those built-in input and output buffers (i believe those are the 240-sample buffers you mention). this is *not* a small project, and you're not even at square 2 if you cannot pass audio through an app of your own. you gotta be able to do that. i have never programmed either an Android or an iPhone.
sorry, at this point this is the best advice i can yell from the helicopter (while you're down in the fray).

Post by Alex Dashevski
Thanks,

for what it's worth...

--

r b-j ***@audioimagination.com

"Imagination is more important than knowledge."