Discussion:
[music-dsp] Resampling
Alex Dashevski
2018-10-03 09:16:56 UTC
Permalink
Hi,

I use a sample rate of 48 kHz and a buffer size of 240 samples.
I implemented pitch shifting with WSOLA and resampling.
But the pitch period is ~20 ms, so I need to decrease the sample rate or
increase the buffer size. As a result, the delay will increase.
If I resample before and after the processing, for example 48 kHz -> 8 kHz
and then 8 kHz -> 48 kHz, will it help?
If so, how should I do it? I also don't understand why I need a filter. It
is supposed to prevent aliasing, but I can't understand why.

Is there an option to decrease the latency or delay?

Thanks,
Alex
Spencer Jackson
2018-10-03 17:55:58 UTC
Permalink
Post by Alex Dashevski
If I resample before and after the processing, for example 48 kHz -> 8 kHz
and then 8 kHz -> 48 kHz, will it help?
Lowering the sample rate can help achieve lower latency by giving you fewer
samples to process in the same amount of time, but just downsampling and
then upsampling back doesn't by itself have any effect.
Post by Alex Dashevski
I also don't understand why I need a filter. It is supposed to prevent
aliasing, but I can't understand why.
Technically you only need a filter if your signal has information above
the Nyquist frequency of the lower rate, but real signals almost always do.
I think Wikipedia explains aliasing pretty well:
https://en.wikipedia.org/wiki/Aliasing#Sampling_sinusoidal_functions . Once
the high-frequency information aliases, it cannot be recovered by resampling
back to the higher rate, and your lower-band information is now mixed in
with the aliased information. The filter removes the high-frequency data so
that the low band stays clean through the whole process.
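The unrecoverable-aliasing point can be demonstrated in a few lines. This is an illustrative numpy sketch (the 5 kHz test tone and the one-second length are my own example, not from the thread):

```python
# Sketch: why naive decimation without a filter aliases.
import numpy as np

fs_high, fs_low = 48_000, 8_000
M = fs_high // fs_low              # decimation factor: 6
t = np.arange(fs_high) / fs_high   # one second of time axis

f_in = 5_000                       # above the 4 kHz Nyquist of the 8 kHz rate
x = np.sin(2 * np.pi * f_in * t)

# Naive decimation: keep every 6th sample, no anti-alias filter.
y = x[::M]

# Find the strongest frequency in the decimated signal (1 Hz bins).
spectrum = np.abs(np.fft.rfft(y))
f_peak = np.argmax(spectrum) * fs_low / len(y)

# 5 kHz folds around the new Nyquist down to 8 kHz - 5 kHz = 3 kHz,
# indistinguishable from a genuine 3 kHz tone.
print(f_peak)  # 3000.0
```

Once the tone has landed at 3 kHz it is inside the band you wanted to keep, which is why upsampling afterwards cannot undo it.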


Is there an option to decrease the latency or delay?
The only way to reduce latency in your algorithm (unless there is some
error in the implementation) is to reduce the block size, so you process
128 samples rather than 240. 240 samples isn't a very large amount of
latency for a pitch shifter, which is typically a CPU-intensive process;
most implementations therefore have relatively high latencies.

I'm not sure I understand what you mean by the pitch duration requiring a
buffer resize or sample-rate decrease. WSOLA creates a signal with more
samples than the input; you must resample that (usually by a non-integer
amount) to make it the correct number of samples, output it, and reload
your buffer with the next block of input data. Please clarify if you mean
some other issue.
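The resample-back step described here could be sketched as follows. This is a hypothetical minimal version using linear interpolation (a real WSOLA implementation would use a higher-quality interpolator); the 240-sample frame size is from the thread, the 288-sample stretch is my own example:

```python
# Sketch: shrink a WSOLA output block back to the fixed frame size
# by a non-integer resampling ratio (linear interpolation for brevity).
import numpy as np

def fit_to_frame(block: np.ndarray, frame_len: int) -> np.ndarray:
    """Resample `block` to exactly `frame_len` samples."""
    src_pos = np.linspace(0, len(block) - 1, frame_len)
    return np.interp(src_pos, np.arange(len(block)), block)

# Suppose WSOLA stretched a 240-sample frame to 288 samples (ratio 1.2);
# resampling back to 240 raises the pitch by that same ratio.
stretched = np.sin(np.linspace(0, 10, 288))
out = fit_to_frame(stretched, 240)
print(len(out))  # 240
```

The output is then written to the device and the buffer is reloaded with the next input block, as described above.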

_Spencer
Alex Dashevski
2018-10-03 20:45:37 UTC
Permalink
I wrote it with the Android NDK, and there is the fast-path concept. Thus,
I think that resampling can help me.
Can you recommend a code example?
Can you give me an example of resampling, for example from 48 kHz to 8 kHz
and from 8 kHz back to 48 kHz?
I found this:
https://dspguru.com/dsp/faqs/multirate/resampling/
but it is not clear enough for me.

Thanks,
Alex


_______________________________________________
dupswapdrop: music-dsp mailing list
https://lists.columbia.edu/mailman/listinfo/music-dsp
Spencer Jackson
2018-10-03 20:51:03 UTC
Permalink
I have only used libraries for resampling myself. I haven't looked at their
source, but it's available. The two libraries I'm aware of are at
http://www.mega-nerd.com/SRC/download.html
and
https://kokkinizita.linuxaudio.org/linuxaudio/zita-resampler/resampler.html

Perhaps they can give you some insight.
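For orientation, here is a hedged sketch of what such a 48 kHz <-> 8 kHz converter does internally (a windowed-sinc FIR plus decimation and zero-stuffing). It is illustrative only; the libraries above do this far better:

```python
# Sketch: rational 48 kHz <-> 8 kHz conversion (factor 6) with one
# windowed-sinc low-pass serving as anti-alias and anti-image filter.
import numpy as np

M = 6  # 48000 / 8000

def sinc_lowpass(num_taps=121, fc=0.5 / M):
    # fc in cycles/sample: 4 kHz cutoff at the 48 kHz rate
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * fc * np.sinc(2 * fc * n)
    return h * np.hamming(num_taps)

h = sinc_lowpass()

def down_48k_to_8k(x):
    # filter first (remove content above 4 kHz), then keep every 6th sample
    return np.convolve(x, h, mode="same")[::M]

def up_8k_to_48k(y):
    # zero-stuff to 48 kHz, then use the same filter (x6 gain) to kill images
    z = np.zeros(len(y) * M)
    z[::M] = y
    return np.convolve(z, M * h, mode="same")

fs = 48_000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)          # 1 kHz tone, safely below 4 kHz
x_rt = up_8k_to_48k(down_48k_to_8k(x))    # round trip 48k -> 8k -> 48k

spectrum = np.abs(np.fft.rfft(x_rt))
print(np.argmax(spectrum))  # 1000 -- the tone survives the round trip (1 Hz bins)
```

Content above 4 kHz would not survive: the low-pass removes it on the way down, which is exactly the filtering discussed earlier in the thread.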
Alex Dashevski
2018-10-06 08:10:34 UTC
Permalink
Hi,
Let's assume that my system has a sample rate of 48 kHz and an audio buffer
size of 240 samples. It should run in real time.
Can I do the following:

1. Downsample to 8 kHz; the buffer size should be 240*6.
2. Do the processing on the 240*6 buffer at the 8 kHz sample rate.
3. Upsample to 48 kHz with the original buffer size.

Thanks,
Alex


Ethan Fenn
2018-10-06 11:10:57 UTC
Permalink
You've got it backwards -- downsample means fewer samples. If you have a
240-sample buffer at 48kHz, then resample to 8kHz, you'll have 240/6=40
samples.
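The arithmetic, spelled out (illustrative Python):

```python
# Quick check of the buffer sizes implied by the correction above.
fs_high, fs_low, frames = 48_000, 8_000, 240

block_ms = 1000 * frames / fs_high        # duration of the 48 kHz buffer
frames_low = frames * fs_low // fs_high   # samples after downsampling

print(block_ms)    # 5.0  (milliseconds)
print(frames_low)  # 40, i.e. 240/6 -- not 240*6
```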

-Ethan
Alex Dashevski
2018-10-06 11:45:54 UTC
Permalink
I have a project with pitch shifting (resampling with WSOLA), implemented
with the Android NDK.
Since the duration of a pitch period is ~20 ms, I can't use the
system-recommended parameters for the fast path. For example, for my
device: sample rate 48 kHz and buffer size 240 samples. That means the
buffer duration is 5 ms (< pitch duration = 20 ms).
What can I do so that I can still use the recommended parameters, since a
larger buffer increases latency? For example, if I use 48 kHz and 240
samples then the latency is 66 ms, but if the buffer size is 24000 samples
then the latency is 300 ms.
I need latency < 100 ms.

Thanks,
Alex

gm
2018-10-06 15:22:35 UTC
Permalink
Your numbers don't make sense to me, but probably I just don't understand it.

The latency should be independent of the sample rate, right?

You search for similarity in the wave, chop it up, and replay the grains
at different speeds and/or rates.

What you need for this is a certain amount of time of the wave.

If you need a latency of <= 100 ms you can have two stored wave cycles of
50 ms length / 20 Hz, which should be sufficient, given that voice is
usually well above 20 Hz.
Alex Dashevski
2018-10-06 15:58:36 UTC
Permalink
Hi,

I can't understand your answer. The duration of the buffer should be bigger
than the duration of a pitch period because I use WSOLA.
The latency also depends on the sample rate and the buffer length.

Thanks,
Alex
gm
2018-10-06 16:11:20 UTC
Permalink
In my example, the buffer is 2 times as long as the lowest possible pitch
period: for example, if your lowest pitch is 20 Hz, you need 50 ms for one
wave cycle.

Think of it as magnetic tape, without a sample rate; the minimum required
latency and the buffer length in milliseconds are independent of the
sample rate.
You have 100 ms of "magnetic tape", search for similarity, and then chop
the tape according to that.
Then you have snippets of 50 ms length or smaller.
Then you copy these snippets and piece them together again, at a higher
or lower rate than before.
You can also shrink or lengthen the snippets and change the formants, that
is, shift all the spectral content of one snippet up or down.
Alex Dashevski
2018-10-06 16:27:45 UTC
Permalink
I still don't understand. You change the buffer size, right?
But I don't want to change it.
gm
2018-10-06 16:52:47 UTC
Permalink
no, you don't change the buffer size, you just change the playback rate
(and speed, if you want) of your grains.

For instance, let's say the pitch is 20 Hz, or 50 ms time for one cycle.

You want to change that to 100 Hz.

Then you take 50 ms of audio, and replay this 5 times every 10 ms (with
or without overlaps, but at the same speed
as the original to maintain the formants).

Then you take the next 50 ms, and do that again.

For this, you need a buffer size of 50 ms or more.

But to compare two different wave cycles of 50 ms length to find
similarity, you need a buffer size of 100 ms.

That is your latency required, for 20 Hz.

That is all independent of sample rate, but of course your buffer size
in samples will be larger for a higher sample rate
and smaller for a lower sample rate. But the times of latency required
will be the same.

Also, if you do correlation you need fewer values to calculate for a lower
sample rate.
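The grain-replay idea above can be sketched roughly like this (a plain-Python
illustration of the scheme I described, not WSOLA itself; the function name,
the Hann window, and the 50 ms / 10 ms numbers are just the example's choices):

```python
import math

def grain_pitch_shift(x, sr, grain_ms=50.0, hop_ms=10.0):
    """Replay each windowed grain at its original speed, but restart it
    every hop_ms; with 50 ms grains and a 10 ms hop the perceived pitch
    rises by roughly grain_ms / hop_ms = 5 (20 Hz -> 100 Hz)."""
    grain = int(sr * grain_ms / 1000.0)  # grain length in samples
    hop = int(sr * hop_ms / 1000.0)      # restart interval in samples
    # Hann window so the overlapping copies cross-fade smoothly
    win = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / grain) for n in range(grain)]
    out = [0.0] * len(x)
    for start in range(0, len(x) - grain + 1, grain):  # take the next 50 ms...
        g = [x[start + n] * win[n] for n in range(grain)]
        for k in range(grain // hop):                  # ...and replay it 5 times
            base = start + k * hop
            for n in range(grain):
                if base + n < len(out):
                    out[base + n] += g[n]
    return out
```

Note the buffer handed to the function must hold at least one full grain
(one wave cycle of the lowest pitch), which is exactly the latency argument
above.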
Alex Dashevski
2018-10-06 17:07:57 UTC
Permalink
What do you mean by "replay"? Duplicate the buffer?
I have the opposite problem: my original buffer size doesn't contain a full
cycle of the pitch.
How can I still shift the pitch?

Thanks,
Alex
gm
2018-10-06 17:25:53 UTC
Permalink
Post by Alex Dashevski
What do you mean by "replay"? Duplicate the buffer?
I mean to just read the buffer for the output.
So in my example you play back 10 ms of audio (windowed, of course), then
you move your read pointer and play
that audio back again, and so on, until the next "slice" or "grain" or
"snippet" of audio is played back.
Post by Alex Dashevski
I have the opposite problem: my original buffer size doesn't contain a
full cycle of the pitch.
then your pitch is too low or your buffer too small - there is no way
around this, it's physics / causality.
You can decrease the number of samples in the buffer with a lower sample
rate,
but not the duration/latency required.
Post by Alex Dashevski
How can I still shift the pitch?
You wrote you can have a latency of < 100 ms, but 100 ms should be
sufficient for this.
Alex Dashevski
2018-10-06 17:34:58 UTC
Permalink
If I understand correctly, resampling will not help. Right?
No other technique will help either. Right?
What do you mean by "but not the duration/latency required"?
gm
2018-10-06 17:49:02 UTC
Permalink
right

the latency required comes from the need to store the complete wave cycle,
or two of them, to compare them

(My method works a little bit differently, so I only need one wave cycle.)

So you always have this latency, regardless of what sample rate you use.

But maybe you don't need 20 Hz; for speech, for instance, I think that 100
or even 150 Hz is sufficient? I don't know
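The arithmetic is simple enough to write down (a hypothetical helper, just to
show that the required latency depends only on the lowest pitch, not on the
sample rate):

```python
def wsola_latency_ms(lowest_pitch_hz, cycles=2):
    """Minimum time you must buffer: 'cycles' full periods of the
    lowest pitch you want to handle, independent of sample rate."""
    return cycles * 1000.0 / lowest_pitch_hz

# A 20 Hz floor needs 100 ms of buffer; a 150 Hz speech floor needs
# only about 13 ms - whether the audio is sampled at 8 kHz or 48 kHz.
```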
Daniel Varela
2018-10-06 17:55:05 UTC
Permalink
You could try a phase vocoder instead of WSOLA for time stretching. Latency
would be the size of the FFT block.
Alex Dashevski
2018-10-06 18:02:49 UTC
Permalink
Hi,
the phase vocoder doesn't have a restriction on duration?
Thanks,
Alex

On Sat, Oct 6, 2018 at 20:55 Daniel Varela <
gm
2018-10-06 18:40:47 UTC
Permalink
You can "freeze" audio with the phase vocoder "for ever" if that is
what you want to do.

You just keep the magnitude of the spectrum from one point in time

and update the phases with the phase differences of that moment.
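That freeze rule can be sketched in plain Python (an illustration only, not a
full phase vocoder; `mag`, `phase`, and `dphi` are assumed to come from your
STFT analysis of the frozen frame, and each output frame would go through
your inverse FFT and overlap-add):

```python
import cmath
import math

def freeze(mag, phase, dphi, n_frames):
    """Synthesize n_frames of 'frozen' spectra: hold the magnitudes of
    one analysis frame constant while each bin's phase keeps advancing
    by that frame's per-hop phase increment dphi."""
    frames = []
    ph = list(phase)
    for _ in range(n_frames):
        frames.append([m * cmath.exp(1j * p) for m, p in zip(mag, ph)])
        ph = [(p + d) % (2.0 * math.pi) for p, d in zip(ph, dphi)]
    return frames
```

Because only the phases move, every synthesized frame has exactly the
magnitudes of the frozen moment, so the sound sustains indefinitely.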
Alex Dashevski
2018-10-06 18:52:40 UTC
Permalink
Do you know where I can find a phase vocoder implementation in C++ that I
can run in real time?

On Sat, Oct 6, 2018 at 21:21 Daniel Varela <
For real time you will need to do windowing and overlap-add. But yeah, 5 ms
should be enough.
This is a high level explanation with MATLAB
https://se.mathworks.com/help/audio/examples/pitch-shifting-and-time-dilation-using-a-phase-vocoder-in-matlab.html
Can you tell me the minimum duration of the buffer? Is 5 ms OK?
On Sat, Oct 6, 2018 at 21:06 Daniel Varela <
You can process buffers as small as your FFT allows.
Scott Cotton
2018-10-06 20:14:32 UTC
Permalink
The best open source one I know of is https://breakfastquay.com/rubberband/

It is however very dense. I wouldn't bet on coming to an understanding of
how it does sample/window framing without significant investment. The
author himself said it was very hard to get sample-accurate input-to-output
sample ratios.

However, to use it shouldn't be too hard.

Scott
--
Scott Cotton
http://www.iri-labs.com
Scott Cotton
2018-10-06 20:15:21 UTC
Permalink
sorry, dropped a phrase by accident: shouldn't be too hard -- to use --.
--
Scott Cotton
http://www.iri-labs.com
Alex Dashevski
2018-10-07 06:20:01 UTC
Permalink
Is it a phase vocoder?
I can't understand how it works.

On Sat, Oct 6, 2018 at 23:15 Scott Cotton <***@iri-labs.com>:
Scott Cotton
2018-10-07 06:38:11 UTC
Permalink
Post by Alex Dashevski
is it phase vocoder ?
I am not the author, so take this with a grain of salt: yes, but it treats
the input-to-output sample ratio
differently than, say, a standard phase vocoder in MATLAB.
Post by Alex Dashevski
I can't understand how it work.
welcome to the world of closed ideas by open-source obscurity :) I wish
the author would publish a clear
explanation; it would save lots of folks lots of time. But it would also
make things easier for its competitors...

That said, I think there are examples and clear docs on how to use it, and
the web page probably provides
a better forum and access to usage help than this list.

Scott
Post by Alex Dashevski
‬‏>:‬
Post by Scott Cotton
sorry, dropped a phrase by accident: shouldn't be too hard -- to use --.
Post by Scott Cotton
The best open source one I know of is
https://breakfastquay.com/rubberband/
It is however very dense. I wouldn't bet on coming to an understanding
of how it does sample/window framing without significant investment. The
author himself said it was very hard to get sample accurate input samples
to output samples ratios.
However, to use it shouldn't be too hard.
Scott
Post by Alex Dashevski
Could you know where I can find phase vocoder implementaion in cpp thus
I can run it on real time ?
‫בתאךיך שבת, 6 באוק׳ 2018 ב-21:21 מאת ‪Daniel Varela‬‏ <‪
For real time you will need to do windowing and overlap add. But yeah,
5ms should be enough.
This is a high level explanation with MATLAB
https://se.mathworks.com/help/audio/examples/pitch-shifting-and-time-dilation-using-a-phase-vocoder-in-matlab.html
Can you tell what minimum duration of buffer ? 5ms should be Ok ?
‫בתאךיך שבת, 6 באוק׳ 2018 ב-21:06 מאת ‪Daniel Varela‬‏ <‪
You can process buffers as small as your fft allows.
Hi,
phase vocoder doesn't have restriction of duration ?
Thanks,
Alex
‫בתאךיך שבת, 6 באוק׳ 2018 ב-20:55 מאת ‪Daniel Varela‬‏ <‪
Post by Daniel Varela
You could try a phase vocoder instead of WSOLA for time stretching.
Latency would be the size of the fft block.
Post by gm
right
the latency required is that you need to store the complete
wavecycle, or two of them, to compare them
(My method works a little bit different, so I only need one wavecycle.)
So you always have this latency, regardless what sample rate you use.
But maybe you dont need 20 Hz, for speech for instance I think
that 100 or even 150 Hz is sufficient? I dont know
If I understand correctly, resampling will not help. Right ?
No other technique that will help. Right ?
What do you mean "but not the duration/latency required" ?
Post by gm
Post by Alex Dashevski
What do you mean "replay" ? duplicate buffer ?
I mean to just read the buffer for the output.
So in my example you play back 10 ms audio (windowed of course), then
you move your read pointer and play
that audio back again, and so on, untill the next "slice" or "grain" or
"snippet" of audio is played back.
Post by Alex Dashevski
I have the opposite problem. My original buffer size doesn't
contain
Post by Alex Dashevski
full cycle of the pitch.
then your pitch is too low or your buffer too small - there is no way
around this, it's physics / causality.
You can decrease the number of samples of the buffer with a lower sample
rate,
but not the duration/latency required.
Post by Alex Dashevski
How can I succeed in shifting the pitch?
You wrote you can have a latency of < 100ms, but 100ms should be
sufficient for this.
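gm's point, that a lower sample rate shrinks the sample count but not the time a pitch period occupies, is easy to verify with arithmetic, using the figures from the thread (48 kHz, 240-sample buffer, an assumed 100 Hz voice fundamental):

```python
def samples_per_period(f0_hz, sample_rate_hz):
    """Number of samples one full cycle of a pitch at f0 occupies."""
    return sample_rate_hz / f0_hz

def buffer_ms(n_samples, sample_rate_hz):
    """Duration of a buffer of n_samples, in milliseconds."""
    return 1000.0 * n_samples / sample_rate_hz

# A 240-sample buffer is 5 ms at 48 kHz but 30 ms at 8 kHz.
# One cycle of a 100 Hz voice is always 10 ms of signal:
# 480 samples at 48 kHz, 80 samples at 8 kHz.
print(buffer_ms(240, 48000), buffer_ms(240, 8000))
print(samples_per_period(100, 48000), samples_per_period(100, 8000))
```

So resampling to 8 kHz lets an 80-sample buffer hold a full 100 Hz cycle, but that buffer still represents 10 ms of real time; the latency floor is set by the pitch period, not the sample rate.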
_______________________________________________
dupswapdrop: music-dsp mailing list
https://lists.columbia.edu/mailman/listinfo/music-dsp
--
Scott Cotton
http://www.iri-labs.com
Ethan Duni
2018-10-06 17:46:06 UTC
Permalink
Alex, it sounds like you are confusing algorithmic latency with framing latency. At each frame, you take in 10ms (or whatever) of input, and then provide 10ms of output. This (plus processing time to generate the output) is the IO latency of the process. But the algorithm itself can add additional signal delay.

Consider a simple delay process, wherein the algorithm maintains an internal delay buffer. At each 10ms frame, it reads the new input into the end of the buffer, and writes out 10ms of output from the front of the buffer. So the IO latency is 10ms, but the algorithmic latency is determined by the length of the delay buffer.

So if your WSOLA process requires more memory than the IO buffer, then it should maintain a longer internal memory. Then for each frame, you first digest the input into the buffer, then perform whatever processing to get 1 frame of output, and then save whatever state variables you need for the next frame. This internal buffer will add signal latency, but not IO latency.
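Ethan's delay example can be sketched as a frame-based process. This is a toy illustration (frame size and delay length are arbitrary), showing that the internal buffer, not the frame size, sets the algorithmic latency:

```python
from collections import deque

class FrameDelay:
    """Frame-in/frame-out process with an internal delay buffer.

    IO latency is one frame; algorithmic latency is `delay_samples`,
    set by the length of the internal buffer, independent of frame size.
    """
    def __init__(self, delay_samples):
        self.buf = deque([0.0] * delay_samples)

    def process(self, frame):
        # Read the new input into the end of the buffer, then write out
        # the same number of samples from the front as this frame's output.
        self.buf.extend(frame)
        return [self.buf.popleft() for _ in frame]

# 4-sample frames, 8 samples of internal delay:
d = FrameDelay(8)
out = []
for start in range(0, 16, 4):
    out.extend(d.process([float(x) for x in range(start, start + 4)]))
# The first 8 output samples are the zero-initialized buffer contents;
# after that, the input emerges delayed by exactly 8 samples.
```

A WSOLA implementation would replace the trivial pass-through with its analysis over the buffered history, but the bookkeeping — ingest one frame, emit one frame, carry state in the internal buffer — is the same.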

Ethan
Post by gm
Post by Alex Dashevski
What do you mean "replay" ? duplicate buffer ?
I mean to just read the buffer for the output.
So in my example you play back 10 ms of audio (windowed of course), then you move your read pointer and play
that audio back again, and so on, until the next "slice" or "grain" or "snippet" of audio is played back.
Post by Alex Dashevski
I have the opposite problem. My original buffer size doesn't contain a full cycle of the pitch.
then your pitch is too low or your buffer too small - there is no way around this, it's physics / causality.
You can decrease the number of samples of the buffer with a lower sample rate,
but not the duration/latency required.
Post by Alex Dashevski
How can I succeed in shifting the pitch?
You wrote you can have a latency of < 100ms, but 100ms should be sufficient for this.