[music-dsp] sound transcription knowledge

Discussion:

Raphaël Ilias

2018-07-10 10:57:49 UTC

Hello dear dsp-freaks!

For a few years now I have been very interested in all forms of
traditional/classic sound transcription.
I often find interesting tools, but currently I am interested in re-coding
them (with Pure Data and Processing mostly, or Python), or hacking them for
future artistic use.

So as not to reinvent the wheel, I first want to put together a quick
"state of the art" of classical techniques for "making sound visual", and
learn how to implement them.

As examples of the topics I'm interested in:
- Spectral representation with the Fast Fourier Transform (linear/logarithmic
frequency display & interpolation, time downsampling, colour scaling, visual
filtering / thresholding)
- Spectral representation by bands with a constant-Q filter bank
- Waveform drawing/plotting (especially the issues of linear/logarithmic
amplitude, upsampling and interpolation, and time-downsampling techniques
(decimation, peak, RMS))
- Dynamics representation: RMS/peak histograms... variance or...
- and also further analysis tools: musical scaling/filtering, note
detection, correlation of different sounds
- partials representation (sinusoidal tracking)
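
As a concrete illustration of the waveform item above (peak/RMS time
downsampling and linear versus logarithmic amplitude), here is a minimal
sketch in plain Python. The function names, the (min, max, rms) layout and
the -96 dB floor are assumptions made for this example, not taken from any
particular tool; it also assumes a non-empty list of samples in -1..1.

```python
import math

def reduce_for_display(samples, columns):
    """Reduce samples to one (min, max, rms) triple per display column."""
    n = len(samples)
    out = []
    for c in range(columns):
        start = c * n // columns
        # Guarantee at least one sample per column when columns > n.
        stop = max(start + 1, (c + 1) * n // columns)
        block = samples[start:stop]
        rms = math.sqrt(sum(x * x for x in block) / len(block))
        out.append((min(block), max(block), rms))
    return out

def to_db(amplitude, floor_db=-96.0):
    """Convert a linear amplitude (1.0 = full scale) to dB, clamped at floor_db."""
    if amplitude <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(amplitude))
```

Drawing a vertical line from min to max in each column gives the usual
peak envelope; drawing the RMS on top (optionally through to_db) shows the
perceived level rather than the extremes. Plain decimation (keeping every
Nth sample) is cheaper, but it can miss short peaks and alias.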

Of course I have tested a few tools, like Spear, Adobe Audition... and a
few other command-line tools whose names I don't remember (but most of them
were buggy or Windows-based :§ )

So I wonder if you have other suggestions of software (preferably
open-source, hackable or freeware), pseudo-code or academic articles on
techniques (I have already partially read: Curtis Roads's L'Audionumérique
- Musique et Informatique, Miller Puckette's The Theory and Technique of
Electronic Music, Andy Farnell's Designing Sound and a bit of The
Scientist and Engineer's Guide to Digital Signal Processing by Steven W.
Smith) - I read French and English.

I know this is a very broad question!
Anyway, I'll welcome any small suggestion from people better informed than
Google Search!

Thanks in advance,

Raphaël Ilias
___________________________________
* phae.fr <http://phae.fr>*
17 places des Halles - Ingrandes
49123 Ingrandes Le Fresne s/ Loire
***@gmail.com / +33 (0) 6 04 45 79 78
___________

Patric Schmitz

2018-07-10 13:01:45 UTC

Hi Raphaël,

You might consider additional spectral descriptors such as MFCC/BFCC,
cepstrum, and spectral moments like centroid, variance and skewness. For
visualization purposes where you want to map perceived sound to visuals,
a more perceptually oriented scale is also useful, so use the Bark or
mel scale instead of the linear frequency axis of a plain FFT. Also
consider psychoacoustic quantities such as sharpness, roughness, tonalness,
spectral rolloff and spread, or even time-domain quantities like
zero-crossing rate (e.g. the Pure Data Dissonance Model Toolbox or the
timbreID package). Have a look also at the stuff from the MTG Barcelona
(http://essentia.upf.edu/documentation/); they have mood/key detection
and all kinds of things. Transient detection is also important for
mapping rhythmical events as opposed to 'steady state' spectral
characteristics.
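
To make a few of the descriptors above concrete, here is a minimal sketch
in plain Python. It is not how Essentia or timbreID compute them; the
function names, the 85% rolloff fraction, the Hann window and the
2595 * log10(1 + f/700) mel formula (one common variant) are choices made
for this example. The naive O(N^2) DFT only keeps the sketch dependency-free
and is fine for short frames; a real tool would use an FFT.

```python
import cmath
import math

def magnitude_spectrum(frame, sample_rate):
    """Naive DFT of a Hann-windowed frame; returns (freqs, mags) up to Nyquist."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * k / n))
                for k, x in enumerate(frame)]
    freqs, mags = [], []
    for b in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * b * k / n)
                  for k, x in enumerate(windowed))
        freqs.append(b * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags

def spectral_centroid(freqs, mags):
    """Magnitude-weighted mean frequency (first spectral moment)."""
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total > 0 else 0.0

def spectral_spread(freqs, mags):
    """Magnitude-weighted standard deviation around the centroid."""
    total = sum(mags)
    if total <= 0:
        return 0.0
    c = spectral_centroid(freqs, mags)
    return math.sqrt(sum(m * (f - c) ** 2 for f, m in zip(freqs, mags)) / total)

def spectral_rolloff(freqs, mags, fraction=0.85):
    """Lowest frequency below which `fraction` of the spectral energy lies."""
    energy = [m * m for m in mags]
    target = fraction * sum(energy)
    running = 0.0
    for f, e in zip(freqs, energy):
        running += e
        if running >= target:
            return f
    return freqs[-1]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def hz_to_mel(f):
    """Map a frequency in Hz to mels (one common formula, see lead-in)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)
```

Computed per frame, each descriptor yields one number that can drive a
visual parameter directly, for example centroid to brightness or
zero-crossing rate to texture density.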

I am very much into this topic of mapping live musical parameters to
(generative, real-time rendered) visuals, so I'm very interested in what
others have to add to this list.

Best regards,
Patric

Richard Dobson

2018-07-10 13:25:54 UTC

It might also be worth looking at Sonic Visualiser, which is open source:

https://www.sonicvisualiser.org/

Richard Dobson

Sound of L.A. Music and Audio

2018-07-10 15:18:16 UTC

Hello Richard and others

Post by Patric Schmitz
I am very much into this topic of mapping live musical parameters to
(generative, real-time rendered) visuals, so I'm very interested in what
others have to add to this list.

Music to video is also something that has interested me all my life, and
here are some of my approaches:

Two years ago, I worked out a concept for rendering images and objects in
real time according to the occurrence of musical notes, to express the
emotions that come up when listening. This can (but does not necessarily
have to) be related to some people's ability to "see" music: this special
effect in some musicians' brains is observed fairly often and is also a
subject of scientific research, as is the way these colours occur and the
way music is related to visual impressions.

Surprisingly (or not?), many of these impressions have a lot in common:
for instance, some people reported seeing some kind of brown nutshells
when listening to long-lasting, weak bass tunes. The project leader
provided me with a list of such audio-music links.

The intention was to draw images similar to these impressions, to give
the "non-seeing" audience the chance to see sound as well.

My solution was finally based on my VRR system, which I once created in
FPGAs to test optical lenses virtually and/or emulate camera and video
sensors in real time. It was derived from a DSP algorithm set developed
earlier for testing purposes. You can get an idea here:

https://www.mikrocontroller.net/topic/403956?goto=4679470#4679470

Now, coming to a more human interpretation of sound in order to draw
emotional images, one has to "decode" the meaning of a sound, such as
harsh, weak, aggressive, calming and so on. I continued this and
developed some mapping tables to analyze the harmonic constellations,
the amount of energy, and the attack and sustain of sounds, and transform
them into colours, contrast and more or less round or edged objects. For
copyright reasons I cannot go into further details here, but first
versions of the hardware were partly described here:

http://www.96khz.org/htm/graphicaudiooutc3.htm
and here:
http://www.96khz.org/htm/graphicvisualizerrt.htm

This is still in progress; currently I am building a system that will
become an installation in a museum.

What you can look into is identifying certain dominant frequencies in the
mix with analog or virtual analog filters and using amplitude and phase to
drive a colour model (6 main colours r, g, b, c, m, y will do) and cause
interference between the stereo channels. Some of the "interference" images
were created this way:

http://engineer.bplaced.net/index.htm
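
One possible reading of the amplitude/phase-to-colour idea above, as a
minimal Python sketch. This is not the system described in this post, whose
details are not public; the single-bin correlation used instead of a filter,
the use of the inter-channel phase difference as hue, and the function names
are all assumptions made for illustration.

```python
import cmath
import colorsys
import math

def bin_phasor(frame, freq, sample_rate):
    """Correlate a frame with a complex sinusoid at `freq` (a single DFT bin).

    For a sinusoid at `freq` spanning whole periods, the magnitude of the
    result is close to its amplitude and the angle to its phase.
    """
    n = len(frame)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / sample_rate)
              for k, x in enumerate(frame))
    return 2.0 * acc / n

def stereo_colour(left, right, freq, sample_rate, six_colours=True):
    """Map one frequency component of a stereo frame to an (r, g, b) triple."""
    pl = bin_phasor(left, freq, sample_rate)
    pr = bin_phasor(right, freq, sample_rate)
    # Inter-channel phase difference, scaled to 0..1, selects the hue.
    hue = (cmath.phase(pl * pr.conjugate()) / (2 * math.pi)) % 1.0
    if six_colours:
        # Snap to the six main colours r, y, g, c, b, m around the hue wheel.
        hue = (round(hue * 6) % 6) / 6.0
    # Mean amplitude of both channels sets the brightness, clipped at 1.0.
    value = min(1.0, (abs(pl) + abs(pr)) / 2.0)
    return colorsys.hsv_to_rgb(hue, 1.0, value)
```

With identical channels the component comes out red and fades with level;
as the channels drift apart in phase the colour moves around the wheel,
which is one simple way to make stereo interference visible.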

And a final thing: one interesting side effect, and maybe another kind of
music visualization, can be achieved with error processing:

I often test my algorithms (also the non-musical ones) by visualizing
the differences between two calculations at lower and higher
resolutions in order to find possible unfortunate accumulations of
errors. The resulting difference is amplified and displayed over two
parameters on a video screen. This leads to very interesting fractal
images that change their colours and structures in real time. If you then
apply a music signal, this leads to very strange images:

These images are, so to speak, the debris left when taking away the
correctly processed bits of the audio signals.
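
The error-visualization idea can be sketched in a few lines of Python. The
computation being tested here (a * b + sample), the fixed-point quantizer,
the bit depths and the wrap-around grey mapping are all hypothetical
stand-ins chosen for this example; the actual algorithms and hardware are
not described in this post.

```python
def quantize(x, bits):
    """Round x to a fixed-point grid with `bits` fractional bits."""
    scale = 1 << bits
    return round(x * scale) / scale

def compute(a, b, sample, bits):
    """Hypothetical stand-in algorithm: a * b + sample, rounded after each step."""
    product = quantize(quantize(a, bits) * quantize(b, bits), bits)
    return quantize(product + quantize(sample, bits), bits)

def error_image(width, height, sample=0.0, low_bits=8, high_bits=24):
    """Return rows of grey values (0..255) showing the amplified rounding error.

    The two image axes are the two parameters a and b, both swept over 0..1.
    """
    gain = 256 * (1 << low_bits)  # one low-resolution step spans the grey range
    image = []
    for y in range(height):
        b = y / height
        row = []
        for x in range(width):
            a = x / width
            err = compute(a, b, sample, low_bits) - compute(a, b, sample, high_bits)
            row.append(int(abs(err) * gain) % 256)  # wrap instead of clipping
        image.append(row)
    return image
```

Feeding successive audio samples in as `sample` and redrawing the image
for each one makes the pattern move with the music, since the rounding
error depends on where the sum lands on the coarse grid.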

Jürgen

Patric Schmitz

2018-07-10 14:22:20 UTC

Theo Verelst

2018-07-10 17:12:01 UTC

For artistic purposes, there are a number of professional signal recognition
tracks, but they are too complicated, and successful "light organ" applications
or Rock Show lighting plans are directly correlated with the artistic intentions
in the music production.

Technology students with some experience on an audio platform of your choice should
be able to create most of the main tasks you're asking about; some of them are
standard things. You could take a look at some of the open-source Linux applications
such as oscilloscope, VU meter and spectrum meter applications. They all exist in
various forms, and the advantage of open source is that you can modify them.

T