iracema.features

This module contains the implementation of feature extractors.

References

Bello2005: Bello, J. P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., & Sandler, M. B. (2005). A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 1035–1046.
Dixon2006: Dixon, S. (2006). Onset Detection Revisited. In 9th International Conference on Digital Audio Effects (pp. 133–137). Montreal, Canada.
Lerch2012: Lerch, A. (2012). An introduction to audio content analysis: Applications in signal processing and music informatics.
Park2004: Park, T. H. (2004). Towards automatic musical instrument timbre recognition. Princeton University.
Park2010: Park, T. H. (2010). Introduction to digital signal processing: Computer musically speaking. World Scientific Publishing Co. Pte. Ltd.
Peeters2011: Peeters, G., Giordano, B. L., Susini, P., Misdariis, N., & McAdams, S. (2011). The timbre toolbox: Extracting audio features from musical signals. The Journal of the Acoustical Society of America, 130(5), 2902–2916.

iracema.features.peak_envelope(time_series, window_size, hop_size)

Calculate the peak envelope of a time series.

The peak envelope consists of the peak absolute amplitude values within each aggregation window.

\[\operatorname{PE} = \max(|x(n)|), \quad 1 \leq n \leq L\]

Where x(n) is the n-th sample of a window of length L.

Parameters

time_series (iracema.core.timeseries.TimeSeries) – An audio time-series object.
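window_size (int) – Length of the aggregation window, in samples.
hop_size (int) – Number of samples between the starts of consecutive windows.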
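The windowed maximum above can be sketched in plain NumPy. This is a minimal illustration on a raw array, not iracema's implementation: the helper name `peak_envelope_sketch` is hypothetical, and it assumes that only complete windows are evaluated (no padding at the edges).

```python
import numpy as np

def peak_envelope_sketch(x, window_size, hop_size):
    """Peak absolute amplitude of each complete window of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    # Start index of every complete window (assumption: no edge padding).
    starts = range(0, len(x) - window_size + 1, hop_size)
    return np.array([np.max(np.abs(x[s:s + window_size])) for s in starts])

signal = np.array([0.1, -0.5, 0.3, 0.2, -0.9, 0.4])
envelope = peak_envelope_sketch(signal, window_size=4, hop_size=2)
print(envelope)  # one value per window
```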

iracema.features.rms(time_series, window_size, hop_size)

Calculate the root mean square of a time series.

The RMS envelope is the root mean square of the amplitude, calculated within each aggregation window.

\[RMS = \sqrt{ \frac{1}{L} \sum_{n=1}^{L} x(n)^2 }\]

Where x(n) is the n-th sample of a window of length L.

Parameters

time_series (iracema.core.timeseries.TimeSeries) – A time-series object. It is usually applied on Audio objects.
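window_size (int) – Length of the aggregation window, in samples.
hop_size (int) – Number of samples between the starts of consecutive windows.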
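As a rough illustration of the RMS formula, the sketch below applies it to each complete window of a raw NumPy array. The helper `rms_sketch` is hypothetical and the windowing (no edge padding) is an assumption; iracema's own framing may differ.

```python
import numpy as np

def rms_sketch(x, window_size, hop_size):
    """Root mean square of each complete window of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - window_size + 1, hop_size)
    # sqrt of the mean of the squared samples, per window
    return np.array([np.sqrt(np.mean(x[s:s + window_size] ** 2)) for s in starts])

signal = np.array([1.0, -1.0, 1.0, -1.0, 3.0, -3.0])
values = rms_sketch(signal, window_size=2, hop_size=2)
print(values)
```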

iracema.features.zcr(time_series, window_size, hop_size)

Calculate the zero-crossing rate of a time series, i.e., the number of times the signal crosses the zero axis, per second.

The zero-crossing rate gives some insight into the noisy character of a sound. In noisy / unvoiced signals, the zero-crossing rate tends to reach higher values than in periodic / voiced signals.

\[\operatorname{ZC} = \frac{1}{2 L} \sum_{n=1}^{L}\left|\operatorname{sgn}\left[x(n)\right]-\operatorname{sgn}\left[x(n-1)\right]\right|\]

Where

\[\begin{split}\operatorname{sgn}\left[x(n)\right]=\left\{\begin{array}{c}{1, x(n) \geq 0} \\ {-1, x(n)<0}\end{array}\right.\end{split}\]

And x(n) is the n-th sample of a window of length L.

Parameters

time_series (iracema.core.timeseries.TimeSeries) – A time-series object. It is usually applied on Audio objects.
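window_size (int) – Length of the aggregation window, in samples.
hop_size (int) – Number of samples between the starts of consecutive windows.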
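The formula can be checked on a single window with a few lines of NumPy. This is a sketch under two assumptions: the helper name is hypothetical, and only the sample pairs inside the window are compared. As written, the formula yields crossings per sample; multiplying by the sampling rate would convert it to crossings per second.

```python
import numpy as np

def zero_crossing_rate_sketch(window):
    """Zero crossings per sample for one window, following the formula above."""
    window = np.asarray(window, dtype=float)
    # sgn as defined above: +1 for x >= 0, -1 for x < 0.
    signs = np.where(window >= 0, 1, -1)
    # Each sign change contributes |+-2| to the sum, hence the factor 1 / (2 L).
    return np.sum(np.abs(np.diff(signs))) / (2 * len(window))

alternating = zero_crossing_rate_sketch([1.0, -1.0, 1.0, -1.0])
constant = zero_crossing_rate_sketch([0.5, 0.5, 0.5, 0.5])
print(alternating, constant)
```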

iracema.features.spectral_flatness(stft)

Calculate the spectral flatness for a given STFT.

The spectral flatness gives an estimation of the noisiness / sinusoidality of an audio signal (for the whole spectrum or for a frequency range). It can be used to determine voiced / unvoiced parts of a signal [Park2004].

It is defined as the ratio between the geometric mean and the arithmetic mean of the energy spectrum:

\[\operatorname{SFM} = 10 \log_{10} \left( \frac{\left( \prod_{k=1}^{N} |X(k)| \right)^{\frac{1}{N}}}{\frac{1}{N} \sum_{k=1}^{N} |X(k)|} \right)\]

Where X(k) is the result of the STFT for the k-th frequency bin.

Parameters: stft (iracema.spectral.STFT) – An STFT object
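To make the ratio concrete, the following sketch evaluates the formula for the magnitudes of a single frame. The helper name and the small `eps` offset (added to avoid taking the logarithm of zero) are assumptions of this example, not part of iracema.

```python
import numpy as np

def spectral_flatness_sketch(magnitudes, eps=1e-12):
    """Spectral flatness (in dB) of one frame of magnitudes |X(k)|."""
    mag = np.asarray(magnitudes, dtype=float) + eps  # eps avoids log(0)
    # Geometric mean computed in the log domain for numerical stability.
    geometric_mean = np.exp(np.mean(np.log(mag)))
    arithmetic_mean = np.mean(mag)
    return 10.0 * np.log10(geometric_mean / arithmetic_mean)

flat = spectral_flatness_sketch(np.ones(8))                 # noise-like frame
peaky = spectral_flatness_sketch([1e-3, 1e-3, 1.0, 1e-3])   # tonal frame
print(flat, peaky)
```

A perfectly flat frame gives a value near 0 dB, while a frame dominated by a single bin gives a strongly negative value.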

iracema.features.hfc(stft, method='energy')

Calculate the high frequency content for an STFT time series.

The HFC function produces sharp peaks during attacks or transients [Bello2005] and might be a good choice for detecting onsets in percussive sounds.

\[\operatorname{HFC} = \sum_{k=1}^{N} |X(k)|^2 \cdot k\]

Alternatively, you can set method = ‘amplitude’ instead of ‘energy’ (default value):

\[\operatorname{HFC} = \sum_{k=1}^{N} |X(k)| \cdot k\]

Parameters

stft (iracema.spectral.STFT) – STFT time series.
method (str) – Method used to calculate the HFC: ‘energy’ (default) or ‘amplitude’.
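Both variants of the formula can be sketched for one frame of magnitudes as follows. The function name is hypothetical, and the sketch works on a plain array rather than on an STFT object.

```python
import numpy as np

def hfc_sketch(magnitudes, method="energy"):
    """High frequency content of one frame of magnitudes |X(k)|."""
    mag = np.asarray(magnitudes, dtype=float)
    k = np.arange(1, len(mag) + 1)  # bin indices 1..N, as in the formulas
    if method == "energy":
        return np.sum(mag ** 2 * k)
    if method == "amplitude":
        return np.sum(mag * k)
    raise ValueError(f"unknown method: {method!r}")

frame = np.array([1.0, 2.0, 3.0])
hfc_energy = hfc_sketch(frame, method="energy")
hfc_amplitude = hfc_sketch(frame, method="amplitude")
print(hfc_energy, hfc_amplitude)
```

Because each bin is weighted by its index, energy in the upper bins dominates the result, which is why the feature reacts strongly to transients.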

iracema.features.spectral_centroid(stft)

Calculate the spectral centroid for an STFT time series.

The spectral centroid is a well-known timbral feature used to describe the brightness of a sound. It represents the center of gravity of the frequency components of a signal [Park2010].

\[\operatorname{SC} = \frac{\sum_{k=1}^{N} |X(k)| \cdot f_k }{\sum_{k=1}^{N} |X(k)|}\]

Where X(k) is the result of the STFT for the k-th frequency bin and \(f_k\) is the frequency of that bin.

Parameters: stft (iracema.spectral.STFT) – An STFT object
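The centroid is a magnitude-weighted average of the bin frequencies, which the sketch below computes for a single frame. The helper name and the explicit `frequencies` argument are assumptions made for illustration; an STFT object would normally carry its own frequency axis.

```python
import numpy as np

def spectral_centroid_sketch(magnitudes, frequencies):
    """Magnitude-weighted mean frequency of one frame."""
    mag = np.asarray(magnitudes, dtype=float)
    freqs = np.asarray(frequencies, dtype=float)
    return np.sum(mag * freqs) / np.sum(mag)

freqs = np.array([100.0, 200.0, 300.0])  # bin frequencies in Hz
centroid = spectral_centroid_sketch([1.0, 2.0, 1.0], freqs)
print(centroid)
```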

iracema.features.spectral_spread(stft)

Calculate the spectral spread for an STFT time series.

The spectral spread represents the spread of the spectrum around the spectral centroid [Peeters2011], [Lerch2012].

\[\operatorname{SSp} = \sqrt{\frac{\sum_{k=1}^{N} |X(k)| \cdot (f_k - SC)^2 }{\sum_ {k=1}^{N} |X (k)|}}\]

Where X(k) is the result of the STFT for the k-th frequency bin and SC is the spectral centroid for the frame.
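The spread is the magnitude-weighted standard deviation of the bin frequencies around the centroid. A minimal single-frame sketch, with a hypothetical helper name and an explicit frequency axis as assumptions:

```python
import numpy as np

def spectral_spread_sketch(magnitudes, frequencies):
    """Magnitude-weighted standard deviation of frequency around the centroid."""
    mag = np.asarray(magnitudes, dtype=float)
    freqs = np.asarray(frequencies, dtype=float)
    centroid = np.sum(mag * freqs) / np.sum(mag)
    return np.sqrt(np.sum(mag * (freqs - centroid) ** 2) / np.sum(mag))

freqs = np.array([100.0, 200.0, 300.0])  # bin frequencies in Hz
wide = spectral_spread_sketch([1.0, 2.0, 1.0], freqs)
narrow = spectral_spread_sketch([0.0, 1.0, 0.0], freqs)
print(wide, narrow)
```

A frame with all its magnitude in one bin has zero spread.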

iracema.features.spectral_skewness(stft)

Calculate the spectral skewness for an STFT time series.

The spectral skewness is a measure of the asymmetry of the distribution of the spectrum around its mean value, and is calculated from its third-order moment. It will output negative values when the spectrum has more energy below the mean value, and positive values when it has more energy above the mean. Symmetric distributions will output the value zero [Lerch2012].

\[\operatorname{SSk} = \frac{2 \cdot \sum_{k=1}^{N} \left( |X(k)| - \mu_{|X|} \right)^3 }{ N \cdot \sigma_{|X|}^3}\]

Where \(\mu_{|X|}\) is the mean value of the magnitude spectrum and \(\sigma_{|X|}\) its standard deviation.

iracema.features.spectral_kurtosis(stft)

Calculate the spectral kurtosis for an STFT time series.

The spectral kurtosis is a measure of the flatness of the distribution of the spectrum around its mean value. It will output the value 3 for Gaussian distributions. Values smaller than 3 represent flatter distributions, while values larger than 3 represent peakier distributions [Lerch2012].

\[\operatorname{SKu} = \frac{2 \cdot \sum_{k=1}^{N} \left( |X(k)| - \mu_{|X|} \right)^4 }{ N \cdot \sigma_{|X|}^4}\]

Where \(\mu_{|X|}\) is the mean value of the magnitude spectrum and \(\sigma_{|X|}\) its standard deviation.
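Skewness and kurtosis are the third and fourth standardized moments of the magnitude values. The sketch below computes a generic standardized moment for one frame. It is an illustration only: the helper name is hypothetical, and it normalizes by the number of values it is given, leaving out the factor of 2 that appears in the numerators above, so it matches those formulas only up to how N is counted.

```python
import numpy as np

def standardized_moment_sketch(magnitudes, order):
    """Standardized central moment of the magnitudes of one frame."""
    mag = np.asarray(magnitudes, dtype=float)
    mu = np.mean(mag)       # mean of the magnitude spectrum
    sigma = np.std(mag)     # its (population) standard deviation
    return np.mean((mag - mu) ** order) / sigma ** order

skew_symmetric = standardized_moment_sketch([1.0, 2.0, 3.0, 4.0, 5.0], 3)
skew_one_peak = standardized_moment_sketch([0.1, 0.1, 0.1, 0.1, 5.0], 3)
kurt_two_level = standardized_moment_sketch([0.0, 2.0, 0.0, 2.0], 4)
print(skew_symmetric, skew_one_peak, kurt_two_level)
```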

iracema.features.spectral_flux(stft, method='hwrdiff')

Calculate the spectral flux for an STFT time series.

The spectral flux measures the amount of change between successive spectral frames. There are different methods to calculate the spectral flux across the literature. For now we have implemented the one proposed by [Dixon2006].

\[\operatorname{SF} = \sum_{k=1}^{N} H(|X(t, k)| - |X(t-1, k)|)\]

where \(H(x) = \frac{x+|x|}{2}\) is the half-wave rectifier function, and t is the temporal index of the frame.

Parameters

stft (iracema.spectral.STFT) – An STFT object
method (str) – ‘hwrdiff’ or ‘corr’
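The half-wave rectified difference can be sketched on a matrix of magnitudes with one row per frame. The helper name and the `(n_frames, n_bins)` layout are assumptions of this example; only the rectified-difference formula above is covered, not the ‘corr’ method.

```python
import numpy as np

def spectral_flux_sketch(magnitudes):
    """Half-wave rectified spectral flux between successive frames.

    magnitudes: array of shape (n_frames, n_bins) holding |X(t, k)|.
    Returns one value per frame transition (n_frames - 1 values).
    """
    mag = np.asarray(magnitudes, dtype=float)
    diff = np.diff(mag, axis=0)                # |X(t, k)| - |X(t-1, k)|
    rectified = (diff + np.abs(diff)) / 2.0    # H(x): keeps only increases
    return rectified.sum(axis=1)

frames = np.array([[1.0, 1.0],
                   [3.0, 0.0],
                   [2.0, 2.0]])
flux = spectral_flux_sketch(frames)
print(flux)
```

Because of the rectifier, only bins whose magnitude increases contribute, so the measure responds to energy onsets rather than decays.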

iracema.features.harmonic_centroid(harmonics)

Calculate the harmonic centroid.

The harmonic centroid represents the center of gravity of the amplitudes of the harmonic series.

\[\operatorname{HC} = \frac{\sum_{h=1}^{H} A(h) \cdot f_h }{\sum_{h=1}^{H} A(h)}\]

Where \(A(h)\) represents the amplitude of the h-th harmonic partial and \(f_h\) its frequency.

iracema.features.harmonic_energy(harmonics_magnitude)

Calculate the energy of harmonic partials.

Harmonic energy is the energy of the harmonic partials of a signal.

\[\operatorname{HE} = \sum_{h=1}^{H} A(h)^2\]

Where \(A(h)\) represents the amplitude of the h-th harmonic partial.

iracema.features.spectral_entropy(stft)

Calculate the spectral entropy for an STFT time series.

The spectral entropy is based on the concept of information entropy from Shannon’s information theory. It measures the unpredictability of the given state of a spectral distribution.

\[\operatorname{SEpy} = - \sum_{k=1}^{N} P(k) \cdot \log_2 P(k)\]

Where

\[P(k)=\frac{|X(k)|^2}{\sum_{j=1}^{N} |X(j)|^2}\]

More info at https://www.mathworks.com/help/signal/ref/pentropy.html.
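A single-frame sketch of the entropy computation: the power spectrum is normalized into a probability distribution and the Shannon entropy is taken in bits. The helper name is hypothetical, and treating bins with zero probability as contributing nothing to the sum is a convention assumed here.

```python
import numpy as np

def spectral_entropy_sketch(magnitudes):
    """Shannon entropy (in bits) of the normalized power spectrum of one frame."""
    power = np.asarray(magnitudes, dtype=float) ** 2
    p = power / np.sum(power)   # P(k): power normalized to sum to 1
    p = p[p > 0]                # convention: 0 * log2(0) is treated as 0
    return -np.sum(p * np.log2(p))

uniform = spectral_entropy_sketch(np.ones(8))              # energy spread evenly
single = spectral_entropy_sketch([0.0, 0.0, 1.0, 0.0])     # energy in one bin
print(uniform, single)
```

With N bins the entropy ranges from 0 (all energy in one bin) to log2(N) (energy spread evenly).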

iracema.features.spectral_energy(stft)

Calculate the total energy of an STFT frame.

Spectral energy is the total energy of an STFT frame.

\[\operatorname{SE} = \sum_{k=1}^{N} |X(k)|^2\]

Where X(k) is the result of the STFT for the k-th frequency bin.

iracema.features.noisiness(stft, harmonics_magnitude)

Calculate the noisiness for the given STFT and harmonics time series.

The noisiness represents how noisy a signal is (values close to 1), as opposed to harmonic (values close to 0). It is the ratio of the noise energy to the total energy of a signal [Peeters2011].

\[\operatorname{Ns} = \frac{\operatorname{SE}-\operatorname{HE}}{\operatorname{SE}}\]
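The ratio can be illustrated for one frame by taking SE as the sum of squared spectral magnitudes and HE as the sum of squared harmonic amplitudes, in line with the descriptions of spectral_energy and harmonic_energy. The helper name and the toy data are assumptions of this sketch.

```python
import numpy as np

def noisiness_sketch(spectrum_magnitudes, harmonic_amplitudes):
    """Fraction of the frame energy not explained by the harmonic partials."""
    se = np.sum(np.asarray(spectrum_magnitudes, dtype=float) ** 2)  # total energy
    he = np.sum(np.asarray(harmonic_amplitudes, dtype=float) ** 2)  # harmonic energy
    return (se - he) / se

# Toy frame: two strong partials (2.0 and 1.0) plus a little noise elsewhere.
spectrum = [0.1, 2.0, 0.1, 1.0, 0.1]
harmonics = [2.0, 1.0]
noisiness = noisiness_sketch(spectrum, harmonics)
print(noisiness)
```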

iracema.features.oer(harmonics)

Calculate the odd-to-even ratio for the harmonics time series.

The OER represents the odd-to-even ratio among the harmonics of an audio signal. This value will be higher for sounds with predominantly odd harmonics, such as the clarinet.

\[\operatorname{OER}=\frac{\sum_{h=1}^{H / 2} A(2 h - 1)^{2}\left(t_{m}\right)}{\sum_{h=1}^{H / 2} A(2 h)^{2}\left(t_{m}\right)}\]

Where \(A(h)\) represents the amplitude of the h-th harmonic partial.
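For a single frame, the ratio can be sketched from a list of harmonic amplitudes. The helper name is hypothetical; the sketch assumes that index 0 holds the first harmonic and, following the H / 2 upper limit of the sums, drops a trailing unpaired harmonic when H is odd.

```python
import numpy as np

def odd_to_even_ratio_sketch(amplitudes):
    """Energy of odd harmonics divided by energy of even harmonics (one frame)."""
    a = np.asarray(amplitudes, dtype=float)
    a = a[: (len(a) // 2) * 2]   # keep H / 2 odd-even pairs
    odd = a[0::2]                # harmonics 1, 3, 5, ... (index 0 is harmonic 1)
    even = a[1::2]               # harmonics 2, 4, 6, ...
    return np.sum(odd ** 2) / np.sum(even ** 2)

clarinet_like = [1.0, 0.1, 0.5, 0.1]   # strong odd partials
balanced = [1.0, 1.0, 0.5, 0.5]
ratio_clarinet = odd_to_even_ratio_sketch(clarinet_like)
ratio_balanced = odd_to_even_ratio_sketch(balanced)
print(ratio_clarinet, ratio_balanced)
```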

iracema.features.local_tempo(onsets, nominal_ioi_durations)

Calculate the local tempo for a list of note onsets.

Parameters

onsets (PointList) – List of note onset points.
nominal_ioi_durations (list) – List containing the nominal durations of the IOIs for the excerpt (based on the score).

Returns

local_tempo – Numpy array containing the local tempos for each IOI.

Return type

np.array
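The exact formula used by local_tempo is not given here. One common way to define a local tempo, shown below purely as an assumption, is to divide the nominal duration of each inter-onset interval (in beats) by its measured duration (in seconds) and scale to beats per minute. The helper name, the units, and the use of plain lists instead of a PointList are all hypothetical.

```python
import numpy as np

def local_tempo_sketch(onset_times, nominal_ioi_durations):
    """Tempo in beats per minute for each inter-onset interval (IOI).

    Assumes onset times in seconds and nominal IOI durations in beats.
    """
    onset_times = np.asarray(onset_times, dtype=float)
    nominal = np.asarray(nominal_ioi_durations, dtype=float)
    measured_ioi = np.diff(onset_times)   # seconds between consecutive onsets
    return 60.0 * nominal / measured_ioi

onsets = [0.0, 0.5, 1.5]   # three note onsets, in seconds
nominal = [1.0, 2.0]       # a one-beat note followed by a two-beat note
tempo = local_tempo_sketch(onsets, nominal)
print(tempo)
```

Note that N onsets yield N - 1 intervals, so the nominal durations list has one entry fewer than the onset list.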

iracema.features.legato_index(audio, note_list, window=1024, hop=441)

Estimate the legato index for the given audio and note list.

Parameters

audio (Audio) – Audio object.
note_list (list) – List of dictionaries containing the note envelope points.
window (int) – Window size (default: 1024).
hop (int) – Hop size (default: 441).

Returns

legato_indexes – Numpy array with the calculated legato index for each note.

Return type

np.array