iracema.features
This module contains the implementation of feature extractors.
References
- Bello2005
Bello, J. P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., & Sandler, M. B. (2005). A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 1035–1046.
- Dixon2006
Dixon, S. (2006). Onset Detection Revisited. In 9th International Conference on Digital Audio Effects (pp. 133–137). Montreal, Canada.
- Lerch2012
Lerch, A. (2012). An introduction to audio content analysis: Applications in signal processing and music informatics.
- Park2004
Park, T. H. (2004). Towards automatic musical instrument timbre recognition. Princeton University.
- Park2010
Park, T. H. (2010). Introduction to digital signal processing: Computer musically speaking. World Scientific Publishing Co. Pte. Ltd.
- Peeters2011
Peeters, G., Giordano, B. L., Susini, P., Misdariis, N., & McAdams, S. (2011). The timbre toolbox: extracting audio features from musical signals. The Journal of the Acoustical Society of America, 130(5).
- iracema.features.peak_envelope(time_series, window_size, hop_size)
Calculate the peak envelope of a time series.
The peak envelope consists of the peak absolute values of the amplitude within the aggregation window.
\[\operatorname{PE} = \max(|x(n)|), \quad 1 \leq n \leq L\]
Where x(n) is the n-th sample of a window of length L.
- Parameters
time_series (iracema.core.timeseries.TimeSeries) – An audio time-series object.
window_size (int) –
hop_size (int) –
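The peak envelope definition above can be sketched in plain Python. This is only an illustration: `peak_envelope_sketch` is a hypothetical helper, not the iracema API, and the framing scheme (no padding, windows that do not fit are dropped) is an assumption.

```python
def peak_envelope_sketch(samples, window_size, hop_size):
    """Peak absolute value of each window (hypothetical helper, not iracema's API)."""
    peaks = []
    # Assumed framing: slide a window of `window_size` samples,
    # advancing by `hop_size` samples, without padding the ends.
    for start in range(0, len(samples) - window_size + 1, hop_size):
        window = samples[start:start + window_size]
        peaks.append(max(abs(x) for x in window))
    return peaks


signal = [0.1, -0.5, 0.3, 0.2, -0.9, 0.4, 0.0, 0.6]
envelope = peak_envelope_sketch(signal, window_size=4, hop_size=2)
print(envelope)
```

With a window of 4 samples and a hop of 2, consecutive windows overlap by half, so a single large peak can appear in more than one output value.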
- iracema.features.rms(time_series, window_size, hop_size)
Calculate the root mean square of a time series.
The RMS envelope consists of the root mean square of the amplitude, calculated within the aggregation window.
\[\operatorname{RMS} = \sqrt{ \frac{1}{L} \sum_{n=1}^{L} x(n)^2 }\]
Where x(n) is the n-th sample of a window of length L.
- Parameters
time_series (iracema.core.timeseries.TimeSeries) – A time-series object. It is usually applied on Audio objects.
window_size (int) –
hop_size (int) –
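As a minimal sketch of the RMS formula for a single window, assuming the samples are already available as a plain list of floats (`rms_sketch` is a hypothetical name, not part of iracema):

```python
import math


def rms_sketch(window):
    """Root mean square of one window of samples (hypothetical helper)."""
    mean_square = sum(x * x for x in window) / len(window)
    return math.sqrt(mean_square)


# A full-scale square wave has an RMS equal to its amplitude.
square_rms = rms_sketch([1.0, -1.0, 1.0, -1.0])
half_rms = rms_sketch([0.5, -0.5])
print(square_rms, half_rms)
```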
- iracema.features.zcr(time_series, window_size, hop_size)
Calculate the zero-crossing rate of a time series, i.e., the number of times the signal crosses the zero axis per second.
The zero-crossing rate gives some insight into the noisy character of a sound. In noisy / unvoiced signals, the zero-crossing rate tends to reach higher values than in periodic / voiced signals.
\[\operatorname{ZC} = \frac{1}{2 L} \sum_{n=1}^{L}\left|\operatorname{sgn}\left[x(n)\right]-\operatorname{sgn}\left[x(n-1)\right]\right|\]
Where
\[\operatorname{sgn}\left[x(n)\right]=\begin{cases} 1, & x(n) \geq 0 \\ -1, & x(n)<0 \end{cases}\]
And x(n) is the n-th sample of a window of length L.
- Parameters
time_series (iracema.core.timeseries.TimeSeries) – A time-series object. It is usually applied on Audio objects.
window_size (int) –
hop_size (int) –
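The ZC formula can be sketched for one window as follows. The helper names are hypothetical, and the window is assumed to hold the samples x(0) to x(L), so that there are L consecutive pairs. As written, the formula yields crossings per sample; converting to a per-second rate would presumably involve the sampling rate, which is an assumption and is not shown here.

```python
def sgn(x):
    """Sign function as defined above: 1 for x >= 0, otherwise -1."""
    return 1 if x >= 0 else -1


def zero_crossing_sketch(window):
    """Normalized zero-crossing count for one window (hypothetical helper)."""
    length = len(window) - 1  # L consecutive sample pairs
    total = sum(
        abs(sgn(window[n]) - sgn(window[n - 1]))
        for n in range(1, length + 1)
    )
    # Each crossing contributes 2 to the sum, hence the factor 2 * L.
    return total / (2 * length)


alternating = zero_crossing_sketch([1.0, -1.0, 1.0, -1.0, 1.0])
single = zero_crossing_sketch([1.0, 2.0, 3.0, -1.0, -2.0])
print(alternating, single)
```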
- iracema.features.spectral_flatness(stft)
Calculate the spectral flatness for a given STFT.
The spectral flatness gives an estimation of the noisiness / sinusoidality of an audio signal (for the whole spectrum or for a frequency range). It can be used to determine voiced / unvoiced parts of a signal [Park2004].
It is defined as the ratio between the geometric mean and the arithmetic mean of the energy spectrum:
\[\operatorname{SFM} = 10 \log_{10} \left( \frac {\left( \prod_{k=1}^{N} |X(k)| \right)^\frac{1}{N}} { \frac{1}{N} \sum_{k=1}^{N} |X(k)| } \right)\]
Where X(k) is the result of the STFT for the k-th frequency bin.
- Parameters
stft (iracema.spectral.STFT) – A STFT object
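A rough sketch of the SFM formula for a single frame, assuming the magnitudes |X(k)| are given as a list of strictly positive floats. `spectral_flatness_sketch` is a hypothetical helper, and computing the geometric mean through logarithms is a choice made here to avoid underflow in the product, not necessarily what the library does.

```python
import math


def spectral_flatness_sketch(magnitudes):
    """Spectral flatness in dB for one frame (hypothetical helper)."""
    n = len(magnitudes)
    # Geometric mean via the mean of logs; requires all magnitudes > 0.
    geometric_mean = math.exp(sum(math.log(m) for m in magnitudes) / n)
    arithmetic_mean = sum(magnitudes) / n
    return 10 * math.log10(geometric_mean / arithmetic_mean)


flat = spectral_flatness_sketch([1.0, 1.0, 1.0, 1.0])     # noise-like frame
peaky = spectral_flatness_sketch([1e-3, 1e-3, 1.0, 1e-3])  # one dominant bin
print(flat, peaky)
```

A perfectly flat spectrum gives 0 dB, and the value becomes more negative as the energy concentrates in fewer bins.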
- iracema.features.hfc(stft, method='energy')
Calculate the high frequency content for a STFT time-series.
The HFC function produces sharp peaks during attacks or transients [Bello2005] and might be a good choice for detecting onsets in percussive sounds.
\[\operatorname{HFC} = \sum_{k=1}^{N} |X(k)|^2 \cdot k\]
Alternatively, you can set method = 'amplitude' instead of 'energy' (default value):
\[\operatorname{HFC} = \sum_{k=1}^{N} |X(k)| \cdot k\]
- Parameters
stft (iracema.spectral.STFT) – STFT time-series.
method (str) – Method of choice to calculate the HFC.
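The two HFC variants can be sketched for one frame as below. `hfc_sketch` is a hypothetical helper that mirrors the `method` argument for illustration only, and it assumes the bins are numbered from k = 1 as in the formulas.

```python
def hfc_sketch(magnitudes, method="energy"):
    """High frequency content of one frame (hypothetical helper)."""
    if method == "energy":
        values = [m ** 2 for m in magnitudes]
    elif method == "amplitude":
        values = list(magnitudes)
    else:
        raise ValueError("method must be 'energy' or 'amplitude'")
    # Weight each bin by its index k, starting at 1, so that
    # higher-frequency bins contribute more to the sum.
    return sum(v * k for k, v in enumerate(values, start=1))


frame = [1.0, 2.0, 3.0]
hfc_energy = hfc_sketch(frame)
hfc_amplitude = hfc_sketch(frame, method="amplitude")
print(hfc_energy, hfc_amplitude)
```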
- iracema.features.spectral_centroid(stft)
Calculate the spectral centroid for a STFT time-series.
The spectral centroid is a well-known timbral feature that is used to describe the brightness of a sound. It represents the center of gravity of the frequency components of a signal [Park2010].
\[\operatorname{SC} = \frac{\sum_{k=1}^{N} |X(k)| \cdot f_k }{\sum_{k=1}^{N} |X(k)|}\]
Where X(k) is the result of the STFT for the k-th frequency bin and \(f_k\) is the frequency corresponding to that bin.
- Parameters
stft (iracema.spectral.STFT) – A STFT object
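As an illustration of the SC formula, the sketch below computes the centroid of one frame from two plain lists. `spectral_centroid_sketch` and its arguments are hypothetical names, and the bin frequencies are assumed to be known in Hz.

```python
def spectral_centroid_sketch(magnitudes, frequencies):
    """Magnitude-weighted mean frequency of one frame (hypothetical helper)."""
    weighted_sum = sum(m * f for m, f in zip(magnitudes, frequencies))
    return weighted_sum / sum(magnitudes)


# More magnitude in the highest bin pulls the centroid upwards.
centroid = spectral_centroid_sketch([1.0, 1.0, 2.0], [100.0, 200.0, 300.0])
print(centroid)
```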
- iracema.features.spectral_spread(stft)
Calculate the spectral spread for a STFT time-series.
The spectral spread represents the spread of the spectrum around the spectral centroid [Peeters2011], [Lerch2012].
\[\operatorname{SSp} = \sqrt{\frac{\sum_{k=1}^{N} |X(k)| \cdot (f_k - \operatorname{SC})^2 }{\sum_{k=1}^{N} |X(k)|}}\]
Where X(k) is the result of the STFT for the k-th frequency bin and SC is the spectral centroid for the frame.
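The spread can be read as a magnitude-weighted standard deviation of the bin frequencies. A minimal sketch for one frame, with hypothetical names and bin frequencies assumed to be given in Hz:

```python
import math


def spectral_spread_sketch(magnitudes, frequencies):
    """Magnitude-weighted standard deviation around the centroid (hypothetical helper)."""
    pairs = list(zip(magnitudes, frequencies))
    total = sum(magnitudes)
    centroid = sum(m * f for m, f in pairs) / total
    variance = sum(m * (f - centroid) ** 2 for m, f in pairs) / total
    return math.sqrt(variance)


bins = [100.0, 200.0, 300.0]
narrow = spectral_spread_sketch([0.0, 1.0, 0.0], bins)  # single partial
wide = spectral_spread_sketch([1.0, 0.0, 1.0], bins)    # energy at the edges
print(narrow, wide)
```

Both frames share the same centroid (200 Hz), but only the second one has magnitude away from it, which is what the spread captures.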
- iracema.features.spectral_skewness(stft)
Calculate the spectral skewness for an STFT time series.
The spectral skewness is a measure of the asymmetry of the distribution of the spectrum around its mean value, and is calculated from its third-order moment. It will output negative values when the spectrum has more energy below the mean value, and positive values when it has more energy above the mean. Symmetric distributions will output the value zero [Lerch2012].
\[\operatorname{SSk} = \frac{2 \cdot \sum_{k=1}^{N} \left( |X(k)| - \mu_{|X|} \right)^3 }{ N \cdot \sigma_{|X|}^3}\]
Where \(\mu_{|X|}\) is the mean value of the magnitude spectrum and \(\sigma_{|X|}\) its standard deviation.
- iracema.features.spectral_kurtosis(stft)
Calculate the spectral kurtosis for an STFT time series.
The spectral kurtosis is a measure of the flatness of the distribution of the spectrum around its mean value. It will output the value 3 for Gaussian distributions. Values smaller than 3 represent flatter distributions, while values larger than 3 represent peakier distributions [Lerch2012].
\[\operatorname{SKu} = \frac{2 \cdot \sum_{k=1}^{N} \left( |X(k)| - \mu_{|X|} \right)^4 }{ N \cdot \sigma_{|X|}^4}\]
Where \(\mu_{|X|}\) is the mean value of the magnitude spectrum and \(\sigma_{|X|}\) its standard deviation.
- iracema.features.spectral_flux(stft, method='hwrdiff')
Calculate the spectral flux for a STFT time-series.
The spectral flux measures the amount of change between successive spectral frames. There are different methods to calculate the spectral flux across the literature. For now we have implemented the one proposed by [Dixon2006].
\[\operatorname{SF} = \sum_{k=1}^{N} H(|X(t, k)| - |X(t-1, k)|)\]
Where \(H(x) = \frac{x+|x|}{2}\) is the half-wave rectifier function, and t is the temporal index of the frame.
- Parameters
stft (iracema.spectral.STFT) – A STFT object
method (str) – ‘hwrdiff’ or ‘corr’
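The half-wave rectified difference in the SF formula can be sketched for a pair of consecutive frames as follows. The helper names are hypothetical, and both frames are assumed to be lists of magnitudes with the same number of bins.

```python
def half_wave_rectify(x):
    """H(x) = (x + |x|) / 2: keeps positive values, zeroes negative ones."""
    return (x + abs(x)) / 2


def spectral_flux_sketch(previous_frame, current_frame):
    """Sum of the positive magnitude changes between two frames (hypothetical helper)."""
    return sum(
        half_wave_rectify(current - previous)
        for previous, current in zip(previous_frame, current_frame)
    )


quiet = [1.0, 2.0, 3.0]
louder = [2.0, 1.0, 5.0]
rising = spectral_flux_sketch(quiet, louder)   # bins 1 and 3 increase
falling = spectral_flux_sketch(louder, quiet)  # only bin 2 increases
print(rising, falling)
```

Because only increases in magnitude are counted, the measure responds to onsets and largely ignores decays.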
- iracema.features.harmonic_centroid(harmonics)
Calculate the harmonic centroid for the harmonics time series.
The harmonic centroid represents the center of gravity of the amplitudes of the harmonic series.
\[\operatorname{HC} = \frac{\sum_{h=1}^{H} A(h) \cdot f_h }{\sum_{h=1}^{H} A(h)}\]
Where \(A(h)\) represents the amplitude of the h-th harmonic partial and \(f_h\) its frequency.
- iracema.features.harmonic_energy(harmonics_magnitude)
Calculate the energy of harmonic partials.
Harmonic energy is the energy of the harmonic partials of a signal.
\[\operatorname{HE} = \sum_{h=1}^{H} A(h)^2\]
Where \(A(h)\) represents the amplitude of the h-th harmonic partial.
- iracema.features.spectral_entropy(stft)
Calculate the spectral entropy for a STFT time series.
The spectral entropy is based on the concept of information entropy from Shannon’s information theory. It measures the unpredictability of the given state of a spectral distribution.
\[\operatorname{SEpy} = - \sum_{k=1}^{N} P(k) \cdot \log_2 P(k)\]
Where
\[P(k)=\frac{|X(k)|^2}{\sum_{j=1}^{N} |X(j)|^2}\]
More info at https://www.mathworks.com/help/signal/ref/pentropy.html.
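A small sketch of the entropy computation for one frame, assuming the magnitudes are given as a list of floats. `spectral_entropy_sketch` is a hypothetical helper, and treating bins with zero probability as contributing nothing to the sum is the usual convention, adopted here as an assumption.

```python
import math


def spectral_entropy_sketch(magnitudes):
    """Shannon entropy (in bits) of the normalized energy spectrum (hypothetical helper)."""
    energies = [m ** 2 for m in magnitudes]
    total = sum(energies)
    probabilities = [e / total for e in energies]
    # Bins with zero probability are skipped (0 * log2(0) is taken as 0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


uniform = spectral_entropy_sketch([1.0, 1.0, 1.0, 1.0])  # maximally unpredictable
single = spectral_entropy_sketch([0.0, 1.0, 0.0, 0.0])   # fully predictable
print(uniform, single)
```

For N bins the result ranges from 0 (all energy in one bin) to log2(N) (energy spread evenly), so four equal bins give 2 bits.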
- iracema.features.spectral_energy(stft)
Calculate the total energy of an STFT frame.
Spectral energy is the total energy of an STFT frame.
\[\operatorname{SE} = \sum_{k=1}^{N} |X(k)|^2\]
Where X(k) is the result of the STFT for the k-th frequency bin.
- iracema.features.noisiness(stft, harmonics_magnitude)
Calculate the noisiness for the given STFT and harmonics time series.
The noisiness represents how noisy a signal is (values closer to 1), as opposed to harmonic (values closer to 0). It is the ratio of the noise energy to the total energy of a signal [Peeters2011].
\[\operatorname{Ns} = \frac{\operatorname{SE}-\operatorname{HE}}{\operatorname{SE}}\]
Where SE is the spectral energy and HE is the harmonic energy of the frame.
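The ratio can be sketched for a single frame as below. This assumes that SE is the sum of the squared STFT magnitudes (in line with the description of spectral energy as the total energy of a frame) and that HE is the sum of the squared harmonic amplitudes. `noisiness_sketch` is a hypothetical helper, not the library function.

```python
def noisiness_sketch(stft_magnitudes, harmonic_amplitudes):
    """Share of the frame energy not explained by the harmonics (hypothetical helper)."""
    # Assumption: both energies are plain sums of squared values.
    spectral_energy = sum(m ** 2 for m in stft_magnitudes)
    harmonic_energy = sum(a ** 2 for a in harmonic_amplitudes)
    return (spectral_energy - harmonic_energy) / spectral_energy


# Two strong harmonic peaks plus some residual energy in the other bins.
frame = [2.0, 1.0, 2.0, 1.0]
harmonics = [2.0, 2.0]
noisiness_value = noisiness_sketch(frame, harmonics)
print(noisiness_value)
```

Here the harmonics account for 8 of the 10 units of energy, so the remaining fifth is attributed to noise.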
- iracema.features.oer(harmonics)
Calculate the odd-to-even ratio for the harmonics time series.
The OER represents the odd-to-even ratio among the harmonics of an audio signal. This value will be higher for sounds with predominantly odd harmonics, such as the clarinet.
\[\operatorname{OER}=\frac{\sum_{h=1}^{H / 2} A(2 h - 1)^{2}\left(t_{m}\right)}{\sum_{h=1}^{H / 2} A(2 h)^{2}\left(t_{m}\right)}\]
Where \(A(h)\) represents the amplitude of the h-th harmonic partial.
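For a single frame, the OER formula can be sketched as follows. `oer_sketch` is a hypothetical helper, and the list is assumed to hold an even number of harmonic amplitudes, with the fundamental (harmonic 1) at index 0.

```python
def oer_sketch(harmonic_amplitudes):
    """Odd-to-even energy ratio for one frame (hypothetical helper)."""
    # Index 0 is harmonic 1, so even list indices hold the odd harmonics.
    odd_energy = sum(a ** 2 for a in harmonic_amplitudes[0::2])
    even_energy = sum(a ** 2 for a in harmonic_amplitudes[1::2])
    return odd_energy / even_energy


# Odd harmonics (1st, 3rd) stronger than even ones (2nd, 4th).
odd_heavy = oer_sketch([1.0, 0.5, 1.0, 0.5])
balanced = oer_sketch([1.0, 1.0, 1.0, 1.0])
print(odd_heavy, balanced)
```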
- iracema.features.local_tempo(onsets, nominal_ioi_durations)
Calculate the local tempo for a list of note onsets.
- Parameters
onsets (PointList) – List of note onset points.
nominal_ioi_durations (list) – List containing the nominal durations of the inter-onset intervals (IOIs) for the excerpt (based on the score).
- Returns
local_tempo – Numpy array containing the local tempos for each IOI.
- Return type
np.array
- iracema.features.legato_index(audio, note_list, window=1024, hop=441)
Estimate the legato index for the given audio and note list.
- Parameters
audio (Audio) – Audio object.
note_list (list) – List of dictionaries containing the note envelope points.
window (int) –
hop (int) –
- Returns
legato_indexes – Numpy array with the calculated legato index for each note.
- Return type
np.array