iracema.pitch

According to the ANSI standard 1994 [ANSI1994], “Pitch is that attribute of auditory sensation in terms of which sounds may be ordered on a scale extending from low to high. Pitch depends mainly on the frequency content of the sound stimulus, but it also depends on the sound pressure and the waveform of the stimulus.”

This module contains the implementation of a few different pitch estimation methods.

References

ANSI1994: American National Standards Institute, 1994.

iracema.pitch.hps(fft_time_series, minf0, maxf0, n_downsampling=16, decimation='discard')

Extract the pitch using Harmonic Product Spectrum.

The Harmonic Product Spectrum measures the maximum coincidence for harmonics [Cuadra2001]. It is based on successive downsampling operations on the frequency spectrum of the signal. If the signal contains harmonic components, then it should contain energy at the frequency positions corresponding to the integer multiples of the fundamental frequency. So, by downsampling the spectrum by increasing integer factors \((1,2,3,...,R)\), it is possible to align the energy of its harmonic components with the fundamental frequency of the signal.

The original spectrum and its downsampled versions are then multiplied together. This operation makes a strong peak appear at the position that corresponds to the fundamental frequency. The product is given by the equation:

\[Y(\omega) = \prod_{r=1}^{R} |X(\omega r)|\]

where \(X(\omega r)\) represents one spectral frame downsampled by the factor \(r\), and \(R\) is the number of harmonics to be considered in the calculation. After this calculation, a simple peak detection algorithm is used to obtain the fundamental frequency of the frame.

This implementation modifies this approach by adding an offset of 1 to the magnitude spectrum of the signal before applying the product shown in the equation above. This makes the algorithm more reliable when some harmonics have very little or no energy, since floating-point arithmetic is not reliable when values get too close to zero.
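As a rough illustration of the steps described above, the following NumPy sketch applies the offset product to a single magnitude frame and picks the peak inside the search range. It is not iracema's implementation: the function hps_frame, its sample_rate and fft_len arguments, and the synthetic frame are hypothetical names introduced here, and only the 'discard' style of downsampling is shown.

```python
import numpy as np


def hps_frame(magnitudes, sample_rate, fft_len, minf0, maxf0, n_downsampling=16):
    """Estimate the f0 (in Hz) of one magnitude-spectrum frame with a basic HPS.

    Hypothetical helper for illustration; assumes at least one bin lies
    inside [minf0, maxf0].
    """
    # Offset of 1: each factor becomes |X(omega * r)| + 1, so a harmonic with
    # no energy contributes 1 instead of collapsing the product to zero.
    spectrum = np.asarray(magnitudes, dtype=float) + 1.0

    # Number of bins that exist at every downsampling factor.
    n_bins = len(spectrum) // n_downsampling

    product = np.ones(n_bins)
    for r in range(1, n_downsampling + 1):
        # Keep every r-th bin, i.e. read the spectrum at omega * r.
        product *= spectrum[::r][:n_bins]

    # Simple peak detection restricted to the requested frequency range.
    freqs = np.arange(n_bins) * sample_rate / fft_len
    in_range = np.flatnonzero((freqs >= minf0) & (freqs <= maxf0))
    return freqs[in_range[np.argmax(product[in_range])]]


# Synthetic frame: five harmonics of bin 20 in a 2049-bin spectrum.
frame = np.zeros(2049)
frame[[20, 40, 60, 80, 100]] = [1.0, 0.8, 0.6, 0.4, 0.2]
f0 = hps_frame(frame, sample_rate=44100, fft_len=4096,
               minf0=50.0, maxf0=1000.0, n_downsampling=5)
```

In this synthetic case f0 is the centre frequency of bin 20 (about 215.3 Hz), because it is the only bin in the search range whose multiples line up with all five harmonics.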

Unlike the original approach, this implementation also allows choosing between different decimation methods, using the argument decimation.
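The exact behaviour of each decimation option is not spelled out here, so the sketch below is only one plausible reading of 'discard' and 'mean'. The decimate function is a hypothetical helper, not part of iracema, and the 'interpolation' option is left out because its details are not given.

```python
import numpy as np


def decimate(spectrum, factor, method="discard"):
    """Downsample a magnitude spectrum by an integer factor (illustrative only)."""
    spectrum = np.asarray(spectrum, dtype=float)
    n_out = len(spectrum) // factor

    if method == "discard":
        # Keep every factor-th bin and drop the ones in between.
        return spectrum[::factor][:n_out]
    if method == "mean":
        # Replace each group of `factor` consecutive bins by its average.
        return spectrum[:n_out * factor].reshape(n_out, factor).mean(axis=1)
    raise ValueError(f"unsupported decimation method in this sketch: {method!r}")
```

Discarding keeps the sharpest peaks but ignores energy that falls between the retained bins; averaging spreads that energy into the output at the cost of lower peaks.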

Parameters

fft_time_series (iracema.spectral.FFTs) – FFT time series.
minf0 (float) – Lower frequency limit to search for f0.
maxf0 (float) – Upper frequency limit to search for f0.
n_downsampling (int) – Number of downsampling operations.
decimation ('discard', 'mean' or 'interpolation') – Type of decimation operation to be performed.

Returns

pitch – A pitch time series.

Return type

TimeSeries

References

Cuadra2001: De La Cuadra, P. Efficient pitch detection techniques for interactive music. In ICMC, pages 403–406, 2001.

iracema.pitch.expan(fft_time_series, minf0=24.0, maxf0=4200.0, nharm=12, ncand=5, min_mag_cand=0.1, noisiness_tresh=0.99, perc_tol=0.04)

Extract the pitch using the Expan pitch detection algorithm.

Parameters

fft_time_series (iracema.spectral.FFTs) – FFT time series.
minf0 (float) – Lower frequency limit to search for f0.
maxf0 (float) – Upper frequency limit to search for f0.
nharm (int) – Number of harmonics to be considered.
ncand (int) – Number of f0 candidate components to be used.
min_mag_cand (float) – Minimum magnitude of the candidate to be chosen as f0.
noisiness_tresh (float) – Noisiness threshold.
perc_tol (float) – Tolerance percentage to search for harmonics.

Returns

pitch – A pitch time series.

Return type

TimeSeries

iracema.pitch.crepe(audio, model_capacity='large', min_confidence=0.0, viterbi=True)

Extract the pitch using CREPE pitch tracker.

This function uses a pitch tracker based on deep convolutional neural networks. The model was proposed and trained by [Kim2018].

Parameters

audio (iracema.core.audio.Audio) – Audio time series.
step_size (float) – Length of the time steps for the pitch extraction.
model_capacity ('tiny', 'small', 'medium', 'large', or 'full') – String specifying the capacity of the model. The value ‘full’ will use the model specified in the paper. The others will reduce the number of filters in the convolutional layers, resulting in faster computation, at the cost of slightly reduced accuracy.
min_confidence (float) – Minimum confidence to consider a pitch detection as valid.
viterbi (bool) – Viterbi smoothing for pitch curve.

Returns

pitch – A pitch time series.

Return type

TimeSeries

References

Kim2018: Kim, J. W., Salamon, J., Li, P., & Bello, J. P. (2018). CREPE: A Convolutional Representation for Pitch Estimation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
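To make the role of min_confidence more concrete, the sketch below gates a pitch curve by a per-frame confidence array using plain NumPy. It does not call CREPE or iracema: gate_by_confidence is a hypothetical helper, and marking rejected frames with NaN is an assumption of this sketch, not a documented behaviour of the function above.

```python
import numpy as np


def gate_by_confidence(frequency, confidence, min_confidence=0.0):
    """Keep pitch values whose confidence reaches the threshold (illustrative only)."""
    frequency = np.asarray(frequency, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    # Assumption: frames below the threshold are marked as NaN.
    return np.where(confidence >= min_confidence, frequency, np.nan)


gated = gate_by_confidence([440.0, 442.0, 100.0], [0.9, 0.8, 0.2], min_confidence=0.5)
```

With the default threshold of 0.0, every frame with a non-negative confidence passes through unchanged.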

iracema.pitch.pitch_filter(pitch_time_series, delta_max=0.04)

The pitch curve can be noisy; this function tries to smooth it.

Parameters: delta_max (float) – Delta parameter for the smoothing algorithm.
Returns: pitch – A smoothed pitch time series.
Return type: TimeSeries

iracema.pitch.pitch_mode(pitch_time_series, window=9)

Apply a windowed mode to the pitch curve to remove noise.

Parameters: window (int) – Length of the window for the calculation of the mode.
Returns: pitch – A smoothed pitch time series.
Return type: TimeSeries
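The idea of a windowed mode can be sketched with NumPy as follows. This is an illustration operating on a plain array, not iracema's code: windowed_mode is a hypothetical name, the window is assumed to be centred on each sample and truncated at the edges, and the pitch values are assumed to be quantized (for example, to FFT bin frequencies) so that repeated values exist.

```python
import numpy as np


def windowed_mode(values, window=9):
    """Replace each sample by the most frequent value in a centred window."""
    values = np.asarray(values)
    half = window // 2
    out = np.empty_like(values)
    for i in range(len(values)):
        # Window centred on sample i, truncated at the array boundaries.
        segment = values[max(0, i - half): i + half + 1]
        uniq, counts = np.unique(segment, return_counts=True)
        # argmax returns the first maximum, so ties go to the smallest value.
        out[i] = uniq[np.argmax(counts)]
    return out


smoothed = windowed_mode(np.array([100.0, 100.0, 200.0, 100.0, 100.0]), window=3)
```

In this example the isolated jump to 200 is replaced by 100, the most frequent value in its neighbourhood.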