F0Extractor
Defined in: voxatlas.features.acoustic.pitch.f0
- class voxatlas.features.acoustic.pitch.f0.F0Extractor[source]
Bases: BaseExtractor

Extract the acoustic.pitch.f0 feature within the VoxAtlas pipeline.

This public extractor defines the reusable API for computing acoustic.pitch.f0 from VoxAtlas structured inputs. It consumes None units and produces values aligned to frame units, making the extractor a stable pipeline node that can be cited independently of the surrounding execution machinery.

Algorithm
The implementation estimates a frame-level fundamental-frequency contour using autocorrelation over short analysis windows.
Framing and centering
The waveform is segmented into overlapping frames of length frame_length and hop frame_step. Each frame is mean-centered before periodicity analysis.

Period search
For each frame, the code evaluates the one-sided autocorrelation function

\[R(\tau) = \sum_{n=0}^{N-\tau-1} x[n]\,x[n+\tau],\]

and restricts candidate lags to the interval implied by fmin and fmax.

Voicing decision
The winning lag \(\tau^*\) is accepted only when the normalized autocorrelation peak exceeds the implementation threshold. The output frequency is then

\[\hat f_0 = \frac{f_s}{\tau^*}.\]

Packaging
Unvoiced or low-energy frames are set to NaN, and the resulting contour is returned as a frame-aligned VectorFeatureOutput for downstream voice-quality and prosodic features.
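The four steps above can be sketched as a standalone NumPy routine. This is an illustrative sketch, not the library implementation: the function name, the 0.3 voicing threshold, and the zero-energy guard are assumptions, while frame_length, frame_step, fmin, and fmax mirror the documented default_config.

```python
import numpy as np

def autocorr_f0_sketch(waveform, fs, frame_length=0.04, frame_step=0.01,
                       fmin=75.0, fmax=500.0, voicing_threshold=0.3):
    """Frame-level F0 contour via autocorrelation (illustrative sketch)."""
    n = int(frame_length * fs)            # samples per analysis frame
    hop = int(frame_step * fs)            # samples per hop
    lag_min = int(fs / fmax)              # shortest candidate period
    lag_max = min(int(fs / fmin), n - 1)  # longest candidate period
    f0 = []
    for start in range(0, len(waveform) - n + 1, hop):
        x = waveform[start:start + n].astype(np.float64)
        x -= x.mean()                     # mean-centering
        # One-sided autocorrelation R(tau) for lags 0..n-1
        r = np.correlate(x, x, mode="full")[n - 1:]
        if r[0] <= 0.0:                   # low-energy frame -> unvoiced
            f0.append(np.nan)
            continue
        # Restrict the lag search to the interval implied by fmin/fmax
        tau = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
        # Voicing decision: normalized peak must clear the threshold
        if r[tau] / r[0] < voicing_threshold:
            f0.append(np.nan)
        else:
            f0.append(fs / tau)           # f0_hat = fs / tau*
    return np.asarray(f0)
```

On a pure 100 Hz sine the contour sits near 100 Hz, while silent input yields all-NaN frames, matching the packaging step.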
Examples
>>> import numpy as np
>>> from voxatlas.audio.audio import Audio
>>> from voxatlas.features.acoustic.pitch.f0 import F0Extractor
>>> from voxatlas.features.feature_input import FeatureInput
>>> sr = 16000
>>> t = np.arange(0, sr // 10) / sr
>>> waveform = (0.1 * np.sin(2 * np.pi * 100 * t)).astype(np.float32)
>>> audio = Audio(waveform=waveform, sample_rate=sr)
>>> feature_input = FeatureInput(audio=audio, units=None, context={})
>>> params = F0Extractor.default_config.copy()
>>> out = F0Extractor().compute(feature_input, params)
>>> out.unit
'frame'
- name: str = 'acoustic.pitch.f0'
- input_units: str | None = None
- output_units: str | None = 'frame'
- dependencies: list[str] = []
- default_config: dict = {'fmax': 500.0, 'fmin': 75.0, 'frame_length': 0.04, 'frame_step': 0.01}
- compute(feature_input, params)[source]
Compute the frame-aligned F0 contour for one stream.
- Parameters:
feature_input (FeatureInput) – Prepared stream input containing audio and execution context.
params (dict) – Resolved extractor configuration.
- Returns:
Frame-aligned F0 contour.
- Return type:
VectorFeatureOutput
- Raises:
ValueError – Raised when audio input is unavailable.
Notes
The output time axis matches the analysis frames used during F0 estimation.
Examples
>>> import numpy as np
>>> from voxatlas.audio.audio import Audio
>>> from voxatlas.features.acoustic.pitch.f0 import F0Extractor
>>> from voxatlas.features.feature_input import FeatureInput
>>> sr = 16000
>>> t = np.arange(0, sr // 10) / sr
>>> waveform = (0.1 * np.sin(2 * np.pi * 100 * t)).astype(np.float32)
>>> audio = Audio(waveform=waveform, sample_rate=sr)
>>> feature_input = FeatureInput(audio=audio, units=None, context={})
>>> params = F0Extractor.default_config.copy()
>>> out = F0Extractor().compute(feature_input, params)
>>> out.time.shape[0] == out.values.shape[0]
True
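Because unvoiced frames are returned as NaN, downstream voice-quality and prosodic consumers typically mask them before computing statistics. A minimal NumPy sketch (the helper name is illustrative, not part of the VoxAtlas API):

```python
import numpy as np

def voiced_stats(f0_values):
    """Mean F0 and voicing rate over a frame-aligned contour with NaN gaps."""
    values = np.asarray(f0_values, dtype=float)
    voiced = values[~np.isnan(values)]   # drop unvoiced (NaN) frames
    if voiced.size == 0:
        return float("nan"), 0.0         # fully unvoiced stream
    return float(voiced.mean()), voiced.size / values.size

mean_f0, voicing_rate = voiced_stats([100.0, np.nan, 110.0, 105.0])
```

Here three of four frames are voiced, so the voicing rate is 0.75 and the mean is taken over the voiced frames only.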