LogEnergyEnvelope
Defined in: voxatlas.features.acoustic.envelope.log_energy
- class voxatlas.features.acoustic.envelope.log_energy.LogEnergyEnvelope
Bases: BaseExtractor
Extract the acoustic.envelope.log_energy feature within the VoxAtlas pipeline.
Computes a smoothed, frame-aligned log-energy contour from the waveform by applying a logarithmic transform to a non-negative RMS amplitude envelope.
Algorithm
The steps below are listed in the order the code applies them.
RMS amplitude: The waveform is framed and converted to RMS values \(r_t \ge 0\).
Log transform: VoxAtlas computes
\[e_t = \log(\max(r_t, \varepsilon)),\]
where \(\varepsilon\) is a small numerical floor.
Smoothing: The resulting contour is optionally smoothed with a moving-average window of length smoothing frames.
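The three steps above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the VoxAtlas source: the function name log_energy_sketch, the floor value eps=1e-10, the zero-padding of short signals, and the edge handling of the moving average are all hypothetical choices, and frame_length and frame_step are assumed to be durations in seconds. The peak_threshold parameter is left out because this section does not describe its role.

```python
import numpy as np


def log_energy_sketch(waveform, sample_rate, frame_length=0.025,
                      frame_step=0.01, smoothing=1, eps=1e-10):
    """Illustrative log-energy contour (not the VoxAtlas source)."""
    x = np.asarray(waveform, dtype=np.float64)
    # Assumption: frame_length and frame_step are durations in seconds.
    win = max(1, int(round(frame_length * sample_rate)))
    hop = max(1, int(round(frame_step * sample_rate)))
    # Assumption: zero-pad signals shorter than one frame; no centring.
    if x.size < win:
        x = np.pad(x, (0, win - x.size))
    n_frames = 1 + (x.size - win) // hop

    # Step 1: RMS amplitude per frame, r_t >= 0.
    rms = np.array([
        np.sqrt(np.mean(x[t * hop:t * hop + win] ** 2))
        for t in range(n_frames)
    ])

    # Step 2: floored log transform, e_t = log(max(r_t, eps)).
    log_e = np.log(np.maximum(rms, eps))

    # Step 3: optional moving average over `smoothing` frames.
    # Edge values are repeated so the output keeps one value per frame.
    if smoothing > 1:
        left = smoothing // 2
        right = smoothing - 1 - left
        padded = np.pad(log_e, (left, right), mode="edge")
        log_e = np.convolve(padded, np.ones(smoothing) / smoothing, mode="valid")
    return log_e


# Silence hits the floor in every frame, so the contour is constant at log(eps).
silence = np.zeros(1600, dtype=np.float32)
contour = log_energy_sketch(silence, 16000, smoothing=3)
```

The floor is what keeps silent frames finite: without it, an RMS of zero would map to negative infinity.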
- name
Registry key for this extractor ("acoustic.envelope.log_energy").
- Type:
str
- input_units
Required input unit level. None means this extractor operates directly on waveform audio.
- Type:
str | None
- output_units
Output alignment unit ("frame").
- Type:
str | None
- dependencies
Upstream features required before execution. Empty for this extractor.
- Type:
list[str]
- default_config
Default runtime parameters: frame_length=0.025, frame_step=0.01, peak_threshold=0.1, smoothing=1.
- Type:
dict
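As a rough guide to what the defaults mean in practice, the snippet below copies them into a plain dictionary, overrides one value without mutating the original, and converts the frame settings to sample counts. It assumes frame_length and frame_step are expressed in seconds, which this page does not state explicitly; DEFAULTS and frame_sizes are hypothetical names used only for illustration.

```python
# Values copied from default_config above.
DEFAULTS = {
    "frame_length": 0.025,
    "frame_step": 0.01,
    "peak_threshold": 0.1,
    "smoothing": 1,
}


def frame_sizes(params, sample_rate):
    """Convert frame settings to samples (assumes they are in seconds)."""
    win = int(round(params["frame_length"] * sample_rate))
    hop = int(round(params["frame_step"] * sample_rate))
    return win, hop


# Build a modified copy; DEFAULTS itself is left unchanged.
params = {**DEFAULTS, "smoothing": 5}
win, hop = frame_sizes(params, 16000)
```

Under that assumption, a 16 kHz signal uses 400-sample frames advanced by 160 samples.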
References
Davis, S. B., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. *IEEE Transactions on Acoustics, Speech, and Signal Processing, 28*(4), 357–366. https://doi.org/10.1109/TASSP.1980.1163420
Examples
>>> import numpy as np
>>> from voxatlas.audio.audio import Audio
>>> from voxatlas.features.acoustic.envelope.log_energy import LogEnergyEnvelope
>>> from voxatlas.features.feature_input import FeatureInput
>>> audio = Audio(waveform=np.zeros(1600, dtype=np.float32), sample_rate=16000)
>>> feature_input = FeatureInput(audio=audio, units=None, context={})
>>> params = LogEnergyEnvelope.default_config.copy()
>>> out = LogEnergyEnvelope().compute(feature_input, params)
>>> out.unit
'frame'
- name: str = 'acoustic.envelope.log_energy'
- input_units: str | None = None
- output_units: str | None = 'frame'
- dependencies: list[str] = []
- default_config: dict = {'frame_length': 0.025, 'frame_step': 0.01, 'peak_threshold': 0.1, 'smoothing': 1}
- compute(feature_input, params)
Compute the log-energy contour for one stream.
- Parameters:
feature_input (FeatureInput) – Prepared stream input containing audio and execution context.
params (dict) – Resolved extractor configuration.
- Returns:
Frame-aligned log-energy contour.
- Return type:
- Raises:
ValueError – Raised when audio input is unavailable.
Notes
Smoothing is applied after the log transform.
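The order matters because the logarithm is nonlinear: averaging log values is not the same as taking the log of an average. The toy comparison below uses made-up RMS values and a 3-frame window purely for illustration; it is not taken from the VoxAtlas code.

```python
import numpy as np

# Hypothetical RMS values: one loud frame between two quiet ones.
rms = np.array([0.01, 1.0, 0.01])
kernel = np.ones(3) / 3

# Order described in the note: log first, then moving average.
log_then_smooth = np.convolve(np.log(rms), kernel, mode="valid")[0]

# Reversed order, shown only for comparison.
smooth_then_log = np.log(np.convolve(rms, kernel, mode="valid")[0])

# log_then_smooth is about -3.07, smooth_then_log is about -1.08.
```

Averaging in the log domain gives the log of the geometric mean, which never exceeds the log of the arithmetic mean, so a single loud frame dominates its neighbours less.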
Examples
>>> import numpy as np
>>> from voxatlas.audio.audio import Audio
>>> from voxatlas.features.acoustic.envelope.log_energy import LogEnergyEnvelope
>>> from voxatlas.features.feature_input import FeatureInput
>>> audio = Audio(waveform=np.zeros(1600, dtype=np.float32), sample_rate=16000)
>>> feature_input = FeatureInput(audio=audio, units=None, context={})
>>> params = LogEnergyEnvelope.default_config.copy()
>>> out = LogEnergyEnvelope().compute(feature_input, params)
>>> out.values.shape[0] > 0
True