LogEnergyEnvelope

Defined in: voxatlas.features.acoustic.envelope.log_energy

class voxatlas.features.acoustic.envelope.log_energy.LogEnergyEnvelope

Bases: BaseExtractor

Extract the acoustic.envelope.log_energy feature within the VoxAtlas pipeline.

Computes a smoothed, frame-aligned log-energy contour from the waveform by applying a logarithmic transform to a non-negative RMS amplitude envelope.

Algorithm

The extractor proceeds in three steps:

RMS amplitude – The waveform is split into frames and each frame is reduced to its RMS value \(r_t \ge 0\).
Log transform – VoxAtlas computes

\[e_t = \log(\max(r_t, \varepsilon)),\]

where \(\varepsilon\) is a small numerical floor that keeps the logarithm finite for silent frames.
Smoothing – The resulting contour is optionally smoothed with a moving-average window whose length, in frames, is given by the smoothing parameter.
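
The three steps above can be sketched in plain NumPy. This is a minimal illustration, not the VoxAtlas implementation: the function name log_energy_sketch, the floor value eps=1e-10, the assumption that frame_length and frame_step are given in seconds, the framing without overlap padding, and the edge padding used for the moving average are all assumptions made for the example.

```python
import numpy as np


def log_energy_sketch(waveform, sample_rate, frame_length=0.025,
                      frame_step=0.01, smoothing=1, eps=1e-10):
    """Illustrative log-energy contour (hypothetical helper, not VoxAtlas code)."""
    x = np.asarray(waveform, dtype=np.float64)
    # Assumption: frame_length and frame_step are durations in seconds.
    win = max(1, int(round(frame_length * sample_rate)))
    hop = max(1, int(round(frame_step * sample_rate)))
    # Zero-pad very short signals so that at least one full frame exists.
    if x.size < win:
        x = np.pad(x, (0, win - x.size))
    n_frames = 1 + (x.size - win) // hop

    # Step 1: RMS amplitude per frame, r_t >= 0.
    rms = np.empty(n_frames)
    for t in range(n_frames):
        frame = x[t * hop:t * hop + win]
        rms[t] = np.sqrt(np.mean(frame ** 2))

    # Step 2: e_t = log(max(r_t, eps)).
    log_e = np.log(np.maximum(rms, eps))

    # Step 3: optional moving average over `smoothing` frames.
    k = int(smoothing)
    if k > 1:
        # Repeat the edge values so the output keeps n_frames entries.
        padded = np.pad(log_e, (k // 2, k - 1 - k // 2), mode="edge")
        log_e = np.convolve(padded, np.ones(k) / k, mode="valid")
    return log_e


silent = log_energy_sketch(np.zeros(1600, dtype=np.float32), 16000)
constant = log_energy_sketch(np.full(1600, 0.5), 16000, smoothing=3)
```

For the all-zero input every frame has \(r_t = 0\), so the contour sits exactly at \(\log \varepsilon\); this is the case the floor exists to handle.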

name

Registry key for this extractor ("acoustic.envelope.log_energy").

Type: str

input_units

Required input unit level. None means this extractor operates directly on waveform audio.

Type: str | None

output_units

Output alignment unit ("frame").

Type: str | None

dependencies

Upstream features required before execution. Empty for this extractor.

Type: list[str]

default_config

Default runtime parameters: frame_length=0.025, frame_step=0.01, peak_threshold=0.1, smoothing=1.

Type: dict
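
As a rough guide to what the default timing parameters imply, the sketch below converts frame_length and frame_step into sample counts. It assumes both values are durations in seconds (the unit is not stated here) and that frames are taken without padding; the helper name frame_layout is hypothetical, and the actual frame count produced by VoxAtlas may differ.

```python
def frame_layout(n_samples, sample_rate, frame_length=0.025, frame_step=0.01):
    """Return (window, hop, n_frames) under the assumptions stated above."""
    # Assumption: frame_length and frame_step are expressed in seconds.
    win = int(round(frame_length * sample_rate))
    hop = int(round(frame_step * sample_rate))
    # Assumption: only complete frames are counted, with no padding.
    n_frames = 0 if n_samples < win else 1 + (n_samples - win) // hop
    return win, hop, n_frames


layout = frame_layout(n_samples=1600, sample_rate=16000)
print(layout)  # (400, 160, 8)
```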

References

Davis, S. B., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. *IEEE Transactions on Acoustics, Speech, and Signal Processing, 28*(4), 357–366. https://doi.org/10.1109/TASSP.1980.1163420

Examples

>>> import numpy as np
>>> from voxatlas.audio.audio import Audio
>>> from voxatlas.features.acoustic.envelope.log_energy import LogEnergyEnvelope
>>> from voxatlas.features.feature_input import FeatureInput
>>> audio = Audio(waveform=np.zeros(1600, dtype=np.float32), sample_rate=16000)
>>> feature_input = FeatureInput(audio=audio, units=None, context={})
>>> params = LogEnergyEnvelope.default_config.copy()
>>> out = LogEnergyEnvelope().compute(feature_input, params)
>>> out.unit
'frame'

name: str = 'acoustic.envelope.log_energy'

input_units: str | None = None

output_units: str | None = 'frame'

dependencies: list[str] = []

default_config: dict = {'frame_length': 0.025, 'frame_step': 0.01, 'peak_threshold': 0.1, 'smoothing': 1}

compute(feature_input, params)

Compute the log-energy contour for one stream.

Parameters:

feature_input (FeatureInput) – Prepared stream input containing audio and execution context.
params (dict) – Resolved extractor configuration.

Returns:

Frame-aligned log-energy contour.

Return type:

VectorFeatureOutput

Raises:

ValueError – Raised when audio input is unavailable.

Notes

Smoothing is applied after the log transform.
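
The order matters because averaging log values is not the same as taking the log of an average. The sketch below compares the two orderings on a made-up RMS contour; the moving_average helper, its edge padding, and the floor eps=1e-10 are assumptions for illustration only.

```python
import numpy as np


def moving_average(values, k):
    """Moving average of length k with edge padding (illustrative helper)."""
    padded = np.pad(values, (k // 2, k - 1 - k // 2), mode="edge")
    return np.convolve(padded, np.ones(k) / k, mode="valid")


eps = 1e-10  # assumed numerical floor
rms = np.array([0.01, 0.01, 1.0, 0.01, 0.01])  # one loud frame among quiet ones

# Order described in this note: log first, then smooth.
log_then_smooth = moving_average(np.log(np.maximum(rms, eps)), 3)

# Alternative order, shown only for comparison: smooth first, then log.
smooth_then_log = np.log(np.maximum(moving_average(rms, 3), eps))
```

Because the logarithm is concave, the log-then-smooth contour is never above the smooth-then-log contour, and here it is clearly lower around the loud frame.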

Examples

>>> import numpy as np
>>> from voxatlas.audio.audio import Audio
>>> from voxatlas.features.acoustic.envelope.log_energy import LogEnergyEnvelope
>>> from voxatlas.features.feature_input import FeatureInput
>>> audio = Audio(waveform=np.zeros(1600, dtype=np.float32), sample_rate=16000)
>>> feature_input = FeatureInput(audio=audio, units=None, context={})
>>> params = LogEnergyEnvelope.default_config.copy()
>>> out = LogEnergyEnvelope().compute(feature_input, params)
>>> out.values.shape[0] > 0
True