WordFrequencyExtractor
Defined in: voxatlas.features.lexical.frequency.word_frequency
- class voxatlas.features.lexical.frequency.word_frequency.WordFrequencyExtractor
Bases: BaseExtractor
Extract the lexical.frequency.word feature within the VoxAtlas pipeline.
This public extractor defines the reusable API for computing lexical.frequency.word from VoxAtlas structured inputs. It consumes token units and produces values aligned to token units, making the extractor a stable pipeline node that can be cited independently of the surrounding execution machinery.
Algorithm
The extractor follows the standard VoxAtlas feature-computation pattern:
- Input preparation: Structured audio, unit tables, and dependency outputs are gathered from feature_input.
- Feature-specific computation: The implementation applies the domain-specific transformation required by this extractor.
- Packaging: Results are aligned to token units and returned as a FeatureOutput object.
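The three steps above can be sketched without VoxAtlas installed. This is a minimal illustration under stated assumptions, not the library's implementation: SketchInput, SketchOutput, and compute_word_frequency are hypothetical stand-ins, the feature store is modelled as a plain dict, and the lookup table is a list of rows with "id" and "frequency" keys (the column names used in the Examples on this page).

```python
from dataclasses import dataclass, field


@dataclass
class SketchOutput:
    """Hypothetical stand-in for a FeatureOutput-like result."""
    feature: str
    unit: str
    values: dict  # token id -> raw frequency


@dataclass
class SketchInput:
    """Hypothetical stand-in for FeatureInput; only `context` is used here."""
    context: dict = field(default_factory=dict)


def compute_word_frequency(feature_input: SketchInput) -> SketchOutput:
    # 1. Input preparation: fetch the dependency output from the store.
    #    A missing dependency surfaces as KeyError from the dict lookup.
    store = feature_input.context["feature_store"]
    lookup_rows = store["lexical.frequency.lookup"]

    # 2. Feature-specific computation: read one raw frequency per token id.
    values = {row["id"]: float(row["frequency"]) for row in lookup_rows}

    # 3. Packaging: label the result with the feature name and token unit.
    return SketchOutput(feature="lexical.frequency.word", unit="token", values=values)


store = {"lexical.frequency.lookup": [{"id": 1, "frequency": 10.0}]}
out = compute_word_frequency(SketchInput(context={"feature_store": store}))
print(out.unit, out.values[1])
```

Keeping the dependency lookup in the first step means a missing lexical.frequency.lookup entry fails early, before any computation runs.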
Notes
This extractor declares the upstream dependency lexical.frequency.lookup and is executed only after that feature is available in the pipeline feature store.
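One way a scheduler could honour such a declaration is a topological sort over the declared dependencies. The sketch below uses Python's standard graphlib module and only the dependency documented on this page; how VoxAtlas actually orders extractors is not stated here, so treat this as an illustration of the idea only.

```python
from graphlib import TopologicalSorter

# Map each feature to the features it depends on. The single entry mirrors
# the `dependencies` attribute documented for WordFrequencyExtractor.
dependency_graph = {
    "lexical.frequency.word": ["lexical.frequency.lookup"],
}

# static_order() yields every node after all of its dependencies.
execution_order = list(TopologicalSorter(dependency_graph).static_order())
print(execution_order)
```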
Examples
>>> import pandas as pd
>>> from voxatlas.features.feature_input import FeatureInput
>>> from voxatlas.features.feature_output import TableFeatureOutput
>>> from voxatlas.features.lexical.frequency.word_frequency import WordFrequencyExtractor
>>> from voxatlas.pipeline.feature_store import FeatureStore
>>> table = pd.DataFrame({"id": [1], "frequency": [10.0]})
>>> store = FeatureStore()
>>> store.add("lexical.frequency.lookup", TableFeatureOutput(feature="lexical.frequency.lookup", unit="token", values=table))
>>> out = WordFrequencyExtractor().compute(FeatureInput(audio=None, units=None, context={"feature_store": store}), {})
>>> float(out.values.loc[1])
10.0
- name: str = 'lexical.frequency.word'
- input_units: str | None = 'token'
- output_units: str | None = 'token'
- dependencies: list[str] = ['lexical.frequency.lookup']
- default_config: dict = {}
- compute(feature_input, params)
Compute raw token-level frequency values from the lookup table.
- Parameters:
feature_input (FeatureInput) – Prepared stream input containing the feature store.
params (dict) – Resolved extractor configuration. Present for API consistency.
- Returns:
Token-aligned raw frequency values.
- Return type:
- Raises:
KeyError – Raised when the lexical lookup dependency is unavailable.
Notes
The output index matches the token ids from the lookup dependency.
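The index alignment described in this note can be illustrated with plain pandas. This is a sketch under assumptions, not the extractor's code: it assumes the lookup table carries "id" and "frequency" columns as in the Examples on this page, and the rows for ids 2 and 3 are made-up illustrative values.

```python
import pandas as pd

# Lookup-style table; ids 2 and 3 are illustrative values only.
lookup_table = pd.DataFrame({"id": [1, 2, 3], "frequency": [10.0, 2.5, 7.0]})

# Index the frequency column by token id so the result is token-aligned.
frequencies = lookup_table.set_index("id")["frequency"]

print(list(frequencies.index))
print(float(frequencies.loc[1]))
```

Because the result is indexed by token id rather than by row position, a lookup such as frequencies.loc[1] selects the token whose id is 1, regardless of where it sits in the table.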
Examples
>>> import pandas as pd
>>> from voxatlas.features.feature_input import FeatureInput
>>> from voxatlas.features.feature_output import TableFeatureOutput
>>> from voxatlas.features.lexical.frequency.word_frequency import WordFrequencyExtractor
>>> from voxatlas.pipeline.feature_store import FeatureStore
>>> table = pd.DataFrame({"id": [1], "frequency": [10.0]})
>>> store = FeatureStore()
>>> store.add("lexical.frequency.lookup", TableFeatureOutput(feature="lexical.frequency.lookup", unit="token", values=table))
>>> result = WordFrequencyExtractor().compute(FeatureInput(audio=None, units=None, context={"feature_store": store}), {})
>>> result.unit
'token'