SyntaxDependenciesExtractor
Defined in: voxatlas.features.syntax.dependencies
- class voxatlas.features.syntax.dependencies.SyntaxDependenciesExtractor
Bases: BaseExtractor
Extract the syntax.dependencies feature within the VoxAtlas pipeline.
This public extractor defines the reusable API for computing syntax.dependencies from VoxAtlas structured inputs. It consumes token units and produces values aligned to token units, making the extractor a stable pipeline node that can be cited independently of the surrounding execution machinery.
Algorithm
The extractor derives syntactic descriptors from dependency annotations aligned to tokens or sentences.
- Dependency retrieval: The required dependency table is loaded from the feature store.
- Structural computation: The implementation applies relation labeling, clause grouping, or sentence-level aggregation depending on the extractor.
- Packaging: Results are aligned to token units and returned for later discourse-level summaries.
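The three steps above can be sketched in plain Python. This is only an illustration, not the VoxAtlas implementation: the `FEATURE_STORE` mapping and the `retrieve`, `label_relations`, and `package` helpers are hypothetical names, and plain dictionaries stand in for the pandas tables used by the real extractor.

```python
# Hypothetical stand-in for the feature store: feature name -> token rows.
FEATURE_STORE = {
    "syntax.dependencies": [
        {"id": 1, "token": "hello", "head": 2, "dep_rel": "nsubj"},
        {"id": 2, "token": "world", "head": 0, "dep_rel": "root"},
    ]
}


def retrieve(store, name):
    """Step 1 (dependency retrieval): load the table or fail loudly."""
    if name not in store:
        raise ValueError(f"missing dependency table: {name}")
    return store[name]


def label_relations(rows):
    """Step 2 (structural computation): one relation record per token.

    Assumes head == 0 marks the root, as in the example data above.
    """
    return [
        {
            "token_id": row["id"],
            "head_id": row["head"],
            "dep_rel": row["dep_rel"],
            "is_root": row["head"] == 0,
        }
        for row in rows
    ]


def package(records):
    """Step 3 (packaging): wrap the records with their alignment unit."""
    return {"unit": "token", "values": records}


result = package(label_relations(retrieve(FEATURE_STORE, "syntax.dependencies")))
```

Keeping the steps as separate functions mirrors the description: retrieval can fail independently of computation, and packaging only fixes the alignment unit.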
Examples
>>> import pandas as pd
>>> from voxatlas.features.feature_input import FeatureInput
>>> from voxatlas.features.syntax.dependencies import SyntaxDependenciesExtractor
>>> from voxatlas.units import Units
>>> tokens = pd.DataFrame(
...     {"id": [1, 2], "token": ["hello", "world"], "head": [2, 0], "dep_rel": ["nsubj", "root"], "pos": ["INTJ", "NOUN"]}
... )
>>> units = Units(tokens=tokens)
>>> out = SyntaxDependenciesExtractor().compute(FeatureInput(audio=None, units=units, context={}), {"backend": "spacy"})
>>> out.values.loc[:, ["token_id", "head_id"]].to_dict(orient="list")
{'token_id': [1, 2], 'head_id': [2, 0]}
- name: str = 'syntax.dependencies'
- input_units: str | None = 'token'
- output_units: str | None = 'token'
- dependencies: list[str] = []
- default_config: dict = {'backend': 'spacy'}
- compute(feature_input, params)
Compute dependency annotations for one stream.
- Parameters:
feature_input (FeatureInput) – Prepared stream input containing token annotations and context.
params (dict) – Resolved extractor configuration.
- Returns:
Token-aligned dependency table.
- Return type:
- Raises:
ValueError – Raised when parsing fails or token annotations are incompatible.
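As a rough illustration of what "incompatible token annotations" could mean, the sketch below rejects a table in which a head refers to a token id that does not exist. The `validate_heads` helper is hypothetical, and so is the specific check. The source does not say which conditions the extractor actually tests, only that it raises ValueError.

```python
def validate_heads(rows):
    """Raise ValueError if a head points at an unknown token id.

    Assumption: a head of 0 denotes the root and is always allowed.
    """
    known_ids = {row["id"] for row in rows}
    for row in rows:
        if row["head"] != 0 and row["head"] not in known_ids:
            raise ValueError(
                f"token {row['id']} has unknown head {row['head']}"
            )
    return rows


good = [{"id": 1, "head": 2}, {"id": 2, "head": 0}]
bad = [{"id": 1, "head": 5}, {"id": 2, "head": 0}]

validate_heads(good)  # consistent table: returns the rows unchanged

try:
    validate_heads(bad)
    error = None
except ValueError as exc:
    error = str(exc)  # message names the offending token and head
```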
Notes
The returned table is designed to be consumed by other syntax features.
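One way a downstream syntax feature might consume the table is to compute the linear distance between each token and its head. The sketch below is a hypothetical consumer, not part of VoxAtlas: the `dependency_distances` name is invented, and it assumes that token ids are sequential positions and that a `head_id` of 0 marks the root.

```python
def dependency_distances(records):
    """Absolute token-to-head distance per token; root tokens get 0."""
    return [
        0 if rec["head_id"] == 0 else abs(rec["head_id"] - rec["token_id"])
        for rec in records
    ]


# Same token_id / head_id columns as in the example output above.
table = [{"token_id": 1, "head_id": 2}, {"token_id": 2, "head_id": 0}]
distances = dependency_distances(table)
```

Because the result has one value per input record, it stays aligned to token units, just like the table it was derived from.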
Examples
>>> import pandas as pd
>>> from voxatlas.features.feature_input import FeatureInput
>>> from voxatlas.features.syntax.dependencies import SyntaxDependenciesExtractor
>>> from voxatlas.units import Units
>>> tokens = pd.DataFrame(
...     {"id": [1, 2], "token": ["hello", "world"], "head": [2, 0], "dep_rel": ["nsubj", "root"], "pos": ["INTJ", "NOUN"]}
... )
>>> units = Units(tokens=tokens)
>>> result = SyntaxDependenciesExtractor().compute(FeatureInput(audio=None, units=units, context={}), {"backend": "spacy"})
>>> result.unit
'token'