Feature System

VoxAtlas features are implemented as small, composable extractors that share a common contract. The registry and pipeline use this contract to discover features, validate metadata (names/units/dependencies), and execute extractors consistently.

The feature system centers on:

  • extractor classes that implement one feature

  • structured feature inputs (audio, units, shared context)

  • typed feature outputs (scalar/vector/matrix/table/array containers)

  • registry metadata used for discovery, validation, and dependency planning

Extractor contract (what you implement)

All extractors inherit from BaseExtractor and typically define:

  • name (required): fully-qualified feature name like "acoustic.pitch.f0"

  • input_units / output_units (optional): unit labels such as "token" or "frame" (or None for audio/global features)

  • dependencies (optional): upstream feature names that must run first

  • default_config (optional): per-feature default parameters

  • compute(feature_input, params) (required): returns a structured feature output

Extractors should be stateless. If you need upstream results, read them from feature_input.context["feature_store"] rather than storing them on the extractor instance.
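The contract above can be sketched as follows. This is a minimal, self-contained illustration rather than VoxAtlas code: BaseExtractor here is a local stand-in for the real base class, and the feature name "example.pitch.f0_mean", the min_hz parameter, the list-of-Hz representation of the upstream result, and the plain-dict return value are all hypothetical.

```python
from types import SimpleNamespace


class BaseExtractor:
    """Local stand-in for the VoxAtlas base class (assumption, not the real one)."""

    name = None
    input_units = None
    output_units = None
    dependencies = ()
    default_config = {}

    def compute(self, feature_input, params):
        raise NotImplementedError


class MeanF0Extractor(BaseExtractor):
    name = "example.pitch.f0_mean"          # hypothetical feature name
    input_units = "frame"
    output_units = None                     # global (one value per stream)
    dependencies = ("acoustic.pitch.f0",)   # must run before this extractor
    default_config = {"min_hz": 50.0}       # hypothetical parameter

    def compute(self, feature_input, params):
        # Stateless: upstream results come from the shared store,
        # never from attributes set on the extractor instance.
        store = feature_input.context["feature_store"]
        track = store.get("acoustic.pitch.f0")  # assumed here: a list of Hz values
        voiced = [hz for hz in track if hz >= params["min_hz"]]
        mean = sum(voiced) / len(voiced) if voiced else None
        return {"feature": self.name, "value": mean}


# Usage with a hand-built input bundle (SimpleNamespace stands in for FeatureInput).
feature_input = SimpleNamespace(
    audio=None,
    units=None,
    context={"feature_store": {"acoustic.pitch.f0": [0.0, 100.0, 200.0]}, "config": {}},
)
result = MeanF0Extractor().compute(feature_input, MeanF0Extractor.default_config)
```

Because nothing is cached on the instance, the same extractor object can be reused across streams without one run leaking into the next.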

Feature inputs (what you receive)

Each extractor invocation receives a FeatureInput bundle:

  • feature_input.audio: Audio or None

  • feature_input.units: Units or None

  • feature_input.context: shared runtime dictionary (pipeline config + feature store)

The pipeline stores the runtime config and the feature store in the context:

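# Inside compute(): read shared runtime state from the context.
store = feature_input.context["feature_store"]
upstream = store.get("syntax.dependencies")  # result of an upstream feature
config = feature_input.context["config"]     # pipeline-level runtime config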
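A minimal sketch of how an extractor might guard its reads from the bundle. FeatureInput below is a local stand-in dataclass that mirrors the three attributes listed above, the feature store is assumed to behave like a plain dict, and require_upstream is a hypothetical helper, not part of VoxAtlas.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FeatureInput:
    """Stand-in for the real bundle: audio, units, and a shared context dict."""

    audio: Optional[Any] = None
    units: Optional[Any] = None
    context: dict = field(default_factory=dict)


def require_upstream(feature_input, feature_name):
    """Return an upstream result, or fail with a message that names the feature."""
    store = feature_input.context["feature_store"]
    result = store.get(feature_name)
    if result is None:
        raise KeyError(
            f"upstream feature {feature_name!r} is not in the feature store; "
            "is it listed in the extractor's dependencies?"
        )
    return result


feature_input = FeatureInput(
    context={"feature_store": {"syntax.dependencies": ["nsubj", "obj"]}, "config": {}}
)
upstream = require_upstream(feature_input, "syntax.dependencies")
```

Failing early with the feature name in the message is usually easier to debug than a later error on a None value.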

Feature outputs (what you return)

VoxAtlas standardizes common output shapes (scalar, vector, matrix, table, and array containers) in voxatlas.features.feature_output. Most extractors should return one of these dataclasses so downstream consumers and writers can handle outputs uniformly.
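To illustrate why a shared container helps, here is a hypothetical scalar container. The class name ScalarOutput, its fields, and the to_row helper are assumptions made for this sketch; the real dataclasses live in voxatlas.features.feature_output and may look different.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ScalarOutput:
    """Hypothetical container: one value for one feature."""

    feature: str
    unit: Optional[str]   # None for audio/global features
    value: float


def to_row(output):
    """A writer can serialize any dataclass output the same way."""
    return asdict(output)


row = to_row(ScalarOutput(feature="example.scalar", unit=None, value=182.5))
```

A writer that only ever sees dataclasses does not need per-feature serialization logic, which is the uniformity the text above refers to.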

Registry + discovery (how features become runnable)

VoxAtlas uses a global FeatureRegistry instance to map feature names to extractor classes and metadata.

  • Registration: feature modules typically call registry.register(MyExtractor) at import time.

  • Discovery: voxatlas.core.discovery.discover_features() walks the voxatlas.features package and imports modules to trigger registrations.

  • Optional dependencies: if importing a feature module fails due to a missing third-party dependency, discovery records an unavailable registry entry (name, units, dependencies, missing dependency) so the CLI can still report it.
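The registration and discovery steps above can be sketched with the standard library alone. MiniRegistry, iter_module_names, and import_tolerantly are hypothetical names for this illustration and are not the VoxAtlas implementation; in particular, this sketch records only the module name and the missing dependency for unavailable entries.

```python
import importlib
import pkgutil


class MiniRegistry:
    """Hypothetical registry: feature name -> extractor class."""

    def __init__(self):
        self.available = {}
        self.unavailable = {}

    def register(self, extractor_cls):
        self.available[extractor_cls.name] = extractor_cls

    def mark_unavailable(self, module_name, missing_dependency):
        self.unavailable[module_name] = missing_dependency


def iter_module_names(package_name):
    """Yield the dotted names of all modules under a package."""
    package = importlib.import_module(package_name)
    for info in pkgutil.walk_packages(package.__path__, prefix=package_name + "."):
        yield info.name


def import_tolerantly(module_names, registry):
    """Import each module; importing is what triggers register() calls."""
    for module_name in module_names:
        try:
            importlib.import_module(module_name)
        except ModuleNotFoundError as exc:
            # exc.name is the module that could not be found.
            registry.mark_unavailable(module_name, exc.name)


registry = MiniRegistry()
# "voxatlas_sketch_missing_dep" is a made-up name that should not be importable.
import_tolerantly(["json", "voxatlas_sketch_missing_dep"], registry)
```

In a real run, the module names would come from something like iter_module_names over the features package, so that a missing optional dependency yields an unavailable entry instead of aborting discovery.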

Configuration and parameters

The pipeline resolves per-feature parameters by merging:

  1. an extractor’s default_config (if provided), with

  2. user overrides under config["feature_config"][<feature_name>]

See voxatlas.config.feature_config.resolve_feature_config() for the exact merge behavior.
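As a rough sketch of the two-step merge, assuming a shallow merge in which user overrides win key by key. The real resolve_feature_config() may behave differently (for example with nested dictionaries), and merge_feature_params and the fmin/fmax parameters are hypothetical.

```python
def merge_feature_params(default_config, config, feature_name):
    """Shallow merge: start from defaults, then apply user overrides."""
    overrides = config.get("feature_config", {}).get(feature_name, {})
    return {**(default_config or {}), **overrides}


config = {"feature_config": {"acoustic.pitch.f0": {"fmax": 400.0}}}
defaults = {"fmin": 75.0, "fmax": 600.0}  # hypothetical parameter names
params = merge_feature_params(defaults, config, "acoustic.pitch.f0")
```

Here params keeps the default fmin and takes fmax from the user config; a feature with no entry under feature_config simply receives its defaults.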

Units and alignment

Unit labels declared on extractors (for example "token", "word", "frame", or "conversation") are validated when the extractor is registered. For how unit tables are represented at runtime, see Unit Hierarchy.

At execution time, extractors should still check that required unit tables and columns exist for the current dataset/stream and raise clear errors when they do not.
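A minimal sketch of such a runtime check. It assumes the unit tables can be looked up by unit label in a mapping and that each table supports a column membership test; the actual Units object may expose a different interface, and require_unit_columns is a hypothetical helper.

```python
def require_unit_columns(units, unit_name, columns):
    """Return the requested unit table, or raise an error that says what is missing."""
    if units is None:
        raise ValueError("this feature needs unit tables, but feature_input.units is None")
    table = units.get(unit_name)
    if table is None:
        raise ValueError(f"no {unit_name!r} unit table is available for this stream")
    missing = [column for column in columns if column not in table]
    if missing:
        raise ValueError(
            f"{unit_name!r} unit table is missing required column(s): {', '.join(missing)}"
        )
    return table


# Stand-in for feature_input.units: unit label -> column-keyed table.
units = {"token": {"start": [0.0, 0.4], "end": [0.4, 0.9], "text": ["hello", "world"]}}
tokens = require_unit_columns(units, "token", ["start", "end"])
```

Calling the helper at the top of compute() keeps the failure close to its cause, so a stream without token alignments is reported as such rather than surfacing as a lookup error deeper in the computation.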
