Feature System
VoxAtlas features are implemented as small, composable extractors that share a common contract. The registry and pipeline use this contract to discover features, validate metadata (names/units/dependencies), and execute extractors consistently.
The feature system centers on:
extractor classes that implement one feature
structured feature inputs (audio, units, shared context)
typed feature outputs (scalar/vector/matrix/table/array containers)
registry metadata used for discovery, validation, and dependency planning
Extractor contract (what you implement)
All extractors inherit from BaseExtractor
and typically define:
name (required): fully-qualified feature name, such as "acoustic.pitch.f0"
input_units / output_units (optional): unit labels such as "token" or "frame" (or None for audio/global features)
dependencies (optional): upstream feature names that must run first
default_config (optional): per-feature default parameters
compute(feature_input, params) (required): returns a structured feature output
Extractors should be stateless. If you need upstream results, read them from
feature_input.context["feature_store"] rather than storing them on the
extractor instance.
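A minimal extractor might look like the sketch below. It is illustrative only. BaseExtractor here is a local stand-in, not the real voxatlas class. The feature name "lexical.token_length", the upstream feature "lexical.tokens", the strip_punctuation parameter, and the ScalarResult container are all hypothetical. The sketch shows the contract fields as class attributes, plus a compute() that reads upstream results from the feature store instead of caching them on the instance.

```python
import string
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, ClassVar, Dict, List, Optional, Tuple


class BaseExtractor:
    """Hypothetical stand-in for the VoxAtlas BaseExtractor (illustration only)."""

    name: ClassVar[str] = ""
    input_units: ClassVar[Optional[str]] = None
    output_units: ClassVar[Optional[str]] = None
    dependencies: ClassVar[Tuple[str, ...]] = ()
    default_config: ClassVar[Dict[str, Any]] = {}

    def compute(self, feature_input: Any, params: Dict[str, Any]) -> Any:
        raise NotImplementedError


@dataclass
class ScalarResult:
    """Hypothetical stand-in for a structured output: one value per unit."""

    unit: str
    values: List[float]


class TokenLengthExtractor(BaseExtractor):
    # Hypothetical feature; "lexical.tokens" is an assumed upstream feature.
    name = "lexical.token_length"
    input_units = "token"
    output_units = "token"
    dependencies = ("lexical.tokens",)
    default_config = {"strip_punctuation": True}

    def compute(self, feature_input, params):
        # Stateless: upstream results come from the shared feature store,
        # not from attributes cached on the extractor instance.
        tokens = feature_input.context["feature_store"]["lexical.tokens"]
        if params["strip_punctuation"]:
            tokens = [t.strip(string.punctuation) for t in tokens]
        return ScalarResult(unit="token", values=[float(len(t)) for t in tokens])


# Usage: fake a FeatureInput with only the fields this extractor reads.
feature_input = SimpleNamespace(
    audio=None,
    units=None,
    context={"feature_store": {"lexical.tokens": ["Hello", "world,", "!"]}},
)
extractor = TokenLengthExtractor()
out = extractor.compute(feature_input, dict(TokenLengthExtractor.default_config))
print(out.values)  # [5.0, 5.0, 0.0]
```

Because the extractor holds no per-run state, the same instance could be reused across streams or workers without results leaking between them.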
Feature inputs (what you receive)
Each extractor invocation receives a
FeatureInput bundle:
feature_input.audio: Audio or None
feature_input.units: Units or None
feature_input.context: shared runtime dictionary (pipeline config + feature store)
The pipeline stores the runtime config and the feature store in the context:
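# Inside compute(): read upstream results and the runtime config from the context.
store = feature_input.context["feature_store"]
upstream = store.get("syntax.dependencies")  # result of an upstream feature
config = feature_input.context["config"]  # pipeline config; per-feature params arrive as compute()'s params argument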
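When an upstream result is missing, store.get() silently returns None. A small helper can turn that into a clear error. The sketch below is hypothetical. FeatureInputSketch only mirrors the three fields listed above, and require_upstream is not part of the VoxAtlas API. It also assumes the feature store behaves like a dict, and the stored value is toy data.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class FeatureInputSketch:
    """Hypothetical mirror of the FeatureInput fields described above."""

    audio: Optional[Any] = None  # Audio or None
    units: Optional[Any] = None  # Units or None
    context: Dict[str, Any] = field(default_factory=dict)


def require_upstream(feature_input: FeatureInputSketch, name: str) -> Any:
    """Fetch an upstream result, failing loudly instead of returning None."""
    store = feature_input.context["feature_store"]
    value = store.get(name)
    if value is None:
        raise KeyError(
            f"upstream feature {name!r} is not in the feature store; "
            "is it listed in the extractor's dependencies?"
        )
    return value


feature_input = FeatureInputSketch(
    context={
        "config": {"feature_config": {}},
        "feature_store": {"syntax.dependencies": [("root", 0, 0)]},  # toy value
    }
)
upstream = require_upstream(feature_input, "syntax.dependencies")
config = feature_input.context["config"]  # pipeline config, not per-feature params
```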
Feature outputs (what you return)
VoxAtlas standardizes common output shapes in
voxatlas.features.feature_output:
ScalarFeatureOutput: one scalar per unit
VectorFeatureOutput: time-aligned 1D sequence
MatrixFeatureOutput: time-frequency matrix
TableFeatureOutput: tabular output (DataFrame)
ArrayFeatureOutput: raw NumPy array
Most extractors should return one of these dataclasses so downstream consumers and writers can handle outputs uniformly.
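The sketch below shows why a shared set of output types helps. It uses hypothetical stand-ins for three of the containers; the real classes live in voxatlas.features.feature_output, and their field names may differ. A single shape_of() function then handles any of them without per-feature logic. TableFeatureOutput and ArrayFeatureOutput are left out to keep the example free of pandas and NumPy.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScalarOutputSketch:
    values: List[float]  # one scalar per unit


@dataclass
class VectorOutputSketch:
    times: List[float]   # frame timestamps in seconds
    values: List[float]  # one value per timestamp


@dataclass
class MatrixOutputSketch:
    times: List[float]
    frequencies: List[float]
    values: List[List[float]]  # one row per time, one column per frequency


def shape_of(output) -> Tuple[int, ...]:
    """Dispatch on the container type, as a generic writer might."""
    if isinstance(output, ScalarOutputSketch):
        return (len(output.values),)
    if isinstance(output, VectorOutputSketch):
        if len(output.times) != len(output.values):
            raise ValueError("times and values must have the same length")
        return (len(output.values),)
    if isinstance(output, MatrixOutputSketch):
        rows, cols = len(output.times), len(output.frequencies)
        if len(output.values) != rows or any(len(r) != cols for r in output.values):
            raise ValueError("matrix values do not match times x frequencies")
        return (rows, cols)
    raise TypeError(f"unsupported output type: {type(output).__name__}")


spectrogram = MatrixOutputSketch(
    times=[0.00, 0.01],
    frequencies=[100.0, 200.0, 300.0],
    values=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
)
print(shape_of(spectrogram))  # (2, 3)
```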
Registry + discovery (how features become runnable)
VoxAtlas uses a global FeatureRegistry instance
to map feature names to extractor classes and metadata.
Registration: feature modules typically call registry.register(MyExtractor) at import time.
Discovery: voxatlas.core.discovery.discover_features() walks the voxatlas.features package and imports modules to trigger registrations.
Optional dependencies: if importing a feature module fails because a third-party dependency is missing, discovery records an unavailable registry entry (name, units, dependencies, missing dependency) so the CLI can still report it.
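The register-at-import pattern and the tolerant discovery walk can be sketched with the standard library alone. Everything below is hypothetical (ToyRegistry, RegistryEntry, discover, PitchStub) and is not the real FeatureRegistry or discover_features(). The sketch uses pkgutil.walk_packages() and importlib.import_module() to import every submodule of a package. Any module whose import raises ModuleNotFoundError is recorded as unavailable. For simplicity, unavailable entries are keyed by module name, whereas VoxAtlas records the feature's name, units, and dependencies. The stdlib json package stands in for voxatlas.features.

```python
import importlib
import pkgutil
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RegistryEntry:
    name: str
    extractor: Optional[type] = None          # None when the entry is unavailable
    missing_dependency: Optional[str] = None  # e.g. the third-party package name


class ToyRegistry:
    """Hypothetical miniature of a feature-name -> extractor registry."""

    def __init__(self) -> None:
        self._entries: Dict[str, RegistryEntry] = {}

    def register(self, cls: type) -> type:
        name = getattr(cls, "name", "") or ""
        if "." not in name:
            raise ValueError(f"{cls.__name__} needs a dotted feature name")
        if name in self._entries:
            raise ValueError(f"duplicate feature name: {name}")
        self._entries[name] = RegistryEntry(name, extractor=cls)
        return cls

    def register_unavailable(self, name: str, missing: Optional[str]) -> None:
        self._entries[name] = RegistryEntry(name, missing_dependency=missing)

    def available(self) -> List[str]:
        return sorted(n for n, e in self._entries.items() if e.extractor is not None)

    def unavailable(self) -> Dict[str, Optional[str]]:
        return {n: e.missing_dependency for n, e in self._entries.items() if e.extractor is None}


def discover(package_name: str, registry: ToyRegistry) -> List[str]:
    """Import every submodule so that import-time register() calls run."""
    package = importlib.import_module(package_name)
    imported = []
    for info in pkgutil.walk_packages(package.__path__, prefix=package_name + "."):
        try:
            importlib.import_module(info.name)
            imported.append(info.name)
        except ModuleNotFoundError as exc:
            # Keep going: record the module as unavailable instead of failing.
            registry.register_unavailable(info.name, missing=exc.name)
    return imported


registry = ToyRegistry()


class PitchStub:  # hypothetical extractor class
    name = "acoustic.pitch.f0"


registry.register(PitchStub)           # what a feature module does at import time
imported = discover("json", registry)  # stdlib package used as a stand-in
print(registry.available())            # ['acoustic.pitch.f0']
```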
Configuration and parameters
The pipeline resolves per-feature parameters by merging:
an extractor’s default_config (if provided), with
user overrides under config["feature_config"][<feature_name>]
See voxatlas.config.feature_config.resolve_feature_config() for the exact
merge behavior.
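The merge order can be sketched as a shallow dictionary merge in which user overrides win key by key. This is an assumption made for illustration. resolve_params is a hypothetical helper, and the parameter names fmin/fmax are invented. resolve_feature_config() remains the reference for the real behavior, for example whether nested dictionaries are merged deeply.

```python
import copy
from typing import Any, Dict, Mapping


def resolve_params(
    defaults: Mapping[str, Any], config: Mapping[str, Any], feature_name: str
) -> Dict[str, Any]:
    """Hypothetical shallow merge of default_config with user overrides."""
    overrides = config.get("feature_config", {}).get(feature_name, {})
    merged = copy.deepcopy(dict(defaults))  # copy so class-level defaults are never mutated
    merged.update(overrides)                # overrides replace defaults key by key
    return merged


defaults = {"fmin": 50.0, "fmax": 500.0}  # invented parameter names
config = {"feature_config": {"acoustic.pitch.f0": {"fmax": 400.0}}}

print(resolve_params(defaults, config, "acoustic.pitch.f0"))  # {'fmin': 50.0, 'fmax': 400.0}
print(resolve_params(defaults, config, "acoustic.other"))     # {'fmin': 50.0, 'fmax': 500.0}
```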
Units and alignment
Unit labels declared on extractors are validated when the extractor is
registered (for example "token", "word", "frame", or
"conversation"). For how unit tables are represented at runtime, see
Unit Hierarchy.
At execution time, extractors should still check that required unit tables and columns exist for the current dataset/stream and raise clear errors when they do not.
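Both checks can be sketched in a few lines. The helpers below are hypothetical. The label set is limited to the four examples named above, and the real registry may accept more. The runtime check assumes units is a mapping from a unit level to a column-to-values table. That is a simplification of the representation described in Unit Hierarchy.

```python
from typing import Any, Mapping, Optional, Sequence

# Assumed label set: only the examples named in the text above.
KNOWN_UNITS = {"token", "word", "frame", "conversation"}


def validate_unit_label(label: Optional[str]) -> None:
    """Registration-time check; None stands for audio/global features."""
    if label is not None and label not in KNOWN_UNITS:
        raise ValueError(
            f"unknown unit label {label!r}; expected one of {sorted(KNOWN_UNITS)}"
        )


def require_unit_columns(
    units: Optional[Mapping[str, Mapping[str, Any]]],
    level: str,
    columns: Sequence[str],
) -> Mapping[str, Any]:
    """Execution-time check that raises an actionable error message."""
    if units is None:
        raise ValueError(f"this feature needs {level!r} units, but none were provided")
    table = units.get(level)
    if table is None:
        raise ValueError(f"no {level!r} unit table; available: {sorted(units)}")
    missing = [c for c in columns if c not in table]
    if missing:
        raise ValueError(f"{level!r} unit table is missing columns {missing}")
    return table


# Toy data: one word-level table stored as column name -> values.
units = {"word": {"start": [0.0, 0.4], "end": [0.4, 0.9], "label": ["hi", "there"]}}
words = require_unit_columns(units, "word", ["start", "end"])
validate_unit_label("frame")  # passes silently
validate_unit_label(None)     # audio/global feature, also passes
```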
Useful API pages
voxatlas.core.discovery.discover_features()
Where to go next
Writing Extractors: a step-by-step extractor tutorial
Pipeline: how dependencies, parallelism, and caching are executed