Developer Module Tree#
Module-oriented reference exposing the raw developer API.
Top-level public interfaces for VoxAtlas.
- class voxatlas.DatasetInput(audio_streams, units_streams)[source]#
Store every stream loaded for one conversation.
- Parameters:
audio_streams (list of Audio | None) – Audio streams, one per channel.
units_streams (list of Units | None) – Alignment unit containers, one per channel.
- Returns:
Dataclass containing per-channel dataset inputs.
- Return type:
DatasetInput
Notes
Audio and alignment streams are paired by channel order when both modalities are present.
Examples
>>> from voxatlas.io import DatasetInput
>>> dataset = DatasetInput(audio_streams=None, units_streams=None)
>>> dataset.streams()
[]
- streams()[source]#
Return paired stream objects for the conversation.
- Returns:
Stream objects pairing audio and alignment data where possible.
- Return type:
list of DatasetStream
- Raises:
ValueError – Raised when the audio and alignment channel counts differ.
Notes
When only one modality is present, the other field is set to None.
Examples
>>> import numpy as np
>>> from voxatlas.audio.audio import Audio
>>> from voxatlas.io import DatasetInput
>>> audio = Audio(waveform=np.zeros(16000, dtype=np.float32), sample_rate=16000)
>>> dataset = DatasetInput(audio_streams=[audio], units_streams=None)
>>> streams = dataset.streams()
>>> len(streams)
1
>>> (streams[0].audio is not None, streams[0].units is None)
(True, True)
- class voxatlas.DatasetStream(audio, units)[source]#
Represent one aligned stream from a conversation dataset.
- Parameters:
audio (Audio | None) – Audio stream for this channel.
units (Units | None) – Unit container for this channel.
- Returns:
Dataclass describing one multimodal stream.
- Return type:
DatasetStream
Notes
A stream may contain audio only, units only, or both modalities.
Examples
>>> from voxatlas.io import DatasetStream
>>> stream = DatasetStream(audio=None, units=None)
>>> (stream.audio is None, stream.units is None)
(True, True)
- class voxatlas.ExecutionPlan(layers)[source]#
Represent a dependency-sorted feature execution plan.
- Parameters:
layers (iterable of iterable of str) – Sequence of dependency layers. Features in the same layer can be executed independently.
- Returns:
Normalized execution plan.
- Return type:
ExecutionPlan
Notes
The features attribute flattens the layer structure in execution order.
Examples
>>> from voxatlas.pipeline.execution_plan import ExecutionPlan
>>> plan = ExecutionPlan([["a"], ["b", "c"]])
>>> plan.features
['a', 'b', 'c']
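The layer structure above can be derived from a dependency graph. A minimal sketch of the layering idea (not VoxAtlas's actual resolver — the function name and graph shape here are illustrative): a feature's layer index is one past the depth of its deepest dependency, and a cycle makes the depth undefined.

```python
from collections import defaultdict

def layer_plan(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group features into dependency layers: a feature lands in the
    layer after its deepest dependency. Raises ValueError on cycles."""
    depth: dict[str, int] = {}

    def resolve(name: str, seen: tuple[str, ...] = ()) -> int:
        if name in seen:
            raise ValueError(f"dependency cycle through {name!r}")
        if name not in depth:
            parents = deps.get(name, [])
            depth[name] = 1 + max(
                (resolve(p, seen + (name,)) for p in parents), default=-1
            )
        return depth[name]

    for feature in deps:
        resolve(feature)
    layers: dict[int, list[str]] = defaultdict(list)
    for feature, d in depth.items():
        layers[d].append(feature)
    return [sorted(layers[d]) for d in sorted(layers)]

# "b" and "c" both depend on "a", so they share the second layer.
plan = layer_plan({"a": [], "b": ["a"], "c": ["a"]})
print(plan)  # [['a'], ['b', 'c']]
```

Flattening the returned layers in order reproduces the `features` attribute described above.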
- class voxatlas.FeatureStore[source]#
Store intermediate and final feature outputs for one pipeline run.
The feature store is the shared lookup table used during dependency resolution. Extractors read dependency outputs from this object instead of recomputing upstream features.
Examples
>>> from voxatlas.pipeline.feature_store import FeatureStore
>>> store = FeatureStore()
>>> store.add("acoustic.pitch.f0", {"value": 123})
>>> store.exists("acoustic.pitch.f0")
True
- add(feature_name, result)[source]#
Add a computed output to the store.
- Parameters:
feature_name (str) – Fully qualified feature name.
result (object) – Output object returned by an extractor.
- Returns:
The store is updated in place.
- Return type:
None
Notes
Adding the same feature name again overwrites the previous value.
Examples
>>> from voxatlas.pipeline.feature_store import FeatureStore
>>> store = FeatureStore()
>>> store.add("syntax.dependencies", {"edges": []})
- get(feature_name)[source]#
Retrieve a stored feature output.
- Parameters:
feature_name (str) – Fully qualified feature name.
- Returns:
Stored feature output.
- Return type:
object
- Raises:
KeyError – Raised when the feature is not present.
Examples
>>> from voxatlas.pipeline.feature_store import FeatureStore
>>> store = FeatureStore()
>>> store.add("syntax.dependencies", {"edges": []})
>>> store.get("syntax.dependencies")
{'edges': []}
- exists(feature_name)[source]#
Check whether a feature has already been stored.
- Parameters:
feature_name (str) – Fully qualified feature name.
- Returns:
True when the feature exists in the store.
- Return type:
bool
Examples
>>> from voxatlas.pipeline.feature_store import FeatureStore
>>> store = FeatureStore()
>>> store.exists("lexical.frequency.lookup")
False
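The add/get/exists contract documented above amounts to a thin mapping keyed by fully qualified feature names. A minimal stand-in (not the actual FeatureStore implementation) reproducing the documented semantics, including overwrite-on-add and KeyError on missing names:

```python
class MiniFeatureStore:
    """Illustrative stand-in for FeatureStore: a dict keyed by
    fully qualified feature names."""

    def __init__(self) -> None:
        self._results: dict[str, object] = {}

    def add(self, feature_name: str, result: object) -> None:
        # Re-adding the same name overwrites the previous value.
        self._results[feature_name] = result

    def get(self, feature_name: str) -> object:
        # Raises KeyError when the feature is not present.
        return self._results[feature_name]

    def exists(self, feature_name: str) -> bool:
        return feature_name in self._results

store = MiniFeatureStore()
store.add("acoustic.pitch.f0", {"value": 123})
print(store.exists("acoustic.pitch.f0"))  # True
store.add("acoustic.pitch.f0", {"value": 456})  # overwrites
print(store.get("acoustic.pitch.f0"))  # {'value': 456}
```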
- voxatlas.Pipeline#
alias of VoxAtlasPipeline
- class voxatlas.Units(frames=None, tokens=None, phonemes=None, syllables=None, sentences=None, words=None, ipus=None, turns=None, speaker=None)[source]#
Container for hierarchical speech units (tables) for a single stream.
VoxAtlas feature extractors operate on unit tables (frames, tokens, phonemes, syllables, words, etc.) that are time-aligned and optionally linked through parent-child identifiers.
Units is a lightweight, backend-agnostic wrapper around those tables: it stores them, normalizes unit type names (singular/plural aliases), and provides a small set of convenience accessors (lookup, durations, parent/children grouping).
This class intentionally does not enforce a rigid schema beyond what its helper methods require; extractors may expect additional columns such as label or token depending on the feature.
- Parameters:
frames (pandas.DataFrame | None) – Frame-level table.
tokens (pandas.DataFrame | None) – Token-level table.
phonemes (pandas.DataFrame | None) – Phoneme-level table.
syllables (pandas.DataFrame | None) – Syllable-level table.
sentences (pandas.DataFrame | None) – Sentence-level table.
words (pandas.DataFrame | None) – Word-level table.
ipus (pandas.DataFrame | None) – Inter-pausal-unit table.
turns (pandas.DataFrame | None) – Turn-level table.
speaker (str | None) – Optional speaker label for the stream.
- Returns:
Hierarchical unit container for one stream.
- Return type:
Units
- frames, tokens, phonemes, syllables, sentences, words, ipus, turns
Stored unit tables. Any table may be None if it is unavailable.
- Type:
pandas.DataFrame | None
- speaker#
Speaker label associated with this stream (if known).
- Type:
str | None
Notes
Unit labels
Methods that accept a unit_type (for example, table()) accept both singular and plural labels:
- "frame" / "frames"
- "token" / "tokens"
- "phoneme" / "phonemes"
- "syllable" / "syllables"
- "sentence" / "sentences"
- "word" / "words"
- "ipu" / "ipus"
- "turn" / "turns"
Table conventions
Units works best when each DataFrame follows a few simple conventions:
- id: unique identifier for the unit row (typically integer-like).
- start and end: segment boundaries on a shared timeline (commonly seconds). Used by duration() and by many extractors.
- Parent-child links (optional): to connect units explicitly, include a <parent>_id column on the child table. For example, syllables that belong to words can carry a word_id column; phonemes that belong to syllables can carry a syllable_id column. parent() and children() use this naming convention.
table() returns the underlying DataFrame object. If you mutate it, you are mutating the table stored on the Units instance.
If a requested table is missing (None), table() raises ValueError; callers can either catch this or check the relevant attribute first.
Examples
>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> syllables = pd.DataFrame(
...     {"id": [10], "word_id": [1], "start": [0.0], "end": [0.5], "label": ["he"]}
... )
>>> units = Units(words=words, syllables=syllables, speaker="A")
>>> units.table("word").shape
(1, 4)
>>> float(units.duration("word").iloc[0])
1.0
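The `<parent>_id` convention described above does not depend on pandas; a stdlib-only sketch of how child rows are grouped under their parent via a `word_id` column (the real children() method returns a pandas DataFrameGroupBy, not a dict):

```python
from collections import defaultdict

# Rows as plain dicts standing in for DataFrame rows.
words = [{"id": 1, "start": 0.0, "end": 1.0, "label": "hello"}]
syllables = [
    {"id": 10, "word_id": 1, "start": 0.0, "end": 0.5, "label": "he"},
    {"id": 11, "word_id": 1, "start": 0.5, "end": 1.0, "label": "llo"},
]

def children(child_rows, parent_type: str):
    """Group child rows by their `<parent>_id` column."""
    key = f"{parent_type}_id"
    if not all(key in row for row in child_rows):
        raise ValueError(f"mapping column {key!r} is unavailable")
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[key]].append(row)
    return dict(groups)

grouped = children(syllables, "word")
print(sorted(grouped))                   # [1]
print([r["label"] for r in grouped[1]])  # ['he', 'llo']
```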
- table(unit_type)[source]#
Return the table for a requested unit type.
- Parameters:
unit_type (str) – Unit label such as "token" or "syllable".
- Returns:
Table associated with the requested unit type.
- Return type:
pandas.DataFrame
- Raises:
ValueError – Raised when the unit type is invalid or unavailable.
Examples
>>> import pandas as pd
>>> from voxatlas.units import Units
>>> tokens = pd.DataFrame({"id": [1], "start": [0.0], "end": [0.2], "label": ["hi"]})
>>> units = Units(tokens=tokens)
>>> units.table("token").columns.tolist()
['id', 'start', 'end', 'label']
- get(unit_type)[source]#
Alias for table().
- Parameters:
unit_type (str) – Requested unit label.
- Returns:
Requested unit table.
- Return type:
pandas.DataFrame
Examples
>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> units = Units(words=words)
>>> units.get("word").shape[0]
1
- duration(unit_type)[source]#
Compute durations from start and end columns.
- Parameters:
unit_type (str) – Requested unit label.
- Returns:
Duration values for each row.
- Return type:
pandas.Series
Examples
>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.25], "end": [1.00], "label": ["hello"]})
>>> units = Units(words=words)
>>> float(units.duration("word").iloc[0])
0.75
- parent(child_type, parent_type)[source]#
Return parent identifiers for a child unit table.
- Parameters:
child_type (str) – Child unit label.
parent_type (str) – Parent unit label.
- Returns:
Parent identifier column.
- Return type:
pandas.Series
- Raises:
ValueError – Raised when the mapping column is unavailable.
Examples
>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> syllables = pd.DataFrame(
...     {"id": [10, 11], "word_id": [1, 1], "start": [0.0, 0.5], "end": [0.5, 1.0], "label": ["he", "llo"]}
... )
>>> units = Units(words=words, syllables=syllables)
>>> units.parent("syllable", "word").tolist()
[1, 1]
- children(parent_type, child_type)[source]#
Group child units by parent identifier.
- Parameters:
parent_type (str) – Parent unit label.
child_type (str) – Child unit label.
- Returns:
Grouped child table keyed by parent identifier.
- Return type:
DataFrameGroupBy
- Raises:
ValueError – Raised when the mapping column is unavailable.
Examples
>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> phonemes = pd.DataFrame(
...     {"id": [100, 101], "word_id": [1, 1], "start": [0.0, 0.5], "end": [0.5, 1.0], "label": ["h", "i"]}
... )
>>> units = Units(words=words, phonemes=phonemes)
>>> units.children("word", "phoneme").ngroups
1
- group(child_type, by)[source]#
Alias for children() using by as the parent unit.
- Parameters:
child_type (str) – Child unit label.
by (str) – Parent unit label.
- Returns:
Grouped child table.
- Return type:
DataFrameGroupBy
Examples
>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> phonemes = pd.DataFrame(
...     {"id": [100, 101], "word_id": [1, 1], "start": [0.0, 0.5], "end": [0.5, 1.0], "label": ["h", "i"]}
... )
>>> units = Units(words=words, phonemes=phonemes)
>>> units.group("phoneme", by="word").ngroups
1
- class voxatlas.VoxAtlasPipeline(audio, units, config)[source]#
Run a VoxAtlas feature extraction workflow for a single stream.
A pipeline instance combines one audio stream, one unit hierarchy, and a runtime configuration. It validates requested features, resolves dependency layers, executes extractors in order, and stores intermediate results so downstream features can reuse them.
- Parameters:
audio (Audio | None) – Audio stream for the current conversation channel. Acoustic features require this input.
units (Units | None) – Hierarchical unit container for the current stream. Linguistic and alignment-based features require this input.
config (dict) – Runtime configuration containing the requested features and pipeline options.
- Returns:
Configured pipeline instance ready to execute.
- Return type:
VoxAtlasPipeline
Notes
VoxAtlas resolves dependencies through the feature registry and executes each dependency layer sequentially while allowing optional parallelism inside a layer.
Examples
>>> import numpy as np
>>> from voxatlas.audio.audio import Audio
>>> from voxatlas.pipeline import Pipeline
>>> audio = Audio(waveform=np.zeros(16000, dtype=np.float32), sample_rate=16000)
>>> pipeline = Pipeline(
...     audio=audio,
...     units=None,
...     config={"features": ["acoustic.pitch.dummy"], "pipeline": {"n_jobs": 1, "cache": False}},
... )
>>> results = pipeline.run()
>>> results.exists("acoustic.pitch.dummy")
True
- run()[source]#
Execute the configured feature graph and return computed outputs.
The pipeline validates the requested features, creates an execution plan from registry dependencies, then executes each dependency layer in order. Intermediate outputs are inserted into a feature store so later features can retrieve them.
- Returns:
Store containing requested features and any computed dependencies.
- Return type:
FeatureStore
- Raises:
ValueError – Raised when the dependency graph contains a cycle.
KeyError – Raised when a required feature is missing from the store or cache during execution.
Notes
When caching is enabled, cached outputs are loaded before an extractor is scheduled for execution.
Examples
>>> import numpy as np
>>> from voxatlas.audio.audio import Audio
>>> from voxatlas.pipeline import Pipeline
>>> audio = Audio(waveform=np.zeros(16000, dtype=np.float32), sample_rate=16000)
>>> pipeline = Pipeline(
...     audio=audio,
...     units=None,
...     config={"features": ["acoustic.pitch.dummy"], "pipeline": {"n_jobs": 1, "cache": False}},
... )
>>> results = pipeline.run()
>>> results.exists("acoustic.pitch.dummy")
True
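Layer-by-layer execution with optional parallelism inside a layer, as described in the notes above, can be sketched with concurrent.futures. This illustrates the scheduling idea only, not VoxAtlas's actual executor; the extractor signatures here are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def run_plan(layers, extractors, n_jobs=1):
    """Execute each dependency layer in order; features within a
    layer are independent, so they may run in parallel."""
    results = {}
    for layer in layers:
        if n_jobs == 1:
            for name in layer:
                results[name] = extractors[name](results)
        else:
            with ThreadPoolExecutor(max_workers=n_jobs) as pool:
                # Snapshot the results so far; earlier layers are complete.
                futures = {
                    name: pool.submit(extractors[name], dict(results))
                    for name in layer
                }
                for name, fut in futures.items():
                    results[name] = fut.result()
    return results

extractors = {
    "a": lambda deps: 1,
    "b": lambda deps: deps["a"] + 1,   # reads its dependency's output
    "c": lambda deps: deps["a"] * 10,
}
out = run_plan([["a"], ["b", "c"]], extractors, n_jobs=2)
print(out)  # {'a': 1, 'b': 2, 'c': 10}
```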
- voxatlas.expand_defaults(cfg)[source]#
Merge a user configuration with VoxAtlas defaults.
What “Expand Defaults” Means#
VoxAtlas maintains a small built-in default configuration (voxatlas.config.defaults.DEFAULT_CONFIG). expand_defaults starts from a deep copy of that default mapping and then applies the user configuration on top.
This is a shallow top-level merge:
- Only the first level of keys is merged (via dict.update).
- If the user provides a top-level key, it replaces the default value for that key entirely.
- Nested mappings are not deep-merged. For example, providing a pipeline mapping replaces the whole default pipeline mapping.
Concretely, given the default:
{"features": [], "pipeline": {"cache": True}}
The following user config:
{"pipeline": {"n_jobs": 4}}
Produces:
{"features": [], "pipeline": {"n_jobs": 4}}
(Note how pipeline.cache is not preserved because nested dicts are not merged.)
- Parameters:
cfg (dict) – User-supplied configuration dictionary.
- Returns:
Configuration with top-level defaults applied.
- Return type:
dict
Notes
If you want to override just one pipeline option while keeping other defaults, pass the full desired pipeline mapping (or use load_and_prepare_config(), which is the recommended config entry point for most workflows).
Examples
>>> from voxatlas.config import expand_defaults
>>> cfg = expand_defaults({"features": ["acoustic.pitch.dummy"]})
>>> cfg["features"]
['acoustic.pitch.dummy']
>>> sorted(cfg["pipeline"].keys())
['cache']
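The shallow top-level merge described above is easy to reproduce. A minimal sketch of the documented semantics (deep-copy the defaults, then dict.update with the user config); the DEFAULT_CONFIG value here is an inline stand-in taken from the example, not the library's full default:

```python
import copy

# Stand-in for voxatlas.config.defaults.DEFAULT_CONFIG (illustrative values).
DEFAULT_CONFIG = {"features": [], "pipeline": {"cache": True}}

def expand_defaults_sketch(cfg: dict) -> dict:
    """Shallow top-level merge: user keys replace default keys wholesale."""
    merged = copy.deepcopy(DEFAULT_CONFIG)
    merged.update(cfg)  # only first-level keys are merged
    return merged

result = expand_defaults_sketch({"pipeline": {"n_jobs": 4}})
print(result)  # {'features': [], 'pipeline': {'n_jobs': 4}}
# The nested default "cache" key is gone: nested dicts are not deep-merged.
print("cache" in result["pipeline"])  # False
```

The deep copy matters: without it, a user config could mutate the shared defaults in place.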
- voxatlas.load_alignment(path)[source]#
Load an alignment file into a Units container.
This is a lightweight compatibility entry point for alignment ingestion. The current implementation returns an empty Units object and does not parse the file content yet.
- Parameters:
path (str) – Filesystem path to an alignment file (for example, a TextGrid file). The path is accepted for API consistency, even though content parsing is not implemented in this helper yet.
- Returns:
An empty Units container.
- Return type:
Units
Notes
For full data loading workflows, prefer higher-level input loading helpers that combine audio, alignment, and metadata validation.
Examples
>>> from voxatlas.units.alignment import load_alignment
>>> from voxatlas.units.units import Units
>>> units = load_alignment("alignment.TextGrid")
>>> isinstance(units, Units)
True
- voxatlas.load_and_prepare_config(path)[source]#
Load, validate, and normalize a VoxAtlas configuration.
- Parameters:
path (str) – Filesystem path to a YAML configuration file.
- Returns:
Validated configuration with defaults applied.
- Return type:
dict
- Raises:
ConfigValidationError – Raised when the configuration does not satisfy the expected schema.
Notes
This is the recommended configuration entry point for the CLI and tutorial workflows.
Examples
>>> import tempfile
>>> from pathlib import Path
>>> from voxatlas.config import load_and_prepare_config
>>> yaml_text = "features:\n - acoustic.pitch.dummy\n"
>>> with tempfile.TemporaryDirectory() as tmp:
...     path = Path(tmp) / "config.yaml"
...     _ = path.write_text(yaml_text, encoding="utf-8")
...     cfg = load_and_prepare_config(str(path))
...     cfg["features"]
['acoustic.pitch.dummy']
- voxatlas.load_config(path)[source]#
Load a VoxAtlas YAML configuration file.
Expected YAML Format#
VoxAtlas configuration files are YAML mappings (YAML “dicts”) with a small set of conventional top-level keys. The minimal valid config contains a features list:
features:
  - acoustic.pitch.dummy
Optional keys supported by the pipeline and config layer include:
- pipeline: pipeline runtime options (mapping)
  - n_jobs: number of worker processes per dependency layer (int)
  - cache: enable/disable on-disk feature caching (bool)
  - cache_dir: cache directory when caching is enabled (str)
- feature_config: per-feature parameter overrides (mapping)
  - keys are feature names from features
  - values are extractor-specific parameter mappings
Example with per-feature parameters and pipeline options:
features:
  - phonology.prosody.stressed
  - acoustic.pitch.f0
pipeline:
  n_jobs: 4
  cache: true
  cache_dir: .voxatlas_cache
feature_config:
  phonology.prosody.stressed:
    language: fra
    resource_root: /path/to/resources/phonology
- Parameters:
path (str) – Filesystem path to a YAML configuration file.
- Returns:
Parsed configuration dictionary.
- Return type:
dict
- Raises:
OSError – Raised when the file cannot be opened.
yaml.YAMLError – Raised when the YAML document is invalid.
Notes
This function parses YAML only. It does not apply defaults or schema validation. For the recommended entry point that validates and applies defaults, see load_and_prepare_config().
Examples
>>> import tempfile
>>> from pathlib import Path
>>> from voxatlas.config import load_config
>>> yaml_text = "features:\n - acoustic.pitch.dummy\n"
>>> with tempfile.TemporaryDirectory() as tmp:
...     path = Path(tmp) / "config.yaml"
...     _ = path.write_text(yaml_text, encoding="utf-8")
...     cfg = load_config(str(path))
...     cfg["features"]
['acoustic.pitch.dummy']
- voxatlas.load_dataset(dataset_root, conversation_id)[source]#
Load audio and alignment inputs for one conversation.
- Parameters:
dataset_root (str) – Root directory containing audio/ and alignment/ subdirectories.
conversation_id (str) – Conversation identifier shared by the audio and alignment files.
- Returns:
Loaded dataset object with channel-wise streams.
- Return type:
DatasetInput
- Raises:
ValueError – Raised when the directory layout is invalid or required files are missing.
Notes
VoxAtlas expects the SPPAS-style alignment layout used by the repository examples and tests.
Examples
>>> import tempfile
>>> from pathlib import Path
>>> from voxatlas.io import load_dataset
>>>
>>> def _write_textgrid(path: Path, tier_names: list[str]) -> None:
...     items = []
...     for idx, name in enumerate(tier_names, start=1):
...         items.extend(
...             [
...                 f"item [{idx}]:",
...                 f' name = "{name}"',
...                 " intervals [1]:",
...                 " xmin = 0",
...                 " xmax = 0.5",
...                 ' text = "x"',
...             ]
...         )
...     path.write_text("\n".join(items) + "\n", encoding="utf-8")
>>>
>>> with tempfile.TemporaryDirectory() as tmp:
...     root = Path(tmp)
...     (root / "alignment" / "palign").mkdir(parents=True)
...     (root / "alignment" / "syll").mkdir(parents=True)
...     (root / "alignment" / "ipu").mkdir(parents=True)
...     conv = "conversation01"
...     for ch in ("ch1", "ch2"):
...         _write_textgrid(
...             root / "alignment" / "palign" / f"{conv}_{ch}.TextGrid",
...             ["TokensAlign", "PhonAlign"],
...         )
...         _write_textgrid(
...             root / "alignment" / "syll" / f"{conv}_{ch}.TextGrid",
...             ["SyllAlign", "SyllClassAlign"],
...         )
...         _write_textgrid(
...             root / "alignment" / "ipu" / f"{conv}_{ch}.TextGrid",
...             ["IPU"],
...         )
...     dataset = load_dataset(str(root), conv)
...     streams = dataset.streams()
...     (len(streams), streams[0].units.speaker, streams[1].units.speaker)
(2, 'A', 'B')
- voxatlas.load_textgrid(path)[source]#
Parse a Praat TextGrid file into per-tier interval tables.
Each returned DataFrame contains interval rows with id, start, end, and label columns. Tier names are used as dictionary keys.
- Parameters:
path (str | Path) – Path to a TextGrid file on disk.
- Returns:
Mapping from tier name to interval table.
- Return type:
dict[str, pandas.DataFrame]
Notes
This parser targets interval tiers (intervals [n] blocks). Point tiers are not expanded into the output structure.
Examples
>>> import tempfile
>>> from pathlib import Path
>>> from voxatlas.units.alignment_loader import load_textgrid
>>> textgrid = "\n".join(
...     [
...         "item [1]:",
...         ' name = "words"',
...         " intervals [1]:",
...         " xmin = 0",
...         " xmax = 0.5",
...         ' text = "hello"',
...         "item [2]:",
...         ' name = "phones"',
...         " intervals [1]:",
...         " xmin = 0",
...         " xmax = 0.5",
...         ' text = "h"',
...     ]
... ) + "\n"
>>> with tempfile.TemporaryDirectory() as tmp:
...     path = Path(tmp) / "alignment.TextGrid"
...     _ = path.write_text(textgrid, encoding="utf-8")
...     tiers = load_textgrid(path)
...     (sorted(tiers.keys()), tiers["words"].columns.tolist())
(['phones', 'words'], ['id', 'start', 'end', 'label'])
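The interval-tier structure parsed above can be sketched with a few regular expressions. A simplified stdlib-only parser returning plain dicts instead of DataFrames; the real load_textgrid handles more of the TextGrid format (file headers, tier counts, point tiers) than this sketch does:

```python
import re

def parse_intervals(text: str) -> dict[str, list[dict]]:
    """Collect interval rows per tier from a short TextGrid snippet."""
    tiers: dict[str, list[dict]] = {}
    current = None
    row: dict = {}
    for line in text.splitlines():
        line = line.strip()
        if m := re.match(r'name = "(.*)"', line):
            current = m.group(1)
            tiers[current] = []
        elif current is not None and (m := re.match(r"xmin = ([\d.]+)", line)):
            row = {"id": len(tiers[current]), "start": float(m.group(1))}
        elif current is not None and (m := re.match(r"xmax = ([\d.]+)", line)):
            row["end"] = float(m.group(1))
        elif current is not None and (m := re.match(r'text = "(.*)"', line)):
            row["label"] = m.group(1)
            tiers[current].append(row)
    return tiers

sample = "\n".join([
    "item [1]:",
    '    name = "words"',
    "    intervals [1]:",
    "        xmin = 0",
    "        xmax = 0.5",
    '        text = "hello"',
])
tiers = parse_intervals(sample)
print(sorted(tiers))               # ['words']
print(tiers["words"][0]["label"])  # hello
```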