Developer Module Tree#

Raw, module-oriented API reference for developers.

Top-level public interfaces for VoxAtlas.

class voxatlas.DatasetInput(audio_streams, units_streams)[source]#

Store every stream loaded for one conversation.

Parameters:
  • audio_streams (list of Audio | None) – Audio streams loaded from the dataset.

  • units_streams (list of Units | None) – Alignment streams loaded from TextGrid files.

Returns:

Dataclass containing per-channel dataset inputs.

Return type:

DatasetInput

Notes

Audio and alignment streams are paired by channel order when both modalities are present.

Examples

>>> from voxatlas.io import DatasetInput
>>> dataset = DatasetInput(audio_streams=None, units_streams=None)
>>> dataset.streams()
[]
audio_streams: list[Audio] | None#
units_streams: list[Units] | None#
streams()[source]#

Return paired stream objects for the conversation.

Returns:

Stream objects pairing audio and alignment data where possible.

Return type:

list of DatasetStream

Raises:

ValueError – Raised when the audio and alignment channel counts differ.

Notes

When only one modality is present, the other field is set to None.
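
The pairing rules above can be sketched with plain Python. `pair_streams` is an illustrative name, not part of the VoxAtlas API; the sketch only mirrors the documented behavior (index-by-index pairing, `None` for a missing modality, `ValueError` on a channel-count mismatch):

```python
def pair_streams(audio_streams, units_streams):
    """Sketch of channel-order pairing (illustrative, not library code).

    When both modalities are present the channel counts must match and
    streams are paired index-by-index; otherwise the missing side is None.
    """
    if audio_streams and units_streams:
        if len(audio_streams) != len(units_streams):
            raise ValueError("audio and alignment channel counts differ")
        return list(zip(audio_streams, units_streams))
    if audio_streams:
        return [(a, None) for a in audio_streams]
    if units_streams:
        return [(None, u) for u in units_streams]
    return []

print(pair_streams(["audio_ch1"], None))
# [('audio_ch1', None)]
```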

Examples

>>> import numpy as np
>>> from voxatlas.audio.audio import Audio
>>> from voxatlas.io import DatasetInput
>>> audio = Audio(waveform=np.zeros(16000, dtype=np.float32), sample_rate=16000)
>>> dataset = DatasetInput(audio_streams=[audio], units_streams=None)
>>> streams = dataset.streams()
>>> len(streams)
1
>>> (streams[0].audio is not None, streams[0].units is None)
(True, True)
class voxatlas.DatasetStream(audio, units)[source]#

Represent one aligned stream from a conversation dataset.

Parameters:
  • audio (Audio | None) – Audio stream for one channel, if available.

  • units (Units | None) – Hierarchical unit container for the same channel, if available.

Returns:

Dataclass describing one multimodal stream.

Return type:

DatasetStream

Notes

A stream may contain audio only, units only, or both modalities.

Examples

>>> from voxatlas.io import DatasetStream
>>> stream = DatasetStream(audio=None, units=None)
>>> (stream.audio is None, stream.units is None)
(True, True)
audio: Audio | None#
units: Units | None#
class voxatlas.ExecutionPlan(layers)[source]#

Represent a dependency-sorted feature execution plan.

Parameters:

layers (iterable of iterable of str) – Sequence of dependency layers. Features in the same layer can be executed independently.

Returns:

Normalized execution plan.

Return type:

ExecutionPlan

Notes

The features attribute flattens the layer structure in execution order.

Examples

>>> from voxatlas.pipeline.execution_plan import ExecutionPlan
>>> plan = ExecutionPlan([["a"], ["b", "c"]])
>>> plan.features
['a', 'b', 'c']
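
The layering itself can be pictured as a Kahn-style layered topological sort over a dependency mapping. This is an illustrative reconstruction, not the library implementation; the real plan is built from the feature registry:

```python
def layer_features(deps):
    """Group features into dependency layers (illustrative sketch).

    deps maps each feature name to the names it depends on. A feature is
    ready once all of its dependencies appear in an earlier layer.
    """
    remaining = {name: set(d) for name, d in deps.items()}
    resolved, layers = set(), []
    while remaining:
        ready = sorted(n for n, d in remaining.items() if d <= resolved)
        if not ready:
            raise ValueError("dependency graph contains a cycle")
        layers.append(ready)
        resolved.update(ready)
        for name in ready:
            del remaining[name]
    return layers

print(layer_features({"a": [], "b": ["a"], "c": ["a"]}))
# [['a'], ['b', 'c']]
```
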
class voxatlas.FeatureStore[source]#

Store intermediate and final feature outputs for one pipeline run.

The feature store is the shared lookup table used during dependency resolution. Extractors read dependency outputs from this object instead of recomputing upstream features.

Examples

>>> from voxatlas.pipeline.feature_store import FeatureStore
>>> store = FeatureStore()
>>> store.add("acoustic.pitch.f0", {"value": 123})
>>> store.exists("acoustic.pitch.f0")
True
add(feature_name, result)[source]#

Add a computed output to the store.

Parameters:
  • feature_name (str) – Fully qualified feature name.

  • result (object) – Output object returned by an extractor.

Returns:

The store is updated in place.

Return type:

None

Notes

Adding the same feature name again overwrites the previous value.

Examples

>>> from voxatlas.pipeline.feature_store import FeatureStore
>>> store = FeatureStore()
>>> store.add("syntax.dependencies", {"edges": []})
get(feature_name)[source]#

Retrieve a stored feature output.

Parameters:

feature_name (str) – Fully qualified feature name.

Returns:

Stored feature output.

Return type:

object

Raises:

KeyError – Raised when the feature is not present.

Examples

>>> from voxatlas.pipeline.feature_store import FeatureStore
>>> store = FeatureStore()
>>> store.add("syntax.dependencies", {"edges": []})
>>> store.get("syntax.dependencies")
{'edges': []}
exists(feature_name)[source]#

Check whether a feature has already been stored.

Parameters:

feature_name (str) – Fully qualified feature name.

Returns:

True when the feature exists in the store.

Return type:

bool

Examples

>>> from voxatlas.pipeline.feature_store import FeatureStore
>>> store = FeatureStore()
>>> store.exists("lexical.frequency.lookup")
False
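
Taken together, add(), get(), and exists() behave like a plain dictionary keyed by feature name. A minimal dict-backed sketch of that contract (illustrative only, not the library implementation):

```python
class DictFeatureStore:
    """Minimal sketch of the FeatureStore contract (illustrative only)."""

    def __init__(self):
        self._results = {}

    def add(self, feature_name, result):
        # Re-adding the same name overwrites the previous value.
        self._results[feature_name] = result

    def get(self, feature_name):
        # Raises KeyError when the feature is not present.
        return self._results[feature_name]

    def exists(self, feature_name):
        return feature_name in self._results

store = DictFeatureStore()
store.add("acoustic.pitch.f0", {"value": 123})
print(store.exists("acoustic.pitch.f0"))
# True
```
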
voxatlas.Pipeline#

alias of VoxAtlasPipeline

class voxatlas.Units(frames=None, tokens=None, phonemes=None, syllables=None, sentences=None, words=None, ipus=None, turns=None, speaker=None)[source]#

Container for hierarchical speech units (tables) for a single stream.

VoxAtlas feature extractors operate on unit tables (frames, tokens, phonemes, syllables, words, etc.) that are time-aligned and optionally linked through parent-child identifiers. Units is a lightweight, backend-agnostic wrapper around those tables: it stores them, normalizes unit type names (singular/plural aliases), and provides a small set of convenience accessors (lookup, durations, parent/children grouping).

This class intentionally does not enforce a rigid schema beyond what its helper methods require; extractors may expect additional columns such as label or token depending on the feature.

Parameters:
  • frames (pandas.DataFrame | None) – Frame-level table.

  • tokens (pandas.DataFrame | None) – Token-level table.

  • phonemes (pandas.DataFrame | None) – Phoneme-level table.

  • syllables (pandas.DataFrame | None) – Syllable-level table.

  • sentences (pandas.DataFrame | None) – Sentence-level table.

  • words (pandas.DataFrame | None) – Word-level table.

  • ipus (pandas.DataFrame | None) – Inter-pausal-unit table.

  • turns (pandas.DataFrame | None) – Turn-level table.

  • speaker (str | None) – Optional speaker label for the stream.

Returns:

Hierarchical unit container for one stream.

Return type:

Units

frames, tokens, phonemes, syllables, sentences, words, ipus, turns#

Stored unit tables. Any table may be None if it is unavailable.

Type:

pandas.DataFrame | None

speaker#

Speaker label associated with this stream (if known).

Type:

str | None

Notes

Unit labels: methods that take a unit_type argument (for example, table()) accept both singular and plural labels:

  • "frame" / "frames"

  • "token" / "tokens"

  • "phoneme" / "phonemes"

  • "syllable" / "syllables"

  • "sentence" / "sentences"

  • "word" / "words"

  • "ipu" / "ipus"

  • "turn" / "turns"
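
One way to read the alias table is as a normalization step from either form to a canonical plural label. A hedged sketch of that idea (UNIT_ALIASES and normalize_unit_type are illustrative names, not the library's internals):

```python
UNIT_ALIASES = {
    "frame": "frames", "token": "tokens", "phoneme": "phonemes",
    "syllable": "syllables", "sentence": "sentences", "word": "words",
    "ipu": "ipus", "turn": "turns",
}

def normalize_unit_type(unit_type):
    # Map a singular or plural label to the canonical plural form;
    # anything else is an invalid unit type.
    if unit_type in UNIT_ALIASES:
        return UNIT_ALIASES[unit_type]
    if unit_type in UNIT_ALIASES.values():
        return unit_type
    raise ValueError(f"invalid unit type: {unit_type!r}")

print(normalize_unit_type("word"), normalize_unit_type("words"))
# words words
```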

Table conventions: Units works best when each DataFrame follows a few simple conventions:

  • id: unique identifier for the unit row (typically integer-like).

  • start and end: segment boundaries on a shared timeline (commonly seconds). Used by duration() and by many extractors.

  • Parent-child links (optional): to connect units explicitly, include a <parent>_id column on the child table. For example, syllables that belong to words can carry a word_id column; phonemes that belong to syllables can carry a syllable_id column. parent() and children() use this naming convention.

  • table() returns the underlying DataFrame object. If you mutate it, you are mutating the table stored on the Units instance.

  • If a requested table is missing (None), table() raises ValueError; callers can either catch this or check the relevant attribute first.
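
The `<parent>_id` convention can be illustrated with plain dictionaries standing in for table rows. `group_by_parent` is a hypothetical helper, not the Units API, and it raises `ValueError` on a missing mapping column just as the documented methods do:

```python
from collections import defaultdict

def group_by_parent(child_rows, parent_label):
    # Children carry a column named "<parent>_id"; group child ids by that key.
    key = f"{parent_label}_id"
    groups = defaultdict(list)
    for row in child_rows:
        if key not in row:
            raise ValueError(f"missing mapping column: {key}")
        groups[row[key]].append(row["id"])
    return dict(groups)

syllables = [{"id": 10, "word_id": 1}, {"id": 11, "word_id": 1}]
print(group_by_parent(syllables, "word"))
# {1: [10, 11]}
```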

Examples

>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> syllables = pd.DataFrame(
...     {"id": [10], "word_id": [1], "start": [0.0], "end": [0.5], "label": ["he"]}
... )
>>> units = Units(words=words, syllables=syllables, speaker="A")
>>> units.table("word").shape
(1, 4)
>>> float(units.duration("word").iloc[0])
1.0
table(unit_type)[source]#

Return the table for a requested unit type.

Parameters:

unit_type (str) – Unit label such as "token" or "syllable".

Returns:

Table associated with the requested unit type.

Return type:

pandas.DataFrame

Raises:

ValueError – Raised when the unit type is invalid or unavailable.

Examples

>>> import pandas as pd
>>> from voxatlas.units import Units
>>> tokens = pd.DataFrame({"id": [1], "start": [0.0], "end": [0.2], "label": ["hi"]})
>>> units = Units(tokens=tokens)
>>> units.table("token").columns.tolist()
['id', 'start', 'end', 'label']
get(unit_type)[source]#

Alias for table().

Parameters:

unit_type (str) – Requested unit label.

Returns:

Requested unit table.

Return type:

pandas.DataFrame

Examples

>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> units = Units(words=words)
>>> units.get("word").shape[0]
1
duration(unit_type)[source]#

Compute durations from start and end columns.

Parameters:

unit_type (str) – Requested unit label.

Returns:

Duration values for each row.

Return type:

pandas.Series

Examples

>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.25], "end": [1.00], "label": ["hello"]})
>>> units = Units(words=words)
>>> float(units.duration("word").iloc[0])
0.75
parent(child_type, parent_type)[source]#

Return parent identifiers for a child unit table.

Parameters:
  • child_type (str) – Child unit label.

  • parent_type (str) – Parent unit label.

Returns:

Parent identifier column.

Return type:

pandas.Series

Raises:

ValueError – Raised when the mapping column is unavailable.

Examples

>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> syllables = pd.DataFrame(
...     {"id": [10, 11], "word_id": [1, 1], "start": [0.0, 0.5], "end": [0.5, 1.0], "label": ["he", "llo"]}
... )
>>> units = Units(words=words, syllables=syllables)
>>> units.parent("syllable", "word").tolist()
[1, 1]
children(parent_type, child_type)[source]#

Group child units by parent identifier.

Parameters:
  • parent_type (str) – Parent unit label.

  • child_type (str) – Child unit label.

Returns:

Grouped child table keyed by parent identifier.

Return type:

DataFrameGroupBy

Raises:

ValueError – Raised when the mapping column is unavailable.

Examples

>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> phonemes = pd.DataFrame(
...     {"id": [100, 101], "word_id": [1, 1], "start": [0.0, 0.5], "end": [0.5, 1.0], "label": ["h", "i"]}
... )
>>> units = Units(words=words, phonemes=phonemes)
>>> units.children("word", "phoneme").ngroups
1
group(child_type, by)[source]#

Alias for children() using by as the parent unit.

Parameters:
  • child_type (str) – Child unit label.

  • by (str) – Parent unit label.

Returns:

Grouped child table.

Return type:

DataFrameGroupBy

Examples

>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> phonemes = pd.DataFrame(
...     {"id": [100, 101], "word_id": [1, 1], "start": [0.0, 0.5], "end": [0.5, 1.0], "label": ["h", "i"]}
... )
>>> units = Units(words=words, phonemes=phonemes)
>>> units.group("phoneme", by="word").ngroups
1
class voxatlas.VoxAtlasPipeline(audio, units, config)[source]#

Run a VoxAtlas feature extraction workflow for a single stream.

A pipeline instance combines one audio stream, one unit hierarchy, and a runtime configuration. It validates requested features, resolves dependency layers, executes extractors in order, and stores intermediate results so downstream features can reuse them.

Parameters:
  • audio (Audio | None) – Audio stream for the current conversation channel. Acoustic features require this input.

  • units (Units | None) – Hierarchical unit container for the current stream. Linguistic and alignment-based features require this input.

  • config (dict) – Runtime configuration containing the requested features and pipeline options.

Returns:

Configured pipeline instance ready to execute.

Return type:

VoxAtlasPipeline

Notes

VoxAtlas resolves dependencies through the feature registry and executes each dependency layer sequentially while allowing optional parallelism inside a layer.

Examples

>>> import numpy as np
>>> from voxatlas.audio.audio import Audio
>>> from voxatlas.pipeline import Pipeline
>>> audio = Audio(waveform=np.zeros(16000, dtype=np.float32), sample_rate=16000)
>>> pipeline = Pipeline(
...     audio=audio,
...     units=None,
...     config={"features": ["acoustic.pitch.dummy"], "pipeline": {"n_jobs": 1, "cache": False}},
... )
>>> results = pipeline.run()
>>> results.exists("acoustic.pitch.dummy")
True
run()[source]#

Execute the configured feature graph and return computed outputs.

The pipeline validates the requested features, creates an execution plan from registry dependencies, then executes each dependency layer in order. Intermediate outputs are inserted into a feature store so later features can retrieve them.

Returns:

Store containing requested features and any computed dependencies.

Return type:

FeatureStore

Raises:
  • ValueError – Raised when the dependency graph contains a cycle.

  • KeyError – Raised when a required feature is missing from the store or cache during execution.

Notes

When caching is enabled, cached outputs are loaded before an extractor is scheduled for execution.
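
The execution loop can be sketched as below, showing the documented order of operations (cache check first, then extraction). Names are illustrative, the cache is modeled as a plain mapping, and the real pipeline may additionally parallelize within a layer:

```python
def run_layers(layers, extractors, cache=None):
    # Execute dependency layers in order; later extractors read upstream
    # outputs from the store instead of recomputing them.
    store = {}
    for layer in layers:
        for name in layer:
            if cache is not None and name in cache:
                store[name] = cache[name]  # cached output short-circuits execution
            else:
                store[name] = extractors[name](store)
    return store

extractors = {
    "f0": lambda store: [100.0, 110.0],
    "f0.mean": lambda store: sum(store["f0"]) / len(store["f0"]),
}
print(run_layers([["f0"], ["f0.mean"]], extractors))
# {'f0': [100.0, 110.0], 'f0.mean': 105.0}
```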

Examples

>>> import numpy as np
>>> from voxatlas.audio.audio import Audio
>>> from voxatlas.pipeline import Pipeline
>>> audio = Audio(waveform=np.zeros(16000, dtype=np.float32), sample_rate=16000)
>>> pipeline = Pipeline(
...     audio=audio,
...     units=None,
...     config={"features": ["acoustic.pitch.dummy"], "pipeline": {"n_jobs": 1, "cache": False}},
... )
>>> results = pipeline.run()
>>> results.exists("acoustic.pitch.dummy")
True
voxatlas.expand_defaults(cfg)[source]#

Merge a user configuration with VoxAtlas defaults.

What “Expand Defaults” Means#

VoxAtlas maintains a small built-in default configuration (voxatlas.config.defaults.DEFAULT_CONFIG). expand_defaults starts from a deep copy of that default mapping and then applies the user configuration on top.

This is a shallow top-level merge:

  • Only the first level of keys is merged (via dict.update).

  • If the user provides a top-level key, it replaces the default value for that key entirely.

  • Nested mappings are not deep-merged. For example, providing a pipeline mapping replaces the whole default pipeline mapping.

Concretely, given the default:

{"features": [], "pipeline": {"cache": True}}

The following user config:

{"pipeline": {"n_jobs": 4}}

Produces:

{"features": [], "pipeline": {"n_jobs": 4}}

(note how pipeline.cache is not preserved because nested dicts are not merged).
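
The merge semantics above amount to deepcopy-then-update. A minimal sketch, where DEFAULT_CONFIG is the illustrative two-key default from the example rather than the full built-in mapping:

```python
import copy

DEFAULT_CONFIG = {"features": [], "pipeline": {"cache": True}}  # illustrative default

def expand_defaults_sketch(cfg):
    merged = copy.deepcopy(DEFAULT_CONFIG)  # never mutate the shared default
    merged.update(cfg)                      # shallow: top-level keys only
    return merged

print(expand_defaults_sketch({"pipeline": {"n_jobs": 4}}))
# {'features': [], 'pipeline': {'n_jobs': 4}}
```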

Parameters:

cfg (dict) – User-supplied configuration dictionary.

Returns:

Configuration with top-level defaults applied.

Return type:

dict

Notes

If you want to override just one pipeline option while keeping other defaults, pass the full desired pipeline mapping (or use load_and_prepare_config(), which is the recommended config entry point for most workflows).

Examples

>>> from voxatlas.config import expand_defaults
>>> cfg = expand_defaults({"features": ["acoustic.pitch.dummy"]})
>>> cfg["features"]
['acoustic.pitch.dummy']
>>> sorted(cfg["pipeline"].keys())
['cache']

voxatlas.load_alignment(path)[source]#

Load an alignment file into a Units container.

This is a lightweight compatibility entry point for alignment ingestion. The current implementation returns an empty Units object and does not parse the file content yet.

Parameters:

path (str) – Filesystem path to an alignment file (for example, a TextGrid file). The path is accepted for API consistency, even though content parsing is not implemented in this helper yet.

Returns:

An empty Units container.

Return type:

Units

Notes

For full data loading workflows, prefer higher-level input loading helpers that combine audio, alignment, and metadata validation.

Examples

>>> from voxatlas.units.alignment import load_alignment
>>> from voxatlas.units.units import Units
>>> units = load_alignment("alignment.TextGrid")
>>> isinstance(units, Units)
True
voxatlas.load_and_prepare_config(path)[source]#

Load, validate, and normalize a VoxAtlas configuration.

Parameters:

path (str) – Filesystem path to a YAML configuration file.

Returns:

Validated configuration with defaults applied.

Return type:

dict

Raises:

ConfigValidationError – Raised when the configuration does not satisfy the expected schema.

Notes

This is the recommended configuration entry point for the CLI and tutorial workflows.

Examples

>>> import tempfile
>>> from pathlib import Path
>>> from voxatlas.config import load_and_prepare_config
>>> yaml_text = "features:\n  - acoustic.pitch.dummy\n"
>>> with tempfile.TemporaryDirectory() as tmp:
...     path = Path(tmp) / "config.yaml"
...     _ = path.write_text(yaml_text, encoding="utf-8")
...     cfg = load_and_prepare_config(str(path))
...     cfg["features"]
['acoustic.pitch.dummy']
voxatlas.load_config(path)[source]#

Load a VoxAtlas YAML configuration file.

Expected YAML Format#

VoxAtlas configuration files are YAML mappings (YAML “dicts”) with a small set of conventional top-level keys. The minimal valid config contains a features list:

features:
  - acoustic.pitch.dummy

Optional keys supported by the pipeline and config layer include:

  • pipeline: pipeline runtime options (mapping)
      - n_jobs: number of worker processes per dependency layer (int)
      - cache: enable/disable on-disk feature caching (bool)
      - cache_dir: cache directory when caching is enabled (str)

  • feature_config: per-feature parameter overrides (mapping)
      - keys are feature names from features
      - values are extractor-specific parameter mappings

Example with per-feature parameters and pipeline options:

features:
  - phonology.prosody.stressed
  - acoustic.pitch.f0

pipeline:
  n_jobs: 4
  cache: true
  cache_dir: .voxatlas_cache

feature_config:
  phonology.prosody.stressed:
    language: fra
    resource_root: /path/to/resources/phonology

Parameters:

path (str) – Filesystem path to a YAML configuration file.

Returns:

Parsed configuration dictionary.

Return type:

dict

Raises:
  • OSError – Raised when the file cannot be opened.

  • yaml.YAMLError – Raised when the YAML document is invalid.

Notes

This function parses YAML only. It does not apply defaults or schema validation. For the recommended entry point that validates and applies defaults, see load_and_prepare_config().

Examples

>>> import tempfile
>>> from pathlib import Path
>>> from voxatlas.config import load_config
>>> yaml_text = "features:\n  - acoustic.pitch.dummy\n"
>>> with tempfile.TemporaryDirectory() as tmp:
...     path = Path(tmp) / "config.yaml"
...     _ = path.write_text(yaml_text, encoding="utf-8")
...     cfg = load_config(str(path))
...     cfg["features"]
['acoustic.pitch.dummy']

voxatlas.load_dataset(dataset_root, conversation_id)[source]#

Load audio and alignment inputs for one conversation.

Parameters:
  • dataset_root (str) – Root directory containing audio/ and alignment/ subdirectories.

  • conversation_id (str) – Conversation identifier shared by the audio and alignment files.

Returns:

Loaded dataset object with channel-wise streams.

Return type:

DatasetInput

Raises:

ValueError – Raised when the directory layout is invalid or required files are missing.

Notes

VoxAtlas expects the SPPAS-style alignment layout used by the repository examples and tests.

Examples

>>> import tempfile
>>> from pathlib import Path
>>> from voxatlas.io import load_dataset
>>>
>>> def _write_textgrid(path: Path, tier_names: list[str]) -> None:
...     items = []
...     for idx, name in enumerate(tier_names, start=1):
...         items.extend(
...             [
...                 f"item [{idx}]:",
...                 f'    name = "{name}"',
...                 "    intervals [1]:",
...                 "        xmin = 0",
...                 "        xmax = 0.5",
...                 '        text = "x"',
...             ]
...         )
...     path.write_text("\n".join(items) + "\n", encoding="utf-8")
>>>
>>> with tempfile.TemporaryDirectory() as tmp:
...     root = Path(tmp)
...     (root / "alignment" / "palign").mkdir(parents=True)
...     (root / "alignment" / "syll").mkdir(parents=True)
...     (root / "alignment" / "ipu").mkdir(parents=True)
...     conv = "conversation01"
...     for ch in ("ch1", "ch2"):
...         _write_textgrid(
...             root / "alignment" / "palign" / f"{conv}_{ch}.TextGrid",
...             ["TokensAlign", "PhonAlign"],
...         )
...         _write_textgrid(
...             root / "alignment" / "syll" / f"{conv}_{ch}.TextGrid",
...             ["SyllAlign", "SyllClassAlign"],
...         )
...         _write_textgrid(
...             root / "alignment" / "ipu" / f"{conv}_{ch}.TextGrid",
...             ["IPU"],
...         )
...     dataset = load_dataset(str(root), conv)
...     streams = dataset.streams()
...     (len(streams), streams[0].units.speaker, streams[1].units.speaker)
(2, 'A', 'B')
voxatlas.load_textgrid(path)[source]#

Parse a Praat TextGrid file into per-tier interval tables.

Each returned DataFrame contains interval rows with id, start, end, and label columns. Tier names are used as dictionary keys.

Parameters:

path (str | Path) – Path to a TextGrid file on disk.

Returns:

Mapping from tier name to interval table.

Return type:

dict[str, pandas.DataFrame]

Notes

This parser targets interval tiers (intervals [n] blocks). Point tiers are not expanded into the output structure.
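
A stripped-down version of the interval-tier parse can be written with the standard library alone. This sketch assumes the minimal long-form layout used in the example below (no file header, interval tiers only) and returns plain dictionaries instead of DataFrames; it is not the library parser:

```python
import re

def parse_intervals(text):
    # Illustrative sketch: collect {id, start, end, label} rows per tier
    # from a minimal long-form TextGrid body.
    tiers = {}
    current = None
    pending = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = re.fullmatch(r'name = "(.*)"', line)
        if m:
            current = m.group(1)   # a new tier starts at its name line
            tiers[current] = []
            pending = {}
            continue
        for key, pattern in (("start", r"xmin = ([0-9.]+)"),
                             ("end", r"xmax = ([0-9.]+)")):
            m = re.fullmatch(pattern, line)
            if m:
                pending[key] = float(m.group(1))
        m = re.fullmatch(r'text = "(.*)"', line)
        if m and current is not None:
            pending["label"] = m.group(1)     # text closes one interval row
            pending["id"] = len(tiers[current])
            tiers[current].append(dict(pending))
            pending = {}
    return tiers

sample = "\n".join([
    "item [1]:",
    '    name = "words"',
    "    intervals [1]:",
    "        xmin = 0",
    "        xmax = 0.5",
    '        text = "hello"',
])
print(parse_intervals(sample))
# {'words': [{'start': 0.0, 'end': 0.5, 'label': 'hello', 'id': 0}]}
```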

Examples

>>> import tempfile
>>> from pathlib import Path
>>> from voxatlas.units.alignment_loader import load_textgrid
>>> textgrid = "\n".join(
...     [
...         "item [1]:",
...         '    name = "words"',
...         "    intervals [1]:",
...         "        xmin = 0",
...         "        xmax = 0.5",
...         '        text = "hello"',
...         "item [2]:",
...         '    name = "phones"',
...         "    intervals [1]:",
...         "        xmin = 0",
...         "        xmax = 0.5",
...         '        text = "h"',
...     ]
... ) + "\n"
>>> with tempfile.TemporaryDirectory() as tmp:
...     path = Path(tmp) / "alignment.TextGrid"
...     _ = path.write_text(textgrid, encoding="utf-8")
...     tiers = load_textgrid(path)
...     (sorted(tiers.keys()), tiers["words"].columns.tolist())
(['phones', 'words'], ['id', 'start', 'end', 'label'])