Units

Defined in: voxatlas.units.units

class voxatlas.units.units.Units(frames=None, tokens=None, phonemes=None, syllables=None, sentences=None, words=None, ipus=None, turns=None, speaker=None)

Bases: object

Container for hierarchical speech units (tables) for a single stream.

VoxAtlas feature extractors operate on unit tables (frames, tokens, phonemes, syllables, words, etc.) that are time-aligned and optionally linked through parent-child identifiers. Units is a lightweight, backend-agnostic wrapper around those tables: it stores them, normalizes unit type names (singular/plural aliases), and provides a small set of convenience accessors (lookup, durations, parent/children grouping).

This class intentionally does not enforce a rigid schema beyond what its helper methods require; extractors may expect additional columns such as label or token depending on the feature.

Parameters:
  • frames (pandas.DataFrame | None) – Frame-level table.

  • tokens (pandas.DataFrame | None) – Token-level table.

  • phonemes (pandas.DataFrame | None) – Phoneme-level table.

  • syllables (pandas.DataFrame | None) – Syllable-level table.

  • sentences (pandas.DataFrame | None) – Sentence-level table.

  • words (pandas.DataFrame | None) – Word-level table.

  • ipus (pandas.DataFrame | None) – Inter-pausal-unit table.

  • turns (pandas.DataFrame | None) – Turn-level table.

  • speaker (str | None) – Optional speaker label for the stream.

Returns:

Hierarchical unit container for one stream.

Return type:

Units

frames, tokens, phonemes, syllables, sentences, words, ipus, turns

Stored unit tables. Any table may be None if it is unavailable.

Type:

pandas.DataFrame | None

speaker

Speaker label associated with this stream (if known).

Type:

str | None

Notes

Unit labels

Methods that accept a unit_type (for example, table()) accept both singular and plural labels:

  • "frame" / "frames"

  • "token" / "tokens"

  • "phoneme" / "phonemes"

  • "syllable" / "syllables"

  • "sentence" / "sentences"

  • "word" / "words"

  • "ipu" / "ipus"

  • "turn" / "turns"
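
The alias handling listed above can be pictured as a small lookup table. The sketch below is illustrative only: normalize_unit_type, _UNIT_TYPES and _ALIASES are hypothetical names rather than VoxAtlas internals, and the real normalization may differ (for example, in how it treats case or surrounding whitespace).

```python
# Hypothetical sketch of singular/plural alias normalization.
# Not the VoxAtlas implementation; all names here are assumptions.
_UNIT_TYPES = (
    "frames", "tokens", "phonemes", "syllables",
    "sentences", "words", "ipus", "turns",
)

# Map both "word" and "words" to the canonical plural label.
_ALIASES = {name: name for name in _UNIT_TYPES}
_ALIASES.update({name[:-1]: name for name in _UNIT_TYPES})


def normalize_unit_type(unit_type: str) -> str:
    """Return the canonical plural label, or raise ValueError if unknown."""
    try:
        return _ALIASES[unit_type]
    except KeyError:
        raise ValueError(f"Unknown unit type: {unit_type!r}") from None
```

Using the plural form as the canonical label mirrors the constructor's keyword names, so the normalized label could double as an attribute name.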

Table conventions

Units works best when each DataFrame follows a few simple conventions:

  • id: unique identifier for the unit row (typically integer-like).

  • start and end: segment boundaries on a shared timeline (commonly seconds). Used by duration() and by many extractors.

  • Parent-child links (optional): to connect units explicitly, include a <parent>_id column on the child table. For example, syllables that belong to words can carry a word_id column; phonemes that belong to syllables can carry a syllable_id column. parent() and children() use this naming convention.

  • table() returns the underlying DataFrame object. If you mutate it, you are mutating the table stored on the Units instance.

  • If a requested table is missing (None), table() raises ValueError; callers can either catch this or check the relevant attribute first.
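
The conventions above can be checked with plain pandas before the tables are handed to Units. This is a minimal sketch under stated assumptions: missing_columns is a hypothetical helper, not a VoxAtlas function, and the orphan check is one possible sanity test rather than something Units is documented to perform.

```python
import pandas as pd

# Tables that follow the conventions: id, start, end, plus an optional
# <parent>_id link on the child table (here, word_id on syllables).
words = pd.DataFrame({"id": [1, 2], "start": [0.0, 1.0], "end": [1.0, 1.5]})
syllables = pd.DataFrame(
    {
        "id": [10, 11, 12],
        "word_id": [1, 1, 2],
        "start": [0.0, 0.5, 1.0],
        "end": [0.5, 1.0, 1.5],
    }
)


def missing_columns(table, parent_type=None):
    """Hypothetical helper: list convention columns absent from a table."""
    required = ["id", "start", "end"]
    if parent_type is not None:
        # Singular parent label, following the <parent>_id convention.
        required.append(f"{parent_type}_id")
    return [name for name in required if name not in table.columns]


# Child rows whose word_id does not match any word id.
orphans = syllables[~syllables["word_id"].isin(words["id"])]
```

An empty list from missing_columns and an empty orphans table indicate that the parent-child link is usable as described.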

Examples

>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> syllables = pd.DataFrame(
...     {"id": [10], "word_id": [1], "start": [0.0], "end": [0.5], "label": ["he"]}
... )
>>> units = Units(words=words, syllables=syllables, speaker="A")
>>> units.table("word").shape
(1, 4)
>>> float(units.duration("word").iloc[0])
1.0

table(unit_type)

Return the table for a requested unit type.

Parameters:

unit_type (str) – Unit label such as "token" or "syllable".

Returns:

Table associated with the requested unit type.

Return type:

pandas.DataFrame

Raises:

ValueError – Raised when the unit type is invalid or unavailable.

Examples

>>> import pandas as pd
>>> from voxatlas.units import Units
>>> tokens = pd.DataFrame({"id": [1], "start": [0.0], "end": [0.2], "label": ["hi"]})
>>> units = Units(tokens=tokens)
>>> units.table("token").columns.tolist()
['id', 'start', 'end', 'label']
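
Because table() raises ValueError for both invalid and unavailable unit types, optional tables are often easiest to handle through a small wrapper. The helper below is a hypothetical convenience, not part of VoxAtlas; it relies only on the documented table() behavior and works with any object exposing that method.

```python
def table_or_none(units, unit_type):
    """Return units.table(unit_type), or None when table() raises ValueError.

    Hypothetical helper. Note that it also returns None for a misspelled
    unit type, since table() raises ValueError in that case as well.
    """
    try:
        return units.table(unit_type)
    except ValueError:
        return None
```

If misspelled labels should still fail loudly, checking the relevant attribute (for example, units.words is None) is the safer alternative.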

get(unit_type)

Alias for table().

Parameters:

unit_type (str) – Requested unit label.

Returns:

Requested unit table.

Return type:

pandas.DataFrame

Examples

>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> units = Units(words=words)
>>> units.get("word").shape[0]
1

duration(unit_type)

Compute durations from start and end columns.

Parameters:

unit_type (str) – Requested unit label.

Returns:

Duration values for each row.

Return type:

pandas.Series

Examples

>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.25], "end": [1.00], "label": ["hello"]})
>>> units = Units(words=words)
>>> float(units.duration("word").iloc[0])
0.75
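
The description suggests that duration() is the row-wise difference between end and start. Assuming that holds (this page does not spell out the computation), the same values can be reproduced in plain pandas, which is also a convenient way to spot rows with inverted boundaries. The words table and the bad_rows name below are illustrative.

```python
import pandas as pd

words = pd.DataFrame(
    {
        "id": [1, 2],
        "start": [0.25, 1.0],
        "end": [1.0, 1.5],
        "label": ["hello", "you"],
    }
)

# Assumed equivalent of Units.duration("word"): row-wise end - start.
durations = words["end"] - words["start"]

# Rows whose end precedes their start yield negative durations,
# which usually points to an alignment problem upstream.
bad_rows = words[durations < 0]
```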

parent(child_type, parent_type)

Return parent identifiers for a child unit table.

Parameters:
  • child_type (str) – Child unit label.

  • parent_type (str) – Parent unit label.

Returns:

Parent identifier column.

Return type:

pandas.Series

Raises:

ValueError – Raised when the mapping column is unavailable.

Examples

>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> syllables = pd.DataFrame(
...     {"id": [10, 11], "word_id": [1, 1], "start": [0.0, 0.5], "end": [0.5, 1.0], "label": ["he", "llo"]}
... )
>>> units = Units(words=words, syllables=syllables)
>>> units.parent("syllable", "word").tolist()
[1, 1]
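
Parent identifiers are mostly useful for pulling parent attributes onto child rows. The sketch below does this with a plain pandas merge on the word_id column that parent("syllable", "word") is documented to return; the word_ prefix on the parent columns is an illustrative choice, not a VoxAtlas convention.

```python
import pandas as pd

words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
syllables = pd.DataFrame(
    {
        "id": [10, 11],
        "word_id": [1, 1],
        "start": [0.0, 0.5],
        "end": [0.5, 1.0],
        "label": ["he", "llo"],
    }
)

# Prefix the parent columns so that "id" becomes "word_id" and matches
# the link column on the child table.
parents = words.add_prefix("word_")

# Attach each syllable's parent word attributes; how="left" keeps
# syllables even when their word_id has no match.
merged = syllables.merge(parents, on="word_id", how="left")
```

Each row of merged then carries both the syllable's own boundaries and those of its word (word_start, word_end), which is handy for computing relative positions.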

children(parent_type, child_type)

Group child units by parent identifier.

Parameters:
  • parent_type (str) – Parent unit label.

  • child_type (str) – Child unit label.

Returns:

Grouped child table keyed by parent identifier.

Return type:

DataFrameGroupBy

Raises:

ValueError – Raised when the mapping column is unavailable.

Examples

>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hi"]})
>>> phonemes = pd.DataFrame(
...     {"id": [100, 101], "word_id": [1, 1], "start": [0.0, 0.5], "end": [0.5, 1.0], "label": ["h", "i"]}
... )
>>> units = Units(words=words, phonemes=phonemes)
>>> units.children("word", "phoneme").ngroups
1
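
The returned DataFrameGroupBy supports the usual pandas group operations. The sketch below uses a plain groupby on word_id as a stand-in for units.children("word", "phoneme"), assuming the method groups the child table by its <parent>_id column; the phoneme data is illustrative.

```python
import pandas as pd

phonemes = pd.DataFrame(
    {
        "id": [100, 101, 102],
        "word_id": [1, 1, 2],
        "start": [0.0, 0.5, 1.0],
        "end": [0.5, 1.0, 1.5],
        "label": ["h", "i", "a"],
    }
)

# Stand-in for units.children("word", "phoneme") (assumed equivalent).
grouped = phonemes.groupby("word_id")

# Number of phonemes per word, indexed by word id.
phonemes_per_word = grouped.size()

# Phoneme labels per word, collected by iterating over the groups.
labels_by_word = {
    int(word_id): group["label"].tolist() for word_id, group in grouped
}
```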

group(child_type, by)

Alias for children() using by as the parent unit.

Parameters:
  • child_type (str) – Child unit label.

  • by (str) – Parent unit label.

Returns:

Grouped child table.

Return type:

DataFrameGroupBy

Examples

>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hi"]})
>>> phonemes = pd.DataFrame(
...     {"id": [100, 101], "word_id": [1, 1], "start": [0.0, 0.5], "end": [0.5, 1.0], "label": ["h", "i"]}
... )
>>> units = Units(words=words, phonemes=phonemes)
>>> units.group("phoneme", by="word").ngroups
1