Units
Defined in: voxatlas.units.units
- class voxatlas.units.units.Units(frames=None, tokens=None, phonemes=None, syllables=None, sentences=None, words=None, ipus=None, turns=None, speaker=None)
Bases: object
Container for hierarchical speech units (tables) for a single stream.
VoxAtlas feature extractors operate on unit tables (frames, tokens, phonemes, syllables, words, etc.) that are time-aligned and optionally linked through parent-child identifiers.
Units is a lightweight, backend-agnostic wrapper around those tables: it stores them, normalizes unit type names (singular/plural aliases), and provides a small set of convenience accessors (lookup, durations, parent/children grouping).
This class intentionally does not enforce a rigid schema beyond what its helper methods require; extractors may expect additional columns such as label or token depending on the feature.
- Parameters:
frames (pandas.DataFrame | None) – Frame-level table.
tokens (pandas.DataFrame | None) – Token-level table.
phonemes (pandas.DataFrame | None) – Phoneme-level table.
syllables (pandas.DataFrame | None) – Syllable-level table.
sentences (pandas.DataFrame | None) – Sentence-level table.
words (pandas.DataFrame | None) – Word-level table.
ipus (pandas.DataFrame | None) – Inter-pausal-unit table.
turns (pandas.DataFrame | None) – Turn-level table.
speaker (str | None) – Optional speaker label for the stream.
- Returns:
Hierarchical unit container for one stream.
- Return type:
Units
- frames, tokens, phonemes, syllables, sentences, words, ipus, turns
Stored unit tables. Any table may be None if it is unavailable.
- Type:
pandas.DataFrame | None
- speaker
Speaker label associated with this stream (if known).
- Type:
str | None
Notes
Unit labels
Methods that accept a unit_type (for example, table()) accept both singular and plural labels:
"frame" / "frames"
"token" / "tokens"
"phoneme" / "phonemes"
"syllable" / "syllables"
"sentence" / "sentences"
"word" / "words"
"ipu" / "ipus"
"turn" / "turns"
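The alias handling described above can be pictured as a small lookup step. The following is a minimal sketch, not the VoxAtlas implementation: the helper name normalize_unit_type and the choice of plural labels as the canonical form are assumptions made purely for illustration.

```python
# Hypothetical helper: illustrates singular/plural alias handling only.
# The name `normalize_unit_type` and the canonical plural form are assumptions.
CANONICAL_UNIT_TYPES = (
    "frames", "tokens", "phonemes", "syllables",
    "sentences", "words", "ipus", "turns",
)

# Map every accepted label (singular or plural) to its plural form.
ALIASES = {name: name for name in CANONICAL_UNIT_TYPES}
ALIASES.update({name[:-1]: name for name in CANONICAL_UNIT_TYPES})


def normalize_unit_type(unit_type: str) -> str:
    """Return the plural label for a unit type, or raise ValueError if unknown."""
    try:
        return ALIASES[unit_type]
    except KeyError:
        raise ValueError(f"Unknown unit type: {unit_type!r}") from None


print(normalize_unit_type("syllable"))  # syllables
print(normalize_unit_type("ipus"))      # ipus
```

Resolving the label once, before any table lookup, is what lets every accessor accept either spelling.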
Table conventions
Units works best when each DataFrame follows a few simple conventions:
id: unique identifier for the unit row (typically integer-like).
start and end: segment boundaries on a shared timeline (commonly seconds). Used by duration() and by many extractors.
Parent-child links (optional): to connect units explicitly, include a <parent>_id column on the child table. For example, syllables that belong to words can carry a word_id column; phonemes that belong to syllables can carry a syllable_id column. parent() and children() use this naming convention.
table() returns the underlying DataFrame object. If you mutate it, you are mutating the table stored on the Units instance.
If a requested table is missing (None), table() raises ValueError; callers can either catch this or check the relevant attribute first.
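The "catch the error" option for missing tables can be wrapped in a small helper. This is a sketch under stated assumptions: table_or_none is a hypothetical function, not part of VoxAtlas, and _StandInUnits exists only so the snippet runs without the library installed. The stand-in mimics nothing beyond the documented behaviour that table() raises ValueError for a missing table.

```python
# Hypothetical helper: `table_or_none` is not part of VoxAtlas.
def table_or_none(units, unit_type):
    """Return units.table(unit_type), or None when table() raises ValueError."""
    try:
        return units.table(unit_type)
    except ValueError:
        return None


class _StandInUnits:
    """Tiny stand-in so the sketch runs without VoxAtlas installed.

    It only mimics the documented ValueError for a missing table and does
    not handle singular/plural aliases.
    """

    def __init__(self, **tables):
        self._tables = tables

    def table(self, unit_type):
        table = self._tables.get(unit_type)
        if table is None:
            raise ValueError(f"No table available for {unit_type!r}")
        return table


units = _StandInUnits(words=[{"id": 1, "start": 0.0, "end": 1.0}], syllables=None)
print(table_or_none(units, "words"))      # the stored word rows
print(table_or_none(units, "syllables"))  # None: the table is missing
```

Because table() raises ValueError for both invalid and unavailable unit types, a wrapper like this cannot tell the two cases apart; checking the attribute first (for example, units.syllables is None) is the more precise test when that distinction matters.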
Examples
>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> syllables = pd.DataFrame(
...     {"id": [10], "word_id": [1], "start": [0.0], "end": [0.5], "label": ["he"]}
... )
>>> units = Units(words=words, syllables=syllables, speaker="A")
>>> units.table("word").shape
(1, 4)
>>> float(units.duration("word").iloc[0])
1.0
- table(unit_type)
Return the table for a requested unit type.
- Parameters:
unit_type (str) – Unit label such as "token" or "syllable".
- Returns:
Table associated with the requested unit type.
- Return type:
pandas.DataFrame
- Raises:
ValueError – Raised when the unit type is invalid or unavailable.
Examples
>>> import pandas as pd
>>> from voxatlas.units import Units
>>> tokens = pd.DataFrame({"id": [1], "start": [0.0], "end": [0.2], "label": ["hi"]})
>>> units = Units(tokens=tokens)
>>> units.table("token").columns.tolist()
['id', 'start', 'end', 'label']
- get(unit_type)
Alias for table().
- Parameters:
unit_type (str) – Requested unit label.
- Returns:
Requested unit table.
- Return type:
pandas.DataFrame
Examples
>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> units = Units(words=words)
>>> units.get("word").shape[0]
1
- duration(unit_type)
Compute durations (end minus start) from the start and end columns.
- Parameters:
unit_type (str) – Requested unit label.
- Returns:
Duration values for each row.
- Return type:
pandas.Series
Examples
>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.25], "end": [1.00], "label": ["hello"]})
>>> units = Units(words=words)
>>> float(units.duration("word").iloc[0])
0.75
- parent(child_type, parent_type)
Return parent identifiers for a child unit table.
- Parameters:
child_type (str) – Child unit label.
parent_type (str) – Parent unit label.
- Returns:
Parent identifier column.
- Return type:
pandas.Series
- Raises:
ValueError – Raised when the mapping column is unavailable.
Examples
>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> syllables = pd.DataFrame(
...     {"id": [10, 11], "word_id": [1, 1], "start": [0.0, 0.5], "end": [0.5, 1.0], "label": ["he", "llo"]}
... )
>>> units = Units(words=words, syllables=syllables)
>>> units.parent("syllable", "word").tolist()
[1, 1]
- children(parent_type, child_type)
Group child units by parent identifier.
- Parameters:
parent_type (str) – Parent unit label.
child_type (str) – Child unit label.
- Returns:
Grouped child table keyed by parent identifier.
- Return type:
DataFrameGroupBy
- Raises:
ValueError – Raised when the mapping column is unavailable.
Examples
>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> phonemes = pd.DataFrame(
...     {"id": [100, 101], "word_id": [1, 1], "start": [0.0, 0.5], "end": [0.5, 1.0], "label": ["h", "i"]}
... )
>>> units = Units(words=words, phonemes=phonemes)
>>> units.children("word", "phoneme").ngroups
1
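The effect of grouping on the <parent>_id column can also be pictured without pandas. The sketch below is an illustration only, not how VoxAtlas implements children(): the function name group_children is hypothetical, rows are plain dicts rather than DataFrame rows, and the third phoneme row is made-up sample data added so that two groups appear.

```python
from collections import defaultdict


def group_children(child_rows, parent_type):
    """Group child rows by their `<parent>_id` value (hypothetical helper)."""
    key = f"{parent_type}_id"  # naming convention from the class notes
    groups = defaultdict(list)
    for row in child_rows:
        if key not in row:
            # Mirrors the documented ValueError for an unavailable mapping column.
            raise ValueError(f"Child rows have no {key!r} column")
        groups[row[key]].append(row)
    return dict(groups)


# Sample rows: the row with id 102 is invented for this illustration.
phonemes = [
    {"id": 100, "word_id": 1, "start": 0.0, "end": 0.5, "label": "h"},
    {"id": 101, "word_id": 1, "start": 0.5, "end": 1.0, "label": "i"},
    {"id": 102, "word_id": 2, "start": 1.0, "end": 1.4, "label": "a"},
]
groups = group_children(phonemes, "word")
labels_by_word = {pid: [row["label"] for row in rows] for pid, rows in groups.items()}
print(labels_by_word)  # {1: ['h', 'i'], 2: ['a']}
```

The number of keys in the result plays the role of ngroups in the pandas-based example above.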
- group(child_type, by)
Alias for children() using by as the parent unit.
- Parameters:
child_type (str) – Child unit label.
by (str) – Parent unit label.
- Returns:
Grouped child table.
- Return type:
DataFrameGroupBy
Examples
>>> import pandas as pd
>>> from voxatlas.units import Units
>>> words = pd.DataFrame({"id": [1], "start": [0.0], "end": [1.0], "label": ["hello"]})
>>> phonemes = pd.DataFrame(
...     {"id": [100, 101], "word_id": [1, 1], "start": [0.0, 0.5], "end": [0.5, 1.0], "label": ["h", "i"]}
... )
>>> units = Units(words=words, phonemes=phonemes)
>>> units.group("phoneme", by="word").ngroups
1