Unit Hierarchy
VoxAtlas models conversational analysis at multiple units of observation (turns, words, phonemes, etc.). Many features are defined “per unit”: for example, a per-word feature produces one value per row in the word table.
VoxAtlas represents these unit tables for a single stream (for example, one
conversation channel) with a Units object.
Typical unit levels include:
conversation-level measurements
speaker- or turn-level measurements
utterance- and sentence-level measurements
token-, word-, syllable-, or phoneme-level measurements
Extractors declare the unit level they consume and produce via
BaseExtractor.input_units and BaseExtractor.output_units. This metadata
is validated when extractors are registered and is surfaced in the CLI and API
docs. At runtime, extractors should still validate that required tables and
columns are present for the current dataset/stream.
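The split between declared metadata and runtime checks can be sketched as follows. This is a minimal illustration and not VoxAtlas code: WordDurationSketch is a hypothetical stand-in that mirrors only the two documented attributes, and the compute method and require_columns helper are assumptions made for the example.

```python
import pandas as pd


def require_columns(table, columns, unit):
    """Raise a clear error if `table` is missing or lacks required columns."""
    if table is None:
        raise ValueError(f"required table for unit {unit!r} is not available")
    missing = [c for c in columns if c not in table.columns]
    if missing:
        raise ValueError(f"{unit!r} table is missing columns: {missing}")
    return table


class WordDurationSketch:
    """Hypothetical extractor; not a subclass of the real BaseExtractor."""

    input_units = "word"   # unit level consumed
    output_units = "word"  # unit level produced (one value per word row)

    def compute(self, words):
        # Declared metadata does not guarantee that this particular stream
        # provides the table and columns, so check again at runtime.
        words = require_columns(words, ["id", "start", "end"], self.input_units)
        return pd.Series(
            (words["end"] - words["start"]).to_numpy(),
            index=words["id"],
            name="duration",
        )


words = pd.DataFrame({"id": [1, 2], "start": [0.0, 0.5], "end": [0.4, 1.25]})
durations = WordDurationSketch().compute(words)
```

Passing a table without start/end columns to compute raises ValueError with the names of the missing columns, which is the kind of clear failure the runtime check is meant to produce.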
Unit labels
Extractor unit labels must be one of the supported strings (or None):
conversation
turn
ipu
sentence
word
token
syllable
phoneme
frame
conversation is a logical level used for global or summary features; it
is not stored as a table on Units.
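A label check along these lines could look like the sketch below. The function and constant names are hypothetical and chosen for illustration; only the label strings themselves come from the list above.

```python
from typing import Optional

# Supported labels, copied from the list above.
SUPPORTED_UNIT_LABELS = (
    "conversation", "turn", "ipu", "sentence", "word",
    "token", "syllable", "phoneme", "frame",
)

# "conversation" is a logical level only, so it has no backing table.
LABELS_WITHOUT_TABLE = frozenset({"conversation"})


def validate_unit_label(label: Optional[str]) -> Optional[str]:
    """Return the label unchanged if it is supported or None, else raise."""
    if label is None or label in SUPPORTED_UNIT_LABELS:
        return label
    raise ValueError(
        f"unsupported unit label {label!r}; "
        f"expected None or one of {SUPPORTED_UNIT_LABELS}"
    )


def has_table(label: Optional[str]) -> bool:
    """True when the label names a level that can be stored as a table."""
    checked = validate_unit_label(label)
    return checked is not None and checked not in LABELS_WITHOUT_TABLE
```

With this sketch, has_table("frame") is true while has_table("conversation") and has_table(None) are false, and an unknown label such as "paragraph" raises ValueError.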
The Units container
Units is a lightweight wrapper around a set of
optional Pandas DataFrames:
frames (frame-level time grid)
tokens (token-level segmentation)
phonemes
syllables
words
sentences
ipus (inter-pausal units)
turns
Tables are optional. Missing tables are represented as None and requesting
them via table() raises ValueError.
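To make the container behavior concrete, here is a small stand-in written for illustration. UnitsSketch is hypothetical and is not the real Units class; it only mimics the documented behavior of optional tables, a table() accessor that accepts singular or plural labels, and a ValueError for missing tables. The internal lookup logic is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

import pandas as pd

# Table names as listed above.
TABLE_NAMES = (
    "frames", "tokens", "phonemes", "syllables",
    "words", "sentences", "ipus", "turns",
)


@dataclass(eq=False)
class UnitsSketch:
    """Hypothetical stand-in for Units: one optional DataFrame per level."""

    frames: Optional[pd.DataFrame] = None
    tokens: Optional[pd.DataFrame] = None
    phonemes: Optional[pd.DataFrame] = None
    syllables: Optional[pd.DataFrame] = None
    words: Optional[pd.DataFrame] = None
    sentences: Optional[pd.DataFrame] = None
    ipus: Optional[pd.DataFrame] = None
    turns: Optional[pd.DataFrame] = None

    def table(self, label: str) -> pd.DataFrame:
        # Accept both "word" and "words".
        name = label if label in TABLE_NAMES else f"{label}s"
        if name not in TABLE_NAMES:
            raise ValueError(f"unknown unit table {label!r}")
        frame = getattr(self, name)
        if frame is None:
            raise ValueError(f"table {name!r} is not available for this stream")
        return frame


units = UnitsSketch(
    words=pd.DataFrame({"id": [1, 2], "start": [0.0, 0.5], "end": [0.4, 1.0]})
)
words = units.table("word")  # same object as units.table("words")
```

In this sketch, units.table("turn") raises ValueError because no turns table was provided, and units.table("conversation") raises because conversation is never stored as a table.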
Table conventions
Units does not enforce a rigid schema, but most features assume a few
common conventions:
id: unique identifier for the unit row
start/end: segment boundaries on a shared timeline (commonly seconds)
optional parent-child links using a <parent>_id column on the child table (for example, syllables can include word_id)
The helper methods parent() and
children() implement this naming convention.
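The naming convention can be illustrated with plain pandas. The two functions below are hypothetical equivalents written for this example; they are not the implementation of parent() or children(), and the return types of the real helpers may differ.

```python
import pandas as pd

words = pd.DataFrame({"id": [10, 11], "start": [0.0, 0.6], "end": [0.5, 1.0]})
syllables = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "word_id": [10, 10, 11],  # <parent>_id link to the words table
        "start": [0.0, 0.25, 0.6],
        "end": [0.25, 0.5, 1.0],
    }
)


def parent_ids(child: pd.DataFrame, parent_label: str) -> pd.Series:
    """Parent id for each child row, read from the <parent>_id column."""
    column = f"{parent_label}_id"
    if column not in child.columns:
        raise ValueError(f"child table has no {column!r} column")
    return child[column]


def children_by_parent(child: pd.DataFrame, parent_label: str) -> dict:
    """Map each parent id to the child rows that reference it."""
    groups = child.groupby(parent_ids(child, parent_label))
    return {parent_id: rows for parent_id, rows in groups}


word_ids_for_syllables = parent_ids(syllables, "word")
syllables_by_word = children_by_parent(syllables, "word")
```

Here syllables_by_word[10] holds the two syllable rows of word 10. Asking for a link that the child table does not carry, such as parent_ids(words, "turn"), raises ValueError.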
Illustrated hierarchy (typical)
Many datasets provide a hierarchy like the following. The arrows are labeled
with the child column that links to the parent (for example, an ipu row
can carry a turn_id to identify its parent turn).
graph TD
turn((turns)) -->|turn_id| ipu([ipus])
ipu -->|ipu_id| sentence[sentences]
sentence -->|sentence_id| word[[words]]
word -->|word_id| syllable([syllables])
syllable -->|syllable_id| phoneme[/phonemes/]
token[/tokens/]:::optional
frames[("frames (time grid)")]:::time
word -. optional .-> token
frames -. time-aligned via start/end .-> word
classDef top fill:#E3F2FD,stroke:#1565C0,stroke-width:2px,color:#0D47A1;
classDef mid fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px,color:#1B5E20;
classDef low fill:#FFF3E0,stroke:#EF6C00,stroke-width:2px,color:#E65100;
classDef time fill:#F3E5F5,stroke:#6A1B9A,stroke-width:2px,color:#4A148C;
classDef optional fill:#F5F5F5,stroke:#616161,stroke-dasharray:4 3,color:#424242;
class turn top;
class ipu,sentence mid;
class word,syllable,phoneme low;
The exact set of tables and links depends on the dataset. Features should be written defensively (for example, fall back to time-alignment when explicit links are unavailable, or raise a clear error when a required mapping is missing).
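One way to write such a defensive lookup is sketched below, assuming the id/start/end conventions described earlier. The function name and the midpoint-containment rule are assumptions chosen for illustration; other alignment rules (for example, maximum overlap) may suit a given dataset better.

```python
import pandas as pd


def link_to_parent(child, parent, parent_label):
    """Return one parent id per child row, preferring an explicit link."""
    column = f"{parent_label}_id"
    if column in child.columns:
        return child[column]

    # No explicit link: both tables need time boundaries to align on.
    if not {"start", "end"} <= set(child.columns):
        raise ValueError("child table needs start/end for time alignment")
    if not {"id", "start", "end"} <= set(parent.columns):
        raise ValueError(f"{parent_label!r} table needs id/start/end")

    # Fallback: pick the parent whose span contains the child's midpoint.
    # This loop is O(children x parents), which is fine for a sketch.
    midpoints = (child["start"] + child["end"]) / 2
    ids = []
    for row_index, mid in midpoints.items():
        hits = parent[(parent["start"] <= mid) & (mid < parent["end"])]
        if hits.empty:
            raise ValueError(
                f"no {parent_label!r} row covers child row {row_index!r} "
                f"(midpoint {mid})"
            )
        ids.append(hits["id"].iloc[0])
    return pd.Series(ids, index=child.index, name=column)


words = pd.DataFrame({"id": [10, 11], "start": [0.0, 0.6], "end": [0.5, 1.0]})
syllables = pd.DataFrame(
    {"id": [1, 2, 3], "start": [0.0, 0.25, 0.6], "end": [0.25, 0.5, 1.0]}
)
linked = link_to_parent(syllables, words, "word")
```

When the child table already has a word_id column, the function returns it untouched. A syllable that falls outside every word span raises ValueError instead of being silently dropped.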
Common access patterns
# Get a table (singular/plural labels both work)
words = units.table("word")
# Durations from start/end
word_durations = units.duration("word")
# Parent ids (requires a <parent>_id column on the child table)
word_ids_for_syllables = units.parent("syllable", "word")
# Group children by parent id
syllables_by_word = units.children("word", "syllable")