Unit Hierarchy#

VoxAtlas models conversational analysis at multiple units of observation (turns, words, phonemes, etc.). Many features are defined “per unit”: for example, a per-word feature produces one value per row in the word table.

VoxAtlas represents the unit tables for a single stream (for example, one conversation channel) with the Units container.

Typical measurement levels include:

conversation-level measurements
speaker or turn-level measurements
utterance and sentence-level measurements
token, word, syllable, or phoneme-level measurements

Extractors declare the unit level they consume and produce via BaseExtractor.input_units and BaseExtractor.output_units. This metadata is validated when extractors are registered and is surfaced in the CLI and API docs. At runtime, extractors should still validate that required tables and columns are present for the current dataset/stream.
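As a rough sketch of the runtime check described above, an extractor could guard its inputs with a small helper before computing anything. The name `require_columns` and its signature are hypothetical (they are not part of VoxAtlas), and the example uses plain pandas only:

```python
import pandas as pd


def require_columns(table, label, columns):
    """Raise a clear error if a unit table or a required column is missing.

    Hypothetical helper for illustration; not part of the VoxAtlas API.
    """
    if table is None:
        raise ValueError(f"required unit table '{label}' is not available")
    missing = [c for c in columns if c not in table.columns]
    if missing:
        raise ValueError(f"unit table '{label}' is missing columns: {missing}")
    return table


# A toy word table following the id/start/end convention.
words = pd.DataFrame({"id": [0, 1], "start": [0.0, 0.4], "end": [0.4, 0.9]})

# Passes silently because all required columns are present.
require_columns(words, "word", ["id", "start", "end"])
```

Failing early with the table label and the missing column names tends to be easier to debug than a KeyError raised deep inside a feature computation.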

Unit labels#

Each extractor unit label must be one of the following supported strings (or None):

conversation
turn
ipu
sentence
word
token
syllable
phoneme
frame

conversation is a logical level used for global or summary features; it is not stored as a table on Units.
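The label rule above amounts to a membership test. The following is a minimal sketch of such a check; `SUPPORTED_UNIT_LABELS` and `check_unit_label` are illustrative names, and the validation VoxAtlas performs at registration time may be implemented differently:

```python
# The supported labels, copied from the list above.
SUPPORTED_UNIT_LABELS = frozenset(
    {
        "conversation",
        "turn",
        "ipu",
        "sentence",
        "word",
        "token",
        "syllable",
        "phoneme",
        "frame",
    }
)


def check_unit_label(label):
    """Return the label unchanged if it is None or a supported string.

    Hypothetical helper for illustration; not part of the VoxAtlas API.
    """
    if label is None or label in SUPPORTED_UNIT_LABELS:
        return label
    raise ValueError(
        f"unsupported unit label {label!r}; "
        f"expected None or one of {sorted(SUPPORTED_UNIT_LABELS)}"
    )


check_unit_label("word")  # accepted
check_unit_label(None)    # accepted
```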

The `Units` container#

Units is a lightweight wrapper around a set of optional Pandas DataFrames:

frames (frame-level time grid)
tokens (token-level segmentation)
phonemes
syllables
words
sentences
ipus (inter-pausal units)
turns

Tables are optional. A missing table is represented as None, and requesting it via table() raises ValueError.
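To make the "optional table" behaviour concrete, here is a toy stand-in that mimics it for two tables. `ToyUnits` is not the VoxAtlas class and does not reproduce its internals; it only illustrates the idea that a missing table is stored as None and that asking for it raises ValueError:

```python
from dataclasses import dataclass
from typing import Optional

import pandas as pd


@dataclass
class ToyUnits:
    """Toy stand-in for illustration only; not the VoxAtlas Units class."""

    words: Optional[pd.DataFrame] = None
    syllables: Optional[pd.DataFrame] = None

    def table(self, label: str) -> pd.DataFrame:
        # Map singular and plural labels to the attribute name.
        names = {
            "word": "words",
            "words": "words",
            "syllable": "syllables",
            "syllables": "syllables",
        }
        if label not in names:
            raise ValueError(f"unknown unit label: {label!r}")
        frame = getattr(self, names[label])
        if frame is None:
            raise ValueError(f"unit table '{names[label]}' is not available")
        return frame


toy = ToyUnits(words=pd.DataFrame({"id": [0], "start": [0.0], "end": [0.4]}))
word_table = toy.table("word")  # returns the DataFrame

try:
    toy.table("syllable")  # the syllable table was never provided
except ValueError as err:
    message = str(err)
```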

Table conventions#

Units does not enforce a rigid schema, but most features assume a few common conventions:

id: unique identifier for the unit row
start / end: segment boundaries on a shared timeline (commonly seconds)
optional parent-child links using a <parent>_id column on the child table (for example, syllables can include word_id)

The helper methods parent() and children() implement this naming convention.
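The conventions above can be illustrated with plain pandas, independent of the Units helpers. The tables and values below are made up for the example; only the column names (id, start, end, word_id) follow the convention described here:

```python
import pandas as pd

# Parent table: two words on a shared timeline (seconds).
words = pd.DataFrame({"id": [10, 11], "start": [0.0, 0.5], "end": [0.5, 1.2]})

# Child table: three syllables, each linked to its parent word via word_id.
syllables = pd.DataFrame(
    {
        "id": [0, 1, 2],
        "start": [0.0, 0.5, 0.8],
        "end": [0.5, 0.8, 1.2],
        "word_id": [10, 11, 11],
    }
)

# Duration of each word, derived from the start/end columns.
word_durations = words["end"] - words["start"]

# Parent id for every syllable row.
parent_ids = syllables["word_id"]

# Child rows grouped by parent id.
syllables_by_word = {
    word_id: group for word_id, group in syllables.groupby("word_id")
}

# Number of syllables per word, aligned to the word table (0 if none).
syllable_counts = (
    syllables.groupby("word_id").size().reindex(words["id"], fill_value=0)
)
```

A per-word feature such as `syllable_counts` yields exactly one value per row of the word table, which is what "per unit" means throughout this page.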

Illustrated hierarchy (typical)#

Many datasets provide a hierarchy like the following. The arrows are labeled with the child column that links to the parent (for example, an ipu row can carry a turn_id to identify its parent turn).

graph TD
  turn((turns)) -->|turn_id| ipu([ipus])
  ipu -->|ipu_id| sentence[sentences]
  sentence -->|sentence_id| word[[words]]
  word -->|word_id| syllable([syllables])
  syllable -->|syllable_id| phoneme[/phonemes/]

  token[/tokens/]:::optional
  frames[("frames (time grid)")]:::time

  word -. optional .-> token
  frames -. time-aligned via start/end .-> word

  classDef top fill:#E3F2FD,stroke:#1565C0,stroke-width:2px,color:#0D47A1;
  classDef mid fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px,color:#1B5E20;
  classDef low fill:#FFF3E0,stroke:#EF6C00,stroke-width:2px,color:#E65100;
  classDef time fill:#F3E5F5,stroke:#6A1B9A,stroke-width:2px,color:#4A148C;
  classDef optional fill:#F5F5F5,stroke:#616161,stroke-dasharray:4 3,color:#424242;

  class turn top;
  class ipu,sentence mid;
  class word,syllable,phoneme low;

The exact set of tables and links depends on the dataset. Features should be written defensively (for example, fall back to time-alignment when explicit links are unavailable, or raise a clear error when a required mapping is missing).
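One way to write such a defensive lookup is sketched below with plain pandas. The helper `link_children_to_parents` is hypothetical, and the fallback rule (assign each child to the parent whose span contains the child's midpoint) is one possible choice rather than the strategy VoxAtlas itself uses. It assumes parent segments do not overlap:

```python
import pandas as pd


def link_children_to_parents(children, parents, parent_label):
    """Return one parent id per child row.

    Hypothetical helper for illustration; not part of the VoxAtlas API.
    Uses the explicit <parent>_id column when present, otherwise falls
    back to time alignment on start/end.
    """
    link_column = f"{parent_label}_id"
    if link_column in children.columns:
        return children[link_column]

    # Fallback: find the parent span [start, end) containing each child's
    # midpoint. Assumes parent segments do not overlap.
    midpoints = ((children["start"] + children["end"]) / 2).to_numpy()
    spans = pd.IntervalIndex.from_arrays(
        parents["start"].to_numpy(), parents["end"].to_numpy(), closed="left"
    )
    positions = spans.get_indexer(midpoints)  # -1 where no span matches
    if (positions < 0).any():
        raise ValueError(
            f"cannot map some rows to a {parent_label}: no '{link_column}' "
            "column and no enclosing parent segment"
        )
    return pd.Series(
        parents["id"].to_numpy()[positions],
        index=children.index,
        name=link_column,
    )


words = pd.DataFrame({"id": [10, 11], "start": [0.0, 0.5], "end": [0.5, 1.2]})
# Syllables without an explicit word_id column.
syllables = pd.DataFrame(
    {"id": [0, 1, 2], "start": [0.0, 0.5, 0.8], "end": [0.5, 0.8, 1.2]}
)

word_ids = link_children_to_parents(syllables, words, "word")
```

Matching on the midpoint tolerates small boundary mismatches between child and parent segments; a stricter rule could require full containment instead.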

Common access patterns#

# `units` is a Units instance for a single stream
# Get a table (singular/plural labels both work)
words = units.table("word")

# Durations from start/end
word_durations = units.duration("word")

# Parent ids (requires a <parent>_id column on the child table)
word_ids_for_syllables = units.parent("syllable", "word")

# Group children by parent id
syllables_by_word = units.children("word", "syllable")