Dhātu Experiments v0.1 and Typological Sample (child-directed-first)

This page consolidates artifacts from experiments/dhatu/ (v0.1) into a unified view for research and publications. It documents data sources, the typological sample, targeted phenomena, and reproducible metrics.

Goal and scope

  • Establish a minimal set of cross-lingual primitives (Dhātu) exercised on a toy corpus and multilingual child-directed prompts.
  • Build a balanced typological sample (child-directed-first) for cross-language comparisons.
  • Provide simple metrics and source links for traceability and reproducibility.

Data sources (external refs)

Typological sample v0.1

Priority: child‑directed, morpho‑syntactic diversity. Selected languages and profiles:

Child‑directed prompts: available languages

Codes in experiments/dhatu/prompts_child/:

arb, cmn, deu, en, eus, ewe, fr, hau, heb, hin, hun, iku, jpn, kor, nld, spa, swa, tur, yor, zul

Covered phenomena (aggregate)

Counts across child languages (top categories):

  • spatial:in — 38
  • AAO — 20
  • possession — 20
  • quantification — 20
  • negation — 20
  • time:now — 20
  • event:sequence — 20
  • comparison:more — 20
  • existence — 20
  • spatial:on — 19
  • evidential:reported — 19
  • modality:possibility — 15
  • aspect:progressive — 12
  • modality:permission — 5
  • SVC — 3
  • plural — 2
  • aspect? — 2
  • SVC-like — 2
  • spatial:dans — 2
  • aspect:progressive? — 1
  • spatial:sur — 1
  • incorporation? — 1
  • evidential:inferential — 1
  • habitual? — 1
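
The aggregation behind these counts can be sketched as a plain tally over phenomenon labels. This is a minimal sketch, not the validator's actual implementation: the prompt schema (per language, a list of dicts with a `phenomena` key) and the names `count_phenomena` and `uncertain_tags` are assumptions made for illustration, as is the reading of a trailing "?" as a tentative label.

```python
from collections import Counter
from typing import Dict, List


def count_phenomena(prompts_by_lang: Dict[str, List[dict]]) -> Counter:
    """Tally phenomenon labels over all prompts in all languages.

    Assumed (hypothetical) schema: {lang_code: [{"phenomena": [label, ...]}, ...]}.
    """
    counts: Counter = Counter()
    for prompts in prompts_by_lang.values():
        for prompt in prompts:
            # A prompt without a "phenomena" key contributes nothing.
            counts.update(prompt.get("phenomena", []))
    return counts


def uncertain_tags(counts: Counter) -> List[str]:
    """Return labels ending in '?', assumed here to mark tentative annotations."""
    return sorted(tag for tag in counts if tag.endswith("?"))


# Illustrative data only; these are not the project's prompts.
sample = {
    "en": [{"phenomena": ["spatial:in", "negation"]},
           {"phenomena": ["spatial:in"]}],
    "fr": [{"phenomena": ["spatial:dans", "aspect?"]}],
}
counts = count_phenomena(sample)
print(counts.most_common(1))   # [('spatial:in', 2)]
print(uncertain_tags(counts))  # ['aspect?']
```

Because the tally works on raw label strings, `spatial:dans` stays distinct from `spatial:in`, which matches how the list above reports them as separate entries.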

Experimental metrics (v0.1)

  • Toy corpus (toy_corpus.json + gold_encodings.json):
      • sentences: 12
      • covered: 12 (rate = 1.0)
      • avg primitives per encoding: 3.667
  • Child prompts (child gold encodings): gold_encodings_child.json is currently empty, so detailed metrics are pending (annotation in progress).
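
The two toy-corpus figures are simple ratios: coverage is the number of sentences with a gold encoding divided by the number of sentences, and the average is the total number of primitives divided by the number of encodings (3.667 over 12 encodings is consistent with 44 primitives in total). The sketch below shows that arithmetic under stated assumptions: the function name `toy_metrics`, the input shapes, and the primitive names are hypothetical and do not reflect the actual layout of toy_corpus.json or gold_encodings.json.

```python
from typing import Dict, List


def toy_metrics(sentence_ids: List[str],
                gold: Dict[str, List[str]]) -> Dict[str, float]:
    """Compute coverage rate and mean primitives per encoding.

    Assumed (hypothetical) inputs: a list of sentence ids, and a mapping
    from sentence id to the list of primitives in its gold encoding.
    """
    # A sentence counts as covered when it has a non-empty gold encoding.
    covered = [sid for sid in sentence_ids if gold.get(sid)]
    total_primitives = sum(len(gold[sid]) for sid in covered)
    return {
        "sentences": len(sentence_ids),
        "covered": len(covered),
        "rate": len(covered) / len(sentence_ids) if sentence_ids else 0.0,
        "avg_primitives": (round(total_primitives / len(covered), 3)
                           if covered else 0.0),
    }


# Illustrative data only: three sentences with 4, 3 and 4 primitives.
ids = ["s1", "s2", "s3"]
gold = {
    "s1": ["P1", "P2", "P3", "P4"],  # placeholder primitive names
    "s2": ["P1", "P2", "P3"],
    "s3": ["P1", "P2", "P3", "P4"],
}
print(toy_metrics(ids, gold))
# {'sentences': 3, 'covered': 3, 'rate': 1.0, 'avg_primitives': 3.667}
```

With an empty gold mapping, as is currently the case for the child encodings, this sketch returns a rate and average of 0.0 rather than raising a division error.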

Repro (local)

Run from repo root:

  • Typological sample and sources:
      • python3 experiments/dhatu/validator.py --list-sample
  • Available child languages:
      • python3 experiments/dhatu/validator.py --list-child-langs
  • Phenomena counts across child languages:
      • python3 experiments/dhatu/validator.py --phenomena
  • Toy corpus metrics:
      • python3 experiments/dhatu/validator.py --metrics

Related documents:

  • Dhātu Inventory v0.1: research/dhatu-inventory-v0-1.md
  • Research references: research/references.md

Last updated: generated from experiments/dhatu/ v0.1 sources.