Dhātu Experiments v0.1 and Typological Sample (child-directed-first)
This page consolidates artifacts from experiments/dhatu/ (v0.1) into a unified view for research and publications. It documents data sources, the typological sample, targeted phenomena, and reproducible metrics.
Goal and scope
- Establish a minimal set of cross-lingual primitives (Dhātu) exercised on a toy corpus and multilingual child-directed prompts.
- Build a balanced typological sample (child-directed-first) for cross-language comparisons.
- Provide simple metrics and source links for traceability and reproducibility.
Data sources (external refs)
- CHILDES (TalkBank): https://childes.talkbank.org/
- WALS (World Atlas of Language Structures): https://wals.info/
- Universal Dependencies (UD): https://universaldependencies.org/
- African Storybook: https://www.africanstorybook.org/
- Global Storybooks: https://globalstorybooks.net/
- Storybooks Canada: https://storybookscanada.ca/
Typological sample v0.1
Selection priorities: availability of child-directed data and morpho-syntactic diversity. Selected languages and their profiles:
- English (eng) — Indo‑European > Germanic — SVO
- CHILDES: Brown/Providence → https://childes.talkbank.org/access/Eng-NA/
- GlobalStorybooks: English → https://globalstorybooks.net/collections/english/
- French (fra) — Indo‑European > Romance — SVO
- CHILDES: French → https://childes.talkbank.org/access/French/
- GlobalStorybooks: Français → https://globalstorybooks.net/collections/french/
- Spanish (spa) — Indo‑European > Romance — SVO
- CHILDES: Spanish → https://childes.talkbank.org/access/Spanish/
- GlobalStorybooks: Español → https://globalstorybooks.net/collections/spanish/
- German (deu) — Indo‑European > Germanic — SVO, V2
- CHILDES: German → https://childes.talkbank.org/access/German/
- GlobalStorybooks: Deutsch → https://globalstorybooks.net/collections/german/
- Dutch (nld) — Indo‑European > Germanic — SVO, V2
- CHILDES: Dutch → https://childes.talkbank.org/access/Dutch/
- Italian (ita) — Indo‑European > Romance — SVO
- CHILDES: Italian → https://childes.talkbank.org/access/Italian/
- GlobalStorybooks: Italiano → https://globalstorybooks.net/collections/italian/
- Portuguese (por) — Indo‑European > Romance — SVO
- CHILDES: Portuguese → https://childes.talkbank.org/access/Portuguese/
- GlobalStorybooks: Português → https://globalstorybooks.net/collections/portuguese/
- Mandarin Chinese (cmn) — Sino‑Tibetan — SVO, classifiers
- CHILDES: Mandarin → https://childes.talkbank.org/access/Mandarin/
- GlobalStorybooks: 中文 (简体) → https://globalstorybooks.net/collections/chinese-simplified/
- Japanese (jpn) — Japonic — SOV, honorifics
- CHILDES: Japanese → https://childes.talkbank.org/access/Japanese/
- GlobalStorybooks: 日本語 → https://globalstorybooks.net/collections/japanese/
- Korean (kor) — Koreanic — SOV, honorifics
- CHILDES: Korean → https://childes.talkbank.org/access/Korean/
- GlobalStorybooks: 한국어 → https://globalstorybooks.net/collections/korean/
- Turkish (tur) — Turkic — SOV, agglutinative
- CHILDES: Turkish → https://childes.talkbank.org/access/Turkish/
- GlobalStorybooks: Türkçe → https://globalstorybooks.net/collections/turkish/
- Hungarian (hun) — Uralic — agglutinative
- CHILDES: Hungarian → https://childes.talkbank.org/access/Hungarian/
- GlobalStorybooks: Magyar → https://globalstorybooks.net/collections/hungarian/
- Hebrew (heb) — Afro‑Asiatic > Semitic — root‑and‑pattern
- CHILDES: Hebrew → https://childes.talkbank.org/access/Hebrew/
- GlobalStorybooks: עברית → https://globalstorybooks.net/collections/hebrew/
- Arabic (arb) — Afro‑Asiatic > Semitic — VSO/SVO, root‑and‑pattern
- GlobalStorybooks: العربية → https://globalstorybooks.net/collections/arabic/
- Swahili (swa) — Niger‑Congo > Bantu — SVO, noun classes
- AfricanStorybook: Kiswahili → https://www.africanstorybook.org/language/kiswahili
- Yoruba (yor) — Niger‑Congo — SVO, SVC
- AfricanStorybook: Yorùbá → https://www.africanstorybook.org/language/yoruba
- Hausa (hau) — Afro‑Asiatic > Chadic — SVO
- AfricanStorybook: Hausa → https://www.africanstorybook.org/language/hausa
- Zulu (zul) — Niger‑Congo > Bantu — SVO, noun classes
- AfricanStorybook: isiZulu → https://www.africanstorybook.org/language/isizulu
- Inuktitut (iku) — Eskimo–Aleut — polysynthetic, incorporation
- StorybooksCanada: ᐃᓄᒃᑎᑐᑦ → https://storybookscanada.ca/?lang=iu
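One way to check how balanced the sample is would be to tally the top-level families listed above. The sketch below hard-codes each ISO 639-3 code and family exactly as given in this section; the `SAMPLE` name and the dictionary layout are illustrative assumptions, not the format used by validator.py.

```python
from collections import Counter

# ISO 639-3 code -> top-level family, copied from the list above.
# The variable name and layout are illustrative, not the validator's own format.
SAMPLE = {
    "eng": "Indo-European", "fra": "Indo-European", "spa": "Indo-European",
    "deu": "Indo-European", "nld": "Indo-European", "ita": "Indo-European",
    "por": "Indo-European", "cmn": "Sino-Tibetan", "jpn": "Japonic",
    "kor": "Koreanic", "tur": "Turkic", "hun": "Uralic",
    "heb": "Afro-Asiatic", "arb": "Afro-Asiatic", "hau": "Afro-Asiatic",
    "swa": "Niger-Congo", "yor": "Niger-Congo", "zul": "Niger-Congo",
    "iku": "Eskimo-Aleut",
}

# Count how many sample languages fall into each family.
family_counts = Counter(SAMPLE.values())
for family, n in family_counts.most_common():
    print(f"{family}: {n}")
```

On the v0.1 list this yields 19 languages in 9 families, 7 of them Indo-European.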
Child-directed prompts: available languages
Language codes present in experiments/dhatu/prompts_child/ (20 in total; en and fr are two-letter codes, the rest are three-letter):
arb, cmn, deu, en, eus, ewe, fr, hau, heb, hin, hun, iku, jpn, kor, nld, spa, swa, tur, yor, zul
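The prompt codes and the typological sample do not line up one to one. A minimal sketch of the comparison follows, assuming that the two-letter directory names `en` and `fr` stand for English and French; the variable names are illustrative only.

```python
# Codes found in prompts_child/, as listed above.
PROMPT_CODES = ("arb cmn deu en eus ewe fr hau heb hin hun iku jpn kor "
                "nld spa swa tur yor zul").split()

# Codes of the typological sample v0.1, as listed in the sample section.
SAMPLE_CODES = ("eng fra spa deu nld ita por cmn jpn kor tur hun heb arb "
                "swa yor hau zul iku").split()

# Assumption: the two-letter directory names refer to English and French.
TWO_TO_THREE = {"en": "eng", "fr": "fra"}

# Normalize prompt codes to three letters before comparing the two sets.
prompt_set = {TWO_TO_THREE.get(code, code) for code in PROMPT_CODES}
sample_set = set(SAMPLE_CODES)

print("sample languages without prompts:", sorted(sample_set - prompt_set))
print("prompt languages outside the sample:", sorted(prompt_set - sample_set))
```

Under that assumption, Italian and Portuguese have no child prompts yet, while eus, ewe and hin have prompts but no entry in the sample.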
Covered phenomena (aggregate)
Tag counts aggregated across the child-language prompts, in descending order (top categories):
- spatial:in — 38
- AAO — 20
- possession — 20
- quantification — 20
- negation — 20
- time:now — 20
- event:sequence — 20
- comparison:more — 20
- existence — 20
- spatial:on — 19
- evidential:reported — 19
- modality:possibility — 15
- aspect:progressive — 12
- modality:permission — 5
- SVC — 3
- plural — 2
- aspect? — 2
- SVC-like — 2
- spatial:dans — 2
- aspect:progressive? — 1
- spatial:sur — 1
- incorporation? — 1
- evidential:inferential — 1
- habitual? — 1
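Counts like these can be produced by tallying tag strings over every prompt. The sketch below is not the validator's code: the record layout (a `"phenomena"` list per prompt) and the function name `count_phenomena` are hypothetical, and the two-prompt input is made up purely to show the call.

```python
from collections import Counter

def count_phenomena(prompts_by_lang):
    """Count phenomenon tags over all prompts of all languages.

    `prompts_by_lang` maps a language code to a list of prompt records;
    each record is assumed to carry a "phenomena" list of tag strings.
    This record layout is hypothetical.
    """
    counts = Counter()
    for prompts in prompts_by_lang.values():
        for prompt in prompts:
            counts.update(prompt.get("phenomena", []))
    return counts

# Tiny made-up input, only to show the call; not taken from prompts_child/.
example = {
    "en": [{"text": "The ball is in the box.",
            "phenomena": ["spatial:in", "existence"]}],
    "fr": [{"text": "La balle est dans la boîte.",
            "phenomena": ["spatial:dans"]}],
}
for tag, n in count_phenomena(example).most_common():
    print(f"{tag}: {n}")
```

Because tags are counted verbatim in this sketch, a language-specific label such as spatial:dans is tallied separately from spatial:in, and a tag with a trailing question mark is tallied separately from its unmarked form.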
Experimental metrics (v0.1)
- Toy corpus (toy_corpus.json + gold_encodings.json):
  - sentences: 12
  - covered: 12 (rate = 1.0)
  - avg primitives per encoding: 3.667
- Child prompts (child gold encodings): gold_encodings_child.json currently empty → detailed metrics pending (annotation in progress).
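The toy-corpus figures are consistent with a coverage rate of covered / sentences (12 / 12 = 1.0) and an average of total primitives / encodings (3.667 over 12 encodings would correspond to 44 primitives). A minimal sketch of that arithmetic follows; the function name, the input shapes and the placeholder primitive names are hypothetical and do not reflect the actual JSON schema.

```python
def coverage_metrics(sentence_ids, encodings):
    """Return (sentences, covered, rate, avg_primitives).

    `sentence_ids` is a list of sentence identifiers; `encodings` maps an
    identifier to its list of primitives. Both shapes are hypothetical.
    """
    # A sentence counts as covered when it has a non-empty encoding.
    covered = [sid for sid in sentence_ids if encodings.get(sid)]
    rate = len(covered) / len(sentence_ids) if sentence_ids else 0.0
    total = sum(len(encodings[sid]) for sid in covered)
    avg = total / len(covered) if covered else 0.0
    return len(sentence_ids), len(covered), rate, round(avg, 3)

# Made-up three-sentence input with placeholder primitive names.
ids = ["s1", "s2", "s3"]
gold = {
    "s1": ["P1", "P2", "P3"],
    "s2": ["P1", "P4", "P5", "P6"],
    "s3": ["P2", "P3", "P5", "P6"],
}
print(coverage_metrics(ids, gold))  # (3, 3, 1.0, 3.667)
print(coverage_metrics(ids, {}))    # (3, 0, 0.0, 0.0)
```

The second call shows what an empty gold file would yield in this sketch: no covered sentences and no meaningful average.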
Repro (local)
Run from repo root:
- Typological sample and sources:
  - python3 experiments/dhatu/validator.py --list-sample
- Available child languages:
  - python3 experiments/dhatu/validator.py --list-child-langs
- Phenomena counts across child languages:
  - python3 experiments/dhatu/validator.py --phenomena
- Toy corpus metrics:
  - python3 experiments/dhatu/validator.py --metrics
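To run all four steps in one go, a small driver can wrap the commands listed above. This is a convenience sketch, not part of the repository: `build_commands` and `run_all` are hypothetical names, and only the script path and flags shown in this section are used.

```python
import subprocess
import sys

VALIDATOR = "experiments/dhatu/validator.py"
FLAGS = ["--list-sample", "--list-child-langs", "--phenomena", "--metrics"]

def build_commands(python=sys.executable):
    """Return one argv list per validator flag listed above."""
    return [[python, VALIDATOR, flag] for flag in FLAGS]

def run_all():
    """Run each command in turn and stop at the first failure."""
    for argv in build_commands():
        print("$", " ".join(argv))
        subprocess.run(argv, check=True)

# Call run_all() from the repository root to execute the four steps.
```

Using `check=True` makes a failing step raise `subprocess.CalledProcessError`, so later steps are not reported as if the earlier ones had succeeded.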
Internal links
- Dhātu Inventory v0.1: research/dhatu-inventory-v0-1.md
- Research references: research/references.md
Last updated: generated from experiments/dhatu/ v0.1 sources.