Semantic universals (Dhātu)¶

Draft synthesis of assumptions, validation paths, and references.

Synthesis (draft)¶

Problem: identify a compact set of semantic primitives (Dhātu) usable across storage/communication/processing.
Context: CI stabilized; documentation refocused on the WHAT (research), not the HOW (infra).

Assumptions¶

A small, stable set of semantic primitives can encode a broad range of concepts.
Human language acquisition stages inform the emergence order of these universals.

Minimal validation protocol (v0)¶

Coverage: map a 100-item frequent concept set (nouns, verbs, relations) to a minimal Dhātu inventory; measure % covered without adding primitives.
Ambiguity: for each encoding, count plausible decodings; v0 target ≤ 1.5 interpretations on average (with short context).
Reversibility: decode Dhātu representations into EN/FR paraphrases and have human raters or a robust LLM judge semantic equivalence (agreement ≥ 0.8).
Parsimony: penalize primitive count per concept (median ≤ 4 primitives/concept at v0).
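
The four v0 criteria above can be sketched as simple functions over a toy data set. This is a minimal illustration, not the contents of experiments/dhatu/validator.py: the primitive names, the sample data, and the function names are all hypothetical, and "agreement" is read here as the fraction of paraphrases judged equivalent, which is an assumption. The source states no numeric target for coverage, so the sketch reports it without a threshold.

```python
from statistics import mean, median

# Hypothetical toy data: concept -> primitives used to encode it,
# or None when the current inventory cannot encode the concept.
ENCODINGS = {
    "cat": ["ENTITY", "ANIMATE"],
    "hunt": ["ACTION", "PURSUE", "GOAL"],
    "on": ["REL", "CONTACT", "ABOVE"],
    "table": ["ENTITY", "ARTIFACT"],
    "justice": None,
}
# Hypothetical number of plausible decodings per encoded concept.
DECODINGS = {"cat": 1, "hunt": 2, "on": 1, "table": 1}
# Hypothetical equivalence judgments for decoded paraphrases.
JUDGMENTS = [True, True, True, False, True]


def coverage(encodings):
    """Share of concepts encodable without adding primitives."""
    covered = sum(1 for prims in encodings.values() if prims is not None)
    return covered / len(encodings)


def mean_ambiguity(decodings):
    """Average number of plausible decodings per encoding."""
    return mean(decodings.values())


def agreement(judgments):
    """Fraction of paraphrases judged semantically equivalent."""
    return sum(judgments) / len(judgments)


def median_primitives(encodings):
    """Median primitive count over the concepts that are covered."""
    return median(len(p) for p in encodings.values() if p is not None)


def report(encodings, decodings, judgments):
    """Collect the four metrics and check the v0 targets from the protocol."""
    metrics = {
        "coverage": coverage(encodings),
        "ambiguity": mean_ambiguity(decodings),
        "agreement": agreement(judgments),
        "parsimony": median_primitives(encodings),
    }
    passes = {
        "ambiguity": metrics["ambiguity"] <= 1.5,
        "agreement": metrics["agreement"] >= 0.8,
        "parsimony": metrics["parsimony"] <= 4,
    }
    return metrics, passes


if __name__ == "__main__":
    metrics, passes = report(ENCODINGS, DECODINGS, JUDGMENTS)
    print(metrics)
    print(passes)
```

Keeping each criterion as its own function makes it easy to swap in a different definition (for example, an inter-rater statistic instead of a plain fraction for agreement) without touching the others.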

Micro-cases (sanity checks)¶

1) Agent-Action-Object (AAO)
   - Input: "The cat hunts the mouse."
   - Expected Dhātu: [AGENT:cat] [ACTION:hunt] [PATIENT:mouse] [ASPECT:habitual?]
   - Tests: tense, negation, modality.

2) Possession and location
   - Input: "The book is on Mary's table."
   - Expected Dhātu: [OBJ:book] [REL:on] [REF:table] [REL:of] [REF:Mary]
   - Tests: attachment ambiguity and relation stacking.

3) Simple quantification
   - Input: "Three children run."
   - Expected Dhātu: [QUANT:3] [AGENT:child] [ACTION:run]
   - Tests: plural, indefinites.
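
The bracketed notation used in the micro-cases can be parsed into structured slots for automated checks. The sketch below is one possible reading of that notation, not the project's parser: the `Slot` type, the `parse` and `render` helpers, and the treatment of a trailing `?` as an "uncertain" marker are assumptions made for illustration.

```python
import re
from typing import List, NamedTuple


class Slot(NamedTuple):
    """One [ROLE:value] unit; `uncertain` mirrors a trailing '?' (assumed meaning)."""
    role: str
    value: str
    uncertain: bool = False


# Matches [ROLE:value] or [ROLE:value?]; ROLE is assumed to be uppercase letters.
_SLOT_RE = re.compile(r"\[([A-Z]+):([^\]?]+)(\?)?\]")


def parse(text: str) -> List[Slot]:
    """Extract slots from a bracketed Dhātu string, in order of appearance."""
    return [Slot(role, value, bool(mark))
            for role, value, mark in _SLOT_RE.findall(text)]


def render(slots: List[Slot]) -> str:
    """Serialize slots back to the bracketed notation."""
    return " ".join(
        f"[{s.role}:{s.value}{'?' if s.uncertain else ''}]" for s in slots
    )


if __name__ == "__main__":
    cases = [
        "[AGENT:cat] [ACTION:hunt] [PATIENT:mouse] [ASPECT:habitual?]",
        "[OBJ:book] [REL:on] [REF:table] [REL:of] [REF:Mary]",
        "[QUANT:3] [AGENT:child] [ACTION:run]",
    ]
    for case in cases:
        slots = parse(case)
        # A round trip should reproduce the input exactly.
        assert render(slots) == case
        print([f"{s.role}={s.value}" for s in slots])
```

A round-trip check like the one in the usage section is a cheap sanity test: if parsing and re-rendering changes a string, the notation or the parser is ambiguous for that case.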

Sources (FR journals)¶

See Copilotage journals dated 2025‑08‑30 for context and decisions.

References (selected)¶

Haspelmath, M. (2007). Pre-established categories don't exist: Consequences for language description and typology. Linguistic Typology, 11(1). DOI: 10.1515/LINGTY.2007.011
WALS — World Atlas of Language Structures. https://wals.info/
Universal Dependencies (UD). https://universaldependencies.org/
See also: ../research/references.md

Try it (mini harness)¶

Folder: experiments/dhatu/
List toy corpus: python experiments/dhatu/validator.py --list
Compute basic metrics: python experiments/dhatu/validator.py --metrics