Skip to content

Semantic universals (Dhātu)

Draft synthesis of assumptions, validation paths, and references.

Synthesis (draft)

  • Problem: identify a compact set of semantic primitives (Dhātu) usable across storage/communication/processing.
  • Context: CI stabilized; documentation refocused on the WHAT (research), not the HOW (infra).

Assumptions

  • A small, stable set of semantic primitives can encode broad conceptual coverage.
  • Human language acquisition stages inform the emergence order of these universals.

Minimal validation protocol (v0)

  • Coverage: map a 100-item frequent concept set (nouns, verbs, relations) to a minimal Dhātu inventory; measure % covered without adding primitives.
  • Ambiguity: for each encoding, count plausible decodings; v0 target ≤ 1.5 interpretations on average (with short context).
  • Reversibility: decode Dhātu reps to EN/FR paraphrases and judge semantic equivalence by humans or a robust LLM (agreement ≥ 0.8).
  • Parsimony: penalize primitive count per concept (median ≤ 4 primitives/concept at v0).

Micro-cases (sanity checks)

1) Agent-Action-Object (AAO) - Input: "The cat hunts the mouse." - Expected Dhātu: [AGENT:cat] [ACTION:hunt] [PATIENT:mouse] [ASPECT:habitual?] - Tests: tense, negation, modality.

2) Possession and location - Input: "The book is on Mary's table." - Expected Dhātu: [OBJ:book] [REL:on] [REF:table] [REL:of] [REF:Mary] - Tests: attachment ambiguity and relation stacking.

3) Simple quantification - Input: "Three children run." - Expected Dhātu: [QUANT:3] [AGENT:child] [ACTION:run] - Tests: plural, indefinites.

Sources (FR journals)

  • See Copilotage journals dated 2025‑08‑30 for context and decisions.

References (selected)

  • Haspelmath, M. (2007). Pre-established categories don't exist: Consequences for language description and typology. Linguistic Typology, 11(1). DOI: 10.1515/LINGTY.2007.011
  • WALS — World Atlas of Language Structures. https://wals.info/
  • Universal Dependencies (UD). https://universaldependencies.org/
  • See also: ../research/references.md

Try it (mini harness)

  • Folder: experiments/dhatu/
  • List toy corpus: python experiments/dhatu/validator.py --list
  • Compute basic metrics: python experiments/dhatu/validator.py --metrics