
title: Computational semantic compression
status: draft


Computational semantic compression

Principles, metrics, and use cases (storage, communication, processing, learning, education).

Synthesis (draft)

  • Target minimal loss of meaning at fixed bitrate via Dhātu representations.
  • Evaluate trade-offs: fidelity, decodability, learnability.
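One candidate formalization of the fidelity-versus-bitrate trade-off in the first bullet is the information bottleneck objective (Tishby et al., listed in the references); this is offered here only as a sketch, not as the draft's committed objective:

```latex
\min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y)
```

where $X$ is the source sentence, $Z$ its Dhātu encoding, $Y$ the meaning to be preserved, and $\beta$ trades encoding rate against retained semantic information.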

Metrics (to refine)

  • Concept coverage, mutual information retained, ambiguity rate, reconstruction accuracy.
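Two of these metrics can be sketched directly, assuming encodings are represented as sets of `TAG:value` strings (a representation choice made here for illustration, not fixed by the draft):

```python
def concept_coverage(gold: set, predicted: set) -> float:
    """Fraction of gold concepts recovered in the predicted encoding."""
    return len(gold & predicted) / len(gold) if gold else 1.0

def ambiguity_rate(decodings: list) -> float:
    """Fraction of surplus readings; 0.0 when the decoding is unique."""
    return (len(set(decodings)) - 1) / len(decodings) if decodings else 0.0

gold = {"ACTION:close", "OBJ:door", "AGENT:addressee"}
pred = {"ACTION:close", "OBJ:door"}
print(concept_coverage(gold, pred))  # 2 of 3 gold concepts recovered
```

Mutual information retained and reconstruction accuracy need a decoder and a similarity scorer, so they belong in the full evaluation protocol below.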

Minimal evaluation protocol

1) Bilingual toy corpus (FR/EN) of 100 sentences covering AAO, spatial relations, tense, negation, and quantification.
2) Gold manual encoding, plus a rule- or LLM-guided automatic attempt.
3) Decode to paraphrases and score similarity and ambiguity.
4) Report mean/median/variance for coverage, ambiguity, length, and reconstruction.
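The reporting step can be sketched with the standard library; the scores below are placeholder floats, not real corpus results:

```python
from statistics import mean, median, pvariance

def report(scores: list) -> dict:
    """Aggregate per-sentence scores into the statistics reported in step 4."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "variance": pvariance(scores),  # population variance over the corpus
    }

coverage_scores = [1.0, 0.8, 0.9, 1.0]  # hypothetical per-sentence coverage
print(report(coverage_scores))
```

The same aggregation would be run once per metric (coverage, ambiguity, length, reconstruction).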

Micro-cases

  • Simple instruction: "Close the door" → [ACTION:close][OBJ:door][AGENT:addressee]
  • Yes/No question: "Are you hungry?" → [INTERROGATIVE][STATE:hunger][AGENT:addressee]
  • Attachment ambiguity: "I saw the man with the telescope" → two readings (telescope as instrument of seeing vs. modifier of "man"), hence two candidate encodings
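The ambiguous case can be made concrete by listing both candidate encodings; the `INSTR` and `MOD` slots below are hypothetical, not part of the draft's inventory:

```python
# Two candidate encodings for "I saw the man with the telescope".
telescope_readings = [
    # Reading 1: the speaker used the telescope (instrument attachment).
    ["ACTION:see", "AGENT:speaker", "OBJ:man", "INSTR:telescope"],
    # Reading 2: the man had the telescope (noun-modifier attachment).
    ["ACTION:see", "AGENT:speaker", "OBJ:man+MOD:telescope"],
]

# An encoder that surfaces both readings lets the ambiguity rate count them.
print(len(telescope_readings))
```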

Implementation notes

  • Version a small Dhātu inventory (v0.1) in YAML for iteration.
  • Provide a validator with unit tests and toy sentences.
  • Grow coverage by domain (household objects, motion, social interactions).
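A minimal sketch of the validator bullet, with a tiny v0.1 inventory embedded inline for self-containment (in practice it would be loaded from the versioned YAML file); the inventory entries are illustrative, not a proposed Dhātu set:

```python
# Toy stand-in for the YAML-versioned Dhātu inventory (v0.1).
INVENTORY_V01 = {
    "ACTION": {"close", "see"},
    "OBJ": {"door", "man"},
    "AGENT": {"addressee", "speaker"},
    "STATE": {"hunger"},
}

def validate(encoding: list) -> list:
    """Return the slots of an encoding not licensed by the inventory."""
    errors = []
    for slot in encoding:
        tag, _, value = slot.partition(":")
        if tag not in INVENTORY_V01 or value not in INVENTORY_V01[tag]:
            errors.append(slot)
    return errors

print(validate(["ACTION:close", "OBJ:door", "AGENT:addressee"]))  # []
print(validate(["ACTION:fly"]))  # ['ACTION:fly']
```

Unit tests over the toy sentences would then assert that every gold encoding validates cleanly.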

Try it (mini harness)

  • Folder: experiments/dhatu/
  • List toy corpus: python experiments/dhatu/validator.py --list
  • Compute basic metrics: python experiments/dhatu/validator.py --metrics

References

  • Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3), 379–423; 27(4), 623–656. DOI: 10.1002/j.1538-7305.1948.tb01338.x
  • Tishby, N., Pereira, F. C., & Bialek, W. (2000). The Information Bottleneck Method. arXiv:physics/0004057. DOI: 10.48550/arXiv.physics/0004057
  • Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the Dimensionality of Data with Neural Networks. Science, 313(5786), 504–507. DOI: 10.1126/science.1127647
  • See also: ../research/references.md