Searched medieval Sri Lankan and South Indian pharmaceutical traditions for external attestation of H12-decoded Voynich vocabulary, especially the "state-marker" paradigm. Then pivoted to corpus linguistics: downloaded a 55M-character Sinhala corpus (the Buddha Jayanthi Tipitaka translation), compared its grammar and vocabulary patterns against the decoded text, and analyzed specific folio structures.
| § | Topic | Key Result |
|---|---|---|
| 1 | Semantic coverage | 30.4% confirmed, 59.8% with plausible |
| 2 | F85v2 rosette | 347 words, 82.1% glossed |
| 3 | Tier 2 vocabulary | +45 entries, 97.7% coverage |
| 4 | Carter's Dictionary | meda/seda/gala/ugura confirmed |
| 5 | Bhesajjamanjusa | 10/13 decoded terms matched in 13th c. Pali text |
| 6 | Cross-tradition | Tamil NEGATIVE (strengthens Sinhala ID) |
| 7 | Digital resources | 7 searchable sources identified |
| 8 | Plant names in Bhm. | 8/15 plants found |
| 9 | Sinhala commentary | Bhm. switches Pali↔Sinhala throughout |
| 10 | Bhm. thesis structure | Part IV/V notes, no standalone glossary |
| 11 | keda/kleda | l-deletion documented in Pali grammar |
| 12 | Clough's Dictionary | keda="mark/sign"; compounds not lexicalized |
| 13 | Bhm. manuscript catalog | 49 MSS; Nava Jatiya Niganduwa = priority lead |
| 14 | Downloadable resources | Free + purchase + physical-access lists |
| 15 | CCRAS/attestation | kleda-sveda-meda triad confirmed; gala root GAL |
| 16 | Outstanding questions | Checklist of resolved/unresolved items |
| 17 | Jayaweera plants | 12/16 confirmed; tadala=taro correction |
| 19 | Manchester MSS | Dead end — all Buddhist canonical texts |
| 20 | Chandrasena | aralu/bulu/mara/tamala/gara/mula confirmed |
| 21 | Tadala morphology | Taro not Palmyra; dala-sAriRI = Colocasia |
| 22 | BM catalog | Behet-vattoru-pot; Yogaratnakaraya 49 chapters |
| 23 | Bodleian MSS | Yogamuktavali: 7/15 chapters = decoded parallels |
| 25 | Sinhala corpus | 55M chars; koṭa problem RESOLVED (no /o/ in H12) |
| 26 | Vowel collapse | 1.14:1 compression; only 1.2% o→u ambiguity |
| 27 | Vocabulary concentration | TTR 0.160 = NORMAL for recipe sublanguage |
| 28 | Grammar patterns | u-prefix 40.6% = 31x overrep (largest anomaly) |
| 29 | ud-/ut- prefix | Enriched in pharma Sanskrit but only 1.8% in Bhm. |
| 30 | Folio structures | f49v=alphabet key; f103+ problem-then-solution |
| 31 | Files modified | List of all changed files |
| 32 | f79r specific lines | Lines 7,12,20,25,34,39 analysis |
| 33 | f66r unknown chars | 'x' at M.10, M.24 = unidentified glyphs |
| 34 | Testable predictions | 4 new tests proposed |
| 35 | Honest assessment | Evidence strength & remaining vulnerabilities |
| Level | Tokens | % |
|---|---|---|
| CONFIRMED (locked meanings) | 11,245 | 30.4% |
| PLAUSIBLE (reasonable, unverified) | 10,893 | 29.4% |
| PARTIAL (one component known) | 9,087 | 24.5% |
| PROPOSED (unverified) | 2,818 | 7.6% |
| Dictionary match, no meaning | 864 | 2.3% |
| Completely opaque | 1,849 | 5.0% |
| Noise/artifacts | 268 | 0.7% |
Strict reading: 30.4%. With plausible: 59.8%. Any meaning: 91.9%.
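The three headline figures are cumulative sums over the tiers in the table above. Below is a minimal sketch. It assumes "any meaning" covers the first four tiers (CONFIRMED through PROPOSED), which is what the arithmetic 30.4 + 29.4 + 24.5 + 7.6 = 91.9 implies. The tier identifiers are illustrative names, not the labels used in decoded_vocabulary.tsv.

```python
# Token counts per confidence tier, copied from the table above.
TIERS = [
    ("CONFIRMED", 11_245),
    ("PLAUSIBLE", 10_893),
    ("PARTIAL", 9_087),
    ("PROPOSED", 2_818),
    ("DICT_MATCH_NO_MEANING", 864),
    ("OPAQUE", 1_849),
    ("NOISE", 268),
]


def cumulative_coverage(tiers, include):
    """Percentage of all tokens whose tier is in `include`, rounded to 0.1."""
    total = sum(n for _, n in tiers)
    covered = sum(n for name, n in tiers if name in include)
    return round(100 * covered / total, 1)


strict = cumulative_coverage(TIERS, {"CONFIRMED"})
with_plausible = cumulative_coverage(TIERS, {"CONFIRMED", "PLAUSIBLE"})
any_meaning = cumulative_coverage(
    TIERS, {"CONFIRMED", "PLAUSIBLE", "PARTIAL", "PROPOSED"}
)
print(strict, with_plausible, any_meaning)
```

Computing from raw counts, rather than adding already-rounded percentages, keeps the cumulative figures from drifting by a tenth of a point.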
- 347 words, 82.1% glossed
- Pharmaceutical vocabulary throughout (uteda, ugala, ula, ala)
- ugara (throat) 7× — possible throat-preparation section
- Saved: Paper/data/f85v2_rosette_decoded.tsv
- 45 entries added to decoded_vocabulary.tsv
- Clearly marked: tier column, TIER2_HIGH/TIER2_MEDIUM confidence, "TIER2 PROPOSED:" notes
- Combined coverage: 95.1% → 97.7% (+2.54 percentage points)
- gediya (ගෙඩිය): "fruit; bulb; boil, tumour, lump, knot" + "snake poison"
- agadaya (අගදය): "drug, medicine" (Sinhala: beheta); < Skt agada, literally "non-disease", since gada itself means "disease"
- teda (තෙද): Elu form of tejaya = "fire, heat, pungency"
- tejo-dhatuva: "element of fire; bodily heat and digestive power"
- meda (මෙද): "marrow, fat" AND "a drug, one of the 8 principal medicaments" (ashtavarga)
- sedaya (සේදය): "warmth, heat, perspiration" (< Skt sveda = sudation)
- gala (ගල): "stone, rock" + "(Sans) throat" — dual meaning confirmed
- garanavā (ගරනවා): "to sift, riddle, screen sand; cleanse grain" — sifting verb
- garada (ගරද): "poisoning, poisonous, unwholesome"
- garaya (ගරය): "sickness, poison, antidote"
- ūla (ඌල): "fountain, spring of water" — exact match
- leḍa (ලෙඩ): "illness, disease" — standard Sinhala disease term
- ugura (උගුර): "throat, gorge"
- teda: ATTESTED — Elu fire/heat/pungency (decoction = heat process)
- geda: CONNECTED via agadaya = drug/medicine (gada alone = disease); gediya = fruit/lump
- seda: ATTESTED — warmth/heat/perspiration (sudation therapy)
- meda: CONFIRMED — fat + named medicinal (ashtavarga)
- keda: Carter's says "weariness, fatigue" — USER INSIGHT: fatigue IS a symptom/condition in Ayurveda
- "Seda meda visosano" — seda + meda paired in pharmaceutical formula (desiccation of sweat + fat)
- "Meda seda visosano" — same pair reversed, appears 3 times
- "dve meda (mahamevan, sulumevan)" — "two medas" WITH SINHALA PLANT NAMES
- "maha-sedam" = great steam bath (sudation therapy)
- "sedetum" = infinitive "to cause sweating"
- "Snehano sedano tikkho" — "oleating, sweat-inducing, sharp" (pharmaceutical properties)
- "Kapha meda gala amaye" — THREE decoded terms in one line (phlegm + fat + throat + disease)
- "thula mulani" — coarse roots (both decoded terms together)
- "usnam sula-haram" — "hot, pain-removing" (pharmaceutical property)
- "Gulma-sula" — abdominal mass pain (disease category)
- "gula = molasses" — confirmed as pill-binder; "gulani" = pills (plural)
- "Mulam sadhu virecanam" — "root is good as purgative"
- "Garaya" — poison/sickness; "Una-Gara" = disease demon
Bhesajjamanjusa chapters organized by drug vehicle (Toyavagga=water, Madhuvagga=honey, Telavagga=oils) match decoded Voynich's organization by state-markers.
- Sahasrayogam chapter structure (Kashaya/Ghrita/Taila/Churna) maps to decoded state-markers
- Kerala has 28 Visha Vaidya centers for gara (compound poison) treatment
- Sanskrit meda dhatu = fat tissue, core Ayurvedic concept
- NEGATIVE finding strengthens Sinhala ID: every verifiable term matches Sinhala/Pali, not Tamil
- Tamil uses different vocabulary for same concepts (vadi not gala, tontai not ugara)
- Shared vocabulary comes through Sanskrit substratum only
- Sarartha Sangrahaya (4th c.) — earliest Sri Lankan medical text
- Bhesajjamanjusa (13th c.) — only Pali medical text, now in our references
- Yogaratnakaraya (15th c.) — first Sinhala medical textbook
- Vatika Prakaranaya (1879) — 5,293 verses on pills and pastes
- Vanavasa Nighanduva — Kandyan pharmaceutical plant glossary
- Past-participial -la suffix: kakala (having boiled), viyala (having dried) — from Bodleian MSS
- u-initial forms attested: ugura, ugena, udara
- SOAS Bhesajjamanjusa critical edition — NOW IN OUR REFERENCES
- Clough's Dictionary (1892) — Archive.org full text
- CCRAS Ayurvedic portal — 35 texts, keyword searchable
- Carter's Dictionary (1924) — DSAL, browsable by page
- British Museum Sinhalese MSS catalog (1900) — Archive.org OCR
- Wellcome Library 469 Sinhala medical MSS — Scribd catalog
- Dictionary of Medicinal Plants (906 species) — searchable PDF
| Decoded Voynich | Pali Form in Bhm. | Found? | Lines |
|---|---|---|---|
| aralu | abhaya(m) | YES | 3731, 3961, 6988, 7143, 7387 |
| bulu | buluki (Pali-ized Sinhala!) | YES | 5158 |
| nelli | amalaka(m) | YES | 3711, 3956, 6177 |
| ela (cardamom) | ela | YES | 4904 (with pancakola + hapusa) |
| kera (coconut) | kera, nalikera | YES | 6548 ("kera telam = coconut oil"), 6969 |
| uga (fig) | udumbara, niggodha, assattha, pilakka | YES | 8298, 3691 |
| inguru (ginger) | sunthi, singi, nagaram | YES | 3726, 5139, 7083, 5965 |
| tamara | tala, kharjura | PARTIAL | 5805, 15916-15933 |
| gamsara (sarsaparilla) | — | NO | — |
| pudina (mint) | — | NO | — |
| sarala (pine) | — | NO | — |
| ata/datura | — | NO | — |
| mara (Solanum) | kantakari, vrhati | INDIRECT | 16195-16205 |
| kurundu (cinnamon) | tacam (bark) | INDIRECT | 3624, 5829 |
| karabu (clove) | karabhim(?) | UNCERTAIN | 6215 |
Key note: "buluki" (line 5158) is a Pali-ized form of Sinhala "bulu" — shows the author borrowed directly from Sinhala rather than using purely Sanskrit-derived Pali forms.
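The "Lines" column comes from searching the OCR'd edition line by line. Below is a minimal sketch of that kind of search. It assumes the edition is a plain-text file already read into a list of lines, and `find_term_lines` is a hypothetical helper. Whole-word matching is the safer default, but it misses Pali-ized forms such as buluki when searching for bulu. A prefix mode is included for that case.

```python
import re


def find_term_lines(lines, term, whole_word=True):
    """Return 1-based line numbers whose text contains `term` (case-insensitive).

    whole_word=True requires word boundaries on both sides; False matches the
    term as a word prefix, which also catches inflected or Pali-ized forms.
    """
    suffix = r"\b" if whole_word else ""
    pattern = re.compile(r"\b" + re.escape(term) + suffix, re.IGNORECASE)
    return [n for n, line in enumerate(lines, start=1) if pattern.search(line)]


# Toy stand-in for the OCR text of the edition.
edition = [
    "abhayam amalakam ca",
    "kera telam = coconut oil",
    "buluki ca",
]
print(find_term_lines(edition, "kera"))
print(find_term_lines(edition, "bulu"), find_term_lines(edition, "bulu", whole_word=False))
```

Prefix mode trades precision for recall, so its hits still need manual checking, as was done for buluki.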
- Line 14170: "It has been up to the present point carried on in Sinhalese, but now the commentator begins to give his explanations in Pali."
- Line 14393: "The passages are long and are interlaced with Sinhalese passages."
- Line 16067: "The Sinhalese paraphrase explains it as 'sau-varci-ksaro'."
| Pali Group | Sinhala Names |
|---|---|
| balattayam (3 bala plants) | kotikan-bewila, mahabewila, siriwedi bewila |
| dve meda (2 meda plants) | mahamevan, sulumevan |
| catupannikam (4 panni) | asvenna, pusvenna, masvenna, munvenna |
| jivakosabhamo | div, osabiya |
| saha | sulu, maha, geladi |
| vira | kavelau / bimpusula |
| kalinga | komadu (hill-grown gourd) |
| alabu | lapu (bottle gourd) |
| madhuka | Mee (Sinhalese), Illuppai (Tamil) |
| panasa | Kos (Sinhalese jackfruit) |
passora-gala-roga-ari = "destroyer of diseases of the sides, chest, and throat" (passa + ura + gala)
- passa (sides) = Sinhala for Sanskrit hrt
- ura (chest) = Sinhala
- gala (throat) = same in Pali and Sinhala
meda (13×), gula (2×), sula (4×), mula (multiple), gala (multiple), gara (4×), sara (2×), kala (3×), thula (1×), seda (multiple). NOT found: leda, ula (standalone), mea — all Sinhala-specific forms.
Part IV Notes (lines 13950-16660) contains the "unlisted Pali scientific terms":
- 30+ botanical identifications with Latin/Sinhala/Hindi/Tamil names
- Verse-by-verse commentary with Sinhala paraphrase explanations
- Comparative section (Siddhasara vs Bhesajjamanjusa) from line 15599
Part V Essay (lines 16660-27430) catalogs 49 medical manuscripts, including:
- Behet Patuna: "index of medicines in Sinhalese and Sanskrit" (BM Or. 6612.109)
- Saraswathie Bighanduwa: "dictionary of medical material in Sanskrit and Sinhalese"
- Sri Vasudeva Nighanduwa: "Sanskrit slokas with Sinhalese and Tamil synonyms"
kedāra [Kleda(klida)+āra] ... "lalopo" (l-deletion) ... kledīyatīti kedāraṃ
Translation: "kleda → keda" via deletion of l (lalopo), producing kedāra = "wet field/paddy field"
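Below is a deliberately narrow sketch of the rule as stated here: drop the l of a word-initial stop + l cluster. The function name and the stop inventory are assumptions made for illustration. It does not model Pali phonology in general or the full kedāra derivation (kleda + āra).

```python
# Hypothetical stop inventory (unaspirated stops only, romanized).
STOPS = set("kgcjtdpb")


def lalopo(form: str) -> str:
    """Delete 'l' when it is the second member of a word-initial stop + l cluster."""
    if len(form) >= 3 and form[0] in STOPS and form[1] == "l":
        return form[0] + form[2:]
    return form


for word in ["kleda", "gala", "meda"]:
    print(word, "->", lalopo(word))
```

gala has no initial cluster and passes through unchanged.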
- kledanam found in Bhesajjamanjusa line 6024: "Saraudclam kledanam guru" = "moistening, heavy", a formal drug-property classification
- kleda in Ayurveda = moisture/dampness; pathological excess = disease factor
- Kledaka Kapha = one of 5 Kapha subtypes (stomach moistening for digestion)
- Sanskrit Apte dictionary: kleda = "wetness, moisture, dampness; discharge from a sore"
| Marker | Etymology | Meaning | Status |
|---|---|---|---|
| teda | < Skt tejas via Elu tejaya | fire/heat/pungency | ATTESTED |
| seda | < Skt sveda | sweat/heat/perspiration | ATTESTED |
| meda | < Skt medas | fat/marrow + named medicament | CONFIRMED |
| geda | ← (a)gadaya/gediya | drug/medicine; fruit/lump | CONNECTED |
| keda | < Skt kleda via lalopo | moisture/dampness/wet-state | NEWLY ATTESTED |
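The kleda → keda change is not an ad hoc reconstruction: l-deletion (lalopo) is a documented Pali grammatical rule, cited for kedāra in the Dhānapada-ṭīkā (verse 447). Applying the same rule to the decoded state-marker keda is still an inference.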
| Term | Clough Entry | Significance |
|---|---|---|
| keda | "mark, sign" | NEW meaning (Carter had "weariness") |
| Meda | "drug, root resembling ginger; one of 8 principal medicaments; cooling, emollient; fever/consumption" | Richer than Carter |
| Gediya | "fruit; boil, tumour" — dozens of plant compounds | Core botanical term |
| Seda | "silk; sweat, perspiration; heat, warmth" (Pali < sveda) | Confirmed |
| Ugura | "throat" (pl. uguru) | Confirmed (note: ugura not ugara) |
| Garanawā | "to cleanse grain, separate from dirt, to sift" | Exact match |
| Teja | "power, fire, heat, pungency" | teda NOT a headword; teja is standard |
| kleda | NOT FOUND as Sinhala headword | — |
| ugeda/uteda/gameda | NOT FOUND | Compounds not lexicalized |
- Guli Kalka Kaviliya — "preparing guli (pills) and kalka (pastes)" — matches decoded gula
- Taila Vidhiya — "preparation of medicinal oils" (88 slokas)
- Vaidyalankaya — herb gathering, drug compounding, decoctions/oils, auspicious times
- Vaidyama Samgraha — "purifying metals and substances for medicinal preparations"
- Nava Jatiya Niganduwa — "obsolete Sinhalese words with Sanskrit equivalents," ~600 yrs, BM Or.6612.75
- Vanavasa Nighanduwa — ONLY dict including Pali alongside Sanskrit/Tamil→Sinhala
- Sara Niganduwa — dated 1265 AD, compiled by monk at Dambulla
- Siddha Usada Bighanduwa — "widely used by medical students," printed edition
- Birimal Nighanduwa — drug dict in Sinhala verse, dated 1748
- wattoru-pot = "manuals of prescriptions" (standard format name)
- behet = medicines (Behet Patuna = "index of medicines")
- guli = pills, kalka = pastes
- Sveda-vidhi = sudation method (Yogaratnakaraya ch.44)
- Medical knowledge as "family heirlooms" in specialist families
- All MSS in BM Nevill Collection: Or. 6612.xxx
Nava Jatiya Niganduwa (BM Or. 6612.75): a ~600-year-old glossary of "obsolete Sinhalese" pharmaceutical terms. If it lists keda/geda/teda/seda/meda as pharmaceutical terms, that would be strong independent evidence for the state-marker paradigm.
- Jayaweera "Medicinal Plants Used in Ceylon" (5 parts, 625 species) — Archive.org + Jaffna Univ
- Chandrasena "Chemistry & Pharmacology of Ceylon Medicinal Plants" — Archive.org
- Academia.edu — possibly extended Bhesajjamanjusa edition
- Manchester — 21 digitized Sinhalese palm-leaf MSS
- Scribd — possibly Sri Vasudeva Nighanduwa
- Bhesajjamanjusa ch. 19-60 — PTS https://palitextsociety.org/product/bhesajjamanjusa-ii/
- Yogaratnakaraya (Sinhala), Vanavasa Nighanduva, Vatika Prakaranaya, Behet Patuna, Sarartha Sangrahaya, Nava Jatiya Niganduwa
CCRAS portal was down (Indian gov server). Used Sanskrit dictionaries + Charaka Samhita Online instead.
- Sveda (sweat) = waste product (mala) of meda dhatu (fat) metabolism
- Sveda maintains kleda (moisture) balance
- Sweat channels (swedavaha srotas) originate from meda dhatu
- 3 of 5 state-markers form a documented Ayurvedic physiological system
Sanskrit root GAL = "to drop, to distil." Causative galaya:
- "to percolate" (Dashakumaracharita 156.2)
- "to sift" (Sushruta 1.165.18)
- "to dilute" (Sushruta 1.166.6)
Throat and filtering share the same root, so the overlap is not mere homophony.
"Gara visha is prepared artificially by combination of various substances. It produces various diseases." Third poison category alongside plant + animal.
Sushruta), gara (Shabda Sagara, Charaka), meda (Shabda Sagara + Plant Names Dict — 5 species), teja (Shabda Sagara), sveda (Shabda Sagara, Charaka Sharira 7/15), leda (Sinhala dictionaries). NOT found: ugeda, uteda, gameda, ula (water meaning), ugara.
- Bhesajjamanjusa plant names? → YES, 8/15. See §8.
- Keda etymology? → YES, kleda via lalopo. See §11.
- Sinhala glosses? → YES, commentary switches to Sinhala. See §9.
- "Unlisted Pali scientific terms"? → Distributed in Part IV Notes. See §10.
- Clough's dictionary? → keda="mark/sign"; compounds not lexicalized. See §12.
- Part V manuscript catalog? → 49 MSS cataloged, key formularies identified. See §13.
- CCRAS portal results → Portal down; used alternatives. kleda-sveda-meda triad confirmed. See §15.
- Jayaweera 625 species → 12/16 decoded plant names confirmed. See §17.
- Manchester palm-leaf MSS → Dead end — all Buddhist canonical. See §19.
- Chandrasena → aralu/bulu/mara/tamala/gara/mula confirmed. See §20.
- koṭa problem → RESOLVED: H12 cannot produce koṭa; uses -la instead. See §25.
- Vowel collapse severity → RESOLVED: 1.14:1, only 1.2% real ambiguity. See §26.
- Repetition problem → RESOLVED: normal for recipe sublanguage. See §27.
- u-prefix anomaly → IDENTIFIED: 40.6% vs 1.3%, partially pharmaceutical. See §28-29.
- Folio structure → f49v=alphabet key, f103+ problem-then-solution. See §30.
- Nava Jatiya Niganduwa (BM Or. 6612.75) — needs physical access
- Bhesajjamanjusa chapters 19-60 (PTS purchase or other source)
- Bodleian medical MSS (Oxford) — 7 pharmaceutical manuscripts, needs physical access
- British Library medical MSS — Yogaratnakara (457 folios), Vattorupota
- f66r unknown characters — need high-res image comparison with Brahmic scripts
- Star-type correlation test — needs digitized star-type data
- Recipe internal coherence test — can run with current data
- f49v character order vs Sinhala syllabary — can run with current data
- tadala/pudina/amu corrections to paper — not yet applied
- Complete readable passage for independent Sinhala scholar verification
All 5 parts of Jayaweera's "Medicinal Plants Used in Ceylon" (625 species, 48,190 lines) downloaded and searched systematically against 16 decoded plant names.
| Decoded | Jayaweera Match | Species | Notes |
|---|---|---|---|
| aralu | Aralu, Terminalia | Terminalia chebula | Triphala member; "greatly valued" |
| bulu | Bulu, Terminalia | Terminalia bellirica | Triphala member |
| nelli | Nelli, Phyllanthus | Phyllanthus emblica | Triphala member |
| ata/attana | Attana, Datura | Datura metel | "Large Thorn-apple" — spiny capsules |
| mara | Mara, Solanum/Cissampelos | Solanum nigrum / C. pareira | Nightshade family confirmed |
| kera | Kekiri/Pipinja | Cucumis sativus | Multiple cucumber species listed |
| sarala | Sarala, Pinus | Pinus spp. | Pine resin medicinal |
| tamala | Tamalapatra | Cinnamomum tamala | = Cinnamomum synonym, cinnamon leaf |
| thula | Sthula churna | (processing term) | "Coarse powder" — not a plant name |
| gula | Gulika/Gutika | (dosage form) | "Pill" — confirmed pharmaceutical |
| mula | Mula | (root general) | Universal Ayurvedic term |
| pudina | Pudina | Mentha spp. | NOT Sinhala — Sanskrit/Tamil/Hindi loan |
-
tadala = Taro (Colocasia), NOT Palmyra Palm
- Jayaweera: "Tala, Tala-goya" = Cyperus rotundus (nut-grass)
- "Tal-ala" = taro tuber (Colocasia esculenta)
- Palmyra Palm (Borassus) Sinhala name = "Tal" not "tadala"
- Previous identification was WRONG — needs correction
-
pudina is NOT Sinhala — Sanskrit/Tamil/Hindi borrowing; no Elu form
- Genuine Sinhala for mint would be different
- Weakens f14r identification but doesn't invalidate it
-
amu = Kodo millet (Paspalum scrobiculatum) — NEW identification
- Jayaweera: Amu = Paspalum scrobiculatum (Kodo millet)
- Currently unidentified in decoded_vocabulary.tsv (3 tokens, EVA ysho)
- Grain used in traditional medicine
- olea (olive) — no matching Sinhala name in Jayaweera
- talasa (date-palm variant) — not directly matched
- rameda — pharmaceutical compound, not a plant
- ugeda — processing state, not a plant
- thala (7 tokens, EVA cthal): Could be Sesamum indicum (sesame)
- Jayaweera: "Thala" = sesame, one of most important oil plants
- Currently glossed as "place/put" compound — needs contextual check
- f70r2 (zodiac) has "tala" — unlikely to be sesame there
- VERDICT: Possible but requires folio-by-folio context analysis
All 32 digitized manuscripts at Manchester/Rylands are Buddhist canonical texts (Tipitaka/commentaries). None are medical or pharmaceutical. The 21 palm-leaf manuscripts donated by T.W. Rhys Davids in 1915 are exclusively Pali scriptural texts. ~40 un-digitized manuscripts remain — medical content possible but unknown. Catalog: Jayawickrama 1972 (not online).
UK Sinhalese medical MSS are at Bodleian (Oxford) and British Library (London):
- Bodleian MS Sansk.c.123(R): Yogamuktavali-samgraha — formulary organized by prep type (peya, modaka, leha, curna, kalka, gutika/guli, taila, ghrta, nasya, anjana, kvatha, sveda, dhupa, pralepa)
- Bodleian MS Sansk.c.125(R): Vaidyalankara-samgraha + Bhesajjamanjusa fragment
- Bodleian MS Sinh.d.5(R): Tailavidhiya — Sinhala manual on medicinal oil preparation
- Bodleian MS Sinh.d.3(R): 49+ diseases with pharmaceutical recipes
- BL Or. 4142: Yogaratnakara — 457 folios, 49 chapters, ends with Vishnu-raja-guliya pill
- BL: Vattorupota (Behet-vattoru-pot) — physician's formulary carried in practice
- Royal College of Physicians: Vattoru-pota (early 19th c. palm-leaf)
Accessed on Archive.org (https://archive.org/details/dli.ernet.8078); full OCR text downloaded (14,050 lines, 88.77% OCR confidence).
| Term | Match | Details |
|---|---|---|
| aralu | YES | Terminalia chebula, p.102 — Triphala member |
| bulu | YES | Terminalia bellerica, p.101 — Triphala member, kernel narcotic |
| mara | YES | In compound "Sooriya-mara" = Albizia odoratissima (large tree) |
| tamala | YES | Index entry p.85 (cross-ref in Melia section) |
| gara | YES | In compounds "Patala-garadu", "Hamasagara" — poison-related names |
| mula | YES | "Pancha Mula" (five-root preparation) — key Ayurvedic formulation |
| ata | PARTIAL | "Aththa" for Annona muricata (Katu Aththa) and A. reticulata (Wali Aththa) |
Book uses English for preparation terms ("decoction" 75×, "oil", "pellets") not Sinhala/Sanskrit.
Three possible analyses for the plant label on f9r:
| Analysis | Meaning | Evidence |
|---|---|---|
| ta + dala | "that petal/leaf" | Sinhala dala = petal/leaf; decoded vocabulary uses this |
| tal + ala | "palm-tuber" = taro | Jayaweera: "tal-ala" = Colocasia; SESSION_NOTES correction |
| tadala (unitary) | Plant name | Ceylon plant list; originally identified as Borassus |
Key finding: Sanskrit dala-sAriRI = Colocasia antiquorum (literally "leaf-bodied" = taro). This supports the taro identification — taro is literally "the leaf plant" in Sanskrit.
Palmyra palm in Sinhala = tal (< Sanskrit tAla), NOT tadala. The paper (main.tex line 611) still has uncorrected Borassus identification.
Full OCR text searched (30,826 lines). No state-marker terms found (expected — librarian descriptions, not manuscript content). Key findings:
"Every village vedarala or physician carries with him one or more similar collections of prescriptions, commonly known as Behet-vattoru-pot or simply Vattorupot." Remedies derived from "Susruta, Manjusa, Yogaratnakara." Two specimens: Egerton 1113 (art. iv) and Or. 4999.
- MS no. 52 (Or. 4142): 457 palm leaves, 49 chapters, 14th century
- Chapters include: Dravyagana-cikitsa (drug classification), Pancakarma (five treatments), Sveda-vidhi (diaphoretics), Visha-vidhi (poisons), Vajikarana (tonics)
- Ends with "Vishnu-raja-guliya" — a named pill formulation
- Second copy: MS no. 53 (Or. 1049)
- MS 52-53: Yogaratnakaraya (2 copies)
- MS 54: Prescriptions and charms (Sloane 1402, 17th c.)
- MS 55: "A manual of Physik in the language spoken upon Island Ceilon" (Sloane 3417)
- MS 56: Viyaru-visa-utpattiya (hydrophobia/poisons, AD 1697)
- MS 57: Viyaru-lakshana (mad animal bite symptoms, 116 stanzas)
- MS 58: Sinhalese pharmacopoeia + Vattorupota + Sarasamgraha fragment
- MS 59: Charms and prescriptions incl. children's diseases
- MS 60: Behet-vattoru-pot (102 leaves) — emetics, purgatives, fever, piles, worms, etc.
- MS 61: Yogaratnamalava (1816) + prescriptions
- MS 65: Yantra-pota (amulet book, ~60 diagrams)
Susruta, Manjusa (= Bhesajjamanjusa, by Atthadassa Thera, c. AD 1267), Yogaratnakaraya, Sararthasangraha (King Buddhadasa, AD 341-370)
Source: Liyanaratne, "Sri Lankan Medical Manuscripts in the Bodleian Library," JEAS Vol. 2 (1992).
Full text already at: references/sinhala_medical/bodleian_sri_lankan_manuscripts.txt
- Author: Don Hendrik Samaratunga of Alutgama (Kalutara District)
- Date: 1855 AD
- Sanskrit text with Sinhala translation (sanne)
- 15 chapters organized by PHARMACEUTICAL DOSAGE FORM:
| Ch. | Sanskrit Name | Preparation Type | Voynich Decoded Parallel |
|---|---|---|---|
| 1-2 | peya kanda | Gruels (4 types) | — |
| 3 | modaka kanda | Confections | — |
| 4 | leha kanda | Electuaries | ea? (cow-product) |
| 5 | curna kanda | Powders | ugeda (476 tokens) |
| 6 | kalka kanda | Pastes | — |
| 7 | gutika kanda | Pills | gula (111 tokens) |
| 8 | taila kanda | Oils | meda (425 tokens) |
| 9 | ghrta kanda | Ghee | ea (339 tokens) |
| 10 | nasya kanda | Nasal applications | — |
| 11 | anjana kanda | Eye applications | — |
| 12 | kvatha kanda | Decoctions | uteda (323 tokens) |
| 13 | sveda kanda | Sudation | seda (attested) |
| 14 | dhupa kanda | Fumigations | — |
| 15 | pralepa kanda | Plasters | — |
This is the strongest structural parallel yet. The Yogamuktavali-samgraha organizes ALL pharmaceutical knowledge by preparation type, which is the same organizing principle as the decoded Voynich "state-marker" paradigm. 7 of 15 chapter categories have direct decoded parallels (ea is assigned to two of them, tentatively for ch. 4).
Contents include: drug collection rules, weights/measures, 7 types of kasaya (decoction), oil preparation proportions (kalka:sneha:liquid ratios), oil boiling degrees (mrdupaka etc.), specific oils by type (sesame=talatelehi, coconut=poltelehi, castor=endarutel, mustard=abatel), drug groups: mahapasmul, sulupasmul, trijataka, caturjataka, pancakola. Published edition: ed. Robert Batuvantudawe, Colombo 1950.
50+ named oil preparations with ingredients and instructions. Organized by drug groups (gana).
49+ diseases (head-to-foot), with recipes. Begins with 3 types of headache (vata, pitta, slesma).
- Paris: 5 Sinhala medical MSS at BNF (documented by Liyanaratne 1987)
- NLM (US): Some digitized images of Sinhala palm-leaf medical MSS
- Northwestern Casey Wood: 27 ola MSS on medical subjects (finding aid online)
- McGill Osler Library: 20 medical olas, mostly uncatalogued
- Wellcome Library: 469 palm-leaf MSS (Somadasa catalog 1996, 420pp)
Downloaded the full Buddha Jayanthi Tripitaka Sinhala translation (207,293 text blocks, 55 million characters) and compared word usage patterns against decoded Voynich.
- ගෙණ (gena, "having taken"): 49,459 occurrences
- කොට (koṭa, "having done"): 57,296 occurrences ← CLASSICAL form
- කර (kara, "having done"): 7,485 occurrences ← MODERN form
- ද (da, question/also): 186,382 occurrences
- ම (ma, self/emphasis): 41,577 occurrences
- බෙහෙත (behet, "medicine"): 1,213 occurrences (even in Buddhist text!)
- මේද (meda, "fat"): 1,217 occurrences (in "32 parts of body" recitation)
- ලෙඩ (leda, "illness"): 248 occurrences
- උගුර (ugura, "throat"): 81 occurrences
- Participial chaining confirmed: koṭa gena = 558× in corpus. Same structure as decoded Voynich "gena gala" (take then strain), "gena tha" (take then place).
- Object-Verb order confirmed: "siwura gena" (robe take) = same as "ula gena" (water take).
- Da clause-boundary confirmed: 186,382× in corpus, always clause-final.
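Both kinds of claim above can be counted. Below is a minimal sketch. It assumes the corpus is already tokenized (romanized) and, for the clause test, already segmented into clauses. The segmentation rule, for example splitting on punctuation, is an assumption. The clause-final share shows how "always clause-final" could be measured rather than asserted.

```python
def bigram_count(tokens, first, second):
    """Number of adjacent positions where `first` is immediately followed by `second`."""
    return sum(1 for a, b in zip(tokens, tokens[1:]) if a == first and b == second)


def clause_final_share(clauses, marker):
    """Fraction of `marker` occurrences that are the last token of their clause."""
    total = final = 0
    for clause in clauses:
        for i, tok in enumerate(clause):
            if tok == marker:
                total += 1
                if i == len(clause) - 1:
                    final += 1
    return final / total if total else 0.0


# Toy data only; real counts come from the tokenized corpus.
tokens = "siwura koṭa gena ula gena gala".split()
print(bigram_count(tokens, "koṭa", "gena"), bigram_count(tokens, "gena", "gala"))
clauses = [["ula", "gena", "da"], ["da", "gala"]]
print(clause_final_share(clauses, "da"))
```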
Classical Sinhala uses koṭa (57,296×) as dominant past participial. Decoded Voynich has ZERO koṭa. RESOLVED: H12 decoder CANNOT produce koṭa — no /o/ vowel (EVA o → /u/), no retroflex /ṭ/. The decoder uses -la suffix instead (6,199 tokens), which IS the modern Sinhala conjunctive participle (karala, genala, ugala). This is a limitation of the 4-vowel encoding, not evidence against the hypothesis.
H12 decoder has 4 vowels (a, e, i, u). Real Sinhala has 12+ (a, ā, æ, ǣ, i, ī, u, ū, e, ē, o, ō).
- 1,470,278 dictionary words → 1,284,970 collapsed forms
- 1,148,341 forms (89.4% of collapsed forms) have NO collision at all
- Most collisions are vowel-length variants (ula vs ulā vs ūla) = same root
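The collision statistic can be reproduced by mapping every dictionary form onto H12's inventory and grouping the forms that land on the same string. Below is a minimal sketch. The vowel mapping follows the 4-vowel system described above. Sending æ/ǣ to e and dropping retroflexion (ṭ → t, ḍ → d) are assumptions. With retroflexion included, the same function shows why koṭa can only surface as kuta.

```python
import unicodedata
from collections import defaultdict

# Assumed H12 collapse: 12 Sinhala vowels -> a, e, i, u; retroflex stops -> dental.
COLLAPSE = str.maketrans({
    "ā": "a", "æ": "e", "ǣ": "e", "ī": "i", "ū": "u",
    "ē": "e", "o": "u", "ō": "u", "ṭ": "t", "ḍ": "d",
})


def collapse(word: str) -> str:
    # NFC so that precomposed characters such as "ā" match the table keys.
    return unicodedata.normalize("NFC", word).translate(COLLAPSE)


def collision_stats(words):
    """Return (forms per collapsed form, share of collapsed forms with no collision, groups)."""
    unique = {unicodedata.normalize("NFC", w) for w in words}
    groups = defaultdict(set)
    for w in unique:
        groups[collapse(w)].add(w)
    ratio = len(unique) / len(groups)
    no_collision = sum(len(g) == 1 for g in groups.values()) / len(groups)
    return ratio, no_collision, dict(groups)


words = ["ula", "ulā", "ūla", "ola", "gala", "gola", "koṭa"]
ratio, clean, groups = collision_stats(words)
print(collapse("koṭa"), ratio, clean)
```

The grouping alone cannot tell length variants of one root (ula/ulā/ūla) from a genuinely distinct word (ola). That distinction has to be made separately, which is why the o→u ambiguity estimate is a separate figure.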
| Decoded | Could also be | Different meaning? |
|---|---|---|
| ula (spring) | ola (pot/lamp) | YES — distinct |
| ura (chest) | ora (edge/bank) | YES — distinct |
| gula (pill) | gola (ball) | YES — distinct |
| kura (chick) | kora (lame) | YES — distinct |
| uda (above) | oda (creek) | YES — distinct |
| uta (upward) | ota (that) | YES — distinct |
ala, gala, kara, mara, gara, ara, meda, ena, gena, seda, leda
Only 1.2% of vocabulary is affected by o→u ambiguity. The 4-vowel system creates manageable, identifiable ambiguity — not chaos. The koṭa→kuta collapse is a specific instance of this pattern. Context-dependent disambiguation is feasible.
- The narrow vocabulary is NOT caused by vowel collapse (1.14:1, i.e. about 12.6% fewer distinct forms)
- The o→u collapse creates specific identifiable ambiguities (ula/ola, ura/ora, gula/gola)
- A future refinement could attempt to recover the /o/ vowel from context
Compared decoded Voynich vocabulary concentration against real Sinhala texts and published research on medieval recipe text vocabulary.
| Metric | Decoded Voynich | Tipitaka (Buddhist) | Jayaweera (plants) |
|---|---|---|---|
| Tokens | 37,024 | 79,614 | 271,330 |
| Vocab | 5,921 | 9,372 | 25,580 |
| TTR | 0.160 | 0.118 | 0.094 |
| Top 20 cover | 26.0% | 21.3% | 22.4% |
| Top 50 cover | 42.0% | 32.2% | 31.9% |
| Top 100 cover | 53.1% | 40.9% | 40.8% |
| Hapax | 67.9% | 46.9% | 62.1% |
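Below is a sketch of how the table's metrics can be computed from a token list. It assumes whitespace-tokenized romanized text, and it assumes "Hapax" means the share of vocabulary types that occur exactly once. Raw TTR falls as a text gets longer, so the sketch also includes a mean segmental TTR (average TTR over fixed-size windows). That is a standard way to compare corpora of different sizes, such as the 37K-token Voynich text and the 271K-token Jayaweera text.

```python
from collections import Counter


def vocab_metrics(tokens, tops=(20, 50, 100)):
    """Token/type counts, TTR, hapax share and top-k coverage for one corpus."""
    counts = Counter(tokens)
    n_tokens, n_types = len(tokens), len(counts)
    ranked = [c for _, c in counts.most_common()]  # frequencies, highest first
    metrics = {
        "tokens": n_tokens,
        "vocab": n_types,
        "ttr": n_types / n_tokens,
        # Assumption: hapax share is measured over types, not tokens.
        "hapax": sum(1 for c in ranked if c == 1) / n_types,
    }
    for k in tops:
        metrics[f"top{k}_cover"] = sum(ranked[:k]) / n_tokens
    return metrics


def mean_segmental_ttr(tokens, window=1000):
    """Average TTR over consecutive non-overlapping windows; a short tail is dropped."""
    starts = range(0, len(tokens) - window + 1, window)
    ttrs = [len(set(tokens[i:i + window])) / window for i in starts]
    if not ttrs:
        raise ValueError("corpus is shorter than one window")
    return sum(ttrs) / len(ttrs)


sample = "ula gena ugeda gala ula gena tha".split()
print(vocab_metrics(sample, tops=(2,)))
print(mean_segmental_ttr(sample, window=3))
```

Running `mean_segmental_ttr` with the same window on all three corpora would give a length-controlled version of the TTR row.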
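- TTR: Voynich (0.160) is higher, i.e. LESS repetitive per token, than both comparison texts. However, TTR falls as token count grows and the Voynich sample is the smallest (37K vs 80K and 271K tokens), so this comparison is only indicative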
- Top 20: 26% coverage is normal — comparable to all text types
- Top 50-100: Higher than non-recipe texts but EXPECTED for pharmaceutical sublanguage
- Published research confirms recipe texts are "sublanguages" with lexical closure
- Medieval recipes across all cultures follow rigid INGREDIENT + QUANTITY templates
- The word "kalandayi" (weight unit) appears 8× in a single real Sinhala recipe passage
kottamalli dekalandayi, valmi dekalandayi, handun kalandayi,
papiliya kalandayi, miris kalandayi, ...
kalanduru ala tun kalandayi, komarika ala dekalandayi,
vatura ata ekata kakara hat velak denu
Structure: ingredient + measure, ingredient + measure, water + amount, boil, give. "kalandayi" repeats 8×, "ala" (tuber) appears twice, and "vatura" (water) supplies the liquid. The decoded Voynich follows the same template: ula gena (water take), ugeda (drug) repeated, gala (strain), tha (place).
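The ingredient + measure template is regular enough to parse mechanically. Below is a minimal sketch over the visible part of the passage. The "..." in the source hides at least one more clause, so 7 of the 8 kalandayi occurrences appear here. Two things are assumptions: the numeral list, and the rule that a measure is the last word ending in kalandayi.

```python
from collections import Counter

RECIPE = ("kottamalli dekalandayi, valmi dekalandayi, handun kalandayi, "
          "papiliya kalandayi, miris kalandayi, "
          "kalanduru ala tun kalandayi, komarika ala dekalandayi")

# Hypothetical, incomplete list of numeral words that can precede a measure.
NUMERALS = {"eka", "deka", "tun", "hatara"}


def parse_clause(clause):
    """Split 'INGREDIENT [NUMERAL] MEASURE' into (ingredient, measure); None if no measure."""
    words = clause.split()
    if not words or not words[-1].endswith("kalandayi"):
        return None
    cut = len(words) - 1
    if cut > 0 and words[cut - 1] in NUMERALS:
        cut -= 1
    return " ".join(words[:cut]), " ".join(words[cut:])


def parse_recipe(text):
    parsed = (parse_clause(c.strip()) for c in text.split(","))
    return [p for p in parsed if p is not None]


pairs = parse_recipe(RECIPE)
print(len(pairs), Counter(measure for _, measure in pairs))
```

The same parser, pointed at decoded Voynich lines with a different measure or category word, would test whether the decoded text really follows one fixed template.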
- Voynich: 2.9% consecutive duplicates (in 700-word sample)
- Tipitaka: 0.1% consecutive duplicates
- Higher in Voynich but explained by recipe format (same category term per ingredient)
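Below is a sketch of the duplicate-rate measure. It assumes the rate counts adjacent token pairs whose two members are identical. The source does not say whether it divides by pairs or by tokens. For samples of this size the difference is negligible.

```python
def consecutive_duplicate_rate(tokens):
    """Share of adjacent token pairs whose two members are identical."""
    if len(tokens) < 2:
        return 0.0
    duplicates = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return duplicates / (len(tokens) - 1)


print(consecutive_duplicate_rate("ugeda ugeda ula gena gena".split()))  # 0.5
```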
VERDICT: The "repetition problem" is a genre feature, not a decoder artifact.
Downloaded and analyzed the Buddha Jayanthi Tipitaka Sinhala translation (55M characters, 789,614 words in 5M-char sample) as comparison corpus.
- Participial chaining — Real Sinhala: koṭa gena (558×), gena gos (30×). Voynich: gena gala (13×), gena tha (14×). SAME syntactic structure.
- Object-Verb order — Both use SOV: "siwura gena" (robe take) vs "ula gena" (water take).
- -la conjunctive participle — Real Sinhala: -ල 3,765 instances. Voynich: 6,241 tokens (16.9%). This IS a real Sinhala morphological pattern, though Voynich has higher frequency.
- da clause-boundary marker — Real Sinhala: 186,382×. Voynich: 156 tokens. Same position.
- Sentence-final verbs — Sinhala: veyi (5,526×), da, nam, yi. Voynich: verb-final tendency.
- Reduplication — Sinhala: 183 consecutive doubles in 789K words (0.02%). Voynich: higher rate, but Sinhala DOES use reduplication for emphasis/iteration.
- -ena suffix — Sinhala instrumental/participial. Voynich: 8.8% of tokens.
- gena family: the largest verb family in Voynich (810+ tokens) and among the largest in Sinhala (gena 49,459×, behind koṭa at 57,296×).
| Source | u-initial words | % of all words |
|---|---|---|
| Real Sinhala (Tipitaka) | 10,255 | 1.3% |
| Sinhala with o→u collapse | 14,186 | 1.8% |
| Decoded Voynich | 15,026 | 40.6% |
31x overrepresentation of u-initial words in decoded Voynich.
Root cause: EVA word-initial 'o' (22.2% of words) and 'qo' (14.6% of words) both decode to u-. Rule 21 treats 'q' as silent (it always precedes 'o'). So ~37% of all EVA words decode to u-initial. This is a well-known statistical feature of the Voynich manuscript (o/qo word-initial dominance).
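The Rule 21 arithmetic can be checked directly on an EVA word list. Below is a simplified sketch. It applies only the two rules named here (word-initial q is silent; initial o decodes to u-) and ignores every other H12 rule that might also yield u-. On real data it should therefore land near the 22.2% + 14.6% ≈ 36.8% EVA figure rather than the full 40.6%. The sample words are illustrative EVA strings.

```python
def decodes_u_initial(eva_word):
    """True if the word would decode with initial u- under the two rules named above."""
    # Rule 21: a word-initial 'q' is silent, so 'qo...' behaves like 'o...'.
    word = eva_word[1:] if eva_word.startswith("q") else eva_word
    # EVA 'o' decodes to /u/.
    return word.startswith("o")


def u_initial_share(eva_words):
    return sum(decodes_u_initial(w) for w in eva_words) / len(eva_words)


def overrepresentation(observed_pct, baseline_pct):
    return observed_pct / baseline_pct


sample = ["qokeedy", "okaiin", "chedy", "daiin"]
print(u_initial_share(sample))  # 0.5
print(round(overrepresentation(40.6, 1.3), 1), round(overrepresentation(40.6, 1.8), 1))
```

The two ratios quoted in this section follow from the rounded percentages: 40.6 / 1.3 ≈ 31.2 against plain Sinhala, and 40.6 / 1.8 ≈ 22.6 against the 1.8% baselines.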
In real Sinhala, u-initial words are mostly Buddhist terminology (upan=born, upada=arising). They are NOT a productive morphological prefix.
The decoded Voynich uses u- as if it's a determiner or article ("THE-crude-drug", "THE-decoction", "THE-pill") — but Sinhala has NO article system.
This is the single largest structural mismatch between decoded Voynich and real Sinhala.
Possible interpretations:
- The initial EVA 'o'/'qo' is NOT a vowel u- — it may be a scribal convention, word-boundary marker, or represent a different phonological element
- H12 Rule 21 (q→silent) is wrong — 'q' may encode a consonant
- The source language uses u- as a productive prefix (not standard Sinhala)
- The u- prefix might represent a Pali/Sanskrit prefix (ud-/ut- = upward/out) that was productive in pharmaceutical terminology but not in general prose
- -eda suffix (7.0%): ugeda (470), meda (444), uteda (320), leda (127) = the "state-marker" pattern. Not a standard Sinhala suffix.
- -ara suffix (6.4%): gara (327), ugara (300), utara (229)
- State-marker compounds u+ROOT: 12.3% of all tokens
- Single-syllable function words: ula, ura, eda, ena, ara, uga, uta, ea, ala, ga account for substantial token share
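The suffix percentages above are token shares. Below is a minimal sketch with a hypothetical function name and toy tokens. Plain `endswith` matching also catches the bare function words eda and ara, plus any word that merely ends in those letters. The notes do not say whether the 7.0% and 6.4% figures include them.

```python
from collections import Counter


def suffix_share(tokens, suffix):
    """Share of tokens ending in `suffix` and the three most frequent such types."""
    matches = [t for t in tokens if t.endswith(suffix)]
    return len(matches) / len(tokens), Counter(matches).most_common(3)


# Toy decoded tokens, not corpus counts.
tokens = ["ugeda", "meda", "meda", "ula", "gara", "ugara", "eda", "ena"]
print(suffix_share(tokens, "eda"))  # (0.5, [('meda', 2), ('ugeda', 1), ('eda', 1)])
print(suffix_share(tokens, "ara"))  # (0.25, [('gara', 1), ('ugara', 1)])
```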
| Initial | EVA (%) | Decoded (%) | Sinhala (%) | Match? |
|---|---|---|---|---|
| u-/o- | 22.2 (o) + 14.6 (qo) | 40.6 | 1.3 (+0.5 with o→u collapse) | 31x over |
| a- | 5.0 | ~5.0 | 4.7 | YES |
| e- | 0.4 | ~3.5 | 2.4 | Close |
| k- | 3.2 | ~3.0 | 7.7 | Under |
| s- | 11.8 | ~3.5 | 8.1 | Under |
| m- | 0.0 | ~2.5 | 9.2 | Under |
| g- | 0.0 | ~3.0 | 1.6 | Close |
Note: EVA initial 'ch' (16.1%) decodes to various forms; 'd' (9.7%) and 's' (11.8%) also have high EVA frequencies but their decoded distributions need checking.
Investigation into whether the 40.6% u-initial anomaly in decoded Voynich is explained by pharmaceutical Sanskrit/Pali ud-/ut- prefix concentration.
Findings: ud-/ut- IS disproportionately concentrated in medical texts:
- udaka (water) = standard pharmaceutical vehicle
- udvartana (upward massage), utkarika (poultice), utklesha (emesis)
- Panchakarma preparatory procedures heavily saturated with ud-/ut- terms
- Maps to extraction, emesis, massage, anatomical terms
But even in the Bhesajjamanjusa, u-initial words are only ~1.8%. The 40.6% in decoded Voynich is still a 22x overrepresentation relative to the most u-heavy text we can find. EVA 'o' and 'qo' dominance (Rule 21 treats 'q' as silent) explains part of this, but the gap remains the largest structural anomaly.
- 25+ individual Voynich characters listed vertically in left margin
- First 5 have Arabic numerals 1-5 written next to them
- Under H12, each character decodes to an individual Sinhala phoneme:
- Consonants: f(ca), r(ra), k(ka), s(sa), p(pa), d(ga)
- Vowels: o(u), y(a), e(e)
- Several positions illegible (* in transcription) — possibly less common chars (sh=ma, ch=devoicer)
- CONSISTENT WITH a writing system key/reference page
- Scholars note this may have been added after main text, or may be an early decipherment attempt
- L section: 15 single words (labels around illustration)
- Decoded: rara, ralasa, ura, gara, agacula, sala, salaca, cara, utesa, agala...
- M section: 30 individual characters — another alphabet listing similar to f49v
- Has reclining figure at bottom with "der Mussteil" in German (Latin script)
- One of very few folios with readable text in a known language
- 9 single characters labeling parts of bathing illustration
- Decoded: sa, ga, (silent), sa, u, la, ka, ra, sa
- Likely abbreviations for ingredients or anatomical points
- ~977 short paragraphs across 21 folios
- Each marked with marginal star (varying types)
- Stars vary: tailed vs untailed, red center vs yellow vs blank
- f103/f116 (outer bifolio) specifically lack tailed stars
KEY FINDING: p/f-initial recipe header pattern
- 161/977 paragraphs (16.5%) start with p/f-initial word (pa-/ca-)
- These appear roughly every 6 paragraphs (average gap 6.1)
- Pattern on f103r: P.1(pedala), P.5(puarala), P.13(pudara), P.18(puleda), P.21(pea), P.30(pulagada), P.37(peda), P.48(pularara), P.52(pedala)
- Between headers: d/s/o/y-initial words (ingredients, instructions)
- CONSISTENT WITH Ayurvedic recipe structure:
- Header paragraph (condition name + formula name) — p-initial
- 4-6 paragraphs of ingredients, preparation, dosage
- Different star types likely mark these categories: problem then solution
- ~193 dark-painted stars ≈ 204 illustrated pages elsewhere (possible indexing)
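The header rule above takes only a few lines to state. This minimal sketch assumes each paragraph is available as a list of EVA words in manuscript order. The function names are hypothetical. The only data used are the f103r header positions listed above.

```python
from statistics import mean, pstdev


def header_indices(paragraphs):
    """Indices of paragraphs whose first EVA word starts with p or f."""
    return [i for i, words in enumerate(paragraphs) if words and words[0][:1] in ("p", "f")]


def header_gaps(indices):
    """Distances between consecutive header paragraphs."""
    return [b - a for a, b in zip(indices, indices[1:])]


# Header positions on f103r as listed above (paragraph numbers).
f103r = [1, 5, 13, 18, 21, 30, 37, 48, 52]
gaps = header_gaps(f103r)
print(gaps)                     # [4, 8, 5, 3, 9, 7, 11, 4]
print(mean(gaps))               # 6.375
print(round(pstdev(gaps), 2))   # 2.64
```

The mean of consecutive gaps telescopes to (last - first) / (number of headers - 1). With 161 headers spread over 977 paragraphs, it therefore cannot exceed 976 / 160 = 6.1. The spread of the gaps says more about how regular the header rhythm really is.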
- f49v: 16+ readable characters (plus illegibles)
- f66r: 30 characters in M section
- f76r: 9 characters as illustration labels
- Paper/data/decoded_vocabulary.tsv: Tier 2 integration (45 entries) + amu identification
- Paper/data/semantic_coverage_analysis.tsv: Full coverage breakdown
- Paper/data/cross_tradition_vocabulary_research.md: Research synthesis + Yogamuktavali + Jayaweera
- Paper/references/bhesajjamanjusa.pdf: 13th c. Pali medical text
- Paper/references/jayaweera_part[2-5].pdf: Medicinal Plants of Ceylon
- /tmp/jayaweera_all.txt: Combined text of all 5 Jayaweera parts (48,190 lines)
- /tmp/tipitaka.lk/: Full git clone of Tipitaka Sinhala translation (1,609 files)
- /tmp/tipitaka_sinhala.txt: 207,293 text blocks, 55M characters, 139MB
- Paper/SESSION_NOTES_v9.md: This file
User noticed structural patterns on f79r (recipe/bathing section).
| Line | EVA | Decoded | Meaning | Pattern |
|---|---|---|---|---|
| P.7 | polchedy | puleda | bloomed + then | p-initial (recipe header) |
| P.12 | qokchy | uga | learned/fig-tree | instruction paragraph |
| P.25 | cholchey | ulea | (compound) | instruction paragraph |
| P.39 | polkeey | pulagēa | (compound) | p-initial (recipe header) |
Lines 7 and 39 are RECIPE HEADERS (p-initial), 12 and 25 are INSTRUCTIONS. User correctly identified that these lines "look different" — they start new recipes.
| Line | Word 1 | Word 2 | Significance |
|---|---|---|---|
| P.20 | a (vowel) | mala (flower/garland/stool) | Plant/symptom reference |
| P.34 | leula | keda (crude/base form DRY) | Pharmaceutical preparation term |
P.34 has keda as its second word, naming a preparation TYPE (crude decoction). This suggests sub-structure within recipes: not just header + ingredients, but also preparation-type labels.
Headers at: P.7, P.13, P.21, P.26, P.31, P.35, P.38, P.39 = 8 recipe headers in 44 paragraphs = ~5.5 paragraphs per recipe
On f66r's M section (individual character listings), ALL transcribers (H, C, F, U) transcribe two entries as 'x'. In other words, they agree these glyphs match none of the common EVA characters ('x' is one of the rare glyphs in the EVA alphabet):
- M.10: EVA 'x' — unidentified glyph
- M.24: EVA 'x' — unidentified glyph
- M.22: EVA 'c' (bare) — extremely rare, only appears alone in this context
User observed that "the character after the word doesn't look like a character I have seen before."
Possible interpretations:
- Numerals from a different system (Arabic, Sinhala, Brahmi)
- Sinhala/Brahmic characters written directly (not encoded in Voynichese)
- Abbreviation marks or punctuation
- Characters from a script the encoder borrowed but didn't fully integrate
- Later additions by a different hand (like the German "der Mussteil")
If these are NUMERALS — they could provide a Rosetta Stone for number identification. If they are Brahmic characters — they would directly confirm the script family.
If someone independently digitizes which paragraphs have tailed vs untailed stars, red vs yellow centers, we can test whether star type correlates with p-initial (header) vs non-p-initial (instruction). Star data and decoding are completely independent. Expected: Different star types mark different content categories.
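Once star types are digitized, the proposed test reduces to a two-group comparison against a permutation null. This minimal sketch assumes one boolean header flag and one star-type label per paragraph. The labels "tailed"/"untailed", the function names, and the toy data are hypothetical placeholders, not digitized observations. The statistic is the difference in header rate between the two star types. Shuffling the star labels while keeping the header flags fixed preserves both marginal counts.

```python
import random


def header_rate_gap(is_header, star_type, a="tailed", b="untailed"):
    """Header rate among star type `a` minus header rate among star type `b`.
    Both labels must occur at least once."""
    def rate(label):
        flags = [h for h, s in zip(is_header, star_type) if s == label]
        return sum(flags) / len(flags)
    return rate(a) - rate(b)


def star_permutation_p(is_header, star_type, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the header-rate gap."""
    rng = random.Random(seed)
    observed = abs(header_rate_gap(is_header, star_type))
    labels = list(star_type)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(header_rate_gap(is_header, labels)) >= observed:
            hits += 1
    # +1 correction: the observed labelling counts as one permutation.
    return (hits + 1) / (n_perm + 1)


# Toy data only: perfect association between tailed stars and headers.
is_header = [True] * 10 + [False] * 10
stars = ["tailed"] * 10 + ["untailed"] * 10
print(header_rate_gap(is_header, stars))  # 1.0
```

The same functions extend to the red/yellow/blank centers by swapping the `a` and `b` labels.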
Check if vocabulary WITHIN a recipe (between two p-initial headers) is more self-consistent than vocabulary ACROSS recipe boundaries. Use cosine similarity or Jaccard overlap. Expected: Higher within-recipe than between-recipe similarity.
Compare the listed character sequence on f49v against traditional Sinhala/Brahmic syllabary orderings (akshara pata). Even partial match is significant. Expected: Some correspondence to traditional ordering.
Cross-reference the 'x' glyphs against Sinhala numeral forms, Brahmi script elements, or other Indic character inventories. Expected: If identifiable as Brahmic, strong evidence for the script hypothesis.
Check if p-initial header words consistently relate to medical conditions/dosage forms across the entire recipe section (not just f103r). pedala(suffering), pea(drinking/beverage), pula(opened/bloomed=pustule?) etc. Expected: Header words should cluster around condition names and dosage forms.
- Cross-tradition attestation: 10/13 decoded terms found in 13th c. Bhesajjamanjusa
- keda/kleda breakthrough: l-deletion documented in Pali grammar (not our invention)
- Yogamuktavali parallel: 7/15 pharmaceutical chapters match decoded state-markers
- Plant names: 12/16 confirmed in Jayaweera's Ceylon medicinal plants
- Grammar: Participial chaining, SOV order, -la suffix, da marker all correct
- Vocabulary: TTR normal for recipe sublanguage; repetition is genre feature
- Folio structure: f103+ problem-then-solution pattern with semantic content matching visual formatting
- Real Sinhala recipe (Bodleian MS): identical ingredient+measure template structure
The main vulnerability is the gap between "real Sinhala words" and "readable Sinhala text." Individual words match dictionaries, but we haven't produced a complete, independently verifiable connected passage that an unbiased Sinhala scholar would read as natural prose.
u-prefix overrepresentation (40.6% vs 1.3%) = 31x. Even the most u-heavy pharmaceutical text checked (the Pali Bhesajjamanjusa) only reaches ~1.8% u-initial. This means one of the following:
- EVA initial o/qo encodes something other than u- (scribal convention? word-boundary?)
- Rule 21 (q→silent) needs revision
- The source language uses u- differently than standard Sinhala
- A complete readable recipe passage verified by a Sinhala scholar
- Physical access to Nava Jatiya Niganduwa (600yr old pharmaceutical glossary)
- Identification of the f66r unknown characters as Brahmic
- Resolution of the u-prefix anomaly
- Manchester MSS: All Buddhist, no medical content
- Chandrasena: Uses English for prep terms, limited Sinhala vocabulary
- CCRAS portal: Down/unavailable
- About 30% of investigation time produced 90% of value
Measured Jaccard similarity of word sets between consecutive paragraphs. Compared within-recipe similarity vs across-boundary vs random baseline.
| Comparison | Mean Jaccard | N pairs |
|---|---|---|
| Within-recipe (consecutive paras) | 0.0459 | 814 |
| Across-recipe boundary | 0.0275 | 159 |
| Random paragraph pairs | 0.0288 | 1,000 |
| Adjacent recipe word-sets | 0.0998 | 159 |
- Within-recipe = 1.67x higher than across-boundary
- Permutation p-value: p ≤ 0.0001 (0/10,000 permutations ≥ observed; 10,000 permutations cannot resolve smaller values)
- Same-folio r/v similarity: 0.2173 (n=9)
- Different-folio similarity: 0.1964 (n=201)
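The p-initial headers mark real content boundaries. Vocabulary genuinely changes at recipe boundaries, because different recipes use different ingredient/instruction sets. The boundaries are defined by EVA p/f-initial words, and Jaccard overlap compares word forms rather than glosses. This is therefore strong evidence for recipe-sized units in the manuscript text itself. It is consistent with the H12 reading but does not, on its own, test the individual glosses.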
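The within-recipe versus across-boundary comparison can be sketched as follows. This minimal sketch assumes paragraphs are lists of word forms and that header positions come from the p/f-initial rule. The function names and toy paragraphs are hypothetical. The notes do not specify how the 10,000 permutations were drawn, so the scheme used here is an assumption: it shuffles scores between the two groups and runs a one-sided test.

```python
import random
from statistics import mean


def jaccard(a, b):
    """Jaccard overlap of two word sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consecutive_pair_scores(paragraphs, header_idx):
    """Split consecutive-paragraph Jaccard scores into within-recipe and
    across-boundary groups; a pair crosses a boundary when its second
    paragraph is a header."""
    headers = set(header_idx)
    within, across = [], []
    for i in range(len(paragraphs) - 1):
        score = jaccard(paragraphs[i], paragraphs[i + 1])
        (across if i + 1 in headers else within).append(score)
    return within, across


def boundary_permutation_p(within, across, n_perm=10_000, seed=0):
    """One-sided test of mean(within) - mean(across) against shuffled group labels.
    Both groups must be non-empty."""
    rng = random.Random(seed)
    observed = mean(within) - mean(across)
    pooled = within + across  # new list, so the inputs are not shuffled
    k = len(within)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy paragraphs: two recipes starting at paragraphs 0 and 2.
paras = [["pedala", "ol", "daiin"], ["ol", "daiin", "chey"], ["pea", "shedy"], ["shedy", "qokain"]]
within, across = consecutive_pair_scores(paras, header_idx=[0, 2])
print(within, across)  # [0.5, 0.3333333333333333] [0.0]
```

A one-sided test matches the stated expectation (within-recipe higher). A two-sided version would compare absolute differences instead.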
Compared f49v character sequence against standard Brahmic akshara ordering.
Consonant order does NOT match Brahmic akshara order (18/36 pairs inverted, i.e. 50%, which is what a random ordering gives on average):
- f49v: ca, ra, ka, sa, pa, ga
- Brahmic: ka, ca, ṭa, ta, pa, ya, ra, sa
BUT the vowel triplet (u, a, e) repeats perfectly 3 times:
- u a e | u a e | u a e | a a
- Pattern length 3: matches 3/3 complete cycles
Illegible glyph &140 recurs at roughly regular intervals (positions 6, 14, 21), so it is likely the same unknown character each time, possibly sh (=ma) or another multi-stroke glyph.
Core H12 inventory is represented:
- 6 consonants: ca, ra, ka, sa, pa, ga (+ possibly ma if &140=sh)
- 3 vowels: u, a, e
- Missing: ta, da, na, la (less frequent); i (rarest vowel)
- Multi-character mappings (sh, ch, ct, ck, cp) don't need entries
f49v is NOT a traditional Sinhala syllabary chart (wrong order). BUT the repeating vowel triplet structure IS consistent with an abugida demonstration — showing consonant+vowel combinations. This is how you'd teach/document the encoding system: "this character makes sound X, combine with these vowels."
The page documents the CORE characters of the writing system (the ones that appear as single glyphs), not the full phoneme inventory (which includes digraphs/combinations).
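Both f49v checks are easy to reproduce on any transcription. In this minimal sketch, the consonant ranks encode the standard Brahmic order (ka-row, then ca-row, pa-row, then ra and sa), restricted to the six consonants read on f49v. The sketch assumes ga sits in the ka-row after ka. The function names are hypothetical. The sketch covers only those six consonants (15 pairs), so it does not reproduce the 18/36 figure, which counts a different set of pairs.

```python
from itertools import combinations

# Standard Brahmic positions for the six f49v consonants (assumption: ga follows
# ka in the velar row; then the ca-row, the pa-row, ra, and sa).
BRAHMIC_RANK = {"ka": 0, "ga": 1, "ca": 2, "pa": 3, "ra": 4, "sa": 5}


def count_inversions(seq, rank):
    """Pairs that appear in the opposite order from `rank`."""
    return sum(1 for x, y in combinations(seq, 2) if rank[x] > rank[y])


def count_complete_cycles(seq, pattern):
    """Number of back-to-back copies of `pattern` at the start of `seq`."""
    if not pattern:
        return 0
    k, n = len(pattern), 0
    while seq[n * k:(n + 1) * k] == pattern:
        n += 1
    return n


consonants = ["ca", "ra", "ka", "sa", "pa", "ga"]  # f49v order, as listed above
pairs = len(consonants) * (len(consonants) - 1) // 2
print(count_inversions(consonants, BRAHMIC_RANK), "of", pairs)  # 8 of 15

vowels = ["u", "a", "e", "u", "a", "e", "u", "a", "e", "a", "a"]
print(count_complete_cycles(vowels, ["u", "a", "e"]))  # 3
```

A random ordering inverts half its pairs on average (7.5 of 15 here), so 8 of 15 shows no ordering signal. The vowel check, by contrast, finds the full three cycles.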
Analyzed all 161 p/f-initial header paragraphs across the full recipe section. Compared first-word semantics, root distribution, and category enrichment vs 814 non-header paragraphs.
1. Header first-words cluster around 3 roots:
| Root | Count | % | Medical meaning |
|---|---|---|---|
| pu- (pula, puleda...) | 78 | 48.4% | opened/bloomed/swollen; puṭapāka (crucible method) |
| pe- (peda, pedala, pea...) | 33 | 20.5% | suffering/affliction (< Skt pīḍā); drinkable (< Skt peya) |
| pa- (padara, pala...) | 33 | 20.5% | fruit; step/method |
| ca-/cu- (f-initial) | 10 | 6.2% | small (cula); powder (curna) |
2. Non-headers have completely different initial distribution: u- (23%), a- (21%), g- (19%), s- (14%), t- (12%) — zero p/f dominance
3. Semantic category enrichment:
| Category | Headers | Non-headers | Enrichment |
|---|---|---|---|
| Dosage form words | 7.2% | 3.3% | 2.2x in headers |
| Pharmaceutical actions | 4.1% | 11.4% | 2.8x in non-headers |
Headers contain condition names and preparation labels. Non-headers contain instructions (gena=take, gala=strain, kara=make, tha=place). This is exactly the expected semantic differentiation.
4. Second-word comparison:
- Header 2nd words: ara(5), utara(4), mēda(4), meda(4) — preparation terms
- Non-header 2nd words: ea(36), ugēa(28), mea(27), ena(27) — ingredients/process
5. Key medical terms confirmed:
- pedala/peda (7x) = suffering/affliction (< Skt pīḍā) — condition name
- pea (3x) = drinking/beverage (< Skt peya) — dosage form name
- pula (9x) = opened/expanded/swollen — condition or prep term
- curna (via cu-) = powder — dosage form name
The p/f-initial headers are semantically differentiated from non-headers. Headers name CONDITIONS and PREPARATION TYPES. Non-headers provide INSTRUCTIONS and INGREDIENTS. This is consistent with real Ayurvedic recipe structure: "For [condition], [dosage form]: take X, grind Y, strain Z, place in W."
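The root table and the enrichment ratios can be recomputed from the decoded first words and token lists. In this minimal sketch, the prefix list mirrors the table's grouping (ca-/cu- share a row). No morphological segmentation is attempted: each word is assigned to the first prefix it starts with. The function names are hypothetical. The semantic category sets would have to come from the project's own category lists.

```python
from collections import Counter

# Root prefixes and their labels, as grouped in the table above (ca-/cu- share a row).
ROOTS = [("pu", "pu-"), ("pe", "pe-"), ("pa", "pa-"), ("ca", "ca-/cu-"), ("cu", "ca-/cu-")]


def root_label(word):
    """First matching root label for a decoded word, else 'other'."""
    return next((label for prefix, label in ROOTS if word.startswith(prefix)), "other")


def root_distribution(first_words):
    """Percentage of header first-words per root, rounded to one decimal."""
    counts = Counter(root_label(w) for w in first_words)
    return {label: round(100 * c / len(first_words), 1) for label, c in counts.items()}


def category_rate(tokens, category):
    """Share of tokens that belong to a semantic category (a set of word forms)."""
    return sum(t in category for t in tokens) / len(tokens)


# Enrichment is a ratio of two category rates (values from the table above, in %).
print(round(7.2 / 3.3, 1))   # 2.2  dosage-form words, headers vs non-headers
print(round(11.4 / 4.1, 1))  # 2.8  pharmaceutical actions, non-headers vs headers
```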
- tadala = Taro (Colocasia), NOT Palmyra Palm — main.tex line 611, paper.md line 517
- pudina = loan word — should note this weakens f14r identification
- amu = Kodo millet — updated in TSV but not in paper text
- None of v9 findings are in main.tex/paper.md yet (all in session notes only)