# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html#License
# File: sat_Olck_sat_FONIPA.txt
# Santali (Ol Chiki) → Santali (International Phonetic Alphabet)
# m mː n nː ɳ ɳː ɲ ɲː ŋ ŋː
# p pʰ pʼ b bʰ t tʰ tʼ d dʰ ʈ ʈʰ ɖ ɖʰ c cʰ cʼ k kʰ kʼ ɡ ʔ
# w wː w\u0303 w\u0303ː
# e eː ẽ ẽː ə əː ə\u0303 ə\u0303ː o oː õ õː
# ɛ ɛː ɛ\u0303 ɛ\u0303ː ɔ ɔː ɔ\u0303 ɔ\u0303ː
# [1] Michael Everson: Final proposal to encode the Ol Chiki script
# in the UCS. ISO/IEC JTC1/SC2/WG2 Working Group Document N2984R,
# September 21, 2005. http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2984.pdf
# [2] George L. Campbell: Compendium of the World's Languages.
# Volume 2: Ladakhi to Zuni. ISBN 0-415-20297-3. Taylor & Francis, 2000.
# According to [1] (page 3), U+1C7D AHAD (ᱽ) can only follow the four
# ejective consonants ᱵ /pʼ/, ᱡ /cʼ/, ᱫ /tʼ/, and ᱜ /kʼ/; these become
# ᱵᱽ /b/, ᱡᱽ /d\u0361ʒ/, ᱫᱽ /d/, and ᱜᱽ /ɡ/. In online texts, however,
# we have occasionally encountered ᱽ following non-ejective plosives,
# for example after ᱯ /p/. These may be typos. Our rules try to be
# resilient and handle ᱯᱽ as /b/.
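#
# Illustrative sketch only (hypothetical rules, not necessarily the ones
# used in this file): the resilient handling described above could be
# expressed by mapping both the ejective and the plain plosive, when
# followed by ᱽ, to the same voiced phoneme:
#
#   ᱵᱽ → b ;
#   ᱯᱽ → b ;
#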
# According to [1] (page 2), U+1C7C PHAARKAA follows the four “glottal”
# consonants ᱵ /pʼ/, ᱡ /cʼ/, ᱫ /tʼ/, and ᱜ /kʼ/ (these are actually
# ejective, not glottal). In online texts, however, we have frequently
# encountered ᱼ following non-ejective consonants.
$inword = [[:L:][:M:]];  # any letter or mark; left context for word-internal rules
# Some online texts use a decomposed form of U+1C7A MU-GAAHLAA TTUDDAG.
# To simplify the rules below, enforce a uniform ordering of marks.
# Some online texts use U+1C7C PHAARKAA instead of U+1C7B RELAA for indicating
# long phonemes, presumably because the graphemes look similar in some fonts.
# Since phaarkaa is otherwise used for voicing ejectives and plosives (which
# cannot be lengthened), we rewrite phaarkaa to relaa after the vowels and
# sonorants listed in the rule below.
[ᱚᱟᱤᱩᱮᱳᱶᱢᱝᱞᱱ] [ᱹᱸᱺ]* {ᱼ} → ᱻ ;
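# Worked example of the rule above (illustrative, derived from the rule as
# written): in ᱟᱼ the phaarkaa follows ᱟ, which is in the set, so the text
# is rewritten to ᱟᱻ. In ᱯᱼ the phaarkaa follows ᱯ, which is not in the
# set, so this rule leaves it unchanged.
#
#   ᱟᱼ → ᱟᱻ
#   ᱯᱼ   (no match)
#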
$inword {ᱡ} → d\u0361ʒ ;
# According to [1], ᱣ is sometimes /v/ and sometimes /w/.
# TODO: Find out if there is a rule for this.
# According to [1], ᱦ is sometimes /h/ and sometimes /ʔ/.
# TODO: Find out if there is a rule for this.
# TODO: ᱵᱷᱭᱨᱚᱵ → bʰhrɔb seems unlikely; would be good to verify.