- 5 has exception values
- 6..10 BiDi category
-11 is mirrored
-12..14 numericType:
- 0 no numeric value
- 1 decimal digit value
- 2 digit value
- 3 numeric value
- ### TODO: type 4 for Han digits & numbers?!
-15..19 reserved
-20..31 value according to bits 0..5:
- if(has exception) {
- exception index;
- } else switch(general category) {
- case Ll: delta to uppercase; -- same as titlecase
- case Lu: -delta to lowercase; -- titlecase is same as c
- case Lt: -delta to lowercase; -- uppercase is same as c
- default:
- if(is mirrored) {
- delta to mirror;
- } else if(numericType!=0) {
- numericValue;
- } else {
- 0;
- };
- }
-
-Exception values:
-
-In the first uint32_t exception word for a code point,
-bits
-31..16 reserved
-15..0 flags that indicate which values follow:
-
-bit
- 0 has uppercase mapping
- 1 has lowercase mapping
- 2 has titlecase mapping
- 3 unused
- 4 has numeric value (numerator)
- if numericValue=0x7fffff00+x then numericValue=10^x
- 5 has denominator value
- 6 has a mirror-image Unicode code point
- 7 has SpecialCasing.txt entries
- 8 has CaseFolding.txt entries
-
-According to the flags in this word, one or more uint32_t words follow it
-in the sequence of the bit flags in the flags word; if a flag is not set,
-then the value is missing or 0:
-
-For the case mappings and the mirror-image Unicode code point,
-one uint32_t or UChar32 each is the code point.
-If the titlecase mapping is missing, then it is the same as the uppercase mapping.
-
-For the digit values, bits 31..16 contain the decimal digit value, and
-bits 15..0 contain the digit value. A value of -1 indicates that
-this value is missing.
-
-For the numeric/numerator value, an int32_t word contains the value directly,
-except for when there is no numerator but a denominator, then the numerator
-is implicitly 1. This means:
- numerator denominator result
- none none none
- x none x
- none y 1/y
- x y x/y
-
-If the numerator value is 0x7fffff00+x then it is replaced with 10^x.
-
-For the denominator value, a uint32_t word contains the value directly.
-
-For special casing mappings, the 32-bit exception word contains:
-31 if set, this character has complex, conditional mappings
- that are not stored;
- otherwise, the mappings are stored according to the following bits
-30..24 number of UChars used for mappings
-23..16 reserved
-15.. 0 UChar offset from the beginning of the UChars array where the
- UChars for the special case mappings are stored in the following format:
-
-Format of special casing UChars:
-One UChar value with lengths as follows:
-14..10 number of UChars for titlecase mapping
- 9.. 5 number of UChars for uppercase mapping
- 4.. 0 number of UChars for lowercase mapping
-
-Followed by the UChars for lowercase, uppercase, titlecase mappings in this order.
-
-For case folding mappings, the 32-bit exception word contains:
-31..24 number of UChars used for the full mapping
-23..16 reserved
-15.. 0 UChar offset from the beginning of the UChars array where the
- UChars for the special case mappings are stored in the following format:
-
-Format of case folding UChars:
-Two UChars contain the simple mapping as follows:
- 0, 0 no simple mapping
- BMP,0 a simple mapping to a BMP code point
- s1, s2 a simple mapping to a supplementary code point stored as two surrogates
-This is followed by the UChars for the full case folding mappings.
-
-Example:
-U+2160, ROMAN NUMERAL ONE, needs an exception because it has a lowercase
-mapping and a numeric value.
-Its exception values would be stored as 3 uint32_t words:
-
-- flags=0x0a (see above) with combining class 0
-- lowercase mapping 0x2170
-- numeric value=1
+ 5.. 7 numeric type
+ non-digit numbers are stored with multiple types and pseudo-types
+ in order to facilitate compact encoding:
+ 0 no numeric value (0)
+ 1 decimal digit value (0..9)
+ 2 digit value (0..9)
+ 3 (U_NT_NUMERIC) normal non-digit numeric value 0..0xff
+ 4 (internal type UPROPS_NT_FRACTION) fraction
+ 5 (internal type UPROPS_NT_LARGE) large number >0xff
+ 6..7 reserved
+
+ when returning the numeric type from a public API,
+ internal types must be turned into U_NT_NUMERIC
+
+ 8..15 numeric value
+ encoding of fractions and large numbers see below
+
+Fractions:
+ // n is the 8-bit numeric value from bits 8..15 of the trie word (shifted down)
+ int32_t num, den;
+ num=n>>3; // num=0..31
+ den=(n&7)+2; // den=2..9
+ if(num==0) {
+ num=-1; // num=-1 or 1..31
+ }
+ double result=(double)num/(double)den;
+
+Large numbers:
+ // n is the 8-bit numeric value from bits 8..15 of the trie word (shifted down)
+ int32_t m, e;
+ m=n>>4; // m=0..15
+ e=(n&0xf);
+ if(m==0) {
+ m=1; // for large powers of 10
+ e+=18; // e=18..33
+ } else {
+ e+=2; // e=2..17
+ } // m==10..15 are reserved
+ double result=(double)m*pow(10., (double)e); // m times 10 to the power e; in C, ^ is XOR, not exponentiation
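
The decoding rules in the added lines above (numeric type in bits 5..7, numeric value in bits 8..15, plus the fraction and large-number encodings) can be combined into one helper. The following is a minimal sketch, not the actual implementation: the names `NT_*`, `internal_numeric_type`, `public_numeric_type`, and `decode_numeric_value` are hypothetical, and treating the reserved types 6..7 and the reserved mantissas 10..15 as "no value" is an assumption. The power of ten is computed with a loop so the example does not depend on the math library.

```c
#include <stdint.h>

/* Hypothetical constants mirroring the numeric-type values listed above. */
enum {
    NT_NONE = 0, NT_DECIMAL = 1, NT_DIGIT = 2, NT_NUMERIC = 3,
    NT_FRACTION = 4, NT_LARGE = 5
};

/* Bits 5..7 of the properties word hold the internal numeric type. */
static int internal_numeric_type(uint32_t word) {
    return (int)((word >> 5) & 7);
}

/* Public view: the internal pseudo-types are reported as plain "numeric". */
static int public_numeric_type(uint32_t word) {
    int t = internal_numeric_type(word);
    return (t == NT_FRACTION || t == NT_LARGE) ? NT_NUMERIC : t;
}

/* Stores the decoded value in *out and returns 1, or returns 0 when the
 * word carries no numeric value or uses a reserved encoding (assumption). */
static int decode_numeric_value(uint32_t word, double *out) {
    int32_t n = (int32_t)((word >> 8) & 0xff);   /* bits 8..15 */

    switch (internal_numeric_type(word)) {
    case NT_DECIMAL:
    case NT_DIGIT:
    case NT_NUMERIC:
        *out = (double)n;                        /* stored directly, 0..0xff */
        return 1;
    case NT_FRACTION: {
        int32_t num = n >> 3;                    /* 0..31 */
        int32_t den = (n & 7) + 2;               /* 2..9 */
        if (num == 0) {
            num = -1;                            /* 0 encodes a numerator of -1 */
        }
        *out = (double)num / (double)den;
        return 1;
    }
    case NT_LARGE: {
        int32_t m = n >> 4;                      /* 0..15 */
        int32_t e = n & 0xf;
        double r;
        if (m == 0) {
            m = 1;                               /* large powers of 10 */
            e += 18;                             /* e = 18..33 */
        } else if (m <= 9) {
            e += 2;                              /* e = 2..17 */
        } else {
            return 0;                            /* m = 10..15 are reserved */
        }
        r = (double)m;
        while (e-- > 0) {
            r *= 10.0;                           /* multiply by 10, e times */
        }
        *out = r;
        return 1;
    }
    default:
        return 0;                                /* type 0 or reserved 6..7 */
    }
}
```

Under these assumptions, a word with type 4 and value byte 8 (numerator 1, denominator 2) decodes to 0.5, and a word with type 5 and value byte 0x11 (mantissa 1, exponent 1+2) decodes to 1000.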