X-Git-Url: https://git.saurik.com/apple/icu.git/blobdiff_plain/374ca955a76ecab1204ca8bfa63ff9238d998416..4f1e1a09ce4daed860e35d359ce2fceccb0764e8:/icuSources/data/unidata/SpecialCasing.txt
diff --git a/icuSources/data/unidata/SpecialCasing.txt b/icuSources/data/unidata/SpecialCasing.txt
index c8401d6c..c90d09ac 100644
--- a/icuSources/data/unidata/SpecialCasing.txt
+++ b/icuSources/data/unidata/SpecialCasing.txt
@@ -1,13 +1,26 @@
-# SpecialCasing-4.0.1.txt
-# Date: 2003-10-06, 17:30:00 PST [KW]
+# SpecialCasing-11.0.0.txt
+# Date: 2018-02-22, 06:16:47 GMT
+# © 2018 Unicode®, Inc.
+# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
+# For terms of use, see http://www.unicode.org/terms_of_use.html
 #
-# Special Casing Properties
+# Unicode Character Database
+# For documentation, see http://www.unicode.org/reports/tr44/
 #
-# This file is a supplement to the UnicodeData file.
-# It contains additional information about the casing of Unicode characters.
-# (For compatibility, the UnicodeData.txt file only contains case mappings for
-# characters where they are 1-1, and does not have locale-specific mappings.)
-# For more information, see the discussion of Case Mappings in the Unicode Standard.
+# Special Casing
+#
+# This file is a supplement to the UnicodeData.txt file. It does not define any
+# properties, but rather provides additional information about the casing of
+# Unicode characters, for situations when casing incurs a change in string length
+# or is dependent on context or locale. For compatibility, the UnicodeData.txt
+# file only contains simple case mappings for characters where they are one-to-one
+# and independent of context and language. The data in this file, combined with
+# the simple case mappings in UnicodeData.txt, defines the full case mappings
+# Lowercase_Mapping (lc), Titlecase_Mapping (tc), and Uppercase_Mapping (uc).
+#
+# Note that the preferred mechanism for defining tailored casing operations is
+# the Unicode Common Locale Data Repository (CLDR). For more information, see the
+# discussion of case mappings and case algorithms in the Unicode Standard.
 #
 # All code points not listed in this file that do not have a simple case mappings
 # in UnicodeData.txt map to themselves.
@@ -16,27 +29,26 @@
 # ================================================================================
 # The entries in this file are in the following machine-readable format:
 #
-# <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment>
+# <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
 #
-# <code>, <lower>, <title>, and <upper> provide character values in hex. If there is more than
-# one character, they are separated by spaces. Other than as used to separate elements,
-# spaces are to be ignored.
+# <code>, <lower>, <title>, and <upper> provide the respective full case mappings
+# of <code>, expressed as character values in hex. If there is more than one character,
+# they are separated by spaces. Other than as used to separate elements, spaces are
+# to be ignored.
 #
-# The <condition_list> is optional. Where present, it consists of one or more locales or contexts,
-# separated by spaces. In these conditions:
+# The <condition_list> is optional. Where present, it consists of one or more language IDs
+# or casing contexts, separated by spaces. In these conditions:
 # - A condition list overrides the normal behavior if all of the listed conditions are true.
-# - The context is always the context of the characters in the original string,
+# - The casing context is always the context of the characters in the original string,
 # NOT in the resulting string.
 # - Case distinctions in the condition list are not significant.
 # - Conditions preceded by "Not_" represent the negation of the condition.
+# The condition list is not represented in the UCD as a formal property.
 #
-# A locale is defined as:
-# <locale> := <ISO_639_code> ( "_" <ISO_3166_code> ( "_" <variant> )? )?
-# <ISO_3166_code> := 2-letter ISO country code,
-# <ISO_639_code> := 2-letter ISO language code
+# A language ID is defined by BCP 47, with '-' and '_' treated equivalently.
 #
-# A context is one of the following, as defined in the Unicode Standard:
-# Final_Sigma, After_Soft_Dotted, More_Above, Before_Dot, Not_Before_Dot, After_I
+# A casing context for a character is defined by Section 3.13 Default Case Algorithms
+# of The Unicode Standard.
 #
 # Parsers of this file must be prepared to deal with future additions to this format:
 # * Additional contexts
@@ -101,15 +113,15 @@ FB17; FB17; 0544 056D; 0544 053D; # ARMENIAN SMALL LIGATURE MEN XEH
 1FE7; 1FE7; 03A5 0308 0342; 03A5 0308 0342; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND PERISPOMENI
 1FF6; 1FF6; 03A9 0342; 03A9 0342; # GREEK SMALL LETTER OMEGA WITH PERISPOMENI
-# IMPORTANT-when capitalizing iota-subscript (0345)
-# It MUST be in normalized form--moved to the end of any sequence of combining marks.
-# This is because logically it represents a following base character!
-# E.g. <iota_subscript> (<Mn> | <Mc> | <Me>)+ => (<Mn> | <Mc> | <Me>)+ <iota_subscript>
-# It should never be the first character in a word, so in titlecasing it can be left as is.
+# IMPORTANT-when iota-subscript (0345) is uppercased or titlecased,
+# the result will be incorrect unless the iota-subscript is moved to the end
+# of any sequence of combining marks. Otherwise, the accents will go on the capital iota.
+# This process can be achieved by first transforming the text to NFC before casing.
+# E.g. <alpha><iota_subscript><acute> is uppercased to <ALPHA><acute><IOTA>
-# The following cases are already in the UnicodeData file, so are only commented here.
+# The following cases are already in the UnicodeData.txt file, so are only commented here.
-# 0345; 0345; 0345; 0399; # COMBINING GREEK YPOGEGRAMMENI
+# 0345; 0345; 0399; 0399; # COMBINING GREEK YPOGEGRAMMENI
 # All letters with YPOGEGRAMMENI (iota-subscript) or PROSGEGRAMMENI (iota adscript)
 # have special uppercases.
@@ -184,14 +196,21 @@ FB17; FB17; 0544 056D; 0544 053D; # ARMENIAN SMALL LIGATURE MEN XEH
 1FF7; 1FF7; 03A9 0342 0345; 03A9 0342 0399; # GREEK SMALL LETTER OMEGA WITH PERISPOMENI AND YPOGEGRAMMENI
 # ================================================================================
-# Conditional mappings
+# Conditional Mappings
+# The remainder of this file provides conditional casing data used to produce
+# full case mappings.
+# ================================================================================
+# Language-Insensitive Mappings
+# These are characters whose full case mappings do not depend on language, but do
+# depend on context (which characters come before or after). For more information
+# see the header of this file and the Unicode Standard.
 # ================================================================================
 # Special case for final form of sigma
 03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA
-# Note: the following cases for non-final are already in the UnicodeData file.
+# Note: the following cases for non-final are already in the UnicodeData.txt file.
 # 03A3; 03C3; 03A3; 03A3; # GREEK CAPITAL LETTER SIGMA
 # 03C3; 03C3; 03A3; 03A3; # GREEK SMALL LETTER SIGMA
@@ -203,7 +222,10 @@ FB17; FB17; 0544 056D; 0544 053D; # ARMENIAN SMALL LIGATURE MEN XEH
 # 03C2; 03C3; 03A3; 03A3; Not_Final_Sigma; # GREEK SMALL LETTER FINAL SIGMA
 # ================================================================================
-# Locale-sensitive mappings
+# Language-Sensitive Mappings
+# These are characters whose full case mappings depend on language and perhaps also
+# context (which characters come before or after). For more information
+# see the header of this file and the Unicode Standard.
 # ================================================================================
 # Lithuanian
@@ -251,6 +273,9 @@ FB17; FB17; 0544 056D; 0544 053D; # ARMENIAN SMALL LIGATURE MEN XEH
 0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
 0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I
-# Note: the following case is already in the UnicodeData file.
+# Note: the following case is already in the UnicodeData.txt file.
 # 0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I
+
+# EOF
+
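
Reader's note (not part of the patch): the entry format described in the file header, `<code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>`, can be read with a few lines of Python. The following is a minimal sketch, not ICU's parser. The names `SpecialCasingEntry` and `parse_line` are hypothetical and chosen only for illustration. It assumes one entry per line, lowercases conditions because the header says case distinctions are not significant, treats '-' and '_' as equivalent, and ignores any fields beyond the fifth because the header warns that the format may gain additions.

```python
from typing import NamedTuple, Optional, Tuple


class SpecialCasingEntry(NamedTuple):
    """One data line of SpecialCasing.txt (hypothetical container type)."""
    code: int
    lower: Tuple[int, ...]
    title: Tuple[int, ...]
    upper: Tuple[int, ...]
    conditions: Tuple[str, ...]
    comment: str


def _code_points(field: str) -> Tuple[int, ...]:
    # Hex values separated by spaces; an empty field yields an empty mapping.
    return tuple(int(cp, 16) for cp in field.split())


def parse_line(line: str) -> Optional[SpecialCasingEntry]:
    """Parse one line; return None for blank and comment-only lines."""
    data, _, comment = line.partition("#")
    if not data.strip():
        return None
    fields = [f.strip() for f in data.split(";")]
    # The terminating ';' leaves one empty trailing field; drop it.
    if fields[-1] == "":
        fields.pop()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields: {line!r}")
    conditions: Tuple[str, ...] = ()
    if len(fields) > 4:
        # Case is not significant; '-' and '_' are treated equivalently.
        conditions = tuple(fields[4].lower().replace("-", "_").split())
    # Fields past the fifth are ignored (possible future additions).
    return SpecialCasingEntry(
        code=int(fields[0], 16),
        lower=_code_points(fields[1]),
        title=_code_points(fields[2]),
        upper=_code_points(fields[3]),
        conditions=conditions,
        comment=comment.strip(),
    )
```

Applied to the `Final_Sigma` line in the patch, `parse_line` returns an entry whose `lower` mapping is U+03C2 and whose only condition is `final_sigma`. Commented-out entries such as the `# 0345; ...` line return `None`.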