Commit | Line | Data |
---|---|---|
5b2abdfb | 1 | .\" Copyright (c) 1993 |
e9ce8d39 A |
2 | .\" The Regents of the University of California. All rights reserved. |
3 | .\" | |
5b2abdfb A |
4 | .\" This code is derived from software contributed to Berkeley by |
5 | .\" Paul Borman at Krystal Technologies. | |
6 | .\" | |
e9ce8d39 A |
7 | .\" Redistribution and use in source and binary forms, with or without |
8 | .\" modification, are permitted provided that the following conditions | |
9 | .\" are met: | |
10 | .\" 1. Redistributions of source code must retain the above copyright | |
11 | .\" notice, this list of conditions and the following disclaimer. | |
12 | .\" 2. Redistributions in binary form must reproduce the above copyright | |
13 | .\" notice, this list of conditions and the following disclaimer in the | |
14 | .\" documentation and/or other materials provided with the distribution. | |
15 | .\" 3. All advertising materials mentioning features or use of this software | |
16 | .\" must display the following acknowledgement: | |
17 | .\" This product includes software developed by the University of | |
18 | .\" California, Berkeley and its contributors. | |
19 | .\" 4. Neither the name of the University nor the names of its contributors | |
20 | .\" may be used to endorse or promote products derived from this software | |
21 | .\" without specific prior written permission. | |
22 | .\" | |
23 | .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND | |
24 | .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | |
25 | .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | |
26 | .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE | |
27 | .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
28 | .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS | |
29 | .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) | |
30 | .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT | |
31 | .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY | |
32 | .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | |
33 | .\" SUCH DAMAGE. | |
34 | .\" | |
5b2abdfb | 35 | .\" @(#)utf2.4 8.1 (Berkeley) 6/4/93 |
9385eb3d | 36 | .\" $FreeBSD: src/lib/libc/locale/utf2.4,v 1.10 2002/10/10 22:56:18 tjr Exp $ |
e9ce8d39 | 37 | .\" |
9385eb3d | 38 | .Dd October 11, 2002 |
3d9156a7 | 39 | .Dt UTF2 5 |
e9ce8d39 A |
40 | .Os |
41 | .Sh NAME | |
5b2abdfb A |
42 | .Nm utf2 |
43 | .Nd "Universal character set Transformation Format encoding of runes" | |
e9ce8d39 | 44 | .Sh SYNOPSIS |
5b2abdfb A |
45 | .Nm ENCODING |
46 | .Qq UTF2 | |
e9ce8d39 | 47 | .Sh DESCRIPTION |
9385eb3d A |
48 | .Bf Em |
49 | The UTF2 encoding has been deprecated in favour of UTF-8. | |
50 | .Ef | |
51 | New applications should not use UTF2. | |
52 | .Pp | |
e9ce8d39 | 53 | The |
5b2abdfb A |
54 | .Nm UTF2 |
55 | encoding is based on a proposed X-Open multibyte | |
56 | \s-1FSS-UCS-TF\s+1 (File System Safe Universal Character Set Transformation Format) encoding as used in | |
57 | .Sy "Plan 9 from Bell Labs" . | |
58 | Although it is capable of representing more than 16 bits, | |
59 | the current implementation is limited to 16 bits as defined by the | |
60 | Unicode Standard. | |
61 | .Pp | |
62 | .Nm UTF2 | |
63 | representation is backwards compatible with ASCII, so 0x00-0x7f refer to the | |
64 | ASCII character set. The multibyte encoding of runes between 0x0080 and 0xffff | |
65 | consists entirely of bytes whose high-order bit is set. The actual | |
66 | encoding is represented by the following table: | |
e9ce8d39 | 67 | .Bd -literal |
5b2abdfb A |
68 | [0x0000 - 0x007f] [00000000.0bbbbbbb] -> 0bbbbbbb |
69 | [0x0080 - 0x07ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb | |
70 | [0x0800 - 0xffff] [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb | |
e9ce8d39 | 71 | .Ed |
5b2abdfb A |
72 | .Pp |
73 | If more than a single representation of a value exists (for example, | |
74 | 0x00; 0xC0 0x80; 0xE0 0x80 0x80), the shortest representation is always | |
75 | generated when encoding (the longer ones are still decoded correctly). | |
76 | .Pp | |
77 | The final three encodings provided by X-Open: | |
78 | .Bd -literal | |
79 | [00000000.000bbbbb.bbbbbbbb.bbbbbbbb] -> | |
80 | 11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb | |
81 | ||
82 | [000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] -> | |
83 | 111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb | |
84 | ||
85 | [0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] -> | |
86 | 1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb | |
87 | .Ed | |
88 | .Pp | |
89 | which provide for the entire proposed 31-bit ISO-10646 standard, are currently | |
90 | not implemented. | |
91 | .Sh "SEE ALSO" | |
92 | .Xr mklocale 1 , | |
9385eb3d A |
93 | .Xr setlocale 3 , |
94 | .Xr utf8 5 |
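
The three-row encoding table in DESCRIPTION can be illustrated with a short sketch. This is not the libc implementation; it is a minimal Python rendering of the bit layouts shown in the table, restricted to the 16-bit range the page describes. The function names `utf2_encode` and `utf2_decode` are hypothetical and chosen only for this example. The decoder accepts over-long forms such as 0xC0 0x80, as the page says the longer representations are decoded, while the encoder always emits the shortest form.

```python
from typing import Tuple


def utf2_encode(rune: int) -> bytes:
    """Encode one rune (0x0000-0xffff) using the shortest form in the table."""
    if not 0 <= rune <= 0xFFFF:
        raise ValueError("rune outside the 16-bit range covered by the table")
    if rune <= 0x7F:
        # [0x0000 - 0x007f] -> 0bbbbbbb
        return bytes([rune])
    if rune <= 0x7FF:
        # [0x0080 - 0x07ff] -> 110bbbbb, 10bbbbbb
        return bytes([0xC0 | (rune >> 6), 0x80 | (rune & 0x3F)])
    # [0x0800 - 0xffff] -> 1110bbbb, 10bbbbbb, 10bbbbbb
    return bytes([
        0xE0 | (rune >> 12),
        0x80 | ((rune >> 6) & 0x3F),
        0x80 | (rune & 0x3F),
    ])


def utf2_decode(data: bytes) -> Tuple[int, int]:
    """Decode the first rune in data and return (rune, bytes consumed).

    Over-long sequences are accepted rather than rejected, which is an
    assumption drawn from the page's remark that longer forms decode.
    """
    if not data:
        raise ValueError("empty input")
    lead = data[0]
    if lead < 0x80:
        return lead, 1
    if lead & 0xE0 == 0xC0:
        trailing, rune = 1, lead & 0x1F
    elif lead & 0xF0 == 0xE0:
        trailing, rune = 2, lead & 0x0F
    else:
        # Stray continuation byte, or a 4- to 6-byte lead (not implemented).
        raise ValueError("unsupported lead byte 0x%02x" % lead)
    if len(data) < trailing + 1:
        raise ValueError("truncated sequence")
    for byte in data[1:trailing + 1]:
        if byte & 0xC0 != 0x80:
            raise ValueError("expected a continuation byte (10bbbbbb)")
        rune = (rune << 6) | (byte & 0x3F)
    return rune, trailing + 1
```

Under these assumptions, `utf2_encode(0x20AC)` yields the three bytes 0xE2 0x82 0xAC, and `utf2_decode(b"\xc0\x80")` returns rune 0 after consuming two bytes, matching the over-long example given in DESCRIPTION.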