Commit | Line | Data |
---|---|---|
5b2abdfb | 1 | .\" Copyright (c) 1993 |
e9ce8d39 A |
2 | .\" The Regents of the University of California. All rights reserved. |
3 | .\" | |
5b2abdfb A |
4 | .\" This code is derived from software contributed to Berkeley by |
5 | .\" Paul Borman at Krystal Technologies. | |
6 | .\" | |
e9ce8d39 A |
7 | .\" Redistribution and use in source and binary forms, with or without |
8 | .\" modification, are permitted provided that the following conditions | |
9 | .\" are met: | |
10 | .\" 1. Redistributions of source code must retain the above copyright | |
11 | .\" notice, this list of conditions and the following disclaimer. | |
12 | .\" 2. Redistributions in binary form must reproduce the above copyright | |
13 | .\" notice, this list of conditions and the following disclaimer in the | |
14 | .\" documentation and/or other materials provided with the distribution. | |
15 | .\" 3. All advertising materials mentioning features or use of this software | |
16 | .\" must display the following acknowledgement: | |
17 | .\" This product includes software developed by the University of | |
18 | .\" California, Berkeley and its contributors. | |
19 | .\" 4. Neither the name of the University nor the names of its contributors | |
20 | .\" may be used to endorse or promote products derived from this software | |
21 | .\" without specific prior written permission. | |
22 | .\" | |
23 | .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND | |
24 | .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | |
25 | .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | |
26 | .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE | |
27 | .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
28 | .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS | |
29 | .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) | |
30 | .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT | |
31 | .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY | |
32 | .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | |
33 | .\" SUCH DAMAGE. | |
34 | .\" | |
5b2abdfb | 35 | .\" @(#)utf2.4 8.1 (Berkeley) 6/4/93 |
3d9156a7 | 36 | .\" $FreeBSD: src/lib/libc/locale/utf8.5,v 1.6 2004/10/17 02:29:15 tjr Exp $ |
e9ce8d39 | 37 | .\" |
3d9156a7 | 38 | .Dd April 7, 2004 |
9385eb3d | 39 | .Dt UTF8 5 |
e9ce8d39 A |
40 | .Os |
41 | .Sh NAME | |
9385eb3d A |
42 | .Nm utf8 |
43 | .Nd "UTF-8, a transformation format of ISO 10646" | |
e9ce8d39 | 44 | .Sh SYNOPSIS |
5b2abdfb | 45 | .Nm ENCODING |
9385eb3d | 46 | .Qq UTF-8 |
e9ce8d39 A |
47 | .Sh DESCRIPTION |
48 | The | |
9385eb3d A |
49 | .Nm UTF-8 |
50 | encoding represents UCS-4 characters as a sequence of octets, using | |
51 | between 1 and 6 for each character. | |
52 | It is backwards compatible with | |
53 | .Tn ASCII , | |
54 | so 0x00-0x7f refer to the | |
55 | .Tn ASCII | |
56 | character set. | |
57 | The multibyte encoding of | |
58 | .No non- Ns Tn ASCII | |
59 | characters | |
60 | consists entirely of bytes whose high-order bit is set. | |
61 | The actual | |
5b2abdfb | 62 | encoding is represented by the following table: |
e9ce8d39 | 63 | .Bd -literal |
9385eb3d A |
64 | [0x00000000 - 0x0000007f] [00000000.0bbbbbbb] -> 0bbbbbbb |
65 | [0x00000080 - 0x000007ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb | |
66 | [0x00000800 - 0x0000ffff] [bbbbbbbb.bbbbbbbb] -> | |
67 | 1110bbbb, 10bbbbbb, 10bbbbbb | |
68 | [0x00010000 - 0x001fffff] [00000000.000bbbbb.bbbbbbbb.bbbbbbbb] -> | |
5b2abdfb | 69 | 11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb |
9385eb3d | 70 | [0x00200000 - 0x03ffffff] [000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] -> |
5b2abdfb | 71 | 111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb |
9385eb3d | 72 | [0x04000000 - 0x7fffffff] [0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] -> |
5b2abdfb A |
73 | 1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb |
74 | .Ed | |
75 | .Pp | |
9385eb3d A |
76 | If more than a single representation of a value exists (for example, |
77 | 0x00; 0xC0 0x80; 0xE0 0x80 0x80) the shortest representation is always | |
78 | used. | |
79 | Longer ones are rejected as errors because they pose a potential | |
80 | security risk and destroy the 1:1 mapping between characters and octet sequences. | |
9385eb3d | 81 | .Sh SEE ALSO |
3d9156a7 | 82 | .Xr euc 5 |
9385eb3d A |
83 | .Rs |
84 | .%A "Rob Pike" | |
85 | .%A "Ken Thompson" | |
86 | .%T "Hello World" | |
87 | .%J "Proceedings of the Winter 1993 USENIX Technical Conference" | |
88 | .%Q "USENIX Association" | |
89 | .%D "January 1993" | |
90 | .Re | |
91 | .Rs | |
92 | .%A "F. Yergeau" | |
93 | .%T "UTF-8, a transformation format of ISO 10646" | |
94 | .%O "RFC 2279" | |
95 | .%D "January 1998" | |
96 | .Re | |
97 | .Rs | |
98 | .%Q "The Unicode Consortium" | |
99 | .%T "The Unicode Standard, Version 3.0" | |
100 | .%D "2000" | |
101 | .%O "as amended by the Unicode Standard Annex #27: Unicode 3.1 and by the Unicode Standard Annex #28: Unicode 3.2" | |
102 | .Re | |
103 | .Sh STANDARDS | |
104 | The | |
105 | .Nm | |
106 | encoding is compatible with RFC 2279 and Unicode 3.2. |
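
The encoding table and the shortest-representation rule in DESCRIPTION can be illustrated with a minimal C sketch. The function names `utf8_encode` and `utf8_decode` are hypothetical and are not part of libc or of this manual page; the sketch follows only the 1 to 6 octet table above and applies no later restrictions (such as those on surrogates or on values above 0x10ffff).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a libc interface: encode one UCS-4 value
 * following the 1-6 octet table in DESCRIPTION.  Returns the number of
 * octets written, or 0 if the value is above 0x7fffffff.
 */
size_t
utf8_encode(uint32_t c, unsigned char out[6])
{
	/* Largest value representable in 1, 2, ... 6 octets. */
	static const uint32_t limit[6] = {
		0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff
	};
	/* Marker bits of the lead octet for each sequence length. */
	static const unsigned char lead[6] = {
		0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc
	};
	size_t n, i;

	for (n = 0; n < 6 && c > limit[n]; n++)
		;
	if (n == 6)
		return (0);
	n++;			/* n is now the sequence length */
	/* Fill continuation octets from the end, 6 bits each. */
	for (i = n - 1; i > 0; i--) {
		out[i] = (unsigned char)(0x80 | (c & 0x3f));
		c >>= 6;
	}
	out[0] = (unsigned char)(lead[n - 1] | c);
	return (n);
}

/*
 * Hypothetical helper: decode one sequence from s[0..len).  Returns the
 * number of octets consumed, or 0 if the input is truncated, malformed,
 * or longer than the shortest representation of its value.
 */
size_t
utf8_decode(const unsigned char *s, size_t len, uint32_t *c)
{
	/* Smallest value that legitimately needs 1, 2, ... 6 octets. */
	static const uint32_t min[6] = {
		0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000
	};
	size_t n, i;
	uint32_t v;

	if (len == 0)
		return (0);
	if (s[0] < 0x80) { n = 1; v = s[0]; }
	else if ((s[0] & 0xe0) == 0xc0) { n = 2; v = s[0] & 0x1f; }
	else if ((s[0] & 0xf0) == 0xe0) { n = 3; v = s[0] & 0x0f; }
	else if ((s[0] & 0xf8) == 0xf0) { n = 4; v = s[0] & 0x07; }
	else if ((s[0] & 0xfc) == 0xf8) { n = 5; v = s[0] & 0x03; }
	else if ((s[0] & 0xfe) == 0xfc) { n = 6; v = s[0] & 0x01; }
	else
		return (0);	/* stray continuation octet, 0xfe or 0xff */
	if (len < n)
		return (0);
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return (0);
		v = (v << 6) | (s[i] & 0x3f);
	}
	if (v < min[n - 1])
		return (0);	/* not the shortest representation */
	*c = v;
	return (n);
}
```

With these helpers, 0x20ac encodes to the three octets 0xe2 0x82 0xac, while the overlong forms 0xc0 0x80 and 0xe0 0x80 0x80 mentioned in DESCRIPTION are refused by the decoder because the decoded value 0 falls below the minimum for a 2-octet or 3-octet sequence.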