Commit | Line | Data |
---|---|---|
5b2abdfb | 1 | .\" Copyright (c) 1993 |
e9ce8d39 A |
2 | .\" The Regents of the University of California. All rights reserved. |
3 | .\" | |
5b2abdfb A |
4 | .\" This code is derived from software contributed to Berkeley by |
5 | .\" Paul Borman at Krystal Technologies. | |
6 | .\" | |
e9ce8d39 A |
7 | .\" Redistribution and use in source and binary forms, with or without |
8 | .\" modification, are permitted provided that the following conditions | |
9 | .\" are met: | |
10 | .\" 1. Redistributions of source code must retain the above copyright | |
11 | .\" notice, this list of conditions and the following disclaimer. | |
12 | .\" 2. Redistributions in binary form must reproduce the above copyright | |
13 | .\" notice, this list of conditions and the following disclaimer in the | |
14 | .\" documentation and/or other materials provided with the distribution. | |
e9ce8d39 A |
15 | .\" 4. Neither the name of the University nor the names of its contributors |
16 | .\" may be used to endorse or promote products derived from this software | |
17 | .\" without specific prior written permission. | |
18 | .\" | |
19 | .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND | |
20 | .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | |
21 | .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | |
22 | .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE | |
23 | .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
24 | .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS | |
25 | .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) | |
26 | .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT | |
27 | .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY | |
28 | .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | |
29 | .\" SUCH DAMAGE. | |
30 | .\" | |
5b2abdfb | 31 | .\" @(#)utf2.4 8.1 (Berkeley) 6/4/93 |
1f2f436a | 32 | .\" $FreeBSD: src/lib/libc/locale/utf8.5,v 1.7 2007/01/09 00:28:01 imp Exp $ |
e9ce8d39 | 33 | .\" |
3d9156a7 | 34 | .Dd April 7, 2004 |
9385eb3d | 35 | .Dt UTF8 5 |
e9ce8d39 A |
36 | .Os |
37 | .Sh NAME | |
9385eb3d A |
38 | .Nm utf8 |
39 | .Nd "UTF-8, a transformation format of ISO 10646" | |
e9ce8d39 | 40 | .Sh SYNOPSIS |
5b2abdfb | 41 | .Nm ENCODING |
9385eb3d | 42 | .Qq UTF-8 |
e9ce8d39 A |
43 | .Sh DESCRIPTION |
44 | The | |
9385eb3d A |
45 | .Nm UTF-8 |
46 | encoding represents UCS-4 characters as a sequence of octets, using | |
47 | between 1 and 6 octets for each character. | |
48 | It is backwards compatible with | |
49 | .Tn ASCII , | |
50 | so 0x00-0x7f refer to the | |
51 | .Tn ASCII | |
52 | character set. | |
53 | The multibyte encoding of | |
54 | .No non- Ns Tn ASCII | |
55 | characters | |
56 | consists entirely of bytes whose high-order bit is set. | |
57 | The actual | |
5b2abdfb | 58 | encoding is represented by the following table: |
e9ce8d39 | 59 | .Bd -literal |
9385eb3d A |
60 | [0x00000000 - 0x0000007f] [00000000.0bbbbbbb] -> 0bbbbbbb |
61 | [0x00000080 - 0x000007ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb | |
62 | [0x00000800 - 0x0000ffff] [bbbbbbbb.bbbbbbbb] -> | |
63 | 1110bbbb, 10bbbbbb, 10bbbbbb | |
64 | [0x00010000 - 0x001fffff] [00000000.000bbbbb.bbbbbbbb.bbbbbbbb] -> | |
5b2abdfb | 65 | 11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb |
9385eb3d | 66 | [0x00200000 - 0x03ffffff] [000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] -> |
5b2abdfb | 67 | 111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb |
9385eb3d | 68 | [0x04000000 - 0x7fffffff] [0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] -> |
5b2abdfb A |
69 | 1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb |
70 | .Ed | |
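The table above can be read as a short encoding routine. The following is a minimal sketch in C, not the libc implementation; the function name utf8_encode and its return convention are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libc): encode one UCS-4 value in
 * the range 0..0x7fffffff into buf, following the table above.
 * Returns the number of octets written (1..6), or 0 if c is out of range.
 */
size_t
utf8_encode(uint32_t c, unsigned char buf[6])
{
	/* Largest value representable in 1, 2, ... 6 octets. */
	static const uint32_t limit[6] = {
		0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff
	};
	/* Fixed high-order bits of the first octet for each length. */
	static const unsigned char lead[6] = {
		0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc
	};
	size_t n, i;

	if (c > 0x7fffffff)
		return (0);
	/* Pick the shortest length whose range contains c. */
	for (n = 0; c > limit[n]; n++)
		;
	n++;
	/* Fill continuation octets (10bbbbbb) from last to first. */
	for (i = n - 1; i > 0; i--) {
		buf[i] = (unsigned char)(0x80 | (c & 0x3f));
		c >>= 6;
	}
	/* The remaining bits go into the leading octet. */
	buf[0] = (unsigned char)(lead[n - 1] | c);
	return (n);
}
```

Under these assumptions, utf8_encode(0x20ac, buf) returns 3 and stores the octets 0xe2 0x82 0xac, which matches the third row of the table.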
71 | .Pp | |
9385eb3d A |
72 | If a value has more than one possible representation (for example, |
73 | 0x00; 0xC0 0x80; 0xE0 0x80 0x80), the shortest representation is always | |
74 | used. | |
75 | Longer ones are rejected as errors because they pose a potential | |
76 | security risk and destroy the 1:1 mapping between characters and octet sequences. | |
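The shortest-form rule can be checked while decoding by comparing the decoded value against the smallest value that legitimately needs the given sequence length. The following is a minimal sketch in C under that assumption; utf8_decode is a hypothetical name and is not the routine libc uses.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libc): decode one character from
 * s[0..len).  Returns the number of octets consumed (1..6), or 0 if
 * the input is malformed, truncated, or not in shortest form.
 */
size_t
utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
	uint32_t c, min;
	size_t n, i;

	if (len == 0)
		return (0);
	/* Classify the leading octet; min is the smallest legal value. */
	if (s[0] < 0x80) {
		n = 1; c = s[0]; min = 0;
	} else if ((s[0] & 0xe0) == 0xc0) {
		n = 2; c = s[0] & 0x1f; min = 0x80;
	} else if ((s[0] & 0xf0) == 0xe0) {
		n = 3; c = s[0] & 0x0f; min = 0x800;
	} else if ((s[0] & 0xf8) == 0xf0) {
		n = 4; c = s[0] & 0x07; min = 0x10000;
	} else if ((s[0] & 0xfc) == 0xf8) {
		n = 5; c = s[0] & 0x03; min = 0x200000;
	} else if ((s[0] & 0xfe) == 0xfc) {
		n = 6; c = s[0] & 0x01; min = 0x4000000;
	} else {
		return (0);	/* stray continuation octet, 0xfe or 0xff */
	}
	if (len < n)
		return (0);	/* truncated sequence */
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return (0);	/* not a 10bbbbbb octet */
		c = (c << 6) | (s[i] & 0x3f);
	}
	if (c < min)
		return (0);	/* longer than the shortest form */
	*out = c;
	return (n);
}
```

With this sketch, the single octet 0x00 decodes to the value 0, while the longer forms 0xC0 0x80 and 0xE0 0x80 0x80 from the example above are both rejected.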
9385eb3d | 77 | .Sh SEE ALSO |
3d9156a7 | 78 | .Xr euc 5 |
9385eb3d A |
79 | .Rs |
80 | .%A "Rob Pike" | |
81 | .%A "Ken Thompson" | |
82 | .%T "Hello World" | |
83 | .%J "Proceedings of the Winter 1993 USENIX Technical Conference" | |
84 | .%Q "USENIX Association" | |
85 | .%D "January 1993" | |
86 | .Re | |
87 | .Rs | |
88 | .%A "F. Yergeau" | |
89 | .%T "UTF-8, a transformation format of ISO 10646" | |
90 | .%O "RFC 2279" | |
91 | .%D "January 1998" | |
92 | .Re | |
93 | .Rs | |
94 | .%Q "The Unicode Consortium" | |
95 | .%T "The Unicode Standard, Version 3.0" | |
96 | .%D "2000" | |
97 | .%O "as amended by the Unicode Standard Annex #27: Unicode 3.1 and by the Unicode Standard Annex #28: Unicode 3.2" | |
98 | .Re | |
99 | .Sh STANDARDS | |
100 | The | |
101 | .Nm | |
102 | encoding is compatible with RFC 2279 and Unicode 3.2. |