Commit | Line | Data |
---|---|---|
3d9156a7 | 1 | .\" Copyright (c) 2002-2004 Tim J. Robbins. All rights reserved. |
5b2abdfb A |
2 | .\" Copyright (c) 1993 |
3 | .\" The Regents of the University of California. All rights reserved. | |
4 | .\" | |
5 | .\" This code is derived from software contributed to Berkeley by | |
6 | .\" Donn Seeley of BSDI. | |
7 | .\" | |
8 | .\" Redistribution and use in source and binary forms, with or without | |
9 | .\" modification, are permitted provided that the following conditions | |
10 | .\" are met: | |
11 | .\" 1. Redistributions of source code must retain the above copyright | |
12 | .\" notice, this list of conditions and the following disclaimer. | |
13 | .\" 2. Redistributions in binary form must reproduce the above copyright | |
14 | .\" notice, this list of conditions and the following disclaimer in the | |
15 | .\" documentation and/or other materials provided with the distribution. | |
5b2abdfb A |
16 | .\" 4. Neither the name of the University nor the names of its contributors |
17 | .\" may be used to endorse or promote products derived from this software | |
18 | .\" without specific prior written permission. | |
19 | .\" | |
20 | .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND | |
21 | .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | |
22 | .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | |
23 | .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE | |
24 | .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
25 | .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS | |
26 | .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) | |
27 | .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT | |
28 | .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY | |
29 | .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | |
30 | .\" SUCH DAMAGE. | |
31 | .\" | |
32 | .\" @(#)multibyte.3 8.1 (Berkeley) 6/4/93 | |
1f2f436a | 33 | .\" $FreeBSD: src/lib/libc/locale/multibyte.3,v 1.28 2007/01/09 00:28:00 imp Exp $ |
5b2abdfb | 34 | .\" |
3d9156a7 | 35 | .Dd April 8, 2004 |
5b2abdfb A |
36 | .Dt MULTIBYTE 3 |
37 | .Os | |
38 | .Sh NAME | |
3d9156a7 A |
39 | .Nm multibyte |
40 | .Nd multibyte and wide character manipulation functions | |
5b2abdfb A |
41 | .Sh LIBRARY |
42 | .Lb libc | |
43 | .Sh SYNOPSIS | |
3d9156a7 | 44 | .In limits.h |
5b2abdfb | 45 | .In stdlib.h |
3d9156a7 | 46 | .In wchar.h |
5b2abdfb | 47 | .Sh DESCRIPTION |
3d9156a7 | 48 | The basic elements of some written natural languages, such as Chinese, |
5b2abdfb | 49 | cannot be represented uniquely with single C |
3d9156a7 | 50 | .Vt char Ns s . |
5b2abdfb | 51 | The C standard supports two different ways of dealing with |
3d9156a7 A |
52 | extended natural language encodings: |
53 | wide characters and | |
54 | multibyte characters. | |
5b2abdfb A |
55 | Wide characters are an internal representation |
56 | which allows each basic element to map | |
57 | to a single object of type | |
3d9156a7 | 58 | .Vt wchar_t . |
5b2abdfb A |
59 | Multibyte characters are used for input and output |
60 | and code each basic element as a sequence of C | |
3d9156a7 | 61 | .Vt char Ns s . |
5b2abdfb A |
62 | Individual basic elements may map into one or more |
63 | (up to | |
9385eb3d | 64 | .Dv MB_LEN_MAX ) |
5b2abdfb A |
65 | bytes in a multibyte character. |
66 | .Pp | |
67 | The current locale | |
68 | .Pq Xr setlocale 3 | |
69 | governs the interpretation of wide and multibyte characters. | |
70 | The locale category | |
71 | .Dv LC_CTYPE | |
72 | specifically controls this interpretation. | |
73 | The | |
3d9156a7 | 74 | .Vt wchar_t |
5b2abdfb A |
75 | type is wide enough to hold the largest value |
76 | in the wide character representations for all locales. | |
77 | .Pp | |
78 | Multibyte strings may contain | |
79 | .Sq shift | |
80 | indicators to switch to and from | |
81 | particular modes within the given representation. | |
82 | If explicit bytes are used to signal shifting, | |
83 | these are not recognized as separate characters | |
84 | but are lumped with a neighboring character. | |
85 | There is always a distinguished | |
86 | .Sq initial | |
87 | shift state. | |
3d9156a7 A |
88 | Some functions (e.g., |
89 | .Xr mblen 3 , | |
90 | .Xr mbtowc 3 | |
5b2abdfb | 91 | and |
3d9156a7 A |
92 | .Xr wctomb 3 ) |
93 | maintain static shift state internally, whereas | |
94 | others store it in an | |
95 | .Vt mbstate_t | |
96 | object passed by the caller. | |
97 | Shift states are undefined after a call to | |
98 | .Xr setlocale 3 | |
5b2abdfb A |
99 | with the |
100 | .Dv LC_CTYPE | |
101 | or | |
102 | .Dv LC_ALL | |
103 | categories. | |
104 | .Pp | |
105 | For convenience in processing, | |
106 | the wide character with value 0 | |
107 | (the null wide character) | |
108 | is recognized as the wide character string terminator, | |
109 | and the character with value 0 | |
110 | (the null byte) | |
111 | is recognized as the multibyte character string terminator. | |
112 | Null bytes are not permitted within multibyte characters. | |
113 | .Pp | |
3d9156a7 A |
114 | The C library provides the following functions for dealing with |
115 | multibyte characters: | |
116 | .Bl -column "Description" | |
117 | .It Sy Function Ta Sy Description | |
118 | .It Xr mblen 3 Ta "get number of bytes in a character" | |
119 | .It Xr mbrlen 3 Ta "get number of bytes in a character (restartable)" | |
120 | .It Xr mbrtowc 3 Ta "convert a character to a wide-character code (restartable)" | |
121 | .It Xr mbsrtowcs 3 Ta "convert a character string to a wide-character string (restartable)" | |
122 | .It Xr mbstowcs 3 Ta "convert a character string to a wide-character string" | |
123 | .It Xr mbtowc 3 Ta "convert a character to a wide-character code" | |
124 | .It Xr wcrtomb 3 Ta "convert a wide-character code to a character (restartable)" | |
125 | .It Xr wcstombs 3 Ta "convert a wide-character string to a character string" | |
126 | .It Xr wcsrtombs 3 Ta "convert a wide-character string to a character string (restartable)" | |
127 | .It Xr wctomb 3 Ta "convert a wide-character code to a character" | |
128 | .El | |
9385eb3d | 129 | .Sh SEE ALSO |
3d9156a7 | 130 | .Xr mklocale 1 , |
5b2abdfb | 131 | .Xr setlocale 3 , |
3d9156a7 A |
132 | .Xr stdio 3 , |
133 | .Xr big5 5 , | |
134 | .Xr euc 5 , | |
135 | .Xr gb18030 5 , | |
136 | .Xr gb2312 5 , | |
137 | .Xr gbk 5 , | |
138 | .Xr mskanji 5 , | |
9385eb3d | 139 | .Xr utf8 5 |
5b2abdfb | 140 | .Sh STANDARDS |
3d9156a7 A |
141 | These functions conform to |
142 | .St -isoC-99 . |
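
The DESCRIPTION above distinguishes functions that keep shift state internally from those that take an `mbstate_t` supplied by the caller. As a minimal sketch of the second style, the helper below walks a multibyte string with `mbrtowc(3)` and counts its characters. The name `count_mb_chars` is hypothetical and is not part of libc; the result depends on the current `LC_CTYPE` locale.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Count the characters in the NUL-terminated multibyte string s,
 * interpreted according to the current LC_CTYPE locale.
 * Returns (size_t)-1 if s contains an invalid or incomplete sequence.
 * count_mb_chars is a hypothetical helper name, not a libc function.
 */
size_t
count_mb_chars(const char *s)
{
	mbstate_t st;
	size_t count = 0;
	size_t left = strlen(s);	/* bytes remaining, excluding the null byte */
	size_t n;

	memset(&st, 0, sizeof(st));	/* begin in the initial shift state */
	while (left > 0) {
		/* A NULL first argument asks only for the length of the next character. */
		n = mbrtowc(NULL, s, left, &st);
		if (n == (size_t)-1 || n == (size_t)-2)
			return ((size_t)-1);	/* invalid or truncated sequence */
		if (n == 0)
			break;			/* null wide character: end of string */
		s += n;
		left -= n;
		count++;
	}
	return (count);
}
```

A caller would normally run `setlocale(LC_CTYPE, "")` first so that the string is decoded in the user's encoding. In the default "C" locale each ASCII byte is one character, so `count_mb_chars("hello")` returns 5. Because the conversion state lives in the local `st` object rather than in static storage, the helper does not disturb the internal state used by `mblen(3)`, `mbtowc(3)` or `wctomb(3)`.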