1 .\" $NetBSD: vis.3,v 1.39 2013/02/20 20:05:26 christos Exp $
4 .\" Copyright (c) 1989, 1991, 1993
5 .\" The Regents of the University of California. All rights reserved.
7 .\" Redistribution and use in source and binary forms, with or without
8 .\" modification, are permitted provided that the following conditions
10 .\" 1. Redistributions of source code must retain the above copyright
11 .\" notice, this list of conditions and the following disclaimer.
12 .\" 2. Redistributions in binary form must reproduce the above copyright
13 .\" notice, this list of conditions and the following disclaimer in the
14 .\" documentation and/or other materials provided with the distribution.
15 .\" 3. Neither the name of the University nor the names of its contributors
16 .\" may be used to endorse or promote products derived from this software
17 .\" without specific prior written permission.
19 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
20 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
22 .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
23 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
25 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
26 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
27 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
28 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
31 .\" @(#)vis.3 8.1 (Berkeley) 6/9/93
51 .Nd visually encode characters
57 .Fn vis "char *dst" "int c" "int flag" "int nextc"
59 .Fn nvis "char *dst" "size_t dlen" "int c" "int flag" "int nextc"
61 .Fn strvis "char *dst" "const char *src" "int flag"
63 .Fn strnvis "char *dst" "size_t dlen" "const char *src" "int flag"
65 .Fn strvisx "char *dst" "const char *src" "size_t len" "int flag"
67 .Fn strnvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag"
69 .Fn strenvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "int *cerr_ptr"
71 .Fn svis "char *dst" "int c" "int flag" "int nextc" "const char *extra"
73 .Fn snvis "char *dst" "size_t dlen" "int c" "int flag" "int nextc" "const char *extra"
75 .Fn strsvis "char *dst" "const char *src" "int flag" "const char *extra"
77 .Fn strsnvis "char *dst" "size_t dlen" "const char *src" "int flag" "const char *extra"
79 .Fn strsvisx "char *dst" "const char *src" "size_t len" "int flag" "const char *extra"
81 .Fn strsnvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "const char *extra"
83 .Fn strsenvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "const char *extra" "int *cerr_ptr"
90 a string which represents the character
94 needs no encoding, it is copied in unaltered.
95 The string is NUL-terminated, and a pointer to the end of the string is
97 The maximum length of any encoding is four
98 bytes (not including the trailing
101 encoding a set of characters into a buffer, the size of the buffer should
102 be four times the number of bytes encoded, plus one for the trailing
104 The flag parameter is used for altering the default range of
105 characters considered for encoding and for altering the visual
107 The additional character,
109 is only used when selecting the
111 encoding format (explained below).
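.Pp
The sizing rule above (four bytes per encoded byte, plus one for the
terminator) can be sketched as a small helper.
This is an illustration only: vis_bufsize is a hypothetical name that is
not part of this library, and the overflow check is an added assumption.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the vis API: returns the destination
 * size needed to encode n source bytes (four bytes per source byte plus
 * one for the terminator), or 0 if the result would overflow size_t.
 */
static size_t
vis_bufsize(size_t n)
{
	if (n > (SIZE_MAX - 1) / 4)
		return 0;	/* 4 * n + 1 would not fit in size_t */
	return 4 * n + 1;
}
```

A caller would typically allocate vis_bufsize(len) bytes for the
destination and treat a return value of 0 as an error.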
121 a visual representation of
128 functions encode characters from
137 functions encode exactly
142 is useful for encoding a block of data that may contain
150 must be four times the number
151 of bytes encoded from
156 forms return the number of characters in
158 (not including the trailing
162 versions of the functions also take an additional argument
164 that indicates the length of the
169 is not large enough to fit the converted string then the
173 functions return \-1 and set
179 function takes an additional argument,
181 that is used to pass in and out a multibyte conversion error flag.
182 This is useful when processing one character at a time and
183 the locale may be set to something other
184 than the locale of the characters in the input data.
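.Pp
The failure contract of the size-bounded variants (return \-1 and report
ENOSPC when the destination is too small) can be illustrated with a
self-contained stand-in.
The function bounded_octal_encode below is hypothetical and is not the
library's implementation: it encodes every byte in three-digit octal
form, which is simpler than what the real functions do.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a size-bounded encoder; NOT the library's
 * code. Every source byte is written as a backslash followed by three
 * octal digits. Returns the number of bytes written (excluding the
 * terminating NUL), or -1 with errno set to ENOSPC if dst is too small.
 */
static int
bounded_octal_encode(char *dst, size_t dlen, const char *src, size_t len)
{
	size_t i;

	/* Reject sizes whose worst-case length cannot be computed or stored. */
	if (len > (SIZE_MAX - 1) / 4 || dlen < 4 * len + 1) {
		errno = ENOSPC;
		return -1;
	}
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)src[i];

		*dst++ = '\\';
		*dst++ = (char)('0' + ((c >> 6) & 07));
		*dst++ = (char)('0' + ((c >> 3) & 07));
		*dst++ = (char)('0' + (c & 07));
	}
	*dst = '\0';
	return (int)(4 * len);
}
```

Checking the return value before using the destination buffer mirrors
how a caller would handle the \-1 result described above.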
204 but have an additional argument
208 terminated list of characters.
209 These characters are encoded or backslash-escaped when copied into
211 These functions are useful, for example, for removing the special meaning
212 of certain characters to shells.
214 The encoding is a unique, invertible representation composed entirely of
215 graphic characters; it can be decoded back into the original form using
223 There are two parameters that can be controlled: the range of
224 characters that are encoded (applies only to
232 and the type of representation used.
233 By default, all non-graphic characters,
234 except space, tab, and newline, are encoded (see
238 .Bl -tag -width VIS_WHITEX
240 Also encode the magic characters
265 Unsafe means control characters which may cause common terminals to perform
266 unexpected functions.
267 Currently this form allows space, tab, newline, backspace, bell, and
268 return \(em in addition to all graphic characters \(em unencoded.
271 (The above flags have no effect for
279 When using these functions, place all graphic characters to be
280 encoded in an array pointed to by
282 In general, the backslash character should be included in this array, see the
283 warning on the use of the
287 There are four forms of encoding.
288 All forms use the backslash character
290 to introduce a special
291 sequence; two backslashes are used to represent a real backslash,
300 These are the visual formats:
301 .Bl -tag -width VIS_CSTYLE
305 to represent meta characters (characters with the 8th
306 bit set), and use caret
308 to represent control characters (see
310 The following formats are used:
311 .Bl -tag -width xxxxx
313 Represents the control character
326 with the 8th bit set.
332 Represents control character
334 with the 8th bit set.
348 Represents Meta-space.
352 Use C-style backslash sequences to represent standard non-printable
354 The following sequences are used to represent the indicated characters:
355 .Bd -unfilled -offset indent
356 .Li \ea Tn \(em BEL No (007)
357 .Li \eb Tn \(em BS No (010)
358 .Li \ef Tn \(em NP No (014)
359 .Li \en Tn \(em NL No (012)
360 .Li \er Tn \(em CR No (015)
361 .Li \es Tn \(em SP No (040)
362 .Li \et Tn \(em HT No (011)
363 .Li \ev Tn \(em VT No (013)
364 .Li \e0 Tn \(em NUL No (000)
367 When using this format, the
369 parameter is examined to determine whether a
371 character can be encoded as
377 is an octal digit, the latter representation is used to
380 Use a three digit octal sequence.
385 represents an octal digit.
387 Use URI encoding as described in RFC 1738.
392 represents a lower case hexadecimal digit.
394 Use MIME Quoted-Printable encoding as described in RFC 2045, except that
395 lines are not broken and CRLF is not handled.
400 represents an upper case hexadecimal digit.
403 There is one additional flag,
406 doubling of backslashes and the backslash before the default
407 format (that is, control characters are represented by
412 With this flag set, the encoding is
413 ambiguous and non-invertible.
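.Pp
The per-byte shape of the octal, URI, and MIME forms described above can
be sketched with snprintf.
The helper names fmt_octal, fmt_uri, and fmt_mime are hypothetical and
only show the output of each form for one byte; they do not reproduce
the library's rules for deciding which bytes get encoded.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helpers, not part of the vis API. Each writes the
 * NUL-terminated encoding of one byte into buf, which should hold at
 * least five bytes.
 */

/* Octal form: a backslash followed by three octal digits. */
static void
fmt_octal(char *buf, size_t n, unsigned char c)
{
	snprintf(buf, n, "\\%03o", (unsigned int)c);
}

/* URI form: a percent sign followed by two lower case hex digits. */
static void
fmt_uri(char *buf, size_t n, unsigned char c)
{
	snprintf(buf, n, "%%%02x", (unsigned int)c);
}

/* MIME form: an equals sign followed by two upper case hex digits. */
static void
fmt_mime(char *buf, size_t n, unsigned char c)
{
	snprintf(buf, n, "=%02X", (unsigned int)c);
}
```

Each form stays within the four-byte maximum for a single encoded byte,
so a five-byte buffer is enough for one result and its terminator.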
414 .Sh MULTIBYTE CHARACTER SUPPORT
415 These functions support multibyte character input.
416 The encoding conversion is influenced by the setting of the
418 environment variable which defines the set of characters
419 that can be copied without encoding.
421 When 8-bit data is present in the input,
423 must be set to the correct locale or to the C locale.
424 If the locales of the data and the conversion are mismatched,
425 multibyte character recognition may fail and encoding will be performed
426 byte-by-byte instead.
430 must be four times the number of bytes processed from
432 But note that each multibyte character can be up to
436 .\" .Xr multibyte 3 )
437 so in terms of multibyte characters,
441 times the number of characters processed from
444 .Bl -tag -width ".Ev LC_CTYPE"
446 Specify the locale of the input data.
447 Set to C if the input data locale is unknown.
462 will return \-1 when the
464 destination buffer size is not large enough to perform the conversion while
468 .Bl -tag -width ".Bq Er ENOSPC"
470 The destination buffer size is not large enough to perform the conversion.
476 .\" .Xr multibyte 3 ,
480 .%T Uniform Resource Locators (URL)
484 .%T "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies"
493 functions first appeared in
506 functions as well as multibyte character support were added in OS X 10.12.