.\" Hey, Emacs! This is -*-nroff-*- you know...
.\"
.\" gendict.1: manual page for the gendict utility
.\"
.\" Copyright (C) 2016 and later: Unicode, Inc. and others.
.\" License & terms of use: http://www.unicode.org/copyright.html
.\" Copyright (C) 2012 International Business Machines Corporation and others
.\"
.TH GENDICT 1 "1 June 2012" "ICU MANPAGE" "ICU @VERSION@ Manual"
.SH NAME
.B gendict
\- Compiles a word list into an ICU string trie dictionary
.SH SYNOPSIS
.B gendict
[
.BR "\fB\-\-uchars"
|
.BR "\fB\-\-bytes"
.BI "\fB\-\-transform" " transform"
]
[
.BR "\-h\fP, \fB\-?\fP, \fB\-\-help"
]
[
.BR "\-V\fP, \fB\-\-version"
]
[
.BR "\-c\fP, \fB\-\-copyright"
]
[
.BR "\-v\fP, \fB\-\-verbose"
]
[
.BI "\-i\fP, \fB\-\-icudatadir" " directory"
]
.IR " input\-file"
.IR " output\-file"
.SH DESCRIPTION
.B gendict
reads the word list from
.I input\-file
and creates a string trie dictionary file. Normally this data file has the
.B .dict
extension.
.PP
Each word starts at the beginning of a line and ends at the first whitespace character.
Lines that begin with whitespace are ignored.
.SH OPTIONS
.TP
.BR "\-h\fP, \fB\-?\fP, \fB\-\-help"
Print help about usage and exit.
.TP
.BR "\-V\fP, \fB\-\-version"
Print the version of
.B gendict
and exit.
.TP
.BR "\-c\fP, \fB\-\-copyright"
Embeds the standard ICU copyright into the
.IR output\-file .
.TP
.BR "\-v\fP, \fB\-\-verbose"
Display extra informative messages during execution.
.TP
.BI "\-i\fP, \fB\-\-icudatadir" " directory"
Look for any necessary ICU data files in
.IR directory .
For example, the file
.B pnames.icu
must be located when ICU's data is not built as a shared library.
The default ICU data directory is specified by the environment variable
.BR ICU_DATA .
Most configurations of ICU do not require this argument.
.TP
.BR "\fB\-\-uchars"
Set the output trie type to UChar. Mutually exclusive with
.BR \-\-bytes .
.TP
.BR "\fB\-\-bytes"
Set the output trie type to Bytes. Mutually exclusive with
.BR \-\-uchars .
.TP
.BI "\fB\-\-transform" " transform"
Set the transform type. Should only be specified with
.BR \-\-bytes .
The currently supported transform is
.BR offset\-<hex\-number> ,
which specifies an offset to subtract from all input characters.
The offset transform also maps U+200D
to 0xFF and U+200C to 0xFE, for compatibility with
languages that require these characters.
A transform must be specified for a bytes trie, and when applied
to the non-value characters in the
.I input\-file
it must produce values between 0x00 and 0xFF.
.TP
.BI " input\-file"
The source file to read.
.TP
.BI " output\-file"
The file to write the output dictionary to.
.SH CAVEATS
The
.I input\-file
is assumed to be encoded in UTF-8.
The integers in the
.I input\-file
that are used as values must be made up of ASCII digits. They
may be specified either in hex, by using a 0x prefix, or in
decimal.
Either
.B \-\-bytes
or
.B \-\-uchars
must be specified.
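.SH EXAMPLES
The following is a minimal sketch rather than a definitive recipe.
The file names
.BR words.txt ,
.BR words.dict ,
.B thai.txt
and
.B thai.dict
are hypothetical. The sample also assumes that an optional integer value
follows each word after whitespace, and that the hex number of the offset
transform may carry a 0x prefix.
.PP
A small
.I input\-file
with one decimal and one hexadecimal value:
.PP
.RS
.nf
apple 10
banana 0x1F
.fi
.RE
.PP
Building a UChars trie from it, with verbose output:
.PP
.RS
.nf
gendict \-\-uchars \-v words.txt words.dict
.fi
.RE
.PP
Building a bytes trie, assuming every word character in
.B thai.txt
lies in the range U+0E01 to U+0E7F, so that subtracting 0x0E00
yields bytes 0x01 to 0x7F:
.PP
.RS
.nf
gendict \-\-bytes \-\-transform offset\-0x0e00 thai.txt thai.dict
.fi
.RE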
.SH ENVIRONMENT
.TP 10
.B ICU_DATA
Specifies the directory containing ICU data. Defaults to
.BR @thepkgicudatadir@/@PACKAGE@/@VERSION@/ .
Some tools in ICU depend on the presence of the trailing slash. It is thus
important to make sure that it is present if
.B ICU_DATA
is set.
.SH AUTHORS
Maxime Serrano
.SH VERSION
1.0
.SH COPYRIGHT
Copyright (C) 2012 International Business Machines Corporation and others
.SH SEE ALSO
.BR http://www.icu-project.org/userguide/boundaryAnalysis.html