X-Git-Url: https://git.saurik.com/wxWidgets.git/blobdiff_plain/07fce3c2f91978b00eef7a74615badb374774fc0..449d48f9e15e1430805aed3b33e55000754ad926:/docs/html/gettext/gettext_6.html diff --git a/docs/html/gettext/gettext_6.html b/docs/html/gettext/gettext_6.html new file mode 100644 index 0000000000..09387ebe7a --- /dev/null +++ b/docs/html/gettext/gettext_6.html @@ -0,0 +1,258 @@ + + + + +GNU gettext utilities - Producing Binary MO Files + + + + + + +



+ + +

Producing Binary MO Files

+ + + +

Invoking the msgfmt Program

+ + +
+Usage: msgfmt [option] filename.po ...
+
+ +
+ +
`-a number' +
+
`--alignment=number' +
+Align strings to number bytes (default: 1). + +
`-h' +
+
`--help' +
+Display this help and exit. + +
`--no-hash' +
+Binary file will not include the hash table. + +
`-o file' +
+
`--output-file=file' +
+Specify output file name as file. + +
`--strict' +
+Direct the program to work strictly following the Uniforum/Sun +implementation. Currently this only affects the naming of the output +file. If this option is not given the name of the output file is the +same as the domain name. If the strict Uniforum mode is enabled the +suffix `.mo' is added to the file name if it is not already +present. + +We find this behaviour of Sun's implementation rather silly and so by +default this mode is not selected. + +
`-v' +
+
`--verbose' +
+Detect and diagnose input file anomalies which might represent +translation errors. The msgid and msgstr strings are +studied and compared. It is considered abnormal that one string +starts or ends with a newline while the other does not. + +Also, if the string represents a format string used in a +printf-like function both strings should have the same number of +`%' format specifiers, with matching types. If the flag +c-format or possible-c-format appears in the special +comment #, for this entry a check is performed. For example, the +check will diagnose using `%.*s' against `%s', or `%d' +against `%s', or `%d' against `%x'. It can even handle +positional parameters. + +Normally the xgettext program automatically decides whether a +string is a format string or not. This algorithm is not perfect, +though. It might regard a string as a format string though it is not +used in a printf-like function and so msgfmt might report +errors where there are none. Or the other way round: a string is not +regarded as a format string but it is used in a printf-like +function. + +To solve this problem the programmer can dictate the decision to the +xgettext program (see section Special Comments preceding Keywords). The translator should not +consider removing the flag from the #, line. This "fix" would be +reversed again as soon as msgmerge is called the next time. + +
`-V' +
+
`--version' +
+Output version information and exit. + +
+ +

+If input file is `-', standard input is read. If output file +is `-', output is written to standard output. + +
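The two checks described under `--verbose' can be pictured with a small sketch. This is not the code msgfmt uses. The names check_entry and argument_kinds are hypothetical, the pattern covers only common printf conversions, and positional parameters (which the real check handles) are left out.

```python
import re

# Simplified printf conversion pattern. Assumption: positional forms such as
# "%1$s" are not recognised here, unlike in the real msgfmt check.
_CONV = re.compile(
    r"%[-+ #0]*(\*|\d+)?(?:\.(\*|\d+))?(hh|h|ll|l|L|z|j|t)?([diouxXeEfgGcsp%])"
)


def argument_kinds(text):
    """List the argument kinds a printf-like call would consume for text."""
    kinds = []
    for width, precision, length, conv in _CONV.findall(text):
        if conv == "%":  # "%%" is a literal percent sign, no argument
            continue
        # Each "*" in width or precision consumes one extra argument.
        kinds += ["*"] * [width, precision].count("*")
        kinds.append(length + conv)
    return kinds


def check_entry(msgid, msgstr, c_format=False):
    """Return a list of diagnostics for one msgid/msgstr pair."""
    problems = []
    if msgid.startswith("\n") != msgstr.startswith("\n"):
        problems.append("newline mismatch at start")
    if msgid.endswith("\n") != msgstr.endswith("\n"):
        problems.append("newline mismatch at end")
    # The format comparison only runs for entries flagged as c-format.
    if c_format and argument_kinds(msgid) != argument_kinds(msgstr):
        problems.append("format specifier mismatch")
    return problems
```

A plain width such as `%5d' against `%d' is accepted by this sketch, since both consume one argument of the same kind.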

+ + +

The Format of GNU MO Files

+ +

+The format of the generated MO files is best described by a picture, +which appears below. + +

+

+The first two words serve to identify the file. The magic +number will always signal GNU MO files. The number is stored in the +byte order of the generating machine, so the magic number really is +two numbers: 0x950412de and 0xde120495. The second +word describes the current revision of the file format. For now the +revision is 0. This might change in future versions, and ensures +that the readers of MO files can distinguish new formats from old +ones, so that both can be handled correctly. The version is kept +separate from the magic number, instead of using different magic +numbers for different formats, mainly because `/etc/magic' is +not updated often. It might be better to have magic separated from +internal format version identification. + +
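Because the magic number shows up as one of two values depending on the writer's byte order, a reader can use it to decide how to decode every other word in the file. The following is a minimal sketch of that idea; the function name detect_byte_order is an assumption, and the return value is a prefix for Python's struct module.

```python
import struct

MAGIC = 0x950412DE          # as seen when read in the writer's byte order
MAGIC_SWAPPED = 0xDE120495  # as seen when read in the opposite byte order


def detect_byte_order(header):
    """Return "<" (little-endian) or ">" (big-endian) for an MO file header."""
    (value,) = struct.unpack("<I", header[:4])
    if value == MAGIC:
        return "<"
    if value == MAGIC_SWAPPED:
        return ">"
    raise ValueError("not a GNU MO file")
```

The result can be prepended to later format strings, for example struct.unpack(order + "I", header[4:8]) to read the revision word.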

+

+Next come a number of pointers to later tables in the file, allowing +for the extension of the prefix part of MO files without having to +recompile programs reading them. This might become useful for later +inserting a few flag bits, an indication of the charset used, new +tables, or other things. + +

+

+Then, at offset O and offset T in the picture, two tables +of string descriptors can be found. In both tables, each string +descriptor uses two 32-bit integers, one for the string length, +another for the offset of the string in the MO file, counting in bytes +from the start of the file. The first table contains descriptors +for the original strings, and is sorted so the original strings +are in increasing lexicographical order. The second table contains +descriptors for the translated strings, and is parallel to the first +table: to find the corresponding translation one has to access the +array slot in the second array with the same index. + +
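The parallel layout of the two descriptor tables can be sketched as follows. The function name read_pairs is hypothetical, and the sketch assumes a little-endian file that has already been read into memory as bytes.

```python
import struct


def read_pairs(data):
    """Return [(original, translation), ...] in table order.

    Assumption: data holds a little-endian MO file. Words 2 to 4 of the
    header are N (number of strings), O and T (the two table offsets).
    """
    n, o, t = struct.unpack_from("<3I", data, 8)
    pairs = []
    for i in range(n):
        # Each descriptor is two 32-bit words: length, then file offset.
        olen, ooff = struct.unpack_from("<2I", data, o + 8 * i)
        tlen, toff = struct.unpack_from("<2I", data, t + 8 * i)
        pairs.append((data[ooff:ooff + olen], data[toff:toff + tlen]))
    return pairs
```

Slot i of the second table always belongs to slot i of the first, so no further index is needed to match a translation to its original.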

+

+Having the original strings sorted enables the use of simple binary +search, for when the MO file does not contain a hash table, or +for when it is not practical to use the hash table provided in +the MO file. This also has another advantage, as the empty string +in a PO file is usually translated by GNU gettext into +some system information attached to that particular MO file, and the +empty string necessarily becomes the first in both the original and +translated tables, making the system information very easy to find. + +
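A lookup that relies only on the sorted table of original strings, and ignores any hash table, might look like the sketch below. The name lookup is an assumption, strings are compared as raw bytes, and the whole file is assumed to be in memory.

```python
import struct


def lookup(data, msgid):
    """Binary-search the sorted originals; return translation bytes or None."""
    (magic,) = struct.unpack("<I", data[:4])
    if magic == 0x950412DE:
        order = "<"
    elif magic == 0xDE120495:
        order = ">"
    else:
        raise ValueError("not a GNU MO file")
    n, o, t = struct.unpack_from(order + "3I", data, 8)

    def string_at(table, index):
        # Follow one (length, offset) descriptor to the string it names.
        length, offset = struct.unpack_from(order + "2I", data, table + 8 * index)
        return data[offset:offset + length]

    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        candidate = string_at(o, mid)
        if candidate == msgid:
            return string_at(t, mid)  # same index in the parallel table
        if candidate < msgid:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```

Looking up the empty string returns the system information, which sits in slot 0 because the empty string sorts before every other string.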

+

+The size S of the hash table can be zero. In this case, the +hash table itself is not contained in the MO file. Some people might +prefer this because a precomputed hash table takes disk space, and +does not gain that much speed. The hash table contains indices +to the sorted array of strings in the MO file. Conflict resolution is +done by double hashing. The precise hashing algorithm used is fairly +dependent on GNU gettext code, and is not documented here. + +

+

+As for the strings themselves, they follow the hash table, and each +is terminated with a NUL, and this NUL is not counted in +the length which appears in the string descriptor. The msgfmt +program has an option selecting the alignment for MO file strings. +With this option, each string is separately aligned so it starts at +an offset which is a multiple of the alignment value. On some RISC +machines, a correct alignment will speed things up. + +
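The effect of the alignment option on string offsets can be worked through with a short sketch. The helper names align_up and place_strings are hypothetical, and the sketch only computes descriptors. It does not write any padding bytes.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (alignment >= 1)."""
    return (offset + alignment - 1) // alignment * alignment


def place_strings(strings, start, alignment=1):
    """Return one (length, offset) descriptor per string, laid out from start."""
    descriptors = []
    pos = start
    for s in strings:
        pos = align_up(pos, alignment)    # each string is aligned separately
        descriptors.append((len(s), pos)) # the length excludes the NUL
        pos += len(s) + 1                 # skip the string and its NUL
    return descriptors
```

With the default alignment of 1 the strings are packed back to back, separated only by their terminating NULs.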

+

+Nothing prevents an MO file from having embedded NULs in strings. +However, the program interface currently used already presumes +that strings are NUL terminated, so embedded NULs are +somewhat useless. But the MO file format is general enough that other +interfaces would be possible later, if for example, we ever want to +implement wide characters right in MO files, where NUL bytes may +accidentally appear. + +

+

+This particular issue has been strongly debated in the GNU +gettext development forum, and it is to be expected that the MO file +format will evolve or change over time. It is even possible that many +formats may later be supported concurrently. But surely, we have to +start somewhere, and the MO file format described here is a good start. +Nothing is cast in concrete, and the format may later evolve fairly +easily, so we should feel comfortable with the current approach. + +

+ +
+        byte
+             +------------------------------------------+
+          0  | magic number = 0x950412de                |
+             |                                          |
+          4  | file format revision = 0                 |
+             |                                          |
+          8  | number of strings                        |  == N
+             |                                          |
+         12  | offset of table with original strings    |  == O
+             |                                          |
+         16  | offset of table with translation strings |  == T
+             |                                          |
+         20  | size of hashing table                    |  == S
+             |                                          |
+         24  | offset of hashing table                  |  == H
+             |                                          |
+             .                                          .
+             .    (possibly more entries later)         .
+             .                                          .
+             |                                          |
+          O  | length & offset 0th string  ----------------.
+      O + 8  | length & offset 1st string  ------------------.
+              ...                                    ...   | |
+O + ((N-1)*8)| length & offset (N-1)th string           |  | |
+             |                                          |  | |
+          T  | length & offset 0th translation  ---------------.
+      T + 8  | length & offset 1st translation  -----------------.
+              ...                                    ...   | | | |
+T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
+             |                                          |  | | | |
+          H  | start hash table                         |  | | | |
+              ...                                    ...   | | | |
+  H + S * 4  | end hash table                           |  | | | |
+             |                                          |  | | | |
+             | NUL terminated 0th string  <----------------' | | |
+             |                                          |    | | |
+             | NUL terminated 1st string  <------------------' | |
+             |                                          |      | |
+              ...                                    ...       | |
+             |                                          |      | |
+             | NUL terminated 0th translation  <---------------' |
+             |                                          |        |
+             | NUL terminated 1st translation  <-----------------'
+             |                                          |
+              ...                                    ...
+             |                                          |
+             +------------------------------------------+
+
+ +
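The layout in the picture can be exercised by writing a tiny MO file by hand. The sketch below is not how msgfmt works internally. The name build_mo is hypothetical, the output is little-endian, the hash table is omitted (S = 0), and strings are encoded as UTF-8 with alignment 1. The result is loaded back with the gettext module of the Python standard library as a sanity check.

```python
import gettext
import io
import struct


def build_mo(catalog):
    """Serialize {msgid: msgstr} into MO bytes without a hash table (S = 0)."""
    entries = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in catalog.items())
    n = len(entries)
    o = 28          # original descriptors follow the seven header words
    t = o + 8 * n   # translation descriptors follow the original ones
    h = t + 8 * n   # an empty hash table would start here
    descriptors = []
    blob = b""
    pos = h         # with S = 0 the strings begin where the hash table would
    # All originals come first, then all translations, as in the picture.
    for text in [e[0] for e in entries] + [e[1] for e in entries]:
        descriptors.append(struct.pack("<2I", len(text), pos))
        blob += text + b"\0"
        pos += len(text) + 1
    header = struct.pack("<7I", 0x950412DE, 0, n, o, t, 0, h)
    return header + b"".join(descriptors) + blob


demo = build_mo({
    "": "Content-Type: text/plain; charset=UTF-8\n",  # system information entry
    "hello": "bonjour",
    "bye": "au revoir",
})
translations = gettext.GNUTranslations(io.BytesIO(demo))
```

After this, translations.gettext("hello") gives the translated string, which suggests the hand-built file follows the layout closely enough for an independent reader to accept it.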


+
