X-Git-Url: https://git.saurik.com/wxWidgets.git/blobdiff_plain/c3b177ae6310ed80f054ae227513ef681f9c3dad..65baafba0e8cd74f2264b7e2f7625ff5bea84864:/docs/html/gettext/gettext_6.html diff --git a/docs/html/gettext/gettext_6.html b/docs/html/gettext/gettext_6.html new file mode 100644 index 0000000000..b106043570 --- /dev/null +++ b/docs/html/gettext/gettext_6.html @@ -0,0 +1,258 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.54 + from gettext.texi on 25 January 1999 --> + +<TITLE>GNU gettext utilities - Producing Binary MO Files</TITLE> +<link href="gettext_7.html" rel=Next> +<link href="gettext_5.html" rel=Previous> +<link href="gettext_toc.html" rel=ToC> + +</HEAD> +<BODY> +<p>Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_5.html">previous</A>, <A HREF="gettext_7.html">next</A>, <A HREF="gettext_12.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC32" HREF="gettext_toc.html#TOC32">Producing Binary MO Files</A></H1> + + + +<H2><A NAME="SEC33" HREF="gettext_toc.html#TOC33">Invoking the <CODE>msgfmt</CODE> Program</A></H2> + + +<PRE> +Usage: msgfmt [<VAR>option</VAR>] <VAR>filename</VAR>.po ... +</PRE> + +<DL COMPACT> + +<DT><SAMP>`-a <VAR>number</VAR>'</SAMP> +<DD> +<DT><SAMP>`--alignment=<VAR>number</VAR>'</SAMP> +<DD> +Align strings to <VAR>number</VAR> bytes (default: 1). + +<DT><SAMP>`-h'</SAMP> +<DD> +<DT><SAMP>`--help'</SAMP> +<DD> +Display this help and exit. + +<DT><SAMP>`--no-hash'</SAMP> +<DD> +Binary file will not include the hash table. + +<DT><SAMP>`-o <VAR>file</VAR>'</SAMP> +<DD> +<DT><SAMP>`--output-file=<VAR>file</VAR>'</SAMP> +<DD> +Specify output file name as <VAR>file</VAR>. + +<DT><SAMP>`--strict'</SAMP> +<DD> +Direct the program to work strictly following the Uniforum/Sun +implementation. Currently this only affects the naming of the output +file. If this option is not given the name of the output file is the +same as the domain name. 
If the strict Uniforum mode is enabled, the +suffix <TT>`.mo'</TT> is added to the file name if it is not already +present. + +We find this behaviour of Sun's implementation rather silly and so by +default this mode is <EM>not</EM> selected. + +<DT><SAMP>`-v'</SAMP> +<DD> +<DT><SAMP>`--verbose'</SAMP> +<DD> +Detect and diagnose input file anomalies which might represent +translation errors. The <CODE>msgid</CODE> and <CODE>msgstr</CODE> strings are +studied and compared. It is considered abnormal that one string +starts or ends with a newline while the other does not. + +Also, if the string represents a format string used in a +<CODE>printf</CODE>-like function, both strings should have the same number of +<SAMP>`%'</SAMP> format specifiers, with matching types. If the flag +<CODE>c-format</CODE> or <CODE>possible-c-format</CODE> appears in the special +comment <KBD>#,</KBD> for this entry, a check is performed. For example, the +check will diagnose using <SAMP>`%.*s'</SAMP> against <SAMP>`%s'</SAMP>, or <SAMP>`%d'</SAMP> +against <SAMP>`%s'</SAMP>, or <SAMP>`%d'</SAMP> against <SAMP>`%x'</SAMP>. It can even handle +positional parameters. + +Normally the <CODE>xgettext</CODE> program automatically decides whether a +string is a format string or not. This algorithm is not perfect, +though. It might regard a string as a format string even though it is not +used in a <CODE>printf</CODE>-like function, and so <CODE>msgfmt</CODE> might report +errors where there are none. Or the other way round: a string is not +regarded as a format string but it is used in a <CODE>printf</CODE>-like +function. + +To solve this problem, the programmer can dictate the decision to the +<CODE>xgettext</CODE> program (see section <A HREF="gettext_3.html#SEC17">Special Comments preceding Keywords</A>). The translator should not +consider removing the flag from the <KBD>#,</KBD> line. This "fix" would be +reversed again as soon as <CODE>msgmerge</CODE> is called the next time.
+ +<DT><SAMP>`-V'</SAMP> +<DD> +<DT><SAMP>`--version'</SAMP> +<DD> +Output version information and exit. + +</DL> + +<P> +If the input file is <SAMP>`-'</SAMP>, standard input is read. If the output file +is <SAMP>`-'</SAMP>, output is written to standard output. + +</P> + + +<H2><A NAME="SEC34" HREF="gettext_toc.html#TOC34">The Format of GNU MO Files</A></H2> + +<P> +The format of the generated MO files is best described by a picture, +which appears below. + +</P> +<P> +The first two words serve to identify the file. The magic +number will always signal GNU MO files. The number is stored in the +byte order of the generating machine, so the magic number really is +two numbers: <CODE>0x950412de</CODE> and <CODE>0xde120495</CODE>. The second +word describes the current revision of the file format. For now the +revision is 0. This might change in future versions, and ensures +that the readers of MO files can distinguish new formats from old +ones, so that both can be handled correctly. The version is kept +separate from the magic number, instead of using different magic +numbers for different formats, mainly because <TT>`/etc/magic'</TT> is +not updated often. It might also be better to keep the magic number separate from +the identification of the internal format version. + +</P> +<P> +A number of pointers to later tables in the file follow, allowing +for the extension of the prefix part of MO files without having to +recompile programs reading them. This might become useful for later +inserting a few flag bits, an indication of the charset used, new +tables, or other things. + +</P> +<P> +Then, at offset <VAR>O</VAR> and offset <VAR>T</VAR> in the picture, two tables +of string descriptors can be found. In both tables, each string +descriptor uses two 32-bit integers, one for the string length, +another for the offset of the string in the MO file, counting in bytes +from the start of the file.
The first table contains descriptors +for the original strings, and is sorted so the original strings +are in increasing lexicographical order. The second table contains +descriptors for the translated strings, and is parallel to the first +table: to find the corresponding translation one has to access the +array slot in the second array with the same index. + +</P> +<P> +Having the original strings sorted enables the use of simple binary +search, for when the MO file does not contain a hashing table, or +for when it is not practical to use the hashing table provided in +the MO file. This also has another advantage, as the empty string +in a GNU <CODE>gettext</CODE> PO file is usually <EM>translated</EM> into +some system information attached to that particular MO file, and the +empty string necessarily becomes the first in both the original and +translated tables, making the system information very easy to find. + +</P> +<P> +The size <VAR>S</VAR> of the hash table can be zero. In this case, the +hash table itself is not contained in the MO file. Some people might +prefer this because a precomputed hashing table takes disk space, and +does not gain <EM>that</EM> much speed. The hash table contains indices +to the sorted array of strings in the MO file. Conflict resolution is +done by double hashing. The precise hashing algorithm used is fairly +dependent on the GNU <CODE>gettext</CODE> code, and is not documented here. + +</P> +<P> +As for the strings themselves, they follow the hash table, and each +is terminated with a <KBD>NUL</KBD>, and this <KBD>NUL</KBD> is not counted in +the length which appears in the string descriptor. The <CODE>msgfmt</CODE> +program has an option selecting the alignment for MO file strings. +With this option, each string is separately aligned so it starts at +an offset which is a multiple of the alignment value. On some RISC +machines, a correct alignment will speed things up.
+ +</P> +<P> +Nothing prevents an MO file from having embedded <KBD>NUL</KBD>s in strings. +However, the program interface currently used already presumes +that strings are <KBD>NUL</KBD> terminated, so embedded <KBD>NUL</KBD>s are +somewhat useless. But the MO file format is general enough that other +interfaces would be possible later, if for example, we ever want to +implement wide characters right in MO files, where <KBD>NUL</KBD> bytes may +accidentally appear. + +</P> +<P> +This particular issue has been strongly debated in the GNU +<CODE>gettext</CODE> development forum, and it is to be expected that the MO file +format will evolve or change over time. It is even possible that many +formats may later be supported concurrently. But surely, we have to +start somewhere, and the MO file format described here is a good start. +Nothing is cast in concrete, and the format may later evolve fairly +easily, so we should feel comfortable with the current approach. + +</P> + +<PRE> + byte + +------------------------------------------+ + 0 | magic number = 0x950412de | + | | + 4 | file format revision = 0 | + | | + 8 | number of strings | == N + | | + 12 | offset of table with original strings | == O + | | + 16 | offset of table with translation strings | == T + | | + 20 | size of hashing table | == S + | | + 24 | offset of hashing table | == H + | | + . . + . (possibly more entries later) . + . . + | | + O | length & offset 0th string ----------------. + O + 8 | length & offset 1st string ------------------. + ... ... | | +O + ((N-1)*8)| length & offset (N-1)th string | | | + | | | | + T | length & offset 0th translation ---------------. + T + 8 | length & offset 1st translation -----------------. + ... ... | | | | +T + ((N-1)*8)| length & offset (N-1)th translation | | | | | + | | | | | | + H | start hash table | | | | | + ... ...
| | | | + H + S * 4 | end hash table | | | | | + | | | | | | + | NUL terminated 0th string <----------------' | | | + | | | | | + | NUL terminated 1st string <------------------' | | + | | | | + ... ... | | + | | | | + | NUL terminated 0th translation <---------------' | + | | | + | NUL terminated 1st translation <-----------------' + | | + ... ... + | | + +------------------------------------------+ +</PRE> + +<P><HR><P> +<p>Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_5.html">previous</A>, <A HREF="gettext_7.html">next</A>, <A HREF="gettext_12.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. +</BODY> +</HTML>