GNU gettext utilities - Preparing Program Sources



Preparing Program Sources


For the programmer, changes to the C source code fall into three categories. First, you have to make the localization functions known to all modules needing message translation. Second, you should properly trigger the operation of GNU gettext when the program initializes, usually from the main function. Last, you should identify and mark all constant strings in your program that need translation.

Presuming that your set of programs, or package, has been adjusted so that all needed GNU gettext files are available, and that your `Makefile' files are adjusted (see section The Maintainer's View), each C module containing translatable strings should contain the line:

#include <libintl.h>

The remaining changes to your C sources are discussed in the following sections of this chapter.


Triggering `gettext` Operations


The initialization of locale data should be done with more or less the same code in every program, as demonstrated below:

#include <libintl.h>
#include <locale.h>

int
main (int argc, char *argv[])
{
  /* ... */
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  /* ... */
  return 0;
}


PACKAGE and LOCALEDIR should be provided either by `config.h' or by the Makefile. For now, consult the gettext sources for more information.
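A minimal sketch of how the two macros might be supplied and used, assuming the Makefile normally passes them with -D compiler options. The fallback values "hello-demo" and "/usr/local/share/locale" and the helper name init_i18n are hypothetical placeholders, not anything defined by gettext; the helper simply wraps the three calls shown above:

```c
#include <libintl.h>
#include <locale.h>

/* Normally supplied by config.h or by -D options in the Makefile.
   The fallback values below are hypothetical placeholders.  */
#ifndef PACKAGE
# define PACKAGE "hello-demo"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Performs the usual initialization sequence.  Returns 0 on success,
   -1 if the text domain could not be registered.  A failing setlocale
   is not treated as fatal: the program then keeps running in the "C"
   locale and shows untranslated messages.  */
int
init_i18n (void)
{
  setlocale (LC_ALL, "");
  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return -1;
  if (textdomain (PACKAGE) == NULL)
    return -1;
  return 0;
}
```

Passing NULL as the second argument of bindtextdomain, or as the argument of textdomain, queries the current setting without changing it, which is a convenient way to check the result.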

The use of LC_ALL might not be appropriate for you. LC_ALL includes all locale categories and in particular LC_CTYPE. This latter category determines the character classes tested by isalnum and the other functions from `ctype.h', which can give unwanted results in programs that process some kind of input language. For example, a source file in such a language that uses the ç (c-cedilla) character might then be accepted in France but not in the U.S.
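To see the dependency concretely, consider a hypothetical helper is_identifier that accepts words made of letters, digits and underscores. Because it relies on isalpha and isalnum, its answer depends on LC_CTYPE: in the "C" locale (the one in effect at program start-up, before any setlocale call) only the unaccented ASCII letters count, whereas in an ISO-8859-1 locale such as a French one the byte 0xE7 (ç) may also be classified as a letter. A sketch:

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: true if S is a non-empty identifier made of
   letters, digits and underscores, not starting with a digit.  The
   result depends on the LC_CTYPE category of the current locale.  */
bool
is_identifier (const char *s)
{
  if (*s == '\0' || isdigit ((unsigned char) *s))
    return false;
  for (; *s != '\0'; s++)
    if (!isalnum ((unsigned char) *s) && *s != '_')
      return false;
  return true;
}
```

The cast to unsigned char matters: passing a negative char value (as a Latin-1 byte may be on platforms where char is signed) to the ctype functions is undefined behavior.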

Some systems also have problems parsing numbers with the scanf functions when LC_ALL selects a locale other than "C". The standards say that formats in addition to the one known in the "C" locale might be recognized, but some systems seem to reject numbers written in the "C" locale format. In some situations, the notation itself makes it impossible to recognize whether a number is written in the "C" locale format or in the local one. This can happen if thousands separator characters are used: some locales define this character, according to national conventions, as '.', which is the same character used in the "C" locale to denote the decimal point.

So it is sometimes necessary to replace the LC_ALL line in the code above with a sequence of setlocale lines

{
  /* ... */
  setlocale (LC_TIME, "");
  setlocale (LC_MESSAGES, "");
  /* ... */
}


or to switch the locale category in question over and back again around the code that depends on it. On all POSIX-conformant systems the locale categories LC_CTYPE, LC_COLLATE, LC_MONETARY, LC_NUMERIC, and LC_TIME are available. On some modern systems there is also a category LC_MESSAGES, which on some old, XPG2-compliant systems is called LC_RESPONSES.
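The "switch over and back" approach might look like the following sketch, applied to the number-parsing problem described above. The function name strtod_c_locale is hypothetical; it temporarily selects the "C" locale for LC_NUMERIC, so that '.' is always the decimal point, and then restores whatever setting was active before:

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Parses S as a number written in "C" locale notation, whatever
   LC_NUMERIC is currently set to, then restores the previous
   LC_NUMERIC setting.  setlocale changes process-wide state, so this
   is not safe to call concurrently from several threads.  */
double
strtod_c_locale (const char *s, char **end)
{
  const char *current = setlocale (LC_NUMERIC, NULL);
  char *saved = NULL;

  /* The string returned by setlocale may be overwritten by the next
     call, so keep a private copy of it.  */
  if (current != NULL)
    {
      size_t len = strlen (current) + 1;
      saved = malloc (len);
      if (saved != NULL)
        memcpy (saved, current, len);
    }

  setlocale (LC_NUMERIC, "C");
  double value = strtod (s, end);

  if (saved != NULL)
    {
      setlocale (LC_NUMERIC, saved);
      free (saved);
    }
  return value;
}
```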


How Marks Appear in Sources


All strings requiring translation should be marked in the C sources. Marking is done in such a way that each translatable string appears to be the sole argument of some function or preprocessor macro. There are only a few such possible functions or macros meant for translation, and their names are said to be marking keywords. The marking is attached to strings themselves, rather than to what we do with them. This approach has further advantages. A clear example is an error message produced by formatting. The format string needs translation, as do some strings inserted through `%s' specifications in the format, while the result from sprintf may have so many different instances that it is impractical to list them all in some `error_string_out()' routine, say.

This marking operation has two goals. The first goal of marking is to trigger the retrieval of the translation at run time. The keyword is possibly resolved into a routine able to dynamically return the proper translation of the argument string, as far as possible or wanted. Most localizable strings are found in executable positions, that is, attached to variables or given as parameters to functions. But this is not universal usage, and some translatable strings appear in structured initializations. See section Special Cases of Translatable Strings.
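To make the run-time side concrete, here is a toy sketch of what such a routine does conceptually: look the argument string up in a table of translations, and fall back to the original string when no translation exists. The catalog array and the name toy_gettext are hypothetical; the real GNU gettext library loads compiled .mo catalogs and uses a much faster lookup than this linear scan:

```c
#include <stddef.h>
#include <string.h>

/* One entry of a hypothetical in-memory catalog.  */
struct catalog_entry
{
  const char *msgid;   /* original string, as written in the sources */
  const char *msgstr;  /* its translation */
};

static const struct catalog_entry catalog[] = {
  { "File not found", "Datei nicht gefunden" },
  { "Permission denied", "Keine Berechtigung" },
};

/* Returns the translation of MSGID, or MSGID itself when the catalog
   has no entry for it, so that untranslated strings still display.  */
const char *
toy_gettext (const char *msgid)
{
  for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
    if (strcmp (catalog[i].msgid, msgid) == 0)
      return catalog[i].msgstr;
  return msgid;
}
```

Returning the argument unchanged on a miss is the property that lets a partially translated program keep working.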

The second goal of the marking operation is to help xgettext properly extract all translatable strings when it scans a set of program sources and produces PO file templates.

The canonical keyword for marking translatable strings is `gettext'; it gave its name to the whole GNU gettext package. For packages making only light use of the `gettext' keyword, macro or function, it is easily used as is. However, for packages using the gettext interface more heavily, it is usually more convenient to give the main keyword a shorter, less obtrusive name. Indeed, the keyword might appear on a lot of strings all over the package, and programmers usually neither want nor need their program sources to remind them forcefully, all the time, that they are internationalized. Further, a long keyword has the disadvantage of using more horizontal space, forcing more indentation work on sources for those trying to keep them within 79 or 80 columns.

Many packages use `_' (a simple underscore) as a keyword, and write `_("Translatable string")' instead of `gettext ("Translatable string")'. Further, the rule from the GNU coding standards requiring a space between the keyword and the opening parenthesis is relaxed, in practice, for this particular usage. So the textual overhead per translatable string is reduced to only three characters: the underscore and the two parentheses. However, even though GNU gettext uses this convention internally, it does not offer it officially. The real, genuine keyword is truly `gettext'. It is fairly easy for those wanting to use `_' instead of `gettext' to declare:

#include <libintl.h>
#define _(String) gettext (String)

instead of merely using `#include <libintl.h>'.
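A minimal sketch of the `_' macro in use; greeting is a hypothetical function. Only the spelling at the call site changes, and gettext still performs the lookup. As long as no catalog is found or the program runs in the "C" locale, gettext returns its argument unchanged:

```c
#include <libintl.h>

/* Short marking keyword, as declared above.  */
#define _(String) gettext (String)

/* Returns the greeting, translated if a catalog for the current
   LC_MESSAGES locale and text domain provides a translation.  */
const char *
greeting (void)
{
  return _("Hello, world!");
}
```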

Later on, maintenance is relatively easy. If, as a programmer, you add or modify a string, you will have to ask yourself whether the new or altered string requires translation, and include it within `_()' if you think it should be translated. `"%s: %d"' is an example of a string not requiring translation!
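A common pattern, sketched here with a hypothetical helper format_diagnostic, is to leave a purely structural format such as "%s: %d: %s" unmarked and pass only the human-readable part through `_()':

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Hypothetical helper: writes "PROGNAME: LINE: MESSAGE" into BUF.
   The layout "%s: %d: %s" contains no words, so it is not marked;
   only the message text is passed through _().  Returns what
   snprintf returns.  */
int
format_diagnostic (char *buf, size_t size, const char *progname, int line)
{
  return snprintf (buf, size, "%s: %d: %s", progname, line,
                   _("unexpected end of input"));
}
```

Whether a layout like this is really language-neutral is a judgment call; if some languages need a different order, the whole format string should be marked instead.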


Marking Translatable Strings


In PO mode, one set of features is meant more for the programmer than for the translator, and allows him to interactively mark which strings, in a set of program sources, are translatable, and which are not. Even if it is a fairly easy job for a programmer to find and mark such strings by other means, using any editor of his choice, PO mode makes this work more comfortable. Further, this gives translators who feel a little like programmers, or programmers who feel a little like translators, a tool letting them mark translatable strings in the program sources while simultaneously producing a set of translations in some language for the package being internationalized.

The set of program sources targeted by the PO mode commands described here should have an Emacs tags table constructed for your project before you use these PO file commands. This is easy to do. In any shell window, change the directory to the root of your project, then execute a command resembling:

etags src/*.[hc] lib/*.[hc]

presuming here you want to process all `.h' and `.c' files from the `src/' and `lib/' directories. This command will explore all said files and create a `TAGS' file in your root directory, somewhat summarizing the contents using a special file format Emacs can understand.

For packages following the GNU coding standards, there is a make goal tags or TAGS which constructs the tag files in all directories and for all files containing source code.

Once your `TAGS' file is ready, the following commands assist the programmer in marking translatable strings in his set of sources. But these commands are necessarily driven from within a PO file window, and it is likely that you do not even have such a PO file yet. This is not a problem at all, as you may safely open a new, empty PO file, mainly for using these commands. This empty PO file will slowly fill in while you mark strings as translatable in your program sources.

`,': Search through program sources for a string which looks like a candidate for translation.
`M-,': Mark the last string found with `_()'.
`M-.': Mark the last string found with a keyword taken from a set of possible keywords. When given a prefix argument, this command allows some management of these keywords.


The , (po-tags-search) command searches for the next occurrence of a string which looks like a possible candidate for translation, and displays the program source in another Emacs window, positioned in such a way that the string is near the top of this other window. If the string is too big to fit whole in this window, it is positioned so that only its end is shown. In any case, the cursor is left in the PO file window. If the shown string would be better presented differently in different native languages, you may mark it using M-, or M-.. Otherwise, you might rather ignore it and skip to the next string by merely repeating the , command.

A string is a good candidate for translation if it contains a sequence of three or more letters. A string containing at most two letters in a row will be considered as a candidate if it has more letters than non-letters. The command disregards strings containing no letters, or isolated letters only. It also disregards strings within comments, or strings already marked with some keyword PO mode knows (see below).
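As an illustration only, the letter-counting rule above can be expressed in C. This is not PO mode's code, which is written in Emacs Lisp; the function name looks_translatable is hypothetical, and the additional skipping of strings inside comments or already-marked strings is not modeled:

```c
#include <ctype.h>
#include <stdbool.h>

/* Sketch of the candidate heuristic described above.  Letters are
   tested with isalpha, so the result depends on LC_CTYPE.  */
bool
looks_translatable (const char *s)
{
  int letters = 0, others = 0, run = 0, longest_run = 0;

  for (; *s != '\0'; s++)
    {
      if (isalpha ((unsigned char) *s))
        {
          letters++;
          if (++run > longest_run)
            longest_run = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }

  if (longest_run >= 3)
    return true;              /* three or more letters in a row */
  if (longest_run <= 1)
    return false;             /* no letters, or isolated letters only */
  return letters > others;    /* at most two letters in a row */
}
```

With this rule "%s: %d" is skipped (its letters are isolated), while a short word such as "OK" still qualifies.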

If you have never told Emacs about some `TAGS' file to use, the command will request that you specify one from the minibuffer, the first time you use the command. You may later change your `TAGS' file by using the regular Emacs command M-x visit-tags-table, which will ask you to name the precise `TAGS' file you want to use. See section `Tag Tables' in The Emacs Editor.

Each time you use the , command, the search resumes from where it was left by the previous search, and goes through all program sources, obeying the `TAGS' file, until all sources have been processed. However, by giving a prefix argument to the command (C-u ,), you may request that the search be restarted all over again from the first program source; but in this case, strings that you recently marked as translatable will be automatically skipped.

Using this , command does not prevent the use of other regular Emacs tags commands. For example, regular tags-search or tags-query-replace commands may be used without disrupting the independent , search sequence. However, as implemented, the initial , command (or the , command when used with a prefix) might also reinitialize the regular Emacs tags searching to the first tags file; this reinitialization might be considered spurious.

The M-, (po-mark-translatable) command will mark the recently found string with the `_' keyword. The M-. (po-select-mark-and-mark) command will request that you type one keyword from the minibuffer and use that keyword for marking the string. Both commands will automatically create a new untranslated PO file entry for the string being marked, and make it the current entry (making it easy for you to immediately proceed to its translation, if you feel like doing it right away). It is possible that the modifications made to the program source by M-, or M-. render some source line longer than 80 columns, forcing you to break and re-indent this line differently. You may use the O command from PO mode, or any other window-changing command from GNU Emacs, to break out into the program source window and make any needed adjustments. You will have to use some regular Emacs command to return the cursor to the PO file window, if you want to use the , command for the next string, say.

The M-. command has a few built-in speedups, so you do not have to explicitly type all keywords all the time. The first such speedup is that you are presented with a preferred keyword, which you may accept by merely typing RET at the prompt. The second speedup is that you may type any non-ambiguous prefix of the keyword you really mean, and the command will complete it automatically for you. This also means that PO mode has to know all your possible keywords, and that it will not accept mistyped keywords.

If you reply ? to the keyword request, the command gives a list of all known keywords, from which you may choose. When the command is prefixed by an argument (C-u M-.), it inhibits updating any program source or PO file buffer, and does some simple keyword management instead. In this case, the command asks for a keyword, written in full, which becomes a new allowed keyword for later M-. commands. Moreover, this new keyword automatically becomes the preferred keyword for later commands. By typing an already known keyword in response to C-u M-., one merely changes the preferred keyword and does nothing more.

All keywords known for M-. are recognized by the , command when scanning for strings, and strings already marked by any of those known keywords are automatically skipped. If many PO files are opened simultaneously, each one has its own independent set of known keywords. There is currently no provision in PO mode for deleting a known keyword; you have to quit the file (maybe using q) and reopen it afresh. When a PO file is newly brought up in an Emacs window, only `gettext' and `_' are known as keywords, and `gettext' is preferred for the M-. command. In fact, it is not useful to prefer `_', as this one is already built into the M-, command.


Special Comments preceding Keywords


In C programs, strings are often used within calls of functions from the printf family. The special thing about these format strings is that they can contain format specifiers introduced with %. Assume we have the code:

printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));

A possible German translation for the above string might be:

"%d Zeichen lang ist die Zeichenkette `%s'"

A C programmer, even if he cannot speak German, will recognize that there is something wrong here. The order of the two format specifiers is changed, but of course the arguments in the printf call are not. This will most probably lead to problems, because now the length of the string is regarded as the address of the string.

To prevent errors at run time caused by translations, the msgfmt tool can check statically whether the arguments in the original and the translated string match in type and number. If they do not, a warning is given, so that the error can be corrected before it causes problems at run time.

To keep the word order of the above German translation, one would have to write:

"%2$d Zeichen lang ist die Zeichenkette `%1$s'"

The routines in msgfmt know about this special notation.
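The kind of check described here can be sketched in C. This is a deliberately simplified illustration, not msgfmt's implementation: the names formats_compatible and collect_conversions are hypothetical, length modifiers (such as l or z) and '*' widths are rejected rather than parsed, and conversion characters are compared literally, so %i and %d count as different. The sketch records which conversion each argument position receives, honoring the %n$ notation, and compares the two strings:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ARGS 16

/* Records in CONV, for each argument position (1-based), the
   conversion character FMT applies to it.  Returns the highest
   position used, or -1 for specifications this sketch does not
   handle.  */
static int
collect_conversions (const char *fmt, char conv[MAX_ARGS + 1])
{
  int next = 1, max = 0;

  memset (conv, 0, MAX_ARGS + 1);
  for (const char *p = fmt; *p != '\0'; p++)
    {
      if (*p != '%')
        continue;
      p++;
      if (*p == '%')
        continue;                       /* "%%" is a literal '%' */

      /* An optional "n$" prefix selects the argument explicitly.  */
      int pos = next++;
      int n = 0;
      const char *q = p;
      for (; isdigit ((unsigned char) *q); q++)
        if (n <= MAX_ARGS)              /* avoid overflow on long runs */
          n = n * 10 + (*q - '0');
      if (*q == '$' && n > 0)
        {
          pos = n;
          p = q + 1;
        }

      /* Skip flags, field width and precision.  */
      while (*p != '\0' && strchr ("-+ #0123456789.", *p) != NULL)
        p++;
      if (*p == '\0' || strchr ("diouxXeEfgGcsp", *p) == NULL
          || pos > MAX_ARGS)
        return -1;
      if (conv[pos] != 0 && conv[pos] != *p)
        return -1;                      /* one argument, two types */
      conv[pos] = *p;
      if (pos > max)
        max = pos;
    }
  return max;
}

/* True if MSGSTR consumes the same arguments, with the same
   conversion characters, as MSGID.  */
bool
formats_compatible (const char *msgid, const char *msgstr)
{
  char a[MAX_ARGS + 1], b[MAX_ARGS + 1];
  int na = collect_conversions (msgid, a);
  int nb = collect_conversions (msgstr, b);

  return na >= 0 && na == nb && memcmp (a, b, sizeof a) == 0;
}
```

Applied to the example above, the first German translation is rejected because argument 1 would receive %d instead of %s, while the %2$d/%1$s version is accepted.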

Because not all strings in a program are format strings, it is not useful for msgfmt to test all the strings in the `.po' file. This might cause problems, because a string might contain what looks like a format specifier without ever being used as a printf format.

Therefore xgettext adds a special tag to those messages it thinks might be format strings. There is no absolute rule for this, only a heuristic. In the `.po' file the entry is marked using the c-format flag in the #, comment line (see section The Format of PO Files).

The careful reader might now say that this again can cause problems: the heuristic might guess wrong. This is true, and therefore xgettext knows about a special kind of comment which lets the programmer take over the decision. If xgettext finds a comment containing the words xgettext:c-format on the same line as the gettext keyword or on the line immediately preceding it, it will mark the string with the c-format flag in any case. This kind of comment should be used when xgettext does not recognize the string as a format string but it really is one and should be tested. Please note that when the comment is on the same line as the gettext keyword, it must come before the string to be translated.

This situation happens quite often. The printf function is often called with strings which do not contain a format specifier. Of course one would normally use fputs, but it does happen. In this case xgettext does not recognize the string as a format string, but what happens if the translation introduces a valid format specifier? The printf function will try to access one of the parameters, but none exists, because the original code does not pass any.

xgettext could of course make a wrong decision the other way round: a string marked as a format string may not really be one. In this case msgfmt might give too many warnings and would prevent translating the `.po' file. The method to prevent this wrong decision is similar to the one used above, only the comment must contain the string xgettext:no-c-format.
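Both comment forms might be used as in the following sketch; the function names done_format and progress_label are hypothetical. Each comment sits on the line immediately preceding the gettext keyword, which is one of the two placements allowed above:

```c
#include <libintl.h>

/* This string contains no conversion, but callers use it as a printf
   format, so the c-format flag is forced: msgfmt can then report a
   translation that introduces a conversion specification.  */
const char *
done_format (void)
{
  /* xgettext:c-format */
  return gettext ("Operation complete.\n");
}

/* "100% of" contains the sequence "% o", which a heuristic could read
   as a conversion (space flag followed by 'o').  The string is never
   used as a format, so the flag is suppressed.  */
const char *
progress_label (void)
{
  /* xgettext:no-c-format */
  return gettext ("Completed 100% of the work");
}
```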

If a string is marked with c-format and this is not correct, the user can find out who is responsible for the decision. See section Invoking the xgettext Program to see how the --debug option can be used for solving this problem.


Special Cases of Translatable Strings


The attentive reader might now point out that it is not always possible to mark translatable strings with gettext or something like it. Consider the following case:

{
  static const char *messages[] = {
    "some very meaningful message",
    "and another one"
  };
  const char *string;
  /* ... */
  string
    = index > 1 ? "a default message" : messages[index];

  fputs (string, stdout);
  /* ... */
}


While it is no problem to mark the string "a default message", it is not possible to mark the string initializers for messages. What is to be done? We have to fulfill two tasks. First we have to mark the strings so that the xgettext program (see section Invoking the xgettext Program) can find them, and second we have to translate the strings at run time before printing them.

The first task can be fulfilled by creating a new keyword which names a no-op. For the second, we have to mark all access points to a string from the array. So one solution can look like this:

#define gettext_noop(String) (String)

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  /* ... */
  string
    = index > 1 ? gettext ("a default message") : gettext (messages[index]);

  fputs (string, stdout);
  /* ... */
}


Please convince yourself that the string which is written by fputs is translated in every case. How to tell xgettext about the additional keyword gettext_noop is explained in section Invoking the xgettext Program.
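The first method can be wrapped into a self-contained function, sketched below under a hypothetical name, message_for. One detail differs from the fragment above: out-of-range indexes, including negative ones, fall back to the default message instead of reading outside the array:

```c
#include <libintl.h>
#include <stddef.h>

/* Marks a string for xgettext without translating it at this point.  */
#define gettext_noop(String) (String)

/* The array holds the untranslated originals; every access point
   passes the selected string through gettext.  */
const char *
message_for (int idx)
{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const size_t count = sizeof messages / sizeof messages[0];

  if (idx < 0 || (size_t) idx >= count)
    return gettext ("a default message");
  return gettext (messages[idx]);
}
```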

The above is of course not the only solution. You could also come up with the following one:

#define gettext_noop(String) (String)

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  /* ... */
  string
    = index > 1 ? gettext_noop ("a default message") : messages[index];

  fputs (gettext (string), stdout);
  /* ... */
}


But this has some drawbacks. First, the programmer has to take care to use gettext_noop for the string "a default message". Using gettext there could, in rare cases, have unpredictable results, because the already translated string would be passed to gettext a second time. The second reason is found in the internals of the GNU gettext library, which make this solution less efficient.

One advantage is that you need not perform control-flow analysis to make sure the output is really translated in every case. But this analysis is generally not very difficult. If it is difficult in some situation, you can use this second method there.
