X-Git-Url: https://git.saurik.com/wxWidgets.git/blobdiff_plain/c3b177ae6310ed80f054ae227513ef681f9c3dad..65baafba0e8cd74f2264b7e2f7625ff5bea84864:/docs/html/gettext/gettext_3.html diff --git a/docs/html/gettext/gettext_3.html b/docs/html/gettext/gettext_3.html new file mode 100644 index 0000000000..1311d7068f --- /dev/null +++ b/docs/html/gettext/gettext_3.html @@ -0,0 +1,606 @@ + +
+ + +Go to the first, previous, next, last section, table of contents. +
+ + +
+For the programmer, changes to the C source code fall into three
+categories. First, you have to make the localization functions
+known to all modules needing message translation. Second, you should
+properly trigger the operation of GNU gettext
when the program
+initializes, usually from the main
function. Last, you should
+identify and especially mark all constant strings in your program
+needing translation.
+
+
+Presuming that your set of programs, or package, has been adjusted
+so all needed GNU gettext
files are available, and your
+`Makefile' files are adjusted (see section The Maintainer's View), each C module
+having translated C strings should contain the line:
+
+
+#include <libintl.h> ++ +
+The remaining changes to your C sources are discussed in the following +sections of this chapter. + +
+ + + +gettext
Operations+The initialization of locale data should be done with more or less +the same code in every program, as demonstrated below: + +
+
+int
+main (int argc, char **argv)
+{
+  ...
+  setlocale (LC_ALL, "");
+  bindtextdomain (PACKAGE, LOCALEDIR);
+  textdomain (PACKAGE);
+  ...
+}
+
+PACKAGE and LOCALEDIR should be provided either by
+`config.h' or by the Makefile. For now consult the gettext
+sources for more information.
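As a rough sketch of how these pieces might fit together, the helper below wraps the three initialization calls and supplies fallback definitions for PACKAGE and LOCALEDIR. The helper name `init_translation' and the fallback values "hello" and "/usr/local/share/locale" are assumptions made purely for illustration; a real package would take both macros from `config.h' or from the compiler flags set up by the Makefile.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical fallbacks, used only when neither `config.h' nor the
   Makefile has provided the real values.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Hypothetical helper: perform the usual gettext initialization.
   Returns 0 on success, -1 if one of the gettext calls failed.  */
int
init_translation (void)
{
  /* If the environment names an unknown locale, setlocale returns NULL
     and the program simply keeps running in the "C" locale.  */
  setlocale (LC_ALL, "");

  /* Tell gettext where the message catalogs of this package live.  */
  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return -1;

  /* Select this package's catalog for subsequent gettext calls.  */
  if (textdomain (PACKAGE) == NULL)
    return -1;

  return 0;
}
```

Calling such a helper once, early in `main', keeps the initialization identical across all programs of a package.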
+
+
+The use of LC_ALL
might not be appropriate for you.
+LC_ALL
includes all locale categories and especially
+LC_CTYPE
. This latter category is responsible for determining
+character classes with the isalnum
etc. functions from
+`ctype.h', which could be wrong, especially for programs that process some
+kind of input language. For example, this would mean that
+source code using the ç (c-cedilla character) is runnable in
+France but not in the U.S.
+
+
+Some systems also have problems with parsing numbers using the
+scanf
functions if a locale other than the "C"
locale is used.
+The standards say that formats in addition to the one known in the
+"C"
locale might be recognized. But some systems seem to reject
+numbers in the "C"
locale format. In some situations, it might
+also be a problem with the notation itself which makes it impossible to
+recognize whether the number is in the "C"
locale or the local
+format. This can happen if thousands separator characters are used.
+Some locales define this character, following national
+conventions, as '.'
which is the same character used in the
+"C"
locale to denote the decimal point.
+
+
+So it is sometimes necessary to replace the LC_ALL
line in the
+code above by a sequence of setlocale
lines
+
+
+{ + ... + setlocale (LC_TIME, ""); + setlocale (LC_MESSAGES, ""); + ... +} ++ +
+or to switch to the locale category in question and back again. On all
+POSIX conformant systems the locale categories LC_CTYPE
,
+LC_COLLATE
, LC_MONETARY
, LC_NUMERIC
, and
+LC_TIME
are available. On some modern systems there is also a
+locale LC_MESSAGES
which is called on some old, XPG2 compliant
+systems LC_RESPONSES
.
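Switching a single category and back again might be sketched as follows. The helper name `parse_c_double' is hypothetical; it assumes the input text is always written in the "C" locale number format (decimal point, no thousands separators), whatever locale the user has selected.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse TEXT as a number in "C" locale notation,
   then restore whatever LC_NUMERIC setting was active before.  */
double
parse_c_double (const char *text)
{
  const char *current = setlocale (LC_NUMERIC, NULL);
  char *saved = NULL;
  double value;

  /* The string returned by setlocale may be overwritten by later
     calls, so keep a private copy of the current setting.  */
  if (current != NULL)
    {
      saved = malloc (strlen (current) + 1);
      if (saved != NULL)
        strcpy (saved, current);
    }

  setlocale (LC_NUMERIC, "C");
  value = strtod (text, NULL);

  /* Switch back to the category value that was in effect before.  */
  if (saved != NULL)
    {
      setlocale (LC_NUMERIC, saved);
      free (saved);
    }

  return value;
}
```

Only LC_NUMERIC is touched here, so message translation and the other categories keep following the user's settings.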
+
+
+All strings requiring translation should be marked in the C sources. Marking
+is done in such a way that each translatable string appears to be
+the sole argument of some function or preprocessor macro. There are
+only a few such possible functions or macros meant for translation,
+and their names are said to be marking keywords. The marking is
+attached to strings themselves, rather than to what we do with them.
+This approach has more uses. A blatant example is an error message
+produced by formatting. The format string needs translation, as
+well as some strings inserted through some `%s' specification
+in the format, while the result from sprintf
may have so many
+different instances that it is impractical to list them all in some
+`error_string_out()' routine, say.
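To illustrate marking the strings themselves, here is a minimal sketch of an error message built by formatting. The function name `format_open_error' and the message texts are invented for this example; both the format string and the word inserted through `%s' are marked, while the formatted result is never marked as a whole.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical routine: build an error message in BUFFER.  The marks
   are attached to the constant strings, not to the formatted result.  */
int
format_open_error (char *buffer, size_t size,
                   const char *file_name, int is_directory)
{
  /* A string that will be inserted through `%s' needs its own mark.  */
  const char *kind = is_directory ? gettext ("directory")
                                  : gettext ("file");

  /* The format string is marked too; FILE_NAME is data and is not.  */
  return snprintf (buffer, size, gettext ("cannot open %s `%s'"),
                   kind, file_name);
}
```

When no message catalog is installed, `gettext' returns its argument unchanged, so the routine then produces the original English text.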
+
+
+This marking operation has two goals. The first goal of marking +is to trigger the retrieval of the translation, at run time. +The keywords are possibly resolved into a routine able to dynamically +return the proper translation, as far as possible or wanted, for the +argument string. Most localizable strings are found in executable +positions, that is, attached to variables or given as parameters to +functions. But this is not universal usage, and some translatable +strings appear in structured initializations. See section Special Cases of Translatable Strings. + +
+
+The second goal of the marking operation is to help xgettext
properly extract all translatable strings when it scans a set
+of program sources and produces PO file templates.
+
+
+The canonical keyword for marking translatable strings is
+`gettext'; it gave its name to the whole GNU gettext
+package. For packages making only light use of the `gettext'
+keyword, macro or function, it is easily used as is. However,
+for packages using the gettext
interface more heavily, it
+is usually more convenient to give the main keyword a shorter, less
+obtrusive name. Indeed, the keyword might appear on a lot of strings
+all over the package, and programmers usually do not want nor need
+their program sources to remind them forcefully, all the time, that they
+are internationalized. Further, a long keyword has the disadvantage
+of using more horizontal space, forcing more indentation work on
+sources for those trying to keep them within 79 or 80 columns.
+
+
+Many packages use `_' (a simple underline) as a keyword,
+and write `_("Translatable string")' instead of `gettext
+("Translatable string")'. Further, the coding rule from the GNU standards
+requiring a space between the keyword and the opening
+parenthesis is relaxed, in practice, for this particular usage.
+So, the textual overhead per translatable string is reduced to
+only three characters: the underline and the two parentheses.
+However, even if GNU gettext
uses this convention internally,
+it does not offer it officially. The real, genuine keyword is truly
+`gettext' indeed. It is fairly easy for those wanting to use
+`_' instead of `gettext' to declare:
+
+
+#include <libintl.h> +#define _(String) gettext (String) ++ +
+instead of merely using `#include <libintl.h>'. + +
++Later on, the maintenance is relatively easy. If, as a programmer, +you add or modify a string, you will have to ask yourself if the +new or altered string requires translation, and include it within +`_()' if you think it should be translated. `"%s: %d"' is +an example of a string not requiring translation! + +
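The decision of what to mark might look like this in practice. The function names `ready_message' and `format_position' and the message text are invented for this sketch; it only assumes the `_' macro definition discussed in this section.

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Hypothetical example: a user-visible message is worth translating,
   so it is wrapped in `_()'.  */
const char *
ready_message (void)
{
  return _("Ready");
}

/* Hypothetical example: "%s: %d" contains nothing a translator could
   work on, so it is deliberately left unmarked.  */
int
format_position (char *buffer, size_t size,
                 const char *file_name, int line)
{
  return snprintf (buffer, size, "%s: %d", file_name, line);
}
```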
+ + ++In PO mode, one set of features is meant more for the programmer than +for the translator, and allows him to interactively mark which strings, +in a set of program sources, are translatable, and which are not. +Even if it is a fairly easy job for a programmer to find and mark +such strings by other means, using any editor of his choice, PO mode +makes this work more comfortable. Further, this gives translators +who feel a little like programmers, or programmers who feel a little +like translators, a tool letting them work at marking translatable +strings in the program sources, while simultaneously producing a set of +translations in some language, for the package being internationalized. + +
++The set of program sources, targeted by the PO mode commands described +here, should have an Emacs tags table constructed for your project, +prior to using these PO file commands. This is easy to do. In any +shell window, change the directory to the root of your project, then +execute a command resembling: + +
+ ++etags src/*.[hc] lib/*.[hc] ++ +
+presuming here you want to process all `.h' and `.c' files +from the `src/' and `lib/' directories. This command will +explore all said files and create a `TAGS' file in your root +directory, somewhat summarizing the contents using a special file +format Emacs can understand. + +
+
+For packages following the GNU coding standards, there is
+a make goal tags
or TAGS
which constructs the tag files in
+all directories and for all files containing source code.
+
+
+Once your `TAGS' file is ready, the following commands assist +the programmer in marking translatable strings in his set of sources. +But these commands are necessarily driven from within a PO file +window, and it is likely that you do not even have such a PO file yet. +This is not a problem at all, as you may safely open a new, empty PO +file, mainly for using these commands. This empty PO file will slowly +fill in while you mark strings as translatable in your program sources. + +
+
+The , (po-tags-search
) command searches for the next
+occurrence of a string which looks like a possible candidate for
+translation, and displays the program source in another Emacs window,
+positioned in such a way that the string is near the top of this other
+window. If the string is too big to fit whole in this window, it is
+positioned so only its end is shown. In any case, the cursor
+is left in the PO file window. If the shown string would be better
+presented differently in different native languages, you may mark it
+using M-, or M-.. Otherwise, you might rather ignore it
+and skip to the next string by merely repeating the , command.
+
+
+A string is a good candidate for translation if it contains a sequence +of three or more letters. A string containing at most two letters in +a row will be considered as a candidate if it has more letters than +non-letters. The command disregards strings containing no letters, +or isolated letters only. It also disregards strings within comments, +or strings already marked with some keyword PO mode knows (see below). + +
++If you have never told Emacs about some `TAGS' file to use, the +command will request that you specify one from the minibuffer, the +first time you use the command. You may later change your `TAGS' +file by using the regular Emacs command M-x visit-tags-table, +which will ask you to name the precise `TAGS' file you want +to use. See section `Tag Tables' in The Emacs Editor. + +
++Each time you use the , command, the search resumes from where it was +left by the previous search, and goes through all program sources, +obeying the `TAGS' file, until all sources have been processed. +However, by giving a prefix argument to the command (C-u +,), you may request that the search be restarted all over again +from the first program source; but in this case, strings that you +recently marked as translatable will be automatically skipped. + +
+
+Using this , command does not prevent the use of other regular
+Emacs tags commands. For example, regular tags-search
or
+tags-query-replace
commands may be used without disrupting the
+independent , search sequence. However, as implemented, the
+initial , command (or the , command used with a
+prefix) might also reinitialize the regular Emacs tags searching to the
+first tags file; this reinitialization might be considered spurious.
+
+
+The M-, (po-mark-translatable
) command will mark the
+recently found string with the `_' keyword. The M-.
+(po-select-mark-and-mark
) command will request that you type
+one keyword from the minibuffer and use that keyword for marking
+the string. Both commands will automatically create a new PO file
+untranslated entry for the string being marked, and make it the
+current entry (making it easy for you to immediately proceed to its
+translation, if you feel like doing it right away). It is possible
+that the modifications made to the program source by M-, or
+M-. render some source line longer than 80 columns, forcing you
+to break and re-indent this line differently. You may use the O
+command from PO mode, or any other window changing command from
+GNU Emacs, to break out into the program source window, and do any
+needed adjustments. You will have to use some regular Emacs command
+to return the cursor to the PO file window, if you want to use the
+, command for the next string, say.
+
+
+The M-. command has a few built-in speedups, so you do not +have to explicitly type all keywords all the time. The first such +speedup is that you are presented with a preferred keyword, +which you may accept by merely typing RET at the prompt. +The second speedup is that you may type any non-ambiguous prefix of the +keyword you really mean, and the command will complete it automatically +for you. This also means that PO mode has to know all +your possible keywords, and that it will not accept mistyped keywords. + +
++If you reply ? to the keyword request, the command gives a +list of all known keywords, from which you may choose. When the +command is prefixed by an argument (C-u M-.), it inhibits +updating any program source or PO file buffer, and does some simple +keyword management instead. In this case, the command asks for a +keyword, written in full, which becomes a new allowed keyword for +later M-. commands. Moreover, this new keyword automatically +becomes the preferred keyword for later commands. By typing +an already known keyword in response to C-u M-., one merely +changes the preferred keyword and does nothing more. + +
++All keywords known for M-. are recognized by the , command +when scanning for strings, and strings already marked by any of those +known keywords are automatically skipped. If many PO files are opened +simultaneously, each one has its own independent set of known keywords. +There is no provision in PO mode, currently, for deleting a known +keyword; you have to quit the file (maybe using q) and reopen +it afresh. When a PO file is newly brought up in an Emacs window, only +`gettext' and `_' are known as keywords, and `gettext' +is preferred for the M-. command. In fact, it is not useful to +prefer `_', as this one is already built into the M-, command. + +
+ + +
+In C programs strings are often used within calls of functions from the
+printf
family. The special thing about these format strings is
+that they can contain format specifiers introduced with %. Assume
+we have the code
+
+
+printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s)); ++ +
+A possible German translation for the above string might be: + +
+ ++"%d Zeichen lang ist die Zeichenkette `%s'" ++ +
+A C programmer, even if he cannot speak German, will recognize that
+there is something wrong here. The order of the two format specifiers
+has changed, but of course the order of the arguments in the printf
call has not.
+This will most probably lead to problems because now the length of the
+string is treated as the address of a string.
+
+
+To prevent errors at runtime caused by translations the msgfmt
+tool can check statically whether the arguments in the original and the
+translation string match in type and number. If this is not the case a
+warning will be given and the error cannot cause problems at runtime.
+
+
+If the word order in the above German translation is to be kept, one +would have to write + +
+ ++"%2$d Zeichen lang ist die Zeichenkette `%1$s'" ++ +
+The routines in msgfmt
know about this special notation.
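The effect of the numbered notation can be tried out with a small sketch like the one below. The helper name `describe_string' is hypothetical, and the code assumes a `printf' implementation that supports the POSIX `%n$' argument numbering, which ISO C alone does not guarantee. The format is passed in as a parameter to stand for whatever `gettext' would return.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: FORMAT stands for the (possibly translated)
   format string.  The arguments are always passed in the same order:
   first the string, then its length.  */
int
describe_string (char *buffer, size_t size,
                 const char *format, const char *s)
{
  return snprintf (buffer, size, format, s, (int) strlen (s));
}
```

With the original format "String `%s' has %d characters" and the string "abc", the buffer receives "String `abc' has 3 characters". With the translated format "%2$d Zeichen lang ist die Zeichenkette `%1$s'", the same call yields "3 Zeichen lang ist die Zeichenkette `abc'", although the argument order in the C source has not changed.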
+
+
+Because not all strings in a program must be format strings it is not
+useful for msgfmt
to test all the strings in the `.po' file.
+This might cause problems because the string might contain what looks
+like a format specifier, but the string is not used in printf
.
+
+
+Therefore xgettext
adds a special tag to those messages it
+thinks might be a format string. There is no absolute rule for this,
+only a heuristic. In the `.po' file the entry is marked using the
+c-format
flag in the #, comment line (see section The Format of PO Files).
+
+
+The careful reader now might say that this again can cause problems.
+The heuristic might guess wrong. This is true and therefore
+xgettext
knows about a special kind of comment which lets
+the programmer take over the decision. If, on the same line as or
+the line immediately preceding the gettext
keyword,
+the xgettext
program finds a comment containing the words
+xgettext:c-format, it will mark the string in any case with
+the c-format flag. This kind of comment should be used when
+xgettext
does not recognize the string as a format string but
+it really is one and it should be tested. Please note that when the
+comment is on the same line as the gettext
keyword, it must be
+before the string to be translated.
+
+
+This situation happens quite often. The printf
function is often
+called with strings which do not contain a format specifier. Of course
+one would normally use fputs
but it does happen. In this case
+xgettext
does not recognize this as a format string but what
+happens if the translation introduces a valid format specifier? The
+printf
function will try to access one of the parameters but none
+exists because the original code does not refer to any parameter.
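A minimal sketch of this situation, with an invented function name `format_done' and message text: the string contains no format specifier, yet it is passed as a format, so the comment asks xgettext to flag it as c-format and thereby lets msgfmt check the translation.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical example: the message has no `%' in it, but it is used
   as a format string, so a translation must not introduce one.  */
int
format_done (char *buffer, size_t size)
{
  /* xgettext:c-format */
  return snprintf (buffer, size, gettext ("Operation completed\n"));
}
```

The comment sits on the line immediately preceding the `gettext' keyword, which is one of the two placements described above.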
+
+
+xgettext
of course could make a wrong decision the other way
+round: a string marked as a format string might not really be a format
+string. In this case the msgfmt
might give too many warnings and
+would prevent translating the `.po' file. The method to prevent
+this wrong decision is similar to the one used above, only the comment
+to use must contain the string xgettext:no-c-format.
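The opposite case might be sketched like this. The function name `usage_note' and the message are invented; the string contains a `%' that could be mistaken for the start of a format specifier, but it is only ever printed verbatim, never used as a format.

```c
#include <libintl.h>

/* Hypothetical example: the `%' here is ordinary text.  The comment
   tells xgettext not to treat the string as a format string.  */
const char *
usage_note (void)
{
  /* xgettext:no-c-format */
  return gettext ("Disk usage is shown in % of capacity");
}
```

A caller would print the result with something like `fputs', so no format checking of the translation is needed.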
+
+
+If a string is marked with c-format and this is not correct the
+user can find out who is responsible for the decision. See section Invoking the xgettext
Program to see how the --debug option can be used for solving
+this problem.
+
+
+The attentive reader might now point out that it is not always possible
+to mark translatable strings with gettext
or something like this.
+Consider the following case:
+
+
+{
+  static const char *messages[] = {
+    "some very meaningful message",
+    "and another one"
+  };
+  const char *string;
+  ...
+  string
+    = index > 1 ? "a default message" : messages[index];
+
+  fputs (string, stdout);
+  ...
+}
+
+While it is no problem to mark the string "a default message"
it
+is not possible to mark the string initializers for messages
.
+What is to be done? We have to fulfil two tasks. First we have to mark the
+strings so that the xgettext
program (see section Invoking the xgettext
Program)
+can find them, and second we have to translate the strings at runtime
+before printing them.
+
+
+The first task can be fulfilled by creating a new keyword, which names a +no-op. For the second we have to mark all access points to a string +from the array. So one solution can look like this: + +
+
+#define gettext_noop(String) (String)
+
+{
+  static const char *messages[] = {
+    gettext_noop ("some very meaningful message"),
+    gettext_noop ("and another one")
+  };
+  const char *string;
+  ...
+  string
+    = index > 1 ? gettext ("a default message") : gettext (messages[index]);
+
+  fputs (string, stdout);
+  ...
+}
+
+Please convince yourself that the string which is written by
+fputs
is translated in any case. How to make xgettext
recognize
+the additional keyword gettext_noop
is explained in section Invoking the xgettext
Program.
+
+
+The above is of course not the only solution. You could also come up +with the following one: + +
+
+#define gettext_noop(String) (String)
+
+{
+  static const char *messages[] = {
+    gettext_noop ("some very meaningful message"),
+    gettext_noop ("and another one")
+  };
+  const char *string;
+  ...
+  string
+    = index > 1 ? gettext_noop ("a default message") : messages[index];
+
+  fputs (gettext (string), stdout);
+  ...
+}
+
+But this has some drawbacks. First the programmer has to take care that
+he uses gettext_noop
for the string "a default message"
.
+A use of gettext
could have in rare cases unpredictable results.
+The second reason is found in the internals of the GNU gettext
+Library which will make this solution less efficient.
+
+
+One advantage is that you need not perform control flow analysis to make +sure the output is really translated in any case. But this analysis is +generally not very difficult. If it should prove difficult in some situation, you can +use this second method there. + +
++