<!-- This HTML file has been created by texi2html 1.54
from gettext.texi on 25 January 1999 -->
<TITLE>GNU gettext utilities - Preparing Program Sources</TITLE>
<link href="gettext_4.html" rel=Next>
<link href="gettext_2.html" rel=Previous>
<link href="gettext_toc.html" rel=ToC>
<p>Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_12.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
<H1><A NAME="SEC13" HREF="gettext_toc.html#TOC13">Preparing Program Sources</A></H1>
<P>
For the programmer, changes to the C source code fall into three
categories. First, you have to make the localization functions
known to all modules needing message translation. Second, you should
properly trigger the operation of GNU <CODE>gettext</CODE> when the program
initializes, usually from the <CODE>main</CODE> function. Last, you should
identify and especially mark all constant strings in your program
needing translation.
<P>
Presuming that your set of programs, or package, has been adjusted
so all needed GNU <CODE>gettext</CODE> files are available, and your
<TT>`Makefile'</TT> files are adjusted (see section <A HREF="gettext_10.html#SEC67">The Maintainer's View</A>), each C module
with translatable strings should contain the line:

<PRE>
#include &lt;libintl.h&gt;
</PRE>
<P>
The remaining changes to your C sources are discussed in the following
sections of this chapter.
<H2><A NAME="SEC14" HREF="gettext_toc.html#TOC14">Triggering <CODE>gettext</CODE> Operations</A></H2>
<P>
The initialization of locale data should be done with more or less
the same code in every program, as demonstrated below:

<PRE>
setlocale (LC_ALL, "");
bindtextdomain (PACKAGE, LOCALEDIR);
textdomain (PACKAGE);
</PRE>
<P>
<VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> should be provided either by
<TT>`config.h'</TT> or by the Makefile. For now, consult the <CODE>gettext</CODE>
sources for more information.
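<P>
As a minimal sketch, the three calls above can be wrapped in a helper that <CODE>main</CODE> calls before printing any message. The helper name <CODE>init_i18n</CODE> and the fallback values <CODE>"myprog"</CODE> and <CODE>"/usr/local/share/locale"</CODE> are hypothetical; in a real package <CODE>PACKAGE</CODE> and <CODE>LOCALEDIR</CODE> come from <TT>`config.h'</TT> or the Makefile.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical fallbacks; normally config.h or the Makefile
   (for example via -DPACKAGE=... -DLOCALEDIR=...) provides these.  */
#ifndef PACKAGE
# define PACKAGE "myprog"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Call once, early in main.  Returns the name of the text domain now
   in effect, or NULL on failure.  */
static const char *
init_i18n (void)
{
  /* Take the locale categories from the environment (LANG, LC_ALL,
     LC_MESSAGES, ...).  */
  setlocale (LC_ALL, "");
  /* Tell gettext where the compiled catalogs for PACKAGE live.  */
  bindtextdomain (PACKAGE, LOCALEDIR);
  /* Make PACKAGE the domain that later gettext calls look in.  */
  return textdomain (PACKAGE);
}
```

A program would call <CODE>init_i18n ()</CODE> as one of the first statements of <CODE>main</CODE>. As long as no catalog for the domain is installed, <CODE>gettext</CODE> simply returns its argument, so the original strings are printed.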
<P>
The use of <CODE>LC_ALL</CODE> might not be appropriate for you.
<CODE>LC_ALL</CODE> includes all locale categories and especially
<CODE>LC_CTYPE</CODE>. This latter category is responsible for determining
character classes with the <CODE>isalnum</CODE> etc. functions from
<TT>`ctype.h'</TT>, which could be wrong, especially for programs that
process some kind of input language. For example, this would mean that
source code using the &ccedil; (c-cedilla character) is runnable in
France but not in the U.S.
<P>
Some systems also have problems with parsing numbers using the
<CODE>scanf</CODE> functions if a locale other than the <CODE>"C"</CODE> locale is used.
The standards say that formats other than the one known in the
<CODE>"C"</CODE> locale might be recognized. But some systems seem to reject
numbers in the <CODE>"C"</CODE> locale format. In some situations, the
notation itself can also be a problem, making it impossible to
recognize whether the number is in the <CODE>"C"</CODE> locale or the local
format. This can happen if thousands separator characters are used.
Some locales define this character according to the national
conventions as <CODE>'.'</CODE>, which is the same character used in the
<CODE>"C"</CODE> locale to denote the decimal point.
<P>
So it is sometimes necessary to replace the <CODE>LC_ALL</CODE> line in the
code above by a sequence of <CODE>setlocale</CODE> lines

<PRE>
setlocale (LC_TIME, "");
setlocale (LC_MESSAGES, "");
</PRE>
<P>
or to switch the category in question back and forth as needed. On all
POSIX-conformant systems the locale categories <CODE>LC_CTYPE</CODE>,
<CODE>LC_COLLATE</CODE>, <CODE>LC_MONETARY</CODE>, <CODE>LC_NUMERIC</CODE>, and
<CODE>LC_TIME</CODE> are available. On some modern systems there is also a
locale category <CODE>LC_MESSAGES</CODE>, which on some older, XPG2-compliant
systems is called <CODE>LC_RESPONSES</CODE>.
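<P>
The two alternatives just described might look like the following sketch. <CODE>init_i18n_selective</CODE> localizes only dates and messages. <CODE>parse_c_number</CODE> keeps <CODE>LC_ALL</CODE> but switches <CODE>LC_NUMERIC</CODE> to the <CODE>"C"</CODE> locale around a <CODE>scanf</CODE>-family call and restores it afterwards, which addresses the number-parsing problem mentioned earlier. Using <CODE>"C"</CODE> as the temporary setting is an assumption of this sketch. Both function names and the fallback <CODE>PACKAGE</CODE>/<CODE>LOCALEDIR</CODE> values are hypothetical.

```c
#include <libintl.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef PACKAGE
# define PACKAGE "myprog"                     /* hypothetical fallback */
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"  /* hypothetical fallback */
#endif

/* Alternative 1: localize only dates and messages.  LC_CTYPE,
   LC_NUMERIC and the other categories keep the "C" locale that every
   program starts with.  */
static void
init_i18n_selective (void)
{
  setlocale (LC_TIME, "");
  setlocale (LC_MESSAGES, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
}

/* Alternative 2: switch LC_NUMERIC to "C" only around a scanf-family
   call, then back.  Returns 1 and stores the number in *VALUE on
   success, 0 if S does not start with a number.  */
static int
parse_c_number (const char *s, double *value)
{
  /* The string setlocale returns may be overwritten by a later call,
     so keep a private copy of the current LC_NUMERIC name.  */
  const char *current = setlocale (LC_NUMERIC, NULL);
  char *saved = NULL;
  int ok;

  if (current != NULL && (saved = malloc (strlen (current) + 1)) != NULL)
    strcpy (saved, current);

  setlocale (LC_NUMERIC, "C");
  ok = sscanf (s, "%lf", value) == 1;

  if (saved != NULL)            /* restore the previous setting */
    {
      setlocale (LC_NUMERIC, saved);
      free (saved);
    }
  return ok;
}
```

<CODE>setlocale</CODE> changes process-wide state. The switch-and-restore approach is therefore not safe when other threads may call locale-dependent functions at the same time.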
<H2><A NAME="SEC15" HREF="gettext_toc.html#TOC15">How Marks Appear in Sources</A></H2>
<P>
All strings requiring translation should be marked in the C sources. Marking
is done in such a way that each translatable string appears to be
the sole argument of some function or preprocessor macro. There are
only a few such possible functions or macros meant for translation,
and their names are said to be marking keywords. The marking is
attached to strings themselves, rather than to what we do with them.
This approach has several advantages. A clear example is an error message
produced by formatting. The format string needs translation, as do
some strings inserted through a <SAMP>`%s'</SAMP> specification
in the format, while the result from <CODE>sprintf</CODE> may have so many
different instances that it is impractical to list them all in some
<SAMP>`error_string_out()'</SAMP> routine, say.
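<P>
Here is a small sketch of this idea. It uses the <SAMP>`_'</SAMP> shorthand for <CODE>gettext</CODE> that is discussed further down in this section. The format string and each string inserted through <SAMP>`%s'</SAMP> are marked, but the formatted result is not. The function <CODE>format_open_error</CODE> and its messages are hypothetical.

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* The format and the inserted reason are marked one by one; the text
   snprintf produces is not, since its possible values are unbounded.  */
static int
format_open_error (char *buf, size_t size, const char *file, int denied)
{
  const char *reason = denied ? _("permission denied") : _("no such file");

  return snprintf (buf, size, _("cannot open `%s': %s"), file, reason);
}
```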
<P>
This marking operation has two goals. The first goal of marking
is to trigger the retrieval of the translation at run time.
The keyword is possibly resolved into a routine able to dynamically
return the proper translation, as far as possible or wanted, for the
argument string. Most localizable strings are found in executable
positions, that is, attached to variables or given as parameters to
functions. But this is not universal usage, and some translatable
strings appear in structured initializations. See section <A HREF="gettext_3.html#SEC18">Special Cases of Translatable Strings</A>.
<P>
The second goal of the marking operation is to help <CODE>xgettext</CODE>
properly extract all translatable strings when it scans a set
of program sources and produces PO file templates.
<P>
The canonical keyword for marking translatable strings is
<SAMP>`gettext'</SAMP>; it gave its name to the whole GNU <CODE>gettext</CODE>
package. For packages making only light use of the <SAMP>`gettext'</SAMP>
keyword, macro or function, it is easily used <EM>as is</EM>. However,
for packages using the <CODE>gettext</CODE> interface more heavily, it
is usually more convenient to give the main keyword a shorter, less
obtrusive name. Indeed, the keyword might appear on a lot of strings
all over the package, and programmers usually neither want nor need
their program sources to remind them forcefully, all the time, that they
are internationalized. Further, a long keyword has the disadvantage
of using more horizontal space, forcing more indentation work on
sources for those trying to keep them within 79 or 80 columns.
<P>
Many packages use <SAMP>`_'</SAMP> (a simple underline) as a keyword,
and write <SAMP>`_("Translatable string")'</SAMP> instead of <SAMP>`gettext
("Translatable string")'</SAMP>. Further, the rule from the GNU coding
standards requiring a space between the keyword and the opening
parenthesis is relaxed, in practice, for this particular usage.
So, the textual overhead per translatable string is reduced to
only three characters: the underline and the two parentheses.
However, even if GNU <CODE>gettext</CODE> uses this convention internally,
it does not offer it officially. The real, genuine keyword is truly
<SAMP>`gettext'</SAMP> indeed. It is fairly easy for those wanting to use
<SAMP>`_'</SAMP> instead of <SAMP>`gettext'</SAMP> to declare:
<PRE>
#include &lt;libintl.h&gt;
#define _(String) gettext (String)
</PRE>

instead of merely using <SAMP>`#include &lt;libintl.h&gt;'</SAMP>.
<P>
Later on, the maintenance is relatively easy. If, as a programmer,
you add or modify a string, you will have to ask yourself if the
new or altered string requires translation, and include it within
<SAMP>`_()'</SAMP> if you think it should be translated. <SAMP>`"%s: %d"'</SAMP> is
an example of a string <EM>not</EM> requiring translation!
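<P>
For illustration, here is a hypothetical helper <CODE>format_count</CODE>. It marks the sentence it produces in verbose mode and leaves the bare <SAMP>`"%s: %d"'</SAMP> layout unmarked. It is only a sketch that relies on the <SAMP>`_'</SAMP> definition shown above.

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* The sentence contains words and is marked; "%s: %d" contains
   nothing to translate and is left as a plain literal.  */
static int
format_count (char *buf, size_t size, const char *name, int count,
              int verbose)
{
  if (verbose)
    return snprintf (buf, size, _("Total for %s: %d"), name, count);
  return snprintf (buf, size, "%s: %d", name, count);
}
```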
<H2><A NAME="SEC16" HREF="gettext_toc.html#TOC16">Marking Translatable Strings</A></H2>
<P>
In PO mode, one set of features is meant more for the programmer than
for the translator, and allows him to interactively mark which strings,
in a set of program sources, are translatable, and which are not.
Even if it is a fairly easy job for a programmer to find and mark
such strings by other means, using any editor of his choice, PO mode
makes this work more comfortable. Further, this gives translators
who feel a little like programmers, or programmers who feel a little
like translators, a tool letting them mark translatable
strings in the program sources while simultaneously producing a set of
translations in some language for the package being internationalized.
<P>
The set of program sources targeted by the PO mode commands described
here should have an Emacs tags table constructed for your project
before you use these PO file commands. This is easy to do. In any
shell window, change the directory to the root of your project, then
execute a command resembling:

<PRE>
etags src/*.[hc] lib/*.[hc]
</PRE>
presuming here you want to process all <TT>`.h'</TT> and <TT>`.c'</TT> files
from the <TT>`src/'</TT> and <TT>`lib/'</TT> directories. This command will
explore all these files and create a <TT>`TAGS'</TT> file in your root
directory, somewhat summarizing the contents using a special file
format Emacs can understand.
<P>
For packages following the GNU coding standards, there is
a make target <CODE>tags</CODE> or <CODE>TAGS</CODE> which constructs the tag files in
all directories and for all files containing source code.
<P>
Once your <TT>`TAGS'</TT> file is ready, the following commands assist
the programmer in marking translatable strings in his set of sources.
But these commands are necessarily driven from within a PO file
window, and it is likely that you do not even have such a PO file yet.
This is not a problem at all, as you may safely open a new, empty PO
file, mainly for using these commands. This empty PO file will slowly
fill in while you mark strings as translatable in your program sources.
<DL COMPACT>

<DT><KBD>,</KBD>
<DD>
Search through program sources for a string which looks like a
candidate for translation.

<DT><KBD>M-,</KBD>
<DD>
Mark the last string found with <SAMP>`_()'</SAMP>.

<DT><KBD>M-.</KBD>
<DD>
Mark the last string found with a keyword taken from a set of possible
keywords. This command with a prefix allows some management of these
keywords.

</DL>
<P>
The <KBD>,</KBD> (<CODE>po-tags-search</CODE>) command searches for the next
occurrence of a string which looks like a possible candidate for
translation, and displays the program source in another Emacs window,
positioned in such a way that the string is near the top of this other
window. If the string is too big to fit whole in this window, it is
positioned so only its end is shown. In any case, the cursor
is left in the PO file window. If the shown string would be better
presented differently in different native languages, you may mark it
using <KBD>M-,</KBD> or <KBD>M-.</KBD>. Otherwise, you might rather ignore it
and skip to the next string by merely repeating the <KBD>,</KBD> command.
<P>
A string is a good candidate for translation if it contains a sequence
of three or more letters. A string containing at most two letters in
a row will be considered as a candidate if it has more letters than
non-letters. The command disregards strings containing no letters,
or isolated letters only. It also disregards strings within comments,
or strings already marked with some keyword PO mode knows (see below).
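<P>
The candidate rule can be expressed as a short C function. This is only a sketch of the rule as stated here, not the Emacs Lisp code PO mode actually uses. Treating a "letter" as anything <CODE>isalpha</CODE> accepts in the current C locale is an assumption. Skipping comments and already-marked strings is left out, and the name <CODE>looks_translatable</CODE> is hypothetical.

```c
#include <ctype.h>

/* Returns nonzero if S looks like a candidate for translation,
   following the letter-counting rule described above.  */
static int
looks_translatable (const char *s)
{
  int letters = 0, others = 0, run = 0, longest_run = 0;

  for (; *s != '\0'; s++)
    {
      if (isalpha ((unsigned char) *s))
        {
          letters++;
          run++;
          if (run > longest_run)
            longest_run = run;
        }
      else
        {
          others++;
          run = 0;              /* a non-letter breaks the sequence */
        }
    }

  if (longest_run >= 3)
    return 1;                   /* three or more letters in a row */
  if (longest_run <= 1)
    return 0;                   /* no letters, or isolated letters only */
  return letters > others;      /* at most two in a row: majority rule */
}
```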
<P>
If you have never told Emacs about some <TT>`TAGS'</TT> file to use, the
command will request that you specify one from the minibuffer, the
first time you use the command. You may later change your <TT>`TAGS'</TT>
file by using the regular Emacs command <KBD>M-x visit-tags-table</KBD>,
which will ask you to name the precise <TT>`TAGS'</TT> file you want
to use. See section `Tag Tables' in <CITE>The Emacs Editor</CITE>.
<P>
Each time you use the <KBD>,</KBD> command, the search resumes from where
the previous search left off, and goes through all program sources,
obeying the <TT>`TAGS'</TT> file, until all sources have been processed.
However, by giving a prefix argument to the command (<KBD>C-u ,</KBD>),
you may request that the search be restarted all over again
from the first program source; but in this case, strings that you
recently marked as translatable will be automatically skipped.
<P>
Using this <KBD>,</KBD> command does not prevent the use of other regular
Emacs tags commands. For example, regular <CODE>tags-search</CODE> or
<CODE>tags-query-replace</CODE> commands may be used without disrupting the
independent <KBD>,</KBD> search sequence. However, as implemented, the
<EM>initial</EM> <KBD>,</KBD> command (or the <KBD>,</KBD> command when used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.
<P>
The <KBD>M-,</KBD> (<CODE>po-mark-translatable</CODE>) command will mark the
recently found string with the <SAMP>`_'</SAMP> keyword. The <KBD>M-.</KBD>
(<CODE>po-select-mark-and-mark</CODE>) command will request that you type
one keyword from the minibuffer and use that keyword for marking
the string. Both commands will automatically create a new untranslated
PO file entry for the string being marked, and make it the
current entry (making it easy for you to immediately proceed to its
translation, if you feel like doing it right away). It is possible
that the modifications made to the program source by <KBD>M-,</KBD> or
<KBD>M-.</KBD> render some source line longer than 80 columns, forcing you
to break and re-indent this line differently. You may use the <KBD>O</KBD>
command from PO mode, or any other window-changing command from
GNU Emacs, to break out into the program source window and make any
needed adjustments. You will have to use some regular Emacs command
to return the cursor to the PO file window if you want to use the
<KBD>,</KBD> command for the next string, say.
<P>
The <KBD>M-.</KBD> command has a few built-in speedups, so you do not
have to explicitly type all keywords all the time. The first such
speedup is that you are presented with a <EM>preferred</EM> keyword,
which you may accept by merely typing <KBD>RET</KBD> at the prompt.
The second speedup is that you may type any unambiguous prefix of the
keyword you really mean, and the command will complete it automatically
for you. This also means that PO mode has to <EM>know</EM> all
your possible keywords, and that it will not accept mistyped keywords.
<P>
If you reply <KBD>?</KBD> to the keyword request, the command gives a
list of all known keywords, from which you may choose. When the
command is prefixed by an argument (<KBD>C-u M-.</KBD>), it inhibits
updating any program source or PO file buffer, and does some simple
keyword management instead. In this case, the command asks for a
keyword, written in full, which becomes a new allowed keyword for
later <KBD>M-.</KBD> commands. Moreover, this new keyword automatically
becomes the <EM>preferred</EM> keyword for later commands. By typing
an already known keyword in response to <KBD>C-u M-.</KBD>, one merely
changes the <EM>preferred</EM> keyword and does nothing more.
<P>
All keywords known for <KBD>M-.</KBD> are recognized by the <KBD>,</KBD> command
when scanning for strings, and strings already marked by any of those
known keywords are automatically skipped. If many PO files are opened
simultaneously, each one has its own independent set of known keywords.
There is currently no provision in PO mode for deleting a known
keyword; you have to quit the file (maybe using <KBD>q</KBD>) and reopen
it afresh. When a PO file is newly brought up in an Emacs window, only
<SAMP>`gettext'</SAMP> and <SAMP>`_'</SAMP> are known as keywords, and <SAMP>`gettext'</SAMP>
is preferred for the <KBD>M-.</KBD> command. In fact, it is not useful to
prefer <SAMP>`_'</SAMP>, as this one is already built into the <KBD>M-,</KBD> command.
<H2><A NAME="SEC17" HREF="gettext_toc.html#TOC17">Special Comments preceding Keywords</A></H2>
<P>
In C programs, strings are often used within calls of functions from the
<CODE>printf</CODE> family. The special thing about these format strings is
that they can contain format specifiers introduced with <KBD>%</KBD>. Assume
we have code like

<PRE>
printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
</PRE>
<P>
A possible German translation for the above string might be:

<PRE>
"%d Zeichen lang ist die Zeichenkette `%s'"
</PRE>
<P>
A C programmer, even if he cannot speak German, will recognize that
there is something wrong here. The order of the two format specifiers
has changed, but of course the arguments in the <CODE>printf</CODE> call have not.
This will most probably lead to problems, because now the length of the
string is regarded as the address of the string.
<P>
To prevent errors at runtime caused by translations, the <CODE>msgfmt</CODE>
tool can check statically whether the arguments in the original and the
translation string match in type and number. If this is not the case, a
warning is given, so the error cannot cause problems at runtime.
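<P>
If the word order in the above German translation is correct, one would
have to write

<PRE>
"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
</PRE>

<P>
Here <SAMP>`%2$d'</SAMP> refers to the second argument (the length) and
<SAMP>`%1$s'</SAMP> to the first (the string), so the arguments are consumed
correctly even though the order of the words has changed.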
<P>
The routines in <CODE>msgfmt</CODE> know about this special notation.
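<P>
The following simplified sketch makes this kind of check concrete. It records which kind of argument each position consumes, honouring the <SAMP>`%<VAR>n</VAR>$'</SAMP> notation, and compares the result for the original and the translation. It is not <CODE>msgfmt</CODE>'s implementation: it understands only a handful of conversions, flags and modifiers. All names (<CODE>arg_kind</CODE>, <CODE>collect_args</CODE>, <CODE>formats_compatible</CODE>) are hypothetical.

```c
#include <ctype.h>
#include <string.h>

#define MAX_ARGS 9   /* positions "%1$" .. "%9$" in this sketch */

/* Map a conversion letter to the kind of argument it consumes.
   Only a small subset of printf conversions is recognized.  */
static char
arg_kind (char conversion)
{
  switch (conversion)
    {
    case 'd': case 'i': case 'o': case 'u': case 'x': case 'X':
      return 'i';                       /* an int */
    case 'e': case 'f': case 'g':
      return 'f';                       /* a double */
    case 'c':
      return 'c';                       /* a character */
    case 's':
      return 's';                       /* a string */
    default:
      return 0;                         /* not handled here */
    }
}

/* Store in KINDS the argument kind used at each position of FMT,
   honouring the "%N$" notation.  Returns the number of arguments
   consumed, or -1 if FMT is malformed, inconsistent or unsupported.  */
static int
collect_args (const char *fmt, char kinds[MAX_ARGS])
{
  int count = 0, next = 0;

  memset (kinds, 0, MAX_ARGS);
  for (; *fmt != '\0'; fmt++)
    {
      int pos = next;
      char kind;

      if (*fmt != '%')
        continue;
      if (*++fmt == '%')
        continue;                       /* "%%" is a literal percent sign */

      if (isdigit ((unsigned char) *fmt))
        {
          const char *p = fmt;
          int n = 0;

          while (isdigit ((unsigned char) *p) && n < 100)
            n = n * 10 + (*p++ - '0');
          if (*p == '$')                /* "%2$d" names argument 2 */
            {
              if (n < 1 || n > MAX_ARGS)
                return -1;
              pos = n - 1;
              fmt = p + 1;
            }
        }
      /* Skip flags, field width, precision and 'h'/'l' modifiers.  */
      while (*fmt != '\0' && strchr ("-+ #0123456789.hl", *fmt) != NULL)
        fmt++;

      kind = arg_kind (*fmt);           /* '\0' also yields 0 */
      if (kind == 0 || pos >= MAX_ARGS
          || (kinds[pos] != 0 && kinds[pos] != kind))
        return -1;
      kinds[pos] = kind;
      next = pos + 1;
      if (next > count)
        count = next;
    }
  return count;
}

/* Both strings must consume the same number of arguments, with the
   same kind of argument at every position.  */
static int
formats_compatible (const char *msgid, const char *msgstr)
{
  char a[MAX_ARGS], b[MAX_ARGS];
  int na = collect_args (msgid, a);
  int nb = collect_args (msgstr, b);

  return na >= 0 && na == nb && memcmp (a, b, MAX_ARGS) == 0;
}
```

With the strings from this section, <CODE>formats_compatible</CODE> rejects the first German translation, whose arguments are swapped. It accepts the version using <SAMP>`%2$d'</SAMP> and <SAMP>`%1$s'</SAMP>.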
<P>
Because not all strings in a program are format strings, it is not
useful for <CODE>msgfmt</CODE> to test all the strings in the <TT>`.po'</TT> file.
This might cause problems, because a string might contain what looks
like a format specifier although the string is not used in <CODE>printf</CODE>.
<P>
Therefore <CODE>xgettext</CODE> adds a special tag to those messages it
thinks might be format strings. There is no absolute rule for this,
only a heuristic. In the <TT>`.po'</TT> file the entry is marked using the
<CODE>c-format</CODE> flag in the <KBD>#,</KBD> comment line (see section <A HREF="gettext_2.html#SEC9">The Format of PO Files</A>).
<P>
The careful reader now might say that this again can cause problems.
The heuristic might guess wrong. This is true, and therefore
<CODE>xgettext</CODE> knows about a special kind of comment which lets
the programmer take over the decision. If, in the same line as or in
the line immediately preceding the <CODE>gettext</CODE> keyword,
the <CODE>xgettext</CODE> program finds a comment containing the words
<KBD>xgettext:c-format</KBD>, it will mark the string in any case with
the <KBD>c-format</KBD> flag. This kind of comment should be used when
<CODE>xgettext</CODE> does not recognize the string as a format string but
it really is one and should be tested. Please note that when the
comment is in the same line as the <CODE>gettext</CODE> keyword, it must be
before the string to be translated.
<P>
This situation happens quite often. The <CODE>printf</CODE> function is often
called with strings which do not contain a format specifier. Of course
one would normally use <CODE>fputs</CODE>, but it does happen. In this case
<CODE>xgettext</CODE> does not recognize this as a format string, but what
happens if the translation introduces a valid format specifier? The
<CODE>printf</CODE> function will try to access one of the parameters, but none
exists because the original code does not pass any.
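<P>
Here is a hypothetical example of this situation. The message below contains no format specifier, yet it is passed to <CODE>fprintf</CODE> as the format. The comment on the line immediately preceding the keyword is meant to make <CODE>xgettext</CODE> flag the entry <CODE>c-format</CODE>. That way <CODE>msgfmt</CODE> can catch a translation that introduces a specifier. The function <CODE>print_banner</CODE> is only an illustration.

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* The message has no directive but is still used as a printf format,
   so the special comment asks for the c-format flag anyway.  */
static int
print_banner (FILE *out)
{
  /* xgettext:c-format */
  return fprintf (out, _("Welcome to the demo\n"));
}
```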
<P>
Of course, <CODE>xgettext</CODE> could also make a wrong decision the other way
round: a string marked as a format string may not really be a format
string. In this case <CODE>msgfmt</CODE> might give too many warnings and
would prevent translating the <TT>`.po'</TT> file. The method to prevent
this wrong decision is similar to the one used above; only the comment
must contain the string <KBD>xgettext:no-c-format</KBD>.
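<P>
As a hypothetical example, the following message is written with <CODE>fputs</CODE> and is never used as a format. However, the text <SAMP>`90% full'</SAMP> contains <SAMP>`% f'</SAMP>, which <CODE>printf</CODE> would read as a directive (a space flag followed by the <SAMP>`f'</SAMP> conversion). A heuristic may therefore take it for a format string, and the <KBD>xgettext:no-c-format</KBD> comment overrides that decision. The function name is only an illustration.

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Written with fputs, never used as a format, even though "% f"
   inside "90% full" looks like a printf directive.  */
static int
warn_disk_full (FILE *out)
{
  /* xgettext:no-c-format */
  return fputs (_("Warning: disk is 90% full\n"), out);
}
```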
<P>
If a string is marked with <KBD>c-format</KBD> and this is not correct, the
user can find out who is responsible for the decision. See section <A HREF="gettext_4.html#SEC20">Invoking the <CODE>xgettext</CODE> Program</A> to see how the <KBD>--debug</KBD> option can be used for solving this problem.
<H2><A NAME="SEC18" HREF="gettext_toc.html#TOC18">Special Cases of Translatable Strings</A></H2>
<P>
The attentive reader might now point out that it is not always possible
to mark translatable strings with <CODE>gettext</CODE> or something like this.
Consider the following case:
<PRE>
static const char *messages[] = {
  "some very meaningful message",
  "and another one"
};
const char *string
  = index &gt; 1 ? "a default message" : messages[index];
fputs (string, stdout);
</PRE>
<P>
While it is no problem to mark the string <CODE>"a default message"</CODE>, it
is not possible to mark the string initializers for <CODE>messages</CODE>.
What is to be done? We have to fulfil two tasks. First we have to mark the
strings so that the <CODE>xgettext</CODE> program (see section <A HREF="gettext_4.html#SEC20">Invoking the <CODE>xgettext</CODE> Program</A>)
can find them, and second we have to translate the strings at runtime
before printing them.
<P>
The first task can be fulfilled by creating a new keyword, which names a
no-op. For the second we have to mark all access points to a string
from the array. So one solution can look like this:
<PRE>
#define gettext_noop(String) (String)

static const char *messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};
const char *string
  = index &gt; 1 ? gettext ("a default message") : gettext (messages[index]);
fputs (string, stdout);
</PRE>
<P>
Please convince yourself that the string which is written by
<CODE>fputs</CODE> is translated in any case. How to make <CODE>xgettext</CODE> aware of
the additional keyword <CODE>gettext_noop</CODE> is explained in section <A HREF="gettext_4.html#SEC20">Invoking the <CODE>xgettext</CODE> Program</A>.
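<P>
Put together as a self-contained sketch, the first solution might look like this. The helper <CODE>message_for</CODE> is hypothetical. It checks the index against the array size instead of hard-coding <SAMP>`index &gt; 1'</SAMP>, so the test stays correct if messages are added.

```c
#include <libintl.h>
#include <stddef.h>

/* No-op marker: expands to its argument unchanged, so the array holds
   the original strings.  It only marks the literals for xgettext.  */
#define gettext_noop(String) (String)

#define N_MESSAGES 2

static const char *const messages[N_MESSAGES] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};

/* Every access point translates at run time, whichever branch is
   taken.  */
static const char *
message_for (size_t index)
{
  return index < N_MESSAGES ? gettext (messages[index])
                            : gettext ("a default message");
}
```

A caller would write, for example, <CODE>fputs (message_for (i), stdout);</CODE>, and the printed text is translated whichever branch was taken.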
<P>
The above is of course not the only solution. You could also come up
with the following one:
<PRE>
#define gettext_noop(String) (String)

static const char *messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};
const char *string
  = index &gt; 1 ? gettext_noop ("a default message") : messages[index];
fputs (gettext (string), stdout);
</PRE>
<P>
But this has some drawbacks. First, the programmer has to take care to
use <CODE>gettext_noop</CODE> for the string <CODE>"a default message"</CODE>.
Using <CODE>gettext</CODE> there could, in rare cases, have unpredictable results,
because the already translated string would then be passed through
<CODE>gettext</CODE> a second time. Second, the internals of the GNU
<CODE>gettext</CODE> library make this solution less efficient.
<P>
One advantage is that you need not perform control flow analysis to make
sure the output is really translated in every case. But this analysis is
generally not very difficult. If it should be difficult in some situation,
you can use this second method there.