1<HTML>
2<HEAD>
3<!-- This HTML file has been created by texi2html 1.51
4 from gettext.texi on 4 September 1998 -->
5
6<TITLE>GNU gettext utilities</TITLE>
7</HEAD>
8<BODY>
9<H1>GNU gettext tools, version 0.10</H1>
10<H2>Native Language Support Library and Tools</H2>
11<H2>Edition 0.10, 26 November</H2>
12<ADDRESS>Ulrich Drepper</ADDRESS>
13<ADDRESS>Jim Meyering</ADDRESS>
14<ADDRESS>Fran&ccedil;ois Pinard</ADDRESS>
15<P>
16<P><HR><P>
17
18<P>
19Copyright (C) 1995 Free Software Foundation, Inc.
20
21</P>
22<P>
23Permission is granted to make and distribute verbatim copies of
24this manual provided the copyright notice and this permission notice
25are preserved on all copies.
26
27</P>
28<P>
29Permission is granted to copy and distribute modified versions of this
30manual under the conditions for verbatim copying, provided that the entire
31resulting derived work is distributed under the terms of a permission
32notice identical to this one.
33
34</P>
35<P>
36Permission is granted to copy and distribute translations of this manual
37into another language, under the above conditions for modified versions,
38except that this permission notice may be stated in a translation approved
39by the Foundation.
40
41</P>
42
43
44
45<H1><A NAME="SEC1" HREF="gettext_toc.html#TOC1">Introduction</A></H1>
46
47
48<BLOCKQUOTE>
49<P>
50This manual is still in <EM>DRAFT</EM> state. Some sections are still
51empty, or almost. We keep merging material from other sources
52(essentially email folders) while the proper integration of this
53material is delayed.
54</BLOCKQUOTE>
55
56<P>
57In this manual, we use <EM>he</EM> when speaking of the programmer or
58maintainer, <EM>she</EM> when speaking of the translator, and <EM>they</EM>
59when speaking of the installers or end users of the translated program.
60This is only a convenience for clarifying the documentation. It is
61absolutely not meant to imply that some roles are more appropriate
62to males or females. Besides, as you might guess, GNU <CODE>gettext</CODE>
63is meant to be useful for people using computers, whatever their sex,
64race, religion or nationality!
65
66</P>
67<P>
68This chapter explains the goals sought by the mere existence
69of GNU <CODE>gettext</CODE>. Then, it explains a few broad concepts around
70Native Language Support, and situates message translation with regard
71to other aspects of national and cultural variance, as applicable
72to programs. It also surveys the files used to convey
73translations. It explains how the various tools interrelate in the
74initial generation of these files, and later, how the maintenance
75cycle usually operates.
76
77</P>
78
79
80
81<H2><A NAME="SEC2" HREF="gettext_toc.html#TOC2">The Purpose of GNU <CODE>gettext</CODE></A></H2>
82
83<P>
84Usually, programs are written and documented in English, and use
85English at execution time for interacting with users. This is true
86not only from within GNU, but also in a great deal of commercial
87and free software. Using a common language is quite handy for
88communication between developers, maintainers and users from all
89countries. On the other hand, most people are less comfortable with
90English than with their own native language, and would rather
91use their mother tongue for day-to-day work, as far as possible.
92Many would simply <EM>love</EM> seeing their computer screen showing
93a lot less English, and far more of their own spoken language.
94
95</P>
96<P>
97However, to some people, this dream might appear so far-fetched that
98they may believe it is not even worth spending time thinking about
99it, and they have no confidence at all that the dream might ever
100come true. Many have not lost hope yet, and have organized themselves.
101The GNU Translation Project is a formalization of this hope into a
102workable structure, which has a good chance of getting all of us nearer
103the achievement of a truly multi-lingual set of programs.
104
105</P>
106<P>
107GNU <CODE>gettext</CODE> is an important step for the GNU Translation
108Project, as it is an asset on which we may build many other steps.
109This package offers programmers, translators and even users a
110well integrated set of tools and documentation. Specifically, the GNU
111<CODE>gettext</CODE> utilities are a set of tools that provides a framework
112to help other GNU packages produce multi-lingual messages. These tools
113include a set of conventions about how programs should be written to
114support message catalogs, a directory and file naming organization
115for the message catalogs themselves, a runtime library supporting the
116retrieval of translated messages, and a few stand-alone programs to
117massage in various ways the sets of translatable strings, or already
118translated strings. A special GNU Emacs mode also helps interested
119parties prepare these sets, or bring them up to date.
120
121</P>
122<P>
123GNU <CODE>gettext</CODE> is designed to minimize the impact of
124internationalization on program sources, keeping this impact as small
125and hardly noticeable as possible. Internationalization has better
126chances of succeeding if it is very lightweight, or at least
127appears to be so, when looking at program sources.
128
129</P>
130<P>
131The GNU Translation Project also uses the GNU <CODE>gettext</CODE>
132distribution as a vehicle for documenting its structure and methods,
133even if this goes beyond the technicalities of GNU <CODE>gettext</CODE>
134proper. By doing so, translators will find in a single place, as
135far as possible, all they need to know for properly doing their
136translating work. This supplementary documentation might also
137help programmers, and even curious users, understand how GNU
138<CODE>gettext</CODE> is related to the remainder of the GNU Translation
139Project, and consequently, get a glimpse of the <EM>big picture</EM>.
140
141</P>
142
143
144<H2><A NAME="SEC3" HREF="gettext_toc.html#TOC3">I18n, L10n, and Such</A></H2>
145
146<P>
147Two long words appear all the time when we discuss support of native
148language in programs, and these words have a precise meaning, worth
149being explained here, once and for all in this document. The words are
150<EM>internationalization</EM> and <EM>localization</EM>. Many people,
151tired of writing these long words over and over again, took the
152habit of writing <STRONG>i18n</STRONG> and <STRONG>l10n</STRONG> instead, quoting the first
153and last letter of each word, and replacing the run of intermediate
154letters by a number merely telling how many such letters there are.
155But in this manual, for the sake of clarity, we will patiently write
156the names in full, each time...
157
158</P>
159<P>
160By <STRONG>internationalization</STRONG>, one refers to the operation by which a
161program, or a set of programs turned into a package, is made aware and
162able to support multiple languages. This is a generalization process,
163by which the programs are untied from using only English strings or
164other English specific habits, and connected to generic ways of doing
165the same, instead. Program developers may use various techniques to
166internationalize their programs, some of which have been standardized.
167GNU <CODE>gettext</CODE> offers one of these standards. See section <A HREF="gettext.html#SEC36">The Programmer's View</A>.
168
169</P>
170<P>
171By <STRONG>localization</STRONG>, one means the operation by which, in a set
172of programs already internationalized, one gives the program all
173needed information so that it can bend itself to handle its input
174and output in a fashion which is correct for some native language and
175cultural habits. This is a particularisation process, by which generic
176methods already implemented in an internationalized program are used
177in specific ways. The programming environment puts several functions
178at the programmer's disposal which allow this runtime configuration.
179The formal description of a specific set of cultural habits for some
180country, together with all associated translations targeted to the
181same native language, is called the <STRONG>locale</STRONG> for this language
182or country. Users achieve localization of programs by setting proper
183values to special environment variables, prior to executing those
184programs, identifying which locale should be used.
185
186</P>
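<P>
As a minimal sketch of that runtime configuration, assuming only the
standard C <CODE>setlocale</CODE> function, a program can ask the C library
to adopt whatever locale the user's environment variables select. The
helper name <CODE>adopt_user_locale</CODE> is hypothetical and is not part
of GNU <CODE>gettext</CODE>.

</P>

<PRE>
#include &#60;locale.h&#62;
#include &#60;stddef.h&#62;

/* Hypothetical helper: adopt the locale named by the user's environment
   and return the name of the locale now in effect.  */
const char *
adopt_user_locale (void)
{
  /* An empty name asks the C library to consult the environment.
     If the requested locale is not installed, the call fails and the
     previous locale (normally "C") stays in effect, which is why the
     result is deliberately ignored in this sketch.  */
  (void) setlocale (LC_ALL, "");

  /* A null pointer queries the current setting without changing it.  */
  return setlocale (LC_ALL, NULL);
}
</PRE>

<P>
Calling such a helper once, early in <CODE>main</CODE>, is usually enough;
the returned name is only informative and its exact spelling depends on
the system.

</P>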
187<P>
188In fact, locale message support is only one component of the cultural
189data that makes up a particular locale. There is a whole host of
190routines and functions provided to aid programmers in developing
191internationalized software, which allow them to access the data
192stored in a particular locale. When someone presently refers to a
193particular locale, they are obviously referring to the data stored
194within that particular locale. Similarly, if a programmer is referring
195to "accessing the locale routines", they are referring to the
196complete suite of routines that access all of the locale's information.
197
198</P>
199<P>
200One uses the expression <STRONG>Native Language Support</STRONG>, or merely NLS,
201for speaking of the overall activity or feature encompassing both
202internationalization and localization, allowing for multi-lingual
203interactions in a program. In a nutshell, one could say that
204internationalization is the operation by which further localizations
205are made possible.
206
207</P>
208<P>
209Also, very roughly said, when it comes to multi-lingual messages,
210internationalization is usually taken care of by programmers, and
211localization is usually taken care of by translators.
212
213</P>
214
215
216<H2><A NAME="SEC4" HREF="gettext_toc.html#TOC4">Aspects in Native Language Support</A></H2>
217
218<P>
219For a totally multi-lingual distribution, there are many things to
220translate beyond output messages.
221
222</P>
223
224<UL>
225<LI>
226
227As of today, GNU <CODE>gettext</CODE> offers a complete toolset for
228translating messages output by C programs. Perl scripts and shell
229scripts also need to be translated. Even if there are some hooks
230so this can be done, these hooks are not integrated as well as they
231should be.
232
233<LI>
234
235Some programs, like <CODE>autoconf</CODE> or <CODE>bison</CODE>, are able
236to produce other programs (or scripts). Even if the generating
237programs themselves are internationalized, the generated programs they
238produce may need internationalization on their own, and this indirect
239internationalization could be automated right from the generating
240program. In fact, quite usually, generating and generated programs
241could be internationalized independently, as the effort needed is
242fairly orthogonal.
243
244<LI>
245
246A few programs include textual tables which might need translation
247themselves, independently of the strings contained in the program
248itself. For example, RFC 1345 gives an English description for each
249character which GNU <CODE>recode</CODE> is able to reconstruct at execution.
250Since these descriptions are extracted from the RFC by mechanical means,
251translating them properly would require a prior translation of the RFC
252itself.
253
254<LI>
255
256Almost all programs accept options, which are often worded so as to
257be descriptive for English readers; one might want to consider
258offering translated versions for program options as well.
259
260<LI>
261
262Many programs read, interpret, compile, or are somewhat driven by
263input files which are texts containing keywords, identifiers, or
264replies which are inherently translatable. For example, one may want
265<CODE>gcc</CODE> to allow diacriticized characters in identifiers or use
266translated keywords; <SAMP>`rm -i'</SAMP> might accept something other than
267<SAMP>`y'</SAMP> or <SAMP>`n'</SAMP> for replies, etc. Even if the program will
268eventually make most of its output in the foreign languages, one has
269to decide whether the input syntax, option values, etc., are to be
270localized or not.
271
272<LI>
273
274The manual accompanying a package, as well as all documentation files
275in the distribution, could surely be translated, too. Translating a
276manual, with the intent of later keeping up with updates, is a major
277undertaking in itself, generally.
278
279</UL>
280
281<P>
282As we already stressed, translation is only one aspect of locales.
283Other internationalization aspects are not currently handled by GNU
284<CODE>gettext</CODE>, but perhaps may be handled in future versions. There
285are many attributes that are needed to define a country's cultural
286conventions. These attributes include, besides the country's native
287language, the formatting of the date and time, the representation of
288numbers, the symbols for currency, etc. These local <STRONG>rules</STRONG> are
289termed the country's locale. The locale represents the knowledge
290needed to support the country's native attributes.
291
292</P>
293<P>
294There are a few major areas which may vary between countries and
295hence, define what a locale must describe. The following list helps
296put multi-lingual messages into the proper context of other tasks
297related to locales, and also presents some other areas which GNU
298<CODE>gettext</CODE> might eventually tackle, maybe, one of these days.
299
300</P>
301<DL COMPACT>
302
303<DT><EM>Characters and Codesets</EM>
304<DD>
305The codeset most commonly used throughout the USA and most English
306speaking parts of the world is the ASCII codeset. However, there are
307many characters needed by various locales that are not found within
308this codeset. The 8-bit ISO 8859-1 codeset has most of the special
309characters needed to handle the major European languages. However, in
310many cases, the ISO 8859-1 codeset is not adequate. Hence each locale
311will need to specify which codeset it needs to use and will need
312to have the appropriate character handling routines to cope with
313the codeset.
314
315<DT><EM>Currency</EM>
316<DD>
317The symbols used vary from country to country, as does the position
318of the symbol. Software needs to be able to transparently
319display currency figures in the native mode for each locale.
320
321<DT><EM>Dates</EM>
322<DD>
323The format of dates varies between locales. For example, Christmas day
324in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.
325Other countries might use ISO 8601 dates, etc.
326
327Time of the day may be noted as <VAR>hh</VAR>:<VAR>mm</VAR>, <VAR>hh</VAR>.<VAR>mm</VAR>,
328or otherwise. Some locales require time to be specified in 24-hour
329mode rather than as AM or PM. Further, the nature and yearly extent
330of the Daylight Saving correction vary widely between countries.
331
332<DT><EM>Numbers</EM>
333<DD>
334Numbers can be represented differently in different locales.
335For example, the following numbers are all written correctly for
336their respective locales:
337
338
339<PRE>
34012,345.67 English
34112.345,67 French
3421,2345.67 Asia
343</PRE>
344
345Some programs could go further and use different unit systems, like
346English units or Metric units, or even take into account variants
347about how numbers are spelled in full.
348
349<DT><EM>Messages</EM>
350<DD>
351The most obvious area is the language support within a locale. This is
352where GNU <CODE>gettext</CODE> provides a means for developers and users to
353easily change the language that the software uses to communicate with
354the user.
355
356</DL>
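<P>
The numeric conventions listed above can be inspected at run time. The
following is a minimal sketch in C, assuming only the standard
<CODE>localeconv</CODE> interface; the two helper names are hypothetical
and are not part of GNU <CODE>gettext</CODE>.

</P>

<PRE>
#include &#60;locale.h&#62;

/* Hypothetical helper: decimal separator of the locale in effect.  */
const char *
current_decimal_point (void)
{
  /* localeconv() describes the numeric conventions of the active locale.  */
  const struct lconv *conventions = localeconv ();
  return conventions-&#62;decimal_point;
}

/* Hypothetical helper: separator used between groups of digits.
   An empty string means that digits are not grouped.  */
const char *
current_thousands_sep (void)
{
  const struct lconv *conventions = localeconv ();
  return conventions-&#62;thousands_sep;
}
</PRE>

<P>
In the <SAMP>`C'</SAMP> locale, the first helper returns <SAMP>`.'</SAMP>
and the second returns an empty string. Other locales may report other
separators once <CODE>setlocale</CODE> has selected them.

</P>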
357
358<P>
359In the near future we see no chance that, besides message handling,
360more components of locale will be made available for use in other
361GNU packages. The reason for this is that most modern systems provide
362more or less reasonable support for at least some of the missing
363components. Another point is that the GNU libc and Linux will get
364a new and complete implementation of the whole locale functionality
365which could be adopted by systems lacking reasonable locale support.
366
367</P>
368
369
370<H2><A NAME="SEC5" HREF="gettext_toc.html#TOC5">Files Conveying Translations</A></H2>
371
372<P>
373The letters PO in <TT>`.po'</TT> files mean Portable Object, to
374distinguish them from <TT>`.mo'</TT> files, where MO stands for Machine
375Object. This paradigm, as well as the PO file format, is inspired
376by the NLS standard developed by Uniforum, and implemented by Sun
377in their Solaris system.
378
379</P>
380<P>
381PO files are meant to be read and edited by humans, and associate each
382original, translatable string of a given package with its translation
383in a particular target language. A single PO file is dedicated to
384a single target language. If a package supports many languages,
385there is one such PO file per language supported, and each package
386has its own set of PO files. These PO files are best created by
387the <CODE>xgettext</CODE> program, and later updated or refreshed through
388the <CODE>tupdate</CODE> program. Program <CODE>xgettext</CODE> extracts all
389marked messages from a set of C files and initializes a PO file with
390empty translations. Program <CODE>tupdate</CODE> takes care of adjusting
391PO files between releases of the corresponding sources, commenting
392obsolete entries, initializing new ones, and updating all source
393line references. Files ending with <TT>`.pot'</TT> are a kind of base
394translation file found in distributions, in PO file format, and
395<TT>`.pox'</TT> files are often temporary PO files.
396
397</P>
398<P>
399MO files are meant to be read by programs, and are binary in nature.
400A few systems already offer tools for creating and handling MO files
401as part of the Native Language Support coming with the system, but the
402format of these MO files is often different from system to system,
403and non-portable. They do not necessarily use <TT>`.mo'</TT> for file
404extensions, but since system libraries are also used for accessing
405these files, it works as long as the system is self-consistent about
406it. If GNU <CODE>gettext</CODE> is able to interface with the tools already
407provided with systems, it will consequently let these provided tools
408take care of generating the MO files. Or else, if such tools are not
409found or do not seem usable, GNU <CODE>gettext</CODE> will use its own ways
410and its own format for MO files. Files ending with <TT>`.gmo'</TT> are
411really MO files, when it is known that these files use the GNU format.
412
413</P>
414
415
416<H2><A NAME="SEC6" HREF="gettext_toc.html#TOC6">Overview of GNU <CODE>gettext</CODE></A></H2>
417
418<P>
419The following diagram summarizes the relation between the files
420handled by GNU <CODE>gettext</CODE> and the tools acting on these files.
421It is followed by somewhat detailed explanations, which you should
422read while keeping an eye on the diagram. Having a clear understanding
423of these interrelations would surely help programmers, translators
424and maintainers.
425
426</P>
427
428<PRE>
429Original C Sources ---&#62; PO mode ---&#62; Marked C Sources ---.
430                                                         |
431              .---------&#60;--- GNU gettext Library         |
432.--- make &#60;---+                                          |
433|             `---------&#60;--------------------+-----------'
434|                                            |
435|   .-----&#60;--- PACKAGE.pot &#60;--- xgettext &#60;---'   .---&#60;--- PO Compendium
436|   |                                            |             ^
437|   |                                            `---.         |
438|   `---.                                            +---&#62; PO mode ---.
439|       +----&#62; tupdate -------&#62; LANG.pox ---&#62;--------'                |
440|   .---'                                                             |
441|   |                                                                 |
442|   `-------------&#60;---------------.                                   |
443|                                 +--- LANG.po &#60;--- New LANG.pox &#60;----'
444|   .--- LANG.gmo &#60;--- msgfmt &#60;---'
445|   |
446|   `---&#62; install ---&#62; /.../LANG/PACKAGE.mo ---.
447|                                              +---&#62; "Hello world!"
448`-------&#62; install ---&#62; /.../bin/PROGRAM -------'
449</PRE>
450
451<P>
452The indication <SAMP>`PO mode'</SAMP> appears in two places in this picture,
453and you may safely read it as merely meaning "hand editing", using
454any editor of your choice, really. However, for those of you who are
455the lucky users of GNU Emacs, PO mode has been specifically created
456for providing a cosy environment for editing or modifying PO files.
457While editing a PO file, PO mode allows for the easy browsing of
458auxiliary and compendium PO files, as well as following references into
459the set of C program sources from which PO files have been derived.
460It has a few special features, among which are the interactive marking
461of program strings as translatable, and the validation of PO files
462with easy repositioning to PO file lines showing errors.
463
464</P>
465<P>
466As a programmer, the first step in bringing GNU <CODE>gettext</CODE>
467into your package is identifying, right in the C sources, which
468strings are meant to be translatable, and which are untranslatable.
469This tedious job can be done a little more comfortably using PO
470mode, but you can use any means familiar to you for modifying your
471C sources. Some other simple, standard changes are also needed to
472properly initialize the translation library. See section <A HREF="gettext.html#SEC13">Preparing Program Sources</A>, for
473more information about all this.
474
475</P>
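<P>
The two kinds of source changes described above can be sketched as
follows. This is only an illustration, assuming a system whose C library
provides <TT>`libintl.h'</TT>; the domain name
<SAMP>`example-package'</SAMP>, the catalog directory and the function
names are hypothetical, and the <CODE>_</CODE> macro is a common
convention rather than a requirement.

</P>

<PRE>
#include &#60;libintl.h&#62;
#include &#60;locale.h&#62;

/* Conventional shorthand that marks a string as translatable.  */
#define _(String) gettext (String)

/* Hypothetical names, chosen only for this sketch.  */
#define PACKAGE "example-package"
#define LOCALEDIR "/usr/local/share/locale"

/* Initialize the translation library once, early in main().  */
void
init_translations (void)
{
  setlocale (LC_ALL, "");               /* honor the user's environment */
  bindtextdomain (PACKAGE, LOCALEDIR);  /* where the catalogs are installed */
  textdomain (PACKAGE);                 /* default catalog for gettext() */
}

/* A marked string: gettext() returns the translation when a catalog
   provides one, and the original string otherwise.  */
const char *
greeting (void)
{
  return _("Hello world!");
}
</PRE>

<P>
When no catalog is installed for the chosen domain, <CODE>greeting</CODE>
simply returns the original English string, so a program prepared this
way keeps working before any translation exists.

</P>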
476<P>
477Once the C sources have been modified, the <CODE>xgettext</CODE> program
478is used to find and extract all translatable strings, and create an
479initial PO file out of all these. This <TT>`<VAR>package</VAR>.pot'</TT> file
480contains all original program strings. It has sets of pointers to
481exactly where in C sources each string is used, and all translations
482are set to empty. The letter <KBD>t</KBD> in <TT>`.pot'</TT> marks that this is
483a Template PO file, not yet oriented towards any particular language.
484See section <A HREF="gettext.html#SEC19">Invoking the <CODE>xgettext</CODE> Program</A>, for more details about how one calls the
485<CODE>xgettext</CODE> program. If you are <EM>really</EM> lazy, you might
486be interested in working a lot more right away, and preparing the
487whole distribution setup (see section <A HREF="gettext.html#SEC65">The Maintainer's View</A>). By doing so, you
488spare typing the <CODE>xgettext</CODE> command yourself, as <CODE>make</CODE>
489should now generate the proper things automatically for you!
490
491</P>
492<P>
493The first time through, there is no <TT>`<VAR>lang</VAR>.po'</TT> yet, so the
494<CODE>tupdate</CODE> step may be skipped and replaced by a mere copy of
495<TT>`<VAR>package</VAR>.pot'</TT> to <TT>`<VAR>lang</VAR>.pox'</TT>, where <VAR>lang</VAR>
496represents the target language.
497
498</P>
499<P>
500Then comes the initial translation of messages. Translation in
501itself is a whole matter, still exclusively meant for humans,
502and whose complexity far exceeds the scope of this manual.
503Nevertheless, a few hints are given in some other chapter of this
504manual (see section <A HREF="gettext.html#SEC54">The Translator's View</A>). You will also find there indications
505about how to contact translating teams, or become part of them,
506for sharing your translating concerns with others who target the same
507native language.
508
509</P>
510<P>
511While adding the translated messages into the <TT>`<VAR>lang</VAR>.pox'</TT>
512PO file, if you do not have GNU Emacs handy, you are on your own
513for ensuring that you fully respect the PO file format and quoting
514conventions (see section <A HREF="gettext.html#SEC9">The Format of PO Files</A>). This is surely not an impossible task,
515as this is the way many people handled PO files already for Uniforum or
516Solaris. On the other hand, using PO mode in GNU Emacs, most details
517of PO file format are taken care of for you, but you have to acquire
518some familiarity with PO mode itself. Besides main PO mode commands
519(see section <A HREF="gettext.html#SEC10">Main Commands</A>), you should know how to move between entries
520(see section <A HREF="gettext.html#SEC11">Entry Positioning</A>), and how to handle untranslated entries
521(see section <A HREF="gettext.html#SEC24">Untranslated Entries</A>).
522
523</P>
524<P>
525If some common translations have already been saved into a compendium
526PO file, translators may use PO mode for initializing untranslated
527entries from the compendium, and also save selected translations into
528the compendium, updating it (see section <A HREF="gettext.html#SEC21">Using Translation Compendiums</A>). Compendium files
529are meant to be exchanged between members of a given translation team.
530
531</P>
532<P>
533Programs, or packages of programs, are dynamic in nature: users write
534bug reports and suggestions for improvement, and maintainers react by
535modifying programs in various ways. The fact that a package has
536already been internationalized should not make maintainers shy
537of adding new strings, or modifying strings already translated.
538They just do their job the best they can. For the GNU Translation
539Project to work smoothly, it is important that maintainers do not
540carry translation concerns on their already loaded shoulders, and that
541translators be kept as free as possible of programmatic concerns.
542
543</P>
544<P>
545The only concern maintainers should have is carefully marking new
546strings as translatable, when they should be, and not otherwise
547worrying about them being translated, as this will come in proper time.
548Consequently, when programs and their strings are adjusted in various
549ways by maintainers, and for matters usually unrelated to translation,
550<CODE>xgettext</CODE> would construct <TT>`<VAR>package</VAR>.pot'</TT> files which are
551evolving over time, so the translations carried by <TT>`<VAR>lang</VAR>.po'</TT>
552are slowly fading out of date.
553
554</P>
555<P>
556It is important for translators (and even maintainers) to understand
557that package translation is a continuous process in the lifetime of a
558package, and not something which is done once and for all at the start.
559After an initial burst of translation activity for a given package,
560interventions are needed once in a while, because here and there,
561translated entries become obsolete, and new untranslated entries
562appear, needing translation.
563
564</P>
565<P>
566The <CODE>tupdate</CODE> program has the purpose of refreshing an already
567existing <TT>`<VAR>lang</VAR>.po'</TT> file, by comparing it with a newer
568<TT>`<VAR>package</VAR>.pot'</TT> template file, extracted by <CODE>xgettext</CODE>
569out of recent C sources. The refreshing operation adjusts all
570references to C source locations for strings, since these strings
571move as programs are modified. Also, <CODE>tupdate</CODE> comments out as
572obsolete, in <TT>`<VAR>lang</VAR>.pox'</TT>, those already translated entries
573which are no longer used in the program sources (see section <A HREF="gettext.html#SEC25">Obsolete Entries</A>). It finally discovers new strings and inserts them in
574the resulting PO file as untranslated entries (see section <A HREF="gettext.html#SEC24">Untranslated Entries</A>). See section <A HREF="gettext.html#SEC23">Invoking the <CODE>tupdate</CODE> Program</A>, for more information about what
575<CODE>tupdate</CODE> really does.
576
577</P>
578<P>
579Whatever route or means taken, the goal is obtaining an updated
580<TT>`<VAR>lang</VAR>.pox'</TT> file offering translations for all strings.
581When this is properly achieved, this file <TT>`<VAR>lang</VAR>.pox'</TT> may
582take the place of the previous official <TT>`<VAR>lang</VAR>.po'</TT> file.
583
584</P>
585<P>
586The time mobility, or fluidity of PO files, is an integral part of
587the translation game, and should be well understood, and accepted.
588People resisting it will have a hard time participating in the GNU
589Translation Project, or will give a hard time to other participants!
590In particular, maintainers should relax and include all available PO
591files in their distributions, even if these have not recently been
592updated, without banging or otherwise trying to exert pressure on the
593translator teams to get the job done. The pressure should rather
594come from the community of users speaking a particular language,
595and maintainers should consider themselves fairly relieved of any
596concern about the adequacy of translation files. On the other hand,
597translators should reasonably try updating the PO files they are
598responsible for, while the package is undergoing pretest, prior to
599an official distribution.
600
601</P>
602<P>
603Once the PO file is complete and dependable, the <CODE>msgfmt</CODE> program
604is used for turning the PO file into a machine-oriented format, which
605may yield efficient retrieval of translations by the programs of the
606package, whenever needed at runtime (see section <A HREF="gettext.html#SEC31">The Format of GNU MO Files</A>). See section <A HREF="gettext.html#SEC30">Invoking the <CODE>msgfmt</CODE> Program</A>, for more information about all modalities of execution
607for the <CODE>msgfmt</CODE> program.
608
609</P>
610<P>
611Finally, the modified and marked C sources are compiled and linked
612with the GNU <CODE>gettext</CODE> library, usually through the operation of
613<CODE>make</CODE>, given a suitable <TT>`Makefile'</TT> exists for the project,
614and the resulting executable is installed somewhere users will find it.
615The MO files themselves should also be properly installed. Given the
616appropriate environment variables are set (see section <A HREF="gettext.html#SEC35">Magic for End Users</A>), the
617program should localize itself automatically, whenever it executes.
618
619</P>
620<P>
621The remainder of this manual has the purpose of detailing the various
622steps outlined in this section.
623
624</P>
625
626
627<H1><A NAME="SEC7" HREF="gettext_toc.html#TOC7">PO Files and PO Mode Basics</A></H1>
628
629<P>
630The GNU <CODE>gettext</CODE> toolset helps programmers and translators
631in producing, updating and using translation files, mainly those
632PO files which are textual, editable files. This chapter focuses
633on the format of PO files, and contains a PO mode starter. PO mode
634description is spread over this manual instead of being concentrated
635in one place; this chapter presents only the basics of PO mode.
636
637</P>
638
639
640
641<H2><A NAME="SEC8" HREF="gettext_toc.html#TOC8">Completing GNU <CODE>gettext</CODE> Installation</A></H2>
642
643<P>
644Once you have received, unpacked, configured and compiled the GNU
645<CODE>gettext</CODE> distribution, the <SAMP>`make install'</SAMP> command puts in
646place the programs <CODE>xgettext</CODE>, <CODE>msgfmt</CODE>, <CODE>gettext</CODE>, and
647<CODE>tupdate</CODE>, as well as their available message catalogs. For
648completing a comfortable installation, you might also want to make the
649PO mode available to your GNU Emacs users.
650
651</P>
652<P>
653To finish the installation of the PO mode, you might want to modify your
654file <TT>`.emacs'</TT>, once and for all, so it contains a few lines looking
655like:
656
657</P>
658
659<PRE>
660(setq auto-mode-alist
661 (cons '("\\.pox?\\'" . po-mode) auto-mode-alist))
662(autoload 'po-mode "po-mode")
663</PRE>
664
665<P>
666Later, whenever you edit some <TT>`.po'</TT> or <TT>`.pox'</TT> file, Emacs
667loads <TT>`po-mode.elc'</TT> (or <TT>`po-mode.el'</TT>) as needed, and
668automatically activates PO mode commands for the associated buffer.
669The string <EM>PO</EM> appears in the mode line for any buffer for
670which PO mode is active. Many PO files may be active at once in a
671single Emacs session.
672
673</P>
674
675
676<H2><A NAME="SEC9" HREF="gettext_toc.html#TOC9">The Format of PO Files</A></H2>
677
678<P>
679A PO file is made up of many entries, each entry holding the relation
680between an original untranslated string and its corresponding
681translation. All entries in a given PO file usually pertain
682to a single project, and all translations are expressed in a single
683target language. One PO file <STRONG>entry</STRONG> has the following schematic
684structure:
685
686</P>
687
<PRE>
<VAR>white-space</VAR>
# <VAR>translator-comments</VAR>
#. <VAR>automatic-comments</VAR>
#: <VAR>reference</VAR>...
msgid <VAR>untranslated-string</VAR>
msgstr <VAR>translated-string</VAR>
</PRE>
696
697<P>
698The general structure of a PO file should be well understood by
699the translator. When using PO mode, very little has to be known
700about the format details, as PO mode takes care of them for her.
701
702</P>
703<P>
704Entries begin with some optional white space. Usually, when generated
705through GNU <CODE>gettext</CODE> tools, there is exactly one blank line
between entries. Comments follow, on lines all starting with the
character <KBD>#</KBD>. There are two kinds of comments. Those with
some white space immediately following the <KBD>#</KBD> are created and
maintained exclusively by the translator. Those with some non-white
character just after the <KBD>#</KBD> are created and maintained
automatically by GNU <CODE>gettext</CODE> tools.
All comments, of any kind, are optional.
713
714</P>
715<P>
716After white space and comments, entries show two strings, giving
717first the untranslated string as it appears in the original program
718sources, and then, the translation of this string. The original
719string is introduced by the keyword <CODE>msgid</CODE>, and the translation,
720by <CODE>msgstr</CODE>. The two strings, untranslated and translated,
are quoted in various ways in the PO file, using <KBD>"</KBD>
delimiters and <KBD>\</KBD> escapes, but the translator does not really
have to pay attention to the precise quoting format, as PO mode is
fully intended to take care of quoting for her.
725
726</P>
727<P>
728The <CODE>msgid</CODE> strings, as well as automatic comments, are produced
729and managed by other GNU <CODE>gettext</CODE> tools, and PO mode does not
provide means for the translator to alter these. The most she can
do is delete them, and only by deleting the whole entry.
732On the other hand, the <CODE>msgstr</CODE> string, as well as translator
733comments, are really meant for the translator, and PO mode gives her
734the full control she needs.
735
736</P>
737<P>
Sometimes a few lines, usually white space or comments, follow the
very last entry of a PO file. Such lines are not part of any entry,
740and PO mode is unable to take action on those lines. By using the
741PO mode function <KBD>M-x po-normalize</KBD>, the translator may get
742rid of those spurious lines. See section <A HREF="gettext.html#SEC12">Normalizing Strings in Entries</A>.
743
744</P>
745<P>
746The remainder of this section may be safely skipped for those using
747PO mode, yet it may be interesting for everybody to have a better
idea of the precise format of a PO file. Those who do not have
GNU Emacs handy, on the other hand, should read on carefully.
750
751</P>
752<P>
753Each of <VAR>untranslated-string</VAR> and <VAR>translated-string</VAR> respects
754the C syntax for a character string, including the surrounding quotes
and embedded backslashed escape sequences. When the time comes
756to write multi-line strings, one should not use escaped newlines.
757Instead, a closing quote should follow the last character on the
758line to be continued, and an opening quote should resume the string
759at the beginning of the following PO file line. For example:
760
761</P>
762
<PRE>
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"
</PRE>
768
769<P>
In this example, the empty string is used on the first line so that
the <KBD>H</KBD> from the word <SAMP>`Here'</SAMP> aligns
over the <KBD>f</KBD> from the word <SAMP>`for'</SAMP>. The
<CODE>msgid</CODE> keyword is followed by three strings, which are meant
to be concatenated. Concatenating the empty string does not change
the resulting overall string, but it lets us satisfy the requirement
that <CODE>msgid</CODE> be followed by a string on the same
line, while keeping the multi-line presentation left-justified, which
we find to be a cleaner layout. The empty string could have
been omitted, but only if the string starting with <SAMP>`Here'</SAMP> were
moved up to the first line, right after <CODE>msgid</CODE>.<A NAME="DOCF1" HREF="gettext_foot.html#FOOT1">(1)</A> Nor was it really necessary
to switch between the last two quoted strings immediately after
the newline <SAMP>`\n'</SAMP>; the switch could have occurred after <EM>any</EM>
other character. We just did it this way because it is neater.
784
785</P>
786<P>
787One should carefully distinguish between end of lines marked as
788<SAMP>`\n'</SAMP> <EM>inside</EM> quotes, which are part of the represented
789string, and end of lines in the PO file itself, outside string quotes,
which have no effect on the represented string.
791
792</P>
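<P>
As a rough illustration of the concatenation rule and of the difference
between <SAMP>`\n'</SAMP> inside quotes and line ends in the file, here is a
minimal C sketch that appends one quoted segment at a time to a buffer.
The function name <CODE>po_append_segment</CODE> is made up for this example,
and only the escapes <SAMP>`\n'</SAMP>, <SAMP>`\t'</SAMP>, <SAMP>`\\'</SAMP> and
<SAMP>`\"'</SAMP> are handled; it is not how GNU <CODE>gettext</CODE> itself
reads PO files.
</P>

```c
#include <stddef.h>
#include <string.h>

/* Append the content of one quoted PO segment (a line such as
   "for the common case...\n", quotes included) to dst, which must
   already hold a NUL-terminated string.  The escapes \n, \t, \\ and \"
   are decoded.  Returns 0 on success, -1 when the line is not a
   well-formed quoted segment or dst (capacity cap) is too small.
   Hypothetical helper, for illustration only.  */
int
po_append_segment (char *dst, size_t cap, const char *line)
{
  size_t len = strlen (dst);
  const char *p = line;

  if (*p++ != '"')
    return -1;
  while (*p != '\0' && *p != '"')
    {
      char c = *p++;

      if (c == '\\')
        {
          switch (*p++)
            {
            case 'n': c = '\n'; break;
            case 't': c = '\t'; break;
            case '\\': c = '\\'; break;
            case '"': c = '"'; break;
            default: return -1;   /* other escapes are not handled here */
            }
        }
      if (len + 1 >= cap)
        return -1;
      dst[len++] = c;
      dst[len] = '\0';
    }
  /* The line end after the closing quote contributes nothing.  */
  return *p == '"' ? 0 : -1;
}
```

<P>
Calling this function once per quoted line of an entry, starting from an
empty buffer, yields the represented string: the empty first segment adds
nothing, and newlines appear only where <SAMP>`\n'</SAMP> was written inside
the quotes.
</P>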
793<P>
794Outside strings, white lines and comments may be used freely.
795Comments start at the beginning of a line with <SAMP>`#'</SAMP> and extend
796until the end of the PO file line. Comments written by translators
797should have the initial <SAMP>`#'</SAMP> immediately followed by some white
space. If the <SAMP>`#'</SAMP> is not immediately followed by white space,
the comment is most likely generated and managed by specialized GNU
tools, and might disappear or be replaced unexpectedly when the PO
file is given to <CODE>tupdate</CODE>.
802
803</P>
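<P>
The comment convention above can be checked mechanically. The following
C sketch classifies a single PO file line; the enumeration and the
function name <CODE>po_classify_comment</CODE> are invented for this example,
and treating a bare <SAMP>`#'</SAMP> as a translator comment is an
assumption, since the text above does not cover that case.
</P>

```c
#include <ctype.h>

/* Kinds of lines, as far as comments are concerned.  These names are
   hypothetical and exist only for this sketch.  */
enum po_comment_kind
{
  PO_NOT_A_COMMENT,
  PO_TRANSLATOR_COMMENT,   /* `#' followed by white space */
  PO_AUTOMATIC_COMMENT     /* `#' followed by anything else: `#.', `#:' */
};

/* Classify one line of a PO file, given without leading white space.  */
enum po_comment_kind
po_classify_comment (const char *line)
{
  if (line[0] != '#')
    return PO_NOT_A_COMMENT;
  /* Assumption: a `#' alone on its line belongs to the translator.  */
  if (line[1] == '\0' || isspace ((unsigned char) line[1]))
    return PO_TRANSLATOR_COMMENT;
  return PO_AUTOMATIC_COMMENT;
}
```

<P>
A tool that rewrites PO files could use such a test to decide which
comments it may regenerate and which it must leave alone.
</P>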
804
805
806<H2><A NAME="SEC10" HREF="gettext_toc.html#TOC10">Main Commands</A></H2>
807
808<P>
When Emacs finds a PO file in a window, PO mode is activated
for that window. This makes the window read-only and establishes
<CODE>po-mode-map</CODE>. PO mode is a genuine Emacs mode, in that it is
not derived from text mode in any way.
813
814</P>
815<P>
The main PO commands are those which do not fit into the other categories
of subsequent sections. They allow for quitting PO mode or managing windows
in special ways.
819
820</P>
821<DL COMPACT>
822
823<DT><KBD>u</KBD>
824<DD>
825Undo last modification to the PO file.
826
827<DT><KBD>q</KBD>
828<DD>
829Quit processing and save the PO file.
830
831<DT><KBD>o</KBD>
832<DD>
Temporarily leave the PO file window.
834
835<DT><KBD>h</KBD>
836<DD>
837Show help about PO mode.
838
839<DT><KBD>=</KBD>
840<DD>
841Give some PO file statistics.
842
843<DT><KBD>v</KBD>
844<DD>
845Batch validate the format of the whole PO file.
846
847</DL>
848
849<P>
The command <KBD>u</KBD> (<CODE>po-undo</CODE>) interfaces to the GNU Emacs
<EM>undo</EM> facility. See section `Undoing Changes' in <CITE>The Emacs Editor</CITE>. Each time <KBD>u</KBD> is typed, modifications the translator
made to the PO file are undone a little more. For the purpose of
undoing, each PO mode command is atomic. This is especially true for
the <KBD>RET</KBD> command: the whole edit made by a single
use of this command is undone at once, even if the edit itself
involved several actions. However, while in the editing window, one
can undo the editing work in much smaller steps.
858
859</P>
860<P>
861The command <KBD>q</KBD> (<CODE>po-quit</CODE>) is used when the translator is
862done with the PO file. If the file has been modified, it is saved
863on disk first. However, prior to all this, the command checks if
some untranslated message remains in the PO file and, if so, the
translator is asked if she really wants to stop working with this
PO file. This is the preferred way of getting rid of an Emacs PO
file buffer. Merely killing it through the usual command <KBD>C-x
k</KBD> (<CODE>kill-buffer</CODE>), say, has the unpleasant effect of leaving a PO
internal work buffer behind.
870
871</P>
872<P>
The command <KBD>o</KBD> (<CODE>po-other-window</CODE>) is another, softer
way to leave PO mode temporarily. It just moves the cursor to
some other Emacs window, and pops one up if necessary. For example, if
the translator just got PO mode to show some source context in some
other window, she might discover some apparent bug in the program source
that needs correction. This command allows the translator to change
roles, become a programmer, and have the cursor right in the window
containing the program she (or rather <EM>he</EM>) wants to modify.
By later getting the cursor back in the PO file window, or by
asking Emacs to edit this file once again, PO mode is then recovered.
883
884</P>
885<P>
886The command <KBD>h</KBD> (<CODE>po-help</CODE>) displays a summary of all
887available PO mode commands. The translator should then type any
888character to resume normal PO mode operations. The command <KBD>?</KBD>
889has the same effect as <KBD>h</KBD>.
890
891</P>
892<P>
893The command <KBD>=</KBD> (<CODE>po-statistics</CODE>) computes the total number
894of entries in the PO file, the ordinal of the current entry
895(counted from 1), the number of untranslated entries, the number of
896obsolete entries, and displays all these numbers.
897
898</P>
899<P>
900The command <KBD>v</KBD> (<CODE>po-validate</CODE>) launches <CODE>msgfmt</CODE> in
901verbose mode over the current PO file. This command first offers
902to save the current PO file on disk. The <CODE>msgfmt</CODE> tool, from
903GNU <CODE>gettext</CODE>, has the purpose of creating an MO file out of a
904PO file, and PO mode uses the features of this program for checking
905the overall format of a PO file, as well as all individual entries.
906
907</P>
908<P>
909The program <CODE>msgfmt</CODE> runs asynchronously with Emacs, so
910the translator regains control immediately while her PO file
911is being studied. Error output is collected in the GNU Emacs
912<SAMP>`*compilation*'</SAMP> buffer, displayed in another window. The regular
913GNU Emacs command <KBD>C-x`</KBD> (<CODE>next-error</CODE>), as well as other
914usual compile commands, allow the translator to reposition quickly to
the offending parts of the PO file. Once the cursor is on the line in
error, the translator may decide on any PO mode action which would
help correct the error.
918
919</P>
920
921
922<H2><A NAME="SEC11" HREF="gettext_toc.html#TOC11">Entry Positioning</A></H2>
923
924<P>
The cursor in a PO file window is almost always part of
an entry. The only exceptions are when the cursor
is after the last entry in the file, or when the PO file is
empty. The entry where the cursor is found is said to be the
current entry. Many PO mode commands operate on the current entry,
so moving the cursor does more than allow the translator to browse
the PO file; it also selects the entry on which commands operate.
932
933</P>
934<P>
Some PO mode commands alter the position of the cursor in a specialized
way. A few of these special-purpose positioning commands are described
here; the others are described in following sections.
938
939</P>
940<DL COMPACT>
941
942<DT><KBD>.</KBD>
943<DD>
944Redisplay the current entry.
945
946<DT><KBD>n</KBD>
947<DD>
948<DT><KBD>SPC</KBD>
949<DD>
950Select the entry after the current one.
951
952<DT><KBD>p</KBD>
953<DD>
954<DT><KBD>DEL</KBD>
955<DD>
956Select the entry before the current one.
957
958<DT><KBD>&#60;</KBD>
959<DD>
960Select the first entry in the PO file.
961
962<DT><KBD>&#62;</KBD>
963<DD>
964Select the last entry in the PO file.
965
966<DT><KBD>m</KBD>
967<DD>
968Record the location of the current entry for later use.
969
970<DT><KBD>l</KBD>
971<DD>
972Return to a previously saved entry location.
973
974<DT><KBD>x</KBD>
975<DD>
976Exchange the current entry location with the previously saved one.
977
978</DL>
979
980<P>
981Any GNU Emacs command able to reposition the cursor may be used
982to select the current entry in PO mode, including commands which
983move by characters, lines, paragraphs, screens or pages, and search
984commands. However, there is a kind of standard way to display the
985current entry in PO mode, which usual GNU Emacs commands moving
986the cursor do not especially try to enforce. The command <KBD>.</KBD>
987(<CODE>po-current-entry</CODE>) has the sole purpose of redisplaying the
988current entry properly, after the current entry has been changed by
989means external to PO mode, or the Emacs screen otherwise altered.
990
991</P>
992<P>
It is yet to be decided whether PO mode would help the translator, or
irritate her instead, by forcing a more fixed window disposition while she
995is doing her work. We originally had quite precise ideas about
996how windows should behave, but on the other hand, anyone used to
997GNU Emacs is often happy to keep full control. Maybe a fixed window
998disposition might be offered as a PO mode option that the translator
999might activate or deactivate at will, so it could be offered on an
1000experimental basis. If nobody feels a real need for using it, or
1001a compulsion for writing it, we might as well drop this whole idea.
1002The incentive for doing it should come from translators rather than
programmers, as opinions from an experienced translator are surely
worth more than opinions from programmers <EM>thinking</EM> about
how <EM>others</EM> should do translation.
1006
1007</P>
1008<P>
The commands <KBD>n</KBD> (<CODE>po-next-entry</CODE>) and <KBD>p</KBD>
(<CODE>po-previous-entry</CODE>) move the cursor to the entry following,
or preceding, the current one. If <KBD>n</KBD> is given while the
cursor is on the last entry of the PO file, or if <KBD>p</KBD>
is given while the cursor is on the first entry, no move is done.
<KBD>SPC</KBD> and <KBD>DEL</KBD> are alternate keys for <KBD>n</KBD> and
<KBD>p</KBD>, respectively.
1016
1017</P>
1018<P>
1019The commands <KBD>&#60;</KBD> (<CODE>po-first-entry</CODE>) and <KBD>&#62;</KBD>
1020(<CODE>po-last-entry</CODE>) move the cursor to the first entry, or last
1021entry, of the PO file. When the cursor is located past the last
1022entry in a PO file, most PO mode commands will return an error saying
1023<SAMP>`After last entry'</SAMP>. However, the commands <KBD>&#60;</KBD> and <KBD>&#62;</KBD>
have the special property of being able to work even when the cursor
is not within any PO file entry, and you may use them to
correct this situation. But even these commands will fail on a
1027truly empty PO file. There are development plans for PO mode for it
1028to interactively fill an empty PO file from sources. See section <A HREF="gettext.html#SEC16">Marking Translatable Strings</A>.
1029
1030</P>
1031<P>
The translator may decide, before working at the translation of
a particular entry, that she needs to browse the remainder of the
PO file, maybe to find the terminology or phraseology used
in related entries. She can of course use the standard Emacs idioms
for saving the current cursor location in some register and using that
register to get back, or else use the location ring.
1038
1039</P>
1040<P>
1041PO mode offers another approach, by which cursor locations may be saved
1042onto a special stack. The command <KBD>m</KBD> (<CODE>po-push-location</CODE>)
1043merely adds the location of current entry to the stack, pushing
1044the already saved locations under the new one. The command
<KBD>l</KBD> (<CODE>po-pop-location</CODE>) consumes the top stack element and
repositions the cursor to the entry associated with that top element.
This position is then lost, for the next <KBD>l</KBD> will move the cursor
to the previously saved location, and so on until no locations remain
on the stack.
1050
1051</P>
1052<P>
1053If the translator wants the position to be kept on the location stack,
1054maybe for taking a mere look at the entry associated with the top
1055element, then go elsewhere with the intent of getting back later, she
1056ought to use <KBD>m</KBD> immediately after <KBD>l</KBD>.
1057
1058</P>
1059<P>
The command <KBD>x</KBD> (<CODE>po-exchange-location</CODE>) simultaneously
repositions the cursor to the entry associated with the top element of
the stack of saved locations, and replaces that top element with the
location of the current entry before the move. Consequently, repeating
the <KBD>x</KBD> command alternates between two entries.
1065For achieving this, the translator will position the cursor on the
1066first entry, use <KBD>m</KBD>, then position to the second entry, and
1067merely use <KBD>x</KBD> for making the switch.
1068
1069</P>
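<P>
The behavior of <KBD>m</KBD>, <KBD>l</KBD> and <KBD>x</KBD> can be modeled with
a small stack. The C sketch below is only a model of what is described
above, not the Emacs Lisp code of PO mode. The structure, the function
names, the fixed capacity, and the choice of leaving the cursor in place
when the stack is empty are all assumptions made for the example.
</P>

```c
#include <stddef.h>

#define PO_STACK_MAX 32   /* arbitrary capacity for this sketch */

/* Saved entry positions; the top of the stack is the last element.  */
struct po_location_stack
{
  long positions[PO_STACK_MAX];
  size_t depth;
};

/* `m': save the current entry position.  Returns 0, or -1 if full.  */
int
po_push_location (struct po_location_stack *s, long current)
{
  if (s->depth == PO_STACK_MAX)
    return -1;
  s->positions[s->depth++] = current;
  return 0;
}

/* `l': consume the top element and return it as the new position.
   Assumption: with an empty stack, the cursor stays where it is.  */
long
po_pop_location (struct po_location_stack *s, long current)
{
  if (s->depth == 0)
    return current;
  return s->positions[--s->depth];
}

/* `x': jump to the top element and store the position we came from
   in its place, so that repeating `x' alternates between the two.  */
long
po_exchange_location (struct po_location_stack *s, long current)
{
  long target;

  if (s->depth == 0)
    return current;
  target = s->positions[s->depth - 1];
  s->positions[s->depth - 1] = current;
  return target;
}
```

<P>
In this model, using <KBD>m</KBD> right after <KBD>l</KBD> amounts to pushing
back the position just popped, which is why that sequence keeps the
location on the stack.
</P>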
1070
1071
1072<H2><A NAME="SEC12" HREF="gettext_toc.html#TOC12">Normalizing Strings in Entries</A></H2>
1073
1074<P>
There are many different ways of encoding a particular string into a
PO file entry, because there are so many different ways to split and
quote multi-line strings, and even to represent special characters
by backslashed escape sequences. Some features of PO mode rely on
the ability of PO mode to scan an already existing PO file for a
particular string encoded into the <CODE>msgid</CODE> field of some entry.
Even if PO mode has internally all the built-in machinery for
implementing this recognition easily, doing it fast is technically
difficult. To facilitate a solution to this efficiency problem,
we decided on a canonical representation for strings.
1085
1086</P>
1087<P>
1088A conventional representation of strings in a PO file is currently
under discussion, and PO mode experiments with a canonical representation.
1090Having both <CODE>xgettext</CODE> and PO mode converging towards a uniform
1091way of representing equivalent strings would be useful, as the internal
1092normalization needed by PO mode could be automatically satisfied
1093when using <CODE>xgettext</CODE> from GNU <CODE>gettext</CODE>. An explicit
1094PO mode normalization should then be only necessary for PO files
1095imported from elsewhere, or for when the convention itself evolves.
1096
1097</P>
1098<P>
1099So, for achieving normalization of at least the strings of a given
1100PO file needing a canonical representation, the following PO mode
1101command is available:
1102
1103</P>
1104<DL COMPACT>
1105
1106<DT><KBD>M-x po-normalize</KBD>
1107<DD>
1108Tidy the whole PO file by making entries more uniform.
1109
1110</DL>
1111
1112<P>
The special command <KBD>M-x po-normalize</KBD>, which has no associated
1114keys, revises all entries, ensuring that strings of both original
1115and translated entries use uniform internal quoting in the PO file.
1116It also removes any crumb after the last entry. This command may be
1117useful for PO files freshly imported from elsewhere, or if we ever
1118improve on the canonical quoting format we use. This canonical format
1119is not only meant for getting cleaner PO files, but also for greatly
1120speeding up <CODE>msgid</CODE> string lookup for some other PO mode commands.
1121
1122</P>
1123<P>
1124<KBD>M-x po-normalize</KBD> presently makes three passes over the entries.
1125The first implements heuristics for converting PO files for GNU
1126<CODE>gettext</CODE> 0.6 and earlier, in which <CODE>msgid</CODE> and <CODE>msgstr</CODE>
1127fields were using K&#38;R style C string syntax for multi-line strings.
1128These heuristics may fail for comments not related to obsolete
1129entries and ending with a backslash; they also depend on subsequent
1130passes for finalizing the proper commenting of continued lines for
obsolete entries. This first pass might disappear once all old PO
files have been adjusted. The second and third passes normalize
all <CODE>msgid</CODE> and <CODE>msgstr</CODE> strings respectively. They also
1134clean out those trailing backslashes used by XView's <CODE>msgfmt</CODE>
1135for continued lines.
1136
1137</P>
1138<P>
1139Having such an explicit normalizing command allows for importing PO
1140files from other sources, but also eases the evolution of the current
1141convention, evolution driven mostly by aesthetic concerns, as of now.
It is easy to make suggested adjustments at a later time, as the
normalizing command and, eventually, other GNU <CODE>gettext</CODE> tools
should greatly automate conformance. A description of the canonical
1145string format is given below, for the particular benefit of those not
1146having GNU Emacs handy, and who would nevertheless want to handcraft
1147their PO files in nice ways.
1148
1149</P>
1150<P>
1151Right now, in PO mode, strings are single line or multi-line. A string
1152goes multi-line if and only if it has <EM>embedded</EM> newlines, that
1153is, if it matches <SAMP>`[^\n]\n+[^\n]'</SAMP>. So, we would have:
1154
1155</P>
1156
<PRE>
msgstr "\n\nHello, world!\n\n\n"
</PRE>
1160
1161<P>
1162but, replacing the space by a newline, this becomes:
1163
1164</P>
1165
<PRE>
msgstr ""
"\n"
"\n"
"Hello,\n"
"world!\n"
"\n"
"\n"
</PRE>
1175
1176<P>
We are deliberately using an exaggerated example here, to make the
point clearer. Usually, multi-line strings do not look that bad.
It is probable that we will implement the following suggestion.
We might lump together all initial newlines into the empty string,
and also all newlines introducing empty lines (that is, for a run of
<VAR>n</VAR> &#62; 1 newlines, the last <VAR>n</VAR>-1 newlines would go
together on a separate string), so making the previous example appear:
1184
1185</P>
1186
<PRE>
msgstr "\n\n"
"Hello,\n"
"world!\n"
"\n\n"
</PRE>
1193
1194<P>
1195There are a few yet undecided little points about string normalization,
1196to be documented in this manual, once these questions settle.
1197
1198</P>
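<P>
For those handcrafting PO files without Emacs, the rule currently in
effect (not the lumping suggestion) can be sketched in C as follows.
A string goes multi-line only when it matches
<SAMP>`[^\n]\n+[^\n]'</SAMP>, and a multi-line string starts with the empty
string and is split after each newline. The function names
<CODE>po_is_multiline</CODE> and <CODE>po_format_string</CODE> are invented for
this example, and only <SAMP>`"'</SAMP>, <SAMP>`\'</SAMP> and newline are
escaped; this is an approximation, not the code of PO mode.
</P>

```c
#include <stdio.h>
#include <string.h>

/* True if s has an embedded newline, i.e. matches [^\n]\n+[^\n].  */
int
po_is_multiline (const char *s)
{
  const char *p;

  for (p = strchr (s, '\n'); p != NULL; p = strchr (p, '\n'))
    {
      /* p is at the start of a run of newlines, so any character
         before it is not a newline.  */
      int has_before = (p > s);

      p += strspn (p, "\n");            /* skip the whole run */
      if (has_before && *p != '\0')
        return 1;
    }
  return 0;
}

/* Write `keyword "..."' for s into out, following the rule above.
   Only ", \ and newline are escaped in this sketch.  Returns 0, or -1
   if out (capacity cap) is too small.  */
int
po_format_string (char *out, size_t cap, const char *keyword, const char *s)
{
  int multi = po_is_multiline (s);
  /* A multi-line string starts with the empty string on its own line.  */
  int r = snprintf (out, cap, "%s \"%s", keyword, multi ? "\"\n\"" : "");
  size_t n;

  if (r < 0 || (size_t) r >= cap)
    return -1;
  n = (size_t) r;
  for (; *s != '\0'; s++)
    {
      char plain[2] = { *s, '\0' };
      const char *piece = plain;

      if (*s == '\n')
        /* In multi-line form, close the segment after each newline,
           except after the very last character.  */
        piece = (multi && s[1] != '\0') ? "\\n\"\n\"" : "\\n";
      else if (*s == '"')
        piece = "\\\"";
      else if (*s == '\\')
        piece = "\\\\";
      if (n + strlen (piece) + 3 > cap)
        return -1;
      strcpy (out + n, piece);
      n += strlen (piece);
    }
  if (n + 3 > cap)
    return -1;
  strcpy (out + n, "\"\n");
  return 0;
}
```

<P>
Given the two <SAMP>`Hello'</SAMP> strings used above, this sketch produces
the single-line form for the first and the seven-line form for the second.
</P>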
1199
1200
1201<H1><A NAME="SEC13" HREF="gettext_toc.html#TOC13">Preparing Program Sources</A></H1>
1202
1203<P>
1204For the programmer, changes to the C source code fall into three
1205categories. First, you have to make the localization functions
1206known to all modules needing message translation. Second, you should
1207properly trigger the operation of GNU <CODE>gettext</CODE> when the program
1208initializes, usually from the <CODE>main</CODE> function. Last, you should
1209identify and especially mark all constant strings in your program
1210needing translation.
1211
1212</P>
1213<P>
1214Presuming that your set of programs, or package, has been adjusted
1215so all needed GNU <CODE>gettext</CODE> files are available, and your
1216<TT>`Makefile'</TT> files are adjusted (see section <A HREF="gettext.html#SEC65">The Maintainer's View</A>), each C module
1217having translated C strings should contain the line:
1218
1219</P>
1220
<PRE>
#include &#60;libintl.h&#62;
</PRE>
1224
1225<P>
1226The remaining changes to your C sources are discussed in the further
1227sections of this chapter.
1228
1229</P>
1230
1231
1232
1233<H2><A NAME="SEC14" HREF="gettext_toc.html#TOC14">Triggering <CODE>gettext</CODE> Operations</A></H2>
1234
1235<P>
1236The initialization of locale data should be done with more or less
1237the same code in every program, as demonstrated below:
1238
1239</P>
1240
<PRE>
int
main (int argc, char **argv)
{
  ...
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  ...
}
</PRE>
1254
1255<P>
1256<VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> should be provided either by
1257<TT>`config.h'</TT> or by the Makefile. For now consult the <CODE>gettext</CODE>
1258sources for more information.
1259
1260</P>
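<P>
To make the role of <VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> more
concrete, here is a hedged sketch of the same initialization wrapped in a
function. The function name <CODE>init_i18n</CODE> and the fallback values
<SAMP>`hello'</SAMP> and <SAMP>`/usr/local/share/locale'</SAMP> are
placeholders invented for the example; a real package would take both
values from <TT>`config.h'</TT> or from the Makefile.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* PACKAGE and LOCALEDIR normally come from config.h or the Makefile;
   the fallback values below are placeholders for this sketch only.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Set up message translation.  Returns 0 on success, -1 if the message
   domain could not be bound or selected.  Hypothetical helper.  */
int
init_i18n (void)
{
  /* A null result here only means the user's locale is unavailable;
     the program then keeps running in the "C" locale, untranslated.  */
  setlocale (LC_ALL, "");

  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return -1;
  if (textdomain (PACKAGE) == NULL)
    return -1;
  return 0;
}
```

<P>
Such a function would be called once, early in <CODE>main</CODE>, before any
translated message is produced.
</P>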
1261<P>
The use of <CODE>LC_ALL</CODE> might not be appropriate for you.
<CODE>LC_ALL</CODE> includes all locale categories and especially
<CODE>LC_CTYPE</CODE>. This latter category is responsible for determining
character classes with the <CODE>isalnum</CODE> etc. functions from
<TT>`ctype.h'</TT>, which could give wrong results, especially for programs
which process some kind of input language. For example, this would mean
that source code using the &#231; (c-cedilla) character is runnable in
France but not in the U.S.
1270
1271</P>
1272<P>
1273So it is sometimes necessary to replace the <CODE>LC_ALL</CODE> line in the
1274code above by a sequence of <CODE>setlocale</CODE> lines
1275
1276</P>
1277
<PRE>
{
  ...
  setlocale (LC_TIME, "");
  setlocale (LC_MESSAGES, "");
  ...
}
</PRE>
1286
1287<P>
or to switch to the character class in question and back again.
1289
1290</P>
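<P>
The second option, switching a category and then switching back, might
look like the following sketch. It classifies one character under the
<SAMP>`C'</SAMP> rules regardless of the current <CODE>LC_CTYPE</CODE>
setting, then restores that setting. The function name
<CODE>isalnum_in_c_locale</CODE> is made up for the example, and changing
the locale this way affects the whole process, so it is only a suitable
approach for single-threaded programs.
</P>

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Classify c as alphanumeric under the "C" rules, whatever LC_CTYPE
   currently is, then restore the previous setting.  Returns 1 or 0.
   Hypothetical helper, for illustration only.  */
int
isalnum_in_c_locale (int c)
{
  const char *current = setlocale (LC_CTYPE, NULL);
  char *saved = NULL;
  int result;

  if (current != NULL)
    {
      /* Copy the name: a later setlocale call may overwrite it.  */
      saved = malloc (strlen (current) + 1);
      if (saved != NULL)
        strcpy (saved, current);
    }
  setlocale (LC_CTYPE, "C");
  result = isalnum ((unsigned char) c) != 0;
  if (saved != NULL)
    {
      /* Switch back to the setting found on entry.  */
      setlocale (LC_CTYPE, saved);
      free (saved);
    }
  return result;
}
```

<P>
A scanner for an input language could use such a helper for identifier
characters, so that the set of accepted characters does not depend on
where the program is run.
</P>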
1291
1292
<H2><A NAME="SEC15" HREF="gettext_toc.html#TOC15">How Marks Appear in Sources</A></H2>
1294
1295<P>
1296The C sources should mark all strings requiring translation. Marking
1297is done in such a way that each translatable string appears to be
1298the sole argument of some function or preprocessor macro. There are
1299only a few such possible functions or macros meant for translation,
1300and their names are said to be marking keywords. The marking is
1301attached to strings themselves, rather than to what we do with them.
1302This approach has more uses. A blatant example is an error message
1303produced by formatting. The format string needs translation, as
1304well as some strings inserted through some <SAMP>`%s'</SAMP> specification
1305in the format, while the result from <CODE>sprintf</CODE> may have so many
different instances that it is impractical to list them all in some
1307<SAMP>`error_string_out()'</SAMP> routine, say.
1308
1309</P>
1310<P>
1311This marking operation has two goals. The first goal of marking
1312is for triggering the retrieval of the translation, at run time.
The keyword is possibly resolved into a routine able to dynamically
return the proper translation, as far as possible or wanted, for the
argument string. Most localizable strings are found in executable
positions, that is, assigned to variables or given as parameters to
functions. But this is not universal usage, and some translatable
1318strings appear in structured initializations. See section <A HREF="gettext.html#SEC17">Special Cases of Translatable Strings</A>.
1319
1320</P>
1321<P>
The second goal of the marking operation is to help <CODE>xgettext</CODE>
properly extract all translatable strings when it scans a set
of program sources and produces PO file templates.
1325
1326</P>
1327<P>
The canonical keyword for marking translatable strings is
<SAMP>`gettext'</SAMP>; it gave its name to the whole GNU <CODE>gettext</CODE>
package. For packages making only light use of the <SAMP>`gettext'</SAMP>
keyword, macro or function, it is easily used <EM>as is</EM>. However,
for packages using the <CODE>gettext</CODE> interface more heavily, it
is usually more convenient to give the main keyword a shorter, less
obtrusive name. Indeed, the keyword might appear on a lot of strings
all over the package, and programmers usually neither want nor need
their program sources to remind them loudly, all the time, that they
are internationalized. Further, a long keyword has the disadvantage
of using more horizontal space, forcing more indentation work on
sources for those trying to keep them within 79 or 80 columns.
1340
1341</P>
1342<P>
1343Many GNU packages use <SAMP>`_'</SAMP> (a simple underline) as a keyword,
1344and write <SAMP>`_("Translatable string")'</SAMP> instead of <SAMP>`gettext
("Translatable string")'</SAMP>. Further, the usual GNU coding rule
requiring a space between the keyword and the opening
parenthesis is relaxed, in practice, for this particular usage.
1348So, the textual overhead per translatable string is reduced to
1349only three characters: the underline and the two parentheses.
1350However, even if GNU <CODE>gettext</CODE> uses this convention internally,
1351it does not offer it officially. The real, genuine keyword is truly
1352<SAMP>`gettext'</SAMP> indeed. It is fairly easy for those wanting to use
1353<SAMP>`_'</SAMP> instead of <SAMP>`gettext'</SAMP> to declare:
1354
1355</P>
1356
<PRE>
#include &#60;libintl.h&#62;
#define _(String) gettext (String)
</PRE>
1361
1362<P>
1363instead of merely using <SAMP>`#include &#60;libintl.h&#62;'</SAMP>.
1364
1365</P>
1366<P>
1367Later on, the maintenance is relatively easy. If, as a programmer,
1368you add or modify a string, you will have to ask yourself if the
1369new or altered string requires translation, and include it within
<SAMP>`_()'</SAMP> if you think it should be translated. <SAMP>`"%s: %d"'</SAMP> is
an example of a string <EM>not</EM> requiring translation!
1372
1373</P>
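<P>
As an illustration of which strings get the mark, consider the sketch
below. The function name <CODE>format_open_error</CODE> and the message
text are invented for the example. The sentence is wrapped in
<SAMP>`_()'</SAMP>, while the purely structural frame around it is left
alone, since a translator could do nothing useful with it.
</P>

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Format a diagnostic into buf and return what snprintf returns.
   Only the sentence is marked for translation; the "%s: %s" frame is
   not.  The function name and the message are made up for this
   example.  */
int
format_open_error (char *buf, size_t size, const char *program,
                   const char *file)
{
  char message[256];

  /* The format string itself is looked up at run time.  */
  snprintf (message, sizeof message, _("cannot open file `%s'"), file);
  return snprintf (buf, size, "%s: %s", program, message);
}
```

<P>
When no translation is available for the current locale,
<CODE>gettext</CODE> returns its argument unchanged, so the program still
prints the original English message.
</P>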
1374
1375
1376<H2><A NAME="SEC16" HREF="gettext_toc.html#TOC16">Marking Translatable Strings</A></H2>
1377
1378<P>
1379In PO mode, one set of features is meant more for the programmer than
1380for the translator, and allows him to interactively mark which strings,
1381in a set of program sources, are translatable, and which are not.
1382Even if it is a fairly easy job for a programmer to find and mark
1383such strings by other means, using any editor of his choice, PO mode
1384makes this work more comfortable. Further, this gives translators
1385who feel a little like programmers, or programmers who feel a little
1386like translators, a tool letting them work at marking translatable
strings in the program sources, while simultaneously producing a set of
translations in some language, for the package being internationalized.
1389
1390</P>
1391<P>
The set of program sources targeted by the PO mode commands described
here should have an Emacs tags table constructed for your project,
prior to using these PO file commands. This is easy to do. In any
shell window, change the directory to the root of your project, then
execute a command resembling:
1397
1398</P>
1399
<PRE>
etags src/*.[hc] lib/*.[hc]
</PRE>
1403
1404<P>
1405presuming here you want to process all <TT>`.h'</TT> and <TT>`.c'</TT> files
1406from the <TT>`src/'</TT> and <TT>`lib/'</TT> directories. This command will
1407explore all said files and create a <TT>`TAGS'</TT> file in your root
1408directory, somewhat summarizing the contents using a special file
1409format Emacs can understand.
1410
1411</P>
1412<P>
For official GNU packages which follow the GNU coding standards, there is
a make goal <CODE>tags</CODE> or <CODE>TAGS</CODE> which constructs the tag files in
all directories and for all files containing source code.
1416
1417</P>
1418<P>
Once your <TT>`TAGS'</TT> file is ready, the following commands assist
the programmer in marking translatable strings in his set of sources.
1421But these commands are necessarily driven from within a PO file
1422window, and it is likely that you do not even have such a PO file yet.
1423This is not a problem at all, as you may safely open a new, empty PO
1424file, mainly for using these commands. This empty PO file will slowly
1425fill in while you mark strings as translatable in your program sources.
1426
1427</P>
1428<DL COMPACT>
1429
1430<DT><KBD>,</KBD>
1431<DD>
1432Search through program sources for a string which looks like a
1433candidate for translation.
1434
1435<DT><KBD>M-,</KBD>
1436<DD>
1437Mark the last string found with <SAMP>`_()'</SAMP>.
1438
1439<DT><KBD>M-.</KBD>
1440<DD>
1441Mark the last string found with a keyword taken from a set of possible
1442keywords. This command with a prefix allows some management of these
1443keywords.
1444
1445</DL>
1446
1447<P>
1448The <KBD>,</KBD> (<CODE>po-tags-search</CODE>) command searches for the next
1449occurrence of a string which looks like a possible candidate for
1450translation, and displays the program source in another Emacs window,
1451positioned in such a way that the string is near the top of this other
1452window. If the string is too big to fit whole in this window, it is
1453rather positioned so only its end is shown. In any case, the cursor
1454is left in the PO file window. If the shown string would be better
1455presented differently in different native languages, you may mark it
1456using <KBD>M-,</KBD> or <KBD>M-.</KBD>. Otherwise, you might rather ignore it
1457and skip to the next string by merely repeating the <KBD>,</KBD> command.
1458
1459</P>
1460<P>
1461A string is a good candidate for translation if it contains a sequence
1462of three or more letters. A string containing at most two letters in
1463a row will be considered as a candidate if it has more letters than
1464non-letters. The command disregards strings containing no letters,
1465or isolated letters only. It also disregards strings within comments,
1466or strings already marked with some keyword PO mode knows (see below).
1467
1468</P>
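<P>
The rules above can be sketched in C as follows. This is only an
illustration of the heuristic as described here, not the code PO mode
actually runs (PO mode is written in Emacs Lisp); the function name
<CODE>looks_translatable</CODE> is hypothetical, and the exclusion of
comments and already marked strings is left out.

</P>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if TEXT looks like a candidate for
   translation according to the rules described above, 0 otherwise.  */
static int
looks_translatable (const char *text)
{
  size_t letters = 0;   /* total number of letters */
  size_t others = 0;    /* total number of non-letters */
  size_t run = 0;       /* length of the current run of letters */
  size_t longest = 0;   /* length of the longest run seen so far */
  const char *p;

  assert (text != NULL);

  for (p = text; *p != '\0'; p++)
    {
      if (isalpha ((unsigned char) *p))
        {
          letters++;
          run++;
          if (run > longest)
            longest = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }

  if (longest >= 3)
    return 1;                   /* three or more letters in a row */
  if (longest == 2)
    return letters > others;    /* more letters than non-letters */
  return 0;                     /* no letters, or isolated letters only */
}
```

<P>
Under these assumptions, a string such as <CODE>"File not found"</CODE> is
accepted, while a bare format such as <CODE>"%d"</CODE> is disregarded.

</P>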
1469<P>
1470If you have never told Emacs about some <TT>`TAGS'</TT> file to use, the
1471command will request that you specify one from the minibuffer, the
1472first time you use the command. You may later change your <TT>`TAGS'</TT>
1473file by using the regular Emacs command <KBD>M-x visit-tags-table</KBD>,
1474which will ask you to name the precise <TT>`TAGS'</TT> file you want
1475to use. See section `Tag Tables' in <CITE>The Emacs Editor</CITE>.
1476
1477</P>
1478<P>
1479Each time you use the <KBD>,</KBD> command, the search resumes where it was
1480left over by the previous search, and goes through all program sources,
1481obeying the <TT>`TAGS'</TT> file, until all sources have been processed.
1482However, by giving a prefix argument to the command (<KBD>C-u
1483,</KBD>), you may request that the search be restarted all over again
1484from the first program source; but in this case, strings that you
1485recently marked as translatable will be automatically skipped.
1486
1487</P>
1488<P>
1489Using this <KBD>,</KBD> command does not prevent the use of other regular
1490Emacs tags commands. For example, regular <CODE>tags-search</CODE> or
1491<CODE>tags-query-replace</CODE> commands may be used without disrupting the
1492independent <KBD>,</KBD> search sequence. However, as implemented, the
1493<EM>initial</EM> <KBD>,</KBD> command (or the <KBD>,</KBD> command used with a
1494prefix) might also reinitialize the regular Emacs tags searching to the
1495first tags file; this reinitialization might be considered spurious.
1496
1497</P>
1498<P>
1499The <KBD>M-,</KBD> (<CODE>po-mark-translatable</CODE>) command will mark the
1500recently found string with the <SAMP>`_'</SAMP> keyword. The <KBD>M-.</KBD>
1501(<CODE>po-select-mark-and-mark</CODE>) command will request that you type
1502one keyword from the minibuffer and use that keyword for marking
1503the string. Both commands will automatically create a new PO file
1504untranslated entry for the string being marked, and make it the
1505current entry (making it easy for you to immediately proceed to its
1506translation, if you feel like doing it right away). It is possible
1507that the modifications made to the program source by <KBD>M-,</KBD> or
1508<KBD>M-.</KBD> render some source line longer than 80 columns, forcing you
1509to break and re-indent this line differently. You may use the <KBD>o</KBD>
1510command from PO mode, or any other window changing command from
1511GNU Emacs, to break out into the program source window, and do any
1512needed adjustments. You will have to use some regular Emacs command
1513to return the cursor to the PO file window, if you want to invoke
1514<KBD>,</KBD> for the next string, say.
1515
1516</P>
1517<P>
1518The <KBD>M-.</KBD> command has a few built-in speedups, so you do not
1519have to explicitly type all keywords all the time. The first such
1520speedup is that you are presented with a <EM>preferred</EM> keyword,
1521which you may accept by merely typing <KBD><KBD>RET</KBD></KBD> at the prompt.
1522The second speedup is that you may type any non-ambiguous prefix of the
1523keyword you really mean, and the command will complete it automatically
1524for you. This also means that PO mode has to <EM>know</EM> all
1525your possible keywords, and that it will not accept mistyped keywords.
1526
1527</P>
1528<P>
1529If you reply <KBD>?</KBD> to the keyword request, the command gives a
1530list of all known keywords, from which you may choose. When the
1531command is prefixed by an argument (<KBD>C-u M-.</KBD>), it inhibits
1532updating any program source or PO file buffer, and does some simple
1533keyword management instead. In this case, the command asks for a
1534keyword, written in full, which becomes a new allowed keyword for
1535later <KBD>M-.</KBD> commands. Moreover, this new keyword automatically
1536becomes the <EM>preferred</EM> keyword for later commands. By typing
1537an already known keyword in response to <KBD>C-u M-.</KBD>, one merely
1538changes the <EM>preferred</EM> keyword and does nothing more.
1539
1540</P>
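<P>
The keyword prompt described above can be pictured with the following
C sketch. It assumes a plain array of known keywords and a separate
preferred keyword; the function name <CODE>complete_keyword</CODE> and the
extra keywords used in the usage note are hypothetical, and PO mode itself
relies on the Emacs minibuffer completion rather than on code like this.

</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: resolve what was TYPED at the keyword prompt.
   An empty answer (a bare RET) selects PREFERRED.  An exact keyword or
   a non-ambiguous prefix selects that keyword.  Anything unknown or
   ambiguous yields NULL, meaning the answer is not accepted.  */
static const char *
complete_keyword (const char *const *known, size_t count,
                  const char *preferred, const char *typed)
{
  const char *match = NULL;
  size_t typed_len;
  size_t i;

  assert (known != NULL && preferred != NULL && typed != NULL);

  typed_len = strlen (typed);
  if (typed_len == 0)
    return preferred;

  /* An exact match wins, even if it is also a prefix of another keyword.  */
  for (i = 0; i < count; i++)
    if (strcmp (known[i], typed) == 0)
      return known[i];

  /* Otherwise the prefix must select exactly one keyword.  */
  for (i = 0; i < count; i++)
    if (strncmp (known[i], typed, typed_len) == 0)
      {
        if (match != NULL)
          return NULL;          /* ambiguous prefix */
        match = known[i];
      }

  return match;
}
```

<P>
With the known keywords <CODE>gettext</CODE>, <CODE>_</CODE>,
<CODE>gettext_noop</CODE> and <CODE>N_</CODE> (the last two assumed to have
been added by the user), the prefix <CODE>get</CODE> is ambiguous, while
<CODE>N</CODE> completes to <CODE>N_</CODE>.

</P>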
1541<P>
1542All keywords known for <KBD>M-.</KBD> are recognized by the <KBD>,</KBD> command
1543when scanning for strings, and strings already marked by any of those
1544known keywords are automatically skipped. If many PO files are opened
1545simultaneously, each one has its own independent set of known keywords.
1546There is no provision in PO mode, currently, for deleting a known
1547keyword; you have to quit the file (maybe using <KBD>q</KBD>) and reopen
1548it afresh. When a PO file is newly brought up in an Emacs window, only
1549<SAMP>`gettext'</SAMP> and <SAMP>`_'</SAMP> are known as keywords, and <SAMP>`gettext'</SAMP>
1550is preferred for the <KBD>M-.</KBD> command. In fact, it is not useful to
1551prefer <SAMP>`_'</SAMP>, as this one is already built in the <KBD>M-,</KBD> command.
1552
1553</P>
1554
1555
1556<H2><A NAME="SEC17" HREF="gettext_toc.html#TOC17">Special Cases of Translatable Strings</A></H2>
1557
1558<P>
1559The attentive reader might now point out that it is not always possible
1560to mark translatable strings with <CODE>gettext</CODE> or something like this.
1561Consider the following case:
1562
1563</P>
1564
1565<PRE>
1566{
1567 static const char *messages[] = {
1568 "some very meaningful message",
1569 "and another one"
1570 };
1571 const char *string;
1572 ...
1573 string
1574 = index &#62; 1 ? "a default message" : messages[index];
1575
1576 fputs (string, stdout);
1577 ...
1578}
1579</PRE>
1580
1581<P>
1582While it is no problem to mark the string <CODE>"a default message"</CODE> it
1583is not possible to mark the string initializers for <CODE>messages</CODE>.
1584What is to be done? We have to fulfill two tasks. First we have to mark the
1585strings so that the <CODE>xgettext</CODE> program (see section <A HREF="gettext.html#SEC19">Invoking the <CODE>xgettext</CODE> Program</A>)
1586can find them, and second we have to translate the strings at runtime
1587before printing them.
1588
1589</P>
1590<P>
1591The first task can be fulfilled by creating a new keyword, which names a
1592no-op. For the second we have to mark all access points to a string
1593from the array. So one solution can look like this:
1594
1595</P>
1596
1597<PRE>
1598#define gettext_noop(String) (String)
1599
1600{
1601 static const char *messages[] = {
1602 gettext_noop ("some very meaningful message"),
1603 gettext_noop ("and another one")
1604 };
1605 const char *string;
1606 ...
1607 string
1608 = index &#62; 1 ? gettext ("a default message") : gettext (messages[index]);
1609
1610 fputs (string, stdout);
1611 ...
1612}
1613</PRE>
1614
1615<P>
1616Please convince yourself that the string which is written by
1617<CODE>fputs</CODE> is translated in any case. How to make <CODE>xgettext</CODE> know
1618the additional keyword <CODE>gettext_noop</CODE> is explained in section <A HREF="gettext.html#SEC19">Invoking the <CODE>xgettext</CODE> Program</A>.
1619
1620</P>
1621<P>
1622The above is of course not the only solution. You could also come up
1623with the following one:
1624
1625</P>
1626
1627<PRE>
1628#define gettext_noop(String) (String)
1629
1630{
1631 static const char *messages[] = {
1632 gettext_noop ("some very meaningful message"),
1633 gettext_noop ("and another one")
1634 };
1635 const char *string;
1636 ...
1637 string
1638 = index &#62; 1 ? gettext_noop ("a default message") : messages[index];
1639
1640 fputs (gettext (string), stdout);
1641 ...
1642}
1643</PRE>
1644
1645<P>
1646But this has some drawbacks. First the programmer has to take care that
1647he uses <CODE>gettext_noop</CODE> for the string <CODE>"a default message"</CODE>.
1648A use of <CODE>gettext</CODE> could, in rare cases, have unpredictable results.
1649The second drawback is found in the internals of the GNU <CODE>gettext</CODE>
1650library, which will make this solution less efficient.
1651
1652</P>
1653<P>
1654One advantage is that you need not perform control flow analysis to make
1655sure the output is really translated in any case. But this analysis is
1656generally not very difficult. If it does prove difficult in some situation,
1657you can use this second method there.
1658
1659</P>
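<P>
The first method can be tried out with the self-contained sketch below.
To keep it runnable without any message catalog, it uses a hypothetical
stand-in named <CODE>demo_gettext</CODE> with a tiny built-in table; in a
real program the calls would go to <CODE>gettext</CODE> itself, and the
sample translations are only placeholders. The helper
<CODE>pick_message</CODE> is likewise an assumption made for the example.

</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Marks a string for extraction without translating it.  */
#define gettext_noop(String) (String)

/* Hypothetical stand-in for gettext(): looks MSGID up in a small
   built-in table and returns MSGID itself when nothing is found.  */
static const char *
demo_gettext (const char *msgid)
{
  static const char *const catalog[][2] = {
    { "some very meaningful message", "ein sehr sinnvoller Hinweis" },
    { "and another one", "und noch einer" },
    { "a default message", "ein Standardhinweis" }
  };
  size_t i;

  assert (msgid != NULL);

  for (i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
    if (strcmp (msgid, catalog[i][0]) == 0)
      return catalog[i][1];
  return msgid;
}

/* The initializers are only marked; they stay untranslated here.  */
static const char *const messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};

/* Every access point translates, so the result is translated
   whichever branch is taken.  */
static const char *
pick_message (size_t index)
{
  return index > 1
         ? demo_gettext ("a default message")
         : demo_gettext (messages[index]);
}
```

<P>
The array keeps the original strings, so <CODE>messages[1]</CODE> is still
<CODE>"and another one"</CODE>, whereas <CODE>pick_message (1)</CODE> yields
whatever the translation function returns for it.

</P>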
1660
1661
1662
1663<H1><A NAME="SEC18" HREF="gettext_toc.html#TOC18">Making the Initial PO File</A></H1>
1664
1665
1666
1667<H2><A NAME="SEC19" HREF="gettext_toc.html#TOC19">Invoking the <CODE>xgettext</CODE> Program</A></H2>
1668
1669
1670<PRE>
1671xgettext [<VAR>option</VAR>] <VAR>inputfile</VAR> ...
1672</PRE>
1673
1674<DL COMPACT>
1675
1676<DT><SAMP>`-a'</SAMP>
1677<DD>
1678<DT><SAMP>`--extract-all'</SAMP>
1679<DD>
1680Extract all strings.
1681
1682<DT><SAMP>`-c [<VAR>tag</VAR>]'</SAMP>
1683<DD>
1684<DT><SAMP>`--add-comments[=<VAR>tag</VAR>]'</SAMP>
1685<DD>
1686Place comment block with <VAR>tag</VAR> (or those preceding keyword lines)
1687in output file.
1688
1689<DT><SAMP>`-C'</SAMP>
1690<DD>
1691<DT><SAMP>`--c++'</SAMP>
1692<DD>
1693Recognize C++ style comments.
1694
1695<DT><SAMP>`-d <VAR>name</VAR>'</SAMP>
1696<DD>
1697<DT><SAMP>`--default-domain=<VAR>name</VAR>'</SAMP>
1698<DD>
1699Use <TT>`<VAR>name</VAR>.po'</TT> for output (instead of <TT>`messages.po'</TT>).
1700
1701<DT><SAMP>`-D <VAR>directory</VAR>'</SAMP>
1702<DD>
1703<DT><SAMP>`--directory=<VAR>directory</VAR>'</SAMP>
1704<DD>
1705Change to <VAR>directory</VAR> before beginning to search and scan source
1706files. The resulting <TT>`.po'</TT> file will be written relative to the
1707original directory, though.
1708
1709<DT><SAMP>`-f <VAR>file</VAR>'</SAMP>
1710<DD>
1711<DT><SAMP>`--files-from=<VAR>file</VAR>'</SAMP>
1712<DD>
1713Read the names of the input files from <VAR>file</VAR> instead of getting
1714them from the command line.
1715
1716<DT><SAMP>`-h'</SAMP>
1717<DD>
1718<DT><SAMP>`--help'</SAMP>
1719<DD>
1720Display this help and exit.
1721
1722<DT><SAMP>`-I <VAR>list</VAR>'</SAMP>
1723<DD>
1724<DT><SAMP>`--input-path=<VAR>list</VAR>'</SAMP>
1725<DD>
1726List of directories searched for input files.
1727
1728<DT><SAMP>`-j'</SAMP>
1729<DD>
1730<DT><SAMP>`--join-existing'</SAMP>
1731<DD>
1732Join messages with existing file.
1733
1734<DT><SAMP>`-k <VAR>word</VAR>'</SAMP>
1735<DD>
1736<DT><SAMP>`--keyword[=<VAR>word</VAR>]'</SAMP>
1737<DD>
1738Additional keyword to be looked for (without <VAR>word</VAR> means not to
1739use default keywords).
1740
1741The default keywords, which are always looked for if not explicitly
1742disabled, are <CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE> and
1743<CODE>gettext_noop</CODE>.
1744
1745<DT><SAMP>`-m [<VAR>string</VAR>]'</SAMP>
1746<DD>
1747<DT><SAMP>`--msgstr-prefix[=<VAR>string</VAR>]'</SAMP>
1748<DD>
1749Use <VAR>string</VAR> or "" as prefix for msgstr entries.
1750
1751<DT><SAMP>`-M [<VAR>string</VAR>]'</SAMP>
1752<DD>
1753<DT><SAMP>`--msgstr-suffix[=<VAR>string</VAR>]'</SAMP>
1754<DD>
1755Use <VAR>string</VAR> or "" as suffix for msgstr entries.
1756
1757<DT><SAMP>`--no-location'</SAMP>
1758<DD>
1759Do not write <SAMP>`#: <VAR>filename</VAR>:<VAR>line</VAR>'</SAMP> lines.
1760
1761<DT><SAMP>`-n'</SAMP>
1762<DD>
1763<DT><SAMP>`--add-location'</SAMP>
1764<DD>
1765Generate <SAMP>`#: <VAR>filename</VAR>:<VAR>line</VAR>'</SAMP> lines (default).
1766
1767<DT><SAMP>`--omit-header'</SAMP>
1768<DD>
1769Don't write header with <SAMP>`msgid ""'</SAMP> entry.
1770
1771This is useful for testing purposes because it eliminates a source
1772of variance for generated <CODE>.gmo</CODE> files. We can ship some of
1773these files in the GNU <CODE>gettext</CODE> package, and the result of
1774regenerating them through <CODE>msgfmt</CODE> should yield the same values.
1775
1776<DT><SAMP>`-p <VAR>dir</VAR>'</SAMP>
1777<DD>
1778<DT><SAMP>`--output-dir=<VAR>dir</VAR>'</SAMP>
1779<DD>
1780Output files will be placed in directory <VAR>dir</VAR>.
1781
1782<DT><SAMP>`-s'</SAMP>
1783<DD>
1784<DT><SAMP>`--sort-output'</SAMP>
1785<DD>
1786Generate sorted output and remove duplicates.
1787
1788<DT><SAMP>`--strict'</SAMP>
1789<DD>
1790Write out strict Uniforum conforming PO file.
1791
1792<DT><SAMP>`-v'</SAMP>
1793<DD>
1794<DT><SAMP>`--version'</SAMP>
1795<DD>
1796Output version information and exit.
1797
1798<DT><SAMP>`-x <VAR>file</VAR>'</SAMP>
1799<DD>
1800<DT><SAMP>`--exclude-file=<VAR>file</VAR>'</SAMP>
1801<DD>
1802Entries from <VAR>file</VAR> are not extracted.
1803
1804</DL>
1805
1806<P>
1807Search path for supplementary PO files is:
1808<TT>`/usr/local/share/nls/src/'</TT>.
1809
1810</P>
1811<P>
1812If <VAR>inputfile</VAR> is <SAMP>`-'</SAMP>, standard input is read.
1813
1814</P>
1815<P>
1816This implementation of <CODE>xgettext</CODE> is able to process a few awkward
1817cases, like strings in preprocessor macros, ANSI concatenation of
1818adjacent strings, and escaped end of lines for continued strings.
1819
1820</P>
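<P>
As an illustration of these awkward cases, the C fragment below writes
messages in the three forms mentioned: inside a preprocessor macro, as
adjacent string literals, and as a literal continued with an escaped end
of line. All names in it (<CODE>USAGE_TEXT</CODE>,
<CODE>help_concatenated</CODE>, <CODE>help_continued</CODE>,
<CODE>same_message</CODE>) are made up for the example; how
<CODE>xgettext</CODE> presents each form in the PO file is not shown here.

</P>

```c
#include <assert.h>
#include <string.h>

#define gettext_noop(String) (String)

/* A string hidden inside a preprocessor macro.  */
#define USAGE_TEXT gettext_noop ("Usage: demo [OPTION]...\n")

/* ANSI concatenation: adjacent literals form a single string.  */
static const char *const help_concatenated =
  gettext_noop ("Usage: demo [OPTION]...\n"
                "Try --help for details.\n");

/* Escaped end of line: the backslash-newline continues the literal,
   so the next source line must start in the first column.  */
static const char *const help_continued =
  gettext_noop ("Usage: demo [OPTION]...\n\
Try --help for details.\n");

/* Return 1 when A and B are the same message text.  */
static int
same_message (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  return strcmp (a, b) == 0;
}
```

<P>
For the C compiler, the concatenated and the continued forms denote
exactly the same string, so both would need only one translation.

</P>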
1821
1822
1823<H2><A NAME="SEC20" HREF="gettext_toc.html#TOC20">C Sources Context</A></H2>
1824
1825<P>
1826PO mode is particularly powerful when used with PO files
1827created through GNU <CODE>gettext</CODE> utilities, as those utilities
1828insert special comments in the PO files they generate.
1829Some of these special comments relate the PO file entry to
1830exactly where the untranslated string appears in the program sources.
1831
1832</P>
1833<P>
1834When the translator gets to an untranslated entry, she is fairly
1835often faced with an original string which is not as informative as
1836it normally should be, being succinct, cryptic, or otherwise ambiguous.
1837Before choosing how to translate the string, she needs to understand
1838better what the string really means and how tight the translation has
1839to be. Most of the time, when problems arise, the only way left to make
1840her judgment is looking at the true program sources from where this
1841string originated, searching for surrounding comments the programmer
1842might have put in there, and looking around for helping clues of
1843<EM>any</EM> kind.
1844
1845</P>
1846<P>
1847Surely, when looking at program sources, the translator will receive
1848more help if she is a fluent programmer. However, even if she is
1849not versed in programming and feels a little lost in C code, the
1850translator should not be shy about taking a look, once in a while.
1851It is most probable that she will still be able to find some of the
1852hints she needs. She will learn quickly to not feel uncomfortable
1853in program code, paying more attention to programmer's comments,
1854variable and function names (if he dared to choose them well), and
1855overall organization, than to programming itself.
1856
1857</P>
1858<P>
1859The following commands are meant to help the translator in getting
1860program source context for a PO file entry.
1861
1862</P>
1863<DL COMPACT>
1864
1865<DT><KBD>c</KBD>
1866<DD>
1867Resume the display of a program source context, or cycle through them.
1868
1869<DT><KBD>M-c</KBD>
1870<DD>
1871Display a program source context selected by menu.
1872
1873<DT><KBD>d</KBD>
1874<DD>
1875Add a directory to the search path for source files.
1876
1877<DT><KBD>M-d</KBD>
1878<DD>
1879Delete a directory from the search path for source files.
1880
1881</DL>
1882
1883<P>
1884The commands <KBD>c</KBD> (<CODE>po-cycle-reference</CODE>) and <KBD>M-c</KBD>
1885(<CODE>po-select-reference</CODE>) both open another window displaying
1886some source program file, and already positioned in such a way that
1887it shows an actual use of the current string to translate. By doing
1888so, the command gives source program context for the string. But if
1889the entry has no source context references, or if all references
1890are unresolved along the search path for program sources, then the
1891command diagnoses this as an error.
1892
1893</P>
1894<P>
1895Even if <KBD>c</KBD> (or <KBD>M-c</KBD>) opens a new window, the cursor stays
1896in the PO file window. If the translator really wants to
1897get into the program source window, she ought to do it explicitly,
1898maybe by using command <KBD>o</KBD>.
1899
1900</P>
1901<P>
1902When <KBD>c</KBD> is typed for the first time, or for a PO file entry which
1903is different from the last one used for getting source context, then the
1904command reacts by giving the first context available for this entry,
1905if any. If some context has already been recently displayed for the
1906current PO file entry, and the translator wandered off to do other
1907things, typing <KBD>c</KBD> again will merely resume, in another window,
1908the context last displayed. In particular, if the translator moved
1909the cursor away from the context in the source file, the command will
1910bring the cursor back to the context. By using <KBD>c</KBD> many times
1911in a row, with no intervening commands, PO mode will cycle to
1912the next available contexts for this particular entry, getting back
1913to the first context once the last has been shown.
1914
1915</P>
1916<P>
1917The command <KBD>M-c</KBD> behaves differently. Instead of cycling through
1918references, it lets the translator choose a particular reference among
1919many, and displays that reference. It is best used with completion:
1920if the translator types <KBD>TAB</KBD> immediately after <KBD>M-c</KBD>, in
1921response to the question, she will be offered a menu of all possible
1922references, as a reminder of which are the acceptable answers.
1923This command is useful only where there are really many contexts
1924available for a single string to translate.
1925
1926</P>
1927<P>
1928Program source files are usually found relative to where the PO
1929file stands. As a special provision, when this fails, the file is
1930also looked for, but relative to the directory immediately above it.
1931Those two cases take proper care of most PO files. However, it might
1932happen that a PO file has been moved, or is edited in a different
1933place than its normal location. When this happens, the translator
1934should tell PO mode in which directory the genuine PO file normally
1935sits. Many such directories may be specified, and all together, they
1936constitute what is called the <STRONG>search path</STRONG> for program sources.
1937The command <KBD>d</KBD> (<CODE>po-add-path</CODE>) is used to interactively
1938enter a new directory at the front of the search path, and the command
1939<KBD>M-d</KBD> (<CODE>po-delete-path</CODE>) is used to select, with completion,
1940one of the directories she does not want anymore on the search path.
1941
1942</P>
1943
1944
1945<H2><A NAME="SEC21" HREF="gettext_toc.html#TOC21">Using Translation Compendiums</A></H2>
1946
1947<P>
1948Compendiums are yet to be implemented.
1949
1950</P>
1951<P>
1952An upcoming PO mode feature will let the translator maintain a
1953compendium of already achieved translations. A <STRONG>compendium</STRONG>
1954is a special PO file containing a set of translations recurring in
1955many different packages. The translator will be given commands for
1956adding entries to her compendium, and later initializing untranslated
1957entries, or updating already translated entries, from translations
1958kept in the compendium. For this to work, however, the compendium
1959would have to be normalized. See section <A HREF="gettext.html#SEC12">Normalizing Strings in Entries</A>.
1960
1961</P>
1962
1963
1964
1965<H1><A NAME="SEC22" HREF="gettext_toc.html#TOC22">Updating Existing PO Files</A></H1>
1966
1967
1968
1969<H2><A NAME="SEC23" HREF="gettext_toc.html#TOC23">Invoking the <CODE>tupdate</CODE> Program</A></H2>
1970
1971
1972<PRE>
1973tupdate --help
1974tupdate --version
1975tupdate <VAR>new</VAR> <VAR>old</VAR>
1976</PRE>
1977
1978<P>
1979File <VAR>new</VAR> is the last created PO file (generally by
1980<CODE>xgettext</CODE>). It need not contain any translations. File
1981<VAR>old</VAR> is the PO file including the old translations which will
1982be taken over to the newly created file as long as they still match.
1983
1984</P>
1985<P>
1986When English messages change in the programs, this is reflected in
1987the PO file as extracted by <CODE>xgettext</CODE>. In large messages, that
1988can be hard to detect, and will obviously result in an incomplete
1989translation. One of the virtues of <CODE>tupdate</CODE> is that it detects
1990such changes, saving the previous translation into a PO file comment,
1991so marking the entry as obsolete, and giving the modified string with
1992an empty translation, that is, marking the entry as untranslated.
1993
1994</P>
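<P>
The matching step described above can be sketched in C as follows. This
is a simplified picture under stated assumptions, not the actual
<CODE>tupdate</CODE> implementation: entries are held in memory, matching
is an exact comparison of the original strings, and the names
<CODE>struct po_entry</CODE> and <CODE>merge_entries</CODE> are
hypothetical.

</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct po_entry
{
  const char *msgid;    /* original, untranslated string */
  const char *msgstr;   /* translation; "" means untranslated */
  int obsolete;         /* nonzero when the entry is no longer needed */
};

/* Hypothetical merge step: copy each translation from OLD into FRESH
   when the msgid still matches exactly, and flag as obsolete every
   entry of OLD that no longer appears in FRESH.  */
static void
merge_entries (struct po_entry *fresh, size_t fresh_count,
               struct po_entry *old, size_t old_count)
{
  size_t i;
  size_t j;

  assert (fresh != NULL && old != NULL);

  /* Assume every old entry is obsolete until a match is found.  */
  for (j = 0; j < old_count; j++)
    old[j].obsolete = 1;

  for (i = 0; i < fresh_count; i++)
    for (j = 0; j < old_count; j++)
      if (strcmp (fresh[i].msgid, old[j].msgid) == 0)
        {
          fresh[i].msgstr = old[j].msgstr;   /* take over the translation */
          old[j].obsolete = 0;
          break;
        }
}
```

<P>
In this sketch, a message whose English text changed even slightly no
longer matches: the new entry stays untranslated and the old one is
flagged as obsolete, which mirrors the behavior described above.

</P>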
1995
1996
1997<H2><A NAME="SEC24" HREF="gettext_toc.html#TOC24">Untranslated Entries</A></H2>
1998
1999<P>
2000When <CODE>xgettext</CODE> originally creates a PO file, unless told
2001otherwise, it initializes the <CODE>msgid</CODE> field with the untranslated
2002string, and leaves the <CODE>msgstr</CODE> string to be empty. Such entries,
2003having an empty translation, are said to be <STRONG>untranslated</STRONG> entries.
2004Later, when the programmer slightly modifies some string right in
2005the program, this change is reflected in the PO file
2006by the appearance of a new untranslated entry for the modified string.
2007
2008</P>
2009<P>
2010The usual commands moving from entry to entry consider untranslated
2011entries on the same level as active entries. Untranslated entries
2012are easily recognizable by the fact they end with <SAMP>`msgstr ""'</SAMP>.
2013
2014</P>
2015<P>
2016The work of the translator might be (quite naively) seen as the process
2017of seeking after an untranslated entry, editing a translation for
2018it, and repeating these actions until no untranslated entries remain.
2019Some commands are more specifically related to untranslated entry
2020processing.
2021
2022</P>
2023<DL COMPACT>
2024
2025<DT><KBD>e</KBD>
2026<DD>
2027Find the next untranslated entry.
2028
2029<DT><KBD>M-e</KBD>
2030<DD>
2031Find the previous untranslated entry.
2032
2033<DT><KBD>k</KBD>
2034<DD>
2035Turn the current entry into an untranslated one.
2036
2037</DL>
2038
2039<P>
2040The commands <KBD>e</KBD> (<CODE>po-next-empty-entry</CODE>) and <KBD>M-e</KBD>
2041(<CODE>po-previous-empty</CODE>) move forwards or backwards, chasing for an
2042untranslated entry. If none is found, the search is extended and wraps
2043around in the PO file buffer.
2044
2045</P>
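<P>
The wrap-around search can be pictured with the C sketch below, which
assumes the translations are available as an array of strings and treats
an empty string as an untranslated entry. The function name
<CODE>next_untranslated</CODE> is hypothetical, and PO mode works on the
buffer text rather than on such an array.

</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the index of the first untranslated
   entry found after CURRENT, wrapping around to the beginning of the
   array.  Return COUNT when every entry is translated.  */
static size_t
next_untranslated (const char *const *msgstrs, size_t count, size_t current)
{
  size_t step;

  assert (msgstrs != NULL);
  assert (count == 0 || current < count);

  /* Visit every entry once, starting just after CURRENT and ending
     on CURRENT itself.  */
  for (step = 1; step <= count; step++)
    {
      size_t i = (current + step) % count;

      if (msgstrs[i][0] == '\0')
        return i;
    }

  return count;
}
```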
2046<P>
2047An entry can be turned back into an untranslated entry by
2048merely emptying its translation, using the command <KBD>k</KBD>
2049(<CODE>po-kill-msgstr</CODE>). See section <A HREF="gettext.html#SEC26">Modifying Translations</A>.
2050
2051</P>
2052<P>
2053Also, when time comes to quit working on a PO file buffer
2054with the <KBD>q</KBD> command, the translator is asked for confirmation,
2055if some untranslated string still exists.
2056
2057</P>
2058
2059
2060<H2><A NAME="SEC25" HREF="gettext_toc.html#TOC25">Obsolete Entries</A></H2>
2061
2062<P>
2063By <STRONG>obsolete</STRONG> PO file entries, we mean those entries which are
2064commented out, usually by <CODE>tupdate</CODE> when it finds that the
2065translation is not needed anymore by the package being localized.
2066
2067</P>
2068<P>
2069The usual commands moving from entry to entry consider obsolete
2070entries on the same level as active entries. Obsolete entries are
2071easily recognizable by the fact that all their lines start with
2072<KBD>#</KBD>, even those lines containing <CODE>msgid</CODE> or <CODE>msgstr</CODE>.
2073
2074</P>
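<P>
The recognition rule just given can be written down as a small C check.
It is a sketch only: it assumes the text of one entry is available as a
single string, the function name <CODE>entry_is_commented_out</CODE> is
hypothetical, and an entry made of ordinary comments alone would also
pass this simple test.

</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return 1 when ENTRY has at least one non-empty
   line and every non-empty line starts with '#', 0 otherwise.  */
static int
entry_is_commented_out (const char *entry)
{
  const char *p;
  int at_line_start = 1;
  int saw_line = 0;

  assert (entry != NULL);

  for (p = entry; *p != '\0'; p++)
    {
      if (*p == '\n')
        {
          at_line_start = 1;
          continue;
        }
      if (at_line_start)
        {
          if (*p != '#')
            return 0;           /* an active line was found */
          saw_line = 1;
          at_line_start = 0;
        }
    }

  return saw_line;
}
```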
2075<P>
2076Commands exist for emptying the translation or reinitializing it
2077to the original untranslated string. Commands interfacing with the
2078kill ring may force some previously saved text into the translation.
2079The user may interactively edit the translation. All these commands
2080may apply to obsolete entries, carefully leaving the entry obsolete
2081after the fact.
2082
2083</P>
2084<P>
2085Moreover, some commands are more specifically related to obsolete
2086entry processing.
2087
2088</P>
2089<DL COMPACT>
2090
2091<DT><KBD>M-n</KBD>
2092<DD>
2093<DT><KBD>M-<KBD>SPC</KBD></KBD>
2094<DD>
2095Find the next obsolete entry.
2096
2097<DT><KBD>M-p</KBD>
2098<DD>
2099<DT><KBD>M-<KBD>DEL</KBD></KBD>
2100<DD>
2101Find the previous obsolete entry.
2102
2103<DT><KBD>z</KBD>
2104<DD>
2105Make an active entry obsolete, or zap out an obsolete entry.
2106
2107</DL>
2108
2109<P>
2110The commands <KBD>M-n</KBD> (<CODE>po-next-obsolete-entry</CODE>) and <KBD>M-p</KBD>
2111(<CODE>po-previous-obsolete-entry</CODE>) move forwards or backwards,
2112chasing for an obsolete entry. If none is found, the search is
2113extended and wraps around in the PO file buffer. The commands
2114<KBD>M-<KBD>SPC</KBD></KBD> and <KBD>M-<KBD>DEL</KBD></KBD> are synonymous to <KBD>M-n</KBD>
2115and <KBD>M-p</KBD>, respectively.
2116
2117</P>
2118<P>
2119PO mode does not provide ways for un-commenting an obsolete entry
2120and making it active, because this would reintroduce an original
2121untranslated string which does not correspond to any marked string
2122in the program sources. This goes with the philosophy of never
2123introducing useless <CODE>msgid</CODE> values.
2124
2125</P>
2126<P>
2127However, it is possible to comment out an active entry, so making
2128it obsolete. GNU <CODE>gettext</CODE> utilities will later react to the
2129disappearance of a translation by using the untranslated string.
2130The command <KBD>z</KBD> (<CODE>po-fade-out-entry</CODE>) pushes the current entry
2131a little further towards annihilation. If the entry is active, then
2132the entry is merely commented out. If the entry is already obsolete,
2133then it is completely deleted from the PO file. It is easy to recycle
2134the translation so deleted into some other PO file entry, usually
2135one which is untranslated. See section <A HREF="gettext.html#SEC26">Modifying Translations</A>.
2136
2137</P>
2138<P>
2139Here is a quite interesting problem to solve for later development of
2140PO mode, for those nights you are not sleepy. The idea would be that
2141PO mode might become bright enough, one of these days, to make good
2142guesses at retrieving the most probable candidate, among all obsolete
2143entries, for initializing the translation of a newly appeared string.
2144I think it might be a quite hard problem to do this algorithmically, as
2145we have to develop good and efficient measures of string similarity.
2146Right now, PO mode leaves the decision entirely to the translator
2147when the time comes to find the adequate obsolete translation; it
2148merely tries to provide handy tools for helping her to do so.
2149
2150</P>
2151
2152
2153<H2><A NAME="SEC26" HREF="gettext_toc.html#TOC26">Modifying Translations</A></H2>
2154
2155<P>
2156PO mode prevents direct editing of the PO file by the usual
2157means Emacs gives for altering a buffer's contents. By doing so,
2158it aims to help the translator avoid little clerical errors
2159about the overall file format, or the proper quoting of strings,
2160as those errors would be easily made. Other kinds of errors are
2161still possible, but some may be caught and diagnosed by the batch
2162validation process, which the translator may always trigger by the
2163<KBD>v</KBD> command. For all other errors, the translator has to rely on
2164her own judgment, and also on the linguistic reports submitted to her
2165by the users of the translated package, having the same mother tongue.
2166
2167</P>
2168<P>
2169When the time comes to create a translation, or to correct an error diagnosed
2170mechanically or reported by a user, the translator has to resort to
2171using the following commands for modifying the translations.
2172
2173</P>
2174<DL COMPACT>
2175
2176<DT><KBD>RET</KBD>
2177<DD>
2178Interactively edit the translation.
2179
2180<DT><KBD>TAB</KBD>
2181<DD>
2182Reinitialize the translation with the original, untranslated string.
2183
2184<DT><KBD>k</KBD>
2185<DD>
2186Save the translation on the kill ring, and delete it.
2187
2188<DT><KBD>w</KBD>
2189<DD>
2190Save the translation on the kill ring, without deleting it.
2191
2192<DT><KBD>y</KBD>
2193<DD>
2194Replace the translation, taking the new from the kill ring.
2195
2196</DL>
2197
2198<P>
2199The command <KBD>RET</KBD> (<CODE>po-edit-msgstr</CODE>) opens a new Emacs
2200window containing a copy of the translation taken from the current
2201PO file entry, all ready for editing, fully modifiable
2202and with the complete extent of GNU Emacs modifying commands.
2203The string is presented to the translator expunged of all quoting
2204marks, and she will modify the <EM>unquoted</EM> string in this
2205window to her heart's content. Once done, the regular Emacs command
2206<KBD>M-C-c</KBD> (<CODE>exit-recursive-edit</CODE>) may be used to return the
2207edited translation into the PO file, replacing the original
2208translation. The keys <KBD>C-c C-c</KBD> are bound so they have the
2209same effect as <KBD>M-C-c</KBD>.
2210
2211</P>
2212<P>
2213If the translator becomes unsatisfied with her translation to the
2214extent she prefers keeping the translation which was existent prior to
2215the <KBD>RET</KBD> command, she may use the regular Emacs command <KBD>C-]</KBD>
2216(<CODE>abort-recursive-edit</CODE>) to merely get rid of the edit, while
2217preserving the original translation. Another way would be for her
2218to exit normally with <KBD>C-c C-c</KBD>, then type <CODE>u</CODE> once for
2219undoing the whole effect of the last edit.
2220
2221</P>
2222<P>
2223While editing her translation, the translator should take care
2224not to insert unwanted <KBD><KBD>RET</KBD></KBD> (carriage return) characters at
2225the end of the translated string if those are not meant to be there,
2226or remove such characters when they are required. Since these
2227characters are not visible in the editing buffer, they are easy to
2228introduce by mistake. To help her, <KBD><KBD>RET</KBD></KBD> automatically puts
2229the character <KBD>&#60;</KBD> at the end of the string being edited, but this
2230<KBD>&#60;</KBD> is not really part of the string. On exiting the editing
2231window with <KBD>C-c C-c</KBD>, PO mode automatically removes such
2232<KBD>&#60;</KBD> and all whitespace added after it. If the translator adds
2233characters after the terminating <KBD>&#60;</KBD>, it loses its delimiting
2234property and becomes an integral part of the string. If she removes
2235the delimiting <KBD>&#60;</KBD>, then the edited string is taken <EM>as
2236is</EM>, with all trailing newlines, even if invisible. Also, if the
2237translated string ought to end itself with a genuine <KBD>&#60;</KBD>, then the
2238delimiting <KBD>&#60;</KBD> may not be removed; so the string should appear,
2239in the editing window, as ending with two <KBD>&#60;</KBD> in a row.
2240
2241</P>
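<P>
The treatment of the terminating <KBD>&#60;</KBD> on exit can be sketched
in C as follows. This only illustrates the rules stated above and is not
PO mode's own code; the function name <CODE>strip_end_marker</CODE> is
hypothetical, and the edited text is assumed to sit in a writable buffer.

</P>

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove, in place, the end marker '<' and all
   whitespace after it.  When the last non-whitespace character is not
   '<', the marker was deleted or followed by other characters, and
   TEXT is kept exactly as it is, trailing newlines included.  */
static void
strip_end_marker (char *text)
{
  size_t end;

  assert (text != NULL);

  /* Skip backwards over the whitespace that follows the marker.  */
  end = strlen (text);
  while (end > 0 && isspace ((unsigned char) text[end - 1]))
    end--;

  if (end > 0 && text[end - 1] == '<')
    text[end - 1] = '\0';       /* cut at the marker */
}
```

<P>
A translation meant to end with a genuine <KBD>&#60;</KBD> therefore shows
two of them in the editing window: only the last one is cut.

</P>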
2242<P>
2243When a translation (or a comment) is being edited, the translator
2244may move the cursor back into the PO file buffer and freely
2245move to other entries, browsing at will. The edited entry will
2246be recovered as soon as the edit ceases, because it is this entry
2247only which is being modified. If, with an edit still open, the
2248translator wanders in the PO file buffer, she cannot modify
2249any other entry. If she tries to, PO mode will react by suggesting
2250that she abort the current edit, or else by inviting her to finish
2251the current edit prior to any other modification.
2252
2253</P>
2254<P>
2255The command <KBD>TAB</KBD> (<CODE>po-msgid-to-msgstr</CODE>) initializes, or
2256reinitializes the translation with the original string. This command
2257is normally used when the translator wants to redo a fresh translation
2258of the original string, disregarding any previous work.
2259
2260</P>
2261<P>
2262In fact, whether it is best to start a translation with an empty
2263string, or rather with a copy of the original string, is a matter of
2264taste or habit. Sometimes, the source mother tongue language and the
2265target language are so different that it is simply best to start writing
2266on an empty page. At other times, the source and target languages
2267are so close that it would be a waste to retype a number of words
2268already written in the original string. A translator may also
2269like having the original string right under her eyes, as she will
2270progressively overwrite the original text with the translation, even
2271if this requires some extra editing work to get rid of the original.
2272
2273</P>
2274<P>
2275The command <KBD>k</KBD> (<CODE>po-kill-msgstr</CODE>) merely empties the
2276translation string, thus turning the entry into an untranslated
2277one. But while doing so, its previous contents are set aside in
2278a special place, known as the kill ring. The command <KBD>w</KBD>
2279(<CODE>po-kill-ring-save-msgstr</CODE>) also has the effect of taking a
2280copy of the translation onto the kill ring, but it otherwise leaves
2281the entry alone, and does <EM>not</EM> remove the translation from the
2282entry. Both commands use exactly the Emacs kill ring, which is shared
2283between buffers, and which is well known already to GNU Emacs lovers.
2284
2285</P>
2286<P>
2287The translator may use <KBD>k</KBD> or <KBD>w</KBD> many times in the course
2288of her work, as the kill ring may hold several saved translations.
2289From the kill ring, strings may later be reinserted in various
2290Emacs buffers. In particular, the kill ring may be used for moving
2291translation strings between different entries of a single PO file
2292buffer, or if the translator is handling many such buffers at once,
2293even between PO files.
2294
2295</P>
2296<P>
2297To facilitate exchanges with buffers which are not in PO mode, the
2298translation string put on the kill ring by the <KBD>k</KBD> command is fully
2299unquoted before being saved: external quotes are removed, multi-line
2300strings are concatenated, and backslash escape sequences are turned
2301into their corresponding characters. In the special case of obsolete
2302entries, the translation is also uncommented prior to saving.
2303
2304</P>
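<P>
The unquoting steps just described can be sketched in C. This is only an
illustration of the idea, not the Emacs Lisp code PO mode actually uses:
the name <CODE>po_unquote</CODE> is hypothetical, only a handful of escape
sequences are handled, and the extra uncommenting step for obsolete
entries is left out.
</P>

```c
#include <stddef.h>

/* Hypothetical helper (not part of PO mode or GNU gettext): turn the quoted
   lines of one msgstr, as they appear in a PO file, into a plain string.
   External quotes are dropped, the pieces of a multi-line string are
   concatenated, and a few backslash escapes are expanded.
   Returns the length of the result, or -1 if `out` is too small. */
long po_unquote(const char *quoted, char *out, size_t out_size)
{
    size_t n = 0;
    int in_string = 0;
    const char *p;

    for (p = quoted; *p != '\0'; p++) {
        char c = *p;

        if (!in_string) {
            /* Between quoted pieces: skip newlines and blanks. */
            if (c == '"')
                in_string = 1;
            continue;
        }
        if (c == '"') {             /* closing quote of this piece */
            in_string = 0;
            continue;
        }
        if (c == '\\' && p[1] != '\0') {
            p++;
            switch (*p) {
            case 'n': c = '\n'; break;
            case 't': c = '\t'; break;
            default:  c = *p;   break;  /* \\ and \" stand for themselves */
            }
        }
        if (n + 1 >= out_size)
            return -1;
        out[n++] = c;
    }
    if (n >= out_size)
        return -1;
    out[n] = '\0';
    return (long) n;
}
```

<P>
Yanking performs the reverse operation: the plain string is requoted so
that it complies with the PO file format.
</P>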
2305<P>
2306The command <KBD>y</KBD> (<CODE>po-yank-msgstr</CODE>) completely replaces the
2307translation of the current entry by a string taken from the kill ring.
2308Following GNU Emacs terminology, we then say that the replacement
2309string is <STRONG>yanked</STRONG> into the PO file buffer.
2310See section `Yanking' in <CITE>The Emacs Editor</CITE>.
2311The first time <KBD>y</KBD> is used, the translation receives the value of
2312the most recent addition to the kill ring. If <KBD>y</KBD> is typed once
2313again, immediately, without intervening keystrokes, the translation
2314just inserted is taken away and replaced by the second most recent
2315addition to the kill ring. By repeating <KBD>y</KBD> many times in a row,
2316the translator may travel along the kill ring for saved strings,
2317until she finds the string she really wanted.
2318
2319</P>
2320<P>
2321When a string is yanked into a PO file entry, it is fully and
2322automatically requoted for complying with the format PO files should
2323have. Further, if the entry is obsolete, PO mode then appropriately
2324pushes the inserted string inside comments. Once again, translators
2325should not burden themselves with quoting considerations beyond, of
2326course, what the translated string itself requires with respect to
2327the program using it.
2328
2329</P>
2330<P>
2331Note that <KBD>k</KBD> and <KBD>w</KBD> are not the only commands pushing strings
2332on the kill ring, as almost any PO mode command replacing translation
2333strings (or the translator comments) automatically saves the old string
2334on the kill ring. The main exceptions to this general rule are the
2335yanking commands themselves.
2336
2337</P>
2338<P>
2339To better illustrate the operation of killing and yanking, let's
2340use an actual example, taken from a common situation. When the
2341programmer slightly modifies some string right in the program, his
2342change is later reflected in the PO file by the appearance
2343of a new untranslated entry for the modified string, and the fact
2344that the entry translating the original or unmodified string becomes
2345obsolete. In many cases, the translator might spare herself some work
2346by retrieving the unmodified translation from the obsolete entry,
2347then initializing the untranslated entry <CODE>msgstr</CODE> field with
2348this retrieved translation. Once this is done, the obsolete entry is
2349not wanted anymore, and may be safely deleted.
2350
2351</P>
2352<P>
2353When the translator finds an untranslated entry and suspects that a
2354slight variant of the translation exists, she immediately uses <KBD>m</KBD>
2355to mark the current entry location, then starts chasing obsolete
2356entries with <KBD>M-SPC</KBD>, hoping to find some translation corresponding
2357to the unmodified string. Once found, she uses the <KBD>z</KBD> command
2358for deleting the obsolete entry, knowing that <KBD>z</KBD> also <EM>kills</EM>
2359the translation, that is, pushes the translation on the kill ring.
2360Then, <KBD>l</KBD> returns to the initial untranslated entry, <KBD>y</KBD>
2361then <EM>yanks</EM> the saved translation right into the <CODE>msgstr</CODE>
2362field. The translator is then free to use <KBD><KBD>RET</KBD></KBD> for fine
2363tuning the translation contents, and maybe to later use <KBD>e</KBD>,
2364then <KBD>m</KBD> again, for going on with the next untranslated string.
2365
2366</P>
2367<P>
2368When some sequence of keys has to be typed over and over again, the
2369translator may find it convenient to become more acquainted with the GNU
2370Emacs capability of learning these sequences and playing them back on
2371request. See section `Keyboard Macros' in <CITE>The Emacs Editor</CITE>.
2372
2373</P>
2374
2375
2376<H2><A NAME="SEC27" HREF="gettext_toc.html#TOC27">Modifying Comments</A></H2>
2377
2378<P>
2379Any translation work done seriously will raise many linguistic
2380difficulties, for which decisions have to be made, and the choices
2381further documented. This documentation may be saved within the
2382PO file in the form of translator comments, which the translator
2383is free to create, delete, or modify at will. These comments may
2384be useful to herself when she returns to this PO file after a while.
2385Memory forgets!
2386
2387</P>
2388<P>
2389These commands are somewhat similar to those modifying translations,
2390so the general indications given for these apply here. See section <A HREF="gettext.html#SEC26">Modifying Translations</A>.
2391
2392</P>
2393<DL COMPACT>
2394
2395<DT><KBD>M-RET</KBD>
2396<DD>
2397Interactively edit the translator comments.
2398
2399<DT><KBD>M-k</KBD>
2400<DD>
2401Save the translator comments on the kill ring, and delete them.
2402
2403<DT><KBD>M-w</KBD>
2404<DD>
2405Save the translator comments on the kill ring, without deleting them.
2406
2407<DT><KBD>M-y</KBD>
2408<DD>
2409Replace the translator comments, taking the new ones from the kill ring.
2410
2411</DL>
2412
2413<P>
2414Those commands parallel PO mode commands for modifying the translation
2415strings, and behave much the same way as they do, except that they handle
2416the part of PO file comments meant for translator usage, rather
2417than the translation strings. So, the descriptions given below are
2418somewhat succinct, because the full details have already been given.
2419See section <A HREF="gettext.html#SEC26">Modifying Translations</A>.
2420
2421</P>
2422<P>
2423The command <KBD>M-RET</KBD> (<CODE>po-edit-comment</CODE>) opens a new Emacs
2424window containing a copy of the translator comments of the current
2425PO file entry. If there are no such comments, PO mode
2426understands that the translator wants to add a comment to the entry,
2427and she is presented with an empty screen. Comment marks (<KBD>#</KBD>) and
2428the space following them are automatically removed before editing,
2429and reinstated after. For translator comments pertaining to obsolete
2430entries, the uncommenting and recommenting operations are done twice.
2431The command <KBD>#</KBD> also has the same effect as <KBD>M-RET</KBD>, and might
2432be easier to type. Once in the editing window, the keys <KBD>C-c
2433C-c</KBD> allow the translator to indicate she is finished with editing
2434the comment.
2435
2436</P>
2437<P>
2438The command <KBD>M-k</KBD> (<CODE>po-kill-comment</CODE>) gets rid of all
2439translator comments, while saving those comments on the kill ring.
2440The command <KBD>M-w</KBD> (<CODE>po-kill-ring-save-comment</CODE>) takes
2441a copy of the translator comments on the kill ring, but leaves
2442them undisturbed in the current entry. The command <KBD>M-y</KBD>
2443(<CODE>po-yank-comment</CODE>) completely replaces the translator comments
2444by a string taken at the front of the kill ring. When this command
2445is immediately repeated, the comments just inserted are withdrawn,
2446and replaced by other strings taken along the kill ring.
2447
2448</P>
2449<P>
2450On the kill ring, all strings have the same nature. There is no
2451distinction between <EM>translation</EM> strings and <EM>translator
2452comments</EM> strings. So, for example, let's presume the translator
2453has just finished editing a translation, and wants to create a new
2454translator comment documenting why the previous translation was
2455not good, just to remember what the problem was. Foreseeing that she
2456will do that in her documentation, the translator will want to quote
2457the previous translation in her translator comments. For doing so, she
2458may initialize the translator comments with the previous translation,
2459still at the head of the kill ring. Because editing already pushed the
2460previous translation on the kill ring, she just has to type <KBD>M-y</KBD>
2461prior to <KBD>#</KBD>, and the previous translation will be right there,
2462all ready for being introduced by some explanatory text.
2463
2464</P>
2465<P>
2466On the other hand, presume there are some translator comments already
2467and that the translator wants to add to those comments, instead
2468of wholly replacing them. Then, she should edit the comment right
2469away with <KBD>#</KBD>. Once inside the editing window, she can use the
2470regular GNU Emacs commands <KBD>C-y</KBD> (<CODE>yank</CODE>) and <KBD>M-y</KBD>
2471(<CODE>yank-pop</CODE>) for getting the previous translation where she likes.
2472
2473</P>
2474
2475
2476<H2><A NAME="SEC28" HREF="gettext_toc.html#TOC28">Consulting Auxiliary PO Files</A></H2>
2477
2478<P>
2479An upcoming feature of PO mode should help the knowledgeable translator
2480to take advantage of translations already achieved in other languages
2481she just happens to know, by providing these other-language translations
2482as additional context for her own work. Each PO file existing for
2483the same package the translator is working on, but targeted to a
2484different mother tongue language, is called an <STRONG>auxiliary</STRONG> PO file.
2485Commands will exist for declaring and handling auxiliary PO files,
2486and also for showing contexts for the entry under work. For this to
2487work fully, all auxiliary PO files will have to be normalized.
2488
2489</P>
2490
2491
2492<H1><A NAME="SEC29" HREF="gettext_toc.html#TOC29">Producing Binary MO Files</A></H1>
2493
2494
2495
2496<H2><A NAME="SEC30" HREF="gettext_toc.html#TOC30">Invoking the <CODE>msgfmt</CODE> Program</A></H2>
2497
2498
2499<PRE>
2500Usage: msgfmt [<VAR>option</VAR>] <VAR>filename</VAR>.po ...
2501</PRE>
2502
2503<DL COMPACT>
2504
2505<DT><SAMP>`-a <VAR>number</VAR>'</SAMP>
2506<DD>
2507<DT><SAMP>`--alignment=<VAR>number</VAR>'</SAMP>
2508<DD>
2509Align strings to <VAR>number</VAR> bytes (default: 1).
2510
2511<DT><SAMP>`-h'</SAMP>
2512<DD>
2513<DT><SAMP>`--help'</SAMP>
2514<DD>
2515Display this help and exit.
2516
2517<DT><SAMP>`-I <VAR>list</VAR>'</SAMP>
2518<DD>
2519<DT><SAMP>`--input-path=<VAR>list</VAR>'</SAMP>
2520<DD>
2521List of directories searched for input files.
2522
2523<DT><SAMP>`--no-hash'</SAMP>
2524<DD>
2525Binary file will not include the hash table.
2526
2527<DT><SAMP>`-o <VAR>file</VAR>'</SAMP>
2528<DD>
2529<DT><SAMP>`--output-file=<VAR>file</VAR>'</SAMP>
2530<DD>
2531Specify output file name as <VAR>file</VAR>.
2532
2533<DT><SAMP>`-v'</SAMP>
2534<DD>
2535<DT><SAMP>`--verbose'</SAMP>
2536<DD>
2537Detect and diagnose input file anomalies which might represent
2538translation errors. The <CODE>msgid</CODE> and <CODE>msgstr</CODE> strings are
2539studied and compared. It is considered abnormal that one string
2540starts or ends with a newline while the other does not. Also, both
2541strings should have the same number of <SAMP>`%'</SAMP> format specifiers,
2542with matching types. For example, the check will diagnose using
2543<SAMP>`%.*s'</SAMP> against <SAMP>`%s'</SAMP>, or <SAMP>`%d'</SAMP> against <SAMP>`%s'</SAMP>, or
2544<SAMP>`%d'</SAMP> against <SAMP>`%x'</SAMP>. It can even handle positional parameters.
2545
2546<DT><SAMP>`-V'</SAMP>
2547<DD>
2548<DT><SAMP>`--version'</SAMP>
2549<DD>
2550Output version information and exit.
2551
2552</DL>
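<P>
One of the checks described for <SAMP>`--verbose'</SAMP>, the comparison of
leading and trailing newlines, can be sketched as follows. This is a
minimal illustration under our own assumptions, not the code
<CODE>msgfmt</CODE> itself uses; the function name
<CODE>newlines_agree</CODE> is hypothetical, and the format specifier
comparison is not attempted here.
</P>

```c
#include <string.h>

/* Hypothetical check (not msgfmt's actual code): return 1 when msgid and
   msgstr agree on whether they start with a newline and on whether they
   end with one, and 0 otherwise. */
int newlines_agree(const char *msgid, const char *msgstr)
{
    size_t id_len = strlen(msgid);
    size_t str_len = strlen(msgstr);

    int id_starts  = id_len  > 0 && msgid[0] == '\n';
    int str_starts = str_len > 0 && msgstr[0] == '\n';
    int id_ends    = id_len  > 0 && msgid[id_len - 1] == '\n';
    int str_ends   = str_len > 0 && msgstr[str_len - 1] == '\n';

    return id_starts == str_starts && id_ends == str_ends;
}
```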
2553
2554<P>
2555If input file is <SAMP>`-'</SAMP>, standard input is read. If output file
2556is <SAMP>`-'</SAMP>, output is written to standard output.
2557
2558</P>
2559<P>
2560The search path for <CODE>msgfmt</CODE> is <TT>`/usr/local/share/nls/src/'</TT>,
2561by default. It represents the path to additional directories where
2562other PO files can be found. This feature could be used for some
2563PO files for standard libraries, in case we would like to spare
2564translating their strings over and over again. The <SAMP>`-x'</SAMP> option
2565could then exclude these strings from the generation.
2566
2567</P>
2568
2569
2570<H2><A NAME="SEC31" HREF="gettext_toc.html#TOC31">The Format of GNU MO Files</A></H2>
2571
2572<P>
2573The format of the generated MO files is best described by a picture,
2574which appears below.
2575
2576</P>
2577<P>
2578The first two words serve to identify the file. The magic
2579number will always signal GNU MO files. The number is stored in the
2580byte order of the generating machine, so the magic number really is
2581two numbers: <CODE>0x950412de</CODE> and <CODE>0xde120495</CODE>. The second
2582word describes the current revision of the file format. For now the
2583revision is 0. This might change in future versions, and ensures
2584that the readers of MO files can distinguish new formats from old
2585ones, so that both can be handled correctly. The version is kept
2586separate from the magic number, instead of using different magic
2587numbers for different formats, mainly because <TT>`/etc/magic'</TT> is
2588not updated often. It might be better to have magic separated from
2589internal format version identification.
2590
2591</P>
2592<P>
2593A number of pointers to later tables in the file follow, allowing
2594for the extension of the prefix part of MO files without having to
2595recompile programs reading them. This might become useful for later
2596inserting a few flag bits, an indication of the charset used, new
2597tables, or other things.
2598
2599</P>
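<P>
As a rough sketch of how a reader might handle the magic number and the
words that follow it, the C fragment below decodes the seven header words
from a memory image of an MO file. It is not the code used by GNU
<CODE>gettext</CODE>; the names <CODE>struct mo_header</CODE>,
<CODE>read_u32</CODE> and <CODE>mo_read_header</CODE> are assumptions made
for this example. The byte order of the file is deduced from which of the
two magic values appears when the first word is read as little-endian.
</P>

```c
#include <stddef.h>
#include <stdint.h>

#define MO_MAGIC         0x950412deu
#define MO_MAGIC_SWAPPED 0xde120495u

/* Hypothetical in-memory view of the fixed header words. */
struct mo_header {
    int      big_endian;    /* 1 if the file was written big-endian */
    uint32_t revision;      /* file format revision */
    uint32_t nstrings;      /* N: number of strings */
    uint32_t orig_offset;   /* O: table with original strings */
    uint32_t trans_offset;  /* T: table with translation strings */
    uint32_t hash_size;     /* S: size of hashing table */
    uint32_t hash_offset;   /* H: offset of hashing table */
};

/* Assemble a 32-bit word from four bytes in the given byte order. */
static uint32_t read_u32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
             | ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
    return ((uint32_t) p[3] << 24) | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[1] << 8)  |  (uint32_t) p[0];
}

/* Decode the header; returns 0 on success, -1 if the image is too short
   or does not start with either form of the magic number. */
int mo_read_header(const unsigned char *data, size_t size,
                   struct mo_header *h)
{
    uint32_t magic;

    if (size < 28)
        return -1;

    magic = read_u32(data, 0);
    if (magic == MO_MAGIC)
        h->big_endian = 0;
    else if (magic == MO_MAGIC_SWAPPED)
        h->big_endian = 1;
    else
        return -1;

    h->revision     = read_u32(data + 4,  h->big_endian);
    h->nstrings     = read_u32(data + 8,  h->big_endian);
    h->orig_offset  = read_u32(data + 12, h->big_endian);
    h->trans_offset = read_u32(data + 16, h->big_endian);
    h->hash_size    = read_u32(data + 20, h->big_endian);
    h->hash_offset  = read_u32(data + 24, h->big_endian);
    return 0;
}
```

<P>
Reading the words byte by byte, rather than casting the buffer to an
integer pointer, keeps the sketch independent of the byte order and
alignment rules of the machine running it.
</P>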
2600<P>
2601Then, at offset <VAR>O</VAR> and offset <VAR>T</VAR> in the picture, two tables
2602of string descriptors can be found. In both tables, each string
2603descriptor uses two 32-bit integers, one for the string length,
2604another for the offset of the string in the MO file, counting in bytes
2605from the start of the file. The first table contains descriptors
2606for the original strings, and is sorted so the original strings
2607are in increasing lexicographical order. The second table contains
2608descriptors for the translated strings, and is parallel to the first
2609table: to find the corresponding translation one has to access the
2610array slot in the second array with the same index.
2611
2612</P>
2613<P>
2614Having the original strings sorted enables the use of simple binary
2615search, for when the MO file does not contain a hashing table, or
2616for when it is not practical to use the hashing table provided in
2617the MO file. This also has another advantage: the empty string
2618in a GNU <CODE>gettext</CODE> PO file is usually <EM>translated</EM> into
2619some system information attached to that particular MO file, and the
2620empty string necessarily becomes the first in both the original and
2621translated tables, making the system information very easy to find.
2622
2623</P>
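<P>
The binary search made possible by the sorted table can be sketched as
below. This is an illustration only, not the lookup code of GNU
<CODE>gettext</CODE>: the names <CODE>le32</CODE> and
<CODE>mo_lookup</CODE> are hypothetical, the image is assumed to be
little-endian and well formed (no offsets are validated), and the hash
table is ignored altogether.
</P>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit word. */
static uint32_t le32(const unsigned char *p)
{
    return ((uint32_t) p[3] << 24) | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[1] << 8)  |  (uint32_t) p[0];
}

/* Hypothetical lookup: search the sorted table of original strings for
   msgid and return the translation stored at the same index of the
   second table, or NULL when msgid is absent.  Each descriptor is a
   (length, offset) pair of 32-bit words, so the offset of string i sits
   at table + 8*i + 4. */
const char *mo_lookup(const unsigned char *mo, const char *msgid)
{
    uint32_t n = le32(mo + 8);      /* N: number of strings */
    uint32_t o = le32(mo + 12);     /* O: original strings table */
    uint32_t t = le32(mo + 16);     /* T: translations table */
    uint32_t lo = 0, hi = n;

    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        const char *orig = (const char *) mo + le32(mo + o + 8 * mid + 4);
        int cmp = strcmp(msgid, orig);

        if (cmp == 0)
            return (const char *) mo + le32(mo + t + 8 * mid + 4);
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

<P>
Because the empty string sorts first, looking up <CODE>""</CODE> with such
a routine would return the system information entry, when the file has
one.
</P>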
2624<P>
2625The size <VAR>S</VAR> of the hash table can be zero. In this case, the
2626hash table itself is not contained in the MO file. Some people might
2627prefer this because a precomputed hashing table takes disk space, and
2628does not win <EM>that</EM> much speed. The hash table contains indices
2629to the sorted array of strings in the MO file. Conflict resolution is
2630done by double hashing. The precise hashing algorithm used is fairly
2631dependent on GNU <CODE>gettext</CODE> code, and is not documented here.
2632
2633</P>
2634<P>
2635As for the strings themselves, they follow the hash table, and each
2636is terminated with a <KBD>NUL</KBD>, and this <KBD>NUL</KBD> is not counted in
2637the length which appears in the string descriptor. The <CODE>msgfmt</CODE>
2638program has an option selecting the alignment for MO file strings.
2639With this option, each string is separately aligned so it starts at
2640an offset which is a multiple of the alignment value. On some RISC
2641machines, a correct alignment will speed things up.
2642
2643</P>
2644<P>
2645Nothing prevents an MO file from having embedded <KBD>NUL</KBD>s in strings.
2646However, the program interface currently used already presumes
2647that strings are <KBD>NUL</KBD> terminated, so embedded <KBD>NUL</KBD>s are
2648somewhat useless. But the MO file format is general enough that other
2649interfaces would later be possible if, for example, we ever want to
2650implement wide characters right in MO files, where <KBD>NUL</KBD> bytes may
2651accidentally appear.
2652
2653</P>
2654<P>
2655This particular issue has been strongly debated in the GNU
2656<CODE>gettext</CODE> development forum, and it is to be expected that the MO file
2657format will evolve or change over time. It is even possible that many
2658formats may later be supported concurrently. But surely, we have to
2659start somewhere, and the MO file format described here is a good start.
2660Nothing is cast in concrete, and the format may later evolve fairly
2661easily, so we should feel comfortable with the current approach.
2662
2663</P>
2664
2665<PRE>
2666 byte
2667 +------------------------------------------+
2668 0 | magic number = 0x950412de |
2669 | |
2670 4 | file format revision = 0 |
2671 | |
2672 8 | number of strings | == N
2673 | |
2674 12 | offset of table with original strings | == O
2675 | |
2676 16 | offset of table with translation strings | == T
2677 | |
2678 20 | size of hashing table | == S
2679 | |
2680 24 | offset of hashing table | == H
2681 | |
2682 . .
2683 . (possibly more entries later) .
2684 . .
2685 | |
2686 O | length &#38; offset 0th string ----------------.
2687 O + 8 | length &#38; offset 1st string ------------------.
2688 ... ... | |
2689O + ((N-1)*8)| length &#38; offset (N-1)th string | | |
2690 | | | |
2691 T | length &#38; offset 0th translation ---------------.
2692 T + 8 | length &#38; offset 1st translation -----------------.
2693 ... ... | | | |
2694T + ((N-1)*8)| length &#38; offset (N-1)th translation | | | | |
2695 | | | | | |
2696 H | start hash table | | | | |
2697 ... ... | | | |
2698 H + S * 4 | end hash table | | | | |
2699 | | | | | |
2700 | NUL terminated 0th string &#60;----------------' | | |
2701 | | | | |
2702 | NUL terminated 1st string &#60;------------------' | |
2703 | | | |
2704 ... ... | |
2705 | | | |
2706 | NUL terminated 0th translation &#60;---------------' |
2707 | | |
2708 | NUL terminated 1st translation &#60;-----------------'
2709 | |
2710 ... ...
2711 | |
2712 +------------------------------------------+
2713</PRE>
2714
2715
2716
2717<H1><A NAME="SEC32" HREF="gettext_toc.html#TOC32">The User's View</A></H1>
2718
2719<P>
2720When GNU <CODE>gettext</CODE> will truly have reached its goal, average users
2721should feel some kind of astonished pleasure, seeing the effect of
2722that strange kind of magic that just makes their own native language
2723appear everywhere on their screens. As for naive users, they would
2724ideally have no special pleasure about it, merely taking their own
2725language for <EM>granted</EM>, and becoming rather unhappy otherwise.
2726
2727</P>
2728<P>
2729So, let's try to describe here how we would like the magic to operate,
2730as we want the users' view to be the simplest, among all ways one
2731could look at GNU <CODE>gettext</CODE>. All other software engineers:
2732programmers, translators, maintainers, should work together in such a
2733way that the magic becomes possible. This is a long and progressive
2734undertaking, and information is available about the progress of the
2735GNU Translation Project.
2736
2737</P>
2738<P>
2739When a package is distributed, there are two kinds of users:
2740<STRONG>installers</STRONG> who fetch the distribution, unpack it, configure
2741it, compile it and install it for themselves or others to use; and
2742<STRONG>end users</STRONG> that call programs of the package, once these have
2743been installed at their site. GNU <CODE>gettext</CODE> is offering magic
2744for both installers and end users.
2745
2746</P>
2747
2748
2749
2750<H2><A NAME="SEC33" HREF="gettext_toc.html#TOC33">The Current <TT>`NLS'</TT> Matrix for GNU</A></H2>
2751
2752<P>
2753Languages are not equally supported in all GNU packages. To know
2754if some GNU package uses GNU <CODE>gettext</CODE>, one may check
2755the distribution for the <TT>`NLS'</TT> information file, for some
2756<TT>`<VAR>ll</VAR>.po'</TT> files, often kept together into some <TT>`po/'</TT>
2757directory, or for an <TT>`intl/'</TT> directory. Internationalized
2758packages usually have many <TT>`<VAR>ll</VAR>.po'</TT> files, where <VAR>ll</VAR>
2759represents the language. See section <A HREF="gettext.html#SEC35">Magic for End Users</A> for a complete description
2760of the format for <VAR>ll</VAR>.
2761
2762</P>
2763<P>
2764More generally, a matrix is available for showing the current state
2765of GNU internationalization, listing which packages are prepared
2766for multi-lingual messages, and which languages are supported by each.
2767Because this information changes often, this matrix is not kept within
2768this GNU <CODE>gettext</CODE> manual. This information is often found in
2769file <TT>`NLS'</TT> from various GNU distributions, but is also as old
2770as the distribution itself. A recent copy of this <TT>`NLS'</TT> file,
2771containing up-to-date information, should generally be found on most
2772GNU archive sites.
2773
2774</P>
2775
2776
2777<H2><A NAME="SEC34" HREF="gettext_toc.html#TOC34">Magic for Installers</A></H2>
2778
2779<P>
2780By default, packages fully using GNU <CODE>gettext</CODE>, internally,
2781are installed in such a way that they allow translation of
2782messages. At <EM>configuration</EM> time, those packages should
2783automatically detect whether the underlying host system provides usable
2784<CODE>catgets</CODE> or <CODE>gettext</CODE> functions. If neither is present,
2785the GNU <CODE>gettext</CODE> library should be automatically prepared
2786and used. Installers may use special options at configuration
2787time for changing this behavior. The command <SAMP>`./configure
2788--with-gnu-gettext'</SAMP> bypasses system <CODE>catgets</CODE> or <CODE>gettext</CODE> to
2789use GNU <CODE>gettext</CODE> instead, while <SAMP>`./configure --disable-nls'</SAMP>
2790produces programs totally unable to translate messages.
2791
2792</P>
2793<P>
2794Internationalized packages usually have many <TT>`<VAR>ll</VAR>.po'</TT>
2795files. Unless
2796translations are disabled, all those available are installed together
2797with the package. However, the environment variable <CODE>LINGUAS</CODE>
2798may be set, prior to configuration, to limit the installed set.
2799<CODE>LINGUAS</CODE> should then contain a space separated list of two-letter
2800codes, stating which languages are allowed.
2801
2802</P>
2803
2804
2805<H2><A NAME="SEC35" HREF="gettext_toc.html#TOC35">Magic for End Users</A></H2>
2806
2807<P>
2808We consider here those packages using GNU <CODE>gettext</CODE> internally,
2809and for which the installers did not disable translation at
2810<EM>configure</EM> time. Then, users only have to set the <CODE>LANG</CODE>
2811environment variable to the appropriate <SAMP>`<VAR>ll</VAR>'</SAMP> prior to
2812using the programs in the package. See section <A HREF="gettext.html#SEC33">The Current <TT>`NLS'</TT> Matrix for GNU</A>. For example,
2813let's presume a German site. At the shell prompt, users merely have to
2814execute <SAMP>`setenv LANG de'</SAMP> (in <CODE>csh</CODE>) or <SAMP>`export
2815LANG; LANG=de'</SAMP> (in <CODE>sh</CODE>). They could even do this from their
2816<TT>`.login'</TT> or <TT>`.profile'</TT> file.
2817
2818</P>
2819
2820
2821<H1><A NAME="SEC36" HREF="gettext_toc.html#TOC36">The Programmer's View</A></H1>
2822
2823<P>
2824One aim of the current message catalog implementation provided by
2825GNU <CODE>gettext</CODE> was to use the system's message catalog handling, if the
2826installer wishes to do so. So we perhaps should first take a look at
2827the solutions we know about. The people in the POSIX committee did not
2828manage to agree on one of the semi-official standards which we'll
2829describe below. In fact they couldn't agree on anything, so they
2830decided only to include an example of an interface. The major Unix vendors
2831are split in the usage of the two most important specifications: X/Open's
2832catgets vs. Uniforum's gettext interface. We'll describe them both and
2833later explain our solution to this dilemma.
2834
2835</P>
2836
2837
2838
2839<H2><A NAME="SEC37" HREF="gettext_toc.html#TOC37">About <CODE>catgets</CODE></A></H2>
2840
2841<P>
2842The <CODE>catgets</CODE> implementation is defined in the X/Open Portability
2843Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
2844process of creating this standard seemed to be too slow for some of
2845the Unix vendors so they based their implementations on preliminary
2846versions of the standard. Of course this leads again to problems while
2847writing platform independent programs: even the usage of <CODE>catgets</CODE>
2848does not guarantee a unique interface.
2849
2850</P>
2851<P>
2852Another, personal comment on this is that only a bunch of committee members
2853could have made this interface. They never really tried to program
2854using this interface. It is a fast, memory-saving implementation, and a
2855user can happily live with it. But programmers hate it (at least me and
2856some others do...)
2857
2858</P>
2859<P>
2860But we must not forget one point: after all the trouble with transferring
2861the rights on Unix(tm) they at last came to X/Open, the very same who
2862published this specification. This leads me to making the prediction
2863that this interface will be in future Unix standards (e.g. Spec1170) and
2864therefore part of all Unix implementations (implementations which are
2865<EM>allowed</EM> to wear this name).
2866
2867</P>
2868
2869
2870
2871<H3><A NAME="SEC38" HREF="gettext_toc.html#TOC38">The Interface</A></H3>
2872
2873<P>
2874The interface to the <CODE>catgets</CODE> implementation consists of three
2875functions which correspond to those used in file access: <CODE>catopen</CODE>
2876to open the catalog for using, <CODE>catgets</CODE> for accessing the message
2877tables, and <CODE>catclose</CODE> for closing after work is done. Prototypes
2878for the functions and the needed definitions are in the
2879<CODE>&#60;nl_types.h&#62;</CODE> header file.
2880
2881</P>
2882<P>
2883<CODE>catopen</CODE> is used like this:
2884
2885</P>
2886
2887<PRE>
2888nl_catd catd = catopen ("catalog_name", 0);
2889</PRE>
2890
2891<P>
2892The function takes as the argument the name of the catalog. This usually
2893refers to the name of the program or the package. The second parameter
2894is not further specified in the standard. I don't even know whether it
2895is implemented consistently among various systems. So the common advice
2896is to use <CODE>0</CODE> as the value. The return value is a handle to the
2897message catalog, equivalent to the file handles returned by <CODE>open</CODE>.
2898
2899</P>
2900<P>
2901This handle is of course used in the <CODE>catgets</CODE> function which can
2902be used like this:
2903
2904</P>
2905
2906<PRE>
2907char *translation = catgets (catd, set_no, msg_id, "original string");
2908</PRE>
2909
2910<P>
2911The first parameter is this catalog descriptor. The second parameter
2912specifies the set of messages in this catalog, in which the message
2913described by <CODE>msg_id</CODE> is obtained. <CODE>catgets</CODE> therefore uses a
2914three-stage addressing:
2915
2916</P>
2917
2918<PRE>
2919catalog name => set number => message ID => translation
2920</PRE>
2921
2922<P>
2923The fourth argument is not used to address the translation. It is given
2924as a default value in case one of the addressing stages fails. One
2925important thing to remember is that although the return type of catgets
2926is <CODE>char *</CODE> the resulting string <EM>must not</EM> be changed. It
2927would better be <CODE>const char *</CODE>, but the standard was published in
29281988, one year before ANSI C.
2929
2930</P>
2931<P>
2932The last of these functions is used and behaves as expected:
2933
2934</P>
2935
2936<PRE>
2937catclose (catd);
2938</PRE>
2939
2940<P>
2941After this no <CODE>catgets</CODE> call using the descriptor is legal anymore.
2942
2943</P>
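<P>
Putting the three calls together, a complete open-use-close cycle might
look like the sketch below. The helper name <CODE>fetch_message</CODE> is
hypothetical. The sketch assumes that <CODE>catopen</CODE> reports failure
by returning <CODE>(nl_catd) -1</CODE>, and it copies the message before
closing the catalog because the pointer returned by <CODE>catgets</CODE>
may no longer be usable once the descriptor is closed.
</P>

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper: fetch one message from a catalog into `out`,
   falling back to `fallback` when the catalog cannot be opened or the
   message is not found. */
void fetch_message(const char *catalog_name, int set_no, int msg_id,
                   const char *fallback, char *out, size_t out_size)
{
    nl_catd catd = catopen(catalog_name, 0);
    const char *text = fallback;

    if (catd != (nl_catd) -1)
        text = catgets(catd, set_no, msg_id, fallback);

    /* Copy while the catalog is still open; the result of catgets must
       not be modified and belongs to the catalog. */
    snprintf(out, out_size, "%s", text);

    if (catd != (nl_catd) -1)
        catclose(catd);
}
```

<P>
A program issuing many messages would rather open the catalog once at
startup and close it on exit than repeat the whole cycle for each string.
</P>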
2944
2945
2946<H3><A NAME="SEC39" HREF="gettext_toc.html#TOC39">Problems with the <CODE>catgets</CODE> Interface?!</A></H3>
2947
2948<P>
2949Now that this description seemed to be really easy, where are the
2950problems we speak of? In fact the interface could be used in a
2951reasonable way, but constructing the message catalogs is a pain. The
2952reason for this lies in the third argument of <CODE>catgets</CODE>: the unique
2953message ID. This has to be a numeric value for all messages in a single
2954set. Perhaps you can imagine the problems of keeping such a list while
2955changing the source code. Add a new message here, remove one there. Of
2956course a lot of tools have been developed to help organize this
2957chaos, but each of them fails in one aspect or another. We don't
2958want to say that the other approach has no problems but they are far
2959easier to manage.
2960
2961</P>
2962
2963
2964<H2><A NAME="SEC40" HREF="gettext_toc.html#TOC40">About <CODE>gettext</CODE></A></H2>
2965
2966<P>
2967The definition of the <CODE>gettext</CODE> interface comes from a Uniforum
2968proposal and it is followed by at least one major Unix vendor
(Sun) in its latest developments. It is not specified in any official
2970standard, though.
2971
2972</P>
2973<P>
The main points about this solution are that it does not follow the
method of normal file handling (open-use-close) and that it does not
burden the programmer with so many tasks, especially the unique key handling.
Of course a unique key is needed here as well, but this key is the
message itself (however long or short it is). See section <A HREF="gettext.html#SEC45">Comparing the Two Interfaces</A> for a
more detailed comparison of the two methods.
2980
2981</P>
2982<P>
2983The following section contains a rather detailed description of the
2984interface. We make it that detailed because this is the interface
2985we chose for the GNU <CODE>gettext</CODE> Library. Programmers interested
2986in using this library will be interested in this description.
2987
2988</P>
2989
2990
2991
2992<H3><A NAME="SEC41" HREF="gettext_toc.html#TOC41">The Interface</A></H3>
2993
2994<P>
2995The minimal functionality an interface must have is a) to select a
2996domain the strings are coming from (a single domain for all programs is
2997not reasonable because its construction and maintenance is difficult,
2998perhaps impossible) and b) to access a string in a selected domain.
2999
3000</P>
3001<P>
This is essentially the description of the <CODE>gettext</CODE> interface. It
has a global domain which unqualified usages reference. Of course this
3004domain is selectable by the user.
3005
3006</P>
3007
3008<PRE>
3009char *textdomain (const char *domain_name);
3010</PRE>
3011
3012<P>
This provides the possibility to change or query the current status of
the current global domain of the <CODE>LC_MESSAGES</CODE> category. The
argument is a null-terminated string whose characters must be legal
in filenames. If the <VAR>domain_name</VAR> argument is <CODE>NULL</CODE>,
the function returns the current value. If no value has been set
before, the name of the default domain is returned: <EM>messages</EM>.
Please note that although the return value of <CODE>textdomain</CODE> is of
type <CODE>char *</CODE>, it must not be changed. It is also important to know
that no checks of the availability are made. If the name is not
available you will see this by the fact that no translations are provided.
3023
3024</P>
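<P>
The query and set forms of <CODE>textdomain</CODE> can be tried out with a
few lines. This is a minimal sketch, assuming a <TT>`libintl.h'</TT> that
behaves as described above (the GNU C Library is one such system); the
helper name <CODE>use_domain</CODE> is hypothetical, and the domain passed
to it need not exist, since no availability check is made.
</P>

```c
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: make `name` the global domain and report
   whether the library now says so.  Returns 1 on success, 0 otherwise.  */
int
use_domain (const char *name)
{
  /* A NULL argument only queries; nothing is changed.  */
  printf ("domain before: %s\n", textdomain (NULL));

  /* A non-NULL argument sets the domain and returns the new value.  */
  const char *current = textdomain (name);

  /* The returned string belongs to the library: read it, never write.  */
  return current != NULL && strcmp (current, name) == 0;
}
```

<P>
Called as <CODE>use_domain ("hello")</CODE> in a fresh process, the helper
first reports the default domain and then switches to <CODE>hello</CODE>.
</P>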
3025<P>
3026To use a domain set by <CODE>textdomain</CODE> the function
3027
3028</P>
3029
3030<PRE>
3031char *gettext (const char *msgid);
3032</PRE>
3033
3034<P>
3035is to be used. This is the simplest reasonable form one can imagine.
3036The translation of the string <VAR>msgid</VAR> is returned if it is available
3037in the current domain. If not available the argument itself is
3038returned. If the argument is <CODE>NULL</CODE> the result is undefined.
3039
3040</P>
3041<P>
One thing which should come to mind is that no explicit dependency on
the domain used is given. The current value of the domain for the
<CODE>LC_MESSAGES</CODE> locale is used. If this changes between two
executions of the same <CODE>gettext</CODE> call in the program, the two calls
reference different message catalogs.
3047
3048</P>
3049<P>
For the easiest case, which is normally used in internationalized GNU
packages, a call to <CODE>textdomain</CODE> is issued once at the beginning
of execution, setting the domain to a unique name, normally the package
name. In the code that follows, all strings which have to be translated are
filtered through the <CODE>gettext</CODE> function. That's all, the package speaks
your language.
3056
3057</P>
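<P>
The easiest case just described might look roughly as follows. This is a
sketch, not code taken from any GNU package: the names
<CODE>init_messages</CODE> and <CODE>greeting_text</CODE> are hypothetical, and
the call to <CODE>setlocale</CODE> is an addition not discussed above, which
programs normally need so that the locale settings from the environment
take effect.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* Called once at the beginning of execution.  `package` is a
   placeholder for the real package name.  */
void
init_messages (const char *package)
{
  setlocale (LC_ALL, "");   /* adopt the user's locale settings */
  textdomain (package);     /* select the package's own domain */
}

/* Every user-visible string is filtered through gettext.  When no
   translation is available the argument itself comes back, so the
   program keeps working without any catalog installed.  */
const char *
greeting_text (void)
{
  return gettext ("Hello world");
}
```

<P>
A program would call <CODE>init_messages</CODE> with its package name first
and afterwards print <CODE>greeting_text ()</CODE> with <CODE>puts</CODE>.
</P>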
3058
3059
3060<H3><A NAME="SEC42" HREF="gettext_toc.html#TOC42">Solving Ambiguities</A></H3>
3061
3062<P>
While this single-name domain works well for most applications there
might be the need to get translations from more than one domain. Of
course one could switch between different domains with calls to
<CODE>textdomain</CODE>, but this is neither convenient nor fast. A
possible situation is one being discussed at the time of this writing: all
error messages of functions in the set of commonly used functions should
go into a separate domain <CODE>error</CODE>. By this means we would only need
to translate them once.
3071
3072</P>
3073<P>
For this reason there are two more functions to retrieve strings:
3075
3076</P>
3077
3078<PRE>
3079char *dgettext (const char *domain_name, const char *msgid);
3080char *dcgettext (const char *domain_name, const char *msgid,
3081 int category);
3082</PRE>
3083
3084<P>
Both take an additional argument as their first parameter, which corresponds
to the argument of <CODE>textdomain</CODE>. The third argument of
<CODE>dcgettext</CODE> allows using a locale category other than <CODE>LC_MESSAGES</CODE>.
But I really don't know where this can be useful. If the
<VAR>domain_name</VAR> is <CODE>NULL</CODE> or <VAR>category</VAR> has a value other than
the known ones, the result is undefined. It should also be noted that
this function is not part of the second known implementation of this
function family, the one found in Solaris.
3093
3094</P>
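<P>
For the separate <CODE>error</CODE> domain mentioned above, a lookup
without switching the global domain could be sketched like this. The two
helper names are hypothetical; the second one only spells out that
<CODE>LC_MESSAGES</CODE> is the category which <CODE>dgettext</CODE> uses
implicitly.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* Fetch a message from the separate "error" domain without touching
   the global domain set by textdomain.  */
const char *
error_message (const char *msgid)
{
  return dgettext ("error", msgid);
}

/* The same lookup spelled with dcgettext; LC_MESSAGES is the category
   that gettext and dgettext use implicitly.  */
const char *
error_message_explicit (const char *msgid)
{
  return dcgettext ("error", msgid, LC_MESSAGES);
}
```

<P>
As with <CODE>gettext</CODE>, both helpers return their argument unchanged
when no translation is found, and the domain selected with
<CODE>textdomain</CODE> stays as it was.
</P>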
3095<P>
A second ambiguity can arise from the fact that perhaps more than one
domain has the same name. This can be solved by specifying where the
needed message catalog files can be found.
3099
3100</P>
3101
3102<PRE>
3103char *bindtextdomain (const char *domain_name,
3104 const char *dir_name);
3105</PRE>
3106
3107<P>
Calling this function binds the given domain to a file in the specified
directory (how this file is determined follows below). In particular, a file in
the system's default place is no longer favored over the specified file
(as it would be by solely using <CODE>textdomain</CODE>). A <CODE>NULL</CODE>
pointer for the <VAR>dir_name</VAR> parameter returns the binding associated
with <VAR>domain_name</VAR>. If <VAR>domain_name</VAR> itself is <CODE>NULL</CODE>
nothing happens and a <CODE>NULL</CODE> pointer is returned. Here again, as
for all the other functions, none of the return values may
be changed!
3117
3118</P>
3119
3120
3121<H3><A NAME="SEC43" HREF="gettext_toc.html#TOC43">Locating Message Catalog Files</A></H3>
3122
3123<P>
Because many different languages for many different packages have to be
stored we need some way to add this information to the message catalog
files. The way usually used in Unix environments is to encode it
in the file name. This is also done here. The directory name given in
<CODE>bindtextdomain</CODE>'s second argument (or the default directory),
followed by the value and name of the locale and the domain name are
concatenated:
3131
3132</P>
3133
3134<PRE>
3135<VAR>dir_name</VAR>/<VAR>locale</VAR>/LC_<VAR>category</VAR>/<VAR>domain_name</VAR>.mo
3136</PRE>
3137
3138<P>
3139The default value for <VAR>dir_name</VAR> is system specific. For the GNU
3140library it's:
3141
3142<PRE>
3143/usr/local/share/locale
3144</PRE>
3145
3146<P>
3147<VAR>locale</VAR> is the value of the locale whose name is this
3148<CODE>LC_<VAR>category</VAR></CODE>. For <CODE>gettext</CODE> and <CODE>dgettext</CODE> this
3149locale is always <CODE>LC_MESSAGES</CODE>. <CODE>dcgettext</CODE> specifies the
3150locale by the third argument.<A NAME="DOCF2" HREF="gettext_foot.html#FOOT2">(2)</A> <A NAME="DOCF3" HREF="gettext_foot.html#FOOT3">(3)</A>
3151
3152</P>
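<P>
The file name pattern given above can be reproduced with a single
<CODE>snprintf</CODE> call. The helper <CODE>catalog_path</CODE> is
hypothetical and shows only the basic pattern; a real implementation may
try several variants of the locale name before giving up.
</P>

```c
#include <stdio.h>

/* Hypothetical helper: build the catalog file name from its parts.
   `category` is the full category name, e.g. "LC_MESSAGES".
   Returns 1 if the name fit into `buf`, 0 if it was truncated.  */
int
catalog_path (char *buf, size_t size, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  int n = snprintf (buf, size, "%s/%s/%s/%s.mo",
                    dir_name, locale, category, domain_name);
  return n >= 0 && (size_t) n < size;
}
```

<P>
With the default directory, the locale <CODE>de</CODE>, the category
<CODE>LC_MESSAGES</CODE> and a domain called <CODE>hello</CODE>, the result is
<TT>`/usr/local/share/locale/de/LC_MESSAGES/hello.mo'</TT>.
</P>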
3153
3154
3155<H3><A NAME="SEC44" HREF="gettext_toc.html#TOC44">Optimization of the *gettext functions</A></H3>
3156
3157<P>
3158At this point of the discussion we should talk about an advantage of the
3159GNU <CODE>gettext</CODE> implementation. Some readers might have pointed out
3160that an internationalized program might have a poor performance if some
3161string has to be translated in an inner loop. While this is unavoidable
3162when the string varies from one run of the loop to the other it is
3163simply a waste of time when the string is always the same. Take the
3164following example:
3165
3166</P>
3167
3168<PRE>
{
  while (...)
    {
      puts (gettext ("Hello world"));
    }
}
3175</PRE>
3176
3177<P>
3178When the locale selection does not change between two runs the resulting
3179string is always the same. One way to use this is:
3180
3181</P>
3182
3183<PRE>
{
  /* Translate once, outside the loop.  */
  const char *str = gettext ("Hello world");
  while (...)
    {
      puts (str);
    }
}
3191</PRE>
3192
3193<P>
But this solution is not usable in all situations (e.g. when the locale
selection changes), nor is it very readable.
3196
3197</P>
3198<P>
The GNU C compiler, version 2.7 and above, provides another solution for
this. To describe this we show here some lines of the
<TT>`intl/libgettext.h'</TT> file. For an explanation of the compound
statement used as an expression see section `Statements and Declarations in Expressions' in <CITE>The GNU CC Manual</CITE>.
3203
3204</P>
3205
3206<PRE>
# if defined __GNUC__ &#38;&#38; __GNUC__ == 2 &#38;&#38; __GNUC_MINOR__ &#62;= 7
# define dcgettext(domainname, msgid, category) \
  (__extension__ \
   ({ \
     char *result; \
     if (__builtin_constant_p (msgid)) \
       { \
         extern int _nl_msg_cat_cntr; \
         static char *__translation__; \
         static int __catalog_counter__; \
         if (! __translation__ \
             || __catalog_counter__ != _nl_msg_cat_cntr) \
           { \
             __translation__ = \
               dcgettext__ ((domainname), (msgid), (category)); \
             __catalog_counter__ = _nl_msg_cat_cntr; \
           } \
         result = __translation__; \
       } \
     else \
       result = dcgettext__ ((domainname), (msgid), (category)); \
     result; \
    }))
# endif
3231</PRE>
3232
3233<P>
The interesting thing here is the <CODE>__builtin_constant_p</CODE> predicate.
This is evaluated at compile time and so optimization can take place
immediately. Here two cases are distinguished. If the argument to
<CODE>gettext</CODE> is not a constant value, the function
<CODE>dcgettext__</CODE>, the real implementation of the
<CODE>dcgettext</CODE> function, is simply called.
3240
3241</P>
3242<P>
If the string argument <EM>is</EM> constant we can reuse the translation
obtained earlier as long as the locale selection has not changed. This is exactly
what is done here. The <CODE>_nl_msg_cat_cntr</CODE> variable is defined in
the file <TT>`loadmsgcat.c'</TT>, which is part of <TT>`libintl.a'</TT>, and is
changed whenever a new message catalog is loaded.
3248
3249</P>
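<P>
The caching idea does not depend on GNU C extensions and can be
illustrated in portable C. The following is a sketch only, not the code
of the library: <CODE>catalog_generation</CODE> stands in for
<CODE>_nl_msg_cat_cntr</CODE>, <CODE>slow_lookup</CODE> stands in for the real
lookup function and translates nothing, and all names are hypothetical.
</P>

```c
#include <stddef.h>

/* Stand-in for _nl_msg_cat_cntr: bumped whenever the set of loaded
   catalogs (or the selected language) changes.  */
int catalog_generation = 0;

/* Counts how often the expensive lookup really runs.  */
int lookup_count = 0;

/* Stand-in for the real lookup function; it translates nothing.  */
static const char *
slow_lookup (const char *msgid)
{
  ++lookup_count;
  return msgid;
}

/* Cached access to one constant message, as the macro does for each
   call site with a constant argument.  */
const char *
cached_hello (void)
{
  static const char *translation = NULL;
  static int seen_generation = 0;

  if (translation == NULL || seen_generation != catalog_generation)
    {
      translation = slow_lookup ("Hello world");
      seen_generation = catalog_generation;
    }
  return translation;
}
```

<P>
Repeated calls to <CODE>cached_hello</CODE> run the lookup only once; the
lookup is repeated only after <CODE>catalog_generation</CODE> has been
incremented.
</P>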
3250
3251
3252<H2><A NAME="SEC45" HREF="gettext_toc.html#TOC45">Comparing the Two Interfaces</A></H2>
3253
3254<P>
The following discussion is perhaps a little bit biased. As said
above we implemented GNU <CODE>gettext</CODE> following the Uniforum
proposal, and we surely had our reasons. But the discussion should show how we
came to this decision.
3259
3260</P>
3261<P>
First we take a look at the development process. When we write an
3263application using NLS provided by <CODE>gettext</CODE> we proceed as always.
3264Only when we come to a string which might be seen by the users and thus
3265has to be translated we use <CODE>gettext("...")</CODE> instead of
3266<CODE>"..."</CODE>. At the beginning of each source file (or in a central
3267header file) we define
3268
3269</P>
3270
3271<PRE>
3272#define gettext(String) (String)
3273</PRE>
3274
3275<P>
3276Even this definition can be avoided when the system supports the
3277<CODE>gettext</CODE> function in its C library. When we compile this code the
3278result is the same as if no NLS code is used. When you take a look at
3279the GNU <CODE>gettext</CODE> code you will see that we use <CODE>_("...")</CODE>
3280instead of <CODE>gettext("...")</CODE>. This reduces the number of
3281additional characters per translatable string to <EM>3</EM> (in words:
3282three).
3283
3284</P>
3285<P>
When a production version of the program is needed, we simply replace
the definition
3288
3289</P>
3290
3291<PRE>
3292#define _(String) (String)
3293</PRE>
3294
3295<P>
3296by
3297
3298</P>
3299
3300<PRE>
3301#include &#60;libintl.h&#62;
3302#define _(String) gettext (String)
3303</PRE>
3304
3305<P>
and include the header <TT>`libintl.h'</TT>. Additionally we run the
program <TT>`xgettext'</TT> on all source code files which contain
translatable strings and we are done. We have a running program which
does not depend on translations being available, but which can use any
that become available.
3311
3312</P>
3313<P>
The same procedure can be followed for the <CODE>gettext_noop</CODE> invocations
(see section <A HREF="gettext.html#SEC17">Special Cases of Translatable Strings</A>). First you can define <CODE>gettext_noop</CODE> as a
no-op macro and later use the definition from <TT>`libintl.h'</TT>. Because
this name is not used in Sun's implementation of <TT>`libintl.h'</TT>,
you should consider the following code for your project:
3319
3320</P>
3321
3322<PRE>
3323#ifdef gettext_noop
3324# define N_(Str) gettext_noop (Str)
3325#else
3326# define N_(Str) (Str)
3327#endif
3328</PRE>
3329
3330<P>
3331<CODE>N_</CODE> is a short form similar to <CODE>_</CODE>. The <TT>`Makefile'</TT> in
3332the <TT>`po/'</TT> directory of GNU gettext knows by default both of the
3333mentioned short forms so you are invited to follow this proposal for
3334your own ease.
3335
3336</P>
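<P>
How the two short forms work together might be sketched as follows. The
table and the function names are made up for the illustration. The usual
reason for <CODE>N_</CODE> is a string in a static initializer, where no
function call is allowed: the string is only marked there and is passed
to <CODE>gettext</CODE> later, at the point of use.
</P>

```c
#include <libintl.h>

#define _(String) gettext (String)

#ifdef gettext_noop
# define N_(Str) gettext_noop (Str)
#else
# define N_(Str) (Str)
#endif

/* Static initializers cannot call functions, so the strings are only
   marked with N_ here ...  */
static const char *const weekday_names[] =
{
  N_("Monday"), N_("Tuesday"), N_("Wednesday")
};

/* ... and translated with gettext at the point of use.  */
const char *
weekday_name (int index)
{
  return gettext (weekday_names[index]);
}

/* An ordinary string uses the short form _ directly.  */
const char *
greeting (void)
{
  return _("Hello world");
}
```

<P>
Without an installed catalog both functions return the original English
strings, so the program can be tested before any translation exists.
</P>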
3337<P>
Now to <CODE>catgets</CODE>. The main problem is the work for the
programmer. Every time he comes to a translatable string he has to
define a number (or a symbolic constant) which also has to be defined in
the message catalog file. He also has to take care of duplicate
entries, duplicate message IDs etc. If he wants to have the same
quality in the message catalog as the GNU <CODE>gettext</CODE> program
provides he also has to put the descriptive comments for the strings and
the location in all source code files in the message catalog. This is
nearly a Mission: Impossible.
3347
3348</P>
3349<P>
3350But there are also some points people might call advantages speaking for
3351<CODE>catgets</CODE>. If you have a single word in a string and this string
3352is used in different contexts it is likely that in one or the other
3353language the word has different translations. Example:
3354
3355</P>
3356
3357<PRE>
printf ("%s: %d", gettext ("number"), number_of_errors);

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));
3362</PRE>
3363
3364<P>
Here we have to translate the string <CODE>"number"</CODE> twice. Even
if you do not speak a language besides English it might be possible to
recognize that the two words have different meanings. In German the
first appearance has to be translated to <CODE>"Anzahl"</CODE> and the second
to <CODE>"Zahl"</CODE>.
3370
3371</P>
3372<P>
Now you can say that this example is really esoteric. And you are
right! This is exactly how we felt about this problem and decided that
it does not weigh that much. The solution for the above problem could
be very easy:
3377
3378</P>
3379
3380<PRE>
printf (gettext ("number: %d"), number_of_errors);

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count);
3386</PRE>
3387
3388<P>
We believe that we can solve all conflicts with this method. If it is
difficult one can also consider changing one of the conflicting strings a
little bit. But it is not impossible to overcome.
3392
3393</P>
3394<P>
Translator note: It is perhaps appropriate here to tell English-speaking
programmers that the plural form of a noun cannot always be formed by
appending a single `s'. Most other languages use different methods. So
you should at least use the method given in the above example.
3399
3400</P>
3401<P>
3402But I have been told that some languages have even more complex rules.
3403A good approach might be to consider methods like the one used for
3404<CODE>LC_TIME</CODE> in the POSIX.2 standard.
3405
3406</P>
3407
3408
3409
3410<H2><A NAME="SEC46" HREF="gettext_toc.html#TOC46">Using libintl.a in own programs</A></H2>
3411
3412<P>
Starting with version 0.9.4 the library <CODE>libintl.a</CODE> should be more
or less self-contained, i.e. you can use it in your own programs. The
<TT>`Makefile'</TT> will put the header and the library in directories
selected using the <CODE>$(prefix)</CODE>.
3417
3418</P>
3419<P>
One exception to the above is found on HP-UX systems. Here the C library
does not contain the <CODE>alloca</CODE> function (and the HP compiler does
not generate it inline). But it is not intended to rewrite the whole
library just because of this dumb system. Instead include the
<CODE>alloca</CODE> function in all packages in which you use <CODE>libintl.a</CODE>.
3425
3426</P>
3427
3428
3429
3430<H2><A NAME="SEC47" HREF="gettext_toc.html#TOC47">Being a <CODE>gettext</CODE> grok</A></H2>
3431
3432<P>
To fully exploit the functionality of the GNU <CODE>gettext</CODE> library it
is surely helpful to read the source code. But for those who don't want
to spend that much time reading the (sometimes complicated) code, here
is a list of comments:
3437
3438</P>
3439
3440<UL>
3441<LI>Changing the language at runtime
3442
For interactive programs it might be useful to offer a selection of the
language used at runtime. To understand how to do this one needs to know
how the language used is determined while executing the <CODE>gettext</CODE>
function. The method which is presented here only works correctly
with the GNU implementation of the <CODE>gettext</CODE> functions. It is not
possible with underlying <CODE>catgets</CODE> functions or <CODE>gettext</CODE>
functions from the system's C library. The exception is of course the
GNU C Library which uses the GNU gettext Library for message handling.
3451
3452In the function <CODE>dcgettext</CODE> at every call the current setting of
3453the highest priority environment variable is determined and used.
3454Highest priority means here the following list with decreasing
3455priority:
3456
3457
3458<OL>
3459<LI><CODE>LANGUAGE</CODE>
3460
3461<LI><CODE>LC_ALL</CODE>
3462
3463<LI><CODE>LC_xxx</CODE>, according to selected locale
3464
3465<LI><CODE>LANG</CODE>
3466
3467</OL>
3468
3469Afterwards the path is constructed using the found value and the
3470translation file is loaded if available.
3471
What happens now when the value of, say, <CODE>LANGUAGE</CODE> changes? According
to the process explained above the new value of this variable is found
as soon as the <CODE>dcgettext</CODE> function is called. But this also means
that the (perhaps) different message catalog file is loaded. In other
words: the language used is changed.
3477
But there is one little catch. The code for gcc-2.7.0 and up provides
some optimization. This optimization normally prevents the calling of
the <CODE>dcgettext</CODE> function as long as no new catalog is loaded. But
if <CODE>dcgettext</CODE> is not called the program also cannot find out that the
<CODE>LANGUAGE</CODE> variable has changed (see section <A HREF="gettext.html#SEC44">Optimization of the *gettext functions</A>). But the
solution is very easy. Include the following code in the language
switching function.
3485
3486
3487<PRE>
  /* Change language.  */
  setenv ("LANGUAGE", "fr", 1);

  /* Make change known.  */
  {
    extern int _nl_msg_cat_cntr;
    ++_nl_msg_cat_cntr;
  }
3496</PRE>
3497
3498The variable <CODE>_nl_msg_cat_cntr</CODE> is defined in <TT>`loadmsgcat.c'</TT>.
3499
3500</UL>
3501
3502
3503
3504<H2><A NAME="SEC48" HREF="gettext_toc.html#TOC48">Temporary Notes for the Programmers Chapter</A></H2>
3505
3506
3507
3508<H3><A NAME="SEC49" HREF="gettext_toc.html#TOC49">Temporary - Two Possible Implementations</A></H3>
3509
3510<P>
3511There are two competing methods for language independent messages:
3512the X/Open <CODE>catgets</CODE> method, and the Uniforum <CODE>gettext</CODE>
3513method. The <CODE>catgets</CODE> method indexes messages by integers; the
<CODE>gettext</CODE> method indexes them by their original English strings.
3515The <CODE>catgets</CODE> method has been around longer and is supported
3516by more vendors. The <CODE>gettext</CODE> method is supported by Sun,
3517and it has been heard that the COSE multi-vendor initiative is
3518supporting it. Neither method is a POSIX standard; the POSIX.1
3519committee had a lot of disagreement in this area.
3520
3521</P>
3522<P>
3523Neither one is in the POSIX standard. There was much disagreement
3524in the POSIX.1 committee about using the <CODE>gettext</CODE> routines
3525vs. <CODE>catgets</CODE> (XPG). In the end the committee couldn't
3526agree on anything, so no messaging system was included as part
3527of the standard. I believe the informative annex of the standard
3528includes the XPG3 messaging interfaces, "...as an example of
3529a messaging system that has been implemented..."
3530
3531</P>
3532<P>
3533They were very careful not to say anywhere that you should use one
3534set of interfaces over the other. For more on this topic please
3535see the Programming for Internationalization FAQ.
3536
3537</P>
3538
3539
3540<H3><A NAME="SEC50" HREF="gettext_toc.html#TOC50">Temporary - About <CODE>catgets</CODE></A></H3>
3541
3542<P>
3543There have been a few discussions of late on the use of
3544<CODE>catgets</CODE> as a base. I think it important to present both
3545sides of the argument and hence am opting to play devil's advocate
3546for a little bit.
3547
3548</P>
3549<P>
3550I'll not deny the fact that <CODE>catgets</CODE> could have been designed
3551a lot better. It currently has quite a number of limitations and
3552these have already been pointed out.
3553
3554</P>
3555<P>
3556However there is a great deal to be said for consistency and
3557standardization. A common recurring problem when writing Unix
3558software is the myriad portability problems across Unix platforms.
3559It seems as if every Unix vendor had a look at the operating system
and found parts they could improve upon. Undoubtedly, these
modifications are innovative and solve real problems.
3562However, software developers have a hard time keeping up with all
3563these changes across so many platforms.
3564
3565</P>
3566<P>
3567And this has prompted the Unix vendors to begin to standardize their
3568systems. Hence the impetus for Spec1170. Every major Unix vendor
3569has committed to supporting this standard and every Unix software
developer awaits with glee the day they can write software to this
3571standard and simply recompile (without having to use autoconf)
3572across different platforms.
3573
3574</P>
3575<P>
3576As I understand it, Spec1170 is roughly based upon version 4 of the
3577X/Open Portability Guidelines (XPG4). Because <CODE>catgets</CODE> and
3578friends are defined in XPG4, I'm led to believe that <CODE>catgets</CODE>
3579is a part of Spec1170 and hence will become a standardized component
3580of all Unix systems.
3581
3582</P>
3583
3584
3585<H3><A NAME="SEC51" HREF="gettext_toc.html#TOC51">Temporary - Why a single implementation</A></H3>
3586
3587<P>
3588Now it seems kind of wasteful to me to have two different systems
3589installed for accessing message catalogs. If we do want to remedy
3590<CODE>catgets</CODE> deficiencies why don't we try to expand <CODE>catgets</CODE>
3591(in a compatible manner) rather than implement an entirely new system.
3592Otherwise, we'll end up with two message catalog access systems
3593installed with an operating system - one set of routines for GNU
3594software, and another set of routines (catgets) for all other software.
3595Bloated?
3596
3597</P>
3598<P>
3599Supposing another catalog access system is implemented. Which do
3600we recommend? At least for Linux, we need to attract as many
3601software developers as possible. Hence we need to make it as easy
3602for them to port their software as possible. Which means supporting
3603<CODE>catgets</CODE>. We will be implementing the <CODE>glocale</CODE> code
3604within our <CODE>libc</CODE>, but does this mean we also have to incorporate
3605another message catalog access scheme within our <CODE>libc</CODE> as well?
3606And what about people who are going to be using the <CODE>glocale</CODE>
3607+ non-<CODE>catgets</CODE> routines. When they port their software to
3608other platforms, they're now going to have to include the front-end
3609(<CODE>glocale</CODE>) code plus the back-end code (the non-<CODE>catgets</CODE>
3610access routines) with their software instead of just including the
3611<CODE>glocale</CODE> code with their software.
3612
3613</P>
3614<P>
3615Message catalog support is however only the tip of the iceberg.
3616What about the data for the other locale categories. They also have
3617a number of deficiencies. Are we going to abandon them as well and
3618develop another duplicate set of routines (should <CODE>glocale</CODE>
3619expand beyond message catalog support)?
3620
3621</P>
3622<P>
3623Like many parts of Unix that can be improved upon, we're stuck with balancing
3624compatibility with the past with useful improvements and innovations for
3625the future.
3626
3627</P>
3628
3629
3630<H3><A NAME="SEC52" HREF="gettext_toc.html#TOC52">Temporary - Double layer solution</A></H3>
3631
3632<P>
3633GNU locale implements a <CODE>gettext</CODE>-style interface on top of a
3634<CODE>catgets</CODE>-style interface.
3635
3636</P>
3637<P>
3638This is not needless complexity. It is absolutely vital, because
3639it enables <CODE>gettext</CODE> to run on top of <CODE>catgets</CODE>, which
3640enables Linux International to recommend users use it <EM>today</EM>.
3641
3642</P>
3643<P>
3644Rewriting <CODE>gettext</CODE> so that it could use <EM>either</EM>
3645<CODE>catgets</CODE> <EM>or</EM> some simpler mechanism would not break
3646anything, but would not reduce complexity either. It might be
3647worth doing, but it isn't urgent.
3648
3649</P>
3650<P>
3651In general, simplicity is not enough of a reason to rewrite a
3652program that works. Simplicity is just one desirable thing.
3653It is not overridingly important.
3654
3655</P>
3656
3657
3658<H3><A NAME="SEC53" HREF="gettext_toc.html#TOC53">Temporary - Notes</A></H3>
3659
3660<P>
3661X/Open agreed very late on the standard form so that many
implementations differ from the final form. Both of my systems (old
3663Linux catgets and Ultrix-4) have a strange variation.
3664
3665</P>
3666<P>
OK. After incorporating the last changes I have to spend some time on
writing the GNU/Linux libc gettext functions. So in future Solaris will
not be the only system having gettext.
3670
3671</P>
3672
3673
3674<H1><A NAME="SEC54" HREF="gettext_toc.html#TOC54">The Translator's View</A></H1>
3675
3676
3677
3678<H2><A NAME="SEC55" HREF="gettext_toc.html#TOC55">Introduction 0</A></H2>
3679
3680<P>
3681GNU is going international! The GNU Translation Project is a way
3682to get maintainers, translators and users all together, so GNU will
3683gradually become able to speak many native languages.
3684
3685</P>
3686<P>
3687The GNU <CODE>gettext</CODE> tool set contains <EM>everything</EM> maintainers
3688need for internationalizing their packages for messages. It also
3689contains quite useful tools for helping translators at localizing
3690messages to their native language, once a package has already been
3691internationalized.
3692
3693</P>
3694<P>
3695To achieve the GNU Translation Project, we need many interested
3696people who like their own language and write it well, and who are also
3697able to synergize with other translators speaking the same language.
3698If you'd like to volunteer to <EM>work</EM> at translating messages,
3699please send mail to your translating team.
3700
3701</P>
3702<P>
3703Each team has its own mailing list, courtesy of Linux
3704International. You may reach your translating team at the address
3705<TT>`<VAR>ll</VAR>@li.org'</TT>, replacing <VAR>ll</VAR> by the two-letter ISO 639
3706code for your language. Language codes are <EM>not</EM> the same as
3707country codes given in ISO 3166. The following translating teams
3708exist:
3709
3710</P>
3711
3712<BLOCKQUOTE>
3713<P>
3714Chinese <CODE>zh</CODE>, Czech <CODE>cs</CODE>, Danish <CODE>da</CODE>, Dutch <CODE>nl</CODE>,
3715Esperanto <CODE>eo</CODE>, Finnish <CODE>fi</CODE>, French <CODE>fr</CODE>, Irish
3716<CODE>ga</CODE>, German <CODE>de</CODE>, Greek <CODE>el</CODE>, Italian <CODE>it</CODE>,
3717Japanese <CODE>ja</CODE>, Indonesian <CODE>in</CODE>, Norwegian <CODE>no</CODE>, Polish
3718<CODE>pl</CODE>, Portuguese <CODE>pt</CODE>, Russian <CODE>ru</CODE>, Spanish <CODE>es</CODE>,
3719Swedish <CODE>sv</CODE> and Turkish <CODE>tr</CODE>.
3720</BLOCKQUOTE>
3721
3722<P>
3723For example, you may reach the Chinese translating team by writing to
3724<TT>`zh@li.org'</TT>. When you become a member of the translating team
3725for your own language, you may subscribe to its list. For example,
3726Swedish people can send a message to <TT>`sv-request@li.org'</TT>,
3727having this message body:
3728
3729</P>
3730
3731<PRE>
3732subscribe
3733</PRE>
3734
3735<P>
3736Keep in mind that team members should be interested in <EM>working</EM>
3737at translations, or at solving translational difficulties, rather than
3738merely lurking around. If your team does not exist yet and you want to
3739start one, please write to <TT>`gnu-translation@prep.ai.mit.edu'</TT>;
3740you will then reach the GNU coordinator for all translator teams.
3741
3742</P>
3743<P>
3744A handful of GNU packages have already been adapted and provided
3745with message translations for several languages. Translation
3746teams have begun to organize, using these packages as a starting
3747point. But there are many more packages and many languages for
3748which we have no volunteer translators. If you would like to
3749volunteer to work at translating messages, please send mail to
3750<TT>`gnu-translation@prep.ai.mit.edu'</TT> indicating what language(s)
3751you can work on.
3752
3753</P>
3754
3755
3756<H2><A NAME="SEC56" HREF="gettext_toc.html#TOC56">Introduction 1</A></H2>
3757
3758<P>
3759This is now official, GNU is going international! Here is the
3760announcement submitted for the January 1995 GNU Bulletin:
3761
3762</P>
3763
3764<BLOCKQUOTE>
3765<P>
3766A handful of GNU packages have already been adapted and provided
3767with message translations for several languages. Translation
3768teams have begun to organize, using these packages as a starting
3769point. But there are many more packages and many languages
3770for which we have no volunteer translators. If you'd like to
3771volunteer to work at translating messages, please send mail to
3772<SAMP>`gnu-translation@prep.ai.mit.edu'</SAMP> indicating what language(s)
3773you can work on.
3774</BLOCKQUOTE>
3775
3776<P>
3777This document should answer many questions for those who are curious
3778about the process or would like to contribute. Please at least skim
3779over it, hoping to cut down a little of the high volume of email
3780generated by this collective effort towards GNU internationalization.
3781
3782</P>
3783<P>
3784GNU programming is done in English, and currently, English is used
3785as the main communicating language between national communities
collaborating on the GNU project. This very document is written
3787in English. This will not change in the foreseeable future.
3788
3789</P>
3790<P>
However, there is a strong appetite from national communities for
having more software able to write using national languages and habits,
and there is an ongoing effort to modify GNU software in such a way
that it becomes able to do so. The experiments conducted so far drew
an enthusiastic response from pretesters, so we believe that GNU
internationalization is destined to succeed.
3797
3798</P>
3799<P>
To suggest clarifications, additions or corrections to this
document, please send email to <TT>`gnu-translation@prep.ai.mit.edu'</TT>.
3802
3803</P>
3804
3805
3806<H2><A NAME="SEC57" HREF="gettext_toc.html#TOC57">Discussions</A></H2>
3807
3808<P>
3809Facing this internationalization effort, a few users expressed their
concerns. Some of these doubts are presented and discussed here.
3811
3812</P>
3813
3814<UL>
3815<LI>Smaller groups
3816
Some languages are not spoken by a very large number of people,
so people speaking them sometimes consider that there may not be
all that much demand for such versions of GNU packages. Moreover, many
people who are <EM>into computers</EM>, in some countries, generally seem
to prefer English versions of their software.

On the other hand, people might enjoy their own language a lot, and
be very motivated to give themselves the pleasure of having
their beloved GNU software speak their mother tongue. They do
themselves a personal favor, and do not pay that much attention to
the number of people benefiting from their work.
3828
3829<LI>Misinterpretation
3830
3831Other users are shy to push forward their own language, seeing in this
3832some kind of misplaced propaganda. Someone thought there must be some
3833users of the language over the networks pestering other people with it.
3834
3835But any spoken language is worth localization, because there are
3836people behind the language for whom the language is important and
3837dear to their hearts.
3838
3839<LI>Odd translations
3840
3841The biggest problem is to find the right translations so that
3842everybody can understand the messages. Translations are usually a
3843little odd. Some people get used to English, to the extent they may
3844find translations into their own language "rather pushy, obnoxious
and sometimes even hilarious." As a French-speaking man, I have
experience of those instruction manuals for goods, so poorly
translated into French in Korea or Taiwan...
3848
3849The fact is that we sometimes have to create a kind of national
3850computer culture, and this is not easy without the collaboration of
3851many people liking their mother tongue. This is why translations are
3852better achieved by people knowing and loving their own language, and
3853ready to work together at improving the results they obtain.
3854
3855<LI>Dependencies over the GPL
3856
3857Some people wonder if using GNU <CODE>gettext</CODE> necessarily brings their package
3858under the protective wing of the GNU General Public License, when they
3859do not want to make their program free, or want other kinds of freedom.
3860The simplest answer is yes.
3861
3862The mere marking of localizable strings in a package, or conditional
3863inclusion of a few lines for initialization, is not really including
3864GPL'ed code. However, the localization routines themselves are under
3865the GPL and would bring the remainder of the package under the GPL
if they were distributed with it. So, I presume that, for those
for whom this is a problem, it could be circumvented by leaving to
the end installers the burden of assembling a package prepared for
localization, but not providing the localization routines themselves.
3870
3871</UL>
3872
3873
3874
3875<H2><A NAME="SEC58" HREF="gettext_toc.html#TOC58">Organization</A></H2>
3876
3877<P>
On a larger scale, the true solution would be to organize some kind of
fairly precise setup in which volunteers could participate. I gave
some thought to this idea lately, and realized there will be some
touchy points. I thought of writing to Richard Stallman to launch
such a project, but feel it might be good to shake out the ideas
between ourselves first. Most probably Linux International has
some experience in the field already, or would like to orchestrate
the volunteer work, maybe. Food for thought, in any case!
3886
3887</P>
3888<P>
I guess we have to set up something early, somehow, that will help
many possible contributors of the same language to connect and avoid
duplicating work, and further be put in contact to solve together
problems particular to their tongue (in most languages, there are many
difficulties peculiar to translating technical English). My Swedish
contributor acknowledged these difficulties, and I'm well aware of
them for French.
3896
3897</P>
3898<P>
This is surely not a technical issue, but we should manage things so that
the effort of locale contributors is maximally useful, despite the national
team layer interfacing between contributors and maintainers.
3902
3903</P>
3904<P>
3905GNU needs some setup for coordinating language coordinators.
3906Localizing evolving GNU programs will surely become a permanent
3907and continuous activity in GNU, once started. The setup should be
3908minimally completed and tested before GNU <CODE>gettext</CODE> becomes an official
3909reality. The email address <TT>`gnu-translation@prep.ai.mit.edu'</TT>
has been set up for receiving offers from volunteers and general
3911email on these topics. This address reaches the GNU Translation
3912Project coordinator.
3913
3914</P>
3915
3916
3917
3918<H3><A NAME="SEC59" HREF="gettext_toc.html#TOC59">Central Coordination</A></H3>
3919
3920<P>
I also think GNU will need, sooner than it thinks, someone to set up
a way to organize and coordinate these groups. Some kind of group
of groups. My opinion is that it would be good for GNU to delegate
this task to a small group of collaborating volunteers, shortly.
Perhaps a list of these national committees could be published
in <TT>`gnu.announce'</TT>.
3927
3928</P>
3929<P>
My role as coordinator would simply be to refer to Ulrich any
German-speaking volunteer interested in the localization of GNU programs,
and maybe to help national groups to organize initially, while maintaining
national registries until national groups are ready to take over.
In fact, the coordinator should make it easy for volunteers to get in
contact with one another for creating national teams, which should then
select one coordinator per language, or country (regionalized language).
If well done, the coordination should be useful without being an
overwhelming task, for the time it takes to put delegations in place.
3939
3940</P>
3941
3942
3943<H3><A NAME="SEC60" HREF="gettext_toc.html#TOC60">National Teams</A></H3>
3944
3945<P>
3946I suggest we look for volunteer coordinators/editors for individual
3947languages. These people will scan contributions of translation files
3948for various programs, for their own languages, and will ensure high
3949and uniform standards of diction.
3950
3951</P>
3952<P>
From my experience with other people these days, those who
3954provide localizations are very enthusiastic about the process, and are
3955more interested in the localization process than in the program they
3956localize, and want to do many programs, not just one. This seems
3957to confirm that having a coordinator/editor for each language is a
3958good idea.
3959
3960</P>
3961<P>
3962We need to choose someone who is good at writing clear and concise
3963prose in the language in question. That is hard--we can't check
3964it ourselves. So we need to ask a few people to judge each others'
3965writing and select the one who is best.
3966
3967</P>
3968<P>
I announced my prerelease to a few dozen people, and you would not
believe all the discussions it has generated already. I shudder to think
what will happen when this is launched for real, officially,
worldwide. Who am I to arbitrate between two Czechoslovak users
contradicting each other, for example?
3974
3975</P>
3976<P>
I assume that your German is not much better than my French so that
I would not be able to judge these formulations. What I would
suggest is that for each language there be a group of people who
maintain the PO files and judge changes. I suspect there will
be cultural differences between how such groups of people will behave.
Some will have relaxed ways, reach consensus easily, and have anyone
of the group relate to the maintainers, while others will fight to
the death, organize heavy administrations up to national standards, and
use strict channels.
3986
3987</P>
3988<P>
The German team is setting a good example. Right now, they are
maybe half a dozen people revising each other's translations and
discussing the linguistic issues. I do not even have all the names.
3992Ulrich Drepper is taking care of coordinating the German team.
3993He subscribed to all my pretest lists, so I do not even have to warn
3994him specifically of incoming releases.
3995
3996</P>
3997<P>
I'm sure that it is a good idea to get teams for each language working
on translations. That will make the translations better and more
consistent.
4001
4002</P>
4003
4004
4005
4006<H4><A NAME="SEC61" HREF="gettext_toc.html#TOC61">Sub-Cultures</A></H4>
4007
4008<P>
4009Taking French for example, there are a few sub-cultures around
4010computers which developed diverging vocabularies. Picking volunteers
4011here and there without addressing this problem in an organized way,
early in the project, might produce a distasteful mix of GNU programs,
4013and possibly trigger endless quarrels among those who really care.
4014
4015</P>
4016<P>
Keeping some kind of unity in the way French localization of GNU
programs is achieved is a difficult (and delicate) job. Knowing the
Latin character of French people (:-), if we take this the wrong
way, we could end up nowhere, or waste a lot of energy. Maybe we
should begin to address this problem seriously <EM>before</EM> GNU
<CODE>gettext</CODE> becomes officially published. And I suspect that this
means soon!
4024
4025</P>
4026
4027
4028<H4><A NAME="SEC62" HREF="gettext_toc.html#TOC62">Organizational Ideas</A></H4>
4029
4030<P>
4031I expect the next big changes after the official release. Please note
4032that I use the German translation of the short GPL message. We need
to set a few good examples before the localization goes out for real
in GNU. Here are a few points to discuss:
4035
4036</P>
4037
4038<UL>
4039<LI>
4040
4041Each group should have one FTP server (at least one master).
4042
4043<LI>
4044
The files on the server should reflect the latest version (of
course!) and it should also contain an RCS directory with the
corresponding archives (I don't have this now).
4048
4049<LI>
4050
There should also be a ChangeLog file (this is more useful than the
RCS archive but can be generated automatically from the latter by
Emacs).
4054
4055<LI>
4056
A <STRONG>core group</STRONG> should judge questionable changes (for now
this group consists solely of me but I ask some others occasionally;
this also seems to work).
4060
4061</UL>
4062
4063
4064
4065<H3><A NAME="SEC63" HREF="gettext_toc.html#TOC63">Mailing Lists</A></H3>
4066
4067<P>
4068If we get any inquiries about GNU <CODE>gettext</CODE>, send them on to:
4069
4070</P>
4071
<PRE>
<TT>`gnu-translation@prep.ai.mit.edu'</TT>
</PRE>
4075
4076<P>
The <TT>`*-pretest'</TT> lists are quite useful to me; maybe the idea could
be generalized to all GNU packages. But to each maintainer his/her own way!
4079
4080</P>
4081<P>
We have a mechanism in place here at
4083<TT>`gnu.ai.mit.edu'</TT> to track teams, support mailing lists for
4084them and log members. We have a slight preference that you use it.
4085If this is OK with you, I can get you clued in.
4086
4087</P>
4088<P>
Things are changing! A few years ago, when Daniel Fekete and I
asked for a mailing list for GNU localization, hosted at the FSF, we
were politely invited to organize it anywhere else, and so we did.
For communicating with my pretesters, I later made a handful of
mailing lists located at iro.umontreal.ca and administered by
<CODE>majordomo</CODE>. These lists have been <EM>very</EM> dependable
so far...
4096
4097</P>
4098<P>
4099I suspect that the German team will organize itself a mailing list
4100located in Germany, and so forth for other countries. But before they
4101organize for true, it could surely be useful to offer mailing lists
located at the FSF to each national team. So yes, please explain to me
how I should proceed to create and handle them.
4104
4105</P>
4106<P>
We should create temporary mailing lists, one per country, to help
people organize. Temporary, because once regrouped and structured, it
would be fair for the volunteers from each country to bring <EM>their</EM>
list back there and manage it as they want. My feeling is that, in the long
4111run, each team should run its own list, from within their country.
4112There also should be some central list to which all teams could
4113subscribe as they see fit, as long as each team is represented in it.
4114
4115</P>
4116
4117
4118<H2><A NAME="SEC64" HREF="gettext_toc.html#TOC64">Information Flow</A></H2>
4119
4120<P>
There will surely be some discussion about these messages after the
packages are finally released. If people now send you some proposals
for better messages, how do you proceed? Jim, please note that
right now, as I put forward nearly a dozen localizable programs, I
receive both the translations and the coordination concerns about them.
4126
4127</P>
4128<P>
4129If I put one of my things to pretest, Ulrich receives the announcement
4130and passes it on to the German team, who make last minute revisions.
4131Then he submits the translation files to me <EM>as the maintainer</EM>.
4132For GNU packages I do not maintain, I would not even hear about it.
4133This scheme could be made to work GNU-wide, I think. For security
reasons, maybe Ulrich (national coordinators, in fact) should update
a central registry kept by GNU (Jim, me, or Len's recruits) once in
4136a while.
4137
4138</P>
4139<P>
4140In December/January, I was aggressively ready to internationalize
4141all of GNU, giving myself the duty of one small GNU package per week
4142or so, taking many weeks or months for bigger packages. But it does
4143not work this way. I first did all the things I'm responsible for.
4144I've nothing against some missionary work on other maintainers, but
I'm also losing a lot of energy over it--same debates over again.
4146
4147</P>
4148<P>
4149And when the first localized packages are released we'll get a lot of
4150responses about ugly translations :-). Surely, and we need to have
4151beforehand a fairly good idea about how to handle the information
4152flow between the national teams and the package maintainers.
4153
4154</P>
4155<P>
Please start saving somewhere a quick history of each PO file. I know
for sure that the file format will change, allowing for comments.
It would be nice if each file had a kind of log, and references for
those who want to submit comments or gripes, or otherwise contribute.
I sent a proposal for a fast and flexible format, but it has not
yet been accepted by the GNU deciders. I'll tell you when I
have more information about this.
4163
4164</P>
4165
4166
4167<H1><A NAME="SEC65" HREF="gettext_toc.html#TOC65">The Maintainer's View</A></H1>
4168
4169<P>
4170The maintainer of a package has many responsibilities. One of them
4171is ensuring that the package will install easily on many platforms,
4172and that the magic we described earlier (see section <A HREF="gettext.html#SEC32">The User's View</A>) will work
4173for installers and end users.
4174
4175</P>
4176<P>
4177Of course, there are many possible ways by which GNU <CODE>gettext</CODE>
4178might be integrated in a distribution, and this chapter does not cover
them in all generality. Instead, it details one possible approach
which is especially suitable for many GNU distributions, because
GNU <CODE>gettext</CODE> is intended to help the internationalization
of the whole GNU project. So, the maintainer's view presented here
4183presumes that the package already has a <TT>`configure.in'</TT> file and
4184uses Autoconf.
4185
4186</P>
4187<P>
GNU <CODE>gettext</CODE> may surely be useful for non-GNU
packages too, but the maintainers of such packages might have to show
imagination and initiative in organizing their distributions so that
<CODE>gettext</CODE> works for them in all situations. There are surely
many, out there.
4193
4194</P>
4195<P>
4196Even if <CODE>gettext</CODE> methods are now stabilizing, slight adjustments
4197might be needed between successive <CODE>gettext</CODE> versions, so you
should ideally reread this chapter in subsequent releases, looking
4199for changes.
4200
4201</P>
4202
4203
4204
4205<H2><A NAME="SEC66" HREF="gettext_toc.html#TOC66">Flat or Non-Flat Directory Structures</A></H2>
4206
4207<P>
Some GNU packages are distributed as <CODE>tar</CODE> files which unpack
in a single directory; these are said to be <STRONG>flat</STRONG> distributions.
4210Other GNU packages have a one level hierarchy of subdirectories, using
4211for example a subdirectory named <TT>`doc/'</TT> for the Texinfo manual and
4212man pages, another called <TT>`lib/'</TT> for holding functions meant to
4213replace or complement C libraries, and a subdirectory <TT>`src/'</TT> for
4214holding the proper sources for the package. These other distributions
4215are said to be <STRONG>non-flat</STRONG>.
4216
4217</P>
4218<P>
4219For now, we cannot say much about flat distributions. A flat
4220directory structure has the disadvantage of increasing the difficulty
4221of updating to a new version of GNU <CODE>gettext</CODE>. Also, if you have
4222many PO files, this could somewhat pollute your single directory.
4223In the GNU <CODE>gettext</CODE> distribution, the <TT>`misc/'</TT> directory
4224contains a shell script named <TT>`combine-sh'</TT>. That script may
4225be used for combining all the C files of the <TT>`intl/'</TT> directory
4226into a pair of C files (one <TT>`.c'</TT> and one <TT>`.h'</TT>). Those two
4227generated files would fit more easily in a flat directory structure,
4228and you will then have to add these two files to your project.
4229
4230</P>
4231<P>
Maybe because GNU <CODE>gettext</CODE> itself has a non-flat structure,
we have more experience with this approach, and this is what will be
described in the remainder of this chapter. Some maintainers might
use this as an opportunity to unflatten their package structure.
Only later, once we have gained more experience adapting GNU <CODE>gettext</CODE>
to flat distributions, might we add some notes about how to proceed
in flat situations.
4239
4240</P>
4241
4242
4243<H2><A NAME="SEC67" HREF="gettext_toc.html#TOC67">Prerequisite Works</A></H2>
4244
4245<P>
Some work is required before using GNU <CODE>gettext</CODE>
in one of your packages. This work is general enough
that it escapes the point by point descriptions used in the remainder
of this chapter. So, we describe it here.
4250
4251</P>
4252
4253<UL>
4254<LI>
4255
Before attempting to use <CODE>gettextize</CODE> you should install some
other packages first.
Ensure that recent versions of GNU <CODE>m4</CODE>, GNU Autoconf and GNU
<CODE>gettext</CODE> are already installed at your site, and if not, proceed
to do this first. If you have to install these things, beware that
GNU <CODE>m4</CODE> must be fully installed before GNU Autoconf is even
<EM>configured</EM>.

Those three packages are needed only by you, as a maintainer; the
installers of your own package and end users do not really need any
of GNU <CODE>m4</CODE>, GNU Autoconf or GNU <CODE>gettext</CODE> for successfully
installing and running your package, with messages properly translated.
But this is not completely true if you provide internationalized
shell scripts within your own package: GNU <CODE>gettext</CODE> must
then be installed at the user site if the end users want to see the
translation of shell script messages.
4271
4272<LI>
4273
Your package should use Autoconf and have a <TT>`configure.in'</TT> file.
If it does not, you have to learn how. The Autoconf documentation
is quite well written; it is a good idea to print it and get
familiar with it.
4278
4279<LI>
4280
4281Your C sources should have already been modified according to
4282instructions given earlier in this manual. See section <A HREF="gettext.html#SEC13">Preparing Program Sources</A>.
4283
4284<LI>
4285
4286Your <TT>`po/'</TT> directory should receive all PO files submitted to you
4287by the translator teams, each having <TT>`<VAR>ll</VAR>.po'</TT> as a name.
It is usually not easy to get translation
work done before your package gets internationalized and available!
Since the cycle has to start somewhere, the easiest course for the maintainer
4291is to start with absolutely no PO files, and wait until various
4292translator teams get interested in your package, and submit PO files.
4293
4294</UL>
4295
4296<P>
It is worth adding here a few words about how the maintainer should
ideally behave with PO file submissions. As a maintainer, your
role is to authenticate the origin of the submission as being the
representative of the appropriate GNU translating team (forward the
submission to <TT>`gnu-translation@prep.ai.mit.edu'</TT> in case of
doubt), to ensure that the PO file format is not severely broken and
does not prevent successful installation, and for the rest, merely
to put these PO files in <TT>`po/'</TT> for distribution.
4305
4306</P>
4307<P>
4308As a maintainer, you do not have to take on your shoulders the
4309responsibility of checking if the translations are adequate or
4310complete, and should avoid diving into linguistic matters. Translation
teams drive themselves and are fully responsible for their linguistic
choices for GNU. Keep in mind that translator teams are <EM>not</EM>
driven by maintainers. You can help by carefully redirecting all
communications and reports from users about linguistic matters to the
appropriate translation team, or by explaining to users how to reach or join
their team. The simplest might be to send them the <TT>`NLS'</TT> file.
4317
4318</P>
4319<P>
Maintainers should <EM>never ever</EM> apply PO file bug reports
themselves, short-cutting translation teams. If some translator has
difficulty getting some of her points through her team, it should not be
an option for her to directly negotiate translations with maintainers.
Teams ought to settle their problems themselves, if any. If you, as
a maintainer, ever think there is a real problem with a team, please
never try to <EM>solve</EM> a team's problem on your own.
4327
4328</P>
4329
4330
4331<H2><A NAME="SEC68" HREF="gettext_toc.html#TOC68">Invoking the <CODE>gettextize</CODE> Program</A></H2>
4332
4333<P>
4334Some files are consistently and identically needed in every package
4335internationalized through GNU <CODE>gettext</CODE>. As a matter of
4336convenience, the <CODE>gettextize</CODE> program puts all these files right
4337in your package. This program has the following synopsis:
4338
4339</P>
4340
<PRE>
gettextize [ <VAR>option</VAR>... ] [ <VAR>directory</VAR> ]
</PRE>
4344
4345<P>
4346and accepts the following options:
4347
4348</P>
4349<DL COMPACT>
4350
4351<DT><SAMP>`-f'</SAMP>
4352<DD>
4353<DT><SAMP>`--force'</SAMP>
4354<DD>
4355Force replacement of files which already exist.
4356
4357<DT><SAMP>`-h'</SAMP>
4358<DD>
4359<DT><SAMP>`--help'</SAMP>
4360<DD>
4361Display this help and exit.
4362
4363<DT><SAMP>`--version'</SAMP>
4364<DD>
4365Output version information and exit.
4366
4367</DL>
4368
4369<P>
4370If <VAR>directory</VAR> is given, this is the top level directory of a
4371package to prepare for using GNU <CODE>gettext</CODE>. If not given, it
4372is assumed that the current directory is the top level directory of
4373such a package.
4374
4375</P>
4376<P>
4377The program <CODE>gettextize</CODE> provides the following files. However,
4378no existing file will be replaced unless the option <CODE>--force</CODE>
4379(<CODE>-f</CODE>) is specified.
4380
4381</P>
4382
4383<OL>
4384<LI>
4385
4386The <TT>`NLS'</TT> file is copied in the main directory of your package,
4387the one being at the top level. This file gives the main indications
4388about how to install and use the Native Language Support features
4389of your program. You might elect to use a more recent copy of this
4390<TT>`NLS'</TT> file than the one provided through <CODE>gettextize</CODE>, if
4391you have one handy. You may also fetch a more recent copy of file
4392<TT>`NLS'</TT> from most GNU archive sites.
4393
4394<LI>
4395
4396A <TT>`po/'</TT> directory is created for eventually holding
4397all translation files, but initially only containing the file
<TT>`po/Makefile.in.in'</TT> from the GNU <CODE>gettext</CODE> distribution
(beware the double <SAMP>`.in'</SAMP> in the file name). If the <TT>`po/'</TT>
4400directory already exists, it will be preserved along with the files
4401it contains, and only <TT>`Makefile.in.in'</TT> will be overwritten.
4402
4403<LI>
4404
An <TT>`intl/'</TT> directory is created and filled with most of the files
4406originally in the <TT>`intl/'</TT> directory of the GNU <CODE>gettext</CODE>
4407distribution. Also, if option <CODE>--force</CODE> (<CODE>-f</CODE>) is given,
4408the <TT>`intl/'</TT> directory is emptied first.
4409
4410</OL>
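
<P>
The three items above can be checked with a small shell function.
This is only a sketch: <CODE>check_gettextized</CODE> is a hypothetical
helper name, not something shipped with GNU <CODE>gettext</CODE>, and it
merely tests for the files and directories named in the list, assuming
the layout described there.
</P>

```shell
#!/bin/sh
# Hypothetical helper (not part of GNU gettext): report which of the
# items described above are missing from a package's top level directory.
# Usage: check_gettextized [directory]
check_gettextized() {
    top=${1:-.}
    status=0
    # Files that gettextize is described as providing.
    for f in NLS po/Makefile.in.in; do
        if [ ! -e "$top/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    # The intl/ directory itself; its exact contents are not checked.
    if [ ! -d "$top/intl" ]; then
        echo "missing: intl/"
        status=1
    fi
    return $status
}
```

<P>
Calling <CODE>check_gettextized .</CODE> from the top level directory prints
one line per missing item and returns a non-zero status if anything is absent.
</P>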
4411
4412<P>
If your site supports symbolic links, <CODE>gettextize</CODE> will not
actually copy the files into your package, but establish symbolic
links instead. This avoids duplicating the disk space needed in
all packages. Merely using the <SAMP>`-h'</SAMP> option while creating the
<CODE>tar</CODE> archive of your distribution will replace each link with an
actual copy in the distribution archive. So, to insist, you really
should use the <SAMP>`-h'</SAMP> option with <CODE>tar</CODE> within the <CODE>dist</CODE>
goal of your main <TT>`Makefile.in'</TT>.
4421
4422</P>
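<P>
The effect of <SAMP>`-h'</SAMP> can be seen in a throwaway directory.
The following is a minimal sketch, assuming a <CODE>tar</CODE> that accepts
<SAMP>`-h'</SAMP> when creating an archive (GNU <CODE>tar</CODE> does); the
package name <TT>`pkg-1.0'</TT> and all paths are made up for the
demonstration, and the <CODE>ln -s</CODE> line only stands in for what
<CODE>gettextize</CODE> would do.
</P>

```shell
#!/bin/sh
# Sketch: `tar -h' archives the file a symbolic link points to,
# not the link itself.  Every path below is invented for this demo.
work=$(mktemp -d)
mkdir "$work/shared" "$work/pkg-1.0" "$work/pkg-1.0/intl"
echo "shared source" > "$work/shared/gettext.c"

# Stand-in for the symbolic links established by gettextize.
ln -s "$work/shared/gettext.c" "$work/pkg-1.0/intl/gettext.c"

# Create the distribution archive, following links (-h).
(cd "$work" && tar -chf pkg-1.0.tar pkg-1.0)

# Unpack elsewhere; intl/gettext.c should now be a regular file.
mkdir "$work/unpack"
(cd "$work/unpack" && tar -xf ../pkg-1.0.tar)
ls -l "$work/unpack/pkg-1.0/intl"
```

<P>
Without <SAMP>`-h'</SAMP>, the archive would hold the link only, which
dangles on any machine that lacks the link target.
</P>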
4423<P>
4424It is interesting to understand that most new files for supporting
4425GNU <CODE>gettext</CODE> facilities in one package go in <TT>`intl/'</TT>
4426and <TT>`po/'</TT> subdirectories. One distinction between these two
4427directories is that <TT>`intl/'</TT> is meant to be completely identical
4428in all packages using GNU <CODE>gettext</CODE>, while all newly created
4429files, which have to be different, go into <TT>`po/'</TT>. There is a
4430common <TT>`Makefile.in.in'</TT> in <TT>`po/'</TT>, because the <TT>`po/'</TT>
4431directory needs its own <TT>`Makefile'</TT>, and it has been designed so
4432it can be identical in all packages.
4433
4434</P>
4435
4436
4437<H2><A NAME="SEC69" HREF="gettext_toc.html#TOC69">Files You Must Create or Alter</A></H2>
4438
4439<P>
4440Besides files which are automatically added through <CODE>gettextize</CODE>,
4441there are many files needing revision for properly interacting with
4442GNU <CODE>gettext</CODE>. If you are closely following GNU standards for
4443Makefile engineering and auto-configuration, the adaptations should
4444be easier to achieve. Here is a point by point description of the
4445changes needed in each.
4446
4447</P>
4448<P>
4449So, here comes a list of files, each one followed by a description of
all alterations it needs. Many examples are taken from the GNU
4451<CODE>gettext</CODE> 0.10 distribution itself. You may indeed
4452refer to the source code of the GNU <CODE>gettext</CODE> package, as it
4453is intended to be a good example and master implementation for using
4454its own functionality.
4455
4456</P>
4457
4458
4459
<H3><A NAME="SEC70" HREF="gettext_toc.html#TOC70"><TT>`POTFILES.in'</TT> in <TT>`po/'</TT></A></H3>
4461
4462<P>
4463The <TT>`po/'</TT> directory should receive a file named
4464<TT>`POTFILES.in'</TT>. This file tells which files, among all program
4465sources, have marked strings needing translation. Here is an example
4466of such a file:
4467
4468</P>
4469
<PRE>
# List of source files containing translatable strings.
# Copyright (C) 1995 Free Software Foundation, Inc.

# Common library files
lib/error.c
lib/getopt.c
lib/xmalloc.c

# Package source files
src/gettextp.c
src/msgfmt.c
src/xgettext.c
</PRE>
4484
4485<P>
Comment lines starting with <SAMP>`#'</SAMP> and blank lines are ignored. All other lines
4487list those source files containing strings marked for translation
4488(see section <A HREF="gettext.html#SEC15">How Marks Appears in Sources</A>), in a notation relative to the top level
4489of your whole distribution, rather than the location of the
4490<TT>`POTFILES.in'</TT> file itself.
4491
4492</P>
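<P>
To see which source files a given <TT>`POTFILES.in'</TT> actually names, the
rule just stated can be applied by hand. The sketch below is not the code
used by the GNU <CODE>gettext</CODE> macros or Makefiles;
<CODE>list_potfiles</CODE> is a hypothetical helper that simply drops comment
lines and blank lines with <CODE>sed</CODE>.
</P>

```shell
#!/bin/sh
# Hypothetical helper: print the source files named in a POTFILES.in,
# skipping lines that start with '#' and lines that are blank.
# Usage: list_potfiles po/POTFILES.in
list_potfiles() {
    sed -e '/^#/d' -e '/^[[:space:]]*$/d' "$1"
}
```

<P>
Run from the top level of the distribution, each printed path should name an
existing file, since the entries are relative to that directory.
</P>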
4493
4494
4495<H3><A NAME="SEC71" HREF="gettext_toc.html#TOC71"><TT>`configure.in'</TT> at top level</A></H3>
4496
4497
4498<OL>
4499<LI>Declare the package and version.
4500
4501This is done by a set of lines like these:
4502
4503
<PRE>
PACKAGE=gettext
VERSION=0.10
AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
AC_SUBST(PACKAGE)
AC_SUBST(VERSION)
</PRE>
4512
Of course, you replace <SAMP>`gettext'</SAMP> with the name of your package,
and <SAMP>`0.10'</SAMP> with its version number, exactly as they
4515should appear in the packaged <CODE>tar</CODE> file name of your distribution
4516(<TT>`gettext-0.10.tar.gz'</TT>, here).
4517
4518<LI>Declare the available translations.
4519
This is done by defining <CODE>ALL_LINGUAS</CODE> to the whitespace-separated,
quoted list of available languages, in a single line, like this:
4522
4523
<PRE>
ALL_LINGUAS="de fr"
</PRE>
4527
4528This example means that German and French PO files are available, so
4529that these languages are currently supported by your package. If you
4530want to further restrict, at installation time, the set of installed
4531languages, this should not be done by modifying <CODE>ALL_LINGUAS</CODE> in
4532<TT>`configure.in'</TT>, but rather by using the <CODE>LINGUAS</CODE> environment
4533variable (see section <A HREF="gettext.html#SEC34">Magic for Installers</A>).
4534
4535<LI>Check for internationalization support.
4536
4537Here is the main <CODE>m4</CODE> macro for triggering internationalization
4538support. Just add this line to <TT>`configure.in'</TT>:
4539
4540
<PRE>
ud_GNU_GETTEXT
</PRE>
4544
4545This call is purposely simple, even if it generates a lot of configure
4546time checking and actions.
4547
4548<LI>Obtain some <TT>`libintl.h'</TT> header file.
4549
Once you have called <CODE>ud_GNU_GETTEXT</CODE> in <TT>`configure.in'</TT>, use:
4551
4552
<PRE>
AC_LINK_FILES($nls_cv_header_libgt, $nls_cv_header_intl)
</PRE>
4556
4557This will create one header file <TT>`libintl.h'</TT>. The reason for
4558this has to do with the fact that some systems, using the Uniforum
4559message handling functions, already have a file of this name.
4560
4561The <CODE>AC_LINK_FILES</CODE> call has not been integrated into the
4562<CODE>ud_GNU_GETTEXT</CODE> macro because there can be only one such call
4563in a <TT>`configure'</TT> file. If you already use it, you will have to
4564<EM>merge</EM> the needed <CODE>AC_LINK_FILES</CODE> arguments into yours, by
4565appending <CODE>$nls_cv_header_libgt</CODE> to the list in your first
4566argument, and <CODE>$nls_cv_header_intl</CODE> to the list in your second
4567argument.
4568
4569<LI>Have output files created.
4570
4571The <CODE>AC_OUTPUT</CODE> directive, at the end of your <TT>`configure.in'</TT>
4572file, needs to be modified in two ways:
4573
4574
4575<PRE>
4576AC_OUTPUT([<VAR>existing configuration files</VAR> intl/Makefile po/Makefile.in],
4577[sed -e "/POTFILES =/r po/POTFILES" po/Makefile.in &#62; po/Makefile
4578<VAR>existing additional actions</VAR>])
4579</PRE>
4580
4581The modification to the first argument to <CODE>AC_OUTPUT</CODE> asks
4582for substitution in the <TT>`intl/'</TT> and <TT>`po/'</TT> directories.
4583Note the <SAMP>`.in'</SAMP> suffix used for <TT>`po/'</TT> only. This is because
4584the distributed file is really <TT>`po/Makefile.in.in'</TT>.
4585
4586The modification to the second argument ensures that <TT>`po/Makefile'</TT>
4587gets generated out of the <TT>`po/Makefile.in'</TT> just created, including
4588in it the <TT>`po/POTFILES'</TT> produced by <CODE>ud_GNU_GETTEXT</CODE>.
4589Two steps are needed because <TT>`po/POTFILES'</TT> can get lengthy in
4590some packages, too lengthy in fact for being able to merely use an
4591Autoconf substituted variable, as many <CODE>sed</CODE>s cannot handle very
4592long lines.
4593
4594</OL>
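
<P>
Taken together, the steps above might lead to a <TT>`configure.in'</TT>
along the following lines. This is only a sketch: the package name
<SAMP>`hello'</SAMP>, its version <SAMP>`1.0'</SAMP>, the source file
<TT>`src/hello.c'</TT> and the list of existing output files are
hypothetical, and a real package will need further checks of its own.

</P>

<PRE>
dnl Hypothetical package; adapt the names to your own distribution.
AC_INIT(src/hello.c)
AC_CONFIG_HEADER(config.h)

PACKAGE=hello
VERSION=1.0
AC_SUBST(PACKAGE)
AC_SUBST(VERSION)

dnl German and French PO files are assumed to exist in po/.
ALL_LINGUAS="de fr"

AC_PROG_CC
ud_GNU_GETTEXT

dnl Only one AC_LINK_FILES call is allowed, so merge your own pairs here.
AC_LINK_FILES($nls_cv_header_libgt, $nls_cv_header_intl)

AC_OUTPUT([Makefile src/Makefile intl/Makefile po/Makefile.in],
[sed -e "/POTFILES =/r po/POTFILES" po/Makefile.in &#62; po/Makefile])
</PRE>

<P>
An installer who only wants the German catalog could then, assuming a
Bourne-compatible shell, restrict the installed set before configuring,
without touching <CODE>ALL_LINGUAS</CODE>:

</P>

<PRE>
LINGUAS="de" ./configure
</PRE>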
4595
4596
4597
4598<H3><A NAME="SEC72" HREF="gettext_toc.html#TOC72"><TT>`aclocal.m4'</TT> at top level</A></H3>
4599
4600<P>
4601If you do not have an <TT>`aclocal.m4'</TT> file in your distribution,
4602the simplest approach is to take a copy of <TT>`aclocal.m4'</TT> from
4603GNU <CODE>gettext</CODE>. But to be precise, you only need macros
4604<CODE>ud_LC_MESSAGES</CODE>, <CODE>ud_WITH_NLS</CODE> and <CODE>ud_GNU_GETTEXT</CODE>,
4605so you may use an editor and remove macros you do not need.
4606
4607</P>
4608<P>
4609If you already have an <TT>`aclocal.m4'</TT> file, then you will have
4610to merge the said macros into your <TT>`aclocal.m4'</TT>. Note that if
4611you are upgrading from a previous release of GNU <CODE>gettext</CODE>, you
4612should most probably <EM>replace</EM> the said macros, as they usually
4613change a little from one release of GNU <CODE>gettext</CODE> to the next.
4614Their contents may vary as we get more experience with strange systems
4615out there.
4616
4617</P>
4618<P>
4619These macros check for the internationalization support functions
4620and related information. Hopefully, once stabilized, these macros
4621might be integrated in the standard Autoconf set, because this
4622piece of <CODE>m4</CODE> code will be the same for all projects using GNU
4623<CODE>gettext</CODE>.
4624
4625</P>
4626
4627
4628<H3><A NAME="SEC73" HREF="gettext_toc.html#TOC73"><TT>`acconfig.h'</TT> at top level</A></H3>
4629
4630<P>
4631If you do not have an <TT>`acconfig.h'</TT> file in your distribution,
4632the simplest approach is to take a copy of <TT>`acconfig.h'</TT> from
4633GNU <CODE>gettext</CODE>. But to be precise, you only need the
4634lines and comments for <CODE>ENABLE_NLS</CODE>, <CODE>HAVE_CATGETS</CODE>,
4635<CODE>HAVE_GETTEXT</CODE> and <CODE>HAVE_LC_MESSAGES</CODE>, so you may use
4636an editor and remove everything else. If you already have an
4637<TT>`acconfig.h'</TT> file, then you should merge the said definitions
4638into your <TT>`acconfig.h'</TT>.
4639
4640</P>
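
<P>
For illustration, the part of <TT>`acconfig.h'</TT> that matters here
might look roughly as follows. The comment wording is only indicative
and may differ from the file shipped with GNU <CODE>gettext</CODE>; what
counts is one <CODE>#undef</CODE> line per symbol, which
<CODE>autoheader</CODE> then copies into <TT>`config.h.in'</TT>.

</P>

<PRE>
/* Define to 1 if NLS is requested.  */
#undef ENABLE_NLS

/* Define to 1 if the system catgets function should be used.  */
#undef HAVE_CATGETS

/* Define to 1 if the system gettext function should be used.  */
#undef HAVE_GETTEXT

/* Define if your locale.h file contains LC_MESSAGES.  */
#undef HAVE_LC_MESSAGES
</PRE>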
4641
4642
4643<H3><A NAME="SEC74" HREF="gettext_toc.html#TOC74"><TT>`Makefile.in'</TT> at top level</A></H3>
4644
4645<P>
4646Here are a few modifications you need to make to your main, top-level
4647<TT>`Makefile.in'</TT> file.
4648
4649</P>
4650
4651<OL>
4652<LI>
4653
4654Add the following lines near the beginning of your <TT>`Makefile.in'</TT>,
4655so the <SAMP>`dist:'</SAMP> goal will work properly (as explained further down):
4656
4657
4658<PRE>
4659PACKAGE = @PACKAGE@
4660VERSION = @VERSION@
4661</PRE>
4662
4663<LI>
4664
4665Add file <TT>`NLS'</TT> to the <CODE>DISTFILES</CODE> definition, so the file gets
4666distributed.
4667
4668<LI>
4669
4670Wherever you process subdirectories in your <TT>`Makefile.in'</TT>, be
4671sure you also process <CODE>@INTLSUB@</CODE> and <CODE>@POSUB@</CODE>, which
4672are replaced respectively by <SAMP>`intl'</SAMP> and <SAMP>`po'</SAMP>, or empty
4673when the configuration process decides these directories should
4674not be processed.
4675
4676Here is an example of a canonical order of processing. In this
4677example, we also define <CODE>SUBDIRS</CODE> in <CODE>Makefile.in</CODE> for it
4678to be further used in the <SAMP>`dist:'</SAMP> goal.
4679
4680
4681<PRE>
4682SUBDIRS = doc lib @INTLSUB@ src @POSUB@
4683</PRE>
4684
4685that you will have to adapt to your own package.
4686
4687<LI>
4688
4689A delicate point is the <SAMP>`dist:'</SAMP> goal, as both
4690<TT>`intl/Makefile'</TT> and <TT>`po/Makefile'</TT> will later assume that the
4691proper directory has been set up from the main <TT>`Makefile'</TT>. Here is
4692an example of what the <SAMP>`dist:'</SAMP> goal might look like:
4693
4694
4695<PRE>
4696distdir = $(PACKAGE)-$(VERSION)
4697dist: Makefile
4698 rm -fr $(distdir)
4699 mkdir $(distdir)
4700 chmod 777 $(distdir)
4701 for file in $(DISTFILES); do \
4702 ln $$file $(distdir) 2&#62;/dev/null || cp -p $$file $(distdir); \
4703 done
4704 for subdir in $(SUBDIRS); do \
4705 mkdir $(distdir)/$$subdir || exit 1; \
4706 chmod 777 $(distdir)/$$subdir; \
4707 (cd $$subdir &#38;&#38; $(MAKE) $@) || exit 1; \
4708 done
4709 tar chozf $(distdir).tar.gz $(distdir)
4710 rm -fr $(distdir)
4711</PRE>
4712
4713</OL>
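
<P>
As a sketch of how <CODE>SUBDIRS</CODE> might also serve outside of the
<SAMP>`dist:'</SAMP> goal, the fragment below lists <TT>`NLS'</TT> among the
distributed files and recurses into every subdirectory for the usual
goals. The other file names in <CODE>DISTFILES</CODE> and the chosen set of
goals are assumptions to adapt. Command lines must begin with a tab
character.

</P>

<PRE>
# NLS is distributed together with the other top-level files.
DISTFILES = README NLS Makefile.in configure configure.in \
	aclocal.m4 acconfig.h config.h.in

# @INTLSUB@ and @POSUB@ expand to `intl' and `po', or to nothing.
SUBDIRS = doc lib @INTLSUB@ src @POSUB@

all install clean:
	for subdir in $(SUBDIRS); do \
	  (cd $$subdir &#38;&#38; $(MAKE) $@) || exit 1; \
	done
</PRE>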
4714
4715
4716
4717<H3><A NAME="SEC75" HREF="gettext_toc.html#TOC75"><TT>`Makefile.in'</TT> in <TT>`src/'</TT></A></H3>
4718
4719<P>
4720Some of the modifications made in the main <TT>`Makefile.in'</TT> will
4721also be needed in the <TT>`Makefile.in'</TT> from your package sources,
4722which we assume here to be in the <TT>`src/'</TT> subdirectory. Here are
4723all the modifications needed in <TT>`src/Makefile.in'</TT>:
4724
4725</P>
4726
4727<OL>
4728<LI>
4729
4730In view of the <SAMP>`dist:'</SAMP> goal, you should have these lines near the
4731beginning of <TT>`src/Makefile.in'</TT>:
4732
4733
4734<PRE>
4735PACKAGE = @PACKAGE@
4736VERSION = @VERSION@
4737</PRE>
4738
4739<LI>
4740
4741If not done already, you should guarantee that <CODE>top_srcdir</CODE>
4742gets defined. This will serve for <CODE>cpp</CODE> include files. Just add
4743the line:
4744
4745
4746<PRE>
4747top_srcdir = @top_srcdir@
4748</PRE>
4749
4750<LI>
4751
4752You might also want to define <CODE>subdir</CODE> as <SAMP>`src'</SAMP>, later
4753allowing for almost uniform <SAMP>`dist:'</SAMP> goals in all your
4754<TT>`Makefile.in'</TT>. At least, the <SAMP>`dist:'</SAMP> goal below assumes that
4755you used:
4756
4757
4758<PRE>
4759subdir = src
4760</PRE>
4761
4762<LI>
4763
4764You should ensure that the final linking will use <CODE>@INTLLIBS@</CODE> as
4765a library. An easy way to achieve this is to make sure it gets into
4766<CODE>LIBS</CODE>, like this:
4767
4768
4769<PRE>
4770LIBS = @INTLLIBS@ @LIBS@
4771</PRE>
4772
4773In most GNU packages one will find a directory <TT>`lib/'</TT> in which a
4774library containing some helper functions will be built. (You need at
4775least the few functions which the GNU <CODE>gettext</CODE> Library itself
4776needs.) However, some of the functions in <TT>`lib/'</TT> also give
4777messages to the user, which of course should be translated, too. To take
4778care of this, it is not enough to place the support library (say
4779<TT>`libsupport.a'</TT>) just between the <CODE>@INTLLIBS@</CODE> and
4780<CODE>@LIBS@</CODE> in the above example. Instead one has to write this:
4781
4782
4783<PRE>
4784LIBS = ../lib/libsupport.a @INTLLIBS@ ../lib/libsupport.a @LIBS@
4785</PRE>
4786
4787<LI>
4788
4789You should also ensure that directory <TT>`intl/'</TT> will be searched for
4790C preprocessor include files in all circumstances. So, you have to
4791arrange for both <SAMP>`-I../intl'</SAMP> and <SAMP>`-I$(top_srcdir)/intl'</SAMP> to
4792be given to the C compiler.
4793
4794<LI>
4795
4796Your <SAMP>`dist:'</SAMP> goal has to conform with the others. Here is a
4797reasonable definition for it:
4798
4799
4800<PRE>
4801distdir = ../$(PACKAGE)-$(VERSION)/$(subdir)
4802dist: Makefile $(DISTFILES)
4803 for file in $(DISTFILES); do \
4804 ln $$file $(distdir) 2&#62;/dev/null || cp -p $$file $(distdir); \
4805 done
4806</PRE>
4807
4808</OL>
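
<P>
Putting these items together, the relevant part of
<TT>`src/Makefile.in'</TT> might look roughly like this. It is a sketch
only: the program name <SAMP>`hello'</SAMP>, the <CODE>INCLUDES</CODE>
variable and the <TT>`libsupport.a'</TT> library are hypothetical, while
<CODE>@CC@</CODE>, <CODE>@CFLAGS@</CODE> and <CODE>@DEFS@</CODE> are the usual
Autoconf substitutions. Command lines must begin with a tab character.

</P>

<PRE>
PACKAGE = @PACKAGE@
VERSION = @VERSION@

top_srcdir = @top_srcdir@
subdir = src

CC = @CC@
CFLAGS = @CFLAGS@
DEFS = @DEFS@

# The support library appears twice, since it and libintl use each other.
LIBS = ../lib/libsupport.a @INTLLIBS@ ../lib/libsupport.a @LIBS@

# Search intl/ in both the build tree and the source tree.
INCLUDES = -I. -I.. -I../intl -I$(top_srcdir)/intl

.c.o:
	$(CC) -c $(DEFS) $(INCLUDES) $(CFLAGS) $&#60;

hello: hello.o
	$(CC) $(CFLAGS) -o $@ hello.o $(LIBS)
</PRE>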
4809
4810
4811
4812<H1><A NAME="SEC76" HREF="gettext_toc.html#TOC76">Concluding Remarks</A></H1>
4813
4814<P>
4815We would like to conclude this GNU <CODE>gettext</CODE> manual by presenting
4816a history of the GNU Translation Project so far. We finally give
4817a few pointers for those who want to do further research or readings
4818about Native Language Support matters.
4819
4820</P>
4821
4822
4823
4824<H2><A NAME="SEC77" HREF="gettext_toc.html#TOC77">History of GNU <CODE>gettext</CODE></A></H2>
4825
4826<P>
4827Internationalization concerns and algorithms have been informally
4828and casually discussed for years in GNU, sometimes around GNU
4829<CODE>libc</CODE>, maybe around the incoming <CODE>Hurd</CODE>, or otherwise
4830(nobody clearly remembers). And even then, when the work started for
4831real, this was somewhat independently of these previous discussions.
4832
4833</P>
4834<P>
4835This all began in July 1994, when Patrick D'Cruze had the idea and
4836initiative of internationalizing version 3.9.2 of GNU <CODE>fileutils</CODE>.
4837He then asked Jim Meyering, the maintainer, how to get those changes
4838folded into an official release. That first draft was full of
4839<CODE>#ifdef</CODE>s and somewhat disconcerting, and Jim wanted to find
4840nicer ways. Patrick and Jim shared some tries and experimentations
4841in this area. Then, feeling that this might eventually have a deeper
4842impact on GNU, Jim wanted to know what standards were, and contacted
4843Richard Stallman, who very quickly and verbally described an overall
4844design for what was meant to become <CODE>glocale</CODE>, at that time.
4845
4846</P>
4847<P>
4848Jim implemented <CODE>glocale</CODE> and got a lot of exhausting feedback
4849from Patrick and Richard, of course, but also from Mitchum DSouza
4850(who wrote a <CODE>catgets</CODE>-like package), Roland McGrath, maybe David
4851MacKenzie, Fran&ccedil;ois Pinard, and Paul Eggert, all pushing and
4852pulling in various directions, not always compatible, to the extent
4853that after a couple of test releases, <CODE>glocale</CODE> was torn apart.
4854
4855</P>
4856<P>
4857While Jim took some distance and time and became a dad for a second
4858time, Roland wanted to get GNU <CODE>libc</CODE> internationalized, and
4859got Ulrich Drepper involved in that project. Instead of starting
4860from <CODE>glocale</CODE>, Ulrich rewrote something from scratch, but
4861more conformant to the set of guidelines that emerged out of the
4862<CODE>glocale</CODE> effort. Then, Ulrich got people from the previous
4863forum to involve themselves in this new project, and the switch
4864from <CODE>glocale</CODE> to what was first named <CODE>msgutils</CODE>, renamed
4865<CODE>nlsutils</CODE>, and later <CODE>gettext</CODE>, became officially accepted
4866by Richard in May 1995 or so.
4867
4868</P>
4869<P>
4870Let's summarize by saying that Ulrich Drepper wrote GNU <CODE>gettext</CODE>
4871in April 1995. The first official release of the package, including
4872PO mode, occurred in July 1995, and was numbered 0.7. Other people
4873contributed to the effort by providing a discussion forum around
4874Ulrich, writing little pieces of code, or testing. These are quoted
4875in the <CODE>THANKS</CODE> file which comes with the GNU <CODE>gettext</CODE>
4876distribution.
4877
4878</P>
4879<P>
4880While this was being done, Fran&ccedil;ois adapted half a dozen
4881GNU packages to <CODE>glocale</CODE> first, then later to <CODE>gettext</CODE>,
4882putting them in pretest, so providing along the way an effective
4883user environment for fine tuning the evolving tools. He also took
4884the responsibility of organizing and coordinating the GNU Translation
4885Project. After nearly a year of informal exchanges between people from
4886many countries, translator teams started to exist in May 1995, through
4887the creation and support by Patrick D'Cruze of twenty unmoderated
4888mailing lists for that many native languages, and two moderated
4889lists: one for reaching all teams at once, the other for reaching
4890all maintainers of internationalized packages in GNU.
4891
4892</P>
4893<P>
4894Fran&ccedil;ois also wrote PO mode in June 1995 with the collaboration
4895of Greg McGary, as a kind of contribution to Ulrich's package.
4896He also gave a hand with the GNU <CODE>gettext</CODE> Texinfo manual.
4897
4898</P>
4899
4900
4901<H2><A NAME="SEC78" HREF="gettext_toc.html#TOC78">Related Readings</A></H2>
4902
4903<P>
4904Eugene H. Dorr (<TT>`dorre@well.com'</TT>) maintains an interesting
4905bibliography on internationalization matters, called
4906<CITE>Internationalization Reference List</CITE>, which is available as:
4907
4908<PRE>
4909ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt
4910</PRE>
4911
4912<P>
4913Michael Gschwind (<TT>`mike@vlsivie.tuwien.ac.at'</TT>) maintains a
4914Frequently Asked Questions (FAQ) list, entitled <CITE>Programming for
4915Internationalisation</CITE>. This FAQ discusses writing programs which
4916can handle different language conventions, character sets, etc.;
4917and is applicable to all character set encodings, with particular
4918emphasis on ISO 8859-1. It is regularly published in Usenet
4919groups <TT>`comp.unix.questions'</TT>, <TT>`comp.std.internat'</TT>,
4920<TT>`comp.software.international'</TT>, <TT>`comp.lang.c'</TT>,
4921<TT>`comp.windows.x'</TT>, <TT>`comp.std.c'</TT>, <TT>`comp.answers'</TT>
4922and <TT>`news.answers'</TT>. The home location of this document is:
4923
4924<PRE>
4925ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming
4926</PRE>
4927
4928<P>
4929Patrick D'Cruze (<TT>`pdcruze@li.org'</TT>) wrote a tutorial about NLS
4930matters, and Jochen Hein (<TT>`Hein@student.tu-clausthal.de'</TT>) took
4931over the responsibility of maintaining it. It may be found as:
4932
4933<PRE>
4934ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/...
4935 ...locale-tutorial-0.8.txt.gz
4936</PRE>
4937
4938<P>
4939This site is mirrored in:
4940
4941<PRE>
4942ftp://ftp.ibp.fr/pub/linux/sunsite/
4943</PRE>
4944
4945<P>
4946A French version of the same tutorial should be available at:
4947
4948<PRE>
4949ftp://ftp.ibp.fr/pub/linux/french/docs/
4950</PRE>
4951
4952<P>
4953together with French translations of many Linux-related documents.
4954
4955</P>
4956<P><HR><P>
4957This document was generated on 4 September 1998 using the
4958<A HREF="http://wwwcn.cern.ch/dci/texi2html/">texi2html</A>
4959translator version 1.51.</P>
4960</BODY>
4961</HTML>