1 <HTML>
2 <HEAD>
3 <!-- This HTML file has been created by texi2html 1.51
4 from gettext.texi on 4 September 1998 -->
5
6 <TITLE>GNU gettext utilities</TITLE>
7 </HEAD>
8 <BODY>
9 <H1>GNU gettext tools, version 0.10</H1>
10 <H2>Native Language Support Library and Tools</H2>
11 <H2>Edition 0.10, 26 November</H2>
12 <ADDRESS>Ulrich Drepper</ADDRESS>
13 <ADDRESS>Jim Meyering</ADDRESS>
14 <ADDRESS>Fran&ccedil;ois Pinard</ADDRESS>
15 <P>
16 <P><HR><P>
17
18 <P>
19 Copyright (C) 1995 Free Software Foundation, Inc.
20
21 </P>
22 <P>
23 Permission is granted to make and distribute verbatim copies of
24 this manual provided the copyright notice and this permission notice
25 are preserved on all copies.
26
27 </P>
28 <P>
29 Permission is granted to copy and distribute modified versions of this
30 manual under the conditions for verbatim copying, provided that the entire
31 resulting derived work is distributed under the terms of a permission
32 notice identical to this one.
33
34 </P>
35 <P>
36 Permission is granted to copy and distribute translations of this manual
37 into another language, under the above conditions for modified versions,
38 except that this permission notice may be stated in a translation approved
39 by the Foundation.
40
41 </P>
42
43
44
45 <H1><A NAME="SEC1" HREF="gettext_toc.html#TOC1">Introduction</A></H1>
46
47
48 <BLOCKQUOTE>
49 <P>
50 This manual is still in <EM>DRAFT</EM> state. Some sections are still
51 empty, or almost. We keep merging material from other sources
52 (essentially email folders) while the proper integration of this
53 material is delayed.
54 </BLOCKQUOTE>
55
56 <P>
57 In this manual, we use <EM>he</EM> when speaking of the programmer or
58 maintainer, <EM>she</EM> when speaking of the translator, and <EM>they</EM>
59 when speaking of the installers or end users of the translated program.
60 This is only a convenience for clarifying the documentation. It is
61 absolutely not meant to imply that some roles are more appropriate
62 to males or females. Besides, as you might guess, GNU <CODE>gettext</CODE>
63 is meant to be useful for people using computers, whatever their sex,
64 race, religion or nationality!
65
66 </P>
67 <P>
68 This chapter explains the goals sought by the mere existence
69 of GNU <CODE>gettext</CODE>. Then, it explains a few broad concepts around
70 Native Language Support, and situates message translation with regard
71 to other aspects of national and cultural variance, as applicable
72 to programs. It also surveys the files used to convey
73 translations. It explains how the various tools interrelate in the
74 initial generation of these files, and later, how the maintenance
75 cycle usually operates.
76
77 </P>
78
79
80
81 <H2><A NAME="SEC2" HREF="gettext_toc.html#TOC2">The Purpose of GNU <CODE>gettext</CODE></A></H2>
82
83 <P>
84 Usually, programs are written and documented in English, and use
85 English at execution time for interacting with users. This is true
86 not only within GNU, but also in a great deal of commercial
87 and free software. Using a common language is quite handy for
88 communication between developers, maintainers and users from all
89 countries. On the other hand, most people are less comfortable with
90 English than with their own native language, and would prefer
91 using their mother tongue for day-to-day work, as far as possible.
92 Many would simply <EM>love</EM> seeing their computer screen showing
93 a lot less English, and far more of their own spoken language.
94
95 </P>
96 <P>
97 However, to some people, this dream might appear so far-fetched that
98 they may believe it is not even worth spending time thinking about
99 it, and they have no confidence at all that the dream might ever
100 come true. Many have not lost hope, however, and have organized themselves.
101 The GNU Translation Project is a formalization of this hope into a
102 workable structure, which has a good chance of getting all of us nearer
103 to the achievement of a truly multi-lingual set of programs.
104
105 </P>
106 <P>
107 GNU <CODE>gettext</CODE> is an important step for the GNU Translation
108 Project, as it is an asset on which we may build many other steps.
109 This package offers to programmers, translators and even users, a
110 well integrated set of tools and documentation. Specifically, the GNU
111 <CODE>gettext</CODE> utilities are a set of tools that provides a framework
112 to help other GNU packages produce multi-lingual messages. These tools
113 include a set of conventions about how programs should be written to
114 support message catalogs, a directory and file naming organization
115 for the message catalogs themselves, a runtime library supporting the
116 retrieval of translated messages, and a few stand-alone programs to
117 massage in various ways the sets of translatable strings, or already
118 translated strings. A special GNU Emacs mode also helps interested
119 parties prepare these sets, or bring them up to date.
120
121 </P>
122 <P>
123 GNU <CODE>gettext</CODE> is designed so it minimizes the impact of
124 internationalization on program sources, keeping this impact as small
125 and hardly noticeable as possible. Internationalization has better
126 chances of succeeding if it is very lightweight, or at least
127 appears to be so, when looking at program sources.
128
129 </P>
130 <P>
131 The GNU Translation Project also uses the GNU <CODE>gettext</CODE>
132 distribution as a vehicle for documenting its structure and methods,
133 even if this goes beyond the technicalities of GNU <CODE>gettext</CODE>
134 proper. By doing so, translators will find in a single place, as
135 far as possible, all they need to know for properly doing their
136 translating work. This supplementary documentation might also
137 help programmers, and even curious users, understand how GNU
138 <CODE>gettext</CODE> is related to the remainder of the GNU Translation
139 Project, and consequently, get a glimpse of the <EM>big picture</EM>.
140
141 </P>
142
143
144 <H2><A NAME="SEC3" HREF="gettext_toc.html#TOC3">I18n, L10n, and Such</A></H2>
145
146 <P>
147 Two long words appear all the time when we discuss support of native
148 language in programs, and these words have a precise meaning, worth
149 being explained here, once and for all in this document. The words are
150 <EM>internationalization</EM> and <EM>localization</EM>. Many people,
151 tired of writing these long words over and over again, took the
152 habit of writing <STRONG>i18n</STRONG> and <STRONG>l10n</STRONG> instead, quoting the first
153 and last letter of each word, and replacing the run of intermediate
154 letters by a number merely telling how many such letters there are.
155 But in this manual, for the sake of clarity, we will patiently write
156 the names in full, each time...
157
158 </P>
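<P>
The abbreviation rule just described (first letter, count of the letters
in between, last letter) can be sketched in a few lines of C. The function
name <CODE>numeronym</CODE> and its buffer-passing interface are invented
for this illustration and are not part of GNU <CODE>gettext</CODE>.
</P>

```c
#include <stdio.h>
#include <string.h>

/* Write the abbreviation of `word` into `out` (capacity `outsize`):
   first letter, count of the letters in between, last letter.
   Words shorter than three letters are copied unchanged.
   Returns 0 on success, -1 if `out` is too small.
   (Hypothetical helper, for illustration only.) */
int numeronym(const char *word, char *out, size_t outsize)
{
    size_t len = strlen(word);
    int n;

    if (len < 3)
        n = snprintf(out, outsize, "%s", word);
    else
        n = snprintf(out, outsize, "%c%zu%c",
                     word[0], len - 2, word[len - 1]);

    /* snprintf reports the length it wanted; compare with capacity. */
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```

<P>
Called on "internationalization" it yields "i18n", and on "localization"
it yields "l10n".
</P>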
159 <P>
160 By <STRONG>internationalization</STRONG>, one refers to the operation by which a
161 program, or a set of programs turned into a package, is made aware and
162 able to support multiple languages. This is a generalization process,
163 by which the programs are untied from using only English strings or
164 other English specific habits, and connected to generic ways of doing
165 the same, instead. Program developers may use various techniques to
166 internationalize their programs; some of these have been standardized.
167 GNU <CODE>gettext</CODE> offers one of these standards. See section <A HREF="gettext.html#SEC36">The Programmer's View</A>.
168
169 </P>
170 <P>
171 By <STRONG>localization</STRONG>, one means the operation by which, in a set
172 of programs already internationalized, one gives the program all
173 needed information so that it can bend itself to handle its input
174 and output in a fashion which is correct for some native language and
175 cultural habits. This is a particularisation process, by which generic
176 methods already implemented in an internationalized program are used
177 in specific ways. The programming environment puts several functions
178 at the programmer's disposal which allow this runtime configuration.
179 The formal description of a specific set of cultural habits for some
180 country, together with all associated translations targeted to the
181 same native language, is called the <STRONG>locale</STRONG> for this language
182 or country. Users achieve localization of programs by setting proper
183 values to special environment variables, prior to executing those
184 programs, identifying which locale should be used.
185
186 </P>
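<P>
As a rough sketch of the runtime configuration mentioned above, a C program
can ask the standard <CODE>setlocale</CODE> function to adopt whatever
locale the user's environment names (typically through variables such as
<CODE>LANG</CODE> or <CODE>LC_ALL</CODE>). The two wrapper names below are
hypothetical and only serve the illustration.
</P>

```c
#include <locale.h>
#include <stddef.h>

/* Adopt the locale named by the user's environment.  Returns the name
   of the locale now in effect, or NULL if the requested locale is not
   available on this system (the previous locale then stays in effect).
   (Hypothetical wrapper, for illustration only.) */
const char *use_environment_locale(void)
{
    return setlocale(LC_ALL, "");
}

/* Query the current locale without changing it. */
const char *current_locale(void)
{
    return setlocale(LC_ALL, NULL);
}
```

<P>
Until <CODE>setlocale</CODE> is called, a C program runs in the minimal
"C" locale, which is why an internationalized program normally makes this
call early in <CODE>main</CODE>.
</P>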
187 <P>
188 In fact, locale message support is only one component of the cultural
189 data that makes up a particular locale. There is a whole host of
190 routines and functions provided to aid programmers in developing
191 internationalized software, which allow them to access the data
192 stored in a particular locale. When someone presently refers to a
193 particular locale, they are obviously referring to the data stored
194 within that particular locale. Similarly, if a programmer is referring
195 to "accessing the locale routines", they are referring to the
196 complete suite of routines that access all of the locale's information.
197
198 </P>
199 <P>
200 One uses the expression <STRONG>Native Language Support</STRONG>, or merely NLS,
201 for speaking of the overall activity or feature encompassing both
202 internationalization and localization, allowing for multi-lingual
203 interactions in a program. In a nutshell, one could say that
204 internationalization is the operation by which further localizations
205 are made possible.
206
207 </P>
208 <P>
209 Also, very roughly said, when it comes to multi-lingual messages,
210 internationalization is usually taken care of by programmers, and
211 localization is usually taken care of by translators.
212
213 </P>
214
215
216 <H2><A NAME="SEC4" HREF="gettext_toc.html#TOC4">Aspects in Native Language Support</A></H2>
217
218 <P>
219 For a totally multi-lingual distribution, there are many things to
220 translate beyond output messages.
221
222 </P>
223
224 <UL>
225 <LI>
226
227 As of today, GNU <CODE>gettext</CODE> offers a complete toolset for
228 translating messages output by C programs. Perl scripts and shell
229 scripts also need to be translated. Even if there are some hooks
230 so this can be done, these hooks are not integrated as well as they
231 should be.
232
233 <LI>
234
235 Some programs, like <CODE>autoconf</CODE> or <CODE>bison</CODE>, are able
236 to produce other programs (or scripts). Even if the generating
237 programs themselves are internationalized, the generated programs they
238 produce may need internationalization on their own, and this indirect
239 internationalization could be automated right from the generating
240 program. In fact, quite usually, generating and generated programs
241 could be internationalized independently, as the effort needed is
242 fairly orthogonal.
243
244 <LI>
245
246 A few programs include textual tables which might need translation
247 themselves, independently of the strings contained in the program
248 itself. For example, RFC 1345 gives an English description for each
249 character which GNU <CODE>recode</CODE> is able to reconstruct at execution.
250 Since these descriptions are extracted from the RFC by mechanical means,
251 translating them properly would require a prior translation of the RFC
252 itself.
253
254 <LI>
255
256 Almost all programs accept options, which are often worded so as to
257 be descriptive for English readers; one might want to consider
258 offering translated versions for program options as well.
259
260 <LI>
261
262 Many programs read, interpret, compile, or are somewhat driven by
263 input files which are texts containing keywords, identifiers, or
264 replies which are inherently translatable. For example, one may want
265 <CODE>gcc</CODE> to allow diacriticized characters in identifiers or use
266 translated keywords; <SAMP>`rm -i'</SAMP> might accept something other than
267 <SAMP>`y'</SAMP> or <SAMP>`n'</SAMP> for replies, etc. Even if the program will
268 eventually make most of its output in the foreign languages, one has
269 to decide whether the input syntax, option values, etc., are to be
270 localized or not.
271
272 <LI>
273
274 The manual accompanying a package, as well as all documentation files
275 in the distribution, could surely be translated, too. Translating a
276 manual, with the intent of later keeping up with updates, is a major
277 undertaking in itself, generally.
278
279 </UL>
280
281 <P>
282 As we already stressed, translation is only one aspect of locales.
283 Other internationalization aspects are not currently handled by GNU
284 <CODE>gettext</CODE>, but perhaps may be handled in future versions. There
285 are many attributes that are needed to define a country's cultural
286 conventions. These attributes include, besides the country's native
287 language, the formatting of the date and time, the representation of
288 numbers, the symbols for currency, etc. These local <STRONG>rules</STRONG> are
289 termed the country's locale. The locale represents the knowledge
290 needed to support the country's native attributes.
291
292 </P>
293 <P>
294 There are a few major areas which may vary between countries and
295 hence, define what a locale must describe. The following list helps
296 put multi-lingual messages into the proper context of other tasks
297 related to locales, and also presents some other areas which GNU
298 <CODE>gettext</CODE> might eventually tackle, maybe, one of these days.
299
300 </P>
301 <DL COMPACT>
302
303 <DT><EM>Characters and Codesets</EM>
304 <DD>
305 The codeset most commonly used throughout the USA and most English
306 speaking parts of the world is the ASCII codeset. However, there are
307 many characters needed by various locales that are not found within
308 this codeset. The 8-bit ISO 8859-1 code set has most of the special
309 characters needed to handle the major European languages. However, in
310 many cases, the ISO 8859-1 codeset is not adequate. Hence each locale
311 will need to specify which codeset it needs to use and will need
312 to have the appropriate character handling routines to cope with
313 the codeset.
314
315 <DT><EM>Currency</EM>
316 <DD>
317 The symbols used vary from country to country as does the position
318 used by the symbol. Software needs to be able to transparently
319 display currency figures in the native mode for each locale.
320
321 <DT><EM>Dates</EM>
322 <DD>
323 The format of date varies between locales. For example, Christmas day
324 in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.
325 Other countries might use ISO 8601 dates, etc.
326
327 Time of the day may be noted as <VAR>hh</VAR>:<VAR>mm</VAR>, <VAR>hh</VAR>.<VAR>mm</VAR>,
328 or otherwise. Some locales require time to be specified in 24-hour
329 mode rather than as AM or PM. Further, the nature and yearly extent
330 of the Daylight Saving correction vary widely between countries.
331
332 <DT><EM>Numbers</EM>
333 <DD>
334 Numbers can be represented differently in different locales.
335 For example, the following numbers are all written correctly for
336 their respective locales:
337
338
339 <PRE>
340 12,345.67 English
341 12.345,67 French
342 1,2345.67 Asia
343 </PRE>
344
345 Some programs could go further and use different unit systems, like
346 English units or Metric units, or even take into account variants
347 about how numbers are spelled in full.
348
349 <DT><EM>Messages</EM>
350 <DD>
351 The most obvious area is the language support within a locale. This is
352 where GNU <CODE>gettext</CODE> provides a means for developers and users to
353 easily change the language that the software uses to communicate with
354 the user.
355
356 </DL>
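<P>
To make the <EM>Numbers</EM> item above concrete, here is a small sketch
that formats the same amount with different separators. The function
<CODE>format_amount</CODE> is invented for this example; it groups digits
in threes only, and the separators are passed in explicitly. A real program
would rather obtain them from the current locale, for instance through the
standard <CODE>localeconv</CODE> function.
</P>

```c
#include <stdio.h>

/* Format a non-negative amount given in hundredths (1234567 means
   12345.67), grouping the integer digits in threes.  The two
   separator characters stand in for what a locale would supply.
   Returns 0 on success, -1 if `out` is too small.
   (Hypothetical helper, for illustration only.) */
int format_amount(unsigned long hundredths, char group_sep,
                  char decimal_sep, char *out, size_t outsize)
{
    char digits[32];
    int ndigits = snprintf(digits, sizeof digits, "%lu", hundredths / 100);
    size_t pos = 0;

    for (int i = 0; i < ndigits; i++) {
        /* Insert a separator before every complete group of three. */
        if (i > 0 && (ndigits - i) % 3 == 0) {
            if (pos + 1 >= outsize)
                return -1;
            out[pos++] = group_sep;
        }
        if (pos + 1 >= outsize)
            return -1;
        out[pos++] = digits[i];
    }

    /* Room needed for the separator, two digits and the final NUL. */
    if (pos + 4 > outsize)
        return -1;
    out[pos++] = decimal_sep;
    out[pos++] = (char)('0' + hundredths % 100 / 10);
    out[pos++] = (char)('0' + hundredths % 10);
    out[pos] = '\0';
    return 0;
}
```

<P>
With <CODE>','</CODE> and <CODE>'.'</CODE> the amount 1234567 comes out as
12,345.67; swapping the two separators gives 12.345,67, matching the first
two rows of the table above.
</P>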
357
358 <P>
359 In the near future, we see no chance that components of locale
360 other than message handling will be made available for use in other
361 GNU packages. The reason for this is that most modern systems provide
362 more or less reasonable support for at least some of the missing
363 components. Another point is that the GNU libc and Linux will get
364 a new and complete implementation of the whole locale functionality
365 which could be adopted by systems lacking reasonable locale support.
366
367 </P>
368
369
370 <H2><A NAME="SEC5" HREF="gettext_toc.html#TOC5">Files Conveying Translations</A></H2>
371
372 <P>
373 The letters PO in <TT>`.po'</TT> files mean Portable Object, to
374 distinguish them from <TT>`.mo'</TT> files, where MO stands for Machine
375 Object. This paradigm, as well as the PO file format, is inspired
376 by the NLS standard developed by Uniforum, and implemented by Sun
377 in their Solaris system.
378
379 </P>
380 <P>
381 PO files are meant to be read and edited by humans, and associate each
382 original, translatable string of a given package with its translation
383 in a particular target language. A single PO file is dedicated to
384 a single target language. If a package supports many languages,
385 there is one such PO file per language supported, and each package
386 has its own set of PO files. These PO files are best created by
387 the <CODE>xgettext</CODE> program, and later updated or refreshed through
388 the <CODE>tupdate</CODE> program. Program <CODE>xgettext</CODE> extracts all
389 marked messages from a set of C files and initializes a PO file with
390 empty translations. Program <CODE>tupdate</CODE> takes care of adjusting
391 PO files between releases of the corresponding sources, commenting
392 obsolete entries, initializing new ones, and updating all source
393 line references. Files ending with <TT>`.pot'</TT> are a kind of base
394 translation file found in distributions, in PO file format, and
395 <TT>`.pox'</TT> files are often temporary PO files.
396
397 </P>
398 <P>
399 MO files are meant to be read by programs, and are binary in nature.
400 A few systems already offer tools for creating and handling MO files
401 as part of the Native Language Support coming with the system, but the
402 format of these MO files is often different from system to system,
403 and non-portable. They do not necessarily use <TT>`.mo'</TT> for file
404 extensions, but since system libraries are also used for accessing
405 these files, it works as long as the system is self-consistent about
406 it. If GNU <CODE>gettext</CODE> is able to interface with the tools already
407 provided with systems, it will consequently let these provided tools
408 take care of generating the MO files. Or else, if such tools are not
409 found or do not seem usable, GNU <CODE>gettext</CODE> will use its own ways
410 and its own format for MO files. Files ending with <TT>`.gmo'</TT> are
411 really MO files, when it is known that these files use the GNU format.
412
413 </P>
414
415
416 <H2><A NAME="SEC6" HREF="gettext_toc.html#TOC6">Overview of GNU <CODE>gettext</CODE></A></H2>
417
418 <P>
419 The following diagram summarizes the relation between the files
420 handled by GNU <CODE>gettext</CODE> and the tools acting on these files.
421 It is followed by somewhat detailed explanations, which you should
422 read while keeping an eye on the diagram. Having a clear understanding
423 of these interrelations would surely help programmers, translators
424 and maintainers.
425
426 </P>
427
428 <PRE>
429 Original C Sources ---&#62; PO mode ---&#62; Marked C Sources ---.
430                                                          |
431               .---------&#60;--- GNU gettext Library         |
432 .--- make &#60;---+                                          |
433 |             `---------&#60;--------------------+-----------'
434 |                                            |
435 |   .-----&#60;--- PACKAGE.pot &#60;--- xgettext &#60;---'   .---&#60;--- PO Compendium
436 |   |                                            |             ^
437 |   |                                            `---.         |
438 |   `---.                                            +---&#62; PO mode ---.
439 |       +----&#62; tupdate -------&#62; LANG.pox ---&#62;--------'                |
440 |   .---'                                                             |
441 |   |                                                                 |
442 |   `-------------&#60;---------------.                                   |
443 |                                 +--- LANG.po &#60;--- New LANG.pox &#60;----'
444 |   .--- LANG.gmo &#60;--- msgfmt &#60;---'
445 |   |
446 |   `---&#62; install ---&#62; /.../LANG/PACKAGE.mo ---.
447 |                                              +---&#62; "Hello world!"
448 `-------&#62; install ---&#62; /.../bin/PROGRAM -------'
449 </PRE>
450
451 <P>
452 The indication <SAMP>`PO mode'</SAMP> appears in two places in this picture,
453 and you may safely read it as merely meaning "hand editing", using
454 any editor of your choice, really. However, for those of you who are
455 the lucky users of GNU Emacs, PO mode has been specifically created
456 for providing a cosy environment for editing or modifying PO files.
457 While editing a PO file, PO mode allows for the easy browsing of
458 auxiliary and compendium PO files, as well as following references into
459 the set of C program sources from which PO files have been derived.
460 It has a few special features, among which are the interactive marking
461 of program strings as translatable, and the validation of PO files
462 with easy repositioning to PO file lines showing errors.
463
464 </P>
465 <P>
466 As a programmer, the first step in bringing GNU <CODE>gettext</CODE>
467 into your package is identifying, right in the C sources, which
468 strings are meant to be translatable, and which are untranslatable.
469 This tedious job can be done a little more comfortably using PO
470 mode, but you can use any means familiar to you for modifying your
471 C sources. Some other simple, standard changes are also needed to
472 properly initialize the translation library. See section <A HREF="gettext.html#SEC13">Preparing Program Sources</A>, for
473 more information about all this.
474
475 </P>
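<P>
As a minimal sketch of what marked sources and library initialization can
look like, the fragment below wraps one user-visible string in a call to
<CODE>gettext</CODE> through the conventional <CODE>_()</CODE> shorthand
and leaves an internal string untouched. The text domain name
<CODE>example-package</CODE>, the catalog directory and the function names
are placeholders chosen for this illustration only.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* Conventional shorthand for marking a translatable string. */
#define _(String) gettext(String)

/* Select the user's locale and tell the library where to look for
   catalogs.  Both "example-package" and "/usr/share/locale" are
   placeholder values, not names taken from any real package. */
void init_translations(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain("example-package", "/usr/share/locale");
    textdomain("example-package");
}

/* Shown to the user, hence marked as translatable. */
const char *greeting(void)
{
    return _("Hello world!");
}

/* Internal identifier, never shown to the user: left unmarked. */
const char *config_key(void)
{
    return "greeting";
}
```

<P>
When no catalog is installed for the selected domain and locale,
<CODE>gettext</CODE> simply returns the original string, so the marked
program keeps working in English.
</P>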
476 <P>
477 Once the C sources have been modified, the <CODE>xgettext</CODE> program
478 is used to find and extract all translatable strings, and create an
479 initial PO file out of all these. This <TT>`<VAR>package</VAR>.pot'</TT> file
480 contains all original program strings, along with sets of pointers to
481 exactly where in the C sources each string is used, and all translations
482 are set to empty. The letter <KBD>t</KBD> in <TT>`.pot'</TT> marks that this is
483 a Template PO file, not yet oriented towards any particular language.
484 See section <A HREF="gettext.html#SEC19">Invoking the <CODE>xgettext</CODE> Program</A>, for more details about how one calls the
485 <CODE>xgettext</CODE> program. If you are <EM>really</EM> lazy, you might
486 be interested in working a lot more right away, and preparing the
487 whole distribution setup (see section <A HREF="gettext.html#SEC65">The Maintainer's View</A>). By doing so, you
488 spare yourself typing the <CODE>xgettext</CODE> command, as <CODE>make</CODE>
489 should now generate the proper things automatically for you!
490
491 </P>
492 <P>
493 The first time through, there is no <TT>`<VAR>lang</VAR>.po'</TT> yet, so the
494 <CODE>tupdate</CODE> step may be skipped and replaced by a mere copy of
495 <TT>`<VAR>package</VAR>.pot'</TT> to <TT>`<VAR>lang</VAR>.pox'</TT>, where <VAR>lang</VAR>
496 represents the target language.
497
498 </P>
499 <P>
500 Then comes the initial translation of messages. Translation in
501 itself is a whole matter, still exclusively meant for humans,
502 and whose complexity far overwhelms the level of this manual.
503 Nevertheless, a few hints are given in some other chapter of this
504 manual (see section <A HREF="gettext.html#SEC54">The Translator's View</A>). You will also find there indications
505 about how to contact translating teams, or become part of them,
506 for sharing your translating concerns with others who target the same
507 native language.
508
509 </P>
510 <P>
511 While adding the translated messages into the <TT>`<VAR>lang</VAR>.pox'</TT>
512 PO file, if you do not have GNU Emacs handy, you are on your own
513 for ensuring that you fully respect the PO file format, and quoting
514 conventions (see section <A HREF="gettext.html#SEC9">The Format of PO Files</A>). This is surely not an impossible task,
515 as this is the way many people handled PO files already for Uniforum or
516 Solaris. On the other hand, using PO mode in GNU Emacs, most details
517 of PO file format are taken care of for you, but you have to acquire
518 some familiarity with PO mode itself. Besides main PO mode commands
519 (see section <A HREF="gettext.html#SEC10">Main Commands</A>), you should know how to move between entries
520 (see section <A HREF="gettext.html#SEC11">Entry Positioning</A>), and how to handle untranslated entries
521 (see section <A HREF="gettext.html#SEC24">Untranslated Entries</A>).
522
523 </P>
524 <P>
525 If some common translations have already been saved into a compendium
526 PO file, translators may use PO mode for initializing untranslated
527 entries from the compendium, and also save selected translations into
528 the compendium, updating it (see section <A HREF="gettext.html#SEC21">Using Translation Compendiums</A>). Compendium files
529 are meant to be exchanged between members of a given translation team.
530
531 </P>
532 <P>
533 Programs, or packages of programs, are dynamic in nature: users write
534 bug reports and suggestions for improvement, maintainers react by
535 modifying programs in various ways. The fact that a package has
536 already been internationalized should not make maintainers shy
537 of adding new strings, or modifying strings already translated.
538 They just do their job the best they can. For the GNU Translation
539 Project to work smoothly, it is important that maintainers do not
540 carry translation concerns on their already loaded shoulders, and that
541 translators be kept as free as possible of programmatic concerns.
542
543 </P>
544 <P>
545 The only concern maintainers should have is carefully marking new
546 strings as translatable, when they should be, and not otherwise
547 worrying about them being translated, as this will come in proper time.
548 Consequently, when programs and their strings are adjusted in various
549 ways by maintainers, and for matters usually unrelated to translation,
550 <CODE>xgettext</CODE> would construct <TT>`<VAR>package</VAR>.pot'</TT> files which are
551 evolving over time, so the translations carried by <TT>`<VAR>lang</VAR>.po'</TT>
552 are slowly fading out of date.
553
554 </P>
555 <P>
556 It is important for translators (and even maintainers) to understand
557 that package translation is a continuous process in the lifetime of a
558 package, and not something which is done once and for all at the start.
559 After an initial burst of translation activity for a given package,
560 interventions are needed once in a while, because here and there,
561 translated entries become obsolete, and new untranslated entries
562 appear, needing translation.
563
564 </P>
565 <P>
566 The <CODE>tupdate</CODE> program has the purpose of refreshing an already
567 existing <TT>`<VAR>lang</VAR>.po'</TT> file, by comparing it with a newer
568 <TT>`<VAR>package</VAR>.pot'</TT> template file, extracted by <CODE>xgettext</CODE>
569 out of recent C sources. The refreshing operation adjusts all
570 references to C source locations for strings, since these strings
571 move as programs are modified. Also, <CODE>tupdate</CODE> comments out as
572 obsolete, in <TT>`<VAR>lang</VAR>.pox'</TT>, those already translated entries
573 which are no longer used in the program sources (see section <A HREF="gettext.html#SEC25">Obsolete Entries</A>). It finally discovers new strings and inserts them in
574 the resulting PO file as untranslated entries (see section <A HREF="gettext.html#SEC24">Untranslated Entries</A>). See section <A HREF="gettext.html#SEC23">Invoking the <CODE>tupdate</CODE> Program</A>, for more information about what
575 <CODE>tupdate</CODE> really does.
576
577 </P>
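<P>
The comparison at the heart of this refresh can be pictured with a much
simplified sketch. It only looks at which original strings appear in the
old <TT>`<VAR>lang</VAR>.po'</TT> and in the new
<TT>`<VAR>package</VAR>.pot'</TT>; the names below are invented for the
illustration, and the real <CODE>tupdate</CODE> does more (updating source
references, for one) and may well work differently inside.
</P>

```c
#include <stddef.h>
#include <string.h>

/* Possible fates of an entry after the refresh.
   (Hypothetical names, for illustration only.) */
enum entry_status { ENTRY_KEPT, ENTRY_OBSOLETE, ENTRY_UNTRANSLATED };

/* Return 1 if `msgid` occurs among the `n` strings in `set`. */
static int contains(const char *const *set, size_t n, const char *msgid)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], msgid) == 0)
            return 1;
    return 0;
}

/* Classify one original string, assumed to occur in at least one of
   the two files:
   - absent from the new template: obsolete, to be commented out;
   - absent from the old PO file: new, inserted untranslated;
   - present in both: the existing translation is kept. */
enum entry_status classify_entry(const char *msgid,
                                 const char *const *old_po, size_t n_old,
                                 const char *const *new_pot, size_t n_new)
{
    if (!contains(new_pot, n_new, msgid))
        return ENTRY_OBSOLETE;
    if (!contains(old_po, n_old, msgid))
        return ENTRY_UNTRANSLATED;
    return ENTRY_KEPT;
}
```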
578 <P>
579 Whatever route or means taken, the goal is obtaining an updated
580 <TT>`<VAR>lang</VAR>.pox'</TT> file offering translations for all strings.
581 When this is properly achieved, this file <TT>`<VAR>lang</VAR>.pox'</TT> may
582 take the place of the previous official <TT>`<VAR>lang</VAR>.po'</TT> file.
583
584 </P>
585 <P>
586 The time mobility, or fluidity of PO files, is an integral part of
587 the translation game, and should be well understood, and accepted.
588 People resisting it will have a hard time participating in the GNU
589 Translation Project, or will give a hard time to other participants!
590 In particular, maintainers should relax and include all available PO
591 files in their distributions, even if these have not recently been
592 updated, without badgering or otherwise trying to exert pressure on the
593 translator teams to get the job done. The pressure should rather
594 come from the community of users speaking a particular language,
595 and maintainers should consider themselves fairly relieved of any
596 concern about the adequacy of translation files. On the other hand,
597 translators should reasonably try updating the PO files they are
598 responsible for, while the package is undergoing pretest, prior to
599 an official distribution.
600
601 </P>
602 <P>
603 Once the PO file is complete and dependable, the <CODE>msgfmt</CODE> program
604 is used for turning the PO file into a machine-oriented format, which
605 may yield efficient retrieval of translations by the programs of the
606 package, whenever needed at runtime (see section <A HREF="gettext.html#SEC31">The Format of GNU MO Files</A>). See section <A HREF="gettext.html#SEC30">Invoking the <CODE>msgfmt</CODE> Program</A>, for more information about all modalities of execution
607 for the <CODE>msgfmt</CODE> program.
608
609 </P>
610 <P>
611 Finally, the modified and marked C sources are compiled and linked
612 with the GNU <CODE>gettext</CODE> library, usually through the operation of
613 <CODE>make</CODE>, given that a suitable <TT>`Makefile'</TT> exists for the project,
614 and the resulting executable is installed somewhere users will find it.
615 The MO files themselves should also be properly installed. Provided the
616 appropriate environment variables are set (see section <A HREF="gettext.html#SEC35">Magic for End Users</A>), the
617 program should localize itself automatically, whenever it executes.
618
619 </P>
620 <P>
621 The remainder of this manual elaborates on the various
622 steps outlined in this section.
623
624 </P>
625
626
627 <H1><A NAME="SEC7" HREF="gettext_toc.html#TOC7">PO Files and PO Mode Basics</A></H1>
628
629 <P>
630 The GNU <CODE>gettext</CODE> toolset helps programmers and translators
631 in producing, updating and using translation files, mainly those
632 PO files which are textual, editable files. This chapter concentrates
633 on the format of PO files, and contains a PO mode starter. The PO mode
634 description is spread over this manual instead of being concentrated
635 in one place; this chapter presents only the basics of PO mode.
636
637 </P>
638
639
640
641 <H2><A NAME="SEC8" HREF="gettext_toc.html#TOC8">Completing GNU <CODE>gettext</CODE> Installation</A></H2>
642
643 <P>
644 Once you have received, unpacked, configured and compiled the GNU
645 <CODE>gettext</CODE> distribution, the <SAMP>`make install'</SAMP> command puts in
646 place the programs <CODE>xgettext</CODE>, <CODE>msgfmt</CODE>, <CODE>gettext</CODE>, and
647 <CODE>tupdate</CODE>, as well as their available message catalogs. For
648 completing a comfortable installation, you might also want to make the
649 PO mode available to your GNU Emacs users.
650
651 </P>
652 <P>
653 To finish the installation of the PO mode, you might want to modify your
654 file <TT>`.emacs'</TT>, once and for all, so it contains a few lines looking
655 like:
656
657 </P>
658
659 <PRE>
660 (setq auto-mode-alist
661 (cons '("\\.pox?\\'" . po-mode) auto-mode-alist))
662 (autoload 'po-mode "po-mode")
663 </PRE>
664
665 <P>
Later, whenever you edit some <TT>`.po'</TT> or <TT>`.pox'</TT> file, Emacs
loads <TT>`po-mode.elc'</TT> (or <TT>`po-mode.el'</TT>) as needed, and
automatically activates PO mode commands for the associated buffer.
669 The string <EM>PO</EM> appears in the mode line for any buffer for
670 which PO mode is active. Many PO files may be active at once in a
671 single Emacs session.
672
673 </P>
674
675
676 <H2><A NAME="SEC9" HREF="gettext_toc.html#TOC9">The Format of PO Files</A></H2>
677
678 <P>
679 A PO file is made up of many entries, each entry holding the relation
680 between an original untranslated string and its corresponding
681 translation. All entries in a given PO file usually pertain
682 to a single project, and all translations are expressed in a single
683 target language. One PO file <STRONG>entry</STRONG> has the following schematic
684 structure:
685
686 </P>
687
688 <PRE>
689 <VAR>white-space</VAR>
690 # <VAR>translator-comments</VAR>
691 #. <VAR>automatic-comments</VAR>
692 #: <VAR>reference</VAR>...
693 msgid <VAR>untranslated-string</VAR>
694 msgstr <VAR>translated-string</VAR>
695 </PRE>
696
697 <P>
698 The general structure of a PO file should be well understood by
699 the translator. When using PO mode, very little has to be known
700 about the format details, as PO mode takes care of them for her.
701
702 </P>
703 <P>
704 Entries begin with some optional white space. Usually, when generated
705 through GNU <CODE>gettext</CODE> tools, there is exactly one blank line
706 between entries. Then comments follow, on lines all starting with the
character <KBD>#</KBD>. There are two kinds of comments: those which have
some white space immediately following the <KBD>#</KBD> are
created and maintained exclusively by the translator, while those which
have some non-white character just after the <KBD>#</KBD>
are created and maintained automatically by GNU <CODE>gettext</CODE> tools.
712 All comments, of any kind, are optional.
713
714 </P>
715 <P>
716 After white space and comments, entries show two strings, giving
717 first the untranslated string as it appears in the original program
718 sources, and then, the translation of this string. The original
719 string is introduced by the keyword <CODE>msgid</CODE>, and the translation,
720 by <CODE>msgstr</CODE>. The two strings, untranslated and translated,
721 are quoted in various ways in the PO file, using <KBD>"</KBD>
722 delimiters and <KBD>\</KBD> escapes, but the translator does not really
have to pay attention to the precise quoting format, as PO mode fully
intends to take care of quoting for her.
725
726 </P>
727 <P>
728 The <CODE>msgid</CODE> strings, as well as automatic comments, are produced
729 and managed by other GNU <CODE>gettext</CODE> tools, and PO mode does not
provide means for the translator to alter these. The most she can
do is merely delete them, and only by deleting the whole entry.
732 On the other hand, the <CODE>msgstr</CODE> string, as well as translator
733 comments, are really meant for the translator, and PO mode gives her
734 the full control she needs.
735
736 </P>
737 <P>
738 It happens that some lines, usually whitespace or comments, follow the
739 very last entry of a PO file. Such lines are not part of any entry,
740 and PO mode is unable to take action on those lines. By using the
741 PO mode function <KBD>M-x po-normalize</KBD>, the translator may get
742 rid of those spurious lines. See section <A HREF="gettext.html#SEC12">Normalizing Strings in Entries</A>.
743
744 </P>
745 <P>
The remainder of this section may be safely skipped by those using
PO mode, yet it may be interesting for everybody to have a better
idea of the precise format of a PO file. On the other hand, those
not having GNU Emacs handy should continue reading carefully.
750
751 </P>
752 <P>
753 Each of <VAR>untranslated-string</VAR> and <VAR>translated-string</VAR> respects
754 the C syntax for a character string, including the surrounding quotes
and embedded backslashed escape sequences. When the time comes
756 to write multi-line strings, one should not use escaped newlines.
757 Instead, a closing quote should follow the last character on the
758 line to be continued, and an opening quote should resume the string
759 at the beginning of the following PO file line. For example:
760
761 </P>
762
763 <PRE>
764 msgid ""
765 "Here is an example of how one might continue a very long string\n"
766 "for the common case the string represents multi-line output.\n"
767 </PRE>
768
769 <P>
In this example, the empty string is used on the first line, to
allow better alignment of the <KBD>H</KBD> from the word <SAMP>`Here'</SAMP>
over the <KBD>f</KBD> from the word <SAMP>`for'</SAMP>. The
<CODE>msgid</CODE> keyword is followed by three strings, which are meant
to be concatenated. Concatenating the empty string does not change
the resulting overall string, but it is a way for us to comply with
the necessity of <CODE>msgid</CODE> to be followed by a string on the same
line, while keeping the multi-line presentation left-justified, as
we find this to be a cleaner disposition. The empty string could have
been omitted, but only if the string starting with <SAMP>`Here'</SAMP> was
moved up to the first line, right after <CODE>msgid</CODE>.<A NAME="DOCF1" HREF="gettext_foot.html#FOOT1">(1)</A> Nor was it really necessary
to switch between the last two quoted strings immediately after
the newline <SAMP>`\n'</SAMP>; the switch could have occurred after <EM>any</EM>
other character. We just did it this way because it is neater.
784
785 </P>
786 <P>
One should carefully distinguish between end of lines marked as
<SAMP>`\n'</SAMP> <EM>inside</EM> quotes, which are part of the represented
string, and end of lines in the PO file itself, outside string quotes,
which have no effect on the represented string.
791
792 </P>
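<P>
The concatenation and quoting rules above can be sketched in C. This is
only an illustration, not code taken from GNU <CODE>gettext</CODE>: the
function name <CODE>po_decode</CODE> is hypothetical, and the sketch
assumes that only the <SAMP>`\n'</SAMP>, <SAMP>`\t'</SAMP>, <SAMP>`\\'</SAMP>
and <SAMP>`\"'</SAMP> escapes need handling.
</P>

```c
#include <stddef.h>

/* Decode the quoted segments that follow msgid or msgstr into OUT.
   Adjacent segments are concatenated; PO file line breaks between
   segments are ignored.  Returns the decoded length, or -1 on
   malformed input or when OUT is too small.  */
long
po_decode (const char *src, char *out, size_t outsz)
{
  size_t n = 0;

  for (;;)
    {
      /* Skip white space, including line breaks, between segments.  */
      while (*src == ' ' || *src == '\t' || *src == '\n' || *src == '\r')
        src++;
      if (*src != '"')
        break;
      src++;                            /* opening quote */
      while (*src != '"')
        {
          char c = *src++;

          if (c == '\0' || c == '\n')
            return -1;                  /* unterminated segment */
          if (c == '\\')
            {
              c = *src++;
              switch (c)
                {
                case 'n': c = '\n'; break;
                case 't': c = '\t'; break;
                case '\\': case '"': break;
                default: return -1;     /* escape not handled here */
                }
            }
          if (n + 1 >= outsz)
            return -1;                  /* no room left in OUT */
          out[n++] = c;
        }
      src++;                            /* closing quote */
    }
  if (*src != '\0' || outsz == 0)
    return -1;
  out[n] = '\0';
  return (long) n;
}
```

<P>
Given the three segments of a multi-line entry, the sketch yields one
string in which only the <SAMP>`\n'</SAMP> escapes inside the quotes
became newlines. The line breaks of the PO file itself leave no trace.
</P>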
793 <P>
794 Outside strings, white lines and comments may be used freely.
795 Comments start at the beginning of a line with <SAMP>`#'</SAMP> and extend
796 until the end of the PO file line. Comments written by translators
797 should have the initial <SAMP>`#'</SAMP> immediately followed by some white
798 space. If the <SAMP>`#'</SAMP> is not immediately followed by white space,
799 this comment is most likely generated and managed by specialized GNU
tools, and might disappear or be replaced unexpectedly when the PO
file is given to <CODE>tupdate</CODE>.
802
803 </P>
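<P>
For those handling PO files without PO mode, the distinction between
the kinds of lines may be sketched as follows. The names
<CODE>po_line_kind</CODE> and <CODE>starts_keyword</CODE> are
hypothetical, and the sketch assumes each line is examined on its own,
starting at its first column.
</P>

```c
#include <string.h>

enum po_line_kind
{
  PO_BLANK,
  PO_TRANSLATOR_COMMENT,        /* `#' followed by white space */
  PO_AUTOMATIC_COMMENT,         /* `#.' and other tool-maintained comments */
  PO_REFERENCE,                 /* `#:' source references */
  PO_MSGID,
  PO_MSGSTR,
  PO_OTHER                      /* for instance a continuation string */
};

/* Nonzero if LINE starts with KEYWORD followed by a space or a tab.  */
static int
starts_keyword (const char *line, const char *keyword)
{
  size_t n = strlen (keyword);

  return strncmp (line, keyword, n) == 0
         && (line[n] == ' ' || line[n] == '\t');
}

enum po_line_kind
po_line_kind (const char *line)
{
  if (line[0] == '#')
    {
      char next = line[1];

      /* White space right after `#' marks a translator comment.  */
      if (next == '\0' || next == ' ' || next == '\t' || next == '\n')
        return PO_TRANSLATOR_COMMENT;
      return next == ':' ? PO_REFERENCE : PO_AUTOMATIC_COMMENT;
    }
  if (starts_keyword (line, "msgid"))
    return PO_MSGID;
  if (starts_keyword (line, "msgstr"))
    return PO_MSGSTR;
  if (line[strspn (line, " \t\r\n")] == '\0')
    return PO_BLANK;
  return PO_OTHER;
}
```

<P>
A tool built this way should leave alone anything it classifies as a
translator comment, since those lines belong to the translator.
</P>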
804
805
806 <H2><A NAME="SEC10" HREF="gettext_toc.html#TOC10">Main Commands</A></H2>
807
808 <P>
When Emacs finds a PO file in a window, PO mode is activated
for that window. This makes the window read-only and establishes a
po-mode-map. PO mode is a genuine Emacs mode, in that it is
not derived from text mode in any way.
813
814 </P>
815 <P>
The main PO commands are those which do not fit in the other categories in
subsequent sections; they allow for quitting PO mode or managing windows
in special ways.
819
820 </P>
821 <DL COMPACT>
822
823 <DT><KBD>u</KBD>
824 <DD>
825 Undo last modification to the PO file.
826
827 <DT><KBD>q</KBD>
828 <DD>
829 Quit processing and save the PO file.
830
831 <DT><KBD>o</KBD>
832 <DD>
Temporarily leave the PO file window.
834
835 <DT><KBD>h</KBD>
836 <DD>
837 Show help about PO mode.
838
839 <DT><KBD>=</KBD>
840 <DD>
841 Give some PO file statistics.
842
843 <DT><KBD>v</KBD>
844 <DD>
845 Batch validate the format of the whole PO file.
846
847 </DL>
848
849 <P>
850 The command <KBD>u</KBD> (<CODE>po-undo</CODE>) interfaces to the GNU Emacs
851 <EM>undo</EM> facility. See section `Undoing Changes' in <CITE>The Emacs Editor</CITE>. Each time <KBD>u</KBD> is typed, modifications the translator
made to the PO file are undone a little more. For the purpose of
undoing, each PO mode command is atomic. This is especially true for
the <KBD><KBD>RET</KBD></KBD> command: the whole edit made by a single
use of this command is undone at once, even if the edit itself
involved several actions. However, while in the editing window, one
can undo the editing work in much smaller steps.
858
859 </P>
860 <P>
861 The command <KBD>q</KBD> (<CODE>po-quit</CODE>) is used when the translator is
862 done with the PO file. If the file has been modified, it is saved
on disk first. However, prior to all this, the command checks whether
some untranslated messages remain in the PO file and, if so, the
translator is asked whether she really wants to stop working with this
PO file. This is the preferred way of getting rid of an Emacs PO
file buffer. Merely killing it through the usual command <KBD>C-x
k</KBD> (<CODE>kill-buffer</CODE>), say, has the unpleasant effect of leaving a PO
internal work buffer behind.
870
871 </P>
872 <P>
The command <KBD>o</KBD> (<CODE>po-other-window</CODE>) is another, softer
way to leave PO mode, temporarily. It just moves the cursor to
some other Emacs window, and pops one up if necessary. For example, if
the translator just got PO mode to show some source context in some
other window, she might discover some apparent bug in the program source
that needs correction. This command allows the translator to change
sex, become a programmer, and have the cursor right in the window
containing the program she (or rather <EM>he</EM>) wants to modify.
881 By later getting the cursor back in the PO file window, or by
882 asking Emacs to edit this file once again, PO mode is then recovered.
883
884 </P>
885 <P>
886 The command <KBD>h</KBD> (<CODE>po-help</CODE>) displays a summary of all
887 available PO mode commands. The translator should then type any
888 character to resume normal PO mode operations. The command <KBD>?</KBD>
889 has the same effect as <KBD>h</KBD>.
890
891 </P>
892 <P>
893 The command <KBD>=</KBD> (<CODE>po-statistics</CODE>) computes the total number
894 of entries in the PO file, the ordinal of the current entry
895 (counted from 1), the number of untranslated entries, the number of
896 obsolete entries, and displays all these numbers.
897
898 </P>
899 <P>
900 The command <KBD>v</KBD> (<CODE>po-validate</CODE>) launches <CODE>msgfmt</CODE> in
901 verbose mode over the current PO file. This command first offers
902 to save the current PO file on disk. The <CODE>msgfmt</CODE> tool, from
903 GNU <CODE>gettext</CODE>, has the purpose of creating an MO file out of a
904 PO file, and PO mode uses the features of this program for checking
905 the overall format of a PO file, as well as all individual entries.
906
907 </P>
908 <P>
909 The program <CODE>msgfmt</CODE> runs asynchronously with Emacs, so
910 the translator regains control immediately while her PO file
911 is being studied. Error output is collected in the GNU Emacs
912 <SAMP>`*compilation*'</SAMP> buffer, displayed in another window. The regular
913 GNU Emacs command <KBD>C-x`</KBD> (<CODE>next-error</CODE>), as well as other
914 usual compile commands, allow the translator to reposition quickly to
the offending parts of the PO file. Once the cursor is on the line in
error, the translator may decide on any PO mode action which would
help correct the error.
918
919 </P>
920
921
922 <H2><A NAME="SEC11" HREF="gettext_toc.html#TOC11">Entry Positioning</A></H2>
923
924 <P>
925 The cursor in a PO file window is almost always part of
926 an entry. The only exceptions are the special case when the cursor
927 is after the last entry in the file, or when the PO file is
empty. The entry where the cursor is found is said to be the
current entry. Many PO mode commands operate on the current entry,
so moving the cursor does more than allow the translator to browse
the PO file; it also selects the entry on which commands operate.
932
933 </P>
934 <P>
Some PO mode commands alter the position of the cursor in a specialized
way. A few of these special-purpose positioning commands are described here;
the others are described in following sections.
938
939 </P>
940 <DL COMPACT>
941
942 <DT><KBD>.</KBD>
943 <DD>
944 Redisplay the current entry.
945
<DT><KBD>n</KBD>
<DT><KBD>SPC</KBD>
<DD>
Select the entry after the current one.

<DT><KBD>p</KBD>
<DT><KBD>DEL</KBD>
<DD>
Select the entry before the current one.
957
958 <DT><KBD>&#60;</KBD>
959 <DD>
960 Select the first entry in the PO file.
961
962 <DT><KBD>&#62;</KBD>
963 <DD>
964 Select the last entry in the PO file.
965
966 <DT><KBD>m</KBD>
967 <DD>
968 Record the location of the current entry for later use.
969
970 <DT><KBD>l</KBD>
971 <DD>
972 Return to a previously saved entry location.
973
974 <DT><KBD>x</KBD>
975 <DD>
976 Exchange the current entry location with the previously saved one.
977
978 </DL>
979
980 <P>
981 Any GNU Emacs command able to reposition the cursor may be used
982 to select the current entry in PO mode, including commands which
983 move by characters, lines, paragraphs, screens or pages, and search
984 commands. However, there is a kind of standard way to display the
985 current entry in PO mode, which usual GNU Emacs commands moving
986 the cursor do not especially try to enforce. The command <KBD>.</KBD>
987 (<CODE>po-current-entry</CODE>) has the sole purpose of redisplaying the
988 current entry properly, after the current entry has been changed by
989 means external to PO mode, or the Emacs screen otherwise altered.
990
991 </P>
992 <P>
It is yet to be decided whether PO mode would help the translator, or otherwise
994 irritate her, by forcing a more fixed window disposition while she
995 is doing her work. We originally had quite precise ideas about
996 how windows should behave, but on the other hand, anyone used to
997 GNU Emacs is often happy to keep full control. Maybe a fixed window
998 disposition might be offered as a PO mode option that the translator
999 might activate or deactivate at will, so it could be offered on an
1000 experimental basis. If nobody feels a real need for using it, or
1001 a compulsion for writing it, we might as well drop this whole idea.
1002 The incentive for doing it should come from translators rather than
1003 programmers, as opinions from an experienced translator are surely
worth more to me than opinions from programmers <EM>thinking</EM> about
1005 how <EM>others</EM> should do translation.
1006
1007 </P>
1008 <P>
The commands <KBD>n</KBD> (<CODE>po-next-entry</CODE>) and <KBD>p</KBD>
(<CODE>po-previous-entry</CODE>) move the cursor to the entry following,
1011 or preceding, the current one. If <KBD>n</KBD> is given while the
1012 cursor is on the last entry of the PO file, or if <KBD>p</KBD>
1013 is given while the cursor is on the first entry, no move is done.
1014 <KBD><KBD>SPC</KBD></KBD> and <KBD><KBD>DEL</KBD></KBD> are alternate keys for <KBD>n</KBD> and
1015 <KBD>p</KBD>, respectively.
1016
1017 </P>
1018 <P>
1019 The commands <KBD>&#60;</KBD> (<CODE>po-first-entry</CODE>) and <KBD>&#62;</KBD>
1020 (<CODE>po-last-entry</CODE>) move the cursor to the first entry, or last
1021 entry, of the PO file. When the cursor is located past the last
1022 entry in a PO file, most PO mode commands will return an error saying
1023 <SAMP>`After last entry'</SAMP>. However, the commands <KBD>&#60;</KBD> and <KBD>&#62;</KBD>
have the special property of being able to work even when the cursor
is not within any PO file entry, and you may use them for nicely
1026 correcting this situation. But even these commands will fail on a
1027 truly empty PO file. There are development plans for PO mode for it
1028 to interactively fill an empty PO file from sources. See section <A HREF="gettext.html#SEC16">Marking Translatable Strings</A>.
1029
1030 </P>
1031 <P>
The translator may decide, before working at the translation of
a particular entry, that she needs to browse the remainder of the
PO file, maybe for finding the terminology or phraseology used
in related entries. She can of course use the standard Emacs idioms
for saving the current cursor location in some register, and use that
register for getting back, or else use the location ring.
1038
1039 </P>
1040 <P>
PO mode offers another approach, by which cursor locations may be saved
onto a special stack. The command <KBD>m</KBD> (<CODE>po-push-location</CODE>)
merely adds the location of the current entry to the stack, pushing
the already saved locations under the new one. The command
<KBD>l</KBD> (<CODE>po-pop-location</CODE>) consumes the top stack element and
repositions the cursor to the entry associated with that top element.
This position is then lost, for the next <KBD>l</KBD> will move the cursor
to the previously saved location, and so on until no locations remain
on the stack.
1050
1051 </P>
1052 <P>
1053 If the translator wants the position to be kept on the location stack,
1054 maybe for taking a mere look at the entry associated with the top
1055 element, then go elsewhere with the intent of getting back later, she
1056 ought to use <KBD>m</KBD> immediately after <KBD>l</KBD>.
1057
1058 </P>
1059 <P>
The command <KBD>x</KBD> (<CODE>po-exchange-location</CODE>) simultaneously
repositions the cursor to the entry associated with the top element of
the stack of saved locations, and replaces that top element with the
location of the current entry before the move. Consequently, repeating
the <KBD>x</KBD> command alternates between two entries.
To achieve this, the translator will position the cursor on the
first entry, use <KBD>m</KBD>, then position to the second entry, and
merely use <KBD>x</KBD> for making the switch.
1068
1069 </P>
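<P>
The behaviour of <KBD>m</KBD>, <KBD>l</KBD> and <KBD>x</KBD> described
above may be modelled by a small stack, as in the following sketch.
It is not the Emacs Lisp code of PO mode: the C names, the use of
<CODE>long</CODE> for an entry location, and the fixed limit
<CODE>PO_STACK_MAX</CODE> are all assumptions made for illustration.
</P>

```c
#include <stddef.h>

#define PO_STACK_MAX 32         /* arbitrary limit for this sketch */

struct po_locations
{
  long pos[PO_STACK_MAX];       /* saved locations, top is pos[depth - 1] */
  size_t depth;
};

/* m: save CURRENT on top of the stack.  Returns 0 if the stack is full.  */
int
po_push_location (struct po_locations *s, long current)
{
  if (s->depth == PO_STACK_MAX)
    return 0;
  s->pos[s->depth++] = current;
  return 1;
}

/* l: move to the top saved location and drop it from the stack.
   Returns 0, leaving *CURRENT alone, if nothing was saved.  */
int
po_pop_location (struct po_locations *s, long *current)
{
  if (s->depth == 0)
    return 0;
  *current = s->pos[--s->depth];
  return 1;
}

/* x: move to the top saved location, and save in its place the
   location just left.  Returns 0 if nothing was saved.  */
int
po_exchange_location (struct po_locations *s, long *current)
{
  long saved;

  if (s->depth == 0)
    return 0;
  saved = s->pos[s->depth - 1];
  s->pos[s->depth - 1] = *current;
  *current = saved;
  return 1;
}
```

<P>
In this model, two successive exchanges bring the cursor back where it
started, and a push made right after a pop restores the stack as it was
before the pop.
</P>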
1070
1071
1072 <H2><A NAME="SEC12" HREF="gettext_toc.html#TOC12">Normalizing Strings in Entries</A></H2>
1073
1074 <P>
There are many different ways of encoding a particular string into a
PO file entry, because there are so many different ways to split and
quote multi-line strings, and even to represent special characters
by backslashed escape sequences. Some features of PO mode rely on
the ability of PO mode to scan an already existing PO file for a
particular string encoded into the <CODE>msgid</CODE> field of some entry.
Even if PO mode has internally all the built-in machinery for
implementing this recognition easily, doing it fast is technically
difficult. To facilitate a solution to this efficiency problem,
we decided on a canonical representation for strings.
1085
1086 </P>
1087 <P>
1088 A conventional representation of strings in a PO file is currently
under discussion, and PO mode experiments with a canonical representation.
1090 Having both <CODE>xgettext</CODE> and PO mode converging towards a uniform
1091 way of representing equivalent strings would be useful, as the internal
1092 normalization needed by PO mode could be automatically satisfied
1093 when using <CODE>xgettext</CODE> from GNU <CODE>gettext</CODE>. An explicit
1094 PO mode normalization should then be only necessary for PO files
1095 imported from elsewhere, or for when the convention itself evolves.
1096
1097 </P>
1098 <P>
1099 So, for achieving normalization of at least the strings of a given
1100 PO file needing a canonical representation, the following PO mode
1101 command is available:
1102
1103 </P>
1104 <DL COMPACT>
1105
1106 <DT><KBD>M-x po-normalize</KBD>
1107 <DD>
1108 Tidy the whole PO file by making entries more uniform.
1109
1110 </DL>
1111
1112 <P>
The special command <KBD>M-x po-normalize</KBD>, which has no associated
keys, revises all entries, ensuring that strings of both original
1115 and translated entries use uniform internal quoting in the PO file.
1116 It also removes any crumb after the last entry. This command may be
1117 useful for PO files freshly imported from elsewhere, or if we ever
1118 improve on the canonical quoting format we use. This canonical format
1119 is not only meant for getting cleaner PO files, but also for greatly
1120 speeding up <CODE>msgid</CODE> string lookup for some other PO mode commands.
1121
1122 </P>
1123 <P>
1124 <KBD>M-x po-normalize</KBD> presently makes three passes over the entries.
1125 The first implements heuristics for converting PO files for GNU
1126 <CODE>gettext</CODE> 0.6 and earlier, in which <CODE>msgid</CODE> and <CODE>msgstr</CODE>
1127 fields were using K&#38;R style C string syntax for multi-line strings.
1128 These heuristics may fail for comments not related to obsolete
1129 entries and ending with a backslash; they also depend on subsequent
1130 passes for finalizing the proper commenting of continued lines for
obsolete entries. This first pass might disappear once all older PO
files have been adjusted. The second and third passes normalize
1133 all <CODE>msgid</CODE> and <CODE>msgstr</CODE> strings respectively. They also
1134 clean out those trailing backslashes used by XView's <CODE>msgfmt</CODE>
1135 for continued lines.
1136
1137 </P>
1138 <P>
1139 Having such an explicit normalizing command allows for importing PO
1140 files from other sources, but also eases the evolution of the current
1141 convention, evolution driven mostly by aesthetic concerns, as of now.
It is quite easy to make suggested adjustments at a later time, as the
normalizing command and, eventually, other GNU <CODE>gettext</CODE> tools
1144 should greatly automate conformance. A description of the canonical
1145 string format is given below, for the particular benefit of those not
1146 having GNU Emacs handy, and who would nevertheless want to handcraft
1147 their PO files in nice ways.
1148
1149 </P>
1150 <P>
1151 Right now, in PO mode, strings are single line or multi-line. A string
1152 goes multi-line if and only if it has <EM>embedded</EM> newlines, that
1153 is, if it matches <SAMP>`[^\n]\n+[^\n]'</SAMP>. So, we would have:
1154
1155 </P>
1156
1157 <PRE>
1158 msgstr "\n\nHello, world!\n\n\n"
1159 </PRE>
1160
1161 <P>
1162 but, replacing the space by a newline, this becomes:
1163
1164 </P>
1165
1166 <PRE>
1167 msgstr ""
1168 "\n"
1169 "\n"
1170 "Hello,\n"
1171 "world!\n"
1172 "\n"
1173 "\n"
1174 </PRE>
1175
1176 <P>
We are deliberately using a caricatured example, here, to make the
point clearer. Usually, multi-line strings do not look that bad.
It is probable that we will implement the following suggestion.
We might lump together all initial newlines into the empty string,
and also all newlines introducing empty lines (that is, of a run of
<VAR>n</VAR> &#62; 1 newlines, the last <VAR>n</VAR>-1 would go together on a separate
string), so making the previous example appear:
1184
1185 </P>
1186
1187 <PRE>
1188 msgstr "\n\n"
1189 "Hello,\n"
1190 "world!\n"
1191 "\n\n"
1192 </PRE>
1193
1194 <P>
1195 There are a few yet undecided little points about string normalization,
1196 to be documented in this manual, once these questions settle.
1197
1198 </P>
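<P>
For those wanting to handcraft PO files in the current canonical form,
the rule stated above (a string goes multi-line if and only if it
matches <SAMP>`[^\n]\n+[^\n]'</SAMP>) might be sketched in C as follows.
This is not the PO mode implementation. The function names are
hypothetical, only a few escapes are produced, and the suggested
lumping of newlines is deliberately left out.
</P>

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if S matches [^\n]\n+[^\n], that is, has embedded newlines.  */
static int
has_embedded_newline (const char *s)
{
  const char *p;

  for (p = strchr (s, '\n'); p != NULL; p = strchr (p, '\n'))
    {
      const char *q = p + strspn (p, "\n");     /* end of the newline run */

      if (p > s && *q != '\0')
        return 1;
      p = q;
    }
  return 0;
}

/* Append STR to OUT, of capacity SIZE and current length *LEN.
   Returns 0 instead of overflowing.  */
static int
append (char *out, size_t size, size_t *len, const char *str)
{
  size_t n = strlen (str);

  if (*len + n >= size)
    return 0;
  memcpy (out + *len, str, n + 1);
  *len += n;
  return 1;
}

/* Append character C, escaped as needed inside a quoted string.  */
static int
append_escaped (char *out, size_t size, size_t *len, char c)
{
  char tmp[3] = { '\\', '\0', '\0' };

  switch (c)
    {
    case '\n': tmp[1] = 'n'; break;
    case '\t': tmp[1] = 't'; break;
    case '"': case '\\': tmp[1] = c; break;
    default: tmp[0] = c; break;
    }
  return append (out, size, len, tmp);
}

/* Write the msgstr field for S into OUT.  Returns 0 if OUT is too small.  */
int
po_format_msgstr (const char *s, char *out, size_t size)
{
  size_t len = 0;
  int multi = has_embedded_newline (s);
  int ok;

  if (size == 0)
    return 0;
  out[0] = '\0';
  ok = append (out, size, &len, multi ? "msgstr \"\"\n\"" : "msgstr \"");
  for (; ok && *s != '\0'; s++)
    {
      ok = append_escaped (out, size, &len, *s);
      /* In multi-line form, close the segment after each newline and
         open a new one if more text follows.  */
      if (ok && multi && *s == '\n' && s[1] != '\0')
        ok = append (out, size, &len, "\"\n\"");
    }
  return ok && append (out, size, &len, "\"\n");
}
```

<P>
Applied to the two <SAMP>`Hello'</SAMP> strings shown earlier in this
section, the sketch produces the single line form for the first and the
seven line form for the second.
</P>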
1199
1200
1201 <H1><A NAME="SEC13" HREF="gettext_toc.html#TOC13">Preparing Program Sources</A></H1>
1202
1203 <P>
1204 For the programmer, changes to the C source code fall into three
1205 categories. First, you have to make the localization functions
1206 known to all modules needing message translation. Second, you should
1207 properly trigger the operation of GNU <CODE>gettext</CODE> when the program
1208 initializes, usually from the <CODE>main</CODE> function. Last, you should
1209 identify and especially mark all constant strings in your program
1210 needing translation.
1211
1212 </P>
1213 <P>
1214 Presuming that your set of programs, or package, has been adjusted
1215 so all needed GNU <CODE>gettext</CODE> files are available, and your
1216 <TT>`Makefile'</TT> files are adjusted (see section <A HREF="gettext.html#SEC65">The Maintainer's View</A>), each C module
1217 having translated C strings should contain the line:
1218
1219 </P>
1220
1221 <PRE>
1222 #include &#60;libintl.h&#62;
1223 </PRE>
1224
1225 <P>
1226 The remaining changes to your C sources are discussed in the further
1227 sections of this chapter.
1228
1229 </P>
1230
1231
1232
1233 <H2><A NAME="SEC14" HREF="gettext_toc.html#TOC14">Triggering <CODE>gettext</CODE> Operations</A></H2>
1234
1235 <P>
1236 The initialization of locale data should be done with more or less
1237 the same code in every program, as demonstrated below:
1238
1239 </P>
1240
1241 <PRE>
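int
main (argc, argv)
     int argc;
     char **argv;
{
  ...
  /* Select the locale requested by the user's environment.  */
  setlocale (LC_ALL, "");
  /* Tell gettext where the message catalogs of PACKAGE are installed.  */
  bindtextdomain (PACKAGE, LOCALEDIR);
  /* Make PACKAGE the default domain for later gettext calls.  */
  textdomain (PACKAGE);
  ...
}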
1253 </PRE>
1254
1255 <P>
1256 <VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> should be provided either by
1257 <TT>`config.h'</TT> or by the Makefile. For now consult the <CODE>gettext</CODE>
1258 sources for more information.
1259
1260 </P>
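<P>
Some programs may prefer to keep this initialization in a helper
function. The following sketch is one possible arrangement: the name
<CODE>init_translations</CODE> is hypothetical, and the package name
and locale directory are passed as arguments here instead of coming
from <VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR>.
</P>

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Set up message translation for PACKAGE, whose catalogs are assumed
   to be installed under LOCALEDIR.  Returns 1 on success, 0 otherwise.  */
int
init_translations (const char *package, const char *localedir)
{
  /* A NULL result only means the requested locale is unknown on this
     system; the program then keeps running in the "C" locale.  */
  setlocale (LC_ALL, "");

  if (bindtextdomain (package, localedir) == NULL)
    return 0;
  return textdomain (package) != NULL;
}
```

<P>
A failure here should normally not stop the program. Messages then
simply appear untranslated.
</P>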
1261 <P>
The use of <CODE>LC_ALL</CODE> might not be appropriate for you.
<CODE>LC_ALL</CODE> includes all locale categories and especially
<CODE>LC_CTYPE</CODE>. This latter category is responsible for determining
character classes with the <CODE>isalnum</CODE> etc. functions from
<TT>`ctype.h'</TT>, which could be wrong, especially for programs which
process some kind of input language. For example, this would mean that
source code using the &ccedil; (cedilla character) is runnable in
France but not in the U.S.
1270
1271 </P>
1272 <P>
1273 So it is sometimes necessary to replace the <CODE>LC_ALL</CODE> line in the
1274 code above by a sequence of <CODE>setlocale</CODE> lines
1275
1276 </P>
1277
1278 <PRE>
1279 {
1280 ...
1281 setlocale (LC_TIME, "");
1282 setlocale (LC_MESSAGES, "");
1283 ...
1284 }
1285 </PRE>
1286
1287 <P>
or to switch back and forth for the character class in question.
1289
1290 </P>
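<P>
Switching back and forth might look like the following sketch, which
classifies a character according to the <SAMP>`C'</SAMP> locale whatever
locale the user selected, then restores the previous setting. The
function name <CODE>is_identifier_char</CODE> is hypothetical, as is the
idea that the input language allows letters, digits and underscores in
identifiers.
</P>

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return nonzero if C may appear in an identifier of the input
   language, judged by the "C" locale rather than the user's locale.  */
int
is_identifier_char (int c)
{
  const char *current = setlocale (LC_CTYPE, NULL);
  char *saved = NULL;
  int result;

  /* The string returned by setlocale may be overwritten by later
     calls, so keep a private copy of the current setting.  */
  if (current != NULL)
    {
      saved = malloc (strlen (current) + 1);
      if (saved != NULL)
        strcpy (saved, current);
    }

  setlocale (LC_CTYPE, "C");
  result = isalnum ((unsigned char) c) || c == '_';

  /* Switch back to the character class in effect before the call.  */
  if (saved != NULL)
    {
      setlocale (LC_CTYPE, saved);
      free (saved);
    }
  return result;
}
```

<P>
Switching locales for every character is costly. A real scanner would
rather switch once around the whole parsing phase, or simply never set
<CODE>LC_CTYPE</CODE> from the environment.
</P>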
1291
1292
<H2><A NAME="SEC15" HREF="gettext_toc.html#TOC15">How Marks Appear in Sources</A></H2>
1294
1295 <P>
1296 The C sources should mark all strings requiring translation. Marking
1297 is done in such a way that each translatable string appears to be
1298 the sole argument of some function or preprocessor macro. There are
1299 only a few such possible functions or macros meant for translation,
1300 and their names are said to be marking keywords. The marking is
1301 attached to strings themselves, rather than to what we do with them.
1302 This approach has more uses. A blatant example is an error message
1303 produced by formatting. The format string needs translation, as
1304 well as some strings inserted through some <SAMP>`%s'</SAMP> specification
1305 in the format, while the result from <CODE>sprintf</CODE> may have so many
different instances that it is impractical to list them all in some
1307 <SAMP>`error_string_out()'</SAMP> routine, say.
1308
1309 </P>
1310 <P>
1311 This marking operation has two goals. The first goal of marking
1312 is for triggering the retrieval of the translation, at run time.
The keywords are possibly resolved into a routine able to dynamically
return the proper translation, as far as possible or wanted, for the
argument string. Most localizable strings are found in executable
positions, that is, assigned to variables or given as parameters to
1317 functions. But this is not universal usage, and some translatable
1318 strings appear in structured initializations. See section <A HREF="gettext.html#SEC17">Special Cases of Translatable Strings</A>.
1319
1320 </P>
1321 <P>
1322 The second goal of the marking operation is to help <CODE>xgettext</CODE>
1323 at properly extracting all translatable strings when it scans a set
1324 of program sources and produces PO file templates.
1325
1326 </P>
1327 <P>
The canonical keyword for marking translatable strings is
<SAMP>`gettext'</SAMP>; it gave its name to the whole GNU <CODE>gettext</CODE>
package. For packages making only light use of the <SAMP>`gettext'</SAMP>
keyword, macro or function, it is easily used <EM>as is</EM>. However,
for packages using the <CODE>gettext</CODE> interface more heavily, it
is usually more convenient to give the main keyword a shorter, less
obtrusive name. Indeed, the keyword might appear on a lot of strings
all over the package, and programmers usually neither want nor need
their program sources to remind them loudly, all the time, that they
1337 are internationalized. Further, a long keyword has the disadvantage
1338 of using more horizontal space, forcing more indentation work on
1339 sources for those trying to keep them within 79 or 80 columns.
1340
1341 </P>
1342 <P>
1343 Many GNU packages use <SAMP>`_'</SAMP> (a simple underline) as a keyword,
1344 and write <SAMP>`_("Translatable string")'</SAMP> instead of <SAMP>`gettext
("Translatable string")'</SAMP>. Further, the usual GNU coding rule
requiring a space between the keyword and the opening
parenthesis is relaxed, in practice, for this particular usage.
1348 So, the textual overhead per translatable string is reduced to
1349 only three characters: the underline and the two parentheses.
1350 However, even if GNU <CODE>gettext</CODE> uses this convention internally,
1351 it does not offer it officially. The real, genuine keyword is truly
1352 <SAMP>`gettext'</SAMP> indeed. It is fairly easy for those wanting to use
1353 <SAMP>`_'</SAMP> instead of <SAMP>`gettext'</SAMP> to declare:
1354
1355 </P>
1356
1357 <PRE>
1358 #include &#60;libintl.h&#62;
1359 #define _(String) gettext (String)
1360 </PRE>
1361
1362 <P>
1363 instead of merely using <SAMP>`#include &#60;libintl.h&#62;'</SAMP>.
1364
1365 </P>
1366 <P>
1367 Later on, the maintenance is relatively easy. If, as a programmer,
1368 you add or modify a string, you will have to ask yourself if the
1369 new or altered string requires translation, and include it within
1370 <SAMP>`_()'</SAMP> if you think it should be translated. <SAMP>`"%s: %d"'</SAMP> is
an example of a string <EM>not</EM> requiring translation!
1372
1373 </P>
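<P>
As a small illustration of the decision described above, the sketch
below marks a format string which needs translation and leaves alone
one which does not. Both function names and both messages are made up
for this example.
</P>

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* The format contains words, so it is marked for translation.
   The file name inserted through %s is left as it is.  */
int
format_missing_file (char *buf, size_t size, const char *name)
{
  return snprintf (buf, size, _("%s: file not found"), name);
}

/* This format holds only directives and punctuation: there is
   nothing to translate, so it is not marked.  */
int
format_location (char *buf, size_t size, const char *name, int line)
{
  return snprintf (buf, size, "%s: %d", name, line);
}
```

<P>
When no message catalog is found for the current locale,
<CODE>gettext</CODE> returns its argument unchanged, so the marked
format still works as an ordinary format string.
</P>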
1374
1375
1376 <H2><A NAME="SEC16" HREF="gettext_toc.html#TOC16">Marking Translatable Strings</A></H2>
1377
1378 <P>
1379 In PO mode, one set of features is meant more for the programmer than
1380 for the translator, and allows him to interactively mark which strings,
1381 in a set of program sources, are translatable, and which are not.
1382 Even if it is a fairly easy job for a programmer to find and mark
1383 such strings by other means, using any editor of his choice, PO mode
1384 makes this work more comfortable. Further, this gives translators
1385 who feel a little like programmers, or programmers who feel a little
1386 like translators, a tool letting them work at marking translatable
1387 strings in the program sources, while simultaneously producing a set of
translations in some language, for the package being internationalized.
1389
1390 </P>
1391 <P>
The set of program sources targeted by the PO mode commands described
here should have an Emacs tags table constructed for your project,
1394 prior to using these PO file commands. This is easy to do. In any
1395 shell window, change the directory to the root of your project, then
1396 execute a command resembling:
1397
1398 </P>
1399
1400 <PRE>
1401 etags src/*.[hc] lib/*.[hc]
1402 </PRE>
1403
1404 <P>
1405 presuming here you want to process all <TT>`.h'</TT> and <TT>`.c'</TT> files
1406 from the <TT>`src/'</TT> and <TT>`lib/'</TT> directories. This command will
1407 explore all said files and create a <TT>`TAGS'</TT> file in your root
1408 directory, somewhat summarizing the contents using a special file
1409 format Emacs can understand.
1410
1411 </P>
1412 <P>
1413 For official GNU packages which follow the GNU coding standard there is
a make goal <CODE>tags</CODE> or <CODE>TAGS</CODE> which constructs the tag files in
1415 all directories and for all files containing source code.
1416
1417 </P>
1418 <P>
Once your <TT>`TAGS'</TT> file is ready, the following commands assist
the programmer in marking translatable strings in his set of sources.
1421 But these commands are necessarily driven from within a PO file
1422 window, and it is likely that you do not even have such a PO file yet.
1423 This is not a problem at all, as you may safely open a new, empty PO
1424 file, mainly for using these commands. This empty PO file will slowly
1425 fill in while you mark strings as translatable in your program sources.
1426
1427 </P>
1428 <DL COMPACT>
1429
1430 <DT><KBD>,</KBD>
1431 <DD>
1432 Search through program sources for a string which looks like a
1433 candidate for translation.
1434
1435 <DT><KBD>M-,</KBD>
1436 <DD>
1437 Mark the last string found with <SAMP>`_()'</SAMP>.
1438
1439 <DT><KBD>M-.</KBD>
1440 <DD>
1441 Mark the last string found with a keyword taken from a set of possible
1442 keywords. This command with a prefix allows some management of these
1443 keywords.
1444
1445 </DL>
1446
1447 <P>
The <KBD>,</KBD> (<CODE>po-tags-search</CODE>) command searches for the next
occurrence of a string which looks like a possible candidate for
translation, and displays the program source in another Emacs window,
positioned in such a way that the string is near the top of this other
window. If the string is too big to fit whole in this window, it is
positioned instead so that only its end is shown. In any case, the cursor
1454 is left in the PO file window. If the shown string would be better
1455 presented differently in different native languages, you may mark it
1456 using <KBD>M-,</KBD> or <KBD>M-.</KBD>. Otherwise, you might rather ignore it
1457 and skip to the next string by merely repeating the <KBD>,</KBD> command.
1458
1459 </P>
1460 <P>
1461 A string is a good candidate for translation if it contains a sequence
1462 of three or more letters. A string containing at most two letters in
1463 a row will be considered as a candidate if it has more letters than
1464 non-letters. The command disregards strings containing no letters,
1465 or isolated letters only. It also disregards strings within comments,
1466 or strings already marked with some keyword PO mode knows (see below).
1467
1468 </P>
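<P>
The letter-counting rule above can be sketched in C as follows. This is
only our reading of the rule as stated, not the code PO mode itself uses;
the function name <CODE>looks_translatable</CODE> is hypothetical, and the
checks for comments and for already marked strings are left out.
</P>

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: apply the candidate rule to the contents of one
   string literal.  */
bool
looks_translatable (const char *s)
{
  int letters = 0;   /* total number of letters */
  int others = 0;    /* total number of non-letters */
  int run = 0;       /* length of the current run of letters */
  int longest = 0;   /* length of the longest run seen so far */

  for (; *s != '\0'; s++)
    {
      if (isalpha ((unsigned char) *s))
        {
          letters++;
          if (++run > longest)
            longest = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }

  if (longest >= 3)
    return true;               /* three or more letters in a row */
  if (longest == 2)
    return letters > others;   /* at most two in a row: compare counts */
  return false;                /* no letters, or isolated letters only */
}
```

<P>
Under this reading, <SAMP>`"%s: %d"'</SAMP> is rejected because it holds
isolated letters only, while <SAMP>`"ok"'</SAMP> is accepted because its two
letters outnumber its non-letters.
</P>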
1469 <P>
1470 If you have never told Emacs about some <TT>`TAGS'</TT> file to use, the
1471 command will request that you specify one from the minibuffer, the
1472 first time you use the command. You may later change your <TT>`TAGS'</TT>
1473 file by using the regular Emacs command <KBD>M-x visit-tags-table</KBD>,
1474 which will ask you to name the precise <TT>`TAGS'</TT> file you want
1475 to use. See section `Tag Tables' in <CITE>The Emacs Editor</CITE>.
1476
1477 </P>
1478 <P>
Each time you use the <KBD>,</KBD> command, the search resumes where it was
left off by the previous search, and goes through all program sources,
obeying the <TT>`TAGS'</TT> file, until all sources have been processed.
However, by giving a prefix argument to the command (<KBD>C-u
,</KBD>), you may request that the search be restarted all over again
1484 from the first program source; but in this case, strings that you
1485 recently marked as translatable will be automatically skipped.
1486
1487 </P>
1488 <P>
Using this <KBD>,</KBD> command does not prevent the use of other regular
Emacs tags commands. For example, regular <CODE>tags-search</CODE> or
<CODE>tags-query-replace</CODE> commands may be used without disrupting the
independent <KBD>,</KBD> search sequence. However, as implemented, the
<EM>initial</EM> <KBD>,</KBD> command (or a <KBD>,</KBD> command used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.
1496
1497 </P>
1498 <P>
1499 The <KBD>M-,</KBD> (<CODE>po-mark-translatable</CODE>) command will mark the
1500 recently found string with the <SAMP>`_'</SAMP> keyword. The <KBD>M-.</KBD>
1501 (<CODE>po-select-mark-and-mark</CODE>) command will request that you type
1502 one keyword from the minibuffer and use that keyword for marking
1503 the string. Both commands will automatically create a new PO file
1504 untranslated entry for the string being marked, and make it the
1505 current entry (making it easy for you to immediately proceed to its
1506 translation, if you feel like doing it right away). It is possible
1507 that the modifications made to the program source by <KBD>M-,</KBD> or
1508 <KBD>M-.</KBD> render some source line longer than 80 columns, forcing you
1509 to break and re-indent this line differently. You may use the <KBD>o</KBD>
1510 command from PO mode, or any other window changing command from
1511 GNU Emacs, to break out into the program source window, and do any
1512 needed adjustments. You will have to use some regular Emacs command
to return the cursor to the PO file window, if you want to invoke
<KBD>,</KBD> for the next string, say.
1515
1516 </P>
1517 <P>
1518 The <KBD>M-.</KBD> command has a few built-in speedups, so you do not
1519 have to explicitly type all keywords all the time. The first such
1520 speedup is that you are presented with a <EM>preferred</EM> keyword,
1521 which you may accept by merely typing <KBD><KBD>RET</KBD></KBD> at the prompt.
1522 The second speedup is that you may type any non-ambiguous prefix of the
1523 keyword you really mean, and the command will complete it automatically
1524 for you. This also means that PO mode has to <EM>know</EM> all
1525 your possible keywords, and that it will not accept mistyped keywords.
1526
1527 </P>
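<P>
The two speedups can be pictured with the following C sketch. It is an
illustration only, since PO mode relies on Emacs minibuffer completion;
the function <CODE>complete_keyword</CODE> is hypothetical, and letting an
exact match win over longer keywords is our assumption.
</P>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the keyword selected by INPUT among the
   COUNT entries of KNOWN, or NULL when INPUT matches no keyword or is
   ambiguous.  An empty INPUT stands for a bare RET and selects the
   PREFERRED keyword.  */
const char *
complete_keyword (const char *input, const char *const *known,
                  size_t count, const char *preferred)
{
  const char *match = NULL;
  size_t len = strlen (input);
  size_t hits = 0;
  size_t i;

  if (len == 0)
    return preferred;

  for (i = 0; i < count; i++)
    {
      if (strcmp (known[i], input) == 0)
        return known[i];          /* assumption: exact match always wins */
      if (strncmp (known[i], input, len) == 0)
        {
          match = known[i];       /* remember the prefix match */
          hits++;
        }
    }

  return hits == 1 ? match : NULL;
}
```

<P>
With the keywords <SAMP>`gettext'</SAMP>, <SAMP>`_'</SAMP> and
<SAMP>`gettext_noop'</SAMP> known, the prefix <SAMP>`get'</SAMP> is ambiguous and
is refused, whereas <SAMP>`gettext_'</SAMP> completes to <SAMP>`gettext_noop'</SAMP>.
</P>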
1528 <P>
1529 If you reply <KBD>?</KBD> to the keyword request, the command gives a
1530 list of all known keywords, from which you may choose. When the
1531 command is prefixed by an argument (<KBD>C-u M-.</KBD>), it inhibits
1532 updating any program source or PO file buffer, and does some simple
1533 keyword management instead. In this case, the command asks for a
1534 keyword, written in full, which becomes a new allowed keyword for
1535 later <KBD>M-.</KBD> commands. Moreover, this new keyword automatically
1536 becomes the <EM>preferred</EM> keyword for later commands. By typing
1537 an already known keyword in response to <KBD>C-u M-.</KBD>, one merely
1538 changes the <EM>preferred</EM> keyword and does nothing more.
1539
1540 </P>
1541 <P>
1542 All keywords known for <KBD>M-.</KBD> are recognized by the <KBD>,</KBD> command
1543 when scanning for strings, and strings already marked by any of those
1544 known keywords are automatically skipped. If many PO files are opened
1545 simultaneously, each one has its own independent set of known keywords.
There is currently no provision in PO mode for deleting a known
keyword; you have to quit the file (maybe using <KBD>q</KBD>) and reopen
it afresh. When a PO file is newly brought up in an Emacs window, only
<SAMP>`gettext'</SAMP> and <SAMP>`_'</SAMP> are known as keywords, and <SAMP>`gettext'</SAMP>
is preferred for the <KBD>M-.</KBD> command. In fact, it is not useful to
prefer <SAMP>`_'</SAMP>, as this one is already built into the <KBD>M-,</KBD> command.
1552
1553 </P>
1554
1555
1556 <H2><A NAME="SEC17" HREF="gettext_toc.html#TOC17">Special Cases of Translatable Strings</A></H2>
1557
1558 <P>
The attentive reader might now point out that it is not always possible
to mark a translatable string with <CODE>gettext</CODE> or something like it.
Consider the following case:
1562
1563 </P>
1564
1565 <PRE>
{
  static const char *messages[] = {
    "some very meaningful message",
    "and another one"
  };
  const char *string;
  ...
  string
    = index &#62; 1 ? "a default message" : messages[index];

  fputs (string, stdout);
  ...
}
1579 </PRE>
1580
1581 <P>
While it is no problem to mark the string <CODE>"a default message"</CODE>, it
is not possible to mark the string initializers for <CODE>messages</CODE>,
because a function call is not allowed in a static initializer.
What is to be done? We have to fulfill two tasks. First we have to mark the
strings so that the <CODE>xgettext</CODE> program (see section <A HREF="gettext.html#SEC19">Invoking the <CODE>xgettext</CODE> Program</A>)
can find them, and second we have to translate the strings at runtime
before printing them.
1588
1589 </P>
1590 <P>
1591 The first task can be fulfilled by creating a new keyword, which names a
1592 no-op. For the second we have to mark all access points to a string
1593 from the array. So one solution can look like this:
1594
1595 </P>
1596
1597 <PRE>
#define gettext_noop(String) (String)

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  ...
  string
    = index &#62; 1 ? gettext ("a default message") : gettext (messages[index]);

  fputs (string, stdout);
  ...
}
1613 </PRE>
1614
1615 <P>
Please convince yourself that the string which is written by
<CODE>fputs</CODE> is translated in any case. How to make <CODE>xgettext</CODE> aware of
the additional keyword <CODE>gettext_noop</CODE> is explained in section <A HREF="gettext.html#SEC19">Invoking the <CODE>xgettext</CODE> Program</A>.
1619
1620 </P>
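<P>
For readers who want to compile the idea, here is a self-contained sketch
of the same technique wrapped in a function. The name
<CODE>message_for</CODE> is hypothetical, the check for negative indexes is
our addition, and we assume <TT>`libintl.h'</TT> is available as with the GNU
C library.
</P>

```c
#include <libintl.h>

/* No-op marker: seen by xgettext, transparent to the compiler.  */
#define gettext_noop(String) (String)

static const char *const messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};

/* Hypothetical helper: return the message for INDEX, translated at the
   point of access.  The test for negative values is an addition to the
   fragment shown above.  */
const char *
message_for (int index)
{
  if (index < 0 || index > 1)
    return gettext ("a default message");
  return gettext (messages[index]);
}
```

<P>
Every path through <CODE>message_for</CODE> goes through exactly one call to
<CODE>gettext</CODE>, which is the property the fragment above asks you to
check.
</P>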
1621 <P>
The above is of course not the only solution. You could also come up
with the following one:
1624
1625 </P>
1626
1627 <PRE>
#define gettext_noop(String) (String)

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  ...
  string
    = index &#62; 1 ? gettext_noop ("a default message") : messages[index];

  fputs (gettext (string), stdout);
  ...
}
1643 </PRE>
1644
1645 <P>
But this has some drawbacks. First, the programmer has to take care that
he uses <CODE>gettext_noop</CODE> for the string <CODE>"a default message"</CODE>.
A use of <CODE>gettext</CODE> could in rare cases have unpredictable results.
Second, the internals of the GNU <CODE>gettext</CODE>
library make this solution less efficient.
1651
1652 </P>
1653 <P>
One advantage is that you need not perform control flow analysis to make
sure the output is really translated in any case. But this analysis is
generally not very difficult. If it does prove difficult in some situation,
you can use this second method there.
1658
1659 </P>
1660
1661
1662
1663 <H1><A NAME="SEC18" HREF="gettext_toc.html#TOC18">Making the Initial PO File</A></H1>
1664
1665
1666
1667 <H2><A NAME="SEC19" HREF="gettext_toc.html#TOC19">Invoking the <CODE>xgettext</CODE> Program</A></H2>
1668
1669
1670 <PRE>
1671 xgettext [<VAR>option</VAR>] <VAR>inputfile</VAR> ...
1672 </PRE>
1673
1674 <DL COMPACT>
1675
1676 <DT><SAMP>`-a'</SAMP>
1677 <DD>
1678 <DT><SAMP>`--extract-all'</SAMP>
1679 <DD>
1680 Extract all strings.
1681
1682 <DT><SAMP>`-c [<VAR>tag</VAR>]'</SAMP>
1683 <DD>
1684 <DT><SAMP>`--add-comments[=<VAR>tag</VAR>]'</SAMP>
1685 <DD>
1686 Place comment block with <VAR>tag</VAR> (or those preceding keyword lines)
1687 in output file.
1688
1689 <DT><SAMP>`-C'</SAMP>
1690 <DD>
1691 <DT><SAMP>`--c++'</SAMP>
1692 <DD>
1693 Recognize C++ style comments.
1694
1695 <DT><SAMP>`-d <VAR>name</VAR>'</SAMP>
1696 <DD>
1697 <DT><SAMP>`--default-domain=<VAR>name</VAR>'</SAMP>
1698 <DD>
1699 Use <TT>`<VAR>name</VAR>.po'</TT> for output (instead of <TT>`messages.po'</TT>).
1700
1701 <DT><SAMP>`-D <VAR>directory</VAR>'</SAMP>
1702 <DD>
1703 <DT><SAMP>`--directory=<VAR>directory</VAR>'</SAMP>
1704 <DD>
1705 Change to <VAR>directory</VAR> before beginning to search and scan source
1706 files. The resulting <TT>`.po'</TT> file will be written relative to the
1707 original directory, though.
1708
1709 <DT><SAMP>`-f <VAR>file</VAR>'</SAMP>
1710 <DD>
1711 <DT><SAMP>`--files-from=<VAR>file</VAR>'</SAMP>
1712 <DD>
1713 Read the names of the input files from <VAR>file</VAR> instead of getting
1714 them from the command line.
1715
1716 <DT><SAMP>`-h'</SAMP>
1717 <DD>
1718 <DT><SAMP>`--help'</SAMP>
1719 <DD>
1720 Display this help and exit.
1721
1722 <DT><SAMP>`-I <VAR>list</VAR>'</SAMP>
1723 <DD>
1724 <DT><SAMP>`--input-path=<VAR>list</VAR>'</SAMP>
1725 <DD>
1726 List of directories searched for input files.
1727
1728 <DT><SAMP>`-j'</SAMP>
1729 <DD>
1730 <DT><SAMP>`--join-existing'</SAMP>
1731 <DD>
1732 Join messages with existing file.
1733
1734 <DT><SAMP>`-k <VAR>word</VAR>'</SAMP>
1735 <DD>
1736 <DT><SAMP>`--keyword[=<VAR>word</VAR>]'</SAMP>
1737 <DD>
Additional keyword to be looked for (without <VAR>word</VAR> means not to
1739 use default keywords).
1740
1741 The default keywords, which are always looked for if not explicitly
1742 disabled, are <CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE> and
1743 <CODE>gettext_noop</CODE>.
1744
1745 <DT><SAMP>`-m [<VAR>string</VAR>]'</SAMP>
1746 <DD>
1747 <DT><SAMP>`--msgstr-prefix[=<VAR>string</VAR>]'</SAMP>
1748 <DD>
1749 Use <VAR>string</VAR> or "" as prefix for msgstr entries.
1750
1751 <DT><SAMP>`-M [<VAR>string</VAR>]'</SAMP>
1752 <DD>
1753 <DT><SAMP>`--msgstr-suffix[=<VAR>string</VAR>]'</SAMP>
1754 <DD>
1755 Use <VAR>string</VAR> or "" as suffix for msgstr entries.
1756
1757 <DT><SAMP>`--no-location'</SAMP>
1758 <DD>
1759 Do not write <SAMP>`#: <VAR>filename</VAR>:<VAR>line</VAR>'</SAMP> lines.
1760
1761 <DT><SAMP>`-n'</SAMP>
1762 <DD>
1763 <DT><SAMP>`--add-location'</SAMP>
1764 <DD>
1765 Generate <SAMP>`#: <VAR>filename</VAR>:<VAR>line</VAR>'</SAMP> lines (default).
1766
1767 <DT><SAMP>`--omit-header'</SAMP>
1768 <DD>
1769 Don't write header with <SAMP>`msgid ""'</SAMP> entry.
1770
1771 This is useful for testing purposes because it eliminates a source
1772 of variance for generated <CODE>.gmo</CODE> files. We can ship some of
1773 these files in the GNU <CODE>gettext</CODE> package, and the result of
1774 regenerating them through <CODE>msgfmt</CODE> should yield the same values.
1775
1776 <DT><SAMP>`-p <VAR>dir</VAR>'</SAMP>
1777 <DD>
1778 <DT><SAMP>`--output-dir=<VAR>dir</VAR>'</SAMP>
1779 <DD>
1780 Output files will be placed in directory <VAR>dir</VAR>.
1781
1782 <DT><SAMP>`-s'</SAMP>
1783 <DD>
1784 <DT><SAMP>`--sort-output'</SAMP>
1785 <DD>
1786 Generate sorted output and remove duplicates.
1787
1788 <DT><SAMP>`--strict'</SAMP>
1789 <DD>
1790 Write out strict Uniforum conforming PO file.
1791
1792 <DT><SAMP>`-v'</SAMP>
1793 <DD>
1794 <DT><SAMP>`--version'</SAMP>
1795 <DD>
1796 Output version information and exit.
1797
1798 <DT><SAMP>`-x <VAR>file</VAR>'</SAMP>
1799 <DD>
1800 <DT><SAMP>`--exclude-file=<VAR>file</VAR>'</SAMP>
1801 <DD>
1802 Entries from <VAR>file</VAR> are not extracted.
1803
1804 </DL>
1805
1806 <P>
1807 Search path for supplementary PO files is:
1808 <TT>`/usr/local/share/nls/src/'</TT>.
1809
1810 </P>
1811 <P>
1812 If <VAR>inputfile</VAR> is <SAMP>`-'</SAMP>, standard input is read.
1813
1814 </P>
1815 <P>
1816 This implementation of <CODE>xgettext</CODE> is able to process a few awkward
1817 cases, like strings in preprocessor macros, ANSI concatenation of
1818 adjacent strings, and escaped end of lines for continued strings.
1819
1820 </P>
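<P>
The three awkward cases just mentioned look as follows in C source. This
is a sketch with made-up message texts and hypothetical function names,
assuming <TT>`libintl.h'</TT> is available; in each case the compiler sees
one complete string, which is what we would expect the extracted
<CODE>msgid</CODE> to be.
</P>

```c
#include <libintl.h>

/* Case 1: a translatable string inside a preprocessor macro.  */
#define USAGE_HINT gettext ("Try --help for more information.")

const char *
usage_hint (void)
{
  return USAGE_HINT;
}

/* Case 2: ANSI concatenation of adjacent string literals.  */
const char *
long_message (void)
{
  return gettext ("This message is split across "
                  "two adjacent literals.");
}

/* Case 3: an escaped end of line continuing a single literal.  The
   continuation starts in column one so that no indentation leaks into
   the message.  */
const char *
continued_message (void)
{
  return gettext ("This message continues \
on the next source line.");
}
```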
1821
1822
1823 <H2><A NAME="SEC20" HREF="gettext_toc.html#TOC20">C Sources Context</A></H2>
1824
1825 <P>
PO mode is particularly powerful when used with PO files
1827 created through GNU <CODE>gettext</CODE> utilities, as those utilities
1828 insert special comments in the PO files they generate.
1829 Some of these special comments relate the PO file entry to
1830 exactly where the untranslated string appears in the program sources.
1831
1832 </P>
1833 <P>
1834 When the translator gets to an untranslated entry, she is fairly
1835 often faced with an original string which is not as informative as
it normally should be, being succinct, cryptic, or otherwise ambiguous.
Before choosing how to translate the string, she needs to understand
better what the string really means and how tight the translation has
to be. Most of the time, when problems arise, the only way left to make
1840 her judgment is looking at the true program sources from where this
1841 string originated, searching for surrounding comments the programmer
1842 might have put in there, and looking around for helping clues of
1843 <EM>any</EM> kind.
1844
1845 </P>
1846 <P>
1847 Surely, when looking at program sources, the translator will receive
1848 more help if she is a fluent programmer. However, even if she is
1849 not versed in programming and feels a little lost in C code, the
translator should not be shy about taking a look, once in a while.
It is most probable that she will still be able to find some of the
hints she needs. She will quickly learn not to feel uncomfortable
in program code, paying more attention to the programmer's comments,
variable and function names (if he dared to choose them well), and
overall organization, than to the programming itself.
1856
1857 </P>
1858 <P>
The following commands are meant to help the translator in getting
program source context for a PO file entry.
1861
1862 </P>
1863 <DL COMPACT>
1864
1865 <DT><KBD>c</KBD>
1866 <DD>
1867 Resume the display of a program source context, or cycle through them.
1868
1869 <DT><KBD>M-c</KBD>
1870 <DD>
Display a program source context selected by menu.
1872
1873 <DT><KBD>d</KBD>
1874 <DD>
1875 Add a directory to the search path for source files.
1876
1877 <DT><KBD>M-d</KBD>
1878 <DD>
1879 Delete a directory from the search path for source files.
1880
1881 </DL>
1882
1883 <P>
1884 The commands <KBD>c</KBD> (<CODE>po-cycle-reference</CODE>) and <KBD>M-c</KBD>
1885 (<CODE>po-select-reference</CODE>) both open another window displaying
1886 some source program file, and already positioned in such a way that
1887 it shows an actual use of the current string to translate. By doing
1888 so, the command gives source program context for the string. But if
1889 the entry has no source context references, or if all references
1890 are unresolved along the search path for program sources, then the
1891 command diagnoses this as an error.
1892
1893 </P>
1894 <P>
1895 Even if <KBD>c</KBD> (or <KBD>M-c</KBD>) opens a new window, the cursor stays
1896 in the PO file window. If the translator really wants to
1897 get into the program source window, she ought to do it explicitly,
1898 maybe by using command <KBD>o</KBD>.
1899
1900 </P>
1901 <P>
When <KBD>c</KBD> is typed for the first time, or for a PO file entry which
is different from the last one used for getting source context, then the
command reacts by giving the first context available for this entry,
if any. If some context has already been recently displayed for the
current PO file entry, and the translator wandered off to do other
things, typing <KBD>c</KBD> again will merely resume, in another window,
the context last displayed. In particular, if the translator moved
the cursor away from the context in the source file, the command will
bring the cursor back to the context. When <KBD>c</KBD> is used many times
in a row, with no other intervening commands, PO mode cycles through
the available contexts for this particular entry, getting back
to the first context once the last has been shown.
1914
1915 </P>
1916 <P>
The command <KBD>M-c</KBD> behaves differently. Instead of cycling through
references, it lets the translator choose one particular reference among
many, and displays that reference. It is best used with completion:
if the translator types <KBD>TAB</KBD> immediately after <KBD>M-c</KBD>, in
response to the question, she will be offered a menu of all possible
references, as a reminder of which are the acceptable answers.
This command is useful only where there are really many contexts
available for a single string to translate.
1925
1926 </P>
1927 <P>
1928 Program source files are usually found relative to where the PO
1929 file stands. As a special provision, when this fails, the file is
1930 also looked for, but relative to the directory immediately above it.
1931 Those two cases take proper care of most PO files. However, it might
1932 happen that a PO file has been moved, or is edited in a different
1933 place than its normal location. When this happens, the translator
should tell PO mode in which directory the genuine PO file normally
sits. Many such directories may be specified, and all together, they
1936 constitute what is called the <STRONG>search path</STRONG> for program sources.
1937 The command <KBD>d</KBD> (<CODE>po-add-path</CODE>) is used to interactively
1938 enter a new directory at the front of the search path, and the command
1939 <KBD>M-d</KBD> (<CODE>po-delete-path</CODE>) is used to select, with completion,
1940 one of the directories she does not want anymore on the search path.
1941
1942 </P>
1943
1944
1945 <H2><A NAME="SEC21" HREF="gettext_toc.html#TOC21">Using Translation Compendiums</A></H2>
1946
1947 <P>
1948 Compendiums are yet to be implemented.
1949
1950 </P>
1951 <P>
An upcoming PO mode feature will let the translator maintain a
1953 compendium of already achieved translations. A <STRONG>compendium</STRONG>
1954 is a special PO file containing a set of translations recurring in
1955 many different packages. The translator will be given commands for
1956 adding entries to her compendium, and later initializing untranslated
1957 entries, or updating already translated entries, from translations
1958 kept in the compendium. For this to work, however, the compendium
1959 would have to be normalized. See section <A HREF="gettext.html#SEC12">Normalizing Strings in Entries</A>.
1960
1961 </P>
1962
1963
1964
1965 <H1><A NAME="SEC22" HREF="gettext_toc.html#TOC22">Updating Existing PO Files</A></H1>
1966
1967
1968
1969 <H2><A NAME="SEC23" HREF="gettext_toc.html#TOC23">Invoking the <CODE>tupdate</CODE> Program</A></H2>
1970
1971
1972 <PRE>
1973 tupdate --help
1974 tupdate --version
1975 tupdate <VAR>new</VAR> <VAR>old</VAR>
1976 </PRE>
1977
1978 <P>
1979 File <VAR>new</VAR> is the last created PO file (generally by
1980 <CODE>xgettext</CODE>). It need not contain any translations. File
1981 <VAR>old</VAR> is the PO file including the old translations which will
1982 be taken over to the newly created file as long as they still match.
1983
1984 </P>
1985 <P>
1986 When English messages change in the programs, this is reflected in
the PO file as extracted by <CODE>xgettext</CODE>. In large messages, such a change
can be hard to detect, and will obviously result in an incomplete
translation. One of the virtues of <CODE>tupdate</CODE> is that it detects
such changes, saving the previous translation into a PO file comment,
thus marking the entry as obsolete, and giving the modified string with
an empty translation, that is, marking the entry as untranslated.
1993
1994 </P>
1995
1996
1997 <H2><A NAME="SEC24" HREF="gettext_toc.html#TOC24">Untranslated Entries</A></H2>
1998
1999 <P>
2000 When <CODE>xgettext</CODE> originally creates a PO file, unless told
2001 otherwise, it initializes the <CODE>msgid</CODE> field with the untranslated
string, and leaves the <CODE>msgstr</CODE> string empty. Such entries,
having an empty translation, are said to be <STRONG>untranslated</STRONG> entries.
Later, when the programmer slightly modifies some string right in
the program, this change is reflected in the PO file
by the appearance of a new untranslated entry for the modified string.
2007
2008 </P>
2009 <P>
2010 The usual commands moving from entry to entry consider untranslated
2011 entries on the same level as active entries. Untranslated entries
2012 are easily recognizable by the fact they end with <SAMP>`msgstr ""'</SAMP>.
2013
2014 </P>
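<P>
The recognition rule just stated is simple enough to sketch in C. The
function <CODE>entry_is_untranslated</CODE> is hypothetical and is not part
of any GNU <CODE>gettext</CODE> tool; it assumes that the text of a single PO
file entry is passed in as one string.
</P>

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true when ENTRY, the text of one PO file entry,
   ends with a line reading exactly msgstr "".  */
bool
entry_is_untranslated (const char *entry)
{
  static const char tail[] = "msgstr \"\"";
  size_t tail_len = sizeof tail - 1;
  size_t len = strlen (entry);

  /* Ignore blanks and newlines after the last significant character.  */
  while (len > 0 && strchr (" \t\r\n", entry[len - 1]) != NULL)
    len--;

  if (len < tail_len
      || memcmp (entry + len - tail_len, tail, tail_len) != 0)
    return false;

  /* The keyword has to start a line of its own.  */
  return len == tail_len || entry[len - tail_len - 1] == '\n';
}
```

<P>
Testing the <EM>end</EM> of the entry matters: a translation spanning
several lines also contains <SAMP>`msgstr ""'</SAMP>, but is followed by
further quoted lines and so is not taken for an untranslated entry.
</P>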
2015 <P>
2016 The work of the translator might be (quite naively) seen as the process
2017 of seeking after an untranslated entry, editing a translation for
2018 it, and repeating these actions until no untranslated entries remain.
2019 Some commands are more specifically related to untranslated entry
2020 processing.
2021
2022 </P>
2023 <DL COMPACT>
2024
2025 <DT><KBD>e</KBD>
2026 <DD>
2027 Find the next untranslated entry.
2028
2029 <DT><KBD>M-e</KBD>
2030 <DD>
2031 Find the previous untranslated entry.
2032
2033 <DT><KBD>k</KBD>
2034 <DD>
2035 Turn the current entry into an untranslated one.
2036
2037 </DL>
2038
2039 <P>
2040 The commands <KBD>e</KBD> (<CODE>po-next-empty-entry</CODE>) and <KBD>M-e</KBD>
(<CODE>po-previous-empty</CODE>) move forwards or backwards, looking for an
untranslated entry. If none is found, the search is extended and wraps
2043 around in the PO file buffer.
2044
2045 </P>
2046 <P>
2047 An entry can be turned back into an untranslated entry by
2048 merely emptying its translation, using the command <KBD>k</KBD>
2049 (<CODE>po-kill-msgstr</CODE>). See section <A HREF="gettext.html#SEC26">Modifying Translations</A>.
2050
2051 </P>
2052 <P>
Also, when the time comes to quit working on a PO file buffer
with the <KBD>q</KBD> command, the translator is asked for confirmation
if some untranslated string still exists.
2056
2057 </P>
2058
2059
2060 <H2><A NAME="SEC25" HREF="gettext_toc.html#TOC25">Obsolete Entries</A></H2>
2061
2062 <P>
By <STRONG>obsolete</STRONG> PO file entries, we mean those entries which are
commented out, usually by <CODE>tupdate</CODE> when it finds that the
translation is not needed anymore by the package being localized.
2066
2067 </P>
2068 <P>
2069 The usual commands moving from entry to entry consider obsolete
2070 entries on the same level as active entries. Obsolete entries are
2071 easily recognizable by the fact that all their lines start with
2072 <KBD>#</KBD>, even those lines containing <CODE>msgid</CODE> or <CODE>msgstr</CODE>.
2073
2074 </P>
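<P>
As an illustration of this rule, the following C sketch checks whether
every line of an entry starts with <KBD>#</KBD>. The function
<CODE>entry_is_obsolete</CODE> is hypothetical; it assumes the text of one PO
file entry is passed in as a single string, and it skips empty lines.
</P>

```c
#include <stdbool.h>

/* Hypothetical helper: true when ENTRY has at least one line and every
   non-empty line starts with '#'.  */
bool
entry_is_obsolete (const char *entry)
{
  bool at_line_start = true;
  bool saw_line = false;

  for (; *entry != '\0'; entry++)
    {
      if (at_line_start && *entry != '\n')
        {
          if (*entry != '#')
            return false;      /* an active msgid or msgstr line */
          saw_line = true;
        }
      at_line_start = (*entry == '\n');
    }

  return saw_line;
}
```

<P>
An active entry may carry comment lines of its own, but its
<CODE>msgid</CODE> line does not start with <KBD>#</KBD>, so the function
returns false as soon as that line is reached.
</P>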
2075 <P>
2076 Commands exist for emptying the translation or reinitializing it
2077 to the original untranslated string. Commands interfacing with the
2078 kill ring may force some previously saved text into the translation.
2079 The user may interactively edit the translation. All these commands
2080 may apply to obsolete entries, carefully leaving the entry obsolete
2081 after the fact.
2082
2083 </P>
2084 <P>
2085 Moreover, some commands are more specifically related to obsolete
2086 entry processing.
2087
2088 </P>
2089 <DL COMPACT>
2090
2091 <DT><KBD>M-n</KBD>
2092 <DD>
2093 <DT><KBD>M-<KBD>SPC</KBD></KBD>
2094 <DD>
2095 Find the next obsolete entry.
2096
2097 <DT><KBD>M-p</KBD>
2098 <DD>
2099 <DT><KBD>M-<KBD>DEL</KBD></KBD>
2100 <DD>
2101 Find the previous obsolete entry.
2102
2103 <DT><KBD>z</KBD>
2104 <DD>
2105 Make an active entry obsolete, or zap out an obsolete entry.
2106
2107 </DL>
2108
2109 <P>
2110 The commands <KBD>M-n</KBD> (<CODE>po-next-obsolete-entry</CODE>) and <KBD>M-p</KBD>
2111 (<CODE>po-previous-obsolete-entry</CODE>) move forwards or backwards,
2112 chasing for an obsolete entry. If none is found, the search is
2113 extended and wraps around in the PO file buffer. The commands
2114 <KBD>M-<KBD>SPC</KBD></KBD> and <KBD>M-<KBD>DEL</KBD></KBD> are synonymous to <KBD>M-n</KBD>
2115 and <KBD>M-p</KBD>, respectively.
2116
2117 </P>
2118 <P>
2119 PO mode does not provide ways for un-commenting an obsolete entry
2120 and making it active, because this would reintroduce an original
2121 untranslated string which does not correspond to any marked string
2122 in the program sources. This goes with the philosophy of never
2123 introducing useless <CODE>msgid</CODE> values.
2124
2125 </P>
2126 <P>
2127 However, it is possible to comment out an active entry, so making
2128 it obsolete. GNU <CODE>gettext</CODE> utilities will later react to the
2129 disappearance of a translation by using the untranslated string.
2130 The command <KBD>z</KBD> (<CODE>po-fade-out-entry</CODE>) pushes the current entry
2131 a little further towards annihilation. If the entry is active, then
2132 the entry is merely commented out. If the entry is already obsolete,
2133 then it is completely deleted from the PO file. It is easy to recycle
2134 the translation so deleted into some other PO file entry, usually
2135 one which is untranslated. See section <A HREF="gettext.html#SEC26">Modifying Translations</A>.
2136
2137 </P>
2138 <P>
2139 Here is a quite interesting problem to solve for later development of
2140 PO mode, for those nights you are not sleepy. The idea would be that
2141 PO mode might become bright enough, one of these days, to make good
2142 guesses at retrieving the most probable candidate, among all obsolete
2143 entries, for initializing the translation of a newly appeared string.
2144 I think it might be a quite hard problem to do this algorithmically, as
2145 we have to develop good and efficient measures of string similarity.
Right now, PO mode leaves the decision entirely to the translator
when the time comes to find the adequate obsolete translation; it
merely tries to provide handy tools for helping her to do so.
2149
2150 </P>
2151
2152
2153 <H2><A NAME="SEC26" HREF="gettext_toc.html#TOC26">Modifying Translations</A></H2>
2154
2155 <P>
PO mode prevents direct editing of the PO file by the usual
means Emacs gives for altering a buffer's contents. By doing so,
it aims to help the translator avoid little clerical errors
about the overall file format, or the proper quoting of strings,
as those errors would be easily made. Other kinds of errors are
still possible, but some may be caught and diagnosed by the batch
validation process, which the translator may always trigger by the
2163 <KBD>v</KBD> command. For all other errors, the translator has to rely on
2164 her own judgment, and also on the linguistic reports submitted to her
2165 by the users of the translated package, having the same mother tongue.
2166
2167 </P>
2168 <P>
When the time comes to create a translation, or to correct an error diagnosed
mechanically or reported by a user, the translator has to resort to
using the following commands for modifying the translations.
2172
2173 </P>
2174 <DL COMPACT>
2175
2176 <DT><KBD>RET</KBD>
2177 <DD>
2178 Interactively edit the translation.
2179
2180 <DT><KBD>TAB</KBD>
2181 <DD>
2182 Reinitialize the translation with the original, untranslated string.
2183
2184 <DT><KBD>k</KBD>
2185 <DD>
2186 Save the translation on the kill ring, and delete it.
2187
2188 <DT><KBD>w</KBD>
2189 <DD>
2190 Save the translation on the kill ring, without deleting it.
2191
2192 <DT><KBD>y</KBD>
2193 <DD>
2194 Replace the translation, taking the new from the kill ring.
2195
2196 </DL>
2197
2198 <P>
2199 The command <KBD>RET</KBD> (<CODE>po-edit-msgstr</CODE>) opens a new Emacs
2200 window containing a copy of the translation taken from the current
PO file entry, all ready for editing, fully modifiable
and with the complete extent of GNU Emacs modifying commands.
The string is presented to the translator expunged of all quoting
marks, and she will modify the <EM>unquoted</EM> string in this
window to her heart's content. Once done, the regular Emacs command
2206 <KBD>M-C-c</KBD> (<CODE>exit-recursive-edit</CODE>) may be used to return the
2207 edited translation into the PO file, replacing the original
2208 translation. The keys <KBD>C-c C-c</KBD> are bound so they have the
2209 same effect as <KBD>M-C-c</KBD>.
2210
2211 </P>
2212 <P>
If the translator becomes unsatisfied with her translation to the
extent she prefers keeping the translation which existed prior to
the <KBD>RET</KBD> command, she may use the regular Emacs command <KBD>C-]</KBD>
(<CODE>abort-recursive-edit</CODE>) to merely discard the edit, while
preserving the original translation. Another way would be for her
to exit normally with <KBD>C-c C-c</KBD>, then type <CODE>u</CODE> once to
undo the whole effect of the last edit.
2220
2221 </P>
2222 <P>
2223 While editing her translation, the translator should take care
2224 not to insert unwanted <KBD><KBD>RET</KBD></KBD> (carriage return) characters at
2225 the end of the translated string if those are not meant to be there,
2226 or to remove such characters when they are required. Since these
2227 characters are not visible in the editing buffer, they are easy to
2228 introduce by mistake. To help her, <KBD><KBD>RET</KBD></KBD> automatically puts
2229 the character <KBD>&#60;</KBD> at the end of the string being edited, but this
2230 <KBD>&#60;</KBD> is not really part of the string. On exiting the editing
2231 window with <KBD>C-c C-c</KBD>, PO mode automatically removes such
2232 <KBD>&#60;</KBD> and all whitespace added after it. If the translator adds
2233 characters after the terminating <KBD>&#60;</KBD>, it loses its delimiting
2234 property and becomes an integral part of the string. If she removes
2235 the delimiting <KBD>&#60;</KBD>, then the edited string is taken <EM>as
2236 is</EM>, with all trailing newlines, even if invisible. Also, if the
2237 translated string ought to end itself with a genuine <KBD>&#60;</KBD>, then the
2238 delimiting <KBD>&#60;</KBD> may not be removed; so the string should appear,
2239 in the editing window, as ending with two <KBD>&#60;</KBD> in a row.
2240
2241 </P>
2242 <P>
2243 When a translation (or a comment) is being edited, the translator
2244 may move the cursor back into the PO file buffer and freely
2245 move to other entries, browsing at will. The edited entry will
2246 be recovered as soon as the edit ceases, because it is the only
2247 entry being modified. If, with an edit still open, the
2248 translator wanders in the PO file buffer, she cannot modify
2249 any other entry. If she tries to, PO mode will react by suggesting
2250 that she abort the current edit, or else by inviting her to finish
2251 the current edit prior to any other modification.
2252
2253 </P>
2254 <P>
2255 The command <KBD>TAB</KBD> (<CODE>po-msgid-to-msgstr</CODE>) initializes, or
2256 reinitializes the translation with the original string. This command
2257 is normally used when the translator wants to redo a fresh translation
2258 of the original string, disregarding any previous work.
2259
2260 </P>
2261 <P>
2262 In fact, whether it is best to start a translation with an empty
2263 string, or rather with a copy of the original string, is a matter of
2264 taste or habit. Sometimes, the source language and the
2265 target language are so different that it is simply best to start writing
2266 on an empty page. At other times, the source and target languages
2267 are so close that it would be a waste to retype a number of words
2268 already written in the original string. A translator may also
2269 like having the original string right under her eyes, as she will
2270 progressively overwrite the original text with the translation, even
2271 if this requires some extra editing work to get rid of the original.
2272
2273 </P>
2274 <P>
2275 The command <KBD>k</KBD> (<CODE>po-kill-msgstr</CODE>) merely empties the
2276 translation string, thus turning the entry into an untranslated
2277 one. But while doing so, its previous contents are set aside in
2278 a special place, known as the kill ring. The command <KBD>w</KBD>
2279 (<CODE>po-kill-ring-save-msgstr</CODE>) also has the effect of taking a
2280 copy of the translation onto the kill ring, but it otherwise leaves
2281 the entry alone, and does <EM>not</EM> remove the translation from the
2282 entry. Both commands use exactly the Emacs kill ring, which is shared
2283 between buffers, and which is well known already to GNU Emacs lovers.
2284
2285 </P>
2286 <P>
2287 The translator may use <KBD>k</KBD> or <KBD>w</KBD> many times in the course
2288 of her work, as the kill ring may hold several saved translations.
2289 From the kill ring, strings may later be reinserted in various
2290 Emacs buffers. In particular, the kill ring may be used for moving
2291 translation strings between different entries of a single PO file
2292 buffer, or if the translator is handling many such buffers at once,
2293 even between PO files.
2294
2295 </P>
2296 <P>
2297 To facilitate exchanges with buffers which are not in PO mode, the
2298 translation string put on the kill ring by the <KBD>k</KBD> command is fully
2299 unquoted before being saved: external quotes are removed, multi-line
2300 strings are concatenated, and backslash escape sequences are turned
2301 into their corresponding characters. In the special case of obsolete
2302 entries, the translation is also uncommented prior to saving.
2303
2304 </P>
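<P>
The unquoting steps just described can be sketched in C. This is only an
illustration of the idea, not the code PO mode uses (PO mode itself is
written in Emacs Lisp). The function name <CODE>po_unquote</CODE> and its
buffer-based interface are hypothetical, and only the <CODE>\n</CODE>,
<CODE>\t</CODE>, <CODE>\"</CODE> and <CODE>\\</CODE> escapes are handled.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unquote the quoted value of a PO entry field into `out`.
   `quoted` holds one or more "..." segments separated by whitespace or
   newlines, as they appear after `msgstr` in a PO file.  Returns the
   length of the result, or (size_t) -1 if `out` is too small. */
size_t po_unquote(const char *quoted, char *out, size_t out_size)
{
    size_t n = 0;
    int inside = 0;               /* nonzero while between two quotes */

    assert(quoted != NULL && out != NULL);
    if (out_size == 0)
        return (size_t) -1;       /* no room even for the final NUL */
    for (const char *p = quoted; *p != '\0'; p++) {
        char c = *p;
        if (!inside) {
            if (c == '"')
                inside = 1;       /* opening quote of a segment */
            continue;             /* skip anything between segments */
        }
        if (c == '"') {           /* closing quote: the segment ends */
            inside = 0;
            continue;
        }
        if (c == '\\' && p[1] != '\0') {
            c = *++p;
            switch (c) {          /* only a few common escapes are handled */
            case 'n': c = '\n'; break;
            case 't': c = '\t'; break;
            default: break;       /* \" and \\ map to the character itself */
            }
        }
        if (n + 1 >= out_size)
            return (size_t) -1;
        out[n++] = c;             /* segments are concatenated as they come */
    }
    out[n] = '\0';
    return n;
}
```

<P>
Given the two-line value <CODE>"Hello, "</CODE> followed by
<CODE>"world\n"</CODE>, the sketch yields a single string ending
in a real newline character, which is the form a buffer outside PO mode
would expect.
</P>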
2305 <P>
2306 The command <KBD>y</KBD> (<CODE>po-yank-msgstr</CODE>) completely replaces the
2307 translation of the current entry by a string taken from the kill ring.
2308 Following GNU Emacs terminology, we then say that the replacement
2309 string is <STRONG>yanked</STRONG> into the PO file buffer.
2310 See section `Yanking' in <CITE>The Emacs Editor</CITE>.
2311 The first time <KBD>y</KBD> is used, the translation receives the value of
2312 the most recent addition to the kill ring. If <KBD>y</KBD> is typed once
2313 again, immediately, without intervening keystrokes, the translation
2314 just inserted is taken away and replaced by the second most recent
2315 addition to the kill ring. By repeating <KBD>y</KBD> many times in a row,
2316 the translator may travel along the kill ring for saved strings,
2317 until she finds the string she really wanted.
2318
2319 </P>
2320 <P>
2321 When a string is yanked into a PO file entry, it is fully and
2322 automatically requoted to comply with the format PO files should
2323 have. Further, if the entry is obsolete, PO mode then appropriately
2324 pushes the inserted string inside comments. Once again, translators
2325 should not burden themselves with quoting considerations beyond, of
2326 course, what the translated string itself requires with respect to
2327 the program using it.
2328
2329 </P>
2330 <P>
2331 Note that <KBD>k</KBD> and <KBD>w</KBD> are not the only commands pushing strings
2332 on the kill ring, as almost any PO mode command replacing translation
2333 strings (or the translator comments) automatically saves the old string
2334 on the kill ring. The main exceptions to this general rule are the
2335 yanking commands themselves.
2336
2337 </P>
2338 <P>
2339 To better illustrate the operation of killing and yanking, let's
2340 use an actual example, taken from a common situation. When the
2341 programmer slightly modifies some string right in the program, his
2342 change is later reflected in the PO file by the appearance
2343 of a new untranslated entry for the modified string, and the fact
2344 that the entry translating the original or unmodified string becomes
2345 obsolete. In many cases, the translator might spare herself some work
2346 by retrieving the unmodified translation from the obsolete entry,
2347 then initializing the untranslated entry <CODE>msgstr</CODE> field with
2348 this retrieved translation. Once this is done, the obsolete entry is
2349 not wanted anymore, and may be safely deleted.
2350
2351 </P>
2352 <P>
2353 When the translator finds an untranslated entry and suspects that a
2354 slight variant of the translation exists, she immediately uses <KBD>m</KBD>
2355 to mark the current entry location, then starts chasing obsolete
2356 entries with <KBD>M-SPC</KBD>, hoping to find some translation corresponding
2357 to the unmodified string. Once found, she uses the <KBD>z</KBD> command
2358 for deleting the obsolete entry, knowing that <KBD>z</KBD> also <EM>kills</EM>
2359 the translation, that is, pushes the translation on the kill ring.
2360 Then, <KBD>l</KBD> returns to the initial untranslated entry, <KBD>y</KBD>
2361 then <EM>yanks</EM> the saved translation right into the <CODE>msgstr</CODE>
2362 field. The translator is then free to use <KBD><KBD>RET</KBD></KBD> for fine
2363 tuning the translation contents, and maybe to later use <KBD>e</KBD>,
2364 then <KBD>m</KBD> again, for going on with the next untranslated string.
2365
2366 </P>
2367 <P>
2368 When some sequence of keys has to be typed over and over again, the
2369 translator may find it convenient to become more acquainted with the GNU
2370 Emacs capability of learning these sequences and playing them back on
2371 request. See section `Keyboard Macros' in <CITE>The Emacs Editor</CITE>.
2372
2373 </P>
2374
2375
2376 <H2><A NAME="SEC27" HREF="gettext_toc.html#TOC27">Modifying Comments</A></H2>
2377
2378 <P>
2379 Any translation work done seriously will raise many linguistic
2380 difficulties, for which decisions have to be made, and the choices
2381 further documented. These documents may be saved within the
2382 PO file in the form of translator comments, which the translator
2383 is free to create, delete, or modify at will. These comments may
2384 be useful to herself when she returns to this PO file after a while.
2385 Memory forgets!
2386
2387 </P>
2388 <P>
2389 These commands are somewhat similar to those modifying translations,
2390 so the general indications given for these apply here. See section <A HREF="gettext.html#SEC26">Modifying Translations</A>.
2391
2392 </P>
2393 <DL COMPACT>
2394
2395 <DT><KBD>M-RET</KBD>
2396 <DD>
2397 Interactively edit the translator comments.
2398
2399 <DT><KBD>M-k</KBD>
2400 <DD>
2401 Save the translator comments on the kill ring, and delete them.
2402
2403 <DT><KBD>M-w</KBD>
2404 <DD>
2405 Save the translator comments on the kill ring, without deleting them.
2406
2407 <DT><KBD>M-y</KBD>
2408 <DD>
2409 Replace the translator comments, taking the new ones from the kill ring.
2410
2411 </DL>
2412
2413 <P>
2414 Those commands parallel PO mode commands for modifying the translation
2415 strings, and behave much the same way, except that they handle
2416 the part of PO file comments meant for translator usage, rather
2417 than the translation strings. So, the descriptions given below are
2418 rather succinct, because the full details have already been given.
2419 See section <A HREF="gettext.html#SEC26">Modifying Translations</A>.
2420
2421 </P>
2422 <P>
2423 The command <KBD>M-RET</KBD> (<CODE>po-edit-comment</CODE>) opens a new Emacs
2424 window containing a copy of the translator comments of the current
2425 PO file entry. If there are no such comments, PO mode
2426 understands that the translator wants to add a comment to the entry,
2427 and she is presented with an empty window. Comment marks (<KBD>#</KBD>) and
2428 the space following them are automatically removed before editing,
2429 and reinstated after. For translator comments pertaining to obsolete
2430 entries, the uncommenting and recommenting operations are done twice.
2431 The command <KBD>#</KBD> also has the same effect as <KBD>M-RET</KBD>, and might
2432 be easier to type. Once in the editing window, the keys <KBD>C-c
2433 C-c</KBD> allow the translator to signal that she has finished editing
2434 the comment.
2435
2436 </P>
2437 <P>
2438 The command <KBD>M-k</KBD> (<CODE>po-kill-comment</CODE>) gets rid of all
2439 translator comments, while saving those comments on the kill ring.
2440 The command <KBD>M-w</KBD> (<CODE>po-kill-ring-save-comment</CODE>) takes
2441 a copy of the translator comments on the kill ring, but leaves
2442 them undisturbed in the current entry. The command <KBD>M-y</KBD>
2443 (<CODE>po-yank-comment</CODE>) completely replaces the translator comments
2444 by a string taken at the front of the kill ring. When this command
2445 is immediately repeated, the comments just inserted are withdrawn,
2446 and replaced by other strings taken along the kill ring.
2447
2448 </P>
2449 <P>
2450 On the kill ring, all strings have the same nature. There is no
2451 distinction between <EM>translation</EM> strings and <EM>translator
2452 comments</EM> strings. So, for example, let's presume the translator
2453 has just finished editing a translation, and wants to create a new
2454 translator comment documenting why the previous translation was
2455 not good, just to remember what the problem was. Foreseeing that she
2456 will do that in her documentation, the translator will want to quote
2457 the previous translation in her translator comments. For doing so, she
2458 may initialize the translator comments with the previous translation,
2459 still at the head of the kill ring. Because editing already pushed the
2460 previous translation on the kill ring, she just has to type <KBD>M-w</KBD>
2461 prior to <KBD>#</KBD>, and the previous translation will be right there,
2462 all ready for being introduced by some explanatory text.
2463
2464 </P>
2465 <P>
2466 On the other hand, presume there are some translator comments already
2467 and that the translator wants to add to those comments, instead
2468 of wholly replacing them. Then, she should edit the comment right
2469 away with <KBD>#</KBD>. Once inside the editing window, she can use the
2470 regular GNU Emacs commands <KBD>C-y</KBD> (<CODE>yank</CODE>) and <KBD>M-y</KBD>
2471 (<CODE>yank-pop</CODE>) for getting the previous translation where she likes.
2472
2473 </P>
2474
2475
2476 <H2><A NAME="SEC28" HREF="gettext_toc.html#TOC28">Consulting Auxiliary PO Files</A></H2>
2477
2478 <P>
2479 An upcoming feature of PO mode should help the knowledgeable translator
2480 to take advantage of translations already achieved in other languages
2481 she just happens to know, by providing these other-language translations
2482 as additional context for her own work. Each PO file existing for
2483 the same package the translator is working on, but targeted to a
2484 different mother tongue language, is called an <STRONG>auxiliary</STRONG> PO file.
2485 Commands will exist for declaring and handling auxiliary PO files,
2486 and also for showing contexts for the entry under work. For this to
2487 work fully, all auxiliary PO files will have to be normalized.
2488
2489 </P>
2490
2491
2492 <H1><A NAME="SEC29" HREF="gettext_toc.html#TOC29">Producing Binary MO Files</A></H1>
2493
2494
2495
2496 <H2><A NAME="SEC30" HREF="gettext_toc.html#TOC30">Invoking the <CODE>msgfmt</CODE> Program</A></H2>
2497
2498
2499 <PRE>
2500 Usage: msgfmt [<VAR>option</VAR>] <VAR>filename</VAR>.po ...
2501 </PRE>
2502
2503 <DL COMPACT>
2504
2505 <DT><SAMP>`-a <VAR>number</VAR>'</SAMP>
2506 <DD>
2507 <DT><SAMP>`--alignment=<VAR>number</VAR>'</SAMP>
2508 <DD>
2509 Align strings to <VAR>number</VAR> bytes (default: 1).
2510
2511 <DT><SAMP>`-h'</SAMP>
2512 <DD>
2513 <DT><SAMP>`--help'</SAMP>
2514 <DD>
2515 Display this help and exit.
2516
2517 <DT><SAMP>`-I <VAR>list</VAR>'</SAMP>
2518 <DD>
2519 <DT><SAMP>`--input-path=<VAR>list</VAR>'</SAMP>
2520 <DD>
2521 List of directories searched for input files.
2522
2523 <DT><SAMP>`--no-hash'</SAMP>
2524 <DD>
2525 Binary file will not include the hash table.
2526
2527 <DT><SAMP>`-o <VAR>file</VAR>'</SAMP>
2528 <DD>
2529 <DT><SAMP>`--output-file=<VAR>file</VAR>'</SAMP>
2530 <DD>
2531 Specify output file name as <VAR>file</VAR>.
2532
2533 <DT><SAMP>`-v'</SAMP>
2534 <DD>
2535 <DT><SAMP>`--verbose'</SAMP>
2536 <DD>
2537 Detect and diagnose input file anomalies which might represent
2538 translation errors. The <CODE>msgid</CODE> and <CODE>msgstr</CODE> strings are
2539 studied and compared. It is considered abnormal that one string
2540 starts or ends with a newline while the other does not. Also, both
2541 strings should have the same number of <SAMP>`%'</SAMP> format specifiers,
2542 with matching types. For example, the check will diagnose using
2543 <SAMP>`%.*s'</SAMP> against <SAMP>`%s'</SAMP>, or <SAMP>`%d'</SAMP> against <SAMP>`%s'</SAMP>, or
2544 <SAMP>`%d'</SAMP> against <SAMP>`%x'</SAMP>. It can even handle positional parameters.
2545
2546 <DT><SAMP>`-V'</SAMP>
2547 <DD>
2548 <DT><SAMP>`--version'</SAMP>
2549 <DD>
2550 Output version information and exit.
2551
2552 </DL>
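<P>
The kind of comparison described for <SAMP>`-v'</SAMP> can be sketched as
follows. This is not the code <CODE>msgfmt</CODE> runs: the function names
are hypothetical, the sketch only counts conversion directives instead of
comparing their types or handling positional parameters, and it assumes
that an entry with an empty translation is not compared at all.
</P>

```c
#include <assert.h>
#include <string.h>

/* Nonzero if `s` starts with a newline. */
static int starts_with_newline(const char *s)
{
    return s[0] == '\n';
}

/* Nonzero if `s` ends with a newline. */
static int ends_with_newline(const char *s)
{
    size_t len = strlen(s);
    return len > 0 && s[len - 1] == '\n';
}

/* Count conversion directives, treating "%%" as a literal percent sign. */
static int count_directives(const char *s)
{
    int count = 0;
    for (; *s != '\0'; s++) {
        if (*s != '%')
            continue;
        if (s[1] == '%')
            s++;                  /* skip the second '%' of "%%" */
        else if (s[1] != '\0')
            count++;
    }
    return count;
}

/* Return 0 if the pair looks consistent, or a nonzero code otherwise:
   1 = leading newline differs, 2 = trailing newline differs,
   3 = different number of conversion directives. */
int check_pair(const char *msgid, const char *msgstr)
{
    assert(msgid != NULL && msgstr != NULL);
    if (msgstr[0] == '\0')
        return 0;                 /* assumption: untranslated, so skipped */
    if (starts_with_newline(msgid) != starts_with_newline(msgstr))
        return 1;
    if (ends_with_newline(msgid) != ends_with_newline(msgstr))
        return 2;
    if (count_directives(msgid) != count_directives(msgstr))
        return 3;
    return 0;
}
```

<P>
A translation that drops the final newline of its original, for instance,
is reported with code 2 by this sketch.
</P>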
2553
2554 <P>
2555 If input file is <SAMP>`-'</SAMP>, standard input is read. If output file
2556 is <SAMP>`-'</SAMP>, output is written to standard output.
2557
2558 </P>
2559 <P>
2560 The search path for <CODE>msgfmt</CODE> is <TT>`/usr/local/share/nls/src/'</TT>,
2561 by default. It represents the path to additional directories where
2562 other PO files can be found. This feature could be used for some
2563 PO files for standard libraries, in case we would like to spare
2564 translating their strings over and over again. The <SAMP>`-x'</SAMP> option
2565 could then exclude these strings from the generation.
2566
2567 </P>
2568
2569
2570 <H2><A NAME="SEC31" HREF="gettext_toc.html#TOC31">The Format of GNU MO Files</A></H2>
2571
2572 <P>
2573 The format of the generated MO files is best described by a picture,
2574 which appears below.
2575
2576 </P>
2577 <P>
2578 The first two words serve to identify the file. The magic
2579 number will always signal GNU MO files. The number is stored in the
2580 byte order of the generating machine, so the magic number really is
2581 two numbers: <CODE>0x950412de</CODE> and <CODE>0xde120495</CODE>. The second
2582 word describes the current revision of the file format. For now the
2583 revision is 0. This might change in future versions, and ensures
2584 that the readers of MO files can distinguish new formats from old
2585 ones, so that both can be handled correctly. The version is kept
2586 separate from the magic number, instead of using different magic
2587 numbers for different formats, mainly because <TT>`/etc/magic'</TT> is
2588 not updated often. It might be better to have magic separated from
2589 internal format version identification.
2590
2591 </P>
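<P>
A reader can use the two forms of the magic number to find out whether the
file was written in its own byte order. The following C sketch shows one
way to do this; the names <CODE>mo_byte_order</CODE> and
<CODE>mo_swap32</CODE> are hypothetical and are not part of GNU
<CODE>gettext</CODE>.
</P>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MO_MAGIC         0x950412deU
#define MO_MAGIC_SWAPPED 0xde120495U

enum mo_order { MO_NOT_MO, MO_NATIVE, MO_SWAPPED };

/* Reverse the byte order of a 32-bit word. */
uint32_t mo_swap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000ff00U)
         | ((w << 8) & 0x00ff0000U) | (w << 24);
}

/* Classify a file image by looking at its first word. */
enum mo_order mo_byte_order(const unsigned char *image, size_t size)
{
    uint32_t magic;

    assert(image != NULL);
    if (size < 8)                 /* too short for magic plus revision */
        return MO_NOT_MO;
    memcpy(&magic, image, sizeof magic);   /* read in host byte order */
    if (magic == MO_MAGIC)
        return MO_NATIVE;
    if (magic == MO_MAGIC_SWAPPED)
        return MO_SWAPPED;        /* every later word needs mo_swap32() */
    return MO_NOT_MO;
}
```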
2592 <P>
2593 Next come a number of pointers to later tables in the file, allowing
2594 for the extension of the prefix part of MO files without having to
2595 recompile programs reading them. This might become useful for later
2596 inserting a few flag bits, indication about the charset used, new
2597 tables, or other things.
2598
2599 </P>
2600 <P>
2601 Then, at offset <VAR>O</VAR> and offset <VAR>T</VAR> in the picture, two tables
2602 of string descriptors can be found. In both tables, each string
2603 descriptor uses two 32-bit integers, one for the string length,
2604 another for the offset of the string in the MO file, counting in bytes
2605 from the start of the file. The first table contains descriptors
2606 for the original strings, and is sorted so the original strings
2607 are in increasing lexicographical order. The second table contains
2608 descriptors for the translated strings, and is parallel to the first
2609 table: to find the corresponding translation one has to access the
2610 array slot in the second array with the same index.
2611
2612 </P>
2613 <P>
2614 Having the original strings sorted enables the use of simple binary
2615 search, for when the MO file does not contain a hashing table, or
2616 for when it is not practical to use the hashing table provided in
2617 the MO file. This also has another advantage, as the empty string
2618 in a GNU <CODE>gettext</CODE> PO file is usually <EM>translated</EM> into
2619 some system information attached to that particular MO file, and the
2620 empty string necessarily becomes the first in both the original and
2621 translated tables, making the system information very easy to find.
2622
2623 </P>
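<P>
The binary search over the two parallel tables can be sketched like this.
Plain arrays of C strings stand in for the descriptor tables of a real MO
file, the name <CODE>lookup</CODE> is hypothetical, and the sketch assumes
that the originals were sorted in the order <CODE>strcmp</CODE> defines.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up `msgid` in `originals`, which must be sorted in increasing
   strcmp() order; return the entry of `translations` with the same index,
   or `msgid` itself when there is no match. */
const char *lookup(const char *const *originals,
                   const char *const *translations,
                   size_t count, const char *msgid)
{
    size_t low = 0, high = count;     /* search the half-open range [low, high) */

    assert(originals != NULL && translations != NULL && msgid != NULL);
    while (low < high) {
        size_t mid = low + (high - low) / 2;
        int cmp = strcmp(msgid, originals[mid]);
        if (cmp == 0)
            return translations[mid]; /* same index in the parallel table */
        if (cmp < 0)
            high = mid;
        else
            low = mid + 1;
    }
    return msgid;                     /* untranslated: fall back to the original */
}
```

<P>
Because the empty string sorts before every other string, looking up
<CODE>""</CODE> always lands on index 0, which is where the system
information ends up.
</P>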
2624 <P>
2625 The size <VAR>S</VAR> of the hash table can be zero. In this case, the
2626 hash table itself is not contained in the MO file. Some people might
2627 prefer this because a precomputed hashing table takes disk space, and
2628 does not win <EM>that</EM> much speed. The hash table contains indices
2629 to the sorted array of strings in the MO file. Conflict resolution is
2630 done by double hashing. The precise hashing algorithm used is fairly
2631 dependent on GNU <CODE>gettext</CODE> code, and is not documented here.
2632
2633 </P>
2634 <P>
2635 As for the strings themselves, they follow the hash table, and each
2636 is terminated with a <KBD>NUL</KBD>, and this <KBD>NUL</KBD> is not counted in
2637 the length which appears in the string descriptor. The <CODE>msgfmt</CODE>
2638 program has an option selecting the alignment for MO file strings.
2639 With this option, each string is separately aligned so it starts at
2640 an offset which is a multiple of the alignment value. On some RISC
2641 machines, a correct alignment will speed things up.
2642
2643 </P>
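<P>
The arithmetic behind the alignment option is a simple rounding up of
offsets. The sketch below uses hypothetical helper names and only
illustrates the rule stated above; it is not claimed to be the way
<CODE>msgfmt</CODE> computes its offsets.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `alignment` (at least 1). */
size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment >= 1);
    return (offset + alignment - 1) / alignment * alignment;
}

/* Offset at which the next string may start, given a string of `length`
   bytes stored at `offset`.  The terminating NUL is added here because
   the length in the string descriptor does not count it. */
size_t next_string_offset(size_t offset, size_t length, size_t alignment)
{
    return align_up(offset + length + 1, alignment);
}
```

<P>
With the default alignment of 1 nothing is padded. With an alignment of 4,
a 5-byte string stored at offset 28 is followed by a string starting at
offset 36 rather than 34.
</P>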
2644 <P>
2645 Nothing prevents an MO file from having embedded <KBD>NUL</KBD>s in strings.
2646 However, the program interface currently used already presumes
2647 that strings are <KBD>NUL</KBD> terminated, so embedded <KBD>NUL</KBD>s are
2648 somewhat useless. But MO file format is general enough so other
2649 interfaces would be later possible, if for example, we ever want to
2650 implement wide characters right in MO files, where <KBD>NUL</KBD> bytes may
2651 accidentally appear.
2652
2653 </P>
2654 <P>
2655 This particular issue has been strongly debated in the GNU
2656 <CODE>gettext</CODE> development forum, and it is to be expected that MO file
2657 format will evolve or change over time. It is even possible that many
2658 formats may later be supported concurrently. But surely, we have to
2659 start somewhere, and the MO file format described here is a good start.
2660 Nothing is cast in concrete, and the format may later evolve fairly
2661 easily, so we should feel comfortable with the current approach.
2662
2663 </P>
2664
2665 <PRE>
2666         byte
2667              +------------------------------------------+
2668           0  | magic number = 0x950412de                |
2669              |                                          |
2670           4  | file format revision = 0                 |
2671              |                                          |
2672           8  | number of strings                        |  == N
2673              |                                          |
2674          12  | offset of table with original strings    |  == O
2675              |                                          |
2676          16  | offset of table with translation strings |  == T
2677              |                                          |
2678          20  | size of hashing table                    |  == S
2679              |                                          |
2680          24  | offset of hashing table                  |  == H
2681              |                                          |
2682              .                                          .
2683              .    (possibly more entries later)         .
2684              .                                          .
2685              |                                          |
2686           O  | length &#38; offset 0th string  ----------------.
2687       O + 8  | length &#38; offset 1st string  ------------------.
2688               ...                                    ...   | |
2689 O + ((N-1)*8)| length &#38; offset (N-1)th string           |  | |
2690              |                                          |  | |
2691           T  | length &#38; offset 0th translation  ---------------.
2692       T + 8  | length &#38; offset 1st translation  -----------------.
2693               ...                                    ...   | | | |
2694 T + ((N-1)*8)| length &#38; offset (N-1)th translation      |  | | | |
2695              |                                          |  | | | |
2696           H  | start hash table                         |  | | | |
2697               ...                                    ...   | | | |
2698   H + S * 4  | end hash table                           |  | | | |
2699              |                                          |  | | | |
2700              | NUL terminated 0th string  &#60;----------------' | | |
2701              |                                          |    | | |
2702              | NUL terminated 1st string  &#60;------------------' | |
2703              |                                          |      | |
2704               ...                                    ...       | |
2705              |                                          |      | |
2706              | NUL terminated 0th translation  &#60;---------------' |
2707              |                                          |        |
2708              | NUL terminated 1st translation  &#60;-----------------'
2709              |                                          |
2710               ...                                    ...
2711              |                                          |
2712              +------------------------------------------+
2713 </PRE>
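<P>
To make the picture concrete, here is a C sketch that follows the header
words to the two descriptor tables and returns one original string with its
translation. It assumes the whole file is already in memory, was written in
the byte order of the reading machine, and has revision 0. The function
names are hypothetical and the checks are only the minimum needed to stay
inside the image.
</P>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read the 32-bit word stored at byte `offset`, in host byte order. */
static uint32_t word_at(const unsigned char *image, uint32_t offset)
{
    uint32_t w;
    memcpy(&w, image + offset, sizeof w);
    return w;
}

/* Return string number `i` of the descriptor table at `table` (O or T in
   the picture), or NULL if the descriptor points outside the image. */
static const char *string_at(const unsigned char *image, uint32_t size,
                             uint32_t table, uint32_t i)
{
    uint32_t length, offset;

    if (table > size || (size - table) / 8 <= i)
        return NULL;                       /* descriptor not inside the file */
    length = word_at(image, table + i * 8);
    offset = word_at(image, table + i * 8 + 4);
    if (offset >= size || length >= size - offset)
        return NULL;                       /* string not inside the file */
    if (image[offset + length] != '\0')
        return NULL;                       /* terminating NUL is missing */
    return (const char *) (image + offset);
}

/* Fetch the i-th original string and its translation.
   Returns 0 on success and -1 on any inconsistency. */
int mo_entry(const unsigned char *image, uint32_t size, uint32_t i,
             const char **original, const char **translation)
{
    assert(image != NULL && original != NULL && translation != NULL);
    if (size < 28 || word_at(image, 0) != 0x950412deU
        || word_at(image, 4) != 0)
        return -1;                         /* not a native revision 0 file */
    if (i >= word_at(image, 8))            /* N, the number of strings */
        return -1;
    *original = string_at(image, size, word_at(image, 12), i);     /* O */
    *translation = string_at(image, size, word_at(image, 16), i);  /* T */
    return (*original != NULL && *translation != NULL) ? 0 : -1;
}
```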
2714
2715
2716
2717 <H1><A NAME="SEC32" HREF="gettext_toc.html#TOC32">The User's View</A></H1>
2718
2719 <P>
2720 When GNU <CODE>gettext</CODE> has truly reached its goal, average users
2721 should feel some kind of astonished pleasure, seeing the effect of
2722 that strange kind of magic that just makes their own native language
2723 appear everywhere on their screens. As for naive users, they would
2724 ideally have no special pleasure about it, merely taking their own
2725 language for <EM>granted</EM>, and becoming rather unhappy otherwise.
2726
2727 </P>
2728 <P>
2729 So, let's try to describe here how we would like the magic to operate,
2730 as we want the users' view to be the simplest, among all ways one
2731 could look at GNU <CODE>gettext</CODE>. All other software engineers:
2732 programmers, translators, maintainers, should work together in such a
2733 way that the magic becomes possible. This is a long and progressive
2734 undertaking, and information is available about the progress of the
2735 GNU Translation Project.
2736
2737 </P>
2738 <P>
2739 When a package is distributed, there are two kinds of users:
2740 <STRONG>installers</STRONG> who fetch the distribution, unpack it, configure
2741 it, compile it and install it for themselves or others to use; and
2742 <STRONG>end users</STRONG> who call programs of the package, once these have
2743 been installed at their site. GNU <CODE>gettext</CODE> is offering magic
2744 for both installers and end users.
2745
2746 </P>
2747
2748
2749
2750 <H2><A NAME="SEC33" HREF="gettext_toc.html#TOC33">The Current <TT>`NLS'</TT> Matrix for GNU</A></H2>
2751
2752 <P>
2753 Languages are not equally supported in all GNU packages. To know
2754 if some GNU package uses GNU <CODE>gettext</CODE>, one may check
2755 the distribution for the <TT>`NLS'</TT> information file, for some
2756 <TT>`<VAR>ll</VAR>.po'</TT> files, often kept together into some <TT>`po/'</TT>
2757 directory, or for an <TT>`intl/'</TT> directory. Internationalized
2758 packages usually have many <TT>`<VAR>ll</VAR>.po'</TT> files, where <VAR>ll</VAR>
2759 represents the language. See section <A HREF="gettext.html#SEC35">Magic for End Users</A> for a complete description
2760 of the format for <VAR>ll</VAR>.
2761
2762 </P>
2763 <P>
2764 More generally, a matrix is available for showing the current state
2765 of GNU internationalization, listing which packages are prepared
2766 for multi-lingual messages, and which languages are supported by each.
2767 Because this information changes often, this matrix is not kept within
2768 this GNU <CODE>gettext</CODE> manual. This information is often found in
2769 file <TT>`NLS'</TT> from various GNU distributions, but is also as old
2770 as the distribution itself. A recent copy of this <TT>`NLS'</TT> file,
2771 containing up-to-date information, should generally be found on most
2772 GNU archive sites.
2773
2774 </P>
2775
2776
2777 <H2><A NAME="SEC34" HREF="gettext_toc.html#TOC34">Magic for Installers</A></H2>
2778
2779 <P>
2780 By default, packages fully using GNU <CODE>gettext</CODE>, internally,
2781 are installed in such a way that they allow translation of
2782 messages. At <EM>configuration</EM> time, those packages should
2783 automatically detect whether the underlying host system provides usable
2784 <CODE>catgets</CODE> or <CODE>gettext</CODE> functions. If neither is present,
2785 the GNU <CODE>gettext</CODE> library should be automatically prepared
2786 and used. Installers may use special options at configuration
2787 time for changing this behavior. The command <SAMP>`./configure
2788 --with-gnu-gettext'</SAMP> bypasses system <CODE>catgets</CODE> or <CODE>gettext</CODE> to
2789 use GNU <CODE>gettext</CODE> instead, while <SAMP>`./configure --disable-nls'</SAMP>
2790 produces programs totally unable to translate messages.
2791
2792 </P>
2793 <P>
2794 Internationalized packages have usually many <TT>`<VAR>ll</VAR>.po'</TT>
2795 files. Unless
2796 translations are disabled, all those available are installed together
2797 with the package. However, the environment variable <CODE>LINGUAS</CODE>
2798 may be set, prior to configuration, to limit the installed set.
2799 <CODE>LINGUAS</CODE> should then contain a space separated list of two-letter
2800 codes, stating which languages are allowed.
2801
2802 </P>
2803
2804
2805 <H2><A NAME="SEC35" HREF="gettext_toc.html#TOC35">Magic for End Users</A></H2>
2806
2807 <P>
2808 We consider here those packages using GNU <CODE>gettext</CODE> internally,
2809 and for which the installers did not disable translation at
2810 <EM>configure</EM> time. Then, users only have to set the <CODE>LANG</CODE>
2811 environment variable to the appropriate <SAMP>`<VAR>ll</VAR>'</SAMP> prior to
2812 using the programs in the package. See section <A HREF="gettext.html#SEC33">The Current <TT>`NLS'</TT> Matrix for GNU</A>. For example,
2813 let's presume a German site. At the shell prompt, users merely have to
2814 execute <SAMP>`setenv LANG de'</SAMP> (in <CODE>csh</CODE>) or <SAMP>`export
2815 LANG; LANG=de'</SAMP> (in <CODE>sh</CODE>). They could even do this from their
2816 <TT>`.login'</TT> or <TT>`.profile'</TT> file.
2817
2818 </P>
2819
2820
2821 <H1><A NAME="SEC36" HREF="gettext_toc.html#TOC36">The Programmer's View</A></H1>
2822
2823 <P>
2824 One aim of the current message catalog implementation provided by
2825 GNU <CODE>gettext</CODE> was to use the system's message catalog handling, if the
2826 installer wishes to do so. So we perhaps should first take a look at
2827 the solutions we know about. The people in the POSIX committee did not
2828 manage to agree on one of the semi-official standards which we'll
2829 describe below. In fact they couldn't agree on anything, so they
2830 decided only to include an example of an interface. The major Unix vendors
2831 are split in the usage of the two most important specifications: X/Open's
2832 catgets vs. Uniforum's gettext interface. We'll describe them both and
2833 later explain our solution to this dilemma.
2834
2835 </P>
2836
2837
2838
2839 <H2><A NAME="SEC37" HREF="gettext_toc.html#TOC37">About <CODE>catgets</CODE></A></H2>
2840
2841 <P>
2842 The <CODE>catgets</CODE> implementation is defined in the X/Open Portability
2843 Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
2844 process of creating this standard seemed to be too slow for some of
2845 the Unix vendors so they created their implementations on preliminary
2846 versions of the standard. Of course this leads again to problems while
2847 writing platform independent programs: even the usage of <CODE>catgets</CODE>
2848 does not guarantee a unique interface.
2849
2850 </P>
2851 <P>
2852 Another, personal comment on this is that only a bunch of committee members
2853 could have made this interface. They never really tried to program
2854 using this interface. It is a fast, memory-saving implementation; a
2855 user can happily live with it. But programmers hate it (at least I and
2856 some others do...)
2857
2858 </P>
2859 <P>
2860 But we must not forget one point: after all the trouble with transferring
2861 the rights on Unix(tm) they at last came to X/Open, the very same who
2862 published this specification. This leads me to making the prediction
2863 that this interface will be in future Unix standards (e.g. Spec1170) and
2864 therefore part of all Unix implementations (implementations which are
2865 <EM>allowed</EM> to wear this name).
2866
2867 </P>
2868
2869
2870
2871 <H3><A NAME="SEC38" HREF="gettext_toc.html#TOC38">The Interface</A></H3>
2872
2873 <P>
2874 The interface to the <CODE>catgets</CODE> implementation consists of three
2875 functions which correspond to those used in file access: <CODE>catopen</CODE>
2876 to open the catalog for using, <CODE>catgets</CODE> for accessing the message
2877 tables, and <CODE>catclose</CODE> for closing after work is done. Prototypes
2878 for the functions and the needed definitions are in the
2879 <CODE>&#60;nl_types.h&#62;</CODE> header file.
2880
2881 </P>
2882 <P>
2883 <CODE>catopen</CODE> is used like this:
2884
2885 </P>
2886
2887 <PRE>
2888 nl_catd catd = catopen ("catalog_name", 0);
2889 </PRE>
2890
2891 <P>
2892 The function takes as the argument the name of the catalog. This usually
2893 refers to the name of the program or the package. The second parameter
2894 is not further specified in the standard. I don't even know whether it
2895 is implemented consistently among various systems. So the common advice
2896 is to use <CODE>0</CODE> as the value. The return value is a handle to the
2897 message catalog, equivalent to the file handles returned by <CODE>open</CODE>.
2898
2899 </P>
2900 <P>
2901 This handle is of course used in the <CODE>catgets</CODE> function which can
2902 be used like this:
2903
2904 </P>
2905
2906 <PRE>
char *translation = catgets (catd, set_no, msg_id, "original string");
2908 </PRE>
2909
2910 <P>
The first parameter is this catalog descriptor. The second parameter
specifies the set of messages in this catalog from which the message
described by <CODE>msg_id</CODE> is obtained. <CODE>catgets</CODE> therefore uses
three-stage addressing:
2915
2916 </P>
2917
2918 <PRE>
catalog name =&#62; set number =&#62; message ID =&#62; translation
2920 </PRE>
2921
2922 <P>
The fourth argument is not used to address the translation. It is given
as a default value in case one of the addressing stages fails. One
important thing to remember is that although the return type of
<CODE>catgets</CODE> is <CODE>char *</CODE>, the resulting string <EM>must not</EM>
be changed. It should rather be <CODE>const char *</CODE>, but the standard was
published in 1988, one year before ANSI C.
2929
2930 </P>
2931 <P>
The last of these functions is used and behaves as expected:
2933
2934 </P>
2935
2936 <PRE>
catclose (catd);
2938 </PRE>
2939
2940 <P>
2941 After this no <CODE>catgets</CODE> call using the descriptor is legal anymore.
2942
2943 </P>
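<P>
As a minimal sketch of how the three calls fit together, the helper below
looks up one message and falls back to the default text when the catalog
cannot be opened or the message is missing. The helper name
<CODE>lookup_message</CODE> and its buffer parameters are assumptions made
for this illustration; only <CODE>catopen</CODE>, <CODE>catgets</CODE> and
<CODE>catclose</CODE> come from the interface described above. The text is
copied before <CODE>catclose</CODE> is called, because the pointer returned
by <CODE>catgets</CODE> may no longer be valid once the catalog is closed.
</P>

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper: copy message (set_no, msg_no) from the named
   catalog into buf.  default_text is used when the catalog cannot be
   opened or the message is not found.  */
static const char *
lookup_message (const char *catalog_name, int set_no, int msg_no,
                const char *default_text, char *buf, size_t buf_size)
{
  nl_catd catd = catopen (catalog_name, 0);
  const char *text = default_text;

  /* catopen reports failure by returning (nl_catd) -1.  */
  if (catd != (nl_catd) -1)
    text = catgets (catd, set_no, msg_no, default_text);

  /* Copy the text while the catalog is still open.  */
  snprintf (buf, buf_size, "%s", text);

  if (catd != (nl_catd) -1)
    catclose (catd);
  return buf;
}
```

<P>
A caller would pass a buffer large enough for the longest expected
message, for example
<CODE>lookup_message ("catalog_name", 1, 1, "original string", buf, sizeof buf)</CODE>.
</P>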
2944
2945
2946 <H3><A NAME="SEC39" HREF="gettext_toc.html#TOC39">Problems with the <CODE>catgets</CODE> Interface?!</A></H3>
2947
2948 <P>
Now that this description seemed to be really easy, where are the
problems we speak of? In fact the interface could be used in a
reasonable way, but constructing the message catalogs is a pain. The
reason for this lies in the third argument of <CODE>catgets</CODE>: the unique
message ID. This has to be a numeric value, unique among all messages in
a single set. Perhaps you can imagine the problems of keeping such a list
up to date while changing the source code: add a new message here, remove
one there. Of course a lot of tools have been developed to help organize
this chaos, but each of them fails in one aspect or another. We don't
want to say that the other approach has no problems, but they are far
easier to manage.
2960
2961 </P>
2962
2963
2964 <H2><A NAME="SEC40" HREF="gettext_toc.html#TOC40">About <CODE>gettext</CODE></A></H2>
2965
2966 <P>
The definition of the <CODE>gettext</CODE> interface comes from a Uniforum
proposal and it is followed by at least one major Unix vendor
(Sun) in its latest developments. It is not specified in any official
standard, though.
2971
2972 </P>
2973 <P>
The main points about this solution are that it does not follow the
method of normal file handling (open-use-close) and that it does not
burden the programmer with so many tasks, especially the unique key
handling. Of course a unique key is needed here too, but this key is the
message itself (however long or short it is). See section <A HREF="gettext.html#SEC45">Comparing the Two Interfaces</A> for a
more detailed comparison of the two methods.
2980
2981 </P>
2982 <P>
2983 The following section contains a rather detailed description of the
2984 interface. We make it that detailed because this is the interface
2985 we chose for the GNU <CODE>gettext</CODE> Library. Programmers interested
2986 in using this library will be interested in this description.
2987
2988 </P>
2989
2990
2991
2992 <H3><A NAME="SEC41" HREF="gettext_toc.html#TOC41">The Interface</A></H3>
2993
2994 <P>
The minimal functionality an interface must have is a) to select a
domain the strings are coming from (a single domain for all programs is
not reasonable because its construction and maintenance are difficult,
perhaps impossible) and b) to access a string in a selected domain.
2999
3000 </P>
3001 <P>
This is essentially the description of the <CODE>gettext</CODE> interface. It
has a global domain which unqualified usages reference. Of course this
domain is selectable by the user.
3005
3006 </P>
3007
3008 <PRE>
char *textdomain (const char *domain_name);
3010 </PRE>
3011
3012 <P>
This provides the possibility to change or query the current status of
the current global domain of the <CODE>LC_MESSAGES</CODE> category. The
argument is a null-terminated string, whose characters must be legal
for use in file names. If the <VAR>domain_name</VAR> argument is <CODE>NULL</CODE>,
the function returns the current value. If no value has been set
before, the name of the default domain is returned: <EM>messages</EM>.
Please note that although the return value of <CODE>textdomain</CODE> is of
type <CODE>char *</CODE>, it must not be modified. It is also important to
know that no check is made that the domain is available. If it is not
available, you will notice this by the fact that no translations are
provided.
3023
3024 </P>
3025 <P>
3026 To use a domain set by <CODE>textdomain</CODE> the function
3027
3028 </P>
3029
3030 <PRE>
char *gettext (const char *msgid);
3032 </PRE>
3033
3034 <P>
3035 is to be used. This is the simplest reasonable form one can imagine.
3036 The translation of the string <VAR>msgid</VAR> is returned if it is available
3037 in the current domain. If not available the argument itself is
3038 returned. If the argument is <CODE>NULL</CODE> the result is undefined.
3039
3040 </P>
3041 <P>
One thing which should come to mind is that no explicit dependency on
the domain in use is given. The current value of the domain for the
<CODE>LC_MESSAGES</CODE> locale is used. If this changes between two
executions of the same <CODE>gettext</CODE> call in the program, the two calls
reference different message catalogs.
3047
3048 </P>
3049 <P>
3050 For the easiest case, which is normally used in internationalized GNU
3051 packages, once at the beginning of execution a call to <CODE>textdomain</CODE>
3052 is issued, setting the domain to a unique name, normally the package
3053 name. In the following code all strings which have to be translated are
3054 filtered through the gettext function. That's all, the package speaks
3055 your language.
3056
3057 </P>
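<P>
The easiest case described above might be sketched as follows. The
function names <CODE>init_nls</CODE> and <CODE>greeting</CODE> and the
package name passed in are assumptions made for this illustration; the
call to <CODE>setlocale</CODE> is the usual way to let the environment
select the locale and is not part of the <CODE>gettext</CODE> interface
itself.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical start-up helper: adopt the locale from the environment
   and make `package' the global domain.  Returns the domain name now
   in effect, which must not be modified by the caller.  */
static const char *
init_nls (const char *package)
{
  setlocale (LC_ALL, "");
  return textdomain (package);
}

/* Any translatable string is simply filtered through gettext.  When
   no translation is available, the argument itself is returned.  */
static const char *
greeting (void)
{
  return gettext ("Hello world");
}
```

<P>
A program would call <CODE>init_nls</CODE> once at the beginning of
execution and then use functions such as <CODE>greeting</CODE> wherever
text is presented to the user.
</P>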
3058
3059
3060 <H3><A NAME="SEC42" HREF="gettext_toc.html#TOC42">Solving Ambiguities</A></H3>
3061
3062 <P>
While this single-domain scheme works well for most applications, there
might be a need to get translations from more than one domain. Of
course one could switch between different domains with calls to
<CODE>textdomain</CODE>, but this is neither convenient nor fast. A
possible situation is one under discussion at the time of writing: all
error messages of functions in the set of commonly used functions should
go into a separate domain <CODE>error</CODE>. By this means we would only need
to translate them once.
3071
3072 </P>
3073 <P>
For this reason there are two more functions to retrieve strings:
3075
3076 </P>
3077
3078 <PRE>
char *dgettext (const char *domain_name, const char *msgid);
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);
3082 </PRE>
3083
3084 <P>
Both take an additional argument in the first position, which corresponds
to the argument of <CODE>textdomain</CODE>. The third argument of
<CODE>dcgettext</CODE> allows the use of a locale category other than
<CODE>LC_MESSAGES</CODE>. But I really don't know where this can be useful.
If the <VAR>domain_name</VAR> is <CODE>NULL</CODE> or <VAR>category</VAR> has a
value other than the known ones, the result is undefined. It should also
be noted that this function is not part of the second known
implementation of this function family, the one found in Solaris.
3093
3094 </P>
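<P>
A short sketch of the shared <CODE>error</CODE> domain idea mentioned
above: the two hypothetical helpers below fetch a message from that domain
without touching the global domain set by <CODE>textdomain</CODE>. The
helper names are assumptions for illustration; the second helper only
spells out the category that the first one uses implicitly.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* Look up msgid in the separate "error" domain, leaving the global
   domain untouched.  */
static const char *
shared_error_text (const char *msgid)
{
  return dgettext ("error", msgid);
}

/* The same lookup with the category written out explicitly.  */
static const char *
shared_error_text_explicit (const char *msgid)
{
  return dcgettext ("error", msgid, LC_MESSAGES);
}
```

<P>
As with <CODE>gettext</CODE>, both helpers hand back the original string
when no translation can be found.
</P>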
3095 <P>
A second ambiguity can arise from the fact that perhaps more than one
domain has the same name. This can be solved by specifying where the
needed message catalog files can be found.
3099
3100 </P>
3101
3102 <PRE>
char *bindtextdomain (const char *domain_name,
                      const char *dir_name);
3105 </PRE>
3106
3107 <P>
Calling this function binds the given domain to a file in the specified
directory (how this file is determined follows below). In particular, a
file in the system's default location is no longer preferred over the
specified file (as it would be by solely using <CODE>textdomain</CODE>). A
<CODE>NULL</CODE> pointer for the <VAR>dir_name</VAR> parameter returns the
binding associated with <VAR>domain_name</VAR>. If <VAR>domain_name</VAR> itself
is <CODE>NULL</CODE> nothing happens and a <CODE>NULL</CODE> pointer is returned.
Here again, as for all the other functions, none of the return values
may be modified!
3117
3118 </P>
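<P>
The set and query behaviour just described can be sketched like this. The
helper name <CODE>bind_catalog_dir</CODE> and the directory used in the
usage note are assumptions for illustration only.
</P>

```c
#include <libintl.h>
#include <stddef.h>

/* Hypothetical helper: bind domain_name to dir_name, then query the
   binding back with a NULL directory argument.  Returns NULL when the
   binding could not be made (for example for a NULL domain name).  */
static const char *
bind_catalog_dir (const char *domain_name, const char *dir_name)
{
  if (bindtextdomain (domain_name, dir_name) == NULL)
    return NULL;
  return bindtextdomain (domain_name, NULL);
}
```

<P>
For example, <CODE>bind_catalog_dir ("package", "/tmp/locale")</CODE> would
be expected to hand back the directory just bound. The directory does not
have to contain any catalogs for the binding itself to be recorded.
</P>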
3119
3120
3121 <H3><A NAME="SEC43" HREF="gettext_toc.html#TOC43">Locating Message Catalog Files</A></H3>
3122
3123 <P>
Because many different languages for many different packages have to be
stored, we need some way to attach this information to the message
catalog files. The usual way in Unix environments is to encode it
in the file name. This is also done here. The directory name given in
<CODE>bindtextdomain</CODE>'s second argument (or the default directory),
followed by the value and the name of the locale category and the domain
name, are concatenated:
3131
3132 </P>
3133
3134 <PRE>
<VAR>dir_name</VAR>/<VAR>locale</VAR>/LC_<VAR>category</VAR>/<VAR>domain_name</VAR>.mo
3136 </PRE>
3137
3138 <P>
3139 The default value for <VAR>dir_name</VAR> is system specific. For the GNU
3140 library it's:
3141
3142 <PRE>
/usr/local/share/locale
3144 </PRE>
3145
3146 <P>
3147 <VAR>locale</VAR> is the value of the locale whose name is this
3148 <CODE>LC_<VAR>category</VAR></CODE>. For <CODE>gettext</CODE> and <CODE>dgettext</CODE> this
3149 locale is always <CODE>LC_MESSAGES</CODE>. <CODE>dcgettext</CODE> specifies the
3150 locale by the third argument.<A NAME="DOCF2" HREF="gettext_foot.html#FOOT2">(2)</A> <A NAME="DOCF3" HREF="gettext_foot.html#FOOT3">(3)</A>
3151
3152 </P>
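<P>
The concatenation rule above can be written down directly. The following
is only a sketch of the naming scheme, not the library's own lookup code;
the function name <CODE>build_catalog_path</CODE> and its parameters are
assumptions for illustration.
</P>

```c
#include <stdio.h>

/* Write "dir_name/locale/LC_category/domain_name.mo" into buf.
   Returns 0 on success and -1 when buf is too small.  */
static int
build_catalog_path (char *buf, size_t buf_size, const char *dir_name,
                    const char *locale, const char *category,
                    const char *domain_name)
{
  int needed = snprintf (buf, buf_size, "%s/%s/LC_%s/%s.mo",
                         dir_name, locale, category, domain_name);

  return (needed >= 0 && (size_t) needed < buf_size) ? 0 : -1;
}
```

<P>
With the default directory shown above, the locale <CODE>de</CODE>, the
category <CODE>MESSAGES</CODE> and a domain called <CODE>hello</CODE>, the
result is <TT>`/usr/local/share/locale/de/LC_MESSAGES/hello.mo'</TT>.
</P>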
3153
3154
3155 <H3><A NAME="SEC44" HREF="gettext_toc.html#TOC44">Optimization of the *gettext functions</A></H3>
3156
3157 <P>
At this point of the discussion we should talk about an advantage of the
GNU <CODE>gettext</CODE> implementation. Some readers might have pointed out
that an internationalized program might perform poorly if some
string has to be translated in an inner loop. While this is unavoidable
when the string varies from one run of the loop to the other, it is
simply a waste of time when the string is always the same. Take the
following example:
3165
3166 </P>
3167
3168 <PRE>
{
  while (...)
    {
      puts (gettext ("Hello world"));
    }
}
3175 </PRE>
3176
3177 <P>
3178 When the locale selection does not change between two runs the resulting
3179 string is always the same. One way to use this is:
3180
3181 </P>
3182
3183 <PRE>
{
  char *str = gettext ("Hello world");
  while (...)
    {
      puts (str);
    }
}
3191 </PRE>
3192
3193 <P>
But this solution is not usable in all situations (e.g. when the locale
selection changes) nor is it very readable.
3196
3197 </P>
3198 <P>
The GNU C compiler, version 2.7 and above, provides another solution for
this. To describe this we show here some lines of the
<TT>`intl/libgettext.h'</TT> file. For an explanation of the expression
command block see section `Statements and Declarations in Expressions' in <CITE>The GNU CC Manual</CITE>.
3203
3204 </P>
3205
3206 <PRE>
# if defined __GNUC__ &#38;&#38; __GNUC__ == 2 &#38;&#38; __GNUC_MINOR__ &#62;= 7
#  define dcgettext(domainname, msgid, category) \
  (__extension__ \
   ({ \
     char *result; \
     if (__builtin_constant_p (msgid)) \
       { \
         extern int _nl_msg_cat_cntr; \
         static char *__translation__; \
         static int __catalog_counter__; \
         if (! __translation__ \
             || __catalog_counter__ != _nl_msg_cat_cntr) \
           { \
             __translation__ = \
               dcgettext__ ((domainname), (msgid), (category)); \
             __catalog_counter__ = _nl_msg_cat_cntr; \
           } \
         result = __translation__; \
       } \
     else \
       result = dcgettext__ ((domainname), (msgid), (category)); \
     result; \
    }))
# endif
3231 </PRE>
3232
3233 <P>
The interesting thing here is the <CODE>__builtin_constant_p</CODE> predicate.
This is evaluated at compile time and so optimization can take place
immediately. Two cases are distinguished here. In the first, the
argument to <CODE>gettext</CODE> is not a constant value, in which case the
function <CODE>dcgettext__</CODE>, the real implementation of the
<CODE>dcgettext</CODE> function, is simply called.
3240
3241 </P>
3242 <P>
If the string argument <EM>is</EM> constant we can reuse the translation
obtained earlier as long as the locale selection has not changed. This is
exactly what is done here. The <CODE>_nl_msg_cat_cntr</CODE> variable is
defined in the file <TT>`loadmsgcat.c'</TT>, which is part of
<TT>`libintl.a'</TT>, and is changed whenever a new message catalog is loaded.
3248
3249 </P>
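<P>
The caching idea behind the macro can also be shown without any compiler
extension. The sketch below is not the library's code: the counter
<CODE>catalog_generation</CODE> merely stands in for
<CODE>_nl_msg_cat_cntr</CODE>, and <CODE>slow_lookup</CODE> stands in for
the real <CODE>dcgettext__</CODE> call; both names, like
<CODE>lookup_calls</CODE> and <CODE>cached_hello</CODE>, are assumptions
for illustration.
</P>

```c
#include <stddef.h>

/* Stand-in for _nl_msg_cat_cntr: incremented whenever the set of
   loaded catalogs changes.  */
static int catalog_generation;

/* Counts how often the expensive lookup really runs.  */
static int lookup_calls;

/* Stand-in for the real translation lookup.  */
static const char *
slow_lookup (const char *msgid)
{
  ++lookup_calls;
  return msgid;
}

/* Return the translation of a constant string, repeating the lookup
   only when the catalog generation has changed since the last call.  */
static const char *
cached_hello (void)
{
  static const char *translation;
  static int seen_generation;

  if (translation == NULL || seen_generation != catalog_generation)
    {
      translation = slow_lookup ("Hello world");
      seen_generation = catalog_generation;
    }
  return translation;
}
```

<P>
Calling <CODE>cached_hello</CODE> repeatedly performs a single lookup until
<CODE>catalog_generation</CODE> is incremented, after which exactly one
further lookup takes place.
</P>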
3250
3251
3252 <H2><A NAME="SEC45" HREF="gettext_toc.html#TOC45">Comparing the Two Interfaces</A></H2>
3253
3254 <P>
3255 The following discussion is perhaps a little bit colored. As said
3256 above we implemented GNU <CODE>gettext</CODE> following the Uniforum
3257 proposal and this surely has its reasons. But it should show how we
3258 came to this decision.
3259
3260 </P>
3261 <P>
First we take a look at the development process. When we write an
application using NLS provided by <CODE>gettext</CODE> we proceed as always.
Only when we come to a string which might be seen by the users and thus
has to be translated do we use <CODE>gettext("...")</CODE> instead of
<CODE>"..."</CODE>. At the beginning of each source file (or in a central
header file) we define
3268
3269 </P>
3270
3271 <PRE>
#define gettext(String) (String)
3273 </PRE>
3274
3275 <P>
3276 Even this definition can be avoided when the system supports the
3277 <CODE>gettext</CODE> function in its C library. When we compile this code the
3278 result is the same as if no NLS code is used. When you take a look at
3279 the GNU <CODE>gettext</CODE> code you will see that we use <CODE>_("...")</CODE>
3280 instead of <CODE>gettext("...")</CODE>. This reduces the number of
3281 additional characters per translatable string to <EM>3</EM> (in words:
3282 three).
3283
3284 </P>
3285 <P>
When a production version of the program is finally needed, we simply
replace the definition
3288
3289 </P>
3290
3291 <PRE>
#define _(String) (String)
3293 </PRE>
3294
3295 <P>
3296 by
3297
3298 </P>
3299
3300 <PRE>
#include &#60;libintl.h&#62;
#define _(String) gettext (String)
3303 </PRE>
3304
3305 <P>
and include the header <TT>`libintl.h'</TT>. Additionally we run the
program <TT>`xgettext'</TT> on all source code files which contain
translatable strings and we are done. We have a running program which
does not depend on translations being available, but which can use any
that become available.
3311
3312 </P>
3313 <P>
The same procedure can be followed for the <CODE>gettext_noop</CODE> invocations
(see section <A HREF="gettext.html#SEC17">Special Cases of Translatable Strings</A>). First you can define <CODE>gettext_noop</CODE> as a
no-op macro and later use the definition from <TT>`libintl.h'</TT>. Because
this name is not used in Sun's implementation of <TT>`libintl.h'</TT>,
you should consider the following code for your project:
3319
3320 </P>
3321
3322 <PRE>
#ifdef gettext_noop
# define N_(Str) gettext_noop (Str)
#else
# define N_(Str) (Str)
#endif
3328 </PRE>
3329
3330 <P>
3331 <CODE>N_</CODE> is a short form similar to <CODE>_</CODE>. The <TT>`Makefile'</TT> in
3332 the <TT>`po/'</TT> directory of GNU gettext knows by default both of the
3333 mentioned short forms so you are invited to follow this proposal for
3334 your own ease.
3335
3336 </P>
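<P>
To illustrate how the two short forms work together, here is a small
sketch: <CODE>N_</CODE> only marks strings in a static initializer, where
no function call is possible, and <CODE>_</CODE> translates them at the
point of use. The table <CODE>level_names</CODE>, the function
<CODE>level_name</CODE> and the fallback text are assumptions for
illustration.
</P>

```c
#include <libintl.h>
#include <stddef.h>

#define _(String) gettext (String)
#define N_(String) (String)

/* Strings are only marked for extraction here; a function call is
   not allowed in a static initializer.  */
static const char *const level_names[] =
{
  N_("warning"),
  N_("error")
};

/* The actual translation happens when the string is used.  */
static const char *
level_name (size_t level)
{
  if (level >= sizeof level_names / sizeof level_names[0])
    return _("unknown");
  return _(level_names[level]);
}
```

<P>
Without a matching message catalog the function simply returns the
English strings from the table.
</P>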
3337 <P>
Now to <CODE>catgets</CODE>. The main problem is the work for the
programmer. Every time he comes to a translatable string he has to
define a number (or a symbolic constant) which also has to be defined in
the message catalog file. He also has to take care of duplicate
entries, duplicate message IDs etc. If he wants to have the same
quality in the message catalog as the GNU <CODE>gettext</CODE> program
provides he also has to put the descriptive comments for the strings and
the location in all source code files in the message catalog. This is
nearly a Mission: Impossible.
3347
3348 </P>
3349 <P>
3350 But there are also some points people might call advantages speaking for
3351 <CODE>catgets</CODE>. If you have a single word in a string and this string
3352 is used in different contexts it is likely that in one or the other
3353 language the word has different translations. Example:
3354
3355 </P>
3356
3357 <PRE>
printf ("%s: %d", gettext ("number"), number_of_errors);

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));
3362 </PRE>
3363
3364 <P>
Here we have to translate the string <CODE>"number"</CODE> twice. Even
if you do not speak a language besides English it might be possible to
recognize that the two words have a different meaning. In German the
first appearance has to be translated to <CODE>"Anzahl"</CODE> and the second
to <CODE>"Zahl"</CODE>.
3370
3371 </P>
3372 <P>
Now you can say that this example is really esoteric. And you are
right! This is exactly how we felt about this problem and we decided that
it does not weigh that much. The solution for the above problem could
be very easy:
3377
3378 </P>
3379
3380 <PRE>
printf (gettext ("number: %d"), number_of_errors);

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count);
3386 </PRE>
3387
3388 <P>
We believe that we can solve all conflicts with this method. If it is
difficult one can also consider changing one of the conflicting strings a
little bit. But it is not impossible to overcome.
3392
3393 </P>
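<P>
Wrapped into a function, the whole-sentence approach might look like the
following sketch. The function name <CODE>describe_count</CODE> and its
buffer parameters are assumptions for illustration; the two message
strings are the ones from the example above.
</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Format a complete sentence into buf.  Each variant is a separate
   translatable string, so the translator always sees the full
   context instead of an isolated word.  */
static const char *
describe_count (char *buf, size_t buf_size, int number_count)
{
  const char *format = number_count == 1
    ? gettext ("you should see %d number")
    : gettext ("you should see %d numbers");

  snprintf (buf, buf_size, format, number_count);
  return buf;
}
```

<P>
The translator has to keep the <CODE>%d</CODE> directive in each translated
sentence, since the translated text is used as the format string.
</P>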
3394 <P>
3395 Translator note: It is perhaps appropriate here to tell those English
3396 speaking programmers that the plural form of a noun cannot be formed by
3397 appending a single `s'. Most other languages use different methods. So
3398 you should at least use the method given in the above example.
3399
3400 </P>
3401 <P>
3402 But I have been told that some languages have even more complex rules.
3403 A good approach might be to consider methods like the one used for
3404 <CODE>LC_TIME</CODE> in the POSIX.2 standard.
3405
3406 </P>
3407
3408
3409
3410 <H2><A NAME="SEC46" HREF="gettext_toc.html#TOC46">Using libintl.a in own programs</A></H2>
3411
3412 <P>
Starting with version 0.9.4 the library <CODE>libintl.a</CODE> should be more
or less self-contained, i.e. you can use it in your own programs. The
<TT>`Makefile'</TT> will put the header and the library in directories
selected using the <CODE>$(prefix)</CODE>.
3417
3418 </P>
3419 <P>
One exception to the above is found on HP-UX systems. Here the C library
does not contain the <CODE>alloca</CODE> function (and the HP compiler does
not generate it inlined). But it is not intended to rewrite the whole
library just because of this dumb system. Instead include the
<CODE>alloca</CODE> function in every package in which you use <CODE>libintl.a</CODE>.
3425
3426 </P>
3427
3428
3429
3430 <H2><A NAME="SEC47" HREF="gettext_toc.html#TOC47">Being a <CODE>gettext</CODE> grok</A></H2>
3431
3432 <P>
To fully exploit the functionality of the GNU <CODE>gettext</CODE> library it
is surely helpful to read the source code. But for those who don't want
to spend that much time reading the (sometimes complicated) code, here
is a list of comments:
3437
3438 </P>
3439
3440 <UL>
3441 <LI>Changing the language at runtime
3442
For interactive programs it might be useful to offer a selection of the
language in use at runtime. To understand how to do this one needs to know
how the language is determined while executing the <CODE>gettext</CODE>
function. The method which is presented here only works correctly
with the GNU implementation of the <CODE>gettext</CODE> functions. It is not
possible with underlying <CODE>catgets</CODE> functions or <CODE>gettext</CODE>
functions from the system's C library. The exception is of course the
GNU C Library which uses the GNU gettext Library for message handling.
3451
3452 In the function <CODE>dcgettext</CODE> at every call the current setting of
3453 the highest priority environment variable is determined and used.
3454 Highest priority means here the following list with decreasing
3455 priority:
3456
3457
3458 <OL>
3459 <LI><CODE>LANGUAGE</CODE>
3460
3461 <LI><CODE>LC_ALL</CODE>
3462
3463 <LI><CODE>LC_xxx</CODE>, according to selected locale
3464
3465 <LI><CODE>LANG</CODE>
3466
3467 </OL>
3468
3469 Afterwards the path is constructed using the found value and the
3470 translation file is loaded if available.
3471
What happens now when the value for, say, <CODE>LANGUAGE</CODE> changes? According
to the process explained above the new value of this variable is found
as soon as the <CODE>dcgettext</CODE> function is called. But this also means
that the (perhaps) different message catalog file is loaded. In other
words: the language in use is changed.
3477
But there is one little catch. The code for gcc-2.7.0 and up provides
some optimization. This optimization normally prevents the calling of
the <CODE>dcgettext</CODE> function as long as no new catalog is loaded. But
if <CODE>dcgettext</CODE> is not called, the program also cannot notice that the
<CODE>LANGUAGE</CODE> variable has changed (see section <A HREF="gettext.html#SEC44">Optimization of the *gettext functions</A>). But the
solution is very easy. Include the following code in the language
switching function.
3485
3486
3487 <PRE>
/* Change language.  */
setenv ("LANGUAGE", "fr", 1);

/* Make change known.  */
{
  extern int _nl_msg_cat_cntr;
  ++_nl_msg_cat_cntr;
}
3496 </PRE>
3497
3498 The variable <CODE>_nl_msg_cat_cntr</CODE> is defined in <TT>`loadmsgcat.c'</TT>.
3499
3500 </UL>
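<P>
The priority list given above can be sketched as a small selection
function. This is only an illustration of the documented order, not the
library's own code: the function names are hypothetical, and both the
treatment of an empty value as unset and the fallback to <CODE>"C"</CODE>
are assumptions. <CODE>LC_MESSAGES</CODE> is used as the
<CODE>LC_xxx</CODE> variable because it is the category relevant to
<CODE>gettext</CODE>.
</P>

```c
#include <stdlib.h>

/* Return the first usable value in decreasing priority:
   LANGUAGE, LC_ALL, LC_xxx, LANG.  Empty strings are skipped
   (an assumption of this sketch).  */
static const char *
pick_locale (const char *language, const char *lc_all,
             const char *lc_category, const char *lang)
{
  const char *candidates[4];
  size_t i;

  candidates[0] = language;
  candidates[1] = lc_all;
  candidates[2] = lc_category;
  candidates[3] = lang;

  for (i = 0; i < 4; ++i)
    if (candidates[i] != NULL && candidates[i][0] != '\0')
      return candidates[i];

  /* Assumed fallback when nothing is set.  */
  return "C";
}

/* Apply the rule to the current process environment.  */
static const char *
message_locale_from_environment (void)
{
  return pick_locale (getenv ("LANGUAGE"), getenv ("LC_ALL"),
                      getenv ("LC_MESSAGES"), getenv ("LANG"));
}
```

<P>
Because the environment is consulted on every call, a change to
<CODE>LANGUAGE</CODE> made with <CODE>setenv</CODE> is visible the next
time <CODE>message_locale_from_environment</CODE> runs.
</P>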
3501
3502
3503
3504 <H2><A NAME="SEC48" HREF="gettext_toc.html#TOC48">Temporary Notes for the Programmers Chapter</A></H2>
3505
3506
3507
3508 <H3><A NAME="SEC49" HREF="gettext_toc.html#TOC49">Temporary - Two Possible Implementations</A></H3>
3509
3510 <P>
3511 There are two competing methods for language independent messages:
3512 the X/Open <CODE>catgets</CODE> method, and the Uniforum <CODE>gettext</CODE>
3513 method. The <CODE>catgets</CODE> method indexes messages by integers; the
3514 <CODE>gettext</CODE> method indexes them by their English translations.
3515 The <CODE>catgets</CODE> method has been around longer and is supported
3516 by more vendors. The <CODE>gettext</CODE> method is supported by Sun,
3517 and it has been heard that the COSE multi-vendor initiative is
3518 supporting it. Neither method is a POSIX standard; the POSIX.1
3519 committee had a lot of disagreement in this area.
3520
3521 </P>
3522 <P>
3523 Neither one is in the POSIX standard. There was much disagreement
3524 in the POSIX.1 committee about using the <CODE>gettext</CODE> routines
3525 vs. <CODE>catgets</CODE> (XPG). In the end the committee couldn't
3526 agree on anything, so no messaging system was included as part
3527 of the standard. I believe the informative annex of the standard
3528 includes the XPG3 messaging interfaces, "...as an example of
3529 a messaging system that has been implemented..."
3530
3531 </P>
3532 <P>
3533 They were very careful not to say anywhere that you should use one
3534 set of interfaces over the other. For more on this topic please
3535 see the Programming for Internationalization FAQ.
3536
3537 </P>
3538
3539
3540 <H3><A NAME="SEC50" HREF="gettext_toc.html#TOC50">Temporary - About <CODE>catgets</CODE></A></H3>
3541
3542 <P>
3543 There have been a few discussions of late on the use of
3544 <CODE>catgets</CODE> as a base. I think it important to present both
3545 sides of the argument and hence am opting to play devil's advocate
3546 for a little bit.
3547
3548 </P>
3549 <P>
3550 I'll not deny the fact that <CODE>catgets</CODE> could have been designed
3551 a lot better. It currently has quite a number of limitations and
3552 these have already been pointed out.
3553
3554 </P>
3555 <P>
3556 However there is a great deal to be said for consistency and
3557 standardization. A common recurring problem when writing Unix
3558 software is the myriad portability problems across Unix platforms.
3559 It seems as if every Unix vendor had a look at the operating system
3560 and found parts they could improve upon. Undoubtedly, these
3561 modifications are probably innovative and solve real problems.
3562 However, software developers have a hard time keeping up with all
3563 these changes across so many platforms.
3564
3565 </P>
3566 <P>
3567 And this has prompted the Unix vendors to begin to standardize their
3568 systems. Hence the impetus for Spec1170. Every major Unix vendor
3569 has committed to supporting this standard and every Unix software
developer awaits with glee the day they can write software to this
3571 standard and simply recompile (without having to use autoconf)
3572 across different platforms.
3573
3574 </P>
3575 <P>
3576 As I understand it, Spec1170 is roughly based upon version 4 of the
3577 X/Open Portability Guidelines (XPG4). Because <CODE>catgets</CODE> and
3578 friends are defined in XPG4, I'm led to believe that <CODE>catgets</CODE>
3579 is a part of Spec1170 and hence will become a standardized component
3580 of all Unix systems.
3581
3582 </P>
3583
3584
3585 <H3><A NAME="SEC51" HREF="gettext_toc.html#TOC51">Temporary - Why a single implementation</A></H3>
3586
3587 <P>
3588 Now it seems kind of wasteful to me to have two different systems
installed for accessing message catalogs. If we do want to remedy the
deficiencies of <CODE>catgets</CODE>, why don't we try to expand <CODE>catgets</CODE>
(in a compatible manner) rather than implement an entirely new system?
3592 Otherwise, we'll end up with two message catalog access systems
3593 installed with an operating system - one set of routines for GNU
3594 software, and another set of routines (catgets) for all other software.
3595 Bloated?
3596
3597 </P>
3598 <P>
3599 Supposing another catalog access system is implemented. Which do
3600 we recommend? At least for Linux, we need to attract as many
3601 software developers as possible. Hence we need to make it as easy
3602 for them to port their software as possible. Which means supporting
3603 <CODE>catgets</CODE>. We will be implementing the <CODE>glocale</CODE> code
3604 within our <CODE>libc</CODE>, but does this mean we also have to incorporate
3605 another message catalog access scheme within our <CODE>libc</CODE> as well?
And what about people who are going to be using the <CODE>glocale</CODE>
+ non-<CODE>catgets</CODE> routines? When they port their software to
3608 other platforms, they're now going to have to include the front-end
3609 (<CODE>glocale</CODE>) code plus the back-end code (the non-<CODE>catgets</CODE>
3610 access routines) with their software instead of just including the
3611 <CODE>glocale</CODE> code with their software.
3612
3613 </P>
3614 <P>
3615 Message catalog support is however only the tip of the iceberg.
What about the data for the other locale categories? They also have
3617 a number of deficiencies. Are we going to abandon them as well and
3618 develop another duplicate set of routines (should <CODE>glocale</CODE>
3619 expand beyond message catalog support)?
3620
3621 </P>
3622 <P>
3623 Like many parts of Unix that can be improved upon, we're stuck with balancing
3624 compatibility with the past with useful improvements and innovations for
3625 the future.
3626
3627 </P>
3628
3629
3630 <H3><A NAME="SEC52" HREF="gettext_toc.html#TOC52">Temporary - Double layer solution</A></H3>
3631
3632 <P>
3633 GNU locale implements a <CODE>gettext</CODE>-style interface on top of a
3634 <CODE>catgets</CODE>-style interface.
3635
3636 </P>
3637 <P>
3638 This is not needless complexity. It is absolutely vital, because
3639 it enables <CODE>gettext</CODE> to run on top of <CODE>catgets</CODE>, which
3640 enables Linux International to recommend users use it <EM>today</EM>.
3641
3642 </P>
3643 <P>
3644 Rewriting <CODE>gettext</CODE> so that it could use <EM>either</EM>
3645 <CODE>catgets</CODE> <EM>or</EM> some simpler mechanism would not break
3646 anything, but would not reduce complexity either. It might be
3647 worth doing, but it isn't urgent.
3648
3649 </P>
3650 <P>
3651 In general, simplicity is not enough of a reason to rewrite a
3652 program that works. Simplicity is just one desirable thing.
3653 It is not overridingly important.
3654
3655 </P>
3656
3657
3658 <H3><A NAME="SEC53" HREF="gettext_toc.html#TOC53">Temporary - Notes</A></H3>
3659
3660 <P>
3661 X/Open agreed very late on the standard form so that many
implementations differ from the final form. Both of my systems (old
3663 Linux catgets and Ultrix-4) have a strange variation.
3664
3665 </P>
3666 <P>
3667 OK. After incorporating the last changes I have to spend some time on
3668 making the GNU/Linux libc gettext functions. So in future Solaris is
3669 not the only system having gettext.
3670
3671 </P>
3672
3673
3674 <H1><A NAME="SEC54" HREF="gettext_toc.html#TOC54">The Translator's View</A></H1>
3675
3676
3677
3678 <H2><A NAME="SEC55" HREF="gettext_toc.html#TOC55">Introduction 0</A></H2>
3679
3680 <P>
3681 GNU is going international! The GNU Translation Project is a way
3682 to get maintainers, translators and users all together, so GNU will
3683 gradually become able to speak many native languages.
3684
3685 </P>
3686 <P>
3687 The GNU <CODE>gettext</CODE> tool set contains <EM>everything</EM> maintainers
3688 need for internationalizing their packages for messages. It also
3689 contains quite useful tools for helping translators at localizing
3690 messages to their native language, once a package has already been
3691 internationalized.
3692
3693 </P>
3694 <P>
3695 To achieve the GNU Translation Project, we need many interested
3696 people who like their own language and write it well, and who are also
3697 able to synergize with other translators speaking the same language.
3698 If you'd like to volunteer to <EM>work</EM> at translating messages,
3699 please send mail to your translating team.
3700
3701 </P>
3702 <P>
3703 Each team has its own mailing list, courtesy of Linux
3704 International. You may reach your translating team at the address
3705 <TT>`<VAR>ll</VAR>@li.org'</TT>, replacing <VAR>ll</VAR> by the two-letter ISO 639
3706 code for your language. Language codes are <EM>not</EM> the same as
3707 country codes given in ISO 3166. The following translating teams
3708 exist:
3709
3710 </P>
3711
3712 <BLOCKQUOTE>
3713 <P>
3714 Chinese <CODE>zh</CODE>, Czech <CODE>cs</CODE>, Danish <CODE>da</CODE>, Dutch <CODE>nl</CODE>,
3715 Esperanto <CODE>eo</CODE>, Finnish <CODE>fi</CODE>, French <CODE>fr</CODE>, Irish
3716 <CODE>ga</CODE>, German <CODE>de</CODE>, Greek <CODE>el</CODE>, Italian <CODE>it</CODE>,
3717 Japanese <CODE>ja</CODE>, Indonesian <CODE>in</CODE>, Norwegian <CODE>no</CODE>, Polish
3718 <CODE>pl</CODE>, Portuguese <CODE>pt</CODE>, Russian <CODE>ru</CODE>, Spanish <CODE>es</CODE>,
3719 Swedish <CODE>sv</CODE> and Turkish <CODE>tr</CODE>.
3720 </BLOCKQUOTE>
3721
3722 <P>
3723 For example, you may reach the Chinese translating team by writing to
3724 <TT>`zh@li.org'</TT>. When you become a member of the translating team
3725 for your own language, you may subscribe to its list. For example,
3726 Swedish people can send a message to <TT>`sv-request@li.org'</TT>,
3727 having this message body:
3728
3729 </P>
3730
3731 <PRE>
3732 subscribe
3733 </PRE>
3734
3735 <P>
3736 Keep in mind that team members should be interested in <EM>working</EM>
3737 at translations, or at solving translational difficulties, rather than
3738 merely lurking around. If your team does not exist yet and you want to
3739 start one, please write to <TT>`gnu-translation@prep.ai.mit.edu'</TT>;
3740 you will then reach the GNU coordinator for all translator teams.
3741
3742 </P>
3743 <P>
3744 A handful of GNU packages have already been adapted and provided
3745 with message translations for several languages. Translation
3746 teams have begun to organize, using these packages as a starting
3747 point. But there are many more packages and many languages for
3748 which we have no volunteer translators. If you would like to
3749 volunteer to work at translating messages, please send mail to
3750 <TT>`gnu-translation@prep.ai.mit.edu'</TT> indicating what language(s)
3751 you can work on.
3752
3753 </P>
3754
3755
3756 <H2><A NAME="SEC56" HREF="gettext_toc.html#TOC56">Introduction 1</A></H2>
3757
3758 <P>
It is now official: GNU is going international! Here is the
announcement submitted for the January 1995 GNU Bulletin:
3761
3762 </P>
3763
3764 <BLOCKQUOTE>
3765 <P>
3766 A handful of GNU packages have already been adapted and provided
3767 with message translations for several languages. Translation
3768 teams have begun to organize, using these packages as a starting
3769 point. But there are many more packages and many languages
3770 for which we have no volunteer translators. If you'd like to
3771 volunteer to work at translating messages, please send mail to
3772 <SAMP>`gnu-translation@prep.ai.mit.edu'</SAMP> indicating what language(s)
3773 you can work on.
3774 </BLOCKQUOTE>
3775
3776 <P>
This document should answer many questions for those who are curious
about the process or would like to contribute. Please at least skim
over it; doing so should cut down a little on the high volume of email
generated by this collective effort towards GNU internationalization.
3781
3782 </P>
3783 <P>
GNU programming is done in English, and currently, English is used
as the main language of communication between the national communities
collaborating on the GNU project. This very document is written
in English. This will not change in the foreseeable future.
3788
3789 </P>
3790 <P>
However, there is a strong appetite among national communities for
more software that can communicate using national languages and habits,
and there is an ongoing effort to modify GNU software so that it
becomes able to do so. The experiments conducted so far drew
an enthusiastic response from pretesters, so we believe that GNU
internationalization is destined to succeed.
3797
3798 </P>
3799 <P>
To suggest clarifications, additions or corrections to this
document, please send email to <TT>`gnu-translation@prep.ai.mit.edu'</TT>.
3802
3803 </P>
3804
3805
3806 <H2><A NAME="SEC57" HREF="gettext_toc.html#TOC57">Discussions</A></H2>
3807
3808 <P>
Facing this internationalization effort, a few users expressed their
concerns. Some of these doubts are presented and discussed here.
3811
3812 </P>
3813
3814 <UL>
3815 <LI>Smaller groups
3816
Some languages are not spoken by a very large number of people,
so people speaking them sometimes consider that there may not be
all that much demand for such versions of GNU packages. Moreover, many
people who are <EM>into computers</EM>, in some countries, generally seem
to prefer English versions of their software.

On the other hand, people might enjoy their own language a lot, and
be very motivated to give themselves the pleasure of having
their beloved GNU software speak their mother tongue. They do
themselves a personal favor, and do not pay that much attention to
the number of people benefiting from their work.
3828
3829 <LI>Misinterpretation
3830
Other users are shy about pushing their own language forward, seeing in
this some kind of misplaced propaganda. Someone thought there must be some
users of the language over the networks pestering other people with it.
3834
3835 But any spoken language is worth localization, because there are
3836 people behind the language for whom the language is important and
3837 dear to their hearts.
3838
3839 <LI>Odd translations
3840
The biggest problem is to find the right translations so that
everybody can understand the messages. Translations are usually a
little odd. Some people get so used to English that they may
find translations into their own language "rather pushy, obnoxious
and sometimes even hilarious." As a French-speaking man, I have
experience of those instruction manuals for goods, so poorly
translated into French in Korea or Taiwan...
3848
3849 The fact is that we sometimes have to create a kind of national
3850 computer culture, and this is not easy without the collaboration of
3851 many people liking their mother tongue. This is why translations are
3852 better achieved by people knowing and loving their own language, and
3853 ready to work together at improving the results they obtain.
3854
3855 <LI>Dependencies over the GPL
3856
3857 Some people wonder if using GNU <CODE>gettext</CODE> necessarily brings their package
3858 under the protective wing of the GNU General Public License, when they
3859 do not want to make their program free, or want other kinds of freedom.
3860 The simplest answer is yes.
3861
3862 The mere marking of localizable strings in a package, or conditional
3863 inclusion of a few lines for initialization, is not really including
3864 GPL'ed code. However, the localization routines themselves are under
3865 the GPL and would bring the remainder of the package under the GPL
if they were distributed with it. So, I presume that, for those
for whom this is a problem, it could be circumvented by distributing
a package prepared for localization without the localization routines
themselves, leaving to the end installers the burden of assembling them.
3870
3871 </UL>
3872
3873
3874
3875 <H2><A NAME="SEC58" HREF="gettext_toc.html#TOC58">Organization</A></H2>
3876
3877 <P>
On a larger scale, the true solution would be to organize some kind of
fairly precise setup in which volunteers could participate. I gave
some thought to this idea lately, and realize there will be some
touchy points. I thought of writing to Richard Stallman to launch
such a project, but feel it might be good to shake out the ideas
between ourselves first. Most probably, Linux International has
some experience in the field already, or would perhaps like to
orchestrate the volunteer work. Food for thought, in any case!
3886
3887 </P>
3888 <P>
I guess we have to set up something early, somehow, that will help
the many possible contributors for the same language to coordinate and
avoid duplicating work, and to get in contact to solve together the
problems particular to their tongue (in most languages, there are many
difficulties peculiar to translating technical English). My Swedish
contributor acknowledged these difficulties, and I'm well aware of
them for French.
3896
3897 </P>
3898 <P>
This is surely not a technical issue, but we should arrange things so
that the effort of locale contributors is as useful as possible, despite
the national team layer that sits between contributors and maintainers.
3902
3903 </P>
3904 <P>
3905 GNU needs some setup for coordinating language coordinators.
3906 Localizing evolving GNU programs will surely become a permanent
3907 and continuous activity in GNU, once started. The setup should be
3908 minimally completed and tested before GNU <CODE>gettext</CODE> becomes an official
3909 reality. The email address <TT>`gnu-translation@prep.ai.mit.edu'</TT>
has been set up for receiving offers from volunteers and general
3911 email on these topics. This address reaches the GNU Translation
3912 Project coordinator.
3913
3914 </P>
3915
3916
3917
3918 <H3><A NAME="SEC59" HREF="gettext_toc.html#TOC59">Central Coordination</A></H3>
3919
3920 <P>
I also think that GNU will need, sooner than it thinks, someone to set up
a way to organize and coordinate these groups. Some kind of group
of groups. My opinion is that it would be good for GNU to delegate
this task to a small group of collaborating volunteers, shortly.
Perhaps a list of these national committees can be published
in <TT>`gnu.announce'</TT>.
3927
3928 </P>
3929 <P>
My role as coordinator would simply be to refer to Ulrich any
German-speaking volunteer interested in the localization of GNU programs,
and maybe to help national groups organize initially, while maintaining
national registries until the national groups are ready to take over.
In fact, the coordinator should make it easy for volunteers to get in
contact with one another to create national teams, which should then
select one coordinator per language, or country (regionalized language).
If well done, the coordination should be useful without being an
overwhelming task, for as long as it takes to put delegations in place.
3939
3940 </P>
3941
3942
3943 <H3><A NAME="SEC60" HREF="gettext_toc.html#TOC60">National Teams</A></H3>
3944
3945 <P>
3946 I suggest we look for volunteer coordinators/editors for individual
3947 languages. These people will scan contributions of translation files
3948 for various programs, for their own languages, and will ensure high
3949 and uniform standards of diction.
3950
3951 </P>
3952 <P>
In my experience so far, those who
provide localizations are very enthusiastic about the process, are
more interested in the localization process than in the program they
localize, and want to do many programs, not just one. This seems
to confirm that having a coordinator/editor for each language is a
good idea.
3959
3960 </P>
3961 <P>
3962 We need to choose someone who is good at writing clear and concise
3963 prose in the language in question. That is hard--we can't check
it ourselves. So we need to ask a few people to judge each other's
3965 writing and select the one who is best.
3966
3967 </P>
3968 <P>
I announced my prerelease to a few dozen people, and you would not
believe all the discussions it has generated already. I shudder to think
what will happen when this is launched for real, officially,
worldwide. Who am I to arbitrate between two Czechoslovak users
contradicting each other, for example?
3974
3975 </P>
3976 <P>
I assume that your German is not much better than my French, so
I would not be able to judge these formulations. What I would
suggest is that for each language there be a group of people who
maintain the PO files and judge changes. I suspect there will
be cultural differences in how such groups of people behave.
Some will have relaxed ways, reach consensus easily, and have anyone
in the group relate to the maintainers, while others will fight to the
death, organize heavy administrations up to national standards, and
use strict channels.
3986
3987 </P>
3988 <P>
The German team is setting a good example. Right now, they are
maybe half a dozen people revising each other's translations and
discussing the linguistic issues. I do not even have all the names.
Ulrich Drepper is taking care of coordinating the German team.
He subscribed to all my pretest lists, so I do not even have to warn
him specifically of incoming releases.
3995
3996 </P>
3997 <P>
I'm sure that it is a good idea to get teams for each language working
on translations. That will make the translations better and more
consistent.
4001
4002 </P>
4003
4004
4005
4006 <H4><A NAME="SEC61" HREF="gettext_toc.html#TOC61">Sub-Cultures</A></H4>
4007
4008 <P>
4009 Taking French for example, there are a few sub-cultures around
4010 computers which developed diverging vocabularies. Picking volunteers
4011 here and there without addressing this problem in an organized way,
early in the project, might produce a distasteful mix of GNU programs,
4013 and possibly trigger endless quarrels among those who really care.
4014
4015 </P>
4016 <P>
Keeping some kind of unity in the way French localization of GNU
programs is achieved is a difficult (and delicate) job. Knowing the
Latin character of French people (:-), if we take this the wrong
way, we could end up nowhere, or waste a lot of energy. Maybe we
should begin to address this problem seriously <EM>before</EM> GNU
<CODE>gettext</CODE> becomes officially published. And I suspect that this
means soon!
4024
4025 </P>
4026
4027
4028 <H4><A NAME="SEC62" HREF="gettext_toc.html#TOC62">Organizational Ideas</A></H4>
4029
4030 <P>
4031 I expect the next big changes after the official release. Please note
4032 that I use the German translation of the short GPL message. We need
to set a few good examples before the localization goes out for real
in GNU. Here are a few points to discuss:
4035
4036 </P>
4037
4038 <UL>
4039 <LI>
4040
4041 Each group should have one FTP server (at least one master).
4042
4043 <LI>
4044
The files on the server should reflect the latest version (of
course!) and it should also contain an RCS directory with the
corresponding archives (I don't have this now).
4048
4049 <LI>
4050
There should also be a ChangeLog file (this is more useful than the
RCS archive but can be generated automatically from the latter by
Emacs).
4054
4055 <LI>
4056
A <STRONG>core group</STRONG> should judge questionable changes (for now
this group consists solely of me but I ask some others occasionally;
this also seems to work).
4060
4061 </UL>
4062
4063
4064
4065 <H3><A NAME="SEC63" HREF="gettext_toc.html#TOC63">Mailing Lists</A></H3>
4066
4067 <P>
4068 If we get any inquiries about GNU <CODE>gettext</CODE>, send them on to:
4069
4070 </P>
4071
4072 <PRE>
4073 <TT>`gnu-translation@prep.ai.mit.edu'</TT>
4074 </PRE>
4075
4076 <P>
The <TT>`*-pretest'</TT> lists are quite useful to me; maybe the idea could
be generalized to all GNU packages. But to each maintainer his or her own way!
4079
4080 </P>
4081 <P>
We have a mechanism in place here at
4083 <TT>`gnu.ai.mit.edu'</TT> to track teams, support mailing lists for
4084 them and log members. We have a slight preference that you use it.
4085 If this is OK with you, I can get you clued in.
4086
4087 </P>
4088 <P>
Things are changing! A few years ago, when Daniel Fekete and I
asked for a mailing list for GNU localization, hosted at the FSF, we
were politely invited to organize it anywhere else, and so we did.
For communicating with my pretesters, I later made a handful of
mailing lists located at iro.umontreal.ca and administered by
<CODE>majordomo</CODE>. These lists have been <EM>very</EM> dependable
so far...
4096
4097 </P>
4098 <P>
I suspect that the German team will organize a mailing list for itself
located in Germany, and so forth for other countries. But before they
organize for real, it could surely be useful to offer mailing lists
located at the FSF to each national team. So yes, please explain to me
how I should proceed to create and handle them.
4104
4105 </P>
4106 <P>
We should create temporary mailing lists, one per country, to help
people organize. Temporary, because once the volunteers from a country
have regrouped and structured themselves, it would be fair for them to
take <EM>their</EM> list back home and manage it as they want. My feeling
is that, in the long run, each team should run its own list, from within
its country. There also should be some central list to which all teams could
subscribe as they see fit, as long as each team is represented in it.
4114
4115 </P>
4116
4117
4118 <H2><A NAME="SEC64" HREF="gettext_toc.html#TOC64">Information Flow</A></H2>
4119
4120 <P>
There will surely be some discussion about these messages after the
packages are finally released. If people now send you some proposals
for better messages, how do you proceed? Jim, please note that
right now, as I put forward nearly a dozen localizable programs, I
receive both the translations and the coordination concerns about them.
4126
4127 </P>
4128 <P>
4129 If I put one of my things to pretest, Ulrich receives the announcement
4130 and passes it on to the German team, who make last minute revisions.
4131 Then he submits the translation files to me <EM>as the maintainer</EM>.
4132 For GNU packages I do not maintain, I would not even hear about it.
4133 This scheme could be made to work GNU-wide, I think. For security
reasons, maybe Ulrich (national coordinators, in fact) should update
a central registry kept by GNU (Jim, me, or Len's recruits) once in
a while.
4137
4138 </P>
4139 <P>
4140 In December/January, I was aggressively ready to internationalize
4141 all of GNU, giving myself the duty of one small GNU package per week
4142 or so, taking many weeks or months for bigger packages. But it does
4143 not work this way. I first did all the things I'm responsible for.
I've nothing against some missionary work on other maintainers, but
I'm also losing a lot of energy over it--same debates over again.
4146
4147 </P>
4148 <P>
4149 And when the first localized packages are released we'll get a lot of
4150 responses about ugly translations :-). Surely, and we need to have
4151 beforehand a fairly good idea about how to handle the information
4152 flow between the national teams and the package maintainers.
4153
4154 </P>
4155 <P>
Please start saving somewhere a quick history of each PO file. I know
for sure that the file format will change, allowing for comments.
It would be nice if each file had a kind of log, and references for
those who want to submit comments or gripes, or otherwise contribute.
I sent a proposal for a fast and flexible format, but it has not
yet been accepted by the GNU deciders. I'll tell you when I
have more information about this.
4163
4164 </P>
4165
4166
4167 <H1><A NAME="SEC65" HREF="gettext_toc.html#TOC65">The Maintainer's View</A></H1>
4168
4169 <P>
4170 The maintainer of a package has many responsibilities. One of them
4171 is ensuring that the package will install easily on many platforms,
4172 and that the magic we described earlier (see section <A HREF="gettext.html#SEC32">The User's View</A>) will work
4173 for installers and end users.
4174
4175 </P>
4176 <P>
Of course, there are many possible ways by which GNU <CODE>gettext</CODE>
might be integrated in a distribution, and this chapter does not cover
them in all generality. Instead, it details one possible approach
which is especially suitable for many GNU distributions, because
GNU <CODE>gettext</CODE> is meant to help the internationalization
of the whole GNU project. So, the maintainer's view presented here
presumes that the package already has a <TT>`configure.in'</TT> file and
uses Autoconf.
4185
4186 </P>
4187 <P>
Nevertheless, GNU <CODE>gettext</CODE> may surely be useful for non-GNU
packages, but the maintainers of such packages might have to show
imagination and initiative in organizing their distributions so that
<CODE>gettext</CODE> works for them in all situations. There are surely
many such packages out there.
4193
4194 </P>
4195 <P>
Even if <CODE>gettext</CODE> methods are now stabilizing, slight adjustments
might be needed between successive <CODE>gettext</CODE> versions, so you
should ideally review this chapter again in subsequent releases, looking
for changes.
4200
4201 </P>
4202
4203
4204
4205 <H2><A NAME="SEC66" HREF="gettext_toc.html#TOC66">Flat or Non-Flat Directory Structures</A></H2>
4206
4207 <P>
Some GNU packages are distributed as <CODE>tar</CODE> files which unpack
in a single directory; these are said to be <STRONG>flat</STRONG> distributions.
Other GNU packages have a one-level hierarchy of subdirectories, using
4211 for example a subdirectory named <TT>`doc/'</TT> for the Texinfo manual and
4212 man pages, another called <TT>`lib/'</TT> for holding functions meant to
4213 replace or complement C libraries, and a subdirectory <TT>`src/'</TT> for
4214 holding the proper sources for the package. These other distributions
4215 are said to be <STRONG>non-flat</STRONG>.
4216
4217 </P>
4218 <P>
4219 For now, we cannot say much about flat distributions. A flat
4220 directory structure has the disadvantage of increasing the difficulty
4221 of updating to a new version of GNU <CODE>gettext</CODE>. Also, if you have
4222 many PO files, this could somewhat pollute your single directory.
4223 In the GNU <CODE>gettext</CODE> distribution, the <TT>`misc/'</TT> directory
4224 contains a shell script named <TT>`combine-sh'</TT>. That script may
4225 be used for combining all the C files of the <TT>`intl/'</TT> directory
4226 into a pair of C files (one <TT>`.c'</TT> and one <TT>`.h'</TT>). Those two
4227 generated files would fit more easily in a flat directory structure,
4228 and you will then have to add these two files to your project.
4229
4230 </P>
4231 <P>
Maybe because GNU <CODE>gettext</CODE> itself has a non-flat structure,
we have more experience with this approach, and this is what will be
described in the remainder of this chapter. Some maintainers might
use this as an opportunity to unflatten their package structure.
Only later, once we have gained more experience adapting GNU <CODE>gettext</CODE>
to flat distributions, might we add some notes about how to proceed
in flat situations.
4239
4240 </P>
4241
4242
4243 <H2><A NAME="SEC67" HREF="gettext_toc.html#TOC67">Prerequisite Works</A></H2>
4244
4245 <P>
Some work is required before GNU <CODE>gettext</CODE> can be used
in one of your packages. These tasks are general in a way that
escapes the point by point descriptions used in the remainder
of this chapter. So, we describe them here.
4250
4251 </P>
4252
4253 <UL>
4254 <LI>
4255
Before attempting to use <CODE>gettextize</CODE>, you should install some
other packages first. Ensure that recent versions of GNU <CODE>m4</CODE>,
GNU Autoconf and GNU <CODE>gettext</CODE> are already installed at your site,
and if not, proceed to do this first. If you have to install these things,
beware that GNU <CODE>m4</CODE> must be fully installed before GNU Autoconf
is even <EM>configured</EM>.
4262
Those three packages are needed only by you, as a maintainer; the
installers of your own package and end users do not really need any
of GNU <CODE>m4</CODE>, GNU Autoconf or GNU <CODE>gettext</CODE> for successfully
installing and running your package, with messages properly translated.
But this is not completely true if you provide internationalized
shell scripts within your own package: GNU <CODE>gettext</CODE> must
then be installed at the user site if the end users want to see the
translation of shell script messages.
4271
4272 <LI>
4273
Your package should use Autoconf and have a <TT>`configure.in'</TT> file.
If it does not, you have to learn how. The Autoconf documentation
is quite well written; it is a good idea to print it and get
familiar with it.
4278
4279 <LI>
4280
4281 Your C sources should have already been modified according to
4282 instructions given earlier in this manual. See section <A HREF="gettext.html#SEC13">Preparing Program Sources</A>.
4283
4284 <LI>
4285
Your <TT>`po/'</TT> directory should receive all PO files submitted to you
by the translator teams, each having <TT>`<VAR>ll</VAR>.po'</TT> as a name.
It is not usually easy to get translation
work done before your package gets internationalized and available!
Since the cycle has to start somewhere, the easiest for the maintainer
is to start with absolutely no PO files, and wait until various
translator teams get interested in your package, and submit PO files.
4293
4294 </UL>
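
<P>
As a rough way to check the first prerequisite, and only as a sketch, the
commands below print the version of each tool found on your <CODE>PATH</CODE>.
They assume that each program accepts the usual GNU <SAMP>`--version'</SAMP>
option; which versions count as recent enough depends on your package and
is not implied here.

</P>

<PRE>
# Each command should print a version banner; a "command not found"
# error means the corresponding package still has to be installed.
m4 --version
autoconf --version
gettext --version
</PRE>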
4295
4296 <P>
It is worth adding here a few words about how the maintainer should
ideally behave with PO file submissions. As a maintainer, your
role is to authenticate the origin of the submission as being the
representative of the appropriate GNU translating team (forward the
submission to <TT>`gnu-translation@prep.ai.mit.edu'</TT> in case of
doubt), to ensure that the PO file format is not severely broken and
does not prevent successful installation, and for the rest, merely
to put these PO files in <TT>`po/'</TT> for distribution.
4305
4306 </P>
4307 <P>
As a maintainer, you do not have to take on your shoulders the
responsibility of checking if the translations are adequate or
complete, and should avoid diving into linguistic matters. Translation
teams drive themselves and are fully responsible for their linguistic
choices for GNU. Keep in mind that translator teams are <EM>not</EM>
driven by maintainers. You can help by carefully redirecting all
communications and reports from users about linguistic matters to the
appropriate translation team, or by explaining to users how to reach or join
their team. The simplest might be to send them the <TT>`NLS'</TT> file.
4317
4318 </P>
4319 <P>
Maintainers should <EM>never ever</EM> apply PO file bug reports
themselves, short-cutting translation teams. If some translator has
difficulty getting some of her points through her team, it should not be
an option for her to negotiate translations directly with maintainers.
Teams ought to settle their problems themselves, if any. If you, as
a maintainer, ever think there is a real problem with a team, please
never try to <EM>solve</EM> a team's problem on your own.
4327
4328 </P>
4329
4330
4331 <H2><A NAME="SEC68" HREF="gettext_toc.html#TOC68">Invoking the <CODE>gettextize</CODE> Program</A></H2>
4332
4333 <P>
4334 Some files are consistently and identically needed in every package
4335 internationalized through GNU <CODE>gettext</CODE>. As a matter of
4336 convenience, the <CODE>gettextize</CODE> program puts all these files right
4337 in your package. This program has the following synopsis:
4338
4339 </P>
4340
4341 <PRE>
4342 gettextize [ <VAR>option</VAR>... ] [ <VAR>directory</VAR> ]
4343 </PRE>
4344
4345 <P>
4346 and accepts the following options:
4347
4348 </P>
4349 <DL COMPACT>
4350
4351 <DT><SAMP>`-f'</SAMP>
4352 <DD>
4353 <DT><SAMP>`--force'</SAMP>
4354 <DD>
4355 Force replacement of files which already exist.
4356
4357 <DT><SAMP>`-h'</SAMP>
4358 <DD>
4359 <DT><SAMP>`--help'</SAMP>
4360 <DD>
4361 Display this help and exit.
4362
4363 <DT><SAMP>`--version'</SAMP>
4364 <DD>
4365 Output version information and exit.
4366
4367 </DL>
4368
4369 <P>
4370 If <VAR>directory</VAR> is given, this is the top level directory of a
4371 package to prepare for using GNU <CODE>gettext</CODE>. If not given, it
4372 is assumed that the current directory is the top level directory of
4373 such a package.
4374
4375 </P>
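
<P>
To illustrate the synopsis, here is a sketch of two typical invocations.
The path <TT>`/home/me/hello-1.0'</TT> is a made-up package directory used
only for this example.

</P>

<PRE>
# Prepare the package whose top level is the current directory.
cd /home/me/hello-1.0
gettextize

# Prepare a package named explicitly, replacing files that already exist.
gettextize --force /home/me/hello-1.0
</PRE>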
4376 <P>
4377 The program <CODE>gettextize</CODE> provides the following files. However,
4378 no existing file will be replaced unless the option <CODE>--force</CODE>
4379 (<CODE>-f</CODE>) is specified.
4380
4381 </P>
4382
4383 <OL>
4384 <LI>
4385
The <TT>`NLS'</TT> file is copied into the main directory of your package,
4387 the one being at the top level. This file gives the main indications
4388 about how to install and use the Native Language Support features
4389 of your program. You might elect to use a more recent copy of this
4390 <TT>`NLS'</TT> file than the one provided through <CODE>gettextize</CODE>, if
4391 you have one handy. You may also fetch a more recent copy of file
4392 <TT>`NLS'</TT> from most GNU archive sites.
4393
4394 <LI>
4395
A <TT>`po/'</TT> directory is created for eventually holding
all translation files, but initially only containing the file
<TT>`po/Makefile.in.in'</TT> from the GNU <CODE>gettext</CODE> distribution
(beware the double <SAMP>`.in'</SAMP> in the file name). If the <TT>`po/'</TT>
directory already exists, it will be preserved along with the files
it contains, and only <TT>`Makefile.in.in'</TT> will be overwritten.
4402
4403 <LI>
4404
An <TT>`intl/'</TT> directory is created and filled with most of the files
4406 originally in the <TT>`intl/'</TT> directory of the GNU <CODE>gettext</CODE>
4407 distribution. Also, if option <CODE>--force</CODE> (<CODE>-f</CODE>) is given,
4408 the <TT>`intl/'</TT> directory is emptied first.
4409
4410 </OL>
4411
4412 <P>
If your site supports symbolic links, <CODE>gettextize</CODE> will not
actually copy the files into your package, but establish symbolic
links instead. This avoids duplicating the disk space needed in
all packages. Merely using the <SAMP>`-h'</SAMP> option while creating the
<CODE>tar</CODE> archive of your distribution will replace each link with an
actual copy in the distribution archive. So, to insist, you really
should use the <SAMP>`-h'</SAMP> option with <CODE>tar</CODE> within the <CODE>dist</CODE>
goal of your main <TT>`Makefile.in'</TT>.
4421
4422 </P>
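
<P>
A minimal sketch of such a <CODE>dist</CODE> goal follows. It is not taken
from any particular package: the variable <CODE>distdir</CODE> and the way the
staging directory gets filled are assumptions made for illustration, and
<CODE>PACKAGE</CODE> and <CODE>VERSION</CODE> are assumed to be substituted by
<CODE>configure</CODE>. The point is only the <SAMP>`h'</SAMP> among the
<CODE>tar</CODE> options. Remember that the command lines of a goal in a real
<TT>`Makefile.in'</TT> must begin with a tab.

</P>

<PRE>
PACKAGE = @PACKAGE@
VERSION = @VERSION@
# Hypothetical name of the staging directory, e.g. hello-1.0
distdir = $(PACKAGE)-$(VERSION)

# Filling $(distdir) with the distributed files is left out of this sketch.
# The 'h' option makes tar archive the files that symbolic links point to,
# rather than the links themselves.
dist:
	tar chf $(distdir).tar $(distdir)
	gzip -9 $(distdir).tar
</PRE>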
4423 <P>
4424 It is interesting to understand that most new files for supporting
4425 GNU <CODE>gettext</CODE> facilities in one package go in <TT>`intl/'</TT>
4426 and <TT>`po/'</TT> subdirectories. One distinction between these two
4427 directories is that <TT>`intl/'</TT> is meant to be completely identical
4428 in all packages using GNU <CODE>gettext</CODE>, while all newly created
4429 files, which have to be different, go into <TT>`po/'</TT>. There is a
4430 common <TT>`Makefile.in.in'</TT> in <TT>`po/'</TT>, because the <TT>`po/'</TT>
4431 directory needs its own <TT>`Makefile'</TT>, and it has been designed so
4432 it can be identical in all packages.
4433
4434 </P>
4435
4436
4437 <H2><A NAME="SEC69" HREF="gettext_toc.html#TOC69">Files You Must Create or Alter</A></H2>
4438
4439 <P>
4440 Besides files which are automatically added through <CODE>gettextize</CODE>,
4441 there are many files needing revision for properly interacting with
4442 GNU <CODE>gettext</CODE>. If you are closely following GNU standards for
4443 Makefile engineering and auto-configuration, the adaptations should
4444 be easier to achieve. Here is a point by point description of the
4445 changes needed in each.
4446
4447 </P>
4448 <P>
So, here comes a list of files, each one followed by a description of
all alterations it needs. Many examples are taken from the GNU
<CODE>gettext</CODE> 0.10 distribution itself. You may indeed
4452 refer to the source code of the GNU <CODE>gettext</CODE> package, as it
4453 is intended to be a good example and master implementation for using
4454 its own functionality.
4455
4456 </P>
4457
4458
4459
<H3><A NAME="SEC70" HREF="gettext_toc.html#TOC70"><TT>`POTFILES.in'</TT> in <TT>`po/'</TT></A></H3>
4461
4462 <P>
4463 The <TT>`po/'</TT> directory should receive a file named
4464 <TT>`POTFILES.in'</TT>. This file tells which files, among all program
4465 sources, have marked strings needing translation. Here is an example
4466 of such a file:
4467
4468 </P>
4469
4470 <PRE>
4471 # List of source files containing translatable strings.
4472 # Copyright (C) 1995 Free Software Foundation, Inc.
4473
4474 # Common library files
4475 lib/error.c
4476 lib/getopt.c
4477 lib/xmalloc.c
4478
4479 # Package source files
4480 src/gettextp.c
4481 src/msgfmt.c
4482 src/xgettext.c
4483 </PRE>
4484
4485 <P>
4486 Comment lines starting with <SAMP>`#'</SAMP> and blank lines are ignored.  All other lines
4487 list those source files containing strings marked for translation
4488 (see section <A HREF="gettext.html#SEC15">How Marks Appears in Sources</A>), in a notation relative to the top level
4489 of your whole distribution, rather than the location of the
4490 <TT>`POTFILES.in'</TT> file itself.
4491
4492 </P>
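<P>
As a rough illustration of the rules above, the following shell sketch
prints only the source file names found in a <TT>`POTFILES.in'</TT> file.
It is not part of GNU <CODE>gettext</CODE>; the function name
<CODE>list_potfiles</CODE> is hypothetical, and the actual processing done
at configure time may differ in its details.

</P>

<PRE>
# list_potfiles FILE
# Print the entries of FILE, dropping comment lines that start
# with `#' and lines that are empty or contain only white space.
# (Hypothetical helper, for illustration only.)
list_potfiles () {
  sed -e '/^#/d' -e '/^[[:space:]]*$/d' "$1"
}
</PRE>

<P>
Calling <SAMP>`list_potfiles po/POTFILES.in'</SAMP> from the top level of
the distribution would list one source file per line, each relative to
that top level.

</P>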
4493
4494
4495 <H3><A NAME="SEC71" HREF="gettext_toc.html#TOC71"><TT>`configure.in'</TT> at top level</A></H3>
4496
4497
4498 <OL>
4499 <LI>Declare the package and version.
4500
4501 This is done by a set of lines like these:
4502
4503
4504 <PRE>
4505 PACKAGE=gettext
4506 VERSION=0.10
4507 AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
4508 AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
4509 AC_SUBST(PACKAGE)
4510 AC_SUBST(VERSION)
4511 </PRE>
4512
4513 Of course, you replace <SAMP>`gettext'</SAMP> with the name of your package,
4514 and <SAMP>`0.10'</SAMP> with its version number, exactly as they
4515 should appear in the packaged <CODE>tar</CODE> file name of your distribution
4516 (<TT>`gettext-0.10.tar.gz'</TT>, here).
4517
4518 <LI>Declare the available translations.
4519
4520 This is done by defining <CODE>ALL_LINGUAS</CODE> to the whitespace-separated,
4521 quoted list of available languages, in a single line, like this:
4522
4523
4524 <PRE>
4525 ALL_LINGUAS="de fr"
4526 </PRE>
4527
4528 This example means that German and French PO files are available, so
4529 that these languages are currently supported by your package. If you
4530 want to further restrict, at installation time, the set of installed
4531 languages, this should not be done by modifying <CODE>ALL_LINGUAS</CODE> in
4532 <TT>`configure.in'</TT>, but rather by using the <CODE>LINGUAS</CODE> environment
4533 variable (see section <A HREF="gettext.html#SEC34">Magic for Installers</A>).
4534
4535 <LI>Check for internationalization support.
4536
4537 Here is the main <CODE>m4</CODE> macro for triggering internationalization
4538 support. Just add this line to <TT>`configure.in'</TT>:
4539
4540
4541 <PRE>
4542 ud_GNU_GETTEXT
4543 </PRE>
4544
4545 This call is purposely simple, even if it generates a lot of configure
4546 time checking and actions.
4547
4548 <LI>Obtain some <TT>`libintl.h'</TT> header file.
4549
4550 Once you have called <CODE>ud_GNU_GETTEXT</CODE> in <TT>`configure.in'</TT>, use:
4551
4552
4553 <PRE>
4554 AC_LINK_FILES($nls_cv_header_libgt, $nls_cv_header_intl)
4555 </PRE>
4556
4557 This will create one header file <TT>`libintl.h'</TT>. The reason for
4558 this has to do with the fact that some systems, using the Uniforum
4559 message handling functions, already have a file of this name.
4560
4561 The <CODE>AC_LINK_FILES</CODE> call has not been integrated into the
4562 <CODE>ud_GNU_GETTEXT</CODE> macro because there can be only one such call
4563 in a <TT>`configure'</TT> file. If you already use it, you will have to
4564 <EM>merge</EM> the needed <CODE>AC_LINK_FILES</CODE> within yours, by adding
4565 the first argument at the end of the list of your first argument,
4566 and adding the second argument at the end of the list of your second
4567 argument.
4568
4569 <LI>Have output files created.
4570
4571 The <CODE>AC_OUTPUT</CODE> directive, at the end of your <TT>`configure.in'</TT>
4572 file, needs to be modified in two ways:
4573
4574
4575 <PRE>
4576 AC_OUTPUT([<VAR>existing configuration files</VAR> intl/Makefile po/Makefile.in],
4577 [sed -e "/POTFILES =/r po/POTFILES" po/Makefile.in &#62; po/Makefile
4578 <VAR>existing additional actions</VAR>])
4579 </PRE>
4580
4581 The modification to the first argument to <CODE>AC_OUTPUT</CODE> asks
4582 for substitution in the <TT>`intl/'</TT> and <TT>`po/'</TT> directories.
4583 Note the <SAMP>`.in'</SAMP> suffix used for <TT>`po/'</TT> only. This is because
4584 the distributed file is really <TT>`po/Makefile.in.in'</TT>.
4585
4586 The modification to the second argument ensures that <TT>`po/Makefile'</TT>
4587 gets generated out of the <TT>`po/Makefile.in'</TT> just created, including
4588 in it the <TT>`po/POTFILES'</TT> produced by <CODE>ud_GNU_GETTEXT</CODE>.
4589 Two steps are needed because <TT>`po/POTFILES'</TT> can get lengthy in
4590 some packages, too lengthy in fact for being able to merely use an
4591 Autoconf substituted variable, as many <CODE>sed</CODE>s cannot handle very
4592 long lines.
4593
4594 </OL>
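<P>
The second argument of <CODE>AC_OUTPUT</CODE> shown above relies on the
<CODE>sed</CODE> <SAMP>`r'</SAMP> command, which copies the contents of a
file to the output right after each line matching the address.  The
following shell sketch isolates that step so it can be tried by hand.
The function name <CODE>splice_potfiles</CODE> is hypothetical and not
part of GNU <CODE>gettext</CODE>; it writes to standard output instead
of creating <TT>`po/Makefile'</TT>.

</P>

<PRE>
# splice_potfiles MAKEFILE_IN POTFILES
# Print MAKEFILE_IN with the contents of POTFILES read in right
# after every line containing "POTFILES =".
# (Hypothetical helper, for illustration only; POTFILES must be
# a file name without newlines.)
splice_potfiles () {
  sed -e "/POTFILES =/r $2" "$1"
}
</PRE>

<P>
Because the file list is read in as separate lines rather than
substituted into a single variable value, no line of the resulting
<TT>`Makefile'</TT> has to hold the whole list at once.

</P>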
4595
4596
4597
4598 <H3><A NAME="SEC72" HREF="gettext_toc.html#TOC72"><TT>`aclocal.m4'</TT> at top level</A></H3>
4599
4600 <P>
4601 If you do not have an <TT>`aclocal.m4'</TT> file in your distribution,
4602 the simplest approach is to take a copy of <TT>`aclocal.m4'</TT> from
4603 GNU <CODE>gettext</CODE>. But to be precise, you only need macros
4604 <CODE>ud_LC_MESSAGES</CODE>, <CODE>ud_WITH_NLS</CODE> and <CODE>ud_GNU_GETTEXT</CODE>,
4605 so you may use an editor and remove macros you do not need.
4606
4607 </P>
4608 <P>
4609 If you already have an <TT>`aclocal.m4'</TT> file, then you will have
4610 to merge the said macros into your <TT>`aclocal.m4'</TT>. Note that if
4611 you are upgrading from a previous release of GNU <CODE>gettext</CODE>, you
4612 should most probably <EM>replace</EM> the said macros, as they usually
4613 change a little from one release of GNU <CODE>gettext</CODE> to the next.
4614 Their contents may vary as we get more experience with strange systems
4615 out there.
4616
4617 </P>
4618 <P>
4619 These macros check for the internationalization support functions
4620 and related information.  Hopefully, once stabilized, these macros
4621 might be integrated in the standard Autoconf set, because this
4622 piece of <CODE>m4</CODE> code will be the same for all projects using GNU
4623 <CODE>gettext</CODE>.
4624
4625 </P>
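<P>
After merging, a quick check that the three macros named above are
mentioned in your <TT>`aclocal.m4'</TT> can be sketched in shell as
follows.  The function name <CODE>missing_gettext_macros</CODE> is
hypothetical and not part of GNU <CODE>gettext</CODE>.  The test is
deliberately crude: it only looks for each macro name anywhere in the
file, so a macro that is merely called by another one would also count
as present.

</P>

<PRE>
# missing_gettext_macros FILE
# Report each of the three macro names that does not appear
# anywhere in FILE.  Prints nothing when all three are found.
# (Hypothetical helper, for illustration only.)
missing_gettext_macros () {
  for macro in ud_LC_MESSAGES ud_WITH_NLS ud_GNU_GETTEXT; do
    grep -q "$macro" "$1" || echo "missing: $macro"
  done
}
</PRE>

<P>
Running <SAMP>`missing_gettext_macros aclocal.m4'</SAMP> at the top level
prints one line per macro name that could not be found.

</P>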
4626
4627
4628 <H3><A NAME="SEC73" HREF="gettext_toc.html#TOC73"><TT>`acconfig.h'</TT> at top level</A></H3>
4629
4630 <P>
4631 If you do not have an <TT>`acconfig.h'</TT> file in your distribution,
4632 the simplest approach is to take a copy of <TT>`acconfig.h'</TT> from
4633 GNU <CODE>gettext</CODE>. But to be precise, you only need the
4634 lines and comments for <CODE>ENABLE_NLS</CODE>, <CODE>HAVE_CATGETS</CODE>,
4635 <CODE>HAVE_GETTEXT</CODE> and <CODE>HAVE_LC_MESSAGES</CODE>, so you may use
4636 an editor and remove everything else. If you already have an
4637 <TT>`acconfig.h'</TT> file, then you should merge the said definitions
4638 into your <TT>`acconfig.h'</TT>.
4639
4640 </P>
4641
4642
4643 <H3><A NAME="SEC74" HREF="gettext_toc.html#TOC74"><TT>`Makefile.in'</TT> at top level</A></H3>
4644
4645 <P>
4646 Here are a few modifications you need to make to your main, top-level
4647 <TT>`Makefile.in'</TT> file.
4648
4649 </P>
4650
4651 <OL>
4652 <LI>
4653
4654 Add the following lines near the beginning of your <TT>`Makefile.in'</TT>,
4655 so the <SAMP>`dist:'</SAMP> goal will work properly (as explained further down):
4656
4657
4658 <PRE>
4659 PACKAGE = @PACKAGE@
4660 VERSION = @VERSION@
4661 </PRE>
4662
4663 <LI>
4664
4665 Add file <TT>`NLS'</TT> to the <CODE>DISTFILES</CODE> definition, so the file gets
4666 distributed.
4667
4668 <LI>
4669
4670 Wherever you process subdirectories in your <TT>`Makefile.in'</TT>, be
4671 sure you also process <CODE>@INTLSUB@</CODE> and <CODE>@POSUB@</CODE>, which
4672 are replaced respectively by <SAMP>`intl'</SAMP> and <SAMP>`po'</SAMP>, or empty
4673 when the configuration process decides these directories should
4674 not be processed.
4675
4676 Here is an example of a canonical order of processing. In this
4677 example, we also define <CODE>SUBDIRS</CODE> in <CODE>Makefile.in</CODE> for it
4678 to be further used in the <SAMP>`dist:'</SAMP> goal.
4679
4680
4681 <PRE>
4682 SUBDIRS = doc lib @INTLSUB@ src @POSUB@
4683 </PRE>
4684
4685 that you will have to adapt to your own package.
4686
4687 <LI>
4688
4689 A delicate point is the <SAMP>`dist:'</SAMP> goal, as both
4690 <TT>`intl/Makefile'</TT> and <TT>`po/Makefile'</TT> will later assume that the
4691 proper directory has been set up from the main <TT>`Makefile'</TT>. Here is
4692 an example of what the <SAMP>`dist:'</SAMP> goal might look like:
4693
4694
4695 <PRE>
4696 distdir = $(PACKAGE)-$(VERSION)
4697 dist: Makefile
4698 	rm -fr $(distdir)
4699 	mkdir $(distdir)
4700 	chmod 777 $(distdir)
4701 	for file in $(DISTFILES); do \
4702 	  ln $$file $(distdir) 2&#62;/dev/null || cp -p $$file $(distdir); \
4703 	done
4704 	for subdir in $(SUBDIRS); do \
4705 	  mkdir $(distdir)/$$subdir || exit 1; \
4706 	  chmod 777 $(distdir)/$$subdir; \
4707 	  (cd $$subdir &#38;&#38; $(MAKE) $@) || exit 1; \
4708 	done
4709 	tar chozf $(distdir).tar.gz $(distdir)
4710 	rm -fr $(distdir)
4711 </PRE>
4712
4713 </OL>
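<P>
To see how the <CODE>@INTLSUB@</CODE> and <CODE>@POSUB@</CODE>
substitutions described above change the <CODE>SUBDIRS</CODE> line, here
is a small shell sketch.  It only imitates the substitution with
<CODE>sed</CODE>; the real replacement is done when
<TT>`Makefile'</TT> is generated from <TT>`Makefile.in'</TT>, and the
function name <CODE>expand_subdirs</CODE> is hypothetical.

</P>

<PRE>
# expand_subdirs INTLSUB POSUB
# Print the example SUBDIRS line with @INTLSUB@ and @POSUB@
# replaced by the two arguments, either of which may be empty.
# (Hypothetical helper, for illustration only.)
expand_subdirs () {
  echo 'SUBDIRS = doc lib @INTLSUB@ src @POSUB@' |
    sed -e "s|@INTLSUB@|$1|" -e "s|@POSUB@|$2|"
}
</PRE>

<P>
With <SAMP>`expand_subdirs intl po'</SAMP> the result is
<SAMP>`SUBDIRS = doc lib intl src po'</SAMP>.  With two empty arguments,
both directories simply drop out of the list, so the loops over
<CODE>$(SUBDIRS)</CODE> skip them.

</P>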
4714
4715
4716
4717 <H3><A NAME="SEC75" HREF="gettext_toc.html#TOC75"><TT>`Makefile.in'</TT> in <TT>`src/'</TT></A></H3>
4718
4719 <P>
4720 Some of the modifications made in the main <TT>`Makefile.in'</TT> will
4721 also be needed in the <TT>`Makefile.in'</TT> from your package sources,
4722 which we assume here to be in the <TT>`src/'</TT> subdirectory. Here are
4723 all the modifications needed in <TT>`src/Makefile.in'</TT>:
4724
4725 </P>
4726
4727 <OL>
4728 <LI>
4729
4730 In view of the <SAMP>`dist:'</SAMP> goal, you should have these lines near the
4731 beginning of <TT>`src/Makefile.in'</TT>:
4732
4733
4734 <PRE>
4735 PACKAGE = @PACKAGE@
4736 VERSION = @VERSION@
4737 </PRE>
4738
4739 <LI>
4740
4741 If not done already, you should guarantee that <CODE>top_srcdir</CODE>
4742 gets defined. This will serve for <CODE>cpp</CODE> include files. Just add
4743 the line:
4744
4745
4746 <PRE>
4747 top_srcdir = @top_srcdir@
4748 </PRE>
4749
4750 <LI>
4751
4752 You might also want to define <CODE>subdir</CODE> as <SAMP>`src'</SAMP>, later
4753 allowing for almost uniform <SAMP>`dist:'</SAMP> goals in all your
4754 <TT>`Makefile.in'</TT>.  At least, the <SAMP>`dist:'</SAMP> goal below assumes that
4755 you used:
4756
4757
4758 <PRE>
4759 subdir = src
4760 </PRE>
4761
4762 <LI>
4763
4764 You should ensure that the final linking will use <CODE>@INTLLIBS@</CODE> as
4765 a library.  An easy way to achieve this is to arrange for it to get into
4766 <CODE>LIBS</CODE>, like this:
4767
4768
4769 <PRE>
4770 LIBS = @INTLLIBS@ @LIBS@
4771 </PRE>
4772
4773 In most GNU packages one will find a directory <TT>`lib/'</TT> in which a
4774 library containing some helper functions will be built.  (You need at
4775 least the few functions which the GNU <CODE>gettext</CODE> library itself
4776 needs.)  However, some of the functions in <TT>`lib/'</TT> also give
4777 messages to the user, which of course should be translated too.  To take
4778 care of this, it is not enough to place the support library (say
4779 <TT>`libsupport.a'</TT>) just between <CODE>@INTLLIBS@</CODE> and
4780 <CODE>@LIBS@</CODE> in the above example.  Instead one has to write this:
4781
4782
4783 <PRE>
4784 LIBS = ../lib/libsupport.a @INTLLIBS@ ../lib/libsupport.a @LIBS@
4785 </PRE>
4786
4787 <LI>
4788
4789 You should also ensure that directory <TT>`intl/'</TT> will be searched for
4790 C preprocessor include files in all circumstances.  So, you have to
4791 arrange for both <SAMP>`-I../intl'</SAMP> and <SAMP>`-I$(top_srcdir)/intl'</SAMP> to
4792 be given to the C compiler.
4793
4794 <LI>
4795
4796 Your <SAMP>`dist:'</SAMP> goal has to conform with others. Here is a
4797 reasonable definition for it:
4798
4799
4800 <PRE>
4801 distdir = ../$(PACKAGE)-$(VERSION)/$(subdir)
4802 dist: Makefile $(DISTFILES)
4803 	for file in $(DISTFILES); do \
4804 	  ln $$file $(distdir) 2&#62;/dev/null || cp -p $$file $(distdir); \
4805 	done
4806 </PRE>
4807
4808 </OL>
4809
4810
4811
4812 <H1><A NAME="SEC76" HREF="gettext_toc.html#TOC76">Concluding Remarks</A></H1>
4813
4814 <P>
4815 We would like to conclude this GNU <CODE>gettext</CODE> manual by presenting
4816 a history of the GNU Translation Project so far.  We finally give
4817 a few pointers for those who want to do further research or readings
4818 about Native Language Support matters.
4819
4820 </P>
4821
4822
4823
4824 <H2><A NAME="SEC77" HREF="gettext_toc.html#TOC77">History of GNU <CODE>gettext</CODE></A></H2>
4825
4826 <P>
4827 Internationalization concerns and algorithms have been informally
4828 and casually discussed for years in GNU, sometimes around GNU
4829 <CODE>libc</CODE>, maybe around the incoming <CODE>Hurd</CODE>, or otherwise
4830 (nobody clearly remembers). And even then, when the work started for
4831 real, this was somewhat independently of these previous discussions.
4832
4833 </P>
4834 <P>
4835 This all began in July 1994, when Patrick D'Cruze had the idea and
4836 initiative of internationalizing version 3.9.2 of GNU <CODE>fileutils</CODE>.
4837 He then asked Jim Meyering, the maintainer, how to get those changes
4838 folded into an official release. That first draft was full of
4839 <CODE>#ifdef</CODE>s and somewhat disconcerting, and Jim wanted to find
4840 nicer ways. Patrick and Jim shared some tries and experimentations
4841 in this area. Then, feeling that this might eventually have a deeper
4842 impact on GNU, Jim wanted to know what standards were, and contacted
4843 Richard Stallman, who very quickly and verbally described an overall
4844 design for what was meant to become <CODE>glocale</CODE>, at that time.
4845
4846 </P>
4847 <P>
4848 Jim implemented <CODE>glocale</CODE> and got a lot of exhausting feedback
4849 from Patrick and Richard, of course, but also from Mitchum DSouza
4850 (who wrote a <CODE>catgets</CODE>-like package), Roland McGrath, maybe David
4851 MacKenzie, Fran&ccedil;ois Pinard, and Paul Eggert, all pushing and
4852 pulling in various directions, not always compatible, to the extent
4853 that after a couple of test releases, <CODE>glocale</CODE> was torn apart.
4854
4855 </P>
4856 <P>
4857 While Jim took some distance and time and became a dad for a second
4858 time, Roland wanted to get GNU <CODE>libc</CODE> internationalized, and
4859 got Ulrich Drepper involved in that project.  Instead of starting
4860 from <CODE>glocale</CODE>, Ulrich rewrote something from scratch, but
4861 more conformant to the set of guidelines which emerged out of the
4862 <CODE>glocale</CODE> effort.  Then, Ulrich got people from the previous
4863 forum to involve themselves in this new project, and the switch
4864 from <CODE>glocale</CODE> to what was first named <CODE>msgutils</CODE>, renamed
4865 <CODE>nlsutils</CODE>, and later <CODE>gettext</CODE>, became officially accepted
4866 by Richard in May 1995 or so.
4867
4868 </P>
4869 <P>
4870 Let's summarize by saying that Ulrich Drepper wrote GNU <CODE>gettext</CODE>
4871 in April 1995. The first official release of the package, including
4872 PO mode, occurred in July 1995, and was numbered 0.7. Other people
4873 contributed to the effort by providing a discussion forum around
4874 Ulrich, writing little pieces of code, or testing. These are quoted
4875 in the <CODE>THANKS</CODE> file which comes with the GNU <CODE>gettext</CODE>
4876 distribution.
4877
4878 </P>
4879 <P>
4880 While this was being done, Fran&ccedil;ois adapted half a dozen
4881 GNU packages to <CODE>glocale</CODE> first, then later to <CODE>gettext</CODE>,
4882 putting them in pretest, so providing along the way an effective
4883 user environment for fine tuning the evolving tools. He also took
4884 the responsibility of organizing and coordinating the GNU Translation
4885 Project. After nearly a year of informal exchanges between people from
4886 many countries, translator teams started to exist in May 1995, through
4887 the creation and support by Patrick D'Cruze of twenty unmoderated
4888 mailing lists for that many native languages, and two moderated
4889 lists: one for reaching all teams at once, the other for reaching
4890 all maintainers of internationalized packages in GNU.
4891
4892 </P>
4893 <P>
4894 Fran&ccedil;ois also wrote PO mode in June 1995 with the collaboration
4895 of Greg McGary, as a kind of contribution to Ulrich's package.
4896 He also gave a hand with the GNU <CODE>gettext</CODE> Texinfo manual.
4897
4898 </P>
4899
4900
4901 <H2><A NAME="SEC78" HREF="gettext_toc.html#TOC78">Related Readings</A></H2>
4902
4903 <P>
4904 Eugene H. Dorr (<TT>`dorre@well.com'</TT>) maintains an interesting
4905 bibliography on internationalization matters, called
4906 <CITE>Internationalization Reference List</CITE>, which is available as:
4907
4908 <PRE>
4909 ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt
4910 </PRE>
4911
4912 <P>
4913 Michael Gschwind (<TT>`mike@vlsivie.tuwien.ac.at'</TT>) maintains a
4914 Frequently Asked Questions (FAQ) list, entitled <CITE>Programming for
4915 Internationalisation</CITE>. This FAQ discusses writing programs which
4916 can handle different language conventions, character sets, etc.;
4917 and is applicable to all character set encodings, with particular
4918 emphasis on ISO 8859-1. It is regularly published in Usenet
4919 groups <TT>`comp.unix.questions'</TT>, <TT>`comp.std.internat'</TT>,
4920 <TT>`comp.software.international'</TT>, <TT>`comp.lang.c'</TT>,
4921 <TT>`comp.windows.x'</TT>, <TT>`comp.std.c'</TT>, <TT>`comp.answers'</TT>
4922 and <TT>`news.answers'</TT>. The home location of this document is:
4923
4924 <PRE>
4925 ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming
4926 </PRE>
4927
4928 <P>
4929 Patrick D'Cruze (<TT>`pdcruze@li.org'</TT>) wrote a tutorial about NLS
4930 matters, and Jochen Hein (<TT>`Hein@student.tu-clausthal.de'</TT>) took
4931 over the responsibility of maintaining it. It may be found as:
4932
4933 <PRE>
4934 ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/...
4935 ...locale-tutorial-0.8.txt.gz
4936 </PRE>
4937
4938 <P>
4939 This site is mirrored in:
4940
4941 <PRE>
4942 ftp://ftp.ibp.fr/pub/linux/sunsite/
4943 </PRE>
4944
4945 <P>
4946 A French version of the same tutorial should be findable at:
4947
4948 <PRE>
4949 ftp://ftp.ibp.fr/pub/linux/french/docs/
4950 </PRE>
4951
4952 <P>
4953 together with French translations of many Linux-related documents.
4954
4955 </P>
4956 <P><HR><P>
4957 This document was generated on 4 September 1998 using the
4958 <A HREF="http://wwwcn.cern.ch/dci/texi2html/">texi2html</A>
4959 translator version 1.51.</P>
4960 </BODY>
4961 </HTML>