3 <!-- This HTML file has been created by texi2html 1.51 
   4      from gettext.texi on 4 September 1998 --> 
   6 <TITLE>GNU gettext utilities
</TITLE> 
   9 <H1>GNU gettext tools, version 
0.10</H1> 
  10 <H2>Native Language Support Library and Tools
</H2> 
  11 <H2>Edition 
0.10, 
26 November
</H2> 
  12 <ADDRESS>Ulrich Drepper
</ADDRESS> 
  13 <ADDRESS>Jim Meyering
</ADDRESS> 
<ADDRESS>Fran&ccedil;ois Pinard</ADDRESS>
  19 Copyright (C) 
1995 Free Software Foundation, Inc.
 
  23 Permission is granted to make and distribute verbatim copies of
 
  24 this manual provided the copyright notice and this permission notice
 
  25 are preserved on all copies.
 
  29 Permission is granted to copy and distribute modified versions of this
 
  30 manual under the conditions for verbatim copying, provided that the entire
 
  31 resulting derived work is distributed under the terms of a permission
 
  32 notice identical to this one.
 
  36 Permission is granted to copy and distribute translations of this manual
 
  37 into another language, under the above conditions for modified versions,
 
  38 except that this permission notice may be stated in a translation approved
 
  45 <H1><A NAME=
"SEC1" HREF=
"gettext_toc.html#TOC1">Introduction
</A></H1> 
  50 This manual is still in 
<EM>DRAFT
</EM> state.  Some sections are still
 
  51 empty, or almost.  We keep merging material from other sources
 
  52 (essentially email folders) while the proper integration of this
 
  57 In this manual, we use 
<EM>he
</EM> when speaking of the programmer or
 
  58 maintainer, 
<EM>she
</EM> when speaking of the translator, and 
<EM>they
</EM> 
  59 when speaking of the installers or end users of the translated program.
 
  60 This is only a convenience for clarifying the documentation.  It is
 
  61 absolutely not meant to imply that some roles are more appropriate
 
  62 to males or females.  Besides, as you might guess, GNU 
<CODE>gettext
</CODE> 
  63 is meant to be useful for people using computers, whatever their sex,
 
  64 race, religion or nationality!
 
This chapter explains the goals that GNU <CODE>gettext</CODE> is meant
to achieve.  It then explains a few broad concepts around
Native Language Support, and situates message translation with regard
to other aspects of national and cultural variance, as applicable
to programs.  It also surveys the files used to convey
translations.  Finally, it explains how the various tools interrelate in the
initial generation of these files, and how the maintenance
cycle usually operates afterwards.
 
  81 <H2><A NAME=
"SEC2" HREF=
"gettext_toc.html#TOC2">The Purpose of GNU 
<CODE>gettext
</CODE></A></H2> 
  84 Usually, programs are written and documented in English, and use
 
  85 English at execution time for interacting with users.  This is true
 
not only within GNU, but also in a great deal of commercial
and free software.  Using a common language is quite handy for
communication between developers, maintainers and users from all
countries.  On the other hand, most people are less comfortable with
English than with their own native language, and would rather
use their mother tongue for day-to-day work, as far as possible.
 
  92 Many would simply 
<EM>love
</EM> seeing their computer screen showing
 
  93 a lot less of English, and far more of their own spoken language.
 
However, to some people, this dream might appear so far-fetched that
they may believe it is not even worth spending time thinking about
it, and they have no confidence at all that the dream might ever
come true.  Many have not lost hope, and have organized themselves.
The GNU Translation Project is a formalization of this hope into a
workable structure, which has a good chance of getting all of us nearer
the achievement of a truly multi-lingual set of programs.
 
 107 GNU 
<CODE>gettext
</CODE> is an important step for the GNU Translation
 
 108 Project, as it is an asset on which we may build many other steps.
 
 109 This package offers to programmers, translators and even users, a
 
 110 well integrated set of tools and documentation.  Specifically, the GNU
 
 111 <CODE>gettext
</CODE> utilities are a set of tools that provides a framework
 
 112 to help other GNU packages produce multi-lingual messages.  These tools
 
 113 include a set of conventions about how programs should be written to
 
 114 support message catalogs, a directory and file naming organization
 
 115 for the message catalogs themselves, a runtime library supporting the
 
 116 retrieval of translated messages, and a few stand-alone programs to
 
 117 massage in various ways the sets of translatable strings, or already
 
translated strings.  A special GNU Emacs mode also helps interested
parties in preparing these sets, or bringing them up to date.
 
GNU <CODE>gettext</CODE> is designed to minimize the impact of
internationalization on program sources, keeping this impact as small
and hardly noticeable as possible.  Internationalization has better
chances of succeeding if it is very lightweight, or at least
appears to be so, when looking at program sources.
 
 131 The GNU Translation Project also uses the GNU 
<CODE>gettext
</CODE> 
 132 distribution as a vehicle for documenting its structure and methods,
 
 133 even if this goes beyond the technicalities of the GNU 
<CODE>gettext
</CODE> 
proper.  This way, translators will find in a single place, as
far as possible, all they need to know for properly doing their
translating work.  This supplementary documentation might also
help programmers, and even curious users, in understanding how GNU
<CODE>gettext</CODE> is related to the remainder of the GNU Translation
Project, and consequently, give them a glimpse of the <EM>big picture</EM>.
 
 144 <H2><A NAME=
"SEC3" HREF=
"gettext_toc.html#TOC3">I18n, L10n, and Such
</A></H2> 
 147 Two long words appear all the time when we discuss support of native
 
 148 language in programs, and these words have a precise meaning, worth
 
 149 being explained here, once and for all in this document.  The words are
 
 150 <EM>internationalization
</EM> and 
<EM>localization
</EM>.  Many people,
 
 151 tired of writing these long words over and over again, took the
 
 152 habit of writing 
<STRONG>i18n
</STRONG> and 
<STRONG>l10n
</STRONG> instead, quoting the first
 
 153 and last letter of each word, and replacing the run of intermediate
 
 154 letters by a number merely telling how many such letters there are.
 
But in this manual, for the sake of clarity, we will patiently write
the names in full, each time...
 
 160 By 
<STRONG>internationalization
</STRONG>, one refers to the operation by which a
 
 161 program, or a set of programs turned into a package, is made aware and
 
 162 able to support multiple languages.  This is a generalization process,
 
 163 by which the programs are untied from using only English strings or
 
 164 other English specific habits, and connected to generic ways of doing
 
 165 the same, instead.  Program developers may use various techniques to
 
internationalize their programs, some of which have been standardized.
 
 167 GNU 
<CODE>gettext
</CODE> offers one of these standards.  See section 
<A HREF=
"gettext.html#SEC36">The Programmer's View
</A>.
 
 171 By 
<STRONG>localization
</STRONG>, one means the operation by which, in a set
 
 172 of programs already internationalized, one gives the program all
 
 173 needed information so that it can bend itself to handle its input
 
 174 and output in a fashion which is correct for some native language and
 
 175 cultural habits.  This is a particularisation process, by which generic
 
 176 methods already implemented in an internationalized program are used
 
in specific ways.  The programming environment puts several functions
at the programmer's disposal which allow this runtime configuration.
The formal description of a specific set of cultural habits for some
 
 180 country, together with all associated translations targeted to the
 
 181 same native language, is called the 
<STRONG>locale
</STRONG> for this language
 
 182 or country.  Users achieve localization of programs by setting proper
 
 183 values to special environment variables, prior to executing those
 
 184 programs, identifying which locale should be used.
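As a minimal sketch of how a program can pick up the locale selected through those environment variables, the standard C function setlocale may be called with an empty locale name.  The helper name adopt_user_locale and its fallback to the "C" locale are assumptions made for this illustration, not something prescribed by GNU gettext.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/*
 * Adopt the locale requested by the user's environment variables
 * (such as LC_ALL or LANG).  If that locale is not available on this
 * system, fall back to the portable "C" locale.
 * Returns the name of the locale now in effect.
 */
const char *adopt_user_locale(void)
{
    /* An empty string asks the C library to consult the environment. */
    const char *name = setlocale(LC_ALL, "");

    if (name == NULL) {
        /* Fallback policy chosen for this sketch only. */
        name = setlocale(LC_ALL, "C");
    }
    assert(name != NULL); /* the "C" locale always exists */
    return name;
}
```

A program would typically call such a helper once, early at startup, before producing any output that depends on the locale.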
 
 188 In fact, locale message support is only one component of the cultural
 
data that makes up a particular locale.  There is a whole host of
routines and functions provided to aid programmers in developing
internationalized software, which allow them to access the data
 
 192 stored in a particular locale.  When someone presently refers to a
 
 193 particular locale, they are obviously referring to the data stored
 
 194 within that particular locale.  Similarly, if a programmer is referring
 
 195 to "accessing the locale routines", they are referring to the
 
 196 complete suite of routines that access all of the locale's information.
 
 200 One uses the expression 
<STRONG>Native Language Support
</STRONG>, or merely NLS,
 
 201 for speaking of the overall activity or feature encompassing both
 
 202 internationalization and localization, allowing for multi-lingual
 
 203 interactions in a program.  In a nutshell, one could say that
 
 204 internationalization is the operation by which further localizations
 
 209 Also, very roughly said, when it comes to multi-lingual messages,
 
 210 internationalization is usually taken care of by programmers, and
 
 211 localization is usually taken care of by translators.
 
 216 <H2><A NAME=
"SEC4" HREF=
"gettext_toc.html#TOC4">Aspects in Native Language Support
</A></H2> 
 219 For a totally multi-lingual distribution, there are many things to
 
 220 translate beyond output messages.
 
 227 As of today, GNU 
<CODE>gettext
</CODE> offers a complete toolset for
 
 228 translating messages output by C programs.  Perl scripts and shell
 
 229 scripts also need to be translated.  Even if there are some hooks
 
 230 so this can be done, these hooks are not integrated as well as they
 
 235 Some programs, like 
<CODE>autoconf
</CODE> or 
<CODE>bison
</CODE>, are able
 
 236 to produce other programs (or scripts).  Even if the generating
 
 237 programs themselves are internationalized, the generated programs they
 
 238 produce may need internationalization on their own, and this indirect
 
 239 internationalization could be automated right from the generating
 
 240 program.  In fact, quite usually, generating and generated programs
 
 241 could be internationalized independently, as the effort needed is
 
 246 A few programs include textual tables which might need translation
 
 247 themselves, independently of the strings contained in the program
 
 248 itself.  For example, RFC 
1345 gives an English description for each
 
 249 character which GNU 
<CODE>recode
</CODE> is able to reconstruct at execution.
 
 250 Since these descriptions are extracted from the RFC by mechanical means,
 
 251 translating them properly would require a prior translation of the RFC
 
Almost all programs accept options, which are often worded so as to
be descriptive for English readers; one might want to consider
 
 258 offering translated versions for program options as well.
 
 262 Many programs read, interpret, compile, or are somewhat driven by
 
 263 input files which are texts containing keywords, identifiers, or
 
 264 replies which are inherently translatable.  For example, one may want
 
 265 <CODE>gcc
</CODE> to allow diacriticized characters in identifiers or use
 
translated keywords; <SAMP>`rm -i'</SAMP> might accept something other than
 
 267 <SAMP>`y'
</SAMP> or 
<SAMP>`n'
</SAMP> for replies, etc.  Even if the program will
 
 268 eventually make most of its output in the foreign languages, one has
 
 269 to decide whether the input syntax, option values, etc., are to be
 
 274 The manual accompanying a package, as well as all documentation files
 
 275 in the distribution, could surely be translated, too.  Translating a
 
 276 manual, with the intent of later keeping up with updates, is a major
 
 277 undertaking in itself, generally.
 
 282 As we already stressed, translation is only one aspect of locales.
 
 283 Other internationalization aspects are not currently handled by GNU
 
 284 <CODE>gettext
</CODE>, but perhaps may be handled in future versions.  There
 
 285 are many attributes that are needed to define a country's cultural
 
conventions.  These attributes include, besides the country's native
language, the formatting of the date and time, the representation of
 
 288 numbers, the symbols for currency, etc.  These local 
<STRONG>rules
</STRONG> are
 
 289 termed the country's locale.  The locale represents the knowledge
 
 290 needed to support the country's native attributes.
 
 294 There are a few major areas which may vary between countries and
 
 295 hence, define what a locale must describe.  The following list helps
 
 296 putting multi-lingual messages into the proper context of other tasks
 
 297 related to locales, and also presents some other areas which GNU
 
 298 <CODE>gettext
</CODE> might eventually tackle, maybe, one of these days.
 
 303 <DT><EM>Characters and Codesets
</EM> 
The codeset most commonly used throughout the USA and most English
speaking parts of the world is the ASCII codeset.  However, there are
many characters needed by various locales that are not found within
this codeset.  The 8-bit ISO 8859-1 code set has most of the special
characters needed to handle the major European languages.  However, in
many cases, the ISO 8859-1 code set is not adequate.  Hence each locale
 
 311 will need to specify which codeset they need to use and will need
 
 312 to have the appropriate character handling routines to cope with
 
 315 <DT><EM>Currency
</EM> 
 317 The symbols used vary from country to country as does the position
 
 318 used by the symbol.  Software needs to be able to transparently
 
 319 display currency figures in the native mode for each locale.
 
The format of dates varies between locales.  For example, Christmas day
in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.
Other countries might use ISO 8601 dates, etc.
 
 327 Time of the day may be noted as 
<VAR>hh
</VAR>:
<VAR>mm
</VAR>, 
<VAR>hh
</VAR>.
<VAR>mm
</VAR>,
 
 328 or otherwise.  Some locales require time to be specified in 
24-hour
 
 329 mode rather than as AM or PM.  Further, the nature and yearly extent
 
 330 of the Daylight Saving correction vary widely between countries.
 
 334 Numbers can be represented differently in different locales.
 
 335 For example, the following numbers are all written correctly for
 
 336 their respective locales:
 
 345 Some programs could go further and use different unit systems, like
 
 346 English units or Metric units, or even take into account variants
 
 347 about how numbers are spelled in full.
 
 349 <DT><EM>Messages
</EM> 
The most obvious area is the language support within a locale.  This is
where GNU <CODE>gettext</CODE> provides a means for developers and users to
easily change the language that the software uses to communicate to
 
In the near future, we see no chance that components of the locale
other than message handling will be made available for use in other
GNU packages.  The reason for this is that most modern systems provide
more or less reasonable support for at least some of the missing
components.  Another point is that the GNU libc and Linux will get
a new and complete implementation of the whole locale functionality,
which could be adopted by systems lacking reasonable locale support.
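To illustrate the kind of support the system already provides for the non-message areas listed above, here is a small sketch relying only on the standard C library.  The helper names decimal_separator and format_local_date are invented for this example; localeconv and strftime are the standard functions doing the real work.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Return the decimal separator of the numeric locale currently in effect. */
const char *decimal_separator(void)
{
    return localeconv()->decimal_point;
}

/*
 * Format a calendar date using the date representation of the time
 * locale currently in effect (the "%x" conversion).  Returns the number
 * of characters written, or 0 if the buffer is too small.
 */
size_t format_local_date(char *buf, size_t size, int year, int month, int day)
{
    struct tm when = {0};

    assert(buf != NULL && size > 0);
    when.tm_year = year - 1900; /* struct tm counts years from 1900 */
    when.tm_mon = month - 1;    /* and months from 0 */
    when.tm_mday = day;
    return strftime(buf, size, "%x", &when);
}
```

In the "C" locale, the C standard defines "%x" as equivalent to "%m/%d/%y", so Christmas day in 1994 comes out as 12/25/94, the same form as the USA example given earlier.  Under another locale, the same call may yield a different order or different separators.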
 
 370 <H2><A NAME=
"SEC5" HREF=
"gettext_toc.html#TOC5">Files Conveying Translations
</A></H2> 
The letters PO in <TT>`.po'</TT> files mean Portable Object, to
distinguish them from <TT>`.mo'</TT> files, where MO stands for Machine
Object.  This paradigm, as well as the PO file format, is inspired
 
 376 by the NLS standard developed by Uniforum, and implemented by Sun
 
 377 in their Solaris system.
 
 381 PO files are meant to be read and edited by humans, and associate each
 
 382 original, translatable string of a given package with its translation
 
 383 in a particular target language.  A single PO file is dedicated to
 
 384 a single target language.  If a package supports many languages,
 
 385 there is one such PO file per language supported, and each package
 
 386 has its own set of PO files.  These PO files are best created by
 
 387 the 
<CODE>xgettext
</CODE> program, and later updated or refreshed through
 
 388 the 
<CODE>tupdate
</CODE> program.  Program 
<CODE>xgettext
</CODE> extracts all
 
 389 marked messages from a set of C files and initializes a PO file with
 
 390 empty translations.  Program 
<CODE>tupdate
</CODE> takes care of adjusting
 
 391 PO files between releases of the corresponding sources, commenting
 
 392 obsolete entries, initializing new ones, and updating all source
 
line references.  Files ending with <TT>`.pot'</TT> are a kind of base
translation file found in distributions, in PO file format, and
 
 395 <TT>`.pox'
</TT> files are often temporary PO files.
 
 399 MO files are meant to be read by programs, and are binary in nature.
 
 400 A few systems already offer tools for creating and handling MO files
 
 401 as part of the Native Language Support coming with the system, but the
 
 402 format of these MO files is often different from system to system,
 
and non-portable.  They do not necessarily use <TT>`.mo'</TT> for file
 
 404 extensions, but since system libraries are also used for accessing
 
 405 these files, it works as long as the system is self-consistent about
 
 406 it.  If GNU 
<CODE>gettext
</CODE> is able to interface with the tools already
 
 407 provided with systems, it will consequently let these provided tools
 
 408 take care of generating the MO files.  Or else, if such tools are not
 
 409 found or do not seem usable, GNU 
<CODE>gettext
</CODE> will use its own ways
 
 410 and its own format for MO files.  Files ending with 
<TT>`.gmo'
</TT> are
 
 411 really MO files, when it is known that these files use the GNU format.
 
 416 <H2><A NAME=
"SEC6" HREF=
"gettext_toc.html#TOC6">Overview of GNU 
<CODE>gettext
</CODE></A></H2> 
 419 The following diagram summarizes the relation between the files
 
 420 handled by GNU 
<CODE>gettext
</CODE> and the tools acting on these files.
 
It is followed by a somewhat detailed explanation, which you should
 
 422 read while keeping an eye on the diagram.  Having a clear understanding
 
 423 of these interrelations would surely help programmers, translators
 
<PRE>
Original C Sources ---&gt; PO mode ---&gt; Marked C Sources ---.
                                                         |
              .---------&lt;--- GNU gettext Library         |
.--- make &lt;---+                                          |
|             `---------&lt;--------------------+-----------'
|                                            |
|   .-----&lt;--- PACKAGE.pot &lt;--- xgettext &lt;---'   .---&lt;--- PO Compendium
|   |                                            |             ^
|   |                                            `---.         |
|   `---.                                            +---&gt; PO mode ---.
|       +----&gt; tupdate -------&gt; LANG.pox ---&gt;--------'                |
|   .---'                                                             |
|   |                                                                 |
|   `-------------&lt;---------------.                                   |
|                                 +--- LANG.po &lt;--- New LANG.pox &lt;----'
|   .--- LANG.gmo &lt;--- msgfmt &lt;---'
|   |
|   `---&gt; install ---&gt; /.../LANG/PACKAGE.mo ---.
|                                              +---&gt; "Hello world!"
`-------&gt; install ---&gt; /.../bin/PROGRAM -------'
</PRE>
 
 452 The indication 
<SAMP>`PO mode'
</SAMP> appears in two places in this picture,
 
 453 and you may safely read it as merely meaning "hand editing", using
 
 454 any editor of your choice, really.  However, for those of you being
 
 455 the lucky users of GNU Emacs, PO mode has been specifically created
 
 456 for providing a cosy environment for editing or modifying PO files.
 
 457 While editing a PO file, PO mode allows for the easy browsing of
 
 458 auxiliary and compendium PO files, as well as following references into
 
the set of C program sources from which PO files have been derived.
It has a few special features, among which are the interactive marking
of program strings as translatable, and the validation of PO files
with easy repositioning to PO file lines showing errors.
 
As a programmer, the first step in bringing GNU <CODE>gettext</CODE>
into your package is identifying, right in the C sources, which
strings are meant to be translatable, and which are untranslatable.
This tedious job can be done a little more comfortably using PO
mode, but you can use any means familiar to you for modifying your
C sources.  Some other simple, standard changes are also needed to
 
 472 properly initialize the translation library.  See section 
<A HREF=
"gettext.html#SEC13">Preparing Program Sources
</A>, for
 
 473 more information about all this.
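The two kinds of change described above, marking strings and initializing the translation library, might look as follows in a C source.  This is only a sketch: the package name, the catalog directory, the helper function names and the choice of the _() shorthand are assumptions for illustration, while setlocale, bindtextdomain, textdomain and gettext are the actual library entry points.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical package name and catalog directory, for illustration only. */
#define PACKAGE   "example-package"
#define LOCALEDIR "/usr/local/share/locale"

/* Conventional shorthand: wrapping a string in _() marks it translatable. */
#define _(String) gettext(String)

/* One-time setup of the translation library, done early at startup. */
void init_translations(void)
{
    const char *domain;

    setlocale(LC_ALL, "");              /* honour the user's environment */
    bindtextdomain(PACKAGE, LOCALEDIR); /* where the catalogs are installed */
    domain = textdomain(PACKAGE);       /* which catalog to consult */
    assert(domain != NULL);             /* NULL would mean out of memory */
    (void) domain;
}

/* A marked string: translated whenever a catalog offers a translation. */
const char *greeting(void)
{
    return _("Hello world!");
}

/* An unmarked string: never looked up, so it is always returned as is. */
const char *config_key(void)
{
    return "color.ui";
}
```

When no catalog is installed for the selected language, gettext simply returns the original string, so the program keeps working in English.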
 
 477 Once the C sources have been modified, the 
<CODE>xgettext
</CODE> program
 
 478 is used to find and extract all translatable strings, and create an
 
initial PO file out of all these.  This <TT>`<VAR>package</VAR>.pot'</TT> file
contains all original program strings, together with sets of pointers to
exactly where in the C sources each string is used, and all translations
are set to empty.  The letter <KBD>t</KBD> in <TT>`.pot'</TT> marks this as
a Template PO file, not yet oriented towards any particular language.
 
 484 See section 
<A HREF=
"gettext.html#SEC19">Invoking the 
<CODE>xgettext
</CODE> Program
</A>, for more details about how one calls the
 
<CODE>xgettext</CODE> program.  If you are <EM>really</EM> lazy, you might
be interested in working a lot more right away, and preparing the
whole distribution setup (see section <A HREF="gettext.html#SEC65">The Maintainer's View</A>).  By doing so, you
spare yourself typing the <CODE>xgettext</CODE> command, as <CODE>make</CODE>
should now generate the proper things automatically for you!
 
 493 The first time through, there is no 
<TT>`
<VAR>lang
</VAR>.po'
</TT> yet, so the
 
 494 <CODE>tupdate
</CODE> step may be skipped and replaced by a mere copy of
 
 495 <TT>`
<VAR>package
</VAR>.pot'
</TT> to 
<TT>`
<VAR>lang
</VAR>.pox'
</TT>, where 
<VAR>lang
</VAR> 
 496 represents the target language.
 
Then comes the initial translation of messages.  Translation in
itself is a whole subject, still exclusively meant for humans,
and its complexity far exceeds the scope of this manual.
Nevertheless, a few hints are given in another chapter of this
manual (see section <A HREF="gettext.html#SEC54">The Translator's View</A>).  You will also find there indications
about how to contact translating teams, or become part of them,
for sharing your translating concerns with others who target the same
 
 511 While adding the translated messages into the 
<TT>`
<VAR>lang
</VAR>.pox'
</TT> 
PO file, if you do not have GNU Emacs handy, you are on your own
for ensuring that you fully respect the PO file format and quoting
conventions (see section <A HREF="gettext.html#SEC9">The Format of PO Files</A>).  This is surely not an impossible task,
as this is the way many people have already handled PO files for Uniforum or
Solaris.  On the other hand, when using PO mode in GNU Emacs, most details
of the PO file format are taken care of for you, but you have to acquire
some familiarity with PO mode itself.  Besides main PO mode commands
 
 519 (see section 
<A HREF=
"gettext.html#SEC10">Main Commands
</A>), you should know how to move between entries
 
 520 (see section 
<A HREF=
"gettext.html#SEC11">Entry Positioning
</A>), and how to handle untranslated entries
 
 521 (see section 
<A HREF=
"gettext.html#SEC24">Untranslated Entries
</A>).
 
 525 If some common translations have already been saved into a compendium
 
 526 PO file, translators may use PO mode for initializing untranslated
 
 527 entries from the compendium, and also save selected translations into
 
 528 the compendium, updating it (see section 
<A HREF=
"gettext.html#SEC21">Using Translation Compendiums
</A>).  Compendium files
 
 529 are meant to be exchanged between members of a given translation team.
 
 533 Programs, or packages of programs, are dynamic in nature: users write
 
bug reports and suggestions for improvement, and maintainers react by
 
 535 modifying programs in various ways.  The fact that a package has
 
 536 already been internationalized should not make maintainers shy
 
 537 of adding new strings, or modifying strings already translated.
 
 538 They just do their job the best they can.  For the GNU Translation
 
 539 Project to work smoothly, it is important that maintainers do not
 
 540 carry translation concerns on their already loaded shoulders, and that
 
 541 translators be kept as free as possible of programmatic concerns.
 
The only concern maintainers should have is carefully marking new
strings as translatable, when they should be, and not otherwise
worrying about them being translated, as this will come in proper time.
 
 548 Consequently, when programs and their strings are adjusted in various
 
 549 ways by maintainers, and for matters usually unrelated to translation,
 
 550 <CODE>xgettext
</CODE> would construct 
<TT>`
<VAR>package
</VAR>.pot'
</TT> files which are
 
 551 evolving over time, so the translations carried by 
<TT>`
<VAR>lang
</VAR>.po'
</TT> 
 552 are slowly fading out of date.
 
 556 It is important for translators (and even maintainers) to understand
 
 557 that package translation is a continuous process in the lifetime of a
 
 558 package, and not something which is done once and for all at the start.
 
 559 After an initial burst of translation activity for a given package,
 
 560 interventions are needed once in a while, because here and there,
 
 561 translated entries become obsolete, and new untranslated entries
 
 562 appear, needing translation.
 
 566 The 
<CODE>tupdate
</CODE> program has the purpose of refreshing an already
 
 567 existing 
<TT>`
<VAR>lang
</VAR>.po'
</TT> file, by comparing it with a newer
 
 568 <TT>`
<VAR>package
</VAR>.pot'
</TT> template file, extracted by 
<CODE>xgettext
</CODE> 
 569 out of recent C sources.  The refreshing operation adjusts all
 
 570 references to C source locations for strings, since these strings
 
 571 move as programs are modified.  Also, 
<CODE>tupdate
</CODE> comments out as
 
 572 obsolete, in 
<TT>`
<VAR>lang
</VAR>.pox'
</TT>, those already translated entries
 
which are no longer used in the program sources (see section <A HREF="gettext.html#SEC25">Obsolete Entries</A>).  It finally discovers new strings and inserts them in
the resulting PO file as untranslated entries (see section <A HREF="gettext.html#SEC24">Untranslated Entries</A>).  See section <A HREF="gettext.html#SEC23">Invoking the <CODE>tupdate</CODE> Program</A>, for more information about what
 
 575 <CODE>tupdate
</CODE> really does.
 
 579 Whatever route or means taken, the goal is obtaining an updated
 
 580 <TT>`
<VAR>lang
</VAR>.pox'
</TT> file offering translations for all strings.
 
 581 When this is properly achieved, this file 
<TT>`
<VAR>lang
</VAR>.pox'
</TT> may
 
 582 take the place of the previous official 
<TT>`
<VAR>lang
</VAR>.po'
</TT> file.
 
 586 The time mobility, or fluidity of PO files, is an integral part of
 
 587 the translation game, and should be well understood, and accepted.
 
 588 People resisting it will have a hard time participating in the GNU
 
 589 Translation Project, or will give a hard time to other participants!
 
 590 In particular, maintainers should relax and include all available PO
 
 591 files in their distributions, even if these have not recently been
 
 592 updated, without banging or otherwise trying to exert pressure on the
 
 593 translator teams to get the job done.  The pressure should rather
 
 594 come from the community of users speaking a particular language,
 
 595 and maintainers should consider themselves fairly relieved of any
 
 596 concern about the adequacy of translation files.  On the other hand,
 
 597 translators should reasonably try updating the PO files they are
 
 598 responsible for, while the package is undergoing pretest, prior to
 
 599 an official distribution.
 
 603 Once the PO file is complete and dependable, the 
<CODE>msgfmt
</CODE> program
 
 604 is used for turning the PO file into a machine-oriented format, which
 
 605 may yield efficient retrieval of translations by the programs of the
 
 606 package, whenever needed at runtime (see section 
<A HREF=
"gettext.html#SEC31">The Format of GNU MO Files
</A>).  See section 
<A HREF=
"gettext.html#SEC30">Invoking the 
<CODE>msgfmt
</CODE> Program
</A>, for more information about all modalities of execution
 
 607 for the 
<CODE>msgfmt
</CODE> program.
 
 611 Finally, the modified and marked C sources are compiled and linked
 
 612 with the GNU 
<CODE>gettext
</CODE> library, usually through the operation of
 
 613 <CODE>make
</CODE>, given a suitable 
<TT>`Makefile'
</TT> exists for the project,
 
 614 and the resulting executable is installed somewhere users will find it.
 
 615 The MO files themselves should also be properly installed.  Given the
 
 616 appropriate environment variables are set (see section 
<A HREF=
"gettext.html#SEC35">Magic for End Users
</A>), the
 
 617 program should localize itself automatically, whenever it executes.
 
The remainder of this manual elaborates on the various
steps outlined in this section.
 
 627 <H1><A NAME=
"SEC7" HREF=
"gettext_toc.html#TOC7">PO Files and PO Mode Basics
</A></H1> 
The GNU <CODE>gettext</CODE> toolset helps programmers and translators
in producing, updating and using translation files, mainly those
PO files which are textual, editable files.  This chapter focuses
on the format of PO files, and contains a PO mode starter.  Since the PO mode
description is spread over this manual instead of being concentrated
in one place, this chapter presents only the basics of PO mode.
 
 641 <H2><A NAME=
"SEC8" HREF=
"gettext_toc.html#TOC8">Completing GNU 
<CODE>gettext
</CODE> Installation
</A></H2> 
 644 Once you have received, unpacked, configured and compiled the GNU
 
 645 <CODE>gettext
</CODE> distribution, the 
<SAMP>`make install'
</SAMP> command puts in
 
 646 place the programs 
<CODE>xgettext
</CODE>, 
<CODE>msgfmt
</CODE>, 
<CODE>gettext
</CODE>, and
 
 647 <CODE>tupdate
</CODE>, as well as their available message catalogs.  For
 
 648 completing a comfortable installation, you might also want to make the
 
 649 PO mode available to your GNU Emacs users.
 
To finish the installation of PO mode, you might want to modify your
file <TT>`.emacs'</TT>, once and for all, so it contains a few lines looking
 
<PRE>
(setq auto-mode-alist
      (cons '("\\.pox?\\'" . po-mode) auto-mode-alist))
(autoload 'po-mode "po-mode")
</PRE>
 
 666 Later, whenever you edit some 
<TT>`.po'
</TT> or 
<TT>`.pox'
</TT> file, Emacs
 
 667 loads 
<TT>`po-mode.elc'
</TT> (or 
<TT>`po-mode.el'
</TT>) as needed, and
 
automatically activates PO mode commands for the associated buffer.
 
 669 The string 
<EM>PO
</EM> appears in the mode line for any buffer for
 
 670 which PO mode is active.  Many PO files may be active at once in a
 
 671 single Emacs session.
 
 676 <H2><A NAME=
"SEC9" HREF=
"gettext_toc.html#TOC9">The Format of PO Files
</A></H2> 
 679 A PO file is made up of many entries, each entry holding the relation
 
 680 between an original untranslated string and its corresponding
 
 681 translation.  All entries in a given PO file usually pertain
 
 682 to a single project, and all translations are expressed in a single
 
 683 target language.  One PO file 
<STRONG>entry
</STRONG> has the following schematic structure:
 
<VAR>white-space</VAR>
#  <VAR>translator-comments</VAR>
#. <VAR>automatic-comments</VAR>
#: <VAR>reference</VAR>...
msgid <VAR>untranslated-string</VAR>
msgstr <VAR>translated-string</VAR>
 698 The general structure of a PO file should be well understood by
 
 699 the translator.  When using PO mode, very little has to be known
 
 700 about the format details, as PO mode takes care of them for her.
 
Entries begin with some optional white space.  Usually, when generated
through GNU <CODE>gettext</CODE> tools, there is exactly one blank line
between entries.  Then comments follow, on lines all starting with the
character <KBD>#</KBD>.  There are two kinds of comments: those with
some white space immediately following the <KBD>#</KBD>, which are
created and maintained exclusively by the translator, and those with
some non-white character just after the <KBD>#</KBD>, which
are created and maintained automatically by GNU <CODE>gettext</CODE> tools.
All comments, of any kind, are optional.
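The comment conventions above are simple enough to check mechanically.  As a rough sketch only, and not the way any GNU <CODE>gettext</CODE> tool is implemented, the following C function sorts one PO file line into the kinds described in this section.  The names <CODE>po_line_kind</CODE> and <CODE>po_classify_line</CODE> are made up for this illustration, and treating a bare <KBD>#</KBD> as a translator comment is our assumption.

```c
#include <ctype.h>
#include <string.h>

/* Kinds of lines found in a PO file.  These names are illustrative
   only; they do not come from GNU gettext.  */
enum po_line_kind
{
  PO_BLANK,               /* white space separating entries */
  PO_TRANSLATOR_COMMENT,  /* '#' alone, or '#' followed by white space */
  PO_AUTOMATIC_COMMENT,   /* '#' followed by a non-white character */
  PO_MSGID,               /* line introducing the untranslated string */
  PO_MSGSTR,              /* line introducing the translation */
  PO_STRING,              /* quoted string continuing the previous field */
  PO_OTHER                /* anything this sketch does not recognize */
};

/* Return nonzero if LINE starts with KEYWORD followed by white space.  */
static int
starts_with_keyword (const char *line, const char *keyword)
{
  size_t length = strlen (keyword);

  return strncmp (line, keyword, length) == 0
         && isspace ((unsigned char) line[length]);
}

/* Classify one line of a PO file, given without its trailing newline.  */
enum po_line_kind
po_classify_line (const char *line)
{
  const char *p = line;

  /* A line holding nothing but white space separates entries.  */
  while (isspace ((unsigned char) *p))
    p++;
  if (*p == '\0')
    return PO_BLANK;

  if (line[0] == '#')
    {
      if (line[1] == '\0' || isspace ((unsigned char) line[1]))
        return PO_TRANSLATOR_COMMENT;
      return PO_AUTOMATIC_COMMENT;
    }
  if (starts_with_keyword (line, "msgid"))
    return PO_MSGID;
  if (starts_with_keyword (line, "msgstr"))
    return PO_MSGSTR;
  if (line[0] == '"')
    return PO_STRING;
  return PO_OTHER;
}
```

With this sketch, a line reading <SAMP>`# checked with the author'</SAMP> is reported as a translator comment, while <SAMP>`#: src/main.c:42'</SAMP> is reported as an automatic one.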
 
 716 After white space and comments, entries show two strings, giving
 
 717 first the untranslated string as it appears in the original program
 
 718 sources, and then, the translation of this string.  The original
 
 719 string is introduced by the keyword 
<CODE>msgid
</CODE>, and the translation,
 
 720 by 
<CODE>msgstr
</CODE>.  The two strings, untranslated and translated,
 
 721 are quoted in various ways in the PO file, using 
<KBD>"</KBD> 
 722 delimiters and <KBD>\</KBD> escapes, but the translator does not really 
have to pay attention to the precise quoting format, as PO mode fully
intends to take care of quoting for her.
 728 The <CODE>msgid</CODE> strings, as well as automatic comments, are produced 
 729 and managed by other GNU <CODE>gettext</CODE> tools, and PO mode does not 
provide means for the translator to alter these.  The most she can
do is delete them, and only by deleting the whole entry.
 732 On the other hand, the <CODE>msgstr</CODE> string, as well as translator 
 733 comments, are really meant for the translator, and PO mode gives her 
 734 the full control she needs. 
Sometimes a few lines, usually white space or comments, follow the
very last entry of a PO file.  Such lines are not part of any entry,
and PO mode is unable to take action on those lines.  By using the
PO mode function <KBD>M-x po-normalize</KBD>, the translator may get
rid of those spurious lines.  See section <A HREF="gettext.html#SEC12">Normalizing Strings in Entries</A>.
 746 The remainder of this section may be safely skipped for those using 
 747 PO mode, yet it may be interesting for everybody to have a better 
idea of the precise format of a PO file.  On the other hand, those
without GNU Emacs at hand should read on carefully.
 753 Each of <VAR>untranslated-string</VAR> and <VAR>translated-string</VAR> respects 
 754 the C syntax for a character string, including the surrounding quotes 
and embedded backslash escape sequences.  When the time comes
 756 to write multi-line strings, one should not use escaped newlines. 
 757 Instead, a closing quote should follow the last character on the 
 758 line to be continued, and an opening quote should resume the string 
 759 at the beginning of the following PO file line.  For example: 
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"
 
In this example, the empty string is used on the first line, to
allow better alignment of the <KBD>H</KBD> from the word <SAMP>`Here'</SAMP>
over the <KBD>f</KBD> from the word <SAMP>`for'</SAMP>.  In this example, the
<CODE>msgid</CODE> keyword is followed by three strings, which are meant
to be concatenated.  Concatenating the empty string does not change
the resulting overall string, but it is a way for us to comply with
the necessity of <CODE>msgid</CODE> to be followed by a string on the same
line, while keeping the multi-line presentation left-justified, as
we find this to be a cleaner layout.  The empty string could have
been omitted, but only if the string starting with <SAMP>`Here'</SAMP> was
moved up to the first line, right after <CODE>msgid</CODE>.<A NAME="DOCF1" HREF="gettext_foot.html#FOOT1">(1)</A>  Nor was it really necessary
to switch between the last two quoted strings immediately after
the newline <SAMP>`\n'</SAMP>; the switch could have occurred after <EM>any</EM>
other character.  We just did it this way because it is neater.
 
 787 One should carefully distinguish between end of lines marked as
 
 788 <SAMP>`\n'
</SAMP> <EM>inside
</EM> quotes, which are part of the represented
 
 789 string, and end of lines in the PO file itself, outside string quotes,
 
which have no effect on the represented string.
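Since PO strings follow the C syntax, the same distinction can be observed directly in C, where adjacent string literals are also concatenated.  The sketch below is only an analogy under that assumption; the names <CODE>po_style_text</CODE>, <CODE>plain_text</CODE>, <CODE>same_represented_string</CODE> and <CODE>count_newlines</CODE> are invented for the illustration.

```c
#include <string.h>

/* Written the way a PO file continues a long string: an empty string
   first, then one quoted piece per line of the file.  The line breaks
   between the pieces are outside the quotes and add nothing.  */
const char po_style_text[] =
  ""
  "Here is an example of how one might continue a very long string\n"
  "for the common case the string represents multi-line output.\n";

/* The same string written as a single quoted piece.  */
const char plain_text[] =
  "Here is an example of how one might continue a very long string\nfor the common case the string represents multi-line output.\n";

/* Return nonzero when both spellings represent the same string.  */
int
same_represented_string (void)
{
  return strcmp (po_style_text, plain_text) == 0;
}

/* Count the newlines that are really part of the represented string,
   that is, those written as \n inside the quotes.  */
size_t
count_newlines (const char *s)
{
  size_t count = 0;

  for (; *s != '\0'; s++)
    if (*s == '\n')
      count++;
  return count;
}
```

Both spellings compare equal, and each holds exactly two newlines, one for each <SAMP>`\n'</SAMP> written inside the quotes.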
 
 794 Outside strings, white lines and comments may be used freely.
 
 795 Comments start at the beginning of a line with 
<SAMP>`#'
</SAMP> and extend
 
 796 until the end of the PO file line.  Comments written by translators
 
 797 should have the initial 
<SAMP>`#'
</SAMP> immediately followed by some white
 
 798 space.  If the 
<SAMP>`#'
</SAMP> is not immediately followed by white space,
 
 799 this comment is most likely generated and managed by specialized GNU
 
tools, and might disappear or be replaced unexpectedly when the PO
 
 801 file is given to 
<CODE>tupdate
</CODE>.
 
 806 <H2><A NAME=
"SEC10" HREF=
"gettext_toc.html#TOC10">Main Commands
</A></H2> 
 809 When Emacs finds a PO file in a window, PO mode is activated
 
for that window.  This makes the window read-only and establishes a
po-mode-map, which is a genuine Emacs mode, in the sense that it is
not derived from text mode in any way.
 
The main PO commands are those which do not fit in the other categories in
subsequent sections; they allow for quitting PO mode or managing windows.
 
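<DL COMPACT>
<DT><KBD>u</KBD>
<DD>
Undo last modification to the PO file.
<DT><KBD>q</KBD>
<DD>
Quit processing and save the PO file.
<DT><KBD>o</KBD>
<DD>
Temporarily leave the PO file window.
<DT><KBD>h</KBD>
<DD>
Show help about PO mode.
<DT><KBD>=</KBD>
<DD>
Give some PO file statistics.
<DT><KBD>v</KBD>
<DD>
Batch validate the format of the whole PO file.
</DL>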
 
 850 The command 
<KBD>u
</KBD> (
<CODE>po-undo
</CODE>) interfaces to the GNU Emacs
 
 851 <EM>undo
</EM> facility.  See section `Undoing Changes' in 
<CITE>The Emacs Editor
</CITE>.  Each time 
<KBD>u
</KBD> is typed, modifications the translator
 
 852 did to the PO file are undone a little more.  For the purpose of
 
 853 undoing, each PO mode command is atomic.  This is especially true for
 
 854 the 
<KBD><KBD>RET
</KBD></KBD> command: the whole edit made by a single
use of this command is undone at once, even if the edit itself
involved several actions.  However, while in the editing window, one
can undo the editing work in much finer steps.
 
 861 The command 
<KBD>q
</KBD> (
<CODE>po-quit
</CODE>) is used when the translator is
 
 862 done with the PO file.  If the file has been modified, it is saved
 
on disk first.  However, prior to all this, the command checks whether
some untranslated message remains in the PO file and, if so, the
translator is asked whether she really wants to stop working on this
PO file.  This is the preferred way of getting rid of an Emacs PO
 
 867 file buffer.  Merely killing it through the usual command 
<KBD>C-x
 
 868 k
</KBD> (
<CODE>kill-buffer
</CODE>), say, has the unpleasant effect of leaving a PO
 
 869 internal work buffer behind.
 
 873 The command 
<KBD>o
</KBD> (
<CODE>po-other-window
</CODE>) is another, softer
way to leave PO mode, temporarily.  It just moves the cursor to
some other Emacs window, and pops one up if necessary.  For example, if
the translator just got PO mode to show some source context in some
other window, she might discover some apparent bug in the program source
that needs correction.  This command allows the translator to switch
roles, become a programmer, and have the cursor right in the window
containing the program she (or rather <EM>he</EM>) wants to modify.
By later getting the cursor back in the PO file window, or by
asking Emacs to edit this file once again, PO mode is then recovered.
 
 886 The command 
<KBD>h
</KBD> (
<CODE>po-help
</CODE>) displays a summary of all
 
 887 available PO mode commands.  The translator should then type any
 
 888 character to resume normal PO mode operations.  The command 
<KBD>?
</KBD> 
 889 has the same effect as 
<KBD>h
</KBD>.
 
 893 The command 
<KBD>=
</KBD> (
<CODE>po-statistics
</CODE>) computes the total number
 
 894 of entries in the PO file, the ordinal of the current entry
 
 895 (counted from 
1), the number of untranslated entries, the number of
 
 896 obsolete entries, and displays all these numbers.
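To make these counts concrete, here is a deliberately naive C sketch that counts entries and untranslated entries in PO text held in memory.  It is not how PO mode computes its statistics; the names <CODE>po_stats</CODE> and <CODE>po_count_entries</CODE> are hypothetical, and obsolete entries are left out because this section does not say how they are marked.

```c
#include <string.h>

/* Counters filled by po_count_entries (an illustrative name).  */
struct po_stats
{
  int total;         /* number of entries, one per msgid line */
  int untranslated;  /* entries whose msgstr is the empty string */
};

/* Scan TEXT, the contents of a PO file, one line at a time.  Every
   line starting with "msgid " counts as an entry.  An entry counts as
   untranslated when its msgstr line holds only the empty string and
   is not followed by a quoted continuation line.  */
struct po_stats
po_count_entries (const char *text)
{
  struct po_stats stats = { 0, 0 };
  const char *line = text;

  while (*line != '\0')
    {
      const char *end = strchr (line, '\n');
      const char *next = end != NULL ? end + 1 : line + strlen (line);

      if (strncmp (line, "msgid ", 6) == 0)
        stats.total++;
      else if (strncmp (line, "msgstr \"\"", 9) == 0
               && (line[9] == '\n' || line[9] == '\0')
               && *next != '"')
        stats.untranslated++;

      line = next;
    }
  return stats;
}
```

Note that a translation written in the multi-line style starts with an empty string on the <CODE>msgstr</CODE> line, which is why the sketch looks at the following line before deciding that an entry is untranslated.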
 
 900 The command 
<KBD>v
</KBD> (
<CODE>po-validate
</CODE>) launches 
<CODE>msgfmt
</CODE> in
 
 901 verbose mode over the current PO file.  This command first offers
 
 902 to save the current PO file on disk.  The 
<CODE>msgfmt
</CODE> tool, from
 
 903 GNU 
<CODE>gettext
</CODE>, has the purpose of creating an MO file out of a
 
 904 PO file, and PO mode uses the features of this program for checking
 
 905 the overall format of a PO file, as well as all individual entries.
 
 909 The program 
<CODE>msgfmt
</CODE> runs asynchronously with Emacs, so
 
 910 the translator regains control immediately while her PO file
 
 911 is being studied.  Error output is collected in the GNU Emacs
 
 912 <SAMP>`*compilation*'
</SAMP> buffer, displayed in another window.  The regular
 
 913 GNU Emacs command 
<KBD>C-x`
</KBD> (
<CODE>next-error
</CODE>), as well as other
 
 914 usual compile commands, allow the translator to reposition quickly to
 
the offending parts of the PO file.  Once the cursor is on the line in
error, the translator may decide on any PO mode action which would
help correct the error.
 
 922 <H2><A NAME=
"SEC11" HREF=
"gettext_toc.html#TOC11">Entry Positioning
</A></H2> 
 925 The cursor in a PO file window is almost always part of
 
 926 an entry.  The only exceptions are the special case when the cursor
 
 927 is after the last entry in the file, or when the PO file is
 
empty.  The entry containing the cursor is called the
current entry.  Many PO mode commands operate on the current entry,
so moving the cursor does more than allow the translator to browse
the PO file; it also selects the entry on which commands operate.
 
 935 Some PO mode commands alter the position of the cursor in a specialized
 
way.  A few of those special-purpose positioning commands are described here;
the others are described in following sections.
 
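<DL COMPACT>
<DT><KBD>.</KBD>
<DD>
Redisplay the current entry.
<DT><KBD>n</KBD>
<DT><KBD><KBD>SPC</KBD></KBD>
<DD>
Select the entry after the current one.
<DT><KBD>p</KBD>
<DT><KBD><KBD>DEL</KBD></KBD>
<DD>
Select the entry before the current one.
<DT><KBD>&lt;</KBD>
<DD>
Select the first entry in the PO file.
<DT><KBD>&gt;</KBD>
<DD>
Select the last entry in the PO file.
<DT><KBD>m</KBD>
<DD>
Record the location of the current entry for later use.
<DT><KBD>l</KBD>
<DD>
Return to a previously saved entry location.
<DT><KBD>x</KBD>
<DD>
Exchange the current entry location with the previously saved one.
</DL>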
 
 981 Any GNU Emacs command able to reposition the cursor may be used
 
 982 to select the current entry in PO mode, including commands which
 
 983 move by characters, lines, paragraphs, screens or pages, and search
 
 984 commands.  However, there is a kind of standard way to display the
 
 985 current entry in PO mode, which usual GNU Emacs commands moving
 
 986 the cursor do not especially try to enforce.  The command 
<KBD>.
</KBD> 
 987 (
<CODE>po-current-entry
</CODE>) has the sole purpose of redisplaying the
 
 988 current entry properly, after the current entry has been changed by
 
 989 means external to PO mode, or the Emacs screen otherwise altered.
 
It is yet to be decided whether PO mode would help the translator, or otherwise
 
 994 irritate her, by forcing a more fixed window disposition while she
 
 995 is doing her work.  We originally had quite precise ideas about
 
 996 how windows should behave, but on the other hand, anyone used to
 
 997 GNU Emacs is often happy to keep full control.  Maybe a fixed window
 
 998 disposition might be offered as a PO mode option that the translator
 
 999 might activate or deactivate at will, so it could be offered on an
 
1000 experimental basis.  If nobody feels a real need for using it, or
 
1001 a compulsion for writing it, we might as well drop this whole idea.
 
1002 The incentive for doing it should come from translators rather than
 
1003 programmers, as opinions from an experienced translator are surely
 
worth more to me than opinions from programmers 
<EM>thinking
</EM> about
 
1005 how 
<EM>others
</EM> should do translation.
 
1009 The commands 
<KBD>n
</KBD> (
<CODE>po-next-entry
</CODE>) and 
<KBD>p
</KBD> 
1010 (
<CODE>po-previous-entry
</CODE>) move the cursor to the entry following,
 
1011 or preceding, the current one.  If 
<KBD>n
</KBD> is given while the
 
1012 cursor is on the last entry of the PO file, or if 
<KBD>p
</KBD> 
1013 is given while the cursor is on the first entry, no move is done.
 
1014 <KBD><KBD>SPC
</KBD></KBD> and 
<KBD><KBD>DEL
</KBD></KBD> are alternate keys for 
<KBD>n
</KBD> and
 
1015 <KBD>p
</KBD>, respectively.
 
The commands <KBD>&lt;</KBD> (<CODE>po-first-entry</CODE>) and <KBD>&gt;</KBD>
(<CODE>po-last-entry</CODE>) move the cursor to the first entry, or last
entry, of the PO file.  When the cursor is located past the last
entry in a PO file, most PO mode commands will return an error saying
<SAMP>`After last entry'</SAMP>.  However, the commands <KBD>&lt;</KBD> and <KBD>&gt;</KBD>
have the special property of being able to work even when the cursor
is not within any PO file entry, and you may use them for nicely
correcting this situation.  But even these commands will fail on a
truly empty PO file.  There are plans for PO mode
to interactively fill an empty PO file from sources.  See section <A HREF="gettext.html#SEC16">Marking Translatable Strings</A>.
 
1032 The translator may decide, before working at the translation of
 
a particular entry, that she needs to browse the remainder of the
PO file, maybe to find the terminology or phraseology used
in related entries.  She can of course use the standard Emacs idioms
for saving the current cursor location in some register, and use that
register for getting back, or else use the location ring.
 
1041 PO mode offers another approach, by which cursor locations may be saved
 
1042 onto a special stack.  The command 
<KBD>m
</KBD> (
<CODE>po-push-location
</CODE>)
 
1043 merely adds the location of current entry to the stack, pushing
 
1044 the already saved locations under the new one.  The command
 
1045 <KBD>l
</KBD> (
<CODE>po-pop-location
</CODE>) consumes the top stack element and
 
repositions the cursor to the entry associated with that top element.
 
1047 This position is then lost, for the next 
<KBD>l
</KBD> will move the cursor
 
to the previously saved location, and so on until no locations remain.
 
1053 If the translator wants the position to be kept on the location stack,
 
1054 maybe for taking a mere look at the entry associated with the top
 
1055 element, then go elsewhere with the intent of getting back later, she
 
1056 ought to use 
<KBD>m
</KBD> immediately after 
<KBD>l
</KBD>.
 
The command <KBD>x</KBD> (<CODE>po-exchange-location</CODE>) simultaneously
repositions the cursor to the entry associated with the top element of
the stack of saved locations, and replaces that top element with the
location of the current entry before the move.  Consequently, repeating
the <KBD>x</KBD> command alternates between two entries.
To achieve this, the translator positions the cursor on the
first entry, uses <KBD>m</KBD>, then moves to the second entry, and
merely uses <KBD>x</KBD> to make the switch.
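The behavior of <KBD>m</KBD>, <KBD>l</KBD> and <KBD>x</KBD> can be modeled with an ordinary stack.  The following C sketch is only a model of the description above, not the Emacs Lisp code of PO mode; the structure, the function names, the fixed capacity and the use of a <CODE>long</CODE> to stand for an entry location are all assumptions made for the illustration.

```c
#include <stddef.h>

#define PO_STACK_MAX 32   /* arbitrary capacity chosen for this sketch */

/* A stack of saved entry locations; the last one pushed is on top.  */
struct po_location_stack
{
  long entries[PO_STACK_MAX];
  size_t depth;
};

/* Like m: save CURRENT on top of the stack.  Return 0 if it is full.  */
int
po_push_location (struct po_location_stack *stack, long current)
{
  if (stack->depth == PO_STACK_MAX)
    return 0;
  stack->entries[stack->depth++] = current;
  return 1;
}

/* Like l: move to the most recently saved location and forget it.
   Return 0, leaving *CURRENT alone, if nothing was saved.  */
int
po_pop_location (struct po_location_stack *stack, long *current)
{
  if (stack->depth == 0)
    return 0;
  *current = stack->entries[--stack->depth];
  return 1;
}

/* Like x: swap the current location with the top of the stack.  */
int
po_exchange_location (struct po_location_stack *stack, long *current)
{
  long saved;

  if (stack->depth == 0)
    return 0;
  saved = stack->entries[stack->depth - 1];
  stack->entries[stack->depth - 1] = *current;
  *current = saved;
  return 1;
}
```

In this model, pushing location 10, moving to location 20 and calling <CODE>po_exchange_location</CODE> repeatedly makes the current location alternate between 10 and 20, while the stack keeps a single element throughout.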
 
1072 <H2><A NAME=
"SEC12" HREF=
"gettext_toc.html#TOC12">Normalizing Strings in Entries
</A></H2> 
1075 There are many different ways for encoding a particular string into a
 
1076 PO file entry, because there are so many different ways to split and
 
1077 quote multi-line strings, and even, to represent special characters
 
by backslashed escape sequences.  Some features of PO mode rely on
 
1079 the ability for PO mode to scan an already existing PO file for a
 
1080 particular string encoded into the 
<CODE>msgid
</CODE> field of some entry.
 
1081 Even if PO mode has internally all the built-in machinery for
 
1082 implementing this recognition easily, doing it fast is technically
 
difficult.  To facilitate a solution to this efficiency problem,
we decided on a canonical representation for strings.
 
1088 A conventional representation of strings in a PO file is currently
 
under discussion, and PO mode experiments with a canonical representation.
 
1090 Having both 
<CODE>xgettext
</CODE> and PO mode converging towards a uniform
 
1091 way of representing equivalent strings would be useful, as the internal
 
1092 normalization needed by PO mode could be automatically satisfied
 
1093 when using 
<CODE>xgettext
</CODE> from GNU 
<CODE>gettext
</CODE>.  An explicit
 
1094 PO mode normalization should then be only necessary for PO files
 
1095 imported from elsewhere, or for when the convention itself evolves.
 
1099 So, for achieving normalization of at least the strings of a given
 
1100 PO file needing a canonical representation, the following PO mode
 
1101 command is available:
 
1106 <DT><KBD>M-x po-normalize
</KBD> 
1108 Tidy the whole PO file by making entries more uniform.
 
1113 The special command 
<KBD>M-x po-normalize
</KBD>, which has no associated
 
1114 keys, revises all entries, ensuring that strings of both original
 
1115 and translated entries use uniform internal quoting in the PO file.
 
1116 It also removes any crumb after the last entry.  This command may be
 
1117 useful for PO files freshly imported from elsewhere, or if we ever
 
1118 improve on the canonical quoting format we use.  This canonical format
 
1119 is not only meant for getting cleaner PO files, but also for greatly
 
1120 speeding up 
<CODE>msgid
</CODE> string lookup for some other PO mode commands.
 
<KBD>M-x po-normalize</KBD> presently makes three passes over the entries.
The first implements heuristics for converting PO files for GNU
<CODE>gettext</CODE> 0.6 and earlier, in which <CODE>msgid</CODE> and <CODE>msgstr</CODE>
fields were using K&amp;R style C string syntax for multi-line strings.
These heuristics may fail for comments not related to obsolete
entries and ending with a backslash; they also depend on subsequent
passes for finalizing the proper commenting of continued lines for
obsolete entries.  This first pass might disappear once all older PO
files have been adjusted.  The second and third passes normalize
all <CODE>msgid</CODE> and <CODE>msgstr</CODE> strings respectively.  They also
clean out those trailing backslashes used by XView's <CODE>msgfmt</CODE>
for continued lines.
 
1139 Having such an explicit normalizing command allows for importing PO
 
1140 files from other sources, but also eases the evolution of the current
 
1141 convention, evolution driven mostly by aesthetic concerns, as of now.
 
It is quite easy to make suggested adjustments at a later time, as the
 
1143 normalizing command and eventually, other GNU 
<CODE>gettext
</CODE> tools
 
1144 should greatly automate conformance.  A description of the canonical
 
1145 string format is given below, for the particular benefit of those not
 
1146 having GNU Emacs handy, and who would nevertheless want to handcraft
 
1147 their PO files in nice ways.
 
1151 Right now, in PO mode, strings are single line or multi-line.  A string
 
1152 goes multi-line if and only if it has 
<EM>embedded
</EM> newlines, that
 
1153 is, if it matches 
<SAMP>`[^\n]\n+[^\n]'
</SAMP>.  So, we would have:
 
msgstr "\n\nHello, world!\n\n\n"
 
1162 but, replacing the space by a newline, this becomes:
 
We are deliberately using a caricatural example, here, to make the
point clearer.  Usually, multi-lines are not that bad looking.
It is probable that we will implement the following suggestion.
We might lump together all initial newlines into the empty string,
and also all newlines introducing empty lines (that is, for <VAR>n</VAR>
&gt; 1, the <VAR>n</VAR>-1'th last newlines would go together on a separate
string), so making the previous example appear:
 
1195 There are a few yet undecided little points about string normalization,
 
1196 to be documented in this manual, once these questions settle.
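The rule stated earlier in this section, that a string goes multi-line if and only if it matches <SAMP>`[^\n]\n+[^\n]'</SAMP>, can be tested without a regular expression library.  The C function below is a minimal sketch of that test alone, under the assumption that the string has already been decoded, so that each <SAMP>`\n'</SAMP> escape is a real newline character; the name <CODE>po_has_embedded_newline</CODE> is ours and not part of PO mode.

```c
#include <stddef.h>

/* Return nonzero if S matches [^\n]\n+[^\n], that is, if some run of
   newlines has a character other than newline on both sides.  Leading
   and trailing newlines alone do not make a string multi-line.  */
int
po_has_embedded_newline (const char *s)
{
  size_t i = 0;

  while (s[i] != '\0')
    {
      if (s[i] != '\n' && s[i + 1] == '\n')
        {
          /* Skip the whole run of newlines after s[i].  */
          size_t j = i + 1;

          while (s[j] == '\n')
            j++;
          if (s[j] != '\0')
            return 1;   /* something other than newline follows */
          i = j;        /* the run reached the end of the string */
        }
      else
        i++;
    }
  return 0;
}
```

Applied to the string <SAMP>`\n\nHello, world!\n\n\n'</SAMP> the function returns zero, since all its newlines are leading or trailing; once the space is replaced by a newline, it returns nonzero.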
 
1201 <H1><A NAME=
"SEC13" HREF=
"gettext_toc.html#TOC13">Preparing Program Sources
</A></H1> 
1204 For the programmer, changes to the C source code fall into three
 
1205 categories.  First, you have to make the localization functions
 
1206 known to all modules needing message translation.  Second, you should
 
1207 properly trigger the operation of GNU 
<CODE>gettext
</CODE> when the program
 
1208 initializes, usually from the 
<CODE>main
</CODE> function.  Last, you should
 
1209 identify and especially mark all constant strings in your program
 
1210 needing translation.
 
1214 Presuming that your set of programs, or package, has been adjusted
 
1215 so all needed GNU 
<CODE>gettext
</CODE> files are available, and your
 
1216 <TT>`Makefile'
</TT> files are adjusted (see section 
<A HREF=
"gettext.html#SEC65">The Maintainer's View
</A>), each C module
 
1217 having translated C strings should contain the line:
 
#include &lt;libintl.h&gt;
1226 The remaining changes to your C sources are discussed in the further
 
1227 sections of this chapter.
 
1233 <H2><A NAME=
"SEC14" HREF=
"gettext_toc.html#TOC14">Triggering 
<CODE>gettext
</CODE> Operations
</A></H2> 
1236 The initialization of locale data should be done with more or less
 
1237 the same code in every program, as demonstrated below:
 
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
 
1256 <VAR>PACKAGE
</VAR> and 
<VAR>LOCALEDIR
</VAR> should be provided either by
 
1257 <TT>`config.h'
</TT> or by the Makefile.  For now consult the 
<CODE>gettext
</CODE> 
1258 sources for more information.
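As a sketch of how these three calls might be packaged, the following C function wraps them and reports which text domain ended up selected.  The function name <CODE>init_nls</CODE> and the fallback values given to <CODE>PACKAGE</CODE> and <CODE>LOCALEDIR</CODE> are placeholders for this illustration; a real package takes them from <TT>`config.h'</TT> or the Makefile.  The sketch also assumes a system, such as one using the GNU C library, where <TT>`libintl.h'</TT> is available.

```c
#include <libintl.h>
#include <locale.h>

/* Placeholder values for this sketch only.  A real package receives
   these from config.h or from the Makefile.  */
#ifndef PACKAGE
#define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
#define LOCALEDIR "/usr/local/share/locale"
#endif

/* Run the usual start-up sequence.  Return the name of the selected
   text domain, as reported by textdomain, or NULL if that call fails.  */
const char *
init_nls (void)
{
  /* Adopt the locale requested by the user's environment.  */
  setlocale (LC_ALL, "");
  /* Tell gettext where the message catalogs of PACKAGE are installed.  */
  bindtextdomain (PACKAGE, LOCALEDIR);
  /* Make PACKAGE the default domain for later gettext calls.  */
  return textdomain (PACKAGE);
}
```

A program would typically call <CODE>init_nls</CODE> once, at the very beginning of <CODE>main</CODE>, before producing any message.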
 
1262 The use of 
<CODE>LC_ALL
</CODE> might not be appropriate for you.
 
1263 <CODE>LC_ALL
</CODE> includes all locale categories and especially
 
1264 <CODE>LC_CTYPE
</CODE>.  This latter category is responsible for determining
character classes with functions such as <CODE>isalnum</CODE> from
<TT>`ctype.h'</TT>, which could be wrong, especially for programs that process some
kind of input language.  For example, this would mean that
source code using the &ccedil; (c-cedilla character) is runnable in
France but not in the U.S.
 
1273 So it is sometimes necessary to replace the 
<CODE>LC_ALL
</CODE> line in the
 
1274 code above by a sequence of 
<CODE>setlocale
</CODE> lines
 
  setlocale (LC_TIME, "");
  setlocale (LC_MESSAGES, "");
 
or to switch to the character class in question and back.
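The effect of setting categories one by one can be checked at run time.  In the sketch below, which assumes a system where <CODE>LC_MESSAGES</CODE> is defined in <TT>`locale.h'</TT> (it is a POSIX category rather than an ISO C one), only the time and message categories follow the environment.  Both function names are hypothetical.

```c
#include <locale.h>
#include <string.h>

/* Localize only the categories the program wants.  LC_CTYPE is not
   touched, so it keeps the "C" locale every program starts in.  */
void
init_selected_categories (void)
{
  setlocale (LC_TIME, "");
  setlocale (LC_MESSAGES, "");
}

/* Return nonzero if character classification still follows the "C"
   locale.  Passing NULL to setlocale queries without changing.  */
int
ctype_is_c_locale (void)
{
  const char *name = setlocale (LC_CTYPE, NULL);

  return name != NULL && strcmp (name, "C") == 0;
}
```

After <CODE>init_selected_categories</CODE>, functions such as <CODE>isalnum</CODE> keep classifying characters as in the <SAMP>`C'</SAMP> locale, whatever the user's environment says.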
 
<H2><A NAME="SEC15" HREF="gettext_toc.html#TOC15">How Marks Appear in Sources</A></H2>
1296 The C sources should mark all strings requiring translation.  Marking
 
1297 is done in such a way that each translatable string appears to be
 
1298 the sole argument of some function or preprocessor macro.  There are
 
1299 only a few such possible functions or macros meant for translation,
 
1300 and their names are said to be marking keywords.  The marking is
 
1301 attached to strings themselves, rather than to what we do with them.
 
1302 This approach has more uses.  A blatant example is an error message
 
1303 produced by formatting.  The format string needs translation, as
 
1304 well as some strings inserted through some 
<SAMP>`%s'
</SAMP> specification
 
1305 in the format, while the result from 
<CODE>sprintf
</CODE> may have so many
 
different instances that it is impractical to list them all in some
 
1307 <SAMP>`error_string_out()'
</SAMP> routine, say.
 
1311 This marking operation has two goals.  The first goal of marking
 
1312 is for triggering the retrieval of the translation, at run time.
 
The keywords may be resolved into a routine able to dynamically
return the proper translation, as far as possible or wanted, for the
argument string.  Most localizable strings are found in executable
positions, that is, assigned to variables or given as parameters to
functions.  But this is not universal usage, and some translatable
 
1318 strings appear in structured initializations.  See section 
<A HREF=
"gettext.html#SEC17">Special Cases of Translatable Strings
</A>.
 
1322 The second goal of the marking operation is to help 
<CODE>xgettext
</CODE> 
properly extract all translatable strings when it scans a set
 
1324 of program sources and produces PO file templates.
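The two goals can be separated when a string sits in a structured initialization, where no function may be called.  A convention often seen for this, assumed here rather than taken from this section, is a second keyword that marks the string for extraction only, with the translation fetched later at the point of use.  In the sketch below the macro names <CODE>_</CODE> and <CODE>N_</CODE>, the array and the function are illustrative choices.

```c
#include <libintl.h>

/* Marks a string and fetches its translation at run time.  */
#define _(String) gettext (String)
/* Marks a string for extraction only; expands to the string itself.  */
#define N_(String) String

/* Initializers must be constants, so gettext cannot be called here.  */
const char *const weekday_names[] =
{
  N_("Monday"),
  N_("Tuesday"),
  N_("Wednesday")
};

/* The translation is looked up when the string is actually used.
   INDEX must be 0, 1 or 2.  */
const char *
translated_weekday (int index)
{
  return _(weekday_names[index]);
}
```

When no translation is available for a string, <CODE>gettext</CODE> returns its argument unchanged, so <CODE>translated_weekday</CODE> then yields the original English name.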
 
The canonical keyword for marking translatable strings is
<SAMP>`gettext'</SAMP>; it gave its name to the whole GNU <CODE>gettext</CODE>
package.  For packages making only light use of the <SAMP>`gettext'</SAMP>
keyword, macro or function, it is easily used <EM>as is</EM>.  However,
for packages using the <CODE>gettext</CODE> interface more heavily, it
is usually more convenient to give the main keyword a shorter, less
obtrusive name.  Indeed, the keyword might appear on a lot of strings
all over the package, and programmers usually neither want nor need
their program sources to remind them loudly, all the time, that they
are internationalized.  Further, a long keyword has the disadvantage
of using more horizontal space, forcing more indentation work on
sources for those trying to keep them within 79 or 80 columns.
 
Many GNU packages use <SAMP>`_'</SAMP> (a simple underscore) as a keyword,
and write <SAMP>`_("Translatable string")'</SAMP> instead of <SAMP>`gettext
("Translatable string")'</SAMP>.  Further, the usual GNU coding rule
requiring a space between the keyword and the opening
parenthesis is relaxed, in practice, for this particular usage.
So, the textual overhead per translatable string is reduced to
only three characters: the underscore and the two parentheses.
However, even if GNU <CODE>gettext</CODE> uses this convention internally,
it does not offer it officially.  The real, genuine keyword is truly
<SAMP>`gettext'</SAMP> indeed.  It is fairly easy for those wanting to use
<SAMP>`_'</SAMP> instead of <SAMP>`gettext'</SAMP> to declare:
 
#include &lt;libintl.h&gt;
#define _(String) gettext (String)
 
instead of merely using <SAMP>`#include &lt;libintl.h&gt;'</SAMP>.
 
1367 Later on, the maintenance is relatively easy.  If, as a programmer,
 
1368 you add or modify a string, you will have to ask yourself if the
 
1369 new or altered string requires translation, and include it within
 
1370 <SAMP>`_()'
</SAMP> if you think it should be translated.  
<SAMP>`"%s: %d"'
</SAMP> is
 
an example of a string 
<EM>not
</EM> requiring translation!
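To illustrate the choice a programmer faces, here is a small sketch with one string that deserves marking and one that does not.  The function names and the message text are invented for the example, and the sketch assumes a system providing <TT>`libintl.h'</TT>.

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* The format string holds words a user will read, so it is marked.
   The file name is data and is inserted as is.  */
int
format_open_error (char *buffer, size_t size, const char *file_name)
{
  return snprintf (buffer, size, _("cannot open %s"), file_name);
}

/* This format holds only conversion specifications and punctuation.
   There is nothing in it to translate, so it stays unmarked.  */
int
format_pair (char *buffer, size_t size, const char *name, int value)
{
  return snprintf (buffer, size, "%s: %d", name, value);
}
```

Marking the format string rather than the formatted result keeps the number of strings to translate small, since the same translated format serves every file name.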
 
1376 <H2><A NAME=
"SEC16" HREF=
"gettext_toc.html#TOC16">Marking Translatable Strings
</A></H2> 
1379 In PO mode, one set of features is meant more for the programmer than
 
1380 for the translator, and allows him to interactively mark which strings,
 
1381 in a set of program sources, are translatable, and which are not.
 
1382 Even if it is a fairly easy job for a programmer to find and mark
 
1383 such strings by other means, using any editor of his choice, PO mode
 
1384 makes this work more comfortable.  Further, this gives translators
 
1385 who feel a little like programmers, or programmers who feel a little
 
1386 like translators, a tool letting them work at marking translatable
 
1387 strings in the program sources, while simultaneously producing a set of
 
translations in some language, for the package being internationalized.
 
The set of program sources targeted by the PO mode commands described
here should have an Emacs tags table constructed for your project,
prior to using these PO file commands.  This is easy to do.  In any
 
1395 shell window, change the directory to the root of your project, then
 
1396 execute a command resembling:
 
etags src/*.[hc] lib/*.[hc]
 
1405 presuming here you want to process all 
<TT>`.h'
</TT> and 
<TT>`.c'
</TT> files
 
1406 from the 
<TT>`src/'
</TT> and 
<TT>`lib/'
</TT> directories.  This command will
 
1407 explore all said files and create a 
<TT>`TAGS'
</TT> file in your root
 
1408 directory, somewhat summarizing the contents using a special file
 
1409 format Emacs can understand.
 
1413 For official GNU packages which follow the GNU coding standard there is
 
1414 a make goal 
<CODE>tags
</CODE> or 
<CODE>TAGS
</CODE> which constructs the tag files in
 
1415 all directories and for all files containing source code.
 
1419 Once your 
<TT>`TAGS'
</TT> file is ready, the following commands assist
 
the programmer in marking translatable strings in his set of sources.
 
1421 But these commands are necessarily driven from within a PO file
 
1422 window, and it is likely that you do not even have such a PO file yet.
 
1423 This is not a problem at all, as you may safely open a new, empty PO
 
1424 file, mainly for using these commands.  This empty PO file will slowly
 
1425 fill in while you mark strings as translatable in your program sources.
 
1432 Search through program sources for a string which looks like a
 
1433 candidate for translation.
 
1437 Mark the last string found with 
<SAMP>`_()'
</SAMP>.
 
1441 Mark the last string found with a keyword taken from a set of possible
 
1442 keywords.  This command with a prefix allows some management of these
 
1448 The 
<KBD>,
</KBD> (
<CODE>po-tags-search
</CODE>) command searches for the next
 
1449 occurrence of a string which looks like a possible candidate for
 
1450 translation, and displays the program source in another Emacs window,
 
1451 positioned in such a way that the string is near the top of this other
 
window.  If the string is too big to fit entirely in this window, it is
 
1453 rather positioned so only its end is shown.  In any case, the cursor
 
1454 is left in the PO file window.  If the shown string would be better
 
1455 presented differently in different native languages, you may mark it
 
1456 using 
<KBD>M-,
</KBD> or 
<KBD>M-.
</KBD>.  Otherwise, you might rather ignore it
 
1457 and skip to the next string by merely repeating the 
<KBD>,
</KBD> command.
 
1461 A string is a good candidate for translation if it contains a sequence
 
1462 of three or more letters.  A string containing at most two letters in
 
1463 a row will be considered as a candidate if it has more letters than
 
1464 non-letters.  The command disregards strings containing no letters,
 
1465 or isolated letters only.  It also disregards strings within comments,
 
1466 or strings already marked with some keyword PO mode knows (see below).
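The letter-counting rules above can be sketched in C. This is only an illustration of one reading of those rules, not code taken from PO mode: the function name looks_translatable is hypothetical, and the sketch ignores comments and already marked strings, which the real command also skips.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of PO mode: returns true when STRING
   looks like a candidate for translation under the rules described above. */
bool
looks_translatable (const char *string)
{
  size_t letters = 0, non_letters = 0, run = 0, longest_run = 0;

  for (const unsigned char *p = (const unsigned char *) string; *p; p++)
    {
      if (isalpha (*p))
        {
          letters++;
          if (++run > longest_run)
            longest_run = run;
        }
      else
        {
          non_letters++;
          run = 0;
        }
    }

  if (longest_run >= 3)
    return true;                    /* three or more letters in a row */
  if (longest_run == 2)
    return letters > non_letters;   /* short runs need a majority of letters */
  return false;                     /* no letters, or isolated letters only */
}
```

Under this reading, a string such as "File not found" or "ok" qualifies, while "%d" and "a, b" do not.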
 
1470 If you have never told Emacs about some 
<TT>`TAGS'
</TT> file to use, the
 
1471 command will request that you specify one from the minibuffer, the
 
1472 first time you use the command.  You may later change your 
<TT>`TAGS'
</TT> 
1473 file by using the regular Emacs command 
<KBD>M-x visit-tags-table
</KBD>,
 
1474 which will ask you to name the precise 
<TT>`TAGS'
</TT> file you want
 
1475 to use.  See section `Tag Tables' in 
<CITE>The Emacs Editor
</CITE>.
 
1479 Each time you use the 
<KBD>,
</KBD> command, the search resumes where it was
 
1480 left off by the previous search, and goes through all program sources,
 
1481 obeying the 
<TT>`TAGS'
</TT> file, until all sources have been processed.
 
1482 However, by giving a prefix argument to the command (
<KBD>C-u
 
1483 ,)
</KBD>, you may request that the search be restarted all over again
 
1484 from the first program source; but in this case, strings that you
 
1485 recently marked as translatable will be automatically skipped.
 
1489 Using this 
<KBD>,
</KBD> command does not prevent the use of other regular
 
1490 Emacs tags commands.  For example, regular 
<CODE>tags-search
</CODE> or
 
1491 <CODE>tags-query-replace
</CODE> commands may be used without disrupting the
 
1492 independent 
<KBD>,
</KBD> search sequence.  However, as implemented, the
 
1493 <EM>initial
</EM> <KBD>,
</KBD> command (or the 
<KBD>,
</KBD> command used with a
 
1494 prefix) might also reinitialize the regular Emacs tags searching to the
 
1495 first tags file; this reinitialization might be considered spurious.
 
1499 The 
<KBD>M-,
</KBD> (
<CODE>po-mark-translatable
</CODE>) command will mark the
 
1500 recently found string with the 
<SAMP>`_'
</SAMP> keyword.  The 
<KBD>M-.
</KBD> 
1501 (
<CODE>po-select-mark-and-mark
</CODE>) command will request that you type
 
1502 one keyword from the minibuffer and use that keyword for marking
 
1503 the string.  Both commands will automatically create a new PO file
 
1504 untranslated entry for the string being marked, and make it the
 
1505 current entry (making it easy for you to immediately proceed to its
 
1506 translation, if you feel like doing it right away).  It is possible
 
1507 that the modifications made to the program source by 
<KBD>M-,
</KBD> or
 
1508 <KBD>M-.
</KBD> render some source line longer than 
80 columns, forcing you
 
1509 to break and re-indent this line differently.  You may use the 
<KBD>o
</KBD> 
1510 command from PO mode, or any other window changing command from
 
1511 GNU Emacs, to break out into the program source window, and do any
 
1512 needed adjustments.  You will have to use some regular Emacs command
 
1513 to return the cursor to the PO file window, if you want to use
 
1514 <KBD>,
</KBD> for the next string, say.
 
1518 The 
<KBD>M-.
</KBD> command has a few built-in speedups, so you do not
 
1519 have to explicitly type all keywords all the time.  The first such
 
1520 speedup is that you are presented with a 
<EM>preferred
</EM> keyword,
 
1521 which you may accept by merely typing 
<KBD><KBD>RET
</KBD></KBD> at the prompt.
 
1522 The second speedup is that you may type any non-ambiguous prefix of the
 
1523 keyword you really mean, and the command will complete it automatically
 
1524 for you.  This also means that PO mode has to 
<EM>know
</EM> all
 
1525 your possible keywords, and that it will not accept mistyped keywords.
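The two speedups amount to a small lookup rule, sketched here in C. The function resolve_keyword and its parameters are hypothetical; PO mode itself relies on Emacs minibuffer completion, and this sketch assumes that an exact match always wins over longer keywords sharing the same prefix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: resolve the user's REPLY against the COUNT known
   KEYWORDS.  An empty reply selects PREFERRED; an exact match wins;
   otherwise REPLY must be a prefix of exactly one keyword.  Returns NULL
   for an unknown (mistyped) or ambiguous reply. */
const char *
resolve_keyword (const char *reply, const char *const *keywords,
                 size_t count, const char *preferred)
{
  const char *match = NULL;
  int ambiguous = 0;
  size_t reply_len = strlen (reply);

  if (reply_len == 0)
    return preferred;               /* merely typing RET at the prompt */

  for (size_t i = 0; i < count; i++)
    {
      if (strcmp (keywords[i], reply) == 0)
        return keywords[i];         /* keyword typed in full */
      if (strncmp (keywords[i], reply, reply_len) == 0)
        {
          if (match != NULL)
            ambiguous = 1;          /* a second keyword shares this prefix */
          match = keywords[i];
        }
    }
  return ambiguous ? NULL : match;
}
```

With the keywords gettext, _, gettext_noop and N_, the reply gettext_ completes to gettext_noop, while g is rejected as ambiguous.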
 
1529 If you reply 
<KBD>?
</KBD> to the keyword request, the command gives a
 
1530 list of all known keywords, from which you may choose.  When the
 
1531 command is prefixed by an argument (
<KBD>C-u M-.
</KBD>), it inhibits
 
1532 updating any program source or PO file buffer, and does some simple
 
1533 keyword management instead.  In this case, the command asks for a
 
1534 keyword, written in full, which becomes a new allowed keyword for
 
1535 later 
<KBD>M-.
</KBD> commands.  Moreover, this new keyword automatically
 
1536 becomes the 
<EM>preferred
</EM> keyword for later commands.  By typing
 
1537 an already known keyword in response to 
<KBD>C-u M-.
</KBD>, one merely
 
1538 changes the 
<EM>preferred
</EM> keyword and does nothing more.
 
1542 All keywords known for 
<KBD>M-.
</KBD> are recognized by the 
<KBD>,
</KBD> command
 
1543 when scanning for strings, and strings already marked by any of those
 
1544 known keywords are automatically skipped.  If many PO files are opened
 
1545 simultaneously, each one has its own independent set of known keywords.
 
1546 There is no provision in PO mode, currently, for deleting a known
 
1547 keyword; you have to quit the file (maybe using 
<KBD>q
</KBD>) and reopen
 
1548 it afresh.  When a PO file is newly brought up in an Emacs window, only
 
1549 <SAMP>`gettext'
</SAMP> and 
<SAMP>`_'
</SAMP> are known as keywords, and 
<SAMP>`gettext'
</SAMP> 
1550 is preferred for the 
<KBD>M-.
</KBD> command.  In fact, it is not useful to
 
1551 prefer 
<SAMP>`_'
</SAMP>, as this one is already built into the 
<KBD>M-,
</KBD> command.
 
1556 <H2><A NAME=
"SEC17" HREF=
"gettext_toc.html#TOC17">Special Cases of Translatable Strings
</A></H2> 
1559 The attentive reader might now point out that it is not always possible
 
1560 to mark translatable strings with 
<CODE>gettext
</CODE> or something like this.
 
1561 Consider the following case:
 
1567   static const char *messages[] = {
 
1568     "some very meaningful message",
 
1574     = index 
> 1 ? "a default message" : messages[index];
 
1582 While it is no problem to mark the string 
<CODE>"a default message"</CODE> it
 
1583 is not possible to mark the string initializers for 
<CODE>messages
</CODE>.
 
1584 What is to be done?  We have to fulfill two tasks.  First we have to mark the
 
1585 strings so that the 
<CODE>xgettext
</CODE> program (see section 
<A HREF=
"gettext.html#SEC19">Invoking the 
<CODE>xgettext
</CODE> Program
</A>)
 
1586 can find them, and second we have to translate the strings at runtime
 
1587 before printing them.
 
1591 The first task can be fulfilled by creating a new keyword, which names a
 
1592 no-op.  For the second we have to mark all access points to a string
 
1593 from the array.  So one solution can look like this:
 
1598 #define gettext_noop(String) (String)
 
1601   static const char *messages[] = {
 
1602     gettext_noop ("some very meaningful message"),
 
1603     gettext_noop ("and another one")
 
1608     = index 
> 1 ? gettext ("a default message") : gettext (messages[index]);
 
1616 Please convince yourself that the string which is written by
 
1617 <CODE>fputs
</CODE> is translated in any case.  How to get 
<CODE>xgettext
</CODE> to know
 
1618 the additional keyword 
<CODE>gettext_noop
</CODE> is explained in section 
<A HREF=
"gettext.html#SEC19">Invoking the 
<CODE>xgettext
</CODE> Program
</A>.
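For reference, here is a compilable sketch of the first solution. It assumes the libintl.h header shipped with GNU gettext or the C library; the wrapper function select_message is hypothetical and only packages the selection logic so that it can be called. As in the fragment above, the caller is expected to pass a non-negative index.

```c
#include <libintl.h>

/* No-op marker: lets xgettext find the string without translating it here. */
#define gettext_noop(String) (String)

static const char *messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};

/* Hypothetical wrapper: every path through the conditional goes through
   gettext, so the returned string is translated in any case. */
const char *
select_message (int index)
{
  return index > 1
         ? gettext ("a default message")
         : gettext (messages[index]);
}
```

When no message catalog is installed, gettext returns its argument unchanged, so the function then yields the original English strings.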
 
1622 The above is of course not the only solution.  You could also come up
 
1623 with the following one:
 
1628 #define gettext_noop(String) (String)
 
1631   static const char *messages[] = {
 
1632     gettext_noop ("some very meaningful message"),
 
1633     gettext_noop ("and another one")
 
1638     = index 
> 1 ? gettext_noop ("a default message") : messages[index];
 
1640   fputs (gettext (string), stdout);
 
1646 But this has some drawbacks.  First the programmer has to take care that
 
1647 he uses 
<CODE>gettext_noop
</CODE> for the string 
<CODE>"a default message"</CODE>.
 
1648 A use of 
<CODE>gettext
</CODE> could have in rare cases unpredictable results.
 
1649 The second drawback lies in the internals of the GNU 
<CODE>gettext
</CODE> 
1650 Library which will make this solution less efficient.
 
1654 One advantage is that you need not perform control flow analysis to make
 
1655 sure the output is really translated in any case.  But this analysis is
 
1656 generally not very difficult.  If it should prove difficult in some
 
1657 situation, you can use this second method there.
 
1663 <H1><A NAME=
"SEC18" HREF=
"gettext_toc.html#TOC18">Making the Initial PO File
</A></H1> 
1667 <H2><A NAME=
"SEC19" HREF=
"gettext_toc.html#TOC19">Invoking the 
<CODE>xgettext
</CODE> Program
</A></H2> 
1671 xgettext [
<VAR>option
</VAR>] 
<VAR>inputfile
</VAR> ...
 
1676 <DT><SAMP>`-a'
</SAMP> 
1678 <DT><SAMP>`--extract-all'
</SAMP> 
1680 Extract all strings.
 
1682 <DT><SAMP>`-c [
<VAR>tag
</VAR>]'
</SAMP> 
1684 <DT><SAMP>`--add-comments[=
<VAR>tag
</VAR>]'
</SAMP> 
1686 Place comment block with 
<VAR>tag
</VAR> (or those preceding keyword lines)
 
1689 <DT><SAMP>`-C'
</SAMP> 
1691 <DT><SAMP>`--c++'
</SAMP> 
1693 Recognize C++ style comments.
 
1695 <DT><SAMP>`-d 
<VAR>name
</VAR>'
</SAMP> 
1697 <DT><SAMP>`--default-domain=
<VAR>name
</VAR>'
</SAMP> 
1699 Use 
<TT>`
<VAR>name
</VAR>.po'
</TT> for output (instead of 
<TT>`messages.po'
</TT>).
 
1701 <DT><SAMP>`-D 
<VAR>directory
</VAR>'
</SAMP> 
1703 <DT><SAMP>`--directory=
<VAR>directory
</VAR>'
</SAMP> 
1705 Change to 
<VAR>directory
</VAR> before beginning to search and scan source
 
1706 files.  The resulting 
<TT>`.po'
</TT> file will be written relative to the
 
1707 original directory, though.
 
1709 <DT><SAMP>`-f 
<VAR>file
</VAR>'
</SAMP> 
1711 <DT><SAMP>`--files-from=
<VAR>file
</VAR>'
</SAMP> 
1713 Read the names of the input files from 
<VAR>file
</VAR> instead of getting
 
1714 them from the command line.
 
1716 <DT><SAMP>`-h'
</SAMP> 
1718 <DT><SAMP>`--help'
</SAMP> 
1720 Display this help and exit.
 
1722 <DT><SAMP>`-I 
<VAR>list
</VAR>'
</SAMP> 
1724 <DT><SAMP>`--input-path=
<VAR>list
</VAR>'
</SAMP> 
1726 List of directories searched for input files.
 
1728 <DT><SAMP>`-j'
</SAMP> 
1730 <DT><SAMP>`--join-existing'
</SAMP> 
1732 Join messages with existing file.
 
1734 <DT><SAMP>`-k 
<VAR>word
</VAR>'
</SAMP> 
1736 <DT><SAMP>`--keyword[=
<VAR>word
</VAR>]'
</SAMP> 
1738 Additional keyword to be looked for (without 
<VAR>word
</VAR> means not to
 
1739 use default keywords).
 
1741 The default keywords, which are always looked for if not explicitly
 
1742 disabled, are 
<CODE>gettext
</CODE>, 
<CODE>dgettext
</CODE>, 
<CODE>dcgettext
</CODE> and
 
1743 <CODE>gettext_noop
</CODE>.
 
1745 <DT><SAMP>`-m [
<VAR>string
</VAR>]'
</SAMP> 
1747 <DT><SAMP>`--msgstr-prefix[=
<VAR>string
</VAR>]'
</SAMP> 
1749 Use 
<VAR>string
</VAR> or "" as prefix for msgstr entries.
 
1751 <DT><SAMP>`-M [
<VAR>string
</VAR>]'
</SAMP> 
1753 <DT><SAMP>`--msgstr-suffix[=
<VAR>string
</VAR>]'
</SAMP> 
1755 Use 
<VAR>string
</VAR> or "" as suffix for msgstr entries.
 
1757 <DT><SAMP>`--no-location'
</SAMP> 
1759 Do not write 
<SAMP>`#: 
<VAR>filename
</VAR>:
<VAR>line
</VAR>'
</SAMP> lines.
 
1761 <DT><SAMP>`-n'
</SAMP> 
1763 <DT><SAMP>`--add-location'
</SAMP> 
1765 Generate 
<SAMP>`#: 
<VAR>filename
</VAR>:
<VAR>line
</VAR>'
</SAMP> lines (default).
 
1767 <DT><SAMP>`--omit-header'
</SAMP> 
1769 Don't write header with 
<SAMP>`msgid ""'
</SAMP> entry.
 
1771 This is useful for testing purposes because it eliminates a source
 
1772 of variance for generated 
<CODE>.gmo
</CODE> files.  We can ship some of
 
1773 these files in the GNU 
<CODE>gettext
</CODE> package, and the result of
 
1774 regenerating them through 
<CODE>msgfmt
</CODE> should yield the same values.
 
1776 <DT><SAMP>`-p 
<VAR>dir
</VAR>'
</SAMP> 
1778 <DT><SAMP>`--output-dir=
<VAR>dir
</VAR>'
</SAMP> 
1780 Output files will be placed in directory 
<VAR>dir
</VAR>.
 
1782 <DT><SAMP>`-s'
</SAMP> 
1784 <DT><SAMP>`--sort-output'
</SAMP> 
1786 Generate sorted output and remove duplicates.
 
1788 <DT><SAMP>`--strict'
</SAMP> 
1790 Write out strict Uniforum conforming PO file.
 
1792 <DT><SAMP>`-v'
</SAMP> 
1794 <DT><SAMP>`--version'
</SAMP> 
1796 Output version information and exit.
 
1798 <DT><SAMP>`-x 
<VAR>file
</VAR>'
</SAMP> 
1800 <DT><SAMP>`--exclude-file=
<VAR>file
</VAR>'
</SAMP> 
1802 Entries from 
<VAR>file
</VAR> are not extracted.
 
1807 Search path for supplementary PO files is:
 
1808 <TT>`/usr/local/share/nls/src/'
</TT>.
 
1812 If 
<VAR>inputfile
</VAR> is 
<SAMP>`-'
</SAMP>, standard input is read.
 
1816 This implementation of 
<CODE>xgettext
</CODE> is able to process a few awkward
 
1817 cases, like strings in preprocessor macros, ANSI concatenation of
 
1818 adjacent strings, and escaped end of lines for continued strings.
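The three awkward cases named above look as follows in C source. This is an illustrative fragment written for this section, with hypothetical names (GREETING, usage_text, continued_text); it shows the shape of the input only and makes no claim about the exact entries that xgettext writes for it.

```c
#include <libintl.h>

/* A translatable string inside a preprocessor macro. */
#define GREETING gettext ("Hello, world")

/* ANSI concatenation: the compiler joins adjacent literals into one string. */
const char *
usage_text (void)
{
  return gettext ("Usage: prog [OPTION]... "
                  "FILE...");
}

/* Escaped end of line: the backslash continues the literal on the next
   line, which therefore must start in the first column. */
const char *
continued_text (void)
{
  return gettext ("first half, \
second half");
}
```

In each case the compiler sees a single string, so the extracted message has to be that same single string for the run-time lookup to succeed.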
 
1823 <H2><A NAME=
"SEC20" HREF=
"gettext_toc.html#TOC20">C Sources Context
</A></H2> 
1826 PO mode is particularly powerful when used with PO files
 
1827 created through GNU 
<CODE>gettext
</CODE> utilities, as those utilities
 
1828 insert special comments in the PO files they generate.
 
1829 Some of these special comments relate the PO file entry to
 
1830 exactly where the untranslated string appears in the program sources.
 
1834 When the translator gets to an untranslated entry, she is fairly
 
1835 often faced with an original string which is not as informative as
 
1836 it normally should be, being succinct, cryptic, or otherwise ambiguous.
 
1837 Before choosing how to translate the string, she needs to understand
 
1838 better what the string really means and how tight the translation has
 
1839 to be.  Most of the time, when problems arise, the only way left to make
 
1840 her judgment is looking at the true program sources from where this
 
1841 string originated, searching for surrounding comments the programmer
 
1842 might have put in there, and looking around for helping clues of
 
1847 Surely, when looking at program sources, the translator will receive
 
1848 more help if she is a fluent programmer.  However, even if she is
 
1849 not versed in programming and feels a little lost in C code, the
 
1850 translator should not be shy about taking a look, once in a while.
 
1851 It is most probable that she will still be able to find some of the
 
1852 hints she needs.  She will learn quickly to not feel uncomfortable
 
1853 in program code, paying more attention to programmer's comments,
 
1854 variable and function names (if he dared to choose them well), and
 
1855 overall organization, than to the programming itself.
 
1859 The following commands are meant to help the translator in getting
 
1860 program source context for a PO file entry.
 
1867 Resume the display of a program source context, or cycle through them.
 
1871 Display of a program source context selected by menu.
 
1875 Add a directory to the search path for source files.
 
1879 Delete a directory from the search path for source files.
 
1884 The commands 
<KBD>c
</KBD> (
<CODE>po-cycle-reference
</CODE>) and 
<KBD>M-c
</KBD> 
1885 (
<CODE>po-select-reference
</CODE>) both open another window displaying
 
1886 some source program file, and already positioned in such a way that
 
1887 it shows an actual use of the current string to translate.  By doing
 
1888 so, the command gives source program context for the string.  But if
 
1889 the entry has no source context references, or if all references
 
1890 are unresolved along the search path for program sources, then the
 
1891 command diagnoses this as an error.
 
1895 Even if 
<KBD>c
</KBD> (or 
<KBD>M-c
</KBD>) opens a new window, the cursor stays
 
1896 in the PO file window.  If the translator really wants to
 
1897 get into the program source window, she ought to do it explicitly,
 
1898 maybe by using command 
<KBD>o
</KBD>.
 
1902 When 
<KBD>c
</KBD> is typed for the first time, or for a PO file entry which
 
1903 is different from the last one used for getting source context, then the
 
1904 command reacts by giving the first context available for this entry,
 
1905 if any.  If some context has already been recently displayed for the
 
1906 current PO file entry, and the translator wandered to do other
 
1907 things, typing 
<KBD>c
</KBD> again will merely resume, in another window,
 
1908 the context last displayed.  In particular, if the translator moved
 
1909 the cursor away from the context in the source file, the command will
 
1910 bring the cursor back to the context.  By using 
<KBD>c
</KBD> many times
 
1911 in a row, with no other intervening commands, PO mode will cycle to
 
1912 the next available contexts for this particular entry, getting back
 
1913 to the first context once the last has been shown.
 
1917 The command 
<KBD>M-c
</KBD> behaves differently.  Instead of cycling through
 
1918 references, it lets the translator choose one particular reference among
 
1919 many, and displays that reference.  It is best used with completion:
 
1920 if the translator types 
<KBD>TAB
</KBD> immediately after 
<KBD>M-c
</KBD>, in
 
1921 response to the question, she will be offered a menu of all possible
 
1922 references, as a reminder of which are the acceptable answers.
 
1923 This command is useful only where there are really many contexts
 
1924 available for a single string to translate.
 
1928 Program source files are usually found relative to where the PO
 
1929 file stands.  As a special provision, when this fails, the file is
 
1930 also looked for, but relative to the directory immediately above it.
 
1931 Those two cases take proper care of most PO files.  However, it might
 
1932 happen that a PO file has been moved, or is edited in a different
 
1933 place than its normal location.  When this happens, the translator
 
1934 should tell PO mode in which directory the genuine PO file normally
 
1935 sits.  Many such directories may be specified, and all together, they
 
1936 constitute what is called the 
<STRONG>search path
</STRONG> for program sources.
 
1937 The command 
<KBD>d
</KBD> (
<CODE>po-add-path
</CODE>) is used to interactively
 
1938 enter a new directory at the front of the search path, and the command
 
1939 <KBD>M-d
</KBD> (
<CODE>po-delete-path
</CODE>) is used to select, with completion,
 
1940 one of the directories she does not want anymore on the search path.
 
1945 <H2><A NAME=
"SEC21" HREF=
"gettext_toc.html#TOC21">Using Translation Compendiums
</A></H2> 
1948 Compendiums are yet to be implemented.
 
1952 An upcoming PO mode feature will let the translator maintain a
 
1953 compendium of already achieved translations.  A 
<STRONG>compendium
</STRONG> 
1954 is a special PO file containing a set of translations recurring in
 
1955 many different packages.  The translator will be given commands for
 
1956 adding entries to her compendium, and later initializing untranslated
 
1957 entries, or updating already translated entries, from translations
 
1958 kept in the compendium.  For this to work, however, the compendium
 
1959 would have to be normalized.  See section 
<A HREF=
"gettext.html#SEC12">Normalizing Strings in Entries
</A>.
 
1965 <H1><A NAME=
"SEC22" HREF=
"gettext_toc.html#TOC22">Updating Existing PO Files
</A></H1> 
1969 <H2><A NAME=
"SEC23" HREF=
"gettext_toc.html#TOC23">Invoking the 
<CODE>tupdate
</CODE> Program
</A></H2> 
1975 tupdate 
<VAR>new
</VAR> <VAR>old
</VAR> 
1979 File 
<VAR>new
</VAR> is the last created PO file (generally by
 
1980 <CODE>xgettext
</CODE>).  It need not contain any translations.  File
 
1981 <VAR>old
</VAR> is the PO file including the old translations which will
 
1982 be taken over to the newly created file as long as they still match.
 
1986 When English messages change in the programs, this is reflected in
 
1987 the PO file as extracted by 
<CODE>xgettext
</CODE>.  In large messages, that
 
1988 can be hard to detect, and will obviously result in an incomplete
 
1989 translation.  One of the virtues of 
<CODE>tupdate
</CODE> is that it detects
 
1990 such changes, saving the previous translation into a PO file comment,
 
1991 so marking the entry as obsolete, and giving the modified string with
 
1992 an empty translation, that is, marking the entry as untranslated.
 
1997 <H2><A NAME=
"SEC24" HREF=
"gettext_toc.html#TOC24">Untranslated Entries
</A></H2> 
2000 When 
<CODE>xgettext
</CODE> originally creates a PO file, unless told
 
2001 otherwise, it initializes the 
<CODE>msgid
</CODE> field with the untranslated
 
2002 string, and leaves the 
<CODE>msgstr
</CODE> string empty.  Such entries,
 
2003 having an empty translation, are said to be 
<STRONG>untranslated
</STRONG> entries.
 
2004 Later, when the programmer slightly modifies some string right in
 
2005 the program, this change is reflected in the PO file
 
2006 by the appearance of a new untranslated entry for the modified string.
 
2010 The usual commands moving from entry to entry consider untranslated
 
2011 entries on the same level as active entries.  Untranslated entries
 
2012 are easily recognizable by the fact they end with 
<SAMP>`msgstr ""'
</SAMP>.
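The recognition rule just stated can be sketched in C as a simple suffix test. The function entry_is_untranslated is hypothetical and assumes the whole text of one entry is passed in; it is not how PO mode itself scans the buffer.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true when ENTRY, the text of one PO file
   entry, ends with `msgstr ""' (trailing whitespace is ignored). */
bool
entry_is_untranslated (const char *entry)
{
  static const char marker[] = "msgstr \"\"";
  size_t len = sizeof marker - 1;
  size_t end = strlen (entry);

  while (end > 0 && isspace ((unsigned char) entry[end - 1]))
    end--;
  return end >= len && strncmp (entry + end - len, marker, len) == 0;
}
```

An entry whose empty msgstr line is followed by continuation lines, as in a PO file header, does not end with the marker and so is not reported as untranslated.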
 
2016 The work of the translator might be (quite naively) seen as the process
 
2017 of seeking after an untranslated entry, editing a translation for
 
2018 it, and repeating these actions until no untranslated entries remain.
 
2019 Some commands are more specifically related to untranslated entry
 
2027 Find the next untranslated entry.
 
2031 Find the previous untranslated entry.
 
2035 Turn the current entry into an untranslated one.
 
2040 The commands 
<KBD>e
</KBD> (
<CODE>po-next-empty-entry
</CODE>) and 
<KBD>M-e
</KBD> 
2041 (
<CODE>po-previous-empty
</CODE>) move forwards or backwards, searching for an
 
2042 untranslated entry.  If none is found, the search is extended and wraps
 
2043 around in the PO file buffer.
 
2047 An entry can be turned back into an untranslated entry by
 
2048 merely emptying its translation, using the command 
<KBD>k
</KBD> 
2049 (
<CODE>po-kill-msgstr
</CODE>).  See section 
<A HREF=
"gettext.html#SEC26">Modifying Translations
</A>.
 
2053 Also, when time comes to quit working on a PO file buffer
 
2054 with the 
<KBD>q
</KBD> command, the translator is asked for confirmation,
 
2055 if some untranslated string still exists.
 
2060 <H2><A NAME=
"SEC25" HREF=
"gettext_toc.html#TOC25">Obsolete Entries
</A></H2> 
2063 By 
<STRONG>obsolete
</STRONG> PO file entries, we mean those entries which are
 
2064 commented out, usually by 
<CODE>tupdate
</CODE> when it finds that the
 
2065 translation is not needed anymore by the package being localized.
 
2069 The usual commands moving from entry to entry consider obsolete
 
2070 entries on the same level as active entries.  Obsolete entries are
 
2071 easily recognizable by the fact that all their lines start with
 
2072 <KBD>#
</KBD>, even those lines containing 
<CODE>msgid
</CODE> or 
<CODE>msgstr
</CODE>.
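As a rough illustration of this rule, the following C sketch checks that every non-empty line of an entry starts with #. The function entry_is_obsolete is hypothetical, and the test is deliberately naive: an entry made of comments alone would also pass it.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when every non-empty line of ENTRY
   starts with '#', and ENTRY has at least one such line. */
bool
entry_is_obsolete (const char *entry)
{
  const char *p = entry;
  bool any_line = false;

  while (*p != '\0')
    {
      if (*p != '\n')
        {
          if (*p != '#')
            return false;       /* an active line: the entry is not obsolete */
          any_line = true;
        }
      /* Advance to the start of the next line. */
      while (*p != '\0' && *p != '\n')
        p++;
      if (*p == '\n')
        p++;
    }
  return any_line;
}
```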
 
2076 Commands exist for emptying the translation or reinitializing it
 
2077 to the original untranslated string.  Commands interfacing with the
 
2078 kill ring may force some previously saved text into the translation.
 
2079 The user may interactively edit the translation.  All these commands
 
2080 may apply to obsolete entries, carefully leaving the entry obsolete
 
2085 Moreover, some commands are more specifically related to obsolete
 
2093 <DT><KBD>M-
<KBD>SPC
</KBD></KBD> 
2095 Find the next obsolete entry.
 
2099 <DT><KBD>M-
<KBD>DEL
</KBD></KBD> 
2101 Find the previous obsolete entry.
 
2105 Make an active entry obsolete, or zap out an obsolete entry.
 
2110 The commands 
<KBD>M-n
</KBD> (
<CODE>po-next-obsolete-entry
</CODE>) and 
<KBD>M-p
</KBD> 
2111 (
<CODE>po-previous-obsolete-entry
</CODE>) move forwards or backwards,
 
2112 chasing for an obsolete entry.  If none is found, the search is
 
2113 extended and wraps around in the PO file buffer.  The commands
 
2114 <KBD>M-
<KBD>SPC
</KBD></KBD> and 
<KBD>M-
<KBD>DEL
</KBD></KBD> are synonymous to 
<KBD>M-n
</KBD> 
2115 and 
<KBD>M-p
</KBD>, respectively.
 
2119 PO mode does not provide ways for un-commenting an obsolete entry
 
2120 and making it active, because this would reintroduce an original
 
2121 untranslated string which does not correspond to any marked string
 
2122 in the program sources.  This goes with the philosophy of never
 
2123 introducing useless 
<CODE>msgid
</CODE> values.
 
2127 However, it is possible to comment out an active entry, so making
 
2128 it obsolete.  GNU 
<CODE>gettext
</CODE> utilities will later react to the
 
2129 disappearance of a translation by using the untranslated string.
 
2130 The command 
<KBD>z
</KBD> (
<CODE>po-fade-out-entry
</CODE>) pushes the current entry
 
2131 a little further towards annihilation.  If the entry is active, then
 
2132 the entry is merely commented out.  If the entry is already obsolete,
 
2133 then it is completely deleted from the PO file.  It is easy to recycle
 
2134 the translation so deleted into some other PO file entry, usually
 
2135 one which is untranslated.  See section 
<A HREF=
"gettext.html#SEC26">Modifying Translations
</A>.
 
2139 Here is a quite interesting problem to solve for later development of
 
2140 PO mode, for those nights you are not sleepy.  The idea would be that
 
2141 PO mode might become bright enough, one of these days, to make good
 
2142 guesses at retrieving the most probable candidate, among all obsolete
 
2143 entries, for initializing the translation of a newly appeared string.
 
2144 I think it might be a quite hard problem to do this algorithmically, as
 
2145 we have to develop good and efficient measures of string similarity.
 
2146 Right now, PO mode leaves the decision entirely to the translator;
 
2147 when the time comes to find the adequate obsolete translation, it
 
2148 merely tries to provide handy tools for helping her to do so.
 
2153 <H2><A NAME=
"SEC26" HREF=
"gettext_toc.html#TOC26">Modifying Translations
</A></H2> 
2156 PO mode prevents direct editing of the PO file by the usual
 
2157 means Emacs gives for altering a buffer's contents.  By doing so,
 
2158 it aims to help the translator avoid little clerical errors
 
2159 about the overall file format, or the proper quoting of strings,
 
2160 as those errors would be easily made.  Other kinds of errors are
 
2161 still possible, but some may be caught and diagnosed by the batch
 
2162 validation process, which the translator may always trigger by the
 
2163 <KBD>v
</KBD> command.  For all other errors, the translator has to rely on
 
2164 her own judgment, and also on the linguistic reports submitted to her
 
2165 by the users of the translated package, having the same mother tongue.
 
2169 When the time comes to create a translation, or to correct an error diagnosed
 
2170 mechanically or reported by a user, the translator has to resort to
 
2171 using the following commands for modifying the translations.
 
2178 Interactively edit the translation.
 
2182 Reinitialize the translation with the original, untranslated string.
 
2186 Save the translation on the kill ring, and delete it.
 
2190 Save the translation on the kill ring, without deleting it.
 
2194 Replace the translation, taking the new from the kill ring.
 
2199 The command 
<KBD>RET
</KBD> (
<CODE>po-edit-msgstr
</CODE>) opens a new Emacs
 
2200 window containing a copy of the translation taken from the current
 
2201 PO file entry, all ready for editing, fully modifiable
 
2202 and with the complete extent of GNU Emacs modifying commands.
 
2203 The string is presented to the translator expunged of all quoting
 
2204 marks, and she will modify the 
<EM>unquoted
</EM> string in this
 
2205 window to her heart's content.  Once done, the regular Emacs command
 
2206 <KBD>M-C-c
</KBD> (
<CODE>exit-recursive-edit
</CODE>) may be used to return the
 
2207 edited translation into the PO file, replacing the original
 
2208 translation.  The keys 
<KBD>C-c C-c
</KBD> are bound so they have the
 
2209 same effect as 
<KBD>M-C-c
</KBD>.
 
2213 If the translator becomes unsatisfied with her translation to the
 
2214 extent she prefers keeping the translation which was existent prior to
 
2215 the 
<KBD>RET
</KBD> command, she may use the regular Emacs command 
<KBD>C-]
</KBD> 
2216 (
<CODE>abort-recursive-edit
</CODE>) to simply discard the edit, while
 
2217 preserving the original translation.  Another way would be for her
 
2218 to exit normally with 
<KBD>C-c C-c
</KBD>, then type 
<CODE>u
</CODE> once for
 
2219 undoing the whole effect of the last edit.
 
2223 While editing her translation, the translator should take care
 
2224 not to insert unwanted 
<KBD><KBD>RET
</KBD></KBD> (carriage return) characters at
 
2225 the end of the translated string if those are not meant to be there,
 
2226 nor to remove such characters when they are required.  Since these
 
2227 characters are not visible in the editing buffer, they are easy to
 
2228 introduce by mistake.  To help her, 
<KBD><KBD>RET
</KBD></KBD> automatically puts
 
2229 the character 
<KBD><</KBD> at the end of the string being edited, but this
 
2230 <KBD><</KBD> is not really part of the string.  On exiting the editing
 
2231 window with 
<KBD>C-c C-c
</KBD>, PO mode automatically removes such
 
2232 <KBD><</KBD> and all whitespace added after it.  If the translator adds
 
2233 characters after the terminating 
<KBD><</KBD>, it looses its delimiting
 
2234 property and integrally becomes part of the string.  If she removes
 
2235 the delimiting 
<KBD><</KBD>, then the edited string is taken 
<EM>as
 
2236 is
</EM>, with all trailing newlines, even if invisible.  Also, if the
 
2237 translated string ought to end itself with a genuine 
<KBD><</KBD>, then the
 
2238 delimiting 
<KBD><</KBD> may not be removed; so the string should appear,
 
2239 in the editing window, as ending with two 
<KBD><</KBD> in a row.
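The delimiter handling described above can be sketched in C as follows. The function strip_edit_delimiter is hypothetical and only mirrors the behavior as stated in this section; PO mode performs the equivalent step in Emacs Lisp when the edit is closed.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strip the delimiting '<' and any whitespace after
   it from TEXT, in place.  If the last non-whitespace character is not
   '<', TEXT is left untouched, trailing newlines included. */
void
strip_edit_delimiter (char *text)
{
  size_t end = strlen (text);

  while (end > 0 && isspace ((unsigned char) text[end - 1]))
    end--;
  if (end > 0 && text[end - 1] == '<')
    text[end - 1] = '\0';       /* drops the delimiter and what follows it */
}
```

A string shown as ending with two < in a row keeps the first one, which is the genuine character, and loses only the delimiter.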
 
2243 When a translation (or a comment) is being edited, the translator
 
2244 may move the cursor back into the PO file buffer and freely
 
2245 move to other entries, browsing at will.  The edited entry will
 
2246 be recovered as soon as the edit ceases, because it is this entry
 
2247 only which is being modified.  If, with an edit still open, the
 
2248 translator wanders in the PO file buffer, she cannot modify
 
2249 any other entry.  If she tries to, PO mode will react by suggesting
 
2250 that she aborts the current edit, or else, by inviting her to finish
 
2251 the current edit prior to any other modification.
 
2255 The command 
<KBD>TAB
</KBD> (
<CODE>po-msgid-to-msgstr
</CODE>) initializes, or
 
2256 reinitializes the translation with the original string.  This command
 
2257 is normally used when the translator wants to redo a fresh translation
 
2258 of the original string, disregarding any previous work.
 
2262 In fact, whether it is best to start a translation with an empty
 
2263 string, or rather with a copy of the original string, is a matter of
 
2264 taste or habit.  Sometimes, the source mother tongue language and the
 
2265 target language are so different that it is simply best to start writing
 
2266 on an empty page.  At other times, the source and target languages
 
2267 are so close that it would be a waste to retype a number of words
 
2268 already written in the original string.  A translator may also
 
2269 like having the original string right under her eyes, as she will
 
2270 progressively overwrite the original text with the translation, even
 
2271 if this requires some extra editing work to get rid of the original.
 
2275 The command 
<KBD>k
</KBD> (
<CODE>po-kill-msgstr
</CODE>) merely empties the
 
2276 translation string, so turning the entry into an untranslated
 
2277 one.  But while doing so, its previous contents are set aside in
 
2278 a special place, known as the kill ring.  The command 
<KBD>w
</KBD> 
2279 (
<CODE>po-kill-ring-save-msgstr
</CODE>) has also the effect of taking a
 
2280 copy of the translation onto the kill ring, but it otherwise leaves
 
2281 the entry alone, and does 
<EM>not
</EM> remove the translation from the
 
2282 entry.  Both commands use exactly the Emacs kill ring, which is shared
 
2283 between buffers, and which is well known already to GNU Emacs lovers.
 
2287 The translator may use 
<KBD>k
</KBD> or 
<KBD>w
</KBD> many times in the course
 
2288 of her work, as the kill ring may hold several saved translations.
 
2289 From the kill ring, strings may later be reinserted in various
 
2290 Emacs buffers.  In particular, the kill ring may be used for moving
 
2291 translation strings between different entries of a single PO file
 
2292 buffer, or if the translator is handling many such buffers at once,
 
2293 even between PO files.
 
To facilitate exchanges with buffers which are not in PO mode, the
translation string put on the kill ring by the <KBD>k</KBD> command is fully
unquoted before being saved: external quotes are removed, multi-line
strings are concatenated, and backslash escape sequences are turned
into their corresponding characters.  In the special case of obsolete
entries, the translation is also uncommented prior to saving.
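The unquoting steps just described can be sketched in C.  This is only an illustration of the transformation, not the code PO mode runs (PO mode is written in Emacs Lisp); the function name <CODE>po_unquote</CODE> is hypothetical, and only the escapes <CODE>\n</CODE>, <CODE>\t</CODE>, <CODE>\"</CODE> and <CODE>\\</CODE> are assumed to occur.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unquote one or more adjacent PO string pieces ("..." "...") found in
   `src` and store the result in `dst`: the external quotes are dropped,
   the pieces are concatenated, and backslash escapes are decoded.
   Returns the number of bytes stored (not counting the final NUL), or
   (size_t) -1 when `dst` is too small.  Hypothetical helper.  */
size_t po_unquote(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;
    int in_string = 0;

    assert(src != NULL && dst != NULL);
    if (dst_size == 0)
        return (size_t) -1;

    for (; *src != '\0'; src++) {
        char c = *src;

        if (!in_string) {
            if (c == '"')
                in_string = 1;      /* opening quote of a piece */
            continue;               /* text between pieces is ignored */
        }
        if (c == '"') {
            in_string = 0;          /* closing quote of this piece */
            continue;
        }
        if (c == '\\' && src[1] != '\0') {
            c = *++src;
            switch (c) {
            case 'n': c = '\n'; break;
            case 't': c = '\t'; break;
            default: break;         /* \" and \\ stand for themselves */
            }
        }
        if (n + 1 >= dst_size)
            return (size_t) -1;     /* no room left for c and the NUL */
        dst[n++] = c;
    }
    dst[n] = '\0';
    return n;
}
```

Given the two-line PO string <CODE>"Hello, "</CODE> followed by <CODE>"world\n"</CODE>, the sketch yields the single string <CODE>Hello, world</CODE> ending with a real newline character.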
 
2306 The command 
<KBD>y
</KBD> (
<CODE>po-yank-msgstr
</CODE>) completely replaces the
 
2307 translation of the current entry by a string taken from the kill ring.
 
2308 Following GNU Emacs terminology, we then say that the replacement
 
2309 string is 
<STRONG>yanked
</STRONG> into the PO file buffer.
 
2310 See section `Yanking' in 
<CITE>The Emacs Editor
</CITE>.
 
2311 The first time 
<KBD>y
</KBD> is used, the translation receives the value of
 
2312 the most recent addition to the kill ring.  If 
<KBD>y
</KBD> is typed once
 
2313 again, immediately, without intervening keystrokes, the translation
 
2314 just inserted is taken away and replaced by the second most recent
 
2315 addition to the kill ring.  By repeating 
<KBD>y
</KBD> many times in a row,
 
2316 the translator may travel along the kill ring for saved strings,
 
2317 until she finds the string she really wanted.
 
When a string is yanked into a PO file entry, it is fully and
automatically requoted to comply with the format PO files should
have.  Further, if the entry is obsolete, PO mode then appropriately
pushes the inserted string inside comments.  Once again, translators
should not burden themselves with quoting considerations beyond, of
course, what the translated string itself requires with respect to
the program using it.
 
Note that <KBD>k</KBD> and <KBD>w</KBD> are not the only commands pushing strings
on the kill ring, as almost any PO mode command replacing translation
strings (or the translator comments) automatically saves the old string
on the kill ring.  The main exceptions to this general rule are the
yanking commands themselves.
 
2339 To better illustrate the operation of killing and yanking, let's
 
2340 use an actual example, taken from a common situation.  When the
 
2341 programmer slightly modifies some string right in the program, his
 
2342 change is later reflected in the PO file by the appearance
 
2343 of a new untranslated entry for the modified string, and the fact
 
2344 that the entry translating the original or unmodified string becomes
 
2345 obsolete.  In many cases, the translator might spare herself some work
 
2346 by retrieving the unmodified translation from the obsolete entry,
 
2347 then initializing the untranslated entry 
<CODE>msgstr
</CODE> field with
 
this retrieved translation.  Once this is done, the obsolete entry is
 
2349 not wanted anymore, and may be safely deleted.
 
2353 When the translator finds an untranslated entry and suspects that a
 
2354 slight variant of the translation exists, she immediately uses 
<KBD>m
</KBD> 
2355 to mark the current entry location, then starts chasing obsolete
 
2356 entries with 
<KBD>M-SPC
</KBD>, hoping to find some translation corresponding
 
2357 to the unmodified string.  Once found, she uses the 
<KBD>z
</KBD> command
 
2358 for deleting the obsolete entry, knowing that 
<KBD>z
</KBD> also 
<EM>kills
</EM> 
2359 the translation, that is, pushes the translation on the kill ring.
 
2360 Then, 
<KBD>l
</KBD> returns to the initial untranslated entry, 
<KBD>y
</KBD> 
2361 then 
<EM>yanks
</EM> the saved translation right into the 
<CODE>msgstr
</CODE> 
2362 field.  The translator is then free to use 
<KBD><KBD>RET
</KBD></KBD> for fine
 
2363 tuning the translation contents, and maybe to later use 
<KBD>e
</KBD>,
 
2364 then 
<KBD>m
</KBD> again, for going on with the next untranslated string.
 
2368 When some sequence of keys has to be typed over and over again, the
 
translator may find it convenient to become more acquainted with the GNU
Emacs capability of learning these sequences and playing them back on
request.  See section `Keyboard Macros' in 
<CITE>The Emacs Editor
</CITE>.
 
2376 <H2><A NAME=
"SEC27" HREF=
"gettext_toc.html#TOC27">Modifying Comments
</A></H2> 
2379 Any translation work done seriously will raise many linguistic
 
2380 difficulties, for which decisions have to be made, and the choices
 
2381 further documented.  These documents may be saved within the
 
PO file in the form of translator comments, which the translator
 
2383 is free to create, delete, or modify at will.  These comments may
 
2384 be useful to herself when she returns to this PO file after a while.
 
2389 These commands are somewhat similar to those modifying translations,
 
2390 so the general indications given for these apply here.  See section 
<A HREF=
"gettext.html#SEC26">Modifying Translations
</A>.
 
2395 <DT><KBD>M-RET
</KBD> 
2397 Interactively edit the translator comments.
 
Save the translator comments on the kill ring, and delete them.
 
Save the translator comments on the kill ring, without deleting them.
 
Replace the translator comments, taking the new ones from the kill ring.
 
2414 Those commands parallel PO mode commands for modifying the translation
 
strings, and behave much the same way as they do, except that they handle
the part of PO file comments meant for translator usage, rather
 
2417 than the translation strings.  So, the descriptions given below are
 
2418 slightly succinct, because the full details have already been given.
 
2419 See section 
<A HREF=
"gettext.html#SEC26">Modifying Translations
</A>.
 
2423 The command 
<KBD>M-RET
</KBD> (
<CODE>po-edit-comment
</CODE>) opens a new Emacs
 
window containing a copy of the translator comments of the current
PO file entry.  If there are no such comments, PO mode
understands that the translator wants to add a comment to the entry,
and she is presented with an empty window.  Comment marks (<KBD>#</KBD>) and
the space following them are automatically removed before editing,
 
2429 and reinstated after.  For translator comments pertaining to obsolete
 
2430 entries, the uncommenting and recommenting operations are done twice.
 
2431 The command 
<KBD>#
</KBD> also has the same effect as 
<KBD>M-RET
</KBD>, and might
 
2432 be easier to type.  Once in the editing window, the keys 
<KBD>C-c
 
2433 C-c
</KBD> allow the translator to tell she is finished with editing
 
The command <KBD>M-k</KBD> (<CODE>po-kill-comment</CODE>) gets rid of all
 
2439 translator comments, while saving those comments on the kill ring.
 
2440 The command 
<KBD>M-w
</KBD> (
<CODE>po-kill-ring-save-comment
</CODE>) takes
 
2441 a copy of the translator comments on the kill ring, but leaves
 
2442 them undisturbed in the current entry.  The command 
<KBD>M-y
</KBD> 
2443 (
<CODE>po-yank-comment
</CODE>) completely replaces the translator comments
 
2444 by a string taken at the front of the kill ring.  When this command
 
2445 is immediately repeated, the comments just inserted are withdrawn,
 
2446 and replaced by other strings taken along the kill ring.
 
2450 On the kill ring, all strings have the same nature.  There is no
 
2451 distinction between 
<EM>translation
</EM> strings and 
<EM>translator
 
2452 comments
</EM> strings.  So, for example, let's presume the translator
 
has just finished editing a translation, and wants to create a new
translator comment documenting why the previous translation was
not good, just to remember what the problem was.  Foreseeing that she
 
2456 will do that in her documentation, the translator will want to quote
 
2457 the previous translation in her translator comments.  For doing so, she
 
2458 may initialize the translator comments with the previous translation,
 
2459 still at the head of the kill ring.  Because editing already pushed the
 
2460 previous translation on the kill ring, she just has to type 
<KBD>M-y
</KBD> 
2461 prior to 
<KBD>#
</KBD>, and the previous translation will be right there,
 
2462 all ready for being introduced by some explanatory text.
 
2466 On the other hand, presume there are some translator comments already
 
2467 and that the translator wants to add to those comments, instead
 
2468 of wholly replacing them.  Then, she should edit the comment right
 
2469 away with 
<KBD>#
</KBD>.  Once inside the editing window, she can use the
 
2470 regular GNU Emacs commands 
<KBD>C-y
</KBD> (
<CODE>yank
</CODE>) and 
<KBD>M-y
</KBD> 
2471 (
<CODE>yank-pop
</CODE>) for getting the previous translation where she likes.
 
2476 <H2><A NAME=
"SEC28" HREF=
"gettext_toc.html#TOC28">Consulting Auxiliary PO Files
</A></H2> 
An upcoming feature of PO mode should help the knowledgeable translator
to take advantage of translations already achieved in other languages
she just happens to know, by providing these other-language translations
as additional context for her own work.  Each PO file existing for
the same package the translator is working on, but targeted to a
different language, is called an 
<STRONG>auxiliary
</STRONG> PO file.
 
2485 Commands will exist for declaring and handling auxiliary PO files,
 
2486 and also for showing contexts for the entry under work.  For this to
 
2487 work fully, all auxiliary PO files will have to be normalized.
 
2492 <H1><A NAME=
"SEC29" HREF=
"gettext_toc.html#TOC29">Producing Binary MO Files
</A></H1> 
2496 <H2><A NAME=
"SEC30" HREF=
"gettext_toc.html#TOC30">Invoking the 
<CODE>msgfmt
</CODE> Program
</A></H2> 
2500 Usage: msgfmt [
<VAR>option
</VAR>] 
<VAR>filename
</VAR>.po ...
 
2505 <DT><SAMP>`-a 
<VAR>number
</VAR>'
</SAMP> 
2507 <DT><SAMP>`--alignment=
<VAR>number
</VAR>'
</SAMP> 
2509 Align strings to 
<VAR>number
</VAR> bytes (default: 
1).
 
2511 <DT><SAMP>`-h'
</SAMP> 
2513 <DT><SAMP>`--help'
</SAMP> 
2515 Display this help and exit.
 
2517 <DT><SAMP>`-I 
<VAR>list
</VAR>'
</SAMP> 
2519 <DT><SAMP>`--input-path=
<VAR>list
</VAR>'
</SAMP> 
2521 List of directories searched for input files.
 
2523 <DT><SAMP>`--no-hash'
</SAMP> 
2525 Binary file will not include the hash table.
 
2527 <DT><SAMP>`-o 
<VAR>file
</VAR>'
</SAMP> 
2529 <DT><SAMP>`--output-file=
<VAR>file
</VAR>'
</SAMP> 
2531 Specify output file name as 
<VAR>file
</VAR>.
 
2533 <DT><SAMP>`-v'
</SAMP> 
2535 <DT><SAMP>`--verbose'
</SAMP> 
2537 Detect and diagnose input file anomalies which might represent
 
2538 translation errors.  The 
<CODE>msgid
</CODE> and 
<CODE>msgstr
</CODE> strings are
 
2539 studied and compared.  It is considered abnormal that one string
 
2540 starts or ends with a newline while the other does not.  Also, both
 
2541 strings should have the same number of 
<SAMP>`%'
</SAMP> format specifiers,
 
2542 with matching types.  For example, the check will diagnose using
 
2543 <SAMP>`%.*s'
</SAMP> against 
<SAMP>`%s'
</SAMP>, or 
<SAMP>`%d'
</SAMP> against 
<SAMP>`%s'
</SAMP>, or
 
2544 <SAMP>`%d'
</SAMP> against 
<SAMP>`%x'
</SAMP>.  It can even handle positional parameters.
 
2546 <DT><SAMP>`-V'
</SAMP> 
2548 <DT><SAMP>`--version'
</SAMP> 
2550 Output version information and exit.
 
2555 If input file is 
<SAMP>`-'
</SAMP>, standard input is read.  If output file
 
2556 is 
<SAMP>`-'
</SAMP>, output is written to standard output.
 
The search path for 
<CODE>msgfmt
</CODE> is 
<TT>`/usr/local/share/nls/src/'
</TT>,
 
2561 by default.  It represents the path to additional directories where
 
2562 other PO files can be found.  This feature could be used for some
 
2563 PO files for standard libraries, in case we would like to spare
 
2564 translating their strings over and over again.  The 
<SAMP>`-x'
</SAMP> option
 
2565 could then exclude these strings from the generation.
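The newline comparison performed by the <SAMP>`-v'</SAMP> option, as described above, can be sketched as follows.  This is a minimal illustration under stated assumptions, not the code <CODE>msgfmt</CODE> uses: the function name <CODE>newlines_agree</CODE> is hypothetical, and the caller is assumed to skip untranslated entries, whose empty <CODE>msgstr</CODE> would otherwise be reported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 when `msgid` and `msgstr` agree on starting with a newline
   and on ending with one, 0 when they differ in either respect.
   Hypothetical helper illustrating the check described for -v.  */
int newlines_agree(const char *msgid, const char *msgstr)
{
    size_t id_len, str_len;
    int id_first, str_first, id_last, str_last;

    assert(msgid != NULL && msgstr != NULL);
    id_len = strlen(msgid);
    str_len = strlen(msgstr);

    /* An empty string neither starts nor ends with a newline.  */
    id_first = id_len > 0 && msgid[0] == '\n';
    str_first = str_len > 0 && msgstr[0] == '\n';
    id_last = id_len > 0 && msgid[id_len - 1] == '\n';
    str_last = str_len > 0 && msgstr[str_len - 1] == '\n';

    return id_first == str_first && id_last == str_last;
}
```

A translation that drops the final newline of its original is the typical case this check catches.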
 
2570 <H2><A NAME=
"SEC31" HREF=
"gettext_toc.html#TOC31">The Format of GNU MO Files
</A></H2> 
2573 The format of the generated MO files is best described by a picture,
 
2574 which appears below.
 
The first two words serve to identify the file.  The magic
 
2579 number will always signal GNU MO files.  The number is stored in the
 
2580 byte order of the generating machine, so the magic number really is
 
2581 two numbers: 
<CODE>0x950412de</CODE> and 
<CODE>0xde120495</CODE>.  The second
 
2582 word describes the current revision of the file format.  For now the
 
2583 revision is 
0.  This might change in future versions, and ensures
 
2584 that the readers of MO files can distinguish new formats from old
 
2585 ones, so that both can be handled correctly.  The version is kept
 
2586 separate from the magic number, instead of using different magic
 
2587 numbers for different formats, mainly because 
<TT>`/etc/magic'
</TT> is
 
2588 not updated often.  It might be better to have magic separated from
 
2589 internal format version identification.
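A reader can use the two forms of the magic number to find out whether the file was written in its own byte order.  The following is a minimal sketch under that reading of the text above; the names <CODE>mo_swap32</CODE> and <CODE>mo_check_magic</CODE> are hypothetical and are not part of the GNU <CODE>gettext</CODE> library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MO_MAGIC          0x950412deU
#define MO_MAGIC_SWAPPED  0xde120495U

/* Reverse the byte order of a 32-bit word.  */
uint32_t mo_swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00U)
         | ((v << 8) & 0x00ff0000U) | (v << 24);
}

/* Inspect the first four bytes of a file.  Returns 1 for an MO file in
   the byte order of this machine, -1 for an MO file in the opposite
   byte order (every word must then be passed through mo_swap32), and
   0 when the bytes are not an MO magic number at all.  */
int mo_check_magic(const unsigned char *first4)
{
    uint32_t word;

    assert(first4 != NULL);
    memcpy(&word, first4, sizeof word);   /* avoids unaligned access */
    if (word == MO_MAGIC)
        return 1;
    if (word == MO_MAGIC_SWAPPED)
        return -1;
    return 0;
}
```

A reader that gets -1 would then apply <CODE>mo_swap32</CODE> to the revision word and to every later count, offset and length it reads.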
 
A number of pointers to later tables in the file follow, allowing
for the extension of the prefix part of MO files without having to
recompile programs reading them.  This might become useful for later
inserting a few flag bits, an indication of the charset used, new
 
2597 tables, or other things.
 
2601 Then, at offset 
<VAR>O
</VAR> and offset 
<VAR>T
</VAR> in the picture, two tables
 
2602 of string descriptors can be found.  In both tables, each string
 
descriptor uses two 32-bit integers, one for the string length,
 
2604 another for the offset of the string in the MO file, counting in bytes
 
2605 from the start of the file.  The first table contains descriptors
 
2606 for the original strings, and is sorted so the original strings
 
2607 are in increasing lexicographical order.  The second table contains
 
2608 descriptors for the translated strings, and is parallel to the first
 
2609 table: to find the corresponding translation one has to access the
 
2610 array slot in the second array with the same index.
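To make the parallel tables concrete, here is a small sketch that fetches the string with a given index from either table of an MO file held entirely in memory.  It assumes the file is in the byte order of the reading machine and does no bounds checking against the file size; the names <CODE>mo_word</CODE> and <CODE>mo_string</CODE> are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read the 32-bit word stored at byte offset `off` of the MO image.
   Host byte order is assumed.  */
static uint32_t mo_word(const unsigned char *mo, uint32_t off)
{
    uint32_t w;

    memcpy(&w, mo + off, sizeof w);
    return w;
}

/* Return the string with the given index, taken from the table of
   original strings (translated == 0) or from the table of translations
   (translated != 0).  The descriptor length is stored in *length when
   `length` is not NULL.  Returns NULL if `index` is not below N.  */
const char *mo_string(const unsigned char *mo, int translated,
                      uint32_t index, uint32_t *length)
{
    uint32_t n, table, descriptor;

    assert(mo != NULL);
    n = mo_word(mo, 8);                        /* number of strings, N */
    table = mo_word(mo, translated ? 16 : 12); /* offset T or offset O */
    if (index >= n)
        return NULL;

    descriptor = table + index * 8;            /* two words per entry */
    if (length != NULL)
        *length = mo_word(mo, descriptor);
    return (const char *) (mo + mo_word(mo, descriptor + 4));
}
```

Calling <CODE>mo_string</CODE> twice with the same index, once for each table, gives an original string and its translation.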
 
Having the original strings sorted enables the use of simple binary
search, for when the MO file does not contain a hashing table, or
for when it is not practical to use the hashing table provided in
the MO file.  This also has another advantage, as the empty string
in a GNU <CODE>gettext</CODE> PO file is usually <EM>translated</EM> into
 
2619 some system information attached to that particular MO file, and the
 
2620 empty string necessarily becomes the first in both the original and
 
2621 translated tables, making the system information very easy to find.
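The binary search mentioned above can be sketched over two parallel in-memory arrays standing in for the two tables.  This is an illustration only, with the hypothetical name <CODE>lookup_sorted</CODE>; it assumes the originals are sorted in the byte-wise order that <CODE>strcmp</CODE> implements, and it returns the original string itself when no translation is found.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Search `msgid` among the `n` sorted strings of `originals` and
   return the string at the same index in `translations`.  When the
   string is not found, `msgid` itself is returned.  */
const char *lookup_sorted(const char *const *originals,
                          const char *const *translations,
                          size_t n, const char *msgid)
{
    size_t lo = 0, hi = n;      /* search the half-open range [lo, hi) */

    assert(originals != NULL && translations != NULL && msgid != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(msgid, originals[mid]);

        if (cmp == 0)
            return translations[mid];   /* same index, second table */
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return msgid;
}
```

Because the empty string sorts before every other string, the system information it is translated into is always found at index 0.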
 
2625 The size 
<VAR>S
</VAR> of the hash table can be zero.  In this case, the
 
2626 hash table itself is not contained in the MO file.  Some people might
 
prefer this because a precomputed hashing table takes disk space, and
does not gain <EM>that</EM> much speed.  The hash table contains indices
to the sorted array of strings in the MO file.  Conflict resolution is
done by double hashing.  The precise hashing algorithm used is fairly
dependent on GNU 
<CODE>gettext
</CODE> code, and is not documented here.
 
As for the strings themselves, they follow the hash table, and each
 
2636 is terminated with a 
<KBD>NUL
</KBD>, and this 
<KBD>NUL
</KBD> is not counted in
 
2637 the length which appears in the string descriptor.  The 
<CODE>msgfmt
</CODE> 
2638 program has an option selecting the alignment for MO file strings.
 
2639 With this option, each string is separately aligned so it starts at
 
2640 an offset which is a multiple of the alignment value.  On some RISC
 
2641 machines, a correct alignment will speed things up.
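The effect of the alignment option on string offsets amounts to rounding each starting offset up to a multiple of the alignment value.  A minimal sketch of that arithmetic follows; the names <CODE>align_up</CODE> and <CODE>next_string_offset</CODE> are hypothetical, and how <CODE>msgfmt</CODE> fills the padding bytes is not described here.

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `alignment`, which must be
   at least 1.  An offset that is already a multiple is unchanged.  */
size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment > 0);
    return (offset + alignment - 1) / alignment * alignment;
}

/* Given a string of `length` bytes stored at `offset`, return the
   offset at which the next string may start: past the string and its
   terminating NUL, then rounded up to the alignment.  */
size_t next_string_offset(size_t offset, size_t length, size_t alignment)
{
    return align_up(offset + length + 1, alignment);
}
```

With the default alignment of 1 the strings are simply packed one after another, each followed by its NUL.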
 
2645 Nothing prevents an MO file from having embedded 
<KBD>NUL
</KBD>s in strings.
 
2646 However, the program interface currently used already presumes
 
2647 that strings are 
<KBD>NUL
</KBD> terminated, so embedded 
<KBD>NUL
</KBD>s are
 
2648 somewhat useless.  But MO file format is general enough so other
 
2649 interfaces would be later possible, if for example, we ever want to
 
2650 implement wide characters right in MO files, where 
<KBD>NUL
</KBD> bytes may
 
This particular issue has been strongly debated in the GNU
<CODE>gettext</CODE> development forum, and it is to be expected that MO file
format will evolve or change over time.  It is even possible that many
formats may later be supported concurrently.  But surely, we have to
 
2659 start somewhere, and the MO file format described here is a good start.
 
2660 Nothing is cast in concrete, and the format may later evolve fairly
 
2661 easily, so we should feel comfortable with the current approach.
 
             +------------------------------------------+
          0  | magic number = 0x950412de                |
          4  | file format revision = 0                 |
          8  | number of strings                        |  == N
         12  | offset of table with original strings    |  == O
         16  | offset of table with translation strings |  == T
         20  | size of hashing table                    |  == S
         24  | offset of hashing table                  |  == H
             .    (possibly more entries later)         .
          O  | length & offset 0th string  ----------------.
      O + 8  | length & offset 1st string  ------------------.
O + ((N-1)*8)| length & offset (N-1)th string           |  | |
          T  | length & offset 0th translation  ---------------.
      T + 8  | length & offset 1st translation  -----------------.
T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
          H  | start hash table                         |  | | | |
  H + S * 4  | end hash table                           |  | | | |
             | NUL terminated 0th string  <----------------' | | |
             | NUL terminated 1st string  <------------------' | |
             | NUL terminated 0th translation  <---------------' |
             | NUL terminated 1st translation  <-----------------'
             +------------------------------------------+
 
2717 <H1><A NAME=
"SEC32" HREF=
"gettext_toc.html#TOC32">The User's View
</A></H1> 
When GNU <CODE>gettext</CODE> has truly reached its goal, average users
 
2721 should feel some kind of astonished pleasure, seeing the effect of
 
2722 that strange kind of magic that just makes their own native language
 
2723 appear everywhere on their screens.  As for naive users, they would
 
2724 ideally have no special pleasure about it, merely taking their own
 
2725 language for 
<EM>granted
</EM>, and becoming rather unhappy otherwise.
 
2729 So, let's try to describe here how we would like the magic to operate,
 
2730 as we want the users' view to be the simplest, among all ways one
 
2731 could look at GNU 
<CODE>gettext
</CODE>.  All other software engineers:
 
2732 programmers, translators, maintainers, should work together in such a
 
2733 way that the magic becomes possible.  This is a long and progressive
 
2734 undertaking, and information is available about the progress of the
 
2735 GNU Translation Project.
 
When a package is distributed, there are two kinds of users:
 
2740 <STRONG>installers
</STRONG> who fetch the distribution, unpack it, configure
 
2741 it, compile it and install it for themselves or others to use; and
 
2742 <STRONG>end users
</STRONG> that call programs of the package, once these have
 
2743 been installed at their site.  GNU 
<CODE>gettext
</CODE> is offering magic
 
2744 for both installers and end users.
 
2750 <H2><A NAME=
"SEC33" HREF=
"gettext_toc.html#TOC33">The Current 
<TT>`NLS'
</TT> Matrix for GNU
</A></H2> 
2753 Languages are not equally supported in all GNU packages.  To know
 
2754 if some GNU package uses GNU 
<CODE>gettext
</CODE>, one may check
 
2755 the distribution for the 
<TT>`NLS'
</TT> information file, for some
 
2756 <TT>`
<VAR>ll
</VAR>.po'
</TT> files, often kept together into some 
<TT>`po/'
</TT> 
2757 directory, or for an 
<TT>`intl/'
</TT> directory.  Internationalized
 
2758 packages have usually many 
<TT>`
<VAR>ll
</VAR>.po'
</TT> files, where 
<VAR>ll
</VAR> 
represents the language.  See section 
<A HREF=
"gettext.html#SEC35">Magic for End Users
</A> for a complete description
 
2760 of the format for 
<VAR>ll
</VAR>.
 
2764 More generally, a matrix is available for showing the current state
 
2765 of GNU internationalization, listing which packages are prepared
 
for multi-lingual messages, and which languages are supported by each.
 
2767 Because this information changes often, this matrix is not kept within
 
2768 this GNU 
<CODE>gettext
</CODE> manual.  This information is often found in
 
2769 file 
<TT>`NLS'
</TT> from various GNU distributions, but is also as old
 
2770 as the distribution itself.  A recent copy of this 
<TT>`NLS'
</TT> file,
 
2771 containing up-to-date information, should generally be found on most
 
2777 <H2><A NAME=
"SEC34" HREF=
"gettext_toc.html#TOC34">Magic for Installers
</A></H2> 
2780 By default, packages fully using GNU 
<CODE>gettext
</CODE>, internally,
 
are installed in such a way that they allow translation of
 
2782 messages.  At 
<EM>configuration
</EM> time, those packages should
 
2783 automatically detect whether the underlying host system provides usable
 
2784 <CODE>catgets
</CODE> or 
<CODE>gettext
</CODE> functions.  If neither is present,
 
2785 the GNU 
<CODE>gettext
</CODE> library should be automatically prepared
 
2786 and used.  Installers may use special options at configuration
 
2787 time for changing this behavior.  The command 
<SAMP>`./configure
 
2788 --with-gnu-gettext'
</SAMP> bypasses system 
<CODE>catgets
</CODE> or 
<CODE>gettext
</CODE> to
 
2789 use GNU 
<CODE>gettext
</CODE> instead, while 
<SAMP>`./configure --disable-nls'
</SAMP> 
produces programs totally unable to translate messages.
 
2794 Internationalized packages have usually many 
<TT>`
<VAR>ll
</VAR>.po'
</TT> 
2796 translations are disabled, all those available are installed together
 
2797 with the package.  However, the environment variable 
<CODE>LINGUAS
</CODE> 
2798 may be set, prior to configuration, to limit the installed set.
 
2799 <CODE>LINGUAS
</CODE> should then contain a space separated list of two-letter
 
2800 codes, stating which languages are allowed.
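The way a space-separated <CODE>LINGUAS</CODE> value restricts the installed set can be illustrated with a small membership test.  This sketch is not the code the configuration scripts run; the function name <CODE>linguas_allows</CODE> is hypothetical, and it assumes the list uses plain spaces as separators, as stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the language code `ll` appears as a whole word in the
   space-separated list `linguas`, and 0 otherwise.  */
int linguas_allows(const char *linguas, const char *ll)
{
    size_t ll_len;

    assert(linguas != NULL && ll != NULL);
    ll_len = strlen(ll);

    while (*linguas != '\0') {
        size_t word_len;

        linguas += strspn(linguas, " ");     /* skip separators */
        word_len = strcspn(linguas, " ");    /* length of next code */
        if (word_len > 0 && word_len == ll_len
            && strncmp(linguas, ll, ll_len) == 0)
            return 1;
        linguas += word_len;
    }
    return 0;
}
```

With <CODE>LINGUAS</CODE> set to <SAMP>`de fr'</SAMP>, only the German and French catalogs would pass this test.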
 
2805 <H2><A NAME=
"SEC35" HREF=
"gettext_toc.html#TOC35">Magic for End Users
</A></H2> 
2808 We consider here those packages using GNU 
<CODE>gettext
</CODE> internally,
 
2809 and for which the installers did not disable translation at
 
2810 <EM>configure
</EM> time.  Then, users only have to set the 
<CODE>LANG
</CODE> 
2811 environment variable to the appropriate 
<SAMP>`
<VAR>ll
</VAR>'
</SAMP> prior to
 
2812 using the programs in the package.  See section 
<A HREF=
"gettext.html#SEC33">The Current 
<TT>`NLS'
</TT> Matrix for GNU
</A>.  For example,
 
2813 let's presume a German site.  At the shell prompt, users merely have to
 
2814 execute 
<SAMP>`setenv LANG de'
</SAMP> (in 
<CODE>csh
</CODE>) or 
<SAMP>`export
 
2815 LANG; LANG=de'
</SAMP> (in 
<CODE>sh
</CODE>).  They could even do this from their
 
2816 <TT>`.login'
</TT> or 
<TT>`.profile'
</TT> file.
 
2821 <H1><A NAME=
"SEC36" HREF=
"gettext_toc.html#TOC36">The Programmer's View
</A></H1> 
One aim of the current message catalog implementation provided by
GNU <CODE>gettext</CODE> was to use the system's message catalog handling, if the
installer wishes to do so.  So we perhaps should first take a look at
the solutions we know about.  The people in the POSIX committee did not
manage to agree on one of the semi-official standards which we'll
describe below.  In fact they couldn't agree on anything, so they
decided only to include an example of an interface.  The major Unix vendors
are split in the usage of the two most important specifications: X/Open's
catgets vs. Uniforum's gettext interface.  We'll describe them both and
later explain our solution to this dilemma.
 
2839 <H2><A NAME=
"SEC37" HREF=
"gettext_toc.html#TOC37">About 
<CODE>catgets
</CODE></A></H2> 
2842 The 
<CODE>catgets
</CODE> implementation is defined in the X/Open Portability
 
2843 Guide, Volume 
3, XSI Supplementary Definitions, Chapter 
5.  But the
 
2844 process of creating this standard seemed to be too slow for some of
 
2845 the Unix vendors so they created their implementations on preliminary
 
2846 versions of the standard.  Of course this leads again to problems while
 
2847 writing platform independent programs: even the usage of 
<CODE>catgets
</CODE> 
2848 does not guarantee a unique interface.
 
Another, personal comment on this: only a bunch of committee members
could have made this interface.  They never really tried to program
using this interface.  It is a fast, memory-saving implementation, and a
 
2855 user can happily live with it.  But programmers hate it (at least me and
 
But we must not forget one point: after all the trouble with transferring
the rights on Unix(tm) they at last came to X/Open, the very same who
published this specification.  This leads me to making the prediction
that this interface will be in future Unix standards (e.g. Spec1170) and
therefore part of all Unix implementations (implementations, which are
 
2865 <EM>allowed
</EM> to wear this name).
 
2871 <H3><A NAME=
"SEC38" HREF=
"gettext_toc.html#TOC38">The Interface
</A></H3> 
2874 The interface to the 
<CODE>catgets
</CODE> implementation consists of three
 
2875 functions which correspond to those used in file access: 
<CODE>catopen
</CODE> 
2876 to open the catalog for using, 
<CODE>catgets
</CODE> for accessing the message
 
2877 tables, and 
<CODE>catclose
</CODE> for closing after work is done.  Prototypes
 
2878 for the functions and the needed definitions are in the
 
2879 <CODE><nl_types.h
></CODE> header file.
 
<CODE>catopen</CODE> is used like this:
 
nl_catd catd = catopen ("catalog_name", 0);
 
The function takes as the argument the name of the catalog.  This usually
refers to the name of the program or the package.  The second parameter
is not further specified in the standard.  I don't even know whether it
is implemented consistently among various systems.  So the common advice
is to use <CODE>0</CODE> as the value.  The return value is a handle to the
message catalog, equivalent to the file handles returned by 
<CODE>open
</CODE>.
 
2901 This handle is of course used in the 
<CODE>catgets
</CODE> function which can
 
char *translation = catgets (catd, set_no, msg_id, "original string");
 
2911 The first parameter is this catalog descriptor.  The second parameter
 
2912 specifies the set of messages in this catalog, in which the message
 
2913 described by 
<CODE>msg_id
</CODE> is obtained.  
<CODE>catgets
</CODE> therefore uses a
 
2914 three-stage addressing:
 
catalog name => set number => message ID => translation
 
The fourth argument is not used to address the translation.  It is given
as a default value in case one of the addressing stages fails.  One
important thing to remember is that although the return type of catgets
is <CODE>char *</CODE> the resulting string <EM>must not</EM> be changed.  It
should rather be <CODE>const char *</CODE>, but the standard was published in
1988, one year before ANSI C.
 
The last of these functions is used and behaves as expected:
 
2941 After this no 
<CODE>catgets
</CODE> call using the descriptor is legal anymore.
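Putting the three calls together, a complete open-use-close sequence might look like the sketch below.  The wrapper name <CODE>cat_lookup</CODE> and its buffer parameters are hypothetical; the message is copied into a caller-supplied buffer because the sketch assumes the string returned by <CODE>catgets</CODE> should not be used once the catalog has been closed.

```c
#include <assert.h>
#include <nl_types.h>
#include <stddef.h>
#include <stdio.h>

/* Fetch message `msg_id` of set `set_no` from the catalog `name` and
   copy it into `buf`.  When the catalog cannot be opened, or when the
   set or message does not exist, `fallback` is copied instead.
   Returns `buf`.  */
const char *cat_lookup(const char *name, int set_no, int msg_id,
                       const char *fallback, char *buf, size_t buf_size)
{
    nl_catd catd;
    const char *text;

    assert(name != NULL && fallback != NULL);
    assert(buf != NULL && buf_size > 0);

    catd = catopen(name, 0);
    if (catd == (nl_catd) -1)
        text = fallback;                 /* catalog could not be opened */
    else
        text = catgets(catd, set_no, msg_id, fallback);

    /* The string must not be modified, so work on a copy.  */
    snprintf(buf, buf_size, "%s", text);

    if (catd != (nl_catd) -1)
        catclose(catd);                  /* catd is not usable after this */
    return buf;
}
```

A program would normally keep the catalog open for its whole run rather than reopen it for every message; the wrapper opens and closes it each time only to show all three functions in one place.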
 
2946 <H3><A NAME=
"SEC39" HREF=
"gettext_toc.html#TOC39">Problems with the 
<CODE>catgets
</CODE> Interface?!
</A></H3> 
Now that this description seemed to be really easy, where are the
problems we speak of?  In fact the interface could be used in a
reasonable way, but constructing the message catalogs is a pain.  The
reason for this lies in the third argument of <CODE>catgets</CODE>: the unique
message ID.  This has to be a numeric value for all messages in a single
set.  Perhaps you could imagine the problems keeping such a list while
changing the source code.  Add a new message here, remove one there.  Of
course a lot of tools have been developed to help organize this
chaos, but each of them fails in one aspect or another.  We don't
want to say that the other approach has no problems, but they are far
easier to manage.
 
2964 <H2><A NAME=
"SEC40" HREF=
"gettext_toc.html#TOC40">About 
<CODE>gettext
</CODE></A></H2> 
2967 The definition of the 
<CODE>gettext
</CODE> interface comes from a Uniforum
 
2968 proposal and it is followed by at least one major Unix vendor
 
2969 (Sun) in its last developments.  It is not specified in any official
 
The main points about this solution are that it does not follow the
method of normal file handling (open-use-close) and that it does not
burden the programmer with so many tasks, especially the unique key handling.
Of course a unique key is needed here too, but this key is the
message itself (however long or short it is).  See section 
<A HREF=
"gettext.html#SEC45">Comparing the Two Interfaces
</A> for a
 
2979 more detailed comparison of the two methods.
 
2983 The following section contains a rather detailed description of the
 
2984 interface.  We make it that detailed because this is the interface
 
2985 we chose for the GNU 
<CODE>gettext
</CODE> Library.  Programmers interested
 
2986 in using this library will be interested in this description.
 
2992 <H3><A NAME=
"SEC41" HREF=
"gettext_toc.html#TOC41">The Interface
</A></H3> 
2995 The minimal functionality an interface must have is a) to select a
 
2996 domain the strings are coming from (a single domain for all programs is
 
2997 not reasonable because its construction and maintenance is difficult,
 
2998 perhaps impossible) and b) to access a string in a selected domain.
 
3002 This is principally the description of the 
<CODE>gettext
</CODE> interface.  It
 
has a global domain which unqualified usages reference.  Of course this
 
3004 domain is selectable by the user.
 
char *textdomain (const char *domain_name);
 
This provides the possibility to change or query the current status of
the current global domain of the <CODE>LC_MESSAGES</CODE> category.  The
argument is a null-terminated string, whose characters must be legal for
use in filenames.  If the <VAR>domain_name</VAR> argument is <CODE>NULL</CODE>,
the function returns the current value.  If no value has been set
 
3018 before, the name of the default domain is returned: 
<EM>messages
</EM>.
 
3019 Please note that although the return value of 
<CODE>textdomain
</CODE> is of
 
3020 type 
<CODE>char *
</CODE> no changing is allowed.  It is also important to know
 
3021 that no checks of the availability are made.  If the name is not
 
3022 available you will see this by the fact that no translations are provided.
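The query and set behaviour described above can be sketched as follows.
This is only an illustration: the helper name select_domain and the
domain name hello are assumptions made for the example, while
textdomain itself is the function declared in libintl.h.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: make DOMAIN_NAME the global domain and report
   the change.  Returns 0 on success, -1 if textdomain() failed.  */
int
select_domain (const char *domain_name)
{
  /* The string returned by textdomain() belongs to the library, so we
     copy it instead of keeping or modifying the pointer.  */
  char old_name[256];
  const char *current = textdomain (NULL);      /* query only */

  snprintf (old_name, sizeof old_name, "%s", current ? current : "");
  if (textdomain (domain_name) == NULL)
    return -1;
  printf ("domain changed from %s to %s\n", old_name, textdomain (NULL));
  return 0;
}
```

A program would typically call such a helper once at startup, for
instance select_domain ("hello").  No check is made that a catalog for
that domain exists.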
 
To use a domain set by <CODE>textdomain</CODE> the function

char *gettext (const char *msgid);

is to be used.  This is the simplest reasonable form one can imagine.
The translation of the string <VAR>msgid</VAR> is returned if it is available
in the current domain.  If it is not available, the argument itself is
returned.  If the argument is <CODE>NULL</CODE> the result is undefined.
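The fallback to the argument itself can be made visible with a small
sketch.  The helper names greeting and is_untranslated are invented for
this example, and the sketch assumes a program that has no catalog
installed for its domain.

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical helper: the greeting as seen by the user.  Without a
   matching catalog, gettext() returns its argument unchanged.  */
const char *
greeting (void)
{
  return gettext ("Hello world");
}

/* Returns nonzero when no translation different from MSGID was found.
   MSGID must not be NULL, since the result would be undefined.  */
int
is_untranslated (const char *msgid)
{
  return strcmp (gettext (msgid), msgid) == 0;
}
```

Because the untranslated text is returned, a program marked this way
keeps working in English when no translation has been installed.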
 
One thing which should come to mind is that no explicit dependency on
the domain used is given.  The current value of the domain for the
<CODE>LC_MESSAGES</CODE> locale is used.  If this changes between two
executions of the same <CODE>gettext</CODE> call in the program, the two calls
reference different message catalogs.

In the easiest case, which is the normal one in internationalized GNU
packages, a call to <CODE>textdomain</CODE> is issued once at the beginning
of execution, setting the domain to a unique name, normally the package
name.  In the code that follows, all strings which have to be translated
are filtered through the gettext function.  That's all, the package speaks
 
<H3><A NAME="SEC42" HREF="gettext_toc.html#TOC42">Solving Ambiguities</A></H3>

While this single domain works well for most applications, there
might be the need to get translations from more than one domain.  Of
course one could switch between different domains with calls to
<CODE>textdomain</CODE>, but this is neither convenient nor fast.  A
possible situation is one that was under discussion while this was being
written: all error messages of functions in the set of commonly used
functions should go into a separate domain <CODE>error</CODE>.  This way we
would only need to translate them once.
 
For these reasons there are two more functions to retrieve strings:

char *dgettext (const char *domain_name, const char *msgid);
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);

Both take an additional argument in the first position, which corresponds
to the argument of <CODE>textdomain</CODE>.  The third argument of
<CODE>dcgettext</CODE> allows the use of a locale category other than <CODE>LC_MESSAGES</CODE>.
But I really don't know where this can be useful.  If the
<VAR>domain_name</VAR> is <CODE>NULL</CODE> or <VAR>category</VAR> has a value other than
the known ones, the result is undefined.  It should also be noted that
this function is not part of the second known implementation of this
function family, the one found in Solaris.
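A minimal sketch of the shared error domain mentioned above might look
like this.  The macro ERROR_DOMAIN and the two helper functions are
hypothetical; only dgettext and dcgettext come from libintl.h.

```c
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical name of a domain shared by several packages.  */
#define ERROR_DOMAIN "error"

/* Print MSGID translated through the shared domain.  The global domain
   selected with textdomain() is neither consulted nor changed.  */
int
report_error (FILE *stream, const char *msgid)
{
  return fprintf (stream, "%s\n", dgettext (ERROR_DOMAIN, msgid));
}

/* The same lookup with the locale category spelled out.  */
const char *
error_text (const char *msgid)
{
  return dcgettext (ERROR_DOMAIN, msgid, LC_MESSAGES);
}
```

Since the domain is named in each call, no switching with textdomain is
needed around the error messages.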
 
A second ambiguity can arise from the fact that more than one
domain might have the same name.  This can be solved by specifying where
the needed message catalog files can be found.

char *bindtextdomain (const char *domain_name,
                      const char *dir_name);

Calling this function binds the given domain to a file in the specified
directory (how this file is determined follows below).  In particular, a
file in the system's default place is no longer favored over the
specified file (as it would be by solely using <CODE>textdomain</CODE>).  A <CODE>NULL</CODE>
pointer for the <VAR>dir_name</VAR> parameter returns the binding associated
with <VAR>domain_name</VAR>.  If <VAR>domain_name</VAR> itself is <CODE>NULL</CODE>
nothing happens and a <CODE>NULL</CODE> pointer is returned.  Here again, as
for all the other functions, none of the return values must be modified.
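As a hedged illustration, binding a domain to a private directory and
querying the binding afterwards could be written as below.  The helper
name bind_and_report is an assumption of this example, and the directory
passed to it need not exist for the call to succeed.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: bind DOMAIN_NAME to DIR_NAME and print the
   binding now in effect.  Returns 0 on success, -1 on failure.  */
int
bind_and_report (const char *domain_name, const char *dir_name)
{
  if (bindtextdomain (domain_name, dir_name) == NULL)
    return -1;
  /* A NULL directory only queries the current binding.  */
  printf ("%s is bound to %s\n", domain_name,
          bindtextdomain (domain_name, NULL));
  return 0;
}
```

A package installed under its own prefix might call, for example,
bind_and_report ("hello", "/tmp/locale") before its first gettext call.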
 
<H3><A NAME="SEC43" HREF="gettext_toc.html#TOC43">Locating Message Catalog Files</A></H3>

Because many different languages for many different packages have to be
stored, we need some way to add this information to message catalog
files.  The way usually used in Unix environments is to encode it
in the file name.  This is also done here.  The directory name given in
<CODE>bindtextdomain</CODE>'s second argument (or the default directory),
followed by the value and name of the locale and the domain name, are
concatenated:

<VAR>dir_name</VAR>/<VAR>locale</VAR>/LC_<VAR>category</VAR>/<VAR>domain_name</VAR>.mo
 
The default value for <VAR>dir_name</VAR> is system specific.  For the GNU

/usr/local/share/locale

<VAR>locale</VAR> is the value of the locale whose name is this
<CODE>LC_<VAR>category</VAR></CODE>.  For <CODE>gettext</CODE> and <CODE>dgettext</CODE> this
locale is always <CODE>LC_MESSAGES</CODE>.  <CODE>dcgettext</CODE> specifies the
locale by the third argument.<A NAME="DOCF2" HREF="gettext_foot.html#FOOT2">(2)</A> <A NAME="DOCF3" HREF="gettext_foot.html#FOOT3">(3)</A>
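The file name pattern above can be spelled out in a few lines of C.
This is a sketch of the naming scheme only, not the code the library
uses; the function name catalog_path and its parameters are invented
for the example, and the category is passed with its full name (for
instance LC_MESSAGES).

```c
#include <stdio.h>

/* Build DIR_NAME/LOCALE/CATEGORY/DOMAIN_NAME.mo in BUF.
   Returns 0 on success, -1 if the result does not fit into SIZE bytes.  */
int
catalog_path (char *buf, size_t size, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  int n = snprintf (buf, size, "%s/%s/%s/%s.mo",
                    dir_name, locale, category, domain_name);

  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

With the directory /usr/local/share/locale, the locale de, the category
LC_MESSAGES and the domain hello, the buffer receives
/usr/local/share/locale/de/LC_MESSAGES/hello.mo.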
<H3><A NAME="SEC44" HREF="gettext_toc.html#TOC44">Optimization of the *gettext functions</A></H3>

At this point of the discussion we should talk about an advantage of the
GNU <CODE>gettext</CODE> implementation.  Some readers might have pointed out
that an internationalized program might perform poorly if some
string has to be translated in an inner loop.  While this is unavoidable
when the string varies from one run of the loop to the other, it is
simply a waste of time when the string is always the same.  Take the
following example:

      puts (gettext ("Hello world"));

When the locale selection does not change between two runs the resulting
string is always the same.  One way to use this is:

  str = gettext ("Hello world");

But this solution is not usable in all situations (e.g. when the locale
selection changes), nor is it very readable.
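Written out in full, the manual optimization could look like the
following sketch.  The function name greet_repeatedly and its
parameters are assumptions of this example; the point is only that
gettext is called once, outside the loop.

```c
#include <libintl.h>
#include <stdio.h>

/* Print the greeting COUNT times with a single translation lookup.
   This is only valid while the locale selection stays the same for
   the duration of the loop.  Returns the number of lines written.  */
int
greet_repeatedly (FILE *stream, int count)
{
  const char *str = gettext ("Hello world");    /* one lookup */
  int printed = 0;
  int i;

  for (i = 0; i < count; i++)
    if (fputs (str, stream) != EOF && fputc ('\n', stream) != EOF)
      printed++;
  return printed;
}
```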
 
The GNU C compiler, version 2.7 and above, provides another solution for
this.  To describe this we show here some lines of the
<TT>`intl/libgettext.h'</TT> file.  For an explanation of the statement
expression used, see section `Statements and Declarations in Expressions' in <CITE>The GNU CC Manual</CITE>.
 
#  if defined __GNUC__ && __GNUC__ == 2 && __GNUC_MINOR__ >= 7
#   define      dcgettext(domainname, msgid, category)            \
  (__extension__                                                  \
    ({                                                            \
      char *result;                                               \
      if (__builtin_constant_p (msgid))                           \
        {                                                         \
          extern int _nl_msg_cat_cntr;                            \
          static char *__translation__;                           \
          static int __catalog_counter__;                         \
          if (! __translation__                                   \
              || __catalog_counter__ != _nl_msg_cat_cntr)         \
            {                                                     \
              __translation__ =                                   \
                dcgettext__ ((domainname), (msgid), (category));  \
              __catalog_counter__ = _nl_msg_cat_cntr;             \
            }                                                     \
          result = __translation__;                               \
        }                                                         \
      else                                                        \
        result = dcgettext__ ((domainname), (msgid), (category)); \
      result;                                                     \
    }))
#  endif
 
The interesting thing here is the <CODE>__builtin_constant_p</CODE> predicate.
This is evaluated at compile time and so optimization can take place
immediately.  Here two cases are distinguished.  If the argument to
<CODE>gettext</CODE> is not a constant value, the function
<CODE>dcgettext__</CODE> is simply called, which is the real implementation of the
<CODE>dcgettext</CODE> function.

If the string argument <EM>is</EM> constant we can reuse the translation
obtained earlier as long as the locale selection has not changed.  This
is exactly what is done here.  The <CODE>_nl_msg_cat_cntr</CODE> variable is defined in
the file <TT>`loadmsgcat.c'</TT>, which is available in <TT>`libintl.a'</TT>, and is
changed whenever a new message catalog is loaded.
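The caching idea can also be shown without any compiler extension.
The following sketch is not the library code: catalog_generation stands
in for _nl_msg_cat_cntr, slow_lookup stands in for dcgettext__, and
both names, like lookup_calls and cached_hello, are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the library internals.  */
int catalog_generation = 0;     /* bumped when a catalog is loaded */
int lookup_calls = 0;           /* counts the expensive lookups */

const char *
slow_lookup (const char *msgid)
{
  lookup_calls++;
  return msgid;                 /* no catalog here: return the msgid */
}

/* Cached lookup for one constant string.  The real lookup is repeated
   only when nothing is cached yet or the generation counter moved.  */
const char *
cached_hello (void)
{
  static const char *translation;
  static int seen_generation;

  if (translation == NULL || seen_generation != catalog_generation)
    {
      translation = slow_lookup ("Hello world");
      seen_generation = catalog_generation;
    }
  return translation;
}
```

Calling cached_hello in a loop performs one lookup; incrementing
catalog_generation forces the next call to look the string up again.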
 
<H2><A NAME="SEC45" HREF="gettext_toc.html#TOC45">Comparing the Two Interfaces</A></H2>

The following discussion is perhaps a little bit colored.  As said
above we implemented GNU <CODE>gettext</CODE> following the Uniforum
proposal and this surely has its reasons.  But it should show how we
came to this decision.

First we take a look at the development process.  When we write an
application using NLS provided by <CODE>gettext</CODE> we proceed as always.
Only when we come to a string which might be seen by the users and thus
has to be translated do we use <CODE>gettext("...")</CODE> instead of
<CODE>"..."</CODE>.  At the beginning of each source file (or in a central
header file) we define

#define gettext(String) (String)

Even this definition can be avoided when the system supports the
<CODE>gettext</CODE> function in its C library.  When we compile this code the
result is the same as if no NLS code were used.  When you take a look at
the GNU <CODE>gettext</CODE> code you will see that we use <CODE>_("...")</CODE>
instead of <CODE>gettext("...")</CODE>.  This reduces the number of
additional characters per translatable string to <EM>3</EM> (in words:
three).
 
When a production version of the program is needed we simply replace
the definition

#define _(String) (String)

by

#include <libintl.h>
#define _(String) gettext (String)

and include the header <TT>`libintl.h'</TT>.  Additionally we run the
program <TT>`xgettext'</TT> on all source code files which contain
translatable strings and we are done.  We have a running program which
does not depend on translations being available, but which can use any
that becomes available.
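Both variants of the definition can live in one source file when they
are selected by a configuration macro.  In the sketch below the macro
name ENABLE_NLS and the helper functions are assumptions of this
example, not something the text above prescribes.

```c
#include <stdio.h>

/* ENABLE_NLS is an assumed configuration macro.  Compile with
   -DENABLE_NLS to route strings through gettext(); without it the
   marked strings are used as they are.  */
#ifdef ENABLE_NLS
# include <libintl.h>
# define _(String) gettext (String)
#else
# define _(String) (String)
#endif

/* Hypothetical helpers using the short form.  */
const char *
hello_text (void)
{
  return _("Hello world");
}

int
say_hello (FILE *stream)
{
  return fputs (_("Hello world\n"), stream);
}
```

The source code itself stays the same in both configurations; only the
meaning of the underscore macro changes.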
 
The same procedure can be done for the <CODE>gettext_noop</CODE> invocations
(see section <A HREF="gettext.html#SEC17">Special Cases of Translatable Strings</A>).  First you can define <CODE>gettext_noop</CODE> to a
no-op macro and later use the definition from <TT>`libintl.h'</TT>.  Because
this name is not used in Sun's implementation of <TT>`libintl.h'</TT>,
you should consider the following code for your project:

#ifdef gettext_noop
# define N_(Str) gettext_noop (Str)
#else
# define N_(Str) (Str)
#endif

<CODE>N_</CODE> is a short form similar to <CODE>_</CODE>.  The <TT>`Makefile'</TT> in
the <TT>`po/'</TT> directory of GNU gettext knows by default both of the
mentioned short forms so you are invited to follow this proposal for

Now to <CODE>catgets</CODE>.  The main problem is the work for the
programmer.  Every time he comes to a translatable string he has to
define a number (or a symbolic constant) which also has to be defined in
the message catalog file.  He also has to take care of duplicate
entries, duplicate message IDs etc.  If he wants to have the same
quality in the message catalog as the GNU <CODE>gettext</CODE> program
provides he also has to put the descriptive comments for the strings and
their locations in all source code files into the message catalog.  This
is nearly a Mission: Impossible.
 
But there are also some points people might call advantages speaking for
<CODE>catgets</CODE>.  If you have a single word in a string and this string
is used in different contexts it is likely that in one or the other
language the word has different translations.  Example:

printf ("%s: %d", gettext ("number"), number_of_errors)

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"))

Here we have to translate the string <CODE>"number"</CODE> twice.  Even
if you do not speak a language besides English it might be possible to
recognize that the two words have a different meaning.  In German the
first appearance has to be translated to <CODE>"Anzahl"</CODE> and the second
to <CODE>"Zahl"</CODE>.
 
Now you can say that this example is really esoteric.  And you are
right!  This is exactly how we felt about this problem and decided that
it does not weigh that much.  The solution for the above problem could
be:

printf (gettext ("number: %d"), number_of_errors)

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count)

We believe that we can solve all conflicts with this method.  If it is
difficult one can also consider changing one of the conflicting strings a
little bit.  But it is not impossible to overcome.
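Wrapped into a function, the whole-sentence approach could look like
this sketch.  The function name format_count and its parameters are
assumptions made for the example; it handles only the two forms used in
the example above.

```c
#include <libintl.h>
#include <stdio.h>

/* Write "you should see N number(s)" into BUF.  Each case uses a
   complete sentence as its message, so a translator sees the word
   "number" in context.  Returns the result of snprintf().  */
int
format_count (char *buf, size_t size, int number_count)
{
  const char *format = number_count == 1
    ? gettext ("you should see %d number")
    : gettext ("you should see %d numbers");

  return snprintf (buf, size, format, number_count);
}
```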
 
Translator note: It is perhaps appropriate here to tell those English
speaking programmers that the plural form of a noun cannot, in general,
be formed by appending a single `s'.  Most other languages use different
methods.  So you should at least use the method given in the above example.

But I have been told that some languages have even more complex rules.
A good approach might be to consider methods like the one used for
<CODE>LC_TIME</CODE> in the POSIX.2 standard.
 
<H2><A NAME="SEC46" HREF="gettext_toc.html#TOC46">Using libintl.a in own programs</A></H2>

Starting with version 0.9.4 the library <CODE>libintl.a</CODE> should be more
or less self-contained, i.e. you can use it in your own programs.  The
<TT>`Makefile'</TT> will put the header and the library in directories
selected using the <CODE>$(prefix)</CODE>.

One exception to the above is found on HP-UX systems.  Here the C library
does not contain the <CODE>alloca</CODE> function (and the HP compiler does
not generate it inline).  But it is not intended to rewrite the whole
library just because of this dumb system.  Instead include the
<CODE>alloca</CODE> function in all packages in which you use <CODE>libintl.a</CODE>.
 
<H2><A NAME="SEC47" HREF="gettext_toc.html#TOC47">Being a <CODE>gettext</CODE> grok</A></H2>

To fully exploit the functionality of the GNU <CODE>gettext</CODE> library it
is surely helpful to read the source code.  But for those who don't want
to spend that much time in reading the (sometimes complicated) code here
 
<LI>Changing the language at runtime

For interactive programs it might be useful to offer a selection of the
language used at runtime.  To understand how to do this one needs to know
how the language used is determined while executing the <CODE>gettext</CODE>
function.  The method which is presented here only works correctly
with the GNU implementation of the <CODE>gettext</CODE> functions.  It is not
possible with underlying <CODE>catgets</CODE> functions or <CODE>gettext</CODE>
functions from the system's C library.  The exception is of course the
GNU C Library which uses the GNU gettext Library for message handling.

In the function <CODE>dcgettext</CODE> at every call the current setting of
the highest priority environment variable is determined and used.
Highest priority means here the following list with decreasing
priority:

<LI><CODE>LANGUAGE</CODE>
<LI><CODE>LC_ALL</CODE>
<LI><CODE>LC_xxx</CODE>, according to selected locale
<LI><CODE>LANG</CODE>

Afterwards the path is constructed using the found value and the
translation file is loaded if available.
 
What happens now when the value for, say, <CODE>LANGUAGE</CODE> changes?  According
to the process explained above the new value of this variable is found
as soon as the <CODE>dcgettext</CODE> function is called.  But this also means
the (perhaps) different message catalog file is loaded.  In other
words: the language used is changed.

But there is one little hitch.  The code for gcc-2.7.0 and up provides
some optimization.  This optimization normally prevents the calling of
the <CODE>dcgettext</CODE> function as long as no new catalog is loaded.  But
if <CODE>dcgettext</CODE> is not called the program also cannot notice that the
<CODE>LANGUAGE</CODE> variable has changed (see section <A HREF="gettext.html#SEC44">Optimization of the *gettext functions</A>).  But the
solution is very easy.  Include the following code in the language

  /* Change language.  */
  setenv ("LANGUAGE", "fr", 1);

  /* Make change known.  */
  {
    extern int  _nl_msg_cat_cntr;
    ++_nl_msg_cat_cntr;
  }

The variable <CODE>_nl_msg_cat_cntr</CODE> is defined in <TT>`loadmsgcat.c'</TT>.
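The priority list and the language switch can be combined into two
small functions.  This is a sketch under the assumption that the
program is linked against the GNU implementation, which provides
_nl_msg_cat_cntr; the function names highest_priority_setting and
switch_language are invented for the example, and the first function
only mirrors the order listed above rather than the library's own code.

```c
#define _POSIX_C_SOURCE 200112L         /* for setenv() */
#include <stdlib.h>

/* Provided by the GNU implementation (see loadmsgcat.c).  */
extern int _nl_msg_cat_cntr;

/* Return the first non-empty setting among LANGUAGE, LC_ALL, the
   variable named CATEGORY_VARIABLE (e.g. "LC_MESSAGES") and LANG,
   or NULL if none of them is set.  */
const char *
highest_priority_setting (const char *category_variable)
{
  const char *names[4];
  int i;

  names[0] = "LANGUAGE";
  names[1] = "LC_ALL";
  names[2] = category_variable;
  names[3] = "LANG";
  for (i = 0; i < 4; i++)
    {
      const char *value = getenv (names[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    }
  return NULL;
}

/* Select LANGUAGE for later lookups.  Returns 0 on success, -1 if the
   environment could not be changed.  */
int
switch_language (const char *language)
{
  if (setenv ("LANGUAGE", language, 1) != 0)
    return -1;
  ++_nl_msg_cat_cntr;           /* make cached translations stale */
  return 0;
}
```

An interactive program might call switch_language ("fr") from its
language menu; strings fetched afterwards are looked up again.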
 
3504 <H2><A NAME=
"SEC48" HREF=
"gettext_toc.html#TOC48">Temporary Notes for the Programmers Chapter
</A></H2> 
3508 <H3><A NAME=
"SEC49" HREF=
"gettext_toc.html#TOC49">Temporary - Two Possible Implementations
</A></H3> 
There are two competing methods for language independent messages:
the X/Open <CODE>catgets</CODE> method, and the Uniforum <CODE>gettext</CODE>
method.  The <CODE>catgets</CODE> method indexes messages by integers; the
<CODE>gettext</CODE> method indexes them by their original English text.
The <CODE>catgets</CODE> method has been around longer and is supported
by more vendors.  The <CODE>gettext</CODE> method is supported by Sun,
and it has been heard that the COSE multi-vendor initiative is
supporting it.  Neither method is a POSIX standard; the POSIX.1
committee had a lot of disagreement in this area.
 
3523 Neither one is in the POSIX standard.  There was much disagreement
 
3524 in the POSIX
.1 committee about using the 
<CODE>gettext
</CODE> routines
 
3525 vs. 
<CODE>catgets
</CODE> (XPG).  In the end the committee couldn't
 
3526 agree on anything, so no messaging system was included as part
 
3527 of the standard.  I believe the informative annex of the standard
 
3528 includes the XPG3 messaging interfaces, "...as an example of
 
3529 a messaging system that has been implemented..."
 
3533 They were very careful not to say anywhere that you should use one
 
3534 set of interfaces over the other.  For more on this topic please
 
3535 see the Programming for Internationalization FAQ.
 
3540 <H3><A NAME=
"SEC50" HREF=
"gettext_toc.html#TOC50">Temporary - About 
<CODE>catgets
</CODE></A></H3> 
3543 There have been a few discussions of late on the use of
 
3544 <CODE>catgets
</CODE> as a base.  I think it important to present both
 
3545 sides of the argument and hence am opting to play devil's advocate
 
3550 I'll not deny the fact that 
<CODE>catgets
</CODE> could have been designed
 
3551 a lot better.  It currently has quite a number of limitations and
 
3552 these have already been pointed out.
 
3556 However there is a great deal to be said for consistency and
 
3557 standardization.  A common recurring problem when writing Unix
 
3558 software is the myriad portability problems across Unix platforms.
 
3559 It seems as if every Unix vendor had a look at the operating system
 
3560 and found parts they could improve upon.  Undoubtedly, these
 
3561 modifications are probably innovative and solve real problems.
 
3562 However, software developers have a hard time keeping up with all
 
3563 these changes across so many platforms.
 
3567 And this has prompted the Unix vendors to begin to standardize their
 
3568 systems.  Hence the impetus for Spec1170.  Every major Unix vendor
 
3569 has committed to supporting this standard and every Unix software
 
3570 developer waits with glee the day they can write software to this
 
3571 standard and simply recompile (without having to use autoconf)
 
3572 across different platforms.
 
3576 As I understand it, Spec1170 is roughly based upon version 
4 of the
 
3577 X/Open Portability Guidelines (XPG4).  Because 
<CODE>catgets
</CODE> and
 
3578 friends are defined in XPG4, I'm led to believe that 
<CODE>catgets
</CODE> 
3579 is a part of Spec1170 and hence will become a standardized component
 
3580 of all Unix systems.
 
3585 <H3><A NAME=
"SEC51" HREF=
"gettext_toc.html#TOC51">Temporary - Why a single implementation
</A></H3> 
3588 Now it seems kind of wasteful to me to have two different systems
 
3589 installed for accessing message catalogs.  If we do want to remedy
 
3590 <CODE>catgets
</CODE> deficiencies why don't we try to expand 
<CODE>catgets
</CODE> 
3591 (in a compatible manner) rather than implement an entirely new system.
 
3592 Otherwise, we'll end up with two message catalog access systems
 
3593 installed with an operating system - one set of routines for GNU
 
3594 software, and another set of routines (catgets) for all other software.
 
3599 Supposing another catalog access system is implemented.  Which do
 
3600 we recommend?  At least for Linux, we need to attract as many
 
3601 software developers as possible.  Hence we need to make it as easy
 
3602 for them to port their software as possible.  Which means supporting
 
3603 <CODE>catgets
</CODE>.  We will be implementing the 
<CODE>glocale
</CODE> code
 
3604 within our 
<CODE>libc
</CODE>, but does this mean we also have to incorporate
 
3605 another message catalog access scheme within our 
<CODE>libc
</CODE> as well?
 
3606 And what about people who are going to be using the 
<CODE>glocale
</CODE> 
3607 + non-
<CODE>catgets
</CODE> routines.  When they port their software to
 
3608 other platforms, they're now going to have to include the front-end
 
3609 (
<CODE>glocale
</CODE>) code plus the back-end code (the non-
<CODE>catgets
</CODE> 
3610 access routines) with their software instead of just including the
 
3611 <CODE>glocale
</CODE> code with their software.
 
3615 Message catalog support is however only the tip of the iceberg.
 
3616 What about the data for the other locale categories.  They also have
 
3617 a number of deficiencies.  Are we going to abandon them as well and
 
3618 develop another duplicate set of routines (should 
<CODE>glocale
</CODE> 
3619 expand beyond message catalog support)?
 
3623 Like many parts of Unix that can be improved upon, we're stuck with balancing
 
3624 compatibility with the past with useful improvements and innovations for
 
3630 <H3><A NAME=
"SEC52" HREF=
"gettext_toc.html#TOC52">Temporary - Double layer solution
</A></H3> 
3633 GNU locale implements a 
<CODE>gettext
</CODE>-style interface on top of a
 
3634 <CODE>catgets
</CODE>-style interface.
 
3638 This is not needless complexity.  It is absolutely vital, because
 
3639 it enables 
<CODE>gettext
</CODE> to run on top of 
<CODE>catgets
</CODE>, which
 
3640 enables Linux International to recommend users use it 
<EM>today
</EM>.
 
3644 Rewriting 
<CODE>gettext
</CODE> so that it could use 
<EM>either
</EM> 
3645 <CODE>catgets
</CODE> <EM>or
</EM> some simpler mechanism would not break
 
3646 anything, but would not reduce complexity either.  It might be
 
3647 worth doing, but it isn't urgent.
 
3651 In general, simplicity is not enough of a reason to rewrite a
 
3652 program that works.  Simplicity is just one desirable thing.
 
3653 It is not overridingly important.
 
<H3><A NAME="SEC53" HREF="gettext_toc.html#TOC53">Temporary - Notes</A></H3>

X/Open agreed very late on the standard form so that many
implementations differ from the final form.  Both of my systems (old
Linux catgets and Ultrix-4) have a strange variation.

OK.  After incorporating the last changes I have to spend some time on
making the GNU/Linux libc gettext functions.  So in future Solaris will
not be the only system having gettext.
 
3674 <H1><A NAME=
"SEC54" HREF=
"gettext_toc.html#TOC54">The Translator's View
</A></H1> 
3678 <H2><A NAME=
"SEC55" HREF=
"gettext_toc.html#TOC55">Introduction 
0</A></H2> 
3681 GNU is going international!  The GNU Translation Project is a way
 
3682 to get maintainers, translators and users all together, so GNU will
 
3683 gradually become able to speak many native languages.
 
3687 The GNU 
<CODE>gettext
</CODE> tool set contains 
<EM>everything
</EM> maintainers
 
3688 need for internationalizing their packages for messages.  It also
 
3689 contains quite useful tools for helping translators at localizing
 
3690 messages to their native language, once a package has already been
 
3695 To achieve the GNU Translation Project, we need many interested
 
3696 people who like their own language and write it well, and who are also
 
3697 able to synergize with other translators speaking the same language.
 
3698 If you'd like to volunteer to 
<EM>work
</EM> at translating messages,
 
3699 please send mail to your translating team.
 
3703 Each team has its own mailing list, courtesy of Linux
 
3704 International.  You may reach your translating team at the address
 
3705 <TT>`
<VAR>ll
</VAR>@li.org'
</TT>, replacing 
<VAR>ll
</VAR> by the two-letter ISO 
639 
3706 code for your language.  Language codes are 
<EM>not
</EM> the same as
 
3707 country codes given in ISO 
3166.  The following translating teams
 
3714 Chinese 
<CODE>zh
</CODE>, Czech 
<CODE>cs
</CODE>, Danish 
<CODE>da
</CODE>, Dutch 
<CODE>nl
</CODE>,
 
3715 Esperanto 
<CODE>eo
</CODE>, Finnish 
<CODE>fi
</CODE>, French 
<CODE>fr
</CODE>, Irish
 
3716 <CODE>ga
</CODE>, German 
<CODE>de
</CODE>, Greek 
<CODE>el
</CODE>, Italian 
<CODE>it
</CODE>,
 
3717 Japanese 
<CODE>ja
</CODE>, Indonesian 
<CODE>in
</CODE>, Norwegian 
<CODE>no
</CODE>, Polish
 
3718 <CODE>pl
</CODE>, Portuguese 
<CODE>pt
</CODE>, Russian 
<CODE>ru
</CODE>, Spanish 
<CODE>es
</CODE>,
 
3719 Swedish 
<CODE>sv
</CODE> and Turkish 
<CODE>tr
</CODE>.
 
3723 For example, you may reach the Chinese translating team by writing to
 
3724 <TT>`zh@li.org'
</TT>.  When you become a member of the translating team
 
3725 for your own language, you may subscribe to its list.  For example,
 
3726 Swedish people can send a message to 
<TT>`sv-request@li.org'
</TT>,
 
3727 having this message body:
 
3736 Keep in mind that team members should be interested in 
<EM>working
</EM> 
3737 at translations, or at solving translational difficulties, rather than
 
3738 merely lurking around.  If your team does not exist yet and you want to
 
3739 start one, please write to 
<TT>`gnu-translation@prep.ai.mit.edu'
</TT>;
 
3740 you will then reach the GNU coordinator for all translator teams.
 
3744 A handful of GNU packages have already been adapted and provided
 
3745 with message translations for several languages.  Translation
 
3746 teams have begun to organize, using these packages as a starting
 
3747 point.  But there are many more packages and many languages for
 
3748 which we have no volunteer translators.  If you would like to
 
3749 volunteer to work at translating messages, please send mail to
 
3750 <TT>`gnu-translation@prep.ai.mit.edu'
</TT> indicating what language(s)
 
3756 <H2><A NAME=
"SEC56" HREF=
"gettext_toc.html#TOC56">Introduction 
1</A></H2> 
3759 This is now official, GNU is going international!  Here is the
 
3760 announcement submitted for the January 
1995 GNU Bulletin:
 
3766 A handful of GNU packages have already been adapted and provided
 
3767 with message translations for several languages.  Translation
 
3768 teams have begun to organize, using these packages as a starting
 
3769 point.  But there are many more packages and many languages
 
3770 for which we have no volunteer translators.  If you'd like to
 
3771 volunteer to work at translating messages, please send mail to
 
3772 <SAMP>`gnu-translation@prep.ai.mit.edu'
</SAMP> indicating what language(s)
 
3777 This document should answer many questions for those who are curious
 
3778 about the process or would like to contribute.  Please at least skim
 
3779 over it, hoping to cut down a little of the high volume of email
 
3780 generated by this collective effort towards GNU internationalization.
 
GNU programming is done in English, and currently, English is used
as the main communicating language between national communities
collaborating on the GNU project.  This very document is written
in English.  This will not change in the foreseeable future.

However, there is a strong appetite from national communities for
having more software able to write using national language and habits,
and there is an on-going effort to modify GNU software in such a way
that it becomes able to do so.  The experiments conducted so far raised
an enthusiastic response from pretesters, so we believe that GNU
internationalization is destined to succeed.

To suggest clarifications, additions or corrections to this
document, please email <TT>`gnu-translation@prep.ai.mit.edu'</TT>.
 
3806 <H2><A NAME=
"SEC57" HREF=
"gettext_toc.html#TOC57">Discussions
</A></H2> 
3809 Facing this internationalization effort, a few users expressed their
 
3810 concerns.  Some of these doubts are presented and discussed, here.
 
Some languages are not spoken by a very large number of people,
so people speaking them sometimes consider that there may not be
all that much demand for such versions of GNU packages.  Moreover, many
people being <EM>into computers</EM>, in some countries, generally seem
to prefer English versions of their software.

On the other hand, people might enjoy their own language a lot, and
be very motivated to give themselves the pleasure of having
their beloved GNU software speaking their mother tongue.  They do
themselves a personal favor, and do not pay that much attention to
the number of people benefiting from their work.
 
3829 <LI>Misinterpretation
 
3831 Other users are shy to push forward their own language, seeing in this
 
3832 some kind of misplaced propaganda.  Someone thought there must be some
 
3833 users of the language over the networks pestering other people with it.
 
3835 But any spoken language is worth localization, because there are
 
3836 people behind the language for whom the language is important and
 
3837 dear to their hearts.
 
3839 <LI>Odd translations
 
3841 The biggest problem is to find the right translations so that
 
3842 everybody can understand the messages.  Translations are usually a
 
3843 little odd.  Some people get used to English, to the extent they may
 
3844 find translations into their own language "rather pushy, obnoxious
 
3845 and sometimes even hilarious."  As a French speaking man, I have
 
3846 the experience of those instruction manuals for goods, so poorly
 
3847 translated in French in Korea or Taiwan...
 
3849 The fact is that we sometimes have to create a kind of national
 
3850 computer culture, and this is not easy without the collaboration of
 
3851 many people liking their mother tongue.  This is why translations are
 
3852 better achieved by people knowing and loving their own language, and
 
3853 ready to work together at improving the results they obtain.
 
3855 <LI>Dependencies over the GPL
 
3857 Some people wonder if using GNU 
<CODE>gettext
</CODE> necessarily brings their package
 
3858 under the protective wing of the GNU General Public License, when they
 
3859 do not want to make their program free, or want other kinds of freedom.
 
3860 The simplest answer is yes.
 
3862 The mere marking of localizable strings in a package, or conditional
 
3863 inclusion of a few lines for initialization, is not really including
 
3864 GPL'ed code.  However, the localization routines themselves are under
 
3865 the GPL and would bring the remainder of the package under the GPL
 
if they were distributed with it.  So, I presume that, for those
for whom this is a problem, it could be circumvented by leaving to
the end installers the burden of assembling a package prepared for
localization, but not providing the localization routines themselves.
 
3875 <H2><A NAME=
"SEC58" HREF=
"gettext_toc.html#TOC58">Organization
</A></H2> 
3878 On a larger scale, the true solution would be to organize some kind of
 
3879 fairly precise set up in which volunteers could participate.  I gave
 
3880 some thought to this idea lately, and realize there will be some
 
3881 touchy points.  I thought of writing to Richard Stallman to launch
 
such a project, but feel it might be good to shake out the ideas
between ourselves first.  Most probably, Linux International has
some experience in the field already, or would like to orchestrate
the volunteer work, maybe.  Food for thought, in any case!
 
I guess we have to set up something early, somehow, that will help
 
3890 many possible contributors of the same language to interlock and avoid
 
3891 work duplication, and further be put in contact for solving together
 
3892 problems particular to their tongue (in most languages, there are many
 
3893 difficulties peculiar to translating technical English).  My Swedish
 
3894 contributor acknowledged these difficulties, and I'm well aware of
 
3899 This is surely not a technical issue, but we should manage so the
 
3900 effort of locale contributors be maximally useful, despite the national
 
3901 team layer interface between contributors and maintainers.
 
3905 GNU needs some setup for coordinating language coordinators.
 
3906 Localizing evolving GNU programs will surely become a permanent
 
3907 and continuous activity in GNU, once started.  The setup should be
 
3908 minimally completed and tested before GNU 
<CODE>gettext
</CODE> becomes an official
 
3909 reality.  The email address 
<TT>`gnu-translation@prep.ai.mit.edu'
</TT> 
has been set up for receiving offers from volunteers and general
 
3911 email on these topics.  This address reaches the GNU Translation
 
3912 Project coordinator.
 
3918 <H3><A NAME=
"SEC59" HREF=
"gettext_toc.html#TOC59">Central Coordination
</A></H3> 
I also think GNU will need, sooner than it thinks, someone to set up
a way to organize and coordinate these groups.  Some kind of group
of groups.  My opinion is that it would be good for GNU to delegate
this task to a small group of collaborating volunteers, shortly.
 
3925 Perhaps in 
<TT>`gnu.announce'
</TT> a list of this national committee's
 
My role as coordinator would simply be to refer to Ulrich any
German-speaking volunteer interested in localization of GNU programs, and
maybe to help national groups organize initially, while maintaining
national registries until national groups are ready to take over.
In fact, the coordinator should help volunteers get in contact with
one another for creating national teams, which should then select
one coordinator per language, or country (regionalized language).
If well done, the coordination should be useful without being an
overwhelming task, for as long as it takes to put delegations in place.
 
3943 <H3><A NAME=
"SEC60" HREF=
"gettext_toc.html#TOC60">National Teams
</A></H3> 
3946 I suggest we look for volunteer coordinators/editors for individual
 
3947 languages.  These people will scan contributions of translation files
 
3948 for various programs, for their own languages, and will ensure high
 
3949 and uniform standards of diction.
 
From my current experience with other people these days, those who
 
3954 provide localizations are very enthusiastic about the process, and are
 
3955 more interested in the localization process than in the program they
 
3956 localize, and want to do many programs, not just one.  This seems
 
3957 to confirm that having a coordinator/editor for each language is a
 
3962 We need to choose someone who is good at writing clear and concise
 
3963 prose in the language in question.  That is hard--we can't check
 
3964 it ourselves.  So we need to ask a few people to judge each others'
 
3965 writing and select the one who is best.
 
I announced my prerelease to a few dozen people, and you would not
believe all the discussions it has generated already.  I shudder to think
what will happen when this is launched, for real, officially,
worldwide.  Who am I to arbitrate between two Czechoslovak users
contradicting each other, for example?
 
I assume that your German is not much better than my French so that
I would not be able to judge these formulations.  What I would
suggest is that for each language there is a group of people who
maintain the PO files and judge changes.  I suspect there will
 
3981 be cultural differences between how such groups of people will behave.
 
3982 Some will have relaxed ways, reach consensus easily, and have anyone
 
3983 of the group relate to the maintainers, while others will fight to
 
3984 death, organize heavy administrations up to national standards, and
 
3985 use strict channels.
 
The German team is setting a good example.  Right now, they are
 
3990 maybe half a dozen people revising translations of each other and
 
3991 discussing the linguistic issues.  I do not even have all the names.
 
3992 Ulrich Drepper is taking care of coordinating the German team.
 
3993 He subscribed to all my pretest lists, so I do not even have to warn
 
3994 him specifically of incoming releases.
 
I'm sure that it is a good idea to get teams for each language working
 
3999 on translations. That will make the translations better and more
 
4006 <H4><A NAME=
"SEC61" HREF=
"gettext_toc.html#TOC61">Sub-Cultures
</A></H4> 
4009 Taking French for example, there are a few sub-cultures around
 
4010 computers which developed diverging vocabularies.  Picking volunteers
 
4011 here and there without addressing this problem in an organized way,
 
4012 soon in the project, might produce a distasteful mix of GNU programs,
 
4013 and possibly trigger endless quarrels among those who really care.
 
4017 Keeping some kind of unity in the way French localization of GNU
 
4018 programs is achieved is a difficult (and delicate) job.  Knowing the
 
Latin character of French people (:-), if we take this the wrong
way, we could end up nowhere, or waste a lot of energy.  Maybe we
should begin to address this problem seriously <EM>before</EM> GNU
<CODE>gettext</CODE> becomes officially published.  And I suspect that this
 
4028 <H4><A NAME=
"SEC62" HREF=
"gettext_toc.html#TOC62">Organizational Ideas
</A></H4> 
4031 I expect the next big changes after the official release.  Please note
 
4032 that I use the German translation of the short GPL message.  We need
 
4033 to set a few good examples before the localization goes out for true
 
4034 in GNU.  Here are a few points to discuss:
 
4041 Each group should have one FTP server (at least one master).
 
4045 The files on the server should reflect the latest version (of
 
course!) and it should also contain an RCS directory with the
 
4047 corresponding archives (I don't have this now).
 
4051 There should also be a ChangeLog file (this is more useful than the
 
RCS archive but can be generated automatically from the latter by
 
A <STRONG>core group</STRONG> should judge questionable changes (for now
this group consists solely of me but I ask some others occasionally;
this also seems to work).
 
4065 <H3><A NAME=
"SEC63" HREF=
"gettext_toc.html#TOC63">Mailing Lists
</A></H3> 
4068 If we get any inquiries about GNU 
<CODE>gettext
</CODE>, send them on to:
 
4073 <TT>`gnu-translation@prep.ai.mit.edu'
</TT> 
4077 The 
<TT>`*-pretest'
</TT> lists are quite useful to me, maybe the idea could
 
4078 be generalized to all GNU packages.  But each maintainer his/her way!
 
4082 , we have a mechanism in place here at
 
4083 <TT>`gnu.ai.mit.edu'
</TT> to track teams, support mailing lists for
 
4084 them and log members.  We have a slight preference that you use it.
 
4085 If this is OK with you, I can get you clued in.
 
Things are changing!  A few years ago, when Daniel Fekete and I
asked for a mailing list for GNU localization, nested at the FSF, we
were politely invited to organize it anywhere else, and so we did.
For communicating with my pretesters, I later made a handful of
mailing lists located at iro.umontreal.ca and administered by
 
4094 <CODE>majordomo
</CODE>.  These lists have been 
<EM>very
</EM> dependable
 
4099 I suspect that the German team will organize itself a mailing list
 
4100 located in Germany, and so forth for other countries.  But before they
 
4101 organize for true, it could surely be useful to offer mailing lists
 
located at the FSF to each national team.  So yes, please explain to me
 
4103 how I should proceed to create and handle them.
 
4107 We should create temporary mailing lists, one per country, to help
 
people organize.  Temporary, because once regrouped and structured, it
would be fair for the volunteers from each country to bring <EM>their</EM> list
back home and manage it as they want.  My feeling is that, in the long
 
4111 run, each team should run its own list, from within their country.
 
4112 There also should be some central list to which all teams could
 
4113 subscribe as they see fit, as long as each team is represented in it.
 
4118 <H2><A NAME=
"SEC64" HREF=
"gettext_toc.html#TOC64">Information Flow
</A></H2> 
There will surely be some discussion about these messages after the
packages are finally released.  If people now send you some proposals
for better messages, how do you proceed?  Jim, please note that
right now, as I put forward nearly a dozen localizable programs, I
 
4125 receive both the translations and the coordination concerns about them.
 
4129 If I put one of my things to pretest, Ulrich receives the announcement
 
4130 and passes it on to the German team, who make last minute revisions.
 
4131 Then he submits the translation files to me 
<EM>as the maintainer
</EM>.
 
4132 For GNU packages I do not maintain, I would not even hear about it.
 
4133 This scheme could be made to work GNU-wide, I think.  For security
 
reasons, maybe Ulrich (national coordinators, in fact) should update
a central registry kept by GNU (Jim, me, or Len's recruits) once in
 
4140 In December/January, I was aggressively ready to internationalize
 
4141 all of GNU, giving myself the duty of one small GNU package per week
 
4142 or so, taking many weeks or months for bigger packages.  But it does
 
4143 not work this way.  I first did all the things I'm responsible for.
 
4144 I've nothing against some missionary work on other maintainers, but
 
I'm also losing a lot of energy over it--same debates over again.
 
4149 And when the first localized packages are released we'll get a lot of
 
4150 responses about ugly translations :-).  Surely, and we need to have
 
4151 beforehand a fairly good idea about how to handle the information
 
4152 flow between the national teams and the package maintainers.
 
4156 Please start saving somewhere a quick history of each PO file.  I know
 
4157 for sure that the file format will change, allowing for comments.
 
It would be nice if each file had a kind of log, and references for
 
4159 those who want to submit comments or gripes, or otherwise contribute.
 
4160 I sent a proposal for a fast and flexible format, but it is not
 
4161 receiving acceptance yet by the GNU deciders.  I'll tell you when I
 
4162 have more information about this.
 
4167 <H1><A NAME=
"SEC65" HREF=
"gettext_toc.html#TOC65">The Maintainer's View
</A></H1> 
4170 The maintainer of a package has many responsibilities.  One of them
 
4171 is ensuring that the package will install easily on many platforms,
 
4172 and that the magic we described earlier (see section 
<A HREF=
"gettext.html#SEC32">The User's View
</A>) will work
 
4173 for installers and end users.
 
4177 Of course, there are many possible ways by which GNU 
<CODE>gettext
</CODE> 
4178 might be integrated in a distribution, and this chapter does not cover
 
4179 them in all generality.  Instead, it details one possible approach
 
which is especially suitable for many GNU distributions, because
GNU <CODE>gettext</CODE> is meant to help the internationalization
 
4182 of the whole GNU project.  So, the maintainer's view presented here
 
4183 presumes that the package already has a 
<TT>`configure.in'
</TT> file and
 
4188 Nevertheless, GNU 
<CODE>gettext
</CODE> may surely be useful for non-GNU
 
4189 packages, but the maintainers of such packages might have to show
 
imagination and initiative in organizing their distributions so
<CODE>gettext</CODE> works for them in all situations.  There are surely
 
4196 Even if 
<CODE>gettext
</CODE> methods are now stabilizing, slight adjustments
 
4197 might be needed between successive 
<CODE>gettext
</CODE> versions, so you
 
4198 should ideally revise this chapter in subsequent releases, looking
 
4205 <H2><A NAME=
"SEC66" HREF=
"gettext_toc.html#TOC66">Flat or Non-Flat Directory Structures
</A></H2> 
Some GNU packages are distributed as <CODE>tar</CODE> files which unpack
in a single directory; these are said to be <STRONG>flat</STRONG> distributions.
 
4210 Other GNU packages have a one level hierarchy of subdirectories, using
 
4211 for example a subdirectory named 
<TT>`doc/'
</TT> for the Texinfo manual and
 
4212 man pages, another called 
<TT>`lib/'
</TT> for holding functions meant to
 
4213 replace or complement C libraries, and a subdirectory 
<TT>`src/'
</TT> for
 
4214 holding the proper sources for the package.  These other distributions
 
4215 are said to be 
<STRONG>non-flat
</STRONG>.
 
4219 For now, we cannot say much about flat distributions.  A flat
 
4220 directory structure has the disadvantage of increasing the difficulty
 
4221 of updating to a new version of GNU 
<CODE>gettext
</CODE>.  Also, if you have
 
4222 many PO files, this could somewhat pollute your single directory.
 
4223 In the GNU 
<CODE>gettext
</CODE> distribution, the 
<TT>`misc/'
</TT> directory
 
4224 contains a shell script named 
<TT>`combine-sh'
</TT>.  That script may
 
4225 be used for combining all the C files of the 
<TT>`intl/'
</TT> directory
 
4226 into a pair of C files (one 
<TT>`.c'
</TT> and one 
<TT>`.h'
</TT>).  Those two
 
4227 generated files would fit more easily in a flat directory structure,
 
4228 and you will then have to add these two files to your project.
 
4232 Maybe because GNU 
<CODE>gettext
</CODE> itself has a non-flat structure,
 
4233 we have more experience with this approach, and this is what will be
 
described in the remainder of this chapter.  Some maintainers might
use this as an opportunity to unflatten their package structure.
Only later, once we have gained more experience adapting GNU <CODE>gettext</CODE>
4237 to flat distributions, we might add some notes about how to proceed
 
4243 <H2><A NAME=
"SEC67" HREF=
"gettext_toc.html#TOC67">Prerequisite Works
</A></H2> 
Some work is required before using GNU <CODE>gettext</CODE>
in one of your packages.  These tasks have a kind of generality
that escapes the point by point descriptions used in the remainder
of this chapter.  So, we describe them here.
 
Before attempting to use <CODE>gettextize</CODE>, you should install some other packages first.
 
4257 Ensure that recent versions of GNU 
<CODE>m4
</CODE>, GNU Autoconf and GNU
 
4258 <CODE>gettext
</CODE> are already installed at your site, and if not, proceed
 
to do this first.  If you have to install these things, beware that
 
4260 GNU 
<CODE>m4
</CODE> must be fully installed before GNU Autoconf is even
 
4261 <EM>configured
</EM>.
 
Those three packages are only needed by you, as a maintainer; the
 
4264 installers of your own package and end users do not really need any
 
4265 of GNU 
<CODE>m4
</CODE>, GNU Autoconf or GNU 
<CODE>gettext
</CODE> for successfully
 
4266 installing and running your package, with messages properly translated.
 
4267 But this is not completely true if you provide internationalized
 
4268 shell scripts within your own package: GNU 
<CODE>gettext
</CODE> shall
 
4269 then be installed at the user site if the end users want to see the
 
4270 translation of shell script messages.
 
4274 Your package should use Autoconf and have a 
<TT>`configure.in'
</TT> file.
 
4275 If it does not, you have to learn how.  The Autoconf documentation
 
is quite well written; it is a good idea that you print it and get
 
4281 Your C sources should have already been modified according to
 
4282 instructions given earlier in this manual.  See section 
<A HREF=
"gettext.html#SEC13">Preparing Program Sources
</A>.
 
4286 Your 
<TT>`po/'
</TT> directory should receive all PO files submitted to you
 
4287 by the translator teams, each having 
<TT>`
<VAR>ll
</VAR>.po'
</TT> as a name.
 
It is not usually easy to get translation
 
4289 work done before your package gets internationalized and available!
 
4290 Since the cycle has to start somewhere, the easiest for the maintainer
 
4291 is to start with absolutely no PO files, and wait until various
 
4292 translator teams get interested in your package, and submit PO files.
 
It is worth adding here a few words about how the maintainer should
ideally behave with PO file submissions.  As a maintainer, your
role is to authenticate the origin of the submission as being the
representative of the appropriate GNU translating team (forward the
submission to <TT>`gnu-translation@prep.ai.mit.edu'</TT> in case of
doubt), to ensure that the PO file format is not severely broken and
does not prevent successful installation, and for the rest, merely
to put these PO files in <TT>`po/'</TT> for distribution.
 
4308 As a maintainer, you do not have to take on your shoulders the
 
4309 responsibility of checking if the translations are adequate or
 
4310 complete, and should avoid diving into linguistic matters.  Translation
 
teams drive themselves and are fully responsible for their linguistic
 
4312 choices for GNU.  Keep in mind that translator teams are 
<EM>not
</EM> 
4313 driven by maintainers.  You can help by carefully redirecting all
 
4314 communications and reports from users about linguistic matters to the
 
appropriate translation team, or explain to users how to reach or join
 
4316 their team.  The simplest might be to send them the 
<TT>`NLS'
</TT> file.
 
4320 Maintainers should 
<EM>never ever
</EM> apply PO file bug reports
 
themselves, short-cutting translation teams.  If some translator has
difficulty getting some of her points through her team, that is no
reason for her to negotiate translations directly with maintainers.
Teams ought to settle their problems themselves, if any.  If you, as
a maintainer, ever think there is a real problem with a team, please
never try to <EM>solve</EM> a team's problem on your own.
 
4331 <H2><A NAME=
"SEC68" HREF=
"gettext_toc.html#TOC68">Invoking the 
<CODE>gettextize
</CODE> Program
</A></H2> 
4334 Some files are consistently and identically needed in every package
 
4335 internationalized through GNU 
<CODE>gettext
</CODE>.  As a matter of
 
4336 convenience, the 
<CODE>gettextize
</CODE> program puts all these files right
 
4337 in your package.  This program has the following synopsis:
 
gettextize [ <VAR>option</VAR>... ] [ <VAR>directory</VAR> ]
 
4346 and accepts the following options:
 
4351 <DT><SAMP>`-f'
</SAMP> 
4353 <DT><SAMP>`--force'
</SAMP> 
4355 Force replacement of files which already exist.
 
4357 <DT><SAMP>`-h'
</SAMP> 
4359 <DT><SAMP>`--help'
</SAMP> 
4361 Display this help and exit.
 
4363 <DT><SAMP>`--version'
</SAMP> 
4365 Output version information and exit.
 
4370 If 
<VAR>directory
</VAR> is given, this is the top level directory of a
 
4371 package to prepare for using GNU 
<CODE>gettext
</CODE>.  If not given, it
 
4372 is assumed that the current directory is the top level directory of
 
4377 The program 
<CODE>gettextize
</CODE> provides the following files.  However,
 
4378 no existing file will be replaced unless the option 
<CODE>--force
</CODE> 
4379 (
<CODE>-f
</CODE>) is specified.
 
4386 The 
<TT>`NLS'
</TT> file is copied in the main directory of your package,
 
4387 the one being at the top level.  This file gives the main indications
 
4388 about how to install and use the Native Language Support features
 
4389 of your program.  You might elect to use a more recent copy of this
 
4390 <TT>`NLS'
</TT> file than the one provided through 
<CODE>gettextize
</CODE>, if
 
4391 you have one handy.  You may also fetch a more recent copy of file
 
4392 <TT>`NLS'
</TT> from most GNU archive sites.
 
4396 A 
<TT>`po/'
</TT> directory is created for eventually holding
 
4397 all translation files, but initially only containing the file
 
4398 <TT>`po/Makefile.in.in'
</TT> from the GNU 
<CODE>gettext
</CODE> distribution.
 
4399 (beware the double 
<SAMP>`.in'
</SAMP> in the file name). If the 
<TT>`po/'
</TT> 
4400 directory already exists, it will be preserved along with the files
 
4401 it contains, and only 
<TT>`Makefile.in.in'
</TT> will be overwritten.
 
An <TT>`intl/'</TT> directory is created and filled with most of the files
 
4406 originally in the 
<TT>`intl/'
</TT> directory of the GNU 
<CODE>gettext
</CODE> 
4407 distribution.  Also, if option 
<CODE>--force
</CODE> (
<CODE>-f
</CODE>) is given,
 
4408 the 
<TT>`intl/'
</TT> directory is emptied first.
 
If your site supports symbolic links, <CODE>gettextize</CODE> will not
actually copy the files into your package, but establish symbolic
links instead.  This avoids duplicating the disk space needed in
all packages.  Merely using the <SAMP>`-h'</SAMP> option while creating the
<CODE>tar</CODE> archive of your distribution will replace each link with an
actual copy in the distribution archive.  So, to insist, you really
should use the <SAMP>`-h'</SAMP> option with <CODE>tar</CODE> within the <CODE>dist</CODE>
goal of your main <TT>`Makefile.in'</TT>.
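The effect of the `-h' option can be checked with a small experiment.  The following is only a sketch, run in a scratch directory: the names `shared', `pkg' and `gettext.c' are made up for the illustration and are not files that gettextize creates.  It archives a directory holding a symbolic link, then unpacks the archive elsewhere to see what was stored.

```shell
set -e
# Scratch area; every name below is hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir shared pkg
echo "shared source" > shared/gettext.c
# Stand-in for a link that points outside the package.
ln -s ../shared/gettext.c pkg/gettext.c
# -h makes tar follow the link and store the file it points to.
tar -chf pkg.tar pkg
mkdir out
tar -xf pkg.tar -C out
if [ -L out/pkg/gettext.c ]; then
  echo "still a symbolic link"
else
  echo "regular file: $(cat out/pkg/gettext.c)"
fi
```

Without `-h', the archive would hold the link itself, which would dangle on the installer's machine.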
 
4424 It is interesting to understand that most new files for supporting
 
4425 GNU 
<CODE>gettext
</CODE> facilities in one package go in 
<TT>`intl/'
</TT> 
4426 and 
<TT>`po/'
</TT> subdirectories.  One distinction between these two
 
4427 directories is that 
<TT>`intl/'
</TT> is meant to be completely identical
 
4428 in all packages using GNU 
<CODE>gettext
</CODE>, while all newly created
 
4429 files, which have to be different, go into 
<TT>`po/'
</TT>.  There is a
 
4430 common 
<TT>`Makefile.in.in'
</TT> in 
<TT>`po/'
</TT>, because the 
<TT>`po/'
</TT> 
4431 directory needs its own 
<TT>`Makefile'
</TT>, and it has been designed so
 
4432 it can be identical in all packages.
 
4437 <H2><A NAME=
"SEC69" HREF=
"gettext_toc.html#TOC69">Files You Must Create or Alter
</A></H2> 
4440 Besides files which are automatically added through 
<CODE>gettextize
</CODE>,
 
4441 there are many files needing revision for properly interacting with
 
4442 GNU 
<CODE>gettext
</CODE>.  If you are closely following GNU standards for
 
4443 Makefile engineering and auto-configuration, the adaptations should
 
4444 be easier to achieve.  Here is a point by point description of the
 
4445 changes needed in each.
 
4449 So, here comes a list of files, each one followed by a description of
 
all alterations it needs.  Many examples are taken from the GNU
 
4451 <CODE>gettext
</CODE> 0.10 distribution itself.  You may indeed
 
4452 refer to the source code of the GNU 
<CODE>gettext
</CODE> package, as it
 
4453 is intended to be a good example and master implementation for using
 
4454 its own functionality.
 
4460 <H3><A NAME=
"SEC70" HREF=
"gettext_toc.html#TOC70"><TT>`POTFILES'
</TT> in 
<TT>`po/'
</TT></A></H3> 
4463 The 
<TT>`po/'
</TT> directory should receive a file named
 
4464 <TT>`POTFILES.in'
</TT>.  This file tells which files, among all program
 
4465 sources, have marked strings needing translation.  Here is an example
 
# List of source files containing translatable strings.
# Copyright (C) 1995 Free Software Foundation, Inc.

# Common library files

# Package source files
 
Hash-marked comments and blank lines are ignored.  All other lines
 
4487 list those source files containing strings marked for translation
 
4488 (see section 
<A HREF=
"gettext.html#SEC15">How Marks Appears in Sources
</A>), in a notation relative to the top level
 
4489 of your whole distribution, rather than the location of the
 
4490 <TT>`POTFILES.in'
</TT> file itself.
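Because the names are relative to the top level of the distribution, a maintainer may want to check that every listed file really exists.  The sketch below is not part of GNU gettext; it builds a throwaway tree with hypothetical file names (`src/main.c', `src/missing.c'), drops comments and blank lines from `po/POTFILES.in', and reports any listed file that is absent.

```shell
set -e
# Throwaway package tree; all file names here are hypothetical.
top=$(mktemp -d)
mkdir -p "$top/po" "$top/src"
: > "$top/src/main.c"
cat > "$top/po/POTFILES.in" <<'EOF'
# List of source files containing translatable strings.

# Package source files
src/main.c
src/missing.c
EOF
# Drop comment lines and blank lines, keeping only file names.
listed=$(sed -e '/^#/d' -e '/^[[:space:]]*$/d' "$top/po/POTFILES.in")
missing=""
for f in $listed; do
  # Names are resolved from the top level, not from po/.
  [ -f "$top/$f" ] || missing="$missing $f"
done
echo "missing:$missing"
```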
 
4495 <H3><A NAME=
"SEC71" HREF=
"gettext_toc.html#TOC71"><TT>`configure.in'
</TT> at top level
</A></H3> 
4499 <LI>Declare the package and version.
 
4501 This is done by a set of lines like these:
 
AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
 
4513 Of course, you replace 
<SAMP>`gettext'
</SAMP> with the name of your package,
 
4514 and 
<SAMP>`
0.10'
</SAMP> by its version numbers, exactly as they
 
4515 should appear in the packaged 
<CODE>tar
</CODE> file name of your distribution
 
4516 (
<TT>`gettext-
0.10.tar.gz'
</TT>, here).
 
4518 <LI>Declare the available translations.
 
This is done by defining <CODE>ALL_LINGUAS</CODE> to the whitespace-separated,
quoted list of available languages, in a single line, like this:
ALL_LINGUAS="de fr"
 
4528 This example means that German and French PO files are available, so
 
4529 that these languages are currently supported by your package.  If you
 
4530 want to further restrict, at installation time, the set of installed
 
4531 languages, this should not be done by modifying 
<CODE>ALL_LINGUAS
</CODE> in
 
4532 <TT>`configure.in'
</TT>, but rather by using the 
<CODE>LINGUAS
</CODE> environment
 
4533 variable (see section 
<A HREF=
"gettext.html#SEC34">Magic for Installers
</A>).
 
4535 <LI>Check for internationalization support.
 
4537 Here is the main 
<CODE>m4
</CODE> macro for triggering internationalization
 
4538 support.  Just add this line to 
<TT>`configure.in'
</TT>:
ud_GNU_GETTEXT
 
4545 This call is purposely simple, even if it generates a lot of configure
 
4546 time checking and actions.
 
4548 <LI>Obtain some 
<TT>`libintl.h'
</TT> header file.
 
Once you have called <CODE>ud_GNU_GETTEXT</CODE> in <TT>`configure.in'</TT>, use:
 
AC_LINK_FILES($nls_cv_header_libgt, $nls_cv_header_intl)
 
4557 This will create one header file 
<TT>`libintl.h'
</TT>.  The reason for
 
4558 this has to do with the fact that some systems, using the Uniforum
 
4559 message handling functions, already have a file of this name.
 
4561 The 
<CODE>AC_LINK_FILES
</CODE> call has not been integrated into the
 
4562 <CODE>ud_GNU_GETTEXT
</CODE> macro because there can be only one such call
 
4563 in a 
<TT>`configure'
</TT> file.  If you already use it, you will have to
 
4564 <EM>merge
</EM> the needed 
<CODE>AC_LINK_FILES
</CODE> within yours, by adding
 
4565 the first argument at the end of the list of your first argument,
 
4566 and adding the second argument at the end of the list of your second
 
4569 <LI>Have output files created.
 
4571 The 
<CODE>AC_OUTPUT
</CODE> directive, at the end of your 
<TT>`configure.in'
</TT> 
4572 file, needs to be modified in two ways:
 
AC_OUTPUT([<VAR>existing configuration files</VAR> intl/Makefile po/Makefile.in],
[sed -e "/POTFILES =/r po/POTFILES" po/Makefile.in > po/Makefile
<VAR>existing additional actions</VAR>])
 
4581 The modification to the first argument to 
<CODE>AC_OUTPUT
</CODE> asks
 
4582 for substitution in the 
<TT>`intl/'
</TT> and 
<TT>`po/'
</TT> directories.
 
4583 Note the 
<SAMP>`.in'
</SAMP> suffix used for 
<TT>`po/'
</TT> only.  This is because
 
4584 the distributed file is really 
<TT>`po/Makefile.in.in'
</TT>.
 
4586 The modification to the second argument ensures that 
<TT>`po/Makefile'
</TT> 
4587 gets generated out of the 
<TT>`po/Makefile.in'
</TT> just created, including
 
4588 in it the 
<TT>`po/POTFILES'
</TT> produced by 
<CODE>ud_GNU_GETTEXT
</CODE>.
 
4589 Two steps are needed because 
<TT>`po/POTFILES'
</TT> can get lengthy in
 
4590 some packages, too lengthy in fact for being able to merely use an
 
4591 Autoconf substituted variable, as many 
<CODE>sed
</CODE>s cannot handle very
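The second step relies on the `r' command of sed, which copies the contents of a file to the output right after each line matching the address.  The sketch below reproduces that step in a scratch directory.  Both input files are heavily reduced, hypothetical stand-ins, not the real files shipped with GNU gettext; only the sed command line is taken from the text above.

```shell
set -e
demo=$(mktemp -d)
mkdir "$demo/po"
# Reduced stand-in for the po/Makefile.in created by configure.
cat > "$demo/po/Makefile.in" <<'EOF'
# reduced example input
POTFILES = \

all:
EOF
# Stand-in for the generated po/POTFILES (hypothetical file names).
cat > "$demo/po/POTFILES" <<'EOF'
	../src/main.c \
	../src/util.c
EOF
# Same command as in the AC_OUTPUT call: the contents of po/POTFILES
# are written out just after the line matching "POTFILES =".
(cd "$demo" && sed -e "/POTFILES =/r po/POTFILES" po/Makefile.in > po/Makefile)
cat "$demo/po/Makefile"
```

The file list ends up as the continuation of the `POTFILES =' line, without ever passing through an Autoconf substituted variable.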
 
4598 <H3><A NAME=
"SEC72" HREF=
"gettext_toc.html#TOC72"><TT>`aclocal.m4'
</TT> at top level
</A></H3> 
4601 If you do not have an 
<TT>`aclocal.m4'
</TT> file in your distribution,
 
the simplest is to take a copy of <TT>`aclocal.m4'</TT> from
 
4603 GNU 
<CODE>gettext
</CODE>.  But to be precise, you only need macros
 
4604 <CODE>ud_LC_MESSAGES
</CODE>, 
<CODE>ud_WITH_NLS
</CODE> and 
<CODE>ud_GNU_GETTEXT
</CODE>,
 
4605 so you may use an editor and remove macros you do not need.
 
4609 If you already have an 
<TT>`aclocal.m4'
</TT> file, then you will have
 
4610 to merge the said macros into your 
<TT>`aclocal.m4'
</TT>.  Note that if
 
4611 you are upgrading from a previous release of GNU 
<CODE>gettext
</CODE>, you
 
4612 should most probably 
<EM>replace
</EM> the said macros, as they usually
 
4613 change a little from one release of GNU 
<CODE>gettext
</CODE> to the next.
 
4614 Their contents may vary as we get more experience with strange systems
 
4619 These macros check for the internationalization support functions
 
and related information.  Hopefully, once stabilized, these macros
 
4621 might be integrated in the standard Autoconf set, because this
 
4622 piece of 
<CODE>m4
</CODE> code will be the same for all projects using GNU
 
4623 <CODE>gettext
</CODE>.
 
4628 <H3><A NAME=
"SEC73" HREF=
"gettext_toc.html#TOC73"><TT>`acconfig.h'
</TT> at top level
</A></H3> 
4631 If you do not have an 
<TT>`acconfig.h'
</TT> file in your distribution,
 
the simplest is to take a copy of <TT>`acconfig.h'</TT> from
 
4633 GNU 
<CODE>gettext
</CODE>.  But to be precise, you only need the
 
4634 lines and comments for 
<CODE>ENABLE_NLS
</CODE>, 
<CODE>HAVE_CATGETS
</CODE>,
 
4635 <CODE>HAVE_GETTEXT
</CODE> and 
<CODE>HAVE_LC_MESSAGES
</CODE>, so you may use
 
4636 an editor and remove everything else.  If you already have an
 
4637 <TT>`acconfig.h'
</TT> file, then you should merge the said definitions
 
4638 into your 
<TT>`acconfig.h'
</TT>.
 
4643 <H3><A NAME=
"SEC74" HREF=
"gettext_toc.html#TOC74"><TT>`Makefile.in'
</TT> at top level
</A></H3> 
4646 Here are a few modifications you need to make to your main, top-level
 
4647 <TT>`Makefile.in'
</TT> file.
 
4654 Add the following lines near the beginning of your 
<TT>`Makefile.in'
</TT>,
 
4655 so the 
<SAMP>`dist:'
</SAMP> goal will work properly (as explained further down):
 
4665 Add file 
<TT>`NLS'
</TT> to the 
<CODE>DISTFILES
</CODE> definition, so the file gets
 
4670 Wherever you process subdirectories in your 
<TT>`Makefile.in'
</TT>, be
 
4671 sure you also process 
<CODE>@INTLSUB@
</CODE> and 
<CODE>@POSUB@
</CODE>, which
 
4672 are replaced respectively by 
<SAMP>`intl'
</SAMP> and 
<SAMP>`po'
</SAMP>, or empty
 
when the configuration process decides these directories should
 
Here is an example of a canonical order of processing.  In this
example, we also define <CODE>SUBDIRS</CODE> in <CODE>Makefile.in</CODE> so
that it can be reused in the <SAMP>`dist:'</SAMP> goal.
 
4682 SUBDIRS = doc lib @INTLSUB@ src @POSUB@
 
4685 that you will have to adapt to your own package.
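<P>
As a minimal sketch of how such a <CODE>SUBDIRS</CODE> definition might
drive the processing of subdirectories, consider the fragment below.  The
goal names <SAMP>`all'</SAMP>, <SAMP>`install'</SAMP> and <SAMP>`clean'</SAMP>
are only assumptions chosen for illustration, and in the actual file each
command line must start with a tab character.
</P>
<PRE>
SUBDIRS = doc lib @INTLSUB@ src @POSUB@

# Run the same goal in every configured subdirectory.  @INTLSUB@ and
# @POSUB@ expand to nothing when intl/ and po/ are not to be processed,
# so the loop simply skips them.
all install clean:
        for subdir in $(SUBDIRS); do \
          (cd $$subdir && $(MAKE) $@) || exit 1; \
        done
</PRE>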
 
4689 A delicate point is the 
<SAMP>`dist:'
</SAMP> goal, as both
 
4690 <TT>`intl/Makefile'
</TT> and 
<TT>`po/Makefile'
</TT> will later assume that the
 
4691 proper directory has been set up from the main 
<TT>`Makefile'
</TT>.  Here is
 
an example of what the 
<SAMP>`dist:'
</SAMP> goal might look like:
 
<PRE>
distdir = $(PACKAGE)-$(VERSION)
dist: Makefile
        rm -fr $(distdir)
        mkdir $(distdir)
        chmod 777 $(distdir)
        for file in $(DISTFILES); do \
          ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
        done
        for subdir in $(SUBDIRS); do \
          mkdir $(distdir)/$$subdir || exit 1; \
          chmod 777 $(distdir)/$$subdir; \
          (cd $$subdir && $(MAKE) $@) || exit 1; \
        done
        tar chozf $(distdir).tar.gz $(distdir)
        rm -fr $(distdir)
</PRE>
 
4717 <H3><A NAME=
"SEC75" HREF=
"gettext_toc.html#TOC75"><TT>`Makefile.in'
</TT> in 
<TT>`src/'
</TT></A></H3> 
4720 Some of the modifications made in the main 
<TT>`Makefile.in'
</TT> will
 
4721 also be needed in the 
<TT>`Makefile.in'
</TT> from your package sources,
 
4722 which we assume here to be in the 
<TT>`src/'
</TT> subdirectory.  Here are
 
4723 all the modifications needed in 
<TT>`src/Makefile.in'
</TT>:
 
4730 In view of the 
<SAMP>`dist:'
</SAMP> goal, you should have these lines near the
 
4731 beginning of 
<TT>`src/Makefile.in'
</TT>:
 
If not done already, you should make sure that <CODE>top_srcdir</CODE>
gets defined.  It is needed to locate <CODE>cpp</CODE> include files.  Just add
the line:
 
4747 top_srcdir = @top_srcdir@
 
You might also want to define <CODE>subdir</CODE> as <SAMP>`src'</SAMP>, which
later allows for almost uniform <SAMP>`dist:'</SAMP> goals in all your
<TT>`Makefile.in'</TT> files.  At least, the <SAMP>`dist:'</SAMP> goal below
assumes that <CODE>subdir</CODE> is defined this way.
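<P>
Taken together, the definitions discussed so far might sit near the
beginning of <TT>`src/Makefile.in'</TT> as sketched here.  The
<CODE>PACKAGE</CODE> and <CODE>VERSION</CODE> substitutions are an assumption,
inferred from their use in <CODE>distdir</CODE> in the <SAMP>`dist:'</SAMP>
goal of this file.
</P>
<PRE>
# Assumed substitutions, needed to compute the distribution directory.
PACKAGE = @PACKAGE@
VERSION = @VERSION@

# Top of the source tree, used for include paths.
top_srcdir = @top_srcdir@

# Name of this directory relative to the top level.
subdir = src
</PRE>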
 
4764 You should ensure that the final linking will use 
<CODE>@INTLLIBS@
</CODE> as
 
a library.  An easy way to achieve this is to make sure it gets into
 
4766 <CODE>LIBS
</CODE>, like this:
 
4770 LIBS = @INTLLIBS@ @LIBS@
 
In most GNU packages one will find a directory <TT>`lib/'</TT> in which a
library containing some helper functions will be built.  (You need at
least the few functions which the GNU <CODE>gettext</CODE> library itself
needs.)  However, some of the functions in <TT>`lib/'</TT> also give
messages to the user, which of course should be translated too.  To take
care of this, it is not enough to place the support library (say
<TT>`libsupport.a'</TT>) just between <CODE>@INTLLIBS@</CODE> and
<CODE>@LIBS@</CODE> in the above example.  The support library both calls
<CODE>gettext</CODE> and supplies functions that the <CODE>gettext</CODE>
library needs, and a linker usually resolves libraries from left to right,
so the support library has to appear on both sides.  Instead one has to
write this:
 
4784 LIBS = ../lib/libsupport.a @INTLLIBS@ ../lib/libsupport.a @LIBS@
 
4789 You should also ensure that directory 
<TT>`intl/'
</TT> will be searched for
 
C preprocessor include files in all circumstances.  So, you have to
arrange for both <SAMP>`-I../intl'</SAMP> and <SAMP>`-I$(top_srcdir)/intl'</SAMP> to
be given to the C compiler.
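<P>
One possible way of passing both options is sketched below.  The variable
names <CODE>INCLUDES</CODE> and <CODE>COMPILE</CODE> are hypothetical and may
differ from what your <TT>`src/Makefile.in'</TT> already uses.  Giving both
directories presumably lets the headers be found in the build tree as well
as in the source tree when the package is built outside its source
directory.
</P>
<PRE>
top_srcdir = @top_srcdir@
DEFS = @DEFS@

# Search intl/ in the build tree first, then in the source tree.
INCLUDES = -I../intl -I$(top_srcdir)/intl

# Hypothetical compile command to be used by the rules of this file.
COMPILE = $(CC) -c $(DEFS) $(INCLUDES) $(CPPFLAGS) $(CFLAGS)
</PRE>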
 
4796 Your 
<SAMP>`dist:'
</SAMP> goal has to conform with others.  Here is a
 
4797 reasonable definition for it:
 
<PRE>
distdir = ../$(PACKAGE)-$(VERSION)/$(subdir)
dist: Makefile $(DISTFILES)
        for file in $(DISTFILES); do \
          ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
        done
</PRE>
 
4812 <H1><A NAME=
"SEC76" HREF=
"gettext_toc.html#TOC76">Concluding Remarks
</A></H1> 
4815 We would like to conclude this GNU 
<CODE>gettext
</CODE> manual by presenting
 
a history of the GNU Translation Project so far.  Finally, we give
a few pointers for those who want to do further research or reading
about Native Language Support matters.
 
4824 <H2><A NAME=
"SEC77" HREF=
"gettext_toc.html#TOC77">History of GNU 
<CODE>gettext
</CODE></A></H2> 
4827 Internationalization concerns and algorithms have been informally
 
4828 and casually discussed for years in GNU, sometimes around GNU
 
4829 <CODE>libc
</CODE>, maybe around the incoming 
<CODE>Hurd
</CODE>, or otherwise
 
4830 (nobody clearly remembers).  And even then, when the work started for
 
4831 real, this was somewhat independently of these previous discussions.
 
4835 This all began in July 
1994, when Patrick D'Cruze had the idea and
 
4836 initiative of internationalizing version 
3.9.2 of GNU 
<CODE>fileutils
</CODE>.
 
4837 He then asked Jim Meyering, the maintainer, how to get those changes
 
4838 folded into an official release.  That first draft was full of
 
4839 <CODE>#ifdef
</CODE>s and somewhat disconcerting, and Jim wanted to find
 
nicer ways.  Patrick and Jim shared some attempts and experiments
in this area.  Then, feeling that this might eventually have a deeper
impact on GNU, Jim wanted to know what the standards were, and contacted
Richard Stallman, who very quickly and verbally described an overall
 
4844 design for what was meant to become 
<CODE>glocale
</CODE>, at that time.
 
4848 Jim implemented 
<CODE>glocale
</CODE> and got a lot of exhausting feedback
 
4849 from Patrick and Richard, of course, but also from Mitchum DSouza
 
4850 (who wrote a 
<CODE>catgets
</CODE>-like package), Roland McGrath, maybe David
 
MacKenzie, Fran&ccedil;ois Pinard, and Paul Eggert, all pushing and
 
4852 pulling in various directions, not always compatible, to the extent
 
4853 that after a couple of test releases, 
<CODE>glocale
</CODE> was torn apart.
 
4857 While Jim took some distance and time and became dad for a second
 
4858 time, Roland wanted to get GNU 
<CODE>libc
</CODE> internationalized, and
 
4859 got Ulrich Drepper involved in that project.  Instead of starting
 
4860 from 
<CODE>glocale
</CODE>, Ulrich rewrote something from scratch, but
 
more conformant to the set of guidelines that emerged out of the
 
4862 <CODE>glocale
</CODE> effort.  Then, Ulrich got people from the previous
 
forum to get involved in this new project, and the switch
 
4864 from 
<CODE>glocale
</CODE> to what was first named 
<CODE>msgutils
</CODE>, renamed
 
4865 <CODE>nlsutils
</CODE>, and later 
<CODE>gettext
</CODE>, became officially accepted
 
4866 by Richard in May 
1995 or so.
 
4870 Let's summarize by saying that Ulrich Drepper wrote GNU 
<CODE>gettext
</CODE> 
4871 in April 
1995.  The first official release of the package, including
 
4872 PO mode, occurred in July 
1995, and was numbered 
0.7.  Other people
 
4873 contributed to the effort by providing a discussion forum around
 
4874 Ulrich, writing little pieces of code, or testing.  These are quoted
 
4875 in the 
<CODE>THANKS
</CODE> file which comes with the GNU 
<CODE>gettext
</CODE> distribution.

While this was being done, Fran&ccedil;ois adapted half a dozen
GNU packages to <CODE>glocale</CODE> first, then later to <CODE>gettext</CODE>,
putting them in pretest, thus providing along the way an effective
user environment for fine tuning the evolving tools.  He also took
 
4884 the responsibility of organizing and coordinating the GNU Translation
 
4885 Project.  After nearly a year of informal exchanges between people from
 
4886 many countries, translator teams started to exist in May 
1995, through
 
4887 the creation and support by Patrick D'Cruze of twenty unmoderated
 
4888 mailing lists for that many native languages, and two moderated
 
4889 lists: one for reaching all teams at once, the other for reaching
 
4890 all maintainers of internationalized packages in GNU.
 
Fran&ccedil;ois also wrote PO mode in June 
1995 with the collaboration
 
4895 of Greg McGary, as a kind of contribution to Ulrich's package.
 
4896 He also gave a hand with the GNU 
<CODE>gettext
</CODE> Texinfo manual.
 
4901 <H2><A NAME=
"SEC78" HREF=
"gettext_toc.html#TOC78">Related Readings
</A></H2> 
4904 Eugene H. Dorr (
<TT>`dorre@well.com'
</TT>) maintains an interesting
 
4905 bibliography on internationalization matters, called
 
4906 <CITE>Internationalization Reference List
</CITE>, which is available as:
 
4909 ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt
 
4913 Michael Gschwind (
<TT>`mike@vlsivie.tuwien.ac.at'
</TT>) maintains a
 
4914 Frequently Asked Questions (FAQ) list, entitled 
<CITE>Programming for
 
4915 Internationalisation
</CITE>.  This FAQ discusses writing programs which
 
4916 can handle different language conventions, character sets, etc.;
 
4917 and is applicable to all character set encodings, with particular
 
4918 emphasis on ISO 
8859-
1.  It is regularly published in Usenet
 
4919 groups 
<TT>`comp.unix.questions'
</TT>, 
<TT>`comp.std.internat'
</TT>,
 
4920 <TT>`comp.software.international'
</TT>, 
<TT>`comp.lang.c'
</TT>,
 
4921 <TT>`comp.windows.x'
</TT>, 
<TT>`comp.std.c'
</TT>, 
<TT>`comp.answers'
</TT> 
4922 and 
<TT>`news.answers'
</TT>.  The home location of this document is:
 
4925 ftp://ftp.vlsivie.tuwien.ac.at/pub/
8bit/ISO-programming
 
4929 Patrick D'Cruze (
<TT>`pdcruze@li.org'
</TT>) wrote a tutorial about NLS
 
4930 matters, and Jochen Hein (
<TT>`Hein@student.tu-clausthal.de'
</TT>) took
 
4931 over the responsibility of maintaining it.  It may be found as:
 
4934 ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/...
 
4935      ...locale-tutorial-
0.8.txt.gz
 
4939 This site is mirrored in:
 
4942 ftp://ftp.ibp.fr/pub/linux/sunsite/
 
A French version of the same tutorial should be available at:
 
4949 ftp://ftp.ibp.fr/pub/linux/french/docs/
 
4953 together with French translations of many Linux-related documents.
 
4957 This document was generated on 
4 September 
1998 using the
 
4958 <A HREF=
"http://wwwcn.cern.ch/dci/texi2html/">texi2html
</A> 
4959 translator version 
1.51.
</P>