Commit | Line | Data |
---|---|---|
90e94c04 JS |
1 | <HTML> |
2 | <HEAD> | |
3 | <!-- This HTML file has been created by texi2html 1.51 | |
4 | from gettext.texi on 4 September 1998 --> | |
5 | ||
6 | <TITLE>GNU gettext utilities</TITLE> | |
7 | </HEAD> | |
8 | <BODY> | |
9 | <H1>GNU gettext tools, version 0.10</H1> | |
10 | <H2>Native Language Support Library and Tools</H2> | |
11 | <H2>Edition 0.10, 26 November</H2> | |
12 | <ADDRESS>Ulrich Drepper</ADDRESS> | |
13 | <ADDRESS>Jim Meyering</ADDRESS> | |
14 | <ADDRESS>Fran&ccedil;ois Pinard</ADDRESS> | |
15 | <P> | |
16 | <P><HR><P> | |
17 | ||
18 | <P> | |
19 | Copyright (C) 1995 Free Software Foundation, Inc. | |
20 | ||
21 | </P> | |
22 | <P> | |
23 | Permission is granted to make and distribute verbatim copies of | |
24 | this manual provided the copyright notice and this permission notice | |
25 | are preserved on all copies. | |
26 | ||
27 | </P> | |
28 | <P> | |
29 | Permission is granted to copy and distribute modified versions of this | |
30 | manual under the conditions for verbatim copying, provided that the entire | |
31 | resulting derived work is distributed under the terms of a permission | |
32 | notice identical to this one. | |
33 | ||
34 | </P> | |
35 | <P> | |
36 | Permission is granted to copy and distribute translations of this manual | |
37 | into another language, under the above conditions for modified versions, | |
38 | except that this permission notice may be stated in a translation approved | |
39 | by the Foundation. | |
40 | ||
41 | </P> | |
42 | ||
43 | ||
44 | ||
45 | <H1><A NAME="SEC1" HREF="gettext_toc.html#TOC1">Introduction</A></H1> | |
46 | ||
47 | ||
48 | <BLOCKQUOTE> | |
49 | <P> | |
50 | This manual is still in <EM>DRAFT</EM> state. Some sections are still | |
51 | empty, or almost. We keep merging material from other sources | |
52 | (essentially email folders) while the proper integration of this | |
53 | material is delayed. | |
54 | </BLOCKQUOTE> | |
55 | ||
56 | <P> | |
57 | In this manual, we use <EM>he</EM> when speaking of the programmer or | |
58 | maintainer, <EM>she</EM> when speaking of the translator, and <EM>they</EM> | |
59 | when speaking of the installers or end users of the translated program. | |
60 | This is only a convenience for clarifying the documentation. It is | |
61 | absolutely not meant to imply that some roles are more appropriate | |
62 | to males or females. Besides, as you might guess, GNU <CODE>gettext</CODE> | |
63 | is meant to be useful for people using computers, whatever their sex, | |
64 | race, religion or nationality! | |
65 | ||
66 | </P> | |
67 | <P> | |
68 | This chapter explains the goals sought by the mere existence | |
69 | of GNU <CODE>gettext</CODE>. Then, it explains a few wide concepts around | |
70 | Native Language Support, and situates message translation in regard | |
71 | to other aspects of national and cultural variance, as applicable | |
72 | to programs. It also surveys the files used to convey | |
73 | translations. It explains how the various tools interrelate in the | |
74 | initial generation of these files, and later, how the maintenance | |
75 | cycle usually operates. | |
76 | ||
77 | </P> | |
78 | ||
79 | ||
80 | ||
81 | <H2><A NAME="SEC2" HREF="gettext_toc.html#TOC2">The Purpose of GNU <CODE>gettext</CODE></A></H2> | |
82 | ||
83 | <P> | |
84 | Usually, programs are written and documented in English, and use | |
85 | English at execution time for interacting with users. This is true | |
86 | not only from within GNU, but also in a great deal of commercial | |
87 | and free software. Using a common language is quite handy for | |
88 | communication between developers, maintainers and users from all | |
89 | countries. On the other hand, most people are less comfortable with | |
90 | English than with their own native language, and would rather | |
91 | use their mother tongue for day-to-day work, as far as possible. | |
92 | Many would simply <EM>love</EM> seeing their computer screen showing | |
93 | a lot less of English, and far more of their own spoken language. | |
94 | ||
95 | </P> | |
96 | <P> | |
97 | However, to some people, this dream might appear so far-fetched that | |
98 | they may believe it is not even worth spending time thinking about | |
99 | it, and they have no confidence at all that the dream might ever | |
100 | come true. Many have not lost hope yet, and have organized themselves. | |
101 | The GNU Translation Project is a formalization of this hope into a | |
102 | workable structure, which has a good chance to get all of us nearer | |
103 | the achievement of a truly multi-lingual set of programs. | |
104 | ||
105 | </P> | |
106 | <P> | |
107 | GNU <CODE>gettext</CODE> is an important step for the GNU Translation | |
108 | Project, as it is an asset on which we may build many other steps. | |
109 | This package offers to programmers, translators and even users, a | |
110 | well integrated set of tools and documentation. Specifically, the GNU | |
111 | <CODE>gettext</CODE> utilities are a set of tools that provides a framework | |
112 | to help other GNU packages produce multi-lingual messages. These tools | |
113 | include a set of conventions about how programs should be written to | |
114 | support message catalogs, a directory and file naming organization | |
115 | for the message catalogs themselves, a runtime library supporting the | |
116 | retrieval of translated messages, and a few stand-alone programs to | |
117 | massage in various ways the sets of translatable strings, or already | |
118 | translated strings. A special GNU Emacs mode also helps interested | |
119 | parties prepare these sets, or bring them up to date. | |
120 | ||
121 | </P> | |
122 | <P> | |
123 | GNU <CODE>gettext</CODE> is designed so it minimizes the impact of | |
124 | internationalization on program sources, keeping this impact as small | |
125 | and hardly noticeable as possible. Internationalization has better | |
126 | chances of succeeding if it is very lightweight, or at least | |
127 | appears to be so, when looking at program sources. | |
128 | ||
129 | </P> | |
130 | <P> | |
131 | The GNU Translation Project also uses the GNU <CODE>gettext</CODE> | |
132 | distribution as a vehicle for documenting its structure and methods, | |
133 | even if this goes beyond the technicalities of the GNU <CODE>gettext</CODE> | |
134 | proper. By doing so, translators will find in a single place, as | |
135 | far as possible, all they need to know for properly doing their | |
136 | translating work. This supplementary documentation might also | |
137 | help programmers, and even curious users, understand how GNU | |
138 | <CODE>gettext</CODE> is related to the remainder of the GNU Translation | |
139 | Project, and consequently, get a glimpse of the <EM>big picture</EM>. | |
140 | ||
141 | </P> | |
142 | ||
143 | ||
144 | <H2><A NAME="SEC3" HREF="gettext_toc.html#TOC3">I18n, L10n, and Such</A></H2> | |
145 | ||
146 | <P> | |
147 | Two long words appear all the time when we discuss support of native | |
148 | language in programs, and these words have a precise meaning, worth | |
149 | being explained here, once and for all in this document. The words are | |
150 | <EM>internationalization</EM> and <EM>localization</EM>. Many people, | |
151 | tired of writing these long words over and over again, took the | |
152 | habit of writing <STRONG>i18n</STRONG> and <STRONG>l10n</STRONG> instead, quoting the first | |
153 | and last letter of each word, and replacing the run of intermediate | |
154 | letters by a number merely telling how many such letters there are. | |
155 | But in this manual, for the sake of clarity, we will patiently write | |
156 | the names in full, each time... | |
157 | ||
158 | </P> | |
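<P>
The abbreviation rule just described is mechanical enough to sketch in C.
The function name <CODE>numeronym</CODE> and its buffer-based interface are
illustrative assumptions; nothing like it is part of GNU <CODE>gettext</CODE>.
</P>

```c
#include <stdio.h>
#include <string.h>

/* Write the abbreviation of `word' into `out' (capacity `cap' bytes):
   first letter, count of intermediate letters, last letter.
   Words shorter than three letters have no intermediate letters and
   are copied unchanged.  Returns 0 on success, -1 if `out' is too
   small to hold the result.  (Hypothetical helper, for illustration.)  */
int
numeronym (const char *word, char *out, size_t cap)
{
  size_t len = strlen (word);
  int n;

  if (len < 3)
    n = snprintf (out, cap, "%s", word);
  else
    n = snprintf (out, cap, "%c%zu%c", word[0], len - 2, word[len - 1]);

  return (n < 0 || (size_t) n >= cap) ? -1 : 0;
}
```

<P>
Under these assumptions, <CODE>numeronym ("internationalization", buf, sizeof buf)</CODE>
stores <SAMP>`i18n'</SAMP> in <CODE>buf</CODE>, and the same call with
<CODE>"localization"</CODE> stores <SAMP>`l10n'</SAMP>.
</P>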
159 | <P> | |
160 | By <STRONG>internationalization</STRONG>, one refers to the operation by which a | |
161 | program, or a set of programs turned into a package, is made aware and | |
162 | able to support multiple languages. This is a generalization process, | |
163 | by which the programs are untied from using only English strings or | |
164 | other English specific habits, and connected to generic ways of doing | |
165 | the same, instead. Program developers may use various techniques to | |
166 | internationalize their programs, some of which have been standardized. | |
167 | GNU <CODE>gettext</CODE> offers one of these standards. See section <A HREF="gettext.html#SEC36">The Programmer's View</A>. | |
168 | ||
169 | </P> | |
170 | <P> | |
171 | By <STRONG>localization</STRONG>, one means the operation by which, in a set | |
172 | of programs already internationalized, one gives the program all | |
173 | needed information so that it can bend itself to handle its input | |
174 | and output in a fashion which is correct for some native language and | |
175 | cultural habits. This is a particularisation process, by which generic | |
176 | methods already implemented in an internationalized program are used | |
177 | in specific ways. The programming environment puts several functions | |
178 | at the programmer's disposal which allow this runtime configuration. | |
179 | The formal description of a specific set of cultural habits for some | |
180 | country, together with all associated translations targeted to the | |
181 | same native language, is called the <STRONG>locale</STRONG> for this language | |
182 | or country. Users achieve localization of programs by setting proper | |
183 | values to special environment variables, prior to executing those | |
184 | programs, identifying which locale should be used. | |
185 | ||
186 | </P> | |
187 | <P> | |
188 | In fact, locale message support is only one component of the cultural | |
189 | data that makes up a particular locale. There is a whole host of | |
190 | routines and functions provided to aid programmers in developing | |
191 | internationalized software, which allow them to access the data | |
192 | stored in a particular locale. When someone presently refers to a | |
193 | particular locale, they are obviously referring to the data stored | |
194 | within that particular locale. Similarly, if a programmer is referring | |
195 | to "accessing the locale routines", they are referring to the | |
196 | complete suite of routines that access all of the locale's information. | |
197 | ||
198 | </P> | |
199 | <P> | |
200 | One uses the expression <STRONG>Native Language Support</STRONG>, or merely NLS, | |
201 | for speaking of the overall activity or feature encompassing both | |
202 | internationalization and localization, allowing for multi-lingual | |
203 | interactions in a program. In a nutshell, one could say that | |
204 | internationalization is the operation by which further localizations | |
205 | are made possible. | |
206 | ||
207 | </P> | |
208 | <P> | |
209 | Also, very roughly said, when it comes to multi-lingual messages, | |
210 | internationalization is usually taken care of by programmers, and | |
211 | localization is usually taken care of by translators. | |
212 | ||
213 | </P> | |
214 | ||
215 | ||
216 | <H2><A NAME="SEC4" HREF="gettext_toc.html#TOC4">Aspects in Native Language Support</A></H2> | |
217 | ||
218 | <P> | |
219 | For a totally multi-lingual distribution, there are many things to | |
220 | translate beyond output messages. | |
221 | ||
222 | </P> | |
223 | ||
224 | <UL> | |
225 | <LI> | |
226 | ||
227 | As of today, GNU <CODE>gettext</CODE> offers a complete toolset for | |
228 | translating messages output by C programs. Perl scripts and shell | |
229 | scripts also need to be translated. Even if there are some hooks | |
230 | so this can be done, these hooks are not integrated as well as they | |
231 | should be. | |
232 | ||
233 | <LI> | |
234 | ||
235 | Some programs, like <CODE>autoconf</CODE> or <CODE>bison</CODE>, are able | |
236 | to produce other programs (or scripts). Even if the generating | |
237 | programs themselves are internationalized, the generated programs they | |
238 | produce may need internationalization on their own, and this indirect | |
239 | internationalization could be automated right from the generating | |
240 | program. In fact, quite usually, generating and generated programs | |
241 | could be internationalized independently, as the effort needed is | |
242 | fairly orthogonal. | |
243 | ||
244 | <LI> | |
245 | ||
246 | A few programs include textual tables which might need translation | |
247 | themselves, independently of the strings contained in the program | |
248 | itself. For example, RFC 1345 gives an English description for each | |
249 | character which GNU <CODE>recode</CODE> is able to reconstruct at execution. | |
250 | Since these descriptions are extracted from the RFC by mechanical means, | |
251 | translating them properly would require a prior translation of the RFC | |
252 | itself. | |
253 | ||
254 | <LI> | |
255 | ||
256 | Almost all programs accept options, which are often worded so as to | |
257 | be descriptive for the English readers; one might want to consider | |
258 | offering translated versions for program options as well. | |
259 | ||
260 | <LI> | |
261 | ||
262 | Many programs read, interpret, compile, or are somewhat driven by | |
263 | input files which are texts containing keywords, identifiers, or | |
264 | replies which are inherently translatable. For example, one may want | |
265 | <CODE>gcc</CODE> to allow diacriticized characters in identifiers or use | |
266 | translated keywords; <SAMP>`rm -i'</SAMP> might accept something other than | |
267 | <SAMP>`y'</SAMP> or <SAMP>`n'</SAMP> for replies, etc. Even if the program will | |
268 | eventually make most of its output in the foreign languages, one has | |
269 | to decide whether the input syntax, option values, etc., are to be | |
270 | localized or not. | |
271 | ||
272 | <LI> | |
273 | ||
274 | The manual accompanying a package, as well as all documentation files | |
275 | in the distribution, could surely be translated, too. Translating a | |
276 | manual, with the intent of later keeping up with updates, is a major | |
277 | undertaking in itself, generally. | |
278 | ||
279 | </UL> | |
280 | ||
281 | <P> | |
282 | As we already stressed, translation is only one aspect of locales. | |
283 | Other internationalization aspects are not currently handled by GNU | |
284 | <CODE>gettext</CODE>, but perhaps may be handled in future versions. There | |
285 | are many attributes that are needed to define a country's cultural | |
286 | conventions. These attributes include, besides the country's native | |
287 | language, the formatting of the date and time, the representation of | |
288 | numbers, the symbols for currency, etc. These local <STRONG>rules</STRONG> are | |
289 | termed the country's locale. The locale represents the knowledge | |
290 | needed to support the country's native attributes. | |
291 | ||
292 | </P> | |
293 | <P> | |
294 | There are a few major areas which may vary between countries and | |
295 | hence, define what a locale must describe. The following list helps | |
296 | put multi-lingual messages into the proper context of other tasks | |
297 | related to locales, and also presents some other areas which GNU | |
298 | <CODE>gettext</CODE> might eventually tackle, maybe, one of these days. | |
299 | ||
300 | </P> | |
301 | <DL COMPACT> | |
302 | ||
303 | <DT><EM>Characters and Codesets</EM> | |
304 | <DD> | |
305 | The codeset most commonly used throughout the USA and most English | |
306 | speaking parts of the world is the ASCII codeset. However, there are | |
307 | many characters needed by various locales that are not found within | |
308 | this codeset. The 8-bit ISO 8859-1 code set has most of the special | |
309 | characters needed to handle the major European languages. However, in | |
310 | many cases, the ISO 8859-1 codeset is not adequate. Hence each locale | |
311 | will need to specify which codeset it needs to use and will need | |
312 | to have the appropriate character handling routines to cope with | |
313 | the codeset. | |
314 | ||
315 | <DT><EM>Currency</EM> | |
316 | <DD> | |
317 | The symbols used vary from country to country as does the position | |
318 | used by the symbol. Software needs to be able to transparently | |
319 | display currency figures in the native mode for each locale. | |
320 | ||
321 | <DT><EM>Dates</EM> | |
322 | <DD> | |
323 | The format of date varies between locales. For example, Christmas day | |
324 | in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia. | |
325 | Other countries might use ISO 8601 dates, etc. | |
326 | ||
327 | Time of the day may be noted as <VAR>hh</VAR>:<VAR>mm</VAR>, <VAR>hh</VAR>.<VAR>mm</VAR>, | |
328 | or otherwise. Some locales require time to be specified in 24-hour | |
329 | mode rather than as AM or PM. Further, the nature and yearly extent | |
330 | of the Daylight Saving correction vary widely between countries. | |
331 | ||
332 | <DT><EM>Numbers</EM> | |
333 | <DD> | |
334 | Numbers can be represented differently in different locales. | |
335 | For example, the following numbers are all written correctly for | |
336 | their respective locales: | |
337 | ||
338 | ||
339 | <PRE> | |
340 | 12,345.67 English | |
341 | 12.345,67 French | |
342 | 1,2345.67 Asia | |
343 | </PRE> | |
344 | ||
345 | Some programs could go further and use different unit systems, like | |
346 | English units or Metric units, or even take into account variants | |
347 | about how numbers are spelled in full. | |
348 | ||
349 | <DT><EM>Messages</EM> | |
350 | <DD> | |
351 | The most obvious area is the language support within a locale. This is | |
352 | where GNU <CODE>gettext</CODE> provides a means for developers and users to | |
353 | easily change the language that the software uses to communicate with | |
354 | the user. | |
355 | ||
356 | </DL> | |
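<P>
The date conventions listed above can be illustrated with the standard C
<CODE>strftime</CODE> function. This is only a sketch: the helper name
<CODE>format_date</CODE> and the hard-coded patterns are assumptions made for
the example. A program relying on the system locale would more likely call
<CODE>setlocale</CODE> and use the <CODE>%x</CODE> conversion, which selects
the date representation of the current locale.
</P>

```c
#include <stddef.h>
#include <time.h>

/* Format the calendar date YEAR-MONTH-DAY into `out' (capacity `cap'
   bytes) using an explicit strftime pattern.  Returns the number of
   characters written, or 0 if the result does not fit.
   (Hypothetical helper, for illustration only.)  */
size_t
format_date (char *out, size_t cap, const char *pattern,
             int year, int month, int day)
{
  struct tm tm = { 0 };

  tm.tm_year = year - 1900;     /* struct tm counts years from 1900 */
  tm.tm_mon = month - 1;        /* and months from 0 */
  tm.tm_mday = day;

  return strftime (out, cap, pattern, &tm);
}
```

<P>
With these assumptions, Christmas day in 1994 comes out as
<SAMP>`12/25/94'</SAMP> for the pattern <CODE>"%m/%d/%y"</CODE>, as
<SAMP>`25/12/94'</SAMP> for <CODE>"%d/%m/%y"</CODE>, and as
<SAMP>`1994-12-25'</SAMP> for the ISO 8601 style pattern <CODE>"%Y-%m-%d"</CODE>.
</P>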
357 | ||
358 | <P> | |
359 | In the near future we see no chance that, besides message handling, | |
360 | more components of locale will be made available for use in other | |
361 | GNU packages. The reason for this is that most modern systems provide | |
362 | more or less reasonable support for at least some of the missing | |
363 | components. Another point is that the GNU libc and Linux will get | |
364 | a new and complete implementation of the whole locale functionality | |
365 | which could be adopted by systems lacking reasonable locale support. | |
366 | ||
367 | </P> | |
368 | ||
369 | ||
370 | <H2><A NAME="SEC5" HREF="gettext_toc.html#TOC5">Files Conveying Translations</A></H2> | |
371 | ||
372 | <P> | |
373 | The letters PO in <TT>`.po'</TT> files stand for Portable Object, to | |
374 | distinguish them from <TT>`.mo'</TT> files, where MO stands for Machine | |
375 | Object. This paradigm, as well as the PO file format, is inspired | |
376 | by the NLS standard developed by Uniforum, and implemented by Sun | |
377 | in their Solaris system. | |
378 | ||
379 | </P> | |
380 | <P> | |
381 | PO files are meant to be read and edited by humans, and associate each | |
382 | original, translatable string of a given package with its translation | |
383 | in a particular target language. A single PO file is dedicated to | |
384 | a single target language. If a package supports many languages, | |
385 | there is one such PO file per language supported, and each package | |
386 | has its own set of PO files. These PO files are best created by | |
387 | the <CODE>xgettext</CODE> program, and later updated or refreshed through | |
388 | the <CODE>tupdate</CODE> program. Program <CODE>xgettext</CODE> extracts all | |
389 | marked messages from a set of C files and initializes a PO file with | |
390 | empty translations. Program <CODE>tupdate</CODE> takes care of adjusting | |
391 | PO files between releases of the corresponding sources, commenting | |
392 | obsolete entries, initializing new ones, and updating all source | |
393 | line references. Files ending with <TT>`.pot'</TT> are a kind of base | |
394 | translation file found in distributions, in PO file format, and | |
395 | <TT>`.pox'</TT> files are often temporary PO files. | |
396 | ||
397 | </P> | |
398 | <P> | |
399 | MO files are meant to be read by programs, and are binary in nature. | |
400 | A few systems already offer tools for creating and handling MO files | |
401 | as part of the Native Language Support coming with the system, but the | |
402 | format of these MO files is often different from system to system, | |
403 | and non-portable. They do not necessarily use <TT>`.mo'</TT> for file | |
404 | extensions, but since system libraries are also used for accessing | |
405 | these files, it works as long as the system is self-consistent about | |
406 | it. If GNU <CODE>gettext</CODE> is able to interface with the tools already | |
407 | provided with systems, it will consequently let these provided tools | |
408 | take care of generating the MO files. Or else, if such tools are not | |
409 | found or do not seem usable, GNU <CODE>gettext</CODE> will use its own ways | |
410 | and its own format for MO files. Files ending with <TT>`.gmo'</TT> are | |
411 | really MO files, when it is known that these files use the GNU format. | |
412 | ||
413 | </P> | |
414 | ||
415 | ||
416 | <H2><A NAME="SEC6" HREF="gettext_toc.html#TOC6">Overview of GNU <CODE>gettext</CODE></A></H2> | |
417 | ||
418 | <P> | |
419 | The following diagram summarizes the relation between the files | |
420 | handled by GNU <CODE>gettext</CODE> and the tools acting on these files. | |
421 | It is followed by a somewhat detailed explanation, which you should | |
422 | read while keeping an eye on the diagram. Having a clear understanding | |
423 | of these interrelations would surely help programmers, translators | |
424 | and maintainers. | |
425 | ||
426 | </P> | |
427 | ||
428 | <PRE> | |
429 | Original C Sources ---> PO mode ---> Marked C Sources ---. | |
430 | | | |
431 | .---------<--- GNU gettext Library | | |
432 | .--- make <---+ | | |
433 | | `---------<--------------------+-----------' | |
434 | | | | |
435 | | .-----<--- PACKAGE.pot <--- xgettext <---' .---<--- PO Compendium | |
436 | | | | ^ | |
437 | | | `---. | | |
438 | | `---. +---> PO mode ---. | |
439 | | +----> tupdate -------> LANG.pox --->--------' | | |
440 | | .---' | | |
441 | | | | | |
442 | | `-------------<---------------. | | |
443 | | +--- LANG.po <--- New LANG.pox <----' | |
444 | | .--- LANG.gmo <--- msgfmt <---' | |
445 | | | | |
446 | | `---> install ---> /.../LANG/PACKAGE.mo ---. | |
447 | | +---> "Hello world!" | |
448 | `-------> install ---> /.../bin/PROGRAM -------' | |
449 | </PRE> | |
450 | ||
451 | <P> | |
452 | The indication <SAMP>`PO mode'</SAMP> appears in two places in this picture, | |
453 | and you may safely read it as merely meaning "hand editing", using | |
454 | any editor of your choice, really. However, for those of you being | |
455 | the lucky users of GNU Emacs, PO mode has been specifically created | |
456 | for providing a cosy environment for editing or modifying PO files. | |
457 | While editing a PO file, PO mode allows for the easy browsing of | |
458 | auxiliary and compendium PO files, as well as following references into | |
459 | the set of C program sources from which PO files have been derived. | |
460 | It has a few special features, among which the interactive marking | |
461 | of program strings as translatable, and the validation of PO files | |
462 | with easy repositioning to PO file lines showing errors. | |
463 | ||
464 | </P> | |
465 | <P> | |
466 | As a programmer, the first step in bringing GNU <CODE>gettext</CODE> | |
467 | into your package is identifying, right in the C sources, which | |
468 | strings are meant to be translatable, and which are untranslatable. | |
469 | This tedious job can be done a little more comfortably using PO | |
470 | mode, but you can use any means familiar to you for modifying your | |
471 | C sources. Some other simple, standard changes are also needed to | |
472 | properly initialize the translation library. See section <A HREF="gettext.html#SEC13">Preparing Program Sources</A>, for | |
473 | more information about all this. | |
474 | ||
475 | </P> | |
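<P>
As a rough illustration of what marked and initialized C sources may look
like, here is a minimal sketch using the <CODE>libintl</CODE> interface as
found on current systems. The <CODE>_</CODE> macro is a commonly used
shorthand, and the names <CODE>PACKAGE</CODE>, <CODE>LOCALEDIR</CODE>,
<CODE>init_i18n</CODE> and <CODE>greeting</CODE>, as well as their values,
are assumptions made for the example, not requirements of GNU
<CODE>gettext</CODE>.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* Shorthand marking a string as translatable and looking it up.  */
#define _(String) gettext (String)

/* Hypothetical values; a real package sets these at build time.  */
#define PACKAGE "example-pkg"
#define LOCALEDIR "/usr/share/locale"

/* Initialize the translation library once, early in the program.  */
void
init_i18n (void)
{
  setlocale (LC_ALL, "");              /* honor the user's environment */
  bindtextdomain (PACKAGE, LOCALEDIR); /* where the MO files are installed */
  textdomain (PACKAGE);                /* default catalog for gettext */
}

/* A marked string: the translation is returned when a catalog for the
   current locale is installed, the original string otherwise.  */
const char *
greeting (void)
{
  return _("Hello world!");
}
```

<P>
When no message catalog for <CODE>example-pkg</CODE> is installed,
<CODE>greeting ()</CODE> simply returns the original English string, so the
program keeps working before any translation exists.
</P>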
476 | <P> | |
477 | Once the C sources have been modified, the <CODE>xgettext</CODE> program | |
478 | is used to find and extract all translatable strings, and create an | |
479 | initial PO file out of all these. This <TT>`<VAR>package</VAR>.pot'</TT> file | |
480 | contains all original program strings and sets of pointers to | |
481 | exactly where in C sources each string is used, and all translations | |
482 | are set to empty. The letter <KBD>t</KBD> in <TT>`.pot'</TT> marks that this is | |
483 | a Template PO file, not yet oriented towards any particular language. | |
484 | See section <A HREF="gettext.html#SEC19">Invoking the <CODE>xgettext</CODE> Program</A>, for more details about how one calls the | |
485 | <CODE>xgettext</CODE> program. If you are <EM>really</EM> lazy, you might | |
486 | be interested in working a lot more right away, and preparing the | |
487 | whole distribution setup (see section <A HREF="gettext.html#SEC65">The Maintainer's View</A>). By doing so, you | |
488 | are spared typing the <CODE>xgettext</CODE> command yourself, as <CODE>make</CODE> | |
489 | should now generate the proper things automatically for you! | |
490 | ||
491 | </P> | |
492 | <P> | |
493 | The first time through, there is no <TT>`<VAR>lang</VAR>.po'</TT> yet, so the | |
494 | <CODE>tupdate</CODE> step may be skipped and replaced by a mere copy of | |
495 | <TT>`<VAR>package</VAR>.pot'</TT> to <TT>`<VAR>lang</VAR>.pox'</TT>, where <VAR>lang</VAR> | |
496 | represents the target language. | |
497 | ||
498 | </P> | |
499 | <P> | |
500 | Then comes the initial translation of messages. Translation in | |
501 | itself is a whole matter, still exclusively meant for humans, | |
502 | and whose complexity far overwhelms the level of this manual. | |
503 | Nevertheless, a few hints are given in some other chapter of this | |
504 | manual (see section <A HREF="gettext.html#SEC54">The Translator's View</A>). You will also find there indications | |
505 | about how to contact translating teams, or become part of them, | |
506 | for sharing your translating concerns with others who target the same | |
507 | native language. | |
508 | ||
509 | </P> | |
510 | <P> | |
511 | While adding the translated messages into the <TT>`<VAR>lang</VAR>.pox'</TT> | |
512 | PO file, if you do not have GNU Emacs handy, you are on your own | |
513 | for ensuring that you fully respect the PO file format, and quoting | |
514 | conventions (see section <A HREF="gettext.html#SEC9">The Format of PO Files</A>). This is surely not an impossible task, | |
515 | as this is the way many people handled PO files already for Uniforum or | |
516 | Solaris. On the other hand, using PO mode in GNU Emacs, most details | |
517 | of PO file format are taken care of for you, but you have to acquire | |
518 | some familiarity with PO mode itself. Besides main PO mode commands | |
519 | (see section <A HREF="gettext.html#SEC10">Main Commands</A>), you should know how to move between entries | |
520 | (see section <A HREF="gettext.html#SEC11">Entry Positioning</A>), and how to handle untranslated entries | |
521 | (see section <A HREF="gettext.html#SEC24">Untranslated Entries</A>). | |
522 | ||
523 | </P> | |
524 | <P> | |
525 | If some common translations have already been saved into a compendium | |
526 | PO file, translators may use PO mode for initializing untranslated | |
527 | entries from the compendium, and also save selected translations into | |
528 | the compendium, updating it (see section <A HREF="gettext.html#SEC21">Using Translation Compendiums</A>). Compendium files | |
529 | are meant to be exchanged between members of a given translation team. | |
530 | ||
531 | </P> | |
532 | <P> | |
533 | Programs, or packages of programs, are dynamic in nature: users write | |
534 | bug reports and suggestions for improvement, maintainers react by | |
535 | modifying programs in various ways. The fact that a package has | |
536 | already been internationalized should not make maintainers shy | |
537 | of adding new strings, or modifying strings already translated. | |
538 | They just do their job the best they can. For the GNU Translation | |
539 | Project to work smoothly, it is important that maintainers do not | |
540 | carry translation concerns on their already loaded shoulders, and that | |
541 | translators be kept as free as possible of programmatic concerns. | |
542 | ||
543 | </P> | |
544 | <P> | |
545 | The only concern maintainers should have is carefully marking new | |
546 | strings as translatable, when they should be; they need not otherwise | |
547 | worry about them being translated, as this will come in proper time. | |
548 | Consequently, when programs and their strings are adjusted in various | |
549 | ways by maintainers, and for matters usually unrelated to translation, | |
550 | <CODE>xgettext</CODE> would construct <TT>`<VAR>package</VAR>.pot'</TT> files which are | |
551 | evolving over time, so the translations carried by <TT>`<VAR>lang</VAR>.po'</TT> | |
552 | are slowly fading out of date. | |
553 | ||
554 | </P> | |
555 | <P> | |
556 | It is important for translators (and even maintainers) to understand | |
557 | that package translation is a continuous process in the lifetime of a | |
558 | package, and not something which is done once and for all at the start. | |
559 | After an initial burst of translation activity for a given package, | |
560 | interventions are needed once in a while, because here and there, | |
561 | translated entries become obsolete, and new untranslated entries | |
562 | appear, needing translation. | |
563 | ||
564 | </P> | |
565 | <P> | |
566 | The <CODE>tupdate</CODE> program has the purpose of refreshing an already | |
567 | existing <TT>`<VAR>lang</VAR>.po'</TT> file, by comparing it with a newer | |
568 | <TT>`<VAR>package</VAR>.pot'</TT> template file, extracted by <CODE>xgettext</CODE> | |
569 | out of recent C sources. The refreshing operation adjusts all | |
570 | references to C source locations for strings, since these strings | |
571 | move as programs are modified. Also, <CODE>tupdate</CODE> comments out as | |
572 | obsolete, in <TT>`<VAR>lang</VAR>.pox'</TT>, those already translated entries | |
573 | which are no longer used in the program sources (see section <A HREF="gettext.html#SEC25">Obsolete Entries</A>). It finally discovers new strings and inserts them in | |
574 | the resulting PO file as untranslated entries (see section <A HREF="gettext.html#SEC24">Untranslated Entries</A>). See section <A HREF="gettext.html#SEC23">Invoking the <CODE>tupdate</CODE> Program</A>, for more information about what | |
575 | <CODE>tupdate</CODE> really does. | |
576 | ||
577 | </P> | |
578 | <P> | |
579 | Whatever route or means taken, the goal is obtaining an updated | |
580 | <TT>`<VAR>lang</VAR>.pox'</TT> file offering translations for all strings. | |
581 | When this is properly achieved, this file <TT>`<VAR>lang</VAR>.pox'</TT> may | |
582 | take the place of the previous official <TT>`<VAR>lang</VAR>.po'</TT> file. | |
583 | ||
584 | </P> | |
585 | <P> | |
586 | The time mobility, or fluidity of PO files, is an integral part of | |
587 | the translation game, and should be well understood, and accepted. | |
588 | People resisting it will have a hard time participating in the GNU | |
589 | Translation Project, or will give a hard time to other participants! | |
590 | In particular, maintainers should relax and include all available PO | |
591 | files in their distributions, even if these have not recently been | |
592 | updated, without banging or otherwise trying to exert pressure on the | |
593 | translator teams to get the job done. The pressure should rather | |
594 | come from the community of users speaking a particular language, | |
595 | and maintainers should consider themselves fairly relieved of any | |
596 | concern about the adequacy of translation files. On the other hand, | |
597 | translators should reasonably try updating the PO files they are | |
598 | responsible for, while the package is undergoing pretest, prior to | |
599 | an official distribution. | |
600 | ||
601 | </P> | |
602 | <P> | |
603 | Once the PO file is complete and dependable, the <CODE>msgfmt</CODE> program | |
604 | is used for turning the PO file into a machine-oriented format, which | |
605 | may yield efficient retrieval of translations by the programs of the | |
606 | package, whenever needed at runtime (see section <A HREF="gettext.html#SEC31">The Format of GNU MO Files</A>). See section <A HREF="gettext.html#SEC30">Invoking the <CODE>msgfmt</CODE> Program</A>, for more information about all modalities of execution | |
607 | for the <CODE>msgfmt</CODE> program. | |
608 | ||
609 | </P> | |
610 | <P> | |
611 | Finally, the modified and marked C sources are compiled and linked | |
612 | with the GNU <CODE>gettext</CODE> library, usually through the operation of | |
613 | <CODE>make</CODE>, given a suitable <TT>`Makefile'</TT> exists for the project, | |
614 | and the resulting executable is installed somewhere users will find it. | |
615 | The MO files themselves should also be properly installed. Given the | |
616 | appropriate environment variables are set (see section <A HREF="gettext.html#SEC35">Magic for End Users</A>), the | |
617 | program should localize itself automatically, whenever it executes. | |
618 | ||
619 | </P> | |
620 | <P> | |
621 | The remainder of this manual elaborates on the various | |
622 | steps outlined in this section. | |
623 | ||
624 | </P> | |
625 | ||
626 | ||
627 | <H1><A NAME="SEC7" HREF="gettext_toc.html#TOC7">PO Files and PO Mode Basics</A></H1> | |
628 | ||
629 | <P> | |
630 | The GNU <CODE>gettext</CODE> toolset helps programmers and translators | |
631 | in producing, updating and using translation files, mainly those | |
632 | PO files which are textual, editable files. This chapter focuses | |
633 | on the format of PO files, and contains a PO mode starter. PO mode | |
634 | description is spread over this manual instead of being concentrated | |
635 | in one place; this chapter presents only the basics of PO mode. | |
636 | ||
637 | </P> | |
638 | ||
639 | ||
640 | ||
641 | <H2><A NAME="SEC8" HREF="gettext_toc.html#TOC8">Completing GNU <CODE>gettext</CODE> Installation</A></H2> | |
642 | ||
643 | <P> | |
644 | Once you have received, unpacked, configured and compiled the GNU | |
645 | <CODE>gettext</CODE> distribution, the <SAMP>`make install'</SAMP> command puts in | |
646 | place the programs <CODE>xgettext</CODE>, <CODE>msgfmt</CODE>, <CODE>gettext</CODE>, and | |
647 | <CODE>tupdate</CODE>, as well as their available message catalogs. For | |
648 | completing a comfortable installation, you might also want to make the | |
649 | PO mode available to your GNU Emacs users. | |
650 | ||
651 | </P> | |
652 | <P> | |
653 | To finish the installation of the PO mode, you might want to modify your | |
654 | file <TT>`.emacs'</TT>, once and for all, so it contains a few lines looking | |
655 | like: | |
656 | ||
657 | </P> | |
658 | ||
659 | <PRE> | |
660 | (setq auto-mode-alist | |
661 | (cons '("\\.pox?\\'" . po-mode) auto-mode-alist)) | |
662 | (autoload 'po-mode "po-mode") | |
663 | </PRE> | |
664 | ||
665 | <P> | |
666 | Later, whenever you edit some <TT>`.po'</TT> or <TT>`.pox'</TT> file, Emacs | |
667 | loads <TT>`po-mode.elc'</TT> (or <TT>`po-mode.el'</TT>) as needed, and | |
668 | automatically activates PO mode commands for the associated buffer. | |
669 | The string <EM>PO</EM> appears in the mode line for any buffer for | |
670 | which PO mode is active. Many PO files may be active at once in a | |
671 | single Emacs session. | |
672 | ||
673 | </P> | |
674 | ||
675 | ||
676 | <H2><A NAME="SEC9" HREF="gettext_toc.html#TOC9">The Format of PO Files</A></H2> | |
677 | ||
678 | <P> | |
679 | A PO file is made up of many entries, each entry holding the relation | |
680 | between an original untranslated string and its corresponding | |
681 | translation. All entries in a given PO file usually pertain | |
682 | to a single project, and all translations are expressed in a single | |
683 | target language. One PO file <STRONG>entry</STRONG> has the following schematic | |
684 | structure: | |
685 | ||
686 | </P> | |
687 | ||
688 | <PRE> | |
689 | <VAR>white-space</VAR> | |
690 | # <VAR>translator-comments</VAR> | |
691 | #. <VAR>automatic-comments</VAR> | |
692 | #: <VAR>reference</VAR>... | |
693 | msgid <VAR>untranslated-string</VAR> | |
694 | msgstr <VAR>translated-string</VAR> | |
695 | </PRE> | |
696 | ||
697 | <P> | |
698 | The general structure of a PO file should be well understood by | |
699 | the translator. When using PO mode, very little has to be known | |
700 | about the format details, as PO mode takes care of them for her. | |
701 | ||
702 | </P> | |
703 | <P> | |
704 | Entries begin with some optional white space. Usually, when generated | |
705 | through GNU <CODE>gettext</CODE> tools, there is exactly one blank line | |
706 | between entries. Then comments follow, on lines all starting with the | |
707 | character <KBD>#</KBD>. There are two kinds of comments: those which have | |
708 | some white space immediately following the <KBD>#</KBD>, which comments are | |
709 | created and maintained exclusively by the translator, and those which | |
710 | have some non-white character just after the <KBD>#</KBD>, which comments | |
711 | are created and maintained automatically by GNU <CODE>gettext</CODE> tools. | |
712 | All comments, of any kind, are optional. | |
713 | ||
714 | </P> | |
715 | <P> | |
716 | After white space and comments, entries show two strings, giving | |
717 | first the untranslated string as it appears in the original program | |
718 | sources, and then, the translation of this string. The original | |
719 | string is introduced by the keyword <CODE>msgid</CODE>, and the translation, | |
720 | by <CODE>msgstr</CODE>. The two strings, untranslated and translated, | |
721 | are quoted in various ways in the PO file, using <KBD>"</KBD> | |
722 | delimiters and <KBD>\</KBD> escapes, but the translator does not really | |
723 | have to pay attention to the precise quoting format, as PO mode fully | |
724 | intends to take care of quoting for her. | |
725 | ||
726 | </P> | |
727 | <P> | |
728 | The <CODE>msgid</CODE> strings, as well as automatic comments, are produced | |
729 | and managed by other GNU <CODE>gettext</CODE> tools, and PO mode does not | |
730 | provide means for the translator to alter these. The most she can | |
731 | do is merely deleting them, and only by deleting the whole entry. | |
732 | On the other hand, the <CODE>msgstr</CODE> string, as well as translator | |
733 | comments, are really meant for the translator, and PO mode gives her | |
734 | the full control she needs. | |
735 | ||
736 | </P> | |
737 | <P> | |
738 | It happens that some lines, usually whitespace or comments, follow the | |
739 | very last entry of a PO file. Such lines are not part of any entry, | |
740 | and PO mode is unable to take action on those lines. By using the | |
741 | PO mode function <KBD>M-x po-normalize</KBD>, the translator may get | |
742 | rid of those spurious lines. See section <A HREF="gettext.html#SEC12">Normalizing Strings in Entries</A>. | |
743 | ||
744 | </P> | |
745 | <P> | |
746 | The remainder of this section may be safely skipped for those using | |
747 | PO mode, yet it may be interesting for everybody to have a better | |
748 | idea of the precise format of a PO file. On the other hand, those | |
749 | not having GNU Emacs handy should carefully continue reading on. | |
750 | ||
751 | </P> | |
752 | <P> | |
753 | Each of <VAR>untranslated-string</VAR> and <VAR>translated-string</VAR> respects | |
754 | the C syntax for a character string, including the surrounding quotes | |
755 | and embedded backslashed escape sequences. When the time comes | |
756 | to write multi-line strings, one should not use escaped newlines. | |
757 | Instead, a closing quote should follow the last character on the | |
758 | line to be continued, and an opening quote should resume the string | |
759 | at the beginning of the following PO file line. For example: | |
760 | ||
761 | </P> | |
762 | ||
763 | <PRE> | |
764 | msgid "" | |
765 | "Here is an example of how one might continue a very long string\n" | |
766 | "for the common case the string represents multi-line output.\n" | |
767 | </PRE> | |
768 | ||
769 | <P> | |
770 | In this example, the empty string is used on the first line, for | |
771 | allowing the better alignment of the <KBD>H</KBD> from the word <SAMP>`Here'</SAMP> | |
772 | over the <KBD>f</KBD> from the word <SAMP>`for'</SAMP>. In this example, the | |
773 | <CODE>msgid</CODE> keyword is followed by three strings, which are meant | |
774 | to be concatenated. Concatenating the empty string does not change | |
775 | the resulting overall string, but it is a way for us to comply with | |
776 | the necessity of <CODE>msgid</CODE> to be followed by a string on the same | |
777 | line, while keeping the multi-line presentation left-justified, as | |
778 | we find this to be a cleaner disposition. The empty string could have | |
779 | been omitted, but only if the string starting with <SAMP>`Here'</SAMP> was | |
780 | promoted on the first line, right after <CODE>msgid</CODE>.<A NAME="DOCF1" HREF="gettext_foot.html#FOOT1">(1)</A> It was not really necessary | |
781 | either to switch between the two last quoted strings immediately after | |
782 | the newline <SAMP>`\n'</SAMP>; the switch could have occurred after <EM>any</EM> | |
783 | other character. We just did it this way because it is neater. | |
784 | ||
785 | </P> | |
786 | <P> | |
787 | One should carefully distinguish between end of lines marked as | |
788 | <SAMP>`\n'</SAMP> <EM>inside</EM> quotes, which are part of the represented | |
789 | string, and end of lines in the PO file itself, outside string quotes, | |
790 | which have no effect on the represented string. | |
791 | ||
792 | </P> | |
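<P>
As a rough illustration of the concatenation rule described above, the
following C sketch reuses the same three string pieces. A C compiler joins
adjacent string literals in the same way a PO file reader is expected to join
them, so line breaks between the quoted pieces add nothing to the string,
while each <SAMP>`\n'</SAMP> inside quotes does. The function names
<CODE>po_style_pieces</CODE> and <CODE>count_newlines</CODE> are made up for
this example and are not part of GNU <CODE>gettext</CODE>.
</P>

```c
#include <stddef.h>

/* The same pieces as in the PO example: an empty string followed by two
   quoted lines.  Adjacent literals are concatenated by the compiler. */
const char *po_style_pieces(void)
{
    return ""
           "Here is an example of how one might continue a very long string\n"
           "for the common case the string represents multi-line output.\n";
}

/* Count the newline characters that are really part of the string. */
size_t count_newlines(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (*s == '\n')
            n++;
    return n;
}
```

<P>
Calling <CODE>count_newlines (po_style_pieces ())</CODE> counts only the two
newlines written as <SAMP>`\n'</SAMP> inside quotes; the leading empty string
contributes no characters at all.
</P>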
793 | <P> | |
794 | Outside strings, white lines and comments may be used freely. | |
795 | Comments start at the beginning of a line with <SAMP>`#'</SAMP> and extend | |
796 | until the end of the PO file line. Comments written by translators | |
797 | should have the initial <SAMP>`#'</SAMP> immediately followed by some white | |
798 | space. If the <SAMP>`#'</SAMP> is not immediately followed by white space, | |
799 | this comment is most likely generated and managed by specialized GNU | |
800 | tools, and might disappear or be replaced unexpectedly when the PO | |
801 | file is given to <CODE>tupdate</CODE>. | |
802 | ||
803 | </P> | |
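<P>
For those handling PO files without PO mode, the distinction between the two
kinds of comments can be sketched in C as follows. This is only a minimal
sketch: the type and function names are hypothetical, and treating a lone
<SAMP>`#'</SAMP> as a translator comment is an assumption of this example,
not something this manual specifies.
</P>

```c
#include <ctype.h>

enum po_comment_kind {
    PO_NOT_COMMENT,         /* line does not start with '#' */
    PO_TRANSLATOR_COMMENT,  /* '#' followed by white space */
    PO_AUTOMATIC_COMMENT    /* '#' followed by a non-white character */
};

/* Classify one PO file line by looking at its first two characters. */
enum po_comment_kind classify_po_line(const char *line)
{
    if (line[0] != '#')
        return PO_NOT_COMMENT;
    /* Assumption: a bare "#" is counted as belonging to the translator. */
    if (line[1] == '\0' || isspace((unsigned char) line[1]))
        return PO_TRANSLATOR_COMMENT;
    return PO_AUTOMATIC_COMMENT;
}
```

<P>
A tool built along these lines would leave translator comments alone and feel
free to regenerate the automatic ones.
</P>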
804 | ||
805 | ||
806 | <H2><A NAME="SEC10" HREF="gettext_toc.html#TOC10">Main Commands</A></H2> | |
807 | ||
808 | <P> | |
809 | When Emacs finds a PO file in a window, PO mode is activated | |
810 | for that window. This makes the window read-only and establishes a | |
811 | po-mode-map. PO mode is a genuine Emacs mode, in the sense that it is | |
812 | not derived from text mode in any way. | |
813 | ||
814 | </P> | |
815 | <P> | |
816 | The main PO commands are those which do not fit in the other categories in | |
817 | subsequent sections; they allow for quitting PO mode or managing windows | |
818 | in special ways. | |
819 | ||
820 | </P> | |
821 | <DL COMPACT> | |
822 | ||
823 | <DT><KBD>u</KBD> | |
824 | <DD> | |
825 | Undo last modification to the PO file. | |
826 | ||
827 | <DT><KBD>q</KBD> | |
828 | <DD> | |
829 | Quit processing and save the PO file. | |
830 | ||
831 | <DT><KBD>o</KBD> | |
832 | <DD> | |
833 | Temporarily leave the PO file window. | |
834 | ||
835 | <DT><KBD>h</KBD> | |
836 | <DD> | |
837 | Show help about PO mode. | |
838 | ||
839 | <DT><KBD>=</KBD> | |
840 | <DD> | |
841 | Give some PO file statistics. | |
842 | ||
843 | <DT><KBD>v</KBD> | |
844 | <DD> | |
845 | Batch validate the format of the whole PO file. | |
846 | ||
847 | </DL> | |
848 | ||
849 | <P> | |
850 | The command <KBD>u</KBD> (<CODE>po-undo</CODE>) interfaces to the GNU Emacs | |
851 | <EM>undo</EM> facility. See section `Undoing Changes' in <CITE>The Emacs Editor</CITE>. Each time <KBD>u</KBD> is typed, modifications the translator | |
852 | did to the PO file are undone a little more. For the purpose of | |
853 | undoing, each PO mode command is atomic. This is especially true for | |
854 | the <KBD><KBD>RET</KBD></KBD> command: the whole edit made by a single | |
855 | use of this command is undone at once, even if the edit itself | |
856 | implied several actions. However, while in the editing window, one | |
857 | can undo the editing work quite parsimoniously. | |
858 | ||
859 | </P> | |
860 | <P> | |
861 | The command <KBD>q</KBD> (<CODE>po-quit</CODE>) is used when the translator is | |
862 | done with the PO file. If the file has been modified, it is saved | |
863 | on disk first. However, prior to all this, the command checks if | |
864 | some untranslated message remains in the PO file and, if yes, the | |
865 | translator is asked if she really wants to leave working with this | |
866 | PO file. This is the preferred way of getting rid of an Emacs PO | |
867 | file buffer. Merely killing it through the usual command <KBD>C-x | |
868 | k</KBD> (<CODE>kill-buffer</CODE>), say, has the unpleasant effect of leaving a PO | |
869 | internal work buffer behind. | |
870 | ||
871 | </P> | |
872 | <P> | |
873 | The command <KBD>o</KBD> (<CODE>po-other-window</CODE>) is another, softer | |
874 | way to leave PO mode, temporarily. It just moves the cursor to | |
875 | some other Emacs window, and pops one up if necessary. For example, if | |
876 | the translator just got PO mode to show some source context in some | |
877 | other window, she might discover some apparent bug in the program source | |
878 | that needs correction. This command allows the translator to change | |
879 | sex, become a programmer, and have the cursor right into the window | |
880 | containing the program she (or rather <EM>he</EM>) wants to modify. | |
881 | By later getting the cursor back in the PO file window, or by | |
882 | asking Emacs to edit this file once again, PO mode is then recovered. | |
883 | ||
884 | </P> | |
885 | <P> | |
886 | The command <KBD>h</KBD> (<CODE>po-help</CODE>) displays a summary of all | |
887 | available PO mode commands. The translator should then type any | |
888 | character to resume normal PO mode operations. The command <KBD>?</KBD> | |
889 | has the same effect as <KBD>h</KBD>. | |
890 | ||
891 | </P> | |
892 | <P> | |
893 | The command <KBD>=</KBD> (<CODE>po-statistics</CODE>) computes the total number | |
894 | of entries in the PO file, the ordinal of the current entry | |
895 | (counted from 1), the number of untranslated entries, the number of | |
896 | obsolete entries, and displays all these numbers. | |
897 | ||
898 | </P> | |
899 | <P> | |
900 | The command <KBD>v</KBD> (<CODE>po-validate</CODE>) launches <CODE>msgfmt</CODE> in | |
901 | verbose mode over the current PO file. This command first offers | |
902 | to save the current PO file on disk. The <CODE>msgfmt</CODE> tool, from | |
903 | GNU <CODE>gettext</CODE>, has the purpose of creating an MO file out of a | |
904 | PO file, and PO mode uses the features of this program for checking | |
905 | the overall format of a PO file, as well as all individual entries. | |
906 | ||
907 | </P> | |
908 | <P> | |
909 | The program <CODE>msgfmt</CODE> runs asynchronously with Emacs, so | |
910 | the translator regains control immediately while her PO file | |
911 | is being studied. Error output is collected in the GNU Emacs | |
912 | <SAMP>`*compilation*'</SAMP> buffer, displayed in another window. The regular | |
913 | GNU Emacs command <KBD>C-x`</KBD> (<CODE>next-error</CODE>), as well as other | |
914 | usual compile commands, allow the translator to reposition quickly to | |
915 | the offending parts of the PO file. Once the cursor is on the line in | |
916 | error, the translator may decide on any PO mode action which would | |
917 | help correct the error. | |
918 | ||
919 | </P> | |
920 | ||
921 | ||
922 | <H2><A NAME="SEC11" HREF="gettext_toc.html#TOC11">Entry Positioning</A></H2> | |
923 | ||
924 | <P> | |
925 | The cursor in a PO file window is almost always part of | |
926 | an entry. The only exceptions are the special case when the cursor | |
927 | is after the last entry in the file, or when the PO file is | |
928 | empty. The entry where the cursor is found to be is said to be the | |
929 | current entry. Many PO mode commands operate on the current entry, | |
930 | so moving the cursor does more than allowing the translator to browse | |
931 | the PO file; it also selects the entry on which commands operate. | |
932 | ||
933 | </P> | |
934 | <P> | |
935 | Some PO mode commands alter the position of the cursor in a specialized | |
936 | way. A few of those special-purpose positioning commands are described here; | |
937 | the others are described in following sections. | |
938 | ||
939 | </P> | |
940 | <DL COMPACT> | |
941 | ||
942 | <DT><KBD>.</KBD> | |
943 | <DD> | |
944 | Redisplay the current entry. | |
945 | ||
946 | <DT><KBD>n</KBD> | |
947 | <DD> | |
948 | <DT><KBD>SPC</KBD> | |
949 | <DD> | |
950 | Select the entry after the current one. | |
951 | ||
952 | <DT><KBD>p</KBD> | |
953 | <DD> | |
954 | <DT><KBD>DEL</KBD> | |
955 | <DD> | |
956 | Select the entry before the current one. | |
957 | ||
958 | <DT><KBD><</KBD> | |
959 | <DD> | |
960 | Select the first entry in the PO file. | |
961 | ||
962 | <DT><KBD>></KBD> | |
963 | <DD> | |
964 | Select the last entry in the PO file. | |
965 | ||
966 | <DT><KBD>m</KBD> | |
967 | <DD> | |
968 | Record the location of the current entry for later use. | |
969 | ||
970 | <DT><KBD>l</KBD> | |
971 | <DD> | |
972 | Return to a previously saved entry location. | |
973 | ||
974 | <DT><KBD>x</KBD> | |
975 | <DD> | |
976 | Exchange the current entry location with the previously saved one. | |
977 | ||
978 | </DL> | |
979 | ||
980 | <P> | |
981 | Any GNU Emacs command able to reposition the cursor may be used | |
982 | to select the current entry in PO mode, including commands which | |
983 | move by characters, lines, paragraphs, screens or pages, and search | |
984 | commands. However, there is a kind of standard way to display the | |
985 | current entry in PO mode, which usual GNU Emacs commands moving | |
986 | the cursor do not especially try to enforce. The command <KBD>.</KBD> | |
987 | (<CODE>po-current-entry</CODE>) has the sole purpose of redisplaying the | |
988 | current entry properly, after the current entry has been changed by | |
989 | means external to PO mode, or the Emacs screen otherwise altered. | |
990 | ||
991 | </P> | |
992 | <P> | |
993 | It is yet to be decided whether PO mode would help the translator, or otherwise | |
994 | irritate her, by forcing a more fixed window disposition while she | |
995 | is doing her work. We originally had quite precise ideas about | |
996 | how windows should behave, but on the other hand, anyone used to | |
997 | GNU Emacs is often happy to keep full control. Maybe a fixed window | |
998 | disposition might be offered as a PO mode option that the translator | |
999 | might activate or deactivate at will, so it could be offered on an | |
1000 | experimental basis. If nobody feels a real need for using it, or | |
1001 | a compulsion for writing it, we might as well drop this whole idea. | |
1002 | The incentive for doing it should come from translators rather than | |
1003 | programmers, as opinions from an experienced translator are surely | |
1004 | worth more to me than opinions from programmers <EM>thinking</EM> about | |
1005 | how <EM>others</EM> should do translation. | |
1006 | ||
1007 | </P> | |
1008 | <P> | |
1009 | The commands <KBD>n</KBD> (<CODE>po-next-entry</CODE>) and <KBD>p</KBD> | |
1010 | (<CODE>po-previous-entry</CODE>) move the cursor to the entry following, | |
1011 | or preceding, the current one. If <KBD>n</KBD> is given while the | |
1012 | cursor is on the last entry of the PO file, or if <KBD>p</KBD> | |
1013 | is given while the cursor is on the first entry, no move is done. | |
1014 | <KBD><KBD>SPC</KBD></KBD> and <KBD><KBD>DEL</KBD></KBD> are alternate keys for <KBD>n</KBD> and | |
1015 | <KBD>p</KBD>, respectively. | |
1016 | ||
1017 | </P> | |
1018 | <P> | |
1019 | The commands <KBD><</KBD> (<CODE>po-first-entry</CODE>) and <KBD>></KBD> | |
1020 | (<CODE>po-last-entry</CODE>) move the cursor to the first entry, or last | |
1021 | entry, of the PO file. When the cursor is located past the last | |
1022 | entry in a PO file, most PO mode commands will return an error saying | |
1023 | <SAMP>`After last entry'</SAMP>. However, the commands <KBD><</KBD> and <KBD>></KBD> | |
1024 | have the special property of being able to work even when the cursor | |
1025 | is not in some PO file entry, and you may use them for nicely | |
1026 | correcting this situation. But even these commands will fail on a | |
1027 | truly empty PO file. There are development plans for PO mode for it | |
1028 | to interactively fill an empty PO file from sources. See section <A HREF="gettext.html#SEC16">Marking Translatable Strings</A>. | |
1029 | ||
1030 | </P> | |
1031 | <P> | |
1032 | The translator may decide, before working at the translation of | |
1033 | a particular entry, that she needs to browse the remainder of the | |
1034 | PO file, maybe for finding the terminology or phraseology used | |
1035 | in related entries. She can of course use the standard Emacs idioms | |
1036 | for saving the current cursor location in some register, and use that | |
1037 | register for getting back, or else use the location ring. | |
1038 | ||
1039 | </P> | |
1040 | <P> | |
1041 | PO mode offers another approach, by which cursor locations may be saved | |
1042 | onto a special stack. The command <KBD>m</KBD> (<CODE>po-push-location</CODE>) | |
1043 | merely adds the location of current entry to the stack, pushing | |
1044 | the already saved locations under the new one. The command | |
1045 | <KBD>l</KBD> (<CODE>po-pop-location</CODE>) consumes the top stack element and | |
1046 | repositions the cursor to the entry associated with that top element. | |
1047 | This position is then lost, for the next <KBD>l</KBD> will move the cursor | |
1048 | to the previously saved location, and so on until no locations remain | |
1049 | on the stack. | |
1050 | ||
1051 | </P> | |
1052 | <P> | |
1053 | If the translator wants the position to be kept on the location stack, | |
1054 | maybe for taking a mere look at the entry associated with the top | |
1055 | element, then go elsewhere with the intent of getting back later, she | |
1056 | ought to use <KBD>m</KBD> immediately after <KBD>l</KBD>. | |
1057 | ||
1058 | </P> | |
1059 | <P> | |
1060 | The command <KBD>x</KBD> (<CODE>po-exchange-location</CODE>) simultaneously | |
1061 | repositions the cursor to the entry associated with the top element of | |
1062 | the stack of saved locations, and replaces that top element with the | |
1063 | location of the current entry before the move. Consequently, repeating | |
1064 | the <KBD>x</KBD> command toggles alternately between two entries. | |
1065 | For achieving this, the translator will position the cursor on the | |
1066 | first entry, use <KBD>m</KBD>, then position to the second entry, and | |
1067 | merely use <KBD>x</KBD> for making the switch. | |
1068 | ||
1069 | </P> | |
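<P>
The behaviour of <KBD>m</KBD>, <KBD>l</KBD> and <KBD>x</KBD> described above
can be modelled with a small stack. The C sketch below is only an
illustration of those semantics under simplifying assumptions: PO mode itself
is written in Emacs Lisp, and the structure, the fixed capacity and the
function names here are invented for the example. Entry locations are
represented as plain numbers.
</P>

```c
#include <stddef.h>

#define LOCATION_STACK_MAX 32   /* arbitrary capacity for this sketch */

struct location_stack {
    long entries[LOCATION_STACK_MAX];
    size_t depth;
};

/* Like `m': save the current entry location on top of the stack.
   Returns 0 on success, -1 if the stack is full. */
int push_location(struct location_stack *st, long current)
{
    if (st->depth == LOCATION_STACK_MAX)
        return -1;
    st->entries[st->depth++] = current;
    return 0;
}

/* Like `l': move the cursor to the top saved location and discard it.
   Returns -1 and leaves the cursor alone if nothing is saved. */
int pop_location(struct location_stack *st, long *cursor)
{
    if (st->depth == 0)
        return -1;
    *cursor = st->entries[--st->depth];
    return 0;
}

/* Like `x': swap the cursor with the top saved location. */
int exchange_location(struct location_stack *st, long *cursor)
{
    long saved;

    if (st->depth == 0)
        return -1;
    saved = st->entries[st->depth - 1];
    st->entries[st->depth - 1] = *cursor;
    *cursor = saved;
    return 0;
}
```

<P>
In this model, pushing once and then calling
<CODE>exchange_location</CODE> repeatedly alternates the cursor between two
locations, while the stack depth stays at one.
</P>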
1070 | ||
1071 | ||
1072 | <H2><A NAME="SEC12" HREF="gettext_toc.html#TOC12">Normalizing Strings in Entries</A></H2> | |
1073 | ||
1074 | <P> | |
1075 | There are many different ways for encoding a particular string into a | |
1076 | PO file entry, because there are so many different ways to split and | |
1077 | quote multi-line strings, and even, to represent special characters | |
1078 | by backslashed escape sequences. Some features of PO mode rely on | |
1079 | the ability for PO mode to scan an already existing PO file for a | |
1080 | particular string encoded into the <CODE>msgid</CODE> field of some entry. | |
1081 | Even if PO mode has internally all the built-in machinery for | |
1082 | implementing this recognition easily, doing it fast is technically | |
1083 | difficult. For facilitating a solution to this efficiency problem, | |
1084 | we decided on a canonical representation for strings. | |
1085 | ||
1086 | </P> | |
1087 | <P> | |
1088 | A conventional representation of strings in a PO file is currently | |
1089 | under discussion, and PO mode experiments with a canonical representation. | |
1090 | Having both <CODE>xgettext</CODE> and PO mode converging towards a uniform | |
1091 | way of representing equivalent strings would be useful, as the internal | |
1092 | normalization needed by PO mode could be automatically satisfied | |
1093 | when using <CODE>xgettext</CODE> from GNU <CODE>gettext</CODE>. An explicit | |
1094 | PO mode normalization should then be only necessary for PO files | |
1095 | imported from elsewhere, or for when the convention itself evolves. | |
1096 | ||
1097 | </P> | |
1098 | <P> | |
1099 | So, for achieving normalization of at least the strings of a given | |
1100 | PO file needing a canonical representation, the following PO mode | |
1101 | command is available: | |
1102 | ||
1103 | </P> | |
1104 | <DL COMPACT> | |
1105 | ||
1106 | <DT><KBD>M-x po-normalize</KBD> | |
1107 | <DD> | |
1108 | Tidy the whole PO file by making entries more uniform. | |
1109 | ||
1110 | </DL> | |
1111 | ||
1112 | <P> | |
1113 | The special command <KBD>M-x po-normalize</KBD>, which has no associated | |
1114 | keys, revises all entries, ensuring that strings of both original | |
1115 | and translated entries use uniform internal quoting in the PO file. | |
1116 | It also removes any crumb after the last entry. This command may be | |
1117 | useful for PO files freshly imported from elsewhere, or if we ever | |
1118 | improve on the canonical quoting format we use. This canonical format | |
1119 | is not only meant for getting cleaner PO files, but also for greatly | |
1120 | speeding up <CODE>msgid</CODE> string lookup for some other PO mode commands. | |
1121 | ||
1122 | </P> | |
1123 | <P> | |
1124 | <KBD>M-x po-normalize</KBD> presently makes three passes over the entries. | |
1125 | The first implements heuristics for converting PO files for GNU | |
1126 | <CODE>gettext</CODE> 0.6 and earlier, in which <CODE>msgid</CODE> and <CODE>msgstr</CODE> | |
1127 | fields were using K&R style C string syntax for multi-line strings. | |
1128 | These heuristics may fail for comments not related to obsolete | |
1129 | entries and ending with a backslash; they also depend on subsequent | |
1130 | passes for finalizing the proper commenting of continued lines for | |
1131 | obsolete entries. This first pass might disappear once all oldish PO | |
1132 | files have been adjusted. The second and third passes normalize | |
1133 | all <CODE>msgid</CODE> and <CODE>msgstr</CODE> strings respectively. They also | |
1134 | clean out those trailing backslashes used by XView's <CODE>msgfmt</CODE> | |
1135 | for continued lines. | |
1136 | ||
1137 | </P> | |
1138 | <P> | |
1139 | Having such an explicit normalizing command allows for importing PO | |
1140 | files from other sources, but also eases the evolution of the current | |
1141 | convention, evolution driven mostly by aesthetic concerns, as of now. | |
1142 | It is quite easy to make suggested adjustments at a later time, as the | |
1143 | normalizing command and, eventually, other GNU <CODE>gettext</CODE> tools | |
1144 | should greatly automate conformance. A description of the canonical | |
1145 | string format is given below, for the particular benefit of those not | |
1146 | having GNU Emacs handy, and who would nevertheless want to handcraft | |
1147 | their PO files in nice ways. | |
1148 | ||
1149 | </P> | |
1150 | <P> | |
1151 | Right now, in PO mode, strings are single line or multi-line. A string | |
1152 | goes multi-line if and only if it has <EM>embedded</EM> newlines, that | |
1153 | is, if it matches <SAMP>`[^\n]\n+[^\n]'</SAMP>. So, we would have: | |
1154 | ||
1155 | </P> | |
1156 | ||
1157 | <PRE> | |
1158 | msgstr "\n\nHello, world!\n\n\n" | |
1159 | </PRE> | |
1160 | ||
1161 | <P> | |
1162 | but, replacing the space by a newline, this becomes: | |
1163 | ||
1164 | </P> | |
1165 | ||
1166 | <PRE> | |
1167 | msgstr "" | |
1168 | "\n" | |
1169 | "\n" | |
1170 | "Hello,\n" | |
1171 | "world!\n" | |
1172 | "\n" | |
1173 | "\n" | |
1174 | </PRE> | |
1175 | ||
1176 | <P> | |
1177 | We are deliberately using a caricatural example, here, to make the | |
1178 | point clearer. Usually, multi-line strings do not look that bad. | |
1179 | It is probable that we will implement the following suggestion. | |
1180 | We might lump together all initial newlines into the empty string, | |
1181 | and also all newlines introducing empty lines (that is, for <VAR>n</VAR> | |
1182 | > 1, the <VAR>n</VAR>-1'th last newlines would go together on a separate | |
1183 | string), so making the previous example appear: | |
1184 | ||
1185 | </P> | |
1186 | ||
1187 | <PRE> | |
1188 | msgstr "\n\n" | |
1189 | "Hello,\n" | |
1190 | "world!\n" | |
1191 | "\n\n" | |
1192 | </PRE> | |
1193 | ||
1194 | <P> | |
1195 | There are a few yet undecided little points about string normalization, | |
1196 | to be documented in this manual, once these questions settle. | |
1197 | ||
1198 | </P> | |
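<P>
The rule stated earlier in this section, that a string goes multi-line if and
only if it matches <SAMP>`[^\n]\n+[^\n]'</SAMP>, can be checked without a
regular expression library. The following C function is a minimal sketch of
that test; its name is hypothetical and it is not taken from PO mode, which
is written in Emacs Lisp.
</P>

```c
#include <stdbool.h>

/* Return true when S contains a run of newlines that has a non-newline
   character both before and after it, that is, when S matches the
   pattern [^\n]\n+[^\n]. */
bool has_embedded_newline(const char *s)
{
    const char *p = s;

    while (*p != '\0') {
        if (*p == '\n' && p > s && p[-1] != '\n') {
            /* Start of a newline run preceded by ordinary text. */
            const char *q = p;

            while (*q == '\n')
                q++;
            if (*q != '\0')
                return true;    /* ordinary text follows the run */
            p = q;              /* the run reached the end of the string */
        } else {
            p++;
        }
    }
    return false;
}
```

<P>
With this definition, leading and trailing newlines alone never make a string
multi-line; only a newline sitting between two other characters does, which
agrees with the two <CODE>msgstr</CODE> examples shown above.
</P>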
1199 | ||
1200 | ||
1201 | <H1><A NAME="SEC13" HREF="gettext_toc.html#TOC13">Preparing Program Sources</A></H1> | |
1202 | ||
1203 | <P> | |
1204 | For the programmer, changes to the C source code fall into three | |
1205 | categories. First, you have to make the localization functions | |
1206 | known to all modules needing message translation. Second, you should | |
1207 | properly trigger the operation of GNU <CODE>gettext</CODE> when the program | |
1208 | initializes, usually from the <CODE>main</CODE> function. Last, you should | |
1209 | identify and especially mark all constant strings in your program | |
1210 | needing translation. | |
1211 | ||
1212 | </P> | |
1213 | <P> | |
1214 | Presuming that your set of programs, or package, has been adjusted | |
1215 | so all needed GNU <CODE>gettext</CODE> files are available, and your | |
1216 | <TT>`Makefile'</TT> files are adjusted (see section <A HREF="gettext.html#SEC65">The Maintainer's View</A>), each C module | |
1217 | having translated C strings should contain the line: | |
1218 | ||
1219 | </P> | |
1220 | ||
1221 | <PRE> | |
1222 | #include <libintl.h> | |
1223 | </PRE> | |
1224 | ||
1225 | <P> | |
1226 | The remaining changes to your C sources are discussed in the further | |
1227 | sections of this chapter. | |
1228 | ||
1229 | </P> | |
1230 | ||
1231 | ||
1232 | ||
1233 | <H2><A NAME="SEC14" HREF="gettext_toc.html#TOC14">Triggering <CODE>gettext</CODE> Operations</A></H2> | |
1234 | ||
1235 | <P> | |
1236 | The initialization of locale data should be done with more or less | |
1237 | the same code in every program, as demonstrated below: | |
1238 | ||
1239 | </P> | |
1240 | ||
1241 | <PRE> | |
1242 | int | |
1243 | main (argc, argv) | |
1244 | int argc; | |
1245 | char **argv; | |
1246 | { | |
1247 | ... | |
1248 | setlocale (LC_ALL, ""); | |
1249 | bindtextdomain (PACKAGE, LOCALEDIR); | |
1250 | textdomain (PACKAGE); | |
1251 | ... | |
1252 | } | |
1253 | </PRE> | |
1254 | ||
1255 | <P> | |
1256 | <VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> should be provided either by | |
1257 | <TT>`config.h'</TT> or by the Makefile. For now consult the <CODE>gettext</CODE> | |
1258 | sources for more information. | |
1259 | ||
1260 | </P> | |
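<P>
As a minimal sketch, the same initialization can be written in ANSI C
form. The fallback values given to <VAR>PACKAGE</VAR> and
<VAR>LOCALEDIR</VAR>, and the helper name <CODE>init_translation</CODE>,
are assumptions made for illustration only; a real package would get
both macros from <TT>`config.h'</TT> or the Makefile.
</P>

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical fallbacks, used only when the build system does not
   supply the real values.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Select the user's locale, tell gettext where the catalogs of PACKAGE
   live, and make PACKAGE the current domain.  Returns the name of the
   domain now in effect, or NULL on failure.  */
const char *
init_translation (void)
{
  setlocale (LC_ALL, "");
  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return NULL;
  return textdomain (PACKAGE);
}
```

<P>
A program would call <CODE>init_translation ()</CODE> once, early in
<CODE>main</CODE>, before producing any translated message.
</P>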
1261 | <P> | |
1262 | The use of <CODE>LC_ALL</CODE> might not be appropriate for you. | |
1263 | <CODE>LC_ALL</CODE> includes all locale categories and especially | |
1264 | <CODE>LC_CTYPE</CODE>. This latter category is responsible for determining | |
1265 | character classes with the <CODE>isalnum</CODE> etc. functions from | |
1266 | <TT>`ctype.h'</TT>, which could be wrong, especially for programs which process | |
1267 | some kind of input language. For example, this would mean that | |
1268 | source code using the character ç (c with cedilla) is runnable in | |
1269 | France but not in the U.S. | |
1270 | ||
1271 | </P> | |
1272 | <P> | |
1273 | So it is sometimes necessary to replace the <CODE>LC_ALL</CODE> line in the | |
1274 | code above by a sequence of <CODE>setlocale</CODE> lines | |
1275 | ||
1276 | </P> | |
1277 | ||
1278 | <PRE> | |
1279 | { | |
1280 | ... | |
1281 | setlocale (LC_TIME, ""); | |
1282 | setlocale (LC_MESSAGES, ""); | |
1283 | ... | |
1284 | } | |
1285 | </PRE> | |
1286 | ||
1287 | <P> | |
1288 | or to switch the locale category in question temporarily and then restore it. | |
1289 | ||
1290 | </P> | |
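<P>
Switching a category temporarily and restoring it afterwards might look
like the following sketch. The helper name
<CODE>is_identifier_char</CODE> is hypothetical. The current setting is
copied before the switch because the string returned by
<CODE>setlocale</CODE> may be overwritten by a later call.
</P>

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: classify C using the "C" locale rules,
   whatever LC_CTYPE is currently set to, then restore the previous
   setting.  Returns 1 for letters, digits and '_', else 0.  */
int
is_identifier_char (int c)
{
  const char *current = setlocale (LC_CTYPE, NULL);
  char *saved = NULL;
  int result;

  /* Keep a private copy of the current LC_CTYPE name.  */
  if (current != NULL)
    {
      saved = malloc (strlen (current) + 1);
      if (saved != NULL)
        strcpy (saved, current);
    }

  setlocale (LC_CTYPE, "C");
  result = isalnum ((unsigned char) c) || c == '_';

  /* Switch back; if the copy could not be made, LC_CTYPE stays "C".  */
  if (saved != NULL)
    {
      setlocale (LC_CTYPE, saved);
      free (saved);
    }
  return result != 0;
}
```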
1291 | ||
1292 | ||
1293 | <H2><A NAME="SEC15" HREF="gettext_toc.html#TOC15">How Marks Appear in Sources</A></H2> | |
1294 | ||
1295 | <P> | |
1296 | The C sources should mark all strings requiring translation. Marking | |
1297 | is done in such a way that each translatable string appears to be | |
1298 | the sole argument of some function or preprocessor macro. There are | |
1299 | only a few such possible functions or macros meant for translation, | |
1300 | and their names are said to be marking keywords. The marking is | |
1301 | attached to strings themselves, rather than to what we do with them. | |
1302 | This approach has more uses. A blatant example is an error message | |
1303 | produced by formatting. The format string needs translation, as | |
1304 | well as some strings inserted through some <SAMP>`%s'</SAMP> specification | |
1305 | in the format, while the result from <CODE>sprintf</CODE> may have so many | |
1306 | different instances that it is impractical to list them all in some | |
1307 | <SAMP>`error_string_out()'</SAMP> routine, say. | |
1308 | ||
1309 | </P> | |
1310 | <P> | |
1311 | This marking operation has two goals. The first goal of marking | |
1312 | is for triggering the retrieval of the translation, at run time. | |
1313 | The keywords are possibly resolved into a routine able to dynamically | |
1314 | return the proper translation, as far as possible or wanted, for the | |
1315 | argument string. Most localizable strings are found in executable | |
1316 | positions, that is, assigned to variables or given as parameters to | |
1317 | functions. But this is not universal usage, and some translatable | |
1318 | strings appear in structured initializations. See section <A HREF="gettext.html#SEC17">Special Cases of Translatable Strings</A>. | |
1319 | ||
1320 | </P> | |
1321 | <P> | |
1322 | The second goal of the marking operation is to help <CODE>xgettext</CODE> | |
1323 | properly extract all translatable strings when it scans a set | |
1324 | of program sources and produces PO file templates. | |
1325 | ||
1326 | </P> | |
1327 | <P> | |
1328 | The canonical keyword for marking translatable strings is | |
1329 | <SAMP>`gettext'</SAMP>; it gave its name to the whole GNU <CODE>gettext</CODE> | |
1330 | package. For packages making only light use of the <SAMP>`gettext'</SAMP> | |
1331 | keyword, macro or function, it is easily used <EM>as is</EM>. However, | |
1332 | for packages using the <CODE>gettext</CODE> interface more heavily, it | |
1333 | is usually more convenient to give the main keyword a shorter, less | |
1334 | obtrusive name. Indeed, the keyword might appear on a lot of strings | |
1335 | all over the package, and programmers usually neither want nor need | |
1336 | their program sources to remind them loudly, all the time, that they | |
1337 | are internationalized. Further, a long keyword has the disadvantage | |
1338 | of using more horizontal space, forcing more indentation work on | |
1339 | sources for those trying to keep them within 79 or 80 columns. | |
1340 | ||
1341 | </P> | |
1342 | <P> | |
1343 | Many GNU packages use <SAMP>`_'</SAMP> (a simple underline) as a keyword, | |
1344 | and write <SAMP>`_("Translatable string")'</SAMP> instead of <SAMP>`gettext | |
1345 | ("Translatable string")'</SAMP>. Further, the usual GNU coding rule | |
1346 | requiring a space between the keyword and the opening | |
1347 | parenthesis is relaxed, in practice, for this particular usage. | |
1348 | So, the textual overhead per translatable string is reduced to | |
1349 | only three characters: the underline and the two parentheses. | |
1350 | However, even if GNU <CODE>gettext</CODE> uses this convention internally, | |
1351 | it does not offer it officially. The real, genuine keyword is truly | |
1352 | <SAMP>`gettext'</SAMP> indeed. It is fairly easy for those wanting to use | |
1353 | <SAMP>`_'</SAMP> instead of <SAMP>`gettext'</SAMP> to declare: | |
1354 | ||
1355 | </P> | |
1356 | ||
1357 | <PRE> | |
1358 | #include <libintl.h> | |
1359 | #define _(String) gettext (String) | |
1360 | </PRE> | |
1361 | ||
1362 | <P> | |
1363 | instead of merely using <SAMP>`#include <libintl.h>'</SAMP>. | |
1364 | ||
1365 | </P> | |
1366 | <P> | |
1367 | Later on, the maintenance is relatively easy. If, as a programmer, | |
1368 | you add or modify a string, you will have to ask yourself if the | |
1369 | new or altered string requires translation, and include it within | |
1370 | <SAMP>`_()'</SAMP> if you think it should be translated. <SAMP>`"%s: %d"'</SAMP> is | |
1371 | an example of a string <EM>not</EM> requiring translation! | |
1372 | ||
1373 | </P> | |
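<P>
To make the distinction concrete, here is a small sketch in which only
the human-readable sentence is marked, while the purely structural
format is left alone. The function name
<CODE>format_open_error</CODE>, the message text and the exact format
are assumptions chosen for the example.
</P>

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Hypothetical helper: write "FILE:LINE: <message>" into BUF.
   Returns 0 on success, -1 if BUF is too small.  */
int
format_open_error (char *buf, size_t size, const char *file, int line)
{
  int used, more;

  /* "%s:%d: " contains no words, so it is not marked.  */
  used = snprintf (buf, size, "%s:%d: ", file, line);
  if (used < 0 || (size_t) used >= size)
    return -1;

  /* This sentence is read by the user, so it is marked with _().  */
  more = snprintf (buf + used, size - (size_t) used, "%s",
                   _("cannot open file"));
  if (more < 0 || (size_t) more >= size - (size_t) used)
    return -1;
  return 0;
}
```

<P>
When no translation is available for the message,
<CODE>gettext</CODE> returns its argument unchanged, so the program
still prints the original English text.
</P>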
1374 | ||
1375 | ||
1376 | <H2><A NAME="SEC16" HREF="gettext_toc.html#TOC16">Marking Translatable Strings</A></H2> | |
1377 | ||
1378 | <P> | |
1379 | In PO mode, one set of features is meant more for the programmer than | |
1380 | for the translator, and allows him to interactively mark which strings, | |
1381 | in a set of program sources, are translatable, and which are not. | |
1382 | Even if it is a fairly easy job for a programmer to find and mark | |
1383 | such strings by other means, using any editor of his choice, PO mode | |
1384 | makes this work more comfortable. Further, this gives translators | |
1385 | who feel a little like programmers, or programmers who feel a little | |
1386 | like translators, a tool letting them work at marking translatable | |
1387 | strings in the program sources, while simultaneously producing a set of | |
1388 | translation in some language, for the package being internationalized. | |
1389 | ||
1390 | </P> | |
1391 | <P> | |
1392 | The set of program sources targeted by the PO mode commands described | |
1393 | here should have an Emacs tags table constructed for your project, | |
1394 | prior to using these PO file commands. This is easy to do. In any | |
1395 | shell window, change the directory to the root of your project, then | |
1396 | execute a command resembling: | |
1397 | ||
1398 | </P> | |
1399 | ||
1400 | <PRE> | |
1401 | etags src/*.[hc] lib/*.[hc] | |
1402 | </PRE> | |
1403 | ||
1404 | <P> | |
1405 | presuming here you want to process all <TT>`.h'</TT> and <TT>`.c'</TT> files | |
1406 | from the <TT>`src/'</TT> and <TT>`lib/'</TT> directories. This command will | |
1407 | explore all said files and create a <TT>`TAGS'</TT> file in your root | |
1408 | directory, somewhat summarizing the contents using a special file | |
1409 | format Emacs can understand. | |
1410 | ||
1411 | </P> | |
1412 | <P> | |
1413 | For official GNU packages which follow the GNU coding standard there is | |
1414 | a make goal <CODE>tags</CODE> or <CODE>TAGS</CODE> which constructs the tag files in | |
1415 | all directories and for all files containing source code. | |
1416 | ||
1417 | </P> | |
1418 | <P> | |
1419 | Once your <TT>`TAGS'</TT> file is ready, the following commands assist | |
1420 | the programmer in marking translatable strings in his set of sources. | |
1421 | But these commands are necessarily driven from within a PO file | |
1422 | window, and it is likely that you do not even have such a PO file yet. | |
1423 | This is not a problem at all, as you may safely open a new, empty PO | |
1424 | file, mainly for using these commands. This empty PO file will slowly | |
1425 | fill in while you mark strings as translatable in your program sources. | |
1426 | ||
1427 | </P> | |
1428 | <DL COMPACT> | |
1429 | ||
1430 | <DT><KBD>,</KBD> | |
1431 | <DD> | |
1432 | Search through program sources for a string which looks like a | |
1433 | candidate for translation. | |
1434 | ||
1435 | <DT><KBD>M-,</KBD> | |
1436 | <DD> | |
1437 | Mark the last string found with <SAMP>`_()'</SAMP>. | |
1438 | ||
1439 | <DT><KBD>M-.</KBD> | |
1440 | <DD> | |
1441 | Mark the last string found with a keyword taken from a set of possible | |
1442 | keywords. This command with a prefix allows some management of these | |
1443 | keywords. | |
1444 | ||
1445 | </DL> | |
1446 | ||
1447 | <P> | |
1448 | The <KBD>,</KBD> (<CODE>po-tags-search</CODE>) command searches for the next | |
1449 | occurrence of a string which looks like a possible candidate for | |
1450 | translation, and displays the program source in another Emacs window, | |
1451 | positioned in such a way that the string is near the top of this other | |
1452 | window. If the string is too big to fit whole in this window, it is | |
1453 | rather positioned so only its end is shown. In any case, the cursor | |
1454 | is left in the PO file window. If the shown string would be better | |
1455 | presented differently in different native languages, you may mark it | |
1456 | using <KBD>M-,</KBD> or <KBD>M-.</KBD>. Otherwise, you might rather ignore it | |
1457 | and skip to the next string by merely repeating the <KBD>,</KBD> command. | |
1458 | ||
1459 | </P> | |
1460 | <P> | |
1461 | A string is a good candidate for translation if it contains a sequence | |
1462 | of three or more letters. A string containing at most two letters in | |
1463 | a row will be considered as a candidate if it has more letters than | |
1464 | non-letters. The command disregards strings containing no letters, | |
1465 | or isolated letters only. It also disregards strings within comments, | |
1466 | or strings already marked with some keyword PO mode knows (see below). | |
1467 | ||
1468 | </P> | |
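<P>
The candidate rule just described can be restated as a short C sketch.
This is only one reading of the prose, not the code PO mode actually
runs: the function name <CODE>looks_translatable</CODE> is hypothetical,
and "isolated letters only" is taken to mean that no two letters are
adjacent. Comments and already marked strings are not considered here.
</P>

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical restatement of the heuristic: return 1 if S looks like
   a candidate for translation, 0 otherwise.  */
int
looks_translatable (const char *s)
{
  size_t letters = 0;   /* total number of letters */
  size_t others = 0;    /* total number of non-letters */
  size_t run = 0;       /* length of the current run of letters */
  size_t longest = 0;   /* longest run of letters seen so far */

  for (; *s != '\0'; s++)
    {
      if (isalpha ((unsigned char) *s))
        {
          letters++;
          if (++run > longest)
            longest = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }

  if (longest >= 3)             /* three or more letters in a row */
    return 1;
  if (longest == 2)             /* at most two letters in a row */
    return letters > others;
  return 0;                     /* no letters, or isolated letters only */
}
```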
1469 | <P> | |
1470 | If you have never told Emacs about some <TT>`TAGS'</TT> file to use, the | |
1471 | command will request that you specify one from the minibuffer, the | |
1472 | first time you use the command. You may later change your <TT>`TAGS'</TT> | |
1473 | file by using the regular Emacs command <KBD>M-x visit-tags-table</KBD>, | |
1474 | which will ask you to name the precise <TT>`TAGS'</TT> file you want | |
1475 | to use. See section `Tag Tables' in <CITE>The Emacs Editor</CITE>. | |
1476 | ||
1477 | </P> | |
1478 | <P> | |
1479 | Each time you use the <KBD>,</KBD> command, the search resumes where it was | |
1480 | left over by the previous search, and goes through all program sources, | |
1481 | obeying the <TT>`TAGS'</TT> file, until all sources have been processed. | |
1482 | However, by giving a prefix argument to the command | |
1483 | (<KBD>C-u ,</KBD>), you may request that the search be restarted all over again | |
1484 | from the first program source; but in this case, strings that you | |
1485 | recently marked as translatable will be automatically skipped. | |
1486 | ||
1487 | </P> | |
1488 | <P> | |
1489 | Using this <KBD>,</KBD> command does not prevent the use of other regular | |
1490 | Emacs tags commands. For example, regular <CODE>tags-search</CODE> or | |
1491 | <CODE>tags-query-replace</CODE> commands may be used without disrupting the | |
1492 | independent <KBD>,</KBD> search sequence. However, as implemented, the | |
1493 | <EM>initial</EM> <KBD>,</KBD> command (or the <KBD>,</KBD> command used with a | |
1494 | prefix) might also reinitialize the regular Emacs tags searching to the | |
1495 | first tags file; this reinitialization might be considered spurious. | |
1496 | ||
1497 | </P> | |
1498 | <P> | |
1499 | The <KBD>M-,</KBD> (<CODE>po-mark-translatable</CODE>) command will mark the | |
1500 | recently found string with the <SAMP>`_'</SAMP> keyword. The <KBD>M-.</KBD> | |
1501 | (<CODE>po-select-mark-and-mark</CODE>) command will request that you type | |
1502 | one keyword from the minibuffer and use that keyword for marking | |
1503 | the string. Both commands will automatically create a new PO file | |
1504 | untranslated entry for the string being marked, and make it the | |
1505 | current entry (making it easy for you to immediately proceed to its | |
1506 | translation, if you feel like doing it right away). It is possible | |
1507 | that the modifications made to the program source by <KBD>M-,</KBD> or | |
1508 | <KBD>M-.</KBD> render some source line longer than 80 columns, forcing you | |
1509 | to break and re-indent this line differently. You may use the <KBD>o</KBD> | |
1510 | command from PO mode, or any other window changing command from | |
1511 | GNU Emacs, to break out into the program source window, and do any | |
1512 | needed adjustments. You will have to use some regular Emacs command | |
1513 | to return the cursor to the PO file window, if you want to issue | |
1514 | <KBD>,</KBD> for the next string, say. | |
1515 | ||
1516 | </P> | |
1517 | <P> | |
1518 | The <KBD>M-.</KBD> command has a few built-in speedups, so you do not | |
1519 | have to explicitly type all keywords all the time. The first such | |
1520 | speedup is that you are presented with a <EM>preferred</EM> keyword, | |
1521 | which you may accept by merely typing <KBD>RET</KBD> at the prompt. | |
1522 | The second speedup is that you may type any non-ambiguous prefix of the | |
1523 | keyword you really mean, and the command will complete it automatically | |
1524 | for you. This also means that PO mode has to <EM>know</EM> all | |
1525 | your possible keywords, and that it will not accept mistyped keywords. | |
1526 | ||
1527 | </P> | |
1528 | <P> | |
1529 | If you reply <KBD>?</KBD> to the keyword request, the command gives a | |
1530 | list of all known keywords, from which you may choose. When the | |
1531 | command is prefixed by an argument (<KBD>C-u M-.</KBD>), it inhibits | |
1532 | updating any program source or PO file buffer, and does some simple | |
1533 | keyword management instead. In this case, the command asks for a | |
1534 | keyword, written in full, which becomes a new allowed keyword for | |
1535 | later <KBD>M-.</KBD> commands. Moreover, this new keyword automatically | |
1536 | becomes the <EM>preferred</EM> keyword for later commands. By typing | |
1537 | an already known keyword in response to <KBD>C-u M-.</KBD>, one merely | |
1538 | changes the <EM>preferred</EM> keyword and does nothing more. | |
1539 | ||
1540 | </P> | |
1541 | <P> | |
1542 | All keywords known for <KBD>M-.</KBD> are recognized by the <KBD>,</KBD> command | |
1543 | when scanning for strings, and strings already marked by any of those | |
1544 | known keywords are automatically skipped. If many PO files are opened | |
1545 | simultaneously, each one has its own independent set of known keywords. | |
1546 | There is no provision in PO mode, currently, for deleting a known | |
1547 | keyword; you have to quit the file (maybe using <KBD>q</KBD>) and reopen | |
1548 | it afresh. When a PO file is newly brought up in an Emacs window, only | |
1549 | <SAMP>`gettext'</SAMP> and <SAMP>`_'</SAMP> are known as keywords, and <SAMP>`gettext'</SAMP> | |
1550 | is preferred for the <KBD>M-.</KBD> command. In fact, it is not useful to | |
1551 | prefer <SAMP>`_'</SAMP>, as this one is already built into the <KBD>M-,</KBD> command. | |
1552 | ||
1553 | </P> | |
1554 | ||
1555 | ||
1556 | <H2><A NAME="SEC17" HREF="gettext_toc.html#TOC17">Special Cases of Translatable Strings</A></H2> | |
1557 | ||
1558 | <P> | |
1559 | The attentive reader might now point out that it is not always possible | |
1560 | to mark translatable strings with <CODE>gettext</CODE> or something like this. | |
1561 | Consider the following case: | |
1562 | ||
1563 | </P> | |
1564 | ||
1565 | <PRE> | |
1566 | { | |
1567 | static const char *messages[] = { | |
1568 | "some very meaningful message", | |
1569 | "and another one" | |
1570 | }; | |
1571 | const char *string; | |
1572 | ... | |
1573 | string | |
1574 | = index > 1 ? "a default message" : messages[index]; | |
1575 | ||
1576 | fputs (string, stdout); | |
1577 | ... | |
1578 | } | |
1579 | </PRE> | |
1580 | ||
1581 | <P> | |
1582 | While it is no problem to mark the string <CODE>"a default message"</CODE> it | |
1583 | is not possible to mark the string initializers for <CODE>messages</CODE>. | |
1584 | What is to be done? We have to fulfill two tasks. First we have to mark the | |
1585 | strings so that the <CODE>xgettext</CODE> program (see section <A HREF="gettext.html#SEC19">Invoking the <CODE>xgettext</CODE> Program</A>) | |
1586 | can find them, and second we have to translate the strings at runtime | |
1587 | before printing them. | |
1588 | ||
1589 | </P> | |
1590 | <P> | |
1591 | The first task can be fulfilled by creating a new keyword, which names a | |
1592 | no-op. For the second we have to mark all access points to a string | |
1593 | from the array. So one solution can look like this: | |
1594 | ||
1595 | </P> | |
1596 | ||
1597 | <PRE> | |
1598 | #define gettext_noop(String) (String) | |
1599 | ||
1600 | { | |
1601 | static const char *messages[] = { | |
1602 | gettext_noop ("some very meaningful message"), | |
1603 | gettext_noop ("and another one") | |
1604 | }; | |
1605 | const char *string; | |
1606 | ... | |
1607 | string | |
1608 | = index > 1 ? gettext ("a default message") : gettext (messages[index]); | |
1609 | ||
1610 | fputs (string, stdout); | |
1611 | ... | |
1612 | } | |
1613 | </PRE> | |
1614 | ||
1615 | <P> | |
1616 | Please convince yourself that the string which is written by | |
1617 | <CODE>fputs</CODE> is translated in any case. How to make <CODE>xgettext</CODE> recognize | |
1618 | the additional keyword <CODE>gettext_noop</CODE> is explained in section <A HREF="gettext.html#SEC19">Invoking the <CODE>xgettext</CODE> Program</A>. | |
1619 | ||
1620 | </P> | |
1621 | <P> | |
1622 | The above is of course not the only solution. You could also come up | |
1623 | with the following one: | |
1624 | ||
1625 | </P> | |
1626 | ||
1627 | <PRE> | |
1628 | #define gettext_noop(String) (String) | |
1629 | ||
1630 | { | |
1631 | static const char *messages[] = { | |
1632 | gettext_noop ("some very meaningful message"), | |
1633 | gettext_noop ("and another one") | |
1634 | }; | |
1635 | const char *string; | |
1636 | ... | |
1637 | string | |
1638 | = index > 1 ? gettext_noop ("a default message") : messages[index]; | |
1639 | ||
1640 | fputs (gettext (string), stdout); | |
1641 | ... | |
1642 | } | |
1643 | </PRE> | |
1644 | ||
1645 | <P> | |
1646 | But this has some drawbacks. First, the programmer has to take care that | |
1647 | he uses <CODE>gettext_noop</CODE> for the string <CODE>"a default message"</CODE>. | |
1648 | A use of <CODE>gettext</CODE> could in rare cases have unpredictable results. | |
1649 | Second, the internals of the GNU <CODE>gettext</CODE> | |
1650 | library make this solution less efficient. | |
1651 | ||
1652 | </P> | |
1653 | <P> | |
1654 | One advantage is that you need not perform control flow analysis to make | |
1655 | sure the output is really translated in any case. But this analysis is | |
1656 | generally not very difficult. If it should prove difficult in some situation, you can | |
1657 | use this second method there. | |
1658 | ||
1659 | </P> | |
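<P>
For reference, the first solution can be written as a complete
translation unit, as sketched below. The function name
<CODE>message_for</CODE> and the bounds check on the array size are
additions made for this example; the fragments above use
<CODE>index > 1</CODE> instead.
</P>

```c
#include <libintl.h>
#include <stddef.h>

/* No-op marker: lets the extraction tool see the strings while
   leaving the initializers constant.  */
#define gettext_noop(String) (String)

static const char *const messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};

/* Hypothetical accessor: every path that hands out a string goes
   through gettext, so the caller always gets the translated text.  */
const char *
message_for (size_t index)
{
  size_t count = sizeof messages / sizeof messages[0];

  return index < count
         ? gettext (messages[index])
         : gettext ("a default message");
}
```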
1660 | ||
1661 | ||
1662 | ||
1663 | <H1><A NAME="SEC18" HREF="gettext_toc.html#TOC18">Making the Initial PO File</A></H1> | |
1664 | ||
1665 | ||
1666 | ||
1667 | <H2><A NAME="SEC19" HREF="gettext_toc.html#TOC19">Invoking the <CODE>xgettext</CODE> Program</A></H2> | |
1668 | ||
1669 | ||
1670 | <PRE> | |
1671 | xgettext [<VAR>option</VAR>] <VAR>inputfile</VAR> ... | |
1672 | </PRE> | |
1673 | ||
1674 | <DL COMPACT> | |
1675 | ||
1676 | <DT><SAMP>`-a'</SAMP> | |
1677 | <DD> | |
1678 | <DT><SAMP>`--extract-all'</SAMP> | |
1679 | <DD> | |
1680 | Extract all strings. | |
1681 | ||
1682 | <DT><SAMP>`-c [<VAR>tag</VAR>]'</SAMP> | |
1683 | <DD> | |
1684 | <DT><SAMP>`--add-comments[=<VAR>tag</VAR>]'</SAMP> | |
1685 | <DD> | |
1686 | Place comment block with <VAR>tag</VAR> (or those preceding keyword lines) | |
1687 | in output file. | |
1688 | ||
1689 | <DT><SAMP>`-C'</SAMP> | |
1690 | <DD> | |
1691 | <DT><SAMP>`--c++'</SAMP> | |
1692 | <DD> | |
1693 | Recognize C++ style comments. | |
1694 | ||
1695 | <DT><SAMP>`-d <VAR>name</VAR>'</SAMP> | |
1696 | <DD> | |
1697 | <DT><SAMP>`--default-domain=<VAR>name</VAR>'</SAMP> | |
1698 | <DD> | |
1699 | Use <TT>`<VAR>name</VAR>.po'</TT> for output (instead of <TT>`messages.po'</TT>). | |
1700 | ||
1701 | <DT><SAMP>`-D <VAR>directory</VAR>'</SAMP> | |
1702 | <DD> | |
1703 | <DT><SAMP>`--directory=<VAR>directory</VAR>'</SAMP> | |
1704 | <DD> | |
1705 | Change to <VAR>directory</VAR> before beginning to search and scan source | |
1706 | files. The resulting <TT>`.po'</TT> file will be written relative to the | |
1707 | original directory, though. | |
1708 | ||
1709 | <DT><SAMP>`-f <VAR>file</VAR>'</SAMP> | |
1710 | <DD> | |
1711 | <DT><SAMP>`--files-from=<VAR>file</VAR>'</SAMP> | |
1712 | <DD> | |
1713 | Read the names of the input files from <VAR>file</VAR> instead of getting | |
1714 | them from the command line. | |
1715 | ||
1716 | <DT><SAMP>`-h'</SAMP> | |
1717 | <DD> | |
1718 | <DT><SAMP>`--help'</SAMP> | |
1719 | <DD> | |
1720 | Display this help and exit. | |
1721 | ||
1722 | <DT><SAMP>`-I <VAR>list</VAR>'</SAMP> | |
1723 | <DD> | |
1724 | <DT><SAMP>`--input-path=<VAR>list</VAR>'</SAMP> | |
1725 | <DD> | |
1726 | List of directories searched for input files. | |
1727 | ||
1728 | <DT><SAMP>`-j'</SAMP> | |
1729 | <DD> | |
1730 | <DT><SAMP>`--join-existing'</SAMP> | |
1731 | <DD> | |
1732 | Join messages with existing file. | |
1733 | ||
1734 | <DT><SAMP>`-k <VAR>word</VAR>'</SAMP> | |
1735 | <DD> | |
1736 | <DT><SAMP>`--keyword[=<VAR>word</VAR>]'</SAMP> | |
1737 | <DD> | |
1738 | Additional keyword to be looked for (without <VAR>word</VAR> means not to | |
1739 | use default keywords). | |
1740 | ||
1741 | The default keywords, which are always looked for if not explicitly | |
1742 | disabled, are <CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE> and | |
1743 | <CODE>gettext_noop</CODE>. | |
1744 | ||
1745 | <DT><SAMP>`-m [<VAR>string</VAR>]'</SAMP> | |
1746 | <DD> | |
1747 | <DT><SAMP>`--msgstr-prefix[=<VAR>string</VAR>]'</SAMP> | |
1748 | <DD> | |
1749 | Use <VAR>string</VAR> or "" as prefix for msgstr entries. | |
1750 | ||
1751 | <DT><SAMP>`-M [<VAR>string</VAR>]'</SAMP> | |
1752 | <DD> | |
1753 | <DT><SAMP>`--msgstr-suffix[=<VAR>string</VAR>]'</SAMP> | |
1754 | <DD> | |
1755 | Use <VAR>string</VAR> or "" as suffix for msgstr entries. | |
1756 | ||
1757 | <DT><SAMP>`--no-location'</SAMP> | |
1758 | <DD> | |
1759 | Do not write <SAMP>`#: <VAR>filename</VAR>:<VAR>line</VAR>'</SAMP> lines. | |
1760 | ||
1761 | <DT><SAMP>`-n'</SAMP> | |
1762 | <DD> | |
1763 | <DT><SAMP>`--add-location'</SAMP> | |
1764 | <DD> | |
1765 | Generate <SAMP>`#: <VAR>filename</VAR>:<VAR>line</VAR>'</SAMP> lines (default). | |
1766 | ||
1767 | <DT><SAMP>`--omit-header'</SAMP> | |
1768 | <DD> | |
1769 | Don't write header with <SAMP>`msgid ""'</SAMP> entry. | |
1770 | ||
1771 | This is useful for testing purposes because it eliminates a source | |
1772 | of variance for generated <CODE>.gmo</CODE> files. We can ship some of | |
1773 | these files in the GNU <CODE>gettext</CODE> package, and the result of | |
1774 | regenerating them through <CODE>msgfmt</CODE> should yield the same values. | |
1775 | ||
1776 | <DT><SAMP>`-p <VAR>dir</VAR>'</SAMP> | |
1777 | <DD> | |
1778 | <DT><SAMP>`--output-dir=<VAR>dir</VAR>'</SAMP> | |
1779 | <DD> | |
1780 | Output files will be placed in directory <VAR>dir</VAR>. | |
1781 | ||
1782 | <DT><SAMP>`-s'</SAMP> | |
1783 | <DD> | |
1784 | <DT><SAMP>`--sort-output'</SAMP> | |
1785 | <DD> | |
1786 | Generate sorted output and remove duplicates. | |
1787 | ||
1788 | <DT><SAMP>`--strict'</SAMP> | |
1789 | <DD> | |
1790 | Write out strict Uniforum conforming PO file. | |
1791 | ||
1792 | <DT><SAMP>`-v'</SAMP> | |
1793 | <DD> | |
1794 | <DT><SAMP>`--version'</SAMP> | |
1795 | <DD> | |
1796 | Output version information and exit. | |
1797 | ||
1798 | <DT><SAMP>`-x <VAR>file</VAR>'</SAMP> | |
1799 | <DD> | |
1800 | <DT><SAMP>`--exclude-file=<VAR>file</VAR>'</SAMP> | |
1801 | <DD> | |
1802 | Entries from <VAR>file</VAR> are not extracted. | |
1803 | ||
1804 | </DL> | |
1805 | ||
1806 | <P> | |
1807 | Search path for supplementary PO files is: | |
1808 | <TT>`/usr/local/share/nls/src/'</TT>. | |
1809 | ||
1810 | </P> | |
1811 | <P> | |
1812 | If <VAR>inputfile</VAR> is <SAMP>`-'</SAMP>, standard input is read. | |
1813 | ||
1814 | </P> | |
1815 | <P> | |
1816 | This implementation of <CODE>xgettext</CODE> is able to process a few awkward | |
1817 | cases, like strings in preprocessor macros, ANSI concatenation of | |
1818 | adjacent strings, and escaped end of lines for continued strings. | |
1819 | ||
1820 | </P> | |
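<P>
The three awkward cases named above look as follows in C source. This
sketch only shows the source forms; the macro and function names are
invented for the example, and what ends up in the PO file depends on
the <CODE>xgettext</CODE> version in use.
</P>

```c
#include <libintl.h>

/* A translatable string inside a preprocessor macro.  */
#define USAGE_HEADER gettext ("Usage: prog [OPTION]...")

const char *
usage_header (void)
{
  return USAGE_HEADER;
}

/* ANSI concatenation: the compiler joins adjacent string literals
   into a single string before gettext is called.  */
const char *
long_help (void)
{
  return gettext ("first line\n"
                  "second line\n");
}

/* An escaped end of line continues the string on the next source
   line; the backslash and the newline are not part of the string.  */
const char *
continued (void)
{
  return gettext ("one logical \
string");
}
```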
1821 | ||
1822 | ||
1823 | <H2><A NAME="SEC20" HREF="gettext_toc.html#TOC20">C Sources Context</A></H2> | |
1824 | ||
1825 | <P> | |
1826 | PO mode is particularly powerful when used with PO files | |
1827 | created through GNU <CODE>gettext</CODE> utilities, as those utilities | |
1828 | insert special comments in the PO files they generate. | |
1829 | Some of these special comments relate the PO file entry to | |
1830 | exactly where the untranslated string appears in the program sources. | |
1831 | ||
1832 | </P> | |
1833 | <P> | |
1834 | When the translator gets to an untranslated entry, she is fairly | |
1835 | often faced with an original string which is not as informative as | |
1836 | it normally should be, being succinct, cryptic, or otherwise ambiguous. | |
1837 | Before choosing how to translate the string, she needs to understand | |
1838 | better what the string really means and how tight the translation has | |
1839 | to be. Most of the time, when problems arise, the only way left to make | |
1840 | her judgment is looking at the true program sources from where this | |
1841 | string originated, searching for surrounding comments the programmer | |
1842 | might have put in there, and looking around for helping clues of | |
1843 | <EM>any</EM> kind. | |
1844 | ||
1845 | </P> | |
1846 | <P> | |
1847 | Surely, when looking at program sources, the translator will receive | |
1848 | more help if she is a fluent programmer. However, even if she is | |
1849 | not versed in programming and feels a little lost in C code, the | |
1850 | translator should not be shy about taking a look, once in a while. | |
1851 | It is most probable that she will still be able to find some of the | |
1852 | hints she needs. She will learn quickly to not feel uncomfortable | |
1853 | in program code, paying more attention to programmer's comments, | |
1854 | variable and function names (if he dared to choose them well), and | |
1855 | overall organization, than to programming itself. | |
1856 | ||
1857 | </P> | |
1858 | <P> | |
1859 | The following commands are meant to help the translator in getting | |
1860 | program source context for a PO file entry. | |
1861 | ||
1862 | </P> | |
1863 | <DL COMPACT> | |
1864 | ||
1865 | <DT><KBD>c</KBD> | |
1866 | <DD> | |
1867 | Resume the display of a program source context, or cycle through them. | |
1868 | ||
1869 | <DT><KBD>M-c</KBD> | |
1870 | <DD> | |
1871 | Display a program source context selected by menu. | |
1872 | ||
1873 | <DT><KBD>d</KBD> | |
1874 | <DD> | |
1875 | Add a directory to the search path for source files. | |
1876 | ||
1877 | <DT><KBD>M-d</KBD> | |
1878 | <DD> | |
1879 | Delete a directory from the search path for source files. | |
1880 | ||
1881 | </DL> | |
1882 | ||
1883 | <P> | |
1884 | The commands <KBD>c</KBD> (<CODE>po-cycle-reference</CODE>) and <KBD>M-c</KBD> | |
1885 | (<CODE>po-select-reference</CODE>) both open another window displaying | |
1886 | some source program file, and already positioned in such a way that | |
1887 | it shows an actual use of the current string to translate. By doing | |
1888 | so, the command gives source program context for the string. But if | |
1889 | the entry has no source context references, or if all references | |
1890 | are unresolved along the search path for program sources, then the | |
1891 | command diagnoses this as an error. | |
1892 | ||
1893 | </P> | |
1894 | <P> | |
1895 | Even if <KBD>c</KBD> (or <KBD>M-c</KBD>) opens a new window, the cursor stays | |
1896 | in the PO file window. If the translator really wants to | |
1897 | get into the program source window, she ought to do it explicitly, | |
1898 | maybe by using command <KBD>o</KBD>. | |
1899 | ||
1900 | </P> | |
1901 | <P> | |
1902 | When <KBD>c</KBD> is typed for the first time, or for a PO file entry which | |
1903 | is different from the last one used for getting source context, then the | |
1904 | command reacts by giving the first context available for this entry, | |
1905 | if any. If some context has already been recently displayed for the | |
1906 | current PO file entry, and the translator wandered to do other | |
1907 | things, typing <KBD>c</KBD> again will merely resume, in another window, | |
1908 | the context last displayed. In particular, if the translator moved | |
1909 | the cursor away from the context in the source file, the command will | |
1910 | bring the cursor back to the context. By using <KBD>c</KBD> many times | |
1911 | in a row, with no other intervening commands, PO mode will cycle to | |
1912 | the next available contexts for this particular entry, getting back | |
1913 | to the first context once the last has been shown. | |
1914 | ||
1915 | </P> | |
1916 | <P> | |
1917 | The command <KBD>M-c</KBD> behaves differently. Instead of cycling through | |
1918 | references, it lets the translator choose one particular reference among | |
1919 | many, and displays that reference. It is best used with completion: | |
1920 | if the translator types <KBD>TAB</KBD> immediately after <KBD>M-c</KBD>, in | |
1921 | response to the question, she will be offered a menu of all possible | |
1922 | references, as a reminder of which are the acceptable answers. | |
1923 | This command is useful only where there are really many contexts | |
1924 | available for a single string to translate. | |
1925 | ||
1926 | </P> | |
1927 | <P> | |
1928 | Program source files are usually found relative to where the PO | |
1929 | file stands. As a special provision, when this fails, the file is | |
1930 | also looked for, but relative to the directory immediately above it. | |
1931 | Those two cases take proper care of most PO files. However, it might | |
1932 | happen that a PO file has been moved, or is edited in a different | |
1933 | place than its normal location. When this happens, the translator | |
1934 | should tell PO mode in which directory the genuine PO | |
1935 | file normally sits. Many such directories may be specified, and all together, they | |
1936 | constitute what is called the <STRONG>search path</STRONG> for program sources. | |
1937 | The command <KBD>d</KBD> (<CODE>po-add-path</CODE>) is used to interactively | |
1938 | enter a new directory at the front of the search path, and the command | |
1939 | <KBD>M-d</KBD> (<CODE>po-delete-path</CODE>) is used to select, with completion, | |
1940 | one of the directories she does not want anymore on the search path. | |
1941 | ||
1942 | </P> | |
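<P>
The lookup order described above can be sketched in Python. This is only
an illustration, not PO mode's actual Emacs Lisp code: the function name
<CODE>find_source</CODE> and its parameters are hypothetical, and the sketch
assumes that search path directories are tried after the two default
locations, in the order given.
</P>

```python
from pathlib import Path
from typing import Iterable, Optional


def find_source(po_file: Path, reference: str,
                search_path: Iterable[Path] = ()) -> Optional[Path]:
    """Return the first existing file matching `reference`, or None."""
    po_dir = po_file.parent
    # Assumed order: the directory where the PO file stands, the directory
    # immediately above it, then each directory of the search path.
    candidates = [po_dir, po_dir.parent, *search_path]
    for directory in candidates:
        candidate = directory / reference
        if candidate.is_file():
            return candidate
    return None
```

<P>
With a PO file that has been moved away from its package, only the third
kind of candidate can succeed, which is why the translator has to supply
the search path herself.
</P>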
1943 | ||
1944 | ||
1945 | <H2><A NAME="SEC21" HREF="gettext_toc.html#TOC21">Using Translation Compendiums</A></H2> | |
1946 | ||
1947 | <P> | |
1948 | Compendiums are yet to be implemented. | |
1949 | ||
1950 | </P> | |
1951 | <P> | |
1952 | An upcoming PO mode feature will let the translator maintain a | |
1953 | compendium of translations already made. A <STRONG>compendium</STRONG> | |
1954 | is a special PO file containing a set of translations recurring in | |
1955 | many different packages. The translator will be given commands for | |
1956 | adding entries to her compendium, and later initializing untranslated | |
1957 | entries, or updating already translated entries, from translations | |
1958 | kept in the compendium. For this to work, however, the compendium | |
1959 | would have to be normalized. See section <A HREF="gettext.html#SEC12">Normalizing Strings in Entries</A>. | |
1960 | ||
1961 | </P> | |
1962 | ||
1963 | ||
1964 | ||
1965 | <H1><A NAME="SEC22" HREF="gettext_toc.html#TOC22">Updating Existing PO Files</A></H1> | |
1966 | ||
1967 | ||
1968 | ||
1969 | <H2><A NAME="SEC23" HREF="gettext_toc.html#TOC23">Invoking the <CODE>tupdate</CODE> Program</A></H2> | |
1970 | ||
1971 | ||
1972 | <PRE> | |
1973 | tupdate --help | |
1974 | tupdate --version | |
1975 | tupdate <VAR>new</VAR> <VAR>old</VAR> | |
1976 | </PRE> | |
1977 | ||
1978 | <P> | |
1979 | File <VAR>new</VAR> is the last created PO file (generally by | |
1980 | <CODE>xgettext</CODE>). It need not contain any translations. File | |
1981 | <VAR>old</VAR> is the PO file including the old translations which will | |
1982 | be taken over to the newly created file as long as they still match. | |
1983 | ||
1984 | </P> | |
1985 | <P> | |
1986 | When English messages change in the programs, this is reflected in | |
1987 | the PO file as extracted by <CODE>xgettext</CODE>. In large messages, that | |
1988 | can be hard to detect, and will obviously result in an incomplete | |
1989 | translation. One of the virtues of <CODE>tupdate</CODE> is that it detects | |
1990 | such changes, saving the previous translation into a PO file comment, | |
1991 | so marking the entry as obsolete, and giving the modified string with | |
1992 | an empty translation, that is, marking the entry as untranslated. | |
1993 | ||
1994 | </P> | |
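<P>
The merging behaviour described above can be pictured with a small Python
sketch. It is a simplified model under stated assumptions, not the
<CODE>tupdate</CODE> implementation: entries are reduced to plain
<CODE>msgid</CODE>/<CODE>msgstr</CODE> pairs, matching is exact string
equality, and the name <CODE>merge_po</CODE> is hypothetical.
</P>

```python
def merge_po(new_msgids, old_entries):
    """Model the merge of a fresh PO file with an older, translated one.

    new_msgids  -- msgid strings freshly extracted from the program
    old_entries -- dict mapping msgid to msgstr from the old PO file

    Returns (active, obsolete). `active` maps every new msgid to its
    translation, or to "" when the entry is untranslated. `obsolete`
    holds the old translations whose msgid no longer appears.
    """
    # A translation is taken over only when its msgid still matches exactly.
    active = {msgid: old_entries.get(msgid, "") for msgid in new_msgids}
    # Whatever the program no longer uses is kept aside as obsolete.
    obsolete = {msgid: msgstr for msgid, msgstr in old_entries.items()
                if msgid not in active}
    return active, obsolete
```

<P>
A string that was slightly modified in the program therefore shows up
twice: once in <CODE>active</CODE> with an empty translation, and once in
<CODE>obsolete</CODE> under its previous wording.
</P>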
1995 | ||
1996 | ||
1997 | <H2><A NAME="SEC24" HREF="gettext_toc.html#TOC24">Untranslated Entries</A></H2> | |
1998 | ||
1999 | <P> | |
2000 | When <CODE>xgettext</CODE> originally creates a PO file, unless told | |
2001 | otherwise, it initializes the <CODE>msgid</CODE> field with the untranslated | |
2002 | string, and leaves the <CODE>msgstr</CODE> string empty. Such entries, | |
2003 | having an empty translation, are said to be <STRONG>untranslated</STRONG> entries. | |
2004 | Later, when the programmer slightly modifies some string right in | |
2005 | the program, this change is reflected in the PO file | |
2006 | by the appearance of a new untranslated entry for the modified string. | |
2007 | ||
2008 | </P> | |
2009 | <P> | |
2010 | The usual commands moving from entry to entry consider untranslated | |
2011 | entries on the same level as active entries. Untranslated entries | |
2012 | are easily recognizable by the fact they end with <SAMP>`msgstr ""'</SAMP>. | |
2013 | ||
2014 | </P> | |
2015 | <P> | |
2016 | The work of the translator might be (quite naively) seen as the process | |
2017 | of seeking after an untranslated entry, editing a translation for | |
2018 | it, and repeating these actions until no untranslated entries remain. | |
2019 | Some commands are more specifically related to untranslated entry | |
2020 | processing. | |
2021 | ||
2022 | </P> | |
2023 | <DL COMPACT> | |
2024 | ||
2025 | <DT><KBD>e</KBD> | |
2026 | <DD> | |
2027 | Find the next untranslated entry. | |
2028 | ||
2029 | <DT><KBD>M-e</KBD> | |
2030 | <DD> | |
2031 | Find the previous untranslated entry. | |
2032 | ||
2033 | <DT><KBD>k</KBD> | |
2034 | <DD> | |
2035 | Turn the current entry into an untranslated one. | |
2036 | ||
2037 | </DL> | |
2038 | ||
2039 | <P> | |
2040 | The commands <KBD>e</KBD> (<CODE>po-next-empty-entry</CODE>) and <KBD>M-e</KBD> | |
2041 | (<CODE>po-previous-empty</CODE>) move forwards or backwards, looking for an | |
2042 | untranslated entry. If none is found, the search is extended and wraps | |
2043 | around in the PO file buffer. | |
2044 | ||
2045 | </P> | |
2046 | <P> | |
2047 | An entry can be turned back into an untranslated entry by | |
2048 | merely emptying its translation, using the command <KBD>k</KBD> | |
2049 | (<CODE>po-kill-msgstr</CODE>). See section <A HREF="gettext.html#SEC26">Modifying Translations</A>. | |
2050 | ||
2051 | </P> | |
2052 | <P> | |
2053 | Also, when the time comes to quit working on a PO file buffer | |
2054 | with the <KBD>q</KBD> command, the translator is asked for confirmation, | |
2055 | if some untranslated string still exists. | |
2056 | ||
2057 | </P> | |
2058 | ||
2059 | ||
2060 | <H2><A NAME="SEC25" HREF="gettext_toc.html#TOC25">Obsolete Entries</A></H2> | |
2061 | ||
2062 | <P> | |
2063 | By <STRONG>obsolete</STRONG> PO file entries, we mean those entries which are | |
2064 | commented out, usually by <CODE>tupdate</CODE> when it finds that the | |
2065 | translation is not needed anymore by the package being localized. | |
2066 | ||
2067 | </P> | |
2068 | <P> | |
2069 | The usual commands moving from entry to entry consider obsolete | |
2070 | entries on the same level as active entries. Obsolete entries are | |
2071 | easily recognizable by the fact that all their lines start with | |
2072 | <KBD>#</KBD>, even those lines containing <CODE>msgid</CODE> or <CODE>msgstr</CODE>. | |
2073 | ||
2074 | </P> | |
2075 | <P> | |
2076 | Commands exist for emptying the translation or reinitializing it | |
2077 | to the original untranslated string. Commands interfacing with the | |
2078 | kill ring may force some previously saved text into the translation. | |
2079 | The user may interactively edit the translation. All these commands | |
2080 | may apply to obsolete entries, carefully leaving the entry obsolete | |
2081 | after the fact. | |
2082 | ||
2083 | </P> | |
2084 | <P> | |
2085 | Moreover, some commands are more specifically related to obsolete | |
2086 | entry processing. | |
2087 | ||
2088 | </P> | |
2089 | <DL COMPACT> | |
2090 | ||
2091 | <DT><KBD>M-n</KBD> | |
2092 | <DD> | |
2093 | <DT><KBD>M-<KBD>SPC</KBD></KBD> | |
2094 | <DD> | |
2095 | Find the next obsolete entry. | |
2096 | ||
2097 | <DT><KBD>M-p</KBD> | |
2098 | <DD> | |
2099 | <DT><KBD>M-<KBD>DEL</KBD></KBD> | |
2100 | <DD> | |
2101 | Find the previous obsolete entry. | |
2102 | ||
2103 | <DT><KBD>z</KBD> | |
2104 | <DD> | |
2105 | Make an active entry obsolete, or zap out an obsolete entry. | |
2106 | ||
2107 | </DL> | |
2108 | ||
2109 | <P> | |
2110 | The commands <KBD>M-n</KBD> (<CODE>po-next-obsolete-entry</CODE>) and <KBD>M-p</KBD> | |
2111 | (<CODE>po-previous-obsolete-entry</CODE>) move forwards or backwards, | |
2112 | looking for an obsolete entry. If none is found, the search is | |
2113 | extended and wraps around in the PO file buffer. The commands | |
2114 | <KBD>M-<KBD>SPC</KBD></KBD> and <KBD>M-<KBD>DEL</KBD></KBD> are synonymous with <KBD>M-n</KBD> | |
2115 | and <KBD>M-p</KBD>, respectively. | |
2116 | ||
2117 | </P> | |
2118 | <P> | |
2119 | PO mode does not provide ways for un-commenting an obsolete entry | |
2120 | and making it active, because this would reintroduce an original | |
2121 | untranslated string which does not correspond to any marked string | |
2122 | in the program sources. This goes with the philosophy of never | |
2123 | introducing useless <CODE>msgid</CODE> values. | |
2124 | ||
2125 | </P> | |
2126 | <P> | |
2127 | However, it is possible to comment out an active entry, so making | |
2128 | it obsolete. GNU <CODE>gettext</CODE> utilities will later react to the | |
2129 | disappearance of a translation by using the untranslated string. | |
2130 | The command <KBD>z</KBD> (<CODE>po-fade-out-entry</CODE>) pushes the current entry | |
2131 | a little further towards annihilation. If the entry is active, then | |
2132 | the entry is merely commented out. If the entry is already obsolete, | |
2133 | then it is completely deleted from the PO file. It is easy to recycle | |
2134 | the translation so deleted into some other PO file entry, usually | |
2135 | one which is untranslated. See section <A HREF="gettext.html#SEC26">Modifying Translations</A>. | |
2136 | ||
2137 | </P> | |
2138 | <P> | |
2139 | Here is a quite interesting problem to solve for later development of | |
2140 | PO mode, for those nights you are not sleepy. The idea would be that | |
2141 | PO mode might become bright enough, one of these days, to make good | |
2142 | guesses at retrieving the most probable candidate, among all obsolete | |
2143 | entries, for initializing the translation of a newly appeared string. | |
2144 | I think it might be quite a hard problem to do this algorithmically, as | |
2145 | we have to develop good and efficient measures of string similarity. | |
2146 | Right now, PO mode leaves the decision entirely to the translator: | |
2147 | when the time comes to find the adequate obsolete translation, it | |
2148 | merely tries to provide handy tools for helping her to do so. | |
2149 | ||
2150 | </P> | |
2151 | ||
2152 | ||
2153 | <H2><A NAME="SEC26" HREF="gettext_toc.html#TOC26">Modifying Translations</A></H2> | |
2154 | ||
2155 | <P> | |
2156 | PO mode prevents direct editing of the PO file by the usual | |
2157 | means Emacs gives for altering a buffer's contents. By doing so, | |
2158 | it aims to help the translator avoid little clerical errors | |
2159 | about the overall file format, or the proper quoting of strings, | |
2160 | as those errors would be easily made. Other kinds of errors are | |
2161 | still possible, but some may be caught and diagnosed by the batch | |
2162 | validation process, which the translator may always trigger by the | |
2163 | <KBD>v</KBD> command. For all other errors, the translator has to rely on | |
2164 | her own judgment, and also on the linguistic reports submitted to her | |
2165 | by the users of the translated package, having the same mother tongue. | |
2166 | ||
2167 | </P> | |
2168 | <P> | |
2169 | When the time comes to create a translation, or to correct an error diagnosed | |
2170 | mechanically or reported by a user, the translator has to resort to | |
2171 | using the following commands for modifying the translations. | |
2172 | ||
2173 | </P> | |
2174 | <DL COMPACT> | |
2175 | ||
2176 | <DT><KBD>RET</KBD> | |
2177 | <DD> | |
2178 | Interactively edit the translation. | |
2179 | ||
2180 | <DT><KBD>TAB</KBD> | |
2181 | <DD> | |
2182 | Reinitialize the translation with the original, untranslated string. | |
2183 | ||
2184 | <DT><KBD>k</KBD> | |
2185 | <DD> | |
2186 | Save the translation on the kill ring, and delete it. | |
2187 | ||
2188 | <DT><KBD>w</KBD> | |
2189 | <DD> | |
2190 | Save the translation on the kill ring, without deleting it. | |
2191 | ||
2192 | <DT><KBD>y</KBD> | |
2193 | <DD> | |
2194 | Replace the translation, taking the new from the kill ring. | |
2195 | ||
2196 | </DL> | |
2197 | ||
2198 | <P> | |
2199 | The command <KBD>RET</KBD> (<CODE>po-edit-msgstr</CODE>) opens a new Emacs | |
2200 | window containing a copy of the translation taken from the current | |
2201 | PO file entry, all ready for editing, fully modifiable | |
2202 | and with the complete extent of GNU Emacs modifying commands. | |
2203 | The string is presented to the translator expunged of all quoting | |
2204 | marks, and she will modify the <EM>unquoted</EM> string in this | |
2205 | window to heart's content. Once done, the regular Emacs command | |
2206 | <KBD>M-C-c</KBD> (<CODE>exit-recursive-edit</CODE>) may be used to return the | |
2207 | edited translation into the PO file, replacing the original | |
2208 | translation. The keys <KBD>C-c C-c</KBD> are bound so they have the | |
2209 | same effect as <KBD>M-C-c</KBD>. | |
2210 | ||
2211 | </P> | |
2212 | <P> | |
2213 | If the translator becomes unsatisfied with her translation to the | |
2214 | extent she prefers keeping the translation which was existent prior to | |
2215 | the <KBD>RET</KBD> command, she may use the regular Emacs command <KBD>C-]</KBD> | |
2216 | (<CODE>abort-recursive-edit</CODE>) to merely discard the edit, while | |
2217 | preserving the original translation. Another way would be for her | |
2218 | to exit normally with <KBD>C-c C-c</KBD>, then type <CODE>u</CODE> once for | |
2219 | undoing the whole effect of the last edit. | |
2220 | ||
2221 | </P> | |
2222 | <P> | |
2223 | While editing her translation, the translator should take care | |
2224 | not to insert unwanted <KBD><KBD>RET</KBD></KBD> (carriage return) characters at | |
2225 | the end of the translated string if those are not meant to be there, | |
2226 | or remove such characters when they are required. Since these | |
2227 | characters are not visible in the editing buffer, they are easy to | |
2228 | introduce by mistake. To help her, <KBD><KBD>RET</KBD></KBD> automatically puts | |
2229 | the character <KBD><</KBD> at the end of the string being edited, but this | |
2230 | <KBD><</KBD> is not really part of the string. On exiting the editing | |
2231 | window with <KBD>C-c C-c</KBD>, PO mode automatically removes such | |
2232 | <KBD><</KBD> and all whitespace added after it. If the translator adds | |
2233 | characters after the terminating <KBD><</KBD>, it loses its delimiting | |
2234 | property and becomes an integral part of the string. If she removes | |
2235 | the delimiting <KBD><</KBD>, then the edited string is taken <EM>as | |
2236 | is</EM>, with all trailing newlines, even if invisible. Also, if the | |
2237 | translated string ought to end itself with a genuine <KBD><</KBD>, then the | |
2238 | delimiting <KBD><</KBD> may not be removed; so the string should appear, | |
2239 | in the editing window, as ending with two <KBD><</KBD> in a row. | |
2240 | ||
2241 | </P> | |
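<P>
The rule for the delimiting <KBD><</KBD> can be stated compactly in code.
The Python function below is a sketch of the behaviour as described in
the paragraph above, not PO mode's own code; the name
<CODE>finish_edit</CODE> is hypothetical.
</P>

```python
def finish_edit(buffer_text: str) -> str:
    """Return the string to store, given the editing window's contents."""
    # Whitespace added after the delimiter is dropped first.
    stripped = buffer_text.rstrip()
    if stripped.endswith("<"):
        # Only the final '<' delimits; an earlier '<' belongs to the string.
        return stripped[:-1]
    # No delimiter left at the end: the text is taken as is,
    # trailing newlines included.
    return buffer_text
```

<P>
Under this model, a buffer ending in two <KBD><</KBD> characters yields a
string ending in one genuine <KBD><</KBD>, and text typed after the
delimiter turns the delimiter into ordinary string contents.
</P>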
2242 | <P> | |
2243 | When a translation (or a comment) is being edited, the translator | |
2244 | may move the cursor back into the PO file buffer and freely | |
2245 | move to other entries, browsing at will. The edited entry will | |
2246 | be recovered as soon as the edit ceases, because only this entry | |
2247 | is being modified. If, with an edit still open, the | |
2248 | translator wanders in the PO file buffer, she cannot modify | |
2249 | any other entry. If she tries to, PO mode will react by suggesting | |
2250 | that she abort the current edit, or else, by inviting her to finish | |
2251 | the current edit prior to any other modification. | |
2252 | ||
2253 | </P> | |
2254 | <P> | |
2255 | The command <KBD>TAB</KBD> (<CODE>po-msgid-to-msgstr</CODE>) initializes, or | |
2256 | reinitializes the translation with the original string. This command | |
2257 | is normally used when the translator wants to redo a fresh translation | |
2258 | of the original string, disregarding any previous work. | |
2259 | ||
2260 | </P> | |
2261 | <P> | |
2262 | In fact, whether it is best to start a translation with an empty | |
2263 | string, or rather with a copy of the original string, is a matter of | |
2264 | taste or habit. Sometimes, the source mother tongue language and the | |
2265 | target language are so different that it is simply best to start writing | |
2266 | on an empty page. At other times, the source and target languages | |
2267 | are so close that it would be a waste to retype a number of words | |
2268 | already written in the original string. A translator may also | |
2269 | like having the original string right under her eyes, as she will | |
2270 | progressively overwrite the original text with the translation, even | |
2271 | if this requires some extra editing work to get rid of the original. | |
2272 | ||
2273 | </P> | |
2274 | <P> | |
2275 | The command <KBD>k</KBD> (<CODE>po-kill-msgstr</CODE>) merely empties the | |
2276 | translation string, so turning the entry into an untranslated | |
2277 | one. But while doing so, its previous contents are set aside in | |
2278 | a special place, known as the kill ring. The command <KBD>w</KBD> | |
2279 | (<CODE>po-kill-ring-save-msgstr</CODE>) also has the effect of taking a | |
2280 | copy of the translation onto the kill ring, but it otherwise leaves | |
2281 | the entry alone, and does <EM>not</EM> remove the translation from the | |
2282 | entry. Both commands use exactly the Emacs kill ring, which is shared | |
2283 | between buffers, and which is well known already to GNU Emacs lovers. | |
2284 | ||
2285 | </P> | |
2286 | <P> | |
2287 | The translator may use <KBD>k</KBD> or <KBD>w</KBD> many times in the course | |
2288 | of her work, as the kill ring may hold several saved translations. | |
2289 | From the kill ring, strings may later be reinserted in various | |
2290 | Emacs buffers. In particular, the kill ring may be used for moving | |
2291 | translation strings between different entries of a single PO file | |
2292 | buffer, or if the translator is handling many such buffers at once, | |
2293 | even between PO files. | |
2294 | ||
2295 | </P> | |
2296 | <P> | |
2297 | To facilitate exchanges with buffers which are not in PO mode, the | |
2298 | translation string put on the kill ring by the <KBD>k</KBD> command is fully | |
2299 | unquoted before being saved: external quotes are removed, multi-line | |
2300 | strings are concatenated, and backslash escape sequences are turned | |
2301 | into their corresponding characters. In the special case of obsolete | |
2302 | entries, the translation is also uncommented prior to saving. | |
2303 | ||
2304 | </P> | |
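<P>
As an illustration of the three unquoting steps just listed, here is a
short Python sketch. It is not taken from PO mode: the function name
<CODE>unquote_po_string</CODE> is hypothetical, only a handful of common
escapes are handled, and the uncommenting of obsolete entries is left out.
</P>

```python
import re

# Assumed subset of escapes; anything else keeps the escaped character.
_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def unquote_po_string(lines):
    """Join the quoted lines of a PO string into the plain text they denote."""
    pieces = []
    for line in lines:
        line = line.strip()
        if len(line) < 2 or line[0] != '"' or line[-1] != '"':
            raise ValueError(f"not a quoted PO string line: {line!r}")
        pieces.append(line[1:-1])      # remove the external quotes
    joined = "".join(pieces)           # concatenate multi-line strings
    # Turn each backslash escape into its corresponding character.
    return re.sub(r"\\(.)",
                  lambda m: _ESCAPES.get(m.group(1), m.group(1)),
                  joined)
```

<P>
The result is the text a buffer outside PO mode would expect to receive
when the string is later yanked there.
</P>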
2305 | <P> | |
2306 | The command <KBD>y</KBD> (<CODE>po-yank-msgstr</CODE>) completely replaces the | |
2307 | translation of the current entry by a string taken from the kill ring. | |
2308 | Following GNU Emacs terminology, we then say that the replacement | |
2309 | string is <STRONG>yanked</STRONG> into the PO file buffer. | |
2310 | See section `Yanking' in <CITE>The Emacs Editor</CITE>. | |
2311 | The first time <KBD>y</KBD> is used, the translation receives the value of | |
2312 | the most recent addition to the kill ring. If <KBD>y</KBD> is typed once | |
2313 | again, immediately, without intervening keystrokes, the translation | |
2314 | just inserted is taken away and replaced by the second most recent | |
2315 | addition to the kill ring. By repeating <KBD>y</KBD> many times in a row, | |
2316 | the translator may travel along the kill ring for saved strings, | |
2317 | until she finds the string she really wanted. | |
2318 | ||
2319 | </P> | |
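<P>
The way repeated <KBD>y</KBD> commands travel along the kill ring can be
modelled as follows. This Python class is a toy model, not the Emacs kill
ring: the class and method names are hypothetical, and wrapping around to
the most recent string after the oldest one is an assumption borrowed
from the usual Emacs behaviour.
</P>

```python
class KillRing:
    """Toy model of a kill ring holding saved translations."""

    def __init__(self):
        self._items = []   # most recent addition first
        self._cursor = 0   # position of the last yanked string

    def push(self, text):
        """Save a string, as the k or w commands would."""
        self._items.insert(0, text)
        self._cursor = 0

    def yank(self, repeated=False):
        """Return the string that y would insert.

        With repeated=True (y typed again with no intervening keystroke),
        the next older string is returned instead of the newest one.
        """
        if not self._items:
            raise IndexError("kill ring is empty")
        if repeated:
            self._cursor = (self._cursor + 1) % len(self._items)
        else:
            self._cursor = 0
        return self._items[self._cursor]
```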
2320 | <P> | |
2321 | When a string is yanked into a PO file entry, it is fully and | |
2322 | automatically requoted for complying with the format PO files should | |
2323 | have. Further, if the entry is obsolete, PO mode then appropriately | |
2324 | pushes the inserted string inside comments. Once again, translators | |
2325 | should not burden themselves with quoting considerations beyond, of | |
2326 | course, what the translated string itself requires with respect to | |
2327 | the program using it. | |
2328 | ||
2329 | </P> | |
2330 | <P> | |
2331 | Note that <KBD>k</KBD> or <KBD>w</KBD> are not the only commands pushing strings | |
2332 | on the kill ring, as almost any PO mode command replacing translation | |
2333 | strings (or the translator comments) automatically saves the old string | |
2334 | on the kill ring. The main exceptions to this general rule are the | |
2335 | yanking commands themselves. | |
2336 | ||
2337 | </P> | |
2338 | <P> | |
2339 | To better illustrate the operation of killing and yanking, let's | |
2340 | use an actual example, taken from a common situation. When the | |
2341 | programmer slightly modifies some string right in the program, his | |
2342 | change is later reflected in the PO file by the appearance | |
2343 | of a new untranslated entry for the modified string, and the fact | |
2344 | that the entry translating the original or unmodified string becomes | |
2345 | obsolete. In many cases, the translator might spare herself some work | |
2346 | by retrieving the unmodified translation from the obsolete entry, | |
2347 | then initializing the untranslated entry <CODE>msgstr</CODE> field with | |
2348 | this retrieved translation. Once this is done, the obsolete entry is | |
2349 | not wanted anymore, and may be safely deleted. | |
2350 | ||
2351 | </P> | |
2352 | <P> | |
2353 | When the translator finds an untranslated entry and suspects that a | |
2354 | slight variant of the translation exists, she immediately uses <KBD>m</KBD> | |
2355 | to mark the current entry location, then starts chasing obsolete | |
2356 | entries with <KBD>M-SPC</KBD>, hoping to find some translation corresponding | |
2357 | to the unmodified string. Once found, she uses the <KBD>z</KBD> command | |
2358 | for deleting the obsolete entry, knowing that <KBD>z</KBD> also <EM>kills</EM> | |
2359 | the translation, that is, pushes the translation on the kill ring. | |
2360 | Then, <KBD>l</KBD> returns to the initial untranslated entry, <KBD>y</KBD> | |
2361 | then <EM>yanks</EM> the saved translation right into the <CODE>msgstr</CODE> | |
2362 | field. The translator is then free to use <KBD><KBD>RET</KBD></KBD> for fine | |
2363 | tuning the translation contents, and maybe to later use <KBD>e</KBD>, | |
2364 | then <KBD>m</KBD> again, for going on with the next untranslated string. | |
2365 | ||
2366 | </P> | |
2367 | <P> | |
2368 | When some sequence of keys has to be typed over and over again, the | |
2369 | translator may find it convenient to become more acquainted with the GNU | |
2370 | Emacs capability of learning these sequences and playing them back under | |
2371 | request. See section `Keyboard Macros' in <CITE>The Emacs Editor</CITE>. | |
2372 | ||
2373 | </P> | |
2374 | ||
2375 | ||
2376 | <H2><A NAME="SEC27" HREF="gettext_toc.html#TOC27">Modifying Comments</A></H2> | |
2377 | ||
2378 | <P> | |
2379 | Any translation work done seriously will raise many linguistic | |
2380 | difficulties, for which decisions have to be made, and the choices | |
2381 | further documented. These documents may be saved within the | |
2382 | PO file in the form of translator comments, which the translator | |
2383 | is free to create, delete, or modify at will. These comments may | |
2384 | be useful to herself when she returns to this PO file after a while. | |
2385 | Memory forgets! | |
2386 | ||
2387 | </P> | |
2388 | <P> | |
2389 | These commands are somewhat similar to those modifying translations, | |
2390 | so the general indications given for these apply here. See section <A HREF="gettext.html#SEC26">Modifying Translations</A>. | |
2391 | ||
2392 | </P> | |
2393 | <DL COMPACT> | |
2394 | ||
2395 | <DT><KBD>M-RET</KBD> | |
2396 | <DD> | |
2397 | Interactively edit the translator comments. | |
2398 | ||
2399 | <DT><KBD>M-k</KBD> | |
2400 | <DD> | |
2401 | Save the translator comments on the kill ring, and delete them. | |
2402 | ||
2403 | <DT><KBD>M-w</KBD> | |
2404 | <DD> | |
2405 | Save the translator comments on the kill ring, without deleting them. | |
2406 | ||
2407 | <DT><KBD>M-y</KBD> | |
2408 | <DD> | |
2409 | Replace the translator comments, taking the new from the kill ring. | |
2410 | ||
2411 | </DL> | |
2412 | ||
2413 | <P> | |
2414 | Those commands parallel PO mode commands for modifying the translation | |
2415 | strings, and behave much the same way as them, except that they handle | |
2416 | this part of PO file comments meant for translator usage, rather | |
2417 | than the translation strings. So, the descriptions given below are | |
2418 | slightly succinct, because the full details have already been given. | |
2419 | See section <A HREF="gettext.html#SEC26">Modifying Translations</A>. | |
2420 | ||
2421 | </P> | |
2422 | <P> | |
2423 | The command <KBD>M-RET</KBD> (<CODE>po-edit-comment</CODE>) opens a new Emacs | |
2424 | window containing a copy of the translator comments of the current | |
2425 | PO file entry. If there are no such comments, PO mode | |
2426 | understands that the translator wants to add a comment to the entry, | |
2427 | and she is presented an empty screen. Comment marks (<KBD>#</KBD>) and | |
2428 | the space following them are automatically removed before editing, | |
2429 | and reinstated after. For translator comments pertaining to obsolete | |
2430 | entries, the uncommenting and recommenting operations are done twice. | |
2431 | The command <KBD>#</KBD> also has the same effect as <KBD>M-RET</KBD>, and might | |
2432 | be easier to type. Once in the editing window, the keys <KBD>C-c | |
2433 | C-c</KBD> allow the translator to tell she is finished with editing | |
2434 | the comment. | |
2435 | ||
2436 | </P> | |
2437 | <P> | |
2438 | The command <KBD>M-k</KBD> (<CODE>po-kill-comment</CODE>) gets rid of all | |
2439 | translator comments, while saving those comments on the kill ring. | |
2440 | The command <KBD>M-w</KBD> (<CODE>po-kill-ring-save-comment</CODE>) takes | |
2441 | a copy of the translator comments on the kill ring, but leaves | |
2442 | them undisturbed in the current entry. The command <KBD>M-y</KBD> | |
2443 | (<CODE>po-yank-comment</CODE>) completely replaces the translator comments | |
2444 | by a string taken at the front of the kill ring. When this command | |
2445 | is immediately repeated, the comments just inserted are withdrawn, | |
2446 | and replaced by other strings taken along the kill ring. | |
2447 | ||
2448 | </P> | |
2449 | <P> | |
2450 | On the kill ring, all strings have the same nature. There is no | |
2451 | distinction between <EM>translation</EM> strings and <EM>translator | |
2452 | comments</EM> strings. So, for example, let's presume the translator | |
2453 | has just finished editing a translation, and wants to create a new | |
2454 | translator comment documenting why the previous translation was | |
2455 | not good, just to remember what the problem was. Foreseeing that she | |
2456 | will do that in her documentation, the translator will want to quote | |
2457 | the previous translation in her translator comments. For doing so, she | |
2458 | may initialize the translator comments with the previous translation, | |
2459 | still at the head of the kill ring. Because editing already pushed the | |
2460 | previous translation on the kill ring, she just has to type <KBD>M-y</KBD> | |
2461 | prior to <KBD>#</KBD>, and the previous translation will be right there, | |
2462 | all ready for being introduced by some explanatory text. | |
2463 | ||
2464 | </P> | |
2465 | <P> | |
2466 | On the other hand, presume there are some translator comments already | |
2467 | and that the translator wants to add to those comments, instead | |
2468 | of wholly replacing them. Then, she should edit the comment right | |
2469 | away with <KBD>#</KBD>. Once inside the editing window, she can use the | |
2470 | regular GNU Emacs commands <KBD>C-y</KBD> (<CODE>yank</CODE>) and <KBD>M-y</KBD> | |
2471 | (<CODE>yank-pop</CODE>) for getting the previous translation where she likes. | |
2472 | ||
2473 | </P> | |
2474 | ||
2475 | ||
2476 | <H2><A NAME="SEC28" HREF="gettext_toc.html#TOC28">Consulting Auxiliary PO Files</A></H2> | |
2477 | ||
2478 | <P> | |
2479 | An upcoming feature of PO mode should help the knowledgeable translator | |
2480 | to take advantage of translations already achieved in other languages | |
2481 | she just happens to know, by providing these other-language translations | |
2482 | as additional context for her own work. Each PO file existing for | |
2483 | the same package the translator is working on, but targeted to a | |
2484 | different mother tongue language, is called an <STRONG>auxiliary</STRONG> PO file. | |
2485 | Commands will exist for declaring and handling auxiliary PO files, | |
2486 | and also for showing contexts for the entry under work. For this to | |
2487 | work fully, all auxiliary PO files will have to be normalized. | |
2488 | ||
2489 | </P> | |
2490 | ||
2491 | ||
2492 | <H1><A NAME="SEC29" HREF="gettext_toc.html#TOC29">Producing Binary MO Files</A></H1> | |
2493 | ||
2494 | ||
2495 | ||
2496 | <H2><A NAME="SEC30" HREF="gettext_toc.html#TOC30">Invoking the <CODE>msgfmt</CODE> Program</A></H2> | |
2497 | ||
2498 | ||
2499 | <PRE> | |
2500 | Usage: msgfmt [<VAR>option</VAR>] <VAR>filename</VAR>.po ... | |
2501 | </PRE> | |
2502 | ||
2503 | <DL COMPACT> | |
2504 | ||
2505 | <DT><SAMP>`-a <VAR>number</VAR>'</SAMP> | |
2506 | <DD> | |
2507 | <DT><SAMP>`--alignment=<VAR>number</VAR>'</SAMP> | |
2508 | <DD> | |
2509 | Align strings to <VAR>number</VAR> bytes (default: 1). | |
2510 | ||
2511 | <DT><SAMP>`-h'</SAMP> | |
2512 | <DD> | |
2513 | <DT><SAMP>`--help'</SAMP> | |
2514 | <DD> | |
2515 | Display this help and exit. | |
2516 | ||
2517 | <DT><SAMP>`-I <VAR>list</VAR>'</SAMP> | |
2518 | <DD> | |
2519 | <DT><SAMP>`--input-path=<VAR>list</VAR>'</SAMP> | |
2520 | <DD> | |
2521 | List of directories searched for input files. | |
2522 | ||
2523 | <DT><SAMP>`--no-hash'</SAMP> | |
2524 | <DD> | |
2525 | Binary file will not include the hash table. | |
2526 | ||
2527 | <DT><SAMP>`-o <VAR>file</VAR>'</SAMP> | |
2528 | <DD> | |
2529 | <DT><SAMP>`--output-file=<VAR>file</VAR>'</SAMP> | |
2530 | <DD> | |
2531 | Specify output file name as <VAR>file</VAR>. | |
2532 | ||
2533 | <DT><SAMP>`-v'</SAMP> | |
2534 | <DD> | |
2535 | <DT><SAMP>`--verbose'</SAMP> | |
2536 | <DD> | |
2537 | Detect and diagnose input file anomalies which might represent | |
2538 | translation errors. The <CODE>msgid</CODE> and <CODE>msgstr</CODE> strings are | |
2539 | studied and compared. It is considered abnormal that one string | |
2540 | starts or ends with a newline while the other does not. Also, both | |
2541 | strings should have the same number of <SAMP>`%'</SAMP> format specifiers, | |
2542 | with matching types. For example, the check will diagnose using | |
2543 | <SAMP>`%.*s'</SAMP> against <SAMP>`%s'</SAMP>, or <SAMP>`%d'</SAMP> against <SAMP>`%s'</SAMP>, or | |
2544 | <SAMP>`%d'</SAMP> against <SAMP>`%x'</SAMP>. It can even handle positional parameters. | |
2545 | ||
2546 | <DT><SAMP>`-V'</SAMP> | |
2547 | <DD> | |
2548 | <DT><SAMP>`--version'</SAMP> | |
2549 | <DD> | |
2550 | Output version information and exit. | |
2551 | ||
2552 | </DL> | |
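<P>
To make the checks performed by <SAMP>`--verbose'</SAMP> more concrete, here
is a rough Python sketch of the newline and format specifier comparisons.
It is not the <CODE>msgfmt</CODE> source: the name <CODE>check_entry</CODE>
is hypothetical, the specifier pattern is a simplified approximation of
<CODE>printf</CODE> syntax, untranslated entries are assumed to be skipped,
and positional parameters are not reordered before comparison.
</P>

```python
import re

# Simplified printf-style specifier: optional position, flags, width,
# precision and length modifier, followed by a conversion character.
_SPEC = re.compile(
    r"%(?:\d+\$)?[-+ #0]*(?:\*|\d+)?(?:\.(?:\*|\d+))?"
    r"(?:hh|h|ll|l|L|j|z|t)?[diouxXeEfgGcspn%]"
)


def check_entry(msgid: str, msgstr: str) -> list:
    """Return a list of anomaly descriptions for one entry."""
    problems = []
    if not msgstr:
        return problems  # assumption: untranslated entries are not compared
    if msgid.startswith("\n") != msgstr.startswith("\n"):
        problems.append("strings do not both begin with a newline")
    if msgid.endswith("\n") != msgstr.endswith("\n"):
        problems.append("strings do not both end with a newline")
    # A literal "%%" consumes no argument, so it is ignored.
    id_specs = [s for s in _SPEC.findall(msgid) if s != "%%"]
    str_specs = [s for s in _SPEC.findall(msgstr) if s != "%%"]
    if id_specs != str_specs:
        problems.append("format specifiers differ")
    return problems
```

<P>
Comparing whole specifiers rather than only conversion characters is what
lets this sketch tell <SAMP>`%.*s'</SAMP> from <SAMP>`%s'</SAMP>, since the
former consumes one more argument.
</P>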
2553 | ||
2554 | <P> | |
2555 | If input file is <SAMP>`-'</SAMP>, standard input is read. If output file | |
2556 | is <SAMP>`-'</SAMP>, output is written to standard output. | |
2557 | ||
2558 | </P> | |
2559 | <P> | |
2560 | The search path for <CODE>msgfmt</CODE> is <TT>`/usr/local/share/nls/src/'</TT>, | |
2561 | by default. It represents the path to additional directories where | |
2562 | other PO files can be found. This feature could be used for some | |
2563 | PO files for standard libraries, in case we would like to spare | |
2564 | translating their strings over and over again. The <SAMP>`-x'</SAMP> option | |
2565 | could then exclude these strings from the generation. | |
2566 | ||
2567 | </P> | |
2568 | ||
2569 | ||
2570 | <H2><A NAME="SEC31" HREF="gettext_toc.html#TOC31">The Format of GNU MO Files</A></H2> | |
2571 | ||
2572 | <P> | |
2573 | The format of the generated MO files is best described by a picture, | |
2574 | which appears below. | |
2575 | ||
2576 | </P> | |
2577 | <P> | |
2578 | The first two words serve to identify the file. The magic | |
2579 | number will always signal GNU MO files. The number is stored in the | |
2580 | byte order of the generating machine, so the magic number really is | |
2581 | two numbers: <CODE>0x950412de</CODE> and <CODE>0xde120495</CODE>. The second | |
2582 | word describes the current revision of the file format. For now the | |
2583 | revision is 0. This might change in future versions, and ensures | |
2584 | that the readers of MO files can distinguish new formats from old | |
2585 | ones, so that both can be handled correctly. The version is kept | |
2586 | separate from the magic number, instead of using different magic | |
2587 | numbers for different formats, mainly because <TT>`/etc/magic'</TT> is | |
2588 | not updated often. It might be better to have magic separated from | |
2589 | internal format version identification. | |
2590 | ||
2591 | </P> | |
2592 | <P> | |
2593 | A number of pointers to later tables in the file follow, allowing | |
2594 | for the extension of the prefix part of MO files without having to | |
2595 | recompile programs reading them. This might become useful for later | |
2596 | inserting a few flag bits, an indication of the charset used, new | |
2597 | tables, or other things. | |
2598 | ||
2599 | </P> | |
2600 | <P> | |
2601 | Then, at offset <VAR>O</VAR> and offset <VAR>T</VAR> in the picture, two tables | |
2602 | of string descriptors can be found. In both tables, each string | |
2603 | descriptor uses two 32-bit integers, one for the string length, | |
2604 | another for the offset of the string in the MO file, counting in bytes | |
2605 | from the start of the file. The first table contains descriptors | |
2606 | for the original strings, and is sorted so the original strings | |
2607 | are in increasing lexicographical order. The second table contains | |
2608 | descriptors for the translated strings, and is parallel to the first | |
2609 | table: to find the corresponding translation one has to access the | |
2610 | array slot in the second array with the same index. | |
2611 | ||
2612 | </P> | |
2613 | <P> | |
2614 | Having the original strings sorted enables the use of simple binary | |
2615 | search, for when the MO file does not contain a hashing table, or | |
2616 | for when it is not practical to use the hashing table provided in | |
2617 | the MO file. This also has another advantage, as the empty string | |
2618 | in a PO file is usually <EM>translated</EM> by GNU <CODE>gettext</CODE> into | |
2619 | some system information attached to that particular MO file, and the | |
2620 | empty string necessarily becomes the first in both the original and | |
2621 | translated tables, making the system information very easy to find. | |
2622 | ||
2623 | </P> | |
2624 | <P> | |
2625 | The size <VAR>S</VAR> of the hash table can be zero. In this case, the | |
2626 | hash table itself is not contained in the MO file. Some people might | |
2627 | prefer this because a precomputed hashing table takes disk space, and | |
2628 | does not gain <EM>that</EM> much speed. The hash table contains indices | |
2629 | to the sorted array of strings in the MO file. Conflict resolution is | |
2630 | done by double hashing. The precise hashing algorithm used is fairly | |
2631 | dependent on the GNU <CODE>gettext</CODE> code, and is not documented here. | |
2632 | ||
2633 | </P> | |
2634 | <P> | |
2635 | As for the strings themselves, they follow the hash table, and each | |
2636 | is terminated with a <KBD>NUL</KBD>, and this <KBD>NUL</KBD> is not counted in | |
2637 | the length which appears in the string descriptor. The <CODE>msgfmt</CODE> | |
2638 | program has an option selecting the alignment for MO file strings. | |
2639 | With this option, each string is separately aligned so it starts at | |
2640 | an offset which is a multiple of the alignment value. On some RISC | |
2641 | machines, a correct alignment will speed things up. | |
2642 | ||
2643 | </P> | |
2644 | <P> | |
2645 | Nothing prevents an MO file from having embedded <KBD>NUL</KBD>s in strings. | |
2646 | However, the program interface currently used already presumes | |
2647 | that strings are <KBD>NUL</KBD> terminated, so embedded <KBD>NUL</KBD>s are | |
2648 | somewhat useless. But the MO file format is general enough so other | |
2649 | interfaces would be possible later if, for example, we ever want to | |
2650 | implement wide characters right in MO files, where <KBD>NUL</KBD> bytes may | |
2651 | accidentally appear. | |
2652 | ||
2653 | </P> | |
2654 | <P> | |
2655 | This particular issue has been strongly debated in the GNU | |
2656 | <CODE>gettext</CODE> development forum, and it is to be expected that the MO file | |
2657 | format will evolve or change over time. It is even possible that many | |
2658 | formats may later be supported concurrently. But surely, we have to | |
2659 | start somewhere, and the MO file format described here is a good start. | |
2660 | Nothing is cast in concrete, and the format may later evolve fairly | |
2661 | easily, so we should feel comfortable with the current approach. | |
2662 | ||
2663 | </P> | |
2664 | ||
2665 | <PRE> | |
2666 | byte | |
2667 | +------------------------------------------+ | |
2668 | 0 | magic number = 0x950412de | | |
2669 | | | | |
2670 | 4 | file format revision = 0 | | |
2671 | | | | |
2672 | 8 | number of strings | == N | |
2673 | | | | |
2674 | 12 | offset of table with original strings | == O | |
2675 | | | | |
2676 | 16 | offset of table with translation strings | == T | |
2677 | | | | |
2678 | 20 | size of hashing table | == S | |
2679 | | | | |
2680 | 24 | offset of hashing table | == H | |
2681 | | | | |
2682 | . . | |
2683 | . (possibly more entries later) . | |
2684 | . . | |
2685 | | | | |
2686 | O | length & offset 0th string ----------------. | |
2687 | O + 8 | length & offset 1st string ------------------. | |
2688 | ... ... | | | |
2689 | O + ((N-1)*8)| length & offset (N-1)th string | | | | |
2690 | | | | | | |
2691 | T | length & offset 0th translation ---------------. | |
2692 | T + 8 | length & offset 1st translation -----------------. | |
2693 | ... ... | | | | | |
2694 | T + ((N-1)*8)| length & offset (N-1)th translation | | | | | | |
2695 | | | | | | | | |
2696 | H | start hash table | | | | | | |
2697 | ... ... | | | | | |
2698 | H + S * 4 | end hash table | | | | | | |
2699 | | | | | | | | |
2700 | | NUL terminated 0th string <----------------' | | | | |
2701 | | | | | | | |
2702 | | NUL terminated 1st string <------------------' | | | |
2703 | | | | | | |
2704 | ... ... | | | |
2705 | | | | | | |
2706 | | NUL terminated 0th translation <---------------' | | |
2707 | | | | | |
2708 | | NUL terminated 1st translation <-----------------' | |
2709 | | | | |
2710 | ... ... | |
2711 | | | | |
2712 | +------------------------------------------+ | |
2713 | </PRE> | |
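<P>
As a rough illustration of the layout pictured above, the following C
sketch reads the header of an MO file image already loaded into memory
and finds a translation by binary search over the sorted table of
original strings. The names <CODE>struct mo_header</CODE>,
<CODE>mo_read_header</CODE> and <CODE>mo_lookup</CODE> are assumptions of
this sketch and are not part of GNU <CODE>gettext</CODE>; the hash table
is deliberately ignored.
</P>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory copy of the header words shown in the picture. */
struct mo_header {
    int big_endian;       /* nonzero if the file was written big-endian */
    uint32_t revision;
    uint32_t nstrings;    /* N */
    uint32_t orig_off;    /* O */
    uint32_t trans_off;   /* T */
    uint32_t hash_size;   /* S */
    uint32_t hash_off;    /* H */
};

/* Read one 32-bit word at p in the given byte order. */
static uint32_t mo_u32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
             | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}

/* Fill *h from the first 28 bytes; return 0 on success, -1 otherwise. */
int mo_read_header(const unsigned char *buf, size_t len, struct mo_header *h)
{
    if (len < 28)
        return -1;
    if (mo_u32(buf, 0) == 0x950412deu)
        h->big_endian = 0;
    else if (mo_u32(buf, 1) == 0x950412deu)
        h->big_endian = 1;
    else
        return -1;                      /* not a GNU MO file */
    h->revision  = mo_u32(buf + 4,  h->big_endian);
    h->nstrings  = mo_u32(buf + 8,  h->big_endian);
    h->orig_off  = mo_u32(buf + 12, h->big_endian);
    h->trans_off = mo_u32(buf + 16, h->big_endian);
    h->hash_size = mo_u32(buf + 20, h->big_endian);
    h->hash_off  = mo_u32(buf + 24, h->big_endian);
    return 0;
}

/* Return the string named by descriptor `index` of the table at
   `table_off`, or NULL if the descriptor points outside the buffer. */
static const char *mo_string(const unsigned char *buf, size_t len,
                             const struct mo_header *h,
                             uint32_t table_off, uint32_t index)
{
    uint64_t d = (uint64_t)table_off + (uint64_t)index * 8;
    uint32_t slen, soff;

    if (d + 8 > len)
        return NULL;
    slen = mo_u32(buf + d, h->big_endian);
    soff = mo_u32(buf + d + 4, h->big_endian);
    if ((uint64_t)soff + slen >= len)   /* room for the trailing NUL */
        return NULL;
    if (buf[(size_t)soff + slen] != '\0')
        return NULL;
    return (const char *)(buf + soff);
}

/* Binary search in the sorted original table; the translation sits at
   the same index in the second table.  Returns NULL if not found. */
const char *mo_lookup(const unsigned char *buf, size_t len,
                      const struct mo_header *h, const char *msgid)
{
    uint32_t lo = 0, hi = h->nstrings;

    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        const char *orig = mo_string(buf, len, h, h->orig_off, mid);
        int cmp;

        if (orig == NULL)
            return NULL;
        cmp = strcmp(msgid, orig);
        if (cmp == 0)
            return mo_string(buf, len, h, h->trans_off, mid);
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

<P>
Looking up the empty string with <CODE>mo_lookup</CODE> yields the system
information entry, since the empty string sorts first in both tables.
</P>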
2714 | ||
2715 | ||
2716 | ||
2717 | <H1><A NAME="SEC32" HREF="gettext_toc.html#TOC32">The User's View</A></H1> | |
2718 | ||
2719 | <P> | |
2720 | When GNU <CODE>gettext</CODE> has truly reached its goal, average users | |
2721 | should feel some kind of astonished pleasure, seeing the effect of | |
2722 | that strange kind of magic that just makes their own native language | |
2723 | appear everywhere on their screens. As for naive users, they would | |
2724 | ideally have no special pleasure about it, merely taking their own | |
2725 | language for <EM>granted</EM>, and becoming rather unhappy otherwise. | |
2726 | ||
2727 | </P> | |
2728 | <P> | |
2729 | So, let's try to describe here how we would like the magic to operate, | |
2730 | as we want the users' view to be the simplest, among all ways one | |
2731 | could look at GNU <CODE>gettext</CODE>. All other software engineers: | |
2732 | programmers, translators, maintainers, should work together in such a | |
2733 | way that the magic becomes possible. This is a long and progressive | |
2734 | undertaking, and information is available about the progress of the | |
2735 | GNU Translation Project. | |
2736 | ||
2737 | </P> | |
2738 | <P> | |
2739 | When a package is distributed, there are two kinds of users: | |
2740 | <STRONG>installers</STRONG> who fetch the distribution, unpack it, configure | |
2741 | it, compile it and install it for themselves or others to use; and | |
2742 | <STRONG>end users</STRONG> who call programs of the package, once these have | |
2743 | been installed at their site. GNU <CODE>gettext</CODE> is offering magic | |
2744 | for both installers and end users. | |
2745 | ||
2746 | </P> | |
2747 | ||
2748 | ||
2749 | ||
2750 | <H2><A NAME="SEC33" HREF="gettext_toc.html#TOC33">The Current <TT>`NLS'</TT> Matrix for GNU</A></H2> | |
2751 | ||
2752 | <P> | |
2753 | Languages are not equally supported in all GNU packages. To know | |
2754 | if some GNU package uses GNU <CODE>gettext</CODE>, one may check | |
2755 | the distribution for the <TT>`NLS'</TT> information file, for some | |
2756 | <TT>`<VAR>ll</VAR>.po'</TT> files, often kept together into some <TT>`po/'</TT> | |
2757 | directory, or for an <TT>`intl/'</TT> directory. Internationalized | |
2758 | packages usually have many <TT>`<VAR>ll</VAR>.po'</TT> files, where <VAR>ll</VAR> | |
2759 | represents the language. See section <A HREF="gettext.html#SEC35">Magic for End Users</A> for a complete description | |
2760 | of the format for <VAR>ll</VAR>. | |
2761 | ||
2762 | </P> | |
2763 | <P> | |
2764 | More generally, a matrix is available for showing the current state | |
2765 | of GNU internationalization, listing which packages are prepared | |
2766 | for multi-lingual messages, and which languages are supported by each. | |
2767 | Because this information changes often, this matrix is not kept within | |
2768 | this GNU <CODE>gettext</CODE> manual. This information is often found in | |
2769 | file <TT>`NLS'</TT> from various GNU distributions, but is also as old | |
2770 | as the distribution itself. A recent copy of this <TT>`NLS'</TT> file, | |
2771 | containing up-to-date information, should generally be found on most | |
2772 | GNU archive sites. | |
2773 | ||
2774 | </P> | |
2775 | ||
2776 | ||
2777 | <H2><A NAME="SEC34" HREF="gettext_toc.html#TOC34">Magic for Installers</A></H2> | |
2778 | ||
2779 | <P> | |
2780 | By default, packages fully using GNU <CODE>gettext</CODE>, internally, | |
2781 | are installed in such a way that they allow translation of | |
2782 | messages. At <EM>configuration</EM> time, those packages should | |
2783 | automatically detect whether the underlying host system provides usable | |
2784 | <CODE>catgets</CODE> or <CODE>gettext</CODE> functions. If neither is present, | |
2785 | the GNU <CODE>gettext</CODE> library should be automatically prepared | |
2786 | and used. Installers may use special options at configuration | |
2787 | time for changing this behavior. The command <SAMP>`./configure | |
2788 | --with-gnu-gettext'</SAMP> bypasses system <CODE>catgets</CODE> or <CODE>gettext</CODE> to | |
2789 | use GNU <CODE>gettext</CODE> instead, while <SAMP>`./configure --disable-nls'</SAMP> | |
2790 | produces programs totally unable to translate messages. | |
2791 | ||
2792 | </P> | |
2793 | <P> | |
2794 | Internationalized packages usually have many <TT>`<VAR>ll</VAR>.po'</TT> | |
2795 | files. Unless | |
2796 | translations are disabled, all those available are installed together | |
2797 | with the package. However, the environment variable <CODE>LINGUAS</CODE> | |
2798 | may be set, prior to configuration, to limit the installed set. | |
2799 | <CODE>LINGUAS</CODE> should then contain a space-separated list of two-letter | |
2800 | codes, stating which languages are allowed. | |
2801 | ||
2802 | </P> | |
2803 | ||
2804 | ||
2805 | <H2><A NAME="SEC35" HREF="gettext_toc.html#TOC35">Magic for End Users</A></H2> | |
2806 | ||
2807 | <P> | |
2808 | We consider here those packages using GNU <CODE>gettext</CODE> internally, | |
2809 | and for which the installers did not disable translation at | |
2810 | <EM>configure</EM> time. Then, users only have to set the <CODE>LANG</CODE> | |
2811 | environment variable to the appropriate <SAMP>`<VAR>ll</VAR>'</SAMP> prior to | |
2812 | using the programs in the package. See section <A HREF="gettext.html#SEC33">The Current <TT>`NLS'</TT> Matrix for GNU</A>. For example, | |
2813 | let's presume a German site. At the shell prompt, users merely have to | |
2814 | execute <SAMP>`setenv LANG de'</SAMP> (in <CODE>csh</CODE>) or <SAMP>`export | |
2815 | LANG; LANG=de'</SAMP> (in <CODE>sh</CODE>). They could even do this from their | |
2816 | <TT>`.login'</TT> or <TT>`.profile'</TT> file. | |
2817 | ||
2818 | </P> | |
2819 | ||
2820 | ||
2821 | <H1><A NAME="SEC36" HREF="gettext_toc.html#TOC36">The Programmer's View</A></H1> | |
2822 | ||
2823 | <P> | |
2824 | One aim of the current message catalog implementation provided by | |
2825 | GNU <CODE>gettext</CODE> was to use the system's message catalog handling, if the | |
2826 | installer wishes to do so. So we perhaps should first take a look at | |
2827 | the solutions we know about. The members of the POSIX committee did not | |
2828 | manage to agree on one of the semi-official standards which we'll | |
2829 | describe below. In fact they couldn't agree on anything, so they | |
2830 | decided only to include an example of an interface. The major Unix vendors | |
2831 | are split in the usage of the two most important specifications: X/Open's | |
2832 | catgets vs. Uniforum's gettext interface. We'll describe them both and | |
2833 | later explain our solution to this dilemma. | |
2834 | ||
2835 | </P> | |
2836 | ||
2837 | ||
2838 | ||
2839 | <H2><A NAME="SEC37" HREF="gettext_toc.html#TOC37">About <CODE>catgets</CODE></A></H2> | |
2840 | ||
2841 | <P> | |
2842 | The <CODE>catgets</CODE> implementation is defined in the X/Open Portability | |
2843 | Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the | |
2844 | process of creating this standard seemed to be too slow for some of | |
2845 | the Unix vendors so they based their implementations on preliminary | |
2846 | versions of the standard. Of course this leads again to problems while | |
2847 | writing platform independent programs: even the usage of <CODE>catgets</CODE> | |
2848 | does not guarantee a uniform interface. | |
2849 | ||
2850 | </P> | |
2851 | <P> | |
2852 | Another, personal comment on this: only a bunch of committee members | |
2853 | could have made this interface. They never really tried to program | |
2854 | using this interface. It is a fast, memory-saving implementation, a | |
2855 | user can happily live with it. But programmers hate it (at least I and | |
2856 | some others do...) | |
2857 | ||
2858 | </P> | |
2859 | <P> | |
2860 | But we must not forget one point: after all the trouble with transferring | |
2861 | the rights on Unix(tm) they at last came to X/Open, the very same who | |
2862 | published this specification. This leads me to making the prediction | |
2863 | that this interface will be in future Unix standards (e.g. Spec1170) and | |
2864 | therefore part of all Unix implementations (implementations, which are | |
2865 | <EM>allowed</EM> to wear this name). | |
2866 | ||
2867 | </P> | |
2868 | ||
2869 | ||
2870 | ||
2871 | <H3><A NAME="SEC38" HREF="gettext_toc.html#TOC38">The Interface</A></H3> | |
2872 | ||
2873 | <P> | |
2874 | The interface to the <CODE>catgets</CODE> implementation consists of three | |
2875 | functions which correspond to those used in file access: <CODE>catopen</CODE> | |
2876 | to open the catalog for use, <CODE>catgets</CODE> for accessing the message | |
2877 | tables, and <CODE>catclose</CODE> for closing after work is done. Prototypes | |
2878 | for the functions and the needed definitions are in the | |
2879 | <CODE><nl_types.h></CODE> header file. | |
2880 | ||
2881 | </P> | |
2882 | <P> | |
2883 | <CODE>catopen</CODE> is used like this: | |
2884 | ||
2885 | </P> | |
2886 | ||
2887 | <PRE> | |
2888 | nl_catd catd = catopen ("catalog_name", 0); | |
2889 | </PRE> | |
2890 | ||
2891 | <P> | |
2892 | The function takes as the argument the name of the catalog. This usually | |
2893 | refers to the name of the program or the package. The second parameter | |
2894 | is not further specified in the standard. I don't even know whether it | |
2895 | is implemented consistently among various systems. So the common advice | |
2896 | is to use <CODE>0</CODE> as the value. The return value is a handle to the | |
2897 | message catalog, equivalent to the file handles returned by <CODE>open</CODE>. | |
2898 | ||
2899 | </P> | |
2900 | <P> | |
2901 | This handle is of course used in the <CODE>catgets</CODE> function which can | |
2902 | be used like this: | |
2903 | ||
2904 | </P> | |
2905 | ||
2906 | <PRE> | |
2907 | char *translation = catgets (catd, set_no, msg_id, "original string"); | |
2908 | </PRE> | |
2909 | ||
2910 | <P> | |
2911 | The first parameter is this catalog descriptor. The second parameter | |
2912 | specifies the set of messages in this catalog, from which the message | |
2913 | described by <CODE>msg_id</CODE> is obtained. <CODE>catgets</CODE> therefore uses a | |
2914 | three-stage addressing: | |
2915 | ||
2916 | </P> | |
2917 | ||
2918 | <PRE> | |
2919 | catalog name => set number => message ID => translation | |
2920 | </PRE> | |
2921 | ||
2922 | <P> | |
2923 | The fourth argument is not used to address the translation. It is given | |
2924 | as a default value in case one of the addressing stages fails. One | |
2925 | important thing to remember is that although the return type of <CODE>catgets</CODE> | |
2926 | is <CODE>char *</CODE> the resulting string <EM>must not</EM> be changed. It | |
2927 | should better be <CODE>const char *</CODE>, but the standard was published in | |
2928 | 1988, one year before ANSI C. | |
2929 | ||
2930 | </P> | |
2931 | <P> | |
2932 | The last of these functions is used and behaves as expected: | |
2933 | ||
2934 | </P> | |
2935 | ||
2936 | <PRE> | |
2937 | catclose (catd); | |
2938 | </PRE> | |
2939 | ||
2940 | <P> | |
2941 | After this no <CODE>catgets</CODE> call using the descriptor is legal anymore. | |
2942 | ||
2943 | </P> | |
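<P>
Putting the three calls together, a minimal sketch might look as
follows. The helper name <CODE>catalog_message</CODE> and its parameters
are hypothetical; only <CODE>catopen</CODE>, <CODE>catgets</CODE> and
<CODE>catclose</CODE> come from <CODE><nl_types.h></CODE>. The sketch
copies the message before closing the catalog, on the cautious assumption
that the returned string may no longer be valid afterwards.
</P>

```c
#include <nl_types.h>
#include <stdio.h>

/* Copy message (set_no, msg_id) of the named catalog into out, falling
   back to `fallback` when the catalog or the message cannot be found. */
void catalog_message(const char *catalog_name, int set_no, int msg_id,
                     const char *fallback, char *out, size_t outlen)
{
    nl_catd catd = catopen(catalog_name, 0);

    if (catd == (nl_catd) -1) {
        /* No catalog available: use the default string directly. */
        snprintf(out, outlen, "%s", fallback);
        return;
    }
    /* Copy the result before the catalog is closed. */
    snprintf(out, outlen, "%s", catgets(catd, set_no, msg_id, fallback));
    catclose(catd);
}
```

<P>
A caller would pass the same default text it would otherwise print, for
example <CODE>catalog_message ("catalog_name", 1, 1, "original string",
buf, sizeof buf)</CODE>.
</P>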
2944 | ||
2945 | ||
2946 | <H3><A NAME="SEC39" HREF="gettext_toc.html#TOC39">Problems with the <CODE>catgets</CODE> Interface?!</A></H3> | |
2947 | ||
2948 | <P> | |
2949 | Now that this description seemed to be really easy, where are the | |
2950 | problems we speak of? In fact the interface could be used in a | |
2951 | reasonable way, but constructing the message catalogs is a pain. The | |
2952 | reason for this lies in the third argument of <CODE>catgets</CODE>: the unique | |
2953 | message ID. This has to be a numeric value for all messages in a single | |
2954 | set. Perhaps you could imagine the problems keeping such a list while | |
2955 | changing the source code. Add a new message here, remove one there. Of | |
2956 | course a lot of tools have been developed to help organize this | |
2957 | chaos, but each of them fails in one aspect or another. We don't | |
2958 | want to say that the other approach has no problems but they are far | |
2959 | easier to manage. | |
2960 | ||
2961 | </P> | |
2962 | ||
2963 | ||
2964 | <H2><A NAME="SEC40" HREF="gettext_toc.html#TOC40">About <CODE>gettext</CODE></A></H2> | |
2965 | ||
2966 | <P> | |
2967 | The definition of the <CODE>gettext</CODE> interface comes from a Uniforum | |
2968 | proposal and it is followed by at least one major Unix vendor | |
2969 | (Sun) in its latest developments. It is not specified in any official | |
2970 | standard, though. | |
2971 | ||
2972 | </P> | |
2973 | <P> | |
2974 | The main points about this solution are that it does not follow the | |
2975 | method of normal file handling (open-use-close) and that it does not | |
2976 | burden the programmer with so many tasks, especially the unique key handling. | |
2977 | Of course a unique key is needed here too, but this key is the | |
2978 | message itself (however long or short it is). See section <A HREF="gettext.html#SEC45">Comparing the Two Interfaces</A> for a | |
2979 | more detailed comparison of the two methods. | |
2980 | ||
2981 | </P> | |
2982 | <P> | |
2983 | The following section contains a rather detailed description of the | |
2984 | interface. We make it that detailed because this is the interface | |
2985 | we chose for the GNU <CODE>gettext</CODE> Library. Programmers interested | |
2986 | in using this library will be interested in this description. | |
2987 | ||
2988 | </P> | |
2989 | ||
2990 | ||
2991 | ||
2992 | <H3><A NAME="SEC41" HREF="gettext_toc.html#TOC41">The Interface</A></H3> | |
2993 | ||
2994 | <P> | |
2995 | The minimal functionality an interface must have is a) to select a | |
2996 | domain the strings are coming from (a single domain for all programs is | |
2997 | not reasonable because its construction and maintenance is difficult, | |
2998 | perhaps impossible) and b) to access a string in a selected domain. | |
2999 | ||
3000 | </P> | |
3001 | <P> | |
3002 | This is basically the description of the <CODE>gettext</CODE> interface. It | |
3003 | has a global domain which unqualified usages reference. Of course this | |
3004 | domain is selectable by the user. | |
3005 | ||
3006 | </P> | |
3007 | ||
3008 | <PRE> | |
3009 | char *textdomain (const char *domain_name); | |
3010 | </PRE> | |
3011 | ||
3012 | <P> | |
3013 | This provides the possibility to change or query the current status of | |
3014 | the current global domain of the <CODE>LC_MESSAGES</CODE> category. The | |
3015 | argument is a null-terminated string, whose characters must be legal | |
3016 | for use in filenames. If the <VAR>domain_name</VAR> argument is <CODE>NULL</CODE>, | |
3017 | the function returns the current value. If no value has been set | |
3018 | before, the name of the default domain is returned: <EM>messages</EM>. | |
3019 | Please note that although the return value of <CODE>textdomain</CODE> is of | |
3020 | type <CODE>char *</CODE> no changing is allowed. It is also important to know | |
3021 | that no checks of the availability are made. If the name is not | |
3022 | available you will see this by the fact that no translations are provided. | |
3023 | ||
3024 | </P> | |
3025 | <P> | |
3026 | To use a domain set by <CODE>textdomain</CODE> the function | |
3027 | ||
3028 | </P> | |
3029 | ||
3030 | <PRE> | |
3031 | char *gettext (const char *msgid); | |
3032 | </PRE> | |
3033 | ||
3034 | <P> | |
3035 | is to be used. This is the simplest reasonable form one can imagine. | |
3036 | The translation of the string <VAR>msgid</VAR> is returned if it is available | |
3037 | in the current domain. If not available the argument itself is | |
3038 | returned. If the argument is <CODE>NULL</CODE> the result is undefined. | |
3039 | ||
3040 | </P> | |
3041 | <P> | |
3042 | One thing which should come to mind is that no explicit dependency on | |
3043 | the used domain is given. The current value of the domain for the | |
3044 | <CODE>LC_MESSAGES</CODE> locale is used. If this changes between two | |
3045 | executions of the same <CODE>gettext</CODE> call in the program, the two calls | |
3046 | reference different message catalogs. | |
3047 | ||
3048 | </P> | |
3049 | <P> | |
3050 | For the easiest case, which is normally used in internationalized GNU | |
3051 | packages, once at the beginning of execution a call to <CODE>textdomain</CODE> | |
3052 | is issued, setting the domain to a unique name, normally the package | |
3053 | name. In the following code all strings which have to be translated are | |
3054 | filtered through the <CODE>gettext</CODE> function. That's all, the package speaks | |
3055 | your language. | |
3056 | ||
3057 | </P> | |
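<P>
The easiest case described above can be sketched in a few lines of C.
The helper names <CODE>init_nls</CODE> and <CODE>greeting</CODE>, and the
package name passed in, are hypothetical; the call to
<CODE>setlocale</CODE> is an assumption of this sketch about how the
program picks up the user's locale.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* Called once at the beginning of execution: pick up the user's locale
   and select the package's own domain as the global domain. */
void init_nls(const char *package_name)
{
    setlocale(LC_ALL, "");
    textdomain(package_name);
}

/* Every translatable string is filtered through gettext.  When no
   translation is available, the argument itself comes back. */
const char *greeting(void)
{
    return gettext("Hello world");
}
```

<P>
After <CODE>init_nls</CODE> has run, <CODE>textdomain (NULL)</CODE>
reports the selected domain, and <CODE>greeting</CODE> returns the
original English text whenever no catalog for that domain is installed.
</P>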
3058 | ||
3059 | ||
3060 | <H3><A NAME="SEC42" HREF="gettext_toc.html#TOC42">Solving Ambiguities</A></H3> | |
3061 | ||
3062 | <P> | |
3063 | While this single name domain works well for most applications there | |
3064 | might be the need to get translations from more than one domain. Of | |
3065 | course one could switch between different domains with calls to | |
3066 | <CODE>textdomain</CODE>, but this is really not convenient nor is it fast. A | |
3067 | possible situation is one case under discussion at the time of writing: all | |
3068 | error messages of functions in the set of commonly used functions should | |
3069 | go into a separate domain <CODE>error</CODE>. By this means we would only need | |
3070 | to translate them once. | |
3071 | ||
3072 | </P> | |
3073 | <P> | |
3074 | For this reason there are two more functions to retrieve strings: | |
3075 | ||
3076 | </P> | |
3077 | ||
3078 | <PRE> | |
3079 | char *dgettext (const char *domain_name, const char *msgid); | |
3080 | char *dcgettext (const char *domain_name, const char *msgid, | |
3081 | int category); | |
3082 | </PRE> | |
3083 | ||
3084 | <P> | |
3085 | Both take an additional argument in first position, which corresponds | |
3086 | to the argument of <CODE>textdomain</CODE>. The third argument of | |
3087 | <CODE>dcgettext</CODE> allows the use of a locale category other than <CODE>LC_MESSAGES</CODE>. | |
3088 | But I really don't know where this can be useful. If the | |
3089 | <VAR>domain_name</VAR> is <CODE>NULL</CODE> or <VAR>category</VAR> has a value other than | |
3090 | the known ones, the result is undefined. It should also be noted that | |
3091 | this function is not part of the second known implementation of this | |
3092 | function family, the one found in Solaris. | |
3093 | ||
3094 | </P> | |
3095 | <P> | |
3096 | A second ambiguity can arise from the fact that perhaps more than one | |
3097 | domain has the same name. This can be solved by specifying where the | |
3098 | needed message catalog files can be found. | |
3099 | ||
3100 | </P> | |
3101 | ||
3102 | <PRE> | |
3103 | char *bindtextdomain (const char *domain_name, | |
3104 | const char *dir_name); | |
3105 | </PRE> | |
3106 | ||
3107 | <P> | |
3108 | Calling this function binds the given domain to a file in the specified | |
3109 | directory (how this file is determined follows below). In particular, a file in | |
3110 | the system's default place is no longer favored over the specified file | |
3111 | (as it would be by solely using <CODE>textdomain</CODE>). A <CODE>NULL</CODE> | |
3112 | pointer for the <VAR>dir_name</VAR> parameter returns the binding associated | |
3113 | with <VAR>domain_name</VAR>. If <VAR>domain_name</VAR> itself is <CODE>NULL</CODE> | |
3114 | nothing happens and a <CODE>NULL</CODE> pointer is returned. Here again, as | |
3115 | for all the other functions, none of the return values may | |
3116 | be changed! | |
3117 | ||
3118 | </P> | |
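<P>
As an illustration of the separate <CODE>error</CODE> domain mentioned
above, the following sketch binds that domain to a directory and fetches
strings from it without touching the global domain. The helper names
<CODE>bind_error_domain</CODE> and <CODE>error_text</CODE> are
hypothetical, and any directory passed in is the caller's choice.
</P>

```c
#include <libintl.h>

/* Tell the library where the catalogs of the "error" domain live.
   Passing NULL only queries the current binding. */
const char *bind_error_domain(const char *dir_name)
{
    return bindtextdomain("error", dir_name);
}

/* Look up msgid in the "error" domain, leaving the global domain set
   by textdomain untouched.  Without a catalog, msgid is returned. */
const char *error_text(const char *msgid)
{
    return dgettext("error", msgid);
}
```

<P>
The strings returned by both helpers belong to the library and must not
be modified, which is why the sketch exposes them as
<CODE>const char *</CODE>.
</P>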
3119 | ||
3120 | ||
3121 | <H3><A NAME="SEC43" HREF="gettext_toc.html#TOC43">Locating Message Catalog Files</A></H3> | |
3122 | ||
3123 | <P> | |
3124 | Because many different languages for many different packages have to be | |
3125 | stored we need some way to attach this information to the message catalog | |
3126 | files. The way usually used in Unix environments is to encode it | |
3127 | in the file name. This is also done here. The directory name given in | |
3128 | <CODE>bindtextdomain</CODE>'s second argument (or the default directory), | |
3129 | followed by the value and name of the locale and the domain name are | |
3130 | concatenated: | |
3131 | ||
3132 | </P> | |
3133 | ||
3134 | <PRE> | |
3135 | <VAR>dir_name</VAR>/<VAR>locale</VAR>/LC_<VAR>category</VAR>/<VAR>domain_name</VAR>.mo | |
3136 | </PRE> | |
3137 | ||
3138 | <P> | |
3139 | The default value for <VAR>dir_name</VAR> is system specific. For the GNU | |
3140 | library it's: | |
3141 | ||
3142 | <PRE> | |
3143 | /usr/local/share/locale | |
3144 | </PRE> | |
3145 | ||
3146 | <P> | |
3147 | <VAR>locale</VAR> is the value of the locale whose name is this | |
3148 | <CODE>LC_<VAR>category</VAR></CODE>. For <CODE>gettext</CODE> and <CODE>dgettext</CODE> this | |
3149 | locale is always <CODE>LC_MESSAGES</CODE>. <CODE>dcgettext</CODE> specifies the | |
3150 | locale by the third argument.<A NAME="DOCF2" HREF="gettext_foot.html#FOOT2">(2)</A> <A NAME="DOCF3" HREF="gettext_foot.html#FOOT3">(3)</A> | |
3151 | ||
3152 | </P> | |
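<P>
The concatenation rule can be made concrete with a short sketch. The
function <CODE>catalog_path</CODE> is hypothetical and only mirrors the
pattern shown above; it does not reproduce the search logic of the
library itself. Its <VAR>category</VAR> argument is the part after
<CODE>LC_</CODE>, for example <CODE>MESSAGES</CODE>.
</P>

```c
#include <stdio.h>

/* Build dir_name/locale/LC_category/domain_name.mo in out.
   Returns 0 on success, -1 if the result does not fit in outlen bytes. */
int catalog_path(char *out, size_t outlen, const char *dir_name,
                 const char *locale, const char *category,
                 const char *domain_name)
{
    int n = snprintf(out, outlen, "%s/%s/LC_%s/%s.mo",
                     dir_name, locale, category, domain_name);

    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

<P>
With the default directory, the locale <SAMP>`de'</SAMP> and a domain
named <SAMP>`hello'</SAMP>, this gives
<TT>`/usr/local/share/locale/de/LC_MESSAGES/hello.mo'</TT>.
</P>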
3153 | ||
3154 | ||
3155 | <H3><A NAME="SEC44" HREF="gettext_toc.html#TOC44">Optimization of the *gettext functions</A></H3> | |
3156 | ||
3157 | <P> | |
3158 | At this point of the discussion we should talk about an advantage of the | |
3159 | GNU <CODE>gettext</CODE> implementation. Some readers might have pointed out | |
3160 | that an internationalized program might perform poorly if some | |
3161 | string has to be translated in an inner loop. While this is unavoidable | |
3162 | when the string varies from one run of the loop to the other it is | |
3163 | simply a waste of time when the string is always the same. Take the | |
3164 | following example: | |
3165 | ||
3166 | </P> | |
3167 | ||
3168 | <PRE> | |
3169 | { | |
3170 | while (...) | |
3171 | { | |
3172 | puts (gettext ("Hello world")); | |
3173 | } | |
3174 | } | |
3175 | </PRE> | |
3176 | ||
3177 | <P> | |
3178 | When the locale selection does not change between two runs the resulting | |
3179 | string is always the same. One way to use this is: | |
3180 | ||
3181 | </P> | |
3182 | ||
3183 | <PRE> | |
3184 | { | |
3185 | const char *str = gettext ("Hello world"); | |
3186 | while (...) | |
3187 | { | |
3188 | puts (str); | |
3189 | } | |
3190 | } | |
3191 | </PRE> | |
3192 | ||
3193 | <P> | |
3194 | But this solution is not usable in all situations (e.g. when the locale | |
3195 | selection changes) nor is it very readable. | |
3196 | ||
3197 | </P> | |
3198 | <P> | |
3199 | The GNU C compiler, version 2.7 and above, provides another solution for | |
3200 | this. To describe this we show here some lines of the | |
3201 | <TT>`intl/libgettext.h'</TT> file. For an explanation of the expression | |
3202 | command block see section `Statements and Declarations in Expressions' in <CITE>The GNU CC Manual</CITE>. | |
3203 | ||
3204 | </P> | |
3205 | ||
3206 | <PRE> | |
3207 | # if defined __GNUC__ && __GNUC__ == 2 && __GNUC_MINOR__ >= 7 | |
3208 | # define dcgettext(domainname, msgid, category) \ | |
3209 | (__extension__ \ | |
3210 | ({ \ | |
3211 | char *result; \ | |
3212 | if (__builtin_constant_p (msgid)) \ | |
3213 | { \ | |
3214 | extern int _nl_msg_cat_cntr; \ | |
3215 | static char *__translation__; \ | |
3216 | static int __catalog_counter__; \ | |
3217 | if (! __translation__ \ | |
3218 | || __catalog_counter__ != _nl_msg_cat_cntr) \ | |
3219 | { \ | |
3220 | __translation__ = \ | |
3221 | dcgettext__ ((domainname), (msgid), (category)); \ | |
3222 | __catalog_counter__ = _nl_msg_cat_cntr; \ | |
3223 | } \ | |
3224 | result = __translation__; \ | |
3225 | } \ | |
3226 | else \ | |
3227 | result = dcgettext__ ((domainname), (msgid), (category)); \ | |
3228 | result; \ | |
3229 | })) | |
3230 | # endif | |
3231 | </PRE> | |
3232 | ||
3233 | <P> | |
3234 | The interesting thing here is the <CODE>__builtin_constant_p</CODE> predicate. | |
3235 | This is evaluated at compile time and so optimization can take place | |
3236 | immediately. Here two cases are distinguished. If the argument to | |
3237 | <CODE>gettext</CODE> is not a constant value, the function | |
3238 | <CODE>dcgettext__</CODE> is simply called, the real implementation of the | |
3239 | <CODE>dcgettext</CODE> function. | |
3240 | ||
3241 | </P> | |
3242 | <P> | |
3243 | If the string argument <EM>is</EM> constant we can reuse the translation | |
3244 | once obtained, as long as the locale selection has not changed. This is exactly | |
3245 | what is done here. The <CODE>_nl_msg_cat_cntr</CODE> variable is defined in | |
3246 | the file <TT>`loadmsgcat.c'</TT> which is available in <TT>`libintl.a'</TT> and is | |
3247 | changed whenever a new message catalog is loaded. | |
3248 | ||
3249 | </P> | |
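<P>
The caching idea behind the macro can also be shown in portable C,
without statement expressions or <CODE>__builtin_constant_p</CODE>. This
is a sketch of the technique only, not the library's code: the variable
<CODE>catalog_generation</CODE> is a stand-in for
<CODE>_nl_msg_cat_cntr</CODE>, and <CODE>slow_lookup</CODE> is a
hypothetical stand-in for the real lookup function that simply returns
its argument and counts how often it is called.
</P>

```c
#include <stddef.h>

/* Stand-in for _nl_msg_cat_cntr: incremented whenever a new message
   catalog would be loaded. */
static int catalog_generation = 0;

/* Counts real lookups so the effect of the cache can be observed. */
static int lookup_calls = 0;

/* Hypothetical stand-in for the expensive catalog lookup. */
static const char *slow_lookup(const char *msgid)
{
    lookup_calls++;
    return msgid;
}

/* One cache slot per call site, playing the role of the static
   variables declared inside the macro. */
struct msg_cache {
    const char *translation;
    int generation;
};

/* Reuse the cached translation unless the catalogs have changed
   since it was obtained. */
const char *cached_lookup(struct msg_cache *cache, const char *msgid)
{
    if (cache->translation == NULL
        || cache->generation != catalog_generation) {
        cache->translation = slow_lookup(msgid);
        cache->generation = catalog_generation;
    }
    return cache->translation;
}
```

<P>
Calling <CODE>cached_lookup</CODE> repeatedly with the same slot performs
one real lookup until <CODE>catalog_generation</CODE> changes, which is
the behavior the macro obtains for constant strings.
</P>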
3250 | ||
3251 | ||
3252 | <H2><A NAME="SEC45" HREF="gettext_toc.html#TOC45">Comparing the Two Interfaces</A></H2> | |
3253 | ||
3254 | <P> | |
3255 | The following discussion is perhaps a little bit biased. As said | |
3256 | above, we implemented GNU <CODE>gettext</CODE> following the Uniforum | |
3257 | proposal, and we surely had our reasons. This discussion should show how we | |
3258 | came to this decision. | |
3259 | ||
3260 | </P> | |
3261 | <P> | |
3262 | First we take a look at the development process. When we write an | |
3263 | application using NLS provided by <CODE>gettext</CODE> we proceed as always. | |
3264 | Only when we come to a string which might be seen by the users, and thus | |
3265 | has to be translated, do we use <CODE>gettext("...")</CODE> instead of | |
3266 | <CODE>"..."</CODE>. At the beginning of each source file (or in a central | |
3267 | header file) we define | |
3268 | ||
3269 | </P> | |
3270 | ||
3271 | <PRE> | |
3272 | #define gettext(String) (String) | |
3273 | </PRE> | |
3274 | ||
3275 | <P> | |
3276 | Even this definition can be avoided when the system supports the | |
3277 | <CODE>gettext</CODE> function in its C library. When we compile this code the | |
3278 | result is the same as if no NLS code were used. When you take a look at | |
3279 | the GNU <CODE>gettext</CODE> code you will see that we use <CODE>_("...")</CODE> | |
3280 | instead of <CODE>gettext("...")</CODE>. This reduces the number of | |
3281 | additional characters per translatable string to <EM>3</EM> (in words: | |
3282 | three). | |
3283 | ||
3284 | </P> | |
3285 | <P> | |
3286 | When a production version of the program is needed, we simply replace | |
3287 | the definition | |
3288 | ||
3289 | </P> | |
3290 | ||
3291 | <PRE> | |
3292 | #define _(String) (String) | |
3293 | </PRE> | |
3294 | ||
3295 | <P> | |
3296 | by | |
3297 | ||
3298 | </P> | |
3299 | ||
3300 | <PRE> | |
3301 | #include <libintl.h> | |
3302 | #define _(String) gettext (String) | |
3303 | </PRE> | |
3304 | ||
3305 | <P> | |
3306 | and include the header <TT>`libintl.h'</TT>. Additionally we run the | |
3307 | program <TT>`xgettext'</TT> on all source code files which contain | |
3308 | translatable strings and we are done. We have a running program which | |
3309 | does not depend on translations being available, but which can use any | |
3310 | that become available. | |
3311 | ||
3312 | </P> | |
3313 | <P> | |
3314 | The same procedure can be done for the <CODE>gettext_noop</CODE> invocations | |
3315 | (see section <A HREF="gettext.html#SEC17">Special Cases of Translatable Strings</A>). First you can define <CODE>gettext_noop</CODE> to a | |
3316 | no-op macro and later use the definition from <TT>`libintl.h'</TT>. Because | |
3317 | this name is not used in Sun's implementation of <TT>`libintl.h'</TT>, | |
3318 | you should consider the following code for your project: | |
3319 | ||
3320 | </P> | |
3321 | ||
3322 | <PRE> | |
3323 | #ifdef gettext_noop | |
3324 | # define N_(Str) gettext_noop (Str) | |
3325 | #else | |
3326 | # define N_(Str) (Str) | |
3327 | #endif | |
3328 | </PRE> | |
3329 | ||
3330 | <P> | |
3331 | <CODE>N_</CODE> is a short form similar to <CODE>_</CODE>. The <TT>`Makefile'</TT> in | |
3332 | the <TT>`po/'</TT> directory of GNU gettext knows by default both of the | |
3333 | mentioned short forms so you are invited to follow this proposal for | |
3334 | your own ease. | |
3335 | ||
3336 | </P> | |
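<P>
As a small illustration of how the two short forms work together, the
sketch below marks strings in a static table with <CODE>N_</CODE> and
applies <CODE>_</CODE> only at the point of use. It uses the no-op
definitions from the text so that it compiles without <TT>`libintl.h'</TT>;
the table <CODE>weekday_names</CODE> and the function
<CODE>weekday_name</CODE> are invented for this example.
</P>

```c
#include <stddef.h>

/* No-op definitions as described in the text.  In a production build
   _ would instead expand to gettext (String). */
#define _(String) (String)
#ifdef gettext_noop
# define N_(Str) gettext_noop (Str)
#else
# define N_(Str) (Str)
#endif

/* A static initializer cannot call a function, so the strings are only
   marked for extraction here, not translated. */
static const char *const weekday_names[] = {
    N_("Monday"), N_("Tuesday"), N_("Wednesday")
};

/* The translation happens where the string is actually used.
   Returns NULL when the index is out of range. */
static const char *weekday_name(size_t index)
{
    size_t count = sizeof weekday_names / sizeof weekday_names[0];

    if (index >= count)
        return NULL;
    return _(weekday_names[index]);
}
```

<P>
With the no-op definitions the function simply returns the English string;
once <CODE>_</CODE> is defined in terms of <CODE>gettext</CODE> the same
code would return a translation, if one is available.
</P>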
3337 | <P> | |
3338 | Now to <CODE>catgets</CODE>. The main problem is the work for the | |
3339 | programmer. Every time he comes to a translatable string he has to | |
3340 | define a number (or a symbolic constant) which also has to be defined in | |
3341 | the message catalog file. He also has to take care of duplicate | |
3342 | entries, duplicate message IDs etc. If he wants to have the same | |
3343 | quality in the message catalog as the GNU <CODE>gettext</CODE> program | |
3344 | provides he also has to put the descriptive comments for the strings and | |
3345 | the location in all source code files in the message catalog. This is | |
3346 | nearly a Mission: Impossible. | |
3347 | ||
3348 | </P> | |
3349 | <P> | |
3350 | But there are also some points people might call advantages speaking for | |
3351 | <CODE>catgets</CODE>. If you have a single word in a string and this string | |
3352 | is used in different contexts it is likely that in one or the other | |
3353 | language the word has different translations. Example: | |
3354 | ||
3355 | </P> | |
3356 | ||
3357 | <PRE> | |
3358 | printf ("%s: %d", gettext ("number"), number_of_errors); | |
3359 | ||
3360 | printf ("you should see %d %s", number_count, | |
3361 | number_count == 1 ? gettext ("number") : gettext ("numbers")); | |
3362 | </PRE> | |
3363 | ||
3364 | <P> | |
3365 | Here we have to translate the string <CODE>"number"</CODE> twice. Even | |
3366 | if you do not speak a language besides English it might be possible to | |
3367 | recognize that the two words have a different meaning. In German the | |
3368 | first appearance has to be translated to <CODE>"Anzahl"</CODE> and the second | |
3369 | to <CODE>"Zahl"</CODE>. | |
3370 | ||
3371 | </P> | |
3372 | <P> | |
3373 | Now you can say that this example is really esoteric. And you are | |
3374 | right! This is exactly how we felt about this problem and decided that | |
3375 | it does not weigh that much. The solution for the above problem could | |
3376 | be very easy: | |
3377 | ||
3378 | </P> | |
3379 | ||
3380 | <PRE> | |
3381 | printf (gettext ("number: %d"), number_of_errors); | |
3382 | ||
3383 | printf (number_count == 1 ? gettext ("you should see %d number") | |
3384 | : gettext ("you should see %d numbers"), | |
3385 | number_count); | |
3386 | </PRE> | |
3387 | ||
3388 | <P> | |
3389 | We believe that we can solve all conflicts with this method. If it is | |
3390 | difficult one can also consider changing one of the conflicting strings a | |
3391 | little bit. But it is not impossible to overcome. | |
3392 | ||
3393 | </P> | |
3394 | <P> | |
3395 | Translator note: It is perhaps appropriate here to tell those English | |
3396 | speaking programmers that the plural form of a noun cannot in general be formed by | |
3397 | appending a single `s'. Most other languages use different methods. So | |
3398 | you should at least use the method given in the above example. | |
3399 | ||
3400 | </P> | |
3401 | <P> | |
3402 | But I have been told that some languages have even more complex rules. | |
3403 | A good approach might be to consider methods like the one used for | |
3404 | <CODE>LC_TIME</CODE> in the POSIX.2 standard. | |
3405 | ||
3406 | </P> | |
3407 | ||
3408 | ||
3409 | ||
3410 | <H2><A NAME="SEC46" HREF="gettext_toc.html#TOC46">Using libintl.a in own programs</A></H2> | |
3411 | ||
3412 | <P> | |
3413 | Starting with version 0.9.4 the library <CODE>libintl.a</CODE> should be more | |
3414 | or less self-contained. I.e. you can use it in your own programs. The | |
3415 | <TT>`Makefile'</TT> will put the header and the library in directories | |
3416 | selected using the <CODE>$(prefix)</CODE>. | |
3417 | ||
3418 | </P> | |
3419 | <P> | |
3420 | One exception to the above is found on HP-UX systems. Here the C library | |
3421 | does not contain the <CODE>alloca</CODE> function (and the HP compiler does | |
3422 | not generate it inlined). But it is not intended to rewrite the whole | |
3423 | library just because of this dumb system. Instead include the | |
3424 | <CODE>alloca</CODE> function in every package in which you use <CODE>libintl.a</CODE>. | |
3425 | ||
3426 | </P> | |
3427 | ||
3428 | ||
3429 | ||
3430 | <H2><A NAME="SEC47" HREF="gettext_toc.html#TOC47">Being a <CODE>gettext</CODE> grok</A></H2> | |
3431 | ||
3432 | <P> | |
3433 | To fully exploit the functionality of the GNU <CODE>gettext</CODE> library it | |
3434 | is surely helpful to read the source code. But for those who don't want | |
3435 | to spend that much time in reading the (sometimes complicated) code here | |
3436 | is a list of comments: | |
3437 | ||
3438 | </P> | |
3439 | ||
3440 | <UL> | |
3441 | <LI>Changing the language at runtime | |
3442 | ||
3443 | For interactive programs it might be useful to offer a selection of the | |
3444 | used language at runtime. To understand how to do this one needs to know | |
3445 | how the used language is determined while executing the <CODE>gettext</CODE> | |
3446 | function. The method which is presented here only works correctly | |
3447 | with the GNU implementation of the <CODE>gettext</CODE> functions. It is not | |
3448 | possible with underlying <CODE>catgets</CODE> functions or <CODE>gettext</CODE> | |
3449 | functions from the system's C library. The exception is of course the | |
3450 | GNU C Library which uses the GNU gettext Library for message handling. | |
3451 | ||
3452 | In the function <CODE>dcgettext</CODE> at every call the current setting of | |
3453 | the highest priority environment variable is determined and used. | |
3454 | Highest priority means here the following list with decreasing | |
3455 | priority: | |
3456 | ||
3457 | ||
3458 | <OL> | |
3459 | <LI><CODE>LANGUAGE</CODE> | |
3460 | ||
3461 | <LI><CODE>LC_ALL</CODE> | |
3462 | ||
3463 | <LI><CODE>LC_xxx</CODE>, according to selected locale | |
3464 | ||
3465 | <LI><CODE>LANG</CODE> | |
3466 | ||
3467 | </OL> | |
3468 | ||
3469 | Afterwards the path is constructed using the found value and the | |
3470 | translation file is loaded if available. | |
3471 | ||
3472 | What happens when the value of, say, <CODE>LANGUAGE</CODE> changes? According | |
3473 | to the process explained above the new value of this variable is found | |
3474 | as soon as the <CODE>dcgettext</CODE> function is called. But this also means | |
3475 | the (perhaps) different message catalog file is loaded. In other | |
3476 | words: the used language is changed. | |
3477 | ||
3478 | But there is one little catch. The code for gcc-2.7.0 and up provides | |
3479 | some optimization. This optimization normally prevents the calling of | |
3480 | the <CODE>dcgettext</CODE> function as long as no new catalog is loaded. But | |
3481 | if <CODE>dcgettext</CODE> is not called, the program also cannot find out that the | |
3482 | <CODE>LANGUAGE</CODE> variable has changed (see section <A HREF="gettext.html#SEC44">Optimization of the *gettext functions</A>). But the | |
3483 | solution is very easy. Include the following code in the language | |
3484 | switching function. | |
3485 | ||
3486 | ||
3487 | <PRE> | |
3488 | /* Change language. */ | |
3489 | setenv ("LANGUAGE", "fr", 1); | |
3490 | ||
3491 | /* Make change known. */ | |
3492 | { | |
3493 | extern int _nl_msg_cat_cntr; | |
3494 | ++_nl_msg_cat_cntr; | |
3495 | } | |
3496 | </PRE> | |
3497 | ||
3498 | The variable <CODE>_nl_msg_cat_cntr</CODE> is defined in <TT>`loadmsgcat.c'</TT>. | |
3499 | ||
3500 | </UL> | |
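<P>
The priority list given above can be sketched as a small lookup function.
This is a simplified illustration and not the code in <TT>`libintl.a'</TT>:
the function name <CODE>pick_language</CODE> is hypothetical, and treating
an empty value like an unset variable is an assumption made for this sketch.
</P>

```c
#include <stddef.h>
#include <stdlib.h>

/* Return the value of the highest-priority environment variable that
   is set, or NULL if none is.  category_var names the variable for the
   selected locale category, for example "LC_MESSAGES".
   Assumption of this sketch: an empty value counts as unset. */
static const char *pick_language(const char *category_var)
{
    const char *names[4];
    size_t i;

    /* Decreasing priority, as in the list above. */
    names[0] = "LANGUAGE";
    names[1] = "LC_ALL";
    names[2] = category_var;
    names[3] = "LANG";

    for (i = 0; i < 4; ++i) {
        const char *value = getenv(names[i]);

        if (value != NULL && value[0] != '\0')
            return value;
    }
    return NULL;
}
```

<P>
Because the environment is consulted on every call, a value changed with
<CODE>setenv</CODE> is seen by the next call, which matches the behavior
described for <CODE>dcgettext</CODE> when the optimization does not bypass it.
</P>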
3501 | ||
3502 | ||
3503 | ||
3504 | <H2><A NAME="SEC48" HREF="gettext_toc.html#TOC48">Temporary Notes for the Programmers Chapter</A></H2> | |
3505 | ||
3506 | ||
3507 | ||
3508 | <H3><A NAME="SEC49" HREF="gettext_toc.html#TOC49">Temporary - Two Possible Implementations</A></H3> | |
3509 | ||
3510 | <P> | |
3511 | There are two competing methods for language independent messages: | |
3512 | the X/Open <CODE>catgets</CODE> method, and the Uniforum <CODE>gettext</CODE> | |
3513 | method. The <CODE>catgets</CODE> method indexes messages by integers; the | |
3514 | <CODE>gettext</CODE> method indexes them by their original English strings. | |
3515 | The <CODE>catgets</CODE> method has been around longer and is supported | |
3516 | by more vendors. The <CODE>gettext</CODE> method is supported by Sun, | |
3517 | and it has been heard that the COSE multi-vendor initiative is | |
3518 | supporting it. Neither method is a POSIX standard; the POSIX.1 | |
3519 | committee had a lot of disagreement in this area. | |
3520 | ||
3521 | </P> | |
3522 | <P> | |
3523 | Neither one is in the POSIX standard. There was much disagreement | |
3524 | in the POSIX.1 committee about using the <CODE>gettext</CODE> routines | |
3525 | vs. <CODE>catgets</CODE> (XPG). In the end the committee couldn't | |
3526 | agree on anything, so no messaging system was included as part | |
3527 | of the standard. I believe the informative annex of the standard | |
3528 | includes the XPG3 messaging interfaces, "...as an example of | |
3529 | a messaging system that has been implemented..." | |
3530 | ||
3531 | </P> | |
3532 | <P> | |
3533 | They were very careful not to say anywhere that you should use one | |
3534 | set of interfaces over the other. For more on this topic please | |
3535 | see the Programming for Internationalization FAQ. | |
3536 | ||
3537 | </P> | |
3538 | ||
3539 | ||
3540 | <H3><A NAME="SEC50" HREF="gettext_toc.html#TOC50">Temporary - About <CODE>catgets</CODE></A></H3> | |
3541 | ||
3542 | <P> | |
3543 | There have been a few discussions of late on the use of | |
3544 | <CODE>catgets</CODE> as a base. I think it important to present both | |
3545 | sides of the argument and hence am opting to play devil's advocate | |
3546 | for a little bit. | |
3547 | ||
3548 | </P> | |
3549 | <P> | |
3550 | I'll not deny the fact that <CODE>catgets</CODE> could have been designed | |
3551 | a lot better. It currently has quite a number of limitations and | |
3552 | these have already been pointed out. | |
3553 | ||
3554 | </P> | |
3555 | <P> | |
3556 | However there is a great deal to be said for consistency and | |
3557 | standardization. A common recurring problem when writing Unix | |
3558 | software is the myriad portability problems across Unix platforms. | |
3559 | It seems as if every Unix vendor had a look at the operating system | |
3560 | and found parts they could improve upon. Undoubtedly, these | |
3561 | modifications are probably innovative and solve real problems. | |
3562 | However, software developers have a hard time keeping up with all | |
3563 | these changes across so many platforms. | |
3564 | ||
3565 | </P> | |
3566 | <P> | |
3567 | And this has prompted the Unix vendors to begin to standardize their | |
3568 | systems. Hence the impetus for Spec1170. Every major Unix vendor | |
3569 | has committed to supporting this standard and every Unix software | |
3570 | developer awaits with glee the day they can write software to this | |
3571 | standard and simply recompile (without having to use autoconf) | |
3572 | across different platforms. | |
3573 | ||
3574 | </P> | |
3575 | <P> | |
3576 | As I understand it, Spec1170 is roughly based upon version 4 of the | |
3577 | X/Open Portability Guidelines (XPG4). Because <CODE>catgets</CODE> and | |
3578 | friends are defined in XPG4, I'm led to believe that <CODE>catgets</CODE> | |
3579 | is a part of Spec1170 and hence will become a standardized component | |
3580 | of all Unix systems. | |
3581 | ||
3582 | </P> | |
3583 | ||
3584 | ||
3585 | <H3><A NAME="SEC51" HREF="gettext_toc.html#TOC51">Temporary - Why a single implementation</A></H3> | |
3586 | ||
3587 | <P> | |
3588 | Now it seems kind of wasteful to me to have two different systems | |
3589 | installed for accessing message catalogs. If we do want to remedy | |
3590 | <CODE>catgets</CODE> deficiencies why don't we try to expand <CODE>catgets</CODE> | |
3591 | (in a compatible manner) rather than implement an entirely new system. | |
3592 | Otherwise, we'll end up with two message catalog access systems | |
3593 | installed with an operating system - one set of routines for GNU | |
3594 | software, and another set of routines (catgets) for all other software. | |
3595 | Bloated? | |
3596 | ||
3597 | </P> | |
3598 | <P> | |
3599 | Supposing another catalog access system is implemented. Which do | |
3600 | we recommend? At least for Linux, we need to attract as many | |
3601 | software developers as possible. Hence we need to make it as easy | |
3602 | for them to port their software as possible. Which means supporting | |
3603 | <CODE>catgets</CODE>. We will be implementing the <CODE>glocale</CODE> code | |
3604 | within our <CODE>libc</CODE>, but does this mean we also have to incorporate | |
3605 | another message catalog access scheme within our <CODE>libc</CODE> as well? | |
3606 | And what about people who are going to be using the <CODE>glocale</CODE> | |
3607 | + non-<CODE>catgets</CODE> routines. When they port their software to | |
3608 | other platforms, they're now going to have to include the front-end | |
3609 | (<CODE>glocale</CODE>) code plus the back-end code (the non-<CODE>catgets</CODE> | |
3610 | access routines) with their software instead of just including the | |
3611 | <CODE>glocale</CODE> code with their software. | |
3612 | ||
3613 | </P> | |
3614 | <P> | |
3615 | Message catalog support is however only the tip of the iceberg. | |
3616 | What about the data for the other locale categories. They also have | |
3617 | a number of deficiencies. Are we going to abandon them as well and | |
3618 | develop another duplicate set of routines (should <CODE>glocale</CODE> | |
3619 | expand beyond message catalog support)? | |
3620 | ||
3621 | </P> | |
3622 | <P> | |
3623 | Like many parts of Unix that can be improved upon, we're stuck with balancing | |
3624 | compatibility with the past with useful improvements and innovations for | |
3625 | the future. | |
3626 | ||
3627 | </P> | |
3628 | ||
3629 | ||
3630 | <H3><A NAME="SEC52" HREF="gettext_toc.html#TOC52">Temporary - Double layer solution</A></H3> | |
3631 | ||
3632 | <P> | |
3633 | GNU locale implements a <CODE>gettext</CODE>-style interface on top of a | |
3634 | <CODE>catgets</CODE>-style interface. | |
3635 | ||
3636 | </P> | |
3637 | <P> | |
3638 | This is not needless complexity. It is absolutely vital, because | |
3639 | it enables <CODE>gettext</CODE> to run on top of <CODE>catgets</CODE>, which | |
3640 | enables Linux International to recommend users use it <EM>today</EM>. | |
3641 | ||
3642 | </P> | |
3643 | <P> | |
3644 | Rewriting <CODE>gettext</CODE> so that it could use <EM>either</EM> | |
3645 | <CODE>catgets</CODE> <EM>or</EM> some simpler mechanism would not break | |
3646 | anything, but would not reduce complexity either. It might be | |
3647 | worth doing, but it isn't urgent. | |
3648 | ||
3649 | </P> | |
3650 | <P> | |
3651 | In general, simplicity is not enough of a reason to rewrite a | |
3652 | program that works. Simplicity is just one desirable thing. | |
3653 | It is not overridingly important. | |
3654 | ||
3655 | </P> | |
3656 | ||
3657 | ||
3658 | <H3><A NAME="SEC53" HREF="gettext_toc.html#TOC53">Temporary - Notes</A></H3> | |
3659 | ||
3660 | <P> | |
3661 | X/Open agreed very late on the standard form so that many | |
3662 | implementations differ from the final form. Both of my systems (old | |
3663 | Linux catgets and Ultrix-4) have a strange variation. | |
3664 | ||
3665 | </P> | |
3666 | <P> | |
3667 | OK. After incorporating the last changes I have to spend some time on | |
3668 | making the GNU/Linux libc gettext functions. So in future Solaris is | |
3669 | not the only system having gettext. | |
3670 | ||
3671 | </P> | |
3672 | ||
3673 | ||
3674 | <H1><A NAME="SEC54" HREF="gettext_toc.html#TOC54">The Translator's View</A></H1> | |
3675 | ||
3676 | ||
3677 | ||
3678 | <H2><A NAME="SEC55" HREF="gettext_toc.html#TOC55">Introduction 0</A></H2> | |
3679 | ||
3680 | <P> | |
3681 | GNU is going international! The GNU Translation Project is a way | |
3682 | to get maintainers, translators and users all together, so GNU will | |
3683 | gradually become able to speak many native languages. | |
3684 | ||
3685 | </P> | |
3686 | <P> | |
3687 | The GNU <CODE>gettext</CODE> tool set contains <EM>everything</EM> maintainers | |
3688 | need for internationalizing their packages for messages. It also | |
3689 | contains quite useful tools for helping translators at localizing | |
3690 | messages to their native language, once a package has already been | |
3691 | internationalized. | |
3692 | ||
3693 | </P> | |
3694 | <P> | |
3695 | To achieve the GNU Translation Project, we need many interested | |
3696 | people who like their own language and write it well, and who are also | |
3697 | able to synergize with other translators speaking the same language. | |
3698 | If you'd like to volunteer to <EM>work</EM> at translating messages, | |
3699 | please send mail to your translating team. | |
3700 | ||
3701 | </P> | |
3702 | <P> | |
3703 | Each team has its own mailing list, courtesy of Linux | |
3704 | International. You may reach your translating team at the address | |
3705 | <TT>`<VAR>ll</VAR>@li.org'</TT>, replacing <VAR>ll</VAR> by the two-letter ISO 639 | |
3706 | code for your language. Language codes are <EM>not</EM> the same as | |
3707 | country codes given in ISO 3166. The following translating teams | |
3708 | exist: | |
3709 | ||
3710 | </P> | |
3711 | ||
3712 | <BLOCKQUOTE> | |
3713 | <P> | |
3714 | Chinese <CODE>zh</CODE>, Czech <CODE>cs</CODE>, Danish <CODE>da</CODE>, Dutch <CODE>nl</CODE>, | |
3715 | Esperanto <CODE>eo</CODE>, Finnish <CODE>fi</CODE>, French <CODE>fr</CODE>, Irish | |
3716 | <CODE>ga</CODE>, German <CODE>de</CODE>, Greek <CODE>el</CODE>, Italian <CODE>it</CODE>, | |
3717 | Japanese <CODE>ja</CODE>, Indonesian <CODE>in</CODE>, Norwegian <CODE>no</CODE>, Polish | |
3718 | <CODE>pl</CODE>, Portuguese <CODE>pt</CODE>, Russian <CODE>ru</CODE>, Spanish <CODE>es</CODE>, | |
3719 | Swedish <CODE>sv</CODE> and Turkish <CODE>tr</CODE>. | |
3720 | </BLOCKQUOTE> | |
3721 | ||
3722 | <P> | |
3723 | For example, you may reach the Chinese translating team by writing to | |
3724 | <TT>`zh@li.org'</TT>. When you become a member of the translating team | |
3725 | for your own language, you may subscribe to its list. For example, | |
3726 | Swedish people can send a message to <TT>`sv-request@li.org'</TT>, | |
3727 | having this message body: | |
3728 | ||
3729 | </P> | |
3730 | ||
3731 | <PRE> | |
3732 | subscribe | |
3733 | </PRE> | |
3734 | ||
3735 | <P> | |
3736 | Keep in mind that team members should be interested in <EM>working</EM> | |
3737 | at translations, or at solving translational difficulties, rather than | |
3738 | merely lurking around. If your team does not exist yet and you want to | |
3739 | start one, please write to <TT>`gnu-translation@prep.ai.mit.edu'</TT>; | |
3740 | you will then reach the GNU coordinator for all translator teams. | |
3741 | ||
3742 | </P> | |
3743 | <P> | |
3744 | A handful of GNU packages have already been adapted and provided | |
3745 | with message translations for several languages. Translation | |
3746 | teams have begun to organize, using these packages as a starting | |
3747 | point. But there are many more packages and many languages for | |
3748 | which we have no volunteer translators. If you would like to | |
3749 | volunteer to work at translating messages, please send mail to | |
3750 | <TT>`gnu-translation@prep.ai.mit.edu'</TT> indicating what language(s) | |
3751 | you can work on. | |
3752 | ||
3753 | </P> | |
3754 | ||
3755 | ||
3756 | <H2><A NAME="SEC56" HREF="gettext_toc.html#TOC56">Introduction 1</A></H2> | |
3757 | ||
3758 | <P> | |
3759 | This is now official, GNU is going international! Here is the | |
3760 | announcement submitted for the January 1995 GNU Bulletin: | |
3761 | ||
3762 | </P> | |
3763 | ||
3764 | <BLOCKQUOTE> | |
3765 | <P> | |
3766 | A handful of GNU packages have already been adapted and provided | |
3767 | with message translations for several languages. Translation | |
3768 | teams have begun to organize, using these packages as a starting | |
3769 | point. But there are many more packages and many languages | |
3770 | for which we have no volunteer translators. If you'd like to | |
3771 | volunteer to work at translating messages, please send mail to | |
3772 | <SAMP>`gnu-translation@prep.ai.mit.edu'</SAMP> indicating what language(s) | |
3773 | you can work on. | |
3774 | </BLOCKQUOTE> | |
3775 | ||
3776 | <P> | |
3777 | This document should answer many questions for those who are curious | |
3778 | about the process or would like to contribute. Please at least skim | |
3779 | over it, hoping to cut down a little of the high volume of email | |
3780 | generated by this collective effort towards GNU internationalization. | |
3781 | ||
3782 | </P> | |
3783 | <P> | |
3784 | GNU programming is done in English, and currently, English is used | |
3785 | as the main communicating language between national communities | |
3786 | collaborating on the GNU project. This very document is written | |
3787 | in English. This will not change in the foreseeable future. | |
3788 | ||
3789 | </P> | |
3790 | <P> | |
3791 | However, there is a strong appetite from national communities for | |
3792 | having more software able to write using national language and habits, | |
3793 | and there is an on-going effort to modify GNU software in such a way | |
3794 | that it becomes able to do so. The experiments driven so far raised | |
3795 | an enthusiastic response from pretesters, so we believe that GNU | |
3796 | internationalization is destined to succeed. | |
3797 | ||
3798 | </P> | |
3799 | <P> | |
3800 | For suggestions, clarifications, additions or corrections to this | |
3801 | document, please send email to <TT>`gnu-translation@prep.ai.mit.edu'</TT>. | |
3802 | ||
3803 | </P> | |
3804 | ||
3805 | ||
3806 | <H2><A NAME="SEC57" HREF="gettext_toc.html#TOC57">Discussions</A></H2> | |
3807 | ||
3808 | <P> | |
3809 | Facing this internationalization effort, a few users expressed their | |
3810 | concerns. Some of these doubts are presented and discussed here. | |
3811 | ||
3812 | </P> | |
3813 | ||
3814 | <UL> | |
3815 | <LI>Smaller groups | |
3816 | ||
3817 | Some languages are not spoken by a very large number of people, | |
3818 | so people speaking them sometimes consider that there may not be | |
3819 | all that much demand for such versions of GNU packages. Moreover, many | |
3820 | people being <EM>into computers</EM>, in some countries, generally seem | |
3821 | to prefer English versions of their software. | |
3822 | ||
3823 | On the other hand, people might enjoy their own language a lot, and | |
3824 | be very motivated to give themselves the pleasure of having | |
3825 | their beloved GNU software speaking their mother tongue. They do | |
3826 | themselves a personal favor, and do not pay that much attention to | |
3827 | the number of people benefiting from their work. | |
3828 | ||
3829 | <LI>Misinterpretation | |
3830 | ||
3831 | Other users are shy to push forward their own language, seeing in this | |
3832 | some kind of misplaced propaganda. Someone thought there must be some | |
3833 | users of the language over the networks pestering other people with it. | |
3834 | ||
3835 | But any spoken language is worth localization, because there are | |
3836 | people behind the language for whom the language is important and | |
3837 | dear to their hearts. | |
3838 | ||
3839 | <LI>Odd translations | |
3840 | ||
3841 | The biggest problem is to find the right translations so that | |
3842 | everybody can understand the messages. Translations are usually a | |
3843 | little odd. Some people get used to English, to the extent they may | |
3844 | find translations into their own language "rather pushy, obnoxious | |
3845 | and sometimes even hilarious." As a French speaking man, I have | |
3846 | the experience of those instruction manuals for goods, so poorly | |
3847 | translated in French in Korea or Taiwan... | |
3848 | ||
3849 | The fact is that we sometimes have to create a kind of national | |
3850 | computer culture, and this is not easy without the collaboration of | |
3851 | many people liking their mother tongue. This is why translations are | |
3852 | better achieved by people knowing and loving their own language, and | |
3853 | ready to work together at improving the results they obtain. | |
3854 | ||
3855 | <LI>Dependencies over the GPL | |
3856 | ||
3857 | Some people wonder if using GNU <CODE>gettext</CODE> necessarily brings their package | |
3858 | under the protective wing of the GNU General Public License, when they | |
3859 | do not want to make their program free, or want other kinds of freedom. | |
3860 | The simplest answer is yes. | |
3861 | ||
3862 | The mere marking of localizable strings in a package, or conditional | |
3863 | inclusion of a few lines for initialization, is not really including | |
3864 | GPL'ed code. However, the localization routines themselves are under | |
3865 | the GPL and would bring the remainder of the package under the GPL | |
3866 | if they were distributed with it. So, I presume that, for those | |
3867 | for whom this is a problem, it could be circumvented by leaving to | |
3868 | the end installers the burden of assembling a package prepared for | |
3869 | localization, but not providing the localization routines themselves. | |
3870 | ||
3871 | </UL> | |
3872 | ||
3873 | ||
3874 | ||
3875 | <H2><A NAME="SEC58" HREF="gettext_toc.html#TOC58">Organization</A></H2> | |
3876 | ||
3877 | <P> | |
3878 | On a larger scale, the true solution would be to organize some kind of | |
3879 | fairly precise set up in which volunteers could participate. I gave | |
3880 | some thought to this idea lately, and realize there will be some | |
3881 | touchy points. I thought of writing to Richard Stallman to launch | |
3882 | such a project, but feel it might be good to shake out the ideas | |
3883 | between ourselves first. Most probably Linux International has | |
3884 | some experience in the field already, or would like to orchestrate | |
3885 | the volunteer work, maybe. Food for thought, in any case! | |
3886 | ||
3887 | </P> | |
3888 | <P> | |
3889 | I guess we have to set up something early, somehow, that will help | |
3890 | many possible contributors of the same language to interlock and avoid | |
3891 | work duplication, and further be put in contact for solving together | |
3892 | problems particular to their tongue (in most languages, there are many | |
3893 | difficulties peculiar to translating technical English). My Swedish | |
3894 | contributor acknowledged these difficulties, and I'm well aware of | |
3895 | them for French. | |
3896 | ||
3897 | </P> | |
3898 | <P> | |
3899 | This is surely not a technical issue, but we should manage so the | |
3900 | effort of locale contributors be maximally useful, despite the national | |
3901 | team layer interface between contributors and maintainers. | |
3902 | ||
3903 | </P> | |
3904 | <P> | |
3905 | GNU needs some setup for coordinating language coordinators. | |
3906 | Localizing evolving GNU programs will surely become a permanent | |
3907 | and continuous activity in GNU, once started. The setup should be | |
3908 | minimally completed and tested before GNU <CODE>gettext</CODE> becomes an official | |
3909 | reality. The email address <TT>`gnu-translation@prep.ai.mit.edu'</TT> | |
3910 | has been set up for receiving offers from volunteers and general | |
3911 | email on these topics. This address reaches the GNU Translation | |
3912 | Project coordinator. | |
3913 | ||
3914 | </P> | |
3915 | ||
3916 | ||
3917 | ||
3918 | <H3><A NAME="SEC59" HREF="gettext_toc.html#TOC59">Central Coordination</A></H3> | |
3919 | ||
3920 | <P> | |
3921 | I also think GNU will need, sooner than it thinks, someone to set up | |
3922 | a way to organize and coordinate these groups. Some kind of group | |
3923 | of groups. My opinion is that it would be good that GNU delegate | |
3924 | this task to a small group of collaborating volunteers, shortly. | |
3925 | Perhaps a list of these national committees can be published | |
3926 | in <TT>`gnu.announce'</TT>. | |
3927 | ||
3928 | </P> | |
3929 | <P> | |
3930 | My role as coordinator would simply be to refer to Ulrich any German | |
3931 | speaking volunteer interested in localization of GNU programs, and | |
3932 | maybe helping national groups to initially organize, while maintaining | |
3933 | national registries until national groups are ready to take over. | |
3934 | In fact, the coordinator should help volunteers get in contact with | |
3935 | one another for creating national teams, which should then select | |
3936 | one coordinator per language, or country (regionalized language). | |
3937 | If well done, the coordination should be useful without being an | |
3938 | overwhelming task, the time to put delegations in place. | |
3939 | ||
3940 | </P> | |
3941 | ||
3942 | ||
3943 | <H3><A NAME="SEC60" HREF="gettext_toc.html#TOC60">National Teams</A></H3> | |
3944 | ||
3945 | <P> | |
3946 | I suggest we look for volunteer coordinators/editors for individual | |
3947 | languages. These people will scan contributions of translation files | |
3948 | for various programs, for their own languages, and will ensure high | |
3949 | and uniform standards of diction. | |
3950 | ||
3951 | </P> | |
3952 | <P> | |
3953 | From my current experience with other people in these days, those who | |
3954 | provide localizations are very enthusiastic about the process, and are | |
3955 | more interested in the localization process than in the program they | |
3956 | localize, and want to do many programs, not just one. This seems | |
3957 | to confirm that having a coordinator/editor for each language is a | |
3958 | good idea. | |
3959 | ||
3960 | </P> | |
3961 | <P> | |
3962 | We need to choose someone who is good at writing clear and concise | |
3963 | prose in the language in question. That is hard--we can't check | |
3964 | it ourselves. So we need to ask a few people to judge each others' | |
3965 | writing and select the one who is best. | |
3966 | ||
3967 | </P> | |
3968 | <P> | |
3969 | I announced my prerelease to a few dozen people, and you would not | |
3970 | believe all the discussions it has generated already. I shudder to think | |
3971 | what will happen when this is launched for real, officially, | |
3972 | world wide. Who am I to arbitrate between two Czechoslovak users | |
3973 | contradicting each other, for example? | |
3974 | ||
3975 | </P> | |
3976 | <P> | |
3977 | I assume that your German is not much better than my French so that | |
3978 | I would not be able to judge these formulations. What I would | |
3979 | suggest is that for each language there is a group of people who | |
3980 | maintain the PO files and judge changes. I suspect there will | |
3981 | be cultural differences between how such groups of people will behave. | |
3982 | Some will have relaxed ways, reach consensus easily, and have anyone | |
3983 | of the group relate to the maintainers, while others will fight to | |
3984 | death, organize heavy administrations up to national standards, and | |
3985 | use strict channels. | |
3986 | ||
3987 | </P> | |
3988 | <P> | |
3989 | The German team is setting a good example. Right now, they are | |
3990 | maybe half a dozen people revising each other's translations and | |
3991 | discussing the linguistic issues. I do not even have all the names. | |
3992 | Ulrich Drepper is taking care of coordinating the German team. | |
3993 | He subscribed to all my pretest lists, so I do not even have to warn | |
3994 | him specifically of incoming releases. | |
3995 | ||
3996 | </P> | |
3997 | <P> | |
3998 | I'm sure that it is a good idea to get teams for each language working | |
3999 | on translations. That will make the translations better and more | |
4000 | consistent. | |
4001 | ||
4002 | </P> | |
4003 | ||
4004 | ||
4005 | ||
4006 | <H4><A NAME="SEC61" HREF="gettext_toc.html#TOC61">Sub-Cultures</A></H4> | |
4007 | ||
4008 | <P> | |
4009 | Taking French for example, there are a few sub-cultures around | |
4010 | computers which developed diverging vocabularies. Picking volunteers | |
4011 | here and there without addressing this problem in an organized way, | |
4012 | early in the project, might produce a distasteful mix of GNU programs, | |
4013 | and possibly trigger endless quarrels among those who really care. | |
4014 | ||
4015 | </P> | |
4016 | <P> | |
4017 | Keeping some kind of unity in the way French localization of GNU | |
4018 | programs is achieved is a difficult (and delicate) job. Knowing the | |
4019 | Latin character of French people (:-), if we take this the wrong | |
4020 | way, we could end up nowhere, or waste a lot of energy. Maybe we | |
4021 | should begin to address this problem seriously <EM>before</EM> GNU | |
4022 | <CODE>gettext</CODE> becomes officially published. And I suspect that this | |
4023 | means soon! | |
4024 | ||
4025 | </P> | |
4026 | ||
4027 | ||
4028 | <H4><A NAME="SEC62" HREF="gettext_toc.html#TOC62">Organizational Ideas</A></H4> | |
4029 | ||
4030 | <P> | |
4031 | I expect the next big changes after the official release. Please note | |
4032 | that I use the German translation of the short GPL message. We need | |
4033 | to set a few good examples before the localization goes out for real | |
4034 | in GNU. Here are a few points to discuss: | |
4035 | ||
4036 | </P> | |
4037 | ||
4038 | <UL> | |
4039 | <LI> | |
4040 | ||
4041 | Each group should have one FTP server (at least one master). | |
4042 | ||
4043 | <LI> | |
4044 | ||
4045 | The files on the server should reflect the latest version (of | |
4046 | course!) and it should also contain a RCS directory with the | |
4047 | corresponding archives (I don't have this now). | |
4048 | ||
4049 | <LI> | |
4050 | ||
4051 | There should also be a ChangeLog file (this is more useful than the | |
4052 | RCS archive but can be generated automatically from the latter by | |
4053 | Emacs). | |
4054 | ||
4055 | <LI> | |
4056 | ||
4057 | A <STRONG>core group</STRONG> should judge questionable changes (for now | |
4058 | this group consists solely of me but I ask some others occasionally; | |
4059 | this also seems to work). | |
4060 | ||
4061 | </UL> | |
4062 | ||
4063 | ||
4064 | ||
4065 | <H3><A NAME="SEC63" HREF="gettext_toc.html#TOC63">Mailing Lists</A></H3> | |
4066 | ||
4067 | <P> | |
4068 | If we get any inquiries about GNU <CODE>gettext</CODE>, send them on to: | |
4069 | ||
4070 | </P> | |
4071 | ||
4072 | <PRE> | |
4073 | <TT>`gnu-translation@prep.ai.mit.edu'</TT> | |
4074 | </PRE> | |
4075 | ||
4076 | <P> | |
4077 | The <TT>`*-pretest'</TT> lists are quite useful to me; maybe the idea could | |
4078 | be generalized to all GNU packages. But to each maintainer his or her own way! | |
4079 | ||
4080 | </P> | |
4081 | <P> | |
4082 | We have a mechanism in place here at | |
4083 | <TT>`gnu.ai.mit.edu'</TT> to track teams, support mailing lists for | |
4084 | them and log members. We have a slight preference that you use it. | |
4085 | If this is OK with you, I can get you clued in. | |
4086 | ||
4087 | </P> | |
4088 | <P> | |
4089 | Things are changing! A few years ago, when Daniel Fekete and I | |
4090 | asked for a mailing list for GNU localization, nested at the FSF, we | |
4091 | were politely invited to organize it anywhere else, and so we did. | |
4092 | For communicating with my pretesters, I later made a handful of | |
4093 | mailing lists located at iro.umontreal.ca and administered by | |
4094 | <CODE>majordomo</CODE>. These lists have been <EM>very</EM> dependable | |
4095 | so far... | |
4096 | ||
4097 | </P> | |
4098 | <P> | |
4099 | I suspect that the German team will organize itself a mailing list | |
4100 | located in Germany, and so forth for other countries. But before they | |
4101 | organize for real, it could surely be useful to offer mailing lists | |
4102 | located at the FSF to each national team. So yes, please explain to me | |
4103 | how I should proceed to create and handle them. | |
4104 | ||
4105 | </P> | |
4106 | <P> | |
4107 | We should create temporary mailing lists, one per country, to help | |
4108 | people organize. Temporary, because once regrouped and structured, it | |
4109 | would be fair for the volunteers from each country to take <EM>their</EM> list | |
4110 | back home and manage it as they want. My feeling is that, in the long | |
4111 | run, each team should run its own list, from within their country. | |
4112 | There also should be some central list to which all teams could | |
4113 | subscribe as they see fit, as long as each team is represented in it. | |
4114 | ||
4115 | </P> | |
4116 | ||
4117 | ||
4118 | <H2><A NAME="SEC64" HREF="gettext_toc.html#TOC64">Information Flow</A></H2> | |
4119 | ||
4120 | <P> | |
4121 | There will surely be some discussion about these messages after the | |
4122 | packages are finally released. If people now send you some proposals | |
4123 | for better messages, how do you proceed? Jim, please note that | |
4124 | right now, as I put forward nearly a dozen localizable programs, I | |
4125 | receive both the translations and the coordination concerns about them. | |
4126 | ||
4127 | </P> | |
4128 | <P> | |
4129 | If I put one of my things to pretest, Ulrich receives the announcement | |
4130 | and passes it on to the German team, who make last minute revisions. | |
4131 | Then he submits the translation files to me <EM>as the maintainer</EM>. | |
4132 | For GNU packages I do not maintain, I would not even hear about it. | |
4133 | This scheme could be made to work GNU-wide, I think. For security | |
4134 | reasons, maybe Ulrich (national coordinators, in fact) should update | |
4135 | a central registry kept by GNU (Jim, me, or Len's recruits) once in | |
4136 | a while. | |
4137 | ||
4138 | </P> | |
4139 | <P> | |
4140 | In December/January, I was aggressively ready to internationalize | |
4141 | all of GNU, giving myself the duty of one small GNU package per week | |
4142 | or so, taking many weeks or months for bigger packages. But it does | |
4143 | not work this way. I first did all the things I'm responsible for. | |
4144 | I've nothing against some missionary work on other maintainers, but | |
4145 | I'm also losing a lot of energy over it--same debates over again. | |
4146 | ||
4147 | </P> | |
4148 | <P> | |
4149 | And when the first localized packages are released we'll get a lot of | |
4150 | responses about ugly translations :-). Surely, and we need to have | |
4151 | beforehand a fairly good idea about how to handle the information | |
4152 | flow between the national teams and the package maintainers. | |
4153 | ||
4154 | </P> | |
4155 | <P> | |
4156 | Please start saving somewhere a quick history of each PO file. I know | |
4157 | for sure that the file format will change, allowing for comments. | |
4158 | It would be nice that each file has a kind of log, and references for | |
4159 | those who want to submit comments or gripes, or otherwise contribute. | |
4160 | I sent a proposal for a fast and flexible format, but it has not | |
4161 | yet been accepted by the GNU decision makers. I'll tell you when I | |
4162 | have more information about this. | |
4163 | ||
4164 | </P> | |
4165 | ||
4166 | ||
4167 | <H1><A NAME="SEC65" HREF="gettext_toc.html#TOC65">The Maintainer's View</A></H1> | |
4168 | ||
4169 | <P> | |
4170 | The maintainer of a package has many responsibilities. One of them | |
4171 | is ensuring that the package will install easily on many platforms, | |
4172 | and that the magic we described earlier (see section <A HREF="gettext.html#SEC32">The User's View</A>) will work | |
4173 | for installers and end users. | |
4174 | ||
4175 | </P> | |
4176 | <P> | |
4177 | Of course, there are many possible ways by which GNU <CODE>gettext</CODE> | |
4178 | might be integrated in a distribution, and this chapter does not cover | |
4179 | them in all generality. Instead, it details one possible approach | |
4180 | which is especially adequate for many GNU distributions, because | |
4181 | GNU <CODE>gettext</CODE> is meant to help the internationalization | |
4182 | of the whole GNU project. So, the maintainer's view presented here | |
4183 | presumes that the package already has a <TT>`configure.in'</TT> file and | |
4184 | uses Autoconf. | |
4185 | ||
4186 | </P> | |
4187 | <P> | |
4188 | Nevertheless, GNU <CODE>gettext</CODE> may surely be useful for non-GNU | |
4189 | packages, but the maintainers of such packages might have to show | |
4190 | imagination and initiative in organizing their distributions so that | |
4191 | <CODE>gettext</CODE> works for them in all situations. There are surely | |
4192 | many, out there. | |
4193 | ||
4194 | </P> | |
4195 | <P> | |
4196 | Even if <CODE>gettext</CODE> methods are now stabilizing, slight adjustments | |
4197 | might be needed between successive <CODE>gettext</CODE> versions, so you | |
4198 | should ideally review this chapter in subsequent releases, looking | |
4199 | for changes. | |
4200 | ||
4201 | </P> | |
4202 | ||
4203 | ||
4204 | ||
4205 | <H2><A NAME="SEC66" HREF="gettext_toc.html#TOC66">Flat or Non-Flat Directory Structures</A></H2> | |
4206 | ||
4207 | <P> | |
4208 | Some GNU packages are distributed as <CODE>tar</CODE> files which unpack | |
4209 | in a single directory; these are said to be <STRONG>flat</STRONG> distributions. | |
4210 | Other GNU packages have a one level hierarchy of subdirectories, using | |
4211 | for example a subdirectory named <TT>`doc/'</TT> for the Texinfo manual and | |
4212 | man pages, another called <TT>`lib/'</TT> for holding functions meant to | |
4213 | replace or complement C libraries, and a subdirectory <TT>`src/'</TT> for | |
4214 | holding the proper sources for the package. These other distributions | |
4215 | are said to be <STRONG>non-flat</STRONG>. | |
4216 | ||
4217 | </P> | |
4218 | <P> | |
4219 | For now, we cannot say much about flat distributions. A flat | |
4220 | directory structure has the disadvantage of increasing the difficulty | |
4221 | of updating to a new version of GNU <CODE>gettext</CODE>. Also, if you have | |
4222 | many PO files, this could somewhat pollute your single directory. | |
4223 | In the GNU <CODE>gettext</CODE> distribution, the <TT>`misc/'</TT> directory | |
4224 | contains a shell script named <TT>`combine-sh'</TT>. That script may | |
4225 | be used for combining all the C files of the <TT>`intl/'</TT> directory | |
4226 | into a pair of C files (one <TT>`.c'</TT> and one <TT>`.h'</TT>). Those two | |
4227 | generated files would fit more easily in a flat directory structure, | |
4228 | and you will then have to add these two files to your project. | |
4229 | ||
4230 | </P> | |
4231 | <P> | |
4232 | Maybe because GNU <CODE>gettext</CODE> itself has a non-flat structure, | |
4233 | we have more experience with this approach, and this is what will be | |
4234 | described in the remainder of this chapter. Some maintainers might | |
4235 | use this as an opportunity to unflatten their package structure. | |
4236 | Only later, once we have gained more experience adapting GNU <CODE>gettext</CODE> | |
4237 | to flat distributions, we might add some notes about how to proceed | |
4238 | in flat situations. | |
4239 | ||
4240 | </P> | |
4241 | ||
4242 | ||
4243 | <H2><A NAME="SEC67" HREF="gettext_toc.html#TOC67">Prerequisite Works</A></H2> | |
4244 | ||
4245 | <P> | |
4246 | Some preparatory work is required before using GNU <CODE>gettext</CODE> | |
4247 | in one of your packages. These tasks are general enough | |
4248 | that they do not fit the point by point descriptions used in the remainder | |
4249 | of this chapter. So, we describe them here. | |
4250 | ||
4251 | </P> | |
4252 | ||
4253 | <UL> | |
4254 | <LI> | |
4255 | ||
4256 | Before attempting to use <CODE>gettextize</CODE> you should install some other packages first. | |
4257 | Ensure that recent versions of GNU <CODE>m4</CODE>, GNU Autoconf and GNU | |
4258 | <CODE>gettext</CODE> are already installed at your site, and if not, proceed | |
4259 | to do this first. If you have to install these things, beware that | |
4260 | GNU <CODE>m4</CODE> must be fully installed before GNU Autoconf is even | |
4261 | <EM>configured</EM>. | |
4262 | ||
4263 | Those three packages are only needed by you, as a maintainer; the | |
4264 | installers of your own package and end users do not really need any | |
4265 | of GNU <CODE>m4</CODE>, GNU Autoconf or GNU <CODE>gettext</CODE> for successfully | |
4266 | installing and running your package, with messages properly translated. | |
4267 | But this is not completely true if you provide internationalized | |
4268 | shell scripts within your own package: GNU <CODE>gettext</CODE> shall | |
4269 | then be installed at the user site if the end users want to see the | |
4270 | translation of shell script messages. | |
4271 | ||
4272 | <LI> | |
4273 | ||
4274 | Your package should use Autoconf and have a <TT>`configure.in'</TT> file. | |
4275 | If it does not, you have to learn how. The Autoconf documentation | |
4276 | is quite well written; it is a good idea to print it and get | |
4277 | familiar with it. | |
4278 | ||
4279 | <LI> | |
4280 | ||
4281 | Your C sources should have already been modified according to | |
4282 | instructions given earlier in this manual. See section <A HREF="gettext.html#SEC13">Preparing Program Sources</A>. | |
4283 | ||
4284 | <LI> | |
4285 | ||
4286 | Your <TT>`po/'</TT> directory should receive all PO files submitted to you | |
4287 | by the translator teams, each having <TT>`<VAR>ll</VAR>.po'</TT> as a name. | |
4288 | It is not usually easy to get translation | |
4289 | work done before your package gets internationalized and available! | |
4290 | Since the cycle has to start somewhere, the easiest for the maintainer | |
4291 | is to start with absolutely no PO files, and wait until various | |
4292 | translator teams get interested in your package, and submit PO files. | |
4293 | ||
4294 | </UL> | |
4295 | ||
4296 | <P> | |
4297 | It is worth adding here a few words about how the maintainer should | |
4298 | ideally behave with PO file submissions. As a maintainer, your | |
4299 | role is to authenticate the origin of the submission as being the | |
4300 | representative of the appropriate GNU translating team (forward the | |
4301 | submission to <TT>`gnu-translation@prep.ai.mit.edu'</TT> in case of | |
4302 | doubt), to ensure that the PO file format is not severely broken and | |
4303 | does not prevent successful installation, and for the rest, merely | |
4304 | to put these PO files in <TT>`po/'</TT> for distribution. | |
4305 | ||
4306 | </P> | |
4307 | <P> | |
4308 | As a maintainer, you do not have to take on your shoulders the | |
4309 | responsibility of checking if the translations are adequate or | |
4310 | complete, and should avoid diving into linguistic matters. Translation | |
4311 | teams drive themselves and are fully responsible for their linguistic | |
4312 | choices for GNU. Keep in mind that translator teams are <EM>not</EM> | |
4313 | driven by maintainers. You can help by carefully redirecting all | |
4314 | communications and reports from users about linguistic matters to the | |
4315 | appropriate translation team, or by explaining to users how to reach or join | |
4316 | their team. The simplest might be to send them the <TT>`NLS'</TT> file. | |
4317 | ||
4318 | </P> | |
4319 | <P> | |
4320 | Maintainers should <EM>never ever</EM> apply PO file bug reports | |
4321 | themselves, short-cutting translation teams. If some translator has | |
4322 | difficulty getting some of her points through her team, it should not be | |
4323 | an option for her to directly negotiate translations with maintainers. | |
4324 | Teams ought to settle their problems themselves, if any. If you, as | |
4325 | a maintainer, ever think there is a real problem with a team, please | |
4326 | never try to <EM>solve</EM> a team's problem on your own. | |
4327 | ||
4328 | </P> | |
4329 | ||
4330 | ||
4331 | <H2><A NAME="SEC68" HREF="gettext_toc.html#TOC68">Invoking the <CODE>gettextize</CODE> Program</A></H2> | |
4332 | ||
4333 | <P> | |
4334 | Some files are consistently and identically needed in every package | |
4335 | internationalized through GNU <CODE>gettext</CODE>. As a matter of | |
4336 | convenience, the <CODE>gettextize</CODE> program puts all these files right | |
4337 | in your package. This program has the following synopsis: | |
4338 | ||
4339 | </P> | |
4340 | ||
4341 | <PRE> | |
4342 | gettextize [ <VAR>option</VAR>... ] [ <VAR>directory</VAR> ] | |
4343 | </PRE> | |
4344 | ||
4345 | <P> | |
4346 | and accepts the following options: | |
4347 | ||
4348 | </P> | |
4349 | <DL COMPACT> | |
4350 | ||
4351 | <DT><SAMP>`-f'</SAMP> | |
4352 | <DD> | |
4353 | <DT><SAMP>`--force'</SAMP> | |
4354 | <DD> | |
4355 | Force replacement of files which already exist. | |
4356 | ||
4357 | <DT><SAMP>`-h'</SAMP> | |
4358 | <DD> | |
4359 | <DT><SAMP>`--help'</SAMP> | |
4360 | <DD> | |
4361 | Display this help and exit. | |
4362 | ||
4363 | <DT><SAMP>`--version'</SAMP> | |
4364 | <DD> | |
4365 | Output version information and exit. | |
4366 | ||
4367 | </DL> | |
4368 | ||
4369 | <P> | |
4370 | If <VAR>directory</VAR> is given, this is the top level directory of a | |
4371 | package to prepare for using GNU <CODE>gettext</CODE>. If not given, it | |
4372 | is assumed that the current directory is the top level directory of | |
4373 | such a package. | |
4374 | ||
4375 | </P> | |
4376 | <P> | |
4377 | The program <CODE>gettextize</CODE> provides the following files. However, | |
4378 | no existing file will be replaced unless the option <CODE>--force</CODE> | |
4379 | (<CODE>-f</CODE>) is specified. | |
4380 | ||
4381 | </P> | |
4382 | ||
4383 | <OL> | |
4384 | <LI> | |
4385 | ||
4386 | The <TT>`NLS'</TT> file is copied in the main directory of your package, | |
4387 | the one being at the top level. This file gives the main indications | |
4388 | about how to install and use the Native Language Support features | |
4389 | of your program. You might elect to use a more recent copy of this | |
4390 | <TT>`NLS'</TT> file than the one provided through <CODE>gettextize</CODE>, if | |
4391 | you have one handy. You may also fetch a more recent copy of file | |
4392 | <TT>`NLS'</TT> from most GNU archive sites. | |
4393 | ||
4394 | <LI> | |
4395 | ||
4396 | A <TT>`po/'</TT> directory is created for eventually holding | |
4397 | all translation files, but initially only containing the file | |
4398 | <TT>`po/Makefile.in.in'</TT> from the GNU <CODE>gettext</CODE> distribution | |
4399 | (beware the double <SAMP>`.in'</SAMP> in the file name). If the <TT>`po/'</TT> | |
4400 | directory already exists, it will be preserved along with the files | |
4401 | it contains, and only <TT>`Makefile.in.in'</TT> will be overwritten. | |
4402 | ||
4403 | <LI> | |
4404 | ||
4405 | An <TT>`intl/'</TT> directory is created and filled with most of the files | |
4406 | originally in the <TT>`intl/'</TT> directory of the GNU <CODE>gettext</CODE> | |
4407 | distribution. Also, if option <CODE>--force</CODE> (<CODE>-f</CODE>) is given, | |
4408 | the <TT>`intl/'</TT> directory is emptied first. | |
4409 | ||
4410 | </OL> | |
4411 | ||
4412 | <P> | |
4413 | If your site supports symbolic links, <CODE>gettextize</CODE> will not | |
4414 | actually copy the files into your package, but establish symbolic | |
4415 | links instead. This avoids duplicating the disk space needed in | |
4416 | all packages. Merely using the <SAMP>`-h'</SAMP> option while creating the | |
4417 | <CODE>tar</CODE> archive of your distribution will replace each link with an | |
4418 | actual copy in the distribution archive. So, to insist, you really | |
4419 | should use the <SAMP>`-h'</SAMP> option with <CODE>tar</CODE> within the <CODE>dist</CODE> | |
4420 | goal of your main <TT>`Makefile.in'</TT>. | |
4421 | ||
4422 | </P> | |
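<P>
The effect of the <SAMP>`-h'</SAMP> option can be checked with a small
experiment. The following is only a sketch under stated assumptions: the
directory and file names are hypothetical, and the symbolic link is created
by hand as a stand-in for one that <CODE>gettextize</CODE> would establish.
</P>

```shell
#!/bin/sh
# Sketch: show that "tar -h" archives the file a symbolic link points to.
# All file and directory names here are hypothetical.
set -e
work=$(mktemp -d)
mkdir "$work/shared" "$work/pkg" "$work/pkg/intl" "$work/out"
echo "shared source" > "$work/shared/gettext.c"

# Stand-in for a link that gettextize would establish.
ln -s "$work/shared/gettext.c" "$work/pkg/intl/gettext.c"

# -h makes tar follow symbolic links and store the target's contents.
(cd "$work" && tar -chf pkg.tar pkg)
(cd "$work/out" && tar -xf ../pkg.tar)

# Inspect what came out of the archive: a link or a real file?
if [ -L "$work/out/pkg/intl/gettext.c" ]; then
  result="link"
else
  result="regular file"
fi
echo "$result"
rm -rf "$work"
```

<P>
Without <SAMP>`-h'</SAMP>, the archive would record the link itself, which
would dangle on any machine that lacks the link target.
</P>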
4423 | <P> | |
4424 | It is interesting to understand that most new files for supporting | |
4425 | GNU <CODE>gettext</CODE> facilities in one package go in <TT>`intl/'</TT> | |
4426 | and <TT>`po/'</TT> subdirectories. One distinction between these two | |
4427 | directories is that <TT>`intl/'</TT> is meant to be completely identical | |
4428 | in all packages using GNU <CODE>gettext</CODE>, while all newly created | |
4429 | files, which have to be different, go into <TT>`po/'</TT>. There is a | |
4430 | common <TT>`Makefile.in.in'</TT> in <TT>`po/'</TT>, because the <TT>`po/'</TT> | |
4431 | directory needs its own <TT>`Makefile'</TT>, and it has been designed so | |
4432 | it can be identical in all packages. | |
4433 | ||
4434 | </P> | |
4435 | ||
4436 | ||
4437 | <H2><A NAME="SEC69" HREF="gettext_toc.html#TOC69">Files You Must Create or Alter</A></H2> | |
4438 | ||
4439 | <P> | |
4440 | Besides files which are automatically added through <CODE>gettextize</CODE>, | |
4441 | there are many files needing revision for properly interacting with | |
4442 | GNU <CODE>gettext</CODE>. If you are closely following GNU standards for | |
4443 | Makefile engineering and auto-configuration, the adaptations should | |
4444 | be easier to achieve. Here is a point by point description of the | |
4445 | changes needed in each. | |
4446 | ||
4447 | </P> | |
4448 | <P> | |
4449 | So, here comes a list of files, each one followed by a description of | |
4450 | all alterations it needs. Many examples are taken from the GNU | |
4451 | <CODE>gettext</CODE> 0.10 distribution itself. You may indeed | |
4452 | refer to the source code of the GNU <CODE>gettext</CODE> package, as it | |
4453 | is intended to be a good example and master implementation for using | |
4454 | its own functionality. | |
4455 | ||
4456 | </P> | |
4457 | ||
4458 | ||
4459 | ||
4460 | <H3><A NAME="SEC70" HREF="gettext_toc.html#TOC70"><TT>`POTFILES.in'</TT> in <TT>`po/'</TT></A></H3> | |
4461 | ||
4462 | <P> | |
4463 | The <TT>`po/'</TT> directory should receive a file named | |
4464 | <TT>`POTFILES.in'</TT>. This file tells which files, among all program | |
4465 | sources, have marked strings needing translation. Here is an example | |
4466 | of such a file: | |
4467 | ||
4468 | </P> | |
4469 | ||
4470 | <PRE> | |
4471 | # List of source files containing translatable strings. | |
4472 | # Copyright (C) 1995 Free Software Foundation, Inc. | |
4473 | ||
4474 | # Common library files | |
4475 | lib/error.c | |
4476 | lib/getopt.c | |
4477 | lib/xmalloc.c | |
4478 | ||
4479 | # Package source files | |
4480 | src/gettextp.c | |
4481 | src/msgfmt.c | |
4482 | src/xgettext.c | |
4483 | </PRE> | |
4484 | ||
4485 | <P> | |
4486 | Comments introduced by <SAMP>`#'</SAMP> and blank lines are ignored. All other lines | |
4487 | list those source files containing strings marked for translation | |
4488 | (see section <A HREF="gettext.html#SEC15">How Marks Appears in Sources</A>), in a notation relative to the top level | |
4489 | of your whole distribution, rather than the location of the | |
4490 | <TT>`POTFILES.in'</TT> file itself. | |
4491 | ||
4492 | </P> | |
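<P>
To see which entries such a file really contributes, the comment and blank
lines can be filtered out by hand. This is a minimal sketch, not necessarily
how <TT>`configure'</TT> itself processes the file; the sample content and
the temporary file are assumptions made for illustration.
</P>

```shell
#!/bin/sh
# Sketch: list the source files named in a POTFILES.in-style file,
# skipping comments and blank lines. The sample content is hypothetical.
set -e
potfiles=$(mktemp)
cat > "$potfiles" <<'EOF'
# Common library files
lib/error.c

# Package source files
src/msgfmt.c
EOF

# Drop lines starting with '#' and lines holding only whitespace.
files=$(sed -e '/^#/d' -e '/^[[:space:]]*$/d' "$potfiles")
echo "$files"
rm -f "$potfiles"
```

<P>
Each remaining line is a path relative to the top level of the distribution,
so the list should be read from that directory rather than from
<TT>`po/'</TT>.
</P>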
4493 | ||
4494 | ||
4495 | <H3><A NAME="SEC71" HREF="gettext_toc.html#TOC71"><TT>`configure.in'</TT> at top level</A></H3> | |
4496 | ||
4497 | ||
4498 | <OL> | |
4499 | <LI>Declare the package and version. | |
4500 | ||
4501 | This is done by a set of lines like these: | |
4502 | ||
4503 | ||
4504 | <PRE> | |
4505 | PACKAGE=gettext | |
4506 | VERSION=0.10 | |
4507 | AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE") | |
4508 | AC_DEFINE_UNQUOTED(VERSION, "$VERSION") | |
4509 | AC_SUBST(PACKAGE) | |
4510 | AC_SUBST(VERSION) | |
4511 | </PRE> | |
4512 | ||
4513 | Of course, you replace <SAMP>`gettext'</SAMP> with the name of your package, | |
4514 | and <SAMP>`0.10'</SAMP> with its version number, exactly as they | |
4515 | should appear in the packaged <CODE>tar</CODE> file name of your distribution | |
4516 | (<TT>`gettext-0.10.tar.gz'</TT>, here). | |
4517 | ||
4518 | <LI>Declare the available translations. | |
4519 | ||
4520 | This is done by defining <CODE>ALL_LINGUAS</CODE> to the whitespace-separated, | |
4521 | quoted list of available languages, in a single line, like this: | |
4522 | ||
4523 | ||
4524 | <PRE> | |
4525 | ALL_LINGUAS="de fr" | |
4526 | </PRE> | |
4527 | ||
4528 | This example means that German and French PO files are available, so | |
4529 | that these languages are currently supported by your package. If you | |
4530 | want to further restrict, at installation time, the set of installed | |
4531 | languages, this should not be done by modifying <CODE>ALL_LINGUAS</CODE> in | |
4532 | <TT>`configure.in'</TT>, but rather by using the <CODE>LINGUAS</CODE> environment | |
4533 | variable (see section <A HREF="gettext.html#SEC34">Magic for Installers</A>). | |
4534 | ||
4535 | <LI>Check for internationalization support. | |
4536 | ||
4537 | Here is the main <CODE>m4</CODE> macro for triggering internationalization | |
4538 | support. Just add this line to <TT>`configure.in'</TT>: | |
4539 | ||
4540 | ||
4541 | <PRE> | |
4542 | ud_GNU_GETTEXT | |
4543 | </PRE> | |
4544 | ||
4545 | This call is purposely simple, even if it generates a lot of configure | |
4546 | time checking and actions. | |
4547 | ||
4548 | <LI>Obtain some <TT>`libintl.h'</TT> header file. | |
4549 | ||
4550 | Once you have called <CODE>ud_GNU_GETTEXT</CODE> in <TT>`configure.in'</TT>, use: | |
4551 | ||
4552 | ||
4553 | <PRE> | |
4554 | AC_LINK_FILES($nls_cv_header_libgt, $nls_cv_header_intl) | |
4555 | </PRE> | |
4556 | ||
4557 | This will create one header file <TT>`libintl.h'</TT>. The reason for | |
4558 | this has to do with the fact that some systems, using the Uniforum | |
4559 | message handling functions, already have a file of this name. | |
4560 | ||
4561 | The <CODE>AC_LINK_FILES</CODE> call has not been integrated into the | |
4562 | <CODE>ud_GNU_GETTEXT</CODE> macro because there can be only one such call | |
4563 | in a <TT>`configure'</TT> file. If you already use it, you will have to | |
4564 | <EM>merge</EM> the needed <CODE>AC_LINK_FILES</CODE> within yours, by adding | |
4565 | the first argument at the end of the list of your first argument, | |
4566 | and adding the second argument at the end of the list of your second | |
4567 | argument. | |
4568 | ||
4569 | <LI>Have output files created. | |
4570 | ||
4571 | The <CODE>AC_OUTPUT</CODE> directive, at the end of your <TT>`configure.in'</TT> | |
4572 | file, needs to be modified in two ways: | |
4573 | ||
4574 | ||
4575 | <PRE> | |
4576 | AC_OUTPUT([<VAR>existing configuration files</VAR> intl/Makefile po/Makefile.in], | |
4577 | [sed -e "/POTFILES =/r po/POTFILES" po/Makefile.in > po/Makefile | |
4578 | <VAR>existing additional actions</VAR>]) | |
4579 | </PRE> | |
4580 | ||
4581 | The modification to the first argument to <CODE>AC_OUTPUT</CODE> asks | |
4582 | for substitution in the <TT>`intl/'</TT> and <TT>`po/'</TT> directories. | |
4583 | Note the <SAMP>`.in'</SAMP> suffix used for <TT>`po/'</TT> only. This is because | |
4584 | the distributed file is really <TT>`po/Makefile.in.in'</TT>. | |
4585 | ||
4586 | The modification to the second argument ensures that <TT>`po/Makefile'</TT> | |
4587 | gets generated out of the <TT>`po/Makefile.in'</TT> just created, including | |
4588 | in it the <TT>`po/POTFILES'</TT> produced by <CODE>ud_GNU_GETTEXT</CODE>. | |
4589 | Two steps are needed because <TT>`po/POTFILES'</TT> can get lengthy in | |
4590 | some packages, too lengthy in fact for being able to merely use an | |
4591 | Autoconf substituted variable, as many <CODE>sed</CODE>s cannot handle very | |
4592 | long lines. | |
4593 | ||
4594 | </OL> | |
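<P>
The second argument of <CODE>AC_OUTPUT</CODE> described above relies on the
<SAMP>`r'</SAMP> command of <CODE>sed</CODE>, which appends the contents of
a file after each matching line. The sketch below isolates that step. The
two input files are hypothetical miniatures written for illustration; they
are not the real <TT>`po/Makefile.in'</TT> and <TT>`po/POTFILES'</TT>.
</P>

```shell
#!/bin/sh
# Sketch: splice a file list into a Makefile template with sed's "r" command.
set -e
work=$(mktemp -d)

# Hypothetical miniature versions of po/Makefile.in and po/POTFILES.
printf 'POTFILES = \\\n\nall:\n' > "$work/Makefile.in"
printf '\t../lib/error.c \\\n\t../src/msgfmt.c\n' > "$work/POTFILES"

# After each line matching "POTFILES =", sed appends the contents of POTFILES.
sed -e "/POTFILES =/r $work/POTFILES" "$work/Makefile.in" > "$work/Makefile"
cat "$work/Makefile"

# Keep a few facts about the result before cleaning up.
line_count=$(wc -l < "$work/Makefile" | tr -d ' ')
third_line=$(sed -n 3p "$work/Makefile")
rm -rf "$work"
```

<P>
Because the list is read from a file rather than substituted as one long
variable value, no single line handled by <CODE>sed</CODE> grows with the
number of source files.
</P>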
4595 | ||
4596 | ||
4597 | ||
4598 | <H3><A NAME="SEC72" HREF="gettext_toc.html#TOC72"><TT>`aclocal.m4'</TT> at top level</A></H3> | |
4599 | ||
4600 | <P> | |
4601 | If you do not have an <TT>`aclocal.m4'</TT> file in your distribution, | |
4602 | the simplest approach is to take a copy of <TT>`aclocal.m4'</TT> from | |
4603 | GNU <CODE>gettext</CODE>. But to be precise, you only need macros | |
4604 | <CODE>ud_LC_MESSAGES</CODE>, <CODE>ud_WITH_NLS</CODE> and <CODE>ud_GNU_GETTEXT</CODE>, | |
4605 | so you may use an editor and remove macros you do not need. | |
4606 | ||
4607 | </P> | |
4608 | <P> | |
4609 | If you already have an <TT>`aclocal.m4'</TT> file, then you will have | |
4610 | to merge the said macros into your <TT>`aclocal.m4'</TT>. Note that if | |
4611 | you are upgrading from a previous release of GNU <CODE>gettext</CODE>, you | |
4612 | should most probably <EM>replace</EM> the said macros, as they usually | |
4613 | change a little from one release of GNU <CODE>gettext</CODE> to the next. | |
4614 | Their contents may vary as we get more experience with strange systems | |
4615 | out there. | |
4616 | ||
4617 | </P> | |
4618 | <P> | |
4619 | These macros check for the internationalization support functions | |
4620 | and related information. Hopefully, once stabilized, these macros | |
4621 | might be integrated into the standard Autoconf set, because this | |
4622 | piece of <CODE>m4</CODE> code will be the same for all projects using GNU | |
4623 | <CODE>gettext</CODE>. | |
4624 | ||
4625 | </P> | |
4626 | ||
4627 | ||
4628 | <H3><A NAME="SEC73" HREF="gettext_toc.html#TOC73"><TT>`acconfig.h'</TT> at top level</A></H3> | |
4629 | ||
4630 | <P> | |
4631 | If you do not have an <TT>`acconfig.h'</TT> file in your distribution, | |
4632 | the simplest approach is to take a copy of <TT>`acconfig.h'</TT> from | |
4633 | GNU <CODE>gettext</CODE>. Strictly speaking, you only need the | |
4634 | lines and comments for <CODE>ENABLE_NLS</CODE>, <CODE>HAVE_CATGETS</CODE>, | |
4635 | <CODE>HAVE_GETTEXT</CODE> and <CODE>HAVE_LC_MESSAGES</CODE>, so you may use | |
4636 | an editor to remove everything else. If you already have an | |
4637 | <TT>`acconfig.h'</TT> file, then you should merge these definitions | |
4638 | into your <TT>`acconfig.h'</TT>. | |
4639 | ||
4640 | </P> | |
4641 | ||
4642 | ||
4643 | <H3><A NAME="SEC74" HREF="gettext_toc.html#TOC74"><TT>`Makefile.in'</TT> at top level</A></H3> | |
4644 | ||
4645 | <P> | |
4646 | Here are a few modifications you need to make to your main, top-level | |
4647 | <TT>`Makefile.in'</TT> file. | |
4648 | ||
4649 | </P> | |
4650 | ||
4651 | <OL> | |
4652 | <LI> | |
4653 | ||
4654 | Add the following lines near the beginning of your <TT>`Makefile.in'</TT>, | |
4655 | so the <SAMP>`dist:'</SAMP> goal will work properly (as explained further down): | |
4656 | ||
4657 | ||
4658 | <PRE> | |
4659 | PACKAGE = @PACKAGE@ | |
4660 | VERSION = @VERSION@ | |
4661 | </PRE> | |
4662 | ||
4663 | <LI> | |
4664 | ||
4665 | Add file <TT>`NLS'</TT> to the <CODE>DISTFILES</CODE> definition, so the file gets | |
4666 | distributed. | |
4667 | ||
4668 | <LI> | |
4669 | ||
4670 | Wherever you process subdirectories in your <TT>`Makefile.in'</TT>, be | |
4671 | sure you also process <CODE>@INTLSUB@</CODE> and <CODE>@POSUB@</CODE>, which | |
4672 | are replaced respectively by <SAMP>`intl'</SAMP> and <SAMP>`po'</SAMP>, or by nothing | |
4673 | when the configuration process decides these directories should | |
4674 | not be processed. | |
4675 | ||
4676 | Here is an example of a canonical order of processing. In this | |
4677 | example, we also define <CODE>SUBDIRS</CODE> in <CODE>Makefile.in</CODE> for it | |
4678 | to be further used in the <SAMP>`dist:'</SAMP> goal. | |
4679 | ||
4680 | ||
4681 | <PRE> | |
4682 | SUBDIRS = doc lib @INTLSUB@ src @POSUB@ | |
4683 | </PRE> | |
4684 | ||
4685 | that you will have to adapt to your own package. | |
4686 | ||
4687 | <LI> | |
4688 | ||
4689 | A delicate point is the <SAMP>`dist:'</SAMP> goal, as both | |
4690 | <TT>`intl/Makefile'</TT> and <TT>`po/Makefile'</TT> will later assume that the | |
4691 | proper directory has been set up from the main <TT>`Makefile'</TT>. Here is | |
4692 | an example of what the <SAMP>`dist:'</SAMP> goal might look like: | |
4693 | ||
4694 | ||
4695 | <PRE> | |
4696 | distdir = $(PACKAGE)-$(VERSION) | |
4697 | dist: Makefile | |
4698 | 	rm -fr $(distdir) | |
4699 | 	mkdir $(distdir) | |
4700 | 	chmod 777 $(distdir) | |
4701 | 	for file in $(DISTFILES); do \ | |
4702 | 	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \ | |
4703 | 	done | |
4704 | 	for subdir in $(SUBDIRS); do \ | |
4705 | 	  mkdir $(distdir)/$$subdir || exit 1; \ | |
4706 | 	  chmod 777 $(distdir)/$$subdir; \ | |
4707 | 	  (cd $$subdir && $(MAKE) $@) || exit 1; \ | |
4708 | 	done | |
4709 | 	tar chozf $(distdir).tar.gz $(distdir) | |
4710 | 	rm -fr $(distdir) | |
4711 | </PRE> | |
4712 | ||
4713 | </OL> | |
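
<P>
As a rough sketch of the subdirectory processing described above, the
other top-level goals can loop over <CODE>SUBDIRS</CODE> in the same way
the <SAMP>`dist:'</SAMP> goal does. The goal names <SAMP>`all'</SAMP>,
<SAMP>`install'</SAMP> and <SAMP>`clean'</SAMP> are only assumptions made
for illustration, so adapt them to the goals your package really provides:

</P>

<PRE>
# Hypothetical goals, shown for illustration only.
# SUBDIRS already contains @INTLSUB@ and @POSUB@, so intl/ and po/
# are visited only when configure decided they should be.
all install clean:
	for subdir in $(SUBDIRS); do \
	  (cd $$subdir && $(MAKE) $@) || exit 1; \
	done
</PRE>
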
4714 | ||
4715 | ||
4716 | ||
4717 | <H3><A NAME="SEC75" HREF="gettext_toc.html#TOC75"><TT>`Makefile.in'</TT> in <TT>`src/'</TT></A></H3> | |
4718 | ||
4719 | <P> | |
4720 | Some of the modifications made in the main <TT>`Makefile.in'</TT> will | |
4721 | also be needed in the <TT>`Makefile.in'</TT> from your package sources, | |
4722 | which we assume here to be in the <TT>`src/'</TT> subdirectory. Here are | |
4723 | all the modifications needed in <TT>`src/Makefile.in'</TT>: | |
4724 | ||
4725 | </P> | |
4726 | ||
4727 | <OL> | |
4728 | <LI> | |
4729 | ||
4730 | In view of the <SAMP>`dist:'</SAMP> goal, you should have these lines near the | |
4731 | beginning of <TT>`src/Makefile.in'</TT>: | |
4732 | ||
4733 | ||
4734 | <PRE> | |
4735 | PACKAGE = @PACKAGE@ | |
4736 | VERSION = @VERSION@ | |
4737 | </PRE> | |
4738 | ||
4739 | <LI> | |
4740 | ||
4741 | If not done already, you should guarantee that <CODE>top_srcdir</CODE> | |
4742 | gets defined. This will serve for <CODE>cpp</CODE> include files. Just add | |
4743 | the line: | |
4744 | ||
4745 | ||
4746 | <PRE> | |
4747 | top_srcdir = @top_srcdir@ | |
4748 | </PRE> | |
4749 | ||
4750 | <LI> | |
4751 | ||
4752 | You might also want to define <CODE>subdir</CODE> as <SAMP>`src'</SAMP>, later | |
4753 | allowing for almost uniform <SAMP>`dist:'</SAMP> goals in all your | |
4754 | <TT>`Makefile.in'</TT>. At least, the <SAMP>`dist:'</SAMP> goal below assumes that | |
4755 | you used: | |
4756 | ||
4757 | ||
4758 | <PRE> | |
4759 | subdir = src | |
4760 | </PRE> | |
4761 | ||
4762 | <LI> | |
4763 | ||
4764 | You should ensure that the final linking will use <CODE>@INTLLIBS@</CODE> as | |
4765 | a library. An easy way to achieve this is to arrange for it to get into | |
4766 | <CODE>LIBS</CODE>, like this: | |
4767 | ||
4768 | ||
4769 | <PRE> | |
4770 | LIBS = @INTLLIBS@ @LIBS@ | |
4771 | </PRE> | |
4772 | ||
4773 | In most GNU packages one will find a directory <TT>`lib/'</TT> in which a | |
4774 | library containing some helper functions will be built. (You need at | |
4775 | least the few functions which the GNU <CODE>gettext</CODE> Library itself | |
4776 | needs.) However, some of the functions in <TT>`lib/'</TT> also give | |
4777 | messages to the user, which of course should be translated, too. To take | |
4778 | care of this, it is not enough to place the support library (say | |
4779 | <TT>`libsupport.a'</TT>) just between <CODE>@INTLLIBS@</CODE> and | |
4780 | <CODE>@LIBS@</CODE> in the above example. Instead one has to write this: | |
4781 | ||
4782 | ||
4783 | <PRE> | |
4784 | LIBS = ../lib/libsupport.a @INTLLIBS@ ../lib/libsupport.a @LIBS@ | |
4785 | </PRE> | |
4786 | ||
4787 | <LI> | |
4788 | ||
4789 | You should also ensure that directory <TT>`intl/'</TT> will be searched for | |
4790 | C preprocessor include files in all circumstances. So, you have to | |
4791 | arrange for both <SAMP>`-I../intl'</SAMP> and <SAMP>`-I$(top_srcdir)/intl'</SAMP> to | |
4792 | be given to the C compiler. | |
4793 | ||
4794 | <LI> | |
4795 | ||
4796 | Your <SAMP>`dist:'</SAMP> goal has to conform with others. Here is a | |
4797 | reasonable definition for it: | |
4798 | ||
4799 | ||
4800 | <PRE> | |
4801 | distdir = ../$(PACKAGE)-$(VERSION)/$(subdir) | |
4802 | dist: Makefile $(DISTFILES) | |
4803 | 	for file in $(DISTFILES); do \ | |
4804 | 	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \ | |
4805 | 	done | |
4806 | </PRE> | |
4807 | ||
4808 | </OL> | |
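
<P>
One possible way to pass the <TT>`intl/'</TT> include directories
mentioned above to the C compiler is sketched here. The variable names
<CODE>INCLUDES</CODE> and <CODE>COMPILE</CODE> are assumptions made for
illustration and are not required by GNU <CODE>gettext</CODE>; use
whatever names your <TT>`src/Makefile.in'</TT> already has:

</P>

<PRE>
top_srcdir = @top_srcdir@

# Search the build-tree intl/ first, then the source-tree intl/.
# INCLUDES and COMPILE are hypothetical names for this sketch.
INCLUDES = -I. -I../intl -I$(top_srcdir)/intl
COMPILE = $(CC) -c @DEFS@ $(INCLUDES) $(CPPFLAGS) $(CFLAGS)
</PRE>

<P>
Giving both directories matters when the package is built outside
its source tree, since <SAMP>`../intl'</SAMP> then refers to the build directory
while <SAMP>`$(top_srcdir)/intl'</SAMP> refers to the distributed sources.

</P>
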
4809 | ||
4810 | ||
4811 | ||
4812 | <H1><A NAME="SEC76" HREF="gettext_toc.html#TOC76">Concluding Remarks</A></H1> | |
4813 | ||
4814 | <P> | |
4815 | We would like to conclude this GNU <CODE>gettext</CODE> manual by presenting | |
4816 | a history of the GNU Translation Project so far. Finally, we give | |
4817 | a few pointers for those who want to do further research or reading | |
4818 | about Native Language Support matters. | |
4819 | ||
4820 | </P> | |
4821 | ||
4822 | ||
4823 | ||
4824 | <H2><A NAME="SEC77" HREF="gettext_toc.html#TOC77">History of GNU <CODE>gettext</CODE></A></H2> | |
4825 | ||
4826 | <P> | |
4827 | Internationalization concerns and algorithms have been informally | |
4828 | and casually discussed for years in GNU, sometimes around GNU | |
4829 | <CODE>libc</CODE>, maybe around the incoming <CODE>Hurd</CODE>, or otherwise | |
4830 | (nobody clearly remembers). And even then, when the work started for | |
4831 | real, this was somewhat independently of these previous discussions. | |
4832 | ||
4833 | </P> | |
4834 | <P> | |
4835 | This all began in July 1994, when Patrick D'Cruze had the idea and | |
4836 | initiative of internationalizing version 3.9.2 of GNU <CODE>fileutils</CODE>. | |
4837 | He then asked Jim Meyering, the maintainer, how to get those changes | |
4838 | folded into an official release. That first draft was full of | |
4839 | <CODE>#ifdef</CODE>s and somewhat disconcerting, and Jim wanted to find | |
4840 | nicer ways. Patrick and Jim shared some attempts and experiments | |
4841 | in this area. Then, feeling that this might eventually have a deeper | |
4842 | impact on GNU, Jim wanted to know what standards were, and contacted | |
4843 | Richard Stallman, who very quickly and verbally described an overall | |
4844 | design for what was meant to become <CODE>glocale</CODE>, at that time. | |
4845 | ||
4846 | </P> | |
4847 | <P> | |
4848 | Jim implemented <CODE>glocale</CODE> and got a lot of exhausting feedback | |
4849 | from Patrick and Richard, of course, but also from Mitchum DSouza | |
4850 | (who wrote a <CODE>catgets</CODE>-like package), Roland McGrath, maybe David | |
4851 | MacKenzie, Fran&ccedil;ois Pinard, and Paul Eggert, all pushing and | |
4852 | pulling in various directions, not always compatible, to the extent | |
4853 | that after a couple of test releases, <CODE>glocale</CODE> was torn apart. | |
4854 | ||
4855 | </P> | |
4856 | <P> | |
4857 | While Jim took some distance and time and became a dad for a second | |
4858 | time, Roland wanted to get GNU <CODE>libc</CODE> internationalized, and | |
4859 | got Ulrich Drepper involved in that project. Instead of starting | |
4860 | from <CODE>glocale</CODE>, Ulrich rewrote something from scratch, but | |
4861 | more conformant to the set of guidelines that emerged out of the | |
4862 | <CODE>glocale</CODE> effort. Then, Ulrich got people from the previous | |
4863 | forum to involve themselves in this new project, and the switch | |
4864 | from <CODE>glocale</CODE> to what was first named <CODE>msgutils</CODE>, renamed | |
4865 | <CODE>nlsutils</CODE>, and later <CODE>gettext</CODE>, became officially accepted | |
4866 | by Richard in May 1995 or so. | |
4867 | ||
4868 | </P> | |
4869 | <P> | |
4870 | Let's summarize by saying that Ulrich Drepper wrote GNU <CODE>gettext</CODE> | |
4871 | in April 1995. The first official release of the package, including | |
4872 | PO mode, occurred in July 1995, and was numbered 0.7. Other people | |
4873 | contributed to the effort by providing a discussion forum around | |
4874 | Ulrich, writing little pieces of code, or testing. These are quoted | |
4875 | in the <CODE>THANKS</CODE> file which comes with the GNU <CODE>gettext</CODE> | |
4876 | distribution. | |
4877 | ||
4878 | </P> | |
4879 | <P> | |
4880 | While this was being done, Fran&ccedil;ois Pinard adapted half a dozen | |
4881 | GNU packages to <CODE>glocale</CODE> first, then later to <CODE>gettext</CODE>, | |
4882 | putting them in pretest, thus providing along the way an effective | |
4883 | user environment for fine tuning the evolving tools. He also took | |
4884 | the responsibility of organizing and coordinating the GNU Translation | |
4885 | Project. After nearly a year of informal exchanges between people from | |
4886 | many countries, translator teams started to exist in May 1995, through | |
4887 | the creation and support by Patrick D'Cruze of twenty unmoderated | |
4888 | mailing lists for that many native languages, and two moderated | |
4889 | lists: one for reaching all teams at once, the other for reaching | |
4890 | all maintainers of internationalized packages in GNU. | |
4891 | ||
4892 | </P> | |
4893 | <P> | |
4894 | Fran&ccedil;ois Pinard also wrote PO mode in June 1995 with the collaboration | |
4895 | of Greg McGary, as a kind of contribution to Ulrich's package. | |
4896 | He also gave a hand with the GNU <CODE>gettext</CODE> Texinfo manual. | |
4897 | ||
4898 | </P> | |
4899 | ||
4900 | ||
4901 | <H2><A NAME="SEC78" HREF="gettext_toc.html#TOC78">Related Readings</A></H2> | |
4902 | ||
4903 | <P> | |
4904 | Eugene H. Dorr (<TT>`dorre@well.com'</TT>) maintains an interesting | |
4905 | bibliography on internationalization matters, called | |
4906 | <CITE>Internationalization Reference List</CITE>, which is available as: | |
4907 | ||
4908 | <PRE> | |
4909 | ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt | |
4910 | </PRE> | |
4911 | ||
4912 | <P> | |
4913 | Michael Gschwind (<TT>`mike@vlsivie.tuwien.ac.at'</TT>) maintains a | |
4914 | Frequently Asked Questions (FAQ) list, entitled <CITE>Programming for | |
4915 | Internationalisation</CITE>. This FAQ discusses writing programs which | |
4916 | can handle different language conventions, character sets, etc.; | |
4917 | and is applicable to all character set encodings, with particular | |
4918 | emphasis on ISO 8859-1. It is regularly published in Usenet | |
4919 | groups <TT>`comp.unix.questions'</TT>, <TT>`comp.std.internat'</TT>, | |
4920 | <TT>`comp.software.international'</TT>, <TT>`comp.lang.c'</TT>, | |
4921 | <TT>`comp.windows.x'</TT>, <TT>`comp.std.c'</TT>, <TT>`comp.answers'</TT> | |
4922 | and <TT>`news.answers'</TT>. The home location of this document is: | |
4923 | ||
4924 | <PRE> | |
4925 | ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming | |
4926 | </PRE> | |
4927 | ||
4928 | <P> | |
4929 | Patrick D'Cruze (<TT>`pdcruze@li.org'</TT>) wrote a tutorial about NLS | |
4930 | matters, and Jochen Hein (<TT>`Hein@student.tu-clausthal.de'</TT>) took | |
4931 | over the responsibility of maintaining it. It may be found as: | |
4932 | ||
4933 | <PRE> | |
4934 | ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/... | |
4935 | ...locale-tutorial-0.8.txt.gz | |
4936 | </PRE> | |
4937 | ||
4938 | <P> | |
4939 | This site is mirrored in: | |
4940 | ||
4941 | <PRE> | |
4942 | ftp://ftp.ibp.fr/pub/linux/sunsite/ | |
4943 | </PRE> | |
4944 | ||
4945 | <P> | |
4946 | A French version of the same tutorial should be findable at: | |
4947 | ||
4948 | <PRE> | |
4949 | ftp://ftp.ibp.fr/pub/linux/french/docs/ | |
4950 | </PRE> | |
4951 | ||
4952 | <P> | |
4953 | together with French translations of many Linux-related documents. | |
4954 | ||
4955 | </P> | |
4956 | <P><HR><P> | |
4957 | This document was generated on 4 September 1998 using the | |
4958 | <A HREF="http://wwwcn.cern.ch/dci/texi2html/">texi2html</A> | |
4959 | translator version 1.51.</P> | |
4960 | </BODY> | |
4961 | </HTML> |