Commit | Line | Data |
---|---|---|
1 | <HTML> | |
2 | <HEAD> | |
3 | <!-- This HTML file has been created by texi2html 1.54 | |
4 | from gettext.texi on 25 January 1999 --> | |
5 | ||
6 | <TITLE>GNU gettext utilities - The Programmer's View</TITLE> | |
7 | <link href="gettext_9.html" rel=Next> | |
8 | <link href="gettext_7.html" rel=Previous> | |
9 | <link href="gettext_toc.html" rel=ToC> | |
10 | ||
11 | </HEAD> | |
12 | <BODY> | |
14 | <P><HR><P> | |
15 | ||
16 | ||
17 | <H1><A NAME="SEC39" HREF="gettext_toc.html#TOC39">The Programmer's View</A></H1> | |
18 | ||
19 | <P> | |
20 | One aim of the current message catalog implementation provided by | |
21 | GNU <CODE>gettext</CODE> was to use the system's message catalog handling, if the | |
22 | installer wishes to do so. So we should perhaps first take a look at | |
23 | the solutions we know about. The people in the POSIX committee did not | |
24 | manage to agree on one of the semi-official standards which we'll | |
25 | describe below. In fact they couldn't agree on anything, so they | |
26 | decided only to include an example of an interface. The major Unix vendors | |
27 | are split in their usage of the two most important specifications: X/Open's | |
28 | <CODE>catgets</CODE> vs. Uniforum's <CODE>gettext</CODE> interface. We'll describe them both and | |
29 | later explain our solution to this dilemma. | |
30 | ||
31 | </P> | |
32 | ||
33 | ||
34 | ||
35 | <H2><A NAME="SEC40" HREF="gettext_toc.html#TOC40">About <CODE>catgets</CODE></A></H2> | |
36 | ||
37 | <P> | |
38 | The <CODE>catgets</CODE> interface is defined in the X/Open Portability | |
39 | Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the | |
40 | process of creating this standard seemed to be too slow for some of | |
41 | the Unix vendors, so they based their implementations on preliminary | |
42 | versions of the standard. Of course this again leads to problems when | |
43 | writing platform-independent programs: even the use of <CODE>catgets</CODE> | |
44 | does not guarantee a uniform interface. | |
45 | ||
46 | </P> | |
47 | <P> | |
48 | Another, personal comment on this: only a bunch of committee members | |
49 | could have designed this interface. They never really tried to program | |
50 | using it. It is a fast, memory-saving implementation, and a | |
51 | user can happily live with it. But programmers hate it (at least I and | |
52 | some others do...). | |
53 | ||
54 | </P> | |
55 | <P> | |
56 | But we must not forget one point: after all the trouble with transferring | |
57 | the rights to Unix(tm), they at last came to X/Open, the very same body that | |
58 | published this specification. This leads me to predict | |
59 | that this interface will be in future Unix standards (e.g. Spec1170) and | |
60 | therefore part of all Unix implementations (that is, implementations which are | |
61 | <EM>allowed</EM> to bear this name). | |
62 | ||
63 | </P> | |
64 | ||
65 | ||
66 | ||
67 | <H3><A NAME="SEC41" HREF="gettext_toc.html#TOC41">The Interface</A></H3> | |
68 | ||
69 | <P> | |
70 | The interface to the <CODE>catgets</CODE> implementation consists of three | |
71 | functions which correspond to those used in file access: <CODE>catopen</CODE> | |
72 | to open the catalog for use, <CODE>catgets</CODE> for accessing the message | |
73 | tables, and <CODE>catclose</CODE> for closing after work is done. Prototypes | |
74 | for the functions and the needed definitions are in the | |
75 | <CODE><nl_types.h></CODE> header file. | |
76 | ||
77 | </P> | |
78 | <P> | |
79 | <CODE>catopen</CODE> is used like this: | |
80 | ||
81 | </P> | |
82 | ||
83 | <PRE> | |
84 | nl_catd catd = catopen ("catalog_name", 0); | |
85 | </PRE> | |
86 | ||
87 | <P> | |
88 | The function takes as its first argument the name of the catalog. This usually | |
89 | refers to the name of the program or the package. The second parameter | |
90 | is not further specified in the standard. I don't even know whether it | |
91 | is implemented consistently among various systems. So the common advice | |
92 | is to use <CODE>0</CODE> as the value. The return value is a handle to the | |
93 | message catalog, analogous to the file handles returned by <CODE>open</CODE>. | |
94 | ||
95 | </P> | |
96 | <P> | |
97 | This handle is then passed to the <CODE>catgets</CODE> function, which can | |
98 | be used like this: | |
99 | ||
100 | </P> | |
101 | ||
102 | <PRE> | |
103 | char *translation = catgets (catd, set_no, msg_id, "original string"); | |
104 | </PRE> | |
105 | ||
106 | <P> | |
107 | The first parameter is the catalog descriptor. The second parameter | |
108 | specifies the set of messages in this catalog from which the message | |
109 | identified by <CODE>msg_id</CODE> is obtained. <CODE>catgets</CODE> therefore uses | |
110 | three-stage addressing: | |
111 | ||
112 | </P> | |
113 | ||
114 | <PRE> | |
115 | catalog name => set number => message ID => translation | |
116 | </PRE> | |
117 | ||
118 | <P> | |
119 | The fourth argument is not used to address the translation. It is returned | |
120 | as a default value in case one of the addressing stages fails. One | |
121 | important thing to remember is that although the return type of <CODE>catgets</CODE> | |
122 | is <CODE>char *</CODE>, the resulting string <EM>must not</EM> be changed. It | |
123 | should rather be <CODE>const char *</CODE>, but the standard was published in | |
124 | 1988, one year before ANSI C. | |
125 | ||
126 | </P> | |
127 | <P> | |
128 | The last of these functions is used, and behaves, as expected: | |
129 | ||
130 | </P> | |
131 | ||
132 | <PRE> | |
133 | catclose (catd); | |
134 | </PRE> | |
135 | ||
136 | <P> | |
137 | After this no <CODE>catgets</CODE> call using the descriptor is legal anymore. | |
138 | ||
139 | </P> | |
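<P>
The three calls above can be combined as in the following minimal sketch.
The helper <CODE>copy_message</CODE> and its parameters are hypothetical and not
part of the <CODE>catgets</CODE> interface. It copies the message before closing
the catalog, and it assumes that a failed <CODE>catopen</CODE> is reported as
<CODE>(nl_catd) -1</CODE>, in which case the default string is used.
</P>

```c
#include <assert.h>
#include <nl_types.h>
#include <string.h>

/* Hypothetical helper, not part of the catgets interface: copy message
   MSG_ID of set SET_NO from the named catalog into BUF.  The copy is made
   before catclose, because the string returned by catgets must not be
   used after the descriptor has been closed.  */
void
copy_message (const char *catalog_name, int set_no, int msg_id,
              const char *fallback, char *buf, size_t size)
{
  assert (catalog_name != NULL && fallback != NULL);
  assert (buf != NULL && size > 0);

  const char *text = fallback;
  nl_catd catd = catopen (catalog_name, 0);

  /* Assumption: catopen signals failure with (nl_catd) -1.  In that case
     the fallback string is kept and catgets is not called at all.  */
  if (catd != (nl_catd) -1)
    text = catgets (catd, set_no, msg_id, fallback);

  size_t len = strlen (text);
  if (len >= size)
    len = size - 1;             /* truncate to fit the buffer */
  memcpy (buf, text, len);
  buf[len] = '\0';

  if (catd != (nl_catd) -1)
    catclose (catd);
}
```

<P>
Opening and closing the catalog for every message is done here only to keep the
sketch short. A real program would probably open the catalog once and keep the
descriptor for as long as messages are needed.
</P>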
140 | ||
141 | ||
142 | <H3><A NAME="SEC42" HREF="gettext_toc.html#TOC42">Problems with the <CODE>catgets</CODE> Interface?!</A></H3> | |
143 | ||
144 | <P> | |
145 | Now that this description seems so easy, where are the | |
146 | problems we speak of? In fact the interface could be used in a | |
147 | reasonable way, but constructing the message catalogs is a pain. The | |
148 | reason for this lies in the third argument of <CODE>catgets</CODE>: the unique | |
149 | message ID. This has to be a numeric value, unique among all messages in a single | |
150 | set. Perhaps you can imagine the problems of maintaining such a list while | |
151 | changing the source code: add a new message here, remove one there. Of | |
152 | course a lot of tools have been developed to help organize this | |
153 | chaos, but each of them fails in one aspect or another. We don't | |
154 | want to say that the other approach has no problems, but they are far | |
155 | easier to manage. | |
156 | ||
157 | </P> | |
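<P>
To make the bookkeeping concrete, here is a small sketch of what a program
using numeric message IDs typically has to carry around. All names
(<CODE>MSG_USAGE</CODE>, <CODE>default_message</CODE> and so on) and the message
texts are invented for illustration. The point is only that the numbers here
must agree with the numbers in the catalog source, and nothing but discipline
keeps them in agreement.
</P>

```c
#include <assert.h>

/* Hypothetical message numbers for one set.  Each value has to match the
   number given to the same message in the catalog source, and the two
   lists must be kept in step by hand.  */
enum message_id
{
  MSG_USAGE = 1,
  MSG_CANNOT_OPEN = 2,
  MSG_DONE = 3
};

#define MSG_LAST MSG_DONE

/* Default strings, indexed by message number minus one.  Removing or
   inserting a message means renumbering here and in the catalog.  */
static const char *const default_text[MSG_LAST] =
{
  "usage: prog FILE",
  "cannot open file",
  "done"
};

/* Return the built-in English text for MSG_ID, as it would be passed to
   catgets as the fourth argument.  */
const char *
default_message (enum message_id msg_id)
{
  assert (msg_id >= MSG_USAGE && msg_id <= MSG_LAST);
  return default_text[msg_id - 1];
}
```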
158 | ||
159 | ||
160 | <H2><A NAME="SEC43" HREF="gettext_toc.html#TOC43">About <CODE>gettext</CODE></A></H2> | |
161 | ||
162 | <P> | |
163 | The definition of the <CODE>gettext</CODE> interface comes from a Uniforum | |
164 | proposal and it is followed by at least one major Unix vendor | |
165 | (Sun) in its latest developments. It is not specified in any official | |
166 | standard, though. | |
167 | ||
168 | </P> | |
169 | <P> | |
170 | The main points about this solution are that it does not follow the | |
171 | method of normal file handling (open-use-close) and that it does not | |
172 | burden the programmer with so many tasks, especially the unique key handling. | |
173 | Of course a unique key is needed here too, but this key is the | |
174 | message itself (however long or short it is). See section <A HREF="gettext_8.html#SEC48">Comparing the Two Interfaces</A> for a | |
175 | more detailed comparison of the two methods. | |
176 | ||
177 | </P> | |
178 | <P> | |
179 | The following section contains a rather detailed description of the | |
180 | interface. We make it this detailed because this is the interface | |
181 | we chose for the GNU <CODE>gettext</CODE> Library. Programmers who intend | |
182 | to use this library will want to read this description. | |
183 | ||
184 | </P> | |
185 | ||
186 | ||
187 | ||
188 | <H3><A NAME="SEC44" HREF="gettext_toc.html#TOC44">The Interface</A></H3> | |
189 | ||
190 | <P> | |
191 | The minimal functionality an interface must have is a) to select a | |
192 | domain the strings are coming from (a single domain for all programs is | |
193 | not reasonable because its construction and maintenance is difficult, | |
194 | perhaps impossible) and b) to access a string in a selected domain. | |
195 | ||
196 | </P> | |
197 | <P> | |
198 | This is essentially the description of the <CODE>gettext</CODE> interface. It | |
199 | has a global domain which unqualified usages reference. Of course this | |
200 | domain is selectable by the user. | |
201 | ||
202 | </P> | |
203 | ||
204 | <PRE> | |
205 | char *textdomain (const char *domain_name); | |
206 | </PRE> | |
207 | ||
208 | <P> | |
209 | This makes it possible to change or query the | |
210 | current global domain of the <CODE>LC_MESSAGES</CODE> category. The | |
211 | argument is a null-terminated string, whose characters must be legal | |
212 | for use in filenames. If the <VAR>domain_name</VAR> argument is <CODE>NULL</CODE>, | |
213 | the function returns the current value. If no value has been set | |
214 | before, the name of the default domain is returned: <EM>messages</EM>. | |
215 | Please note that although the return value of <CODE>textdomain</CODE> is of | |
216 | type <CODE>char *</CODE>, it must not be modified. It is also important to know | |
217 | that no check is made that the domain is available. If the name is not | |
218 | available, you will notice this by the fact that no translations are provided. | |
219 | ||
220 | </P> | |
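<P>
As a minimal sketch of the query form, the hypothetical helper
<CODE>is_current_domain</CODE> below calls <CODE>textdomain (NULL)</CODE> and
compares the result with a given name. It treats the returned string as
read-only, as required above. The helper name is an assumption made for this
example only.
</P>

```c
#include <assert.h>
#include <libintl.h>
#include <string.h>

/* Hypothetical helper: report whether DOMAIN_NAME is the current global
   domain.  textdomain (NULL) only queries the setting; the returned
   string belongs to the library and is not modified here.  */
int
is_current_domain (const char *domain_name)
{
  assert (domain_name != NULL);

  const char *current = textdomain (NULL);
  return current != NULL && strcmp (current, domain_name) == 0;
}
```

<P>
Before any domain has been set, <CODE>is_current_domain ("messages")</CODE> is
expected to be true. After <CODE>textdomain ("hello")</CODE> it should be true
for <CODE>"hello"</CODE> instead, whether or not a catalog of that name exists.
</P>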
221 | <P> | |
222 | To use a domain set by <CODE>textdomain</CODE> the function | |
223 | ||
224 | </P> | |
225 | ||
226 | <PRE> | |
227 | char *gettext (const char *msgid); | |
228 | </PRE> | |
229 | ||
230 | <P> | |
231 | is to be used. This is the simplest reasonable form one can imagine. | |
232 | The translation of the string <VAR>msgid</VAR> is returned if it is available | |
233 | in the current domain. If not available the argument itself is | |
234 | returned. If the argument is <CODE>NULL</CODE> the result is undefined. | |
235 | ||
236 | </P> | |
237 | <P> | |
238 | One thing which should come to mind is that no explicit dependency on | |
239 | the domain in use is given. The current value of the domain for the | |
240 | <CODE>LC_MESSAGES</CODE> locale is used. If this changes between two | |
241 | executions of the same <CODE>gettext</CODE> call in the program, the two calls | |
242 | reference different message catalogs. | |
243 | ||
244 | </P> | |
245 | <P> | |
246 | In the simplest case, which is the one normally used in internationalized | |
247 | packages, a call to <CODE>textdomain</CODE> is issued once at the beginning of execution, | |
248 | setting the domain to a unique name, normally the package | |
249 | name. From then on, all strings which have to be translated are | |
250 | filtered through the <CODE>gettext</CODE> function. That's all: the package speaks | |
251 | your language. | |
252 | ||
253 | </P> | |
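<P>
The simple case just described might look like the following sketch. The
package name <CODE>example-package</CODE> and the function names are
hypothetical. The call to <CODE>setlocale</CODE> is an assumption not discussed
in this section: it is included because a program normally has to adopt the
user's locale before any translation can be selected.
</P>

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <string.h>

#define PACKAGE "example-package"       /* hypothetical package name */

/* Called once at the beginning of execution.  */
void
init_messages (void)
{
  setlocale (LC_ALL, "");       /* assumption: take the locale from the environment */
  textdomain (PACKAGE);
}

/* Every user-visible string is filtered through gettext.  */
const char *
greeting (void)
{
  return gettext ("Hello world");
}

/* Report whether the current domain supplied a text different from MSGID.
   Without a catalog, gettext returns its argument unchanged.  */
int
is_translated (const char *msgid)
{
  assert (msgid != NULL);
  return strcmp (gettext (msgid), msgid) != 0;
}
```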
254 | ||
255 | ||
256 | <H3><A NAME="SEC45" HREF="gettext_toc.html#TOC45">Solving Ambiguities</A></H3> | |
257 | ||
258 | <P> | |
259 | While this single-domain approach works well for most applications, there | |
260 | might be the need to get translations from more than one domain. Of | |
261 | course one could switch between different domains with calls to | |
262 | <CODE>textdomain</CODE>, but this is neither convenient nor fast. A | |
263 | possible situation is one under discussion at the time of this writing: all | |
264 | error messages of functions in the set of commonly used functions should | |
265 | go into a separate domain <CODE>error</CODE>. By this means we would only need | |
266 | to translate them once. | |
267 | ||
268 | </P> | |
269 | <P> | |
270 | For this reason there are two more functions to retrieve strings: | |
271 | ||
272 | </P> | |
273 | ||
274 | <PRE> | |
275 | char *dgettext (const char *domain_name, const char *msgid); | |
276 | char *dcgettext (const char *domain_name, const char *msgid, | |
277 | int category); | |
278 | </PRE> | |
279 | ||
280 | <P> | |
281 | Both take an additional argument in the first position, which corresponds | |
282 | to the argument of <CODE>textdomain</CODE>. The third argument of | |
283 | <CODE>dcgettext</CODE> allows the use of a locale category other than <CODE>LC_MESSAGES</CODE>. | |
284 | But I really don't know where this can be useful. If the | |
285 | <VAR>domain_name</VAR> is <CODE>NULL</CODE> or <VAR>category</VAR> has a value other than | |
286 | the known ones, the result is undefined. It should also be noted that | |
287 | this function is not part of the second known implementation of this | |
288 | function family, the one found in Solaris. | |
289 | ||
290 | </P> | |
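<P>
For the <CODE>error</CODE> domain mentioned above, a lookup that leaves the
global domain untouched could be sketched as follows. The function names are
hypothetical. The second function only spells out the category explicitly and
is expected to behave like the first.
</P>

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>

/* Look up MSGID in the separate "error" domain without changing the
   domain selected by textdomain.  */
const char *
error_text (const char *msgid)
{
  assert (msgid != NULL);
  return dgettext ("error", msgid);
}

/* The same lookup with the category named explicitly.  */
const char *
error_text_explicit (const char *msgid)
{
  assert (msgid != NULL);
  return dcgettext ("error", msgid, LC_MESSAGES);
}
```

<P>
When no translation is found, both functions hand back the
<VAR>msgid</VAR> argument itself, just as <CODE>gettext</CODE> does.
</P>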
291 | <P> | |
292 | A second ambiguity can arise from the fact that more than one | |
293 | domain may have the same name. This can be solved by specifying where the | |
294 | needed message catalog files can be found. | |
295 | ||
296 | </P> | |
297 | ||
298 | <PRE> | |
299 | char *bindtextdomain (const char *domain_name, | |
300 | const char *dir_name); | |
301 | </PRE> | |
302 | ||
303 | <P> | |
304 | Calling this function binds the given domain to a file in the specified | |
305 | directory (how this file is determined follows below). In particular, a | |
306 | file in the system's default place is no longer favored over the specified | |
307 | file (as it would be when using <CODE>textdomain</CODE> alone). A | |
308 | <CODE>NULL</CODE> pointer for the <VAR>dir_name</VAR> parameter returns the binding | |
309 | associated with <VAR>domain_name</VAR>. If <VAR>domain_name</VAR> itself is | |
310 | <CODE>NULL</CODE>, nothing happens and a <CODE>NULL</CODE> pointer is returned. Here | |
311 | again, as for all the other functions, none of the return | |
312 | values may be changed! | |
313 | ||
314 | </P> | |
315 | <P> | |
316 | It is important to remember that relative path names for the | |
317 | <VAR>dir_name</VAR> parameter can cause trouble. Since the path is always | |
318 | computed relative to the current directory, different results will be | |
319 | obtained after the program executes a <CODE>chdir</CODE> call. Relative | |
320 | paths should therefore always be avoided, to prevent such dependencies and | |
321 | unreliable behavior. | |
322 | ||
323 | </P> | |
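<P>
One way to follow this advice is a thin wrapper that refuses relative
directories outright. The wrapper <CODE>bind_domain</CODE> is hypothetical, and
the test for a leading slash assumes Unix-style path names.
</P>

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>

/* Hypothetical wrapper around bindtextdomain: bind DOMAIN_NAME to
   DIR_NAME only if DIR_NAME is an absolute path.  Returns the binding,
   or NULL if the directory was rejected or the binding failed.  The
   returned string must not be modified.  */
const char *
bind_domain (const char *domain_name, const char *dir_name)
{
  assert (domain_name != NULL && dir_name != NULL);

  if (dir_name[0] != '/')
    return NULL;                /* relative path: do not bind */

  return bindtextdomain (domain_name, dir_name);
}
```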
324 | ||
325 | ||
326 | <H3><A NAME="SEC46" HREF="gettext_toc.html#TOC46">Locating Message Catalog Files</A></H3> | |
327 | ||
328 | <P> | |
329 | Because many different languages for many different packages have to be | |
330 | stored, we need some way to attach this information to the message catalog | |
331 | files. The usual way in Unix environments is to encode it | |
332 | in the file name. This is also done here. The directory name given in | |
333 | <CODE>bindtextdomain</CODE>'s second argument (or the default directory), | |
334 | followed by the value and name of the locale category, and the domain name are | |
335 | concatenated: | |
336 | ||
337 | </P> | |
338 | ||
339 | <PRE> | |
340 | <VAR>dir_name</VAR>/<VAR>locale</VAR>/LC_<VAR>category</VAR>/<VAR>domain_name</VAR>.mo | |
341 | </PRE> | |
342 | ||
343 | <P> | |
344 | The default value for <VAR>dir_name</VAR> is system specific. For the GNU | |
345 | library, and for packages adhering to its conventions, it's: | |
346 | ||
347 | <PRE> | |
348 | /usr/local/share/locale | |
349 | </PRE> | |
350 | ||
351 | <P> | |
352 | <VAR>locale</VAR> is the value of the locale whose name is this | |
353 | <CODE>LC_<VAR>category</VAR></CODE>. For <CODE>gettext</CODE> and <CODE>dgettext</CODE> this | |
354 | locale is always <CODE>LC_MESSAGES</CODE>. <CODE>dcgettext</CODE> specifies the | |
355 | locale by the third argument.<A NAME="DOCF2" HREF="gettext_foot.html#FOOT2">(2)</A> <A NAME="DOCF3" HREF="gettext_foot.html#FOOT3">(3)</A> | |
356 | ||
357 | </P> | |
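<P>
The concatenation rule can be written down as a short sketch. The function
<CODE>catalog_path</CODE> is hypothetical and only illustrates the naming
scheme; the library's real lookup may try further variants of the locale name
that are not described in this section.
</P>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build dir_name/locale/LC_category/domain_name.mo in BUF.  CATEGORY is
   given without the "LC_" prefix, for example "MESSAGES".  Returns 0 on
   success and -1 if the name is unusable or does not fit into BUF.  */
int
catalog_path (char *buf, size_t size, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  assert (buf != NULL && size > 0);
  assert (dir_name != NULL && locale != NULL);
  assert (category != NULL && domain_name != NULL);

  /* The domain name becomes a file name, so it may not contain a slash.  */
  if (strchr (domain_name, '/') != NULL)
    return -1;

  int n = snprintf (buf, size, "%s/%s/LC_%s/%s.mo",
                    dir_name, locale, category, domain_name);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

<P>
With the default directory named above, the locale <CODE>de</CODE>, the category
<CODE>MESSAGES</CODE> and the domain <CODE>hello</CODE>, this yields
<TT>`/usr/local/share/locale/de/LC_MESSAGES/hello.mo'</TT>.
</P>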
358 | ||
359 | ||
360 | <H3><A NAME="SEC47" HREF="gettext_toc.html#TOC47">Optimization of the *gettext functions</A></H3> | |
361 | ||
362 | <P> | |
363 | At this point in the discussion we should talk about an advantage of the | |
364 | GNU <CODE>gettext</CODE> implementation. Some readers might object | |
365 | that an internationalized program may perform poorly if some | |
366 | string has to be translated in an inner loop. While this is unavoidable | |
367 | when the string varies from one iteration of the loop to the next, it is | |
368 | simply a waste of time when the string is always the same. Take the | |
369 | following example: | |
370 | ||
371 | </P> | |
372 | ||
373 | <PRE> | |
374 | { | |
375 | while (...) | |
376 | { | |
377 | puts (gettext ("Hello world")); | |
378 | } | |
379 | } | |
380 | </PRE> | |
381 | ||
382 | <P> | |
383 | When the locale selection does not change between two runs, the resulting | |
384 | string is always the same. One way to exploit this is: | |
385 | ||
386 | </P> | |
387 | ||
388 | <PRE> | |
389 | { | |
390 | const char *str = gettext ("Hello world"); | |
391 | while (...) | |
392 | { | |
393 | puts (str); | |
394 | } | |
395 | } | |
396 | </PRE> | |
397 | ||
398 | <P> | |
399 | But this solution is not usable in all situations (e.g. when the locale | |
400 | selection changes), nor is it very readable. | |
401 | ||
402 | </P> | |
403 | <P> | |
404 | The GNU C compiler, version 2.7 and above, provides another solution for | |
405 | this. To describe it we show here some lines of the | |
406 | <TT>`intl/libgettext.h'</TT> file. For an explanation of the expression | |
407 | command block see section `Statements and Declarations in Expressions' in <CITE>The GNU CC Manual</CITE>. | |
408 | ||
409 | </P> | |
410 | ||
411 | <PRE> | |
412 | # if defined __GNUC__ && __GNUC__ == 2 && __GNUC_MINOR__ >= 7 | |
413 | extern int _nl_msg_cat_cntr; | |
414 | # define dcgettext(domainname, msgid, category) \ | |
415 | (__extension__ \ | |
416 | ({ \ | |
417 | char *result; \ | |
418 | if (__builtin_constant_p (msgid)) \ | |
419 | { \ | |
420 | static char *__translation__; \ | |
421 | static int __catalog_counter__; \ | |
422 | if (! __translation__ \ | |
423 | || __catalog_counter__ != _nl_msg_cat_cntr) \ | |
424 | { \ | |
425 | __translation__ = \ | |
426 | dcgettext__ ((domainname), (msgid), (category)); \ | |
427 | __catalog_counter__ = _nl_msg_cat_cntr; \ | |
428 | } \ | |
429 | result = __translation__; \ | |
430 | } \ | |
431 | else \ | |
432 | result = dcgettext__ ((domainname), (msgid), (category)); \ | |
433 | result; \ | |
434 | })) | |
435 | # endif | |
436 | </PRE> | |
437 | ||
438 | <P> | |
439 | The interesting thing here is the <CODE>__builtin_constant_p</CODE> predicate. | |
440 | This is evaluated at compile time and so optimization can take place | |
441 | immediately. Two cases are distinguished. In the first, the argument to | |
442 | <CODE>gettext</CODE> is not a constant value, and the function | |
443 | <CODE>dcgettext__</CODE>, the real implementation of the | |
444 | <CODE>dcgettext</CODE> function, is simply called. | |
445 | ||
446 | </P> | |
447 | <P> | |
448 | If the string argument <EM>is</EM> constant, we can reuse the translation | |
449 | obtained earlier as long as the locale selection has not changed. This is exactly | |
450 | what is done here. The <CODE>_nl_msg_cat_cntr</CODE> variable is defined in | |
451 | the file <TT>`loadmsgcat.c'</TT>, which is part of <TT>`libintl.a'</TT>, and is | |
452 | changed whenever a new message catalog is loaded. | |
453 | ||
454 | </P> | |
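<P>
The same idea can be sketched without the GNU C extensions, for a single fixed
string. This is only an illustration, not the library's code: the counter
<CODE>catalog_generation</CODE> is a hypothetical stand-in for
<CODE>_nl_msg_cat_cntr</CODE> which the program itself would have to increment,
and <CODE>lookup_count</CODE> exists solely to make the effect of the cache
visible.
</P>

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's catalog counter: incremented
   by the program whenever the language selection changes.  */
int catalog_generation = 0;

/* Counts real lookups, only to show when the cache is bypassed.  */
int lookup_count = 0;

const char *
cached_hello (void)
{
  static const char *translation;
  static int seen_generation;

  /* Look the string up on first use, and again after every change of
     the counter; otherwise reuse the translation found earlier.  */
  if (translation == NULL || seen_generation != catalog_generation)
    {
      translation = gettext ("Hello world");
      seen_generation = catalog_generation;
      ++lookup_count;
    }
  assert (translation != NULL);
  return translation;
}
```

<P>
Calling <CODE>cached_hello</CODE> repeatedly performs a single lookup.
Incrementing <CODE>catalog_generation</CODE> forces exactly one more.
</P>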
455 | ||
456 | ||
457 | <H2><A NAME="SEC48" HREF="gettext_toc.html#TOC48">Comparing the Two Interfaces</A></H2> | |
458 | ||
459 | <P> | |
460 | The following discussion is perhaps a little bit biased. As said | |
461 | above, we implemented GNU <CODE>gettext</CODE> following the Uniforum | |
462 | proposal, and we surely had our reasons. But the discussion should show how we | |
463 | came to this decision. | |
464 | ||
465 | </P> | |
466 | <P> | |
467 | First we take a look at the development process. When we write an | |
468 | application using NLS provided by <CODE>gettext</CODE>, we proceed as always. | |
469 | Only when we come to a string which might be seen by the users, and thus | |
470 | has to be translated, do we use <CODE>gettext("...")</CODE> instead of | |
471 | <CODE>"..."</CODE>. At the beginning of each source file (or in a central | |
472 | header file) we define | |
473 | ||
474 | </P> | |
475 | ||
476 | <PRE> | |
477 | #define gettext(String) (String) | |
478 | </PRE> | |
479 | ||
480 | <P> | |
481 | Even this definition can be avoided when the system supports the | |
482 | <CODE>gettext</CODE> function in its C library. When we compile this code, the | |
483 | result is the same as if no NLS code were used. When you take a look at | |
484 | the GNU <CODE>gettext</CODE> code, you will see that we use <CODE>_("...")</CODE> | |
485 | instead of <CODE>gettext("...")</CODE>. This reduces the number of | |
486 | additional characters per translatable string to <EM>3</EM> (in words: | |
487 | three). | |
488 | ||
489 | </P> | |
490 | <P> | |
491 | When a production version of the program is needed, we simply replace | |
492 | the definition | |
493 | ||
494 | </P> | |
495 | ||
496 | <PRE> | |
497 | #define _(String) (String) | |
498 | </PRE> | |
499 | ||
500 | <P> | |
501 | by | |
502 | ||
503 | </P> | |
504 | ||
505 | <PRE> | |
506 | #include <libintl.h> | |
507 | #define _(String) gettext (String) | |
508 | </PRE> | |
509 | ||
510 | <P> | |
511 | Additionally we run the program <TT>`xgettext'</TT> on all source code files | |
512 | which contain translatable strings, and that's it: we have a running | |
513 | program which does not depend on translations being available, but which | |
514 | can use any that become available. | |
515 | ||
516 | </P> | |
517 | <P> | |
518 | The same procedure can be followed for the <CODE>gettext_noop</CODE> invocations | |
519 | (see section <A HREF="gettext_3.html#SEC18">Special Cases of Translatable Strings</A>). First you can define <CODE>gettext_noop</CODE> as a | |
520 | no-op macro and later use the definition from <TT>`libintl.h'</TT>. Because | |
521 | this name is not used in Sun's implementation of <TT>`libintl.h'</TT>, | |
522 | you should consider the following code for your project: | |
523 | ||
524 | </P> | |
525 | ||
526 | <PRE> | |
527 | #ifdef gettext_noop | |
528 | # define N_(String) gettext_noop (String) | |
529 | #else | |
530 | # define N_(String) (String) | |
531 | #endif | |
532 | </PRE> | |
533 | ||
534 | <P> | |
535 | <CODE>N_</CODE> is a short form similar to <CODE>_</CODE>. The <TT>`Makefile'</TT> in | |
536 | the <TT>`po/'</TT> directory of GNU gettext knows both of the | |
537 | mentioned short forms by default, so you are invited to follow this proposal for | |
538 | your own convenience. | |
539 | ||
540 | </P> | |
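<P>
The two short forms are typically used together as in the sketch below.
The table <CODE>level_name</CODE> and the function <CODE>translated_level</CODE>
are hypothetical. The strings in the table are only marked with
<CODE>N_</CODE>, since a static initializer cannot call a function, and the
actual lookup happens later through <CODE>_</CODE>.
</P>

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>

#define _(String) gettext (String)
#define N_(String) (String)

/* The entries are marked for extraction but not translated here.  */
static const char *const level_name[] =
  { N_("warning"), N_("error"), N_("fatal error") };

/* Translate the marked string at the point of use.  */
const char *
translated_level (size_t level)
{
  assert (level < sizeof level_name / sizeof level_name[0]);
  return _(level_name[level]);
}
```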
541 | <P> | |
542 | Now to <CODE>catgets</CODE>. The main problem is the work for the | |
543 | programmer. Every time he comes to a translatable string he has to | |
544 | define a number (or a symbolic constant) which also has to be defined in | |
545 | the message catalog file. He also has to take care of duplicate | |
546 | entries, duplicate message IDs etc. If he wants to have the same | |
547 | quality in the message catalog as the GNU <CODE>gettext</CODE> program | |
548 | provides, he also has to put the descriptive comments for the strings and | |
549 | their locations in all source code files into the message catalog. This is | |
550 | nearly a Mission: Impossible. | |
551 | ||
552 | </P> | |
553 | <P> | |
554 | But there are also some points people might call advantages of | |
555 | <CODE>catgets</CODE>. If you have a single word in a string and this string | |
556 | is used in different contexts, it is likely that in one language or | |
557 | another the word has different translations. Example: | |
558 | ||
559 | </P> | |
560 | ||
561 | <PRE> | |
562 | printf ("%s: %d", gettext ("number"), number_of_errors); | |
563 | ||
564 | printf ("you should see %d %s", number_count, | |
565 | number_count == 1 ? gettext ("number") : gettext ("numbers")); | |
566 | </PRE> | |
567 | ||
568 | <P> | |
569 | Here we have to translate the string <CODE>"number"</CODE> twice. Even | |
570 | if you do not speak a language besides English, it might be possible to | |
571 | recognize that the two words have different meanings. In German the | |
572 | first appearance has to be translated to <CODE>"Anzahl"</CODE> and the second | |
573 | to <CODE>"Zahl"</CODE>. | |
574 | ||
575 | </P> | |
576 | <P> | |
577 | Now you can say that this example is really esoteric. And you are | |
578 | right! This is exactly how we felt about this problem, and we decided that | |
579 | it does not weigh that much. The solution for the above problem could | |
580 | be very easy: | |
581 | ||
582 | </P> | |
583 | ||
584 | <PRE> | |
585 | printf ("%s %d", gettext ("number:"), number_of_errors); | |
586 | ||
587 | printf (number_count == 1 ? gettext ("you should see %d number") | |
588 | : gettext ("you should see %d numbers"), | |
589 | number_count); | |
590 | </PRE> | |
591 | ||
592 | <P> | |
593 | We believe that we can solve all conflicts with this method. If it is | |
594 | difficult, one can also consider changing one of the conflicting strings a | |
595 | little bit. But it is not impossible to overcome. | |
596 | ||
597 | </P> | |
598 | <P> | |
599 | Translator note: It is perhaps appropriate here to tell those English-speaking | |
600 | programmers that the plural form of a noun cannot always be formed by | |
601 | appending a single `s'. Most other languages use different methods. | |
602 | Even the above form is not general enough to cope with all languages. | |
603 | Rafal Maszkowski <rzm@mat.uni.torun.pl> reports: | |
604 | ||
605 | </P> | |
606 | ||
607 | <BLOCKQUOTE> | |
608 | <P> | |
609 | In Polish we use e.g. plik (file) this way: | |
610 | ||
611 | <PRE> | |
612 | 1 plik | |
613 | 2,3,4 pliki | |
614 | 5-21 pliko'w | |
615 | 22-24 pliki | |
616 | 25-31 pliko'w | |
617 | </PRE> | |
618 | ||
619 | <P> | |
620 | and so on (o' means 8859-2 oacute which should be rather okreska, | |
621 | similar to aogonek). | |
622 | </BLOCKQUOTE> | |
623 | ||
624 | <P> | |
625 | A workable approach might be to consider methods like the one used for | |
626 | <CODE>LC_TIME</CODE> in the POSIX.2 standard. The value of the | |
627 | <CODE>alt_digits</CODE> field can be up to 100 strings which represent the | |
628 | numbers 0 to 99. Using this in an internationalized | |
629 | program means that an array of translatable strings should be indexed by | |
630 | the number each string should represent. A small example: | |
631 | ||
632 | </P> | |
633 | ||
634 | <PRE> | |
635 | void | |
636 | print_month_info (int month) /* MONTH is zero-based: 0 means January. */ | |
637 | { | |
638 | static const char *const month_pos[12] = | |
639 | { N_("first"), N_("second"), N_("third"), N_("fourth"), | |
640 | N_("fifth"), N_("sixth"), N_("seventh"), N_("eighth"), | |
641 | N_("ninth"), N_("tenth"), N_("eleventh"), N_("twelfth") }; | |
642 | printf (_("%s is the %s month\n"), nl_langinfo (MON_1 + month), | |
643 | _(month_pos[month])); | |
644 | } | |
645 | </PRE> | |
646 | ||
647 | <P> | |
648 | It should be obvious that this method is only reasonable for small | |
649 | ranges of numbers. | |
650 | ||
651 | </P> | |
652 | ||
653 | ||
654 | ||
655 | <H2><A NAME="SEC49" HREF="gettext_toc.html#TOC49">Using libintl.a in your own programs</A></H2> | |
656 | ||
657 | <P> | |
658 | Starting with version 0.9.4 the library <CODE>libintl.a</CODE> should be | |
659 | self-contained, i.e., you can use it in your own programs without | |
660 | providing additional functions. The <TT>`Makefile'</TT> will put the header | |
661 | and the library in directories selected using the <CODE>$(prefix)</CODE> variable. | |
662 | ||
663 | </P> | |
664 | <P> | |
665 | One exception to the above is found on HP-UX systems. Here the C library | |
666 | does not contain the <CODE>alloca</CODE> function (and the HP compiler does | |
667 | not generate it inline). But we do not intend to rewrite the whole | |
668 | library just because of this dumb system. Instead, include the | |
669 | <CODE>alloca</CODE> function in every package in which you use <CODE>libintl.a</CODE>. | |
670 | ||
671 | </P> | |
672 | ||
673 | ||
674 | <H2><A NAME="SEC50" HREF="gettext_toc.html#TOC50">Being a <CODE>gettext</CODE> grok</A></H2> | |
675 | ||
676 | <P> | |
677 | To fully exploit the functionality of the GNU <CODE>gettext</CODE> library it | |
678 | is surely helpful to read the source code. But for those who don't want | |
679 | to spend that much time reading the (sometimes complicated) code, here | |
680 | is a list of comments: | |
681 | ||
682 | </P> | |
683 | ||
684 | <UL> | |
685 | <LI>Changing the language at runtime | |
686 | ||
687 | For interactive programs it might be useful to offer a selection of the | |
688 | language in use at runtime. To understand how to do this, one needs to know | |
689 | how the language is determined while executing the <CODE>gettext</CODE> | |
690 | function. The method presented here only works correctly | |
691 | with the GNU implementation of the <CODE>gettext</CODE> functions. It is not | |
692 | possible with underlying <CODE>catgets</CODE> functions or <CODE>gettext</CODE> | |
693 | functions from the system's C library. The exception is of course the | |
694 | GNU C Library, which uses the GNU <CODE>gettext</CODE> Library for message handling. | |
695 | ||
696 | In the function <CODE>dcgettext</CODE>, at every call the current setting of | |
697 | the highest-priority environment variable is determined and used. | |
698 | Highest priority here refers to the following list, in order of decreasing | |
699 | priority: | |
700 | ||
701 | ||
702 | <OL> | |
703 | <LI><CODE>LANGUAGE</CODE> | |
704 | ||
705 | <LI><CODE>LC_ALL</CODE> | |
706 | ||
707 | <LI><CODE>LC_xxx</CODE>, according to selected locale | |
708 | ||
709 | <LI><CODE>LANG</CODE> | |
710 | ||
711 | </OL> | |
712 | ||
713 | Afterwards the path is constructed using the value found, and the | |
714 | translation file is loaded if available. | |
715 | ||
716 | What happens when the value of, say, <CODE>LANGUAGE</CODE> changes? According | |
717 | to the process explained above, the new value of this variable is found | |
718 | as soon as the <CODE>dcgettext</CODE> function is called. But this also means | |
719 | that the (perhaps) different message catalog file is loaded. In other | |
720 | words: the language in use is changed. | |
721 | ||
722 | But there is one little catch. The code for gcc-2.7.0 and up provides | |
723 | some optimization. This optimization normally prevents the calling of | |
724 | the <CODE>dcgettext</CODE> function as long as no new catalog is loaded. But | |
725 | if <CODE>dcgettext</CODE> is not called, the program also cannot notice that the | |
726 | <CODE>LANGUAGE</CODE> variable has changed (see section <A HREF="gettext_8.html#SEC47">Optimization of the *gettext functions</A>). The | |
727 | solution for this is very easy. Include the following code in the | |
728 | language switching function. | |
729 | ||
730 | ||
731 | <PRE> | |
732 | /* Change language. */ | |
733 | setenv ("LANGUAGE", "fr", 1); | |
734 | ||
735 | /* Make change known. */ | |
736 | { | |
737 | extern int _nl_msg_cat_cntr; | |
738 | ++_nl_msg_cat_cntr; | |
739 | } | |
740 | </PRE> | |
741 | ||
742 | The variable <CODE>_nl_msg_cat_cntr</CODE> is defined in <TT>`loadmsgcat.c'</TT>. | |
743 | The programmer will find himself in need of a construct like this only | |
744 | when developing programs which run for a longer time and allow the user to | |
745 | select the language at runtime. Non-interactive programs (like all | |
746 | these little Unix tools) should never need this. | |
747 | ||
748 | </UL> | |
749 | ||
750 | ||
751 | ||
752 | <H2><A NAME="SEC51" HREF="gettext_toc.html#TOC51">Temporary Notes for the Programmers Chapter</A></H2> | |
753 | ||
754 | ||
755 | ||
756 | <H3><A NAME="SEC52" HREF="gettext_toc.html#TOC52">Temporary - Two Possible Implementations</A></H3> | |
757 | ||
758 | <P> | |
759 | There are two competing methods for language independent messages: | |
760 | the X/Open <CODE>catgets</CODE> method, and the Uniforum <CODE>gettext</CODE> | |
761 | method. The <CODE>catgets</CODE> method indexes messages by integers; the | |
762 | <CODE>gettext</CODE> method indexes them by their original English strings. | |
763 | The <CODE>catgets</CODE> method has been around longer and is supported | |
764 | by more vendors. The <CODE>gettext</CODE> method is supported by Sun, | |
765 | and the COSE multi-vendor initiative is reportedly | |
766 | supporting it. Neither method is a POSIX standard; the POSIX.1 | |
767 | committee had a lot of disagreement in this area. | |
768 | ||
769 | </P> | |
770 | <P> | |
771 | Neither one is in the POSIX standard. There was much disagreement | |
772 | in the POSIX.1 committee about using the <CODE>gettext</CODE> routines | |
773 | vs. <CODE>catgets</CODE> (XPG). In the end the committee couldn't | |
774 | agree on anything, so no messaging system was included as part | |
775 | of the standard. I believe the informative annex of the standard | |
776 | includes the XPG3 messaging interfaces, "...as an example of | |
777 | a messaging system that has been implemented..." | |
778 | ||
779 | </P> | |
780 | <P> | |
781 | They were very careful not to say anywhere that you should use one | |
782 | set of interfaces over the other. For more on this topic please | |
783 | see the Programming for Internationalization FAQ. | |
784 | ||
785 | </P> | |
786 | ||
787 | ||
788 | <H3><A NAME="SEC53" HREF="gettext_toc.html#TOC53">Temporary - About <CODE>catgets</CODE></A></H3> | |
789 | ||
790 | <P> | |
791 | There have been a few discussions of late on the use of | |
792 | <CODE>catgets</CODE> as a base. I think it important to present both | |
793 | sides of the argument and hence am opting to play devil's advocate | |
794 | for a little bit. | |
795 | ||
796 | </P> | |
797 | <P> | |
798 | I'll not deny the fact that <CODE>catgets</CODE> could have been designed | |
799 | a lot better. It currently has quite a number of limitations and | |
800 | these have already been pointed out. | |
801 | ||
802 | </P> | |
803 | <P> | |
804 | However there is a great deal to be said for consistency and | |
805 | standardization. A recurring problem when writing Unix | |
806 | software is the myriad portability problems across Unix platforms. | |
807 | It seems as if every Unix vendor had a look at the operating system | |
808 | and found parts they could improve upon. Undoubtedly, these | |
809 | modifications are innovative and solve real problems. | |
810 | However, software developers have a hard time keeping up with all | |
811 | these changes across so many platforms. | |
812 | ||
813 | </P> | |
814 | <P> | |
815 | And this has prompted the Unix vendors to begin to standardize their | |
816 | systems. Hence the impetus for Spec1170. Every major Unix vendor | |
817 | has committed to supporting this standard and every Unix software | |
818 | developer awaits with glee the day they can write software to this | |
819 | standard and simply recompile (without having to use autoconf) | |
820 | across different platforms. | |
821 | ||
822 | </P> | |
823 | <P> | |
824 | As I understand it, Spec1170 is roughly based upon version 4 of the | |
825 | X/Open Portability Guide (XPG4). Because <CODE>catgets</CODE> and | |
826 | friends are defined in XPG4, I'm led to believe that <CODE>catgets</CODE> | |
827 | is a part of Spec1170 and hence will become a standardized component | |
828 | of all Unix systems. | |
829 | ||
830 | </P> | |
831 | ||
832 | ||
833 | <H3><A NAME="SEC54" HREF="gettext_toc.html#TOC54">Temporary - Why a single implementation</A></H3> | |
834 | ||
835 | <P> | |
836 | Now it seems kind of wasteful to me to have two different systems | |
837 | installed for accessing message catalogs. If we do want to remedy | |
838 | the deficiencies of <CODE>catgets</CODE>, why don't we try to expand <CODE>catgets</CODE> | |
839 | (in a compatible manner) rather than implement an entirely new system? | |
840 | Otherwise, we'll end up with two message catalog access systems installed | |
841 | with an operating system - one set of routines for packages using GNU | |
842 | <CODE>gettext</CODE> for their internationalization, and another set of routines | |
843 | (<CODE>catgets</CODE>) for all other software. Bloated? | |
844 | ||
845 | </P> | |
846 | <P> | |
847 | Suppose another catalog access system is implemented. Which do | |
848 | we recommend? At least for Linux, we need to attract as many | |
849 | software developers as possible. Hence we need to make it as easy | |
850 | for them to port their software as possible. Which means supporting | |
851 | <CODE>catgets</CODE>. We will be implementing the <CODE>glocale</CODE> code | |
852 | within our <CODE>libc</CODE>, but does this mean we also have to incorporate | |
853 | another message catalog access scheme within our <CODE>libc</CODE> as well? | |
854 | And what about people who are going to be using the <CODE>glocale</CODE> | |
855 | + non-<CODE>catgets</CODE> routines? When they port their software to | |
856 | other platforms, they're now going to have to include the front-end | |
857 | (<CODE>glocale</CODE>) code plus the back-end code (the non-<CODE>catgets</CODE> | |
858 | access routines) with their software instead of just including the | |
859 | <CODE>glocale</CODE> code with their software. | |
860 | ||
861 | </P> | |
862 | <P> | |
863 | Message catalog support is however only the tip of the iceberg. | |
864 | What about the data for the other locale categories? They also have | |
865 | a number of deficiencies. Are we going to abandon them as well and | |
866 | develop another duplicate set of routines (should <CODE>glocale</CODE> | |
867 | expand beyond message catalog support)? | |
868 | ||
869 | </P> | |
870 | <P> | |
871 | As with many parts of Unix that can be improved upon, we're stuck with balancing | |
872 | compatibility with the past against useful improvements and innovations for | |
873 | the future. | |
874 | ||
875 | </P> | |
876 | ||
877 | ||
878 | ||
879 | <H3><A NAME="SEC55" HREF="gettext_toc.html#TOC55">Temporary - Notes</A></H3> | |
880 | ||
881 | <P> | |
882 | X/Open agreed very late on the standard form, so many | |
883 | implementations differ from the final form. Both of my systems (old | |
884 | Linux catgets and Ultrix-4) have a strange variation. | |
885 | ||
886 | </P> | |
887 | <P> | |
888 | OK. After incorporating the last changes I have to spend some time on | |
889 | writing the GNU/Linux <CODE>libc</CODE> <CODE>gettext</CODE> functions. So in the future | |
890 | Solaris will not be the only system having <CODE>gettext</CODE>. | |
891 | ||
892 | </P> | |
893 | <P><HR><P> | |
894 | <p>Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_7.html">previous</A>, <A HREF="gettext_9.html">next</A>, <A HREF="gettext_12.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. | |
895 | </BODY> | |
896 | </HTML> |