GNU gettext utilities - The Programmer's View

The Programmer's View

One aim of the current message catalog implementation provided by GNU gettext was to use the system's message catalog handling, if the installer wishes to do so. So we should perhaps first take a look at the solutions we know about. The people in the POSIX committee did not manage to agree on one of the semi-official standards which we'll describe below. In fact they couldn't agree on anything, so they decided only to include an example of an interface. The major Unix vendors are split in their use of the two most important specifications: X/Open's catgets vs. Uniforum's gettext interface. We'll describe them both and later explain our solution to this dilemma.

About `catgets`

The catgets implementation is defined in the X/Open Portability Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the process of creating this standard seemed to be too slow for some of the Unix vendors, so they based their implementations on preliminary versions of the standard. Of course this again leads to problems when writing platform-independent programs: even using catgets does not guarantee a uniform interface.

Another, personal comment on this: only a bunch of committee members could have made this interface. They never really tried to program using it. It is a fast, memory-saving implementation, and a user can happily live with it. But programmers hate it (at least I and some others do...).

But we must not forget one point: after all the trouble with transferring the rights to Unix(tm), they finally came to X/Open, the very same organization that published this specification. This leads me to predict that this interface will be in future Unix standards (e.g. Spec1170) and therefore part of all Unix implementations (those implementations which are allowed to bear this name).

The Interface

The interface to the catgets implementation consists of three functions which correspond to those used in file access: catopen to open the catalog for use, catgets for accessing the message tables, and catclose for closing it after the work is done. Prototypes for the functions and the needed definitions are in the <nl_types.h> header file.

catopen is used like this:

nl_catd catd = catopen ("catalog_name", 0);

The function takes the name of the catalog as its first argument. This usually refers to the name of the program or the package. The second parameter is not further specified in the standard. I don't even know whether it is implemented consistently among various systems, so the common advice is to use 0 as the value. The return value is a handle to the message catalog, equivalent to the file handles returned by open.

This handle is of course used in the catgets function, which can be used like this:

char *translation = catgets (catd, set_no, msg_id, "original string");

The first parameter is this catalog descriptor. The second parameter specifies the set of messages in this catalog from which the message identified by msg_id is obtained. catgets therefore uses three-stage addressing:

catalog name => set number => message ID => translation

The fourth argument is not used to address the translation. It is given as a default value in case one of the addressing stages fails. One important thing to remember is that although the return type of catgets is char *, the resulting string must not be changed. It should rather be const char *, but the standard was published in 1988, one year before ANSI C.

The last of these functions is used and behaves as expected:

catclose (catd);

After this, no catgets call using the descriptor is legal anymore.
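Putting the three calls together, a minimal sketch could look like this. The catalog name `myprog` and the set and message numbers (both 1) are hypothetical; a real program would take them from its own catalog. The returned string belongs to the open catalog, so the sketch copies it before calling catclose.

```c
#include <nl_types.h>
#include <string.h>

/* Return message 1 of set 1 from the hypothetical catalog "myprog",
   or the default text if it cannot be retrieved.  */
const char *
lookup_greeting (void)
{
  static char buf[256];
  const char *dflt = "Hello, world!";
  nl_catd catd = catopen ("myprog", 0);

  /* catopen returns (nl_catd) -1 on failure; use the default text then.
     Otherwise catgets itself returns the default if set 1 / message 1
     cannot be retrieved.  */
  const char *msg = catd == (nl_catd) -1 ? dflt : catgets (catd, 1, 1, dflt);

  /* The string may live inside the catalog: copy it before closing.  */
  strncpy (buf, msg, sizeof buf - 1);
  buf[sizeof buf - 1] = '\0';

  if (catd != (nl_catd) -1)
    catclose (catd);
  return buf;
}
```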

Problems with the `catgets` Interface?!

Now that this description seemed to be really easy, where are the problems we speak of? In fact the interface could be used in a reasonable way, but constructing the message catalogs is a pain. The reason for this lies in the third argument of catgets: the unique message ID. This has to be a numeric value, unique for all messages in a single set. Perhaps you can imagine the problems of keeping such a list while changing the source code: add a new message here, remove one there. Of course a lot of tools have been developed to help organize this chaos, but each of them fails in one aspect or another. We don't want to say that the other approach has no problems, but they are far more easily managed.

About `gettext`

The definition of the gettext interface comes from a Uniforum proposal, and it is followed by at least one major Unix vendor (Sun) in its latest developments. It is not specified in any official standard, though.

The main points about this solution are that it does not follow the method of normal file handling (open-use-close) and that it does not burden the programmer with so many tasks, especially the unique key handling. Of course a unique key is needed here as well, but this key is the message itself (however long or short it is). See section Comparing the Two Interfaces for a more detailed comparison of the two methods.

The following section contains a rather detailed description of the interface. We make it that detailed because this is the interface we chose for the GNU gettext Library. Programmers interested in using this library will be interested in this description.

The Interface

The minimal functionality an interface must have is a) to select a domain the strings are coming from (a single domain for all programs is not reasonable because its construction and maintenance are difficult, perhaps impossible) and b) to access a string in a selected domain.

This is essentially the description of the gettext interface. It has a global domain which unqualified calls reference. Of course this domain is selectable by the user.

char *textdomain (const char *domain_name);

This provides the possibility to change or query the current global domain of the LC_MESSAGES category. The argument is a null-terminated string whose characters must be legal for use in filenames. If the domain_name argument is NULL, the function returns the current value. If no value has been set before, the name of the default domain is returned: messages. Please note that although the return value of textdomain is of type char *, it must not be changed. It is also important to know that no availability checks are made. If the name is not available, you will see this by the fact that no translations are provided.
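The rules above can be sketched as follows. The function first queries the current domain by passing NULL. It then copies the result, because the returned string must not be modified and the implementation may release it when the domain changes. Finally it selects a new domain. The function name `switch_domain` is hypothetical.

```c
#include <libintl.h>
#include <string.h>

/* Save the name of the current global domain in old (of old_size bytes)
   and make new_domain the global domain.  Returns 0 on success.  */
int
switch_domain (const char *new_domain, char *old, size_t old_size)
{
  /* NULL only queries; "messages" if textdomain was never called.  */
  const char *current = textdomain (NULL);
  if (current == NULL || strlen (current) >= old_size)
    return -1;

  /* The returned string must not be changed, and it may be released
     when the domain changes, so keep a private copy.  */
  strcpy (old, current);

  /* No availability check is made: an unknown domain simply yields
     untranslated strings.  */
  return textdomain (new_domain) == NULL ? -1 : 0;
}
```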

To use a domain set by textdomain, the function

char *gettext (const char *msgid);

is to be used. This is the simplest reasonable form one can imagine. The translation of the string msgid is returned if it is available in the current domain. If it is not available, the argument itself is returned. If the argument is NULL, the result is undefined.

One thing which should come to mind is that no explicit dependency on the used domain is given. The current value of the domain for the LC_MESSAGES locale is used. If this changes between two executions of the same gettext call in the program, the two calls reference different message catalogs.

For the easiest case, which is normally used in internationalized packages, textdomain is called once at the beginning of execution, setting the domain to a unique name, normally the package name. In the following code, all strings which have to be translated are filtered through the gettext function. That's all; the package speaks your language.
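Here is a minimal sketch of this pattern, assuming a hypothetical domain name `my_package`. The setlocale call is standard C rather than part of the gettext interface. Without it the program stays in the "C" locale, where the GNU implementation simply returns the argument.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical domain name, normally the package name.  */
#define PACKAGE "my_package"

/* Called once, at the beginning of execution.  */
void
init_i18n (void)
{
  setlocale (LC_ALL, "");   /* Take the locale from the environment.  */
  textdomain (PACKAGE);     /* All unqualified gettext calls use it.  */
}

/* Every user-visible string is filtered through gettext.  */
const char *
farewell_message (void)
{
  return gettext ("Goodbye!");
}
```

A program's main would call init_i18n () first. If no catalog for the domain is installed, farewell_message () returns the original text.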

Solving Ambiguities

While this single-name domain works well for most applications, there might be a need to get translations from more than one domain. Of course one could switch between different domains with calls to textdomain, but this is neither convenient nor fast. A possible situation is the one being discussed while writing this: all error messages of functions in the set of commonly used functions should go into a separate domain error. This way we would only need to translate them once.

For these reasons there are two more functions to retrieve strings:

char *dgettext (const char *domain_name, const char *msgid);
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);

Both take an additional argument in the first position, which corresponds to the argument of textdomain. The third argument of dcgettext allows using a locale category other than LC_MESSAGES, but I really don't know where this can be useful. If domain_name is NULL or category has a value other than the known ones, the result is undefined. It should also be noted that this function is not part of the second known implementation of this function family, the one found in Solaris.
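This sketch uses the `error` domain from the example above. It looks up messages in that shared domain without touching the global domain set by textdomain. The helper names are hypothetical. In the GNU implementation, dgettext behaves like dcgettext with LC_MESSAGES as the category, and the second helper spells that out.

```c
#include <libintl.h>
#include <locale.h>

/* Shared domain for the error messages of common functions.  */
#define ERROR_DOMAIN "error"

/* Look msgid up in the error domain; the global domain is untouched.  */
const char *
error_text (const char *msgid)
{
  return dgettext (ERROR_DOMAIN, msgid);
}

/* The same lookup with the category given explicitly.  */
const char *
error_text_explicit (const char *msgid)
{
  return dcgettext (ERROR_DOMAIN, msgid, LC_MESSAGES);
}
```

When no translation is available, both helpers return their argument itself, as described for gettext.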

A second ambiguity can arise from the fact that perhaps more than one domain has the same name. This can be solved by specifying where the needed message catalog files can be found.

char *bindtextdomain (const char *domain_name,
                      const char *dir_name);

Calling this function binds the given domain to a file in the specified directory (how this file is determined follows below). In particular, a file in the system's default place is no longer favored over the specified file (as it would be by solely using textdomain). A NULL pointer for the dir_name parameter returns the binding associated with domain_name. If domain_name itself is NULL, nothing happens and a NULL pointer is returned. Here again, as for all the other functions, none of the return values may be changed!

It is important to remember that relative path names for the dir_name parameter can cause trouble. Since the path is always computed relative to the current directory, different results will be achieved when the program executes a chdir command. Relative paths should therefore always be avoided, to prevent such dependencies and unreliability.
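Here is a small sketch of these rules. It binds a domain to an absolute directory and then queries the binding by passing NULL as dir_name. The domain and directory names in the usage line are hypothetical. Rejecting relative paths is this sketch's own policy, not something bindtextdomain enforces.

```c
#include <libintl.h>
#include <string.h>

/* Bind domain to abs_dir and check that querying the binding
   (dir_name == NULL) returns the same directory.
   Returns 1 on success, 0 otherwise.  */
int
bind_and_check (const char *domain, const char *abs_dir)
{
  /* Sketch policy: a relative path would be resolved against whatever
     the current directory is at lookup time, so refuse it.  */
  if (abs_dir[0] != '/')
    return 0;

  if (bindtextdomain (domain, abs_dir) == NULL)
    return 0;

  const char *current = bindtextdomain (domain, NULL);
  return current != NULL && strcmp (current, abs_dir) == 0;
}
```

For example, bind_and_check ("my_package", "/opt/example/share/locale") is expected to return 1, while a relative directory such as "locale" is rejected with 0.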

Locating Message Catalog Files

Because many different languages for many different packages have to be stored, we need some way to add this information to the message catalog files. The way usually used in Unix environments is to encode it in the file name. This is also done here. The directory name given in bindtextdomain's second argument (or the default directory), the locale value, the category name, and the domain name are concatenated:

dir_name/locale/LC_category/domain_name.mo

The default value for dir_name is system-specific. For the GNU library, and for packages adhering to its conventions, it's:

/usr/local/share/locale

locale is the value of the locale category named by LC_category. For gettext and dgettext this category is always LC_MESSAGES; dcgettext specifies it by its third argument.(2) (3)
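The naming scheme can be illustrated with a sketch that only builds the file name. It is not libintl's own lookup code. The real implementation may, for instance, also try variants of the locale name, which this sketch omits. The function name `catalog_path` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Write dir_name/locale/category/domain_name.mo into buf.
   Returns 0 on success, -1 if buf (of size bytes) is too small.  */
int
catalog_path (char *buf, size_t size, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  /* Three slashes, ".mo" and the terminating NUL: sizeof "///.mo".  */
  size_t need = strlen (dir_name) + strlen (locale) + strlen (category)
                + strlen (domain_name) + sizeof "///.mo";
  if (need > size)
    return -1;
  snprintf (buf, size, "%s/%s/%s/%s.mo", dir_name, locale, category,
            domain_name);
  return 0;
}
```

With dir_name /usr/local/share/locale, locale de, category LC_MESSAGES and domain my_package, the result is /usr/local/share/locale/de/LC_MESSAGES/my_package.mo.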

Optimization of the *gettext functions

At this point of the discussion we should talk about an advantage of the GNU gettext implementation. Some readers might have pointed out that an internationalized program might perform poorly if some string has to be translated in an inner loop. While this is unavoidable when the string varies from one run of the loop to the other, it is simply a waste of time when the string is always the same. Take the following example:

{
  while (...)
    {
      puts (gettext ("Hello world"));
    }
}

When the locale selection does not change between two runs, the resulting string is always the same. One way to exploit this is:

{
  const char *str = gettext ("Hello world");
  while (...)
    {
      puts (str);
    }
}

But this solution is not usable in all situations (e.g. when the locale selection changes), nor is it very readable.

The GNU C compiler, version 2.7 and above, provides another solution for this. To describe it, we show here some lines of the `intl/libgettext.h' file. For an explanation of the expression command block, see section `Statements and Declarations in Expressions' in The GNU CC Manual.

#  if defined __GNUC__ && __GNUC__ == 2 && __GNUC_MINOR__ >= 7
extern int _nl_msg_cat_cntr;
#   define	dcgettext(domainname, msgid, category)           \
  (__extension__                                                 \
   ({                                                            \
     char *result;                                               \
     if (__builtin_constant_p (msgid))                           \
       {                                                         \
         static char *__translation__;                           \
         static int __catalog_counter__;                         \
         if (! __translation__                                   \
             || __catalog_counter__ != _nl_msg_cat_cntr)         \
           {                                                     \
             __translation__ =                                   \
               dcgettext__ ((domainname), (msgid), (category));  \
             __catalog_counter__ = _nl_msg_cat_cntr;             \
           }                                                     \
         result = __translation__;                               \
       }                                                         \
     else                                                        \
       result = dcgettext__ ((domainname), (msgid), (category)); \
     result;                                                     \
    }))
#  endif

The interesting thing here is the __builtin_constant_p predicate. It is evaluated at compile time, so the optimization can take place immediately. Two cases are distinguished here. If the argument to gettext is not a constant value, the function dcgettext__, the real implementation of the dcgettext function, is simply called.

If the string argument is constant, we can reuse the translation obtained earlier as long as the locale selection has not changed. This is exactly what is done here. The _nl_msg_cat_cntr variable is defined in `loadmsgcat.c', which is available in `libintl.a', and is changed whenever a new message catalog is loaded.
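The same caching idea can be sketched without libintl internals. Here a hypothetical `catalog_generation` counter plays the role of _nl_msg_cat_cntr. The program itself must increment it whenever another catalog may apply. A `lookups` counter only makes the effect visible. Like the macro above, the sketch relies on the GNU C statement-expression extension. It omits the __builtin_constant_p test, so it should only be applied to string literals.

```c
#include <libintl.h>

/* Hypothetical stand-in for libintl's _nl_msg_cat_cntr: the program
   bumps it whenever a different catalog might now apply.  */
static int catalog_generation;

/* Counts real lookups, only to make the effect of the cache visible.  */
static int lookups;

static const char *
lookup (const char *msgid)
{
  ++lookups;
  return gettext (msgid);
}

/* Each expansion gets its own static cache, refreshed only when
   catalog_generation has changed since the last lookup.  */
#define CACHED_GETTEXT(msgid)                                   \
  (__extension__                                                \
   ({                                                           \
     static const char *cached_;                                \
     static int cached_generation_ = -1;                        \
     if (cached_generation_ != catalog_generation)              \
       {                                                        \
         cached_ = lookup (msgid);                              \
         cached_generation_ = catalog_generation;               \
       }                                                        \
     cached_;                                                   \
   }))

const char *
hello (void)
{
  return CACHED_GETTEXT ("Hello world");
}
```

Calling hello () twice performs one real lookup. After ++catalog_generation, the next call looks the string up again.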

Comparing the Two Interfaces

The following discussion is perhaps a little bit colored. As said above, we implemented GNU gettext following the Uniforum proposal, and this surely has its reasons. But it should show how we came to this decision.

First we take a look at the development process. When we write an application using NLS provided by gettext, we proceed as always. Only when we come to a string which might be seen by the users, and thus has to be translated, do we use gettext("...") instead of "...". At the beginning of each source file (or in a central header file) we define

#define gettext(String) (String)

Even this definition can be avoided when the system supports the gettext function in its C library. When we compile this code, the result is the same as if no NLS code were used. When you take a look at the GNU gettext code, you will see that we use _("...") instead of gettext("..."). This reduces the number of additional characters per translatable string to 3 (in words: three).

When a production version of the program is needed, we simply replace the definition

#define _(String) (String)

by

#include <libintl.h>
#define _(String) gettext (String)

Additionally, we run the program `xgettext' on all source code files which contain translatable strings, and that's it: we have a running program which does not depend on translations being available, but which can use any that become available.
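The development and production definitions can also live side by side in one central header, selected by a configuration macro. The name ENABLE_NLS is an assumption here; it is the name commonly used by autoconf-based packages. Define it for the production build and leave it undefined during development.

```c
/* ENABLE_NLS is an assumed configuration macro: leave it undefined while
   developing, define it for the production build.  */
#ifdef ENABLE_NLS
# include <libintl.h>
# define _(String) gettext (String)
#else
# define _(String) (String)
#endif

/* Translatable strings are written once, wrapped in _().  */
const char *
usage_message (void)
{
  return _("Usage: prog [OPTION]... FILE\n");
}
```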

The same procedure can be applied to the gettext_noop invocations (see section Special Cases of Translatable Strings). First you can define gettext_noop as a no-op macro and later use the definition from `libintl.h'. Because this name is not used in Sun's implementation of `libintl.h', you should consider the following code for your project:

#ifdef gettext_noop
# define N_(String) gettext_noop (String)
#else
# define N_(String) (String)
#endif

N_ is a short form similar to _. The `Makefile' in the `po/' directory of GNU gettext knows both of the mentioned short forms by default, so you are invited to follow this proposal for your own ease.

Now to catgets. The main problem is the work for the programmer. Every time he comes to a translatable string, he has to define a number (or a symbolic constant) which also has to be defined in the message catalog file. He also has to take care of duplicate entries, duplicate message IDs, etc. If he wants the same quality in the message catalog as the GNU gettext program provides, he also has to put the descriptive comments for the strings and their locations in all source code files into the message catalog. This is nearly a Mission: Impossible.

But there are also some points people might call advantages speaking for catgets. If you have a single word in a string and this string is used in different contexts, it is likely that in one language or another the word has different translations. Example:

printf ("%s: %d", gettext ("number"), number_of_errors);

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));

Here we have to translate the string "number" twice. Even if you do not speak a language other than English, it might be possible to recognize that the two words have different meanings. In German the first occurrence has to be translated to "Anzahl" and the second to "Zahl".

Now you can say that this example is really esoteric. And you are right! This is exactly how we felt about this problem, and we decided that it does not weigh that much. The solution for the above problem could be very easy:

printf ("%s %d", gettext ("number:"), number_of_errors);

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count);

We believe that we can solve all conflicts with this method. If it is difficult, one can also consider changing one of the conflicting strings a little bit. But it is not impossible to overcome.

Translator note: It is perhaps appropriate here to tell English-speaking programmers that the plural form of a noun cannot, in general, be formed by appending a single `s'. Most other languages use different methods. Even the above form is not general enough to cope with all languages. Rafal Maszkowski <rzm@mat.uni.torun.pl> reports:

In Polish we use e.g. plik (file) this way:

1 plik
2,3,4 pliki
5-21 pliko'w
22-24 pliki
25-31 pliko'w

and so on (o' means 8859-2 oacute which should rather be okreska, similar to aogonek).

A workable approach might be to consider methods like the one used for LC_TIME in the POSIX.2 standard. The value of the alt_digits field can be up to 100 strings which represent the numbers 0 to 99. Using this in an internationalized program means that an array of translatable strings is indexed by the number it should represent. A small example:

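#include <langinfo.h>
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)
#define N_(String) (String)

void
print_month_info (int month)
{
  /* month is 0-based; assumes MON_1 ... MON_12 are consecutive.  */
  static const char *const month_pos[12] =
  { N_("first"), N_("second"), N_("third"),    N_("fourth"),
    N_("fifth"), N_("sixth"),  N_("seventh"),  N_("eighth"),
    N_("ninth"), N_("tenth"),  N_("eleventh"), N_("twelfth") };
  printf (_("%s is the %s month\n"), nl_langinfo (MON_1 + month),
          _(month_pos[month]));
}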

It should be obvious that this method is only reasonable for small ranges of numbers.

Using libintl.a in your own programs

Starting with version 0.9.4 the library libintl.a should be self-contained, i.e., you can use it in your own programs without providing additional functions. The `Makefile' will put the header and the library in directories selected using $(prefix).

One exception to the above is found on HP-UX systems. Here the C library does not contain the alloca function (and the HP compiler does not generate it inline). But it is not intended to rewrite the whole library just because of this dumb system. Instead, include the alloca function in every package in which you use libintl.a.

Being a `gettext` grok

To fully exploit the functionality of the GNU gettext library, it is surely helpful to read the source code. But for those who don't want to spend that much time reading the (sometimes complicated) code, here is a list of comments:

Changing the language at runtime

For interactive programs it might be useful to offer a selection of the used language at runtime. To understand how to do this, one needs to know how the used language is determined while executing the gettext function. The method presented here only works correctly with the GNU implementation of the gettext functions. It is not possible with underlying catgets functions or gettext functions from the system's C library. The exception is of course the GNU C Library, which uses the GNU gettext Library for message handling.

In the function dcgettext, the current setting of the highest-priority environment variable is determined and used at every call. Highest priority means here the following list, in decreasing priority:
1. LANGUAGE
2. LC_ALL
3. LC_xxx, according to the selected locale category
4. LANG

Afterwards the path is constructed using the value found, and the translation file is loaded if available.

What happens now when the value of, say, LANGUAGE changes? According to the process explained above, the new value of this variable is found as soon as the dcgettext function is called. But this also means that the (perhaps) different message catalog file is loaded. In other words: the used language is changed.

But there is one little catch. The code for gcc-2.7.0 and up provides some optimization. This optimization normally prevents the calling of the dcgettext function as long as no new catalog is loaded. But if dcgettext is not called, the program also cannot notice that the LANGUAGE variable has changed (see section Optimization of the *gettext functions). A solution for this is very easy: include the following code in the language-switching function.
```
  /* Change language.  */
  setenv ("LANGUAGE", "fr", 1);

  /* Make change known.  */
  {
    extern int _nl_msg_cat_cntr;
    ++_nl_msg_cat_cntr;
  }
```
The variable _nl_msg_cat_cntr is defined in `loadmsgcat.c'. The programmer will find himself in need of a construct like this only when developing programs which run longer and let the user select the language at runtime. Non-interactive programs (like all these little Unix tools) should never need this.
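Wrapped into a complete function, the language switch might look like the sketch below. The function name is hypothetical. The code assumes glibc or GNU libintl, where _nl_msg_cat_cntr is visible to programs. It checks the return value of setenv because that call can fail.

```c
#include <stdlib.h>

/* Defined in GNU libintl's loadmsgcat.c; assumed to be visible to the
   program (as with glibc or GNU libintl).  */
extern int _nl_msg_cat_cntr;

/* Switch the message language at runtime, e.g. switch_language ("fr").
   Returns 0 on success, -1 if LANGUAGE could not be set.  */
int
switch_language (const char *lang)
{
  if (setenv ("LANGUAGE", lang, 1) != 0)
    return -1;

  /* Invalidate translations cached by the gcc optimization so that the
     next dcgettext call sees the new LANGUAGE value.  */
  ++_nl_msg_cat_cntr;
  return 0;
}
```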

Temporary Notes for the Programmer's Chapter

Temporary - Two Possible Implementations

There are two competing methods for language-independent messages: the X/Open catgets method, and the Uniforum gettext method. The catgets method indexes messages by integers; the gettext method indexes them by their English translations. The catgets method has been around longer and is supported by more vendors. The gettext method is supported by Sun, and it has been heard that the COSE multi-vendor initiative is supporting it. Neither method is a POSIX standard; the POSIX.1 committee had a lot of disagreement in this area.

Neither one is in the POSIX standard. There was much disagreement in the POSIX.1 committee about using the gettext routines vs. catgets (XPG). In the end the committee couldn't agree on anything, so no messaging system was included as part of the standard. I believe the informative annex of the standard includes the XPG3 messaging interfaces, "...as an example of a messaging system that has been implemented..."

They were very careful not to say anywhere that you should use one set of interfaces over the other. For more on this topic please see the Programming for Internationalization FAQ.

Temporary - About `catgets`

There have been a few discussions of late on the use of catgets as a base. I think it important to present both sides of the argument and hence am opting to play devil's advocate for a little bit.

I'll not deny the fact that catgets could have been designed a lot better. It currently has quite a number of limitations and these have already been pointed out.

However, there is a great deal to be said for consistency and standardization. A common recurring problem when writing Unix software is the myriad of portability problems across Unix platforms. It seems as if every Unix vendor had a look at the operating system and found parts they could improve upon. Undoubtedly, many of these modifications are innovative and solve real problems. However, software developers have a hard time keeping up with all these changes across so many platforms.

And this has prompted the Unix vendors to begin standardizing their systems. Hence the impetus for Spec1170. Every major Unix vendor has committed to supporting this standard, and every Unix software developer awaits with glee the day they can write software to this standard and simply recompile (without having to use autoconf) across different platforms.

As I understand it, Spec1170 is roughly based upon version 4 of the X/Open Portability Guide (XPG4). Because catgets and friends are defined in XPG4, I'm led to believe that catgets is a part of Spec1170 and hence will become a standardized component of all Unix systems.

Temporary - Why a single implementation

Now it seems kind of wasteful to me to have two different systems installed for accessing message catalogs. If we do want to remedy the deficiencies of catgets, why don't we try to expand catgets (in a compatible manner) rather than implement an entirely new system? Otherwise, we'll end up with two message catalog access systems installed with an operating system: one set of routines for packages using GNU gettext for their internationalization, and another set of routines (catgets) for all other software. Bloated?

Suppose another catalog access system is implemented. Which one do we recommend? At least for Linux, we need to attract as many software developers as possible. Hence we need to make it as easy as possible for them to port their software, which means supporting catgets. We will be implementing the glocale code within our libc, but does this mean we also have to incorporate another message catalog access scheme within our libc? And what about people who are going to be using the glocale + non-catgets routines? When they port their software to other platforms, they're now going to have to include the front-end (glocale) code plus the back-end code (the non-catgets access routines) with their software, instead of just including the glocale code.

Message catalog support is, however, only the tip of the iceberg. What about the data for the other locale categories? They also have a number of deficiencies. Are we going to abandon them as well and develop another duplicate set of routines (should glocale expand beyond message catalog support)?

As with many parts of Unix that can be improved upon, we're stuck balancing compatibility with the past against useful improvements and innovations for the future.

Temporary - Notes

X/Open agreed very late on the standard form, so many implementations differ from the final form. Both of my systems (old Linux catgets and Ultrix-4) have a strange variation.

OK. After incorporating the last changes, I have to spend some time on writing the GNU/Linux libc gettext functions. So in the future, Solaris will not be the only system having gettext.
