X-Git-Url: https://git.saurik.com/wxWidgets.git/blobdiff_plain/fc2171bd4c660b8554dae2a1cbf34ff09f3032a6..066f3611df971be93b2ec46b82c2f05f3ff9a422:/docs/html/gettext/gettext_8.html diff --git a/docs/html/gettext/gettext_8.html b/docs/html/gettext/gettext_8.html deleted file mode 100644 index a028ce9f83..0000000000 --- a/docs/html/gettext/gettext_8.html +++ /dev/null @@ -1,896 +0,0 @@ - - - - -GNU gettext utilities - The Programmer's View - - - - - - -

Go to the first, previous, next, last section, table of contents. -


- - -

The Programmer's View

- -

-One aim of the current message catalog implementation provided by -GNU gettext was to use the systems message catalog handling, if the -installer wishes to do so. So we perhaps should first take a look at -the solutions we know about. The people in the POSIX committee does not -manage to agree on one of the semi-official standards which we'll -describe below. In fact they couldn't agree on anything, so nothing -decide only to include an example of an interface. The major Unix vendors -are split in the usage of the two most important specifications: X/Opens -catgets vs. Uniforums gettext interface. We'll describe them both and -later explain our solution of this dilemma. - -

- - - -

About catgets

- -

-The catgets implementation is defined in the X/Open Portability -Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the -process of creating this standard seemed to be too slow for some of -the Unix vendors so they created their implementations on preliminary -versions of the standard. Of course this leads again to problems while -writing platform independent programs: even the usage of catgets -does not guarantee a unique interface. - -

-

-Another, personal comment on this that only a bunch of committee members -could have made this interface. They never really tried to program -using this interface. It is a fast, memory-saving implementation, an -user can happily live with it. But programmers hate it (at least me and -some others do...) - -

-

-But we must not forget one point: after all the trouble with transfering -the rights on Unix(tm) they at last came to X/Open, the very same who -published this specifications. This leads me to making the prediction -that this interface will be in future Unix standards (e.g. Spec1170) and -therefore part of all Unix implementation (implementations, which are -allowed to wear this name). - -

- - - -

The Interface

- -

-The interface to the catgets implementation consists of three -functions which correspond to those used in file access: catopen -to open the catalog for using, catgets for accessing the message -tables, and catclose for closing after work is done. Prototypes -for the functions and the needed definitions are in the -<nl_types.h> header file. - -

-

-catopen is used like in this: - -

- -
-nl_catd catd = catopen ("catalog_name", 0);
-
- -

-The function takes as the argument the name of the catalog. This usual -refers to the name of the program or the package. The second parameter -is not further specified in the standard. I don't even know whether it -is implemented consistently among various systems. So the common advice -is to use 0 as the value. The return value is a handle to the -message catalog, equivalent to handles to file returned by open. - -

-

-This handle is of course used in the catgets function which can -be used like this: - -

- -
-char *translation = catgets (catd, set_no, msg_id, "original string");
-
- -

-The first parameter is this catalog descriptor. The second parameter -specifies the set of messages in this catalog, in which the message -described by msg_id is obtained. catgets therefore uses a -three-stage addressing: - -

- -
-catalog name => set number => message ID => translation
-
- -

-The fourth argument is not used to address the translation. It is given -as a default value in case when one of the addressing stages fail. One -important thing to remember is that although the return type of catgets -is char * the resulting string must not be changed. It -should better const char *, but the standard is published in -1988, one year before ANSI C. - -

-

-The last of these function functions is used and behaves as expected: - -

- -
-catclose (catd);
-
- -

-After this no catgets call using the descriptor is legal anymore. - -

- - -

Problems with the catgets Interface?!

- -

-Now that this descriptions seemed to be really easy where are the -problem we speak of. In fact the interface could be used in a -reasonable way, but constructing the message catalogs is a pain. The -reason for this lies in the third argument of catgets: the unique -message ID. This has to be a numeric value for all messages in a single -set. Perhaps you could imagine the problems keeping such list while -changing the source code. Add a new message here, remove one there. Of -course there have been developed a lot of tools helping to organize this -chaos but one as the other fails in one aspect or the other. We don't -want to say that the other approach has no problems but they are far -more easily to manage. - -

- - -

About gettext

- -

-The definition of the gettext interface comes from a Uniforum -proposal and it is followed by at least one major Unix vendor -(Sun) in its last developments. It is not specified in any official -standard, though. - -

-

-The main points about this solution is that it does not follow the -method of normal file handling (open-use-close) and that it does not -burden the programmer so many task, especially the unique key handling. -Of course here is also a unique key needed, but this key is the -message itself (how long or short it is). See section Comparing the Two Interfaces for a -more detailed comparison of the two methods. - -

-

-The following section contains a rather detailed description of the -interface. We make it that detailed because this is the interface -we chose for the GNU gettext Library. Programmers interested -in using this library will be interested in this description. - -

- - - -

The Interface

- -

-The minimal functionality an interface must have is a) to select a -domain the strings are coming from (a single domain for all programs is -not reasonable because its construction and maintenance is difficult, -perhaps impossible) and b) to access a string in a selected domain. - -

-

-This is principally the description of the gettext interface. It -has an global domain which unqualified usages reference. Of course this -domain is selectable by the user. - -

- -
-char *textdomain (const char *domain_name);
-
- -

-This provides the possibility to change or query the current status of -the current global domain of the LC_MESSAGE category. The -argument is a null-terminated string, whose characters must be legal in -the use in filenames. If the domain_name argument is NULL, -the function return the current value. If no value has been set -before, the name of the default domain is returned: messages. -Please note that although the return value of textdomain is of -type char * no changing is allowed. It is also important to know -that no checks of the availability are made. If the name is not -available you will see this by the fact that no translations are provided. - -

-

-To use a domain set by textdomain the function - -

- -
-char *gettext (const char *msgid);
-
- -

-is to be used. This is the simplest reasonable form one can imagine. -The translation of the string msgid is returned if it is available -in the current domain. If not available the argument itself is -returned. If the argument is NULL the result is undefined. - -

-

-One things which should come into mind is that no explicit dependency to -the used domain is given. The current value of the domain for the -LC_MESSAGES locale is used. If this changes between two -executions of the same gettext call in the program, both calls -reference a different message catalog. - -

-

-For the easiest case, which is normally used in internationalized -packages, once at the beginning of execution a call to textdomain -is issued, setting the domain to a unique name, normally the package -name. In the following code all strings which have to be translated are -filtered through the gettext function. That's all, the package speaks -your language. - -

- - -

Solving Ambiguities

- -

-While this single name domain work good for most applications there -might be the need to get translations from more than one domain. Of -course one could switch between different domains with calls to -textdomain, but this is really not convenient nor is it fast. A -possible situation could be one case discussing while this writing: all -error messages of functions in the set of common used functions should -go into a separate domain error. By this mean we would only need -to translate them once. - -

-

-For this reasons there are two more functions to retrieve strings: - -

- -
-char *dgettext (const char *domain_name, const char *msgid);
-char *dcgettext (const char *domain_name, const char *msgid,
-                 int category);
-
- -

-Both take an additional argument at the first place, which corresponds -to the argument of textdomain. The third argument of -dcgettext allows to use another locale but LC_MESSAGES. -But I really don't know where this can be useful. If the -domain_name is NULL or category has an value beside -the known ones, the result is undefined. It should also be noted that -this function is not part of the second known implementation of this -function family, the one found in Solaris. - -

-

-A second ambiguity can arise by the fact, that perhaps more than one -domain has the same name. This can be solved by specifying where the -needed message catalog files can be found. - -

- -
-char *bindtextdomain (const char *domain_name,
-                      const char *dir_name);
-
- -

-Calling this function binds the given domain to a file in the specified -directory (how this file is determined follows below). Especially a -file in the systems default place is not favored against the specified -file anymore (as it would be by solely using textdomain). A -NULL pointer for the dir_name parameter returns the binding -associated with domain_name. If domain_name itself is -NULL nothing happens and a NULL pointer is returned. Here -again as for all the other functions is true that none of the return -value must be changed! - -

-

-It is important to remember that relative path names for the -dir_name parameter can be trouble. Since the path is always -computed relative to the current directory different results will be -achieved when the program executes a chdir command. Relative -paths should always be avoided to avoid dependencies and -unreliabilities. - -

- - -

Locating Message Catalog Files

- -

-Because many different languages for many different packages have to be -stored we need some way to add these information to file message catalog -files. The way usually used in Unix environments is have this encoding -in the file name. This is also done here. The directory name given in -bindtextdomains second argument (or the default directory), -followed by the value and name of the locale and the domain name are -concatenated: - -

- -
-dir_name/locale/LC_category/domain_name.mo
-
- -

-The default value for dir_name is system specific. For the GNU -library, and for packages adhering to its conventions, it's: - -

-/usr/local/share/locale
-
- -

-locale is the value of the locale whose name is this -LC_category. For gettext and dgettext this -locale is always LC_MESSAGES. dcgettext specifies the -locale by the third argument.(2) (3) - -

- - -

Optimization of the *gettext functions

- -

-At this point of the discussion we should talk about an advantage of the -GNU gettext implementation. Some readers might have pointed out -that an internationalized program might have a poor performance if some -string has to be translated in an inner loop. While this is unavoidable -when the string varies from one run of the loop to the other it is -simply a waste of time when the string is always the same. Take the -following example: - -

- -
-{
-  while (...)
-    {
-      puts (gettext ("Hello world"));
-    }
-}
-
- -

-When the locale selection does not change between two runs the resulting -string is always the same. One way to use this is: - -

- -
-{
-  str = gettext ("Hello world");
-  while (...)
-    {
-      puts (str);
-    }
-}
-
- -

-But this solution is not usable in all situation (e.g. when the locale -selection changes) nor is it good readable. - -

-

-The GNU C compiler, version 2.7 and above, provide another solution for -this. To describe this we show here some lines of the -`intl/libgettext.h' file. For an explanation of the expression -command block see section `Statements and Declarations in Expressions' in The GNU CC Manual. - -

- -
-#  if defined __GNUC__ && __GNUC__ == 2 && __GNUC_MINOR__ >= 7
-extern int _nl_msg_cat_cntr;
-#   define	dcgettext(domainname, msgid, category)           \
-  (__extension__                                                 \
-   ({                                                            \
-     char *result;                                               \
-     if (__builtin_constant_p (msgid))                           \
-       {                                                         \
-         static char *__translation__;                           \
-         static int __catalog_counter__;                         \
-         if (! __translation__                                   \
-             || __catalog_counter__ != _nl_msg_cat_cntr)         \
-           {                                                     \
-             __translation__ =                                   \
-               dcgettext__ ((domainname), (msgid), (category));  \
-             __catalog_counter__ = _nl_msg_cat_cntr;             \
-           }                                                     \
-         result = __translation__;                               \
-       }                                                         \
-     else                                                        \
-       result = dcgettext__ ((domainname), (msgid), (category)); \
-     result;                                                     \
-    }))
-#  endif
-
- -

-The interesting thing here is the __builtin_constant_p predicate. -This is evaluated at compile time and so optimization can take place -immediately. Here two cases are distinguished: the argument to -gettext is not a constant value in which case simply the function -dcgettext__ is called, the real implementation of the -dcgettext function. - -

-

-If the string argument is constant we can reuse the once gained -translation when the locale selection has not changed. This is exactly -what is done here. The _nl_msg_cat_cntr variable is defined in -the `loadmsgcat.c' which is available in `libintl.a' and is -changed whenever a new message catalog is loaded. - -

- - -

Comparing the Two Interfaces

- -

-The following discussion is perhaps a little bit colored. As said -above we implemented GNU gettext following the Uniforum -proposal and this surely has its reasons. But it should show how we -came to this decision. - -

-

-First we take a look at the developing process. When we write an -application using NLS provided by gettext we proceed as always. -Only when we come to a string which might be seen by the users and thus -has to be translated we use gettext("...") instead of -"...". At the beginning of each source file (or in a central -header file) we define - -

- -
-#define gettext(String) (String)
-
- -

-Even this definition can be avoided when the system supports the -gettext function in its C library. When we compile this code the -result is the same as if no NLS code is used. When you take a look at -the GNU gettext code you will see that we use _("...") -instead of gettext("..."). This reduces the number of -additional characters per translatable string to 3 (in words: -three). - -

-

-When now a production version of the program is needed we simply replace -the definition - -

- -
-#define _(String) (String)
-
- -

-by - -

- -
-#include <libintl.h>
-#define _(String) gettext (String)
-
- -

-Additionally we run the program `xgettext' on all source code file -which contain translatable strings and that's it: we have a running -program which does not depend on translations to be available, but which -can use any that becomes available. - -

-

-The same procedure can be done for the gettext_noop invocations -(see section Special Cases of Translatable Strings). First you can define gettext_noop to a -no-op macro and later use the definition from `libintl.h'. Because -this name is not used in Suns implementation of `libintl.h', -you should consider the following code for your project: - -

- -
-#ifdef gettext_noop
-# define N_(String) gettext_noop (String)
-#else
-# define N_(String) (String)
-#endif
-
- -

-N_ is a short form similar to _. The `Makefile' in -the `po/' directory of GNU gettext knows by default both of the -mentioned short forms so you are invited to follow this proposal for -your own ease. - -

-

-Now to catgets. The main problem is the work for the -programmer. Every time he comes to a translatable string he has to -define a number (or a symbolic constant) which has also be defined in -the message catalog file. He also has to take care for duplicate -entries, duplicate message IDs etc. If he wants to have the same -quality in the message catalog as the GNU gettext program -provides he also has to put the descriptive comments for the strings and -the location in all source code files in the message catalog. This is -nearly a Mission: Impossible. - -

-

-But there are also some points people might call advantages speaking for -catgets. If you have a single word in a string and this string -is used in different contexts it is likely that in one or the other -language the word has different translations. Example: - -

- -
-printf ("%s: %d", gettext ("number"), number_of_errors)
-
-printf ("you should see %d %s", number_count,
-        number_count == 1 ? gettext ("number") : gettext ("numbers"))
-
- -

-Here we have to translate two times the string "number". Even -if you do not speak a language beside English it might be possible to -recognize that the two words have a different meaning. In German the -first appearance has to be translated to "Anzahl" and the second -to "Zahl". - -

-

-Now you can say that this example is really esoteric. And you are -right! This is exactly how we felt about this problem and decide that -it does not weight that much. The solution for the above problem could -be very easy: - -

- -
-printf ("%s %d", gettext ("number:"), number_of_errors)
-
-printf (number_count == 1 ? gettext ("you should see %d number")
-                          : gettext ("you should see %d numbers"),
-        number_count)
-
- -

-We believe that we can solve all conflicts with this method. If it is -difficult one can also consider changing one of the conflicting string a -little bit. But it is not impossible to overcome. - -

-

-Translator note: It is perhaps appropriate here to tell those English -speaking programmers that the plural form of a noun cannot be formed by -appending a single `s'. Most other languages use different methods. -Even the above form is not general enough to cope with all languages. -Rafal Maszkowski <rzm@mat.uni.torun.pl> reports: - -

- -
-

-In Polish we use e.g. plik (file) this way: - -

-1 plik
-2,3,4 pliki
-5-21 pliko'w
-22-24 pliki
-25-31 pliko'w
-
- -

-and so on (o' means 8859-2 oacute which should be rather okreska, -similar to aogonek). -

- -

-A workable approach might be to consider methods like the one used for -LC_TIME in the POSIX.2 standard. The value of the -alt_digits field can be up to 100 strings which represent the -numbers 1 to 100. Using this in a situation of an internationalized -program means that an array of translatable strings should be indexed by -the number which should represent. A small example: - -

- -
-void
-print_month_info (int month)
-{
-  const char *month_pos[12] =
-  { N_("first"), N_("second"), N_("third"),    N_("fourth"),
-    N_("fifth"), N_("sixth"),  N_("seventh"),  N_("eighth"),
-    N_("ninth"), N_("tenth"),  N_("eleventh"), N_("twelfth") };
-  printf (_("%s is the %s month\n"), nl_langinfo (MON_1 + month),
-          _(month_pos[month]));
-}
-
- -

-It should be obvious that this method is only reasonable for small -ranges of numbers. - -

- - - -

Using libintl.a in own programs

- -

-Starting with version 0.9.4 the library libintl.h should be -self-contained. I.e., you can use it in your own programs without -providing additional functions. The `Makefile' will put the header -and the library in directories selected using the $(prefix). - -

-

-One exception of the above is found on HP-UX systems. Here the C library -does not contain the alloca function (and the HP compiler does -not generate it inlined). But it is not intended to rewrite the whole -library just because of this dumb system. Instead include the -alloca function in all package you use the libintl.a in. - -

- - -

Being a gettext grok

- -

-To fully exploit the functionality of the GNU gettext library it -is surely helpful to read the source code. But for those who don't want -to spend that much time in reading the (sometimes complicated) code here -is a list comments: - -

- - - - - -

Temporary Notes for the Programmers Chapter

- - - -

Temporary - Two Possible Implementations

- -

-There are two competing methods for language independent messages: -the X/Open catgets method, and the Uniforum gettext -method. The catgets method indexes messages by integers; the -gettext method indexes them by their English translations. -The catgets method has been around longer and is supported -by more vendors. The gettext method is supported by Sun, -and it has been heard that the COSE multi-vendor initiative is -supporting it. Neither method is a POSIX standard; the POSIX.1 -committee had a lot of disagreement in this area. - -

-

-Neither one is in the POSIX standard. There was much disagreement -in the POSIX.1 committee about using the gettext routines -vs. catgets (XPG). In the end the committee couldn't -agree on anything, so no messaging system was included as part -of the standard. I believe the informative annex of the standard -includes the XPG3 messaging interfaces, "...as an example of -a messaging system that has been implemented..." - -

-

-They were very careful not to say anywhere that you should use one -set of interfaces over the other. For more on this topic please -see the Programming for Internationalization FAQ. - -

- - -

Temporary - About catgets

- -

-There have been a few discussions of late on the use of -catgets as a base. I think it important to present both -sides of the argument and hence am opting to play devil's advocate -for a little bit. - -

-

-I'll not deny the fact that catgets could have been designed -a lot better. It currently has quite a number of limitations and -these have already been pointed out. - -

-

-However there is a great deal to be said for consistency and -standardization. A common recurring problem when writing Unix -software is the myriad portability problems across Unix platforms. -It seems as if every Unix vendor had a look at the operating system -and found parts they could improve upon. Undoubtedly, these -modifications are probably innovative and solve real problems. -However, software developers have a hard time keeping up with all -these changes across so many platforms. - -

-

-And this has prompted the Unix vendors to begin to standardize their -systems. Hence the impetus for Spec1170. Every major Unix vendor -has committed to supporting this standard and every Unix software -developer waits with glee the day they can write software to this -standard and simply recompile (without having to use autoconf) -across different platforms. - -

-

-As I understand it, Spec1170 is roughly based upon version 4 of the -X/Open Portability Guidelines (XPG4). Because catgets and -friends are defined in XPG4, I'm led to believe that catgets -is a part of Spec1170 and hence will become a standardized component -of all Unix systems. - -

- - -

Temporary - Why a single implementation

- -

-Now it seems kind of wasteful to me to have two different systems -installed for accessing message catalogs. If we do want to remedy -catgets deficiencies why don't we try to expand catgets -(in a compatible manner) rather than implement an entirely new system. -Otherwise, we'll end up with two message catalog access systems installed -with an operating system - one set of routines for packages using GNU -gettext for their internationalization, and another set of routines -(catgets) for all other software. Bloated? - -

-

-Supposing another catalog access system is implemented. Which do -we recommend? At least for Linux, we need to attract as many -software developers as possible. Hence we need to make it as easy -for them to port their software as possible. Which means supporting -catgets. We will be implementing the glocale code -within our libc, but does this mean we also have to incorporate -another message catalog access scheme within our libc as well? -And what about people who are going to be using the glocale -+ non-catgets routines. When they port their software to -other platforms, they're now going to have to include the front-end -(glocale) code plus the back-end code (the non-catgets -access routines) with their software instead of just including the -glocale code with their software. - -

-

-Message catalog support is however only the tip of the iceberg. -What about the data for the other locale categories. They also have -a number of deficiencies. Are we going to abandon them as well and -develop another duplicate set of routines (should glocale -expand beyond message catalog support)? - -

-

-Like many parts of Unix that can be improved upon, we're stuck with balancing -compatibility with the past with useful improvements and innovations for -the future. - -

- - - -

Temporary - Notes

- -

-X/Open agreed very late on the standard form so that many -implementations differ from the final form. Both of my system (old -Linux catgets and Ultrix-4) have a strange variation. - -

-

-OK. After incorporating the last changes I have to spend some time on -making the GNU/Linux libc gettext functions. So in future -Solaris is not the only system having gettext. - -

-


-

Go to the first, previous, next, last section, table of contents. - -