X-Git-Url: https://git.saurik.com/wxWidgets.git/blobdiff_plain/9ffdee8074b1581d2c8ad6e6b16f536a42e42dd1..3d2791f12caee789ac732ac586588dad1fab1947:/docs/html/gettext/gettext_8.html diff --git a/docs/html/gettext/gettext_8.html b/docs/html/gettext/gettext_8.html new file mode 100644 index 0000000000..a028ce9f83 --- /dev/null +++ b/docs/html/gettext/gettext_8.html @@ -0,0 +1,896 @@ + +
+ + +Go to the first, previous, next, last section, table of contents. +
+ + +
+One aim of the current message catalog implementation provided by
+GNU gettext was to use the system's message catalog handling, if the
+installer wishes to do so. So we should perhaps first take a look at
+the solutions we know about. The people in the POSIX committee did not
+manage to agree on one of the semi-official standards which we'll
+describe below. In fact they couldn't agree on anything, so they
+decided only to include an example of an interface. The major Unix vendors
+are split in their use of the two most important specifications: X/Open's
+catgets vs. Uniforum's gettext interface. We'll describe them both and
+later explain our solution to this dilemma.
+
+
catgets
+The catgets implementation is defined in the X/Open Portability
+Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
+process of creating this standard seemed too slow for some of
+the Unix vendors, so they based their implementations on preliminary
+versions of the standard. Of course this again leads to problems when
+writing platform-independent programs: even using catgets
+does not guarantee a uniform interface.
+
+
+Another, personal comment on this: only a bunch of committee members
+could have made this interface. They never really tried to program
+using it. It is a fast, memory-saving implementation, and a
+user can happily live with it. But programmers hate it (at least I and
+some others do...)
+
+
++But we must not forget one point: after all the trouble with transferring
+the rights to Unix(tm) they at last came to X/Open, the very same who
+published these specifications. This leads me to predict
+that this interface will be in future Unix standards (e.g. Spec1170) and
+therefore part of every Unix implementation (that is, every implementation
+allowed to bear this name).
+
+
+ + + +
+The interface to the catgets
implementation consists of three
+functions which correspond to those used in file access: catopen
+to open the catalog for use, catgets
for accessing the message
+tables, and catclose
for closing after work is done. Prototypes
+for the functions and the needed definitions are in the
+<nl_types.h>
header file.
+
+
+catopen
is used like this:
+
+
+nl_catd catd = catopen ("catalog_name", 0); ++ +
+The function takes the name of the catalog as its first argument. This usually
+refers to the name of the program or the package. The second parameter
+is not further specified in the standard. I don't even know whether it
+is implemented consistently among various systems. So the common advice
+is to use 0 as the value. The return value is a handle to the
+message catalog, equivalent to the file handles returned by open.
+If the catalog cannot be opened, (nl_catd) -1 is returned.
+
+
+This handle is of course used in the catgets
function which can
+be used like this:
+
+
+char *translation = catgets (catd, set_no, msg_id, "original string"); ++ +
+The first parameter is the catalog descriptor. The second parameter
+specifies the set of messages in this catalog in which the message
+identified by msg_id is looked up. catgets therefore uses
+three-stage addressing:
+
+
+catalog name => set number => message ID => translation ++ +
+The fourth argument is not used to address the translation. It is given
+as a default value, returned when one of the addressing stages fails. One
+important thing to remember is that although the return type of catgets
+is char *, the resulting string must not be changed. It
+should rather be const char *, but the standard was published in
+1988, one year before ANSI C.
+
+
+The last of these functions is used and behaves as expected:
+
+
+ ++catclose (catd); ++ +
+After this no catgets
call using the descriptor is legal anymore.
+
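The three calls can be combined into one complete lookup, as in the sketch below. The helper name `cat_copy`, its return convention, and the catalog name used with it are hypothetical. The sketch relies on two behaviors specified by POSIX. First, `catopen` returns `(nl_catd) -1` when the catalog cannot be opened. Second, `catgets` returns its fourth argument itself when a lookup fails. The text is copied before `catclose`, because the returned string may live in the catalog's memory.

```c
#include <nl_types.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy message MSG_ID of set SET_NO from the
   catalog CATALOG into BUF (SIZE bytes).  Returns 1 if the text came
   from the catalog, 0 if DEFAULT_STR had to be used instead.  */
int
cat_copy (const char *catalog, int set_no, int msg_id,
          const char *default_str, char *buf, size_t size)
{
  nl_catd catd = catopen (catalog, 0);  /* 0: the common advice above */
  const char *msg;
  int found;

  if (catd == (nl_catd) -1)
    {
      /* The catalog could not be opened: fall back to the default.  */
      snprintf (buf, size, "%s", default_str);
      return 0;
    }

  /* catgets returns DEFAULT_STR itself if an addressing stage fails.  */
  msg = catgets (catd, set_no, msg_id, default_str);
  found = (msg != default_str);

  /* Copy first: MSG must not be used once the catalog is closed.  */
  snprintf (buf, size, "%s", msg);
  catclose (catd);
  return found;
}
```

For example, `cat_copy ("catalog_name", 1, 42, "original string", buf, sizeof buf)` leaves the default text in `buf` when no such catalog is installed.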
+
catgets
Interface?!
+Now that this description seems really easy, where is the
+problem we speak of? In fact the interface could be used in a
+reasonable way, but constructing the message catalogs is a pain. The
+reason for this lies in the third argument of catgets: the unique
+message ID. This has to be a numeric value for all messages in a single
+set. Perhaps you can imagine the problems of keeping such a list while
+changing the source code. Add a new message here, remove one there. Of
+course a lot of tools have been developed to help organize this
+chaos, but each of them fails in one aspect or another. We don't
+want to say that the other approach has no problems, but they are far
+easier to manage.
+
+
gettext
+The definition of the gettext
interface comes from a Uniforum
+proposal and it is followed by at least one major Unix vendor
+(Sun) in its latest developments. It is not specified in any official
+standard, though.
+
+
+The main points about this solution are that it does not follow the
+method of normal file handling (open-use-close) and that it does not
+burden the programmer with so many tasks, especially the unique key handling.
+Of course a unique key is needed here too, but this key is the
+message itself (however long or short it is). See section Comparing the Two Interfaces for a
+more detailed comparison of the two methods.
+
+
+
+The following section contains a rather detailed description of the
+interface. We describe it in such detail because this is the interface
+we chose for the GNU gettext Library. Programmers interested
+in using this library will want to read this description.
+
+
+The minimal functionality an interface must have is a) to select a
+domain the strings are coming from (a single domain for all programs is
+not reasonable because its construction and maintenance are difficult,
+perhaps impossible) and b) to access a string in a selected domain.
+
+
+
+This is essentially the description of the gettext interface. It
+has a global domain which unqualified calls reference. Of course this
+domain is selectable by the user.
+
+
+char *textdomain (const char *domain_name); ++ +
+This provides the possibility to change or query the current status of
+the global domain of the LC_MESSAGES category. The
+argument is a null-terminated string, whose characters must be legal
+in filenames. If the domain_name argument is NULL,
+the function returns the current value. If no value has been set
+before, the name of the default domain is returned: messages.
+Please note that although the return value of textdomain is of
+type char *, it must not be changed. It is also important to know
+that no check is made whether the domain is available. If it is not
+available, you will notice this by the fact that no translations are provided.
+
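Both uses of textdomain, selecting a domain and querying it, might look like the sketch below. The function names are hypothetical, and the domain name passed in would normally be the package name. The query relies on the behavior described above: a NULL argument returns the current domain, which is `messages` until another one has been set. The returned string is only compared, never modified.

```c
#include <libintl.h>
#include <string.h>

/* Returns nonzero while the global domain is still the default
   "messages", i.e. no domain has been selected yet.  */
int
domain_is_default (void)
{
  /* A NULL argument only queries; the result must not be modified.  */
  return strcmp (textdomain (NULL), "messages") == 0;
}

/* Select DOMAIN_NAME as the global domain and return the new current
   value.  No check is made that a catalog for it actually exists.  */
const char *
select_domain (const char *domain_name)
{
  textdomain (domain_name);
  return textdomain (NULL);
}
```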
+
+To use a domain set by textdomain
the function
+
+
+char *gettext (const char *msgid); ++ +
+is to be used. This is the simplest reasonable form one can imagine.
+The translation of the string msgid is returned if it is available
+in the current domain. If it is not available, the argument itself is
+returned. If the argument is NULL, the result is undefined.
+
+
+One thing to keep in mind is that no explicit dependency on
+the domain used is given. The current value of the domain for the
+LC_MESSAGES locale is used. If this changes between two
+executions of the same gettext call in the program, the two calls
+reference different message catalogs.
+
+
+For the easiest case, which is normally used in internationalized
+packages, textdomain is called once at the beginning of execution,
+setting the domain to a unique name, normally the package
+name. In the code that follows, all strings which have to be translated are
+filtered through the gettext function. That's all: the package speaks
+your language.
+
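Here is a minimal sketch of this easiest case, assuming a hypothetical domain name `hello_demo`. The text above does not mention the setlocale call. It is included because a C program starts in the "C" locale and only adopts the user's locale settings after `setlocale (LC_ALL, "")`. If no catalog for the domain exists, gettext returns its argument, so the program still shows the original English text.

```c
#include <libintl.h>
#include <locale.h>

#define PACKAGE "hello_demo"   /* hypothetical domain name */

/* Done once at the beginning of execution.  */
void
init_i18n (void)
{
  setlocale (LC_ALL, "");   /* adopt the user's locale settings */
  textdomain (PACKAGE);     /* unqualified gettext calls use this domain */
}

/* Every user-visible string is filtered through gettext.  */
const char *
greeting (void)
{
  return gettext ("Hello world");
}
```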
+
+While a single domain name works well for most applications, there
+might be a need to get translations from more than one domain. Of
+course one could switch between different domains with calls to
+textdomain, but this is neither convenient nor fast. One
+situation under discussion at the time of writing: all
+error messages of functions in the set of commonly used functions should
+go into a separate domain error. This way we would only need
+to translate them once.
+
+
+For this reason there are two more functions to retrieve strings:
+
+
+ ++char *dgettext (const char *domain_name, const char *msgid); +char *dcgettext (const char *domain_name, const char *msgid, + int category); ++ +
+Both take an additional first argument, which corresponds
+to the argument of textdomain. The third argument of
+dcgettext allows using a locale category other than LC_MESSAGES.
+But I really don't know where this can be useful. If the
+domain_name is NULL or category has a value other than
+the known ones, the result is undefined. It should also be noted that
+this function is not part of the second known implementation of this
+function family, the one found in Solaris.
+
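For the `error` domain scenario described above, a shared routine could fetch its messages with dgettext while the rest of the program keeps using the global domain. The routine names and the message are hypothetical. The dcgettext variant passes `LC_MESSAGES` explicitly, which makes it equivalent to the dgettext call.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical shared routine whose messages live in the separate
   domain "error", so they only have to be translated once.  */
const char *
file_not_found_message (void)
{
  return dgettext ("error", "file not found");
}

/* The same lookup with an explicit category; with LC_MESSAGES this is
   what dgettext does.  */
const char *
file_not_found_message_c (void)
{
  return dcgettext ("error", "file not found", LC_MESSAGES);
}
```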
+
+A second ambiguity can arise from the fact that more than one
+domain may have the same name. This can be solved by specifying where the
+needed message catalog files can be found.
+
+
+ ++char *bindtextdomain (const char *domain_name, + const char *dir_name); ++ +
+Calling this function binds the given domain to a file in the specified
+directory (how this file is determined follows below). In particular, a
+file in the system's default place is no longer preferred over the specified
+file (as it would be when using textdomain alone). A
+NULL pointer for the dir_name parameter returns the binding
+associated with domain_name. If domain_name itself is
+NULL, nothing happens and a NULL pointer is returned. As
+with all the other functions, none of the returned
+values may be changed!
+
+
+It is important to remember that relative path names for the
+dir_name parameter can cause trouble. Since the path is always
+computed relative to the current directory, different results will be
+obtained after the program executes a chdir call. Relative
+paths should therefore always be avoided.
+
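The sketch below binds a domain to an absolute directory and then reads the binding back by passing NULL for dir_name. The domain name `hello_demo`, the directory `/opt/hello_demo/share/locale`, and both function names are placeholders.

```c
#include <libintl.h>
#include <string.h>

#define DEMO_DOMAIN    "hello_demo"                    /* hypothetical */
#define DEMO_LOCALEDIR "/opt/hello_demo/share/locale"  /* hypothetical */

/* Bind DEMO_DOMAIN to an absolute directory, so a later chdir cannot
   change which catalog files are found.  */
void
bind_demo_domain (void)
{
  bindtextdomain (DEMO_DOMAIN, DEMO_LOCALEDIR);
}

/* Query the current binding (NULL dir_name) and compare it with DIR.
   The returned string belongs to the library and must not be changed.  */
int
is_bound_to (const char *domain_name, const char *dir)
{
  const char *current = bindtextdomain (domain_name, NULL);
  return current != NULL && strcmp (current, dir) == 0;
}
```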
+
+Because many different languages for many different packages have to be
+stored, we need some way to add this information to the message catalog
+files. The usual way in Unix environments is to encode it
+in the file name, and this is done here as well. The directory name given as
+bindtextdomain's second argument (or the default directory) is
+concatenated with the value of the locale, the name of the locale
+category, and the domain name:
+
+
+dir_name/locale/LC_category/domain_name.mo ++ +
+The default value for dir_name is system specific. For the GNU +library, and for packages adhering to its conventions, it's: + +
+/usr/local/share/locale ++ +
+locale is the value of the locale whose name is this
+LC_category. For gettext and dgettext this
+category is always LC_MESSAGES. dcgettext specifies the
+category with its third argument.(2) (3)
+
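The concatenation rule can be illustrated with a small helper. The name `catalog_path` is hypothetical. The library builds these paths internally, so this only shows the resulting layout.

```c
#include <stddef.h>
#include <stdio.h>

/* Build DIR_NAME/LOCALE/CATEGORY/DOMAIN_NAME.mo into BUF.  Returns the
   length snprintf reports; a value >= SIZE means BUF was too small.  */
int
catalog_path (char *buf, size_t size, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  return snprintf (buf, size, "%s/%s/%s/%s.mo",
                   dir_name, locale, category, domain_name);
}
```

Called with `"/usr/local/share/locale"`, `"de"`, `"LC_MESSAGES"` and `"hello"`, it produces `/usr/local/share/locale/de/LC_MESSAGES/hello.mo`.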
+
+At this point of the discussion we should talk about an advantage of the
+GNU gettext implementation. Some readers might have pointed out
+that an internationalized program might perform poorly if some
+string has to be translated in an inner loop. While this is unavoidable
+when the string varies from one iteration of the loop to the next, it is
+simply a waste of time when the string is always the same. Take the
+following example:
+
+
+{ + while (...) + { + puts (gettext ("Hello world")); + } +} ++ +
+When the locale selection does not change between two runs the resulting +string is always the same. One way to use this is: + +
+
++{
+  const char *str = gettext ("Hello world");
+  while (...)
+    {
+      puts (str);
+    }
+}
++
+
+But this solution is not usable in all situations (e.g. when the locale
+selection changes), nor is it very readable.
+
+
++The GNU C compiler, version 2.7 and above, provides another solution for
+this. To describe it, we show here some lines of the
+`intl/libgettext.h' file. For an explanation of the statement
+expression construct see section `Statements and Declarations in Expressions' in The GNU CC Manual.
+
+
+ ++# if defined __GNUC__ && __GNUC__ == 2 && __GNUC_MINOR__ >= 7 +extern int _nl_msg_cat_cntr; +# define dcgettext(domainname, msgid, category) \ + (__extension__ \ + ({ \ + char *result; \ + if (__builtin_constant_p (msgid)) \ + { \ + static char *__translation__; \ + static int __catalog_counter__; \ + if (! __translation__ \ + || __catalog_counter__ != _nl_msg_cat_cntr) \ + { \ + __translation__ = \ + dcgettext__ ((domainname), (msgid), (category)); \ + __catalog_counter__ = _nl_msg_cat_cntr; \ + } \ + result = __translation__; \ + } \ + else \ + result = dcgettext__ ((domainname), (msgid), (category)); \ + result; \ + })) +# endif ++ +
+The interesting thing here is the __builtin_constant_p predicate.
+This is evaluated at compile time, so the optimization can take place
+immediately. Two cases are distinguished. If the argument to
+gettext is not a constant, the function
+dcgettext__, the real implementation of the
+dcgettext function, is simply called.
+
+
+If the string argument is constant, we can reuse the translation obtained
+earlier as long as the locale selection has not changed. This is exactly
+what is done here. The _nl_msg_cat_cntr variable is defined in
+`loadmsgcat.c', which is part of `libintl.a', and is
+changed whenever a new message catalog is loaded.
+
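The idea behind the macro is to reuse a translation until the catalog counter changes. It can be sketched in portable C without GCC extensions. All names here are hypothetical:

- `catalog_generation` plays the role of `_nl_msg_cat_cntr`.
- `slow_lookup` stands in for `dcgettext__`. It only counts its calls and returns its argument.
- Each call site would own one `struct cached_msg`, like the static variables in the macro.

```c
#include <stddef.h>

/* Stand-in for _nl_msg_cat_cntr: bumped whenever a catalog is loaded. */
int catalog_generation = 0;

/* Stand-in for dcgettext__: counts lookups, returns MSGID unchanged. */
int lookup_calls = 0;

static const char *
slow_lookup (const char *msgid)
{
  ++lookup_calls;
  return msgid;
}

/* One of these per call site, like the static variables in the macro. */
struct cached_msg
{
  const char *translation;   /* NULL until the first lookup */
  int generation;            /* catalog_generation at that lookup */
};

const char *
cached_gettext (struct cached_msg *cache, const char *msgid)
{
  if (cache->translation == NULL
      || cache->generation != catalog_generation)
    {
      cache->translation = slow_lookup (msgid);
      cache->generation = catalog_generation;
    }
  return cache->translation;
}
```

Unlike the macro, this version cannot check at compile time whether the argument is constant. Each cache must therefore only ever be used with one fixed msgid.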
+
+The following discussion is perhaps a little biased. As said
+above, we implemented GNU gettext following the Uniforum
+proposal, and we surely had our reasons. But this discussion should show how we
+came to this decision.
+
+
+First we take a look at the development process. When we write an
+application using NLS provided by gettext, we proceed as always.
+Only when we come to a string which might be seen by the users, and thus
+has to be translated, do we use gettext("...") instead of
+"...". At the beginning of each source file (or in a central
+header file) we define
+
+
+#define gettext(String) (String) ++ +
+Even this definition can be avoided when the system supports the
+gettext function in its C library. When we compile this code, the
+result is the same as if no NLS code were used. When you take a look at
+the GNU gettext code you will see that we use _("...")
+instead of gettext("..."). This reduces the number of
+additional characters per translatable string to 3 (in words:
+three).
+
+
+When a production version of the program is needed, we simply replace
+the definition
+
+
+ ++#define _(String) (String) ++ +
+by + +
+ ++#include <libintl.h> +#define _(String) gettext (String) ++ +
+Additionally we run the program `xgettext' on all source code files
+which contain translatable strings, and that's it: we have a running
+program which does not depend on translations being available, but which
+can use any that become available.
+
+
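The two phases can also live in one central header, selected at build time. This sketch assumes a configuration macro named `ENABLE_NLS`. The name is an assumption, and any macro your build system defines would work. Without it, `_` expands to its argument exactly as in the development setup. The function `usage_text` is a hypothetical call site.

```c
/* Hypothetical central header: choose the definition of _ at build time. */
#ifdef ENABLE_NLS
# include <libintl.h>
# define _(String) gettext (String)
#else
# define _(String) (String)
#endif

/* Call sites look the same in both configurations.  */
const char *
usage_text (void)
{
  return _("Usage: hello");
}
```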
+
+The same procedure can be applied to the gettext_noop invocations
+(see section Special Cases of Translatable Strings). First you can define gettext_noop as a
+no-op macro and later use the definition from `libintl.h'. Because
+this name is not used in Sun's implementation of `libintl.h',
+you should consider the following code for your project:
+
+
+#ifdef gettext_noop +# define N_(String) gettext_noop (String) +#else +# define N_(String) (String) +#endif ++ +
+N_ is a short form similar to _. The `Makefile' in
+the `po/' directory of GNU gettext knows both of these
+short forms by default, so you are invited to follow this proposal for
+your own convenience.
+
+
+Now to catgets. The main problem is the work for the
+programmer. Every time he comes to a translatable string he has to
+define a number (or a symbolic constant) which also has to be defined in
+the message catalog file. He also has to take care of duplicate
+entries, duplicate message IDs, etc. If he wants to have the same
+quality in the message catalog as the GNU gettext program
+provides, he also has to put the descriptive comments for the strings and
+their locations in all source code files into the message catalog. This is
+nearly a Mission: Impossible.
+
+
+But there are also some points in favor of catgets that people might
+call advantages. If you have a single word in a string and this string
+is used in different contexts, it is likely that in one language or another
+the word has different translations. Example:
+
+
+printf ("%s: %d", gettext ("number"), number_of_errors);
+
+printf ("you should see %d %s", number_count,
+        number_count == 1 ? gettext ("number") : gettext ("numbers"));
++
+
+Here the string "number" has to be translated twice. Even
+if you do not speak a language other than English, you might
+recognize that the two words have different meanings. In German the
+first occurrence has to be translated as "Anzahl" and the second
+as "Zahl".
+
+
+Now you may say that this example is really esoteric. And you are
+right! This is exactly how we felt about this problem, and we decided that
+it does not weigh that much. The solution for the above problem could
+be very easy:
+
+
+
++printf ("%s %d", gettext ("number:"), number_of_errors);
+
+printf (number_count == 1 ? gettext ("you should see %d number")
+                          : gettext ("you should see %d numbers"),
+        number_count);
++
+
+We believe that we can solve all conflicts with this method. If it is
+difficult, one can also consider changing one of the conflicting strings a
+little bit. In any case, the problem is not impossible to overcome.
+
+
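The second rewritten call can be packaged as in this sketch, which formats into a buffer instead of printing so the result can be inspected. The function name is hypothetical. Each translatable string holds the whole sentence, including %d, so a translator can place the number wherever the target language needs it.

```c
#include <libintl.h>
#include <stddef.h>
#include <stdio.h>

/* Choose one of two complete format strings, translate it, then format.
   Returns the length snprintf reports.  */
int
format_number_count (char *buf, size_t size, int number_count)
{
  const char *fmt = number_count == 1
                    ? gettext ("you should see %d number")
                    : gettext ("you should see %d numbers");
  return snprintf (buf, size, fmt, number_count);
}
```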
++Translator note: It is perhaps appropriate here to tell those English-speaking
+programmers that the plural form of a noun cannot in general be formed by
+appending a single `s'. Most other languages use different methods.
+Even the above form is not general enough to cope with all languages.
+Rafal Maszkowski <rzm@mat.uni.torun.pl> reports:
+
+
+ +++ ++In Polish we use e.g. plik (file) this way: + +
+1 plik +2,3,4 pliki +5-21 pliko'w +22-24 pliki +25-31 pliko'w ++ ++and so on (o' means 8859-2 oacute which should be rather okreska, +similar to aogonek). +
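The table can be condensed into an index computation. This sketch uses the rule commonly stated for Polish, which reproduces every row above. The function name, and the idea of choosing among three translated forms by this index, are illustrations rather than part of the interfaces described in this chapter.

```c
/* Select one of the three Polish forms for N items, following the
   table above:
     0 -> "plik"    (1)
     1 -> "pliki"   (2-4, 22-24, 32-34, ...; but not 12-14)
     2 -> "plików"  (0, 5-21, 25-31, ...)  */
int
polish_plural_index (unsigned long n)
{
  if (n == 1)
    return 0;
  if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
    return 1;
  return 2;
}
```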
+A workable approach might be to consider methods like the one used for
+LC_TIME in the POSIX.2 standard. The value of the
+alt_digits field can consist of up to 100 strings which represent the
+numbers 0 to 99. In an internationalized
+program this means that an array of translatable strings is indexed by
+the number it should represent. A small example:
+
+
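+#include <langinfo.h>
+#include <libintl.h>
+#include <stdio.h>
+
+#define _(String) gettext (String)
+#define N_(String) (String)
+
+/* MONTH counts from 0 (January) to 11 (December).  */
+void
+print_month_info (int month)
+{
+  static const char *const month_pos[12] =
+  { N_("first"), N_("second"), N_("third"), N_("fourth"),
+    N_("fifth"), N_("sixth"), N_("seventh"), N_("eighth"),
+    N_("ninth"), N_("tenth"), N_("eleventh"), N_("twelfth") };
+  printf (_("%s is the %s month\n"), nl_langinfo (MON_1 + month),
+          _(month_pos[month]));
+}
++
+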
+It should be obvious that this method is only reasonable for small +ranges of numbers. + +
+ + + +
+Starting with version 0.9.4 the library libintl.a should be
+self-contained, i.e., you can use it in your own programs without
+providing additional functions. The `Makefile' will put the header
+and the library in directories selected using $(prefix).
+
+
+One exception to the above is found on HP-UX systems. Here the C library
+does not contain the alloca function (and the HP compiler does
+not generate it inline). But it is not intended to rewrite the whole
+library just because of this dumb system. Instead, include the
+alloca function in every package in which you use libintl.a.
+
+
gettext
grok
+To fully exploit the functionality of the GNU gettext library it
+is surely helpful to read the source code. But for those who don't want
+to spend that much time reading the (sometimes complicated) code, here
+is a list of comments:
+
+
gettext
+function. The method which is presented here only works correctly
+with the GNU implementation of the gettext
functions. It is not
+possible with underlying catgets
functions or gettext
+functions from the system's C library. The exception is of course the
+GNU C Library which uses the GNU gettext
Library for message handling.
+
+In the function dcgettext, the current setting of the
+highest-priority environment variable is determined and used at every call.
+Highest priority here refers to the following list, in decreasing
+order of priority:
+
+
+LANGUAGE
+
+LC_ALL
+
+LC_xxx
, according to selected locale
+
+LANG
+
+LANGUAGE
changes. According
+to the process explained above, the new value of this variable is found
+as soon as the dcgettext function is called. But this also means that
+a (possibly) different message catalog file is loaded. In other
+words: the language used is changed.
+
+But there is one little catch. The code for gcc-2.7.0 and up provides
+an optimization which normally prevents the
+dcgettext function from being called as long as no new catalog is loaded. But
+if dcgettext is not called, the program also cannot notice that the
+LANGUAGE variable has changed (see section Optimization of the *gettext functions). The
+solution for this is very easy: include the following code in the
+language switching function.
+
+
++ /* Change language. */ + setenv ("LANGUAGE", "fr", 1); + + /* Make change known. */ + { + extern int _nl_msg_cat_cntr; + ++_nl_msg_cat_cntr; + } ++ +The variable
_nl_msg_cat_cntr
is defined in `loadmsgcat.c'.
+The programmer will find himself in need for a construct like this only
+when developing programs which do run longer and provide the user to
+select the language at runtime. Non-interactive programs (like all
+these little Unix tools) should never need this.
+
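The priority list given earlier can be sketched as a function that returns the value the lookup would start from. This is a simplification with a hypothetical function name. An empty variable is treated as unset, and some details of the real implementation are left out. For example, LANGUAGE may hold a colon-separated list of languages, and recent GNU implementations ignore it when the locale is "C".

```c
#include <stdlib.h>

/* Return the first non-empty value among LANGUAGE, LC_ALL,
   CATEGORY_NAME (e.g. "LC_MESSAGES") and LANG, in that order of
   decreasing priority; "C" if none of them is set.  */
const char *
effective_language (const char *category_name)
{
  const char *names[4] = { "LANGUAGE", "LC_ALL", category_name, "LANG" };
  int i;

  for (i = 0; i < 4; ++i)
    {
      const char *value = getenv (names[i]);
      if (value != NULL && value[0] != '\0')  /* empty counts as unset */
        return value;
    }
  return "C";
}
```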
+
+There are two competing methods for language-independent messages:
+the X/Open catgets
method, and the Uniforum gettext
+method. The catgets
method indexes messages by integers; the
+gettext
method indexes them by their English translations.
+The catgets
method has been around longer and is supported
+by more vendors. The gettext
method is supported by Sun,
+and reportedly the COSE multi-vendor initiative is
+supporting it. Neither method is a POSIX standard; the POSIX.1
+committee had a lot of disagreement in this area.
+
+
+Neither one is in the POSIX standard. There was much disagreement
+in the POSIX.1 committee about using the gettext
routines
+vs. catgets
(XPG). In the end the committee couldn't
+agree on anything, so no messaging system was included as part
+of the standard. I believe the informative annex of the standard
+includes the XPG3 messaging interfaces, "...as an example of
+a messaging system that has been implemented..."
+
+
+They were very careful not to say anywhere that you should use one +set of interfaces over the other. For more on this topic please +see the Programming for Internationalization FAQ. + +
+ + +catgets
+There have been a few discussions of late on the use of
+catgets
as a base. I think it is important to present both
+sides of the argument and hence am opting to play devil's advocate
+for a little bit.
+
+
+I'll not deny the fact that catgets
could have been designed
+a lot better. It currently has quite a number of limitations and
+these have already been pointed out.
+
+
+However, there is a great deal to be said for consistency and
+standardization. A common recurring problem when writing Unix
+software is the myriad portability problems across Unix platforms.
+It seems as if every Unix vendor had a look at the operating system
+and found parts they could improve upon. Many of these
+modifications are undoubtedly innovative and solve real problems.
+However, software developers have a hard time keeping up with all
+these changes across so many platforms.
+
+
++And this has prompted the Unix vendors to begin to standardize their
+systems. Hence the impetus for Spec1170. Every major Unix vendor
+has committed to supporting this standard, and every Unix software
+developer awaits with glee the day they can write software to this
+standard and simply recompile (without having to use autoconf)
+across different platforms.
+
+
+
+As I understand it, Spec1170 is roughly based upon version 4 of the
+X/Open Portability Guide (XPG4). Because catgets
and
+friends are defined in XPG4, I'm led to believe that catgets
+is a part of Spec1170 and hence will become a standardized component
+of all Unix systems.
+
+
+Now it seems kind of wasteful to me to have two different systems
+installed for accessing message catalogs. If we do want to remedy
+catgets' deficiencies, why don't we try to extend catgets
+(in a compatible manner) rather than implement an entirely new system?
+Otherwise, we'll end up with two message catalog access systems installed
+with an operating system - one set of routines for packages using GNU
+gettext
for their internationalization, and another set of routines
+(catgets) for all other software. Bloated?
+
+
+Suppose another catalog access system is implemented. Which do
+we recommend? At least for Linux, we need to attract as many
+software developers as possible. Hence we need to make it as easy
+for them to port their software as possible. Which means supporting
+catgets
. We will be implementing the glocale
code
+within our libc
, but does this mean we also have to incorporate
+another message catalog access scheme within our libc
as well?
+And what about people who are going to be using the glocale
++ non-catgets
routines? When they port their software to
+other platforms, they're now going to have to include the front-end
+(glocale
) code plus the back-end code (the non-catgets
+access routines) with their software instead of just including the
+glocale
code with their software.
+
+
+Message catalog support is however only the tip of the iceberg.
+What about the data for the other locale categories? They also have
+a number of deficiencies. Are we going to abandon them as well and
+develop another duplicate set of routines (should glocale
+expand beyond message catalog support)?
+
+
+Like many parts of Unix that can be improved upon, we're stuck with balancing +compatibility with the past with useful improvements and innovations for +the future. + +
+ + + ++X/Open agreed on the standard form very late, so many
+implementations differ from the final form. Both of my systems (old
+Linux catgets and Ultrix-4) have a strange variation.
+
+
+
+OK. After incorporating the last changes I have to spend some time on
+making the GNU/Linux libc gettext functions. So in the future
+Solaris will not be the only system having gettext.
+
+
+