International Character Support
MH 6.8 and above have support for "international" characters -- that is,
non-English characters.
This is distinct from the MIME support.
Support is enabled by the [LOCALE] configuration option
(see the section "The -help Switches").
For C programmers, here's a typical change that [LOCALE] makes.
This is from the file sbr/gans.c:
#ifdef LOCALE
    i = (isalpha(i) && isupper(i)) ? tolower(i) : i;
#else
    if (i >= 'A' && i <= 'Z')
        i += 'a' - 'A';
#endif
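To see the difference between the two branches in isolation, here is a minimal sketch that wraps each one in its own function. The names ascii_lower and locale_lower are made up for this illustration; they are not part of MH.

```c
#include <ctype.h>

/* ASCII-only lowercasing, like the #else branch above: anything
   outside 'A'..'Z' passes through unchanged. */
int ascii_lower(int c)
{
    if (c >= 'A' && c <= 'Z')
        c += 'a' - 'A';
    return c;
}

/* Locale-aware lowercasing, like the #ifdef LOCALE branch above.
   The <ctype.h> functions expect a value that fits in an unsigned
   char (or EOF), so callers should pass bytes in that form. */
int locale_lower(int c)
{
    return (isalpha(c) && isupper(c)) ? tolower(c) : c;
}
```

In the default "C" locale both functions agree. They should differ only after the program has called setlocale(3) with a locale whose LC_CTYPE data classifies some non-ASCII character as an uppercase letter; whether that happens for a given byte depends on the locale databases installed on your system.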
Once you get your POSIX-compliant system set up correctly,
MH and programs it calls will behave more naturally.
For example, when you use the vi editor command for "next word,"
the cursor won't stop in the middle of a word at a non-ASCII character.
Programs like grep(1) and sort(1)
should understand how to handle the characters in your language.
As with all POSIX internationalization, though, the character
support in MH is system-dependent.
Don't expect everything to work perfectly.
And the setup varies from system to system; check your documentation
or ask a local expert.
Manual pages to try on HP-UX are environ(5), setlocale(3), and hpnls(5).
On SunOS, read locale(5) and setlocale(3).
For the most complete setup, you should know about the LANG
environment variable.
The full syntax for the value of LANG is:
language[_territory][.codeset][@modifier]
The brackets [] mark optional parts; don't include the brackets
when you set the variable.
An example setting of LANG is:
french_canadian.iso88591@nofold
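As a rough illustration of that syntax, the following sketch splits a LANG value into its four parts. The struct lang_parts type and the copy_part and parse_lang functions are hypothetical names invented here, and the 32-byte field size is an arbitrary assumption. The sketch follows the full syntax literally, so it will split a value such as iso_8859_1 at its first underscore.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the parts of a LANG value; a part that is
   absent is left as an empty string. */
struct lang_parts {
    char language[32];
    char territory[32];
    char codeset[32];
    char modifier[32];
};

/* Copy len bytes of src into dst, truncating to fit, and terminate. */
void copy_part(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split language[_territory][.codeset][@modifier] into its parts. */
void parse_lang(const char *value, struct lang_parts *out)
{
    const char *end = value + strlen(value);
    /* The modifier, if any, starts at the first '@'. */
    const char *at = strchr(value, '@');
    const char *mod_at = at ? at : end;
    /* The codeset, if any, starts at the first '.' before the modifier. */
    const char *dot = memchr(value, '.', (size_t)(mod_at - value));
    const char *cs_at = dot ? dot : mod_at;
    /* The territory, if any, starts at the first '_' before the codeset. */
    const char *us = memchr(value, '_', (size_t)(cs_at - value));
    const char *terr_at = us ? us : cs_at;

    copy_part(out->language, sizeof out->language,
              value, (size_t)(terr_at - value));
    copy_part(out->territory, sizeof out->territory,
              us ? us + 1 : cs_at, us ? (size_t)(cs_at - us - 1) : 0);
    copy_part(out->codeset, sizeof out->codeset,
              dot ? dot + 1 : mod_at, dot ? (size_t)(mod_at - dot - 1) : 0);
    copy_part(out->modifier, sizeof out->modifier,
              at ? at + 1 : end, at ? (size_t)(end - at - 1) : 0);
}
```

Called on the example value, parse_lang fills in the language french, the territory canadian, the codeset iso88591, and the modifier nofold.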
Language is the only parameter that is (almost) consistent across
platforms; it is used to find the databases for all the locale
categories.
HP-UX uses the full syntax (the "modifier" might even be an HP-UX
addition).
SunOS puts in the language field what other platforms put in the codeset field.
SCO always includes the territory as well.
The locale categories are set by the environment variables
LC_COLLATE (string collation and sorting),
LC_CTYPE (character classification and conversion, such as "is this
character `printable'?"),
LC_MONETARY (monetary formatting),
LC_NUMERIC (for input and output of numbers),
LC_TIME (time conversion),
and LC_MESSAGES (messages to the user -- this isn't on all
platforms).
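From inside a C program, each of these categories can be queried with setlocale(3) by passing NULL as the locale name. Here is a small sketch of that; the category table and the locale_for function are names made up for this example. LC_MESSAGES is wrapped in #ifdef because, as noted above, not every platform defines it.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table pairing each category constant with its name. */
struct category {
    int id;
    const char *name;
};

const struct category categories[] = {
    { LC_COLLATE,  "LC_COLLATE"  },
    { LC_CTYPE,    "LC_CTYPE"    },
    { LC_MONETARY, "LC_MONETARY" },
    { LC_NUMERIC,  "LC_NUMERIC"  },
    { LC_TIME,     "LC_TIME"     },
#ifdef LC_MESSAGES
    { LC_MESSAGES, "LC_MESSAGES" },   /* not available everywhere */
#endif
};

/* Return the locale currently in effect for the named category,
   or NULL if the name is not in the table.  Passing NULL as the
   second argument makes setlocale() query without changing anything. */
const char *locale_for(const char *name)
{
    size_t n = sizeof categories / sizeof categories[0];
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(categories[i].name, name) == 0)
            return setlocale(categories[i].id, NULL);
    }
    return NULL;
}
```

A program that has not called setlocale(3) to change anything starts out in the "C" locale for every category, so locale_for is most informative after a call such as setlocale(LC_ALL, "").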
For much of your email-related work, you may choose to set only
LC_CTYPE.
That changes how most tools handle characters
but leaves the rest of their behavior alone.
Another advantage of not setting all categories is that incomplete
implementations won't give warning messages when they don't support
a particular setting.
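The same idea can be applied inside a C program by asking setlocale(3) for a single category instead of LC_ALL. This is only a sketch of the technique, not a description of what MH itself does; set_ctype_only and numeric_is_default are hypothetical names.

```c
#include <locale.h>
#include <string.h>

/* Apply the named locale to LC_CTYPE only.  An empty string means
   "take the setting from the environment".  Returns 1 on success and
   0 if the system does not accept the name. */
int set_ctype_only(const char *name)
{
    return setlocale(LC_CTYPE, name) != NULL;
}

/* Report whether LC_NUMERIC is still in the default "C" locale,
   that is, whether number formatting has been left alone. */
int numeric_is_default(void)
{
    const char *cur = setlocale(LC_NUMERIC, NULL);
    return cur != NULL && strcmp(cur, "C") == 0;
}
```

After set_ctype_only(""), character classification follows the environment, while numeric_is_default() should still report that number formatting is unchanged.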
If you're trying to choose a good value for those environment variables
and no one else in your organization has already found good settings,
look in the databases.
The databases are usually located under /usr/lib/locale,
/usr/share/lib/locale or (in the case of SunOS) /etc/locale.
As an example, the table below shows
the settings that Kimmo Suominen (from Finland,
working in New York) uses on different platforms:
Table: Sample LANG settings
Platform  Setting            Comment
SVR4      finnish            --
HP-UX     american.iso88591  okay, finnish.iso88591
SunOS     iso_8859_1         yes, note the underscores
SCO       english_us.88591   has the territory