Internet-Draft H. Alvestrand
draft-alvestrand-lang-tags-v2-01.txt EDB Maxware
Target Category: Standards Track
March 2000
Obsoletes: RFC 1766 Expires: September 2000
Tags for the Identification of Languages
Status of this Memo
The file name of this memo is draft-alvestrand-lang-tags-v2-01.txt.
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as "work
in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Comments on this draft should be sent to the mailing list
<ietf-languages@iana.org>.
Abstract
This document describes a language tag for use in cases where it is
desired to indicate the language used in an information object.
It also defines a "Content-Language:" header, for use in cases where
one wants to indicate the language of something that has RFC-822-like
headers, such as MIME body parts or Web documents, and a new parameter
to the Multipart/Alternative type, to aid in the use of the
Content-Language: header.
Tags for the names of languages Harald Alvestrand
draft-alvestrand-lang-tags-v2-01.txt Expires September 2000
1. Introduction
There are a number of languages presently or previously spoken by human
beings in this world.
A great number of the people who speak them would prefer to have
information presented in a language that they understand.
In some contexts, it is possible to have information in more than one
language, or it might be possible to provide tools for assisting in the
understanding of a language (such as dictionaries).
A prerequisite for any such function is a means of labelling the
information content with an identifier for the language that is used in
this information content.
This document specifies an identifier mechanism, and one possible use
for it.
2. The Language tag
The language tag is composed of one or more parts: A primary language
tag and a (possibly empty) series of subtags.
The syntax of this tag in RFC-822 EBNF is:
Language-Tag = Primary-tag *( "-" Subtag )
Primary-tag = 1*8ALPHA
Subtag = 1*8ALPHA
Whitespace is not allowed within the tag.
All tags are to be treated as case insensitive; there exist conventions
for capitalization of some of them, but these should not be taken to
carry meaning. For instance, ISO 3166 recommends that country codes are
capitalized (MN Mongolia), while ISO 639 recommends that language codes
are written in lower case (mn Mongolian).
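As an illustration only (this code is not part of the specification), the syntax above and the case-insensitivity rule can be checked with a few lines of Python. The function names is_valid_language_tag and same_tag are hypothetical; the regular expression is a direct transcription of the EBNF, assuming ALPHA means the ASCII letters A-Z and a-z.

```python
import re

# Language-Tag = Primary-tag *( "-" Subtag ), each part being 1*8ALPHA.
# Assumption: ALPHA means the ASCII letters A-Z and a-z only.
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_valid_language_tag(tag: str) -> bool:
    """Return True if `tag` matches the Language-Tag syntax exactly."""
    # fullmatch() must consume the whole string, so embedded or trailing
    # whitespace, empty parts and parts longer than 8 letters all fail.
    return _LANGUAGE_TAG.fullmatch(tag) is not None


def same_tag(a: str, b: str) -> bool:
    """Compare two tags case-insensitively; capitalization carries no meaning."""
    return a.lower() == b.lower()
```

For example, "en-US" and "i-klingon" are accepted, while "en US" (whitespace), "en-" (empty subtag) and "abcdefghi" (nine letters) are rejected; same_tag("MN", "mn") is true.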
The namespace of language tags is administered by the IANA according to
the rules in section 4 of this document.
The following registrations are predefined:
In the primary language tag:
- All 2-letter tags are interpreted according to ISO standard 639,
"Code for the representation of names of languages" [ISO 639].
draft-alvestrand-lang-tags-v2-01.txt [Page 2]
- All 3-letter tags are interpreted according to ISO 639 part 2, "Codes
for the representation of names of languages -- Part 2: Alpha-3 code"
[ISO 639-2].
- The value "i" is reserved for IANA-defined registrations.
- The value "x" is reserved for private use. Subtags of "x" will not be
registered by the IANA.
- Other values shall not be assigned except by revisions of this
standard.
The reason for reserving all other tags is to remain open to new
revisions of ISO 639; the use of "i" and "x" is the minimum we can do
here to be able to extend the mechanism to meet our immediate
requirements.
In the first subtag:
- All 2-letter codes are interpreted as ISO 3166 alpha-2 country codes
denoting the area in which the language is used.
- Codes of 3 to 8 letters may be registered with the IANA, according to
the rules in section 4 of this document.
The information in the subtag may for instance be:
- Country identification, such as en-US (this usage is described in ISO
639)
- Dialect or variant information, such as no-nyn (nynorsk) or en-scouse
- Languages not listed in ISO 639 that are not variants of any listed
language, which can be registered with the i-prefix, such as i-cherokee
- Script variations, such as az-arabic and az-cyrillic
In the second and subsequent subtag, any value can be registered.
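The predefined interpretations above depend only on the length of a part and on the reserved values "i" and "x", so they can be sketched as a small Python function. classify_tag is a hypothetical helper, not an IANA or ISO API; it assumes the tag is already syntactically valid and does not consult any code list, so it cannot tell whether a given code is actually assigned. Single-letter first subtags are reported as having no predefined meaning, since the rules above do not cover them.

```python
from typing import Optional, Tuple


def classify_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Interpret the primary tag and first subtag by the predefined rules.

    Assumes `tag` is syntactically valid. Classification uses only the
    length of each part and the reserved values "i" and "x"; no code
    list is consulted.
    """
    parts = tag.lower().split("-")  # tags are case-insensitive
    primary = parts[0]
    if len(primary) == 2:
        primary_kind = "ISO 639"
    elif len(primary) == 3:
        primary_kind = "ISO 639-2"
    elif primary == "i":
        primary_kind = "IANA"
    elif primary == "x":
        primary_kind = "private use"
    else:
        primary_kind = "reserved"  # only revisions of this standard may assign it

    if len(parts) == 1:
        return primary_kind, None
    first = parts[1]
    if primary == "x":
        first_kind = "private use"  # subtags of "x" are never registered
    elif len(first) == 2:
        first_kind = "ISO 3166 country"
    elif 3 <= len(first) <= 8:
        first_kind = "IANA-registrable"
    else:
        first_kind = "no predefined meaning"  # e.g. a single letter
    return primary_kind, first_kind
```

For example, en-US yields ("ISO 639", "ISO 3166 country"), no-nyn yields ("ISO 639", "IANA-registrable") and i-cherokee yields ("IANA", "IANA-registrable").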
ISO 639 defines a registration authority for additions to and changes
in the list of languages in ISO 639. This authority is:
International Information Centre for Terminology (Infoterm)
P.O. Box 130
A-1021 Wien
Austria
Phone: +43 1 26 75 35 Ext. 312
Fax: +43 1 216 32 72
The following codes were added in 1989: ug (Uigur), iu (Inuktitut,
also called Eskimo), za (Zhuang), he (Hebrew, replacing iw), yi
(Yiddish, replacing ji), and id (Indonesian, replacing in).
In 1998, the following codes were added: se (Sami), kw (Cornish), gv
(Manx Gaelic) and lb (Luxembourgish).
ISO 639-2 defines a registration authority for additions to and changes
in the list of languages in ISO 639-2. This authority is:
Library of Congress
(c/o Network Development and MARC Standards Office)
Washington, D.C. 20540
USA
Phone: +1 [to be supplied]
Fax: +1 [to be supplied]
The registration agency for ISO 3166 (country codes) is:
ISO 3166 Maintenance Agency Secretariat
c/o DIN Deutsches Institut fuer Normung
Burggrafenstrasse 6
Postfach 1107
D-10787 Berlin
Germany
Phone: +49 30 26 01 320
Fax: +49 30 26 01 231
The country codes AA, QM-QZ, XA-XZ and ZZ are reserved by ISO 3166 as
user-assigned codes.
2.1 Choice of language tag
One may occasionally be faced with several possible tags for the same
body of text.
Interoperability is best served if all users send the same tag, and use
the same tag for the same language for all documents; therefore, the
following guidelines are recommended:
1. Use the most precise tagging that can be ascertained.
2. When a language has both an ISO 639-1 2-character tag and an ISO
639-2 3-character tag, use the ISO 639-1 2-character tag.
3. When a language has both an ISO 639-2/T (Terminology) tag and an ISO
639-2/B (Bibliographic) tag, and these differ, use the Terminology
tag. (NOTE: At present, all languages for which there is a difference
also have 2-character tags, so by guideline 2 this situation should
not arise.) (The choice of the Terminology tag is arbitrary.)
4. When a language has both an IANA-registered tag (i-something) and an
ISO registered tag, use the ISO tag.
5. Do NOT use the UND (Undetermined) tag unless the protocol in use
forces you to give a value for the language tag, even if the language
is unknown. Omitting the tag is preferred.
6. Do NOT use the MUL (Multiple) tag if the protocol allows you to use
multiple languages, as is the case for the Content-Language: header.
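Guidelines 2 to 5 above amount to a fixed order of preference among the codes known for one language. The Python sketch below encodes that order; preferred_tag and its parameter names are hypothetical, and the caller is assumed to already know which codes exist for the language (for instance from Appendix A). Guideline 1 concerns subtags and is not modelled, and guideline 6 is left to the protocol in use.

```python
from typing import Optional


def preferred_tag(iso639_1: Optional[str] = None,
                  iso639_2t: Optional[str] = None,
                  iso639_2b: Optional[str] = None,
                  iana: Optional[str] = None) -> Optional[str]:
    """Pick one tag for a language known under several codes.

    Order of preference: ISO 639-1 (guideline 2), then ISO 639-2/T over
    ISO 639-2/B (guideline 3), then any ISO code over an IANA "i-" tag
    (guideline 4). Returns None when no code is known; per guideline 5
    the caller should then omit the tag rather than send "und".
    """
    for candidate in (iso639_1, iso639_2t, iso639_2b, iana):
        if candidate:
            return candidate
    return None
```

For Cherokee, which appears both as chr in Appendix A and as the IANA tag i-cherokee, preferred_tag(None, "chr", "chr", "i-cherokee") returns "chr"; for German, preferred_tag("de", "deu", "ger") returns "de".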
2.2 Meaning of the language tag
The language tag always defines a language as spoken (or written) by
human beings for communication of information to other human beings.
Computer languages such as programming languages are explicitly
excluded.
There is no guaranteed relationship between languages whose tags begin
with the same series of subtags; specifically, they are NOT guaranteed
to be mutually intelligible, although this will sometimes be the case.
Applications should always treat a language tag as a single token; the
division into main tag and subtags is an administrative mechanism, not
a navigation aid.
The relationship between the tag and the information it relates to is
defined by the standard describing the context in which it appears.
Accordingly, this section can only give possible examples of its usage.
- For a single information object, it should be taken as the set of
languages required to fully comprehend the complete object.
Example: Plain text documents.
- For an aggregation of information objects, it should be taken as the
set of languages used inside components of that aggregation.
Examples: Document stores and libraries.
- For information objects whose purpose is to provide alternatives, it
should be regarded as a hint that the material inside is provided in
several languages, and that one has to inspect each of the
alternatives in order to find its language or languages. In this
case, multiple languages need not mean that one needs to be
multilingual to get complete understanding of the document.
Example: MIME multipart/alternative.
- In markup languages, such as HTML, it is possible to define a
construct embedding a language tag to indicate that contained text is
written in this language, such that one could write
<DIV lang="FR">C'est la vie</DIV> inside a Norwegian document; the
Norwegian-speaking user could then access a French-Norwegian
dictionary to find out what the marked section meant.
2.3 Language-range
Since the writing of RFC 1766, it has become apparent that there is a
need to define a term for a set of languages that share some common
property. The following definition of language-range is derived from
RFC 2068 (HTTP/1.1).
language-range = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )
A language-range matches a language-tag if it exactly equals the tag,
or if it exactly equals a prefix of the tag such that the first tag
character following the prefix is "-".
The special range "*" matches any tag. A protocol which uses language
ranges may specify additional rules about the semantics of "*"; for
instance, HTTP/1.1 specifies that it only matches languages not matched
by any other range within an "Accept-Language:" header.
NOTE: This use of a prefix matching rule does not imply that language
tags are assigned to languages in such a way that it is always true
that if a user understands a language with a certain tag, then this
user will also understand all languages with tags for which this tag is
a prefix. The prefix rule simply allows the use of prefix tags if this
is the case.
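The matching rule described above can be sketched as a short Python function. This is an illustration under two assumptions: comparison is case-insensitive, as for language tags themselves, and "*" is given only its basic meaning; protocol-specific refinements such as the HTTP/1.1 rule for "*" are not modelled. The function name range_matches is hypothetical.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """Return True if `language_range` matches `tag` under the prefix rule."""
    r, t = language_range.lower(), tag.lower()
    if r == "*":
        return True  # the special range matches any tag
    # Exact match, or a prefix that is immediately followed by "-" in the tag.
    return t == r or t.startswith(r + "-")
```

Note that "en" matches "en-US" but not "eng", because the character after the prefix must be "-"; and, as the NOTE explains, a match says nothing about mutual intelligibility.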
3. The Content-Language header
The "Content-Language" header is intended for use in the case where one
desires to indicate the language(s) of something that has RFC-822-like
headers, such as MIME body parts or Web documents.
The RFC-822 EBNF of the Content-Language header is:
Content-Language = "Content-Language" ":" 1#Language-tag
Note that the Content-Language header may list several languages in a
comma-separated list.
Whitespace is allowed between the elements of the list; as with other
RFC 822 structured headers, this means that parenthesized comments can
also be placed anywhere in the language sequence.
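A receiver therefore has to discard comments before splitting the list on commas. The following Python sketch does this for the value of a Content-Language header (the text after the colon). It assumes RFC 822 comment rules (comments may nest and may contain backslash-quoted characters) and treats each comment as whitespace; the names strip_comments and parse_content_language are hypothetical.

```python
import re
from typing import List

# Same Language-Tag syntax as in section 2 (ASCII letters assumed for ALPHA).
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def strip_comments(value: str) -> str:
    """Replace each RFC 822 comment, which may nest, by a single space."""
    out = []
    depth = 0
    i = 0
    while i < len(value):
        ch = value[i]
        if depth and ch == "\\":
            i += 2  # quoted-pair inside a comment: skip it and the next char
            continue
        if ch == "(":
            if depth == 0:
                out.append(" ")  # a comment acts as whitespace
            depth += 1
        elif ch == ")" and depth:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    if depth:
        raise ValueError("unterminated comment")
    return "".join(out)


def parse_content_language(value: str) -> List[str]:
    """Return the language tags listed in a Content-Language header value."""
    tags = []
    for item in strip_comments(value).split(","):
        item = item.strip()
        if not item:
            continue  # the RFC 822 "#" rule permits empty list elements
        if not _LANGUAGE_TAG.fullmatch(item):
            raise ValueError(f"invalid language tag: {item!r}")
        tags.append(item)
    if not tags:
        raise ValueError("Content-Language requires at least one tag")
    return tags
```

Applied to the dictionary example in section 3.1, parse_content_language("en, fr (This is a dictionary)") returns ["en", "fr"].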
3.1 Examples of Content-Language values
Norwegian official document, with parallel text in both official
versions of Norwegian. (Both versions are readable by all Norwegians).
Content-Type: multipart/alternative;
    differences=content-language
Content-Language: no-nyn, no-bok
Voice recording from downtown Liverpool
Content-type: audio/basic
Content-Language: en-scouse
Document in Mingo, an American Indian language which does not have an
ISO 639 code:
Content-type: text/plain
Content-Language: i-mingo
An English-French dictionary
Content-type: application/dictionary
Content-Language: en, fr (This is a dictionary)
An official European Commission document (in a few of its official
languages)
Content-type: multipart/alternative
Content-Language: da, de, el, en, fr, it
An excerpt from Star Trek
Content-type: video/mpeg
Content-Language: i-klingon
(All the non-ISO tags used in these examples were registered with IANA
after the publication of RFC 1766.)
4. IANA registration procedure for language tags
Any language tag to be registered shall begin with an existing tag, and
extend it.
The registration form given here must be used by anyone who wants to
use a language tag not defined by ISO or IANA.
----------------------------------------------------------------------
LANGUAGE TAG REGISTRATION FORM
Name of requester :
E-mail address of requester:
Tag to be registered :
English name of language :
Native name of language (transcribed into ASCII):
Reference to published description of the language (book or article):
Any other relevant information:
----------------------------------------------------------------------
The registration form must be sent to <ietf-languages@iana.org> for a
two-week review period before it can be submitted to IANA. (This is an
open list. Requests to be added should be sent to
<ietf-languages-request@iana.org>.)
When the two-week period has passed, the language tag reviewer, who is
appointed by the IETF Applications Area Director, either forwards the
request to IANA@ISI.EDU, or rejects it because of significant
objections raised on the list. Note that the reviewer can raise
objections on the list himself, if he so desires; the important thing
is that the objection must be made in public.
The applicant is free to modify a rejected application with additional
information and submit it again.
Decisions made by the reviewer may be appealed to the IESG.
All registered forms are available online in the directory
ftp://ftp.isi.edu/in-notes/iana/assignments/languages/
Updates of registrations follow the same procedure as registrations.
The language tag reviewer decides whether to allow a new registrant to
update a registration made by someone else; in the normal case,
objections by the original registrant would carry extra weight in such
a decision.
There is no deletion of registrations; when some registered tag should
not be used any more, for instance because a corresponding ISO 639 code
has been registered, the registration should be amended by adding a
remark like "DO NOT USE: use <new code> instead" to the "other relevant
information" section.
5. Security Considerations
The only security issue that has been raised with language tags since
the publication of RFC 1766, which stated that "Security issues are
believed to be irrelevant to this memo", is a concern with language
ranges used in content negotiation: that they may be used to infer the
nationality of the sender, and thus identify potential targets for
surveillance.
This is a special case of the general problem that anything you send is
visible to the receiving party; it is useful to be aware that such
concerns can exist in some cases.
The exact magnitude of the threat, and any possible countermeasures, are
left to each application protocol.
6. Character set considerations
Codes may always be expressed using the US-ASCII character repertoire
(a-z), which is present in most character sets.
The issue of deciding upon the rendering of a character set based on
the language tag is not addressed in this memo; however, it is thought
impossible to make such a decision correctly for all cases unless means
of switching language in the middle of a text are defined (for example,
a rendering engine that chooses a font based on Japanese or Chinese
language may fail to work when a mixed Japanese-Chinese text is
encountered).
7. Acknowledgements
This document has benefited from many rounds of review and comments in
various fora of the IETF and the Internet working groups.
Any list of contributors is bound to be incomplete; please regard the
following as only a selection from the group of people who have
contributed to make this document what it is today.
In alphabetical order:
Tim Berners-Lee, Nathaniel Borenstein, Sean Burke, Jim Conklin, John
Cowan, Dave Crocker, Martin Duerst, Michael Everson, Ned Freed, Tim
Goodwin, Dirk-Willem van Gulik,
Paul Hoffman, Olle Jarnefors, John Klensin, Keith Moore, Masataka Ohta,
Keld Jorn Simonsen, Rhys Weatherley, Misha Wolf, Francois Yergeau and
many, many others.
Special thanks must go to Michael Everson, who has served as language
tag reviewer for almost the complete period since the publication of
RFC 1766, and has provided a great deal of input to this version.
8. Author's Address
Harald Tveit Alvestrand
EDB Maxware
Pirsenteret
7NNN TRONDHEIM
NORWAY
EMail: Harald.Alvestrand@maxware.no
Phone: +47 73 54 57 97
9. References
[ISO 639]
ISO 639:1988 (E/F) - Code for the representation of names of
languages - The International Organization for Standardization,
1st edition, 1988-04-01 Prepared by ISO/TC 37 - Terminology
(principles and coordination).
Note that a new version (ISO 639-1:2000) is in preparation at the
time of this writing.
[ISO 639-2]
ISO 639-2:1998 - Codes for the representation of names of
languages -- Part 2: Alpha-3 code - edition 1, 1998-11-01, 66
pages, prepared by ISO/TC 37/SC 2
[ISO 3166]
ISO 3166:1988 (E/F) - Codes for the representation of names of
countries - The International Organization for Standardization,
3rd edition, 1988-08-15.
[RFC 1521]
Borenstein, N., and N. Freed, "MIME Part One: Mechanisms for
Specifying and Describing the Format of Internet Message Bodies",
RFC 1521, Bellcore, Innosoft, September 1993.
[RFC 1327]
Kille, S., "Mapping between X.400(1988) / ISO 10021 and RFC 822",
RFC 1327, University College London, May 1992.
[ISO 15924]
ISO/DIS 15924 - Codes for the representation of names of scripts
(being actively developed by ISO)
Appendix A: List of language tags
This list is NOT authoritative. It was prepared based on Keld
Simonsen's publicly available lists of codes, which were prepared from
drafts of the standards.
In matching 639-1 names to 639-2 names, a great number of changes in
the names of languages were noted; it is expected that 639-1 will adopt
these changes in the forthcoming revision of that standard.
All the cases where the 639-2/T and 639-2/B codes differ have been
marked with an asterisk (*).
639-1 639-2/T 639-2/B English name
aa aar aar Afar
ab abk abk Abkhazian
ace ace Achinese
ach ach Acoli
ada ada Adangme
afa afa Afro-Asiatic (Other)
afh afh Afrihili
af afr afr Afrikaans
aka aka Akan
akk akk Akkadian
ale ale Aleut
alg alg Algonquian languages
am amh amh Amharic
ang ang English, Old (ca. 450-1100)
apa apa Apache languages
ar ara ara Arabic
arc arc Aramaic
arn arn Araucanian
arp arp Arapaho
art art Artificial (Other)
arw arw Arawak
as asm asm Assamese
ath ath Athapascan languages
aus aus Australian languages
ava ava Avaric
ave ave Avestan
awa awa Awadhi
ay aym aym Aymara
az aze aze Azerbaijani
bad bad Banda
bai bai Bamileke languages
ba bak bak Bashkir
bal bal Baluchi
bam bam Bambara
ban ban Balinese
bas bas Basa
bat bat Baltic (Other)
bej bej Beja
be bel bel Belarussian (ISO 639-1: Byelorussian)
bem bem Bemba
bn ben ben Bengali (ISO 639-1: Bengali; Bangla)
ber ber Berber (Other)
bho bho Bhojpuri
bh bih bih Bihari
bik bik Bikol
bin bin Bini
bi bis bis Bislama
bla bla Siksika (Blackfoot)
bnt bnt Bantu (Other)
bo * bod tib Tibetan
bra bra Braj
br bre bre Breton
btk btk Batak (Indonesia)
bua bua Buriat
bug bug Buginese
bg bul bul Bulgarian
cad cad Caddo
cai cai Central American Indian (Other)
car car Carib
ca cat cat Catalan
cau cau Caucasian (Other)
ceb ceb Cebuano
cel cel Celtic (Other)
cs * ces cze Czech
cha cha Chamorro
chb chb Chibcha
che che Chechen
chg chg Chagatai
chk chk Chuukese
chm chm Mari
chn chn Chinook jargon
cho cho Choctaw
chp chp Chipewyan
chr chr Cherokee (Jalagi)
chu chu Church Slavic
chv chv Chuvash
chy chy Cheyenne
cmc cmc Chamic languages
cop cop Coptic
kw cor cor Cornish
co cos cos Corsican
cpe cpe Creoles and pidgins, English-based (Other)
cpf cpf Creoles and pidgins, French-based (Other)
cpp cpp Creoles and pidgins, Portuguese-based (Other)
cre cre Cree
crp crp Creoles and pidgins (Other)
cus cus Cushitic (Other)
cy * cym wel Welsh
dak dak Dakota
da dan dan Danish
day day Dayak
del del Delaware
den den Slave (Athapascan)
de * deu ger German
dgr dgr Dogrib
din din Dinka
div div Divehi
doi doi Dogri
dra dra Dravidian (Other)
dua dua Duala
dum dum Dutch, Middle (ca. 1050-1350)
dyu dyu Dyula
dz dzo dzo Dzongkha (Bhutani in ISO 639-1)
efi efi Efik
egy egy Egyptian (Ancient)
eka eka Ekajuk
el * ell gre Greek, Modern (post 1453)
elx elx Elamite
en eng eng English
enm enm English, Middle (1100-1500)
eo epo epo Esperanto
et est est Estonian
eu * eus baq Basque
ewe ewe Ewe
ewo ewo Ewondo
fan fan Fang
fo fao fao Faroese
fa * fas per Persian
fat fat Fanti
fj fij fij Fijian (ISO 639-1: Fiji)
fi fin fin Finnish
fiu fiu Finno-Ugrian (Other)
fon fon Fon
fr * fra fre French
frm frm French, Middle (ca. 1400-1600)
fro fro French, Old (842-ca. 1400)
fy fry fry Frisian
ful ful Fulah
fur fur Friulian
gaa gaa Ga
gay gay Gayo
gba gba Gbaya
gem gem Germanic (Other)
gez gez Geez
gil gil Gilbertese
gd gla gla Gaelic (Scots) (Scottish Gaelic)
ga gle gle Irish (Irish Gaelic)
gl glg glg Gallegan (Galician in ISO 639-1)
gv glv glv Manx (Manx Gaelic)
gmh gmh German, Middle High (ca. 1050-1500)
goh goh German, Old High (ca. 750-1050)
gon gon Gondi
gor gor Gorontalo
got got Gothic
grb grb Grebo
grc grc Greek, Ancient (to 1453)
gn grn grn Guarani
gu guj guj Gujarati
gwi gwi Gwich'in
hai hai Haida
ha hau hau Hausa
haw haw Hawaiian
he heb heb Hebrew (iw in 639-1 first edition)
her her Herero
hil hil Hiligaynon
him him Himachali
hi hin hin Hindi
hit hit Hittite
hmn hmn Hmong
hmo hmo Hiri Motu
hr * hrv scr Croatian
hu hun hun Hungarian (Magyar)
hup hup Hupa
hy * hye arm Armenian
iba iba Iban
ibo ibo Igbo
ijo ijo Ijo
iu iku iku Inuktitut
ie ile ile Interlingue
ilo ilo Iloko
ia ina ina Interlingua (International Auxiliary Language
Association)
inc inc Indic (Other)
id ind ind Indonesian (in in 639-1 first edition)
ine ine Indo-European (Other)
ik ipk ipk Inupiak
ira ira Iranian (Other)
iro iro Iroquoian languages
is * isl ice Icelandic
it ita ita Italian
jw * jaw jav Javanese
ja jpn jpn Japanese
jpr jpr Judeo-Persian
jrb jrb Judeo-Arabic
kaa kaa Kara-Kalpak
kab kab Kabyle
kac kac Kachin
kl kal kal Kalaallisut (Greenlandic in 639-1)
kam kam Kamba
kn kan kan Kannada
kar kar Karen
ks kas kas Kashmiri
ka * kat geo Georgian
kau kau Kanuri
kaw kaw Kawi
kk kaz kaz Kazakh
kha kha Khasi
khi khi Khoisan (Other)
km khm khm Khmer (Cambodian in 639-1)
kho kho Khotanese
kik kik Kikuyu
rw kin kin Kinyarwanda
ky kir kir Kirghiz
kmb kmb Kimbundu
kok kok Konkani
kom kom Komi
kon kon Kongo
ko kor kor Korean
kos kos Kosraean
kpe kpe Kpelle
kro kro Kru
kru kru Kurukh
kua kua Kuanyama
kum kum Kumyk
ku kur kur Kurdish
kut kut Kutenai
lad lad Ladino
lah lah Lahnda
lam lam Lamba
lo lao lao Lao (Laotian in 639-1)
la lat lat Latin
lv lav lav Latvian (Latvian, Lettish in 639-1)
lez lez Lezghian
ln lin lin Lingala
lt lit lit Lithuanian
lol lol Mongo
loz loz Lozi
lb ltz ltz Letzeburgesch
lua lua Luba-Lulua
lub lub Luba-Katanga
lug lug Ganda
lui lui Luiseno
lun lun Lunda
luo luo Luo (Kenya and Tanzania)
lus lus Lushai
mad mad Madurese
mag mag Magahi
mah mah Marshall
mai mai Maithili
mak mak Makasar
ml mal mal Malayalam
man man Mandingo
map map Austronesian (Other)
mr mar mar Marathi
mas mas Masai
mdr mdr Mandar
men men Mende
mga mga Irish, Middle (900-1200)
mic mic Micmac
min min Minangkabau
mis mis Miscellaneous languages
mk * mkd mac Macedonian
mkh mkh Mon-Khmer (Other)
mg mlg mlg Malagasy
mt mlt mlt Maltese
mni mni Manipuri
mno mno Manobo languages
moh moh Mohawk
mo mol mol Moldavian
mn mon mon Mongolian
mos mos Mossi
mi * mri mao Maori
ms * msa may Malay
mul mul Multiple languages
mun mun Munda languages
mus mus Creek
mwr mwr Marwari
my * mya bur Burmese
myn myn Mayan languages
nah nah Nahuatl
nai nai North American Indian (Other)
na nau nau Nauru
nav nav Navajo
nbl nbl Ndebele, South
nde nde Ndebele, North
ndo ndo Ndonga
ne nep nep Nepali
new new Newari
nia nia Nias
nic nic Niger-Kordofanian (Other)
niu niu Niuean
nl * nld dut Dutch
non non Norse, Old
no nor nor Norwegian
nso nso Sotho, Northern
nub nub Nubian languages
nya nya Nyanja
nym nym Nyamwezi
nyn nyn Nyankole
nyo nyo Nyoro
nzi nzi Nzima
oc oci oci Occitan (post 1500)
oji oji Ojibwa
or ori ori Oriya
om orm orm Oromo
osa osa Osage
oss oss Ossetic
ota ota Turkish, Ottoman (1500-1928)
oto oto Otomian languages
paa paa Papuan (Other)
pag pag Pangasinan
pal pal Pahlavi
pam pam Pampanga
pa pan pan Panjabi (Punjabi in 639-1)
pap pap Papiamento
pau pau Palauan
peo peo Persian, Old (ca. 600-400 B.C.)
phi phi Philippine (Other)
phn phn Phoenician
pli pli Pali
pl pol pol Polish
pon pon Pohnpeian
pt por por Portuguese
pra pra Prakrit languages
pro pro Provençal, Old (to 1500)
ps pus pus Pushto (Pashto, Pushto in 639-1)
qaa-qtz qaa-qtz Reserved for local use
qu que que Quechua
raj raj Rajasthani
rap rap Rapanui
rar rar Rarotongan
roa roa Romance (Other)
rm roh roh Raeto-Romance (Rhaeto-Romance in 639-1)
rom rom Romany
ro * ron rum Romanian
rn run run Rundi (Kirundi in 639-1)
ru rus rus Russian
sad sad Sandawe
sg sag sag Sango (Sangho in 639-1)
sah sah Yakut
sai sai South American Indian (Other)
sal sal Salishan languages
sam sam Samaritan Aramaic
sa san san Sanskrit
sas sas Sasak
sat sat Santali
sco sco Scots
sel sel Selkup
sem sem Semitic (Other)
sga sga Irish, Old (to 900)
shn shn Shan
sh (shr) (shr) Serbo-Croatian (withdrawn)
sid sid Sidamo
si sin sin Sinhalese
sio sio Siouan languages
sit sit Sino-Tibetan (Other)
sla sla Slavic (Other)
sk * slk slo Slovak
sl slv slv Slovenian
se smi smi Sami languages (Northern Sami in 639-1)
sm smo smo Samoan
sn sna sna Shona
sd snd snd Sindhi
snk snk Soninke
sog sog Sogdian
so som som Somali
son son Songhai
st sot sot Sotho, Southern (Sesotho in 639-1)
es * spa spa Spanish (but note that T code changes to esp in 2003)
sq * sqi alb Albanian
srd srd Sardinian
sr * srp scc Serbian
srr srr Serer
ssa ssa Nilo-Saharan (Other)
ss ssw ssw Swati (Siswati in 639-1)
suk suk Sukuma
su sun sun Sundanese
sus sus Susu
sux sux Sumerian
sw swa swa Swahili
sv swe swe Swedish
syr syr Syriac
tah tah Tahitian
tai tai Tai (Other)
ta tam tam Tamil
tt tat tat Tatar
te tel tel Telugu
tem tem Timne
ter ter Tereno
tet tet Tetum
tg tgk tgk Tajik
tl tgl tgl Tagalog
th tha tha Thai
tig tig Tigre
ti tir tir Tigrinya
tiv tiv Tiv
tkl tkl Tokelau
tli tli Tlingit
tmh tmh Tamashek
tog tog Tonga (Nyasa)
to ton ton Tonga (Tonga Islands)
tpi tpi Tok Pisin
tsi tsi Tsimshian
tn tsn tsn Tswana (Setswana in 639-1)
ts tso tso Tsonga
tk tuk tuk Turkmen
tum tum Tumbuka
tr tur tur Turkish
tut tut Altaic (Other)
tvl tvl Tuvalu
tw twi twi Twi
tyv tyv Tuvinian
uga uga Ugaritic
ug uig uig Uighur
uk ukr ukr Ukrainian
umb umb Umbundu
und und Undetermined
ur urd urd Urdu
uz uzb uzb Uzbek
vai vai Vai
ven ven Venda
vi vie vie Vietnamese
vo vol vol Volapük
vot vot Votic
wak wak Wakashan languages
wal wal Walamo
war war Waray
was was Washo
wen wen Sorbian languages
wo wol wol Wolof
xh xho xho Xhosa
yao yao Yao
yap yap Yapese
yi yid yid Yiddish (ji in first edition of 639-1)
yo yor yor Yoruba
ypk ypk Yupik languages
zap zap Zapotec
zen zen Zenaga
za zha zha Zhuang
zh * zho chi Chinese
znd znd Zande
zu zul zul Zulu
zun zun Zuni
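Since this table is NOT authoritative, the following is only a sketch of how guidelines 2 and 3 of section 2.1 combine with it. The Python code parses rows in the layout used above and maps every code to the preferred tag. Only a few sample rows are included, the row parser is an assumption that handles the common layouts only (the withdrawn "sh (shr)" row and the "qaa-qtz" range would need special handling), and all names are hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class Row(NamedTuple):
    iso639_1: Optional[str]  # None when the language has no 639-1 code
    iso639_2t: str
    iso639_2b: str
    name: str


def parse_row(line: str) -> Row:
    """Parse one table row such as "de * deu ger German" or "ace ace Achinese"."""
    tokens = line.split()
    iso639_1 = None
    if len(tokens[0]) == 2:  # a leading 2-letter code is the 639-1 code
        iso639_1, tokens = tokens[0], tokens[1:]
    if tokens[0] == "*":  # asterisk: 639-2/T and 639-2/B differ
        tokens = tokens[1:]
    return Row(iso639_1, tokens[0], tokens[1], " ".join(tokens[2:]))


def build_index(table: str) -> Dict[str, str]:
    """Map each code in the table to the tag preferred by guidelines 2 and 3."""
    index: Dict[str, str] = {}
    for line in table.splitlines():
        if not line.strip():
            continue
        row = parse_row(line)
        preferred = row.iso639_1 or row.iso639_2t
        for code in (row.iso639_1, row.iso639_2t, row.iso639_2b):
            if code:
                index[code] = preferred
    return index


# A few sample rows copied from the table above.
SAMPLE_ROWS = """\
ace ace Achinese
bo * bod tib Tibetan
de * deu ger German
en eng eng English
"""

INDEX = build_index(SAMPLE_ROWS)
```

With these rows, INDEX["ger"] is "de" and INDEX["tib"] is "bo", so a Bibliographic code received from elsewhere can be normalized to the 2-letter tag.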
Appendix B: Changes from RFC 1766
- Email list address changed from ietf-types@uninett.no to
ietf-languages@iana.org
- Updated author's address
- Added language-range construct from HTTP/1.1
- Added use of ISO 639-2 language codes
- Added list of language codes
- Changed examples to use registered tags
- Moved Multipart/Alternative-related material to Appendix C
- Added "Any other relevant information" to registration form
- Added description of procedure for updating registrations
Appendix C: Use of Content-Language with Multipart/Alternative
NOTE: This appendix details an idea that was proposed in RFC 1766 to
deal with a particular kind of alternative content. However, this has
not found use in practice, and is therefore not suitable for the IETF
standards track. It is being preserved here as a non-normative appendix
only.
When using the Multipart/Alternative body part of MIME, it is possible
to have the body parts giving the same information content in different
languages. In this case, one should put a Content-Language header on
each of the body parts, and a summary Content-Language header onto the
Multipart/Alternative itself.
The differences parameter to multipart/alternative
As defined in RFC 1521, "Multipart/Alternative" only has one parameter:
boundary.
The common usage of "Multipart/Alternative" is to have more than one
format of the same message (for example, PostScript and ASCII).
Using language tags to differentiate between alternatives will not, by
itself, lead all MIME user agents (UAs) to present the most meaningful,
understandable or significant body part by default.
Therefore, a new parameter is defined, to allow the configuration of
MIME readers to handle language differences in a sensible manner.
Name: Differences
Value: One or more of
Content-Type
Content-Language
Further values can be registered with IANA; these shall refer to the
name of a header for which a definition exists in a published RFC. If
not present, "Differences=Content-Type" is assumed.
The intent is that the MIME reader can look at these headers of the
message component to make an intelligent choice of what to present to
the user, based on knowledge about the user's preferences and
capabilities.
(The intent of having registration with IANA of the fields used in this
context is to maintain a list of usages that a mail UA may expect to
encounter, not to reject usages.)
(NOTE: The MIME specification [RFC 1521], section 7.2, states that
headers not beginning with "Content-" are generally to be ignored in
body parts. People defining a header for use with "differences=" should
take note of this.)
The mechanism for deciding which body part to present is outside the
scope of this document.
MIME EXAMPLE:
Content-Type: multipart/alternative; differences=Content-Language;
    boundary="limit"
Content-Language: en, fr, de

--limit
Content-Language: fr

Le renard brun et agile saute par dessus le chien paresseux
--limit
Content-Language: de
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-encoding: quoted-printable

Der schnelle braune Fuchs h=FCpft =FCber den faulen Hund
--limit
Content-Language: en

The quick brown fox jumps over the lazy dog
--limit--
When composing a message, the choice of sequence may be arbitrary.
However, non-MIME mail readers will show the first body part first,
meaning that this should most likely be the language understood by most
of the recipients.
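Although the choice mechanism is outside the scope of this document, a sketch may help show how a reader could use the differences parameter. The Python code below parses the MIME example above with the standard email package and returns the first body part whose Content-Language matches the user's preference list, using the prefix rule of section 2.3. The function names, the comma-separated form of the differences value, the absence of comments in the per-part headers, and the added MIME-Version header are all assumptions of this sketch.

```python
from email import message_from_string
from email.message import Message
from typing import List, Optional

# The MIME example from this appendix, with an added MIME-Version header.
EXAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; differences=Content-Language;
 boundary="limit"
Content-Language: en, fr, de

--limit
Content-Language: fr

Le renard brun et agile saute par dessus le chien paresseux
--limit
Content-Language: de
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-encoding: quoted-printable

Der schnelle braune Fuchs h=FCpft =FCber den faulen Hund
--limit
Content-Language: en

The quick brown fox jumps over the lazy dog
--limit--
"""


def _matches(language_range: str, tag: str) -> bool:
    """Prefix rule from section 2.3, compared case-insensitively."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def choose_alternative(msg: Message, preferences: List[str]) -> Optional[Message]:
    """Return the best body part for the given language ranges, or None."""
    if msg.get_content_type() != "multipart/alternative":
        return None
    # Assumption: the differences value is a comma-separated list of header
    # names; "Content-Type" is the documented default when it is absent.
    differences = str(msg.get_param("differences", "Content-Type"))
    if "content-language" not in [d.strip().lower() for d in differences.split(",")]:
        return None
    for language_range in preferences:  # most preferred range first
        for part in msg.get_payload():
            tags = [t.strip() for t in part.get("Content-Language", "").split(",")]
            if any(t and _matches(language_range, t) for t in tags):
                return part
    return None


message = message_from_string(EXAMPLE)
```

With the preference list ["de", "en"], the German part is returned; when nothing matches, the function returns None and a reader might fall back to the first part, as a non-MIME reader would show.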
Appendix X: Changes from draft -00
This appendix is to be deleted by the RFC Editor before publication as
RFC.
Changes from draft-00:
- Fixed up the language tag table
- Moved multipart/alternative stuff to appendix
- Changed examples to use registered tags
- Added * in language tag table to indicate B/T conflicts
- Considered, but did not adopt, changing from recommending T codes to
recommending B codes. At the moment, the only argument that appeals
to the author is that the T codes look more like the 639-1 codes than
the B codes do.
- Added procedures for updating a registration
The following changes need to be made to this document before it is
advanced to Draft or reissued:
- Decide whether or not to write anything about use of country codes in
other places than the first subtag, or region codes, or script codes
- Decide whether it is worth it to try to write down any more
guidelines for what language tags people should register