Library of Congress >> Standards

ISO 639-2

Frequently Asked Questions (FAQ)

  1. What are the ISO 639 standards?
  2. What are the differences between the ISO 639-1 and 639-2 code lists?
  3. What are the differences between the terminology and bibliographic codes in the ISO 639-2 standard?
  4. How were the ISO 639 code lists developed?
  5. Who uses the ISO 639 codes and why?
  6. What is the relationship between the Internet RFC 4646 (and its predecessors RFC 3066 and 1766) and the ISO 639 standards?
  7. Are the language codes intended to be used as abbreviations for the language?
  8. Who are the registration authorities for the ISO 639 standards?
  9. What are the functions of the registration authorities for the ISO 639-1 and 639-2 standards?
  10. What is the Joint Advisory Committee (JAC) for the ISO 639 standards?
  11. Are there any electronic discussion lists for the ISO 639 language codes?
  12. How does one request new ISO 639 language codes?
  13. What are the criteria used to define new ISO 639 language codes?
  14. What is the timeline used for approving new language codes?
  15. Are separate language codes used for languages in different scripts?
  16. Are separate language codes defined for dialects of languages?
  17. Are separate language codes defined for different orthographies?
  18. What are collective language and macrolanguage codes?
  19. Can ISO 639 language codes be changed after they were initially created?
  20. Why do some languages have both ISO 639-1 and 639-2 codes and others have only ISO 639-2 codes?
  21. Are the ISO 639 codes case sensitive?
  22. How does one indicate the language variation spoken in a particular country?
  23. How does one make distinctions between traditional and simplified Chinese characters using the ISO 639 language codes?
  24. How does one distinguish between Cantonese and Mandarin variations of Chinese?
  25. How does one code undetermined languages using the ISO 639 language codes?
  26. Is there a mechanism for using locally defined codes?
  27. What is the difference between a language code and a country code?


  1. What are the ISO 639 standards?

    ISO 639 provides sets of language codes for the representation of names of languages: a two-character code set (639-1) and three-character code sets (639-2 and 639-3).

    ISO 639-1, Codes for the representation of names of languages--Part 1: Alpha-2 code, was devised primarily for use in terminology, and includes identifiers for major languages of the world for which specialized terminologies have been developed. The maintenance agency for ISO 639-1 is the International Information Centre for Terminology (Infoterm).

    ISO 639-2, Codes for the representation of names of languages--Part 2: Alpha-3 code, was devised primarily for use in bibliographic documentation and terminology. It includes identifiers for all of the languages represented in part 1, as well as for many other languages that have significant bodies of literature. It also provides identifiers for groups of languages, such as language families, that together indirectly cover most or all languages of the world. The maintenance agency for ISO 639-2 is the Library of Congress.

    ISO 639-3, Codes for the representation of names of languages - Part 3: Alpha-3 code for comprehensive coverage of languages, is a code list that aims to define three-letter identifiers for all known human languages. At the core of ISO 639-3 are the individual languages already accounted for in ISO 639-2. The many living languages in the initial inventory of ISO 639-3 beyond those already included in ISO 639-2 were derived primarily from the Ethnologue (15th edition). Identifiers for additional extinct, ancient, historic, and constructed languages were obtained from the LinguistList.

    There are other ISO 639 standards in development:

    • ISO/DIS 639-4: Codes for the representation of names of languages – Part 4: Implementation guidelines and general principles for language coding
    • ISO/DIS 639-5: Codes for the representation of names of languages – Part 5: Alpha-3 code for language families and groups
    • ISO/CD 639-6: Codes for the representation of names of languages – Part 6: Alpha-4 Code for the comprehensive coverage of language variants

  2. What are the differences between the ISO 639-1 and 639-2 code lists?

    ISO 639-1, the two-character code, was devised primarily for use in terminology. It includes identifiers for most of the major languages of the world: languages that are not only among those most frequently represented in the total body of the world's literature, but also among the most developed, having specialized vocabularies and terminologies. ISO 639-1 includes identifiers for a subset of the languages covered by ISO 639-2.

    ISO 639-2, the three-character code, was devised primarily for use in bibliography, as well as in terminology. It has a less restrictive scope than ISO 639-1, being devised to include identifiers for languages that are most frequently represented in the total body of the world's literature, regardless of whether specialized terminologies exist in those languages or not. Because three characters allow for a much larger set of distinct identifiers, an alpha-3 code can accommodate a much larger set of languages. Indeed, ISO 639-2 does include significantly more entries than ISO 639-1, yet the scope is not so broad as to result in a separate identifier for every individual language that has been documented. ISO 639-2 limits coverage of individual languages to those for which at least modest bodies of literature have been developed. Other languages are still accommodated, however, by means of identifiers for collections of languages, such as language families.

    In summary, the basic difference between ISO 639-1 and ISO 639-2 has to do with scope: the scope of ISO 639-1 is more restrictive, focusing on languages for which specialized terminologies have been developed. In practical terms, ISO 639-2 covers a larger number of individual languages (due to its less-restrictive scope). It also includes identifiers for collections of languages.

    Both code lists are considered open lists (i.e., it is possible for new entries to be added to the lists).

  3. What are the differences between the terminology and bibliographic codes in the ISO 639-2 standard?

    ISO 639-2 provides two code sets that are identical except for 22 of its 450+ languages, which have alternative codes. One set is for bibliographic applications, often referred to as ISO 639-2/B, and the other for terminology applications, referred to as ISO 639-2/T. Partners exchanging information must agree in advance on which set they are using.

    These alternative codes exist for historical reasons. When ISO 639-2 was developed, a well-known language code list had already been used in bibliographic systems for over 30 years, and it was largely adapted to form the 3-character code set. At the same time, there was the 2-character code list (now called ISO 639-1, previously ISO 639), which covered far fewer languages than were needed for bibliographic applications. Some participants wanted the 3-character code for a language already in the 2-character list to generally share the same two characters. In 22 cases, however, the existing bibliographic code was very different from the 2-character code (because it was based on a different form of the language name), and adopting a new 3-character code would have had an enormous impact on existing bibliographic systems with millions of records using those well-established codes. Those 22 languages were therefore given two codes: the established bibliographic (B) code and a terminology (T) code consistent with the 2-character code. The alternative codes should be considered synonyms; none of the alternative B codes is used as a T code, or vice versa.
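Because the B and T forms are synonyms, a system that receives codes from partners using either set can normalize them before comparing or storing. The sketch below is written in Python and uses only a hand-picked subset of the B/T pairs (the full list is in the standard); the table and function names are hypothetical, not part of ISO 639-2.

```python
# Illustrative subset of the ISO 639-2 B/T alternative codes (not the full list of 22).
BIBLIOGRAPHIC_TO_TERMINOLOGY = {
    "chi": "zho",  # Chinese
    "cze": "ces",  # Czech
    "dut": "nld",  # Dutch
    "fre": "fra",  # French
    "ger": "deu",  # German
    "gre": "ell",  # Modern Greek
    "wel": "cym",  # Welsh
}

# Reverse mapping; no alternative B code is also a T code, so inverting is safe.
TERMINOLOGY_TO_BIBLIOGRAPHIC = {t: b for b, t in BIBLIOGRAPHIC_TO_TERMINOLOGY.items()}


def to_terminology(code: str) -> str:
    """Return the 639-2/T form; codes without an alternative are returned unchanged."""
    code = code.lower()
    return BIBLIOGRAPHIC_TO_TERMINOLOGY.get(code, code)


def to_bibliographic(code: str) -> str:
    """Return the 639-2/B form; codes without an alternative are returned unchanged."""
    code = code.lower()
    return TERMINOLOGY_TO_BIBLIOGRAPHIC.get(code, code)


print(to_terminology("GER"))    # deu
print(to_bibliographic("zho"))  # chi
print(to_terminology("eng"))    # eng (same code in both sets)
```

Codes that are not in the table pass through unchanged, which matches the standard's arrangement: apart from the alternative codes, the two sets are identical.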

    For more information, please see: www.loc.gov/standards/iso639-2/normtext.html

  4. How were the ISO 639 code lists developed?

    ISO 639-1: Codes for the representation of names of languages: alpha-2 codes was developed by the ISO TC37/SC2 in 1988 for use in terminology, lexicography and linguistics.

    ISO 639-2: Codes for the representation of names of languages: alpha-3 codes was developed by the ISO TC37/SC2-TC46/SC4 Joint Working Group. Work on the standard was initiated in 1989 because of the inadequacy of the ISO 639-1 two-character code list to represent a sufficient number of languages for bibliographic and terminology needs. The list was largely based on the MARC Code List for Languages, which has been in wide use since 1968.

    ISO 639-3: In 2002, ISO TC37/SC2 invited SIL International (www.sil.org) to participate in the development of a new standard, based on the language identifiers in the Ethnologue, that would be a superset of ISO 639-2 and would provide identifiers for all known languages. In 2004 the proposed new standard, ISO/DIS 639-3, was released. It incorporated identifiers for living languages from the Ethnologue, 15th ed. (www.ethnologue.com) and for historical, ancient and constructed languages from the languages database of LinguistList (linguistlist.org), accounting for more than 7000 individual languages. ISO 639-3 was adopted in February 2007. Every element of ISO 639-2 other than the collective codes is included in ISO 639-3. An alpha-3 identifier that appears in both ISO 639-2 and ISO 639-3 has the same denotation in each standard, and an alpha-2 identifier in ISO 639-1 has the same denotation as the corresponding alpha-3 identifier in ISO 639-2 and ISO 639-3.

    For more information about the development of the ISO 639-2 codes, please see:
    www.loc.gov/standards/iso639-2/develop.html

  5. Who uses the ISO 639 codes and why?

    There is a wide variety of processes that require the specific language to be identified beforehand. Language-based indexing and searching are fairly obvious examples from the realm of bibliographic applications, as is semantic interpretation. But there are many others: spell-checking, sorting, syllabification and hyphenation, morphological and syntactic parsing, fuzzy string searches and comparisons, speech recognition, speech synthesis, semantic associations, thesaurus lookups, and potentially many more.

    Using "the" name of a language as the means of language identification in machine applications poses two distinct problems of ambiguity. First, different languages can have identical or very similar names. For example, there are four languages called Lele: Lele [lle] of Papua New Guinea (Austronesian); Lele [lel] of the Democratic Republic of the Congo (Niger-Congo, Bantoid); Lele [lln] of Chad (Afro-Asiatic); and Lele [llc] of Guinea (Niger-Congo, Mande). Conversely, the same language may be known by several different names: for example, one used by native speakers, another used by speakers of a neighboring language, and yet another used by the national government.
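The ambiguity is easy to see in a lookup table: one name maps to several identifiers, while each identifier maps to exactly one language. A minimal Python sketch using only the four Lele entries from the example above (the table and function names are hypothetical):

```python
from typing import List

# Name index built from the Lele example; a real index would hold many names and variants.
LANGUAGES_BY_NAME = {
    "lele": ["lle", "lel", "lln", "llc"],
}

# Each identifier, by contrast, denotes exactly one language.
CODE_DESCRIPTIONS = {
    "lle": "Lele of Papua New Guinea (Austronesian)",
    "lel": "Lele of the Democratic Republic of the Congo (Niger-Congo, Bantoid)",
    "lln": "Lele of Chad (Afro-Asiatic)",
    "llc": "Lele of Guinea (Niger-Congo, Mande)",
}


def codes_for_name(name: str) -> List[str]:
    """Return every identifier whose language is known by this name (possibly several)."""
    return LANGUAGES_BY_NAME.get(name.casefold(), [])


for code in codes_for_name("Lele"):
    print(code, "->", CODE_DESCRIPTIONS[code])
```

A name lookup can return several candidates, so a machine application still has to choose; storing the identifier instead of the name removes that step.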

    The ISO 639-1 code set was devised for use in terminology, lexicography and linguistics.

    The ISO 639-2 code set was devised for use by libraries, information services, and publishers to indicate language in the exchange of information, especially in computerized systems. The codes have been widely used in the library community and may also be adopted for any application requiring the expression of language in coded form by terminologists and lexicographers.

    The ISO 639-3 code set was devised for broad use in a variety of applications that require more specific language coding than the other two standards provide.

  6. What is the relationship between the Internet RFC 4646 (and its predecessors RFC 3066 and 1766) and the ISO 639 standards?

    Internet RFC 4646 (Tags for Identifying Languages) describes the structure, content, construction, and semantics of language tags for use in cases where it is desirable to indicate the language used in an information object. It also describes how to register values for use in language tags and how to create user-defined extensions for private interchange. The language tag consists of a primary subtag and a series of subsequent subtags, each of which narrows or refines the range of languages identified by the overall tag. It enables the user to specify, in addition to the primary language, other characteristics such as script, country, or variant. RFC 4646 is an Internet Best Current Practice (BCP 47) and gives guidance for the use of ISO 639 codes.
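To illustrate how each subtag narrows the tag, here is a deliberately simplified Python parser for tags of the form language[-script][-region][-variant...]. It is a sketch, not an implementation of the full RFC 4646 grammar: it ignores extended language subtags, extensions, private-use subtags, and grandfathered tags, and it does not check subtags against the IANA registry. The LanguageTag class and parse_tag function are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LanguageTag:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)


def parse_tag(tag: str) -> LanguageTag:
    """Split a simple tag such as 'zh-Hant-TW' into its subtags (simplified sketch)."""
    subtags = tag.split("-")
    # Primary language subtag: a 2-letter (ISO 639-1) or 3-letter (ISO 639-2) code.
    if not re.fullmatch(r"[A-Za-z]{2,3}", subtags[0]):
        raise ValueError(f"unsupported primary subtag: {subtags[0]!r}")
    result = LanguageTag(language=subtags[0].lower())
    rest, i = subtags[1:], 0
    # Optional script subtag: 4 letters (ISO 15924), e.g. 'Hant'.
    if i < len(rest) and re.fullmatch(r"[A-Za-z]{4}", rest[i]):
        result.script = rest[i].title()
        i += 1
    # Optional region subtag: 2 letters (ISO 3166) or 3 digits (UN M.49).
    if i < len(rest) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[i]):
        result.region = rest[i].upper()
        i += 1
    # Zero or more variant subtags: 5-8 alphanumerics, or 4 starting with a digit.
    while i < len(rest) and re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", rest[i]):
        result.variants.append(rest[i].lower())
        i += 1
    # Anything left over is outside this simplified grammar.
    if i != len(rest):
        raise ValueError(f"unsupported subtag: {rest[i]!r}")
    return result


print(parse_tag("zh-Hant-TW"))
print(parse_tag("de-CH-1996"))
```

Subtags are recognized by length and character class, which is why the order script, region, variant matters: a 4-letter subtag after the language is read as a script, and a 2-letter one as a country.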

    RFC 4646 specifies use of the 2-character ISO 639-1 code when one exists; when a language has no 2-character code, the 3-character code is used. Although RFC 4646 states that the 3-character terminology (T) code is to be used in such cases, the choice between bibliographic and terminology codes never actually arises, because the only languages with alternative codes in ISO 639-2 are languages that already have a 2-character code.
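The selection rule amounts to a lookup with a fallback. A Python sketch, assuming a hypothetical ALPHA3_TO_ALPHA2 table (only a few entries are shown; a real system would load the full ISO 639-2 list). Because every language with separate B and T codes also has a two-letter code, both forms map to the same subtag and the B/T choice never reaches the output:

```python
# Illustrative subset: 3-letter codes (B and T forms) whose language also has an ISO 639-1 code.
ALPHA3_TO_ALPHA2 = {
    "eng": "en",
    "fra": "fr", "fre": "fr",
    "deu": "de", "ger": "de",
    "zho": "zh", "chi": "zh",
}


def primary_subtag(alpha3: str) -> str:
    """Prefer the ISO 639-1 code; fall back to the 3-letter code when none exists."""
    alpha3 = alpha3.lower()
    return ALPHA3_TO_ALPHA2.get(alpha3, alpha3)


print(primary_subtag("ger"))  # de
print(primary_subtag("haw"))  # haw (Hawaiian has no ISO 639-1 code)
```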

  7. Are the language codes intended to be used as abbreviations for the language?

    The language codes in ISO 639-2 were developed to serve as a device to identify a language or group of languages. They were NOT intended to serve as abbreviations or short forms for language names. Some codes consist of letters that appear in some form of the language name, but this has not been possible in all cases, and often one needs to know the English form of the language name to recognize the relationship. In some cases, codes were selected that diverge from the language name altogether. Systems that use the language codes generally display to users the language name represented by the code, not the code itself, so it is irrelevant whether the code is "123", "xyz", "eng" or anything else.

    See section 4.1 of ISO 639-2 for criteria for the selection of the language code.

  8. Who are the registration authorities for the ISO 639 standards?

    The registration authority for the ISO 639-1 codes is:

    International Information Centre for Terminology (Infoterm)
    Simmeringer Hauptstrasse 24, A-1110
    Vienna
    Austria
    E-mail: [email protected]

    The registration authority for the ISO 639-2 codes is:

    Library of Congress
    Network Development and MARC Standards Office
    101 Independence Ave. SE,
    Washington, DC 20540-4402
    USA
    E-mail: [email protected]

    The registration authority for ISO 639-3 is:

    SIL International
    ISO 639-3 Registrar
    7500 W. Camp Wisdom Rd.
    Dallas, TX 75236
    USA
    E-mail: [email protected]

  9. What are the functions of the registration authorities for the ISO 639-1 and 639-2 standards?

    The registration authorities for the ISO 639 standards receive and review applications requesting new language codes or changes to existing ones, according to the criteria given in the standards.

    The registration authorities maintain accurate lists of information associated with registered language codes.

    They also process and distribute updates of the codes on a regular basis to subscribers and other parties.

    For more information about the registration authorities' duties, please see: www.loc.gov/standards/iso639-2/annexa.html.

  10. What is the Joint Advisory Committee (JAC) for the ISO 639 standards?

    The Joint Advisory Committee ISO 639/RA-JAC was established to advise the ISO 639-1 and 639-2 registration authorities and to guide the application of the coding rules (as laid down in the ISO 639 documentation). It consists of six individuals representing ISO member bodies and the rotating chairs of the registration authorities, as well as up to six observers. The JAC considers applications for new language codes and votes on whether they will be included.

    More information about the Joint Advisory Committee and its activities can be found at: www.loc.gov/standards/iso639-2/annexa.html.

  11. Are there any electronic discussion lists for the ISO 639 language codes?

    Yes, for general discussion about the ISO 639 language codes, please write to: [email protected]. There is also a discussion list on the IETF RFCs on language coding at: [email protected]. Information about this discussion list is found online at: www.alvestrand.no/mailman/listinfo/ietf-languages.

  12. How does one request new ISO 639 language codes?

    To request new codes in the ISO 639-1 and 639-2 standards, please fill out the online form at: www.loc.gov/standards/iso639-2/iso639-2form.html.

    Before submitting your requests, please review the criteria used to define new codes. Appropriate documentation must be provided with the request.

    Information on the process for submitting a proposal for a new language or other change to the ISO 639-3 code set may be found at http://www.sil.org/iso639-3/submit_changes.asp.

  13. What are the criteria used to define new ISO 639 language codes?

    The criteria used to define new codes in the ISO 639-1 standard are:

    Relation to ISO 639-2. Since ISO 639-1 is to remain a subset of ISO 639-2, a language must first satisfy the requirements for ISO 639-2. In addition, it must satisfy the following.

    Documentation

    • a significant body of existing documents (specialized texts, such as college or university textbooks, technical documentation manuals, specialized journals, subject-field related books, etc.) written in specialized languages
    • a number of existing terminologies in various subject fields (e.g. technical dictionaries, specialized glossaries, vocabularies, etc. in printed or electronic form)

    Recommendation. A recommendation and support of a specialized authority (such as a standards organization, governmental body, linguistic institution, or cultural organization)

    Other considerations

    • the number of speakers of the language community
    • the recognized status of the language in one or more countries
    • the support of the request by one or more official bodies

    Collective codes. ISO 639-1 does not use collective codes. If a collective code is needed, the alpha-3 code is used.

    The criteria used when defining new codes in the ISO 639-2 standard are:

    • Number of documents. The request for a new language identifier shall include evidence that one agency holds 50 different documents in the language or that five agencies hold a total of 50 different documents among them in the language. Documents include all forms of material and are not limited to text. This is a necessary requirement, but not sufficient in and of itself. In addition the following requirements will be considered.
    • Size and variety of literature.
      The size and variety of the literature in the language, be it written or oral, will be considered and should be documented in the proposal. The documentation may be in the form of reference to library holdings or bibliographies or more general statements quantifying the literature and its variation.
    • National or regional support
      The proposal should preferably be explicitly supported by a national or regional language authority or standardizing body. If such support for some reason is unobtainable, a recommendation from another authority or language organization will be taken into account.
    • Formal or official status
      If the language in question has some sort of “official” status, documentation of this status will greatly support the proposal. The assignment of formal status to languages is in no way consistently practiced throughout the world, and the lack of such status is not a negative argument if other requirements are met.
    • Formal education
      If the language is used as a means of instruction in formal education on any level, documentation of this use will support the proposal. Teaching of the language is also relevant, in particular if the teaching is extensive.
    • Collective codes. If the criteria above are not met, the language may be included in a collective language code. The word "languages" or "Other" as part of a language name indicates that a language code is a collective one. See also question 18.
    • Scripts.
      A single language code is normally provided for a language even though the language is written in more than one script. ISO 15924 Codes for the representation of names of scripts provides coding for scripts.
    • Dialects.
      A dialect of a language is usually represented by the same language code as that used for the language. If there are multiple names for the same language each will be included with a single code. If the language is assigned to a collective language code, the dialect is assigned to the same collective language code. The difference between dialects and languages will be decided on a case-by-case basis.
    • Orthography.
      A language using more than one orthography is not given multiple language codes.

      With the adoption of ISO 639-3: Codes for the representation of names of languages - Part 3: Alpha-3 code for comprehensive coverage of languages, another set of criteria applies to the assignment of a language code to a specific language variety. More information about the criteria for specifying a distinct individual language within the scope of ISO 639-3 may be found at http://www.sil.org/iso639-3/scope.asp#I. Information on the process for submitting a proposal for a new language or other change to the ISO 639-3 code set may be found at http://www.sil.org/iso639-3/submit_changes.asp.
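The document-count requirement for ISO 639-2 can be read as a simple test. The Python sketch below is one interpretation, assuming each agency's holdings are known as a set of document identifiers and that "five agencies" means at least five; it checks only this necessary condition, not the other ISO 639-2 criteria listed above. The function and constant names are hypothetical.

```python
from typing import Dict, Set

DOCUMENT_THRESHOLD = 50
MIN_AGENCIES = 5


def meets_document_threshold(holdings: Dict[str, Set[str]]) -> bool:
    """holdings maps an agency name to the IDs of documents it holds in the language."""
    # Case 1: a single agency holds at least 50 different documents.
    if any(len(docs) >= DOCUMENT_THRESHOLD for docs in holdings.values()):
        return True
    # Case 2: at least five agencies together hold at least 50 *different* documents.
    holding_agencies = [docs for docs in holdings.values() if docs]
    distinct_documents = set().union(*holding_agencies)
    return (len(holding_agencies) >= MIN_AGENCIES
            and len(distinct_documents) >= DOCUMENT_THRESHOLD)


single = {"Agency A": {f"doc{n}" for n in range(50)}}
shared = {f"Agency {k}": {f"doc{k}-{n}" for n in range(10)} for k in range(5)}
print(meets_document_threshold(single), meets_document_threshold(shared))
```

Counting the union of identifiers, rather than summing per-agency totals, keeps copies of the same document held by several agencies from being counted more than once.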

  14. What is the timeline used for approving new ISO 639 language codes?

    After a request for a new, deleted, or changed code is submitted to the appropriate registration authority (Infoterm for 639-1 and Library of Congress for 639-2), the appropriate registration authority determines whether or not the request meets the relevant criteria.

    The registration authority then informs the requester of the process generally within two weeks of the submission. If the request meets the criteria, the registration authority determines an appropriate code and consults the ISO 639/JAC. If the first vote is not unanimous, a second round of voting is conducted.

    The original requester will be informed of the JAC decision in six weeks to two months from submission of the original request.

    Results of the JAC decisions will be publicized in a change notice available on the Web.

    In general, changes to the ISO 639-3 code set that do not affect the ISO 639-2 code set are processed according to an annual review calendar. However, a change that affects both ISO 639-2 and ISO 639-3 is reflected in the ISO 639-3 code set as soon as the change is finalized.

  15. Are separate language codes used for languages in different scripts?

    A single language code is normally provided for a language even though the language is written in more than one script. ISO 15924 Codes for the representation of names of scripts is available for coding scripts; according to RFC 4646, these script codes may be included as subtags after the primary language subtag.

  16. Are separate language codes defined for dialects of languages?

    A dialect of a language is usually represented by the same language code as that used for the language. If the language is assigned to a collective language code, the dialect is assigned to the same collective language code. Generally, dialects are not given different codes, but determining the difference between dialects and languages will be decided on a case-by-case basis. In the future ISO 639-6, currently under development, may be used to identify language variants and dialects.

  17. Are separate language codes defined for different orthographies?

    A language using more than one orthography is not given multiple language codes. According to RFC 4646 orthographic variations may be registered in the IANA language tag registry as variant subtags.

  18. What are collective language and macrolanguage codes?

    Collective language codes are codes for language groups; they are used when the criteria for assigning a separate language code are not met. The word "languages" after the language group name indicates that a language code is a collective one.

    ISO 639-1 does not use collective codes, but ISO 639-2 does. References from separate language names to the collective code used for that language are not included in the ISO 639-2 standard, but may be found in the MARC Code List for Languages.

    Some language identifiers in ISO 639-1 and 639-2 may be considered "macrolanguages". These are designated as individual language identifiers that correspond in a one-to-many manner with individual language identifiers in ISO 639-3. For instance, ISO 639-3 contains over 30 identifiers designated as individual language identifiers for distinct varieties of Arabic, while ISO 639-1 and ISO 639-2 each contain only one identifier for Arabic, "ar" and "ara" respectively, which are designated as individual language identifiers in those parts of ISO 639. Macrolanguages are distinguished from language collections in that the individual languages that correspond to a macrolanguage must be very closely related, and there must be some domain in which only a single language identity is recognized.
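One practical consequence is that a system limited to ISO 639-2 can fall back from an ISO 639-3 individual language identifier to its macrolanguage. A Python sketch with a small, hypothetical mapping table: cmn and yue are the Mandarin and Yue Chinese identifiers cited in question 24, and arb (Standard Arabic) and arz (Egyptian Arabic) are two of the Arabic varieties mentioned above.

```python
from typing import Optional, Set

# Partial, illustrative mapping: ISO 639-3 individual language -> macrolanguage.
MACROLANGUAGE_OF = {
    "cmn": "zho",  # Mandarin Chinese -> Chinese
    "yue": "zho",  # Yue Chinese -> Chinese
    "arb": "ara",  # Standard Arabic -> Arabic
    "arz": "ara",  # Egyptian Arabic -> Arabic
}


def to_supported_code(code: str, supported: Set[str]) -> Optional[str]:
    """Return the code itself if supported, else its macrolanguage if supported, else None."""
    code = code.lower()
    if code in supported:
        return code
    macro = MACROLANGUAGE_OF.get(code)
    return macro if macro in supported else None


part2_subset = {"ara", "eng", "zho"}
print(to_supported_code("yue", part2_subset))  # zho
```

The fallback loses information (Yue and Mandarin both become zho), which is exactly the one-to-many relationship that distinguishes a macrolanguage identifier.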

  19. Can ISO 639 language codes be changed after they were initially created?

    ISO 639 language codes are usually not changed in order to ensure continuity and stability of online retrieval from large databases built over many years. However, when language names associated with codes have been changed, variant forms of a language name may be included in the entry, separated by a semicolon in the code lists.

    Codes that have been changed or discontinued are generally not reassigned.

    A list of codes that have been changed or added to the lists is located at: www.loc.gov/standards/iso639-2/codechanges.html.

    To request a change to the language name associated with an existing code, please see: www.loc.gov/standards/iso639-2/iso639-2chform.html.

  20. Why do some languages have both ISO 639-1 and 639-2 codes associated with them while others have only ISO 639-2 codes?

    For languages to be assigned the 2-character or 3-character codes, they must meet the criteria of the respective lists.

    However, because of the inadequacy of the alpha-two codes to represent all of the languages in the world (it can only accommodate 676 codes) and to assure backwards compatibility with existing usage compliant with RFC 4646 (and its predecessors), new language codes may be considered for inclusion in both parts or in ISO 639-2 only.
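The figure of 676 follows from the size of the alphabet: each of the two positions can hold any of 26 letters, so there are 26 × 26 possible alpha-2 codes, against 26 × 26 × 26 for alpha-3. A short Python check (counting every letter combination, including ones that could never be assigned):

```python
import string
from itertools import product

# Every possible code made of 2 or 3 lowercase Latin letters.
alpha2_space = ["".join(p) for p in product(string.ascii_lowercase, repeat=2)]
alpha3_space = ["".join(p) for p in product(string.ascii_lowercase, repeat=3)]

print(len(alpha2_space))  # 676 (26 * 26)
print(len(alpha3_space))  # 17576 (26 * 26 * 26)
```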

  21. Are the ISO 639 codes case sensitive?

    ISO 639-2 recommends use of the language codes in lower case, but they should be considered case-insensitive and are unique codes regardless of case.
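In practice this means normalizing codes to lower case on input and comparing them without regard to case. A minimal Python sketch (the function names are hypothetical):

```python
def normalize_code(code: str) -> str:
    """Store and display codes in the recommended lower case."""
    return code.strip().lower()


def same_code(a: str, b: str) -> bool:
    """Compare two codes without regard to case."""
    return normalize_code(a) == normalize_code(b)


print(same_code("ENG", "eng"))  # True
print(normalize_code(" Fra "))  # fra
```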

  22. How does one indicate the language variation spoken in a particular country?

    The ISO 639 standards and RFC 4646 allow for combining the language code with a country code from ISO 3166 to denote the area in which a term, phrase, or language is used. For instance, using RFC 4646, English as spoken in the United States may be indicated with the following:

    en-US
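Building such a tag is a matter of joining the two codes with a hyphen. The Python sketch below assumes an alpha-2 country code and applies the customary casing (language in lower case, country in upper case); it does not check the codes against either standard's list, and the function name is hypothetical.

```python
def language_region_tag(language: str, country: str) -> str:
    """Join an ISO 639 language code and an ISO 3166 alpha-2 country code, e.g. 'en-US'."""
    language, country = language.lower(), country.upper()
    if not (language.isascii() and language.isalpha() and len(language) in (2, 3)):
        raise ValueError(f"not a 2- or 3-letter language code: {language!r}")
    if not (country.isascii() and country.isalpha() and len(country) == 2):
        raise ValueError(f"not a 2-letter country code: {country!r}")
    # Lower-case language and upper-case country follow common convention;
    # the tag itself is still matched case-insensitively.
    return f"{language}-{country}"


print(language_region_tag("EN", "us"))  # en-US
print(language_region_tag("fr", "CA"))  # fr-CA
```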

  23. How does one make distinctions between traditional and simplified Chinese characters using the ISO 639 language codes?

    The differences between traditional and simplified Chinese characters cannot be represented using the ISO 639 codes because they are distinctions in script. The two character sets can be identified using ISO 15924 (Codes for the representation of names of scripts) script codes as subtags appended to the primary subtag for Chinese.
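Hans (Han, simplified variant) and Hant (Han, traditional variant) are the ISO 15924 codes for the two character sets. A Python sketch that appends the script subtag to zh and, optionally, a country subtag (the function and table names are hypothetical):

```python
from typing import Optional

# ISO 15924 script codes for the two Chinese character sets.
CHINESE_SCRIPTS = {"simplified": "Hans", "traditional": "Hant"}


def chinese_tag(character_set: str, region: Optional[str] = None) -> str:
    """Build a tag such as 'zh-Hant' or 'zh-Hans-CN'."""
    parts = ["zh", CHINESE_SCRIPTS[character_set.lower()]]
    if region:
        parts.append(region.upper())
    return "-".join(parts)


print(chinese_tag("traditional"))       # zh-Hant
print(chinese_tag("Simplified", "cn"))  # zh-Hans-CN
```

The language subtag stays zh in both cases; only the script subtag changes, which reflects the point above that this is a distinction in script, not in language.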

  24. How does one distinguish between Cantonese and Mandarin variations of Chinese?

    ISO 639-2 was intended primarily for written languages, and since Chinese is the same in its written form for Cantonese and Mandarin, no distinction was made in the code list. Individual Chinese languages included under the macrolanguage Chinese (coded as "zh" in 639-1; "zho" in 639-2/T and "chi" in 639-2/B) are listed at: http://www.sil.org/iso639-3/documentation.asp?id=zho. The ISO 639-3 code set defines cmn as Mandarin Chinese and yue as Yue Chinese (of which Cantonese is a dialect). Before the standardization of ISO 639-3, these could be coded by using the code for Chinese with a country code (e.g., zh-CN for Chinese as spoken in China and zh-TW for Chinese as spoken in Taiwan) or by using a subtag registered with the Internet Assigned Numbers Authority (IANA) (e.g., zh-guoyu).

  25. How does one indicate undetermined languages using the ISO 639 language codes?

    In some situations, it may be necessary to indicate that the identity of the language used in an information object has not been determined. If the language is undetermined because there is no linguistic content, ISO 639-2 provides the following identifier:

    zxx (No linguistic content; Not applicable)

    If there is linguistic content but the specific language cannot be determined, ISO 639-2 provides a special identifier:

    und (Undetermined)
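The choice between the two identifiers can be captured in a small decision function. A Python sketch (hypothetical name), assuming the caller already knows whether the object has linguistic content and, if so, whether its language was identified:

```python
from typing import Optional


def language_code_for(has_linguistic_content: bool, identified: Optional[str] = None) -> str:
    """Pick an ISO 639-2 code, using zxx or und when no specific language applies."""
    if not has_linguistic_content:
        return "zxx"  # No linguistic content; Not applicable
    if identified is None:
        return "und"  # Undetermined: there is language content, but it is unidentified
    return identified.lower()


print(language_code_for(False))         # zxx
print(language_code_for(True))          # und
print(language_code_for(True, "FRE"))   # fre
```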

  26. Is there a mechanism for using locally defined codes?

    If a user wishes to use locally defined codes for languages not covered by ISO 639-2, codes qaa through qtz are reserved for local use, including for local treatment of dialects. These codes may only be used locally, and may not be exchanged internationally.
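The reserved block runs from qaa to qtz: first letter q, second letter a through t, third letter any. That gives 20 × 26 = 520 local-use codes. A Python sketch (hypothetical names) that tests membership and enumerates the block:

```python
import string

# Second letters a..t: the first 20 letters of the alphabet.
SECOND_LETTERS = string.ascii_lowercase[:20]


def is_local_use(code: str) -> bool:
    """True if the code falls in the ISO 639-2 local-use range qaa-qtz."""
    code = code.lower()
    return (
        len(code) == 3
        and code[0] == "q"
        and code[1] in SECOND_LETTERS
        and code[2] in string.ascii_lowercase
    )


LOCAL_USE_CODES = [f"q{b}{c}" for b in SECOND_LETTERS for c in string.ascii_lowercase]

print(len(LOCAL_USE_CODES))                     # 520
print(LOCAL_USE_CODES[0], LOCAL_USE_CODES[-1])  # qaa qtz
```

A check like is_local_use can be applied at the point where records leave the local system, since these codes are not to be exchanged internationally.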

  27. What is the difference between a language code and a country code?

    ISO 639 provides two- and three-character codes for representing names of languages. ISO 3166 provides two- and three-character codes for representing names of countries. These two standards were developed independently, and there was no attempt to use the same code for a language as for the country in which it is spoken. Codes from each list should be used independently.
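Letter combinations that coincide across the two lists need not refer to related things. In this Python sketch the tables are tiny excerpts (be is Belarusian in ISO 639-1 but BE is Belgium in ISO 3166; ca is Catalan but CA is Canada), and each code is looked up only in its own table. The table and function names are hypothetical.

```python
# Tiny illustrative excerpts of the two independent code lists.
ISO_639_1 = {"be": "Belarusian", "ca": "Catalan", "en": "English"}
ISO_3166_ALPHA2 = {"BE": "Belgium", "CA": "Canada", "US": "United States"}

# The same two letters mean unrelated things in each list.
for letters in ("be", "ca"):
    print(letters, "language:", ISO_639_1[letters],
          "| country:", ISO_3166_ALPHA2[letters.upper()])


def describe(language: str, country: str) -> str:
    """Look up each code in its own list only; never cross-reference the two."""
    return f"{ISO_639_1[language.lower()]} as used in {ISO_3166_ALPHA2[country.upper()]}"


print(describe("en", "ca"))  # English as used in Canada
```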

    The language code and country code may be used together to indicate a language variation spoken in a particular country (see question 22).

Comments on this document: [email protected]



May 5, 2014